How do I extract a string of text in C#

I am having trouble splitting a string in C#.
I have a string (the text in textbox0):
start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end
I want to extract the text between <m> and </m> when button1 is clicked, and I need three outputs:
Output 1:
one two three four (output to textbox1)
Output 2:
four (output to textbox2)
Output 3:
one (output to textbox3)
How would I do this?
Please give me the full code for button1_Click.
Thanks and regards.

You can try a regular expression to capture the four values in a list, either using LINQ:
List<string> results = Regex.Matches(s, "<m>(.*?)</m>")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
Or for C# 2.0:
List<string> results = new List<string>();
foreach (Match match in Regex.Matches(s, "<m>(.*?)</m>"))
{
results.Add(match.Groups[1].Value);
}
You can then use string.Join, Enumerable.First (or results[0]) and Enumerable.Last (or results[results.Count - 1]) to get the outputs you need.
If this is XML you should use an XML parser instead.
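A minimal sketch of how the pieces above could be combined to produce the three outputs from the question. The class name TagExtractor and the method ExtractTagValues are hypothetical, and the textbox assignments are only indicated in comments because the form itself is not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Returns the text between each <m>...</m> pair, in order of appearance.
    public static List<string> ExtractTagValues(string input)
    {
        return Regex.Matches(input, "<m>(.*?)</m>")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        string s = "start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end";
        List<string> results = ExtractTagValues(s);

        // Guard against an empty list: indexing results[0] would throw.
        if (results.Count == 0)
        {
            return;
        }

        // Inside button1_Click these values would go to textbox1.Text, textbox2.Text and textbox3.Text.
        Console.WriteLine(string.Join(" ", results));   // output 1: one two three four
        Console.WriteLine(results[results.Count - 1]);  // output 2: four
        Console.WriteLine(results[0]);                  // output 3: one
    }
}
```

The empty-list check matters in a click handler, since the textbox may contain no <m> tags at all.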

With the customary warning against using Regex for XML and HTML:
You can extract text between <m> and </m> like so:
string input =
"start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end";
var matches = Regex.Matches(input, "<m>(.*?)</m>");
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1]);
}

using System;
using System.Linq;
using System.Xml.Linq;
class Program{
static void Main(string[] args){
string data = "start and dffdfdddddddfd<m>one</m><m>two</m><m>three</m><m>four</m>dbfjnbjvbnvbnjvbnv and end";
string xmlString = "<root>" + data + "</root>";
var doc = XDocument.Parse(xmlString);
var ie = doc.Descendants("m");
Console.Write("output1:");
foreach(var el in ie){
Console.Write(el.Value + " ");
}
Console.WriteLine("\noutput2:{0}",ie.Last().Value);
Console.WriteLine("output3:{0}",ie.First().Value);
}
}

Related

Regex get single, connected number

I am trying to filter the address number out of an input string, but my code so far gives unwanted results when the string contains multiple numbers.
Is there a way to have the Regex return an array or something similar, so I can recognize whether the original string contained more than one number?
String theNumbers = String.Join(String.Empty, Regex.Matches(inputString, @"\d+").OfType<Match>().Select(m => m.Value));
I have also tried a different approach, but Regex.Split produces empty strings in the array, and just filtering them out seems a bit hacky to me.
String[] extractedNumbersArray = Regex.Split(inputString, @"\D+");
Hope this helps:
using System;
using System.Text.RegularExpressions;
using System.Linq;
public class Program
{
public static void Main()
{
var inputString = "1 2 3";
var values = Regex
.Matches(inputString, @"(?<nr>\d+)")
.OfType<Match>()
.Select(m => m.Groups["nr"].Value)
.ToArray();
Console.WriteLine("Multiple numbers: " + (values.Length > 1 ? "yep" : "nope"));
foreach (var v in values)
{
Console.WriteLine(v);
}
}
}

Substitute only one group when dealing with an unknown number of capturing groups

Assuming I have this input:
/green/blah/agriculture/apple/blah/
I'm only trying to capture and replace the occurrence of apple (need to replace it with orange), so I have this regex
var regex = new Regex("^/(?:green|red){1}(?:/.*)+(apple){1}(?:/.*)");
So I'm grouping sections of the input, but as non-capturing, and only capturing the one I'm concerned with. According to this $` will retrieve everything before the match in the input string, and $' will get everything after, so theoretically the following should work:
"$`Orange$'"
But it only retrieves the match ("apple").
Is it possible to do this with just substitutions and NOT match evaluators and looping through groups?
The issue is that apple can occur anywhere in that url scheme, hence an unknown number of capture groups.
Thanks.
To achieve what you want, I changed your regex slightly (the updated version is at the end of the answer).
I turn the parts before and after apple into capturing groups, so that I can reuse them in the replacement:
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
Group 1 holds everything before apple and group 2 everything after it; Orange goes in between, where apple was.
A complete example looks like this:
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
String replacement = "$1Orange$2";
string result = Regex.Replace(text, regex.ToString(), replacement);
Console.WriteLine(result);
}
}
I had to change the regex again to handle input like this:
/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple
With the earlier regex, the last apple was matched instead of the first one as intended. I changed the regex to this:
var regex = new Regex("^(/(?:green|red)/(?:[^/]+/)*?)apple(/.*)");
The complete example above already uses this updated regex.
If you really only want to replace the first occurrence of apple and don't care about the URL structure, you can use one of the following methods.
The first is to use apple as the regex and call the Replace overload that takes a maximum number of replacements.
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
String text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
var regex = new Regex(Regex.Escape("apple"));
String replacement = "Orange";
string result = regex.Replace(text, replacement.ToString(), 1);
Console.WriteLine(result);
}
}
The second is to use IndexOf and Substring, which can be much faster than the regex classes.
See the following example:
using System;
class Program
{
static void Main(string[] args)
{
string search = "apple";
string text = "/green/blah/agriculture/apple/blah/hallo/apple/green/blah/agriculture/apple/blah/hallo/apple";
int idx = text.IndexOf(search);
int endIdx = idx + search.Length;
int secondStrLen = text.Length - endIdx;
// idx is -1 when the search text is absent; a match at the very end of the string is still valid
if (idx != -1)
{
string first = text.Substring(0, idx);
string second = text.Substring(endIdx, secondStrLen);
string result = first + "Orange" + second;
Console.WriteLine(result);
}
}
}
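The IndexOf and Substring approach above can be wrapped in a small reusable method. This is only a sketch: the names StringHelpers and ReplaceFirst are hypothetical, and ordinal (case-sensitive) comparison is an assumption about what the question needs.

```csharp
using System;

public static class StringHelpers
{
    // Replaces only the first occurrence of `search` in `text`.
    // Returns `text` unchanged when `search` does not occur.
    public static string ReplaceFirst(string text, string search, string replacement)
    {
        int idx = text.IndexOf(search, StringComparison.Ordinal);
        if (idx < 0)
        {
            return text;
        }
        // Substring(start) with start == text.Length returns an empty string,
        // so a match at the very end of the input needs no special case.
        return text.Substring(0, idx) + replacement + text.Substring(idx + search.Length);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceFirst("/green/apple/blah/apple", "apple", "Orange"));
        // prints "/green/Orange/blah/apple"
        Console.WriteLine(ReplaceFirst("/green/apple", "apple", "Orange"));
        // prints "/green/Orange"
    }
}
```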

Get only wild card value using regular expression

I want to extract only the wildcard tokens using regular expressions in .NET (C#).
If I use a pattern like Book_* (as in a directory wildcard), it should extract the values matched by *.
For example:
For the string "Book_1234" and the pattern "Book_*"
I want to extract "1234".
For the string "Book_1234_ABC" and the pattern "Book_*_*"
I should be able to extract 1234 and ABC.
This should do it:
string input = "Book_1234_ABC";
MatchCollection matches = Regex.Matches(input, @"_([A-Za-z0-9]*)");
foreach (Match m in matches)
if (m.Success)
Console.WriteLine(m.Groups[1].Value);
The approach for your scenario would be to:
Get the list of literal strings that appear between the wildcards (*).
Join that list with the regex alternation separator (|).
Replace every match of the resulting regular expression with a character you do not expect in your string (I suppose a space is adequate here).
Trim and then split the resulting string on the character used in the previous step, which gives you the list of wildcard values.
var str = "Book_1234_ABC";
var inputPattern = "Book_*_*";
var patterns = inputPattern.Split('*');
if (patterns.Last().Equals(""))
patterns = patterns.Take(patterns.Length - 1).ToArray();
string expression = string.Join("|", patterns);
var wildCards = Regex.Replace(str, expression, " ").Trim().Split(' ');
I would first convert the '*' wildcard into an equivalent regex, i.e.:
* becomes \w+
Then I use this regex to extract the matches.
When I run this code using your input strings:
using System;
using System.Text.RegularExpressions;
namespace SampleApplication
{
public class Test
{
static Regex reg = new Regex(@"Book_([^_]+)_*(.*)");
static void DoMatch(String value) {
Console.WriteLine("Input: " + value);
foreach (Match item in reg.Matches(value)) {
for (int i = 0; i < item.Groups.Count; ++i) {
Console.WriteLine(String.Format("Group: {0} = {1}", i, item.Groups[i].Value));
}
}
Console.WriteLine("\n");
}
static void Main(string[] args) {
// For a string "Book_1234" and pattern "Book_*" I want to extract "1234"
DoMatch("Book_1234");
// For a string "Book_1234_ABC" and pattern "Book_*_*" I should be able to extract 1234 and ABC
DoMatch("Book_1234_ABC");
}
}
}
I get this console output:
Input: Book_1234
Group: 0 = Book_1234
Group: 1 = 1234
Group: 2 =
Input: Book_1234_ABC
Group: 0 = Book_1234_ABC
Group: 1 = 1234
Group: 2 = ABC
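The regex in the example above is written by hand for the Book_ patterns. A minimal sketch of the "* becomes \w+" conversion for an arbitrary wildcard pattern might look like the following. The names WildcardMatcher and ExtractWildcards are hypothetical, and anchoring the pattern with ^ and $ is an assumption. Note that \w also matches the underscore, so inputs with extra underscores are resolved by normal regex backtracking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardMatcher
{
    // Builds a regex from a wildcard pattern: literal parts are escaped,
    // and every '*' becomes the capturing group (\w+).
    public static List<string> ExtractWildcards(string input, string wildcardPattern)
    {
        string[] literals = wildcardPattern.Split('*');
        string regexPattern =
            "^" + string.Join(@"(\w+)", literals.Select(p => Regex.Escape(p))) + "$";

        var values = new List<string>();
        Match match = Regex.Match(input, regexPattern);
        if (!match.Success)
        {
            return values; // no match: empty list
        }
        // Group 0 is the whole match; groups 1..n correspond to the wildcards.
        for (int i = 1; i < match.Groups.Count; i++)
        {
            values.Add(match.Groups[i].Value);
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ExtractWildcards("Book_1234", "Book_*")));
        // prints "1234"
        Console.WriteLine(string.Join(", ", ExtractWildcards("Book_1234_ABC", "Book_*_*")));
        // prints "1234, ABC"
    }
}
```

Escaping the literal parts with Regex.Escape keeps characters such as '.' or '(' in the pattern from being interpreted as regex syntax.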

How to convert CSV string to List<Enum>

I have defined enum events:
public enum Events {
UNLOCK = 1,
LOCK = 2
}
as well as CSV string:
var csv = "1,2";
What would be the preferable way to convert the CSV string to List<Events> in C#?
csv.Split(',').Select(s => (Events)Enum.Parse(typeof(Events), s)).ToList();
BTW, with a generic enum helper class (Enum<T> is not part of the framework, so you would have to write it yourself) you can parse with Enum<Events>.Parse(s), and the whole code will look like:
csv.Split(',').Select(Enum<Events>.Parse).ToList()
csv.Split(',').Select(x => (Events)int.Parse(x)).ToList();
A more verbose way:
var csv = "2,1,1,2,2,1";
List<Events> EventList = new List<Events>();
foreach (string s in csv.Split(','))
{
EventList.Add((Events)Enum.Parse( typeof(Events), s, true));
}
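The snippets above throw, or silently produce undefined values, when the CSV contains bad tokens. A hedged sketch of a more defensive variant uses Enum.TryParse together with Enum.IsDefined. The class name EventCsvParser is hypothetical, and silently skipping invalid tokens is an assumption; you may prefer to report them instead.

```csharp
using System;
using System.Collections.Generic;

public enum Events
{
    UNLOCK = 1,
    LOCK = 2
}

public static class EventCsvParser
{
    // Converts "1,2" (or "lock,unlock") into a List<Events>,
    // skipping tokens that are not defined members of the enum.
    public static List<Events> Parse(string csv)
    {
        var result = new List<Events>();
        foreach (string token in csv.Split(','))
        {
            Events parsed;
            // TryParse accepts any integer, so IsDefined is needed to reject values such as 7.
            if (Enum.TryParse(token.Trim(), true, out parsed)
                && Enum.IsDefined(typeof(Events), parsed))
            {
                result.Add(parsed);
            }
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Parse("1, 2, 7, lock, bogus")));
        // prints "UNLOCK, LOCK, LOCK"
    }
}
```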
If your CSV line contains more than just events, use a regex to split the line into its components. Here we have a mixed CSV line, and the regex breaks out each value using named captures. From there we use LINQ to project each matched line into a new object.
var csvLine = "1245,1,2,StackOverFlow";
var pattern = @"
^ # Start at the beginning of each line.
(?<IDasInteger>[^,]+) # First one is an ID
(?:,) # Match but don't capture the comma
((?<Events>[^,]+)(?:,)){2} # 2 Events
(?<Summary>[^\r\n]+) # Summary text";
var resultParsed = Regex.Matches(csvLine, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline)
.OfType<Match>()
.Select (mt => new {
ID = int.Parse (mt.Groups["IDasInteger"].ToString()),
Event = mt.Groups["Events"].Captures
.OfType<Capture>()
.Select (cp => (Events)Enum.Parse(typeof(Events), cp.Value)),
Summary = mt.Groups["Summary"].ToString()
});

how to detect strings that end with a number

I am trying to parse a string, and in some cases there is an extra " - [some number]" at the end. For example,
instead of showing
Technologist
it shows
Technologist - 23423
I don't want to just check or split on "-" because there are other names that do have a "-" in them.
Can anyone think of a clean way of removing this extra noise, so that:
Technologist - 23423 resolves to Technologist
This looks like a case for regular expressions, such as @" - \d+$" in this case. Sample code:
using System;
using System.Text.RegularExpressions;
class Test
{
static void Main()
{
Tidy("Technologist - 12345");
Tidy("No trailing stuff");
Tidy("A-B1 - 1 - other things");
}
private static readonly Regex regex = new Regex(@" - \d+$");
static void Tidy(string text)
{
string tidied = regex.Replace(text, "");
Console.WriteLine("'{0}' => '{1}'", text, tidied);
}
}
Note that this currently doesn't spot negative numbers. If you wanted it to, you could use
new Regex(@" - -?\d+$");
Try this regular expression:
var strRegex = @"\s*-\s*\d+$";
var regex = new Regex(strRegex);
var strTargetString = "Technologist - 23423";
var res = regex.Replace(strTargetString, "");
This works on the following strings (all evaluate to Text):
Text - 34234
Text -1
Text - 342
Text-3443
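A small sketch that runs the whitespace-tolerant pattern above over those inputs. The class name TrailingNumber and the method Strip are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingNumber
{
    // Optional whitespace, a dash, optional whitespace, then digits at the end of the string.
    private static readonly Regex Suffix = new Regex(@"\s*-\s*\d+$");

    public static string Strip(string text)
    {
        return Suffix.Replace(text, "");
    }

    public static void Main()
    {
        string[] samples =
        {
            "Text - 34234",
            "Text -1",
            "Text - 342",
            "Text-3443",
            "A-B1 - 1 - other things" // no trailing number, left unchanged
        };
        foreach (string sample in samples)
        {
            Console.WriteLine("'{0}' => '{1}'", sample, Strip(sample));
        }
    }
}
```

Because the whitespace is optional, a name that legitimately ends in a dash followed by digits (for example "Text-3443") is also trimmed. If that is a concern, the stricter pattern with mandatory spaces from the previous answer is the safer choice.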
