Search string pattern - c#

If I have a string like MCCORMIC 3H R Final 08-26-2011.dwg or even MCCORMIC SMITH 2N L Final 08-26-2011.dwg and I wanted to capture the R in the first string or the L in the second string in a variable, what is the best method for doing so? I was thinking about trying the below statement but it does not work.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string WhichArea = "";
int WhichIndex = 0;
WhichIndex = filename.IndexOf("Final");
WhichArea = filename.Substring(WhichIndex - 1,1); //Trying to get the R in front of word Final

Just split by space:
var parts = filename.Split(new [] {' '},
StringSplitOptions.RemoveEmptyEntries);
WhichArea = parts[parts.Length - 3];
It looks like the file names have a very specific format, so this will work just fine.
Even with any number of spaces, using StringSplitOptions.RemoveEmptyEntries means spaces will not be part of the split result set.
Code updated to deal with both examples - thanks Nikola.

I had to do something similar, but with MicroStation drawings instead of AutoCAD. I used regex in my case. Here's what I did, just in case you feel like making it more complex.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string filename2 = "MCCORMIC SMITH 2N L Final 08-26-2011.dwg";
Console.WriteLine(TheMatch(filename));
Console.WriteLine(TheMatch(filename2));
public string TheMatch(string filename) {
Regex reg = new Regex(@"[A-Za-z0-9]*\s*([A-Z])\s*Final .*\.dwg");
Match match = reg.Match(filename);
if(match.Success) {
return match.Groups[1].Value;
}
return String.Empty;
}

I don't think Oded's answer covers all cases. The first example has two words before the wanted letter, and the second one has three words before it.
My opinion is that the best way to get this letter is by using RegEx, assuming that the word Final always comes after the letter itself, separated by any number of spaces.
Here's the RegEx code:
using System.Text.RegularExpressions;
private string GetLetter(string fileName)
{
string pattern = @"\S(?=\s*?Final)"; // verbatim string, so the backslashes reach the regex engine
Match match = Regex.Match(fileName, pattern);
return match.Value;
}
And here's the explanation of RegEx pattern:
\S(?=\s*?Final)
\S // Anything other than whitespace
(?=\s*?Final) // Positive look-ahead
\s*? // Whitespace, unlimited number of repetitions, as few as possible.
Final // Exact text.
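A minimal, self-contained sketch of the look-ahead approach explained above, run against both file names from the question. The class name AreaLetterDemo is only illustrative, and the sketch assumes the word Final appears once in the file name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AreaLetterDemo
{
    // Returns the single non-whitespace character that precedes "Final",
    // or an empty string when the file name does not contain that word.
    public static string GetLetter(string fileName)
    {
        Match match = Regex.Match(fileName, @"\S(?=\s*?Final)");
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(GetLetter("MCCORMIC 3H R Final 08-26-2011.dwg"));       // R
        Console.WriteLine(GetLetter("MCCORMIC SMITH 2N L Final 08-26-2011.dwg")); // L
    }
}
```

Because the look-ahead does not consume anything, match.Value is just the letter and no capture group is needed.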

Related

.NET Regex - get parts of string that do not match pattern

I have this string
TEST_TEXT_ONE_20112017
I want to eliminate _20112017, which is a underscore with numbers, those numbers can vary; my goal is to have only
TEST_TEXT_ONE
So far I have this but I get the entire string, is there something I'm missing?
Regex r = new Regex(@"\b\w+[0-9]+\b");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach(Match word in words)
{
string w = word.Groups[0].Value;
//I still get the entire string
}
Notes for your consideration:
You should use parentheses to mark groups for capture, or use a named group. The first group (index 0) is the entire match; you probably want index 1 instead.
\w stands for word character, and it already includes both the underscore and digits. If you want to match anything before the numbers, consider using . instead of \w.
By default + is greedy, so your \w+ will consume the last underscore and all but the very last digit as well. You probably want to explicitly require an underscore before the last block of digits.
Consider whether you want to find a matching substring or to match the entire string. If the latter, use the start and end anchors: ^ and $.
If you know you want to eliminate exactly 8 digits, you can give an explicit count, like \d{8}.
For example this should work:
Regex r = new Regex(@"^(.+)_\d+$");
MatchCollection words = r.Matches("TEST_TEXT_ONE_20112017");
foreach (Match word in words)
{
string w = word.Groups[1].Value;
}
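To make the notes above concrete, here is a small sketch that contrasts group 0 of the original pattern with group 1 of the corrected one. The class and method names (SuffixDemo, WholeMatch, FirstGroup) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixDemo
{
    // Group 0 is always the whole match.
    public static string WholeMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Groups[0].Value;

    // Group 1 is the first parenthesised group in the pattern.
    public static string FirstGroup(string input, string pattern) =>
        Regex.Match(input, pattern).Groups[1].Value;

    public static void Main()
    {
        const string input = "TEST_TEXT_ONE_20112017";

        // \w also matches '_' and digits, so the original pattern swallows everything.
        Console.WriteLine(WholeMatch(input, @"\b\w+[0-9]+\b")); // TEST_TEXT_ONE_20112017

        // Requiring an explicit '_' before the trailing digits isolates the prefix.
        Console.WriteLine(FirstGroup(input, @"^(.+)_\d+$"));     // TEST_TEXT_ONE
    }
}
```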
Alternative
Use a zero-width positive lookahead assertion to check what comes next without capturing it. The syntax is (?=stuff). This gives shorter code and avoids digging through Groups altogether:
Regex r = new Regex(@"^.+(?=_\d+$)");
String result = r.Match("TEST_TEXT_ONE_20112017").Value;
Note that we require the end marker $ within the positive lookahead group.
Regex r = new Regex(@"(\b.+)_([0-9]+)\b");
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[1].Value; //TEST_TEXT_ONE
or:
String w = r.Match("TEST_TEXT_ONE_20112017").Groups[2].Value; //20112017
This seems a bit overkill for Regex in my opinion. As an alternative you could just split on the _ character and rebuild the string:
private static string RemoveDate(string input)
{
string[] parts = input.Split('_');
return string.Join("_", parts.Take(parts.Length - 1));
}
Or if the date suffix is always the same length, you could also just substring:
private static string RemoveDateFixedLength(string input)
{
//Removes last 9 characters (8 for date, 1 for underscore)
return input.Substring(0, input.Length - 9);
}
However I feel like the first approach is better, this is just another option.

How to split string by another string

I have this string (it's from EDI data):
ISA*ESA?ISA*ESA?
The * indicates it could be any character and can be of any length.
? indicates any single character.
Only the ISA and ESA are guaranteed not to change.
I need this split into two strings which could look like this: "ISA~this is date~ESA|" and
"ISA~this is more data~ESA|"
How do I do this in c#?
I can't use string.Split, because it doesn't really have a delimiter.
You can use Regex.Split for accomplishing this
string splitStr = "|", inputStr = "ISA~this is date~ESA|ISA~this is more data~ESA|";
var regex = new Regex($@"(?<=ESA){Regex.Escape(splitStr)}(?=ISA)", RegexOptions.Compiled);
var items = regex.Split(inputStr);
foreach (var item in items) {
Console.WriteLine(item);
}
Output:
ISA~this is date~ESA
ISA~this is more data~ESA|
Note that if the data between ISA and ESA can itself contain the pattern we are splitting on, you will have to find some way around it.
To explain the regex a bit:
(?<=ESA) Look-behind assertion. ESA must come immediately before the delimiter, but it is not consumed, so it stays in the output.
(?=ISA) Look-ahead assertion. ISA must come immediately after the delimiter, but it is not consumed either.
Using these look-around assertions you can find the correct | character for splitting.
Simply use
int x = whateverString.IndexOf("?ISA"); // replace ? with the actual character here
and then take the substrings from 0 to that index, and from that index to the end of the string.
Edit:
If ? is not known, we can just use the regex Pattern and Matcher:
Matcher matcher = Pattern.compile("ISA.*ESA").matcher(whateverString);
if (matcher.find()) {
int x = matcher.start();
}
Here x gives the start index of the match.
Edit: I mistakenly read this as a Java question. For C#:
string pattern = @"ISA.*?ESA"; // lazy quantifier, so each ISA...ESA block is a separate match
Regex myRegex = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = myRegex.Match(whateverString); // m is the first match
while (m.Success)
{
Console.WriteLine(m.Value);
m = m.NextMatch(); // more matches
}
RegEx will probably be the best tool for this.
The pattern would be
ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.
This will give you 2 groups with data you need
Match match = Regex.Match(input, @"ISA(?<data1>.*?)ESA.ISA(?<data2>.*?)ESA.",RegexOptions.IgnoreCase);
if (match.Success)
{
var data1 = match.Groups["data1"].Value;
var data2 = match.Groups["data2"].Value;
}
Use Regex.Matches if you need multiple matches found, and specify different RegexOptions if needed.
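As a rough sketch of the Regex.Matches route for an arbitrary number of segments: each match is ISA, the shortest run of characters up to ESA, and then one more character for the unknown terminator. The names EdiSegmentDemo and SplitSegments are hypothetical, and the sketch assumes the data between the markers never contains ESA itself and contains no line breaks (the . does not match a newline by default).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EdiSegmentDemo
{
    // Collects every "ISA...ESA?" block, where ? is any single character.
    public static List<string> SplitSegments(string input)
    {
        var segments = new List<string>();
        foreach (Match m in Regex.Matches(input, @"ISA.*?ESA."))
        {
            segments.Add(m.Value);
        }
        return segments;
    }

    public static void Main()
    {
        foreach (string s in SplitSegments("ISA~this is date~ESA|ISA~this is more data~ESA|"))
        {
            Console.WriteLine(s);
        }
        // ISA~this is date~ESA|
        // ISA~this is more data~ESA|
    }
}
```

Unlike the named-group pattern, this does not hard-code the number of segments in the pattern.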
It's kinda hacky but you could do...
string x = "ISA*ESA?ISA*ESA?";
x = x.Replace("*","~"); // OR SOME OTHER DELIMITER
string[] y = x.Split('~');
Not perfect in all situations, but it could solve your problem simply.
You could split by "ISA" and "ESA" and then put the parts back together.
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
string start = "ISA",
end = "ESA";
var splitedInput = input.Split(new[] { start, end }, StringSplitOptions.None);
var firstPart = $"{start}{splitedInput[1]}{end}{splitedInput[2]}";
var secondPart = $"{start}{splitedInput[3]}{end}{splitedInput[4]}";
// firstPart == "ISA~this is date~ESA|"
// secondPart == "ISA~this is more data~ESA|"
Use a Regex like ISA(.+?)ESA and select the first group
string input = "ISA~mycontent+ESA";
Match match = Regex.Match(input, @"ISA(.+?)ESA",RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
}
Instead of "splitting" by a string, I would instead describe your question as "grouping" by a string. This can easily be done using a regular expression:
Regular expression: ^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$
Explanation:
^ - asserts position at start of the string
( - start capturing group
ISA - match string ISA exactly
.*?(?=ESA) - match any character 0 or more times, positive lookahead on the
string ESA (basically match any character until the string ESA is found)
ESA - match string ESA exactly
. - match any character
) - end capturing group
repeat one more time...
$ - asserts position at end of the string
Example:
string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
Regex regex = new Regex(@"^(ISA.*?(?=ESA)ESA.)(ISA.*?(?=ESA)ESA.)$",
RegexOptions.Compiled);
Match match = regex.Match(input);
if (match.Success)
{
string firstValue = match.Groups[1].Value; // "ISA~this is date~ESA|"
string secondValue = match.Groups[2].Value; // "ISA~this is more data~ESA|"
}
There are two answers to the question "How to split a string by another string".
var matches = input.Split(new [] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);
and
var matches = Regex.Split(input, "ISA").ToList();
However, the first removes empty entries, while the second does not.
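The difference is easy to see by counting the pieces each call returns for the sample input. This is only a sketch; SplitComparisonDemo and its method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitComparisonDemo
{
    // string.Split with RemoveEmptyEntries drops the empty piece before the first "ISA".
    public static string[] WithStringSplit(string input) =>
        input.Split(new[] { "ISA" }, StringSplitOptions.RemoveEmptyEntries);

    // Regex.Split keeps that leading empty string.
    public static string[] WithRegexSplit(string input) =>
        Regex.Split(input, "ISA");

    public static void Main()
    {
        const string input = "ISA~this is date~ESA|ISA~this is more data~ESA|";
        Console.WriteLine(WithStringSplit(input).Length); // 2
        Console.WriteLine(WithRegexSplit(input).Length);  // 3 (first element is "")
    }
}
```

In both cases the ISA delimiter itself is removed from the pieces, so it has to be prepended again if the original segments are needed.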

Regex Matchcollection groups

I have spent two days trying to solve this problem. I have a MatchCollection; the pattern contains a group, and I want a list of the values captured by that group (there are two or more matches).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
string s = groups[1].Value;
Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use e.g. a listbox, but the Groups[1].Value is a string...
Thanks for your help and time.
Dieter
The first thing to correct in the code is Datum2.Text = s;, which overwrites the text in Datum2 on every iteration, so only the last match remains when there is more than one.
Now, about your regex:
^ forces a match at the beginning of the string, so there is really only 1 match. If you remove it, it'll match twice.
I can't work out what was intended with the $? all over the pattern (just take them out).
[' '] matches "either a quote, a space or a quote" (there is no need to repeat characters in a character class).
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. Otherwise a dot matches any character.
[0-1][1-9] matches all months except "10". The second character should be [0-9] (or \d). The same applies to the day: [0-3][1-9] misses 10, 20 and 30.
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s= "";
foreach (Match match in mc2)
{
GroupCollection groups = match.Groups;
s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
You should know that regex is not the right tool for parsing HTML. It'll work for simple cases, but for real cases do consider using the HTML Agility Pack.
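Since the question asks for a list of the group values rather than one concatenated string, here is a hedged sketch that collects them with LINQ. It reuses the corrected pattern from the code above, written as a verbatim string; GroupListDemo and CaptureSecondColumn are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupListDemo
{
    // Date cell followed by the numeric cell, which is captured as group 1.
    private const string RowPattern =
        @"<tr><td>[D-M][i-r],[' ][0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";

    // Returns the group 1 value of every match, in document order.
    public static List<string> CaptureSecondColumn(string html) =>
        Regex.Matches(html, RowPattern)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    public static void Main()
    {
        string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr>"
                     + "<tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
        Console.WriteLine(string.Join(", ", CaptureSecondColumn(input))); // 1, 2
    }
}
```

The resulting list can be bound to a list box or joined into a single string for a text box.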

Simple regex-matching

I have a String
String test = @"Lists/Versions/2_.000";
I'm a bit confused on how to use regex to do this.
I'm using the pattern
String pattern = @"\D+";
The msdn page for regular expression says \D is "Matches any character other than a decimal digit"
So shouldn't it be returning 'Lists/Versions/' , '2'?
However, it's returning
'' , '2', '000'
I would like the string to only match the 2(Or any Integer). How would I do that?
String url = @"Lists/Versions/2_.000";
String pattern = @"\D+";
string[] substrings = Regex.Split(url, pattern);
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
The reason you're seeing this is that \D matches non-digits, so Regex.Split treats every run of non-digits as a separator. Because of the _. in the middle, you get two separate numeric values (2 and 000), plus an empty string for the part before the leading separator. So you have a couple of choices:
Break the string into manageable portions, then index into the array.
Build a better pattern to separate.
So the question is, what are you trying to parse? 2.00? Or are you trying to separate the numbers in your string?
I'm assuming you also have a typo; for reference:
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\w Matches any word character including underscore. Equivalent to
"[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
You should simply do the following:
string url = @"Lists/Versions/2_.000";
var data = Regex.Split(url, @"\D+");
// data[0] is an empty string because the input starts with non-digits
Console.WriteLine("Value: {0} and Secondary Value: {1}", data[1], data[2]);
That finds all the integer values, so the two values printed are:
2
000
Regex.Split returns a normal string[]. You'll also want to ensure you check the bounds of the array.
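To show where the empty string comes from, the following sketch compares splitting on non-digits with matching the digits directly. DigitRunsDemo and its methods are illustrative names only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRunsDemo
{
    // Splitting on \D+ yields whatever lies between the non-digit runs,
    // including the empty string before a leading run.
    public static string[] SplitOnNonDigits(string s) => Regex.Split(s, @"\D+");

    // Matching \d+ yields only the digit runs themselves.
    public static string[] MatchDigits(string s) =>
        Regex.Matches(s, @"\d+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string url = "Lists/Versions/2_.000";
        Console.WriteLine(string.Join("|", SplitOnNonDigits(url))); // |2|000
        Console.WriteLine(string.Join("|", MatchDigits(url)));      // 2|000
    }
}
```

Matching instead of splitting avoids having to skip the first element.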
https://dotnetfiddle.net/BU6gp2
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
String url = @"Lists/Versions/2_.000";
String pattern = @"\D+";
string[] substrings = Regex.Split(url, pattern);
Console.WriteLine("'{0}'", substrings[1]);
}
}
Please try the following:
// using System.Linq;
String url = @"Lists/Versions/2_.000";
String pattern = @"(?<=/)\d+";
string[] substrings = Regex.Matches(url, pattern)
.Cast<Match>()
.Select(_ => _.Value)
.ToArray();
foreach (string match in substrings)
{
Console.WriteLine("'{0}'", match);
}
Alternatively, if you don't need an array.
String url = @"Lists/Versions/2_.000";
String pattern = @"(?<=/)\d+";
Console.WriteLine("'{0}'", Regex.Match(url, pattern).Value);

How can I use RegEx (Or Should I) to extract a string between the starting string '__' and ending with '__' or 'nothing'

RegEx has always confused me.
I have a string like this:
IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0
Or Sometimes stored like this:
IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0
I want to get the 'A3VS' or 'PG33S' string. It's my firmware and is varied in length and type. I used to use:
string[] split = PNP.Split('\\'); // where PNP is my string name
var start = split[1].LastIndexOf('_');
string mystring = split[1].Substring(start + 1);
But that only works for strings that don't end with __ after the firmware string. I noticed that some have an additional random '_' after it.
Is RegEx the way to solve this, or is there a better way?
Without RegEx it can be expressed like this:
var firmware = PNP.Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries)[1].Split('\\')[0];
string s = split[1].TrimEnd('_');
string mystring = s.Substring(s.LastIndexOf('_') + 1);
If you want the RegEX way to do it here it is:
Regex regex = new Regex(@"\\.*_+(?<firmware>[A-Za-z0-9]+)_*\\");
var m1 = regex.Match(@"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0");
var g1 = m1.Groups["firmware"].Value;
//g1 == "PG33S"
Keep in mind you have to use [A-Za-z0-9] instead of \w in the capture subexpression since \w also matches an underscore (_).
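The effect of that choice can be checked with a small sketch that swaps the character class inside the named group. FirmwareDemo and Extract are hypothetical names, and the inputs are the two sample strings from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirmwareDemo
{
    // Builds the pattern around the given character class and returns
    // whatever the "firmware" group captured (empty string if no match).
    public static string Extract(string pnp, string charClass)
    {
        string pattern = @"\\.*_+(?<firmware>" + charClass + @"+)_*\\";
        return Regex.Match(pnp, pattern).Groups["firmware"].Value;
    }

    public static void Main()
    {
        string trailing = @"IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0";
        string plain = @"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0";

        Console.WriteLine(Extract(trailing, "[A-Za-z0-9]")); // A3VS
        Console.WriteLine(Extract(plain, "[A-Za-z0-9]"));    // PG33S

        // With \w the group can be satisfied by a trailing underscore,
        // so the firmware name is not what gets captured here.
        Console.WriteLine(Extract(trailing, @"\w") == "A3VS"); // False
    }
}
```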
