Regex pattern works in Lua but not in C# - c#

I need to use regex in C# to split up something like "21A244" where
The first two numbers can be 1-99
The letter can only be 1 letter, A-Z
The last three numbers can be 111-999
So I made this match
"([0-9]+)([A-Z])([0-9]+)"
but for some reason when used in C#, the match functions just return the input string. So I tried it in Lua, just to make sure the pattern was correct, and it works just fine there.
Here's the relevant code:
var m = Regex.Matches( mdl.roomCode, "(\\d+)([A-Z])(\\d+)" );
System.Diagnostics.Debug.Print( "Count: " + m.Count );
And here's the working Lua code in case you were wondering
local str = "21A244"
print(string.match( str, "(%d+)([A-Z])(%d+)" ))
Thank you for any help
EDIT: Found the solution
var match = Regex.Match(mdl.roomCode, "(\\d+)([A-Z])(\\d+)");
var group = match.Groups;
System.Diagnostics.Debug.Print( "Count: " + group.Count );
System.Diagnostics.Debug.Print("houseID: " + group[1].Value);
System.Diagnostics.Debug.Print("section: " + group[2].Value);
System.Diagnostics.Debug.Print("roomID: " + group[3].Value);

Firstly, you should make your regex a little more specific and limit how many digits are allowed at the beginning and end. Note that [1-9] also excludes 0, so a code such as 10A200 would not match. How about:
([1-9]{1,2})([A-Z])([1-9]{1,3})
Next, the results of the captures (i.e. the 3 parts in parens) will be in the Groups property of the Match object. I.e. for a Match m:
m.Groups[1] // First number
m.Groups[2] // Letter
m.Groups[3] // Second number

Regex.Matches(mdl.roomCode, "(\d+)([A-Z])(\d+)") returns a collection of matches. If there is no match, it returns an empty MatchCollection.
Since the regular expression matches the string, it returns a collection with one item: a Match whose Value is the whole input string. The captured parts are in that Match's Groups property.
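A minimal sketch of that difference, assuming a hard-coded room code in place of mdl.roomCode (the variable names are illustrative, not from the original post). It shows that Regex.Matches counts whole matches, while the three captured parts live in the Groups of each Match:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for mdl.roomCode from the question.
string roomCode = "21A244";
string pattern = @"(\d+)([A-Z])(\d+)";

// Matches returns one Match per occurrence of the whole pattern.
MatchCollection matches = Regex.Matches(roomCode, pattern);
Console.WriteLine("Match count: " + matches.Count);      // 1

// Groups[0] is the whole match; Groups[1..3] are the captures.
Match first = matches[0];
Console.WriteLine("Group count: " + first.Groups.Count); // 4

string houseId = first.Groups[1].Value;
string section = first.Groups[2].Value;
string roomId = first.Groups[3].Value;
Console.WriteLine(houseId + " / " + section + " / " + roomId); // 21 / A / 244
```

If the string must contain nothing but the room code, anchoring the pattern with ^ and $ would be a reasonable extra step.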

Related

Get string that follows second-last occurrence of character/string

I have an unusual project in which I need to retrieve the text after the second-last occurrence of the character "\", effectively giving me the last two directories in the following example strings:
D:\Archive Directory\2015-12-31 PM\SerialNo_01
D:\Archive Directory\2016-01-01\SerialNo_02
D:\Archive Directory\January 2016\SerialNo_03
The desired result is, respectively:
2015-12-31 PM\SerialNo_01
2016-01-01\SerialNo_02
January 2016\SerialNo_03
I'd like to do this as cleanly as possible and preferably in one line of code for each string.
This question is being answered by me after finding nothing on Stack Overflow about finding the second-last occurrence (or, for that matter, any Nth occurrence going backwards) of a string or character within a string in c#. If the community finds this question is duplicated or feels it is too obscure a case, I am willing to remove it.
Edit: Clarified that I don't need to do this as a list of strings; they will be run one at a time. I'm dynamically adding them as radio button controls to a form.
You don't need regex, you can rely on the built-in path handling provided by .NET.
var input = new List<string> {
@"D:\Archive Directory\2015-12-31 PM\SerialNo_01",
@"D:\Archive Directory\2016-01-01\SerialNo_02",
@"D:\Archive Directory\January 2016\SerialNo_03"
};
var result = input.Select(s => Path.Combine(Directory.GetParent(s).Name, Path.GetFileName(s)));
Yields:
2015-12-31 PM\SerialNo_01
2016-01-01\SerialNo_02
January 2016\SerialNo_03
Then you don't need to worry about edge cases, or even cross-OS compatibility.
I was able to come up with a solution after tweaking the code from this clever answer.
myString.Split('\\').Reverse().Take(2).Aggregate((s1, s2) => s2 + "\\" + s1);
This will split the string at each backslash, then reverse the resulting array of strings and take only the last two elements before concatenating them back together, now in reverse order, giving the desired result.
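As a rough sketch of the same split-based idea, the helper below keeps the last two segments with Skip instead of Reverse/Take/Aggregate. The name LastTwoSegments is hypothetical, and the handling of inputs with fewer than two segments is an assumption rather than something the original answer specifies:

```csharp
using System;
using System.Linq;

// Returns the last two backslash-separated segments of a path-like string.
static string LastTwoSegments(string path)
{
    string[] parts = path.Split('\\');
    // Skip everything except the final two parts (or fewer, if the input is short).
    return string.Join("\\", parts.Skip(Math.Max(0, parts.Length - 2)));
}

string example = @"D:\Archive Directory\2015-12-31 PM\SerialNo_01";
string lastTwo = LastTwoSegments(example);
Console.WriteLine(lastTwo); // 2015-12-31 PM\SerialNo_01
```

Because it never reverses the sequence, the segments come out in their original order without a second pass.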
You can also match the parts you want.
(?<=\\)[^\\]*\\[^\\]*$
See demo.
https://regex101.com/r/fM9lY3/56
string strRegex = @"(?<=\\)[^\\]*\\[^\\]*$";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"D:\Archive Directory\2015-12-31 PM\SerialNo_01" + "\n" + @"D:\Archive Directory\2016-01-01\SerialNo_02" + "\n" + @"D:\Archive Directory\January 2016\SerialNo_03";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
If you still want to use Regex, you can use
Match match = Regex.Match(inputString, @".:\\.*\\(.*\\.*)");
if (match.Success)
{
// Group 1 holds the last two path segments
string result = match.Groups[1].Value;
}
The first group in match will give the required result
For multiple results use Matches instead of match
List<string> paths = new List<string> {
@"D:\Archive Directory\2015-12-31 PM\SerialNo_01",
@"D:\Archive Directory\2016-01-01\SerialNo_02",
@"D:\Archive Directory\January 2016\SerialNo_03" };
var requiredPaths = paths.Select(item => string.Join(@"\", item.Split('\\').
Reverse().Take(2).Reverse()));

Why this function for finding the n-th occurrence does not work on text with line breaks?

I found the following code to find the n-th occurrence of a value in a text here.
This is the code:
public static int NthIndexOf(this string target, string value, int n)
{
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}");
if (m.Success)
return m.Groups[2].Captures[n - 1].Index;
else
return -1;
}
I tried to find the index of the second occurrence of "< /form>" (the space does not appear in the original string) in a webpage, and it failed, although the occurrence definitely exists in the text. I also cut off a prefix of the webpage so that the second occurrence became the first, and then the expression was found as the first occurrence.
In one of the comments on this code, someone wrote that "This Regex does not work if the target string contains linebreaks.".
My two questions are:
Why doesn't this code work if the target string contains linebreaks?
How can I fix this code, so it will work also for strings that contain linebreaks (replacing/removing the linebreaks is not considered a good solution for me)?
I don't look for other techniques to do the same thing.
By default, . matches any character except a newline, so .*? only matches up to the end of the line.
For what you want, you need to use Singleline mode, so your code should look something like this:
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
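By default, the . in a regular expression stops at a newline. To fix it you need to specify a regex option. Note that the option that makes . match newlines is RegexOptions.Singleline; RegexOptions.Multiline only changes how ^ and $ behave.
Match m = Regex.Match(target, "((" + value + ").*?){" + n + "}", RegexOptions.Singleline);
You can find more information about RegexOptions here.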
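To make the effect of the two options concrete, here is a small sketch built on the NthIndexOf code from the question. It is written as a plain static helper rather than an extension method, takes the options as a parameter, and wraps value in Regex.Escape; those three changes are assumptions of this sketch, not part of the original code:

```csharp
using System;
using System.Text.RegularExpressions;

// Variant of NthIndexOf from the question; options is passed in for comparison.
static int NthIndexOf(string target, string value, int n, RegexOptions options)
{
    // Regex.Escape keeps characters such as '(' or '.' in value literal.
    string pattern = "((" + Regex.Escape(value) + ").*?){" + n + "}";
    Match m = Regex.Match(target, pattern, options);
    return m.Success ? m.Groups[2].Captures[n - 1].Index : -1;
}

// Two occurrences separated by a line break.
string html = "a</form>\nb</form>\nc";

int withSingleline = NthIndexOf(html, "</form>", 2, RegexOptions.Singleline);
int withMultiline = NthIndexOf(html, "</form>", 2, RegexOptions.Multiline);

Console.WriteLine(withSingleline); // 10
Console.WriteLine(withMultiline);  // -1
```

With Multiline the dot still cannot cross the "\n" between the two occurrences, so the repeated group never completes its second iteration.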

Regular Expression without braces

I have the following sample cases:
1) "Sample"
2) "[10,25]"
I want to form a single regular expression pattern that returns "Sample" and "10,25" when the above examples are passed to it.
Note: Input strings do not include quotes.
I came up with the expression (?<=\[)(.*?)(?=\]). It satisfies the second case and retrieves only "10,25", but for the first case it returns an empty result. I want "Sample" to be returned. Can anyone help me?
C#.
Here you go: a small regex using a positive lookbehind. These are sometimes very handy.
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
If " is included in your original string, use the regex below, which also looks for the " mark. You can remove ^| from the lookbehind if the " mark is always present, or leave it as it is if your text mixes strings with and without " marks.
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
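A quick way to check the first lookbehind pattern in C# could look like the sketch below. The variable names are illustrative, and each input is tested on its own rather than as one multi-line string:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: text at the start of the string or right after '['.
var bracketless = new Regex(@"(?<=^|\[)([\w,]+)");

string plain = bracketless.Match("Sample").Value;
string bracketed = bracketless.Match("[10,25]").Value;

Console.WriteLine(plain);     // Sample
Console.WriteLine(bracketed); // 10,25
```

Since [\w,] only allows word characters and commas, values containing spaces or other punctuation would be cut short at the first such character.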
As far as I can tell, the below regex should help:
Regex regex = new Regex(@"^\w+|[[](\w)+\,(\w)+[]]$");
This will match multiple words, or 2 words (alphanumeric) separated by commas and inside square brackets.
One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
if(mMod.group(1) != null) {
System.out.println(mMod.group(1));
}
if(mMod.group(2)!=null) {
System.out.println(mMod.group(2));
}
}
if input is "[hello&bye,25|35]", then the output is hello&bye,25|35

Error when counting a specific word in a string with Regex.Matches

I tried to find a method to count a specific word in a string, and I found this.
using System.Text.RegularExpressions;
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ",");
int cnt = matches.Count;
Console.WriteLine(cnt);
It worked fine, and the result shows 2.
But when I change "," to ".", it shows 18, not the expected 1. Why?
MatchCollection matches = Regex.Matches("hi, hi, everybody.", ".");
and when I change "," to "(", it shows me an error!
the error reads:
SYSTEM.ARGUMENTEXCEPTION - THERE ARE TOO MANY (...
I don't understand why this is happening
MatchCollection matches = Regex.Matches("hi( hi( everybody.", "(");
Other cases seem to work fine but I need to count "(".
The first instance, with the ., is using a special character which has a different meaning in regular expressions. It matches ANY character, so it matches every character you have; hence you are getting a result of 18.
http://www.regular-expressions.info/dot.html
To match an actual "." character, you'll need to "escape" it so that it is read as a full stop and not a special character. In C#, use a verbatim string (or double the backslash) so that the backslash reaches the regex engine:
MatchCollection matches = Regex.Matches("hi, hi, everybody.", @"\.");
The same applies to the ( character. It's a special character that has a different meaning in regular expressions, and you will need to escape it.
MatchCollection matches = Regex.Matches("hi( hi( everybody.", @"\(");
It looks like you're new to regular expressions, so I'd suggest some reading; the link I posted above is a good start.
HOWEVER!
If you are just looking to count occurrences in a string, you don't need regex.
How would you count occurrences of a string within a string?
If you're using .NET 3.5 you can do this in a one-liner with LINQ:
int cnt = source.Count(f => f == '(');
If you don't want to use LINQ you can do it with:
int cnt = source.Split('(').Length - 1;
The second parameter represents a pattern, not necessarily just a character to search for in your string, and the ( by itself is an invalid pattern.
You don't need Regex to count occurrences of a character. Just use LINQ's Count():
var input = "hi( hi( everybody.";
var occurrences = input.Count(x => x == '('); // 2
( character is a special character which means start of a group. If you need to use ( as literal you need to escape it with \(. That should solve your problem.
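The points made in these answers can be put side by side in a short sketch. It uses Regex.Escape instead of hand-written backslashes, which is one possible way to escape a literal search string; the variable names are illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sentence = "hi, hi, everybody.";
string withParens = "hi( hi( everybody.";

// An unescaped dot matches every character in the string.
int anyChar = Regex.Matches(sentence, ".").Count;
// Regex.Escape turns "." into \. so only the literal full stop matches.
int literalDot = Regex.Matches(sentence, Regex.Escape(".")).Count;
// Escaping "(" avoids the ArgumentException for an unclosed group.
int parenRegex = Regex.Matches(withParens, Regex.Escape("(")).Count;
// Counting a single character needs no regex at all.
int parenLinq = withParens.Count(c => c == '(');

Console.WriteLine(anyChar);    // 18
Console.WriteLine(literalDot); // 1
Console.WriteLine(parenRegex); // 2
Console.WriteLine(parenLinq);  // 2
```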

Regular Expression to get all characters before "-"

How can I get the string before the character "-" using regular expressions?
For example, I have "text-1" and I want to return "text".
So I see many possibilities to achieve this.
string text = "Foobar-test";
Regex Match everything till the first "-"
Match result = Regex.Match(text, @"^.*?(?=-)");
^ match from the start of the string
.*? match any character (.), zero or more times (*) but as few as possible (?)
(?=-) till the next character is a "-" (this is a positive look ahead)
Regex Match anything that is not a "-" from the start of the string
Match result2 = Regex.Match(text, @"^[^-]*");
[^-]* matches any character that is not a "-" zero or more times
Regex Match anything that is not a "-" from the start of the string till a "-"
Match result21 = Regex.Match(text, @"^([^-]*)-");
Will only match if there is a dash in the string, but the result is then found in capture group 1.
Split on "-"
string[] result3 = text.Split('-');
Result is an Array the part before the first "-" is the first item in the Array
Substring till the first "-"
string result4 = text.Substring(0, text.IndexOf("-"));
Get the substring from text from the start till the first occurrence of "-" (text.IndexOf("-"))
You get then all the results (all the same) with this
Console.WriteLine(result);
Console.WriteLine(result2);
Console.WriteLine(result21.Groups[1]);
Console.WriteLine(result3[0]);
Console.WriteLine(result4);
I would prefer the first method.
You need to think also about the behavior, when there is no dash in the string. The fourth method will throw an exception in that case, because text.IndexOf("-") will be -1. Method 1 and 2.1 will return nothing and method 2 and 3 will return the complete string.
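One way to make the Substring approach safe when there is no dash is to check the index first, as in the sketch below. The helper name BeforeDash and the choice to return the whole string when no dash is present are assumptions of this sketch; returning an empty string or null would be equally defensible depending on the caller:

```csharp
using System;

// Returns the text before the first '-', or the whole string if there is none.
static string BeforeDash(string text)
{
    int dash = text.IndexOf('-');
    return dash < 0 ? text : text.Substring(0, dash);
}

Console.WriteLine(BeforeDash("text-1"));      // text
Console.WriteLine(BeforeDash("Foobar-test")); // Foobar
Console.WriteLine(BeforeDash("quux"));        // quux
```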
Here is my suggestion - it's quite simple as that:
[^-]*
This is something like the regular expression you need:
([^-]*)-
Quick tests in JavaScript:
/([^-]*)-/.exec('text-1')[1] // 'text'
/([^-]*)-/.exec('foo-bar-1')[1] // 'foo'
/([^-]*)-/.exec('-1')[1] // ''
/([^-]*)-/.exec('quux')[1] // explodes
I don't think you need regex to achieve this. I would look at the Substring method along with the IndexOf method. If you need more help, add a comment showing what you have attempted and I will offer more help.
You could just use another non-regex based method. Someone gave the suggestion of using Substring, but you could also use Split:
string testString = "my-string";
string[] splitString = testString.Split('-');
string resultingString = splitString[0]; //my
See http://msdn.microsoft.com/en-US/library/ms228388%28v=VS.80%29.aspx for another good example.
If you want to use Regex in .NET,
Regex rx = new Regex(@"^([\w]+)(\-)*");
var match = rx.Match("thisis-thefirst");
var text = match.Groups[1].Value;
Assert.AreEqual("thisis", text);
Find all word and space characters up to and including a -
^[\w ]+-
