Lookbehind with equal sign - c#

I want to match
===Something===
but not
====Something====
I've come up with the following regular expression
Regex.Match("====Something====", @"^\s*===\s*(?<!=====\s*)(?<Title>.*?)\s*===\s*$").Groups["Title"]
but it returns
=Something=
Please help: what's the issue with the lookbehind pattern?

Match the whole token, and make sure it is delimited properly. In .NET regular expressions the angle brackets < and > are plain literal characters, not word-boundary markers, so negative lookarounds are used here instead. Translated for the computer, the expression below says: search for three = signs not preceded by another =, then one or more word characters, then three = signs not followed by another =.
Hence, if there are four equals signs at the start of the token, it won't match.
string regexExpression = @"(?<!=)={3}(\w+)={3}(?!=)";
static void Main(string[] args)
{
    // Requires: using System; using System.Text.RegularExpressions;
    // Searches for the first matching instance.
    string textToSearchThrough = "===Something===";
    string textToSearchThrough2 = "====Something====";
    // Add \s* around (\w+) in the pattern below if you want to allow whitespace.
    string regexExpression = @"(?<!=)={3}(\w+)={3}(?!=)";
    Regex r = new Regex(regexExpression);
    // Pass textToSearchThrough2 instead to check the four-equals case.
    Match m = r.Match(textToSearchThrough);
    Console.WriteLine(m.Success.ToString());
    Console.ReadLine();
}

One more possible solution:
(?<!=)===(?!=)(?<Title>.*?)(?<!=)===(?!=)
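A minimal sketch of how the lookaround pattern above behaves on the two inputs from the question. The class and method names (TitleDemo, ExtractTitle) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleDemo
{
    // Exactly three '=' on each side: the lookarounds reject a fourth '='
    // directly before or after either run of '==='.
    private static readonly Regex TitleRegex =
        new Regex(@"(?<!=)===(?!=)(?<Title>.*?)(?<!=)===(?!=)");

    // Returns the captured title, or null when no exactly-three-equals pair is found.
    public static string ExtractTitle(string input)
    {
        Match m = TitleRegex.Match(input);
        return m.Success ? m.Groups["Title"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractTitle("===Something===") ?? "(no match)");   // Something
        Console.WriteLine(ExtractTitle("====Something====") ?? "(no match)"); // (no match)
    }
}
```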

Your regex misbehaves because .*? can also match =. It finds ===, then accepts anything (including further = characters), and then looks for a closing ===. So it also matches === inside a ========= string, which is not what you are looking for. The lookbehind (?<!=====\s*) does not help either: at the point where it is checked, only three = signs have been consumed, so five = signs never precede it. If you replace . (any character) with \w (a word character), it should match only ===Something===, even without the lookbehind. It is also better to use \w+ instead of \w*, so that a bare ====== with no word in between is not matched (unless you want that):
^\s*===\s*(?<Title>\w+?)\s*===\s*$

Related

How to split Alphanumeric with Symbol in C#

I want to split an alphanumeric string into two parts, numeric and alphabetic, separated by a special character such as -
string mystring = "1- Any Thing";
I want to store like:
numberPart = 1
alphaPart = Any Thing
For this I am using Regex
Regex re = new Regex(@"([a-zA-Z]+)(\d+)");
Match result = re.Match("1- Any Thing");
string alphaPart = result.Groups[1].Value;
string numberPart = result.Groups[2].Value;
If there is no space in the string it works fine, but when there is both a space and a symbol, alphaPart and numberPart come back empty. Where am I going wrong? The regex may be wrong for this kind of input; please advise.
Try this:
(\d+)(?:[^\w]+)?([a-zA-Z\s]+)
Explanation:
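(\d+) - captures one or more digits
[^\w]+ - matches one or more non-word characters (anything other than letters, digits, and underscore), such as the "- " separator
? - makes the preceding group optional, so the separator between the number and the text may be absent
([a-zA-Z\s]+) - captures letters and whitespace, so words separated by spaces stay together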
Start of string is matched with ^.
Digits are matched with \d+.
Any non-alphanumeric characters are matched with [\W_] or \W.
Anything is matched with .*.
Use
(?s)^(\d+)\W*(.*)
(?s) makes . match linebreaks. So, it literally matches everything.
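As a rough illustration, the pattern above could be wrapped like this. The names NumberTextSplitter and TrySplit are hypothetical; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberTextSplitter
{
    // Leading digits, then any run of non-word separators, then everything else.
    private static readonly Regex Pattern = new Regex(@"(?s)^(\d+)\W*(.*)");

    // Returns false (and empty parts) when the input does not start with a digit.
    public static bool TrySplit(string input, out string numberPart, out string alphaPart)
    {
        Match m = Pattern.Match(input);
        numberPart = m.Success ? m.Groups[1].Value : string.Empty;
        alphaPart = m.Success ? m.Groups[2].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        if (TrySplit("1- Any Thing", out string number, out string alpha))
        {
            Console.WriteLine("numberPart = " + number); // numberPart = 1
            Console.WriteLine("alphaPart = " + alpha);   // alphaPart = Any Thing
        }
    }
}
```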

Regex including what is supposed to be non-capturing group in result

I have the following simple test where i'm trying to get the Regex pattern such that it yanks the executable name without the ".exe" suffix.
It appears my non-capturing group setting (?:\.exe) isn't working, or I'm misunderstanding how it's intended to work.
Both regex101 and regexstorm.net show the same result and the former confirms that "(?:\.exe)" is a non-capturing match.
Any thoughts on what i'm doing wrong?
// test variable for what i would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(testEcl, @"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase).Value;
// expecting "MyApp" but I get "MyApp.exe"
I have been able to extract the value i wanted by using a matching pattern with group names defined, as shown in the following, but would like to understand why non-capturing group setting approach didn't work the way i expected it to.
// test variable for what i would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(Environment.CommandLine, @"(?<fname>[^\\]+)(?<ext>\.exe)",
RegexOptions.IgnoreCase).Groups["fname"].Value;
// get the desired "MyApp" result
A (?:...) is a non-capturing group that matches and still consumes the text. It means the part of text this group matches is still added to the overall match value.
In general, if you want to match something but not consume, you need to use lookarounds. So, if you need to match something that is followed with a specific string, use a positive lookahead, (?=...) construct:
some_pattern(?=specific string) // if specific string comes immediately after pattern
some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
If you need to match but "exclude from match" some specific text before, use a positive lookbehind:
(?<=specific string)some_pattern // if specific string comes immediately before pattern
(?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
Note that .*? or .* - that is, patterns with *, +, ?, {2,} or even {1,3} quantifiers - in lookbehind patterns are not always supported by regex engines, however, C# .NET regex engine luckily supports them. They are also supported by Python PyPi regex module, Vim, JGSoft software and now by ECMAScript 2018 compliant JavaScript environments.
In this case, you may capture what you need to get and just match the context without capturing:
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = string.Empty;
var m = Regex.Match(testEcl, @"([^\\]+)\.exe", RegexOptions.IgnoreCase);
if (m.Success)
{
    asmName = m.Groups[1].Value;
}
Console.WriteLine(asmName);
Details
([^\\]+) - Capturing group 1: one or more chars other than \
\. - a literal dot
exe - a literal exe substring.
Since we are only interested in capturing group #1 contents, we grab m.Groups[1].Value, and not the whole m.Value (that contains .exe).
You're using a non-capturing group. The emphasis is on the word group here; the group does not capture the .exe, but the regex as a whole still matches it.
You probably want a positive lookahead, which just asserts that the string must meet a criterion for the match to be valid, without that criterion becoming part of the match.
In other words, you want (?=, not (?:, at the start of your group.
The distinction between capturing and non-capturing groups only matters if you enumerate the Groups property of the Match object; in your case, you're just using the Value property, so there's no difference between a normal group (\.exe) and a non-capturing group (?:\.exe).
To see the distinction, consider this test program:
static void Main(string[] args)
{
    var positiveInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
    Test(positiveInput, @"[^\\]+(\.exe)");
    Test(positiveInput, @"[^\\]+(?:\.exe)");
    Test(positiveInput, @"[^\\]+(?=\.exe)");
    var negativeInput = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.dll\" /?";
    Test(negativeInput, @"[^\\]+(?=\.exe)");
}
static void Test(String input, String pattern)
{
    Console.WriteLine($"Input: {input}");
    Console.WriteLine($"Regex pattern: {pattern}");
    var match = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
    if (match.Success)
    {
        Console.WriteLine("Matched: " + match.Value);
        for (int i = 0; i < match.Groups.Count; i++)
        {
            Console.WriteLine($"Groups[{i}]: {match.Groups[i]}");
        }
    }
    else
    {
        Console.WriteLine("No match.");
    }
    Console.WriteLine("---");
}
The output of this is:
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
Groups[1]: .exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?:\.exe)
Matched: MyApp.exe
Groups[0]: MyApp.exe
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.exe" /?
Regex pattern: [^\\]+(?=\.exe)
Matched: MyApp
Groups[0]: MyApp
---
Input: "D:\src\repos\myprj\bin\Debug\MyApp.dll" /?
Regex pattern: [^\\]+(?=\.exe)
No match.
---
The first regex (@"[^\\]+(\.exe)") has \.exe as just a normal group.
When we enumerate the Groups property, we see that .exe is indeed a group captured in our input.
(Note that the entire match is itself group 0, hence Groups[0] is equal to Value).
The second regex (@"[^\\]+(?:\.exe)") is the one provided in your question.
The only difference compared to the previous scenario is that the Groups property doesn't contain .exe as one of its entries.
The third regex (@"[^\\]+(?=\.exe)") is the one I'm suggesting you use.
Now, the .exe part of the input isn't included in the match at all, but the regex still won't match unless the file name is followed by .exe, as the fourth scenario illustrates.
A non-capturing group is still matched, just not captured, so if you want only the part before it, read the capture group instead of the whole match.
You can access groups like this:
var asmName = Regex.Match(testEcl, @"([^\\]+)(?:\.exe)", RegexOptions.IgnoreCase);
asmName.Groups[1].Value

Regex working in Regexr but not C#, why?

From the below mentioned input string, I want to extract the values specified in {} for s:ds field. I have attached my regex pattern. Now the pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also used \\ instead of \ for the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString is "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}
If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) for your expression if you only want to match the values within your curly braces:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach (var match in matchCollection)
{
    // Output the match
    Console.WriteLine("Collection Found : {0}", match);
}
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression, a look-behind that only matches values preceded by s:ds="{ :
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
Notice that this approach doesn't match the s:id element.
Another Consideration
Currently you are using \w to match "word" characters within your expression, and while this might work for your uses, it matches all letters, digits, and underscores (so [\d\w] is equivalent to plain \w). It's unlikely that you need all of these, so you may want to restrict your character sets to what you expect, such as [A-Z\d] to match only uppercase letters and digits, or [0-9A-Fa-f] if you are only expecting GUID values (i.e. hex).
Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""
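A hedged sketch of the escaping point: inside a verbatim string (@"..."), a double quote is written as "" and backslashes are passed to the regex engine unchanged, so no backslash is needed in front of the quote at all. The sketch below uses the narrower hex character class suggested earlier and escapes the braces explicitly; the class and method names (GuidDemo, ExtractDsValues) are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GuidDemo
{
    // The regex engine sees: s:ds="(\{hex8-hex4-hex4-hex4-hex12\})"
    private static readonly Regex DsRegex = new Regex(
        @"s:ds=""(\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\})""");

    // Collects the braced GUID of every s:ds attribute; s:id attributes are ignored.
    public static List<string> ExtractDsValues(string input)
    {
        var values = new List<string>();
        foreach (Match m in DsRegex.Matches(input))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" " +
                       "s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" " +
                       "s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
        foreach (string value in ExtractDsValues(input))
        {
            Console.WriteLine(value);
        }
    }
}
```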

c# String - Split Pascal Case

I've been trying to get a C# regex command to turn something like
EYDLessThan5Days
into
EYD Less Than 5 Days
Any ideas?
The code I used :
public static string SplitPascalCase(this string value) {
Regex NameExpression = new Regex("([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z0-9]+)",
RegexOptions.Compiled);
return NameExpression.Replace(value, " $1").Trim();
}
Output:
EYD Less Than5 Days
But this still gives me the wrong result.
I already asked about this for JavaScript, but when I implemented the same logic in C#, it failed.
Please help me. Thanks.
Use lookarounds in your regex so that it won't consume any characters and it allows overlapping of matches.
(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])
Replace the matched boundaries with a space.
Regex.Replace(yourString, @"(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])", " ");
Explanation:
(?<=[A-Za-z])(?=[A-Z][a-z]) matches the boundary between a letter (upper- or lowercase) and an uppercase letter that is immediately followed by a lowercase letter. For example, in the string ABc this regex matches the boundary between A and Bc; in aBc it matches the boundary between a and Bc.
| is the alternation (logical OR) operator.
(?<=[a-z0-9])(?=[0-9]?[A-Z]) matches the boundary between a lowercase letter or digit and an optional digit that is immediately followed by an uppercase letter. For example, in the string a9A this regex matches the boundary between a and 9A, and also the boundary between 9 and A, because [0-9] is optional in the positive lookahead.
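A small sketch applying the boundary pattern explained above to the input from the question. The wrapper names (PascalCaseDemo, SplitPascalCase as a plain static method) are assumptions for the example; the pattern is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PascalCaseDemo
{
    // Zero-width boundaries only: nothing is consumed, so adjacent boundaries can both match.
    private static readonly Regex Boundary = new Regex(
        @"(?<=[A-Za-z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[0-9]?[A-Z])");

    // Inserts a space at every matched boundary.
    public static string SplitPascalCase(string value)
    {
        return Boundary.Replace(value, " ");
    }

    public static void Main()
    {
        Console.WriteLine(SplitPascalCase("EYDLessThan5Days")); // EYD Less Than 5 Days
    }
}
```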
You can just match and join..
var arr = Regex.Matches(str, @"[A-Z]+(?=[A-Z][a-z]+)|\d|[A-Z][a-z]+").Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(String.Join(" ",arr));
The regex isn't complex at all: it just matches each part (an acronym, a single digit, or a capitalized word) and the parts are then joined with a " "
Something like this should do
string pattern = @"(?<=\d)(?=[a-zA-Z])|(?<=[a-zA-Z])(?=\d)|(?=[A-Z][a-z])|(?<=[a-z])(?=[A-Z])";
// Regex.Replace returns the new string; it does not modify input in place.
string result = Regex.Replace(input, pattern, " ");

Extending [^,]+, Regular Expression in C#

Duplicate
Regex for variable declaration and initialization in c#
I was looking for a Regular Expression to parse CSV values, and I came across this Regular Expression
[^,]+
Which does my work by splitting the words on every occurrence of a ",". What I want to know is: say I have the string
value_name v1,v2,v3,v4,...
Now I want a regular expression to find me the words v1, v2, v3, v4, ...
I tried ->
^value_name\s+([^,]+)*
But it didn't work for me. Can you tell me what I am doing wrong? I remember working on regular expressions and their state-machine implementation. Doesn't it work in the same way?
If a string starts with value_name followed by one or more whitespace characters, go to the next state. In that state, read a word until a "," comes. Then do it again! And each word will be grouped!
Am I wrong in understanding it?
You could use a Regex similar to those proposed:
(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?
The first group is non-capturing and would match the start of the line and the value_name.
To ensure that the Regex is still valid over all matches, we make that group optional by using the '?' modifier (meaning match at most once).
The second group is capturing and would match your vXX data.
The third group is non-capturing and would match the , and any whitespace before and after it.
Again, we make it optional by using the '?' modifier, otherwise the last 'vXX' group would not match unless we ended the string with a final ','.
In your attempts, the Regex wouldn't match multiple times: you have to remember that if you want a Regex to match multiple occurrences in a string, the whole Regex needs to match every single occurrence in the string, so you have to build your Regex not only to match the start of the string 'value_name', but also to match every occurrence of 'vXX' in it.
In C#, you could list all matches and groups using code like this:
Regex r = new Regex(@"(?:^value_name\s+)?([^,]+)(?:\s*,\s*)?");
Match m = r.Match(subjectString);
while (m.Success) {
    for (int i = 1; i < m.Groups.Count; i++) {
        Group g = m.Groups[i];
        if (g.Success) {
            // matched text: g.Value
            // match start: g.Index
            // match length: g.Length
        }
    }
    m = m.NextMatch();
}
I would expect it to get only v1 in the group, because the first comma is "blocking" it from grabbing the rest of the fields. How you handle this is going to depend on the methods you use on the regular expression, but it may make sense to make two passes: first grab all the fields separated by commas, and then break things up on spaces. Perhaps ^value_name\s+(?:([^,]+),?)* instead.
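In .NET specifically, a group that sits inside a repeated construct keeps every value it captured in Group.Captures, while Group.Value holds only the last one. A minimal sketch of reading all fields from the suggested pattern that way; the class and method names (CsvCapturesDemo, ReadValues) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvCapturesDemo
{
    // Group 1 is repeated by the outer (?:...)*, once per comma-separated field.
    private static readonly Regex LineRegex =
        new Regex(@"^value_name\s+(?:([^,]+),?)*");

    // Returns every capture of group 1, in order; empty when the line does not match.
    public static List<string> ReadValues(string line)
    {
        var values = new List<string>();
        Match m = LineRegex.Match(line);
        if (!m.Success)
        {
            return values;
        }
        foreach (Capture c in m.Groups[1].Captures)
        {
            values.Add(c.Value);
        }
        return values;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", ReadValues("value_name v1,v2,v3,v4"))); // v1 | v2 | v3 | v4
    }
}
```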
Oh yeah, lists....
/(?:^value_name\s+|,\s*)([^,]+)/g will theoretically grab them, but you will have to use RegExp.exec() in a loop to get the capture, rather than the whole match.
I wish pre-matches worked in JS :(.
Otherwise, go with Logan's idea: /^value_name\s+([^,]+(?:,\s*[^,]+)*)$/ followed by .split(/,\s*/);
