Regex working in Regexr but not C#, why? - c#

From the below mentioned input string, I want to extract the values specified in {} for s:ds field. I have attached my regex pattern. Now the pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also used \\ instead of \ for the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString is "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}

If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) for your expression if you only want to match the values within your curly braces :
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach(var match in matchCollection)
{
// Output the match
Console.WriteLine("Collection Found : {0}", match);
}
You can see a working example of this in action here and example output demonstrated below :
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could consider prepending (?<=(s:ds=""{)) to your expression (the doubled "" is how a single quote character is written inside a C# verbatim string). This is a look-behind that only matches values preceded by s:ds="{ :
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
You can see an example of this approach here and demonstrated below (notice it doesn't match the s:id element) :
Another Consideration
Currently you are using \w to match "word" characters within your expression. While this might work for your uses, it matches all digits (\d), letters (a-zA-Z) and the underscore (_), which also makes the \d in [\d\w] redundant. It's unlikely that you need all of these, so you may want to restrict your character sets to what you actually expect: [A-Z\d] to match only uppercase letters and digits, or [0-9A-Fa-f] if you are only expecting GUID values (i.e. hex digits).

Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""
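To make the over-escaping point concrete, here is a minimal sketch. It relies on the fact that \" inside a regular C# string literal is just a quote character, so the input from the question contains no backslashes, while \\ in the pattern demands one. The class name and the [0-9A-Fa-f] character class are choices made for this illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class OverEscapingDemo
{
    static void Main()
    {
        // In a regular string literal, \" is only a quote character,
        // so this input contains no backslashes at all.
        string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" " +
                       "s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" " +
                       "s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";

        // Over-escaped: \\ requires a literal backslash before each quote.
        string overEscaped = @"s:ds=\\\""(\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\})\\\""";

        // In a verbatim string, "" is one quote, and a quote needs no regex escaping.
        string corrected = @"s:ds=""(\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\})""";

        // The over-escaped pattern finds nothing because the input has no backslashes.
        Console.WriteLine(Regex.Matches(input, overEscaped).Count); // 0

        // The corrected pattern finds both s:ds values and skips s:id.
        foreach (Match m in Regex.Matches(input, corrected))
        {
            Console.WriteLine(m.Groups[1].Value); // the GUID including its braces
        }
    }
}
```

Note also that the check matchCollection.Count > 1 in the question is only true when there are at least two matches; Count > 0 is the usual test for "anything found".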

Related

Regular Expression in C# : Matching after a special character + specific text + till special character

I am looking for a pattern that matches everything after the special character } then match a text and then till first special character.
An example will make this clear. My string is:
string a = @"}var e=l.visible}.abc.jYOxx{margin-top:0px}.acOxx{margin-top:0px}";
I want to extract everything after last } and text matching is ".jYOxx" and then everything before first }.
result of my desired regex will be :
.abc.jYOxx{margin-top:0px}
I wrote this:
MatchCollection matchesSource = Regex.Matches(a, @"\}(.*?).jYOxx(.*?)}", RegexOptions.Multiline);
my result is :
}var e=l.visible}.abc.jYOxx{margin-top:0px}
Logically I have a string of css classes and I have class name to match I want to extract that css class only. Please help I have wasted whole day in it.
You can match any character except the curly brackets before matching .jYOxx
If you also do not want to match whitespace characters, you can add \s to the negated character class: [^{}\s]
[^{}]*\.jYOxx{[^{}]*}
Regex demo
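As a quick check, the pattern above can be run against the string from the question. This is a minimal sketch; the class name is made up for the example, and the braces are escaped as \{ and \} purely to be explicit (an assumption of this sketch, not part of the answer's pattern).

```csharp
using System;
using System.Text.RegularExpressions;

class CssRuleDemo
{
    static void Main()
    {
        string a = @"}var e=l.visible}.abc.jYOxx{margin-top:0px}.acOxx{margin-top:0px}";

        // [^{}]* cannot cross a brace, so the match cannot begin
        // before the '}' that precedes the selector.
        Match m = Regex.Match(a, @"[^{}]*\.jYOxx\{[^{}]*\}");

        if (m.Success)
        {
            Console.WriteLine(m.Value); // .abc.jYOxx{margin-top:0px}
        }
    }
}
```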

Regex to extract text from a pattern in C#

I have a string pattern, which contains a ID and Text to make markup easier for our staff.
The pattern to create a "fancy" button in our CMS is:
(button_section_ID:TEXT)
Example:
(button_section_25:This is a fancy button)
How do I extract the "This is a fancy button" part of that pattern? The pattern will always be the same. I tried to do some substring stuff but that got complicated very fast.
Any help would be much appreciated!
If the text is always in the format you specified, you just need to trim the parentheses and then split on the colon (:):
var res = input.Trim('(', ')').Split(':')[1];
If the string is a substring, use a regex:
var match = Regex.Match(input, @"\(button_section_\d+:([^()]+)\)");
var res = match.Success ? match.Groups[1].Value : "";
See this regex demo.
Explanation:
\(button_section_ - a literal (button_section_
\d+ - 1 or more digits
: - a colon
([^()]+) - Group 1 capturing 1+ characters other than ( and ) (you may replace with ([^)]*) to make matching safer and allow an empty string and ( inside this value)
\) - a literal )
The following .NET regex will give you a match containing a group with the text you want:
var match = Regex.Match(input, @"\(button_section_..:(.*)\)");
The parentheses define a capture group, which gives you everything between the colon after the button section ID and the final closing parenthesis. Note that .. matches exactly two characters, so this only works for two-character IDs such as 25.

Regex match method is not works correctly

I have created a Regex as below, but the Match method does not work correctly:
Regex regex = new Regex("(" + SearchText + ")", RegexOptions.IgnoreCase);
if(regex.Match(item).Success) { ... }
For example, if I set SearchText to e. and set item to es, then Success is true.
Similarly, if I set SearchText to $ or ., then a match against 4 returns Success as true.
How come this is happening, and how can I solve this problem?
When you use a regex there are a bunch of common characters which have special meanings. For example, the period (.) character will match any character at all so if you wanted to match the words dog and dig, you could use the regex d.g.
There are MANY different special characters you can use, you should see the full .NET Regex documentation for more details.
This makes matching specific things slightly more complicated when you want to match something specific, like the end of a sentence. To match dog. you actually have to pass in dog\. as the regex to match against. You can use the Regex.Escape(string str) method to escape most simple strings, before passing them into your Regex constructor.
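A small sketch of the difference Regex.Escape makes, using the values from the question (the class name is invented for the example):

```csharp
using System;
using System.Text.RegularExpressions;

class EscapeDemo
{
    static void Main()
    {
        string searchText = "e.";
        string item = "es";

        // Unescaped: '.' matches any character, so "e." matches "es".
        Console.WriteLine(Regex.IsMatch(item, "(" + searchText + ")", RegexOptions.IgnoreCase)); // True

        // Escaped: the pattern becomes e\. and only a literal "e." matches.
        string escaped = Regex.Escape(searchText);
        Console.WriteLine(Regex.IsMatch(item, escaped, RegexOptions.IgnoreCase));             // False
        Console.WriteLine(Regex.IsMatch("bla bla E. bla", escaped, RegexOptions.IgnoreCase)); // True
    }
}
```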
The other question is if you are only looking for literals, why do you use Regex at all.
string item = "bla bla e. bla";
bool result = item.Contains("e."); //returns true
Edit
Case insensitive:
result = item.IndexOf("e.", 0, StringComparison.OrdinalIgnoreCase) != -1;

Regex to match a word beginning with a period and ending with an underscore?

I'm quite the Regex novice, but I have a series of strings similar to this "[$myVar.myVar_STATE]" I need to replace the 2nd myVar that begins with a period and ends with an underscore. I need it to match it exactly, as sometimes I'll have "[$myVar.myVar_moreMyVar_STATE]" and in that case I wouldn't want to replace anything.
I've tried things like "\b.myVar_\b", "\.\bmyVar_\b" and several more, all to no luck.
How about this:
\[\$myVar\.([^_]+)_STATE\]
Matches:
[$myVar.myVar_STATE] // matches and captures 'myVar'
[$myVar.myVar_moreMyVar_STATE] // no match
Working regex example:
http://regex101.com/r/yM9jQ3
Or if _STATE was variable, you could use this: (as long as the text in the STATE part does not have underscores in it.)
\[\$myVar\.([^_]+)_[^_]+\]
Working regex example:
http://regex101.com/r/kW8oE1
Edit: Conforming to OP's comments below, This should be what he's going for:
(\[\$myVar\.)([^_]+)(_[^_]+\])
Regex replace example:
http://regex101.com/r/pU6yL8
C#
var pattern = @"(\[\$myVar\.)([^_]+)(_[^_]+\])";
var replaced = Regex.Replace(input, pattern, "$1" + newVar + "$3");
What about something like:
.*\.(myVar_).*
This looks for anything then a . and "myVar_" followed by anything.
It matches:
"[$myVar.myVar_STATE]"
And only the first myVar_ here:
"[$myVar.myVar_moremyVar_STATE]"
See it in action.
This should do it:
\[\$myVar\.(.*?)_STATE\]
You can use this little trick to pick out the groups, and build the replacement at the end, like so:
var replacement = "something";
var input = @"[$myVar.myVar_STATE]";
var pattern = @"(\[\$myVar\.)(.*?)_(.*?)]";
var replaced = Regex.Replace(input, pattern, "$1" + replacement + "_$2]");
C# already has builtin method to do this
string text = ".asda_";
Response.Write((text.StartsWith(".") && text.EndsWith("_")));
Is Regex really required?
string input = "[$myVar.myVar_STATE]";
string oldVar = "myVar";
string newVar = "myNewVar";
string result = input.Replace("." + oldVar + "_STATE]", "." + newVar + "_STATE]");
In case "STATE" is a variable part, then we'll need to use Regex. The easiest way is to use this Regex pattern which matches a position between a prefix and a suffix. Prefix and suffix are used for searching but are not included in the resulting match:
(?<=prefix)find(?=suffix)
result =
Regex.Replace(input, #"(?<=\.)" + Regex.Escape(oldVar) + "(?=_[A-Z]+])", newVar);
Explanation:
The prefix part is \., which stands for ".".
The find part is the escaped old variable to be replaced. Regex escaping makes sure that characters with a special meaning in Regex are escaped.
The suffix part is _[A-Z]+], an underscore followed by at least one letter followed by "]". Note: the second ] need not be escaped. An opening bracket [ would have to be escaped like this: \[. We cannot use \w for word characters in the STATE part, as \w includes underscores. You might have to adapt the [A-Z] part to match all possible states exactly (e.g. if a state has digits, use [A-Z0-9]).
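Putting the prefix, find and suffix parts together, a minimal runnable sketch looks like this. It uses the two sample strings from the question; the class name and the loop are only scaffolding for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaroundReplaceDemo
{
    static void Main()
    {
        string oldVar = "myVar";
        string newVar = "myNewVar";

        // (?<=\.) requires a preceding '.', (?=_[A-Z]+]) requires "_STATE]"-like text after it.
        // Neither lookaround is part of the match, so only oldVar itself is replaced.
        string pattern = @"(?<=\.)" + Regex.Escape(oldVar) + "(?=_[A-Z]+])";

        string[] inputs =
        {
            "[$myVar.myVar_STATE]",
            "[$myVar.myVar_moreMyVar_STATE]"
        };

        foreach (string input in inputs)
        {
            Console.WriteLine(Regex.Replace(input, pattern, newVar));
        }
        // [$myVar.myNewVar_STATE]
        // [$myVar.myVar_moreMyVar_STATE]   (unchanged: "_more" is not "_" plus uppercase letters)
    }
}
```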

C# regex match, match.Success returns false even after following the rules

Friends,
I want to match a string like
"int lnum[];" so I am trying to match it with a pattern like this
[A-Za-z_0-9] [A-Za-z_0-9]\[\]
but it does not seem to work.
I looked up rules at http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
string pJavaLine = "int lnum[]";
match = Regex.Match(pJavaLine, @"[A-Za-z_0-9] [A-Za-z_0-9]\[\] ", RegexOptions.IgnoreCase);
if (match.Success) {
// Finally, we get the Group value and display it.
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
the match.Success returns false.
Would anybody please let me know a possible way to get this.
Each of your character classes, like [A-Za-z_0-9], matches only a single character. If you want to match more than one character, you need to add something to the end. For example, [A-Za-z_0-9]+ -- the + means 1 or more of these. You could also use * for 0 or more, or specify a range, like {2,5} for 2-5 characters.
That said, you can use this pattern to match that string:
[A-Za-z_0-9]+ [A-Za-z_0-9]+\[\]
The \w is loosely equivalent to [A-Za-z_0-9] (see link in jessehouwing's comment below), so you can probably simply use:
\w+ \w+\[\]
Check here for more info on the standard Character Classes.
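For illustration, here is the \w+ pattern applied to the string from the question. The capture groups are an addition made for this sketch (the question reads match.Groups[1], which stays empty unless the pattern contains a parenthesized group), and the class name is invented.

```csharp
using System;
using System.Text.RegularExpressions;

class ArrayDeclarationDemo
{
    static void Main()
    {
        string pJavaLine = "int lnum[]";

        // \w+ matches one or more word characters; the parentheses
        // capture the type and the variable name separately.
        Match match = Regex.Match(pJavaLine, @"(\w+) (\w+)\[\]");

        if (match.Success)
        {
            Console.WriteLine(match.Groups[1].Value); // int
            Console.WriteLine(match.Groups[2].Value); // lnum
        }
    }
}
```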
