I am trying to write a program that searches text files for certain tags and checks whether there is text between those tags. Example tags:
--<UsrDef_Mod_Trigger_repl_BeginMod>
--<UsrDef_Mod_Trigger_repl_EndMod>
So I want to search for --<UsrDef_Mod_ followed by _Begin or _End.
I wrote these regexes, but every one of them returns false:
if (Regex.Match(line, @"/--<UsrDef_Mod_.*_BeginMod>/g", RegexOptions.None).Success)
else if (Regex.Match(line, @"/--<UsrDef_Mod_.*_EndMod>/g", RegexOptions.None).Success)
Can anyone help me find where I'm going wrong? I have used regexr.com to check my regex, and it matches there but not in C#.
The .NET Regex class doesn't understand the JavaScript-style "/.../g" wrapper.
Just remove it:
// Regex.Match(line, @"/--<UsrDef_Mod_.*_BeginMod>/g",
Regex.Match(line, @"--<UsrDef_Mod_.*_BeginMod>",
if (Regex.Match(line, @"--<UsrDef_Mod_.*_BeginMod>", RegexOptions.None).Success)
if (Regex.Match(line, @"--<UsrDef_Mod_.*_EndMod>", RegexOptions.None).Success)
Both of these match; you just remove the leading / and the trailing /g.
For further reference, see Henk Holtermann's answer on SO comparing Perl and C# regex options.
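Because /.../g is JavaScript literal syntax rather than part of the pattern, one way to reuse a pattern copied from regexr.com is to strip the delimiters and translate the flags. The sketch below is a minimal illustration, assuming only the i, m and s flags need mapping (to IgnoreCase, Multiline and Singleline); the JsRegexLiteral class and its Convert method are hypothetical names. There is no RegexOptions value for g: in .NET you get "global" behaviour by calling Matches instead of Match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsRegexLiteral
{
    // Hypothetical helper: converts a JavaScript-style literal such as "/abc/gi"
    // into a .NET Regex. Only the delimiters and flags are handled; other
    // syntax differences between the two regex flavors are NOT translated.
    public static Regex Convert(string literal)
    {
        // The closing delimiter is the last '/', so an escaped \/ inside the
        // pattern stays in the pattern (.NET also reads \/ as a literal '/').
        int lastSlash = literal.LastIndexOf('/');
        if (literal.Length < 2 || literal[0] != '/' || lastSlash <= 0)
            throw new ArgumentException("Expected /pattern/flags", nameof(literal));

        string pattern = literal.Substring(1, lastSlash - 1);
        string flags = literal.Substring(lastSlash + 1);

        RegexOptions options = RegexOptions.None;
        foreach (char flag in flags)
        {
            switch (flag)
            {
                case 'i': options |= RegexOptions.IgnoreCase; break;
                case 'm': options |= RegexOptions.Multiline; break;
                case 's': options |= RegexOptions.Singleline; break;
                case 'g': break; // no option exists; call Regex.Matches instead
                default:
                    throw new ArgumentException("Unsupported flag: " + flag, nameof(literal));
            }
        }
        return new Regex(pattern, options);
    }
}
```

For example, JsRegexLiteral.Convert("/--<UsrDef_Mod_.*_BeginMod>/g") yields the same Regex as the corrected pattern above.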
var matches = Regex.Matches(text, @"<UsrDef_Mod_([a-zA-Z_]+)_BeginMod>([\s\S]+?)<UsrDef_Mod_\1_EndMod>");
if (matches.Count > 0)
foreach (Match m in matches)
Console.WriteLine(m.Groups[2].Value);
Group 2 contains the text between the two tags.
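Since the question is whether there is text between the tags, the backreference idea above can be wrapped so that empty pairs are reported too. This is a sketch under a few assumptions: tag names contain only letters and underscores, the tags keep the leading "--", and the ModTagScanner class and its methods are hypothetical names. It uses [\s\S]*? (zero or more) instead of +? so an empty Begin/End pair still matches and can be flagged as empty.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ModTagScanner
{
    // Group 1 captures the tag name; \1 forces the End tag to use the same name.
    // [\s\S]*? is a lazy "any character, including newlines" run, so each
    // Begin tag pairs with the nearest End tag of the same name.
    private static readonly Regex TagPair = new Regex(
        @"--<UsrDef_Mod_([A-Za-z_]+)_BeginMod>([\s\S]*?)--<UsrDef_Mod_\1_EndMod>");

    // Returns (tag name, text between the tags) for every pair found.
    public static List<(string Name, string Body)> FindPairs(string text)
    {
        var result = new List<(string Name, string Body)>();
        foreach (Match m in TagPair.Matches(text))
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        return result;
    }

    // True when at least one pair encloses non-whitespace text.
    public static bool HasContentBetweenTags(string text)
    {
        foreach (var pair in FindPairs(text))
            if (!string.IsNullOrWhiteSpace(pair.Body))
                return true;
        return false;
    }
}
```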
From the input string below, I want to extract the values inside {} for the s:ds fields. The regex pattern I used for testing on http://www.regexr.com/ is:
s:ds=\\\"({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\\\"
and it works absolutely fine.
But the same pattern does not work in C# code. I have also used \\ instead of \ in the C# code and replaced \" with \"". Let me know if I'm doing something wrong. Below is the code snippet.
string inputString = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\" s:ds=\"{37BA4BA0-581C-40DC-A542-FFD9E99BC345}\" s:id=\"{C091E71D-4817-49BC-B120-56CE88BC52C2}\"";
string regex = @"s:ds=\\\""({[\d\w]{8}\-(?:[\d\w]{4}\-){3}[\d\w]{12}})\\\""";
MatchCollection matchCollection = Regex.Matches(layoutField, regex);
if (matchCollection.Count > 1)
{
Log.Info("Collection Found.", this);
}
If you only want to match the values...
You should be able to just use ([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}) as your expression if you only want to match the values within your curly braces:
string input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311} ...";
// Use the following expression to just match your GUID values
string regex = @"([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
// Store your matches
MatchCollection matchCollection = Regex.Matches(input, regex);
// Iterate through each one
foreach(var match in matchCollection)
{
// Output the match
Console.WriteLine("Collection Found : {0}", match);
}
If you want to only match those following s:ds...
If you only want to capture the values for s:ds sections, you could prepend (?<=(s:ds=""{)) to your expression. This is a lookbehind, so it only matches values preceded by s:ds="{ (inside a verbatim C# string, "" stands for a single " character):
string regex = @"(?<=(s:ds=""{))([\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12})";
Note that this version does not match the s:id element.
Another Consideration
Currently you are using \w to match "word" characters within your expression. While this might work for your uses, it matches all digits (\d), letters (a-zA-Z, plus other Unicode letters in .NET) and underscores (_), which also makes [\d\w] redundant. You are unlikely to need all of these, so consider narrowing your character sets to what you actually expect, such as [A-Z\d] to match only uppercase letters and digits, or [0-9A-Fa-f] if you only expect GUID values (i.e. hex).
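Putting the two suggestions together (the s:ds lookbehind and a hex-only character class) gives something like the sketch below. It assumes every GUID is wrapped as s:ds="{...}" exactly as in the sample input; DsGuidExtractor is a hypothetical name, and the trailing (?=\}) lookahead is an addition that requires the closing brace without including it in the match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DsGuidExtractor
{
    // (?<=s:ds="\{) : only values preceded by s:ds="{ (so s:id values are skipped).
    // [0-9A-Fa-f]   : hex digits only, in the 8-4-4-4-12 GUID layout.
    // (?=\})        : the closing brace must follow, but is not part of the match.
    // In a verbatim string, "" is a single quote character.
    private static readonly Regex DsGuid = new Regex(
        @"(?<=s:ds=""\{)[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}(?=\})");

    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in DsGuid.Matches(input))
            values.Add(m.Value);
        return values;
    }
}
```

Because the lookarounds are zero-width, m.Value is the bare GUID and no capture group is needed.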
Looks like you might be over-escaping.
Give this a shot:
@"s:ds=\""({[\d\w]{8}\-([\d\w]{4}\-){3}[\d\w]{12}})\"""
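To see why the question's pattern fails, it helps to look at the characters each verbatim string actually hands to the regex engine. In this sketch the GUID part is deliberately simplified to \{[0-9A-F-]+\} (an assumption, for brevity), and EscapingDemo is a hypothetical name. The C# literal "s:ds=\"{...}\"" contains no backslashes at runtime, so a pattern that demands a literal backslash can never match it.

```csharp
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // What the compiler produces from the question's input literal:
    // s:ds="{GUID}" with no backslash characters.
    public const string Input = "s:ds=\"{46C01EB7-6D43-4E2A-9267-608DE8AFA311}\"";

    // Over-escaped: @"\\\""" gives the regex text \\\" , i.e. a literal
    // backslash followed by a quote. The input has no backslash, so no match.
    public const string OverEscaped = @"s:ds=\\\""(\{[0-9A-F-]+\})";

    // Fixed: @"\""" gives the regex text \" , an escaped quote that
    // simply matches the quote in the input.
    public const string Fixed = @"s:ds=\""(\{[0-9A-F-]+\})";

    public static bool Matches(string pattern) => Regex.IsMatch(Input, pattern);
}
```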
I have this block of code in C# code-behind:
string input = "CN=L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany,DC=com";
string pattern = @"CN\=(.+)\,";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
When I run this, match.Groups[1].Value is equal to
L_WDJACK127_WDC_SSIS_USER_CH,OU=ALOSup,OU=Infra,DC=internal, DC=mycompany
I need it to be equal to
L_WDJACK127_WDC_SSIS_USER_CH
Can someone please fix my Regex?
Basic Greedy/Lazy quantifier problem:
string pattern = @"CN\=(.+?)\,";
This resource should help as to why: http://www.regular-expressions.info/repeat.html
Basically, .+ matches as many characters as possible (at least one), so it runs to the end of the string and then backtracks only as far as the last comma. Adding a ? after it (.+?) makes the quantifier lazy: it matches as few characters as possible (still at least one) and stops at the first comma.
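The difference is easy to check side by side on the input from the question. CnExtractor is a hypothetical name, and the harmless \= and \, escapes from the original patterns are dropped. The third method is an alternative that is not from the answer: a negated character class [^,]+ that can never cross a comma, so greedy vs. lazy no longer matters.

```csharp
using System.Text.RegularExpressions;

public static class CnExtractor
{
    // Greedy: .+ runs to the end, then backtracks to the LAST comma.
    public static string Greedy(string dn) =>
        Regex.Match(dn, @"CN=(.+),").Groups[1].Value;

    // Lazy: .+? stops at the FIRST comma that lets the pattern succeed.
    public static string Lazy(string dn) =>
        Regex.Match(dn, @"CN=(.+?),").Groups[1].Value;

    // Alternative: a negated class that cannot consume a comma at all.
    public static string NegatedClass(string dn) =>
        Regex.Match(dn, @"CN=([^,]+)").Groups[1].Value;
}
```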
I'm trying to find filenames ending in 'K.TIF' that are written into a log file.
I'm trying to find:
20130629VGM180ZZ001001K.TIF
20130629VGM180ZZ001002K.TIF
etc.
As I'm terrible at regexes, I tried these:
Regex.Match(line, @"([A-Z0-9]+){23}\.TIF", RegexOptions.IgnoreCase);
Regex.Match(line, @"(?<=\\)(.>)(?=K\.TIF){23}", RegexOptions.IgnoreCase);
The first one is terrible, doesn't perform and gives bad results.
The second one actually finds all the TIFs that end in Z.TIF if I change K\ to Z. However, it does not find any K.TIFs with the current regex.
This seems to work for me:
^.*\\(\w*K\.TIF)$
It searches for the last slash and then captures the word characters followed by K.TIF. Example: http://www.regex101.com/r/nH6gV4
This should work:
@"\w+K\.TIF$"
The first regular expression is very close to the answer, but it has an extra '+'. Try the following code:
Regex.Match(line, @"([A-Z0-9]){22}K\.TIF", RegexOptions.IgnoreCase);
This regex will get what you want:
[A-Z0-9]{22}K\.TIF$
You shouldn't use IgnoreCase, as you specifically made the regex match only capitals.
The extracted value is the whole match, so use:
string MatchedFileName = Regex.Match(line, @"[A-Z0-9]{22}K\.TIF$").Value;
(Updated, thanks Tyler for pointing out I hadn't read the OP's question properly)
(Updated again as it didn't need the backslash at the start or the capture group)
Use this regex: var res = Regex.Match(line, @"(?im)^.+k\.tif$");
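Applied to a whole log, the [A-Z0-9]{22}K\.TIF pattern can be run over every line with Matches, so a line holding several names still yields all of them. This sketch assumes log lines look like the hypothetical sample in the usage below (a path ending in the file name); TifLogScanner is a hypothetical name, and the \b anchors are an addition that reject names embedded in a longer run of word characters.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TifLogScanner
{
    // 22 uppercase letters/digits, then a literal "K.TIF" (case-sensitive).
    // \b on both sides rejects matches inside a longer word-character run.
    private static readonly Regex KTif = new Regex(@"\b[A-Z0-9]{22}K\.TIF\b");

    public static List<string> FindKTifs(IEnumerable<string> logLines)
    {
        var names = new List<string>();
        foreach (string line in logLines)
            foreach (Match m in KTif.Matches(line))
                names.Add(m.Value);
        return names;
    }
}
```

For example, a hypothetical line such as C:\scans\20130629VGM180ZZ001001K.TIF yields 20130629VGM180ZZ001001K.TIF, while a name ending in Z.TIF is skipped.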
I was wondering whether it is possible to build an equivalent C# regular expression for finding this pattern in a filename. For example, this is the expression in Perl: /^filer_(\d{10}).txt(.gz)?$/i. Can we extract the \d{10} part so I can use it in processing?
To create a Regex object that will ignore character casing and match your filter try the following:
Regex fileFilter = new Regex(@"^filer_(\d{10})\.txt(\.gz)?$", RegexOptions.IgnoreCase);
To perform the match:
Match match = fileFilter.Match(filename);
And to get the value (number here):
if (match.Success)
{
string id = match.Groups[1].Value;
}
The matched groups work similar to Perl's matches, [0] references the whole match, [1] the first sub pattern/match, etc.
Note: In your initial perl code you didn't escape the . characters so they'd match any character, not just real periods!
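The effect of the unescaped dots is easy to demonstrate: with a bare ".", a name like filer_0123456789Xtxt is accepted. DotEscapingDemo is a hypothetical name; both patterns use RegexOptions.IgnoreCase in place of Perl's /i.

```csharp
using System.Text.RegularExpressions;

public static class DotEscapingDemo
{
    // Unescaped dots, as in the Perl pattern: "." matches any character.
    public static readonly Regex Loose =
        new Regex(@"^filer_(\d{10}).txt(.gz)?$", RegexOptions.IgnoreCase);

    // Escaped dots: "\." matches only a literal period.
    public static readonly Regex Strict =
        new Regex(@"^filer_(\d{10})\.txt(\.gz)?$", RegexOptions.IgnoreCase);
}
```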
Yes, you can. See the Groups property of the Match class that is returned by a call to Regex.Match.
In your case, it would be something along the lines of the following:
Regex yourRegex = new Regex(@"^filer_(\d{10})\.txt(\.gz)?$");
Match match = yourRegex.Match(input);
if(match.Success)
result = match.Groups[1].Value;
I don't know what the /i at the end of your regex means, so I removed it from my sample code.
As Daniel shows, you can access the content of the matched input via groups. Instead of the default indexed groups, you can also use named groups. Below I show how, and also that you can use the static version of Match.
Match m = Regex.Match(input, @"^(?i)filer_(?<fileID>\d{10})\.txt(?:\.gz)?$");
if (m.Success)
{
string s = m.Groups["fileID"].Value;
}
The /i in perl means IgnoreCase as also shown by Mario. This can also be set inline in the regex statement using (?i) as shown above.
The last part, (?:\.gz)?, is a non-capturing group: it takes part in the match, but no group is created for it.
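A reusable version of the named-group approach might look like this sketch. FileIdParser, TryGetFileId and GroupNames are hypothetical names. GroupNames wraps Regex.GetGroupNames to show that the non-capturing (?:\.gz) adds no group: only "0" (the whole match) and "fileID" exist.

```csharp
using System.Text.RegularExpressions;

public static class FileIdParser
{
    // (?i) enables case-insensitivity inline (like Perl's /i);
    // (?<fileID>...) is a named group; (?:\.gz)? is optional and non-capturing.
    private static readonly Regex Pattern =
        new Regex(@"^(?i)filer_(?<fileID>\d{10})\.txt(?:\.gz)?$");

    // Returns the 10-digit ID, or null when the file name does not match.
    public static string TryGetFileId(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return m.Success ? m.Groups["fileID"].Value : null;
    }

    // Names of all groups defined by the pattern.
    public static string[] GroupNames() => Pattern.GetGroupNames();
}
```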
I'm not sure if that's what you want, this is how you can do that.
public class MyExample
{
public static void Main(String[] args)
{
string input = "The Venture Bros</p></li>";
// Call Regex.Match
Match m = Regex.Match(input, "/show_name=(.*?)&show_name_exact=true\">(.*?)</i");
// Check Match instance
if (m.Success)
{
// Get Group value
string key = m.Groups[1].Value;
Console.WriteLine(key);
// alternate-1
}
}
}
I want "The Venture Bros" as output (in this example).
Try this:
string input = "The Venture Bros</p></li>";
// Call Regex.Match
Match m = Regex.Match(input, "show_name=(.*?)&show_name_exact=true\">(.*?)</a");
// Check Match instance
if (m.Success)
{
// Get Group value
string key = m.Groups[2].Value;
Console.WriteLine(key);
// alternate-1
}
I think it's because you're using Perl-style slashes at the start and end. A couple of other answerers have already been confused by this. The way it's written, the pattern tries to be case-insensitive by starting and ending with / and putting an i on the end, the way you'd do it in Perl.
But I'm pretty sure .NET regexes don't work that way, and that's what's causing the problem.
Edit: to be more specific, look into RegexOptions. Here is an example I pulled from MSDN (in VB.NET):
Dim rx As New Regex("\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.Compiled Or RegexOptions.IgnoreCase)
The key there is the "RegexOptions.IgnoreCase", that'll cause the effect that you were trying for with /pattern/i.
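The same MSDN pattern written in C# looks like the sketch below (RepeatedWords is a hypothetical name). Note that the pattern string has no /.../ delimiters, and RegexOptions.IgnoreCase replaces the trailing /i. It also makes the \k<word> backreference case-insensitive, so "The the" counts as a repeat.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedWords
{
    // Finds a word immediately repeated after whitespace, ignoring case.
    private static readonly Regex Repeat = new Regex(
        @"\b(?<word>\w+)\s+(\k<word>)\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static List<string> Find(string text)
    {
        var hits = new List<string>();
        foreach (Match m in Repeat.Matches(text))
            hits.Add(m.Value);
        return hits;
    }
}
```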
The correct regex in your case would be
^.*&show_name_exact=true\"\>(.*)</a></p></li>$
Regexes are tricky, but you can find a great tutorial at http://www.regular-expressions.info/
/?show_name=(.*)&show_name_exact=true\">(.*)
would work as you expect, I believe. Another thing I notice is that you're reading the value of group[1], but I believe you want group[2]: there will be 3 groups, where Groups[0] is the whole match and Groups[1] is the first parenthesized group...
Gl ;)
Because of the question mark before show_name: it is in the input but not in the pattern, so there is no match.
Also, you try to match </i, but the input doesn't contain it (it contains </li>).
First, the regex starts with "/show_name", but the target string has "/?show_name", so the literal "/show_name" never occurs in the input.
This causes the whole regex to fail.
Ok, let's break this down.
Test Data: "The Venture Bros</p></li>"
Original Regex: "/show_name=(.*?)&show_name_exact=true\">(.*?)</i"
Working Regex: "/\?show_name=(.*)&show_name_exact=true\">(.*)</a"
We'll start at the left and work our way to the right, through the regex.
"?" became "\?": a bare "?" means the preceding character or group is optional. Putting a backslash before it makes it match a literal question mark.
"(.*?)" became "(.*)": the parentheses denote a group. The "?" after "*" is not redundant: it makes the quantifier lazy (match as few characters as possible), whereas "*" alone is greedy. As long as each delimiter that follows a group appears only once in the input, both forms capture the same text.
"</i" became "</a": this change was made to match your actual text, which terminates the anchor with a "</a>" tag.
Suggested Regex: "[\\W]show_name=([^><\"]*)&show_name_exact=true\">([^<]*)<"
(The extra backslashes were added for proper C# string escaping.)
A good tool for testing regular expressions in c#, is the regex-freetool at code.google.com
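The suggested regex can be checked in C# as a verbatim string, where "" replaces \" and \\ becomes \. The HTML of the original input was lost, so the sample anchor tag in the usage below is a hypothetical reconstruction, not the asker's actual data. ShowNameParser is also a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class ShowNameParser
{
    // The suggested pattern as a verbatim string:
    // [\W]       a non-word character (e.g. '?') before show_name
    // ([^><"]*)  group 1: the query-string value
    // ([^<]*)    group 2: the link text, up to the next '<'
    private static readonly Regex Link = new Regex(
        @"[\W]show_name=([^><""]*)&show_name_exact=true"">([^<]*)<");

    // Returns both captures, or null when the pattern does not match.
    public static (string QueryValue, string LinkText)? Parse(string html)
    {
        Match m = Link.Match(html);
        if (!m.Success) return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

With a hypothetical input like <a href="/shows?show_name=the-venture-bros&show_name_exact=true">The Venture Bros</a></p></li>, LinkText is "The Venture Bros".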