Regex in C# to find files in a log

I'm trying to find file names, written into a log file, that end in 'K.TIF'.
For example, I'm trying to find:
20130629VGM180ZZ001001K.TIF
20130629VGM180ZZ001002K.TIF
etc.
As I'm terrible at regexes, I tried this:
Regex.Match(line, @"([A-Z0-9]+){23}\.TIF", RegexOptions.IgnoreCase);
Regex.Match(line, @"(?<=\\)(.>)(?=K\.TIF){23}", RegexOptions.IgnoreCase);
The first one performs badly and gives wrong results.
The second one does return all the TIFs that end in Z.TIF if I change K\ to Z. However, with the current regex it does not find any K.TIF files.

This seems to work for me:
^.*\\(\w*K\.TIF)$
It searches for the last slash and then captures the word characters followed by K.TIF. Example: http://www.regex101.com/r/nH6gV4
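A minimal C# sketch of this capture-group approach, assuming each relevant log line ends with a full Windows path. The class name TifFromPath and the sample path in the usage are made up for illustration; the file name is read from group 1 rather than from the whole match.

```csharp
using System.Text.RegularExpressions;

public static class TifFromPath
{
    // ^.*\\ runs greedily to the last backslash that still lets the rest match;
    // group 1 then captures the word characters ending in K.TIF.
    // The dot before TIF is escaped so it only matches a literal period.
    private static readonly Regex Pattern = new Regex(@"^.*\\(\w*K\.TIF)$");

    // Returns true and the bare file name when the line ends in a ...K.TIF path.
    public static bool TryExtract(string line, out string fileName)
    {
        Match m = Pattern.Match(line);
        fileName = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

For example, TryExtract(@"C:\scans\20130629VGM180ZZ001001K.TIF", out var name) sets name to "20130629VGM180ZZ001001K.TIF", while a path ending in Z.TIF returns false.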

This should work:
@"\w+K\.TIF$"

The first regular expression is very close to the answer, but it has an extra '+'. I think you can try the following code.
Regex.Match(line, @"([A-Z0-9]){22}K\.TIF", RegexOptions.IgnoreCase);

This regex will get what you want:
\\([A-Z0-9]{22}K\.TIF)$
You shouldn't use IgnoreCase, since the regex is deliberately written to match only capitals.
The extracted file name is the whole match, so use:
string MatchedFileName = Regex.Match(line, @"[A-Z0-9]{22}K\.TIF$").Value;
(Updated, thanks Tyler for pointing out I hadn't read the OP's question properly)
(Updated again as it didn't need the backslash at the start or the capture group)
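A sketch of applying that pattern to a whole log, assuming the log has already been split into lines (for example with File.ReadLines). KTifScanner and FindFileNames are hypothetical names, and the sample lines in the usage are made up.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KTifScanner
{
    // 22 upper-case letters or digits followed by K.TIF at the end of the line.
    // No IgnoreCase: the character class is deliberately upper-case only.
    private static readonly Regex Pattern = new Regex(@"[A-Z0-9]{22}K\.TIF$");

    // Yields the file name from every log line that contains one.
    public static IEnumerable<string> FindFileNames(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            Match m = Pattern.Match(line);
            if (m.Success)
            {
                yield return m.Value;
            }
        }
    }
}
```

Given lines ending in ...001001K.TIF, ...001002K.TIF and ...001003Z.TIF, only the first two file names are returned. Constructing the Regex once and reusing it avoids re-parsing the pattern for every line.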

Use this regex: var res = Regex.Match(line, @"(?im)^.+k\.tif$");

Related

RegEx for matching special chars no spaces or newlines

I have a string and want to use regex to match all the chars, but no spaces.
I tried to replace all the spaces with nothing, using:
//rating
var betyg = Regex.Replace(seller, @"[A-z](.+)", m => m.Groups[1].Value);
I expect the output of
"Iris-presenter | 5"
but, the output is
"Iris-presenter"
as seen in this demo.
The string is:
<spaces>Iris-presenter
<spaces>|
<spaces>5
Great question! I'm not quite sure if this is what you're looking for, but this expression matches your input string:
^((?!\s|\n).)*
Edit
Based on revo's advice, the expression can be much simplified, because
^((?!\s|\n).)* is equal to ^((?!\s).)* and both are equal to ^\S*.
I used (\s(.*?)) to make it work. This removes all spaces and newlines, as seen here.
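If the goal is the expected output "Iris-presenter | 5" rather than just the first token, one alternative (not the approach used in the answers above) is to trim the string and collapse every run of whitespace, newlines included, into a single space. A minimal sketch; the class name and the sample input, which approximates the spaces and line breaks from the question, are assumptions. FirstToken shows the simplified ^\S* from the edit above, applied after trimming leading spaces, since ^\S* matches an empty string when the input starts with a space.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapse
{
    // \s covers spaces, tabs, \r and \n, so each run of them becomes one space.
    public static string Collapse(string input) =>
        Regex.Replace(input.Trim(), @"\s+", " ");

    // Equivalent to ^((?!\s).)*: everything up to the first whitespace character.
    public static string FirstToken(string input) =>
        Regex.Match(input.TrimStart(), @"^\S*").Value;
}
```

With the input "   Iris-presenter\r\n   |\r\n   5\r\n", Collapse returns "Iris-presenter | 5" and FirstToken returns "Iris-presenter".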

C# Regular expression search gives false

I am trying to write a program that searches for certain tags inside text files and checks whether there is text between those tags. Example tags:
--<UsrDef_Mod_Trigger_repl_BeginMod>
--<UsrDef_Mod_Trigger_repl_EndMod>
So I want to search for --<UsrDef_Mod_ and _Begin or _End.
I wrote these regexes, but I get false on every single one.
if (Regex.Match(line, @"/--<UsrDef_Mod_.*_BeginMod>/g", RegexOptions.None).Success)
else if (Regex.Match(line, @"/--<UsrDef_Mod_.*_EndMod>/g", RegexOptions.None).Success)
Can anyone help me find where I'm going wrong? I have used regexr.com to check my regexes and they match there, but not in C#.
The .NET Regex class doesn't understand the "/ /g" wrapper.
Just remove it:
// Regex.Match(line, @"/--<UsrDef_Mod_.*_BeginMod>/g",
Regex.Match(line, @"--<UsrDef_Mod_.*_BeginMod>",
if (Regex.Match(line, @"--<UsrDef_Mod_.*_BeginMod>", RegexOptions.None).Success)
if (Regex.Match(line, @"--<UsrDef_Mod_.*_EndMod>", RegexOptions.None).Success)
Both of these match; you just need to remove the leading / and the trailing /g.
For further reference, see Henk Holtermann's answer on SO comparing Perl and C# regex options.
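A minimal sketch of the corrected checks. In .NET the pattern is a plain string with no delimiters; roughly, the /g flag corresponds to calling Regex.Matches instead of Regex.Match, and inline flags such as /i become RegexOptions values like RegexOptions.IgnoreCase. The class and method names are hypothetical, and the sample lines in the usage are made up from the tags in the question.

```csharp
using System.Text.RegularExpressions;

public static class TagFinder
{
    // The pattern is just the string: no /.../ delimiters and no /g suffix.
    private static readonly Regex Begin = new Regex(@"--<UsrDef_Mod_.*_BeginMod>");
    private static readonly Regex End = new Regex(@"--<UsrDef_Mod_.*_EndMod>");

    public static bool IsBeginTag(string line) => Begin.IsMatch(line);

    public static bool IsEndTag(string line) => End.IsMatch(line);

    // The closest analogue of /g: Matches returns every match in the text.
    // Because . does not match \n, tags on separate lines are matched separately.
    public static int CountBeginTags(string text) => Begin.Matches(text).Count;
}
```

IsBeginTag("--<UsrDef_Mod_Trigger_repl_BeginMod>") is true, while the original "/--<UsrDef_Mod_.*_BeginMod>/g" never matches that line, because .NET treats the slashes and the g as literal characters that must appear in the input.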
var matches = Regex.Matches(text, @"<UsrDef_Mod_([a-zA-Z_]+)_BeginMod>([\s\S]+?)<UsrDef_Mod_\1_EndMod>");
foreach (Match m in matches)
    Console.WriteLine(m.Groups[2].Value);
Group #2 will contain the text inside two tags.
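A self-contained sketch of that approach, run against a made-up text block; TaggedBlocks and Extract are hypothetical names. The backreference \1 requires the end tag to carry the same name as the begin tag, and [\s\S]+? lets the body span several lines while stopping at the first matching end tag.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TaggedBlocks
{
    // Group 1: the tag name; group 2: everything up to the matching end tag.
    private static readonly Regex Block = new Regex(
        @"<UsrDef_Mod_([a-zA-Z_]+)_BeginMod>([\s\S]+?)<UsrDef_Mod_\1_EndMod>");

    // Returns the text between each matching pair of Begin/End tags.
    public static List<string> Extract(string text)
    {
        var bodies = new List<string>();
        foreach (Match m in Block.Matches(text))
        {
            bodies.Add(m.Groups[2].Value);
        }
        return bodies;
    }
}
```

With the question's tags, the captured body also includes the leading -- of the end tag, because the pattern starts at <UsrDef_Mod_; prefixing both tags in the pattern with -- would exclude it. A begin tag whose name differs from the end tag (A versus B) produces no match.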

Regex match through multiple lines

For example, we have the following string:
Something
AnotherThing
Something AnotherThing
If I use RegexOptions.Singleline with the pattern Something.+?AnotherThing, I get two matches, but I want to match only the first and second lines. I want to use something like FirstLine#endofline##startofline#AnotherLine, so I use:
var regex = new Regex(@"Something$^AnotherThing", RegexOptions.Multiline);
but it doesn't work. I know I can use some hack with Singleline to match the first two lines (and not the last one), but the question is: is it even possible to match exactly these two texts on exactly two lines without the Singleline specifier, using the Multiline option only? And why does it behave like this?
How about:
Something\r?\nAnotherThing
\r? in case the string doesn't come from Windows.
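The reason Something$^AnotherThing doesn't work with the RegexOptions.Multiline option is that ^ and $ match positions at line breaks, not the line-break characters themselves, so the line break still has to be matched explicitly. Because in .NET $ matches only before \n (not before \r), the optional \r has to come before $, so the following works:
new Regex(@"Something\r?$\n^AnotherThing", RegexOptions.Multiline);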
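A minimal sketch comparing the two approaches on Windows (CRLF) and Unix (LF) versions of the sample text. It relies on the .NET behavior that, in Multiline mode, $ matches only before \n or at the end of the input, which is why the optional \r is placed before $ in the second pattern. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TwoLineMatch
{
    // Matches the line break explicitly; \r? also accepts Windows CRLF endings.
    public static int CountPlain(string text) =>
        Regex.Matches(text, @"Something\r?\nAnotherThing").Count;

    // Multiline variant: $ is tested after the optional \r (right before \n),
    // and ^ is tested right after the \n.
    public static int CountMultiline(string text) =>
        Regex.Matches(text, @"Something\r?$\n^AnotherThing", RegexOptions.Multiline).Count;
}
```

Both methods return 1 for "Something\r\nAnotherThing\r\nSomething AnotherThing" and for its \n-only equivalent, since the third line has a space rather than a line break. By contrast, Something$\r?\n^AnotherThing finds nothing in the CRLF version, because the position after "Something" is followed by \r, where $ does not match.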
Try to match using the carriage return and line break characters, e.g.
Something\r?\nAnotherThing
Basically, carriage returns are causing you trouble (you're not alone). Do you know which OS your text is coming from? If it's from Windows then there will be a \r before the \n, which you need to account for.

Regex filter out space at the end of line

Why does the following code find nothing in C# but works fine when I tested it online?
Match m = Regex.Match(@"abc
cd", "^abc[ \t]*$", RegexOptions.Multiline);
I am using this online regular expressions tester: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I expect to get "abc"
For a multiline string, you can remove the spaces at the end without using Regex.
string trimEnd = string.Join("\n", yourString.Split('\n').Select(x => x.TrimEnd()));
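The approach above rejoins lines with \n, so CRLF endings become LF. If you would rather keep the original line endings and still use a regex, one option (a sketch, not taken from the answer above; the class name is hypothetical) is to delete runs of spaces and tabs that are followed only by an optional \r and the end of the line:

```csharp
using System.Text.RegularExpressions;

public static class TrailingSpace
{
    // [ \t]+ is a run of blanks; the lookahead (?=\r?$) requires that only an
    // optional \r remains before the line end, so CRLF endings are preserved.
    private static readonly Regex Trailing =
        new Regex(@"[ \t]+(?=\r?$)", RegexOptions.Multiline);

    public static string Strip(string text) => Trailing.Replace(text, "");
}
```

For example, Strip("abc  \r\ncd\t\r\nx ") returns "abc\r\ncd\r\nx"; spaces inside a line, as in "a b", are left alone because the lookahead fails.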
If you want to get abc
you can use: ^abc.*$ instead.
If you want to get all,
you can use: (?s)^abc.*$ instead.
I think your issue is that [ \t] cannot match the carriage return (\r) before the newline, so you can change your code to allow it, like this:
Match m = Regex.Match(@"abc
cd", @"^abc[ \t\r]*$", RegexOptions.Multiline);
As I understand it, you haven't captured any group, so you get nothing as a result.
First try this:
Debug.Print(Regex.IsMatch(@"abc
cd", "^abc[ \t]*$", RegexOptions.Multiline).ToString());
You should get true, since there is a match.
Then try this (notice the parentheses '()' after '^' and before '$'):
Debug.Print(Regex.Match(@"abc
cd", "^(abc[ \t]*)$", RegexOptions.Multiline).ToString());
You should get a result in Output Window.
Hope it helps!
I think I know what's happening here. In my text file the EOL is CRLF, but the .NET regex treats LF as the end of line (which is what '$' matches). So in my case the regex cannot get past the CR and reports a failure, and @"^abc[ \t\r]*$" works.
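A small sketch reproducing this diagnosis, assuming the text read from the file is "abc", CR, LF, "cd". It compares the original pattern, the pattern with \r added, and normalizing the line endings before matching; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CrlfDemo
{
    // The text as read from a CRLF file: "abc", CR, LF, "cd".
    public const string Text = "abc\r\ncd";

    // Fails: after "abc" the next character is \r, and in .NET $ (Multiline)
    // only matches before \n or at the very end of the input.
    public static bool PlainPattern() =>
        Regex.IsMatch(Text, @"^abc[ \t]*$", RegexOptions.Multiline);

    // Works: \r is allowed in the trailing class, so $ is reached just before \n.
    public static bool WithCr() =>
        Regex.IsMatch(Text, @"^abc[ \t\r]*$", RegexOptions.Multiline);

    // Alternative: normalize CRLF to LF first, then the original pattern works.
    public static bool Normalized() =>
        Regex.IsMatch(Text.Replace("\r\n", "\n"), @"^abc[ \t]*$", RegexOptions.Multiline);
}
```

Note that with [ \t\r]* the matched value is "abc\r", so trim it if you need exactly "abc".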

Why is my .NET regex not working correctly?

I have a text file which is in the format:
key1:val1,
key2:val2,
key3:val3
and I am trying to parse the key/value pairs out with a regex. Here is the regex code I am using with the same example:
string input = @"key1:val1,
key2:val2,
key3:val3";
var r = new Regex(@"^(?<name>\w+):(?<value>\w+),?$", RegexOptions.Multiline | RegexOptions.ExplicitCapture);
foreach (Match m in r.Matches(input))
{
Console.WriteLine(m.Groups["name"].Value);
Console.WriteLine(m.Groups["value"].Value);
}
When I loop through r.Matches, sometimes certain key/value pairs don't appear, and it seems to be the ones with the comma at the end of the line - but I should be taking that into account with the ,?. What am I missing here?
This might be a good situation for String.Split rather than a regex:
foreach (string pair in input.Split(','))
{
// Trim removes the line break left at the start of every pair after the first.
string[] items = pair.Trim().Split(':');
Console.WriteLine(items[0]);
Console.WriteLine(items[1]);
}
The problem is that your regular expression is not matching the line break (on Windows, the \r before the \n) at the end of the first two lines.
Try changing it to
@"^(?<name>\w+):(?<value>\w+),?(\n|\r|\r\n)?$"
and it should work.
By the way, I love regular expressions, but given the problem you are trying to solve, go for the string.Split solution. It will be much easier to read...
EDIT: Since your comment says this is a simplified version of your problem, you could instead simplify the expression by adding some "tolerance" for spaces or newlines at the end of the match:
@"^(?<name>\w+):(?<value>\w+),?\s*$"
Also, when you play with regular expressions, test them with a tool like Expresso, it saves a lot of time.
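A minimal sketch of the regex fix, assuming (as in the related question about ^abc[ \t]*$) that the file uses CRLF line endings. Allowing an optional \r before $ lets every pair match; returning a Dictionary is a choice made for this example, and KeyValueParser is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // \r? absorbs the carriage return of a CRLF ending before $, which in
    // Multiline mode only matches immediately before \n or at the end of input.
    private static readonly Regex Pair = new Regex(
        @"^(?<name>\w+):(?<value>\w+),?\r?$",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    public static Dictionary<string, string> Parse(string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(input))
        {
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return result;
    }
}
```

On "key1:val1,\r\nkey2:val2,\r\nkey3:val3" this yields all three pairs, whereas the original ,?$ pattern matches only key3, the one line without a trailing comma and CR, which fits the symptom described in the question.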
Get rid of the RegexOptions.Multiline option.
