C# regex not working properly

I've written a regex that I know finds 6 matches, since I checked it in several regex engines. The problem is with Match -> NextMatch, or its smarter equivalent:
Match m= regex.Match(data,nextRelevantIndex);
When I use the methods above I get 3 results out of 6.
However, when I use
MatchCollection mc = r.Matches(data);
foreach (Match m in mc)
{
// …
}
it iterates 6 times.
Unfortunately, I cannot use this version, since I'm changing the data as I go, and that would be much harder for me than using
regex.Match(data,nextRelevantIndex);
Is this a known problem in C#? What is the best solution?
The regex is:
((?:var\s+)?[\w_]+\s*=)?\s*\$\.import\s*\((?:[""'']([^''"";)]+)[""''])(?:\s*,\s*(?:[""'']([^''"";)]+)[""'']))?\s*\)(\.[^;]+;)?
The string is:
//from project
$.import("sap.hana.ide.catalog.plugin.performance.server.lib", "helpers");
var h = $.sap.hana.ide.catalog.plugin.performance.server.lib.helpers;
//basic example
$.import("a.b","math"); //var otherHashVar= new otherHash();
$.a.b.math.max(1); //otherHashVar.max(1);
alert($.a.b.math.a);//alert(otherHashVar.a);
//a bit more complex
var z=$.import("a.b.c","x"); // var z=new otherHash(); -> no additional fixes to be done
z.foo();
//another variation
$.import ("a.b","myLib").x(); // similar to first
//var otherHashVar=new OtherHash();
//otherHashVar.x();
var z=$.import("a\b\c.xsjs");
z=$.import("a\b\c.xsjs").a.b.c++;
and the code is:
while(m.Success){
m = r.Match(data, m.Index + m.Length);
}
since I'm not currently modifying the data (I will once I manage to get 6 matches).

The problem is elsewhere in your program.
The following writes 6 matches to console:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
String data="//from project\r\n$.import(\"sap.hana.ide.catalog.plugin.performance.server.lib\", \"helpers\");\r\nvar h = $.sap.hana.ide.catalog.plugin.performance.server.lib.helpers;\r\n//basic example\r\n$.import(\"a.b\",\"math\"); //var otherHashVar= new otherHash();\r\n$.a.b.math.max(1); //otherHashVar.max(1);\r\n\ralert($.a.b.math.a);//alert(otherHashVar.a);\r\n\r\n//a bit more complex\rv\n\r z=$.import(\"a.b.c\",\"x\"); // var z=new otherHash(); -> no additional fixes to be done\rz\n.foo(); \r\n\r//another variation\r$.import (\"a.b\",\"myLib\").x(); // similar to first \r\n//var otherHashVar=new OtherHash();\r\n//otherHashVar.x();\r\n\r\nvar z=$.import(\"a\\b\\c.xsjs\"); \r\n\r\nz=$.import(\"a\\b\\c.xsjs\").a.b.c++;"
;
//System.Console.WriteLine(data);
String expr="((?:var\\s+)?[\\w_]+\\s*=)?\\s*\\$\\.import\\s*\\((?:[\"\"'']([^''\"\";)]+)[\"\"''])(?:\\s*,\\s*(?:[\"\"'']([^''\"\";)]+)[\"\"'']))?\\s*\\)(\\.[^;]+;)?";
Regex r=new Regex(expr);
Match m=r.Match(data);
while(m.Success){
System.Console.WriteLine("Match found ");
System.Console.WriteLine(m.Value);
System.Console.WriteLine();
m = r.Match(data, m.Index + m.Length);
}
}
}
Dot Net fiddle
Also, you state that you can't use foreach over the MatchCollection because you are modifying your data. What modification are you making, and have you considered using Regex.Replace?
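To illustrate the Regex.Replace suggestion, here is a minimal sketch. It assumes a simplified, hypothetical pattern (two double-quoted arguments only) rather than the full regex from the question, and the class name ImportRewriter is invented for the example. The point is only that a MatchEvaluator lets you rewrite every match without tracking a start index by hand.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImportRewriter
{
    // Simplified, hypothetical pattern: $.import("package","name") with two double-quoted arguments.
    private static readonly Regex ImportCall = new Regex(
        @"\$\.import\s*\(\s*""([^""]+)""\s*,\s*""([^""]+)""\s*\)");

    public static string Rewrite(string data)
    {
        // The evaluator runs once per match, scanning left to right,
        // so the input can be "modified" without managing indexes manually.
        return ImportCall.Replace(data, m => "$." + m.Groups[1].Value + "." + m.Groups[2].Value);
    }

    public static void Main()
    {
        string data = "$.import(\"a.b\",\"math\"); var z=$.import(\"a.b.c\",\"x\");";
        Console.WriteLine(Rewrite(data)); // $.a.b.math; var z=$.a.b.c.x;
    }
}
```

If each replacement depends on more context than the match itself, the evaluator can still inspect m.Index and the captured groups before deciding what to return.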

Related

C# Regex will not match

I'm attempting to write code that will pick up all of a month's log files and zip them up. I can't seem to get the regex pattern to work in my code. Below is a sandbox console app I'm using to test with:
public static void Main(string[] args)
{
var targetDate = DateTime.Now.AddMonths(-1);
var pattern = $@"c:\\logs\\client-{targetDate.Year}-{targetDate.Month:d2}-.*.log";
Regex regex = new Regex(Regex.Escape(pattern), RegexOptions.IgnoreCase);
var files = Directory.EnumerateFiles(@"c:\logs").Where(f => regex.IsMatch(f)).ToList();
foreach(var file in files)
{
Console.WriteLine(file);
}
}
The Enumerated files look like the following:
c:\logs\client-2021-03-01.log
c:\logs\client-2021-03-02.log
c:\logs\client-2021-03-03.log
c:\logs\client-2021-03-04.log
c:\logs\client-2021-03-05.log
c:\logs\client-2021-03-06.log
c:\logs\client-2021-03-07.log
c:\logs\client-2021-03-08.log
c:\logs\client-2021-03-09.log
c:\logs\client-2021-03-10.log
c:\logs\client-2021-03-11.log
c:\logs\client-2021-03-12.log
c:\logs\client-2021-03-13.log
c:\logs\client-2021-03-14.log
c:\logs\client-2021-03-15.log
c:\logs\client-2021-03-16.log
c:\logs\client-2021-03-17.log
c:\logs\client-2021-03-18.log
c:\logs\client-2021-03-19.log
c:\logs\client-2021-03-20.log
c:\logs\client-2021-03-21.log
c:\logs\client-2021-03-22.log
c:\logs\client-2021-03-23.log
c:\logs\client-2021-03-24.log
c:\logs\client-2021-03-25.log
c:\logs\client-2021-03-26.log
c:\logs\client-2021-03-27.log
c:\logs\client-2021-03-28.log
c:\logs\client-2021-03-29.log
c:\logs\client-2021-03-30.log
c:\logs\client-2021-03-31.log
c:\logs\client-2021-04-01.log
c:\logs\client-2021-04-02.log
c:\logs\client-2021-04-03.log
c:\logs\client-2021-04-05.log
c:\logs\client-2021-04-06.log
c:\logs\client-2021-04-07.log
c:\logs\client-2021-04-08.log
c:\logs\client-2021-04-09.log
c:\logs\client-2021-04-10.log
c:\logs\client-2021-04-12.log
c:\logs\client-2021-04-13.log
c:\logs\client-2021-04-14.log
c:\logs\client-2021-04-15.log
c:\logs\client-2021-04-16.log
c:\logs\client-2021-04-17.log
c:\logs\client-2021-04-18.log
c:\logs\client-2021-04-19.log
c:\logs\client-2021-04-20.log
c:\logs\client-2021-04-21.log
c:\logs\client-2021-04-22.log
c:\logs\client-2021-04-23.log
c:\logs\client-2021-04-24.log
c:\logs\client-2021-04-25.log
c:\logs\client-2021-04-26.log
c:\logs\client-2021-04-27.log
I've checked the RegEx pattern against a couple of testers, including one for .NET and it passes so I'm not sure where the discrepancy is. Any help would be greatly appreciated.
The way you escape your pattern is breaking the regex: Regex.Escape turns the whole pattern, including the .* wildcard, into literal text. Here's an example of how you can do it:
var source = @"c:\logs\client-2021-03-01.log";
var pattern = @"c:\\logs\\client-2021-03-.*\.log";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(source); // match.Success is now true
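If the year and month still need to be interpolated, one option is to escape only the literal part of the path and append the wildcard afterwards. The following is a sketch under that assumption; the class and method names (LogFilePattern, ForMonth) are made up for illustration, and the ^ and $ anchors are an addition that the original code did not have.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogFilePattern
{
    public static Regex ForMonth(string directory, DateTime month)
    {
        // Escape only the literal prefix; the wildcard is appended afterwards
        // so that ".*" stays a regex token instead of becoming literal text.
        string prefix = Regex.Escape($@"{directory}\client-{month.Year}-{month.Month:d2}-");
        return new Regex("^" + prefix + @".*\.log$", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Regex regex = ForMonth(@"c:\logs", new DateTime(2021, 3, 1));
        Console.WriteLine(regex.IsMatch(@"c:\logs\client-2021-03-01.log")); // True
        Console.WriteLine(regex.IsMatch(@"c:\logs\client-2021-04-01.log")); // False
    }
}
```

The resulting Regex can be passed to the same Where(f => regex.IsMatch(f)) filter used in the question.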

Parsing Email To Header

OK, so I am currently facing a few difficulties with my email parser.
When I started, most of the emails I tested with looked something like the following:
"name@domain.co.za, othername@domain.co.za". This I can easily split by the comma, but the following cases don't work:
1) "\"Surname, Name, Company Country\" <name.surname@domain.co.za>"
With that I tried the following:
Regex.Split(Headers["to"] ?? "", "(?<=@\\S*)\\s+");
But that doesn't remove the comma, so I am using .Trim(',') to remove the trailing comma; then some cases work.
Example that works:
"name@domain.co.za, othername@domain.co.za"
For example, the following doesn't work:
2) "\"Name Surname <name@domain.co.za>\" <name@domain.co.za>"
I also tried to use Regex.Split(Headers["to"] ?? "", ",(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)");
But it fails in a situation like the following:
"\"Name Surname\" <Name@domain.co.za>, \"Name Surname\" <Othername@domain.co.za>"
Now using a new regex, (?:""([^""]+)"")?\s*<?\b(\S+@\S+\.\S+)\b, it is quite close. Using the following example I get the following output:
Input: "\"Donald Jansen\" <Donald@peachss.co.za>, \"Donald Jansen\" <djhabana@gmail.com>"
Output:
"\"Donald Jansen\" <Donald@peachss.co.za
\"Donald Jansen\" <djhabana@gmail.com
So it ignored the trailing >. I fixed this by adding >? to the regex. I also found a new scenario that is not working; the regex is now (?:"([^"]+)")?\s*<?\b(\S+@\S+\.\S+)\b>?
"name <name@xxx.co.za>, name name <name@xxx.co.za>, name <name@xxx.co.za>, \"'name'\" <name@xxx.com>"
The output now is:
<name@xxx.co.za> << not correct, name is needed
<name@xxx.co.za> << not correct, name is needed
<name@xxx.co.za> << not correct, name is needed
\"'name'\" <name@xxx.com>" << this is correct
This might do the trick to find all valid emails in the string.
Regex emailRegex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*", RegexOptions.IgnoreCase);
MatchCollection emailMatches = emailRegex.Matches(data);
foreach (Match emailMatch in emailMatches)
{
Console.WriteLine(emailMatch.Value);
}
You can use the 'Email Regex' from emailregex.com
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
Thanks to the help of @MohitShrivastava and @WiktorStribiżew, I managed to build my own regex by combining the ones they provided. It is probably not optimized and a bit ugly, but it works as I expect:
((\w+[ ])|\"(.*?)\"+[ ])+(<?\b(\S+@\S+\.\S+)\b>)|(\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)
Sample code:
var emailRegex = new Regex(@"((\w+[ ])|\""(.*?)\""+[ ])+(<?\b(\S+@\S+\.\S+)\b>)|(\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)", RegexOptions.IgnoreCase);
var emailMatches = emailRegex.Matches(Headers["to"]);
foreach (Match emailMatch in emailMatches)
{
try
{
To.Add(new MailAddress(emailMatch.Value));
}
catch (Exception ex)
{
// Skip values that MailAddress cannot parse.
}
}
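As an alternative that none of the answers above use, System.Net.Mail can parse a comma-separated address list itself: MailAddressCollection.Add(string) accepts several addresses in one string. The sketch below assumes the header value is already available as a plain string; the class name ToHeaderParser is hypothetical, and malformed input will surface as a FormatException that the caller has to handle.

```csharp
using System;
using System.Net.Mail;

public static class ToHeaderParser
{
    public static MailAddressCollection Parse(string header)
    {
        var addresses = new MailAddressCollection();
        if (!string.IsNullOrWhiteSpace(header))
        {
            // Add(string) takes a comma-separated list of addresses.
            // It throws FormatException if the list cannot be parsed.
            addresses.Add(header);
        }
        return addresses;
    }

    public static void Main()
    {
        var parsed = Parse("\"Surname, Name\" <name.surname@domain.co.za>, othername@domain.co.za");
        foreach (MailAddress address in parsed)
        {
            // Display name and bare address are exposed as separate properties.
            Console.WriteLine(address.DisplayName + " | " + address.Address);
        }
    }
}
```

Whether this covers every header seen in practice would need testing against real data; it is shown only because it avoids maintaining a custom regex.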

Parse Line and Break it into Variables

I have a text file that contains only the full version number of an application, which I need to extract and parse into separate variables.
For example, let's say version.cs contains 19.1.354.6
The code I'm using does not seem to be working:
char[] delimiter = { '.' };
string currentVersion = System.IO.File.ReadAllText(@"C:\Applicaion\version.cs");
string[] partsVersion;
partsVersion = currentVersion.Split(delimiter);
string majorVersion = partsVersion[0];
string minorVersion = partsVersion[1];
string buildVersion = partsVersion[2];
string revisVersion = partsVersion[3];
Although your problem is most likely with the file (it probably contains text other than a version), why not use the Version class, which is made for exactly this kind of task?
var version = new Version("19.1.354.6");
var major = version.Major; // etc..
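Since the text comes from a file, it may carry a trailing newline or other whitespace. A small sketch of reading it defensively with Version.TryParse follows; the helper name ParseVersionText is invented for the example, and the file content is simulated with a string literal.

```csharp
using System;

public static class VersionReader
{
    public static Version ParseVersionText(string text)
    {
        // Trim removes the trailing newline or spaces that a text file often carries.
        // TryParse returns false instead of throwing when the text is not a version.
        return Version.TryParse(text.Trim(), out Version version) ? version : null;
    }

    public static void Main()
    {
        Version version = ParseVersionText("19.1.354.6\r\n");
        Console.WriteLine(version.Major);    // 19
        Console.WriteLine(version.Minor);    // 1
        Console.WriteLine(version.Build);    // 354
        Console.WriteLine(version.Revision); // 6
    }
}
```

A null result signals that the file held something other than a version, which is the situation both answers suspect.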
What you have works fine with the correct input, so I would suggest making sure there is nothing else in the file you're reading.
In the future, please provide error information, since we can't usually tell exactly what you expect to happen, only what we know should happen.
In light of that, I would also suggest looking into using Regex for parsing in the future. In my opinion, it provides a much more flexible solution for your needs. Here's an example of regex to use:
var regex = new Regex(@"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)");
var match = regex.Match("19.1.354.6");
if (match.Success)
{
Console.WriteLine("Match[1]: "+match.Groups[1].Value);
Console.WriteLine("Match[2]: "+match.Groups[2].Value);
Console.WriteLine("Match[3]: "+match.Groups[3].Value);
Console.WriteLine("Match[4]: "+match.Groups[4].Value);
}
else
{
Console.WriteLine("No match found");
}
which outputs the following:
// Match[1]: 19
// Match[2]: 1
// Match[3]: 354
// Match[4]: 6

C# using regex to replace value only after = sign

ok I have a text file that contains:
books_book1 = 1
books_book2 = 2
books_book3 = 3
I would like to retain "books_book1 = " and replace only the value after it.
So far I have:
string text = File.ReadAllText("settings.txt");
text = Regex.Replace(text, ".*books_book1*.", "books_book1 = a",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book2*.", "books_book2 = b",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book3*.", "books_book3 = c",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
this results in:
books_book1 = a=1
output to file should be:
books_book1 = a
books_book2 = b
books_book3 = c
Thanks much in advance...
In a comment I stated:
"I would personally just go for recreating the file if it is that simple. Presumably you load all the values from the file into an object of some kind initially, so just use that to recreate the file with the new values. Much easier than messing with regular expressions: it's simpler, easier to test and see what is going on, and easier to change if you ever need to."
Having looked at this again, I think it is even more true.
From what you said in comments: "when the program loads it reads the values from this text file, then the user has an option to change the values and save it back to the file". Presumably this means that you need to know which of the books_book1, books_book2, etc. lines you are replacing, so that you know which user-supplied value to put in. This is fine (though a little unwieldy) with three items, but if you increase that number you'll need to update your code for every new item. That is never a good thing and will quickly produce horrendous-looking, bug-prone code.
If you have your new settings in some kind of data structure (e.g. a dictionary), then, as I say, recreating the file from scratch is probably easiest. See for example this small, self-contained code snippet:
//Set up our sample Dictionary
Dictionary<string, string> settings = new Dictionary<string,string>();
settings.Add("books_book1","a");
settings.Add("books_book2","b");
settings.Add("books_book3","c");
//Write the values to file via an intermediate stringbuilder.
StringBuilder sb = new StringBuilder();
foreach (var item in settings)
{
sb.AppendLine(String.Format("{0} = {1}", item.Key, item.Value));
}
File.WriteAllText("settings.txt", sb.ToString());
This has obvious advantages of being simpler and that if you add more settings then they will just go into the dictionary and you don't need to change the code.
I don't think this is the best way to solve the problem, but to make the regex do what you want you can do the following:
var findFilter = @"(.*books_book1\s*=\s)(.+)";
var replaceFilter = "${1}a";
text = Regex.Replace(text, findFilter, replaceFilter, RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
....
The first pair of ( and ) in the regex forms capture group 1, and ${1} in the replacement inserts the text that group matched, which produces the output you want. Also, you'll notice I used \s for white space so you don't match book111, for example. I'm sure there are other edge cases you'll need to deal with.
books_book1 = a
...
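The dictionary idea and the regex idea can also be combined so that one pass updates every key that has a new value. This is only a sketch: the pattern, the group names key and sep, and the class name SettingsUpdater are assumptions for illustration, and the file I/O is left out so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SettingsUpdater
{
    // A key at the start of a line, the " = " separator, then the old value up to the end of the line.
    private static readonly Regex SettingLine = new Regex(
        @"^(?<key>\w+)(?<sep>[ \t]*=[ \t]*)[^\r\n]*", RegexOptions.Multiline);

    public static string Apply(string text, IDictionary<string, string> newValues)
    {
        return SettingLine.Replace(text, m =>
            newValues.TryGetValue(m.Groups["key"].Value, out string value)
                ? m.Groups["key"].Value + m.Groups["sep"].Value + value
                : m.Value); // keys without a new value are left unchanged
    }

    public static void Main()
    {
        var newValues = new Dictionary<string, string>
        {
            ["books_book1"] = "a",
            ["books_book2"] = "b",
            ["books_book3"] = "c",
        };
        // Prints the three lines with the values a, b and c.
        Console.WriteLine(Apply("books_book1 = 1\nbooks_book2 = 2\nbooks_book3 = 3", newValues));
    }
}
```

Adding a setting then means adding a dictionary entry rather than another Regex.Replace call.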
Here's the start to a more generic approach:
This regular expression captures the trailing number, accounting for variable digit and whitespace length.
text = Regex.Replace(text, @"(books_book\d+\s*=\s*)(\d+)", DoReplace);
// ...
string DoReplace(Match m)
{
return m.Groups[1].Value + Convert.ToChar(int.Parse(m.Groups[2].Value) + 96);
}
How about something like this (no error checking):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace TestRegex
{
class Program
{
static void Main( string[] args )
{
var path = @"settings.txt";
var pattern = @"(^\s*books_book\d+\s*=\s*)(\d+)(\s*)$";
var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
var contents = Regex.Replace( File.ReadAllText( path ), pattern, MyMatchEvaluator, options );
File.WriteAllText( path, contents );
}
static int x = char.ConvertToUtf32( "a", 0 );
static string MyMatchEvaluator( Match m )
{
var x1 = m.Groups[ 1 ].Value;
var x2 = char.ConvertFromUtf32( x++ );
var x3 = m.Groups[ 3 ].Value;
var result = x1 + x2 + x3;
return result;
}
}
}

C# Regex Split To Java Pattern split

I have to port some C# code to Java and I am having some trouble converting a string splitting command.
While the actual regex is still correct, when splitting in C# the regex tokens are part of the resulting string[], but in Java the regex tokens are removed.
What is the easiest way to keep the split-on tokens?
Here is an example of C# code that works the way I want it:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
String[] values = Regex.Split("5+10", @"([\+\-\*\(\)\^\\/])");
foreach (String value in values)
Console.WriteLine(value);
}
}
Produces:
5
+
10
I don't know how C# does it, but to accomplish it in Java, you'll have to approximate it. Look at how this code does it:
public String[] split(String text) {
if (text == null) {
text = "";
}
int last_match = 0;
LinkedList<String> splitted = new LinkedList<String>();
Matcher m = this.pattern.matcher(text);
// Iterate through each match
while (m.find()) {
// Text since last match
splitted.add(text.substring(last_match,m.start()));
// The delimiter itself
if (this.keep_delimiters) {
splitted.add(m.group());
}
last_match = m.end();
}
// Trailing text
splitted.add(text.substring(last_match));
return splitted.toArray(new String[splitted.size()]);
}
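This is because you are capturing the split token. C# takes the capture group as a hint that you wish to retain the token itself as a member of the resulting array. Java's String.split and Pattern.split do not support this; they always discard the delimiter. In Java, splitting on a zero-width lookaround such as (?<=[-+*/()^])|(?=[-+*/()^]) keeps the tokens.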
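The effect of the capture group on the C# side can be seen by running the same split with and without the parentheses. This is a small sketch with a reduced operator set; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    public static string[] SplitKeepingOperators(string expression)
    {
        // Capture group: the matched operator is included in the result.
        return Regex.Split(expression, @"([+\-*/])");
    }

    public static string[] SplitDroppingOperators(string expression)
    {
        // No capture group: the operator is discarded.
        return Regex.Split(expression, @"[+\-*/]");
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitKeepingOperators("5+10")));  // 5|+|10
        Console.WriteLine(string.Join("|", SplitDroppingOperators("5+10"))); // 5|10
    }
}
```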
