C# Regex will not match

I'm attempting to write code that will pick up all of a month's log files and zip them up. I can't seem to get the RegEx pattern to work in my code. Below is a sandbox console app I'm using to test with:
public static void Main(string[] args)
{
var targetDate = DateTime.Now.AddMonths(-1);
var pattern = $@"c:\\logs\\client-{targetDate.Year}-{targetDate.Month:d2}-.*.log";
Regex regex = new Regex(Regex.Escape(pattern), RegexOptions.IgnoreCase);
var files = Directory.EnumerateFiles(@"c:\logs").Where(f => regex.IsMatch(f)).ToList();
foreach(var file in files)
{
Console.WriteLine(file);
}
}
The Enumerated files look like the following:
c:\logs\client-2021-03-01.log
c:\logs\client-2021-03-02.log
c:\logs\client-2021-03-03.log
c:\logs\client-2021-03-04.log
c:\logs\client-2021-03-05.log
c:\logs\client-2021-03-06.log
c:\logs\client-2021-03-07.log
c:\logs\client-2021-03-08.log
c:\logs\client-2021-03-09.log
c:\logs\client-2021-03-10.log
c:\logs\client-2021-03-11.log
c:\logs\client-2021-03-12.log
c:\logs\client-2021-03-13.log
c:\logs\client-2021-03-14.log
c:\logs\client-2021-03-15.log
c:\logs\client-2021-03-16.log
c:\logs\client-2021-03-17.log
c:\logs\client-2021-03-18.log
c:\logs\client-2021-03-19.log
c:\logs\client-2021-03-20.log
c:\logs\client-2021-03-21.log
c:\logs\client-2021-03-22.log
c:\logs\client-2021-03-23.log
c:\logs\client-2021-03-24.log
c:\logs\client-2021-03-25.log
c:\logs\client-2021-03-26.log
c:\logs\client-2021-03-27.log
c:\logs\client-2021-03-28.log
c:\logs\client-2021-03-29.log
c:\logs\client-2021-03-30.log
c:\logs\client-2021-03-31.log
c:\logs\client-2021-04-01.log
c:\logs\client-2021-04-02.log
c:\logs\client-2021-04-03.log
c:\logs\client-2021-04-05.log
c:\logs\client-2021-04-06.log
c:\logs\client-2021-04-07.log
c:\logs\client-2021-04-08.log
c:\logs\client-2021-04-09.log
c:\logs\client-2021-04-10.log
c:\logs\client-2021-04-12.log
c:\logs\client-2021-04-13.log
c:\logs\client-2021-04-14.log
c:\logs\client-2021-04-15.log
c:\logs\client-2021-04-16.log
c:\logs\client-2021-04-17.log
c:\logs\client-2021-04-18.log
c:\logs\client-2021-04-19.log
c:\logs\client-2021-04-20.log
c:\logs\client-2021-04-21.log
c:\logs\client-2021-04-22.log
c:\logs\client-2021-04-23.log
c:\logs\client-2021-04-24.log
c:\logs\client-2021-04-25.log
c:\logs\client-2021-04-26.log
c:\logs\client-2021-04-27.log
I've checked the RegEx pattern against a couple of testers, including one for .NET and it passes so I'm not sure where the discrepancy is. Any help would be greatly appreciated.

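Calling Regex.Escape on the whole pattern is what breaks it: it escapes the backslashes a second time and turns .* into the literal characters ".*", so the regex only matches that literal text. Pass the pattern to the Regex constructor unescaped, for example:
var source = @"c:\logs\client-2021-03-01.log";
var pattern = @"c:\\logs\\client-2021-03-.*\.log";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(source); // match.Success is now true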
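If the directory comes from a variable, one option is to apply Regex.Escape only to the literal part of the path and append the wildcard portion afterwards. This is a minimal sketch, assuming the client-yyyy-MM-dd.log naming shown in the question; the LogPattern class and ForMonth method are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LogPattern
{
    // Hypothetical helper: builds a regex matching one month's log files.
    public static Regex ForMonth(string directory, DateTime month)
    {
        // Escape only the literal prefix (directory, "client-", year and month).
        string yearMonth = month.ToString("yyyy-MM", CultureInfo.InvariantCulture);
        string prefix = Regex.Escape(directory + @"\client-" + yearMonth + "-");

        // The day and extension stay a real regex: two digits, then ".log".
        return new Regex("^" + prefix + @"\d{2}\.log$", RegexOptions.IgnoreCase);
    }
}
```

With targetDate from the question, LogPattern.ForMonth(@"c:\logs", targetDate).IsMatch(f) could take the place of regex.IsMatch(f) in the Where clause.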

Related

C# Regex Pattern to remove comma inside double quote delimited string

I can't be the first person to have this issue but hours of searching Stack revealed nothing close to an answer. I have an SSIS script that works over a directory of csv files. This script folds, bends and mutilates these files; performs queries, data cleansing, persists some data and finally outputs a small set to csv file that is ingested by another system.
One of the files has a free text field that contains the value: "20,000 BONUS POINTS". This one field, in a file of 10k rows, one of dozens of similar files, is the problem that I can't seem to solve.
Be advised: I'm weak on both C# and Regex.
Sample csv set:
4121,6383,0,,,TRUE
4122,6384,0,"20,000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
I found plenty of information on how to parse this using a variety of Regex patterns but what I've noticed is the StreamReader.ReadLine() method wraps the complete line with double quotes:
"4121,6383,0,,,TRUE"
such that the output of the regex Replace method:
s = Regex.Replace(line, @"[^\""]([^\""])*[^\""]",
m => m.Value.Replace(",", ""));
looks like this:
412163830TRUE
and the target line that actually contains a double quote delimited string ends up looking like:
"412263840\"20000 BONUS POINTS\"TRUE"
My entire method (for your reading pleasure) is this:
string fileDirectory = "C:\\tmp\\Unzip\\";
string fullPath = "C:\\tmp\\Unzip\\test.csv";
string line = "";
//int count=0;
List<string> list = new List<string>();
try
{
//MessageBox.Show("inside Try Block");
string s = null;
StreamReader infile = new StreamReader(fullPath);
StreamWriter outfile = new StreamWriter(Path.Combine(fileDirectory, "output.csv"));
while ((line = infile.ReadLine()) != null)
{
//line.Substring(0,1).Substring(line.Length-1, 1);
System.Console.WriteLine(line);
Console.WriteLine(line);
line =
s = Regex.Replace(line, @"[^\""]([^\""])*[^\""]",
m => m.Value.Replace(",", ""));
System.Console.WriteLine(s);
list.Add(s);
}
foreach (string item in list)
{
outfile.WriteLine(item);
};
infile.Close();
outfile.Close();
//System.Console.WriteLine("There were {0} lines.", count);
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
//another addition for TFS consumption
}
Thanks for reading, and if you have a useful answer, bless you and your progeny for generations to come!
mfc
EDIT: The requirement is a valid csv file output. In the case of the test data, it would look like this:
4121,6383,0,,,TRUE
4122,6384,0,"20000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
I recommend using a CSV reader lib like others have suggested.
Install-Package LumenWorksCsvReader
https://github.com/phatcher/CsvReader#getting-started
However, if you just want to try something quick and dirty, give this a try.
If I understand correctly, you need to remove the commas between double quotes within each line of a CSV file. This should do that.
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string pattern = @"([""'])(?:(?=(\\?))\2.)*?\1";
List<string> lines = new List<string>();
lines.Add("4121,6383,0,,,TRUE");
lines.Add("4122,6384,0,\"20,000 BONUS POINTS\",,TRUE");
lines.Add("4123,6385,,,,");
lines.Add("4124,6386,0,,,TRUE");
lines.Add("4125,6387,0,,,TRUE");
lines.Add("4126,6388,0,,,TRUE");
lines.Add("4127,6389,0,,,TRUE");
lines.Add("4128,6390,0,,,TRUE");
StringBuilder sb = new StringBuilder();
foreach (var line in lines)
{
sb.Append(Regex.Replace(line, pattern, m => m.Value.Replace(",", ""))+"\n");
}
Console.WriteLine(sb.ToString());
}
}
OUTPUT
4121,6383,0,,,TRUE
4122,6384,0,"20000 BONUS POINTS",,TRUE
4123,6385,,,,
4124,6386,0,,,TRUE
4125,6387,0,,,TRUE
4126,6388,0,,,TRUE
4127,6389,0,,,TRUE
4128,6390,0,,,TRUE
https://dotnetfiddle.net/flmWG3
I haven't tried with numerous lines, but this would be my first approach:
using System;
using System.Text.RegularExpressions;
namespace ConsoleTestApplication
{
class Program
{
static void Main(string[] args)
{
var before = "4122,6384,0,\"20,000 BONUS POINTS\",,TRUE";
var pattern = @"""[^""]*""";
var after = Regex.Replace(before, pattern, match => match.Value.Replace(",", ""));
Console.WriteLine(after);
}
}
}
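Applied to the file loop from the question, the same quoted-field replacement might look like the sketch below. The CsvCommaCleaner class and its method names are hypothetical, and it assumes every quoted field sits on a single line. Because File.ReadLines reads lazily, the input and output paths need to be different files.

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvCommaCleaner
{
    // Matches one double-quoted field: a quote, any non-quote characters, a quote.
    private static readonly Regex QuotedField = new Regex(@"""[^""]*""");

    // Removes commas that appear inside double-quoted fields of a single line.
    public static string CleanLine(string line)
    {
        return QuotedField.Replace(line, m => m.Value.Replace(",", ""));
    }

    // Streams the input file line by line and writes the cleaned lines out.
    public static void CleanFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(l => CleanLine(l)));
    }
}
```

Commas outside the quotes are left alone, so the column count of each row is unchanged.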

C# regex not working properly

I've been trying to write a regex which I know finds 6 matches, since I checked it with several regex engines. The problem is with Match -> NextMatch, or its smarter equivalent:
Match m= regex.Match(data,nextRelevantIndex);
When I use the methods above I get 3 results out of 6.
However, when I use
MatchCollection mc = r.Matches(data);
foreach (Match m in mc)
{
// …
}
it iterates 6 times.
Unfortunately I cannot use this version, since I'm changing the data I run on, and it would be much more difficult for me than using
regex.Match(data,nextRelevantIndex);
Is this a known problem in C#? What is the best solution for this?
The regex is:
((?:var\s+)?[\w_]+\s*=)?\s*\$\.import\s*\((?:[""'']([^''"";)]+)[""''])(?:\s*,\s*(?:[""'']([^''"";)]+)[""'']))?\s*\)(\.[^;]+;)?
The string is:
//from project
$.import("sap.hana.ide.catalog.plugin.performance.server.lib", "helpers");
var h = $.sap.hana.ide.catalog.plugin.performance.server.lib.helpers;
//basic example
$.import("a.b","math"); //var otherHashVar= new otherHash();
$.a.b.math.max(1); //otherHashVar.max(1);
alert($.a.b.math.a);//alert(otherHashVar.a);
//a bit more complex
var z=$.import("a.b.c","x"); // var z=new otherHash(); -> no additional fixes to be done
z.foo();
//another variation
$.import ("a.b","myLib").x(); // similar to first
//var otherHashVar=new OtherHash();
//otherHashVar.x();
var z=$.import("a\b\c.xsjs");
z=$.import("a\b\c.xsjs").a.b.c++;
and the code is:
while(m.Success){
m = r.Match(data, m.Index + m.Length);
}
since I'm not currently modifying the data (I will once I manage to get 6 matches).
The problem is elsewhere in your program.
The following writes 6 matches to console:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
String data="//from project\r\n$.import(\"sap.hana.ide.catalog.plugin.performance.server.lib\", \"helpers\");\r\nvar h = $.sap.hana.ide.catalog.plugin.performance.server.lib.helpers;\r\n//basic example\r\n$.import(\"a.b\",\"math\"); //var otherHashVar= new otherHash();\r\n$.a.b.math.max(1); //otherHashVar.max(1);\r\n\ralert($.a.b.math.a);//alert(otherHashVar.a);\r\n\r\n//a bit more complex\rv\n\r z=$.import(\"a.b.c\",\"x\"); // var z=new otherHash(); -> no additional fixes to be done\rz\n.foo(); \r\n\r//another variation\r$.import (\"a.b\",\"myLib\").x(); // similar to first \r\n//var otherHashVar=new OtherHash();\r\n//otherHashVar.x();\r\n\r\nvar z=$.import(\"a\\b\\c.xsjs\"); \r\n\r\nz=$.import(\"a\\b\\c.xsjs\").a.b.c++;"
;
//System.Console.WriteLine(data);
String expr="((?:var\\s+)?[\\w_]+\\s*=)?\\s*\\$\\.import\\s*\\((?:[\"\"'']([^''\"\";)]+)[\"\"''])(?:\\s*,\\s*(?:[\"\"'']([^''\"\";)]+)[\"\"'']))?\\s*\\)(\\.[^;]+;)?";
Regex r=new Regex(expr);
Match m=r.Match(data);
while(m.Success){
System.Console.WriteLine("Match found ");
System.Console.WriteLine(m.Value);
System.Console.WriteLine();
m = r.Match(data, m.Index + m.Length);
}
}
}
Also, you state that you can't use a foreach over the MatchCollection because you are modifying your data. What modification are you making, and have you considered using Regex.Replace?
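To illustrate the Regex.Replace suggestion: a match evaluator lets the regex engine walk the matches while you supply each replacement, so there is no start index to keep in sync with the edited string. This is a minimal sketch only. It assumes a much simplified pattern (single-argument, double-quoted imports) and a hypothetical replacement text load(...); neither comes from the question.

```csharp
using System.Text.RegularExpressions;

public static class ImportRewriter
{
    // Simplified stand-in for the question's pattern: $.import("some.path")
    private static readonly Regex ImportCall =
        new Regex(@"\$\.import\s*\(\s*""([^""]+)""\s*\)");

    // Rewrites every import call in one pass; the evaluator runs once per match.
    public static string Rewrite(string source)
    {
        return ImportCall.Replace(source, m => "load(\"" + m.Groups[1].Value + "\")");
    }
}
```

The evaluator receives the full Match, so the capture groups of the real pattern would be available in the same way.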

How to build a command parser with regex

I'm trying to implement a command parser to parse command parameters to a key value pair list.
For example, there is a command to output images: [name]_w[width]_h[height]_t[transparency]. Given, say, "image01_w64_h128_t90", the program would output the image "image01" with the specified size and transparency. So far I'm using regex to solve it.
Code:
private static readonly Regex CommandReg = new Regex(
@"^(?<name>[\d\w]+?)(_W(?<width>\d+))?(_H(?<height>\d+))?(_T(?<transparency>\d+))?$"
, RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.ExplicitCapture);
public static NameValueCollection ParseCommand(string command)
{
var match = CommandReg.Match(command);
if (!match.Success) return null;
var groups = match.Groups;
var paramList = new NameValueCollection(4);
paramList["name"] = groups["name"].Value;
paramList["width"] = groups["width"].Value;
paramList["height"] = groups["height"].Value;
paramList["transparency"] = groups["transparency"].Value;
return paramList;
}
This works and the code is very simple. However, a further requirement is that if the order of the parameters changes, say "image01_h128_w64_t90" or "image01_t90_w64_h128", the program should still produce the expected result.
Is it possible to solve the problem using regex?
If regex can't help, are there any other suggestions?
Thanks for any suggestions, edits, and views.
Just call command.Split('_'), then iterate through the resulting array (arr below) to find everything you need.
if (arr[i].StartsWith("w", StringComparison.OrdinalIgnoreCase))
{
paramList["width"] = arr[i].Remove(0, 1);
}
and so on.
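A regex can also handle arbitrary parameter order in .NET, because a repeated group keeps every capture it made. The sketch below is one possible approach, not the only one: it reuses the question's ParseCommand name and key names, but the key and value group names are introduced here for illustration. If a parameter appears twice, the last occurrence wins.

```csharp
using System.Collections.Specialized;
using System.Text.RegularExpressions;

public static class CommandParser
{
    // A lazy name, then any number of _w/_h/_t parameters in any order.
    private static readonly Regex CommandReg = new Regex(
        @"^(?<name>\w+?)(?:_(?<key>[wht])(?<value>\d+))*$",
        RegexOptions.IgnoreCase);

    public static NameValueCollection ParseCommand(string command)
    {
        var match = CommandReg.Match(command);
        if (!match.Success) return null;

        // Missing parameters stay as empty strings, as in the original code.
        var paramList = new NameValueCollection(4);
        paramList["name"] = match.Groups["name"].Value;
        paramList["width"] = "";
        paramList["height"] = "";
        paramList["transparency"] = "";

        // Each repetition of the group adds one capture to "key" and "value".
        var keys = match.Groups["key"].Captures;
        var values = match.Groups["value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            string key = keys[i].Value.ToLowerInvariant();
            string name = key == "w" ? "width" : key == "h" ? "height" : "transparency";
            paramList[name] = values[i].Value;
        }
        return paramList;
    }
}
```

As with the original pattern, a name that itself ends in something like _w64 is read as a parameter rather than as part of the name.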

C# using regex to replace value only after = sign

ok I have a text file that contains:
books_book1 = 1
books_book2 = 2
books_book3 = 3
I would like to retain "books_book1 = "
so far I have:
string text = File.ReadAllText("settings.txt");
text = Regex.Replace(text, ".*books_book1*.", "books_book1 = a",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book2*.", "books_book2 = b",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book3*.", "books_book3 = c",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
this results in:
books_book1 = a=1
output to file should be:
books_book1 = a
books_book2 = b
books_book3 = c
Thanks much in advance...
In a comment I stated:
"I would personally just go for recreating the file if it is that simple. Presumably you load all the values from the file into an object of some kind initially, so just use that to recreate the file with the new values. Much easier than messing with regular expressions: it's simpler, easier to test and see what is going on, and easier to change if you ever need to."
I think having looked at this again it is even more true.
From what you said in comments: "when the program loads it reads the values from this text file, then the user has an option to change the values and save it back to the file". Presumably this means that you need to know which of the books_book1, books_book2, etc. lines you are replacing, so you know which of the user-supplied values to put in. This is fine (though a little unwieldy) with three items, but if you increase that number then you'll need to update your code for every new item. This is never a good thing and will quickly produce ugly, bug-prone code.
If you have your new settings in some kind of data structure (eg a dictionary) then as I say recreating the file from scratch is probably easiest. See for example this small fully contained code snippet:
//Set up our sample Dictionary
Dictionary<string, string> settings = new Dictionary<string,string>();
settings.Add("books_book1","a");
settings.Add("books_book2","b");
settings.Add("books_book3","c");
//Write the values to file via an intermediate stringbuilder.
StringBuilder sb = new StringBuilder();
foreach (var item in settings)
{
sb.AppendLine(String.Format("{0} = {1}", item.Key, item.Value));
}
File.WriteAllText("settings.txt", sb.ToString());
This has obvious advantages of being simpler and that if you add more settings then they will just go into the dictionary and you don't need to change the code.
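For completeness, loading the file into such a dictionary in the first place could be sketched as follows. This assumes every setting is on its own line in key = value form, as in the question's sample; the SettingsFile class and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SettingsFile
{
    // Parses "key = value" lines; lines without '=' are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var settings = new Dictionary<string, string>();
        foreach (var line in lines)
        {
            int eq = line.IndexOf('=');
            if (eq < 0) continue;

            string key = line.Substring(0, eq).Trim();
            string value = line.Substring(eq + 1).Trim();
            settings[key] = value; // a repeated key keeps its last value
        }
        return settings;
    }

    // Convenience wrapper that reads the lines from disk.
    public static Dictionary<string, string> Load(string path)
    {
        return Parse(File.ReadLines(path));
    }
}
```

The user's changes then become plain dictionary assignments before the file is written back out.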
I don't think this is the best way to solve the problem, but to make the RegEx do what you want you can do the following:
var findFilter = @"(.*books_book1\s*=\s)(.+)";
var replaceFilter = "${1}a";
text = Regex.Replace(text, findFilter, replaceFilter, RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
....
The code between the first ( and ) in the regex is the first capturing group, and ${1} in the replacement re-inserts the text that group matched, which produces the output you want. The second group, (.+), matches the old value, which is dropped. Also, you'll notice I used \s for whitespace right after the key, so you don't match books_book111, for example. I'm sure there are other edge cases you'll need to deal with.
books_book1 = a
...
Here's the start to a more generic approach:
This regular expression captures the trailing number, taking care to account for variability in the number of digits and the amount of whitespace.
text = Regex.Replace(text, @"(books_book\d+\s*=\s*)(\d+)", DoReplace);
// ...
string DoReplace(Match m)
{
return m.Groups[1].Value + Convert.ToChar(int.Parse(m.Groups[2].Value) + 96);
}
How about something like this (no error checking):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace TestRegex
{
class Program
{
static void Main( string[] args )
{
var path = @"settings.txt";
var pattern = @"(^\s*books_book\d+\s*=\s*)(\d+)(\s*)$";
var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
var contents = Regex.Replace( File.ReadAllText( path ), pattern, MyMatchEvaluator, options );
File.WriteAllText( path, contents );
}
static int x = char.ConvertToUtf32( "a", 0 );
static string MyMatchEvaluator( Match m )
{
var x1 = m.Groups[ 1 ].Value;
var x2 = char.ConvertFromUtf32( x++ );
var x3 = m.Groups[ 3 ].Value;
var result = x1 + x2 + x3;
return result;
}
}
}

How to test a directory path that matches a specific pattern/regex?

I have the following test which aims to ensure that file path generated is of a specific format. Using Nunit's fluent interfaces, how can I go about this?
I am having trouble with the regex.
[Test]
public void AssetControlPath_ShouldHaveFormat_BaseDir_YYYY_MMM_YYYYMMMDD()
{
//Arrange
var baseDir = "C:\\BaseDir";
var fpBuilder = new FilePathBuilder(new DateTime(2010,10,10), baseDir );
//Act
var destinationPath = fpBuilder.AssetControlPath();
//Assert
// destinationPath = C:\BaseDir\2010\Oct\20101010
Assert.That(destinationPath, Is.StringMatching(@"C:\\BaseDir\\d{4}\\[a-zA-Z]{3}\\d{8}"));
}
The unit test error
XXX.FilepathBuilderTests.AssetControlPath_ShouldHaveFormat_BaseDir_YYYY_MMM_YYYYMMMDD:
Expected: String matching "C:\\BaseDir\\d{4}\\[a-zA-Z]{3}\\d{8}"
But was: "C:\BaseDir\2010\Oct\20101010"
Edit:
I have actually switched the test to use @ChrisF's approach. The question, however, still stands.
Using String.Split with \ as the split character, then checking that you get the right number of elements (5) and that each element has the expected value, might be:
a) clearer about the intent of the test and
b) easier to maintain.
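The split-based check could be sketched like this. It assumes the C:\BaseDir\yyyy\MMM\yyyyMMdd layout from the test, with English month abbreviations; the AssetPathCheck class and HasExpectedFormat method are hypothetical names, not part of NUnit or of the code under test.

```csharp
using System;
using System.Globalization;

public static class AssetPathCheck
{
    // Splits the path on '\' and compares each element with the expected value.
    public static bool HasExpectedFormat(string path, DateTime date)
    {
        string[] parts = path.Split('\\');
        var inv = CultureInfo.InvariantCulture;

        return parts.Length == 5
            && parts[0] == "C:"
            && parts[1] == "BaseDir"
            && parts[2] == date.ToString("yyyy", inv)
            && parts[3] == date.ToString("MMM", inv)
            && parts[4] == date.ToString("yyyyMMdd", inv);
    }
}
```

In the test, the result could then be asserted with Assert.That(AssetPathCheck.HasExpectedFormat(destinationPath, new DateTime(2010, 10, 10)), Is.True).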
@"C:\\BaseDir\\d{4}\\[a-zA-Z]{3}]\\d{8}"
// /\ extra bracket
Also, you have a problem with \ escaping. In a verbatim string, \\d{4} means a literal backslash followed by four d characters, so you need \\\d{4} and \\\d{8} (an escaped backslash, then \d) to match xxx\20101010. The following fix matches correctly:
var str = @"C:\BaseDir\2010\Oct\20101010";
var re = new Regex(@"C:\\BaseDir\\\d{4}\\[a-zA-Z]{3}\\\d{8}", RegexOptions.IgnoreCase);
var result = re.IsMatch(str); // true
You have an extra closing square bracket! Remove that and it should be fine.
