How to build a command parser with regex

I'm trying to implement a command parser that parses command parameters into a list of key/value pairs.
For example, there is a command to output images: [name]_w[width]_h[height]_t[transparency]. Given "image01_w64_h128_t90", the program outputs the image "image01" with the specified size and transparency. So far I'm using a regex to solve it.
Code:
private static readonly Regex CommandReg = new Regex(
    @"^(?<name>[\d\w]+?)(_W(?<width>\d+))?(_H(?<height>\d+))?(_T(?<transparency>\d+))?$",
    RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.ExplicitCapture);
public static NameValueCollection ParseCommand(string command)
{
    var match = CommandReg.Match(command);
    if (!match.Success) return null;
    var groups = match.Groups;
    var paramList = new NameValueCollection(4);
    paramList["name"] = groups["name"].Value;
    paramList["width"] = groups["width"].Value;
    paramList["height"] = groups["height"].Value;
    paramList["transparency"] = groups["transparency"].Value;
    return paramList;
}
This works and the code is very simple. However, there is a further requirement: if the order of the parameters changes, say "image01_h128_w64_t90" or "image01_t90_w64_h128", the program should still produce the expected result.
Is it possible to solve the problem using regex?
If regex can't do it, are there any other suggestions?
Thanks for any suggestions, edits, and views.

Just call string.Split('_'), then iterate through the array to find everything you need:
if (arr[i].StartsWith("w", StringComparison.OrdinalIgnoreCase))
{
    paramList["width"] = arr[i].Remove(0, 1);
}
and so on.
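Both routes can be sketched in full. The following is a minimal sketch, not the original poster's code: the class and method names (CommandParser, ParseWithSplit, ParseWithRegex) are made up for illustration, it returns a Dictionary instead of a NameValueCollection, and it assumes the image name itself never contains an underscore.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommandParser
{
    // Order-independent pattern: after the name, any number of _w/_h/_t
    // segments may follow, in any order. Assumes the name has no underscore.
    private static readonly Regex CommandReg = new Regex(
        @"^(?<name>[A-Za-z0-9]+)(_(w(?<width>\d+)|h(?<height>\d+)|t(?<transparency>\d+)))*$",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Regex route: returns null when the command does not match.
    public static Dictionary<string, string> ParseWithRegex(string command)
    {
        Match match = CommandReg.Match(command);
        if (!match.Success) return null;

        var result = new Dictionary<string, string>();
        foreach (string key in new[] { "name", "width", "height", "transparency" })
        {
            Group group = match.Groups[key];
            if (group.Success) result[key] = group.Value; // skip parameters that were omitted
        }
        return result;
    }

    // Split route: the first token is the name, every other token is
    // a one-letter prefix followed by digits.
    public static Dictionary<string, string> ParseWithSplit(string command)
    {
        string[] parts = command.Split('_');
        if (parts[0].Length == 0) return null;

        var result = new Dictionary<string, string> { ["name"] = parts[0] };
        foreach (string part in parts.Skip(1))
        {
            if (part.Length < 2) return null;

            string key;
            switch (char.ToLowerInvariant(part[0]))
            {
                case 'w': key = "width"; break;
                case 'h': key = "height"; break;
                case 't': key = "transparency"; break;
                default: return null; // unknown parameter prefix
            }

            string value = part.Substring(1);
            if (!value.All(char.IsDigit)) return null;
            result[key] = value;
        }
        return result;
    }
}
```

With either method, "image01_t90_w64_h128" gives name "image01", width "64", height "128" and transparency "90". If a parameter appears twice, both versions keep the last occurrence.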

Related

C# Regex will not match

I'm attempting to write code that picks up all of a month's log files and zips them up. I can't seem to get the regex pattern to work in my code. Below is a sandbox console app I'm using to test with:
public static void Main(string[] args)
{
    var targetDate = DateTime.Now.AddMonths(-1);
    var pattern = $@"c:\\logs\\client-{targetDate.Year}-{targetDate.Month:d2}-.*.log";
    Regex regex = new Regex(Regex.Escape(pattern), RegexOptions.IgnoreCase);
    var files = Directory.EnumerateFiles(@"c:\logs").Where(f => regex.IsMatch(f)).ToList();
    foreach (var file in files)
    {
        Console.WriteLine(file);
    }
}
The Enumerated files look like the following:
c:\logs\client-2021-03-01.log
c:\logs\client-2021-03-02.log
c:\logs\client-2021-03-03.log
c:\logs\client-2021-03-04.log
c:\logs\client-2021-03-05.log
c:\logs\client-2021-03-06.log
c:\logs\client-2021-03-07.log
c:\logs\client-2021-03-08.log
c:\logs\client-2021-03-09.log
c:\logs\client-2021-03-10.log
c:\logs\client-2021-03-11.log
c:\logs\client-2021-03-12.log
c:\logs\client-2021-03-13.log
c:\logs\client-2021-03-14.log
c:\logs\client-2021-03-15.log
c:\logs\client-2021-03-16.log
c:\logs\client-2021-03-17.log
c:\logs\client-2021-03-18.log
c:\logs\client-2021-03-19.log
c:\logs\client-2021-03-20.log
c:\logs\client-2021-03-21.log
c:\logs\client-2021-03-22.log
c:\logs\client-2021-03-23.log
c:\logs\client-2021-03-24.log
c:\logs\client-2021-03-25.log
c:\logs\client-2021-03-26.log
c:\logs\client-2021-03-27.log
c:\logs\client-2021-03-28.log
c:\logs\client-2021-03-29.log
c:\logs\client-2021-03-30.log
c:\logs\client-2021-03-31.log
c:\logs\client-2021-04-01.log
c:\logs\client-2021-04-02.log
c:\logs\client-2021-04-03.log
c:\logs\client-2021-04-05.log
c:\logs\client-2021-04-06.log
c:\logs\client-2021-04-07.log
c:\logs\client-2021-04-08.log
c:\logs\client-2021-04-09.log
c:\logs\client-2021-04-10.log
c:\logs\client-2021-04-12.log
c:\logs\client-2021-04-13.log
c:\logs\client-2021-04-14.log
c:\logs\client-2021-04-15.log
c:\logs\client-2021-04-16.log
c:\logs\client-2021-04-17.log
c:\logs\client-2021-04-18.log
c:\logs\client-2021-04-19.log
c:\logs\client-2021-04-20.log
c:\logs\client-2021-04-21.log
c:\logs\client-2021-04-22.log
c:\logs\client-2021-04-23.log
c:\logs\client-2021-04-24.log
c:\logs\client-2021-04-25.log
c:\logs\client-2021-04-26.log
c:\logs\client-2021-04-27.log
I've checked the RegEx pattern against a couple of testers, including one for .NET and it passes so I'm not sure where the discrepancy is. Any help would be greatly appreciated.

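The way you escape your pattern is what breaks the regex: Regex.Escape escapes the whole string, so the backslashes get escaped a second time and .* turns into a literal dot and asterisk. Pass the pattern to the constructor without Regex.Escape. Here's an example of how you can do it:
var source = @"c:\logs\client-2021-03-01.log";
var pattern = $@"c:\\logs\\client-2021-03-.*.log";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(source); // match.Success is now true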
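If you would rather not hand-escape the directory, one option is to escape only the literal prefix and append the wildcard part afterwards. This is a minimal sketch under that assumption; LogPatterns and ForMonth are illustrative names, and the file naming scheme (client-yyyy-MM-dd.log) is taken from the listing in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogPatterns
{
    // Builds a regex that matches <directory>\client-<year>-<month>-*.log.
    // Only the literal prefix goes through Regex.Escape; the wildcard does not.
    public static Regex ForMonth(string directory, DateTime targetDate)
    {
        string prefix = $@"{directory}\client-{targetDate.Year}-{targetDate.Month:d2}-";
        return new Regex("^" + Regex.Escape(prefix) + @".*\.log$", RegexOptions.IgnoreCase);
    }
}
```

For example, LogPatterns.ForMonth(@"c:\logs", new DateTime(2021, 3, 1)) matches c:\logs\client-2021-03-01.log but not c:\logs\client-2021-04-01.log.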

Parse Line and Break it into Variables

I have a text file that contains only the full version number of an application. I need to extract it and then parse it into separate variables.
For example, let's say version.cs contains 19.1.354.6
The code I'm using does not seem to be working:
char[] delimiter = { '.' };
string currentVersion = System.IO.File.ReadAllText(@"C:\Applicaion\version.cs");
string[] partsVersion;
partsVersion = currentVersion.Split(delimiter);
string majorVersion = partsVersion[0];
string minorVersion = partsVersion[1];
string buildVersion = partsVersion[2];
string revisVersion = partsVersion[3];

Although your problem is most likely with the file (it probably contains text other than a version), why not use the Version class, which exists for exactly this kind of task?
var version = new Version("19.1.354.6");
var major = version.Major; // etc..
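Since the file contents are the likely culprit, a defensive variant could trim the text and use Version.TryParse so that unexpected content is reported instead of throwing. This is a minimal sketch; VersionReader and ParseVersionText are hypothetical names, and reading the file is left to the caller.

```csharp
using System;

public static class VersionReader
{
    // Returns the parsed version, or null when the text is not a valid
    // version string (for example, when the file contains other text).
    public static Version ParseVersionText(string fileContents)
    {
        // Trim removes the trailing newline that text files usually end with.
        return Version.TryParse(fileContents.Trim(), out Version version) ? version : null;
    }
}
```

You would call it as VersionReader.ParseVersionText(File.ReadAllText(path)) and then read Major, Minor, Build and Revision from the result after checking it for null.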

What you have works fine with the correct input, so I would suggest making sure there is nothing else in the file you're reading.
In the future, please provide error information, since we can't usually tell exactly what you expect to happen, only what we know should happen.
In light of that, I would also suggest looking into using Regex for parsing in the future. In my opinion, it provides a much more flexible solution for your needs. Here's an example of regex to use:
var regex = new Regex(@"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)");
var match = regex.Match("19.1.354.6");
if (match.Success)
{
    Console.WriteLine("Match[1]: " + match.Groups[1].Value);
    Console.WriteLine("Match[2]: " + match.Groups[2].Value);
    Console.WriteLine("Match[3]: " + match.Groups[3].Value);
    Console.WriteLine("Match[4]: " + match.Groups[4].Value);
}
else
{
    Console.WriteLine("No match found");
}
which outputs the following:
// Match[1]: 19
// Match[2]: 1
// Match[3]: 354
// Match[4]: 6

C# regex not working properly

I've been trying to write a regex that I know finds 6 matches, since I checked it with several regex engines. The problem is with Match -> NextMatch, or its smarter equivalent:
Match m= regex.Match(data,nextRelevantIndex);
When I use the methods above, I get 3 results out of 6.
However, when I use
MatchCollection mc = r.Matches(data);
foreach (Match m in mc)
{
// …
}
it iterates 6 times.
Unfortunately I cannot use this version, since I'm changing the data I run on, and that would be much more difficult for me than using
regex.Match(data,nextRelevantIndex);
Is this a known problem in C#? What is the best solution for it?
The regex is:
((?:var\s+)?[\w_]+\s*=)?\s*\$\.import\s*\((?:[""'']([^''"";)]+)[""''])(?:\s*,\s*(?:[""'']([^''"";)]+)[""'']))?\s*\)(\.[^;]+;)?
The string is:
//from project
$.import("sap.hana.ide.catalog.plugin.performance.server.lib", "helpers");
var h = $.sap.hana.ide.catalog.plugin.performance.server.lib.helpers;
//basic example
$.import("a.b","math"); //var otherHashVar= new otherHash();
$.a.b.math.max(1); //otherHashVar.max(1);
alert($.a.b.math.a);//alert(otherHashVar.a);
//a bit more complex
var z=$.import("a.b.c","x"); // var z=new otherHash(); -> no additional fixes to be done
z.foo();
//another variation
$.import ("a.b","myLib").x(); // similar to first
//var otherHashVar=new OtherHash();
//otherHashVar.x();
var z=$.import("a\b\c.xsjs");
z=$.import("a\b\c.xsjs").a.b.c++;
and the code is:
while(m.Success){
m = r.Match(data, m.Index + m.Length);
}
since I'm not currently modifying the data (I will once I manage to get all 6 matches).

The problem is elsewhere in your program.
The following writes 6 matches to console:
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
String data="//from project\r\n$.import(\"sap.hana.ide.catalog.plugin.performance.server.lib\", \"helpers\");\r\nvar h = $.sap.hana.ide.catalog.plugin.performance.server.lib.helpers;\r\n//basic example\r\n$.import(\"a.b\",\"math\"); //var otherHashVar= new otherHash();\r\n$.a.b.math.max(1); //otherHashVar.max(1);\r\n\ralert($.a.b.math.a);//alert(otherHashVar.a);\r\n\r\n//a bit more complex\rv\n\r z=$.import(\"a.b.c\",\"x\"); // var z=new otherHash(); -> no additional fixes to be done\rz\n.foo(); \r\n\r//another variation\r$.import (\"a.b\",\"myLib\").x(); // similar to first \r\n//var otherHashVar=new OtherHash();\r\n//otherHashVar.x();\r\n\r\nvar z=$.import(\"a\\b\\c.xsjs\"); \r\n\r\nz=$.import(\"a\\b\\c.xsjs\").a.b.c++;"
;
//System.Console.WriteLine(data);
String expr="((?:var\\s+)?[\\w_]+\\s*=)?\\s*\\$\\.import\\s*\\((?:[\"\"'']([^''\"\";)]+)[\"\"''])(?:\\s*,\\s*(?:[\"\"'']([^''\"\";)]+)[\"\"'']))?\\s*\\)(\\.[^;]+;)?";
Regex r=new Regex(expr);
Match m=r.Match(data);
while(m.Success){
System.Console.WriteLine("Match found ");
System.Console.WriteLine(m.Value);
System.Console.WriteLine();
m = r.Match(data, m.Index + m.Length);
}
}
}
Also you state that you can't use foreach match in matchcollection because you are modifying your data. What modification are you doing, and have you considered using Regex.Replace?
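To illustrate the Regex.Replace suggestion: with a MatchEvaluator, the engine walks the matches for you and you never have to track indexes into a string that is changing underneath you. The sketch below is an assumption-heavy illustration, not your actual transformation: the pattern is a simplified version of yours (double quotes only, two arguments), and the replacement (turning the call into a dotted path) is a made-up rewrite. ImportRewriter and Rewrite are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImportRewriter
{
    // Simplified, hypothetical pattern: $.import("package", "name"), double quotes only.
    private static readonly Regex ImportCall = new Regex(
        @"\$\.import\s*\(\s*""([^""]+)""\s*,\s*""([^""]+)""\s*\)");

    // Rewrites every matching call in one pass. The evaluator's return value
    // is inserted literally, so the "$" in it is not treated as a substitution.
    public static string Rewrite(string source)
    {
        return ImportCall.Replace(
            source,
            m => "$." + m.Groups[1].Value + "." + m.Groups[2].Value);
    }
}
```

Because the replacement happens inside Regex.Replace, a change in the length of the text never affects where the next match starts.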

C# using regex to replace value only after = sign

ok I have a text file that contains:
books_book1 = 1
books_book2 = 2
books_book3 = 3
I would like to retain "books_book1 = "
so far I have:
string text = File.ReadAllText("settings.txt");
text = Regex.Replace(text, ".*books_book1*.", "books_book1 = a",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book2*.", "books_book2 = b",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
text = Regex.Replace(text, ".*books_book3*.", "books_book3 = c",
RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
this results in:
books_book1 = a=1
output to file should be:
books_book1 = a
books_book2 = b
books_book3 = c
Thanks much in advance...

In a comment I stated:
"I would personally just go for recreating the file if it is that simple. Presumably you load all the values from the file into an object of some kind initially, so just use that to recreate the file with the new values. That is much easier than messing with regular expressions: it's simpler, easier to test and see what is going on, and easier to change if you ever need to."
Having looked at this again, I think that is even more true.
From what you said in comments: "when the program loads it reads the values from this text file, then the user has an option to change the values and save it back to the file". Presumably this means that you need to know which of the books_book1, books_book2, etc. lines you are replacing, so you know which of the user-supplied values to put in. This is fine (though a little unwieldy) with three items, but if you increase that number then you'll need to update your code for every new item. That is never a good thing and will quickly produce ugly, bug-prone code.
If you have your new settings in some kind of data structure (eg a dictionary) then as I say recreating the file from scratch is probably easiest. See for example this small fully contained code snippet:
//Set up our sample Dictionary
Dictionary<string, string> settings = new Dictionary<string,string>();
settings.Add("books_book1","a");
settings.Add("books_book2","b");
settings.Add("books_book3","c");
//Write the values to file via an intermediate stringbuilder.
StringBuilder sb = new StringBuilder();
foreach (var item in settings)
{
sb.AppendLine(String.Format("{0} = {1}", item.Key, item.Value));
}
File.WriteAllText("settings.txt", sb.ToString());
This has obvious advantages of being simpler and that if you add more settings then they will just go into the dictionary and you don't need to change the code.
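If the file may also contain lines you want to leave untouched, a variation on the same idea is to walk the existing lines and swap in the value for every key found in the dictionary. This is a minimal sketch under that assumption; SettingsText and Apply are illustrative names, it works on the file text rather than the file itself, and it assumes one "key = value" pair per line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SettingsText
{
    // Returns the settings text with the value of every key found in
    // "updates" replaced; all other lines are copied unchanged.
    public static string Apply(string text, IDictionary<string, string> updates)
    {
        var output = new List<string>();
        foreach (string line in text.Split('\n').Select(l => l.TrimEnd('\r')))
        {
            int eq = line.IndexOf('=');
            string key = eq < 0 ? null : line.Substring(0, eq).Trim();

            if (key != null && updates.TryGetValue(key, out string value))
                output.Add(key + " = " + value);   // rewrite the line with the new value
            else
                output.Add(line);                  // keep unknown lines as they are
        }
        return string.Join("\n", output);
    }
}
```

Usage would be along the lines of File.WriteAllText(path, SettingsText.Apply(File.ReadAllText(path), settings)). Adding more settings still requires no code change.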

I don't think this is the best way to solve the problem, but to make the regex do what you want you can do the following:
var findFilter = @"(.*books_book1\s*=\s)(.+)";
var replaceFilter = "${1}a";
text = Regex.Replace(text, findFilter, replaceFilter, RegexOptions.Multiline);
File.WriteAllText("settings.txt", text);
....
The first pair of parentheses in the regex is capturing group 1, and ${1} in the replacement inserts the text that group matched. The second group matches the old value, which is dropped. You'll also notice I used \s for whitespace around the = sign, so you don't match book111, for example. I'm sure there are other edge cases you'll need to deal with.
books_book1 = a
...

Here's the start of a more generic approach.
This regular expression captures the trailing number, allowing for a variable number of digits and variable whitespace.
text = Regex.Replace(text, @"(books_book\d+\s*=\s*)(\d+)", DoReplace);
// ...
string DoReplace(Match m)
{
    // Map 1 -> 'a', 2 -> 'b', ... (96 is one less than the character code of 'a')
    return m.Groups[1].Value + Convert.ToChar(int.Parse(m.Groups[2].Value) + 96);
}

How about something like this (no error checking):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace TestRegex
{
    class Program
    {
        static void Main( string[] args )
        {
            var path = @"settings.txt";
            var pattern = @"(^\s*books_book\d+\s*=\s*)(\d+)(\s*)$";
            var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
            var contents = Regex.Replace( File.ReadAllText( path ), pattern, MyMatchEvaluator, options );
            File.WriteAllText( path, contents );
        }
        static int x = char.ConvertToUtf32( "a", 0 );
        static string MyMatchEvaluator( Match m )
        {
            var x1 = m.Groups[ 1 ].Value;
            var x2 = char.ConvertFromUtf32( x++ );
            var x3 = m.Groups[ 3 ].Value;
            var result = x1 + x2 + x3;
            return result;
        }
    }
}

What is the best way to match a set of regular expressions in HashSet to a string in ASP.NET using C#?

I was wondering if I'm doing the following ASP.NET C# regexp match in the most efficient way?
I have a set of regular expressions in a HashSet that I need to match to an input string, so I do:
HashSet<string> hashMatchTo = new HashSet<string>();
hashMatchTo.Add(@"regexp 1");
hashMatchTo.Add(@"regexp 2");
hashMatchTo.Add(@"regexp 3");
hashMatchTo.Add(@"regexp 4");
hashMatchTo.Add(@"regexp 5");
//and so on
string strInputString = "Some string";
bool bMatched = false;
foreach (string strRegExp in hashMatchTo)
{
    Regex rx = new Regex(strRegExp, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
    if (rx.IsMatch(strInputString))
    {
        bMatched = true;
        break;
    }
}

Two things jump out at me. The first is that you can populate a collection at the same time you create it, like so:
HashSet<string> hashMatchTo = new HashSet<string>()
{
    @"^regexp 1$",
    @"^regexp 2$",
    @"^regexp 3$",
    @"^[\w\s]+$",
    @"^regexp 5$"
    //and so on
};
The second is that you should use the static version of IsMatch(), like so:
string strInputString = "Some string";
bool bMatched = false;
foreach (string strRegExp in hashMatchTo)
{
    if (Regex.IsMatch(strInputString, strRegExp,
        RegexOptions.CultureInvariant | RegexOptions.IgnoreCase))
    {
        bMatched = true;
        break;
    }
}
Console.WriteLine(bMatched);
The reason for doing this is that the static Regex methods automatically cache whatever Regex objects they create. But be aware that the cache size is only 15 by default; if you think you'll be using more patterns than that, you'll need to increase the value of the Regex.CacheSize property.
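An alternative that sidesteps the cache size question is to build the Regex objects once and reuse them for every input. This is a minimal sketch of that idea, not part of the answer above; PatternSet and MatchesAny are illustrative names and the two patterns are placeholders.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternSet
{
    // Constructed once when the class is first used, then reused for every call.
    private static readonly Regex[] Patterns = new[] { @"^regexp 1$", @"^[\w\s]+$" }
        .Select(p => new Regex(p, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase))
        .ToArray();

    // True as soon as one pattern matches; Any stops at the first hit.
    public static bool MatchesAny(string input)
    {
        return Patterns.Any(rx => rx.IsMatch(input));
    }
}
```

PatternSet.MatchesAny("Some string") returns true here because the second placeholder pattern accepts letters and spaces.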

If your goal is a simple "does any of them match? true/false", then concatenate all of your regexes into one big regex and just run that.
string strRegExp = string.Join("|", listOfRegex.ToArray());
bool bIsMatched = Regex.IsMatch(strInputString, strRegExp, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
Console.WriteLine(bIsMatched);
No "foreach" looping
Better readability
No need to mess with the static Regex caching
While processing, it will short-circuit much like the loop version does with "break", but fewer method calls will be made, which should improve performance.
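One caveat when joining patterns with "|": it is safer to wrap each pattern in a non-capturing group, so that an alternation inside one pattern cannot merge with its neighbours. This is a minimal sketch of that refinement; CombinedPattern and Combine are illustrative names. Note that numbered capture groups from the individual patterns get renumbered in the combined regex, which does not matter for a plain yes/no check.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedPattern
{
    // Joins the patterns into one alternation, keeping each pattern
    // self-contained inside its own (?:...) group.
    public static Regex Combine(IEnumerable<string> patterns)
    {
        string combined = string.Join("|", patterns.Select(p => "(?:" + p + ")"));
        return new Regex(combined, RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
    }
}
```

Building the combined Regex once and keeping it in a field also means the patterns are parsed a single time rather than on every check.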

I don't see anything wrong. I would put readability ahead of efficiency as long as the code is fast enough and meets the business requirements.

It depends on the contents of your set; I don't know how many patterns "many" really is. But you may want to narrow the search case by case: make your program know what to search for and where, instead of running through the whole hash set to check every possible pattern.
I once used a simple regular expression to extract information from 2000 provided URLs for display in a ListView, and it degraded the performance of the whole program severely.
