C# Regex: Get sub-capture? - c#

I've got a regex...
    internal static readonly Regex _parseSelector = new Regex(@"
        (?<tag>" + _namePattern + @")?
        (?:\.(?<class>" + _namePattern + @"))*
        (?:\#(?<id>" + _namePattern + @"))*
        (?<attr>\[\s*
            (?<name>" + _namePattern + @")\s*
            (?:
                (?<op>[|*~$!^%<>]?=|[<>])\s*
                (?<quote>['""]?)
                (?<value>.*?)
                (?<!\\)\k<quote>\s*
            )?
        \])*
        (?::(?<pseudo>" + _namePattern + @"))*
        ", RegexOptions.IgnorePatternWhitespace);
For which I grab the match object...
var m = _parseSelector.Match("tag.class1.class2#id[attr1=val1][attr2=\"val2\"][attr3]:pseudo");
Now is there a way to do something akin to m.Group["attr"]["name"]? Or somehow get the groups inside the attr group?

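Group names aren't nested in .NET regular expressions; it's a flat structure. You can access the inner group directly:
m.Groups["name"]
Because the attr group can repeat, m.Groups["name"].Value only holds the last capture. Use m.Groups["name"].Captures to get every name that was matched.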
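To pair each attribute with its own name and value, one option is to use capture positions: every `Capture` has `Index` and `Length`, so a `name` or `value` capture belongs to the `attr` capture whose span contains it. This is a sketch, not the asker's code: the question does not show `_namePattern`, so `NamePattern` below is a hypothetical stand-in, and `SelectorParser`/`Attributes` are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SelectorParser
{
    // Hypothetical stand-in for the question's _namePattern (not shown there).
    private const string NamePattern = @"-?[_a-zA-Z][_a-zA-Z0-9-]*";

    private static readonly Regex Selector = new Regex(@"
        (?<tag>" + NamePattern + @")?
        (?:\.(?<class>" + NamePattern + @"))*
        (?:\#(?<id>" + NamePattern + @"))*
        (?<attr>\[\s*
            (?<name>" + NamePattern + @")\s*
            (?:
                (?<op>[|*~$!^%<>]?=|[<>])\s*
                (?<quote>['""]?)
                (?<value>.*?)
                (?<!\\)\k<quote>\s*
            )?
        \])*
        (?::(?<pseudo>" + NamePattern + @"))*
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns (name, value) for each [ ... ] block; value is null when the
    // attribute has no operator/value part (e.g. [attr3]).
    public static List<(string Name, string Value)> Attributes(string selector)
    {
        Match m = Selector.Match(selector);
        var result = new List<(string Name, string Value)>();
        foreach (Capture attr in m.Groups["attr"].Captures)
        {
            int start = attr.Index;
            int end = attr.Index + attr.Length;
            // A sub-capture belongs to this attr if it lies inside the attr's span.
            Capture name = m.Groups["name"].Captures.Cast<Capture>()
                .First(c => c.Index >= start && c.Index + c.Length <= end);
            Capture value = m.Groups["value"].Captures.Cast<Capture>()
                .FirstOrDefault(c => c.Index >= start && c.Index + c.Length <= end);
            result.Add((name.Value, value?.Value));
        }
        return result;
    }
}
```

For the sample selector this yields `attr1`/`val1`, `attr2`/`val2` and `attr3` with no value; the position check is what keeps the valueless `[attr3]` from being paired with someone else's value.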

Related

how to declare xpath as a string variable? C#

How can I declare the following xpath value as a string variable in C#?
Value: //*[contains(concat( " ", @class, " " ), concat( " ", "lit-movie", " " ))]
Inside a regular C# string literal you have to escape each double quote with a backslash, so " becomes \":
string xpath = "//*[contains(concat( \" \", @class, \" \" ), concat( \" \", \"lit-movie\", \" \" ))]";
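As an alternative, a verbatim string (`@"..."`) avoids backslashes: inside it, a double quote is written as two double quotes. The class and constant names below are illustrative; the sketch simply shows that both spellings produce the same XPath text.

```csharp
public static class XPathStrings
{
    // Regular literal: each embedded double quote is escaped as \".
    public const string Escaped =
        "//*[contains(concat( \" \", @class, \" \" ), concat( \" \", \"lit-movie\", \" \" ))]";

    // Verbatim literal: each embedded double quote is doubled ("").
    public const string Verbatim =
        @"//*[contains(concat( "" "", @class, "" "" ), concat( "" "", ""lit-movie"", "" "" ))]";
}
```

XPath 1.0 also accepts single-quoted string literals, so writing the expression with `' '` and `'lit-movie'` would avoid escaping altogether.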

Insert space when needed with Regex in c#

I have to write a function that looks for a substring in a string, checks whether it is preceded/followed by a blank space, and adds one if not. Here is my attempt:
    public string AddSpaceIfNeeded(string originalValue, string targetValue)
    {
        if (originalValue.Contains(targetValue))
        {
            if (!originalValue.StartsWith(targetValue))
            {
                int targetValueIndex = originalValue.IndexOf(targetValue);
                if (!char.IsWhiteSpace(originalValue[targetValueIndex - 1]))
                    originalValue.Insert(targetValueIndex - 1, " ");
            }
            if (!originalValue.EndsWith(targetValue))
            {
                int targetValueIndex = originalValue.IndexOf(targetValue);
                if (!char.IsWhiteSpace(originalValue[targetValueIndex + targetValue.Length + 1]) && !originalValue[targetValueIndex + targetValue.Length + 1].Equals("(s)"))
                    originalValue.Insert(targetValueIndex + targetValue.Length + 1, " ");
            }
        }
        return originalValue;
    }
I want to try it with Regex. I tried this for adding a space after the targetValue:
Regex spaceRegex = new Regex("(" + targetValue + ")(?!,)(?!!)(?!(s))(?= )");
originalValue = spaceRegex.Replace(originalValue, (Match m) => m.ToString() + " ");
But it's not working, and I don't really know how to add the space before the word.
Example adding a space after:
AddSpaceIfNeeded("Hello my nameis ElBarto", "name")
=> Output: Hello my name is ElBarto
Example adding a space before:
AddSpaceIfNeeded("Hello myname is ElBarto", "name")
=> Output: Hello my name is ElBarto
You may match your word in all three contexts, capture the first two in separate groups, and then check which one matched in the match evaluator:
    public static string AddSpaceIfNeeded(string originalValue, string targetValue)
    {
        return Regex.Replace(originalValue,
            $@"(?<=\S)({targetValue})(?=\S)|(?<=\S)({targetValue})(?!\S)|(?<!\S){targetValue}(?=\S)", m =>
                m.Groups[1].Success ? $" {targetValue} " :
                m.Groups[2].Success ? $" {targetValue}" :
                $"{targetValue} ");
    }
See the C# demo
Note you may need to use Regex.Escape(targetValue) to escape any special characters in the string used as a dynamic pattern.
Pattern details
(?<=\S)({targetValue})(?=\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and followed with a non-whitespace ((?=\S))
| - or
(?<=\S)({targetValue})(?!\S) - a targetValue that is preceded with a non-whitespace ((?<=\S)) and not followed with a non-whitespace ((?!\S))
| - or
(?<!\S){targetValue}(?=\S) - a targetValue that is not preceded by a non-whitespace ((?<!\S)) and is followed by a non-whitespace ((?=\S))
When m.Groups[1].Success is true, the whole value should be enclosed with spaces. When m.Groups[2].Success is true, we need to add a space before the value. Else, we add a space after the value.
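A self-contained sketch of the same approach with the `Regex.Escape` advice applied, so a target such as `c++` is matched literally. The class name `Spacing` is illustrative; the pattern and evaluator logic are the answer's.

```csharp
using System.Text.RegularExpressions;

public static class Spacing
{
    public static string AddSpaceIfNeeded(string originalValue, string targetValue)
    {
        // Escape the target so characters like + or . are matched literally.
        string t = Regex.Escape(targetValue);
        return Regex.Replace(originalValue,
            $@"(?<=\S)({t})(?=\S)|(?<=\S)({t})(?!\S)|(?<!\S){t}(?=\S)",
            m => m.Groups[1].Success ? $" {targetValue} "   // glued on both sides
               : m.Groups[2].Success ? $" {targetValue}"    // glued on the left only
               : $"{targetValue} ");                         // glued on the right only
    }
}
```

A target that is already surrounded by whitespace matches none of the three alternatives, so the string is returned unchanged.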

Creating SQL Statements from a Text File - C#

Good morning,
I'm trying to create a program that writes statements to a .sql file, but I'm having some trouble.
This is my code so far:
    string[] filas = File.ReadAllLines("c:\\temp\\Statements.txt");
    StreamWriter sw = new StreamWriter("c:\\temp\\Statements.sql");
    foreach (string fila in filas)
    {
        string sql = "INSERT ";
        string[] campos = fila.Split(' ');
        if (campos[0] == "1A")
        {
            sql += " INTO TABLE1 (field1) VALUES (" + campos[1] + ");";
        }
        else
        {
            sql += " INTO TABLE2 (field1,field2,field3) VALUES (" + campos[1] + "," + campos[2] + "," + campos[3] + ");";
        }
        sw.WriteLine(sql);
    }
    sw.Close();
The thing is: I need to read a .txt document (its length will vary) and transform it into a .sql document with all the statements. There are only two types of lines, starting with "1A" or "2B", for example:
1A123456 456,67
2B123456 mr awesome great strt germany
1A123456 456,67
2B123456 mr awesome great strt germany
2B123456 mr awesome great strt germany
1A123456 456,67
1A123456 456,67
Then I'm trying to "transform" that information into INSERTs:
INSERT INTO TABLE1 (REF,MONEY) VALUES (A123456,456,67);
INSERT INTO TABLE2 (REF,NAME,ADR) VALUES (B123456,mr awesome,great strt);
INSERT INTO TABLE1 (REF,MONEY) VALUES (A123456,456,67);
INSERT INTO TABLE2 (REF,NAME,ADR) VALUES (B123456,mr awesome,great strt);
INSERT INTO TABLE2 (REF,NAME,ADR) VALUES (B123456,mr awesome,great strt);
INSERT INTO TABLE1 (REF,MONEY) VALUES (A123456,456,67);
INSERT INTO TABLE1 (REF,MONEY) VALUES (A123456,456,67);
My code is not working so well... I hope someone can help me a little :).
Regards.
Firstly, there is no space between 1A and 123456, so if (campos[0] == "1A") will never be true. Use Contains for this check, if (campos[0].Contains("1A")), or alternatively StartsWith.
Secondly, you need to split 1A123456 to get A123456; you can use Substring or a similar method for that (same for 2B).
Thirdly, you are splitting the string on ' ', which can produce more strings than you expect. In 2B123456 mr awesome great strt germany, the words mr, awesome, great and strt all end up as separate elements. For the 2B case you need logic to concatenate campos[1] & campos[2] and campos[3] & campos[4].
Fourthly, for the 1A case you need to split campos[1] using , as the delimiter to get the two values you want.
Hope this gives you enough guidance to solve your issue.
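A minimal sketch applying the first three points to the sample lines. It assumes the layout shown in the question: the record type is the first two characters, NAME and ADR are always exactly two words each, and the trailing country is ignored as in the desired output. `SqlScript`, `ToInsert`, `Quote` and `ConvertFile` are hypothetical names. Unlike the desired output, it wraps every value in single quotes so that `456,67` stays one value for the single MONEY column; for a real database, parameterized commands would be the safer route.

```csharp
using System;
using System.IO;

public static class SqlScript
{
    // Wraps a value in single quotes, doubling embedded quotes (basic SQL literal escaping).
    private static string Quote(string s) => "'" + s.Replace("'", "''") + "'";

    public static string ToInsert(string line)
    {
        string[] campos = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string reference = campos[0].Substring(1); // "1A123456" -> "A123456"

        if (campos[0].StartsWith("1A", StringComparison.Ordinal))
        {
            return $"INSERT INTO TABLE1 (REF,MONEY) VALUES ({Quote(reference)},{Quote(campos[1])});";
        }
        if (campos[0].StartsWith("2B", StringComparison.Ordinal))
        {
            // Assumption: NAME and ADR are two words each; anything after them is dropped.
            string name = campos[1] + " " + campos[2];
            string adr = campos[3] + " " + campos[4];
            return $"INSERT INTO TABLE2 (REF,NAME,ADR) VALUES ({Quote(reference)},{Quote(name)},{Quote(adr)});";
        }
        throw new FormatException("Unknown record type: " + line);
    }

    public static void ConvertFile(string inputPath, string outputPath)
    {
        using (var sw = new StreamWriter(outputPath))
        {
            foreach (string fila in File.ReadLines(inputPath))
            {
                if (string.IsNullOrWhiteSpace(fila)) continue; // skip blank lines
                sw.WriteLine(ToInsert(fila));
            }
        }
    }
}
```

Usage with the paths from the question: `SqlScript.ConvertFile("c:\\temp\\Statements.txt", "c:\\temp\\Statements.sql");`. The `using` block also guarantees the writer is flushed and closed even if a line fails to parse.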
After some research, and with help from anil and Pikoh, I found a good solution:
    string mydate = DateTime.Now.ToString("yyyyMMdd");
    string AÑO = DateTime.Now.ToString("yyyy");
    string MES = DateTime.Now.ToString("MM");
    string DIA = DateTime.Now.ToString("dd");
    string sql = "INSERT ";
    string[] campos = fila.Split(' ');
    if (campos[0].StartsWith("1H"))
    {
        sql += "INTO TABLE (VALUES,VALUES,VALUES) VALUES (" + "'" + mydate + "'" + "," + "'" + campos[0].Substring(1, 8) + "'" + "," + "'" + campos[0].Substring(9, 7) + "'" + "," + "'" + campos[8] + "'" + ");";
    }
Inserting data and manipulating strings now works, but I have one last problem:
what if I need to do a "backspace" on a specific string because my logic can't pick out the correct information? Regards.

Regex with optional matching groups

I'm trying to parse a given string which is a kind of path separated with /. I need to write a regex that matches each segment in the path to a corresponding regex group.
Example 1:
input:
/EAN/SomeBrand/appliances/refrigerators/RF444
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value: refrigerators
Group: product, Value: RF444
Example 2:
input:
/EAN/SomeBrand/appliances
output:
Group: producer, Value: SomeBrand
Group: category, Value: appliances
Group: subcategory, Value:
Group: product, Value:
I tried the following code. It works fine when the path is full (like in the first example) but fails to find the groups when the input string is partial (like in example 2).
    static void Main()
    {
        var pattern = @"^" + @"/EAN"
            + @"/" + @"(?<producer>.+)"
            + @"/" + @"(?<category>.+)"
            + @"/" + @"(?<subcategory>.+)"
            + @"/" + @"(?<product>.+)?"
            + @"$";
        var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
        var result = rgx.Match(@"/EAN/SomeBrand/appliances/refrigerators/RF444");
        foreach (string groupName in rgx.GetGroupNames())
        {
            Console.WriteLine(
                "Group: {0}, Value: {1}",
                groupName,
                result.Groups[groupName].Value);
        }
        Console.ReadLine();
    }
Any suggestion is welcome. Unfortunately I cannot simply split the string since the framework I'm using expects regex object.
You can use optional groups (...)? and replace the .+ greedy dot matching patterns with negated character classes [^/]+:
^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$
See the regex demo
This is how you need to declare your regex in the C# code:
    var pattern = @"^" + @"/EAN"
        + @"/(?<producer>[^/]+)"
        + @"/(?<category>[^/]+)"
        + @"(/(?<subcategory>[^/]+))?"
        + @"(/(?<product>[^/]+))?"
        + @"$";
    var rgx = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);
Note I am using regular capturing groups as the optional ones, but the RegexOptions.ExplicitCapture flag turns all unnamed capturing groups into non-capturing ones, so they do not appear among the Match.Groups. So we always have exactly 5 groups (group 0 plus the four named groups), even without using non-capturing optional groups (?:...)?.
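A small self-contained check of this pattern against both inputs from the question. `EanPath` and `Parse` are illustrative names; the regex is the answer's, written as one verbatim string. Absent optional groups report an empty `Value`, which matches the expected output in example 2.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class EanPath
{
    // ExplicitCapture keeps the unnamed (/...)? wrappers out of Match.Groups.
    public static readonly Regex Rgx = new Regex(
        @"^/EAN/(?<producer>[^/]+)/(?<category>[^/]+)(/(?<subcategory>[^/]+))?(/(?<product>[^/]+))?$",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns producer, category, subcategory, product (empty string when a
    // segment is absent), or null when the path does not match at all.
    public static string[] Parse(string path)
    {
        Match m = Rgx.Match(path);
        if (!m.Success) return null;
        return new[] { "producer", "category", "subcategory", "product" }
            .Select(name => m.Groups[name].Value)
            .ToArray();
    }
}
```

Note that producer and category are still mandatory here, so `/EAN/SomeBrand` alone does not match.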
Try
    var pattern = @"^" + @"/EAN"
        + @"(?:/" + @"(?<producer>[^/]+))?"
        + @"(?:/" + @"(?<category>[^/]+))?"
        + @"(?:/" + @"(?<subcategory>[^/]+))?"
        + @"(?:/" + @"(?<product>[^/]+))?";
Note how I replaced the . with [^/], because you want to use / to split the string. Note also the optional quantifier (?) on each sub-part.
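Because every segment in this variant is optional and the pattern has no trailing `$`, it is looser than the previous answer's: it also matches `/EAN` on its own and simply stops before any extra segments. The sketch below (class name illustrative) checks that behaviour; appending `$` would make such paths fail instead.

```csharp
using System.Text.RegularExpressions;

public static class EanPathLoose
{
    // The all-optional pattern from the answer above, without a $ anchor.
    public static readonly Regex Rgx = new Regex(
        @"^/EAN(?:/(?<producer>[^/]+))?(?:/(?<category>[^/]+))?(?:/(?<subcategory>[^/]+))?(?:/(?<product>[^/]+))?");
}
```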

Can I write this regex in one step?

This is the input string: "23x +y-34 x + y+21x - 3y2-3x-y+2". I want to surround every '+' and '-' character with whitespace, but only where it is not already surrounded on the left or right side. So my input string should look like this: "23x + y - 34 x + y + 21x - 3y2 - 3x - y + 2". I wrote this code that does the job:
    Regex reg1 = new Regex(@"\+(?! )|\-(?! )");
    input = reg1.Replace(input, delegate(Match m) { return m.Value + " "; });
    Regex reg2 = new Regex(@"(?<! )\+|(?<! )\-");
    input = reg2.Replace(input, delegate(Match m) { return " " + m.Value; });
explanation:
reg1 // Match '+' not followed by ' ' (space), or the same for '-'
reg2 // Same, but match '+' or '-' not preceded by ' ' (space)
Delegate 1 appends " " after m.Value and delegate 2 inserts it before m.Value (m.Value is the matched text).
Is there a way to create just one regex and just one delegate, i.e. do this job in one step? I am new to regex and I want to learn the efficient way.
I don't see the need for lookarounds or delegates here. Just replace
\s*([-+])\s*
with
" $1 "
(See http://ideone.com/r3Oog.)
I'd try
Regex.Replace(input, @"\s*[+-]\s*", m => " " + m.ToString().Trim() + " ");
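Both one-step replacements can be checked against the input and expected output from the question. The class and method names are illustrative. Each pattern swallows whatever spaces already surround the operator and re-emits it with exactly one space on each side, which is why no lookarounds are needed.

```csharp
using System.Text.RegularExpressions;

public static class OperatorSpacing
{
    // Replacement-string version: $1 is the captured operator.
    public static string WithReplacementString(string input) =>
        Regex.Replace(input, @"\s*([-+])\s*", " $1 ");

    // MatchEvaluator version: trim the matched text and pad it.
    public static string WithEvaluator(string input) =>
        Regex.Replace(input, @"\s*[+-]\s*", m => " " + m.Value.Trim() + " ");
}
```

One side effect: a sign at the very start of the string also gets a space in front of it, so `-3x` becomes ` - 3x`.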
