Matching Fields Wrapped in Punctuation but not Single Apostrophe - c#

I have the following Regex that I use to translate fields in SQL strings:
string posLookBehind = @"(?<=\p{P}*)\b";
string posLookAhead = @"\b(?=\p{P}*)";
string keyword = "FieldA";
string translatedKeyword = "FldA";
string strSql = "SELECT [FieldA] FROM SomeTableA;";
strSql = Regex.Replace(
strSql,
posLookBehind + keyword + posLookAhead,
translatedKeyword,
RegexOptions.IgnoreCase);
For some keyword FieldA, the above regex replaces FieldA with FldA in each of the following: [FieldA], [FieldA + FieldB], and so on. However, I now wish to restrict the regex: I do not want to match 'FieldA' (note the single apostrophes).
So I have changed the regex, using character class subtraction to remove ' from the punctuation set:
string posLookBehind = @"(?<=[\p{P}-[']]*)\b";
string posLookAhead = @"\b(?=[\p{P}-[']]*)";
The rest of the code is the same, but this still matches 'FieldA'. What am I doing wrong here?
Thanks for your time.

I think the cause is the star (*) in the lookbehind and the lookahead. Because * matches zero or more characters, the engine matches FieldA itself and then checks behind it for zero or more punctuation characters that are not apostrophes. Since the preceding character is an apostrophe, the class simply matches zero times and the lookbehind still succeeds.
You can fix this by changing the star to a plus (+):
string posLookBehind = @"(?<=[\p{P}-[']]+)\b";
string posLookAhead = @"\b(?=[\p{P}-[']]+)";
Or, if there is always exactly one surrounding character:
string posLookBehind = @"(?<=[\p{P}-[']])\b";
string posLookAhead = @"\b(?=[\p{P}-[']])";
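To see the difference between the two quantifiers, here is a minimal sketch. The Translate helper, its quantifier parameter, and the sample SQL string are assumptions made for illustration and are not part of the original code. TranslateUnlessQuoted is a different technique from the one in the answer (negative lookarounds): it rejects only the apostrophe instead of requiring punctuation, so fields without any wrapping punctuation are translated too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ApostropheDemo
{
    // Builds the lookarounds from the answer with the given quantifier ("*" or "+").
    public static string Translate(string sql, string keyword, string replacement, string quantifier)
    {
        string posLookBehind = @"(?<=[\p{P}-[']]" + quantifier + @")\b";
        string posLookAhead = @"\b(?=[\p{P}-[']]" + quantifier + @")";
        return Regex.Replace(
            sql,
            posLookBehind + Regex.Escape(keyword) + posLookAhead,
            replacement,
            RegexOptions.IgnoreCase);
    }

    // Alternative sketch (not from the answer): only forbid an adjacent apostrophe.
    public static string TranslateUnlessQuoted(string sql, string keyword, string replacement)
    {
        string pattern = @"(?<!')\b" + Regex.Escape(keyword) + @"\b(?!')";
        return Regex.Replace(sql, pattern, replacement, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string sql = "SELECT [FieldA], 'FieldA', FieldA FROM SomeTableA;";

        // "*" can match zero characters, so every occurrence is replaced.
        Console.WriteLine(Translate(sql, "FieldA", "FldA", "*"));
        // SELECT [FldA], 'FldA', FldA FROM SomeTableA;

        // "+" requires at least one non-apostrophe punctuation character on each side.
        Console.WriteLine(Translate(sql, "FieldA", "FldA", "+"));
        // SELECT [FldA], 'FieldA', FieldA FROM SomeTableA;

        // Negative lookarounds skip only the apostrophe-wrapped occurrence.
        Console.WriteLine(TranslateUnlessQuoted(sql, "FieldA", "FldA"));
        // SELECT [FldA], 'FieldA', FldA FROM SomeTableA;
    }
}
```

Note that the + version no longer translates a bare FieldA, which may or may not be what you want.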

Related

Regex get value of a specific word

I've got the following value:
--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'
I want a regular expression, where I can pass the value (or a part of the value) before the AS and I'll receive the AS-Value, e.g.
Z.NUMBER --> I'll receive ID
Z.LANGUAGE --> I'll receive LNG
Z.VALUE_01 --> I'll receive RUN_NUMB
Z.TXT_VALUE_01 --> I'll receive TXT
Currently I have something like this:
(?<=Z.NUMBER\sAS).+?(?=(,|FROM))
...but this doesn't work for my SUBSTR values
Edit: I'm using C# to execute the Regex:
string expr = @"--> Some comment ....."; //so the long text
string columnExprValue = "Z.LANGUAGE";
string asValue = Regex.Match(expr, @"(?<=" + columnExprValue + @"\sAS).+?(?=(,|FROM))")?.Value.Replace("AS", "").Trim() ?? ""; //Workaround to remove AS, because I don't know how to remove it in Regex
This should work, but the implementation is "naive" in the sense that it always expects valid parameters that really exist in the input; add whatever checks you need.
The regex I'm going to use is .*Z\.VALUE_01.*\s+AS\s+(?<Alias>[^,\s]*), where Z\.VALUE_01 is supplied as a parameter. See the regex tester: https://regex101.com/r/UJi8pY/1
The idea is that the group named "Alias" contains exactly the value you are looking for.
The C# code then looks like this:
public static string GetAlias(string input, string column)
{
var regexPart = column.Replace(".","\\.");
return Regex.Match(input, $".*{regexPart}.*\\s+AS\\s+(?<Alias>[^,\\s]*)").Groups["Alias"].ToString();
}
public static void Main()
{
string val = @"--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'";
Console.WriteLine(GetAlias(val, "Z.NUMBER"));
Console.WriteLine(GetAlias(val, "Z.LANGUAGE"));
Console.WriteLine(GetAlias(val, "Z.VALUE_01"));
Console.WriteLine(GetAlias(val, "Z.TXT_VALUE_01"));
}
.NET Fiddle - https://dotnetfiddle.net/Z9kd8h
A good suggestion in another answer, from @the-fourth-bird, is to use Regex.Escape instead of column.Replace(".","\\."), so that all regex metacharacters are escaped.
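As a rough sketch of how the two points above (the missing checks and Regex.Escape) could be combined: the TryGetAlias name, the class name, and the shortened sample SQL are assumptions for illustration; the pattern itself is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AliasLookup
{
    // Same pattern as GetAlias, but the column is escaped with Regex.Escape
    // and a failed match is reported to the caller instead of returning "".
    public static bool TryGetAlias(string input, string column, out string alias)
    {
        string pattern = ".*" + Regex.Escape(column) + @".*\s+AS\s+(?<Alias>[^,\s]*)";
        Match match = Regex.Match(input, pattern);
        alias = match.Success ? match.Groups["Alias"].Value : "";
        return match.Success;
    }

    public static void Main()
    {
        // Shortened sample data (assumption), built with explicit newlines.
        string sql = "CREATE VIEW ABC\nAS SELECT\nZ.NUMBER AS ID,\n" +
                     "SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB\nFROM\nMYTABLE Z";

        foreach (string column in new[] { "Z.NUMBER", "Z.VALUE_01", "Z.MISSING" })
        {
            if (TryGetAlias(sql, column, out string alias))
                Console.WriteLine(column + " -> " + alias);
            else
                Console.WriteLine(column + " -> (not found)");
        }
        // Z.NUMBER -> ID
        // Z.VALUE_01 -> RUN_NUMB
        // Z.MISSING -> (not found)
    }
}
```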
Extracting values from SQL with a regex can be very brittle; this pattern is based on the example data.
To get the values only you might use lookarounds:
(?<=\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)
Explanation
(?<= Lookbehind assertion
\b A word boundary
Z\. Match Z.
(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b Match any of the alternatives followed by a word boundary (Or just match a single string like Z\.LANGUAGE)
.*? Match optional characters, as few as possible
\sAS\s+ Match AS between whitespace chars
) Close the lookbehind
[^\s,]+ Match 1+ non-whitespace chars except for a comma
(?=,|\s+FROM\b) Positive lookahead, assert either , or FROM to the right
See a .NET regex demo.
Or a capture group variant:
\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+([^\s,]+)(?:,|\s+FROM\b)
See another .NET regex demo.
If you want to make the pattern dynamic, you can make use of Regex.Escape to escape the meta characters like the dot to match it literally, or else it would match any character.
For example:
string input = @"--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'";
string columnExprValue = Regex.Escape("Z.LANGUAGE");
string pattern = @"(?<=\b" + columnExprValue + @"\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)";
string asValue = Regex.Match(input, pattern)?.Value ?? "";
Console.WriteLine(asValue);
Output
LNG
Check this (PCRE syntax; \h and the possessive *+ are not supported by the .NET regex engine):
/^ \h*+ (?:substr[(])?(?: Z.TXT_VALUE_01 )(?:,[^,]+,[^,]+[)])? \h* AS \h+ (\w+) \v* [,]? \v* $/gmxi

Regex pattern for text between 2 strings

I am trying to extract all of the text (shown as xxxx) in the following pattern:
Session["xxxx"]
using C#.
This may also be Request.Querystring["xxxx"], so I am trying to build the expression dynamically. When I do so, I get all sorts of problems with unescaped characters or no matches :(
an example might be:
string patternstart = "Session[";
string patternend = "]";
string regexexpr = @"\\" + patternstart + @"(.*?)\\" + patternend ;
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Can anyone help with this as I am stumped (as I always seem to be with RegEx :) )
With a few small modifications to your code:
string patternstart = Regex.Escape("Session[");
string patternend = Regex.Escape("]");
string regexexpr = patternstart + @"(.*?)" + patternend;
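A minimal sketch of how this could be wrapped for arbitrary delimiters follows. The ExtractBetween helper, the class name, and the sample text are assumptions for illustration; here the quotes are included in the delimiters so that only xxxx is captured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BetweenDemo
{
    // Returns every piece of text found between the literal strings start and end.
    public static List<string> ExtractBetween(string text, string start, string end)
    {
        // Regex.Escape makes characters such as '[' match literally.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
            results.Add(m.Groups[1].Value);
        return results;
    }

    public static void Main()
    {
        string sText = "Session[\"xxxx\"] and Request.Querystring[\"yyyy\"]";
        Console.WriteLine(string.Join(",", ExtractBetween(sText, "Session[\"", "\"]")));     // xxxx
        Console.WriteLine(string.Join(",", ExtractBetween(sText, "Querystring[\"", "\"]"))); // yyyy
    }
}
```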
The pattern you construct in your example looks something like this:
\\Session[(.*?)\\]
There are a couple of problems with this. First, it requires a literal backslash immediately before Session. Second, it wraps the entire (.*?)\\ in a character class, which means it matches any single open parenthesis, period, asterisk, question mark, close parenthesis or backslash. You need to escape the opening bracket in your pattern if you want to match a literal [.
You could use a pattern like this:
Session\[(.*?)]
For example:
string regexexpr = @"Session\[(.*?)]";
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, @regexexpr);
Console.WriteLine(matches[0].Groups[1].Value); // "xxxx"
The characters [ and ] have a special meaning in regular expressions: they define a character class, where exactly one of the contained characters must match. To work around this, simply 'escape' them with a leading \ character (in a verbatim string, so that C# does not treat the backslash as an escape itself):
string patternstart = @"Session\[";
string patternend = @"\]";
An example "final string" could then be:
Session\["(.*)"\]
However, you could easily write your RegEx to handle Session, Querystring, etc automatically if you require (without also matching every other array you throw at it), and avoid having to build up the string in the first place:
(Querystring|Session|Form)\["(.*)"\]
and then take the second capture group.
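A small sketch of the combined pattern in use follows. The FindKeys helper, the class name, and the sample text are assumptions for illustration. It also swaps the greedy (.*) for the lazy (.*?): with the greedy form, two lookups on the same line would be captured as one long match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CollectionKeyDemo
{
    // Finds Querystring["..."], Session["..."] and Form["..."] lookups.
    // Group 1 is the collection name, group 2 is the key between the quotes.
    public static List<string> FindKeys(string text)
    {
        var rx = new Regex(@"(Querystring|Session|Form)\[""(.*?)""\]");
        var results = new List<string>();
        foreach (Match m in rx.Matches(text))
            results.Add(m.Groups[1].Value + ":" + m.Groups[2].Value);
        return results;
    }

    public static void Main()
    {
        string text = "a = Session[\"user\"]; b = Request.Querystring[\"id\"];";
        foreach (string hit in FindKeys(text))
            Console.WriteLine(hit);
        // Session:user
        // Querystring:id
    }
}
```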

How can I use RegEx (Or Should I) to extract a string between the starting string '__' and ending with '__' or 'nothing'

RegEx has always confused me.
I have a string like this:
IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0
Or Sometimes stored like this:
IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0
I want to get the 'A3VS' or 'PG33S' string. It's my firmware and is varied in length and type. I used to use:
string[] split = PNP.Split('\\'); //where PNP is my string name
var start = split[1].LastIndexOf('_');
string mystring = split[1].Substring(start + 1);
But that only works for strings that don't end with __ after the firmware string. I noticed that some have an additional random '_' after it.
Is RegEx the way to solve this, or is there a better way?
Without RegEx, it can be expressed like this:
var firmware = PNP.Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries)[1].Split('\\')[0];
string s = split[1].TrimEnd('_');
string mystring = s.Substring(s.LastIndexOf('_') + 1);
If you want the RegEX way to do it here it is:
Regex regex = new Regex(@"\\.*_+(?<firmware>[A-Za-z0-9]+)_*\\");
var m1 = regex.Match(@"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0");
var g1 = m1.Groups["firmware"].Value;
//g1 == "PG33S"
Keep in mind you have to use [A-Za-z0-9] instead of \w in the capture subexpression since \w also matches an underscore (_).
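For comparison, here is a sketch that runs the TrimEnd approach and the regex approach from the answers above on both sample strings from the question. The class and method names are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirmwareDemo
{
    // String-only approach: trim trailing underscores from the middle segment,
    // then take everything after the last remaining underscore.
    public static string ByTrim(string pnp)
    {
        string middle = pnp.Split('\\')[1].TrimEnd('_');
        return middle.Substring(middle.LastIndexOf('_') + 1);
    }

    // Regex approach: capture the alphanumeric run that sits between
    // underscores and the second backslash.
    public static string ByRegex(string pnp)
    {
        Match m = Regex.Match(pnp, @"\\.*_+(?<firmware>[A-Za-z0-9]+)_*\\");
        return m.Success ? m.Groups["firmware"].Value : "";
    }

    public static void Main()
    {
        string[] samples =
        {
            @"IDE\DiskDJ205GA20_____________________________A3VS____\5&1003ca0&0&0.0.0",
            @"IDE\DiskSJ305GA23_____________________________PG33S\6&2003Sa0&0&0.0.0"
        };
        foreach (string pnp in samples)
            Console.WriteLine(ByTrim(pnp) + " " + ByRegex(pnp));
        // A3VS A3VS
        // PG33S PG33S
    }
}
```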

regex and string

Consider the following:
string keywords = "(load|save|close)";
Regex x = new Regex(@"\b"+keywords+"\b");
I get no matches. However, if I do this:
Regex x = new Regex(@"\b(load|save|close)\b");
I get matches. How come the former doesn't work, and how can I fix this? Basically, I want the keywords to be configurable so I placed them in a string.
The last \b in the first code snippet needs a verbatim string specifier (@) in front of it as well, because it is a separate string literal.
string keywords = "(load|save|close)";
Regex x = new Regex(@"\b"+keywords+@"\b");
You're missing another verbatim string specifier (@ prefixed to the last "\b"):
Regex x = new Regex(@"\b" + keywords + @"\b");
Regex x = new Regex(@"\b"+keywords+@"\b");
You forgot the additional @ before the second "\b".
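The reason the missing @ matters is that in a regular C# string literal "\b" is the escape sequence for the backspace character (U+0008), so the pattern ends up requiring a literal backspace after the keyword. A minimal sketch (the class name and sample sentence are assumptions for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    public static void Main()
    {
        string keywords = "(load|save|close)";

        // In a regular string literal, \b is a backspace character.
        Console.WriteLine((int)'\b'); // 8

        // Pattern ends with a literal backspace, which the text never contains.
        var broken = new Regex(@"\b" + keywords + "\b");
        // Pattern ends with the regex word-boundary assertion \b.
        var working = new Regex(@"\b" + keywords + @"\b");

        string text = "please save the file";
        Console.WriteLine(broken.IsMatch(text));  // False
        Console.WriteLine(working.IsMatch(text)); // True
    }
}
```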

String Replace Problem

I have a string replace problem. I have a result like:
"Animal.Active = 1 And Animal.Gender = 2"
I want to replace something in this text.
Animal.Active part is returned from a database and sometimes it is returned with the Animal.Gender part.
When Animal.Gender part comes from the database I have to remove this And Animal.Gender part.
Also if the string has Animal.Active = 1, I have to remove Animal.Active = 1 And part. Note the And.
How can I do this?
You will need to use a regular expression (regex) to replace this, then, since you want to match a number.
string pattern = @"And Animal\.Gender\s*=\s*[0-9]+";
string replacement = "";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
This replaces "And Animal.Gender = #" (where # is any number), allowing zero or more whitespace characters on either side of the = sign.
You can do a similar replacement for the second request, with Animal.Active.
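A sketch of both replacements follows, under the assumption that each removal is applied separately and that the conditions always appear in the order shown in the question. The helper names are hypothetical, and the patterns are slightly extended from the one above so that the surrounding whitespace is removed as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionCleanup
{
    // Removes a trailing "And Animal.Gender = <number>" condition,
    // including the whitespace in front of "And".
    public static string RemoveGender(string input)
    {
        return Regex.Replace(input, @"\s*And\s+Animal\.Gender\s*=\s*[0-9]+", "");
    }

    // Removes a leading "Animal.Active = 1 And " condition, including the "And".
    public static string RemoveActive(string input)
    {
        return Regex.Replace(input, @"Animal\.Active\s*=\s*1\s+And\s*", "");
    }

    public static void Main()
    {
        string input = "Animal.Active = 1 And Animal.Gender = 2";
        Console.WriteLine(RemoveGender(input)); // Animal.Active = 1
        Console.WriteLine(RemoveActive(input)); // Animal.Gender = 2
    }
}
```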
Granted this is a very specific solution that will undoubtedly become more complicated as you add more conditions, but here goes:
dbReturn =
dbReturn.Replace("And Animal.Gender","").Replace("Animal.Active = 1","");
