Regex get value of a specific word - c#

I've got the following value:
--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'
I want a regular expression, where I can pass the value (or a part of the value) before the AS and I'll receive the AS-Value, e.g.
Z.NUMBER --> I'll receive ID
Z.LANGUAGE --> I'll receive LNG
Z.VALUE_01 --> I'll receive RUN_NUMB
Z.TXT_VALUE_01 --> I'll receive TXT
Currently I have something like this:
(?<=Z.NUMBER\sAS).+?(?=(,|FROM))
...but this doesn't work for my SUBSTR values
Edit: I'm using C# to execute the Regex:
string expr = #"--> Some comment ....."; //so the long text
string columnExprValue = "Z.LANGUAGE";
string asValue = Regex.Match(expr, #"(?<=" + columnExprValue + #"\sAS).+?(?=(,|FROM))")?.Value.Replace("AS", "").Trim() ?? ""; //Workaround to remove AS, because I don't know how to remove it in Regex

This should work, but the implementation is "naive" in sense that it always expects correct valid parameters that do really exists, you can add necessary checks needed.
So the regex I'm going to use is this .*Z\.VALUE_01.*\s+AS\s+(?<Alias>[^,\s]*), where "Z\.VALUE_01" I will do as parameter. See regex tester - https://regex101.com/r/UJi8pY/1
The idea here is that in Group named "Alias" we should have the exact thing you are looking for
Then C# code will look like this:
public static string GetAlias(string input, string column)
{
var regexPart = column.Replace(".","\\.");
return Regex.Match(input, $".*{regexPart}.*\\s+AS\\s+(?<Alias>[^,\\s]*)").Groups["Alias"].ToString();
}
public static void Main()
{
string val = #"--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'";
Console.WriteLine(GetAlias(val, "Z.NUMBER"));
Console.WriteLine(GetAlias(val, "Z.LANGUAGE "));
Console.WriteLine(GetAlias(val, "Z.VALUE_01"));
Console.WriteLine(GetAlias(val, "Z.TXT_VALUE_01"));
}
.NET Fiddle - https://dotnetfiddle.net/Z9kd8h
Good suggestion in another answer from #the-fourth-bird to use Regex.Escape instead of column.Replace(".","\\."), so all regex symbols would be escaped

Getting the values with a regex from sql can be very brittle, this pattern is based on the example data.
To get the values only you might use lookarounds:
(?<=\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)
Explanation
(?<= Lookbehind assertion
\b A word boundary
Z\. Match Z.
(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b Match any of the alternatives followed by a word boundary (Or just match a single string like Z\.LANGUAGE)
.*? Match optional characters, as few as possible
\sAS\s+ Match AS between whitespace chars
) Close the lookbehind
[^\s,]+ Match 1+ non whitspace chars except for a comma
(?=,|\s+FROM\b) Positive lookahead, assert either , or FROM to the right
See a .NET regex demo.
Or a capture group variant:
\bZ\.(?:LANGUAGE|NUMBER|(?:TXT_)?VALUE_01)\b.*?\sAS\s+([^\s,])+(?:,|\s+FROM\b)
See another .NET regex demo.
If you want to make the pattern dynamic, you can make use of Regex.Escape to escape the meta characters like the dot to match it literally, or else it would match any character.
For example:
string input = #"--> Some comment
CREATE VIEW ABC
AS SELECT
Z.NUMBER AS ID,
Z.LANGUAGE AS LNG,
SUBSTR(Z.VALUE_01,01,02) AS RUN_NUMB,
SUBSTR(Z.TXT_VALUE_01,01,79) AS TXT
FROM
MYTABLE Z
WHERE ID = '0033'
AND LNG = 'DE'";
string columnExprValue = Regex.Escape("Z.LANGUAGE");
string pattern = #"(?<=\b" + columnExprValue + #"\b.*?\sAS\s+)[^\s,]+(?=,|\s+FROM\b)";
string asValue = Regex.Match(input, pattern)?.Value ?? "";
Console.WriteLine(asValue);
Output
LNG

Check this :
/^ \h*+ (?:substr[(])?(?: Z.TXT_VALUE_01 )(?:,[^,]+,[^,]+[)])? \h* AS \h+ (\w+) \v* [,]? \v* $/gmxi

Related

Replacing a portion of a string with an exact matching

I just want to replace a portion of a string only if matches the given text.
My use case is as follows:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string result = text.Replace("wd:response", "response");
/*
* expecting the below text
<response><wd:response-data></wd:response-data></response>
*
*/
I followed the following answers:
Way to have String.Replace only hit "whole words"
Regular expression for exact match of a string
But I failed to achieve what I want.
Please share your thoughts/solutions.
Sample on
https://dotnetfiddle.net/pMkO8Q
In general, you should really be parsing and manipulating XML as XML, using functions that know how XML works and what's legal in the language. Regex and other naive text manipulation will often lead you into trouble.
That said, for a very simple solution to this specific problem, you can do this with two replaces:
var text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
text.Replace("wd:response>", "response>").Replace("wd:response ", "response ")
(Note the spaces at the end of the parameters to the second replace.)
Alternatively use a regex similar to "wd:response\s*>"
The easiest way to achieve your result as per your .net fiddle is use the replace as below.
string result = text.Replace("wd:response>", "response>");
But proper way to achieve this is parsing using XML
You can capture the string wd-response in a capturing group and replace using Regex.Replace using the MatchEvaluator like this.
Regex explanation - <[/]?(wd:response)[\s+]?>
Match < literally
Match / optionally hence the ?
Match the string wd:response and place it in a capturing group enclosed with ()
Match one or more optional whitespace [\s+]?
Match > literally
public class Program
{
public static void Main(string[] args)
{
string text = "<wd:response><wd:response-data></wd:response-data></wd:response >";
string replacePattern = "response";
string pattern = #"<[/]?(wd:response)[\s+]?>";
string replacedPattern = Regex.Replace(text, pattern, match =>
{
// Extract the first group
Group group = match.Groups[1];
// Replace the group value with the replacePattern
return string.Format("{0}{1}{2}", match.Value.Substring(0, group.Index - match.Index), replacePattern, match.Value.Substring(group.Index - match.Index + group.Length));
});
Console.WriteLine(replacedPattern);
}
}
Outputting:
<response><wd:response-data></wd:response-data></response >

Use regular expression in C# to select a specific occurrence from a string by limiting input

Using C#, i am stuck while trying to extract a specific string while limiting the string to be matched. Here is my input string:
NPS_CNTY01_10112018_Adult_Submittal.txt
I would like to extract 01 after CNTY and ingnore anything after 01.
So far i have the regex to be:
(?!NPS_CNTY)\d{2}
But the above regex gets many other digit matches from the input string. One approach i was thinking was to limit the input to 9 characters to eventually get 01. But somehow not able to achieve that. Any help is appreciated.
I would like to add that the only variable data in this input string is:
NPS_CNTY[two digit county code excluding this bracket]_[date in MMDDYYYY format excluding the brackets]_Adult_Submittal.txt.
Also please limit solutions to regex's.
The (?!NPS_CNTY)\d{2} pattern matches a location that is not immediately followed with NPS_CNTY and then matches 2 digits. The lookahead always returns true since two digits cannot start a NPS_CNTY char sequence, it is redundant.
You may use a positive lookbehind like this to get 01:
var m = Regex.Match(s, #"(?<=NPS_CNTY)\d+");
var result = "";
if (m.Success)
{
result = m.Value;
}
See the .NET regex demo
Here, (?<=NPS_CNTY), a positive lookbehind, matches a location that is immediately preceded with NPS_CNTY and then \d+ matches 1 or more digits.
An equivalent solution using capturing mechanism is
var m = Regex.Match(s, #"NPS_CNTY(\d+)");
var result = "";
if (m.Success)
{
result = m.Groups[1].Value;
}
If the string always start with NPS_CNTY and you have to extract 2 digits then you don't need a regular expression. Just use Substring() method:
string text = #"NPS_CNTY01_01141980_Adult_Submittal.txt";
string digits = text.Substring(8, 2);
EDIT:
In case you need to match N digits after NPS_CNTY you can use the following code:
string text = #"NPS_CNTY012_01141980_Adult_Submittal.txt";
string digits = text.Replace("NPS_CNTY", string.Empty)
.Split("_", StringSplitOptions.RemoveEmptyEntries)
.FirstOrDefault();

Modifying string value

I have a string which is
string a = #"\server\MainDirectory\SubDirectoryA\SubDirectoryB\SubdirectoryC\MyFile.pdf";
The SubDirectoryB will always start with a prefix of RN followed by 6 unique numbers. Now I'm trying to modify SubDirectoryB parth of the string to be replaced by a new value lets say RN012345
So the new string should look like
string b = #"\server\MainDirectory\SubDirectoryA\RN012345\SubdirectoryC\MyFile.pdf";
To achieve this I'm making use of the following helper method
public static string ReplaceAt(this string path, int index, int length, string replace)
{
return path.Remove(index, Math.Min(length, path.Length - index)).Insert(index, replace);
}
Which works great for now.
However the orginial path will be changing in the near future so it will something like #\MainDirectory\RN012345\AnotherDirectory\MyFile.pdf. So I was wondering if there is like a regex or another feature I can use to just change the value in the path rather than providing the index which will change in the future.
Assuming you need to only replace those \RNxxxxxx\ where each x is a unique digit, you need to capture the 6 digits and analyze the substring inside a match evaluator.
var a = #"\server\MainDirectory\SubDirectoryA\RN012345\SubdirectoryC\MyFile.pdf";
var res = Regex.Replace(a, #"\\RN([0-9]{6})\\", m =>
m.Groups[1].Value.Distinct().Count() == m.Groups[1].Value.Length ?
"\\RN0123456\\" : m.Value);
// res => \server\MainDirectory\SubDirectoryA\RN0123456\SubdirectoryC\MyFile.pdf
See the C# demo
The regex is
\\RN([0-9]{6})\\
It matches a \ with \\, then matches RN, then matches and captures into Group 1 six digits (with ([0-9]{6})) and then will match a \. In the replacment part, the m.Groups[1].Value.Distinct().Count() == m.Groups[1].Value.Length checks if the number of distinct digits is the same as the number of the substring captured, and if yes, the digits are unique and the replacement occurs, else, the whole match is put back into the replacement result.
Use String.Replace
string oldSubdirectoryB = "RN012345";
string newSubdirectoryB = "RN147258";
string fileNameWithPath = #"\server\MainDirectory\SubDirectoryA\RN012345\SubdirectoryC\MyFile.pdf";
fileNameWithPath = fileNameWithPath.Replace(oldSubdirectoryB, newSubdirectoryB);
You can use Regex.Replace to replace the SubDirectoryB with your required value
string a = #"\server\MainDirectory\SubDirectoryA\RN123456\SubdirectoryC\MyFile.pdf";
a = Regex.Replace(a, "RN[0-9]{6,6}","Mairaj");
Here i have replaced a string with RN followed by 6 numbers with Mairaj.

Regex to text before set of numbers

I have text like this
Inc12345_Month
Ted12345_Month
J8T12345_Month
What I need to do is extract the 12345 and also remove everything before it. This will be done in C#
.+?(?=\d_Monthly) was working in a regex tester online but when I put it in my code it only returned 5_Month.
Edit: the 12345 could be a variable length so I cannot [0-9] multiple times.
Edit2: Code this was just to try and remove everything before the 12345
string text = /* the above text pulled in from a file */;
Regex reg = new Regex(#".+?(?=\d+_Monthly)");
text = reg.Replace(string, "");
You can use this function to strip it:
private static Regex getNumberAndBeyondRegex = new Regex(^.{2}\D+(\d.*)$", RegexOptions.Compiled);
public static string GetNumberAndBeyond(string input)
{
var match = getNumberAndBeyondRegex.Match(input);
if (!match.Success) throw new ArgumentException("String isn't in the correct format.", "input");
return match.Groups[1].Value;
}
The regex at work is ^.{2}\D+(\d.*)$
It works by grabbing anything that's a number, after at least one character that isn't a number. It'll not only match _Month but also other endings.
The regex exists out of a few parts:
^ matches the beginning of the string
.{2} matches any two characters, to prevent a digit from matching if it's the first or 2nd character, you can increase this number to be equal to the minimum prefix length - 1
\D+ matches at least one character that isn't a number
( starts capturing a group
\d.* matches at least one number and any values beyond that
) closes the capturing group
$ matches the end of the string
There are a lot of different regex flavors, many of them have slight differences in terms of escaping, capturing, replacing and quite surely some others.
For testing .NET regexes online I use the free version of the tool RegexHero, it has an popup every now and then, but it makes up for that time by showing you live results, capture groups, and instant replacing. Next to having quite a lot of features.
If you want to match anywhere within the string, you can use the regex \d+_Month, it is very similiar to your original regex. In code:
new Regex("\d+_Month").Match(input).Value
Edit:
Based on the format you supplied in the comment I've created a regex and function to parse the entire file name:
private static Regex parseFileNameRegex = new Regex(#"^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$", RegexOptions.Compiled);
public static bool TryParseFileName(string fileName, out int id, out string month, out string fileExtension)
{
id = 0; month = null; fileExtension = null;
if (fileName == null) return false;
var match = parseFileNameRegex.Match(fileName);
if (!match.Success) return false;
if (!int.TryParse(match.Groups[1].Value, out id) || id < 1) return false; // Convert the ID into a number
month = match.Groups[2].Value;
fileExtension = match.Groups[3].Value;
return true;
}
In the parse function it requires the ID to be at least 1, 0 isn't accepted (and negative numbers won't match the regex), if you don't want this restriction, simply remove || id < 1 from the function.
Using the function would look like:
int id; string month, fileExtension;
if (!TryParseFileName("CompanyName_ClientName12345_Month_Nov.pdf", out id, out month, out fileExtension))
throw new FormatException("File name is incorrectly formatted."); // Do whatever you want when you get an invalid filename
// Use id, month and fileExtension here :)
The regex ^.*\D(\d+)_Month_([a-zA-Z]+)\.(\w+)$ works like:
^ matches the beginning of the string
.*\D matches at least one non-numeric character
(\d+) captures at least 1 number, this is the ID
_Month_ is the literal text in between
([a-zA-Z]+) matches and captures at least 1 letter, this is the month
\. matches a . character
(\w+) matches and captures any alphanumeric (letters and numbers), this is the file extension
$ matches the end of the string
Using :
Regex reg = new Regex(#"\D+(?=(\d+)_Monthly)");
is more explicit, the result is in Groups[1].
Part by part:
.+?
Match anything, maybe. This doesn't make any sense to me. It would be equivalent to ".*", which may or may not be what you meant.
(?=
start a group
\d
Match exactly 1 decimal, which explains what you are seeing, and the rest of the number is matched by .+? which is outside the group
_Monthly
match the literal text
)
end group
I think what you want is:
.*(?=\d+_Monthly)
I guess you are missing the + sign after \d
.+?(?=\d+_Monthly)
This should ask for one or more digits.
If you don't need anything before the number, this should work:
(\d+_Month)
I use Derek Slager's regex tester when I'm working with C# regex.
Better dotnet regular expression tester

Regex pattern for text between 2 strings

I am trying to extract all of the text (shown as xxxx) in the follow pattern:
Session["xxxx"]
using c#
This may be Request.Querystring["xxxx"] so I am trying to build the expression dynamically. When I do so, I get all sorts of problems about unescaped charecters or no matches :(
an example might be:
string patternstart = "Session[";
string patternend = "]";
string regexexpr = #"\\" + patternstart + #"(.*?)\\" + patternend ;
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, #regexexpr);
Can anyone help with this as I am stumped (as I always seem to be with RegEx :) )
With some little modifications to your code.
string patternstart = Regex.Escape("Session[");
string patternend = Regex.Escape("]");
string regexexpr = patternstart + #"(.*?)" + patternend;
The pattern you construct in your example looks something like this:
\\Session[(.*?)\\]
There are a couple of problems with this. First it assumes the string starts with a literal backslash, second, it wraps the entire (.*?) in a character class, that means it will match any single open parenthesis, period, asterisk, question mark, close parenthesis or backslash. You'd need to escape the the brackets in your pattern, if you want to match a literal [.
You could use a pattern like this:
Session\[(.*?)]
For example:
string regexexpr = #"Session\[(.*?)]";
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, #regexexpr);
Console.WriteLine(matches[0].Groups[1].Value); // "xxxx"
The characters [ and ] have a special meaning with regular expressions - they define a group where one of the contained characters must match. To work around this, simply 'escape' them with a leading \ character:
string patternstart = "Session\[";
string patternend = "\]";
An example "final string" could then be:
Session\["(.*)"\]
However, you could easily write your RegEx to handle Session, Querystring, etc automatically if you require (without also matching every other array you throw at it), and avoid having to build up the string in the first place:
(Querystring|Session|Form)\["(.*)"\]
and then take the second match.

Categories

Resources