Regex without taking care of escape codes - c#

I want to validate a string like this (netsh cmd output):
"\r\nR‚servations d'URLÿ:\r\n--------------------\r\n\r\n URL r‚serv‚e : https://+:443/SomeWebSite/ \r\n Utilisateurÿ: AUTORITE NT\\SERVICE R\u0090SEAU\r\n \u0090couterÿ: Yes\r\n D‚l‚guerÿ: Yes\r\n SDDLÿ: D:(A;;GA;;;NS) \r\n\r\n\r\n"
with this pattern:
"URL .+https:\/\/\+:443\/SomeWebSite\/.+Yes.+Yes.+SDDL.+"
So I want to detect strings of this shape (where xxxxx is one or more arbitrary characters):
xxxxxURLxxxxxhttps://+:443/SomeWebSite/xxxxxYesxxxxxYesxxxxxSDDLxxxx
I wrote this code in C# to do it but my expression still doesn't work:
string output = "\r\nR‚servations d'URLÿ:\r\n--------------------\r\n\r\n URL r‚serv‚e : https://+:443/SomeWebSite/ \r\n Utilisateurÿ: AUTORITE NT\\SERVICE R\u0090SEAU\r\n \u0090couterÿ: Yes\r\n D‚l‚guerÿ: Yes\r\n SDDLÿ: D:(A;;GA;;;NS) \r\n\r\n\r\n";
output = output.Replace(Environment.NewLine, ""); //==> output2=="R‚servations d'URLÿ:-----------
Regex testUrlOpened = new Regex(output, RegexOptions.Singleline);
MessageBox.Show(testUrlOpened.IsMatch(@"URL").ToString()); // ==> False
MessageBox.Show(testUrlOpened.IsMatch(@".+URL.+").ToString()); // ==> False
MessageBox.Show(testUrlOpened.IsMatch(@"URL .+https:\/\/\+:443\/SomeWebSite\/.+Yes.+Yes.+SDDL.+").ToString()); // ==> False
So I suppose I have another issue with regex in C#...
Maybe an encoding issue?

Start by removing the escape characters you expect in the string. Depending on your scenario, it might be better to remove them all (see the C# escape codes).
output = output.Replace("\n", "").Replace("\r", "").Replace("\t", "");
Now that you have a single-line string, you can do the regex matching:
.+URL.+https:\/\/.+:443\/SomeWebSite\/.+Yes.+Yes.+SDDL.+
Notice the following:
1- ^ and $ match the exact beginning and end of the string. If the target text sits somewhere inside the line, using them will cause the match to fail.
2- You need to escape the characters that have a special meaning in a regex.
3- To match "any character except a newline, one or more times", use .+
I hope this helps.

You can use Regex.Unescape to unescape the string, and then do your regex match:
var output = @"\r\nR‚servations d'URLÿ:\r\n--------------------\r\n\r\n URL r‚serv‚e : https://+:443/SomeWebSite/ \r\n Utilisateurÿ: AUTORITE NT\\SERVICE R\u0090SEAU\r\n \u0090couterÿ: Yes\r\n D‚l‚guerÿ: Yes\r\n SDDLÿ: D:(A;;GA;;;NS) \r\n\r\n\r\n";
output = Regex.Unescape(output); // in LINQPad, append .Dump() to inspect the result
// "Yes" must match the case used in the output; Singleline lets . cross the line breaks
var foundUrl = Regex.IsMatch(output, @"URL .+ https://\+:443/SomeWebSite/.+Yes.+Yes.+SDDL.+", RegexOptions.Singleline);
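To make the effect of Regex.Unescape concrete, here is a minimal console sketch. It assumes a short made-up input rather than the real netsh output, and the class and method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    public static string ToRealControlChars(string literalText)
    {
        // Turns two-character sequences such as backslash+r and backslash+n
        // into the actual control characters they stand for.
        return Regex.Unescape(literalText);
    }

    public static void Main()
    {
        string literal = @"URL\r\nYes";            // contains backslashes, no line break
        string real = ToRealControlChars(literal); // contains a real CR LF pair

        Console.WriteLine(literal.Length);         // 10
        Console.WriteLine(real.Length);            // 8
        Console.WriteLine(real.Contains("\r\n"));  // True
    }
}
```

Once the text contains real line breaks, a pattern that uses . to span lines needs RegexOptions.Singleline, because . does not match \n by default.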

+ means one or more of the preceding pattern. If we put the pattern (.|\n), which matches any character including a newline, in front of each +, you'll be all set without having to remove or account for escape codes.
^(.|\n)+URL(.|\n)+https://(.|\n)+:443/SomeWebSite/(.|\n)+Yes(.|\n)+Yes(.|\n)+SDDL(.|\n)+$
EDIT: The risk of doing this instead of sanitizing your string first is that you may get false positives, because any characters can separate the matches. All this regex does is ensure that somewhere in the string, in this order, are the strings
"URL", "https://", ":443/SomeWebSite/", "Yes", "Yes", "SDDL"

So simple. The last issue was that the regular expression belongs in the Regex constructor and the input string in the IsMatch method... :(
So the final code is:
string output = "\r\nR‚servations d'URLÿ:\r\n--------------------\r\n\r\n URL r‚serv‚e : https://+:443/SomeWebSite/ \r\n Utilisateurÿ: AUTORITE NT\\SERVICE R\u0090SEAU\r\n \u0090couterÿ: Yes\r\n D‚l‚guerÿ: Yes\r\n SDDLÿ: D:(A;;GA;;;NS) \r\n\r\n\r\n";
output = output.Replace(Environment.NewLine, ""); //==> output2=="R‚servations d'URLÿ:-----------
Regex testUrlOpened = new Regex(@"URL .+https:\/\/\+:443\/SomeWebSite\/.+Yes.+Yes.+SDDL.+", RegexOptions.Singleline);
MessageBox.Show(testUrlOpened.IsMatch(output).ToString()); // ==> True!!!
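The fix comes down to argument order: the pattern goes to the Regex constructor and the text under test goes to IsMatch. A minimal console sketch of that, assuming a shortened ASCII-only stand-in for the netsh output (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlReservationCheck
{
    // The pattern goes into the Regex. Singleline lets '.' also match '\n',
    // so the line breaks do not have to be stripped first.
    private static readonly Regex Reserved = new Regex(
        @"URL .+https://\+:443/SomeWebSite/.+Yes.+Yes.+SDDL.+",
        RegexOptions.Singleline);

    public static bool IsReserved(string netshOutput)
    {
        // The text under test is the argument to IsMatch, not to the constructor.
        return Reserved.IsMatch(netshOutput);
    }

    public static void Main()
    {
        // Shortened, ASCII-only stand-in for the real output (assumption).
        string output = "\r\n URL reservee : https://+:443/SomeWebSite/ \r\n"
                      + " Ecouter: Yes\r\n Deleguer: Yes\r\n SDDL: D:(A;;GA;;;NS) \r\n";

        Console.WriteLine(IsReserved(output)); // True
        Console.WriteLine(IsReserved("URL"));  // False
    }
}
```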

A regex that accepts decimal numbers only, without using an escape character:
^[0-9]+([.][0-9]+)?$

Related

Splitting of a string using Regex

I have string of the following format:
string test = "test.BO.ID";
My aim is to get the part of the string that comes after the first dot.
So ideally I am expecting output as "BO.ID".
Here is what I have tried:
// Checking for the first occurrence and taking whatever comes after the dot
var output = Regex.Match(test, @"^(?=.).*?");
The output I am getting is empty.
What modification do I need to make to the regex?
You get an empty output because the pattern you have can match an empty string at the start of a string, and that is enough since .*? is a lazy subpattern and . matches any char.
Use (the value will be in Match.Groups[1].Value)
\.(.*)
or (with a lookbehind, to get the string as a Match.Value)
(?<=\.).*
See the regex demo and a C# online demo.
A non-regex approach is to use String.Split with the count argument (demo):
var s = "test.BO.ID";
var res = s.Split(new[] {"."}, 2, StringSplitOptions.None);
if (res.Length > 1)
    Console.WriteLine(res[1]);
If you only want the part after the first dot, you don't need a regex at all (the + 1 skips the dot itself):
x.Substring(x.IndexOf('.') + 1)
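The regex and non-regex routes can be compared side by side. This is a small sketch, not code from the answers: the class and method names are made up, and returning the whole input when there is no dot is an assumption about the desired behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AfterFirstDot
{
    public static string WithRegex(string s)
    {
        // Lookbehind: start right after a dot, then take the rest greedily.
        // The leftmost match wins, so this begins after the first dot.
        Match m = Regex.Match(s, @"(?<=\.).*");
        return m.Success ? m.Value : s;
    }

    public static string WithIndexOf(string s)
    {
        int i = s.IndexOf('.');
        // No dot found: return the input unchanged (assumed behaviour).
        return i < 0 ? s : s.Substring(i + 1);
    }

    public static void Main()
    {
        Console.WriteLine(WithRegex("test.BO.ID"));   // BO.ID
        Console.WriteLine(WithIndexOf("test.BO.ID")); // BO.ID
        Console.WriteLine(WithIndexOf("nodot"));      // nodot
    }
}
```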

C# Replace everything except two cases

How can I do something like this?
new Regex("([^my]|[^test])").Replace("Thats my working test", "");
I want to get this:
my test
But I get an empty string, because everything is replaced with nothing.
Thank you in advance!
You can use this lookahead-based regex:
new Regex(@"(?!\b(?:my|test)\b)\b(\w+)\s*").Replace("Thats my working test", "");
//=> my test
Your use of negation in a character class is incorrect here: ([^my]|[^test])
Inside a character class, every character is checked individually, not as a string.
RegEx Demo
Use this regex replacement:
new Regex(@"\b(?!my|test)\w+\s?").Replace("Thats my working test", "");
Here is a regex demo!
\b Asserts the position before our word to check.
(?! Negative lookahead - asserts that our match is NOT:
my|test The character sequences "my" or "test".
)
\w+ Then match the word because it's what we want.
\s? And scrap the whitespace after it if it's there, too.
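The breakdown above can be turned into a small helper that takes the words to keep as a parameter. This is only a sketch: KeepOnly is a hypothetical name, and it adds a trailing \b inside the lookahead (not in the pattern above) so that a longer word such as "myself" is not mistaken for "my".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepWords
{
    public static string KeepOnly(string input, params string[] keep)
    {
        // Escape each word so regex metacharacters in it are treated literally.
        string alternatives = string.Join("|", keep.Select(w => Regex.Escape(w)));

        // \b(?!(?:my|test)\b) : at a word start that is not one of the kept words
        // \w+\s?              : consume that word and at most one following whitespace char
        string pattern = @"\b(?!(?:" + alternatives + @")\b)\w+\s?";

        return Regex.Replace(input, pattern, "");
    }

    public static void Main()
    {
        Console.WriteLine(KeepOnly("Thats my working test", "my", "test")); // my test
    }
}
```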
I can suggest using the following regex:
var res = new Regex(@"my(?:$|[\s\.;\?\!,])|test(?:$|[\s\.;\?\!,])").Replace("Thats my working test", "");
Upd: Or even simpler:
var res = new Regex(@"my($|[\s])|test($|[\s])").Replace("Thats my working test", "");
Upd2: If you don't know in advance which words you'll use, you can make it even more flexible:
private string ExeptWords(string input, string[] exept){
    // Builds e.g. my($|[\s])|test($|[\s]) from the given words
    string tmpl = @"{0}($|[\s])";
    var regexp = string.Join("|", exept.Select(s => string.Format(tmpl, s)));
    return new Regex(regexp).Replace(input, "");
}

Regex pattern for text between 2 strings

I am trying to extract all of the text (shown as xxxx) in the following pattern:
Session["xxxx"]
using C#.
The prefix may also be Request.Querystring["xxxx"], so I am trying to build the expression dynamically. When I do so, I get all sorts of problems with unescaped characters or no matches :(
An example might be:
string patternstart = "Session[";
string patternend = "]";
string regexexpr = @"\\" + patternstart + @"(.*?)\\" + patternend;
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, regexexpr);
Can anyone help with this, as I am stumped (as I always seem to be with RegEx :) )?
With a few small modifications to your code:
string patternstart = Regex.Escape("Session[");
string patternend = Regex.Escape("]");
string regexexpr = patternstart + @"(.*?)" + patternend;
The pattern you construct in your example looks something like this:
\\Session[(.*?)\\]
There are a couple of problems with this. First, it assumes the string starts with a literal backslash. Second, it wraps the entire (.*?) in a character class, which means it will match any single open parenthesis, period, asterisk, question mark, close parenthesis or backslash. You need to escape the opening bracket in your pattern if you want to match a literal [.
You could use a pattern like this:
Session\[(.*?)]
For example:
string regexexpr = @"Session\[(.*?)]";
string sText = "Text to be searched containing Session[\"xxxx\"] the result would be xxxx";
MatchCollection matches = Regex.Matches(sText, regexexpr);
Console.WriteLine(matches[0].Groups[1].Value); // prints "xxxx" (quotes included)
The characters [ and ] have a special meaning in regular expressions: they define a character class, where one of the contained characters must match. To work around this, simply escape them with a leading \ character:
string patternstart = @"Session\[";
string patternend = @"\]";
An example final pattern could then be:
Session\["(.*)"\]
However, you could easily write your regex to handle Session, Querystring, etc. automatically if you require (without also matching every other indexer you throw at it), and avoid having to build up the string in the first place:
(Querystring|Session|Form)\["(.*)"\]
and then take the second capture group.
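For the dynamic case from the question, the prefix can be escaped at run time with Regex.Escape. The sketch below assumes the keys are always wrapped in double quotes; ExtractKeys and the sample text are illustrative and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IndexerKeys
{
    public static List<string> ExtractKeys(string text, string prefix)
    {
        // Regex.Escape turns "Session[" into "Session\[" so '[' is literal.
        // A closing ']' outside a character class is already literal.
        string pattern = Regex.Escape(prefix) + "\"(.*?)\"]";

        var keys = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            keys.Add(m.Groups[1].Value); // group 1 is the text between the quotes
        }
        return keys;
    }

    public static void Main()
    {
        string text = "a = Session[\"xxxx\"]; b = Request.QueryString[\"id\"];";

        Console.WriteLine(string.Join(",", ExtractKeys(text, "Session[")));             // xxxx
        Console.WriteLine(string.Join(",", ExtractKeys(text, "Request.QueryString["))); // id
    }
}
```

The lazy (.*?) keeps each match from running on to a later closing quote when the text contains several indexers.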

Need Regex to match [#URL^Url Description^#]

I need regex to find this text
[#URL^Url Description^#]
in a string and replace it with
Url Description
"Url Description" can be set of characters in any language.
Any Regex Experts out there to help me?
Thanks.
It might be a bit confusing, but you can use the following:
string str = @"[#URL^Url Description^#]";
var regex = new Regex(@"^[^^]+\^([^^]+)\^[^^]+$");
var result = regex.Replace(str, @"$1");
The first ^ means the beginning of the string;
The [^^]+ means anything not a caret character;
The \^ is a literal caret;
The $ is the end of the string.
Basically, it captures all the characters between the carets (^) and replaces the whole string with them.
See ideone demo.
You can also replace the last line with this, to put the captured text in between <a> tags:
var result = regex.Replace(str, "<a href=\"" + link + "\">$1</a>");
where link is the variable containing the link you want to use.
Why don't you use String.Replace()? A regex would work, but it looks like the format is well defined and regexes are harder to read.
string url = "[#URL^blah^#]";
string url_html = url.Replace("[#URL^", "<a href=\"http://www.somewhere.net\">")
                     .Replace("^#]", "</a>");

RegEx Problem using .NET

I have a little problem with a regex pattern in C#. Here are the rules:
input: 1234567
expected output: 123/1234567
Rules:
Get the first three digits of the input. //123
Add /
Append the original input. //123/1234567
The expected output should look like this: 123/1234567
Here's my regex pattern:
Regex rx = new Regex(@"((\w{1,3})(\w{1,7}))");
but the output is incorrect: 123/4567
I think this is what you're looking for:
string s = "1234567";
s = Regex.Replace(s, @"(\w{3})(\w+)", "$1/$1$2");
Instead of trying to match part of the string, then match the whole string, just match the whole thing in two capture groups and reuse the first one.
It's not clear why you need a RegEx for this. Why not just do:
string x = "1234567";
string result = x.Substring(0, 3) + "/" + x;
Another option is:
string s = Regex.Replace("1234567", @"^\w{3}", "$&/$&");
That matches 123 and replaces it with 123/123, leaving the tail 4567 in place.
^\w{3} - matches the first 3 characters.
$& - is replaced with the whole match.
You could also use @"^(\w{3})", "$1/$1" if you are more comfortable with it; it is better known.
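A runnable sketch of the $& substitution described above; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrefixWithFirstThree
{
    public static string Format(string input)
    {
        // ^\w{3} matches only the first three word characters.
        // $& in the replacement stands for that whole match, so "123"
        // becomes "123/123" and the rest of the input is left untouched.
        return Regex.Replace(input, @"^\w{3}", "$&/$&");
    }

    public static void Main()
    {
        Console.WriteLine(Format("1234567")); // 123/1234567
    }
}
```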
Use positive look-ahead assertions, as they don't 'consume' characters in the input while still capturing into groups:
Regex rx = new Regex(@"(?=(?'group1'\w{1,3}))(?=(?'group2'\w{1,7}))");
group1 should be 123, group2 should be 1234567.
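A sketch of the look-ahead idea, with each named group placed inside its own look-ahead so that both groups start capturing at the same position. The group names first and all, and the helper name, are chosen here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadGroups
{
    public static string Format(string input)
    {
        // Both look-aheads are evaluated at the same position and consume nothing,
        // so the overall match is empty while the groups hold the captured text.
        Match m = Regex.Match(input, @"(?=(?<first>\w{1,3}))(?=(?<all>\w{1,7}))");
        if (!m.Success)
        {
            return input;
        }
        return m.Groups["first"].Value + "/" + m.Groups["all"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Format("1234567")); // 123/1234567
    }
}
```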
