C#, Regex, ASCII Escape Characters - won't match

string pat12 = @"\e\[36m(.+)";
line2parse = "[36mA Rocky Landing";
if (Regex.Match(line2parse,pat12).Success)
{
Console.WriteLine("ROOM NAME: "+Regex.Match(line2parse,pat12).Groups[1].ToString().Trim());
}
NOTE: You can't see the ESC character before the [36 in my variable "line2parse", but it's there. To make it show up in Notepad++, I hold down ALT and type "027" on the numpad; it then shows up as ESC in Notepad++. In the C# console it shows up as a left-pointing arrow. Oddly, "ALT + 27" also shows that left arrow in Notepad++ (but if I add the 0, it shows ESC instead of the arrow).
I am doing many Regex matches so the problem isn't with my Regex variable or anything like that. I cannot get this to match properly despite it working at this site: http://regexstorm.net/tester
At that site, this is the pattern: "\e[36m(.+)" (without quotes) and this is the input: [36mA Rocky Landing (again you can't see the escape thing). It then tells me $1 would be "A Rocky Landing" but in my actual code, it doesn't match.
What am I doing wrong?
I have looked through quite a few other, similar posts and, based on what they say, I believe this should work. I even tried [^\x00-\x7F] as my escape char catch and it still won't match.

Why do you think \e is the escape sequence for an escape character? You need to use \x1b instead.
string pat12 = @"\x1b\[36m(.+)";
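For anyone who wants to try the suggested pattern, here is a minimal sketch. The class and method names (AnsiRoomName, ExtractRoomName) are made up for illustration, and the input writes the ESC character explicitly as \u001b so it cannot get lost when the source is pasted between editors.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnsiRoomName
{
    // \x1b in the pattern is the regex escape for U+001B (ESC).
    private const string Pattern = @"\x1b\[36m(.+)";

    // Hypothetical helper: returns the text after ESC[36m,
    // or an empty string when the line does not match.
    public static string ExtractRoomName(string line)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Success ? m.Groups[1].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        // \u001b puts a real ESC character into the string.
        string line2parse = "\u001b[36mA Rocky Landing";
        Console.WriteLine("ROOM NAME: " + ExtractRoomName(line2parse));

        // The same text without the ESC character does not match.
        Console.WriteLine(ExtractRoomName("[36mA Rocky Landing").Length == 0);
    }
}
```

If the second line prints True for your real input as well, the ESC character is probably missing from the string itself rather than from the pattern.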

Related

When multi-line text pasted into text input regex does not match the space

When user pastes something like this (from notepad for example):
multi
line@email.com
into input text box, the line break disappears and it looks like this:
multi
line@email.com
But whatever the line break is converted to does not match this regex:
'\s|\t|\r|\n|\0','i'
so this invalid character passes through js validation to the .NET application code I am working on.
It is interesting but this text editor does the same transformation, that is why I had to post original sample as code. I would like to find out what the line break got converted to, so I can add a literal to the regex but I don't know how. Many thanks!
Here is the whole snippet:
var invalidChars = new RegExp('(^[.])|[<]|[>]|[(]|[)]|[\]|[,]|[;]|[:]|([.])[.]|\s|\t|\r|\n|\0', 'i');
if (text.match(invalidChars)) {
return false;
}

Your immediate problem is escaping. You're using a string literal to create the regex, like this:
'(^[.])|[<]|[>]|[(]|[)]|[\]|[,]|[;]|[:]|([.])[.]|\s|\t|\r|\n|\0'
But before it ever reaches the RegExp constructor, the [\] becomes []; \s becomes s; \0 becomes 0; and \t, \r and \n are converted to the characters they represent (tab, carriage return and linefeed, respectively). That won't happen if you use a regex literal instead, but you still have to escape the backslash to match a literal backslash.
Your regex also has way more brackets than it needs. I think this is what you were trying for:
/^\.|\.\.|[<>()\\,;:\s]/
That matches a dot at the beginning, two consecutive dots, or one of several forbidden characters including any whitespace character (\s matches any whitespace character, not just a space).
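Since the question says the value ends up in .NET application code, the same pattern can be checked on the server side too. This is only a sketch: AddressValidator and HasInvalidChars are hypothetical names, and the pattern is copied from the regex literal above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressValidator
{
    // Same pattern as the regex literal above. The verbatim string (@"...")
    // passes every backslash through to the regex engine unchanged.
    private static readonly Regex InvalidChars =
        new Regex(@"^\.|\.\.|[<>()\\,;:\s]");

    // Hypothetical helper: true when the text contains a forbidden character.
    public static bool HasInvalidChars(string text)
    {
        return InvalidChars.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(HasInvalidChars("multi\r\nline@email.com")); // True
        Console.WriteLine(HasInvalidChars("multiline@email.com"));     // False
        Console.WriteLine(HasInvalidChars(".multiline@email.com"));    // True
    }
}
```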

Ok - here it is
vbCrLF
This is what pasted line breaks are converted to. I added (vbCrLF) group and those spaces are now detected. Thanks, Dan1M
http://forums.asp.net/t/1183613.aspx?Multiline+Textbox+Input+not+showing+line+breaks+in+Repeater+Control

Backslashes not working properly in my web service

I have a simple line of code in a web service:
instance = @"\instanceNameHere";
Yet the output is always the same.
\\instanceNameHere
If I remove the @ and use two slashes, I get the same result. I've never seen this before and my Google-fu has failed me. I even wrote a simple app and the result was correct. So why is it acting up in the web service?

It's escaping the slash for you in the debugger so you know that it's a slash and not an escape sequence like \t. If the debugger did not do this, how could you distinguish the string
\t
from the string
<tab>
in the debugger since the latter is represented in an escape sequence by \t? Therefore the former is shown as
\\t
and the latter as
\t
Write it to a stream or the console and you'll see that it only has one slash, or do instance.Length and compare to a count of the characters. You'll see 17 on the console, whereas \\instanceNameHere has eighteen characters.

The debugger displays strings as C# literals. So it's displaying them with characters escaped. It would also show carriage returns as \r and tabs as \t. This is purely for visualization -- the string does not literally contain these escape characters. If you write it out to a log, it will not include the escape characters -- it will look as you expect.
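A quick way to check this outside the debugger is a small console program. This is just a sketch; BackslashDemo is a made-up name.

```csharp
using System;

public static class BackslashDemo
{
    public static void Main()
    {
        string instance = @"\instanceNameHere";

        Console.WriteLine(instance);        // \instanceNameHere
        Console.WriteLine(instance.Length); // 17

        // A regular literal needs the backslash doubled in source code,
        // yet it produces exactly the same string.
        Console.WriteLine(instance == "\\instanceNameHere"); // True
    }
}
```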

A UNC name of any format, which always start with two backslash characters ("\\").
Link
Update: Please see @Jason's post above! I didn't realise he was checking in the debugger.

Regex vs String.Contains

Hola. I'm failing to write a method to test for words within a plain text or html document. I was reasonably literate with regex, and I am newer to c# (from way more java).
Just 'cause,
string html = source.ToLower();
string plaintext = Regex.Replace(html, @"<(.|\n)*?>", " "); // remove tags
plaintext = Regex.Replace(plaintext, @"\s+", " "); // remove excess white space
and then,
string tag = "c++";
bool foundAsRegex = Regex.IsMatch(plaintext, @"\b" + Regex.Escape(tag) + @"\b");
bool foundAsContains = plaintext.Contains(tag);
For a case where "c++" should be found, sometimes foundAsRegex is true and sometimes false. My google-fu is weak, so I didn't get much back on "what the hell". Any ideas or pointers welcome!
edit:
I'm searching for matches on skills in resumes. for example, the distinct value "c++".
edit:
a real excerpt is given below:
"...administration- c, c++, perl, shell programming..."

The problem is that \b matches between a word character and a non-word character. Given the expression \bc\+\+\b, you have a problem. "+" is a non-word character. So searching for the pattern in "xxx c++, xxx", you're not going to find anything. There's no "word break" after the "+" character.
If you're looking for non-word characters then you'll have to change your logic. Not sure what the best thing would be. I suppose you can use \W, but then it's not going to match at the beginning or end of the line, so you'll need (^|\W) and (\W|$) ... which is ugly. And slow, although perhaps still fast enough depending on your needs.
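One way to get the (^|\W) and (\W|$) behaviour without consuming the neighbouring characters is to use lookarounds. This swaps in a different technique from the one named above, so treat it as a sketch only; SkillMatcher and ContainsSkill are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SkillMatcher
{
    // Hypothetical helper: true when skill appears with no word character
    // directly before or after it.
    public static bool ContainsSkill(string text, string skill)
    {
        // (?<!\w) and (?!\w) also succeed at the start and end of the text.
        string pattern = @"(?<!\w)" + Regex.Escape(skill) + @"(?!\w)";
        return Regex.IsMatch(text, pattern);
    }

    public static void Main()
    {
        string text = "...administration- c, c++, perl, shell programming...";

        Console.WriteLine(Regex.IsMatch(text, @"\bc\+\+\b"));    // False
        Console.WriteLine(ContainsSkill(text, "c++"));           // True
        Console.WriteLine(ContainsSkill("abc++ tools", "c++"));  // False
    }
}
```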

Your regular expression is turning into:
/\bc\+\+\b/
Which means you're looking for a word boundary, followed by the string c++, followed by another word boundary. This means it won't match on strings like abc++, whereas plaintext.Contains will succeed.
If you can give us examples of where your regex fails when you expected it to succeed, then we can give you a more definite answer.
Edit: My original regex was /\bc++\b/, which is incorrect, as c++ is being passed to Regex.Escape(), which escapes out regular expression metacharacters like +. I've fixed it above.

string.replace seriously broken with \

"C://test/test/test.png" -> blub
blub = blub.Replace(@"/", @"\");
result = "C:\\\\test\\test\\test.png"
how does that make sense? It replaces a single / with two \
?

It's actually working:
string blub = "C://test/test/test.png";
string blub2 = blub.Replace(@"/", @"\");
Console.WriteLine(blub);
Console.WriteLine(blub2);
Output:
C://test/test/test.png
C:\\test\test\test.png
BUT viewing the string in the debugger does show the effect you describe (and is how you would write the string literal in code without the @).
I've noticed this before but never found out why the debugger chooses this formatting.

No, it doesn't.
What you're seeing is the properly formatted string according to C# rules, and since the output you're seeing is shown as though you haven't prefixed it with the @ character, every backslash is doubled up, because that's what you would have to write if you wanted that string in the first place.
Create a new console app and write the result to the console, and you'll see that the string looks like you wanted it to.
So this is just an artifact of how you look at the string (I assume the debugger).
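Counting characters makes the same point without relying on how anything is displayed. A small sketch, with ReplaceDemo and ToBackslashes as made-up names:

```csharp
using System;

public static class ReplaceDemo
{
    // Hypothetical helper: swaps every forward slash for one backslash.
    public static string ToBackslashes(string path)
    {
        return path.Replace("/", @"\");
    }

    public static void Main()
    {
        string blub = ToBackslashes("C://test/test/test.png");

        Console.WriteLine(blub);         // C:\\test\test\test.png
        Console.WriteLine(blub.Length);  // 22, same length as the input
        Console.WriteLine(@"\" == "\\"); // True
    }
}
```

The length stays at 22, so each / became exactly one \.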

The \ character in C# is the escape character, so if you are going to use it as a \ character you need two - otherwise the next character gets treated specially (new line etc).
See What character escape sequences are available? (C#)

The character \ is a special character, which changes the meaning of the character after it in string literals. So when you refer to \ itself, it needs to be escaped: \\.

Look up "escape characters".

It's done what it should.
"\\" is the same as @"\"
"\" is an escape character. Without the verbatim indicator "@" before a string, a single \ is shown as "\\".

You should think twice before saying something like that....
The string.Replace function is basic functionality that has been around for a long time.... Whenever you find you have a problem with something like that, it's probably not the function that is broken, but your understanding or use of it.

I need a regular expression to convert US tel number to link

Basically, the input field is just a string. People input their phone number in various formats. I need a regular expression to find and convert those numbers into links.
Input examples:
(201) 555-1212
(201)555-1212
201-555-1212
555-1212
Here's what I want:
(201) 555-1212 - Notice the space is gone
(201)555-1212
201-555-1212
555-1212
I know it should be more robust than just removing spaces, but it is for an internal web site that my employees will be accessing from their iPhone. So, I'm willing to "just get it working."
Here's what I have so far in C# (which should show you how little I know about regular expressions):
strchk = Regex.Replace(strchk, @"\b([\d{3}\-\d{4}|\d{3}\-\d{3}\-\d{4}|\(\d{3}\)\d{3}\-\d{4}])\b", "<a href='tel:$&'>$&</a>", RegexOptions.IgnoreCase);
Can anyone help me by fixing this or suggesting a better way to do this?
EDIT:
Thanks everyone. Here's what I've got so far:
strchk = Regex.Replace(strchk, @"\b(\d{3}[-\.\s]\d{3}[-\.\s]\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]\d{4}|\d{3}[-\.\s]\d{4})\b", "<a href='tel:$1'>$1</a>", RegexOptions.IgnoreCase);
It is picking up just about everything EXCEPT those with (nnn) area codes, with or without spaces between it and the 7 digit number. It does pick up the 7 digit number and link it that way. However, if the area code is specified it doesn't get matched. Any idea what I'm doing wrong?
Second Edit:
Got it working now. All I did was remove the \b from the start of the string.

Remove the [] and add \s* (zero or more whitespace characters) around each \-.
Also, you don't need to escape the -. (You can take out the \ from \-)
Explanation: [abcA-Z] is a character group, which matches a, b, c, or any character between A and Z.
It's not what you're trying to do.
Edits
In response to your updated regex:
Change [-\.\s] to [-\.\s]+ to match one or more of any of those characters (eg, a - with spaces around it)
The problem is that \b doesn't match the boundary between a space and a (.

Afaik, no phone enters the other characters, so why not replace [^0-9] with '' ?

Here's a regex I wrote for finding phone numbers:
(\+?\d[-\.\s]?)?(\(\d{3}\)\s?|\d{3}[-\.\s]?)\d{3}[-\.\s]?\d{4}
It's pretty flexible... allows a variety of formats.
Then, instead of killing yourself trying to replace it w/out spaces using a bunch of back references, instead pass the match to a function and just strip the spaces as you wanted.
C#/.net should have a method that allows a function as the replace argument...
Edit: They call it a MatchEvaluator. That example uses a delegate, but I'm pretty sure you could use the slightly less verbose
m => m.Value.Replace(" ", "")
or something. working from memory here.
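Putting the pieces together, a rough sketch of the MatchEvaluator approach might look like this. It uses the asker's final pattern with the leading \b removed, not the regex from this answer. PhoneLinker and Linkify are hypothetical names, and keeping the original text as the link label while stripping spaces only from the href is an assumption about what was wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneLinker
{
    // The asker's final pattern with the leading \b removed.
    private static readonly Regex Phone = new Regex(
        @"(\d{3}[-\.\s]\d{3}[-\.\s]\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]\d{4}|\d{3}[-\.\s]\d{4})\b");

    // Hypothetical helper: wraps each phone number in a tel: link.
    public static string Linkify(string input)
    {
        // The lambda is converted to a MatchEvaluator delegate.
        return Phone.Replace(input, m =>
        {
            // Assumption: only the href needs the spaces removed.
            string dialable = m.Value.Replace(" ", "");
            return "<a href='tel:" + dialable + "'>" + m.Value + "</a>";
        });
    }

    public static void Main()
    {
        Console.WriteLine(Linkify("Call (201) 555-1212 or 555-1212."));
    }
}
```

For that input the href of the first link should come out as tel:(201)555-1212, with the visible text left as typed.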
