Regex between " but not beyond - c#

I want to use a regex to capture everything between " characters (including the quotes themselves).
The problem is this:
Regex:
\\\"(.[^,][^\\\"]*)\\\"
Text:
"text", text2, "text"
meeeh = "Y"
else
meeeh2 = "N"
With this regex, the following is selected:
"text" "text"
"Y"
else
meeeh2 = "
The problem seems to be that the regex doesn't stop when nothing is behind the " or when there is a newline.
Any ideas?

.*?(\".*?\").*?
Try this. Please have a look at the demo:
http://regex101.com/r/cA4wE0/7

When it reaches the first " in "Y", this is what the regex does:
\" matches "
. matches Y
[^,] matches "
[^\"]* matches else meeeh2 =
\" matches "
Essentially you're looking for "Any character, then anything that's not a comma, then anything but double quotes until the end" between the quotes. This means at least 2 characters, but Y is only 1.
If you mean anything but quotes between quotes, use \"([^"]*)\". If you mean anything but quotes and commas, \"([^",]*)\" should do.
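As a rough sketch of the first suggestion, the snippet below runs "[^"]*" (with the quotes included in the match) against the sample text from the question. The variable names are only illustrative, and the pattern is written as a verbatim string, where "" stands for a single quote character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample text from the question, with real line breaks.
string text = "\"text\", text2, \"text\"\nmeeeh = \"Y\"\nelse\nmeeeh2 = \"N\"";

// Verbatim string: "" is one quote, so the regex itself is "[^"]*"
var quoted = Regex.Matches(text, @"""[^""]*""")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

foreach (var q in quoted)
    Console.WriteLine(q);
```

Because [^"]* cannot cross a quote, each match ends at the nearest closing quote instead of running on to the next line.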

Related

how to regex that kind of string

I want to parse this string with a regex, but I really don't know how. I have figured out how to get the numbers, but not the other strings.
string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";
string regexstring = "[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]*(?<id>)";
Code
Match m = Regex.Match(text, regexstring);
if(m.Success)
Console.WriteLine(m.Groups[0]);
Output
1cb07348-34a4-4741-b50f-c41e584370f7
Now I want the output to be this:
1cb07348-34a4-4741-b50f-c41e584370f7
Youtuber
https://youtube.com/lol
love youtube
I already have the first line of the output, but I don't know how to match the other strings.
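(\w+-){4}\w+ is a cleaner replacement for what you already did (the ID has five blocks separated by four hyphens).
\w means roughly [a-zA-Z0-9_]. In .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set.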
Then, if your string always has a website preceded and followed by a number of words separated by spaces, you can do this:
string regexstring = @"((\w*-){4})(\w*) (.+?)[A-Za-z]?(https://[^ ]+?) (.+)";
Output
Match m = Regex.Match(text, regexstring);
if(m.Success)
// Group 2 is the inner (\w*-) and only holds its last repetition, so it is not printed.
Console.WriteLine(m.Groups[1].Value + m.Groups[3].Value + "\n" + m.Groups[4].Value + "\n" + m.Groups[5].Value + "\n" + m.Groups[6].Value);
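For what it's worth, the same idea can be sketched with named groups, which avoids counting group numbers. The group names (id, name, url, rest) and the exact pattern are my own assumptions for illustration, not something from the question, and the sketch assumes every input has the same four parts as the sample.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";

// Hypothetical group names; the pattern assumes: ID, name, URL, remaining words.
var m = Regex.Match(
    text,
    @"^(?<id>[0-9a-f]+(?:-[0-9a-f]+){4})\s+(?<name>.+?)\s+(?<url>https?://\S+)\s+(?<rest>.+)$");

string id = m.Groups["id"].Value;
string name = m.Groups["name"].Value;
string url = m.Groups["url"].Value;
string rest = m.Groups["rest"].Value;

if (m.Success)
{
    Console.WriteLine(id);
    Console.WriteLine(name);
    Console.WriteLine(url);
    Console.WriteLine(rest);
}
```

If the match fails, every group's Value is an empty string, so checking m.Success before using the values is still worthwhile.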
Assuming all of your inputs look like the sample, this expression might be close to what you have in mind:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s+(.*?)\s+[A-Z](https?:\/\/\S+)\s+(.*)$
The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it, and you can also watch there how it matches against sample inputs.
Reference
Searching for UUIDs in text with regex

Regex to remove whitespace except when it is inside " "

I have looked everywhere but can't find the answer. I need a regex that removes all spaces in a string except the ones that are inside "".
Example: $F:2 $PX:30 $PY:980 $T: " " or $F:A $PX:30B $PY:9K80 $T: " " so in the end it should look like $F:2$PX:30$PY:980$T:" "
It would be valuable if you explained how to read the regex in your answer.
This matches any run of spaces unless it has a " directly on both sides:
" +(?!\")|(?<!\") +"
And for all white space:
"\s+(?!\")|(?<!\")\s+"
You can test it on Regex101 or Rextester
The Greatest Regex Trick Ever is quite helpful in such cases:
var str = "$F:2 $PX:30 $PY:980 \" \"$T:\" \"";
str = Regex.Replace(str, "\"\\s+\"|\\s+", m => { return m.Value.StartsWith("\"") ? m.Value : ""; });
Console.WriteLine(str);
Demo: https://dotnetfiddle.net/Q54FlJ
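A slightly more general variant of the same trick, as a sketch only: instead of matching just whitespace between quotes, match any whole quoted section and hand it back unchanged, and delete every other whitespace run. This assumes the quotes in the input are balanced and never escaped; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// First example from the question.
string input = "$F:2 $PX:30 $PY:980 $T: \" \"";

// Alternative 1: a complete quoted section (kept as is).
// Alternative 2: a run of whitespace outside quotes (removed).
string stripped = Regex.Replace(
    input,
    "\"[^\"]*\"|\\s+",
    m => m.Value[0] == '"' ? m.Value : "");

Console.WriteLine(stripped);
```

Since the quoted alternative comes first and consumes the whole quoted section, the whitespace alternative never gets a chance to match inside it.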
Matching a space neither preceded nor followed by a quotation mark:
(?<!") (?!")
Matching all whitespace:
(?<!")\s+(?!")
Note: This might not work on more than one space, as pointed out by Dmitry.

Getting the substring after a character in C# using regex

I have the following input string:
string val = "[01/02/70]\nhello world ";
I want to get all the words after the last ] character.
Example output for a sample string above:
\nhello world
In C#, use Substring() with IndexOf:
val = val.Substring(val.IndexOf(']') + 1);
If you have multiple ] symbols, and you want to get all the string after the last one, use LastIndexOf:
string val = "[01/02/70]\nhello [01/02/80] world ";
val = val.Substring(val.LastIndexOf(']') + 1); // => " world "
If you are a fan of Regex, you might want to use a Regex.Replace like
string val = "[01/02/70]\nhello [01/02/80] world ";
val = Regex.Replace(val, @"^.*\]", string.Empty, RegexOptions.Singleline); // => " world "
See demo
Notes on REGEX:
RegexOptions.Singleline makes . match a linebreak
^ - matches beginning of string
.* - matches 0 or more characters but as many as possible (greedy matching)
\] - matches a literal ] (in .NET the escape is optional outside a character class, but it is harmless and makes the intent clear).
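To compare the two approaches side by side, here is a small sketch using the two-bracket sample string from above. The variable names viaIndex and viaRegex are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string val = "[01/02/70]\nhello [01/02/80] world ";

// Plain string methods: everything after the last ']'.
string viaIndex = val.Substring(val.LastIndexOf(']') + 1);

// Regex: greedily delete everything up to and including the last ']'.
string viaRegex = Regex.Replace(val, @"^.*\]", string.Empty, RegexOptions.Singleline);

Console.WriteLine(viaIndex == viaRegex);
```

One detail worth knowing: if the string contains no ] at all, LastIndexOf returns -1, so Substring(0) returns the whole string, and the regex version also leaves the string untouched.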
You need to use a lookbehind assertion. You also have to enable the DOTALL modifier, so that the match includes the newline characters in between.
"(?s)(?<=\\]).*"
(?s) - DOTALL modifier.
(?<=\\]) - lookbehind which asserts that the match must be preceded by a close bracket
.* - Matches any character zero or more times.
or
"(?s)(?<=\\])[\\s\\S]*"
Try this if you don't want to match the following newline character.
@"(?<=\][\n\r]*).*"
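A minimal sketch of the lookbehind version, run on the sample from the question. Dropping the leading line break is done here with TrimStart rather than inside the pattern, which is my own simplification and not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

string val = "[01/02/70]\nhello world ";

// (?s) lets . match line breaks; the lookbehind anchors the match right after a ']'.
string afterBracket = Regex.Match(val, @"(?s)(?<=\]).*").Value;

// Optional: strip the line break(s) that directly follow the bracket.
string withoutLeadingNewlines = afterBracket.TrimStart('\r', '\n');

Console.WriteLine(withoutLeadingNewlines);
```

Note that with more than one ] in the input, this pattern starts after the first bracket, not the last one.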

C# Regex Can't Match Anything (Probably because can't escape characters properly)

I made a regex pattern and tested it on this site: http://rubular.com/
I entered the pattern into the first box on that site exactly like this:
<div class="product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/>
I left the second box empty.
According to this site, my regex pattern works perfectly fine.
But I can't get it working in C#.
I'm trying this:
WebClient client = new WebClient();
string MainPage = client.DownloadString("http://www.vatanbilgisayar.com/cep-telefonu-modelleri/");
string ItemPattern = "<div class=\"product clearfix\">\\n+" + // <div class="product clearfix">\n
"<div class=\"img\">\\n" + // <div class="img">\n
"+<a href=\"(.*?)\">\\n" + // +<a href="(.*?)">\n
"+<img class=\"lazyload\"" + // +<img class="lazyload"
"id='.*' data-original=\"(.*?)\"" + // id='.*' data-original="(.*?)"
"alt=\".*\" title=\"(.*?)\"\\/>"; // alt=".*" title="(.*?)" \/>
MatchCollection matches = Regex.Matches(MainPage, ItemPattern);
foreach (Match match in matches)
{
Console.WriteLine("Area Code: {0}", match.Groups[1].Value);
Console.WriteLine("Telephone number: {0}", match.Groups[2].Value);
Console.WriteLine();
}
I simply escaped every " with \ . I really don't understand why it's not working, and this is starting to drive me crazy.
You need 2 layers of escape sequences. You need to escape once for c# and once more for the regex syntax.
If you want to escape characters for the regex, you have to escape the \ too, so you should change each \ to \\ for escape sequences at the regex level.
Use two \'s for every single \ in your pattern, not counting the escaping you already did for the quotes, since \ is also an escape character in C# strings. In your pattern this mainly concerns the \n, which occurs 3 times.
Original String:
"product clearfix">\n+<div class="img">\n+<a href="(.*?)">\n+<img class="lazyload" id='.*' data-original="(.*?)" alt=".*" title="(.*?)" \/
Also, you can break the pattern up over more than one line. C# ignores whitespace between tokens, so just close the quote, add a + at the end of the line, and start the next line with another quote. Whitespace inside the quotes is still part of the pattern, though, so keep the spaces that separate the attributes.
C# String:
string ItemPattern = "<div class=\"product clearfix\">\\n" + // <div class="product clearfix">\n
"+<div class=\"img\">\\n" + // +<div class="img">\n
"+<a href=\"(.*?)\">\\n" + // +<a href="(.*?)">\n
"+<img class=\"lazyload\" " + // +<img class="lazyload" (note the trailing space)
"id='.*' data-original=\"(.*?)\" " + // id='.*' data-original="(.*?)" (note the trailing space)
"alt=\".*\" title=\"(.*?)\" \\/>"; // alt=".*" title="(.*?)" \/>
If you still have a problem with it, there is something else wrong, probably in the Regex.Matches(MainPage, ItemPattern) call. According to the debugging you did, it sounds like the string is being created successfully but the MatchCollection is empty. So the problem is either in how you obtain the matches or in how you reference them.
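To make the two layers of escaping concrete, here is a small sketch with a shortened piece of the pattern. It builds the same pattern once as a regular string (where \\ and \" are needed) and once as a verbatim string (where only "" is needed), then checks that both are identical. The html sample is made up for illustration and is not taken from the real page.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal: \" for a quote, \\n to hand the regex a literal \n.
string regular = "<a href=\"(.*?)\">\\n+<img class=\"lazyload\" id='.*'";

// Verbatim string literal: "" for a quote, backslashes are passed through as is.
string verbatim = @"<a href=""(.*?)"">\n+<img class=""lazyload"" id='.*'";

bool samePattern = regular == verbatim;
Console.WriteLine(samePattern);

// Hypothetical input with a real line break between the tags.
string html = "<a href=\"/p/1\">\n<img class=\"lazyload\" id='x'";
Match m = Regex.Match(html, verbatim);
string href = m.Groups[1].Value;
Console.WriteLine(href);
```

Printing the pattern string before passing it to Regex is a quick way to see what the regex engine actually receives, including any spaces lost while splitting the literal over several lines.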

Regex syntax in a C# application

I am trying to figure out how to replace all punctuation in a string with a space, while keeping one special character: '-'
For example, the sentence
"hi! I'm an out-of-the-box person, did you know ?"
should be transformed into
"hi I m an out-of-the-box person did you know "
I know the solution will be a single line Regex expression, but I'm really not used to "think" in Regex, so what I have tried so far is replacing all '-' by '9', then replacing all punctuation by ' ', then re-replacing all '9' by '-'. It works, but this is awful (especially if the input contains some '9' characters) :
string s = @"Hello! Hi want to remove all punctuations but not ' - ' signs ... Please help ;)";
s = s.Replace("-", "9");
s = Regex.Replace(s, @"[\W_]", " ");
s = s.Replace("9", "-");
So, can someone help me write a Regex that only catches punctuation other than '-'?
How about replacing matches for the following regex with a space:
[^\w\s-]|_
This matches any character that is not a word character (letter, digit, or underscore), whitespace, or dash; the |_ alternative additionally matches the underscore.
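A quick sketch of that pattern applied to the sentence from the question. The second Replace, which collapses runs of whitespace, is an extra step I added only so the result lines up with the expected output in the question; it is not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "hi! I'm an out-of-the-box person, did you know ?";

// Replace every punctuation character except '-' (and also '_') with a space.
string cleaned = Regex.Replace(s, @"[^\w\s-]|_", " ");

// Optional extra step: collapse the resulting whitespace runs to single spaces.
string collapsed = Regex.Replace(cleaned, @"\s+", " ");

Console.WriteLine(collapsed);
```

Without the collapsing step, places like "hi! I" end up with two consecutive spaces, because the punctuation is replaced while the original space is kept.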
This regex should help. It uses character class subtraction to exclude some characters from a character class.
var expected = Regex.Replace(subject, @"[_\W-[\-\s]]", " ");
You can do this by using Linq:
var chars = s.Select(c => char.IsPunctuation(c) && c != '-' ? ' ' : c);
var result = new string(chars.ToArray());
Place everything you consider punctuation into a set [ ... ] and look for that as a single match character in a ( ... ) to be replaced. Here is an example where I seek to replace the characters ! . , ' and ?
string text = "hi! I'm an out-of-the-box person, did you know ?";
Console.WriteLine (
Regex.Replace(text, "([!.,'?])", " ")
);
// result:
// hi I m an out-of-the-box person did you know
Update
For the regex purist who doesn't want to list every punctuation character, one can use set subtraction. I still specify a set, \W, which matches any non-word character, including the -. But by using set subtraction -[ ... ] we can name the - as the character to be excluded.
Here is that example
Regex.Replace(text, @"([\W-[-]])", " ")
// result:
// hi I m an out-of-the-box person did you know
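As a sanity check on the subtraction syntax, the sketch below compares the subtraction form from one of the answers above, [_\W-[\-\s]], with the negated-class form [^\w\s-]|_ on a made-up input that contains an underscore. The input string and variable names are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input with an underscore, a dash, a space and punctuation.
string sample = "a_b-c d!e";

// Base set (underscore plus non-word characters) minus dash and whitespace.
string viaSubtraction = Regex.Replace(sample, @"[_\W-[\-\s]]", " ");

// Negated class plus an explicit underscore alternative.
string viaNegation = Regex.Replace(sample, @"[^\w\s-]|_", " ");

Console.WriteLine(viaSubtraction);
Console.WriteLine(viaSubtraction == viaNegation);
```

Both forms describe the same set of characters; which one reads better is mostly a matter of taste, though class subtraction is specific to .NET and will not port to most other regex flavors.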
