Regex to remove whitespace except when it is inside " " - c#

I have looked everywhere but can't find the answer. I need a regex that removes all spaces in a string except the ones inside double quotes ("").
Example input: $F:2 $PX:30 $PY:980 $T: " " or $F:A $PX:30B $PY:9K80 $T: " ". For the first one, the result should look like $F:2$PX:30$PY:980$T:" "
It would also help if you explained how to read the regex in your answer.

This matches spaces that are not enclosed by quotes on both sides (a space touching only one quote still matches):
" +(?!\")|(?<!\") +"
And for all whitespace (not just spaces), as a C# string literal:
"\\s+(?!\")|(?<!\")\\s+"
You can test it on Regex101 or Rextester.
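A minimal sketch of applying the first pattern with Regex.Replace, written as a C# verbatim string (@"..." with "" standing for each quote). The class and method names are illustrative, not from the answer. It keeps a space only when a quote sits directly on both sides of it, so it suits inputs like $T: " " where the quoted part is a single space.

```csharp
using System.Text.RegularExpressions;

public static class UnquotedSpaceRemover
{
    // Same pattern as above: a run of spaces NOT followed by a quote,
    // OR a run of spaces NOT preceded by a quote.
    // Only a space with a quote on both sides fails both alternatives and survives.
    private static readonly Regex SpacesOutsideQuotes = new Regex(@" +(?!"")|(?<!"") +");

    public static string RemoveUnquotedSpaces(string input) =>
        SpacesOutsideQuotes.Replace(input, "");
}
```

On the question's first example, RemoveUnquotedSpaces returns $F:2$PX:30$PY:980$T:" ".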

The Greatest Regex Trick Ever is quite helpful in such cases:
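using System;
using System.Text.RegularExpressions;
var str = "$F:2 $PX:30 $PY:980 \" \"$T:\" \"";
// Quoted whitespace ("\s+") is matched first and written back; any other whitespace is dropped.
str = Regex.Replace(str, "\"\\s+\"|\\s+", m => m.Value.StartsWith("\"") ? m.Value : "");
Console.WriteLine(str);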
Demo: https://dotnetfiddle.net/Q54FlJ
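A sketch of a variant of the same trick, assuming quotes are balanced and never escaped: the first alternative matches a whole quoted string, whatever it contains, and the evaluator writes it back unchanged, while any other whitespace run is removed. Unlike "\s+", this also keeps spaces inside quoted text such as "a b". The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class QuoteAwareWhitespace
{
    // Alternative 1: a complete quoted string ("...", any content except quotes).
    // Alternative 2: any other run of whitespace.
    // Scanning is left to right, so whitespace inside a quoted string is consumed
    // by alternative 1 and never reaches alternative 2.
    // Assumption: quotes are balanced and not escaped.
    private static readonly Regex QuotedOrWhitespace = new Regex(@"""[^""]*""|\s+");

    public static string RemoveWhitespaceOutsideQuotes(string input) =>
        // Keep quoted matches as they are, drop everything else that matched.
        QuotedOrWhitespace.Replace(input, m => m.Value.StartsWith("\"") ? m.Value : "");
}
```

For example, a "b c" d becomes a"b c"d.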

Matching a space that is neither preceded nor followed by a quotation mark:
(?<!") (?!")
Matching all whitespace:
(?<!")\s+(?!")
Note: This might not work on more than one space, as pointed out by Dmitry.
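A small C# sketch for trying the whitespace version of this pattern (class and method names are made up). Applied to the question's first example, $F:2 $PX:30 $PY:980 $T: " ", it returns $F:2$PX:30$PY:980$T: " ": the space between $T: and the opening quote is kept because it is followed by a quote, so the result still differs from the desired output at that spot.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundSpaceDemo
{
    // (?<!")\s+(?!") as a verbatim string: whitespace that is neither
    // preceded nor followed by a double quote.
    private static readonly Regex Unquoted = new Regex(@"(?<!"")\s+(?!"")");

    public static string Strip(string input) => Unquoted.Replace(input, "");
}
```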


how to regex that kind of string

I want to split that string with a regex, but I really don't know how. I have figured out how to get the ID part, but not the other parts.
string text = "1cb07348-34a4-4741-b50f-c41e584370f7 Youtuber https://youtube.com/lol love youtube";
string regexstring = "[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]*(?<id>)";
Code:
Match m = Regex.Match(text, regexstring);
if(m.Success)
Console.WriteLine(m.Groups[0]);
Output
1cb07348-34a4-4741-b50f-c41e584370f7
Now I want the output to be:
1cb07348-34a4-4741-b50f-c41e584370f7
Youtuber
https://youtube.com/lol
love youtube
I have produced the first line of the output, but I don't know how to match the other parts.
([\w]+-){4}\w+ is a cleaner replacement for what you already have.
\w means roughly [a-zA-Z0-9_] (in .NET it also matches other Unicode letters and digits).
Then, if your string always has a website preceded and followed by a number of words separated by spaces, you can do this:
string regexstring = @"((\w*-){4})(\w*) (.+?)[A-Za-z]?(https://[^ ]+?) (.+)";
Printing the groups:
Match m = Regex.Match(text, regexstring);
if (m.Success)
    // Group 2 is the inner repeated (\w*-), so skip it; group 4 ends with a space, hence Trim().
    Console.WriteLine(m.Groups[1].Value + m.Groups[3].Value + "\n" + m.Groups[4].Value.Trim() + "\n" + m.Groups[5].Value + "\n" + m.Groups[6].Value);
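If numbered groups get confusing (the inner (\w*-) in the pattern above is group 2), named groups are an alternative. This is a sketch, not the answer's pattern: it assumes the line is always ID, name, https URL, then the rest, separated by single spaces, with nothing glued in front of https://. The class and group names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineParser
{
    // id:   four "word-" chunks followed by a final word chunk (UUID-like)
    // name: shortest text up to the space before the URL
    // url:  "https://" followed by non-space characters
    // rest: everything after the URL
    private static readonly Regex Line = new Regex(
        @"^(?<id>(?:\w+-){4}\w+) (?<name>.+?) (?<url>https://\S+) (?<rest>.+)$");

    public static string[] Parse(string text)
    {
        Match m = Line.Match(text);
        if (!m.Success) return Array.Empty<string>();
        return new[]
        {
            m.Groups["id"].Value,
            m.Groups["name"].Value,
            m.Groups["url"].Value,
            m.Groups["rest"].Value,
        };
    }
}
```

Printing the four elements on separate lines gives the output asked for in the question.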
If your inputs always look like this one, this expression might be close to what you have in mind, though I'm not sure:
^(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)\s+(.*?)\s+[A-Z](https?:\/\/\S+)\s+(.*)$
The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it, and the linked demo shows how it would match against some sample inputs.
Reference
Searching for UUIDs in text with regex

Regex between " but not beyond

I want to use a regex to capture everything between double quotes (including the quotes themselves).
The problem is this:
Regex:
\\\"(.[^,][^\\\"]*)\\\"
Text:
"text", text2, "text"
meeeh = "Y"
else
meeeh2 = "N"
With this regex, the following is selected:
"text" "text"
"Y"
else
meeeh2 = "
The problem seems to be that the regex doesn't stop when nothing is behind the " or when there is a newline.
Any ideas?
.*?(\".*?\").*?
Try this. Please have a look at the demo:
http://regex101.com/r/cA4wE0/7
When it reaches the first " in "Y", this is what the regex does:
\" matches "
. matches Y
[^,] matches "
[^\"]* matches else meeeh2 =
\" matches "
Essentially you're looking for "Any character, then anything that's not a comma, then anything but double quotes until the end" between the quotes. This means at least 2 characters, but Y is only 1.
If you mean anything but quotes between quotes, use \"([^"]*)\". If you mean anything but quotes and commas, \"([^",]*)\" should do.
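A short C# sketch of the first suggestion ("anything but quotes between quotes"), using Regex.Matches to collect every quoted value from the question's sample text; the names are illustrative. Since [^"] also matches newlines, this relies on the quotes being balanced, as they are in the sample.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedValues
{
    // "([^"]*)" as a verbatim string: opening quote, any run of non-quote
    // characters (newlines included), closing quote.
    // Group 1 holds the inner text; m.Value includes the quotes.
    private static readonly Regex Quoted = new Regex(@"""([^""]*)""");

    public static List<string> FindQuoted(string text)
    {
        var results = new List<string>();
        foreach (Match m in Quoted.Matches(text))
            results.Add(m.Value);
        return results;
    }
}
```

On the sample text this yields "text", "text", "Y" and "N", each with its quotes.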

Replace - if preceded by a letter

I want to replace a hyphen with a space if it is NOT enclosed by digits on both sides.
string str = "Hefer 789-567 dfg-5mh";
str = Regex.Replace(str, @"[a-zA-Z]\-(\d+)", "$1");
Output
Hefer 789-567 df5mh
Desired output
Hefer 789-567 dfg 5mh
You can use negative lookahead and lookbehind: (?<!\d)-|-(?!\d) says "match a - that is not preceded by a \d, or a - that is not followed by a \d".
Thus your regex would be something like
string str = "Hefer 789-567 dfg-5mh";
str = Regex.Replace(str, @"(?<!\d)-|-(?!\d)", " ");
Edit: Note that this also replaces hyphens at the start or end of the string. If you want to avoid this you can use (?<!\d|^)-(?=.)|(?<=.)-(?!\d|$) or (?<=[^\d])-(?=.)|(?<=.)-(?=[^\d]).
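A sketch comparing the basic pattern with the second edge-safe variant from the edit (method names are hypothetical). On the question's input both give Hefer 789-567 dfg 5mh; on a string like -abc- the first turns the outer hyphens into spaces, while the second leaves them alone because its lookarounds require a character on both sides.

```csharp
using System.Text.RegularExpressions;

public static class HyphenReplacer
{
    // Replaces a hyphen unless it has a digit on both sides.
    // A hyphen at the start or end has no digit on that side, so it is replaced too.
    public static string ReplaceNonNumericHyphens(string s) =>
        Regex.Replace(s, @"(?<!\d)-|-(?!\d)", " ");

    // Edge-safe variant: both alternatives need an actual character before
    // and after the hyphen, so leading and trailing hyphens are kept.
    public static string ReplaceInnerNonNumericHyphens(string s) =>
        Regex.Replace(s, @"(?<=[^\d])-(?=.)|(?<=.)-(?=[^\d])", " ");
}
```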
The problem you are describing in your title can be solved using this:
Regex.Replace(str, @"(?<=[A-Za-z])-", " ");
The problem you are describing in the body of your question can be solved using this:
Regex.Replace(str, @"(?<!\d)-|-(?!\d)", " ");
Or without lookaround:
Regex.Replace(str, @"([^\d])-|-([^\d])", "$1 $2");
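A sketch running the three one-liners from this answer side by side (names are illustrative); on the question's input, Hefer 789-567 dfg-5mh, each returns Hefer 789-567 dfg 5mh. The lookaround-free version consumes the neighbouring non-digit as part of the match, which is why it has to put it back with $1 or $2.

```csharp
using System.Text.RegularExpressions;

public static class HyphenApproaches
{
    // Title version: only hyphens directly preceded by an ASCII letter.
    public static string TitleVersion(string s) =>
        Regex.Replace(s, @"(?<=[A-Za-z])-", " ");

    // Body version: hyphens not enclosed by digits on both sides.
    public static string BodyVersion(string s) =>
        Regex.Replace(s, @"(?<!\d)-|-(?!\d)", " ");

    // No lookaround: the neighbouring non-digit is part of the match, so it is
    // captured and written back. The group that did not take part expands to "".
    public static string NoLookaround(string s) =>
        Regex.Replace(s, @"([^\d])-|-([^\d])", "$1 $2");
}
```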

Regular expression missing characters

The code below works when the regular expression matches the string. What if one of the parts is missing, for example MONEY-STAT?
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Regex regex =
new Regex(@"MONEY-ID(?<moneyId>.*?)\:MONEY-STAT(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$");
Match match = regex.Match(s);
if (match.Success)
{
Console.WriteLine("Money ID: " + match.Groups["moneyId"].Value);
Console.WriteLine("Money Stat: " + match.Groups["moneyStat"].Value);
Console.WriteLine("Money Paetr: " + match.Groups["moneyPaetr"].Value);
}
Console.WriteLine("hit <enter>");
Console.ReadLine();
I changed MONEY-STAT to (?:MONEY-STAT)?
MONEY-ID(?<moneyId>.*?)\:(?:MONEY-STAT)?(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$
explain:
(?: subexpression) Defines a noncapturing group.
? Matches the previous element zero or one time.
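A sketch of a different way to make MONEY-STAT optional (a variant, not the pattern in the answer above): the whole :MONEY-STAT segment, including its leading colon, sits inside one optional non-capturing group, so the string still matches when only one colon is present. It assumes the fields always appear in the order ID, STAT, PAYetr; the class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class MoneyParser
{
    // (?:\:MONEY-STAT(?<moneyStat>.*?))? makes the colon, the label and the value
    // optional as one unit. When it is absent, Groups["moneyStat"].Success is false
    // and its Value is an empty string.
    private static readonly Regex Money = new Regex(
        @"^MONEY-ID(?<moneyId>.*?)(?:\:MONEY-STAT(?<moneyStat>.*?))?\:MONEY-PAYetr-(?<moneyPaetr>.*)$");

    public static Match Parse(string s) => Money.Match(s);
}
```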
Maybe I misunderstand your question, but does this suit you?
(MONEY-ID(?<moneyId>.*?)\:)?(MONEY-STAT(?<moneyStat>.*?)\:)?(MONEY-PAYetr-)?(?<moneyPaetr>.*?)$
It basically makes each token optional. It also includes the colon, since that's obviously a delimiter of some kind.
Disclaimer: I am terrible at regular expressions, but this worked in my test here: http://ideone.com/0pdFk
You could try this:
(MONEY-ID[\d]+|(:?)MONEY-STAT[\d]+|:MONEY-PAYetr-[\d]+)
This will match patterns like the following:
MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938
MONEY-STAT43:MONEY-PAYetr-1232832938
MONEY-ID123456:MONEY-PAYetr-1232832938
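A sketch of how this pattern could be used with Regex.Matches to pull out whichever tokens are present (names are hypothetical). The STAT and PAYetr tokens come back with their leading colon, since the colon is part of the pattern.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MoneyTokens
{
    // Each alternative matches one "label + digits" token; [\d]+ is the same as \d+,
    // and (:?) captures an optional colon in front of MONEY-STAT.
    private static readonly Regex Token = new Regex(
        @"(MONEY-ID[\d]+|(:?)MONEY-STAT[\d]+|:MONEY-PAYetr-[\d]+)");

    public static List<string> Tokens(string s)
    {
        var list = new List<string>();
        foreach (Match m in Token.Matches(s))
            list.Add(m.Value);
        return list;
    }
}
```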

How to convert a space into non breaking space entity in c#

I want to convert the spaces in a string to &nbsp; in C#.
For example, if the string is
My name is this.
then the output should be
My&nbsp;name&nbsp;is&nbsp;this.
Replace the "regular" space with the "non-breaking space" Unicode character:
string outputString = "Input text".Replace(" ", "\u00A0");
Try a regex if you need to convert multiple spaces to a single non-breaking space:
string convertedText =
new Regex("[ ]{2,}").Replace(textToConvert, "&nbsp;");
Example:
My Name is this
^ ^^^ ^
It'll be changed to:
My Name is this
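A self-contained sketch of the [ ]{2,} approach, with a made-up class name and an illustrative input: "My Name   is this" (one, three, then one space) becomes "My Name&nbsp;is this", since only the run of three spaces is replaced.

```csharp
using System.Text.RegularExpressions;

public static class NbspCollapse
{
    // Every run of two or more regular spaces becomes ONE &nbsp; entity;
    // single spaces are left as they are.
    private static readonly Regex SpaceRuns = new Regex("[ ]{2,}");

    public static string Collapse(string text) => SpaceRuns.Replace(text, "&nbsp;");
}
```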
UPDATE
If you need to keep the number of spaces (replacing with &nbsp; only the spaces that are part of a run of two or more), you may use this regex:
string convertedText =
new Regex(" (?= )|(?<= ) ").Replace(textToConvert, "&nbsp;");
Example:
My Name is this
^ ^^^ ^
It'll be changed to:
My Name is this
For the second case you could also avoid regex entirely and use a simple loop, but a regex should be faster if you have to apply the same one often.
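A sketch of both options for the second case: the regex from the update, and a plain loop applying the same rule (a space becomes &nbsp; when a neighbouring character is also a space). Class and method names are hypothetical. For "My Name   is this", both return "My Name&nbsp;&nbsp;&nbsp;is this".

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class NbspPerSpace
{
    // A space followed by a space, or a space preceded by one:
    // every space in a run of 2+ becomes &nbsp;, lone spaces stay.
    private static readonly Regex SpaceInRun = new Regex(" (?= )|(?<= ) ");

    public static string WithRegex(string text) => SpaceInRun.Replace(text, "&nbsp;");

    // Loop alternative: same rule, checked by looking at each space's neighbours
    // in the original text.
    public static string WithLoop(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            bool inRun = text[i] == ' ' &&
                         ((i > 0 && text[i - 1] == ' ') ||
                          (i + 1 < text.Length && text[i + 1] == ' '));
            sb.Append(inRun ? "&nbsp;" : text[i].ToString());
        }
        return sb.ToString();
    }
}
```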
Correction: the line below will not work; please use Server.HtmlEncode for it.
You will have to do it in code:
string s = " ";
if (s == " ")
{
    s = "&nbsp;";
}
Or use "My name is this".Replace(" ", "&nbsp;");
Try this:
string myString = "My name is this".Replace(" ", "&nbsp;");
