Split by delimiter without removing it from the string - C#

I want to use a Regex to split a long string into separate lines.
A line can include any possible Unicode character.
A line ends on a dot ("." - one or more) or on a new line ("\n").
Example:
This string will be the input:
"line1. line2.. line3... line4.... line5..... line6
\n
line7"
The output:
"line1."
"line2.."
"line3..."
"line4...."
"line5....."
"line6"
"line7"

If I understand what you're asking for, you might try a pattern like this:
(?<=\.)(?!\.)|\n
This will split the string at any position which is preceded by a . but not followed by another ., and also at every \n character (the \n itself is removed).
Note that this pattern preserves any whitespace after the dots, for example:
var input = "line1. line2.. line3... line4.... line5..... line6\nline7"; // regular string, so \n is a real newline
var output = Regex.Split(input, @"(?<=\.)(?!\.)|\n");
Produces (line2 through line6 each keep the space that followed the previous dots):
line1.
line2..
line3...
line4....
line5.....
line6
line7
If you'd like to get rid of the whitespace simply change this to:
(?<=\.)(?!\.)\s*|\n
But if you know that the dots will always be followed by whitespace, you can simplify this to:
(?<=\.)\s+|\n
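To compare the patterns above side by side, here is a minimal sketch written as a C# top-level program. The variable names are only illustrative, and the input is a regular (non-verbatim) string so that \n is a real newline:

```csharp
using System;
using System.Text.RegularExpressions;

// Regular string literal: \n is an actual line feed here.
string input = "line1. line2.. line3... line4.... line5..... line6\nline7";

// Zero-width split after each run of dots; the space after the dots is kept.
string[] keepSpace = Regex.Split(input, @"(?<=\.)(?!\.)|\n");

// Same split position, but any whitespace after the dots is consumed by the match.
string[] trimmed = Regex.Split(input, @"(?<=\.)(?!\.)\s*|\n");

foreach (string piece in keepSpace) Console.WriteLine("[" + piece + "]");
foreach (string piece in trimmed) Console.WriteLine("[" + piece + "]");
```

One thing to watch for: if the input itself ends with a dot, Regex.Split also finds a match at the very end and returns an empty string as the last element.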

Try this:
String result = Regex.Replace(subject, @"""?(\w+([.]+)?)(?:[\n ]|[""\n]$)+", "\"$1\"\n");
/*
"line1."
"line2.."
"line3..."
"line4...."
"line5....."
"line6"
"line7"
*/
Regex Explanation
"?(\w+([.]+)?)(?:[\n ]|["\n]$)+
Match the character “"” literally «"?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regular expression below and capture its match into backreference number 1 «(\w+([.]+)?)»
Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 2 «([.]+)?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “.” «[.]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below «(?:[\n ]|["\n]$)+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match either the regular expression below (attempting the next alternative only if this one fails) «[\n ]»
Match a single character present in the list below «[\n ]»
A line feed character «\n»
The character “ ” « »
Or match regular expression number 2 below (the entire group fails if this one fails to match) «["\n]$»
Match a single character present in the list below «["\n]»
The character “"” «"»
A line feed character «\n»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

If you want to keep all dots intact and the dots will always be followed by a space, then this could be your regex:
String result = Regex.Replace(t, @"\.\s", ".\n");
This produces a single string. You haven't stated whether you want several strings or one as the result.

Related

Is there a regular expression for matching a string that has no more than 2 repeating characters? [duplicate]

I want to match strings that do not contain more than 3 of the same character repeated in a row. So:
abaaaa [no match]
abawdasd [match]
abbbbasda [no match]
bbabbabba [match]
Yes, it would be much easier and neater to do a regex match for containing the consecutive characters, and then negate that in the code afterwards. However, in this case that is not possible.
I would like to open out the question to x consecutive characters so that it can be extended to the general case to make the question and answer more useful.
Negative lookahead is supported in this case.
Use a negative lookahead with back references:
^(?:(.)(?!\1\1))*$
See live demo using your examples.
(.) captures each character in group 1 and the negative look ahead asserts that the next 2 chars are not repeats of the captured character.
To match strings not containing a character repeated more than 3 times consecutively:
^((.)\2?(?!\2\2))+$
How it works:
^ Start of string
(
(.) Match any character (not a new line) and store it for back reference.
\2? Optionally match one more exact copy of that character.
(?! Make sure the upcoming character(s) is/are not the same character.
\2\2 Repeat '\2' for as many times as you need
)
)+ Do ad nauseam
$ End of string
So, the number of \2 in your whole expression will be the number of times you allow a character to be repeated consecutively. Any more and you won't get a match.
E.g.
^((.)\2?(?!\2\2\2))+$ will match all strings that don't repeat a character more than 4 times in a row.
^((.)\2?(?!\2\2\2\2))+$ will match all strings that don't repeat a character more than 5 times in a row.
Please be aware that this solution uses negative lookahead, and not all regex flavors support it.
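For the general case of x consecutive characters, the first pattern can be built with a quantified backreference instead of repeating \1 by hand. This is only a sketch: MaxRunPattern is a made-up helper name, not something from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a pattern that accepts only strings in which
// no character occurs more than maxRun times in a row. The lookahead fails
// as soon as a character is followed by maxRun more copies of itself.
string MaxRunPattern(int maxRun)
{
    return @"^(?:(.)(?!\1{" + maxRun + @"}))*$";
}

Regex atMostThree = new Regex(MaxRunPattern(3));

foreach (string s in new[] { "abaaaa", "abawdasd", "abbbbasda", "bbabbabba" })
{
    Console.WriteLine(s + " -> " + atMostThree.IsMatch(s));
}
```

Since . does not match a line feed by default, a multi-line input would need RegexOptions.Singleline to be checked as a whole.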
I'm answering this question :
Is there a regular expression for matching a string that has no more than 2 repeating characters?
which was marked as an exact duplicate of this question.
It's much quicker to negate the match instead:
if (!Regex.Match("hello world", @"(.)\1{2}").Success) Console.WriteLine("No dups");

C# - Removing single word in string after certain character

I have a string from which I would like to remove any word following a "\", whether in the middle or at the end, such as:
testing a\determiner checking test one\pronoun
desired result:
testing a checking test one
I have tried a simple regex that removes anything between the backslash and whitespace, but it gives the following result:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*\s");
string output = regex.Replace(input, " ");
Result:
testing a one\pronoun
It looks like this regex matches from the backslash until the last whitespace in the string. I cannot seem to figure out how to match from the backslash to the next whitespace. Also, I am not guaranteed a whitespace at the end, so I would need to handle that. I could continue processing the string and remove any text after the backslash, but I was hoping I could handle both cases with one step.
Any advice would be appreciated.
Change .*, which matches any characters, to \w*, which only matches word characters.
Regex regex = new Regex(@"\\\w*");
string output = regex.Replace(input, "");
".*" matches zero or more characters of any kind. Consider using "\w+" instead, which matches one or more "word" characters (not including whitespace).
Using "+" instead of "*" would allow a backslash followed by a non-"word" character to remain unmatched. For example, no matches would be found in the sentence "Sometimes I experience \ an uncontrollable compulsion \ to intersperse backslash \ characters throughout my sentences!"
With your current pattern, .* tells the parser to be "greedy," that is, to take as much of the string as possible until it hits a space. Adding a ? right after that * tells it instead to make the capture as small as possible--to stop as soon as it hits the first space.
Next, you want to end at not just a space, but at either a space or the end of the string. The $ symbol matches the end of the string, and | means or. Group those together using parentheses and your group collectively tells the parser to stop at either a space or the end of the string. Your code will look like this:
string input = @"testing a\determiner checking test one\pronoun";
Regex regex = new Regex(@"\\.*?(\s|$)");
string output = regex.Replace(input, " ");
Try this regex (\\[^\s]*)
(\\[^\s]*)
1st Capturing group (\\[^\s]*)
\\ matches the character \ literally
[^\s]* match a single character not present in the list below
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ].
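The suggestions above can be compared on the sample input with a short sketch. The variable names are illustrative only, and \S* is used as a shorthand for [^\s]*:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: the backslashes are literal characters in the input.
string input = @"testing a\determiner checking test one\pronoun";

// Backslash plus the word characters after it; the following space stays.
string byWord = Regex.Replace(input, @"\\\w*", "");

// Backslash plus everything up to the next whitespace (same as [^\s]*).
string byNonSpace = Regex.Replace(input, @"\\\S*", "");

// Lazy variant: the match swallows the space, so a space is written back.
string lazy = Regex.Replace(input, @"\\.*?(\s|$)", " ");

Console.WriteLine("[" + byWord + "]");
Console.WriteLine("[" + byNonSpace + "]");
Console.WriteLine("[" + lazy + "]");
```

The first two give the desired result directly. The lazy variant also writes a space where the match ended at the end of the string, so it leaves a trailing space that would need a Trim().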

Parsing text between quotes with .NET regular expressions

I have the following input text:
@"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38"
I would like to parse the values with the #name=value syntax as name/value pairs. Parsing the previous string should result in the following named captures:
name:"foo"
value:"bar"
name:"name"
value:"John \""The Anonymous One\"" Doe"
name:"age"
value:"38"
I tried the following regex, which got me almost there:
@"(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>[A-Za-z0-9_-]+|(?="").+?(?=(?<!\\)""))"
The primary issue is that it captures the opening quote in "John \""The Anonymous One\"" Doe". I feel like this should be a lookbehind instead of a lookahead, but that doesn't seem to work at all.
Here are some rules for the expression:
Name must start with a letter and can contain any letter, number, underscore, or hyphen.
Unquoted value must have at least one character and can contain any letter, number, underscore, or hyphen.
Quoted value can contain any character including any whitespace and escaped quotes.
Edit:
Here's the result from regex101.com:
(?:(?<=\s)|^)#(?<name>\w+[A-Za-z0-9_-]+?)\s*=\s*(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)"))
(?:(?<=\s)|^) Non-capturing group
# matches the character # literally
(?<name>\w+[A-Za-z0-9_-]+?) Named capturing group name
\s* match any white space character [\r\n\t\f ]
= matches the character = literally
\s* match any white space character [\r\n\t\f ]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?<value>(?<!")[A-Za-z0-9_-]+|(?=").+?(?=(?<!\\)")) Named capturing group value
1st Alternative: [A-Za-z0-9_-]+
[A-Za-z0-9_-]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
0-9 a single character in the range between 0 and 9
_- a single character in the list _- literally
2nd Alternative: (?=").+?(?=(?<!\\)")
(?=") Positive Lookahead - Assert that the regex below can be matched
" matches the characters " literally
.+? matches any character (except newline)
Quantifier: +? Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?=(?<!\\)") Positive Lookahead - Assert that the regex below can be matched
(?<!\\) Negative Lookbehind - Assert that it is impossible to match the regex below
\\ matches the character \ literally
" matches the characters " literally
You can use a very useful .NET regex feature where multiple same-named captures are allowed. Also, there is an issue with your (?<name>) capture group: it allows a digit in the first position, which does not meet your 1st requirement.
So, I suggest:
(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))
See demo
Note that you cannot debug .NET-specific regexes at regex101.com; you need to test them in a .NET-compliant environment.
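As a rough sketch of how the suggested pattern might be used, the following reads both same-named value groups through a single Groups["value"] lookup. The pairs list and variable names are only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Verbatim string: "" is one quote character, \ is a literal backslash.
string input = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";

// Suggested pattern; both alternatives capture into the same group name "value".
string pattern = @"(?si)(?:(?<=\s)|^)#(?<name>\w+[a-z0-9_-]+?)\s*=\s*(?:(?<value>[a-z0-9_-]+)|(?:"")?(?<value>.+?)(?=(?<!\\)""))";

List<string> pairs = new List<string>();
foreach (Match m in Regex.Matches(input, pattern))
{
    // Whichever alternative matched, its text is available under "value".
    pairs.Add(m.Groups["name"].Value + "=" + m.Groups["value"].Value);
}

foreach (string pair in pairs) Console.WriteLine(pair);
```

The escaped quotes stay in the captured value as \" and would have to be unescaped in a separate step if needed.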
Use string methods.
Split
string myLongString = @"This is some text #foo=bar #name=""John \""The Anonymous One\"" Doe"" #age=38";
string[] nameValues = myLongString.Split('#');
From there either use Split function with "=" or use IndexOf("=").

Using Regular Expression for Phone Number

I know this question has been asked a thousand times here, but I can't get the hang of it yet. I need help with checking whether a textbox matches a phone number format. The format should be like this:
000-000-000 or (+000)00-000-000. Can anybody help me?
Give this pattern a try:
^(\(\+\d{3}\)|\d)\d{2}(-\d{3}){2}$
Generated Explanation:
Assert position at the beginning of a line (at beginning of the string or after a line break character) ^
Match the regular expression below and capture its match into backreference number 1 (\(\+\d{3}\)|\d)
Match either the regular expression below (attempting the next alternative only if this one fails) \(\+\d{3}\)
Match the character “(” literally \(
Match the character “+” literally \+
Match a single digit 0..9 \d{3}
Exactly 3 times {3}
Match the character “)” literally \)
Or match regular expression number 2 below (the entire group fails if this one fails to match) \d
Match a single digit 0..9 \d
Match a single digit 0..9 \d{2}
Exactly 2 times {2}
Match the regular expression below and capture its match into backreference number 2 (-\d{3}){2}
Exactly 2 times {2}
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. {2}
Match the character “-” literally -
Match a single digit 0..9 \d{3}
Exactly 3 times {3}
Assert position at the end of a line (at the end of the string or before a line break character) $
Pattern 1 is \d{3}\-\d{3}\-\d{3}
Pattern 2 is \(\+\d{3}\)\d{2}\-\d{3}\-\d{3}
So you need to match for Pattern1 OR Pattern2:
(\d{3}\-\d{3}\-\d{3})|(\(\+\d{3}\)\d{2}\-\d{3}\-\d{3})
(?:\d|\(\+\d{3}\))\d{2}(?:-\d{3}){2}
Or, if you're concerned about performance, better change it to:
(?:\(\+\d{3}\)|\d)\d{2}(?:-\d{3}){2}
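A quick way to check a pattern like this against the two formats is a small test loop. This sketch adds the ^ and $ anchors to the last pattern so that the whole textbox value has to match; the sample values are made up:

```csharp
using System;
using System.Text.RegularExpressions;

// Last pattern above, anchored so that no extra characters are allowed.
Regex phone = new Regex(@"^(?:\(\+\d{3}\)|\d)\d{2}(?:-\d{3}){2}$");

string[] samples =
{
    "000-000-000",        // first format
    "(+000)00-000-000",   // second format
    "0000-000-000",       // too many digits in the first block
    "(+000)000-000-000"   // three digits after the prefix instead of two
};

foreach (string s in samples)
{
    Console.WriteLine(s + " -> " + phone.IsMatch(s));
}
```

In .NET, \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is set, so [0-9] may be the safer choice if only ASCII digits should pass.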

Regex statement for only numbers between 0 and 255 in C#

How do I write a regex statement for only numbers between 0 and 255? 0 and 255 will be valid for the statement.
You can find some numeric ranges here:
http://www.regular-expressions.info/numericranges.html
Your example would be:
^([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$
^([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])$
This tool is quite helpful for such things. A little searching doesn't hurt anyone either.
If you want to allow leading zeroes, the pattern needs to be adapted, though. E.g., this variant requires exactly three digits (such as 007):
^([01][0-9][0-9]|2[0-4][0-9]|25[0-5])$
Try a negative look behind:
(?<!\-)\b0*([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])\b
Explanation
<!--
(?<!\-)\b0*([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])\b
Options: ^ and $ match at line breaks
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\-)»
Match the character “-” literally «\-»
Assert position at a word boundary «\b»
Match the character “0” literally «0*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below and capture its match into backreference number 1 «([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])»
Match either the regular expression below (attempting the next alternative only if this one fails) «[0-9]{1,2}»
Match a single character in the range between “0” and “9” «[0-9]{1,2}»
Between one and 2 times, as many times as possible, giving back as needed (greedy) «{1,2}»
Or match regular expression number 2 below (attempting the next alternative only if this one fails) «1[0-9]{2}»
Match the character “1” literally «1»
Match a single character in the range between “0” and “9” «[0-9]{2}»
Exactly 2 times «{2}»
Or match regular expression number 3 below (attempting the next alternative only if this one fails) «2[0-4][0-9]»
Match the character “2” literally «2»
Match a single character in the range between “0” and “4” «[0-4]»
Match a single character in the range between “0” and “9” «[0-9]»
Or match regular expression number 4 below (the entire group fails if this one fails to match) «25[0-5]»
Match the characters “25” literally «25»
Match a single character in the range between “0” and “5” «[0-5]»
Assert position at a word boundary «\b»
-->
Try
([01]?\d\d?|2[0-4]\d|25[0-5])
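The difference between the anchored patterns above mostly shows up with leading zeroes. Here is a small sketch that runs two of them over a few sample values; the names strict and lenient are just labels chosen for this example:

```csharp
using System;
using System.Text.RegularExpressions;

// No leading zeroes on three-digit input: 0-99, 100-199, 200-249, 250-255.
Regex strict = new Regex(@"^([0-9]{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])$");

// Also accepts zero-padded values such as 007.
Regex lenient = new Regex(@"^([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])$");

foreach (string s in new[] { "0", "99", "199", "255", "256", "007", "-1" })
{
    Console.WriteLine(s + ": strict=" + strict.IsMatch(s) + ", lenient=" + lenient.IsMatch(s));
}
```

Without the ^ and $ anchors, as in the last pattern above, IsMatch would also succeed on a longer number such as 2567, because part of it matches.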
