Regex: match quotation marks in C# [duplicate]

This question already has answers here:
RegEx Match multiple times in string
(4 answers)
Closed 3 years ago.
I'm new to regex and can't seem to find my way around these patterns. I'm trying to match the punctuation in a sentence (quotation marks and question marks), with no success.
Here is my code:
string sentence = "\"This is the end?\"";
string punctuation = Regex.Match(sentence, "[\"?]").Value;
What am I doing wrong here? I expected the console to display "?", but it shows a double quote instead.

If you want to match all quotation marks and question marks as your question states, then your pattern is okay. The problem is that Regex.Match will only return the first match it finds. From MSDN:
Searches the input string for the first occurrence of the specified regular expression...
You probably want to use Matches:
string sentence = "\"This is the end?\"";
MatchCollection allPunctuation = Regex.Matches(sentence, "[\"?]");
foreach (Match punctuation in allPunctuation)
{
    Console.WriteLine("Found {0} at position {1}", punctuation.Value, punctuation.Index);
}
This will return:
Found " at position 0
Found ? at position 16
Found " at position 17
I'd also point out that if you truly want to match all punctuation characters, including things like 'French' quotes (« and »), 'smart' quotes (“ and ”), inverted question marks (¿), and many others, you can use Unicode Character categories with a pattern like \p{P}.
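As a rough illustration of the \p{P} suggestion, here is a minimal sketch written as a top-level program (C# 9 or later). The sample sentence is made up for the example; it only shows that the one category pattern picks up both ASCII and non-ASCII punctuation.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample text mixing ASCII and non-ASCII punctuation.
string sample = "«¿Qué?» she asked, “really?”";

// \p{P} matches any character in one of the Unicode punctuation categories.
string[] marks = Regex.Matches(sample, @"\p{P}")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(" ", marks));
```

Note that letters such as é are not punctuation, so they are left out of the result.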

You need to call Matches instead of Match.
Example:
string sentence = "\"This is the end?\"";
var matches = Regex.Matches(sentence, "[\"?]");
var punctuationLocations = string.Empty;
foreach (Match match in matches)
{
    punctuationLocations += match.Value + " at index:" + match.Index + Environment.NewLine;
}
// punctuationLocations:
// " at index:0
// ? at index:16
// " at index:17

Related

Regex to get string between number and underscore C# [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I'm trying to write a regex that gets the string between a number and an underscore. For example:
I have CP_01Ags_v5, so I need a regex that matches just Ags. Another example would be CP_13Hgo_v5, which should match Hgo.
Any ideas?
Based on the examples and matches you describe, you want something along the lines of:
[0-9]+(.*)[_]
To break it down:
The regex looks for one or more digits, then captures everything after them up to an underscore.
The drawback is that it assumes all inputs look like the examples you provided. Because .* is greedy, it runs to the last underscore, so if your input is
CP_13Hgo_v5asdf_
then the group will capture
Hgo_v5asdf
If your inputs can look like that, you want the non-greedy version of this regex:
[0-9]+(.*?)[_]
This produces two matches in the same example,
CP_13Hgo_v5asdf_
whose captured groups are:
Hgo
and
asdf
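The greedy and non-greedy behaviour described above can be checked with a short sketch like this one (a top-level program, C# 9 or later; the variable names are only for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "CP_13Hgo_v5asdf_";

// Greedy: (.*) runs to the last underscore in the input.
string greedy = Regex.Match(input, "[0-9]+(.*)[_]").Groups[1].Value;
Console.WriteLine(greedy);

// Lazy: (.*?) stops at the nearest underscore, so the pattern matches twice.
MatchCollection lazy = Regex.Matches(input, "[0-9]+(.*?)[_]");
foreach (Match m in lazy)
{
    Console.WriteLine(m.Groups[1].Value);
}
```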
You can use look-arounds to match just the string between the digits and the underscore e.g.
(?<=\d)[A-Za-z]+(?=_)
In C# (note the need to escape the \ in the regex):
String s = @"CP_01Ags_v5 CP_13Hgo_v5";
Match m = Regex.Match(s, "(?<=\\d)[A-Za-z]+(?=_)");
while (m.Success) {
    Console.WriteLine(m.Value);
    m = m.NextMatch();
}
Output
Ags
Hgo
If the target is always at least two letters long, and the only other run of two or more letters is the prefix at the very start of the string, then you can apply the following:
var text = "CP_01Ags_v5";
var x = Regex.Match(text, @"(?<!^)[A-Za-z]{2,}");
Use named groups (note the + after [a-zA-Z], so that the target can be more than one letter long):
(?<leftPart>_\d{2})(?<YourTarget>[a-zA-Z]+)(?<rightPart>_[a-zA-Z0-9]{2})
C#:
Regex re = new Regex(@"(?<leftPart>_\d{2})(?<YourTarget>[a-zA-Z]+)(?<rightPart>_[a-zA-Z0-9]{2})");
/*
 * Loop over the matches
 * to get the value of the group you want
 */
foreach (Match item in re.Matches("CP_01Ags_v5 CP_13Hgo_v5,"))
{
    Console.WriteLine(" Match: " + item.ToString());
    Console.WriteLine(" Your Target you want: " + item.Groups["YourTarget"]);
}

mvc regex replace array of pattern on one sentence [duplicate]

This question already has answers here:
Why \b does not match word using .net regex
(2 answers)
Closed 5 years ago.
I want to replace certain words in a sentence using Regex.Replace.
For this, I created an array of patterns:
string[] words = {"abc","132","qwe","bold","test"};
and to do the replacement I use:
foreach (string item in words){
    output = Regex.Replace(output, @"\b" + item + "\b", " ");
}
but this doesn't work.
Does anyone have an idea?
Explanation
The same approach works without problems in VB.NET.
I am a beginner in C#.
It looks like you forgot the @ (verbatim string) prefix on the second \b. Without it, "\b" is the backspace character rather than the regex word boundary.
Either add the prefix:
output = Regex.Replace(output, @"\b" + item + @"\b", " ");
or double the backslash so that the string contains a literal backslash:
output = Regex.Replace(output, @"\b" + item + "\\b", " ");
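As an alternative to looping, the words can be joined into a single pattern. This is only a sketch (top-level program, C# 9 or later): the input sentence is made up, and the Regex.Escape call is an extra precaution that the question did not ask for, useful if a word ever contains regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] words = { "abc", "132", "qwe", "bold", "test" };

// Build \b(?:abc|132|qwe|bold|test)\b; verbatim strings keep \b as a word boundary.
string pattern = @"\b(?:" + string.Join("|", words.Select(Regex.Escape)) + @")\b";

// Hypothetical input; "embolden" must survive because of the \b anchors.
string output = Regex.Replace("a bold test, embolden", pattern, " ");
Console.WriteLine(output);
```

Each removed word leaves a space behind, as in the original loop, so you may want to collapse repeated spaces afterwards.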

Remove special characters from string with unicode

The most popular answer I found to this question is:
Regex.Replace(value, "[^a-zA-Z0-9]+", " ", RegexOptions.Compiled);
However, if users type a non-English name when billing, this method treats the non-English characters as special characters and removes them.
Is there a way to make this work for most users, since my website is multi-language?
Make it Unicode aware:
var res = Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ");
If you plan to keep only regular digits, use 0-9 in place of \p{N}.
The regex matches one or more characters other than Unicode letters (\p{L}), combining marks such as diacritics (\p{M}) and numbers (\p{N}).
You might consider var res = Regex.Replace(value, @"\W+", " "), but it will keep _ since the underscore is a "word" character.
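To see the Unicode-aware pattern in action, here is a small sketch (top-level program, C# 9 or later). The billing name is invented for the example, and the trailing Trim() is an addition to drop the space left by the final "!".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical billing name with accents, CJK characters, and stray symbols.
string value = "José-María 李小龍 #42!";

// Keep letters, combining marks, and numbers; collapse every other run to one space.
string cleaned = Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ").Trim();
Console.WriteLine(cleaned);
```

The accented and CJK letters survive, while the hyphen, "#" and "!" are replaced.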
I found that the best way to achieve this and make it work with all languages is to create a string of all banned characters. Look at this code:
string input = @"heya's #FFFFF , CUL8R M8 how are you?'"; // This is the input string
string regex = @"[!""#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]"; // Banned characters; add every character you don't want displayed here.
Match m;
// Regex.Match never returns null, so loop on Success instead.
while ((m = Regex.Match(input, regex)).Success)
{
    input = input.Remove(m.Index, m.Length);
}
input = Regex.Replace(input, " {2,}", " "); // collapse runs of two or more spaces into one
MessageBox.Show(input);
Hope it works!
PS: As others have posted here, there are other ways, but I personally prefer this one even though it's more code. Choose whichever best fits your needs.

Regex Matchcollection groups

I have spent two days trying to solve this problem. I have a MatchCollection; the pattern contains a group, and I want a list of the values that group captures (there are two or more).
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "^<tr>$?<td>$?[D-M][i-r],[' '][0-3][1-9].[0-1][1-9].[0-9][0-9]$?</td>$?<td>$?([1-9][0-2]?)$?</td>$?";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
foreach (Match match in mc2)
{
    GroupCollection groups = match.Groups;
    string s = groups[1].Value;
    Datum2.Text = s;
}
But only the last match (2) appears in the TextBox "Datum2".
I know that I have to use e.g. a listbox, but the Groups[1].Value is a string...
Thanks for your help and time.
Dieter
The first thing to correct in the code is Datum2.Text = s;, which overwrites the text in Datum2 on every iteration, so with more than one match only the last value remains.
Now, about your regex:
^ forces the match to start at the beginning of the input, so there is really only 1 match. If you remove it, it'll match twice.
I can't tell what was intended with $? all over the pattern (just take them out).
[' '] matches either a quote or a space; there is no need to repeat a character in a character class.
All dots in [0-3][1-9].[0-1][1-9].[0-9][0-9] need to be escaped. Otherwise, a dot matches any character.
[0-1][1-9] matches all months except "10". The second character should be [0-9] (or \d).
Code:
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td><td>E</td><td>123</td><td></td></tr><tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td><td>ER</td><td>234</td><td></td></tr>";
string Patter2 = "<tr><td>[D-M][i-r],[' ][0-3][0-9]\\.[0-1][0-9]\\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";
Regex r2 = new Regex(Patter2);
MatchCollection mc2 = r2.Matches(input);
string s = "";
foreach (Match match in mc2)
{
    GroupCollection groups = match.Groups;
    s = s + " " + groups[1].Value;
}
Datum2.Text = s;
Output:
1 2
You should know that regex is not the right tool for parsing HTML. It'll work for simple cases, but for real cases do consider using HTML Agility Pack.
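If you want the captured values as a list rather than one concatenated string, a sketch along these lines may help (top-level program, C# 9 or later). It reuses the corrected pattern from above on a shortened version of the input; the name values is just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened version of the input from the question.
string input = "<tr><td>Mi, 09.09.15</td><td>1</td><td>PK</td></tr>"
             + "<tr><td>Mi, 09.09.15</td><td>2</td><td>ER</td></tr>";
string pattern = @"<tr><td>[D-M][i-r],[' ][0-3][0-9]\.[0-1][0-9]\.[0-9][0-9]</td><td>([1-9][0-2]?)</td>";

// One list entry per match, taken from capture group 1.
List<string> values = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", values));
```

The resulting list can then be handed to whatever control you use to display several items.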

Remove characters between different parameters [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 7 years ago.
I have a string of different emails
ex: "email1@uy.com, email2@iu.it, email3@uu.edu" etc, etc
I would like to formulate a Regex that creates the following output
ex: "email1,email2,email3" etc, etc
How can I remove the characters between an "@" and a ",", while leaving the "," and a space, in C#?
Thank you so much for the help!!
If you want to replace all characters between @ and the comma with nothing, the easiest option is to use Regex.Replace:
var emails = "a@m.com, b@m.com, d@m.com";
var result = Regex.Replace(emails, "@[^,]+", string.Empty);
// result is "a, b, d"
Please note that it leaves the spaces after each comma in the result, as you asked for in your question, though your example result has the spaces removed.
The regular expression looks for all substrings that start with an '@' character, followed by one or more characters that are not a comma. Those substrings are replaced with an empty string.
Replacing all occurrences of @[^,]+ with an empty string will do the job.
The expression matches sequences that start at @, inclusive, and run up to a comma or the end of the string, exclusive. Therefore, commas in the original string of e-mails are kept.
Maybe you don't need a regex at all; in that case you can do the following:
string input = "email1@uy.com, email2@iu.it, email3@uu.edu";
input = input.Replace(" ", "");
string[] occurrences = input.Split(',');
for (int i = 0; i < occurrences.Length; i++)
{
    string s = occurrences[i];
    occurrences[i] = s.Substring(0, s.IndexOf('@'));
}
string final = string.Join(", ", occurrences);
