Regular expression missing characters - c#

The code below works when the regular expression matches the string. What if one of the parts is not there, for example if MONEY-STAT is missing?
string s = "MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938";
Regex regex =
new Regex(@"MONEY-ID(?<moneyId>.*?)\:MONEY-STAT(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$");
Match match = regex.Match(s);
if (match.Success)
{
Console.WriteLine("Money ID: " + match.Groups["moneyId"].Value);
Console.WriteLine("Money Stat: " + match.Groups["moneyStat"].Value);
Console.WriteLine("Money Paetr: " + match.Groups["moneyPaetr"].Value);
}
Console.WriteLine("hit <enter>");
Console.ReadLine();

I changed MONEY-STAT to (?:MONEY-STAT)?
MONEY-ID(?<moneyId>.*?)\:(?:MONEY-STAT)?(?<moneyStat>.*?)\:MONEY-PAYetr-(?<moneyPaetr>.*?)$
Explanation:
(?: subexpression) Defines a noncapturing group.
? Matches the previous element zero or one time.

Maybe I misunderstand your question.. but does this suit you?
(MONEY-ID(?<moneyId>.*?)\:)?(MONEY-STAT(?<moneyStat>.*?)\:)?(MONEY-PAYetr-)?(?<moneyPaetr>.*?)$
It basically makes each token optional.. It also includes the colon, since that's obviously a delimiter of some kind.
DISCLAIMER
I am terrible at regular expressions.. but this worked from my test here: http://ideone.com/0pdFk

You could try this:
(MONEY-ID[\d]+|(:?)MONEY-STAT[\d]+|:MONEY-PAYetr-[\d]+)
This will match patterns like below:
MONEY-ID123456:MONEY-STAT43:MONEY-PAYetr-1232832938
MONEY-STAT43:MONEY-PAYetr-1232832938
MONEY-ID123456:MONEY-PAYetr-1232832938
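To tie the answers above together, here is a minimal sketch of the "make each token optional" idea, with a check on Group.Success to tell a missing token from an empty one. The class name MoneyParser, the method TryGet, and the use of [^:]* instead of .*? are my own assumptions for illustration, not part of the original answers.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MoneyParser
{
    // Each token is wrapped in an optional non-capturing group, and the
    // colon delimiters are optional too, so any token may be absent.
    private static readonly Regex Pattern = new Regex(
        @"^(?:MONEY-ID(?<moneyId>[^:]*))?:?" +
        @"(?:MONEY-STAT(?<moneyStat>[^:]*))?:?" +
        @"(?:MONEY-PAYetr-(?<moneyPaetr>[^:]*))?$");

    // Returns true and the captured text when the named token is present;
    // returns false and an empty string when it is missing.
    public static bool TryGet(string input, string groupName, out string value)
    {
        Match match = Pattern.Match(input);
        Group group = match.Groups[groupName];
        bool found = match.Success && group.Success;
        value = found ? group.Value : "";
        return found;
    }
}
```

Calling MoneyParser.TryGet(s, "moneyStat", out var stat) on a string without MONEY-STAT returns false, while the other two groups are still captured.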

Related

Remove special characters from string with unicode

I found the most popular answer to this question is:
Regex.Replace(value, "[^a-zA-Z0-9]+", " ", RegexOptions.Compiled);
However, if users type in a non-English name when billing, this method treats the non-English characters as special characters and removes them.
Is there a way to make this work for most users? My website is multi-language.
Make it Unicode aware:
var res = Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ");
If you plan to keep only regular digits, keep [0-9].
The regex matches one or more characters other than Unicode letters (\p{L}), combining marks such as diacritics (\p{M}) and numbers (\p{N}).
You might consider var res = Regex.Replace(value, @"\W+", " "), but it will keep _ since the underscore is a "word" character.
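As a rough illustration of the Unicode-aware replacement, here is a small self-contained sketch. The class name NameCleaner and the final Trim() call are my own additions, not something the answer prescribes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCleaner
{
    // Replaces every run of characters that are not Unicode letters,
    // combining marks, or numbers with a single space, then trims the ends.
    public static string Clean(string value)
    {
        return Regex.Replace(value, @"[^\p{L}\p{M}\p{N}]+", " ").Trim();
    }
}
```

Unlike the \W+ variant, this also replaces the underscore, because _ is connector punctuation rather than a letter, mark, or number.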
I found that the best way to achieve this and make it work with all languages is to build a pattern containing all banned characters. Look at this code:
string input = @"heya's #FFFFF , CUL8R M8 how are you?'"; // This is the input string
string regex = @"[!""#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]"; // Banned characters; add every character you don't want displayed here.
Match m;
while ((m = Regex.Match(input, regex)).Success) // Regex.Match never returns null, so loop while a banned character is still found
{
input = input.Remove(m.Index, m.Length);
}
input = input.Replace("   ", " ").Replace("  ", " "); // collapse runs of three or two spaces into one
MessageBox.Show(input);
Hope it works!
PS: As others posted here, there are other ways. I personally prefer this one even though it's way more code. Choose the one you think best fits your needs.

Regular Expression without braces

I have the following sample cases:
1) "Sample"
2) "[10,25]"
I want a single regular expression pattern that returns "Sample" and "10,25" when the above examples are passed to it.
Note: the input strings do not include the quotes.
I came up with the expression (?<=\[)(.*?)(?=\]). It satisfies the second case and retrieves only "10,25", but for the first case it returns nothing. I want "Sample" to be returned. Can anyone help me?
C#.
Here you go: a small regex using a positive lookbehind. These are sometimes very handy.
Regex
(?<=^|\[)([\w,]+)
Test string
Sample
[10,25]
Result
MATCH 1
[0-6] Sample
MATCH 2
[8-13] 10,25
try at regex101.com
If " is included in your original string, use the regex below, which also looks for the " mark. You may remove ^| from the lookbehind if the " mark is always present, or leave it in if your text mixes entries with and without " marks.
Regex
(?<=^|\[|\")([\w,]+)
try at regex101.com
As far as I can tell, the regex below should help:
Regex regex = new Regex(@"^\w+|[[](\w)+\,(\w)+[]]$");
This will match a single word, or two alphanumeric words separated by a comma and enclosed in square brackets.
One Java example:
// String input = "Sample";
String input = "[10,25]";
String text = "[^,\\[\\]]+";
Pattern pMod = Pattern.compile("(" + text + ")|(?>\\[(" + text + "," + text + ")\\])");
Matcher mMod = pMod.matcher(input);
while (mMod.find()) {
if(mMod.group(1) != null) {
System.out.println(mMod.group(1));
}
if(mMod.group(2)!=null) {
System.out.println(mMod.group(2));
}
}
If the input is "[hello&bye,25|35]", then the output is hello&bye,25|35
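Since the question asks for C#, here is a hedged sketch of the same idea using one named group on both sides of an alternation (.NET allows a group name to be reused). BracketStripper and Unwrap are hypothetical names, and the choice to reject unbalanced input such as "[10,25" is my own assumption about what is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Either a bracketed value or a bare value; both branches capture
    // into the same named group, so the caller reads a single group.
    private static readonly Regex Pattern = new Regex(
        @"^(?:\[(?<value>[^\[\]]+)\]|(?<value>[^\[\]]+))$");

    // Returns the text without surrounding brackets, or "" if nothing matches.
    public static string Unwrap(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups["value"].Value : "";
    }
}
```

With this sketch, both Unwrap("Sample") and Unwrap("[10,25]") read their result from the same group, so no second pattern is needed.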

Regular expressions in C# for extracting parts

I have this text:
" </SYM field/NN name=/IN ""/"" object/NN ""/"" >/SYM Categories/NNS :/: Cars/NNS ,/, About/RB Model/NNP :/: "
I would like to extract values such as
Categories/NNS :/: Cars/NNS ,/, About/RB
where the pattern is
WORD + /NNS + :/: ANYTHING until you reach the same pattern
I tried:
Match match = Regex.Match(input, @"([A-Za-z0-9\-]+)/NNS :/: ([A-Za-z0-9\-/s]+)",
RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Groups[1].Value;
Console.WriteLine(key);
}
and the answer I got back was:
Categories
instead of
Categories/NNS :/: Cars/NNS ,/, About/RB
What am I doing wrong?
You need to enclose the parts of the regex you want in the result inside parentheses.
To obtain what you're looking for, replace your regexp with the following (not tested, and I don't know C# regex specifics, but it should be OK):
"((?:[A-Za-z0-9\-]+)/NNS :/: (?:[A-Za-z0-9\-/s]+))"
The outer parentheses mean that you'll get the entire string as the result.
An opening parenthesis followed by ?: means that you don't want that part captured in the result.
Without the ?:, the result would contain your entire string, then the string matching the first sub-regex, then the string matching the second sub-regex.
Why don't you use match.Value? Everything you put in parentheses represents a group, but it looks like you want the whole thing.
Match match = Regex.Match(input, @"([A-Za-z0-9\-]+)/NNS :/: ([A-Za-z0-9\-/s]+)",
RegexOptions.IgnoreCase);
if (match.Success)
{
string key = match.Value;
Console.WriteLine(key);
}
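To make the difference between match.Value and match.Groups[...] concrete, here is a sketch that also tries to honour the "anything until you reach the same pattern" requirement with a lazy match and a lookahead. The lookahead pattern (a space, a word, a slash, an uppercase tag, then " :/:") is my guess at what "the same pattern" means, and the names TaggedTextExtractor, ExtractWhole and ExtractKey are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TaggedTextExtractor
{
    // WORD/NNS :/: followed by as little text as possible, stopping just
    // before the next " WORD/TAG :/:" or at the end of the input.
    private static readonly Regex Pattern = new Regex(
        @"(?<key>[A-Za-z0-9\-]+)/NNS :/: .*?(?= [A-Za-z0-9\-]+/[A-Z]+ :/:|$)");

    // The whole matched text (what match.Value gives you).
    public static string ExtractWhole(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Value : "";
    }

    // Only the first word (what a capturing group gives you).
    public static string ExtractKey(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups["key"].Value : "";
    }
}
```

ExtractKey corresponds to what the question's code printed, and ExtractWhole to what it was actually after.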

Regex pattern works in Lua but not in C#

I need to use regex in C# to split up something like "21A244" where
The first two numbers can be 1-99
The letter can only be 1 letter, A-Z
The last three numbers can be 111-999
So I made this match
"([0-9]+)([A-Z])([0-9]+)"
but for some reason when used in C#, the match functions just return the input string. So I tried it in Lua, just to make sure the pattern was correct, and it works just fine there.
Here's the relevant code:
var m = Regex.Matches( mdl.roomCode, "(\\d+)([A-Z])(\\d+)" );
System.Diagnostics.Debug.Print( "Count: " + m.Count );
And here's the working Lua code in case you were wondering
local str = "21A244"
print(string.match( str, "(%d+)([A-Z])(%d+)" ))
Thank you for any help
EDIT: Found the solution
var match = Regex.Match(mdl.roomCode, "(\\d+)([A-Z])(\\d+)");
var group = match.Groups;
System.Diagnostics.Debug.Print( "Count: " + group.Count );
System.Diagnostics.Debug.Print("houseID: " + group[1].Value);
System.Diagnostics.Debug.Print("section: " + group[2].Value);
System.Diagnostics.Debug.Print("roomID: " + group[3].Value);
Firstly you should make your regex a little more specific and limit how many numbers are allowed at the beginning/end. How about:
([1-9]{1,2})([A-Z])([1-9]{1,3})
Next, the results of the captures (i.e. the 3 parts in parens) will be in the Groups property of your regex matcher object. I.e.
m.Groups[1] // First number
m.Groups[2] // Letter
m.Groups[3] // Second number
Regex.Matches(mdl.roomCode, "(\d+)([A-Z])(\d+)") returns a collection of matches. If there is no match, it returns an empty MatchCollection.
Since the regular expression matches the string once, it returns a collection with one item: a match that spans the whole input string.
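Putting the pieces of this thread together, a sketch with named groups and bounded quantifiers might look like the following. I am assuming the code is one or two digits, one uppercase letter, then exactly three digits; the question's exact numeric ranges are not enforced here. Note that a class like [1-9] would reject codes containing 0, such as 10A200, so this sketch uses [0-9]. RoomCode and TryParse are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RoomCode
{
    // One or two digits, one uppercase letter, three digits, nothing else.
    private static readonly Regex Pattern = new Regex(
        @"^(?<house>[0-9]{1,2})(?<section>[A-Z])(?<room>[0-9]{3})$");

    // Splits a code such as "21A244" into its three parts via match.Groups.
    public static bool TryParse(string code, out string house,
                                out string section, out string room)
    {
        Match match = Pattern.Match(code);
        house = match.Success ? match.Groups["house"].Value : "";
        section = match.Success ? match.Groups["section"].Value : "";
        room = match.Success ? match.Groups["room"].Value : "";
        return match.Success;
    }
}
```

The key point is the same as in the accepted edit: use Regex.Match and read the parts from Groups, rather than counting the items in the MatchCollection.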

How do I do this with one regular expression pattern instead of three?

I think I need to use an alternation construct but I can't get it to work. How can I get this logic into one regular expression pattern?
match = Regex.Match(message2.Body, @"\r\nFrom: .+\(.+\)\r\n");
if (match.Success)
match = Regex.Match(message2.Body, @"\r\nFrom: (.+)\((.+)\)\r\n");
else
match = Regex.Match(message2.Body, @"\r\nFrom: ()(.+)\r\n");
EDIT:
Some sample cases should help with your questions
From: email
and
From: name(email)
Those are the two possible cases. I'm looking to match them so I can do
string name = match.Groups[1].Value;
string email = match.Groups[2].Value;
Suggestions for a different approach are welcome!
Thanks!
This is literally what you're asking for: "(?=" + regex1 + ")" + regex2 + "|" + regex3
match = Regex.Match(message.Body, @"(?=\r\nFrom: (.+\(.+\))\r\n)\r\nFrom: (.+)\((.+)\)\r\n|\r\nFrom: ()(.+)\r\n");
But I don't think that's really what you want.
With .NET's Regex, you can name groups like this: (?<name>regex). The same name can be used in both alternatives.
match = Regex.Match(message.Body, @"\r\nFrom: (?<one>.+)\((?<two>.+)\)\r\n|\r\nFrom: (?<one>)(?<two>.+)\r\n");
Console.WriteLine (match.Groups["one"].Value);
Console.WriteLine (match.Groups["two"].Value);
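However, I would not anchor on \r\n. In a verbatim string the regex engine does read \r\n as a carriage return plus line feed, but the pattern then fails when From: is the first line or when the body uses bare \n line endings. Try line anchors with RegexOptions.Multiline instead; the \r? keeps a trailing carriage return out of the captures.
match = Regex.Match(message.Body, @"^From: (?:(?<one>.+)\((?<two>.+)\)|(?<one>)(?<two>.+?))\r?$", RegexOptions.Multiline);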
Console.WriteLine (match.Groups["one"].Value);
Console.WriteLine (match.Groups["two"].Value);
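For completeness, here is a self-contained sketch of the single-pattern approach. It assumes RegexOptions.Multiline is acceptable and that lines end in either \r\n or \n; the names FromHeader, TryParse, name and email are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FromHeader
{
    // First alternative: "name(email)". Second alternative: bare "email".
    // With Multiline, ^ and $ match at line boundaries; \r? absorbs a
    // carriage return so it does not end up in the captured email.
    private static readonly Regex Pattern = new Regex(
        @"^From: (?:(?<name>.+)\((?<email>.+)\)|(?<email>.+?))\r?$",
        RegexOptions.Multiline);

    // name is "" when the header only contains an email address.
    public static bool TryParse(string body, out string name, out string email)
    {
        Match match = Pattern.Match(body);
        name = match.Success ? match.Groups["name"].Value : "";
        email = match.Success ? match.Groups["email"].Value : "";
        return match.Success;
    }
}
```

An unmatched group has an empty Value, so the name comes back as an empty string for the email-only form without needing the empty () placeholder group.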
