Regex to remove consecutive characters - c#

I need a regular expression to replace the following sequence:
Before: abbbccdd After: abcd
Also, if numeric data is present instead of letters, I would like to remove the duplicates and display the result.

For the first part, in most languages you can do something like replacing (.)\1+ with $1.
The exact syntax depends on the language and the regular expression engine you are using, so check the manual for your language for more details.

This works in PHP:
preg_replace('/(.)\1+/','$1',$str);
I'm not sure what you mean by your second question, though.
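Since the question is tagged C#, the same replacement can be sketched with Regex.Replace. This is only a minimal illustration: the class and method names are made up for the example, and it assumes the "numeric" case simply means that runs of repeated digits should collapse in the same way as letters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DuplicateCollapser
{
    // Replaces every run of one repeated character with a single copy.
    // (.) captures one character, \1+ matches one or more repeats of it.
    public static string Collapse(string input)
    {
        return Regex.Replace(input, @"(.)\1+", "$1");
    }

    public static void Main()
    {
        Console.WriteLine(Collapse("abbbccdd"));  // abcd
        Console.WriteLine(Collapse("1122333"));   // 123
    }
}
```

Because . matches any character except a newline by default, digits and letters are treated alike.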

Related

Regex that returns all integers in C# "111; 222; 3333" and "213" in a string with alpha

I am extracting all numbers used in an XML file. The numbers are written in the following two patterns:
<Environment Id="11" StringId="8407" DescriptionId="5014" RemoteControlAppStringId="8119; 8118" EnvironmentType="BlueToothBridge" AlternateId="1" XML_NAME_ID="BTBSpeechPlusM" FactoryGainType="LIN18">
<Offsets />
</Environment>
I am using the regexes "\"\d*;\"" and "\"\d*\"" to extract all numbers.
When I ran the regex "\"\d*\"" on the above using
Regex.Match(myString, "\"\\d*\"")
it returned 8407, 11 and 5014, but it did not return 8119 and 8118.
Your regex fails to match 8119; 8118 because the pattern only finds a number that fills the whole quoted value.
Try
\b\d+\b
\b specifies that \d+ matches only between word boundaries, so the 18 in LIN18 will not match.
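As a rough sketch of how this pattern could be applied in C#: Regex.Matches (rather than Regex.Match, which returns only the first match) collects every number. The class name, method name and the shortened sample element are assumptions made for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Returns every run of digits that stands between word boundaries.
    public static string[] FindNumbers(string text)
    {
        return Regex.Matches(text, @"\b\d+\b")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Shortened version of the element from the question.
        string xml = "<Environment Id=\"11\" StringId=\"8407\" RemoteControlAppStringId=\"8119; 8118\" FactoryGainType=\"LIN18\">";
        Console.WriteLine(string.Join(", ", FindNumbers(xml)));  // 11, 8407, 8119, 8118
    }
}
```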
Depending on whether you can assume that the provided input is valid XML, you could use the following regular expression:
Regex.Matches(myString, "(?<=\")\\d+(?=\")|(?<=\")\\d+(?=; ?\\d+\")|(?<=\"\\d+; ?)\\d+(?=\")" )
The main idea behind this is that it takes the three possible situations into account:
"[number]"
"[number]; [other_number]" (With or without a space before [other_number])
"[other_number]; [number]" (With or without a space before [number])
There are two new concepts I included in the regular expression:
Positive lookahead: (?=[regex])
Positive lookbehind: (?<=[regex])
These concepts allow the regular expression to check if something specific is before or after it, without putting it in the match.
This regular expression could easily be optimised, but this is meant as an example of a basic approach.
One good tip for developing a regular expression like this is to use a tool (online or offline) to test your regular expression. The tool I used was .NET Regex Tester.
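To make the lookaround approach concrete, here is a small sketch that runs the three-alternative pattern over a shortened version of the element from the question. The class and method names are illustrative only; the pattern is the one given above, written as a verbatim string so that each quote is doubled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedNumberExtractor
{
    // Alternatives: "n", the first number of "n; m", the second number of "n; m".
    // The lookbehind and lookahead check the quotes without consuming them.
    private const string Pattern =
        @"(?<="")\d+(?="")|(?<="")\d+(?=; ?\d+"")|(?<=""\d+; ?)\d+(?="")";

    public static string[] FindQuotedNumbers(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string xml = "<Environment Id=\"11\" RemoteControlAppStringId=\"8119; 8118\" FactoryGainType=\"LIN18\">";
        Console.WriteLine(string.Join(", ", FindQuotedNumbers(xml)));  // 11, 8119, 8118
    }
}
```

The third alternative relies on a variable-length lookbehind, which the .NET engine supports but many other regex engines do not.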
As @poke stated in the comment, it's because your regex doesn't match the string. Change your regex to capture specific matches and account for the possibility of the ';'.
Something like the pattern below should probably do the trick.
EDIT: (\b\d+\b)|(\b\d+[;*]\d+\b)

Regular expression that accepts digits, letters and hyphens

I want a regular expression that accepts all digits and letters and, of the special characters, only the hyphen (-).
I am trying this expression: ^\d+$/[-]/[a-z] but it does not work. I want to accept expressions like this one:
Emp-IN-0000001
Can someone help me with this?
If it's always this format (Emp-IN-0000001), then use this regexp:
^[a-zA-Z]+-[a-zA-Z][a-zA-Z]-[0-9]+$
or, if you have extended regexps:
^[a-zA-Z]+-[a-zA-Z]{2}-\d+$
When there are always seven digits, use this:
^[a-zA-Z]+-[a-zA-Z]{2}-\d{7}$
You can even say:
^Emp-IN-\d{7}$
if it's exactly "Emp-IN-" + digits.
By the way, this is not C#-specific; you can use these regular expressions in any language that supports regexps.
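For C# specifically, the seven-digit variant could be used roughly as follows. This is a sketch, with a made-up class name, that assumes the whole input string must be an ID in this format.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmployeeIdValidator
{
    // Letters, hyphen, exactly two letters, hyphen, exactly seven digits.
    private static readonly Regex Format =
        new Regex(@"^[a-zA-Z]+-[a-zA-Z]{2}-\d{7}$");

    public static bool IsValid(string id)
    {
        return Format.IsMatch(id);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("Emp-IN-0000001"));  // True
        Console.WriteLine(IsValid("Emp-IN-123"));      // False
        Console.WriteLine(IsValid("Emp_IN_0000001"));  // False
    }
}
```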
If you strictly want to follow this format (Emp-IN-0000001), then you might use this regular expression:
^[a-zA-Z]+-[a-zA-Z]+-\d+$
I don't really get what you tried with your regular expression, but it is actually as simple as this:
^[a-zA-Z\d-]+$
Or if you want to allow empty strings:
^[a-zA-Z\d-]*$
If you use the case-insensitive modifier with your regular expression, you can leave out either the a-z or A-Z from both variants.
I recommend you read up on some regex basics in this great tutorial.
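A minimal C# sketch of the character-class approach with the case-insensitive option, so that only a-z needs to be listed. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowedCharsValidator
{
    // One or more letters, digits or hyphens; IgnoreCase covers upper case.
    private static readonly Regex Allowed =
        new Regex(@"^[a-z\d-]+$", RegexOptions.IgnoreCase);

    public static bool IsAllowed(string s)
    {
        return Allowed.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("Emp-IN-0000001"));  // True
        Console.WriteLine(IsAllowed("Emp IN 1"));        // False
        Console.WriteLine(IsAllowed(""));                // False
    }
}
```

Changing the + to * in the pattern would make the empty string acceptable as well.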

How Can I Check If a C# Regular Expression Is Trying to Match 1-(and-only-1)-Character Strings?

Maybe this is a very rare (or even dumb) question, but I do need it in my app.
How can I check if a C# regular expression is trying to match 1-character strings?
That means I only allow users to search for 1-character strings. If a user tries to search for a multi-character string, an error message will be displayed.
Did I make myself clear?
Thanks.
Peter
P.S.: I saw an answer about calculating the final matched strings' length, but for some unknown reason, the answer is gone.
I thought about it for a while, and I think calculating the final matched strings' length is okay, though it's going to be kind of slow.
Yet, the original question is very rare and tedious.
A regexp would be .{1}
This will allow any character, though. If you only want alphanumeric characters, you can use [a-z0-9]{1} or the shorthand \w{1} (which also accepts the underscore).
Another option is to limit the number of characters a user can type in an input field: set a maxlength on it.
Yet another option is to save the form's input field to a char and not a string, although you may need some handling around this to prevent errors.
Why not use maxlength and save to a char?
You can look for unescaped *, +, {}, ? etc. and count the number of characters (don't forget to flatten the [] as one character).
Basically you have to parse your regex.
Instead of validating the regular expression, which could be complicated, you could apply it only on single characters instead of the whole string.
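One way to read "apply it only on single characters" is sketched below: the user's pattern is anchored and tested against each character of the text separately, so a pattern that needs more than one character simply never matches. The class name, method name and the anchoring with \A(?:...)\z are assumptions of this sketch, and it ignores characters outside the Basic Multilingual Plane (surrogate pairs).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleCharSearch
{
    // Returns the characters of text that the pattern matches in full.
    public static char[] MatchingChars(string text, string pattern)
    {
        // \A and \z tie the pattern to the whole one-character input.
        var regex = new Regex(@"\A(?:" + pattern + @")\z");
        return text.Where(c => regex.IsMatch(c.ToString())).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(new string(MatchingChars("a1b22", @"\d")));  // 122
        // A two-character pattern can never match a single character.
        Console.WriteLine(MatchingChars("a1b22", @"\d\d").Length);     // 0
    }
}
```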
If this is not possible, you may want to restrict the regular expression to a few features. For instance, the user can only enter characters to match or characters to exclude. Then you build up the regex in your code.
E.g.:
ABC matches [ABC]
^ABC matches [^ABC]
A-Z matches [A-Z]
# matches [0-9]
\w matches \w
AB#x-z matches [AB]|[0-9]|[x-z]|\w
Which cases do you need to support?
This would be somewhat easy to parse and validate.

How to Check if a String is a "string" or a RegEx?

How can I check whether a string in a textbox is a plain string or a regex?
I'm searching through a text file line by line,
either by .Contains(Textbox.Text); or by Regex(Textbox.Text) Match(currentLine)
(I know the syntax doesn't work like this; it's just for presentation.)
Now my program is supposed to autodetect whether Textbox.Text is in the form of a regex or is a normal string.
Any suggestions? Should I write my own little regex to detect whether the textbox contains a regex?
Edit:
I failed to add that my strings can be very simple, like Foo or 0005.
I'm trying the suggested solutions right away!
You can't detect regular expressions with a regular expression, as regular expressions themselves are not a regular language.
However, the easiest thing you could probably do is try to compile a regex from your textbox contents: if it succeeds, you know it's a valid regex; if it fails, you know it's not.
But this would classify ordinary strings like "foo" as regular expressions too. Depending on what you need to do, this may or may not be a problem. If it's a search string, then the results are identical in this case. In the case of "foo.bar" they would differ, though, since it's a valid regex but matches different things than the string itself.
My advice, also stated in another comment, would be to simply always enable regex search, since splitting the code paths here buys you nothing apart from a dubious performance benefit (which is unlikely to make any difference, if there is much of a benefit at all).
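The "try to compile it" check could look roughly like this in C#. The Regex constructor throws an ArgumentException for a pattern it cannot parse; the class and method names are made up for the sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexProbe
{
    // True if the text can be compiled as a .NET regular expression.
    public static bool IsValidRegex(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsValidRegex("foo"));      // True
        Console.WriteLine(IsValidRegex("foo.bar"));  // True
        Console.WriteLine(IsValidRegex("("));        // False
    }
}
```

As noted above, plain strings such as "foo" pass this check too, so it only tells you whether the text is usable as a regex, not whether it was meant as one.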
Many strings could be a regex, and every regex could actually be a string.
Consider "thin.": it could be either a string ('.' is a dot) or a regex ('.' is any character).
I would just add a checkbox where the user indicates whether they are entering a regex, as is usual in many applications.
One possible solution, depending on your definition of string and regex, would be to check whether the string contains any characters typical of a regex.
You could do something like this:
string s = "Foo";
if (s == Regex.Escape(s))
{
    // nothing had to be escaped, so treat s as a plain string
}
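One caveat with comparing against Regex.Escape is that it also escapes white space and '#', so any string containing a space counts as "changed". A hedged variant, sketched below with made-up names, checks only for the other metacharacters in the documented Regex.Escape list; whether white space and '#' should count is an assumption you may want to revisit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlainTextCheck
{
    // Metacharacters from the Regex.Escape list, minus white space and '#'.
    private static readonly char[] MetaChars =
        { '\\', '*', '+', '?', '|', '{', '[', '(', ')', '^', '$', '.' };

    public static bool LooksLikePlainText(string s)
    {
        return s.IndexOfAny(MetaChars) < 0;
    }

    public static void Main()
    {
        // Regex.Escape turns the space into "\ ", so the comparison fails.
        Console.WriteLine(Regex.Escape("a b") == "a b");           // False
        Console.WriteLine(LooksLikePlainText("I'm not a Regex"));  // True
        Console.WriteLine(LooksLikePlainText("foo.bar"));          // False
    }
}
```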
Try and use it in a regex and see if an exception is thrown.
This approach only checks if it is a valid regex, not whether it was intended to be one.
Another approach could be to check whether it is surrounded by slashes (i.e. '/foo/'). Surrounding regexes with slashes is common practice (although you must remove the slashes before feeding the pattern into the regex library).

Pulling data out of quotes?

I'm looking for a regex that can pull out quoted sections in a string, both single and double quotes.
IE:
"This is 'an example', \"of an input string\""
Matches:
an example
of an input string
I wrote up this:
[\"|'][A-Za-z0-9\\W]+[\"|']
It works but does anyone see any flaws with it?
EDIT: The main issue I see is that it can't handle nested quotes.
How does it handle single quotes inside of double quotes (or vice versa)?
"This is 'an example', \"of 'quotes within quotes'\""
should match
an example
of 'quotes within quotes'
Use a backreference if you need to support this.
(\"|')[A-Za-z0-9\\W]+?\1
EDIT: Fixed to use a reluctant quantifier.
Like that?
"([\"'])(.*?)\1"
Your desired match would be in sub group 2, and the kind of quote in group one.
The flaws in your regex are 1) the greedy "+" and 2) that [A-Za-z0-9] does not really match an awful lot; many characters are not in that range.
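A short C# sketch of the backreference approach, reading the quoted text from group 2. The class and method names are invented for the example, and it assumes quotes are never escaped inside the quoted text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteExtractor
{
    public static string[] ExtractQuoted(string input)
    {
        // Group 1: opening quote; group 2: lazily matched content;
        // \1 requires the same kind of quote to close the section.
        return Regex.Matches(input, @"([""'])(.*?)\1")
                    .Cast<Match>()
                    .Select(m => m.Groups[2].Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "This is 'an example', \"of 'quotes within quotes'\"";
        foreach (string s in ExtractQuoted(input))
        {
            Console.WriteLine(s);
        }
        // an example
        // of 'quotes within quotes'
    }
}
```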
It works but doesn't match other characters in quotes (e.g., non-alphanumeric, like binary or foreign language chars). How about this:
[\"']([^\"']*)[\"']
My C# regex is a little rusty, so go easy on me if that's not exactly right :)
@"(""|')(.*?)\1"
You might already have one of these, but, in case not, here's a free, open source tool I use all the time to test my regular expressions. I typically have the general idea of what the expression should look like, but need to fiddle around with some of the particulars.
http://renschler.net/RegexBuilder/
