I need to remove all . that don't match a pattern

I need to remove all dots from a text unless they match the pattern [0-9]+\.[0-9]+ (a dot between digits).
For example, if the following text is my input:
abc. def. 123.45 ... 12.
the output should look like this:
abc def 123.45 12
Thanks

If the language you are using supports lookarounds you can use this regular expression:
(?<![0-9])\.|\.(?![0-9])
This matches dots that either aren't preceded by a digit, or aren't followed by a digit.
Example for C#:
string result = Regex.Replace(input, @"(?<![0-9])\.|\.(?![0-9])", "");
See it working online: ideone
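As a rough, self-contained check of the lookaround approach, here is a minimal C# sketch (it assumes C# 9 or later for top-level statements; the constant name DotPattern is illustrative). Because the run ... is simply deleted, the spaces on either side of it remain, so the result has two spaces before 12 rather than the single space shown in the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Remove every dot that is not both preceded and followed by a digit.
const string DotPattern = @"(?<![0-9])\.|\.(?![0-9])";

string input = "abc. def. 123.45 ... 12.";
string result = Regex.Replace(input, DotPattern, "");

// Prints "abc def 123.45  12" (two spaces where " ... " used to be).
Console.WriteLine(result);
```

If the extra space matters, a second replacement that collapses repeated spaces would be needed; that step is not part of the original answer.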

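If your regex flavor supports negative lookarounds (which .NET does fabulously), you can use this:
(?<!\d)\.|\.(?!\d)
This matches only the dots that are not surrounded by digits on both sides, i.e. dots with a non-digit character (or the start or end of the string) before or after them. Simply replace the matches with an empty string.
If not, and your flavor supports branch reset groups (?|...) (PCRE and Perl do; .NET does not), you can do this:
(?|(^|\D)\.|\.($|\D))
And replace with $1. This does nearly the same, but includes that additional character in the match, and the replacement puts that character back in place. One caveat: because the adjacent character is consumed, a run of consecutive dots such as ... is not removed completely, since one of those dots can be captured as the adjacent character and written back.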
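Since .NET has no branch reset groups, a minimal C# sketch of the same capture-and-restore idea can use two ordinary groups and the replacement $1$2 instead (an adaptation, not the answer's exact pattern; it assumes C# 9+ top-level statements, and the helper name Strip is hypothetical). In .NET, a group that did not take part in the match substitutes as an empty string. The second call shows how a run of dots is handled:

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1 holds the non-digit (or nothing, at the start) before the dot,
// group 2 the non-digit (or nothing, at the end) after it.
// Whichever group did not participate is substituted as "".
const string Pattern = @"(^|\D)\.|\.($|\D)";

string Strip(string text) => Regex.Replace(text, Pattern, "$1$2");

Console.WriteLine(Strip("abc. def. 123.45 12.")); // abc def 123.45 12
// The consumed neighbour can itself be a dot, so one dot of the run survives.
Console.WriteLine(Strip("a ... b"));               // a . b
```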

Related

How to match exactly one or more characters inside boundary

Currently I'm using this pattern: [HelloWorld]{1,}.
So if my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not right.
So how can I make the input string match exactly the value inside the pattern?

Just get rid of the square brackets and the comma, and you're good to go!
HelloWorld{1}
(The {1} applies only to the final d and is redundant, so this is equivalent to plain HelloWorld.)

In regex what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And {1,} (or its shorthand +) means one or more repetitions of the preceding item, so [HelloWorld]{1,} matches any run of those letters.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
What if you want it to be a standalone word, and not part of another word?
Then you could use word boundaries \b, which mark the transition between a word character (\w, i.e. [A-Za-z0-9_] for ASCII text) and a non-word character (\W, i.e. [^A-Za-z0-9_]), or between a word character and the start or end of the string.
For example, @"\bHelloWorld\b" matches in "bla HelloWorld bla" but not in "blaHelloWorldbla".
Note that this time the regex string is prefixed with @.
Because it is a verbatim string, the backslashes don't have to be escaped.
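To compare the three patterns discussed above side by side, here is a minimal C# sketch (assuming C# 9+ top-level statements; the helper names ClassMatch, LiteralMatch and WordMatch are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Character class: any run of the letters H, e, l, o, W, r, d, so "WorldHello" matches too.
bool ClassMatch(string s) => Regex.IsMatch(s, @"[HelloWorld]{1,}");
// Literal word: matches anywhere, even inside another word.
bool LiteralMatch(string s) => Regex.IsMatch(s, @"HelloWorld");
// Word boundaries: the literal must stand alone as a word.
bool WordMatch(string s) => Regex.IsMatch(s, @"\bHelloWorld\b");

Console.WriteLine(ClassMatch("WorldHello"));          // True
Console.WriteLine(LiteralMatch("blaHelloWorldbla"));  // True
Console.WriteLine(WordMatch("bla HelloWorld bla"));   // True
Console.WriteLine(WordMatch("blaHelloWorldbla"));     // False
```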

It seems you need to use an online regex tester to check your pattern. For example, you could find one here, and you could also study the C# regex reference here.

Try this pattern:
[a-zA-Z]{1,}
You can test it online

Extract only numbers from text

I'm trying to extract only numbers from a string/text. Below is the regex pattern I'm using.
Regex regex = new Regex(@"[\d+]\S+");
string extract_from = " 12 abcd 1-2-3a a123z 1.2.3.4 xyz";
From the string extract_from above, the regex extracts these numbers:
12
1-2-3a
123z
1.2.3.4
The regex extracts them correctly except for the second and third ones, "1-2-3a" and "123z", which shouldn't be extracted because they contain letters. What can I add to the pattern so that it doesn't extract numbers that have letters in between?
Dashes and dots are OK, just not letters.

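Here, change \S in the regex to \s (notice the case).
\S matches any character except whitespace; \s matches whitespace.
Regex regex = new Regex(@"[\d+]\s+");
Note that [\d+] is a character class matching a single digit or a literal +, so on the sample input this pattern only matches the last digit of a number together with the whitespace after it (for example "2 " from "12 ").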

Try this one:
[0-9\-.]+\s+
That allows expressions with more than one decimal point, and with dashes inside them rather than only at the beginning.
You can use regexhero.net or www.regexplanet.com to test your regular expressions; they're very powerful tools.
Output from your given input would be the following matches (each one includes its trailing whitespace):
12
1.2.3.4
Edit, based on comment from OP
This regex shouldn't require a space at the beginning. If you need to match a number at the end of the line, it's probably simplest to just add a special case for it:
[0-9\-.]+\s|[0-9\-.]+$
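A minimal C# sketch of this edited pattern (assuming C# 9+ top-level statements; ExtractNumbers is a hypothetical helper name). Trimming each match removes the whitespace that the first alternative includes. The pattern does not check what precedes a match, so a token such as x12 followed by a space would still yield 12:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A run of digits, dashes and dots followed by whitespace, or one that ends the string.
const string NumberPattern = @"[0-9\-.]+\s|[0-9\-.]+$";

string[] ExtractNumbers(string text) =>
    Regex.Matches(text, NumberPattern)
         .Cast<Match>()
         .Select(m => m.Value.Trim()) // drop the whitespace included by the first alternative
         .ToArray();

Console.WriteLine(string.Join(", ", ExtractNumbers(" 12 abcd 1-2-3a a123z 1.2.3.4 xyz"))); // 12, 1.2.3.4
Console.WriteLine(string.Join(", ", ExtractNumbers("abc 42")));                              // 42
```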

Use this pattern to match everything except the tokens that contain letters:
(?!\S*[a-zA-Z])\b([^a-zA-Z\s]+)\b
Demo
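A minimal C# sketch of the lookahead pattern (assuming C# 9+ top-level statements; Tokens is a hypothetical helper name). The negative lookahead rejects any starting position from which a letter can be reached before the next whitespace, so partially numeric tokens such as 1-2-3a and a123z are skipped entirely:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// (?!\S*[a-zA-Z])    no letter between here and the next whitespace
// \b([^a-zA-Z\s]+)\b a run of non-letter, non-space characters between word boundaries
const string NoLettersPattern = @"(?!\S*[a-zA-Z])\b([^a-zA-Z\s]+)\b";

string[] Tokens(string text) =>
    Regex.Matches(text, NoLettersPattern)
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

Console.WriteLine(string.Join(", ", Tokens(" 12 abcd 1-2-3a a123z 1.2.3.4 xyz"))); // 12, 1.2.3.4
```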

Don't use capturing groups in c# Regex

I am writing a regular expression in Visual Studio 2013 using C#.
I have the following scenario:
Match match = Regex.Match("%%Text%%More text%%More more text", "(?<!^)%%[^%]+%%");
But my problem is that I don't want to use capturing groups. With this pattern, match.Value contains %%More text%%, and my goal is to get the string More text directly in match.Value.
The string to get will always be between the second and the third %%.
Put another way, the string will always be between the fourth and fifth %.
I tried:
Regex.Match("%%Text%%More text%%More more text", "(?:(?<!^)%%[^%]+%%)");
But with no luck.
I want to use match.Value because all my regex are in a database table.
Is there a way to "transform" that regex into one that doesn't use capturing groups and still gets the desired string in match.Value?

If you are sure there are no single % characters inside the %%-delimited parts, you can just use lookarounds like this:
(?<=^%%[^%]*%%)[^%]+(?=%%)
If you have single-% delimited strings (like %text1%text2%text3%text4%text5%text6, see demo):
(?<=^%[^%]*%)[^%]+(?=%)
See regex demo
And in case it is between the 4th and the 5th %%:
(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)
For single-% delimited strings (see demo):
(?<=^%(?:[^%]*%){3})[^%]+(?=%)
See another demo
Both regexes contain a variable-width lookbehind and the same lookahead, which restrict the context in which the one or more characters other than % may appear.
The (?<=^%%[^%]*%%) lookbehind makes sure there is %%[something_other_than_%]%% right after the beginning of the string, and (?<=^%%(?:[^%]*%%){3}) requires %%[substring_not_having_%]%%[substring_not_having_%]%%[substring_not_having_%]%% after the string start.
In case there can be single % symbols inside the double %%, you can use an unroll-the-loop regex (see demo):
(?<=^%%(?:[^%]*(?:%(?!%)[^%]*)*%%){3})[^%]*(?:%(?!%)[^%]*)*(?=%%)
This matches the same text as (?<=^%%(?:.*?%%){3}).*?(?=%%). For short strings, the .*? based solution should work faster; for very long input texts, use the unrolled version.
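A minimal C# sketch of the first two lookaround patterns and the single-% variant (assuming C# 9+ top-level statements; the inputs other than the question's own string are illustrative assumptions). Because lookarounds consume no characters, match.Value contains only the text between the delimiters:

```csharp
using System;
using System.Text.RegularExpressions;

// Text between the 2nd and 3rd "%%", using the question's input.
Match second = Regex.Match("%%Text%%More text%%More more text", @"(?<=^%%[^%]*%%)[^%]+(?=%%)");
Console.WriteLine(second.Value); // More text

// Text between the 4th and 5th "%%" (illustrative input).
Match fourth = Regex.Match("%%a%%b%%c%%d%%e", @"(?<=^%%(?:[^%]*%%){3})[^%]+(?=%%)");
Console.WriteLine(fourth.Value); // d

// Single-% delimiters: text between the 2nd and 3rd "%" (illustrative input).
Match singlePercent = Regex.Match("%text1%text2%text3", @"(?<=^%[^%]*%)[^%]+(?=%)");
Console.WriteLine(singlePercent.Value); // text2
```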

Regular expression to match exactly the start of a string

I'm trying to build a regular expression in C# to check whether a string follows a specific format.
The format I want is: [digits][white space][dot][letters]
For example:
123 .abc follows the format
12345 .def follows the format
123 abc does not follow the format
I wrote this expression, but it doesn't work completely:
Regex.IsMatch(exampleString, @"^\d+ .")

^ matches the start of the string, and you got it right.
\d+ matches one or more digits, and you got that one right as well.
A space in a regex matches a literal space, so that works too!
However, a . is a wildcard and will match any single character. If you want to match a literal period, you need to escape it with a backslash: \.
Then, to match letters, you can use [a-z]+ right after the period.
@"^\d+ \.[a-z]+"

The dot is a special character in regex, which matches any character (except, typically, newlines). To match a literal ., you need to escape it:
Regex.IsMatch(exampleString, @"^\d+ \.")
If you want to include the condition for the succeeding letters, use:
Regex.IsMatch(exampleString, @"^\d+ \.[A-Za-z]+$")
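A minimal C# sketch that runs the question's three examples through the anchored pattern from this answer (assuming C# 9+ top-level statements; Follows is a hypothetical helper name):

```csharp
using System;
using System.Text.RegularExpressions;

// Digits, one space, a literal dot, then letters up to the end of the string.
const string FormatPattern = @"^\d+ \.[A-Za-z]+$";

bool Follows(string s) => Regex.IsMatch(s, FormatPattern);

foreach (string sample in new[] { "123 .abc", "12345 .def", "123 abc" })
{
    Console.WriteLine($"{sample}: {Follows(sample)}");
}
// 123 .abc: True
// 12345 .def: True
// 123 abc: False
```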

For you to get yours to match, keep in mind that the period in regular expressions is a special character that will match any character, so you'll need to escape that.
In addition, \s matches any whitespace character (spaces, tabs, line breaks).
^\d+\s+\..+
(untested)

Regex for: Must contain a numeric, excluding "SomeText1"

I need a regex pattern (must be a single pattern) to match any text that contains a number, excluding a specific literal (i.e. "SomeText1").
I have the match any text containing a number part:
^.*[0-9]+.*$
But am having a problem excluding a specific literal.
Update: This is for .NET Regex.
Thanks in advance.

As a verbose regex:
^ # Start of string
(?=.*[0-9]) # Assert presence of at least one digit
(?!SomeText1$) # Assert that the string is not "SomeText1"
.* # If so, then match any characters
$ # until the end of the string
If your regex flavor doesn't support verbose mode, use the compact form:
^(?=.*[0-9])(?!SomeText1$).*$
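In .NET the verbose form needs RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. A minimal C# sketch checking that both forms agree (assuming C# 9+ top-level statements; the helper names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Verbose form: layout and "#" comments are ignored under IgnorePatternWhitespace.
const string Verbose = @"
    ^              # Start of string
    (?=.*[0-9])    # Assert presence of at least one digit
    (?!SomeText1$) # Assert that the string is not SomeText1
    .*             # If so, then match any characters
    $              # until the end of the string
";
const string OneLine = @"^(?=.*[0-9])(?!SomeText1$).*$";

bool VerboseMatch(string s) => Regex.IsMatch(s, Verbose, RegexOptions.IgnorePatternWhitespace);
bool OneLineMatch(string s) => Regex.IsMatch(s, OneLine);

foreach (string sample in new[] { "abc1", "SomeText1", "SomeText12", "no digits" })
{
    // abc1: True / True, SomeText1: False / False, SomeText12: True / True, no digits: False / False
    Console.WriteLine($"{sample}: {VerboseMatch(sample)} / {OneLineMatch(sample)}");
}
```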

Use a negative lookahead:
^(?!.*?SomeText1).*?[0-9]+.*$
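A minimal C# sketch contrasting this with the exact-match version (assuming C# 9+ top-level statements; Accepts is a hypothetical helper name). The lookahead here rejects SomeText1 anywhere in the string, so SomeText12 is rejected as well:

```csharp
using System;
using System.Text.RegularExpressions;

// Reject any input containing "SomeText1" anywhere, then require at least one digit.
const string NoSomeText1Anywhere = @"^(?!.*?SomeText1).*?[0-9]+.*$";

bool Accepts(string text) => Regex.IsMatch(text, NoSomeText1Anywhere);

Console.WriteLine(Accepts("abc1"));        // True
Console.WriteLine(Accepts("SomeText1"));   // False
Console.WriteLine(Accepts("SomeText12"));  // False, unlike the exact-match version
```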
