Extract only numbers from text - c#

I'm trying to extract only numbers from a string/text. Below is the regex pattern I'm using.
Regex regex = new Regex(@"[\d+]\S+");
string extract_from = " 12 abcd 1-2-3a a123z 1.2.3.4 xyz";
From the string "extract_from" above, the regex is extracting the numbers
12
1-2-3a
123z
1.2.3.4
The regex extracts them correctly except for the second and third ones, "1-2-3a" and "123z", which shouldn't be extracted because they contain letters. What can I add to the pattern so that it skips numbers that also have letters in them?
Dashes and dots are OK, just not letters.

Here, change the regex \S to be \s, notice the caps.
\S matches all but space, \s matches space.
Regex regex = new Regex(@"[\d+]\s+");

Try this one:
[0-9\-.]+\s+
That will allow expressions with more than one decimal, and dashes inside them, vs just at the beginning.
You can use regexhero.net or www.regexplanet.com to test your regular expressions; they're very powerful tools.
Output from your given input would be the following matches:
12
1.2.3.4
Edit, based on comment from OP
This regex shouldn't require a space at the beginning. If you need to match a number at the end of the line, it's probably simplest to just add a special case for it:
[0-9\-.]+\s|[0-9\-.]+$

Use this pattern to match anything that contains no letters:
(?!\S*[a-zA-Z])\b([^a-zA-Z\s]+)\b
Demo
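Another way to read the requirement is token by token: a token counts as a number only if it consists entirely of digits, dots, and dashes. The sketch below is an illustration, not one of the patterns posted above. It assumes tokens are separated by whitespace, and the lookarounds (?<!\S) and (?!\S) are my own choice. Note that it would also accept a bare "-" or "..." token.

```csharp
using System;
using System.Text.RegularExpressions;

class ExtractNumbers
{
    static void Main()
    {
        string extractFrom = " 12 abcd 1-2-3a a123z 1.2.3.4 xyz";

        // (?<!\S)   : the token starts at the beginning of the input or after whitespace
        // [0-9.\-]+ : digits, dots and dashes only
        // (?!\S)    : the token ends at whitespace or at the end of the input
        var regex = new Regex(@"(?<!\S)[0-9.\-]+(?!\S)");

        foreach (Match m in regex.Matches(extractFrom))
        {
            Console.WriteLine(m.Value);
        }
        // Prints:
        // 12
        // 1.2.3.4
    }
}
```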

Related

How to match exactly one or more characters inside boundary

Currently I am using this pattern: [HelloWorld]{1,}.
If my input is Hello, it matches.
But if my input is WorldHello, it still matches, which is not right.
How can I make the input match exactly the value inside the pattern?
Just get rid of the square brackets, and the comma and you're good to go!
HelloWorld{1}
In regex what's between square brackets is a character set.
So [HelloWorld] matches 1 character that's in the set [edlorHW].
And .{1,} or .+ both match 1 or more characters.
What you probably want is the literal word.
So the regex would simply be "HelloWorld".
That would match HelloWorld in the string "blaHelloWorldbla".
What if you want it to match only as a whole word, and not as part of a longer word?
Then you could use word boundaries \b, which indicate the transition between a word character (\w = [A-Za-z0-9_]) and a non-word character (\W = [^A-Za-z0-9_]) or the beginning of a line ^ or the end of a line $.
For example @"\bHelloWorld\b" to get a match from "bla HelloWorld bla" but not from "blaHelloWorldbla".
Note that the regex string this time was preceded by @.
Because by using a verbatim string the backslashes don't have to be backslashed.
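To see the difference between the character set, the literal word, and the word-boundary version, here is a small sketch using the strings from this answer (the class name is just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

class WordBoundaryDemo
{
    static void Main()
    {
        // Character set: any run of the letters H, e, l, o, W, r, d matches.
        Console.WriteLine(Regex.IsMatch("WorldHello", "[HelloWorld]{1,}"));        // True

        // Literal word: matches anywhere, even inside a longer word.
        Console.WriteLine(Regex.IsMatch("blaHelloWorldbla", "HelloWorld"));        // True

        // Word boundaries: only the standalone word matches.
        Console.WriteLine(Regex.IsMatch("bla HelloWorld bla", @"\bHelloWorld\b")); // True
        Console.WriteLine(Regex.IsMatch("blaHelloWorldbla", @"\bHelloWorld\b"));   // False
    }
}
```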
It seems you need an online regex tester to check your pattern. For example, you could find one of them here, and you could also study the C# regex reference here.
Try this pattern:
[a-zA-Z]{1,}
You can test it online

How To get text between 2 strings?

The string I want to extract text from is given below.
String:
Hello Mr John and Hello Ms Rita
Regex
Hello(.*?)Rita
I am trying to get the text between two strings, "Hello" and "Rita". I am using the regex above, but it gives me
Mr John and Hello Ms
which is wrong. I need only "Ms". Can anyone help me write a proper regex for this situation?
Use a tempered greedy token:
Hello((?:(?!Hello|Rita).)*)Rita
      ^^^^^^^^^^^^^^^^^^^^
See regex demo here
The (?:(?!Hello|Rita).)* is the tempered greedy token: it matches any run of characters that does not contain Hello or Rita. You may add word boundaries \b if you need to check for whole words.
In order to get a Ms without spaces on both ends, use this regex variation:
Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita
Adding the ? to * will form a lazy quantifier *? that matches as few characters as needed to find a match, and \s* will match zero or more whitespaces.
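A quick sketch comparing the original lazy pattern with the tempered variation on the string from the question. The square brackets in the output are only there to make the captured whitespace visible:

```csharp
using System;
using System.Text.RegularExpressions;

class TemperedTokenDemo
{
    static void Main()
    {
        string input = "Hello Mr John and Hello Ms Rita";

        // Plain lazy dot: the match starts at the first "Hello", so it captures too much.
        Match lazy = Regex.Match(input, @"Hello(.*?)Rita");
        Console.WriteLine("[" + lazy.Groups[1].Value + "]");     // [ Mr John and Hello Ms ]

        // Tempered token: the capture may not run across another "Hello" or "Rita".
        Match tempered = Regex.Match(input, @"Hello\s*((?:(?!Hello|Rita).)*?)\s*Rita");
        Console.WriteLine("[" + tempered.Groups[1].Value + "]"); // [Ms]
    }
}
```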
To get the match closest to the ending word, put a greedy .* in front of the initial word so that it consumes as much as possible first.
.*Hello(.*?)Rita
See demo at regex101
Or without whitespace in captured: .*Hello\s*(.*?)\s*Rita
Or with use of two capture groups: .*(Hello\s*(.*?)\s*Rita)
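One thing to keep in mind with the leading .* is that the overall match now includes everything before the last "Hello", so you have to read the capture group rather than the whole match. A minimal sketch with the string from the question:

```csharp
using System;
using System.Text.RegularExpressions;

class GreedyPrefixDemo
{
    static void Main()
    {
        string input = "Hello Mr John and Hello Ms Rita";

        // The greedy .* backtracks only as far as the last "Hello".
        Match m = Regex.Match(input, @".*Hello\s*(.*?)\s*Rita");

        Console.WriteLine(m.Value);           // Hello Mr John and Hello Ms Rita
        Console.WriteLine(m.Groups[1].Value); // Ms
    }
}
```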
Your (.*?) is picking up too much text because .* matches any string of characters. So it grabs everything from the first "Hello" to "Rita" at the end.
One easy way you could get what you want is with this regular expression:
Hello (\S+) Rita
\S matches any non-whitespace character, so \S+ matches any consecutive string of non-whitespace characters, i.e. a single word.
This would be a bit more robust, allowing for multiple spaces or other whitespace between the words:
Hello\s+(\S+)\s+Rita
Demo
you can use lookahead and lookbehind (?<=Hello).*?(?=Rita)

Regular expression to match exactly the start of a string

I'm trying to build a regular expression in C# to check whether a string follows a specific format.
The format I want is: [digit][white space][dot][letters]
For example:
123 .abc follows the format
12345 .def follows the format
123 abc does not follow the format
I wrote this expression, but it doesn't work completely:
Regex.IsMatch(exampleString, @"^\d+ .")
^ matches the start of the string, and you got it right.
\d+ matches one or more digits, and you got that one right as well.
A space in a regex matches a literal space, so that works too!
However, a . is a wildcard and will match any one character. You will need to escape it with a backslash like this if you want to match a literal period: \..
To match letters now, you can use [a-z]+ right after the period.
@"^\d+ \.[a-z]+"
The dot is a special character in regex, which matches any character (except, typically, newlines). To match a literal ., you need to escape it:
Regex.IsMatch(exampleString, @"^\d+ \.")
If you want to include the condition for the succeeding letters, use:
Regex.IsMatch(exampleString, @"^\d+ \.[A-Za-z]+$")
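To check this against the three examples from the question, here is a small sketch that runs both the original unescaped pattern and the escaped one (the variable names are just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

class FormatCheckDemo
{
    static void Main()
    {
        string[] samples = { "123 .abc", "12345 .def", "123 abc" };

        foreach (string s in samples)
        {
            // Unescaped dot: matches any character, so "123 abc" slips through.
            bool unescaped = Regex.IsMatch(s, @"^\d+ .");
            // Escaped dot followed by letters up to the end of the string.
            bool escaped = Regex.IsMatch(s, @"^\d+ \.[A-Za-z]+$");

            Console.WriteLine(s + " -> unescaped: " + unescaped + ", escaped: " + escaped);
        }
        // Prints:
        // 123 .abc -> unescaped: True, escaped: True
        // 12345 .def -> unescaped: True, escaped: True
        // 123 abc -> unescaped: True, escaped: False
    }
}
```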
For you to get yours to match, keep in mind that the period in regular expressions is a special character that will match any character, so you'll need to escape that.
In addition, \s is a match for any white-space character (spaces, tabs, line breaks).
^\d+\s+\..+
(untested)

I need to remove all . that don't match a pattern

I need to remove all dots from text unless they match a pattern [0-9]+.[0-9]+
for example if the following text is my input:
abc. def. 123.45 ... 12.
the output should look like this:
abc def 123.45 12
Thanks
If the language you are using supports lookarounds you can use this regular expression:
(?<![0-9])\.|\.(?![0-9])
This matches dots that either aren't preceded by a digit, or aren't followed by a digit.
Example for C#:
string result = Regex.Replace(input, @"(?<![0-9])\.|\.(?![0-9])", "");
See it working online: ideone
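Here is that replacement as a complete program, run on the input from the question. Note that the spaces on either side of the removed "..." are left behind, so the raw result has two spaces there. The second Regex.Replace call, which collapses runs of whitespace, is my own addition to get the exact output asked for:

```csharp
using System;
using System.Text.RegularExpressions;

class RemoveDotsDemo
{
    static void Main()
    {
        string input = "abc. def. 123.45 ... 12.";

        // Remove every dot that is not both preceded and followed by a digit.
        string result = Regex.Replace(input, @"(?<![0-9])\.|\.(?![0-9])", "");
        Console.WriteLine("[" + result + "]"); // [abc def 123.45  12]

        // Optional clean-up (an assumption about the desired output): collapse whitespace runs.
        string tidy = Regex.Replace(result, @"\s+", " ");
        Console.WriteLine("[" + tidy + "]");   // [abc def 123.45 12]
    }
}
```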
If your regex flavor supports negative lookarounds (which .NET does fabulously) you can use this:
(?<!\d)\.|\.(?!\d)
This will only match dots that have either a non-digit character before or after them. Simply replace the result with an empty string.
If not, then you can do this:
(?|(^|\D)\.|\.($|\D))
And replace with $1. This does the same, but includes that additional character in the match. The replacement puts that matched character back in place.

C# regex for assembly style hex numbers

I'm new to regex and I want to highlight hexadecimal numbers in Assembly style. Like this:
$00
$FF
$1234
($00)
($00,x)
and even hexadecimal numbers that begin with #.
So far I wrote "$[A-Fa-f0-9]+" to see if it highlights numbers beginning with $ but it doesn't. Why? And can someone help me with what I'm doing? Thanks.
Put a backslash before $ and your regex will work, like so:
\$[A-Fa-f0-9]+
$ is a regex metacharacter that matches the end of the string. So if your pattern contains a literal dollar sign, you need to escape it. See the regex reference for details.
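A short sketch of the difference. The assembly snippet in source is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

class HexHighlightDemo
{
    static void Main()
    {
        // Hypothetical assembly text, only for demonstration.
        string source = "LDA $FF\nSTA ($00,x)\nLDX #$1234";

        // Unescaped: "$" asserts the end of the input, so hex digits can never follow it.
        Console.WriteLine(Regex.Matches(source, "$[A-Fa-f0-9]+").Count); // 0

        // Escaped: "\$" matches a literal dollar sign.
        foreach (Match m in Regex.Matches(source, @"\$[A-Fa-f0-9]+"))
        {
            Console.WriteLine(m.Value);
        }
        // Prints:
        // $FF
        // $00
        // $1234
    }
}
```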
This should cover all those cases, including the cases in which you get a # instead of a $
public Regex MyRegex = new Regex(
"^(\\()?[\\$#][0-9a-fA-F]+(,x)?(?(1)\\))[\\s]*$",
RegexOptions.Singleline
| RegexOptions.Compiled
);
The unescaped sequence for the single line: ^(\()?[\$#][0-9a-fA-F]+(,x)?(?(1)\))[\s]*$
That should validate on a per-line match.
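To try the per-line validation, here is a minimal sketch that uses the same pattern written as a verbatim string and without the RegexOptions. The last two sample lines are my own additions to show a # prefix and an unbalanced parenthesis:

```csharp
using System;
using System.Text.RegularExpressions;

class HexLineValidationDemo
{
    static void Main()
    {
        // (?(1)\)) requires a closing parenthesis only if group 1 matched an opening one.
        var regex = new Regex(@"^(\()?[\$#][0-9a-fA-F]+(,x)?(?(1)\))[\s]*$");

        string[] lines = { "$00", "($00)", "($00,x)", "#FF", "($00" };

        foreach (string line in lines)
        {
            Console.WriteLine(line + " -> " + regex.IsMatch(line));
        }
        // Prints:
        // $00 -> True
        // ($00) -> True
        // ($00,x) -> True
        // #FF -> True
        // ($00 -> False
    }
}
```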
By the way, I made this regex pretty quickly using Expresso
