Regex to isolate a specific substring

I have this string I have retrieved from a File.ReadAllText:
6 11 rows processed
As you can see there is always an integer specifying the line number in this document. What I am interested in is the integer that comes after it and the words "rows processed". So in this case I am only interested in the substring "11 rows processed".
So, knowing that each line will start with an integer and then some white space, I need to be able to isolate the integer that follows it and the words "rows processed" and return that to a string by itself.
I have been told this is easy to do with Regex, but so far I haven't the faintest clue how to build it.

You don't need regular expressions for this. Just split on the whitespace:
var fields = s.Split(new char[0], StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(String.Join(" ", fields.Skip(1)));
Here, I am using the fact that if you pass an empty array as the char[] parameter to String.Split, it splits on all whitespace.

This should work for what you need:
\d+(.*)
This searches for one or more digits (\d+) and then puts everything after them in a group:
. = any character
* = repeater (zero or more of the preceding element, which here is any character)
() = grouping
However, Jason is correct that you only need a split function.
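For reference, here is a minimal C# sketch of how this pattern behaves on the sample line. The class name RowsProcessedDemo and the helper AfterLineNumber are hypothetical names chosen for illustration. The helper uses a slightly stricter variant (^\d+\s+(.*)) so that the separating whitespace is not captured.

```csharp
using System;
using System.Text.RegularExpressions;

static class RowsProcessedDemo
{
    // Hypothetical helper: returns the text after the leading line number,
    // or null when the line does not start with digits followed by whitespace.
    public static string AfterLineNumber(string line)
    {
        Match match = Regex.Match(line, @"^\d+\s+(.*)");
        return match.Success ? match.Groups[1].Value : null;
    }

    static void Main()
    {
        string line = "6 11 rows processed";

        // With \d+(.*) the group starts right after the first number,
        // so it still contains the separating space.
        string raw = Regex.Match(line, @"\d+(.*)").Groups[1].Value;
        Console.WriteLine("[" + raw + "]");                    // [ 11 rows processed]

        Console.WriteLine("[" + AfterLineNumber(line) + "]");  // [11 rows processed]
    }
}
```

If the simpler pattern is kept, calling Trim() on the captured group gives the same result.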

If you need to use a Regex it would be like this:
string result = null;
Match match = Regex.Match(row, @"^\s*\d+\s*(.*)");
if (match.Success)
    result = match.Groups[1].Value;
The regex matches from the start of the row: first any spaces, then digits, then more spaces. Finally, it captures the rest of the line, which is returned as the result.

This is done easily with Regex.Replace() using the following regular expression...
^\d+\s+
So it'd be something like this:
return Regex.Replace(text, @"^\d+\s+", "");
Basically you're just trimming the first number (\d+) and the whitespace (\s+) that follows.

Example in PHP (the C# regex should be compatible):
$line = "6 11 rows processed";
$resp = preg_match("/[0-9]+\s+(.*)/",$line,$out);
echo $out[1];
I hope I caught your point.

Related

How can I filter out certain combinations?

I'm trying to filter the input of a TextBox using a Regex. I need up to three numbers before the decimal point and I need two after it. This can be in any form.
I've tried changing the regex commands around, but it creates errors and single inputs won't be valid. I'm using a TextBox in WPF to collect the data.
bool containsLetter = Regex.IsMatch(units.Text, "^[0-9]{1,3}([.] [0-9] {1,3})?$");
if (containsLetter == true)
{
MessageBox.Show("error");
}
return containsLetter;
I want the regex filter to accept these types of inputs:
111.11,
11.11,
1.11,
1.01,
100,
10,
1,

As it has been mentioned in the comment, spaces are characters that will be interpreted literally in your regex pattern.
Therefore in this part of your regex:
([.] [0-9] {1,3})
a space is expected between . and [0-9],
the same goes for after [0-9] where the regex would match 1 to 3 spaces.
That being said, for readability you have several ways to construct your regex.
1) Put the comments outside the regex:
string myregex = @"\s" // Match any whitespace once
+ @"\n" // Match one newline character
+ @"[a-zA-Z]"; // Match any letter
2) Add comments within your regex by using the syntax (?#comment)
needle(?# this will find a needle)
3) Activate free-spacing mode within your regex:
nee # this will find a nee...
dle # ...dle (the split means nothing when white-space is ignored)
doc: https://www.regular-expressions.info/freespacing.html
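Applied to the pattern from the question, the free-spacing approach might look like the sketch below. It assumes that "two after it" means exactly two decimal digits, which is what all the sample inputs show; the class name AmountFilterDemo and the method IsValidAmount are hypothetical. In .NET, free-spacing mode is enabled with RegexOptions.IgnorePatternWhitespace.

```csharp
using System;
using System.Text.RegularExpressions;

static class AmountFilterDemo
{
    // Free-spacing pattern: the engine ignores the whitespace and the # comments.
    private const string Pattern = @"
        ^
        [0-9]{1,3}            # one to three digits before the decimal point
        (?: [.] [0-9]{2} )?   # optionally a dot followed by exactly two digits (assumption)
        $";

    public static bool IsValidAmount(string text)
    {
        return Regex.IsMatch(text, Pattern, RegexOptions.IgnorePatternWhitespace);
    }

    static void Main()
    {
        // The first four inputs are accepted, the last three are rejected.
        foreach (string input in new[] { "111.11", "1.01", "100", "1", "1.1", "1111", "1. 11" })
        {
            Console.WriteLine(input + " -> " + IsValidAmount(input));
        }
    }
}
```

Note that a successful match means the input is valid, so an error message would be shown when IsValidAmount returns false, not when it returns true.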

Split credit card number into 4 chunks using Regex lookahead?

I want to chunk a credit card number (in my case I always have 16 digits) into 4 chunks of 4 digits.
I've succeeded doing it via positive look ahead :
var s="4581458245834584";
var t=Regex.Split(s,"(?=(?:....)*$)");
Console.WriteLine(t);
But I don't understand why the result is with two padded empty cells:
I already know that I can use "Remove Empty Entries" flag , But I'm not after that.
However - If I change the regex to (?=(?:....)+$) , then I get this result :
Question
Why does the regex emit empty cells ? and how can I fix my regex so it produce 4 chunks at first place ( without having to 'trim' those empty entries )

But I don't understand why the result is with two padded empty cells:
Let's try breaking down your regex.
Regex: (?=(?:....)*$)
Explanation: The lookahead (?=...) checks for any four characters (?:....) repeated zero or more times up to the end of the string. Because a lookahead only looks ahead and consumes nothing, every match is zero-width.
Since the * quantifier allows zero repetitions, the pattern matches a zero-width position at the end of the string as well as the one at the beginning.
So How can I select only those 3 splitters in the middle ?
I don't know C# very well, but this three-step method might work for you.
Search with (\d{4}) and replace with -\1 (written -$1 in .NET replacement syntax). The result will be -4581-4582-4583-4584.
Now remove the first - by searching with ^-. The result will be 4581-4582-4583-4584.
Finally, search for - and split on it.
Alternative Solution Inspired from Royi's answer.
Regex: (?=(?!^)(?:\d{4})+$)
Explanation:
(?= // Look ahead for
(?!^) // Not the start of string
(?:\d{4})+$ // Multiple group of 4 digits till end of string
)
Since nothing is matched and only lookaround assertions are used, it will pinpoint Zero width after a group of 4 digits.
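The effect of the three patterns discussed so far can be compared with a short C# sketch. The class name CardChunkDemo and the helper Chunk are hypothetical; the brackets in the output only serve to make empty entries visible.

```csharp
using System;
using System.Text.RegularExpressions;

static class CardChunkDemo
{
    public static string[] Chunk(string digits, string pattern)
    {
        return Regex.Split(digits, pattern);
    }

    static void Main()
    {
        string s = "4581458245834584";
        string[] patterns =
        {
            @"(?=(?:....)*$)",         // zero-width match at the start and at the end
            @"(?=(?:....)+$)",         // zero-width match at the start, not at the end
            @"(?=(?!^)(?:\d{4})+$)"    // no match at either end
        };

        foreach (string pattern in patterns)
        {
            string[] parts = Chunk(s, pattern);
            Console.WriteLine(parts.Length + " parts: [" + string.Join("][", parts) + "]");
        }
    }
}
```

Regex.Split adds an empty string whenever a match is found at the very beginning or end of the input, which is why only the last pattern yields exactly four entries.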

It seems like I've found an answer.
Looking at those splitters - I needed to get rid of the edges :
So I thought - how can I tell the regex engine "not at the start of the line " ?
Which is exactly what (?!^) does
So here is the new regex :
var s="4581458245834584";
var t=Regex.Split(s,"(?!^)(?=(?:....)+$)");
Console.WriteLine(t);
Result :

Umm, I don't know WHY you need Regex for this. You just overcomplicate things. Better way is to just split it manually:
var values = new List<int>();
for (int i = 0; i < 4; i++)
{
    var value = int.Parse(s.Substring(i * 4, 4));
    values.Add(value);
}
Regex solution:
var s = "4581458245834584";
var separated = Regex.Match(s, "(.{4}){4}").Groups[1].Captures.Cast<Capture>().Select(x => x.Value).ToArray();

As already mentioned, the * quantifier also lets the lookahead succeed at the end of the string, where zero groups lie ahead. To avoid matches at the start and end, you can use \B, the non-word-boundary assertion, which only matches between two word characters and therefore never at the start or end of this string.
\B(?=(?:.{4})+$)
Because the lookahead is never attempted at the start or end of the string, you could even use * here.
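A small C# sketch of the \B variant with the * quantifier, assuming the input consists only of word characters such as digits (the class name NonWordBoundaryDemo is hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

static class NonWordBoundaryDemo
{
    public static string[] Chunk(string digits)
    {
        // \B fails at position 0 and at the end of an all-digit string,
        // so even the * quantifier produces no empty entries.
        return Regex.Split(digits, @"\B(?=(?:.{4})*$)");
    }

    static void Main()
    {
        Console.WriteLine(string.Join("-", Chunk("4581458245834584")));  // 4581-4582-4583-4584
    }
}
```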

Regex matching numbers without letters in front of it

I want to match numbers like "100", "1.1", "5.404", IF they do not include a letter in front like this: "V102".
Here is my current regular expression:
(?<![A-Za-z])[0-9.]+
This is supposed to match any character 0-9. one or more repetitions, if prefix is absent (A-Za-z).
But what it does is match V102, as 02, so it just chips away V and one more letter and then the rest fits while it actually shouldn't match that case at all. How can I make it so it grabs all numbers, and then checks if the prefix is non existent?

Add digits and decimal point to your negative lookbehind:
(?<![A-Za-z0-9.])[0-9.]+
This will force all matches to start with a non-digit and non-letter (i.e., a space or other separator). That way the end of a number will not be a valid match either.
Demo: http://www.rubular.com/r/EDuI2D9jnW
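To see the difference between the original pattern and the extended lookbehind in C#, a minimal sketch could look like this (the class name NumberMatchDemo, the helper FindNumbers and the sample text are assumptions made for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class NumberMatchDemo
{
    public static string[] FindNumbers(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        string text = "100 1.1 5.404 V102";

        // Original pattern: only a letter directly in front blocks a match,
        // so the engine skips the 1 after V and matches the remaining "02".
        Console.WriteLine(string.Join(", ", FindNumbers(text, @"(?<![A-Za-z])[0-9.]+")));
        // 100, 1.1, 5.404, 02

        // Extended lookbehind: a match may not start in the middle of a number.
        Console.WriteLine(string.Join(", ", FindNumbers(text, @"(?<![A-Za-z0-9.])[0-9.]+")));
        // 100, 1.1, 5.404
    }
}
```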

Could you use word boundaries?
\b[0-9\.]+\b

Try the regex:
(?<![A-Za-z0-9])[0-9.]+

If you don't want letters or spaces anywhere in your string, then this should work:
^[0-9.]+$

A Non-Regex solution.
If you have the following string, then you can use double.TryParse to see if the string is a double. Try:
string str = "100 1.1 V100 d333 ABC 1.1";
double temp;
string[] result = str.Split().Where(r => (double.TryParse(r, out temp))).ToArray();
Or if you need a double array in return then:
double[] numberArray = str.Split()
.Where(r => double.TryParse(r, out temp))
.Select(r => double.Parse(r))
.ToArray();

Try using the caret ^ anchor. It indicates that the pattern must start at the beginning of the input. For example, ^[0-9.]+ will match inputs that begin with a digit or a . followed by any number of further digits or dots.
Note that this pattern does not match only numbers: it also matches input with more than one dot, for example 2.00.2, which is not a valid number.

Regular Expression - Remove zeroes inside an expression

I need to remove leading zeroes from the numerical part of an expression (using .net 2.0 C# Regex class).
Ex:
PAR0000034 -> PAR34
WP0003204 -> WP3204
I tried the following:
//keep starting characters, get rid of leading zeroes, keep remaining digits
string result = Regex.Replace(inputStr, "^(.+)(0+)(/d*)", "$1$3", RegexOptions.IgnoreCase)
Obviously, it did not work. I need a bit of help to find the mistake.

You don't need a regular expression for that, the Split method can do that for you.
Splitting on '0', removing empty entries (i.e. between the multiple zeroes), and limiting the result to two strings will give you the two strings before and after the leading zeroes. Then you just put those two strings together again:
string result = String.Concat(
input.Split(new char[] { '0' }, 2, StringSplitOptions.RemoveEmptyEntries)
);

In your expression the .+ part is greedy, so it captures the whole string. Also, use a backslash instead of a slash for the digit class \d:
string result = Regex.Replace(inputStr, @"^([^0]+)(0+)(\d*)", "$1$3");
Or use look behind instead:
string result = Regex.Replace(inputStr, "(?<=[a-zA-Z])0+", "");

This works for me:
Regex.Replace("PPP00001001", "([^0]*)0+(.*)", "$1$2");

The phrase "leading zeroes" is confusing, since the zeroes you're talking about aren't actually at the beginning of the string. But if I understand you correctly, you want this:
string result = Regex.Replace(inputStr, "^(.*?)0+", "$1");
There are actually several ways to do it, with and without regex, but the above is probably the shortest and easiest to understand. The important part is the .*? lazy quantifier. This will ensure that it a) finds only the first string of zeroes, and b) deletes all the "leading" zeroes in the string.
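The sketch below runs the lazy-quantifier pattern and the lookbehind pattern from the answers above against the sample inputs. The class name ZeroStripDemo and both method names are hypothetical, and the input AB1203 is an added assumption to show where the two patterns differ.

```csharp
using System;
using System.Text.RegularExpressions;

static class ZeroStripDemo
{
    // Removes the first run of zeroes found anywhere in the string.
    public static string StripFirstZeroRun(string input)
    {
        return Regex.Replace(input, "^(.*?)0+", "$1");
    }

    // Removes only zeroes that directly follow a letter.
    public static string StripZeroesAfterLetter(string input)
    {
        return Regex.Replace(input, "(?<=[a-zA-Z])0+", "");
    }

    static void Main()
    {
        Console.WriteLine(StripFirstZeroRun("PAR0000034"));       // PAR34
        Console.WriteLine(StripFirstZeroRun("WP0003204"));        // WP3204

        // The two patterns differ when the first zero is not a leading one.
        Console.WriteLine(StripFirstZeroRun("AB1203"));           // AB123
        Console.WriteLine(StripZeroesAfterLetter("AB1203"));      // AB1203
    }
}
```

Which behavior is preferable depends on whether identifiers whose numeric part starts with a non-zero digit can occur in the data.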

insert hyphen in a 9 digit number after 5 digits using regex

I have to automatically insert a hyphen in 9 digit number on text change event in c# only not javascript.
So if my number is 123456789 then it automatically becomes 12345-6789.
I would like to use regex.match.
My try:
The regex "^\d{5}(-\d{4})?$" is how the result should be.
so,
Regex regTest = new Regex("^\\d{5}(-\\d{4})?$");
Match match = regTest.Match(s);
if (match.Success)
{
var numString = match.Value;
}
But the above does not return a success.
Thanks for help.

Your code sample simply checks that the format is xxxxx-xxxx. It doesn't insert the hyphen.
You do not need a Regex to insert a hyphen:
myString = myString.Insert(5, "-"); // Insert returns a new string
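If a regex is still preferred, one possible sketch uses Regex.Replace with two capture groups, so the hyphen is only inserted when the input is exactly nine digits. The class name HyphenDemo and the method InsertHyphen are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class HyphenDemo
{
    // Inserts a hyphen after the fifth digit of a nine-digit string;
    // any other input is returned unchanged.
    public static string InsertHyphen(string digits)
    {
        return Regex.Replace(digits, @"^(\d{5})(\d{4})$", "$1-$2");
    }

    static void Main()
    {
        Console.WriteLine(InsertHyphen("123456789"));   // 12345-6789
        Console.WriteLine(InsertHyphen("12345-6789"));  // 12345-6789 (already formatted)
        Console.WriteLine(InsertHyphen("1234"));        // 1234 (too short, unchanged)
    }
}
```

The pattern from the question, ^\d{5}(-\d{4})?$, can then be used afterwards to validate the formatted value.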

The regular expression seems correct. You can verify it here:
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
Most probably you are not inserting the '-' and then matching.
