How to split a string into numbers and substrings? - c#

How do I split a string into numbers and substrings?
Input: 34AG34A
Expected output: {"34","AG","34","A"}
I have tried the Regex.Split() function, but I cannot figure out which pattern would work.
Any ideas?

The regular expression (\d+|[A-Za-z]+), used with Regex.Matches rather than Regex.Split, will return the tokens you require.
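A minimal sketch of that answer, assuming System.Linq is available. The class and method names (DigitLetterTokenizer, Tokenize) are hypothetical, chosen only for illustration. Each match is one token, so the result for 34AG34A is the expected array.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: tokenizes a string into runs of digits and runs of ASCII letters.
public static class DigitLetterTokenizer
{
    public static string[] Tokenize(string input)
    {
        return Regex.Matches(input, @"(\d+|[A-Za-z]+)")
            .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
            .Select(m => m.Value)   // keep only the matched text
            .ToArray();
    }
}
```

Characters that are neither digits nor ASCII letters (for example a "-") are simply skipped, so Tokenize("12-ab") yields {"12","ab"}.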

I think you have to look for two patterns:
a sequence of digits
a sequence of letters
Hence, I'd use ([a-z]+)|([0-9]+).
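For instance, System.Text.RegularExpressions.Regex.Matches("asdf1234be56qq78", "([a-z]+)|([0-9]+)") returns 6 matches: "asdf", "1234", "be", "56", "qq", "78". Note that [a-z] is case-sensitive, so for uppercase input such as 34AG34A you need to pass RegexOptions.IgnoreCase (or use [A-Za-z]).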

First, you ask for "numbers" but don't specify what you mean by that.
If you mean "digits in 0-9" then you need the character class [0-9]. There is also the character class \d which in addition to 0-9 matches some other characters.
\d matches any decimal digit. It is equivalent to the \p{Nd} regular expression pattern, which includes the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets.
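To see the difference in practice, here is a small sketch (the class and member names are hypothetical) that checks U+0663, the Arabic-Indic digit three, which belongs to the Nd category. It also shows that .NET's RegexOptions.ECMAScript option restricts \d to [0-9].

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing \d, [0-9] and \d under ECMAScript semantics.
public static class DigitClassDemo
{
    // U+0663 ARABIC-INDIC DIGIT THREE, Unicode category Nd.
    public const string ArabicIndicThree = "\u0663";

    // \d is equivalent to \p{Nd}, so it accepts non-ASCII decimal digits.
    public static bool MatchesBackslashD(string s) => Regex.IsMatch(s, @"^\d$");

    // [0-9] accepts only the ASCII digits.
    public static bool MatchesAsciiRange(string s) => Regex.IsMatch(s, "^[0-9]$");

    // With RegexOptions.ECMAScript, \d is treated as [0-9].
    public static bool MatchesEcmaScriptD(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
}
```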
I assume that you are not interested in negative numbers, numbers containing a decimal point, foreign numerals such as 五, etc.
Split is not the right solution here. What you appear to want to do is tokenize the string, not split it. You can do this by using Matches instead of Split:
using System.Linq;
using System.Text.RegularExpressions;

// s is the input string, e.g. "34AG34A"
string[] output = Regex.Matches(s, "[0-9]+|[^0-9]+")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();

Don't use Regex.Split, use Regex.Match:
var m = Regex.Match("34AG34A", "([0-9]+|[A-Z]+)");
while (m.Success) {
    Console.WriteLine(m);
    m = m.NextMatch();
}
Converting this to an array is left as an exercise to the reader. :-)
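For completeness, one way to do that exercise: collect each match value in a List while walking NextMatch, then call ToArray. This is a sketch; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: same Match/NextMatch loop as the answer, collected into an array.
public static class NextMatchCollector
{
    public static string[] Collect(string input)
    {
        var tokens = new List<string>();
        var m = Regex.Match(input, "([0-9]+|[A-Z]+)");
        while (m.Success)
        {
            tokens.Add(m.Value);   // store the text of the current match
            m = m.NextMatch();     // continue scanning after the current match
        }
        return tokens.ToArray();
    }
}
```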

Related

Regex split and merge into single record

In my C# application I'm using the regex below to split the string: ([A-Z0-9]{20}\d{0}). However, it splits the ErrorCode and ErrorMsg into two different records, but I need ErrorCode and ErrorMsg in a single array element.
For example:
Current result:
[0] 05300030000GN0030018
[1] Field is required.
But I need it like this:
[0] 05300030000GN0030018Field is required.
Current Implementation:
Expected output
Assuming the message is never empty and that \d{0} was meant to make the match fail when the character after [A-Z0-9]{20} is a digit (as written, \d{0} matches an empty string and never fails), you can use
var result = Regex.Matches(input, @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
See the regex demo. Note that if the message can be empty, you need to use a (?!\d) lookahead instead of \D: @"\b[A-Z0-9]{20}(?!\d).*?(?=\b[A-Z0-9]{20}(?!\d)|\z)".
Details:
\b - word boundary (ensures the 20-character code is not preceded by another word character)
[A-Z0-9]{20} - twenty uppercase ASCII letters or digits
\D - a non-digit char
.*? - any zero or more chars as few as possible
(?=\b[A-Z0-9]{20}\D|\z) - a positive lookahead that requires a word boundary, twenty uppercase ASCII letters or digits and a non-digit or end of string immediately to the right of the current location.
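A runnable sketch of the answer's pattern wrapped in a hypothetical helper (ErrorRecordSplitter.Split). The second record in the usage example ("05300030000GN0030019Value is invalid.") is made up for illustration; only the first record comes from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: each match is one 20-char code plus the message that follows it.
public static class ErrorRecordSplitter
{
    // Code + non-digit, then the shortest text up to the next code or the end of input.
    private const string Pattern = @"\b[A-Z0-9]{20}\D.*?(?=\b[A-Z0-9]{20}\D|\z)";

    public static List<string> Split(string input)
    {
        // Singleline lets '.' also match newlines inside a message.
        return Regex.Matches(input, Pattern, RegexOptions.Singleline)
            .Cast<Match>()
            .Select(x => x.Value)
            .ToList();
    }
}
```

Calling ErrorRecordSplitter.Split("05300030000GN0030018Field is required.05300030000GN0030019Value is invalid.") should produce two elements, each holding a code together with its message.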

Regex: match all characters until a given character is reached, but also include the last match

I'm trying to find all color hex codes using Regex.
For example, I have this string value: #FF0000FF#0038FFFF#51FF00FF#F400FFFF and I use this
#.+?(?=#)
pattern to match all characters until it reaches #, but it stops before the last code, which should also be matched.
I'm kind of new to this Regex stuff. How could I also get the last match?
Your regex does not match the last value because its positive lookahead (?=#) requires a # to appear after the already consumed value, and there is no # at the end of the string.
You may use
#[^#]+
See the regex demo
The [^#] negated character class matches any char but # (+ means 1 or more occurrences) and does not require a # to appear immediately to the right of the currently matched value.
In C#, you may collect all matches using
var result = Regex.Matches(s, @"#[^#]+")
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
A more precise pattern you may use is #[A-Fa-f0-9]{8}, it matches a # and then any 8 hex chars, digits or letters from a to f and A to F.
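The difference between the two patterns shows up once the input contains anything besides hex digits. This sketch (hypothetical class and method names) runs both on an input with a ";" separator, which is an assumption for illustration: #[^#]+ keeps the separator, while #[A-Fa-f0-9]{8} does not.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the loose and the precise hex-color patterns.
public static class HexColorExtractor
{
    // Anything from a '#' up to (not including) the next '#'.
    public static string[] LooseMatches(string input) =>
        Regex.Matches(input, "#[^#]+").Cast<Match>().Select(m => m.Value).ToArray();

    // A '#' followed by exactly eight hex digits.
    public static string[] StrictMatches(string input) =>
        Regex.Matches(input, "#[A-Fa-f0-9]{8}").Cast<Match>().Select(m => m.Value).ToArray();
}
```

Note that the precise pattern does not check what follows the eighth hex digit, so a longer run such as #FF0000FFAA still yields #FF0000FF.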
Don't rely on whatever characters follow the #; match hex characters explicitly and it will work every time:
(?i)#[a-f0-9]+

Regex: Replace digits in the middle of a number with special characters

I have a very large number (its length may vary) as input.
I need a regular expression that leaves the first 3 digits and the last 3 digits unmodified and replaces all the digits between them with some character. The total length of the output should remain the same.
For example:
Input 123456789123456
Output 123xxxxxxxxx456
So far I was able to divide the input number into 3 groups using
^(\d{3})(.*)(\d{3})
The second group is the one that needs to be replaced, so it will be something like
$1 {Here goes the replacement of group 2} $3
I am struggling with the replacement:
Regex r = new Regex(@"^(\d{3})(.*)(\d{3})");
r.Replace(input, "$1 {Here goes the replacement of group 2} $3");
How should I write the replacement for group 2 here?
Thanks in advance.
You could try the regex below, which uses a lookbehind and a lookahead:
string str = "123456789123456";
string result = Regex.Replace(str, @"(?<=\d{3})\d(?=\d{3})", "x");
Console.WriteLine(result);
Console.ReadLine();
Output:
123xxxxxxxxx456
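If you prefer to keep the asker's three-group approach, a plain replacement string cannot repeat a character once per digit, but a MatchEvaluator can. A minimal sketch (DigitMasker.Mask is a hypothetical name), swapping the asker's (.*) for (\d*) so only digits are masked:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: keeps the first and last three digits and masks the rest.
public static class DigitMasker
{
    public static string Mask(string input) =>
        Regex.Replace(input, @"^(\d{3})(\d*)(\d{3})$",
            // Rebuild the match: group 1, one 'x' per digit of group 2, group 3.
            m => m.Groups[1].Value
               + new string('x', m.Groups[2].Length)
               + m.Groups[3].Value);
}
```

Inputs shorter than six digits do not match and are returned unchanged.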

Regex matching numbers without letters in front of it

I want to match numbers like "100", "1.1", "5.404", but only if they are not preceded by a letter, as in "V102".
Here is my current regular expression:
(?<![A-Za-z])[0-9.]+
This is supposed to match one or more characters from 0-9 or ".", provided there is no letter (A-Za-z) right before them.
But on V102 it matches 02: it skips the V and the following digit, and the rest fits, while it actually shouldn't match that case at all. How can I make it grab the whole number first and then check that there is no letter prefix?
Add digits and decimal point to your negative lookbehind:
(?<![A-Za-z0-9.])[0-9.]+
This forces every match to be preceded by something that is neither a letter, a digit nor a dot (i.e., a space, another separator, or the start of the string). That way, the tail end of a number is no longer a valid match either.
Demo: http://www.rubular.com/r/EDuI2D9jnW
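A quick sketch contrasting the asker's pattern with the fixed one (UnprefixedNumberFinder.Find is a hypothetical name). The sample input "100 1.1 5.404 V102" is assembled from the numbers in the question.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper using the lookbehind from the answer above.
public static class UnprefixedNumberFinder
{
    // A match may not start right after a letter, digit or dot,
    // so "V102" cannot be matched from its middle ("02" or "2").
    public static string[] Find(string input) =>
        Regex.Matches(input, @"(?<![A-Za-z0-9.])[0-9.]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
}
```

With the original (?<![A-Za-z])[0-9.]+, Regex.Match("V102", ...) returns "02", because the lookbehind at the "0" only sees the digit "1".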
Could you use word boundaries?
\b[0-9\.]+\b
Try the regex:
(?<![A-Za-z0-9])[0-9.]+
If you don't want letters or spaces anywhere in your string, then this should work:
^[0-9.]+$
A non-regex solution:
Given a string like the following, you can split it on whitespace and use double.TryParse to check whether each token is a double. Try:
string str = "100 1.1 V100 d333 ABC 1.1";
double temp;
string[] result = str.Split().Where(r => (double.TryParse(r, out temp))).ToArray();
Or, if you need a double array instead:
double[] numberArray = str.Split()
    .Where(r => double.TryParse(r, out temp))
    .Select(r => double.Parse(r))
    .ToArray();
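One caveat with the snippets above: double.TryParse(string) uses the current culture, so on a machine where "," is the decimal separator, "1.1" may not parse as 1.1. A sketch of a variant that pins the culture and parses each token only once (NumberTokenParser.ParseNumbers is a hypothetical name; choosing InvariantCulture and NumberStyles.Float is an assumption about what the input looks like):

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: whitespace-separated tokens that parse as doubles.
public static class NumberTokenParser
{
    public static double[] ParseNumbers(string input)
    {
        var numbers = new List<double>();
        // Split() with no arguments splits on whitespace characters.
        foreach (var token in input.Split())
        {
            // InvariantCulture: '.' is always the decimal separator.
            // NumberStyles.Float: no thousands separators are accepted.
            if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
            {
                numbers.Add(value);
            }
        }
        return numbers.ToArray();
    }
}
```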
Try using the caret ^ anchor. It indicates that your pattern must start at the beginning of the input. For example, ^[0-9.]+ matches inputs that begin with one or more digits or dots.
Note that this pattern does not match only numbers: it also matches strings with more than one dot, for example 2.00.2, which is not a valid number.

Regular Expression: Section names with unknown length?

I have a text block that is formatted like this:
1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 etc
An unknown number of numbers and dots (sections of a legal document)
How can I capture the full section (1.2.3.4.5) into a group?
I use C#, but any regex is fine; I can translate it.
Use this Regex:
Regex.Matches(inputString, @"\d[\.\d]*(?<!\.)");
Explanation:
\d - a digit (0-9)
[\.\d]* - any of '.' or a digit, 0 or more times (greedy, matching as many as possible)
(?<!\.) - a zero-width negative lookbehind assertion: the match must not end with a '.'
string s = "1.2.3.4.5 or 1.2222.3.4.5 or 1 or 1.2 or 2222.3333.111.5 etc";
var matches = Regex.Matches(s, @"\d+(\.\d+)*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
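Both patterns above return the same sections for the question's sample text; they differ on malformed input such as "1..2". This sketch (hypothetical class and method names) runs them side by side: the lookbehind version accepts empty sections between dots, while \d+(\.\d+)* requires at least one digit after every dot.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper comparing the two section-number patterns.
public static class SectionNumberFinder
{
    // A digit, then any run of dots/digits, but the match may not end with '.'.
    public static string[] WithLookbehind(string input) =>
        Regex.Matches(input, @"\d[\.\d]*(?<!\.)").Cast<Match>().Select(m => m.Value).ToArray();

    // One or more digits, then zero or more ".digits" groups.
    public static string[] WithRepeatedGroup(string input) =>
        Regex.Matches(input, @"\d+(\.\d+)*").Cast<Match>().Select(m => m.Value).ToArray();
}
```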
Well, if you know you can't go beyond 5 levels, then you can do
@"1+((\.2+)((\.3+)((\.4+)(\.5+)?)?)?)?"
and you can expand that pattern for every symbol, up to a finite number of symbols.
The + means one or more occurrences of the symbol. If 0 is valid, you can use * instead.
Put ?: after an opening parenthesis if you don't want the group to be captured,
for example: (?:abc)
I omitted them to make the regex more readable.
A ? after a closing parenthesis means 0 or 1 occurrences of the preceding group.
Now, if you don't know how deep your string can go, for instance
"1.2.3.4......252525.262626.272727.......n.n.n", then this fixed-depth approach breaks down; a repeated group such as \d+(\.\d+)* handles any depth instead.
