what is the regular expression to remove everything but digit? - c#

I have this data in format
"NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1"
the format is Item1:Item1Qty_Item2:Item2Qty.........ItemN:ItemNQty
I need to separate the items and their corresponding quantities and form arrays. I did the item part like this:
var allItemsAry = Regex.Replace(myString, "[\\:]+\\d", "").Split('_');
Now allItemsAry is correct like this [NEW ITEM, BELT, JEANS, BELT, SUIT 3 PCS, SHOES]
But I can't figure out how to get the quantities. Whatever expression I try, the 3 from SUIT 3 PCS comes along with them, like this:
var allQtyAry = Regex.Replace(dataForPackageConsume, "[^(\\:+\\d)]", "").Split(':');
This comes out as :1:3:1:13:1:1 (after the replace), so I can't split on : to build the array. As can be seen, the fourth item is 13 when it should be 1; the 3 comes from SUIT 3 PCS. I also tried some other variations, but that 3 always pops in. How do I get just the quantities of clothes (possibly attached to the : so I can split on it and form the array)?
UPDATE: In case I didn't make it clear before, I want the numbers that are immediately preceded by :, along with the colon.
So what I want is :1:3:1:1:1:1.

Instead of removing everything except numerals, how about matching only numerals?
For instance:
Regex regex = new Regex(@":\d+");
string result = string.Empty;
foreach (Match match in regex.Matches(input))
result += match.Value;
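If the goal is an array of quantities rather than a concatenated string, the same matching idea can be sketched as below. This is a variation, not the answer's exact code: it assumes a lookbehind (?<=:) so the colon itself is left out of each match, and the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample data copied from the question.
string input = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";

// (?<=:)\d+ matches a run of digits only when a ':' sits immediately before it,
// so the 3 in "SUIT 3 PCS" is skipped.
int[] quantities = Regex.Matches(input, @"(?<=:)\d+")
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToArray();

Console.WriteLine(string.Join(",", quantities)); // 1,3,1,1,1,1
```

Because \d+ takes the whole run of digits, quantities with more than one digit should come through intact as well.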

[^\d:]+|:(?!\d)|(?<!:)\d+
[^\d:]+ will match all non-digit non-:s.
:(?!\d) will match all :s not followed by a digit (negative lookahead).
(?<!:)\d+ will match all digits not preceded by a : (negative lookbehind).
Source
NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1
Regular Expression
[^\d:]+|:(?!\d)|(?<!:)\d+
Results
Match
NEW ITEM
_BELT
_JEANS
_BELT
_SUIT
3
PCS
_SHOES

Do you want only the numbers, like :1:3:1:1:3:1:1?
string s = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
var output = Regex.Replace(s, @"[^0-9]+", "");
StringBuilder sb = new StringBuilder();
foreach (var i in output)
{
sb.Append(":" + i);
}
Console.WriteLine(sb); // :1:3:1:1:3:1:1
OK, if the character right after each : is always a digit, you can do it like this:
string s = "NEW ITEM:1_BELT:3_JEANS:1_BELT:1_SUIT 3 PCS:1_SHOES:1";
var array = s.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sb = new StringBuilder();
foreach (var item in array)
{
if (Char.IsDigit(item[0]))
{
sb.Append(":" + item[0]);
}
}
Console.WriteLine(sb); //:1:3:1:1:1:1

This will work with one replace:
var allQtyAry = Regex.Replace(dataForPackageConsume, @"[^_:]+:", "").Split('_');
Explanation:
[^_:] means match anything that's not a _ or a :
[^_:]+: means match any sequence of at least one character not matching either _ or :, but ending with a :
Since regular expressions are greedy by default (ie they grab as much as possible), matching will start at the beginning of the string or after each _:
[NEW ITEM:]1_[BELT:]3_[JEANS:]1_[BELT:]1_[SUIT 3 PCS:]1_[SHOES:]1
Removing the matched parts (the bracketed bits above) results in:
1_3_1_1_1_1
Splitting by _ results in:
[1, 3, 1, 1, 1, 1]

Try this regex [^:\d+?].*?(?=:), it should do the trick
string[] list = Regex.Replace(test, @"[^:\d+?].*?(?=:)", string.Empty).Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
The regex matches everything preceding each colon (exclusive) with .*?(?=:) and replaces it with an empty string. The leading [^:\d+?] keeps a match from starting on a colon or a digit, so you end up with :1:3:1:1:1:1 before the split.

Related

Regex How to Match 2 fields

How would I capture both the filenames inside the quotes and the numbers following them as named captures (Regex / C#)?
Files("fileone.txt", 5969784, "file2.txt", 45345333)
For every occurrence in the string, I need to capture "fileone.txt" and the integer following it (a loop cycles through each pair).
I am trying to use this https://regex101.com/r/MwMzBo/1 but having issues matching without the '[' and ']'.
I need to be able to loop over each filename+size pair, moving on to the next.
Any help is appreciated!
UPDATE
string file = "Files(\"fileone.txt\", 5969784, \"file2.txt\", 45345333, \"file2.txt\", 45345333)";
var regex = new Regex(@"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");
var match = regex.Match(file);
var names = match.Groups["file"].Captures.Cast<Capture>();
var lengths = match.Groups["number"].Captures.Cast<Capture>();
var filelist = names.Zip(lengths, (f, n) => new { file = f.Value, length = long.Parse(n.Value) }).ToArray();
foreach (var item in filelist)
{
// Only returning 1 pair result, ignoring the rest
}
Reading match.Value to confirm what is being read. Only first pair is being picked up.
while (match.Success)
{
MessageBox.Show(match.Value);
match = match.NextMatch();
}
Now we are getting all results properly. I read that Regex.Match only returns the first match, which explains a lot.
You can use
(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)
See the regex demo
Details:
(?:\G(?!\A)\s*,\s*|\w+\() - end of the previous successful match and a comma enclosed with zero or more whitespaces, or a word and an opening ( char
(?:""(?<file>.*?)""|'(?<file>.*?)') - ", Group "file" capturing any zero or more chars other than a newline char as few as possible and then a ", or a ', Group "file" capturing any zero or more chars other than a newline char as few as possible and then a '
\s*,\s* - a comma enclosed with zero or more whitespaces
(?<number>\d+) - Group "number": one or more digits.
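The pattern above produces one match per filename/number pair, so the pairs can be walked with Regex.Matches instead of a single Regex.Match. A minimal sketch, assuming the input string from the question and a hypothetical tuple list named pairs:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string file = "Files(\"fileone.txt\", 5969784, \"file2.txt\", 45345333)";

// Same pattern as above, written as a C# verbatim string ("" is an escaped quote).
var regex = new Regex(@"(?:\G(?!\A)\s*,\s*|\w+\()(?:""(?<file>.*?)""|'(?<file>.*?)')\s*,\s*(?<number>\d+)");

// Each match holds exactly one pair: group "file" and group "number".
var pairs = new List<(string Name, long Length)>();
foreach (Match m in regex.Matches(file))
{
    pairs.Add((m.Groups["file"].Value, long.Parse(m.Groups["number"].Value)));
}

foreach (var (name, length) in pairs)
    Console.WriteLine($"{name}: {length}");
```

Reading Groups["file"].Captures on a single match only sees the captures made inside that one match, which is consistent with the question's code returning just the first pair.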
I like doing it in smaller pieces:
string input = "cov('Age', ['5','7','9'])";
string pattern1 = @"\((?'key'[^,]+),\s+\[(?'values'[^\]]+)";
Match match = Regex.Match(input, pattern1);
string key = match.Groups["key"].Value.Trim(new char[] {'\''});
string pattern2 = @"'(?'value'[^']+)'";
string values = match.Groups["values"].Value;
MatchCollection matches = Regex.Matches(values, pattern2);
int[] number = matches.Cast<Match>().Select(x => int.Parse(x.Value.Replace("'",string.Empty))).ToArray();

Regular expression not working in dotnet C# but works in online editors [duplicate]

I want to get a Substring out of a String.
The Substring I want is a sequence of numerical characters.
Input
"abcdefKD-0815xyz42ghijk";
"dag4ah424KD-42ab333k";
"BeverlyHills90210KD-433Nokia3310";
Generally it could be any String, but they all have one thing in common:
There is a part that starts with KD-
and ends with a number
Everything after the number to be gone.
In the examples above this number would be 0815, 42, 433 respectively. But it could be any number
Right now I have a Substring that contains all numerical characters after KD- but I would like to have only the 0815ish part of the string.
What i have so far
String toMakeSub = "abcdef21KD-0815xyz429569468949489694694689ghijk";
toMakeSub = toMakeSub.Substring(toMakeSub.IndexOf("KD-") + "KD-".Length);
String result = Regex.Replace(toMakeSub, "[^0-9]", "");
The Result is 0815429569468949489694694689 but I want only the 0815 (it could be any length though so cutting after four digits is not possible).
It's as easy as the following pattern:
(?<=KD-)\d+
The way to read this
(?<=subpattern) : Zero-width positive lookbehind assertion. Continues matching only if subpattern matches on the left.
\d : Matches any decimal digit.
+ : Matches previous element one or more times.
Example
var input = "abcdef21KD-0815xyz429569468949489694694689ghijk";
var regex = new Regex(@"(?<=KD-)\d+");
var match = regex.Match(input);
if (match.Success)
{
Console.WriteLine(match.Value);
}
input = "abcdef21KD-0815xyz429569468949489694694689ghijk, KD-234dsfsdfdsf";
// or to match multiple times
var matches = regex.Matches(input);
foreach (var matchValue in matches)
{
Console.WriteLine(matchValue);
}

Regex for obtaining numeric values within a string in C#

I have the following example strings:
TAR:100
TAR:100|LED:50
TAR:30|LED:30|ASO:40
I need a regex that obtains the numeric values after the colon, which are always in the range 0 to 100 inclusive.
The result after the regex is applied to any of the above strings should be:
for TAR:100 the result should be 100
for TAR:100|LED:50 the result should be the array [100,50]
for TAR:30|LED:30|ASO:40 the result should be the array [30,30,40]
The word before the colon can have any length and both upper and lowercase.
I have tried with the following but it doesn't yield the result I need:
String text = "TAR:100|LED:50";
String pattern = "\\|?([a-zA-Z]{1,}:)";
string[] values= Regex.Split(text, pattern);
The regex should work whether the string is TAR:100 or TAR:100|LED:50 if possible.
You added () which makes the text parts that you want to remove also be returned.
Below is my solution, with a slightly changed regex.
Note that we need to start looping the values at i = 1, which is purely caused by using Split on a string that starts with a split-sequence; it has nothing to do with the Regex itself.
Explanation: if we used a simpler str.Split to split by a separator "#", then "a#b#c" would produce ["a", "b", "c"], whereas "#b#c" would produce ["", "b", "c"]. In general, and by definition: if Split removes N separator sequences from the string, then the result is N+1 strings. All the strings that we deal with here are of the form "#b#c", so there is always an empty first result.
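The N+1 rule can be checked with a plain string.Split, independent of any regex. A small sketch using the two example strings from the explanation (the variable names are illustrative):

```csharp
using System;

string[] middle = "a#b#c".Split('#');   // separator only between items
string[] leading = "#b#c".Split('#');   // string starts with the separator

Console.WriteLine(middle.Length);        // 3
Console.WriteLine(leading.Length);       // 3
Console.WriteLine(leading[0].Length);    // 0, the empty first entry
```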
Accepting that as a given fact, the results are usable by starting from i = 1:
var pattern = @"\|?[a-zA-Z]+:";
var testCases = new[] { "TAR:100", "TAR:100|LED:50", "TAR:30|LED:30|ASO:40" };
foreach (var text in testCases)
{
string[] values = Regex.Split(text, pattern);
for (var i = 1; i < values.Length; i++)
Console.WriteLine(values[i]);
Console.WriteLine("------------");
}
Output:
100
------------
100
50
------------
30
30
40
------------
Working DotNetFiddle: https://dotnetfiddle.net/i9kH8n
In .NET you can use the Group.Captures and use the same name for 2 capture groups and match the format of the string.
\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b
Regex demo | C# demo
string[] strings = {
"TAR:100",
"TAR:100|LED:50",
"TAR:30|LED:30|ASO:40"
};
string pattern = @"\b[a-zA-Z]+:(?<numbers>[0-9]+)(?:\|[a-zA-Z]+:(?<numbers>[0-9]+))*\b";
foreach (String str in strings)
{
Match match = Regex.Match(str, pattern);
if (match.Success)
{
string[] result = match.Groups["numbers"].Captures.Select(c => c.Value).ToArray();
Console.WriteLine(String.Join(',', result));
}
}
Output
100
100,50
30,30,40
Another option could be making use of the \G anchor and have the value in capture group 1.
\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)
Regex demo | C# demo
string[] strings = {
"TAR:100",
"TAR:100|LED:50",
"TAR:30|LED:30|ASO:40"
};
string pattern = @"\b(?:[a-zA-Z]+:|\G(?!^))([0-9]+)(?:\||$)";
foreach (String str in strings)
{
MatchCollection matches = Regex.Matches(str, pattern);
string[] result = matches.Select(m => m.Groups[1].Value).ToArray();
Console.WriteLine(String.Join(',', result));
}
Output
100
100,50
30,30,40

Parsing into dictionary with regex as separator for splitting

As I said in the title, I think the idea would be to split it by something like this: \d+?=.*?\d= but I'm not quite sure... Any idea how best to parse this string:
1=Some dummy sentence
2=Some other sentence 3=Third sentence which can be in the same line
4=Forth sentence
some text which shouldn't be captured and spplitted
And what I'm hoping to get from this is a Dictionary which will have this number for key, and this string in the value, so for example:
1, "Some dummy sentence"
2, "Some other sentence"
3, "Third sentence which can be in the same line"
4, "Forth sentence"
Method to parse text into dictionary:
public static Dictionary<int, string> GetValuesToDictionary(string text)
{
var pattern = @"(\d+)=(.*?)((?=\d=)|\n)";
//If spaces between digit and equal sign are possible then (\d+)\s*=\s*(.*?)((?=\d\s?=)|\n)
var regex = new Regex(pattern);
var pairs = new Dictionary<int, string>();
var matches = regex.Matches(text);
foreach (Match match in matches)
{
var key = int.Parse(match.Groups[1].Value);
var value = match.Groups[2].Value;
if (!pairs.ContainsKey(key))
{
pairs.Add(key, value);
}
//pairs.Add(key, value);
}
return pairs;
}
In this case I check whether the key already exists and skip it if so; decide for yourself whether you need this check.
Digit groups that are not followed by an equals sign are kept as part of the value.
What about this: https://regex101.com/r/6ED8Om/2
\n?(\d+)=(.*?)(?= *\d|\n)
\n?(\d+)= matches optional new line character followed by digits and equal sign
(.*?) matches following text
(?= *\d|\n) matches any number of spaces followed by a digit, or a new line character. The spaces prevent #2 from including the two spaces between its end and #3.
EDIT: Use the other answer's code with this regex to save your values to a dictionary. Group 1 matches the digits, group 2 matches the text.
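Putting the two answers together might look like the sketch below. It assumes "\n" line endings and sample text modeled on the question; the variable names are illustrative. Note that with this pattern any digit inside a sentence ends the value early, since the lookahead stops at the first digit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample text modeled on the question; "\n" line endings are an assumption.
string text = "1=Some dummy sentence\n" +
              "2=Some other sentence  3=Third sentence which can be in the same line\n" +
              "4=Forth sentence\n" +
              "some text which shouldn't be captured and split";

var result = new Dictionary<int, string>();
foreach (Match m in Regex.Matches(text, @"\n?(\d+)=(.*?)(?= *\d|\n)"))
{
    // Group 1 is the numeric key, group 2 the sentence.
    // The indexer overwrites on a duplicate key instead of throwing.
    result[int.Parse(m.Groups[1].Value)] = m.Groups[2].Value;
}

foreach (var kv in result)
    Console.WriteLine($"{kv.Key}, \"{kv.Value}\"");
```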

Replace Single WhiteSpace without Replacing Multiple WhiteSpace

I have a string in the format:
abc def ghi    xyz
I would like to end with it in format:
abcdefghi xyz
What is the best way to do this? In this particular case, I could just strip off the last three characters, remove spaces, and then add them back at the end, but this won't work for cases in which the multiple spaces are in the middle of the string.
In Short, I want to remove all single whitespaces, and then replace all multiple whitespaces with a single. Each of those steps is easy enough by itself, but combining them seems a bit less straightforward.
I'm willing to use regular expressions, but I would prefer not to.
This approach uses regular expressions but hopefully in a way that's still fairly readable. First, split your input string on multiple spaces
var pattern = @"  +"; // two literal spaces, then +: match two or more spaces
var groups = Regex.Split(input, pattern);
Next, remove the (individual) spaces from each token:
var tokens = groups.Select(group => group.Replace(" ", String.Empty));
Finally, join your tokens with single spaces
var result = String.Join(' ', tokens.ToArray());
This example uses a literal space character rather than 'whitespace' (which includes tabs, linefeeds, etc.) - substitute \s for ' ' if you need to split on multiple whitespace characters rather than actual spaces.
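For the whitespace variant mentioned above, the three steps could be sketched as follows. The \s{2,} and \s patterns and the sample input containing tabs are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Single space, single tab, then a run of four mixed whitespace characters.
string input = "abc def\tghi \t  xyz";

// Split on runs of two or more whitespace characters.
string[] groups = Regex.Split(input, @"\s{2,}");

// Remove the remaining single whitespace characters from each piece.
var tokens = groups.Select(g => Regex.Replace(g, @"\s", string.Empty));

// Join the pieces with single spaces.
string result = string.Join(" ", tokens);
Console.WriteLine(result); // abcdefghi xyz
```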
Well, Regular Expressions would probably be the fastest here, but you could implement some algorithm that uses a lookahead for single spaces and then replaces multiple spaces in a loop:
// Remove each space that is followed by a non-space character
for (int i = 0; i < sourceString.Length; i++)
{
    if (sourceString[i] == ' ')
    {
        if (i < sourceString.Length - 1 && sourceString[i + 1] != ' ')
            sourceString = sourceString.Remove(i, 1);
    }
}
// Collapse the remaining runs of spaces
while (sourceString.Contains("  ")) // Two spaces here!
    sourceString = sourceString.Replace("  ", " ");
But hey, that code is pretty ugly and slow compared to a proper regular expression...
For a Non-REGEX option you can use:
string str = "abc def ghi    xyz";
var result = str.Split(); //This will remove single spaces from the result
StringBuilder sb = new StringBuilder();
bool ifMultipleSpacesFound = false;
for (int i = 0; i < result.Length;i++)
{
if (!String.IsNullOrWhiteSpace(result[i]))
{
sb.Append(result[i]);
ifMultipleSpacesFound = false;
}
else
{
if (!ifMultipleSpacesFound)
{
ifMultipleSpacesFound = true;
sb.Append(" ");
}
}
}
string output = sb.ToString();
The output would be:
output = "abcdefghi xyz"
Here's an approach which uses some fairly subtle logic:
public static string RemoveUnwantedSpaces(string text)
{
var sb = new StringBuilder();
char lhs = '\0';
char mid = '\0';
foreach (char rhs in text)
{
if (rhs != ' ' || (mid == ' ' && lhs != ' '))
sb.Append(rhs);
lhs = mid;
mid = rhs;
}
return sb.ToString().Trim();
}
How it works:
We will examine each possible three-character subsequence linearly across the string (in a kind of three-character sliding window). These three characters will be represented, in order, by the variables lhs, mid and rhs.
For each rhs character in the string:
If it's not a space we should output it.
If it is a space, and the previous character was also space but the one before that isn't, then this is the second in a sequence of at least two spaces, and therefore we should output one space.
Otherwise, don't output a space because this is either the first or the third (or later) space in a sequence of two or more spaces and in either case we don't want to output a space: If this happens to be the first in a sequence of two or more spaces, a space will be output when the second space comes along. If this is the third or later, we've already output a space for it.
The subtlety here is that I've avoided special casing the beginning of the sequence by initialising the lhs and mid variables with non-space characters. It doesn't matter what those values are, as long as they are not spaces, but I made them \0 to indicate that they are special values.
On second thought, here is a one-line regex solution:
Regex.Replace("abc def ghi    xyz", "( )( )*([^ ])", "$2$3")
The result of this is "abcdefghi xyz".
ORIGINAL ANSWER:
Two lines of code regex solution:
var tmp = Regex.Replace("abc def ghi    xyz", "( )([^ ])", "$2");
tmp is "abcdefghi   xyz"
then:
var result = Regex.Replace(tmp, "( )+", " ");
result is "abcdefghi xyz"
Explanation:
The first line of code removes single spaces and removes one space from each run of multiple spaces (so there are 3 spaces in tmp between the letters i and x).
The second line just replaces multiple spaces with one.
In-depth explanation of first line:
We match the input string against a regex that matches one space and the non-space character next to it. We also put these two characters in separate groups (we use ( ) for anonymous grouping).
So for the "abc def ghi    xyz" string we have these matches and groups:
match: " d" group1: " " group2: "d"
match: " g" group1: " " group2: "g"
match: " x" group1: " " group2: "x"
We are using substitution syntax for Regex.Replace method to replace match with the content of second group (which is non-whitespace character)
