How to get a part of string matching a regular expression (c#)?

I have some strings like " 8 2 / 5", "5 5/ 7" and so on, containing math fractions. I need to convert them into doubles. So, how can I get these parts
" 8 "
"2 / 5",
into variables?
var src = " 8 2 / 5";
string base = getBase(src);
string frac = getFrac(src);
// Now base must contain "8" and frac "2/5";
I don't know whether something that does this "fraction-to-double" conversion already exists (if so, please share a link), but if I could get these parts into variables I would at least be able to perform the division and the addition.

The simplest approach would be a regex like
(\d) (\d) \/ (\d)
which works only with single digits and only when the spaces are placed exactly as shown. The code to calculate the result could look like
string src = "8 2 / 5";
Regex rx = new Regex(@"(\d) (\d) \/ (\d)");
Match m = rx.Match(src);
if (m.Success)
{
int bs = Convert.ToInt32(m.Groups[1].Value);
int f1 = Convert.ToInt32(m.Groups[2].Value);
int f2 = Convert.ToInt32(m.Groups[3].Value);
double result = bs + (double)f1 / f2;
}
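To allow multiple digits and a variable number of spaces between the numbers, you could improve the regex like
(\d+)\s+(\d+)\s*\/\s*(\d+)
Here \d+ requires at least one digit in each group (an empty capture would make Convert.ToInt32 throw), and \s* around the slash also accepts inputs such as "5 5/ 7".
To test regexes you can use online tools such as regex101.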
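As a rough sketch of how a multi-digit pattern could feed the calculation, the helper below returns null when the input does not look like a mixed fraction. The name ParseMixedFraction is hypothetical, and the pattern uses [0-9] instead of \d (an assumption on my part) so that only ASCII digits reach int.TryParse.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: turns "8 2 / 5" into 8 + 2/5, or null if the text does not match.
double? ParseMixedFraction(string text)
{
    // whole part, whitespace, numerator, optional whitespace around '/', denominator
    Match m = Regex.Match(text, @"([0-9]+)\s+([0-9]+)\s*/\s*([0-9]+)");
    if (!m.Success)
        return null;

    // TryParse guards against digit runs that are too large for an int.
    if (!int.TryParse(m.Groups[1].Value, out int whole) ||
        !int.TryParse(m.Groups[2].Value, out int numerator) ||
        !int.TryParse(m.Groups[3].Value, out int denominator) ||
        denominator == 0)
        return null;

    // Cast before dividing so the fraction is not truncated by integer division.
    return whole + (double)numerator / denominator;
}

Console.WriteLine(ParseMixedFraction(" 8 2 / 5"));
Console.WriteLine(ParseMixedFraction("5 5/ 7"));
```

Returning a nullable double keeps the "no match" case separate from a legitimate result of 0; throwing a FormatException would be an equally reasonable choice.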

Simple operations: the regex was tested on RegexStorm.
string src = " 8 2 / 5";
// whole number, space, number with optional decimals, space, operator, space, number with optional decimals
string pattern = @"^\s*[1-9][0-9]* [1-9][0-9]*(\.[0-9]+)? [-/+*] [1-9][0-9]*(\.[0-9]+)?\s*$";
if (!Regex.IsMatch(src, pattern))
return;
// Split the input itself and skip the empty entry produced by the leading space.
string[] parts = src.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
float baseValue = float.Parse(parts[0], CultureInfo.InvariantCulture);
float dividend = float.Parse(parts[1], CultureInfo.InvariantCulture);
string op = parts[2]; // the operator is text, so it must not be parsed as a number
float divisor = float.Parse(parts[3], CultureInfo.InvariantCulture);
//TODO your operation here

Related

Extracting integer ranges separated with hyphen

I try to filter some strings I streamed for some useful information in C#.
I got two possible string structures:
string examplestring1 = "from - to (mm) no. 1\r\n\r\nna 570 - 590\r\n60 18.12.20\r\nna 5390 - 5410\r\n60 18.12.20\r\nna 11380 - 11390 60 18.12.20\r\nPage 1/1";
string examplestring2 = "e ne 570 - 590 ne 5390 - 5410 ne 11380 - 11390 e";
I'd like to get an array or a List of strings in the format of "xxx - xxx". Like:
string[] example = new string[]{"570 - 590","5390 - 5410","11380 - 11390"};
I tried to use Regex:
List<string> numbers = new List<string>();
numbers.AddRange(Regex.Split(examplestring2, @"\D+"));
At least I get a list containing only the numbers. But that doesn't work for examplestring1, since there is a date in it.
I also tried to play around with the regex pattern, but things like the following do not work.
Regex.Split(examplestring1, @"\D+" + " - " + @"\D+");
I'd be grateful for a solution or at least some hint how to solve that matter.

You can use
var results = Regex.Matches(text, @"\d+\s*-\s*\d+").Cast<Match>().Select(x => x.Value);
See the regex demo. If there must be a single regular space on both sides of the -, you can use the \d+ - \d+ regex.
If you want to match any dash or hyphen character, you can use [\p{Pd}\xAD] instead of -.
Note that \d in .NET matches any Unicode digit. To match only ASCII digits, use the RegexOptions.ECMAScript option: Regex.Matches(text, @"\d+\s*-\s*\d+", RegexOptions.ECMAScript).
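For illustration, here is a small self-contained sketch that runs this kind of pattern against both example strings from the question. ExtractRanges is a hypothetical name, and [0-9] is used in place of \d as one way (my assumption, not part of the answer) to restrict matching to ASCII digits without RegexOptions.ECMAScript.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every "number - number" fragment found in the text.
string[] ExtractRanges(string text) =>
    Regex.Matches(text, @"[0-9]+\s*-\s*[0-9]+")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string examplestring1 = "from - to (mm) no. 1\r\n\r\nna 570 - 590\r\n60 18.12.20\r\nna 5390 - 5410\r\n60 18.12.20\r\nna 11380 - 11390 60 18.12.20\r\nPage 1/1";
string examplestring2 = "e ne 570 - 590 ne 5390 - 5410 ne 11380 - 11390 e";

// The dates (18.12.20) are skipped because they contain dots rather than a hyphen.
foreach (string range in ExtractRanges(examplestring1))
    Console.WriteLine(range);
foreach (string range in ExtractRanges(examplestring2))
    Console.WriteLine(range);
```

Using Regex.Matches instead of Regex.Split means only the wanted fragments are collected, so nothing else in the input has to be described by the pattern.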

Simplify regex code in C#: Add a space between a digit/decimal and unit

I have a regex code written in C# that basically adds a space between a number and a unit with some exceptions:
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+", @"$1");
dosage_value = Regex.Replace(dosage_value, @"(\d)%\s+", @"$1%");
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d+)?)", @"$1 ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+%", @"$1% ");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+:", @"$1:");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+e", @"$1e");
dosage_value = Regex.Replace(dosage_value, @"(\d)\s+E", @"$1E");
Example:
10ANYUNIT
10:something
10 : something
10 %
40 e-5
40 E-05
should become
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Exceptions are: %, E, e and :.
I have tried, but since my regex knowledge is not top-notch, would someone be able to help me reduce this code with same expected results?
Thank you!

For your example data, you might use 2 capture groups, where the second group is inside an optional part.
In the replacement callback, check whether capture group 2 matched. If it did, use it in the replacement; otherwise add a space.
(\d+(?:\.\d+)?)(?:\s*([%:eE]))?
( Capture group 1
\d+(?:\.\d+)? match 1+ digits with an optional decimal part
) Close group 1
(?: Non capture group to match as a whole
\s*([%:eE]) Match optional whitespace chars, and capture 1 of % : e E in group 2
)? Close non capture group and make it optional
.NET regex demo
string[] strings = new string[]
{
"10ANYUNIT",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
string pattern = @"(\d+(?:\.\d+)?)(?:\s*([%:eE]))?";
var result = strings.Select(s =>
Regex.Replace(
s, pattern, m =>
m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " ")
)
);
Array.ForEach(result.ToArray(), Console.WriteLine);
Output
10 ANYUNIT
10:something
10: something
10%
40e-5
40E-05
Because \d in .NET can also match digits from other scripts, \s can also match a newline, and the pattern could start matching in the middle of a word, a slightly more precise pattern is:
\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?
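A minimal sketch of how the more precise pattern could be wrapped for reuse follows. NormalizeDosage is a hypothetical name. One detail worth noting: a number at the very end of the string (such as the exponent digits in 40 e-5) also gets a space appended by the callback, so this sketch trims trailing whitespace; that TrimEnd call is my addition, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built around the stricter pattern shown above.
string NormalizeDosage(string value) =>
    Regex.Replace(
        value,
        @"\b([0-9]+(?:\.[0-9]+)?)(?:[\p{Zs}\t]*([%:eE]))?",
        // Group 2 present: glue the symbol to the number. Otherwise: add one space.
        m => m.Groups[1].Value + (m.Groups[2].Success ? m.Groups[2].Value : " "))
    .TrimEnd(); // assumption: a trailing space after a final number is unwanted

string[] samples = { "10ANYUNIT", "10:something", "10 : something", "10 %", "40 e-5", "40 E-05" };
foreach (string sample in samples)
    Console.WriteLine($"'{sample}' -> '{NormalizeDosage(sample)}'");
```

Input that already has a space after the number (for example 10 ANYUNIT) would end up with two spaces, so a real implementation may want to collapse repeated whitespace as a final step.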

I think you need something like this:
dosage_value = Regex.Replace(dosage_value, @"(\d+(\.\d*)?)\s*((E|e|%|:)+)\s*", @"$1$3 ");
Group 1 - (\d+(\.\d*)?)
Any number like 123 or 1241.23
Group 3 - ((E|e|%|:)+)
One or more of the special symbols E e % :
Group 1 and Group 3 can be separated by any amount of whitespace.
If it's not working the way you want, please provide some samples to test.

For me it's too complex to be handled by just one regex. I suggest splitting it into separate checks. See the code example below: I used four different regexes; the first is described in detail, and the rest can be deduced from that explanation.
using System.Text.RegularExpressions;
var testStrings = new string[]
{
"10mg",
"10:something",
"10 : something",
"10 %",
"40 e-5",
"40 E-05",
};
foreach (var testString in testStrings)
{
Console.WriteLine($"Input: '{testString}', parsed: '{RegexReplace(testString)}'");
}
string RegexReplace(string input)
{
// First look for exponential notation.
// Pattern is: match zero or more whitespace characters \s*
// Then match one or more digits and store them in the first capturing group (\d+)
// Then match one or more whitespace characters \s+
// Then match the exponent part ([eE][-+]?\d+) and store it in the second capturing group.
// It matches a lower or uppercase 'e' with an optional (due to the ? operator) dash/plus sign and one or more digits.
// Then match zero or more whitespace characters.
var expForMatch = Regex.Match(input, @"\s*(\d+)\s+([eE][-+]?\d+)\s*");
if(expForMatch.Success)
{
return $"{expForMatch.Groups[1].Value}{expForMatch.Groups[2].Value}";
}
var matchWithColon = Regex.Match(input, @"\s*(\d+)\s*:\s*(\w+)");
if (matchWithColon.Success)
{
return $"{matchWithColon.Groups[1].Value}:{matchWithColon.Groups[2].Value}";
}
var matchWithPercent = Regex.Match(input, @"\s*(\d+)\s*%");
if (matchWithPercent.Success)
{
return $"{matchWithPercent.Groups[1].Value}%";
}
var matchWithUnit = Regex.Match(input, @"\s*(\d+)\s*(\w+)");
if (matchWithUnit.Success)
{
return $"{matchWithUnit.Groups[1].Value} {matchWithUnit.Groups[2].Value}";
}
return input;
}
Output is:
Input: '10mg', parsed: '10 mg'
Input: '10:something', parsed: '10:something'
Input: '10 : something', parsed: '10:something'
Input: '10 %', parsed: '10%'
Input: '40 e-5', parsed: '40e-5'
Input: '40 E-05', parsed: '40E-05'

Sort list by a price, calculated by substrings from item inside list c#

I have a list that looks like this:
Apple - 67 $ - 345 PIECES - 19:03
Banana - 45 $ - 341 PIECES - 12:02
Monkey - 34 $ - 634 PIECES - 16:01
And I want to order that list by the result of (money * amount) ordered, sort of "highest order ranking"
finalResultOfTradeFiltering = finalResultOfTradeFiltering.OrderBy(x => Convert.ToDecimal(FindTextBetween(x,"-","$").Replace(" ", String.Empty)) * Convert.ToInt32(FindTextBetween(x, "-", "PIECES").Replace(" ", String.Empty))).ToList();
public string FindTextBetween(string text, string left, string right)
{
// TODO: Validate input arguments
int beginIndex = text.IndexOf(left); // find occurence of left delimiter
if (beginIndex == -1)
return string.Empty; // or throw exception?
beginIndex += left.Length;
int endIndex = text.IndexOf(right, beginIndex); // find occurence of right delimiter
if (endIndex == -1)
return string.Empty; // or throw exception?
return text.Substring(beginIndex, endIndex - beginIndex).Trim();
}
However, my code keeps crashing with an error stating that the format is incorrect.
Any clue anyone?

First do this:
public class Trade
{
public string Product {get;set;}
public decimal Price {get;set;}
public int Quantity {get;set;}
public string Time {get;set;} //Make this a DateTime or new TimeOnly later
public string OriginalLine {get;set;}
public static Trade Parse(string input)
{
var result = new Trade();
result.OriginalLine = input;
//example line
//Apple - 67 $ - 345 PIECES - 19:03
var exp = new Regex(@" - (\d+) [$] - (\d+) PIECES - (\d{1,2}:\d{2})");
var groups = exp.Match(input).Groups;
result.Price = Convert.ToDecimal(groups[1].Value);
result.Quantity = Convert.ToInt32(groups[2].Value);
result.Time = groups[3].Value;
int EndMarker = input.IndexOf(" - ");
result.Product = input.Substring(0, EndMarker).Trim();
return result;
}
}
Then use the type like so:
var result = finalResultOfTradeFiltering.
Select(t => Trade.Parse(t)).
OrderByDescending(t => t.Price * t.Quantity).
Select(t => t.OriginalLine);
Note the lack of a ToList() call. Sticking with IEnumerable as much as possible, rather than converting to a List again after each step, will save on RAM and sometimes CPU, making the code much faster and more efficient. Don't convert to a List until you really need to, which is likely much later than you would think.
Even better if you are able to convert the strings to objects much earlier in your process, and not return them back to strings until after all the other processing is finished.

You might use a pattern to capture the digits and then do the calculation inside Sort.
(\d+) \$ - (\d+) PIECES
(\d+) \$ Capture group 1 for the money, match 1+ digits, a space and $
- Match literally
(\d+) PIECES Capture group 2 for the amount, match 1+ digits and PIECES
Assuming the pattern matches for all the given strings:
List<string> finalResultOfTradeFiltering = new List<string>() {
"Apple - 67 $ - 345 PIECES - 19:03",
"Banana - 45 $ - 341 PIECES - 12:02",
"Monkey - 34 $ - 634 PIECES - 16:01"
};
var regex = @"(\d+) \$ - (\d+) PIECES";
finalResultOfTradeFiltering.Sort((a, b) => {
var matchA = Regex.Match(a, regex);
var matchB = Regex.Match(b, regex);
var valueA = int.Parse(matchA.Groups[1].Value) * int.Parse(matchA.Groups[2].Value);
var valueB = int.Parse(matchB.Groups[1].Value) * int.Parse(matchB.Groups[2].Value);
return valueB.CompareTo(valueA);
});
finalResultOfTradeFiltering.ForEach(s => Console.WriteLine(s));
Output
Apple - 67 $ - 345 PIECES - 19:03
Monkey - 34 $ - 634 PIECES - 16:01
Banana - 45 $ - 341 PIECES - 12:02

How to Regex Split an expression to accept decimal number?

I am trying to make a calculator. I am using Regex.Split() to get the number input from the expression. It works well with non-decimal digits but now I am finding a way to get the decimal number input as well.
string mExp = "1.50 + 2.50";
string[] strNum = (Regex.Split(mExp, @"[\D+]"));
num1 = double.Parse(strNum[0]);
num2 = double.Parse(strNum[1]);

You can change your regex to split on some number of spaces followed by an arithmetic operator, followed by spaces:
string[] strNum = (Regex.Split(mExp, @"\s*[+/*-]\s*"));
Console.WriteLine(string.Join("\n", strNum));
Output:
1.50
2.50
Demo on rextester
To deal with negative numbers, you have to make the regex a bit more sophisticated and add a lookbehind for a digit and a lookahead for either a digit or a -:
string mExp = "-1.50 + 2.50 -3.0 + -1";
string[] strNum = (Regex.Split(mExp, @"(?<=\d)\s*[+*/-]\s*(?=-|\d)"));
Console.WriteLine(string.Join("\n", strNum));
Output:
-1.50
2.50
3.0
-1
Demo on rextester

You can use the following regex for splitting for both non-decimal and decimal numbers:
[^\d.]+
Regex Demo
string[] strNum = (Regex.Split(mExp, @"[^\d.]+"));
Essentially, this matches one or more consecutive characters that are neither a digit nor a dot, and splits on each such match.
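To show the split feeding straight into number parsing, here is a short sketch. ExtractNumbers is a hypothetical name; the sketch uses [^0-9.] (ASCII digits only) and CultureInfo.InvariantCulture, both of which are my assumptions, so that "1.50" parses the same way regardless of the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits on anything that is not a digit or a dot, then parses the pieces.
double[] ExtractNumbers(string expression) =>
    Regex.Split(expression, @"[^0-9.]+")
         .Where(piece => piece.Length > 0)   // Split yields "" when the text starts or ends with a separator
         .Select(piece => double.Parse(piece, CultureInfo.InvariantCulture))
         .ToArray();

double[] numbers = ExtractNumbers("1.50 + 2.50");
Console.WriteLine(numbers[0] + numbers[1]);
```

Because the minus sign is treated as a separator, this approach drops the sign of negative operands; an input such as -1.50 would need a lookaround-based split instead.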

Regex masking of words that contain a digit

Trying to come up with a 'simple' regex to mask bits of text that look like they might contain account numbers.
In plain English:
any word containing a digit (or a train of such words) should be matched
leave the last 4 digits intact
replace all previous part of the matched string with four X's (xxxx)
So far
I'm using the following:
[\-0-9 ]+(?<m1>[\-0-9]{4})
replacing with
xxxx${m1}
But this misses on the last few samples below
sample data:
123456789
a123b456
a1234b5678
a1234 b5678
111 22 3333
this is a a1234 b5678 test string
Actual results
xxxx6789
a123b456
a1234b5678
a1234 b5678
xxxx3333
this is a a1234 b5678 test string
Expected results
xxxx6789
xxxxb456
xxxx5678
xxxx5678
xxxx3333
this is a xxxx5678 test string
Is such an arrangement possible with a regex replace?
I think I'm going to need some greediness and lookahead functionality, but I have zero experience in those areas.

This works for your example:
var result = Regex.Replace(
input,
@"(?<!\b\w*\d\w*)(?<m1>\s?\b\w*\d\w*)+",
m => "xxxx" + m.Value.Substring(Math.Max(0, m.Value.Length - 4)));
If you have a value like 111 2233 33, it will print xxxx3 33. If you want this to be free from spaces, you could turn the lambda into a multi-line statement that removes whitespace from the value.
To explain the regex pattern a bit, it's got a negative lookbehind, so it makes sure that the word behind it does not have a digit in it (with optional word characters around the digit). Then it's got the m1 portion, which looks for words with digits in them. The last four characters of this are grabbed via some C# code after the regex pattern resolves the rest.
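The whitespace-removing variant mentioned above could look roughly like the sketch below. MaskDigitWords is a hypothetical name, and the pattern is a simplified one of my own rather than the one from this answer: it starts at a word boundary instead of with an optional \s?, so, as far as I can tell, the space in front of the first masked word stays in the output.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: masks each run of digit-containing words, keeping the last four characters.
string MaskDigitWords(string text) =>
    Regex.Replace(
        text,
        // one word containing a digit, followed by any number of further such words
        @"\b\w*\d\w*(?:\s+\w*\d\w*)*",
        m =>
        {
            // Drop whitespace inside the run so "111 2233 33" keeps "3333" rather than "3 33".
            string compact = Regex.Replace(m.Value, @"\s+", "");
            return "xxxx" + compact.Substring(Math.Max(0, compact.Length - 4));
        });

string[] samples =
{
    "123456789", "a123b456", "a1234b5678", "a1234 b5678",
    "111 22 3333", "this is a a1234 b5678 test string",
};
foreach (string sample in samples)
    Console.WriteLine(MaskDigitWords(sample));
```

Doing the "last four characters" step in the callback keeps the pattern itself short; the regex only has to find the runs, not slice them.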

I don't think that regex is the best way to solve this problem, and that's why I am posting this answer. For situations this complex, building the corresponding regex is too difficult and, what is worse, its clarity and adaptability are much lower than those of a longer-code approach.
The code below delivers the functionality you are after; it is clear enough and can be easily extended.
string input = "this is a a1234 b5678 test string";
string output = "";
string[] temp = input.Trim().Split(' ');
bool previousNum = false;
string tempOutput = "";
foreach (string word in temp)
{
if (word.ToCharArray().Where(x => char.IsDigit(x)).Count() > 0)
{
previousNum = true;
tempOutput = tempOutput + word;
}
else
{
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
tempOutput = ""; // reset so that a later run of number words starts from scratch
previousNum = false;
}
output = output + " " + word;
}
}
if (previousNum)
{
if (tempOutput.Length >= 4) tempOutput = "xxxx" + tempOutput.Substring(tempOutput.Length - 4, 4);
output = output + " " + tempOutput;
previousNum = false;
}

Have you tried this:
.*(?<m1>[\d]{4})(?<m2>.*)
with replacement
xxxx${m1}${m2}
This produces
xxxx6789
xxxx5678
xxxx5678
xxxx3333
xxxx5678 test string
You are not going to get 'a123b456' to match ... until 'b' becomes a number. ;-)

Here is my really quick attempt:
(\s|^)([a-z]*\d+[a-z,0-9]+\s)+
This will select all of those test cases. Now as for C# code, you'll need to check each match to see if there is a space at the beginning or end of the match sequence (e.g., the last example will have the space before and after selected)
Here is the C# code to do the replace:
var redacted = Regex.Replace(record, @"(\s|^)([a-z]*\d+[a-z,0-9]+\s)+",
match => "xxxx" /*new String("x",match.Value.Length - 4)*/ +
match.Value.Substring(Math.Max(0, match.Value.Length - 4)));
