Find integer in string using C# and divide it - c#

I am trying to find the integers in a .txt file that I am reading in C# and divide them, but nothing I have tried has worked. Any tips?
The .txt file contains values like this:
queue1 6000
queue2 54888
queue3 1
but they change every second
expected output
queue1 0.6
queue2 5.4
queue3 0.0001
code
int value;
string text = System.IO.File.ReadAllText(@"N:\doc\mytext.txt");
if (int.TryParse(text, out value))
{
value/10000;
}
Console.ForegroundColor = ConsoleColor.Red;
System.Console.WriteLine(text);
thanks

There are several things your code needs to do:
1. Read the input
2. Find the target values in the input
3. Convert the target values into the desired result values
4. Replace the target values in the input string with the result values
5. Output the result
For this answer, I am assuming you are happy with your implementation of steps 1 and 5, so I'll only address steps 2, 3, and 4.
Step 2: find the target values
To find the target values in your input, you must first define what they are. According to your question and subsequent comments, your input string looks like this, where the target values are the standalone numbers (6000, 54888, and 1):
queue1 6000, queue2 54888, queue3 1
These are integer values, which are embedded in your text as discrete words.
The easiest way to find them is to use a regular expression. One thing that makes your case tricky is the fact that you have numbers embedded in your text that you do not want (e.g. in "queue1"). Fortunately, .NET regular expressions have a couple shortcuts that make it easy to write the necessary expression:
var re = new Regex(@"\b\d+\b");
\b matches a word boundary
\d matches any decimal digit
+ matches one or more occurrences of the preceding element
So this regular expression will match distinct integers like "6000" while ignoring numbers embedded in other words like "queue1".
Step 3: convert to desired result
This step has a couple of sub-steps:
Parse the string into a number
Calculate the desired result number
Format the desired result number
To parse the string into a number, you are using int.TryParse, which is one way to do it. Since the regular expression will find only valid integers, we can use the Parse method instead.
Your calculation is to divide by 10000, and you want the result to be a floating point number. If you use 10000 as your literal, the result will be an integer, so you need to do something to ensure that the result is floating point. An easy way is to use 10000.0 as your literal, like so:
int value = 6000;
var result = value / 10000.0; // typeof(result) == double
There are several options to format your output. The simplest starting point is to use the ToString method. The default behavior is often good enough, but if you want to specify the format, you can use the variant that takes a format string.
So the resulting code to parse, calculate, and format is like so:
(Int64.Parse(value) / 10000.0).ToString()
Step 4: replace the target values in the input
Since regular expressions are useful for finding things, they are also used to replace things. In your case, the replacement requires some logic that cannot be expressed in a plain replacement pattern (i.e., the conversion in step 3). The .NET regular expression Replace method provides an overload with a MatchEvaluator parameter for just this scenario.
Each match that is found by the regular expression is passed to the match evaluator, which is responsible for returning the string to be used as the replacement. To make things simple, you can use a lambda expression to supply the match evaluator.
So when you put it all together, you get something like this:
string text = @"queue1 6000, queue2 54888, queue3 1";
var re = new Regex(@"\b\d+\b");
string output = re.Replace(text, m => (Int64.Parse(m.Value) / 10000.0).ToString());
// output == "queue1 0.6, queue2 5.4888, queue3 0.0001"
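For a multi-line file like the one in the question, the same steps could be wrapped up roughly as in the sketch below. The names QueueScaler and ScaleIntegers are made up for illustration, and the use of CultureInfo.InvariantCulture is an assumption on my part: it keeps the decimal separator a dot regardless of the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class QueueScaler
{
    // Matches standalone integers such as "6000", but not the digit in "queue1".
    private static readonly Regex StandaloneInteger = new Regex(@"\b\d+\b");

    // Hypothetical helper: divides every standalone integer in the text by 10000
    // and writes the result back in place of the original number.
    public static string ScaleIntegers(string text)
    {
        return StandaloneInteger.Replace(
            text,
            m => (long.Parse(m.Value, CultureInfo.InvariantCulture) / 10000.0)
                .ToString(CultureInfo.InvariantCulture));
    }
}
```

Calling QueueScaler.ScaleIntegers(System.IO.File.ReadAllText(path)) would then process the whole file in one pass, and the result can be written to the console as before.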

You can use regular expressions to get all of the numbers in a string. The string could be the content of a file that you got from System.IO.File.ReadAllText (for example).
You can use "[0-9]+" to get all of the integers or @"[0-9]+(\.[0-9]+)?" to get both integers and floating points.
// requires: using System.Linq; using System.Text.RegularExpressions;
var pattern = "[0-9]+"; // or @"[0-9]+(\.[0-9]+)?"
int[] integers = Regex.Matches(text, pattern).Cast<Match>().Select(m => int.Parse(m.Value)).ToArray();
// for floats
//float[] floats = Regex.Matches(text, pattern).Cast<Match>().Select(m => float.Parse(m.Value)).ToArray();
In regular expressions you define a pattern and then search the string for tokens that match that pattern. If we need to search for integers, we can use the "[0-9]+" pattern. This pattern finds every run of characters in the range 0-9 (inclusive) that is at least one character long (the '+' makes sure of that).
If we need to search for floating points, we use @"[0-9]+(\.[0-9]+)?", which is composed of the integer pattern ("[0-9]+") plus an optional fractional part (@"(\.[0-9]+)?"). The dot is escaped so that it matches only a literal decimal point, and the '?' sign means that this token can occur zero or one time.
You can find more information about regular expressions and patterns here
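As a minimal sketch of the floating-point variant: the code below uses the escaped-dot pattern and parses with the invariant culture (an assumption of mine, so that "3.5" parses the same way on every machine). NumberExtractor and ExtractNumbers are illustrative names. Note that this pattern also picks up digits embedded in words, such as the 1 in "queue1".

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Hypothetical helper: returns every integer or decimal number found in the text.
    public static double[] ExtractNumbers(string text)
    {
        return Regex.Matches(text, @"[0-9]+(\.[0-9]+)?")
            .Cast<Match>()
            .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```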

You can use Regular Expressions for that:
Regex.Matches(text, @"(?<=^|\s)[0-9]+(?=$|\s)").Cast<Match>().Select(m => int.Parse(m.Value));
First, you select all parts of the string that resemble an integer, and then convert the resulting strings into integers.
(?=$|\s) is a Zero-width positive lookahead assertion. It checks if there is a whitespace or end of string at the end of the match.
(?<=^|\s) is a Zero-width positive lookbehind assertion. It checks if there is a whitespace or start of string at the beginning of the match.
You can find documentation on that here.
These are necessary so that the whitespaces are not included by the matches. If they were, you'd only get one of two integers if they are only separated by one whitespace.

Related

How to match regular expression starting exactly at a given index?

With the .NET Regex class, is there any way to match a regular expression inside a string only if the match starts exactly at a specific character index?
Let's look at an example:
regular expression: ab
input string: ababab
Now, I can search for matches for the regular expression (named expr in the following) in the input string, for instance, starting at character index 2:
var match = expr.Match("ababab", 2);
// match ------------->XXab
This will be successful and return a match at index 2.
If I pass index 1, this will also be successful, pointing to the same occurrence as above:
var match = expr.Match("ababab", 1);
// match ------------->X ab
Is there any efficient way to have the second test fail, because the match does not start exactly at the specified index?
Obviously, there are some work-arounds to this.
As my string in which testing occurs might be ... "long" (think possibly 4 digit numbers of characters), I would, however, prefer to avoid the overhead that would presumably occur in all three cases one way or another:
1. Work-around: I could check the resulting match to see whether its Index property matches the supplied index.
   Drawback: Matching throughout the entire string would still take place, at least until the first match is found (or the end of the string is reached).
2. Work-around: I could prepend the start anchor ^ to my regular expression and always test just the substring starting at the specified index.
   Drawback: As the string may be very long and I might be testing the same regex on multiple starting positions (but, again, only exactly on these), I am concerned about performance drawbacks from the frequent partial copying of the long string. (Ranges might be a way out here, but unfortunately, the Regex class cannot (yet?) be used to scan them.)
3. Work-around: I could prepend "^.{#}" (with # being replaced with the character index to test) for each expression and match from the beginning, then fish out the actually interesting match with a capturing group.
   Drawback: I need to test the same regex on multiple possible start positions throughout my input string. As the number of skipped characters changes each time, that would mean compiling a new regex every time, rather than re-using the one that I have, which again feels somewhat unclean.
Lastly, the Match overload that accepts a maximum length to check in addition to the start index does not seem useful, as in my case, the regular expression is not fixed and may well include variable-length portions, so I have no idea about the expected length of a match in advance.
It appears you can use the \G operator. The \Gab pattern matches when the search starts at index 2 and fails when it starts at index 1; see this C# demo:
Regex expr = new Regex(@"\Gab");
Console.WriteLine(expr.Match("ababab", 1)?.Success); // => False
Regex expr2 = new Regex(@"\Gab");
Console.WriteLine(expr2.Match("ababab", 2)?.Success); // => True
As per the documentation, \G operator matches like this:
"The match must occur at the point where the previous match ended, or if there was no previous match, at the position in the string where matching started."
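Since the question is about testing one expression at many start positions, the \G approach can be wrapped so that the regex is built only once. This is a rough sketch: AnchoredMatcher and IsMatchAt are names invented here, and wrapping the pattern in a non-capturing group is my own precaution so that a top-level alternation stays anchored as a whole.

```csharp
using System;
using System.Text.RegularExpressions;

public sealed class AnchoredMatcher
{
    private readonly Regex _anchored;

    // Builds the \G-anchored regex once so it can be reused for every start index.
    public AnchoredMatcher(string pattern)
    {
        _anchored = new Regex(@"\G(?:" + pattern + ")");
    }

    // True only if the pattern matches starting exactly at the given index.
    public bool IsMatchAt(string input, int index)
    {
        return _anchored.Match(input, index).Success;
    }
}
```

With new AnchoredMatcher("ab"), IsMatchAt("ababab", 1) fails and IsMatchAt("ababab", 2) succeeds, mirroring the demo above, and no substring copies or per-index regex construction are involved.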

C# Regex Anomaly

I'm a bit perplexed here.
I have a Regex which is to limit decimal places to two points.
My second and third captures work as expected. But including the 1st capture ($1) corrupts the string and includes all the decimal places (I get the original string).
var t = "553.17765";
var from = @"(\d+)(\.*)(\d{0,2})";
var to = "$1$2$3";
var rd = Regex.Replace(t, from,to);
var r = Regex.Match(t, from);
Why can't I get the 553 in the $1 variable?
LinqPad
What is happening is that you are matching the number multiple times, once before the . and once after. You could work around that by looking for the longest match, but it seems you could improve your Regex instead
(\d+\.?\d{0,2})
Steps are as follows
The capture group covers the whole number at once.
Look for digits, greedy match.
Look for a decimal point, either one or none.
Look for zero to two digits
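Furthermore, if you want to replace using Regex.Replace, the pattern also has to match the rest of the string so that it gets replaced too. Use .* rather than .+ around the group, so that the pattern still works when nothing precedes or follows the number:
text = Regex.Replace(text, @".*?(\d+\.?\d{0,2}).*", "$1");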
dotnetfiddle
Your example does not work because the pattern matches twice. The expression (\d+)(\.*)(\d{0,2}) splits the string 553.17765 as follows:
Match 1: 553.17
$1 = 553
$2 = .
$3 = 17
Replace 553.17 with 553.17
Match 2: 765
$1 = 765
Replace 765 with 765
The first match includes - as expected - only two of the decimal places. With this action, the match is complete and the regex starts looking for the next match, because Replace replaces all matches, not the first one only. As you can see, this regex does nothing by design.
By the way, Replace works by finding a match and replacing the whole match with the replacement pattern, so there is no need to include the surrounding text. The problem is that your regex matches too precisely: it stops after the first two decimal places, so the match only covers 553.17.
That means that whatever you replace it with only replaces 553.17 and nothing more. For finding decimal numbers this is good. For replacing, not so much: here you want to match the whole number with all its decimal places and then replace it.
So a working replace regex would look like this: (\d+\.\d{1,2})\d*. First there is only one capture group, as we don't intend to change the order of numbers around. Second, the point is required as we are only interested to replace numbers that actually have decimal places. Same reason we need at least one, up to two, decimal places. Every decimal place after that is optional, but will be captured greedily to give the whole number to the match so it will be replaced completely.
Match 1: 553.17765
$1 = 553.17
Replace 553.17765 with 553.17
This regex does not handle thousands-separators btw, if that is required.
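As a small sketch of the working replace described above (DecimalTruncator and TruncateToTwoPlaces are names chosen here for illustration), note that this truncates the extra digits rather than rounding them:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DecimalTruncator
{
    // Keeps at most two decimal places of every decimal number in the text.
    // The trailing \d* swallows the remaining digits so they are dropped by the replacement.
    public static string TruncateToTwoPlaces(string text)
    {
        return Regex.Replace(text, @"(\d+\.\d{1,2})\d*", "$1");
    }
}
```

Numbers without a decimal point are left alone, because the pattern requires the dot and at least one digit after it.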

Regex to ensure that in a string such as "05123:12315", the first number is less than the second?

I must have strings in the format x:y where x and y have to be five digits (zero padded) and x <= y.
Example:
00515:02152
What Regex will match this format?
If possible, please explain the solution briefly to help me learn.
EDIT: Why do I need Regex? I've written a generic tool that takes input and validates it according to a configuration file. An unexpected requirement popped up that would require me to validate a string in the format I've shown (using the configuration file). I was hoping to solve this problem using the existing configuration framework I've coded up, as splitting and parsing would be out of the scope of this tool. For an outstanding requirement such as this, I don't mind having some unorthodox/messy regex, as long as it's not 10000 lines long. Any intelligent solutions using Regex are appreciated! Thanks.
Description
This expression validates that the first 5-digit number is smaller than the second 5-digit number, where the zero-padded 5-digit numbers are in a :-delimited string formatted like 01234:23456.
^
(?:
(?=0....:[1-9]|1....:[2-9]|2....:[3-9]|3....:[4-9]|4....:[5-9]|5....:[6-9]|6....:[7-9]|7....:[8-9]|8....:[9])
|(?=(.)(?:0...:\1[1-9]|1...:\1[2-9]|2...:\1[3-9]|3...:\1[4-9]|4...:\1[5-9]|5...:\1[6-9]|6...:\1[7-9]|7...:\1[8-9]|8...:\1[9]))
|(?=(..)(?:0..:\2[1-9]|1..:\2[2-9]|2..:\2[3-9]|3..:\2[4-9]|4..:\2[5-9]|5..:\2[6-9]|6..:\2[7-9]|7..:\2[8-9]|8..:\2[9]))
|(?=(...)(?:0.:\3[1-9]|1.:\3[2-9]|2.:\3[3-9]|3.:\3[4-9]|4.:\3[5-9]|5.:\3[6-9]|6.:\3[7-9]|7.:\3[8-9]|8.:\3[9]))
|(?=(....)(?:0:\4[1-9]|1:\4[2-9]|2:\4[3-9]|3:\4[4-9]|4:\4[5-9]|5:\4[6-9]|6:\4[7-9]|7:\4[8-9]|8:\4[9]))
)
\d{5}:\d{5}$
Live demo: http://www.rubular.com/r/w1QLZhNoEa
Note that this uses the x option to ignore all white space and allow comments. If you use it without x, the expression needs to be all on one line.
The language you want to recognize is finite, so the easiest thing to do is just list all the cases separated by "or". The regexp you want is:
(00000:[00000|00001| ... 99999])| ... |(99998:[99998|99999])|(99999:99999)
That regexp will be several billion characters long and take quite some time to execute, but it is what you asked for: a regular expression that matches the stated language.
Obviously that's impractical. Now is it clear why regular expressions are the wrong tool for this job? Use a regular expression to match 5 digits - colon - five digits, and then once you know you have that, split up the string and convert the two sets of digits to integers that you can compare.
x <= y.
Well, you are using the wrong tool. Really, regex can't help you here. Even if you get a solution, it will be too complex and too difficult to extend.
Regex is a text-processing tool for matching patterns in regular languages. It is very weak when it comes to semantics: it cannot identify meaning in the given string. In your case, to check the x <= y condition, you need to know the numerical values.
For example, it can match digits in a sequence, or a mix of digits and characters, but what it cannot do is things like:
match a number greater than 15 and less than 1245, or
match a pattern which is a date between two given dates.
So wherever matching a pattern involves applying semantics to the matched string, regex is not an option.
The appropriate way here is to split the string on the colon and then compare the numbers. For the leading zeros, you can find a workaround.
You can't generally* do this with regex. You can use regex to match the pattern and extract the numbers, then compare the numbers in your code.
For example to match such format (without comparing the numbers) and get the numbers you could use:
^(\d{5}):(\d{5})\z
*) You probably could in this case (as the numbers are always 5 digits and zero padded), but it wouldn't be nice.
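A sketch of that two-step approach, using the pattern above to check the format and extract the numbers, and ordinary code for the comparison. RangePairValidator and IsValid are hypothetical names, and NumberStyles.None with the invariant culture is my choice so that only plain digits are accepted.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RangePairValidator
{
    // Format check only: five digits, a colon, five digits, end of string.
    private static readonly Regex Format = new Regex(@"^(\d{5}):(\d{5})\z");

    public static bool IsValid(string s)
    {
        Match m = Format.Match(s);
        if (!m.Success)
        {
            return false;
        }

        // The numeric comparison happens in code, not in the regex.
        return int.TryParse(m.Groups[1].Value, NumberStyles.None, CultureInfo.InvariantCulture, out int x)
            && int.TryParse(m.Groups[2].Value, NumberStyles.None, CultureInfo.InvariantCulture, out int y)
            && x <= y;
    }
}
```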
You should do something like this instead:
bool IsCorrect(string s)
{
string[] split = s.Split(':');
int number1, number2;
if (split.Length == 2 && split[0].Length == 5 && split[1].Length == 5)
{
if (int.TryParse(split[0], out number1) && int.TryParse(split[1], out number2) && number1 <= number2)
{
return true;
}
}
return false;
}
With regex you can't make comparisons to see if a number is bigger than another number.
Let me show you a good example of why you shouldn't try to do this. This is a regex that (nearly) does the same job.
https://gist.github.com/anonymous/ad74e73f0350535d09c1
Raw file:
https://gist.github.com/anonymous/ad74e73f0350535d09c1/raw/03ea835b0e7bf7ac3c5fb6f9c7e934b83fb09d95/gistfile1.txt
Except it's just for 3 digits. For 4, the program that generates these fails with an OutOfMemoryException, even with gcAllowVeryLargeObjects enabled. Memory use climbed to 5 GB before it crashed. You don't want most of your app to be a Regex, right?
This is not a Regex's job.
This is a two-step process, because regex is a text parser and not an analyzer. That said, regex is perfect for validating that we have the 5:5 number pattern, and the pattern \d\d\d\d\d:\d\d\d\d\d determines whether the input has that form. If that form is not found, the match fails and the whole validation fails. If it is found, we can use regex/LINQ to parse out the numbers and then check their validity.
This code would be inside a method to do the check
var data = "00515:02151";
var pattern = @"
^ # starting from the beginning of the string...
(?=[\d:]{11}) # Is there a run of at least 11 characters consisting only of digits and a colon? Fail if not
(?=\d{5}:\d{5}) # Does it fall into our pattern? If not, fail the match
((?<Values>[^:]+)(?::?)){2}
";
// IgnorePatternWhitespace only lets us lay out and comment the pattern; it does not change what the regex matches
var result = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => mt.Groups["Values"].Captures
.OfType<Capture>()
.Select (cp => int.Parse(cp.Value)))
.FirstOrDefault();
// Two values at this point 515, 2151
bool valid = ((result != null) && (result.First () <= result.Last ())); // x <= y, as the question requires
Console.WriteLine (valid); // True
Using Javascript this can work.
var string = "00515:02152";
string.replace(/(\d{5})\:(\d{5})/, function($1,$2,$3){
return (parseInt($2)<=parseInt($3))?$1:null;
});
FIDDLE http://jsfiddle.net/VdzF7/

What Regex would I use to match file names with the format 'number-text-text'?

I have code that searches a folder that contains SQL patch files. I want to add file names to an array if they match the following name format:
number-text-text.sql
What Regex would I use to match number-text-text.sql?
Would it be possible to make the Regex match file names where the number part of the file name is between two numbers? If so what would be the Regex syntax for this?
The following regex gets you halfway there:
\d+-[a-zA-Z]+-[a-zA-Z]+\.sql
Matching a specific range is trickier, as regex doesn't have a simple way to handle numeric ranges. To limit the match to a file name with a number between 2 and 13, this is the regex:
([2-9]|1[0-3])-[a-zA-Z]+-[a-zA-Z]+\.sql
Your regular expression should be:
(\d+)-[a-zA-Z]+-[a-zA-Z]+\.sql
You would then use the first captured group to check if your number is between the two numbers you desire. Don't try to check if a number is within a range with a regular expression; do it in two steps. Your code will be much clearer for it.
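In C#, the two-step check might look like the sketch below. PatchFileFilter and IsPatchInRange are names invented for this example; the ^ and $ anchors and RegexOptions.IgnoreCase are my additions, so that the whole file name has to match and an upper-case extension is accepted.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class PatchFileFilter
{
    // Step 1: the regex only checks the shape of the name and captures the number.
    private static readonly Regex NamePattern =
        new Regex(@"^(\d+)-[a-zA-Z]+-[a-zA-Z]+\.sql$", RegexOptions.IgnoreCase);

    // Step 2: the range check is ordinary integer comparison (bounds inclusive).
    public static bool IsPatchInRange(string fileName, int min, int max)
    {
        Match m = NamePattern.Match(fileName);
        if (!m.Success)
        {
            return false;
        }

        // TryParse also guards against numbers too large for an int.
        return int.TryParse(m.Groups[1].Value, NumberStyles.None, CultureInfo.InvariantCulture, out int number)
            && number >= min
            && number <= max;
    }
}
```

The file names returned by a directory listing can then be filtered with this method before they are added to the array.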
How about:
\d+-[^-]+-[^-]+\.sql
Edit: You want just letters, so here it is without specific ranges.
\d+-[a-z]+-[a-z]+\.sql - You'll also want to use the i flag, not sure how that's done in c#, but here it is in js/perl:
/\d+-[a-z]+-[a-z]+\.sql/i
Ranges are more difficult. Here's an example of how to match 0-255:
([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
So to match (0-255)-text-text.sql, you'd have this:
/^(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])-[a-z]+-[a-z]+\.sql/i
(I put the digits in a non-capturing group and matched from the beginning of the string to prevent partial matches on the number and in case you're expecting numbered groups or something).
Basically every time you need another digit of possibility, you'll need to add a new condition inside this case. The smaller the digit you'd like to match, the more cases you'll need as well. What is your desired min/max? AFAIK there's not a simple way to do this dynamically (although I'd love for someone to show me I'm wrong about that).
The simplest way to get around this would be to simply capture the digits, and use native syntax to see if it's in your range. Example in js:
var match = filename.match(/(\d+)-[a-z]+-[a-z]+\.sql/i);
if(match && match[1] < maximumNumber && match[1] > minimumNumber){
doStuff();
}
This should work:
select '4-dfsg-asdfg.sql' ~ E'^[0-9]+-[a-zA-Z]+-[a-zA-Z]+\\.sql$'
This restricts the TEXT to simple ASCII characters. May or may not be what you want.
This is tested in PostgreSQL. Regular expression flavors differ a lot between implementations. You probably know that?
Anchors at begin ^ and end $ are optional, depending how you are going to do it.

Check Formatting of a String

This has probably been answered somewhere before, but there are millions of unrelated posts about string formatting, so I am asking here.
Take the following string:
24:Something(true;false;true)[0,1,0]
I want to be able to do two things in this case. I need to check whether or not all the following conditions are true:
There is only one : (done: achieved using Split(), which I needed to use anyway to separate the two parts)
The integer before the : is a 1-3 digit int (done: simple int.Parse logic)
The () exists, and that the "Something", in this case any string less than 10 characters, is there
The [] exists and has at least 1 integer in it. Also, make sure the elements in the [] are integers separated by ,
How can I best do this?
EDIT: I have marked as done what I've achieved so far.
A regular expression is the quickest way. Depending on the complexity it may also be the most computationally expensive.
This seems to do what you need (I'm not that good so there might be better ways to do this):
^\d{1,3}:\w{1,9}\((true|false)(;true|;false)*\)\[\d(,[\d])*\]$
Explanation
\d{1,3}
1 to 3 digits
:
followed by a colon
\w{1,9}
followed by a 1-9 character alpha-numeric string,
\((true|false)(;true|;false)*\)
followed by parentheses containing "true" or "false" followed by any number of ";true" or ";false",
\[\d(,[\d])*\]
followed by a set of square brackets containing a digit, followed by any number of comma+digit.
The ^ and $ at the beginning and end of the string indicate the start and end of the string which is important since we're trying to verify the entire string matches the format.
Code Sample
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
bool isFormattedCorrectly = regex.IsMatch(input);
Credit @ Ian Nelson
This is one of those cases where your only sensible option is to use a Regular Expression.
My hasty attempt is something like:
var input = "24:Something(true;false;true)[0,1,0]";
var regex = new System.Text.RegularExpressions.Regex(@"^\d{1,3}:.{1,9}\(.*\)\[\d(,[\d])*\]$");
System.Diagnostics.Debug.Assert(regex.IsMatch(input));
This online RegEx tester should help refine the expression.
I think, the best way is to use regular expressions like this:
string s = "24:Something(true;false;true)[0,1,0]";
Regex pattern = new Regex(@"^\d{1,3}:[a-zA-Z]{1,10}\((true|false)(;true|;false)*\)\[\d(,\d)*\]$");
if (pattern.IsMatch(s))
{
// s is valid
}
If you want anything inside (), you can use following regex:
@"^\d{1,3}:[a-zA-Z]{1,10}\([^:\(]*\)\[\d(,\d)*\]$"
