Number extraction from strings using Regex - c#

I have this C# code that I found and adapted to my needs, but now I would like to make it work for all numeric data types.
public static int[] intRemover (string input)
{
    string[] inputArray = Regex.Split (input, @"\D+");
    int n = 0;
    foreach (string inputN in inputArray) {
        if (!string.IsNullOrEmpty (inputN)) {
            n++;
        }
    }
    int[] intarray = new int[n];
    n = 0;
    foreach (string inputN in inputArray) {
        if (!string.IsNullOrEmpty (inputN)) {
            intarray [n] = int.Parse (inputN);
            n++;
        }
    }
    return intarray;
}
This works well for extracting whole numbers from strings, but the regular expression I am using does not account for negative numbers or numbers that contain a decimal point. My goal, as I said, is to turn this into a method that works for all numeric data types. Can anyone help me out, please?

You can match it instead of splitting it
public static decimal[] intRemover (string input)
{
    return Regex.Matches(input, @"[+-]?\d+(\.\d+)?") // this returns all the matches in input
        .Cast<Match>()                     // this casts from MatchCollection to IEnumerable<Match>
        .Select(x => decimal.Parse(x.Value)) // this parses each matched string to decimal
        .ToArray();                        // this converts IEnumerable<decimal> to an array of decimal
}
[+-]? matches + or - 0 or 1 time
\d+ matches 1 to many digits
(\.\d+)? matches a (decimal point followed by 1 to many digits) 0 or 1 time
Simplified form of the above code
public static decimal[] intRemover (string input)
{
    int n = 0;
    MatchCollection matches = Regex.Matches(input, @"[+-]?\d+(\.\d+)?");
    decimal[] decimalarray = new decimal[matches.Count];
    foreach (Match m in matches)
    {
        decimalarray[n] = decimal.Parse (m.Value);
        n++;
    }
    return decimalarray;
}
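As a rough check of the pattern breakdown in this answer, here is a minimal sketch that runs the same expression over a sample string. The helper name ExtractDecimals and the sample input are hypothetical, and passing CultureInfo.InvariantCulture is an assumption added here so that '.' is treated as the decimal separator regardless of regional settings.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Hypothetical helper name; the pattern is the one from the answer above.
    public static decimal[] ExtractDecimals(string input)
    {
        return Regex.Matches(input, @"[+-]?\d+(\.\d+)?")
            .Cast<Match>()
            // Assumption: parse with the invariant culture so "12.5" is read
            // the same way on every machine.
            .Select(m => decimal.Parse(m.Value, CultureInfo.InvariantCulture))
            .ToArray();
    }

    public static void Main()
    {
        // Matches "-12.5", "+3" and "42" in this sample input.
        decimal[] values = ExtractDecimals("a-12.5b+3c42");
        foreach (decimal value in values)
        {
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

One thing to keep in mind with this pattern: a hyphen directly in front of digits is always read as a sign, so an input such as "10-5" would yield 10 and -5.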

Try modifying your regular expression like this:
@"[+-]?\d+(?:\.\d*)?"
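To see how this variant differs from a pattern that requires digits after the decimal point, the following sketch lists the raw matches of both expressions on the same input. The helper MatchAll and the sample string are hypothetical and only serve the comparison.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternComparison
{
    // Hypothetical helper: returns the text of every match of 'pattern' in 'input'.
    public static string[] MatchAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string input = "3. and 4.5";

        // Requires at least one digit after the point, so the dot after "3" is left out.
        Console.WriteLine(string.Join(" | ", MatchAll(input, @"[+-]?\d+(\.\d+)?")));

        // Allows zero digits after the point, so "3." is matched including the dot.
        Console.WriteLine(string.Join(" | ", MatchAll(input, @"[+-]?\d+(?:\.\d*)?")));
    }
}
```

The non-capturing group (?:...) only avoids creating a capture; the behavioural difference comes from \d* versus \d+.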

Related

Collect the digits numbers in the decimal number C#

I want a method that removes the comma from a decimal number and then adds up the digits. For example, if the user inputs 1,3 it should remove the comma and add 1 and 3 together, that is, 1 + 3 = 4. Can I use Trim or Replace?
public int AddSum(string x1)
{
    string x = x1.Trim();
    int n = Convert.ToInt32(x);
    return n;
}
public int AddSum(string x1)
{
    var digitFilter = new Regex(@"[^\d]");
    // int.Parse has no char overload, so each character is converted to a string first
    return digitFilter.Replace(x1, "").Select(c => int.Parse(c.ToString())).Sum();
}
OR
public int AddSum(string x1)
{
    return x1.Where(c => char.IsDigit(c)).Select(c => c - '0').Sum();
}
If you want to iterate over the characters in a string and compute the sum of digits contained therein, it's trivial:
public static int SumOfDigits( string s ) {
    int n = 0;
    foreach ( char c in s ) {
        n += c >= '0' && c <= '9' // if the character is a decimal digit
            ? c - '0'             // - convert to its numeric value
            : 0                   // - otherwise, default to zero
            ;                     // and add that to 'n'
    }
    return n;
}
It sounds like you want take a comma-separated string of numbers, add the numbers together, then return the result.
The first thing you have to do is use the Split() method on the input string. The Split() method takes an input string and splits it into an array of strings based on a separator character:
string[] numbers = x1.Split(',');
So now we have an array of strings called numbers that hold each number. The next thing you have to do is create an empty variable to hold the running total:
int total = 0;
The next thing is to create a loop that iterates through the numbers array and, each time, adds the number to the running total. Remember that numbers is an array of strings and not numbers, so we must use the Parse() method of int to convert each string to a number:
foreach (string number in numbers)
{
    total += int.Parse(number);
}
Finally, just return the result:
return total;
Put it all together and you get this:
private static int AddSum(string x1)
{
    string[] numbers = x1.Split(',');
    int total = 0;
    foreach (string number in numbers)
    {
        total += int.Parse(number);
    }
    return total;
}
I hope this helps and clarifies things. Keep in mind that this method doesn't do any kind of error checking, so if your input is bad, int.Parse will throw an exception.
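If exceptions on bad input are not wanted, one option is to swap int.Parse for int.TryParse. The sketch below is a hypothetical variant (the name TryAddSum and its bool/out shape are assumptions, not part of the answer) that reports failure instead of throwing.

```csharp
using System;
using System.Globalization;

public static class SafeSum
{
    // Hypothetical variant of AddSum: returns false instead of throwing
    // when any comma-separated part is not a valid integer.
    public static bool TryAddSum(string input, out int total)
    {
        total = 0;
        if (input == null)
        {
            return false;
        }

        foreach (string part in input.Split(','))
        {
            int value;
            // NumberStyles.Integer tolerates surrounding whitespace and a leading sign.
            if (!int.TryParse(part, NumberStyles.Integer, CultureInfo.InvariantCulture, out value))
            {
                total = 0;
                return false;
            }
            // Note: overflow of the running total is not handled in this sketch.
            total += value;
        }
        return true;
    }

    public static void Main()
    {
        int total;
        Console.WriteLine(TryAddSum("1,3", out total) ? total.ToString() : "invalid input");
        Console.WriteLine(TryAddSum("1,x", out total) ? total.ToString() : "invalid input");
    }
}
```

An empty input string is treated as invalid here, because Split returns a single empty part that does not parse.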

Split string into multiple smaller strings

I have a multiline textbox that contains 10-digit mobile numbers separated by commas. I need to break the text into strings that each hold a group of at most 100 mobile numbers.
100 mobile numbers are separated by 99 commas in total. What I am trying to code is a split into strings that each contain fewer than 100 commas.
public static IEnumerable<string> SplitByLength(this string str, int maxLength)
{
    for (int index = 0; index < str.Length; index += maxLength) {
        yield return str.Substring(index, Math.Min(maxLength, str.Length - index));
    }
}
Using the code above, I can get groups of 100 numbers, since 100 numbers have a text length of 10*100 (for the mobile numbers) + 99 (for the commas). But the problem is that the user may enter a wrong mobile number, with 9 or even 11 digits.
Can anyone guide me on how I can achieve this?
Thank you in advance.
You could use this extension method to put them into max-100 number groups:
public static IEnumerable<string[]> SplitByLength(this string str, string[] splitBy, StringSplitOptions options, int maxLength = int.MaxValue)
{
    var allTokens = str.Split(splitBy, options);
    for (int index = 0; index < allTokens.Length; index += maxLength)
    {
        int length = Math.Min(maxLength, allTokens.Length - index);
        string[] part = new string[length];
        Array.Copy(allTokens, index, part, 0, length);
        yield return part;
    }
}
Sample:
string text = string.Join(",", Enumerable.Range(0, 1111).Select(i => "123456789"));
var phoneNumbersIn100Groups = text.SplitByLength(new[] { "," }, StringSplitOptions.None, 100);
foreach (string[] part in phoneNumbersIn100Groups)
{
    Assert.IsTrue(part.Length <= 100);
    Console.WriteLine(String.Join("|", part));
}
You have a few options,
Put some kind of mask on the input data to prevent the user entering invalid data. In your UI you could then flag the error and prompt the user to reenter correct information. If you go down this route then something like this string[] nums = numbers.Split(','); will be fine.
Alternatively, you could use Regex.Split or Regex.Matches and match on the pattern. Something like this should work, assuming each valid number is a run of exactly 10 digits that is not adjacent to another digit:
Regex regex = new Regex(@"(?<!\d)\d{10}(?!\d)");
string[] nums = regex.Matches(numbers).Cast<Match>().Select(m => m.Value).ToArray();
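For the validation route, a minimal sketch could split on commas first and then sort each entry into a valid or an invalid list, so the UI can flag the bad entries. The class name, the Partition method and the rule "exactly 10 ASCII digits" are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MobileNumberValidation
{
    // Assumption: a valid mobile number is exactly 10 ASCII digits.
    // \A and \z anchor the pattern to the whole candidate string.
    private static readonly Regex TenDigits = new Regex(@"\A[0-9]{10}\z");

    // Hypothetical helper: fills 'valid' and 'invalid' from a comma-separated input.
    public static void Partition(string input, List<string> valid, List<string> invalid)
    {
        foreach (string token in input.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string candidate = token.Trim();
            if (candidate.Length == 0)
            {
                continue; // ignore entries that were only whitespace
            }

            if (TenDigits.IsMatch(candidate))
            {
                valid.Add(candidate);
            }
            else
            {
                invalid.Add(candidate);
            }
        }
    }

    public static void Main()
    {
        var valid = new List<string>();
        var invalid = new List<string>();
        Partition("9876543210, 12345,98765432101,0123456789", valid, invalid);

        Console.WriteLine("Valid:   " + string.Join(", ", valid));
        Console.WriteLine("Invalid: " + string.Join(", ", invalid));
    }
}
```

The valid list can then be handed to any of the grouping methods in this thread, while the invalid list can drive an error message.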
This can be resolved with some simple LINQ code:
public static IEnumerable<string> SplitByLength(this string input, int groupSize)
{
    // First split the input on the comma.
    // This gives us an array of all the single numbers.
    string[] numbers = input.Split(',');
    // Now loop over this array in groupSize blocks
    for (int index = 0; index < numbers.Length; index += groupSize)
    {
        // Skip numbers from the starting position and
        // take the following groupSize numbers,
        // join them in a comma-separated string and return it
        yield return string.Join(",", numbers.Skip(index).Take(groupSize));
    }
}

Split a string on multiple delimiters and keep them in the output

I have a string that can be 2 to N chars long. I also have 4 opcodes (each 2 chars long).
Is there a way to do something like:
var tmpArray = inputStr.Split(char1, char2, char3, char4).ToArray();
Say that the opcodes are A, B, C, D or 8 and I have the string AB123456789C123412341234B123; the array would look like this:
A
B
123456789
C
123412341234
B
123
This is all you need.
string toSplit = "AB123456789C123412341234B123";
string pattern = @"([ABCD])";
IEnumerable<string> substrings = Regex.Split(toSplit, pattern).Where(i => !String.IsNullOrWhiteSpace(i));
Test here: http://www.beansoftware.com/Test-Net-Regular-Expressions/Split-String.aspx
All you have to do is declare a character class [...] involving all your characters you want to split on, then you encompass that in (...) parens to keep the delimiters.
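The effect of the capturing parentheses can be seen by splitting the same input with and without them. This is a small sketch with a made-up input string and hypothetical method names; it relies on the documented behaviour of Regex.Split, which includes captured text in the result array.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitKeepDelimiters
{
    // Character class only: the delimiters are consumed and dropped.
    public static string[] SplitDropping(string input)
    {
        return Regex.Split(input, "[AB]");
    }

    // Same class wrapped in a capturing group: the delimiters are kept as separate items.
    public static string[] SplitKeeping(string input)
    {
        return Regex.Split(input, "([AB])");
    }

    public static void Main()
    {
        // Both results start with an empty string because the input begins with a delimiter.
        Console.WriteLine(string.Join("|", SplitDropping("A1B2")));
        Console.WriteLine(string.Join("|", SplitKeeping("A1B2")));
    }
}
```

The leading empty entry is why the answer filters the result with Where(i => !String.IsNullOrWhiteSpace(i)).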
Try this,
private char[] alphabets = {'A','B','C', 'D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
var input = "AB123456789C123412341234B123";
var result = input.SplitAndKeep(alphabets).ToList();
public static class Extensions
{
    public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
    {
        int start = 0, index;
        while ((index = s.IndexOfAny(delims, start)) != -1)
        {
            if (index - start > 0)
                yield return s.Substring(start, index - start);
            yield return s.Substring(index, 1);
            start = index + 1;
        }
        if (start < s.Length)
        {
            yield return s.Substring(start);
        }
    }
}
Use Regex.Split
var str = "AB123456789C123412341234B123";
Regex r = new Regex(@"([A-Z])|(\d*)");
var parts = r.Split(str).Where(x=> !string.IsNullOrWhiteSpace(x)).ToArray();
If you want just A, B, C and D, use this:
Regex r = new Regex(@"([A-D])|(\d*)");
I find that Regex lookahead/lookbehind fits this scenario. The pattern basically says to split when a single letter is found behind or ahead of the current position. Then use Linq to not return any empty spaces as part of the result, which in this sample case the empty element would be the first element.
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string data = "AB123456789C123412341234B123";
        var pieces = Regex.Split(data, "(?<=[a-zA-Z])|(?=[a-zA-Z])").Where(p => !String.IsNullOrEmpty(p));
        foreach (string p in pieces)
            Console.WriteLine(p);
    }
}
Results:
A
B
123456789
C
123412341234
B
123
Try first splitting on A only. Then split each result on B only. Then split each result on C only etc...
Splitting on A gives you:
B123456789C123412341234B123
which you know starts with A
Splitting on B gives you:
123456789C123412341234,
123
each of which you know starts with B and so on.
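The one-delimiter-at-a-time idea described in this answer could be sketched roughly as follows. The class and method names are hypothetical, and re-inserting each delimiter between the split parts is one possible way to keep track of which piece "starts with" which opcode.

```csharp
using System;
using System.Collections.Generic;

public static class SequentialSplit
{
    // Hypothetical helper: splits on one delimiter at a time and
    // re-inserts that delimiter between the resulting parts.
    public static List<string> SplitKeeping(string input, char[] delimiters)
    {
        var pieces = new List<string> { input };

        foreach (char delimiter in delimiters)
        {
            var next = new List<string>();
            foreach (string piece in pieces)
            {
                string[] parts = piece.Split(delimiter);
                for (int i = 0; i < parts.Length; i++)
                {
                    // Every part after the first was preceded by the delimiter.
                    if (i > 0)
                    {
                        next.Add(delimiter.ToString());
                    }
                    // Skip the empty strings produced by leading or adjacent delimiters.
                    if (parts[i].Length > 0)
                    {
                        next.Add(parts[i]);
                    }
                }
            }
            pieces = next;
        }

        return pieces;
    }

    public static void Main()
    {
        List<string> result = SplitKeeping("AB123456789C123412341234B123", new[] { 'A', 'B', 'C', 'D' });
        foreach (string item in result)
        {
            Console.WriteLine(item);
        }
    }
}
```

This makes one pass over all pieces per delimiter, so it does more work than a single regex split, but it needs no regular expressions.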

Count Words and Extract numbers from string and sum them

1) I need to count how many words I have in the sentence.
But what if I have more than one whitespace character between words? Each extra one is counted as a word. I need a solution for this.
There is four words. / count as 4 words
There is four words. / count as 5 words
I use:
int countWords = txt.Split().Length;
2) I need to extract the numbers from the string and then get their sum. My code is not working; I get a "No overload for method" error.
All my code:
Console.Write("Enter text: ");
string txt = Console.ReadLine();
int sum = 0;
int countWords = txt.Split().Length;
foreach (char num in txt)
{
    if (char.IsDigit(num))
        sum += Int32.TryParse(num).ToString();
}
Console.WriteLine("There are {0} words in this sentence.",countWords);
Console.WriteLine("Summ is "+sum);
Use the overload of String.Split with StringSplitOptions.RemoveEmptyEntries. You can use an empty char[](or string[]) to get the same behaviour as String.Split without an argument, so that it splits by all white-space characters like space,tab or new-line characters.
If you want to sum the "words" which could be parsed to int then do that, use int.TryParse on all words which were extracted by String.Split. You could use LINQ:
string[] words = text.Split(new char[] {}, StringSplitOptions.RemoveEmptyEntries);
int wordCount = words.Length;
int num = 0;
int sum = words.Where(w => int.TryParse(w, out num)).Sum(w => num);
Here is a simple console app to do what you intend to.
It uses a regular expression to capture digit characters and sums them. The TryParse is just a fail-safe (I believe it is not needed in this case, since the regex ensures only digits are captured).
static void Main(string[] args)
{
    Regex digitRegex = new Regex("(\\d)");
    string text = Console.ReadLine();
    int wordCount = text.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries).Length;
    int sum = 0;
    foreach (Match x in digitRegex.Matches(text, 0))
    {
        int num;
        if (int.TryParse(x.Value, out num))
            sum += num;
    }
    Console.WriteLine("Word Count:{0}, Digits Total:{1}", wordCount, sum);
    Console.ReadLine();
}
Hope it helps. Cheers
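The answers in this thread read "sum the numbers" in two ways: summing whole words that parse as integers, or summing single digits. A third reading, summing every run of consecutive digits as one number, could be sketched like this. The class and method names are hypothetical, and which reading the question intends is an open assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordAndNumberStats
{
    // Counts words separated by any amount of whitespace.
    public static int CountWords(string text)
    {
        return text.Split(new char[0], StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Sums each run of consecutive ASCII digits as one number, e.g. "12" counts as 12, not 1 + 2.
    public static int SumOfNumbers(string text)
    {
        int sum = 0;
        foreach (Match match in Regex.Matches(text, "[0-9]+"))
        {
            int value;
            // TryParse guards against digit runs too long to fit in an int.
            if (int.TryParse(match.Value, out value))
            {
                sum += value;
            }
        }
        return sum;
    }

    public static void Main()
    {
        string text = "I have 12 apples and 30  pears";
        Console.WriteLine("Words: " + CountWords(text));
        Console.WriteLine("Sum:   " + SumOfNumbers(text));
    }
}
```

Unlike splitting on whitespace and parsing whole words, this also picks up digits embedded in a word, such as the 5 in "room5".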

Converting a string number into sequence of digits - .Net 2.0

Given a string
string result = "01234";
I want to get the separate integers 0,1,2,3,4 from the string.
How to do that?
The following code is giving me the ascii values
List<int> ints = new List<int>();
foreach (char c in result.ToCharArray())
{
    ints.Add(Convert.ToInt32(c));
}
EDIT: I hadn't spotted the ".NET 2.0" requirement. If you're going to do a lot of this sort of thing, it would probably be worth using LINQBridge, and see the later bit - particularly if you can use C# 3.0 while still targeting 2.0. Otherwise:
List<int> integers = new List<int>(text.Length);
foreach (char c in text)
{
    integers.Add(c - '0');
}
Not as neat, but it will work. Alternatively:
List<char> chars = new List<char>(text);
List<int> integers = chars.ConvertAll(delegate(char c) { return c - '0'; });
Or if you'd be happy with an array:
char[] chars = text.ToCharArray();
int[] integers = Array.ConvertAll<char, int>(chars,
    delegate(char c) { return c - '0'; });
Original answer
Some others have suggested using ToCharArray. You don't need to do that - string already implements IEnumerable<char>, so you can already treat it as a sequence of characters. You then just need to turn each character digit into the integer representation; the easiest way of doing that is to subtract the Unicode value for character '0':
IEnumerable<int> digits = text.Select(x => x - '0');
If you want this in a List<int> instead, just do:
List<int> digits = text.Select(x => x - '0').ToList();
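Subtracting '0' only gives a sensible result when every character is one of '0' to '9'. If the input is not guaranteed to be all digits, a guarded version along these lines may be safer. The method name ToDigits and the choice of FormatException are assumptions; the loop avoids LINQ so it would also fit the .NET 2.0 constraint.

```csharp
using System;
using System.Collections.Generic;

public static class DigitConversion
{
    // Hypothetical helper: converts "01234" to 0, 1, 2, 3, 4 and rejects anything else.
    public static List<int> ToDigits(string text)
    {
        List<int> digits = new List<int>(text.Length);
        foreach (char c in text)
        {
            // Explicit range check instead of char.IsDigit, which also accepts
            // non-ASCII digits for which (c - '0') would be wrong.
            if (c < '0' || c > '9')
            {
                throw new FormatException("Not a decimal digit: '" + c + "'");
            }
            digits.Add(c - '0');
        }
        return digits;
    }

    public static void Main()
    {
        foreach (int digit in ToDigits("01234"))
        {
            Console.WriteLine(digit);
        }
    }
}
```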
Loop the characters and convert each to a number. You can put them in an array:
int[] digits = result.Select(c => c - '0').ToArray();
Or you can loop through them directly:
foreach (int digit in result.Select(c => c - '0')) {
    ...
}
Edit:
As you clarified that you are using framework 2.0, you can apply the same calculation in your loop:
List<int> ints = new List<int>(result.Length);
foreach (char c in result) {
    ints.Add(c - '0');
}
Note: Specify the capacity when you create the list; that eliminates the need for the list to resize itself. You don't need to use ToCharArray to loop over the characters in the string.
You could use LINQ:
var ints = result.Select(c => Int32.Parse(c.ToString()));
Edit:
Not using LINQ, your loop seems good enough. Just use Int32.Parse instead of Convert.ToInt32:
List<int> ints = new List<int>();
foreach (char c in result.ToCharArray())
{
    ints.Add(Int32.Parse(c.ToString()));
}
string result = "01234";
List<int> list = new List<int>();
foreach (var item in result)
{
    list.Add(item - '0');
}
Index into the string to extract each character. Convert each character into a number. Code left as an exercise for the reader.
Another solution:
string numbers = "012345";
List<int> list = new List<int>();
foreach (char c in numbers)
{
    list.Add(int.Parse(c.ToString()));
}
There is no real need to make a char array from the string, since a string can be enumerated just like an array.
Also, ToString() turns the char into a string, so int.Parse gives you the number instead of the ASCII value you get when converting a char.
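The difference between converting a char directly and parsing it as text can be shown side by side. This is a small sketch; the class and method names are hypothetical.

```csharp
using System;

public static class CharToIntComparison
{
    // Converts the char itself: yields its character code (55 for '7').
    public static int AsCharCode(char c)
    {
        return Convert.ToInt32(c);
    }

    // Parses the one-character string: yields the digit's numeric value.
    public static int AsParsedDigit(char c)
    {
        return int.Parse(c.ToString());
    }

    // Subtracts the code of '0': also yields the numeric value for '0'..'9'.
    public static int AsOffsetFromZero(char c)
    {
        return c - '0';
    }

    public static void Main()
    {
        char c = '7';
        Console.WriteLine(AsCharCode(c));       // 55
        Console.WriteLine(AsParsedDigit(c));    // 7
        Console.WriteLine(AsOffsetFromZero(c)); // 7
    }
}
```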
List<int> ints = new List<int>();
foreach (char c in result.ToCharArray())
{
    // Convert the one-character string, not the char, to avoid getting the character code
    ints.Add(Convert.ToInt32(c.ToString()));
}
static int[] ParseInts(string s) {
    int[] ret = new int[s.Length];
    for (int i = 0; i < s.Length; i++) {
        if (!int.TryParse(s[i].ToString(), out ret[i]))
            throw new InvalidCastException(String.Format("Cannot parse '{0}' as int (char {1} of {2}).", s[i], i, s.Length));
    }
    return ret;
}
