3-digit grouping of all numbers in an alphanumeric string - c#

I found it inefficient to iterate through the parts of the string split on the space character, extract the numeric parts, apply
UInt64.Parse(Regex.Match(numericPart, @"\d+").Value)
and then concatenate them back together to form the string with the numbers grouped.
Is there a better, more efficient way to apply 3-digit grouping to all numbers in a string that contains other characters?

I am pretty sure the most efficient way (CPU-wise, with just a single pass over the string) is the basic foreach loop, along these lines
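// requires using System.Text;
var sb = new StringBuilder();
int digitCount = 0;
foreach (char c in inputString)
{
if (char.IsDigit(c))
{
// before the 4th, 7th, ... digit of a run, insert a "."
if (digitCount > 0 && digitCount % 3 == 0) sb.Append('.');
digitCount++;
}
else digitCount = 0; // a non-digit resets the counter
sb.Append(c);
}
return sb.ToString();
For the input 1234567 this produces 123.456.7, because the groups are counted from the left.
If you want 1.234.567, you'll need an additional buffer for digit sequences.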
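A minimal sketch of that buffered variant, assuming ASCII digits and "." as the default separator (the `DigitGrouping` class and `GroupDigits` method are hypothetical names). It makes one pass over the input; digits are collected in a small buffer, and when a run ends they are written out with a separator wherever the number of remaining digits is a multiple of three:

```csharp
using System.Text;

public static class DigitGrouping
{
    // Hypothetical helper: groups every run of ASCII digits in threes from the right,
    // e.g. "abc1234567" -> "abc1.234.567". The "." default mirrors the answer above.
    public static string GroupDigits(string input, char separator = '.')
    {
        var sb = new StringBuilder(input.Length + input.Length / 3);
        var digits = new StringBuilder(); // buffer for the current digit run
        foreach (char c in input)
        {
            if (c >= '0' && c <= '9')
            {
                digits.Append(c);
                continue;
            }
            Flush(sb, digits, separator); // a non-digit ends the current run
            sb.Append(c);
        }
        Flush(sb, digits, separator); // the string may end with a number
        return sb.ToString();
    }

    private static void Flush(StringBuilder sb, StringBuilder digits, char separator)
    {
        int n = digits.Length;
        for (int i = 0; i < n; i++)
        {
            // a separator goes before digit i when the digits left (including i) are a multiple of 3
            if (i > 0 && (n - i) % 3 == 0) sb.Append(separator);
            sb.Append(digits[i]);
        }
        digits.Clear();
    }
}
```

Every digit run is treated as an integer, so a decimal fraction such as "3.14159" would be grouped as well (giving "3.14.159"); handle that case separately if it can occur in your input.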

So you want to replace every long in a string with the same value, formatted with the current culture's number-group separator? If so, you can do this:
string[] words = input.Split();
var newWords = words.Select(w =>
{
long l;
bool isLong = System.Int64.TryParse(w.Trim(), out l);
if(isLong)
return l.ToString("N0");
else
return w;
});
string result = string.Join(" ", newWords);
With the input from your comment:
string input = "hello 134443 in the 33 when 88763 then";
You get the expected result: "hello 134,443 in the 33 when 88,763 then", if your current culture uses comma as number-group-separator.

I'll post my regex-based example. A regex doesn't have to be slow, especially once it is compiled and declared static readonly.
// Declare the regex once
private static readonly Regex regex = new Regex(@"(\d)(?=(\d{3})+(?!\d))", RegexOptions.Compiled);
// Then, somewhere inside a method
// "$1" puts the captured digit back, followed by the separator
var replacement = string.Format("$1{0}", System.Globalization.CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator); // Get the system digit grouping separator
var strn = "Hello 34234456 where 3334 is it?"; // Just a sample string
// Possibly inside a loop
var res = regex.Replace(strn, replacement);
Output (if "," is the system digit-grouping separator):
Hello 34,234,456 where 3,334 is it?
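If you would rather let .NET's own number formatting insert the separators (closer to the original UInt64.Parse idea), a MatchEvaluator can format each digit run with the "N0" format string. This is a swapped-in variant, not the answer's code; `CultureGrouping` and `GroupNumbers` are hypothetical names, and the culture is passed in explicitly so the output is predictable:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class CultureGrouping
{
    private static readonly Regex DigitRun = new Regex(@"\d+", RegexOptions.Compiled);

    // Hypothetical helper: re-formats each digit run with the culture's group separator.
    public static string GroupNumbers(string input, CultureInfo culture)
    {
        return DigitRun.Replace(input, m =>
        {
            // Runs with leading zeros are left untouched, because formatting would drop the zeros.
            if (m.Value.Length > 1 && m.Value[0] == '0') return m.Value;
            // Runs that do not fit in a ulong (or are not ASCII digits) are left untouched too.
            return ulong.TryParse(m.Value, NumberStyles.None, CultureInfo.InvariantCulture, out ulong n)
                ? n.ToString("N0", culture)
                : m.Value;
        });
    }
}
```

Passing CultureInfo.CurrentCulture gives the same behaviour as the regex above for ordinary integers; passing a fixed culture is handy for tests.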

Related

Remove non numeric characters from string and cast numbers as an int into an array

I have a string: "ABD1254AGSHF56984,5845fhfhjekf!54685" and I want to loop through each character in this string, find any non-numeric character and remove the characters up to the next numeric character. In other words, remove all the non-numeric characters from my string, but treat them as an indicator that the numbers are separate.
Output as an integer array:
1254
56984
5845
54685
These should be put into an array and converted as integers.
My attempt below but this just puts all the numbers as one rather than splitting them up based on the non numeric characters:
var input = "ABD1254AGSHF56984,5845fhfhjekf!54685";
var numeric = new String(input.Where(char.IsDigit).ToArray());
//This is the output of my attempt: 125456984584554685
You can use Regex to split up your numbers and letters:
string line = "ABD1254AGSHF56984,5845fhfhjekf!54685";
// This collects every run of letters into a collection of matches.
var words = Regex.Matches(line, @"[a-zA-Z]+");
// This gives you a collection of numbers...
var numbers = Regex.Matches(line, @"[0-9]+");
foreach (Match number in numbers)
{
Console.WriteLine(number.Value);
}
// prints
1254
56984
5845
54685
It's worth reading the Regex documentation before implementing this, for a better understanding.
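The question asks for an int array rather than printed values. One way to get it from the same [0-9]+ matches is a short LINQ projection; a sketch, where `NumberExtraction` and `ExtractNumbers` are hypothetical names and int.Parse will throw if a run exceeds int.MaxValue (switch to long if that can happen):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtraction
{
    // Hypothetical helper: every run of ASCII digits becomes one int.
    public static int[] ExtractNumbers(string line)
    {
        return Regex.Matches(line, "[0-9]+")
            .Cast<Match>()                    // MatchCollection is non-generic on .NET Framework
            .Select(m => int.Parse(m.Value))  // "1254" -> 1254, etc.
            .ToArray();
    }
}
```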
If you can use the .NET Framework, there's a great class called System.Text.RegularExpressions.Regex that can be your new best friend.
var input = "ABD1254AGSHF56984,5845fhfhjekf!54685";
var numeric = Regex.Replace(input, @"[^\d]+", "\n").Trim();
The method above looks for one or more non-digit characters and replaces each such run with a newline (\n). The Trim() removes any leading or trailing newlines.
Output:
1254
56984
5845
54685
There is no need for black magic to solve such a problem. You can solve it with a for loop. It's been there since the beginning of programming and is designed for exactly this kind of thing ;)
Check IF the character is a digit; if so, start collecting the digits.
ELSE you have hit a boundary, so convert the collected digits to a single int and clear your temporary digit storage.
The only tricky part is a number at the very end of the string: there is no non-numeric character after it to act as a boundary, so you need to check whether you have reached the final character.
var input = "ABD1254AGSHF56984,5845fhfhjekf!54685";
string separateNumber = "";
List<int> collection = new List<int>();
for (int i = 0; i < input.Length; i++)
{
if (Char.IsDigit(input[i]))
{
separateNumber += input[i];
if (i == input.Length -1) // ensures that the last number is caught
{
collection.Add(Convert.ToInt32(separateNumber));
}
}
else if (string.IsNullOrEmpty(separateNumber) == false)
{
collection.Add(Convert.ToInt32(separateNumber));
separateNumber = "";
}
}
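The same loop can avoid building a temporary string for each number by accumulating the value arithmetically (value * 10 + digit). A sketch assuming ASCII digits and values that fit in an int (there is no overflow check); `DigitLoop.Collect` is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class DigitLoop
{
    // Hypothetical helper: one pass, no temporary strings.
    public static List<int> Collect(string input)
    {
        var collection = new List<int>();
        int value = 0;
        bool inNumber = false;
        foreach (char c in input)
        {
            if (c >= '0' && c <= '9')
            {
                value = value * 10 + (c - '0'); // append this digit to the current number
                inNumber = true;
            }
            else if (inNumber) // a non-digit ends the number
            {
                collection.Add(value);
                value = 0;
                inNumber = false;
            }
        }
        if (inNumber) collection.Add(value); // number at the very end of the string
        return collection;
    }
}
```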
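You're almost there; all you need is to not pass the results of your LINQ query to a new String() instance:
var numeric = input.Where(char.IsDigit).ToArray();
Note that this gives a char[] of single digits ('1', '2', '5', '4', ...), so the digits still have to be grouped into separate numbers at the non-digit boundaries.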

Split string and string arrays

string s = "abc**xy**efg**xy**ijk123**xy**lmxno**xy**opq**xy**rstz";
I want the output as a string array, split at "xy". I used
string[] lines = Regex.Split(s, "xy");
but this removes the xy. I want the array to keep the xy, so after I split the string, the array should be as below:
lines[0] = "abc";
lines[1] = "xyefg";
lines[2] = "xyijk123";
lines[3] = "xylmxno";
lines[4] = "xyopq";
lines[5] = "xyrstz";
How can I do this?
(?=xy)
You need to split on a zero-width assertion (here a lookahead), so the "xy" itself is not consumed. See demo:
https://regex101.com/r/fM9lY3/50
string strRegex = @"(?=xy)";
Regex myRegex = new Regex(strRegex, RegexOptions.None);
string strTargetString = @"abcxyefgxyijk123xylmxnoxyopqxyrstz";
return myRegex.Split(strTargetString);
Output:
abc
xyefg
xyijk123
xylmxno
xyopq
xyrstz
It seems fairly simple to do this:
string s = "abc**xy**efg**xy**ijk123**xy**lmxno**xy**opq**xy**rstz";
string[] lines = Regex.Split(s, "xy");
lines = lines.Take(1).Concat(lines.Skip(1).Select(l => "xy" + l)).ToArray();
This gives: abc**, xy**efg**, xy**ijk123**, xy**lmxno**, xy**opq**, xy**rstz.
I don't know if you wanted to keep the **; your question doesn't make it clear. Changing the regex to @"\*\*xy\*\*" will remove the **.
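If you're not married to Regex, you could write your own extension method (it has to live in a non-generic static class and needs using System and System.Collections.Generic):
public static class StringExtensions
{
public static IEnumerable<string> Ssplit(this string InputString, string Delimiter)
{
// ordinal search, so the match is not culture-sensitive
int idx = InputString.IndexOf(Delimiter, StringComparison.Ordinal);
while (idx != -1)
{
yield return InputString.Substring(0, idx);
InputString = InputString.Substring(idx); // the rest now starts with the delimiter
idx = InputString.IndexOf(Delimiter, Delimiter.Length, StringComparison.Ordinal); // look past it
}
yield return InputString;
}
}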
Usage:
string s = "abc**xy**efg**xy**ijk123**xy**lmxno**xy**opq**xy**rstz";
var x = s.Ssplit("xy");
How about simply looping through the array starting at index 1 and adding the "xy" string to each entry?
Alternatively, implement your own version of Split that cuts the string the way you want it.
Yet another solution would be to match each "xy" together with the text up to the next "xy" (non-greedily) and use the list of all matches as your array. Depending on the language, this probably won't be called split, by the way.
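The matching idea can be sketched in C# as follows. The pattern is an assumption of this sketch: each piece starts at an "xy" or at the beginning of the string, then lazily takes everything up to the next "xy" or the end. `KeepDelimiterSplit` is a hypothetical name:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepDelimiterSplit
{
    // (?:xy|^)  : a piece starts at an "xy" or at the beginning of the string
    // .*?       : then as few characters as possible...
    // (?=xy|$)  : ...up to (but not including) the next "xy" or the end
    private static readonly Regex Piece = new Regex(@"(?:xy|^).*?(?=xy|$)");

    public static string[] Split(string s)
    {
        return Piece.Matches(s).Cast<Match>().Select(m => m.Value).ToArray();
    }
}
```

Trying "xy" before "^" means a string that starts with "xy" yields "xy..." as its first piece rather than an empty match.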

How to extract specific number in a string surrounded by numbers and text C#

I am trying to extract a specific number from a string with the format "Q23-00000012-A14". I only want the 8-digit number (00000000 format), i.e. the 12.
string rx = "Q23-00000012-A14";
string numb = Regex.Replace(rx, @"\D", "");
txtResult.Text = numb;
But I'm getting 230000001214; I only want the 12 and want to disregard the rest. Can someone guide me?
If your strings are always in this format (the numbers are separated by "-"), I suggest using string.Split():
string rx = "Q23-00000012-A14";
string numb = int.Parse(rx.Split('-')[1]).ToString(); // this gets 12 for you
txtResult.Text = numb;
It's easier than using a regex.
Edit: rx.Split('-') breaks the string into an array containing the pieces of text before and after each '-'.
So in this case:
rx.Split('-')[0] = "Q23"
rx.Split('-')[1] = "00000012"
rx.Split('-')[2] = "A14"
So you shouldn't use Replace. Use Match instead.
string pattern = @"[A-Z]\d+-(\d+)-[A-Z]\d+";
var regex = new Regex(pattern);
var match = regex.Match("Q23-00000012-A14");
if (match.Success)
{
String eightNumberString = match.Groups[1].Value; // Contains "00000012"
int yourvalueAsInt = Convert.ToInt32(eightNumberString) ; // Contains 12
}
Why don't you simply use the Substring or Split function?
string rx = "Q23-00000012-A14";
// Substring: the eight digits start at index 4, right after "Q23-"
int numb = int.Parse(rx.Substring(4, 8));
// or Split
int numb2 = int.Parse(rx.Split('-')[1]);
txtResult.Text = numb.ToString();
(I think Split is the better option, because it still works if the length of the "Q23" prefix changes.)
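If the wanted number is always exactly eight digits between two hyphens (an assumption based on the single example in the question), a lookaround pattern can pick it out without a capture group, and int.TryParse avoids an exception when the input doesn't match. `CodeParser.TryGetMiddleNumber` is a hypothetical name:

```csharp
using System.Text.RegularExpressions;

public static class CodeParser
{
    // (?<=-)\d{8}(?=-) : eight digits preceded and followed by "-"; the hyphens are not part of the match
    private static readonly Regex Middle = new Regex(@"(?<=-)\d{8}(?=-)");

    public static bool TryGetMiddleNumber(string code, out int value)
    {
        Match m = Middle.Match(code);
        value = 0;
        return m.Success && int.TryParse(m.Value, out value); // "00000012" -> 12
    }
}
```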

Search string pattern

If I have a string like MCCORMIC 3H R Final 08-26-2011.dwg or even MCCORMIC SMITH 2N L Final 08-26-2011.dwg and I want to capture the R in the first string or the L in the second string in a variable, what is the best method for doing so? I was thinking about trying the code below, but it does not work.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string WhichArea = "";
int WhichIndex = 0;
WhichIndex = filename.IndexOf("Final");
WhichArea = filename.Substring(WhichIndex - 1,1); //Trying to get the R in front of word Final
Just split by space:
var parts = filename.Split(new [] {' '},
StringSplitOptions.RemoveEmptyEntries);
WhichArea = parts[parts.Length - 3];
It looks like the file names have a very specific format, so this will work just fine.
Even with any number of spaces, using StringSplitOptions.RemoveEmptyEntries means spaces will not be part of the split result set.
Code updated to deal with both examples - thanks Nikola.
I had to do something similar, but with MicroStation drawings instead of AutoCAD. I used a regex in my case. Here's what I did, in case you feel like making it more complex.
string filename = "MCCORMIC 3H R Final 08-26-2011.dwg";
string filename2 = "MCCORMIC SMITH 2N L Final 08-26-2011.dwg";
Console.WriteLine(TheMatch(filename));
Console.WriteLine(TheMatch(filename2));
public string TheMatch(string filename) {
Regex reg = new Regex(@"[A-Za-z0-9]*\s*([A-Z])\s*Final .*\.dwg");
Match match = reg.Match(filename);
if(match.Success) {
return match.Groups[1].Value;
}
return String.Empty;
}
I don't think Oded's answer covers all cases. The first example has two words before the wanted letter, and the second one has three words before it.
My opinion is that the best way to get this letter is by using RegEx, assuming that the word Final always comes after the letter itself, separated by any number of spaces.
Here's the RegEx code:
using System.Text.RegularExpressions;
private string GetLetter(string fileName)
{
string pattern = @"\S(?=\s*?Final)"; // verbatim string: "\S" is not a valid escape in a regular C# string
Match match = Regex.Match(fileName, pattern);
return match.Value;
}
And here's the explanation of RegEx pattern:
\S(?=\s*?Final)
\S // Anything other than whitespace
(?=\s*?Final) // Positive look-ahead
\s*? // Whitespace, unlimited number of repetitions, as few as possible.
Final // Exact text.
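For completeness, the original IndexOf attempt was close: IndexOf("Final") - 1 points at the space in front of "Final", not at the letter. A sketch that trims that whitespace and returns the last space-separated token before "Final", assuming the word occurs only once (`AreaLetter.GetArea` is a hypothetical name):

```csharp
using System;

public static class AreaLetter
{
    // Returns the space-separated token just before "Final", or null if "Final" is absent.
    public static string GetArea(string filename)
    {
        int finalIdx = filename.IndexOf("Final", StringComparison.Ordinal);
        if (finalIdx < 0) return null;
        string before = filename.Substring(0, finalIdx).TrimEnd(); // drop the space(s) before "Final"
        return before.Substring(before.LastIndexOf(' ') + 1);     // last token of what remains
    }
}
```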

Go to each white space in a string. C#

Is it possible to pass over a string, finding the white spaces?
For example a data set of:
string myString = "aa bbb cccc dd";
How could I loop through and detect each white space, and manipulate that space?
I need to do this in the most efficient way possible.
Thanks.
UPDATE:
I need to manipulate each space by multiplying it by an integer value. For instance, turn each space into 3 spaces instead of one. I'd like to go through every space in one loop; is there any method for doing this already in .NET? By white space I mean a ' '.
You can use the Regex.Replace method. This replaces each group of white-space characters with a dash:
myString = Regex.Replace(myString, @"(\s+)", m => "-");
Update:
This finds groups of space characters and replaces each with three times as many spaces:
myString = Regex.Replace(
myString,
"( +)",
m => new String(' ', m.Groups[1].Value.Length * 3)
);
However, that's a bit too simple a job for regular expressions. You can do the same with a plain Replace:
myString = myString.Replace(" ", "   "); // one space -> three spaces
This replaces each individual space instead of each group of spaces, which gives the same result here (a run of n spaces becomes 3n spaces either way). A plain Replace is much simpler than Regex.Replace, so it should be at least as fast, and the code is simpler.
If you want to replace all spaces in one swoop, you can do:
// changes all spaces to dashes (strings are immutable, so keep the returned string)
myString = myString.Replace(' ', '-');
If you want to go case by case (that is, not just a mass replace), you can loop with IndexOf():
int pos = myString.IndexOf(' ');
while (pos >= 0)
{
// do whatever you want with myString at pos
// find next
pos = myString.IndexOf(' ', pos + 1);
}
UPDATE
As per your update, you could replace single spaces with the number of spaces specified by a variable (such as numSpaces) as follows:
myString = myString.Replace(" ", new String(' ', numSpaces));
If you just want to replace all spaces with some other character:
myString = myString.Replace(' ', 'x');
If you need the possibility of doing something different to each:
foreach(char c in myString)
{
if (c == ' ')
{
// do something
}
}
Edit:
Per your comment clarifying your question:
To change each space to three spaces, you can do this:
myString = myString.Replace(" ", "   "); // one space -> three spaces
However note that this doesn't take into account instances where your input string already has two or more spaces. If that is a possibility you will want to use a regex.
Depending on what you're trying to do:
for (int k = 0; k < myString.Length; k++)
{
if (char.IsWhiteSpace(myString[k]))
{
// do something with it
}
}
The above is a single pass through the string, so it's O(n). You can't really get more efficient than that.
However, if you want to manipulate the original string, your best bet is to use a StringBuilder to process the changes:
StringBuilder sb = new StringBuilder(myString);
for (int k = 0; k < myString.Length; k++)
{
if (char.IsWhiteSpace(myString[k]))
{
// do something with sb
}
}
Finally, don't forget about regular expressions. They may not always be the most efficient method in terms of run time, but in terms of coding effort they may be a good trade-off.
For instance, here's a way to match all runs of white space:
var rex = new System.Text.RegularExpressions.Regex(@"\s+");
var m = rex.Match(myString);
while (m.Success)
{
// process the match here..
m = m.NextMatch(); // NextMatch returns a new Match; it does not advance m in place
}
And here's a way to replace all white spaces with an arbitrary string:
var rex = new System.Text.RegularExpressions.Regex("\\s+");
String replacement = "[white_space]";
// replaces all occurrences of white space with the string [white_space]
String result = rex.Replace(myString, replacement);
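For the specific goal in the update (turning every space into a fixed number of spaces), one option is to append into an empty StringBuilder instead of editing one initialised from myString, because inserting into the latter shifts the positions of everything after the insertion point. A sketch; `SpaceExpander.Expand` and its `factor` parameter are hypothetical names, and only ' ' counts as a space, as the question specifies:

```csharp
using System.Text;

public static class SpaceExpander
{
    // Copies input in one pass, writing `factor` spaces for every ' '.
    public static string Expand(string input, int factor)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == ' ') sb.Append(' ', factor); // StringBuilder.Append(char, repeatCount)
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, SpaceExpander.Expand("aa bbb cccc dd", 3) turns each single space into three.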
Use string.Replace().
string newString = myString.Replace(" ", "   "); // one space -> three spaces
The LINQ query below returns a set of anonymous-type items with two properties: "symbol", the white-space character, and "index", its position in the input sequence. With every white-space character and its position in hand, you can do whatever you want with them.
string myString = "aa bbb cccc dd";
var res = myString.Select((c, i) => new { symbol = c, index = i })
.Where(c => Char.IsWhiteSpace(c.symbol));
EDIT: For educational purposes, below is the implementation you are looking for, but in a real system you would obviously use the built-in string constructor and String.Replace(), as shown in other answers.
string myString = "aa bbb cccc dd";
var result = this.GetCharacters(myString, 5);
string output = new string(result.ToArray());
public IEnumerable<char> GetCharacters(string input, int coeff)
{
foreach (char c in input)
{
if (Char.IsWhiteSpace(c))
{
int counter = coeff;
while (counter-- > 0)
{
yield return c;
}
}
else
{
yield return c;
}
}
}
var result = new StringBuilder();
foreach(Char c in myString)
{
if (Char.IsWhiteSpace(c))
{
// you can do what you wish here. strings are immutable, so you can only make a copy with the results you want... hence the "result" var.
result.Append('_'); // for example, replace space with _
}
else result.Append(c);
}
myString = result.ToString();
If you want to replace the white space with, e.g., '_', you can use String.Replace.
Example:
string myString = "aa bbb cccc dd";
string newString = myString.Replace(" ", "_"); // gives aa_bbb_cccc_dd
In case you want to left-justify each word in a column N characters wide (use PadLeft instead to right-justify):
int N = 10;
string newstring = String.Join(
"",
myString.Split(' ').Select(s => s.PadRight(N)));
