I have a string like this:
12,3 m
and I need two substrings, one decimal value and one unit, like 12,3 and m.
12.3 m will return 12.3 and m
123,4 c will return 123,4 and c
The decimal separator can be . or ,
So how can I get this in C# without iterating through every character, as below?
char c;
for (int i = 0; i < Word.Length; i++)
{
c = Word[i];
if (Char.IsDigit(c))
string1 += c;
else
string2 += c;
}
The input string does not have to be formatted like this; it can also be like A12,3 m or ABC3.45 or 4.5 DEF, etc. So string.Split is not always reliable.
Looks like you are trying to split based on the whitespace character:
string input = "12.3 c";
string[] stringArray = input.Split(' ');
You can then do a float.Parse operation on the first element of the array. The decimal separator used by float.Parse would depend on your culture and if the wrong one is chosen you could get a FormatException.
You can also choose the decimal separator programmatically, as below:
culture.NumberFormat.NumberDecimalSeparator = "."; // or ","
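As a rough sketch of how the two pieces above could fit together: the snippet clones the invariant culture, sets the separator on the clone, and passes it to float.Parse. The class name DecimalSeparatorDemo and the helper ParseWith are made up for illustration, and it assumes the caller already knows which separator the input uses.

```csharp
using System;
using System.Globalization;

static class DecimalSeparatorDemo
{
    // Hypothetical helper: parses text using an explicitly chosen decimal separator ("." or ",").
    public static float ParseWith(string text, string decimalSeparator)
    {
        // Clone the invariant culture, because the shared instance is read-only.
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberDecimalSeparator = decimalSeparator;
        // NumberStyles.Float does not allow group separators, so only the decimal separator matters.
        return float.Parse(text, NumberStyles.Float, culture);
    }

    static void Main()
    {
        string[] parts = "12,3 m".Split(' ');
        Console.WriteLine(ParseWith(parts[0], ",")); // the numeric part
        Console.WriteLine(parts[1]);                 // the unit
    }
}
```

Passing the culture explicitly means the result no longer depends on the machine's regional settings, which avoids the FormatException mentioned above when the separator and the culture disagree.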
Checking your provided examples { "12,3 m", "A12,3 m", "ABC3.45", "4.5 DEF"}, it seems that not only can the position of the string change, but there can also be two strings and one decimal in your input strings.
This solution will show you how to extract these data with only one regex and no manual string split. I will incorporate the CultureInfo from user1666620:
string[] inputStrings = new string[] { "12,3 m", "A12,3 m", "ABC3.45", "4.5 DEF"};
Regex splitterRx = new Regex("([a-zA-Z]*)\\s*([\\d\\.,]+)\\s*([a-zA-Z]*)");
List<Tuple<string, decimal, string>> results = new List<Tuple<string, decimal, string>>();
foreach (var str in inputStrings)
{
var splitterM = splitterRx.Match(str);
if (splitterM.Success)
{
results.Add(new Tuple<string, decimal, string>(
splitterM.Groups[1].Value,
decimal.Parse(
splitterM.Groups[2].Value.Replace(".", System.Globalization.NumberFormatInfo.CurrentInfo.NumberDecimalSeparator).Replace(
",", System.Globalization.NumberFormatInfo.CurrentInfo.NumberDecimalSeparator)
),
splitterM.Groups[3].Value
));
}
}
This will find all possible combinations of a present/not present string in pre/post position, so be sure to check the individual strings or apply any combining logic to them.
Also, it doesn't only check for a single space between the decimal and the strings but for any amount of whitespace. If you want to limit it to definitely only 0 or 1 space, replace the Regex with this:
Regex splitterRx = new Regex("([a-zA-Z]*)[ ]{0,1}([\\d\\.,]+)[ ]{0,1}([a-zA-Z]*)");
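If you would rather not depend on the current culture at all, one possible variant is to normalize the separator to "." and parse with the invariant culture. This is only a sketch: MeasurementParser and TryParse are names invented here, and the number pattern is tightened to digits with at most one separator, which is an assumption about your data (it would not accept thousands separators).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class MeasurementParser
{
    // Optional leading letters, a number with at most one '.' or ',', optional trailing letters.
    private static readonly Regex Pattern =
        new Regex(@"([a-zA-Z]*)\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]*)");

    // Hypothetical helper: splits input into prefix text, decimal value and suffix text.
    public static bool TryParse(string input, out string prefix, out decimal value, out string suffix)
    {
        prefix = string.Empty;
        suffix = string.Empty;
        value = 0m;

        Match m = Pattern.Match(input);
        if (!m.Success) return false;

        prefix = m.Groups[1].Value;
        suffix = m.Groups[3].Value;

        // Normalize ',' to '.' so the invariant culture can parse either form.
        string normalized = m.Groups[2].Value.Replace(',', '.');
        return decimal.TryParse(normalized, NumberStyles.AllowDecimalPoint,
                                CultureInfo.InvariantCulture, out value);
    }

    static void Main()
    {
        foreach (string s in new[] { "12,3 m", "A12,3 m", "ABC3.45", "4.5 DEF" })
        {
            string pre, post;
            decimal number;
            if (TryParse(s, out pre, out number, out post))
                Console.WriteLine("[{0}] [{1}] [{2}]", pre,
                    number.ToString(CultureInfo.InvariantCulture), post);
        }
    }
}
```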
I have a problem with c# string manipulation and I'd appreciate your help.
I have a file that contains many lines. It looks like this:
firstWord number(secondWord) thirdWord(Phrase) Date1 Date2
firstWord number(secondWord) thirdWord(Phrase) Date1 Time1
...
I need to separate these words and put them into the properties of a class. As you can see, the first problem is that the spacing between words is not consistent: sometimes there is one space, sometimes eight. The second problem is that the third position holds a phrase of 2 to 5 words (again divided by spaces, or sometimes connected with _ or -), and it needs to be treated as one string, because it has to be one class member. The class should look like this:
class A
string a = firstWord;
int b = number;
string c = phrase;
Date d = Date1;
Time e = Time1;
I'd appreciate if you had any ideas how to solve this. Thank you.
Use the following steps:
Use File.ReadAllLines() to get a string[], where each element represents one line of the file.
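For each line, use string.Split() with StringSplitOptions.RemoveEmptyEntries and chop your line into individual words. Use both space and parentheses as your delimiters; removing empty entries matters because the number of spaces between words varies. This will give you an array of words. Call it arr.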
Now create an object of your class and assign like this:
string a = arr[0];
int b = int.Parse(arr[1]);
string c = string.Join(" ", arr.Skip(4).Take(arr.Length - 6));
DateTime d = DateTime.Parse(arr[arr.Length - 2]);
DateTime e = DateTime.Parse(arr[arr.Length - 1]);
The only tricky stuff is string c above. Logic here is that from element no. 4 up to the 3rd last element, all of these elements form your phrase part, so we use linq to extract those elements and join them together to get back your phrase. This would obviously require that the phrase itself doesn't contain any parentheses itself, but that shouldn't normally be the case I assume.
You need a loop and string- and TryParse-methods:
var list = new List<ClassName>();
foreach (string line in File.ReadLines(path).Where(l => !string.IsNullOrEmpty(l)))
{
string[] fields = line.Trim().Split(new char[] { }, StringSplitOptions.RemoveEmptyEntries);
if (fields.Length < 5) continue;
var obj = new ClassName();
list.Add(obj);
obj.FirstWord = fields[0];
int number;
int index = fields[1].IndexOf('(');
if (index > 0 && int.TryParse(fields[1].Remove(index), out number))
obj.Number = number;
int phraseStartIndex = fields[2].IndexOf('(');
int phraseEndIndex = fields[2].LastIndexOf(')');
if (phraseStartIndex != phraseEndIndex)
{
obj.Phrase = fields[2].Substring(++phraseStartIndex, phraseEndIndex - phraseStartIndex);
}
DateTime dt1;
if(DateTime.TryParse(fields[3], out dt1))
obj.Date1 = dt1;
DateTime dt2;
if (DateTime.TryParse(fields[4], out dt2))
obj.Date2 = dt2;
}
The following regular expression seems to cover what I imagine you would need - at least a good start.
^(?<firstWord>[\w\s]*)\s+(?<secondWord>\d+)\s+(?<thirdWord>[\w\s_-]+)\s+(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})$
This captures 5 named groups
firstWord is any alphanumeric or whitespace
secondWord is any numeric entry
thirdWord any alphanumeric, space underscore or hyphen
date is any iso formatted date (date not validated)
time any time (time not validated)
Any amount of whitespace is used as the delimiter - but you will have to Trim() any group captures. It makes a hell of a lot of assumptions about your format (dates are ISO formatted, times are hh:mm:ss).
You could use it like this:
Regex regex = new Regex( @"(?<firstWord>[\w\s]*)\s+(?<secondWord>\d+)\s+(?<thirdWord>[\w\s_-]+)\s+(?<date>\d{4}-\d{2}-\d{2})\s+(?<time>\d{2}:\d{2}:\d{2})$", RegexOptions.IgnoreCase );
var match = regex.Match("this is the first word 123 hello_world 2017-01-01 10:00:00");
if(match.Success){
Console.WriteLine("{0}\r\n{1}\r\n{2}\r\n{3}\r\n{4}",match.Groups["firstWord"].Value.Trim(),match.Groups["secondWord"].Value,match.Groups["thirdWord"].Value,match.Groups["date"].Value,match.Groups["time"].Value);
}
http://rextester.com/LGM52187
You can use Regex for this; you may have a look here as a starting point. For example, to get the first word you could use this:
string data = "Example 2323 Second This is a Phrase 2017-01-01 2019-01-03";
string firstword = new Regex(@"\b[A-Za-z]+\b").Matches(data)[0].Value;
I have the following text in an Excel spreadsheet cell:
"Calories (kcal) "
(minus quotes).
I can get the value of the cell into my code:
string nutrientLabel = dataRow[0].ToString().Trim();
I'm new to C# and need help separating the "Calories" and "(kcal)" into two different variables that I can upload into my online system. I need the result to be two strings:
nutrientLabel = Calories
nutrientUOM = kcal
I've googled the hell out of this and found out how to make it work to separate them and display into Console.WriteLine but I need the values actually out to 2 variables.
foreach (DataRow dataRow in nutrientsdataTable.Rows)
{
string nutrientLabel = dataRow[0].ToString().Trim();
}
char[] paraSeparator = new char[] { '(', ')' };
string[] result;
Console.WriteLine("=======================================");
Console.WriteLine("Para separated strings :\n");
result = nutrientLabel.Split(paraSeparator,
StringSplitOptions.RemoveEmptyEntries);
foreach (string str in result)
{
Console.WriteLine(str);
}
You can use a simple regex for this:
var reg = new Regex(@"(?<calories>\d+)\s\((?<kcal>\d+)\)");
Which essentially says:
Match at least one number and store it in the group 'calories'
Match a space and an opening parenthesis
Match at least one number and store it in the group 'kcal'
Match a closing parenthesis
Then we can extract the results using the named groups:
var sampleInput = "15 (35)";
var match = reg.Match(sampleInput);
var calories = match.Groups["calories"].Value;
var kcal = match.Groups["kcal"].Value;
Note that calories and kcal are still strings here; you'll need to parse them into an integer (or decimal).
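The pattern above matches digits only, while the cell in the question holds text such as "Calories (kcal) ". A possible adaptation for a text label followed by a unit in parentheses is sketched below. NutrientLabelParser and TrySplit are invented names, and the sketch assumes each cell contains exactly one pair of parentheses, at the end.

```csharp
using System;
using System.Text.RegularExpressions;

static class NutrientLabelParser
{
    // Label text, optional whitespace, then the unit inside one pair of parentheses.
    private static readonly Regex Pattern =
        new Regex(@"^\s*(?<label>[^()]+?)\s*\((?<uom>[^()]*)\)\s*$");

    // Hypothetical helper: returns false when the cell does not look like "Label (unit)".
    public static bool TrySplit(string cell, out string label, out string uom)
    {
        Match m = Pattern.Match(cell);
        label = m.Success ? m.Groups["label"].Value : string.Empty;
        uom = m.Success ? m.Groups["uom"].Value.Trim() : string.Empty;
        return m.Success;
    }

    static void Main()
    {
        string nutrientLabel, nutrientUOM;
        if (TrySplit("Calories (kcal) ", out nutrientLabel, out nutrientUOM))
        {
            Console.WriteLine(nutrientLabel);
            Console.WriteLine(nutrientUOM);
        }
    }
}
```

Both values end up in ordinary string variables, so they can be passed on to whatever does the upload instead of only being written to the console.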
string [] s = dataRow[0].ToString().Split(' ');
nutrientLabel = s[0];
nutrientUOM = s[1].Replace(")","").Replace("(","");
I found it inefficient to iterate through the parts of a string split on the space character, extract the numeric parts, apply
UInt64.Parse(Regex.Match(numericPart, @"\d+").Value)
and then concatenate them back together to form the string with the numbers grouped.
Is there a better, more efficient way to apply 3-digit grouping to all numbers in a string that also contains other characters?
I am pretty sure the most efficient way (CPU-wise, with just a single pass over the string) is the basic foreach loop, along these lines
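var sb = new StringBuilder();
int digitCount = 0; // digits seen so far in the current run
foreach (char c in inputString)
{
    if (char.IsDigit(c))
    {
        // after every three digits of a run, insert a "." before the next digit
        if (digitCount > 0 && digitCount % 3 == 0)
            sb.Append('.');
        digitCount++;
    }
    else
        digitCount = 0; // a non-digit ends the run
    sb.Append(c);
}
return sb.ToString();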
This will produce 123.456.7
If you want 1.234.567 you'll need an additional buffer for digit-sequences
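A minimal sketch of that buffered variant, assuming "." as the separator and treating every run of digits as one number (so digits after a decimal point would be grouped as well). DigitGrouper, Group and Flush are names made up for this example.

```csharp
using System;
using System.Text;

static class DigitGrouper
{
    // Groups every run of digits in threes, counted from the right, in a single pass.
    public static string Group(string input, char separator = '.')
    {
        var result = new StringBuilder(input.Length + input.Length / 3);
        var digits = new StringBuilder(); // buffer for the current run of digits

        foreach (char c in input)
        {
            if (c >= '0' && c <= '9')
            {
                digits.Append(c);
            }
            else
            {
                Flush(digits, result, separator);
                result.Append(c);
            }
        }
        Flush(digits, result, separator); // the input may end with digits
        return result.ToString();
    }

    // Writes the buffered digits to result with separators, then empties the buffer.
    private static void Flush(StringBuilder digits, StringBuilder result, char separator)
    {
        for (int i = 0; i < digits.Length; i++)
        {
            // A separator goes wherever the count of remaining digits is a multiple of three.
            if (i > 0 && (digits.Length - i) % 3 == 0)
                result.Append(separator);
            result.Append(digits[i]);
        }
        digits.Clear();
    }

    static void Main()
    {
        Console.WriteLine(Group("1234567"));
        Console.WriteLine(Group("hello 134443 in the 33 when 88763 then", ','));
    }
}
```

The buffer is needed because the position of the first separator depends on the total length of the run, which is not known until the run ends.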
So you want to replace all longs in a string with the same long, but formatted with the number group separator of the current culture? ... Yes
string[] words = input.Split();
var newWords = words.Select(w =>
{
long l;
bool isLong = System.Int64.TryParse(w.Trim(), out l);
if(isLong)
return l.ToString("N0");
else
return w;
});
string result = string.Join(" ", newWords);
With the input from your comment:
string input = "hello 134443 in the 33 when 88763 then";
You get the expected result: "hello 134,443 in the 33 when 88,763 then", if your current culture uses comma as number-group-separator.
I will post my regex-based example. I believe regex does not have to be too slow, especially once it is compiled and is declared with static and readonly.
// Declare the regex
private static readonly Regex regex = new Regex(@"(\d)(?=(\d{3})+(?!\d))", RegexOptions.Compiled);
// Then, somewhere inside a method
var replacement = string.Format("$1{0}", System.Globalization.CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator); // Get the system digit grouping separator
var strn = "Hello 34234456 where 3334 is it?"; // Just a sample string
// Somewhere (?:inside a loop)?
var res = regex.Replace(strn, replacement);
Output (if , is a system digit grouping separator):
Hello 34,234,456 where 3,334 is it?
I have a string which gives a measurement followed by the units in either cm, m or inches.
For example :
The number could be 112cm, 1.12m, 45inches or 45in.
I would like to extract only the number part of the string. Any idea how to use the units as the delimiters to extract the number ?
While I am at it, I would like to ignore the case of the units.
Thanks
You can try:
string numberMatch = Regex.Match(measurement, @"\d+\.?\d*").Value;
EDIT
Furthermore, converting this to a double is trivial:
double result;
if (double.TryParse(numberMatch, out result))
{
// Yeiiii I've got myself a double ...
}
Use String.Split http://msdn.microsoft.com/en-us/library/tabh47cf.aspx
Something like:
var units = new[] {"cm", "inches", "in", "m"};
var splitnumber = mynumberstring.Split(units, StringSplitOptions.RemoveEmptyEntries);
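// Convert.ToInt32 would throw on inputs such as "1.12", so convert to double instead
var number = Convert.ToDouble(splitnumber[0], System.Globalization.CultureInfo.InvariantCulture);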
Using Regex this can help you out:
(?i)(\d+(?:\.\d+)?)(?=c?m|in(?:ch(?:es)?)?)
Break up:
(?i) = ignores character case // specify it in C#; the live demo does not support it
\d+(\.\d+)? = supports numbers like 2, 2.25 etc
(?=c?m|in(ch(es)?)?) = positive lookahead; the number only matches if it is followed by one of the units
m, cm, in, inch, inches
?: = specifies that the group will not capture
? = specifies that the preceding character or group is optional
EDIT
Sample code:
MatchCollection mcol = Regex.Matches(sampleStr, @"(?i)(\d+(?:\.\d+)?)(?=c?m|in(?:ch(?:es)?)?)");
foreach(Match m in mcol)
{
Debug.Print(m.ToString()); // see output window
}
I guess I'd try to replace with "" every character that is not number or ".":
//s is the string you need to convert
string tmp=s;
foreach (char c in s.ToCharArray())
{
if (!(c >= '0' && c <= '9') && !(c =='.'))
tmp = tmp.Replace(c.ToString(), "");
}
s=tmp;
Try using the regular expression \d+ to find an integer number.
resultString = Regex.Match(measurementunit, @"\d+").Value;
Is it a requirement that you use the unit as the delimiter? If not, you could extract the number using regex (see Find and extract a number from a string).
I have a list of string
goal0=1234.4334abc12423423
goal1=-234234
asdfsdf
I want to extract the number part from each string that starts with goal.
In the above case that is
1234.4334, -234234
(if there are two digit fragments, take the first one).
How can I do this easily?
Note that "goal0=" is part of the string, goal0 is not a variable.
Therefore I would like to have the first digit fragment that come after "=".
You can do the following:
string input = "goal0=1234.4334abc12423423";
input = input.Substring(input.IndexOf('=') + 1);
IEnumerable<char> stringQuery2 = input.TakeWhile(c => Char.IsDigit(c) || c=='.' || c=='-');
string result = string.Empty;
foreach (char c in stringQuery2)
result += c;
double dResult = double.Parse(result);
Try this
string s = "goal0=-1234.4334abc12423423";
string matches = Regex.Match(s, @"(?<=^goal\d+=)-?\d+(\.\d+)?").Value;
The regex says
(?<=^goal\d+=) - A positive lookbehind, which means look back and make sure goal(1 or more digits)= is at the start of the string, but don't make it part of the match
-? - A minus sign which is optional (the ? means zero or one)
\d+ - One or more digits
(\.\d+)? - A decimal point followed by 1 or more digits which is optional
This also works if your string contains multiple decimal points: it takes only the first run of digits after the first decimal point, if there is one.
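To show the pattern applied to the whole list from the question, here is a small sketch. GoalExtractor and Extract are hypothetical names, and parsing with the invariant culture is an assumption that the values always use "." as the decimal point.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class GoalExtractor
{
    // Same pattern as above: the lookbehind anchors on "goal<digits>=" at the start of the string.
    private static readonly Regex Pattern = new Regex(@"(?<=^goal\d+=)-?\d+(\.\d+)?");

    // Returns the first number after '=' for every line that starts with goal<digits>=.
    public static List<double> Extract(IEnumerable<string> lines)
    {
        var values = new List<double>();
        foreach (string line in lines)
        {
            Match m = Pattern.Match(line);
            if (m.Success)
                values.Add(double.Parse(m.Value, CultureInfo.InvariantCulture));
        }
        return values;
    }

    static void Main()
    {
        var list = new List<string> { "goal0=1234.4334abc12423423", "goal1=-234234", "asdfsdf" };
        foreach (double d in Extract(list))
            Console.WriteLine(d.ToString(CultureInfo.InvariantCulture));
    }
}
```

Lines that do not start with goal, such as asdfsdf, simply produce no match and are skipped.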
Use a regex for extracting:
x = Regex.Match(input, @"\d+").Value;
Now convert the resulting string to the number by using:
finalNumber = Int32.Parse(x);
Please try this:
string sample = "goal0=1234.4334abc12423423goal1=-234234asdfsdf";
Regex test = new Regex(@"(?<=\=)\-?\d*(\.\d*)?", RegexOptions.Singleline);
MatchCollection matchlist = test.Matches(sample);
string[] result = new string[matchlist.Count];
if (matchlist.Count > 0)
{
for (int i = 0; i < matchlist.Count; i++)
result[i] = matchlist[i].Value;
}
Hope it helps.
I didn't get the question at first. Sorry, but it works now.
I think this simple expression should work:
Regex.Match(input, @"\d+")
You can use the old VB Val() function from C#. That will extract a number from the front of a string, and it's already available in the framework:
result0 = Microsoft.VisualBasic.Conversion.Val(goal0);
result1 = Microsoft.VisualBasic.Conversion.Val(goal1);
string s = "1234.4334abc12423423";
var result = System.Text.RegularExpressions.Regex.Match(s, @"-?\d+");
List<String> list = new List<String>();
list.Add("goal0=1234.4334abc12423423");
list.Add("goal1=-23423");
list.Add("asdfsdf");
Regex regex = new Regex(@"^goal\d+=(?<GoalNumber>-?\d+(?:\.\d+)?)");
foreach (string s in list)
{
if(regex.IsMatch(s))
{
string numberPart = regex.Match(s).Groups["GoalNumber"].Value;
// do something with numberPart
}
}