Split a string containing various spaces - c#

I have a txt file as follows and would like to split it into double arrays:
node Strain Axis Strain F P/S Sum Cur Moment
0 0.00000 0.00 0.0000 0 0 0 0 0.00
1 0.00041 -83.19 0.0002 2328 352 0 0 -0.80
2 0.00045 -56.91 0.0002 2329 352 0 0 1.45
3 0.00050 -42.09 0.0002 2327 353 0 0 -0.30
My goal is to have one array per column, i.e.
node[] = {0,1,2,3}, Axis[] = {0.00,-83.19,-56.91,-42.09}, ....
I know how to read the txt file and convert strings to double arrays. The problem is that the values are not separated by tabs but by varying numbers of spaces. I googled for a way to do this but couldn't find one; some posts discussed splitting on a constant number of spaces. If you know how to do it, or if there is an existing Q&A for this issue, please let me know. It will be greatly appreciated. Thanks,

A different way would be to use a regular expression, although it is overkill in this case and I would suggest you stick with the other answers here that use RemoveEmptyEntries:
string[] elements = Regex.Split(s, @"\s+");

StringSplitOptions.RemoveEmptyEntries should do the trick:
var items = source.Split(new [] { " " }, StringSplitOptions.RemoveEmptyEntries);
The return value does not include array elements that contain an empty string

var doubles = text.Split("\n\r".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Select(line => line.Split(new char[]{' '},StringSplitOptions.RemoveEmptyEntries)
.Select(x => double.Parse(x)).ToArray())
.ToArray();

Use the option StringSplitOptions.RemoveEmptyEntries to treat consecutive delimiters as one:
string[] parts = source.Split(' ',StringSplitOptions.RemoveEmptyEntries);
then parse from there:
double[] values = parts.Select(s => double.Parse(s)).ToArray();
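None of the snippets above goes as far as the per-column arrays the question asks for. Here is a minimal sketch of that last step, assuming the whole file is already in a string, the first line is the header, and every data row has the same number of fields. ColumnReader and ReadColumns are made-up names for illustration, and parsing with the invariant culture is my assumption about the file's number format.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ColumnReader
{
    // Hypothetical helper: returns one double[] per column of a whitespace-separated table.
    // Assumes the first line is a header and all data rows have the same field count.
    public static double[][] ReadColumns(string text)
    {
        // Split into lines, drop the header, then split each row on runs of spaces or tabs.
        double[][] rows = text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Skip(1)
            .Select(line => line
                .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(field => double.Parse(field, CultureInfo.InvariantCulture))
                .ToArray())
            .Where(row => row.Length > 0) // ignore whitespace-only lines
            .ToArray();

        if (rows.Length == 0)
            return new double[0][];

        // Transpose: column c collects the c-th value of every row.
        return Enumerable.Range(0, rows[0].Length)
            .Select(c => rows.Select(row => row[c]).ToArray())
            .ToArray();
    }
}
```

With the sample file, ColumnReader.ReadColumns(text)[0] would be the node column and index 2 the Axis column.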

Related

sorting on List<string> with middle 2 character

I would like to sort a list by the number in the middle of each string. For example, the list contains the following:
body1text
body2text
body11text
body3text
body12text
body13text
if I apply list.OrderBy(r => r.body), it will sort as follows:
body1text
body11text
body12text
body13text
body2text
body3text
But I need the following result:
body1text
body2text
body3text
body11text
body12text
body13text
Is there an easy way to sort by the digits in the middle?
Regards
Shuvra
The issue here is that your numbers are compared as strings, so string.Compare("11", "2") will return -1 meaning that "11" is less than "2". Assuming that your string is always in format "body" + n numbers + "text" you can match numbers with regex and parse an integer from result:
new[]
{
"body1text"
,"body2text"
,"body3text"
,"body11text"
,"body12text"
,"body13text"
}
.OrderBy(s => int.Parse(Regex.Match(s, @"\d+").Value))
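If some strings might not contain a number at all, int.Parse would throw on the empty match. Here is a slightly more defensive sketch of the same idea. MiddleNumberSort and SortByEmbeddedNumber are made-up names, and putting strings without digits at the end is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MiddleNumberSort
{
    // Hypothetical helper: orders by the first run of digits in each string.
    // Strings without a parsable number sort last (an arbitrary choice for this sketch).
    public static List<string> SortByEmbeddedNumber(IEnumerable<string> items)
    {
        return items
            .OrderBy(s =>
            {
                Match m = Regex.Match(s, @"\d+");
                int n;
                return m.Success && int.TryParse(m.Value, out n) ? n : int.MaxValue;
            })
            // Tie-breaker so that equal numbers come out in a stable, predictable order.
            .ThenBy(s => s, StringComparer.Ordinal)
            .ToList();
    }
}
```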

Reading double Numbers from a text file which contains string and number mixed

I have a file which contains numbers and text, and I'm trying to read all the numbers as double and put them in a one-dimensional double array.
In the file, some lines begin with a space. Some lines also contain two or three numbers one after another.
The file is created by another app whose output format I don't want to change.
The data in the file looks like the sample below (some lines begin with spaces):
110 ! R1
123.000753 ! Radian per s as R2
600.0451 65 ! j/kg
12000 ! 4 Number of iteration
87.619 ! (min 20 and max 1000)
My code so far is :
char[] splits = { ' ', '!' };
var array = File.ReadAllLines(@"myfile.dat")
.SelectMany(linee => linee.Split(splits))
.Where(n => !string.IsNullOrWhiteSpace(n.ToString()))
.Select(n =>
{
double doub;
bool suc = double.TryParse(n, out doub);
return new { doub, suc };
}).Where( values=>values.suc).ToArray();
The problem is that my code also reads the numbers after ! in the descriptions, as in line 4 and line 5.
The array has to be like this:
110 , 123.000735 , 6000.0451 , 65 , 120000 , 87.619
But with my code it is like this:
110 , 123.000735 , 6000.0451 , 65 , 120000 , 4 , 87.619 , 20 , 1000
It's hard to give a general formula when given only a single example, but the following will work for your example:
return File.ReadLines(@"myfile.dat")
.Where(s => !String.IsNullOrWhiteSpace(s))
.Select(s => s.Substring(0, s.IndexOf('!')).Split(new [] {' '}, StringSplitOptions.RemoveEmptyEntries))
.SelectMany(s => s)
.Select(s => Double.Parse(s));
One approach could be as follows.
var lines = str.Split(new []{"!",Environment.NewLine},StringSplitOptions.RemoveEmptyEntries)
.Where(x=> x.Split(new []{" "},StringSplitOptions.RemoveEmptyEntries).All(c=>double.TryParse(c, out _))).
SelectMany(x=> x.Split(new []{" "},StringSplitOptions.RemoveEmptyEntries).Select(c=>double.Parse(c)));
Here's an alternate solution using regular expressions:
var regex = new Regex(@"^(\s*(?<v>\d+(\.\d+)?)\s*)+\!.*$");
var query = from line in lines
let match = regex.Match(line)
where match.Success
// Group.Value only holds the last repetition of a repeated group, so walk its Captures to get every number
from Capture capture in match.Groups["v"].Captures
select double.Parse(capture.Value, NumberStyles.Float, CultureInfo.InvariantCulture);
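For comparison, here is a plain-loop sketch of the "only look at what comes before the !" idea. It also tolerates lines without a ! and tokens that are not numbers. LeadingNumbers and Parse are made-up names, and treating a line without ! as pure data is my assumption, not something the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LeadingNumbers
{
    // Hypothetical helper: collects the numbers that appear before the first '!' on each line.
    public static double[] Parse(IEnumerable<string> lines)
    {
        var result = new List<double>();
        foreach (string line in lines)
        {
            // Keep only the part before the comment marker; a line without '!' is used whole.
            int bang = line.IndexOf('!');
            string data = bang >= 0 ? line.Substring(0, bang) : line;

            foreach (string token in data.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            {
                double value;
                // Tokens that are not numbers are skipped instead of throwing.
                if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
                    result.Add(value);
            }
        }
        return result.ToArray();
    }
}
```

It could be called as LeadingNumbers.Parse(File.ReadLines(@"myfile.dat")).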

Regex that match different format sentences in c#

Format of file
POS ID PosScore NegScore SynsetTerms Gloss
a 00001740 0.125 0 able#1" able to swim"; "she was able to program her computer";
a 00002098 0 0.75 unable#1 "unable to get to town without a car";
a 00002312 0 0 dorsal#2 abaxial#1 "the abaxial surface of a leaf is the underside or side facing away from the stem"
a 00002843 0 0 basiscopic#1 facing or on the side toward the base
a 00002956 0 0.23 abducting#1 abducent#1 especially of muscles; drawing away from the midline of the body or from an adjacent part
a 00003131 0 0 adductive#1 adducting#1 adducent#1 especially of muscles;
In this file, I want to extract (ID,PosScore,NegScore and SynsetTerms) field. The (ID,PosScore,NegScore) field data extraction is easy and I use the following code for the data of these fields.
Regex expression = new Regex(@"(\t(\d+)|(\w+)\t)");
var results = expression.Matches(input);
foreach (Match match in results)
{
Console.WriteLine(match);
}
Console.ReadLine();
This gives the correct result, but the SynsetTerms field creates a problem because some lines have two or more words. How can I separate the words and pair each one with its PosScore and NegScore?
For example, in the fifth line there are two words, abducting#1 and abducent#1, and both have the same score.
So what would be the regex for such a line that gets each word and its score, like:
Word PosScore NegScore
abducting#1 0 0.23
abducent#1 0 0.23
The non-regex, string-splitting version might be easier:
var data =
lines.Split(new[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
.Skip(1)
.Select(line => line.Split('\t'))
.SelectMany(parts => parts[4].Split().Select(word => new
{
ID = parts[1],
Word = word,
PosScore = decimal.Parse(parts[2]),
NegScore = decimal.Parse(parts[3])
}));
You can use this regex
^(?<pos>\w+)\s+(?<id>\d+)\s+(?<pscore>\d+(?:\.\d+)?)\s+(?<nscore>\d+(?:\.\d+)?)\s+(?<terms>(?:.*?#[^\s]*)+)\s+(?<gloss>.*)$
You can create a list like this
var lst=Regex.Matches(input,regex,RegexOptions.Multiline) // Multiline so that ^ and $ anchor at every line of the input
.Cast<Match>()
.Select(x=>
new
{
pos=x.Groups["pos"].Value,
terms=Regex.Split(x.Groups["terms"].Value,@"\s+"),
gloss=x.Groups["gloss"].Value
}
);
and now you can iterate over it
foreach(var temp in lst)
{
Console.WriteLine(temp.pos);
//you can now iterate over terms
foreach(var t in temp.terms)
{
}
}
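To get the Word / PosScore / NegScore rows the question asks for, something along these lines might work. It is only a sketch: it assumes the columns are separated by whitespace and that every term looks like word#number at the start of the fifth column. TermScore, SynsetScores and Parse are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public sealed class TermScore
{
    public string Word { get; set; }
    public decimal PosScore { get; set; }
    public decimal NegScore { get; set; }
}

public static class SynsetScores
{
    // POS, ID, PosScore, NegScore, then everything else (the terms followed by the gloss).
    private static readonly Regex LinePattern = new Regex(
        @"^\s*\w+\s+(?<id>\d+)\s+(?<pscore>\d+(?:\.\d+)?)\s+(?<nscore>\d+(?:\.\d+)?)\s+(?<rest>.*)$");

    // Leading run of terms; a term is assumed to look like word#number, e.g. abducting#1.
    private static readonly Regex TermsPattern = new Regex(@"^(?:\s*(?<term>[^\s#]+#\d+))+");

    // Hypothetical helper: one TermScore per term, each carrying the scores of its line.
    public static List<TermScore> Parse(IEnumerable<string> lines)
    {
        var result = new List<TermScore>();
        foreach (string line in lines)
        {
            Match m = LinePattern.Match(line);
            if (!m.Success) continue; // header or malformed line

            decimal pos = decimal.Parse(m.Groups["pscore"].Value, CultureInfo.InvariantCulture);
            decimal neg = decimal.Parse(m.Groups["nscore"].Value, CultureInfo.InvariantCulture);

            // A repeated group keeps every repetition in Captures; empty if no term was found.
            Match terms = TermsPattern.Match(m.Groups["rest"].Value);
            foreach (Capture term in terms.Groups["term"].Captures)
                result.Add(new TermScore { Word = term.Value, PosScore = pos, NegScore = neg });
        }
        return result;
    }
}
```

The header line is skipped automatically because its second column is not numeric.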

Parse string with encapsulated int to an array that contains other numbers to be ignored

During my coding I've come across a problem that involves parsing a string like this:
{15} there are 194 red balloons, {26} there are 23 stickers, {40} there are 12 jacks, ....
My code needs to pull the sentences and the numbers into two separate arrays.
I've solved the sentence part by using *.Remove(0, 5) to strip the leading part of each entry. The problem with that approach is that the file always has to be written to a standard where the prefix is exactly {##}. That is not as elegant as I would like: sometimes the number would be {3} and I would be forced to write it as { 3}.
Since the string may also contain other numbers, I wasn't able to simply parse out the integers first.
int?[] array = y.Split(',')
.Select(z =>
{
int value;
return int.TryParse(z, out value) ? value : (int?)null;
})
.ToArray();
So, back to the problem at hand: I need to be able to parse each "{##}" out into an array, with each number as its own element.
Here's one way to do it using positive lookaheads/lookbehinds:
string s = "{15} there are 194 red balloons, {26} there are 23 stickers, {40} there are 12 jacks";
// Match all digits preceded by "{" and followed by "}"
int[] matches = Regex.Matches(s, @"(?<={)\d+(?=})")
.Cast<Match>()
.Select(m => int.Parse(m.Value))
.ToArray();
// Yields [15, 26, 40]
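Since the question also wants the sentences in their own array, the same kind of pattern can capture both parts in one pass, which would avoid the fixed-width Remove(0, 5). This is a sketch under the assumption that a sentence never contains a "{" itself. BracedEntries and Parse are made-up names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracedEntries
{
    // Hypothetical helper: splits "{n} sentence, {m} sentence, ..." into two parallel arrays.
    public static void Parse(string input, out int[] numbers, out string[] sentences)
    {
        // "{digits}" with optional spaces inside the braces, then the text up to the next '{'.
        MatchCollection matches = Regex.Matches(input, @"\{\s*(?<n>\d+)\s*\}(?<text>[^{]*)");

        numbers = matches.Cast<Match>()
            .Select(m => int.Parse(m.Groups["n"].Value))
            .ToArray();

        // Strip the separating comma and surrounding spaces from each sentence.
        sentences = matches.Cast<Match>()
            .Select(m => m.Groups["text"].Value.Trim(' ', ','))
            .ToArray();
    }
}
```

Because the braces are matched rather than counted, {3} and { 3} are both accepted, and numbers inside the sentences are left alone.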

parse text into key/value pair or json

I have text in the following format, and I was wondering what the best approach might be to create a user object from it, with the fields as its properties.
I don't know regular expressions that well. I was looking at the C# string methods, particularly IndexOf and LastIndexOf, but I think that would be too messy as there are approximately 15 fields.
I am trying to do this in C#.
Some characteristics:
The keys/fields are fixed and known beforehand, so I know that I have to look for things like title, company, etc.
The address part is single-valued, and it is followed by some multi-valued fields.
A multi-valued field may or may not end with a comma (,).
There are one or two line breaks between the fields, e.g. "country" is followed by 2 line breaks before we encounter "interest".
Title: Mr
Company: abc capital
Address1: 42 mystery lane
Zip: 112312
Country: Ireland
Interest: Biking, Swimming, Hiking,
Topic of Interest: Europe, Asia, Capital
This will split the data up into key/value pairs and store them in a dictionary. You may have to modify it further for more requirements.
var dictionary = data
.Split(
new[] {"\r\n"},
StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(':'))
.ToDictionary(
k => k[0].Trim(),
v => v[1].Trim());
I'd probably go with something like this:
private Dictionary<string, IEnumerable<string>> ParseValues(string providedValues)
{
Dictionary<string, IEnumerable<string>> parsedValues = new Dictionary<string, IEnumerable<string>>();
string[] lines = providedValues.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries); //Your newline character here might differ, being '\r', '\n', '\r\n'...
foreach (string line in lines)
{
string[] lineSplit = line.Split(':');
string key = lineSplit[0].Trim();
IEnumerable<string> values = lineSplit[1].Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim()); //Removing empty entries here will ensure you don't get an empty for the "Interest" line, where you have 'Hiking' followed by a comma, followed by nothing else
parsedValues.Add(key, values);
}
return parsedValues;
}
or if you subscribe to the notion that readability and maintainability are not as cool as a great big chain of calls:
private static Dictionary<string, IEnumerable<string>> ParseValues(string providedValues)
{
return providedValues.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Select(x => x.Split(':')).ToDictionary(key => key[0].Trim(), value => value[1].Split(new char[]{ ','}, StringSplitOptions.RemoveEmptyEntries).Select(x => x.Trim()));
}
I strongly recommend getting more familiar with regular expressions for cases like this. Parsing semi-structured text is very easy and logical with them.
For example, this one (it and the ones that follow are just variants; there are many ways to do it depending on what you need):
title:\s*(.*)\s+comp.*?:\s*(.*)\s+addr.*?:\s*(.*)\s+zip:\s*(.*)\s+country:\s*(.*)\s+inter.*?:\s*(.*)\s+topic.*?:\s*(.*)
gives result
1. Mr
2. abc capital
3. 42 mystery lane
4. 112312
5. Ireland
6. Biking, Swimming, Hiking,
7. Europe, Asia, Capital
or - more open to anything:
\s(.*?):\s(.*)
parses your input into nice groups like this:
Match 1
1. Title
2. Mr
Match 2
1. Company
2. abc capital
Match 3
1. Address1
2. 42 mystery lane
Match 4
1. Zip
2. 112312
Match 5
1. Country
2. Ireland
Match 6
1. Interest
2. Biking, Swimming, Hiking,
Match 7
1. Topic of Interest
2. Europe, Asia, Capital
I am not familiar with C# (and its dialect of regular expressions); I just wanted to awaken your interest ...
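For what it's worth, here is how the "open to anything" variant might look in C#. This is a sketch, not a translation of the exact pattern above: the regex is my own adaptation with named groups, and KeyValueText and Parse are made-up names. It assumes one "key: value" pair per line and keeps multi-valued fields as a single string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueText
{
    // "key: value" on one line. Multiline makes ^ and $ work per line; \r? tolerates CRLF endings.
    private static readonly Regex PairPattern = new Regex(
        @"^[ \t]*(?<key>[^:\r\n]+?)[ \t]*:[ \t]*(?<value>[^\r\n]*?)[ \t]*\r?$",
        RegexOptions.Multiline);

    // Hypothetical helper: returns the pairs in a case-insensitive dictionary (last duplicate wins).
    public static Dictionary<string, string> Parse(string text)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in PairPattern.Matches(text))
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        return result;
    }
}
```

A multi-valued entry such as the "Interest" value can then be split on ',' with StringSplitOptions.RemoveEmptyEntries, as in the answers above.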
