Split strings from a text file - c#

I have the following strings in a text file "test"
Table Name.type
Market Drinks.tea
I want to split the strings so that I get the following output:
ObjectName = Table AttributeName = Name Attribute Type =type
ObjectName = Market AttributeName = Drinks Attribute Type =tea
Here is my code:
string[] lines = File.ReadAllLines(@"d:\test.txt");
int i = 0;
var items = from line in lines
where i++ != 0
select new{
objectName = line.Split(new char[] {' '})[0],
attrName = line.Split(new char[]{'.'})[1],
attrType = line.Split(new char[] { ' ' })[2]
};
foreach (var item in items)
{
Console.WriteLine("ObjectName = {0}, AttributeName = {1}, Attribute Type = {2}",
item.objectName, item.attrName, item.attrType);
}
I'm getting an index-out-of-range exception.
PS: there are no spaces at the end of the strings in the text file; I just wanted to test a character!

You don't need the surrounding new char[] { ... } because String.Split() takes a params array.
To fix the index-out-of-bounds error, the last part of the select should become:
attrType = line.Split(' ', '.' )[2]
Edit
And thanks to @Kobi, a let will let you do the Split just once, a great improvement when you have many rows and/or columns.
var items = from line in lines
where i++ != 0
let words = line.Split(' ', '.')
select new
{
objectName = words[0],
attrName = words[1],
attrType = words[2]
};
Old answer
You can use the same Split for all 3 parts, making it a little easier to read:
select new{
objectName = line.Split(' ', '.' )[0],
attrName = line.Split(' ', '.' )[1],
attrType = line.Split(' ', '.' )[2]
};
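If some lines in the file might not follow the "Object Attribute.type" shape, checking the number of parts before indexing avoids the exception altogether. The following is only a minimal sketch, assuming that blank or malformed lines should simply be skipped; LineParser and ParseLines are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LineParser
{
    // Hypothetical helper: splits each line once and keeps only lines
    // that yield exactly three parts (object, attribute name, attribute type).
    public static IEnumerable<string[]> ParseLines(IEnumerable<string> lines)
    {
        return lines
            .Select(line => line.Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(words => words.Length == 3); // skip blank or malformed lines instead of throwing
    }

    static void Main()
    {
        // Sample input (assumption): two valid lines, one blank line, one malformed line.
        var lines = new[] { "Table Name.type", "", "Market Drinks.tea", "broken" };
        foreach (string[] words in ParseLines(lines))
        {
            Console.WriteLine("ObjectName = {0}, AttributeName = {1}, Attribute Type = {2}",
                words[0], words[1], words[2]);
        }
    }
}
```

Whether skipping bad lines is the right policy depends on the data; throwing or logging them would be just as easy at the Where step.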

Use a regular expression, which is more robust:
static void Main()
{
const string PATTERN = @"^([\w]+)\s+([\w]+)\.(\w+)";
const string TEXT = "Table Name.type\r\nMarket Drinks.tea";
foreach (Match match in Regex.Matches(TEXT, PATTERN, RegexOptions.Multiline))
{
Console.WriteLine("ObjectName = {0} AttributeName = {1} Attribute Type ={2}",
match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value);
}
}
Outputs:
ObjectName = Table AttributeName = Name Attribute Type =type
ObjectName = Market AttributeName = Drinks Attribute Type =tea

On the splitting part, you should do it like this (provided you are sure your input is in the correct format):
attrName = line.Split(' ')[1].Split('.')[0],
attrType = line.Split(' ')[1].Split('.')[1]

The out-of-bounds error is on this line: attrType = line.Split(new char[] { ' ' })[2]
Splitting "Table Name.type" on a space yields only two parts, so index 2 does not exist.
Your attrType should be line.Split(new char[] { '.' })[1];
and attrName should be line.Split(new char[] {' '})[1].Split(new char[] {'.'})[0]
As Henk Holterman has said, you don't need new char[] inside Split, so your lines would be:
attrType = line.Split('.')[1];
attrName = line.Split(' ')[1].Split('.')[0];

Thanks to the replies, here is the right answer for the desired output:
objectName = line.Split(' ')[0],
attrName = line.Split(' ')[1].Split('.')[0],
attrType = line.Split('.')[1]

If you want to do it this way, just use Regex; it's more flexible:
const string pattern = @"(?<objectName>\w+)\s(?<attrName>\w+)\.(?<attrType>\w+)";
string[] lines = File.ReadAllLines(@"e:\a.txt");
var items = from line in lines
select new
{
objectName = Regex.Match(line, pattern).Groups["objectName"].Value,
attrName = Regex.Match(line, pattern).Groups["attrName"].Value,
attrType = Regex.Match(line, pattern).Groups["attrType"].Value
};
foreach (var item in items.ToList())
{
Console.WriteLine("ObjectName = {0}, AttributeName = {1}, Attribute Type = {2}",
item.objectName, item.attrName, item.attrType);
}

Related

Read a text file and split the text file by removing delimiters and store it into 2 arrays

I want to read a text file and split the text file by removing delimiters and store it into two 1d-arrays (one for Movie name and other for Revenue)
Example of my text file:
Jurassic World=11734562.56
Black Panther#4352749.21
The Revenant}7452893.21
Trainwreck{1547892.45
Maybe this code solves your problem. Split takes the delimiter characters and splits the text at each of them; the delimiters themselves are removed from the result. If all of your text follows this pattern (name, delimiter, revenue), you can take the even indexes as movie names and the odd indexes as revenues.
string allText = @"Jurassic World=11734562.56
Black Panther#4352749.21
The Revenant}7452893.21
Trainwreck{1547892.45";
// split on '\r' as well and drop empty entries, so Windows line endings leave no stray '\r' in the revenues
string[] splitStrings = allText.Split(new[] { '\r', '\n', '=', '{', '}', '#' }, StringSplitOptions.RemoveEmptyEntries);
string[] movies = splitStrings.Where((s, i) => i % 2 == 0).ToArray();
string[] revenues = splitStrings.Where((s, i) => i % 2 == 1).ToArray();
Another solution, which strips unwanted characters:
string val = @" Jurassic World=11734562.56
Black Panther#4352749.21
The Revenant}7452893.21
Trainwreck{1547892.45";
// split on both CR and LF so the code works regardless of the line endings
string[] lines = val.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
string[] movieNameArr = new string[lines.Length];
decimal[] amountsArr = new decimal[lines.Length];
for (int i = 0; i < lines.Length; i++)
{
string[] split = lines[i].Split(new char[] { '=', '#', '}', '{' });
// trim leading and trailing whitespace but keep the spaces inside the name
movieNameArr[i] = split[0].Trim();
// parse with the invariant culture (needs using System.Globalization) so '.' is always the decimal separator
amountsArr[i] = decimal.Parse(split[1], CultureInfo.InvariantCulture);
}
Console.WriteLine("Movie arr: [{0}]", string.Join(", ", movieNameArr));
Console.WriteLine("Amounts arr: [{0}]", string.Join(", ", amountsArr));
Console.ReadKey();
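A variation on the answers above is to cut each line only at its first delimiter with IndexOfAny, so whatever follows the delimiter is treated as the revenue. This is a hedged sketch, assuming one "name, delimiter, revenue" record per line and '.' as the decimal separator; MovieParser and Parse are hypothetical names, not part of any library.

```csharp
using System;
using System.Globalization;

static class MovieParser
{
    static readonly char[] Delimiters = { '=', '#', '}', '{' };

    // Hypothetical helper: fills two parallel arrays from "name<delimiter>revenue" lines.
    public static void Parse(string text, out string[] names, out decimal[] revenues)
    {
        // Split on CR and LF so both Windows and Unix line endings work.
        string[] lines = text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        names = new string[lines.Length];
        revenues = new decimal[lines.Length];
        for (int i = 0; i < lines.Length; i++)
        {
            int pos = lines[i].IndexOfAny(Delimiters); // position of the first delimiter
            if (pos < 0)
            {
                throw new FormatException("No delimiter in line: " + lines[i]);
            }
            names[i] = lines[i].Substring(0, pos).Trim();
            revenues[i] = decimal.Parse(lines[i].Substring(pos + 1).Trim(), CultureInfo.InvariantCulture);
        }
    }

    static void Main()
    {
        string[] names;
        decimal[] revenues;
        Parse("Jurassic World=11734562.56\nBlack Panther#4352749.21", out names, out revenues);
        Console.WriteLine("Movie arr: [{0}]", string.Join(", ", names));
        Console.WriteLine("Amounts arr: [{0}]", string.Join(", ", revenues));
    }
}
```

Throwing on a line without a delimiter is a design choice for the sketch; skipping such lines would need a list instead of fixed-size arrays.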

Linq Query C# ( Multiple Occurrences in String array)

I know this might be a naive question, but I have to ask.
I have a list of strings
List<string> lstPets = new List<string> { "dog", "cat","horse","parrot" };
And a string
string paragraph = "This is a test script to test whether a dog exists or not";
Now I have to write a LINQ query to find whether any item of "lstPets" occurs in "paragraph".
Thanks in advance. :)
var lstPets = new List<string> { "dog", "cat","horse","parrot" };
string paragraph = "This is a test script to test whether a dog exists or not";
var containsAny = lstPets.Any(paragraph.Contains);
Or maybe more tolerant:
var containsAny = lstPets.Any(pet => paragraph.Contains(pet, StringComparison.OrdinalIgnoreCase));
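Another approach matches whole words, sentence by sentence:
string[] sentences = paragraph.Split(new char[] { '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries);
var sentenceQuery = from sentence in sentences
let w = sentence.Split(new char[] { ' ', ';', ':', ',' }, StringSplitOptions.RemoveEmptyEntries)
// keep the sentence if at least one of its words is a pet
where w.Intersect(lstPets).Any()
select sentence;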
foreach (string str in sentenceQuery)
{
Console.WriteLine(str);
}
Try this code:
var countres = lstPets.Any(c => paragraph.Contains(c));

How to use Linq to Split a String on newlines and space?

I have a string:
string data =
@"item1 actived
item2 none
item special I none
item special II actived";
You can see 4 rows in the data.
I need to split the string into a List of items as below:
item[0]={Name=item1, Status=actived}
item[1]={Name=item2, Status=none}
item[2]={Name=item Special I, Status=none}
item[3]={Name=item Special II, Status=actived}
I tried:
var s = SplitReturn(data);
public string[] SplitReturn(string name)
{
return name.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
}
How can I split each row on the space and then convert the result to a List?
string data =
@"item1 actived
item2 none
item special I none
item special II actived";
// split on "\r\n" and "\n" so the result does not depend on how the literal's line endings were saved
var result = data.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
.Select(item => {
int lastSpace = item.LastIndexOf(' ');
return new
{
Name = item.Substring(0, lastSpace).Trim(),
Status = item.Substring(lastSpace, item.Length - lastSpace).Trim()
}; }).ToList();

how to split a string TWICE

I've been trying to split a string twice but I keep getting the error "Index was outside the bounds of the array".
This is the string I intend to split:
"a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^"
I use "^" as the delimiter for the first split, so that afterwards the sets look as follows:
a*b*c*d*e 1*2*3*4*5 e*f*g*h*i
Then I perform a second split on each set with * as the separator, so that the result from the first set, for example, is a b c d e
This is the C# code:
words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
char[] del = { '^' };
string[] splitResult = words.Split(del);
foreach (string w in splitResult)
{
char[] separator = { '*' };
string[] splitR = w.Split(separator);
foreach (string e in splitR)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
}
}
To drop the last part, which is empty, how about this:
In C#
string str = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
var result = str.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split('*')).ToArray();
In VB.Net
Dim str As String = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^"
Dim result = str.Split(New Char() {"^"}, StringSplitOptions.RemoveEmptyEntries)
.Select(Function(x) x.Split("*")).ToArray()
You can do this with Linq:
IEnumerable<IEnumerable<string>> strings = words
.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Split('*'));
or if you prefer to work exclusively with arrays
string[][] strings = words
.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Split('*').ToArray())
.ToArray();
string words= "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
string[] reslts = words.Split(new char[] { '*', '^' }, StringSplitOptions.RemoveEmptyEntries);
You have a terminating separator, so the final string is empty.
if (!string.IsNullOrEmpty(w)) {
string[] splitR = w.Split(separator);
if (splitR.Length > 4)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
}
}
Try this:
string words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
char[] del = { '^' };
string[] splitResult = words.Split(del,StringSplitOptions.RemoveEmptyEntries);
foreach (string w in splitResult)
{
char[] separator = { '*' };
string[] splitR = w.Split(separator);
if(splitR.Length==5)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
Console.WriteLine("{0},{1},{2},{3},{4}", first, second, third, fourth, fifth);
}
}
You are getting the exception "Index was outside the bounds of the array" because in the last iteration the split yields only one item. I suggest checking that there are five items:
words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
char[] del = { '^' };
string[] splitResult = words.Split(del);
foreach (string w in splitResult)
{
char[] separator = { '*' };
string[] splitR = w.Split(separator);
if (splitR.Length>=5)
{
foreach (string e in splitR)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
}
}
}
One line does it all
var f = words.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(new char[] { '*' }).ToArray())
.ToArray();
Your second loop does the same thing five times (you never use e).
You got the exception because a final empty string was included; splitting it gives a one-element array, so reading index 1 in the inner loop is out of range.
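The effect of the trailing separator can be checked directly. This is a small sketch using only String.Split; SplitDemo is a hypothetical class name.

```csharp
using System;

static class SplitDemo
{
    static void Main()
    {
        string words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";

        // Without options, the trailing '^' produces a final empty string.
        string[] withEmpty = words.Split('^');
        Console.WriteLine(withEmpty.Length);               // 4

        // Splitting that empty string returns one (empty) element, so index 1 does not exist.
        Console.WriteLine(withEmpty[3].Split('*').Length); // 1

        // RemoveEmptyEntries drops the empty tail before the second split.
        string[] withoutEmpty = words.Split(new[] { '^' }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(withoutEmpty.Length);            // 3
    }
}
```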

Split string to array, remove empty spaces

I have a question about splitting a string. I want to split it on spaces, but text enclosed in double quotes should not be split, and the spaces inside it should be removed.
My String:
String tmp = "abc 123 \"Edk k3\" String;";
Result:
1: abc
2: 123
3: Edkk3 // don't split after "" and remove empty spaces
4: String
This is my code so far, but I don't know how to remove the spaces inside the quotes:
var tmpList = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
{
if (i % 2 == 1) return new[] { s };
return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
}).ToList();
Or this, but it ignores the quotes, so it splits everything:
string[] tmpList = tmp.Split(new Char[] { ' ', ';', '\"', ',' }, StringSplitOptions.RemoveEmptyEntries);
Add .Replace(" ", ""):
String tmp = @"abc 123 ""Edk k3"" String;";
var tmpList = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
{
if (i % 2 == 1) return new[] { s.Replace(" ", "") };
return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
}).ToList();
string.Split is not suitable for what you want to do, as you can't tell it to ignore what is in the ".
I wouldn't go with Regex either, as this can get complicated and memory intensive (for long strings).
Implement your own parser - using a state machine to track whether you are within a quoted portion.
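Such a parser can be quite short. The following is a minimal sketch of the state-machine idea, assuming that spaces and semicolons separate tokens outside quotes, that quotes are not nested or escaped, and that spaces inside quotes are dropped as the question asks; QuotedTokenizer, Tokenize and Flush are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedTokenizer
{
    // Hypothetical helper: walks the input once, tracking whether we are inside quotes.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;             // toggle the state; the quote itself is dropped
            }
            else if (inQuotes)
            {
                if (c != ' ') current.Append(c);  // inside quotes: keep everything except spaces
            }
            else if (c == ' ' || c == ';')
            {
                Flush(tokens, current);           // outside quotes: a separator ends the token
            }
            else
            {
                current.Append(c);
            }
        }
        Flush(tokens, current);                   // emit a trailing token, if any
        return tokens;
    }

    // Adds the pending token to the list (if it is not empty) and resets the buffer.
    static void Flush(List<string> tokens, StringBuilder current)
    {
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
    }

    static void Main()
    {
        foreach (string token in Tokenize("abc 123 \"Edk k3\" String;"))
        {
            Console.WriteLine(token);             // abc, 123, Edkk3, String (one per line)
        }
    }
}
```

Note that in this sketch a closing quote does not end the token by itself, and an unterminated quote makes the rest of the input count as quoted; both would need explicit handling if the input can be malformed.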
You can use a regular expression. Instead of splitting, specify what you want to keep.
Example:
string tmp = "abc 123 \"Edk k3\" String;";
MatchCollection m = Regex.Matches(tmp, @"""(.*?)""|([^ ]+)");
foreach (Match s in m) {
Console.WriteLine(s.Groups[1].Value.Replace(" ", "") + s.Groups[2].Value);
}
Output:
abc
123
Edkk3
String;
