Split string to array, remove empty spaces - c#

I have a question about splitting a string. I want to split it on spaces, but when the string contains a quoted section ("..."), I don't want to split inside it; instead I want to remove the spaces within the quotes.
My string:
String tmp = "abc 123 \"Edk k3\" String;";
Result:
1: abc
2: 123
3: Edkk3 // don't split inside the "" and remove the spaces
4: String
My code so far is below, but I don't know how to remove the spaces inside the quotes:
var tmpList = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
{
if (i % 2 == 1) return new[] { s };
return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
}).ToList();
Alternatively, this version doesn't recognize the quotes, so it splits everything:
string[] tmpList = tmp.Split(new Char[] { ' ', ';', '\"', ',' }, StringSplitOptions.RemoveEmptyEntries);

Add .Replace(" ", "") to the quoted parts:
String tmp = @"abc 123 ""Edk k3"" String;";
var tmpList = tmp.Split(new[] { '"' }).SelectMany((s, i) =>
{
if (i % 2 == 1) return new[] { s.Replace(" ", "") };
return s.Split(new[] { ' ', ';' }, StringSplitOptions.RemoveEmptyEntries);
}).ToList();

string.Split is not suitable for what you want to do, as you can't tell it to ignore the delimiters inside the quotes.
I wouldn't go with a regex either, as this can get complicated and memory intensive (for long strings).
Implement your own parser, using a state machine to track whether you are within a quoted portion.
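A minimal sketch of such a parser, under my own assumptions (not the answerer's): ' ' and ';' are the only delimiters, a '"' toggles the quoted state, and spaces inside quotes are dropped. The QuotedTokenizer class and its methods are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedTokenizer
{
    // Splits on ' ' and ';' outside quotes; inside quotes, spaces are removed
    // and nothing is split. The quote characters themselves are not kept.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false; // the only state: are we between a pair of '"'?

        foreach (char c in input)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes; // entering or leaving a quoted portion
            }
            else if (inQuotes)
            {
                if (c != ' ') current.Append(c); // inside quotes: skip spaces, keep the rest
            }
            else if (c == ' ' || c == ';')
            {
                Flush(current, tokens); // outside quotes: a delimiter ends the token
            }
            else
            {
                current.Append(c);
            }
        }
        Flush(current, tokens); // emit the last token, if any
        return tokens;
    }

    // Adds the pending token (if non-empty) and resets the buffer.
    private static void Flush(StringBuilder current, List<string> tokens)
    {
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
            current.Clear();
        }
    }
}
```

For the question's string, Tokenize returns abc, 123, Edkk3, String. A single bool is enough state because quotes cannot nest here; supporting escaped quotes would need an extra state.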

You can use a regular expression. Instead of splitting, specify what you want to keep.
Example:
string tmp = "abc 123 \"Edk k3\" String;";
MatchCollection m = Regex.Matches(tmp, @"""(.*?)""|([^ ]+)");
foreach (Match s in m) {
Console.WriteLine(s.Groups[1].Value.Replace(" ", "") + s.Groups[2].Value);
}
Output:
abc
123
Edkk3
String;
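Note that the last token keeps its ';'. If you want exactly the result from the question, a small variation (assuming ';' should act as a delimiter just like the space) excludes ';' and '"' from the unquoted alternative. The QuotedRegexSplitter name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedRegexSplitter
{
    // Group 1: contents of a "..." section.
    // Group 2: a run of characters that are not whitespace, ';' or '"'.
    private static readonly Regex Token = new Regex(@"""([^""]*)""|([^\s;""]+)");

    public static List<string> Split(string input)
    {
        var result = new List<string>();
        foreach (Match m in Token.Matches(input))
        {
            // Exactly one of the two groups participates in each match.
            result.Add(m.Groups[1].Success
                ? m.Groups[1].Value.Replace(" ", "")
                : m.Groups[2].Value);
        }
        return result;
    }
}
```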

Related

Read a text file and split the text file by removing delimiters and store it into 2 arrays

I want to read a text file, split each line by removing the delimiters, and store the results in two 1D arrays (one for the movie names and the other for the revenues).
Example of my text file:
Jurassic World=11734562.56
Black Panther#4352749.21
The Revenant}7452893.21
Trainwreck{1547892.45
Maybe this code solves your problem. Split takes the delimiter chars and splits on them, so the delimiters are removed from the result. If all your text looks like this (name, delimiter, revenue), you can take the even indexes as movie names and the odd indexes as revenues.
string allText = @"Jurassic World=11734562.56
Black Panther#4352749.21
The Revenant}7452893.21
Trainwreck{1547892.45";
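// '\r' is included too, so Windows line endings don't leave '\r' on the revenues
string[] splitStrings = allText.Split(new[] { '\r', '\n', '=', '{', '}', '#' }, StringSplitOptions.RemoveEmptyEntries);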
string[] movies = splitStrings.Where((s, i) => i % 2 == 0).ToArray();
string[] revenues = splitStrings.Where((s, i) => i % 2 == 1).ToArray();
Another solution, which strips unwanted characters:
string val = @" Jurassic World=11734562.56
Black Panther#4352749.21
The Revenant}7452893.21
Trainwreck{1547892.45";
// split on both '\r' and '\n' so Windows and Unix line endings both work
string[] lines = val.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
string[] movieNameArr = new string[lines.Length];
decimal[] amountsArr = new decimal[lines.Length];
for (int i = 0; i < lines.Length; i++)
{
    string[] split = lines[i].Split(new Char[] { '=', '#', '}', '{' });
    // trim leading/trailing spaces, but keep the space inside names like "Jurassic World"
    movieNameArr[i] = split[0].Trim();
    // InvariantCulture (System.Globalization) so '.' is always the decimal separator
    amountsArr[i] = decimal.Parse(split[1], CultureInfo.InvariantCulture);
}
Console.WriteLine("Movie arr: [{0}]", string.Join(", ", movieNameArr));
Console.WriteLine("Amounts arr: [{0}]", string.Join(", ", amountsArr));
Console.ReadKey();
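Both answers assume every line holds exactly one name and one revenue. A sketch that parses line by line instead (the MovieParser name is hypothetical), splitting at the last delimiter so names keep their spaces, skipping blank lines, and parsing revenues with the invariant culture:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class MovieParser
{
    // The four delimiters used in the example file.
    private static readonly char[] Delimiters = { '=', '#', '{', '}' };

    // Splits each "Name<delimiter>Revenue" line at the LAST delimiter.
    public static (string[] Movies, decimal[] Revenues) Parse(IEnumerable<string> lines)
    {
        var movies = new List<string>();
        var revenues = new List<decimal>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue; // ignore blank lines

            int pos = line.LastIndexOfAny(Delimiters);
            if (pos < 0) throw new FormatException("No delimiter in line: " + line);

            movies.Add(line.Substring(0, pos).Trim());
            // InvariantCulture so that '.' is always the decimal separator.
            revenues.Add(decimal.Parse(line.Substring(pos + 1), CultureInfo.InvariantCulture));
        }
        return (movies.ToArray(), revenues.ToArray());
    }
}
```

With a real file you would pass File.ReadAllLines(path) (from System.IO), where path is your own file name.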

How to find the longest word that starts with a capital letter and add a separator and space

I have code that finds the longest word that starts with a capital letter, but I need to append a separator and a space to that word. Any ideas how I should do it properly?
char[] skyrikliai = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
string eilute = "Arvydas (g. 1964 m. gruodzio 19 d. Kaune)– Lietuvos, krepsininkas, olimpinis ir pasaulio cempionas, nuo 2011 m. spalio 24 d.";
static string Ilgiausias(string eilute, char[] skyrikliai)
{
string[] parts = eilute.Split(skyrikliai,
StringSplitOptions.RemoveEmptyEntries);
string ilgiaus = "";
foreach (string zodis in parts)
if ((zodis.Length > ilgiaus.Length) && (zodis[0].ToString() == zodis[0].ToString().ToUpper()))
ilgiaus = zodis;
return ilgiaus;
}
It should find the word Lietuvos and append a comma and a space.
The result should be "Lietuvos, "
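I would use LINQ for that (IsUpper is a static method on char, not an instance method):
var ilgiaus = parts.Where(s => char.IsUpper(s[0]))
    .OrderByDescending(s => s.Length)
    .FirstOrDefault();
if (ilgiaus != null) {
    return ilgiaus + ", ";
}
return ""; // no capitalized word found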
You can also use a regex and LINQ; you don't need to split on many characters.
Regex regex = new Regex(@"[A-Z]\w*");
string str = "Arvydas (g. 1964 m. gruodzio 19 d. Kaune)– Lietuvos, krepsininkas, olimpinis ir pasaulio cempionas, nuo 2011 m. spalio 24 d.";
string longest = regex.Matches(str).Cast<Match>().Select(match => match.Value).MaxBy(val => val.Length);
If you don't want to use MoreLINQ, instead of MaxBy(val => val.Length) you can do OrderByDescending(x => x.Length).First(). (On .NET 6 and later, MaxBy is also built into System.Linq.)
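Putting the pieces together, a minimal sketch (the class and method names are hypothetical, not from the answers) that splits on the question's separators, uses char.IsUpper so tokens such as "1964" don't count as capitalized, and appends ", " as the question asks:

```csharp
using System;
using System.Linq;

public static class LongestCapitalized
{
    // Returns the longest token starting with an upper-case letter, followed by ", ",
    // or an empty string if no token qualifies.
    public static string Find(string text, char[] separators)
    {
        string longest = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => char.IsUpper(w[0]))
            .OrderByDescending(w => w.Length) // stable sort: ties keep the first one found
            .FirstOrDefault();
        return longest == null ? "" : longest + ", ";
    }
}
```

Called with skyrikliai and eilute from the question, this should return "Lietuvos, ".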
There are probably more ingenious and elegant ways, but the following pseudocode should work:
List<String> listOfStrings = new List<String>();
// add some strings to the generic list
listOfStrings.Add("bla");
listOfStrings.Add("foo");
listOfStrings.Add("bar");
listOfStrings.Add("Rompecabeza");
listOfStrings.Add("Rumpelstiltskin");
. . .
String longestWord = GetLongestCapitalizedWord(listOfStrings);
. . .
private String GetLongestCapitalizedWord(List<String> listOfStrings)
{
    String longestWord = String.Empty;
    foreach (string s in listOfStrings)
    {
        if (IsCapitalized(s) && s.Length > longestWord.Length)
        {
            longestWord = s;
        }
    }
    return longestWord;
}
private bool IsCapitalized(String s)
{
    // true when the first character is an upper-case letter
    return s.Length > 0 && char.IsUpper(s[0]);
}

Compare strings while ignoring blank lines?

I have two strings and need to compare them while ignoring blank lines.
First string
CREATE OR REPLACE PROCEDURE "HELL_"
as
begin
dbms_output.put_line('Hello!');
end;
Second string
CREATE OR REPLACE PROCEDURE "USER1"."HELL_"
as
begin
dbms_output.put_line('Hello!');
end;
The code I am using:
string text1 = "";
string text2 = "";
if (text1.Equals(text2))
MessageBox.Show("same");
//no Exception
else
{
MessageBox.Show("not");
}
You can split the lines using StringSplitOptions.RemoveEmptyEntries, so the resulting string[] doesn't contain empty lines. Then compare the two arrays with Enumerable.SequenceEqual:
string[] lines1 = text1.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
string[] lines2 = text2.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
bool equal = lines1.SequenceEqual(lines2);
If the "empty" lines can contain whitespace:
var lines1 = text1.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Where(l => l.Trim().Length > 0);
var lines2 = text2.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Where(l => l.Trim().Length > 0);
And if you want to ignore leading and trailing whitespace on every line:
var lines1 = text1.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Where(l => l.Trim().Length > 0)
.Select(l => l.Trim());
var lines2 = text2.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Where(l => l.Trim().Length > 0)
.Select(l => l.Trim());
And if you also want to ignore case:
bool equal = lines1.SequenceEqual(lines2, StringComparer.OrdinalIgnoreCase);
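The snippets above split on Environment.NewLine, which is "\r\n" on Windows but "\n" on Linux and macOS, so strings with a different line-ending style than the platform's may not have their blank lines detected. A sketch combining the steps (the class name is hypothetical) that accepts any of the three line-ending styles and, like the trimming variant above, also ignores leading and trailing whitespace on each line:

```csharp
using System;
using System.Linq;

public static class BlankLineInsensitive
{
    // Windows, Unix and old Mac line endings. Split matches the first listed
    // separator at each position, so "\r\n" is consumed as a single break.
    private static readonly string[] LineBreaks = { "\r\n", "\n", "\r" };

    public static bool EqualIgnoringBlankLines(string a, string b, bool ignoreCase = false)
    {
        StringComparer comparer = ignoreCase ? StringComparer.OrdinalIgnoreCase : StringComparer.Ordinal;
        return Lines(a).SequenceEqual(Lines(b), comparer);
    }

    // Trimmed, non-blank lines only.
    private static string[] Lines(string s) =>
        s.Split(LineBreaks, StringSplitOptions.None)
         .Select(l => l.Trim())
         .Where(l => l.Length > 0)
         .ToArray();
}
```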
For your specific case (blank lines only in the second string, one at a time, with \r\n line endings) this would do the trick:
if (firststring.Equals(secondstring.Text.Replace("\r\n\r\n", "\r\n")))
MessageBox.Show("same");
//no Exception
else
{
MessageBox.Show("not");
}
Along the same lines as the other answers (also "sanitizing" first), but this treats "blank lines" more generally as any whitespace-only lines bounded by CR, LF, or any combination of the two.
string RemoveEmptyLines (string s) {
return Regex.Replace(s, @"(?:^|[\r\n]+)\s*?(?=(?:[\r\n]+|$))", "");
}
// Usage
RemoveEmptyLines(a) == RemoveEmptyLines(b)
The line-end characters (i.e. [\r\n]) may be expanded or refined as needed. This regular expression only processes one blank line at a time (although all blank lines are removed within the single Replace call), using a non-greedy quantifier and a lookahead. I find that this variation shows the intended operation more explicitly.

How to split a string twice

I've been trying to split a string twice but I keep getting the error "Index was outside the bounds of the array".
This is the string I intend to split:
"a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^"
I use "^" as the delimiter for the first split, so that after it the sets look like this:
a*b*c*d*e 1*2*3*4*5 e*f*g*h*i
Then I split each set again with "*" as the separator, so that, for example, the first set yields a b c d e.
This is the C# code:
words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
char[] del = { '^' };
string[] splitResult = words.Split(del);
foreach (string w in splitResult)
{
char[] separator = { '*' };
string[] splitR = w.Split(separator);
foreach (string e in splitR)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
}
}
To drop the empty last part (after the trailing "^"), how about:
In C#
string str = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
var result = str.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split('*')).ToArray();
In VB.NET
Dim str As String = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^"
Dim result = str.Split(New Char() {"^"c}, StringSplitOptions.RemoveEmptyEntries) _
.Select(Function(x) x.Split("*"c)).ToArray()
You can do this with LINQ:
IEnumerable<IEnumerable<string>> strings = words
.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Split('*'));
Or, if you prefer to work exclusively with arrays:
string[][] strings = words
.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(w => w.Split('*').ToArray())
.ToArray();
If you don't need the groups, you can split on both separators at once and get one flat array:
string words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
string[] results = words.Split(new char[] { '*', '^' }, StringSplitOptions.RemoveEmptyEntries);
You have a terminating separator, so the final string is empty. Skip it, and check the length before indexing:
if (!string.IsNullOrEmpty(w)) {
    string[] splitR = w.Split(separator);
    if (splitR.Length > 4)
    {
        string first = splitR[0];
        string second = splitR[1];
        string third = splitR[2];
        string fourth = splitR[3];
        string fifth = splitR[4];
    }
}
Try this:
string words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
char[] del = { '^' };
string[] splitResult = words.Split(del,StringSplitOptions.RemoveEmptyEntries);
foreach (string w in splitResult)
{
char[] separator = { '*' };
string[] splitR = w.Split(separator);
if(splitR.Length==5)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
Console.WriteLine("{0},{1},{2},{3},{4}", first, second, third, fourth, fifth);
}
}
You are getting the exception "Index was outside the bounds of the array" because in the last iteration of the loop the split yields only one (empty) item. I suggest checking for five items:
words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";
char[] del = { '^' };
string[] splitResult = words.Split(del);
foreach (string w in splitResult)
{
char[] separator = { '*' };
string[] splitR = w.Split(separator);
if (splitR.Length>=5)
{
foreach (string e in splitR)
{
string first = splitR[0];
string second = splitR[1];
string third = splitR[2];
string fourth = splitR[3];
string fifth = splitR[4];
}
}
}
One line does it all:
var f = words.Split(new char[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Split(new char[] { '*' }).ToArray())
.ToArray();
Your inner loop does the same thing five times (you never use e).
The exception occurs because the trailing "^" produces a final empty string. Splitting that on "*" gives an array with a single empty element, so splitR[1] is out of range in the inner loop.
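A small sketch (hypothetical names) that makes the cause visible and shows the fix side by side, using the question's input:

```csharp
using System;
using System.Linq;

public static class TwoLevelSplitDemo
{
    private const string Words = "a*b*c*d*e^1*2*3*4*5^e*f*g*h*i^";

    // Without RemoveEmptyEntries the trailing '^' yields a 4th, empty group.
    public static string[] NaiveGroups() => Words.Split('^');

    // Splitting that empty group on '*' gives { "" }: one element, so index 1 throws.
    public static int ItemsInLastNaiveGroup() => NaiveGroups().Last().Split('*').Length;

    // Fixed version: drop empty groups, then split each group into its items.
    public static string[][] Groups() =>
        Words.Split(new[] { '^' }, StringSplitOptions.RemoveEmptyEntries)
             .Select(g => g.Split('*'))
             .ToArray();
}
```

NaiveGroups() has 4 entries and the last one splits into a single item, while Groups() has the 3 expected groups of 5 items each.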

Split strings from a text file

I have the following lines in a text file, "test":
Table Name.type
Market Drinks.tea
I want to split the strings so that I get the following output:
ObjectName = Table AttributeName = Name Attribute Type =type
ObjectName = Market AttributeName = Drinks Attribute Type =tea
Here is my code:
string[] lines = File.ReadAllLines(@"d:\test.txt");
int i = 0;
var items = from line in lines
where i++ != 0
select new{
objectName = line.Split(new char[] {' '})[0],
attrName = line.Split(new char[]{'.'})[1],
attrType = line.Split(new char[] { ' ' })[2]
};
foreach (var item in items)
{
Console.WriteLine("ObjectName = {0}, AttributeName = {1}, Attribute Type = {2}",
item.objectName, item.attrName, item.attrType);
}
I'm getting an out-of-bounds exception.
PS: there are no spaces at the end of the lines in the text file; I just wanted to test a character!
You don't need to wrap the separators in new char[] { ... }, because String.Split() takes a params char[] argument.
To fix the index-out-of-bounds, the last part of the select should become:
attrType = line.Split(' ', '.' )[2]
Edit
And thanks to @Kobi, a let clause lets you do the Split just once per line, a great improvement when you have many rows and/or columns.
var items = from line in lines
where i++ != 0
let words = line.Split(' ', '.')
select new
{
objectName = words[0],
attrName = words[1],
attrType = words[2]
};
Old answer
You can use the same Split for all 3 parts, making it a little easier to read:
select new{
objectName = line.Split(' ', '.' )[0],
attrName = line.Split(' ', '.' )[1],
attrType = line.Split(' ', '.' )[2]
};
Use a regular expression, which is more robust:
static void Main()
{
const string PATTERN = @"^([\w]+)\s+([\w]+)\.(\w+)";
const string TEXT = "Table Name.type\r\nMarket Drinks.tea";
foreach (Match match in Regex.Matches(TEXT, PATTERN, RegexOptions.Multiline))
{
Console.WriteLine("ObjectName = {0} AttributeName = {1} Attribute Type ={2}",
match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value);
}
}
Outputs:
ObjectName = Table AttributeName = Name Attribute Type =type
ObjectName = Market AttributeName = Drinks Attribute Type =tea
On the splitting part, you should do it like this (provided you are sure your input is in the correct format):
attrName = line.Split(' ')[1].Split('.')[0],
attrType = line.Split(' ')[1].Split('.')[1]
The out-of-bounds exception comes from this line: attrType = line.Split(new char[] { ' ' })[2]. Splitting "Table Name.type" on ' ' yields only two elements, so index 2 does not exist.
Your attrType should be line.Split(new char[] { '.' })[1];
and your attrName should be line.Split(new char[] { ' ' })[1].Split(new char[] { '.' })[0]
As Henk Holterman said, you don't need new char[] inside Split, so your lines would be:
attrType = line.Split('.')[1];
attrName = line.Split(' ')[1].Split('.')[0];
Thanks to the replies, here is the right answer for the desired output:
objectName = line.Split(' ')[0],
attrName = line.Split(' ')[1].Split('.')[0],
attrType = line.Split('.')[1]
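A self-contained sketch combining the let suggestion with a guard against malformed lines (the class and tuple field names are hypothetical). One observation from the question's code: where i++ != 0 skips the first line of the file, which would drop "Table Name.type"; this sketch assumes there is no header line to skip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AttributeLineParser
{
    // Parses lines shaped like "Table Name.type" into (object, attribute, type).
    // The Split on ' ' and '.' runs once per line thanks to `let`; lines that don't
    // produce exactly three parts are skipped instead of throwing.
    public static List<(string ObjectName, string AttrName, string AttrType)> Parse(IEnumerable<string> lines) =>
        (from line in lines
         let parts = line.Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries)
         where parts.Length == 3
         select (ObjectName: parts[0], AttrName: parts[1], AttrType: parts[2]))
        .ToList();
}
```

You could feed it File.ReadAllLines(@"d:\test.txt") and print each item with the Console.WriteLine format from the question.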
If you want to do it this way, just use a Regex; it's more flexible:
const string pattern = @"(?<objectName>\w+)\s(?<attrName>\w+)\.(?<attrType>\w+)";
string[] lines = File.ReadAllLines(@"e:\a.txt");
var items = from line in lines
let match = Regex.Match(line, pattern) // run the regex once per line
select new
{
objectName = match.Groups["objectName"].Value,
attrName = match.Groups["attrName"].Value,
attrType = match.Groups["attrType"].Value
};
foreach (var item in items.ToList())
{
Console.WriteLine("ObjectName = {0}, AttributeName = {1}, Attribute Type = {2}",
item.objectName, item.attrName, item.attrType);
}
