Split strings that have strange pattern - c#

I need help splitting a collection of strings that have a rather strange pattern.
Example data:
List<string> input = new List<string>();
input.Add("Blue Code \n 03 ID \n 05 Example \n Sky is blue");
input.Add("Green Code\n 01 ID\n 15");
input.Add("Test TestCode \n 99 \n Testing is fun");
Expected output:
For input[0]:
string part1 = "Blue"
string part2 = "Code \n 03"
string part3 = "ID \n 05"
string part4 = "Example \n Sky is blue"
For input[1]:
string part1 = "Green"
string part2 = "Code\n 01"
string part3 = "ID\n 15"
For input[2]:
string part1 = "Test"
string part2 = "TestCode \n 99"
string part3 = "\n Testing is fun"
Edited with one more example:
"038 038\n 0004 049.0\n 0006"
Expected output:
"038"
"038\n 0004"
"049.0\n 0006"
In short, I don't even know how to describe the pattern... It seems like I need the first string (acting as a key) right before the "\n" as part of the new string, but the last input, input[2], has a slightly different pattern from the other two. Also, please note the spaces; they are extremely inconsistent.
I know this is a long shot, but please let me know if anyone can figure out how to deal with these data.
Update: I think I can forget about solving this... When I looked at the database in detail, I found that the separator is NOT only \n; it can be... anything, including |a |b |c (a-z, A-Z) and \a \b \c (a-z, A-Z). Manually re-entering the data could be much easier...

I would say the pattern is:
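using System;
using System.Collections.Generic;

List<string> input = new List<string>();
input.Add("Blue Code \n 03 ID \n 05 Example \n Sky is blue");
input.Add("Green Code\n 01 ID\n 15");
input.Add("Test TestCode \n 99 \n Testing is fun");
foreach (string text in input)
{
    var parts = new List<string>();
    //1 Take first word
    string part1 = text.Split(' ')[0];
    parts.Add(part1);
    // Use Substring: rest.Skip(n).ToString() returns the iterator's type name, not the text
    string rest = text.Substring(part1.Length).TrimStart(' ');
    //while rest contains (\n number)
    while (rest.Contains("\n"))
    {
        //Take until \n number
        int index = rest.IndexOf('\n');
        string temp = rest.Substring(index + 1);      // text after "\n"
        string trimmed = temp.TrimStart(' ');
        string partNb = trimmed.Split(' ')[0];        // first token after "\n"
        int n;
        if (!int.TryParse(partNb, out n))             // test the token, not the literal "123"
            break;                                    // not a number: the rest is the last part
        int end = index + 1 + (temp.Length - trimmed.Length) + partNb.Length;
        parts.Add(rest.Substring(0, end));            // e.g. "Code \n 03"
        rest = rest.Substring(end).TrimStart(' ');
    }
    //Take rest
    if (rest.Length > 0)
        parts.Add(rest);
    Console.WriteLine(string.Join(" | ", parts));
}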
It could probably be written a bit more optimised, but you get the idea.

OK, I have this little code snippet that generates the output you are looking for. The pattern seems to be: Word [Key \n Value] [Key \n Value] [Key \n Value (with spaces)], where the Key can be empty. Is that right?
var input = new List<string>
{
"Blue Code \n 03 ID \n 05 Example \n Sky is blue",
"Green Code\n 01 ID\n 15",
"038 038\n 0004 049.0\n 0006",
"Test TestCode \n 99 \n Testing is fun"
};
var output = new List<List<string>>();
foreach (var item in input)
{
var items = new List<string> {item.Split(' ')[0]};
const string strRegex = @"(?<group>[a-zA-Z0-9\.]*\s*\n\s*[a-zA-Z0-9\.]*)";
var myRegex = new Regex(strRegex, RegexOptions.None);
var matchCollection = myRegex.Matches(item.Remove(0, item.Split(' ')[0].Length));
for (var i = 0; i < 2 && i < matchCollection.Count; i++) // avoid indexing past the matches found
{
if (matchCollection[i].Success)
{
items.Add(matchCollection[i].Value);
}
}
var index = item.IndexOf(items.Last()) + items.Last().Length;
var final = item.Substring(index);
if (final.Contains("\n"))
{
items.Add(final);
}
else
{
items[items.Count - 1] = items[items.Count - 1] + final;
}
output.Add(items);
}
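A minimal sketch of a different approach, assuming the pattern described in this answer: a first word, then [Key] \n Value groups where the key may be missing, and a last value that runs to the end of the string. Instead of one regex over whole groups, it tokenizes the rest into newlines and runs of non-whitespace, then cuts the original text by token positions so the inconsistent spacing is preserved. PatternSplitter.Split is a hypothetical name, and it only handles "\n", not the other separators mentioned in the question's update.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class PatternSplitter
{
    // Assumed pattern: FirstWord, then groups of "[Key] \n Value", where Key may be
    // missing and the value after the last "\n" runs to the end of the string.
    public static List<string> Split(string text)
    {
        var parts = new List<string>();
        int firstSpace = text.IndexOf(' ');
        if (firstSpace < 0)
        {
            parts.Add(text);
            return parts;
        }
        parts.Add(text.Substring(0, firstSpace));
        string rest = text.Substring(firstSpace + 1);

        // Tokens are either a newline or a run of non-whitespace characters;
        // their positions let us cut the original text without touching its spacing.
        List<Match> tokens = Regex.Matches(rest, @"\n|\S+").Cast<Match>().ToList();
        int lastNewline = tokens.FindLastIndex(t => t.Value == "\n");

        int i = 0;
        while (i < tokens.Count)
        {
            int nl = tokens[i].Value == "\n" ? i : i + 1;   // index of this group's "\n"
            if (nl >= tokens.Count || tokens[nl].Value != "\n")
            {
                parts.Add(tokens[i].Value);                 // stray word outside any group
                i++;
                continue;
            }
            int start = tokens[i].Index;
            int end;
            if (nl == lastNewline)
            {
                end = rest.Length;                          // last group: value runs to the end
                i = tokens.Count;
            }
            else if (tokens[nl + 1].Value == "\n")
            {
                end = tokens[nl].Index + 1;                 // no value before the next "\n"
                i = nl + 1;
            }
            else
            {
                Match value = tokens[nl + 1];               // value is a single word
                end = value.Index + value.Length;
                i = nl + 2;
            }
            parts.Add(rest.Substring(start, end - start));
        }
        return parts;
    }
}
```

For input[2] the second "\n" has no key in front of it, so that group starts at the newline itself and yields "\n Testing is fun", as the question expects.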

Related

How to count white space in a string based on other string

Suppose we have two string s1 and s2
s1 = "123 456 789 012 1234";
s2 = "1234567";
I want to print s2 with the whitespace given in s1. The output should be:
Output = "123 456 7";
An approach with a simple for loop:
string s1 = "123 456 789 012 1234";
string s2 = "1234567";
for (int i = 0; i < s1.Length && i < s2.Length; i++)
{
if (char.IsWhiteSpace(s1[i]))
{
s2 = s2.Insert(i, s1[i].ToString());
}
}
https://dotnetfiddle.net/POn5E2
You could loop over s2: first copy any spaces found at the current position of s1 into the result, then, if the next character of s1 equals the current character of s2, add it to the result.
string s1 = "123 456 789 012 1234";
string s2 = "1234567";
string result = "";
int s1_index = 0;
for (int i = 0; i < s2.Length; i++)
{
    // Copy the spaces from s1 that come before the next character
    while (s1[s1_index] == ' ')
    {
        result += ' ';
        s1_index++;
    }
    if (s2[i] == s1[s1_index])
        result += s1[s1_index];
    s1_index++;
}
Console.WriteLine(result);
Put a condition on the number of non-whitespace characters. It is not an efficient solution (it recounts the whole prefix for every character), but the idea is simple:
string s1 = "123 456 789 012 1234";
string s2 = "1234567";
var s2Length = s2.Length;
var array = s1.TakeWhile((c, index) => s1.Substring(0, index + 1)
.Count(f => !char.IsWhiteSpace(f)) <= s2Length)
.ToArray();
var result = new string(array); //123 456 7
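A linear alternative, sketched under the assumption that every non-whitespace character of s1 is simply a slot for the next character of s2. Unlike the character-comparing loop answer, the characters are not compared. ApplySpacing is a hypothetical helper name. A StringBuilder avoids the repeated string allocations of +=.

```csharp
using System.Text;

static class Spacing
{
    // Copies the whitespace layout of `template` onto the characters of `text`.
    public static string ApplySpacing(string template, string text)
    {
        var sb = new StringBuilder();
        int next = 0; // index of the next character of `text` to place
        foreach (char c in template)
        {
            if (next >= text.Length)
                break;                      // every character of `text` has been placed
            if (char.IsWhiteSpace(c))
                sb.Append(c);               // keep the template's whitespace
            else
                sb.Append(text[next++]);    // otherwise take the next character from `text`
        }
        // Assumption: if `text` is longer than the template, append the leftover characters unchanged.
        sb.Append(text, next, text.Length - next);
        return sb.ToString();
    }
}
```

With s1 and s2 from the question, Spacing.ApplySpacing(s1, s2) gives "123 456 7".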

Converting char to ASCII symbol

So I want to make "hello world!" in a creative way, and I came up with an idea to use ASCII, but I don't really know how to convert from char to ASCII symbol (or from string). Here is my code:
public static void Main()
{
List<string> imie = new List<string>();
greet();
}
public static string greet()
{
string s = "";
string nums = "104 101 108 108 111 32 119 111 114 108 100 33";
char[] numbers = nums.ToCharArray();
foreach (char l in numbers)
{
s += Encoding.ASCII.GetChars(new byte[] {l});
}
return s;
}
Also, on the line "s += Encoding.ASCII.GetChars(new byte[] {l});" I get the error "Cannot implicitly convert type 'char' to 'byte'. An explicit conversion exists (are you missing a cast?)"
Here you go:
public static string greet() {
string s = "";
string nums = "104 101 108 108 111 32 119 111 114 108 100 33";
var numbers = nums.Split(' ');
foreach (var nstr in numbers) {
int k = Int32.Parse(nstr);
s += Convert.ToChar(k);
}
return s;
}
Or better, since repeatedly appending to a string is inefficient:
public static string greet() {
StringBuilder s = new StringBuilder();
string nums = "104 101 108 108 111 32 119 111 114 108 100 33";
var numbers = nums.Split(' ');
foreach (var nstr in numbers) {
int k = Int32.Parse(nstr);
s.Append(Convert.ToChar(k));
}
return s.ToString();
}
public static string greet()
{
string nums = "104 101 108 108 111 32 119 111 114 108 100 33";
var bytes = nums.Split().Select(n => byte.Parse(n)).ToArray();
return Encoding.ASCII.GetString(bytes); // GetChars would return char[], not string
}
Quite creative, but it seems that the level of creativity does not match the level of your C# knowledge yet... There are many misunderstandings, which makes answering this question a bit hard.
Let's start in the Main() method:
you don't use the variable
List<string> imie = new List<string>();
but actually, that type of list would be useful in a different place of the program. For the moment, let's put this line of code inside the greet() method instead.
you call greet() which returns a string, but you never use the return value. Let's surround this by a print statement:
Console.WriteLine(greet());
The Main() method now looks like
public static void Main()
{
Console.WriteLine(greet());
}
Let's go on with the greet() method.
the variable s is not very descriptive. Let's rename it to helloworld, so you have a better idea of what it's being used for.
Instead of using a single string, let's take the idea of having a list of strings instead.
List<string> numbers = new List<string>{"104", "101", "108", "108", "111", "32", "119", "111", "114", "108", "100", "33"};
We can now get rid of nums and the old numbers variables. We don't need those.
The foreach loop now gives you a string for each number instead of single characters (iterating the old string gave you the individual digits of the numbers). Let's also change the variable name.
foreach (string number in numbers)
It's good practice to use a singular name for the loop variable and a plural name for the collection.
For the string concatenation, let's use int.Parse() instead of further messing with individual digits of a character. In order for the number to become a character, we need to cast it to a char
helloworld += (char) int.Parse(number);
The method:
public static string greet()
{
List<string> numbers = new List<string>{"104", "101", "108", "108", "111", "32", "119", "111", "114", "108", "100", "33"};
string helloworld = "";
foreach (string number in numbers)
{
helloworld += (char) int.Parse(number);
}
return helloworld;
}
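The question also asks about going from a string to codes. Here is a small sketch of both directions, assuming plain ASCII text: for characters 0 to 127, the UTF-16 code unit that (int)c returns equals the ASCII code. AsciiCodes, Encode and Decode are hypothetical names.

```csharp
using System;
using System.Linq;

static class AsciiCodes
{
    // "hi" -> "104 105": each character's numeric code, separated by spaces.
    public static string Encode(string text) =>
        string.Join(" ", text.Select(c => ((int)c).ToString()));

    // "104 105" -> "hi": parse each number and cast it to char.
    public static string Decode(string codes) =>
        new string(codes
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(n => (char)int.Parse(n))
            .ToArray());
}
```

Console.WriteLine(AsciiCodes.Decode("104 101 108 108 111 32 119 111 114 108 100 33")) prints the greeting. RemoveEmptyEntries tolerates accidental double spaces between the numbers.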

C# split text when delimiter may be in values [duplicate]

Given
2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34
How to use C# to split the above information into strings as follows:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
As you can see, one of the columns contains a comma (Corvallis, OR).
Based on
C# Regex Split - commas outside quotes
string[] result = Regex.Split(samplestring, ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
Use the Microsoft.VisualBasic.FileIO.TextFieldParser class. This will handle parsing a delimited file, TextReader or Stream where some fields are enclosed in quotes and some are not.
For example:
using System.IO; // for StringReader
using Microsoft.VisualBasic.FileIO;
string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields;
while (!parser.EndOfData)
{
fields = parser.ReadFields();
foreach (string field in fields)
{
Console.WriteLine(field);
}
}
parser.Close();
This should result in the following output:
2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.
You need to add a reference to Microsoft.VisualBasic in the Add References .NET tab.
This is late, but it may help someone. You can use a regex as below:
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = CSVParser.Split(Test);
I see that if you paste CSV-delimited text into Excel and do a "Text to Columns", it asks you for a "text qualifier". It defaults to a double quote, so text within double quotes is treated as literal. I imagine Excel implements this by going one character at a time: when it encounters a text qualifier, it keeps going until the next qualifier. You can probably implement this yourself with a for loop and a boolean that records whether you are inside literal text.
public string[] CsvParser(string csvText)
{
List<string> tokens = new List<string>();
int last = -1;
int current = 0;
bool inText = false;
while(current < csvText.Length)
{
switch(csvText[current])
{
case '"':
inText = !inText; break;
case ',':
if (!inText)
{
tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ','));
last = current;
}
break;
default:
break;
}
current++;
}
if (last != csvText.Length - 1)
{
tokens.Add(csvText.Substring(last+1).Trim());
}
return tokens.ToArray();
}
You could split on all commas that do have an even number of quotes following them.
You may also want to look at the CSV format specification (RFC 4180) for how commas are handled.
Useful Link : C# Regex Split - commas outside quotes
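Here is a sketch of the "even number of quotes after the comma" idea as a complete helper. It assumes one CSV record per string, RFC 4180 style escaping ("" inside a quoted field), and no line breaks inside quoted fields. After splitting, it strips the enclosing quotes and unescapes doubled quotes, both of which a bare Regex.Split leaves in place. QuotedCsv.SplitLine is a hypothetical name. Because the lookahead scans to the end of the line for every comma, the cost grows quadratically with line length, so a real parser such as TextFieldParser may be a better fit for large files.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedCsv
{
    // Matches a comma only if an even number of quotes follows it,
    // i.e. the comma sits outside any quoted field.
    static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] SplitLine(string line)
    {
        return CommaOutsideQuotes.Split(line)
            .Select(Unquote)
            .ToArray();
    }

    // Strip one pair of enclosing quotes and turn "" back into " (CSV escaping).
    static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        return field;
    }
}
```

For the sample line in the question this yields 12 fields, with Corvallis, OR as the seventh.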
Use a library like LumenWorks to do your CSV reading. It'll handle fields with quotes in them and will likely overall be more robust than your custom solution by virtue of having been around for a long time.
It is a tricky matter to parse .csv files when the .csv file could be either comma separated strings, comma separated quoted strings, or a chaotic combination of the two. The solution I came up with allows for any of the three possibilities.
I created a method, ParseCsvRow() which returns an array from a csv string. I first deal with double quotes in the string by splitting the string on double quotes into an array called quotesArray. Quoted string .csv files are only valid if there is an even number of double quotes. Double quotes in a column value should be replaced with a pair of double quotes (This is Excel's approach). As long as the .csv file meets these requirements, you can expect the delimiter commas to appear only outside of pairs of double quotes. Commas inside of pairs of double quotes are part of the column value and should be ignored when splitting the .csv into an array.
My method will test for commas outside of double quote pairs by looking only at even indexes of the quotesArray. It also removes double quotes from the start and end of column values.
public static string[] ParseCsvRow(string csvrow)
{
const string obscureCharacter = "ᖳ";
if (csvrow.Contains(obscureCharacter)) throw new Exception("Error: csv row may not contain the " + obscureCharacter + " character");
var unicodeSeparatedString = "";
var quotesArray = csvrow.Split('"'); // Split string on double quote character
if (quotesArray.Length > 1)
{
for (var i = 0; i < quotesArray.Length; i++)
{
// CSV must use double quotes to represent a quote inside a quoted cell
// Quotes must be paired up
// Test if a comma lies outside a pair of quotes (even index). If so, replace the comma with an obscure unicode character
if (i % 2 == 0)
{
// Replace every delimiter comma in this unquoted segment, not only a segment that is exactly ","
// (otherwise rows that mix quoted and unquoted fields are not split at all)
quotesArray[i] = quotesArray[i].Replace(",", obscureCharacter);
}
// Build string and Replace quotes where quotes were expected.
unicodeSeparatedString += (i > 0 ? "\"" : "") + quotesArray[i].Trim();
}
}
else
{
// String does not have any pairs of double quotes. It should be safe to just replace the commas with the obscure character
unicodeSeparatedString = csvrow.Replace(",", obscureCharacter);
}
var csvRowArray = unicodeSeparatedString.Split(obscureCharacter[0]);
for (var i = 0; i < csvRowArray.Length; i++)
{
var s = csvRowArray[i].Trim();
if (s.StartsWith("\"") && s.EndsWith("\""))
{
csvRowArray[i] = s.Length > 2 ? s.Substring(1, s.Length - 2) : ""; // Remove start and end quotes.
}
}
return csvRowArray;
}
One downside of my approach is the way I temporarily replace delimiter commas with an obscure unicode character. This character needs to be so obscure, it would never show up in your .csv file. You may want to put more handling around this.
This question and its duplicates have a lot of answers. I tried this one that looked promising, but found some bugs in it. I heavily modified it so that it would pass all of my tests.
/// <summary>
/// Returns a collection of strings that are derived by splitting the given source string at
/// characters given by the 'delimiter' parameter. However, a substring may be enclosed between
/// pairs of the 'qualifier' character so that instances of the delimiter can be taken as literal
/// parts of the substring. The method was originally developed to split comma-separated text
/// where quotes could be used to qualify text that contains commas that are to be taken as literal
/// parts of the substring. For example, the following source:
/// A, B, "C, D", E, "F, G"
/// would be split into 5 substrings:
/// A
/// B
/// C, D
/// E
/// F, G
/// When enclosed inside of qualifiers, the literal for the qualifier character may be represented
/// by two consecutive qualifiers. The two consecutive qualifiers are distinguished from a closing
/// qualifier character. For example, the following source:
/// A, "B, ""C"""
/// would be split into 2 substrings:
/// A
/// B, "C"
/// </summary>
/// <remarks>Originally based on: https://stackoverflow.com/a/43284485/2998072</remarks>
/// <param name="source">The string that is to be split</param>
/// <param name="delimiter">The character that separates the substrings</param>
/// <param name="qualifier">The character that is used (in pairs) to enclose a substring</param>
/// <param name="toTrim">If true, then whitespace is removed from the beginning and end of each
/// substring. If false, then whitespace is preserved at the beginning and end of each substring.
/// </param>
public static List<String> SplitQualified(this String source, Char delimiter, Char qualifier,
Boolean toTrim)
{
// Avoid throwing exception if the source is null
if (String.IsNullOrEmpty(source))
return new List<String> { "" };
var results = new List<String>();
var result = new StringBuilder();
Boolean inQualifier = false;
// The algorithm is designed to expect a delimiter at the end of each substring, but the
// expectation of the caller is that the final substring is not terminated by delimiter.
// Therefore, we add an artificial delimiter at the end before looping through the source string.
String sourceX = source + delimiter;
// Loop through each character of the source
for (var idx = 0; idx < sourceX.Length; idx++)
{
// If current character is a delimiter
// (except if we're inside of qualifiers, we ignore the delimiter)
if (sourceX[idx] == delimiter && inQualifier == false)
{
// Terminate the current substring by adding it to the collection
// (trim if specified by the method parameter)
results.Add(toTrim ? result.ToString().Trim() : result.ToString());
result.Clear();
}
// If current character is a qualifier
else if (sourceX[idx] == qualifier)
{
// ...and we're already inside of qualifier
if (inQualifier)
{
// check for double-qualifiers, which is escape code for a single
// literal qualifier character.
if (idx + 1 < sourceX.Length && sourceX[idx + 1] == qualifier)
{
idx++;
result.Append(sourceX[idx]);
continue;
}
// Since we found only a single qualifier, that means that we've
// found the end of the enclosing qualifiers.
inQualifier = false;
continue;
}
else
// ...we found an opening qualifier
inQualifier = true;
}
// If current character is neither qualifier nor delimiter
else
result.Append(sourceX[idx]);
}
return results;
}
Here are the test methods to prove that it works:
[TestMethod()]
public void SplitQualified_00()
{
// Example with no substrings
String s = "";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "" }, substrings);
}
[TestMethod()]
public void SplitQualified_00A()
{
// just a single delimiter
String s = ",";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "", "" }, substrings);
}
[TestMethod()]
public void SplitQualified_01()
{
// Example with no whitespace or qualifiers
String s = "1,2,3,1,2,3";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_02()
{
// Example with whitespace and no qualifiers
String s = " 1, 2 ,3, 1 ,2\t, 3 ";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_03()
{
// Example with whitespace and no qualifiers
String s = " 1, 2 ,3, 1 ,2\t, 3 ";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(
new List<String> { " 1", " 2 ", "3", " 1 ", "2\t", " 3 " },
substrings);
}
[TestMethod()]
public void SplitQualified_04()
{
// Example with no whitespace and trivial qualifiers.
String s = "1,\"2\",3,1,2,\"3\"";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
s = "\"1\",\"2\",3,1,\"2\",3";
substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_05()
{
// Example with no whitespace and qualifiers that enclose delimiters
String s = "1,\"2,2a\",3,1,2,\"3,3a\"";
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2,2a", "3", "1", "2", "3,3a" },
substrings);
s = "\"1,1a\",\"2,2b\",3,1,\"2,2c\",3";
substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1,1a", "2,2b", "3", "1", "2,2c", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_06()
{
// Example with qualifiers enclosing whitespace but no delimiter
String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_07()
{
// Example with qualifiers enclosing whitespace but no delimiter
String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", "2 ", "3", "1", "2", "\t3\t" },
substrings);
}
[TestMethod()]
public void SplitQualified_08()
{
// Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 \" , 3,1, 2 ,\" 3 \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
substrings);
}
[TestMethod()]
public void SplitQualified_09()
{
// Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 \" , 3,1, 2 ,\" 3 \"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 ", " 3", "1", " 2 ", " 3 " },
substrings);
}
[TestMethod()]
public void SplitQualified_10()
{
// Example with qualifiers enclosing whitespace and delimiter
String s = "\" 1 \",\"2 , 2b \",3,1,2,\" 3,3c \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2 , 2b", "3", "1", "2", "3,3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_11()
{
// Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
String s = "\" 1 \", \"2 , 2b \" , 3,1, 2 ,\" 3,3c \"";
// whitespace should be preserved
var substrings = s.SplitQualified(',', '"', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 , 2b ", " 3", "1", " 2 ", " 3,3c " },
substrings);
}
[TestMethod()]
public void SplitQualified_12()
{
// Example with tab characters between delimiters
String s = "\t1,\t2\t,3,1,\t2\t,\t3\t";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_13()
{
// Example with newline characters between delimiters
String s = "\n1,\n2\n,3,1,\n2\n,\n3\n";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_14()
{
// Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
String s = "\" 1 \",\"\"\"2 , 2b \"\"\",3,1,2,\" \"\"3,3c \"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "\"2 , 2b \"", "3", "1", "2", "\"3,3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_14A()
{
// Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
String s = "\"\"\"1\"\"\"";
// whitespace should be removed
var substrings = s.SplitQualified(',', '"', true);
CollectionAssert.AreEquivalent(new List<String> { "\"1\"" },
substrings);
}
[TestMethod()]
public void SplitQualified_15()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with no whitespace or qualifiers
String s = "1|2|3|1|2,2f|3";
var substrings = s.SplitQualified('|', '#', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2,2f", "3" }, substrings);
}
[TestMethod()]
public void SplitQualified_16()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with qualifiers enclosing whitespace and delimiter
String s = "# 1 #|#2 | 2b #|3|1|2|# 3|3c #";
// whitespace should be removed
var substrings = s.SplitQualified('|', '#', true);
CollectionAssert.AreEquivalent(new List<String> { "1", "2 | 2b", "3", "1", "2", "3|3c" },
substrings);
}
[TestMethod()]
public void SplitQualified_17()
{
// Instead of comma-delimited and quote-qualified, use pipe and hash
// Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
String s = "# 1 #| #2 | 2b # | 3|1| 2 |# 3|3c #";
// whitespace should be preserved
var substrings = s.SplitQualified('|', '#', false);
CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 | 2b ", " 3", "1", " 2 ", " 3|3c " },
substrings);
}
I had a problem with a CSV that contains fields with a quote character in them, so using the TextFieldParser, I came up with the following:
private static string[] parseCSVLine(string csvLine)
{
using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
{
TFP.HasFieldsEnclosedInQuotes = true;
TFP.SetDelimiters(",");
try
{
return TFP.ReadFields();
}
catch (MalformedLineException)
{
StringBuilder m_sbLine = new StringBuilder();
for (int i = 0; i < TFP.ErrorLine.Length; i++)
{
// i < Length - 1 keeps ErrorLine[i + 1] inside the string
if (i > 0 && i < TFP.ErrorLine.Length - 1 && TFP.ErrorLine[i] == '"' && TFP.ErrorLine[i + 1] != ',' && TFP.ErrorLine[i - 1] != ',')
m_sbLine.Append("\"\"");
else
m_sbLine.Append(TFP.ErrorLine[i]);
}
return parseCSVLine(m_sbLine.ToString());
}
}
}
A StreamReader is still used to read the CSV line by line, as follows:
using(StreamReader SR = new StreamReader(FileName))
{
while (SR.Peek() >-1)
myStringArray = parseCSVLine(SR.ReadLine());
}
With Cinchoo ETL, an open source library, column values containing separators are handled automatically.
string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
using (var p = ChoCSVReader.LoadText(csv))
{
Console.WriteLine(p.Dump());
}
Output:
Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34
For more information, please visit the CodeProject article.
Hope it helps.

How to Split Year month Day,if Year does not exist in the given string using c#?

The code below splits the numbers out of the given string and stores the corresponding integers in comboboxes. That works perfectly. But if Year does not exist in the string, how do I assign zero to Year and store the next integer (the month) in the second combobox?
For example, the string "4Month(s)2Day(s)" has no Year. How do I detect that and insert 0 into comboBox1, 4 into comboBox2 and 2 into comboBox3 in the following code?
int count = 0;
string[] delimiterChars = {"Year","Years","Years(s)","Month","Month(s)","Day","Day(s)"};
string variable =agee;
string[] words = variable.Split(delimiterChars, StringSplitOptions.None);
foreach (string s in words)
{
var data = Regex.Match(s, @"\d+").Value;
count++;
if (count == 1)
{
comboBox1.Text = data;
}
else if (count == 2)
{
comboBox2.Text = data;
}
else if (count == 3)
{
comboBox3.Text = data;
}
}
You can do it with Regex like this:
int combBox1 = 0, combBox2 = 0, combBox3 = 0; // a missing unit stays 0
var sample = "1Year(s)4month(s)2DaY(s)";
var yearString = Regex.Match(sample, @"\d+Year", RegexOptions.IgnoreCase).Value;
if (!string.IsNullOrEmpty(yearString))
combBox1 = int.Parse(Regex.Match(yearString, @"\d+").Value);
var monthString = Regex.Match(sample, @"\d+Month", RegexOptions.IgnoreCase).Value;
if (!string.IsNullOrEmpty(monthString))
combBox2 = int.Parse(Regex.Match(monthString, @"\d+").Value);
var dayStrings = Regex.Match(sample, @"\d+Day", RegexOptions.IgnoreCase).Value;
if (!string.IsNullOrEmpty(dayStrings))
combBox3 = int.Parse(Regex.Match(dayStrings, @"\d+").Value);
You can skip the int.Parse() calls if you want to keep the values as strings, but then you have to set "0" manually for a missing unit.
Instead of first splitting the string and then using a RegEx to parse the parts, I'd use a RegEx for the entire work.
Using Regex Hero's tester (requires Silverlight to work...) I came up with the following:
(?:(?<years>\d+)Year\(?s?\)?)?(?<months>\d+)Month\(?s?\)?(?<days>\d+)Day\(?s?\)?
This matches all of the following inputs
Input Matching groups:
***** ****************
4Month(s)2Day(s) months: 4, days: 2
1Year(s)4Month(s)2Day(s) years: 1, months: 4, days: 2
3Years6Month(s)14Day(s) years: 3, months: 6, days: 14
1Year1Month1Day years: 1, months: 1, days: 1
As you see, it matches everything that's there. If you don't have a match for years, you can test for that with the Success property of the capture group.
Sample
var pattern = @"(?:(?<years>\d+)Year\(?s?\)?)?(?<months>\d+)Month\(?s?\)?(?<days>\d+)Day\(?s?\)?";
var regex = new Regex(pattern);
var testCases = new List<string> {
"4Month(s)2Day(s)",
"1Year(s)4Month(s)2Day(s)",
"3Years6Month(s)14Day(s)",
"1Year1Month1Day"
};
foreach (var test in testCases) {
var match = regex.Match(test);
var years = match.Groups["years"].Success ? match.Groups["years"].Value : "0";
var months = match.Groups["months"].Value;
var days = match.Groups["days"].Value;
string.Format("input: {3}, years: {0}, months: {1}, days: {2}", years, months, days, test).Dump();
}
Run that in LinqPad, and you'll see
input: 4Month(s)2Day(s), years: 0, months: 4, days: 2
input: 1Year(s)4Month(s)2Day(s), years: 1, months: 4, days: 2
input: 3Years6Month(s)14Day(s), years: 3, months: 6, days: 14
input: 1Year1Month1Day, years: 1, months: 1, days: 1
I think you have another problem here. If you split the string, you don't know whether a value is a year, month or day; this information gets lost in the split. Maybe you should parse the string another way to keep this information.
You can create three boolean variables to check whether your string contains a year, month and day, and check them before assigning a value to each combobox.
bool hasYear = variable.Contains("Year");
bool hasMonth = variable.Contains("Month");
bool hasDay = variable.Contains("Day");
Use a better pattern
string input1 = "1Year(s)4Month(s)2Day(s)";
// The year and its label are optional together, so "14Month2Day" cannot capture year = 1
string pattern1 = @"(?:(?'year'\d+)Years?(\(s\))?)?(?'month'\d+)(Month(\(s\))?)?(?'day'\d+)(Day(\(s\))?)?";
Match match1 = Regex.Match(input1, pattern1);
string year1 = match1.Groups["year"].Value;
string month1 = match1.Groups["month"].Value;
string day1 = match1.Groups["day"].Value;
string input2 = "4Month(s)2Day(s)";
string pattern2 = @"(?:(?'year'\d+)Years?(\(s\))?)?(?'month'\d+)(Month(\(s\))?)?(?'day'\d+)(Day(\(s\))?)?";
Match match2 = Regex.Match(input2, pattern2);
string year2 = match2.Groups["year"].Value;
string month2 = match2.Groups["month"].Value;
string day2 = match2.Groups["day"].Value;
You could simply do this:
string agee = "1Year4Month(s)2Day(s)";
string[] delimiterChars = {"Year", "Month", "Day"};
string variable = agee.Replace("(s)", "").Replace("s", "");
string[] words = variable.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
int count = words.Length;
// Assumes a missing unit is always a leading one (Year first, then Month)
switch (count)
{
    case 0:
        comboBox1.Text = "0";
        comboBox2.Text = "0";
        comboBox3.Text = "0";
        break;
    case 1:
        comboBox1.Text = "0";
        comboBox2.Text = "0";
        comboBox3.Text = words[0];
        break;
    case 2:
        comboBox1.Text = "0";
        comboBox2.Text = words[0];
        comboBox3.Text = words[1];
        break;
    case 3:
        comboBox1.Text = words[0];
        comboBox2.Text = words[1];
        comboBox3.Text = words[2];
        break;
}
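Here is a sketch that answers the original question with one anchored regex in which every unit is optional on its own. A missing year (or month, or day) therefore becomes 0 instead of shifting the other values into the wrong combobox. DurationParser.Parse is a hypothetical name. The accepted label spellings (Year, Years, Year(s), case-insensitive) are an assumption based on the examples in this question.

```csharp
using System;
using System.Text.RegularExpressions;

static class DurationParser
{
    // Each unit is optional; "Year", "Years" and "Year(s)" are all accepted (case-insensitive).
    static readonly Regex Pattern = new Regex(
        @"^(?:(?<years>\d+)Years?(?:\(s\))?)?" +
        @"(?:(?<months>\d+)Months?(?:\(s\))?)?" +
        @"(?:(?<days>\d+)Days?(?:\(s\))?)?$",
        RegexOptions.IgnoreCase);

    public static (int Years, int Months, int Days) Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            throw new FormatException("Unrecognised duration: " + input);
        return (Value(m, "years"), Value(m, "months"), Value(m, "days"));
    }

    // A unit that is absent from the input counts as 0.
    static int Value(Match m, string group) =>
        m.Groups[group].Success ? int.Parse(m.Groups[group].Value) : 0;
}
```

In the form it could be used as var d = DurationParser.Parse(agee); comboBox1.Text = d.Years.ToString(); and likewise for Months and Days.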

How to pull out alpha and count digits using regex?

I want to build a regex in C#. I need to know how to pull out the letters and count the digits using a regex.
string example = "ASDFG 3457";
I need to strip out "ASDFG" and then count the digits (e.g. 4, or 5-7). If there are 4 digits, return 3457 without the letters. How can I do this in C#?
I know it would be better to do this without regex, but I have a requirement to use regex.
If all you're doing is getting the numbers out of a piece of text, you can do this:
string expr = @"\d+";
string text = "ASDFG 3457";
MatchCollection mc = Regex.Matches(text, expr);
foreach (Match m in mc)
{
Console.WriteLine(m);
}
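If you also need the digit count the question asks for, each `Match` already carries it: `Match.Length` is the length of the matched text, so for `\d+` it is the number of digits. A small sketch; `DigitRuns` is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns each run of digits together with its length, i.e. its digit count.
static (string Digits, int Count)[] DigitRuns(string text)
{
    MatchCollection mc = Regex.Matches(text, @"\d+");
    var runs = new (string Digits, int Count)[mc.Count];
    for (int i = 0; i < mc.Count; i++)
    {
        runs[i] = (mc[i].Value, mc[i].Length);
    }
    return runs;
}

foreach (var (digits, count) in DigitRuns("ASDFG 3457"))
{
    // prints "3457 has 4 digits"
    Console.WriteLine($"{digits} has {count} digits");
}
```

You could then keep only the runs whose `Count` is 4, for example.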
regex
(?<alpha>\w*) (?<number>\d*)
This extracts two named groups: alpha and number.
It assumes the first group contains only word characters and the second only digits, and that they are separated by a single space.
Neither group is mandatory.
If you need them to be mandatory, replace * with +.
You can also force the number to exactly four digits with \d{4}.
I'd recommend reading a regex tutorial and looking at some C# samples online; @Srb1313711's answer already helps you with that.
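Putting those suggestions together (+ instead of *, and \d{4}), here is a sketch of reading both named groups in C#. Swapping \w for [A-Za-z] and anchoring with ^ and $ are assumptions about what "only letters, then exactly four digits" should mean; they are not part of the answer itself.

```csharp
using System;
using System.Text.RegularExpressions;

// Letters, a single space, then exactly four digits and nothing else.
// [A-Za-z] replaces \w because \w would also accept digits and '_'.
var regex = new Regex(@"^(?<alpha>[A-Za-z]+) (?<number>\d{4})$");

Match m = regex.Match("ASDFG 3457");
if (m.Success)
{
    Console.WriteLine(m.Groups["alpha"].Value);  // prints "ASDFG"
    Console.WriteLine(m.Groups["number"].Value); // prints "3457"
}
```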
Obviously (cough) the simplest "solution" is here:
using System;
using System.Collections.Generic;
class Program
{
private static IEnumerable<long> ParseNumbers(IEnumerable<char> stream)
{
bool eos = false;
using (var it = stream.GetEnumerator())
do
{
Func<bool> advance = () => !(eos = !it.MoveNext());
while (advance() && !char.IsDigit(it.Current)) ;
if (eos) break;
long accum = 0;
do accum = accum * 10 + (it.Current - '0');
while (advance() && char.IsDigit(it.Current));
yield return accum;
}
while (!eos);
}
static void Main()
{
foreach (var num in ParseNumbers("ASDFG 3457 ASDFG.\n 123457"))
{
Console.WriteLine(num);
}
}
}
For fun, of course.
Edit
For more fun: the unsafe variation (it has to be compiled with /unsafe). Note that it is no longer deferred, so it won't work if the input has not all arrived yet, and it builds the whole list of values eagerly:
using System;
using System.Collections.Generic;
class Program
{
private static unsafe List<long> ParseNumbers(char[] input)
{
var r = new List<long>();
fixed (char* begin = input)
{
char* it = begin, end = begin + input.Length;
while (true)
{
while (it != end && (*it < '0' || *it > '9'))
++it;
if (it == end) break;
long accum = 0;
while (it != end && *it >= '0' && *it <= '9')
accum = accum * 10 + (*(it++) - '0');
r.Add(accum);
}
}
return r;
}
static void Main()
{
foreach (var number in ParseNumbers("ASDFG 3457 ASDFG.\n 123457".ToCharArray()))
{
Console.WriteLine(number);
}
}
}
Description
This regular expression will:
capture the text into group 1
count the number of digits and place them into a capture group based on how many were found
Capture group 2 will have numbers which are 8 or more digits long
Capture group 3 will have numbers which are 5-7 digits long
Capture group 4 will have numbers which are exactly 4 digits long
Capture group 5 will have numbers which are 1-3 digits long
([A-Za-z]*) (?:(\d{8,})|(\d{5,7})|(\d{4})|(\d{1,3}))
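In C#, the group that took part in the match tells you the digit-count band: `Group.Success` is true only for the alternative that matched. A sketch using the pattern above; `DigitClass` and its labels are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"([A-Za-z]*) (?:(\d{8,})|(\d{5,7})|(\d{4})|(\d{1,3}))");

// Hypothetical helper: only one of groups 2-5 can succeed, because they are
// alternatives inside one non-capturing group, tried longest band first.
static string DigitClass(Match match)
{
    if (match.Groups[2].Success) return "8+ digits";
    if (match.Groups[3].Success) return "5-7 digits";
    if (match.Groups[4].Success) return "4 digits";
    if (match.Groups[5].Success) return "1-3 digits";
    return "no match";
}

foreach (string line in new[] { "ASDFG 1234567890", "ASDFG 12345", "ASDFG 1234", "ASDFG 12" })
{
    Match m = regex.Match(line);
    // prints e.g. "ASDFG: 4 digits" for "ASDFG 1234"
    Console.WriteLine($"{m.Groups[1].Value}: {DigitClass(m)}");
}
```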
Example
Live Demo: http://www.rubular.com/r/AIO9uUNNQc
Sample Text
ASDFG 1234567890
ASDFG 123456789
ASDFG 12345678
ASDFG 1234567
ASDFG 123456
ASDFG 12345
ASDFG 1234
ASDFG 123
ASDFG 12
ASDFG 1
Capture Groups (groups not listed are empty)
[0] ASDFG 1234567890: [1] = ASDFG, [2] = 1234567890
[1] ASDFG 123456789: [1] = ASDFG, [2] = 123456789
[2] ASDFG 12345678: [1] = ASDFG, [2] = 12345678
[3] ASDFG 1234567: [1] = ASDFG, [3] = 1234567
[4] ASDFG 123456: [1] = ASDFG, [3] = 123456
[5] ASDFG 12345: [1] = ASDFG, [3] = 12345
[6] ASDFG 1234: [1] = ASDFG, [4] = 1234
[7] ASDFG 123: [1] = ASDFG, [5] = 123
[8] ASDFG 12: [1] = ASDFG, [5] = 12
[9] ASDFG 1: [1] = ASDFG, [5] = 1
