How to get the values of a specific column in a CSV file using C#?

I am working on a C# WinForms project.
I want to get the values of the first column of a CSV file.
var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
List<string> listB = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(';');
listA.Add(values[0]);
}
------------------
no |name |
------------------
1 |wwwwww
2 |yyyyy
3 |aaaaa
4 |bbbbbb
I am currently using the code above. It gives me the values row by row, but I want all of the name values in listA.
Does anyone have an idea?

There is no way to read a column in a CSV file without reading the whole file. You can use a wrapper (for example, the LINQ to CSV library), but it will just "hide" the reading operation.

Yes - you're currently splitting on ;
Try using a comma instead.
Better to use a dedicated library, by the way.

Some frown upon Regex, but I think it provides good flexibility. Here is an example inspired by Adrian Mejia. Basically, you can choose the quote characters between which the delimiter counts as part of the text, i.e. a comma in "hello, world" or 'hello, world' would be valid.
static void Main(string[] args)
{
string csv = "Hello,1,3.5,25,\"speech marks\",'inverted commas'\r\nWorld,2,4,60,\"again, more speech marks\",'something else in inverted commas, with a comma'";
// General way to create grouping constructs which are valid 'text' fields
string p = "{0}([^{0}]*){0}"; // match group '([^']*)' (inverted commas) or \"([^\"]*)\" (speech marks)
string c = "(?<={0}|^)([^{0}]*)(?:{0}|$)"; // commas or other delimiter group (?<=,|^)([^,]*)(?:,|$)
char delimiter = ','; // this can be whatever delimiter you like
string p1 = String.Format(p, "\""); // speechmarks group (0)
string p2 = String.Format(p, "'"); // inverted comma group (1)
string c1 = String.Format(c, delimiter); // delimiter group (2)
/*
* The first capture group will be speech marks ie. "some text, "
* The second capture group will be inverted commas ie. 'this text'
* The third is everything else separated by commas i.e. this,and,this will be [this][and][this]
* You can extend this to customise delimiters that represent text where a comma between is a valid entry eg. "this text, complete with a pause, is perfectly valid"
*
* */
//string pattern = "\"([^\"]*)\"|'([^']*)'|(?<=,|^)([^,]*)(?:,|$)";
string pattern = String.Format("{0}|{1}|{2}", new object[] { p1, p2, c1 }); // The actual pattern to match based on groups
string text = csv;
// If you're reading from a text file then this will do the trick. Uses the ReadToEnd() to put the whole file to a string.
//using (TextReader tr = new StreamReader("PATH TO MY CSV FILE", Encoding.ASCII))
//{
// text = tr.ReadToEnd(); // just read the whole stream
//}
string[] lines = text.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries); // if you have a blank line just remove it?
Regex regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase); // compile for speed
List<object> rowsOfColumns = new List<object>();
foreach (string row in lines)
{
List<string> columns = new List<string>();
// Find matches.
MatchCollection matches = regex.Matches(row);
foreach (Match match in matches)
{
for (int ii = 0; ii < match.Groups.Count; ii++)
{
if (match.Groups[ii].Success) // ignore things that don't match
{
columns.Add(match.Groups[ii].Value.TrimEnd(new char[] { delimiter })); // strip the delimiter
break;
}
}
}
// Do something with your columns here (add to List for example)
rowsOfColumns.Add(columns);
}
}

var reader = new StreamReader(File.OpenRead(@"C:\test.csv"));
List<string> listA = new List<string>();
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
string[] dates = line.Split(',');
for (int i = 0; i < dates.Length; i++)
{
if(i==0)
listA.Add(dates[0]);
}
}
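The answers above can be combined into a small helper. This is only a sketch under stated assumptions: the file is simple (one record per line, no quoted fields), the delimiter is a single character, and the first line is a header. The names CsvColumnSketch and ReadColumn are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvColumnSketch
{
    // Returns the values of one column (zero-based index).
    // Assumes no quoted fields and no delimiters embedded in values.
    public static List<string> ReadColumn(
        IEnumerable<string> lines, int columnIndex,
        char delimiter = ',', bool hasHeader = true)
    {
        return lines
            .Skip(hasHeader ? 1 : 0)                        // drop the header row
            .Where(line => !string.IsNullOrWhiteSpace(line)) // ignore blank lines
            .Select(line => line.Split(delimiter))
            .Where(fields => fields.Length > columnIndex)    // skip short rows
            .Select(fields => fields[columnIndex].Trim())
            .ToList();
    }
}
```

For a file, something like CsvColumnSketch.ReadColumn(File.ReadLines(@"C:\test.csv"), 1) would collect the name column; File.ReadLines (System.IO) still reads every line, it just does so lazily.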

Related

Splitting a line of text that has key value pairs where value can be empty

I need to split a line of text
The general syntax for a delivery instruction is |||name|value||name|value||…..|||
Each delivery instruction starts and ends with 3 pipe characters - |||
A delivery instruction is a set of name/value pairs separated by a single pipe eg name|value
Each name value pair is separated by 2 pipe characters ||
Names and Values may not contain the pipe character
The value of any pair may be a blank string.
I need a regex that will help me resolve the above problem.
My latest attempt with my limited Regex skills:
string SampleData = "|||env|af245g||mail_idx|39||gen_date|2016/01/03 11:40:06||docm_name|Client Statement (01.03.2015−31.03.2015)||docm_cat_name|Client Statement||docm_type_id|9100||docm_type_name|Client Statement||addr_type_id|1||addr_type_name|Postal address||addr_street_nr|||addr_street_name|Robinson Road||addr_po_box|||addr_po_box_type|||addr_postcode|903334||addr_city|Singapore||addr_state|||addr_country_id|29955||addr_country_name|Singapore||obj_nr|10000023||bp_custr_type|Customer||access_portal|Y||access_library|Y||avsr_team_id|13056||pri_avsr_id|||pri_avsr_name|||ctact_phone|||dlv_type_id|5001||dlv_type_name|Channel to standard mail||ao_id|14387||ao_name|Corp Limited||ao_title|||ao_mob_nr|||ao_email_addr||||??";
string[] Split = Regex.Matches(SampleData, @"(\|\|\|(?:\w+\|\w*\|\|)*\|)").Cast<Match>().Select(m => m.Value).ToArray();
The expected output should be as follows(based on the sample data string provided):
env|af245g
mail_idx|39
gen_date|2016/01/03 11:40:06
docm_name|Client Statement (01.03.2015−31.03.2015)
docm_cat_name|Client Statement
docm_type_id|9100
docm_type_name|Client Statement
addr_type_id|1
addr_type_name|Postal address
addr_street_nr|
addr_street_name|Robinson Road
addr_po_box|
addr_po_box_type|
addr_postcode|903334
addr_city|Singapore
addr_state|
addr_country_id|29955
addr_country_name|Singapore
obj_nr|10000023
bp_custr_type|Customer
access_portal|Y
access_library|Y
avsr_team_id|13056
pri_avsr_id|
pri_avsr_name|
ctact_phone|
dlv_type_id|5001
dlv_type_name|Channel to standard mail
ao_id|14387
ao_name|Corp Limited
ao_title|
ao_mob_nr|
ao_email_addr|
You can also do it without using Regex. It's just simple splitting.
string nameValues = "|||zeeshan|1||ali|2||ahsan|3|||";
string sub = nameValues.Substring(3, nameValues.Length - 6);
Dictionary<string, string> dic = new Dictionary<string, string>();
string[] subsub = sub.Split(new string[] {"||"}, StringSplitOptions.None);
foreach (string item in subsub)
{
string[] nameVal = item.Split('|');
dic.Add(nameVal[0], nameVal[1]);
}
foreach (var item in dic)
{
// Retrieve key and value here i.e:
// item.Key
// item.Value
}
Hope this helps.
I think you're making this more difficult than it needs to be. This regex yields the desired result:
@"[^|]+\|([^|]*)"
Assuming you're dealing with a single, well-formed delivery instruction, there's no need to match the starting and ending triple-pipes. You don't need to worry about the double-pipe separators either, because the "name" part of the "name|value" pair is always present. Just look for the first thing that looks like a name with a pipe following it, and everything up to the next pipe character is the value.
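As a rough illustration of the pattern above, here is a sketch that adds a second capture group so the name is returned along with the value. The names DeliveryInstructionSketch and ParsePairs are hypothetical, and the input is assumed to be a single well-formed delivery instruction.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DeliveryInstructionSketch
{
    // Group 1: the name (one or more non-pipe characters).
    // Group 2: the value (zero or more non-pipe characters, so it may be empty).
    private static readonly Regex PairPattern = new Regex(@"([^|]+)\|([^|]*)");

    public static List<KeyValuePair<string, string>> ParsePairs(string instruction)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in PairPattern.Matches(instruction))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return pairs;
    }
}
```

A list of pairs is used instead of a Dictionary so that a repeated name would not throw; switch to a Dictionary if names are known to be unique.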
(?<=\|\|\|).*?(?=\|\|\|)
You can use this to get all the key value pairs between |||. See demo.
https://regex101.com/r/fM9lY3/59
string strRegex = @"(?<=\|\|\|).*?(?=\|\|\|)";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline);
string strTargetString = @"|||env|af245g||mail_idx|39||gen_date|2016/01/03 11:40:06||docm_name|Client Statement (01.03.2015−31.03.2015)||docm_cat_name|Client Statement||docm_type_id|9100||docm_type_name|Client Statement||addr_type_id|1||addr_type_name|Postal address||addr_street_nr|||addr_street_name|Robinson Road||addr_po_box|||addr_po_box_type|||addr_postcode|903334||addr_city|Singapore||addr_state|||addr_country_id|29955||addr_country_name|Singapore||obj_nr|10000023||bp_custr_type|Customer||access_portal|Y||access_library|Y||avsr_team_id|13056||pri_avsr_id|||pri_avsr_name|||ctact_phone|||dlv_type_id|5001||dlv_type_name|Channel to standard mail||ao_id|14387||ao_name|Corp Limited||ao_title|||ao_mob_nr|||ao_email_addr||||??";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
Here's a variation of @Syed Muhammad Zeeshan's code that runs faster:
string nameValues = "|||zeeshan|1||ali|2||ahsan|3|||";
string[] nameArray = nameValues.Split(new char[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
Dictionary<string, string> dic = new Dictionary<string, string>();
int i = 0;
foreach (string item in nameArray)
{
if (i < nameArray.Length - 1)
dic.Add(nameArray[i], nameArray[i + 1]);
i = i + 2;
}
Interesting, I would like to try:
class Program
{
static void Main(string[] args)
{
string nameValueList = "|||zeeshan|1||ali|2||ahsan|3|||";
while (nameValueList != "|||")
{
nameValueList = nameValueList.TrimStart('|');
string nameValue = GetNameValue(ref nameValueList);
Console.WriteLine(nameValue);
}
Console.ReadLine();
}
private static string GetNameValue(ref string nameValues)
{
string retVal = string.Empty;
while(nameValues[0] != '|') // for name
{
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
}
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
while (nameValues[0] != '|') // for value
{
retVal += nameValues[0];
nameValues = nameValues.Remove(0, 1);
}
return retVal;
}
}
https://dotnetfiddle.net/WRbsRu

How to parse out all unique variables with a certain naming convention?

I have a code file and I need to find all unique objects of type TADODataSet, but they aren't defined in this 30,000 line file I have.
I wrote a console application that splits each line into individual words and adds that word to a list if it contains ADODataSet (the naming convention prefix for the objects I'm interested in) but this didn't work quite right because of how I'm splitting my lines of code.
This is all of my code:
static void Main(string[] args)
{
string file = @"C:\somePath\Form1.cs";
string output = @"C:\someOtherPath\New Text Document.txt";
List<string> datasets = new List<string>();
string[] lines = File.ReadAllLines(file);
foreach (string line in lines)
{
string[] words = line.Split(' ');
foreach (string word in words)
{
if (word.ToLower().Contains("adodataset"))
datasets.Add(word);
}
}
if (datasets.Count > 0)
{
using (StreamWriter sw = new StreamWriter(output))
{
foreach (string dataset in datasets.Distinct())
{
sw.WriteLine(dataset);
}
}
Console.WriteLine(String.Format("Wrote {0} data sets to {1}", datasets.Distinct().Count(), output));
Console.ReadKey();
}
}
But this didn't work as I hoped, and added "words" such as these:
SQLText(ADODataSetEnrollment->FieldByName("Age1")->AsString)
SQLText(ADODataSetEnrollment->FieldByName("Age2")->AsString)
SQLText(ADODataSetEnrollment->FieldByName("Age3")->AsString)
I'm only interested in ADODataSetEnrollment, so I should only have 1 entry for that variable in my output file but because that line of code doesn't contain a space it's treated as a single "word".
How can I split my lines array instead, so that way I can find unique variables?
Have you tried Regex matching? With Regex you can, for example, write
Regex.IsMatch(word, @"(?i)(?<!\w)adodataset(?!\w)")
(?i) makes the match case-insensitive.
(?<!\w) means not preceded by a word character (a letter, digit, or underscore).
(?!\w) means not followed by a word character.
Regex.IsMatch(...) returns a bool value.
Ended up with this as a solution:
string file = @"C:\somePath\Form1.cs";
string output = @"C:\someOtherPath\New Text Document.txt";
List<string> datasets = new List<string>();
string[] lines = File.ReadAllLines(file);
decimal i = 0;
foreach (string line in lines)
{
string[] words = line.Split(' ');
foreach (string word in words)
{
if (word.ToLower().Contains("adodataset"))
{
int start = word.ToLower().IndexOf("adodataset");
string dsWord = String.Empty;
string temp = word.Substring(start, word.Length - start);
foreach (char c in temp)
{
if (Char.IsLetter(c))
dsWord += c;
else
break;
}
if (dsWord != String.Empty)
datasets.Add(dsWord);
}
}
i++;
Console.Write("\r{0}% ", Math.Round(i / lines.Count() * 100, 2));
}
if (datasets.Count > 0)
{
using (StreamWriter sw = new StreamWriter(output))
{
foreach (string dataset in datasets.Distinct())
sw.WriteLine(dataset);
}
Console.WriteLine(String.Format("Wrote {0} data sets to {1}", datasets.Distinct().Count(), output));
Console.ReadKey();
}
Pretty ghetto, but it did what I needed it to do. I'll happily accept someone else's answer though if they know of a better way to use Regex to just pull out the variable name from within the line of code, rather than the whole line itself.
You can try this solution:
string file = File.ReadAllText(@"text.txt");
string output = @"C:\someOtherPath\New Text Document.txt";
List<string> datasets = new List<string>();
var a = Regex.Matches(file, @"\W(ADODataSet\w*)", RegexOptions.IgnoreCase);
foreach (Match m in a)
{
datasets.Add(m.Groups[1].Value);
}
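To get each variable only once, the matches can be collected into a set. The following is a sketch rather than a definitive implementation: it assumes every variable of interest starts with the ADODataSet prefix, it swaps the leading \W for a \b word boundary so a name at the very start of a line is also found, and the names DataSetNameSketch and FindNames are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DataSetNameSketch
{
    // \b anchors the match at the start of an identifier;
    // \w* then consumes the rest of the variable name.
    private static readonly Regex NamePattern =
        new Regex(@"\bADODataSet\w*", RegexOptions.IgnoreCase);

    public static SortedSet<string> FindNames(IEnumerable<string> lines)
    {
        // The set drops duplicates; the comparer treats case variants as one name.
        var names = new SortedSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            foreach (Match m in NamePattern.Matches(line))
            {
                names.Add(m.Value);
            }
        }
        return names;
    }
}
```

Passing File.ReadLines(file) as the argument and writing the result with File.WriteAllLines(output, names) would reproduce the original program's output file.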

Replace placeholders in order

I have a part of a URL like this:
/home/{value1}/something/{anotherValue}
Now I want to replace everything between the braces, including the braces, with values from a string array.
I tried this RegEx pattern: \{[a-zA-Z_]\} but it doesn't work.
Later (in C#) I want to replace the first match with the first value of the array, second with the second.
Update: The /'s can't be used as separators. Only the placeholders {...} should be replaced.
Example: /home/before{value1}/and/{anotherValue}
String array: {"Tag", "1"}
Result: /home/beforeTag/and/1
I hoped it could work like this:
string input = @"/home/before{value1}/and/{anotherValue}";
string pattern = @"\{[a-zA-Z_]\}";
string[] values = {"Tag", "1"};
MatchCollection mc = Regex.Match(input, pattern);
for(int i, ...)
{
mc.Replace(values[i];
}
string result = mc.GetResult;
Edit:
Thank you Devendra D. Chavan and ipr101,
both solutions are great!
You can try this code fragment,
// Begin with '{' followed by any number of word like characters and then end with '}'
var pattern = @"{\w*}";
var regex = new Regex(pattern);
var replacementArray = new [] {"abc", "cde", "def"};
var sourceString = @"/home/{value1}/something/{anotherValue}";
var matchCollection = regex.Matches(sourceString);
for (int i = 0; i < matchCollection.Count && i < replacementArray.Length; i++)
{
sourceString = sourceString.Replace(matchCollection[i].Value, replacementArray[i]);
}
[a-zA-Z_] describes a character class, which matches a single character. To match whole words, add * at the end (any number of characters within a-zA-Z_).
Then, to match 'value1', you'll need to add digit support: [a-zA-Z0-9_]*, which can be shortened to \w*
So try this one: {\w*}
But for replacing in C#, string.Split('/') might be easier, as Fredrik proposed.
You could use a delegate, something like this -
string[] strings = {"dog", "cat"};
int counter = -1;
string input = @"/home/{value1}/something/{anotherValue}";
Regex reg = new Regex(@"\{([a-zA-Z0-9]*)\}");
string result = reg.Replace(input, delegate(Match m) {
counter++;
return "{" + strings[counter] + "}";
});
My two cents:
// input string
string txt = "/home/{value1}/something/{anotherValue}";
// template replacements
string[] str_array = { "one", "two" };
// regex to match a template
Regex regex = new Regex("{[^}]*}");
// replace the first template occurrence for each element in array
foreach (string s in str_array)
{
txt = regex.Replace(txt, s, 1);
}
Console.Write(txt);
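For reference, the in-order replacement can also be done in a single pass with a match evaluator. This sketch assumes placeholders look like {word}, drops the braces as in the question's expected result, and leaves any placeholder untouched once the values run out; the names PlaceholderSketch and Fill are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderSketch
{
    // Replaces the n-th {placeholder} with the n-th value.
    public static string Fill(string template, IList<string> values)
    {
        int index = 0;
        return Regex.Replace(template, @"\{\w*\}", m =>
            // Keep the original placeholder text if there is no value left for it.
            index < values.Count ? values[index++] : m.Value);
    }
}
```

Because the whole string is processed once, a replacement value that itself contains braces is not scanned again, which can happen when Replace is called repeatedly in a loop.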

Regex Replace 2 values in C#

If I have a RichTextBox_1 that look like this:
TEXT TEXT 444.444 555.555 270
TEXT TEXT 444.444 555.555 270
TEXT TEXT 444.444 555.555 270
And I would like to replace the values in the third column with values in a RichTextBox_2:
123.456
12.345
-123.987
And the values in the fourth column with values in a RichTextBox_3:
-9.876
98.76
-987.654
To get a final file:
TEXT TEXT 123.456 9.876 270
TEXT TEXT 12.345 98.76 270
TEXT TEXT -123.987 -987.654 270
How could I do this using REGEX?
EDIT:
CODE:
(Splits the values from a ListBox into: RichTextBox_2 and RichTextBox_3. Instead of the ListBox I have moved everything in this to a RichTextBox_1)
private void calculateXAndYPlacement()
{
// Reads the lines in the file to format.
var fileReader = File.OpenText(filePath + "\\Calculating X,Y File.txt");
// Creates a list for the lines to be stored in.
var fileList = new List<string>();
// Adds each line in the file to the list.
var fileLines = ""; // UPDATED @Corey Ogburn
while ((fileLines = fileReader.ReadLine()) != null) // UPDATED @Corey Ogburn
fileList.Add(fileLines); // UPDATED @Corey Ogburn
// Creates new lists to hold certain matches for each list.
var xyResult = new List<string>();
var xResult = new List<string>();
var yResult = new List<string>();
// Iterate over each line in the file and extract the x and y values
fileList.ForEach(line =>
{
Match xyMatch = Regex.Match(line, @"(?<x>-?\d+\.\d+)\s+(?<y>-?\d+\.\d+)");
if (xyMatch.Success)
{
// Grab the x and y values from the regular expression match
String xValue = xyMatch.Groups["x"].Value;
String yValue = xyMatch.Groups["y"].Value;
// Add these two values, separated by a space, to the "xyResult" list.
xyResult.Add(String.Join(" ", new[]{ xValue, yValue }));
// Add the results to the lists.
xResult.Add(xValue);
yResult.Add(yValue);
// Calculate the X & Y values (including the x & y displacements)
double doubleX = double.Parse(xValue);
double doubleXValue = double.Parse(xDisplacementTextBox.Text);
StringBuilder sbX = new StringBuilder();
sbX.AppendLine((doubleX + doubleXValue).ToString());
double doubleY = double.Parse(yValue);
double doubleYValue = double.Parse(yDisplacementTextBox.Text);
StringBuilder sbY = new StringBuilder();
sbY.AppendLine((doubleY + doubleYValue).ToString());
calculatedXRichTextBox.AppendText(sbX + "\n");
calculatedYRichTextBox.AppendText(sbY + "\n");
}
});
}
I was trying to mess around with Regex.Replace, but I am having some trouble. Here is what I was trying; it does not work:
var combinedStringBuilders = new List<string>();
combinedStringBuilders.Add(String.Concat(sbX + "\t" + sbY));
var someNew = Regex.Replace(line, @"(?<x>-?\d+\.\d+)\s+(?<y>-?\d+\.\d+)", combinedStringBuilders);
A general, non-language-specific method.
(I haven't done enough C# to write the specifics.)
First validate that each line has exactly 5 columns.
This can be done by alternating non-whitespace and whitespace.
regex: ^(\s*\S+\s+\S+\s+)(\S+)(\s+)(\S+)(\s+\S+\s*)$
replacement: ${1}${sbX}${3}${sbY}${5}
Open the 3 files. In a while loop, read a line from each file.
Validate that sbX and sbY contain no whitespace.
Do the replacement and append the modified line to a file.
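In C#, the five-column pattern above might be applied roughly as follows. This is a sketch under the assumption that every line has exactly five whitespace-separated columns; lines that do not match are returned unchanged. The names ColumnReplaceSketch and ReplaceXY are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ColumnReplaceSketch
{
    // Groups 1, 3 and 5 keep the untouched columns and their spacing;
    // groups 2 and 4 are the third and fourth columns being replaced.
    private static readonly Regex FiveColumns =
        new Regex(@"^(\s*\S+\s+\S+\s+)(\S+)(\s+)(\S+)(\s+\S+\s*)$");

    public static string ReplaceXY(string line, string newX, string newY)
    {
        // A match evaluator is used so the new values are inserted literally,
        // without being interpreted as replacement patterns.
        return FiveColumns.Replace(line, m =>
            m.Groups[1].Value + newX + m.Groups[3].Value + newY + m.Groups[5].Value);
    }
}
```

Calling ReplaceXY once per line, with the matching lines from the second and third text boxes as newX and newY, would build the final file.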
You should simply split your string into lines and then columns, I'm guessing your text has msdos line endings (cr+lf).
public string ReplaceColumn( string text, int col, List<string> newValues ){
var sb = new StringBuilder();
var lines = Regex.Split( text, "[\r\n]+" ); // split into lines
for ( int row = 0; row < lines.Length; row++ ){
var line = lines[row];
var columns = Regex.Split( line, @"\s+" ); // split into columns
// replace the chosen column for this row
columns[col] = newValues[row];
// rebuild the line and store it
sb.Append( String.Join( " ", columns ) );
sb.Append( "\r\n" ); // or whatever line ending you want
}
return sb.ToString();
}
Of course, the above doesn't work too well if your text columns contain whitespace.
// Reads the lines in the file to format.
var fileReader = File.OpenText(filePath + "\\Calculating X,Y File.txt");
// Creates a list for the lines to be stored in.
var fileList = new List<string>();
// Adds each line in the file to the list.
var fileLines = "";
while ((fileLines = fileReader.ReadLine()) != null)
fileList.Add(fileLines);
// Creates new lists to hold certain matches for each list.
var xyResult = new List<string>();
var xResult = new List<string>();
var yResult = new List<string>();
// Iterate over each line in the file and extract the x and y values
fileList.ForEach(line =>
{
Match xyMatch = Regex.Match(line, @"(?<x>-?\d+\.\d+)\s+(?<y>-?\d+\.\d+)");
if (xyMatch.Success)
{
// Grab the x and y values from the regular expression match
String xValue = xyMatch.Groups["x"].Value;
String yValue = xyMatch.Groups["y"].Value;
// Add these two values, separated by a space, to the "xyResult" list.
xyResult.Add(String.Join(" ", new[] { xValue, yValue }));
// Add the results to the lists.
xResult.Add(xValue);
yResult.Add(yValue);
// Store the old X and Y values.
oldXRichTextBox.AppendText(xValue + Environment.NewLine);
oldYRichTextBox.AppendText(yValue + Environment.NewLine);
try
{
// Calculate the X & Y values (including the x & y displacements)
double doubleX = double.Parse(xValue);
double doubleXValue = double.Parse(xDisplacementTextBox.Text);
StringBuilder sbX = new StringBuilder();
sbX.AppendLine((doubleXValue - doubleX).ToString());
double doubleY = double.Parse(yValue);
double doubleYValue = double.Parse(yDisplacementTextBox.Text);
StringBuilder sbY = new StringBuilder();
sbY.AppendLine((doubleY + doubleYValue).ToString());
calculateXRichTextBox.AppendText(sbX + "");
calculateYRichTextBox.AppendText(sbY + "");
// Removes the blank lines.
calculateXRichTextBox.Text = Regex.Replace(calculateXRichTextBox.Text, @"^\s*$(\n|\r|\r\n)", "", RegexOptions.Multiline);
calculateYRichTextBox.Text = Regex.Replace(calculateYRichTextBox.Text, @"^\s*$(\n|\r|\r\n)", "", RegexOptions.Multiline);
}

c# reading in a text file into datatable

I need to read files that look like this into a DataTable:
A02 BLANK031
B02 F357442
C02 F264977
D02 BLANK037
E02 F272521
F02 E121562
G02 F264972
H02 F332321
A03 E208240
B03 F313854
C03 E229786
D03 E229787
E03 F307584
F03 F357478
I have a weird delimiter and some trailing spaces.
How would I read this into a DataTable such that the first column contains 'A02', 'B02', ... and the second column contains 'BLANK031', 'F357442', etc.?
Currently I am doing:
DataTable dt = new DataTable();
using (TextReader tr = File.OpenText(batchesAddresses[index]))
{
string line;
while ((line = tr.ReadLine()) != null)
{
string[] items = Regex.Split(line, " ");
if (dt.Columns.Count == 0)
{
// Create the data columns for the data table based on the number of items
// on the first line of the file
for (int i = 0; i < items.Length; i++)
dt.Columns.Add(new DataColumn("Column" + i, typeof(string)));
}
dt.Rows.Add(items);
}
}
But this is not working because I have trailing spaces and multiple spaces between columns.
If you use:
static readonly char[] space = { ' ' };
...
string[] items = line.Split(space, StringSplitOptions.RemoveEmptyEntries);
you should get the 2 values you expect, although something more selective might be desirable, especially if the right-hand-side might contain a space in the middle.
Change your regex to something like: (\w{3})\s+(\w{5,10}). This means capture 3 word chars (including digits) into group 1, look for one or more whitespace characters, and then capture 5-10 word chars into group 2.
Then do:
Regex r = new Regex(@"(\w{3})\s+(\w{5,10})");
Match m = r.Match(line);
string col1 = m.Groups[1].Value;
string col2 = m.Groups[2].Value;
The error regarding System.StringSplitOptions seems to be a casting bug in the compiler. Add a line prior to your split statement that defines the desired StringSplitOptions and then use the variable in the split statement.
static readonly char[] space = { ' ' };
static readonly StringSplitOptions options = StringSplitOptions.RemoveEmptyEntries;
...
string[] items = line.Split(space, options);
This should work for all overloads.
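Putting the splitting advice together with the DataTable loop from the question, a minimal sketch could look like this. It assumes each line holds two values separated by one or more whitespace characters; the names TableLoaderSketch and Load, and the column names, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class TableLoaderSketch
{
    public static DataTable Load(IEnumerable<string> lines)
    {
        var table = new DataTable();
        table.Columns.Add("Column0", typeof(string));
        table.Columns.Add("Column1", typeof(string));

        foreach (string line in lines)
        {
            // A null separator array makes Split treat any whitespace as a
            // delimiter; RemoveEmptyEntries drops the gaps left by repeated
            // or trailing spaces.
            string[] items = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (items.Length >= 2)
            {
                table.Rows.Add(items[0], items[1]);
            }
        }
        return table;
    }
}
```

Lines with fewer than two values (for example blank lines) are skipped rather than added as partial rows; the lines can come from File.ReadLines on the batch file path.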
