How to split a csv file into multiple Lists - c#

I have this code:
//new List
List<string> lines = new List<string>();
List<string> lines2 = new List<string>();
// read and write data to list
for (int i = 0; i < fileName.Length; i++)
{
string file = @"read\" + Path.GetFileName(fileName[i]);
// load rows to list.
lines = File.ReadLines(file).ToList();
foreach (string line in lines)
{
// Variablen für lines
string[] entries = line.Split(';');
int length = entries.Length;
}
}
I am able to read all lines from my csv file into one list but I would like to split the csv file after the 6th column into a second list. How do I do that?
I already tried LINQ with lines.Take(6).ToList(); but this just reads the first 6 lines, if I'm not mistaken. The same goes for Skip().

You are on the right track with Take and Skip, but they need to be applied to the columns of each line rather than to the list of lines. To add the items to their respective lists, you can call the AddRange method, which adds a group of items at once:
var filePath = @"c:\public\temp\temp.txt";
var firstSixColumns = new List<string>();
var restOfColumns = new List<string>();
foreach(var fileLine in File.ReadLines(filePath))
{
var fileLineParts = fileLine.Split(';');
firstSixColumns.AddRange(fileLineParts.Take(6)); // Add the first 6 entries
restOfColumns.AddRange(fileLineParts.Skip(6)); // Add the rest of the entries
}
If, on the other hand, you just want to split the csv lines into two groups of columns without splitting them further (so each item in firstSixColumns would represent a row of six columns), then you can use string.Join to stitch the columns back together before adding them:
foreach(var fileLine in File.ReadLines(filePath))
{
var fileLineParts = fileLine.Split(';');
// Join the items before adding them to their respective lists
firstSixColumns.Add(string.Join(";", fileLineParts.Take(6)));
restOfColumns.Add(string.Join(";", fileLineParts.Skip(6)));
}

You need to apply Skip at the point where you read the column values:
List<string> entries = line.Split(';').Skip(6).ToList();
That said, there are plenty of libraries built around reading CSV files. I would recommend searching NuGet for one before you reinvent the wheel.
If you want two lists:
string[] entries = line.Split(';');
List<string> entriesFirst = entries.Take(6).ToList();
List<string> entriesSecond = entries.Skip(6).ToList();
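The Take/Skip approach from the answers above can be wrapped in a small helper. This is only a minimal sketch: the class name CsvColumnSplitter, the method SplitRow and the sample rows are made up for illustration, and it assumes a plain ';'-separated file without quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvColumnSplitter
{
    // Hypothetical helper: splits one CSV line at the given column index.
    // Take/Skip do not throw when a row has fewer columns than requested;
    // the second list is simply empty in that case.
    public static void SplitRow(string line, int columnCount,
                                out List<string> first, out List<string> rest)
    {
        string[] entries = line.Split(';');
        first = entries.Take(columnCount).ToList();
        rest = entries.Skip(columnCount).ToList();
    }

    public static void Main()
    {
        // Sample data standing in for File.ReadLines(filePath).
        string[] lines = { "a;b;c;d;e;f;g;h", "1;2;3" };

        foreach (string line in lines)
        {
            List<string> first, rest;
            SplitRow(line, 6, out first, out rest);
            Console.WriteLine(string.Join(";", first) + " | " + string.Join(";", rest));
        }
    }
}
```

Rows with fewer than six columns end up entirely in the first list, so no length check is needed before calling Take or Skip.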

Related

How to read data from an excel and push into an array in asp.net?

I am new to C# programming and would like to know how to read data from an Excel file cell by cell. In the code below, I get an array of data from column A of the Excel file as pValue1= a;b;c;d%;e%;f%; Now I want to push only the values ending in % into a different array if the header of column A is ID. I also want to enclose each item in pValue1 in single quotes.
Input:
ID    Name
a     roy
b     may
c     Jly
d%    nav
e%    nov
f%    lio
Expected output:
pValue1= 'a';'b';'c'
pValue3= d%e%f%
try {
Utils.ExcelFile excelFile = new Utils.ExcelFile(excelFilename);
DataTable excelData = excelFile.GetDataFromExcel();
// Column headers
param1 = 0 < excelData.Columns.Count ? excelData.Columns[0].ColumnName :string.Empty;
param2 = 1 < excelData.Columns.Count ? excelData.Columns[1].ColumnName :string.Empty;
ArrayList pValueArray1 = new ArrayList();
ArrayList pValueArray2 = new ArrayList();
if (pValueArray1.Count > 0) pValue1 = string.Join(";", pValueArray1.ToArray()) + ";";
if (pValueArray2.Count > 0) pValue2 = string.Join(";", pValueArray2.ToArray()) + ";";
}
I'm not sure I understood your issue. I guess you have already loaded the Excel data into the DataTable and now just want to split the Id column into two separate lists. You can use LINQ:
var percentLookup = excelData.AsEnumerable()
.ToLookup(row => row.Field<string>("Id").EndsWith("%"));
List<string> pValue1List = percentLookup[false]
.Select(row => row.Field<string>("Id"))
.ToList();
List<string> pValue2List = percentLookup[true]
.Select(row => row.Field<string>("Id"))
.ToList();
The lookup contains two groups, the rows where the id-column has a percent at the end and the rest. So you can create the two lists easily with it.
Since you are new to C# programming it might be better to use a plain loop:
List<string> pValue1List = new List<string>();
List<string> pValue2List = new List<string>();
foreach (DataRow row in excelData.Rows)
{
string id = row.Field<string>("Id");
if(id.EndsWith("%"))
{
pValue2List.Add(id);
}
else
{
pValue1List.Add(id);
}
}
If you need a String[] instead of a List<string>, use ToArray instead of ToList; in the second approach, fill the lists as shown and call e.g. pValue1List.ToArray() at the end.
In general, you should stop using ArrayList: it is 20 years old and has been obsolete for more than 10 years. Use a strongly typed List<T>, here a List<string>.
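The question also asks for each item of pValue1 to be wrapped in single quotes. A minimal sketch of that step follows, assuming an in-memory DataTable in place of excelFile.GetDataFromExcel(); the class name IdSplitter and the methods GetIds and QuoteAndJoin are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class IdSplitter
{
    // Hypothetical helper: returns the values of the given column,
    // keeping only those that do (or do not) end with '%'.
    public static List<string> GetIds(DataTable table, string columnName, bool endsWithPercent)
    {
        var result = new List<string>();
        foreach (DataRow row in table.Rows)
        {
            // 'as string' yields null for DBNull cells, which are skipped.
            string id = row[columnName] as string;
            if (id != null && id.EndsWith("%", StringComparison.Ordinal) == endsWithPercent)
            {
                result.Add(id);
            }
        }
        return result;
    }

    // Wraps each item in single quotes and joins the items with ';'.
    public static string QuoteAndJoin(IEnumerable<string> items)
    {
        return string.Join(";", items.Select(item => "'" + item + "'"));
    }

    public static void Main()
    {
        // Sample table standing in for excelFile.GetDataFromExcel().
        var table = new DataTable();
        table.Columns.Add("ID", typeof(string));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("a", "roy");
        table.Rows.Add("b", "may");
        table.Rows.Add("d%", "nav");

        Console.WriteLine("pValue1= " + QuoteAndJoin(GetIds(table, "ID", false)));
        Console.WriteLine("pValue2= " + string.Join(";", GetIds(table, "ID", true)));
    }
}
```

Indexing the row with row[columnName] avoids the dependency on the Field<T> extension method, at the cost of a cast.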

How to get all rows for specific column from .csv file

In my project, I have a .csv file with many columns.
I need to extract all rows for only first column. I've managed to read all lines, but got stuck on how to extract rows from first column to another .csv file.
string filePath = @"C:\Users\BP185150\Desktop\OTC.csv";
string[] OTC_Output = File.ReadAllLines(@"C:\Users\BP185150\Desktop\OTC.csv");
foreach (string line in OTC_Output)
{
Console.WriteLine(line);
Console.Read();
}
Console.ReadLine();
Depending on which separator your csv uses, you can use the string.Split() method,
e.g.
string firstItem = line.Split(',')[0];
Console.WriteLine(firstItem);
Adding them to a collection:
ICollection<string> firstItems = new List<string>();
string[] OTC_Output = File.ReadAllLines(@"C:\Users\BP185150\Desktop\OTC.csv");
foreach (string line in OTC_Output)
{
firstItems.Add(line.Split(',')[0]);
}
Well if you want to use File.ReadAllLines, the best way to get the first column is to split the line with a delimiter that your csv is using. Then just add the first item of every line to a collection.
var column = OTC_Output.Select(line => line.Split(';').First()).ToList();
In lineItems, you'll have all the columns of a line after splitting it:
var lineItems = line.Split(';');
Then read only the first of them:
string firstItem = lineItems[0];
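The answers above extract the first column but do not write it to another file. One way to do both is sketched below. The class name FirstColumnExporter and its methods are hypothetical, temporary files stand in for the real paths, and the sketch assumes fields never contain the separator inside quotes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FirstColumnExporter
{
    // Hypothetical helper: returns the first field of every non-empty line.
    public static List<string> FirstColumn(IEnumerable<string> lines, char separator)
    {
        return lines
            .Where(line => !string.IsNullOrEmpty(line))
            .Select(line => line.Split(separator)[0])
            .ToList();
    }

    // Reads inputPath, extracts the first column and writes it to outputPath.
    public static void Export(string inputPath, string outputPath, char separator)
    {
        File.WriteAllLines(outputPath, FirstColumn(File.ReadLines(inputPath), separator));
    }

    public static void Main()
    {
        // Temporary files are used here so the sketch does not touch real data.
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllLines(input, new[] { "id,name,price", "1,apple,0.5", "2,pear,0.7" });

        Export(input, output, ',');
        Console.WriteLine(File.ReadAllText(output));
    }
}
```

For files with quoted fields or embedded separators, a dedicated CSV library is likely the safer choice.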

How do I remove duplicates from excel range? c#

I've converted the cells in my Excel range from strings into a string list and have separated each item after the comma in the original list. I am starting to think I have not actually separated each item and that they are still one whole, so I am trying to figure out how to do this properly so that each item (e.g. the_red_bucket_01) is its own string.
example of original string in a cell 1 and 2:
Cell1 :
the_red_bucket_01, the_blue_duck_01,_the green_banana_02, the orange_bear_01
Cell2 :
the_purple_chair_01, the_blue_coyote_01,_the green_banana_02, the orange_bear_01
The new list looks like this, though I'm not sure they are separate items:
the_red_bucket_01
the_blue_duck_01
the green_banana_02
the orange_bear_01
the_red_chair_01
the_blue_coyote_01
the green_banana_02
the orange_bear_01
Now I want to remove duplicates so that the console only shows one of each item, no matter how many there are of them. I can't seem to get my foreach/if statements to work. It is printing out multiple copies of the items, I'm assuming because it is iterating for each item in the list, so it is returning the data that many times.
foreach (Excel.Range item in xlRng)
{
string itemString = (string)item.Text;
List<String> fn = new List<String>(itemString.Split(','));
List<string> newList = new List<string>();
foreach (string s in fn)
if (!newList.Contains(s))
{
newList.Add(s);
}
foreach (string combo in newList)
{
Console.Write(combo);
}
}
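You probably need to trim the strings, because they have leading white space, so "string1" is different from " string1". Store the trimmed value as well, otherwise the check and the list contents do not match:
foreach (string s in fn)
{
string trimmed = s.Trim();
if (!newList.Contains(trimmed))
{
newList.Add(trimmed);
}
}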
You can do this much more simply with LINQ by using Distinct, which the documentation describes as follows:
"Returns distinct elements from a sequence by using the default equality comparer to compare values."
foreach (Excel.Range item in xlRng)
{
string itemString = (string)item.Text;
List<String> fn = new List<String>(itemString.Split(','));
foreach (string combo in fn.Distinct())
{
Console.Write(combo);
}
}
As mentioned in another answer, you may also need to Trim any whitespace, in which case you would do:
fn.Select(x => x.Trim()).Distinct()
Where you need to store keys and values, it's better to use the Dictionary type. Try changing the code from List<T> to Dictionary<TKey, TValue>, i.e.
From:
List<string> newList = new List<string>();
foreach (string s in fn)
if (!newList.Contains(s))
{
newList.Add(s);
}
to
Dictionary<string, string> newList = new Dictionary<string, string>();
foreach (string s in fn)
if (!newList.ContainsKey(s))
{
newList.Add(s, s);
}
If you are concerned about the distinct items while you are reading, then just use the Distinct operator like fn.Distinct()
For processing the whole data, I can suggest two methods:
Read in the whole data then use LINQ's Distinct operator
Or use a Set data structure and store each element in that while reading the excel
I suggest that you take a look at the LINQ documentation if you are processing data. It has really great extensions. For even more methods, you can check out the MoreLINQ package.
I think your code would probably work as you expect if you moved newList out of the loop - you create a new variable named newList each loop so it's not going to find duplicates from earlier loops.
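The effect of moving newList out of the loop can be sketched as follows. This is an illustration only: plain strings stand in for the text of each Excel cell, and the class name CellDeduplicator and the method UniqueItems are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class CellDeduplicator
{
    // Hypothetical helper: collects unique, trimmed items across all cells.
    public static List<string> UniqueItems(IEnumerable<string> cellTexts)
    {
        // Declared once, outside the loop, so it remembers items from earlier cells.
        var newList = new List<string>();
        foreach (string cellText in cellTexts)
        {
            foreach (string s in cellText.Split(','))
            {
                string trimmed = s.Trim();
                if (!newList.Contains(trimmed))
                {
                    newList.Add(trimmed);
                }
            }
        }
        return newList;
    }

    public static void Main()
    {
        // Sample strings standing in for the text of each Excel cell.
        var cells = new[] { "red, blue, green", "purple, blue, green" };
        foreach (string item in UniqueItems(cells))
        {
            Console.WriteLine(item);
        }
    }
}
```

Because the list outlives the outer loop, items repeated in a later cell are recognised as duplicates, and the order of first appearance is kept.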
You can do all of this more concisely with LINQ:
//set up some similar data
string list1 = "a,b,c,d,a,f";
string list2 = "a,b,c,d,a,f";
List<string> lists = new List<string> {list1,list2};
// find unique items
var result = lists.SelectMany(i=>i.Split(',')).Distinct().ToList();
SelectMany() "flattens" the list of lists into a list.
Distinct() removes duplicates.
var uniqueItems = new HashSet<string>();
foreach (Excel.Range cell in xlRng)
{
var cellText = (string)cell.Text;
foreach (var item in cellText.Split(',').Select(s => s.Trim()))
{
uniqueItems.Add(item);
}
}
foreach (var item in uniqueItems)
{
Console.WriteLine(item);
}

Attempting to output an IOrderedEnumerable list using C#

I have read in a .csv file, done some formatting, separated each line into its columns and added the resulting arrays to a list of arrays of columns. Next I ordered the list of arrays using IOrderedEnumerable to sort it by the second column in ascending alphabetical order, and then I attempt to output this newly ordered list to the screen. It's the last part that I am stuck on.
This is what I have attempted:
// attempt to read file, if it fails for some reason display the exception error message
try
{
// create list for storing arrays
List<string[]> users = new List<string[]>();
string[] lineData;
string line;
// open the file with a StreamReader
System.IO.StreamReader file = new System.IO.StreamReader("dcpmc_whitelist.csv");
// loop through each line and remove any speech marks
while((line = file.ReadLine()) != null)
{
// remove speech marks from each line
line = line.Replace("\"", "");
// split line into each column
lineData = line.Split(';');
// add each element of split array to the list of arrays
users.Add(lineData);
}
//sort this list by username ascending
IOrderedEnumerable<String[]> usersByUsername = users.OrderBy(user => user[0]);
// display the newly ordered list
for (int i = 0; i <= users.Count; i++)
{
Console.WriteLine(usersByUsername[i]);
}
// after loading the list take user to top of the screen
Console.SetWindowPosition(0, 0);
}
catch (Exception e)
{
// Let the user know what went wrong when reading the file
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
But this gives the error:
cannot apply indexing with [] to an expression of type
system.linq.iorderedenumerable
What is causing this error and how can I simply output the newly ordered list correctly?
The cause is that neither IEnumerable nor IOrderedEnumerable supports indexing, which is why you get this error.
To display the ordered result you can use foreach to enumerate the collection:
// display the newly ordered list
foreach (var user in usersByUsername)
{
Console.WriteLine(string.Join(", ", user));
}
Or you can convert result to list and use indexing:
//sort this list by username ascending
IList<String[]> usersByUsername = users.OrderBy(user => user[0]).ToList();
// display the newly ordered list
for (int i = 0; i < usersByUsername.Count; i++)
{
Console.WriteLine(string.Join(", ", usersByUsername[i]));
}
Also note the usage of string.Join: printing a string[] directly outputs its type name (System.String[]) rather than its contents, which is probably not the result you expect.

Is there a way to dynamically create an object at run time in .NET 3.5?

I'm working on an importer that takes tab delimited text files. The first line of each file contains 'columns' like ItemCode, Language, ImportMode etc and there can be varying numbers of columns.
I'm able to get the names of each column, whether there's one or 10 and so on. I use a method to achieve this that returns List<string>:
private List<string> GetColumnNames(string saveLocation, int numColumns)
{
var data = (File.ReadAllLines(saveLocation));
var columnNames = new List<string>();
for (int i = 0; i < numColumns; i++)
{
var cols = from lines in data
.Take(1)
.Where(l => !string.IsNullOrEmpty(l))
.Select(l => l.Split(delimiter.ToCharArray(), StringSplitOptions.None))
.Select(value => string.Join(" ", value))
let split = lines.Split(' ')
select new
{
Temp = split[i].Trim()
};
foreach (var x in cols)
{
columnNames.Add(x.Temp);
}
}
return columnNames;
}
If I always knew what columns to be expecting, I could just create a new object, but since I don't, I'm wondering is there a way I can dynamically create an object with properties that correspond to whatever GetColumnNames() returns?
Any suggestions?
For what it's worth, here's how I used DataTables to achieve what I wanted.
// saveLocation is file location
// numColumns comes from another method that gets number of columns in file
var columnNames = GetColumnNames(saveLocation, numColumns);
var table = new DataTable();
foreach (var header in columnNames)
{
table.Columns.Add(header);
}
// itemAttributeData is the file split into lines
foreach (var row in itemAttributeData)
{
table.Rows.Add(row);
}
Although there was a bit more work involved to be able to manipulate the data in the way I wanted, Karthik's suggestion got me on the right track.
You could create a dictionary of strings in which the key is the name of the "property" and the value is that property's value.
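The dictionary suggestion could look roughly like this. It is a sketch under a few assumptions: the file is tab delimited with the column names on the first line, the class name TabFileParser and the method Parse are invented for the example, and only language features available in .NET 3.5 are used.

```csharp
using System;
using System.Collections.Generic;

public static class TabFileParser
{
    // Hypothetical helper: maps each data line to a column-name -> value dictionary.
    public static List<Dictionary<string, string>> Parse(string[] lines)
    {
        var rows = new List<Dictionary<string, string>>();
        if (lines.Length == 0) return rows;

        string[] columnNames = lines[0].Split('\t');
        for (int i = 1; i < lines.Length; i++)
        {
            string[] values = lines[i].Split('\t');
            var row = new Dictionary<string, string>();
            for (int c = 0; c < columnNames.Length; c++)
            {
                // Missing trailing values are stored as empty strings.
                row[columnNames[c].Trim()] = c < values.Length ? values[c] : string.Empty;
            }
            rows.Add(row);
        }
        return rows;
    }

    public static void Main()
    {
        // Sample lines standing in for File.ReadAllLines(saveLocation).
        string[] lines = { "ItemCode\tLanguage", "A100\ten", "B200\tfr" };
        foreach (Dictionary<string, string> row in Parse(lines))
        {
            Console.WriteLine(row["ItemCode"] + " / " + row["Language"]);
        }
    }
}
```

Each row then exposes its values by column name, whatever columns the file happens to contain, without defining a class per file layout.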
