Excel Text to Column Equivalent in C# Datatable

Excel Text to Column Equivalent in C# Datatable - c#

Although several C# methods in regards to the conversion of a CSV file to DataTable exist online, none of them seems to replicate Excel - Convert Text to Columns. To illustrate my point, suppose the following sample CSV is used:
" FIRst row DATA T"
;;
"Excel versus Csharp" ; ; ; ;
; "MA"; "10000"; "20000"; ; "400000";;; ;; ;;;
"SECOND REPORTING"
"1"; "20"; ; "";"";"";"";"";"";"";
One is expecting (ideally without the double quotes on the string) to obtaining in the DataTable:
FIRst row DATA T
Excel versus Csharp
MA 10000 20000 40000 ...
SECOND REPORTING
1 20 ....
Method used:
public static DataTable CSVQuoteToDataTable(string
csvFilePath)
{
DataTable csvData = new DataTable();
try
{
string[] seps = { "\";", ";\"" };
char[] quotes = { '\"', ' ' };
string[] colFields = null;
foreach (var line in File.ReadLines(csvFilePath)) % targets unfortunately the 1st row as column of the DataTable
{
var fields = line
.Split(seps, StringSplitOptions.None)
.Select(s => s.Trim(quotes).Replace("\\\"", "\""))
.ToArray();
if (colFields == null)
{
colFields = fields;
foreach (string column in colFields)
{
DataColumn datacolumn = new DataColumn(column);
datacolumn.AllowDBNull = true;
csvData.Columns.Add(datacolumn);
}
}
else
{
for (int i = 0; i < fields.Length; i++)
{
if (fields[i] == "")
{
fields[i] = null;
}
}
csvData.Rows.Add(fields);
}
}
}
catch (Exception ex)
{
}
return csvData;
}
Feedback on more efficient or better method would be appreciated.
Thanks

I think you can accomplish most of what you desire with the Nuget extension LumenWorksCsvReader. It will allow you to use a delimiter other than a comma and handle quotes (or not, if you choose).
https://www.nuget.org/packages/LumenWorksCsvReader/
The only problem I see is the variable number of columns, in which case you may have to handle that in your code -- determine the max number of columns and use that. The package will throw an exception when you try to read outside the range, and as clumsy as it may seem, you can just include a try/catch for that.
Bare bones, but something like this should get you started.
using LumenWorks.Framework.IO.Csv;
using (StreamReader sr = new StreamReader(#"c:\cdh\foo.csv"))
{
using (CsvReader reader = new CsvReader(sr, hasHeaders:false, delimiter:';', quote:'"'))
{
while (reader.ReadNextRecord())
{
for (int i = 0; i < maxFieldCount; i++)
{
try
{
string val = reader[i];
// do something with 'val'
}
catch (Exception ex)
{
}
}
}
}
}

Related

New line within CSV column causing issue

I have a large csv file which has millions of rows. The sample csv lines are
CODE,COMPANY NAME, DATE, ACTION
A,My Name , LLC,2018-01-28,BUY
B,Your Name , LLC,2018-01-25,SELL
C,
All Name , LLC,2018-01-21,SELL
D,World Name , LLC,2018-01-20,BUY
Row C has new line, but actually this is same record. I want to remove new line character from the csv line within cell\field\column.
I tired \r\n, Envirnment.NewLine and many other things, but could not make it work.
Here is my code..
private DataTable CSToDataTable(string csvfile)
{
Int64 row = 0;
try
{
string CSVFilePathName = csvfile; //#"C:\test.csv";
string[] Lines = File.ReadAllLines(CSVFilePathName.Replace(Environment.NewLine, ""));
string[] Fields;
Fields = Lines[0].Split(new char[] { ',' });
int Cols = Fields.GetLength(0);
DataTable dt = new DataTable();
//1st row must be column names; force lower case to ensure matching later on.
for (int i = 0; i < Cols; i++)
dt.Columns.Add(Fields[i].ToLower(), typeof(string));
DataRow Row;
for (row = 1; row < Lines.GetLength(0); row++)
{
Fields = Lines[row].Split(new char[] { ',' });
Row = dt.NewRow();
//Console.WriteLine(row);
for (int f = 0; f < Cols; f++)
{
Row[f] = Fields[f];
}
dt.Rows.Add(Row);
if (row == 190063)
{
}
}
return dt;
}
catch (Exception ex)
{
throw ex;
}
}
How can I remove new line character and read the row correctly? I don't want to skip the such rows as per the business requirement.

You CSV file is not in valid format. In order to parse and load them successfully, you will have to sanitize them. Couple of issues
COMPANY NAME column contains field separator in it. Fix them by
surrounding quotes.
New line in CSV value - This can be fixed by combining adjacent rows as one.
With Cinchoo ETL, you can sanitize and load your large file as below
string csv = #"CODE,COMPANY NAME, DATE, ACTION
A,My Name , LLC,2018-01-28,BUY
B,Your Name , LLC,2018-01-25,SELL
C,
All Name , LLC,2018-01-21,SELL
D,World Name , LLC,2018-01-20,BUY";
string bufferLine = null;
var reader = ChoCSVReader.LoadText(csv)
.WithFirstLineHeader()
.Setup(s => s.BeforeRecordLoad += (o, e) =>
{
string line = (string)e.Source;
string[] tokens = line.Split(",");
if (tokens.Length == 5)
{
//Fix the second and third value with quotes
e.Source = #"{0},""{1},{2}"",{3}, {4}".FormatString(tokens[0], tokens[1], tokens[2], tokens[3], tokens[4]);
}
else
{
//Fix the breaking lines, assume that some csv lines broken into max 2 lines
if (bufferLine == null)
{
bufferLine = line;
e.Skip = true;
}
else
{
line = bufferLine + line;
tokens = line.Split(",");
e.Source = #"{0},""{1},{2}"",{3}, {4}".FormatString(tokens[0], tokens[1], tokens[2], tokens[3], tokens[4]);
line = null;
}
}
});
foreach (var rec in reader)
Console.WriteLine(rec.Dump());
//Careful to load millions rows into DataTable
//var dt = reader.AsDataTable();
Hope it helps.

You haven't made it clear what are the possible criteria an unwanted new line could appear in the file. So assuming that a 'proper' line in the CSV file does NOT end with a comma, and if one ends with a comma that means that it's not a properly formatted line, you could do something like this:
static void Main(string[] args)
{
string path = #"CSVFile.csv";
List<CSVData> data = new List<CSVData>();
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
using (StreamReader sr = new StreamReader(fs))
{
sr.ReadLine(); // Header
while (!sr.EndOfStream)
{
var line = sr.ReadLine();
while (line.EndsWith(","))
{
line += sr.ReadLine();
}
var items = line.Split(new string[] { "," }, StringSplitOptions.None);
data.Add(new CSVData() { CODE = items[0], NAME = items[1], COMPANY = items[2], DATE = items[3], ACTION = items[4] });
}
}
}
Console.ReadLine();
}
public class CSVData
{
public string CODE { get; set; }
public string NAME { get; set; }
public string COMPANY { get; set; }
public string DATE { get; set; }
public string ACTION { get; set; }
}
Obviously there's a lot of error handling to be done here (for example, when creating a new CSVData object make sure your items contain all the data you want), but I think this is the start you need.

Split string that includes multiline substrings into substrings [duplicate]

I'm writing a simple import application and need to read a CSV file, show result in a DataGrid and show corrupted lines of the CSV file in another grid. For example, show the lines that are shorter than 5 values in another grid. I'm trying to do that like this:
StreamReader sr = new StreamReader(FilePath);
importingData = new Account();
string line;
string[] row = new string [5];
while ((line = sr.ReadLine()) != null)
{
row = line.Split(',');
importingData.Add(new Transaction
{
Date = DateTime.Parse(row[0]),
Reference = row[1],
Description = row[2],
Amount = decimal.Parse(row[3]),
Category = (Category)Enum.Parse(typeof(Category), row[4])
});
}
but it's very difficult to operate on arrays in this case. Is there a better way to split the values?

Don't reinvent the wheel. Take advantage of what's already in .NET BCL.
add a reference to the Microsoft.VisualBasic (yes, it says VisualBasic but it works in C# just as well - remember that at the end it is all just IL)
use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse CSV file
Here is the sample code:
using (TextFieldParser parser = new TextFieldParser(#"c:\temp\test.csv"))
{
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
//Processing row
string[] fields = parser.ReadFields();
foreach (string field in fields)
{
//TODO: Process field
}
}
}
It works great for me in my C# projects.
Here are some more links/informations:
MSDN: Read From Comma-Delimited Text Files in Visual Basic
MSDN: TextFieldParser Class

I recommend CsvHelper from NuGet.
PS: Regarding other more upvoted answers, I'm sorry but adding a reference to Microsoft.VisualBasic is:
Ugly
Not cross-platform, because it's not available in .NETCore/.NET5 (and Mono never had very good support of Visual Basic, so it may be buggy).

My experience is that there are many different csv formats. Specially how they handle escaping of quotes and delimiters within a field.
These are the variants I have ran into:
quotes are quoted and doubled (excel) i.e. 15" -> field1,"15""",field3
quotes are not changed unless the field is quoted for some other reason. i.e. 15" -> field1,15",fields3
quotes are escaped with \. i.e. 15" -> field1,"15\"",field3
quotes are not changed at all (this is not always possible to parse correctly)
delimiter is quoted (excel). i.e. a,b -> field1,"a,b",field3
delimiter is escaped with \. i.e. a,b -> field1,a\,b,field3
I have tried many of the existing csv parsers but there is not a single one that can handle the variants I have ran into. It is also difficult to find out from the documentation which escaping variants the parsers support.
In my projects I now use either the VB TextFieldParser or a custom splitter.

Sometimes using libraries are cool when you do not want to reinvent the wheel, but in this case one can do the same job with fewer lines of code and easier to read compared to using libraries.
Here is a different approach which I find very easy to use.
In this example, I use StreamReader to read the file
Regex to detect the delimiter from each line(s).
An array to collect the columns from index 0 to n
using (StreamReader reader = new StreamReader(fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
//Define pattern
Regex CSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
//Separating columns to array
string[] X = CSVParser.Split(line);
/* Do something with X */
}
}

CSV can get complicated real fast.
Use something robust and well-tested:
FileHelpers:
www.filehelpers.net
The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams.

Another one to this list, Cinchoo ETL - an open source library to read and write CSV files
For a sample CSV file below
Id, Name
1, Tom
2, Mark
Quickly you can load them using library as below
using (var reader = new ChoCSVReader("test.csv").WithFirstLineHeader())
{
foreach (dynamic item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
If you have POCO class matching the CSV file
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
You can use it to load the CSV file as below
using (var reader = new ChoCSVReader<Employee>("test.csv").WithFirstLineHeader())
{
foreach (var item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library

I use this here:
http://www.codeproject.com/KB/database/GenericParser.aspx
Last time I was looking for something like this I found it as an answer to this question.

private static DataTable ConvertCSVtoDataTable(string strFilePath)
{
DataTable dt = new DataTable();
using (StreamReader sr = new StreamReader(strFilePath))
{
string[] headers = sr.ReadLine().Split(',');
foreach (string header in headers)
{
dt.Columns.Add(header);
}
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(',');
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
return dt;
}
private static void WriteToDb(DataTable dt)
{
string connectionString =
"Data Source=localhost;" +
"Initial Catalog=Northwind;" +
"Integrated Security=SSPI;";
using (SqlConnection con = new SqlConnection(connectionString))
{
using (SqlCommand cmd = new SqlCommand("spInsertTest", con))
{
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("#policyID", SqlDbType.Int).Value = 12;
cmd.Parameters.Add("#statecode", SqlDbType.VarChar).Value = "blagh2";
cmd.Parameters.Add("#county", SqlDbType.VarChar).Value = "blagh3";
con.Open();
cmd.ExecuteNonQuery();
}
}
}

Here's a solution I coded up today for a situation where I needed to parse a CSV without relying on external libraries. I haven't tested performance for large files since it wasn't relevant to my particular use case but I'd expect it to perform reasonably well for most situations.
static List<List<string>> ParseCsv(string csv) {
var parsedCsv = new List<List<string>>();
var row = new List<string>();
string field = "";
bool inQuotedField = false;
for (int i = 0; i < csv.Length; i++) {
char current = csv[i];
char next = i == csv.Length - 1 ? ' ' : csv[i + 1];
// if current character is not a quote or comma or carriage return or newline (or not a quote and currently in an a quoted field), just add the character to the current field text
if ((current != '"' && current != ',' && current != '\r' && current != '\n') || (current != '"' && inQuotedField)) {
field += current;
} else if (current == ' ' || current == '\t') {
continue; // ignore whitespace outside a quoted field
} else if (current == '"') {
if (inQuotedField && next == '"') { // quote is escaping a quote within a quoted field
i++; // skip escaping quote
field += current;
} else if (inQuotedField) { // quote signifies the end of a quoted field
row.Add(field);
if (next == ',') {
i++; // skip the comma separator since we've already found the end of the field
}
field = "";
inQuotedField = false;
} else { // quote signifies the beginning of a quoted field
inQuotedField = true;
}
} else if (current == ',') { //
row.Add(field);
field = "";
} else if (current == '\n') {
row.Add(field);
parsedCsv.Add(new List<string>(row));
field = "";
row.Clear();
}
}
return parsedCsv;
}

First of all need to understand what is CSV and how to write it.
Every next string ( /r/n ) is next "table" row.
"Table" cells is separated by some delimiter symbol. Most often used symbols is \t or ,
Every cell possibly can contain this delimiter symbol (cell must to start with quotes symbol and ends with this symbol in this case)
Every cell possibly can contains /r/n sybols (cell must to start with quotes symbol and ends with this symbol in this case)
The easiest way for C#/Visual Basic to work with CSV files is to use standard Microsoft.VisualBasic library. You just need to add needed reference, and the following string to your class:
using Microsoft.VisualBasic.FileIO;
Yes, you can use it in C#, don't worry. This library can read relatively big files and supports all of needed rules, so you will be able to work with all of CSV files.
Some time ago I had wrote simple class for CSV read/write based on this library. Using this simple class you will be able to work with CSV like with 2 dimensions array.
You can find my class by the following link:
https://github.com/ukushu/DataExporter
Simple example of using:
Csv csv = new Csv("\t");//delimiter symbol
csv.FileOpen("c:\\file1.csv");
var row1Cell6Value = csv.Rows[0][5];
csv.AddRow("asdf","asdffffff","5")
csv.FileSave("c:\\file2.csv");

To complete the previous answers, one may need a collection of objects from his CSV File, either parsed by the TextFieldParser or the string.Split method, and then each line converted to an object via Reflection. You obviously first need to define a class that matches the lines of the CSV file.
I used the simple CSV Serializer from Michael Kropat found here: Generic class to CSV (all properties)
and reused his methods to get the fields and properties of the wished class.
I deserialize my CSV file with the following method:
public static IEnumerable<T> ReadCsvFileTextFieldParser<T>(string fileFullPath, string delimiter = ";") where T : new()
{
if (!File.Exists(fileFullPath))
{
return null;
}
var list = new List<T>();
var csvFields = GetAllFieldOfClass<T>();
var fieldDict = new Dictionary<int, MemberInfo>();
using (TextFieldParser parser = new TextFieldParser(fileFullPath))
{
parser.SetDelimiters(delimiter);
bool headerParsed = false;
while (!parser.EndOfData)
{
//Processing row
string[] rowFields = parser.ReadFields();
if (!headerParsed)
{
for (int i = 0; i < rowFields.Length; i++)
{
// First row shall be the header!
var csvField = csvFields.Where(f => f.Name == rowFields[i]).FirstOrDefault();
if (csvField != null)
{
fieldDict.Add(i, csvField);
}
}
headerParsed = true;
}
else
{
T newObj = new T();
for (int i = 0; i < rowFields.Length; i++)
{
var csvFied = fieldDict[i];
var record = rowFields[i];
if (csvFied is FieldInfo)
{
((FieldInfo)csvFied).SetValue(newObj, record);
}
else if (csvFied is PropertyInfo)
{
var pi = (PropertyInfo)csvFied;
pi.SetValue(newObj, Convert.ChangeType(record, pi.PropertyType), null);
}
else
{
throw new Exception("Unhandled case.");
}
}
if (newObj != null)
{
list.Add(newObj);
}
}
}
}
return list;
}
public static IEnumerable<MemberInfo> GetAllFieldOfClass<T>()
{
return
from mi in typeof(T).GetMembers(BindingFlags.Public | BindingFlags.Instance | BindingFlags.Static)
where new[] { MemberTypes.Field, MemberTypes.Property }.Contains(mi.MemberType)
let orderAttr = (ColumnOrderAttribute)Attribute.GetCustomAttribute(mi, typeof(ColumnOrderAttribute))
orderby orderAttr == null ? int.MaxValue : orderAttr.Order, mi.Name
select mi;
}

I'd highly suggest using CsvHelper.
Here's a quick example:
public class csvExampleClass
{
public string Id { get; set; }
public string Firstname { get; set; }
public string Lastname { get; set; }
}
var items = DeserializeCsvFile<List<csvExampleClass>>( csvText );
public static List<T> DeserializeCsvFile<T>(string text)
{
CsvReader csv = new CsvReader( new StringReader( text ) );
csv.Configuration.Delimiter = ",";
csv.Configuration.HeaderValidated = null;
csv.Configuration.MissingFieldFound = null;
return (List<T>)csv.GetRecords<T>();
}
Full documentation can be found at: https://joshclose.github.io/CsvHelper

Given string array of column names, how do I read a .csv file to a DataTable?

Assume I have a .csv file with 70 columns, but only 5 of the columns are what I need. I want to be able to pass a method a string array of the columns names that I want, and for it to return a datatable.
private void method(object sender, EventArgs e) {
string[] columns =
{
#"Column21",
#"Column48"
};
DataTable myDataTable = Get_DT(columns);
}
public DataTable Get_DT(string[] columns) {
DataTable ret = new DataTable();
if (columns.Length > 0)
{
foreach (string column in columns)
{
ret.Columns.Add(column);
}
string[] csvlines = File.ReadAllLines(#"path to csv file");
csvlines = csvlines.Skip(1).ToArray(); //ignore the columns in the first line of the csv file
//this is where i need help... i want to use linq to read the fields
//of the each row with only the columns name given in the string[]
//named columns
}
return ret;
}

Read the first line of the file, line.Split(',') (or whatever your delimiter is), then get the index of each column name and store that.
Then for each other line, again do a var values = line.Split(','), then get the values from the columns.
Quick and dirty version:
string[] csvlines = File.ReadAllLines(#"path to csv file");
//select the indices of the columns we want
var cols = csvlines[0].Split(',').Select((val,i) => new { val, i }).Where(x => columns.Any(c => c == x.val)).Select(x => x.i).ToList();
//now go through the remaining lines
foreach (var line in csvlines.Skip(1))
{
var line_values = line.Split(',').ToList();
var dt_values = line_values.Where(x => cols.Contains(line_values.IndexOf(x)));
//now do something with the values you got for this row, add them to your datatable
}

You can look at https://joshclose.github.io/CsvHelper/
Think Reading individual fields is what you are looking for
var csv = new CsvReader( textReader );
while( csv.Read() )
{
var intField = csv.GetField<int>( 0 );
var stringField = csv.GetField<string>( 1 );
var boolField = csv.GetField<bool>( "HeaderName" );
}

We can easily do this without writing much code.
Exceldatareader is an awesome dll for that, it will directly as a datable from the excel sheet with just one method.
here is the links for example:http://www.c-sharpcorner.com/blogs/using-iexceldatareader1
http://exceldatareader.codeplex.com/
Hope it was useful kindly let me know your thoughts or feedbacks
Thanks
Karthik

var data = File.ReadAllLines(#"path to csv file");
// the expenses row
var query = data.Single(d => d[0] == "Expenses");
//third column
int column21 = 3;
return query[column21];

As others have stated a library like CsvReader can be used for this. As for linq, I don't think its suitable for this kind of job.
I haven't tested this but it should get you through
using (TextReader textReader = new StreamReader(filePath))
{
using (var csvReader = new CsvReader(textReader))
{
var headers = csvReader.FieldHeaders;
for (int rowIndex = 0; csvReader.Read(); rowIndex++)
{
var dataRow = dataTable.NewRow();
for (int chosenColumnIndex = 0; chosenColumnIndex < columns.Count(); chosenColumnIndex++)
{
for (int headerIndex = 0; headerIndex < headers.Length; headerIndex++)
{
if (headers[headerIndex] == columns[chosenColumnIndex])
{
dataRow[chosenColumnIndex] = csvReader.GetField<string>(headerIndex);
}
}
}
dataTable.Rows.InsertAt(dataRow, rowIndex);
}
}
}

Ignoring CSV rows with no data

I'm surprised that I haven't seen anything about this on here (or maybe I missed it). When parsing a CSV file, if there are rows with no data, how can/should that be handled? I'm not talking about blank rows, but empty rows, for example:
ID,Name,Quantity,Price
1,Stuff,2,5
2,Things,1,2.5
,,,
,,,
,,,
I am using TextFieldParser to handle commas in data, multiple delimiters, etc. The two solutions I've thought of is to either use ReadLine instead of ReadFields, but that would remove the benefits of using the TextFieldParser, I'd assume, because then I'd have to handle commas a different way. The other option would be to iterate through the fields and drop the row if all of the fields are empty. Here's what I have:
dttExcelTable = new DataTable();
using (TextFieldParser parser = new TextFieldParser(fileName))
{
parser.Delimiters = new string[] { ",", "|" };
string[] fields = parser.ReadFields();
if (fields == null)
{
return null;
}
foreach (string columnHeader in fields)
{
dttExcelTable.Columns.Add(columnHeader);
}
while (true)
{
DataRow importedRow = dttExcelTable.NewRow();
fields = parser.ReadFields();
if (fields == null)
{
break;
}
for (int i = 0; i < fields.Length; i++)
{
importedRow[i] = fields[i];
}
foreach (var field in importedRow.ItemArray)
{
if (!string.IsNullOrEmpty(field.ToString()))
{
dttExcelTable.Rows.Add(importedRow);
break;
}
}
}
}

Without using a thirdy party CSV reader you could change your code in this way
.....
DataRow importedRow = dttExcelTable.NewRow();
for (int i = 0; i < fields.Length; i++)
importedRow[i] = fields[i];
if(!importedRow.ItemArray.All (ia => string.IsNullOrWhiteSpace(ia.ToString())))
dttExcelTable.Rows.Add(importedRow);
Using the All IEnumerable extension you could check every element of the ItemArray using string.IsNullOrWhiteSpace. If the return is true you have an array of empty string and you could skip the Add

You can just replace commas in the line by nothing and test this if it is null.
strTemp = s.Replace(",", "");
if (!String.IsNullOrEmpty(strTemp)) { /*code here */}
http://ideone.com/8wKOVD

Doesn't seem like there's really a better solution than the one that I provided. I will just need to loop through all of the fields and see if they are all empty before adding it to my datatable.
The only other solution I've found is Steve's answer, which is to not use TextFieldParser

I know this is literally years later, but I recently had this issue and was able to find a workaround similar to previous responses. You can see the whole flushed out function
public static DataTable CSVToDataTable(IFormFile file)
{
DataTable dt = new DataTable();
using (StreamReader sr = new StreamReader(file.OpenReadStream()))
{
string[] headers = sr.ReadLine().Split(',');
foreach (string header in headers)
{
dt.Columns.Add(header);
}
var txt = sr.ReadToEnd();
var stringReader = new StringReader(txt);
TextFieldParser parser = new TextFieldParser(stringReader);
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
while (!parser.EndOfData)
{
string[] rows = parser.ReadFields();
string tmpStr = string.Join("", rows);
if (!string.IsNullOrWhiteSpace(tmpStr))
{
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
}
return dt;
}
It works for me and has proven fairly reliable. The main snippet is found in the WHILE loop after calling .ReadFields()--I join the returned rows to a string and then check if its nullorempty. Hopefully this can help someone who stumbles upon this.

How to read tab delimited lines by skipping alternate lines

I am currently able to parse and extract data from large tab delimited file. I am reading, parsing and extracting line by line and adding the split items in my Data table (Row Limit adding 3 rows at a time). I need to skip even lines i.e. Read first maximum tab delimited line and then skip 2nd one and read the third one directly.
My Tab delimited source file format
001Mean 26.975 1.1403 910.45
001Stdev 26.975 1.1403 910.45
002Mean 26.975 1.1403 910.45
002Stdev 26.975 1.1403 910.45
Need to skip or avoid reading Stdev tab delimited lines.
C# Code:
Getting the Maximum length of items in a tab delimited line of the file by splitting a line
using (var reader = new StreamReader(sourceFileFullName))
{
string line = null;
line = reader.ReadToEnd();
if (!string.IsNullOrEmpty(line))
{
var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1);
foreach (var value in list_with_max_cols)
{
var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray();
MAX_NO_OF_COLUMNS = values.Length;
}
}
}
Reading the file line by line until maximum length in a tab delimited line is satisfied as first line to parse and extract
using (var reader = new StreamReader(sourceFileFullName))
{
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
//when reach first line it is column list need to create datatable based on that.
if (firstLineOfFile)
{
columnData = new_read_line;
firstLineOfFile = false;
continue;
}
if (firstLineOfChunk)
{
firstLineOfChunk = false;
chunkDataTable = CreateEmptyDataTable(columnData);
}
AddRow(chunkDataTable, new_read_line);
chunkRowCount++;
if (chunkRowCount == _chunkRowLimit)
{
firstLineOfChunk = true;
chunkRowCount = 0;
yield return chunkDataTable;
chunkDataTable = null;
}
}
}
Creating Data Table:
private DataTable CreateEmptyDataTable(string firstLine)
{
IList<string> columnList = Split(firstLine);
var dataTable = new DataTable("TableName");
for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++)
{
string c_string = columnList[columnIndex];
if (Regex.Match(c_string, "\\s").Success)
{
string tmp = Regex.Replace(c_string, "\\s", "");
string finaltmp = Regex.Replace(tmp, #" ?\[.*?\]", ""); // To strip strings inside [] and inclusive [] alone
columnList[columnIndex] = finaltmp;
}
}
dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray());
dataTable.Columns.Add("ID");
return dataTable;
}
How to skip lines by reading alternatively and split and then add to my datatable !!!
AddRow Function : Managed to achieve my requirement by adding following changes !!!
private void AddRow(DataTable dataTable, string line)
{
if (line.Contains("Stdev"))
{
return;
}
else
{
//Rest of Code
}
}

Considering you have tab separated values in each line, how about reading the odd lines and splitting them into arrays. This is just a sample; you can expand upon this.
Test data (file.txt)
luck is when opportunity meets preparation
this line needs to be skipped
microsoft visual studio
another line to be skipped
let us all code
Code
var oddLines = File.ReadLines(#"C:\projects\file.txt").Where((item, index) => index%2 == 0);
foreach (var line in oddLines)
{
var words = line.Split('\t');
}
Debug screen shots
EDIT
To get lines that don't contain 'Stdev'
var filteredLines = System.IO.File.ReadLines(#"C:\projects\file.txt").Where(item => !item.Contains("Stdev"));

Change
using (var reader = new StreamReader(sourceFileFullName))
{
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;
To
using (var reader = new StreamReader(sourceFileFullName))
{
int cnt = 0;
string new_read_line = null;
//Read and display lines from the file until the end of the file is reached.
while ((new_read_line = reader.ReadLine()) != null)
{
cnt++;
if(cnt % 2 == 0)
continue;
var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
if (items.Length != MAX_NO_OF_COLUMNS)
continue;

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Excel Text to Column Equivalent in C# Datatable - c#

Related

New line within CSV column causing issue

Split string that includes multiline substrings into substrings [duplicate]

Given string array of column names, how do I read a .csv file to a DataTable?

Ignoring CSV rows with no data

How to read tab delimited lines by skipping alternate lines

Categories

Resources