I have a large CSV file with millions of rows. Sample lines:
CODE,COMPANY NAME, DATE, ACTION
A,My Name , LLC,2018-01-28,BUY
B,Your Name , LLC,2018-01-25,SELL
C,
All Name , LLC,2018-01-21,SELL
D,World Name , LLC,2018-01-20,BUY
Row C contains a newline, but it is actually the same record. I want to remove the newline character from within a cell/field/column.
I tried \r\n, Environment.NewLine and many other things, but could not make it work.
Here is my code:
private DataTable CSToDataTable(string csvfile)
{
    Int64 row = 0;
    try
    {
        string CSVFilePathName = csvfile; //@"C:\test.csv";
        string[] Lines = File.ReadAllLines(CSVFilePathName.Replace(Environment.NewLine, ""));
        string[] Fields;
        Fields = Lines[0].Split(new char[] { ',' });
        int Cols = Fields.GetLength(0);
        DataTable dt = new DataTable();
        //1st row must be column names; force lower case to ensure matching later on.
        for (int i = 0; i < Cols; i++)
            dt.Columns.Add(Fields[i].ToLower(), typeof(string));
        DataRow Row;
        for (row = 1; row < Lines.GetLength(0); row++)
        {
            Fields = Lines[row].Split(new char[] { ',' });
            Row = dt.NewRow();
            //Console.WriteLine(row);
            for (int f = 0; f < Cols; f++)
            {
                Row[f] = Fields[f];
            }
            dt.Rows.Add(Row);
            if (row == 190063)
            {
            }
        }
        return dt;
    }
    catch (Exception ex)
    {
        throw ex;
    }
}
How can I remove the newline character and read the row correctly? Per the business requirement, I don't want to skip such rows.
Your CSV file is not in a valid format. In order to parse and load it successfully, you will have to sanitize it. There are a couple of issues:
The COMPANY NAME column contains the field separator. Fix this by surrounding the value with quotes.
A newline appears inside a CSV value. This can be fixed by combining the adjacent broken rows into one.
With Cinchoo ETL, you can sanitize and load your large file as below:
string csv = @"CODE,COMPANY NAME, DATE, ACTION
A,My Name , LLC,2018-01-28,BUY
B,Your Name , LLC,2018-01-25,SELL
C,
All Name , LLC,2018-01-21,SELL
D,World Name , LLC,2018-01-20,BUY";

string bufferLine = null;
var reader = ChoCSVReader.LoadText(csv)
    .WithFirstLineHeader()
    .Setup(s => s.BeforeRecordLoad += (o, e) =>
    {
        string line = (string)e.Source;
        string[] tokens = line.Split(',');
        if (tokens.Length == 5)
        {
            //Fix the second and third value with quotes
            e.Source = @"{0},""{1},{2}"",{3}, {4}".FormatString(tokens[0], tokens[1], tokens[2], tokens[3], tokens[4]);
        }
        else
        {
            //Fix the broken lines, assuming a csv line breaks into at most 2 lines
            if (bufferLine == null)
            {
                bufferLine = line;
                e.Skip = true;
            }
            else
            {
                line = bufferLine + line;
                tokens = line.Split(',');
                e.Source = @"{0},""{1},{2}"",{3}, {4}".FormatString(tokens[0], tokens[1], tokens[2], tokens[3], tokens[4]);
                bufferLine = null; //reset the buffer for the next broken record
            }
        }
    });

foreach (var rec in reader)
    Console.WriteLine(rec.Dump());

//Be careful loading millions of rows into a DataTable
//var dt = reader.AsDataTable();
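If the sanitized output only needs to land in another file rather than in memory, a minimal sketch (assuming ChoCSVWriter from the same library, and a hypothetical output path) is to stream the records straight to a writer instead of materializing a DataTable:

// Sketch only: stream the sanitized records to disk
// (in place of the foreach above) instead of building a DataTable.
using (var writer = new ChoCSVWriter("sanitized.csv").WithFirstLineHeader())
{
    writer.Write(reader);
}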
Hope it helps.
You haven't made it clear what the possible criteria are for an unwanted newline to appear in the file. So, assuming that a 'proper' line in the CSV file does NOT end with a comma, and that a line ending with a comma is not a properly formatted line, you could do something like this:
static void Main(string[] args)
{
    string path = @"CSVFile.csv";
    List<CSVData> data = new List<CSVData>();
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        using (StreamReader sr = new StreamReader(fs))
        {
            sr.ReadLine(); // Header
            while (!sr.EndOfStream)
            {
                var line = sr.ReadLine();
                while (line.EndsWith(","))
                {
                    line += sr.ReadLine();
                }
                var items = line.Split(new string[] { "," }, StringSplitOptions.None);
                data.Add(new CSVData() { CODE = items[0], NAME = items[1], COMPANY = items[2], DATE = items[3], ACTION = items[4] });
            }
        }
    }
    Console.ReadLine();
}

public class CSVData
{
    public string CODE { get; set; }
    public string NAME { get; set; }
    public string COMPANY { get; set; }
    public string DATE { get; set; }
    public string ACTION { get; set; }
}
Obviously there's a lot of error handling to be done here (for example, when creating a new CSVData object, make sure your items contain all the data you want), but I think this is the start you need.
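As a sketch of that validation (the TryParseCsvData helper is mine, not part of the original answer), you might guard against short rows before constructing the object:

// Sketch: validate the field count before building a CSVData.
static bool TryParseCsvData(string line, out CSVData result)
{
    result = null;
    var items = line.Split(new string[] { "," }, StringSplitOptions.None);
    if (items.Length < 5)
        return false; // malformed row; caller decides whether to log or throw
    result = new CSVData { CODE = items[0], NAME = items[1], COMPANY = items[2], DATE = items[3], ACTION = items[4] };
    return true;
}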
Related
I have a txt file that has headers and then 3 columns of values, i.e.
Description=null
area = 100
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list)
Then another segment
Description=null
area = 10
1,2,3
1,2,4
2,1,5 ...
... 1,2,1//(these are the values that I need in one list).
In fact I just need one list per "Table" of values; the values are always in 3 columns, but there are n segments. Any ideas?
Thanks!
List<double> VMM40xyz = new List<double>();
foreach (var item in VMM40blocklines)
{
    if (item.Contains(','))
    {
        VMM40xyz.AddRange(item.Split(',').Select(double.Parse).ToList());
    }
}
I tried this, but it just puts all the values into one big list.
It looks like you want your data to end up in a format like this:
public class SetOfData //Feel free to name these parts better.
{
    public string Description = "";
    public string Area = "";
    public List<double> Data = new List<double>();
}
...stored somewhere in...
List<SetOfData> finalData = new List<SetOfData>();
So, here's how I'd read that in:
public static List<SetOfData> ReadCustomFile(string Filename)
{
    if (!File.Exists(Filename))
    {
        throw new FileNotFoundException($"{Filename} does not exist.");
    }
    List<SetOfData> returnData = new List<SetOfData>();
    SetOfData currentDataSet = null;
    using (FileStream fs = new FileStream(Filename, FileMode.Open))
    {
        using (StreamReader reader = new StreamReader(fs))
        {
            while (!reader.EndOfStream)
            {
                string line = reader.ReadLine();
                //This will start a new object on every 'Description' line.
                if (line.Contains("Description="))
                {
                    //Save off the old data set if there is one.
                    if (currentDataSet != null)
                        returnData.Add(currentDataSet);
                    currentDataSet = new SetOfData();
                    //Now, to make sure there is something after "Description=" and to set the Description if there is.
                    //Your example data used "null" here, which this will take literally to be a string containing the letters "null". You can check the contents of parts[1] inside the if block to change this.
                    string[] parts = line.Split('=');
                    if (parts.Length > 1)
                        currentDataSet.Description = parts[1].Trim();
                }
                else if (line.Contains("area = "))
                {
                    //Just in case your file didn't start with a "Description" line for some reason.
                    if (currentDataSet == null)
                        currentDataSet = new SetOfData();
                    //And then we do some string splitting like we did for Description.
                    string[] parts = line.Split('=');
                    if (parts.Length > 1)
                        currentDataSet.Area = parts[1].Trim();
                }
                else
                {
                    //Just in case your file didn't start with a "Description" line for some reason.
                    if (currentDataSet == null)
                        currentDataSet = new SetOfData();
                    string[] parts = line.Split(',');
                    foreach (string part in parts)
                    {
                        if (double.TryParse(part, out double number))
                        {
                            currentDataSet.Data.Add(number);
                        }
                    }
                }
            }
            //Make sure to add the last set (guarding against an empty file).
            if (currentDataSet != null)
                returnData.Add(currentDataSet);
        }
    }
    return returnData;
}
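A brief usage note (the file name here is a placeholder): each returned SetOfData holds one segment's values, so you get one list per "Table" directly:

// Example usage: one List<double> per segment.
List<SetOfData> sets = ReadCustomFile("segments.txt");
foreach (SetOfData set in sets)
    Console.WriteLine($"area = {set.Area}: {set.Data.Count} values");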
I am currently working on a small project and I got stuck on a problem I cannot manage to solve...
I have multiple .CSV files I want to read; they all have the same data, just with different values.
Header1;Value1;Info1
Header2;Value2;Info2
Header3;Value3;Info3
While reading the first file I need to create the headers. The problem is they are not split into columns but into rows (as you can see above, Header1-Header3).
Then it needs to read Value1 - Value3 (they are listed in the 2nd column), and on top of that I need to create another header -> Header4 with the data of "Info2", which is always placed in column 3 and row 2 (the other values of column 3 I can ignore).
So the outcome after the first file should look like this:
Header1;Header2;Header3;Header4;
Value1;Value2;Value3;Info2;
And after multiple files it should be like this:
Header1;Header2;Header3;Header4;
Value1;Value2;Value3;Value4;
Value1b;Value2b;Value3b;Value4b;
Value1c;Value2c;Value3c;Value4c;
I tried it with OleDB but I get the error "missing ISAM", which I can't manage to fix. The code I used is the following:
public DataTable ReadCsv(string fileName)
{
    DataTable dt = new DataTable("Data");
    /* using (OleDbConnection cn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
       Path.GetDirectoryName(fileName) + "\";Extendet Properties ='text;HDR=yes;FMT=Delimited(,)';"))
    */
    using (OleDbConnection cn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
        Path.GetDirectoryName(fileName) + ";Extendet Properties ='text;HDR=yes;FMT=Delimited(,)';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format("select *from [{0}]", new FileInfo(fileName).Name, cn)))
        {
            cn.Open();
            using (OleDbDataAdapter adapter = new OleDbDataAdapter(cmd))
            {
                adapter.Fill(dt);
            }
        }
    }
    return dt;
}
Another attempt I made was using StreamReader. But the headers end up in the wrong place and I don't know how to change this + do this for every file. The code I tried is the following:
public static DataTable ReadCsvFilee(string path)
{
    DataTable oDataTable = new DataTable();
    var fileNames = Directory.GetFiles(path);
    foreach (var fileName in fileNames)
    {
        //initialising a StreamReader type variable and will pass the file location
        StreamReader oStreamReader = new StreamReader(fileName);
        // CONTROLS WHETHER WE SKIP A ROW OR NOT
        int RowCount = 0;
        // CONTROLS WHETHER WE CREATE COLUMNS OR NOT
        bool hasColumns = false;
        string[] ColumnNames = null;
        string[] oStreamDataValues = null;
        //using while loop, read the stream data till the end
        while (!oStreamReader.EndOfStream)
        {
            String oStreamRowData = oStreamReader.ReadLine().Trim();
            if (oStreamRowData.Length > 0)
            {
                oStreamDataValues = oStreamRowData.Split(';');
                //Because the first row contains column names, we will populate
                //the column names by reading the first row; RowCount-0 will be true only once
                // CHANGE TO CHECK FOR COLUMNS CREATED
                if (!hasColumns)
                {
                    ColumnNames = oStreamRowData.Split(';');
                    //using foreach, loop through all the column names
                    foreach (string csvcolumn in ColumnNames)
                    {
                        DataColumn oDataColumn = new DataColumn(csvcolumn.ToUpper(), typeof(string));
                        //setting the default value of string.Empty on the newly created column
                        oDataColumn.DefaultValue = string.Empty;
                        //adding the newly created column to the table
                        oDataTable.Columns.Add(oDataColumn);
                    }
                    // SET COLUMNS CREATED
                    hasColumns = true;
                    // SET RowCount TO 0 SO WE KNOW TO SKIP COLUMNS LINE
                    RowCount = 0;
                }
                else
                {
                    // IF RowCount IS 0 THEN SKIP COLUMN LINE
                    if (RowCount++ == 0) continue;
                    //creates a new DataRow with the same schema as the oDataTable
                    DataRow oDataRow = oDataTable.NewRow();
                    //loop through all the column names
                    for (int i = 0; i < ColumnNames.Length; i++)
                    {
                        oDataRow[ColumnNames[i]] = oStreamDataValues[i] == null ? string.Empty : oStreamDataValues[i].ToString();
                    }
                    //adding the newly created row with data to the oDataTable
                    oDataTable.Rows.Add(oDataRow);
                }
            }
        }
        //close the oStreamReader object
        oStreamReader.Close();
        //release all the resources used by the oStreamReader object
        oStreamReader.Dispose();
    }
    return oDataTable;
}
I am thankful to everyone who is willing to help. And thanks for reading this far!
Sincerely yours
If I understood you right, strict parsing like this would do it:
string OpenAndParse(string filename, bool firstFile = false)
{
    var lines = File.ReadAllLines(filename);
    var parsed = lines.Select(l => l.Split(';')).ToArray();
    var header = $"{parsed[0][0]};{parsed[1][0]};{parsed[2][0]};{parsed[1][0]}\n";
    var data = $"{parsed[0][1]};{parsed[1][1]};{parsed[2][1]};{parsed[1][2]}\n";
    return firstFile
        ? $"{header}{data}"
        : $"{data}";
}
Where it would return - if first file:
Header1;Header2;Header3;Header2
Value1;Value2;Value3;Value4
if not first file:
Value1;Value2;Value3;Value4
If I am correct, the rest is about running this against a list of files and joining the results in an output file.
EDIT: Against a directory:
void ProcessFiles(string folderName, string outputFileName)
{
    bool firstFile = true;
    foreach (var f in Directory.GetFiles(folderName))
    {
        File.AppendAllText(outputFileName, OpenAndParse(f, firstFile));
        firstFile = false;
    }
}
Note: I missed that you want a DataTable and not an output file. In that case you could simply create a list, put the results into it, and make the list the data source for your DataTable (then why would you use semicolons in there? Probably all you need is to attach the array values to a list). A sketch of that idea follows.
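As a sketch (the ToDataTable helper and its names are mine, not from the answer), assuming you collect one string[] of values per file:

// Sketch: pour a list of parsed rows into a DataTable whose
// columns come from the first file's headers.
DataTable ToDataTable(string[] headers, List<string[]> rows)
{
    var dt = new DataTable();
    foreach (var h in headers)
        dt.Columns.Add(h, typeof(string));
    foreach (var r in rows)
        dt.Rows.Add(r); // uses the params object[] overload
    return dt;
}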
(Adding as another answer just to make it uncluttered)
void ProcessMyFiles(string folderName)
{
    List<MyData> d = new List<MyData>();
    var files = Directory.GetFiles(folderName);
    foreach (var file in files)
    {
        OpenAndParse(file, d);
    }
    string[] headers = GetHeaders(files[0]);
    DataGridView dgv = new DataGridView { Dock = DockStyle.Fill };
    dgv.DataSource = d;
    dgv.ColumnAdded += (sender, e) => { e.Column.HeaderText = headers[e.Column.Index]; };
    Form f = new Form();
    f.Controls.Add(dgv);
    f.Show();
}

string[] GetHeaders(string filename)
{
    var lines = File.ReadAllLines(filename);
    var parsed = lines.Select(l => l.Split(';')).ToArray();
    return new string[] { parsed[0][0], parsed[1][0], parsed[2][0], parsed[1][0] };
}

void OpenAndParse(string filename, List<MyData> d)
{
    var lines = File.ReadAllLines(filename);
    var parsed = lines.Select(l => l.Split(';')).ToArray();
    var data = new MyData
    {
        Col1 = parsed[0][1],
        Col2 = parsed[1][1],
        Col3 = parsed[2][1],
        Col4 = parsed[1][2]
    };
    d.Add(data);
}

public class MyData
{
    public string Col1 { get; set; }
    public string Col2 { get; set; }
    public string Col3 { get; set; }
    public string Col4 { get; set; }
}
I don't know if this is the best way to do it, but what I would have done in your case is rewrite the CSVs the conventional way while reading all the files, then create a stream containing the new CSV.
It would look something like this:
var csv = new StringBuilder();
csv.AppendLine("Header1;Header2;Header3;Header4");
foreach (var item in file)
{
    var newLine = string.Format("{0};{1};{2};{3}", item.value1, item.value2, item.value3, item.value4);
    csv.AppendLine(newLine);
}
//Create a stream over the rebuilt CSV text
MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(csv.ToString()));
StreamReader reader = new StreamReader(stream);
//Fill your data table here with your values
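To finish that last step, a minimal sketch (assuming the four fixed headers above) could be:

// Sketch: read the rebuilt CSV from the stream into a DataTable.
var dt = new DataTable();
foreach (var name in reader.ReadLine().Split(';'))
    dt.Columns.Add(name, typeof(string));
string row;
while ((row = reader.ReadLine()) != null)
    dt.Rows.Add(row.Split(';'));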
Hope this will help.
I have to import 2 CSVs.
CSV 1 [49]: includes about 50 tab-separated columns.
CSV 2 [2]: includes 3 columns which should replace the [3], [6] and [11] columns of my first CSV.
So here's what I do:
1) Import the csv and split it into arrays.
string employeedatabase = "MYPATH";
List<String> status = new List<String>();
StreamReader file2 = new System.IO.StreamReader(filename);
string line = file2.ReadLine();
while ((line = file2.ReadLine()) != null)
{
    string[] ud = line.Split('\t');
    status.Add(ud[0]);
}
String[] ud_status = status.ToArray();
PROBLEM 1: I have about 50 columns to handle; ud_status is just the first, so do I need 50 lists and 50 string arrays?
2) Import the second csv and split it into arrays.
List<String> vorname = new List<String>();
List<String> nachname = new List<String>();
List<String> username = new List<String>();
StreamReader file = new System.IO.StreamReader(employeedatabase);
string line3 = file.ReadLine();
while ((line3 = file.ReadLine()) != null)
{
    string[] data = line3.Split(';');
    vorname.Add(data[0]);
    nachname.Add(data[1]);
    username.Add(data[2]);
}
String[] db_vorname = vorname.ToArray();
String[] db_nachname = nachname.ToArray();
String[] db_username = username.ToArray();
PROBLEM 2: After loading these two CSVs, I don't know how to combine them and replace the columns as mentioned above.
Something like this?
mynewArray = ud_status + "\t" + ud_xy[..n] + "\t" + changed_colum + ud_xy[..n];
Then save "mynewarray" into a tab-separated csv with encoding "utf-8".
To read the file into a meaningful format, you should set up a class that defines the format of your CSV:
public class CsvRow
{
    public string vorname { get; set; }
    public string nachname { get; set; }
    public string username { get; set; }
    public CsvRow(string[] data)
    {
        vorname = data[0];
        nachname = data[1];
        username = data[2];
    }
}
Then populate a list of this:
List<CsvRow> rows = new List<CsvRow>();
StreamReader file = new System.IO.StreamReader(employeedatabase);
string line3 = file.ReadLine();
while ((line3 = file.ReadLine()) != null)
{
    rows.Add(new CsvRow(line3.Split(';')));
}
Similarly format your other CSV and include unused properties for the new fields. Once you have loaded both, you can populate the new properties from this list in a loop, matching the records by whatever common field the CSVs hopefully share. Then finally output the resulting data to a new CSV file.
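As a sketch of those last steps (Csv1Row, Merge and the output path are hypothetical names of mine; the question gives no shared key, so this matches purely by row index):

// Sketch only: Csv1Row is a hypothetical holder for the ~50 columns
// of the first file, kept as a raw string[].
public class Csv1Row
{
    public string[] Fields { get; set; }
}

static void Merge(List<Csv1Row> csv1Rows, List<CsvRow> rows, string outPath)
{
    // Overwrite columns [3], [6] and [11] with the second file's values.
    for (int i = 0; i < csv1Rows.Count && i < rows.Count; i++)
    {
        csv1Rows[i].Fields[3] = rows[i].vorname;
        csv1Rows[i].Fields[6] = rows[i].nachname;
        csv1Rows[i].Fields[11] = rows[i].username;
    }
    // Tab separated, UTF-8 encoded, as the question asks.
    File.WriteAllLines(outPath, csv1Rows.Select(r => string.Join("\t", r.Fields)), Encoding.UTF8);
}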
Your solution is not to use string arrays to do this. That will just drive you crazy. It's better to use the System.Data.DataTable object.
I didn't get a chance to test the LINQ lambda expression at the end of this (or really any of it, I wrote this on a break), but it should get you on the right track.
using (var ds = new System.Data.DataSet("My Data"))
{
    ds.Tables.Add("File0");
    ds.Tables.Add("File1");
    string raw;
    using (var reader = new System.IO.StreamReader("FirstFile"))
    {
        //first we get columns for table 0
        foreach (string s in reader.ReadLine().Split('\t'))
            ds.Tables["File0"].Columns.Add(s);
        while ((raw = reader.ReadLine()) != null)
        {
            //and now the rest of the data.
            string[] line = raw.Split('\t');
            var r = ds.Tables["File0"].NewRow();
            for (int i = 0; i < line.Length; i++)
            {
                r[i] = line[i];
            }
            ds.Tables["File0"].Rows.Add(r);
        }
    }
    //we could probably do these in a loop or a second method,
    //but you may want subtle differences, so for now we just do it the same way
    //for file1
    using (var reader2 = new System.IO.StreamReader("SecondFile"))
    {
        foreach (string s in reader2.ReadLine().Split('\t'))
            ds.Tables["File1"].Columns.Add(s);
        while ((raw = reader2.ReadLine()) != null)
        {
            //and now the rest of the data.
            string[] line = raw.Split('\t');
            var r = ds.Tables["File1"].NewRow();
            for (int i = 0; i < line.Length; i++)
            {
                r[i] = line[i];
            }
            ds.Tables["File1"].Rows.Add(r);
        }
    }
    //you now have these in functioning datatables. Because we named columns,
    //you can call them by name specifically, or by index, to replace in the first datatable.
    string[] columnsToReplace = new string[] { "firstColumnName", "SecondColumnName", "ThirdColumnName" };
    for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
    {
        //you didn't give a sign of any relation between the two tables
        //so this is just by row, and assumes the row count is equivalent.
        //This is also not advised.
        //if there is a key these sets of data share
        //you should join on them instead.
        DataRow dr = ds.Tables[0].Rows[i];
        dr[3] = ds.Tables[1].Rows[i][columnsToReplace[0]];
        dr[6] = ds.Tables[1].Rows[i][columnsToReplace[1]];
        dr[11] = ds.Tables[1].Rows[i][columnsToReplace[2]];
    }
    //ds.Tables[0] now has the output you want.
    string output = String.Empty;
    foreach (System.Data.DataColumn c in ds.Tables[0].Columns)
        output = String.Concat(output, c.ColumnName, "\t");
    output = String.Concat(output, Environment.NewLine); // columns ready, now the rows.
    foreach (DataRow r in ds.Tables[0].Rows)
        output = string.Concat(output, string.Join("\t", r.ItemArray), Environment.NewLine);
    //StreamWriter creates the file if it doesn't exist yet.
    using (System.IO.StreamWriter file = new System.IO.StreamWriter("MYPATH")) //or a variable instead of string literal
    {
        file.Write(output);
    }
}
With Cinchoo ETL - an open source file helper library - you can do the merge of the CSV files as below. This assumes the 2 CSV files contain the same number of lines.
string CSV1 = @"Id	Name	City
1	Tom	New York
2	Mark	FairFax";

string CSV2 = @"Id	City
1	Las Vegas
2	Dallas";

dynamic rec1 = null;
dynamic rec2 = null;
StringBuilder csv3 = new StringBuilder();
using (var csvOut = new ChoCSVWriter(new StringWriter(csv3))
    .WithFirstLineHeader()
    .WithDelimiter("\t")
    )
{
    using (var csv1 = new ChoCSVReader(new StringReader(CSV1))
        .WithFirstLineHeader()
        .WithDelimiter("\t")
        )
    {
        using (var csv2 = new ChoCSVReader(new StringReader(CSV2))
            .WithFirstLineHeader()
            .WithDelimiter("\t")
            )
        {
            while ((rec1 = csv1.Read()) != null && (rec2 = csv2.Read()) != null)
            {
                rec1.City = rec2.City;
                csvOut.Write(rec1);
            }
        }
    }
}
Console.WriteLine(csv3.ToString());
Hope it helps.
Disclaimer: I'm the author of this library.
Major Edit: I am doing a bad job of explaining :(
I have two classes:
public class UserDefinitions // A list of 'Items'; each group of items belongs to a user. I handle user logic elsewhere, and it works flawlessly.
{
    public List<Item> items { get; set; }
}

public class Item // The user definitions. A user could have 1 or 15 of these. They would all be a single 'line' from the CSV file.
{
    public string definitionKey { get; set; }
    public string defName { get; set; }
    public string defValue { get; set; }
}
I want to build these from a CSV file. I build this CSV file myself, so I make it using the same parameters every time.
I run SQL on my company's DB to generate results like so: http://i.imgur.com/gS1UJot.png
Then I read the file like so:
class Program
{
    static void Main(string[] args)
    {
        var userData = new UserDefinitions();
        var csvList = new List<Item>();
        string json = "";
        string fPath = @"C:\test\csvTest.csv";
        var lines = File.ReadAllLines(fPath);
        Console.WriteLine(lines);
        List<string> udata = new List<string>(lines);
        foreach (var line in udata)
        {
            string[] userDataComplete = line.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries); // this cleans any empty cells from the CSV
            csvList.Add(new Item { definitionKey = userDataComplete[1], defName = userDataComplete[2], defValue = userDataComplete[3] });
        }
        json = JsonConvert.SerializeObject(csvList); //everything below is for debugging/tracking progress
        Console.WriteLine(json);
        Console.ReadKey();
        StreamWriter sw = new StreamWriter("C:\\test\\testjson.txt");
        sw.WriteLine(json);
        sw.Close();
    }
}
This ALMOST does what I want. The output JSON comes only from the first 'column' of the csv data:
[{"definitionKey":"uuid1","defName":"HairColor","defValue":"Brown"},{"definitionKey":"uuid1","defName":"HairColor","defValue":"Blonde"},{"definitionKey":"uuid1","defName":"HairColor","defValue":"Blue"}]
Using the screenshot as an example, the wanted output should be:
[{"attributeDefinitionKey":"uuid1","name":"HairColor","value":"Brown"},{"definitionKey":"uuid2","defName":"FreckleAmount","defValue":"50"}]
[{"attributeDefinitionKey":"uuid1","name":"HairColor","value":"Blonde"},{"definitionKey":"uuid2","defName":"FreckleAmount","defValue":"null"}]
[{"attributeDefinitionKey":"uuid1","name":"HairColor","value":"Blue"},{"definitionKey":"uuid3","defName":"Tattoos","defValue":"5"}]
I can't pick out certain aspects at will, or apply them to Items. For example, there may be 10 users or 5000 users, but the definitionKey will always be at [1] and adding 3 will get every subsequent definitionKey, just as defName will always be in the [2] spot and adding 3 will get every subsequent defName, if there are any; this is all per line.
I know I have to add some +3 logic, but I am not quite sure how to incorporate it. Maybe a for loop? A nested for loop after a foreach loop? I feel I am missing something obvious!
Thanks again for any help
This reads the csv line by line and converts each row to JSON, while adapting to changes in the number of columns.
This only works if the CSV follows your rules:
one userId, and
x amount of "Things", with 3 columns per "Thing".
private static void Main(string[] args)
{
    var file = new StreamReader(@"C:\test\csvTest.csv");
    string line;
    var itemsJson = new List<string>();
    file.ReadLine();
    while ((line = file.ReadLine()) != null)
    {
        var sb = new StringBuilder();
        var fields = line.Split(',');
        sb.Append(GetKeyValueJson("UserId", fields[0]));
        for (var i = 1; i < fields.Length; i += 3)
        {
            var x = (i + 3) / 3;
            sb.Append(GetKeyValueJson($"Thing {i + x} ID", fields[i]));
            sb.Append(GetKeyValueJson($"Thing {i + x} ID", fields[i + 1]));
            sb.Append(i + 3 == fields.Length
                ? GetKeyValueJson($"Thing {i + x} ID", fields[i + 2], true)
                : GetKeyValueJson($"Thing {i + x} ID", fields[i + 2]));
        }
        itemsJson.Add(WrapJson(sb.ToString()));
    }
    var json = WrapItems(itemsJson);
    Console.ReadLine();
}

private static string GetKeyValueJson(string id, string value, bool lastPair = false)
{
    var sb = new StringBuilder();
    sb.Append('"');
    sb.Append(id);
    sb.Append('"');
    sb.Append(':');
    sb.Append('"');
    sb.Append(value);
    sb.Append('"');
    if (!lastPair)
        sb.Append(',');
    return sb.ToString();
}

private static string WrapJson(string s)
{
    var sb = new StringBuilder();
    sb.Append('{');
    sb.Append(s);
    sb.Append('}');
    return sb.ToString();
}

private static string WrapItems(List<string> jsonList)
{
    var sb = new StringBuilder();
    sb.Append('"');
    sb.Append("Items");
    sb.Append('"');
    sb.Append(':');
    sb.Append('[');
    sb.Append(jsonList.Aggregate((current, next) => current + "," + next));
    sb.Append(']');
    return WrapJson(sb.ToString());
}
It's not pretty and sorting would be tough, but it should adapt to the column count as long as it comes in 3's.
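Since the question already uses Json.NET, a hedged alternative sketch (the LineToJson helper and its key names are mine, not from the answer) is to build real objects per line and let the serializer handle quoting and escaping:

// Sketch: one dictionary per "Thing" triple, serialized by JsonConvert.
// Assumes the same layout rule: userId first, then groups of 3 columns.
static string LineToJson(string line)
{
    var fields = line.Split(',');
    var items = new List<Dictionary<string, string>>();
    for (var i = 1; i + 2 < fields.Length; i += 3)
    {
        items.Add(new Dictionary<string, string>
        {
            ["definitionKey"] = fields[i],
            ["defName"] = fields[i + 1],
            ["defValue"] = fields[i + 2]
        });
    }
    return JsonConvert.SerializeObject(new { UserId = fields[0], Items = items });
}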
I am currently able to parse and extract data from a large tab-delimited file. I am reading, parsing and extracting line by line and adding the split items to my DataTable (row limit: adding 3 rows at a time). I need to skip even lines, i.e. read the first maximum tab-delimited line, skip the 2nd one, and read the third one directly.
My tab-delimited source file format:
001Mean 26.975 1.1403 910.45
001Stdev 26.975 1.1403 910.45
002Mean 26.975 1.1403 910.45
002Stdev 26.975 1.1403 910.45
I need to skip or avoid reading the Stdev tab-delimited lines.
C# Code:
Getting the maximum number of items in a tab-delimited line of the file by splitting each line:
using (var reader = new StreamReader(sourceFileFullName))
{
    string line = null;
    line = reader.ReadToEnd();
    if (!string.IsNullOrEmpty(line))
    {
        var list_with_max_cols = line.Split('\n').OrderByDescending(y => y.Split('\t').Count()).Take(1);
        foreach (var value in list_with_max_cols)
        {
            var values = value.ToString().Split(new[] { '\t', '\n' }).ToArray();
            MAX_NO_OF_COLUMNS = values.Length;
        }
    }
}
Reading the file line by line; the first line that matches the maximum tab-delimited length is treated as the line to parse and extract:
using (var reader = new StreamReader(sourceFileFullName))
{
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;
        //when we reach the first line, it is the column list; the datatable is created based on that.
        if (firstLineOfFile)
        {
            columnData = new_read_line;
            firstLineOfFile = false;
            continue;
        }
        if (firstLineOfChunk)
        {
            firstLineOfChunk = false;
            chunkDataTable = CreateEmptyDataTable(columnData);
        }
        AddRow(chunkDataTable, new_read_line);
        chunkRowCount++;
        if (chunkRowCount == _chunkRowLimit)
        {
            firstLineOfChunk = true;
            chunkRowCount = 0;
            yield return chunkDataTable;
            chunkDataTable = null;
        }
    }
}
Creating Data Table:
private DataTable CreateEmptyDataTable(string firstLine)
{
    IList<string> columnList = Split(firstLine);
    var dataTable = new DataTable("TableName");
    for (int columnIndex = 0; columnIndex < columnList.Count; columnIndex++)
    {
        string c_string = columnList[columnIndex];
        if (Regex.Match(c_string, "\\s").Success)
        {
            string tmp = Regex.Replace(c_string, "\\s", "");
            string finaltmp = Regex.Replace(tmp, @" ?\[.*?\]", ""); // To strip strings inside [], including the [] themselves
            columnList[columnIndex] = finaltmp;
        }
    }
    dataTable.Columns.AddRange(columnList.Select(v => new DataColumn(v)).ToArray());
    dataTable.Columns.Add("ID");
    return dataTable;
}
How can I skip alternate lines while reading, then split and add them to my DataTable?
AddRow function: I managed to achieve my requirement by adding the following changes!
private void AddRow(DataTable dataTable, string line)
{
    if (line.Contains("Stdev"))
    {
        return;
    }
    else
    {
        //Rest of the code
    }
}
Considering you have tab-separated values in each line, how about reading the odd lines and splitting them into arrays? This is just a sample; you can expand upon it.
Test data (file.txt)
luck is when opportunity meets preparation
this line needs to be skipped
microsoft visual studio
another line to be skipped
let us all code
Code
var oddLines = File.ReadLines(@"C:\projects\file.txt").Where((item, index) => index % 2 == 0);
foreach (var line in oddLines)
{
    var words = line.Split('\t');
}
EDIT
To get lines that don't contain 'Stdev'
var filteredLines = System.IO.File.ReadLines(@"C:\projects\file.txt").Where(item => !item.Contains("Stdev"));
Change
using (var reader = new StreamReader(sourceFileFullName))
{
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;
To
using (var reader = new StreamReader(sourceFileFullName))
{
    int cnt = 0;
    string new_read_line = null;
    //Read and display lines from the file until the end of the file is reached.
    while ((new_read_line = reader.ReadLine()) != null)
    {
        cnt++;
        if (cnt % 2 == 0)
            continue;
        var items = new_read_line.Split(new[] { '\t', '\n' }).ToArray();
        if (items.Length != MAX_NO_OF_COLUMNS)
            continue;