Reading a CSV file in .NET? (C#)

How do I read a CSV file using C#?

One option that avoids third-party components is the Microsoft.VisualBasic.FileIO.TextFieldParser class (http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx). It provides all the functionality needed for parsing CSV; you only need to add a reference to the Microsoft.VisualBasic assembly.
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(file))
{
    parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
    parser.SetDelimiters(new string[] { ";" });
    while (!parser.EndOfData)
    {
        string[] row = parser.ReadFields();
        /* do something with the fields */
    }
}

You can use the Microsoft.VisualBasic.FileIO.TextFieldParser class in C#:
using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;

static void Main()
{
    string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";
    DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
    Console.WriteLine("Rows count: " + csvData.Rows.Count);
    Console.ReadLine();
}
private static DataTable GetDataTableFromCSVFile(string csv_file_path)
{
    DataTable csvData = new DataTable();
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[] { "," });
            csvReader.HasFieldsEnclosedInQuotes = true;

            // The first row supplies the column names
            string[] colFields = csvReader.ReadFields();
            foreach (string column in colFields)
            {
                DataColumn datacolumn = new DataColumn(column);
                datacolumn.AllowDBNull = true;
                csvData.Columns.Add(datacolumn);
            }

            while (!csvReader.EndOfData)
            {
                string[] fieldData = csvReader.ReadFields();

                // Treat empty values as null
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == "")
                    {
                        fieldData[i] = null;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
        // At minimum, log the error instead of silently swallowing it
        Console.WriteLine("Failed to read CSV: " + ex.Message);
    }
    return csvData;
}

You could try CsvHelper, which is a project I work on. Its goal is to make reading and writing CSV files as easy as possible, while being very fast.
Here are a few ways you can read from a CSV file.
// By type
var records = csv.GetRecords<MyClass>();
var records = csv.GetRecords( typeof( MyClass ) );

// Dynamic
var records = csv.GetRecords<dynamic>();

// Using an anonymous type for the class definition
var anonymousTypeDefinition = new
{
    Id = default( int ),
    Name = string.Empty,
    MyClass = new MyClass()
};
var records = csv.GetRecords( anonymousTypeDefinition );
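
These snippets assume an already-constructed CsvReader named csv. As a minimal setup sketch (assuming a recent CsvHelper version where the CsvReader constructor takes a CultureInfo; the file path is a placeholder):

using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;

using (var reader = new StreamReader("data.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    // GetRecords is lazy; materialize the results before the reader is disposed
    var records = csv.GetRecords<MyClass>().ToList();
}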

I usually use a simplistic approach like this one (note that a plain Split assumes there are no quoted fields containing the delimiter):
var path = Server.MapPath("~/App_Data/Data.csv");
var csvRows = System.IO.File.ReadAllLines(path, Encoding.Default).ToList();

foreach (var row in csvRows.Skip(1))
{
    var columns = row.Split(';');
    var field1 = columns[0];
    var field2 = columns[1];
    var field3 = columns[2];
}

I just used this library in my application: http://www.codeproject.com/KB/database/CsvReader.aspx. Everything went smoothly using it, so I'm recommending it. It is free under the MIT License, so just include the notice with your source files.
I didn't display the CSV in a browser, but the author has some samples for Repeaters and DataGrids. I did run one of his test projects to test a sort operation I had added, and it looked pretty good.
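
For reference, reading with that library looks roughly like this (a sketch from memory of the article's CsvReader API, so treat the names as assumptions; the second constructor argument indicates the file has a header row):

using System.IO;
using LumenWorks.Framework.IO.Csv;

using (var csv = new CsvReader(new StreamReader("data.csv"), true))
{
    string[] headers = csv.GetFieldHeaders();
    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < csv.FieldCount; i++)
        {
            string value = csv[i]; // fields are accessed by index
        }
    }
}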

You can try Cinchoo ETL - an open source library for reading and writing CSV files.
There are a couple of ways you can read CSV files with it. Given a sample file:

Id, Name
1, Tom
2, Mark

This is how you can use the library to read it:
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
    foreach (dynamic item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
If you have a POCO class defined to match up with the CSV file, like below:

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

you can parse the same file using this POCO class, as below:

using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
    foreach (var item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
Please check out the articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library

I recommend Angara.Table; about save/load, see: http://predictionmachines.github.io/Angara.Table/saveload.html.
It infers column types, can save CSV files, and is much faster than TextFieldParser. It follows RFC 4180 for the CSV format and supports multiline strings, NaNs, and escaped strings containing the delimiter character.
The library is under the MIT license. Source code: https://github.com/Microsoft/Angara.Table.
Though its API is focused on F#, it can be used in any .NET language, just not as succinctly as in F#.
Example:
Example:
using Angara.Data;
using System.Collections.Immutable;
...

var table = Table.Load("data.csv");

// Print schema:
foreach (Column c in table)
{
    string colType;
    if (c.Rows.IsRealColumn) colType = "double";
    else if (c.Rows.IsStringColumn) colType = "string";
    else if (c.Rows.IsDateColumn) colType = "date";
    else if (c.Rows.IsIntColumn) colType = "int";
    else colType = "bool";
    Console.WriteLine("{0} of type {1}", c.Name, colType);
}

// Get column data:
ImmutableArray<double> a = table["a"].Rows.AsReal;
ImmutableArray<string> b = table["b"].Rows.AsString;

Table.Save(table, "data2.csv");

You might be interested in the Linq2Csv library at CodeProject. One thing you would need to check is whether it reads the data lazily (only when it's needed), so that you won't need a lot of memory when working with bigger files.
As for displaying the data in the browser, you could do many things to accomplish it. If you were more specific about your requirements, the answer could be more specific, but here are things you could do:
1. Use the HttpListener class to write a simple web server (you can find many samples on the net for hosting a mini HTTP server; see the sketch below).
2. Use ASP.NET or ASP.NET MVC, create a page, and host it using IIS.
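
For option 1, a minimal HttpListener sketch (the URL, port, and HTML body here are placeholders; you would render the parsed CSV rows into the response):

using System.Net;
using System.Text;

var listener = new HttpListener();
listener.Prefixes.Add("http://localhost:8080/");
listener.Start();
while (true) // serve forever; a real server would have a shutdown path
{
    HttpListenerContext context = listener.GetContext(); // blocks until a request arrives
    byte[] body = Encoding.UTF8.GetBytes("<table><tr><td>your CSV rows here</td></tr></table>");
    context.Response.ContentType = "text/html";
    context.Response.OutputStream.Write(body, 0, body.Length);
    context.Response.Close();
}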

Seems like there are quite a few projects on CodeProject or CodePlex for CSV parsing.
Here is another CSV parser on CodePlex:
http://commonlibrarynet.codeplex.com/
This library has components for CSV parsing, INI file parsing, and command-line parsing as well. It's working well for me so far. The only thing is that it doesn't have a CSV writer.

This is just for parsing the CSV. For displaying it in a web page, it is simply a matter of taking the list and rendering it however you want.
Note: This code example does not handle the situation where the input string line contains newlines.
public List<string> SplitCSV(string line)
{
    if (string.IsNullOrEmpty(line))
        throw new ArgumentException();

    List<string> result = new List<string>();
    int index = 0;
    int start = 0;
    bool inQuote = false;

    // parse line
    foreach (char c in line)
    {
        switch (c)
        {
            case '"':
                inQuote = !inQuote;
                break;
            case ',':
                // a comma only separates fields when we are not inside quotes
                if (!inQuote)
                {
                    result.Add(line.Substring(start, index - start)
                        .Replace("\"", ""));
                    start = index + 1;
                }
                break;
        }
        index++;
    }

    // add the last field
    if (start < index)
    {
        result.Add(line.Substring(start, index - start).Replace("\"", ""));
    }
    return result;
}

I have been maintaining an open source project called FlatFiles for several years now. It's available for .NET Core and .NET 4.5.1.
Unlike most of the alternatives, it allows you to define a schema (similar to the way EF code-first works) with an extreme level of precision, so you aren't fighting conversion issues all the time. You can map directly to your data classes, and there is also support for interfacing with the older ADO.NET classes.
Performance-wise, it's been tuned to be one of the fastest parsers for .NET, with a plethora of options for quirky format differences. There's also support for fixed-length files, if you need it.
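
As a rough sketch of the schema-first style described above (based on my reading of the FlatFiles README; the type and method names here are assumptions and may differ between versions):

// Hypothetical POCO plus a type mapper defining the schema
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

var mapper = SeparatedValueTypeMapper.Define<Person>();
mapper.Property(p => p.Id).ColumnName("id");
mapper.Property(p => p.Name).ColumnName("name");

using (var reader = new StreamReader("people.csv"))
{
    var options = new SeparatedValueOptions { IsFirstRecordSchema = true };
    foreach (Person person in mapper.Read(reader, options))
    {
        // each record arrives already converted to a Person
    }
}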

You can use this library: Sky.Data.Csv
https://www.nuget.org/packages/Sky.Data.Csv/
It's a really fast CSV reader library and it's really easy to use:

using Sky.Data.Csv;

var readerSettings = new CsvReaderSettings { Encoding = Encoding.UTF8 };
using (var reader = CsvReader.Create("path-to-file", readerSettings))
{
    foreach (var row in reader)
    {
        // do something with the data
    }
}

It also supports reading typed objects with the CsvReader<T> class, which has the same interface.

Related

C# Linq doesn't recognize Czech characters while reading from .csv file

Basically, when trying to get a .csv file into a list using LINQ, all characters with diacritics turn into the <?> character. What should I do to make the code keep them as they are in the .csv file?
using (StreamReader ctec = new StreamReader(souborovejmeno))
{
    var lines = File.ReadAllLines(souborovejmeno).Select(a => a.Split('\t'));
    var csv = from line in lines
              select (from piece in line
                      select piece).ToList();
    foreach (var c in csv)
    {
        hraci.Add(new Hrac(c[0], c[1]));
        listBox1.Items.Add(c[0]);
    }
}
Thanks in advance for answers. Sorry if this is quite dumb, I am not too experienced in coding.

I think your problem is a missing encoding. I see you already have an answer above that works:
var lines = File.ReadAllLines(path, Encoding.UTF8).Select(a => a.Split('\t'));
But I strongly recommend using CsvHelper:

dotnet add package CsvHelper

And use something like this:
public class Record
{
    [Index(0)]
    public int Key { get; set; }

    [Index(1)]
    public string Value { get; set; }
}

....

using (var reader = new StreamReader(souborovejmeno, Encoding.UTF8))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    var records = csv.GetRecords<Record>();
    foreach (var record in records)
    {
        hraci.Add(new Hrac(record.Key, record.Value));
        listBox1.Items.Add(record.Key);
    }
}
...
Try including the encoding, like this:

var lines = File.ReadAllLines(path, Encoding.UTF8).Select(a => a.Split('\t'));

Make sure to import System.Text.
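
That is, add the namespace directive at the top of the file:

using System.Text;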

How would I convert data in a .txt file into XML? (C#)

I have thousands of lines of data in a text file that I want to make easily searchable by turning it into something easier to search (I am hoping for XML or another type of large data structure, though I am not sure which will be the best for what I have in mind).
The data looks like this for each line:
Book 31, Thomas,George, 32, 34, 154
(each book is not unique, they are indexes so book will have several different entries of whom is listed in it, and the numbers are the page they are listed)
So I am kind of lost on how to do this. I would want to read the .txt file and trim out all the spaces and commas; I basically get how to prep the data for it, but how would I programmatically make that many elements and values in XML, or populate some other large data structure?
If your csv file does not change too much and the structure is stable, you could simply parse it to a list of objects at startup:

private class BookInfo {
    public string title { get; set; }
    public string person { get; set; }
    public List<int> pages { get; set; }
}

private List<BookInfo> allbooks = new List<BookInfo>();

public void parse() {
    var lines = File.ReadAllLines(filename); // you could also read the file line by line here to avoid reading the complete file into memory
    foreach (var l in lines) {
        var info = l.Split(',').Select(x => x.Trim()).ToArray();
        var b = new BookInfo {
            title = info[0],
            person = info[1] + ", " + info[2],
            pages = info.Skip(3).Select(x => int.Parse(x)).ToList()
        };
        allbooks.Add(b);
    }
}
Then you can easily search the allbooks list, for instance with LINQ.
EDIT
Now, that you have clarified your input, I adapted the parsing a little bit to better fit your needs.
If you want to search your booklist by either the title or the person more easily, you can also create a lookup on each of the properties
var titleLookup = allbooks.ToLookup(x=> x.title);
var personLookup = allbooks.ToLookup(x => x.person);
So personLookup["Thomas, George"] will give you a list of all bookinfos that mention "Thomas, George", and titleLookup["Book 31"] will give you a list of all bookinfos for "Book 31", i.e. all persons mentioned in that book.
If you want to make the CSV file easily searchable by turning it into something easier to search, you can convert it to a DataTable.
If you want to query the data, you can use LINQ to XML (see the sketch after the sample output below).
The following class generates either a DataTable or XML data format. You can pass a delimiter and includeHeader, or use the defaults:
class CsvUtility
{
    public DataTable Csv2DataTable(string fileName, bool includeHeader = false, char separator = ',')
    {
        IEnumerable<string> reader = File.ReadAllLines(fileName);
        var data = new DataTable("Table");
        var headers = reader.First().Split(separator);
        if (includeHeader)
        {
            foreach (var header in headers)
            {
                data.Columns.Add(header.Trim());
            }
            reader = reader.Skip(1);
        }
        else
        {
            for (int index = 0; index < headers.Length; index++)
            {
                var header = "Field" + index; // headers[index];
                data.Columns.Add(header);
            }
        }
        foreach (var row in reader)
        {
            if (row != null) data.Rows.Add(row.Split(separator));
        }
        return data;
    }

    public string Csv2Xml(string fileName, bool includeHeader = false, char separator = ',')
    {
        var dt = Csv2DataTable(fileName, includeHeader, separator);
        var stream = new StringWriter();
        dt.WriteXml(stream);
        return stream.ToString();
    }
}
Example of how to use it:

CsvUtility csv = new CsvUtility();
var dt = csv.Csv2DataTable("f1.txt");

// Search for a string in any column
DataRow[] filteredRows = dt.Select("Field1 LIKE '%" + "Thomas" + "%'");

// Search in a certain field
var filtered = dt.AsEnumerable().Where(r => r.Field<string>("Field1").Contains("Thomas"));

// Generate XML
var xml = csv.Csv2Xml("f1.txt");
Console.WriteLine(xml);
/*
output of xml for your sample:

<DocumentElement>
  <Table>
    <Field0>Book 31</Field0>
    <Field1> Thomas</Field1>
    <Field2>George</Field2>
    <Field3> 32</Field3>
    <Field4> 34</Field4>
    <Field5> 154</Field5>
  </Table>
</DocumentElement>
*/
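
And since the question mentions searching, here is a small LINQ to XML sketch over the generated output above (note the leading spaces in the field values, an artifact of splitting on the bare comma, hence the Trim):

using System;
using System.Linq;
using System.Xml.Linq;

var doc = XDocument.Parse(xml); // the xml string produced by Csv2Xml above
var matches = from t in doc.Descendants("Table")
              where ((string)t.Element("Field1") ?? "").Trim() == "Thomas"
              select (string)t.Element("Field0");
foreach (var title in matches)
    Console.WriteLine(title); // prints "Book 31" for the sample row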

Enforce LF line endings with CsvHelper

If I have some LF-converted (using N++) CSV files, every time I write data to them using JoshClose's CsvHelper the line endings are back to CRLF.
Since I'm having problems with CRLF row terminators in SQL Server, I wish to keep the line endings as in the initial state of the file.
Since I couldn't find it in the culture settings, I'm compiling my own version of the library.
How should I proceed?
Missing or incorrect newline characters when using CsvHelper is a common problem with a simple but poorly documented solution. The other answers to this SO question are correct but are missing one important detail.
Configuration allows you to choose from one of four available alternatives:

// Pick one of these alternatives
CsvWriter.Configuration.NewLine = NewLine.CR;
CsvWriter.Configuration.NewLine = NewLine.LF;
CsvWriter.Configuration.NewLine = NewLine.CRLF;
CsvWriter.Configuration.NewLine = NewLine.Environment;

However, many people are tripped up by the fact that (by design) CsvWriter does not emit any newline character when you write the header using CsvWriter.WriteHeader(), nor when you write a single record using CsvWriter.WriteRecord(). The reason is so that you can write additional header elements or additional record elements, as you might do when your header and row data come from two or more classes rather than from a single class.
CsvWriter does emit the defined type of newline when you call CsvWriter.NextRecord(), and the author, JoshClose, states that you are supposed to call NextRecord() after you are done with the header and after you are done with each individual row added using WriteRecord. See GitHub Issues List 929.
When you are writing multiple records using WriteRecords(), CsvWriter automatically emits the defined type of newline at the end of each record.
In my opinion this ought to be much better documented, but there it is.
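
Putting that together, the pattern looks like this (a sketch assuming csv is an already-configured CsvWriter and Foo is your record class):

csv.WriteHeader<Foo>();
csv.NextRecord(); // the header gets its newline here
foreach (var record in records)
{
    csv.WriteRecord(record);
    csv.NextRecord(); // each row gets its newline here
}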
From what I can tell, the line terminator isn't controlled by CsvHelper. I've gotten it to work by adjusting the file writer I pass to CsvWriter:

TextWriter tw = File.CreateText(filepathname);
tw.NewLine = "\n";
CsvWriter csvw = new CsvWriter(tw);
csvw.WriteRecords(records);
csvw.Dispose();
Might be useful for somebody:

public static void AppendToCsv(ShopDataModel shopRecord)
{
    using (var writer = new StreamWriter(DestinationFile, true))
    {
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecord(shopRecord);
            writer.Write("\n");
        }
    }
}
As of CsvHelper 13.0.0, line endings are now configurable via the NewLine configuration property.
E.g.:

using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;

void Main()
{
    using (var writer = new StreamWriter(@"my-file.csv"))
    {
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            csv.Configuration.HasHeaderRecord = false;
            csv.Configuration.NewLine = NewLine.LF; // <<####################

            var records = new List<Foo>
            {
                new Foo { Id = 1, Name = "one" },
                new Foo { Id = 2, Name = "two" },
            };
            csv.WriteRecords(records);
        }
    }
}

private class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

How to remove characters from an excel sheet?

My overall problem is that I have a large Excel file (columns A-S, 85,000 rows) that I want to convert to XML. The data in the cells is all text.
The process I'm using now is to manually save the Excel file as CSV, then parse that in my own C# program to turn it into XML. If you have better recommendations, please share them. I've searched SO, and the only fast methods I found for converting straight to XML require my data to be all numeric.
(I tried reading cell by cell; it would have taken 3 days to process.)
So, unless you can recommend a different way for me to approach the problem, I want to be able to programmatically remove all commas, <, >, ', and " from the Excel sheet.
There are many options to read/edit/create Excel files:
MS provides the free OpenXML SDK V 2.0 - see http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx (XLSX only)
This can read+write MS Office files (including Excel).
Another free option: see http://www.codeproject.com/KB/office/OpenXML.aspx (XLSX only)
IF you need more, like handling older Excel versions (like XLS, not only XLSX), rendering, creating PDFs, formulas etc., then there are different free and commercial libraries like ClosedXML (free, XLSX only), EPPlus (free, XLSX only), Aspose.Cells, SpreadsheetGear, LibXL and Flexcel etc.
Another option is Interop, which requires Excel to be installed locally, BUT Interop is not supported in server scenarios by MS.
Any library-based approach to dealing with the Excel file directly is way faster than Interop in my experience; see the sketch below.
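
To give a flavor of the library route, a minimal OpenXML SDK read sketch (assumes an XLSX input named Book1.xlsx; string cells are stored in the shared-string table, hence the lookup):

using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (var doc = SpreadsheetDocument.Open("Book1.xlsx", false))
{
    WorksheetPart wsPart = doc.WorkbookPart.WorksheetParts.First();
    SharedStringTable sst = doc.WorkbookPart.SharedStringTablePart.SharedStringTable;
    foreach (Row row in wsPart.Worksheet.Descendants<Row>())
    {
        foreach (Cell cell in row.Elements<Cell>())
        {
            string value = cell.CellValue == null ? "" : cell.CellValue.InnerText;
            // shared strings hold an index into the shared-string table
            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                value = sst.ElementAt(int.Parse(value)).InnerText;
            // sanitize/collect value here
        }
    }
}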
I would use a combination of Microsoft.Office.Interop.Excel and XmlSerializer to get the job done.
This is in light of the fact that a) you're using a console application, and b) the interop assemblies are easy to integrate into the solution (just References->Add).
I'm assuming that you have a copy of Excel installed on the machine running the process (you mentioned you manually open the workbook currently, hence the assumption).
The code would look something like this:
The serializable layer:

public class TestClass
{
    public List<TestLineItem> LineItems { get; set; }

    public TestClass()
    {
        LineItems = new List<TestLineItem>();
    }
}

public class TestLineItem
{
    private string SanitizeText(string input)
    {
        return input.Replace(",", "")
                    .Replace(".", "")
                    .Replace("<", "")
                    .Replace(">", "")
                    .Replace("'", "")
                    .Replace("\"", "");
    }

    private string m_field1;
    private string m_field2;

    public string Field1
    {
        get { return m_field1; }
        set { m_field1 = SanitizeText(value); }
    }

    public string Field2
    {
        get { return m_field2; }
        set { m_field2 = SanitizeText(value); }
    }

    public decimal Field3 { get; set; }

    public TestLineItem() { }

    public TestLineItem(object field1, object field2, object field3)
    {
        // assign through the properties so the values get sanitized
        Field1 = (field1 ?? "").ToString();
        Field2 = (field2 ?? "").ToString();
        if (field3 == null || field3.ToString() == "")
            Field3 = 0m;
        else
            Field3 = Convert.ToDecimal(field3.ToString());
    }
}
Then open the worksheet and load it into a 2D array:

// using OExcel = Microsoft.Office.Interop.Excel;
var app = new OExcel.Application();
var wbPath = Path.Combine(
    Environment.GetFolderPath(
        Environment.SpecialFolder.MyDocuments), "Book1.xls");
var wb = app.Workbooks.Open(wbPath);
var ws = (OExcel.Worksheet)wb.ActiveSheet;

// there are better ways to do this...
// this one's just off the top of my head
var rngTopLine = ws.get_Range("A1", "C1");
var rngEndLine = rngTopLine.get_End(OExcel.XlDirection.xlDown);
var rngData = ws.get_Range(rngTopLine, rngEndLine);
var arrayData = (object[,])rngData.Value2;

var tc = new TestClass();

// since you're enumerating an array, the operation will run much faster
// than reading the worksheet line by line.
for (int i = arrayData.GetLowerBound(0); i <= arrayData.GetUpperBound(0); i++)
{
    tc.LineItems.Add(
        new TestLineItem(arrayData[i, 1], arrayData[i, 2], arrayData[i, 3]));
}

var xs = new XmlSerializer(typeof(TestClass));
var fs = File.Create(Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
    "Book1.xml"));
xs.Serialize(fs, tc);

wb.Close();
app.Quit();
The generated XML output will look something like this:

<TestClass>
  <LineItems>
    <TestLineItem>
      <Field1>test1</Field1>
      <Field2>some&lt;encoded&gt; stuff here</Field2>
      <Field3>123456.789</Field3>
    </TestLineItem>
    <TestLineItem>
      <Field1>test2</Field1>
      <Field2>testing some commas, and periods.</Field2>
      <Field3>23456789.12</Field3>
    </TestLineItem>
    <TestLineItem>
      <Field1>test3</Field1>
      <Field2>text in &quot;quotes&quot; and &#39;single quotes&#39;</Field2>
      <Field3>0</Field3>
    </TestLineItem>
  </LineItems>
</TestClass>

LINQ for beginners

I love C#, I love the framework, and I also love to learn as much as possible. Today I began to read articles about LINQ in C# and I couldn't find anything good for a beginner who has never worked with SQL in his life.
I found this article very helpful and I understood small parts of it, but I'd like to get more examples.
After reading it a couple of times, I tried to use LINQ in a function of mine, but I failed.
private void Filter(string filename)
{
    using (TextWriter writer = File.CreateText(Application.StartupPath + "\\temp\\test.txt"))
    {
        using (TextReader reader = File.OpenText(filename))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] items = line.Split('\t');
                int myInteger = int.Parse(items[1]);
                if (myInteger == 24809) writer.WriteLine(line);
            }
        }
    }
}
This is what I did and it did not work, the result was always false.
private void Filter(string filename)
{
    using (TextWriter writer = File.CreateText(Application.StartupPath + "\\temp\\test.txt"))
    {
        using (TextReader reader = File.OpenText(filename))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] items = line.Split('\t');
                var Linqi = from item in items
                            where int.Parse(items[1]) == 24809
                            select true;
                if (Linqi == true) writer.WriteLine(line);
            }
        }
    }
}
I'm asking for two things:
How would the function look using as much LINQ as possible?
A website/book/article about LINQ - but please note I'm a complete beginner in SQL/LINQ.
Thank you in advance!
Well, one thing that would make your sample more "LINQy" is an IEnumerable<string> for reading lines from a file. Here's a somewhat simplified version of my LineReader class from MiscUtil:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

public sealed class LineReader : IEnumerable<string>
{
    readonly Func<TextReader> dataSource;

    public LineReader(string filename)
        : this(() => File.OpenText(filename))
    {
    }

    public LineReader(Func<TextReader> dataSource)
    {
        this.dataSource = dataSource;
    }

    public IEnumerator<string> GetEnumerator()
    {
        using (TextReader reader = dataSource())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Now you can use that:

var query = from line in new LineReader(filename)
            let items = line.Split('\t')
            let myInteger = int.Parse(items[1])
            where myInteger == 24809
            select line;
using (TextWriter writer = File.CreateText(Application.StartupPath
                                           + "\\temp\\test.txt"))
{
    foreach (string line in query)
    {
        writer.WriteLine(line);
    }
}
Note that it would probably be more efficient to not have the let clauses:

var query = from line in new LineReader(filename)
            where int.Parse(line.Split('\t')[1]) == 24809
            select line;

at which point you could reasonably do it all in "dot notation":

var query = new LineReader(filename)
    .Where(line => int.Parse(line.Split('\t')[1]) == 24809);
However, I far prefer the readability of the original query :)
101 LINQ Samples is certainly a good collection of examples. Also LINQPad might be a good way to play around with LINQ.
For a website as a starting point, you can try Hooked on LINQ
Edit:
Original site appears to be dead now (domain is for sale).
Here's the internet archive of the last version: https://web.archive.org/web/20140823041217/http://www.hookedonlinq.com/
If you're after a book, I found LINQ in Action from Manning Publications a good place to start.
MSDN LINQ Examples: http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
I got a lot out of the following sites when I started:
http://msdn.microsoft.com/en-us/library/bb425822.aspx
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
To answer the first question, there frankly isn't much reason to use LINQ the way you suggest in the above function, except as an exercise. In fact, it probably just makes the function harder to read.
LINQ is more useful for operating on a collection than on a single element, and I would use it that way instead. So, here's my attempt at using as much LINQ as possible in the function (making no claims about efficiency, and I don't suggest reading the whole file into memory like this):
private void Filter(string filename)
{
    using (TextWriter writer = File.CreateText(Application.StartupPath + "\\temp\\test.txt"))
    {
        using (TextReader reader = File.OpenText(filename))
        {
            var lines = new List<string>();
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);

            var query = from l in lines
                        let splitLine = l.Split('\t')
                        where int.Parse(splitLine.Skip(1).First()) == 24809
                        select l;

            foreach (var l in query)
                writer.WriteLine(l);
        }
    }
}
First, I would introduce this method:

private IEnumerable<string> ReadLines(StreamReader reader)
{
    while (!reader.EndOfStream)
    {
        yield return reader.ReadLine();
    }
}
Then, I would refactor the main method to use it. I put both using statements above the same block, and also added a range check to ensure items[1] doesn't fail:

private void Filter(string fileName)
{
    using (var writer = File.CreateText(Application.StartupPath + "\\temp\\test.txt"))
    using (var reader = File.OpenText(fileName))
    {
        var myIntegers =
            from line in ReadLines(reader)
            let items = line.Split('\t')
            where items.Length > 1
            let myInteger = Int32.Parse(items[1])
            where myInteger == 24809
            select myInteger;

        foreach (var myInteger in myIntegers)
        {
            writer.WriteLine(myInteger);
        }
    }
}
I found this article extremely helpful for understanding LINQ, which builds upon so many of the new constructs introduced in .NET 3.0 and 3.5.
I'll warn you, it's a long read, but if you really want to understand what LINQ is and does, I believe it is essential:
http://blogs.msdn.com/ericwhite/pages/FP-Tutorial.aspx
Happy reading!
If I were to rewrite your filter function using LINQ where possible, it'd look like this:

private void Filter(string filename)
{
    using (TextWriter writer = File.CreateText(Application.StartupPath + "\\temp\\test.txt"))
    {
        var lines = File.ReadAllLines(filename);
        var matches = from line in lines
                      let items = line.Split('\t')
                      let myInteger = int.Parse(items[1])
                      where myInteger == 24809
                      select line;
        foreach (var match in matches)
        {
            writer.WriteLine(match);
        }
    }
}
As for Linq books, I would recommend:
(book cover images; originally hosted at ebookpdf.net and http://www.diesel-ebooks.com/mas_assets/full/0321564189.jpg)
Both are excellent books that drill into Linq in detail.
To add yet another variation to the as-much-linq-as-possible topic, here's my take:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace LinqDemo
{
    class Program
    {
        static void Main()
        {
            var baseDir = AppDomain.CurrentDomain.BaseDirectory;
            File.WriteAllLines(
                Path.Combine(baseDir, "out.txt"),
                File.ReadAllLines(Path.Combine(baseDir, "in.txt"))
                    .Select(line => new KeyValuePair<string, string[]>(line, line.Split(','))) // split each line into columns, also carry the original line forward
                    .Where(info => info.Value.Length > 1) // filter out lines that don't have a 2nd column
                    .Select(info => new KeyValuePair<string, int>(info.Key, int.Parse(info.Value[1]))) // convert 2nd column to int, still carrying the original line forward
                    .Where(info => info.Value == 24809) // apply the filtering criteria
                    .Select(info => info.Key) // restore original lines
                    .ToArray());
        }
    }
}
Note that I changed your tab-delimited-columns to comma-delimited columns (easier to author in my editor that converts tabs to spaces ;-) ). When this program is run against an input file:
A1,2
B,24809,C
C
E
G,24809
The output will be:
B,24809,C
G,24809
You could improve memory requirements of this solution by replacing "File.ReadAllLines" and "File.WriteAllLines" with Jon Skeet's LineReader (and LineWriter in a similar vein, taking IEnumerable and writing each returned item to the output file as a new line). This would transform the solution above from "get all lines into memory as an array, filter them down, create another array in memory for result and write this result to output file" to "read lines from input file one by one, and if that line meets our criteria, write it to output file immediately" (pipeline approach).
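
That pipeline shape is also available without the custom classes: File.ReadLines (.NET 4+) is lazy, and File.WriteAllLines has an overload accepting an IEnumerable<string>, so the lines stream straight through. A sketch, reusing baseDir from the snippet above:

using System.IO;
using System.Linq;

File.WriteAllLines(
    Path.Combine(baseDir, "out.txt"),
    File.ReadLines(Path.Combine(baseDir, "in.txt"))
        .Select(line => new { Line = line, Columns = line.Split(',') })
        .Where(x => x.Columns.Length > 1 && int.Parse(x.Columns[1]) == 24809)
        .Select(x => x.Line)); // lines are read, filtered and written one at a time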
You cannot just check whether Linqi is true... Linqi is an IEnumerable<bool> (in this case), so you have to check it like Linqi.First() == true.
Here is a small example:

string[] items = { "12121", "2222", "24809", "23445", "24809" };
var Linqi = from item in items
            where Convert.ToInt32(item) == 24809
            select true;
if (Linqi.First() == true) Console.WriteLine("Got a true");
You could also iterate over Linqi, and in my example there are 2 items in the collection. (Note that First() throws on an empty sequence, so Linqi.Any() is the safer test when there might be no match at all.)
