Parsing a CSV formatted text file - c#

I have a text file that looks like this:
1,Smith, 249.24, 6/10/2010
2,Johnson, 1332.23, 6/11/2010
3,Woods, 2214.22, 6/11/2010
1,Smith, 219.24, 6/11/2010
I need to be able to find the balance for a client on a given date.
I'm wondering if I should:
A. Start from the end and read each line into an Array, one at a time.
Check the last name index to see if it is the client we're looking for.
Then, display the balance index of the first match.
or
B. Use RegEx to find a match and display it.
I don't have much experience with RegEx, but I'll learn it if it's a no-brainer in a situation like this.

I would recommend using the FileHelpers open-source project:
http://www.filehelpers.net/
Piece of cake:
Define your class:
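using System;
using FileHelpers;
[DelimitedRecord(",")]
public class Customer
{
    public int CustId;
    [FieldTrim(TrimMode.Both)]   // the sample data has a space after each comma
    public string Name;
    [FieldTrim(TrimMode.Both)]
    public decimal Balance;
    // The sample dates are month/day/year (6/10/2010), not dd-MM-yyyy
    [FieldTrim(TrimMode.Both)]
    [FieldConverter(ConverterKind.Date, "M/d/yyyy")]
    public DateTime AddedDate;
}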
Use it:
var engine = new FileHelperAsyncEngine<Customer>();
// Read
using (engine.BeginReadFile("TestIn.txt"))
{
    // The engine is IEnumerable, so records are read one at a time
    foreach (Customer cust in engine)
    {
        // your code here
        Console.WriteLine(cust.Name);
        // your condition here, e.g. check cust.Name and cust.AddedDate, then use cust.Balance
    }
}

This looks like a pretty standard CSV layout, which is easy enough to process. You could actually do it with ADO.NET and the Jet provider, but I think it is probably easier in the long run to process it yourself.
So first off, you want to process the actual text data. It seems reasonable to assume each record is separated by a newline character, so you can use the ReadLine method to easily get each record:
using (StreamReader reader = new StreamReader(@"C:\Path\To\file.txt"))
{
    string line;
    // ReadLine returns null at end of file; blank lines are skipped instead of ending the loop
    while ((line = reader.ReadLine()) != null)
    {
        if (string.IsNullOrWhiteSpace(line))
            continue;
        // Process Line
    }
}
Then, to process each line, you can split the string on commas and store the values in a data structure, such as this one:
public class MyData
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}
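You can then process the line data with a method like this:
public MyData GetRecord(string line)
{
    var fields = line.Split(',');
    // Trim the space after each comma and parse with the invariant culture (using System.Globalization),
    // so "249.24" and "6/10/2010" don't depend on the machine's regional settings
    return new MyData()
    {
        Id = int.Parse(fields[0].Trim()),
        Name = fields[1].Trim(),
        Balance = decimal.Parse(fields[2].Trim(), CultureInfo.InvariantCulture),
        Date = DateTime.ParseExact(fields[3].Trim(), "M/d/yyyy", CultureInfo.InvariantCulture)
    };
}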
Now, this is the simplest example, and doesn't account for cases where the fields may be empty, in which case you would either need to support NULL for those fields (using nullable types int?, decimal? and DateTime?), or define some default value that would be assigned to those values.
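For the nullable route, here is a minimal sketch. It assumes the four-column layout from the question, treats an empty or unparsable field as null, and parses with the invariant culture and an assumed M/d/yyyy date format. ParseNullable is a hypothetical helper, and a named tuple stands in for MyData so the snippet runs on its own.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parses one line into a tuple whose numeric and date fields are nullable,
// so empty or unparsable fields become null instead of throwing.
static (int? Id, string Name, decimal? Balance, DateTime? Date) ParseNullable(string line)
{
    string[] f = line.Split(',');
    // Missing trailing fields are treated as empty, so short lines don't throw.
    string Field(int index) => index < f.Length ? f[index].Trim() : "";

    int? id = int.TryParse(Field(0), NumberStyles.Integer, CultureInfo.InvariantCulture, out int i)
        ? i : (int?)null;
    decimal? balance = decimal.TryParse(Field(2), NumberStyles.Number, CultureInfo.InvariantCulture, out decimal b)
        ? b : (decimal?)null;
    // Assumes month/day/year, as in the sample data.
    DateTime? date = DateTime.TryParseExact(Field(3), "M/d/yyyy", CultureInfo.InvariantCulture,
                                            DateTimeStyles.None, out DateTime d)
        ? d : (DateTime?)null;

    return (id, Field(1), balance, date);
}

var full = ParseNullable("1,Smith, 249.24, 6/10/2010");
var partial = ParseNullable("2,Johnson,,");
Console.WriteLine($"{full.Name}: {full.Balance}, {full.Date:d}");
Console.WriteLine(partial.Balance.HasValue ? "has balance" : "no balance");
```

A line like "2,Johnson,," then yields a record whose Balance and Date are null, rather than a FormatException.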
So once you have that, you can store the collection of MyData objects in a list and easily perform calculations based on it. So given your example of finding the balance on a given date, you could do something like:
var data = customerDataList.First(d => d.Name == customerNameImLookingFor
                                       && d.Date == dateImLookingFor);
Here customerDataList is the collection of MyData objects read from the file, customerNameImLookingFor is a variable containing the customer's name, and dateImLookingFor is a variable containing the date. Note that First throws an InvalidOperationException if nothing matches; use FirstOrDefault if a missing record is a normal case.
I've used this technique in the past to process text files ranging from a couple of records to tens of thousands of records, and it works pretty well.
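Putting the pieces together, a runnable sketch might look like this. The sample rows are written to a temporary file only so the example is self-contained, an anonymous type stands in for MyData, and FirstOrDefault replaces First so that a missing client/date gives null instead of an exception. The M/d/yyyy date format is an assumption based on the sample data.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Sample data from the question, written to a temporary file so the sketch runs on its own.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[]
{
    "1,Smith, 249.24, 6/10/2010",
    "2,Johnson, 1332.23, 6/11/2010",
    "3,Woods, 2214.22, 6/11/2010",
    "1,Smith, 219.24, 6/11/2010",
});

// Read every non-blank line and split it into typed fields (same idea as GetRecord).
var customerDataList = File.ReadLines(path)
    .Where(line => !string.IsNullOrWhiteSpace(line))
    .Select(line => line.Split(','))
    .Select(f => new
    {
        Id = int.Parse(f[0].Trim(), CultureInfo.InvariantCulture),
        Name = f[1].Trim(),
        Balance = decimal.Parse(f[2].Trim(), CultureInfo.InvariantCulture),
        Date = DateTime.ParseExact(f[3].Trim(), "M/d/yyyy", CultureInfo.InvariantCulture),
    })
    .ToList(); // materialize before deleting the file

string customerNameImLookingFor = "Smith";
DateTime dateImLookingFor = new DateTime(2010, 6, 11);

// FirstOrDefault returns null instead of throwing when nothing matches.
var match = customerDataList.FirstOrDefault(d => d.Name == customerNameImLookingFor
                                                 && d.Date == dateImLookingFor);
Console.WriteLine(match?.Balance);

File.Delete(path);
```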

I think the cleanest way is to load the entire file into an array of custom objects and work with that. For 3 MB of data, this won't be a problem. If you wanted to do a completely different search later, you could reuse most of the code. I would do it this way:
class Record
{
    public int Id { get; protected set; }
    public string Name { get; protected set; }
    public decimal Balance { get; protected set; }
    public DateTime Date { get; protected set; }
    public Record(int id, string name, decimal balance, DateTime date)
    {
        Id = id;
        Name = name;
        Balance = balance;
        Date = date;
    }
}
…
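// The query must be parenthesized so ToArray applies to it, not to the Record
Record[] records = (from line in File.ReadAllLines(filename)
                    let fields = line.Split(',')
                    select new Record(
                        int.Parse(fields[0].Trim()),
                        fields[1].Trim(),
                        decimal.Parse(fields[2].Trim(), CultureInfo.InvariantCulture),
                        DateTime.ParseExact(fields[3].Trim(), "M/d/yyyy", CultureInfo.InvariantCulture)
                    )).ToArray();
Record wantedRecord = records.Single(
    r => r.Name == clientName && r.Date == givenDate);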

Note that both your options will scan the file. That is fine if you only want to search in the file for 1 item.
If you need to search for multiple client/date combinations in the same file, you could parse the file into a Dictionary<string, Dictionary<DateTime, decimal>> first.
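A sketch of that nested-dictionary index, assuming the rows are already in memory, that names should match case-insensitively, and that a later row for the same client and date wins (all assumptions). BalanceOn is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sample rows from the question; in practice these would come from File.ReadLines.
string[] lines =
{
    "1,Smith, 249.24, 6/10/2010",
    "2,Johnson, 1332.23, 6/11/2010",
    "3,Woods, 2214.22, 6/11/2010",
    "1,Smith, 219.24, 6/11/2010",
};

// client name -> (date -> balance); names compared case-insensitively (an assumption).
var index = new Dictionary<string, Dictionary<DateTime, decimal>>(StringComparer.OrdinalIgnoreCase);

foreach (string line in lines)
{
    string[] f = line.Split(',');
    string name = f[1].Trim();
    decimal balance = decimal.Parse(f[2].Trim(), CultureInfo.InvariantCulture);
    DateTime date = DateTime.ParseExact(f[3].Trim(), "M/d/yyyy", CultureInfo.InvariantCulture);

    if (!index.TryGetValue(name, out var byDate))
    {
        byDate = new Dictionary<DateTime, decimal>();
        index[name] = byDate;
    }
    byDate[date] = balance; // a later row for the same client and date overwrites an earlier one
}

// Hypothetical helper: two hash lookups per query; null when the client or date is unknown.
decimal? BalanceOn(string client, DateTime day) =>
    index.TryGetValue(client, out var dates) && dates.TryGetValue(day, out var amount)
        ? amount
        : (decimal?)null;

Console.WriteLine(BalanceOn("Smith", new DateTime(2010, 6, 11)));
```

Building the index costs one pass over the file; each later lookup is two dictionary lookups instead of another scan.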
A direct answer: for a one-off, a RegEx will probably be faster.
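A sketch of the one-off regex search, assuming the whole file has been read into a string (an inline sample here) and that the client name and date are matched as literal text. FindBalance is a hypothetical helper; Regex.Escape keeps a name containing characters like '.' or '(' from changing the pattern.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Stand-in for the file contents (e.g. File.ReadAllText(path)).
string text =
    "1,Smith, 249.24, 6/10/2010\n" +
    "2,Johnson, 1332.23, 6/11/2010\n" +
    "3,Woods, 2214.22, 6/11/2010\n" +
    "1,Smith, 219.24, 6/11/2010\n";

// Hypothetical helper: one regex scan for "<id>,<client>, <balance>, <date>" on a single line.
static decimal? FindBalance(string input, string client, string date)
{
    // (?m) lets ^ and $ match at line boundaries; Regex.Escape treats the name and date as literal text.
    string pattern = @"(?m)^\s*\d+\s*,\s*" + Regex.Escape(client) +
                     @"\s*,\s*(?<balance>\d+(?:\.\d+)?)\s*,\s*" + Regex.Escape(date) + @"\s*$";
    Match m = Regex.Match(input, pattern);
    return m.Success
        ? decimal.Parse(m.Groups["balance"].Value, CultureInfo.InvariantCulture)
        : (decimal?)null;
}

Console.WriteLine(FindBalance(text, "Smith", "6/11/2010"));
```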

If you're just reading it, I'd consider reading the whole file into memory using StreamReader.ReadToEnd and treating it as one long string to search through. When you find a record you want to look at, just look for the previous and next line breaks, and you have the transaction row you want.
If the file is on a server or is refreshed all the time, though, this might not be a good solution.
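A sketch of that approach, assuming '\n' line endings (a trailing '\r' is trimmed) and a search string specific enough to hit the right row, such as ",Woods," rather than "Woods". The sample text and the LineContaining helper are hypothetical; a StreamReader over an in-memory stream stands in for reading the real file.

```csharp
using System;
using System.IO;
using System.Text;

// Stand-in for new StreamReader(path): a StreamReader over an in-memory copy of the sample data.
string sample = "1,Smith, 249.24, 6/10/2010\n2,Johnson, 1332.23, 6/11/2010\n3,Woods, 2214.22, 6/11/2010\n";
string all;
using (var reader = new StreamReader(new MemoryStream(Encoding.UTF8.GetBytes(sample))))
{
    all = reader.ReadToEnd();
}

// Hypothetical helper: returns the whole line around the first hit, or null if there is none.
static string LineContaining(string text, string needle)
{
    int hit = text.IndexOf(needle, StringComparison.Ordinal);
    if (hit < 0) return null;

    // Previous line break (or start of text) ...
    int start = text.LastIndexOf('\n', hit) + 1;
    // ... and next line break (or end of text).
    int end = text.IndexOf('\n', hit);
    if (end < 0) end = text.Length;

    return text.Substring(start, end - start).TrimEnd('\r'); // tolerate \r\n line endings
}

Console.WriteLine(LineContaining(all, ",Woods,"));
```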

If it's all well-formatted CSV like this, then I'd use something like the Microsoft.VisualBasic.FileIO.TextFieldParser class or the Fast CSV reader over on CodeProject to read it all in.
The data type is a little tricky because I imagine not every client has a record for every day. That means you can't just have a nested dictionary for your lookups. Instead, you want to "index" by name first and then by date, but the form of the date record is a little different. I think I'd go for something like this as I read in each record:
Dictionary<string, SortedList<DateTime, double>>
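One plausible reading of that structure, sketched below: SortedList keeps each client's dates in order, so when a client has no row for the exact day, a binary search over the sorted keys finds the most recent balance on or before it. AddRecord and BalanceAsOf are hypothetical names, and the on-or-before rule is an assumption about what "balance on a given date" means.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// client name -> balances sorted by date (double, as in the suggested type).
var balances = new Dictionary<string, SortedList<DateTime, double>>();

// Hypothetical helper called once per record as the file is read.
void AddRecord(string name, DateTime date, double balance)
{
    if (!balances.TryGetValue(name, out var byDate))
    {
        byDate = new SortedList<DateTime, double>();
        balances[name] = byDate;
    }
    byDate[date] = balance;
}

// Hypothetical lookup: the latest balance recorded on or before `date`, or null if there is none.
double? BalanceAsOf(string name, DateTime date)
{
    if (!balances.TryGetValue(name, out var byDate)) return null;

    IList<DateTime> keys = byDate.Keys; // SortedList keeps its keys in ascending order
    int lo = 0, hi = keys.Count - 1, found = -1;
    while (lo <= hi) // binary search for the last key <= date
    {
        int mid = lo + (hi - lo) / 2;
        if (keys[mid] <= date) { found = mid; lo = mid + 1; }
        else { hi = mid - 1; }
    }
    return found < 0 ? (double?)null : byDate.Values[found];
}

foreach (string line in new[] { "1,Smith, 249.24, 6/10/2010", "2,Johnson, 1332.23, 6/11/2010", "1,Smith, 219.24, 6/11/2010" })
{
    string[] f = line.Split(',');
    AddRecord(f[1].Trim(),
              DateTime.ParseExact(f[3].Trim(), "M/d/yyyy", CultureInfo.InvariantCulture),
              double.Parse(f[2].Trim(), CultureInfo.InvariantCulture));
}

Console.WriteLine(BalanceAsOf("Smith", new DateTime(2010, 6, 15)));
```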

Hey, hey, hey! Why not do it with this great project on CodeProject, LINQ to CSV? Way cool!
Rock solid.

Related

WinForms - Creating and populating a DataGridView with a List of unknown number of columns

Hello everyone
I really don't know if I'm going to be able to explain myself, but here we go:
I made a WinForms app that captures prices for the same products in different stores. The code is already structured so that I can add more stores.
The product class is something like this:
public enum EnumMercado { Extra = 1, Dia = 8, Carrefour = 9, BIG = 10, Pao = 11 };
public class Produto
{
    public EnumMercado Mercado { get; set; }
    public string IDProduto { get; set; }
    public string NomeProduto { get; set; }
    public bool Disponivel { get; set; }
    public decimal? Preco_de { get; set; }
    public decimal Preco_por { get; set; }
    public Bitmap ProductImage { get; set; }
    public bool Erro_Captura { get; set; }
    public String ErrorMessage { get; set; }
    public Produto()
    {
        Erro_Captura = false;
        ErrorMessage = string.Empty;
    }
}
And here is the class that I use to populate a single product search:
public class PesquisaGeral
{
    public PRODUTOS Produto { get; set; }
    public List<Produto> Cotacoes { get; set; }
    public PesquisaGeral()
    {
        Cotacoes = new List<Produto>();
    }
}
PRODUTOS is an Entity class (the product's ID and name in SQL).
Cotacoes is a List of Produto (one for each store this product is linked to).
To get the full products-by-prices data, I have a List of PesquisaGeral.
Now the question begins
I want to populate my DataGridView like this:
Headers:
[Product], [Name of the store 1], [Name of the store 2], [Name of the store 3]....[Quantity]
Values:
[Product 1] [2.66] [2.94] [1.98].....[editable text box]
I already made this work with a DataTable, creating the columns dynamically according to the number of stores, plus the product name and the quantity.
That's "OK".
Is there a way to accomplish this without using a DataTable?
Is it possible to dynamically create a List that counts the number of stores and creates that many named members (perhaps anonymously), laid out horizontally?
What I want to accomplish is something like this:
Just as a reminder: public enum EnumMercado { Extra = 1, Dia = 8, Carrefour = 9, BIG = 10, Pao = 11 };
var a = getmyCotacoes();
If I hover the mouse over the result while debugging, I want to see a "List of something" where each something shows up as (probably a list of anonymous objects):
Product:"Product 1"
Extra:2.1
Dia:2.15
Carrefour:3.7
BIG:2.1
Pao:2.25
Quantity:0
Today I have those stores, but the list will grow, and I don't want to change the method every time I add a store.
BTW, sorry about my English, I'm Brazilian.
Thanks in advance,
Rafael
Your comment about not wanting to use a DataTable doesn't make sense; DataTables and DataRows can be subclassed to create a custom class, just like your proposed List-based solution, so you get all the functionality of a DataTable plus whatever you want from your custom class. Visual Studio even has a built-in designer for creating custom DataTables and DataRows, so that a strongly typed data access layer can be built (usually for a database, but it doesn't have to be).
The comment about having to join the original search results doesn't make sense either; a DataTable is not your database table, and it can contain fewer or more columns than your database does. For example, your SELECT might be SELECT ID AS ProductId, Nome, Preco FROM products WHERE Nome LIKE 'Jamon%', so three columns, while your DataTable might have the columns ProductId, Nome, Quantidade, Preco and PrecoTotal, so five columns, and PrecoTotal could have an Expression of "[Preco] * [Quantidade]" so it is calculated automatically. The Quantidade column is not retrieved by the query, so the user fills it in; the total is only calculated once the quantity is filled in.
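A small sketch of that DataTable, using the column names from the example above; the sample row values are invented. The three-argument DataColumnCollection.Add overload takes the expression string, and the computed column stays DBNull until Quantidade has a value.

```csharp
using System;
using System.Data;

var table = new DataTable("Produtos");
table.Columns.Add("ProductId", typeof(int));   // from the query
table.Columns.Add("Nome", typeof(string));     // from the query
table.Columns.Add("Quantidade", typeof(int));  // not in the query; the user types it in the grid
table.Columns.Add("Preco", typeof(decimal));   // from the query
// Expression column: the DataTable computes it; it stays DBNull until Quantidade has a value.
table.Columns.Add("PrecoTotal", typeof(decimal), "[Preco] * [Quantidade]");

// Simulates loading one query row (values for ProductId, Nome, Quantidade, Preco).
DataRow row = table.Rows.Add(1, "Jamon Serrano", DBNull.Value, 12.50m);
Console.WriteLine(row["PrecoTotal"] is DBNull); // no quantity yet

row["Quantidade"] = 3;                          // what the user would do in the grid
Console.WriteLine(row["PrecoTotal"]);
```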
I think that when it comes to whether or not to use DataTables, it's more that you aren't quite using them correctly or don't fully understand how they work; I'm certain they can solve your problem, so we just need to nail down the actual problem.
Your actual question seems to be "how do I have a variable number of shops?" That's going to be pretty hard if you keep your shops in an enum (EnumMercado), because enums are compiled into the program: to add a new shop you'll have to release a new program. Instead, your shop should be a table in the database, like your product is. You should have another table linking shops to products, and the price of a particular product would be stored in this table, because even though every shop might sell milk, they all sell it at different prices. When you query your shops and prices, you essentially get back a list of rows for the same product with a bunch of different prices, but critically you get the prices back as rows: if five shops sell a product you get five rows, and if two shops sell it you get two rows. You're saying you want these as columns; we typically call this a pivot, where we turn a variable number of rows into a variable number of columns. Be careful when you do this, because it's not always wise. Columns are typically thought of as the attributes of a thing, and things don't typically have an infinitely variable number of attributes; modelling them as such can make them harder to work with.
In this case it's still possible: you have a column for the price in each shop, and you perform the pivot either in the database or in code, for example by having a DataTable with the product ID as the key and adding a new column every time you encounter a shop you don't know about. You'll end up with one row per product, with one price column per shop. If you have multiple products, you might have a lot of blanks if not every shop sells every product.
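A sketch of the in-code pivot, assuming the query returns one row per (product, shop, price), as a linking table would; the sample rows are invented and Pivot is a hypothetical helper. Each unseen shop becomes a new decimal column, the product name serves as the primary key so Rows.Find locates the existing row, and unsold combinations stay DBNull, which are the blanks mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical query result: one row per (product, shop) pair.
var prices = new List<(string Product, string Shop, decimal Price)>
{
    ("Leite", "Extra", 2.10m),
    ("Leite", "Dia", 2.15m),
    ("Leite", "Carrefour", 3.70m),
    ("Arroz", "Dia", 5.99m),
    ("Arroz", "Pao", 6.25m),
};

// Pivots rows into one row per product and one price column per shop.
static DataTable Pivot(IEnumerable<(string Product, string Shop, decimal Price)> source)
{
    var table = new DataTable();
    table.Columns.Add("Product", typeof(string));
    table.PrimaryKey = new[] { table.Columns["Product"] };

    foreach (var (product, shop, price) in source)
    {
        // A new shop simply becomes a new column; nothing is hard-coded per shop.
        if (!table.Columns.Contains(shop))
            table.Columns.Add(shop, typeof(decimal));

        // Find the product's row by key, or create it; other shop columns stay DBNull (the "blanks").
        DataRow row = table.Rows.Find(product) ?? table.Rows.Add(product);
        row[shop] = price;
    }

    table.Columns.Add("Quantity", typeof(int)); // left empty for the user to fill in
    return table;
}

DataTable grid = Pivot(prices);
// e.g. dataGridView1.DataSource = grid;  (dataGridView1 is a hypothetical control name)
Console.WriteLine($"{grid.Rows.Count} products, {grid.Columns.Count} columns");
```

The resulting DataTable can be bound directly to the grid, and a shop added to the database then needs no code change.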
Note that when I say shop I mean "nation-wide chain of shops": I recognize Carrefour in that Mercado list, and they have a lot of locations, and individual branches may even price some products differently. Perhaps the product linking should even be done at branch level (a building in a town) rather than at the group-of-shops level, to allow different locations to stock different products at different prices.
Your first problem is not a "DataTable can't do it" one, it's a data modelling one; you haven't quite got the data modelling right for your program yet.

LinqToExcel Duplicate Column Names

I have a machine-generated Excel file that has a few columns with the same name, e.g.:
A          B          C          D
Group 1               Group 2
Period     Name       Period     Name
And I have a DTO like this:
[ExcelColumn("Period")]
public string FirstPeriod { get; set; }
[ExcelColumn("Name")]
public string FirstName { get; set; }
[ExcelColumn("Period")]
public string SecondPeriod { get; set; }
[ExcelColumn("Name")]
public string SecondName { get; set; }
I use the following code to read the lines:
var excel = new ExcelQueryFactory(filePath);
excel.WorksheetRange<T>(beginCell, endColl + linesCount.ToString(), sheetIndex);
It reads the file just fine, but when I checked the contents of my DTO, I saw that all the 'Second' properties have the same values as the 'First' ones.
This post was the closest thing I found in my searches, and I think the problem could be solved with something like this:
excel.AddMapping<MyDto>(x => x.FirstPeriod, "A");
excel.AddMapping<MyDto>(x => x.FirstName, "B");
excel.AddMapping<MyDto>(x => x.SecondPeriod, "C");
excel.AddMapping<MyDto>(x => x.SecondName, "D");
But I don't know how to get the Excel column letters...
Note: I have a bit more code behind this, but I don't think it's relevant to the problem.
The problem you're having can't be solved with LinqToExcel today, because it wraps the OleDb functions and maps properties based on column names, so you lose OleDb options like "F<n>" for specifying columns by position (e.g. "F1" for "A").
There's an issue about this on the LinqToExcel GitHub repo: https://github.com/paulyoder/LinqToExcel/issues/85
I recommend renaming the columns so the names aren't duplicated (e.g. Period1, Name1, Period2, Name2). If that's not possible because the file is machine-generated, try changing the header names at runtime.
Another option is to run more than one query against the Excel file, with the ranges split by group, and then merge the results afterwards.
var excel = new ExcelQueryFactory(filePath);
var group1 = excel.WorksheetRange<T>("A1", "B" + rowCount);
var group2 = excel.WorksheetRange<T>("C1", "D" + rowCount);
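The merging step could look like the sketch below. It assumes each WorksheetRange query is materialized with ToList() and mapped to a small type with Period and Name properties; named tuples with invented values stand in for those results here, and the anonymous type mirrors the question's DTO.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the two WorksheetRange results, one per column group (hypothetical values).
var group1 = new List<(string Period, string Name)> { ("2019-01", "Ana"), ("2019-02", "Bruno") };
var group2 = new List<(string Period, string Name)> { ("2020-01", "Carla"), ("2020-02", "Diego") };

// Zip would silently drop unmatched rows, so check the counts first.
if (group1.Count != group2.Count)
    throw new InvalidOperationException("The two column groups have a different number of rows.");

// Row i of group 1 and row i of group 2 come from the same spreadsheet row, so pair them by position.
var merged = group1.Zip(group2, (first, second) => new
{
    FirstPeriod = first.Period,
    FirstName = first.Name,
    SecondPeriod = second.Period,
    SecondName = second.Name,
}).ToList();

foreach (var row in merged)
    Console.WriteLine($"{row.FirstPeriod} {row.FirstName} | {row.SecondPeriod} {row.SecondName}");
```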
Edit: I'll work on a feature to try to solve this problem in an elegant manner, so maybe in the future you'll have a more flexible option to map columns and properties (if they accept my pull request).

LinqToExcel returns null

I have an Excel sheet in .xls format; I am using LinqToExcel to read it and then import it into my DB.
This sheet consists of about 3K rows and only 6 columns. I am using .AddMapping to map my class properties to the column names.
The problem I have is that the cells of the "web-code" column SOMETIMES come back as null, although there is data in the cells.
Here is sample data that comes back as null:
[Screenshot: My Code watch]
And here is sample data that comes back correctly:
[Screenshot: My Code Watch]
I have tried applying the ExcelColumn attribute for mapping, but no luck!
Code:
var factory = new ExcelQueryFactory(_excelFilePath);
factory.AddMapping<ExcelPriceEntity>(x => x.WebCode, "WEB-CODE");
factory.AddMapping<ExcelPriceEntity>(x => x.Type, "TYPE");
factory.AddMapping<ExcelPriceEntity>(x => x.Style, "STYLE");
factory.AddMapping<ExcelPriceEntity>(x => x.Qty, "QTY");
factory.AddMapping<ExcelPriceEntity>(x => x.UnitPrice, "Unit Price");
factory.AddMapping<ExcelPriceEntity>(x => x.Bucket, "WEBCODE W/BUCKET");
factory.StrictMapping = StrictMappingType.ClassStrict;
factory.TrimSpaces = TrimSpacesType.Both;
factory.ReadOnly = true;
var prices = factory.Worksheet<ExcelPriceEntity>(_allPricesSheetName).ToList();
var priccerNP = prices.Where(p => p.Type.Contains("900 ARROW TAPE")).ToList();
My ExcelPriceEntity class:
public class ExcelPriceEntity
{
    //[ExcelColumn("TYPE")]
    public string Type { get; set; }
    public string WebCode { get; set; }
    //[ExcelColumn("STYLE")]
    public string Style { get; set; }
    //[ExcelColumn("QTY")]
    public string Qty { get; set; }
    //[ExcelColumn("Unit Price")]
    public string UnitPrice { get; set; }
    //[ExcelColumn("WEBCODE W/BUCKET")]
    public string Bucket { get; set; }
}
Alternate solution:
I ended up saving the Excel sheet as a CSV file and importing it into a SQL table. Then I used LINQ to SQL to read the data.
Root cause:
After researching, I found out the problem was that the first cell of this column (web-code) was an integer, and Excel tries to figure out the data type of the column by looking at the first rows!
The following rows of the web-code column contained text data, so Excel couldn't parse them as integers and assigned null to them!
What I could have done is assign a text value to the first cell, so Excel would guess the data type as string, but I didn't test that. For anyone reading this answer: try having a text value in your first row if you come across the same problem.
Here, Contains is not like string Contains. It compares a list of cell values to the exact value you give inside the Contains method. Just try with the full text "900 AMMONIA STOCK OR CUST...".
Another alternative to @alsafoo's solution is to convert the column format from "General" to "Text".
These are the steps:
1. Select the column and right-click on it.
2. Select Format Cells.
3. On the Number tab, select Text.
4. Press OK.
After that, the library will read all values as strings.

Get a single element of CSV file

I'm trying to add some csv elements to a list of Alimento, where Alimento is declared as:
namespace ContaCarboidrati
{
    class Alimento
    {
        public virtual string Codice { get; set; }
        public virtual string Descrizione { get; set; }
        public virtual int Carboidrati { get; set; }
    }
}
My csv looks something like this:
"C00, Pasta, 75".
Here's the method that should create the list from the csv:
private static List<Alimento> CreaListaAlimentiDaCsv()
{
    List<Alimento> listaCsv = new List<Alimento>();
    StreamReader sr = new StreamReader(@"C:\Users\Alex\Documents\RecordAlimenti.csv");
    string abc = sr.ReadLine();
    //listaCsv = abc.Split(",");
}
abc is "C00, Pasta, 75". I want to get a single element to add to the list, or add all 3 elements to the list; I thought a single element would be easier to do.
Sorry for my bad English
Thanks in advance
Alex
You are on the right track, but you cannot just create an Alimento from three strings, which is what you will get if you do abc.Split(","). You need to create a new Alimento object for each item (line) in the CSV file and initialize each object correctly. Something like this:
var item = abc.Split(',');
listaCsv.Add(new Alimento() { Codice = item[0], Descrizione = item[1],
                               Carboidrati = int.Parse(item[2]) });
Also, your csv seems to include spaces after the commas which you might want to get rid of. You could use string.Trim() to get rid of leading/trailing spaces. You also have to make sure the third item is actually an integer and take action if that is not the case (i.e. add some error handling).
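A sketch of that trimming and error handling, assuming a malformed line should be skipped and reported rather than stop the import. TryParseAlimento is a hypothetical helper, and a named tuple stands in for the Alimento class to keep the snippet self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// A named tuple stands in for the Alimento class so the sketch is self-contained.
var listaCsv = new List<(string Codice, string Descrizione, int Carboidrati)>();

// Hypothetical helper: returns false instead of throwing when a line is malformed.
static bool TryParseAlimento(string line, out (string Codice, string Descrizione, int Carboidrati) alimento)
{
    alimento = default;
    string[] item = line.Split(',');
    if (item.Length != 3)
        return false;

    // int.TryParse reports a non-integer third field instead of throwing a FormatException.
    if (!int.TryParse(item[2].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out int carboidrati))
        return false;

    // Trim the spaces that follow each comma ("C00, Pasta, 75").
    alimento = (item[0].Trim(), item[1].Trim(), carboidrati);
    return true;
}

foreach (string line in new[] { "C00, Pasta, 75", "C01, Pane, abc", "C02, Riso" })
{
    if (TryParseAlimento(line, out var parsed))
        listaCsv.Add(parsed);
    else
        Console.WriteLine($"Skipping malformed line: {line}");
}
```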
As a side note, implementing a csv reader is not as trivial as one may think, but there are several free C# implementations out there. If you need something a bit more advanced than just reading a simple (and strictly one-line-per-item) csv, try one of these:
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
http://www.filehelpers.com/
You can parse the file with LINQ:
var listaCsv = (from line in File.ReadAllLines("RecordAlimenti.csv")
                let items = line.Split(',')
                select new Alimento
                {
                    Codice = items[0].Trim(),
                    Descrizione = items[1].Trim(),
                    Carboidrati = Int32.Parse(items[2])
                }).ToList();
You can parse it pretty easily, assuming your data isn't bad:
private IEnumerable<Alimento> CreaListaAlimentiDaCsv(string fileName)
{
    return File.ReadLines(fileName) // e.g. @"C:\Users\Alex\Documents\RecordAlimenti.csv"
        .Select(line => line.Split(',').Select(v => v.Trim()).ToArray())
        .Select(values => new Alimento
        {
            Codice = values[0],
            Descrizione = values[1],
            Carboidrati = Convert.ToInt32(values[2])
        });
}
You can also use LINQ on the method's result, for example:
// Takes one line without iterating the entire file
CreaListaAlimentiDaCsv(@"C:\Users\Alex\Documents\RecordAlimenti.csv").Take(1);
// Skips the first line and takes the second, reading two lines in total
CreaListaAlimentiDaCsv(@"C:\Users\Alex\Documents\RecordAlimenti.csv").Skip(1).Take(1);

selection using LINQ

I have a sample xml file that looks like this:
<Books>
<Category Genre="Fiction" BookName="book_name" BookPrice="book_price_in_$" />
<Category Genre="Fiction" BookName="book_name" BookPrice="book_price_in_$" />
<Category Genre="NonFiction" BookName="book_name" BookPrice="book_price_in_$" />
<Category Genre="Children" BookName="book_name" BookPrice="book_price_in_$" />
</Books>
I need to collect all book names and book prices and pass them to some other method. Right now, I get all book names and book prices separately into two different List<string> using the following code:
List<string> BookNameList = root.Elements("Category").Select(x => (string)x.Attribute("BookName")).ToList();
List<string> BookPriceList = root.Elements("Category").Select(x => (string)x.Attribute("BookPrice")).ToList();
I create a text file and send it back to the calling function (storing these results in a text file is a requirement; the text file has two fields, bookname and bookprice).
To write to the text file I use the following code:
for (int i = 0; i < BookNameList.Count; i++)
{
    // write BookNameList[i] to file
    // write BookPriceList[i] to file
}
I somehow don't feel good about this approach. Suppose, for whatever reason, the two lists are not the same size; right now I don't take that into account. I also feel using foreach is much more efficient (I may be wrong). Is it possible to read both entries into a data structure (with two attributes, name and price) with LINQ? Then I can easily iterate over a list of that data structure with foreach.
I am using C# for programming.
Thanks,
[Edit]: Thanks everyone for the super quick responses; I chose the first answer I saw.
Selecting:
var books = root.Elements("Category").Select(x => new {
    Name = (string)x.Attribute("BookName"),
    Price = (string)x.Attribute("BookPrice")
}).ToList();
Looping:
foreach (var book in books)
{
    // do something with
    // book.Name
    // book.Price
}
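Because the question also requires a text file with the two fields, here is a hedged end-to-end sketch: the XML is an inline copy of the sample with made-up names and prices, the projection is the one above, and a StringWriter stands in for the output file (a StreamWriter from File.CreateText(path) would be used in practice). The tab separator is an assumption.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Inline copy of the sample structure; book names and prices are invented.
XElement root = XElement.Parse(
    "<Books>" +
    "<Category Genre=\"Fiction\" BookName=\"Dune\" BookPrice=\"9.99\" />" +
    "<Category Genre=\"Children\" BookName=\"Matilda\" BookPrice=\"6.50\" />" +
    "</Books>");

var books = root.Elements("Category").Select(x => new
{
    Name = (string)x.Attribute("BookName"),
    Price = (string)x.Attribute("BookPrice")
}).ToList();

// Each book carries both fields, so there are no parallel lists to keep in sync.
using var writer = new StringWriter();
foreach (var book in books)
{
    writer.WriteLine($"{book.Name}\t{book.Price}");
}
string output = writer.ToString();
Console.Write(output);
```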
I think you could make it more tidy by some very simple means.
A somewhat simplified example follows.
First define the type Book:
public class Book
{
    public Book(string name, string price)
    {
        Name = name;
        Price = price;
    }
    public string Name { get; set; }
    public string Price { get; set; } // could be decimal if we want a proper type.
}
Then project your XML data into a sequence of Books, like so:
var books = from category in root.Elements("Category")
            select new Book((string)category.Attribute("BookName"), (string)category.Attribute("BookPrice"));
If you want better efficiency, I would advise using an XmlReader and writing to the file on every Category encountered, but it's quite involved compared to your approach. It really depends on your requirements; I don't think you have to worry about it too much unless speed is essential or the dataset is huge.
The streamed approach would look something like this:
using (var outputFile = OpenOutput())
using (XmlReader xml = OpenInput())
{
    try
    {
        while (xml.ReadToFollowing("Category"))
        {
            if (xml.IsStartElement())
            {
                string name = xml.GetAttribute("BookName");
                string price = xml.GetAttribute("BookPrice");
                outputFile.WriteLine(string.Format("{0} {1}", name, price));
            }
        }
    }
    catch (XmlException)
    {
        // Parse error encountered. It would be possible to recover by checking
        // ReadState and continuing, but this would obviously require some
        // restructuring of the code.
        // Catching parse errors is recommended because they could contain
        // sensitive information about the host environment that we don't
        // want to bubble up.
        throw new XmlException("Uh-oh");
    }
}
Bear in mind that if your nodes have XML namespaces you must register those with the XmlReader through a NameTable or it won't recognize the nodes.
You can do this with a single query and a foreach loop.
var namesAndPrices = from category in root.Elements("Category")
                     select new
                     {
                         Name = category.Attribute("BookName").Value,
                         Price = category.Attribute("BookPrice").Value
                     };
foreach (var nameAndPrice in namesAndPrices)
{
    // TODO: Output to disk
}
To build on Jeff's solution: if you need to pass this collection into another function as an argument, you can abuse the KeyValuePair data structure a little bit and do something along the lines of:
var namesAndPrices = from category in root.Elements("Category")
                     select new KeyValuePair<string, string>(
                         category.Attribute("BookName").Value,   // Key = Name
                         category.Attribute("BookPrice").Value   // Value = Price
                     );
// looping that happens in another function
// Key = Name
// Value = Price
foreach (var nameAndPrice in namesAndPrices)
{
    // TODO: Output to disk
}
