How to remove characters from an Excel sheet? - C#

My overall problem is that I have a large Excel file (columns A-S, 85,000 rows) that I want to convert to XML. The data in the cells is all text.
The process I'm using now is to manually save the Excel file as CSV, then parse that in my own C# program to turn it into XML. If you have a better approach, please suggest it. I've searched SO, and the only fast methods I found for converting straight to XML require the data to be all numeric.
(I tried reading cell by cell; it would have taken 3 days to process.)
So, unless you can recommend a different way to approach the problem, I want to be able to programmatically remove all commas, <, >, ', and " from the Excel sheet.

There are many options to read/edit/create Excel files:
MS provides the free OpenXML SDK 2.0 - see http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx (XLSX only). This can read and write MS Office files (including Excel).
Another free option: see http://www.codeproject.com/KB/office/OpenXML.aspx (XLSX only)
IF you need more, like handling older Excel versions (XLS, not only XLSX), rendering, creating PDFs, formulas etc., then there are various free and commercial libraries, like ClosedXML (free, XLSX only), EPPlus (free, XLSX only), Aspose.Cells, SpreadsheetGear, LibXL, Flexcel etc.
Another option is Interop, which requires Excel to be installed locally, BUT Interop is not supported in server scenarios by MS.
Any library-based approach that deals with the Excel file directly is way faster than Interop in my experience...
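For example, with a library the character-stripping pass can stay entirely in managed code. Here is a hedged sketch using ClosedXML (the file name and character list are taken from the question; check the docs for the exact API of your ClosedXML version):

using ClosedXML.Excel;

using (var wb = new XLWorkbook("Book1.xlsx")) // XLSX only
{
    var ws = wb.Worksheet(1);
    foreach (var cell in ws.RangeUsed().Cells()) // RangeUsed() assumes a non-empty sheet
    {
        string text = cell.GetString();
        cell.Value = text.Replace(",", "").Replace("<", "").Replace(">", "")
                         .Replace("'", "").Replace("\"", "");
    }
    wb.Save();
}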

I would use a combination of Microsoft.Office.Interop.Excel and XmlSerializer to get the job done.
This is in light of the fact that a) you're using a console application, and b) the Interop assemblies are easy to integrate into the solution (just References->Add).
I'm assuming that you have a copy of Excel installed on the machine running the process (you mentioned you manually open the workbook currently, hence the assumption).
The code would look something like this:
The serializable layer:
public class TestClass
{
    public List<TestLineItem> LineItems { get; set; }

    public TestClass()
    {
        LineItems = new List<TestLineItem>();
    }
}

public class TestLineItem
{
    private string SanitizeText(string input)
    {
        return input.Replace(",", "")
                    .Replace(".", "")
                    .Replace("<", "")
                    .Replace(">", "")
                    .Replace("'", "")
                    .Replace("\"", "");
    }

    private string m_field1;
    private string m_field2;

    public string Field1
    {
        get { return m_field1; }
        set { m_field1 = SanitizeText(value); }
    }

    public string Field2
    {
        get { return m_field2; }
        set { m_field2 = SanitizeText(value); }
    }

    public decimal Field3 { get; set; }

    public TestLineItem() { }

    public TestLineItem(object field1, object field2, object field3)
    {
        // assign through the properties so the values get sanitized
        Field1 = (field1 ?? "").ToString();
        Field2 = (field2 ?? "").ToString();
        if (field3 == null || field3.ToString() == "")
            Field3 = 0m;
        else
            Field3 = Convert.ToDecimal(field3.ToString());
    }
}
Then open the worksheet and load the data into a 2D array:
// using OExcel = Microsoft.Office.Interop.Excel;
var app = new OExcel.Application();

var wbPath = Path.Combine(
    Environment.GetFolderPath(
        Environment.SpecialFolder.MyDocuments), "Book1.xls");
var wb = app.Workbooks.Open(wbPath);
var ws = (OExcel.Worksheet)wb.ActiveSheet;

// there are better ways to do this...
// this one's just off the top of my head
var rngTopLine = ws.get_Range("A1", "C1");
var rngEndLine = rngTopLine.get_End(OExcel.XlDirection.xlDown);
var rngData = ws.get_Range(rngTopLine, rngEndLine);

// pull the whole range into memory in a single call
var arrayData = (object[,])rngData.Value2;

var tc = new TestClass();

// since you're enumerating an array, the operation will run much faster
// than reading the worksheet line by line.
for (int i = arrayData.GetLowerBound(0); i <= arrayData.GetUpperBound(0); i++)
{
    tc.LineItems.Add(
        new TestLineItem(arrayData[i, 1], arrayData[i, 2], arrayData[i, 3]));
}

var xs = new XmlSerializer(typeof(TestClass));
using (var fs = File.Create(Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments),
    "Book1.xml")))
{
    xs.Serialize(fs, tc);
}

wb.Close();
app.Quit();
The generated XML output will look something like this:
<TestClass>
  <LineItems>
    <TestLineItem>
      <Field1>test1</Field1>
      <Field2>some&lt;encoded&gt; stuff here</Field2>
      <Field3>123456.789</Field3>
    </TestLineItem>
    <TestLineItem>
      <Field1>test2</Field1>
      <Field2>testing some commas, and periods.</Field2>
      <Field3>23456789.12</Field3>
    </TestLineItem>
    <TestLineItem>
      <Field1>test3</Field1>
      <Field2>text in &quot;quotes&quot; and &#39;single quotes&#39;</Field2>
      <Field3>0</Field3>
    </TestLineItem>
  </LineItems>
</TestClass>

Related

Get field data outside of reporting database using Encompass360 SDK

I'm trying to build a standalone application that creates a custom report for Encompass360 without needing to put certain fields into the reporting database.
So far I have only found one way to do it, but it is extremely slow. (Much slower than a normal report within Encompass when retrieving data outside of the reporting database.) It takes almost 2 minutes to pull the data for 5 loans this way:
int count = 5;
StringList fields = new StringList();
fields.Add("Fields.317");
fields.Add("Fields.3238");
fields.Add("Fields.313");
fields.Add("Fields.319");
fields.Add("Fields.2");

// lstLoans.Items contains the string location of the loans (i.e. "My Pipeline\Dave#6")
foreach (LoanIdentity loanID in lstLoans.Items)
{
    string[] loanIdentifier = loanID.ToString().Split('\\');
    Loan loan = Globals.Session.Loans.Folders[loanIdentifier[0]].OpenLoan(loanIdentifier[1]);
    bool fundingPlus = true; // if milestone == funding || shipping || suspended || completion;
    if (!fundingPlus)
        continue;
    bool oneIsChecked = false;
    LogMilestoneEvents msEvents = loan.Log.MilestoneEvents;
    DateTime date;
    MilestoneEvent ms = null; // better way to do this probably
    if (checkBox4.Checked)
    {
        ms = msEvents.GetEventForMilestone("Completion");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    else if (checkBox3.Checked)
    {
        ms = msEvents.GetEventForMilestone("Suspended");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    else if (checkBox2.Checked)
    {
        ms = msEvents.GetEventForMilestone("Shipping");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    else if (checkBox1.Checked)
    {
        ms = msEvents.GetEventForMilestone("Funding");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    if (!oneIsChecked)
        continue;
    string LO = loan.Fields["317"].FormattedValue;
    string LOid = loan.Fields["3238"].FormattedValue;
    string city = loan.Fields["313"].FormattedValue;
    string address = loan.Fields["319"].FormattedValue;
    string loanAmount = loan.Fields["2"].FormattedValue;
    if (loanAmount == "")
    {
        Console.WriteLine(LO);
        continue;
    }
    int numLoans = 1;
    addLoanFieldToListView(LO, numLoans, city, address, loanAmount);
    if (--count == 0)
        break;
}
I haven't been able to figure out how to use any of the pipeline methods to retrieve data outside the reporting database, but when all of the fields I am looking for are in the reporting database, it takes hardly a couple of seconds to retrieve the contents of hundreds of loans using these tools:
session.Reports.SelectReportingFieldsForLoans(loanGUIDs, fields);
session.Loans.QueryPipeline(selectedDate, PipelineSortOrder.None);
session.Loans.OpenPipeline(PipelineSortOrder.None);
What would really help me is if somebody provided a simple example of retrieving data outside of the reporting database using the Encompass SDK that doesn't take longer than it ought to.
Note: I am aware I can add the fields to the reporting database that aren't in it currently, so this is not the answer I am looking for.
Note #2: Encompass360 doesn't have its own tag; if somebody knows of better tags for the subject at hand, please add them.
I use the SelectFields method on Loans to retrieve loan field data that is not in the reporting database in Encompass. It is very performant compared to opening loans up one by one, but the results are returned as strings, so it requires some parsing to get the values in their native types. Below is the example from the documentation for using this method.
using System;
using System.IO;
using EllieMae.Encompass.Client;
using EllieMae.Encompass.BusinessObjects;
using EllieMae.Encompass.Query;
using EllieMae.Encompass.Collections;
using EllieMae.Encompass.BusinessObjects.Loans;

class LoanReader
{
    public static void Main()
    {
        // Open the session to the remote server
        Session session = new Session();
        session.Start("myserver", "mary", "maryspwd");

        // Build the query criterion for all loans that were opened this year
        DateFieldCriterion dateCri = new DateFieldCriterion();
        dateCri.FieldName = "Loan.DateFileOpened";
        dateCri.Value = DateTime.Now;
        dateCri.Precision = DateFieldMatchPrecision.Year;

        // Perform the query to get the IDs of the loans
        LoanIdentityList ids = session.Loans.Query(dateCri);

        // Create a list of the specific fields we want to print from each loan.
        // In this case, we'll select the Loan Amount and Interest Rate.
        StringList fieldIds = new StringList();
        fieldIds.Add("2"); // Loan Amount
        fieldIds.Add("3"); // Rate

        // For each loan, select the desired fields
        foreach (LoanIdentity id in ids)
        {
            // Select the field values for the current loan
            StringList fieldValues = session.Loans.SelectFields(id.Guid, fieldIds);

            // Print out the returned values
            Console.WriteLine("Fields for loan " + id.ToString());
            Console.WriteLine("Amount: " + fieldValues[0]);
            Console.WriteLine("Rate: " + fieldValues[1]);
        }

        // End the session to gracefully disconnect from the server
        session.End();
    }
}
You will benefit greatly from adding these fields to the reporting DB and using an RDB query instead. Internally, Encompass has to open and parse loan files when you read fields outside the RDB, which is a slow process; an RDB read is just a SELECT query on fields, which is very fast. This tool will let you quickly check which fields are in the RDB, so that you can create a plan for your query as well as a plan to update the RDB: https://www.encompdev.com/Products/FieldExplorer
You query the RDB via Session.Loans.QueryPipeline(), very similarly to your use of Loan Query. Here's a good example of source code (in VB): https://www.encompdev.com/Products/AlertCounterFieldPlugin
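For illustration only, here is a hedged sketch stitching together calls that already appear in this thread (Query from the documentation example, SelectReportingFieldsForLoans from the question). The parameter and return types of SelectReportingFieldsForLoans are not shown anywhere here, so treat those as assumptions:

// Hypothetical sketch - assumes the fields have been added to the reporting DB,
// and that SelectReportingFieldsForLoans takes a list of loan GUIDs plus a
// StringList of field IDs, as the question's usage suggests.
StringList fields = new StringList();
fields.Add("Fields.317");
fields.Add("Fields.2");

// Reuse the dateCri criterion from the documentation example above
LoanIdentityList ids = session.Loans.Query(dateCri);

StringList loanGUIDs = new StringList(); // assumed collection type
foreach (LoanIdentity id in ids)
    loanGUIDs.Add(id.Guid);

// One round trip for all loans, instead of an OpenLoan call per loan
var results = session.Reports.SelectReportingFieldsForLoans(loanGUIDs, fields);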

OpenXML - after xlsx edit Excel detects errors in file

I have an xlsx file with a pivot table and a filter (pivot field). I'm trying to use OpenXML to:
Open the file
Modify the pivot field setting
Save the file
I'm using this simple (and ugly) code to do the job:
OpenSettings settings = new OpenSettings()
{
    MarkupCompatibilityProcessSettings = new MarkupCompatibilityProcessSettings(
        MarkupCompatibilityProcessMode.ProcessAllParts, FileFormatVersions.Office2010)
};
SpreadsheetDocument spd = SpreadsheetDocument.Open(pathToFile, true, settings);
var pivotTableCacheDefinitionParts = spd.WorkbookPart.PivotTableCacheDefinitionParts;
foreach (PivotTableCacheDefinitionPart item in pivotTableCacheDefinitionParts)
{
    var pivotCacheDefinition = item.PivotCacheDefinition;
    var d = pivotCacheDefinition.CacheFields.Where(x => (x as CacheField).Caption == "Some filter from Excel");
    foreach (var item2 in d)
    {
        if (item2.InnerXml.Contains("Some filter value"))
        {
            var a1 = item2.InnerXml.Replace("><", ">\n<").Split('\n');
            var a2 = a1.Where(x => !x.Contains("Some filter value"));
            string a3 = "";
            foreach (var item3 in a2)
            {
                a3 += item3;
            }
            a3 = a3.Replace("count=\"2\"", "count=\"1\""); // There are two values to choose from currently
            item2.InnerXml = a3;
        }
    }
}
After I save my document using:
spd.WorkbookPart.Workbook.Save();
spd.Close();
Excel claims the file is damaged and will attempt to repair it... I tried using other libraries, but:
ClosedXML - it doesn't see any data in the pivot tables (perhaps because OLAP is used as the data source? I don't know)
ExcelDataReader - doesn't seem to support pivot tables, or supports them only partially
EPPlus (beta version - the stable one didn't work with my xlsx file) - it doesn't seem to support editing of the pivot fields (pivot table filters)
MS.Office.Interop.Excel - this works (mostly), but since we want to use this functionality on the server side, it is not a recommended solution
What am I doing wrong?

Bulk data insertion in SQL Server table from delimited text file using c#

I have a tab-delimited text file. The file is around 100MB. I want to store the data from this file in a SQL Server table. The file contains about 1 million records. What is the best way to achieve this?
I can create an in-memory DataTable in C# and then upload it to SQL Server, but in that case it will load the entire 100 MB file into memory. What if the file size gets bigger?
No problem; CsvReader (the free "Fast CSV Reader" from CodeProject) will handle most delimited text formats, and implements IDataReader, so it can be used to feed a SqlBulkCopy. For example:
using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = first row is headers
using (var bcp = new SqlBulkCopy(connectionString))
{
    bcp.DestinationTableName = "Foo";
    bcp.WriteToServer(csv);
}
Note that CsvReader has lots of options for more subtle file handling (specifying the delimiter rules, etc). SqlBulkCopy is the high-performance bulk-load API - very efficient. This is a streaming reader/writer API; it does not load all the data into memory at once.
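If you need to tune a large load, SqlBulkCopy exposes a few knobs; here is a minimal sketch on top of the snippet above (same hypothetical table name and connection string):

using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true))
using (var bcp = new SqlBulkCopy(connectionString))
{
    bcp.DestinationTableName = "Foo";
    bcp.BatchSize = 10000;   // commit every 10k rows instead of one big batch
    bcp.BulkCopyTimeout = 0; // no timeout for very large files
    bcp.NotifyAfter = 50000; // progress callback interval
    bcp.SqlRowsCopied += (s, e) => Console.WriteLine("{0} rows copied", e.RowsCopied);
    bcp.WriteToServer(csv);
}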
You should read the file line by line, so you don't have to load the whole file into memory:
using (var file = System.IO.File.OpenText(filename))
{
    while (!file.EndOfStream)
    {
        string line = file.ReadLine();
        // TODO: Do your INSERT here
    }
}
* Update *
"This will make 1 million separate insert commands to sql server. Is there any way to make it in bulk"
You could use parameterised queries, which would still issue 1M inserts, but each one is cheap, so it would still be quite fast.
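A minimal sketch of that parameterised approach, reusing the hypothetical table, file, and column names from the SqlBulkCopy example below (requires System.Data and System.Data.SqlClient):

using (var conn = new SqlConnection("MyConnectionString"))
{
    conn.Open();
    using (var tran = conn.BeginTransaction())
    using (var cmd = new SqlCommand(
        "INSERT INTO MyTableName (A, B) VALUES (@a, @b)", conn, tran))
    {
        // add the parameters once, then reuse them for every row
        var pA = cmd.Parameters.Add("@a", SqlDbType.NVarChar, 100);
        var pB = cmd.Parameters.Add("@b", SqlDbType.NVarChar, 100);
        using (var file = System.IO.File.OpenText("MyTextFile"))
        {
            while (!file.EndOfStream)
            {
                var splitLine = file.ReadLine().Split(',');
                pA.Value = splitLine[0];
                pB.Value = splitLine[1];
                cmd.ExecuteNonQuery();
            }
        }
        tran.Commit(); // a single transaction avoids a commit per row
    }
}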
Alternatively, you can use SqlBulkCopy, but that's going to be rather difficult if you don't want to use 3rd-party libraries. If you are more amenable to the MS license, you could use the LINQ Entity Data Reader (distributed under the Ms-PL license), which provides the AsDataReader extension method:
void MyInsertMethod()
{
    using (var bulk = new SqlBulkCopy("MyConnectionString"))
    {
        bulk.DestinationTableName = "MyTableName";
        bulk.WriteToServer(GetRows().AsDataReader());
    }
}

class MyType
{
    public string A { get; set; }
    public string B { get; set; }
}

IEnumerable<MyType> GetRows()
{
    using (var file = System.IO.File.OpenText("MyTextFile"))
    {
        while (!file.EndOfStream)
        {
            var splitLine = file.ReadLine().Split(',');
            yield return new MyType() { A = splitLine[0], B = splitLine[1] };
        }
    }
}
If you don't want to use the MS-licensed code either, you could implement IDataReader yourself, but that is going to be a PITA. Note that the CSV handling above (Split(',')) is not at all robust, and also that column names in the table must be the same as the property names on MyType. TBH, I'd recommend you go with Marc's answer on this one.

XML data file not opening and working properly

I developed a WPF application using XML as the database file. Yesterday, the program stopped working. After some checking, I saw that there was a problem with the Transaction.xml file. I tried opening it in IE, but got this error:
The XML page cannot be displayed
Cannot view XML input using style sheet. Please correct the error and then click the Refresh button, or try again later.
An invalid character was found in text content. Error processing resource 'file:///C:/RegisterMaintenance/Transaction.xml'
Then I tried opening the file in Notepad, and it showed weird characters (screenshot below).
At the end, it's displaying the right XML structure. Please tell me what has gone wrong and why the XML is not showing correctly. How can I get it back to a normal state? I am really worried, as this is my only data file. Any help or suggestion will be great.
Here is one of the pieces of code that edits this file; there are other similar code files that use Transaction.xml:
public string Add()
{
    XDocument doc1 = XDocument.Load(@"Ledgers.xml");
    XElement elem = (from r in doc1.Descendants("Ledger")
                     where r.Element("Name").Value == this.Buyer
                     select r).First();
    this.TinNo = (string)elem.Element("TinNo");
    this.PhoneNo = (string)elem.Element("PhoneNo");
    this.CommissionAmount = (this.CommissionRate * this.Amount) / 100;
    this.CommissionAmount = Math.Round((decimal)this.CommissionAmount);
    this.VatAmount = (this.CommissionAmount + this.Amount) * this.VatRate / 100;
    this.VatAmount = Math.Round((decimal)this.VatAmount);
    this.InvoiceAmount = this.Amount + this.CommissionAmount + this.VatAmount;

    XDocument doc2 = XDocument.Load(@"Transactions.xml");
    var record = from r in doc2.Descendants("Transaction")
                 where (int)r.Element("Serial") == Serial
                 select r;
    foreach (XElement r in record)
    {
        r.Element("Invoice").Add(
            new XElement("InvoiceNo", this.InvoiceNo), new XElement("InvoiceDate", this.InvoiceDate),
            new XElement("TinNo", this.TinNo), new XElement("PhoneNo", this.PhoneNo),
            new XElement("TruckNo", this.TruckNo), new XElement("Source", this.Source),
            new XElement("Destination", this.Destination), new XElement("InvoiceAmount", this.InvoiceAmount),
            new XElement("CommissionRate", this.CommissionRate), new XElement("CommissionAmount", this.CommissionAmount),
            new XElement("VatRate", this.VatRate), new XElement("VatAmount", this.VatAmount));
    }
    doc2.Save(@"Transactions.xml");
    return "Invoice Created Successfully";
}
C# is an Object-Oriented Programming (OOP) language, so perhaps you should use some objects! As it stands, how can you possibly test your code for accuracy?
You should separate out responsibilities. An example:
public class Vat
{
    XElement self;

    public Vat(XElement parent)
    {
        self = parent.Element("Vat");
        if (null == self)
        {
            parent.Add(self = new XElement("Vat"));
            // Initialize values
            Amount = 0;
            Rate = 0;
        }
    }

    public XElement Element { get { return self; } }

    public decimal Amount
    {
        get { return (decimal)self.Attribute("Amount"); }
        set
        {
            XAttribute a = self.Attribute("Amount");
            if (null == a)
                self.Add(new XAttribute("Amount", value));
            else
                a.Value = value.ToString();
        }
    }

    public decimal Rate
    {
        get { return (decimal)self.Attribute("Rate"); }
        set
        {
            XAttribute a = self.Attribute("Rate");
            if (null == a)
                self.Add(new XAttribute("Rate", value));
            else
                a.Value = value.ToString();
        }
    }
}
All the Vat data will be in one node, and all the accessing of it will be in one testable class.
Your above foreach would look more like:
foreach (XElement r in record)
{
    // XElement.Add returns void, so create the element first, then add it
    XElement invoice = new XElement("Invoice");
    r.Add(invoice);
    ...
    Vat vat = new Vat(invoice);
    vat.Amount = this.VatAmount;
    vat.Rate = this.VatRate;
}
That is readable! At a glance, from your original code, I cannot even tell whether invoice is the parent of Vat, but now I can!
Note: This isn't to say your code is at fault; it could be a hard-drive error, as that is what it looks like to me. But if you want people to peruse your code, make it readable and testable! Years from now, if you or someone else has to change your code and it isn't readable, it is useless.
Perhaps from this incident you learned two things:
Readability and testability.
Backups! (All my valuable XML files are in an SVN (TortoiseSVN) so I can compare what has changed, as well as keeping good backups. The SVN is backed up to online storage.)
An ideal next step is to take the code in the property setters and refactor it out into a static extension method that is both testable and reproducible:
public static class XAttributeExtensions
{
    public static XAttribute SetAttribute(this XElement self, string name, object value)
    {
        // test for correct arguments
        if (null == self)
            throw new ArgumentNullException("XElement to SetAttribute method cannot be null!");
        if (string.IsNullOrEmpty(name))
            throw new ArgumentNullException("Attribute name cannot be null or empty to SetAttribute method!");
        if (null == value) // how to handle?
            value = ""; // or can throw an exception like one of the above.

        // Now to the good stuff
        XAttribute a = self.Attribute(name);
        if (null == a)
            self.Add(a = new XAttribute(name, value));
        else
            a.Value = value.ToString();
        return a;
    }
}
That is easily testable, very readable, and best of all it can be used over and over again, getting the same results!
For example, the Amount property can be greatly simplified with:
public decimal Amount
{
    get { return (decimal)self.Attribute("Amount"); }
    set { self.SetAttribute("Amount", value); }
}
I know this is a lot of boilerplate code, but I find it readable, extendable, and best of all testable. If I want to add another value to Vat, I can just modify the class and not have to worry about whether I added it in the right place. If Vat had children, I'd make another class that Vat exposes as a property.
The .xml file is clearly malformed. No browser or other program that reads XML files will be able to do anything with it. It doesn't matter that the XML becomes correct after some lines.
So the error is almost certainly in whatever creates and/or edits your XML file; you should look there. Maybe the encoding is wrong - the most common encoding is UTF-8.
Also, as a side note, XML is not really the best format for large databases (too much overhead), so switching to a binary format would be best. Even switching to JSON would bring a benefit.
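On the encoding point, here is a hedged sketch of saving the XDocument through an explicit UTF-8 XmlWriter, writing to a temp file first so a crash mid-save cannot corrupt the original (file names taken from the question; the .tmp/.bak names are assumptions):

using System.Text;
using System.Xml;
using System.Xml.Linq;

var doc = XDocument.Load(@"Transactions.xml");

var settings = new XmlWriterSettings
{
    Encoding = new UTF8Encoding(false), // UTF-8, no byte-order mark
    Indent = true
};

// write to a temp file first, then swap it in atomically with a backup
using (var writer = XmlWriter.Create(@"Transactions.tmp", settings))
{
    doc.Save(writer);
}
System.IO.File.Replace(@"Transactions.tmp", @"Transactions.xml", @"Transactions.bak");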

Reading a CSV file in .NET?

How do I read a CSV file using C#?
A choice, without using third-party components, is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class (http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx). It provides all the functions needed for parsing CSV. It is sufficient to reference the Microsoft.VisualBasic assembly.
var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(file);
parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
parser.SetDelimiters(new string[] { ";" });
while (!parser.EndOfData)
{
    string[] row = parser.ReadFields();
    /* do something */
}
You can use the Microsoft.VisualBasic.FileIO.TextFieldParser class in C#:
using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;

static void Main()
{
    string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";
    DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
    Console.WriteLine("Rows count:" + csvData.Rows.Count);
    Console.ReadLine();
}

private static DataTable GetDataTableFromCSVFile(string csv_file_path)
{
    DataTable csvData = new DataTable();
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[] { "," });
            csvReader.HasFieldsEnclosedInQuotes = true;
            string[] colFields = csvReader.ReadFields();
            foreach (string column in colFields)
            {
                DataColumn datacolumn = new DataColumn(column);
                datacolumn.AllowDBNull = true;
                csvData.Columns.Add(datacolumn);
            }
            while (!csvReader.EndOfData)
            {
                string[] fieldData = csvReader.ReadFields();
                // Making empty value as null
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == "")
                    {
                        fieldData[i] = null;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
        // swallowing the exception here hides parse errors;
        // consider logging ex or rethrowing
    }
    return csvData;
}
You could try CsvHelper, which is a project I work on. Its goal is to make reading and writing CSV files as easy as possible, while being very fast.
Here are a few ways you can read from a CSV file.
// By type
var records = csv.GetRecords<MyClass>();
var records = csv.GetRecords(typeof(MyClass));

// Dynamic
var records = csv.GetRecords<dynamic>();

// Using an anonymous type for the class definition
var anonymousTypeDefinition = new
{
    Id = default(int),
    Name = string.Empty,
    MyClass = new MyClass()
};
var records = csv.GetRecords(anonymousTypeDefinition);
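For context, the csv object in the snippets above is a CsvHelper CsvReader instance; a minimal setup might look like this (the exact constructor overloads vary between CsvHelper versions, and the file name is a placeholder):

using System.IO;
using CsvHelper;

using (var reader = new StreamReader("records.csv"))
using (var csv = new CsvReader(reader)) // newer versions also take a CultureInfo argument
{
    foreach (var record in csv.GetRecords<MyClass>())
    {
        // work with each strongly typed record
    }
}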
I usually use a simplistic approach like this one:
var path = Server.MapPath("~/App_Data/Data.csv");
var csvRows = System.IO.File.ReadAllLines(path, Encoding.Default).ToList();
foreach (var row in csvRows.Skip(1))
{
    var columns = row.Split(';');
    var field1 = columns[0];
    var field2 = columns[1];
    var field3 = columns[2];
}
I just used this library in my application. http://www.codeproject.com/KB/database/CsvReader.aspx. Everything went smoothly using this library, so I'm recommending it. It is free under the MIT License, so just include the notice with your source files.
I didn't display the CSV in a browser, but the author has some samples for Repeaters or DataGrids. I did run one of his test projects to test a Sort operation I have added and it looked pretty good.
You can try Cinchoo ETL - an open-source library for reading and writing CSV files.
Assuming a CSV file like:
Id, Name
1, Tom
2, Mark
This is how you can use the library to read it:
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
    foreach (dynamic item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
If you have a POCO object defined to match up with the CSV file, like below:
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}
You can parse the same file using this POCO class as below
using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
    foreach (var item in reader)
    {
        Console.WriteLine(item.Id);
        Console.WriteLine(item.Name);
    }
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
I recommend Angara.Table; about save/load, see: http://predictionmachines.github.io/Angara.Table/saveload.html.
It infers column types, can save CSV files, and is much faster than TextFieldParser. It follows RFC 4180 for the CSV format and supports multiline strings, NaNs, and escaped strings containing the delimiter character.
The library is under the MIT license. Source code is at https://github.com/Microsoft/Angara.Table.
Though its API is focused on F#, it can be used from any .NET language, just not as succinctly as in F#.
Example:
using Angara.Data;
using System.Collections.Immutable;
...
var table = Table.Load("data.csv");

// Print schema:
foreach (Column c in table)
{
    string colType;
    if (c.Rows.IsRealColumn) colType = "double";
    else if (c.Rows.IsStringColumn) colType = "string";
    else if (c.Rows.IsDateColumn) colType = "date";
    else if (c.Rows.IsIntColumn) colType = "int";
    else colType = "bool";
    Console.WriteLine("{0} of type {1}", c.Name, colType);
}

// Get column data:
ImmutableArray<double> a = table["a"].Rows.AsReal;
ImmutableArray<string> b = table["b"].Rows.AsString;

Table.Save(table, "data2.csv");
You might be interested in the Linq2Csv library at CodeProject. One thing you would need to check is whether it reads the data only when it needs it, so you won't need a lot of memory when working with bigger files.
As for displaying the data in the browser, you could do many things to accomplish it. If you were more specific about your requirements, the answer could be more specific, but here are things you could do:
1. Use the HttpListener class to write a simple web server (you can find many samples on the net for hosting a mini HTTP server); see the sketch after this list.
2. Use Asp.Net or Asp.Net Mvc, create a page, and host it using IIS.
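For option 1, a minimal HttpListener sketch that serves one HTML page per request (the prefix URL and HTML body are placeholders):

using System.Net;
using System.Text;

var listener = new HttpListener();
listener.Prefixes.Add("http://localhost:8080/"); // placeholder prefix
listener.Start();
while (true)
{
    HttpListenerContext context = listener.GetContext(); // blocks until a request arrives
    byte[] body = Encoding.UTF8.GetBytes(
        "<html><body><table><!-- render your CSV rows here --></table></body></html>");
    context.Response.ContentType = "text/html";
    context.Response.OutputStream.Write(body, 0, body.Length);
    context.Response.Close();
}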
Seems like there are quite a few projects on CodeProject or CodePlex for CSV parsing.
Here is another CSV parser on CodePlex:
http://commonlibrarynet.codeplex.com/
This library has components for CSV parsing, INI file parsing, and command-line parsing as well. It's working well for me so far. The only thing is that it doesn't have a CSV writer.
This is just for parsing the CSV. For displaying it in a web page, it is simply a matter of taking the list and rendering it however you want.
Note: This code example does not handle the situation where the input string line contains newlines.
public List<string> SplitCSV(string line)
{
    if (string.IsNullOrEmpty(line))
        throw new ArgumentException();

    List<string> result = new List<string>();
    int index = 0;
    int start = 0;
    bool inQuote = false;

    // parse line
    foreach (char c in line)
    {
        switch (c)
        {
            case '"':
                inQuote = !inQuote;
                break;
            case ',':
                if (!inQuote)
                {
                    result.Add(line.Substring(start, index - start)
                                   .Replace("\"", ""));
                    start = index + 1;
                }
                break;
        }
        index++;
    }

    if (start < index)
    {
        result.Add(line.Substring(start, index - start).Replace("\"", ""));
    }
    return result;
}
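For example, calling the method above on a line containing a quoted, comma-containing field:

List<string> fields = SplitCSV("one,\"two, with comma\",three");
// fields now contains: "one", "two, with comma", "three"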
I have been maintaining an open-source project called FlatFiles for several years now. It's available for .NET Core and .NET 4.5.1.
Unlike most of the alternatives, it allows you to define a schema (similar to the way EF code-first works) with an extreme level of precision, so you aren't fighting conversion issues all the time. You can map directly to your data classes, and there is also support for interfacing with the older ADO.NET classes.
Performance-wise, it's been tuned to be one of the fastest parsers for .NET, with a plethora of options for quirky format differences. There's also support for fixed-length files, if you need it.
You can use this library: Sky.Data.Csv
https://www.nuget.org/packages/Sky.Data.Csv/
It is a really fast CSV reader library and it's really easy to use:

using Sky.Data.Csv;

var readerSettings = new CsvReaderSettings { Encoding = Encoding.UTF8 };
using (var reader = CsvReader.Create("path-to-file", readerSettings))
{
    foreach (var row in reader)
    {
        // do something with the data
    }
}

It also supports reading typed objects with the CsvReader<T> class, which has the same interface.
