We have written a class to import a large(ish) CSV into our app using CsvHelper (https://joshclose.github.io/CsvHelper).
I would like to compare the header to the map to ensure the header's integrity. We get the CSV file from a third party, and I want to ensure it doesn't change over time; I thought the best way to do this would be to compare it against the map.
We have a class set up as so (trimmed):
public class VisitExport
{
public int? Count { get; set; }
public string CustomerName { get; set; }
public string CustomerAddress { get; set; }
}
And its corresponding map (also trimmed):
public class VisitMap : ClassMap<VisitExport>
{
public VisitMap()
{
Map(m => m.Count).Name("Count");
Map(m => m.CustomerName).Name("Customer Name");
Map(m => m.CustomerAddress).Name("Customer Address");
}
}
This is the code I have for reading the CSV file, and it works great. I have a try/catch in place for errors but, ideally, if it fails specifically because of a header mismatch, I'd like to handle that case explicitly.
private void fileLoadedLink_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
try
{
var filePath = string.Empty;
data = new List<VisitExport>();
using (OpenFileDialog openFileDialog = new OpenFileDialog())
{
openFileDialog.InitialDirectory = new KnownFolder(KnownFolderType.Downloads).Path;
openFileDialog.Filter = "csv files (*.csv)|*.csv";
openFileDialog.FilterIndex = 2;
openFileDialog.RestoreDirectory = true;
if (openFileDialog.ShowDialog() == DialogResult.OK)
{
filePath = openFileDialog.FileName;
var fileStream = openFileDialog.OpenFile();
var culture = CultureInfo.GetCultureInfo("en-GB");
using (StreamReader reader = new StreamReader(fileStream))
using (var readCsv = new CsvReader(reader, culture))
{
var map = new VisitMap();
readCsv.Context.RegisterClassMap(map);
var fileContent = readCsv.GetRecords<VisitExport>();
data = fileContent.ToList();
fileLoadedLink.Text = filePath;
viewModel.IsFileLoaded = true;
}
}
}
}
catch (CsvHelperException ex)
{
Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
fileLoadedLink.Text = "Error loading file.";
viewModel.IsFileLoaded = false;
}
}
Is there a way of comparing the Csv header vs my map?
There are two basic cases for CSV files with headers: missing CSV columns, and extra CSV columns. The first is already detected by CsvHelper while the detection of the second is not implemented out of the box and requires subclassing of CsvReader.
(As CsvHelper maps CSV columns to model properties by name, permuting the order of the columns in the CSV file would not be considered a breaking change.)
Note that this only applies to CSV files that actually contain headers. Since you are not setting CsvConfiguration.HasHeaderRecord = false I assume that this applies to your use case.
Details about each of the two cases follow.
Missing CSV columns.
Currently CsvHelper already throws an exception by default in such situations. When unmapped data model properties are found, CsvConfiguration.HeaderValidated is invoked. By default this is set to ConfigurationFunctions.HeaderValidated whose current behavior is to throw a HeaderValidationException if there are any unmapped model properties. You can replace or extend HeaderValidated with logic of your own if you prefer:
var culture = CultureInfo.GetCultureInfo("en-GB");
var config = new CsvConfiguration(culture)
{
HeaderValidated = (args) =>
{
// Add additional logic as required here
ConfigurationFunctions.HeaderValidated(args);
},
};
using (var readCsv = new CsvReader(reader, config))
{
// Remainder unchanged
Demo fiddle #1 here.
Extra CSV columns.
Currently CsvHelper does not inform the application when this happens. See Throw if csv contains unexpected columns #1032 which confirms that this is not implemented out of the box.
In a GitHub comment, user leopignataro suggests a workaround, which is to subclass CsvReader and add the necessary validation logic oneself. However, the version shown in the comment doesn't seem to handle duplicated column names or embedded references. The following subclass of CsvReader should do this correctly. It is based on the logic in CsvReader.ValidateHeader(ClassMap map, List<InvalidHeader> invalidHeaders). It recursively walks the incoming ClassMap, attempts to find a CSV header corresponding to each member or constructor parameter, and flags the index of each one that is mapped. Afterwards, if there are any unmapped headers, the supplied Action<CsvContext, List<string>> OnUnmappedCsvHeaders is invoked to notify the application of the problem and throw some exception if desired:
public class ValidatingCsvReader : CsvReader
{
public ValidatingCsvReader(TextReader reader, CultureInfo culture, bool leaveOpen = false) : this(new CsvParser(reader, culture, leaveOpen)) { }
public ValidatingCsvReader(TextReader reader, CsvConfiguration configuration) : this(new CsvParser(reader, configuration)) { }
public ValidatingCsvReader(IParser parser) : base(parser) { }
public Action<CsvContext, List<string>> OnUnmappedCsvHeaders { get; set; }
public override void ValidateHeader(Type type)
{
base.ValidateHeader(type);
var headerRecord = HeaderRecord;
var mapped = new BitArray(headerRecord.Length);
var map = Context.Maps[type];
FlagMappedHeaders(map, mapped);
var unmappedHeaders = Enumerable.Range(0, headerRecord.Length).Where(i => !mapped[i]).Select(i => headerRecord[i]).ToList();
if (unmappedHeaders.Count > 0)
{
OnUnmappedCsvHeaders?.Invoke(Context, unmappedHeaders);
}
}
protected virtual void FlagMappedHeaders(ClassMap map, BitArray mapped)
{
// Logic adapted from https://github.com/JoshClose/CsvHelper/blob/0d753ff09294b425e4bc5ab346145702eeeb1b6f/src/CsvHelper/CsvReader.cs#L157
// By https://github.com/JoshClose
foreach (var parameter in map.ParameterMaps)
{
if (parameter.Data.Ignore)
continue;
if (parameter.Data.IsConstantSet)
// If ConvertUsing and Constant don't require a header.
continue;
if (parameter.Data.IsIndexSet && !parameter.Data.IsNameSet)
// If there is only an index set, we don't want to validate the header name.
continue;
if (parameter.ConstructorTypeMap != null)
{
FlagMappedHeaders(parameter.ConstructorTypeMap, mapped);
}
else if (parameter.ReferenceMap != null)
{
FlagMappedHeaders(parameter.ReferenceMap.Data.Mapping, mapped);
}
else
{
var index = GetFieldIndex(parameter.Data.Names.ToArray(), parameter.Data.NameIndex, true);
if (index >= 0)
mapped.Set(index, true);
}
}
foreach (var memberMap in map.MemberMaps)
{
if (memberMap.Data.Ignore || !CanRead(memberMap))
continue;
if (memberMap.Data.ReadingConvertExpression != null || memberMap.Data.IsConstantSet)
// If ConvertUsing and Constant don't require a header.
continue;
if (memberMap.Data.IsIndexSet && !memberMap.Data.IsNameSet)
// If there is only an index set, we don't want to validate the header name.
continue;
var index = GetFieldIndex(memberMap.Data.Names.ToArray(), memberMap.Data.NameIndex, true);
if (index >= 0)
mapped.Set(index, true);
}
foreach (var referenceMap in map.ReferenceMaps)
{
if (!CanRead(referenceMap))
continue;
FlagMappedHeaders(referenceMap.Data.Mapping, mapped);
}
}
}
And then in your code, handle the OnUnmappedCsvHeaders callback however you would like, such as by throwing a CsvHelperException or some other custom exception:
using (var readCsv = new ValidatingCsvReader(reader, culture)
{
OnUnmappedCsvHeaders = (context, headers) => throw new CsvHelperException(context, string.Format("Unmapped CSV headers: \"{0}\"", string.Join(",", headers))),
})
Demo fiddles:
#2 (your model).
#3 (with external references).
#4 (duplicate names).
#5 (using the auto-generated map).
This could use additional testing, e.g. for data models with parameterized constructors and additional, mutable properties.
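For instance, a hypothetical model along these lines (the names are illustrative, not from the question) would exercise the ParameterMaps branch of FlagMappedHeaders:
public class VisitExportWithCtor
{
    public VisitExportWithCtor(int? count) { Count = count; }
    public int? Count { get; }
    // An additional mutable property alongside the constructor parameter.
    public string CustomerName { get; set; }
}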
How about catching HeaderValidationException before catching CsvHelperException?
catch (HeaderValidationException ex)
{
var message = ex.Message.Split('\n')[0];
var currentHeader = ex.Context.Reader.HeaderRecord;
message += $"{Environment.NewLine}Header: \"{string.Join(",", currentHeader)}\"";
Console.WriteLine(message);
fileLoadedLink.Text = "Error loading file.";
viewModel.IsFileLoaded = false;
}
catch (CsvHelperException ex)
{
Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
fileLoadedLink.Text = "Error loading file.";
viewModel.IsFileLoaded = false;
}
I have a csv file with the following data:
500000,0.005,6000
690000,0.003,5200
I need to add each line as a separate array. So 500000, 0.005, 6000 would be array1. How would I do this?
Currently my code adds each column into one element.
For example data[0] is showing 500000
690000
static void ReadFromFile(string filePath)
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(filePath))
{
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
string[] data = line.Split(',');
Console.WriteLine(data[0] + " " + data[1]);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
Using the limited data set you've provided...
const string test = @"500000,0.005,6000
690000,0.003,5200";
var result = test.Split('\n')
.Select(x=> x.Split(',')
.Select(y => Convert.ToDecimal(y))
.ToArray()
)
.ToArray();
foreach (var element in result)
{
Console.WriteLine($"{element[0]}, {element[1]}, {element[2]}");
}
Can it be done without LINQ? Yes, but it's messy...
const string test = @"500000,0.005,6000
690000,0.003,5200";
List<decimal[]> resultList = new List<decimal[]>();
string[] lines = test.Split('\n');
foreach (var line in lines)
{
List<decimal> decimalValueList = new List<decimal>();
string[] splitValuesByComma = line.Split(',');
foreach (string value in splitValuesByComma)
{
decimal convertedValue = Convert.ToDecimal(value);
decimalValueList.Add(convertedValue);
}
decimal[] decimalValueArray = decimalValueList.ToArray();
resultList.Add(decimalValueArray);
}
decimal[][] resultArray = resultList.ToArray();
That will give the exact same output as the first example.
If you can use a List<string[]>, you do not have to worry about the array length.
In the following example, the variable lines will be a list of arrays, like:
["500000", "0.005", "6000"]
["690000", "0.003", "5200"]
static void ReadFromFile(string filePath)
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader(filePath))
{
List<string[]> lines = new List<string[]>();
string line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
string[] splittedLine = line.Split(',');
lines.Add(splittedLine);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
While others have shown split-based methods, I will take a more structured, explicitly specified approach.
You have some CSV values in a file. Find a name for the object stored in the CSV, name every column, and give each one a type.
Define the default values of those fields. Define what happens for a missing column or a malformed field. Is there a header?
Now that you know what you have, define what you want. This time again: object name -> property -> type.
Believe it or not, simply defining your input and output has solved most of your issue.
Use CsvHelper to simplify your code.
CSV File Definition:
public class CsvItem_WithARealName
{
public int data1;
public decimal data2;
public int goodVariableNames;
}
public class CsvItemMapper : ClassMap<CsvItem_WithARealName>
{
public CsvItemMapper()
{ //mapping based on index. cause file has no header.
Map(m => m.data1).Index(0);
Map(m => m.data2).Index(1);
Map(m => m.goodVariableNames).Index(2);
}
}
A CSV reader method: point it at a document and it will yield the CSV items.
Here we have some configuration: no header, and InvariantCulture for the decimal conversion.
private IEnumerable<CsvItem_WithARealName> GetCsvItems(string filePath)
{
using (var fileReader = File.OpenText(filePath))
using (var csvReader = new CsvHelper.CsvReader(fileReader))
{
csvReader.Configuration.CultureInfo = CultureInfo.InvariantCulture;
csvReader.Configuration.HasHeaderRecord = false;
csvReader.Configuration.RegisterClassMap<CsvItemMapper>();
while (csvReader.Read())
{
var record = csvReader.GetRecord<CsvItem_WithARealName>();
yield return record;
}
}
}
Usage:
var filename = "csvExemple.txt";
var items = GetCsvItems(filename);
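Note that this answer targets an older CsvHelper version; since v20 the Configuration property is read-only and the settings are passed to the constructor instead. A rough equivalent on a recent version would be the following sketch (untested against your exact package version):
private IEnumerable<CsvItem_WithARealName> GetCsvItems(string filePath)
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        HasHeaderRecord = false,
    };
    using (var fileReader = File.OpenText(filePath))
    using (var csvReader = new CsvHelper.CsvReader(fileReader, config))
    {
        csvReader.Context.RegisterClassMap<CsvItemMapper>();
        while (csvReader.Read())
        {
            // Materialize each record as it is read.
            yield return csvReader.GetRecord<CsvItem_WithARealName>();
        }
    }
}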
Get the entire Csv Line on parsing error
With CsvHelper we use:
MissingFieldFound:
Gets or sets the function that is called when a missing field is found. The default
function will throw a CsvHelper.MissingFieldException. You can supply your own
function to do other things like logging the issue instead of throwing an exception.
Arguments: headerNames, index, context
BadDataFound:
Gets or sets the function that is called when bad field data is found. A field
has bad data if it contains a quote and the field is not quoted (escaped). You
can supply your own function to do other things like logging the issue instead
of throwing an exception. Arguments: context
In the following MCVE, only MissingFieldFound captures the complete line; BadDataFound does not.
static void Main()
{
using (var stream = new MemoryStream())
using (var writer = new StreamWriter(stream))
using (var reader = new StreamReader(stream))
using (var csv = new CsvReader(reader))
{
writer.WriteLine("FirstName,LastName");
writer.WriteLine("\"Jon\"hn\"\",\"Doe\"");
writer.WriteLine("\"JaneDoe\"");
writer.WriteLine("\"Jane\",\"Doe\"");
writer.Flush();
stream.Position = 0;
var good = new List<Test>();
var bad = new List<string>();
var isRecordBad = false;
csv.Configuration.BadDataFound = context =>
{
isRecordBad = true;
bad.Add(context.RawRecord);
};
csv.Configuration.MissingFieldFound = (headerNames, index, context) =>
{
isRecordBad = true;
bad.Add(context.RawRecord);
};
while (csv.Read())
{
var record = csv.GetRecord<Test>();
if (!isRecordBad)
{
good.Add(record);
}
isRecordBad = false;
}
good.Dump();
bad.Dump();
}
}
public class Test
{
public string FirstName { get; set; }
public string LastName { get; set; }
}
I would like the result to be:
"Jon"hn"","Doe"
"JaneDoe"
Instead of:
"Jon"hn"",
"JaneDoe"
For a long CSV with many columns, the rest of the line often has valuable information.
You can get the line with this:
csv.Parser.Context.RawRecord;
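So in the MCVE above, the BadDataFound callback can read the full line from the parser's context instead of the callback's reading context (a sketch against the same pre-v20 CsvHelper API used in the question):
csv.Configuration.BadDataFound = context =>
{
    isRecordBad = true;
    // The parser's context still holds the complete raw line,
    // including the fields after the bad one.
    bad.Add(csv.Parser.Context.RawRecord);
};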
I read Excel files using OpenXml. All works fine, but if the spreadsheet contains one cell that has an email address followed by a space and another word, such as:
abc@abc.com abc
it throws an exception immediately on opening the spreadsheet:
var _doc = SpreadsheetDocument.Open(_filePath, false);
exception:
DocumentFormat.OpenXml.Packaging.OpenXmlPackageException
Additional information:
Invalid Hyperlink: Malformed URI is embedded as a
hyperlink in the document.
There is an open issue on the OpenXml forum related to this problem: Malformed Hyperlink causes exception
In the post they talk about encountering this issue with a malformed "mailto:" hyperlink within a Word document.
They propose a work-around here: Workaround for malformed hyperlink exception
The workaround is essentially a small console application which locates the invalid URL and replaces it with a hard-coded value; here is the code snippet from their sample that does the replacement; you could augment this code to attempt to correct the passed brokenUri:
private static Uri FixUri(string brokenUri)
{
return new Uri("http://broken-link/");
}
The problem I had was actually with an Excel document (like yours) and it had to do with a malformed http URL; I was pleasantly surprised to find that their code worked just fine with my Excel file.
Here is the entire work-around source code, just in case one of these links goes away in the future:
class Program
{
static void Main(string[] args)
{
var fileName = @"C:\temp\corrupt.xlsx";
var newFileName = @"c:\temp\Fixed.xlsx";
var newFileInfo = new FileInfo(newFileName);
if (newFileInfo.Exists)
newFileInfo.Delete();
File.Copy(fileName, newFileName);
WordprocessingDocument wDoc;
try
{
using (wDoc = WordprocessingDocument.Open(newFileName, true))
{
ProcessDocument(wDoc);
}
}
catch (OpenXmlPackageException e)
{
e.Dump(); // Dump() is a LINQPad extension; use Console.WriteLine(e) outside LINQPad
if (e.ToString().Contains("The specified package is not valid."))
{
using (FileStream fs = new FileStream(newFileName, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
}
}
}
}
private static Uri FixUri(string brokenUri)
{
brokenUri.Dump();
return new Uri("http://broken-link/");
}
private static void ProcessDocument(WordprocessingDocument wDoc)
{
var elementCount = wDoc.MainDocumentPart.Document.Descendants().Count();
Console.WriteLine(elementCount);
}
}
public static class UriFixer
{
public static void FixInvalidUri(Stream fs, Func<string, Uri> invalidUriHandler)
{
XNamespace relNs = "http://schemas.openxmlformats.org/package/2006/relationships";
using (ZipArchive za = new ZipArchive(fs, ZipArchiveMode.Update))
{
foreach (var entry in za.Entries.ToList())
{
if (!entry.Name.EndsWith(".rels"))
continue;
bool replaceEntry = false;
XDocument entryXDoc = null;
using (var entryStream = entry.Open())
{
try
{
entryXDoc = XDocument.Load(entryStream);
if (entryXDoc.Root != null && entryXDoc.Root.Name.Namespace == relNs)
{
var urisToCheck = entryXDoc
.Descendants(relNs + "Relationship")
.Where(r => r.Attribute("TargetMode") != null && (string)r.Attribute("TargetMode") == "External");
foreach (var rel in urisToCheck)
{
var target = (string)rel.Attribute("Target");
if (target != null)
{
try
{
Uri uri = new Uri(target);
}
catch (UriFormatException)
{
Uri newUri = invalidUriHandler(target);
rel.Attribute("Target").Value = newUri.ToString();
replaceEntry = true;
}
}
}
}
}
catch (XmlException)
{
continue;
}
}
if (replaceEntry)
{
var fullName = entry.FullName;
entry.Delete();
var newEntry = za.CreateEntry(fullName);
using (StreamWriter writer = new StreamWriter(newEntry.Open()))
using (XmlWriter xmlWriter = XmlWriter.Create(writer))
{
entryXDoc.WriteTo(xmlWriter);
}
}
}
}
}
}
The fix by @RMD works great. I've been using it for years. But there is a new fix.
You can see the fix here in the changelog for issue #793
Upgrade OpenXML to 2.12.0.
Right click solution and select Manage NuGet Packages.
Implement the fix
It is helpful to have a unit test. Create an Excel file with a bad email address like test@gmail,com. (Note the comma instead of the dot.)
Make sure the stream you open and the call to SpreadsheetDocument.Open allows Read AND Write.
You need to implement a RelationshipErrorHandlerFactory and use it in the options when you open. Here is the code I used:
public class UriRelationshipErrorHandler : RelationshipErrorHandler
{
public override string Rewrite(Uri partUri, string id, string uri)
{
return "https://broken-link";
}
}
Then you need to use it when you open the document like this:
var openSettings = new OpenSettings
{
RelationshipErrorHandlerFactory = package =>
{
return new UriRelationshipErrorHandler();
}
};
using var document = SpreadsheetDocument.Open(stream, true, openSettings);
One of the nice things about this solution is that it does not require you to create a temporary "fixed" version of your file and it is far less code.
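As suggested above, a unit test helps pin this down. A minimal sketch, assuming xUnit and a fixture file corrupt.xlsx containing the bad test@gmail,com address (names are illustrative):
[Fact]
public void Open_SpreadsheetWithMalformedHyperlink_DoesNotThrow()
{
    // Read AND Write access, as noted above.
    using var stream = new FileStream("corrupt.xlsx", FileMode.Open, FileAccess.ReadWrite);
    var openSettings = new OpenSettings
    {
        RelationshipErrorHandlerFactory = package => new UriRelationshipErrorHandler()
    };
    using var document = SpreadsheetDocument.Open(stream, true, openSettings);
    Assert.NotNull(document.WorkbookPart);
}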
Unfortunately, the solutions where you have to open the file as a zip and replace the broken hyperlink would not help me.
I was wondering how it is possible that it works fine when your target framework is 4.0, even if the only installed .NET Framework version is 4.7.2.
I found out that there is a private static field inside System.UriParser that selects the version of the URI RFC specification. So it is possible to set it to V2, as it is for .NET 4.0 and lower versions of the .NET Framework. The only problem is that the field is private static readonly.
Maybe someone will want to set it globally for the whole application. But I wrote a UriQuirksVersionPatcher that updates this version and restores it back in its Dispose method. It is obviously not thread-safe, but it is acceptable for my purpose.
using System;
using System.Diagnostics;
using System.Reflection;
namespace BarCap.RiskServices.RateSubmissions.Utility
{
#if (NET20 || NET35 || NET40)
public class UriQuirksVersionPatcher : IDisposable
{
public void Dispose()
{
}
}
#else
public class UriQuirksVersionPatcher : IDisposable
{
private const string _quirksVersionFieldName = "s_QuirksVersion"; //See Source\ndp\fx\src\net\System\_UriSyntax.cs in NetFX sources
private const string _uriQuirksVersionEnumName = "UriQuirksVersion";
/// <code>
/// private enum UriQuirksVersion
/// {
/// V1 = 1, // RFC 1738 - Not supported
/// V2 = 2, // RFC 2396
/// V3 = 3, // RFC 3986, 3987
/// }
/// </code>
private const string _oldQuirksVersion = "V2";
private static readonly Lazy<FieldInfo> _targetFieldInfo;
private static readonly Lazy<int?> _patchValue;
private readonly int _oldValue;
private readonly bool _isEnabled;
static UriQuirksVersionPatcher()
{
var targetType = typeof(UriParser);
_targetFieldInfo = new Lazy<FieldInfo>(() => targetType.GetField(_quirksVersionFieldName, BindingFlags.Static | BindingFlags.NonPublic));
_patchValue = new Lazy<int?>(() => GetUriQuirksVersion(targetType));
}
public UriQuirksVersionPatcher()
{
int? patchValue = _patchValue.Value;
_isEnabled = patchValue.HasValue;
if (!_isEnabled) //Disabled if it failed to get enum value
{
return;
}
int originalValue = QuirksVersion;
_isEnabled = originalValue != patchValue;
if (!_isEnabled) //Disabled if value is proper
{
return;
}
_oldValue = originalValue;
QuirksVersion = patchValue.Value;
}
private int QuirksVersion
{
get
{
return (int)_targetFieldInfo.Value.GetValue(null);
}
set
{
_targetFieldInfo.Value.SetValue(null, value);
}
}
private static int? GetUriQuirksVersion(Type targetType)
{
int? result = null;
try
{
result = (int)targetType.GetNestedType(_uriQuirksVersionEnumName, BindingFlags.Static | BindingFlags.NonPublic)
.GetField(_oldQuirksVersion, BindingFlags.Static | BindingFlags.Public)
.GetValue(null);
}
catch
{
#if DEBUG
Debug.WriteLine("ERROR: Failed to find UriQuirksVersion.V2 enum member.");
throw;
#endif
}
return result;
}
public void Dispose()
{
if (_isEnabled)
{
QuirksVersion = _oldValue;
}
}
}
#endif
}
Usage:
using(new UriQuirksVersionPatcher())
{
using(var document = SpreadsheetDocument.Open(fullPath, false))
{
//.....
}
}
P.S. Later I found that someone has already implemented this patcher: https://github.com/google/google-api-dotnet-client/blob/master/Src/Support/Google.Apis.Core/Util/UriPatcher.cs
I haven't used OpenXml, but if there's no specific reason for using it, then I highly recommend LinqToExcel. Example of the code is here:
var sheet = new ExcelQueryFactory("filePath");
var allRows = from r in sheet.Worksheet() select r;
foreach (var r in allRows) {
var cella = r["Header"].ToString();
}
I have the following code which takes a CSV and writes to a console:
using (CsvReader csv = new CsvReader(
new StreamReader("data.csv"), true))
{
// missing fields will not throw an exception,
// but will instead be treated as if there was a null value
csv.MissingFieldAction = MissingFieldAction.ReplaceByNull;
// to replace by "" instead, then use the following action:
//csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
while (csv.ReadNextRecord())
{
for (int i = 0; i < fieldCount; i++)
Console.Write(string.Format("{0} = {1};",
headers[i],
csv[i] == null ? "MISSING" : csv[i]));
Console.WriteLine();
}
}
The CSV file has 7 headers for which I have 7 columns in my SQL table.
What is the best way to take each csv[i], write it to the corresponding column of a row, and then move to the next row?
I tried to add the csv[i] to a string array but that didn't work.
I also tried the following:
SqlCommand sql = new SqlCommand("INSERT INTO table1 [" + csv[i] + "]", mysqlconnectionstring);
sql.ExecuteNonQuery();
My table (table1) is like this:
name address city zipcode phone fax device
Your problem is simple, but I will take it one step further and show you a better way to approach the issue.
When you have a problem to solve, always break it down into parts and put each part in its own method. For example, in your case:
1 - read from the file
2 - create a sql query
3 - run the query
You can even add validation of the file (imagine your file does not even have 7 fields in one or more lines...). Also, the example below is only appropriate if your file never grows past around 500 lines; if it normally does, you should consider a SQL statement that loads your file directly into the database instead - it's called bulk insert (see the sketch right after this paragraph).
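For the bulk path, here is a minimal sketch with SqlBulkCopy, assuming the seven-column table1 from the question and the List<string[]> produced in step 1 (adjust names to your schema):
using System.Data;
using System.Data.SqlClient;

static void BulkInsert(List<string[]> lines, string connectionString)
{
    // Build an in-memory table whose columns mirror table1.
    var table = new DataTable();
    foreach (var column in new[] { "name", "address", "city", "zipcode", "phone", "fax", "device" })
        table.Columns.Add(column, typeof(string));
    foreach (var line in lines)
        table.Rows.Add(line); // one 7-field string[] per row

    using (var connection = new SqlConnection(connectionString))
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        connection.Open();
        bulkCopy.DestinationTableName = "table1";
        bulkCopy.WriteToServer(table);
    }
}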
1 - read from file:
I would use a List<string> to hold the line entries and I always use StreamReader to read from text files.
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
2 - generate the sql
foreach (var line in this.Lines)
{
string entries = string.Concat("'", string.Join("','", line), "'"); // wrap every value in single quotes
this.Query.Add(string.Format(this.LineTemplate, entries));
}
3 - run the query
SqlCommand sql = new SqlCommand(string.Join("", query), mysqlconnectionstring);
sql.ExecuteNonQuery();
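As a side note, concatenating raw CSV values into the statement is fragile (an embedded quote breaks the SQL) and open to injection. A parameterized variant is safer; a sketch, assuming the lines from step 1 and an open SqlConnection:
const string insertTemplate =
    "INSERT INTO table1 (name, address, city, zipcode, phone, fax, device) " +
    "VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6)";
foreach (var line in this.Lines)
{
    using (var command = new SqlCommand(insertTemplate, connection))
    {
        // One parameter per CSV field, in column order.
        for (int i = 0; i < line.Length; i++)
            command.Parameters.AddWithValue("@p" + i, line[i]);
        command.ExecuteNonQuery();
    }
}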
Having some fun, I ended up building the whole solution, which you can download here.
The code can be found here. It needs more tweaks, but I will leave that for others. Solution written in C#, VS 2013.
The ExtractCsvIntoSql class is as follows:
public class ExtractCsvIntoSql
{
private string CsvPath, Separator;
private bool HasHeader;
private List<string[]> Lines;
private List<string> Query;
/// <summary>
/// Header content of the CSV File
/// </summary>
public string[] Header { get; private set; }
/// <summary>
/// Template to be used in each INSERT Query statement
/// </summary>
public string LineTemplate { get; set; }
public ExtractCsvIntoSql(string csvPath, string separator, bool hasHeader = false)
{
this.CsvPath = csvPath;
this.Separator = separator;
this.HasHeader = hasHeader;
this.Lines = new List<string[]>();
this.Query = new List<string>(); // initialize so GenerateQuery() doesn't hit a null list
// you can also set this
this.LineTemplate = "INSERT INTO [table1] VALUES ({0});";
}
/// <summary>
/// Generates the SQL Query
/// </summary>
/// <returns></returns>
public List<string> Generate()
{
if(this.CsvPath == null)
throw new ArgumentException("CSV Path can't be empty");
// extract csv into object
Extract();
// generate sql query
GenerateQuery();
return this.Query;
}
private void Extract()
{
string line;
string[] splittedLine;
int iLine = 0;
try
{
using (StreamReader sr = File.OpenText(this.CsvPath))
{
while ((line = sr.ReadLine()) != null)
{
splittedLine = line.Split(new string[] { this.Separator }, StringSplitOptions.None);
if (iLine == 0 && this.HasHeader)
// header line
this.Header = splittedLine;
else
this.Lines.Add(splittedLine);
iLine++;
}
}
}
catch (Exception ex)
{
if(ex.InnerException != null)
while (ex.InnerException != null)
ex = ex.InnerException;
throw ex;
}
// Lines will have all rows and each row, the column entry
}
private void GenerateQuery()
{
foreach (var line in this.Lines)
{
string entries = string.Concat("'", string.Join("','", line), "'"); // wrap every value in single quotes
this.Query.Add(string.Format(this.LineTemplate, entries));
}
}
}
and you can run it as:
class Program
{
static void Main(string[] args)
{
string file = Ask("What is the CSV file path? (full path)");
string separator = Ask("What is the current separator? (; or ,)");
var extract = new ExtractCsvIntoSql(file, separator);
var sql = extract.Generate();
Output(sql);
}
private static void Output(IEnumerable<string> sql)
{
foreach(var query in sql)
Console.WriteLine(query);
Console.WriteLine("*******************************************");
Console.Write("END ");
Console.ReadLine();
}
private static string Ask(string question)
{
Console.WriteLine("*******************************************");
Console.WriteLine(question);
Console.Write("= ");
return Console.ReadLine();
}
}
Usually I like to be a bit more generic, so I'll try to explain a very basic flow I use from time to time:
I don't like the hard-coded approach: even if your code works, it will be dedicated specifically to one type. I prefer simple reflection, first to understand which DTO it is and then to work out which repository I should use to manipulate it.
For example:
public class ImportProvider
{
private readonly string _path;
private readonly ObjectResolver _objectResolver;
public ImportProvider(string path)
{
_path = path;
_objectResolver = new ObjectResolver();
}
public void Import()
{
var filePaths = Directory.GetFiles(_path, "*.csv");
foreach (var filePath in filePaths)
{
var fileName = Path.GetFileName(filePath);
var className = fileName.Remove(fileName.Length-4);
using (var reader = new CsvFileReader(filePath))
{
var row = new CsvRow();
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
while (reader.ReadRow(row))
{
var dtoInstance = (DtoBase)_objectResolver.Resolve("DAL.DTO", className + "Dto");
dtoInstance.FillInstance(row.ToArray());
repository.Save(dtoInstance);
}
}
}
}
}
Above is a very basic class responsible for importing the data. Regardless of how this piece of code parses CSV files (CsvFileReader), the important part is that a "CsvRow" is a simple List.
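For reference, minimal stand-ins for those reader types could look like this (purely an assumption to keep the example self-contained; the original code does not show them):
// Hypothetical stand-ins for the CsvRow / CsvFileReader types used above.
public class CsvRow : List<string> { }

public class CsvFileReader : IDisposable
{
    private readonly StreamReader _reader;
    public CsvFileReader(string path) { _reader = new StreamReader(path); }

    // Returns false at end of file; otherwise fills 'row' with the line's fields.
    public bool ReadRow(CsvRow row)
    {
        var line = _reader.ReadLine();
        if (line == null) return false;
        row.Clear();
        row.AddRange(line.Split(',')); // naive: no quoted-field handling
        return true;
    }

    public void Dispose() => _reader.Dispose();
}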
Below is the implementation of the ObjectResolver:
public class ObjectResolver
{
private readonly Assembly _myDal;
public ObjectResolver()
{
_myDal = Assembly.Load("DAL");
}
public object Resolve(string nameSpace, string name)
{
var myLoadClass = _myDal.GetType(nameSpace + "." + name);
return Activator.CreateInstance(myLoadClass);
}
}
The idea is to simply follow a naming convention, in my case a "Dto" suffix for the reflected instances and a "Dao" suffix for the corresponding repository. The full name of the Dto or the Dao can be taken from the CSV file name or from the header (as you wish); for example, a Product.csv file resolves to DAL.DTO.ProductDto and DAL.Repository.ProductDao.
The next step is filling the Dto; each Dto implements the following simple abstract class:
public abstract class DtoBase
{
public abstract void FillInstance(params string[] parameters);
}
Since each Dto "knows" its structure (just as you knew how to create an appropriate table in the database), it can easily implement the FillInstance method. Here is a simple Dto example:
public class ProductDto : DtoBase
{
public int ProductId { get; set; }
public double Weight { get; set; }
public int FamilyId { get; set; }
public override void FillInstance(params string[] parameters)
{
ProductId = int.Parse(parameters[0]);
Weight = double.Parse(parameters[1]);
FamilyId = int.Parse(parameters[2]);
}
}
After you have your Dto filled with data, you should find the appropriate Dao to handle it, which basically happens via reflection in this line of the Import() method:
var repository = (DaoBase)_objectResolver.Resolve("DAL.Repository", className + "Dao");
In my case the Dao implements an abstract base class - but it's not that relevant to your problem, your DaoBase can be a simple abstract with a single Save() method.
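For completeness, a minimal DaoBase along those lines could be (a sketch; the protected helpers are assumptions inferred from how ProductDao below uses them):
using System.Data;

public abstract class DaoBase
{
    public abstract void Save(DtoBase dto);

    // Helpers assumed from the ProductDao usage below.
    protected abstract IDbCommand GetDbCommand(string query);
    protected abstract IDataParameter CreateParameter(string name, object value);
    protected abstract void ExecuteNonQuery(ref IDbCommand command);
}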
This way you have a dedicated Dao to CRUD your Dto's - each Dao simply knows how to save for its relevant Dto. Below is the corresponding ProductDao to the ProductDto:
public class ProductDao : DaoBase
{
private const string InsertProductQuery = @"SET foreign_key_checks = 0;
Insert into product (productID, weight, familyID)
VALUES (@productId, @weight, @familyId);
SET foreign_key_checks = 1;";
public override void Save(DtoBase dto)
{
var productToSave = dto as ProductDto;
var saveproductCommand = GetDbCommand(InsertProductQuery);
if (productToSave != null)
{
saveproductCommand.Parameters.Add(CreateParameter("@productId", productToSave.ProductId));
saveproductCommand.Parameters.Add(CreateParameter("@weight", productToSave.Weight));
saveproductCommand.Parameters.Add(CreateParameter("@familyId", productToSave.FamilyId));
ExecuteNonQuery(ref saveproductCommand);
}
}
}
Please ignore the CreateParameter() method, since it's an abstraction from the base class; you can just use CreateSqlParameter or CreateDataParameter, etc.
Just notice that it's a really naive implementation - you can easily remodel it better, depending on your needs.
From the first impression of your question, I guess you will have a huge number of records (more than a lakh). If so, I would consider SQL bulk copy an option. If there are fewer records, go ahead with single-record inserts. The reason your insert is not working is that you are not providing all the columns of the table, and there is also a syntax error.
I'm having trouble figuring out how to determine if a number is duplicated.
Right now, the process is when the user clicks on a button to browse for an xml file, the xml file gets deserialized and stored into db and the data gets shown on a DataGrid on the view.
So, I added a confirmation dialog: when the user clicks on browse, the code checks whether the lot_number being deserialized already exists in a column of a table in the database. I only want the user to be able to add lot numbers to the db that are not duplicates.
Here's my code so far:
public void DeSerializationStream(string filePath)
{
XmlRootAttribute xRoot = new XmlRootAttribute();
xRoot.ElementName = "lot_information";
xRoot.IsNullable = false;
// Create an instance of lotinformation class.
var lot = new LotInformation();
// Create an instance of StreamReader.
TextReader txtReader = new StreamReader(filePath);
// Create an instance of the XmlSerializer class.
XmlSerializer xmlSerializer = new XmlSerializer(typeof(LotInformation), xRoot);
// DeSerialize from the StreamReader
lot = (LotInformation)xmlSerializer.Deserialize(txtReader);
// Close the stream reader
txtReader.Close();
}
public void ReadLot(LotInformation lot)
{
try
{
using (var db = new DDataContext())
{
var lotNumDb = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(r.lot_number));
if (lotNumDb != null || lotNumDb.lot_number.ToString().Equals(lot.lot_number))
{
confirmationWindow.Message = LanguageResources.Resource.Sample_Exists_Already;
dialogService.ShowDialog(LanguageResources.Resource.Error, confirmationWindow);
}
else {
Console.WriteLine("lot does not exist. yay");
}
DateTime ExpirationDate = lot.exp_date;
if (ExpirationDate != null)
{
if (DateTime.Compare(ExpirationDate, DateTime.Now) > 0)
{
try
{
LotInformation lotInfo = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(lotNumber));
}
catch (InvalidOperationException e)
{
//TODO: Add a Dialog Here
}
}
else
{
Console.WriteLine(ExpirationDate);
errorWindow.Message = LanguageResources.Resource.Lot_Expired;
dialogService.ShowDialog(LanguageResources.Resource.Error, errorWindow);
}
}
else
{
errorWindow.Message = LanguageResources.Resource.Lot_Not_In_Database;
dialogService.ShowDialog(LanguageResources.Resource.Error, errorWindow);
}
}
}
catch
{
errorWindow.Message = LanguageResources.Resource.Database_Error;
dialogService.ShowDialog(LanguageResources.Resource.Error, errorWindow);
logger.writeErrLog(LanguageResources.Resource.Database_Error);
}
}
I think I'm just having problems with when to grab the lot_number in this process.
This part below gives me problems. It keeps showing the Sample Exists already message for unique lot numbers that I'm uploading and I'm not sure why. I think it's a problem with my LINQ query but I'm not sure how to fix it. Any ideas?
var lotNumDb = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(r.lot_number));
if (lotNumDb != null || lotNumDb.lot_number.ToString().Equals(lot.lot_number))
{
confirmationWindow.Message = LanguageResources.Resource.Sample_Exists_Already;
dialogService.ShowDialog(LanguageResources.Resource.Error, confirmationWindow);
}
else {
Console.WriteLine("lot does not exist. yay");
}
You can't use it like this:
db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(r.lot_number))
That condition compares each row's lot_number with itself, so it is always true and simply returns the first row in the table.
It should be:
db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(lot.lot_number))
or
db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(someLotNumberString))
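Putting that together, and also fixing the null check (the original lotNumDb != null || lotNumDb.lot_number... will throw a NullReferenceException when no record is found), a sketch:
// Look up a record with the same lot number as the one being imported.
var lotNumDb = db.LotInformation.FirstOrDefault(r => r.lot_number.Equals(lot.lot_number));
if (lotNumDb != null) // a match exists, so this lot number is a duplicate
{
    confirmationWindow.Message = LanguageResources.Resource.Sample_Exists_Already;
    dialogService.ShowDialog(LanguageResources.Resource.Error, confirmationWindow);
}
else
{
    Console.WriteLine("lot does not exist. yay");
}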