Error: 'FileId' field header not found. Parameter name: name - c#

I am new to CsvHelper, my apologies if I have missed something in the documentation.
I have a CSV file with 200-odd columns. Typically there are close to 65,000 rows. Importing these rows into a SQL database table was fine, until I added a new field to the table called "FileId", which does not exist in the CSV file. I wish to inject this field and the relevant value.
How do I do this, please?
Please see the code I am using below:
const string fileToWorkWith = @"C:\Data\Fidessa ETP Files\Import\2019\myCsvFile.csv";
Output.WriteLn($"Working with file {fileToWorkWith}.");

const string databaseConnectionString = "Server=MyServer;Database=DB;User Id=sa; Password = xyz;";

Output.WriteLn($"Checking if working file exists.");
if (new System.IO.FileInfo(fileToWorkWith).Exists == false)
{
    Output.WriteLn("Working file does not exist.", Output.WriteTypes.Error);
    return;
}

Output.WriteLn("Reading file.");
using (var reader = new CsvReader(new StreamReader(fileToWorkWith), true, char.Parse(",")))
{
    reader.Columns = new List<LumenWorks.Framework.IO.Csv.Column>
    {
        new LumenWorks.Framework.IO.Csv.Column { Name = "FileId", Type = typeof(int), DefaultValue = "1" },
    };
    reader.UseColumnDefaults = true;

    Output.WriteLn("Checking fields in file exist in the Database.");
    foreach (var fieldName in reader.GetFieldHeaders())
    {
        if (Fields.IsValid(fieldName.Replace(" ", "_")) == false)
        {
            Output.WriteLn($"A new field named {fieldName} has been found in the file that does not exist in the database.", Output.WriteTypes.Error);
            return;
        }
    }

    using (var sbc = new SqlBulkCopy(databaseConnectionString))
    {
        sbc.DestinationTableName = "FidessaETP.tableARC_EventsOrderAndFlow_ImportTest";
        sbc.BatchSize = 1000;

        Output.WriteLn("Mapping available Csv Fields to DB Fields");
        foreach (var field in reader.GetFieldHeaders().ToArray())
        {
            sbc.ColumnMappings.Add(field, field.Replace(" ", "_"));
        }

        sbc.WriteToServer(reader);
    }
}
The Error Details
Message:
'FileId' field header not found. Parameter name: name
Source:
LumenWorks.Framework.IO
Stack Trace:
System.ArgumentException: 'FileId' field header not found. Parameter name: name
   at LumenWorks.Framework.IO.Csv.CsvReader.System.Data.IDataRecord.GetOrdinal(String name)
   at System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServerCommon(Int32 columnCount)
   at System.Data.SqlClient.SqlBulkCopy.WriteRowSourceToServerAsync(Int32 columnCount, CancellationToken ctoken)
   at System.Data.SqlClient.SqlBulkCopy.WriteToServer(IDataReader reader)
   at Haitong.Test.CsvImporter.Program.Main(String[] args) in C:\Development\Workspaces\UK OPS Data Warehouse\UK OPS Data Warehouse\Haitong.Test.CsvImporter\Program.cs:line 86

You might be able to solve the problem by loading the CSV data into a DataTable, adding a FileId column with a default value, and passing the DataTable to SqlBulkCopy. Note that your current solution does not load the whole file into memory, so you should monitor memory usage if you try this approach. You might also be able to get your current solution to work by digging through the documentation for the Columns property of the CsvReader; it does not appear to behave the way you are trying to use it.
Here is an example of how you might load the file using a DataTable:
DataTable csvTable = new DataTable();
using (var reader = new StreamReader("path\\to\\file.csv"))
{
    using (var csv = new CsvReader(reader, true))
    {
        csvTable.Load(csv);
    }
}

DataColumn newColumn = new DataColumn("FileId", typeof(System.Int32));
newColumn.DefaultValue = 1;
csvTable.Columns.Add(newColumn);

using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
{
    sbc.WriteToServer(csvTable);
}
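One caveat with this example: when no ColumnMappings are given, SqlBulkCopy maps columns by ordinal, which will misalign as soon as the destination table's column order or names differ from the DataTable's. A sketch of the bulk copy step with name-based mappings, reusing the connection string, table name, and space-to-underscore convention from your code:

using (var sbc = new SqlBulkCopy(databaseConnectionString))
{
    sbc.DestinationTableName = "FidessaETP.tableARC_EventsOrderAndFlow_ImportTest";
    sbc.BatchSize = 1000;

    // Map every DataTable column (including the added FileId) by name,
    // translating spaces to underscores as the destination table expects.
    foreach (DataColumn column in csvTable.Columns)
    {
        sbc.ColumnMappings.Add(column.ColumnName, column.ColumnName.Replace(" ", "_"));
    }

    sbc.WriteToServer(csvTable);
}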

Related

C# ExcelDataReader Error - 'Invalid file signature' for XLSB format

I am receiving an 'Invalid file signature' error when I try to read an xlsb file using the code below.
If I use CreateReader, I receive a 'Detected ZIP file, but not a valid OpenXml file' error instead. I have also tried the other options given below, but nothing works for me.
Can somebody help me read the xlsb file?
Stream stream = new MemoryStream(srcContent);

public static DataSet GetXLSBData(Stream stream)
{
    DataSet dataSet;
    using (var reader = ExcelReaderFactory.CreateBinaryReader(stream))
    {
        dataSet = reader.AsDataSet();
    }
    foreach (DataTable table in dataSet.Tables)
    {
        table.TableName = table.TableName.Trim();
    }
    return dataSet;
}
Other options tried:
var reader = ExcelReaderFactory.CreateOpenXmlReader(stream)
var reader = ExcelReaderFactory.CreateCsvReader(stream)
var reader = ExcelReaderFactory.CreateReader(stream)
My proposal
C# code:
using (XlsxOrXlsbReadOrEdit excelFile = new XlsxOrXlsbReadOrEdit())
{
    excelFile.Open("file.xlsx");
    excelFile.ActualSheetName = "sheet1";
    object[] row = null;
    while (excelFile.Read())
    {
        if (row == null)
        {
            row = new object[excelFile.FieldCount];
        }
        excelFile.GetValues(row);
    }
}
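If, like the original GetXLSBData, you need the result as a DataTable, here is a sketch that builds one using only the members shown above (the generated column names are illustrative):

var table = new DataTable("sheet1");
using (XlsxOrXlsbReadOrEdit excelFile = new XlsxOrXlsbReadOrEdit())
{
    excelFile.Open("file.xlsb");
    excelFile.ActualSheetName = "sheet1";
    object[] row = null;
    while (excelFile.Read())
    {
        if (row == null)
        {
            // Create the columns on the first row read.
            row = new object[excelFile.FieldCount];
            for (int i = 0; i < row.Length; i++)
            {
                table.Columns.Add("Column" + i, typeof(object));
            }
        }
        excelFile.GetValues(row);
        table.Rows.Add(row); // Rows.Add copies the values from the array.
    }
}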
Disclaimer: I am the creator of SpreadSheetTasks.
Links
https://www.nuget.org/packages/SpreadSheetTasks/

Validate a certain column in a csv file data using textfieldparser in c#

I have a button in my webform that uploads a csv file with data, inserts the data in a database, and displays the content of the database in a gridview. I am using a TextFieldParser to read the csv file. However, I cannot seem to figure out how to add a validation in it.
I want to validate the first column (which is the SKU in my database) of the uploaded csv file data. If the data has a duplicate in the database, it will prompt a message that the action cannot be completed. If not, it will continue to insert the data in the database.
This is what the data in the CSV file that will be uploaded looks like.
For reference, This is my code:
protected void AddButton_Click(object sender, EventArgs e)
{
    string path = @"C:\Users\hac9289\Downloads\";

    // Creating object of datatable
    DataTable tblcsv = new DataTable();

    // Creating columns
    tblcsv.Columns.Add("Stock Keeping Unit");
    tblcsv.Columns.Add("Universal Product Code");
    tblcsv.Columns.Add("Vendor Name");
    tblcsv.Columns.Add("Product Name");
    tblcsv.Columns.Add("Product Description");
    tblcsv.Columns.Add("Retail Price");

    // Getting full file path of uploaded file
    string CSVFilePath = Path.GetFullPath(path + AddFile.PostedFile.FileName);

    if (!AddFile.HasFile)
    {
        ScriptManager.RegisterStartupScript(this, typeof(string), "Alert", "alert('File Upload Empty');", true);
    }
    else
    {
        // Parse records in csv file
        using (TextFieldParser parser = new TextFieldParser(CSVFilePath))
        {
            parser.HasFieldsEnclosedInQuotes = true;
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            bool invalid = false;
            while (!parser.EndOfData)
            {
                // Processing row
                tblcsv.Rows.Add();
                int count = 0;
                string[] fields = parser.ReadFields();

                /* I am trying this code for validation, but it doesn't work:
                foreach (DataRow row in tblcsv.Rows)
                {
                    // Check some other column is not equal to some value
                    if (row["StockKeepingUnit"] == fields)
                    {
                        ScriptManager.RegisterStartupScript(this, typeof(string), "Alert", "alert('Action not completed due to duplicate SKU');", true);
                    }
                }*/

                foreach (string field in fields)
                {
                    tblcsv.Rows[tblcsv.Rows.Count - 1][count] = field;
                    count++;
                }
            }
        }
        InsertCSVRecords(tblcsv);
        PopulateGridView();
    }
}

private void InsertCSVRecords(DataTable csvdt)
{
    using (SqlConnection connect = new SqlConnection(connectionString))
    {
        connect.Open();
        // Creating object of SqlBulkCopy
        using (SqlBulkCopy objbulk = new SqlBulkCopy(connect))
        {
            // Assigning destination table name
            objbulk.DestinationTableName = "RetailInfo";

            // Mapping table columns
            objbulk.ColumnMappings.Add(0, "StockKeepingUnit");
            objbulk.ColumnMappings.Add(1, "UniversalProductCode");
            objbulk.ColumnMappings.Add(2, "VendorName");
            objbulk.ColumnMappings.Add(3, "ProductName");
            objbulk.ColumnMappings.Add(4, "ProductDesc");
            objbulk.ColumnMappings.Add(5, "RetailPrice");

            // Inserting DataTable records into the database
            objbulk.WriteToServer(csvdt);
        }
    }
    ScriptManager.RegisterStartupScript(this, typeof(string), "Alert", "alert('CSV Data added');", true);
}
Any ideas? Any help will be appreciated. Thank you.
Like this:
...
foreach (string field in fields)
{
    // This is where your fields populate tblcsv's rows; you can check
    // here whether the field already exists.
    // [your_code]: check that the field is the SKU, then check whether it exists.
    tblcsv.Rows[tblcsv.Rows.Count - 1][count] = field;
    count++;
}
...
You can use the Select method of the DataTable, which returns an array of rows matching the filter criteria.
// Instead of 0 in fields[0], provide the index position of the stock-keeping
// unit in the array. Note the filter uses the DataTable's column name
// ("Stock Keeping Unit", bracketed because of the spaces) and quotes the
// value, since it is a string.
if (tblcsv.Select($"[Stock Keeping Unit] = '{fields[0]}'").Length > 0)
{
    // Record already exists: display a message,
    // skip adding it to the DataTable, and continue.
}
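Putting that together with the question's parsing loop, a minimal sketch (it assumes the SKU is the first CSV field; note this only detects duplicates within the DataTable being built, so catching SKUs already stored in the database would still need a query before the bulk copy):

while (!parser.EndOfData)
{
    string[] fields = parser.ReadFields();

    // Skip this row if a row with the same SKU is already in the DataTable.
    // (Naive quoting: SKUs containing apostrophes would need escaping.)
    if (tblcsv.Select($"[Stock Keeping Unit] = '{fields[0]}'").Length > 0)
    {
        ScriptManager.RegisterStartupScript(this, typeof(string), "Alert",
            "alert('Action not completed due to duplicate SKU');", true);
        continue;
    }

    DataRow row = tblcsv.NewRow();
    for (int i = 0; i < fields.Length; i++)
    {
        row[i] = fields[i];
    }
    tblcsv.Rows.Add(row);
}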

Manipulate an existing CSV file while keeping column order (CsvReader/CsvWriter)

I need to manipulate an existing CSV file via the following actions:
Read from the existing CSV file, then append a new row to it.
I have the following code, which chokes on the third line, as the file is already in use by the code on the first line. I'm not sure how to read it properly otherwise and then append a new row to it.
public bool Save(Customer customer)
{
    using (StreamReader input = File.OpenText("DataStoreOut.csv"))
    using (CsvReader csvReader = new CsvReader(input))
    using (StreamWriter output = File.CreateText("DataStoreOut.csv"))
    using (var csvWriter = new CsvWriter(output))
    {
        IEnumerable<Customer> records = csvReader.GetRecords<Customer>();
        List<Customer> customerList = new List<Customer>();
        customerList.Add(customer);
        csvWriter.WriteHeader<Customer>();
        csvWriter.NextRecord();
        foreach (var array in customerList)
        {
            csvWriter.WriteRecord(records.Append(array));
        }
    }
}
Each row in the CSV file contains a customer.CustomerId (which is unique and read-only). How can I read only the row which has a specific CustomerId and then update any values there?
If you want to append a record to a file, the best way to do it is to read the items, add the new one to the collection, and write everything back.
public static void Append(Customer customer, string file)
{
    List<Customer> records = null;
    using (var reader = new StreamReader(file))
    {
        using (var csv = new CsvReader(reader))
        {
            records = csv.GetRecords<Customer>().ToList();
        }
    }

    records.Add(customer);

    using (var writer = new StreamWriter(file))
    {
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecords(records);
        }
    }
}
As @Dour High Arch mentioned, to be perfectly safe you might want to take the extra step of using a temp file in case something goes wrong.
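For reference, a minimal sketch of that temp-file pattern (the .tmp/.bak naming is illustrative; it assumes the same CsvHelper, System.IO, and System.Linq usings as the code above):

public static void AppendSafely(Customer customer, string file)
{
    // Read the existing records first.
    List<Customer> records;
    using (var reader = new StreamReader(file))
    using (var csv = new CsvReader(reader))
    {
        records = csv.GetRecords<Customer>().ToList();
    }
    records.Add(customer);

    // Write everything to a temp file in the same directory, then swap it in,
    // keeping a backup of the original in case something goes wrong.
    var tempFile = file + ".tmp";
    using (var writer = new StreamWriter(tempFile))
    using (var csv = new CsvWriter(writer))
    {
        csv.WriteRecords(records);
    }
    File.Replace(tempFile, file, file + ".bak");
}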
If you want to update instead of append, you'd have to look up the specified record, and update it if it exists.
public static void Update(Customer customer, string file)
{
    List<Customer> records = null;
    using (var reader = new StreamReader(file))
    {
        using (var csv = new CsvReader(reader))
        {
            records = csv.GetRecords<Customer>().ToList();
        }
    }

    var index = records.FindIndex(x => x.ID == customer.ID);
    if (index >= 0)
    {
        records[index] = customer;
        using (var writer = new StreamWriter(file))
        {
            using (var csv = new CsvWriter(writer))
            {
                csv.WriteRecords(records);
            }
        }
    }
}
Again, writing to a temp file is advisable.
UPDATE
Actually, there's a slightly better way to append if you don't want to rewrite the whole file: when instantiating the StreamWriter, pass append: true and it will write to the end of the file.
The small caveat is that if the EOF marker is not on a new line but right after the last field of the last record, this will append the record to the end of that field, messing up your columns. As a workaround, I've added a writer.WriteLine() before handing the writer to CsvHelper's CsvWriter.
public static void Append2(Customer customer, string file)
{
    using (var writer = new StreamWriter(file, true))
    {
        writer.WriteLine();
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecord(customer);
        }
    }
}
If the file does already end on a new line, though, this will add an empty line. That can be countered by ignoring blank lines when you read the file, as in the sketch below.
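Depending on your CsvHelper version, ignoring blank lines is a configuration switch. A minimal sketch using the same pre-v20-style API as the code above (newer versions take a CsvConfiguration in the constructor instead):

using (var reader = new StreamReader(file))
using (var csv = new CsvReader(reader))
{
    // Blank lines (such as the one Append2 can introduce) are skipped.
    csv.Configuration.IgnoreBlankLines = true;
    List<Customer> records = csv.GetRecords<Customer>().ToList();
}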

Exporting CSV to SQL in C# - how to offset the export by one column

I am using C# to parse a csv file and export to a SQL Server database table. The schema of the database table is almost identical to that of the csv file, with the exception that the table has a Primary Key Identity column as the first column.
The problem: the 2nd column of the database table, which should receive the 1st column of the CSV file, is actually receiving the 2nd column of the CSV file. The code assumes that the first PK identity column of the database table is the first column to be written to from the CSV file. In case this is confusing, assume columns 1, 2, and 3 of the CSV file have headers called Contoso1, Contoso2, and Contoso3, respectively, while the database table's columns 1 through 4 are called Id, Contoso1, Contoso2, and Contoso3. During the export, the Id column correctly gets populated with the identity id, but then the Contoso1 column of the database table gets populated with the Contoso2 column of the CSV file, and that off-by-one continues for all 300 columns.
Here is the code. I'm looking for a way to do a one-column offset with this code. If possible, I'd like to avoid hardcoding a mapping scheme as there are 300+ columns.
using System;
using System.Data.SqlClient;
using System.Data;
using Microsoft.VisualBasic.FileIO;

namespace CSVTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string csv_file_path = @"pathToCsvFile";
            DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
            Console.WriteLine("Rows count:" + csvData.Rows.Count);
            InsertDataIntoSQLServerUsingSQLBulkCopy(csvData);
        }

        private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
        {
            DataTable csvData = new DataTable();
            try
            {
                using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
                {
                    csvReader.SetDelimiters(new string[] { "," });
                    csvReader.HasFieldsEnclosedInQuotes = true;
                    string[] colFields = csvReader.ReadFields();
                    foreach (string column in colFields)
                    {
                        DataColumn datecolumn = new DataColumn(column);
                        datecolumn.AllowDBNull = true;
                        csvData.Columns.Add(datecolumn);
                    }
                    while (!csvReader.EndOfData)
                    {
                        string[] fieldData = csvReader.ReadFields();
                        // Making empty value as null
                        for (int i = 0; i < fieldData.Length; i++)
                        {
                            if (fieldData[i] == "")
                            {
                                fieldData[i] = null;
                            }
                        }
                        csvData.Rows.Add(fieldData);
                    }
                }
            }
            catch (Exception ex)
            {
                return null;
            }
            return csvData;
        }

        static void InsertDataIntoSQLServerUsingSQLBulkCopy(DataTable csvFileData)
        {
            using (SqlConnection dbConnection = new SqlConnection("Data Source=localhost;Initial Catalog=Database_Name;Integrated Security=SSPI;"))
            {
                dbConnection.Open();
                using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
                {
                    s.DestinationTableName = "TableName";
                    //foreach (var column in csvFileData.Columns)
                    //    s.ColumnMappings.Add(column.ToString(), column.ToString());
                    s.WriteToServer(csvFileData);
                }
            }
        }
    }
}
I'm assuming that:
a. only one column needs to be skipped, though this can be modified to skip multiple columns;
b. you know, ahead of time, the zero-based index of the column to skip.
With that out of the way, here are the 3 modifications you need to make.
Add a variable to store the index to skip:
string csv_file_path = @"pathToCsvFile";
//Assuming just one index for the column number to skip - zero based counting.
//Perhaps read from the AppConfig.
int columnIndexToSkip = 0;
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path, columnIndexToSkip);
Modify the function signature to take the extra int parameter
private static DataTable GetDataTabletFromCSVFile(string csv_file_path, int columnIndexToSkip)
{
Add the dummy column at that index:
        csvData.Rows.Add(fieldData);
    }
} // end of the while (!csvReader.EndOfData) loop

if (columnIndexToSkip >= 0)
{
    csvData.Columns.Add("DUMMY").SetOrdinal(columnIndexToSkip);
}
I've not tested the import, but the updated csv file looks good to me.
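Alternatively, since the commented-out mapping loop in the question is already close to working, mapping every CSV column by name sidesteps the ordinal offset entirely: any destination column that is never mapped, such as the Id identity column, is simply left for SQL Server to populate. A sketch, assuming the CSV headers match the table's column names:

static void InsertDataIntoSQLServerUsingSQLBulkCopy(DataTable csvFileData)
{
    using (SqlConnection dbConnection = new SqlConnection("Data Source=localhost;Initial Catalog=Database_Name;Integrated Security=SSPI;"))
    {
        dbConnection.Open();
        using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
        {
            s.DestinationTableName = "TableName";
            // Map by name rather than by ordinal; the Id identity column is
            // never mapped, so SQL Server generates it.
            foreach (DataColumn column in csvFileData.Columns)
            {
                s.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }
            s.WriteToServer(csvFileData);
        }
    }
}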

Defining a table rather than a range as a PivotTable 'cacheSource'

I am building a tool to automate the creation of an Excel workbook that contains a table and an associated PivotTable. The table structure is on one sheet, the data for which will be pulled from a database using another tool at a later point. The PivotTable is on a second sheet using the table from the previous sheet as the source.
I am using EPPlus to facilitate building the tool but am running into problems specifying the cacheSource. I am using the following to create the range and PivotTable:
var dataRange = dataWorksheet.Cells[dataWorksheet.Dimension.Address.ToString()];
var pivotTable = pivotWorksheet.PivotTables.Add(pivotWorksheet.Cells["B3"], dataRange, name);
This sets the cacheSource to:
<x:cacheSource type="worksheet" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <x:worksheetSource ref="A1:X2" sheet="dataWorksheet" />
</x:cacheSource>
or within Excel, the data source is set to:
dataWorksheet!$A$1:$X$2
This works fine if the table size never changes, but as the number of rows will be dynamic, I am finding when the data is refreshed, data is only read from the initial range specified.
What I want to do is programmatically set the cacheSource to:
<x:cacheSource type="worksheet" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <x:worksheetSource name="dataWorksheet" />
</x:cacheSource>
or in Excel, set the data source to:
dataWorksheet
I believe it may be possible to do this by accessing the XML directly (any pointers on this would be most welcome) but is there any way to do this using EPPlus?
It can be done, but it is not the prettiest thing in the world. You can extract the cache definition XML from the EPPlus pivot table object and edit it, but that will wreak havoc with the save logic when you call package.Save() (or GetAsByteArray()), since EPPlus parses the XML on save to generate the final file. This is the result of, as you said, EPPlus not being capable of handling a table as the source.
So your alternative is to save the file with EPPlus normally and then manipulate the content of the xlsx (which is just a renamed zip file) using a .NET ZipArchive. The trick is that you cannot manipulate the files out of order in the zip, otherwise Excel will complain when it opens the file. And since you cannot insert an entry (only add to the end), you have to recreate the zip. Here is an extension method on ZipArchive that will allow you to update the cache source:
public static bool SetCacheSourceToTable(this ZipArchive xlsxZip, FileInfo destinationFileInfo, string tablename, int cacheSourceNumber = 1)
{
    var cacheFound = false;
    var cacheName = String.Format("pivotCacheDefinition{0}.xml", cacheSourceNumber);
    using (var copiedzip = new ZipArchive(destinationFileInfo.Open(FileMode.Create, FileAccess.ReadWrite), ZipArchiveMode.Update))
    {
        //Go through each file in the zip one by one and copy over to the new file - entries need to be in order
        xlsxZip.Entries.ToList().ForEach(entry =>
        {
            var newentry = copiedzip.CreateEntry(entry.FullName);
            var newstream = newentry.Open();
            var orgstream = entry.Open();
            //Copy all other files except the cache def we are after
            if (entry.Name != cacheName)
            {
                orgstream.CopyTo(newstream);
            }
            else
            {
                cacheFound = true;
                //Load the xml document to manipulate
                var xdoc = new XmlDocument();
                xdoc.Load(orgstream);
                //Get reference to the worksheet xml for proper namespace
                var nsm = new XmlNamespaceManager(xdoc.NameTable);
                nsm.AddNamespace("default", xdoc.DocumentElement.NamespaceURI);
                //Get the source
                var worksheetSource = xdoc.SelectSingleNode("/default:pivotCacheDefinition/default:cacheSource/default:worksheetSource", nsm);
                //Clear the attributes
                var att = worksheetSource.Attributes["ref"];
                worksheetSource.Attributes.Remove(att);
                att = worksheetSource.Attributes["sheet"];
                worksheetSource.Attributes.Remove(att);
                //Create the new attribute for table
                att = xdoc.CreateAttribute("name");
                att.Value = tablename;
                worksheetSource.Attributes.Append(att);
                xdoc.Save(newstream);
            }
            orgstream.Close();
            newstream.Flush();
            newstream.Close();
        });
    }
    return cacheFound;
}
And here is how to use it:
//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[]
{
    new DataColumn("Col1", typeof (int)), new DataColumn("Col2", typeof (int)), new DataColumn("Col3", typeof (object))
});
for (var i = 0; i < 10; i++)
{
    var row = datatable.NewRow();
    row[0] = i; row[1] = i * 10; row[2] = Path.GetRandomFileName();
    datatable.Rows.Add(row);
}

const string tablename = "PivotTableSource";
using (var pck = new ExcelPackage())
{
    var workbook = pck.Workbook;
    var source = workbook.Worksheets.Add("source");
    source.Cells.LoadFromDataTable(datatable, true);
    var datacells = source.Cells["A1:C11"];
    source.Tables.Add(datacells, tablename);

    var pivotsheet = workbook.Worksheets.Add("pivot");
    pivotsheet.PivotTables.Add(pivotsheet.Cells["A1"], datacells, "PivotTable1");

    using (var orginalzip = new ZipArchive(new MemoryStream(pck.GetAsByteArray()), ZipArchiveMode.Read))
    {
        var fi = new FileInfo(@"c:\temp\Pivot_From_Table.xlsx");
        if (fi.Exists)
            fi.Delete();

        var result = orginalzip.SetCacheSourceToTable(fi, tablename, 1);
        Console.Write("Cache source was updated: ");
        Console.Write(result);
    }
}
