I am trying to populate a DataTable by reading values from a CSV file. The format of the DataTable should match a corresponding table in a database.
The CSV file has many columns (~80), so I don't want to type everything out. The names of the columns in the CSV file don't match the names of the columns in the db exactly. Also, two additional columns, with data not present in the CSV, have to be added manually.
The problem is converting the string data from the CSV file to the correct types in the DataTable.
Currently I have:
1. I read the table template from the database and use this to create my new DataTable.
2. I create a map that maps the column positions from the CSV file to the column positions in the database.
3. I try to insert the value from the CSV file into the DataTable. This is where my code fails, because the data is of the incorrect type. As stated above, since there are so many different columns, I don't want to do the conversion manually but rather infer the type from the table template. Also, some columns can contain null values.
My code
public static DataTable ReadAssets(string strFilePath, DateTime reportingDate, Enums.ReportingBases reportingBasis, char sep=',')
{
//Reads the table template from the database
DataTable dt = DbInterface.Db.GetTableTemplate("reports.Assets");
var dbColumnNames = dt.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToList();
//These columns are not present in the csv data and so they have to be added manually
int posReportingDate = dbColumnNames.IndexOf("ReportingDate");
int posReportingBasis = dbColumnNames.IndexOf("ReportingBasis");
//read the csv and populate the table
using (StreamReader sr = new (strFilePath))
{
string[] csvColumnNames = sr.ReadLine().Split(sep);
//creates an <int, int> dictionary that maps the columns
var columnMap = CreateColumnMap(dbColumnNames.ToArray(), csvColumnNames);
while (!sr.EndOfStream)
{
string[] csvRow = sr.ReadLine().Split(sep);
DataRow dr = dt.NewRow();
dr[posReportingDate] = reportingDate;
dr[posReportingBasis] = reportingBasis.ToString();
foreach(var posPair in columnMap)
{
//This is where the code fails.... I need a conversion to the correct type here.
dr[posPair.Value] = csvRow[posPair.Key];
}
dt.Rows.Add(dr);
}
}
return dt;
}
I maintain a couple of libraries that can help with this scenario: Sylvan.Data and Sylvan.Data.Csv. They are both open source, MIT licensed, and available on nuget.org. My library allows applying a schema to CSV data and attaching extra columns, which makes it possible to use SqlBulkCopy to efficiently load the data directly into the database. My CSV parser also happens to be the fastest in the .NET ecosystem.
As an example, given the following target SQL table:
create table MyTable (
Name varchar(32),
Value int,
ValueDate datetime,
InsertDate datetime,
RowNum int
)
A CSV file, data.csv, containing the following:
a,b,c
a,1,2022-01-01
b,2,2022-01-02
Here is a complete .NET 6 sample program that will bulk copy CSV data along with "extra" columns into a database table.
using Sylvan.Data; // v0.1.1
using Sylvan.Data.Csv; // v1.1.11
using System.Data.SqlClient;
const string SourceCsvFile = "data.csv";
const string TargetTableName = "MyTable";
var conn = new SqlConnection();
conn.ConnectionString = new SqlConnectionStringBuilder
{
DataSource = ".",
InitialCatalog = "Test",
IntegratedSecurity = true
}.ConnectionString;
conn.Open();
// read schema for the target table
var cmd = conn.CreateCommand();
cmd.CommandText = $"select top 0 * from {TargetTableName}";
var reader = cmd.ExecuteReader();
var schema = reader.GetColumnSchema();
reader.Close();
// apply the database schema to the CSV data
var opts = new CsvDataReaderOptions { Schema = new CsvSchema(schema) };
var csvReader = CsvDataReader.Create(SourceCsvFile, opts);
// attach additional external columns to the CSV data
var data = csvReader.WithColumns(
new CustomDataColumn<DateTime>("InsertDate", r => DateTime.UtcNow),
new CustomDataColumn<int>("RowNum", r => csvReader.RowNumber)
);
// bulk copy the data into the target table
var bc = new SqlBulkCopy(conn);
bc.DestinationTableName = TargetTableName;
bc.WriteToServer(data);
Hopefully you find this to be an elegant solution.
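If you would rather keep the hand-rolled DataTable approach from the question, the conversion the question asks about can be driven by each template column's DataType. Below is a minimal sketch under stated assumptions: CsvValueConverter is a hypothetical helper, and treating empty fields as NULL is an assumption about the data.

using System;
using System.Data;
using System.Globalization;

static class CsvValueConverter
{
    // Converts one raw CSV field to the type of the target template column.
    // Empty fields become DBNull; adjust this rule to match your data.
    public static object ConvertField(string raw, DataColumn column)
    {
        if (string.IsNullOrWhiteSpace(raw))
            return DBNull.Value;
        return Convert.ChangeType(raw, column.DataType, CultureInfo.InvariantCulture);
    }
}

The failing line in the question then becomes dr[posPair.Value] = CsvValueConverter.ConvertField(csvRow[posPair.Key], dt.Columns[posPair.Value]);. Convert.ChangeType covers the common SQL-mapped types (int, decimal, DateTime, bool, string) as long as the strings parse under the invariant culture.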
I need to merge data from a .csv file with the data of multiple SQL database tables.
What I get is a .csv file with approximately 2000 rows. Those rows contain data for two tables: a customer table and contract info for each customer (1:n relation).
The data is then compared: new data is imported and existing data is updated; this applies to both tables, the customers and the contracts.
On the other hand, I need a dynamic, rather simple mapping: csv header "X" = table column "Y". The important bit here is that it has to be dynamic, which essentially means I'd be able to, at any point in time, add more columns to my .csv file, create an entry in my mapping file, and it would still fetch this new entry. Vice versa, of course: if I change or remove a mapped column, the same behavior applies.
What I have done:
I parse the .csv file with GenericParser into a DataTable with column headers. I created an ini file which essentially just maps the headers from that DataTable to the table on my SQL server. I then replace the DataTable column names with the column header names from my SQL table, which I, again, fetch from my mapping.
My Problem:
What would be the best way to make sure my mapping knows which SQL table each column is part of, and how would that translate into my SQL statements for the merging process?
I could also imagine that this is completely the wrong way to do this. If so, I am open to suggestions.
Main:
static void Main(string[] args)
{
// Init Config
_mainConfig = new Configuration.Main(@"C:\temp\kosy\mainconfig.cfg");
// get Import Files
String csvFileKunden = _mainConfig.csvFileKunden;
String csvFileVerträge = _mainConfig.csvFileVerträge;
// Parse to Datatables
var dataTableKunden = getDataTableKosy(csvFileKunden);
var dataTableVerträge = getDataTableKosy(csvFileVerträge);
// Init Mapping
var mappingConfig = new Config(_mainConfig.mappingFile);
var sk = "mappingKunden";
var sv = "mappingVerträge";
// Replace ColumnNames with Mapped Names
foreach (DataColumn column in dataTableKunden.Columns) {
dataTableKunden.Columns[column.ColumnName].ColumnName = mappingConfig.Read(column.ColumnName, sk);
}
foreach (DataColumn column1 in dataTableVerträge.Columns)
{
dataTableVerträge.Columns[column1.ColumnName].ColumnName = mappingConfig.Read(column1.ColumnName, sv);
}
// Replace TableNames with Mapped Names
dataTableKunden.TableName = mappingConfig.Read("Kunden", "tablemapping");
dataTableVerträge.TableName = mappingConfig.Read("Verträge", "tablemapping");
Console.ReadKey();
}
DataTable Parsing:
private static DataTable getDataTableKosy(string csvFile) {
DataTable retVal;
using (var parser = new GenericParserAdapter(csvFile)) {
parser.ColumnDelimiter = ';';
parser.FirstRowHasHeader = true;
retVal = parser.GetDataTable();
}
return retVal;
}
Test data that I work with:
Link to file sharing (Dracoon)
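One possible approach to the merging process: bulk copy each mapped DataTable into a staging table, then generate a MERGE statement from the same mapping, so a column added to the .csv and the mapping file is picked up automatically. The sketch below assumes hypothetical staging-table and key-column names, and that the DataTable columns have already been renamed to the SQL names as in the Main method above.

using System.Data;
using System.Linq;

static class MergeBuilder
{
    // Builds a MERGE from the mapped DataTable; every mapped column except
    // the key participates in the update, all columns in the insert.
    public static string BuildMergeSql(DataTable table, string stagingTable, string keyColumn)
    {
        var cols = table.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToList();
        string updateSet = string.Join(", ",
            cols.Where(c => c != keyColumn).Select(c => $"t.[{c}] = s.[{c}]"));
        string insertCols = string.Join(", ", cols.Select(c => $"[{c}]"));
        string insertVals = string.Join(", ", cols.Select(c => $"s.[{c}]"));
        return $@"merge into [{table.TableName}] as t
using [{stagingTable}] as s on t.[{keyColumn}] = s.[{keyColumn}]
when matched then update set {updateSet}
when not matched then insert ({insertCols}) values ({insertVals});";
    }
}

Because each mapping section (mappingKunden, mappingVerträge) is tied to one table via the tablemapping section, the MERGE for each table only ever sees its own columns, which answers the which-table-does-this-column-belong-to question.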
We have a few columns in our tables that do not have user-friendly names, so we rename them in the actual GUI.
For example, a column in the database is labeled "IntAmt", but the user doesn't have a clue what that means, so in the program we call this column "Interest Amount".
The problem is that if I use direct column mapping as shown below, it will error because these columns do not match. There is no column in the database called "InterestAmount".
So is there a way that I can reference the correct column name in mapping with annotations or something? We are using Entity Framework as well.
var connection = DbContext.Database.Connection.ConnectionString;
using (SqlConnection sqc = new SqlConnection(connection))
{
sqc.Open();
using (SqlBulkCopy bcp = new SqlBulkCopy(sqc))
{
bcp.DestinationTableName = strTargetTable;
sourceData.Columns.Cast<DataColumn>().ToList().ForEach(x =>
bcp.ColumnMappings
.Add(new SqlBulkCopyColumnMapping(x.ColumnName, x.ColumnName)));
bcp.BatchSize = 50000;
bcp.BulkCopyTimeout = 12000;
bcp.WriteToServer(sourceData);
}
sqc.Close();
}
After further investigation: we have column mappings stored in a table, which looks like this:
TableName: ColumnMappingTable
[ColumnName] [GuiColumnName]
IntAmt InterestAmount
PrinAmt PrincipalAmount
So, that being said, is there a way that I can map this dynamically?
We have a lot of tables, and hardcoding each mapping individually in code would take a lot of time.
In the class that defines the db object, add an IEnumerable of SqlBulkCopyColumnMapping:
public static IEnumerable<SqlBulkCopyColumnMapping> GetObjectNameColumnMappings()
{
return new[]
{
new SqlBulkCopyColumnMapping(nameof(propertyName), "dbColumnName"),
new SqlBulkCopyColumnMapping(nameof(propertyName2), "dbColumnName2")
};
}
then call this method and add its mappings to the bulk copy's ColumnMappings (the property is read-only, so it cannot be assigned directly):
using (SqlBulkCopy bcp = new SqlBulkCopy(sqc))
{
    bcp.DestinationTableName = strTargetTable;
    // SqlBulkCopy.ColumnMappings is read-only; add each mapping to it
    foreach (var mapping in classInstance.GetObjectNameColumnMappings())
        bcp.ColumnMappings.Add(mapping);
    bcp.BatchSize = 50000;
    bcp.BulkCopyTimeout = 12000;
    bcp.WriteToServer(sourceData);
}
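If the mappings live in a table like ColumnMappingTable above, they can also be loaded at runtime instead of being hardcoded per class. A sketch, assuming the two-column layout shown in the question:

using System.Data.SqlClient;

static class DynamicMappings
{
    // Reads GUI-name -> database-name pairs from ColumnMappingTable and
    // applies them to the bulk copy (sketch; assumes one shared mapping table).
    public static void AddMappingsFromDb(SqlBulkCopy bcp, SqlConnection sqc)
    {
        using (var cmd = new SqlCommand(
            "select ColumnName, GuiColumnName from ColumnMappingTable", sqc))
        using (var rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
            {
                // source (GUI/DataTable) column -> destination (database) column
                bcp.ColumnMappings.Add(rdr.GetString(1), rdr.GetString(0));
            }
        }
    }
}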
Try using TableMappings.Add()
I am using this code to insert a csv file to my database:
private void InsertDataIntoSQLServerUsingSQLBulkCopy(DataTable csvFileData)
{
using (SqlConnection dbConnection = new SqlConnection(ConnectionString))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = "tablename";
foreach (var column in csvFileData.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(csvFileData);
}
}
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
    }
    catch (Exception)
    {
        // handle or log parse failures as appropriate; rethrow for now
        throw;
    }
    return csvData;
}
My database has temporary data in it. To test this code I copied fields with headers from SQL Server and pasted them into an Excel file. I then saved the Excel file as CSV and ran the code. It adds the data from the CSV to the database perfectly!
I then tried running a CSV file with similar values to my original CSV file, and it's giving me a 'String to DateTime' exception. So I know something is up with the dates, and I know that the Excel columns have the type 'Date'.
I'm really scratching my head with this one. Is there a good way to parse columns with dates?
I'm noticing a few issues with your code that could cause you some trouble.
There is no schema validation of the CSV file. You simply take any given CSV file and attempt to write it to the server using whatever column headers it has.
When you create a DataColumn instance, the default column type will be System.String. This is probably causing your date issues.
I don't see any transformation of the data in the CSV file. If one of the fields in your database table is a datetime and you are attempting to bulk insert a System.String column you are going to run into issues.
My suggestions would be the following:
Perform schema validation on the CSV file so you know you are getting the input you expect. This is two-fold: ensure the data is in the expected format and ensure the expected column headers exist.
For the table that you bulk insert, create column types that are appropriate for your SQL tables. Use the overload of the DataColumn constructor where you specify the column data type: new DataColumn("Name", typeof(DateTime))
Take the data you Extracted from the CSV (all as strings) and Transform it into the required format, then Load it.
The operation you are doing is a very basic ETL. It appears you have the Extract and Load portion working, the thing you are missing is the Transform component.
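Concretely, the missing Transform step could look something like the sketch below; the column names and the decision to map empty strings to NULL are hypothetical, so substitute your real schema.

using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

static class CsvTransform
{
    // Builds a typed DataTable so SqlBulkCopy sends real DateTime values
    // instead of strings (sketch with hypothetical column names).
    public static DataTable TransformRows(IEnumerable<string[]> rawRows)
    {
        var table = new DataTable();
        table.Columns.Add(new DataColumn("Name", typeof(string)));
        table.Columns.Add(new DataColumn("CreatedDate", typeof(DateTime)) { AllowDBNull = true });

        foreach (var fields in rawRows)
        {
            var row = table.NewRow();
            row["Name"] = fields[0];
            // Transform: parse the raw string, mapping empty to NULL.
            row["CreatedDate"] = string.IsNullOrEmpty(fields[1])
                ? (object)DBNull.Value
                : DateTime.Parse(fields[1], CultureInfo.InvariantCulture);
            table.Rows.Add(row);
        }
        return table;
    }
}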
I am trying to use SqlBulkCopy with a DataTable to insert data into the database.
I have this code
string mydemo="my demo record";
DataTable prodSalesData = new DataTable("ProductSalesData");
// Create Column 1: SaleDate
DataColumn dateColumn = new DataColumn();
dateColumn.DataType = Type.GetType("System.String");
dateColumn.ColumnName = "SaleDate";
prodSalesData.Columns.Add(dateColumn);
DataRow dailyProductSalesRow = prodSalesData.NewRow();
dailyProductSalesRow["SaleDate"] = mydemo;
// Create DbDataReader to Data Worksheet
using (OleDbDataReader dr1 = command1.ExecuteReader())
{
// Bulk Copy to SQL Server
using (SqlBulkCopy bulkCopy1 = new SqlBulkCopy(con))
{
bulkCopy1.DestinationTableName = "activity1";
bulkCopy1.ColumnMappings.Add(0, "id");
bulkCopy1.ColumnMappings.Add(1,"name");
bulkCopy1.ColumnMappings.Add(2, "activity1first");
bulkCopy1.ColumnMappings.Add(3, "activity1second");
bulkCopy1.ColumnMappings.Add(4, prodSalesData.Columns.ToString());
bulkCopy1.WriteToServer(dr1);
}
}
Here the first 4 records come from an Excel file, but I want to insert external data into the same table as well. I tried this much, but it gives me this error:
The given ColumnMapping does not match up with any column in the source or destination.
Any help?
Thanks
The error is triggered because the fifth mapping refers to a source column (position 4) that the source doesn't have, and prodSalesData.Columns.ToString() returns the collection's type name, not a column name. The ColumnMappings should only pair columns that actually exist in the source with columns that exist in the destination, i.e.:
bulkCopy1.ColumnMappings.Add("SaleDate", "targetColumnName"); // "targetColumnName" = the matching column in activity1
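Note also that WriteToServer(dr1) sends the Excel reader, not prodSalesData, so a fifth mapping has no source to draw from. A corrected version that only maps the four columns the reader actually contains might look like this sketch; the extra mydemo value would have to be applied separately, for example with an UPDATE after the bulk copy:

using (SqlBulkCopy bulkCopy1 = new SqlBulkCopy(con))
{
    bulkCopy1.DestinationTableName = "activity1";
    // Map only the columns that exist in the Excel source reader (dr1).
    bulkCopy1.ColumnMappings.Add(0, "id");
    bulkCopy1.ColumnMappings.Add(1, "name");
    bulkCopy1.ColumnMappings.Add(2, "activity1first");
    bulkCopy1.ColumnMappings.Add(3, "activity1second");
    bulkCopy1.WriteToServer(dr1);
}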
I am attempting to populate a DataGridView with data from a CSV or comma-delimited TXT file the user selects. The CSV gets loaded into the DataGridView, but in a certain column which contains a mixture of alpha and numeric values, if the first several values are numeric and the data then switches to alpha characters, they get dropped. See below:
Here I've imported a CSV with a mix of alpha and numeric values in the cover column. The cells that should contain the alpha values are instead null.
Here I've imported a CSV with only null (the first value is supposed to be null) or alpha values. It has no issues.
It seems like there is some sort of data type guessing going on, where the driver decides the data should be numeric and nullifies anything else.
Here's the code I'm using to import the CSV:
string conStr = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + Path.GetDirectoryName(loadPath) + ";Extensions=csv,txt";
OdbcConnection conn = new OdbcConnection(conStr);
OdbcDataAdapter da = new OdbcDataAdapter("Select * from [" + Path.GetFileName(loadPath) + "]", conn);
DataTable dt = new DataTable(loadPath);
da.Fill(dt);
csvTable.DataSource = dt;
Any help is appreciated.
Have you considered using a generic CSV parser that will use the structure of the data to create a DataTable?
This is an easy-to-use, generic parser, great for flat files like CSV files.
EDIT:
To expand upon Jim's comment, here is an example of using the TextFieldParser in C#. This only handles your first 3 fields, but it should be enough of an example to see how it works.
String myFilePath = @"c:\test.csv";
DataTable dt = new DataTable();
dt.Columns.Add("HAB_CODE");
dt.Columns.Add("SIZE");
dt.Columns.Add("COVER");
using (var myCsvFile = new TextFieldParser(myFilePath)){
myCsvFile.TextFieldType = FieldType.Delimited;
myCsvFile.SetDelimiters(",");
myCsvFile.CommentTokens = new[] { "HEADER", "COMMENT", "TRAILER" };
while (!myCsvFile.EndOfData) {
string[] fieldArray;
try {
fieldArray = myCsvFile.ReadFields();
dt.Rows.Add(fieldArray);
}
catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex) {
// not a valid delimited line - log, terminate, or ignore
continue;
}
// process values in fieldArray
}
}
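Alternatively, if you want to keep the ODBC text driver approach from the question, its type guessing can be pinned down with a schema.ini file placed in the same directory as the CSV. A sketch for the three fields above (file name and column names assumed from the example):

[test.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=HAB_CODE Text
Col2=SIZE Text
Col3=COVER Text

Declaring COVER as Text stops the driver from sampling the first rows, guessing a numeric type, and nulling the alphabetic values.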