C# method to insert to Teradata quickly

I'm trying to write a method in C# which when passed a System.Data.DataTable and destination database and table name, inserts the data.
I have this working well for SQL Server destinations, but for Teradata destinations the method I have come up with is incredibly slow by comparison (roughly 70x slower), which is unacceptable for some of the data my users will need to upload.
I'm posting both sets of code below, hoping someone can suggest a better method for the Teradata side.
Teradata (using Teradata.Client.Provider library):
public static void UploadTDData(DataTable dtLogData, string destination)
{
    // Get destination table structure without data
    var query = string.Format("SELECT * FROM {0} where 0 = 1", destination);
    using (TdDataAdapter insertAdapter = new TdDataAdapter(query, myTeradataConnection))
    {
        DataTable dt = new System.Data.DataTable();
        insertAdapter.Fill(dt);

        foreach (DataRow row in dtLogData.Rows)
        {
            DataRow newRow = dt.NewRow();
            newRow.ItemArray = row.ItemArray;
            dt.Rows.Add(newRow);
        }

        TdCommandBuilder builder = new TdCommandBuilder(insertAdapter);
        //No limit on batch size
        insertAdapter.UpdateBatchSize = 0;
        insertAdapter.Update(dt);
    }
}
SQL Server:
public static void UploadData(DataTable dtLogData, string destination)
{
    using (System.Data.SqlClient.SqlBulkCopy sqlBulkCopy = new System.Data.SqlClient.SqlBulkCopy(mySQLConnection))
    {
        sqlBulkCopy.DestinationTableName = destination;
        sqlBulkCopy.WriteToServer(dtLogData);
    }
}
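For what it's worth, before reaching for an external load utility, one thing that may be worth trying is giving the adapter an explicit insert command and a finite batch size, rather than UpdateBatchSize = 0 (which leaves the batching strategy entirely to the provider). The sketch below is untested against Teradata and makes two assumptions: that TdDataAdapter honours DbDataAdapter-style batching, and that, as with SqlDataAdapter, the insert command's UpdatedRowSource needs to be None for batching to kick in. The batch size of 5000 is an arbitrary starting point. If this doesn't get close enough, the usual route for genuinely large loads into Teradata is one of its load utilities (FastLoad / TPT) rather than row-by-row ADO.NET inserts.
public static void UploadTDData(DataTable dtLogData, string destination)
{
    // Sketch only: same structure query as above, but with an explicit
    // insert command and a finite batch size.
    var query = string.Format("SELECT * FROM {0} WHERE 0 = 1", destination);
    using (var insertAdapter = new TdDataAdapter(query, myTeradataConnection))
    using (var builder = new TdCommandBuilder(insertAdapter))
    {
        var dt = new DataTable();
        insertAdapter.Fill(dt);

        foreach (DataRow row in dtLogData.Rows)
        {
            dt.Rows.Add(row.ItemArray);          // rows arrive in the Added state
        }

        insertAdapter.InsertCommand = builder.GetInsertCommand();
        insertAdapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None; // assumed requirement for batching
        insertAdapter.UpdateBatchSize = 5000;    // finite batch instead of "no limit"
        insertAdapter.Update(dt);
    }
}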

Related

Exporting CSV to SQL in C# - how to offset the export by one column

I am using C# to parse a csv file and export to a SQL Server database table. The schema of the database table is almost identical to that of the csv file, with the exception that the table has a Primary Key Identity column as the first column.
The problem: the 2nd column of the database table, which should receive the 1st column of the csv file, is actually receiving the 2nd column of the csv file. The code assumes that the first PK Identity column of the database table is the first column to be written to from the CSV file.
In case this is confusing, assume columns 1, 2, and 3 of the CSV file have headers called Contoso1, Contoso2 and Contoso3, respectively. The database table's columns 1 through 4 are called Id, Contoso1, Contoso2, and Contoso3, respectively. During the export, the Id column correctly gets populated with the identity id, but then the Contoso1 column of the database table gets populated with the Contoso2 column of the CSV file, and that off-by-one continues for all 300 columns.
Here is the code. I'm looking for a way to do a one-column offset with this code. If possible, I'd like to avoid hardcoding a mapping scheme as there are 300+ columns.
using System;
using System.Data.SqlClient;
using System.Data;
using Microsoft.VisualBasic.FileIO;

namespace CSVTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string csv_file_path = @"pathToCsvFile";
            DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
            Console.WriteLine("Rows count:" + csvData.Rows.Count);
            InsertDataIntoSQLServerUsingSQLBulkCopy(csvData);
        }

        private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
        {
            DataTable csvData = new DataTable();
            try
            {
                using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
                {
                    csvReader.SetDelimiters(new string[] { "," });
                    csvReader.HasFieldsEnclosedInQuotes = true;
                    string[] colFields = csvReader.ReadFields();
                    foreach (string column in colFields)
                    {
                        DataColumn datecolumn = new DataColumn(column);
                        datecolumn.AllowDBNull = true;
                        csvData.Columns.Add(datecolumn);
                    }
                    while (!csvReader.EndOfData)
                    {
                        string[] fieldData = csvReader.ReadFields();
                        //Making empty value as null
                        for (int i = 0; i < fieldData.Length; i++)
                        {
                            if (fieldData[i] == "")
                            {
                                fieldData[i] = null;
                            }
                        }
                        csvData.Rows.Add(fieldData);
                    }
                }
            }
            catch (Exception ex)
            {
                return null;
            }
            return csvData;
        }

        static void InsertDataIntoSQLServerUsingSQLBulkCopy(DataTable csvFileData)
        {
            using (SqlConnection dbConnection = new SqlConnection("Data Source=localhost;Initial Catalog=Database_Name;Integrated Security=SSPI;"))
            {
                dbConnection.Open();
                using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
                {
                    s.DestinationTableName = "TableName";
                    //foreach (var column in csvFileData.Columns)
                    //    s.ColumnMappings.Add(column.ToString(), column.ToString());
                    s.WriteToServer(csvFileData);
                }
            }
        }
    }
}
I'm assuming that:
a. only one column needs to be skipped, but this can be modified to skip multiple columns;
b. you know, ahead of time, the zero-based index of the column to skip.
With that out of the way, here are the 3 modifications you need to make.
Add the variable to store the index to skip
string csv_file_path = @"pathToCsvFile";
//Assuming just one index for the column number to skip - zero based counting.
//perhaps read from the AppConfig
int columnIndexToSkip = 0;
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path, columnIndexToSkip);
Modify the function signature to take the extra int parameter
private static DataTable GetDataTabletFromCSVFile(string csv_file_path, int columnIndexToSkip)
{
Add the dummy column at that index
            csvData.Rows.Add(fieldData);
        }
    } // end of while (!csvReader.EndOfData) loop

    if (columnIndexToSkip >= 0)
    {
        csvData.Columns.Add("DUMMY").SetOrdinal(columnIndexToSkip);
    }
I've not tested the import, but the updated csv file looks good to me.
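Separately from the fix above, and only as a sketch: the commented-out lines in the question's InsertDataIntoSQLServerUsingSQLBulkCopy are close to another way around the problem. If you map columns by name, the identity column is simply never mapped and no placeholder column is needed. This assumes the CSV headers match the destination column names exactly, which the question implies:
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
    s.DestinationTableName = "TableName";
    // Map every CSV column to the destination column with the same name;
    // the Id identity column is never mapped, so SQL Server fills it itself.
    foreach (DataColumn column in csvFileData.Columns)
    {
        s.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    }
    s.WriteToServer(csvFileData);
}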

DataSet not saving the data

This is the first time I am using a DataSet. Below is my code:
var transactionSet = new ModelExecutionContext()
{
    TransactionSet = new DataSet()
    {
        Tables =
        {
            new DataTable()
            {
                TableName = "transaction_history"
            }
        }
    }
};

transactionSet.TransactionSet.Tables["transaction_history"].Columns.Add().ColumnName = "retailer_reference_id";

var retailerReferenceIdRow = transactionSet.TransactionSet.Tables["transaction_history"].NewRow();
retailerReferenceIdRow["retailer_reference_id"] = 8;

transactionSet.TransactionSet.AcceptChanges();
I am unit testing a method in a class which uses these datasets, and I am trying to mock them. I thought transactionSet.TransactionSet.AcceptChanges(); would save the changes into the DataSet, but during execution I get context?.TransactionSet?.Tables["transaction_history"]?.Rows.Count = 0.
Is anything incorrect with my code?
After you create the row object, you need to add the row to the table:
transactionSet.TransactionSet.Tables["transaction_history"].Rows.Add(retailerReferenceIdRow);
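For completeness, the corrected sequence looks like this. AcceptChanges is optional here; it only resets the row states, and the row stays in the table either way:
var table = transactionSet.TransactionSet.Tables["transaction_history"];

var retailerReferenceIdRow = table.NewRow();
retailerReferenceIdRow["retailer_reference_id"] = 8;
table.Rows.Add(retailerReferenceIdRow);   // NewRow() only creates a detached row; Add attaches it

transactionSet.TransactionSet.AcceptChanges();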

C# read excel data using oledb and format it in specified format and insert into SQL Server database

I am using ASP.NET and C#. I need to write a program to browse and read an Excel file, parse it into a specified format, and finally insert it into a SQL Server database.
I have used OleDb to read the Excel file and created a DataTable from it. Now I'm having trouble parsing it into the required format. Here is the link to a picture of the Excel input and the expected format to insert into the database:
Input and expected output format
Right now I'm working with simple data; in the future I will need to handle large Excel files (around 3000 columns) that parse into some 250,000 records. Please also advise on performance: is OleDb fine for this, or do I need to use something else?
Here is my sample C# code:
OleDbConnection Econ;
SqlConnection con;
string constr, Query, sqlconn;

protected void Page_Load(object sender, EventArgs e)
{
}

// excel connection
private void ExcelConn(string FilePath)
{
    constr = string.Format(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=""Excel 12.0 Xml;HDR=YES;""", FilePath);
    Econ = new OleDbConnection(constr);
}

// sql connection
private void connection()
{
    sqlconn = ConfigurationManager.ConnectionStrings["SqlCom"].ConnectionString;
    con = new SqlConnection(sqlconn);
}

// read data from excel and creating a datatable
private void ExcelToDataTable(string FilePath)
{
    ExcelConn("C:\\Users\\username\\Desktop\\EmpEx.xlsx");
    Query = string.Format("Select * FROM [Sheet1$]");
    OleDbCommand Ecom = new OleDbCommand(Query, Econ);
    Econ.Open();
    OleDbDataAdapter oda = new OleDbDataAdapter(Ecom);
    DataTable dtExcel = new DataTable();
    Econ.Close();
    oda.Fill(dtExcel);
    // DataTable parseTable = ParseDataTable(dtExcel);
    //connection();

    // printing data table
    foreach (DataRow dataRow in dtExcel.Rows)
    {
        foreach (var item in dataRow.ItemArray)
        {
            Response.Write(item);
        }
    }
    Response.Write("<br> Colums: " + dtExcel.Columns.Count.ToString() + "<br>");
    Response.Write("Rows: " + dtExcel.Rows.Count.ToString() + "<br>");

    //print on screen
    foreach (DataRow row in dtExcel.Rows)
    {
        foreach (DataColumn col in dtExcel.Columns)
        {
            Label1.Text = Label1.Text + row[col].ToString() + "\t";
        }
    }
}

// Method to make data table in specified format
public DataTable ParseDataTable(DataTable dtExcel)
{
    var dt = new DataTable("sourceData");
    dt.Columns.Add(new DataColumn("id", typeof(String)));
    dt.Columns.Add(new DataColumn("name", typeof(String)));
    dt.Columns.Add(new DataColumn("variable", typeof(String)));
    dt.Columns.Add(new DataColumn("year", typeof(String)));
    dt.Columns.Add(new DataColumn("value", typeof(String)));
    // NOT GETTING TO PARSE In specified format
    /**** NEED HELP HERE *****/
    return dt;
}

protected void Button1_Click(object sender, EventArgs e)
{
    string CurrentFilePath = Path.GetFullPath(FileUpload1.PostedFile.FileName);
    ExcelToDataTable(CurrentFilePath);
}
Please help me understand how I can achieve this. How can I parse the input Excel data into the specified format shown in the attached picture (screenshot)? Any suggestions for fixing my problem are welcome.
I solved this problem using the C# OLEDB ACE engine. Currently it supports only around 250 columns, which satisfies my requirement so far.
The solution: I am able to get the sheet name and sheet range through code for the input file. I copy the input file into an OLEDB DataTable (inputtable), and using that DataTable I create another, formatted DataTable which holds values from inputtable based on conditional logic. I use LINQ to query the DataTable in order to generate the formatted result.
on button click:
string rangeStringwithSHeet = sheetName + excelComm.GetRange(sheetName, excelConn);
dataQuery = string.Format("SELECT Institution" + queryIn + " FROM [{0}] ", rangeStringwithSHeet);

// connect to excel with query and get the initial datatable from excel input
dataExcelTable = excelComm.FillDataTableWithQuery(dataQuery, excelConn);
formattedDataTableExcel(dataExcelTable);
The actual conversion logic is in the formattedDataTableExcel() method, which I created for my web application and wrote according to my business logic.
I'm not posting the actual logic here. If anyone has a similar issue, let me know and I can help with the conversion process.
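For anyone looking for a starting point, here is an illustrative sketch (not the poster's withheld logic) of one way to unpivot the OLEDB DataTable into the id/name/variable/year/value shape from ParseDataTable in the question. It assumes column 0 is id, column 1 is name, the remaining column headers are the variable names, and the first data row carries the Fall/year labels, as in the sample data shown in the answers below:
// Illustrative sketch only - assumes dtExcel was read with HDR=YES, so:
//   column 0 = id, column 1 = name, columns 2..N = variable headers,
//   and row 0 holds the Fall/year labels for those variable columns.
public DataTable ParseDataTable(DataTable dtExcel)
{
    var dt = new DataTable("sourceData");
    dt.Columns.Add("id", typeof(String));
    dt.Columns.Add("name", typeof(String));
    dt.Columns.Add("variable", typeof(String));
    dt.Columns.Add("year", typeof(String));
    dt.Columns.Add("value", typeof(String));

    DataRow yearRow = dtExcel.Rows[0];   // "Fall 2000", "Fall 2001", ...

    for (int r = 1; r < dtExcel.Rows.Count; r++)
    {
        DataRow src = dtExcel.Rows[r];
        for (int c = 2; c < dtExcel.Columns.Count; c++)
        {
            dt.Rows.Add(
                src[0].ToString(),                 // id
                src[1].ToString(),                 // name
                dtExcel.Columns[c].ColumnName,     // variable
                yearRow[c].ToString(),             // year
                src[c].ToString());                // value
        }
    }
    return dt;
}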
My recommendation would be to re-think your tool. This would be much easier in a tool like SQL Server Integration Services (SSIS) or other tools whose sole purpose is this.
From the SSIS Wiki article, "SSIS is a platform for data integration and workflow applications."
From the C# Wiki article "C# (pronounced as see sharp) is a multi-paradigm programming language".
I have created a solution for unpivoting the data in F#, which can be found here. Since F# runs on the .NET CLR you could call this from C#, or you could translate it to C# using equivalent LINQ operations.
// Sample Input as a jagged array
let sampleInput =
    [| [| "id"; "name"; "variable1"; "variable1"; "variable2" |]
       [| ""; ""; "Fall 2000"; "Fall 2001"; "Fall 2000" |]
       [| "1"; "abc"; "1400"; "1500"; "1200" |]
       [| "2"; "xyz"; "1200"; "1400"; "1100" |] |]

let variables = sampleInput.[0].[2 ..]
let falls = sampleInput.[1].[2 ..]
let idNameValues = sampleInput.[2 ..] |> Array.map (fun value -> (value.[0], value.[1], value.[2 ..]))

// Output as an array of tuples
let output =
    idNameValues
    |> Array.collect (fun (id, name, values) ->
        Array.zip3 variables falls values // Zip up the variables, falls and values data arrays for each old id, name combination
        |> Array.mapi (fun i (variable, fall, value) -> (i, int id, name, variable, fall, value)) // Flatten out over the row id, old id index and name
    )
    |> Array.sortBy (fun (row, id, _, _, _, _) -> (row, id)) // Sort by row id and old id index
    |> Array.mapi (fun i (_, _, name, variable, fall, value) -> (i + 1, name, variable, fall, int value)) // Add new id index

printfn "SampleInput =\n %A" sampleInput
printfn "Output =\n %A" output
I have actually had a go at translating the F# code to C#. I am sure you could write more idiomatic C# here, and performance probably suffers a bit with the heavy use of LINQ, but it seems to work!
You can see it working in .NET Fiddle here.
using System;
using System.Linq;

public class Program
{
    public static string[][] SampleInput()
    {
        return new string[][]
        {
            new string[] { "id", "name", "variable1", "variable1", "variable2" },
            new string[] { "", "", "Fall 2000", "Fall 2001", "Fall 2000" },
            new string[] { "1", "abc", "1400", "1500", "1200" },
            new string[] { "2", "xyz", "1200", "1400", "1100" }
        };
    }

    public static Tuple<int, string, string, string, int>[] Unpivot(string[][] flattenedInput)
    {
        var variables = (flattenedInput[0]).Skip(2).ToArray();
        var falls = (flattenedInput[1]).Skip(2).ToArray();
        var idNameValues = flattenedInput.Skip(2).Select(idNameValue => Tuple.Create(idNameValue[0], idNameValue[1], idNameValue.Skip(2))).ToArray();
        return
            idNameValues
                .SelectMany(idNameValue => variables
                    .Zip(falls, (variable, fall) => Tuple.Create(variable, fall))
                    .Zip(idNameValue.Item3, (variableFall, val) => Tuple.Create(variableFall.Item1, variableFall.Item2, val))
                    .Select((variableFallVal, i) => Tuple.Create(i + 1, Convert.ToInt32(idNameValue.Item1), idNameValue.Item2, variableFallVal.Item1, variableFallVal.Item2, variableFallVal.Item3))
                )
                .OrderBy(rowId_ => Tuple.Create(rowId_.Item1, rowId_.Item2))
                .Select((_NameVariableFallValue, i) => Tuple.Create(i + 1, _NameVariableFallValue.Item3, _NameVariableFallValue.Item4, _NameVariableFallValue.Item5, Convert.ToInt32(_NameVariableFallValue.Item6)))
                .ToArray();
    }

    public static void Main()
    {
        var flattenedData = SampleInput();
        var normalisedData = Unpivot(SampleInput());

        Console.WriteLine("SampleInput =");
        foreach (var row in SampleInput())
        {
            Console.WriteLine(Tuple.Create(row[0], row[1], row[2], row[3], row[4]).ToString());
        }

        Console.WriteLine("\nOutput =");
        foreach (var row in normalisedData)
        {
            Console.WriteLine(row.ToString());
        }
    }
}
Edit: Below is an example of converting an Excel file, given its file path, to a jagged string array. In this case I have used the NuGet package ExcelDataReader to get the data out of Excel.
using System;
using System.IO;
using System.Data;
using System.Collections.Generic;
using System.Linq;
using Excel; // Install Nuget Package ExcelDataReader

public class Program
{
    public static string[][] ExcelSheetToJaggedArray(string fileName, string sheetName)
    {
        using (var stream = File.Open(fileName, FileMode.Open, FileAccess.Read))
        {
            using (var excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream))
            {
                var data =
                    excelReader.AsDataSet().Tables
                        .Cast<DataTable>()
                        .FirstOrDefault(sheet => sheet.TableName == sheetName);
                return
                    data.Rows
                        .Cast<DataRow>()
                        .Select(row =>
                            row.ItemArray
                                .Select(cell => cell.ToString()).ToArray())
                        .ToArray();
            }
        }
    }

    public static void Main()
    {
        // Sample use of ExcelSheetToJaggedArray function
        var fileName = @"C:\SampleInput.xlsx";
        var jaggedArray = ExcelSheetToJaggedArray(fileName, "Sheet1");
        foreach (var row in jaggedArray)
        {
            foreach (var cell in row)
            {
                Console.Write(cell.ToString() + ",");
            }
            Console.WriteLine();
        }
    }
}

DataAdapter .Update does not update back table

My problem is very common, but I have not found any solution.
This is my code:
public async Task<QueryResult> RollbackQuery(ActionLog action)
{
    var inputParameters = JsonConvert.DeserializeObject<Parameter[]>(action.Values);
    var data = DeserailizeByteArrayToDataSet(action.RollBackData);
    using (var structure = PrepareStructure(action.Query, action.Query.DataBase, inputParameters))
    {
        //_queryPlanner is the implementor for my interface
        return await _queryPlanner.RollbackQuery(structure, data);
    }
}
I need to load a DataTable (from wherever) and write its data back to the database. This is my rollback function. It uses a "CommandStructure" in which I've encapsulated all the SqlClient objects; PrepareStructure initializes all of them:
//_dataLayer is a helper for creating System.Data.SqlClient objects
//ex: _dataLayer.CreateCommand(preSelect) => new SqlCommand(preSelect)
private CommandStructure PrepareStructure(string sql, string preSelect, DataBase db, IEnumerable<Parameter> inputParameters)
{
    var parameters = inputParameters as IList<Parameter> ?? inputParameters.ToList();
    var structure = new CommandStructure(_logger);
    structure.Connection = _dataLayer.ConnectToDatabase(db);
    structure.SqlCommand = _dataLayer.CreateCommand(sql);
    structure.PreSelectCommand = _dataLayer.CreateCommand(preSelect);
    structure.QueryParameters = _dataLayer.CreateParemeters(parameters);
    structure.WhereParameters = _dataLayer.CreateParemeters(parameters.Where(p => p.IsWhereClause.HasValue && p.IsWhereClause.Value));
    structure.CommandBuilder = _dataLayer.CreateCommandBuilder();
    structure.DataAdapter = new SqlDataAdapter();
    return structure;
}
So, my function uses SqlCommandBuilder and a DataAdapter to operate on the database.
PreSelectCommand is something like "Select * from Purchase where CustomerId = @id".
The table Purchase has one primary key, on the ID field.
public virtual async Task<QueryResult> RollbackQuery(CommandStructure cmd, DataTable oldData)
{
    await cmd.OpenConnectionAsync();
    int record = 0;
    using (var cmdPre = cmd.PreSelectCommand as SqlCommand)
    using (var dataAdapt = new SqlDataAdapter(cmdPre))
    using (var cmdBuilder = new SqlCommandBuilder(dataAdapt))
    {
        dataAdapt.UpdateCommand = cmdBuilder.GetUpdateCommand();
        dataAdapt.DeleteCommand = cmdBuilder.GetDeleteCommand();
        dataAdapt.InsertCommand = cmdBuilder.GetInsertCommand();
        using (var tbl = new DataTable(oldData.TableName))
        {
            dataAdapt.Fill(tbl);
            dataAdapt.FillSchema(tbl, SchemaType.Source);
            tbl.Merge(oldData);
            foreach (DataRow row in tbl.Rows)
            {
                row.SetModified();
            }
            record = dataAdapt.Update(tbl);
        }
    }
    return new QueryResult
    {
        RecordAffected = record
    };
}
I execute the code and get no errors, but the data is not updated.
The variable "record" contains the right number of modified (??) records, but nothing changes in the table.
Can someone help me?
EDIT 1:
With SQL Profiler I saw that no query is executed on the DB, only the SELECT issued by the .Fill(tbl) call.
EDIT 2:
Now I have made one change:
tbl.Merge(oldData) => tbl.Merge(oldData, true)
With this I see the expected query being executed, but with the parameters reversed:
UPDATE Purchase SET price=123 where id=6 and price=22
instead of
UPDATE Purchase SET price=22 where id=6 and price=123
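One thing worth trying (a sketch, not a verified fix): instead of Merge plus SetModified, edit the freshly filled rows so that each row's Original values remain the current database values and its Current values become the data to restore. The command builder then generates UPDATE ... SET <restored values> WHERE <current DB values>, which is the direction wanted above. The primary key column is assumed to be id here, as in the example query.
// Sketch only: restore oldData by editing the rows the adapter just filled.
dataAdapt.FillSchema(tbl, SchemaType.Source);   // sets tbl.PrimaryKey before filling
dataAdapt.Fill(tbl);

foreach (DataRow oldRow in oldData.Rows)
{
    // Locate the live row by primary key ("id" assumed here).
    DataRow dbRow = tbl.Rows.Find(oldRow["id"]);
    if (dbRow == null) continue;

    foreach (DataColumn col in tbl.Columns)
    {
        if (!col.ReadOnly && col.ColumnName != "id" && oldData.Columns.Contains(col.ColumnName))
        {
            dbRow[col.ColumnName] = oldRow[col.ColumnName];   // marks the row Modified
        }
    }
}

record = dataAdapt.Update(tbl);   // UPDATE ... SET restored values WHERE original DB values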

Reading NUnit test data from Excel - how to access testdata?

I'm trying to use Excel data reader introduced here http://fabiouechi.blogspot.fi/2010/07/excel-data-driven-tests-with-nunit.html to read data for my NUnit tests.
My test data has several columns - like status, running, pressure, p_prev, temperature - and over 200 rows in the Excel file.
I'm using the following code to read test cases.
public static IEnumerable<TestCaseData> TestCaseData_T3003
{
    get
    {
        var testcases = ExcelTestCaseDataReader.New()
            .FromFileSystem(@"C:\Tests\Test data.xlsx")
            .AddSheet("T3003")
            .GetTestCases(delegate(string sheet, DataRow row, int rowNum)
            {
                var testName = sheet + rowNum;
                //var category = Convert.ToString(row["col1"]);
                IDictionary testDataArgs = new Hashtable();
                var testData = new TestCaseData(testDataArgs).SetName(testName);
                return testData;
            });

        foreach (TestCaseData testCaseData in testcases)
        {
            yield return testCaseData;
        }
    }
}
public List<TestCaseData> GetTestCases(Func<string, DataRow, int, TestCaseData> testCaseDataCreator)
{
    var testDataList = new List<TestCaseData>();
    IExcelDataReader excelReader = GetExcelReader(ExcelFile);
    excelReader.IsFirstRowAsColumnNames = true;
    DataSet result = excelReader.AsDataSet();
    foreach (var sheet in Sheets)
    {
        var sheetTable = result.Tables[sheet];
        var i = 0;
        foreach (DataRow dr in sheetTable.Rows)
        {
            testDataList.Add(testCaseDataCreator(sheet, dr, i));
            i = i + 1;
        }
    }
    excelReader.Close();
    return testDataList;
}
And the actual test, which uses the data from Excel, is still very raw:
[Test]
[TestCaseSource("TestCaseData_T3003")]
public void T3003_Excel(IDictionary testData)
{
    //do the assertions here
}
The question is, how do I access the test data in my test procedure? How do I refer to the value in a column "status" or "pressure"?
NUnit finds all the rows in my test data, because it runs the test 214 times.
But when I debug my code and break in T3003_Excel, the property testData.Count is zero, and so is the number of keys in the hashtable: testData.Keys.Count is 0.
Any suggestions or help?
You're just adding an empty Hashtable to the test case data; you need to actually put something in it. Your delegate should be something like this:
...
.GetTestCases(delegate(string sheet, DataRow row, int rowNum)
{
    var testDataArgs = new Dictionary<string, object>();
    foreach (DataColumn column in row.Table.Columns)
    {
        testDataArgs[column.ColumnName] = row[column];
    }
    var testName = sheet + rowNum;
    return new TestCaseData(testDataArgs).SetName(testName);
});
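With the dictionary populated, the test can read values by column name. A small sketch of what the test body might look like; the column names "status" and "pressure" come from the question, and the numeric conversion and assertion are assumptions for illustration:
[Test]
[TestCaseSource("TestCaseData_T3003")]
public void T3003_Excel(IDictionary testData)
{
    // Pull values out by the Excel column headers.
    var status = Convert.ToString(testData["status"]);
    var pressure = Convert.ToDouble(testData["pressure"]);

    // do the assertions here, e.g.
    Assert.That(pressure, Is.GreaterThanOrEqualTo(0));
}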
