I'm building a .NET application that talks to an Oracle 11g database. I am trying to take data from Excel files provided by a third party and upsert (UPDATE record if exists, INSERT if not), but am having some trouble with performance.
These Excel files are to replace tariff codes and descriptions, so there are a couple thousand records in each file.
| Tariff | Description |
|--------|-------------|
| 1234567890 | 'Sample description here' |
I did some research on bulk inserting, and even wrote a function that opens a transaction in the application, executes a bunch of UPDATE or INSERT statements, then commits. Unfortunately, that takes a long time and prolongs the session between the application and the database.
public void UpsertMultipleRecords(string[] updates, string[] inserts) {
    OleDbConnection conn = new OleDbConnection("connection string here");
    conn.Open();
    OleDbTransaction trans = conn.BeginTransaction();
    try {
        for (int i = 0; i < updates.Length; i++) {
            OleDbCommand cmd = new OleDbCommand(updates[i], conn);
            cmd.Transaction = trans;
            int count = cmd.ExecuteNonQuery();
            if (count < 1) {
                // No row matched the UPDATE, so INSERT instead.
                cmd = new OleDbCommand(inserts[i], conn);
                cmd.Transaction = trans;
                cmd.ExecuteNonQuery();
            }
        }
        trans.Commit();
    } catch (OleDbException) {
        trans.Rollback();
    } finally {
        conn.Close();
    }
}
I found via Ask Tom that an efficient way of doing something like this is to use an Oracle MERGE statement, introduced in 9i. From what I understand, this only works with two existing tables in Oracle. I've looked into temporary tables but don't really understand them, or whether they're an option here. If I create a new table that just holds my data for the MERGE, I still need a solid way of bulk inserting into it.
The way I usually upload files to merge is by first inserting into a load table with SQL*Loader and then executing a merge statement from the load table into the target table.
A temporary table will only retain its contents for the duration of the session. I expect SQL*Loader to end the session upon completion, so it's better to use a normal table that you truncate after the merge.
merge into target_table t
using load_table l on (t.key = l.key) -- brackets are mandatory
when matched then update
set t.col = l.col
, t.col2 = l.col2
, t.col3 = l.col3
when not matched then insert
(t.key, t.col, t.col2, t.col3)
values
(l.key, l.col, l.col2, l.col3)
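If SQL*Loader is not an option, the load table can also be filled straight from the .NET application. A minimal sketch using ODP.NET array binding (the tariff_load table name and the availability of ODP.NET are my assumptions), which sends all rows in a single round trip:
using Oracle.ManagedDataAccess.Client; // or Oracle.DataAccess.Client

public void FillLoadTable(string connString, string[] tariffs, string[] descriptions)
{
    using (OracleConnection conn = new OracleConnection(connString))
    using (OracleCommand cmd = conn.CreateCommand())
    {
        conn.Open();
        cmd.CommandText = "insert into tariff_load (tariff, description) values (:t, :d)";
        // Bind whole arrays; the provider repeats the statement once per element
        // without a round trip per row.
        cmd.ArrayBindCount = tariffs.Length;
        cmd.Parameters.Add(new OracleParameter("t", OracleDbType.Varchar2) { Value = tariffs });
        cmd.Parameters.Add(new OracleParameter("d", OracleDbType.Varchar2) { Value = descriptions });
        cmd.ExecuteNonQuery();
    }
}
After that, the MERGE above runs entirely inside the database and the load table can be truncated.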
I'm currently working on translating a whole table and putting the result into another table with the same schema.
Given:
The table has more than a thousand data rows, so it is quite hard to translate all of them in one transaction.
I also need to know the column datatypes, since not all of the columns are translatable.
Plan:
My initial plan is to get the rows in batches (e.g. the top 10 first) and put them into a DataTable, because a DataTable has a column list that holds each column's datatype. This plan, I think, is just OK.
Drawback:
Putting the data into a DataTable would, I know, be slow. I wouldn't be able to hide that even with batching, only mitigate it a little.
On the other hand, if I put the data into a list instead of a DataTable, the transaction would be faster, but this would require another SqlCommand call to get the table's datatype schema.
Question:
Is there a way I could get the best of both worlds: faster, and a single call that returns data values and datatypes together? Note that in this case, aside from the row data, I just need the datatype of each column.
One technique might be to use BulkCopy. Simply read the schema off the first table. Create the target table, define column mappings and do the bulk copy. I have seen this rip through hundreds of thousands of records in seconds.
string connectionString = GetConnectionString();
// Open a sourceConnection to the AdventureWorks database.
using (SqlConnection sourceConnection =
new SqlConnection(connectionString))
{
sourceConnection.Open();
// Perform initial schema read and create target table
// Get data from the source table as a SqlDataReader.
SqlCommand commandSourceData = new SqlCommand(
"SELECT ProductID, Name, " +
"ProductNumber " +
"FROM Production.Product;", sourceConnection);
SqlDataReader reader =
commandSourceData.ExecuteReader();
// Open the destination connection.
using (SqlConnection destinationConnection =
new SqlConnection(connectionString))
{
destinationConnection.Open();
// Set up the bulk copy object.
using (SqlBulkCopy bulkCopy =
new SqlBulkCopy(destinationConnection))
{
bulkCopy.DestinationTableName =
"dbo.BulkCopyDemoMatchingColumns";
bulkCopy.ColumnMappings.Add("SourceColumn1", "TargetColumn1");
bulkCopy.ColumnMappings.Add("SourceColumn2", "TargetColumn2");
try
{
// Write from the source to the destination.
bulkCopy.WriteToServer(reader);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
finally
{
// Close the SqlDataReader. The SqlBulkCopy
// object is automatically closed at the end
// of the using block.
reader.Close();
}
}
}
}
https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/multiple-bulk-copy-operations
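The "initial schema read and create target table" step is only a comment in the sample above; a minimal sketch of it, using GetSchemaTable (column lengths, precision and nullability deliberately omitted, and the helper name is made up):
// Hypothetical helper: derives a crude CREATE TABLE statement from a reader's schema.
static string BuildCreateTable(SqlDataReader reader, string tableName)
{
    DataTable schema = reader.GetSchemaTable();
    List<string> cols = new List<string>();
    foreach (DataRow row in schema.Rows)
    {
        // ColumnName and DataTypeName describe each source column.
        cols.Add(string.Format("[{0}] {1}", row["ColumnName"], row["DataTypeName"]));
    }
    return string.Format("CREATE TABLE {0} ({1})", tableName, string.Join(", ", cols));
}
A production version would also honor ColumnSize, NumericPrecision and AllowDBNull from the schema table.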
I'm working with a hosted service in C# ASP.NET Core, LINQ and T-SQL.
I need to insert records into my database one by one.
Of course this is not a fast operation, but I'm not that experienced in this field so maybe I'm doing it wrong.
This is my code in my manager:
public void StrategyMassive(string foldpathsave)
{
using (IServiceScope scope = _services.CreateScope())
{
List<string> filesreading = new List<string>();
VUContext _context = scope.ServiceProvider.GetRequiredService<VUContext>();
List<string> filesnumber = File.ReadAllLines(foldpathsave).ToList();
filesreading = filesnumber.ToList();
filesreading.RemoveRange(0, 2);
foreach (string singlefile in filesreading)
{
//INTERNAL DATA NORMALIZATION
_repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
_repository.Save(_context);
}
}
}
And this is my repository interface:
public void ImportAdd(VUContext _context, AVuTable newVUL, ACs2Table newC2, ACs3Table newC3, ADateTable newDATE)
{
    _context.AVuTable.Add(newVUL);
    _context.ADateTable.Add(newDATE);
    if (newC2 != null)
    {
        _context.ACs2Table.Add(newC2);
    }
    if (newC3 != null)
    {
        _context.ACs3Table.Add(newC3);
    }
}

public void Save(VUContext _context)
{
    _context.SaveChanges();
}
It is all quite simple, I know, so how can I easily speed up this insert while keeping it one record at a time?
Start by NOT using the slowest way to do it.
It starts with the way you actually load the files.
It continues with not using SqlBulkCopy - possibly in multiple threads - to write the data to the database.
What you do is the slowest possible way, because Entity Framework is NOT an ETL tool.
Btw., one transaction per item (SaveChanges) does not help either. It makes a super slow solution really, really, really super slow.
I manage to load around 64k rows per second per thread, with 4-6 threads running in parallel.
In my experience SqlBulkCopy is the fastest way to do it. filesnumber sounds like a misnomer, and I suspect you are reading a list of delimited files to be loaded into SQL Server after some normalization process. It would probably be even faster to do the normalization server-side, after first loading the data into a temp table. Here is a sample SqlBulkCopy from a delimited file:
void Main()
{
Stopwatch sw = new Stopwatch();
sw.Start();
string sqlConnectionString = @"server=.\SQLExpress2012;Trusted_Connection=yes;Database=SampleDb";
string path = @"d:\temp\SampleTextFiles";
string fileName = @"combDoubledX.csv";
using (OleDbConnection cn = new OleDbConnection(
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+path+
";Extended Properties=\"text;HDR=No;FMT=Delimited\";"))
using (SqlConnection scn = new SqlConnection( sqlConnectionString ))
{
OleDbCommand cmd = new OleDbCommand("select * from "+fileName, cn);
SqlBulkCopy sbc = new SqlBulkCopy(scn, SqlBulkCopyOptions.TableLock,null);
sbc.ColumnMappings.Add(0,"[Category]");
sbc.ColumnMappings.Add(1,"[Activity]");
sbc.ColumnMappings.Add(5,"[PersonId]");
sbc.ColumnMappings.Add(6,"[FirstName]");
sbc.ColumnMappings.Add(7,"[MidName]");
sbc.ColumnMappings.Add(8,"[LastName]");
sbc.ColumnMappings.Add(12,"[Email]");
cn.Open();
scn.Open();
SqlCommand createTemp = new SqlCommand();
createTemp.CommandText = @"if exists
(SELECT * FROM tempdb.sys.objects
WHERE object_id = OBJECT_ID(N'[tempdb]..[##PersonData]','U'))
BEGIN
drop table [##PersonData];
END
create table ##PersonData
(
[Id] int identity primary key,
[Category] varchar(50),
[Activity] varchar(50) default 'NullOlmasin',
[PersonId] varchar(50),
[FirstName] varchar(50),
[MidName] varchar(50),
[LastName] varchar(50),
[Email] varchar(50)
)
";
createTemp.Connection = scn;
createTemp.ExecuteNonQuery();
OleDbDataReader rdr = cmd.ExecuteReader();
sbc.NotifyAfter = 200000;
//sbc.BatchSize = 1000;
sbc.BulkCopyTimeout = 10000;
sbc.DestinationTableName = "##PersonData";
//sbc.EnableStreaming = true;
sbc.SqlRowsCopied += (sender,e) =>
{
Console.WriteLine("-- Copied {0} rows to {1}.[{2} milliseconds]",
e.RowsCopied,
((SqlBulkCopy)sender).DestinationTableName,
sw.ElapsedMilliseconds);
};
sbc.WriteToServer(rdr);
if (!rdr.IsClosed) { rdr.Close(); }
cn.Close();
scn.Close();
}
sw.Stop();
Console.WriteLine(sw.Elapsed); // sw.Dump() in the original is LINQPad-specific
}
And few sample lines from that file:
"Computer Labs","","LRC 302 Open Lab","","","10057380","Test","","Cetin","","5550123456","","cb#nowhere.com"
"Computer Labs","","LRC 302 Open Lab","","","123456789","John","","Doe","","5551234567","","jdoe#somewhere.com"
"Computer Labs","","LRC 302 Open Lab","","","012345678","Mary","","Doe","","5556666444","","mdoe#here.com"
You could also create and run a list of Tasks<> each doing a SqlBulkCopy reading from its own source (SqlBulkCopy accepts any IDataReader), as in the sketch below.
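A rough sketch of that idea (the files list, connection string, target table name and GetFileReader helper are all stand-ins for your own code), where each task owns its own connection and reader:
List<Task> tasks = files.Select(file => Task.Run(() =>
{
    using (SqlConnection scn = new SqlConnection(sqlConnectionString))
    using (SqlBulkCopy sbc = new SqlBulkCopy(scn) { DestinationTableName = "dbo.TargetTable" })
    {
        scn.Open();
        using (IDataReader rdr = GetFileReader(file)) // one reader per source file
        {
            sbc.WriteToServer(rdr);
        }
    }
})).ToList();
Task.WaitAll(tasks.ToArray()); // wait for all parallel copies to finish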
For faster operation you need to reduce the number of database round trips.
Using the statement-batching feature in EF Core
You can see this feature is available only in EF Core, so you need to migrate to EF Core if you are still using EF 6.
Compare EF Core & EF6
For this feature to work you need to move the Save operation outside of the loop, as in the sketch below.
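Applied to the code in the question, the change is just moving Save outside the loop (a sketch reusing the asker's names):
foreach (string singlefile in filesreading)
{
    // INTERNAL DATA NORMALIZATION
    _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
}
// A single SaveChanges after the loop lets EF Core batch the INSERTs
// into far fewer round trips.
_repository.Save(_context);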
Bulk insert
The bulk insert feature is designed to be the fastest way to insert a large number of database records.
Bulk Copy Operations in SQL Server
To use it you need the SqlBulkCopy class for SQL Server, and your code needs considerable rework, along the lines of the sketch below.
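A hedged outline of that rework (the table name, column layout and semicolon delimiter are placeholders, not the asker's real schema): collect the normalized rows into a DataTable, then write them to the server in one operation:
DataTable table = new DataTable();
table.Columns.Add("Col1", typeof(string));
table.Columns.Add("Col2", typeof(string));
foreach (string singlefile in filesreading)
{
    // ... internal data normalization ...
    string[] fields = singlefile.Split(';'); // assumes delimited input
    table.Rows.Add(fields[0], fields[1]);
}
using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.AVuTable"; // placeholder target table
    bulk.WriteToServer(table);
}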
I don't know how to do this query in C#.
There are two databases and each one has a table required for this query. I need to take the data from one database table and update the other database table with the corresponding payrollID.
I have two tables in separate databases: Employee, which is in the techData database, and strStaff in the QLS database. In the Employee table I have StaffID, but I need to pull the PayrollID from strStaff.
Insert payrollID into Employee where staffID from strStaff = staffID from Employee
However I need to get the staffID and PayrollID from strStaff before I can do the insert query.
This is what I have got so far, but it won't work.
cn.ConnectionString = ConfigurationManager.ConnectionStrings["PayrollPlusConnectionString"].ConnectionString;
cmd.Connection = cn;
cmd.CommandText = "Select StaffId, PayrollID From [strStaff] Where (StaffID = #StaffID)";
cmd.Parameters.AddWithValue("#StaffID", staffID);
//Open the connection to the database
cn.Open();
// Execute the sql.
dr = cmd.ExecuteReader();
// Read all of the rows generated by the command (in this case only one row).
while (dr.Read()) {
cmd.CommandText = "Insert into Employee, where StaffID = @StaffID";
}
// Close your connection to the DB.
dr.Close();
cn.Close();
Assuming you want to add data to an existing table, you have to use an UPDATE + SELECT statement (as I mentioned in a comment to the question). It might look like:
UPDATE emp SET payrollID = sta.payrollID
FROM Employee AS emp INNER JOIN strStaff AS sta ON emp.staffID = sta.staffID
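To run that from C# when both databases live on the same SQL Server instance, a single command with three-part names does the whole join server-side (database and column names below follow the question; adjust them to the real schema):
using (SqlConnection cn = new SqlConnection(
    ConfigurationManager.ConnectionStrings["PayrollPlusConnectionString"].ConnectionString))
using (SqlCommand cmd = new SqlCommand(@"
    UPDATE emp SET emp.PayrollID = sta.PayrollID
    FROM [techData].[dbo].[Employee] AS emp
    INNER JOIN [QLS].[dbo].[strStaff] AS sta ON emp.StaffID = sta.StaffID;", cn))
{
    cn.Open();
    int rows = cmd.ExecuteNonQuery(); // number of Employee rows updated
}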
I have added some clarity to your question: the essential part is that you want to create a C# procedure to accomplish your task (not using SQL Server Management Studio, SSIS, bulk insert, etc). Pertinent to this, there will be 2 different connection objects, and 2 different SQL statements to execute on those connections.
The first task would be retrieving data from the first DB (for clarity, let's call it the source DB/Table) using a SELECT SQL statement and storing it in a temporary data structure, either per row (as in your code) or as an entire table using a .NET DataTable object, which will give a substantial performance boost. For this purpose you should use the first connection object, pointing to the source DB/Table (btw, you can close that connection as soon as you get the data).
The second task would be inserting the data into second DB (target DB/Table), though from your business logic it's a bit unclear how to handle possible data conflicts if records with identical ID already exist in the target DB/Table (some clarity needed). To complete this operation you should use the second connection object and second SQL query.
A sample code snippet for the first task, which retrieves the entire data set into a .NET/C# DataTable object in a single pass, is shown below:
private static DataTable SqlReadDB(string ConnString, string SQL)
{
    DataTable _dt;
    try
    {
        using (SqlConnection _connSql = new SqlConnection(ConnString))
        {
            using (SqlCommand _commandSql = new SqlCommand(SQL, _connSql))
            {
                _commandSql.CommandType = CommandType.Text;
                _connSql.Open();
                using (SqlDataReader _dataReaderSql = _commandSql.ExecuteReader(CommandBehavior.CloseConnection))
                {
                    _dt = new DataTable();
                    _dt.Load(_dataReaderSql);
                    _dataReaderSql.Close();
                }
            }
            _connSql.Close();
            return _dt;
        }
    }
    catch { return null; }
}
The second part (adding data to the target DB/Table) you should code based on the clarified business logic (i.e. data conflict resolution: do you want to update existing records or skip them, etc.). Just iterate through the data rows in the DataTable object and perform either INSERT or UPDATE SQL operations, as in the sketch below.
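That iteration could be sketched as follows (targetConn is assumed to be an open connection to the target DB, and update-then-insert is just one possible conflict policy):
foreach (DataRow row in dt.Rows)
{
    using (SqlCommand upd = new SqlCommand(
        "UPDATE Employee SET PayrollID = @p WHERE StaffID = @s", targetConn))
    {
        upd.Parameters.AddWithValue("@p", row["PayrollID"]);
        upd.Parameters.AddWithValue("@s", row["StaffID"]);
        if (upd.ExecuteNonQuery() == 0)
        {
            // No existing record matched: insert instead.
            using (SqlCommand ins = new SqlCommand(
                "INSERT INTO Employee (StaffID, PayrollID) VALUES (@s, @p)", targetConn))
            {
                ins.Parameters.AddWithValue("@s", row["StaffID"]);
                ins.Parameters.AddWithValue("@p", row["PayrollID"]);
                ins.ExecuteNonQuery();
            }
        }
    }
}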
Hope this may help. Kind regards,
Well, I have a file.sql that contains 20,000 INSERT commands.
Sample From the .sql file
INSERT INTO table VALUES
(1,-400,400,3,154850,'Text',590628,'TEXT',1610,'TEXT',79);
INSERT INTO table VALUES
(39,-362,400,3,111659,'Text',74896,'TEXT',0,'TEXT',14);
And I am using the following code to create an in-memory SQLite database, pull the values into it, and then measure the time elapsed:
using (var conn = new SQLiteConnection(@"Data Source=:memory:"))
{
conn.Open();
var stopwatch = new Stopwatch();
stopwatch.Start();
using (var cmd = new SQLiteCommand(conn))
{
using (var transaction = conn.BeginTransaction())
{
cmd.CommandText = File.ReadAllText(@"file.sql");
cmd.ExecuteNonQuery();
transaction.Commit();
}
}
var timeelapsed = stopwatch.Elapsed.TotalSeconds <= 60
? stopwatch.Elapsed.TotalSeconds + " seconds"
: Math.Round(stopwatch.Elapsed.TotalSeconds/60) + " minutes";
MessageBox.Show(string.Format("Time elapsed {0}", timeelapsed));
conn.Close();
}
Things I have tried
Using a file database instead of an in-memory one.
Using begin transaction and commit transaction [AS SHOWN IN MY CODE].
Using Firefox's SQLite Manager extension to test whether the slowdown
comes from the script itself; to my surprise, the same 20,000 lines
that my code struggles with were pulled into the database in JUST 4 ms!
Using PRAGMA synchronous = OFF, as well as PRAGMA journal_mode =
MEMORY.
Appending begin transaction; and commit transaction; to the
beginning and end of the .sql file respectively.
As the SQLite documentation says, SQLite is capable of processing 50,000 statements per second. That is real, and I made sure of it using SQLite Manager [AS DESCRIBED IN THE THIRD THING I TRIED]; however, my 20,000 commands take about 4 minutes, which tells me something is wrong.
QUESTION: What is the problem I am facing, and why is the execution so slow?
The SQLite.Net documentation recommends the following construct for transactions:
using (SQLiteConnection conn = new SQLiteConnection(@"Data Source=:memory:"))
{
    conn.Open();
    using (SQLiteTransaction trans = conn.BeginTransaction())
    {
        using (SQLiteCommand cmd = new SQLiteCommand(conn))
        {
            cmd.CommandText = File.ReadAllText(@"file.sql");
            cmd.ExecuteNonQuery();
        }
        trans.Commit();
    }
    conn.Close();
}
Are you able to manipulate the text file contents into something like:
INSERT INTO table (col01, col02, col03, col04, col05, col06, col07, col08, col09, col10, col11)
SELECT 1,-400,400,3,154850,'Text',590628,'TEXT',1610,'TEXT',79
UNION ALL
SELECT 39,-362,400,3,111659,'Text',74896,'TEXT',0,'TEXT',14
;
Maybe try "batching them" into groups of 100 as an initial test.
http://sqlite.org/lang_select.html
SQLite seems to support the UNION ALL statement.
I'm working with 2 SQL 2008 Servers on different machines. The server names are source.ex.com, and destination.ex.com.
destination.ex.com is linked to source.ex.com and the appropriate permissions are in place for source.ex.com to write to a database called bacon-wrench on destination.ex.com
I've logged into source.ex.com via SSMS and tested this query (successfully):
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (4,6);
In a C# .NET 4.0 WebPage I connect to source.ex.com and perform a similar query (successfully):
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (34,56);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
For small sets of insert statements (say 20 or less) doing something like this performs fine:
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (34,56);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (22,11);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (33,55);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (1,2);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
I'm trying to do something like this with around 20000 records. The above method takes 11 minutes to complete -- which I assume is the server screaming at me to make it some kind of bulk operation. From other StackOverflow threads the SqlBulkCopy class was recommended, and its WriteToServer method accepts a DataTable, perfect!
So I build a DataTable and attempt to write it to the server (fail):
DataTable dt = new DataTable();
dt.Columns.Add("PunchID", typeof(int));
dt.Columns.Add("BaconID", typeof(int));
for(int i = 0; i < 20000; i++)
{
// I realize this would make 20000 duplicate
// rows, but it's not important
dt.Rows.Add(new object[] {
11, 33
});
}
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
using(SqlBulkCopy bulk = new SqlBulkCopy(c))
{
bulk.DestinationTableName = "[destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]";
bulk.ColumnMappings.Add("PunchID", "PunchID");
bulk.ColumnMappings.Add("BaconID", "BaconID");
bulk.WriteToServer(dt);
}
}
EDIT2: The below message is what I'm attempting to fix:
The web page crashes at bulk.WriteToServer(dt); with the error message "Database bacon-wrench does not exist. Please ensure it is typed correctly." What am I doing wrong? How do I change this to get it to work?
EDIT1:
I was able to speed up the query significantly using the below syntax. But it is still very slow for such a small record set.
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES
(34,56),
(22,11),
(33,55),
(1,2);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
If you are using SQL Server 2008+, you can introduce a user-defined table type and pass it as a table-valued parameter. Prepare the type, the receiving table, and a stored procedure something like below. The data type and stored procedure live on the local system. I generally have an if statement in the code detecting whether the table is remote or local; remote I do this, local I use SqlBulkCopy.
if(TYPE_ID(N'[Owner].[TempTableType]') is null)
begin
CREATE TYPE [Owner].[TempTableType] AS TABLE ( [PendingID] uniqueidentifier, [Reject] bit)
end
IF NOT EXISTS (SELECT * FROM [LinkedServer].[DatabaseOnLS].sys.tables where name = 'TableToReceive')
EXEC('
CREATE TABLE [DatabaseOnLS].[Owner].[TableToReceive] ( [PendingID] uniqueidentifier, [Reject] bit)
') AT [LinkedServer]
else
EXEC('
TRUNCATE TABLE [DatabaseOnLS].[Owner].[TableToReceive]
') AT [LinkedServer]
CREATE PROCEDURE [Owner].[TempInsertTable]
@newTableType TempTableType readonly
AS
BEGIN
insert into [LinkedServer].[DatabaseOnLS].[Owner].[TableToReceive] select * from @newTableType
END
In the C# code you can then do something like this to insert the DataTable into the table on the linked server (I'm using an existing UnitOfWork, which already has a connection and transaction):
using (var command = new SqlCommand("TempInsertTable",
oUoW.Database.Connection as SqlConnection) { CommandType = CommandType.StoredProcedure }
)
{
command.Transaction = oUoW.Database.CurrentTransaction as SqlTransaction;
command.Parameters.Add(new SqlParameter("@newTableType", oTempTable));
SqlDataReader drResults = command.ExecuteReader();
drResults.Close();
}
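For completeness, oTempTable is just a DataTable whose columns match the type definition; an illustrative setup:
DataTable oTempTable = new DataTable();
oTempTable.Columns.Add("PendingID", typeof(Guid));
oTempTable.Columns.Add("Reject", typeof(bool));
oTempTable.Rows.Add(Guid.NewGuid(), false); // one sample row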
After trying a number of things including linked server settings, collations, synonyms, etc., I eventually got to this error message:
Inserting into remote tables or views is not allowed by using the BCP utility or by using BULK INSERT.
Perhaps you can bulk insert to a staging table on your local server (your code works fine for this) and then insert from that staging table to your linked server from there, followed by a local delete of the staging table. You'll have to test for performance.
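A sketch of that workaround (the staging table name is made up): bulk copy into a local staging table, then let one INSERT ... SELECT push everything across the linked server:
using (SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
    c.Open();
    using (SqlBulkCopy bulk = new SqlBulkCopy(c))
    {
        bulk.DestinationTableName = "[dbo].[tblFruitPunchStaging]"; // local staging table
        bulk.WriteToServer(dt);
    }
    string sql = @"
        INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch] (PunchID, BaconID)
        SELECT PunchID, BaconID FROM [dbo].[tblFruitPunchStaging];
        TRUNCATE TABLE [dbo].[tblFruitPunchStaging];";
    using (SqlCommand cmd = new SqlCommand(sql, c))
    {
        cmd.ExecuteNonQuery();
    }
}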