I have a customer entity with 10 Properties.
7 of those properties are saved in the customer table.
3 of those properties are saved in the test table.
The 3 properties in test table are CustomerId, Label, Text.
When I query these 3 properties I get 3 rows like this:
CustomerId | Label | Text
1005 | blubb | What a day
1006 | hello | Sun is shining
0007 | |
When I save them I have to call my stored procedure 3 times on the test table.
In my SP I check whether a row with the specific CustomerId AND Label already exists;
if it does I do an UPDATE, otherwise an INSERT.
How would you call the stored procedure 3 times, with all the CommandText, CommandType, ExecuteNonQuery etc. stuff?
The easiest way : use the TransactionScope class.
Simply put the call into a block like :
using (TransactionScope ts = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection(myconnstring))
    {
        conn.Open();
        // ... do the call to sproc
        ts.Complete();
        conn.Close();
    }
}
[Edit] I also added the SqlConnection, because I'm a big fan of this pattern. The using keyword ensures the connection is closed and the transaction is rolled back if something goes wrong.
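To make the call pattern concrete, here is a minimal sketch of calling the procedure once per row inside a TransactionScope. The procedure name dbo.SaveCustomerText, its parameter names and sizes, and the CustomerText class are assumptions made for illustration; none of them come from the question. The project also needs a reference to System.Transactions.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Transactions;

// Hypothetical holder for one row of the test table.
public sealed class CustomerText
{
    public int CustomerId;
    public string Label;
    public string Text;
}

public static class CustomerTextSaver
{
    public static void Save(string connectionString, CustomerText[] rows)
    {
        using (var scope = new TransactionScope())
        using (var conn = new SqlConnection(connectionString))
        // "dbo.SaveCustomerText" is an assumed name for the UPDATE-or-INSERT procedure.
        using (var cmd = new SqlCommand("dbo.SaveCustomerText", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            // Parameter names and sizes are assumptions; match them to the real procedure.
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int);
            cmd.Parameters.Add("@Label", SqlDbType.NVarChar, 50);
            cmd.Parameters.Add("@Text", SqlDbType.NVarChar, 200);

            conn.Open(); // opened inside the scope, so it enlists in the ambient transaction

            foreach (CustomerText row in rows)
            {
                // Reuse the same command; only the parameter values change per call.
                cmd.Parameters["@CustomerId"].Value = row.CustomerId;
                cmd.Parameters["@Label"].Value = (object)row.Label ?? DBNull.Value;
                cmd.Parameters["@Text"].Value = (object)row.Text ?? DBNull.Value;
                cmd.ExecuteNonQuery();
            }

            // Without this call, disposing the scope rolls everything back.
            scope.Complete();
        }
    }
}
```

If any of the three calls throws, Complete() is never reached and none of the rows are saved.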
Well, a SqlTransaction spanning three ExecuteNonQuery is the simplest, but some alternatives:
use the XML datatype to pass all three in as XML; parse the XML (SQL server has functions for this) in the sproc into 3 records
use a "table valued parameter" to pass them in a single call - note this needs additional definition at the DB to represent the structured data
if the data volume is huge (3000 rather than 3), SqlBulkCopy into a staging table, then run a sproc to move the data into the real table in one set-based operation
Finally, watch out for the "inner platform effect" - it sounds a bit like a DB inside a DB.
There are several classes that inherit from DbTransaction. The documentation for SqlTransaction has sample code.
You should encapsulate your INSERTs in a transaction. The bad way to do it is to use a TransactionScope in ADO.NET; the good way is to write a stored procedure and BEGIN and COMMIT/ROLLBACK your transaction inside your proc. You don't want to go back and forth from client to server while maintaining a transaction, because you will hurt concurrency and performance (exclusive locks are held on the inserted resources until the transaction ends).
BEGIN TRAN
BEGIN TRY
INSERT
INSERT
COMMIT TRAN
END TRY
BEGIN CATCH
PRINT ERROR_MESSAGE() -- you can use THROW in SQL Server 2012 to rethrow the error
ROLLBACK
END CATCH
I want to perform a bulk insert from CSV into a MySQL database using C#. I'm using MySql.Data.MySqlClient for the connection. The CSV columns map to multiple tables, and they depend on primary key values, for example:
CSV(column & value): -
emp_name, address,country
-------------------------------
jhon,new york,usa
amanda,san diago,usa
Brad,london,uk
DB Schema(CountryTbl) & value
country_Id,Country_Name
1,usa
2,UK
3,Germany
DB Schema(EmployeeTbl)
Emp_Id(AutoIncrement),Emp_Name
DB Schema(AddressTbl)
Address_Id(AutoIncrement), Emp_Id,Address,countryid
Problem statement:
1> Read data from CSV to get the CountryId from "CountryTbl" for respective employee.
2> Insert data into EmployeeTbl and AddressTbl with CountryId
Approach 1
Go as per above problem statement steps, but that will be a performance hit (Row-by-Row read and insert)
Approach 2
Use "Bulk Insert" option "MySqlBulkLoader", but that needs csv files to read, and looks that this option is not going to work for me.
Approach 3
Use stored proc and use the procedure for upload. But I don't want to use stored proc.
Please suggest if there is any other option by which I can do bulk upload or suggest any other approach.
Unless you have hundreds of thousands of rows to upload, bulk loading (your approach 2) probably is not worth the extra programming and debugging time it will cost. That's my opinion, for what it's worth (2x what you paid for it :)
Approaches 1 and 3 are more or less the same. The difference lies in whether you issue the queries from c# or from your sp. You still have to work out the queries. So let's deal with 1.
The solutions to these sorts of problems depend on make and model of RDBMS. If you decide you want to migrate to SQL Server, you'll have to change this stuff.
Here's what you do. For each row of your employee csv ...
... Put a row into the employee tbl
INSERT INTO EmployeeTbl (Emp_Name) VALUES (@emp_name);
Notice this query uses the INSERT ... VALUES form of the insert query. When this query (or any insert query) runs, it drops the autoincremented Emp_Id value where a subsequent invocation of LAST_INSERT_ID() can get it.
... Put a row into the address table
INSERT INTO AddressTbl (Emp_Id,Address,countryid)
SELECT LAST_INSERT_ID() AS Emp_Id,
@address AS Address,
country_id AS countryid
FROM CountryTbl
WHERE Country_Name = @country;
Notice this second INSERT uses the INSERT ... SELECT form of the insert query. The SELECT part of all this generates one row of data with the column values to insert.
It uses LAST_INSERT_ID() to get Emp_Id,
it uses a constant provided by your C# program for the @address, and
it looks up the countryid value from your pre-existing CountryTbl.
Notice, of course, that you must use the C# Parameters.AddWithValue() method to set the values of the @ parameters in these queries. Those values come from your CSV file.
Finally, wrap each thousand rows or so of your csv in a transaction, by preceding their INSERT statements with a START TRANSACTION; statement and ending them with a COMMIT; statement. That will get you a performance improvement, and if something goes wrong the entire transaction will get rolled back so you can start over.
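Put together, the row-by-row approach with a transaction per thousand rows might look like the sketch below. It assumes the CSV has already been parsed into string arrays of (name, address, country); the class name, method names and batch size are illustrative. It also assumes that disposing an uncommitted MySqlTransaction rolls it back, which is the documented expectation for DbTransaction.

```csharp
using System;
using System.Collections.Generic;
using MySql.Data.MySqlClient;

public static class EmployeeCsvLoader
{
    const string InsertEmployee =
        "INSERT INTO EmployeeTbl (Emp_Name) VALUES (@emp_name);";

    // LAST_INSERT_ID() is per connection, so both commands must share one connection.
    const string InsertAddress =
        "INSERT INTO AddressTbl (Emp_Id, Address, countryid) " +
        "SELECT LAST_INSERT_ID(), @address, country_Id " +
        "FROM CountryTbl WHERE Country_Name = @country;";

    // rows: each element is { emp_name, address, country }, already read from the CSV.
    public static void Load(string connectionString, IEnumerable<string[]> rows, int batchSize = 1000)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            var batch = new List<string[]>(batchSize);
            foreach (string[] row in rows)
            {
                batch.Add(row);
                if (batch.Count == batchSize)
                {
                    InsertBatch(conn, batch);
                    batch.Clear();
                }
            }
            if (batch.Count > 0) InsertBatch(conn, batch); // final partial batch
        }
    }

    static void InsertBatch(MySqlConnection conn, List<string[]> batch)
    {
        using (MySqlTransaction tx = conn.BeginTransaction())
        using (var emp = new MySqlCommand(InsertEmployee, conn, tx))
        using (var addr = new MySqlCommand(InsertAddress, conn, tx))
        {
            emp.Parameters.Add("@emp_name", MySqlDbType.VarChar);
            addr.Parameters.Add("@address", MySqlDbType.VarChar);
            addr.Parameters.Add("@country", MySqlDbType.VarChar);

            foreach (string[] row in batch)
            {
                emp.Parameters["@emp_name"].Value = row[0];
                emp.ExecuteNonQuery();

                addr.Parameters["@address"].Value = row[1];
                addr.Parameters["@country"].Value = row[2];
                // INSERT ... SELECT inserts zero rows when the country is not in CountryTbl.
                if (addr.ExecuteNonQuery() == 0)
                    throw new InvalidOperationException("Unknown country: " + row[2]);
            }
            tx.Commit();
        } // leaving this block without Commit() discards the whole batch
    }
}
```

Checking the affected-row count of the second INSERT is a design choice of this sketch: without it, an employee whose country is missing from CountryTbl would silently end up with no address row.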
I am executing a stored procedure against a database through a C# application. I would like to do computations after the stored procedure is executed and then after the computations are done, I'd like to roll back the database to its state prior to the stored procedure. Most of the examples I've seen on stack overflow only involve using a rollback in a catch block of a try/catch block in the event of an error, but that's different from what I'm doing.
I'm not sure if I should be saving the state of the database at some point, and then do a transaction roll back with that state, or should be attaching a transaction parameter to the SqlCommand instance of the stored procedure, or something else.
You can do this via transactions. Sample is here: https://msdn.microsoft.com/en-us/library/86773566(v=vs.110).aspx Yes, it also uses catch block but you don't have to.
Alternatively you can use database snapshots if your edition of SQL Server supports them but they will roll back ALL CHANGES SINCE THE MOMENT SNAPSHOT WAS MADE - yours and any other user. Most likely this is not what you want.
One option is to use a WAITFOR DELAY in combination with a modified transaction isolation level. What this does is executes your code, makes the query wait for a set amount of time, then rolls back the transaction. During that time you designate, you can have another session query from your table that received the modification (as long as you set the transaction level to read uncommitted) and you will be able to see the new value. Here is a sample:
In one window of SSMS:
CREATE TABLE ##testtable (id INT);
BEGIN TRAN;
INSERT INTO ##testtable (id)
VALUES (1), (2), (3);
WAITFOR DELAY '00:01:00';
ROLLBACK TRAN
In a second SSMS window, run this query during the 1 minute timeframe:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT *
FROM ##testtable
During the 1 minute that the other window is executing, you will see the values in the temp table. After one minute, the table will be empty. This works for somewhat simple tasks, but if you are testing against data that already has a test change applied to it, just do a snapshot or a database restore.
Standard disclaimer: That might not be a good thing to do. (Okay, that's done.)
You could do it most easily in your stored procedure using a table variable. You can
begin a transaction
modify your data
query the modified data
insert it into the table variable
roll back the transaction
select what's in the table variable
declare @myData table (someColumn int, someOtherColumn varchar(10))
begin transaction
begin try
    [make your changes]
    insert into @myData
    select something, something something
    rollback transaction
    select * from @myData
end try
begin catch
    -- the transaction may already have been rolled back in the try block
    if @@trancount > 0 rollback transaction
end catch
Rolling back your transaction won't affect what's in the table variable. I'd trust this more than counting on my application to roll back the transaction. Not that it wouldn't work, but I'd just trust this much more.
That being said, you could just create a SqlTransaction (docs), execute your SqlCommand using that transaction, query your data, and then roll back the transaction.
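A minimal sketch of that client-side variant follows. The class and method names are illustrative, and the procedure name and inspection query are passed in by the caller because the question does not name them. The sketch copies the modified data into a DataTable and rolls back before returning, so the longer computations can run afterwards without holding locks.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class DryRun
{
    // procName: the stored procedure to execute (name supplied by the caller).
    // inspectSql: a SELECT that reads the data the procedure changed.
    public static DataTable ExecuteAndRollBack(string connectionString, string procName, string inspectSql)
    {
        var snapshot = new DataTable();

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                // Every command on this connection must carry the transaction.
                using (var proc = new SqlCommand(procName, conn, tx))
                {
                    proc.CommandType = CommandType.StoredProcedure;
                    proc.ExecuteNonQuery();
                }

                // Read the modified rows while the changes are still visible to this session.
                using (var query = new SqlCommand(inspectSql, conn, tx))
                using (SqlDataReader reader = query.ExecuteReader())
                {
                    snapshot.Load(reader);
                }

                // Undo everything the procedure did; no catch block is involved.
                tx.Rollback();
            }
        }

        // The caller can now compute on the in-memory copy.
        return snapshot;
    }
}
```

This only behaves as intended if the stored procedure does not commit or roll back a transaction of its own.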
On ORACLE database if you want to wait in your PL/SQL program (“sleep”) you may want to use the procedure “SLEEP” from the package “DBMS_LOCK“.
CREATE PROCEDURE execution(xVal NUMBER) IS
BEGIN
INSERT INTO TABLE_1 VALUES (xVal);
DBMS_LOCK.Sleep (60);
ROLLBACK;
END execution;
I am new to database interaction with C#. I am trying to write 10000 records to a database in a loop using SqlCommand and SqlConnection objects with a SqlTransaction, committing after every 5000. It takes 10 seconds to process.
SqlConnection myConnection = new SqlConnection("..Connection String..");
myConnection.Open();
SqlCommand myCommand = new SqlCommand();
myCommand.CommandText = "exec StoredProcedureInsertOneRowInTable Param1, Param2........";
myCommand.Connection = myConnection;
SqlTransaction myTrans = myConnection.BeginTransaction();
myCommand.Transaction = myTrans;
for (int i = 0; i < 10000; i++)
{
    myCommand.ExecuteNonQuery();
    if (i % 5000 == 0)
    {
        myTrans.Commit();
        myTrans = myConnection.BeginTransaction();
        myCommand.Transaction = myTrans;
    }
}
The above code gives me only 1000 row writes/sec in the database.
But when I implemented the same logic in SQL and executed it on the database with SQL Server Management Studio, it gave me 10000 writes/sec.
When I compare the behaviour of the two approaches, I see that executing with ADO.NET causes a large number of logical reads.
My questions are:
1. Why are there logical reads in the ADO.NET execution?
2. Does a transaction involve some handshaking?
3. Why do they not appear in Management Studio?
4. If I want very fast insert transactions on the DB, what should the approach be?
Updated Information about Database objects
Table: tbl_FastInsertTest
No Primary Key, Only 5 fields first three are type of int (F1,F2,F3) and last 2(F4,F5) are type varchar(30)
storedprocedure:
create proc stp_FastInsertTest
(
@nF1 int,
@nF2 int,
@nF3 int,
@sF4 varchar(30),
@sF5 varchar(30)
)
as
Begin
set NoCOUNT on
Insert into tbl_FastInsertTest
(
[F1],
[F2],
[F3],
[F4],
[F5]
)
Values
(
@nF1,
@nF2,
@nF3,
@sF4,
@sF5
)
end
--------------------------------------------------------------------------------------
SQL Block Executing on SSMS
--When I am executing following code on SSMS then it is giving me more than 10000 writes per second but when i tried to execute same STP on ADO than it gave me 1000 to 1200 writes per second
--while reading no locks
begin tran
declare @i int
set @i=0
While(1<>0)
begin
exec stp_FastInsertTest 1,2,3,'vikram','varma'
set @i=@i+1
if(@i=5000)
begin
commit tran
set @i=0
begin tran
end
end
If you are running something like:
exec StoredProcedureInsertOneRowInTable 'blah', ...
exec StoredProcedureInsertOneRowInTable 'bloop', ...
exec StoredProcedureInsertOneRowInTable 'more', ...
in SSMS, that is an entirely different scenario, where all of that is a single batch. With ADO.NET you are paying a round-trip per ExecuteNonQuery - I'm actually impressed it managed 1000/s.
Re the logical reads, that could just be looking at the query-plan cache, but without knowing more about StoredProcedureInsertOneRowInTable it is impossible to comment on whether something query-specific is afoot. But I suspect you have some different SET conditions between SSMS and ADO.NET that are forcing it to use a different plan - this is in particular a problem with things like persisted calculated indexed columns, and columns "promoted" out of a sql-xml field.
Re making it faster - in this case it sounds like table-valued parameters are exactly the thing, but you should also review the other options here
For performant inserts, take a look at the SqlBulkCopy class; if it works for you, it should be fast.
As Sean said, using parameterized queries is always a good idea.
Using a StringBuilder to batch a thousand INSERT statements into a single command and committing them in one transaction is a proven way of inserting data:
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
    // Formatting values into the SQL text is only safe for trusted, numeric data.
    sb.AppendFormat("INSERT INTO Table(col1,col2) VALUES({0},{1});", values1[i], values2[i]);
}
sqlCommand.CommandText = sb.ToString();
Your code doesn't look right to me, you are not committing transactions at each batch. Your code keeps opening new transactions.
It is always a good practice to drop indexes while inserting a lot of data, and adding them later. Indexes will slow down your writes.
SQL Server Management Studio does not wrap your statements in a transaction for you, but SQL has transactions. Try this:
BEGIN TRANSACTION MyTransaction
INSERT INTO Table(Col1,Col2) VALUES(Val10,Val20);
INSERT INTO Table(Col1,Col2) VALUES(Val11,Val21);
INSERT INTO Table(Col1,Col2) VALUES(Val12,Val23);
COMMIT TRANSACTION
You need to use a parameterized query so that the execution plan can be compiled once and cached. Since you're using string concatenation (shudder, this is bad, google sql injection) to build the query, SQL Server treats those 10,000 queries as separate, individual queries and builds an execution plan for each one.
MSDN: http://msdn.microsoft.com/en-us/library/yy6y35y8.aspx although you're going to want to simplify their code a bit and you'll have to reset the parameters on the command.
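As a rough sketch of what a parameterized version of the loop from the question could look like, with one transaction per batch and every batch committed: the procedure and parameter names are taken from the question's stp_FastInsertTest, while the class name, method signature and the constant row values are placeholders. It still pays one round trip per row, so it addresses plan reuse and batching only.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public static class FastInsertTest
{
    public static void InsertRows(string connectionString, int rowCount, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("stp_FastInsertTest", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            // Declare the parameters once; only their values change inside the loop.
            cmd.Parameters.Add("@nF1", SqlDbType.Int);
            cmd.Parameters.Add("@nF2", SqlDbType.Int);
            cmd.Parameters.Add("@nF3", SqlDbType.Int);
            cmd.Parameters.Add("@sF4", SqlDbType.VarChar, 30);
            cmd.Parameters.Add("@sF5", SqlDbType.VarChar, 30);

            conn.Open();

            for (int start = 0; start < rowCount; start += batchSize)
            {
                int end = Math.Min(start + batchSize, rowCount);

                // One transaction per batch; Dispose rolls back if Commit was not reached.
                using (SqlTransaction tx = conn.BeginTransaction())
                {
                    cmd.Transaction = tx;
                    for (int i = start; i < end; i++)
                    {
                        // Placeholder values, mirroring the SSMS test in the question.
                        cmd.Parameters["@nF1"].Value = 1;
                        cmd.Parameters["@nF2"].Value = 2;
                        cmd.Parameters["@nF3"].Value = 3;
                        cmd.Parameters["@sF4"].Value = "vikram";
                        cmd.Parameters["@sF5"].Value = "varma";
                        cmd.ExecuteNonQuery();
                    }
                    tx.Commit();
                }
            }
        }
    }
}
```

Calling InsertRows(connectionString, 10000, 5000) would correspond to the question's two batches of 5000, including a commit for the final batch.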
If you really, really want to get the data in the db fast, think about using bcp... but you'd better make sure the data is clean first (as there's no real error checking/handling on it).
I am wondering how I can do a mass insert and bulk copy at the same time. I have 2 tables that should be affected by the bulk copy, as they both depend on each other.
So I want it so that if a record fails while inserting into table 1, it gets rolled back and table 2 never gets updated. Also, if table 1 inserts fine and an update to table 2 fails, table 1 gets rolled back.
Can this be done with bulk copy?
Edit
I should have mentioned I am doing the bulk insert through C#.
It looks something like this, but this is an example I have been working from. So I am not sure if I have to alter it to be a stored procedure (I am not sure how that would look, or how the C# code would look).
private static void BatchBulkCopy()
{
// Get the DataTable
DataTable dtInsertRows = GetDataTable();
using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
{
sbc.DestinationTableName = "TBL_TEST_TEST";
// Number of records to be processed in one go
sbc.BatchSize = 500000;
// Map the source columns from the DataTable to the destination columns in the SQL Server 2005 Person table
// sbc.ColumnMappings.Add("ID", "ID");
sbc.ColumnMappings.Add("NAME", "NAME");
// Number of records after which client has to be notified about its status
sbc.NotifyAfter = dtInsertRows.Rows.Count;
// Event that gets fired when NotifyAfter number of records are processed.
sbc.SqlRowsCopied += new SqlRowsCopiedEventHandler(sbc_SqlRowsCopied);
// Finally write to server
sbc.WriteToServer(dtInsertRows);
sbc.Close();
}
}
"I am wondering how can do a mass insert and bulk copy at the same time? I have 2 tables that should be affect by the bulk copy as they both depend on each other. So I want it that if while inserting table 1 a record dies it gets rolled back and table 2 never gets updated. Also if table 1 inserts good and table 2 an update fails table 1 gets rolled back. Can this be done with bulk copy?"
No - the whole point of SqlBulkCopy is to get data into your database as fast as possible. It will just dump the data into a single table.
The normal use case will be to then inspect that table once it's imported, and begin to "split up" that data and store it into whatever place it needs to go - typically through a stored procedure (since the data already is on the server, and you want to distribute it to other tables - you don't want to pull all that data back down to the client, inspect it, and then send it back to the server one more time).
SqlBulkCopy only grabs a bunch of data and drops it into a table - very quickly so. It cannot split up data into multiple tables based on criteria or conditions.
You can run bulk inserts inside of a user defined transaction so do something like this:
BEGIN TRANSACTION MyDataLoad
BEGIN TRY
BULK INSERT ...
BULK INSERT ...
COMMIT TRANSACTION MyDataLoad
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
END CATCH
However, there may be other ways to accomplish what you want. Are the tables empty before you bulk insert into them? When you say the tables depend on each other, do you mean that there are foreign key constraints you want enforced?
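Since the question is about doing this from C#, here is a sketch of the same idea using the SqlBulkCopy constructor that accepts an existing SqlTransaction, so that two bulk copies share one transaction. "TBL_TEST_TEST" comes from the question; "TBL_TEST_TEST_2" is a made-up name for the second table, and the sketch assumes the DataTable column names match the destination column names.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class TwoTableBulkLoad
{
    public static void Load(string connectionString, DataTable table1Rows, DataTable table2Rows)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                Copy(conn, tx, "TBL_TEST_TEST", table1Rows);
                Copy(conn, tx, "TBL_TEST_TEST_2", table2Rows); // hypothetical second table

                // Reached only if both copies succeeded.
                tx.Commit();
            } // an exception skips Commit(), and disposing the transaction rolls both copies back
        }
    }

    static void Copy(SqlConnection conn, SqlTransaction tx, string destination, DataTable rows)
    {
        // Passing the transaction makes the bulk copy part of it instead of committing on its own.
        using (var sbc = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, tx))
        {
            sbc.DestinationTableName = destination;
            foreach (DataColumn column in rows.Columns)
            {
                // Assumes source and destination columns share names.
                sbc.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }
            sbc.WriteToServer(rows);
        }
    }
}
```

An UPDATE against the second table could take part in the same transaction by running a SqlCommand created with the same connection and transaction.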
I have a table whose schema is very simple: an ID column as unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, to work out which ones are already contained in the table and which are not. The inputs are strings, and I have a C# function which converts a string into a uniqueidentifier (GUID). My logic is, if there is an existing ID, then I treat the string as already contained in the table.
My question is, if I need to find out which of the 5000 input strings are already contained in the DB and which are not, what is the most efficient way?
BTW: My current implementation is to convert the string to a GUID using C# code, then invoke a stored procedure which queries whether the ID exists in the database and returns the result to the C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable ON ExistingTable.thecol = X.thecol
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable ON ExistingTable.thecol = X.thecol
WHERE ExistingTable.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issue for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!-).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to insert one at a time in a lot of contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with those entries that do or don't exist in your table??
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- row does not exist in base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR =' |\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
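One way this could look from C# is sketched below: bulk copy the GUIDs into a session-scoped temp table, then run a single anti-join. The table name dbo.MyTable, its ID column, the temp table name #InputIds and the class and method names are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class ExistingIdFinder
{
    // Returns the input IDs that are NOT present in dbo.MyTable (assumed table name).
    public static HashSet<Guid> FindMissing(string connectionString, IEnumerable<Guid> ids)
    {
        // De-duplicate first, because the temp table below has a primary key.
        var input = new DataTable();
        input.Columns.Add("ID", typeof(Guid));
        foreach (Guid id in new HashSet<Guid>(ids))
        {
            input.Rows.Add(id);
        }

        var missing = new HashSet<Guid>();
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // The temp table lives for as long as this connection stays open.
            using (var create = new SqlCommand(
                "CREATE TABLE #InputIds (ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY);", conn))
            {
                create.ExecuteNonQuery();
            }

            // One bulk operation instead of 5000 individual round trips.
            using (var sbc = new SqlBulkCopy(conn))
            {
                sbc.DestinationTableName = "#InputIds";
                sbc.WriteToServer(input);
            }

            const string sql =
                "SELECT i.ID FROM #InputIds AS i " +
                "LEFT JOIN dbo.MyTable AS t ON t.ID = i.ID " +
                "WHERE t.ID IS NULL;";

            using (var query = new SqlCommand(sql, conn))
            using (SqlDataReader reader = query.ExecuteReader())
            {
                while (reader.Read())
                {
                    missing.Add(reader.GetGuid(0));
                }
            }
        }
        return missing;
    }
}
```

Every input ID that is not in the returned set already exists in the table, so one query answers both halves of the question.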
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap the insert in a TRY CATCH block. Send em all up to the server in one shot like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
If you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time to see if the record does not exist just duplicates work that SQL is going to do anyway.
If performance is really important, then consider downloading the 5000 GUIDs to your local station and doing all the analysis locally. Reading 5000 GUIDs should take much less than 1 second. This is simpler than bulk importing to a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
Since you are using Sql server 2008, you could use Table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
[Column] NVARCHAR(255) NOT NULL
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
@Data MyType READONLY
AS
xxxx
Call using C#
SqlCommand insertCommand = new SqlCommand(
"spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
insertCommand.Parameters.AddWithValue(
"@Data", dataReader);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
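For the DataTable route mentioned above, a slightly fuller sketch might look like this. The column layout follows the MyType definition from the steps; the class and method names and the dbo schema prefix are assumptions. Table-valued parameter columns are matched by position, so the DataTable columns are added in the same order as in the type.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class MyTypeTvp
{
    // Builds a DataTable whose columns mirror MyType (UniqueId INT, Column NVARCHAR(255)).
    public static DataTable BuildTable(IEnumerable<KeyValuePair<int, string>> rows)
    {
        var table = new DataTable();
        table.Columns.Add("UniqueId", typeof(int));
        table.Columns.Add("Column", typeof(string));
        foreach (KeyValuePair<int, string> row in rows)
        {
            table.Rows.Add(row.Key, row.Value);
        }
        return table;
    }

    // Sends all rows to spInsertMyType in a single call.
    public static void Send(string connectionString, DataTable data)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("spInsertMyType", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;

            SqlParameter tvp = cmd.Parameters.Add("@Data", SqlDbType.Structured);
            tvp.TypeName = "dbo.MyType"; // assumes the type was created in the dbo schema
            tvp.Value = data;

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

Usage would be along the lines of MyTypeTvp.Send(connectionString, MyTypeTvp.BuildTable(rows)), where rows holds the id and text pairs to pass in.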
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that can take an XML in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001"/>
....
</ROOT>
Then in the procedure, with the argument of type NVARCHAR(MAX), you convert it to XML, after which you use it as a table with a single column (let's call it @FilterTable). The stored procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(@FilterXML NVARCHAR(MAX))
AS BEGIN
SET NOCOUNT ON
DECLARE @x XML
SELECT @x = CONVERT(XML, @FilterXML)
-- temporary table (must have it, because cannot join on XML statement)
DECLARE @FilterTable TABLE (
"ID" UNIQUEIDENTIFIER
)
-- insert into temporary table
-- @important: XML iS CaSe-SenSiTiv
INSERT @FilterTable
SELECT x.value('@ID', 'UNIQUEIDENTIFIER')
FROM @x.nodes('/ROOT/MyObject') AS R(x)
SELECT o.ID,
SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
FROM @FilterTable o
LEFT JOIN dbo.MyTable t
ON o.ID = t.ID
GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0