I've got an instance of an entity with a one-to-many relationship (Items). I'm attempting to insert it into a SQL Azure database via Dapper.
The item count is ~3.5k
using (var transaction = connection.BeginTransaction())
{
// (...) x inserted here, we want to insert the items next...
connection.Execute(TSQL_InsertStatement, x.Items, transaction);
transaction.Commit();
}
Locally (against a .\SQLEXPRESS instance), the execution takes ~300-500 ms.
In Azure, the execution takes ~8 s.
I believe this call generates a ton of individual INSERTs, and that might be the problem. I've tried refactoring this into a "manually-generated-SQL" solution, which produced 4 multi-row INSERTs (1k, 1k, 1k, 500 rows), but executing that took over 2 s locally.
Is there a way to have this run faster with Dapper? Like a bulk insert maybe?
Is there something I'm not aware of with SQL Azure? Like some hidden throttling?
Using many individual inserts within the same transaction is generally a bad idea in Azure SQL Database.
You should use batching to increase the performance of insert queries; see https://azure.microsoft.com/en-us/documentation/articles/sql-database-use-batching-to-improve-performance/ .
If possible, I would recommend table-valued parameters, bulk insert, or multi-row parameterized inserts. These scenarios are described in the referenced article.
Azure SQL Database also has JSON support, so if your original source is an array of JSON objects you could push the entire JSON as text and parse it on the SQL side using OPENJSON (there is still no evidence that this is faster than the other methods).
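For example, with Dapper you can pass the rows as a table-valued parameter, so the whole batch goes to the server in a single round trip. This is only a rough sketch: it assumes a user-defined table type dbo.ItemType and an Items table with matching columns, which you would adapt to your actual schema.
// requires: using System.Data; using Dapper;
// Assumed to exist in the database (adjust columns to your schema):
// CREATE TYPE dbo.ItemType AS TABLE (ParentId INT, Name NVARCHAR(100));
var table = new DataTable();
table.Columns.Add("ParentId", typeof(int));
table.Columns.Add("Name", typeof(string));
foreach (var item in x.Items)
    table.Rows.Add(item.ParentId, item.Name);   // property names are made up

connection.Execute(
    "INSERT INTO Items (ParentId, Name) SELECT ParentId, Name FROM @rows",
    new { rows = table.AsTableValuedParameter("dbo.ItemType") },
    transaction);
This sends one command instead of ~3.5k individual INSERTs, which is usually where the round-trip latency to Azure hurts the most.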
When using the Query Design feature in Visual Studio, any queries that I run on a SQL Database or Microsoft Access database while testing are persistent, meaning they actually change the data in the table(s). Is there a way to make the queries non-persistent while testing them, until a program is run? I'm using C# as the programming language and .NET as the framework, if it matters. I also need to know the process for doing this with either an MS Access or SQL database.
You can do transactions in C# similar to how you use them in SQL. Here is an example:
connection.Open();
SqlCommand command = connection.CreateCommand();

// Start a local transaction.
SqlTransaction transaction = connection.BeginTransaction("SampleTransaction");

// Enlist the command in the transaction.
command.Transaction = transaction;

// Execute your query here, e.g.:
// command.CommandText = "INSERT INTO ...";
// command.ExecuteNonQuery();

// Check whether we are running in the test environment.
bool testEnvironment = SomeConfigFile.property("testEnvironment");

if (!testEnvironment) {
    transaction.Commit();
} else {
    transaction.Rollback();
}
Here is the documentation on transactions in C#: https://msdn.microsoft.com/en-us/library/86773566%28v=vs.110%29.aspx
It should be possible for VS to create a local copy of the SQL data you're working on while you're testing. This is held in the bin folder. Have a look at this:
https://msdn.microsoft.com/en-us/library/ms246989.aspx
Once you're finished testing, you could simply change it to point to the database you want to alter with your application.
I'm not aware of a way to get exactly what you're asking for, but I think there is an approach to get close to the behaviour you want:
When using Microsoft SQL Server, creating a table with a leading hash in the name (#tableName) will cause the table to be disposed of when your session ends.
One way you could take advantage of this to get your desired behaviour is to copy your working table into a temporary table, and work on the temporary table instead of the live table.
To do so, use something like the following:
SELECT * INTO #tempTable FROM liveTable
This will create a complete copy of your liveTable, with all of the same columns and rows. Once you are finished, the table will be automatically dropped and no permanent changes will have been made.
This can also be useful for a series of queries which you execute on the same subset of a large data set. Selecting the subset of data into a smaller temporary table can make subsequent queries much faster than if you had to select from the full data set repeatedly.
Just keep in mind that as soon as your connection closes, all the data goes with it.
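In C#, that flow might look roughly like the sketch below; the connection string, table, and column names are placeholders, and the important part is that everything runs on the same open connection, since the temp table only lives for that session.
// requires: using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Copy the live table into a session-scoped temp table.
    using (var copy = new SqlCommand("SELECT * INTO #tempTable FROM liveTable", connection))
        copy.ExecuteNonQuery();

    // Run your experimental queries against #tempTable instead of liveTable.
    using (var test = new SqlCommand("UPDATE #tempTable SET SomeColumn = 42", connection))
        test.ExecuteNonQuery();
}
// The connection is closed here, and #tempTable is dropped automatically.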
I have a .NET application that works against a SQL Server. This app gets data from a remote third party API, and I need to insert that data to my database in a transaction.
First I delete all existing data from the tables, then I insert each row of data that I get from the API.
I wrote a stored procedure that accepts parameters and does the insert. Then I call that stored procedure in a loop, within a transaction, from .NET.
I'm guessing there's a smarter way to do this?
Thanks
If you're doing thousands or maybe even tens of thousands of rows, you can probably do best with table-valued parameters.
If you're doing more than that, then you should probably look at SQL Server's dedicated bulk insert feature. If I remember correctly, that might not work great transactionally.
Either way, TRUNCATE is way faster than DELETE.
What I've done in the past to avoid needing transactions is create two tables, and use another one to decide which is the active one. That way you always have a table with valid data and no write locks.
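As a rough sketch of the TRUNCATE-plus-bulk-insert route (the table name and the DataTable are placeholders; SqlBulkCopy is enlisted in the same transaction so the reload is all-or-nothing):
// requires: using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var transaction = connection.BeginTransaction())
    {
        // TRUNCATE is much cheaper than DELETE for clearing the whole table.
        using (var truncate = new SqlCommand("TRUNCATE TABLE dbo.ApiData", connection, transaction))
            truncate.ExecuteNonQuery();

        using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
        {
            bulkCopy.DestinationTableName = "dbo.ApiData";
            bulkCopy.WriteToServer(apiDataTable);   // apiDataTable: a DataTable built from the API rows
        }

        transaction.Commit();   // if anything above throws, disposing the transaction rolls it back
    }
}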
I have one database server, acting as the main SQL Server, containing a table to hold all data. Other database servers come in and out (different instances of SQL Server). When they come online, they need to download data from the main table (for a given time period); they then generate their own additional data into the same local SQL Server database table, and then want to update the main server with only the new data, using a C# program, through a scheduled service, every so often. Multiple additional servers could be generating data at the same time, although it's not going to be that many.
The main table will always be online. The additional non-main database table is not always online and should not be an identical copy of the main one: first it will contain a subset of the main data, then it generates its own additional data into the local table and updates the main table every so often with its updates. There could be a decent number of rows generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work, because I can't have duplicate entries in the main server and it would fail the constraint checks since some entries already exist.
You could do it in the DB or in C#. In either case you must do something like using FULL JOINs to compare the datasets. You know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into batches of 1000 rows per transaction, or experiment to find the batch size that works best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a TVP to pass it to a stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data.
And last: in C#, use Dictionary<TKey, TValue> or something similar with O(1) access time.
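For illustration, a rough sketch of that TVP-plus-MERGE combination; the table type dbo.RowType, the procedure dbo.UpsertRows, and the columns are invented, so adjust them to your schema.
// requires: using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Payload", typeof(string));
foreach (var row in localRows)                  // localRows: data gathered on the local server
    table.Rows.Add(row.Id, row.Payload);        // Id/Payload are made-up properties

using (var command = new SqlCommand("dbo.UpsertRows", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@Rows", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.RowType";
    command.ExecuteNonQuery();
}

// Inside dbo.UpsertRows the body would be roughly:
//   MERGE dbo.MainTable AS target
//   USING @Rows AS source ON target.Id = source.Id
//   WHEN MATCHED THEN UPDATE SET target.Payload = source.Payload
//   WHEN NOT MATCHED THEN INSERT (Id, Payload) VALUES (source.Id, source.Payload);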
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
I would use an IsProcessed flag in the table of the main server, and keep track of the main table's primary keys when you download data to the local db server. Then you should be able to do a delete and update against the main server again.
Here's how I would do it:
Create a stored procedure on the main table's database which receives a user-defined table type parameter with the same structure as the main table.
It should do something like:
INSERT INTO yourtable SELECT * FROM @tablevar
Or you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part) of the data from the secondary table and send it to the stored procedure as a table-valued parameter.
You could do it in batches of 1000, and each time a batch is uploaded you should mark it in the source table / source updater code.
Can you use linked servers for this? If so, it will make copying data to and from the main server much easier.
When copying data back to the main server I'd use IF EXISTS before each INSERT statement to additionally make sure there are no duplicates, and wrap all the insert statements in a transaction so that if an error occurs the transaction is rolled back.
I also agree with others on doing this in batches of 1000 or so records, so that if something goes wrong you can limit the damage.
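For instance, one way to combine the existence check, the transaction, and the ~1000-row batches; the table and column names are hypothetical, and Dapper executes the statement once per element of each batch.
// requires: using Dapper; using System.Linq;
const string insertIfMissing = @"
IF NOT EXISTS (SELECT 1 FROM dbo.MainTable WHERE Id = @Id)
    INSERT INTO dbo.MainTable (Id, Payload) VALUES (@Id, @Payload);";

// rows: the objects to upload, assumed to expose Id and Payload properties.
foreach (var batch in rows.Select((row, i) => new { row, i })
                          .GroupBy(x => x.i / 1000, x => x.row))
{
    using (var transaction = connection.BeginTransaction())
    {
        connection.Execute(insertIfMissing, batch, transaction);   // one statement per row in the batch
        transaction.Commit();
    }
}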
I have tables A, B and C in the database. I have to put the result obtained from A and B into table C.
Currently, I have an SP that returns the result of A and B to the C# application. This result is then copied into table C using System.Data.SqlClient.SqlBulkCopy. The advantage is that during the insert using bulk copy, log files are not created.
I want to avoid this extra traffic by handling the insert in the SP itself. However, it should not be using any log files. Is there any way to achieve this?
Please share your thoughts.
Volume Of Data: 150,000
Database : SQL Server 2005
The database is in the full recovery model; it cannot be changed. Is SELECT INTO useful in such a scenario?
EDIT: When I use System.Data.SqlClient.SqlBulkCopy, the operation completes in 3 minutes; with a normal insert it takes 30 minutes... This particular operation need not be recoverable; however, other operations in the database have to be recoverable - hence I cannot change the recovery mode of the whole database.
Thanks
Lijo
You can use SELECT INTO with the BULK_LOGGED recovery model in order to minimise the number of records written to the transaction log, as described in Example B of the INTO Clause documentation (MSDN):
ALTER DATABASE AdventureWorks2008R2 SET RECOVERY BULK_LOGGED;
GO
-- Put your SELECT INTO statement here
GO
ALTER DATABASE AdventureWorks2008R2 SET RECOVERY FULL;
This is also required for bulk inserts if you wish to have minimal impact on the transaction log as described in Optimizing Bulk Import Performance (MSDN):
For a database under the full recovery model, all row-insert operations that are performed during bulk import are fully logged in the transaction log. For large data imports, this can cause the transaction log to fill rapidly. For bulk-import operations, minimal logging is more efficient than full logging and reduces the possibility that a bulk-import operation will fill the log space. To minimally log a bulk-import operation on a database that normally uses the full recovery model, you can first switch the database to the bulk-logged recovery model. After bulk importing the data, switch the recovery model back to the full recovery model.
(emphasis mine)
I.e. if you don't already set the database recovery model to BULK_LOGGED before performing a bulk insert, then you won't currently be getting the benefit of minimal transaction logging with bulk inserts either, and so the transaction log won't be the source of your slowdown. (The SqlBulkCopy class doesn't do this for you automatically or anything.)
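If you do switch the database to BULK_LOGGED for the load, note that minimal logging also requires a table-level lock, which SqlBulkCopy only takes if you ask for it. A minimal sketch (the destination table and source DataTable are placeholders):
// requires: using System.Data; using System.Data.SqlClient;
using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    bulkCopy.DestinationTableName = "dbo.TableC";
    bulkCopy.BatchSize = 10000;                 // commit in chunks rather than one huge batch
    bulkCopy.WriteToServer(resultDataTable);    // resultDataTable: the combined A+B result
}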
Maybe you can use SELECT INTO.
Take a look at http://msdn.microsoft.com/en-us/library/ms191244.aspx
Can you give an example of the processing your procedure does?
Typically, I would think a set-based insert of 150,000 rows (no linked servers or anything) would take almost no time on most installations.
How long does just selecting the 150,000 rows with a query take?
Are you using a cursor and loop instead of a single INSERT INTO C SELECT * FROM (some combination of A and B)?
Is there any blocking which is causing the operation to wait for other operations to complete?
If your database is in full recovery model, it is going to log the operation - that's the point of using the database that way. The database has been told to use that model and it's going to do that to ensure it can comply.
Imagine if you told the database that a column needed to be unique but it didn't actually enforce it for you! It would be worth less than a comment on a post-it note which fell off a specification document!
In SQL Server 2008 you do not need to return the data to the client/application before proceeding with a minimally logged operation. You can do it within the stored procedure immediately following your query that produces the result to be inserted to Table C.
See Insert: Specifically "Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging"
[Edit]: Since you have expanded your question to say that you are using the FULL recovery model, you cannot benefit from minimally logged operations.
Instead you should concentrate your efforts on optimising your data insert process, rather than concerning yourself with logging overhead.
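For reference, the minimally logged INSERT…SELECT pattern mentioned above looks roughly like this; it needs the TABLOCK hint on the target and a non-FULL recovery model, so under your constraints it is shown for completeness only (table and column names are placeholders):
// requires: using System.Data.SqlClient;
const string minimalInsert = @"
INSERT INTO dbo.TableC WITH (TABLOCK) (Col1, Col2)
SELECT a.Col1, b.Col2
FROM dbo.TableA AS a
JOIN dbo.TableB AS b ON b.AId = a.Id;";

using (var command = new SqlCommand(minimalInsert, connection))
    command.ExecuteNonQuery();   // the same statement could live inside the stored procedure instead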
Insert data into table C in parts, using INSERT INTO C SELECT * FROM AandB WHERE ID < SOMETHING. Or you can send the output of the A and B data as XML to a stored procedure to insert the bulk data.
Hope this will help you.
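If you go the "in parts" route, a simple key-range loop from C# might look like this; the table names, the AandB source, and the chunk size are placeholders.
// requires: using System.Data.SqlClient;
const string copyRange = @"
INSERT INTO dbo.TableC (Id, Col1, Col2)
SELECT Id, Col1, Col2 FROM dbo.AandB
WHERE Id >= @From AND Id < @To;";

int maxId;
using (var maxCommand = new SqlCommand("SELECT MAX(Id) FROM dbo.AandB", connection))
    maxId = (int)maxCommand.ExecuteScalar();

const int chunkSize = 10000;
for (int from = 0; from <= maxId; from += chunkSize)
{
    using (var command = new SqlCommand(copyRange, connection))
    {
        command.Parameters.AddWithValue("@From", from);
        command.Parameters.AddWithValue("@To", from + chunkSize);
        command.ExecuteNonQuery();   // each chunk is a separate, smaller transaction
    }
}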
I am using SQL Server and WinForms for my application. Data is inserted into the database tables every minute by pressing a button on a form.
For this, I am using an INSERT query.
But if I create a stored procedure and include the same insert query in it, would it be more efficient? What would the difference be?
Using stored procedures is more secure
A stored procedure would generally be quicker as the query plan is stored and does not need to be created for each call. If this is a simple insert the difference would be minimal.
A stored procedure can be run with execute permissions which is more secure than giving insert permissions to the user.
It depends on what you mean by 'efficient'.
Execution time - if you're only saving to the database every couple of seconds, then any speed difference between SPs and INSERT is most likely insignificant. If the volume is especially high, you would probably set up something like a command queue on the server before fine-tuning at this level.
Development time
Using INSERT means you can write your SQL directly in your codebase (in a repository or similar). I've seen that described as poor design, but I think that as long as you have integration tests around the query there's no real problem.
Stored procedures can be more difficult to maintain - you need to have a plan for deploying the new SP to the database. The benefits are that you can implement finer-grained security on the database itself (as @b-rain and @mark_s have said), and it is easy to decide between INSERT and UPDATE within the SP, whereas doing the same in code means making certain assumptions.
Personally (at the moment) I use inline SQL for querying and deleting, and stored procedures for inserting. I have a script and a set of migration files that I can run against the production database to deploy table and SP changes, which seems to work pretty well. I also have integration tests around both the inline SQL and the SP calls. If you go for inline SQL you should definitely use parameterised queries; they help against SQL injection attacks and are also easier to read and program.
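To make the comparison concrete, here is roughly what the two options look like side by side; the table, column, and procedure names are invented. In both cases the values travel as parameters, never as concatenated SQL.
// requires: using System; using System.Data; using System.Data.SqlClient;
int sensorId = 1;          // placeholder values
decimal reading = 42.5m;

// Option 1: parameterised inline SQL.
using (var command = new SqlCommand(
    "INSERT INTO dbo.Readings (SensorId, Value, ReadAt) VALUES (@SensorId, @Value, @ReadAt)",
    connection))
{
    command.Parameters.AddWithValue("@SensorId", sensorId);
    command.Parameters.AddWithValue("@Value", reading);
    command.Parameters.AddWithValue("@ReadAt", DateTime.UtcNow);
    command.ExecuteNonQuery();
}

// Option 2: the same insert wrapped in a stored procedure (dbo.InsertReading).
using (var command = new SqlCommand("dbo.InsertReading", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.AddWithValue("@SensorId", sensorId);
    command.Parameters.AddWithValue("@Value", reading);
    command.Parameters.AddWithValue("@ReadAt", DateTime.UtcNow);
    command.ExecuteNonQuery();
}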
If your DBA is even allowing you to do this without a stored procedure I'd be very suspicious...