I am inserting records into the database (100, 1,000, 10,000 and 100,000 rows) using two methods
(the table has no primary key and no index):
using a for loop and inserting one by one
using a stored procedure
The times are, of course, better using the stored procedure.
My questions are: 1) if I use an index, will the operation go faster, and 2) is there any other way to do the insertion?
PS: I am using iBATIS as my ORM, if that makes any difference.
Check out SqlBulkCopy.
It's designed for fast insertion of bulk data. I've found it to be fastest when using the TableLock option and setting a BatchSize of around 10,000, but it's best to test the different scenarios with your own data.
You may also find the following useful.
SQLBulkCopy Performance Analysis
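For illustration, here is a minimal sketch of that setup; the connection string, table and column names are placeholders, and the BatchSize of 10,000 is just the starting point mentioned above, so benchmark against your own data.

```csharp
using System.Data;
using System.Data.SqlClient;

static void BulkInsert(string connectionString, DataTable rows)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // TableLock takes a bulk update lock for the duration of the load;
        // BatchSize controls how many rows are sent per round trip.
        using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
        {
            bulkCopy.DestinationTableName = "dbo.TargetTable"; // placeholder
            bulkCopy.BatchSize = 10000;
            bulkCopy.BulkCopyTimeout = 0; // no timeout for large loads

            // Map source columns to destination columns by name.
            foreach (DataColumn column in rows.Columns)
                bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);

            bulkCopy.WriteToServer(rows);
        }
    }
}
```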
No, I suspect that, if you use an index, it will actually go slower. That's because it has to update the index as well as inserting the data.
If you're reasonably certain that the data won't have duplicate keys, add the index after you've inserted all the rows. That way, it is built once rather than being added to and re-balanced on every insert.
That's a function of the DBMS. I know it's true for the one I use frequently (which is not SQL Server).
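If you go that route, the ordering looks roughly like this; the table, index and column names are hypothetical.

```csharp
using System.Data;
using System.Data.SqlClient;

static void LoadThenIndex(string connectionString, DataTable rows)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // 1) Bulk-load into the un-indexed table (a heap).
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.TargetTable";
            bulkCopy.WriteToServer(rows);
        }

        // 2) Build the index once, after all rows are in place,
        //    instead of maintaining it on every insert.
        using (var createIndex = new SqlCommand(
            "CREATE INDEX IX_TargetTable_Key ON dbo.TargetTable (KeyColumn);", connection))
        {
            createIndex.CommandTimeout = 0; // index builds on big tables can take a while
            createIndex.ExecuteNonQuery();
        }
    }
}
```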
I know this is slightly off-topic, but it's a shame you're not using SQL Server 2008, as there's been a massive improvement in this area with the advent of the MERGE statement and user-defined table types (which allow you to pass in a 'table' of data to the stored procedure or statement so you can insert/update many records in one go).
For some more information, have a look at http://www.sql-server-helper.com/sql-server-2008/merge-statement-with-table-valued-parameters.aspx
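To give an idea of the client side, this is roughly how a table-valued parameter is passed from C#; the table type dbo.RecordType and the procedure dbo.UpsertRecords (which would contain the MERGE) are assumptions.

```csharp
using System.Data;
using System.Data.SqlClient;

static void UpsertRecords(string connectionString, DataTable records)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.UpsertRecords", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        // The whole DataTable travels to the server in a single call.
        var parameter = command.Parameters.AddWithValue("@Records", records);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.RecordType"; // the user-defined table type

        connection.Open();
        command.ExecuteNonQuery();
    }
}
```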
This was already discussed: Insert data into SQL server with best performance.
Insert Query vs SqlBulkCopy - which one is better performance-wise for inserting records from one database to another?
I know SqlBulkCopy is used for a large number of records.
If there are fewer than 10 records, which one is better?
Please share your views.
Since you are asking about fewer than 10 records, I would suggest you use a simple insert query.
But if you want to use SqlBulkCopy, then you should first know when to use it.
Read For Knowledge
BULK INSERT
The BULK INSERT command is the in-process method for bringing data from a text file into SQL Server. Because it runs in process with Sqlservr.exe, it is a very fast way to load data files into SQL Server.
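For fewer than 10 records, the simple insert query can look like this; the table, columns and record shape are placeholders.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void InsertFewRecords(string connectionString, IEnumerable<(int Id, string Name)> records)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "INSERT INTO dbo.TargetTable (Id, Name) VALUES (@Id, @Name);", connection))
    {
        command.Parameters.Add("@Id", SqlDbType.Int);
        command.Parameters.Add("@Name", SqlDbType.NVarChar, 100);

        connection.Open();
        using (var transaction = connection.BeginTransaction())
        {
            command.Transaction = transaction;
            foreach (var record in records) // a handful of rows at most
            {
                command.Parameters["@Id"].Value = record.Id;
                command.Parameters["@Name"].Value = record.Name;
                command.ExecuteNonQuery();
            }
            transaction.Commit();
        }
    }
}
```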
IMO, in your case, using SqlBulkCopy is way overkill.
You can use a User-defined Type (UDT) as a table-valued parameter (TVP) passed to a stored procedure if you have less than 1,000 rows or so. After that threshold, the performance gain of SqlBulkCopy begins to outweigh its initial overhead. SqlBulkCopy works best with Heap Tables (tables without clustered indexes).
I found this article helpful for my own solution - https://www.sentryone.com/blog/sqlbulkcopy-vs-table-valued-parameters-bulk-loading-data-into-sql-server.
Here's a link to a site with some benchmarks.
I've used SqlBulkCopy extensively in the past and it's very efficient. However, you may need to redesign your tables to be able to use it effectively. For example, when you use SqlBulkCopy you may want to use client side generated identifiers, so you will have to invent some scheme that allows you to generate stable, robust and predictable IDs. This is very different from your typical insert into table with auto generated identity column.
As an alternative to SqlBulkCopy, you can use the method discussed in the link I provided, where you use a combination of UNION ALL and INSERT INTO statements; it has excellent performance as well. Although, as the dataset grows, I think SqlBulkCopy will be the faster option. A hybrid approach is probably warranted here, where you switch to SqlBulkCopy when the record count is high.
I reckon SqlBulkCopy will win out for larger data sets, but you need to understand that a SqlBulkCopy operation can be used to forgo some checks. This will of course speed things up even more, but it also allows you to violate conditions that you have imposed on your database.
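A sketch of that hybrid approach might look like the following. The 1,000-row threshold echoes the TVP-vs-SqlBulkCopy observation above but should be tuned by measuring your own workload, and InsertRowByRow is a hypothetical helper standing in for the plain-insert path.

```csharp
using System.Data;
using System.Data.SqlClient;

static void InsertRows(string connectionString, DataTable rows)
{
    const int bulkCopyThreshold = 1000; // tune by benchmarking your own data

    if (rows.Rows.Count < bulkCopyThreshold)
    {
        InsertRowByRow(connectionString, rows); // ordinary parameterized INSERTs (not shown)
        return;
    }

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = rows.TableName; // assumes TableName matches the destination
            bulkCopy.WriteToServer(rows);
        }
    }
}
```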
I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.
So my first thought was to create a list of SQL insert/update statements and execute them all at once using SqlCommand.ExecuteNonQuery().
Another method I found was shown by AMissico: Method
where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.
Then I found out about SqlBulkCopy, but it seems that I would have to create a DataTable with the data I want to upload. So, one SqlBulkCopy for every table; for this I could create a DataSet.
I think every option supports SqlTransaction. It's approximately 100 - 20000 records per table.
Which option would you prefer and why?
You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.
C#: You'll have to send all data back and forth once, but C# is a far better language for complex logic. Depending on what you do it can be orders of magnitude faster.
T-SQL: No need to copy data to the client but you have to live with the capabilities and perf profile of T-SQL.
Depending on your case one might be far faster than the other (not clear which one).
If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
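A minimal sketch of that C# path; the per-row processing step is elided, and the staging table, target table and MERGE text are assumptions.

```csharp
using System.Data;
using System.Data.SqlClient;

static void CopyAndMerge(string connectionString)
{
    using (var source = new SqlConnection(connectionString))
    using (var destination = new SqlConnection(connectionString))
    {
        source.Open();
        destination.Open();

        // Single streaming read feeding a single bulk write into a staging table.
        // In practice your C# transformation would sit between the reader and the bulk copy.
        using (var select = new SqlCommand("SELECT Id, Payload FROM dbo.SourceTable;", source))
        using (var reader = select.ExecuteReader())
        using (var bulkCopy = new SqlBulkCopy(destination))
        {
            bulkCopy.DestinationTableName = "dbo.StagingTable";
            bulkCopy.WriteToServer(reader); // streams rows without buffering them all in memory
        }

        // One set-based statement applies the staged rows to the target.
        using (var merge = new SqlCommand(@"
            MERGE dbo.TargetTable AS t
            USING dbo.StagingTable AS s ON t.Id = s.Id
            WHEN MATCHED THEN UPDATE SET t.Payload = s.Payload
            WHEN NOT MATCHED THEN INSERT (Id, Payload) VALUES (s.Id, s.Payload);", destination))
        {
            merge.ExecuteNonQuery();
        }
    }
}
```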
If you want to stay in T-SQL minimize the number of statements executed. Use set-based logic.
All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.
Don't do it from C# unless you have to; it's a huge overhead, and SQL can do it so much faster and better by itself.
Insert to table from XML file using INSERT INTO SELECT
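Roughly, the set-based version looks like this. The file path, XML layout, and table/column names are placeholders, the file must be readable by the SQL Server instance, and it is wrapped in a single ExecuteNonQuery only so it can be kicked off from the application.

```csharp
using System.Data.SqlClient;

static void ImportXmlFile(string connectionString)
{
    const string sql = @"
        DECLARE @doc xml;

        -- Load the file server-side in one go.
        SELECT @doc = CAST(BulkColumn AS xml)
        FROM OPENROWSET(BULK 'C:\data\items.xml', SINGLE_BLOB) AS src;

        -- Shred the XML and insert all rows with a single set-based statement.
        INSERT INTO dbo.TargetTable (Id, Name)
        SELECT i.value('(Id/text())[1]',   'int'),
               i.value('(Name/text())[1]', 'nvarchar(100)')
        FROM @doc.nodes('/Items/Item') AS x(i);";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        connection.Open();
        command.CommandTimeout = 0;
        command.ExecuteNonQuery();
    }
}
```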
I have a very large number of rows (10 million) which I need to select out of a SQL Server table. I will go through each record, parse it (they are XML), and then write each one back to the database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is that I open two SqlConnections (one for reading, one for writing). The read connection uses a SqlDataReader that basically does a SELECT * FROM the table, and I loop through the results. After I parse each record, I do an ExecuteNonQuery (using parameters) on the second connection.
Are there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one-by-one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14-byte row versioning tag into each written row. That might be totally acceptable, or not.
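For what it's worth, a sketch of the single-connection, batched pattern might look like this. MultipleActiveResultSets enables MARS, and Parse and WriteBatch are hypothetical stand-ins for your XML parsing and for whatever batched write you choose (for example a TVP stored-procedure call); the chunk size is something to tune.

```csharp
using System.Data;
using System.Data.SqlClient;

static void ParseAndRewrite(string connectionString)
{
    var builder = new SqlConnectionStringBuilder(connectionString)
    {
        MultipleActiveResultSets = true // lets the writes share the connection with the open reader
    };

    using (var connection = new SqlConnection(builder.ConnectionString))
    {
        connection.Open();

        var batch = new DataTable();
        batch.Columns.Add("Id", typeof(int));
        batch.Columns.Add("ParsedValue", typeof(string));

        using (var select = new SqlCommand("SELECT Id, XmlPayload FROM dbo.SourceTable;", connection))
        using (var reader = select.ExecuteReader())
        {
            while (reader.Read())
            {
                batch.Rows.Add(reader.GetInt32(0), Parse(reader.GetString(1))); // Parse: your XML logic

                if (batch.Rows.Count == 5000) // write in chunks, not one row at a time
                {
                    WriteBatch(connection, batch); // hypothetical helper, e.g. a TVP call
                    batch.Clear();
                }
            }
        }

        if (batch.Rows.Count > 0)
            WriteBatch(connection, batch);
    }
}
```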
I had a very similar situation, and here is what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In config, I kept two connection strings, ConnectionRead and ConnectionWrite.
Now, in the data layer, when I have a read statement (SELECT...), I switch to the ConnectionRead connection string, and when writing I use the other one.
Since I have to keep both databases in sync, I am using SQL replication for this job.
I understand the implementation depends on many aspects, but this approach may help you.
I agree with Tim Schmelter's post; I did something very similar. I used a SQLCLR procedure that read the data from an XML column in a SQL table into an in-memory table using .NET (System.Data), then used the System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination table with the parsed attributes I needed.
SQL Server is engineered for set-based operations. If ever I'm shredding/iterating (row by row), I tend to use SQLCLR, as .NET is generally better at iterative, data-manipulation processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.
I have a scenario where I need to synchronize a database table with a list (XML) from an external system.
I am using EF but am not sure which would be the best way to achieve this in terms of performance.
There are two ways to do this as I see it, but neither seems efficient to me.
Call the DB each time:
- Read each entry from the XML
- Try to retrieve the entry from the list
- If no entry is found, add the entry
- If found, update the timestamp
- At the end of the loop, delete all entries with an older timestamp
Load all objects and work in memory:
- Read all EF objects into a list
- Delete all EF objects
- Add an item for each item in the XML
- Save changes to the DB
The lists are not that long, estimating around 70k rows. I don't want to clear the DB table before inserting the new rows, as this table is a source of data for a web service, and I don't want to lock the table while it's possible to query it.
If I were doing this in T-SQL, I would most likely insert the rows into a temp table and join to find the missing and deleted entries, but I have no idea what the best way to handle this in Entity Framework would be.
Any suggestions / ideas ?
The general problem with Entity Framework is that, when changing data, it will fire a query for each changed record anyway, regardless of lazy or eager loading. So by nature, it will be extremely slow (think of a factor of 1000+).
My suggestion is to use a stored procedure with a table-valued parameter and ignore Entity Framework altogether. You could use a MERGE statement.
70k rows is not much, but 70k insert/update/delete statements is always going to be very slow.
You could test it and see if the performance is manageable, but intuition says Entity Framework is not the way to go.
I would iterate over the elements in the XML and update the corresponding row in the DB one at a time. I guess that's what you meant with your first option? As long as you have a good query plan to select each row, that should be pretty efficient. Like you said, 70k rows isn't that much so you are better off keeping the code straightforward rather than doing something less readable for a little more speed.
It depends. It's OK to use EF if there won't be many changes (say, fewer than a few hundred). Otherwise, you need to bulk insert into the DB and merge the rows inside the database.
When I want to insert a huge amount of data into the database, which way is more efficient?
1. Insert the data row by row by calling an insert statement
2. Create a user-defined table type in my database, and write a stored procedure to insert a DataTable into the database
and why?
The most efficient would be SQL Bulk Insert.
To improve performance further, you can use SqlBulkCopy to SQL Server in parallel.
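A rough sketch of parallel loads, each with its own connection; the partitioning scheme, degree of parallelism and table name are assumptions, so benchmark before adopting this.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

static void ParallelBulkInsert(string connectionString, IEnumerable<DataTable> partitions)
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // tune for your hardware

    Parallel.ForEach(partitions, options, partition =>
    {
        // Each partition gets its own connection and its own bulk copy.
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.TargetTable";
                bulkCopy.BatchSize = 10000;
                bulkCopy.WriteToServer(partition);
            }
        }
    });
}
```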
This will probably vary based on the specific workload.
We can only tell you here what options are available to you - it is down to you to measure how efficient each option is in your specific case.
Options:
SQLBulkCopy
INSERT statements with single row
INSERT statements with multiple rows (experiment with different numbers of rows; see the sketch after this list)
Stored procedure (with variations on the above)
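As an illustration of the multi-row INSERT option, one statement can carry several parameterized VALUES rows. Keep in mind SQL Server caps a VALUES list at 1,000 rows and a command at roughly 2,100 parameters, so larger sets have to be chunked; the table and columns here are placeholders.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Text;

static void InsertMultiRow(string connectionString, IList<(int Id, string Name)> rows)
{
    var sql = new StringBuilder("INSERT INTO dbo.TargetTable (Id, Name) VALUES ");

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand { Connection = connection })
    {
        for (int i = 0; i < rows.Count; i++) // assumes rows.Count is small enough for one statement
        {
            if (i > 0) sql.Append(", ");
            sql.Append($"(@Id{i}, @Name{i})");
            command.Parameters.AddWithValue($"@Id{i}", rows[i].Id);
            command.Parameters.AddWithValue($"@Name{i}", rows[i].Name);
        }

        command.CommandText = sql.ToString();
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```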
Option 2 is much, much faster, because anything that involves per-row operations (RBAR) is inherently slow. I did quite a lot of testing with table-valued parameters and found them to be close to the performance of bulk inserts for batch sizes of up to 1,000,000 rows. However, that's purely anecdotal and your mileage may vary.
Also, I would advise against using a DataTable; instead, write something that streams the rows more directly. I blogged about that here.
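I don't know exactly which approach that post describes, but one common way to stream a TVP without building a DataTable is to yield SqlDataRecord instances, which ADO.NET enumerates lazily while sending the parameter. The table type and procedure names below are assumptions.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

static IEnumerable<SqlDataRecord> StreamRows(IEnumerable<(int Id, string Name)> source)
{
    var meta = new[]
    {
        new SqlMetaData("Id", SqlDbType.Int),
        new SqlMetaData("Name", SqlDbType.NVarChar, 100)
    };

    foreach (var row in source)
    {
        var record = new SqlDataRecord(meta);
        record.SetInt32(0, row.Id);
        record.SetString(1, row.Name);
        yield return record; // rows are produced on demand, never all held in memory
    }
}

static void SaveStreamed(string connectionString, IEnumerable<(int Id, string Name)> source)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.SaveRecords", connection)) // hypothetical procedure
    {
        command.CommandType = CommandType.StoredProcedure;

        var parameter = command.Parameters.Add("@Records", SqlDbType.Structured);
        parameter.TypeName = "dbo.RecordType";  // hypothetical user-defined table type
        parameter.Value = StreamRows(source);   // streamed, no intermediate DataTable

        connection.Open();
        command.ExecuteNonQuery();
    }
}
```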