I need to dramatically improve the write speed of SQLite (or find another solution for this outside of SQLite).
Scenario :
I have 71 columns with 365 * 24 * 60 values each (365 days, one value per minute).
I run INSERT INTO statements to test the database's write performance.
To shorten the testing time I ran the tests for 90 days instead of 365 (so the real timings will be roughly 4x the results below).
Settings :
I've tried various PRAGMAS like
synchronous = OFF
locking_mode = EXCLUSIVE
cache_size and page_size with different values (I had read that low values may improve performance, but higher values worked better for me)
journal_mode = OFF
changing timeout values
Approaches :
#A1 Gather all INSERT INTO commands, run ExecuteNonQuery for each, and commit one giant transaction at the end
#A2 The same as above, but with Parallel.ForEach and ExecuteNonQueryAsync
#A3 Gather the INSERT INTO commands for one day at a time and commit one transaction per day
Table structure :
#T1 One table containing all the columns
#T2 One table per column
Results :
I did runs for 90 days (so they don't take too long), and the main problem is write speed.
I measured 5 phases, which are:
#P1 set up the tables & headers (~8-9 ms)
#P2 prepare the data (call ExecuteNonQuery for every INSERT INTO command) (~15000-18000 ms!)
#P3 commit the transaction (~200-500 ms)
#P4 read one complete column (~80-200 ms)
#P5 delete one complete column (~1-9 ms)
I tried all the methods and approaches mentioned above but couldn't improve #P2. Any ideas how to fix that, or any hints for a better serverless database (Realm?)?
Here's the code for #A1 / #P2 / #T2, which had the best results so far...
using (var transaction = sqLiteConnection.BeginTransaction())
{
    using (var command = sqLiteConnection.CreateCommand())
    {
        foreach (var vcommand in values_list)
        {
            command.CommandText = vcommand;
            command.ExecuteNonQuery();
        }
    }
    transaction.Commit();
}
(values_list is a string[] with 71*90 INSERT INTO commands, or in Mark's version one giant command.)
Edit/Update :
I tried the approach by Mark Benningfield of making one giant INSERT INTO for all values in one table with all columns, and could improve the overall speed to ~8500 ms (#P2 ~7500 ms).
Final Update :
Ok I did a bunch of tests and will summarize the results :
For comparison, all databases were given the same values: a two-dimensional double array with [129600, 71] values. None of them used a prepared insert statement, so the time for transforming the values into the required format is included (phase 2).
SQLite needs ~14 seconds with one giant transaction (the previous ~8 s were measured without generating the INSERT INTO command live). SQL CE is currently the best fit for this scenario, mainly because it does not operate on strings ("INSERT INTO") but on DataTables and rows plus a bulk insert. Realm is interesting, especially for mobile users - very intuitive - but you currently cannot add dynamic objects (so you need a static object). InfluxDB is another nice time-series database, but it's very specialized, not embedded, and IMHO has a poor C# implementation (it may perform much better via the console).
Have you tried writing the data to a text file and then using the import command (see "Importing CSV files" in the SQLite documentation)? Unlike INSERT statements, these routines usually ignore triggers and work with direct table access.
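For illustration, a rough sketch of that route (assuming the sqlite3 command-line shell is on the PATH and a table named samples already exists with matching columns - both names are placeholders, not from the question):

// Hypothetical sketch: dump the [rows, columns] double array to CSV and let
// the sqlite3 shell's .import command load it, bypassing per-row SQL entirely.
using System.Diagnostics;
using System.Globalization;
using System.IO;

static void ImportViaCsv(double[,] values, string dbPath)
{
    string csvPath = Path.Combine(Path.GetTempPath(), "samples.csv");
    using (var writer = new StreamWriter(csvPath))
    {
        for (int row = 0; row < values.GetLength(0); row++)
        {
            var fields = new string[values.GetLength(1)];
            for (int col = 0; col < values.GetLength(1); col++)
                fields[col] = values[row, col].ToString(CultureInfo.InvariantCulture);
            writer.WriteLine(string.Join(",", fields));
        }
    }

    // Feed the dot-commands to the shell via stdin (paths with spaces would need quoting).
    var psi = new ProcessStartInfo("sqlite3", dbPath)
    {
        UseShellExecute = false,
        RedirectStandardInput = true
    };
    using (var proc = Process.Start(psi))
    {
        proc.StandardInput.WriteLine(".mode csv");
        proc.StandardInput.WriteLine($".import {csvPath} samples");
        proc.StandardInput.Close();
        proc.WaitForExit();
    }
}

Whether this beats a prepared-statement transaction depends on the data volume, but it avoids building the INSERT strings in C# altogether.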
Make your insert command look like this (by constructing it however you need to):
INSERT INTO table (col1, col2, col3) VALUES (val1, 'val2', val3),
(val1, 'val2', val3),
(val2, 'val2', val3),
...
(val1, 'val2', val3);
Then execute the single insert command to do a bulk update of known data.
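For reference, a rough sketch of how such a command could be assembled in C# from the question's [rows, columns] double array (the table name data is a placeholder, sqLiteConnection is the question's open connection, and note that older SQLite builds cap the number of rows per VALUES clause, so chunking may still be needed):

// Build one multi-row INSERT and execute it a single time.
var sb = new System.Text.StringBuilder("INSERT INTO data VALUES ");
for (int row = 0; row < values.GetLength(0); row++)
{
    sb.Append(row == 0 ? "(" : ",(");
    for (int col = 0; col < values.GetLength(1); col++)
    {
        if (col > 0) sb.Append(',');
        sb.Append(values[row, col].ToString(System.Globalization.CultureInfo.InvariantCulture));
    }
    sb.Append(')');
}
using (var command = sqLiteConnection.CreateCommand())
{
    command.CommandText = sb.ToString();
    command.ExecuteNonQuery();
}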
The fastest and recommended way to do bulk inserts is to use prepared statements with parameters. That way, a statement (command) is only parsed and prepared once, instead of having to parse it again for every row. SQLite also does not have to parse the parameter values from the command text, but they are supplied and used directly. For each row, you only switch parameters.
So instead of going this way:
using (var transaction = sqLiteConnection.BeginTransaction())
{
    using (var command = sqLiteConnection.CreateCommand())
    {
        foreach (var vcommand in values_list)
        {
            command.CommandText = vcommand;
            command.ExecuteNonQuery();
        }
    }
    transaction.Commit();
}
You should do it like this:
using (var transaction = sqLiteConnection.BeginTransaction())
{
    using (var command = sqLiteConnection.CreateCommand())
    {
        // Create command and parameters
        command.CommandText = "INSERT INTO MyTable VALUES (?, ?)";
        var param1 = command.Parameters.Add(null, SqliteType.Integer);
        var param2 = command.Parameters.Add(null, SqliteType.Text);

        foreach (var item in values_list)
        {
            // For each row, only update parameter values
            param1.Value = item.IntProperty;
            param2.Value = item.TextProperty;
            command.ExecuteNonQuery();
        }
    }
    transaction.Commit();
}
This will perform much better. The statement is only parsed on the first execute; all following executes use the already prepared statement. It also safeguards you against SQL injection attacks: text parameters are not inserted into the actual SQL statement string (which would allow manipulation of your statement), but are passed directly as values to SQLite. So you not only gain performance, you also prevent one of the most common database attack scenarios.
General rule: Never put values directly into SQL statements. Always use parameters instead.
Note: There are also other ways to create and add parameters; this is just one example. For instance, you can also use named parameters:
// Create command and parameters
command.CommandText = "INSERT INTO MyTable VALUES (#one, #two)";
var param1 = command.Parameters.Add("#one", SqliteType.Integer);
var param2 = command.Parameters.Add("#two", SqliteType.Text);
Related
I've been looking at the Postgres multi-row/value insert, which looks something like this in pure SQL:
insert into table (col1, col2, col3) values (1,2,3), (4,5,6)....
The reason I want to use this is that I have a lot of data to insert that is arriving via a queue, which I'm batching into 500/1000-record inserts at a time to improve performance.
However, I have been unable to find an example of doing this from within C#; everything I can find adds only a single record's parameters at a time and then executes, which is too slow.
I have this working using Dapper currently, but I need to expand the SQL to an upsert (insert, on conflict update), which everything I have found indicates Dapper can't handle. I have found evidence that Postgres can handle an upsert and multi-row values in a single statement.
Tom
I may not have understood your question completely, but for bulk inserts in PostgreSQL this is a good answer.
It gives an example of inserting multiple records from a list (RecordList) into a table (user_data.part_list):
using (var writer = conn.BeginBinaryImport(
    "copy user_data.part_list from STDIN (FORMAT BINARY)"))
{
    foreach (var record in RecordList)
    {
        writer.StartRow();
        writer.Write(record.UserId);
        writer.Write(record.Age, NpgsqlTypes.NpgsqlDbType.Integer);
        writer.Write(record.HireDate, NpgsqlTypes.NpgsqlDbType.Date);
    }
    writer.Complete();
}
COPY is the fastest way but does not work if you want to do UPSERTS with an ON CONFLICT ... clause.
If it's necessary to use INSERT, ingesting n rows (with possibly varying n per invocation) can be elegantly done using UNNEST like
INSERT INTO table (col1, col2, ..., coln) SELECT UNNEST(@p1), UNNEST(@p2), ... UNNEST(@pn);
The parameters p then need to be an array of the matching type. Here's an example for an array of ints:
new NpgsqlParameter()
{
    ParameterName = "p1",
    Value = new int[] { 1, 2, 3 },
    NpgsqlDbType = NpgsqlDbType.Array | NpgsqlDbType.Integer
}
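A fuller sketch of the UNNEST route (the table, column, and property names are made up; conn is assumed to be an open NpgsqlConnection and batch one chunk of the queued records):

// One round trip inserts the whole batch; the arrays are expanded row-wise.
using (var cmd = new NpgsqlCommand(
    "INSERT INTO measurements (sensor_id, reading) SELECT UNNEST(@ids), UNNEST(@readings)", conn))
{
    cmd.Parameters.Add(new NpgsqlParameter("ids", NpgsqlDbType.Array | NpgsqlDbType.Integer)
    {
        Value = batch.Select(r => r.SensorId).ToArray()
    });
    cmd.Parameters.Add(new NpgsqlParameter("readings", NpgsqlDbType.Array | NpgsqlDbType.Double)
    {
        Value = batch.Select(r => r.Reading).ToArray()
    });
    cmd.ExecuteNonQuery();
}

The same statement can end with an ON CONFLICT ... DO UPDATE clause, which covers the upsert requirement from the question.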
If you want to insert many records efficiently, you probably want to take a look at Npgsql's bulk copy API, which doesn't use SQL and is the most efficient option available.
Otherwise, there's nothing special about inserting two rows rather than one:
insert into table (col1, col2, col3) values (@p1_1,@p1_2,@p1_3), (@p2_1,@p2_2,@p2_3)....
Simply add the parameters with the correct name and execute just as you would any other SQL.
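For example, one batch could be built like this (items stands for the queued records and its properties are placeholders; conn is assumed to be an open NpgsqlConnection):

// Build a multi-row VALUES list with numbered parameters for one batch.
var sql = new System.Text.StringBuilder("insert into table (col1, col2, col3) values ");
using (var cmd = new NpgsqlCommand())
{
    cmd.Connection = conn;
    for (int i = 0; i < items.Count; i++)
    {
        if (i > 0) sql.Append(',');
        sql.AppendFormat("(@a{0},@b{0},@c{0})", i);
        cmd.Parameters.AddWithValue("a" + i, items[i].Col1);
        cmd.Parameters.AddWithValue("b" + i, items[i].Col2);
        cmd.Parameters.AddWithValue("c" + i, items[i].Col3);
    }
    // For the upsert case, append e.g. " on conflict (col1) do update set col2 = excluded.col2".
    cmd.CommandText = sql.ToString();
    cmd.ExecuteNonQuery();
}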
I get a list of IDs and amounts from an Excel file (thousands of IDs and corresponding amounts). I then need to check the database to see whether each ID exists, and if it does, check that the amount in the DB is greater than or equal to the amount from the Excel file.
The problem is that running this SELECT statement upwards of 6000 times and returning the values I need takes a long time. Even at half a second apiece it will take about an hour to do all the selects. (I normally don't get more than 5 results back.)
Is there a faster way to do this?
Is it possible to somehow pass all the IDs at once, make just one call, and get the whole collection back?
I have tried using SqlDataReaders and SqlDataAdapters, but they seem to be about the same (too long either way).
General idea of how this works below
for (int i = 0; i < ID.Count; i++)
{
    SqlCommand cmd = new SqlCommand("select Amount, Client, Pallet from table where ID = @ID and Amount > 0;", sqlCon);
    cmd.Parameters.Add("@ID", SqlDbType.VarChar).Value = ID[i];
    SqlDataAdapter da = new SqlDataAdapter(cmd);
    da.Fill(dataTable);
    da.Dispose();
}
Instead of a long IN list (which is difficult to parameterise and has a number of other inefficiencies regarding execution plans: compilation time, plan reuse, and the plans themselves), you can pass all the values in at once via a table-valued parameter.
See arrays and lists in SQL Server for more details.
Generally I make sure to give the table type a primary key and use option (recompile) to get the most appropriate execution plans.
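A rough sketch of that route, reusing the question's variables (the type name dbo.IdList and the SQL text are illustrative; the table type would be created once up front with something like CREATE TYPE dbo.IdList AS TABLE (ID varchar(50) PRIMARY KEY)):

// Fill a DataTable matching the table type, pass it as one Structured parameter,
// and get all matching rows back in a single round trip.
var idTable = new DataTable();
idTable.Columns.Add("ID", typeof(string));
foreach (var id in ID)
    idTable.Rows.Add(id);

using (var cmd = new SqlCommand(
    "select t.Amount, t.Client, t.Pallet from table t " +
    "join @ids i on i.ID = t.ID where t.Amount > 0 option (recompile);", sqlCon))
{
    var p = cmd.Parameters.AddWithValue("@ids", idTable);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.IdList";
    new SqlDataAdapter(cmd).Fill(dataTable);
}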
Combine all the IDs together into a single large IN clause, so it reads like:
select Amount, Client, Pallet from table where ID in (1,3,5,7,9,11) and Amount > 0;
"I have tried using SqlDataReaders and SqlDataAdapters"
It sounds like you might be open to other APIs. Using Linq2SQL or Linq2Entities:
var someListIds = new List<int> { 1,5,6,7 }; //imagine you load this from where ever
db.MyTable.Where( mt => someListIds.Contains(mt.ID) );
This is safe in terms of avoiding potential SQL injection vulnerabilities and will generate an IN clause. Note, however, that someListIds can become so large that the generated SQL query exceeds the query-length limit; the same is true of any other technique involving an IN clause. You can easily work around that by partitioning the list into large chunks, and you will still be tremendously better off than with one query per ID.
Use Table-Valued Parameters
With them you can pass a C# DataTable with your values into a stored procedure as a result set/table, which you can join to and do a simple:
SELECT *
FROM YourTable
WHERE NOT EXISTS (SELECT * FROM InputResultSet WHERE YourConditions)
Use the IN operator. Your problem is very common and it has a name: the N+1 performance problem.
Where are you getting the IDs from? If it is from another query, then consider combining the two into one.
Rather than performing a separate query for every single ID that you have, execute one query to get the amounts of all the IDs you want to check (or, if you have too many IDs to put in one query, batch them into groups of a few thousand, as sketched below).
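A sketch of that batching, reusing the question's ID list, sqlCon, and dataTable (the batch size is arbitrary, and System.Linq is assumed to be imported):

// Query the amounts in parameterized batches instead of one query per ID.
const int batchSize = 1000;
for (int offset = 0; offset < ID.Count; offset += batchSize)
{
    var batch = ID.Skip(offset).Take(batchSize).ToList();
    var names = batch.Select((_, i) => "@id" + i).ToList();
    var sql = "select ID, Amount, Client, Pallet from table where Amount > 0 and ID in ("
              + string.Join(",", names) + ");";
    using (var cmd = new SqlCommand(sql, sqlCon))
    {
        for (int i = 0; i < batch.Count; i++)
            cmd.Parameters.AddWithValue(names[i], batch[i]);
        new SqlDataAdapter(cmd).Fill(dataTable);
    }
}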
Import the data directly into SQL Server. Use a stored procedure to output the data you need.
If you must consume it in the app tier... use the xml data type to pass it into a stored procedure.
You can import the data from the Excel file into SQL Server as a table (using the import data wizard). Then you can perform a single query in SQL Server where you join this table to your lookup table, joining on the ID field. There are a few more steps to this process, but it's a lot neater than trying to concatenate all the IDs into a much longer query.
I'm assuming a certain amount of access privileges to the server here, but this is what I'd do given the access I normally have. I'm also assuming this is a one-off task. If not, the import of the data into SQL Server can be done programmatically as well.
The IN clause has limits, so if you go with that approach, make sure you use a batch size that processes X IDs at a time, otherwise you will hit another issue.
As @Robertharvey has noted, if there are not a lot of IDs and there are no transactions occurring, just pull all the IDs at once into memory into a dictionary-like object and process them there. Six thousand values is not a lot, and a single select could return all of them within a few seconds.
Just remember that if another process is updating the data, your local cached version may be stale.
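A minimal sketch of that idea, again reusing the question's variables (the Amount column is assumed to be a decimal, and excelRows stands for the parsed spreadsheet rows):

// Pull every ID/amount once, then compare in memory.
var dbAmounts = new Dictionary<string, decimal>();
using (var cmd = new SqlCommand("select ID, Amount from table where Amount > 0;", sqlCon))
using (var reader = cmd.ExecuteReader())
{
    while (reader.Read())
        dbAmounts[reader.GetString(0)] = reader.GetDecimal(1);
}

foreach (var row in excelRows) // hypothetical (Id, Amount) pairs from the Excel file
{
    bool ok = dbAmounts.TryGetValue(row.Id, out var dbAmount) && dbAmount >= row.Amount;
    // flag or collect the rows where ok == false
}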
There is another way to handle this: make an XML document of the IDs and pass it to a procedure. Here is the code for the procedure.
IF OBJECT_ID('GetDataFromDatabase') IS NOT NULL
BEGIN
    DROP PROCEDURE GetDataFromDatabase
END
GO

--Definition
CREATE PROCEDURE GetDataFromDatabase
    @xmlData XML
AS
BEGIN
    DECLARE @DocHandle INT
    DECLARE @idList TABLE (id INT)

    EXEC SP_XML_PREPAREDOCUMENT @DocHandle OUTPUT, @xmlData;
    INSERT INTO @idList (id) SELECT x.id FROM OPENXML(@DocHandle, '//data', 2) WITH ([id] INT) x
    EXEC SP_XML_REMOVEDOCUMENT @DocHandle;

    --SELECT * FROM @idList
    SELECT t.Amount, t.Client, t.Pallet FROM yourTable t INNER JOIN @idList x ON t.id = x.id AND t.Amount > 0;
END
GO

--Uses
EXEC GetDataFromDatabase @xmlData = '<root><data><id>1</id></data><data><id>2</id></data></root>'
You can put any logic in the procedure. You can pass IDs and amounts together via XML as well, and you can pass a huge list of IDs this way.
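The caller side could look roughly like this (reusing the question's ID list and connection; the IDs are assumed not to need XML escaping):

// Serialize the IDs into the XML shape the procedure expects and pass it as one parameter.
var xml = new System.Text.StringBuilder("<root>");
foreach (var id in ID)
    xml.AppendFormat("<data><id>{0}</id></data>", id);
xml.Append("</root>");

using (var cmd = new SqlCommand("GetDataFromDatabase", sqlCon))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@xmlData", SqlDbType.Xml).Value = xml.ToString();
    new SqlDataAdapter(cmd).Fill(dataTable);
}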
SqlDataAdapter objects are too heavy for this.
Firstly, using stored procedures will be faster.
Secondly, use a set-based operation: pass the list of identifiers to the database side as a parameter, run a query against those values, and return the processed result.
It will be quick and efficient, as all the data-processing logic stays on the database server.
You can select the whole result set (or join multiple 'limited' result sets) and save it all into a DataTable. Then you can do selects and updates (if needed) directly on the DataTable and push the new data back afterwards. This is not super efficient memory-wise, but it is often a very good (and sometimes the only) solution when you work in bulk and need it to be very fast.
So if you have thousands of records, it might take a couple of minutes to populate all the records into the DataTable,
then you can search your table like this:
string findMatch = "id = value";
DataRow[] rowsFound = dataTable.Select(findMatch);
Then just loop foreach (DataRow dr in rowsFound)
I am using a C# application in order to manage a MySQL database.
What I want to do is:
Read some records.
Run some functions to calculate "stuff".
Insert "stuff" to database.
In order to calculate the n-th "stuff", I must have already calculated the (n-1)-th "stuff".
This is what I do:
Declare:
static MySqlCommand cmd;
static MySqlDataReader dr;
My main loop looks like the following:
for (...)
{
    dr.Close();
    cmd.CommandText = "insert into....";
    dr = cmd.ExecuteReader();
}
This is taking way too long. The total number of rows to be inserted is about 2.5M.
When I use the MySQL database on a regular server, it takes about 100-150 hours. When I use a localhost database, it takes about 50 hours.
I think there should be a quicker way. My thoughts:
I think I connect to the db and disconnect from it on every loop iteration. Is that true?
I could create a CommandText that contains, for example, 100 queries (separated by semicolons). Is this possible?
Instead of executing the queries, output them to a text file (the file will be about 300 MB), then insert them into the db using phpMyAdmin. (Bonus question: I'm using phpMyAdmin. Is this OK? Is there a better (maybe not web-based) interface?)
Try using a bulk insert. I found this syntax here. Then use ExecuteNonQuery(), as SLaks suggested in the comments. Those two combined may speed it up a good bit.
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
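A sketch of how that could look from C# with the MySQL connector (conn is assumed to be an open MySqlConnection; batch and its properties are placeholders for one chunk of your calculated "stuff"):

// One multi-row INSERT per batch instead of one statement per row.
var sql = new System.Text.StringBuilder("INSERT INTO tbl_name (a,b,c) VALUES ");
using (var cmd = new MySqlCommand { Connection = conn })
{
    for (int i = 0; i < batch.Count; i++)
    {
        if (i > 0) sql.Append(',');
        sql.AppendFormat("(@a{0},@b{0},@c{0})", i);
        cmd.Parameters.AddWithValue("@a" + i, batch[i].A);
        cmd.Parameters.AddWithValue("@b" + i, batch[i].B);
        cmd.Parameters.AddWithValue("@c" + i, batch[i].C);
    }
    cmd.CommandText = sql.ToString();
    cmd.ExecuteNonQuery(); // no data reader needed for INSERTs
}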
It's possible you are using InnoDB as the storage engine. In that case you should try wrapping every hundred or so rows of INSERT operations in a transaction. When I have had to handle this kind of application, it made a huge difference. To do this, structure your code like so:
// One command to start a transaction and one to commit it; both need the
// open connection assigned (here called "connection").
MySqlCommand start = new MySqlCommand("START TRANSACTION", connection);
MySqlCommand commit = new MySqlCommand("COMMIT", connection);

int bunchSize = 100;   // rows per transaction
int bunch = bunchSize;

start.ExecuteNonQuery(); /* start the first bunch transaction */

for (/* whatever loop conditions you need */)
{
    /* whatever you need to do */
    /* your insert statement */
    if (--bunch <= 0)
    {
        commit.ExecuteNonQuery(); /* end one bunch transaction */
        start.ExecuteNonQuery();  /* and begin the next */
        bunch = bunchSize;
    }
}
commit.ExecuteNonQuery(); /* end the last bunch transaction */
It is also possible that the table into which you're inserting megarows has lots of indexes. In that case you can speed things up by beginning your series of INSERTs with
SET unique_checks=0;
SET foreign_key_checks=0;
ALTER TABLE tbl_name DISABLE KEYS;
and ending it with this sequence.
ALTER TABLE tbl_name ENABLE KEYS;
SET unique_checks=1;
SET foreign_key_checks=1;
You must take great care in your software to avoid inserting rows that would be rejected as duplicates when you use this technique, because the ENABLE KEYS operation will not work in that case.
Read this for more information: http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-bulk-data-loading.html
I am new to database interaction with C#. I am trying to write 10000 records to the database in a loop with the help of SqlCommand and SqlConnection objects, using a SqlTransaction and committing every 5000. It is taking 10 seconds to process.
SqlConnection myConnection = new SqlConnection("..Connection String..");
myConnection.Open();

SqlCommand myCommand = new SqlCommand();
myCommand.CommandText = "exec StoredProcedureInsertOneRowInTable Param1, Param2........";
myCommand.Connection = myConnection;

SqlTransaction myTrans = myConnection.BeginTransaction();
myCommand.Transaction = myTrans;

for (int i = 0; i < 10000; i++)
{
    myCommand.ExecuteNonQuery();
    if (i % 5000 == 0)
    {
        myTrans.Commit();
        myTrans = myConnection.BeginTransaction();
        myCommand.Transaction = myTrans;
    }
}
The above code gives me only 1000 row writes/sec to the database.
But when I tried to implement the same logic in T-SQL and execute it against the database in SQL Server Management Studio, it gave me 10000 writes/sec.
When I compare the behaviour of the two approaches, it shows that the ADO.NET execution performs a large number of logical reads.
My questions are:
1. Why are there logical reads in the ADO.NET execution?
2. Does the transaction involve some handshaking?
3. Why are they not present when executing from Management Studio?
4. If I want very fast insert transactions on the DB, what approach should I take?
Updated information about the database objects:
Table: tbl_FastInsertTest
No primary key; only 5 fields, the first three of type int (F1, F2, F3) and the last two (F4, F5) of type varchar(30).
Stored procedure:
create proc stp_FastInsertTest
(
    @nF1 int,
    @nF2 int,
    @nF3 int,
    @sF4 varchar(30),
    @sF5 varchar(30)
)
as
begin
    set NOCOUNT on

    insert into tbl_FastInsertTest
    (
        [F1],
        [F2],
        [F3],
        [F4],
        [F5]
    )
    values
    (
        @nF1,
        @nF2,
        @nF3,
        @sF4,
        @sF5
    )
end
--------------------------------------------------------------------------------------
SQL block executed in SSMS
-- When I execute the following code in SSMS it gives me more than 10000 writes per second, but when I execute the same stored procedure via ADO.NET it gives me 1000 to 1200 writes per second.
-- while reading, no locks
begin tran
declare @i int
set @i = 0
while (1 <> 0)
begin
    exec stp_FastInsertTest 1, 2, 3, 'vikram', 'varma'
    set @i = @i + 1
    if (@i = 5000)
    begin
        commit tran
        set @i = 0
        begin tran
    end
end
If you are running something like:
exec StoredProcedureInsertOneRowInTable 'blah', ...
exec StoredProcedureInsertOneRowInTable 'bloop', ...
exec StoredProcedureInsertOneRowInTable 'more', ...
in SSMS, that is an entirely different scenario, where all of that is a single batch. With ADO.NET you are paying a round trip per ExecuteNonQuery - I'm actually impressed it managed 1000/s.
Re the logical reads: that could just be it looking at the query-plan cache, but without knowing more about StoredProcedureInsertOneRowInTable it is impossible to say whether something query-specific is afoot. I suspect, though, that you have some different SET conditions between SSMS and ADO.NET that are forcing it to use a different plan - this is in particular a problem with things like persisted calculated indexed columns and columns "promoted" out of a sql-xml field.
Re making it faster: in this case it sounds like table-valued parameters are exactly the thing, but you should also review the other options here.
For performant inserts, take a look at the SqlBulkCopy class; if it works for you, it should be fast.
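For example, a minimal sketch against the question's tbl_FastInsertTest (myConnection is the already opened SqlConnection from the question):

// Stream a DataTable into the target table instead of 10000 single inserts.
var table = new DataTable();
table.Columns.Add("F1", typeof(int));
table.Columns.Add("F2", typeof(int));
table.Columns.Add("F3", typeof(int));
table.Columns.Add("F4", typeof(string));
table.Columns.Add("F5", typeof(string));
for (int i = 0; i < 10000; i++)
    table.Rows.Add(1, 2, 3, "vikram", "varma");

using (var bulk = new SqlBulkCopy(myConnection))
{
    bulk.DestinationTableName = "tbl_FastInsertTest";
    bulk.BatchSize = 5000; // commit in chunks, mirroring the question
    bulk.WriteToServer(table);
}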
As Sean said, using parameterized queries is always a good idea.
Using a StringBuilder to batch a thousand INSERT statements into a single command and committing the transaction is a proven way of inserting data:
var sb = new StringBuilder();
for (int i = 0; i < 1000; i++)
{
    sb.AppendFormat("INSERT INTO Table(col1,col2) VALUES({0},{1});", values1[i], values2[i]);
}
sqlCommand.CommandText = sb.ToString();
Your code doesn't look quite right to me: it keeps opening new transactions, but the rows inserted after the last commit (at i = 5000) are never committed, because the loop ends without a final Commit().
It is always good practice to drop indexes while inserting a lot of data and add them back later. Indexes will slow down your writes.
SQL Server Management Studio does not add transactions for you, but T-SQL has them; try this:
BEGIN TRANSACTION MyTransaction
INSERT INTO Table(Col1,Col2) VALUES(Val10,Val20);
INSERT INTO Table(Col1,Col2) VALUES(Val11,Val21);
INSERT INTO Table(Col1,Col2) VALUES(Val12,Val23);
COMMIT TRANSACTION
You need to use a parameterized query so that the execution plan can be processed and cached. Since you're using string concatenation (shudder - this is bad; google SQL injection) to build the query, SQL Server treats those 10,000 queries as separate, individual queries and builds an execution plan for each one.
MSDN: http://msdn.microsoft.com/en-us/library/yy6y35y8.aspx - although you're going to want to simplify their code a bit, and you'll have to reset the parameters on the command.
If you really, really want to get the data into the db fast, think about using bcp... but make sure the data is clean first (as there's no real error checking/handling with it).
I have a list of objects; this list contains about 4 million objects. There is a stored proc that takes the objects' attributes as params, does some lookups and inserts them into tables.
What's the most efficient way to insert these 4 million objects into the db?
How I do it:
// connect to SQL Server - SqlConnection ...
foreach (var item in listofobjects)
{
    SqlCommand sc = ...
    // assign params
    sc.ExecuteNonQuery();
}
This has been really slow.
Is there a better way to do this?
This process will be a scheduled task. I will run it every hour, so I do expect high-volume data like this.
Take a look at the SqlBulkCopy Class
Based on your comment: dump the data into a staging table, then do the lookup and insert into the real table set-based from a proc... it will be much faster than row by row.
It's never going to be ideal to insert four million records from C#, but a better way to do it is to build the command text up in code so you can do it in chunks.
This is hardly bulletproof, and it doesn't illustrate how to incorporate lookups (as you've mentioned you need), but the basic idea is:
// You'd modify this to chunk it out - only testing can tell you the right
// number - perhaps 100 at a time.
for (int i = 0; i < items.Count; i++)
{
    // e.g., 'insert dbo.Customer values(@firstName1, @lastName1)'
    string newStatement = string.Format(
        "insert dbo.Customer values(@firstName{0}, @lastName{0});", i);
    command.CommandText += newStatement;
    command.Parameters.AddWithValue("@firstName" + i, items[i].FirstName);
    command.Parameters.AddWithValue("@lastName" + i, items[i].LastName);
}
// ...
command.ExecuteNonQuery();
I have had excellent results using XML to get large amounts of data into SQL Server. Like you, I initially inserted rows one at a time, which took forever due to the round-trip time between the application and the server; then I switched the logic to pass in an XML string containing all the rows to insert. Time to insert went from 30 minutes to less than 5 seconds. This was for a couple of thousand rows. I have tested XML strings up to 20 megabytes in size and there were no issues. Depending on your row size, this might be an option.
The data was passed in as an XML string using the ntext type.
Something like this formed the basic details of the stored procedure that did the work:
CREATE PROCEDURE XMLInsertPr( @XmlString ntext )
AS
BEGIN
    DECLARE @ReturnStatus int, @hdoc int

    EXEC @ReturnStatus = sp_xml_preparedocument @hdoc OUTPUT, @XmlString
    IF (@ReturnStatus <> 0)
    BEGIN
        RAISERROR ('Unable to open XML document', 16, 1, 50003)
        RETURN @ReturnStatus
    END

    INSERT INTO TableName
    SELECT * FROM OPENXML(@hdoc, '/XMLData/Data') WITH TableName

    EXEC sp_xml_removedocument @hdoc
END
You might consider dropping any indexes you have on the table(s) you are inserting into and then recreating them after you have inserted everything. I'm not sure how the bulk copy class works, but if you are updating your indexes on every insert it can slow things down quite a bit.
Like Abe mentioned: drop the indexes (and recreate them later).
If you trust your data: generate an SQL statement for each call to the stored proc, combine several of them, and then execute.
This saves you communication overhead.
The combined calls (to the stored proc) can be wrapped in a BEGIN TRANSACTION so you have only one commit per x inserts.
If this is a one-time operation: don't optimize, and just run it during the night / weekend.