ADO.NET and SQLite single cell select performance

ADO.NET and SQLite single cell select performance - c#

I want to create simple database in runtime, fill it with data from internal resource and then read each record through loop. Previously I used LiteDb for that but I couldn't squeeze time anymore so
I choosed SQLite.
I think there are few things to improve I am not aware of.
Database creation process:
First step is to create table
using var create = transaction.Connection.CreateCommand();
create.CommandText = "CREATE TABLE tableName (Id TEXT PRIMARY KEY, Value TEXT) WITHOUT ROWID";
create.ExecuteNonQuery();
Next insert command is defined
var insert = transaction.Connection.CreateCommand();
insert.CommandText = "INSERT OR IGNORE INTO tableName VALUES (#Id, #Record)";
var idParam = insert.CreateParameter();
var valueParam = insert.CreateParameter();
idParam.ParameterName = "#" + IdColumn;
valueParam.ParameterName = "#" + ValueColumn;
insert.Parameters.Add(idParam);
insert.Parameters.Add(valueParam);
Through loop each value is inserted
idParameter.Value = key;
valueParameter.Value = value.ValueAsText;
insert.Parameters["#Id"] = idParameter;
insert.Parameters["#Value"] = valueParameter;
insert.ExecuteNonQuery();
Transaction commit transaction.Commit();
Create index
using var index = transaction.Connection.CreateCommand();
index.CommandText = "CREATE UNIQUE INDEX idx_tableName ON tableName(Id);";
index.ExecuteNonQuery();
And after that i perform milion selects (to retrieve single value):
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = #id;";
var param = command.CreateParameter();
param.ParameterName = "#id";
param.Value = id;
command.Parameters.Add(param);
return command.ExecuteReader(CommandBehavior.SingleResult).ToString();
For all select's one connection is shared and never closed. Insert is quite fast (less then minute) but select's are very troublesome here. Is there a way to improve them?
Table is quite big (around ~2 milions records) and Value contains quite heavy serialized objects.
System.Data.SQLite provider is used and connection string contains this additional options: Version=3;Journal Mode=Off;Synchronous=off;

If you go for performance, you need to consider this: each independent SELECT command is a roundtrip to the DB with some extra costs. It's similar to a N+1 select problem in case of parent-child relations.
The best thing you can do is to get a LIST of items (values):
SELECT Value FROM tableName WHERE Id IN (1, 2, 3, 4, ...);
Here's a link on how to code that: https://www.mikesdotnetting.com/article/116/parameterized-in-clauses-with-ado-net-and-linq

You could have the select command not recreated for every Id but created once and only executed for every Id. From your code it seems every select is CreateCommand/CreateParameters and so on. See this for example: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.prepare?view=net-5.0 - you run .Prepare() once and then only execute (they don't need to be NonQuery)
you could then try to see if you can be faster with ExecuteScalar and not having reader created for one data result, like so: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.executescalar?view=net-5.0
If scalar will not prove to be faster then you could try to use .SingleRow instead of .SingleResult in your ExecuteReader for possible performance optimisations. According to this: https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior?view=net-5.0 it might work. I doubt that but if first two don't help, why not try it too.

Related

Get output 'inserted' on update with Entity Framework

SQL Server provides output for inserted and updated record with the 'inserted' keyword.
I have a table representing a processing queue. I use the following query to lock a record and get the ID of the locked record:
UPDATE TOP (1) GlobalTrans
SET LockDateTime = GETUTCDATE()
OUTPUT inserted.ID
WHERE LockDateTime IS NULL
This will output a column named ID with all the updated record IDs (a single ID in my case). How can I translate this into EF in C# to execute the update and get the ID back?

Entity Framework has no way of doing that.
You could do it the ORM way, by selecting all the records, setting their LockDateTime and writing them back. That probably is not safe for what you want to do because by default it's not one single transaction.
You can span your own transactions and use RepeatableRead as isolation level. That should work. Depending on what your database does in the background, it might be overkill though.
You could write the SQL by hand. That defeats the purpose of entity framework, but it should be just as safe as it was before as far as the locking mechanism is concerned.
You could also put it into a stored procedure and call that. It's a little bit better than the above version because at least somebody will compile it and check that the table and column names are correct.

Simple Example #1 to get a data table:
I did this directly against the connection:
Changed the command.ExecuteNonQuery() to command.ExecuteReader()
var connection = DbContext().Database.Connection as SqlConnection;
using (var command = connection.CreateCommand())
{
command.CommandText = sql;
command.CommandTimeout = 120;
command.Parameters.Add(param);
using (var reader = command.ExecuteReader())
{
var resultTable = new DataTable();
resultTable.Load(reader);
return resultTable;
}
}
FYI, If you don't have an OUTPUT clause in your SQL, it will return an empty data table.
Example #2 to return entities:
This is a bit more complicated but does work.
using a SQL statement with a OUTPUT inserted.*
var className = typeof(T).Name;
var container = ObjContext().MetadataWorkspace.GetEntityContainer(UnitOfWork.ObjContext().DefaultContainerName, DataSpace.CSpace);
var setName = (from meta in container.BaseEntitySets where meta.ElementType.Name == className select meta.Name).First();
var results = ObjContext().ExecuteStoreQuery<T>(sql, setName, trackingEnabled ? MergeOption.AppendOnly : MergeOption.NoTracking).ToList();
T being the entity being worked on

Want to use identity returned from insert in subsequent insert in a transaction

I'm using Rob Conery's Massive for database access. I want to wrap a transaction around a couple of inserts but the second insert uses the identity returned from the first insert. It's not obvious to me how to do this in a transaction. Some assistance would be appreciated.
var commandList = new List<DbCommand>
{
contactTbl.CreateInsertCommand(new
{
newContact.Name,
newContact.Contact,
newContact.Phone,
newContact.ForceChargeThreshold,
newContact.MeterReadingMethodId,
LastModifiedBy = userId,
LastModifiedDate = modifiedDate,
}),
branchContactTbl.CreateInsertCommand(new
{
newContact.BranchId,
ContactId = ????, <-- how to set Id as identity from previous command
}),
};

Make a query between those two inserts, this method from Massive may be useful:
public object Scalar(string sql, params object[] args) {
object result = null;
using (var conn = OpenConnection()) {
result = CreateCommand(sql, conn, args).ExecuteScalar();
}
return result;
}
Your sql will be = "select scope_identity()"
UPDATE 2013/02/26
Looking again at the Massive code there is no reliable way to retrieve last inserted ID.
Code above will work only when connection that makes "select scope_identity()" is pooled. (It must be the same connection that made insert).
Massive table.Insert(..) method returns Dynamic that contains ID field, which is filled with "SELECT ##IDENTITY". It gets last inserted ID from global scope, which is obvious bug (apparent in multithreading scenarios).

Can you just do it in a stored proc? The you can use scope_identity or better yet the output clause to get the value(s) you need. And all the inserts to all the tables are in one transaction which can be rolled back if any of them fail.

Getting the Last Insert ID with SQLite.NET in C#

I have a simple problem with a not so simple solution... I am currently inserting some data into a database like this:
kompenzacijeDataSet.KompenzacijeRow kompenzacija = kompenzacijeDataSet.Kompenzacije.NewKompenzacijeRow();
kompenzacija.Datum = DateTime.Now;
kompenzacija.PodjetjeID = stranka.id;
kompenzacija.Znesek = Decimal.Parse(tbZnesek.Text);
kompenzacijeDataSet.Kompenzacije.Rows.Add(kompenzacija);
kompenzacijeDataSetTableAdapters.KompenzacijeTableAdapter kompTA = new kompenzacijeDataSetTableAdapters.KompenzacijeTableAdapter();
kompTA.Update(this.kompenzacijeDataSet.Kompenzacije);
this.currentKompenzacijaID = LastInsertID(kompTA.Connection);
The last line is important. Why do I supply a connection? Well there is a SQLite function called last_insert_rowid() that you can call and get the last insert ID. Problem is it is bound to a connection and .NET seems to be reopening and closing connections for every dataset operation. I thought getting the connection from a table adapter would change things. But it doesn't.
Would anyone know how to solve this? Maybe where to get a constant connection from? Or maybe something more elegant?
Thank you.
EDIT:
This is also a problem with transactions, I would need the same connection if I would want to use transactions, so that is also a problem...

Using C# (.net 4.0) with SQLite, the SQLiteConnection class has a property LastInsertRowId that equals the Primary Integer Key of the most recently inserted (or updated) element.
The rowID is returned if the table doesn't have a primary integer key (in this case the rowID is column is automatically created).
See https://www.sqlite.org/c3ref/last_insert_rowid.html for more.
As for wrapping multiple commands in a single transaction, any commands entered after the transaction begins and before it is committed are part of one transaction.
long rowID;
using (SQLiteConnection con = new SQLiteConnection([datasource])
{
SQLiteTransaction transaction = null;
transaction = con.BeginTransaction();
... [execute insert statement]
rowID = con.LastInsertRowId;
transaction.Commit()
}

select last_insert_rowid();
And you will need to execute it as a scalar query.
string sql = #"select last_insert_rowid()";
long lastId = (long)command.ExecuteScalar(sql); // Need to type-cast since `ExecuteScalar` returns an object.

last_insert_rowid() is part of the solution. It returns a row number, not the actual ID.
cmd = CNN.CreateCommand();
cmd.CommandText = "SELECT last_insert_rowid()";
object i = cmd.ExecuteScalar();
cmd.CommandText = "SELECT " + ID_Name + " FROM " + TableName + " WHERE rowid=" + i.ToString();
i = cmd.ExecuteScalar();

I'm using Microsoft.Data.Sqlite package and I do not see a LastInsertRowId property. But you don't have to create a second trip to database to get the last id. Instead, combine both sql statements into a single string.
string sql = #"
insert into MyTable values (null, #name);
select last_insert_rowid();";
using (var cmd = conn.CreateCommand()) {
cmd.CommandText = sql;
cmd.Parameters.Add("#name", SqliteType.Text).Value = "John";
int lastId = Convert.ToInt32(cmd.ExecuteScalar());
}

There seems to be answers to both Microsoft's reference and SQLite's reference and that is the reason some people are getting LastInsertRowId property to work and others aren't.
Personally I don't use an PK as it's just an alias for the rowid column. Using the rowid is around twice as fast as one that you create. If I have a TEXT column for a PK I still use rowid and just make the text column unique. (for SQLite 3 only. You need your own for v1 & v2 as vacuum will alter rowid numbers)
That said, the way to get the information from a record in the last insert is the code below. Since the function does a left join to itself I LIMIT it to 1 just for speed, even if you don't there will only be 1 record from the main SELECT statement.
SELECT my_primary_key_column FROM my_table
WHERE rowid in (SELECT last_insert_rowid() LIMIT 1);

The SQLiteConnection object has a property for that, so there is not need for additional query.
After INSERT you just my use LastInsertRowId property of your SQLiteConnection object that was used for INSERT command.
Type of LastInsertRowId property is Int64.
Off course, as you already now, for auto increment to work the primary key on table must be set to be AUTOINCREMENT field, which is another topic.

database = new SQLiteConnection(databasePath);
public int GetLastInsertId()
{
return (int)SQLite3.LastInsertRowid(database.Handle);
}

# How about just running 2x SQL statements together using Execute Scalar?
# Person is a object that has an Id and Name property
var connString = LoadConnectionString(); // get connection string
using (var conn = new SQLiteConnection(connString)) // connect to sqlite
{
// insert new record and get Id of inserted record
var sql = #"INSERT INTO People (Name) VALUES (#Name);
SELECT Id FROM People
ORDER BY Id DESC";
var lastId = conn.ExecuteScalar(sql, person);
}

In EF Core 5 you can get ID in the object itself without using any "last inserted".
For example:
var r = new SomeData() { Name = "New Row", ...};
dbContext.Add(r);
dbContext.SaveChanges();
Console.WriteLine(r.ID);
you would get new ID without thinking of using correct connection or thread-safety etc.

If you're using the Microsoft.Data.Sqlite package, it doesn't include a LastInsertRowId property in the SqliteConnection class, but you can still call the last_insert_rowid function by using the underlying SQLitePCL library. Here's an extension method:
using Microsoft.Data.Sqlite;
using SQLitePCL;
public static long GetLastInsertRowId(this SqliteConnection connection)
{
var handle = connection.Handle ?? throw new NullReferenceException("The connection is not open.");
return raw.sqlite3_last_insert_rowid(handle);
}

Possible to get PrimaryKey IDs back after a SQL BulkCopy?

I am using C# and using SqlBulkCopy. I have a problem though. I need to do a mass insert into one table then another mass insert into another table.
These 2 have a PK/FK relationship.
Table A
Field1 -PK auto incrementing (easy to do SqlBulkCopy as straight forward)
Table B
Field1 -PK/FK - This field makes the relationship and is also the PK of this table. It is not auto incrementing and needs to have the same id as to the row in Table A.
So these tables have a one to one relationship but I am unsure how to get back all those PK Id that the mass insert made since I need them for Table B.
Edit
Could I do something like this?
SELECT *
FROM Product
WHERE NOT EXISTS (SELECT * FROM ProductReview WHERE Product.ProductId = ProductReview.ProductId AND Product.Qty = NULL AND Product.ProductName != 'Ipad')
This should find all the rows that where just inserted with the sql bulk copy. I am not sure how to take the results from this then do a mass insert with them from a SP.
The only problem I can see with this is that if a user is doing the records one at a time and a this statement runs at the same time it could try to insert a row twice into the "Product Review Table".
So say I got like one user using the manual way and another user doing the mass way at about the same time.
manual way.
1. User submits data
2. Linq to sql Product object is made and filled with the data and submited.
3. this object now contains the ProductId
4. Another linq to sql object is made for the Product review table and is inserted(Product Id from step 3 is sent along).
Mass way.
1. User grabs data from a user sharing the data.
2. All Product rows from the sharing user are grabbed.
3. SQL Bulk copy insert on Product rows happens.
4. My SP selects all rows that only exist in the Product table and meets some other conditions
5. Mass insert happens with those rows.
So what happens if step 3(manual way) is happening at the same time as step 4(mass way). I think it would try to insert the same row twice causing a primary constraint execption.

In that scenario, I would use SqlBulkCopy to insert into a staging table (i.e. one that looks like the data I want to import, but isn't part of the main transactional tables), and then at the DB to a INSERT/SELECT to move the data into the first real table.
Now I have two choices depending on the server version; I could do a second INSERT/SELECT to the second real table, or I could use the INSERT/OUTPUT clause to do the second insert , using the identity rows from the table.
For example:
-- dummy schema
CREATE TABLE TMP (data varchar(max))
CREATE TABLE [Table1] (id int not null identity(1,1), data varchar(max))
CREATE TABLE [Table2] (id int not null identity(1,1), id1 int not null, data varchar(max))
-- imagine this is the SqlBulkCopy
INSERT TMP VALUES('abc')
INSERT TMP VALUES('def')
INSERT TMP VALUES('ghi')
-- now push into the real tables
INSERT [Table1]
OUTPUT INSERTED.id, INSERTED.data INTO [Table2](id1,data)
SELECT data FROM TMP

If your app allows it, you could add another column in which you store an identifier of the bulk insert (a guid for example). You would set this id explicitly.
Then after the bulk insert, you just select the rows that have that identifier.

I had the same issue where I had to get back ids of the rows inserted with SqlBulkCopy.
My ID column was an identity column.
Solution:
I have inserted 500+ rows with bulk copy, and then selected them back with the following query:
SELECT TOP InsertedRowCount *
FROM MyTable
ORDER BY ID DESC
This query returns the rows I have just inserted with their ids. In my case I had another unique column. So I selected that column and id. Then mapped them with a IDictionary like so:
IDictionary<string, int> mymap = new Dictionary<string, int>()
mymap[Name] = ID
Hope this helps.

My approach is similar to what RiceRiceBaby described, except one important thing to add is that the call to retrieve Max(Id) needs to be a part of a transaction, along with the call to SqlBulkCopy.WriteToServer. Otherwise, someone else may insert during your transaction and this would make your Id's incorrect. Here is my code:
public static void BulkInsert<T>(List<ColumnInfo> columnInfo, List<T> data, string
destinationTableName, SqlConnection conn = null, string idColumn = "Id")
{
NLogger logger = new NLogger();
var closeConn = false;
if (conn == null)
{
closeConn = true;
conn = new SqlConnection(_connectionString);
conn.Open();
}
SqlTransaction tran =
conn.BeginTransaction(System.Data.IsolationLevel.Serializable);
try
{
var options = SqlBulkCopyOptions.KeepIdentity;
var sbc = new SqlBulkCopy(conn, options, tran);
var command = new SqlCommand(
$"SELECT Max({idColumn}) from {destinationTableName};", conn,
tran);
var id = command.ExecuteScalar();
int maxId = 0;
if (id != null && id != DBNull.Value)
{
maxId = Convert.ToInt32(id);
}
data.ForEach(d =>
{
maxId++;
d.GetType().GetProperty(idColumn).SetValue(d, maxId);
});
var dt = ConvertToDataTable(columnInfo, data);
sbc.DestinationTableName = destinationTableName;
foreach (System.Data.DataColumn dc in dt.Columns)
{
sbc.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
}
sbc.WriteToServer(dt);
tran.Commit();
if(closeConn)
{
conn.Close();
conn = null;
}
}
catch (Exception ex)
{
tran.Rollback();
logger.Write(LogLevel.Error, $#"An error occurred while performing a bulk
insert into table {destinationTableName}. The entire
transaction has been rolled back.
{ex.ToString()}");
throw ex;
}
}

Depending on your needs and how much control you have of the tables, you may want to consider using UNIQUEIDENTIFIERs (Guids) instead of your IDENTITY primary keys. This moves key management outside of the database and into your application. There are some serious tradeoffs to this approach, so it may not meet your needs. But it may be worth considering. If you know for sure that you'll be pumping a lot of data into your tables via bulk-insert, it is often really handy to have those keys managed in your object model rather than your application relying on the database to give you back the data.
You could also take a hybrid approach with staging tables as suggested before. Get the data into those tables using GUIDs for the relationships, and then via SQL statements you could get the integer foreign keys in order and pump data into your production tables.

I would:
Turn on identity insert on the table
Grab the Id of the last row of the table
Loop from (int i = Id; i < datable.rows.count+1; i++)
In the loop, assign the Id property of your datable to i+1.
Run your SQL bulk insert with your keep identity turned on.
Turn identity insert back off
I think that's the safest way to get your ids on an SQL bulk insert because it will prevent mismatched ids that could caused by the application be executed on another thread.

Disclaimer: I'm the owner of the project C# Bulk Operations
The library overcome SqlBulkCopy limitations and add flexible features like output inserted identity value.
Behind the code, it does exactly like the accepted answer but way easier to use.
var bulk = new BulkOperation(connection);
// Output Identity
bulk.ColumnMappings.Add("ProductID", ColumnMappingDirectionType.Output);
// ... Column Mappings...
bulk.BulkInsert(dt);

SQL Insert one row or multiple rows data?

I am working on a console application to insert data to a MS SQL Server 2005 database. I have a list of objects to be inserted. Here I use Employee class as example:
List<Employee> employees;
What I can do is to insert one object at time like this:
foreach (Employee item in employees)
{
string sql = #"INSERT INTO Mytable (id, name, salary)
values ('#id', '#name', '#salary')";
// replace #par with values
cmd.CommandText = sql; // cmd is IDbCommand
cmd.ExecuteNonQuery();
}
Or I can build a balk insert query like this:
string sql = #"INSERT INTO MyTable (id, name, salary) ";
int count = employees.Count;
int index = 0;
foreach (Employee item in employees)
{
sql = sql + string.format(
"SELECT {0}, '{1}', {2} ",
item.ID, item.Name, item.Salary);
if ( index != (count-1) )
sql = sql + " UNION ALL ";
index++
}
cmd.CommandType = sql;
cmd.ExecuteNonQuery();
I guess the later case is going to insert rows of data at once. However, if I have
several ks of data, is there any limit for SQL query string?
I am not sure if one insert with multiple rows is better than one insert with one row of data, in terms of performance?
Any suggestions to do it in a better way?

Actually, the way you have it written, your first option will be faster.
Your second example has a problem in it. You are doing sql = + sql + etc. This is going to cause a new string object to be created for each iteration of the loop. (Check out the StringBuilder class). Technically, you are going to be creating a new string object in the first instance too, but the difference is that it doesn't have to copy all the information from the previous string option over.
The way you have it set up, SQL Server is going to have to potentially evaluate a massive query when you finally send it which is definitely going to take some time to figure out what it is supposed to do. I should state, this is dependent on how large the number of inserts you need to do. If n is small, you are probably going to be ok, but as it grows your problem will only get worse.
Bulk inserts are faster than individual ones due to how SQL server handles batch transactions. If you are going to insert data from C# you should take the first approach and wrap say every 500 inserts into a transaction and commit it, then do the next 500 and so on. This also has the advantage that if a batch fails, you can trap those and figure out what went wrong and re-insert just those. There are other ways to do it, but that would definately be an improvement over the two examples provided.
var iCounter = 0;
foreach (Employee item in employees)
{
if (iCounter == 0)
{
cmd.BeginTransaction;
}
string sql = #"INSERT INTO Mytable (id, name, salary)
values ('#id', '#name', '#salary')";
// replace #par with values
cmd.CommandText = sql; // cmd is IDbCommand
cmd.ExecuteNonQuery();
iCounter ++;
if(iCounter >= 500)
{
cmd.CommitTransaction;
iCounter = 0;
}
}
if(iCounter > 0)
cmd.CommitTransaction;

In MS SQL Server 2008 you can create .Net table-UDT that will contain your table
CREATE TYPE MyUdt AS TABLE (Id int, Name nvarchar(50), salary int)
then, you can use this UDT in your stored procedures and your с#-code to batch-inserts.
SP:
CREATE PROCEDURE uspInsert
(#MyTvp AS MyTable READONLY)
AS
INSERT INTO [MyTable]
SELECT * FROM #MyTvp
C# (imagine that records you need to insert already contained in Table "MyTable" of DataSet ds):
using(conn)
{
SqlCommand cmd = new SqlCommand("uspInsert", conn);
cmd.CommandType = CommandType.StoredProcedure;
SqlParameter myParam = cmd.Parameters.AddWithValue
("#MyTvp", ds.Tables["MyTable"]);
myParam.SqlDbType = SqlDbType.Structured;
myParam.TypeName = "dbo.MyUdt";
// Execute the stored procedure
cmd.ExecuteNonQuery();
}
So, this is the solution.
Finally I want to prevent you from using code like yours (building the strings and then execute this string), because this way of executing may be used for SQL-Injections.

look at this thread,
I've answered there about table valued parameter.

Bulk-copy is usually faster than doing inserts on your own.
If you still want to do it in one of your suggested ways you should make it so that you can easily change the size of the queries you send to the server. That way you can optimize for speed in your production environment later on. Query times may v ary alot depending on the query size.

The batch size for a SQL Server query is listed at being 65,536 * the network packet size. The network packet size is by default 4kbs but can be changed. Check out the Maximum capacity article for SQL 2008 to get the scope. SQL 2005 also appears to have the same limit.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.