Best way to do bulk inserts using dapper.net [closed] - c#

I am using the following code to insert records to a table in SQL Server 2014
using (SqlConnection conn = new SqlConnection(ConfigurationManager.AppSettings["myConnString"]))
{
    conn.Execute("INSERT statement here", insertList);
}
The insertList is a list that has 1 million items in it. I tested this insert on an i5 desktop, and it took about 65 minutes to insert a million records into SQL Server on the same machine. I am not sure how Dapper is doing the inserts behind the scenes. I certainly don't want to open and close the database connection a million times!
Is this the best way to do bulk inserts in Dapper, or should I try something else, or go with plain ADO.NET using the Enterprise Library?
EDIT
In hindsight, I know using ADO.NET will be better, so I will rephrase my question. I would still like to know whether this is the best that Dapper can do, or am I missing a better way to do it in Dapper itself?

Building on Ehsan Sajjad's comment, one of the ways is to write a stored procedure that has a READONLY parameter of a user-defined TABLE type.
Say you want to bulk insert contacts that consist of a first name and last name, this is how you would go about it:
1) Create a table type:
CREATE TYPE [dbo].[MyTableType] AS TABLE(
    [FirstName] [varchar](50) NULL,
    [LastName] [varchar](50) NULL
)
GO
2) Now create a stored proc that uses the above table type:
CREATE PROC [dbo].[YourProc]
    /*other params here*/
    @Names AS MyTableType READONLY
AS
/* proc body here
*/
GO
3) On the .NET side, pass the parameter as System.Data.SqlDbType.Structured.
This usually involves creating an in-memory DataTable, adding rows to it, and then passing this DataTable object as the @Names parameter.
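With Dapper, the DataTable can be passed through its AsTableValuedParameter extension. A minimal sketch, assuming a contacts collection and a connectionString that are not part of the original post:
var table = new DataTable();
table.Columns.Add("FirstName", typeof(string));
table.Columns.Add("LastName", typeof(string));
foreach (var c in contacts)
{
    table.Rows.Add(c.FirstName, c.LastName);
}
using (var conn = new SqlConnection(connectionString))
{
    // Dapper sends the DataTable as a structured (TVP) parameter.
    conn.Execute("[dbo].[YourProc]",
        new { Names = table.AsTableValuedParameter("[dbo].[MyTableType]") },
        commandType: CommandType.StoredProcedure);
}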
NOTE: The DataTable is considered to be memory intensive - be careful and profile your code to be sure that it does not cause resource issues on your server.
ALTERNATIVE SOLUTION
Use the approach outlined here: https://stackoverflow.com/a/9947259/190476
The solution is for DELETE but can be adapted for an insert or update as well.

The first choice should be SQL Bulk Copy, because it's safe from SQL injection.
However, there is a way to drastically improve performance without it: merge multiple inserts into one SQL statement and make one call instead of many.
So instead of one INSERT statement per row, you can have a single INSERT with many VALUES tuples.
Code for inserting Users in bulk can look like this:
public async Task InsertInBulk(IList<string> userNames)
{
    var sqls = GetSqlsInBatches(userNames);
    using (var connection = new SqlConnection(ConnectionString))
    {
        // One round trip per batch; Dapper opens the connection on demand.
        foreach (var sql in sqls)
        {
            await connection.ExecuteAsync(sql);
        }
    }
}
private IList<string> GetSqlsInBatches(IList<string> userNames)
{
    var insertSql = "INSERT INTO [Users] (Name, LastUpdatedAt) VALUES ";
    // NOTE: values are formatted into the SQL text, so unlike SqlBulkCopy or
    // parameterized statements this is only safe for trusted input.
    var valuesSql = "('{0}', getdate())";
    // SQL Server allows at most 1000 row value expressions per INSERT.
    var batchSize = 1000;
    var sqlsToExecute = new List<string>();
    var numberOfBatches = (int)Math.Ceiling((double)userNames.Count / batchSize);
    for (int i = 0; i < numberOfBatches; i++)
    {
        var usersToInsert = userNames.Skip(i * batchSize).Take(batchSize);
        var valuesToInsert = usersToInsert.Select(u => string.Format(valuesSql, u));
        sqlsToExecute.Add(insertSql + string.Join(',', valuesToInsert));
    }
    return sqlsToExecute;
}
The whole article and a performance comparison are available here: http://www.michalbialecki.com/2019/05/21/bulk-insert-in-dapper/

The best free way to insert with excellent performance is using the SqlBulkCopy class directly as Alex and Andreas suggested.
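A minimal SqlBulkCopy sketch for reference (the destination table, batch size, and the dataTable variable here are illustrative):
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulkCopy = new SqlBulkCopy(conn))
    {
        bulkCopy.DestinationTableName = "dbo.Users";
        bulkCopy.BatchSize = 10000; // rows sent per round trip to the server
        bulkCopy.WriteToServer(dataTable); // accepts a DataTable or IDataReader
    }
}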
Disclaimer: I'm the owner of the project Dapper Plus
This project is not free but supports the following operations:
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
These operations use your mappings and can output values such as identity columns.
// CONFIGURE & MAP entity
DapperPlusManager.Entity<Order>()
    .Table("Orders")
    .Identity(x => x.ID);
// CHAIN & SAVE entity
connection.BulkInsert(orders)
    .AlsoInsert(order => order.Items)
    .Include(x => x.ThenMerge(order => order.Invoice)
        .AlsoMerge(invoice => invoice.Items))
    .AlsoMerge(x => x.ShippingAddress);

I faced an issue where a solution had to work with ADO, Entity Framework, and Dapper, so I made this lib; it generates batches in the form of:
IEnumerable<(string SqlQuery, IEnumerable<SqlParameter> SqlParameters)>
IEnumerable<(string SqlQuery, DynamicParameters DapperDynamicParameters)>
The project's documentation contains usage instructions. It's safe against SQL injection, because it uses parameters instead of concatenation.
Usage with Dapper:
using MsSqlHelpers;
var mapper = new MapperBuilder<Person>()
    .SetTableName("People")
    .AddMapping(person => person.FirstName, columnName: "Name")
    .AddMapping(person => person.LastName, columnName: "Surename")
    .AddMapping(person => person.DateOfBirth, columnName: "Birthday")
    .Build();
var people = new List<Person>()
{
    new Person() { FirstName = "John", LastName = "Lennon", DateOfBirth = new DateTime(1940, 10, 9) },
    new Person() { FirstName = "Paul", LastName = "McCartney", DateOfBirth = new DateTime(1942, 6, 18) },
};
var connectionString = "Server=SERVER_ADDRESS;Database=DATABASE_NAME;User Id=USERNAME;Password=PASSWORD;";
var sqlQueriesAndDapperParameters = new MsSqlQueryGenerator().GenerateDapperParametrizedBulkInserts(mapper, people);
using (var sqlConnection = new SqlConnection(connectionString))
{
    // Default batch size: 1000 rows or (2100-1) parameters per insert.
    foreach (var (SqlQuery, DapperDynamicParameters) in sqlQueriesAndDapperParameters)
    {
        sqlConnection.Execute(SqlQuery, DapperDynamicParameters);
    }
}

Related

Build efficient SQL statements with multiple parameters in C#

I have a list of items with different ids which represent a SQL table's PK values.
Is there any way to build an efficient and safe statement?
Until now I've always prepared a string representing the statement and built it as I traversed the list via a foreach loop.
Here's an example of what I'm doing:
string update = "UPDATE table SET column = 0 WHERE";
foreach (Line l in list)
{
    update += " id = " + l.Id + " OR";
}
// To remove the last OR (string.Remove returns a new string)
update = update.Remove(update.Length - 3);
MySqlHelper.ExecuteNonQuery("myConnectionString", update);
Which feels very unsafe and looks very ugly.
Is there a better way for this?
So yeah, in SQL you've got the 'IN' keyword which allows you to specify a set of values.
This should accomplish what you would like (syntax might be iffy, but the idea is there)
var ids = string.Join(',', list.Select(x => x.Id));
string update = $"UPDATE table SET column = 0 WHERE id IN ({ids})";
MySqlHelper.ExecuteNonQuery("myConnectionString", update);
However, the way you're building your SQL can be considered dangerous (you should be fine as these just look like ids from a DB, but who knows, better to be safe than sorry). Here you're concatenating values straight into your query string, which is a potential SQL injection risk, and that is very dangerous. There are ways around this, such as using the built-in .NET SqlCommand object with parameters:
https://www.w3schools.com/sql/sql_injection.asp
https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand?view=dotnet-plat-ext-6.0
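A minimal sketch of that parameterized approach, assuming SQL Server's SqlCommand (the same pattern works with MySqlCommand):
// Build "@p0,@p1,..." placeholders and bind each id as a parameter
// instead of concatenating the values into the SQL text.
var paramNames = list.Select((l, i) => "@p" + i).ToList();
var sql = "UPDATE table SET column = 0 WHERE id IN (" + string.Join(",", paramNames) + ")";
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    for (int i = 0; i < list.Count; i++)
    {
        cmd.Parameters.AddWithValue(paramNames[i], list[i].Id);
    }
    conn.Open();
    cmd.ExecuteNonQuery();
}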
It would be more efficient to use the IN operator:
string update = "UPDATE table SET column = 0 WHERE id IN (";
foreach (Line l in list)
{
    update += l.Id + ",";
}
// To remove the last comma (string.Remove returns a new string)
update = update.Remove(update.Length - 1);
// To append the closing bracket
update += ")";
If you are using .NET Core, see the following library, which creates the parameters for a WHERE IN clause. The library is a port of VB.NET code I wrote in .NET Framework 4.7 years ago. Clone the repository and get the SqlCoreUtilityLibrary project for creating the statements.
Setup:
public void UpdateExample()
{
    var identifiers = new List<int>() { 1, 3, 20, 2, 45 };
    var (actual, exposed) = DataOperations.UpdateExample(
        "UPDATE table SET column = 0 WHERE id IN", identifiers);
    Console.WriteLine(actual);
    Console.WriteLine(exposed);
}
Just enough code to create the parameterized SQL statement. Note the ActualCommandText method is included for development, not for production, as it reveals the actual parameter values.
public static (string actual, string exposed) UpdateExample(string commandText, List<int> identifiers)
{
    using var cn = new SqlConnection() { ConnectionString = GetSqlConnection() };
    using var cmd = new SqlCommand() { Connection = cn };
    cmd.CommandText = SqlWhereInParamBuilder.BuildInClause(commandText + " ({0})", "p", identifiers);
    cmd.AddParamsToCommand("p", identifiers);
    return (cmd.CommandText, cmd.ActualCommandText());
}
For a real app all code would be done in the method above rather than returning the two strings.
Results
UPDATE table SET column = 0 WHERE id IN (@p0,@p1,@p2,@p3,@p4)
UPDATE table SET column = 0 WHERE id IN (1,3,20,2,45)

ADO.NET and SQLite single cell select performance

I want to create a simple database at runtime, fill it with data from an internal resource, and then read each record in a loop. Previously I used LiteDB for that, but I couldn't squeeze any more time out of it, so I chose SQLite.
I think there are a few things to improve that I am not aware of.
Database creation process:
The first step is to create the table:
using var create = transaction.Connection.CreateCommand();
create.CommandText = "CREATE TABLE tableName (Id TEXT PRIMARY KEY, Value TEXT) WITHOUT ROWID";
create.ExecuteNonQuery();
Next, the insert command is defined:
var insert = transaction.Connection.CreateCommand();
insert.CommandText = "INSERT OR IGNORE INTO tableName VALUES (@Id, @Value)";
var idParam = insert.CreateParameter();
var valueParam = insert.CreateParameter();
idParam.ParameterName = "@" + IdColumn;
valueParam.ParameterName = "@" + ValueColumn;
insert.Parameters.Add(idParam);
insert.Parameters.Add(valueParam);
Each value is then inserted in a loop:
idParam.Value = key;
valueParam.Value = value.ValueAsText;
insert.ExecuteNonQuery();
The transaction is committed: transaction.Commit();
Then the index is created:
using var index = transaction.Connection.CreateCommand();
index.CommandText = "CREATE UNIQUE INDEX idx_tableName ON tableName(Id);";
index.ExecuteNonQuery();
And after that I perform a million selects (each retrieving a single value):
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
param.Value = id;
command.Parameters.Add(param);
return command.ExecuteReader(CommandBehavior.SingleResult).ToString();
For all selects one connection is shared and never closed. The inserts are quite fast (less than a minute), but the selects are very troublesome here. Is there a way to improve them?
The table is quite big (around 2 million records) and Value contains quite heavy serialized objects.
The System.Data.SQLite provider is used, and the connection string contains these additional options: Version=3;Journal Mode=Off;Synchronous=off;
If you go for performance, you need to consider this: each independent SELECT command is a round trip to the DB with some extra cost. It's similar to the N+1 select problem with parent-child relations.
The best thing you can do is to get a LIST of items (values):
SELECT Value FROM tableName WHERE Id IN (1, 2, 3, 4, ...);
Here's a link on how to code that: https://www.mikesdotnetting.com/article/116/parameterized-in-clauses-with-ado-net-and-linq
You could have the select command created once and only executed for every Id, rather than recreated each time; from your code it seems every select does CreateCommand/CreateParameter and so on. See this for example: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.prepare?view=net-5.0 - you run .Prepare() once and then only execute the command (it doesn't need to be NonQuery).
You could then try to see if ExecuteScalar is faster than creating a reader for a single-value result: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.executescalar?view=net-5.0
If scalar does not prove to be faster, you could try CommandBehavior.SingleRow instead of SingleResult in your ExecuteReader for possible performance optimisations. According to https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior?view=net-5.0 it might work. I doubt that, but if the first two don't help, why not try it too. A sketch of the first two suggestions follows.
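A minimal sketch of the prepare-once idea, reusing the shared _connection from the question:
// Create and Prepare() the command once, outside the lookup loop.
var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
command.Parameters.Add(param);
command.Prepare();

// Per lookup, only the parameter value changes.
string GetValue(string id)
{
    param.Value = id;
    return (string)command.ExecuteScalar();
}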

c#, using dynamic queries

How can I use dynamic queries in C#? From what I've searched, it's similar to when we use SqlCommand with parameters to prevent SQL injection (example below).
using (SQLiteConnection DB_CONNECTION = new SQLiteConnection(connectionString))
{
    DB_CONNECTION.Open();
    string sqlquery = "UPDATE table SET Name = @Name, IsComplete = @IsComplete WHERE Key = @Key;";
    int rows = 0;
    using (SQLiteCommand command = new SQLiteCommand(sqlquery, DB_CONNECTION))
    {
        SQLiteParameter[] tableA = { new SQLiteParameter("@Key", todo.Key), new SQLiteParameter("@Name", table.Name), new SQLiteParameter("@IsComplete", table.IsComplete) };
        command.Parameters.AddRange(tableA);
        rows = command.ExecuteNonQuery();
    }
    DB_CONNECTION.Close();
    return (rows);
}
I'm new to C# and I'm wondering how I can make this work. Thanks in advance.
Basically, just build up the string sqlquery based on a set of conditions and ensure that the appropriate parameters have been set. For example, here is some pseudo-C# (not tested for bugs):
//In reality, use whatever C# condition you want to compose your query with.
//Here we set some value we want to compare to.
string someValueToCheck = "Some value to compare";
using (SQLiteConnection DB_CONNECTION = new SQLiteConnection(connectionString))
{
    DB_CONNECTION.Open();
    string sqlquery = "UPDATE MyTable SET Name = @Name, IsComplete = @IsComplete WHERE Key = @Key";
    //Replace this with some real condition that you want to use.
    if (!string.IsNullOrWhiteSpace(someValueToCheck))
    {
        sqlquery += " AND SomeOtherField = @OtherFieldValue";
    }
    int rows = 0;
    using (SQLiteCommand command = new SQLiteCommand(sqlquery, DB_CONNECTION))
    {
        //Use a list here since we can't add to an array - its size is fixed.
        var tableAList = new List<SQLiteParameter> {
            new SQLiteParameter("@Key", todo.Key),
            new SQLiteParameter("@Name", table.Name),
            new SQLiteParameter("@IsComplete", table.IsComplete) };
        if (!string.IsNullOrWhiteSpace(someValueToCheck))
        {
            //Bind 'someValueToCheck' as the extra parameter's value.
            tableAList.Add(new SQLiteParameter("@OtherFieldValue", someValueToCheck));
        }
        //We convert the list back to an array as it is the expected parameter type.
        command.Parameters.AddRange(tableAList.ToArray());
        rows = command.ExecuteNonQuery();
    }
    DB_CONNECTION.Close();
    return (rows);
}
In this day and age it would probably be worth looking into LINQ to Entities, as this will help you to compose queries dynamically in your code - for example https://stackoverflow.com/a/5541505/201648.
To setup for an existing database - also known as "Database First" - see the following tutorial:
https://msdn.microsoft.com/en-au/data/jj206878.aspx
You can skip step 1 since you already have a database, or do the whole tutorial first as practice.
Here is some pseudo-C# LINQ code to perform roughly the same update as the previous example:
//The context you have set up for the ERP database.
using (var db = new ERPContext())
{
    //db is an Entity Framework database context - see
    //https://msdn.microsoft.com/en-au/data/jj206878.aspx
    var query = db.MyTable
        .Where(c => c.Key == todo.Key);
    if (!string.IsNullOrWhiteSpace(someValueToCheck))
    {
        //This Where is combined with the previous one,
        //so it's more or less a WHERE condition1 AND condition2 clause.
        query = query.Where(c => c.SomeOtherField == someValueToCheck);
    }
    //Get the single thing we want to update.
    var thingToUpdate = query.First();
    //Update the values.
    thingToUpdate.Name = table.Name;
    thingToUpdate.IsComplete = table.IsComplete;
    //We can save the context to apply these results.
    db.SaveChanges();
}
There is some setup involved with Entity Framework, but in my experience the syntax is easier to follow and your productivity will increase. Hopefully this gets you on the right track.
LINQ to Entities can also map SQL stored procedures, if someone on your team objects to using it for performance reasons:
https://msdn.microsoft.com/en-us/data/gg699321.aspx
Or, if you absolutely must compose custom queries in the C# code, this is also permitted in Entity Framework:
https://msdn.microsoft.com/en-us/library/bb738521(v=vs.100).aspx

Entity Framework update based on database values?

Before Entity Framework, if I had something like a stock quantity or an on-order quantity for a product, I would update its quantities using the current database values, like this:
UPDATE Products SET AvailableQty = AvailableQty - 2 WHERE ProductId = 1;
Is there a way to accomplish the same thing with Entity Framework? It seems like the framework follows a Load, Modify, Update pattern:
Product product = db.Products.FirstOrDefault(x => x.ProductId == 1);
product.AvailableQty -= 2;
db.SaveChanges();
However, following this method there is a possibility that the product changes between the initial loading of the data and the update of the data. I know I can have a concurrency field on the entity that will prevent the update, but in most cases I don't want to have user intervention (such as a customer placing an order or receiving a purchase order).
Is there a preferred method to handling situations like these using EF, or should I just fall back to raw SQL for these scenarios?
Enclose your find and update within a transaction:
using (var transaction = new System.Transactions.TransactionScope())
{
    Product product = db.Products.FirstOrDefault(x => x.ProductId == 1);
    product.AvailableQty -= 2;
    db.SaveChanges();
    transaction.Complete();
}
"preferred method" would be opinion-based, so I'll just concentrate on the answer. EF allows you direct access to the database through the DbContext's Database property. You can execute SQL directly with ExecuteSqlCommand. Or, you can use the SqlQuery extension method.
ExecuteSqlCommand returns the records affected. And, the SqlQuery extension methods lets you use the fill provided by EF.
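For the stock example in the question, a minimal EF6-style sketch (the quantity and id values here are illustrative):
// The arithmetic runs inside the database, so there is no stale read,
// and the values are passed as parameters rather than concatenated.
int rowsAffected = db.Database.ExecuteSqlCommand(
    "UPDATE Products SET AvailableQty = AvailableQty - @p0 WHERE ProductId = @p1",
    2, 1);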
Also, if that is not enough power, you can create your own commands, like this:
var direct = mydbContext.Database;
using (var command = direct.Connection.CreateCommand())
{
    if (command.Connection.State != ConnectionState.Open)
    {
        command.Connection.Open();
    }
    command.CommandText = query.ToString(); // Some query built with StringBuilder.
    command.Parameters.Add(new SqlParameter("@id", someId));
    using (var reader = command.ExecuteReader())
    {
        if (reader.Read())
        {
            ... code here ...
            reader.Close();
        }
        command.Connection.Close();
    }
}

.NET Entity Framework Insert vs Bulk Insert

When I use my xxxContext object and issue several Adds to a table, then SaveChanges(), how does Entity Framework resolve this to SQL? Will it just loop doing INSERT INTO xxx, or if there are hundreds of rows, is it smart enough to issue a bulk insert command?
Bonus question: if it doesn't issue a bulk insert, is there a way to force it to, so my DB performance isn't killed by separate inserts? Or to bulk-load to a temp table and then merge into the original table, like an upsert?
The downfall of any ORM tool is that it is "chatty". Most times this is good enough. Sometimes it is not.
The short answer is "no".
Which is why I still sometimes pick IDataReader over EF or NHibernate, etc.
And for bulk insert operations, I send XML to a stored procedure, where I shred it and bulk insert/update or merge from there.
So even when I use an ORM, I create a Domain Library that is not EF (or NHibernate) dependent, so I have a "safety valve" to bypass the ORM in certain situations.
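A sketch of that XML approach, with an assumed stored procedure name (dbo.UpsertPeople) that would shred @data via nodes()/value() and MERGE into the target table:
// Serialize the batch to XML on the client.
var xml = new XElement("people",
    persons.Select(p => new XElement("p",
        new XAttribute("first", p.Firstname),
        new XAttribute("last", p.Lastname)))).ToString();

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.UpsertPeople", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // The whole batch travels as a single XML parameter.
    cmd.Parameters.Add(new SqlParameter("@data", SqlDbType.Xml) { Value = xml });
    conn.Open();
    cmd.ExecuteNonQuery();
}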
There is opportunity for several improvements in Entity Framework:
Set:
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
Do SaveChanges() in batches of 100 inserts... try 1000 and see the difference.
Since the context is the same during all these inserts, you can rebuild your context object every 1000 inserts: var yourContext = new YourContext();
Applying these improvements to an import process of mine took it from 7 minutes to 6 seconds.
The actual numbers... may not be 100 or 1000 in your case... try it and tweak it.
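A sketch of this batching pattern, with illustrative YourContext/YourEntity names:
var context = new YourContext();
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
int count = 0;
foreach (var entity in entities)
{
    context.Set<YourEntity>().Add(entity);
    // Flush and rebuild the context periodically so the change tracker stays small.
    if (++count % 1000 == 0)
    {
        context.SaveChanges();
        context.Dispose();
        context = new YourContext();
        context.Configuration.AutoDetectChangesEnabled = false;
        context.Configuration.ValidateOnSaveEnabled = false;
    }
}
context.SaveChanges();
context.Dispose();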
If your insert queries are ANSI SQL, or you don't care about supporting multiple databases with your codebase, you still have the backdoor of grabbing the ADO.NET connection from EF and executing raw SQL calls:
https://stackoverflow.com/a/1579220/98491
I would do something like this:
private void BulkInsert(IEnumerable<Person> persons)
{
    // Use the information in the link above to get your connection.
    DbConnection conn = ...
    using (DbCommand cmd = conn.CreateCommand())
    {
        var sb = new StringBuilder();
        sb.Append("INSERT INTO person (firstname, lastname) VALUES ");
        var count = 0;
        foreach (var person in persons)
        {
            if (count != 0) sb.Append(",");
            sb.Append(GetInsertCommand(person, count++, cmd));
        }
        if (count > 0)
        {
            cmd.CommandText = sb.ToString();
            cmd.ExecuteNonQuery();
        }
    }
}
private string GetInsertCommand(Person person, int count, DbCommand cmd)
{
    var firstname = "@firstname" + count.ToString();
    var lastname = "@lastname" + count.ToString();
    // Create the parameters through the provider-agnostic DbCommand API.
    var pFirst = cmd.CreateParameter();
    pFirst.ParameterName = firstname;
    pFirst.Value = person.Firstname;
    cmd.Parameters.Add(pFirst);
    var pLast = cmd.CreateParameter();
    pLast.ParameterName = lastname;
    pLast.Value = person.Lastname;
    cmd.Parameters.Add(pLast);
    return String.Format("({0},{1})", firstname, lastname);
}
I must admit I haven't tested it, but this should be a quick and dirty method to bypass EF for some bulk inserts until bulk inserts are part of the core.
Update
Just a quick idea: have you tried the AddOrUpdate method from the Migrations namespace?
Maybe this one does bulk inserts; I haven't looked into it, but it is worth a try:
private void BatchInsert(IEnumerable<Person> persons)
{
    // AddOrUpdate takes a params array, hence the ToArray().
    context.Persons.AddOrUpdate(persons.ToArray());
}
I know this method can be slow if you define a key column, like AddOrUpdate(p => p.Firstname, persons), but I would guess that without specifying one, these should all be inserts (not guaranteed).
You can use the bulk insert extension.
Usage:
using EntityFramework.BulkInsert.Extensions;
context.BulkInsert(myEntities);
with DbContext:
using (var ctx = GetContext())
{
    using (var transactionScope = new TransactionScope())
    {
        // some stuff in dbcontext
        ctx.BulkInsert(entities);
        ctx.SaveChanges();
        transactionScope.Complete();
    }
}
I'm afraid EF does not support bulk insert or update. As you said, currently EF will generate a bunch of INSERT commands and execute them separately (but all wrapped in a single transaction). There were some plans to implement batching; I'm not sure if there has been any progress recently. Hopefully in EF6, but I somehow doubt it.
You can read more in this discussion.
ASP.NET Core version: a fast insert method from a repository.
public virtual void AddRangeFastAndCommit(IEnumerable<T> entities)
{
    // A separate context with change tracking disabled keeps bulk adds fast.
    using (var localContext = new MyDbContext(_context.Options))
    {
        localContext.ChangeTracker.AutoDetectChangesEnabled = false;
        foreach (var entity in entities)
        {
            localContext.Add(entity);
        }
        localContext.SaveChanges();
    }
}
