.NET Entity Framework Insert vs Bulk Insert - c#

When I use my xxxContext object and issue several Adds to a table, then SaveChanges(), how does Entity Framework resolve this to SQL? Will it just loop doing INSERT INTO xxx, or, if there are hundreds of rows, is it smart enough to issue a bulk insert command?
Bonus question: if it doesn't issue a bulk insert, is there a way to force it to, so my DB performance isn't killed by separate inserts? Or to bulk-load into a temp table and then merge into the original table, like an upsert?

The downfall of any ORM tool is that it is "chatty". Most of the time that is good enough. Sometimes it is not.
The short answer is "no".
Which is why I still sometimes pick IDataReader over EF or NHibernate, etc.
And for bulk insert operations, I send XML to a stored procedure, shred it there, and bulk insert/update or merge from that.
So even when I use an ORM, I create a Domain Library that is not EF (or NHibernate) dependent, so I have a "safety valve" to bypass the ORM in certain situations.
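As a rough illustration of that XML approach, the caller can serialize all rows into one XML document and send it to a stored procedure in a single round trip. Everything below is a sketch with made-up names: dbo.UpsertPeopleFromXml is a hypothetical procedure that shreds the document with nodes()/value() and merges it into the target table.
// Sketch only: one XML document containing all rows is sent to a hypothetical
// stored procedure (dbo.UpsertPeopleFromXml) that shreds and merges it server-side.
// Requires System.Data, System.Data.SqlClient, System.Linq and System.Xml.Linq.
public static void BulkUpsertViaXml(string connectionString, IEnumerable<(string First, string Last)> people)
{
    var xml = new XElement("People",
        people.Select(p => new XElement("Person",
            new XElement("FirstName", p.First),
            new XElement("LastName", p.Last))));

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.UpsertPeopleFromXml", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add(new SqlParameter("@payload", SqlDbType.Xml) { Value = xml.ToString() });

        conn.Open();
        cmd.ExecuteNonQuery(); // one round trip for the whole batch
    }
}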

There is room for several improvements here:
Set:
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
Do SaveChanges() in batches of 100 inserts... try 1000 and see the difference.
Since the context stays the same during all these inserts and keeps growing, you can rebuild your context object every 1000 inserts: var yourContext = new YourContext();
Making these improvements in one of my data-import processes took it from 7 minutes to 6 seconds.
The actual numbers... may not be 100s or 1000s in your case... try it and tweak it.
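A minimal sketch of the batching idea above, assuming a hypothetical EF6-style YourContext with a DbSet called Entities; the batch size is something to measure and tune:
// Sketch: disable change detection/validation, save in chunks, and
// recreate the context periodically so the change tracker stays small.
public void ImportInBatches(IEnumerable<Entity> entities, int batchSize = 1000)
{
    var yourContext = new YourContext();
    yourContext.Configuration.AutoDetectChangesEnabled = false;
    yourContext.Configuration.ValidateOnSaveEnabled = false;

    var count = 0;
    foreach (var entity in entities)
    {
        yourContext.Entities.Add(entity);
        if (++count % batchSize == 0)
        {
            yourContext.SaveChanges();
            yourContext.Dispose();

            // Fresh context so tracked entities do not accumulate.
            yourContext = new YourContext();
            yourContext.Configuration.AutoDetectChangesEnabled = false;
            yourContext.Configuration.ValidateOnSaveEnabled = false;
        }
    }

    yourContext.SaveChanges(); // flush the final partial batch
    yourContext.Dispose();
}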

If your insert queries are ANSI SQL, or you don't care about supporting multiple databases with your codebase, you still have the backdoor of getting an ADO.NET connection from EF and executing some raw SQL calls:
https://stackoverflow.com/a/1579220/98491
I would do something like this:
private void BulkInsert(IEnumerable<Person> persons)
{
    // use the information in the link above to get your connection
    DbConnection conn = ...
    using (DbCommand cmd = conn.CreateCommand())
    {
        var sb = new StringBuilder();
        sb.Append("INSERT INTO person (firstname, lastname) VALUES ");
        var count = 0;
        foreach (var person in persons)
        {
            if (count != 0) sb.Append(",");
            sb.Append(GetInsertCommand(person, count++, cmd));
        }
        if (count > 0)
        {
            cmd.CommandText = sb.ToString();
            cmd.ExecuteNonQuery();
        }
    }
}
private string GetInsertCommand(Person person, int count, DbCommand cmd)
{
    // one pair of named parameters per row, e.g. @firstname0/@lastname0
    var firstname = "@firstname" + count.ToString();
    var lastname = "@lastname" + count.ToString();
    var firstnameParam = cmd.CreateParameter();
    firstnameParam.ParameterName = firstname;
    firstnameParam.Value = person.Firstname;
    cmd.Parameters.Add(firstnameParam);
    var lastnameParam = cmd.CreateParameter();
    lastnameParam.ParameterName = lastname;
    lastnameParam.Value = person.Lastname;
    cmd.Parameters.Add(lastnameParam);
    return String.Format("({0},{1})", firstname, lastname);
}
I must admit I haven't tested it, but this should be a quick-and-dirty way to bypass EF for some bulk inserts until bulk insert becomes part of the core.
Update
Just a quick idea: have you tried the AddOrUpdate method from the Migrations namespace?
Maybe that one does bulk inserts; I haven't looked into it, but it is worth a try:
private void BatchInsert(IEnumerable<Person> persons)
{
context.Persons.AddOrUpdate(persons.ToArray());
}
I know this method can be slow if you define a key column like AddOrUpdate(p => p.Firstname, persons), but I would guess that without specifying one it should do all inserts (not guaranteed).

You can use the EntityFramework.BulkInsert extension.
Usage:
using EntityFramework.BulkInsert.Extensions;
context.BulkInsert(myEntities);
with DbContext:
using (var ctx = GetContext())
{
using (var transactionScope = new TransactionScope())
{
// some stuff in dbcontext
ctx.BulkInsert(entities);
ctx.SaveChanges();
transactionScope.Complete();
}
}

I'm afraid EF does not support bulk insert or update. As you said, currently EF will generate a bunch of INSERT commands and execute them separately (but all wrapped in a single transaction). There were some plans to implement command batching; I'm not sure if there has been any progress recently. Hopefully it will arrive in EF6, but I somewhat doubt it.
You can read more in this discussion.

An ASP.NET Core version of a fast insert method for a repository:
public virtual void AddRangeFastAndCommit(IEnumerable<T> entities)
{
    // Use a separate, short-lived context so the repository's main context
    // is not burdened with tracking the whole batch.
    using (var localContext = new MyDbContext(_context.Options))
    {
        localContext.ChangeTracker.AutoDetectChangesEnabled = false;
        foreach (var entity in entities)
        {
            localContext.Add(entity);
        }
        localContext.SaveChanges();
    }
}

How to do "UPDATE table SET COL = COL + 1"

When I do this in EF Core 6:
int diff = 2;
var row = await db.Table.FirstOrDefaultAsync(); // foo = 3
row.foo += diff;
await db.SaveChangesAsync();
It translates to the SQL UPDATE Table SET foo = 5. If the database changes during my operation, it will write a wrong value.
I have heard that EF Core has collision prevention and will throw an exception in such a case, but if I use the SQL UPDATE Table SET foo = foo + 2, can I avoid the collision altogether? If so, how do I write this in EF Core 6?
If you want to issue raw SQL statements, EF Core is maybe not the best fit, since it is primarily an entity-tracking framework, but it is supported.
You can also try bulk updates, but they have the same issue when an entity is updated during writing back:
foreach (var item in context.Table)
{
item.foo++;
}
context.SaveChanges();
Note the text at the bottom:
Unfortunately, EF doesn't currently provide APIs for performing bulk updates. Until these are introduced, you can use raw SQL to perform the operation where performance is sensitive:
context.Database.ExecuteSqlRaw("UPDATE [Employees] SET [Salary] = [Salary] + 1000");
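For the increment in the question, a parameterized variant might look like this sketch (it assumes the table and column really are named Table and foo, and that this runs inside an async method):
// Set-based increment executed in a single statement on the server.
// ExecuteSqlInterpolatedAsync turns the interpolated value into a SQL parameter.
int diff = 2;
int rowsAffected = await db.Database.ExecuteSqlInterpolatedAsync(
    $"UPDATE [Table] SET [foo] = [foo] + {diff}");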
You could also make an old-fashioned ADO.NET call:
string queryString = "UPDATE [table] SET [foo] = [foo] + 2";
using (var connection = new SqlConnection(connectionString))
{
    var command = new SqlCommand(queryString, connection);
    connection.Open();
    command.ExecuteNonQuery();
}
Edit: note that both Execute... calls return an int representing the number of rows affected.

Get output 'inserted' on update with Entity Framework

SQL Server can return the inserted or updated rows of a statement through the OUTPUT clause and its inserted pseudo-table.
I have a table representing a processing queue. I use the following query to lock a record and get the ID of the locked record:
UPDATE TOP (1) GlobalTrans
SET LockDateTime = GETUTCDATE()
OUTPUT inserted.ID
WHERE LockDateTime IS NULL
This will output a column named ID with all the updated record IDs (a single ID in my case). How can I translate this into EF in C# to execute the update and get the ID back?
Entity Framework has no built-in way of doing that.
You could do it the ORM way, by selecting all the records, setting their LockDateTime and writing them back. That is probably not safe for what you want to do, because by default it is not one single transaction.
You can open your own transaction and use RepeatableRead as the isolation level. That should work, although depending on what your database does in the background it might be overkill.
You could write the SQL by hand. That defeats the purpose of Entity Framework, but it should be just as safe as before as far as the locking mechanism is concerned.
You could also put it into a stored procedure and call that. That is a little better than the previous option, because at least somebody will compile it and check that the table and column names are correct.
Simple example #1, returning a DataTable:
I did this directly against the connection, using command.ExecuteReader() instead of command.ExecuteNonQuery():
// 'sql' holds the statement with the OUTPUT clause and 'param' its SqlParameter (defined elsewhere)
var connection = DbContext().Database.Connection as SqlConnection;
using (var command = connection.CreateCommand())
{
command.CommandText = sql;
command.CommandTimeout = 120;
command.Parameters.Add(param);
using (var reader = command.ExecuteReader())
{
var resultTable = new DataTable();
resultTable.Load(reader);
return resultTable;
}
}
FYI: if your SQL does not include an OUTPUT clause, this will return an empty DataTable.
Example #2, returning entities:
This is a bit more complicated, but it does work, using a SQL statement with an OUTPUT inserted.* clause:
var className = typeof(T).Name;
var container = ObjContext().MetadataWorkspace.GetEntityContainer(UnitOfWork.ObjContext().DefaultContainerName, DataSpace.CSpace);
var setName = (from meta in container.BaseEntitySets where meta.ElementType.Name == className select meta.Name).First();
var results = ObjContext().ExecuteStoreQuery<T>(sql, setName, trackingEnabled ? MergeOption.AppendOnly : MergeOption.NoTracking).ToList();
T being the entity type being worked on.
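For the exact query in the question, one raw-SQL option in EF6 is a sketch like this (assuming a DbContext instance named db; Database.SqlQuery<T> materializes the OUTPUT rows when enumerated):
// Executes the UPDATE and reads the OUTPUT inserted.ID values back.
// With TOP (1) the list contains zero or one ID.
var lockedIds = db.Database.SqlQuery<int>(
    @"UPDATE TOP (1) GlobalTrans
      SET LockDateTime = GETUTCDATE()
      OUTPUT inserted.ID
      WHERE LockDateTime IS NULL").ToList();
int? lockedId = lockedIds.Count > 0 ? lockedIds[0] : (int?)null;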

Best way to do bulk inserts using dapper.net [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I am using the following code to insert records to a table in SQL Server 2014
using (SqlConnection conn = new SqlConnection(ConfigurationManager.AppSettings["myConnString"]))
{
conn.Execute("INSERT statement here", insertList);
}
The insertList is a list with 1 million items in it. I tested this insert on an i5 desktop and it took about 65 minutes to insert a million records into SQL Server running on the same machine. I am not sure how Dapper is doing the inserts behind the scenes, but I certainly don't want to open and close the database connection a million times!
Is this the best way to do bulk inserts in Dapper, or should I try something else or go with plain ADO.NET using the Enterprise Library?
EDIT
In hindsight, I know using ADO.NET will be better, so I will rephrase my question: I still would like to know whether this is the best that Dapper can do, or whether I am missing a better way to do it in Dapper itself.
Building on Ehsan Sajjad's comment, one of the ways is to write a stored procedure that has a READONLY parameter of a user-defined TABLE type.
Say you want to bulk insert contacts that consist of a first name and last name, this is how you would go about it:
1) Create a table type:
CREATE TYPE [dbo].[MyTableType] AS TABLE(
[FirstName] [varchar](50) NULL,
[LastName] [varchar](50) NULL
)
GO
2) Now create a stored proc that uses the above table type:
CREATE PROC [dbo].[YourProc]
/*other params here*/
@Names AS MyTableType READONLY
AS
/* proc body here
*/
GO
3) On the .NET side, pass the parameter as System.Data.SqlDbType.Structured.
This usually involves creating an in-memory DataTable, adding rows to it, and then using that DataTable object as the @Names parameter (see the sketch below).
NOTE: The DataTable is considered to be memory intensive - be careful and profile your code to be sure that it does not cause resource issues on your server.
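On the Dapper side, the call might look like this sketch (the proc and type names match the SQL above; AsTableValuedParameter is Dapper's helper for structured parameters, and contacts/connectionString are placeholders):
// Build an in-memory DataTable shaped like dbo.MyTableType and pass it to the
// stored procedure as a table-valued parameter.
// Requires Dapper, System.Data and System.Data.SqlClient.
var table = new DataTable();
table.Columns.Add("FirstName", typeof(string));
table.Columns.Add("LastName", typeof(string));
foreach (var contact in contacts)
{
    table.Rows.Add(contact.FirstName, contact.LastName);
}
using (var connection = new SqlConnection(connectionString))
{
    connection.Execute(
        "[dbo].[YourProc]",
        new { Names = table.AsTableValuedParameter("[dbo].[MyTableType]") },
        commandType: CommandType.StoredProcedure);
}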
ALTERNATIVE SOLUTION
Use the approach outlined here: https://stackoverflow.com/a/9947259/190476
The solution there is for DELETE, but it can be adapted for an insert or update as well.
The first choice should be SqlBulkCopy, because it is safe from SQL injection.
However, there is another way to drastically improve performance: merge multiple inserts into one SQL statement so you make one call per batch instead of one call per row.
Code for inserting users in bulk can look like this:
public async Task InsertInBulk(IList<string> userNames)
{
var sqls = GetSqlsInBatches(userNames);
using (var connection = new SqlConnection(ConnectionString))
{
foreach (var sql in sqls)
{
await connection.ExecuteAsync(sql);
}
}
}
private IList<string> GetSqlsInBatches(IList<string> userNames)
{
var insertSql = "INSERT INTO [Users] (Name, LastUpdatedAt) VALUES ";
var valuesSql = "('{0}', getdate())";
var batchSize = 1000;
var sqlsToExecute = new List<string>();
var numberOfBatches = (int)Math.Ceiling((double)userNames.Count / batchSize);
for (int i = 0; i < numberOfBatches; i++)
{
var userToInsert = userNames.Skip(i * batchSize).Take(batchSize);
// Note: the names are formatted straight into the SQL text here, so this is only safe for trusted input.
var valuesToInsert = userToInsert.Select(u => string.Format(valuesSql, u));
sqlsToExecute.Add(insertSql + string.Join(',', valuesToInsert));
}
return sqlsToExecute;
}
The whole article and a performance comparison are available here: http://www.michalbialecki.com/2019/05/21/bulk-insert-in-dapper/
The best free way to insert with excellent performance is to use the SqlBulkCopy class directly, as Alex and Andreas suggested.
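A minimal sketch of that approach, assuming a dbo.Users table with a Name column, a list of user names (userNames) and a connection string:
// Stream all rows to SQL Server in one bulk operation instead of per-row INSERTs.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
foreach (var name in userNames)
{
    table.Rows.Add(name);
}
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.Users";
        bulkCopy.ColumnMappings.Add("Name", "Name");
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(table);
    }
}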
Disclaimer: I'm the owner of the project Dapper Plus
This project is not free but supports the following operations:
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
It works via mappings and can also output values such as identity columns.
// CONFIGURE & MAP entity
DapperPlusManager.Entity<Order>()
.Table("Orders")
.Identity(x => x.ID);
// CHAIN & SAVE entity
connection.BulkInsert(orders)
    .AlsoInsert(order => order.Items);

connection.BulkInsert(orders)
    .Include(x => x.ThenMerge(order => order.Invoice)
        .AlsoMerge(invoice => invoice.Items))
    .AlsoMerge(x => x.ShippingAddress);
I faced a situation where the solution had to work with ADO.NET, Entity Framework and Dapper, so I made this lib; it generates batches in the form of:
IEnumerable<(string SqlQuery, IEnumerable<SqlParameter> SqlParameters)>
IEnumerable<(string SqlQuery, DynamicParameters DapperDynamicParameters)>
This link contains instructions. It is safe against SQL injection, because it uses parameters instead of concatenation.
Usage with Dapper:
using MsSqlHelpers;
var mapper = new MapperBuilder<Person>()
.SetTableName("People")
.AddMapping(person => person.FirstName, columnName: "Name")
.AddMapping(person => person.LastName, columnName: "Surename")
.AddMapping(person => person.DateOfBirth, columnName: "Birthday")
.Build();
var people = new List<Person>()
{
new Person() { FirstName = "John", LastName = "Lennon", DateOfBirth = new DateTime(1940, 10, 9) },
new Person() { FirstName = "Paul", LastName = "McCartney", DateOfBirth = new DateTime(1942, 6, 18) },
};
var connectionString = "Server=SERVER_ADDRESS;Database=DATABASE_NAME;User Id=USERNAME;Password=PASSWORD;";
var sqlQueriesAndDapperParameters = new MsSqlQueryGenerator().GenerateDapperParametrizedBulkInserts(mapper, people);
using (var sqlConnection = new SqlConnection(connectionString))
{
// Default batch size: 1000 rows or (2100-1) parameters per insert.
foreach (var (SqlQuery, DapperDynamicParameters) in sqlQueriesAndDapperParameters)
{
sqlConnection.Execute(SqlQuery, DapperDynamicParameters);
}
}

Entity Framework update based on database values?

Before Entity Framework, if I had something like a stock quantity or an on-order quantity for a product, I would update its quantities using the current database values, like this:
UPDATE Products SET AvailableQty = AvailableQty - 2 WHERE ProductId = 1;
Is there a way to accomplish the same thing with Entity Framework? It seems like the framework follows a Load, Modify, Update pattern:
Product product = db.Products.FirstOrDefault(x => x.ProductId == 1);
product.AvailableQty += 2;
db.SaveChanges();
However, following this method there is a possibility that the product changes between the initial loading of the data and the update of the data. I know I can have a concurrency field on the entity that will prevent the update, but in most cases I don't want to have user intervention (such as a customer placing an order or receiving a purchase order).
Is there a preferred method to handling situations like these using EF, or should I just fall back to raw SQL for these scenarios?
Enclose your find and update within a transaction
using (var transaction = new System.Transactions.TransactionScope())
{
Product product = db.Products.FirstOrDefault(x => x.ProductId == 1);
product.AvailableQty += 2;
db.SaveChanges();
transaction.Complete();
}
"preferred method" would be opinion-based, so I'll just concentrate on the answer. EF allows you direct access to the database through the DbContext's Database property. You can execute SQL directly with ExecuteSqlCommand. Or, you can use the SqlQuery extension method.
ExecuteSqlCommand returns the number of records affected, and the SqlQuery extension method lets EF materialize the query results into objects for you.
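For the quantity example in the question, a set-based, parameterized update might look like this sketch (ExecuteSqlCommand is the EF6 API; the AvailableQty >= @qty guard is an assumption added to prevent negative stock):
// The decrement happens atomically on the server, so there is no
// read-modify-write race with other users.
int rows = db.Database.ExecuteSqlCommand(
    "UPDATE Products SET AvailableQty = AvailableQty - @qty " +
    "WHERE ProductId = @id AND AvailableQty >= @qty",
    new SqlParameter("@qty", 2),
    new SqlParameter("@id", 1));
if (rows == 0)
{
    // Either the product does not exist or there was not enough stock.
}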
Also, if that is not enough power, you can create your own commands like this:
var direct = mydbContext.Database;
using (var command = direct.Connection.CreateCommand())
{
if (command.Connection.State != ConnectionState.Open)
{
command.Connection.Open();
}
command.CommandText = query.ToString(); // Some query built with StringBuilder.
command.Parameters.Add(new SqlParameter("#id", someId));
using (var reader = command.ExecuteReader())
{
if (reader.Read())
{
... code here ...
reader.Close();
}
command.Connection.Close();
}
}

Getting SqlBulkCopy to show up as sql in MiniProfiler

I'm using MiniProfiler to profile my SQL commands.
One issue I'm dealing with now is repeated INSERT statements generated by LINQ.
I've converted them into a SqlBulkCopy command; however, it now doesn't appear in the SQL view in MiniProfiler.
Would there even be an associated command string for a SqlBulkCopy?
Is it possible to get the bulk copy to appear in the list of sql commands?
Can I at least make it counted in the % sql bit?
I'm aware I could use MiniProfiler.Current.Step("Doing Bulk Copy") but that wouldn't count as SQL, and wouldn't show in the listing with any detail.
Current code below:
public static void BulkInsertAll<T>(this DataContext dc, IEnumerable<T> entities)
{
var conn = (dc.Connection as ProfiledDbConnection).InnerConnection as SqlConnection;
conn.Open();
Type t = typeof(T);
var tableAttribute = (TableAttribute)t.GetCustomAttributes(
typeof(TableAttribute), false).Single();
var bulkCopy = new SqlBulkCopy(conn)
{
DestinationTableName = tableAttribute.Name
};
//.... build a DataTable named 'table' from 'entities' (omitted in the original post)
bulkCopy.WriteToServer(table);
}
You should be able to use CustomTimings to profile these. They are included in the new v3 version that is now available on NuGet.
You can see some example usages of CustomTiming in the sample project, where it is used to record HTTP and Redis events.
An example of how you could use it with SqlBulkCopy:
string sql = GetBulkCopySql(); // what should show up for the SqlBulkCopy event?
using (MiniProfiler.Current.CustomTiming("SqlBulkCopy", sql))
{
RunSqlBulkCopy(); // run the actual SqlBulkCopy operation
}
