When I do this in EF Core 6:
int diff = 2;
var row = await db.Table.FirstOrDefaultAsync(); // foo = 3
row.foo += diff;
await db.SaveChangesAsync();
It translates to the SQL UPDATE Table SET foo = 5. If the database value changes between my read and my write, the update will set a wrong value.
I have heard that EF Core has collision prevention and will throw an exception in such a case, but if I use the SQL UPDATE Table SET foo = foo + 2, can I avoid the collision entirely? If so, how do I write this in EF Core 6?
If you want to run raw SQL statements, EF Core is maybe not the best fit, since it is first and foremost an entity-tracking framework. But it is supported.
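As a sketch of that route in EF Core 6 (assuming the Table/foo names from the question; the interpolated value is sent as a DbParameter, not concatenated):

// Hedged sketch: push the increment to the database so there is no read/write gap.
int diff = 2;
int rowsAffected = await db.Database.ExecuteSqlInterpolatedAsync(
    $"UPDATE Table SET foo = foo + {diff}");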
You can also try updating entities in a loop and saving them back, but that has the same issue if a row is updated while you write back:
foreach (var item in context.Table)
{
    item.foo++;
}
context.SaveChanges();
Note the text at the bottom of the EF documentation on this:
Unfortunately, EF doesn't currently provide APIs for performing bulk updates. Until these are introduced, you can use raw SQL to perform the operation where performance is sensitive:
context.Database.ExecuteSqlRaw("UPDATE [Employees] SET [Salary] = [Salary] + 1000");
You could also make an old-fashioned ADO.NET call:
string queryString = "UPDATE [table] SET [foo] = [foo] + 2";
using var command = new SqlCommand(queryString, connection); // connection: an open SqlConnection
command.ExecuteNonQuery();
Edit: note that both Execute[...] calls return an int representing the number of rows affected.
I want to create a simple database at runtime, fill it with data from an internal resource, and then read each record in a loop. Previously I used LiteDB for that, but I couldn't squeeze the time down any further, so I chose SQLite.
I think there are a few things to improve that I am not aware of.
Database creation process:
The first step is to create the table:
using var create = transaction.Connection.CreateCommand();
create.CommandText = "CREATE TABLE tableName (Id TEXT PRIMARY KEY, Value TEXT) WITHOUT ROWID";
create.ExecuteNonQuery();
Next, the insert command is defined:
var insert = transaction.Connection.CreateCommand();
insert.CommandText = "INSERT OR IGNORE INTO tableName VALUES (@Id, @Value)";
var idParam = insert.CreateParameter();
var valueParam = insert.CreateParameter();
idParam.ParameterName = "@" + IdColumn;
valueParam.ParameterName = "@" + ValueColumn;
insert.Parameters.Add(idParam);
insert.Parameters.Add(valueParam);
Each value is then inserted in a loop:
idParam.Value = key;
valueParam.Value = value.ValueAsText;
insert.ExecuteNonQuery();
The transaction is committed:
transaction.Commit();
Then the index is created:
using var index = transaction.Connection.CreateCommand();
index.CommandText = "CREATE UNIQUE INDEX idx_tableName ON tableName(Id);";
index.ExecuteNonQuery();
And after that I perform a million selects (each retrieving a single value):
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
param.Value = id;
command.Parameters.Add(param);
using var reader = command.ExecuteReader(CommandBehavior.SingleResult);
return reader.Read() ? reader.GetString(0) : null;
One connection is shared for all selects and is never closed. The inserts are quite fast (less than a minute), but the selects are very troublesome. Is there a way to improve them?
The table is quite big (around ~2 million records) and Value contains quite heavy serialized objects.
The System.Data.SQLite provider is used, and the connection string contains these additional options: Version=3;Journal Mode=Off;Synchronous=Off;
If you go for performance, you need to consider this: each independent SELECT command is a roundtrip to the DB with some extra cost. It's similar to the N+1 select problem in the case of parent-child relations.
The best thing you can do is to fetch a LIST of items (values) in one go:
SELECT Value FROM tableName WHERE Id IN (1, 2, 3, 4, ...);
Here's a link on how to code that: https://www.mikesdotnetting.com/article/116/parameterized-in-clauses-with-ado-net-and-linq
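A minimal sketch of that parameterized IN clause, following the linked article's idea (the batch contents here are illustrative):

// Hedged sketch: fetch a batch of values in one roundtrip instead of N selects.
var ids = new[] { "1", "2", "3" }; // one batch of keys
var names = ids.Select((_, i) => "@p" + i).ToArray();
using var cmd = _connection.CreateCommand();
cmd.CommandText = $"SELECT Id, Value FROM tableName WHERE Id IN ({string.Join(",", names)});";
for (int i = 0; i < ids.Length; i++)
{
    var p = cmd.CreateParameter();
    p.ParameterName = names[i];
    p.Value = ids[i];
    cmd.Parameters.Add(p);
}
using var reader = cmd.ExecuteReader();
while (reader.Read())
{
    // map reader.GetString(0) (Id) to reader.GetString(1) (Value) for the caller
}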
You could also avoid recreating the select command for every Id: create it once and only execute it for each Id. From your code it seems every select goes through CreateCommand/CreateParameter and so on. See https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.prepare?view=net-5.0 for example: you run .Prepare() once and then only execute (the prepared command doesn't need to be a NonQuery). A sketch follows after these suggestions.
You could then try to see if ExecuteScalar is faster, since it avoids creating a reader for a single-value result: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.executescalar?view=net-5.0
If ExecuteScalar does not prove faster, you could try CommandBehavior.SingleRow instead of SingleResult in your ExecuteReader for a possible performance optimisation. According to https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior?view=net-5.0 it might work. I doubt it, but if the first two suggestions don't help, why not try it too.
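Here is that sketch: the command from the question is created and prepared once, then only re-executed per key (the Lookup helper and its use of ExecuteScalar are illustrative, not from the original post):

// Hedged sketch: build and prepare the command once, reuse it for every lookup.
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
command.Parameters.Add(param);
command.Prepare(); // compiled once, executed a million times

string Lookup(string id)
{
    param.Value = id;
    return (string)command.ExecuteScalar(); // single value, no reader required
}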
SQL Server can return inserted and updated records through the OUTPUT clause and its 'inserted' keyword.
I have a table representing a processing queue. I use the following query to lock a record and get the ID of the locked record:
UPDATE TOP (1) GlobalTrans
SET LockDateTime = GETUTCDATE()
OUTPUT inserted.ID
WHERE LockDateTime IS NULL
This will output a column named ID with all the updated record IDs (a single ID in my case). How can I translate this into EF in C# to execute the update and get the ID back?
Entity Framework has no way of doing that.
You could do it the ORM way, by selecting all the records, setting their LockDateTime and writing them back. That is probably not safe for what you want to do, because by default it's not one single transaction.
You can spawn your own transactions and use RepeatableRead as the isolation level. That should work. Depending on what your database does in the background, it might be overkill though.
You could write the SQL by hand. That defeats the purpose of entity framework, but it should be just as safe as it was before as far as the locking mechanism is concerned.
You could also put it into a stored procedure and call that. It's a little bit better than the above version because at least somebody will compile it and check that the table and column names are correct.
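For the write-the-SQL-by-hand option, a minimal sketch using EF6's Database.SqlQuery (the statement is the one from the question; SqlQuery materializes the returned IDs without tracking):

// Hedged sketch: run the UPDATE ... OUTPUT statement as a raw query and
// read back the IDs of the rows that were locked.
var lockedIds = db.Database.SqlQuery<int>(
    @"UPDATE TOP (1) GlobalTrans
      SET LockDateTime = GETUTCDATE()
      OUTPUT inserted.ID
      WHERE LockDateTime IS NULL").ToList();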
Simple example #1, to get a data table. I did this directly against the connection, changing command.ExecuteNonQuery() to command.ExecuteReader():
var connection = DbContext().Database.Connection as SqlConnection;
using (var command = connection.CreateCommand())
{
    command.CommandText = sql;
    command.CommandTimeout = 120;
    command.Parameters.Add(param);
    using (var reader = command.ExecuteReader())
    {
        var resultTable = new DataTable();
        resultTable.Load(reader);
        return resultTable;
    }
}
FYI: if you don't have an OUTPUT clause in your SQL, it will return an empty data table.
Example #2, to return entities. This is a bit more complicated, but it does work, using a SQL statement with an OUTPUT inserted.* clause:
var className = typeof(T).Name;
var container = ObjContext().MetadataWorkspace.GetEntityContainer(UnitOfWork.ObjContext().DefaultContainerName, DataSpace.CSpace);
var setName = (from meta in container.BaseEntitySets where meta.ElementType.Name == className select meta.Name).First();
var results = ObjContext().ExecuteStoreQuery<T>(sql, setName, trackingEnabled ? MergeOption.AppendOnly : MergeOption.NoTracking).ToList();
where T is the entity type being worked on.
Before Entity Framework, if I had something like a stock quantity or an on-order quantity for a product, I would update its quantities using the current database values, as such:
UPDATE Products SET AvailableQty = AvailableQty - 2 WHERE ProductId = 1;
Is there a way to accomplish the same thing with Entity Framework? It seems like the framework follows a Load, Modify, Update pattern:
Product product = db.Products.FirstOrDefault(x => x.ProductId == 1);
product.AvailableQty += 2;
db.SaveChanges();
However, following this method there is a possibility that the product changes between the initial loading of the data and the update of the data. I know I can have a concurrency field on the entity that will prevent the update, but in most cases I don't want to have user intervention (such as a customer placing an order or receiving a purchase order).
Is there a preferred method to handling situations like these using EF, or should I just fall back to raw SQL for these scenarios?
Enclose your find and update within a transaction:
using (var transaction = new System.Transactions.TransactionScope())
{
    Product product = db.Products.FirstOrDefault(x => x.ProductId == 1);
    product.AvailableQty += 2;
    db.SaveChanges();
    transaction.Complete();
}
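One caveat worth knowing: TransactionScope defaults to the Serializable isolation level, so the initial read keeps its lock until the scope completes; two concurrent find-and-update transactions on the same row will then block or deadlock rather than silently lose an update. You can pass a TransactionOptions with a different IsolationLevel to the constructor if that is too aggressive.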
"preferred method" would be opinion-based, so I'll just concentrate on the answer. EF allows you direct access to the database through the DbContext's Database property. You can execute SQL directly with ExecuteSqlCommand. Or, you can use the SqlQuery extension method.
ExecuteSqlCommand returns the records affected. And, the SqlQuery extension methods lets you use the fill provided by EF.
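As an illustration of ExecuteSqlCommand for the stock-quantity case (a sketch; the table and column names are taken from the question):

// Hedged sketch: let the database do the arithmetic in one atomic statement.
int rowsAffected = db.Database.ExecuteSqlCommand(
    "UPDATE Products SET AvailableQty = AvailableQty - @qty WHERE ProductId = @id",
    new SqlParameter("@qty", 2),
    new SqlParameter("@id", 1));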
Also, if that is not enough power, you can create your own commands like this:
var direct = mydbContext.Database;
using (var command = direct.Connection.CreateCommand())
{
    if (command.Connection.State != ConnectionState.Open)
    {
        command.Connection.Open();
    }
    command.CommandText = query.ToString(); // Some query built with StringBuilder.
    command.Parameters.Add(new SqlParameter("@id", someId));
    using (var reader = command.ExecuteReader())
    {
        if (reader.Read())
        {
            ... code here ...
            reader.Close();
        }
        command.Connection.Close();
    }
}
When I use my xxxContext object and issue several Adds to a table, then SaveChanges(), how does Entity Framework resolve this to SQL? Will it just loop doing INSERT INTO xxx, or if there are hundreds of rows, is it smart enough to issue a bulk insert command?
Bonus question: if it doesn't issue a bulk insert, is there a way to force it to, so my DB performance isn't killed by separate inserts? Or to bulk-load into a temp table and then merge into the original table, like an upsert?
The downfall of any ORM tool is that it is "chatty". Most times this is good enough. Sometimes it is not.
The short answer is "no".
Which is why I still sometimes pick IDataReader over EF or NHibernate, etc.
And for bulk insert operations, I send xml to the stored procedure, and I shred it and bulk insert/update or merge from there.
So even when I use an ORM, I create a Domain Library that is not EF (or NHibernate) dependent... so I have a "safety valve" to bypass the ORM in certain situations.
There is opportunity for several improvements in Entity Framework:
Set:
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
Do SaveChanges() in packages of 100 inserts... try 1000 and see the difference.
Since the context is the same during all these inserts, you can also rebuild your context object every 1000 inserts: var yourContext = new YourContext();
Making these improvements in a data-import process of mine took it from 7 minutes to 6 seconds.
The actual numbers may not be 100s or 1000s in your case... try it and tweak it. A sketch of the whole pattern follows.
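Something like this (YourContext and Person are placeholder names; the batch size is the thing to tune):

const int batchSize = 1000;
var yourContext = new YourContext();
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
var pending = 0;
foreach (var person in persons)
{
    yourContext.Set<Person>().Add(person);
    if (++pending % batchSize == 0)
    {
        yourContext.SaveChanges();
        yourContext.Dispose();
        // rebuild the context so the change tracker does not keep growing
        yourContext = new YourContext();
        yourContext.Configuration.AutoDetectChangesEnabled = false;
        yourContext.Configuration.ValidateOnSaveEnabled = false;
    }
}
yourContext.SaveChanges(); // flush the final partial batch
yourContext.Dispose();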
If your insert queries are ANSI SQL, or you don't care about supporting multiple databases with your codebase, you still have the backdoor of getting an ADO.NET connection from EF and executing some raw SQL calls:
https://stackoverflow.com/a/1579220/98491
I would do something like this:
private void BulkInsert(IEnumerable<Person> persons)
{
    // use the information in the link to get your connection
    DbConnection conn = ...
    using (DbCommand cmd = conn.CreateCommand())
    {
        var sb = new StringBuilder();
        sb.Append("INSERT INTO person (firstname, lastname) VALUES ");
        var count = 0;
        foreach (var person in persons)
        {
            if (count != 0) sb.Append(",");
            sb.Append(GetInsertCommand(person, count++, cmd));
        }
        if (count > 0)
        {
            cmd.CommandText = sb.ToString();
            cmd.ExecuteNonQuery();
        }
    }
}

private string GetInsertCommand(Person person, int count, DbCommand cmd)
{
    var firstname = "@firstname" + count;
    var lastname = "@lastname" + count;

    var p1 = cmd.CreateParameter();
    p1.ParameterName = firstname;
    p1.Value = person.Firstname;
    cmd.Parameters.Add(p1);

    var p2 = cmd.CreateParameter();
    p2.ParameterName = lastname;
    p2.Value = person.Lastname; // was person.Firstname in the original, a bug
    cmd.Parameters.Add(p2);

    return String.Format("({0},{1})", firstname, lastname);
}
I must admit I haven't tested it, but this should be a quick and dirty method to bypass EF for some bulk inserts until bulk insert is part of the core.
Update
Just a quick idea: have you tried the AddOrUpdate method from the Migrations namespace? Maybe it does bulk inserts; I haven't looked into it, but it is worth a try:
private void BatchInsert(IEnumerable<Person> persons)
{
    // AddOrUpdate takes a params array, hence ToArray()
    context.Persons.AddOrUpdate(persons.ToArray());
}
I know this method can be slow if you define a key column like AddOrUpdate(p => p.Firstname, persons), but I would guess that without specifying one, it should be all inserts (not guaranteed).
You can use the bulk insert extension.
Usage:
using EntityFramework.BulkInsert.Extensions;
context.BulkInsert(myEntities);
with DbContext:
using (var ctx = GetContext())
{
    using (var transactionScope = new TransactionScope())
    {
        // some stuff in dbcontext
        ctx.BulkInsert(entities);
        ctx.SaveChanges();
        transactionScope.Complete();
    }
}
I'm afraid EF does not support bulk insert or update. As you said, currently EF will generate a bunch of INSERT commands and execute them separately (all wrapped in a single transaction, though). There were some plans to implement batching; I'm not sure if there has been progress recently. Hopefully in EF6, but I somehow doubt it.
You can read more in this discussion.
ASP.NET Core version: a fast insert method on a repository.
public virtual void AddRangeFastAndCommit(IEnumerable<T> entities)
{
    using (var localContext = new MyDbContext(_context.Options))
    {
        localContext.ChangeTracker.AutoDetectChangesEnabled = false;
        foreach (var entity in entities)
        {
            localContext.Add(entity);
        }
        localContext.SaveChanges();
    }
}
I am wondering, is there a way to do batch updating? I am using MS SQL Server 2005.
I saw a way with the SqlDataAdapter, but it seems you first have to run the select statement with it, then fill some dataset and make changes to the dataset.
Now I am using LINQ to SQL to do the select, so I want to try to keep it that way. However, it is too slow to do massive updates. So is there a way I can keep my LINQ to SQL (for the select part) but use something different to do the mass update?
Thanks
Edit
I am interested in this staging-table way, but I am not sure how to do it, and it is still not clear to me why it will be faster, since I don't understand how the update part works.
So can anyone show me how this would work and how to deal with concurrent connections?
Edit2
This was my latest attempt at trying to do a mass update using XML; however, it uses too much resources and my shared hosting does not allow it to go through. So I need a different way; that's why I am now looking into a staging table.
using (TestDataContext db = new TestDataContext())
{
    UserTable[] testRecords = new UserTable[2];
    for (int count = 0; count < 2; count++)
    {
        UserTable testRecord = new UserTable();
        if (count == 1)
        {
            testRecord.CreateDate = new DateTime(2050, 5, 10);
            testRecord.AnotherField = true;
        }
        else
        {
            testRecord.CreateDate = new DateTime(2015, 5, 10);
            testRecord.AnotherField = false;
        }
        testRecords[count] = testRecord;
    }

    StringBuilder sBuilder = new StringBuilder();
    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
    XmlSerializer serializer = new XmlSerializer(typeof(UserTable[]));
    serializer.Serialize(sWriter, testRecords);

    using (SqlConnection con = new SqlConnection(connectionString))
    {
        string sprocName = "spTEST_UpdateTEST_TEST";
        using (SqlCommand cmd = new SqlCommand(sprocName, con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter param1 = new SqlParameter("@UpdatedProdData", SqlDbType.VarChar, int.MaxValue);
            param1.Value = sBuilder.Remove(0, 41).ToString(); // strips the XML declaration
            cmd.Parameters.Add(param1);
            con.Open();
            int result = cmd.ExecuteNonQuery();
            con.Close();
        }
    }
}
@Fredrik Johansson: I am not sure what you're saying will work. It seems you want me to make an update statement for each record. I can't do that, since I will need to update 1 to 50,000+ records and I will not know how many until that point.
Edit 3
So this is my SP now. I think it should be able to handle concurrent connections, but I wanted to make sure.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_MassUpdate]
    @BatchNumber uniqueidentifier
AS
BEGIN
    UPDATE prod
    SET ProductQty = 50
    FROM Product prod
    JOIN StagingTbl stage ON prod.ProductId = stage.ProductId
    WHERE stage.BatchNumber = @BatchNumber

    DELETE FROM StagingTbl
    WHERE BatchNumber = @BatchNumber
END
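To make the flow concrete, a sketch of the client side (assuming StagingTbl has ProductId and BatchNumber columns, and that productIds came from your earlier LINQ to SQL select):

// Hedged sketch: bulk copy the affected keys into the staging table,
// then call the proc so the UPDATE happens as one set-based statement.
var batch = Guid.NewGuid();
var table = new DataTable();
table.Columns.Add("ProductId", typeof(int));
table.Columns.Add("BatchNumber", typeof(Guid));
foreach (var id in productIds)
    table.Rows.Add(id, batch);

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    using (var bulk = new SqlBulkCopy(con) { DestinationTableName = "StagingTbl" })
        bulk.WriteToServer(table);
    using (var cmd = new SqlCommand("sp_MassUpdate", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@BatchNumber", SqlDbType.UniqueIdentifier).Value = batch;
        cmd.ExecuteNonQuery();
    }
}

The per-batch GUID is what lets concurrent connections share the staging table without stepping on each other's rows.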
You can use the SqlDataAdapter to do a batch update. It doesn't matter how you fill your dataset (L2SQL or whatever); you can use different methods to do the update. Just define the query to run using the data in your datatable.
The key here is UpdateBatchSize. The data adapter will send the updates in batches of whatever size you define. You need to experiment with this value to see what number works best, but typically numbers of 500-1000 do best. SQL Server can then optimize the update and execute a little faster. Note that when doing batch updates, you cannot update the row source of the datatable.
I use this method to do updates of 10-100K rows and it usually runs in under 2 minutes. It will depend on what you are updating though.
Sorry, this is in VB...
Using da As New SqlDataAdapter
    da.UpdateCommand = conn.CreateCommand
    da.UpdateCommand.CommandTimeout = 300
    da.AcceptChangesDuringUpdate = False
    da.ContinueUpdateOnError = False
    da.UpdateBatchSize = 1000 ' Experiment for best performance
    da.UpdateCommand.UpdatedRowSource = UpdateRowSource.None ' Needed if UpdateBatchSize > 1

    sql = "UPDATE YourTable"
    sql += " SET YourField = @YourField"
    sql += " WHERE ID = @ID"
    da.UpdateCommand.CommandText = sql

    da.UpdateCommand.Parameters.Clear()
    da.UpdateCommand.Parameters.Add("@YourField", SqlDbType.SmallDateTime).SourceColumn = "YourField"
    da.UpdateCommand.Parameters.Add("@ID", SqlDbType.Int).SourceColumn = "ID"

    da.Update(ds.Tables("YourTable"))
End Using
Another option is to bulk copy to a temp table and then run a query to update the main table from it. This may be faster.
As allonym said, use SqlBulkCopy, which is very fast (I found speed improvements of over 200x: from 1500 s to 6 s). However, you can use the DataTable and DataRow classes to provide the data to SqlBulkCopy (which seems easier). Using SqlBulkCopy this way has the added advantage of being .NET 3.0 compliant as well (LINQ was added only in 3.5).
Check out http://msdn.microsoft.com/en-us/library/ex21zs8x%28v=VS.100%29.aspx for some sample code.
Use SqlBulkCopy, which is lightning fast. You'll need a custom IDataReader implementation that enumerates over your LINQ query results. Look at http://code.msdn.microsoft.com/LinqEntityDataReader for more info and some potentially suitable IDataReader code.
You have to work with the expression trees directly, but it's doable. In fact, it's already been done for you; you just have to download the source:
Batch Updates and Deletes with LINQ to SQL
The alternative is to just use stored procedures or ad-hoc SQL queries using the ExecuteMethodCall and ExecuteCommand methods of the DataContext.
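A minimal sketch of the ExecuteCommand route (table and column names borrowed from the question's code; the filter is illustrative):

// Hedged sketch: one set-based UPDATE instead of per-row updates.
// DataContext.ExecuteCommand passes the values positionally as parameters.
using (var db = new TestDataContext())
{
    int rows = db.ExecuteCommand(
        "UPDATE UserTable SET AnotherField = {0} WHERE CreateDate < {1}",
        true, new DateTime(2016, 1, 1));
}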
You can use SqlDataAdapter to do a batch update even if the datatable is filled manually/programmatically (from LINQ or any other source).
Just remember to manually set the RowState for the rows in the datatable. Use dataRow.SetModified() for this.
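A minimal sketch of that last step (da is the configured SqlDataAdapter from the answer above; the rows were added programmatically):

// Hedged sketch: programmatically added rows have RowState Added, which would
// fire the InsertCommand. AcceptChanges() resets them to Unchanged, and
// SetModified() then flags each row so the batched UPDATE picks it up.
table.AcceptChanges();
foreach (DataRow row in table.Rows)
    row.SetModified();
da.Update(table);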