Let's assume that I have a list of database entries and I want to duplicate them in the database. Would a regular foreach and db.Add() work, or is there an asynchronous way I should be using with db.Add()?
PS: this is not the actual code, just an example of what I'm trying to accomplish
var pencils = await db.Pencils.Where(x => x.IsBroken == false).ToListAsync();
foreach (var pencil in pencils)
{
pencil.ID = 0;
db.Add(pencil);
}
await db.SaveChangesAsync();
If your intention is to duplicate those pencils, your code is slightly flawed. You are loading tracked entities, and I assume that by setting the ID to 0 you want to insert new rows, letting an identity column assign new IDs. EF will throw an exception if you try to change a tracked entity's PK.
Instead:
var pencils = await db.Pencils.AsNoTracking()
.Where(x => x.IsBroken == false)
.ToListAsync();
db.AddRange(pencils);
await db.SaveChangesAsync();
AsNoTracking ensures the DbContext does not track the loaded entities. This means that if we use Add, or in this case AddRange to add them all at once, then provided those entities are declared with an identity-based PK (Pencil.Id), EF treats them as new entities and the underlying database assigns new PKs when they are saved.
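For reference, here is a minimal sketch of the entity shape this relies on; the question calls its code an example, so this class is an assumption:

public class Pencil
{
    // Conventional int PK named Id: EF maps this to an identity column,
    // so the database generates a new value for each inserted row.
    public int Id { get; set; }

    public bool IsBroken { get; set; }
}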
As mentioned in the comments, awaiting an operation does not block the thread waiting for the operation. Blocking would be more along the lines of doing:
db.SaveChangesAsync().Wait();
... which would block the current thread until the SaveChanges completed.
await marks a continuation point so that the caller of the encompassing async operation can continue; the resumption point will be picked up and executed at some point after the operation completes. So, for instance, in a web action handler, the thread that ASP.NET Core or IIS allocated to the request is free to pick up and start processing a new request while this runs in the background. The request will get its response populated and sent back after completion.
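For illustration, a minimal sketch of the two styles in an ASP.NET Core action; the handler names and the db field are assumptions, not from the question:

// Blocking: the request thread sits idle until SaveChanges completes.
public IActionResult SaveBlocking()
{
    db.SaveChangesAsync().Wait();
    return Ok();
}

// Awaiting: the request thread is released during the database call,
// and the framework resumes the method once SaveChanges completes.
public async Task<IActionResult> Save()
{
    await db.SaveChangesAsync();
    return Ok();
}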
Related
I have a large number of entities, for example 10k, that have all been updated. I need a fast way to change their state and ultimately commit them to the database.
BaseOtiveContext.Entry(Entity).State = EntityState.Modified;
For bulk inserts I can easily achieve very fast results with AddRange.
BaseOtiveContext.Contacts.AddRange(Entities);
I tried Parallel.ForEach for the updated entities, but it throws an error because the objects in the collection are modified.
I need a way to do this:
BaseOtiveContext.Contacts.AddRange(Entities);
but at the same time change the state to
.State = EntityState.Modified;
Solved my issues the following way:
using (DbContextTransaction txUpdate = dbUpdate.Database.BeginTransaction())
{
    // Turn off automatic change detection while flipping states in bulk;
    // otherwise EF re-scans the tracked graph on every state change.
    dbUpdate.Configuration.AutoDetectChangesEnabled = false;
    foreach (var item in UpdateItems)
    {
        dbUpdate.Entry<V2Contact>(item).State = EntityState.Modified;
    }
    dbUpdate.Configuration.AutoDetectChangesEnabled = true;
    // One explicit scan, then persist everything inside the transaction.
    dbUpdate.ChangeTracker.DetectChanges();
    dbUpdate.SaveChanges();
    txUpdate.Commit();
}
The first step is to disable change tracking, then change the state of each object to Modified. The state changes happen quickly because change tracking is disabled. Once done, turn tracking back on, call DetectChanges, then save and commit.
The optimal number of parallel threads (N) depends mostly on your CPU. Split the collection of objects into N groups so that each thread processes only the objects belonging to it. When all threads complete, call SaveChanges and commit.
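A rough sketch of that partitioning idea, assuming EF6 and one context per worker (DbContext is not thread safe); the context type and the Entities collection are borrowed from the question, the rest is illustrative:

int n = Environment.ProcessorCount; // N workers; tune for your CPU
var groups = Entities
    .Select((entity, index) => new { entity, index })
    .GroupBy(x => x.index % n, x => x.entity)
    .ToList();

Parallel.ForEach(groups, group =>
{
    // Each worker owns its own context; each entity belongs to exactly one group.
    using (var db = new BaseOtiveContext())
    {
        db.Configuration.AutoDetectChangesEnabled = false;
        foreach (var entity in group)
        {
            db.Entry(entity).State = EntityState.Modified;
        }
        db.SaveChanges(); // commits this group's updates
    }
});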
I want to know what are the practical differences of executing a transaction in the same database context between these 3 ways:
1) Multiple operations with one single SaveChanges(), without explicitly using a sql transaction
using (TestDbContext db = new TestDbContext())
{
// first operation
// second operation
db.SaveChanges();
}
2) Multiple operations with one single SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
// operation 1
// operation 2
db.SaveChanges();
trans.Commit();
}
3) Multiple operations with multiple SaveChanges(), using a sql transaction
using (TestDbContext db = new TestDbContext())
using (DbContextTransaction trans = db.Database.BeginTransaction())
{
// operation 1
db.SaveChanges();
// operation 2
db.SaveChanges();
trans.Commit();
}
In (2) and (3), if Commit() is what actually applies the SQL to the database, is there really a difference between saving changes per operation and saving changes for all operations at once?
And if (1) also allows multiple operations to be safely executed in the same database context, what's the main use of manually starting a transaction? I'd say we can provide a try/catch block to roll back the transaction if something bad happens, but AFAIK SaveChanges() already covers that automatically, at least with SQL Server.
UPDATE: Another thing: should I make the db context and transaction variables class-level, or should they be local to the containing methods only?
If you do not start a transaction, each SaveChanges() runs in its own implicit transaction, meaning the changes are visible in the database immediately after the call.
If you start a transaction, SaveChanges() still performs the updates, but the data is not available to other connections until a commit is called.
You can test this yourself by setting breakpoints, creating new objects, adding them to the context, and performing a SaveChanges(). You will see that the ID property has a value after that call, but there is no corresponding row in the database until you commit the transaction.
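A minimal sketch of that experiment; the Contact entity and its properties are assumed for illustration:

using (var db = new TestDbContext())
using (var trans = db.Database.BeginTransaction())
{
    var item = new Contact { Name = "test" };
    db.Contacts.Add(item);

    db.SaveChanges();
    // item.Id is now populated by the database...
    Console.WriteLine(item.Id);

    // ...but other connections cannot see the row yet.
    trans.Commit(); // now the row is visible outside this transaction
}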
As far as your second question goes, it really depends on concurrency needs, what your class is doing and how much data you're working with. It's not so much a scoping issue as it is a code execution issue.
Contexts are not thread safe, so as long as only one thread in your application accesses the context, you can give it a broader scope. But then, if other instances of the application are accessing the data, you're going to have to make sure you refresh to the latest data. You should also consider that the more of the model you have loaded into memory, the slower saves will become over time.
I tend to create my contexts as close to the operations that are to be performed as possible, and dispose them soon after.
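For example, a hedged sketch of that short-lived pattern; the entity names are borrowed from the first question and purely illustrative:

public void MarkBroken(int pencilId)
{
    // The context lives only for the duration of this one operation.
    using (var db = new TestDbContext())
    {
        var pencil = db.Pencils.Find(pencilId); // null check omitted for brevity
        pencil.IsBroken = true;
        db.SaveChanges();
    } // disposed here; nothing stays tracked in memory between operations
}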
Your question doesn't really seem to be about Entity Framework at all; it's more about SQL transactions. A SQL transaction is a single 'atomic' change: either all the changes are committed, or none are.
You don't really have an example which covers this scenario, but consider another example like:
using (TestDbContext db = new TestDbContext())
{
// operation 1
db.SaveChanges();
// operation 2
db.SaveChanges();
}
...in this example, if your first operation saved successfully but the second failed, you could end up with data committed at the first step that is invalid without the second.
That's why you would use a SQL transaction: to wrap both SaveChanges calls into a single operation, so that either all data is committed, or none is.
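A sketch of that pattern with an explicit rollback (disposing an uncommitted transaction rolls back as well):

using (var db = new TestDbContext())
using (var trans = db.Database.BeginTransaction())
{
    try
    {
        // operation 1
        db.SaveChanges();
        // operation 2
        db.SaveChanges();
        trans.Commit(); // both operations become visible together
    }
    catch
    {
        trans.Rollback(); // neither operation is committed
        throw;
    }
}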
I am writing a very, very simple query which just gets a document from a collection according to its unique Id. After some frustration (I am new to Mongo and the async/await programming model), I figured this out:
IMongoCollection<TModel> collection = // ...
FindOptions<TModel> options = new FindOptions<TModel> { Limit = 1 };
IAsyncCursor<TModel> cursor = await collection.FindAsync(x => x.Id.Equals(id), options);
List<TModel> list = await cursor.ToListAsync();
TModel result = list.FirstOrDefault();
return result;
It works, great! But I keep seeing references to a "Find" method, and I worked this out:
IMongoCollection<TModel> collection = // ...
IFindFluent<TModel, TModel> findFluent = collection.Find(x => x.Id == id);
findFluent = findFluent.Limit(1);
TModel result = await findFluent.FirstOrDefaultAsync();
return result;
As it turns out, this too works, great!
I'm sure that there's some important reason that we have two different ways to achieve these results. What is the difference between these methodologies, and why should I choose one over the other?
The difference is in the syntax.
Find and FindAsync both let you build an asynchronous query with the same performance; the distinction is in how you consume the results.
FindAsync returns a cursor which doesn't load all documents at once and gives you an interface for retrieving documents batch by batch from the DB cursor. That's helpful when the query result is huge.
Find gives you a simpler syntax through the ToListAsync method, which internally drains the cursor and returns all documents at once.
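To make the distinction concrete, here is a hedged sketch of consuming the cursor batch by batch versus materializing everything, reusing the collection and filter from the question:

// FindAsync: stream results through the cursor, batch by batch.
using (IAsyncCursor<TModel> cursor = await collection.FindAsync(x => x.Id.Equals(id)))
{
    while (await cursor.MoveNextAsync())
    {
        foreach (TModel doc in cursor.Current)
        {
            // process one batch's documents without loading the full result set
        }
    }
}

// Find: simpler fluent syntax that pulls everything into memory at once.
List<TModel> all = await collection.Find(x => x.Id.Equals(id)).ToListAsync();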
Imagine that you execute this code in a web request. With a synchronous find call, the request thread is frozen until the database returns results. If it's a long database operation that takes seconds to complete, one of the threads available to serve web requests sits doing nothing, simply waiting for the database to return results and wasting valuable resources (the number of threads in the thread pool is limited).
With FindAsync, the thread of your web request is freed while waiting for the database to return results, which means that during the database call that thread can serve another web request. When the database returns the result, execution continues.
For long operations like file system reads/writes, database operations, and communication with other services, it's a good idea to use async calls, because while you are waiting for the results the threads are available to serve other web requests. This is more scalable.
Take a look at this Microsoft article: https://msdn.microsoft.com/en-us/magazine/dn802603.aspx.
Working with the Azure Storage Client library 2.1, I'm trying to make a query of Table storage asynchronous. I created this code:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
var theQuery = _table.CreateQuery<TAzureTableEntity>()
.Where(tEnt => tEnt.PartitionKey == partitionKey);
TableQuerySegment<TAzureTableEntity> querySegment = null;
var returnList = new List<TAzureTableEntity>();
while(querySegment == null || querySegment.ContinuationToken != null)
{
querySegment = await theQuery.AsTableQuery()
.ExecuteSegmentedAsync(querySegment != null ?
querySegment.ContinuationToken : null);
returnList.AddRange(querySegment);
}
return returnList;
}
Let's assume there is a large set of data coming back so there will be a lot of round trips to Table Storage. The problem I have is that we're awaiting a set of data, adding it to an in-memory list, awaiting more data, adding it to the same list, awaiting yet more data, adding it to the list... and so on and so forth. Why not just wrap a Task.Factory.StartNew() around a regular TableQuery? Like so:
public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
var returnList = await Task.Factory.StartNew(() =>
table.CreateQuery<TAzureTableEntity>()
.Where(ent => ent.PartitionKey == partitionKey)
.ToList());
return returnList;
}
Doing it this way seems like we're not bouncing the SynchronizationContext back and forth so much. Or does it really matter?
Edit to Rephrase Question
What's the difference between the two scenarios mentioned above?
The difference between the two is that your second version will block a ThreadPool thread for the whole time the query is executing. This might be acceptable in a GUI application (where all you want is to execute the code somewhere other than the UI thread), but it will negate any scalability advantages of async in a server application.
Also, if you don't want your first version to return to the UI context for each roundtrip (which is a reasonable requirement), then use ConfigureAwait(false) whenever you use await:
querySegment = await theQuery.AsTableQuery()
.ExecuteSegmentedAsync(…)
.ConfigureAwait(false);
This way, all iterations after the first one will (most likely) execute on a ThreadPool thread and not on the UI context.
BTW, in your second version you don't actually need await at all; you could just return the Task directly:
public Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
return Task.Run(() => table.CreateQuery<TAzureTableEntity>()
.Where(ent => ent.PartitionKey == partitionKey)
.ToList());
}
Not sure if this is the answer you're looking for but I still want to mention it :).
As you may already know, the 2nd method (the Task.Factory.StartNew one) handles continuation tokens internally and returns only when all entities have been fetched, whereas the 1st method fetches a set of entities (up to a maximum of 1000) per call and then returns, giving you the result set as well as a continuation token.
If you're interested in fetching all entities from a table, both methods can be used; however, the 1st one gives you the flexibility of breaking out of the loop gracefully at any time, which you don't get with the 2nd one. So using the 1st approach you could essentially introduce a pagination concept.
Let's assume you're building a web application which shows data from a table, and that the table contains a large number of entities (say 100,000). Using the 1st method, you can fetch just 1000 entities, return the result to the user, and, if the user wants, fetch the next set of 1000 entities and show them. You could continue doing that as long as the user wants and there's data in the table. With the 2nd method, the user would have to wait until all 100,000 entities are fetched from the table.
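Building on that, a hedged sketch of a page-at-a-time method using the same segmented API as the question's code; the method shape is an assumption:

public async Task<TableQuerySegment<TAzureTableEntity>> GetPage(
    string partitionKey, TableContinuationToken token)
{
    // Fetch one segment (up to 1000 entities) and hand back its
    // continuation token so the caller can request the next page later.
    return await _table.CreateQuery<TAzureTableEntity>()
        .Where(ent => ent.PartitionKey == partitionKey)
        .AsTableQuery()
        .ExecuteSegmentedAsync(token);
}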
I'm in the process of writing a query manager for a WinForms application that, among other things, needs to be able to deliver real-time search results to the user as they're entering a query (think Google's live results, though obviously in a thick client environment rather than the web). Since the results need to start arriving as the user types, the search will get more and more specific, so I'd like to be able to cancel a query if it's still executing while the user has entered more specific information (since the results would simply be discarded, anyway).
If this were ordinary ADO.NET, I could obviously just use the DbCommand.Cancel function and be done with it, but we're using EF4 for our data access and there doesn't appear to be an obvious way to cancel a query. Additionally, opening System.Data.Entity in Reflector and looking at EntityCommand.Cancel shows a discouragingly empty method body, despite the docs claiming that calling this would pass it on to the provider command's corresponding Cancel function.
I have considered simply letting the existing query run and spinning up a new context to execute the new search (and just disposing of the existing query once it finishes), but I don't like the idea of a single client having a multitude of open database connections running parallel queries when I'm only interested in the results of the most recent one.
All of this is leading me to believe that there's simply no way to cancel an EF query once it's been dispatched to the database, but I'm hoping that someone here might be able to point out something I've overlooked.
TL;DR version: Is it possible to cancel an EF4 query that's currently executing?
Looks like you have found a bug in EF, though if you report it to MS it will probably be treated as a documentation bug. Anyway, I don't like the idea of interacting directly with EntityCommand. Here is my example of how to kill the current query:
var thread = new Thread((param) =>
{
var currentString = param as string;
if (currentString == null)
{
// TODO OMG exception
throw new Exception();
}
AdventureWorks2008R2Entities entities = null;
try // Don't use using because it can cause race condition
{
entities = new AdventureWorks2008R2Entities();
ObjectQuery<Person> query = entities.People
.Include("Password")
.Include("PersonPhone")
.Include("EmailAddress")
.Include("BusinessEntity")
.Include("BusinessEntityContact");
// Improves performance of readonly query where
// objects do not have to be tracked by context
// Edit: But it doesn't work for this query because of includes
// query.MergeOption = MergeOption.NoTracking;
foreach (var record in query
.Where(p => p.LastName.StartsWith(currentString)))
{
// TODO fill some buffer and invoke UI update
}
}
finally
{
if (entities != null)
{
entities.Dispose();
}
}
});
thread.Start("P");
// Just for test
Thread.Sleep(500);
thread.Abort();
It is the result of my playing with it for 30 minutes, so it is probably not something which should be considered a final solution. I'm posting it to at least get some feedback on possible problems caused by this approach. The main points are:
Context is handled inside the thread
Result is not tracked by context
If you kill the thread query is terminated and context is disposed (connection released)
If you kill the thread before you start a new one, you should still be using only one connection.
I checked that query is started and terminated in SQL profiler.
Edit:
Btw, another approach to simply stopping the current query is inside the enumeration:
public IEnumerable<T> ExecuteQuery<T>(IQueryable<T> query)
{
foreach (T record in query)
{
// Handle stop condition somehow
if (ShouldStop())
{
// Once you close enumerator, query is terminated
yield break;
}
yield return record;
}
}
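For instance, ShouldStop could be backed by a CancellationTokenSource; this wiring is an assumption on my part, not part of the original answer:

private CancellationTokenSource _cts = new CancellationTokenSource();

// Hypothetical implementation of the ShouldStop() check used above.
private bool ShouldStop()
{
    return _cts.IsCancellationRequested;
}

// When the user types a more specific query, cancel the running one
// and hand a fresh token source to the next query:
// _cts.Cancel();
// _cts = new CancellationTokenSource();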