Is there a way to consume the unit of work pattern multithreaded? - c#

I'm consuming my repository services through unitOfWork throughout the app.
But when I tried to make several DB calls simultaneously, DbContext threw an exception saying:
different threads concurrently using the same instance of DbContext
That is true, since I'm making 3 DB requests at the same time:
var insertYekkaTask = unitOfWork.YekkaRepository.InsertAsync(new(data.AdvertID, data.MatchID));
var getMatchTask = matchService.GetAsync(data.MatchID);
var getAdvertTask = advertisementService.GetAsync(data.AdvertID);
await Task.WhenAll(insertYekkaTask, getMatchTask, getAdvertTask);
var yekkaId = insertYekkaTask.Result;
var match = getMatchTask.Result;
var advertisement = getAdvertTask.Result;
Now, as you can read from the title: is there a way to consume the unit of work pattern multithreaded, or in some similar way?
Thanks.

Entity Framework only supports one operation at a time per db context instance. This is actually a limitation of the on-the-wire protocol used by many (most?) database engines: while one query is running, the same connection cannot run a different query until that query completes.
So, if you want to run multiple queries concurrently, whether using multithreading or asynchronous code, you'll need multiple EF db contexts.
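As a minimal sketch of "one context per concurrent operation": FakeContext below is a hypothetical stand-in for a real EF context (it is not an EF type). It throws when two operations overlap on the same instance, which mimics the exception from the question. The createContext delegate is also an assumption; in a real application it would be whatever creates your own context type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an EF context: it throws if two operations overlap,
// which mimics the "concurrently using the same instance" error.
public sealed class FakeContext : IDisposable
{
    private int _busy;

    public async Task<string> QueryAsync(string name)
    {
        if (Interlocked.Exchange(ref _busy, 1) == 1)
            throw new InvalidOperationException("A second operation started on this context.");
        try
        {
            await Task.Delay(50); // simulated database round trip
            return name + " loaded";
        }
        finally
        {
            Volatile.Write(ref _busy, 0);
        }
    }

    public void Dispose() { }
}

public static class ConcurrentQueries
{
    // Each concurrent operation creates, uses and disposes its own context.
    private static async Task<string> LoadAsync(Func<FakeContext> createContext, string name)
    {
        using (var context = createContext())
        {
            return await context.QueryAsync(name);
        }
    }

    public static Task<string[]> LoadAllAsync(Func<FakeContext> createContext)
    {
        return Task.WhenAll(
            LoadAsync(createContext, "match"),
            LoadAsync(createContext, "advert"));
    }
}
```

Calling ConcurrentQueries.LoadAllAsync(() => new FakeContext()) completes, while passing a delegate that always returns the same instance fails. Note that operations running on separate contexts no longer share one change tracker or one transaction, so they are not a single unit of work anymore.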

Related

How to share DbContext between multiple threads

I'm trying to implement a bulk insert and delete operation based on some specifications, i.e. an IList<Specification> provided by the user. Roughly 1 million records need to be queried, and based on them I need to insert or delete records to/from another table, the UserSpecifications table, which has only two columns, UserId and SpecificationId; the PK is UserId + SpecificationId.
The order in which the records get inserted into or deleted from the UserSpecifications table doesn't matter.
Hence I'm trying to create parallel tasks for each user specification and process the data into UserSpecifications:
await Task.Run(() => Parallel.ForEach(specifications, async specification =>
{
    await insertBulkRecordsAsync(specification); // this uses SqlBulkCopy for data insertion with transaction scope, on 1 million records
    await deleteRecordsAsync(specification); // this uses the https://entityframework-plus.net/batch-delete operation
}));
Both insertBulkRecordsAsync and deleteRecordsAsync use userContext.
When I try to run this piece of code I get the following exception:
The context cannot be used while the model is being created. This exception may be thrown if the context is used inside the OnModelCreating method or if the same context instance is accessed by multiple threads concurrently. Note that instance members of DbContext and related classes are not guaranteed to be thread-safe.
I know that EF 6 doesn't support sharing a dbContext between threads. If I run a normal foreach loop things do work, but it takes a lot of time to process the updates.
The following works without any issue:
foreach (var specification in specifications)
{
    await insertBulkRecordsAsync(specification);
    await deleteRecordsAsync(specification);
}
So is there any way to share the context and reduce the execution time?

How to cache DataContext instances in a consumer type application?

We have an application using SDK provided by our provider to integrate easily with them. This SDK connects to AMQP endpoint and simply distributes, caches and transforms messages to our consumers. Previously this integration was over HTTP with XML as a data source and old integration had two ways of caching DataContext - per web request and per managed thread id. (1)
Now, however, we integrate over AMQP rather than HTTP. This is transparent to us, since the SDK does all the connection logic and we are only left with defining our consumers. So there is no option to cache DataContext "per web request", and only per managed thread id is left.
I implemented the chain of responsibility pattern, so when an update comes to us it is put into a pipeline of handlers which uses DataContext to update the database according to the new updates. This is what the pipeline's Invoke method looks like:
public Task Invoke(TInput entity)
{
    object currentInputArgument = entity;
    for (var i = 0; i < _pipeline.Count; ++i)
    {
        var action = _pipeline[i];
        if (action.Method.ReturnType.IsSubclassOf(typeof(Task)))
        {
            if (action.Method.ReturnType.IsConstructedGenericType)
            {
                dynamic tmp = action.DynamicInvoke(currentInputArgument);
                currentInputArgument = tmp.GetAwaiter().GetResult();
            }
            else
            {
                (action.DynamicInvoke(currentInputArgument) as Task).GetAwaiter().GetResult();
            }
        }
        else
        {
            currentInputArgument = action.DynamicInvoke(currentInputArgument);
        }
    }
    return Task.CompletedTask;
}
The problem is (at least what I think it is) that this chain of responsibility is a chain of methods returning/starting new tasks. So when an update for entity A comes in, it is handled by, let's say, managed thread id = 1, and some time later the same entity A arrives again, only to be handled by managed thread id = 2, for example. This leads to:
System.InvalidOperationException: 'An entity object cannot be referenced by multiple instances of IEntityChangeTracker.'
because the DataContext from managed thread id = 1 already tracks entity A (at least that's what I think it is).
My question is: how can I cache DataContext in my case? Did you guys have the same problem? I read this and this answer, and from what I understood, using one static DataContext is not an option either. (2)
Disclaimer: I should have said that we inherited the application and I cannot answer why it was implemented like that.
Disclaimer 2: I have little to no experience with EF.
Community asked questions:
What version of EF we are using? 5.0
Why do entities live longer than the context? - They don't but maybe you are asking why entities need to live longer than the context. I use repositories that use cached DataContext to get entities from the database to store them in an in-memory collection which I use as a cache.
This is how entities are "extracted", where DatabaseDataContext is the cached DataContext I am talking about (BLOB with whole database sets inside)
protected IQueryable<T> Get<TProperty>(params Expression<Func<T, TProperty>>[] includes)
{
    var query = DatabaseDataContext.Set<T>().AsQueryable();
    if (includes != null && includes.Length > 0)
    {
        foreach (var item in includes)
        {
            query = query.Include(item);
        }
    }
    return query;
}
Then, whenever my consumer application receives an AMQP message, my chain of responsibility begins by checking whether I have already processed this message and its data. So I have a method that looks like this:
public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
    where TEntity : ISportEvent
{
    ... some unimportant business logic
    //save the sport
    if (sport.SportID > 0) // <-- this here basically checks if so called
                           // sport is found in cache or not
                           // if its found then we update the entity in the db
                           // and update the cache after that
    {
        _sportRepository.Update(sport); /*
            * because message update for the same sport can come
            * and since DataContext is cached by threadId like I said
            * and Update can be executed from different threads
            * this is where aforementioned exception is thrown
            */
    }
    else // if not simply insert the entity in the db and the caches
    {
        _sportRepository.Insert(sport);
    }
    _sportRepository.SaveDbChanges();
    ... updating caches logic
}
I thought that getting entities from the database with AsNoTracking() method or detaching entities every time I "update" or "insert" entity will solve this, but it did not.
Whilst there is a certain overhead to newing up a DbContext, and using DI to share a single instance of a DbContext within a web request can save some of this overhead, simple CRUD operations can just new up a new DbContext for each action.
Looking at the code you have posted so far, I would probably have a private instance of the DbContext newed up in the Repository constructor, and then new up a Repository for each method.
Then your method would look something like this:
public async Task<TEntity> Handle<TEntity>(TEntity sportEvent)
    where TEntity : ISportEvent
{
    // a new repository, and therefore a new DbContext, for each call
    var sportsRepository = new SportsRepository();
    ... some unimportant business logic
    //save the sport
    if (sport.SportID > 0)
    {
        sportsRepository.Update(sport);
    }
    else
    {
        sportsRepository.Insert(sport);
    }
    sportsRepository.SaveDbChanges();
}
public class SportsRepository
{
    private DbContext _dbContext;
    public SportsRepository()
    {
        // stands in for your own derived context type
        _dbContext = new DbContext();
    }
}
You might also want to consider the use of Stub Entities as a way around sharing a DbContext with other repository classes.
Since this is about some existing business application I will focus on ideas that can help solve the issue rather than lecture about best practices or propose architectural changes.
I know this is kind of obvious but sometimes rewording error messages helps us better understand what's going on so bear with me.
The error message says that an entity is being referenced by more than one change tracker, which means there are multiple DbContext instances and that the same entities are referenced by more than one of them.
The question also states that there is a data context per thread (it used to be per HTTP request) and that entities are cached.
So it seems safe to assume entities are read from a db context upon a cache miss and returned from the cache on a hit. Attempting to update entities loaded from one db context instance using a second db context instance causes the failure. We can conclude that in this case the exact same entity instance was used in both operations and that no serialization/deserialization is in place for accessing the cache.
DbContext instances are themselves entity caches through their internal change tracker, and this error is a safeguard protecting its integrity. Since the idea is to have a long-running process handling simultaneous requests through multiple db contexts (one per thread) plus a shared entity cache, it would be very beneficial performance-wise and memory-wise (change tracking will likely increase memory consumption over time) to either change the db contexts' lifecycle to be per message or empty their change tracker after each message is processed.
Of course, in order to process entity updates, the entities need to be attached to the current db context right after they are retrieved from the cache and before any changes are applied to them.
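One way to make sure the exact same entity instance is never handed to two contexts is to have the cache store and return copies. This is only a sketch of that idea, not the application's actual cache: Sport is reduced to two hypothetical properties and SportCache is an invented name. Real entities with navigation properties would need those copied (or deliberately left out) as well.

```csharp
using System.Collections.Concurrent;

// Hypothetical entity; only the fields needed for the illustration.
public sealed class Sport
{
    public int SportID { get; set; }
    public string Name { get; set; }
}

// Cache that never hands out an instance a context may still be tracking.
public sealed class SportCache
{
    private readonly ConcurrentDictionary<int, Sport> _items =
        new ConcurrentDictionary<int, Sport>();

    public void Put(Sport sport)
    {
        _items[sport.SportID] = Copy(sport); // store a detached copy
    }

    public bool TryGet(int sportId, out Sport copy)
    {
        Sport cached;
        if (_items.TryGetValue(sportId, out cached))
        {
            copy = Copy(cached); // each caller gets its own instance
            return true;
        }
        copy = null;
        return false;
    }

    private static Sport Copy(Sport source)
    {
        return new Sport { SportID = source.SportID, Name = source.Name };
    }
}
```

A copy obtained this way is not tracked by any context, so it still has to be attached to the context that handles the current message before it is modified and saved.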

How to... Display Data from Database

I've got an Application which consists of 2 parts at the moment
A Viewer that receives data from a database using EF
A Service that manipulates data from the database at runtime.
The logic behind the scenes includes some projects such as repositories - data access is realized with a unit of work. The Viewer itself is a WPF-Form with an underlying ViewModel.
The ViewModel contains an ObservableCollection which is the datasource of my Viewer.
Now the question is - How am I able to retrieve the database-data every few minutes? I'm aware of the following two problems:
It's not the latest data my Repository is "loading" - does EF do "smart" stuff and retrieve data from the local cache? If so, how can I force EF to load the data from the database?
Re-setting the whole ObservableCollection or adding / removing entities from another thread / BackgroundWorker (with invocation) is not possible. How am I supposed to solve this?
I will add some of my code if needed but at the moment I don't think that this would help at all.
Edit:
public IEnumerable<Request> GetAllUnResolvedRequests() {
    return AccessContext.Requests.Where(o => !o.IsResolved);
}
This piece of code won't get the latest data - I edit some rows manually (set IsResolved to true) but this method retrieves it nevertheless.
Edit2:
Edit3:
var requests = AccessContext.Requests.Where(o => o.Date >= fromDate && o.Date <= toDate).ToList();
foreach (var request in requests) {
    AccessContext.Entry(request).Reload();
}
return requests;
Final Question:
The code above "solves" the problem - but in my opinion it's not clean. Is there another way?
When you load an entity from the database, the entity is cached (and tracked, so the context can detect the changes your application makes, unless you specify AsNoTracking).
This has some drawbacks (for example, performance problems as the cache grows, or, as in your case, seeing an old version of entities).
For these reasons, when using EF you should work with the Unit of Work pattern (i.e. you should create a new context for every unit of work).
You can have a look at this Microsoft article to understand how to implement the Unit of Work pattern.
http://www.asp.net/mvc/overview/older-versions/getting-started-with-ef-5-using-mvc-4/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
In your case using Reload is not a good choice because it does not scale: every Reload issues a query to the database. If you just need to return the desired entities, the best way is to create a new context.
public IEnumerable<Request> GetAllUnResolvedRequests()
{
    return GetNewContext().Requests.Where(o => !o.IsResolved).ToList();
}
Here is what you can do.
You can define a Task (which keeps running on the ThreadPool) that periodically checks the database (consider that periodically making EF reload data has its own cost).
You can also define a SQL dependency on your query so that, when the data changes, the main thread is notified.
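The periodic check could be sketched roughly as follows. Poller, loadWithFreshContext and publish are hypothetical names: the first delegate is assumed to open a new context, run the query and return a materialized list, and the second is assumed to copy that list into the ObservableCollection.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    public static async Task PollAsync<T>(
        Func<IReadOnlyList<T>> loadWithFreshContext, // hypothetical: new context + query + ToList
        Action<IReadOnlyList<T>> publish,            // hypothetical: updates the bound collection
        TimeSpan interval,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Run the database query on a thread-pool thread.
            var snapshot = await Task.Run(() => loadWithFreshContext());
            publish(snapshot);

            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // polling was cancelled while waiting
            }
        }
    }
}
```

If PollAsync is started from the WPF UI thread, the code after each await resumes on that thread by default, so publish can modify the ObservableCollection without an explicit dispatcher call. Started from any other thread, that guarantee does not hold.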

Superimpose async webrequest with database retrieval

In the system I'm currently building I have to make web requests to an API which provides a calculation service. This service requires a set of complex parameters which I have to retrieve from my database. Currently I'm using Entity Framework to retrieve these entities, and for each entity I make a request to this API, retrieve the result and, at the end, save all results to the database (everything done synchronously).
There will be scaling issues with this approach when the set of entities grows (since I have to call the calculation service every 30 minutes for each entity). Because of this I would like to run the database retrieval and web request for an entity in parallel (or async) with the same operations for other entities. (Not with the purpose of reducing the data loading time, but to do work while waiting for the web request to complete.)
Since the EF 5 context is not thread safe, what are my best alternatives for achieving this? Should I write specific SQL queries, use LINQ etc.? Does anyone have code examples for a similar approach (db retrieval for web requests in parallel)?
EDIT
Adding a small code sample (very simplified). Assuming that the call to the webservice may take a couple of seconds this will not scale.
foreach (var entityId in entityIds)
{
    var entity = _repository.Find(entityId);
    _repository.LoadData(entity);
    _validator.ValidateData(entity);
    var result = _webservice.Call(entity);
    entity.State = result.State;
}
_repository.SaveChanges();
What you could do is to use a producer-consumer architecture: One thread accesses the database and adds the data to something like BlockingCollection. Another thread (or multiple threads) reads the data from the collection and performs the web request.
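A rough sketch of that producer-consumer layout, using BlockingCollection from the base class library. The loadEntity and callService delegates are hypothetical stand-ins for the repository read and the web service call, and entities are represented as plain strings to keep the example self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class Pipeline
{
    // loadEntity stands in for the single-threaded database read;
    // callService stands in for the slow web request. Both are hypothetical.
    public static IDictionary<int, string> Run(
        IEnumerable<int> entityIds,
        Func<int, string> loadEntity,
        Func<string, string> callService,
        int consumerCount = 4)
    {
        var queue = new BlockingCollection<KeyValuePair<int, string>>(boundedCapacity: 100);
        var results = new ConcurrentDictionary<int, string>();

        // Producer: the only thread that touches the database context.
        var producer = Task.Run(() =>
        {
            try
            {
                foreach (var id in entityIds)
                    queue.Add(new KeyValuePair<int, string>(id, loadEntity(id)));
            }
            finally
            {
                queue.CompleteAdding(); // lets the consumers finish
            }
        });

        // Consumers: perform the web requests in parallel.
        var consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                    results[item.Key] = callService(item.Value);
            }))
            .ToArray();

        producer.Wait();
        Task.WaitAll(consumers);
        return results; // write these back through one context on one thread
    }
}
```

The bounded capacity keeps the producer from loading the whole table into memory while the web requests lag behind. The results are collected separately so that the final save can again happen on a single thread with a single context.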
There are different ways for you to parallelize this. It all depends on what you really want/need.
One way would be to use Parallel.ForEach.
Parallel.ForEach(
    entityIds,
    entityId =>
    {
        var entity = _repository.Find(entityId);
        _repository.LoadData(entity);
        _validator.ValidateData(entity);
        var result = _webservice.Call(entity);
        entity.State = result.State;
    });
_repository.SaveChanges();
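Because the question notes that the EF context is not thread safe, a variation worth considering is the Parallel.ForEach overload with localInit and localFinally, which gives each worker task its own context. This is a sketch under assumptions: createContext and processEntity are hypothetical delegates, and TContext is whatever disposable context or repository type the application uses.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelWork
{
    public static void ForEachWithOwnContext<TContext>(
        IEnumerable<int> entityIds,
        Func<TContext> createContext,          // hypothetical factory, called once per worker task
        Action<TContext, int> processEntity)   // hypothetical: load, call the service, save
        where TContext : IDisposable
    {
        Parallel.ForEach<int, TContext>(
            entityIds,
            createContext,
            (entityId, loopState, context) =>
            {
                processEntity(context, entityId);
                return context; // hand the same context to this task's next iteration
            },
            context => context.Dispose());
    }
}
```

With this layout each context tracks only the entities it loaded, so the save has to happen per context (inside processEntity here) rather than once on a shared repository after the loop.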

Linq to SQL - How should I manage database requests?

I have studied a bit into the lifespan of the DataContext trying to work out what is the best possible way of doing things.
Given I want to re-use my DAL in a web application I decided to go with the DataContext Per Business Object Request approach.
My idea was to extend my L2S entities from the dbml file to retrieve information from the database, creating a separate context per request, e.g.
public partial class AnEntity
{
    public IEnumerable<RelatedEntity> GetRelatedEntities()
    {
        using (var dc = new MyDataContext())
        {
            return dc.RelatedEntities.Where(r => r.EntityID == this.ID);
        }
    }
}
In terms of returning the entities... do I need to return POCOs at this point, or is it OK to simply return the business object returned from the query? I understand that if I were to try to access properties of the returned entity (after the DataContext has been disposed) it would fail. However, this is the reason I have decided to implement this type of method, e.g.
Instead of:
AnEntity entity = null;
using (var repo = new EntityRepo())
{
    entity = repo.GetEntity(12345);
}
var related = entity.RelatedEntities; // this would cause an exception
In theory I should be able to do:
AnEntity entity = null;
using (var repo = new EntityRepo())
{
    entity = repo.GetEntity(12345);
}
var related = entity.GetRelatedEntities();
Given the circumstances of my particular app (needs to work in a windows service & web application) I would like to know if this seems a plausible approach, whether there are obvious flaws and if there are better approaches for what it is I am trying to do.
Thanks.
Generally speaking, as long as you are not calling a single DataContext object using more than one thread, you should be OK. In other words, use one DataContext object per thread, and do not share data or state between different DataContext objects.
The remaining multi-threaded issues have to do with concurrency in the database, not threading operations.
Other than these caveats, your approach seems sound. You can either use partial classes to implement your business methods, or you can add a business layer between the Linq to SQL classes and your repository.
You can get away with this:
var repo = new EntityRepo();
var entity = repo.GetEntity(12345);
var related = entity.RelatedEntities;
See this StackOverflow post for an explanation of why not disposing your context doesn't cause connection leaks or similar problems.
The repository and context will get cleaned up by the garbage collector when the entities that were fetched by them fall out of scope (when building a website, at the end of the request).
Edit: MSDN documents that connections are opened as late as possible and closed as soon as possible. So skipping the using doesn't cause connection pool problems.
