Superimpose async webrequest with database retrieval - c#

In the system I'm currently building, I have to make web requests to an API that provides a calculation service. This service requires a set of complex parameters, which I have to retrieve from my database. Currently I'm using Entity Framework to retrieve these entities; for each entity I make a request to the API, retrieve the result, and at the end save all results to the database (everything is done synchronously).
This approach will have scaling issues as the set of entities grows (since I have to call the calculation service every 30 minutes for each entity). Because of this, I would like to run the database retrieval and web request for one entity in parallel (or asynchronously) with the same operations for other entities. (The purpose is not to reduce data-loading time but to do useful work while waiting for the web request to complete.)
Since the EF 5 context is not thread-safe, what are my best alternatives for achieving this? Should I write specific SQL queries, use LINQ, etc.? Does anyone have code examples for a similar approach (database retrieval for web requests in parallel)?
EDIT
Adding a small code sample (very simplified). Assuming that the call to the webservice may take a couple of seconds this will not scale.
foreach (var entityId in entityIds)
{
    var entity = _repository.Find(entityId);
    _repository.LoadData(entity);
    _validator.ValidateData(entity);
    // Blocking call; may take a couple of seconds per entity
    var result = _webservice.Call(entity);
    entity.State = result.State;
}
_repository.SaveChanges();

What you could do is to use a producer-consumer architecture: One thread accesses the database and adds the data to something like BlockingCollection. Another thread (or multiple threads) reads the data from the collection and performs the web request.
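The producer-consumer idea can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the Entity class, LoadEntity and CallService are hypothetical stand-ins for the real repository and web service, and the choice of four consumers and a queue capacity of ten is arbitrary. The point it tries to show is that only one thread touches the database, so a single context is never shared between threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an entity loaded from the database.
public sealed class Entity
{
    public int Id { get; set; }
    public string State { get; set; } = "";
}

public static class Program
{
    // Stand-in for the database load. Only the producer calls it,
    // so a single non-thread-safe context would be sufficient here.
    static Entity LoadEntity(int id) => new Entity { Id = id };

    // Stand-in for the slow calculation service.
    static string CallService(Entity entity)
    {
        Thread.Sleep(50); // simulate network latency
        return "calculated-" + entity.Id;
    }

    public static void Main()
    {
        var entityIds = Enumerable.Range(1, 20).ToList();

        // Bounded so the producer cannot run arbitrarily far ahead.
        var queue = new BlockingCollection<Entity>(boundedCapacity: 10);
        var results = new ConcurrentBag<(Entity Entity, string State)>();

        // Producer: the only code that reads from the "database".
        var producer = Task.Run(() =>
        {
            try
            {
                foreach (var id in entityIds)
                    queue.Add(LoadEntity(id));
            }
            finally
            {
                queue.CompleteAdding(); // lets the consumers finish
            }
        });

        // Consumers: perform the web requests concurrently.
        var consumers = Enumerable.Range(0, 4).Select(_ => Task.Run(() =>
        {
            foreach (var entity in queue.GetConsumingEnumerable())
                results.Add((entity, CallService(entity)));
        })).ToArray();

        Task.WaitAll(consumers);
        producer.Wait();

        // Back on a single thread: apply the results, then save once.
        foreach (var (entity, state) in results)
            entity.State = state;

        Console.WriteLine($"Processed {results.Count} entities.");
    }
}
```

In a real application, the final loop is where the single SaveChanges call would go, after all web requests have completed.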

There are different ways for you to parallelize this. It all depends on what you really want/need.
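One way would be to use Parallel.ForEach. Note that this is only safe if _repository and _webservice can be used from several threads at once; a single shared EF context cannot, so each iteration would need its own context.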
Parallel.ForEach(
    entityIds,
    entityId =>
    {
        var entity = _repository.Find(entityId);
        _repository.LoadData(entity);
        _validator.ValidateData(entity);
        var result = _webservice.Call(entity);
        entity.State = result.State;
    });
_repository.SaveChanges();

Related

Is there a way to consume the unit of work pattern multithreaded?

I'm consuming my repository services through unitOfWork throughout the app.
But when I tried to make several DB calls simultaneously, DbContext threw an exception, saying:
different threads concurrently using the same instance of DbContext
that is true since I'm making 3 DB requests at the same time:
var insertYekkaTask = unitOfWork.YekkaRepository.InsertAsync(new(data.AdvertID, data.MatchID));
var getMatchTask = matchService.GetAsync(data.MatchID);
var getAdvertTask = advertisementService.GetAsync(data.AdvertID);
await Task.WhenAll(insertYekkaTask, getMatchTask, getAdvertTask);
var yekkaId = insertYekkaTask.Result;
var match = getMatchTask.Result;
var advertisement = getAdvertTask.Result;
Now, as you can read from the title: is there a way to consume the unit of work pattern multithreaded, or in some similar way?
Thanks.
Entity Framework has the limitation of only being used for one request at a time (per db context). This is actually a limitation of the on-the-wire protocol used by many (most?) database engines: when one query is running, the same connection cannot run a different query until that query completes.
So, if you want to do multiple concurrent queries - whether using multithreading or asynchronous code - you'll need multiple EF db contexts.
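As a rough sketch of "one context per concurrent operation": the FakeContext class below is a hypothetical stand-in for a DbContext (it simply throws if two operations overlap on the same instance), and RunInOwnContextAsync and the factory delegate are illustrative names, not part of EF. With a real unit of work, the factory would create a fresh context or unit-of-work instance instead.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a DbContext: it throws if two operations
// overlap on the same instance, which is the failure described above.
public sealed class FakeContext : IDisposable
{
    private int _busy;

    public async Task<string> QueryAsync(string name)
    {
        if (Interlocked.Exchange(ref _busy, 1) == 1)
            throw new InvalidOperationException("Context used concurrently.");
        try
        {
            await Task.Delay(50); // simulate the database round trip
            return name + " loaded";
        }
        finally
        {
            Interlocked.Exchange(ref _busy, 0);
        }
    }

    public void Dispose() { }
}

public static class Program
{
    // Each concurrent operation creates and disposes its own context.
    static async Task<string> RunInOwnContextAsync(
        Func<FakeContext> factory, string name)
    {
        using (var context = factory())
        {
            return await context.QueryAsync(name);
        }
    }

    public static async Task Main()
    {
        Func<FakeContext> factory = () => new FakeContext();

        // Both queries are in flight at once, each on its own context.
        var matchTask = RunInOwnContextAsync(factory, "match");
        var advertTask = RunInOwnContextAsync(factory, "advertisement");

        var results = await Task.WhenAll(matchTask, advertTask);
        Console.WriteLine(string.Join(", ", results));
        // prints: match loaded, advertisement loaded
    }
}
```

The trade-off is that changes made through separate contexts are not saved in one transaction unless you arrange that explicitly.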

How to... Display Data from Database

I've got an Application which consists of 2 parts at the moment
A Viewer that receives data from a database using EF
A Service that manipulates data from the database at runtime.
The logic behind the scenes includes some projects such as repositories - data access is realized with a unit of work. The Viewer itself is a WPF-Form with an underlying ViewModel.
The ViewModel contains an ObservableCollection which is the datasource of my Viewer.
Now the question is - How am I able to retrieve the database-data every few minutes? I'm aware of the following two problems:
It's not the latest data my Repository is "loading". Does EF do "smart" stuff and retrieve data from the local cache? If so, how can I force EF to load the data from the database?
Resetting the whole ObservableCollection, or adding/removing entities from another thread or BackgroundWorker (with invocation), is not possible. How am I supposed to solve this?
I will add some of my code if needed but at the moment I don't think that this would help at all.
Edit:
public IEnumerable<Request> GetAllUnResolvedRequests() {
    return AccessContext.Requests.Where(o => !o.IsResolved);
}
This piece of code won't get the latest data - I edit some rows manually (set IsResolved to true) but this method retrieves it nevertheless.
Edit2:
Edit3:
var requests = AccessContext.Requests.Where(o => o.Date >= fromDate && o.Date <= toDate).ToList();
foreach (var request in requests) {
    AccessContext.Entry(request).Reload();
}
return requests;
Final Question:
The code above "solves" the problem - but in my opinion it's not clean. Is there another way?
When you load an entity from the database, the context caches it (and tracks it to detect the changes your application makes, unless you specify AsNoTracking).
This causes some issues: for example, performance degrades as the cache grows, or you see an old version of an entity, which is your case.
For these reasons, when using EF you should work with the Unit of Work pattern (i.e. you should create a new context for every unit of work).
You can have a look at this Microsoft article to understand how to implement the Unit of Work pattern.
http://www.asp.net/mvc/overview/older-versions/getting-started-with-ef-5-using-mvc-4/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
In your case, using Reload is not a good choice because it does not scale: every reload issues a separate query to the database. If you just need to return the desired entities, the best way is to create a new context.
public IEnumerable<Request> GetAllUnResolvedRequests()
{
    return GetNewContext().Requests.Where(o => !o.IsResolved).ToList();
}
Here is what you can do.
You can define a Task (running on the ThreadPool) that periodically checks the database (keep in mind that periodically making EF reload data has its own cost).
You can also define a SQL dependency on your query, so that when the data changes, the main thread is notified.
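The periodic-check idea might look roughly like this. It is a sketch under assumptions: Poller, loadSnapshot and onSnapshot are hypothetical names, and the list built in Main stands in for a query run against a fresh context. When RunAsync is started from a WPF UI thread, the code after each await resumes on that thread, so onSnapshot can update an ObservableCollection there; in the console demo below there is no such context and the callback simply runs on a pool thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // loadSnapshot: hypothetical hook that queries the database
    //               (ideally through a new context on every call).
    // onSnapshot:   hypothetical hook that applies the data to the view model.
    public static async Task RunAsync<T>(
        Func<IReadOnlyList<T>> loadSnapshot,
        Action<IReadOnlyList<T>> onSnapshot,
        TimeSpan interval,
        CancellationToken token)
    {
        try
        {
            while (true)
            {
                token.ThrowIfCancellationRequested();

                // Run the blocking load on the thread pool.
                var snapshot = await Task.Run(loadSnapshot, token);

                onSnapshot(snapshot);

                // Wait for the next polling cycle.
                await Task.Delay(interval, token);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop polling.
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Stop polling after a short while for the sake of the demo.
        using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(350));
        var polls = 0;

        await Poller.RunAsync<int>(
            () => new List<int> { ++polls },                     // fake "query"
            snapshot => Console.WriteLine($"Poll {snapshot[0]}"),
            TimeSpan.FromMilliseconds(100),
            cts.Token);
    }
}
```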

Entity Framework 5 performance concerns

Right now I'm working on a pretty complex database. Our object model is designed to be mapped to the database. We're using EF 5 with POCO classes, written by hand.
Everything is working, but there are some complaints about performance. I've never had performance problems with EF, so I'm wondering whether this time I just did something terribly wrong, or whether the problem lies somewhere else.
The main query may be composed of dynamic parameters. I have several if and switch blocks that are conceptually like this:
if (parameter != null) { query = query.Where(c => c.Field == parameter); }
Also, for some complex And/Or combinations I'm using LinqKit extensions from Albahari.
The query is against a big table of "Orders", containing years and years of data. The average use is a 2 months range filter though.
Now when the main query is composed, it gets paginated with a Skip/Take combination, where the Take is set to 10 elements.
After all this, the IQueryable is sent through layers, reaches the MVC layer where Automapper is employed.
Here, when Automapper starts iterating (and thus the query is really executed) it calls a bunch of navigation properties, which have their own navigation properties and so on. Everything is set to Lazy Loading according to EF recommendations to avoid eager loading if you have more than 3 or 4 distinct entities to include. My scenario is something like this:
Orders (maximum 10)
    Many navigation properties under Order
        Some of these have other navigation under them (localization entities)
    Order details (many order details per order)
        Many navigation properties under each Order detail
            Some of these have other navigation under them (localization entities)
This easily leads to a total of 300+ queries for a single rendered "page". Each of those queries is very fast, running in a few milliseconds, but still there are 2 main concerns:
The lazy loaded properties are called in sequence and not parallelized, thus taking more time
As a consequence of previous point, there's some dead time between each query, as the database has to receive the sql, run it, return it and so on for each query.
Just to see how it went, I tried to make the same query with eager loading, and as I predicted it was a total disaster, with a translated SQL of more than 7K lines (yes, seven thousand) that was much slower overall.
Now I'm reluctant to think that EF and Linq are not the right choice for this scenario. Some are saying that if they were to write a stored procedure which fetches all the needed data, it would run tens of times faster. I don't believe that to be true, and we would lose the automatic materialization of all related entities.
I thought of some things I could do to improve, like:
Table splitting to reduce the selected columns
Turn off object tracking, as this scenario is read only (have untracked entities)
With all of this said, the main complaint is that the result page (done in MVC 4) renders too slowly, and after a bit of diagnostics it seems to be all "Server Time" and not "Network Time", taking about 8 to 12 seconds of server time.
From my experience, this should not be happening. I'm wondering if I'm approaching this query in the wrong way, or if I have to turn my attention to something else (maybe a badly configured IIS server, or something else I'm clueless about). Needless to say, the database indexes are fine and were checked very carefully by our DBA.
So if anyone has any tip, advice, best practice I'm missing about this, or just can tell me that I'm dead wrong in using EF with Lazy Loading for this scenario... you're all welcome.
For a very complex query that brings up tons of hierarchical data, stored procs won't generally help you performance-wise over LINQ/EF if you take the right approach. As you've noted, the two "out of the box" options with EF (lazy and eager loading) don't work well in this scenario. However, there are still several good ways to optimize this:
(1) Rather than reading a bunch of entities into memory and then mapping via automapper, do the "automapping" directly in the query where possible. For example:
var mapped = myOrdersQuery.Select(o => new OrderInfo { Order = o, DetailCount = o.Details.Count, ... })
// by deferring the load until here, we can bring only the information we actually need
// into memory with a single query
.ToList();
This approach works really well if you only need a subset of the fields in your complex hierarchy. Also, EF's ability to select hierarchical data makes this much easier than using stored procs if you need to return something more complex than flat tabular data.
(2) Run multiple LINQ queries by hand and assemble the results in memory. For example:
// read with AsNoTracking() since we'll be manually setting associations
var myOrders = myOrdersQuery.AsNoTracking().ToList();
var orderIds = myOrders.Select(o => o.Id);
var myDetails = context.Details.Where(d => orderIds.Contains(d.OrderId)).ToLookup(d => d.OrderId);
// reassemble in memory
myOrders.ForEach(o => o.Details = myDetails[o.Id].ToList());
This works really well when you need all the data and still want to take advantage of as much EF materialization as possible. Note that, in most cases a stored proc approach can do no better than this (it's working with raw SQL, so it has to run multiple tabular queries) but can't reuse logic you've already written in LINQ.
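To make the in-memory reassembly step of approach (2) concrete, here is a small self-contained sketch that uses plain lists in place of EF queries. The Order and Detail classes and the sample data are hypothetical; with EF, the two lists would come from two separate queries as shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCOs standing in for the mapped entities.
public sealed class Detail
{
    public int Id { get; set; }
    public int OrderId { get; set; }
}

public sealed class Order
{
    public int Id { get; set; }
    public List<Detail> Details { get; set; } = new List<Detail>();
}

public static class Program
{
    public static void Main()
    {
        // "Query 1": the page of orders.
        var orders = new List<Order>
        {
            new Order { Id = 1 }, new Order { Id = 2 }, new Order { Id = 3 }
        };

        // Stand-in for the Details table.
        var allDetails = new List<Detail>
        {
            new Detail { Id = 10, OrderId = 1 },
            new Detail { Id = 11, OrderId = 1 },
            new Detail { Id = 12, OrderId = 2 },
            new Detail { Id = 13, OrderId = 99 } // belongs to no loaded order
        };

        // "Query 2": all details for the loaded orders, grouped by order id.
        var orderIds = orders.Select(o => o.Id).ToList();
        var detailsByOrder = allDetails
            .Where(d => orderIds.Contains(d.OrderId))
            .ToLookup(d => d.OrderId);

        // Reassemble in memory; a lookup yields an empty sequence
        // for keys it does not contain, so no null checks are needed.
        foreach (var order in orders)
            order.Details = detailsByOrder[order.Id].ToList();

        foreach (var order in orders)
            Console.WriteLine($"Order {order.Id}: {order.Details.Count} detail(s)");
        // prints:
        // Order 1: 2 detail(s)
        // Order 2: 1 detail(s)
        // Order 3: 0 detail(s)
    }
}
```

The same pattern repeats for each additional level of the hierarchy: one query per association, then one lookup to stitch the results together.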
(3) Use Include() to manually control which associations are eager-loaded. This can be combined with #2 to take advantage of EF loading for some associations while giving you the flexibility to manually load others.
Try to think of an efficient yet simple sql query to get the data for your views.
Is it even possible?
If not, try to decompose (denormalize) your tables so that fewer joins are required to get the data. Also, are there efficient indexes on the table columns to speed up data retrieval?
If yes, forget EF, write a stored procedure and use it to get the data.
Turning tracking off for selected queries is a must for a read-only scenario. Take a look at my numbers:
http://netpl.blogspot.com/2013/05/yet-another-orm-micro-benchmark-part-23_15.html
As you can see, the difference between tracking and notracking scenario is significant.
I would experiment with eager loading, not everywhere (so you don't end up with a 7K-line query) but in selected subqueries.
One point to consider: EF definitely helps speed up development. However, you must remember that when you're returning lots of data from the DB, EF is using dynamic SQL. This means that (1) EF must create the SQL, and (2) SQL Server then needs to create an execution plan. All of this happens before the query is run.
When using stored procedures, SQL Server can cache the execution plan (which can be tuned for performance), which can make them faster than EF-generated queries. But you can always create your stored proc and then execute it from EF. I would convert any complex procedures or queries to stored procs and then call them from EF. Then you can measure your performance gain(s) and reevaluate from there.
In some cases, you can use compiled queries (see MSDN) to improve query performance drastically. The idea is that if you have a common query that is run many times and generates the same SQL call with different parameters, you compile the query the first time it's run and then pass it around as a delegate, eliminating the overhead of Entity Framework regenerating the SQL for each subsequent call.

Ravendb Savechanges(); taking too long time to run?

Ran into a weird problem with RavenDB
public ActionResult Save(RandomModel model)
{
    //Do some stuff, validate model etc..
    RavenSession.Store(model);
    RavenSession.SaveChanges();
    var newListOfModels = RavenSession.Query<RandomModel>().ToList();
    return View("randomview", newListOfModels);
}
The newListOfModels does not contain the model I just added with the Store method.
However, if I add a Thread.Sleep(100) after SaveChanges, the stored model is included in the new list.
Am I storing and saving stuff to RavenDB the wrong way?
How should I be doing this?
Of course there is a workaround: just add the incoming model to newListOfModels and run SaveChanges afterwards, for example in a base controller's OnActionExecuted method.
My primary concern is why I need to delay the thread before I can query the document session and find my newly added model there.
RavenDB indexes are stale by their nature. From the documentation:
RavenDB performs data indexing in a background thread, which is
executed whenever new data comes in or existing data is updated.
Running this as a background thread allows the server to respond
quickly even when large amounts of data have changed, however in that
case you may query stale indexes.
So when querying, you need to tell RavenDB to wait for the index to be refreshed.
You can do this with the various WaitFor... customizations; you will most probably want the WaitForNonStaleResultsAsOfLastWrite option:
var newListOfModels = RavenSession
.Query<RandomModel>()
.Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).ToList();

Persistence with EntityFramework in ASP.NET MVC application

In my ASP.NET MVC application I need to implement persistence of data. I chose Entity Framework for its ability to create classes, database tables and queries from an entity model, so that I don't have to write SQL table creation or LINQ to SQL queries by hand. So simplicity is my goal.
My approach was to create a model and then a custom HttpModule that gets called at the end of each request and just calls SaveChanges() on the context. That made my life very hard: Entity Framework kept throwing very strange exceptions. Sometimes it worked without an exception, but sometimes it did not. At first I tried to fix the problems one by one, but when I got yet another one I realized that my general approach is probably wrong.
So what is the general practice for implementing persistence in an ASP.NET MVC application? Do I just call SaveChanges after each change? Isn't that a little inefficient? And I don't know how to do that with the Services pattern anyway (services work with entities, so I'd have to pass a context instance to them so that they could save changes if they make any).
Some links to study materials or tutorials are also appreciated.
Note: this question asks about programming practice. I ask those who consider it vague to bear in mind, before voting to close, that it is still solving my very particular problem and that the right technique will save me a lot of technical problems.
You just need to make sure SaveChanges gets called before your request finishes. At the bottom of a controller action is an ideal place. My controller actions typically look like this:
public ActionResult SomeAction(...)
{
    _repository.DoSomething();
    ...
    _repository.DoSomethingElse();
    ...
    _repository.SaveChanges();
    return View(...);
}
This has the added benefit that if an exception gets thrown, then SaveChanges will not get called. And you can either handle the exception in the action or in Controller.OnException.
It's going to be no more or less efficient than calling a stored procedure the same number of times (with respect to the number of connections that need to be made).
Nominally, you would make all your changes to the object set, then SaveChanges to commit all those changes.
So instead of doing this:
mySet.Objects.Add(someObject);
mySet.SaveChanges();
mySet.OtherObjects.Add(someOtherObject);
mySet.SaveChanges();
You just need to do:
mySet.Objects.Add(someObject);
mySet.OtherObjects.Add(someOtherObject);
mySet.SaveChanges();
// Commits Both Changes
Usually your data access is wrapped by an object implementing the repository pattern. You then invoke a Save() method on the repository.
Something like
var customer = customerRepository.Get(id);
customer.FirstName = firstName;
customer.LastName = lastName;
customerRepository.SaveChanges();
The repository can then be wrapped by a service layer to provide view model objects or DTO's
"Isn't that little inefficient ?"
Don't prematurely optimise. When you have a performance issue, analyse the performance, identify a cause and then optimise. Repeat.
Update
A repository wraps data access, usually for a single entity. A service layer wraps business logic and can access multiple entities through multiple repositories. It usually deals with 'slim' models or DTOs.
An example could be something like getting a list of invoices for a customer
// CustomerWithInvoices is an illustrative DTO with Customer and Invoices properties
public CustomerWithInvoices GetCustomerWithInvoices(int id) {
    var customer = customerRepository.Get(id);
    var invoiceList = invoiceRepository.GetAllInvoicesFor(id);
    return new CustomerWithInvoices {
        Customer = customer,
        Invoices = invoiceList
    };
}
