First of all, sorry if my question confuses you. I'm still fairly new to programming in C#.
I am using the code below:
foreach (var schedule in schedules)
{
    if (schedule.SupplierId != Guid.Empty)
    {
        var supplier = db.Suppliers.Find(schedule.SupplierId);
        schedule.CompanyName = supplier.CompanyName;
    }
    if (schedule.CustomerId != Guid.Empty)
    {
        var customer = db.Customers.Find(schedule.CustomerId);
        schedule.CompanyName = customer.CompanyName;
    }
}
It works really well, but what if I end up with about a thousand companies? This looping will slow my program down. How can I change this code into a LINQ expression?
Looking forward to your reply. Thank you.
There isn't a good way to do this in the client. There are some tools out there to do a mass update with EF, but I would suggest just running a query to do this, if you need to do this at all. It seems you are updating a field which is just related, but in fact belongs to another entity. You shouldn't do that, since it means updating the one will leave the other invalid.
What Patrick mentioned is absolutely right from an architectural point of view. In my understanding CompanyName belongs somewhere else, unless this entity is a "read-only view"...which it obviously is not. Now, if you can't afford to make a major change, I would suggest you move this heavy processing off the main thread to a separate thread...if you can.
You can also load all suppliers and customers into memory rather than opening a database connection 1,000 times to issue a lookup query (a rough sketch follows below). But, again, strongly consider moving this to a separate thread.
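If the data fits comfortably in memory, that means two up-front queries plus dictionary lookups instead of a Find() call per schedule. This is only a rough sketch; the key property names (SupplierId on Supplier, CustomerId on Customer) are guesses based on the Find calls in the question:
// Two database round trips up front, then in-memory lookups per schedule.
var supplierNames = db.Suppliers.ToDictionary(s => s.SupplierId, s => s.CompanyName);
var customerNames = db.Customers.ToDictionary(c => c.CustomerId, c => c.CompanyName);
foreach (var schedule in schedules)
{
    string companyName;
    if (schedule.SupplierId != Guid.Empty && supplierNames.TryGetValue(schedule.SupplierId, out companyName))
    {
        schedule.CompanyName = companyName;
    }
    if (schedule.CustomerId != Guid.Empty && customerNames.TryGetValue(schedule.CustomerId, out companyName))
    {
        schedule.CompanyName = companyName;
    }
}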
I'm going through old projects at work trying to make them faster. I'm currently looking at some web APIs. One API is running particularly slowly; the problem is in the data service it is calling. Specifically, it is in a lambda method trying to map a stored procedure result to a domain model. A simplified version of the code:
public IEnumerable<DomainModelResult> GetData()
{
return this.EntityFrameworkDB.GetDataSproc().ToList()
.Select(sprocResults=>sprocResults.ToDomainModelResult())
.AsEnumerable();
}
This is a simplified version, but after profiling it I found the major hangup is in the lambda function. I am assuming this is because the EFContext is still open and some goofy entity framework stuff is happening.
The problem is I'm relatively new to Entity Framework (I'm an intern) and pretty ignorant of its inner workings. Could someone explain why this is so slow? I feel it should be very fast: the DomainModelResult is a POCO and only property setters are being used in ToDomainModelResult.
Edit:
I thought ToList() would do that, but started to doubt myself because I couldn't think of another explanation. All the ToDomainModelResult() stuff is extremely simple. Something like:
public static DomainModelResult ToDomainModelResult(this SprocResult source)
{
    return new DomainModelResult
    {
        FirstName = source.description,
        MiddleName = source._middlename,
        LastName = source.lastname,
        UserName = source.expr2,
        Address = source.uglyName
    };
}
It's just a bunch of simple setters; I think the model causing problems has 17 properties. The reason this is being done is that the project is an old database-first one and the stored procedures have ugly names that aren't descriptive at all. It also means switching the stored procedures in the data services is easy and doesn't break the rest of the project.
Edit 2: For some reason, using ToArray and breaking apart the LINQ statements makes the assignment from procedure result to domain model result extremely fast. Now the whole data service method is faster, which is odd; I don't know where the rest of the time went.
This might be a more esoteric question than I originally thought. My question hasn't been answered, but the problem is no longer there. Thanks to all who replied. I'm keeping this as unanswered for now.
Edit 3: Please flag this question for removal; I can't remove it. I found the problem, but it is totally unrelated to my original question; I misunderstood the problem when I asked it. The increase in speed I'm chalking up to compiler optimization and to running the code in the profiler. The real issue wasn't in my lambda but in a dynamic lambda called by Entity Framework when the context is closed or an object is accessed; it was doing data validation. GetString, GetInt32, and IsDBNull were eating up the most time. So I'm assuming Microsoft has already optimized these methods, and the only way to speed this up is possibly making some variables not nullable in the procedure. This question is misleading and so esoteric that I don't think it belongs here and will just confuse people. Sorry.
You should split the code and check which part is taking the time.
public IEnumerable<DomainModelResult> GetData()
{
var lst = this.EntityFrameworkDB.GetDataSproc().ToList();
return lst
.Select(sprocResults=>sprocResults.ToDomainModelResult())
.AsEnumerable();
}
I am pretty sure the GetDataSproc stored procedure is taking most of your time. You need to optimize the stored procedure code.
Update
If possible, it is better to do more work on the SQL side instead of retrieving 60,000 rows into memory. A few possible solutions:
If you need to display this information, do paging (top and skip)
If you are doing any filtering, calculating, or grouping after you retrieve the rows into memory, do it in your stored proc instead
On the .NET side, since you are returning IEnumerable, you may be able to use yield for the second part, depending on your architecture (see the sketch below)
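For instance, the mapping could be streamed with yield instead of building the whole list at once. This is just a sketch of the shape, assuming the caller finishes enumerating before the context is disposed:
public IEnumerable<DomainModelResult> GetData()
{
    // Materialize the sproc results first so the reader/connection is released,
    // then map each row lazily as the caller enumerates.
    var rows = this.EntityFrameworkDB.GetDataSproc().ToList();
    foreach (var row in rows)
    {
        yield return row.ToDomainModelResult();
    }
}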
I've just been noodling about with a profiler looking at performance bottlenecks in a WCF application after some users complained of slowness.
To my surprise, almost all the problems came down to Entity Framework operations. We use a repository pattern and most of the "Add/Modify" code looks very much like this:
public void Thing_Add(Thing thing)
{
Log.Trace("Thing_Add called with ThingID " + thing.ThingID);
if (db.Things.Any(m => m.ThingID == thing.ThingID))
{
db.Entry(thing).State = System.Data.EntityState.Modified;
}
else
{
db.Things.Add(thing);
}
}
This is obviously a convenient way to wrap an add/update check into a single function.
Now, I'm aware that EF isn't the most efficient thing when it comes to doing inserts and updates. However, my understanding was (which a little research bears out) that it should be capable of processing a few hundred records faster than a user would likely notice.
But this is causing big bottlenecks on small upserts. For example, in one case it takes six seconds to process about fifty records. That's a particularly bad example but there seem to be instances all over this application where small EF upserts are taking upwards of a second or two. Certainly enough to annoy a user.
We're using Entity Framework 5 with a Database First model. The profiler says it's not the Log.Trace that's causing the issue. What could be causing this, and how can I investigate and fix the issue?
I found the root of the problem on another SO post: DbContext is very slow when adding and deleting
Turns out that when you're working with a large number of objects, especially in a loop, the gradual accumulation of change tracking makes EF get slower and slower.
Refreshing the DbContext isn't enough in this instance as we're still working with too many linked entities. So I put this inside the repository:
public void AutoDetectChangesEnabled(bool detectChanges)
{
db.Configuration.AutoDetectChangesEnabled = detectChanges;
}
And I can now use it to turn AutoDetectChangesEnabled off and back on around looped inserts:
try
{
    rep.AutoDetectChangesEnabled(false);
    foreach (var thing in thingsInFile)
    {
        rep.Thing_Add(new Thing(thing));
    }
}
finally
{
    rep.AutoDetectChangesEnabled(true);
}
This makes a hell of a difference. Although it needs to be used with care, since it'll stop EF from recognizing potential updates to changed objects.
Is there any need to handle locks, in terms of threading, in an inventory application?
As far as I know, ASP.NET is not thread safe.
Let's say there is a product available with a quantity of 1, and 40 users are simultaneously trying to book that particular product. Which one is going to get the product, and what happens to the rest?
I'm not even sure if this question makes sense.
http://blogs.msdn.com/b/benchr/archive/2008/09/03/does-asp-net-magically-handle-thread-safety-for-you.aspx
I am not sure about this; please help.
Well, technically, you're not even talking about ASP.NET here, but rather Entity Framework or whatever else you're using to communicate with SQL Server or whatever other persistent data store you're using. Relational databases will typically row-lock, so that while one client is updating the row, the row cannot be read by another client, but you can still run into concurrency issues.
You can handle this situation one of two ways: pessimistic concurrency or optimistic concurrency. With pessimistic concurrency, you create locks and any other thread trying to read/write the same data is simply turned away in the meantime. In a multi-threaded environment, it's far more common to use optimistic concurrency, since that allows a bit of play room for failover.
With optimistic concurrency, you version the data. As a simplistic example, let's say that I'm looking for the current stock of widgets in my dbo.Widgets table. I'd have a column like Version which might initially be set to "1" and 100 widgets in my Stock column. Client one wants to buy a widget, so I read the row and note the version, 1. Now, I want to update the column, so I do an update to set Stock to 99 and Version to 2, but I include in my where clause Version = 1.
But, between the time the row was initially read and the update was sent, another client bought a widget and updated the version of the row to 2. The first client's update fails, because Version is no longer 1. So the application then reads the row fresh and tries to update it again, subtracting 1 from Stock and incrementing Version by 1. Rinse and repeat.
Generally, you'll want to have some upward limit of attempts before you'll just give up and return an error to the user, but in most scenarios, you might have one collision and then the next one goes through fine. Your server would have to be getting slammed with people eagerly trying to buy widgets before it would be a real problem.
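As a sketch of that versioned update in plain ADO.NET (the dbo.Widgets table, the Stock and Version columns, and the WidgetId key are the example's invented names, not an existing schema):
// Requires System.Data.SqlClient. Decrement stock only if nobody has bumped Version.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "UPDATE dbo.Widgets SET Stock = Stock - 1, Version = Version + 1 " +
    "WHERE WidgetId = @id AND Version = @expectedVersion", conn))
{
    cmd.Parameters.AddWithValue("@id", widgetId);
    cmd.Parameters.AddWithValue("@expectedVersion", expectedVersion);
    conn.Open();
    int rows = cmd.ExecuteNonQuery();
    if (rows == 0)
    {
        // Lost the race: another client changed Version first.
        // Re-read the row, recompute, and retry (up to some attempt limit).
    }
}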
Now of course, this is a highly simplistic approach, and honestly, not something you really have to manage yourself. Entity Framework, for example, will handle concurrency for you automatically as long as you have a rowversion column:
[Timestamp]
public byte[] RowVersion { get; set; }
See http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application for the full guide to setting it up.
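Once the rowversion column is mapped, a losing save throws DbUpdateConcurrencyException (System.Data.Entity.Infrastructure), which you can catch and retry. A minimal sketch, assuming a Widgets set with a Stock property along the lines of the example above:
bool saved = false;
int attempts = 0;
while (!saved && attempts++ < 3) // cap the retries, as discussed above
{
    try
    {
        var widget = db.Widgets.Find(widgetId);
        widget.Stock -= 1;
        db.SaveChanges();
        saved = true;
    }
    catch (DbUpdateConcurrencyException ex)
    {
        // Refresh the entity (including its RowVersion) from the database and retry.
        ex.Entries.Single().Reload();
    }
}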
ASP.NET certainly is not thread safe. The article you link to is fine as a start, but doesn't tell the whole story by a long way. In your case, you likely load the product list into memory at the first request for it, or at application startup, or on some other trigger.
When a request wants to work with a product, you grab the appropriate member of this preloaded list. (Believe me, this is better than having every request load the product or product list from the database.) However, if you now have 40 simultaneous requests for the same product, they will all be accessing the same object, and nasty things can happen, like ending up with -39 stock.
You can address this in many ways, but they boil down to two:
Protect the data somehow
Do what Amazon does
Protect the data
There are numerous ways of doing this. One would be to use a critical section via the lock keyword in C#. For example, something like this in the Product class:
private object lockableThing; // Created in the ctor
public bool ReduceStockLevelForSale(int qtySold)
{
bool success = false;
if (this.quantityOnHand >= qtySold)
{
lock (lockableThing)
{
if (this.quantityOnHand >= qtySold)
{
this.quantityOnHand -= qtySold;
success = true;
}
}
}
return success;
}
The double check on the quantity on hand is deliberate and required. There are any number of ways of doing the equivalent. Books have been written about this sort of thing.
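For instance, if the quantity on hand is just an int field, Interlocked.CompareExchange can do the same check-and-decrement without taking a lock at all. A sketch only, not a drop-in replacement for the class above:
public bool ReduceStockLevelForSale(int qtySold)
{
    while (true)
    {
        int current = this.quantityOnHand;
        if (current < qtySold)
        {
            return false; // not enough stock
        }
        // Only succeeds if nobody else changed the value since we read it.
        if (Interlocked.CompareExchange(ref this.quantityOnHand, current - qtySold, current) == current)
        {
            return true;
        }
        // Another thread won the race; loop and re-check with the fresh value.
    }
}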
Do what Amazon does
As long as at some point in the order-taking sequence Amazon thinks it has enough on hand (or maybe even any), it will let you place the order. It doesn't reduce the stock level while the order is being confirmed. Once the order has been confirmed, it has a back-end process (i.e. NOT run by the web site) which checks, order by order, that the order can be fulfilled, and only reduces the on-hand level if it can. If it can't be fulfilled, they put the order on hold and send you an email saying 'Sorry! We don't have enough of Product X!' and giving you some options.
Discussion
Amazon's approach is the best way, because if you decrement the stock from the web site, at what point do you do it? Probably not until the order is confirmed. If the stock has gone, what do you do then? Also, you are going to have to have some functionality to send the 'Sorry!' email anyway: what happens when the last item (or two or three) of that product can't be found, doesn't physically exist or is broken? You send a 'Sorry!' email.
However, this does assume that you are in control of the full order to dispatch cycle which is not always the case. If you aren't in control of the full cycle, you need to adjust to what you are in control of, and then pick a method.
I'm very familiar with using a transactional RDBMS, but how would I make sure that changes made to my in-memory data are rolled back if the transaction fails? What if I'm not even using a database?
Here's a contrived example:
public void TransactionalMethod()
{
var items = GetListOfItems();
foreach (var item in items)
{
MethodThatMayThrowException(item);
item.Processed = true;
}
}
In my example, I might want the changes made to the items in the list to somehow be rolled back, but how can I accomplish this?
I am aware of "software transactional memory" but don't know much about it and it seems fairly experimental. I'm aware of the concept of "compensatable transactions", too, but that incurs the overhead of writing do/undo code.
Subversion seems to deal with errors updating a working copy by making you run the "cleanup" command.
Any ideas?
UPDATE:
Reed Copsey offers an excellent answer, including:
Work on a copy of data, update original on commit.
This takes my question one level further - what if an error occurs during the commit? We so often think of the commit as an immediate operation, but in reality it may be making many changes to a lot of data. What happens if there are unavoidable things like OutOfMemoryExceptions while the commit is being applied?
On the flipside, if one goes for a rollback option, what happens if there's an exception during the rollback? I understand things like Oracle RDBMS has the concept of rollback segments and UNDO logs and things, but assuming there's no serialisation to disk (where if it isn't serialised to disk it didn't happen, and a crash means you can investigate those logs and recover from it), is this really possible?
UPDATE 2:
An answer from Alex made a good suggestion: namely that one updates a different object; the commit phase is then simply changing the reference from the current object over to the new object. He went further to suggest that the object you change is effectively a list of the modified objects.
I understand what he's saying (I think), and I want to make the question more complex as a result:
How, given this scenario, do you deal with locking? Imagine you have a list of customers:
var customers = new Dictionary<CustomerKey, Customer>();
Now, you want to make a change to some of those customers, how do you apply those changes without locking and replacing the entire list? For example:
var customerTx = new Dictionary<CustomerKey, Customer>();
foreach (var customer in customers.Values)
{
var updatedCust = customer.Clone();
customerTx.Add(GetKey(updatedCust), updatedCust);
if (CalculateRevenueMightThrowException(customer) >= 10000)
{
updatedCust.Preferred = true;
}
}
How do I commit? This (Alex's suggestion) will mean locking all customers while replacing the list reference:
lock (customers)
{
customers = customerTx;
}
Whereas if I loop through, modifying the references in the original list, it's not atomic, and falls foul of the "what if it crashes partway through" problem:
foreach (var kvp in customerTx)
{
customers[kvp.Key] = kvp.Value;
}
Pretty much every option for doing this requires one of three basic methods:
Make a copy of your data before modifications, to revert to a rollback state if aborted.
Work on a copy of data, update original on commit.
Keep a log of changes to your data, to undo them in the case of an abort.
For example, Software Transactional Memory, which you mentioned, follows the third approach. The nice thing about that is that it can work on the data optimistically, and just throw away the log on a successful commit.
Take a look at the Microsoft Research project, SXM.
From Maurice Herlihy's page, you can download documentation as well as code samples.
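To make the log-of-changes idea concrete, here's a crude sketch reusing the question's TransactionalMethod example: record an undo action before each mutation, throw the log away on success, and replay it in reverse on failure. Nothing STM-specific, just the shape of it:
var undoLog = new Stack<Action>();
try
{
    foreach (var item in items)
    {
        var current = item;              // capture for the closure
        bool oldValue = current.Processed;
        undoLog.Push(() => current.Processed = oldValue);
        MethodThatMayThrowException(current);
        current.Processed = true;
    }
    undoLog.Clear(); // commit: the log is no longer needed
}
catch
{
    while (undoLog.Count > 0)
    {
        undoLog.Pop()(); // abort: undo in reverse order
    }
    throw;
}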
You asked: "What if an error occurs during the commit?"
It doesn't matter. You can commit to somewhere/something in memory and check meanwhile if the operation succeeds. If it did, you change the reference of the intended object (object A) to where you committed (object B). Then you have failsafe commits - the reference is only updated on successful commit. Reference change is atomic.
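In terms of the question's example, the commit then collapses to a single reference swap. A sketch only; for other threads to see the swap, customers would need to be a shared field rather than a local:
// Build customerTx exactly as in the question (clone each customer, apply the changes),
// then publish the whole new dictionary in one atomic step.
Interlocked.Exchange(ref customers, customerTx);
// Readers holding the old reference keep a consistent (if slightly stale) snapshot;
// anyone who reads the field afterwards sees only the fully committed new state.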
public void TransactionalMethod()
{
var items = GetListOfItems();
try {
foreach (var item in items)
{
MethodThatMayThrowException(item);
item.Processed = true;
}
}
catch(Exception ex) {
foreach (var item in items)
{
if (item.Processed) {
UndoProcessingForThisItem(item);
}
}
}
}
Obviously, the implementation of the "Undo..." is left as an exercise for the reader.
Here's a little experiment I did:
MyClass obj = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).Single();
Console.WriteLine(obj.MyProperty); // output = "initial"
Console.WriteLine("Waiting..."); // put a breakpoint after this line
obj = null;
obj = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).Single(); // same as before, but reloaded
Console.WriteLine(obj.MyProperty); // output still = "initial"
obj.MyOtherProperty = "foo";
dataContext.SubmitChanges(); // throws concurrency exception
When I hit the breakpoint after line 3, I go to a SQL query window and manually change the value to "updated". Then I carry on running. LINQ to SQL does not reload my object, but re-uses the one it previously had in memory! This is a huge problem for data concurrency!
How do you disable this hidden cache of objects that Linq obviously is keeping in memory?
EDIT - On reflection, it is simply unthinkable that Microsoft could have left such a gaping chasm in the LINQ framework. The code above is a dumbed-down version of what I'm actually doing, and there may be little subtleties that I've missed. In short, I'd appreciate it if you'd do your own experimentation to verify that my findings above are correct. Alternatively, there must be some kind of "secret switch" that makes LINQ robust against concurrent data updates. But what?
This isn't an issue I've come across before (since I don't tend to keep DataContexts open for long periods of time), but it looks like someone else has:
http://www.rocksthoughts.com/blog/archive/2008/01/14/linq-to-sql-caching-gotcha.aspx
LinqToSql has a wide variety of tools to deal with concurrency problems.
The first step, however, is to admit there is a concurrency problem to be solved!
First, DataContext's intended object lifecycle is supposed to match a UnitOfWork. If you're holding on to one for extended periods, you're going to have to work that much harder because the class isn't designed to be used that way.
Second, DataContext tracks two copies of each object. One is the original state and one is the changed/changeable state. If you ask for the MyClass with Id = 1, it will give you back the same instance it gave you last time, which is the changed/changeable version... not the original. It must do this to prevent concurrency problems with in-memory instances... LinqToSql does not allow one DataContext to be aware of two changeable versions of MyClass(Id = 1).
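You can see that identity map directly with a small variation on the question's experiment; both queries below hand back the very same tracked instance:
var first = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).Single();
var second = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).Single();
Console.WriteLine(object.ReferenceEquals(first, second)); // True: one instance per key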
Third, DataContext has no idea whether your in-memory change comes before or after the database change, and so cannot referee the concurrency conflict without some guidance. All it sees is:
I read MyClass(Id = 1) from the database.
Programmer modified MyClass(Id = 1).
I sent MyClass(Id = 1) back to the database (look at the generated SQL to see the optimistic concurrency check in the where clause)
The update will succeed if the database's version matches the original (optimistic concurrency).
The update will fail with concurrency exception if the database's version does not match the original.
Ok, now that the problem is stated, here's a couple of ways to deal with it.
You can throw away the DataContext and start over. This is a little heavy-handed for some, but at least it's easy to implement.
You can ask for the original instance or the changed/changeable instance to be refreshed with the database value by calling DataContext.Refresh(RefreshMode, target) (reference docs with many good concurrency links in the "Remarks" section). This will bring the changes client side and allow your code to work out what the final result should be (a small sketch follows below).
You can turn off concurrency checking in the dbml (ColumnAttribute.UpdateCheck). This disables optimistic concurrency and your code will stomp over anyone else's changes. Also heavy-handed, also easy to implement.
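A minimal sketch of the Refresh approach mentioned above (obj is whatever instance hit the conflict; KeepChanges merges your pending edits over the fresh database values, while OverwriteCurrentValues would discard them):
// Re-read the row from the database, merge, and then try the submit again.
dataContext.Refresh(RefreshMode.KeepChanges, obj);
dataContext.SubmitChanges();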
Set the ObjectTrackingEnabled property of the DataContext to false.
When ObjectTrackingEnabled is set to true the DataContext is behaving like a Unit of Work. It's going to keep any object loaded in memory so that it can track changes to it. The DataContext has to remember the object as you originally loaded it to know if any changes have been made.
If you are working in a read-only scenario, you should turn off object tracking. It can be a decent performance improvement.
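For read-only work that is a one-liner; it has to be set before the first query is executed, and the context name here is just illustrative:
using (var dataContext = new MyDataContext(connectionString))
{
    dataContext.ObjectTrackingEnabled = false; // read-only: no identity map, no change tracking
    var results = dataContext.GetTable<MyClass>().Where(x => x.ID == 1).ToList();
}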
If you aren't working in a read-only scenario, then I'm not sure why you want it to work this way. If you have made edits, then why would you want it to pull in modified state from the database?
LINQ to SQL uses the identity map design pattern, which means that it will always return the same instance of an object for its given primary key (unless you turn off object tracking).
The solution is simply to either use a second data context if you don't want it to interfere with the first instance, or refresh the first instance if you do.