I'm using LINQ to SQL for MySQL (using DbLinq) in an ASP.NET MVC website. I have a weird caching problem. Consider the following methods in my Repository class:
public IEnumerable<Message> GetInbox(int userId)
{
    using (MyDataContext repo = new MyDataContext(new MySqlConnection("[Connectionstring]")))
    {
        return repo.Messages.Where(m => m.MessageTo == userId);
    }
}

public IEnumerable<Message> GetOutbox(int userId)
{
    using (MyDataContext repo = new MyDataContext(new MySqlConnection("[Connectionstring]")))
    {
        return repo.Messages.Where(m => m.MessageFrom == userId);
    }
}
'MyDataContext' is the DbLinq-generated mapping to my database, which inherits from DataContext. I'm not reusing the DataContext here (the above code looks a bit silly, but I wanted to make absolutely sure that it was not some DataContext / MySqlConnection reuse issue).
What happens is: whichever of the two methods I call, with whatever userId, the results stay the same. Period. Even though I can see that repo.Messages has more than 10 results, with varying MessageFrom and MessageTo values, I only get the first-queried results back. So if I call GetInbox(4374), it gives me message A and message B. Calling GetInbox(526) afterwards still gives me messages A and B, even though there are messages C and D that do belong to userId 526. I have to restart the application to see any changes.
What's going on here? I'm sure I'm doing something so stupid that I'm going to be ashamed when someone points it out to me. If I'm not doing something very stupid, then I find this issue very strange. I read about not reusing DataContext, but I am not. Why this caching issue? Below is my controller code, but I doubt it matters:
[Authorize]
public ActionResult Inbox(int userId)
{
    Mailbox inbox = new Mailbox(userId, this.messageRepository.GetInbox(userId));
    return PartialView("Inbox", inbox);
}
Though there are similar questions on SO, I haven't found an answer to this exact question. Many thanks!
UPDATE:
Changing the code to return repo.Messages.ToList().Where(m => m.MessageFrom == userId); fixes it; it works fine then. Seems like some cache problem. However, I of course don't want to fix it that way, since calling ToList() before the Where pulls the entire table into memory.
Changing the code so that the datacontext is not disposed after the query does not fix the problem.
I wrote some pretty similar code that seems to work fine. The only difference is that as Marc suggests, I'm passing in the connection string and calling ToList on the Where method. My Database is not automatically generated but derives from DataContext. The code is below.
class Program
{
    static void Main(string[] args)
    {
        List<Item> first = GetItems("F891778E-9C87-4620-8AC6-737F6482CECB").ToList();
        List<Item> second = GetItems("7CA18DD1-E23B-41AA-871B-8DEF6228F96C").ToList();
        Console.WriteLine(first.Count);
        Console.WriteLine(second.Count);
        Console.Read();
    }

    static IEnumerable<Item> GetItems(string vendorId)
    {
        using (Database repo = new Database(@"connection_string_here"))
        {
            return repo.GetTable<Item>().Where(i => i.VendorId.ToString() == vendorId).ToList();
        }
    }
}
Start off by writing a test. This will tell you whether Linq2Sql is behaving correctly. Something like:
var inboxMessages = this.messageRepository.GetInbox(userId1);
Assert.That(inboxMessages.All(m => m.MessageTo == userId1));

inboxMessages = this.messageRepository.GetInbox(userId2);
Assert.That(inboxMessages.All(m => m.MessageTo == userId2));
If that succeeds, you should really check whether it's the deferred execution that's causing problems. You should enumerate inboxMessages right away.
Another thing that might be causing trouble is the fact that you start enumerating when the datacontext is already disposed. The only way to solve this is not to dispose it at all (and rely on the GC cleaning it up when it goes out of scope), or to come up with a custom IDisposable object so you can put a using around it (a sketch of such a wrapper follows the snippet below). Something like:
using (var inboxMessages = this.messageRepository.GetInbox(userId1))
{
    Assert.That(inboxMessages.All(m => m.MessageTo == userId1));
}
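A sketch of what such a wrapper might look like (the type name and shape are my own, not from the original post):

using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: it hands the query results to the caller while
// keeping the DataContext alive, and disposes the context when the
// caller's using block ends.
public sealed class DisposableEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IDisposable context;
    private readonly IEnumerable<T> items;

    public DisposableEnumerable(IDisposable context, IEnumerable<T> items)
    {
        this.context = context;
        this.items = items;
    }

    public IEnumerator<T> GetEnumerator()
    {
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public void Dispose()
    {
        context.Dispose();
    }
}

GetInbox would then create the context without a using block and return new DisposableEnumerable<Message>(repo, repo.Messages.Where(m => m.MessageTo == userId));.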
Caching in LINQ-to-SQL is associated with the DataContext, and is mainly limited to identity caching - in most cases it will re-run a query even if you've done it before. There are a few examples, like .Single(x=>x.Id == id) (which has special handling).
Since you are clearly getting a new data-context each time, I don't think that is the culprit. However, I'm also slightly surprised that the code works... are you sure that is representative?
LINQ's Where method is deferred - meaning it isn't executed until you iterate the data (for example with foreach). But by that time you have already disposed the data-context! Have you snipped something from the example?
Also - by giving it a MySqlConnection (that you don't then Dispose()), you may be impacting the cleanup - it may be preferable to just give it (the data-context) the connection string.
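Putting those two points together, a minimal sketch of the repository method, reusing the question's names and assuming the generated context exposes DataContext's string constructor:

public IEnumerable<Message> GetInbox(int userId)
{
    // The context gets the connection string and manages the connection itself.
    using (MyDataContext repo = new MyDataContext("[Connectionstring]"))
    {
        // ToList() runs the query now, while the context is still alive;
        // the caller receives plain objects, not a deferred query.
        return repo.Messages.Where(m => m.MessageTo == userId).ToList();
    }
}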
Well, it turned out to be a problem with DbLinq. I was using source code from three weeks ago, and there was an apparent bug in its QueryCache (though it has always been in there). There's a complete thread that covers this here.
I updated the DbLinq source. QueryCache is now disabled (which does imply a performance hit), and at least now it works. I'll have to see whether the performance is acceptable. I must confess I'm a bit baffled, though, as what I'm trying to do is a common LINQ to SQL pattern. Thanks all.
I'd avoid using DbLinq for production code... many of LINQ to SQL's features aren't implemented, and walking through the source code shows a low level of maturity... many of the methods are not implemented or are marked as "unterminated".
...you've been warned!
Related
I have some weird behaviour with a foreach-loop:
IEnumerable<Compound> loadedCompounds;
...

// Loop through the peaks.
foreach (GCPeak p in peaks)
{
    // Loop through the compounds.
    foreach (Compound c in loadedCompounds)
    {
        if (c.IsInRange(p) && c.SignalType == p.SignalType)
        {
            c.AddPeak(p);
        }
    }
}
So what I'd like to do: loop through all the GCPeaks (it is a class) and sort them into their corresponding compounds.
AddPeak just adds the GCPeak to a SortedList. Code compiles and runs without exceptions, but the problem is:
After c.AddPeak(p) the SortedList in c contains the GCPeak (checked with Debugger), while the SortedLists in loadedCompounds remains empty.
I am quite confused with this bug I produced:
What is the reason for this behavior? Both Compound and GCPeak are classes, so I'd expect references (not copies) of my objects, and I'd expect my code to work.
How to do what I'd like to do properly?
EDIT:
This is how I obtain the IEnumerables (the whole thing is coming from an XML file - LINQ to XML). Compounds are obtained basically the same way.
IEnumerable<GCPeak> peaksFromSignal = from p in signal.Descendants("IntegrationResults")
                                      select new GCPeak()
                                      {
                                          SignalType = signaltype,
                                          RunInformation = runInformation,
                                          RetentionTime = XmlConvert.ToDouble(p.Element("RetTime").Value),
                                          PeakArea = XmlConvert.ToDouble(p.Element("Area").Value),
                                      };
Thanks!
An IEnumerable won't hold a hard reference to your list. This causes two potential problems for you.
1) What you are enumerating might not be there anymore. For example, if you were enumerating a list of Facebook posts using a lazy technique like IEnumerable, but your connection to Facebook was closed, it may evaluate to an empty enumerable. The same would occur if you were enumerating over a database collection but that DB connection was closed.
2) Using an enumerable like that could lead you to enumerate it multiple times, which can have issues. ReSharper typically warns against this (to prevent unintended consequences). See here for more info: Handling warning for possible multiple enumeration of IEnumerable
What you can do to debug your situation is use the LINQ extension .ToList() to force early evaluation of your IEnumerable. This lets you see what is in the IEnumerable more easily and follow it through your code. Do note that .ToList() has performance implications compared to the lazy reference you have currently, but it forces a hard reference earlier, which will help you debug your scenario and avoid the problems mentioned above.
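In the question's case, forcing evaluation once means every later enumeration sees the same Compound instances, so the peaks added to them are not lost. A minimal sketch, where compoundsQuery is a stand-in name for the LINQ to XML query that produces the compounds:

// Materialize once: the same Compound objects are reused from here on.
List<Compound> loadedCompounds = compoundsQuery.ToList();

foreach (GCPeak p in peaks)
{
    foreach (Compound c in loadedCompounds)
    {
        if (c.IsInRange(p) && c.SignalType == p.SignalType)
        {
            c.AddPeak(p); // the added peak now sticks, because the list
                          // holds the very instances we are mutating
        }
    }
}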
Thanks for your comments.
Indeed converting my loadedCompounds to a List<> worked.
Lesson learned: Be careful with IEnumerable.
EDIT
As requested, I am adding the implementation of AddPeak:
public void AddPeak(GCPeak peak)
{
    if (peak != null)
    {
        peaks.Add(peak.RunInformation.InjectionDateTime, peak);
    }
}
RunInformation is a struct.
I'm going through old projects at work trying to make them faster. I'm currently looking at some web APIs. One API is running particularly slowly; the problem is in the data service it is calling. Specifically, it is in a lambda method trying to map a stored procedure result to a domain model. A simple version of the code:
public IEnumerable<DomainModelResult> GetData()
{
    return this.EntityFrameworkDB.GetDataSproc().ToList()
        .Select(sprocResults => sprocResults.ToDomainModelResult())
        .AsEnumerable();
}
This is a simplified version, but after profiling it I found that the major hangup is in the lambda function. I am assuming this is because the EF context is still open and some goofy Entity Framework stuff is happening.
The problem is I'm relatively new to Entity Framework (I'm an intern) and pretty ignorant of its inner workings. Could someone explain why this is so slow? I feel it should be very fast: DomainModelResult is a POCO, and only setters are being used in ToDomainModelResult.
Edit:
I thought ToList() would do that, but I started to doubt myself because I couldn't think of another explanation. All the ToDomainModelResult() stuff is extremely simple. Something like:
public static DomainModelResult ToDomainModelResult(SprocResult source)
{
    return new DomainModelResult
    {
        FirstName = source.description,
        MiddleName = source._middlename,
        LastName = source.lastname,
        UserName = source.expr2,
        Address = source.uglyName
    };
}
It's just a bunch of simple setters; I think the model causing problems has 17 properties. The reason this is being done is that the project is an old database-first one and the stored procedures have ugly names that aren't descriptive at all. It also makes switching the stored procedures in the data services easy without breaking the rest of the project.
Edit 2: For some reason, using ToArray and breaking apart the LINQ statements makes the assignment from procedure result to domain model result extremely fast. Now the whole data service method is faster, which is odd; I don't know where the rest of the time went.
This might be a more esoteric question than I originally thought. My question hasn't been answered, but the problem is no longer there. Thanks to all who replied. I'm keeping this as unanswered for now.
Edit 3: Please flag this question for removal; I can't remove it myself. I found the problem, but it is totally unrelated to my original question; I misunderstood the problem when I asked it. The increase in speed I'm chalking up to compiler optimization and to running the code in the profiler. The real issue wasn't in my lambda but in a dynamic lambda called by Entity Framework when the context is closed or an object is accessed: it was doing data validation, and GetString, GetInt32, and IsDBNull were eating up the most time. I'm assuming Microsoft has already optimized those methods, so the only way to speed this up is possibly making some columns not nullable in the procedure. This question is misleading and so esoteric that I don't think it belongs here; it will just confuse people. Sorry.
You should split the code and check which part is taking the time:
public IEnumerable<DomainModelResult> GetData()
{
    var lst = this.EntityFrameworkDB.GetDataSproc().ToList();
    return lst
        .Select(sprocResults => sprocResults.ToDomainModelResult())
        .AsEnumerable();
}
I am pretty sure the GetDataSproc procedure is taking most of your time, and that you need to optimize the stored procedure code.
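To check that, you could time the two stages separately; a minimal sketch using System.Diagnostics.Stopwatch, with the names assumed from the question:

var sw = System.Diagnostics.Stopwatch.StartNew();
var lst = this.EntityFrameworkDB.GetDataSproc().ToList(); // executes the sproc and materializes the rows
Console.WriteLine("Sproc + materialization: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
var mapped = lst.Select(r => r.ToDomainModelResult()).ToList(); // runs only the mapping
Console.WriteLine("Mapping: {0} ms", sw.ElapsedMilliseconds);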
Update
If possible, it is better to do more work on the SQL side instead of retrieving 60,000 rows into memory. A few possible solutions:
If you need to display this information, do paging (top and skip)
If you are doing any filtering or calculating or grouping anything after you retrieve rows in memory, do it in your stored proc
On the .NET side, as you are returning IEnumerable, you may be able to use yield for the mapping part, depending on your architecture (see the sketch after this list)
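A minimal sketch of the yield idea, reusing the question's names; rows are mapped one at a time as the caller enumerates, instead of materializing everything up front:

public IEnumerable<DomainModelResult> GetData()
{
    // Streams results: each row is mapped only when the caller asks for it.
    foreach (var row in this.EntityFrameworkDB.GetDataSproc())
    {
        yield return row.ToDomainModelResult();
    }
}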
A programming pattern like this comes up every so often:
int staleCount = 0;
fileUpdatesGridView.DataSource = MultiMerger.TargetIds
    .Select(id =>
    {
        FileDatabaseMerger merger = MultiMerger.GetMerger(id);
        if (merger.TargetIsStale)
            staleCount++;
        return new
        {
            Id = id,
            IsStale = merger.TargetIsStale,
            // ...
        };
    })
    .ToList();
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = staleCount > 0;
I'm not sure there is a more succinct way to code this?
Even if so, is it bad practice to do this?
No, it is not strictly "bad practice" (like constructing SQL queries with string concatenation of user input or using goto).
Sometimes such code is more readable than several queries/foreach loops or a no-side-effect Aggregate call. It is also a good idea to at least try writing the foreach and no-side-effect versions, to see which one is more readable and easier to prove correct.
Please note that:
it is frequently very hard to reason about what will happen, and when, with such code. E.g. your sample hacks around the fact that LINQ queries execute lazily by adding the .ToList() call; otherwise staleCount would never be computed.
pure functions can be run in parallel; ones with side effects need a lot of care to do so
if you ever need to convert LINQ-to-Objects to LINQ-to-SQL, you have to rewrite such queries
generally, LINQ queries favor a functional programming style without side effects (and hence, by convention, readers will not expect side effects in the code).
Why not just code it like this:
var result = MultiMerger.TargetIds
    .Select(id =>
    {
        FileDatabaseMerger merger = MultiMerger.GetMerger(id);
        return new
        {
            Id = id,
            IsStale = merger.TargetIsStale,
            // ...
        };
    })
    .ToList();
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.IsStale);
I would consider this a bad practice. You are making the assumption that the lambda expression is being forced to execute because you called ToList. That's an implementation detail of the current version of ToList. What if ToList in .NET 7.x is changed to return an object that semi-lazily converts the IQueryable? What if it's changed to run the lambda in parallel? All of a sudden you have concurrency issues on your staleCount. As far as I know, both of those are possibilities which would break your code because of bad assumptions your code is making.
Now as far as repeatedly calling MultiMerger.GetMerger with a single id, that really should be reworked to be a join as the logic for doing a join (w|c)ould be much more efficient than what you have coded there and would scale a lot better, especially if the implementation of MultiMerger is actually pulling data from a database (or might be changed to do so).
As far as calling ToList() before passing it to the Datasource, if the Datasource doesn't use all the fields in your new object, you would be (much) faster and take less memory to skip the ToList and let the datasource only pull the fields it needs. What you've done is highly couple the data to the exact requirements of the view, which should be avoided where possible. An example would be what if you all of a sudden need to display a field that exists in FileDatabaseMerger, but isn't in your current anonymous object? Now you have to make changes to both the controller and view to add it, where if you just passed in an IQueryable, you would only have to change the view. Again, faster, less memory, more flexible, and more maintainable.
Hope this helps... And this question really should be posted on Code Review, not Stack Overflow.
Update: on further review, the following code would be much better:
var result = MultiMerger.GetMergersByIds(MultiMerger.TargetIds);
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.TargetIsStale);
or
var result = MultiMerger.GetMergers().Where(m => MultiMerger.TargetIds.Contains(m.Id));
fileUpdatesGridView.DataSource = result;
fileUpdatesGridView.DataBind();
fileUpdatesMergeButton.Enabled = result.Any(r => r.TargetIsStale);
First of all, sorry if my question confuses you. Well, I'm still a rookie at programming in C#.
I am using the code below:
foreach (var schedule in schedules)
{
    if (schedule.SupplierId != Guid.Empty)
    {
        var supplier = db.Suppliers.Find(schedule.SupplierId);
        schedule.CompanyName = supplier.CompanyName;
    }
    if (schedule.CustomerId != Guid.Empty)
    {
        var customer = db.Customers.Find(schedule.CustomerId);
        schedule.CompanyName = customer.CompanyName;
    }
}
It works really well, but what if I end up with about a thousand companies? This looping will slow my program down. How do I change this code into a LINQ expression?
Thank you in advance for your reply.
There isn't a good way to do this in the client. There are some tools out there to do a mass update in EF, but I would suggest just running a query to do this, if you need to do this at all. It seems you are updating a field that is merely related, and in fact belongs to another entity. You shouldn't do that, since it means updating the one leaves the other invalid.
What Patrick mentioned is absolutely right from an architectural point of view. In my understanding, CompanyName belongs somewhere else, unless this entity is a "read-only view"... which it obviously is not. Now, if you can't afford to make a major change, I would suggest you move this heavy processing off the main thread to a separate thread... if you can.
You can also load all suppliers and customers into memory rather than opening a database connection 1000 times to issue a lookup query (a sketch of that follows). But, again, strongly consider moving this to a separate thread.
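A minimal sketch of that in-memory approach, assuming the supplier and customer entities expose an Id key matching SupplierId/CustomerId (adjust the property names to your model):

// Two database queries total, instead of up to two per schedule.
var supplierNames = db.Suppliers.ToDictionary(s => s.Id, s => s.CompanyName);
var customerNames = db.Customers.ToDictionary(c => c.Id, c => c.CompanyName);

foreach (var schedule in schedules)
{
    string name;
    if (schedule.SupplierId != Guid.Empty && supplierNames.TryGetValue(schedule.SupplierId, out name))
    {
        schedule.CompanyName = name;
    }
    if (schedule.CustomerId != Guid.Empty && customerNames.TryGetValue(schedule.CustomerId, out name))
    {
        schedule.CompanyName = name;
    }
}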
I have a WCF service with a security class for getting some of the attributes of the calling user. However I'm quite bad when it comes to thread safety - to this point, I haven't needed to do much with it, and only have a rudimentary theoretical understanding of the problems of multi-threading.
Given the following function:
public class SecurityService
{
    public static Guid GetCurrentUserID()
    {
        if (Thread.CurrentPrincipal is MyCustomPrincipal)
        {
            MyCustomIdentity identity = null;
            MyCustomPrincipal principal = (MyCustomPrincipal)Thread.CurrentPrincipal;
            if (principal != null)
            {
                identity = (MyCustomIdentity)principal.Identity;
            }
            if (identity != null)
            {
                return identity.UUID;
            }
        }
        return Guid.Empty;
    }
}
Is there any chance that something could go wrong in there if the method is being called at the same time from 2 different threads? In my nightmares I see terrible consequences if these methods go wrong, like someone accidentally getting someone else's data or suddenly becoming a system administrator. A colleague (also not an expert, but better at this than me) thought it would probably be okay, because there aren't really any shared resources being accessed there.
Or this one, which will access the database - could this go awry?
public static User GetCurrentUser()
{
    var uuid = GetCurrentUserID();
    if (uuid != null)
    {
        var rUser = new UserRepository();
        return rUser.GetByID(uuid);
    }
    return null;
}
There's a lot of discussion about the principles of threading, but I tend to fall down and get confused when it comes to actually applying them and knowing when to apply them. Any help appreciated.
I can explain more about the context/purpose of these functions if it's not clear.
EDIT: The rUser.GetByID() function basically calls through to a repository that looks up the database using NHibernate. So I guess the database here is a "shared resource", but not really one that gets locked or modified for this operation... in which case I guess it's okay...?
From what I see, the first example only accesses thread-local storage and stack-based variables, while the second one only accesses stack-based variables.
Both should be thread-safe.
I can't tell if GetByID is thread safe or not. Look to see if it accesses any shared/static resources. If it does, it's not thread-safe without some additional code to protect those resources.
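For illustration, here is the kind of shared state that would need protecting; the cache and lock are hypothetical, not from the question's code:

using System;
using System.Collections.Generic;

public class UserRepository
{
    // Shared across all threads: this is what would make GetByID unsafe
    // without synchronization.
    private static readonly Dictionary<Guid, User> cache = new Dictionary<Guid, User>();
    private static readonly object cacheLock = new object();

    public User GetByID(Guid id)
    {
        lock (cacheLock) // serialize access to the shared dictionary
        {
            User cached;
            if (cache.TryGetValue(id, out cached))
            {
                return cached;
            }
        }

        User user = LoadFromDatabase(id); // hypothetical database lookup

        lock (cacheLock)
        {
            cache[id] = user;
        }
        return user;
    }

    private User LoadFromDatabase(Guid id)
    {
        // ... NHibernate session lookup elided ...
        return null;
    }
}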
The code that you have above doesn't contain anything that changes global state, so you can be fairly sure that it won't be a problem being called by multiple simultaneous threads. Security principal information is tied to each thread, so no problem there either.