I'm trying to make entity framework work properly with my application. The scenario I have is something like the following: Say I have 8000 items, and each item has 100 components. The way it currently works is I eager load the 8000 items and lazy load the components for each item, because eager loading the entire thing would be too slow on application startup.
As far as I understand, in order for lazy loading to work I need to keep the context alive for the whole application lifetime. So I have a single context instance that is opened on startup and closed on exit. I also use it to track changes and save changes.
However, I've been reading about EF, and many people advise against this approach in favor of opening and closing a context for each operation. My question is: how would you go about lazy loading properties, tracking changes, and saving changes if I cannot work with the same context?
Furthermore, I am already facing issues because I use different threads to load or save data in the background (for example, if a save is in progress and I edit a tracked property, an exception is raised). I fixed some of them by using a FIFO queue (on a dedicated thread) for operations on the same context, but changes to tracked properties don't go through the queue.
Some help on how to use EF properly would be greatly appreciated.
Related
Our team is developing a machine that will perform a physical process on a tray that holds vials of medical samples. The physical process will take approximately 1.5 hours. The tray and related vials are entities, loaded from a database using the Entity Framework. As the process runs, the device will update values on the entities. The changes may happen minutes or seconds apart. At the end of certain steps, between 10 and 45 minutes apart, we want to save those entities back to the database, and keep going.
Is it acceptable to have an Entity Framework context open for 1.5 hours? Can I make changes and save the entities multiple times during that time period using that context? If not, what is the best way to handle this?
Some ideas so far:
1. We could use the attach/detach capability. This should allow us to make changes to the entities outside of a context, then create a new context and attach an entity when we want to save it, then detach it to continue working (see the sketch after this list).
2. We could create a new context every time we want to change one of the entities. But I don't think we want to save every time we make a change.
3. We could copy the entities to business objects and make the changes there. Then, when we want to save, we would open a context, copy the changes into the entities, and save.
A combination of 2 and 3 would be ideal.
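As a minimal sketch of idea 1, assuming the DbContext API (EF 4.1 or later) and illustrative MyContext/Tray names that are not from the question: the entity is modified outside any context and only attached to a fresh, short-lived context for the save.

void SaveTray(Tray tray)
{
    using (var db = new MyContext())
    {
        db.Trays.Attach(tray);                       // reattach the detached entity
        db.Entry(tray).State = EntityState.Modified; // mark all of its values for update
        db.SaveChanges();                            // one short write, then dispose
    }
    // The context is gone; "tray" is detached again and the process
    // can keep updating it in memory until the next save point.
}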
First off, do not keep a context open for hours at a time. You can do this through configuration, but it just wastes resources, considering that your operation runs for 90 minutes while opening a connection takes roughly 3 milliseconds.
So just create a context as you need it. Next, keep in mind that although you open a context to gather data or maintain state, you do not actually need to save the data if it is not ready to be stored. You can just store it locally.
This is where idea 3 comes in: local memory. Keep the data in local memory with an event handler attached. As the local copy changes, update the database if the change has occurred within some acceptable time window.
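A rough sketch of that shape, using hypothetical TrayRunState/MyContext/Vials names (none of them come from the question): the process updates an in-memory copy, and the change handler decides when to open a short-lived context and persist a batch.

class TrayRunState
{
    public event Action Changed;
    public Dictionary<int, double> VialValues { get; } = new Dictionary<int, double>();

    public void SetValue(int vialId, double value)
    {
        VialValues[vialId] = value;   // update the local copy
        Changed?.Invoke();            // notify listeners that something changed
    }
}

// Wiring it up in the process controller:
var state = new TrayRunState();
var lastSave = DateTime.UtcNow;
var saveInterval = TimeSpan.FromMinutes(10);

state.Changed += () =>
{
    if (DateTime.UtcNow - lastSave < saveInterval)
        return;                                    // not time to persist yet

    using (var db = new MyContext())               // short-lived context per save
    {
        foreach (var pair in state.VialValues)
        {
            var vial = db.Vials.Find(pair.Key);    // reload the entity
            vial.Value = pair.Value;               // copy the in-memory change
        }
        db.SaveChanges();                          // one transaction per batch
    }
    lastSave = DateTime.UtcNow;
};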
Is it acceptable to have an Entity Framework context open for 1.5 hours?
UPDATE: Per the resources you link, if you allow EF to manage the opening and closing of the connection, it will open the connection as late as possible and close it as early as possible, so that the relatively costly database connection can be returned to the connection pool. The connection will only be held open for the duration that the context exists if you manually manage the database connection.
At the end of certain steps, between 10 and 45 minutes apart, we want to save those entities back to the database, and keep going.
Note that if the client crashes for any reason, the changes kept in memory will be lost. Consider the impact of this when deciding whether you really want to wait that long before persisting your data.
If it is quite certain that this is and will remain an architecture with one or just a few clients writing to a dedicated database, I would opt to keep the code as simple as possible... trading resource inefficiency that does not matter in this very specific case for a lesser chance of programmer error.
I understand that you want to save data in batches and that it's no use to save individual values if a batch as a whole doesn't succeed.
Since resources are not the bottleneck and this is a dedicated system, it doesn't matter that a context lives relatively long, so I would use a context per batch. The context collects the data and concludes each batch by one SaveChanges call, which automatically saves one batch in one database transaction. Roughly, the code would look like this:
do
{
    // Start of a new batch.
    using (var db = new MyContext())
    {
        // Collect data into the context
        ...
        db.SaveChanges();
    }
} while (....); // While there are new batches
Database connections will be opened and closed when needed. SaveChanges will do this, but also any other database interaction you may need in-between. EF will never leave a connection open for longer than necessary.
I'm struggling with the following problem.
I have a database with a table Jobs, which contains information about jobs to be done. I followed the Code First approach of EF 6.0 and created a POCO class called Job. I then query the database for the jobs:
List<Job> receivedJobs;
using (var context = new MyContext())
{
    receivedJobs = (from j in context.Jobs
                    select j).ToList(); // materialize while the context is still open
}
With the received set receivedJobs I will then do a time-consuming optimization.
As I understand it, the lifetime of the context, and of the resources the context controls, ends at the closing bracket of the using statement. Also, a good design should release database resources as soon as they are no longer required.
My question now is: what should I do in my case? Should I keep the database context alive until my time-consuming optimisation task has finished? Or should I close it, since it is not needed until the optimisation ends? But in the latter case, what do I do with the Job objects after the context is disposed, given that I will need to access some of their navigation properties, which I can't do once the context is closed? (And by the way, the data in the Job instances will not be changed by the optimisation, so there is no need to track changes to these objects; there will be none.)
Hope someone can help me to understand what is the recommended design in this case.
Best regards
You should always hold a context for the least amount of time necessary to do the operations. In your case, it sounds like you will need the context until the optimization is done, because you are using it to navigate the result set. If that is the case, the context should be held until you no longer need it.
The bad habit to avoid is to hold onto a context when you have no immediate need for it. You will see some applications that wrongly create a context on application start and hold it for the life of the application. That is bad and a waste of resources.
In your case, put the optimization code in place, use the context until the code is completed, then release the context. Your using statement will take care of all the messy disposal stuff. Just get your code that needs the context in the {} for the using and you should be good to go.
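A minimal sketch of that shape, reusing MyContext and Jobs from the question (RunOptimisation is a hypothetical stand-in for your optimisation routine): the context lives inside the using block for as long as lazy loading is needed, and is disposed immediately afterwards.

using (var context = new MyContext())
{
    var jobs = context.Jobs.ToList();   // load the jobs

    // The time-consuming optimisation runs here; lazy loading of
    // navigation properties still works because the context is alive.
    RunOptimisation(jobs);
}
// The context and its connection handling are released here.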
Although it will not solve all of your issues, especially the design ones, do you know the "Include" function, which preloads the navigation properties of your jobs?
For example, if a job points to a list of tasks through a property named "Tasks":
context.Jobs.Include("Tasks") //will preload the Tasks property of
your job.
context.Jobs.Include("Tasks.AllowedUsers") //will preload the Tasks
property of your job, and the AllowedUsers list of each task.
If you want to preload several properties at same level, just use something like:
context.Jobs.Include("Tasks").Include("OtherTasksOnJob")
I have a WPF database viewer application: It's a simple main window containing a user control with a data grid showing the data extracted from an SQLite database.
The problem is that this application takes 6 seconds to start until it is usable.
I tried building the user control (and doing all the data loading) in the constructor of the main window:
The splash screen is shown for 5 seconds this way, followed by 1 second of an empty main window until the application is ready to be used.
Users said that it takes too long until something (visually) happens.
I then moved the user control creation (and data loading) into the Loaded event handler of the main window:
The splash screen is shown for 3 seconds, followed by 3 seconds of an empty main window until the application is ready.
Users said that it is "better", but don't like the fact that a half finished main window is shown in disabled state for so long.
Is there some general advice to be found about perceived application load time or are there any other recommendations about how this situation can be improved?
I believe ideally the main window would be shown as fast as possible, along with some hour glass or spinner until the data is loaded. But then I cannot just move the user control creation into a background worker as this would be done on the wrong thread.
Does anybody have any suggestions to this problem?
Edit:
Note that right now I've just assigned a LINQ-to-EF query as the grid data source.
One possible improvement may be to load this data into a data table in the background and assign it only once it is loaded...
Edit2:
I'm using .net 4 with System.Data.SQLite and EF4 to load the data. There are more or less 4000 rows and 30 columns.
Load your data asynchronously and present something nice in the GUI for the user while it loads. The following code can help you with this:
BackgroundWorker bgWorker = new BackgroundWorker() { WorkerReportsProgress = true };
bgWorker.DoWork += (s, e) => {
    // Load your file(s)/data here.
    // Use bgWorker.ReportProgress(); to report the current progress.
};
bgWorker.ProgressChanged += (s, e) => {
    // Here you are informed about progress, and here it is safe to change/show progress.
    // You can safely access a ProgressBar or other controls from here.
};
bgWorker.RunWorkerCompleted += (s, e) => {
    // Here you are informed when the job is done.
    // Use this event to unlock your GUI.
};
bgWorker.RunWorkerAsync();
The app is not actually faster, but it seems much faster because the GUI is immediately visible and responsive. Maybe you can also show the user part of the loaded data while loading the rest; use the ProgressChanged event for this.
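A sketch of that partial-loading idea, assuming an ObservableCollection<Row> bound to the grid (Row and LoadRowsInBatches are illustrative placeholders): the worker reports each loaded batch, and ProgressChanged adds it to the collection on the UI thread.

var rows = new ObservableCollection<Row>();              // bound to the DataGrid
bgWorker.DoWork += (s, e) =>
{
    foreach (var batch in LoadRowsInBatches(500))        // hypothetical batched loader
        bgWorker.ReportProgress(0, batch);               // pass the batch as UserState
};
bgWorker.ProgressChanged += (s, e) =>
{
    foreach (var row in (IEnumerable<Row>)e.UserState)   // runs on the UI thread
        rows.Add(row);                                   // grid updates incrementally
};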
Update
I'm not sure I understand your problem correctly. If your problem is not the time the data needs to load, then something is odd in your application. WPF is, in my opinion, very fast; control creation does not take a lot of time. I visualize much bigger lists than you mention in a few milliseconds.
Check whether something in your UI prevents the DataGrid from virtualizing its items; maybe you have a problem there. To analyse WPF apps, I can recommend the WPF Profiling Tools.
The most obvious thing you can do is to profile your application and find the bottlenecks in start up time. It sounds like the most likely culprit will be the loading of data from your database.
One lesson I've learnt is that if you're using an ORM, when loading large datasets, favouring POCOs (Plain Old CLR/C# Objects) over ORM-generated database entities (see the example below) makes the load time a lot faster and decreases RAM usage significantly. The reason is that EF will try to load the entire entity (i.e. all of its fields), and possibly a whole load of data related to your entities, most of which you won't even need. The only time you really need to work directly with entities is when you're doing insert/update/delete operations. When reading data, you should only get the fields that your application needs to display and/or validate.
If you follow the MVVM pattern, the above architecture isn't hard to implement.
Example of loading data into POCOs with EF:
var query = from entity in context.Entities
            select new EntityPoco
            {
                ID = entity.ID,
                Name = entity.Name
            };
return query.ToList();
POCOs are very simple classes with autoproperties for each field.
We usually have repositories for each entity in our applications and each repository is responsible for getting/updating data related to that entity. The view models have references to the repositories they need so they don't use EF directly. When users make changes that need to be persisted, we use other methods in the repository that then load only a subset of entities (i.e. the ones the user changed) and apply the necessary updates - with some validation done by the viewmodel and possibly other validation going on in the DB via constraints/triggers, etc.
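A rough sketch of that repository shape, using hypothetical JobRepository/JobPoco/MyContext names (none of them come from the answer): reads project only the needed fields into POCOs, and writes load just the entity being changed.

class JobRepository
{
    // Read: project only the fields the views need into a POCO.
    public List<JobPoco> GetJobs()
    {
        using (var db = new MyContext())
        {
            return (from j in db.Jobs
                    select new JobPoco { ID = j.ID, Name = j.Name }).ToList();
        }
    }

    // Write: load only the entity being changed and apply the update.
    public void RenameJob(int id, string newName)
    {
        using (var db = new MyContext())
        {
            var job = db.Jobs.Find(id);
            job.Name = newName;
            db.SaveChanges();
        }
    }
}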
There are many reasons for this.
1) The deployment machine might have a fairly low-spec configuration.
2) Improper data binding, or some other problem with the data binding.
Possible solutions would be:
1) Lazy load the data
2) Optimize the performance. http://msdn.microsoft.com/en-us/library/aa970683.aspx
I have seen applications render 5M records in less than a second in WPF.
PS: One more, less likely, reason may be the 30 columns, due to column order access.
Is it possible to lazy load a related object during an open session but to still have the related object available after the session closes?
For example, we have a USER class and a related ROLE class. When we load a USER we also lazy load the related ROLE object. Can we have the USER and ROLE class fully loaded and available after the session is closed?
Is this functionality possible?
Short answer: no. You must initialize anything you will need after the session closes, before closing the session. The method to use to force loading a lazy proxy (without enumerating it) is NHibernateUtil.Initialize(USER.ROLES).
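A small sketch of that, assuming a sessionFactory and a user variable of the USER class with a ROLES collection (the loading code itself is illustrative): the lazy collection is forced to load while the session is still open.

USER user;
using (var session = sessionFactory.OpenSession())
{
    user = session.Get<USER>(userId);
    NHibernateUtil.Initialize(user.ROLES);   // load the lazy proxy now
}
// The session is closed, but user.ROLES is already populated and usable.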
Long answer... kind of. It is possible to "reattach" objects to a new session, thereby allowing PersistentBags and other NH proxies to be initialized. The best method to use, given that you know the object exists in the DB but not in your new session, and that you haven't yet modified it, is Session.Lock(USER, LockMode.None). This will associate the object with the new session without telling NHibernate to do anything regarding reads or writes of the object.
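For illustration (a sketch, with the same assumed names as above), reattaching a detached, unmodified object so its lazy members can still be initialized:

using (var session = sessionFactory.OpenSession())
{
    // Associate the detached object with the new session without
    // scheduling any read or write.
    session.Lock(user, LockMode.None);

    // Lazy collections on the object can now be initialized on demand.
    NHibernateUtil.Initialize(user.ROLES);
}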
HOWEVER, be advised that this is a code smell. If you are regularly reattaching objects to new sessions, it is a sign that you are not keeping sessions open long enough. There is no problem with opening one session per windows form, for instance, and keeping it open as long as the form is open, PROVIDED you close it when the window closes.
If you're dealing with a 1-1 relationship (0-1 roles per user), then possibly the simplest option would be to configure it for eager fetching rather than lazy loading. Lazy loading is really geared towards 1-* relationships, or objects that are particularly large and rarely needed. NH does a pretty fine job of optimizing queries to fetch eager data quickly in scenarios like that.
Yes. Once the session is closed, any objects that were lazy-loaded will remain in memory and you can access them without any problems.
I'm using LINQ to SQL, and having a bit of an issue incrementing a view counter cross-connection.
The teeny bit of code I'm using is:
t = this.AppManager.ForumManager.GetThread(id);
t.Views = t.Views + 1;
this.AppManager.DB.SubmitChanges();
Now in my tests, I am running this multiple times, non-concurrently. There are a total of 4 copies of the object performing this test.
That is to say, there is no locking issue, or anything like that but there are 4 data contexts.
Now, I would expect this to work like this: fetch a row, modify a field, update the row. However, this is throwing a ChangeConflictException.
Why would the change be conflicted if none of the copies of this are running concurrently?
Is there a way to ignore change conflicts on a certain table?
EDIT: Found the answer:
You can set "UpdateCheck = Never" on all columns on a table to create a last-in-wins style of update. This is what the application was using before I ported it to LINQ, so that is what I will use for now.
EDIT2: While my fix above did indeed prevent the exception from being thrown, it did not fix the underlying issue:
Since I have more than one data context, there ends up being more than one cached copy of each object. Should I be recreating my data context with every page load?
I would rather instruct the data context to forget everything. Is this possible?
I believe DataContext is intended to be relatively lightweight and short-lived. IMO, you should not cache data loaded with a DataContext longer than necessary. When it's short-lived, it remains relatively small because (as I understand it) the DataContext's memory usage is primarily associated with tracking the changes you make to objects it manages (i.e. objects it retrieved).
In the application I work on, we create the context, display data on the UI, wait for user updates and then update the data. However, that is necessary mainly because we want the update to be based on what the user is looking at (otherwise we could retrieve the data and update all at once when the user hits update). If your updates are relatively free-standing, I think it would be sensible to retrieve the row immediately before updating it.
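A sketch of that free-standing update, assuming a hypothetical ForumDataContext with a Threads table (the names are illustrative, not from the question): the row is retrieved in a fresh, short-lived context just before it is updated.

using (var db = new ForumDataContext())          // new, short-lived context
{
    var t = db.Threads.Single(x => x.Id == id);  // fetch the current row
    t.Views = t.Views + 1;                       // apply the change
    db.SubmitChanges();                          // save, then dispose immediately
}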
You can also use System.Data.Linq.DataContext.Refresh() to re-sync already-retrieved data with data in the database to help with this problem.
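For example (a sketch reusing the context and t from the question; the RefreshMode choice is up to you), Refresh can overwrite the stale in-memory copy before the update is applied again:

// Re-sync the cached entity with the current database values,
// discarding the conflicting in-memory state.
this.AppManager.DB.Refresh(System.Data.Linq.RefreshMode.OverwriteCurrentValues, t);
t.Views = t.Views + 1;
this.AppManager.DB.SubmitChanges();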
To respond to your last comment about making the context forget everything: I don't think there's a way to do that, but I suspect that's because all a context really holds is its tracked changes (and the connection), so you may as well create a new context (remember to dispose of the old one); throwing away everything the context holds is exactly what you want.