I have an application which needs to keep data from the DB in memory.
There are 5-6 tables with very few rows, and they are updated very rarely. Since the application needs this data very frequently, I would like to avoid querying the DB on every action.
I am using Entity Framework 4 (LINQ to Entities), and it sends a request to the database every time I query. I know it is possible to avoid that using ToList() or similar, but I need info from all six tables and the queries involve joins.
What would be the best solution?
The purpose of a query is to be executed. You can check the EF Caching Wrapper to see whether it solves the problem, but I don't think it will: the caching provider caches the actual query, so it is enough to change a where condition for it to be considered a different query.
This should instead be done by loading your data into custom data structures (lists) and using LINQ to Objects on them.
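To make that concrete, here is a minimal sketch of the cached-lists approach; the entity types, context name, and refresh policy are all hypothetical:

using System.Collections.Generic;
using System.Linq;

public static class ReferenceDataCache
{
    public static List<Customer> Customers { get; private set; }
    public static List<Order> Orders { get; private set; }

    // Call once at startup and again whenever the rarely-changing tables are updated.
    public static void Refresh()
    {
        using (var ctx = new MyEntities())  // hypothetical EF context
        {
            Customers = ctx.Customers.ToList();  // materialized once
            Orders = ctx.Orders.ToList();
        }
    }
}

// Later, anywhere in the application, joins run entirely in memory
// through LINQ to Objects, with no round trip to the database:
var result = from c in ReferenceDataCache.Customers
             join o in ReferenceDataCache.Orders on c.Id equals o.CustomerId
             select new { c.Name, o.Total };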
If you are joining that data to other data which is not a candidate for caching, I would suggest looking at your database engine's features. Most advanced SQL databases will already place such small tables in RAM, and you will already be incurring the network latency overhead when you issue the query for the non-cached data. The database will already have an index in RAM as well, unless you are talking about big rows such as images. You would just be moving a small amount of processing from one place to another. Plus, to be as efficient as the SQL database, you would not only need to work out how to cache the data, but also cache an index and write the code to use and maintain it.
Still, in some use cases it would be a very useful thing to do.
Hello and thanks for looking.
I have a DAL question for an application I'm working on. The app is going to extract some data from 5-6 tables of a production RDBMS that serves a much more critical role in the org. What the app has to do is use the data in these tables, analyze it, apply some business logic/rules, and then present it.
The restriction is that, since the storage model is critical to the org, I need to limit how the app requests the data. Since the tables are relatively small, I designed my data access to use DataTables that load the entirety of the db tables on a fixed interval using a timer.
My questions are really around my current design and the potential use of EF or LINQ to SQL:
Can EF/LINQ to SQL work around the restrictions of the RDBMS? In most tutorials I've seen, the storage exists solely for the application. Can access to the storage be controlled, and/or can EF use DataTables rather than an RDBMS?
Since the tables are going to be loaded in their entirety, is there a best practice for creating classes to consume the data within them? I will have to do in-memory joins and querying/logic to get at the actual data I need.
Sorry if I'm being generic. I'm more just looking for thoughts and opinions as opposed to a solution to my problem. Please don't hesitate to share your thoughts. Thanks.
For your first question: yes, Entity Framework can use an existing DB as its source. The term to search for when looking for Entity Framework tutorials on this topic is "Database First".
For your second question, let me first preface it with a warning: many ORMs are not designed for loading entire tables and doing bulk operations on them, especially if you will be modifying the result set and pushing the data back to the server in large quantities. The updates will be row-based, not set-based, because you made the modifications in C# code, not in a T-SQL query. Most ORMs are built around the expectation that you will be doing CRUD operations at the row level, not ETL operations or set-level CRUD operations (except for Read, which most ORMs will do as a set operation).
If you will not be updating the data, only pulling it out with Entity Framework to build reports and whatnot, you should be fine. If you are bulk inserting into the database, things get more problematic. See this SO question for more information.
I have already seen a similar unanswered question here.
My question is -
Is EF really production-ready for large applications?
The question originated from these underlying concerns:
1. EF pulls all the records into memory and then performs the query operation. How would EF behave when a table has around ~1000 records?
2. For a simple edit, I have to pull the record, edit it, and then push it to the DB using SaveChanges().
I faced a similar situation: we had a large database with many tables of 7-10 million records each, and we used Entity Framework to display the data. To get nice performance, here's what I learned; my 10 golden rules for Entity Framework:
1. Understand that the call to the database is made only when the actual records are required; all the operations before that just build up the query (SQL). So try to fetch only the piece of data you need rather than a large number of records, and trim the fetch size as much as possible.
2. Yes, you should use stored procedures where necessary (in some cases stored procedures are the better choice; they are not as evil as some would make you believe). Import them into your model and create function imports for them. You can also call them directly with ExecuteStoreCommand() and ExecuteStoreQuery<>(). The same goes for functions and views, though EF has a really odd way of calling scalar functions: "SELECT dbo.blah(@id)".
3. EF performs slower when it has to populate an entity with a deep hierarchy, so be extremely careful with deeply hierarchical entities.
4. Sometimes, when you are requesting records and are not required to modify them, you should tell EF not to watch for property changes (AutoDetectChanges). That way record retrieval is much faster.
5. Indexing your database is always good, but with EF it becomes very important. The columns you use for retrieval and sorting should be properly indexed.
6. When your model is large, the VS2010/VS2012 model designer gets really crazy, so break your model into medium-sized models. There is a limitation: entities from different models cannot be shared, even though they may point to the same table in the database.
7. When you have to make changes to the same entity in different places, use the same entity instance, make all the changes, and save it only once. The point is to AVOID retrieving the same record, changing it, and saving it multiple times. (Real performance gain tip.)
8. When you need the info from only one or two columns, try not to fetch the full entity. You can either execute your SQL directly or use a mini entity. You may also need to cache some frequently used data in your application.
9. Transactions are slow; be careful with them.
10. SQL Profiler or any query profiler is your friend. Run it while developing your application to see what EF sends to the database. When you perform a join using LINQ or a lambda expression in your application, EF usually generates a Select-Where-In-Select style query which may not always perform well. If you find such a case, roll up your sleeves, perform the join on the DB side, and have EF retrieve the results. (I forgot this one, the most important one!)
If you keep these things in mind, EF should give you almost the same performance as plain ADO.NET, if not identical.
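A rough illustration of rules 1, 2, and 4 using the DbContext API (EF 4.1 and later); the context, entity, and stored procedure names here are hypothetical:

using System.Data.Entity;
using System.Linq;

using (var db = new ShopContext())  // hypothetical DbContext
{
    // Rule 1 (and 8): project only the columns you need; the WHERE and the
    // column list are both evaluated on the server, not in memory.
    var productNames = db.Products
                         .Where(p => p.IsActive)
                         .Select(p => new { p.Id, p.Name })
                         .ToList();

    // Rule 4: read-only query with change tracking switched off.
    // (With the EF4 ObjectContext API the equivalent is MergeOption.NoTracking.)
    var readOnly = db.Products.AsNoTracking().ToList();

    // Rule 2: call a stored procedure directly.
    // (With ObjectContext this is ExecuteStoreQuery<decimal>(...).)
    int customerId = 42;  // placeholder value
    var totals = db.Database.SqlQuery<decimal>(
        "EXEC dbo.GetOrderTotals @p0", customerId).ToList();
}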
1. EF pulls all the records into memory then performs the query operation. How would EF behave when a table has around ~1000 records?
That's not true! EF fetches only the necessary records, and queries are transformed into proper SQL statements. EF can cache objects locally within the context (and track all changes made to entities), but as long as you follow the rule of keeping the context open only as long as needed, it won't be a problem.
2. For a simple edit I have to pull the record, edit it, and then push it to the DB using SaveChanges().
That's true, but I would not bother working around it unless you actually see performance problems. Because point 1 is not true, you will only fetch the one record from the DB before it is saved. If you must, you can bypass this by building the SQL query as a string and executing it directly.
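For completeness, a minimal sketch of that fetch-edit-save pattern (the context and entity names are hypothetical):

using (var db = new BlogContext())  // hypothetical DbContext
{
    int postId = 1;  // placeholder value
    var post = db.Posts.Single(p => p.Id == postId);  // one SELECT, one row
    post.Title = "Updated title";                      // change is tracked
    db.SaveChanges();                                  // one UPDATE statement
}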
EF translates your LINQ query into an SQL query, so it doesn't pull all records into memory. The generated SQL might not always be the most efficient, but a thousand records won't be a problem at all.
Yes, that's one way of doing it (assuming you only want to edit one record). If you are changing several records, you can get them all using one query and SaveChanges() will persist all of those changes.
EF is not a bad ORM framework. It is a different one with its own characteristics. Compare Microsoft Entity Framework 6 against, say, NetTiers, which is powered by Microsoft Enterprise Library 6.
These are two entirely different beasts. The accepted answer is really good because it goes through the nuances of EF6. What's key to understand is that each ORM has its own strengths and weaknesses, so compare the project's requirements and its data access patterns against the ORM's behavior patterns.
For example: NetTiers will always give you higher raw performance than EF6, but that is primarily because it is not a point-and-click ORM; as part and parcel of generating the ORM you will be optimizing your data model, adding custom stored procedures where relevant, and so on. If you engineered your data model with the same effort for EF6, you could probably get close to the same performance.
Also consider whether you can modify the ORM. For example, with NetTiers you can add extensions to the CodeSmith templates to include your own design patterns over and above what the base ORM library generates.
Also consider that EF6 makes significant use of reflection, whereas NetTiers, or any library powered by Microsoft Enterprise Library, makes heavy use of generics instead. These are two entirely different approaches: EF6 is based on dynamic reflection, NetTiers on static reflection. Which is faster and which is better depends entirely on the usage patterns demanded of the ORM.
Sometimes a hybrid approach works better. Consider, for example: EF6 for Web API OData endpoints; a few large tables wrapped with NetTiers and Microsoft Enterprise Library with custom stored procedures; and a few large master-data tables wrapped with a custom-built write-through object cache, where on initial load the record set is streamed into the memory cache using an ADO data reader.
These are all different, and they all have their best-fit scenarios: EF6, NetTiers, NHibernate, Wilson OR Mapper, XPO from DevExpress, and so on.
There is no simple answer to your question. The main thing is: what do you want to do with your data, and do you need that much of it at one time?
EF translates your queries to SQL, so until the query runs there are no objects in memory. When you get the data, the selected records are in memory. If you are selecting a large number of large objects, it can be a performance killer if you need to manipulate them all.
If you don't need to manipulate them all, you can disable change tracking and enable it later for the single objects you do need to manipulate.
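A rough sketch of that idea with the DbContext API (EF 4.1 and later); the context and entity names are hypothetical:

using System.Data;         // EntityState (System.Data.Entity in EF6)
using System.Data.Entity;  // AsNoTracking()
using System.Linq;

using (var db = new ShopContext())
{
    // Bulk read without change-tracking overhead.
    db.Configuration.AutoDetectChangesEnabled = false;
    var items = db.Items.AsNoTracking().ToList();

    // Re-attach only the single object that actually needs changing.
    var item = items[0];
    item.Price *= 1.1m;
    db.Items.Attach(item);
    db.Entry(item).State = EntityState.Modified;  // marks it for UPDATE
    db.SaveChanges();
}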
So you see, it depends on the type of your application.
If you need to manipulate masses of data efficiently, then don't use an OR mapper!
Otherwise EF is fine, but consider how many objects you really need at one time and what you want to do with them.
I want to understand the purpose of datasets when we can communicate directly with the database using simple SQL statements.
Also, which way is better: updating the data in the dataset and then transferring it to the database at once, or updating the database directly?
I want to understand the purpose of datasets when we can directly communicate with the database using simple SQL statements.
Why do you have food in your fridge, when you can just go directly to the grocery store every time you want to eat something? Because going to the grocery store every time you want a snack is extremely inconvenient.
The purpose of DataSets is to avoid directly communicating with the database using simple SQL statements. The purpose of a DataSet is to act as a cheap local copy of the data you care about so that you do not have to keep making expensive high-latency calls to the database. They let you drive to the data store once, pick up everything you're going to need for the next week, and stuff it in the fridge in the kitchen so that it's there when you need it.
Also, which way is better? Updating the data in dataset and then transfering them to the database at once or updating the database directly?
You order a dozen different products from a web site. Which way is better: delivering the items one at a time as soon as they become available from their manufacturers, or waiting until they are all available and shipping them all at once? The first way, you get each item as soon as possible; the second way has lower delivery costs. Which way is better? How the heck should we know? That's up to you to decide!
The data update strategy that is better is the one that does the thing in a way that better meets your customer's wants and needs. You haven't told us what your customer's metric for "better" is, so the question cannot be answered. What does your customer want -- the latest stuff as soon as it is available, or a low delivery fee?
DataSets support a disconnected architecture: you can add local data, delete from it, and then commit everything to the database using a SqlDataAdapter. You can even load an XML file directly into a DataSet, and you can set up in-memory relations between the tables in a DataSet. It really depends on what your requirements are.
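For instance, a minimal sketch of the disconnected round trip (the connection string and table name are hypothetical):

using System.Data;
using System.Data.SqlClient;

var ds = new DataSet();
using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter("SELECT Id, Name FROM dbo.Products", conn))
{
    var builder = new SqlCommandBuilder(adapter);  // derives INSERT/UPDATE/DELETE commands
    adapter.Fill(ds, "Products");                  // loads the data, then disconnects

    // Work on the local copy; no connection is held open here.
    ds.Tables["Products"].Rows[0]["Name"] = "Renamed product";

    adapter.Update(ds, "Products");                // commits all pending changes at once
}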
And by the way, using direct SQL queries embedded in your application is a really poor way of designing an application: your application will be prone to SQL injection. Secondly, if you embed queries like that in the application, SQL Server has to work out an execution plan every time, whereas stored procedures are compiled and their execution plan is already decided at compile time; SQL Server can also change the plan as the data grows large. You will get a performance improvement from this. At the very least, use stored procedures and validate junk input in them; they are inherently resistant to SQL injection.
Stored procedures and DataSets are the way to go.
Edit: If you are on .NET Framework 3.5 or 4.0, you can use a number of ORMs such as Entity Framework, NHibernate, or SubSonic. ORMs represent your business model more realistically, and you can always use stored procedures with an ORM if some feature is not supported by it.
For example: if you are writing a recursive CTE (Common Table Expression), stored procedures are very helpful; you will run into too many problems if you use Entity Framework for that.
This page explains in detail in which cases you should use a DataSet and in which cases you should use direct access to the database.
My usual practice is: if I need to perform a bunch of analytical processes on a large set of data, I fill a DataSet (or a DataTable, depending on the structure). That way it is a model disconnected from the database.
But for DML queries I prefer quick hits directly against the database (preferably through stored procs). I have found this to be the most efficient, and with well-tuned queries it is not bad at all on the DB.
I'm building a solution where I'm retrieving large amounts of data from the database (5k to 10k records). Our existing data access layer uses Fluent NHibernate, but I'm "scared" that I will incur a large amount of overhead by hydrating object models that represent the database entities.
Can I simply retrieve an ADO dataset?
Yes, you should be concerned about the performance of this. You can look at using the IStatelessSession functionality of NHibernate, but that probably won't give you the performance you are looking for. While I haven't used NH since 2.1.2GA, I would find it unlikely that they've substantially improved its performance when it comes to bulk operations. To put it bluntly, NH just sucks at bulk operations (and so do most ORMs in general, for that matter).
Q: Can I simply retrieve an ADO dataset?
Of course you can. Just because you're using NHibernate doesn't mean you can't new up an ADO.NET connection and hit the database in the raw.
As much as I loathe DataTables and DataSets, this is one of the rare cases where you might want to consider using them instead of adding the overhead of mapping and creating the objects associated with your 10K rows of data.
Depending on how much performance you need, there are a few options. Nothing will ever beat using a SqlDataReader, as that's what's underneath just about every .NET ORM implementation. In addition to being the fastest, it can use a lot less memory if you don't need to keep a list of all the records after the query.
Now, as for your performance worries: 5k-10k records isn't that high. I've pulled upwards of a million rows out of NHibernate before, but obviously the records weren't huge, and it was for a single case. If you're doing this on a high-traffic website, you will of course have to be more efficient if you're hitting bottlenecks. If you're thinking about DataSets, I'd suggest instead trying Massive, because it should still be more efficient than a DataSet/DataTable and more convenient.
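If you do drop down to raw ADO.NET, a minimal SqlDataReader loop looks like this (the connection string, table, and column types are hypothetical):

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.Widgets", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            var id = reader.GetInt32(0);
            var name = reader.GetString(1);
            // Process the row here; nothing else is buffered in memory.
        }
    }
}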
You can use "Scalar queries", which is in fact native SQL query returning a list of object[] (one object[] per row):
var cats = sess.CreateSQLQuery("SELECT * FROM CATS")
    .AddScalar("ID", NHibernateUtil.Int64)
    .AddScalar("NAME", NHibernateUtil.String)
    .AddScalar("BIRTHDATE", NHibernateUtil.Date)
    .List<object[]>();  // one object[] per row
Example from NHibernate documentation: http://nhibernate.info/doc/nh/en/index.html#d0e10794
We have been developing a DB web app which constantly queries a SQL store of objects (about 80,000 of them) that are related to each other in a hierarchy (i.e., many objects have a parentid which refers to another object in the same table).
I have been thinking that loading the entire object tree into memory and querying the objects using LINQ may be a better approach. The objects are mostly 1-5 KB each.
Does anyone have any thoughts?
"LINQ" includes LINQ to Objects, LINQ to SQL, etc. You can certainly query in memory, but maybe you should wait until you've done a performance analysis before optimizing. You may optimize the wrong thing.
Ad hoc, loading up to 250 megabytes of data into client memory does not seem to be such a good idea. But if you have it in memory, you can use LINQ to Objects and save the database round trips.
The problem is that your data is hierarchical: you have to fetch one row, examine it, and then retrieve the related rows, which causes many round trips.
You could use lazy loading to retrieve more and more rows into client memory during the runtime of your program and perform more and more queries in memory as the working set stabilizes. But this will only help if rows are accessed multiple times, not just once or twice.
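With EF, for example, lazy loading of such a hierarchy only requires virtual navigation properties on a self-referencing entity (this Node class is hypothetical); related rows are then fetched on first access:

using System.Collections.Generic;

public class Node
{
    public int Id { get; set; }
    public int? ParentId { get; set; }

    // Virtual navigation properties let EF create lazy-loading proxies,
    // so Parent and Children are queried only when first touched.
    public virtual Node Parent { get; set; }
    public virtual ICollection<Node> Children { get; set; }
}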
Another solution might be to create recursive views or stored procedures using common table expressions that retrieve related rows up to some distance from a given row.
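A sketch of that approach, issuing a recursive CTE over a hypothetical self-referencing Objects table through plain ADO.NET:

using System.Data.SqlClient;

const string sql = @"
    WITH Tree AS (
        SELECT Id, ParentId, 0 AS Depth
        FROM dbo.Objects
        WHERE Id = @rootId
        UNION ALL
        SELECT o.Id, o.ParentId, t.Depth + 1
        FROM dbo.Objects o
        JOIN Tree t ON o.ParentId = t.Id
        WHERE t.Depth < @maxDepth
    )
    SELECT Id, ParentId, Depth FROM Tree;";

using (var conn = new SqlConnection(connectionString))  // hypothetical
using (var cmd = new SqlCommand(sql, conn))
{
    cmd.Parameters.AddWithValue("@rootId", 1);    // placeholder root
    cmd.Parameters.AddWithValue("@maxDepth", 3);  // distance limit
    conn.Open();
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
        {
            // Materialize the related rows here.
        }
}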