I've written my own caching layer for the objects that come out of data access. My reasoning here is that I'd like my data access layer to do just that -- data access. I don't really want it to worry about caching, and I'd only like to go into that layer when I need to fetch data out of the database. Perhaps this is not the right way to think about things -- please let me know if I'm off track.
Anyway, there is at least one issue I've run into so far. In one scenario, I load an object from NHibernate and stick it in the cache in one request. In the next request I get that object from the cache, modify it, and go back down to NHibernate to save it. Obviously NHibernate pukes, in this particular instance with an "Illegal attempt to associate a collection with two open sessions" exception.
So my question is, I guess: is there anything I should be aware of or do to make this work? Or should I just use the second-level cache that's built into NHibernate?
NHibernate has caching for a reason.. use it :)
You'll find there are quite a few options for a second-level cache provider that give you much more flexibility, more cheaply than you could build it yourself. A perfect example is something like memcached if you decide you need to run the service on multiple systems.
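For what it's worth, enabling the built-in second-level cache is mostly configuration. A rough sketch, assuming the NHibernate.Caches.SysCache provider package (any other provider is wired up the same way):

var cfg = new NHibernate.Cfg.Configuration().Configure(); // loads hibernate.cfg.xml as usual
cfg.SetProperty(NHibernate.Cfg.Environment.UseSecondLevelCache, "true");
cfg.SetProperty(NHibernate.Cfg.Environment.UseQueryCache, "true");
cfg.SetProperty(NHibernate.Cfg.Environment.CacheProvider,
    "NHibernate.Caches.SysCache.SysCacheProvider, NHibernate.Caches.SysCache");
// Each cached entity/collection also needs a <cache usage="read-write" /> element
// in its mapping (or Cache.ReadWrite() with Fluent NHibernate).
var sessionFactory = cfg.BuildSessionFactory();

Because the second-level cache lives behind the session factory, entities served from it are attached to the current session, which also sidesteps the "two open sessions" problem from the question.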
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.
Environment/Tools
Visual Studio 2013
C#
Three tier web application:
Database tier: SQL Server 2012
Middle tier: Entity Framework 6.* using Database First, Web API 2.*
Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.
Background
I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service; it cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements, the data model/database design is entirely centered on people (the User table).
Problem
When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point being that, when I query for just about anything, EF wants to bring in almost the entire database.
Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then fails with a stack overflow error.
What I Have Tried
Here is what I have attempted so far:
Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore (see the sketch after this list). No luck. This is not a cyclic-graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users).
Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.
Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).
Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.
Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).
Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
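For reference, the serializer change from the first item above was along these lines (the usual Web API 2 configuration; a sketch only, since it suppresses cycles but does nothing about sheer data volume):

using Newtonsoft.Json;
using System.Web.Http;

public static class WebApiConfig
{
    public static void Register(HttpConfiguration config)
    {
        config.Formatters.JsonFormatter.SerializerSettings.ReferenceLoopHandling =
            ReferenceLoopHandling.Ignore;
        // ...route configuration, etc.
    }
}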
What I am Looking For
In short, has anyone had to tackle this problem before and come up with a working and performant solution?
Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.
Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.
Thanks,
Dave
It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.
Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.
Instead, consider building your data access service in such a way that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster. And I don't want it for one post; I want it for each post in a page. Write data services and methods that do things relevant to your application.
To clarify, it's the difference between returning a Page object containing a list of Posts which contain only a poster name and message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing data such as creation time.
Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.
Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.
This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.
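As a rough illustration of the approach (the PostSummary/PageOfPosts classes and the BlogContext below are hypothetical names, not anything from the question's model):

using System.Collections.Generic;
using System.Linq;

public class PostSummary
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public class PageOfPosts
{
    public int PageNumber { get; set; }
    public List<PostSummary> Posts { get; set; }
}

// The service projects straight into the DTO, so EF selects only the columns it
// needs and never materializes the full entity graph.
public PageOfPosts GetPage(int pageNumber, int pageSize)
{
    using (var db = new BlogContext())
    {
        var posts = db.Posts
            .OrderByDescending(p => p.CreatedAt)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .Select(p => new PostSummary { PosterName = p.Account.Name, Message = p.Text })
            .ToList();
        return new PageOfPosts { PageNumber = pageNumber, Posts = posts };
    }
}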
For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
Either way will work, and has a variety of trade-offs.
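If you expose entities directly, a minimal sketch of turning the two features off in an EF6 DbContext (the context and entity names here are placeholders; with Database First these lines can go in a partial class constructor for the generated context):

using System.Data.Entity;

public class StoreContext : DbContext
{
    public StoreContext() : base("name=StoreContext")
    {
        Configuration.LazyLoadingEnabled = false;
        Configuration.ProxyCreationEnabled = false;
    }

    // User and Resource stand in for the entity classes in your model.
    public DbSet<User> Users { get; set; }
    public DbSet<Resource> Resources { get; set; }
}

// Related data is then loaded only when you explicitly ask for it:
// var resource = context.Resources.Include(r => r.Users).First(r => r.Id == id);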
If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC
There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.
I had such a project, and it was a stressful one... I also needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard for charts and tables.
My optimizations were:
1. Instead of using EF to load the data, I called old-school stored procedures (and, for further optimization, grouped things to keep the number of result tables as small as possible for the charts, e.g. one query returns a table from which several chart datasets can be extracted).
2. More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance is worth mentioning (it is really fast, but not compatible with complex objects; a simple example would be view models that contain lists of other models, nested further and so on).
It is better to read about the pros and cons of fastJSON first:
https://www.codeproject.com/Articles/159450/fastJSON
3. The relational database design is the prime suspect in this problem. It may be good to put the tables that hold raw data for processing (most probably for analytics) into a denormalized schema, which saves time when querying the data.
Also beware of using the model classes generated by the EF designer from the database for reading or selecting data, especially when you want to serialize them (sometimes I think about splitting the same schema into two sets of near-identical classes/models, one for writing and one for reading, such that the write models keep the virtual collections that come from foreign keys and the read models ignore them... I am not sure about this).
NOTE: For very, very large data it is better to go deeper and set up In-Memory OLTP for the particular tables that contain facts or raw data; in that case the table acts like a non-relational table, similar to NoSQL.
NOTE: For example, in MS SQL you can take advantage of SQLCLR, which lets you write routines in C#, VB, etc. and call them from T-SQL; in other words, handle data processing at the database level.
4. For an interactive view that needs to load data, I think it is better to consider which information should be processed server side and which can be handled client side (sometimes it is better to query data from the client side, though keep in mind that data on the client side can be accessed by the user). It is situation dependent.
5. For a large raw-data table in a view, using datatables.min.js is a good idea, and everyone suggests using server-side paging on tables.
6. For importing and exporting data from big files, OLE DB is the best choice, I think.
However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;).
I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:
Include "Foreign key properties" in the sub-entities and use this for later look-up.
Set the get/set modifiers of any "Navigation Properties" (sub-collections) in your EF entities to private.
This will give you an object that does not expose the sub-collections, and only the main properties will be serialized. This workaround requires some restructuring of your LINQ queries: query directly against your table of SubItems, with the foreign key property as your filter, like this:
var myFitnessClubs = context.FitnessClubs
    ?.Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);
Note 1:
You may of course choose to implement this solution only partially, affecting just the sub-collections that you definitely do not want to serialize.
Note 2:
For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection. Since the relation cannot be retrieved using a single ID property.
Currently, my entire website does updating from SQL parameterized queries. It works, we've had no problems with it, but it can occasionally be very slow.
I was wondering if it makes sense to refactor some of these SQL commands into classes so that we would not have to hit the database so often. I understand that hitting the database is generally the slowest part of any web application. For example, say we have a class structure like this:
Project (comprised of) Tasks (comprised of) Assignments
Where Project, Task, and Assignment are classes.
At certain points in the site you are only working on one project at a time, and so creating a Project class and passing it among pages (using Session, Profile, something else) might make sense. I imagine this class would have a Save() method to save value changes.
Does it make sense to invest the time into doing this? Under what conditions might it be worth it?
If your site is slow, you need to figure out what the bottleneck is before you randomly start optimizing things.
Caching is certainly a good idea, but you shouldn't assume that this will solve the problem.
Caching is almost always underutilized in ASP .NET applications. Any time you hit your database, you should look for ways to cache the results.
Serializing objects into the session can be costly in itself, but it is most likely still faster than hitting the database every single time. You are already benefiting from execution plan caching in SQL Server, so it's very likely that you're getting optimal performance out of your stored procedures.
One option you might consider to increase performance is to abstract your data into objects via LINQ to SQL (against your sprocs) and then use AppFabric to cache those objects.
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
As for your updates, you should do those directly against the sprocs, but you will also need to clear out the cache in AppFabric for objects that are affected by the insert/update/delete.
You could also do the same thing simply using the standard Cache as well, but AppFabric has some added benefits.
Use the SQL Profiler to identify your slowest queries, and see if you can improve them with some simple index changes (removing unused indexes, adding missing indexes).
You could very easily improve your application performance by an order of magnitude without changing your front-end app at all.
See http://sqlserverpedia.com/wiki/Find_Missing_Indexes
If you have lookup data only, you can store it in the Cache object. This will avoid hits to the DB. Only data that can be used globally should be stored in the Cache.
If this data requires filtering, you can retrieve it from the Cache and filter it before rendering.
Session can be used to store user-specific data. But take care: too many session variables can easily cause performance problems.
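A small sketch of that pattern using the ASP.NET Cache (LookupItem and LoadLookupDataFromDb are placeholders for your own type and data access code):

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static class LookupCache
{
    public static List<LookupItem> GetLookupData()
    {
        var items = HttpRuntime.Cache["LookupData"] as List<LookupItem>;
        if (items == null)
        {
            items = LoadLookupDataFromDb(); // the only place that hits the DB
            HttpRuntime.Cache.Insert("LookupData", items, null,
                DateTime.UtcNow.AddMinutes(30),   // absolute expiration
                Cache.NoSlidingExpiration);
        }
        return items;
    }
}

// Filtering is then done against the cached copy, not the database:
// var active = LookupCache.GetLookupData().Where(x => x.IsActive).ToList();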
This book may be helpful.
http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839
Say I have some 10 "categories" that I need to reference in a web app. Currently I'm storing these categories in the DB and am retrieving them (category name and id) during pageload in my basepage, storing them in a hashtable and then using the hashtable to reference the categories. I realized that the DB call is being made during each pageload. I only need to pull these categories once as they never change during a given session. What's the best way to do this?
Do whatever I am doing, but store it as a Application variable?
Hard code the categories in the code, and do away with the database?
Store the categories in a session variable? And call the DB only if the session is empty?
Or something else?
I'm just curious to know what the best-practice is for doing such things.
If the categories are user-dependent then I'd store them in a Session variable; otherwise I'd use the ASP.NET caching functionality (or your preferred distributed cache). This allows you to avoid hitting the database while still being able to control how long the categories are cached for.
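A sketch of that decision in code (GetCategoriesForUser and GetAllCategories are hypothetical data-access calls; this would live in a base page or helper):

// User-dependent: keep them per user in Session.
// Session["Categories"] = GetCategoriesForUser(userId);

// Shared by everyone: load once into the application cache with a lifetime you control.
var categories = (Dictionary<int, string>)HttpContext.Current.Cache["Categories"];
if (categories == null)
{
    categories = GetAllCategories();
    HttpContext.Current.Cache.Insert("Categories", categories, null,
        Cache.NoAbsoluteExpiration, TimeSpan.FromHours(1)); // sliding expiration
}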
Calling the database is not always as expensive as it seems; a typical website makes dozens of calls to the DB on each page load. But if it becomes troublesome in terms of performance, consider using an ORM solution. The excellent open-source NHibernate comes to mind, which is the "de facto" standard for mapping databases to classes and objects. Beyond the mapping, it automatically provides two levels of caching and connection pooling. Without too much trouble, your website will outperform others by a landslide.
The downside of using an ORM? Many people consider it to have a rather steep learning curve. If you want to read into this, make sure to pay a visit to the NHibernate Best Practices. A tough read at first, but definitely worthwhile.
If you combine NHibernate with FluentNHibernate, it becomes a breeze to use.
If I wanted to access a database in Delphi, I could add a datamodule to a project, configure it from my mainform and then access it anywhere in the application; a reference would be stored in a global variable.
I know that in C# and other more modern OO languages global variables are frowned upon. So how can I access my database from where I need it? The biggest problem I have is the configuration: location, user, password, etc. are unknown at design time.
I now have a db-class and make a new instance when I need it, but then I would have to store those settings in some globally accessible thing, and I have simply moved the problem.
What's the standard solution?
Thanks, regards, Miel.
I always use the singleton pattern. As for configuration, look at the System.Configuration.ConfigurationManager class which allows you to read settings from your project's app.config/web.config file.
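A small sketch of that combination (the "MainDb" connection string name and the Database class are placeholders for your own):

using System.Configuration;
using System.Data.SqlClient;

// web.config / app.config:
// <connectionStrings>
//   <add name="MainDb" connectionString="Data Source=...;Initial Catalog=...;Integrated Security=True"
//        providerName="System.Data.SqlClient" />
// </connectionStrings>

public static class Database
{
    private static readonly string ConnectionString =
        ConfigurationManager.ConnectionStrings["MainDb"].ConnectionString;

    public static SqlConnection OpenConnection()
    {
        var conn = new SqlConnection(ConnectionString);
        conn.Open();
        return conn;
    }
}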
It's a bit tricky to define the absolute best practice for database access in OOP.
You've hit the nail on the head that there are a lot of factors to consider:
how are configuration parameters handled?
is the app multi-threaded? do you need database connection pools?
do you need database portability (i.e. do you use different DBs in dev versus production? are you concerned about vendor lock-in with one DB? are you distributing the app to other users who may be using a different DB?)
are you concerned with securing your SQL statements, or centrally enforcing other access permissions?
is there common logic involved when performing some inserts and updates that you'd rather not duplicate everywhere a particular table is touched?
Because of this, many OOP folks gravitate to an ORM framework for anything but the simplest cases. The general idea is that your application logic shouldn't need to talk to the database directly at any point: isolate your business code from the actual persistence mechanism for as long as possible.
Instead, try to design your application so that your business logic talks to a model layer. In other words, have model objects in the system that encapsulate and describe your business data. These model objects then expose methods for obtaining and saving their state into the database, but your logic doesn't need to care about that.
For example, say you have a concept called "Person" in your system. You'd probably model this as a class with some properties. In pseudo-code:
Person:
- first_name
- last_name
Your actual code in the system is then only concerned with instantiating and using Person objects, not with obtaining DB handles or writing SQL:
p = Person.get(first_name='Joe')
p.last_name = 'Bloggs'
p.save()
In an object-oriented world, you'll find that your business logic code becomes cleaner (and shorter!), easier to maintain, and much more testable.
Of course, you're right in that this means you need to now go off and build a database back-end that translates that Person class to one or more tables in your relational database. This is where using an ORM framework comes in handy. In Python, people use Django and SQLAlchemy. I'll let others indicate what folks use in C# (I'm not a C# developer, but you did tag your question OOP, so I'm going for the generic answer here, rather than C# specific).
The point, though, is that the ORM framework puts all the DB access in a single set of classes in the code, so that the DB access, configuration and pools are handled in one place... no need to instantiate them all over the application. What you use "where you need it" is the model object.
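In C# the same idea might look roughly like this with Entity Framework (an illustrative sketch only; Person and PeopleContext are made-up names, and any ORM would do):

using System.Data.Entity;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class PeopleContext : DbContext
{
    public DbSet<Person> People { get; set; }
}

// Business logic talks to the model; the ORM owns the connections and the SQL.
using (var db = new PeopleContext())
{
    var p = db.People.First(x => x.FirstName == "Joe");
    p.LastName = "Bloggs";
    db.SaveChanges();
}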
Of course, if your app is very simple and you want just a raw DB handle, then I do recommend the dependency injection approach others have listed.
Hope that helps.
It seems to me that you need to create an appropriate object (containing the connection or similar) and pass that instance to each object requiring access (see dependency injection).
This is different from using singletons. This mechanism frees you from the dependency on one object and (perhaps a more compelling reason in this instance) allows you to perform testing by injecting mock objects or similar in place of the originally injected database accessor object. I would definitely shy away from the singleton mechanism in this scenario.
I use a repository class that takes the DB information in its constructor, and the classes that need it get it passed in. I use an Inversion of Control (IoC) tool to inject those values.
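A rough sketch of that arrangement (the interface, class, and entity names are illustrative, and the IoC container wiring is omitted):

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ICustomerRepository
{
    Customer GetById(int id);
    void Save(Customer customer);
}

public class CustomerRepository : ICustomerRepository
{
    private readonly string _connectionString;

    // The connection details are injected once, here, instead of living in a global.
    public CustomerRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    public Customer GetById(int id) { /* query using _connectionString */ return null; }
    public void Save(Customer customer) { /* write using _connectionString */ }
}

public class OrderService
{
    private readonly ICustomerRepository _customers;

    // Consumers depend only on the interface, which also makes them easy to test
    // with a mock repository.
    public OrderService(ICustomerRepository customers)
    {
        _customers = customers;
    }
}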
You could store the user information in a flat file somewhere, then read/write it from your db-class.
This way you won't duplicate the settings in your code, but the user can still modify the settings.
SubSonic is the "Swiss Army knife" of object-relational mapping, and offers the ability to execute stored procedures and return the results as a List. You can have it up and running within half an hour.
I am building a web-store with many departments and categories. They are stored in our database and accessed often.
We are using URL rewriting so almost every request within the store generates a lookup. We also need to iterate over the data frequently to generate menus for the main store and the department pages.
This information will not change often so I'm thinking that I should load the database into a dictionary to speed up the information retrieval.
I know the standard practice is to load data into the application cache; however, I assume that there is some level of serialization that occurs during caching, and for a large data structure I'm thinking the overhead would be significant.
My impulse on this is to put the dictionary in a static variable in one of the related classes. I would, however, like to get some input on this. Am I right in thinking that this method would be faster? Is it horrible practice? Is there a better way that I'm missing?
I can't seem to find much information on this and I'd really appreciate any information that you can share. Thanks!
The Application and Cache collections do not serialize the objects you pass into them; they store the actual reference. Retrieving an object from the Cache will not be an expensive operation, no matter how large the object is. Always stick with the Cache object unless you have a very good reason not to; it's just good practice.
The only other thing worth mentioning is to make sure you think about multithreaded access to this collection. You're going to end up with some serious issues very quickly if you don't lock properly.
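A sketch of the usual double-checked locking pattern for populating the cached dictionary safely (Department and LoadDepartments are placeholders for your own type and loader):

using System.Collections.Generic;
using System.Web;

public static class StoreData
{
    private static readonly object CacheLock = new object();

    public static Dictionary<string, Department> GetDepartments()
    {
        var data = HttpRuntime.Cache["Departments"] as Dictionary<string, Department>;
        if (data == null)
        {
            lock (CacheLock)
            {
                // Re-check inside the lock so only one thread loads from the database.
                data = HttpRuntime.Cache["Departments"] as Dictionary<string, Department>;
                if (data == null)
                {
                    data = LoadDepartments();
                    HttpRuntime.Cache.Insert("Departments", data);
                }
            }
        }
        return data;
    }
}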
Well, I don't think it's much work to rewrite the code to use a static field instead of the application cache if there's a need to do so later. I'd personally use the cache first. There's no need for premature optimization; have you measured the performance? It may behave just fine with the application cache object. Maybe it even works well with DB queries? :)
So, my answer is - use the cache and see how it works.
memcached is your friend! (but could be overkill if you're not scaling out)
Any idea how large your dictionary would be in application cache? I'd be tempted to recommend that as a good first option.
IMHO, generally speaking, if you have control over updates to the underlying object, you should use static storage. Otherwise, if you are dependent on a 3rd-party API for data retrieval, use caching.