This might be a silly question, but are you able to perform a SQL query against the database to get all record items for X, store the results in a local variable "myRecords", and then filter the results contained in the "myRecords" variable? (This saves making multiple round trips/queries to the database.)
Is this a good idea or a bad idea?
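For example, something like this (a hypothetical sketch of what I mean; the table and class are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

public class Order   // hypothetical record type
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public decimal Total { get; set; }
}

// One round trip to the database...
List<Order> myRecords = new List<Order>();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT Id, CustomerId, Total FROM Orders WHERE CustomerId = @custId", conn))
{
    cmd.Parameters.AddWithValue("@custId", customerId);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            myRecords.Add(new Order
            {
                Id = reader.GetInt32(0),
                CustomerId = reader.GetInt32(1),
                Total = reader.GetDecimal(2)
            });
    }
}

// ...then filter the in-memory results instead of querying again.
var bigOrders = myRecords.Where(o => o.Total > 100m).ToList();
```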
I believe you're referring to a classic case of caching; there are plenty of resources to guide you through the implementation of such an approach. To get a more specific answer you will need to give more details of the problem you are trying to solve. The topic is huge and can be very complex depending on the parameters of your environment.
Definitely! It's called caching, but there are so many different ways to do it. Check out this great article: http://wiki.asp.net/page.aspx/655/caching-in-aspnet/
I agree with K Ivanov. Just wanted to add a few things.
This can be a huge performance improvement or a complete disaster; it rarely falls anywhere in between.
I think you might want to do a few things before pursuing this topic. First, profile your database and queries. Look for areas, such as new or changed indexes or even a slightly different table design, that might yield performance improvements on their own.
There are several potential problems. The first deals with the memory requirements of the machine that will hold the cache. If your database server is already memory constrained, then adding an intermediary cache on it is not advisable.
If you cache the data on the web server, then you might radically increase the memory footprint of your application. Further, depending on the caching strategy and the nature of your data, it can quickly become stale, leading to bad results.
The point is, before you go down a different path, make sure you completely understand the problem you have. It might be as simple as tweaking indexes and/or adding RAM to your DB server. Those are relatively easy to implement, whereas a solid caching mechanism can either be a boon or lead to even larger problems.
We have a reporting tool that grabs a large number of records; at times it can be 1 million records. We have been storing these in a DataTable. I wanted to know whether there is a better object to store this in. I would need to be able to aggregate the data in various ways.
Update:
Yes, I personally believe we should not be getting that many records; this is not the direction I want to go.
Also, I am using Oracle.
Update 2:
Sorry for the delay, but there are always fires to put out here. The main issue was that they were running out of memory and getting memory errors. They had issues with the DataTable releasing from memory and also with binding to a DataGridView. I guess what I was looking for was a lighter-weight object that wouldn't take as much space.
After thinking about it a little more, it really doesn't make any sense to get that much data, as diagonalbatman mentioned. Furthermore, if just a few people using it cause these issues, how is it going to scale?
Unfortunately, I have a boss who doesn't listen and an offshore team with too much of a "yes sir" attitude. They are serializing the raw data (as an XML file) and releasing the raw-data DataTable, which I think is not a good direction at all.
@diagonalbatman - Out of curiosity, do you have an example of this?
Why do you need to pull down 1 million records into your app?
Can you not do your reporting consolidation/aggregation on the DB? This would make better use of the DB's resources (after all, this is what an RDBMS is designed to do), and then you can focus your app on working with smaller, consolidated sets.
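For example, a rough sketch of letting Oracle do the rollup (table and column names are invented; this assumes the ODP.NET managed client):

```csharp
using System;
using Oracle.ManagedDataAccess.Client; // NuGet: Oracle.ManagedDataAccess

// Pull back only the consolidated rows, not a million raw records.
const string sql =
    "SELECT region, COUNT(*) AS order_count, SUM(total) AS revenue " +
    "FROM orders GROUP BY region";

using (var conn = new OracleConnection(connectionString))
using (var cmd = new OracleCommand(sql, conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            Console.WriteLine("{0}: {1} orders, {2:C}",
                reader.GetString(0), reader.GetDecimal(1), reader.GetDecimal(2));
    }
}
```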
I would recommend you try several options to verify, especially in light of your need to aggregate the data in various ways.
1) Can it be aggregated by proper queries on the data side? This is likely the best solution.
2) If you use POCOs, does LINQ improve upon your current memory and performance characteristics? Does LINQ let you do the aggregation you require? (See the sketch below.)
Measure the characteristics you care about and try different options.
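On point 2, a minimal sketch of what LINQ aggregation over POCOs might look like (the class and property names are invented):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OrderLine    // hypothetical POCO replacing a DataTable row
{
    public string Region { get; set; }
    public decimal Amount { get; set; }
}

public static class ReportAggregates
{
    // Group and sum in memory, in whatever shapes the report needs.
    public static void TotalsByRegion(List<OrderLine> lines)
    {
        var totals = lines
            .GroupBy(l => l.Region)
            .Select(g => new { Region = g.Key, Count = g.Count(), Total = g.Sum(l => l.Amount) });

        foreach (var row in totals)
            Console.WriteLine("{0}: {1} lines, {2:C}", row.Region, row.Count, row.Total);
    }
}
```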
What you want are Data Cubes. Depending on the type of database you have, you should look at building some Cubes.
I have a similar requirement to Stack Overflow: showing a number of metrics on a page in my asp.net-mvc site that are very expensive to calculate. Stack Overflow has a lot of metrics on the page (like user accept rate, etc.) which clearly are not being calculated on the fly per page request, given that that would be too slow.
What is a recommended practice for serving up calculated data really fast without the performance penalty (assuming we can accept that this data may be a little out of date)?
Is this stored in some caching layer, or in some other "results" database table, so that every day a job calculates this data and stores the results where they can be queried directly?
Assuming that I am happy to deal with the delay of having this data as a snapshot, what is the best solution for this type of problem?
They may well be relying on the Redis data store for such calculations and caching. This post from marcgravell may help.
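If you want to experiment with that idea, here is a hedged sketch using the StackExchange.Redis client (the key name and TTL are invented; this is one way to cache a precomputed metric, not necessarily what Stack Overflow does):

```csharp
using System;
using StackExchange.Redis; // NuGet: StackExchange.Redis

var redis = ConnectionMultiplexer.Connect("localhost");
IDatabase db = redis.GetDatabase();

// A scheduled job computes the expensive metric and caches it with a TTL...
double acceptRate = 0.72; // stand-in for the expensive calculation
db.StringSet("user:123:accept-rate", acceptRate, TimeSpan.FromHours(1));

// ...and page requests just read the cached snapshot.
double? cached = (double?)db.StringGet("user:123:accept-rate");
```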
Yes, the answer is caching; how you do it is (or can be) the complicated part. If you are using NHibernate, adding caching is really easy: it is part of your configuration, and on the queries you just add .Cacheable() and it manages it for you. Caching also depends on the type of environment: if you're using a single worker, a web farm, or a web garden, you would have to build a caching layer to accommodate your scenario.
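For example, with NHibernate's LINQ provider (this assumes the query cache is enabled via cache.use_query_cache in your configuration, and User is a hypothetical mapped entity):

```csharp
using NHibernate.Linq;

// With the query cache enabled, repeat executions of this query
// are served from the cache until the cached regions are invalidated.
var activeUsers = session.Query<User>()
    .Where(u => u.IsActive)
    .Cacheable()
    .ToList();
```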
Although this is a somewhat recent technique, one really great way to structure your system to make things like this possible is to use Command and Query Responsibility Segregation, more often referred to as CQRS.
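Very roughly, the shape looks like this (a sketch with invented names): writes go through explicit commands, while pages read from a denormalized model that a background process keeps up to date.

```csharp
// Command side: records the change against the transactional store.
public class AcceptAnswerCommand
{
    public int QuestionId { get; set; }
    public int AnswerId { get; set; }
}

public class AcceptAnswerHandler
{
    public void Handle(AcceptAnswerCommand command)
    {
        // write to the transactional store, then publish an event
        // that the read side consumes asynchronously
    }
}

// Query side: a flat, precomputed read model the pages query directly.
public class UserStatsReadModel
{
    public int UserId { get; set; }
    public double AcceptRate { get; set; } // recalculated by the event consumer, not per request
}
```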
Currently, my entire website does its updates through parameterized SQL queries. It works, and we've had no problems with it, but it can occasionally be very slow.
I was wondering if it makes sense to refactor some of these SQL commands into classes so that we would not have to hit the database so often. I understand that hitting the database is generally the slowest part of any web application. For example, say we have a class structure like this:
Project (comprised of) Tasks (comprised of) Assignments
Where Project, Task, and Assignment are classes.
At certain points in the site you are only working on one project at a time, and so creating a Project class and passing it among pages (using Session, Profile, something else) might make sense. I imagine this class would have a Save() method to save value changes.
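Roughly something like this (a hypothetical sketch):

```csharp
using System;
using System.Collections.Generic;

public class Assignment { public int Id { get; set; } }
public class Task
{
    public int Id { get; set; }
    public List<Assignment> Assignments { get; set; }
}

public class Project
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Task> Tasks { get; set; }

    // One parameterized query hydrates the project and its children.
    public static Project Load(int id)
    {
        throw new NotImplementedException(); // data access elided
    }

    // One parameterized UPDATE (plus child saves) writes changes back.
    public void Save()
    {
        throw new NotImplementedException(); // data access elided
    }
}
```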
Does it make sense to invest the time into doing this? Under what conditions might it be worth it?
If your site is slow, you need to figure out what the bottleneck is before you randomly start optimizing things.
Caching is certainly a good idea, but you shouldn't assume that this will solve the problem.
Caching is almost always underutilized in ASP.NET applications. Any time you hit your database, you should look for ways to cache the results.
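For example, the classic get-or-load pattern against the built-in cache (the key, type, and data-access call are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static List<Category> GetCategories()
{
    var cached = HttpRuntime.Cache["categories"] as List<Category>;
    if (cached != null)
        return cached;                               // served from memory, no DB hit

    List<Category> fresh = LoadCategoriesFromDb();   // hypothetical data-access call
    HttpRuntime.Cache.Insert("categories", fresh, null,
        DateTime.UtcNow.AddMinutes(10),              // absolute expiration
        Cache.NoSlidingExpiration);
    return fresh;
}
```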
Serializing objects into the session can be costly in itself, but it is most likely faster than hitting the database every single time. You are already benefiting from execution plan caching in SQL Server, so it's very likely that you're getting optimal performance out of your stored procedures.
One option you might consider for increasing performance is to abstract your data into objects via LINQ to SQL (against your sprocs) and then use AppFabric to cache the objects.
http://msdn.microsoft.com/en-us/windowsserver/ee695849.aspx
As for your updates, you should do them directly against the sprocs, but you will also need to clear out the cache in AppFabric for objects that are affected by the insert/update/delete.
You could also do the same thing using the standard Cache, but AppFabric has some added benefits.
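A rough sketch of the AppFabric side (the cache name, keys, and load call are illustrative):

```csharp
using Microsoft.ApplicationServer.Caching; // AppFabric client assemblies

var factory = new DataCacheFactory();      // reads the dataCacheClient config section
DataCache cache = factory.GetCache("default");

// Read-through: try the cache first, fall back to the LINQ to SQL / sproc load.
var project = cache.Get("project:42") as Project;
if (project == null)
{
    project = LoadProjectFromDb(42);        // hypothetical sproc-backed load
    cache.Put("project:42", project);
}

// After an insert/update/delete against the sprocs, evict the stale entry.
cache.Remove("project:42");
```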
Use SQL Profiler to identify your slowest queries, and see if you can improve them with some simple index changes (removing unused indexes, adding missing indexes).
You could very easily improve your application performance by an order of magnitude without changing your front-end app at all.
See http://sqlserverpedia.com/wiki/Find_Missing_Indexes
If you have lookup data only, you can store it in the Cache object. This avoids hits to the DB. Only data that can be used globally should be stored in Cache.
If this data requires filtering, you can retrieve it from Cache and filter it before rendering.
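For instance (illustrative names), filtering cached lookup data per request rather than re-querying:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Web;

// Lookup data is global, so it lives in Cache; filter per request.
var countries = (List<Country>)HttpRuntime.Cache["countries"]; // assume loaded at startup
var european = countries.Where(c => c.Region == "Europe").ToList(); // cached list untouched
```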
Session can be used to store user-specific data, but take care: too many session variables can easily cause performance problems.
This book may be helpful.
http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839
I have my mind firmly wrapped around relational databases and how to code efficiently against them. Most of my experience is with MySQL and SQL. I like many of the things I'm hearing about document-based databases, especially when someone on a recent podcast mentioned huge performance benefits. So, if I'm going to go down that road, what are some of the mental steps I must take to shift from SQL to NoSQL?
If it makes any difference to your answer, I'm primarily a C# developer (today, anyhow). I'm used to ORMs like EF and LINQ to SQL. Before ORMs, I rolled my own objects with generics and data readers. Maybe that matters, maybe it doesn't.
Here are some more specific questions:
How do I need to think about joins?
How will I query without a SELECT statement?
What happens to my existing stored objects when I add a property in my code?
(feel free to add questions of your own here)
Firstly, each NoSQL store is different. So it's not like choosing between Oracle, SQL Server, or MySQL. The differences between them can be vast.
For example, with CouchDB you cannot execute ad-hoc queries (dynamic queries, if you like). It is very good at online/offline scenarios, and is small enough to run on most devices. It has a RESTful interface, so there are no drivers and no ADO.NET libraries. To query it you use MapReduce (this is very common across the NoSQL space, but not ubiquitous) to create views, and these can be written in a number of languages, though most of the documentation covers JavaScript. CouchDB is also designed to crash, which is to say that if something goes wrong, it just restarts the process (the Erlang process, or group of linked processes, that is, not the entire CouchDB instance typically).
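For instance, querying a CouchDB view is just an HTTP GET; the database, design document, and view names below are invented:

```csharp
using System;
using System.Net.Http;

// CouchDB speaks REST, so a plain HTTP client is all you need.
using (var http = new HttpClient())
{
    string url = "http://localhost:5984/orders/_design/reports/_view/totals_by_region?group=true";
    string json = http.GetStringAsync(url).Result; // JSON rows produced by the MapReduce view
    Console.WriteLine(json);
}
```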
MongoDB is designed to be highly performant, has drivers, and seems like less of a leap for a lot of people in the .NET world because of this. I believe though that in crash situations it is possible to lose data (it doesn't offer the same level of transactional guarantees around writes that CouchDB does).
Now both of these are document databases, and as such they share in common that their data is unstructured. There are no tables, no defined schema - they are schemaless. They are not like a key-value store though, as they do insist that the data you persist is intelligible to them. With CouchDB this means the use of JSON, and with MongoDB this means the use of BSON.
There are many other differences between MongoDB and CouchDB and these are considered in the NoSQL space to be very close in their design!
Other than document databases, there are graph-oriented solutions like Neo4j, columnar stores (column-oriented rather than row-oriented in how they persist data), and many others.
Something which is common across most NoSQL solutions, other than MapReduce, is that they are not relational databases, and the majority do not use SQL-style syntax. Typically, querying follows an imperative mode of programming rather than the declarative style of SQL.
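For example, with the MongoDB C# driver a query is built up through API calls rather than written as a declarative SQL statement (the collection and field names are invented):

```csharp
using MongoDB.Bson;
using MongoDB.Driver; // NuGet: MongoDB.Driver

var client = new MongoClient("mongodb://localhost:27017");
var db = client.GetDatabase("shop");
var orders = db.GetCollection<BsonDocument>("orders");

// Roughly: SELECT * FROM orders WHERE status = 'open' -- but expressed imperatively.
var filter = Builders<BsonDocument>.Filter.Eq("status", "open");
var openOrders = orders.Find(filter).ToList();
```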
Another typically common trait is that absolute consistency, as typically provided by relational databases, is traded for eventual models of consistency.
My advice to anyone looking to use a NoSQL solution would be to first really understand the requirements they have, including the SLAs: what level of latency is required; how consistent must that latency remain as the solution scales; what scale of load is anticipated; is the load steady or will it spike; how consistent does a user's view of the data need to be; should they always see their own writes when they query; should their writes be immediately visible to all other users; and so on.
Understand that you can't have it all. Read up on Brewer's CAP theorem, which basically says you can't have absolute consistency, 100% availability, and partition tolerance (coping when nodes can't communicate) all at once. Then look into the various NoSQL solutions and start to eliminate those which are not designed to meet your requirements. Understand that the move from a relational database is not trivial and has a cost associated with it (I have found the cost of moving an organisation in that direction, in terms of meetings, discussions, etc., is itself very high, preventing focus on other areas of potential benefit).
Most of the time you will not need an ORM (the R part of that equation just went missing); sometimes plain binary serialisation may be fine (with something like DB4O, for example, or a key-value store). Libraries like the Newtonsoft JSON/BSON library may help out, as may AutoMapper. I do find that working with C# 3 there is a definite cost compared to working with a dynamic language like, say, Python. With C# 4 this may improve a little with things like ExpandoObject and dynamic from the DLR.
To look at your three specific questions: with all of them it depends on the NoSQL solution you adopt, so no single answer is possible. With that caveat, in very general terms:
If you persist the object (or, more likely, the aggregate) as a whole, your joins will typically be done in code, though you can do some of this through MapReduce (see the sketch after these answers).
Again, it depends, but with Couch you would execute a GET over HTTP against either a specific resource or a MapReduce view.
Most likely nothing. Just keep an eye out for serialisation/deserialisation scenarios. The difficulty I have found comes in how you manage versions of your code. If the property is purely for pushing to an interface (GUI, web service), then it tends to be less of an issue. If the property is a form of internal state that behaviour will rely on, then this can get more tricky.
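On the first point, a minimal sketch of a "join in code" over two separately fetched document collections (the shapes are invented):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer { public int Id { get; set; } public string Name { get; set; } }
public class Order    { public int Id { get; set; } public int CustomerId { get; set; } }

public static class CodeJoin
{
    // Both collections were fetched separately; the "join" happens in memory.
    public static IEnumerable<object> CustomersWithOrders(
        List<Customer> customers, List<Order> orders)
    {
        var ordersByCustomer = orders.ToLookup(o => o.CustomerId);
        return customers.Select(c => new { c.Name, Orders = ordersByCustomer[c.Id].ToList() });
    }
}
```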
Hope it helps, good luck!
Just stop thinking about the database.
Think about modeling your domain. Build your objects to solve the problem at hand following good patterns and practices and don't worry about persistence.
I am building a web-store with many departments and categories. They are stored in our database and accessed often.
We are using URL rewriting so almost every request within the store generates a lookup. We also need to iterate over the data frequently to generate menus for the main store and the department pages.
This information will not change often so I'm thinking that I should load the database into a dictionary to speed up the information retrieval.
I know the standard practice is to load data into the application cache; however, I assume that there is some level of serialization that occurs during caching, and for a large data structure I'm thinking the overhead would be significant.
My impulse is to put the dictionary in a static variable in one of the related classes. I would, however, like to get some input on this. Am I right in thinking that this method would be faster? Is it horrible practice? Is there a better way that I'm missing?
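For concreteness, this is roughly what I have in mind (a sketch; the types are invented):

```csharp
using System;
using System.Collections.Generic;

public class Category { public string UrlSlug { get; set; } public string Name { get; set; } }

public static class CategoryMap
{
    // Loaded once per app domain; lookups are plain in-memory reads.
    private static readonly Lazy<Dictionary<string, Category>> byUrl =
        new Lazy<Dictionary<string, Category>>(LoadFromDb);

    public static Dictionary<string, Category> ByUrl
    {
        get { return byUrl.Value; }
    }

    private static Dictionary<string, Category> LoadFromDb()
    {
        throw new NotImplementedException(); // one query builds the whole map
    }
}
```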
I can't seem to find much information on this and I'd really appreciate any information that you can share. Thanks!
The Application and Cache collections do not serialize the objects you pass into them; they store the actual reference. Retrieving an object from Cache will not be an expensive operation, no matter how large the object is. Always stick with the Cache object unless you have a very good reason not to; it's just good practice.
The only other thing worth mentioning is to make sure you think about multithreaded access to this collection. You're going to end up with some serious issues very quickly if you don't lock properly.
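For example, a common pattern is to lock around the load so only one thread populates the entry (the names and load call are illustrative):

```csharp
using System.Collections.Generic;
using System.Web;

public static class DepartmentCache
{
    private static readonly object CacheLock = new object();

    public static List<Department> GetDepartments()
    {
        var data = HttpRuntime.Cache["departments"] as List<Department>;
        if (data != null)
            return data;

        lock (CacheLock)
        {
            // Re-check inside the lock: another thread may have loaded it already.
            data = HttpRuntime.Cache["departments"] as List<Department>;
            if (data == null)
            {
                data = LoadDepartmentsFromDb(); // hypothetical data-access call
                HttpRuntime.Cache.Insert("departments", data);
            }
            return data;
        }
    }
}
```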
Well, I don't think it's much work to rewrite the code to use a static field instead of the application cache if there's a need to do so. I'd personally use the cache first. There's no need for premature optimization; have you measured the performance? It may behave just fine with the application cache object. Maybe it even works well with plain DB queries? :)
So, my answer is - use the cache and see how it works.
memcached is your friend! (but could be overkill if you're not scaling out)
Any idea how large your dictionary would be in application cache? I'd be tempted to recommend that as a good first option.
IMHO, generally speaking, if you have control over updates to the underlying object, you should use static storage. Otherwise, if you depend on a third-party API for data retrieval, use a caching technology.