I have looked around for a simple answer but I haven't found one (though if I am being blind or impatient, I'd be happy for someone to post me the link).
I have the following code in my repository
get
{
    if (context.entity.Local.Count == 0)
    {
        return context.entity;
    }
    return context.entity.Local;
}
I know from common sense that the word Local means it is not querying the database but getting the result set from memory. However, what I would like to know is: how much faster is fetching the result set from Local than from the database? Is it a huge difference?
I am asking as I would like to speed up my web application so I am trying to find weaknesses in the code.
Thanks
First, your common sense is not a safe guide here. Local is not necessarily the EF property of that name; in a repository it depends on whoever wrote it and could just as well refer to something else.
Second - a lot. Easily a factor of 1000. The database is a separate process, and a query involves generating and then parsing SQL, plus two network transfers (or network-layer transfers). Compare that to just reading out a property; 1000 is likely conservative. Note also that a lot of the time may be spent in the database to start with.
It depends on what you DO - but caching in memory and avoiding the database is a valid strategy that can make a lot of difference performance-wise, at the cost of more memory consumption and change-synchronization issues. The latter is not really relevant for some (never-changing) data.
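As a rough illustration of the gap (this is a sketch: it assumes the repository wraps an EF context whose entity set is tracked, and the `context` and `Entities` names are hypothetical), you can measure both paths with a Stopwatch:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical EF context; "Entities" stands in for whatever set the repository wraps.
var sw = Stopwatch.StartNew();
var fromDb = context.Entities.ToList();          // round trip: SQL generation, network, materialization
sw.Stop();
Console.WriteLine("Database: {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
var fromLocal = context.Entities.Local.ToList(); // in-memory: just reads already-tracked objects
sw.Stop();
Console.WriteLine("Local:    {0} ms", sw.ElapsedMilliseconds);
```

Run against a non-trivial table, the second number is typically a tiny fraction of the first, which is the "factor of 1000" point above.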
Related
I am currently benchmarking two databases, Postgres and MongoDB, on a relatively large data set with equivalent queries. Of course, I am doing my best to put them on equal grounds, but I have one dilemma. For Postgres I take the execution time reported by EXPLAIN ANALYZE, and there is a similar concept with MongoDB, using profiling (although not equivalent, millis).
However, different times are observed if the queries are executed from, let's say, PgAdmin, the mongo CLI client, or my timed C# app. That time also includes the transfer latency and probably protocol differences. PgAdmin, for example, actually seems to completely distort the execution time (it obviously includes the result rendering time).
The question is: is there any sense in actually measuring the time on the "receiving end", since an application actually does consume that data? Or does it just include too many variables and does not contribute anything to the actual database performance, and I should stick to the reported DBMS execution times?
The question you'd have to answer is why you are benchmarking the databases. If you are benchmarking so you can select one over the other for use in a C# application, then you need to measure the time "on the 'receiving end'". Whatever variables that may include, that is what you need to compare.
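A minimal way to capture that receiving-end number for the Postgres side (assuming the Npgsql driver; the connection string and query are placeholders) is to time the full round trip, forcing the client to actually consume every row so materialization is included:

```csharp
using System;
using System.Diagnostics;
using Npgsql;

var sw = Stopwatch.StartNew();
using (var conn = new NpgsqlConnection("Host=localhost;Database=bench"))
{
    conn.Open();
    using (var cmd = new NpgsqlCommand("SELECT id, name FROM items", conn))
    using (var reader = cmd.ExecuteReader())
    {
        int rows = 0;
        while (reader.Read()) rows++;   // consume every row, not just issue the query
        sw.Stop();
        Console.WriteLine("{0} rows in {1} ms (end-to-end)", rows, sw.ElapsedMilliseconds);
    }
}
```

The equivalent loop over the MongoDB driver's cursor gives a number with the same variables baked in, which is what makes the two comparable for application use.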
I find myself faced with a conundrum of which the answer probably falls outside of my expertise. I'm hoping someone can help.
I have an optimised and efficient query for fetching table (and linked) data, the actual contents of which are unimportant. However, upon each read that data then needs to be processed to present it in JSON format. As we're talking typical examples where a few hundred rows could have a few hundred thousand associated rows, this takes time. With multi-threading and a powerful CPU (i7 3960X) this processing is around 400 ms - 800 ms at 100% CPU. It's not a lot, I know, but why process it each time in the first place?
In this particular example, although everything I've ever read argues against doing so (as I understood it), I'm considering storing the computed JSON in a VARCHAR(MAX) column for fast reading.
Why? Well, the data is read 100 times or more for every single write (change). It seems to me that, given those numbers, it would be far better to store the JSON for optimised retrieval and re-compute and update it on the odd occasion the associations change - adding perhaps 10 to 20 ms to the time taken to write changes, but improving the reads by some large factor.
Your opinions on this would be much appreciated.
Yes, storing redundant information for performance reasons is pretty common. The first step is to measure the overhead - and it sounds like you've done that already (although I would also ask: what JSON serializer are you using? Have you tried others?)
But fundamentally, yes that's ok, when the situation warrants it. To give an example: stackoverflow has a similar scenario - the markdown you type is relatively expensive to process into html. We could do that on every read, but we have insanely more reads than writes, so we cook the markdown at write, and store the html as well as the source markdown - then it is just a simple "data in, data out" exercise for most of the "show" code.
It would be unusual for this to be a common problem with JSON, though, since JSON serialization is a bit simpler and most serializers perform lots of meta-programming optimization. Hence my suggestion to try a different serializer before going this route.
Note also that the rendered JSON may need more network bandwidth than the original source data in TDS - so your data transfer between the DB server and the application server may increase; another thing to consider.
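A sketch of the write-through pattern being discussed (table, column, and type names are all hypothetical, and Json.NET stands in for whichever serializer is in use): pay the serialization cost once on the rare write, and serve the stored string on the frequent reads:

```csharp
using System;
using System.Data.SqlClient;
using Newtonsoft.Json;

// On write (rare): recompute the expensive JSON once and persist it with the row.
void SaveOrder(Order order, SqlConnection conn)
{
    string json = JsonConvert.SerializeObject(order);   // the 400-800 ms work, paid once per change
    using (var cmd = new SqlCommand(
        "UPDATE Orders SET CachedJson = @json WHERE Id = @id", conn))
    {
        cmd.Parameters.AddWithValue("@json", json);
        cmd.Parameters.AddWithValue("@id", order.Id);
        cmd.ExecuteNonQuery();
    }
}

// On read (100x more frequent): no recomputation, just return the stored string.
string GetOrderJson(Guid id, SqlConnection conn)
{
    using (var cmd = new SqlCommand(
        "SELECT CachedJson FROM Orders WHERE Id = @id", conn))
    {
        cmd.Parameters.AddWithValue("@id", id);
        return (string)cmd.ExecuteScalar();
    }
}
```

This is the same shape as the markdown-to-HTML example above: source of truth and cooked output stored side by side, with the cooked copy refreshed only when the source changes.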
I need to get the set of GUIDs in a remote database which do not exist in an IEnumerable (for context, this is coming from a Lucene index). There are potentially many millions of these Guids.
I currently think that inserting the IEnumerable to the database and doing the difference there will be too expensive (the inserts will hammer the database), but I am prepared to be proven wrong!
Reading both sets into memory is also infeasible due to the amount of data - our existing solution does this and fails with very large sets.
I would like a solution which can operate on a small subset of the data at a time so that we have a constant memory footprint. We have an idea as to how to roll our own implementation of this, but it is non-trivial, so would obviously rather use an existing one if it exists.
If anybody has any recommendations for an existing solution, I'd be grateful to hear them!
You could use SqlBulkCopy to load the GUIDs into the database very fast (if it is SQL Server).
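A minimal sketch of that approach (all table, column, and variable names here are assumptions): bulk-load the GUIDs into a temp staging table on the same connection, then let the server compute the set difference:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Stage the Lucene GUIDs; guidsFromLucene stands in for the IEnumerable<Guid>.
var table = new DataTable();
table.Columns.Add("Id", typeof(Guid));
foreach (Guid g in guidsFromLucene)
    table.Rows.Add(g);

using (var cmd = new SqlCommand(
    "CREATE TABLE #StagingGuids (Id UNIQUEIDENTIFIER PRIMARY KEY)", conn))
    cmd.ExecuteNonQuery();

using (var bulk = new SqlBulkCopy(conn)
    { DestinationTableName = "#StagingGuids", BatchSize = 10000 })
{
    bulk.WriteToServer(table);
}

// GUIDs in the database that are absent from the index:
//   SELECT Id FROM Documents EXCEPT SELECT Id FROM #StagingGuids
```

For the constant-memory requirement, SqlBulkCopy can also consume an IDataReader instead of a fully materialized DataTable, so the GUIDs can be streamed in batches rather than held in memory all at once.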
TL;DR: Which is likely faster: accessing static local variable, accessing variable stored in HttpRuntime.Cache, or accessing variable stored in memcached?
At work, we get about 200,000 page views/day. On our homepage, we display a promotion. This promotion is different for different users, based on their country of origin and language.
All the different promotions are defined in an XML file on each web server. We have 12 web servers all serving the same site with the same XML file. There are about 50 different promotion combinations based on country/language. We imagine we'll never have more than 200 or so (if ever) promotions (combinations) total.
The XML file may be changed at any time, outside the release cycle. When it's changed, the new definitions of promotions should immediately take effect on the live site. Implementing the functionality for this requirement is the responsibility of another developer and me.
Originally, I wrote the code so that the contents of the XML file were parsed and then stored in a static member of a class. A FileSystemWatcher monitored changes to the file, and whenever the file was changed, the XML would be reloaded/reparsed and the static member would be updated with the new contents. Seemed like a solid, simple solution to keeping the in-memory dictionary of promotions current with the XML file. (Each server does this independently with its local copy of the XML file; all XML files are the same and change at the same time.)
The other developer I was working with holds a Sr. position and decided that this was no good. Instead, we should store all the promotions in each server's HttpContext.Current.Cache with a CacheDependency file dependency that automatically monitored file changes, expunging the cached promotions when the file changed. While I liked that we no longer had to use a FileSystemWatcher, I worried a little that grabbing the promotions from the volatile cache instead of a static class member would be less performant.
(Care to comment on this concern? I already gave up trying to advocate not switching to HttpRuntime.Cache.)
Later, after we began using HttpRuntime.Cache, we adopted memcached with Enyim as our .NET interface for other business problems (e.g. search results). When we did that, this Sr. Developer decided we should be using memcached instead of the HttpRuntime (HttpContext) Cache for storing promotions. Higher-ups said "yeah, sounds good", and gave him a dedicated server with memcached just for these promotions. Now he's currently implementing the changes to use memcached instead.
I'm skeptical that this is a good decision. Instead of staying in-process and grabbing this promotion data from the HttpRuntime.Cache, we're now opening a socket to a network memcached server and transmitting its value to our web server.
This has to be less performant, right? Even if the cache is memcached. (I haven't had the chance to compile any performance metrics yet.)
On top of that, he's going to have to engineer his own file dependency solution over memcached since it doesn't provide such a facility.
Wouldn't my original design be best? Does this strike you as overengineering? Is HttpRuntime.Cache caching or memcached caching even necessary?
Not knowing exactly how much data you are talking about (assuming it's not a lot), I tend to somewhat agree with you; raw-speed wise, a static member should be the 'fastest', then Cache. That doesn't necessarily mean it's the best option, of course. Scalability is not always about speed. In fact, the things we do for scalability often negatively (marginally) affect the speed of an application.
More specifically: I do tend to start with the Cache object myself, unless a bit of 'static' data is pretty darn small and is pretty much guaranteed to be needed constantly (in which case I go for static members - don't forget thread synchronization too, of course!).
With a modest amount of data that won't change often at all, and can easily be modified when you need to, by altering the files as you note, the Cache object is probably a good solution. memcached may be overkill, and overly complex... but it should work, too.
I think the major possible 'negative' to the memcached solution is the single-point-of-failure issue; Using the local server's Cache keeps each server isolated.
It sounds like there may not really be any choice in your case, politically speaking. But I think your reasoning isn't necessarily all that bad, given what you've shared here.
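For reference, the Cache-with-file-dependency approach described in the question is only a few lines (the cache key, file path, `Promotion` type, and parser here are hypothetical stand-ins for the real ones):

```csharp
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

Dictionary<string, Promotion> GetPromotions(string xmlPath)
{
    var promos = HttpRuntime.Cache["promotions"] as Dictionary<string, Promotion>;
    if (promos == null)
    {
        promos = ParsePromotionsXml(xmlPath);           // reparse only after a change or eviction
        HttpRuntime.Cache.Insert("promotions", promos,
            new CacheDependency(xmlPath));              // entry is evicted when the file changes
    }
    return promos;
}
```

The lookup on the hot path is an in-process dictionary fetch either way, which is why the static-member vs. Cache difference is marginal compared with going out of process to memcached.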
Very much agree with Andrew here. A few additions/deviations:
For a small amount of rarely changing data, static fields would offer the best performance. When your caching happens at a non-UI layer, it avoids taking a dependency on the System.Web assembly (of course, you can achieve this by other means as well). However, in general, the ASP.NET Cache would also be a good bet (especially when the data is large, since cached data can expire under memory pressure, etc.).
From both a speed and a scalability standpoint, output caching (including browser and down-level caching) would be the best option, and you should evaluate it. Even if data is changing frequently, output caching for 30-60 seconds can give a significant performance boost for a very large number of requests. If needed, you can do partial caching (user controls) and/or substitutions. Of course, this needs to be done in combination with data caching.
I work on a big project at a company. We collect data which we get via the API methods of the CMS.
ex.
DataSet users = CMS.UserHelper.GetLoggedUser(); // returns dataset with users
Now on some pages we need many different kinds of data, not just users - also nodes of the CMS tree or specific data of a subtree.
So we thought of writing our own "helper class" through which we can later get different data easily.
ex:
MyHelperClass.GetUsers();
MyHelperClass.Objects.GetSingleObject( ID );
Now the problem is that our "helper class" is really big, and we would like to collect different data from it and write them into a typed DataSet. Later we can give a Repeater that typed DataSet, which contains data from different tables (which even comes from the methods I mentioned before, via the API).
Problem is: it is so slow now, even when just loading the page! Does it load or initialize the whole class?
By the way CMS is Kentico if anyone works with it.
I'm tired; I tried the whole night, but it's so slow. Please take a look at the architecture.
Maybe you'll find some crimes that aren't allowed :S
I hope we can get it working faster. Thank you.
(Class diagram: http://img705.imageshack.us/img705/3087/classj.jpg)
Bottlenecks usually come in a few forms:
A slow or flaky network.
Heavy reading/writing to disk, as disk IO is thousands of times slower than reading from or writing to memory.
CPU throttling caused by a long-running or inefficiently implemented algorithm.
Lots of things could affect this, including your database queries and indexes, the number of people accessing your site, lack of memory on your web server, lots of reflection in your code, just plain slow hardware etc. No one here can tell you why your site is slow, you need to profile it.
For what it's worth, you asked a question about your API architecture - from a code point of view, it looks fine. There's nothing wrong with copying fields from one class to another, and the performance penalty incurred by the wrapper class casting from object to Guid or bool is likely to be so tiny that it's negligible.
Since you asked about performance, it's not very clear why you're connecting class architecture to performance. There are really, really tiny micro-optimizations you could apply to your classes which may or may not affect performance - but the four or five nanoseconds you'd gain with those micro-optimizations have already been lost simply by reading this answer. Network latency and DB queries will absolutely dwarf the performance subtleties of your API.
In a comment, you stated "so there is no problem with static classes or a basic mistake of me". Performance-wise, no. From a web-app point of view, probably. In particular, static fields are global and initialized once per AppDomain, not per session - the variables mCurrentCultureCode and mcurrentSiteName sound session-specific, not global to the AppDomain. I'd double-check those to make sure your site renders correctly when users with different culture settings access it at the same time.
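To illustrate that hazard (the class name and usage below are a sketch; only the field name comes from the diagram): a static field is shared by every request in the AppDomain, so two concurrent users with different cultures overwrite each other, while per-user state kept in the session does not:

```csharp
using System.Web;

public static class SiteContext
{
    // BUG in a web app: shared by ALL users in the AppDomain, so user A's
    // culture can overwrite user B's in the middle of a request.
    public static string mCurrentCultureCode;
}

public class CultureExample
{
    public void SetCulture(string code)
    {
        // Safer: per-user state lives in the session (or is re-derived per request).
        HttpContext.Current.Session["CultureCode"] = code;
    }
}
```

The general rule: statics for data that is truly the same for everyone; Session (or request context) for anything that varies per user.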
Are you already using Caching and Session state?
The basic idea being to defer as much of the data loading to these storage mediums as possible and not do it on individual page loads. Caching especially can be useful if you only need to get the data once and want to share it between users and over time.
If you are already doing these things, or can't directly implement them, try deferring as much of this data gathering as possible, opting to short-circuit it rather than doing the loading up front. If the data is only occasionally used, this can also save you a lot of time in page loads.
I suggest you try to profile your application and see where the bottlenecks are:
Slow load from the DB?
Slow network traffic?
Slow rendering?
Too much traffic for the client?
Profiling should be part of almost every senior programmer's general toolbox. Learn it, and you'll have the answers yourself.
Cheers!
First things first... enable tracing for your application, try to optimize the response size, use caching, and work with some application and DB profilers. Just by looking at the code, I'm afraid no one will be able to help you better.