I am building a database on SQL Server.
This DB is going to be really huge.
However, there are a few tables that are quite small and need to be queried very frequently.
Is there a way to cache these tables in RAM for faster querying?
Any ideas or links on making database inserts and queries faster would be highly appreciated.
Also, do I get any performance boost if I migrate from SQL Server Express to SQL Server Enterprise?
Thanks in advance.
SQL Server will do an outstanding job of keeping small tables that are frequently accessed in RAM.
However, a small frequently accessed table does sound like a good candidate for caching at the application layer to avoid ever hitting the database.
If your database really is "huge", you will hit the 1GB RAM limit of SQL Express (and/or the 10GB per DB storage limitation) and will want an edition that does not have that constraint.
http://msdn.microsoft.com/en-us/library/cc645993(v=SQL.110).aspx
You can read the data from the table and store it in a DataTable variable.
You should also create suitable indexes to make the queries faster.
If you are working with C#, you may want to try data caching.
You just need to follow 3 steps:
Fetch your result into a list.
Cache the list.
Whenever you need to query the cached result, cast the cache object to the corresponding list type.
Following is the example code:
List<type> result = (Linq-query).ToList();   // placeholder query; run it once
Cache["resultSet"] = result;                 // store the list that was just fetched
List<type> cachedList = (List<type>)Cache["resultSet"];
Now you can run Linq queries over cachedList, which uses the cached object instead of hitting the database.
Note: For finer control over caching, you can use the Insert overload below, which lets you set a dependency, expiration, priority and removal callback.
Cache cache = HttpRuntime.Cache;   // use the application's cache rather than constructing a new Cache
cache.Insert("Key", value, dependencies, absoluteExpiration, slidingExpiration, priority, onRemoveCallback);
The more often a page is used by queries, the more likely it is to stay in memory. Note that this works at the page level rather than the table level. Every time a page is referenced its use count is increased, and a background process (the lazy writer) periodically decreases the count for all pages. When SQL Server needs to bring a new page into memory, it evicts the pages with the lowest counts (writing them to disk first if they are dirty). So if your table's pages are accessed frequently, their counts will be high and they will stay in memory longer. However, if you run a big query that reads more data from other tables than fits in memory, even those pages might be thrown out of the cache. If you do not have queries like that, chances are high that the pages will stay in memory.
This also assumes that the same pages are accessed many times. If different processes read different pages of the same table, not all of its pages will have a high use count, and some of them could be evicted.
Read the blog below for more details on how the buffer pool works.
http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/bufferpool-performance-counters.aspx
Depending on how often these small tables are changed, Query Notifications might be a good option. Essentially, you subscribe your application to changes in a data set in the db. A canonical example is a list of vendors. Doesn't change much over time but you want the application to know when it does change.
I have a table with a lot of rows (3 million) from which I need to query some rows at several points in my app. The way I found to do this is querying all the data the first time any of it is needed and storing it in a static DataTable with SqlDataAdapter.Fill() for the rest of the app's life.
That's fast, because then when I need something I use DataTable.Select("some query") and the app processes the info just fine.
The problem is that this table takes about 800MB of RAM, and I have to run this app on PCs where that might be too much.
The other way I thought of was to query the data I need each time. This takes little memory but performs poorly (a lot of queries to the database, which is at a network address, and with 1000 queries you start to notice the ping and all that).
Is there any intermediate point between performance and memory usage?
EDIT: What I'm retrieving are sales, which have a date, a product and a quantity. I query by product, and it isn't indexed that way. But anyways, making 1000 queries, even if the query took 0.05s, a 0.2s ping makes a total of 200 seconds...
First, talk to the DBA about performance.
If you are downloading the entire table you might actually be putting more load on the network and SQL than if you performed individual queries.
As a DBA, if I knew you were downloading an entire large table I would put an index on product immediately.
Why are you performing 1000s of queries?
If you are looking for sales when a product is created, then a cache is problematic: you would not yet have the sales data. The problem with a cache is stale data. If you know the data will not change (you either have it or you don't), you can eliminate the concern about stale data.
There is something between sequential and simultaneous: you can pack multiple selects into a single request. This makes a single round trip and is more efficient.
select * from tableA where ....;
select * from tableB where ....;
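With a DataReader, just call SqlDataReader.NextResult() to move to the next result set:
using (SqlDataReader rdr = cmd.ExecuteReader())
{
    while (rdr.Read())
    {
        // rows from the first select
    }
    rdr.NextResult();   // advance to the second result set
    while (rdr.Read())
    {
        // rows from the second select
    }
}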
Pretty sure you can do the same type of thing with multiple DataTables in a DataSet.
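To put rough numbers on why batching helps, here is a small cost model using the figures from the question (0.05s per query, 0.2s ping). It is only a sketch: RoundTripModel and its parameters are made-up names, and it assumes the ping is paid once per request and that the queries themselves take the same time whether batched or not.

```csharp
using System;

public static class RoundTripModel
{
    // Estimated wall-clock seconds for running `queries` statements,
    // sent `queriesPerRequest` at a time (assumption: one ping per request).
    public static double TotalSeconds(int queries, int queriesPerRequest,
                                      double querySeconds, double pingSeconds)
    {
        // Ceiling division: number of network round trips needed.
        int roundTrips = (queries + queriesPerRequest - 1) / queriesPerRequest;
        return roundTrips * pingSeconds + queries * querySeconds;
    }
}
```

With these assumptions, 1000 queries sent one at a time cost about 250 seconds, while the same 1000 queries packed 100 per request cost about 52 seconds, because only 10 pings are paid.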
Another option is LocalDB. It is targeted at developers, but for what you are doing it would work just fine. You get DataTable speed without the memory concerns, and you can even put an index on ProductID. It will take a little longer to write to disk than to memory, but you are not using up memory.
Then there is the ever-evil with (nolock). Know what you are doing if you use it. I am not going to go into all the possible evils, but I can tell you that I use it a lot.
The question boils down to memory vs. performance. The answer to that is caching.
If you know what your usage pattern would be like, then one thing you can do is to create a local cache in the app.
The extreme cases are - your cache size is 800MB with all your data in it (thereby sacrificing memory) - OR - your cache size is 0MB and all your queries go to network (thereby sacrificing performance).
Three important questions about the design of the cache are answered below.
How to populate the Cache?
If you are likely to make some query multiple times, store it in cache and before going to network, test if your cache already has the result. If it doesn't, query the database and then store the result in the cache.
If after querying for some data, you are likely to query the next and/or previous piece of data, then query all of it once and cache it so that when you query the next piece, you already have it in cache.
Effectively the idea is that if you know some information may be needed in future, cache it beforehand.
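The "check the cache first, then go to the network" idea could be sketched like this. ReadThroughCache and its loader delegate are hypothetical names for illustration; the loader stands in for whatever code actually queries the database, and the class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-through cache; not thread-safe, for illustration only.
public sealed class ReadThroughCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> _loadFromDatabase;

    public ReadThroughCache(Func<TKey, TValue> loadFromDatabase)
    {
        _loadFromDatabase = loadFromDatabase;
    }

    // How many times the loader (the network round trip) was actually invoked.
    public int LoadCount { get; private set; }

    public TValue Get(TKey key)
    {
        TValue value;
        if (_items.TryGetValue(key, out value))
            return value;               // cache hit: no round trip

        value = _loadFromDatabase(key); // cache miss: query once, then remember
        LoadCount++;
        _items[key] = value;
        return value;
    }
}
```

Asking for the same product twice only calls the loader once. This version never frees anything, so it would need an eviction policy on top.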
How to free the Cache?
You can free the cache either actively or passively.
Passively: whenever the cache is full, evict data from it to make room.
Actively: run a background thread at a regular interval that takes care of removal for you.
One hybrid method is to run a freeing thread as soon as you reach, let's say, 80% of your memory limit and then free whatever memory you can.
What data to remove from the Cache?
This has already been answered in the context of page replacement policies for operating systems.
For completeness, I'll summarize the important ones here:
Evict the Least Recently Used data (if it is not likely to be used);
Evict the data that was brought in earliest (if the earliest data is not likely to be used);
Evict the data that was brought in latest (if you think that the newly brought in data is least likely to be used).
Automatically remove the data that is older than t time units.
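As an illustration of the first policy (Least Recently Used) combined with passive freeing, here is a minimal sketch. LruCache is a made-up class, the capacity is counted in entries rather than bytes for simplicity, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical LRU cache: evicts the least recently used entry when full.
public sealed class LruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _index
        = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Most recently used entry is at the front, least recently used at the back.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order
        = new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruCache(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
    }

    public int Count { get { return _index.Count; } }

    public bool TryGet(TKey key, out TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (!_index.TryGetValue(key, out node))
        {
            value = default(TValue);
            return false;
        }
        _order.Remove(node);      // a hit makes the entry the most recently used
        _order.AddFirst(node);
        value = node.Value.Value;
        return true;
    }

    public void Put(TKey key, TValue value)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> existing;
        if (_index.TryGetValue(key, out existing))
        {
            _order.Remove(existing);          // replacing an existing entry
            _index.Remove(key);
        }
        else if (_index.Count >= _capacity)
        {
            var oldest = _order.Last;         // cache is full: evict the LRU entry
            _order.RemoveLast();
            _index.Remove(oldest.Value.Key);
        }
        _index[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
    }
}
```

Swapping the eviction line to remove from the front instead, or tracking insertion order without moving entries on a hit, would give the "latest in" and "earliest in" policies respectively.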
RE: "I can't index by anything because I'm not the database admin nor can ask for that."
Can you prepopulate a temp table and index that? E.g.:
Select * into #MyTempTable from BigHugeTable
Create Index Prodidx on #MyTempTable (product)
You will have to ensure you always reuse the same connection (and it isn't closed) in order to use the temp table.
I have noticed that our web application queries a particular table an enormous number of times. The table is relatively small, with only about a hundred rows that are used.
I'm wondering if there is a way to store this table once every 15 minutes or so in memory in the website application, so the system doesn't have to make so many queries to get the same information over and over again. This would be available across many different users.
The table is the Client table, so users login from many different clients. The data is pretty static, probably getting updated perhaps once a day.
Updates: SQL profiler is showing the query is run quite a bit, so that's what concerns me. The website is not notably slow. I just thought this could help make it even faster.
If the table is small and frequently queried, there is a very good chance that the data and any indices are entirely in SQL Server's memory, the query plan is cached, and the query will be extremely fast.
Measure the actual performance impact before making any changes.
If you see there is a performance impact, there are many caching strategies that you can use to reduce trips to the database. More information about access patterns to the table and the need for information consistency would be needed to recommend a specific caching strategy.
You state
to get the same information over and over again
but also state
once every 15 minutes
If the information really is the same over and over, you can load it once into the ASP.Net cache at application start. If it might change every so often, but it is OK for the data to be a little out-of-date for a given user, you can use a time-based cache expiration policy. If the data changes only every so often but must be up-to-date immediately after it changes, you can consider a SQL Dependency for cache expiration.
For more information on ASP.Net caching see
http://msdn.microsoft.com/en-us/library/xsbfdd8c(v=vs.100).aspx
and specifically
http://msdn.microsoft.com/en-us/library/6hbbsfk6(v=vs.100).aspx
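To show what a time-based expiration policy amounts to, here is a minimal sketch that does not depend on the ASP.Net cache at all. ExpiringValue is a hypothetical helper, the loader stands in for the query against the Client table, and the optional clock parameter exists only so the behaviour can be tested without waiting 15 minutes.

```csharp
using System;

// Hypothetical holder for one cached value that is reloaded after a time-to-live.
public sealed class ExpiringValue<T>
{
    private readonly Func<T> _load;
    private readonly TimeSpan _timeToLive;
    private readonly Func<DateTime> _utcNow;
    private readonly object _gate = new object();
    private T _value;
    private DateTime _expiresAtUtc = DateTime.MinValue; // forces a load on first use

    public ExpiringValue(Func<T> load, TimeSpan timeToLive, Func<DateTime> utcNow = null)
    {
        if (utcNow == null) utcNow = () => DateTime.UtcNow;
        _load = load;
        _timeToLive = timeToLive;
        _utcNow = utcNow;
    }

    public T Get()
    {
        lock (_gate) // one shared instance serves all users, so guard the reload
        {
            DateTime now = _utcNow();
            if (now >= _expiresAtUtc)
            {
                _value = _load();                  // stale or never loaded: hit the database
                _expiresAtUtc = now + _timeToLive;
            }
            return _value;
        }
    }
}
```

A single static instance created with TimeSpan.FromMinutes(15) would give the "once every 15 minutes" behaviour from the question. In a real ASP.Net application the built-in cache with an absolute expiration does the same job.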
My suggestion would be to create a WCF Windows service. Using REST, you could easily cache the data read from the SqlDataReader (or other DataReader) and implement a TTL to re-query at an interval.
Well, there are a few solutions.
If you want to load the data into memory every 15 minutes, you should use one of the .NET caching libraries, for example the caching built into .NET (System.Runtime.Caching), where you can set expiration policies and other options.
You could try to optimize your query with nonclustered indexes.
You could use AppFabric caching, or something similar.
And last, try adding more memory to the SQL Server machine.
I want to create a .Net application that uses a database that will contain around 700 million records in one of its tables. I wonder if the performance of SQLite would satisfy this scenario or should I use SQL Server. I like the portability that SQLite gives me.
Go for SQL Server for sure. 700 million records in SQLite is too much.
With SQLite you have the following limitations:
Single process write.
No mirroring
No replication
Check out this thread: What are the performance characteristics of sqlite with very large database files?
700m is a lot.
To give you an idea. Let's say your record size was 4 bytes (essentially storing a single value), then your DB is going to be over 2GB. If your record size is something closer to 100 bytes then it's closer to 65GB... (that's not including space used by indexes, and transaction log files, etc).
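The arithmetic behind those figures can be checked with a few lines. SizeEstimate is a made-up helper, and it only counts raw row data (no indexes, per-row overhead or log files), reported in binary gigabytes.

```csharp
using System;

public static class SizeEstimate
{
    // Raw data size in bytes: rows times bytes per row, nothing else.
    public static long RawBytes(long rows, int bytesPerRow)
    {
        return rows * bytesPerRow;
    }

    // Convert bytes to binary gigabytes (1 GB taken as 1024^3 bytes).
    public static double ToGigabytes(long bytes)
    {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

For 700 million rows this gives 2,800,000,000 bytes (about 2.6GB) at 4 bytes per record and 70,000,000,000 bytes (about 65.2GB) at 100 bytes per record.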
We do a lot of work with large databases and I'd never consider SQLite for anything of that size. Quite frankly, "portability" is the least of your concerns here. In order to query a DB of that size with any sort of responsiveness you will need an appropriately sized database server. I'd start with 32GB of RAM and fast drives.
If it's write-heavy (90%+), you might get away with less RAM. If it's read-heavy then you will want to build it out so that the machine can load as much of the DB (or at least the indexes) into RAM as possible. Otherwise you'll be dependent on disk spindle speeds.
SQLite SHOULD be able to handle this much data. However, you may have to configure it to allow it to grow to this size, and you shouldn't have this much data in an "in-memory" instance of SQLite, just on general principles.
For more detail, see this page, which explains the practical limits of the SQLite engine. The relevant config settings are the page size (at most 64KB) and the page count (up to a 32-bit signed int's max value of approx 2.1 billion). Do the math, and the entire database can take up roughly 140TB. A database consisting of a single table with 700m rows would be on the order of tens of gigs; easily manageable.
However, just because SQLite CAN store that much data doesn't mean you SHOULD. The biggest drawback of SQLite for large datastores is that the SQLite code runs as part of your process, using the thread on which it's called and taking up memory in your sandbox. You don't get the tools that are available in server-oriented DBMSes to "divide and conquer" large queries or datastores, like replication/clustering. In dealing with a large table like this, insertion/deletion will take a very long time to put it in the right place and update all the indexes. Selection MAY be livable, but only in indexed queries; a page or table scan will absolutely kill you.
I've had tables with similar record counts and no problems retrieval-wise.
For starters, look at the hardware and the memory allocated to the server. See this for examples: http://www.sqlservercentral.com/blogs/glennberry/2009/10/29/suggested-max-memory-settings-for-sql-server-2005_2F00_2008/
Regardless of size or number of records as long as you:
create indexes on foreign key(s),
store common queries in Views (http://en.wikipedia.org/wiki/View_%28database%29),
and maintain the database and tables regularly
you should be fine. Also, setting the proper column type/size for each column will help.
We currently use List<T> to store events from a simulation project we are running. We need to optimise memory utilisation and the time it takes to process the events in order to derive certain key metrics.
We thought of moving the event log to a SQL Server Compact database table and then possibly use Linq to calculate the metrics. From your experience do you think it will be faster to use SQL Server Compact than C#'s built-in data structures or are we going to have issues?
Some ideas.
MSMQ (Microsoft Message Queue)
You can have a thread dequeueing off of MSMQ and updating metrics on the fly. If you need to store these events for later perusal you can put them into the database as you dequeue them. MSMQ demonstrates much better scalability in these scenarios - especially when the publisher and subscriber have asymmetric processing speeds and binary data is being used (as SQL can get bogged down with allocating space for VARBINARY, or allocating/splitting pages for indexes).
The two other SQL scenarios are complementary to this one - you can still use dequeueing to insert into SQL, to avoid any hiccups in your simulation while SQL allocates space.
You can side-step what @Aliostad said using this one, to a certain degree.
OLAP (Online Analytical Processing)
Sounds like you might benefit from OLAP (cubes etc.). This will increase the overall runtime of your simulation but will improve the value of the data. Unfortunately this means forking out cash for one of the bigger SQL editions.
Stored Procedures
While Linq-to-SQL is great for 'your average developer', please keep away from it in scientific projects. There are a host of great tricks you can use in raw TSQL, in addition to being able to inspect the query plan. If you want the best possible performance, plan your DB carefully and create stored procedures/UDFs to aggregate your data.
If you can only calculate some of the metrics in C#, do as much work in SQL before-hand - and then feel free to use Linq-to-SQL to grab the data.
Also remember that if you are inserting off the end of an MSMQ queue you can index aggressively, which will speed up your metric calculations without impacting your simulation.
I would only involve SQL if there is a real need for better memory utilization (i.e. you are actually running out of it).
Memory Mapped Files
This allows you to offset memory pressure onto disk; at a performance penalty if it needs to be 'paged' back in.
Overall
I would steer clear of Linq to define basic metrics - do it in SQL. MSMQ is without a doubt a huge winner in this case. Don't overcomplicate the memory issue and keep it in .Net if you are not running out of memory.
If you need to process all of the events, a C# List<> will be faster than SQL Server. A plain array (T[]) will perform better still, especially if the elements are structs and not classes, since structs are stored inline in the array whereas class instances are only referenced from it. Having the structs within the array reduces garbage collection and increases cache locality.
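A minimal sketch of what that looks like. EventRecord and EventScan are invented for illustration, since the question does not say what an event contains.

```csharp
// Hypothetical event layout: a struct, so an EventRecord[] stores the
// values contiguously instead of holding references to heap objects.
public struct EventRecord
{
    public int Kind;
    public double DurationMs;
}

public static class EventScan
{
    // Sums the duration of all events of one kind in a single linear pass.
    public static double TotalDuration(EventRecord[] events, int kind)
    {
        double total = 0;
        for (int i = 0; i < events.Length; i++)
        {
            if (events[i].Kind == kind)
                total += events[i].DurationMs;
        }
        return total;
    }
}
```

Whether this is measurably faster than a List of class instances depends on the event size and access pattern, so it is worth profiling both.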
If you only need to process part of the events, I think the solutions rank in this order when it comes to speed:
C# data structures, crafted especially for your needs.
Sql Server
Naive C# data structures, traversing a list searching for the right elements.
It sounds like you're thinking you need to have them in a database in order to use Linq. This isn't the case. You can use Linq with C#'s built-in structures.
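For example, a metric can be computed straight from an in-memory list with Linq-to-Objects. SimEvent and the "average duration per kind" metric are assumptions made up for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type; the real simulation events will differ.
public sealed class SimEvent
{
    public string Kind { get; set; }
    public double DurationMs { get; set; }
}

public static class EventMetrics
{
    // Groups the in-memory events by kind and averages their duration.
    public static Dictionary<string, double> AverageDurationByKind(IEnumerable<SimEvent> events)
    {
        return events
            .GroupBy(e => e.Kind)
            .ToDictionary(g => g.Key, g => g.Average(e => e.DurationMs));
    }
}
```

No database is involved: the same query operators work on a List<T>, an array or any other IEnumerable<T>.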
It depends on what you mean by "faster". If this is about data access performance, it all comes down to how much data you have. For large data sets used only for statistical purposes, a DB solution is definitely a good choice.
As the DB for this kind of purpose I would suggest SQLite: it is a single-file, fully ACID-compliant DB (no service needed, like SQL Server Compact). But again, this depends on your data size, as SQLite's size limits are lower than SQL Server's.
Regards.
We need to optimise memory utilisation
Use Sql-Server-CE
the time it takes to process the events
Use Linq-To-Objects.
These two objectives are conflicting and you need to choose one that matters more to you.
I'm attempting to create Data Access Layer for my web application. Currently, all datatables are stored in the session. When I am finished the DAL will populate and return datatables. Is it a good idea to store the returned datatables in the session? A distributed/shared cache? Or just ping the database each time? Note: generally the number of rows in the datatable will be small < 2000.
Additional info:
Almost none of the data is shared. The parameters that are sent to the SQL queries are chosen by the user. The parameter values available to the user are based on who the user is. In most cases it is impossible for two users to run the same sql queries. However, the same user can run the same query more than once.
More info:
Number of concurrent users ~50,000
Important info:
In 99% of the cases no two users will have the same data/queries, however, the same user may run the same query/get the same data multiple times.
Thanks
Storing the data in session is not a good idea because:
Every user gets a separate copy of the same data - enormous waste of server memory.
IIS will recycle a session if you fill it with too much data.
I recommend storing the data tables in Cache, and also populating each table only when first requested rather than all at once. That way, if IIS starts reclaiming space in the cache, your code won't be affected.
Very simple example of fetching on demand:
T GetCached<T>(string cacheKey, Func<T> getDirect) {
    // Cache is indexed by key in C#; a missing or evicted entry returns null.
    object value = HttpContext.Current.Cache[cacheKey];
    if (value == null) {
        value = getDirect();   // cache miss: fetch from the database
        HttpContext.Current.Cache.Insert(cacheKey, value);
    }
    return (T) value;
}
EDIT: - Question Update
Cache vs local Session - Local session state is all-or-nothing. If it gets too full, IIS will recycle everything in it. By contrast, cache items are dropped individually when memory gets too low, so it's much less of a problem.
Cache vs Session state server - I don't have any data to back this up, so please say so if I've got this wrong, but I would have thought that caching the data independently in memory in each physical server AppDomain would scale better than storing it in a shared session state service.
The first thing I would say is: a cache is not mandatory everywhere. You should use it wisely, especially on bottlenecks related to data access.
I don't think it's a good idea to store 1000 different datatables with 2000 records anywhere. If queries are so dynamic that having the same query in a short period of time is the exception then cache doesn't seem a good option.
As for a distributed cache option, I suggest you check http://memcached.org, a distributed cache used by many big projects around the world.
I know Velocity is near, but as far as I know it needs Windows Server 2008 and it's still very, very new. Normally Microsoft products are good from version 2.0 :-)
Store lookups/dictionaries - and items that your app would require very frequently in Application or Cache object; query database for data that depends upon the user role.
--EDIT--
This is in response to your comment.
Usually in any data-oriented system, the queries revolve around the fact tables (or tables that are inevitable to query). Assuming you do have a set of such inevitable tables, you can use Cache.Insert() as follows:
Load the inevitable tables on app startup;
Load most queried tables in Cache upon table request-basis;
Query database for least queried tables.
If you do not have any performance issues then let SQL handle everything.
Storing that amount of data in the Session is a very bad idea. Each user will get their own version!
If this is shared data (same for all users), consider moving it to the Application object.