Imagine the following:
I have a table of 57,000 items that I regularly use in my application to figure out things like targeting groups etc.
Instead of querying the database 300,000 times a day for a table that hardly ever changes its data, is there a way to store its information in my application and query the data in memory directly? Or do I have to create some sort of custom datatype for each row and iterate through, testing each row, to check for the results I want?
After some googling, the closest thing I could find is an in-memory database.
Thank you,
- theo
SQLite supports in-memory databases.
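As a rough illustration of the in-memory idea, here is a minimal sketch using Python's standard `sqlite3` module, which opens an in-memory database when given the special name `:memory:`. The table layout, column names, and sample rows are hypothetical stand-ins for the real 57,000-row table.

```python
import sqlite3

def build_cache(rows):
    """Copy rows into an in-memory SQLite database and index the lookup column."""
    conn = sqlite3.connect(":memory:")  # lives only as long as this connection
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, target_group TEXT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    # An index on the column used for filtering keeps lookups fast.
    conn.execute("CREATE INDEX idx_items_group ON items (target_group)")
    conn.commit()
    return conn

def items_in_group(conn, group):
    """Return (id, name) pairs for one targeting group, ordered by id."""
    cur = conn.execute(
        "SELECT id, name FROM items WHERE target_group = ? ORDER BY id", (group,)
    )
    return cur.fetchall()

# Hypothetical sample data; in practice this would be read once from the real database.
sample_rows = [(i, f"item-{i}", "even" if i % 2 == 0 else "odd") for i in range(1, 11)]
cache = build_cache(sample_rows)
```

Because the table rarely changes, the cache could simply be rebuilt whenever the source data is updated.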
For 57,000 items that you will be querying against (and want immediately available), I would not recommend implementing just simple caching. For that many items I'd recommend either a distributed memory cache (even if it's only on one machine) such as Memcache or Velocity, or going with your initial idea of using an in-memory database.
Also, if you use a full-fledged ORM such as NHibernate, you can set it up to use clients for the distributed caching tools with almost no work. Many of the major clients have NHibernate implementations, including Memcache, Velocity and some others. This might be a better solution, because the cache then holds only the data that is actually being used rather than all the data that might be needed.
Read up on Caching
It sounds like this is application level data rather than user level, so you should look into "Caching Application Data"
Here are some samples of caching datatables
If you only need to find rows using the same key all the time, a simple Dictionary<Key, Value> could very well be all that you need. To me, 57,000 items doesn't really sound like that much unless each row contains a huge amount of data. However, if you need to search by different columns, an in-memory database is most likely the way to go.
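The dictionary approach can be sketched as follows. This is an illustration in Python rather than .NET, and the `Item` type and its fields are hypothetical. It also shows why searching by several columns pushes you toward a database: each extra search column needs its own index that you build and maintain yourself.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    # Hypothetical row shape; the real table's columns are not given in the thread.
    id: int
    name: str
    target_group: str

def index_by_id(items):
    """Primary lookup: one dictionary entry per row, keyed by the row's id."""
    return {item.id: item for item in items}

def index_by_group(items):
    """Secondary lookup: a separate dictionary is needed for every extra search column."""
    groups = defaultdict(list)
    for item in items:
        groups[item.target_group].append(item)
    return groups

items = [Item(1, "alpha", "A"), Item(2, "beta", "B"), Item(3, "gamma", "A")]
by_id = index_by_id(items)
by_group = index_by_group(items)
```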
Related
We have a fairly busy distributed cloud based system that we want to introduce basic profiling into. Firstly we'd like to monitor web page render times and DB calls - we use EF and SQLServer.
The question is: what is the best (most performant and easiest) way to record this information? My first thought is to store it in a DB, but would this cause a performance issue when a single page render may require multiple DB calls and hence multiple inserts into the performance table?
Would it be better to store this information in memory only, or perhaps store in memory short term then batch persist to a DB later? Or is some other approach recommended?
If you simply send INSERT statements to the database without block-waiting to receive values of identity columns, it should be fairly lightweight for the web server.
If the database table does not have any keys (or only a clustered key), it should be fairly lightweight on the server, too.
This would certainly be the easiest approach, and it would not take much to implement it and give it a try, so I would recommend checking whether it covers your needs before trying anything else.
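If plain inserts turn out to be too chatty, the "buffer in memory, then batch persist" alternative raised in the question could look roughly like this. It is a sketch only: SQLite stands in for SQL Server, the `timings` table and `TimingBuffer` class are hypothetical, and a real web application would also need locking around the buffer and a flush on shutdown.

```python
import sqlite3
import time

class TimingBuffer:
    """Collect timing records in memory and write them in batches."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS timings "
            "(label TEXT, elapsed_ms REAL, recorded_at REAL)"
        )

    def record(self, label, elapsed_ms):
        """Queue one measurement; flush automatically once the batch is full."""
        self.pending.append((label, elapsed_ms, time.time()))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all queued measurements in one round trip."""
        if not self.pending:
            return
        self.conn.executemany("INSERT INTO timings VALUES (?, ?, ?)", self.pending)
        self.conn.commit()
        self.pending.clear()

conn = sqlite3.connect(":memory:")
timings = TimingBuffer(conn, batch_size=3)
for i in range(7):
    timings.record(f"page-{i}", 12.5)
```

The trade-off is that anything still sitting in `pending` is lost if the process dies before the next flush.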
I'm currently using embedded SQLite to store relatively big lists of data (starting from 100,000 rows per table). Queries include only:
paging
sorting by a field
The amount of data in a row is relatively small. Performance is really bad, especially for the first query, which is critical for my application. All kinds of tuning and pre-caching have already been tried and have reached their practical limit.
Is there an alternative embedded data store library that can do these simple queries quickly and efficiently? There's no requirement for it to support SQL at all.
If it is (predominantly) read-only, consider using memory mapped views of a file.
Rolling your own indexes makes it possible to achieve maximum performance.
Obviously, rolling your own will also be the most work-intensive and error-prone option.
May I suggest a traditional RDBMS with good indexes, or perhaps a newfangled NoSQL-style DB that supports your workload?
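To give a feel for the memory-mapped approach, here is a minimal sketch that assumes read-only data stored as fixed-width records, pre-sorted by the field you page on. The record layout (`<I16s`: a 4-byte key plus a 16-byte name) and the function names are hypothetical. With fixed-width records, page N is just a byte range, so no index lookup is needed for that one sort order.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: unsigned 32-bit key followed by a 16-byte, null-padded name.
RECORD = struct.Struct("<I16s")

def write_sorted(path, rows):
    """Write rows sorted by key so that each page is a contiguous byte range."""
    with open(path, "wb") as f:
        for key, name in sorted(rows):
            f.write(RECORD.pack(key, name.encode("utf-8")))

def read_page(path, page, page_size):
    """Return one page of (key, name) rows straight from the mapped file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        total = len(view) // RECORD.size
        start = page * page_size
        end = min(start + page_size, total)
        rows = []
        for i in range(start, end):
            key, raw = RECORD.unpack_from(view, i * RECORD.size)
            rows.append((key, raw.rstrip(b"\0").decode("utf-8")))
        return rows

path = os.path.join(tempfile.mkdtemp(), "rows.bin")
write_sorted(path, [(30, "carol"), (10, "alice"), (20, "bob"), (40, "dave"), (50, "erin")])
```

Sorting by a second field would need a second file (or an offset table) in that order, which is where the work and the room for error start to grow.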
You can try Lucene.Net. It is very fast, does not require any installation, and supports paging, sorting by fields and much more.
http://incubator.apache.org/lucene.net/
With Simple Lucene wrapper it is also quite easy to use: http://blogs.planetcloud.co.uk/mygreatdiscovery/post/SimpleLucene-e28093-Lucenenet-made-easy.aspx
We have a reporting tool that grabs a large number of records; at times it can be 1 million records. We have been storing this in a DataTable. I wanted to know if there is a better object to store this in. I would need to be able to aggregate the data in various ways.
Update:
Yes. Personally, I believe we should not be getting that many records. This is not the direction I want to go.
Also I am using Oracle
Update Update
Sorry for the delay, but there are always fires to put out here. The main issue was that they were running out of memory and getting memory errors. They had issues with the DataTable being released from memory and also with binding it to a DataGridView. I guess what I was looking for was a lighter-weight object that wouldn't take as much space.
After thinking about it a little more, it really doesn't make any sense to get that much data, as diagonalbatman mentioned. Furthermore, if just a few people using it already run into these issues, how is it going to scale?
Unfortunately, I have a boss who doesn't listen and an offshore team with too much of a "yes sir" attitude. They are serializing the raw data (as an XML file) and releasing the raw-data DataTable, which I think is not a good direction at all.
#diagonalbatman - Out of curiosity, do you have an example of this?
Why do you need to draw down 1 million records into your app?
Can you not do your reporting consolidation / aggregation on the DB? This would make better use of the DB's resources (after all, this is what an RDBMS is designed to do), and you could then focus your app on working with smaller, consolidated sets.
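As an example of what pushing the aggregation into the database means, here is a small sketch. SQLite is used only so the snippet is self-contained (the question uses Oracle, where the same `GROUP BY` query shape applies), and the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 10.0),
        ("north", "gadget", 5.0),
        ("south", "widget", 7.5),
        ("south", "widget", 2.5),
    ],
)

def totals_by_region(conn):
    """Let the database group and sum; only one row per region reaches the app."""
    cur = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM sales "
        "GROUP BY region ORDER BY region"
    )
    return cur.fetchall()
```

With a million source rows the application would still receive only as many rows as there are groups, which is what keeps the memory footprint small.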
I would recommend you try several options to verify, especially in light of your need to aggregate the data in various ways.
1) Can it be aggregated by proper queries on the data side? This is likely the best solution.
2) If you use POCOs, does LINQ improve upon your current memory and performance characteristics? Does LINQ allow you to do the aggregation you require?
Measure the characteristics you care about and try different options.
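For the second option, the general shape of aggregating lightweight objects in application code might look like the sketch below. It is written in Python rather than LINQ, and the `Sale` type and sample values are hypothetical. The point it illustrates is that rows can be consumed one at a time as they arrive, so only the running totals stay in memory.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple

class Sale(NamedTuple):
    # Hypothetical lightweight record, standing in for a POCO.
    region: str
    amount: float

def sum_by_region(sales: Iterable[Sale]) -> dict:
    """Aggregate in a single pass without holding every record at once."""
    totals = defaultdict(float)
    for sale in sales:
        totals[sale.region] += sale.amount
    return dict(totals)

def sale_stream() -> Iterator[Sale]:
    """Stand-in for a data reader that yields rows one at a time."""
    yield Sale("north", 10.0)
    yield Sale("south", 7.5)
    yield Sale("north", 5.0)
    yield Sale("south", 2.5)

totals = sum_by_region(sale_stream())
```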
What you want are Data Cubes. Depending on the type of database you have, you should look at building some Cubes.
I have an application which needs to keep data from DB in memory.
There are 5-6 tables with very few rows. The tables are updated very rarely, and since the application needs this data very frequently, I would like to avoid querying the DB on each action.
I am using Entity Framework 4 (LINQ to Entities), and it sends a request every time I query. I know it is possible to avoid that using ToList or similar, but I need info from those 6 tables and the queries use joins.
What would be the better solution?
The purpose of the query is to be executed. You can check whether EF Caching Wrapper solves the problem, but I don't think it will. The caching provider caches the actual query, so changing just the where condition is enough for it to be treated as a different query.
This should be done by loading your data into custom data structures (lists) and using Linq-to-objects on them.
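A rough sketch of that idea, in Python rather than LINQ-to-Objects: load the small tables once, keep them in plain collections, and do the joins in memory. The `ReferenceData` class, the loader callables, and the country/city tables are all hypothetical; in the real application the loaders would be the one-time EF queries.

```python
class ReferenceData:
    """Hold rarely-changing lookup tables in memory and join them locally."""

    def __init__(self, load_countries, load_cities):
        # The loaders are hypothetical callables that hit the database.
        self._load_countries = load_countries
        self._load_cities = load_cities
        self.reload()

    def reload(self):
        """Call this again whenever the underlying tables are known to have changed."""
        self.countries = {row["id"]: row for row in self._load_countries()}
        self.cities = list(self._load_cities())

    def cities_with_country(self):
        """In-memory join: no database round trip."""
        return [
            (city["name"], self.countries[city["country_id"]]["name"])
            for city in self.cities
        ]

calls = {"count": 0}  # counts simulated database hits

def load_countries():
    calls["count"] += 1
    return [{"id": 1, "name": "France"}, {"id": 2, "name": "Japan"}]

def load_cities():
    calls["count"] += 1
    return [{"name": "Paris", "country_id": 1}, {"name": "Osaka", "country_id": 2}]

cache = ReferenceData(load_countries, load_cities)
```

However many times `cities_with_country()` is called, the loaders run only once each until `reload()` is invoked.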
If you are joining that data to other data which is not a candidate for caching, I would suggest looking at your database engine's features. Most advanced SQL databases will already keep those tables in RAM. You will already be incurring network latency overhead when you issue the query for the non-cached data, and the database will already have an index in RAM as well. Unless you are talking about big rows, such as images, you would just be moving a small amount of processing from one place to another. Plus, in order to be as efficient as the SQL database, you not only need to work out how to cache the data, but also have to cache an index and write code to use and maintain it.
Still, in some use cases it would be a very useful thing to do.
In a desktop application, I need to store a 'database' of patient names with simple information, which can later be searched through. I'd expect on average around 1,000 patients total. Each patient will have to be linked to test results as well, although these can/will be stored separately from the patients themselves.
Is a database the best solution for this, or overkill? In general, we'll only be searching based on a patient's first/last name, or ID numbers. All data will be stored with the application, and not shared outside of it.
Any suggestions on the best method for keeping all such data organized? The method for storing the separate test data is what seems to stump me when not using databases, while keeping it linked to the patient.
Off the top of my head, given a List<Patient>, I can imagine several LINQ commands to make searching a breeze, although with a list of 1,000 - 10,000 patients, I'm unsure whether there are any performance concerns.
Use a database. Mainly because what you expect and what you get (especially over the long term) tend to be two totally different things.
This is completely unrelated to your question on a technical level, but are you doing this for a company in the United States? What kind of patient data are you storing?
Have you looked into HIPAA requirements and checked to see if you're a covered entity? Be sure that you're complying with all legal regulations and requirements!
I think 1,000 is too much to try to store in XML. I'd go with a simple DB type, like Access or SQLite. Yes, as a matter of fact, I'd probably use SQLite. SQL Server Express is probably overkill for it. http://sqlite.phxsoftware.com/ is the .NET provider.
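To show how a small embedded database keeps test results linked to patients, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, columns, and sample values are hypothetical; a .NET application would create the same schema through its own SQLite provider.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the two linked tables; pass a file path to persist the data."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the patient link
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS patients (
            id INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS test_results (
            id INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL REFERENCES patients(id),
            test_name TEXT NOT NULL,
            value TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS idx_patients_last_name ON patients (last_name);
        CREATE INDEX IF NOT EXISTS idx_results_patient ON test_results (patient_id);
        """
    )
    return conn

def add_patient(conn, first_name, last_name):
    cur = conn.execute(
        "INSERT INTO patients (first_name, last_name) VALUES (?, ?)",
        (first_name, last_name),
    )
    return cur.lastrowid  # the new patient's id

def add_result(conn, patient_id, test_name, value):
    conn.execute(
        "INSERT INTO test_results (patient_id, test_name, value) VALUES (?, ?, ?)",
        (patient_id, test_name, value),
    )

def results_for_last_name(conn, last_name):
    """Search by last name and pull the linked test results in one query."""
    cur = conn.execute(
        "SELECT p.first_name, r.test_name, r.value "
        "FROM patients p JOIN test_results r ON r.patient_id = p.id "
        "WHERE p.last_name = ? ORDER BY r.id",
        (last_name,),
    )
    return cur.fetchall()

db = open_db()
pid = add_patient(db, "Ada", "Lovelace")
add_result(db, pid, "glucose", "5.1")
add_result(db, pid, "hemoglobin", "13.9")
```

The foreign key is what keeps separately stored test data tied to its patient, which is the part that is awkward to maintain by hand in flat files.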
I would recommend a database. You can use SQL Server Express for something like that. Trying to use XML or something similar would probably get out of hand with that many rows.
For smaller databases/apps like this I've yet to notice any performance hits from using LINQ to SQL or Entity Framework.
I would use SQL Server Express because it has the best tool support (IDE integration) from Microsoft. I don't see any reason to consider it overkill.
Here's an article on how to embed it directly in your application (no separate installation needed).
If you had read-only files provided by another party in some kind of standard format which were meant to be used by the application, then I would consider simply indexing them according to your use cases and running your searches and UI against that. But that's still some customized work.
Relational databases are great for storing data in tables, and for representing the relationships between tables. Typically there are also good tools for getting the data in and out.
There are other systems you could use to store your data, but none that could be mapped to your input so quickly (you didn't mention how your data would get into this system) and then queried with so little effort.
Now, which database to choose...
Use a database... but maybe just SQLite, instead of a fully fledged database like MS SQL (Express).