I have a large set of data stored in a database. This set of data (3000 records) is retrieved by users frequently. The method I'm currently using is:
Retrieve this set of records from database
Convert into datatable
Store into a cache object
Search results from this cache object based on the query
CachePostData.Select(string.Format("Name LIKE '%{0}%'", txtItemName.Text));
Bind the result to a repeater (with paging that displays 40 records per page)
But I notice that the performance is not good (about 4 seconds on every request). So I am wondering: is there a better way to do this? Or should I just retrieve the result from the database for every query?
DataTable.Select is probably not the most efficient way to search an in-memory cache, but it certainly shouldn't take 4 seconds or anything like that to search 3000 rows.
First step is to find out where your performance bottlenecks are. I'm betting it's nothing to do with searching the cache, but you can easily find out, e.g. with code something like:
var stopwatch = Stopwatch.StartNew();
var result = CachePostData.Select(string.Format("Name LIKE '%{0}%'", txtItemName.Text));
WriteToLog("Read from cache took {0} ms", stopwatch.Elapsed.TotalMilliseconds);
where WriteToLog traces somewhere (e.g. System.Diagnostics.Trace, System.Diagnostics.Debug, or a logging framework such as log4net).
If you are looking for alternatives for caching, you could simply cache a generic list of entity objects and use LINQ to search the list:
var result = CachePostData.Where(x => x.Name.Contains(txtItemName.Text));
This is probably slightly more efficient (for example, it doesn't need to parse the "NAME LIKE ..." filter expression), but again, I don't expect this is your bottleneck.
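For example, a minimal sketch of that approach (the Post entity, the "AllPosts" cache key, and LoadPostsFromDatabase are placeholder names, not taken from the original code):

public class Post
{
    public int Id { get; set; }
    public string Name { get; set; }
}

private List<Post> GetCachedPosts()
{
    // Try the ASP.NET cache first; fall back to the database on a miss.
    var posts = HttpContext.Current.Cache["AllPosts"] as List<Post>;
    if (posts == null)
    {
        posts = LoadPostsFromDatabase(); // placeholder for your existing data-access call
        HttpContext.Current.Cache["AllPosts"] = posts;
    }
    return posts;
}

// Searching the cached list (requires System.Linq):
var result = GetCachedPosts()
    .Where(x => x.Name.Contains(txtItemName.Text))
    .ToList();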
I think using a DataTable would be more efficient, since you will reduce the hits on your database server. You can store the DataTable in the cache and then reuse it. Something like this:
public DataTable myDatatable()
{
    DataTable dt = HttpContext.Current.Cache["key"] as DataTable;
    if (dt == null)
    {
        dt = LoadDataTableFromDatabase(); // placeholder for your actual database call
        HttpContext.Current.Cache["key"] = dt;
    }
    return dt;
}
Also check SqlCacheDependency
You may also expire the cache after a particular time interval, like this:
HttpContext.Current.Cache.Insert("key", dt, null, DateTime.Now.AddHours(2),
System.Web.Caching.Cache.NoSlidingExpiration);
Also check DataTable caching performance
It is a bit hard to declare a correct solution for your problem without knowing how many hits on the database you actually expect. For most cases, I would not cache the data in the ASP.NET cache for filtering, because searching by using DataTable.Select basically performs a table scan and cannot take advantage of database indexing. Unless you run into really heavy load, most database servers should be capable of performing this task with less delay than filtering the DataTable in .NET.
If your database supports fulltext search (e.g. MSSQL or MySQL), you could create a fulltext index on the name column and query that. Fulltext search should give you even faster response times for these types of LIKE queries.
Generally, caching data for faster access is good, but in this case, the DataTable is most likely inferior to the database server in terms of searching for data. You could still use the cache to display unfiltered data faster and without hitting the database.
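For illustration, a minimal sketch of pushing the filter down to the database with a parameterized LIKE query (the connection string, the Posts table, and the Id/Name columns are assumptions, not taken from the original code):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT Id, Name FROM Posts WHERE Name LIKE @name", connection))
{
    // Parameterizing also avoids building the filter by string concatenation.
    command.Parameters.AddWithValue("@name", "%" + txtItemName.Text + "%");
    connection.Open();

    var table = new DataTable();
    new SqlDataAdapter(command).Fill(table);
    // bind `table` to the repeater as before
}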
Related
I have a dropdown list in my aspx page. The dropdown list's data source is a DataTable. The backend is MySQL, and records get into the DataTable by using a stored procedure.
I want to display records in the dropdown menu in ascending order.
I can achieve this in two ways.
1) dt is a DataTable and I am using a DataView to sort the records.
dt = objTest_BLL.Get_Names();
dataView = dt.DefaultView;
dataView.Sort = "name ASC";
dt = dataView.ToTable();
ddown.DataSource = dt;
ddown.DataTextField = dt.Columns[1].ToString();
ddown.DataValueField = dt.Columns[0].ToString();
ddown.DataBind();
2) Or in the select query I can simply say that
SELECT
`id`,
`name`
FROM `test`.`type_names`
ORDER BY `name` ASC ;
If I use the 2nd method I can simply eliminate the DataView part. Assume this type_names table has 50 records, and my page is viewed by 100,000 users a minute. What is the best method considering efficiency and memory handling? Get unsorted records into the DataTable and sort them in the code-behind, or sort them inside the database?
Note: only real performance tests can tell you real numbers. Theoretical options are below (which is why I use the word "guess" a lot in this answer).
You have at least 3 (instead of 2) options:
Sort in the database - If the column being sorted on is indexed, this may make the most sense, because the overhead of sorting on your database server may be negligible. The SQL server's own data caches may make this a super fast operation. But at 100k queries per minute, measure whether SQL gives noticeably faster results without the sort.
Sort in code-behind / middle layer - You likely won't have your own equivalent of an index, so you'd be sorting a list of 50 records, 100k times per minute. That would be slower than SQL, I would guess.
The big benefit would only apply if the data is relatively static, or very slowly changing, and the sorted values can be cached in memory for a few seconds to minutes or hours.
The option not in your list - Send the data unsorted all the way to the client and sort it on the client side using JavaScript. This solution may scale the best; sorting 50 records in the browser should not have a noticeable impact on your UX.
The SQL purists will no doubt tell you that it's better to let SQL do the sorting rather than C#. That said, unless you are dealing with massive record sets or doing many queries per second, it's unlikely you'd notice any real difference.
For my own projects, these days I tend to do the sorting in C# unless I'm running some sort of aggregate in the statement. The reason is that it's quick, and if you are running any sort of stored proc or function on the SQL server, it means you don't need to find ways of passing ORDER BYs into the stored proc.
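As a rough sketch of that code-behind option, reusing the dt/ddown names and the id/name columns from the question (this assumes System.Data.DataSetExtensions for AsEnumerable/CopyToDataTable):

DataTable dt = objTest_BLL.Get_Names();

// Sort the already-retrieved rows in memory instead of in the SQL statement.
DataTable sorted = dt.AsEnumerable()
    .OrderBy(r => r.Field<string>("name"))
    .CopyToDataTable();

ddown.DataSource = sorted;
ddown.DataTextField = "name";
ddown.DataValueField = "id";
ddown.DataBind();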
I'm new to n-tier enterprise development. I just got quite a tutorial reading through the 'questions that may already have your answer', but didn't find what I was looking for. I'm doing a genealogy site that starts off with the first guy that came over on the boat; you click on his name and the grid gets populated with all his children, then click on one of his kids that has kids and the grid gets populated with his kids, and so forth. Each record has an ID and a ParentID. When you choose any given person, the ID is stored and then used in a search for all records that match the ParentID, which returns all the kids.
The data is never changed (at least by the user), so I want to do just one database access, fill all fields into one DataTable, and then requery it each time to get the records to display. In the DAL I put all the records into a List; in the ObjectDataSource, the function that fills the GridView just returns the List of all entries. What I want to do is requery the DataTable, fill the List back up with the new query, and display it in the GridView. My code is in 3 files here.
(I can't get the backticks to show my code in this window.) All I need is to figure out how to make a new query on the existing DataTable and copy the results to a new DataTable. Hope this explains it well enough.
[edit: It would be easier to just do a new query against the database each time, and it would be less resource intensive (in the future, if the database gets too large) than storing it all in memory, but I just want to know if I can do it this way - that is, working from one copy of the entire table.] Any ideas?
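For reference, the kind of in-memory requery being described might look roughly like this (a sketch; allPeople, selectedId, and gridView are placeholder names, while ID/ParentID are the columns mentioned above):

// Filter the cached DataTable in memory and copy the matches into a new DataTable.
DataRow[] children = allPeople.Select("ParentID = " + selectedId);

DataTable childTable = children.Length > 0
    ? children.CopyToDataTable()   // requires System.Data.DataSetExtensions
    : allPeople.Clone();           // empty table with the same schema

gridView.DataSource = childTable;
gridView.DataBind();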
Your data represents a tree structure by nature.
A grid to display it may not be my first choice...
Querying all data in one query can be done by using a complex SP.
But you are already considering performance. That's always a good thing to keep in mind when coming up with a design. But creating something, improving it, and only then starting to optimize seems a better way to go.
Since relational databases are not really good at hierarchical data, consider a NoSQL (graph) database. As you mentioned, there are almost no writes to the DB, which is where NoSQL shines.
I need to handle very large DataTables (2 million rows+) that come from databases (SQL, Oracle, Access, MySQL, SharePoint etc.) outside of my control. Currently I loop through every row and column building a string object, but I run out of memory at about 100k rows.
The only solution I see is to break the DataTable into smaller pieces and persist each block before starting on the next block of rows.
Since I cannot add ROW_NUMBER() or anything similar, I have to handle the populated DataTable.
How can I easily (keeping performance in mind) break the populated DataTable into smaller DataTables, like paging?
PS there is no visual component to this functionality.
Are you using string concatenation, like string += string?
Change that to a StringBuilder and you should not have problems, at least not for 20k rows.
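A minimal sketch of the difference (table here is a placeholder for the DataTable being exported):

// Instead of: output += value + ",";  (allocates a new string on every concatenation)
var sb = new StringBuilder();
foreach (DataRow row in table.Rows)
{
    foreach (object value in row.ItemArray)
    {
        sb.Append(value).Append(',');
    }
    sb.AppendLine();
}
string result = sb.ToString();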
If you are talking about filling a DataTable object (which loads the results of your calls into memory before processing), you will likely be better off using a DataReader for each of the mentioned providers, so you can process each row as it is read from the database instead of holding the whole DataTable in memory...
A great answer to another question lists the pros/cons of DataReaders vs. DataTables.
If you're already using DataReaders, ignore this. But your memory problem might also come from storing the retrieved results...
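As an illustration, a sketch of streaming rows with a DataReader and persisting them in blocks (SQL Server shown; the query, the 10,000-row block size, and the PersistBlock helper are placeholders):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT * FROM SourceTable", connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        var sb = new StringBuilder();
        int rowCount = 0;
        while (reader.Read())
        {
            for (int i = 0; i < reader.FieldCount; i++)
            {
                sb.Append(reader.GetValue(i)).Append(',');
            }
            sb.AppendLine();

            // Persist every 10,000 rows so the full result set is never held in memory.
            if (++rowCount % 10000 == 0)
            {
                PersistBlock(sb.ToString()); // placeholder: write the block out
                sb.Length = 0;
            }
        }
        if (sb.Length > 0)
        {
            PersistBlock(sb.ToString());
        }
    }
}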
I have a CSV file and I have to insert it into a SQL Server database. Is there a way to speed up the LINQ inserts?
I've created a simple Repository method to save a record:
public void SaveOffer(Offer offer)
{
Offer dbOffer = this.db.Offers.SingleOrDefault (
o => o.offer_id == offer.offer_id);
// add new offer
if (dbOffer == null)
{
this.db.Offers.InsertOnSubmit(offer);
}
//update existing offer
else
{
dbOffer = offer;
}
this.db.SubmitChanges();
}
But using this method, the program is much slower than inserting the data using ADO.NET SQL inserts (new SqlConnection, new SqlCommand for select if exists, new SqlCommand for update/insert).
On 100k CSV rows it takes about an hour, vs. 1 minute or so the ADO.NET way. For 2M CSV rows it took ADO.NET about 20 minutes. LINQ added about 30k of those 2M rows in 25 minutes. My database has 3 tables, linked in the dbml, but the other two tables are empty. The tests were made with all the tables empty.
P.S. I've tried to use SqlBulkCopy, but I need to do some transformations on Offer before inserting it into the db, and I think that defeats the purpose of SqlBulkCopy.
Updates/Edits:
After 18 hours, the LINQ version added just ~200K rows.
I've tested the import with just LINQ inserts too, and it is also really slow compared with ADO.NET. I haven't seen a big difference between just inserts/SubmitChanges and selects/updates/inserts/SubmitChanges.
I still have to try batch commit, manually connecting to the db and compiled queries.
SubmitChanges does not batch changes, it does a single insert statement per object. If you want to do fast inserts, I think you need to stop using LINQ.
While SubmitChanges is executing, fire up SQL Profiler and watch the SQL being executed.
See question "Can LINQ to SQL perform batch updates and deletes? Or does it always do one row update at a time?" here: http://www.hookedonlinq.com/LINQToSQLFAQ.ashx
It links to this article: http://www.aneyfamily.com/terryandann/post/2008/04/Batch-Updates-and-Deletes-with-LINQ-to-SQL.aspx that uses extension methods to fix linq's inability to batch inserts and updates etc.
Have you tried wrapping the inserts within a transaction and/or delaying db.SubmitChanges so that you can batch several inserts?
Transactions help throughput by reducing the need for fsync()s, and delaying db.SubmitChanges will reduce the number of .NET <-> db roundtrips.
Edit: see http://www.sidarok.com/web/blog/content/2008/05/02/10-tips-to-improve-your-linq-to-sql-application-performance.html for some more optimization principles.
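A rough sketch of both ideas combined (the batch size of 500 and the SaveOffers signature are illustrative, not from the original code; this needs a reference to System.Transactions):

public void SaveOffers(IEnumerable<Offer> offers)
{
    using (var scope = new TransactionScope())
    {
        int pending = 0;
        foreach (var offer in offers)
        {
            this.db.Offers.InsertOnSubmit(offer);

            // Flush to the database in batches instead of once per record.
            if (++pending % 500 == 0)
            {
                this.db.SubmitChanges();
            }
        }
        this.db.SubmitChanges();
        scope.Complete();
    }
}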
Have a look at the following page for a simple walk-through of how to change your code to use a Bulk Insert instead of using LINQ's InsertOnSubmit() function.
You just need to add the (provided) BulkInsert class to your code, make a few subtle changes to your code, and you'll see a huge improvement in performance.
Mikes Knowledge Base - BulkInserts with LINQ
Good luck !
I wonder if you're suffering from an overly large set of data accumulating in the data-context, making it slow to resolve rows against the internal identity cache (which is checked once during the SingleOrDefault, and for "misses" I would expect to see a second hit when the entity is materialized).
I can't recall 100% whether the short-circuit works for SingleOrDefault (although it will in .NET 4.0).
I would try ditching the data-context (submit-changes and replace with an empty one) every n operations for some n - maybe 250 or something.
Given that you're calling SubmitChanges per instance at the moment, you may also be wasting a lot of time checking the delta - pointless if you've only changed one row. Only call SubmitChanges in batches, not per record.
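A sketch of ditching and recreating the data-context every n operations (n = 250 here; MyDataContext and the refactored SaveOffer signature are assumed names):

int processed = 0;
var db = new MyDataContext(); // your generated, typed DataContext (assumed name)

foreach (var offer in offers)
{
    SaveOffer(db, offer); // assumes SaveOffer is refactored to take the context

    // Replace the context periodically so its identity cache doesn't keep growing.
    if (++processed % 250 == 0)
    {
        db.SubmitChanges();
        db.Dispose();
        db = new MyDataContext();
    }
}

db.SubmitChanges();
db.Dispose();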
Alex gave the best answer, but I think a few things are being overlooked.
One of the major bottlenecks you have here is calling SubmitChanges for each item individually. A problem I don't think most people know about is that if you haven't manually opened your DataContext's connection yourself, then the DataContext will repeatedly open and close it itself. However, if you open it yourself, and then close it yourself when you're absolutely finished, things will run a lot faster since it won't have to reconnect to the database every time. I found this out when trying to find out why DataContext.ExecuteCommand() was so unbelievably slow when executing multiple commands at once.
A few other areas where you could speed things up:
While LINQ to SQL doesn't support straight-up batch processing, you should wait to call SubmitChanges() until you've processed everything first. You don't need to call SubmitChanges() after each InsertOnSubmit call.
If live data integrity isn't super crucial, you could retrieve a list of offer_ids back from the server before you start checking whether an offer already exists. This could significantly reduce the number of times you're calling the server to get an existing item when it's not even there. (A sketch of both ideas follows below.)
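A minimal sketch of those two suggestions together, assuming offer_id is an int (the other names follow the SaveOffer code above):

// Open the connection once instead of letting the DataContext open/close it per call.
this.db.Connection.Open();
try
{
    // Pre-load existing ids so most existence checks never hit the server.
    var existingIds = new HashSet<int>(this.db.Offers.Select(o => o.offer_id));

    foreach (var offer in offers)
    {
        if (!existingIds.Contains(offer.offer_id))
        {
            this.db.Offers.InsertOnSubmit(offer);
            existingIds.Add(offer.offer_id);
        }
    }
    this.db.SubmitChanges();
}
finally
{
    this.db.Connection.Close();
}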
Why not pass an Offer[] into that method and do all the changes in the cache before submitting them to the database? Or you could use groups for submission, so you don't run out of cache. The main thing is how long you wait before sending over the data; the biggest time waster is the closing and opening of the connection.
Converting this to a compiled query is the easiest way I can think of to boost your performance here:
Change the following:
Offer dbOffer = this.db.Offers.SingleOrDefault (
o => o.offer_id == offer.offer_id);
to:
Offer dbOffer = RetrieveOffer(this.db, offer.offer_id);

// YourDataContext is a placeholder for your generated, typed DataContext class,
// so that context.Offers resolves.
private static readonly Func<YourDataContext, int, Offer> RetrieveOffer =
    CompiledQuery.Compile((YourDataContext context, int offerId) =>
        context.Offers.SingleOrDefault(o => o.offer_id == offerId));
This change alone will not make it as fast as your ADO.NET version, but it will be a significant improvement, because without the compiled query you are dynamically building the expression tree every time you run this method.
As one poster already mentioned, you must refactor your code so that SubmitChanges is called only once if you want optimal performance.
Do you really need to check if the record exists before inserting it into the DB? It looks strange to me, as the data comes from a CSV file.
P.S. I've tried to use SqlBulkCopy, but I need to do some transformations on Offer before inserting it into the db, and I think that defeats the purpose of SqlBulkCopy.
I don't think it defeats the purpose at all; why would it? Just fill a simple DataSet with all the data from the CSV and do a SqlBulkCopy. I did a similar thing with a collection of 30,000+ rows and the import time went from minutes to seconds.
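A sketch of that approach, with the transformations applied while filling the DataTable (the column names, csvPath, connectionString, and the Offers destination table are assumptions):

var table = new DataTable();
table.Columns.Add("offer_id", typeof(int));
table.Columns.Add("name", typeof(string));

foreach (var line in File.ReadAllLines(csvPath))
{
    var fields = line.Split(',');
    // ...apply your transformations to `fields` here before adding the row...
    table.Rows.Add(int.Parse(fields[0]), fields[1]);
}

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "Offers";
    bulk.WriteToServer(table);
}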
I suspect it isn't the inserting or updating operations that are taking a long time, rather the code that determines if your offer already exists:
Offer dbOffer = this.db.Offers.SingleOrDefault (
o => o.offer_id == offer.offer_id);
If you look to optimise this, I think you'll be on the right track. Perhaps use the Stopwatch class to do some timing that will help to prove me right or wrong.
Usually, when not using LINQ to SQL, you would have an insert/update procedure or SQL script that determines whether the record you pass already exists. You're doing this expensive operation in LINQ, which can never hope to match the speed of native SQL (which is what's happening when you use a SqlCommand and select whether the record exists) looking up on a primary key.
Well, you must understand that LINQ creates code dynamically for all the ADO operations you do, instead of them being handwritten, so it will always take more time than your manual code. It's simply an easy way to write code, but if you want to talk about performance, ADO.NET code will always be faster, depending upon how you write it.
I don't know if LINQ will try to reuse its last statement or not; if it does, then separating the insert batch from the update batch may improve performance a little bit.
This code runs OK, and prevents large amounts of data from building up before a submit:
if (repository2.GeoItems.GetChangeSet().Inserts.Count > 1000)
{
repository2.GeoItems.SubmitChanges();
}
Then, at the end of the bulk insertion, use this:
repository2.GeoItems.SubmitChanges();
This is what I am working with to get back to the web dev world
ASP.Net with VS2008
Subsonic as Data Access Layer
SqlServer DB
Home Project description:
I have a student registration system. I have a web page which should display the student records.
At present I have a gridview control which shows the records
The user logs on and gets to the view students page. The gridview shows the students in the system, where one of the columns is the registration status of open, pending, complete.
I want the user to be able to apply dynamic sorting or filters on the result returned, so as to obtain a more refined result they wish to see. I envisioned allowing the user to filter the results by applying a WHERE clause or a LIKE clause on the result returned, via a DataSet interface from a Subsonic method. I do not want to query the database again to apply the filter.
example: initial query
Select * from studentrecords where date = convert(varchar(32), getdate(), 101)
The user should then be able to apply a filter on the result set returned, so that they can do a last name LIKE '%Souza%'.
Is this even possible? Is binding a data source to a GridView control the best approach, or should I create a custom collection inheriting from CollectionBase and then bind that to the GridView control?
PS: Sorry about the typos. My machine is under the influence of a tea spill on my laptop.
I use LINQ-to-SQL, not Subsonic, so YMMV, but my approach to filtering has been to supply an OnSelecting handler to the data source. In LINQ-to-SQL, I'm able to replace the result with a reference to a DataContext method that returns the result of applying a table-valued function. You might want to investigate something similar with Subsonic.
As tvanfosson said, LINQ is very well suited to making composable queries; you can do this either with the fully dynamic TSQL that the base library generates, or via a UDF that you mark with [FunctionAttribute(..., IsComposable=true)] in the data-context.
I'm not familiar with Subsonic, so I can't advise there; but one other thought: in your "date = " code, you might consider declaring a datetime variable and assigning it first... that way the optimiser can usually do a better job of optimising it (the query is simpler, and there is no question of whether it is converting the datetime (per row) to a varchar, or the varchar to a datetime). The most efficient way of getting just the date portion of something is to cast/floor/cast:
DECLARE @today datetime
SET @today = GETDATE()
SET @today = CAST(FLOOR(CAST(@today as float)) as datetime)
[update re comment]
Re composable - I mean that this allows you to build up a query such that only the final query is executed at the database. For example:
var query = from row in ctx.SomeComplexUdf(someArg)
where row.IsOpen && row.Value > 0
select row;
might go down to the server via the TSQL:
SELECT u1.*
FROM dbo.SomeComplexUdf(#p1) u1
WHERE u1.IsOpen = 1 -- might end up parameterized
AND u1.Value > 0 -- might end up parameterized
The point here being that only the suitable data is returned from the server, rather than lots of data being returned and then thrown away. LINQ-to-SQL can do all sorts of things via composable queries, including paging, sorting, etc. By minimising the amount of data you load from the database you can make significant improvements in performance.
The alternative without composability is that it simply does:
SELECT u1.*
FROM dbo.SomeComplexUdf(#p1) u1
And then throws away the other rows in your web app... obviously, if you are expecting 20 open records and 10000 closed records, this is a huge difference.
How about something like this?
Rather than assigning a data source / table to your grid control, attach a DataView to it.
Here's sort of a pseudocode example:
DataTable myDataTable = GetDataTableFromSomewhere();
DataGridView dgv = new DataGridView();
DataView dv = new DataView(myDataTable);
//Then you can specify things like:
dv.Sort = "StudentStatus DESC";
dv.Filter = "StudentName LIKE '" + searchName + '";
dgv.DataSource = dv;
...and so on.