Persisting data for catch-all search

We have a set of catch-all search pages we're creating in our ASP.NET application. We have an initial search page, a SERP, and then a single-item details page. All 3 pages have a search bar with initial criteria, more criteria, and advanced criteria choices.
When we put all of our criteria together, in addition to the main search box we have 20 different criteria parameters (from price, to price, sale item, date created, etc.) and then three collections of parameter IDs. These collections are from a list of the Manufacturers, Product Lines, and Categories our users can search from. So we have this fixed set of 20 fields and then 3 collections that could have a manufacturer or two, or could hold a collection of 100 Guids for the lines whose checkboxes they selected and want to search through.
In our old system we had a single-form solution: we just posted back and submitted everything to our business object, passing it into a method that returned the results. In the new design we need to carry the criteria from page to page and persist it. We're trying to figure out the best way to persist the data; by best I mean most efficient.
Querystring - This isn't going to work with large collections of Guid values for the 3 collections.
Session - We would create a criteria object and store it in the Session. As they move from page to page we can pull it out. At our peak we probably have 200-300 people using the server concurrently, and the search is our most used form. I'm worried about performance with all those session variables.
Database - We were thinking of serializing and stashing this criteria object into the database (SQL Server 2k5) and the users would always have a current Search or last Search in the database. This eliminates some of the web server load from the Session solution but I'm worried this object load, serialization, db round trip, and unload is going to slow the forms down and affect user experience.
I'm looking for advice on which method is going to work most efficiently for us or if there is an accepted best practice or pattern I've overlooked.

With HTML5 you can use localStorage and sessionStorage, which makes the client keep the information in their browser.
http://www.w3schools.com/html/html5_webstorage.asp
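As a rough sketch of the sessionStorage suggestion, the criteria object could be serialized to JSON and kept under a single key. Everything named here is an assumption for illustration: the `SearchCriteria` fields, the `searchCriteria` key, and the small `KeyValueStore` interface (a subset of the Web Storage API, so that `window.sessionStorage` can be passed in a browser and an in-memory stand-in elsewhere).

```typescript
// Hypothetical shape of the criteria; field names are illustrative only.
interface SearchCriteria {
  query: string;
  priceFrom?: number;
  priceTo?: number;
  saleItem?: boolean;
  manufacturerIds: string[];
  productLineIds: string[];
  categoryIds: string[];
}

// Subset of the Web Storage API used here (getItem / setItem).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CRITERIA_KEY = "searchCriteria"; // assumed key name

// Serialize the whole criteria object into one storage entry.
function saveCriteria(store: KeyValueStore, criteria: SearchCriteria): void {
  store.setItem(CRITERIA_KEY, JSON.stringify(criteria));
}

// Returns null when nothing has been saved yet.
function loadCriteria(store: KeyValueStore): SearchCriteria | null {
  const raw = store.getItem(CRITERIA_KEY);
  return raw === null ? null : (JSON.parse(raw) as SearchCriteria);
}

// In-memory stand-in so the sketch also runs outside a browser.
class MemoryStore implements KeyValueStore {
  private items: { [key: string]: string } = {};
  getItem(key: string): string | null {
    const value = this.items[key];
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.items[key] = value;
  }
}
```

In a browser the call would be `saveCriteria(window.sessionStorage, criteria)`. Web storage is not sent to the server automatically, so the page still has to post the criteria along with each search request.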

Related

Large amount of static values in linked selects! - ASP.NET MVC

I have a conceptual problem.
For an ASP.NET MVC / C# website (although the exact technology may not be that important) I have 2 linked drop downs (html select) with Countries and Cities of a continent.
These are currently kept in 2 database tables and, as you can imagine, the Cities table has around 10,000 records.
The current functionality is:
- initially the country select is populated.
- the user selects a country, an ajax request goes to the server, retrieves the cities for that country_id and populates the second (cities) select.
Sometimes it gets a bit slow as you might imagine, and since these are in the end static values (the lists will not change) what will be the best way to treat this situation?

I would recommend implementing it with the jQuery Chosen plugin, which can be used to filter results. Since you are getting such a big response, you can also incorporate search-filter functionality with Ajax, as shown in the jQuery Ajax-Chosen plugin.

Caching dynamically changing list of objects

I have a question about how to efficiently cache list of objects. I have a sample table of Trips. Each Trip has DateFrom and DateTo. I want to cache a list of Trips.
The first approach I am considering is to get the Trips from the database and cache a list of Trip ValueObjects (by Value Objects I mean all the data needed to display a Trip in a list) for x minutes. This approach is very simple, but has a few disadvantages:
- dealing with pagination - if I store them page by page (for example with key TripsP1, TripsP2, etc...). When the page size on GUI changes I will have to make a set of Trips with different keys (for example with key TripsP1Size1, TripsP1Size2 etc...)
- how to deal with sorted list? Keep a set of Trips with different keys for each filters combinations?
The second approach I am considering is to hit the database on each request but fetch only Trip.Id. I would then get each Trip ValueObject from the cache. After creating or modifying a Trip, I would put it in the cache with TripId as the key. If for some reason a Trip ValueObject is not in the cache, I would load it from the database and put it into the cache.
Which of these two approaches is better? Or can you suggest a more efficient way?
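The second approach described in the question is essentially a per-item cache keyed by trip id. A minimal sketch follows; the `TripView` fields and the `loadFromDb` callback are hypothetical stand-ins for the ValueObject and the database lookup.

```typescript
// Hypothetical value object holding what a list row needs.
interface TripView {
  id: number;
  title: string;
  dateFrom: string;
  dateTo: string;
}

class TripCache {
  private entries: { [id: number]: TripView } = {};
  private loadFromDb: (id: number) => TripView;

  // loadFromDb stands in for a single-row database lookup (assumption).
  constructor(loadFromDb: (id: number) => TripView) {
    this.loadFromDb = loadFromDb;
  }

  // Return the cached item, falling back to the database on a miss.
  get(id: number): TripView {
    const cached = this.entries[id];
    if (cached !== undefined) {
      return cached;
    }
    const loaded = this.loadFromDb(id);
    this.entries[id] = loaded;
    return loaded;
  }

  // Called after a trip is created or modified.
  put(trip: TripView): void {
    this.entries[trip.id] = trip;
  }
}

// The database supplies only the ordered ids for the requested page;
// the display data comes from the cache.
function getTripPage(pageIds: number[], cache: TripCache): TripView[] {
  return pageIds.map((id) => cache.get(id));
}
```

Paging and sorting stay in the id query, so no per-page or per-sort cache keys are needed. The sketch has no expiry or size limit, which a real cache would need.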

I don't see why you would hold those Trips in the form of pages.
Why not load all the trips into a List (which can be sorted, depending on your needs) and build an in-memory index for LINQ search queries? For example, this lib.
You can create in-memory indexes over your cache for very quick searches, depending on your application's query patterns.
If you are worried about Trips being changed while a transaction is in progress (a point caches tend to miss), you should implement it using an immutable collection (and make Trip an immutable type).
This way, when the user asks the server for items matching specific criteria, you can easily select those items and expose them through a method returning an IQueryable (or IEnumerable) sequence for streaming the data.
Hope this helps, Ofir.
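One way to picture this answer is an immutable, pre-sorted snapshot of all trips that is queried in memory and replaced wholesale when a trip changes. The sketch below makes several assumptions: the `Trip` fields, ISO-formatted date strings (so string comparison orders them correctly), and the `TripSnapshot` class itself, which is not from any library.

```typescript
interface Trip {
  readonly id: number;
  readonly title: string;
  readonly dateFrom: string; // ISO date, e.g. "2024-02-10" (assumption)
  readonly dateTo: string;
}

class TripSnapshot {
  private readonly trips: ReadonlyArray<Trip>;

  constructor(trips: Trip[]) {
    // Copy, sort once by start date, and freeze so readers never see changes.
    const sorted = trips.slice().sort((a, b) =>
      a.dateFrom < b.dateFrom ? -1 : a.dateFrom > b.dateFrom ? 1 : 0
    );
    this.trips = Object.freeze(sorted);
  }

  // Filter and page entirely in memory; pageIndex is 1-based.
  startingFrom(date: string, pageIndex: number, pageSize: number): Trip[] {
    const matches = this.trips.filter((t) => t.dateFrom >= date);
    const start = (pageIndex - 1) * pageSize;
    return matches.slice(start, start + pageSize);
  }

  // Adding or updating a trip yields a new snapshot; the old one is untouched.
  withTrip(trip: Trip): TripSnapshot {
    const others = this.trips.filter((t) => t.id !== trip.id);
    return new TripSnapshot(others.concat([trip]));
  }
}
```

A request that started with the old snapshot keeps a consistent view, while new requests pick up the replacement. Page size and filters can change freely because nothing is cached per page.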

Design of queryable web based results

I'm trying to accomplish something like a facebook news feed wall, loading N number of results from the overall dataset, starting with the most recent, date descending. When you click “more”, it displays the next N underneath and so on until you finish the dataset.
I'm struggling to come up with the best design to accomplish this. I've always been told that stateless web services are the only way to build a scalable enterprise application, which, as I understand it, means that keeping the whole results object cached server-side on the first call to the page, and just taking N results from it with each subsequent web service call, is a no-no?
If that's the case, then something like GetResults(int pageindex, int pagesize) would work, and that's how I WAS going to do it, but then I realised it would not work if someone added a new DB record in between calls. E.g. you start with 23 wall feed items in the DB and want to display them 10 at a time.
First call, page 1, page size 10 will return results 14-23 (most recent first)
Someone then adds 2 new posts, so you have 25 now in the DB
Second call, page 2, page size 10 will return results 6-15, two of which were already returned in the first call.
So this offsetting approach doesn’t work because you can’t guarantee the underlying dataset will remain the same between calls.
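The arithmetic of that example can be reproduced with a few lines. This is only an illustration: items are represented by their ids 1..N, with a higher id meaning a more recent post, and `offsetPage` is a hypothetical stand-in for GetResults.

```typescript
// Offset paging over ids sorted newest first; pageIndex is 1-based.
function offsetPage(idsNewestFirst: number[], pageIndex: number, pageSize: number): number[] {
  const start = (pageIndex - 1) * pageSize;
  return idsNewestFirst.slice(start, start + pageSize);
}

// Build [count, count - 1, ..., 1].
function newestFirst(count: number): number[] {
  const ids: number[] = [];
  for (let id = count; id >= 1; id--) {
    ids.push(id);
  }
  return ids;
}

// First call: 23 items in the DB, page 1 returns ids 23 down to 14.
const page1 = offsetPage(newestFirst(23), 1, 10);

// Two posts are added, then the second call: page 2 returns ids 15 down to 6.
const page2 = offsetPage(newestFirst(25), 2, 10);

// Ids 15 and 14 come back a second time.
const repeated = page2.filter((id) => page1.indexOf(id) !== -1);
console.log(repeated);
```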
I'm confused; how do I accomplish this?
Edit: Sorry a little more info. To avoid the problem of huge data table lookups, I had considered the option of pre-populating a "transient" table with the last few days data for that user when you first load the screen, then just reading the results a page at a time from that transient table to make it faster reading, with a slightly slower load time. Then when you exhaust that data, you bring in the next period (say 2 weeks) into the transient table and continue reading.
The difficulty is that users will "Post" items which will then automatically be picked up by users who match their search criteria. E.g. if your criteria state you want to meet people between 25 and 32 and within 50 miles of you, then when you load up your news feed, you want it to show posts from all users who match your criteria. Kind of like a dynamic friends list.
How I was going to achieve this was at time of login, a stored proc would run which would populate a transient table in the DB by selecting all users and filtering down based on age and location criteria which I have in static lookup tables (postcode distances etc), then it will save the list of Users who match your criteria to this transient table for use whenever you then need to filter posts or search users. If you update your preferences, it will also recalculate this but only when you update prefs or re-login. So any new users signing up won't appear until you next login, which is fine I think.
Then when it comes time to display your news feed, all it does is retrieve from the DB this list of User Ids who match your criteria, then bring back all NewsFeedPosts which were posted by those users. Hey presto, dynamic news feed!
But obviously this is a subset of the entire NewsFeedPost table which is generated on the fly, so it doesn't make sense to recalculate this every time a user clicks "more", so this was how I was thinking about implementing it.
Tables - NewsFeedCurrent, NewsFeedRecent, NewsFeedArchive
New posts are created in the current table. Every night a batch job runs that moves all data from current that is 2 days old, to the recent table, and any data in the recent table that is a week old to the archive table.
The thinking is that 90% of the time the user will only be interested in the last 2 days of data, so keep that table small for access time. Another 9% of the time the user may want the last week's data, so keep that separate in a secondary table. Then only 1% of the time the user wants data more than a week old, so keep that in a larger archive table that will be slower, but gives you a performance boost by keeping the current and recent tables small.
So when you first hit the news feed page, what it was going to do is take the pre-generated user list for your account, pull out all NewsFeedCurrent items and put them in a transient table, say TempNewsFeed, under your user ID. You can then work with this resultset just by pulling back everything for your user id, with no filtering required for items you aren't interested in, as they are pre-filtered. This will add a second or so to the page load but will improve response time when fetching results. Then when that data is exhausted, it will, again using the list of users matching your criteria, pull out all relevant data from the Recent table and add it to the TempNewsFeed table, allowing you to continue fetching data up to a week old. When that's exhausted, it will finally go to the archive table and, using the user id list, pull out all matching data and put it in the temp table, allowing you to continue navigating the remaining data. This will give a fairly significant delay as it populates the archive data, but if you are going back more than a week, then you will have to accept a 5-10 second wait while it populates the data and says "loading data...". Once it has, though, navigating historical data will be just as quick as recent data, as it will all be in the transient table.
If you refresh the screen or go back onto it from another screen, it clears out the transient table and starts again from the Current table data.

Hope my answer makes sense, makes the right assumptions ...
I would divide the news feed into two sections. The first is for incoming news - which would be powered with AJAX calls. It is constantly saying "What is new?" The second section is for older news, where the user can lazily load more news by scrolling down.
Newest News Items
The important point is to make note of the maximum news feed id on your page. Let's imagine that is 10000. When the user loaded the page, news feed id 10000 was the latest news item.
When the new section is updated with AJAX, we simply ask, "What is newer than id 10000?" and we load those items onto the page. After we load them, we also update the id on the page to the highest id just loaded. For example, if we start with id 10000 and we load five new news items with consecutive ids, the new id would be 10005. The next call would ask, "What is newer than 10005?"
Older News Items
The older section would keep track of the oldest news item on the page. Let's imagine they scroll back through a week's worth of news, and the minimum news item id on the page is now 9000. When they want to scroll back further, we simply ask, "What is older than 9000?"
The idea then is to maintain on the page the maximum news item id and the minimum news item id and then keep loading from that reference point.
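The two questions above ("what is newer than X?" and "what is older than Y?") can be sketched as follows. The `NewsItem` shape and both function names are hypothetical, and the in-memory array stands in for the posts table; ids are assumed to increase with time, as the answer implies.

```typescript
interface NewsItem {
  id: number;
  text: string;
}

// Everything newer than the highest id already on the page, newest first.
function newerThan(feed: NewsItem[], maxIdOnPage: number): NewsItem[] {
  return feed
    .filter((item) => item.id > maxIdOnPage)
    .sort((a, b) => b.id - a.id);
}

// The next pageSize items older than the lowest id already on the page.
function olderThan(feed: NewsItem[], minIdOnPage: number, pageSize: number): NewsItem[] {
  return feed
    .filter((item) => item.id < minIdOnPage)
    .sort((a, b) => b.id - a.id)
    .slice(0, pageSize);
}

// Build a feed with ids 1..count for demonstration.
function makeFeed(count: number): NewsItem[] {
  const feed: NewsItem[] = [];
  for (let id = 1; id <= count; id++) {
    feed.push({ id: id, text: "post " + id });
  }
  return feed;
}
```

With 23 items, the first page of 10 ends at id 14. If two posts arrive before "more" is clicked, `olderThan(feed, 14, 10)` still starts at id 13, so nothing repeats, and `newerThan(feed, 23)` picks up the two new posts separately. In SQL the same idea is a `WHERE Id < @minId ORDER BY Id DESC` query limited to N rows.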

storing dataset of entire table and doing query on copy then updating GridView with results of query

I'm new to n-tier enterprise development. I just got quite a tutorial from reading through the 'questions that may already have your answer' but didn't find what I was looking for. I'm doing a genealogy site that starts off with the first guy that came over on the boat; you click on his name and the grid gets populated with all his children, then you click on one of his kids that has kids and the grid gets populated with his kids, and so forth. Each record has an ID and a ParentID. When you choose any given person, the ID is stored and then used in a search for all records with a matching ParentID, which returns all the kids. The data is never changed (at least by the user), so I want to do just one database access, fill all fields into one DataTable, and then requery it each time to get the records to display. In the DAL I put all the records into a List, and in the ObjectDataSource the function that fills the GridView just returns the List of all entries. What I want to do is requery the DataTable, fill the list back up with the results of the new query, and display them in the GridView. My code is in 3 files here
(I can't get the backticks to show my code in this window) All I need is to figure out how to make a new query on the existing DataTable and copy it to a new DataTable. Hope this explains it well enough.
[edit: It would be easier to just do a new query from the database each time and it would be less resource intensive (in the future if the database gets too large) to store in memory, but I just want to know if I can do it this way - that is, working from 1 copy of the entire table] Any ideas...
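The idea of loading the whole table once and then answering every "children of X" request from memory can be sketched as below. The `Person` shape and function names are assumptions for illustration; in ADO.NET itself the equivalent step would typically be `DataTable.Select` with a filter expression such as `"ParentID = 5"`, followed by `CopyToDataTable` on the returned rows.

```typescript
interface Person {
  id: number;
  parentId: number | null; // null for the first ancestor
  name: string;
}

type ChildIndex = { [parentId: number]: Person[] };

// Group the full table by ParentID once, after the single database read.
function buildChildIndex(people: Person[]): ChildIndex {
  const index: ChildIndex = {};
  for (const person of people) {
    if (person.parentId === null) {
      continue;
    }
    const siblings = index[person.parentId];
    if (siblings === undefined) {
      index[person.parentId] = [person];
    } else {
      siblings.push(person);
    }
  }
  return index;
}

// Each click on a person becomes a lookup instead of a database query.
function childrenOf(index: ChildIndex, id: number): Person[] {
  const children = index[id];
  return children === undefined ? [] : children;
}
```

Building the index is one pass over the data, and each later lookup avoids rescanning the whole table.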

Your data represents a tree structure by nature.
A grid to display it may not be my first choice...
Querying all data in one query can be done by using a complex SP.
But you are already considering performance. That's always a good thing to keep in mind when coming up with a design. Still, creating something, improving it, and only then starting to optimize seems a better way to go.
Since relational databases are not very good at hierarchical data, consider a NoSQL (graph) database. As you mentioned, there are almost no writes to the DB, which is where NoSQL shines.

Improving nested objects filtering speed

Here's a problem I experience (simplified example):
Let's say I have several tables:
One customer can have many products and a product can have multiple features.
On my asp.net front-end I have a grid with customer info:
something like this:
Name Address
John 222 1st st
Mark 111 2nd st
What I need is an ability to filter customers by feature. So, I have a dropdown list of available features that are connected to a customer.
What I currently do:
1. I return DataTable of Customers from stored procedure. I store it in viewstate
2. I return DataTable of features connected to customers from stored procedure. I store it in viewstate
3. On filter selected, I run stored procedure again with new feature_id filter where I do joins again to only show customers that have selected feature.
My problem: It is very slow.
I think that possible solutions would be:
1. On page load return ALL data in one viewstate variable. So basically three lists of nested objects. This will make my page load slow.
2. Perform async loading in some smart way. How?
Any better solutions?
Edit:
this is a simplified example, so I also need to filter customer by property that is connected through 6 tables to table Customer.

The way I deal with these scenarios is by passing XML in to SQL and then running a join against it. The XML would look something like:
<Features><Feat Id="2" /><Feat Id="5" /><Feat Id="8" /></Features>
Then you can pass that XML into SQL (there are different ways depending on the version of SQL Server), but in the newer versions it's a lot easier than it used to be:
http://www.codeproject.com/Articles/20847/Passing-Arrays-in-SQL-Parameters-using-XML-Data-Ty
Also, don't put any of that in ViewState; there's really no reason for that.
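Producing that XML from the selected feature ids is mechanical. A small sketch, where `buildFeaturesXml` is a hypothetical helper and the ids are assumed to be integers (which is why no escaping is needed):

```typescript
// Build the <Features> document from a list of integer feature ids.
function buildFeaturesXml(featureIds: number[]): string {
  const elements = featureIds.map((id) => {
    // Reject anything that is not a finite whole number before it reaches SQL.
    if (!isFinite(id) || Math.floor(id) !== id) {
      throw new Error("Feature id must be an integer: " + id);
    }
    return '<Feat Id="' + id + '" />';
  });
  return "<Features>" + elements.join("") + "</Features>";
}

console.log(buildFeaturesXml([2, 5, 8]));
```

The resulting string would be passed as a single XML-typed parameter to the stored procedure, which can then join against the ids it contains.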

Storing an entire list of customers in ViewState is going to be hideously slow; storing all information for all customers in ViewState is going to be worse, unless your entire customer base is very very small, like about 30 records.
For a start, why are you loading all the customers into ViewState? If you have any significant number of customers, load the data a page at a time. That will at least reduce the amount of data flowing over the wire and might speed up your stored procedure as well.
In your position, I would focus on optimizing the data retrieval first (including minimizing the amount you return), and then worry about faster ways to store and display it. If you're up against unusual constraints that prevent this (very slow database; no profiling tools; not allowed to change stored procedures), then please let us know.

Solution 1: Include whatever criteria you need to filter on in your query, only return and render the requested records. No need to use viewstate.
Solution 2: Retrieve some reasonable page limit of customers, filter on the browser with javascript. Allow easy navigation to the next page.
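Solution 2 could look roughly like this in the browser, assuming each customer row in the retrieved page already carries the ids of its features. The `CustomerRow` shape and `filterByFeature` are hypothetical names for illustration.

```typescript
// One row of the current page, as delivered by the server (assumed shape).
interface CustomerRow {
  name: string;
  address: string;
  featureIds: number[];
}

// Filter the already-loaded page on the client; null means "no filter".
function filterByFeature(page: CustomerRow[], featureId: number | null): CustomerRow[] {
  if (featureId === null) {
    return page;
  }
  return page.filter((row) => row.featureIds.indexOf(featureId) !== -1);
}

const currentPage: CustomerRow[] = [
  { name: "John", address: "222 1st st", featureIds: [2, 5] },
  { name: "Mark", address: "111 2nd st", featureIds: [8] },
];

console.log(filterByFeature(currentPage, 5).map((row) => row.name));
```

Changing the dropdown then needs no round trip for the rows already on screen. Note that this only filters the current page; customers on other pages still require a server query, as in Solution 1.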
