I have a US city/state list table in my SQL Server 2005 database that has a million records. My web application pages have a location textbox that uses an AJAX autocomplete feature. I need to show the complete city/state when the user types in 3 characters.
For example:
Input: bos
Output: Boston, MA
Currently, performance-wise, this functionality is pretty slow. How can I improve it?
Thanks for reading.
Have you checked the indexes on your database? If your query is formatted correctly and you have the proper indexes on your table, you can query a 5 million row table and get your results in less than a second. I would suggest checking whether you have an index on City with State added as an included column. That way, when you query by city, both the city and state can be returned from the index.
If you run your query in SQL Server Management Studio and press Ctrl+M, you can see the execution plan for your query. If you see something like a table scan or index scan, then you have the wrong index on your table. You want your plan to show an index seek, which means your query is reading only the pages in the database it needs to find your data.
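For illustration, here is a minimal sketch of the kind of prefix query and covering index this suggests; the dbo.Cities table, its columns, and the method name are assumptions, not code from the question:

// Sketch only: assumes a dbo.Cities table with City and State columns.
// Suggested covering index so the lookup becomes an index seek:
//   CREATE INDEX IX_Cities_City ON dbo.Cities (City) INCLUDE (State);
using System;
using System.Data.SqlClient;

class CityLookup
{
    public static void PrintMatches(string connectionString, string prefix)
    {
        // A trailing wildcard ('bos%') can use the index on City;
        // a leading wildcard ('%bos%') would force a scan.
        const string sql =
            "SELECT TOP 20 City, State FROM dbo.Cities " +
            "WHERE City LIKE @prefix + '%' ORDER BY City;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@prefix", prefix);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}, {1}", reader.GetString(0), reader.GetString(1));
                }
            }
        }
    }
}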
Hope this helps.
My guess would be that the problem you're having is not the database itself (although you should check it for index problems), but the amount of time that it takes to retrieve the information from the database, put it in the appropriate objects etc., and send it to the browser. If this is the case, there aren't a lot of options without some real work.
You can cache frequently accessed information on the web server. If you know there are a lot of cities which are frequently accessed, you can store them up-front and then check the database if what the user is looking for isn't in the system. We use prefix trees to store information when a user is typing something and we need to find it in a list.
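As a rough illustration of that prefix-tree idea, here is a small in-memory trie sketch; the class names and the cap of 20 entries per node are assumptions for the example, not part of the original answer:

using System.Collections.Generic;

class TrieNode
{
    public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public List<string> Entries = new List<string>();   // e.g. "Boston, MA"
}

class CityTrie
{
    private readonly TrieNode _root = new TrieNode();

    public void Add(string city, string display)
    {
        TrieNode node = _root;
        foreach (char c in city.ToLowerInvariant())
        {
            TrieNode child;
            if (!node.Children.TryGetValue(c, out child))
            {
                child = new TrieNode();
                node.Children[c] = child;
            }
            node = child;

            // Keep a short list of matches on every prefix node so a
            // 3-character lookup returns instantly, without walking the subtree.
            if (node.Entries.Count < 20)
            {
                node.Entries.Add(display);
            }
        }
    }

    public IList<string> Lookup(string prefix)
    {
        TrieNode node = _root;
        foreach (char c in prefix.ToLowerInvariant())
        {
            if (!node.Children.TryGetValue(c, out node))
            {
                return new List<string>();   // no city starts with this prefix
            }
        }
        return node.Entries;
    }
}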
You can start to pull information from the database as soon as the user starts to type and then pare the full result set down as you get more information from the user. This is a little trickier, as you'll have to store the information in memory between requests (so if the user types "B", you start the retrieval and store it in a session; when the user has finished typing "BOS", the result set from the initial query is temporarily in memory and you can loop through it and pull the subset that matches the final request).
Use parent/child dropdowns.
I need your help.
How do I navigate through records in a 3-tier ASP.NET application with C#?
Suppose I am returning a record set from my data access layer to my UI layer; what is the best way to navigate it (next, previous, first, and last record)?
Thanks in Advance
You should adopt a record pagination strategy for data access, whereby data is retrieved in pages of N rows at a time and a query's data can be retrieved from an arbitrary offset (i.e. you shouldn't hold database resources such as connections or cursors open while the user decides what to do next).
The client itself will need to maintain this state (i.e. which record/page the user is currently viewing is stored in the browser).
Each time the user navigates to a new page, you will need to fetch the appropriate batch of data from the database, remembering any filters that apply (page number, page size, and start record are common parameters here).
You haven't mentioned which database you are using, but most RDBMSs have pagination-friendly functions such as OFFSET and LIMIT (and even SQL Server 2012 now has OFFSET / FETCH NEXT N ROWS). LINQ-based ORMs then expose these as the easy-to-use paging functions Skip() and Take() respectively.
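For example, a minimal paging sketch with Skip/Take might look like the following; it assumes a LINQ-capable data source (e.g. Entity Framework), and the method and parameter names are only illustrative:

using System;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // pageNumber is 1-based; a stable ORDER BY is required before Skip/Take
    // so the same row never shows up on two different pages.
    public static IQueryable<T> GetPage<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int pageNumber,
        int pageSize)
    {
        return source.OrderBy(orderBy)
                     .Skip((pageNumber - 1) * pageSize)
                     .Take(pageSize);
    }
}

// Usage (hypothetical context and entity):
//   var page2 = Paging.GetPage(db.Customers, c => c.CustomerId, 2, 50).ToList();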
I have a situation at www.zipstory.com (beta) where I have n-permutations of feeds coming from the same database. For example, someone can get a feed for whatever city they are interested in, or as many of these cities combined, all sorted by most recent or most votes.
How should I cache things for each user without completely maxing out the available memory when there are thousands of users at the same time?
My only guess is: don't. I could come up with a client-side caching strategy where I sort out the city results, but that way I could still cache in a one-size-fits-all strategy by city.
What approaches do you suggest? I'm in unfamiliar ground at this point and could use a good strategy. I noticed this website does not do that but Facebook does. They must be pulling from a pool of cached user-feeds and plucking them in client-side. Not sure, again I'm not smart enough to figure this out just yet.
In other words...
Each city has its own feed. Each user has an n-permutation of city feeds combined.
I would like to see possible solutions to this problem using c# and ASP.NET
Adding to this February 28th, 2013. Here's what I did based on your comments, so THANKS!...
For every user logging in, I cache their preferred city list
The top 10 post results are cached per city and stored in a LINQ-based object
When a user comes in and has x cities as feeds, I loop through their city list, then check if each city's postings are in the cache; if not, I get them from the DB and populate the individual posting's HTML into the cache along with other sorting elements.
I recombine the list of cities into one feed for the user, and since I have some sorting elements on the LINQ object, I can re-sort them in the proper order and give the feed back to the user
This does mean there is some CPU work every time regardless, as I have to combine the city lists into a single feed, but it avoids going to the database every time and everyone benefits from faster page response times. The main drawback is that since I'm no longer doing a single UNION query across cities, this requires one query per city if a city is not cached; but each city is checked for cache individually, so 10 queries for 10 cities would only happen if the site is a dead zone.
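A minimal sketch of that per-city cache check and recombine step might look like this; the cache key format, CityPost type, and GetPostsFromDatabase call are hypothetical, not taken from the post:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Caching;

public class CityPost
{
    public string CityId { get; set; }
    public DateTime PostedAt { get; set; }
    public string Html { get; set; }
}

public static class FeedBuilder
{
    public static List<CityPost> BuildUserFeed(IEnumerable<string> cityIds)
    {
        var combined = new List<CityPost>();

        foreach (string cityId in cityIds)
        {
            string cacheKey = "city-feed:" + cityId;
            var posts = HttpRuntime.Cache[cacheKey] as List<CityPost>;

            if (posts == null)
            {
                // Cache miss: one query for this city only, then cache the result.
                posts = GetPostsFromDatabase(cityId);   // hypothetical DAL call
                HttpRuntime.Cache.Insert(cacheKey, posts, null,
                    DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);
            }

            combined.AddRange(posts);
        }

        // The CPU work: re-sort the combined list in memory, no extra queries.
        return combined.OrderByDescending(p => p.PostedAt).ToList();
    }

    private static List<CityPost> GetPostsFromDatabase(string cityId)
    {
        return new List<CityPost>();   // placeholder for the real data access
    }
}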
Judge the situation based on critical chain points.
If memory is not a problem, consider having the whole feed cached and retrieving items from there. For this you can use distributed cache solutions; some of them are even free. Start with memcached, http://memcached.org/ . People refer to this approach as Load Ahead.
Sometimes memory is a problem if you want to use the ASP.NET cache with expiration and priorities. In such a case, the cache can be evicted at any point when memory becomes a problem. You then load the data again on demand (called Load Through), which affects bandwidth, so your code has to be smarter to cope. If this is an option for you, try to cache as little as possible: e.g. cache each loaded item, and when the user requests a feed, check whether all of its items are present in the cache. If not, you will have to fetch either all of them or just the missing ones again. I have done something similar in the past, but cannot provide the code. The key point is: cache entities, and then cache feeds as references (IDs) to those entities. Thus, when a particular feed is requested, you check that all its references are still valid in the cache. BTW, ASP.NET provides cache dependencies for such scenarios, so read about that too; it may be helpful.
In any case, keep the Decorator design pattern in mind when implementing the data access layer. That would allow you to: 1 - postpone the caching concerns to later development phases, and 2 - switch between the two approaches described above depending on how things go. I would start with the simpler (and cheaper) built-in solution, and then switch to a distributed cache solution when it is really needed.
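As a rough sketch of that Decorator idea, a caching wrapper around a plain repository interface could look like this; the interface, the in-memory dictionary, and all names are assumptions for illustration only:

using System.Collections.Generic;

public interface IFeedRepository
{
    IList<string> GetFeed(string cityId);
}

// Plain implementation that always hits the database.
public class DbFeedRepository : IFeedRepository
{
    public IList<string> GetFeed(string cityId)
    {
        // ... real database access here ...
        return new List<string>();
    }
}

// Decorator: same interface, adds caching, can be swapped in or out.
public class CachedFeedRepository : IFeedRepository
{
    private readonly IFeedRepository _inner;
    private readonly Dictionary<string, IList<string>> _cache =
        new Dictionary<string, IList<string>>();

    public CachedFeedRepository(IFeedRepository inner)
    {
        _inner = inner;
    }

    public IList<string> GetFeed(string cityId)
    {
        IList<string> feed;
        if (!_cache.TryGetValue(cityId, out feed))
        {
            feed = _inner.GetFeed(cityId);   // cache miss: fall through to the DB
            _cache[cityId] = feed;
        }
        return feed;
    }
}

Switching to a distributed cache later then only means writing another decorator; the rest of the application keeps calling IFeedRepository.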
Only cache the minimal amount of different information per user that you need.
For example, if it fits in memory, cache the complete set of feeds and only store, per user, the IDs of the feeds they are interested in.
When they request their feeds, just get those out of memory.
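A tiny sketch of that idea, with hypothetical types: one shared dictionary of feeds plus a small per-user list of IDs.

using System.Collections.Generic;
using System.Linq;

public class FeedStore
{
    // One shared copy of each feed, keyed by feed/city ID.
    private readonly Dictionary<int, List<string>> _feedsById =
        new Dictionary<int, List<string>>();

    // Per user, only the small list of IDs they are interested in.
    private readonly Dictionary<string, List<int>> _feedIdsByUser =
        new Dictionary<string, List<int>>();

    public List<string> GetUserFeed(string userName)
    {
        List<int> ids;
        if (!_feedIdsByUser.TryGetValue(userName, out ids))
        {
            return new List<string>();
        }

        // Pull the shared feeds out of memory and combine them for this user.
        return ids.Where(id => _feedsById.ContainsKey(id))
                  .SelectMany(id => _feedsById[id])
                  .ToList();
    }
}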
Have you considered caching generic feeds and tagging them? Then, per user, you just store a reference to that tag/keyword.
Another possibility could be to store the generic feed and then filter on the client. This will increase your bandwidth usage, but save on cache cost.
And if you are on HTML5, use Local Storage to hold user preferences.
I am using Entity Framework and ASP.NET Web API, and there is a WinForms client that sends its data to the server or receives data from it.
I know how to get one record and also how to get a collection of records at one time, but there may be a large number of records, which could cause problems. So how can I get a collection of new records one by one on the client side?
I think I can get a list of new IDs first and then fetch the records one by one. Is there a better approach?
Any information about implementing this would be useful.
EDIT: to be clear, I mean getting a collection of information from the server on the client machine, not from the database on the server :)
Paging is the answer. Below is a great link to paging with Entity Framework and MVC using Skip and Take. You basically specify the starting index (or "skip") and then the number of records you want to get (or "take"). Like this:
context.YOURDBSET.Skip(0).Take(10); //Gets the first 10.
context.YOURDBSET.Skip(10).Take(10); //Gets the next 10.
context.YOURDBSET.Skip(20).Take(10); //Gets the next 10.
etc.etc.etc.
Here's that link: http://msdn.microsoft.com/en-us/magazine/gg650669.aspx
A very common approach to this problem is to create some form of paging. For instance, the first time, just get the first 50 records from the server. Then, based on some condition (it could be user-triggered, or automatic based on time; it depends on your application), get the next set of data, records 51-100.
You can get creative with how you trigger getting the various pages of data. For instance, you could trigger the retrieval of data based on the user scrolling the mouse wheel if you need to show the data that way. Ultimately this depends on your scenario, but I believe paging is the answer.
Your page size could be 5 records, or it could be 500 records, but the idea is the same.
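To illustrate, a minimal ASP.NET Web API sketch of such a paged endpoint might look like this; the Item entity, AppDbContext, and route are assumptions, not from the question:

using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;
using System.Web.Http;

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<Item> Items { get; set; }
}

public class ItemsController : ApiController
{
    // GET api/items?skip=50&take=50
    // The WinForms client asks for one page at a time instead of everything.
    public IList<Item> Get(int skip = 0, int take = 50)
    {
        using (var db = new AppDbContext())
        {
            return db.Items.OrderBy(i => i.Id)
                           .Skip(skip)
                           .Take(take)
                           .ToList();   // materialize before the context is disposed
        }
    }
}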
I've been implementing MS Search Server 2010 and so far it's really good. I'm doing the search queries via its web service, but due to the inconsistent results, I'm thinking about caching the results instead.
The site is a small intranet (500 employees), so it shouldn't be a problem, but I'm curious what approach you would take if it were a bigger site.
I've googled a bit, but haven't really come across anything specific. So, a few questions:
What other approaches are there? And why are they better?
How much does it cost to store a dataview of 400-500 rows? What sizes are feasible?
Any other points I should take into consideration?
Any input is welcome :)
You need to employ many techniques to pull this off successfully.
First, you need some sort of persistence layer. If you are using a plain old website, then the user's session would be the most logical layer to use. If you are using web services (meaning session-less) and just making calls through a client, well then you still need some sort of application layer (sort of a shared session) for your services. Why? This layer will be home to your database result cache.
Second, you need a way of caching your results in whatever container you are using (session or the application layer of web services). You can do this a couple of ways... If the query is something that any user can do, then a simple hash of the query will work, and you can share this stored result among other users. You probably still want some sort of GUID for the result, so that you can pass this around in your client application, but having a hash lookup from the queries to the results will be useful. If these queries are unique then you can just use the unique GUID for the query result and pass this along to the client application. This is so you can perform your caching functionality...
The caching mechanism can incorporate some sort of fixed length buffer or queue... so that old results will automatically get cleaned out/removed as new ones are added. Then, if a query comes in that is a cache miss, it will get executed normally and added to the cache.
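A small sketch of that fixed-length cache, keyed by a normalized form of the query; the class name, the eviction policy (oldest first), and the capacity are illustrative assumptions:

using System.Collections.Generic;

public class SearchResultCache
{
    private readonly int _capacity;
    private readonly Dictionary<string, IList<string>> _resultsByQueryKey =
        new Dictionary<string, IList<string>>();
    private readonly Queue<string> _insertionOrder = new Queue<string>();

    public SearchResultCache(int capacity)
    {
        _capacity = capacity;
    }

    public bool TryGet(string query, out IList<string> results)
    {
        return _resultsByQueryKey.TryGetValue(Normalize(query), out results);
    }

    public void Add(string query, IList<string> results)
    {
        string key = Normalize(query);
        if (_resultsByQueryKey.ContainsKey(key))
        {
            return;
        }

        // Evict the oldest cached result once the fixed-length buffer is full.
        if (_insertionOrder.Count >= _capacity)
        {
            string oldest = _insertionOrder.Dequeue();
            _resultsByQueryKey.Remove(oldest);
        }

        _insertionOrder.Enqueue(key);
        _resultsByQueryKey[key] = results;
    }

    private static string Normalize(string query)
    {
        // Normalizing before keying lets equivalent queries share one cache entry;
        // a real hash of the text would work just as well here.
        return query.Trim().ToLowerInvariant();
    }
}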
Third, you are going to want some way to page your result object... the Iterator pattern works well here, though something simpler might work, like fetching X results starting at point Y. However, the Iterator pattern would be better, as you could then remove your caching mechanism later and page directly from the database if you so desired.
Fourth, you need some sort of pre-fetch mechanism (as others suggested). You should launch a thread that will do the full search, and in your main thread just do a quick search with the top X number of items. Hopefully by the time the user tries paging, the second thread will be finished and your full result will now be in the cache. If the result isn't ready, you can just incorporate some simple loading screen logic.
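Here is a rough sketch of that pre-fetch step: return the quick top-X result immediately while a background task fills the cache with the full result. RunSearch and the cache here are placeholders, not the real Search Server calls:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class PrefetchingSearchService
{
    private readonly ConcurrentDictionary<string, IList<string>> _cache =
        new ConcurrentDictionary<string, IList<string>>();

    public IList<string> Search(string query, int firstPageSize)
    {
        IList<string> cached;
        if (_cache.TryGetValue(query, out cached))
        {
            return cached;   // full result already pre-fetched
        }

        // Quick search for the first page so the user sees something immediately.
        IList<string> firstPage = RunSearch(query, firstPageSize);

        // Full search on a background task; by the time the user pages,
        // the complete result should already be in the cache.
        Task.Run(() =>
        {
            _cache[query] = RunSearch(query, int.MaxValue);
        });

        return firstPage;
    }

    private static IList<string> RunSearch(string query, int maxResults)
    {
        // Placeholder for the real call to the search web service.
        return new List<string>();
    }
}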
This should get you some of the way... let me know if you want clarification/more details about any particular part.
I'll leave you with some more tips...
You don't want to be sending the entire result to the client app (if you are using Ajax or something like an iPhone app). Why? Well, because that is a huge waste. The user likely isn't going to page through all of the results... now you've just sent over 2 MB of result fields for nothing.
JavaScript is an awesome language, but remember it is still a client-side scripting language... you don't want to slow the user experience down too much by sending massive amounts of data for your Ajax client to handle. Just send the prefetched result to your client, and additional pages of results as the user pages.
Abstraction, abstraction, abstraction... you want to abstract away the cache, the querying, the paging, the prefetching... as much of it as you can. Why? Well, let's say you want to switch databases, or you want to page directly from the database instead of using a result object in cache... if you do it right, this is much easier to change later on. Also, if you are using web services, many other applications can make use of this logic later on.
Now, I probably suggested an over-engineered solution for what you need :). But, if you can pull this off using all the right techniques, you will learn a ton and have a very good base in case you want to extend functionality or reuse this code.
Let me know if you have questions.
It sounds like the slow part of the search is the full-text searching, not the result retrieval. How about caching the resulting resource record IDs? Also, since it might be true that search queries are often duplicated, store a hash of the search query, the query, and the matching resources. Then you can retrieve the next page of results by ID. Works with AJAX too.
Since it's an intranet and you may control the searched resources, you could even pre-compute a new or updated resource's match to popular queries during idle time.
I have to admit that I am not terribly familiar with MS Search Server, so this may not apply. I have often had situations where an application had to search through hundreds of millions of records for result sets that needed to be sorted, paginated and sub-searched in SQL Server, though. Generally what I do is take a two-step approach. First, I grab the first "x" results which need to be displayed and send them to the browser for a quick display. Second, on another thread, I finish the full query and move the results to a temp table where they can be stored and retrieved more quickly. Any given query may have thousands or tens of thousands of results, but in comparison to the hundreds of millions or even billions of total records, this smaller subset can be manipulated very easily from the temp table. It also puts less stress on the other tables as queries happen. If the user needs a second page of records, or needs to sort them, or just wants a subset of the original query, this is all pulled from the temp table.
Logic then needs to be put into place to check for outdated temp tables and remove them. This is simple enough, and I let SQL Server handle that functionality. Finally, logic has to be put into place for when the original query changes (significant parameter changes) so that a new data set can be pulled and placed into a new temp table for further querying. All of this is relatively simple.
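A rough sketch of that staging step under these assumptions (a persistent staging table keyed by a query ID, rather than a true #temp table, so it survives across requests); the table and column names are invented for illustration:

using System;
using System.Data.SqlClient;

public static class ResultStaging
{
    public static void StageFullResult(string connectionString, Guid queryId, string searchTerm)
    {
        // Copy the matching record IDs, numbered in display order, into a
        // staging table; later pages, sorts, and sub-searches read from it.
        const string sql = @"
            INSERT INTO dbo.StagedResults (QueryId, RowNum, RecordId)
            SELECT @queryId,
                   ROW_NUMBER() OVER (ORDER BY r.Relevance DESC),
                   r.RecordId
            FROM dbo.SearchRecords r
            WHERE r.SearchText LIKE '%' + @term + '%';";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@queryId", queryId);
            command.Parameters.AddWithValue("@term", searchTerm);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }

    // A later page is then just:
    //   SELECT RecordId FROM dbo.StagedResults
    //   WHERE QueryId = @queryId AND RowNum BETWEEN @start AND @end;
}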
Users are so used to split-second return times from places like Google, and this model gives me enough flexibility to actually achieve that without needing the specialized software and hardware they use.
Hope this helps a little.
Tim's answer is a great way to handle things if you have the ability to run the initial query in a second thread and the logic (paging/sorting/filtering) applied to the results requires action on the server..... otherwise....
If you can use AJAX, a 500-row result set could be called into the page and paged or sorted on the client. This can lead to some really interesting features.... check out the datagrid solutions from jQueryUI and Dojo for inspiration!
And for really intensive features like arbitrary regex filters and drag-and-drop column re-ordering, you can totally free the server.
Loading the data to the browser all at once also lets you call in supporting data (page previews etc) as the user "requests" them ....
The main issue is limiting the data you return per result to what you'll actually use for your sorts and filters.
The possibilities are endless :)
I have developed a network application that has been in use in my company for the last few years.
At the start it managed information about users, rights, etc.
Over time it grew with other functionality. It grew to the point that I have tables with, let's say, 10-20 columns and even 20,000-40,000 records.
I keep hearing that Access is not good for multi-user environments.
Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client.
It happens because there is no database engine on the server side and data filtering is done on the client side.
I would migrate this project to SQL Server, but unfortunately that cannot be done in this case.
I was wondering if there is a more reliable solution for me than using an Access database while still staying with a single-file database system.
We have quite a huge system using dBase IV.
As far as I know it is a fully multi-user database system.
Maybe it will be good to use it instead of Access?
What makes me not sure is the fact that dBase IV is much older than Access 2000.
I am not sure if it would be a good solution.
Maybe there are some other options?
If you're having problems with your Jet/ACE back end with the number of records you mentioned, it sounds like you have schema design problems or an inefficiently-structured application.
As I said in my comment to your original question, Jet does not retrieve full tables. This is a myth propagated by people who don't have a clue what they are talking about. If you have appropriate indexes, only the index pages will be requested from the file server (and then, only those pages needed to satisfy your criteria), and then the only data pages retrieved will be those that have the records that match the criteria in your request.
So, you should look at your indexing if you're seeing full table scans.
You don't mention your user population. If it's over 25 or so, you probably would benefit from upsizing your back end, especially if you're already comfortable with SQL Server.
But the problem you described for such tiny tables indicates a design error somewhere, either in your schema or in your application.
FWIW, I've had Access apps with Jet back ends with 100s of thousands of records in multiple tables, used by a dozen simultaneous users adding and updating records, and response time retrieving individual records and small data sets was nearly instantaneous (except for a few complex operations like checking newly entered records for duplication against existing data -- that's slower because it uses lots of LIKE comparisons and evaluation of expressions for comparison). What you're experiencing, while not an Access front end, is not commensurate with my long experience with Jet databases of all sizes.
You may wish to read this informative thread about Access: Is MS Access (JET) suitable for multiuser access?
For the record this answer is copied/edited from another question I answered.
Aristo,
You CAN use Access as your centralized data store.
It is simply NOT TRUE that Access will choke in multi-user scenarios, at least up to 15-20 users.
It IS true that you need a good backup strategy with the Access data file. But last I checked you need a good backup strategy with SQL Server, too. (With the very important caveat that SQL Server can do "hot" backups but not Access.)
So...you CAN use Access as your data store. Then, if you can get beyond the company politics controlling your network, perhaps you could begin moving toward upfitting your current application to use SQL Server.
I recently answered another question on how to split your database into two files. Here is the link.
Creating the Front End MDE
Splitting your database file into a front end and a back end is sort of a key to making it more performant. (Assume, as David Fenton mentioned, that you have a reasonably good design.)
If I may mention one last thing...it is ridiculous that your company won't give you other deployment options. Surely there is someone there with some power who you can get to "imagine life without your application." I am just wondering if you have more power than you might realize.
Seth
The problems you experience with an Access database shared amongst your users will be the same with any file-based database.
A read will pull a lot of data into memory, and writes are guarded with some type of file lock. In your environment, it sounds like you are going to have to make the best of what you have.
"Second thing is the fact that when I try to read some records from the table over the network, the whole table has to be pulled to the client. "
Actually, no. This is a common misstatement spread by folks who do not understand the nature of how Jet, the database engine inside Access, works. Pulling down all the records, or an excessive number of records, happens when the fields used in the selection criteria or sorting are not covered by an index. We've also found that indexing yes/no (boolean) fields can make a huge difference in some queries.
What really happens is that Jet brings down only the index pages and data pages which are required. While this is more data than a server-based database engine would send back, it is not the entire table.
I also have clients with 600K and 800K records in various tables and performance is just fine.
We have an Access database application that is used pretty heavily. I have had 23 users on all at the same time before without any issues. As long as they don't access the same record then I don't have any problems.
I do have a couple of forms that are used and updated by several different departments. For instance I have a Quoting form that contains 13 different tabs and 10-20 fields on each tab. Users are typically in a single record for minutes editing and looking for information. To avoid any write conflicts I call the below function any time a field is changed. As long as it is not a new record being entered, then it updates.
Function funSaveTheRecord()
    If ([chkNewRecord].Value = False And Me.Dirty) Then
        'To save the record, turn off the form's Dirty property
        Me.Dirty = False
    End If
End Function
The way I have everything set up is as follows:
PDC.mdb <-- Front end, stored on the user's machine. Every user has their own copy. Links to tables found in PDC_be.mdb. Contains all forms, reports, queries, macros, and modules. I created a form that I can use to toggle on/off the shift key bypass. Only I have access to it.
PDC_be.mdb <-- Back end, stored on the server. Contains all data. The only form and VBA it contains is to toggle on/off the shift key bypass. Only I have access to it.
Secured.mdw <-- Security file, stored on the server.
Then I put a shortcut on the users desktop that ties the security file to the front end and also provides their login credentials.
This database has been running without error or corruption for over 6 years.
Access is not a flat file database system! It's a relational database system.
You can't use SQL Server Express?
Otherwise, MySQL is a good database.
But if you can't install ANYTHING (you should get into those politics sooner rather than later, or it WILL be later), just use your existing database system.
Basically with Access, it cannot handle more than 5 people connected at the same time, or it will corrupt on you.