I have some SQL Server Store Procs that generates statistical data for charting in a C# web application.
Right now the user in the web app has to wait about 5 minutes to see these charts with updated data and this is a pain in the neck for the user and for me.
Some of the Store procs takes more than 5 minutes to generate the data but the web user don't need to see the info on the fly. Maybe update the chart every 2-3 hours.
So, I dont know what is the best practice to solve this.
I was thinking on creating a windows service that every 2-3 hours will call the SP's and then store the data in different tables.
Any clue on how to deal with this?
Appreciate the help
As I said in the comments, indexed views (kind of like materialized views) can increase performance of certain common queries without having to make temporary tables and things like that.
The benefits of indexed views are performance and that it doesn't require much extra coding and effort. When you create an indexed view as opposed to a temp table, the query navigator will (should) know when to take advantage of this view, without the end user needing to specify a temp or aggregate table.
Examples of the benefits of indexed views and how to implement them can be found here http://msdn.microsoft.com/en-us/library/dd171921(v=sql.100).aspx
here are some links to indexed views. Like the comments said, views allow you to quickly get information rather then always doing a select every time using a stored proc. Read the second link for a very good explanation about views.
MSDN
http://msdn.microsoft.com/en-ca/library/ms187864%28v=sql.105%29.aspx
Very well explained here
http://www.codeproject.com/Articles/199058/SQL-Server-Indexed-Views-Speed-Up-Your-Select-Quer
Related
I am trying to understand some basics of Lucene, the full text search engine. More specifically I am looking at Lucene.Net.
Today I have an old legacy .NET 4.8 web app. Some is MVC, but the newer parts follow a pretty nice API first pattern. The app holds a lot of records (app half a million) with tons of different fields. The search functionality there is outdated to say the least. It is a ton of old Linq2SQL queries that fan out in like queries.
I would like to introduce a new and better way to search records, so I started looking at Lucene.Net. But I am trying to understand one key concept, and I can't seem to find the answer anywhere, and I think it might be because it cannot be done, but I would like to make sure.
Is it possible to set up Lucene to monitor a SQL table or view so I don't have to maintain the Lucene index from within my code. The code of this app does not lend itself to easily keeping a Lucene index updated when things are added, changed or deleted. But the database is good source of truth. I can live with a small delay on having the index up to date. But basically I would like define for each business model what fields are part of the index and what the id is, and then be able to query with that index from the C# server side code of my Web App.
Is such a scenario even possible or am I asking too much?
It's totally possible, but not out of the box. You have to implement it if you want it. Fundamentally you need to implement three things.
A way to know every time a piece of relevant data in the sql database changes
A place to capture information about that change, call it a change log.
A routine that reads the change log, applies those changes to the
LuceneNet index and than marks the record in the change log has processed.
There are of course lots of different ways to handle each of these.
This SO answer Lucene.Net index updates, when a manual change is done in SQL Database provides more details on one way this can be accomplished.
I am building a Wiki / Blog similar application, and I have a question about the best way to store the View Count for each of the articles. The requirement is that I only want to store the unique number of users that viewed the article and not the total view count. So far I have come up with 3 different ways to accomplish:
1. SQL Server stored procedure: the problem with this approach is that the data is stored in XML data type and it might be a bit complicated to achieve the requirement using this method. I am leaving this as a last resort.
2. MSMQ: this would work great, since I can process the requests serially. The only problem with this approach is that, I cannot ensure that MSMQ is installed on the host server. This one is out of the question!
3. Using Application.Lock(): I know that using this method I can lock access to the Application object, update some entry in the application, update the database, and then call Application.Unlock(). While this sounds as a functional approach, it still feels like a workaround.
Does anyone has a suggestion on what I should do to achieve the requirement?
MsMQ and Application.Lock are def not the options to consider for something simple you want to do. (Application.Lock() is a def NO GO)
I also see no reason for XML. A stored proc does not rely on XML
Create a table
[page,userip]
on every view of the page
insert into <table>(page,userip) values(#page,#userip)
For the statistics just issue the a query
select count(*) from <table> group by userip having page=#page
This identifies a user on its IP, not completely failsafe as multiple users can come from the same ip.
But why not investigate google Analytics? All the info you need (and more)
I wonder if somebody could point me in the right direction. I've recently started playing with LinqToSQL and love the strongly typed data objects etc.
I'm just struggling to understand the impact on database performance etc. For example, say I was developing a simple user profile page. The page shows basic information about the user, some information on their recent activity, and a list of unread notifications.
If I was developing a stored procedure for this page, I could create a single SP which returns multiple datatables covering all of the required information - resulting in a single db call.
However, using LinqToSQL, this could results in many calls - one for user info, atleast one for activity, atleast one for notifications, if I then want further info on notifications this may result in further calls - multiple db calls.
Should I be worried about the number of db calls happenning as a result of using this design pattern? Ie, are the multiple db handshakes etc going to degrade my db etc?
I'd appreciate your thoughts on this!
Thanks
David
LINQ to SQL can consume multiple results from a stored proc if you need to go that route. Unfortnately the designer has problems mapping them correctly, so you will probably need to create your mapping manually. See http://www.thinqlinq.com/Default/Using-LINQ-to-SQL-to-return-Multiple-Results.aspx.
You can configure LINQ to SQL to eagerly load the child records if you know that you're going to need them for every parent record. Use the DataLoadOptions and .LoadWith to configure it.
You can also project an object graph with multiple child collections in the Select clause of a LINQ query to reduce the number of DB hits that you make.
Ultimately, you need to check a number of options to determine which route is the best performance for your situation. It's not a one size fits all scenario.
Is it worst from a performance standpoint ? Yes, it should be. Multiple roundtrips are usually worse than single.
The real question is, do you mind? Is your application going to receive enough visits to warrant the added complexity of a stored procedure? Or do you value the simplicity of future modifications over raw performance?
In any case, if you need the performance, you can create a stored procedure and map it on your context. This will give you one single call, but return the data as objects
Here is an article explaining a bit about that option:
linq-to-sql-returning-multiple-result-sets
i have a similar requirement to stackoverflow to show a number of metrics on a page in my asp.net-mvc site that are very expensive to calculate. Stackoverflow has a lot of metrics on the page (like user accept rate, etc) which clearly is not being calculated on the fly on page request, given that it would be too slow.
What is a recommended practice for serving up calculated data really fast without the performance penalty (assuming we can accept that this data maybe a little out of date.
is this stored in some caching layer or stored in some other "results" database table so every day there is a job to calculate this data and store the results so they can be queries directly?
assuming that i am happy to deal with the delayed of having this data as a snapshot,what is the best solution for this type of problem.
Probably they may be relying on the Redis data store for such calculations and caching. This post from marcgravell may help.
yes, the answer is caching, how you do it is (can be) the complicated part, if you are using NHibernate adding caching is really easy, is part of your configuration and on the queries you just add .Cacheable and it manages it for you. Caching also depends on the type of environment, if you're using a single worker, web farm or web garden, you would have to build a caching layer to accomodate for your scenario
Although this is a somewhat-recent technique, one really great way to structure your system to make stuff like this possible is by using Command and Query Responsibility Segregation, more often referred to by CQRS.
I've been implementing MS Search Server 2010 and so far its really good. Im doing the search queries via their web service, but due to the inconsistent results, im thinking about caching the result instead.
The site is a small intranet (500 employees), so it shouldnt be any problems, but im curious what approach you would take if it was a bigger site.
I've googled abit, but havent really come over anything specific. So, a few questions:
What other approaches are there? And why are they better?
How much does it cost to store a dataview of 400-500 rows? What sizes are feasible?
Other points you should take into consideration.
Any input is welcome :)
You need to employ many techniques to pull this off successfully.
First, you need some sort of persistence layer. If you are using a plain old website, then the user's session would be the most logical layer to use. If you are using web services (meaning session-less) and just making calls through a client, well then you still need some sort of application layer (sort of a shared session) for your services. Why? This layer will be home to your database result cache.
Second, you need a way of caching your results in whatever container you are using (session or the application layer of web services). You can do this a couple of ways... If the query is something that any user can do, then a simple hash of the query will work, and you can share this stored result among other users. You probably still want some sort of GUID for the result, so that you can pass this around in your client application, but having a hash lookup from the queries to the results will be useful. If these queries are unique then you can just use the unique GUID for the query result and pass this along to the client application. This is so you can perform your caching functionality...
The caching mechanism can incorporate some sort of fixed length buffer or queue... so that old results will automatically get cleaned out/removed as new ones are added. Then, if a query comes in that is a cache miss, it will get executed normally and added to the cache.
Third, you are going to want some way to page your result object... the Iterator pattern works well here, though probably something simpler might work... like fetch X amount of results starting at point Y. However the Iterator pattern would be better as you could then remove your caching mechanism later and page directly from the database if you so desired.
Fourth, you need some sort of pre-fetch mechanism (as others suggested). You should launch a thread that will do the full search, and in your main thread just do a quick search with the top X number of items. Hopefully by the time the user tries paging, the second thread will be finished and your full result will now be in the cache. If the result isn't ready, you can just incorporate some simple loading screen logic.
This should get you some of the way... let me know if you want clarification/more details about any particular part.
I'll leave you with some more tips...
You don't want to be sending the entire result to the client app (if you are using Ajax or something like an IPhone app). Why? Well because that is a huge waste. The user likely isn't going to page through all of the results... now you just sent over 2MB of result fields for nothing.
Javascript is an awesome language but remember it is still a client side scripting language... you don't want to be slowing the user experience down too much by sending massive amounts of data for your Ajax client to handle. Just send the prefetched result your client and additional page results as the user pages.
Abstraction abstraction abstraction... you want to abstract away the cache, the querying, the paging, the prefetching... as much of it as you can. Why? Well lets say you want to switch databases or you want to page directly from the database instead of using a result object in cache... well if you do it right this is much easier to change later on. Also, if using web services, many many other applications can make use of this logic later on.
Now, I probably suggested an over-engineered solution for what you need :). But, if you can pull this off using all the right techniques, you will learn a ton and have a very good base in case you want to extend functionality or reuse this code.
Let me know if you have questions.
It sounds like the slow part of the search is the full-text searching, not the result retrieval. How about caching the resulting resource record IDs? Also, since it might be true that search queries are often duplicated, store a hash of the search query, the query, and the matching resources. Then you can retrieve the next page of results by ID. Works with AJAX too.
Since it's an intranet and you may control the searched resources, you could even pre-compute a new or updated resource's match to popular queries during idle time.
I have to admit that I am not terribly familiar with MS Search Server so this may not apply. I have often had situations where an application had to search through hundreds of millions of records for result sets that needed to be sorted, paginated and sub-searched in a SQL Server though. Generally what I do is take a two step approach. First I grab the first "x" results which need to be displayed and send them to the browser for a quick display. Second, on another thread, I finish the full query and move the results to a temp table where they can be stored and retrieved quicker. Any given query may have thousands or tens of thousands of results but in comparison to the hundreds of millions or even billions of total records, this smaller subset can be manipulated very easily from the temp table. It also puts less stress on the other tables as queries happen. If the user needs a second page of records, or needs to sort them, or just wants a subset of the original query, this is all pulled from the temp table.
Logic then needs to be put into place to check for outdated temp tables and remove them. This is simple enough and I let the SQL Server handle that functionality. Finally logic has to be put into place for when the original query changes (significant perimeter changes) so that a new data set can be pulled and placed into a new temp table for further querying. All of this is relatively simple.
Users are so used to split second return times from places like google and this model gives me enough flexibility to actually achieve that without needing the specialized software and hardware that they use.
Hope this helps a little.
Tim's answer is a great way to handle things if you have the ability to run the initial query in a second thread and the logic (paging / sorting / filtering) to be applied to the results requires action on the server ..... otherwise ....
If you can use AJAX, a 500 row result set could be called into the page and paged or sorted on the client. This can lead to some really interesting features .... check out the datagrid solutions from jQueryUI and Dojo for inspiration!
And for really intensive features like arbitrary regex filters and drag-and-drop column re-ordering you can totally free the server.
Loading the data to the browser all at once also lets you call in supporting data (page previews etc) as the user "requests" them ....
The main issue is limiting the data you return per result to what you'll actually use for your sorts and filters.
The possibilities are endless :)