Improving nested objects filtering speed

Improving nested objects filtering speed - c#

Here's a problem I experience (simplified example):
Let's say I have several tables:
One customer can have mamy products and a product can have multiple features.
On my asp.net front-end I have a grid with customer info:
something like this:
Name Address
John 222 1st st
Mark 111 2nd st
What I need is an ability to filter customers by feature. So, I have a dropdown list of available features that are connected to a customer.
What I currently do:
1. I return DataTable of Customers from stored procedure. I store it in viewstate
2. I return DataTable of features connected to customers from stored procedure. I store it in viewstate
3. On filter selected, I run stored procedure again with new feature_id filter where I do joins again to only show customers that have selected feature.
My problem: It is very slow.
I think that possible solutions would be:
1. On page load return ALL data in one viewstate variable. So basically three lists of nested objects. This will make my page load slow.
2. Perform async loazing in some smart way. How?
Any better solutions?
Edit:
this is a simplified example, so I also need to filter customer by property that is connected through 6 tables to table Customer.

The way I deal with these scenarios is by passing in Xml to SQL and then running a join against that. So Xml would look something like:
<Features><Feat Id="2" /><Feat Id="5" /><feat Id="8" /></Features>
Then you can pass that Xml into SQL (depending on what version of SQL there are different ways), but in the newer version's its a lot easier than it used to be:
http://www.codeproject.com/Articles/20847/Passing-Arrays-in-SQL-Parameters-using-XML-Data-Ty
Also, don't put any of that in ViewState; there's really no reason for that.

Storing an entire list of customers in ViewState is going to be hideously slow; storing all information for all customers in ViewState is going to be worse, unless your entire customer base is very very small, like about 30 records.
For a start, why are you loading all the customers into ViewState? If you have any significant number of customers, load the data a page at a time. That will at least reduce the amount of data flowing over the wire and might speed up your stored procedure as well.
In your position, I would focus on optimizing the data retrieval first (including minimizing the amount you return), and then worry about faster ways to store and display it. If you're up against unusual constraints that prevent this (very slow database; no profiling tools; not allowed to change stored procedures) than please let us know.

Solution 1: Include whatever criteria you need to filter on in your query, only return and render the requested records. No need to use viewstate.
Solution 2: Retrieve some reasonable page limit of customers, filter on the browser with javascript. Allow easy navigation to the next page.

Related

Best approach to track Amount field on Invoice table when InvoiceItem items change?

I'm building an app where I need to store invoices from customers so we can track who has paid and who has not, and if not, see how much they owe in total. Right now my schema looks something like this:
Customer
- Id
- Name
Invoice
- Id
- CreatedOn
- PaidOn
- CustomerId
InvoiceItem
- Id
- Amount
- InvoiceId
Normally I'd fetch all the data using Entity Framework and calculate everything in my C# service, (or even do the calculation on SQL Server) something like so:
var amountOwed = Invoice.Where(i => i.CustomerId == customer.Id)
.SelectMany(i => i.InvoiceItems)
.Select(ii => ii.Amount)
.Sum()
But calculating everything every time I need to generate a report doesn't feel like the right approach this time, because down the line I'll have to generate reports that should calculate what all the customers owe (sometimes go even higher on the hierarchy).
For this scenario I was thinking of adding an Amount field on my Invoice table and possibly an AmountOwed on my Customer table which will be updated or populated via the InvoiceService whenever I insert/update/delete an InvoiceItem. This should be safe enough and make the report querying much faster.
But I've also been searching some on this subject and another recommended approach is using triggers on my database. I like this method best because even if I were to directly modify a value using SQL and not the app services, the other tables would automatically update.
My question is:
How do I add a trigger to update all the parent tables whenever an InvoiceItem is changed?
And from your experience, is this the best (safer, less error-prone) solution to this problem, or am I missing something?

There are many examples of triggers that you can find on the web. Many are poorly written unfortunately. And for future reference, post DDL for your tables, not some abbreviated list. No one should need to ask about the constraints and relationships you have (or should have) defined.
To start, how would you write a query to calculate the total amount at the invoice level? Presumably you know the tsql to do that. So write it, test it, verify it. Then add your amount column to the invoice table. Now how would you write an update statement to set that new amount column to the sum of the associated item rows? Again - write it, test it, verify it. At this point you have all the code you need to implement your trigger.
Since this process involves changes to the item table, you will need to write triggers to handle all three types of dml statements - insert, update, and delete. Write a trigger for each to simplify your learning and debugging. Triggers have access to special tables - go learn about them. And go learn about the false assumption that a trigger works with a single row - it doesn't. Triggers must be written to work correctly if 0 (yes, zero), 1, or many rows are affected.
In an insert statement, the inserted table will hold all the rows inserted by the statement that caused the trigger to execute. So you merely sum the values (using the appropriate grouping logic) and update the appropriate rows in the invoice table. Having written the update statement mentioned in the previous paragraphs, this should be a relatively simple change to that query. But since you can insert a new row for an old invoice, you must remember to add the summed amount to the value already stored in the invoice table. This should be enough direction for you to start.
And to answer your second question - the safest and easiest way is to calculate the value every time. I fear you are trying to solve a problem that you do not have and that you may never have. Generally speaking, no one cares about invoices that are of "significant" age. You might care about unpaid invoices for a period of time, but eventually you write these things off (especially if the amounts are not significant). Another relatively easy approach is to create an indexed view to calculate and materialize the total amount. But remember - nothing is free. An indexed view must be maintained and it will add extra processing for DML statements affecting the item table. Indexed views do have limitations - which are documented.
And one last comment. I would strongly hesitate to maintain a total amount at any level higher than invoice. Above that level one frequently wants to filter the results in any ways - date, location, type, customer, etc. At this level you are approaching data warehouse functionality which is not appropriate for a OLTP system.

First of all never use triggers for business logic. Triggers are tricky and easily forgettable. It will be hard to maintain such application.
For most cases you can easily populate your reporting data via entity framework or SQL query. But if it requires lots of joins then you need to consider using staging tables. Because reporting requires data denormalization. To populate staging tables you can use SQL jobs or other schedule mechanism (Azure Scheduler maybe). This way you won't need to work with lots of join and your reports will populate faster.

API - filter big list with word fragment

I have asp.net web api application. In database I have a big list (between 100.000 and 200.000) of pairs like id:name and this list could be changed quite rarely. I need to implement filtering like this /pair/filter?fragment=bla. It should return first 25 pairs where any word in name starts with word fragment. I see two approachs here: 1st approach is to load data into cache (HttpRuntimeCache, redis or smth like this) to increase loading time and filter in linq. But I think there will be problems with time required for serialiazing/deserialiazing. Another approach: for instance I have a pair 22:some title here so I need to provide separate table like this:
ID | FRAGMENT
22 | some
22 | title
22 | here
with primary key on both columns and separate index on FRAGMENT column to make queries faster. Any offers and remarks are welcome.
UPD: now I've refreshed my mind. I don't want to query database because requests happen quite often. So now I see the best solution is
load entire list in memory
build trie structure which keeps hashset of values in each node
in case of one text fragment - just return the hashset from trie node, in case of few fragments - find all hashsets and get their intersection

You could try a full-text index on your current DB (if its supported) and the CONTAINS keyword like so
SELECT * FROM tableName WHERE CONTAINS(name, 'bla*');
This will look for words starting with "bla" in the entire string, and also match the string "Monkeys blabla"

I dont really understand your question but if you want to query any table you can do so since you already have the queryString. You can try this out.
var res = _repository.Table.Where(c => c.Name.StartsWith("bla")).Take(25);
If it doesnt help. Try to to restructure your question a little bit.

Is this a case of premature optimization?
How many users will be hitting this service simultaneously? How many will be hitting your database simultaneously? How efficient is your query? How much data will be returned across the wire?
In most cases, you can't outsmart an efficient database for performance. Your row count is too small to create a truly heavy burden on your application's runtime performance when querying. This assumes, of course, that your query is well written and that you're properly opening, closing, and freeing resources in a timely fashion.
Caching the data in memory has its trade-offs that should be considered. It increases the memory footprint of your application, and requires you to write and maintain additional code to maintain that cache. That is by no means prohibitive, but should be considered in view of your overall architecture.
Consider these things carefully. From what I can tell, keeping this data in the database is fine. Deserialization tends to be fast (as most of the data you return is native types), and shouldn't be cost-prohibitive.

Persisting data for catch-all search

We have a set of catch all search pages we're creating in our ASP.NET application. We have an initial search page, a SERP, and then a single item details page. All 3 pages have a search bar with initial criteria, more criteria, and advanced criteria choices.
When we put all of our criteria together, in addition to the main search box we have 20 different criteria parameters (from price, to price, sale item, date created, etc.) and then three collections of parameter IDs. These collections are from a list of the Manufacturers, Product Lines, and Categories our users can search from. So we have this fixed set of 20 fields and then 3 collections that could have a manufacturer or two, or could hold a collection of 100 Guids for the lines whose checkboxes they selected and want to search through.
In our old system we had a single form solution and we just posted back and submitted everything to our business object, passing it into a method that returned the results. In this new form we need to submit the results from page to page and persist this criteria. We're trying to figure out the best way to persist the data, when I say best I mean most efficient.
Querystring - This isn't going to work with large collections of Guid values for the 3 collections.
Session - We would create a criteria object and store it in the Session. As they move from page to page we can pull it out. At our peak we probable have 200-300 people using the server concurrently and the search is our most used form. I'm worried about performance with all those session variables.
Database - We were thinking of serializing and stashing this criteria object into the database (SQL Server 2k5) and the users would always have a current Search or last Search in the database. This eliminates some of the web server load from the Session solution but I'm worried this object load, serialization, db round trip, and unload is going to slow the forms down and affect user experience.
I'm looking for advice on which method is going to work most efficiently for us or if there is an accepted best practice or pattern I've overlooked.

With HTML5 you can use localStorage and sessionStorage, which makes the client keep the information in their browser.
http://www.w3schools.com/html/html5_webstorage.asp

storing dataset of entire table and doing query on copy then updating GridView with results of query

I'm new to n-tier enterprise development. I just got quite a tutorial just reading threw the 'questions that may already have your answer' but didn't find what I was looking for. I'm doing a geneology site that starts off with the first guy that came over on the boat, you click on his name and the grid gets populated with all his children, then click on one of his kids that has kids and the grid gets populated with his kids and so forth. Each record has an ID and a ParentID. When you choose any given person, the ID is stored and then used in a search for all records that match the ParentID which returns all the kids. The data is never changed (at least by the user) so I want to just do one database access, fill all fields into one datatable and then do a requery of it each time to get the records to display. In the DAL I put all the records into a List which, in the ObjectDataSource the function that fills the GridView just returns the List of all entries. What I want to do is requery the datatable, fill the list back up with the new query and display in the GridView. My code is in 3 files here
(I can't get the backticks to show my code in this window) All I need is to figure out how to make a new query on the existing DataTable and copy it to a new DataTable. Hope this explains it well enough.
[edit: It would be easier to just do a new query from the database each time and it would be less resource intensive (in the future if the database gets too large) to store in memory, but I just want to know if I can do it this way - that is, working from 1 copy of the entire table] Any ideas...

Your data represents a tree structure by nature.
A grid to display it may not be my first choice...
Querying all data in one query can be done by using a complex SP.
But you are already considering performance. Thats always a good thing to keep in mind when coming up with a design. But creating something, improve it and only then start to optimize seems a better to go.
Since relational databases are not real good on hierarchical data, consider a nosql (graph)database. As you mentioned there are almost no writes to the DB, nosql shines here.

Need design suggestion for storing data for budget keeping application

I'm writing an application that I will use to keep up with my monthly budget. This will be a C# .NET 4.0 Winforms application.
Essentially my budget will be a matrix of data if you look at it visually. The columns are the "dates" at which that budget item will be spent. For example, I have 1 column for every Friday in the month. The Y axis is the name of the budget item (Car payment, house payment, eating out, etc). There are also categories, which are used to group the budget item names that are similar. For example, a category called "Housing" would have budget items called Mortgage, Rent, Electricity, Home Insurance, etc.
I need a good way to store this data from a code design perspective. Basically I've thought of two approaches:
One, I can have a "BudgetItem" class that has a "Category", "Value", and "Date". I would have a linear list of these items and each time I wanted to find a value by either date or category, I iterate this list in some form or fashion to find the value. I could probably use LINQ for this.
Second, I could use a 2D array which is indexed first by column (date) and second by row. I'd have to maintain categories and budget item names in a separate list and join the data together when I do my lookups somehow.
What is the best way to store this data in code? I'm leaning more towards the first solution but I wanted to see what you guys think. Later on when I implement my data persistence, I want to be able to persist this data to SQL server OR to an XML file (one file per monthly budget).

While your first attempt looks nicer, obviusly the second could be faster (depends on how you implement it). However when we are talking about desktop applications which are not performance critical, your first idea is definitely better, expecially because will help you a lot talking about maintaining your code. Also remember that the entity framework could be really nice in this situation
Finally if you know how to works with XML, I think is really better for this type of project. A database is required only when you have a fair amount of tables, as you explained you will only have 2 tables (budgetitem and category), I don't think you need a database for such a simple thing

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.