I am reading a huge amount of data from a SQL Server database (approx. 2,000,000 rows) and I want to display it to the end user in a WinForms GridView.
First Approach
My first idea was to use SqlDataReader, which normally doesn't take too much time to read from tables with around 200,000 entries. But it uses too much memory (and time!) in the case above.
Actual Solution
The current solution reads from the database through LINQ (a dbml file), which works well because it connects the components directly to the DB server. It loads data on the fly, which is really great.
Problems
The problems are:
When I plug my grid view into the feedback source, it seems I cannot read the columns of my grid through code.
This is how I plug my LookUpSearchEdit into the source:
slueTest.Properties.DataSource = lifsTest; // This is the LinqInstantFeedbackSource
gvTest.PopulateColumns();
When I do:
gv.Columns["FirstColumn"] // "FirstColumn" is the name of the field in the LINQ Class (and in the DB)
It raises an exception.
The data in the FeedbackSource is not accessible at all.
I lose all the features of the LookUpSearchEdit (sorting, searching, etc.), and I think it is because the data is read on the fly.
Questions
Am I doing this right? Or is there a better way to display a lot of data from a DB without consuming lots of memory/time?
Related
My app will build an item list and grab the necessary data (e.g. prices, customer item codes) from an Excel file.
This reference Excel file has 650 rows and 7 columns.
The app will read 10-12 items in one run.
Would it be wiser to read the items line by line?
Or should I first read all line items in the Excel file into a list/array and search from there?
Thank you
It's good to start by designing the classes that best represent the data regardless of where it comes from. Pretend that there is no Excel, SQL, etc.
If your data is always going to be relatively small (650 rows), then I would just read the whole thing into whatever data structure you create (your own classes). Then you can query those for whatever data you want, like:
var itemsIWant = allMyData.Where(item => item.Value == "something");
The reason is that it enables you to separate the query (selecting individual items) from the storage (whatever file or source the data comes from.) If you replace Excel with something else you won't have to rewrite other code. If you read it line by line then the code that selects items based on criteria is mingled with your Excel-reading code.
Keeping things separate enables you to more easily test parts of your code in isolation. You can confirm that one component correctly reads what's in Excel and converts it to your data. You can confirm that another component correctly executes a query to return the data you want (and it doesn't care where that data came from.)
With regard to optimization - you're going to be opening the file from disk and no matter what you'll have to read every row. That's where all the overhead is. Whether you read the whole thing at once and then query or check each row one at a time won't be a significant factor.
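For illustration, here is a minimal sketch of that separation, assuming a hypothetical Item class and an IItemSource abstraction; the Excel-reading details are stubbed out, since any reader (OleDb, a library such as EPPlus, etc.) would do:

using System.Collections.Generic;
using System.Linq;

// Hypothetical data class and source abstraction; all names here are illustrative.
public class Item
{
    public string Code { get; set; }
    public string CustomerItemCode { get; set; }
    public decimal Price { get; set; }
}

public interface IItemSource
{
    List<Item> LoadAll();
}

public class ExcelItemSource : IItemSource
{
    private readonly string _path;
    public ExcelItemSource(string path) { _path = path; }

    public List<Item> LoadAll()
    {
        var items = new List<Item>();
        // Read the ~650 rows from _path with whatever Excel API you prefer
        // (OleDb, EPPlus, ...) and map each row to an Item.
        return items;
    }
}

public class PricingService
{
    private readonly List<Item> _allItems;

    // Load once; after that every lookup is an in-memory LINQ query.
    public PricingService(IItemSource source)
    {
        _allItems = source.LoadAll();
    }

    public Item FindByCode(string code)
    {
        return _allItems.FirstOrDefault(item => item.Code == code);
    }
}

Swapping Excel for something else later only means writing another IItemSource; the querying code stays untouched.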
I'm new to n-tier enterprise development. I just got quite a tutorial reading through the 'questions that may already have your answer', but didn't find what I was looking for.

I'm building a genealogy site that starts off with the first guy who came over on the boat. You click on his name and the grid gets populated with all his children; then you click on one of his kids who has kids and the grid gets populated with that person's kids, and so forth. Each record has an ID and a ParentID. When you choose any given person, the ID is stored and then used in a search for all records that match the ParentID, which returns all the kids.

The data is never changed (at least by the user), so I want to do just one database access, fill all the records into one DataTable, and then requery it each time to get the records to display. In the DAL I put all the records into a List, and in the ObjectDataSource the function that fills the GridView just returns the List of all entries. What I want to do is requery the DataTable, fill the list back up with the new query, and display it in the GridView. My code is in 3 files here
(I can't get the backticks to show my code in this window.) All I need is to figure out how to run a new query against the existing DataTable and copy the result to a new DataTable. Hope this explains it well enough.
[edit: It would be easier to just run a new query against the database each time, and (if the database gets too large in the future) that would be less resource intensive than keeping everything in memory, but I just want to know if I can do it this way, i.e. work from one copy of the entire table.] Any ideas...
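For reference, a minimal sketch of requerying an in-memory DataTable with the standard ADO.NET APIs (the column names are assumptions) would look something like this:

using System.Data;

public static class FamilyQueries
{
    // Given the DataTable filled once from the database (assumed columns
    // "ID", "ParentID", "Name"), return the children of one person.
    public static DataTable GetChildren(DataTable allPeople, int parentId)
    {
        // Select + CopyToDataTable needs a reference to System.Data.DataSetExtensions.
        DataRow[] kids = allPeople.Select("ParentID = " + parentId);
        return kids.Length > 0 ? kids.CopyToDataTable() : allPeople.Clone();

        // Equivalent alternative: filter through a DataView.
        // var view = new DataView(allPeople) { RowFilter = "ParentID = " + parentId };
        // return view.ToTable();
    }
}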
Your data represents a tree structure by nature.
A grid to display it may not be my first choice...
Querying all the data in one query can be done with a complex stored procedure.
But you are already considering performance. That's always a good thing to keep in mind when coming up with a design. Still, creating something, improving it, and only then starting to optimize seems a better way to go.
Since relational databases are not really good at hierarchical data, consider a NoSQL (graph) database. As you mentioned, there are almost no writes to the DB, and that is where NoSQL shines.
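If you stay with the relational, all-in-memory approach for now, one cheap way to exploit that tree structure is to index the flat rows by ParentID once after the single database read; a minimal sketch with illustrative class and property names:

using System.Collections.Generic;
using System.Linq;

// Illustrative class; use whatever your DAL already returns.
public class Person
{
    public int Id { get; set; }
    public int? ParentId { get; set; }
    public string Name { get; set; }
}

public class PersonIndex
{
    private readonly ILookup<int?, Person> _byParent;

    // Build the parent -> children index once, right after the single database read.
    public PersonIndex(IEnumerable<Person> allPeople)
    {
        _byParent = allPeople.ToLookup(p => p.ParentId);
    }

    // Children of the clicked person; an empty sequence if there are none.
    public IEnumerable<Person> ChildrenOf(int parentId)
    {
        return _byParent[parentId];
    }
}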
Here's a problem I'm experiencing (simplified example):
Let's say I have several tables:
One customer can have many products, and a product can have multiple features.
On my ASP.NET front end I have a grid with customer info, something like this:
Name Address
John 222 1st st
Mark 111 2nd st
What I need is the ability to filter customers by feature. So I have a dropdown list of available features that are connected to a customer.
What I currently do:
1. I return a DataTable of customers from a stored procedure and store it in ViewState.
2. I return a DataTable of features connected to customers from a stored procedure and store it in ViewState.
3. When a filter is selected, I run the stored procedure again with the new feature_id filter, where I do the joins again to show only customers that have the selected feature.
My problem: it is very slow.
I think that possible solutions would be:
1. On page load, return ALL the data in one ViewState variable - so basically three lists of nested objects. This will make my page load slow.
2. Perform async loading in some smart way. How?
Any better solutions?
Edit:
this is a simplified example; I also need to filter customers by a property that is connected to the Customer table through 6 tables.
The way I deal with these scenarios is by passing XML to SQL and then running a join against it. So the XML would look something like:
<Features><Feat Id="2" /><Feat Id="5" /><Feat Id="8" /></Features>
Then you can pass that XML into SQL (depending on which version of SQL Server you have there are different ways), but in the newer versions it's a lot easier than it used to be:
http://www.codeproject.com/Articles/20847/Passing-Arrays-in-SQL-Parameters-using-XML-Data-Ty
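A minimal sketch of that approach might look like this; the table and column names are assumptions, and the join happens entirely on the server via the xml type's nodes()/value() methods:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Xml.Linq;

public static class CustomerFeatureFilter
{
    // Builds <Features><Feat Id="2" /><Feat Id="5" /></Features> from the selected ids.
    public static string BuildFeatureXml(IEnumerable<int> featureIds)
    {
        return new XElement("Features",
            featureIds.Select(id => new XElement("Feat", new XAttribute("Id", id)))).ToString();
    }

    // Assumed schema: Customer(Id, Name, Address), Product(Id, CustomerId),
    // ProductFeature(ProductId, FeatureId). Only matching customers come back.
    public static DataTable GetCustomersByFeatures(string connectionString, string featureXml)
    {
        const string sql = @"
            SELECT DISTINCT c.Name, c.Address
            FROM (SELECT f.x.value('@Id', 'int') AS FeatureId
                  FROM @Features.nodes('/Features/Feat') AS f(x)) AS sel
            JOIN ProductFeature pf ON pf.FeatureId = sel.FeatureId
            JOIN Product p ON p.Id = pf.ProductId
            JOIN Customer c ON c.Id = p.CustomerId;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@Features", SqlDbType.Xml).Value = featureXml;
            var result = new DataTable();
            new SqlDataAdapter(cmd).Fill(result);
            return result;
        }
    }
}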
Also, don't put any of that in ViewState; there's really no reason for that.
Storing an entire list of customers in ViewState is going to be hideously slow; storing all the information for all customers in ViewState is going to be worse, unless your entire customer base is very, very small - say about 30 records.
For a start, why are you loading all the customers into ViewState? If you have any significant number of customers, load the data a page at a time. That will at least reduce the amount of data flowing over the wire and might speed up your stored procedure as well.
In your position, I would focus on optimizing the data retrieval first (including minimizing the amount you return), and then worry about faster ways to store and display it. If you're up against unusual constraints that prevent this (very slow database, no profiling tools, not allowed to change stored procedures), then please let us know.
Solution 1: Include whatever criteria you need to filter on in your query, and only return and render the requested records. There is no need to use ViewState.
Solution 2: Retrieve some reasonable page limit of customers and filter in the browser with JavaScript. Allow easy navigation to the next page.
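A minimal sketch of solution 1 combined with server-side paging, assuming SQL Server 2012+ for OFFSET/FETCH (on older versions ROW_NUMBER() does the same job) and illustrative table/column names:

using System;
using System.Data;
using System.Data.SqlClient;

public static class CustomerPaging
{
    // Returns one page of customers, optionally filtered by feature, straight from the database.
    public static DataTable GetCustomerPage(string connectionString, int? featureId,
                                            int pageIndex, int pageSize)
    {
        const string sql = @"
            SELECT c.Name, c.Address
            FROM Customer c
            WHERE @FeatureId IS NULL
               OR EXISTS (SELECT 1
                          FROM Product p
                          JOIN ProductFeature pf ON pf.ProductId = p.Id
                          WHERE p.CustomerId = c.Id AND pf.FeatureId = @FeatureId)
            ORDER BY c.Name
            OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.Add("@FeatureId", SqlDbType.Int).Value = (object)featureId ?? DBNull.Value;
            cmd.Parameters.Add("@Offset", SqlDbType.Int).Value = pageIndex * pageSize;
            cmd.Parameters.Add("@PageSize", SqlDbType.Int).Value = pageSize;
            var page = new DataTable();
            new SqlDataAdapter(cmd).Fill(page);
            return page;
        }
    }
}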
Hello, I am developing a database web application and I have many reports to populate. I just want to know which of the following methods will give me fast and accurate results, as the data is going to be in the thousands.
Populating a DataSet?
A DataReader?
An ArrayList?
I am using a 3-tier architecture, so if I am writing a function in the data access layer, what would be the most appropriate return type for it?
You can use the "push" method to set the data with a DataSet - this gives you the advantage of setting the data source for the main report and all subreports in one call to the database. However, there are some limitations; for example, you will not be able to use subreports in the details section.
I am not sure you can use a DataReader or an ArrayList as a data source. Even if you can, I cannot see any advantage. Using a DataReader means that you keep your connection to the database open while the report is rendered (the first pass). This may take some time and is not necessary. An ArrayList (if it can be used at all) only lets you set the data for one table - it is a flat structure with no relations. In most cases you will probably load the ArrayList from the database anyway, so it makes little sense to get the data, load it into an array, and use the array to set one table when you can use a DataSet.
Why are you ignoring the regular "pull" method? It would be simpler.
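For the data access layer return type in that push scenario, a DataSet filled in one round trip is the natural fit; a minimal sketch, where the stored procedure name, its two result sets, and the table names are assumptions:

using System.Data;
using System.Data.SqlClient;

public static class ReportDataAccess
{
    // Fills one DataSet with the main report data and the subreport data in a
    // single round trip (the stored procedure returns two result sets).
    public static DataSet GetSalesReportData(string connectionString, int year)
    {
        var ds = new DataSet();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.GetSalesReportData", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@Year", year);

            var adapter = new SqlDataAdapter(cmd);
            adapter.TableMappings.Add("Table", "SalesSummary");   // first result set
            adapter.TableMappings.Add("Table1", "SalesDetails");  // second result set
            adapter.Fill(ds);
        }
        return ds;
    }
}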
While I understand this question is fairly vague, since I'm not giving you as much detail as I'd like to, I'm hoping for some general improvements that can be made to my generation code or to the reports themselves to speed them up. I've asked for more hardware, but have been denied.
public Stream GenerateReport(string reportName, string format)
{
    if (reportName == null)
        throw new ArgumentNullException("reportName");

    // Load the report definition into the current execution session.
    reportExecutionService.LoadReport(reportName, null);

    string extension;
    string encoding;
    string mimeType;
    ReportExecution2005.Warning[] warnings;
    string[] streamIDs;

    // Render the whole report in the requested format (PDF, Excel, ...).
    byte[] results = reportExecutionService.Render(format, null, out extension,
        out encoding, out mimeType, out warnings, out streamIDs);

    return new MemoryStream(results);
}
The reports themselves are taking 6-10 seconds each. I've narrowed the bottleneck down to Reporting Services itself. Where should I start looking to remove potential speed bottlenecks? Note: some code has been removed to protect the innocent.
Although not directly related to the code you posted, here are a couple of generic enhancements you should always consider when writing reports in Reporting Services:
Pre-load report tables so that they already aggregate any data that would have been aggregated in the report. For instance, if the report data source summarizes thousands of rows of data and requires joining multiple tables together, then you should create a pre-aggregated table that joins all the data together and already summarizes the data at the required grain for the report.
If you are passing parameters into the data source, then the aggregated underlying table should have a clustered index that corresponds with how the table will be searched. For instance, if the report only displays data for an individual customer and for a given transaction date range, then the clustered index should be ordered on the customer and transaction date.
Filtering data should occur in the data source query and not in the report itself. Meaning, if you parameterize your report so that it filters data, then the parameters should be passed to the database so that it returns a smaller set of data. Do not return a large set of data and then filter it. It is easy to make this mistake when using a multi-valued parameter, since the out-of-the-box instructions for multi-value parameters are to filter the data AFTER it has been returned to Reporting Services.
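On the client side shown earlier, passing those filter values through to the server before rendering is done with SetExecutionParameters on the ReportExecution2005 endpoint; a fragment that would slot into the posted GenerateReport method right after LoadReport, where the parameter names and values are assumptions:

// Sketch only: push the report parameters to the server so the data source
// query filters the data, then call Render exactly as in the posted code.
// "CustomerId", "StartDate" and "EndDate" are hypothetical parameter names.
var parameters = new ReportExecution2005.ParameterValue[]
{
    new ReportExecution2005.ParameterValue { Name = "CustomerId", Value = customerId.ToString() },
    new ReportExecution2005.ParameterValue { Name = "StartDate",  Value = startDate.ToString("yyyy-MM-dd") },
    new ReportExecution2005.ParameterValue { Name = "EndDate",    Value = endDate.ToString("yyyy-MM-dd") }
};
reportExecutionService.SetExecutionParameters(parameters, "en-us");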
I hope you are already doing the above and that this is not a relevant post. :)
If you've narrowed it down to Reporting Services solely based on your client code, I would review the queries / SPs that retrieve your data. I've encountered some pretty nasty queries in my day that looked fairly innocent.
I just did a couple of really nasty reports! I had to join on shady criteria across multiple tables containing a few million rows each.
I ended up creating a console application that does the collection every night for the previous day. It had just become too heavy with all the logic, so generating the report simply took too long. Now speed is not an issue anymore.
It depends on the type of report; these three reports only needed yesterday's figures. But as Austin says, the queries (or whatever feeds the report) are usually the bottleneck.
Another thing to remember is that the report "expires" after a while (this is the default setting). So if you haven't used the report for a while, it takes a bit longer to generate; if you run it again straight afterwards, the next one is faster. A workaround is to go to the report in Internet Explorer, click Properties, and have a look at Execution and History (they can be tweaked to improve report rendering). Be careful though: if the data is critical you could end up with old data.