I've got a collection of 50k records; I fetch it in one second, but loading it into the DataGridView takes about 10 seconds.
How can I speed up loading the data?
All I do right now is:
dgvCars.DataSource=cars;
Data-binding 50k rows is going to take a while. I would first look at reducing the data volume (what is any user really going to do with 50k rows?). But otherwise: "virtual mode" (what | how).
Edit: I suspect most of the time is being spent doing things like building control trees and other structures, but it might be that the reflection-based member access is slowing this down; if so, maybe HyperDescriptor could help (simply by adding a one-line call in your code to enable it for the associated type).
It's typically more work, but you might look into asynchronous querying: as soon as you get a subset batch of data back, display it in the grid; then, as background results continue to come in, add them to the underlying table being displayed in the grid on an as-needed basis. Don't worry about pulling the full 50k records down.
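To make the virtual-mode suggestion concrete, here is a minimal sketch: instead of data-binding the whole list, the grid asks only for the cells it actually draws, so cost no longer scales with row count. The `Car` type with `Make`/`Model` properties is an assumption for illustration.

```csharp
// Virtual mode: don't set DataSource at all. Tell the grid how many
// rows exist and hand it cell values on demand.
dgvCars.VirtualMode = true;
dgvCars.ColumnCount = 2;
dgvCars.RowCount = cars.Count;   // instant, no per-row binding work

dgvCars.CellValueNeeded += (sender, e) =>
{
    // Called only for cells that are currently visible.
    var car = cars[e.RowIndex];
    e.Value = e.ColumnIndex == 0 ? car.Make : car.Model;
};
```

The grid raises CellValueNeeded lazily as the user scrolls, so the initial display is near-instant regardless of collection size.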
I have a database in SQL Server 2012 and want to update a table in it.
My table has three columns; the first column is of type nchar(24). It is filled with billions of rows. The other two columns are of the same type, but they are null (empty) at the moment.
I need to read the data from the first column; with this information I do some calculations. The results of my calculations are two strings, and these two strings are the data I want to insert into the two empty columns.
My question is what is the fastest way to read the information from the first column of the table and update the second and third column.
Read and update step by step? Read a few rows, do the calculation, update the rows while reading the next few rows?
As it comes to billions of rows, performance is the only important thing here.
Let me know if you need any more information!
EDIT 1:
My calculation can't be expressed in SQL.
As the SQL Server is on the local machine, throughput is nothing we have to worry about. One calculation takes about 0.02154 seconds; I have a total of 2,809,475,760 rows, which is about 280 GB of data.
Normally, DML is best performed in bigger batches. Depending on your indexing structure, a small batch size (maybe 1000?!) can already deliver the best results, or you might need bigger batch sizes (up to the point where you write all rows of the table in one statement).
Bulk updates can be performed by bulk-inserting information about the updates you want to make, and then updating all rows in the batch in one statement. Alternative strategies exist.
As you can't hold all rows to be updated in memory at the same time, you probably need to look into MARS to be able to perform streaming reads while writing occasionally on the same connection. Alternatively, you can do it with two connections. Be careful not to deadlock across connections: SQL Server cannot detect that in principle, so only a timeout will resolve such a (distributed) deadlock. Making the reader run under snapshot isolation is a good strategy here; snapshot isolation causes the reader to neither block nor be blocked.
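A minimal sketch of the bulk-update strategy described above: bulk-insert the computed values into a staging table, then apply them to the target in one set-based statement. All table and column names here are assumptions for illustration; `dataTable` is assumed to hold one batch of (Key, Col2, Col3) rows.

```csharp
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1) Stage the computed values for this batch.
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Staging" })
    {
        bulk.WriteToServer(dataTable);   // fast, minimally logged insert
    }

    // 2) Apply the whole batch in one set-based statement, then clear the stage.
    var update = new SqlCommand(@"
        UPDATE t SET t.Col2 = s.Col2, t.Col3 = s.Col3
        FROM dbo.Target t
        JOIN dbo.Staging s ON s.[Key] = t.[Key];
        TRUNCATE TABLE dbo.Staging;", conn);
    update.ExecuteNonQuery();
}
```

One joined UPDATE per batch is usually far cheaper than one UPDATE statement per row, because the per-statement overhead and log flushes are amortized over the whole batch.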
Linq is pretty efficient in my experience. I wouldn't worry too much about optimizing your code yet; in fact, prematurely optimizing is exactly the thing you should avoid. Just get it to work first, then refactor as needed. As a side note, I once tested a stored procedure against a Linq query, and Linq won (to my amazement).
There is no simple "how" and no one-size-fits-all solution here.
If there are billions of rows, does performance matter? It doesn't seem to me that it has to be done within a second.
What is the expected throughput of the database and network? If you're behind a POTS dial-in link, the case is massively different from being on 10 Gb fiber.
The computations? How expensive are they? Just c = a + b, or heavy processing of other text files?
Just a couple of questions raised in response. As such, there is a lot involved here that we are not aware of, so it's hard to answer correctly.
Try a couple of things and measure it.
As a general rule: Writing to a database can be improved by batching instead of single updates.
Using an async pattern can free up some of the time for calculations instead of waiting.
EDIT in reply to comment
If calculations take 20 ms, the biggest problem is IO; multithreading won't bring you much.
Read the records in sequence using snapshot isolation, so the read is not hampered by write locks, and update in batches. My guess is that the reader stays ahead of the writer without much trouble; reading in batches adds complexity without gaining much.
Find the sweet spot for the right batch size by experimenting.
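A sketch of that loop, assuming snapshot isolation is enabled on the database (ALLOW_SNAPSHOT_ISOLATION ON). `Compute` stands in for the 20 ms calculation and `FlushBatch` for whatever batched write strategy you choose (e.g. the bulk-insert-plus-joined-update idea mentioned earlier); both are hypothetical names, as are the table and column names.

```csharp
// Two connections: one streams the keys under snapshot isolation,
// the other applies the updates in batches, so neither blocks the other.
using var readConn  = new SqlConnection(connectionString);
using var writeConn = new SqlConnection(connectionString);
readConn.Open();
writeConn.Open();

new SqlCommand("SET TRANSACTION ISOLATION LEVEL SNAPSHOT;", readConn).ExecuteNonQuery();

var batch = new List<(string Key, string Col2, string Col3)>();
using var reader = new SqlCommand("SELECT [Key] FROM dbo.Target;", readConn)
    .ExecuteReader();
while (reader.Read())
{
    string key = reader.GetString(0);
    var (col2, col3) = Compute(key);   // the calculation that can't be done in SQL
    batch.Add((key, col2, col3));

    if (batch.Count >= batchSize)      // tune batchSize by experiment
    {
        FlushBatch(writeConn, batch);  // one set-based write per batch
        batch.Clear();
    }
}
if (batch.Count > 0) FlushBatch(writeConn, batch);
```

The reader never touches the write connection, so the streaming read keeps going while each batch is flushed.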
I am accessing a UniVerse database and reading out all the records in it, for the purpose of synchronizing it to a MySQL database that is used for compatibility with some other applications which use the data. Some of the tables are >250,000 records long with >100 columns, and the server is rather old and still used by many simultaneous users, so it can take a very ... long ... time to read the records.
Example: I execute SSELECT <file> TO 0 and begin reading through the select list, parsing each record into our data abstraction type and putting it in a .NET List. Depending on the moment, fetching each record can take between 250 ms and 3/4 of a second, depending on database usage. Removing the extraction methods only speeds it up marginally, since I think it still downloads all of the record information anyway when I call UniFile.read, even if I don't use it.
Reading 250,000 records at this speed is prohibitively slow, so does anyone know a way I can speed this up? Is there some option I should be setting somewhere?
Do you really need to use SSELECT (sorted select)? The sorting on record key will create an additional performance overhead. If you do not need to synchronise in a sorted manner just use a plain SELECT and this should improve the performance.
If this doesn't help then try to automate the synchronisation to run at a time of low system usage, when either few or no users are logged onto the UniVerse system, if at all possible.
Other than that it could be that some of the tables you are exporting are in need of a resize. If they are not dynamic files (automatic-resizing - type 30), they may have gone into overflow space on disk.
To find out the size of your biggest tables and to see if they have gone into overflow you can use commands such as FILE.STAT and HASH.HELP at the command line to retrieve more information. Use HELP FILE.STAT or HELP HASH.HELP to look at the documentation for these commands, in order to extract the information that you need.
If these commands show that your files are of type 30, then they are automatically resized by the database engine. If however the file types are anything from type 2 to 18, the HASH.HELP command may recommend changes you can make to the table size to increase its performance.
If none of this helps then you could check for useful indexes on the tables using LIST.INDEX TABLENAME ALL, which you could maybe use to speed up the selection.
Ensure your files are sized correctly using ANALYZE-FILE fileName. If not dynamic ensure there is not too much overflow.
Using SELECT instead of SSELECT will mean you are reading data from the database sequentially rather than randomly, and it will be significantly faster.
You should also investigate how you are extracting the data from each record and putting it into a list. Usually the Pick data separators, chars 254, 253 and 252, will not be compatible with the external database and need to be converted. How this is done can make an enormous difference to the performance.
It is not clear from the initial post, however a WRITESEQ would probably be the most efficient way to output the file data.
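On the separator-conversion point raised above, a minimal sketch of mapping the three Pick marks to flat delimiters in one pass; the replacement characters (tab, semicolon, comma) are purely illustrative assumptions and would need to match whatever the MySQL import expects.

```csharp
static class PickConvert
{
    const char FieldMark    = (char)254; // attribute (field) separator
    const char ValueMark    = (char)253; // multi-value separator
    const char SubValueMark = (char)252; // sub-value separator

    // Character-for-character replacement on the raw record is usually far
    // cheaper than splitting into arrays and re-joining for a large export.
    public static string ToExternal(string record) =>
        record.Replace(FieldMark, '\t')
              .Replace(ValueMark, ';')
              .Replace(SubValueMark, ',');
}
```

Doing this once per record as a string replace, rather than building intermediate collections per attribute, keeps allocations down when you're pushing 250,000+ records through.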
I have a C# List that is filled from a database. So far it's only 1400 records, but I expect it to grow a lot. Routinely I check the entire list for new data. What I'm trying to figure out is this: is it faster to simply clear the List and reload all the data from the table, or would checking each record be faster?
Intuition tells me that the dump-and-reload method would be faster, but I thought I should check first...
If my understanding is correct, you would have to load the list from MySql anyway and verify that the in-memory list is up to date, correct? So then the only issue you refer to is the in-memory management of the list.
Well, typically I would try to profile the different behaviours first and see which performs better.
But as you state, I would think that a clear-and-recreate should be faster than a systematic check and update.
You should dump and reload, definitely. I base this advice purely on my (perhaps unwarranted) fear of your code that checks for new data.
Depends how slow the load is. I'd say accessing something in memory is always going to be faster than loading from a DB. However, you need to do your own measurements.
Remember don't optimise without numbers to back you up.
If the database is not getting hit several times an hour, then definitely reload the data from the database. Offload as much work as possible to the database, as it is designed to take full advantage of system resources.
Add a column called 'InsertedTime' to your database table. At intervals, fetch the rows with InsertedTime > ListCreatedTime (a variable in your app).
This assumes that your db rows are not deleted.
If you want to handle db changes like updates and deletions, you are better off setting flags in your db; then, once you've refreshed your list, you can clear those flags from your app.
But anyway, profile your dump-and-load method and compare it with this to know which is faster in your scenario.
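A minimal sketch of the InsertedTime check, shown with a MySQL ADO.NET provider since the list comes from MySql; table, column, and type names (`Items`, `Item`) are assumptions for illustration.

```csharp
// Incremental sync: only fetch rows inserted since the list was built,
// instead of clearing and reloading everything.
using var conn = new MySqlConnection(connectionString);
conn.Open();

var cmd = new MySqlCommand(
    "SELECT Id, Name FROM Items WHERE InsertedTime > @since;", conn);
cmd.Parameters.AddWithValue("@since", listCreatedTime);

using var reader = cmd.ExecuteReader();
while (reader.Read())
    items.Add(new Item { Id = reader.GetInt32(0), Name = reader.GetString(1) });

listCreatedTime = DateTime.UtcNow;   // advance the watermark for the next poll
```

The cost per poll is then proportional to the number of new rows, not to the size of the list, which matters once it grows well past 1400 records.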
I am working with a rather large mysql database (several million rows) with a column storing blob images. The application attempts to grab a subset of the images and runs some processing algorithms on them. The problem I'm running into is that, due to the rather large dataset that I have, the dataset that my query is returning is too large to store in memory.
For the time being, I have changed the query to not return the images. While iterating over the resultset, I run another select which grabs the individual image that relates to the current record. This works, but the tens of thousands of extra queries have resulted in a performance decrease that is unacceptable.
My next idea is to limit the original query to 10,000 results or so, and then keep querying over spans of 10,000 rows. This seems like a middle-of-the-road compromise between the two approaches. I feel that there is probably a better solution that I am not aware of. Is there another way to only have portions of a gigantic resultset in memory at a time?
Cheers,
Dave McClelland
One option is to use a DataReader. It streams the data, but it's at the expense of keeping an open connection to the database. If you're iterating over several million rows and performing processing for each one, that may not be desirable.
I think you're heading down the right path of grabbing the data in chunks, probably using MySQL's LIMIT clause, correct?
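A minimal sketch of the DataReader approach mentioned above: the connection stays open for the duration, but only the current row (and its image blob) is in memory at a time. `ProcessImage` and the table layout are assumptions for illustration.

```csharp
using var conn = new MySqlConnection(connectionString);
conn.Open();

var cmd = new MySqlCommand("SELECT Id, ImageBlob FROM Images;", conn);
using var reader = cmd.ExecuteReader();   // streams rows; connection held open
while (reader.Read())
{
    long id = reader.GetInt64(0);
    byte[] image = (byte[])reader["ImageBlob"];
    ProcessImage(id, image);              // work on one row, then let it go
}
```

Each blob becomes eligible for garbage collection as soon as the loop iteration ends, so the working set stays roughly one image in size rather than the whole result set.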
When dealing with such large datasets it is important not to need to have it all in memory at once. If you are writing the result out to disk or to a webpage, do that as you read in each row. Don't wait until you've read all rows before you start writing.
You could also have set the images to DelayLoad = true so that they are only fetched when you need them, rather than implementing this functionality yourself. See here for more info.
I see 2 options.
1) If this is a Windows app (as opposed to a web app), you can read each image using a data reader and dump the file to a temp folder on disk; then you can do whatever processing you need against the physical file.
2) Read and process the data in small chunks. 10k rows can still be a lot, depending on how large the images are and how much processing you want to do. Returning 5k rows at a time, and reading more in a separate thread when you are down to 1k remaining to process, can make for a seamless process.
Also while not always recommended, forcing garbage collection before processing the next set of rows can help to free up memory.
I've used a solution like one outlined in this tutorial before:
http://www.asp.net/(S(pdfrohu0ajmwt445fanvj2r3))/learn/data-access/tutorial-25-cs.aspx
You could use multi-threading to pre-pull a portion of the next few datasets: at first, pull rows 1-10,000, and in the background pull 10,001-20,000 and 20,001-30,000; delete the earlier pages to conserve memory if that is an issue (say, if you are at 50,000-60,000, delete rows 1-10,000). Use the user's current "page" position as a pointer to pull the next range of data or to delete out-of-range data.
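A sketch of the chunked retrieval the answers above converge on, using key-range ("keyset") paging rather than an offset, so later chunks don't get progressively slower to fetch; table and column names, `ProcessImage`, and an open `conn` are assumptions for illustration.

```csharp
// Walk the table in fixed-size chunks keyed on the primary key.
long lastId = 0;
const int ChunkSize = 10_000;

while (true)
{
    var cmd = new MySqlCommand(
        $"SELECT Id, ImageBlob FROM Images WHERE Id > @last ORDER BY Id LIMIT {ChunkSize};",
        conn);
    cmd.Parameters.AddWithValue("@last", lastId);

    int rows = 0;
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            lastId = reader.GetInt64(0);          // remember where this chunk ended
            ProcessImage(lastId, (byte[])reader["ImageBlob"]);
            rows++;
        }
    }
    if (rows < ChunkSize) break;   // short chunk means we've reached the end
}
```

Because each query seeks directly to `Id > lastId` on the primary key index, chunk 500 costs the same as chunk 1, whereas `LIMIT ... OFFSET ...` has to skip all preceding rows every time.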
I want to show a large amount of data in a dataset: approx 100,000 records with 10 columns, which consumes a large amount of RAM (700 MB). I have also tried paging, which reduces this by about 15-20%, but I don't really like the Previous/Next buttons involved in paging. I'm not writing the data to disk at present; should I be? If so, what is the most common method? The data isn't stored forever, just while it is being viewed; then a new query may be run and another 70,000 records could be viewed. What is the best way to proceed?
Thanks for the advice.
The reality is that the end-user rarely needs to see the totality of their dataset, so I would use whichever method you like for presenting the data (listview) and build a custom pager so that the dataset is only fed the number of records desired. Otherwise, each page load would result in re-querying for the dataset.
The XML method to a temp file or utilizing a temp table created through a stored proc are alternatives but you still must sift and present the data.
An important question is where this data comes from. That will help determine what options are available to you. Writing to disk would work, but it probably isn't the best choice, for three reasons:
As a user, I'd be pretty annoyed if your app suddenly chewed up 700 MB of disk space with no warning at all. But then, I'd notice such things; I suppose a lot of users wouldn't. Still: it's a lot of space.
Depending on the source of the data, even the initial transfer could take longer than you really want to allow.
Again, as a user, there's NO WAY I'm manually digging through 700 MB worth of data. That means you almost certainly never need to show all of it. You want to load only the requested page, one (or a couple of) pages at a time.
I would suggest memory-mapped files... not sure if .NET includes support for them yet.
That is a lot of data to be working with and keeping around in memory.
Is this an ASP.NET app? Or a Windows app?
I personally have found that a custom pager setup (to control the next/previous links) with paging at the database level is the only possible way to get the best performance: only get the data needed...
Implement paging in SQL if you want to reduce the memory footprint.
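A minimal sketch of paging at the database level, assuming SQL Server 2012+ (OFFSET/FETCH syntax); table and column names are illustrative. Only one page of rows ever exists in memory, regardless of how many records the query matches.

```csharp
const int PageSize = 100;

// Fetch exactly one page of results; the 100k-row set never leaves the server.
DataTable LoadPage(SqlConnection conn, int pageIndex)
{
    var cmd = new SqlCommand(@"
        SELECT *
        FROM dbo.Results
        ORDER BY Id
        OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY;", conn);
    cmd.Parameters.AddWithValue("@skip", pageIndex * PageSize);
    cmd.Parameters.AddWithValue("@take", PageSize);

    var page = new DataTable();
    new SqlDataAdapter(cmd).Fill(page);   // one page in memory, not 700 MB
    return page;
}
```

This pairs naturally with the custom-pager suggestion above: the pager just tracks `pageIndex` and calls LoadPage as the user navigates.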