I have a page that calls a stored procedure which reads 3 to 4 million rows, performs some calculations, and returns a small data table. The SP is slow (approx. 20 to 30 seconds), so the overall page load is slow.
Should I refactor the SP? The problem is that whatever I do, the end result will still be the small DataTable.
Is there any suggestion to improve the performance?
If the table is not updated very often, build an "aggregated" table that holds the precomputed results; querying that small table will perform much better.
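To make the "aggregated table" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a runnable stand-in for SQL Server, and the `sales` and `sales_summary` tables and their columns are hypothetical names, not anything from the question.

```python
import sqlite3

def build_summary(conn):
    """Rebuild the small pre-aggregated table from the large detail table.

    Run this on a schedule or after each data load, not on every page hit.
    """
    conn.executescript("""
        DROP TABLE IF EXISTS sales_summary;
        CREATE TABLE sales_summary AS
            SELECT region, COUNT(*) AS row_count, SUM(amount) AS total_amount
            FROM sales
            GROUP BY region;
    """)

def read_summary(conn):
    """The page reads only the small summary table."""
    return conn.execute(
        "SELECT region, row_count, total_amount FROM sales_summary ORDER BY region"
    ).fetchall()

# Hypothetical detail data standing in for the multi-million-row table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)
build_summary(conn)
print(read_summary(conn))
```

The trade-off is staleness: the summary is only as fresh as its last rebuild, which is why this suits data that changes rarely.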
If you're churning through 3 or 4 million rows of data and actually doing real work, 20 or 30 seconds is pretty decent performance, IMHO. Check the execution plan of your stored procedure. In the ideal world every table would be getting hit with index seeks rather than table scans. Consult with your DBA if you're not sure how to interpret the showplan results. I assume you're using SQL Server.
Check to make sure your tables have appropriate indices and that the statistics are up to date. Update them if not. Recompile the stored procedure. Parameters passed to the stored procedure can foul up the cached execution plan if they are oddball values (this is commonly called parameter sniffing). You can prevent this by coding your stored procedure like so:
create proc myProc
    @p1 varchar(32)
as
    declare
        @p1Local varchar(32)
    set @p1Local = @p1
    ...
If your data does not get updated too often, you can create an indexed view.
http://msdn.microsoft.com/en-us/library/dd171921(v=sql.100).aspx:
indexed views provide additional performance benefits that cannot be achieved using standard indexes. Indexed views can increase query performance in the following ways:
Aggregations can be precomputed and stored in the index to minimize expensive computations during query execution.
Tables can be prejoined and the resulting data set stored.
Combinations of joins or aggregations can be stored.
I am writing a stored procedure to insert rows into a table. The problem is that in some operations we might want to insert more than 1 million rows, and we want to make it fast. In addition, one of the columns is NVARCHAR(MAX), and we might put an average of 1000 characters in it.
First, I wrote a prc to insert row by row. I generated some random data for the insert, with the NVARCHAR(MAX) column holding a string of 1000 characters, and used a loop to call the prc for each row. The performance is very bad: it takes 48 minutes if I log on to the database server and run the insert there. If I use C# to connect to the server from my desktop (which is what we usually want to do), it takes more than 90 minutes.
Then, I changed the prc to take a table type parameter as the input. I prepared the rows somehow and put them in the table type parameter and do the insert by the following command:
INSERT INTO tableA SELECT * from @tableTypeParameterB
I tried batch sizes of 1000 rows and 3000 rows (putting 1000-3000 rows in @tableTypeParameterB to be inserted at one time). The performance is still bad. It takes about 3 minutes to insert 1 million rows if I run it on the SQL Server machine, and about 10 minutes if I use a C# program to connect from my desktop.
The tableA has a clustered index with 2 columns.
My target is to make the insert as fast as possible (my ideal target is within 1 minute). Is there any way to optimize it?
Just an update:
I tried the bulk copy insert that some people suggested below. I used SqlBulkCopy to insert 1,000 rows and 10,000 rows at a time. It still takes 10 minutes to insert 1 million rows (every row has a column with 1,000 characters), so there is no performance improvement. Are there any other suggestions?
An update, as requested in the comments:
The data actually comes from the UI. The user will use the UI to bulk select, say, one million rows and change one column from the old value to a new value. This operation is done in a separate procedure. What we need to do here is have the mid-tier service get the old and new values from the UI and insert them into the table. The old and new values may be up to 4000 characters long, with an average of 1000 characters. I think the long old/new value strings slow the insert down, because when I change the test data to 20-50 characters the insert is very fast, whether I use SqlBulkCopy or a table type variable.
I think what you are looking for is Bulk Insert if you prefer using SQL.
Or there is also the ADO.NET for Batch Operations option, so you keep the logic in your C# application. This article is also very complete.
Update
Yes I'm afraid bulk insert will only work with imported files (from within the database).
I have an experience in a Java project where we needed to insert millions of rows (data came from outside the application btw).
Database was Oracle, so of course we used the multi-line insert of Oracle. It turned out that the Java batch update was much faster than the multi-valued insert of Oracle (so called "bulk updates").
My suggestion is:
Compare the performance between the multi-value insert of SQL Server code (then you can read from inside your database, a procedure if you like) with the ADO.NET Batch Insert.
If the data you are going to manipulate is coming from outside your application (if it is not already in the database), I would say just go for the ADO.NET Batch Inserts. I think that is your case.
Note: Keep in mind that batch inserts usually operate with the same query. That is what makes them so fast.
Calling a prc in a loop incurs many round trips to SQL.
Not sure what batching approach you used, but you should look into table-valued parameters: Docs are here. You'll still want to write in batches.
You'll also want to consider memory on your server. Batching (say 10K at a time) might be a bit slower but might keep memory pressure lower on your server since you're buffering and processing a set at a time.
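As a rough illustration of the batching advice above (one round trip and one transaction per batch instead of per row), here is a sketch in Python. It uses the built-in sqlite3 module as a stand-in for SQL Server, so it shows the batching pattern only, not table-valued parameters themselves; the `audit_log` table and the helper names are hypothetical.

```python
import sqlite3
from itertools import islice

def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def insert_in_batches(conn, rows, batch_size=10_000):
    """Insert rows one batch at a time; returns the number of rows inserted."""
    total = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany(
                "INSERT INTO audit_log (old_value, new_value) VALUES (?, ?)",
                chunk,
            )
        total += len(chunk)
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (old_value TEXT, new_value TEXT)")
# A generator, so only one batch is held in memory at a time.
rows = (("old" * 10, "new" * 10) for _ in range(25_000))
inserted = insert_in_batches(conn, rows, batch_size=10_000)
count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(inserted, count)
```

The batch size is the knob to experiment with: larger batches mean fewer round trips, smaller ones mean less buffering on both client and server.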
Table-valued parameters provide an easy way to marshal multiple rows
of data from a client application to SQL Server without requiring
multiple round trips or special server-side logic for processing the
data. You can use table-valued parameters to encapsulate rows of data
in a client application and send the data to the server in a single
parameterized command. The incoming data rows are stored in a table
variable that can then be operated on by using Transact-SQL.
Another option is bulk insert. TVPs benefit from re-use however so it depends on your usage pattern. The first link has a note about comparing:
Using table-valued parameters is comparable to other ways of using
set-based variables; however, using table-valued parameters frequently
can be faster for large data sets. Compared to bulk operations that
have a greater startup cost than table-valued parameters, table-valued
parameters perform well for inserting less than 1000 rows.
Table-valued parameters that are reused benefit from temporary table
caching. This table caching enables better scalability than equivalent
BULK INSERT operations.
Another comparison here: Performance of bcp/BULK INSERT vs. Table-Valued Parameters
Here is an example of what I've used before with SqlBulkCopy. Granted, I was only dealing with around 10,000 records, but it inserted them within a few seconds of the query running. My field names were the same, so it was pretty easy. You might have to modify the DataTable field names. Hope this helps.
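// Requires: using System; using System.Configuration; using System.Data; using System.Data.SqlClient;
private void UpdateMemberRecords(Int32 memberId)
{
    // Parameterized query instead of building the SQL with string.Format
    const string sql = "select * from Member where mem_id > @memberId";

    // Load the source rows into memory (_sourceDb is the source connection)
    DataTable dt = new DataTable();
    using (SqlCommand cmd = new SqlCommand(sql, _sourceDb))
    using (SqlDataAdapter da = new SqlDataAdapter(cmd)) {
        cmd.Parameters.AddWithValue("@memberId", memberId);
        da.Fill(dt);
    }
    Console.WriteLine("Member Count: {0}", dt.Rows.Count);

    // The "DestDb" app setting holds the destination connection string.
    // AppSettings is an indexer in C#, so it takes square brackets.
    using (SqlBulkCopy sqlBulk = new SqlBulkCopy(ConfigurationManager.AppSettings["DestDb"], SqlBulkCopyOptions.KeepIdentity)) {
        sqlBulk.BulkCopyTimeout = 600;
        sqlBulk.DestinationTableName = "Member";
        sqlBulk.WriteToServer(dt);
    }
    // The original catch block only rethrew, so exceptions simply propagate.
}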
If you have SQL Server 2014, then the speed of In-Memory OLTP is amazing:
http://msdn.microsoft.com/en-au/library/dn133186.aspx
Depending on your end goal, it may be a good idea to look into Entity Framework (or similar). This abstracts out the SQL so that you don't really have to worry about it in your client application, which is how things should be.
Eventually, you could end up with something like this:
using (DatabaseContext db = new DatabaseContext())
{
    for (int i = 0; i < 1000000; i++)
    {
        db.Table.Add(new Row(){ /* column data goes here */});
    }
    db.SaveChanges();
}
The key part here (and it boils down to a lot of the other answers) is that Entity Framework handles building the actual insert statement and committing it to the database.
In the above code, nothing will actually be sent to the database until SaveChanges is called and then everything is sent.
I can't quite remember where I found it, but there is research around that suggests it is worth while to call SaveChanges every so often. From memory, I think every 1000 entries is a good choice for committing to the database. Committing every entry, compared to every 100 entries, doesn't provide much performance benefit and 10000 takes it past the limit. Don't take my word for that though, the numbers could be wrong. You seem to have a good grasp on the testing side of things though, so have a play around with things.
I am retrieving rows from a large table (1.8 GB, 20 million records) with a DataReader.
The SQL Server (2008 R2) consumes a lot of memory and (sometimes) doesn't survive this query.
It is probably holding the whole result in memory and returning the rows from this buffer to client.
The select is quite simple: it returns all rows from the table with a simple where condition, namely that the date stored in a column is earlier than the current date. There are no blobs or strings in the columns.
Am I right with my estimation about the cause of memory usage? And what can I do in this situation - I need all rows, the query doesn't have to be fast, but memory efficient.
Thanks
Updated info - select is in stored procedure. Code:
CREATE PROCEDURE [get_current_records]
with recompile
AS
BEGIN
    declare @currentDate datetime = getdate()
    SELECT
        [id]
        , name
        , description
        , number
        ,[valid_from]
        ,[valid_to]
    from ui_parcela
    where valid_from < @currentDate and (valid_to is null or valid_to > @currentDate )
END
It's important to know what you are doing with those rows.
Are you storing them all in memory, or can you use each row and then release it from memory?
I suggest you try using an async reader.
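To illustrate the "use each row and release it" point on the client side, here is a small sketch. It is written in Python with the built-in sqlite3 module as a stand-in, so it is not a DataReader and not asynchronous; `stream_rows` is a hypothetical helper and the cut-down `ui_parcela` table is only for the demo. The idea is simply that the client pulls a bounded chunk at a time and never holds the full result set.

```python
import sqlite3

def stream_rows(conn, sql, params=(), chunk_size=1000):
    """Yield rows one by one while fetching only `chunk_size` rows at a time."""
    cur = conn.execute(sql, params)
    while True:
        chunk = cur.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ui_parcela (id INTEGER PRIMARY KEY, number INTEGER)")
conn.executemany(
    "INSERT INTO ui_parcela (number) VALUES (?)",
    ((n,) for n in range(5000)),
)

# Process each row as it arrives instead of collecting them in a list.
total = 0
for _id, number in stream_rows(conn, "SELECT id, number FROM ui_parcela"):
    total += number
print(total)
```

This only addresses client memory. If the server itself is under memory pressure, the query plan and indexing still need to be checked separately.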
It sounds like you need to look at your SQL query if the server is running out of memory. Have you Indexed correctly?
Check the SQL execution plan to see what is expensive.
I am using a simple table with 6 columns, 3 of which are of XML type, not schema-constrained.
When the table reaches a size around 120,000 or 150,000 rows, I see a dramatic performance cost in doing any query in the table. For comparison, I have another table, which grows in size at about the same rate, but only contain scalar types (int, datetime, a few float columns). That table performs perfectly fine even after 200,000 rows.
And by the way, I am not using XQuery on the XML columns; I am only using regular SQL query statements.
Some specifics: both tables contain a DateTime field called SampleTime.
A statement like the following (it's in a stored procedure, but I'm showing the actual statement)
SELECT MAX(sampleTime) SampleTime
FROM dbo.MyRecords
WHERE PlacementID=@somenumber
takes 0 seconds on the table without xml columns, and anything from 13 to 20 seconds on the table with XML columns. That depends on which drive I set my database on. At the moment it sits on a different spindle (not C:) and it takes 13 seconds.
Has anyone seen this behavior before, or have any hint at what I am doing wrong?
I tried this with SQL 2008 EXPRESS and the full-blown SQL Server 2008, that made no difference.
Oh, one last detail: I am doing this from a C# application, .NET 3.5, using SqlConnection, SqlReader, etc..
I'd appreciate some insight into that, thanks!
Sam
Do you have an index on PlacementID and sampleTime in that order?
Both the size of the table and the column types are irrelevant if an index can satisfy the query on its own (a "covering" index).
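A quick way to see the covering-index effect is to build the composite index and look at the query plan. The sketch below uses Python's built-in sqlite3 as a stand-in, so the plan text is SQLite's and not SQL Server's showplan output; the index name, the `Payload` column, and the sample data are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyRecords ("
    "Id INTEGER PRIMARY KEY, PlacementID INTEGER, SampleTime TEXT, Payload TEXT)"
)
# Payload stands in for the wide XML columns that the query never touches.
rows = [(i % 10, f"2009-01-{(i % 28) + 1:02d}", "<x/>" * 50) for i in range(1000)]
conn.executemany(
    "INSERT INTO MyRecords (PlacementID, SampleTime, Payload) VALUES (?, ?, ?)",
    rows,
)
# Composite index in the order suggested above: PlacementID, then SampleTime.
conn.execute(
    "CREATE INDEX IX_Placement_SampleTime ON MyRecords (PlacementID, SampleTime)"
)

def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

sql = "SELECT MAX(SampleTime) FROM MyRecords WHERE PlacementID = ?"
plan = query_plan(conn, sql, (3,))
latest = conn.execute(sql, (3,)).fetchone()[0]
print(plan)
print(latest)
```

Because both columns the query needs live in the index, the engine never has to read the wide rows at all, which is why the XML columns stop mattering for this particular query.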
We noticed significant performance problems when the size of individual xml rows surpassed 64kb. Not sure if you are in that range or not, but it was the difference between nearly instant queries and those taking upwards of 60 seconds.
At the end of the day we extracted out all of the queryable data into normal sql tables to perform our searches on. Incidentally, that was the last time we used the xml data type.
I have a table that has 5 columns: AcctId (int), Address1 (varchar), Address2 (varchar), Person1 (varchar), Person2 (varchar) . I'm generating random data to insert into this table via a C# console application. I've tried doing this random data insert via SQL-Server and decided it was not a good solution -- SQL is not good at random on an each-row basis. Generating the random data -- 975k rows of it -- takes a minimal amount of time. It's in a List of custom objects.
I need to take this random data and update many rows in the database with the new random data. I tried updating the rows one at a time, very slow because of the repeated searching of the List object in code. So I think the best approach is to put all the randomized data into a table in the database, then update all the other tables that use this data. I.e. UPDATE t SET t.Address1=d.Address1 FROM Table1 t INNER JOIN RandomizedData d ON d.AcctId = t.Acct_ID. The database is very un-normalized so this Acct data is sprinkled all over the place. I've got no control of the normalization.
So, having decided to insert all of the randomized data into a single table, I set out to create insert scripts:
USE TheDatabase
Insert tmp_RandomizedData
SELECT 1,'4392 EIGHTH AVE','','JENNIFER CARTER','BARBARA CARTER' UNION ALL
SELECT 2,'2168 MAIN ST','HNGR F','DANIEL HERNANDEZ','SUSAN MARTIN'
-- etc another 98 times...
-- FYI, this is not real data!
I'm building this INSERT script in batches of 100. It's taking on average 175 ms to run each insert. Does this seem like a long time? It's going to take about 35 mins to run the whole insert.
The table doesn't have a primary key or any indexes. I was planning on adding those after all the data is inserted (thinking that that would be faster).
Is there a better way to do this?
The SqlBulkCopy class in .NET can blast records in pretty quickly. I used this to transfer data from an i-Series database to SQL Server tables very rapidly.
Use BCP. You can use this article as a guide. It's for VB6 but the gist is exactly the same. The trick is to use the BULK INSERT command.
... Having read more of your question, you might also want to look at Redgate's SQL Data Generator; it generates tons of data really, really fast.
Use larger batches, 50,000 to 75,000 rows. On SQL 2000 on hardware from 2000, the sweet spot for inserts was 50,000 rows. This was on a live production database, with indexes, during the day on a very large table.
Small batch sizes are better for inserts into a highly active table and where there is a high deadlock risk. Is anyone else using this table while you're doing inserts?
Is this a one time import? Let it run over night.
Finally, INSERT statements executed via ADO.NET aren't really an optimal ETL solution. SSIS, DTS (or any other ETL solution, such as Talend) would be more appropriate for heavy-duty data moving. On the other hand, if all you have is a hammer...
I have some complex stored procedures that may return many thousands of rows, and take a long time to complete.
Is there any way to find out how many rows are going to be returned before the query executes and fetches the data?
This is with Visual Studio 2005, a Winforms application and SQL Server 2005.
You mentioned your stored procedures take a long time to complete. Is the majority of the time taken up during the process of selecting the rows from the database or returning the rows to the caller?
If it is the latter, maybe you can create a mirror version of your SP that just gets the count instead of the actual rows. If it is the former, well, there isn't really that much you can do since it is the act of finding the eligible rows which is slow.
A solution to your problem might be to re-write the stored procedure so that it limits the result set to some number, like:
SELECT TOP 1000 * FROM tblWHATEVER
in SQL Server, or
SELECT * FROM tblWHATEVER WHERE ROWNUM <= 1000
in Oracle. Or implement a paging solution so that the result set of each call is acceptably small.
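For the paging option, one common approach is keyset paging: each call asks for the next N rows after the last key seen. The sketch below uses Python with the built-in sqlite3 module as a stand-in, so `LIMIT` plays the role that `TOP` would play in SQL Server; the `id` and `name` columns and both helper functions are hypothetical.

```python
import sqlite3

def fetch_page(conn, after_id=0, page_size=1000):
    """Return up to `page_size` rows whose id is greater than `after_id`."""
    return conn.execute(
        "SELECT id, name FROM tblWHATEVER WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()

def iter_pages(conn, page_size=1000):
    """Yield successive pages until the table is exhausted."""
    last_id = 0
    while True:
        page = fetch_page(conn, last_id, page_size)
        if not page:
            return
        yield page
        last_id = page[-1][0]  # resume after the last key of this page

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblWHATEVER (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tblWHATEVER (name) VALUES (?)",
    ((f"row {i}",) for i in range(2500)),
)
page_sizes = [len(page) for page in iter_pages(conn, page_size=1000)]
print(page_sizes)
```

Seeking on the key keeps every page equally cheap, whereas skipping rows by offset gets slower the deeper you page.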
Make a stored proc to count the rows first.
SELECT COUNT(*) FROM table
Unless there's some aspect of the business logic of your app that allows calculating this, no. The database is going to have to do all the WHERE and JOIN logic to figure out how many rows there are, and that's the vast majority of the time spent in the SP.
You can't get the rowcount of a procedure without executing the procedure.
You could make a different procedure that accepts the same parameters, the purpose of which is to tell you how many rows the other procedure should return. However, the steps required by this procedure would normally be so similar to those of the main procedure that it should take just about as long as just executing the main procedure.
You would have to write a different version of the stored procedure to get a row count. This one would probably be much faster because you could eliminate joins to tables which you aren't filtering against, remove ordering, etc. For example, if your stored proc executed SQL such as:
select firstname, lastname, email, orderdate from
customer inner join productorder on customer.customerid=productorder.customerid
where orderdate>@orderdate order by lastname, firstname;
your counting version would be something like:
select count(*) from productorder where orderdate>@orderdate;
Not in general.
Through knowledge about the operation of the stored procedure, you may be able to get either an estimate or an accurate count (for instance, if the "core" or "base" table of the query is able to be quickly calculated, but it is complex joins and/or summaries which drive the time upwards).
But you would have to call the counting SP first and then the data SP or you could look at using a multiple result set SP.
It could take as long to get a row count as to get the actual data, so I wouldn't advocate performing a count in most cases.
Some possibilities:
1) Does SQL Server expose its query optimiser findings in some way? i.e. can you parse the query and then obtain an estimate of the rowcount? (I don't know SQL Server).
2) Perhaps based on the criteria the user gives you can perform some estimations of your own. For example, if the user enters 'S%' in the customer surname field to query orders you could determine that that matches 7% (say) of the customer records, and extrapolate that the query may return about 7% of the order records.
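The extrapolation in option 2 is just a proportion, sketched below in Python. The function name and the sample numbers are made up for illustration, and the estimate is only as good as the assumption that orders are spread evenly across customers.

```python
def estimate_rows(matching_customers, total_customers, total_orders):
    """Extrapolate the order count from the share of customers that match."""
    if total_customers == 0:
        return 0
    selectivity = matching_customers / total_customers
    return round(selectivity * total_orders)

# Hypothetical figures: 700 of 10,000 customers match 'S%' (7%),
# so about 7% of the 250,000 orders are expected to match.
estimate = estimate_rows(700, 10_000, 250_000)
print(estimate)
```

An estimate like this is cheap enough to run before the real query, for example to warn the user that a very large result is coming.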
Going on what Tony Andrews said in his answer, you can get an estimated query plan of the call to your query with:
SET showplan_text OFF
GO
SET showplan_all ON
GO
--Replace with the call to your stored procedure
select * from MyTable
GO
SET showplan_all OFF
GO
This should return a table, or many tables which will let you get the estimated row count of your query.
You need to analyze the returned data set, to determine what is a logical, (meaningful) primary key for the result set that is being returned. In general this WILL be much faster than the complete procedure, because the server is not constructing a result set from data in all the columns of each row of each table, it is simply counting the rows... In general, it may not even need to read the actual table rows off disk to do this, it may simply need to count index nodes...
Then write another SQL statement that only includes the tables necessary to generate those key columns (Hopefully this is a subset of the tables in the main sql query), and the same where clause with the same filtering predicate values...
Then add another optional parameter to the stored proc called, say, @CountsOnly, with a default of false (0), like so...
Alter Procedure <storedProcName>
    @param1 Type,
    -- Other current params
    @CountsOnly TinyInt = 0
As
    Set NoCount On
    If @CountsOnly = 1
        Select Count(*)
        From TableA A
            Join TableB B On etc. etc...
        Where < here put all Filtering predicates >
    Else
        <Here put old SQL That returns complete resultset with all data>
    Return 0
You can then just call the same stored proc with @CountsOnly set equal to 1 to get only the count of records. Old code that calls the proc would still function as it used to, since the parameter defaults to false (0) if it is not included.
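The same optional-flag pattern can be sketched on the calling side. This is a Python illustration with the built-in sqlite3 module as a stand-in for the stored procedure; `get_orders`, the `productorder` columns, and the sample dates are hypothetical. The point is that both branches share one filter, and existing callers that omit the flag keep getting rows.

```python
import sqlite3

# Shared filter, so the count and the data query cannot drift apart.
FILTER = "FROM productorder WHERE orderdate > ?"

def get_orders(conn, order_date, counts_only=False):
    """Return matching rows, or just their count when counts_only is True."""
    if counts_only:
        # No ORDER BY and no extra columns: only the count is needed.
        return conn.execute("SELECT COUNT(*) " + FILTER, (order_date,)).fetchone()[0]
    return conn.execute(
        "SELECT productorderid, orderdate " + FILTER + " ORDER BY orderdate",
        (order_date,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productorder (productorderid INTEGER PRIMARY KEY, orderdate TEXT)"
)
conn.executemany(
    "INSERT INTO productorder (orderdate) VALUES (?)",
    [("2009-01-01",), ("2009-02-01",), ("2009-03-01",)],
)
order_count = get_orders(conn, "2009-01-15", counts_only=True)
order_rows = get_orders(conn, "2009-01-15")  # old-style call, flag omitted
print(order_count, order_rows)
```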
It's at least technically possible to run a procedure that puts the result set in a temporary table. Then you can find the number of rows before you move the data from server to application and would save having to create the result set twice.
But I doubt it's worth the trouble unless creating the result set takes a very long time, and in that case it may be big enough that the temp table would be a problem. Almost certainly the time to move the big table over the network will be many times what is needed to create it.