Make this SQL query more efficient - C#

I have the following query which checks goods receipts against purchase orders to see what items were originally ordered and how many have been booked in via goods receipts. E.g. I place a purchase order for 10 banana milkshakes, then generate a goods receipt stating that I received 5 of those milkshakes against that purchase order.
SELECT t.PONUM, t.ITMNUM, t.ordered,
       SUM(t.received) as received,
       t.ordered - ISNULL(SUM(t.received), 0) as remaining,
       SUM(t.orderedcartons) as orderedcartons,
       SUM(t.cartonsreceived) as cartonsreceived,
       SUM(t.remainingcartons) as remainingcartons
FROM (SELECT pod.PONUM,
             pod.ITMNUM, pod.QTY as ordered, ISNULL(grd.QTYRECEIVED, 0) as received,
             pod.DELIVERYSIZE as orderedcartons,
             ISNULL(grd.DELIVERYSIZERECEIVED, 0) as cartonsreceived,
             (pod.DELIVERYSIZE - ISNULL(grd.DELIVERYSIZERECEIVED, 0)) as remainingcartons
      FROM TBLPODETAILS pod
      LEFT OUTER JOIN TBLGRDETAILS grd
        ON pod.PONUM = grd.PONUM AND pod.ITMNUM = grd.ITMNUM) t
GROUP BY t.ITMNUM, t.PONUM, t.ordered
ORDER BY t.PONUM
Which returns the following data:
PONUM  ITMNUM  ordered  received  remaining  orderedcartons  cartonsreceived  remainingcartons
1      1       5.0000   3.0000    2.0000     5.0000          3.0000           2.0000
Next I have a C# loop to generate update queries based on the data I get back from the above query:
foreach (DataRow POUpdate in dt.Rows) {
    ...
    query += "UPDATE MYTABLE SET REMAININGITEMS=" + remainingQty.ToString()
        + ", REMAININGDELIVERYSIZE=" + remainingBoxes.ToString() + " WHERE ITMNUM="
        + itemNumber + " AND PONUM=" + poNumber + ";";
}
I then execute each update query against the DB, which works fine on my local dev machine.
However, on the production server that first query pulls back over 150,000 records.
Looping over that many rows locks up SQL Server and my app. Is it the foreach? Is it the original select loading all that data into memory? Both? Can I combine all of this into a single query and cut out the C# loop? If so, what's the most efficient way to achieve this?

In SQL, the goal should be to operate on whole sets of rows at once. SQL Server can be very efficient at this, but every interaction carries significant overhead, since the server has to deal with consistency, atomicity of transactions, etc. So in a way, your fixed cost per statement is high (for the server to do its thing), but your marginal cost for additional rows in a statement is very low: updating a million rows may be only half as fast as updating ten.
This means the foreach causes the SQL server to constantly go back and forth with your application, and that fixed cost of locking/unlocking and handling transactions is incurred every single time.
Can you write the query to operate in SQL, instead of manipulating data in C#? It seems you want to write a relatively simple update based on your select statement (see, for instance, SQL update from one Table to another based on an ID match).
Try something like the following (not tested, since I don't have access to your database structure, etc.):
UPDATE MYTABLE
SET REMAININGITEMS = x.remaining,
    REMAININGDELIVERYSIZE = x.remainingcartons
FROM (SELECT t.PONUM, t.ITMNUM, t.ordered,
             SUM(t.received) as received,
             t.ordered - ISNULL(SUM(t.received), 0) as remaining,
             SUM(t.orderedcartons) as orderedcartons,
             SUM(t.cartonsreceived) as cartonsreceived,
             SUM(t.remainingcartons) as remainingcartons
      FROM (SELECT pod.PONUM,
                   pod.ITMNUM, pod.QTY as ordered, ISNULL(grd.QTYRECEIVED, 0) as received,
                   pod.DELIVERYSIZE as orderedcartons,
                   ISNULL(grd.DELIVERYSIZERECEIVED, 0) as cartonsreceived,
                   (pod.DELIVERYSIZE - ISNULL(grd.DELIVERYSIZERECEIVED, 0)) as remainingcartons
            FROM TBLPODETAILS pod
            LEFT OUTER JOIN TBLGRDETAILS grd
              ON pod.PONUM = grd.PONUM AND pod.ITMNUM = grd.ITMNUM) t
      GROUP BY t.ITMNUM, t.PONUM, t.ordered) as x
JOIN MYTABLE ON MYTABLE.ITMNUM = x.ITMNUM AND MYTABLE.PONUM = x.PONUM
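If you still want to drive this from the application, the C# side then shrinks to a single ExecuteNonQuery call instead of a loop. A minimal sketch, assuming an open SqlConnection named conn and that the UPDATE text above is held in a string variable (setBasedUpdateSql is just an illustrative name):
using (var cmd = conn.CreateCommand())
{
    cmd.CommandText = setBasedUpdateSql;   // the single UPDATE ... FROM (...) statement shown above
    cmd.CommandTimeout = 300;              // give a large set-based update room to finish
    int rowsAffected = cmd.ExecuteNonQuery();
    Console.WriteLine("Updated {0} rows in one round trip.", rowsAffected);
}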

As KM says in the comments, the problem here is pulling everything back to the client app and then making another database trip for every row. That's slow, and it invites silly little bugs that could leave you with bogus data.
Also, concatenating values into SQL strings the way you're doing is generally considered a very bad idea: SQL injection (as Joel Coehoorn writes) is a real possibility.
How about:
create view OrderBalance
as
SELECT t.PONUM, t.ITMNUM, t.ordered,
       SUM(t.received) as received,
       t.ordered - ISNULL(SUM(t.received), 0) as remaining,
       SUM(t.orderedcartons) as orderedcartons,
       SUM(t.cartonsreceived) as cartonsreceived,
       SUM(t.remainingcartons) as remainingcartons
FROM (SELECT pod.PONUM,
             pod.ITMNUM, pod.QTY as ordered, ISNULL(grd.QTYRECEIVED, 0) as received,
             pod.DELIVERYSIZE as orderedcartons,
             ISNULL(grd.DELIVERYSIZERECEIVED, 0) as cartonsreceived,
             (pod.DELIVERYSIZE - ISNULL(grd.DELIVERYSIZERECEIVED, 0)) as remainingcartons
      FROM TBLPODETAILS pod
      LEFT OUTER JOIN TBLGRDETAILS grd
        ON pod.PONUM = grd.PONUM AND pod.ITMNUM = grd.ITMNUM) t
GROUP BY t.ITMNUM, t.PONUM, t.ordered
This seems to have exactly the data that your "MYTABLE" has - maybe you don't even need MYTABLE anymore, and you can just use the view!
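For reads, the C# side can then select from the view directly. A minimal sketch, assuming an open SqlConnection named conn and the column names from the view above (poNumber is whatever PO the caller is interested in):
using (var cmd = conn.CreateCommand())
{
    cmd.CommandText =
        "SELECT ITMNUM, remaining, remainingcartons FROM OrderBalance WHERE PONUM = @po";
    cmd.Parameters.AddWithValue("@po", poNumber);

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine("Item {0}: {1} items, {2} cartons remaining",
                              reader["ITMNUM"], reader["remaining"], reader["remainingcartons"]);
        }
    }
}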
If you have other data on MYTABLE, your update becomes:
UPDATE mt
SET REMAININGITEMS = ob.remaining,
    REMAININGDELIVERYSIZE = ob.remainingcartons
from MYTABLE mt
join OrderBalance ob
  on mt.ITMNUM = ob.ITMNUM
 AND mt.PONUM = ob.PONUM
(Although, as David Mannheim writes, it may be better to not use a view and use a solution similar to the one he proposes).

The other answers show you a great way to perform the whole update entirely in the RDBMS. If you can do it like that, that's the perfect solution: you cannot beat it with a C# / RDBMS combination, because of the extra round trips and data transfer.
However, if your update requires calculations that for one reason or another cannot be performed in the RDBMS, you should modify your code to use a single parameterized update in place of the potentially gigantic 150,000-statement batch you are currently constructing.
using (var upd = conn.CreateCommand()) {
    upd.CommandText = @"
        UPDATE MYTABLE SET
            REMAININGITEMS=@remainingQty
          , REMAININGDELIVERYSIZE=@remainingBoxes
        WHERE ITMNUM=@itemNumber AND PONUM=@poNumber";
    var remainingQtyParam = upd.CreateParameter();
    remainingQtyParam.ParameterName = "@remainingQty";
    remainingQtyParam.DbType = DbType.Int64; // <<== Correct for your specific type
    upd.Parameters.Add(remainingQtyParam);
    var remainingBoxesParam = upd.CreateParameter();
    remainingBoxesParam.ParameterName = "@remainingBoxes";
    remainingBoxesParam.DbType = DbType.Int64; // <<== Correct for your specific type
    upd.Parameters.Add(remainingBoxesParam);
    ...
    foreach (DataRow POUpdate in dt.Rows) {
        remainingQtyParam.Value = ...
        remainingBoxesParam.Value = ...
        upd.ExecuteNonQuery();
    }
}
The idea is to turn 150,000 updates that all look the same into a single parameterized statement that is prepared once and then executed repeatedly with different values.

Related

One query that gets all data vs 300 specific queries

First of all, I'm aware that there are similar questions out there, but I can't seem to find one that fits my case.
I have a program that updates the stock in a CSV file of about 300 products by checking with our database.
The database has roughly 100k records. At the moment I select every record in the database and just use the ones I need, which means I'm pulling back about 99k records more than necessary.
I could do specific selects for each product instead, but I'm worried that sending 300 queries (it might become more in the future) in a matter of seconds would be too much.
Or another (probably stupid) option:
select * from database where id=product1 || id=product2
|| id=product3 || id=product4,...
What's the better option in this case? I'm not worried about execution time; I'd rather have clean and "efficient" code than fast code in this scenario.
You can try something like this:
select *
from database where id IN (1, 2, 3)
If the set of values you want to match is larger than the set you want to exclude, then do the reverse:
select *
from database where id NOT IN (some values)
You could do something like this:
select *
from database
where id IN (1, 2, 3)
All of the ids you want to fetch can go into that list, so you don't need a long chain of OR clauses.
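If the ~300 ids are only known at runtime, you can build that IN list with one parameter per id, which keeps the statement parameterized instead of concatenating values. A minimal sketch, assuming an open SqlConnection named conn; the table and column names (products, id) are illustrative, not from the question:
// Sketch only: one parameterized SELECT for all ~300 product ids.
var productIds = new List<int> { 1, 2, 3 };   // in practice, the ids read from the CSV

using (var cmd = conn.CreateCommand())
{
    var names = new List<string>();
    for (int i = 0; i < productIds.Count; i++)
    {
        names.Add("@p" + i);
        cmd.Parameters.AddWithValue("@p" + i, productIds[i]);
    }
    cmd.CommandText = "SELECT * FROM products WHERE id IN (" + string.Join(", ", names) + ")";

    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // update the stock figure for this product
        }
    }
}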

Performance issues to iterate results with C# SQLite DataReader and attached database

I am using System.Data.SQLite and SQLiteDataReader in my C# project. I am facing performance issues when getting the results of a query with attached databases.
Here is an example of a query to search text in two databases:
ATTACH "db2.db" as db2;
SELECT MainRecord.RecordID,
((LENGTH(MainRecord.Value) - LENGTH(REPLACE(UPPER(MainRecord.Value), UPPER("FirstValueToSearch"), ""))) / 18) AS "FirstResultNumber",
((LENGTH(DB2Record.Value) - LENGTH(REPLACE(UPPER(DB2Record.Value), UPPER("SecondValueToSearch"), ""))) / 19) AS "SecondResultNumber"
FROM main.Record MainRecord
JOIN db2.Record DB2Record ON DB2Record.RecordID BETWEEN (MainRecord.PositionMin) AND (MainRecord.PositionMax)
WHERE FirstResultNumber > 0 AND SecondResultNumber > 0;
DETACH db2;
When executing this query with SQLiteStudio or SQLiteAdmin, it works fine; I get the results in a few seconds (the Record table can contain hundreds of thousands of records, and the query returns 36,000 records).
When executing this query in my C# project, the execution takes a few seconds too, but it takes hours to run through all the results.
Here is my code:
// Attach databases
SQLiteDataReader data = null;
using (SQLiteCommand command = this.m_connection.CreateCommand())
{
    command.CommandText = "SELECT...";
    data = command.ExecuteReader();
}
if (data.HasRows)
{
    while (data.Read())
    {
        // Do nothing, just iterate all results
    }
}
data.Close();
// Detach databases
// Detach databases
Calling the Read method of the SQLiteDataReader once can take more than 10 seconds! I guess this is because the SQLiteDataReader is lazily loaded (so it doesn't fetch the whole rowset before you start reading the results), am I right?
EDIT 1:
I don't know if this has something to do with lazy loading, like I said initially, but all I want is to be able to get ALL the results as soon as the query has finished. Isn't that possible? In my opinion, it is really strange that it takes hours to iterate the results of a query that executes in a few seconds...
EDIT 2:
I just added a COUNT(*) to my select query to see if I could get the total number of results on the first data.Read(), just to be sure that it was only the iteration of the results that was taking so long. And I was wrong: this new query executes in a few seconds in SQLiteAdmin / SQLiteStudio, but takes hours to execute in my C# project. Any idea why the same query takes so much longer to execute from my C# project?
EDIT 3:
Thanks to EXPLAIN QUERY PLAN, I noticed a slight difference in the execution plan for the same query between SQLiteAdmin / SQLiteStudio and my C# project. In the second case, it uses an AUTOMATIC PARTIAL COVERING INDEX on DB2Record instead of the primary key index. Is there a way to ignore / disable the use of automatic partial covering indexes? I know they are meant to speed up queries, but in my case it's rather the opposite that happens...
Thank you.
Besides finding matching records, it seems that you're also counting the number of times the strings matched. The result of this count is also used in the WHERE clause.
You want the number of matches in the SELECT list, but in the WHERE clause all that matters is whether there was a match at all - so you could try changing the WHERE clause to:
WHERE MainRecord.Value LIKE '%FirstValueToSearch%' AND DB2Record.Value LIKE '%SecondValueToSearch%'
It might not result in any difference though - especially if there's no index on the Value columns - but worth a shot. Indexes on text columns require a lot of space, so I wouldn't blindly recommend that.
If you haven't done so yet, place an index on the DB2's RecordID column.
You can use EXPLAIN QUERY PLAN SELECT ... to make SQLite spit out what it does to try to make your query perform, the output of that might help diagnose the problem.
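If you want to see the plan the ADO.NET provider actually gets (rather than the one SQLiteStudio produces), you can run EXPLAIN QUERY PLAN from the C# side too. A minimal sketch, assuming the same m_connection as in the question:
// Sketch only: dump the plan SQLite chooses for the exact query your C# code runs.
using (SQLiteCommand command = this.m_connection.CreateCommand())
{
    command.CommandText = "EXPLAIN QUERY PLAN SELECT ...";   // prepend to your original SELECT
    using (SQLiteDataReader plan = command.ExecuteReader())
    {
        while (plan.Read())
        {
            // Column names differ between SQLite versions, so print them all.
            for (int i = 0; i < plan.FieldCount; i++)
                Console.Write(plan.GetName(i) + "=" + plan.GetValue(i) + "  ");
            Console.WriteLine();
        }
    }
}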
Are you sure you're using the same version of SQLite in System.Data.SQLite, SQLiteStudio and SQLiteAdmin?
There can be huge differences between versions.
One more typical reason why a SQL query can take a different amount of time when executed through ADO.NET than from a native utility (like SQLiteAdmin) is command parameters used in CommandText (it is not clear from your code whether parameters are used or not). Depending on the ADO.NET provider implementation, the following two seemingly identical commands:
SELECT * FROM sometable WHERE somefield = ?  -- assume the parameter value is '2'
and
SELECT * FROM sometable WHERE somefield = '2'
may lead to completely different execution plans and query performance.
Another suggestion: you may disable the journal (by specifying "Journal Mode=Off;" in the connection string) and synchronous mode ("Synchronous=Off;"), as these options can also affect query performance in some cases.
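Putting the last two suggestions together, a minimal sketch (the connection string keys are the ones named above; the database file, table, column and parameter names are illustrative):
// Sketch only: open the database with journaling and synchronous writes off,
// and use a real parameter instead of concatenating the value into the SQL.
string connectionString = "Data Source=main.db;Journal Mode=Off;Synchronous=Off;";

using (var connection = new SQLiteConnection(connectionString))
using (var command = connection.CreateCommand())
{
    connection.Open();
    command.CommandText = "SELECT * FROM sometable WHERE somefield = @value";
    command.Parameters.AddWithValue("@value", "2");

    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // process the row
        }
    }
}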

Joining values in a WHERE comparison in Oracle

I have a HUGE query which I need to optimize. Before my changes it was like
SELECT [...] WHERE foo = 'var' [...]
executed 2000 times for 2000 different values of foo. We all know how slow that is. I managed to join all those different queries into
SELECT [...] WHERE foo = 'var' OR foo = 'var2' OR [...]
Of course, there are 2000 chained comparisons. The result is a huge query that executes a few seconds faster than before, but not enough. I suppose the StringBuilder I am using takes a while to build the query, so the time saved by avoiding 1,999 extra queries is wasted in this:
StringBuilder query = new StringBuilder();
foreach (string var in vars)
    query.Append("foo = '").Append(var).Append("' OR ");
query.Remove(query.Length - 4, 4); // remove the trailing " OR "
So I would like to know if there is some workaround to optimize the building of that string, maybe joining the different values in the comparison with some SQL trick like
SELECT [...] WHERE foo = ('var' OR 'var2' OR [...])
so I can save some Append operations. Of course, any different idea that avoids that huge query altogether will be more than welcome.
@Armaggedon,
For any decent DBMS, the IN () operator should be equivalent to the corresponding chain of x OR y comparisons. As for your concern about StringBuilder.Append: its implementation is very efficient and you shouldn't notice any delay with this amount of data, as long as you have a few MB to spare for its temporary internal buffer. That said, I don't think your performance problem is related to these issues.
For database tuning it's always a long shot to propose solutions without the "full picture", but I think your problem might be related to compiling such a huge dynamic SQL statement: parsing and optimizing SQL statements can consume lots of processor time, and it should be avoided.
Maybe you could improve the response time by moving your domain of values into an auxiliary indexed table. Or by moving the various checks over the same char column to a text search using the INSTR function:
-- 1. using domain table
SELECT myColumn FROM myTable WHERE foo IN (SELECT myValue FROM myDomain);
-- 2. using INSTR function
SELECT myColumn FROM myTable WHERE INSTR('allValues', foo, 1, 1) > 0;
Why not use the IN operator (see the IN operator on W3Schools)? It lets you combine your values in a much shorter way. You can also store the values in a temporary table, as mentioned in this post, to bypass Oracle's limit of 1,000 items per IN list.
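If you keep building the statement as a string, you can respect that limit by emitting one IN list per chunk of at most 1,000 values, ORed together. A rough sketch of the string building, reusing the vars collection from the question (bind variables would still be preferable to literals):
// Sketch only: WHERE foo IN (first 1000 values) OR foo IN (next 1000 values) OR ...
const int chunkSize = 1000;                  // Oracle allows at most 1,000 items per IN list
List<string> values = vars.ToList();
StringBuilder query = new StringBuilder();   // builds just the WHERE conditions, as in the question

for (int i = 0; i < values.Count; i += chunkSize)
{
    if (i > 0)
        query.Append(" OR ");

    IEnumerable<string> chunk = values.Skip(i).Take(chunkSize).Select(v => "'" + v + "'");
    query.Append("foo IN (").Append(string.Join(", ", chunk)).Append(")");
}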
It's been a while since I danced the Oracle dance, but I seem to remember a concept of "Bind Variables" - typically used for bulk insertions... I'm wondering if you could express the list of values as an array, and use that with IN...
Have to say - this is just an idea - I don't have time to research it further for you...

Evaluate SQL where clause in c# without SQL Server

I have a large set of conditions (I don't know them ahead of time) that I need to count against a relatively small SQL Server table (< 10,000 rows). Each condition is in the form of a SQL where clause. Currently, I build a complete SQL statement in the form of "select count(*) from some_table where " + Where_Clause; and let SQL Server return the count to me. I do this in a loop to get all the counts I need for the various conditions.
I'm looking for ways to speed this up when I have dozens or hundreds of counts I need to run. I don't know the statements ahead of time. Some will select the whole table, some may not select any rows. I've tried the following solutions:
Issuing the queries in parallel - saw minimal improvement even when not doing any locking
Writing them as complex counts on case statements so I can run multiple where clauses in one statement
Passing many queries together as UNIONs and getting the results back as one result set
None of these options gave a fantastic improvement in runtime (sometimes they ran more slowly), and with the added complexity I don't feel any of them are worth it.
My question is: I can very quickly load the entire table into a DataTable object just by running one "select *" against it. If I had the whole table in memory, is there a way to run counts against it without going back and forth to SQL Server? I'm hoping to cut out the overhead of the: network, I/O, locking, etc.
The most complex where clause would be something like:
a=1 or b in (2,3) or c<4 or d like '%5' or substring(e,2,1)='z'
So it's not trivial, and supporting as much of T-SQL as possible would be ideal, but I don't think DataTable's .Select() method supports this, nor is it very fast. So, given a table of data in memory, can I count rows using T-SQL syntax very quickly (or in parallel)?
You could look into using LINQ to SQL, or possibly building the queries as Stored Procedures that you can execute.
Things to try
a) Make sure you've got the right indexes - check the query plan for the most complex query, and make sure there are no table scans in there.
b) Do multiple queries in the same request using something like this.
var cmd = new SqlCommand(@"SELECT TOP 10 * FROM Foo; SELECT TOP 10 * FROM Bar", con);
var ad = new SqlDataAdapter(cmd);
var ds = new DataSet();
ad.Fill(ds);
var foo = ds.Tables[0];
var bar = ds.Tables[1];
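The same multi-result-set trick works for the counts themselves: concatenate the individual SELECT COUNT(*) statements into one batch and read one table per condition. A minimal sketch, assuming whereClauses is your list of WHERE-clause strings and con is an open SqlConnection:
// Sketch only: run every COUNT in a single batch / round trip.
string sql = string.Join(";\n",
    whereClauses.Select(w => "SELECT COUNT(*) FROM some_table WHERE " + w));

var cmd = new SqlCommand(sql, con);
var ad = new SqlDataAdapter(cmd);
var ds = new DataSet();
ad.Fill(ds);

for (int i = 0; i < ds.Tables.Count; i++)
{
    int count = Convert.ToInt32(ds.Tables[i].Rows[0][0]);
    Console.WriteLine("Condition {0}: {1} rows", i, count);
}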

LINQ Query incredibly slow - why?

I got a very simple LINQ query:
List<table> list = ( from t in ctx.table
where
t.test == someString
&& t.date >= dateStartInt
&& t.date <= dateEndInt
select t ).ToList<table>();
The table which gets queried has got about 30 million rows, but the columns test and date are indexed.
When it should return around 5000 rows it takes several minutes to complete.
I also checked the SQL command which LINQ generates.
If I run that command on the SQL Server it takes 2 seconds to complete.
What's the problem with LINQ here?
It's just a very simple query without any joins.
That's the query SQL Profiler shows:
exec sp_executesql N'SELECT [t0].[test]
FROM [dbo].[table] AS [t0]
WHERE ([t0].[test] IN (@p0)) AND ([t0].[date] >= @p1)
AND ([t0].[date] <= @p2)',
N'@p0 nvarchar(12),@p1 int,@p2 int',@p0=N'123test',@p1=110801,@p2=110804
EDIT:
It's really weird. While testing I noticed that it's much faster now. The LINQ query now takes 3 seconds for around 20000 rows, which is quite ok.
What's even more confusing:
It's the same behaviour on our production server. An hour ago it was really slow, now it's fast again. As I was testing on the development server, I didn't change anything on the production server. The only thing I can think of being a problem is that both servers are virtualized and share the SAN with lots of other servers.
How can I find out if that's the problem?
Before blaming LINQ, first find out where the actual delay is taking place:
Sending the query
Execution of the query
Receiving the results
Transforming the results into local types
Binding/showing the result in the UI
And any other events tied to this process
Then start blaming LINQ ;)
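A quick way to separate those steps is to time the query construction and the execution/materialization independently, for example with a Stopwatch. A rough sketch, using the same ctx, table and variables as the question:
// Sketch only: see whether the time goes into running the SQL or into materializing rows.
var sw = System.Diagnostics.Stopwatch.StartNew();

IQueryable<table> query = from t in ctx.table
                          where t.test == someString
                             && t.date >= dateStartInt
                             && t.date <= dateEndInt
                          select t;
Console.WriteLine("Build expression: {0} ms", sw.ElapsedMilliseconds);   // ~0 ms, nothing has hit the database yet

sw.Restart();
List<table> list = query.ToList();   // the SQL executes and rows are materialized here
Console.WriteLine("Execute + materialize: {0} ms for {1} rows", sw.ElapsedMilliseconds, list.Count);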
If I had to guess, I would say "parameter sniffing" is likely, i.e. it has built and cached a query plan based on one set of parameters which is very suboptimal for your current parameter values. You can tackle this with OPTION (OPTIMIZE FOR UNKNOWN) in regular TSQL, but there is no LINQ-to-SQL / EF way of expressing this.
My plan would be:
use profiling to prove that the time is being lost in the query (as opposed to materialization etc)
once confirmed, consider using direct TSQL methods to invoke
For example, with LINQ-to-SQL, ctx.ExecuteQuery<YourType>(tsql, arg0, ...) can be used to throw raw TSQL at the server (with parameters as {0} etc, like string.Format). Personally, I'd lean towards "dapper" instead - very similar usage, but a faster materializer (but it doesn't support EntityRef<> etc for lazy-loading values - which is usually a bad thing anyway as it leads to N+1).
i.e. (with dapper)
List<table> list = ctx.Query<table>(@"
    select * from table
    where test = @someString
    and date >= @dateStartInt
    and date <= @dateEndInt
    OPTION (OPTIMIZE FOR UNKNOWN)",
    new { someString, dateStartInt, dateEndInt }).ToList();
or (LINQ-to-SQL):
List<table> list = ctx.ExecuteQuery<table>(@"
    select * from table
    where test = {0}
    and date >= {1}
    and date <= {2}
    OPTION (OPTIMIZE FOR UNKNOWN)",
    someString, dateStartInt, dateEndInt).ToList();
