Fine-tune some SQL called multiple times - C#

I currently have an SQL query which is currently called multiple times like this (pseudo code):
foreach (KeyValuePair kvp in someMapList)
{
    select col1 from table where col2 = kvp.key and col3 = kvp.value;
    // do some processing on the returned value
}
The above could obviously call the database a large number of times if it is a big list of pairs.
Can anyone think of a more efficient piece of SQL that could return a list of values based on a list of pairs, so that the processing can be done in bulk? One obvious option would be to build up one big piece of SQL with ORs, but that would probably be quite inefficient.
Thanks
Carl

Insert the KeyValuePairs into a table and use a JOIN, man
or (if C#)
from record in table
join pair in someMap
    on new { A = record.col2, B = record.col3 }
    equals new { A = pair.Key, B = pair.Value }
select record.col1 // or some anonymous type or a DTO, I do not know...
That is WAY more efficient :D
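In case it helps, here is a minimal, self-contained sketch of that composite-key join in LINQ to Objects; the element types, column names and sample data are all made up for illustration, and if the table is an EF/LINQ-to-SQL queryable the in-memory join may not translate to SQL, so the rows might need to be loaded first:
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // hypothetical pair list (col2/col3 values to look up)
        var someMap = new Dictionary<string, string> { ["k1"] = "v1", ["k2"] = "v2" };

        // hypothetical rows already loaded from the table
        var table = new[]
        {
            new { col1 = "a", col2 = "k1", col3 = "v1" },   // matches a pair
            new { col1 = "b", col2 = "k2", col3 = "other" } // no match
        };

        var matches =
            from record in table
            join pair in someMap
                on new { K = record.col2, V = record.col3 }
                equals new { K = pair.Key, V = pair.Value }
            select record.col1;

        foreach (var col1 in matches)
            Console.WriteLine(col1); // prints "a"
    }
}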

Assuming that you're stuck with the entity-value pair pattern, your best bet is either to create a stored procedure and pass in a delimited string of your pairs, which you can then turn into a temporary table and select out of using a join, or, if you can pass in a table-valued parameter, that would be even better.
Another possibility would be to insert into a work table with some kind of unique connection ID for your values and then join against that.
In any case, your goal is to be able to select the data using a set-based method by getting your pairs into the database in some way.
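For the table-valued parameter route, here is a hedged sketch of what the client side could look like; the table type dbo.KeyValueList, the column names, the [table] placeholder and connectionString are all assumptions, not part of the original question:
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

IEnumerable<object> SelectCol1(IEnumerable<KeyValuePair<int, int>> someMapList,
                               string connectionString)
{
    // Assumed to exist on the server (not part of the original question):
    //   CREATE TYPE dbo.KeyValueList AS TABLE (KeyCol int NOT NULL, ValueCol int NOT NULL);
    var pairs = new DataTable();
    pairs.Columns.Add("KeyCol", typeof(int));
    pairs.Columns.Add("ValueCol", typeof(int));
    foreach (var kvp in someMapList)
        pairs.Rows.Add(kvp.Key, kvp.Value);

    var results = new List<object>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"SELECT t.col1
            FROM [table] t
            JOIN @pairs p ON p.KeyCol = t.col2 AND p.ValueCol = t.col3", conn))
    {
        var p = cmd.Parameters.AddWithValue("@pairs", pairs);
        p.SqlDbType = SqlDbType.Structured;       // send the DataTable as a TVP
        p.TypeName = "dbo.KeyValueList";

        conn.Open();
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                results.Add(reader.GetValue(0));  // all matches back in one round trip
    }
    return results;
}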

Related

Split t-sql results into data tables, by non-numerical value in a field

I am writing a C# program to output T-SQL records into separate tabs in an Excel spreadsheet, split by the person the records belong to.
I have seen that I can have many data tables in a single data set, and turn each into a separate tab (how to store multiple DataTables into single DataSet in c#?), so now I need to populate my data tables.
I do not have a fixed list of people, it will vary each time the program is run, and a person could have any number of records assigned to them.
Is there a way of doing this using SQL / C# using something like order or group by; or do I have to get my results, pick up the list of people, then loop each SQL query for that specific person and feed that into a new data table?
Thought I'd ask if anyone knew a short way before I did it the long way, because this can't be an uncommon thing to do; so I suspect there must be a simpler way.
Normally you get one DataTable per SELECT statement.
However, you could just select everything and then use LINQ to group the data and fill your DataTables. See if that approach is any help.
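As a rough sketch of that idea (the "Person" column name and the use of the DataSetExtensions CopyToDataTable helper are assumptions on my part):
using System.Data;
using System.Linq;

DataSet SplitByPerson(DataTable allRows)
{
    // one SELECT returns everything; group it client-side, one DataTable per person
    var result = new DataSet();
    foreach (var group in allRows.AsEnumerable()
                                 .GroupBy(r => r.Field<string>("Person")))
    {
        DataTable personTable = group.CopyToDataTable(); // same schema, only this person's rows
        personTable.TableName = group.Key;
        result.Tables.Add(personTable);
    }
    return result;
}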
It depends on the table structure, both of the source and of the destination.
If you have multiple source tables you can append them together with the UNION statement, which returns the distinct rows of all tables; you can use UNION ALL to keep duplicate values.
SELECT customer_key, customer_name, customer_address
FROM table_1
WHERE customer_key = #Customer_key
UNION (or UNION ALL)
SELECT customer_key, customer_name, customer_address
FROM table_2
WHERE Customer_key = #Customer_key
UNION etc..

SQL Server where using multiple data types

I couldn't come up with a better title.
I have a list of values like:
List<string> ids = new List<string>() { "1", "AND", "2", "NOT", "3" };
And a database table which contains the ids specified in the list above.
My problem is that I need to retrieve the data based on the ids, including the values which are not found in the table (the operators AND and NOT in the list above).
For example, using the list above, after reading the db values for 1,2 and 3 a new list will be created like:
List<string> values = new List<string>() { "Value1", "AND", "Value2", "NOT", "Value3" };
The operators, AND, OR, NOT, ) and ( are just simple strings which are not found in MyTable.
Until now I have been splitting the ids and making a trip to the db for each value to get it. The problem with this is that it takes a lot of time, even for 2k lists. Each list has around 10 values.
What I was thinking about is to use the where in clause which will hopefully reduce the trips to the database. Unfortunately, I don't know how to handle the operators from the first list. (The operators are not saved in MyTable ).
I've tried using multiple where conditions but I cannot convert the operators (which are strings) to int (type of the Id column).
Queries I've tried:
SELECT Title
FROM MyTable where Id =1 or Id = cast('AND' as int)
And
SELECT Title
FROM MyTable where Id in (1,2,'AND')
Both of them fail (rightfully) because sql cannot convert AND to data type int.
What I want is to use a WHERE clause (or any other construct) for the first list, and where no result is found in the db, to return the value that was used in the query.
For example, for the query:
SELECT Title
FROM MyTable where Id in (1,2,'AND')
I'd like to receive from the db Value1, Value2, AND. I know that the order is not guaranteed when executing the query.
How could I solve this problem efficiently? I'm using C# with SQL Server 2012, without any ORM.
First, you need to extract those list entries which need to be converted, i.e. your search keys, from the list.
Once done, you can query those like so:
select id, title from MyTable where id in ( /* list goes here*/)
Load the results of the query into a Dictionary<int,string>.
Then, go over the list a second time, replacing the keys with the values from your dictionary.
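A rough sketch of that whole approach, assuming an already-open SqlConnection and the MyTable/Id/Title names from the question (the parameter naming is my own):
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

List<string> Resolve(List<string> ids, SqlConnection conn) // conn assumed open
{
    // 1. keep only the entries that are real ids; AND/OR/NOT/( ) are left alone
    var keys = ids.Where(s => int.TryParse(s, out _))
                  .Select(int.Parse)
                  .Distinct()
                  .ToList();
    if (keys.Count == 0)
        return new List<string>(ids);

    // 2. one round trip: WHERE Id IN (@p0, @p1, ...)
    var titles = new Dictionary<int, string>();
    var names = keys.Select((k, i) => "@p" + i).ToList();
    using (var cmd = new SqlCommand(
        "SELECT Id, Title FROM MyTable WHERE Id IN (" + string.Join(",", names) + ")", conn))
    {
        for (int i = 0; i < keys.Count; i++)
            cmd.Parameters.AddWithValue(names[i], keys[i]);
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                titles[reader.GetInt32(0)] = reader.GetString(1);
    }

    // 3. second pass over the original list: replace ids with titles, keep operators as-is
    return ids.Select(s => int.TryParse(s, out var id) && titles.TryGetValue(id, out var t)
                               ? t : s)
              .ToList();
}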
If you find that the SQL query performs badly you should look at indexes and query hints, or ask again for a faster query.
Why this might improve performance
The main reason is it will reduce trips to the database. However if the database is on the same physical machine this is unlikely to be significant.
Alternatively, if the list of keys and titles does not change often, and it's OK for it to be slightly out of date, you might load the whole thing into a dictionary. 100,000 entries is not that many - if they are 50 bytes each that's only 5 MB which is not very much on a modern server.
You can try to cast the Id in your SQL query:
SELECT Title
FROM MyTable where cast(Id as nvarchar(20)) in ('1', '2', 'AND')
but this would greatly reduce your performance.

Retrieving records from table where a column may have multiple rows of data

I have a database with several tables and I am using the following query to return a record that matches a string(Name).
In the MHP table there is a Name field (primary key), Num_Sites and a few more, but these are the only ones I am concerned with.
In the MHP_Parcel_Info table there are many fields, one of them being Name (foreign key). There is a parcel_id field, and in some cases there may be only one parcel for a name, but there may also be many parcels for a Name.
As it is now, my query will return one of the rows for instances where there are multiple parcels for a name.
What I would like to do is: if there is more than one parcel for a Name, have all the parcels put into a list (so I can display them in a listbox on the form).
My SQL skills are limited and I don’t know how I would go about doing something like this.
SELECT MHP_Parcel_Info.*, MHP.NUM_SITES FROM MHP_Parcel_Info INNER JOIN MHP ON " +
"(MHP_Parcel_Info.MHP_NAME = MHP.MHP_NAME) WHERE MHP_Parcel_Info.MHP_NAME='" + strValue + "'"
This is not something you can do directly in SQL. There's no way to select data in a parent/child structure in a SQL query - you have to do that as a post-processing step.
Since this is tagged as C# and WinForms, I'm assuming this is from inside a .NET app. You will need to execute the query as you have it above; then in C# you can use the LINQ GroupBy extension method on the result to group the rows into a list of IGrouping objects which use the name as the key and have all of the parcel info as the items in the list.
Even better, if you are using (or can use) LINQ to SQL or Entity Framework you can just write a linq query that fetches the data from the database and does the grouping all at once.
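For example (hedged: this assumes the rows from the query above were loaded into a DataTable, and that the parcel column is called PARCEL_ID and is a string; neither detail is confirmed in the question):
using System.Collections.Generic;
using System.Data;
using System.Linq;

Dictionary<string, List<string>> GroupParcelsByName(DataTable results)
{
    // one entry per MHP_NAME, holding every parcel for that name,
    // ready to bind to the listbox when that name is selected
    return results.AsEnumerable()
                  .GroupBy(row => row.Field<string>("MHP_NAME"))
                  .ToDictionary(g => g.Key,
                                g => g.Select(r => r.Field<string>("PARCEL_ID")).ToList());
}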

T-SQL query timeout / performance issue

I have a table having around 1 million records. Table structure is shown below. The UID column is a primary key and uniqueidentifier type.
Table_A (contains a million records)
UID Name
-----------------------------------------------------------
E8CDD244-B8E4-4807-B04D-FE6FDB71F995 DummyRecord
I also have a function called fn_Split('Guid_1,Guid_2,Guid_3,....,Guid_n') which accepts a list of comma-separated guids and gives back a table variable containing the guids.
From my application code I am passing a SQL query to get the new guids [keys that exist in the application code but not in the database table]:
var sb = new StringBuilder();
sb
.Append(" SELECT NewKey ")
.AppendFormat(" FROM fn_Split ('{0}') ", keyList)
.Append(" EXCEPT ")
.Append("SELECT UID from Table_A");
The first time this command is executed it times out on quite a few occasions. I am trying to figure out what would be a better approach here to avoid such timeouts and/or improve the performance of this.
Thanks.
Firstly, add an index on table_a.uid if there isn't one, but I assume there is.
Some alternate queries to try:
select newkey
from fn_split(blah)
left outer join table_a
on newkey = uid
where uid IS NULL
select newkey
from fn_split(blah)
where newkey not in (select uid
from table_a)
select newkey
from fn_split(blah) f
where not exists(select uid
from table_a a
where f.newkey = a.uid)
There is plenty of info around here as to why you should not use a GUID for your primary key, especially if it is unordered. That would be the first thing to fix. As far as your query goes you might try what Paul or Tim suggested, but as far as I know EXCEPT and NOT IN will use the same execution plan, though the OUTER JOIN may be more efficient in some cases.
If you're using MS SQL 2008 then you can/should use Table-Valued Parameters. Essentially you'd send in your guids in the form of a DataTable to your stored procedure.
Then inside your stored procedure you can use the parameters as a "table" and do a join or EXCEPT or what have you to get your results.
This method is faster than using a function to split because functions in MS SQL server are really slow.
But my guess is that the time is being taken up by the massive disk I/O this query requires. Since you're searching on your UID column, and since the values are "random", no index is going to help here. The engine will have to resort to a table scan, which means you'll need some serious disk I/O performance to get the results in "good time".
Using the uniqueidentifier data type in an index is not recommended. However, it may not make a difference in your case. But let me ask you this:
The guids that you send in from your app, are they just a random list of guids or is there some business or entity relationship here? It's possible that your data model is not correct for what you are trying to do. So how do you determine which guids you have to search on?
However, for argument's sake, let's assume your guids are just a random selection; then there is no index that can really be used, since the database engine will have to do a table scan to pick out each of the required guids/records from the million records you have. In a situation like this the only way to speed things up is at the physical database level, that is, how your data is physically stored on the hard drives etc.
For example:
Having faster drives will improve performance
If this kind of query is being fired over and over then more memory on the box will help because the engine can cache the data in memory and it won't need to do physical reads
If you partition your table then the engine can parallelize the seek operation and get you results faster.
If your table contains a lot of other fields that you don't always need, then splitting the table into two tables, where table1 contains the guid and the bare minimum set of fields and table2 contains the rest, will speed up the query quite a bit because the disk I/O demands are lower.
Lots of other things to look at here.
Also note that when you send in adhoc SQL statements that don't have parameters the engine has to create a plan each time you execute it. In this case it's not a big deal but keep in mind that each plan will be cached in memory thus pushing out any data that might have been cached.
Lastly, you can always increase the CommandTimeout property in this case to get past the timeout issues.
How much time does it take now, and what kind of improvement are you looking to get or hoping to get?
If I understand your question correctly, in your client code you have a comma-delimited string of (string) GUIDs. These GUIDS are usable by the client only if they don't already exist in TableA. Could you invoke a SP which creates a temporary table on the server containing the potentially usable GUIDS, and then do this:
select guid from #myTempTable as temp
where not exists
(
select uid from TABLEA where uid = temp.guid
)
You could pass your string of GUIDs to the SP; it would populate the temp table using your function and then return an ADO.NET DataTable to the client. This should be very easy to test before you even bother to write the SP.
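The client side of that could look roughly like this; the stored procedure name dbo.GetNewKeys and its parameter are inventions for the sketch, not anything defined in the question:
using System.Data;
using System.Data.SqlClient;

DataTable GetNewGuids(string commaSeparatedGuids, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.GetNewKeys", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@keyList", commaSeparatedGuids);

        var table = new DataTable();
        using (var adapter = new SqlDataAdapter(cmd))
            adapter.Fill(table);   // Fill opens and closes the connection itself
        return table;              // one row per GUID that is not yet in Table_A
    }
}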
I am questioning what you do with this information.
If you insert the keys into this table afterwards, you could simply try to insert them in the first place - that's much faster and more robust in a multi-user environment than querying first and inserting later:
create procedure TryToInsert @GUID uniqueidentifier, @Name varchar(n) as
begin try
    insert into Table_A (UID, Name)
    values (@GUID, @Name);
    return 0;
end try
begin catch
    return 1;
end catch;
In all cases you can split the KeyList at the client to get faster results - and you could query the keys that are not valid:
select UID
from Table_A
where UID in ('new guid','new guid',...);
If the GUIDs are random you should use newsequentialid() with your clustered primary key:
create table Table_A (
UID uniqueidentifier default newsequentialid() primary key,
Name varchar(n) not null
);
With this you can insert and query your newly inserted data in one step:
insert into Table_A (Name)
output inserted.*
values (@Name);
... just my two cents
In any case, are not GUIDs intrinsically engineered to be, for all intents and purposes, unique? (i.e. universally unique -- doesn't matter where generated). I wouldn't even bother to do the test beforehand; just insert your row with the GUID PK and if the insert should fail, discard the GUID. But it should not fail, unless these are not truly GUIDs.
http://en.wikipedia.org/wiki/GUID
http://msdn.microsoft.com/en-us/library/ms190215.aspx
It seems you are doing a lot of unnecessary work, but perhaps I don't grasp your application requirement.
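A hedged client-side sketch of that "just insert and discard failures" idea; 2627 is SQL Server's error number for a primary key / unique constraint violation, while the method shape and parameter handling are my own assumptions:
using System;
using System.Data.SqlClient;

bool TryInsert(Guid uid, string name, SqlConnection conn) // conn assumed open
{
    using (var cmd = new SqlCommand(
        "INSERT INTO Table_A (UID, Name) VALUES (@uid, @name)", conn))
    {
        cmd.Parameters.AddWithValue("@uid", uid);
        cmd.Parameters.AddWithValue("@name", name);
        try
        {
            cmd.ExecuteNonQuery();
            return true;           // new GUID, row inserted
        }
        catch (SqlException ex) when (ex.Number == 2627)
        {
            return false;          // GUID already existed, discard it
        }
    }
}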

Enhance performance of large slow dataloading query

I'm trying to load data from Oracle to SQL Server (sorry for not writing this before).
I have a table (actually a view which has data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them directly in the select query.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: my table has 20 columns and I call 5 different functions on at least 6-7 columns.
And some functions compare the parameters passed with an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select of records is too large for a DataSet and I get an OutOfMemoryException.
My function does selects and then performs logic, for example:
Function(c_x2, eid)
    Select col1
      into p_x1
      from tableP
     where eid = eid;
    IF (p_x1 IS NULL) THEN
        ret_var := 'INITIAL';
    ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
        ret_var := 'RL';
        INSERT INTO Audit
            (old_val, new_val, audit_event, id, pname)
        VALUES
            (p_x1, c_x2, 'RL', eid, 'PackageProcName');
    ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
        ret_var := 'GL';
        INSERT INTO Audit
            (old_val, new_val, audit_event, id, pname)
        VALUES
            (p_x1, c_x2, 'GL', eid, 'PackgProcName');
    END IF;
    RETURN ret_var;
I'm getting each row, performing the logic in C#, and then inserting.
If possible INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
this will run significantly faster than a single query where you then loop over the result set and have an INSERT for each row.
EDIT as for the OP question edit:
you should be able to replace the function call with plain SQL in your query. Mimic the "INITIAL" case using a LEFT JOIN to tableP, and the "RL" or "GL" values can be calculated using CASE.
EDIT based on OP's recent comments:
Since you are loading data from Oracle into SQL Server, this is what I would do: most people who could help have moved on and will not read this question again, so open a new question where you say: 1) you need to load data from Oracle (version) to SQL Server (version); 2) currently you are loading it with one query, processing each row in C# and inserting it into SQL Server, and it is slow; plus all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL the way it does in C# etc. While the approach makes maintenance easier, performance suffers because the sub-selects will execute for every row returned.
A better approach would be to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of a real one) in the SELECT:
SELECT x.id,
       x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
I prefer that to having to incorporate the function logic into the queries, for the sake of maintenance, but sometimes it can't be helped.
Personally I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does it take the view to execute without any of the function calls? Try running the command:
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description you are making approximately 5 million function calls. They had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are defined using the DETERMINISTIC keyword, Oracle has a chance of optimizing away some of the calls.
Is there a way of reducing the number of function calls? The functions are being called once the view has been evaluated and the million rows of data are available. But are all the input values from the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?
select
    f.dim_id,
    d.dim_col_1,
    long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)

select
    f.dim_id,
    d.dim_col_1,
    d.dim_col_2
from large_fact_table f
join (
    select
        dim_id,
        dim_col_1,
        long_slow_function(dim_col_2) as dim_col_2
    from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally the second query should run quicker, as it calls the function fewer times.
The performance issue could be in any of these places and until you investigate the issue, it would be difficult to know where to direct your tuning efforts.
A couple of tips:
Don't load all records into RAM but process them one by one (see the streaming sketch after this list).
Try to run as many functions on the client as possible. Databases are really slow to execute user defined functions.
If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way so you can read single records from both connections and perform whatever you need on them.
If your functions always return the same result for the same input, use a computed column or a materialized view. The database will run the function once and save it in a table somewhere. That will make INSERT slow but SELECT quick.
Create a sorted index on your table.
Introduction to SQL Server Indexes; other RDBMSs are similar.
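As a rough illustration of the first tip combined with a bulk load (assuming the Oracle managed data provider on the source side; the connection strings, the selected columns and the destination table name are placeholders, and any client-side business logic would still have to be applied between the read and the write):
using Oracle.ManagedDataAccess.Client;   // source (assumed provider)
using System.Data.SqlClient;             // destination

void StreamOracleToSqlServer(string oracleConnectionString,
                             string sqlServerConnectionString)
{
    using (var oraConn = new OracleConnection(oracleConnectionString))
    using (var oraCmd = new OracleCommand("SELECT id, x, y, z FROM the_view", oraConn))
    {
        oraConn.Open();
        using (var reader = oraCmd.ExecuteReader())            // forward-only, row by row
        using (var bulk = new SqlBulkCopy(sqlServerConnectionString))
        {
            bulk.DestinationTableName = "dbo.TargetTable";      // placeholder name
            bulk.BatchSize = 10000;
            // per-row logic (the function calls) would need to happen here,
            // e.g. by wrapping the reader; this only shows the streaming pattern
            bulk.WriteToServer(reader);                         // never builds a DataSet
        }
    }
}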
Edit since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, in SQL always go set based. I assumed you already did that, hence my tip to start using indexing.
