I have to get information from two databases. One is Oracle and the other is DB2. In my C# program, the first step is to get the base information about my objects from the Oracle database. In the second step I want to add the information that is stored in DB2. The table in DB2 has a composite primary key, and I'm not sure what the best way to query it is, or whether there is an alternative that I don't see at the moment.
For example: COLUMN1 and COLUMN2 are the composite primary key.
Variant 1:
SELECT *
FROM (SELECT COLUMN1, COLUMN2, COLUMN3, ..., COLUMN1||'_'||COLUMN2 AS ID
      FROM TABLE1) AS TEMP
WHERE ID='2011_123456'
   OR ID='2011_987654'
Here I think the disadvantage is that the string concatenation is built for every row in the table, and the execution speed is comparatively slow because the primary key columns are indexed but the new ID column is not.
Variant 2:
SELECT COLUMN1, COLUMN2, COLUMN3, ..., COLUMN1||'_'||COLUMN2 AS ID
FROM TABLE1
WHERE (COLUMN1='2011' AND COLUMN2='123456')
OR (COLUMN1='2011' AND COLUMN2='987654')
This one is really fast, but at some point I get the exception SQL0954C (not enough storage is available in the application heap to process the statement).
Variant 3:
SELECT COLUMN1, COLUMN2, COLUMN3, ..., COLUMN1||'_'||COLUMN2 AS ID
FROM TABLE1
WHERE COLUMN1 IN ('2011')
AND COLUMN2 IN ('123456','987654')
This one is also slow in comparison to variant 2.
Some more numbers: TABLE1 currently has approx. 600k rows.
I tried the variants and got the following execution times:
For 100 requested objects:
Variant 1: 3900ms
Variant 2: 218ms
For 400 requested objects:
Variant 1: 10983ms
Variant 2: 266ms
For 500 requested objects:
Variant 1: 12796ms
Variant 2: exception SQL0954C
Variant 3: 7061ms
Looking only at the times, I would prefer variant 2, but there is the problem with the exception.
The databases are not under my control and I only have SELECT rights. What do you think is best for this use case? Are there any other possibilities that I don't see?
Regards,
pkoeppe
Could you do a modification of variant 2 that
defines a cursor,
bulk collects 100 rows (for example) into a PL/SQL table,
does your processing,
fetches the next 100 rows?
For example see http://oracletoday.blogspot.com/2005/11/bulk-collect_15.html
I had a problem very similar to this with Oracle and Informix.
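For reference, the pattern from that link looks roughly like this (an Oracle PL/SQL sketch only; TABLE1 and the column names are just the placeholders from the question, and the WHERE clause is omitted):
DECLARE
    CURSOR c IS
        SELECT COLUMN1, COLUMN2, COLUMN3
        FROM TABLE1;  -- plus your WHERE clause
    TYPE t_rows IS TABLE OF c%ROWTYPE;
    l_rows t_rows;
BEGIN
    OPEN c;
    LOOP
        FETCH c BULK COLLECT INTO l_rows LIMIT 100;  -- 100 rows per round trip
        -- do your processing on l_rows here
        EXIT WHEN c%NOTFOUND;
    END LOOP;
    CLOSE c;
END;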
SQL0954C can usually be resolved by tweaking the database configuration, specifically the application heap size (APPLHEAPSZ). Have you explored that avenue yet?
For variant 3, change
SELECT COLUMN1, COLUMN2, COLUMN3, ..., COLUMN1||'_'||COLUMN2 AS ID
FROM TABLE1
WHERE COLUMN1 IN ('2011')
AND COLUMN2 IN ('123456','987654')
To:
SELECT COLUMN1, COLUMN2, COLUMN3, ..., COLUMN1||'_'||COLUMN2 AS ID
FROM TABLE1
WHERE COLUMN1 ='2011'
AND COLUMN2 IN ('123456','987654')
If you're only searching for one value of COLUMN1, there's no reason to use IN.
Both variants 2 and 3 are sane; variant 1 is not.
Since the computed column ID in variant 1 is not in any index, the DB will be forced to do at least a full index scan. In variants 2 and 3 the DB can use indexes on both COLUMN1 and COLUMN2 to filter the result.
To find out whether 2 or 3 is best, you need to study the execution plans for those queries.
One more note about indexes: relevant indexes will matter much more than the difference between 2 and 3. Even if you only have SELECT rights, you could suggest a composite index on (COLUMN1, COLUMN2) to the DBA if no such index already exists.
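For what it's worth, such a suggestion to the DBA would be roughly the following (the index name is made up, and a composite primary key on (COLUMN1, COLUMN2) should already provide an equivalent index):
CREATE INDEX TABLE1_COL1_COL2_IX ON TABLE1 (COLUMN1, COLUMN2)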
Edit
Another common approach, when you have many values in WHERE COL IN (...), is to create a temp table (if you have permission) with all the values and join against that temp table instead. Sometimes you also need to create an index on the temp table to make it perform well.
In some DBMSs you can use table-valued parameters instead of temp tables, but I cannot find anything like that for DB2.
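A rough sketch of the temp-table idea in DB2 syntax, assuming a user temporary tablespace exists and the account is allowed to declare temporary tables (which may not be the case with SELECT-only rights); the temp table name, key data types and values are illustrative:
-- Session-scoped table holding the requested key pairs
DECLARE GLOBAL TEMPORARY TABLE SESSION.REQUESTED_KEYS (
    COLUMN1 VARCHAR(4),
    COLUMN2 VARCHAR(6)
) ON COMMIT PRESERVE ROWS NOT LOGGED;

-- One row per requested object (inserted in batches from the C# side)
INSERT INTO SESSION.REQUESTED_KEYS (COLUMN1, COLUMN2)
VALUES ('2011', '123456'), ('2011', '987654');

-- Join instead of a long OR / IN list
SELECT T.COLUMN1, T.COLUMN2, T.COLUMN3, T.COLUMN1||'_'||T.COLUMN2 AS ID
FROM TABLE1 T
JOIN SESSION.REQUESTED_KEYS K
    ON T.COLUMN1 = K.COLUMN1 AND T.COLUMN2 = K.COLUMN2;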
Related
I've searched through numerous threads to try to find an answer to this, but every answer I've found suggests using a unique constraint on a single column or on multiple columns.
My problem is, I'm writing an application in C# with a SQL Server back end. One of the features is to allow a user to import a .CSV file into the database after a little pre-processing. I need to find the quickest method to prevent the user from importing the same data more than once. The data will look something like this:
ID -- will be auto-generated in SQL Server (PK)
Date Time(datetime)
Machine(nchar)
...
...
...
Name(nchar)
Age(int)
I want to allow any number of the columns to have duplicate values, as long as the entire record is not a duplicate.
I was thinking of creating another column in the database, obtained by hashing all of the columns together, and making it unique, but I wasn't sure if that was the most efficient method, or if the resulting hash would be guaranteed to be unique. The CSV files will only be around 60 MB each, but there will be tens of thousands of them.
Any help would be appreciated.
Thanks
You should be able to resolve this by creating a unique constraint which includes all the columns.
create table #a (col1 varchar(10), col2 varchar(10))
ALTER TABLE #a
ADD CONSTRAINT UQ UNIQUE NONCLUSTERED
(col1, col2)
-- Works, duplicate entries in columns
insert into #a (col1, col2)
values ('a', 'b')
,('a', 'c')
,('b', 'c')
-- Fails, full duplicate record:
insert into #a (col1, col2)
values ('a1', 'b1')
,('a1', 'b1')
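If duplicate rows in an incoming file should be silently skipped rather than failing the whole INSERT, a unique index created with IGNORE_DUP_KEY is one possible alternative to the constraint (SQL Server 2005+ syntax; use it instead of, not in addition to, the UQ constraint above):
-- Duplicate rows in an INSERT are then discarded with a warning instead of raising an error
CREATE UNIQUE NONCLUSTERED INDEX IX_a_nodups ON #a (col1, col2)
    WITH (IGNORE_DUP_KEY = ON)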
The code below can be used to ensure that you don't duplicate the combination of [Date Time], Machine, [Name] and Age when you insert the data.
It's important that, at the time the code runs, each row of the incoming dataset has a unique ID on it. The code simply skips any rows whose ID is picked up by the subquery, i.e. rows where all four other values already exist in the destination table.
INSERT INTO MAIN_TABLE ([Date Time],Machine,[Name],Age)
SELECT [Date Time],Machine,[Name],Age
FROM IMPORT_TABLE WHERE ID NOT IN
(
SELECT I.ID FROM IMPORT_TABLE I INNER JOIN MAIN_TABLE M
ON I.[Date Time]=M.[Date Time]
AND I.Machine=M.Machine
AND I.[Name]=M.[Name]
AND I.Age=M.Age
)
I have this situation:
I have two tables:
Table A
Staging_Table A
Both tables contain those common columns:
Code
Description
In Table A I also have a column Version which identifies the latest version of the corresponding Code.
My problem is how to update the column Version once a new Description is stored for the same Code (I fill up the staging table with a bulk insert from C#; I have a flow of data that changes once a week).
I need to insert the new row into Table A, containing the same Code but a different Description, without deleting the old one.
I insert the rows from the staging table into Table A with a MINUS operation, and I have this mechanism inside a stored procedure because I also fill the staging table with a bulk insert from C#.
The result I need to obtain is the following:
TABLE A:
Id  Code  Description   Version  End_date
--  ----  ------------  -------  -----------
1   8585  Red Car       1        26-May-2015
2   8585  Red Car RRRR  2        01-Jun-2015
How can I do that?
I hope the issue is clear
If I understand correctly, the process works like this:
1. Data is loaded into the staging table Staging_table_A.
2. Data is inserted from Staging_table_A into Table_A with the additional column Version.
I would do:
-- Explicit column list assumed from the example; Id/End_date are taken to be handled elsewhere
INSERT INTO Table_A (Code, Description, Version)
WITH cnt AS (SELECT Code, COUNT(*) AS c FROM Table_A GROUP BY Code)
SELECT sta.Code, sta.Description, NVL(cnt.c, 0) + 1 AS Version
FROM Staging_table_A sta LEFT OUTER JOIN cnt ON sta.Code = cnt.Code;
This is based on the assumption that the versions in Table_A contain no gaps or duplicates.
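If that assumption does not hold (or rows are ever deleted), a variation that takes the highest existing version per code instead of counting rows may be safer; this is only a sketch using the column names from the example:
INSERT INTO Table_A (Code, Description, Version)
SELECT sta.Code, sta.Description,
       NVL((SELECT MAX(a.Version) FROM Table_A a WHERE a.Code = sta.Code), 0) + 1
FROM Staging_table_A sta;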
TABLE1 (ID, PID, PNO) contains the start and end points, e.g. (A, B), with primary key (ID, PID) and a foreign key on (ID).
TABLE2 (ID, PNO) contains the middle point information in order (a1, a2 ... bn-1, bn), with primary key (ID).
I am trying to join them in such a way that I can get [A, a1, a2 ... bn-1, bn, B].
I fetched data using
SELECT PNO FROM TABLE2 WHERE ID= 123 UNION SELECT PNO FROM TABLE1 WHERE ID= 123
and tried it in C# code by fetching all the data and then adding conditions and reordering them. This approach is too lengthy.
Apart from this, is there a way to join these two tables to get the result set?
Note: these tables are related to each other by the common field ID, and the PID in TABLE1 has two distinct values, 1 for start and 2 for end. Based on this, the PNO with PID 1 should come first and the PNO with PID 2 should come last.
"tried it in C# code by fetching all data and then adding condition's and reordering them".
This is usually a very bad idea, especially if you have a lot of network activity. SQL is very good at manipulating data: in fact it is optimized for that task. So, try passing the conditions to the database as a WHERE clause, using an ORDER BY to sort the final result set and returning just the rows you need. This could have a big impact on the total elapsed time if there is a large difference between the number of rows in the raw database set and the final C# set.
Other things: if you're still finding this too slow, you have a standard tuning problem. You haven't provided any of the hard information necessary to give a definitive solution, so here are some guesses.
You want all the records for an ID, so there isn't a more efficient way of joining the two intermediate result sets to get the final set. But if the two sets are exclusive - that is, if the endpoints in TABLE1 are not included in the points from TABLE2 (your question isn't completely clear on this) - a UNION ALL would be more efficient:
SELECT PNO FROM TABLE2 WHERE ID= 123
UNION ALL
SELECT PNO FROM TABLE1 WHERE ID= 123
That's because UNION performs an additional operation to produce a distinct set of values. Skipping that step will save you some time.
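To get the start point first and the end point last without reordering in C#, an artificial sort key can be added to each branch, roughly like this (assuming, as in the note above, that PID = 1 marks the start and PID = 2 the end; if the middle points have their own ordering column, add it as a secondary sort key):
SELECT PNO
FROM (
    SELECT PNO, 1 AS GRP FROM TABLE1 WHERE ID = 123 AND PID = 1  -- start point A
    UNION ALL
    SELECT PNO, 2 AS GRP FROM TABLE2 WHERE ID = 123              -- middle points
    UNION ALL
    SELECT PNO, 3 AS GRP FROM TABLE1 WHERE ID = 123 AND PID = 2  -- end point B
) T
ORDER BY GRP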
An index on table2 ( ID, PNO) could speed up retrieval times by avoiding the need to touch the table at all. Whether it's worth the overhead of maintaining an index depends on how often you want to run this query and how you load Table2. It also depends on what further filters you apply if you act on my opening paragraph.
I have tried, but I get an error:
The message "Subqueries are not allowed in this context" comes up.
I have two tables, Product and Category, and I want to look up the CategoryID based on the category name.
The query is
Insert into Product(Product_Name,Product_Model,Price,Category_id)
values(' P1','M1' , 100, (select CategoryID from Category where Category_Name='Laptop'))
Please tell me a solution with code.
(you didn't clearly specify what database you're using - this is for SQL Server but should apply to others as well, with some minor differences)
The INSERT command comes in two flavors:
(1) either you have all your values available, as literals or SQL Server variables - in that case, you can use the INSERT .. VALUES() approach:
INSERT INTO dbo.YourTable(Col1, Col2, ...., ColN)
VALUES(Value1, Value2, @Variable3, @Variable4, ...., ValueN)
Note: I would recommend always explicitly specifying the list of columns to insert data into - that way, you won't have any nasty surprises if your table suddenly has an extra column, or if your table has an IDENTITY or computed column. Yes - it's a tiny bit more work - once - but then you have your INSERT statement as solid as it can be and you won't have to constantly fiddle around with it if your table changes.
(2) if you don't have all your values as literals and/or variables, but instead you want to rely on another table, multiple tables, or views, to provide the values, then you can use the INSERT ... SELECT ... approach:
INSERT INTO dbo.YourTable(Col1, Col2, ...., ColN)
SELECT
SourceColumn1, SourceColumn2, @Variable3, @Variable4, ...., SourceColumnN
FROM
dbo.YourProvidingTableOrView
Here, you must define exactly as many items in the SELECT as your INSERT expects - and these can be columns from the table(s) (or view(s)), or they can be literals or variables. Again: explicitly provide the list of columns to insert into - see above.
You can use one or the other - but you cannot mix the two - you cannot use VALUES(...) and then have a SELECT query in the middle of your list of values - pick one of the two - stick with it.
So in your concrete case, you'll need to use:
INSERT INTO dbo.Product(Product_Name, Product_Model, Price, Category_id)
SELECT
' P1', 'M1', 100, CategoryID
FROM
dbo.Category
WHERE
Category_Name = 'Laptop'
Try it like this:
Insert into Product
(
    Product_Name,
    Product_Model,
    Price,
    Category_id
)
Select
    'P1',
    'M1',
    100,
    CategoryID
From
    Category
where Category_Name = 'Laptop'
Try this:
DECLARE @CategoryID BIGINT = (select top 1 CategoryID from Category where Category_Name='Laptop')
Insert into Product(Product_Name,Product_Model,Price,Category_id)
values(' P1','M1' , 100, @CategoryID)
I am receiving a large list of current account numbers daily, and storing them in a database. My task is to find added and released accounts from each file. Right now, I have 4 SQL tables, (AccountsCurrent, AccountsNew, AccountsAdded, AccountsRemoved). When I receive a file, I am adding it entirely to AccountsNew. Then running the below queries to find which we added and removed.
INSERT AccountsAdded(AccountNum, Name)
SELECT AccountNum, Name FROM AccountsNew
WHERE AccountNum NOT IN (SELECT AccountNum FROM AccountsCurrent)

INSERT AccountsRemoved(AccountNum, Name)
SELECT AccountNum, Name FROM AccountsCurrent
WHERE AccountNum NOT IN (SELECT AccountNum FROM AccountsNew)

TRUNCATE TABLE AccountsCurrent

INSERT AccountsCurrent(AccountNum, Name)
SELECT AccountNum, Name FROM AccountsNew

TRUNCATE TABLE AccountsNew
Right now, I am differencing about 250,000 accounts, but this is going to keep growing. Is this the best method, or do you have any other ideas?
EDIT:
This is an MSSQL 2000 database. I'm using C# to process the file.
The only data I am focused on is the accounts that were added and removed between the last and current files. AccountsCurrent is only used to determine which accounts were added or removed.
To be honest, I think I'd follow something like your approach. One thing you could do is remove the truncate, rename "new" to "current", and re-create "new".
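On SQL Server that name rotation could look something like this (sketch only; sp_rename needs DDL rights on the tables, and the third rename simply reuses the old table instead of re-creating it):
EXEC sp_rename 'AccountsCurrent', 'AccountsOld'  -- keep the previous run aside
EXEC sp_rename 'AccountsNew', 'AccountsCurrent'  -- today's file becomes current
EXEC sp_rename 'AccountsOld', 'AccountsNew'      -- reuse the old table for the next load
TRUNCATE TABLE AccountsNew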
Sounds like a history/audit process that might be better done using triggers. Have a separate history table that captures changes (e.g., timestamp, operation, who performed the change, etc.)
New and deleted accounts are easy to understand. "Current" accounts implies that there's an intermediate state between being new and deleted. I don't see any difference between "new" and "added".
I wouldn't have four tables. I'd have a STATUS table that would have the different possible states, and ACCOUNTS or the HISTORY table would have a foreign key to it.
Using IN clauses on long lists can be slow.
If the tables are indexed, using a LEFT JOIN can prove to be faster...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
LEFT JOIN
[table2]
ON [join condition]
WHERE
[table2].[id] IS NULL
This assumes 1:1 relationships and not 1:many. If you have 1:many you can do any of...
1. SELECT DISTINCT
2. Use a GROUP BY clause
3. Use a different query, see below...
INSERT INTO [table] (
[fields]
)
SELECT
[fields]
FROM
[table1]
WHERE
NOT EXISTS (SELECT * FROM [table2] WHERE [condition to match tables 1 and 2])
-- This is quick provided that all fields used to match the two tables are
-- indexed in both tables. It should then be much faster than the IN clause.
You could also subtract the intersection to get the differences in one table.
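For example, on SQL Server 2005 or later (the question mentions 2000, so this only applies after an upgrade) that set difference can be written directly with EXCEPT; note that it compares the whole (AccountNum, Name) pair rather than the account number alone:
-- Accounts present in the new file but not in the current set
INSERT AccountsAdded (AccountNum, Name)
SELECT AccountNum, Name FROM AccountsNew
EXCEPT
SELECT AccountNum, Name FROM AccountsCurrent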
If the initial file is ordered in a sensible and consistent way (a big IF!), it would run considerably faster as a C# program that logically compared the files.