I am new to ASP.NET and to the concept of UDTs (user-defined table types) in it. I used to work in PHP, so I am having difficulty understanding the UDT concept.
This is the stored procedure written to insert data from input forms into the database (SQL Server).
The code works fine and was written by senior developers at my company.
CREATE Procedure [dbo].[Save_Supplier]
@Supplier_UDT Supplier_UDT Readonly,
@UserName varchar(80)
AS
Begin
-------------Block 1------
Declare @TP Table(ID int,Suppliercode varchar(80),Suppliername varchar(80),GSTVATNumber int,Description varchar(80),Productlist varchar(80),Bankdetails varchar(80),
pymenttermdescription varchar(80),Currency int,Pendingpayement Varchar(80),pendingorders int,Active bit )
-------------Block 2------
Insert into @TP(ID ,Suppliercode ,Suppliername,GSTVATNumber,Description ,Productlist,Bankdetails,pymenttermdescription,Currency,Pendingpayement ,pendingorders ,Active)
select ID ,Suppliercode ,Suppliername,GSTVATNumber,Description ,Productlist,Bankdetails,pymenttermdescription,Currency,Pendingpayement ,pendingorders ,Active from @Supplier_UDT
-------------Block 3------
Update Supplier
set
Suppliercode=a.Suppliercode ,
Suppliername=a.Suppliername
,GSTVATNumber=a.GSTVATNumber
,Description =a.Description
,Productlist=a.Productlist
,Bankdetails=a.Bankdetails
,pymenttermdescription=a.pymenttermdescription
,Currency=a.Currency
,Pendingpayement=a.Pendingpayement
,pendingorders=a.pendingorders
,Active=a.Active
from @TP a inner join Supplier
on a.ID=Supplier.ID
-------------Block 4------
Insert into Supplier(Suppliercode ,Suppliername,GSTVATNumber,Description ,Productlist,Bankdetails,pymenttermdescription,Currency,Pendingpayement ,pendingorders ,Active)
select Suppliercode ,Suppliername,GSTVATNumber,Description ,Productlist,Bankdetails,pymenttermdescription,Currency,Pendingpayement ,pendingorders ,Active
from @TP where ID not in (select ID from Supplier) and Suppliercode!=''
From my understanding, Block 1 simply declares the structure of the temporary table variable.
In Block 2, the user-supplied input data is stored in that table variable.
I am having difficulty understanding Block 3 and Block 4.
I don't understand what the UPDATE query is doing before the INSERT query.
What is the purpose of Block 3 and Block 4?
(The code runs fine, without errors.)
[1] The first thing I would point out about this source code isn't the use of another table variable (@TP) but the missing transaction management and missing error handling. At least the last two statements (the UPDATE and the INSERT) run the risk of raising errors at statement level, for example.
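As a hedged sketch (not the original procedure's code), the kind of transaction management and error handling point [1] refers to would look roughly like wrapping Blocks 3 and 4 this way:

BEGIN TRY
    BEGIN TRANSACTION;

    -- ... the UPDATE from Block 3 ...
    -- ... the INSERT from Block 4 ...

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW; -- re-raise the original error to the caller (SQL Server 2012+); use RAISERROR on older versions
END CATCH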
[2] I don't see any reason to use one more table variable (@TP); the first one already exists as the parameter @Supplier_UDT Supplier_UDT. It will create/increase tempdb contention and, from a developer's point of view, it creates another dependency (for example, if we change the data type of one of those columns in the dbo.Supplier table, we will also have to update this stored procedure and the definition of the @TP column).
[3] Note: both table variables (@TP and @Supplier_UDT) have the same columns, or at least a common set of columns: ID, Suppliercode, Suppliername, GSTVATNumber, Description, Productlist, Bankdetails, pymenttermdescription, Currency, Pendingpayement, pendingorders, Active. It isn't clear whether the data types, NULL-ability, and constraints are the same.
[4] Block 3 & 4 seems to be an implementation of UPSERT pattern but for many rows (note: most of examples for UPSERT are using just one row). This means that for those suppliers that already exist in dbo.Supplier table (SQL schema should be mandatory) UPDATE statement will change/update following columns SupplierCode, SupplierName, ... with the latest values and new suppliers are INSERTed into dbo.Supplier table.
As Dan Guzman already mentioned in his comment (+1), a single MERGE statement could be used instead of these two statements (UPDATE and INSERT):
MERGE dbo.Supplier WITH(HOLDLOCK) AS dst -- Destination table
USING @Supplier_UDT AS src ON dst.ID = src.ID -- Source table
WHEN MATCHED THEN
UPDATE
SET
Suppliercode = src.Suppliercode,
Suppliername = src.Suppliername,
GSTVATNumber = src.GSTVATNumber,
Description = src.Description,
Productlist = src.Productlist,
Bankdetails = src.Bankdetails,
pymenttermdescription = src.pymenttermdescription,
Currency = src.Currency,
Pendingpayement = src.Pendingpayement,
pendingorders = src.pendingorders,
Active = src.Active
WHEN NOT MATCHED AND src.Suppliercode != '' THEN -- the search condition here can reference only source columns, matching the original INSERT's filter
INSERT (Suppliercode ,Suppliername,GSTVATNumber,Description ,Productlist,Bankdetails,pymenttermdescription,Currency,Pendingpayement ,pendingorders ,Active)
VALUES (src.Suppliercode, src.Suppliername, src.GSTVATNumber, src.Description, src.Productlist, src.Bankdetails, src.pymenttermdescription, src.Currency, src.Pendingpayement, src.pendingorders, src.Active);
[5] Why would I use the HOLDLOCK table hint? See Dan Guzman's blog: http://weblogs.sqlteam.com/dang/archive/2009/01/31/UPSERT-Race-Condition-With-MERGE.aspx
[6] Also, there are some bugs in the MERGE statement, described here:
https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/
Some of them are more or less serious.
[7] If it ain't broke, don't fix it
Related
How am I supposed to get the IDENTITY of an inserted row?
I know about @@IDENTITY and IDENT_CURRENT and SCOPE_IDENTITY, but don't understand the implications or impacts attached to each.
Can someone please explain the differences and when I would be using each?
@@IDENTITY returns the last identity value generated for any table in the current session, across all scopes. You need to be careful here, since it's across scopes. You could get a value from a trigger, instead of your current statement.
SCOPE_IDENTITY() returns the last identity value generated for any table in the current session and the current scope. Generally what you want to use.
IDENT_CURRENT('tableName') returns the last identity value generated for a specific table in any session and any scope. This lets you specify which table you want the value from, in case the two above aren't quite what you need (very rare). Also, as @Guy Starbuck mentioned, "You could use this if you want to get the current IDENTITY value for a table that you have not inserted a record into."
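To make the scope difference concrete, here is a minimal sketch (the table and trigger names are made up) showing how the three functions diverge once an insert trigger is involved:

CREATE TABLE MainTable  (Id int IDENTITY(1,1) PRIMARY KEY, Val varchar(10));
CREATE TABLE AuditTable (AuditId int IDENTITY(1000,1) PRIMARY KEY, Note varchar(10));
GO
CREATE TRIGGER trg_MainTable_Insert ON MainTable AFTER INSERT AS
    INSERT INTO AuditTable (Note) VALUES ('logged');
GO
INSERT INTO MainTable (Val) VALUES ('abc');

SELECT SCOPE_IDENTITY()            AS [SCOPE_IDENTITY()],  -- 1: identity from MainTable, this scope
       @@IDENTITY                  AS [@@IDENTITY],        -- 1000: identity from the trigger's insert into AuditTable
       IDENT_CURRENT('AuditTable') AS [IDENT_CURRENT];     -- 1000: last value generated for that table, any session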
The OUTPUT clause of the INSERT statement will let you access every row that was inserted via that statement. Since it's scoped to the specific statement, it's more straightforward than the other functions above. However, it's a little more verbose (you'll need to insert into a table variable/temp table and then query that) and it gives results even in an error scenario where the statement is rolled back. That said, if your query uses a parallel execution plan, this is the only guaranteed method for getting the identity (short of turning off parallelism). However, it is executed before triggers and cannot be used to return trigger-generated values.
I believe the safest and most accurate method of retrieving the inserted id would be using the output clause.
for example (taken from the following MSDN article)
USE AdventureWorks2008R2;
GO
DECLARE @MyTableVar table( NewScrapReasonID smallint,
Name varchar(50),
ModifiedDate datetime);
INSERT Production.ScrapReason
OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
INTO @MyTableVar
VALUES (N'Operator error', GETDATE());
--Display the result set of the table variable.
SELECT NewScrapReasonID, Name, ModifiedDate FROM @MyTableVar;
--Display the result set of the table.
SELECT ScrapReasonID, Name, ModifiedDate
FROM Production.ScrapReason;
GO
I'm saying the same thing as the other guys, so everyone's correct, I'm just trying to make it more clear.
@@IDENTITY returns the id of the last thing that was inserted by your client's connection to the database.
Most of the time this works fine, but sometimes a trigger will go and insert a new row that you don't know about, and you'll get the ID from this new row instead of the one you want.
SCOPE_IDENTITY() solves this problem. It returns the id of the last thing that you inserted in the SQL code you sent to the database. If triggers go and create extra rows, they won't cause the wrong value to get returned. Hooray
IDENT_CURRENT returns the last ID that was inserted by anyone. If some other app happens to insert another row at an unfortunate time, you'll get the ID of that row instead of yours.
If you want to play it safe, always use SCOPE_IDENTITY(). If you stick with @@IDENTITY and someone decides to add a trigger later on, all your code will break.
The best (read: safest) way to get the identity of a newly-inserted row is by using the output clause:
create table TableWithIdentity
( IdentityColumnName int identity(1, 1) not null primary key,
... )
-- type of this table's column must match the type of the
-- identity column of the table you'll be inserting into
declare @IdentityOutput table ( ID int )
insert TableWithIdentity
( ... )
output inserted.IdentityColumnName into @IdentityOutput
values
( ... )
declare @IdentityValue int
select @IdentityValue = (select ID from @IdentityOutput)
Add
SELECT CAST(scope_identity() AS int);
to the end of your insert sql statement, then
NewId = command.ExecuteScalar()
will retrieve it.
From MSDN
@@IDENTITY, SCOPE_IDENTITY, and IDENT_CURRENT are similar functions in that they return the last value inserted into the IDENTITY column of a table.
@@IDENTITY and SCOPE_IDENTITY will return the last identity value generated in any table in the current session. However, SCOPE_IDENTITY returns the value only within the current scope; @@IDENTITY is not limited to a specific scope.
IDENT_CURRENT is not limited by scope and session; it is limited to a specified table. IDENT_CURRENT returns the identity value generated for a specific table in any session and any scope. For more information, see IDENT_CURRENT.
IDENT_CURRENT is a function which takes a table as an argument.
@@IDENTITY may return a confusing result when you have a trigger on the table.
SCOPE_IDENTITY is your hero most of the time.
When you use Entity Framework, it internally uses the OUTPUT technique to return the newly inserted ID value
DECLARE @generated_keys table([Id] uniqueidentifier)
INSERT INTO TurboEncabulators(StatorSlots)
OUTPUT inserted.TurboEncabulatorID INTO @generated_keys
VALUES('Malleable logarithmic casing');
SELECT t.[TurboEncabulatorID]
FROM @generated_keys AS g
JOIN dbo.TurboEncabulators AS t
ON g.Id = t.TurboEncabulatorID
WHERE @@ROWCOUNT > 0
The output results are stored in a temporary table variable, joined back to the table, and return the row value out of the table.
Note: I have no idea why EF would inner join the ephemeral table back to the real table (under what circumstances would the two not match).
But that's what EF does.
This technique (OUTPUT) is available on SQL Server 2005 and newer.
Edit - The reason for the join
The reason that Entity Framework joins back to the original table, rather than simply use the OUTPUT values is because EF also uses this technique to get the rowversion of a newly inserted row.
You can use optimistic concurrency in your Entity Framework models by using the Timestamp attribute:
public class TurboEncabulator
{
    public String StatorSlots { get; set; }

    [Timestamp]
    public byte[] RowVersion { get; set; }
}
When you do this, Entity Framework will need the rowversion of the newly inserted row:
DECLARE @generated_keys table([Id] uniqueidentifier)
INSERT INTO TurboEncabulators(StatorSlots)
OUTPUT inserted.TurboEncabulatorID INTO @generated_keys
VALUES('Malleable logarithmic casing');
SELECT t.[TurboEncabulatorID], t.[RowVersion]
FROM @generated_keys AS g
JOIN dbo.TurboEncabulators AS t
ON g.Id = t.TurboEncabulatorID
WHERE @@ROWCOUNT > 0
And in order to retrieve this Timestamp you cannot use an OUTPUT clause.
That's because if there's a trigger on the table, any Timestamp you OUTPUT will be wrong:
Initial insert. Timestamp: 1
OUTPUT clause outputs timestamp: 1
trigger modifies row. Timestamp: 2
The returned timestamp will never be correct if you have a trigger on the table. So you must use a separate SELECT.
And even if you were willing to suffer the incorrect rowversion, the other reason to perform a separate SELECT is that you cannot OUTPUT a rowversion into a table variable:
DECLARE @generated_keys table([Id] uniqueidentifier, [Rowversion] timestamp)
INSERT INTO TurboEncabulators(StatorSlots)
OUTPUT inserted.TurboEncabulatorID, inserted.Rowversion INTO @generated_keys
VALUES('Malleable logarithmic casing');
The third reason to do it is for symmetry. When performing an UPDATE on a table with a trigger, you cannot use an OUTPUT clause. Trying to do an UPDATE with OUTPUT is not supported and will give an error:
Cannot use UPDATE with OUTPUT clause when a trigger is on the table
The only way to do it is with a follow-up SELECT statement:
UPDATE TurboEncabulators
SET StatorSlots = 'Lotus-O deltoid type'
WHERE ((TurboEncabulatorID = 1) AND (RowVersion = 792))
SELECT RowVersion
FROM TurboEncabulators
WHERE @@ROWCOUNT > 0 AND TurboEncabulatorID = 1
I can't speak to other versions of SQL Server, but in 2012, outputting directly works just fine. You don't need to bother with a temporary table.
INSERT INTO MyTable
OUTPUT INSERTED.ID
VALUES (...)
By the way, this technique also works when inserting multiple rows.
INSERT INTO MyTable
OUTPUT INSERTED.ID
VALUES
(...),
(...),
(...)
Output
ID
2
3
4
@@IDENTITY is the last identity inserted using the current SQL Connection. This is a good value to return from an insert stored procedure, where you just need the identity inserted for your new record, and don't care if more rows were added afterward.
SCOPE_IDENTITY is the last identity inserted using the current SQL Connection, and in the current scope -- that is, if there was a second IDENTITY inserted based on a trigger after your insert, it would not be reflected in SCOPE_IDENTITY, only the insert you performed. Frankly, I have never had a reason to use this.
IDENT_CURRENT(tablename) is the last identity inserted regardless of connection or scope. You could use this if you want to get the current IDENTITY value for a table that you have not inserted a record into.
ALWAYS use scope_identity(), there's NEVER a need for anything else.
One other way to guarantee the identity of the rows you insert is to specify the identity values and use the SET IDENTITY_INSERT ON and then OFF. This guarantees you know exactly what the identity values are! As long as the values are not in use then you can insert these values into the identity column.
CREATE TABLE #foo
(
fooid INT IDENTITY NOT NULL,
fooname VARCHAR(20)
)
SELECT @@Identity AS [@@Identity],
Scope_identity() AS [SCOPE_IDENTITY()],
Ident_current('#Foo') AS [IDENT_CURRENT]
SET IDENTITY_INSERT #foo ON
INSERT INTO #foo
(fooid,
fooname)
VALUES (1,
'one'),
(2,
'Two')
SET IDENTITY_INSERT #foo OFF
SELECT @@Identity AS [@@Identity],
Scope_identity() AS [SCOPE_IDENTITY()],
Ident_current('#Foo') AS [IDENT_CURRENT]
INSERT INTO #foo
(fooname)
VALUES ('Three')
SELECT @@Identity AS [@@Identity],
Scope_identity() AS [SCOPE_IDENTITY()],
Ident_current('#Foo') AS [IDENT_CURRENT]
-- YOU CAN INSERT
SET IDENTITY_INSERT #foo ON
INSERT INTO #foo
(fooid,
fooname)
VALUES (10,
'Ten'),
(11,
'Eleven')
SET IDENTITY_INSERT #foo OFF
SELECT @@Identity AS [@@Identity],
Scope_identity() AS [SCOPE_IDENTITY()],
Ident_current('#Foo') AS [IDENT_CURRENT]
SELECT *
FROM #foo
This can be a very useful technique if you are loading data from another source or merging data from two databases etc.
Create a UUID and also insert it into a column. Then you can easily identify your row with the UUID. That's the only 100% working solution you can implement. All the other solutions are too complicated or don't work in some edge cases.
E.g.:
1) Create row
INSERT INTO table (uuid, name, street, zip)
VALUES ('2f802845-447b-4caa-8783-2086a0a8d437', 'Peter', 'Mainstreet 7', '88888');
2) Get created row
SELECT * FROM table WHERE uuid='2f802845-447b-4caa-8783-2086a0a8d437';
Even though this is an older thread, there is a newer way to do this which avoids some of the pitfalls of the IDENTITY column in older versions of SQL Server, such as gaps in the identity values after server restarts. SQL Server 2012 and later support sequences: you create your own SEQUENCE object in T-SQL, which gives you a numeric sequence whose increment you control.
Here is an example:
CREATE SEQUENCE CountBy1
START WITH 1
INCREMENT BY 1 ;
GO
Then in TSQL you would do the following to get the next sequence ID:
SELECT NEXT VALUE FOR CountBy1 AS SequenceID
GO
Here are the links to CREATE SEQUENCE and NEXT VALUE FOR
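As a small follow-up sketch (the table name is made up), the practical advantage is that with NEXT VALUE FOR you know the key before the row is inserted, so no identity function is needed afterwards:

DECLARE @NewId int = NEXT VALUE FOR CountBy1;

INSERT INTO dbo.MyOrders (OrderId, OrderName)   -- hypothetical table
VALUES (@NewId, N'First order');

SELECT @NewId AS InsertedId;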
Complete solution in SQL and ADO.NET
const string sql = "INSERT INTO [Table1] (...) OUTPUT INSERTED.Id VALUES (...)";
using var command = connection.CreateCommand();
command.CommandText = sql;
var outputIdParameter = new SqlParameter("@Id", SqlDbType.Int) { Direction = ParameterDirection.Output };
command.Parameters.Add(outputIdParameter);
await connection.OpenAsync();
var outputId= await command.ExecuteScalarAsync();
await connection.CloseAsync();
int id = Convert.ToInt32(outputId);
After your INSERT statement you need to add this (and make sure of the table name the data is being inserted into). You will get the identity of the row just affected by your insert statement:
IDENT_CURRENT('tableName')
My database contains three tables called Object_Table, Data_Table and Link_Table. The link table just contains two columns: the identity of an object record and the identity of a data record.
I want to copy the data from Data_Table where it is linked to one given object identity, and insert corresponding records into Data_Table and Link_Table for a different given object identity.
I can do this by selecting into a table variable and then looping through, doing two inserts for each iteration.
Is this the best way to do it?
Edit: I want to avoid a loop for two reasons: the first is that I'm lazy and a loop/temp table requires more code, and more code means more places to make a mistake; the second is a concern about performance.
I can copy all the data in one insert, but how do I get the link table to link to the new data records, where each record has a new id?
In one statement: No.
In one transaction: Yes
BEGIN TRANSACTION
DECLARE @DataID int;
INSERT INTO DataTable (Column1 ...) VALUES (....);
SELECT @DataID = scope_identity();
INSERT INTO LinkTable VALUES (@ObjectID, @DataID);
COMMIT
The good news is that the above code is also guaranteed to be atomic, and can be sent to the server from a client application with one sql string in a single function call as if it were one statement. You could also apply a trigger to one table to get the effect of a single insert. However, it's ultimately still two statements and you probably don't want to run the trigger for every insert.
You still need two INSERT statements, but it sounds like you want to get the IDENTITY from the first insert and use it in the second, in which case, you might want to look into OUTPUT or OUTPUT INTO: http://msdn.microsoft.com/en-us/library/ms177564.aspx
The following sets up the situation I had, using table variables.
DECLARE @Object_Table TABLE
(
Id INT NOT NULL PRIMARY KEY
)
DECLARE @Link_Table TABLE
(
ObjectId INT NOT NULL,
DataId INT NOT NULL
)
DECLARE @Data_Table TABLE
(
Id INT NOT NULL Identity(1,1),
Data VARCHAR(50) NOT NULL
)
-- create two objects '1' and '2'
INSERT INTO @Object_Table (Id) VALUES (1)
INSERT INTO @Object_Table (Id) VALUES (2)
-- create some data
INSERT INTO @Data_Table (Data) VALUES ('Data One')
INSERT INTO @Data_Table (Data) VALUES ('Data Two')
-- link all data to first object
INSERT INTO @Link_Table (ObjectId, DataId)
SELECT Objects.Id, Data.Id
FROM @Object_Table AS Objects, @Data_Table AS Data
WHERE Objects.Id = 1
Thanks to another answer that pointed me towards the OUTPUT clause I can demonstrate a solution:
-- now I want to copy the data from object 1 to object 2 without looping
INSERT INTO @Data_Table (Data)
OUTPUT 2, INSERTED.Id INTO @Link_Table (ObjectId, DataId)
SELECT Data.Data
FROM @Data_Table AS Data INNER JOIN @Link_Table AS Link ON Data.Id = Link.DataId
INNER JOIN @Object_Table AS Objects ON Link.ObjectId = Objects.Id
WHERE Objects.Id = 1
It turns out, however, that it is not that simple in real life, because of the following error:
the OUTPUT INTO clause cannot be on either side of a (primary key, foreign key) relationship
I can still OUTPUT INTO a temp table and then finish with a normal insert. So I can avoid my loop, but I cannot avoid the temp table.
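Here is a hedged sketch of that workaround, reusing the tables from the question (the staging table variable name is made up): OUTPUT the new identities into a table that has no PK/FK relationship, then finish with a normal insert into the link table.

DECLARE @NewDataIds TABLE (DataId INT NOT NULL);

-- copy object 1's data rows and capture the new identities
INSERT INTO Data_Table (Data)
OUTPUT INSERTED.Id INTO @NewDataIds (DataId)
SELECT Data.Data
FROM Data_Table AS Data
INNER JOIN Link_Table AS Link ON Data.Id = Link.DataId
WHERE Link.ObjectId = 1;

-- now the normal insert that the PK/FK restriction no longer blocks
INSERT INTO Link_Table (ObjectId, DataId)
SELECT 2, DataId
FROM @NewDataIds;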
I want to stress using
SET XACT_ABORT ON;
for an MSSQL transaction with multiple SQL statements.
See: https://msdn.microsoft.com/en-us/library/ms188792.aspx
They provide a very good example.
So, the final code should look like the following:
SET XACT_ABORT ON;
BEGIN TRANSACTION
DECLARE @DataID int;
INSERT INTO DataTable (Column1 ...) VALUES (....);
SELECT @DataID = scope_identity();
INSERT INTO LinkTable VALUES (@ObjectID, @DataID);
COMMIT
It sounds like the Link table captures the many:many relationship between the Object table and Data table.
My suggestion is to use a stored procedure to manage the transactions. When you want to insert to the Object or Data table, perform your inserts, get the new IDs, and insert them into the Link table.
This allows all of your logic to remain encapsulated in one easy-to-call sproc.
If you want the actions to be more or less atomic, I would make sure to wrap them in a transaction. That way you can be sure both happened or both didn't happen as needed.
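A hedged sketch (the procedure name is made up; the question's table and column names are reused) of the kind of procedure described above: insert the data row, capture its new identity, and link it, all inside one transaction.

CREATE PROCEDURE dbo.AddDataForObject
    @ObjectId int,
    @Data     varchar(50)
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;

    INSERT INTO Data_Table (Data) VALUES (@Data);

    -- capture the identity generated by the insert above
    DECLARE @DataId int = SCOPE_IDENTITY();

    INSERT INTO Link_Table (ObjectId, DataId) VALUES (@ObjectId, @DataId);

    COMMIT;
END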
You might create a View selecting the column names required by your insert statement, add an INSTEAD OF INSERT Trigger, and insert into this view.
Before being able to do a multitable insert in Oracle, you could use a trick involving an insert into a view that had an INSTEAD OF trigger defined on it to perform the inserts. Can this be done in SQL Server?
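A hedged sketch of the view + INSTEAD OF INSERT trigger idea suggested above, in SQL Server (the view and trigger names are made up, and the question's tables are reused). Note that this simple form only handles one row per INSERT statement:

CREATE VIEW dbo.ObjectData
AS
    SELECT l.ObjectId, d.Data
    FROM Link_Table AS l
    INNER JOIN Data_Table AS d ON d.Id = l.DataId;
GO
CREATE TRIGGER trg_ObjectData_Insert
ON dbo.ObjectData
INSTEAD OF INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- insert the data row, then link it to the object using the new identity
    INSERT INTO Data_Table (Data)
    SELECT Data FROM inserted;

    DECLARE @DataId int = SCOPE_IDENTITY();

    INSERT INTO Link_Table (ObjectId, DataId)
    SELECT ObjectId, @DataId FROM inserted;
END
GO
-- one INSERT against the view now populates both base tables
INSERT INTO dbo.ObjectData (ObjectId, Data) VALUES (2, 'Data Three');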
Insert can only operate on one table at a time. Multiple Inserts have to have multiple statements.
I don't know that you need to do the looping through a table variable - can't you just use a mass insert into one table, then the mass insert into the other?
By the way - I am guessing you mean copy the data from Object_Table; otherwise the question does not make sense.
//if you want to insert the same as first table
$qry = "INSERT INTO table (one, two, three) VALUES('$one','$two','$three')";
$result = @mysql_query($qry);
$qry2 = "INSERT INTO table2 (one, two, three) VALUES('$one','$two','$three')";
$result = @mysql_query($qry2);
//or if you want to insert certain parts of table one
$qry = "INSERT INTO table (one, two, three) VALUES('$one','$two','$three')";
$result = @mysql_query($qry);
$qry2 = "INSERT INTO table2 (two) VALUES('$two')";
$result = @mysql_query($qry2);
//I know it looks too good to be right, but it works and you can keep adding queries --
//just change the $qry number and the number in @mysql_query($qry).
I have 17 tables this has worked in.
-- ================================================
-- Template generated from Template Explorer using:
-- Create Procedure (New Menu).SQL
--
-- Use the Specify Values for Template Parameters
-- command (Ctrl-Shift-M) to fill in the parameter
-- values below.
--
-- This block of comments will not be included in
-- the definition of the procedure.
-- ================================================
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE InsetIntoTwoTable
(
@name nvarchar(50),
@Email nvarchar(50)
)
AS
BEGIN
SET NOCOUNT ON;
insert into dbo.info(name) values (@name)
insert into dbo.login(Email) values (@Email)
END
GO
I have this situation:
I have two tables:
Table A
Staging_Table A
Both tables contain those common columns:
Code
Description
In Table A I also have a column Version which identifies the latest version of the corresponding Code.
My problem is how to update the column Version once a new Description is stored for the same Code (I fill the Staging_Table with a bulk insert from C#; I have a flow of data that changes once a week).
I need to insert the new row into Table A containing the same Code but a different Description, without deleting the old one.
I insert the rows from the staging table into Table A with a MINUS operation, and I have this mechanism inside a stored procedure because I also fill the staging table with a bulk insert from C#.
The result I need to obtain is the following:
TABLE A:
Id  Code  Description   Version  End_date
--  ----  ------------  -------  -----------
1   8585  Red Car       1        26-May-2015
2   8585  Red Car RRRR  2        01-Jun-2015
How can I do that?
I hope the issue is clear
If I understand correctly, the process works like this:
1. Data is loaded into the staging table Staging_table_A.
2. Data is inserted from Staging_table_A into Table_A with the additional column Version.
I would do:
with cnt as (select count(*) c, code from Table_A group by code)
insert into Table_A
select sta.*, coalesce(cnt.c, 0) + 1 as version
from Staging_table_A sta left outer join cnt on (sta.code = cnt.code);
This is based on the condition that versions in Table_A contain no duplicates.
I'm working on database synchronization in my app. It means I have 5 databases, but:
only in the first database can a product be added/removed/modified
this first database saves information about the added/removed/modified product to a table (with a flag 1/2/3 for add/edit/remove, and the productID)
so the first database generates an INSERT script from a SELECT, for example:
in my product_changes table (addedRemovedEdited INT, productID INT) I have information like:
1, 15 (flag 1 means the product with ID = 15 was added), or
2, 15 (flag 2 means the product with ID = 15 was edited), etc.
Now, using this information, I can create the script - and there is the problem.
At the moment I'm creating scripts like:
SELECT (col1, col2, col3,...) FROM Product_Category;
string query = "INSERT INTO Table VALUES (#a,#b,#c)...";
SELECT (col1,col2,col3,...) FROM Product_price;
query += "INSERT INTO .......";
And I need to do it for each table which contains information about one single product. So for 10 products I'll have 10 * 12 (12 because there are ~12 tables about one product) blocks of code like INSERT INTO Table1(....); INSERT INTO Table2(....).
The problem is also that all data needs to have the same ID in every database - so I'm using @@identity and putting it into the insert query. It has to be this way, because the product with ID = 10 and name 'Keyboard' in mainDB = the product with ID = 10 in DB10.
And the question - maybe some of you know a better solution (because this one is not so good) for how I can create those scripts? Like a query which will take all information from my string[] a = {"Product", "Product_price", "Product_category"} tables and generate INSERT queries, but - most important - where I can add @@identity.
EDIT: I forgot. I found this solution: how i can generate programmatically "insert into" data script file from a database table?
Well, it does generate scripts, but with auto-incremented IDs. And I need to add information in the right order (following the intermediate tables), for example:
INSERT INTO Product(.....) VALUES (...);
SET @pID = @@identity; -- from Product
INSERT INTO Price (priceID,.....) VALUES (...);
SET @prID = @@identity; -- from Price
INSERT INTO Product_price (priceID, productID,...) VALUES (@prID, @pID)
I have a table whose schema is very simple: an ID column as the unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, to calculate which ones are already contained in the table and which are not. The inputs are strings, and I have a C# function which converts a string into a uniqueidentifier (GUID). My logic is: if an ID already exists, then I treat the string as already contained in the table.
My question is: if I need to find out which of the 5000 input strings are already contained in the DB and which are not, what is the most efficient way?
BTW: My current implementation is to convert the string to a GUID using C# code, then invoke/implement a stored procedure which queries whether the ID exists in the database and returns the result back to the C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable ON ExistingTable.thecol = X.thecol
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable ON ExistingTable.thecol = X.thecol
WHERE ExistingTable.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issue for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!-).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to insert one at a time in a lot of contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with those entries that do or don't exist in your table??
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- row does not exist in base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR =' |\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap the insert in a TRY CATCH block. Send em all up to the server in one shot like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
if you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time to see if the record does not exist just duplicates work that SQL is going to do anyway.
If performance is really important, then consider downloading the 5000 GUIDs to your local station and doing all the analysis locally. Reading 5000 GUIDs should take much less than 1 second. This is simpler than bulk importing into a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
Since you are using SQL Server 2008, you could use table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
[Column] NVARCHAR(255) NOT NULL
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
@Data MyType READONLY
AS
xxxx
Call using C#
SqlCommand insertCommand = new SqlCommand(
"spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
insertCommand.Parameters.AddWithValue(
"#Data", dataReader);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
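The procedure body is left as a placeholder above; here is a hedged sketch (the procedure name, target table, and column names are assumptions) of what it could do for this question: join the passed-in rows against the existing table and report which ones are already there.

CREATE PROCEDURE spCheckMyType
    @Data MyType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- flag each incoming id as present (1) or missing (0) in the existing table
    SELECT d.UniqueId,
           CASE WHEN t.UniqueId IS NULL THEN 0 ELSE 1 END AS AlreadyInTable
    FROM @Data AS d
    LEFT JOIN dbo.ExistingTable AS t
        ON t.UniqueId = d.UniqueId;
END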
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that can take an XML document in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000">
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001">
....
</ROOT>
Then in the procedure, with the argument of type NVARCHAR(MAX), you convert it to XML, after which you use it as a table with a single column (let's call it @FilterTable). The stored procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(@FilterXML NVARCHAR(MAX))
AS BEGIN
SET NOCOUNT ON
DECLARE @x XML
SELECT @x = CONVERT(XML, @FilterXML)
-- temporary table (must have it, because cannot join on XML statement)
DECLARE @FilterTable TABLE (
"ID" UNIQUEIDENTIFIER
)
-- insert into temporary table
-- important: XML is case-sensitive
INSERT @FilterTable
SELECT x.value('@ID', 'UNIQUEIDENTIFIER')
FROM @x.nodes('/ROOT/MyObject') AS R(x)
SELECT o.ID,
SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
FROM @FilterTable o
LEFT JOIN dbo.MyTable t
ON o.ID = t.ID
GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0