I may be missing something here, but I've searched for hours and I'm either not finding what I need, or I'm not searching on the correct terms. Nevertheless, this is what I'm trying to do.
I'm currently exploring migrating from EF to plain-old ADO. I'm happy that whilst there is a development hit in doing so all current testing points to ADO still being many times faster than EF (which given EF is built on ADO makes sense).
Where I am a little stumped, is generating an update statement for a table row, and an efficient one. Any update statement may change values in 1 or 10 fields, but it's clearly more efficient to only post the data that needs changing.
My question is, what is the best way to generate the update statement so as to remain protected from SQL injection?
For instance, one column value update would be
update Table1 set Column2 = 'somevalue' WHERE Column1 = @id;
Where two columns would be
update Table1 set Column2 = 'somevalue', Column3 = 'some other value' WHERE Column1 = @id;
Does anyone have any best practices on how they handle this please?
Additional Information:
I've had this down-voted, but quite honestly I think that is because I haven't made myself clear in what I want.
Let me start by confirming that I understand I have options of straightforward SQL commands (which I am fairly competent on) or placing the said command within a Stored Procedure and calling either from ADO. I also fully understand the importance of using parameters in any SQL statement where user input is placed.
Imagine the following table:
DECLARE @example TABLE
(
Id INT IDENTITY NOT NULL,
Name VARCHAR(50) NOT NULL,
Description VARCHAR(1000) NOT NULL
);
-- Indexes omitted for simplicity
Now imagine I have an API, allowing users to update a row in this table. The user can update either Name, Description OR both columns, simply by passing the Id. The call is completely disconnected from any "result sets" and therefore I must issue an UPDATE command to the database manually (or through a Stored Procedure).
To keep data transmission to a minimum (therefore helping to maximise performance), I want to cater for the following scenarios:
User updates just Name
UPDATE @example SET [Name] = @name WHERE [Id] = @id;
User updates just Description
UPDATE @example SET [Description] = @description WHERE [Id] = @id;
User updates both
UPDATE @example SET [Name] = @name, [Description] = @description WHERE [Id] = @id;
After all, with each call, I don't know what the caller wishes to update.
In reality, tables can have many, many columns, and it is completely ridiculous to create the relevant SQL statements for every possible combination, let alone the ludicrous effort it would require to keep them updated.
What I'm looking for (as I seem to be missing in searches) is how to generate a safe SQL statement that caters for each option based on what the user supplies AND uses parameters AND generates the smallest query possible - needed because we cannot update a column value if the user did not pass a value for it.
I hope this helps to clarify the requirement better.
Parameterize ALL values in ALL cases. This will ensure you avoid SQL injection attacks. As far as patterns for tracking which fields have changed and thus need updating, that is a larger exercise with many examples available on the interwebs for your reading enjoyment.
update Table1
set Column2 = @Column2,
    Column3 = @Column3
where Column1 = @Column1
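One way to combine "parameterize everything" with "only send the columns that changed" is to build the SET list from a fixed whitelist of column names and bind every value as a parameter. The following is a minimal sketch of that pattern, not a definitive implementation: it is written in Python against an in-memory SQLite database purely so it can be run as-is, and the names `UPDATABLE_COLUMNS`, `build_update` and the `Example` table are hypothetical. In ADO.NET the same idea would use named `@parameters` on a `SqlCommand` instead of `?` placeholders.

```python
import sqlite3

# Hypothetical whitelist: the only columns a caller is allowed to update.
UPDATABLE_COLUMNS = {"Name", "Description"}


def build_update(table, key_column, key_value, changes):
    """Return (sql, params) that updates only the whitelisted columns in `changes`.

    `table` and `key_column` are chosen by the application code, never by the user.
    """
    if not changes:
        raise ValueError("nothing to update")
    unknown = set(changes) - UPDATABLE_COLUMNS
    if unknown:
        raise ValueError(f"columns not updatable: {sorted(unknown)}")
    # Column names are taken from the whitelist; values travel only as parameters.
    columns = sorted(changes)
    assignments = ", ".join(f"[{col}] = ?" for col in columns)
    sql = f"UPDATE [{table}] SET {assignments} WHERE [{key_column}] = ?"
    params = [changes[col] for col in columns] + [key_value]
    return sql, params


# Demo database standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Example (Id INTEGER PRIMARY KEY, Name TEXT, Description TEXT)")
conn.execute("INSERT INTO Example (Id, Name, Description) VALUES (1, 'old', 'old text')")

# The caller supplied only Name, and the value is deliberately hostile.
sql, params = build_update("Example", "Id", 1, {"Name": "new'; DROP TABLE Example;--"})
conn.execute(sql, params)
row = conn.execute("SELECT Name, Description FROM Example WHERE Id = 1").fetchone()
```

Because the hostile string is bound as a value, it is stored verbatim in `Name` and the untouched `Description` column is never sent to the database.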
I have a table where the primary key is an increment int 'ID', that I have to manually set. I know an autoincrement int (IDENTITY) should have been the best option, but I can't change the existing table design.
So I need to atomize the operation of Read-Write, in some sort of:
Lock table
Read the MAX value of existings ID
Add new record with Primary Key = ID+1
Release table
What is the correct way to lock the table in a multiuser environment? I suppose it's a mix of transactions and the use of TABLOCKX. I need to ensure:
No deadlocks
If something fails, the table should not stay locked (for example, the program fails and exits when trying to write, and no COMMIT/ROLLBACK is called). I don't even know if this is possible.
NOTE: The database is also used by other applications that I suppose care themselves of this problem.
EDITED: Could this be considered atomic enough to be a solution?
INSERT INTO MYTABLE (ID, OtherFields...) VALUES ((Select Max(ID)+1 from MYTABLE), 'values'...)
Attempting to roll your own auto-increment mechanism using table locks is almost bound to fail - however, since you wrote you can't change the existing table, I would suggest using a sequence to get the next number instead of locking the table.
CREATE SEQUENCE dbo.MySequence -- Don't use this name, please!
AS int -- note: default is bigInt
START WITH 1
INCREMENT BY 1
NO CYCLE;
This has some [1] of the benefits of an identity column, without having to add an identity column to your table.
You can also use the sequence to generate a default value to a column (assuming adding a default constraint doesn't count as "changing the existing table structure", of course). See example D in official documentation
ALTER TABLE dbo.YourTableName
ADD CONSTRAINT YourTableName_id_default
DEFAULT NEXT VALUE FOR MySequence
FOR Id;
[1] The benefits are that you don't need to add locks or calculate the next number yourself.
However, you should know that unlike an identity column, this doesn't protect you from updates to the id column, nor does it protect you from insert statements that explicitly insert a value to this column (without using next value for).
The first problem can be quite easily solved with an instead-of-update trigger on the table that will only update columns that aren't the id column, but I'm not sure how to solve the other problem.
So if the other process is correctly handling the locking, you could do exactly what you mentioned (lock, get last ID, insert and release) by executing something similar to the following:
DECLARE @MaxID INT
BEGIN TRY
    BEGIN TRANSACTION
    SELECT
        @MaxID = MAX(I.ID)
    FROM
        MyTable AS I WITH (TABLOCKX, HOLDLOCK) -- TABLOCKX: no operations can be done, HOLDLOCK: until the end of the transaction
    INSERT INTO MyTable (
        ID,
        OtherColumn)
    SELECT
        ID = ISNULL(@MaxID + 1, 1),
        OtherColumn = 'Other values'
    COMMIT
END TRY
BEGIN CATCH
    -- Handle your error logging and rollback the transaction so the table locks are released, a basic example:
    DECLARE @ErrorMessage VARCHAR(MAX) = ERROR_MESSAGE()
    IF @@TRANCOUNT > 0
        ROLLBACK
    RAISERROR(@ErrorMessage, 16, 1)
END CATCH
However you will still have to do additional stuff for batch inserts, or if you need the inserted ID to load other related tables.
Also TABLOCKX is pretty restrictive, there are other less-restrictive locks but I believe they might leave you open for concurrency issues. You can check other locking hints in the docs.
I've searched every way I can come up with, but can't find a technique for initializing a DataTable to match a UDT Table declared in our DB. I could manually go through and add columns, but I don't want to duplicate the structure in both places. For a normal table, one option would be to simply issue a "select * where ..." that returns no results. But can something like this be done for a UDT Table?
And here is the background problem.
This DB has a sproc that accepts a Table Valued Parameter that is an instance of the indicated UDT Table declared in the same DB. Most of the UD fields are nullable, and the logic to load the TVP is quite involved. What I hoped to do is initialize the DT, then insert rows as needed and set required column/field values as I go until I'm ready to toss the result to SS for final processing.
I can certainly add the dozen or more fields in code, but the details are still in flux (and may continue to be so for some time), which is one reason I don't really want to have to load all the columns in code.
So, is there a reasonable solution, or am I barking up the wrong tree? I've already spent more time looking for the solution I expected to exist than it would have taken to write the column loading code 100 times over, but now I just want to know if it's possible.
Ok, I was discussing with a friend who is MUCH more SQL savvy than I am (doesn't take much), and he suggested the following SQL query:
"DECLARE @TVP as MyUDTTable; SELECT * FROM @TVP"
This appears to give me exactly what I want, so I'm updating here should some other poor sap want something similar in the future. Perhaps others may offer different or better answers.
Here is an example of how I did this. This style of input/output is something a co-worker and I put together to allow quick and effective use of Entity Framework on his side while keeping my options open to use all the SQL toys. If your use case is the same, you might also like the OUTPUT clause I used here. It returns the newly created IDs straight to whatever method calls the proc, allowing the program to go right on to the next activity without pestering my database for the numbers.
My Udt
CREATE TYPE [dbo].[udtOrderLineBatch] AS TABLE
(
[BrandId] [bigint] NULL,
[ProductClassId] [bigint] NULL,
[ProductStatus] [bigint] NULL,
[Quantity] [bigint] NULL
)
and the procedure that takes it as an input
create procedure [ops].[uspBackOrderlineMultipleCreate]
    @parmBackOrderId int
    ,@UserGuid uniqueidentifier
        null
    ,@parmOrderLineBatch as udtOrderLineBatch readonly
as
begin
    insert ops.OrderLine
    (
        BrandId
        ,ProductClassId
        ,ProductStatusId
        ,BackOrderId
        ,OrderId
        ,DeliveryId
        ,CreatedDate
        ,CreatedBy)
    output cast(inserted.OrderLineId as bigint) OrderLineId
    select line.BrandId
        ,line.ProductClassId
        ,line.ProductStatus
        ,@parmBackOrderId
        ,null
        ,null
        ,getdate()
        ,@UserGuid
    from @parmOrderLineBatch line
    join NumberSequence seq on line.Quantity >= seq.Number
end
I want to get the new ID (identity) before inserting the row, so I use this code:
select SCOPE_IDENTITY() AS NewId from tblName
but I get this:
1- Null
2- Null
COMPUTED COLUMN VERSION
You'll have to do this on the sql server to add the column.
alter table TableName add Code as (name + cast(id as varchar(200)))
Now your result set will always have Code as the name + id value, which is nice because this column stays in sync with that expression even if the underlying fields (such as name) are changed.
Entity Framework Option (Less ideal)
You mentioned you are using Entity Framework. You need to concatenate the ID on a field within the same record during insert. There is no capacity in SQL (outside of Triggers) or Entity Framework to do what you are wanting in one step.
You need to do something like this:
var obj = new Thing{ field1= "some value", field2 = ""};
context.ThingTable.Add(obj);
context.SaveChanges();
obj.field2 = "bb" + obj.id; //after the first SaveChanges is when your id field would be populated
context.SaveChanges();
ORIGINAL Answer:
If you really must show this value to the user then the safe way to do it would be something like this:
begin tran
insert into test(test) values('this is something')
declare @pk int = scope_identity()
print @pk
You can now return the value in @pk and let the user determine if it's acceptable. If it is, issue a COMMIT; otherwise issue the ROLLBACK command.
This however is not a very good design, and I would consider it a misuse of how identity values are generated. Also, you should know that if you perform a rollback, the ID that would have been used is lost and won't be used again.
This is too verbose for a comment.
Consider how flawed this concept really is. The identity property is a running tally of the number of attempted inserts. You want to return to the user the identity of a row that does not yet exist. Consider what would happen if you have values in the insert that cause it to fail. You already told the user what the identity would be, but the insert failed, so that identity has already been consumed. You should report the value to the user when the row actually exists, which is after the insert.
I can't understand why you want to show that identity to the user before the insert. I believe (as @SeanLange said) that it is neither customary nor useful, but if you insist, I think there are some fragile ways to do it. One of them is:
1) Insert the new row, then get the ID with SCOPE_IDENTITY() and show it to the user.
2) Then, if you want to cancel the operation, delete the row and reset the identity (if necessary) with DBCC CHECKIDENT('[Table Name]', RESEED, [Identity Seed]).
The other way is to not use an identity column and to manage the ID column yourself, but be aware that this approach does not work in concurrent scenarios.
I think perhaps you're confusing the SQL Server identity with an Oracle sequence.
They work completely differently.
With the Oracle sequence you get the sequence value before you insert the record.
With a SQL Server identity, the last identity generated is available AFTER the insert via the SCOPE_IDENTITY() function.
If you really need to show the ID to the user before the insert, your best bet is to keep a counter in a separate table, and read the current value, and increment that by one. As long as "gaps" in the numbers aren't a problem.
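The counter-table idea can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Python with an in-memory SQLite database so it runs as-is, and the `IdCounter` table and `reserve_next_id` function are hypothetical names. The increment and the read happen inside one transaction, so two callers cannot be handed the same number; a number that is reserved but never used simply becomes a gap.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction boundaries below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE IdCounter (Name TEXT PRIMARY KEY, LastValue INTEGER NOT NULL)")
conn.execute("INSERT INTO IdCounter (Name, LastValue) VALUES ('Thing', 0)")


def reserve_next_id(conn, counter_name):
    """Increment the named counter and return the newly reserved value."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        conn.execute(
            "UPDATE IdCounter SET LastValue = LastValue + 1 WHERE Name = ?",
            (counter_name,),
        )
        (value,) = conn.execute(
            "SELECT LastValue FROM IdCounter WHERE Name = ?",
            (counter_name,),
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


first = reserve_next_id(conn, "Thing")   # can be shown to the user before the insert
second = reserve_next_id(conn, "Thing")  # the next caller gets a different number
```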
I have created two threads in C# and I am calling two separate functions in parallel. Both functions read the last ID from the XYZ table and insert a new record with the value ID+1. Here the ID column is the primary key. When I execute both functions I get a primary key violation error. Both functions run the query below:
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
Seems like both functions are reading the value at a time and trying to insert with the same value.
How can I solve this problem?
Let the database handle selecting the ID for you. It's obvious from your code above that what you really want is an auto-incrementing integer ID column, which the database can definitely handle doing for you. So set up your table properly and instead of your current insert statement, do this:
insert into XYZ values('Name')
If your database table is already set up I believe you can issue a statement similar to:
alter table your_table modify column your_table_id int(size) auto_increment
Finally, if none of these solutions are adequate for whatever reason (including, as you indicated in the comments section, inability to edit the table schema), then you can do as one of the other users suggested in the comments and create a synchronized method to find the next ID. You would basically just create a static method that returns an int, issue your select id statement in that static method, and use the returned result to insert your next record into the table. Since this method would not guarantee a successful insert (due to external applications' ability to also insert into the same table), you would also have to catch exceptions and retry on failure.
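The synchronized-method-plus-retry idea could look roughly like this. It is a sketch only, written in Python against an in-memory SQLite database so it can be run directly; `insert_with_next_id` and `_id_lock` are hypothetical names, and in C# the lock would typically be a `lock` statement around a static method. The lock serializes callers inside this process, while the retry loop covers the case where another application inserts the same ID first.

```python
import sqlite3
import threading

_id_lock = threading.Lock()  # serializes ID generation within this process only


def insert_with_next_id(conn, name, max_attempts=3):
    """Insert a row using MAX(ID)+1, retrying if the key was taken in the meantime."""
    for _ in range(max_attempts):
        with _id_lock:
            (next_id,) = conn.execute(
                "SELECT COALESCE(MAX(ID), 0) + 1 FROM XYZ"
            ).fetchone()
            try:
                conn.execute("INSERT INTO XYZ (ID, Name) VALUES (?, ?)", (next_id, name))
                conn.commit()
                return next_id
            except sqlite3.IntegrityError:
                # Another writer used this ID first: undo and try again.
                conn.rollback()
    raise RuntimeError("could not insert after retries")


# Demo: two threads share one connection; all access happens under the lock.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE XYZ (ID INTEGER PRIMARY KEY, Name TEXT)")


def worker(label):
    for _ in range(5):
        insert_with_next_id(conn, label)


threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

ids = [r[0] for r in conn.execute("SELECT ID FROM XYZ ORDER BY ID")]
```

Note that an in-process lock does nothing for writers in other processes, which is why the retry on a key violation is still needed.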
Set ID column to be "Identity" column. Then, you can execute your queries as:
insert into XYZ values('Name')
I think that you can't use ALTER TABLE to change a column to be an Identity after the column is created. Use Management Studio to set this column to be Identity. If your table has many rows, this can be a long-running process, because it will actually copy your data to a new table (it performs a table re-creation).
Most likely that option is disabled in your Management Studio. In order to enable it, open Tools->Options->Designers and uncheck the option "Prevent saving changes that require table re-creation". Depending on your table size, you will probably have to set the timeout, too. Your table will be locked during that time.
A solution for such problems is to generate the ID using some kind of a sequence.
For example, in SQL Server you can create a sequence using the command below:
CREATE SEQUENCE Test.CountBy1
START WITH 1
INCREMENT BY 1 ;
GO
Then in C#, you can retrieve the next value from the sequence (for example with SELECT NEXT VALUE FOR Test.CountBy1) and assign it to the ID before inserting the row.
It sounds like you want a higher transaction isolation level or more restrictive locking.
I don't use these features too often, so hopefully somebody will suggest an edit if I'm wrong, but you want one of these:
-- specify the strictest isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
insert into XYZ values((SELECT max(ID)+1 from XYZ),'Name')
or
-- make locks exclusive so other transactions cannot access the same rows
insert into XYZ values((SELECT max(ID)+1 from XYZ WITH (XLOCK)),'Name')
I have a table whose schema is very simple: an ID column as unique primary key (uniqueidentifier type) and some other nvarchar columns. My current goal is, for 5000 inputs, to work out which ones are already contained in the table and which are not. The inputs are strings, and I have a C# function which converts a string into a uniqueidentifier (GUID). My logic is, if there is an existing ID, then I treat the string as already contained in the table.
My question is, if I need to find out what ones from the 5000 input strings are already contained in DB, and what are not, what is the most efficient way?
BTW: My current implementation converts each string to a GUID in C# code, then invokes a stored procedure which queries whether the ID exists in the database and returns the result to the C# code.
My working environment: VSTS 2008 + SQL Server 2008 + C# 3.5.
My first instinct would be to pump your 5000 inputs into a single-column temporary table X, possibly index it, and then use:
SELECT X.thecol
FROM X
JOIN ExistingTable ON ExistingTable.thecol = X.thecol
to get the ones that are present, and (if both sets are needed)
SELECT X.thecol
FROM X
LEFT JOIN ExistingTable ON ExistingTable.thecol = X.thecol
WHERE ExistingTable.thecol IS NULL
to get the ones that are absent. Worth benchmarking, at least.
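The temp-table approach can be tried end to end with a small sketch. This one is written in Python against an in-memory SQLite database so it runs as-is; the deterministic `uuid5` values are only an assumption standing in for the question's C# string-to-GUID function, and the table and column names follow the queries above.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExistingTable (thecol TEXT PRIMARY KEY)")

# Three IDs that are already in the database.
known = [str(uuid.uuid5(uuid.NAMESPACE_DNS, f"item-{i}")) for i in range(3)]
conn.executemany("INSERT INTO ExistingTable (thecol) VALUES (?)", [(k,) for k in known])

# Incoming batch: two known IDs plus one that is new.
inputs = known[:2] + [str(uuid.uuid5(uuid.NAMESPACE_DNS, "item-99"))]

# Load the whole batch into an indexed temp table in one round trip.
conn.execute("CREATE TEMP TABLE X (thecol TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO X (thecol) VALUES (?)", [(v,) for v in inputs])

# Inner join: inputs that already exist.
present = {
    r[0]
    for r in conn.execute(
        "SELECT X.thecol FROM X JOIN ExistingTable AS E ON E.thecol = X.thecol"
    )
}

# Left join with NULL filter: inputs that do not exist yet.
absent = {
    r[0]
    for r in conn.execute(
        "SELECT X.thecol FROM X "
        "LEFT JOIN ExistingTable AS E ON E.thecol = X.thecol "
        "WHERE E.thecol IS NULL"
    )
}
```

Both result sets come from a single query each, regardless of how many inputs are in the batch.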
Edit: as requested, here are some good docs & tutorials on temp tables in SQL Server. Bill Graziano has a simple intro covering temp tables, table variables, and global temp tables. Randy Dyess and SQL Master discuss performance issue for and against them (but remember that if you're getting performance problems you do want to benchmark alternatives, not just go on theoretical considerations!-).
MSDN has articles on tempdb (where temp tables are kept) and optimizing its performance.
Step 1. Make sure you have a problem to solve. Five thousand inserts isn't a lot to insert one at a time in a lot of contexts.
Are you certain that the simplest way possible isn't sufficient? What performance issues have you measured so far?
What do you need to do with those entries that do or don't exist in your table??
Depending on what you need, maybe the new MERGE statement in SQL Server 2008 could fit your bill - update what's already there, insert new stuff, all wrapped neatly into a single SQL statement. Check it out!
http://blogs.conchango.com/davidportas/archive/2007/11/14/SQL-Server-2008-MERGE.aspx
http://www.sql-server-performance.com/articles/dba/SQL_Server_2008_MERGE_Statement_p1.aspx
http://blogs.msdn.com/brunoterkaly/archive/2008/11/12/sql-server-2008-merge-capability.aspx
Your statement would look something like this:
MERGE INTO
(your target table) AS t
USING
(your source table, e.g. a temporary table) AS s
ON t.ID = s.ID
WHEN NOT MATCHED THEN -- new row does not exist in base table
....(do whatever you need to do)
WHEN MATCHED THEN -- row exists in base table
... (do whatever else you need to do)
;
To make this really fast, I would load the "new" records from e.g. a TXT or CSV file into a temporary table in SQL server using BULK INSERT:
BULK INSERT YourTemporaryTable
FROM 'c:\temp\yourimportfile.csv'
WITH
(
FIELDTERMINATOR =',',
ROWTERMINATOR =' |\n'
)
BULK INSERT combined with MERGE should give you the best performance you can get on this planet :-)
Marc
PS: here's a note from TechNet on MERGE performance and why it's faster than individual statements:
In SQL Server 2008, you can perform multiple data manipulation language (DML) operations in a single statement by using the MERGE statement. For example, you may need to synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table. Typically, this is done by executing a stored procedure or batch that contains individual INSERT, UPDATE, and DELETE statements. However, this means that the data in both the source and target tables are evaluated and processed multiple times; at least once for each statement.
By using the MERGE statement, you can replace the individual DML statements with a single statement. This can improve query performance because the operations are performed within a single statement, therefore, minimizing the number of times the data in the source and target tables are processed. However, performance gains depend on having correct indexes, joins, and other considerations in place. This topic provides best practice recommendations to help you achieve optimal performance when using the MERGE statement.
Try to ensure you end up running only one query - i.e. if your solution consists of running 5000 queries against the database, that'll probably be the biggest consumer of resources for the operation.
If you can insert the 5000 IDs into a temporary table, you could then write a single query to find the ones that don't exist in the database.
If you want simplicity, since 5000 records is not very many, then from C# just use a loop to generate an insert statement for each of the strings you want to add to the table. Wrap the insert in a TRY CATCH block. Send em all up to the server in one shot like this:
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
BEGIN TRY
INSERT INTO table (theCol, field2, field3)
SELECT theGuid, value2, value3
END TRY BEGIN CATCH END CATCH
If you have a unique index or primary key defined on your string GUID, then the duplicate inserts will fail. Checking ahead of time to see if the record does not exist just duplicates work that SQL is going to do anyway.
If performance is really important, then consider downloading the 5000 GUIDs to your local station and doing all the analysis locally. Reading 5000 GUIDs should take much less than 1 second. This is simpler than bulk importing to a temp table (which is the only way you will get performance from a temp table) and doing an update using a join to the temp table.
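Once the existing IDs have been read into memory, the local analysis is just a set lookup. A minimal sketch in Python, assuming the IDs are compared as normalized strings; `split_existing` is a hypothetical helper name.

```python
def split_existing(candidate_ids, existing_ids):
    """Split candidates into (present, absent) relative to the IDs read from the table."""
    existing = set(existing_ids)  # set membership is O(1) on average
    present = [c for c in candidate_ids if c in existing]
    absent = [c for c in candidate_ids if c not in existing]
    return present, absent


existing_ids = ["a1", "b2", "c3"]   # stand-ins for GUIDs read from the table
candidates = ["b2", "zz", "a1"]     # stand-ins for the converted input strings
present, absent = split_existing(candidates, existing_ids)
```

Input order is preserved in both lists, which makes it easy to map the results back to the original strings.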
Since you are using Sql server 2008, you could use Table-valued parameters. It's a way to provide a table as a parameter to a stored procedure.
Using ADO.NET you could easily pre-populate a DataTable and pass it as a SqlParameter.
Steps you need to perform:
Create a custom Sql Type
CREATE TYPE MyType AS TABLE
(
UniqueId INT NOT NULL,
[Column] NVARCHAR(255) NOT NULL -- Column is a reserved word, so it must be bracketed
)
Create a stored procedure which accepts the Type
CREATE PROCEDURE spInsertMyType
@Data MyType READONLY
AS
xxxx
Call using C#
SqlCommand insertCommand = new SqlCommand(
    "spInsertMyType", connection);
insertCommand.CommandType = CommandType.StoredProcedure;
SqlParameter tvpParam =
    insertCommand.Parameters.AddWithValue(
        "@Data", dataReader);
tvpParam.SqlDbType = SqlDbType.Structured;
Links: Table-valued Parameters in Sql 2008
Definitely do not do it one-by-one.
My preferred solution is to create a stored procedure with one parameter that can take an XML in the following format:
<ROOT>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/>
<MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000001"/>
....
</ROOT>
Then in the procedure, with the argument of type NVARCHAR(MAX), you convert it to XML, after which you use it as a table with a single column (let's call it @FilterTable). The stored procedure looks like:
CREATE PROCEDURE dbo.sp_MultipleParams(@FilterXML NVARCHAR(MAX))
AS BEGIN
    SET NOCOUNT ON
    DECLARE @x XML
    SELECT @x = CONVERT(XML, @FilterXML)
    -- table variable (must have it, because cannot join on XML statement)
    DECLARE @FilterTable TABLE (
        "ID" UNIQUEIDENTIFIER
    )
    -- insert into the table variable
    -- important: XML is case-sensitive
    INSERT @FilterTable
    SELECT x.value('@ID', 'UNIQUEIDENTIFIER')
    FROM @x.nodes('/ROOT/MyObject') AS R(x)
    SELECT o.ID,
        SIGN(SUM(CASE WHEN t.ID IS NULL THEN 0 ELSE 1 END)) AS FoundInDB
    FROM @FilterTable o
    LEFT JOIN dbo.MyTable t
        ON o.ID = t.ID
    GROUP BY o.ID
END
GO
You run it as:
EXEC sp_MultipleParams '<ROOT><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000000"/><MyObject ID="60EAD98F-8A6C-4C22-AF75-000000000002"/></ROOT>'
And your results look like:
ID FoundInDB
------------------------------------ -----------
60EAD98F-8A6C-4C22-AF75-000000000000 1
60EAD98F-8A6C-4C22-AF75-000000000002 0
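On the client side, the XML argument is best produced by an XML library rather than by string concatenation, so that element names keep the exact case the procedure expects and attribute values are escaped. A minimal sketch in Python using the standard library; `build_filter_xml` is a hypothetical helper name, and the resulting string would be passed as the procedure's single NVARCHAR(MAX) parameter.

```python
import xml.etree.ElementTree as ET


def build_filter_xml(ids):
    """Build the <ROOT><MyObject ID="..."/>...</ROOT> payload for the procedure."""
    root = ET.Element("ROOT")  # names are case-sensitive: ROOT / MyObject / ID
    for value in ids:
        ET.SubElement(root, "MyObject", {"ID": str(value)})
    return ET.tostring(root, encoding="unicode")


ids = [
    "60EAD98F-8A6C-4C22-AF75-000000000000",
    "60EAD98F-8A6C-4C22-AF75-000000000002",
]
payload = build_filter_xml(ids)

# Round-trip check: parse the payload back and read the attributes.
parsed_ids = [node.get("ID") for node in ET.fromstring(payload).findall("MyObject")]
```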