Linq to SQL - Update Batch - c#

I have the following inline SQL:
internal void UpdateWorkflowProcessingByNullToken(Guid processingToken, int numberToProcess)
{
string sql = string.Format(CultureInfo.InvariantCulture,
"UPDATE TOP ({0}) Master.WorkflowEventProcessing " +
"SET ProcessingToken = '{1}' " +
"WHERE ProcessingToken IS NULL",
numberToProcess, processingToken);
this.Database.ExecuteCommand(sql);
}
Inline SQL was used for performance. It was my understanding that L2S would create a SQL statement for every row that I needed to update. And that was taking too long. Note, this was a couple of years ago.
Now I have a DBA telling me this:
This query appears to be one of the most frequently blocking or been blocked.
This is not optimized, from database perspective, due to execution plan need to be generated for every execution.
Based on the data, this simple query is using more than 1GB of plan cache (more than 25000 copies of similar execution plans), while it could actually use only less than 50KB of memory, if only 1 copy is stored.
I would propose to create a stored procedure with the unique identifier as parameter. By design, the stored procedure (bypass compilation stage) should run faster than ad hoc query.
As developers, we've been reluctant to use stored procedures. We like having all of our data code in our C# data layer. Am I stuck here? Do I need to use a stored procedure? Or is there a way to do a mass update with L2S?
I remember reading about compiling an L2S query. I could look into that as well...

You can use parameterized SQL commands to execute. This will generate a reusable query execution plan that will be as efficient as a stored procedure after it is initially created and cached. Each execution you simply supply new parameters.
More Details
Given the following code that updates a demo database and a table named "Foo"
///////////////////////////////////////////////////////////
// just setup for the context for demo purposes, you would
// reference this.Database in place of creating context.
SqlConnection connection = new SqlConnection("Data Source = .; Initial Catalog = MyDb; Integrated Security = SSPI;");
var dataContext = new System.Data.Linq.DataContext(connection);
///////////////////////////////////////////////////////////
string updateQuery = "UPDATE TOP (@p1) dbo.Foo " +
"SET Data = @p2 " +
"WHERE Data IS NULL";
dataContext.Connection.Open();
var command = dataContext.Connection.CreateCommand();
command.CommandText = updateQuery;
command.CommandType = System.Data.CommandType.Text;
var param1 = new SqlParameter("@p1", System.Data.SqlDbType.Int);
param1.Value = 3;
command.Parameters.Add(param1);
var param2 = new SqlParameter("@p2", System.Data.SqlDbType.Int);
param2.Value = 1;
command.Parameters.Add(param2);
command.Prepare();
command.ExecuteNonQuery();
param2.Value = 5;
command.ExecuteNonQuery();
From the profiler output you can see that it calls sp_prepexec:
declare @p1 int
set @p1=1
exec sp_prepexec @p1 output,N'@p1 int,@p2 int',N'UPDATE TOP (@p1) dbo.Foo SET Data = @p2 WHERE Data IS NULL',@p1=3,@p2=1
select @p1
This prepares the statement and executes it with the parameters 3 and 1. When param2.Value is then set to 5 and the command is executed again, the profiler shows it reusing the prepared command, so nothing is recompiled and no new execution plan is generated:
exec sp_execute 1,@p1=3,@p2=5
This is what the profiler output looks like, FYI...
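Applied to the method from the question, a minimal sketch could look like the following. It assumes that this.Database is a System.Data.Linq.DataContext; the WorkflowUpdates class and the database argument are hypothetical names used only for illustration. DataContext.ExecuteCommand treats {0}, {1}, ... in the command text as placeholders for SQL parameters rather than as string.Format slots, so the command text stays constant between calls.

```csharp
using System;
using System.Data.Linq;

internal static class WorkflowUpdates
{
    // Hypothetical helper; "database" stands in for this.Database from the question
    // and is assumed to be a LINQ to SQL DataContext.
    internal static int UpdateWorkflowProcessingByNullToken(
        DataContext database, Guid processingToken, int numberToProcess)
    {
        // Constant command text: {0} and {1} are sent to the server as parameters
        // by ExecuteCommand, they are not formatted into the string.
        const string sql =
            "UPDATE TOP ({0}) Master.WorkflowEventProcessing " +
            "SET ProcessingToken = {1} " +
            "WHERE ProcessingToken IS NULL";

        // Returns the number of rows modified.
        return database.ExecuteCommand(sql, numberToProcess, processingToken);
    }
}
```

Because the text no longer changes with each token, the server should be able to reuse a single cached plan instead of storing one per call.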

Related

How can I parameterize an SQL table without vulnerability to SQL injection

I'm writing a C# class library in which one of the features is the ability to create an empty data table that matches the schema of any existing table.
For example, this:
private DataTable RetrieveEmptyDataTable(string tableName)
{
var table = new DataTable() { TableName = tableName };
using var command = new SqlCommand($"SELECT TOP 0 * FROM {tableName}", _connection);
using SqlDataAdapter dataAdapter = new SqlDataAdapter(command);
dataAdapter.Fill(table);
return table;
}
The above code works, but it has a glaring security vulnerability: SQL injection.
My first instinct is to parameterize the query like so:
using var command = new SqlCommand("SELECT TOP 0 * FROM @tableName", _connection);
command.Parameters.AddWithValue("@tableName", tableName);
But this leads to the following exception:
Must declare the table variable "@tableName"
After a quick search on Stack Overflow I found this question, which recommends using my first approach (the one with sqli vulnerability). That doesn't help at all, so I kept searching and found this question, which says that the only secure solution would be to hard-code the possible tables. Again, this doesn't work for my class library which needs to work for arbitrary table names.
My question is this: how can I parameterize the table name without vulnerability to SQL injection?
An arbitrary table name still has to exist, so you can check first that it does:
IF EXISTS (SELECT 1 FROM sys.objects WHERE name = @TableName)
BEGIN
... do your thing ...
END
And further, if the list of tables you want to allow the user to select from is known and finite, or matches a specific naming convention (like dbo.Sales%), or belongs to a specific schema (like Reporting), you could add additional predicates to check for those.
This requires you to pass the table name in as a proper parameter, not concatenate or token-replace. (And please don't use AddWithValue() for anything, ever.)
Once your check that the object is real and valid has passed, then you will still have to build your SQL query dynamically, because you still won't be able to parameterize the table name. You still should apply QUOTENAME(), though, as I explain in these posts:
Protecting Yourself from SQL Injection in SQL Server - Part 1
Protecting Yourself from SQL Injection in SQL Server - Part 2
So the final code would be something like:
CREATE PROCEDURE dbo.SelectFromAnywhere
@TableName sysname
AS
BEGIN
IF EXISTS (SELECT 1 FROM sys.objects
WHERE name = @TableName)
BEGIN
DECLARE @sql nvarchar(max) = N'SELECT *
FROM ' + QUOTENAME(@TableName) + N';';
EXEC sys.sp_executesql @sql;
END
ELSE
BEGIN
PRINT 'Nice try, robot.';
END
END
GO
If you also want it to be in some defined list you can add
AND @TableName IN (N't1', N't2', …)
Or LIKE <some pattern> or join to sys.schemas or what have you.
Provided nobody has the rights to then modify the procedure to change the checks, there is no value you can pass to @TableName that will allow you to do anything malicious, other than maybe selecting from another table you didn't expect because someone with too much access was able to create it before calling the code. Replacing characters like -- or ; does not make this any safer.
You could pass the table name to the SQL Server to apply quotename() on it to properly quote it and subsequently only use the quoted name.
Something along the lines of:
...
string quotedTableName = null;
using (SqlCommand command = new SqlCommand("SELECT quotename(@tablename);", connection))
{
SqlParameter parameter = command.Parameters.Add("@tablename", System.Data.SqlDbType.NVarChar, 128 /* nvarchar(128) is (currently) equivalent to sysname which doesn't seem to exist in SqlDbType */);
parameter.Value = tableName;
object buff = command.ExecuteScalar();
if (buff != DBNull.Value
&& buff != null /* theoretically not possible since a FROM-less SELECT always returns a row */)
{
quotedTableName = buff.ToString();
}
}
if (quotedTableName != null)
{
using (SqlCommand command = new SqlCommand($"SELECT TOP 0 * FROM { quotedTableName };", connection))
{
...
}
}
...
(Or do the dynamic part on SQL Server directly, also using quotename(). But that seems overly and unnecessarily tedious, especially if you will do more than one operation on the table in different places.)
Aaron Bertrand's answer solved the problem, but a stored procedure is not useful for a class library that might interact with any database. Here is the way to write RetrieveEmptyDataTable (the method from my question) using his answer:
private DataTable RetrieveEmptyDataTable(string tableName)
{
const string tableNameParameter = "@TableName";
var query =
" IF EXISTS (SELECT 1 FROM sys.objects\n" +
$" WHERE name = {tableNameParameter})\n" +
" BEGIN\n" +
" DECLARE @sql nvarchar(max) = N'SELECT TOP 0 * \n" +
$" FROM ' + QUOTENAME({tableNameParameter}) + N';';\n" +
" EXEC sys.sp_executesql @sql;\n" +
"END";
using var command = new SqlCommand(query, _connection);
command.Parameters.Add(tableNameParameter, SqlDbType.NVarChar).Value = tableName;
using SqlDataAdapter dataAdapter = new SqlDataAdapter(command);
var table = new DataTable() { TableName = tableName };
Connect();
dataAdapter.Fill(table);
Disconnect();
return table;
}

ADO.NET and SQLite single cell select performance

I want to create a simple database at runtime, fill it with data from an internal resource, and then read each record in a loop. Previously I used LiteDb for that, but I couldn't squeeze any more time out of it, so I chose SQLite.
I think there are a few things to improve that I am not aware of.
Database creation process:
First step is to create table
using var create = transaction.Connection.CreateCommand();
create.CommandText = "CREATE TABLE tableName (Id TEXT PRIMARY KEY, Value TEXT) WITHOUT ROWID";
create.ExecuteNonQuery();
Next insert command is defined
var insert = transaction.Connection.CreateCommand();
insert.CommandText = "INSERT OR IGNORE INTO tableName VALUES (@Id, @Value)";
var idParam = insert.CreateParameter();
var valueParam = insert.CreateParameter();
idParam.ParameterName = "@" + IdColumn;
valueParam.ParameterName = "@" + ValueColumn;
insert.Parameters.Add(idParam);
insert.Parameters.Add(valueParam);
In a loop, each value is inserted:
idParam.Value = key;
valueParam.Value = value.ValueAsText;
insert.Parameters["@Id"] = idParam;
insert.Parameters["@Value"] = valueParam;
insert.ExecuteNonQuery();
Then the transaction is committed: transaction.Commit();
Create index
using var index = transaction.Connection.CreateCommand();
index.CommandText = "CREATE UNIQUE INDEX idx_tableName ON tableName(Id);";
index.ExecuteNonQuery();
After that I perform a million selects (each retrieving a single value):
using var command = _connection.CreateCommand();
command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";
var param = command.CreateParameter();
param.ParameterName = "@id";
param.Value = id;
command.Parameters.Add(param);
return command.ExecuteReader(CommandBehavior.SingleResult).ToString();
One connection is shared for all selects and is never closed. The insert is quite fast (less than a minute), but the selects are very troublesome. Is there a way to improve them?
The table is quite big (around 2 million records) and Value contains quite heavy serialized objects.
The System.Data.SQLite provider is used and the connection string contains these additional options: Version=3;Journal Mode=Off;Synchronous=off;
If you go for performance, you need to consider this: each independent SELECT command is a roundtrip to the DB with some extra cost. It's similar to an N+1 select problem in the case of parent-child relations.
The best thing you can do is to get a LIST of items (values):
SELECT Value FROM tableName WHERE Id IN (1, 2, 3, 4, ...);
Here's a link on how to code that: https://www.mikesdotnetting.com/article/116/parameterized-in-clauses-with-ado-net-and-linq
You could avoid recreating the select command for every Id: create it once and only execute it for each Id. From your code it seems every select goes through CreateCommand/CreateParameter and so on. See this for example: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.prepare?view=net-5.0 - you call .Prepare() once and then only execute (the executions don't need to be ExecuteNonQuery).
You could then check whether ExecuteScalar is faster, since it avoids creating a reader for a single value, like so: https://learn.microsoft.com/en-us/dotnet/api/system.data.idbcommand.executescalar?view=net-5.0
If scalar does not prove to be faster, you could try CommandBehavior.SingleRow instead of CommandBehavior.SingleResult in your ExecuteReader call for possible performance optimisations. According to this: https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior?view=net-5.0 it might work. I doubt it, but if the first two don't help, why not try it too.
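The "create once, execute many times" suggestion could be sketched roughly as follows. ValueLookup is a hypothetical wrapper name, and the code only uses the provider-neutral System.Data.Common base classes, so it assumes the connection passed in is the already open SQLite connection from the question. What Prepare() actually does depends on the provider, so any gain should be measured rather than assumed.

```csharp
using System;
using System.Data;
using System.Data.Common;

// Hypothetical wrapper: builds the SELECT command once and reuses it for every lookup.
public sealed class ValueLookup : IDisposable
{
    private readonly DbCommand _command;
    private readonly DbParameter _idParameter;

    public ValueLookup(DbConnection openConnection)
    {
        _command = openConnection.CreateCommand();
        _command.CommandText = "SELECT Value FROM tableName WHERE Id = @id;";

        _idParameter = _command.CreateParameter();
        _idParameter.ParameterName = "@id";
        _idParameter.DbType = DbType.String;
        _command.Parameters.Add(_idParameter);

        // Ask the provider to prepare the statement once, up front.
        _command.Prepare();
    }

    public string Get(string id)
    {
        // Only the parameter value changes between executions.
        _idParameter.Value = id;

        // ExecuteScalar returns the first column of the first row,
        // or null when no row matches.
        object result = _command.ExecuteScalar();
        return result == null || result == DBNull.Value ? null : Convert.ToString(result);
    }

    public void Dispose()
    {
        _command.Dispose();
    }
}
```

A caller would create one ValueLookup next to the shared connection and call Get(id) inside the loop, instead of building a new command and parameter for each of the million selects.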

How to fix SQL Injection Issue of truncation of table

Below is the line of code where I truncate table records. The table value is coming from the front end. In my Veracode scan, it is showing SQL injection. How can I avoid this? I cannot create a stored procedure as the connection string is dynamic where I need to truncate this table. Is there another approach?
SqlCommand cmd = connection.CreateCommand();
cmd.Transaction = transaction;
cmd.CommandText = "TRUNCATE TABLE " + tablename;
cmd.ExecuteNonQuery();
You need dynamic sql:
string sql = @"
DECLARE @SQL nvarchar(150);
SELECT @SQL = 'truncate table ' + quotename(table_name) + ';'
FROM information_schema.tables
WHERE table_name = @table;
EXEC(@SQL);";
using (var connection = new SqlConnection("connection string here"))
using (var cmd = new SqlCommand(sql, connection))
{
cmd.Transaction = transaction;
cmd.Parameters.Add("@table", SqlDbType.NVarChar, 128).Value = tablename;
connection.Open();
cmd.ExecuteNonQuery();
}
This is one of very few times dynamic SQL makes things more secure, rather than less. Even better, if you also maintain a special table in this database listing other tables users are allowed to truncate, and use that rather than information_schema to validate the name. The idea of letting users just truncate anything is kind of scary.
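As a rough illustration of the allowlist idea, here is a sketch that keeps the list in code instead of in a database table, which is a simplification of what the answer proposes. TruncateGuard and the two table names are hypothetical. The bracket quoting imitates what QUOTENAME does on the server, and it is only reached for names that are already on the list.

```csharp
using System;
using System.Collections.Generic;

public static class TruncateGuard
{
    // Hypothetical allowlist; the answer suggests storing this in a database table instead.
    private static readonly HashSet<string> AllowedTables =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "StagingOrders",
            "StagingCustomers"
        };

    // Returns the command text for an allowed table and throws for anything else.
    public static string BuildTruncateCommand(string tableName)
    {
        if (tableName == null || !AllowedTables.Contains(tableName))
        {
            throw new ArgumentException("Table is not on the truncate allowlist.", nameof(tableName));
        }

        // Bracket-quote the identifier and double any ']' characters.
        return "TRUNCATE TABLE [" + tableName.Replace("]", "]]") + "];";
    }
}
```

For example, BuildTruncateCommand("StagingOrders") yields TRUNCATE TABLE [StagingOrders]; while a value such as "Users; DROP TABLE Users" is rejected before any SQL text is built.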
Parameterized or not, you can only make this a little more secure, never totally secure. For this you need to:
Create a table TruncMapping in the DB with the columns
id guid
statement varchar(300)
Your data will look like
SOME-GUID-XXX-YYY, 'TRUNCATE TABLE TBL1'
In your front end, use a listbox or combobox with text/value pairs like "Customer Data"/"SOME-GUID-XXX-YYY".
In your code, use ExecuteScalar to run a SELECT statement against TruncMapping where id = @1, where the id is the parameterized GUID from the combo value.
Execute your truncate command using ExecuteNonQuery as you do now, but with the string retrieved by the previous call.
Your scan tool will most likely still choke. If it still thinks the code is unsafe, you can safely flag this as a false positive, because what you execute comes from your secured DB. A potential attacker has no way to sabotage your "non-truncatable" tables because they are not listed in the TruncMapping table.
You've just created a multi-layered defense against SQL injection.
Here is one way to hide it from scanning tools:
private const string _sql = "VFJVTkNBVEUgVEFCTEU=";
. . . .
var temp = new { t = tablename };
cmd.CommandText =
Encoding.ASCII.GetString(Convert.FromBase64String(_sql)) + temp.t.PadLeft(temp.t.Length + 1);
security by obscurity

How to Identify whether SQL job is successfully executed or not in C#

I have a C# method that executes a SQL job. It starts the job successfully and the code works perfectly.
I'm using the standard SQL stored procedure msdb.dbo.sp_start_job for this.
Here is my code:
public int ExcecuteNonquery()
{
var result = 0;
using (var execJob =new SqlCommand())
{
execJob.CommandType = CommandType.StoredProcedure;
execJob.CommandText = "msdb.dbo.sp_start_job";
execJob.Parameters.AddWithValue("@job_name", "myjobname");
using (_sqlConnection)
{
if (_sqlConnection.State == ConnectionState.Closed)
_sqlConnection.Open();
sqlCommand.Connection = _sqlConnection;
result = sqlCommand.ExecuteNonQuery();
if (_sqlConnection.State == ConnectionState.Open)
_sqlConnection.Close();
}
}
return result;
}
Here is the stored procedure that executes inside the job:
ALTER PROCEDURE [Area1].[Transformation]
AS
BEGIN
SET NOCOUNT ON;
SELECT NEXT VALUE FOR SQ_COMMON
-- Transform Master Data
exec [dbo].[sp_Transform_Address];
exec [dbo].[sp_Transform_Location];
exec [dbo].[sp_Transform_Product];
exec [dbo].[sp_Transform_Supplier];
exec [dbo].[sp_Transform_SupplierLocation];
-- Generate Hierarchies and Product References
exec [dbo].[sp_Generate_HierarchyObject] 'Area1',FGDemand,1;
exec [dbo].[sp_Generate_HierarchyObject] 'Area1',RMDemand,2;
exec [dbo].[sp_Generate_Hierarchy] 'Area1',FGDemand,1;
exec [dbo].[sp_Generate_Hierarchy] 'Area1',RMDemand,2;
exec [dbo].[sp_Generate_ProductReference] 'Area1',FGDemand,1;
exec [dbo].[sp_Generate_ProductReference] 'Area1',RMDemand,2;
-- Transform Demand Allocation BOM
exec [Area1].[sp_Transform_FGDemand];
exec [Area1].[sp_Transform_FGAllocation];
exec [Area1].[sp_Transform_RMDemand];
exec [Area1].[sp_Transform_RMAllocation];
exec [Area1].[sp_Transform_BOM];
exec [Area1].[sp_Transform_RMDemand_FK];
-- Transform Purchasing Document Data
exec [dbo].[sp_Transform_PurchasingDoc];
exec [dbo].[sp_Transform_PurchasingItem];
exec [dbo].[sp_Transform_ScheduleLine];
exec [dbo].[sp_CalculateRequirement] 'Area1'
exec [dbo].[sp_Create_TransformationSummary] 'Area1'
-- Truncate Integration Tables
exec [dbo].[sp_TruncateIntegrationTables] 'Area1'
END
The problem is that whether or not the job executes successfully, the method always returns -1. How can I identify whether the job executed successfully?
After running msdb.dbo.sp_start_job, the return code is mapped to an output parameter. You have the opportunity to control the parameter's name prior to execution:
public int StartMyJob( string connectionString )
{
using (var sqlConnection = new SqlConnection( connectionString ) )
{
sqlConnection.Open( );
using (var execJob = sqlConnection.CreateCommand( ) )
{
execJob.CommandType = CommandType.StoredProcedure;
execJob.CommandText = "msdb.dbo.sp_start_job";
execJob.Parameters.AddWithValue("@job_name", "myjobname");
execJob.Parameters.Add( "@results", SqlDbType.Int ).Direction = ParameterDirection.ReturnValue;
execJob.ExecuteNonQuery();
return ( int ) execJob.Parameters["@results"].Value;
}
}
}
You need to know the datatype of the return code to do this - and for sp_start_job, it's SqlDbType.Int.
However, this is only the result of starting the job, which is worth knowing, but isn't the result of running your job. To get the result of running your job, you can periodically execute:
msdb.dbo.sp_help_job @job_name = @jobName
One of the columns returned by the procedure is last_run_outcome and probably contains what you're really interested in. It will be 5 (unknown) while it's still running.
A job is usually a number of steps - where each step may or may not be executed according to the outcome of previous steps. Another procedure called sp_help_jobhistory supports a lot of filters to specify which specific invocation(s) and/or steps of the job you're interested in.
SQL likes to think about jobs as scheduled work - but there's nothing to keep you from just starting a job ad-hoc - although it doesn't really provide you with much support to correlate your ad-hoc job with an instance in the job history. Dates are about as good as it gets (unless somebody knows a trick I don't know.)
I've seen cases where the job is created ad-hoc just prior to running it, so the current ad-hoc execution is the only execution returned. But you end up with a lot of duplicate or near-duplicate jobs lying around that are never going to be executed again. That is something you'll have to plan on cleaning up afterwards, if you go that route.
A note on your use of the _sqlConnection variable. You don't want to do that. Your code disposes of it, but it was apparently created elsewhere before this method gets called. That's bad juju. You're better off just creating the connection and disposing of it in the same method. Rely on SQL connection pooling to make the connection fast - which is probably already turned on.
Also - in the code you posted - it looks like you started with execJob but switched to sqlCommand - and kinda messed up the edit. I assumed you meant execJob all the way through - and that's reflected in the example.
From MSDN about SqlCommand.ExecuteNonQuery Method:
For UPDATE, INSERT, and DELETE statements, the return value is the number of rows affected by the command. When a trigger exists on a table being inserted or updated, the return value includes the number of rows affected by both the insert or update operation and the number of rows affected by the trigger or triggers. For all other types of statements, the return value is -1. If a rollback occurs, the return value is also -1.
In this line:
result = sqlCommand.ExecuteNonQuery();
You want to return the number of rows affected by the command and save it to an int variable, but since the statement is a SELECT, it returns -1. If you test it with INSERT, DELETE, or UPDATE statements you will get the correct result.
By the way, if you want to get the number of rows affected by the SELECT command and save it to an int variable, you can try something like this:
select count(*) from jobs where myjobname = @myjobname
And then use ExecuteScalar to get the correct result:
result = (int)execJob.ExecuteScalar();
You need to run the stored procedure msdb.dbo.sp_help_job:
private int CheckAgentJob(string connectionString, string jobName) {
SqlConnection dbConnection = new SqlConnection(connectionString);
SqlCommand command = new SqlCommand();
command.CommandType = System.Data.CommandType.StoredProcedure;
command.CommandText = "msdb.dbo.sp_help_job";
command.Parameters.AddWithValue("@job_name", jobName);
command.Connection = dbConnection;
using (dbConnection)
{
dbConnection.Open();
using (command){
SqlDataReader reader = command.ExecuteReader();
reader.Read();
int status = reader.GetInt32(21); // Column 19 = last_run_date, column 20 = last_run_time, column 21 = last_run_outcome
reader.Close();
return status;
}
}
}
enum JobState { Failed = 0, Succeeded = 1, Retry = 2, Cancelled = 3, Unknown = 5};
Keep polling on Unknown until you get an answer. Let's hope it is Succeeded :-)
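The polling itself might be sketched like this. JobPolling and WaitForOutcome are hypothetical names, and the status reader is passed in as a delegate so the loop does not depend on how the outcome is fetched. The sketch takes the description above at face value, treating 5 (Unknown) as "still running", and adds a timeout so the caller is never blocked indefinitely.

```csharp
using System;
using System.Threading;

public static class JobPolling
{
    private const int UnknownOutcome = 5; // JobState.Unknown in the enum above

    // Calls readLastRunOutcome until it returns something other than Unknown,
    // or until the timeout elapses. Returns the last outcome that was read.
    public static int WaitForOutcome(Func<int> readLastRunOutcome, TimeSpan interval, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            int outcome = readLastRunOutcome();
            if (outcome != UnknownOutcome || DateTime.UtcNow >= deadline)
            {
                return outcome;
            }

            // Wait before asking the server again.
            Thread.Sleep(interval);
        }
    }
}
```

With the method above it could be called as JobPolling.WaitForOutcome(() => CheckAgentJob(connectionString, "myjobname"), TimeSpan.FromSeconds(5), TimeSpan.FromMinutes(10)); a returned value of 5 then means the timeout was reached without a definite outcome.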

SQL Insert one row or multiple rows data?

I am working on a console application to insert data to a MS SQL Server 2005 database. I have a list of objects to be inserted. Here I use Employee class as example:
List<Employee> employees;
What I can do is to insert one object at time like this:
foreach (Employee item in employees)
{
string sql = @"INSERT INTO Mytable (id, name, salary)
values ('@id', '@name', '@salary')";
// replace @par with values
cmd.CommandText = sql; // cmd is IDbCommand
cmd.ExecuteNonQuery();
}
Or I can build a bulk insert query like this:
string sql = #"INSERT INTO MyTable (id, name, salary) ";
int count = employees.Count;
int index = 0;
foreach (Employee item in employees)
{
sql = sql + string.Format(
"SELECT {0}, '{1}', {2} ",
item.ID, item.Name, item.Salary);
if ( index != (count-1) )
sql = sql + " UNION ALL ";
index++;
}
cmd.CommandText = sql;
cmd.ExecuteNonQuery();
I guess the latter case is going to insert all rows of data at once. However, if I have several thousand rows of data, is there any limit on the length of the SQL query string?
I am not sure if one insert with multiple rows is better than one insert with one row of data, in terms of performance?
Any suggestions to do it in a better way?
Actually, the way you have it written, your first option will be faster.
Your second example has a problem in it. You are doing sql = sql + ... inside the loop. This is going to cause a new string object to be created for each iteration of the loop. (Check out the StringBuilder class.) Technically, you are going to be creating a new string object in the first example too, but the difference is that it doesn't have to copy all the information over from the previous string object.
The way you have it set up, SQL Server is potentially going to have to evaluate a massive query when you finally send it, and it will definitely take some time to figure out what it is supposed to do. I should state that this depends on how many inserts you need to do. If n is small, you are probably going to be OK, but as it grows your problem will only get worse.
Bulk inserts are faster than individual ones due to how SQL Server handles batch transactions. If you are going to insert data from C#, you should take the first approach and wrap, say, every 500 inserts into a transaction and commit it, then do the next 500 and so on. This also has the advantage that if a batch fails, you can trap it, figure out what went wrong, and re-insert just those rows. There are other ways to do it, but that would definitely be an improvement over the two examples provided.
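// Assumes cmd.Connection is an open connection.
IDbTransaction transaction = null;
var iCounter = 0;
foreach (Employee item in employees)
{
if (iCounter == 0)
{
// start a new transaction for the next batch
transaction = cmd.Connection.BeginTransaction();
cmd.Transaction = transaction;
}
string sql = @"INSERT INTO Mytable (id, name, salary)
values ('@id', '@name', '@salary')";
// replace @par with values
cmd.CommandText = sql; // cmd is IDbCommand
cmd.ExecuteNonQuery();
iCounter++;
if (iCounter >= 500)
{
// commit every 500 inserts
transaction.Commit();
iCounter = 0;
}
}
// commit whatever is left in the last, partial batch
if (iCounter > 0)
transaction.Commit();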
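The same batching idea can be combined with real parameters instead of text replacement. The following is a minimal sketch under a few assumptions: the Employee shape (ID, Name, Salary and their types) is guessed from the question, EmployeeInserter and AddParameter are hypothetical helpers, and the connection passed in is already open. It uses only the provider-neutral System.Data interfaces.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical shape of the Employee class from the question.
public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

public static class EmployeeInserter
{
    public static void InsertInBatches(
        IDbConnection openConnection, IEnumerable<Employee> employees, int batchSize = 500)
    {
        using (IDbCommand cmd = openConnection.CreateCommand())
        {
            // One constant command text; only the parameter values change per row.
            cmd.CommandText = "INSERT INTO Mytable (id, name, salary) VALUES (@id, @name, @salary)";
            IDbDataParameter id = AddParameter(cmd, "@id", DbType.Int32);
            IDbDataParameter name = AddParameter(cmd, "@name", DbType.String);
            IDbDataParameter salary = AddParameter(cmd, "@salary", DbType.Decimal);

            IDbTransaction transaction = null;
            int pending = 0;
            try
            {
                foreach (Employee item in employees)
                {
                    if (transaction == null)
                    {
                        // Start a new transaction for the next batch.
                        transaction = openConnection.BeginTransaction();
                        cmd.Transaction = transaction;
                    }

                    id.Value = item.ID;
                    name.Value = item.Name;
                    salary.Value = item.Salary;
                    cmd.ExecuteNonQuery();

                    if (++pending >= batchSize)
                    {
                        transaction.Commit();
                        transaction.Dispose();
                        transaction = null;
                        pending = 0;
                    }
                }

                // Commit the last, partial batch if there is one.
                if (transaction != null)
                {
                    transaction.Commit();
                }
            }
            finally
            {
                if (transaction != null)
                {
                    transaction.Dispose();
                }
            }
        }
    }

    private static IDbDataParameter AddParameter(IDbCommand cmd, string parameterName, DbType type)
    {
        IDbDataParameter parameter = cmd.CreateParameter();
        parameter.ParameterName = parameterName;
        parameter.DbType = type;
        cmd.Parameters.Add(parameter);
        return parameter;
    }
}
```

Keeping batchSize as an argument makes it easy to tune the number of rows per transaction later without touching the loop.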
In MS SQL Server 2008 you can create a user-defined table type that describes your table:
CREATE TYPE MyUdt AS TABLE (Id int, Name nvarchar(50), salary int)
Then you can use this type in your stored procedures and in your C# code to do batch inserts.
SP:
CREATE PROCEDURE uspInsert
(@MyTvp AS MyUdt READONLY)
AS
INSERT INTO [MyTable]
SELECT * FROM @MyTvp
C# (imagine that the records you need to insert are already contained in the table "MyTable" of the DataSet ds):
using(conn)
{
SqlCommand cmd = new SqlCommand("uspInsert", conn);
cmd.CommandType = CommandType.StoredProcedure;
SqlParameter myParam = cmd.Parameters.AddWithValue
("@MyTvp", ds.Tables["MyTable"]);
myParam.SqlDbType = SqlDbType.Structured;
myParam.TypeName = "dbo.MyUdt";
// Execute the stored procedure
cmd.ExecuteNonQuery();
}
So, this is the solution.
Finally, I want to warn you against using code like yours (building a string and then executing it), because this way of executing commands is open to SQL injection.
Look at this thread,
where I've answered about table-valued parameters.
Bulk-copy is usually faster than doing inserts on your own.
If you still want to do it in one of your suggested ways, you should make it so that you can easily change the size of the queries you send to the server. That way you can optimize for speed in your production environment later on. Query times may vary a lot depending on the query size.
The maximum batch size for a SQL Server query is listed as 65,536 * the network packet size. The network packet size is 4 KB by default but can be changed. Check out the Maximum capacity article for SQL 2008 to get the scope. SQL 2005 also appears to have the same limit.
