I have a process that takes a list and inserts it into a database using SQL bulk copy, because the list can be particularly large. It works fine and checks constraints and all, which is perfect. The problem is, if I have 10,000 records and one of those records has an error, I still want to commit the other 9,999. Is there a way to do this other than manually checking each constraint before the bulk copy, or inserting one row at a time? Both seem tedious and slow, which kind of defeats the point. Thanks.
var copy = new SqlBulkCopy(ConfigurationManager.ConnectionStrings["constr"].ConnectionString, SqlBulkCopyOptions.CheckConstraints)
{
DestinationTableName = obj.TableName
};
var table = new DataTable();
copy.WriteToServer(table);
Without setting the batch size to 1 (which would defeat the purpose of the bulk copy) or pre-checking the data before the copy, the normal way around this issue is to copy into a temporary table with the same schema as your target table but with no constraints, remove the rows that would violate the constraints on insert, and then do a normal insert from the temp table into your live table.
const string _createTableString = "Create table #temp (/* SNIP */)";
const string _insertTableString = @"
declare @sql nvarchar(2000)
set @sql = N'INSERT INTO ' + QUOTENAME(@tableName) + N' SELECT * from #temp'
exec sp_executesql @sql";
using (var connection = new SqlConnection(ConfigurationManager.ConnectionStrings["constr"].ConnectionString))
{
connection.Open();
using (var command = new SqlCommand(_createTableString, connection))
{
command.ExecuteNonQuery();
}
using (var copy = new SqlBulkCopy(connection))
{
copy.DestinationTableName = "#temp";
copy.WriteToServer(table);
}
using (var command = new SqlCommand(_insertTableString, connection))
{
command.Parameters.AddWithValue("@tableName", obj.TableName);
command.ExecuteNonQuery();
}
}
Note the use of QUOTENAME to make sure that no SQL injection can sneak in via the table name passed in obj.TableName.
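The "remove the rows that would violate the constraints" step is elided above; a minimal sketch of what it could look like, slotted between the bulk copy and the final insert, assuming a hypothetical live table dbo.LiveTable with a unique key on Id (in practice you would build this dynamically with QUOTENAME, like the insert):
const string _pruneTableString = @"
delete t
from #temp as t
where exists (select 1 from dbo.LiveTable as l where l.Id = t.Id);";
using (var command = new SqlCommand(_pruneTableString, connection))
{
    command.ExecuteNonQuery();
}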
I'm writing a C# class library in which one of the features is the ability to create an empty data table that matches the schema of any existing table.
For example, this:
private DataTable RetrieveEmptyDataTable(string tableName)
{
var table = new DataTable() { TableName = tableName };
using var command = new SqlCommand($"SELECT TOP 0 * FROM {tableName}", _connection);
using SqlDataAdapter dataAdapter = new SqlDataAdapter(command);
dataAdapter.Fill(table);
return table;
}
The above code works, but it has a glaring security vulnerability: SQL injection.
My first instinct is to parameterize the query like so:
using var command = new SqlCommand("SELECT TOP 0 * FROM #tableName", _connection);
command.Parameters.AddWithValue("#tableName", tableName);
But this leads to the following exception:
Must declare the table variable "@tableName"
After a quick search on Stack Overflow I found this question, which recommends using my first approach (the one with the SQL injection vulnerability). That doesn't help at all, so I kept searching and found this question, which says that the only secure solution would be to hard-code the possible tables. Again, this doesn't work for my class library, which needs to work with arbitrary table names.
My question is this: how can I parameterize the table name without vulnerability to SQL injection?
An arbitrary table name still has to exist, so you can check first that it does:
IF EXISTS (SELECT 1 FROM sys.objects WHERE name = @TableName)
BEGIN
... do your thing ...
END
And further, if the list of tables you want to allow the user to select from is known and finite, or matches a specific naming convention (like dbo.Sales%), or belongs to a specific schema (like Reporting), you could add additional predicates to check for those.
This requires you to pass the table name in as a proper parameter, not concatenate or token-replace it. (And please don't use AddWithValue() for anything, ever.)
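For example, a typed parameter instead of AddWithValue() (a small sketch; nvarchar(128) matches sysname):
command.Parameters.Add("@TableName", SqlDbType.NVarChar, 128).Value = tableName;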
Once your check that the object is real and valid has passed, you will still have to build your SQL query dynamically, because you still won't be able to parameterize the table name. You should still apply QUOTENAME(), though, as I explain in these posts:
Protecting Yourself from SQL Injection in SQL Server - Part 1
Protecting Yourself from SQL Injection in SQL Server - Part 2
So the final code would be something like:
CREATE PROCEDURE dbo.SelectFromAnywhere
@TableName sysname
AS
BEGIN
IF EXISTS (SELECT 1 FROM sys.objects
WHERE name = @TableName)
BEGIN
DECLARE @sql nvarchar(max) = N'SELECT *
FROM ' + QUOTENAME(@TableName) + N';';
EXEC sys.sp_executesql @sql;
END
ELSE
BEGIN
PRINT 'Nice try, robot.';
END
END
GO
If you also want it to be in some defined list you can add
AND @TableName IN (N't1', N't2', …)
Or LIKE <some pattern> or join to sys.schemas or what have you.
Provided nobody has the rights to then modify the procedure to change the checks, there is no value you can pass to @TableName that will allow you to do anything malicious, other than maybe selecting from another table you didn't expect because someone with too much access was able to create it before calling the code. Replacing characters like -- or ; does not make this any safer.
You could pass the table name to SQL Server, apply quotename() to properly quote it, and subsequently use only the quoted name.
Something along the lines of:
...
string quotedTableName = null;
using (SqlCommand command = new SqlCommand("SELECT quotename(#tablename);", connection))
{
SqlParameter parameter = command.Parameters.Add("@tablename", System.Data.SqlDbType.NVarChar, 128 /* nvarchar(128) is (currently) equivalent to sysname which doesn't seem to exist in SqlDbType */);
parameter.Value = tableName;
object buff = command.ExecuteScalar();
if (buff != DBNull.Value
&& buff != null /* theoretically not possible since a FROM-less SELECT always returns a row */)
{
quotedTableName = buff.ToString();
}
}
if (quotedTableName != null)
{
using (SqlCommand command = new SqlCommand($"SELECT TOP 0 FROM { quotedTableName };", connection))
{
...
}
}
...
(Or do the dynamic part on SQL Server directly, also using quotename(). But that seems overly and unnecessarily tedious, especially if you will do more than one operation on the table in different places.)
Aaron Bertrand's answer solved the problem, but a stored procedure is not useful for a class library that might interact with any database. Here is the way to write RetrieveEmptyDataTable (the method from my question) using his answer:
private DataTable RetrieveEmptyDataTable(string tableName)
{
const string tableNameParameter = "@TableName";
var query =
" IF EXISTS (SELECT 1 FROM sys.objects\n" +
$" WHERE name = {tableNameParameter})\n" +
" BEGIN\n" +
" DECLARE #sql nvarchar(max) = N'SELECT TOP 0 * \n" +
$" FROM ' + QUOTENAME({tableNameParameter}) + N';';\n" +
" EXEC sys.sp_executesql #sql;\n" +
"END";
using var command = new SqlCommand(query, _connection);
command.Parameters.Add(tableNameParameter, SqlDbType.NVarChar).Value = tableName;
using SqlDataAdapter dataAdapter = new SqlDataAdapter(command);
var table = new DataTable() { TableName = tableName };
Connect();
dataAdapter.Fill(table);
Disconnect();
return table;
}
Below is the line of code where I truncate table records. The table value comes from the front end. In my Veracode scan, it is flagged for SQL injection. How can I avoid this? I cannot create a stored procedure, as the connection string is dynamic where I need to truncate this table. Is there another approach?
SqlCommand cmd = connection.CreateCommand();
cmd.Transaction = transaction;
cmd.CommandText = "TRUNCATE TABLE " + tablename;
cmd.ExecuteNonQuery();
You need dynamic sql:
string sql = #"
DECLARE #SQL nvarchar(150);
SELECT #SQL = 'truncate table ' + quotename(table_name) + ';'
FROM information_schema.tables
WHERE table_name = #table;
EXEC(#SQL);";
using (var connection = new SqlConnection("connection string here"))
using (var cmd = new SqlCommand(sql, connection))
{
cmd.Transaction = transaction;
cmd.Parameters.Add("#table", SqlDbType.NVarChar, 128).Value = tablename;
connection.Open();
cmd.ExecuteNonQuery();
}
This is one of very few times dynamic SQL makes things more secure, rather than less. Even better, also maintain a special table in this database listing the tables users are allowed to truncate, and use that rather than information_schema to validate the name. The idea of letting users just truncate anything is kind of scary.
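A minimal sketch of that allow-list variant, assuming a hypothetical admin-maintained table dbo.TruncateAllowList with a table_name column (the surrounding C# stays the same as above):
string sql = @"
DECLARE @SQL nvarchar(150);
SELECT @SQL = 'truncate table ' + quotename(table_name) + ';'
FROM dbo.TruncateAllowList
WHERE table_name = @table;
EXEC(@SQL);";
If the name is not in the allow list, @SQL stays NULL and EXEC(@SQL) does nothing.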
Parameterized or not, you can make this only a little more secure in this case. Never totally secure. For this you need to
create a table TruncMapping in the DB where you store
id guid
statement varchar(300)
your data will look like
SOME-GUID-XXX-YYY, 'TRUNCATE TABLE TBL1'
In your front end, use a listbox or combobox with text/value pairs like "Customer Data"/"SOME-GUID-XXX-YYY"
In your code, use ExecuteScalar to execute a SELECT statement from TruncMapping where id = @1, where id will be the parameterized GUID from the combo value
Execute your truncate command using ExecuteNonQuery as you do now, but with the string retrieved from the previous call.
Your scan tool will most likely choke. If it still thinks the code is unsafe, you can safely flag this as a false positive, because what you execute comes from your secured DB. A potential attacker has no way to sabotage your "non-truncatable" tables because they are not listed in the TruncMapping table.
You've just created a multi-layered defense against SQL injection.
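A sketch of that lookup-then-execute flow, with hypothetical names matching the description (selectedGuid is assumed to come from the combobox value, connection to be open):
string statement;
using (var lookup = new SqlCommand("SELECT statement FROM TruncMapping WHERE id = @id", connection))
{
    lookup.Parameters.Add("@id", SqlDbType.UniqueIdentifier).Value = selectedGuid;
    statement = lookup.ExecuteScalar() as string; // null if the GUID is not mapped
}
if (statement != null)
{
    // Execute only what came back from the trusted TruncMapping table.
    using (var truncate = new SqlCommand(statement, connection))
    {
        truncate.ExecuteNonQuery();
    }
}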
Here is one way to hide it from scanning tools:
private const string _sql = "VFJVTkNBVEUgVEFCTEU=";
. . . .
var temp = new { t = tablename };
cmd.CommandText =
Encoding.ASCII.GetString(Convert.FromBase64String(_sql)) + temp.t.PadLeft(temp.t.Length + 1);
security by obscurity
I'm working with a hosted service in C# ASP.NET Core, LINQ, and T-SQL.
I need to insert records into my database one by one.
Of course this is not a fast operation, but I'm not that experienced in this field, so maybe I'm doing it wrong.
This is my code in my manager:
public void StrategyMassive(string foldpathsave)
{
using (IServiceScope scope = _services.CreateScope())
{
List<string> filesreading = new List<string>();
VUContext _context = scope.ServiceProvider.GetRequiredService<VUContext>();
List<string> filesnumber = File.ReadAllLines(foldpathsave).ToList();
filesreading = filesnumber.ToList();
filesreading.RemoveRange(0, 2);
foreach (string singlefile in filesreading)
{
//INTERNAL DATA NORMALIZATION
_repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
_repository.Save(_context);
}
}
}
And this is my repository interface:
public void ImportAdd(VUContext _context, AVuTable newVUL, ACs2Table newC2, ACs3Table newC3, ADateTable newDATE)
{
    _context.AVuTable.Add(newVUL);
    _context.ADateTable.Add(newDATE);
    if (newC2 != null)
    {
        _context.ACs2Table.Add(newC2);
    }
    if (newC3 != null)
    {
        _context.ACs3Table.Add(newC3);
    }
}

public void Save(VUContext _context)
{
    _context.SaveChanges();
}
It is all quite simple, I know, so how can I speed up this insert while keeping it one record at a time?
Start by NOT using the slowest way to do it.
It starts with the way you actually load the files.
It goes on with not using SqlBulkCopy (possibly in multiple threads) to write the data to the database.
What you do is the slowest possible way, because EntityFramework is NOT an ETL tool.
Btw., one transaction per item (SaveChanges) does not help either. It makes a super slow solution really, really super slow.
I manage to load around 64k rows per second per thread, with 4-6 threads running in parallel.
In my experience, SqlBulkCopy is the fastest way to do it. filesnumber sounds like a misnomer, and I suspect you are reading a list of delimited files to be loaded into SQL Server after some normalization process. It would probably be even faster if you did your normalization on the server side, after loading the data initially into a temp table. Here is a sample SqlBulkCopy from a delimited file:
void Main()
{
Stopwatch sw = new Stopwatch();
sw.Start();
string sqlConnectionString = @"server=.\SQLExpress2012;Trusted_Connection=yes;Database=SampleDb";
string path = @"d:\temp\SampleTextFiles";
string fileName = @"combDoubledX.csv";
using (OleDbConnection cn = new OleDbConnection(
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+path+
";Extended Properties=\"text;HDR=No;FMT=Delimited\";"))
using (SqlConnection scn = new SqlConnection( sqlConnectionString ))
{
OleDbCommand cmd = new OleDbCommand("select * from "+fileName, cn);
SqlBulkCopy sbc = new SqlBulkCopy(scn, SqlBulkCopyOptions.TableLock,null);
sbc.ColumnMappings.Add(0,"[Category]");
sbc.ColumnMappings.Add(1,"[Activity]");
sbc.ColumnMappings.Add(5,"[PersonId]");
sbc.ColumnMappings.Add(6,"[FirstName]");
sbc.ColumnMappings.Add(7,"[MidName]");
sbc.ColumnMappings.Add(8,"[LastName]");
sbc.ColumnMappings.Add(12,"[Email]");
cn.Open();
scn.Open();
SqlCommand createTemp = new SqlCommand();
createTemp.CommandText = @"if exists
(SELECT * FROM tempdb.sys.objects
WHERE object_id = OBJECT_ID(N'[tempdb]..[##PersonData]','U'))
BEGIN
drop table [##PersonData];
END
create table ##PersonData
(
[Id] int identity primary key,
[Category] varchar(50),
[Activity] varchar(50) default 'NullOlmasin',
[PersonId] varchar(50),
[FirstName] varchar(50),
[MidName] varchar(50),
[LastName] varchar(50),
[Email] varchar(50)
)
";
createTemp.Connection = scn;
createTemp.ExecuteNonQuery();
OleDbDataReader rdr = cmd.ExecuteReader();
sbc.NotifyAfter = 200000;
//sbc.BatchSize = 1000;
sbc.BulkCopyTimeout = 10000;
sbc.DestinationTableName = "##PersonData";
//sbc.EnableStreaming = true;
sbc.SqlRowsCopied += (sender,e) =>
{
Console.WriteLine("-- Copied {0} rows to {1}.[{2} milliseconds]",
e.RowsCopied,
((SqlBulkCopy)sender).DestinationTableName,
sw.ElapsedMilliseconds);
};
sbc.WriteToServer(rdr);
if (!rdr.IsClosed) { rdr.Close(); }
cn.Close();
scn.Close();
}
sw.Stop();
sw.Dump(); // LINQPad's Dump(); outside LINQPad use e.g. Console.WriteLine(sw.ElapsedMilliseconds)
}
And few sample lines from that file:
"Computer Labs","","LRC 302 Open Lab","","","10057380","Test","","Cetin","","5550123456","","cb#nowhere.com"
"Computer Labs","","LRC 302 Open Lab","","","123456789","John","","Doe","","5551234567","","jdoe#somewhere.com"
"Computer Labs","","LRC 302 Open Lab","","","012345678","Mary","","Doe","","5556666444","","mdoe#here.com"
You could create and run a list of Task<>s, each doing a SqlBulkCopy reading from a source (SqlBulkCopy supports a series of readers).
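A rough sketch of that idea, assuming readers is a collection of independent IDataReader sources and the destination tolerates concurrent loads (readers and connectionString are assumptions, not from the question):
var tasks = readers.Select(reader => Task.Run(() =>
{
    // One SqlBulkCopy per task, each with its own connection and reader.
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "##PersonData";
            bulkCopy.WriteToServer(reader); // each task streams its own source
        }
    }
})).ToList();
Task.WaitAll(tasks.ToArray());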
For faster operation you need to reduce the number of database roundtrips.
Using the batching of statements feature in EF Core
You can see this feature is available only in EF Core, so you need to migrate to EF Core if you are still using EF 6.
Compare EF Core & EF6
For this feature to work you need to move the Save operation outside of the loop, as sketched below.
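In the code from the question that would look something like this (a sketch, reusing the question's names):
foreach (string singlefile in filesreading)
{
    //INTERNAL DATA NORMALIZATION
    _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
}
_repository.Save(_context); // one SaveChanges call, so EF Core can batch the inserts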
Bulk insert
The bulk insert feature is designed to be the fastest way to insert a large number of database records.
Bulk Copy Operations in SQL Server
To use it you need to use the SqlBulkCopy class for SQL Server, and your code needs considerable rework; a rough sketch follows.
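A sketch of that rework, with an assumed single-column staging shape (the real destination table and columns would come from your normalization; connectionString is an assumption):
// Collect the normalized rows into a DataTable, then write them in one bulk copy.
var table = new DataTable();
table.Columns.Add("Value", typeof(string)); // assumed column
foreach (string singlefile in filesreading)
{
    table.Rows.Add(singlefile);
}
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "dbo.AVuTable"; // assumed destination
        bulkCopy.WriteToServer(table);
    }
}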
I have written a program in .NET that should copy table data from one server to another. However I am getting an error:
cannot access destination table "mytable"
Despite googling and looking everywhere, I cannot find a solution to the error I am getting.
Some posts mentions permissions and I have done the following:
GRANT SELECT, UPDATE, DELETE, INSERT TO bulkadmin
but still no success.
Am I missing the obvious?
Help is greatly appreciated.
EDIT
I bulk copy 3 databases with 1000 tables into one "target" database.
I have simplified the code that I use and also tested it, with no luck. The intention is to do it in parallel, but I want to get it working with a simple table first.
private void TestBulkCopy(string sourceServer, string sourceDatabase, List<string> sourceTables)
{
string connectionStringSource = ConfigurationManager.ConnectionStrings["TestDB"].ConnectionString;
string connectionStringTarget = ConfigurationManager.ConnectionStrings["TestDB"].ConnectionString;
string sqlGetDataFromSource = string.Format("SELECT * FROM {0}", "testTable");
using (var sourceConnection = new SqlConnection(connectionStringSource))
{
sourceConnection.Open();
using (var cmdSource = new SqlCommand(sqlGetDataFromSource, sourceConnection))
using (SqlDataReader readerSource = cmdSource.ExecuteReader())
{
using (var sqlTargetConnection = new SqlConnection(connectionStringTarget))
{
sqlTargetConnection.Open();
using (var bulkCopy = new SqlBulkCopy(sqlTargetConnection, SqlBulkCopyOptions.TableLock, null))
{
bulkCopy.DestinationTableName = "testTable";
bulkCopy.SqlRowsCopied += OnSqlRowsCopied;
bulkCopy.BatchSize = 2600;
bulkCopy.NotifyAfter = 50;
bulkCopy.BulkCopyTimeout = 60;
bulkCopy.WriteToServer(readerSource);
}
}
}
}
}
Write the schema before the table name.
Change
bulkCopy.DestinationTableName = "testTable";
to
bulkCopy.DestinationTableName = "dbo.testTable";
I think your destination table has a field defined as an auto-number identity column, so SqlBulkCopy cannot copy values into that column. You must turn IDENTITY_INSERT ON for the destination table while inserting explicit values, using code like this:
BEGIN
SET IDENTITY_INSERT [building] ON;
INSERT INTO [Table2](.....)
VALUES(@id, @id_project,....)
SET IDENTITY_INSERT [building] OFF;
END
or edit the definition of the destination table and remove the auto-number identity on that column.
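If the copy is done with SqlBulkCopy, the client-side equivalent of IDENTITY_INSERT is the KeepIdentity option (a sketch, using the names from the question's code):
// KeepIdentity preserves the source identity values instead of letting
// the destination generate new ones.
using (var bulkCopy = new SqlBulkCopy(sqlTargetConnection, SqlBulkCopyOptions.KeepIdentity, null))
{
    bulkCopy.DestinationTableName = "dbo.testTable";
    bulkCopy.WriteToServer(readerSource);
}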
The destination table name used by SqlBulkCopy's WriteToServer must be surrounded with [ ] brackets.
I'm working with 2 SQL 2008 Servers on different machines. The server names are source.ex.com, and destination.ex.com.
destination.ex.com is linked to source.ex.com and the appropriate permissions are in place for source.ex.com to write to a database called bacon-wrench on destination.ex.com
I've logged into source.ex.com via SSMS and tested this query (successfully):
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (4,6);
In a C# .NET 4.0 WebPage I connect to source.ex.com and perform a similar query (successfully):
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = #"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (34,56);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
For small sets of insert statements (say 20 or less) doing something like this performs fine:
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = #"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (34,56);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (22,11);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (33,55);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (1,2);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
I'm trying to do something like this with around 20000 records. The above method takes 11 minutes to complete, which I assume is the server screaming at me to make it some kind of bulk operation. From other StackOverflow threads the SqlBulkCopy class was recommended, and it takes as a parameter a DataTable, perfect!
So I build a DataTable and attempt to write it to the server (fail):
DataTable dt = new DataTable();
dt.Columns.Add("PunchID", typeof(int));
dt.Columns.Add("BaconID", typeof(int));
for(int i = 0; i < 20000; i++)
{
//I realize this would make 20000 duplicate
//rows but its not important
dt.Rows.Add(new object[] {
11, 33
});
}
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
using(SqlBulkCopy bulk = new SqlBulkCopy(c))
{
bulk.DestinationTableName = "[destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]";
bulk.ColumnMappings.Add("PunchID", "PunchID");
bulk.ColumnMappings.Add("BaconID", "BaconID");
bulk.WriteToServer(dt);
}
}
EDIT2: The below message is what I'm attempting to fix:
The web page crashes at bulk.WriteToServer(dt); with the error message: Database bacon-wrench does not exist. Please ensure it is typed correctly. What am I doing wrong? How do I change this to get it to work?
EDIT1:
I was able to speed up the query significantly using the below syntax. But it is still very slow for such a small record set.
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = #"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES
(34,56),
(22,11),
(33,55),
(1,2);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
If you are using SQL Server 2008+, you can introduce a table user-defined datatype. Prepare the type, the receiving table, and the stored procedure, something like below. The data type and stored procedure live on the local system. I generally have an if statement in the code detecting whether the table is remote or local; for remote tables I do this, for local ones I use SqlBulkCopy.
if(TYPE_ID(N'[Owner].[TempTableType]') is null)
begin
CREATE TYPE [Owner].[TempTableType] AS TABLE ( [PendingID] uniqueidentifier, [Reject] bit)
end
IF NOT EXISTS (SELECT * FROM [LinkedServer].[DatabaseOnLS].sys.tables where name = 'TableToReceive')
EXEC('
CREATE TABLE [DatabaseOnLS].[Owner].[TableToReceive] ( [PendingID] uniqueidentifier, [Reject] bit)
') AT [LinkedServer]
else
EXEC('
TRUNCATE TABLE [DatabaseOnLS].[Owner].[TableToReceive]
') AT [LinkedServer]
CREATE PROCEDURE [Owner].[TempInsertTable]
@newTableType TempTableType readonly
AS
BEGIN
insert into [LinkedServer].[DatabaseOnLS].[Owner].[TableToReceive] select * from @newTableType
END
In the C# code you can then do something like this to insert the DataTable into the table on the linked server (I'm using an existing UnitOfWork, which already has a connection and transaction):
using (var command = new SqlCommand("TempInsertTable",
oUoW.Database.Connection as SqlConnection) { CommandType = CommandType.StoredProcedure }
)
{
command.Transaction = oUoW.Database.CurrentTransaction as SqlTransaction;
command.Parameters.Add(new SqlParameter("@newTableType", oTempTable));
drResults = command.ExecuteReader();
drResults.Close();
}
After trying a number of things including linked server settings, collations, synonyms, etc., I eventually got to this error message:
Inserting into remote tables or views is not allowed by using the BCP utility or by using BULK INSERT.
Perhaps you can bulk insert to a staging table on your local server (your code works fine for this) and then insert from that staging table to your linked server, followed by a local delete of the staging rows. You'll have to test for performance.
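A sketch of that staging approach, reusing the question's connection c and DataTable dt (dbo.FruitPunchStaging is a hypothetical local table with the same two columns):
// 1) Bulk copy into the local staging table; this is the part that already works.
using (var bulk = new SqlBulkCopy(c))
{
    bulk.DestinationTableName = "dbo.FruitPunchStaging";
    bulk.WriteToServer(dt);
}
// 2) Forward the staged rows across the linked server, then clear the stage.
const string forward = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch] (PunchID, BaconID)
SELECT PunchID, BaconID FROM dbo.FruitPunchStaging;
DELETE FROM dbo.FruitPunchStaging;";
using (var cmd = new SqlCommand(forward, c))
{
    cmd.ExecuteNonQuery();
}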