Slow insert performance with large amounts of data (SQL Server / C#) - c#

I'm working with electronic equipment that digitizes waveforms in real time (each device generates around 1,000 512-byte arrays per second, and we have 12 devices). I've written a client for these devices in C# that, for the most part, works fine and has no performance issues.

However, one of the requirements for the application is archival, and Microsoft SQL Server 2010 was mandated as the storage mechanism (outside of my control). The database layout is very simple: there is one table per device per day ("Archive_Dev02_20131015" etc). Each table has an Id column, a timestamp column, a Data column (varbinary) and 20 more integer columns with some metadata. There's a clustered primary key on Id and timestamp, and another separate index on timestamp. My naive approach was to queue all data in the client application and then insert everything into the database at 5-second intervals using SqlCommand.
The basic mechanism looks like this:
using (SqlTransaction transaction = connection.BeginTransaction())
{
    //Beginning of the insert sql statement...
    string sql = "USE [DatabaseName]\r\n" +
                 "INSERT INTO [dbo].[Archive_Dev02_20131015]\r\n" +
                 "(\r\n" +
                 "    [Timestamp], \r\n" +
                 "    [Data], \r\n" +
                 "    [IntField1], \r\n" +
                 "    [...] \r\n" +
                 ") \r\n" +
                 "VALUES \r\n" +
                 "(\r\n" +
                 "    @timestamp, \r\n" +
                 "    @data, \r\n" +
                 "    @int1, \r\n" +
                 "    @... \r\n" +
                 ")";

    using (SqlCommand cmd = new SqlCommand(sql))
    {
        cmd.Connection = connection;
        cmd.Transaction = transaction;
        cmd.Parameters.Add("@timestamp", System.Data.SqlDbType.DateTime);
        cmd.Parameters.Add("@data", System.Data.SqlDbType.Binary);
        cmd.Parameters.Add("@int1", System.Data.SqlDbType.Int);

        foreach (var sample in samples)
        {
            cmd.Parameters[0].Value = sample.ReceiveDate;
            cmd.Parameters[1].Value = sample.Data; //Data is a byte array
            cmd.Parameters[1].Size = sample.Data.Length;
            cmd.Parameters[2].Value = sample.IntValue1;
            ...

            int affected = cmd.ExecuteNonQuery();
            if (affected != 1)
            {
                throw new Exception("Could not insert sample into the database!");
            }
        }
    }

    transaction.Commit();
}
To summarize: one transaction wrapping a loop that builds the insert statement and executes it once per sample.
This method turned out to be very, very slow. On my machine (i5-2400 @ 3.1 GHz, 8 GB RAM, using .NET 4.0 and SQL Server 2008, 2 internal HDs in a mirror, everything runs locally), it takes about 2.5 seconds to save the data from 2 devices, so saving 12 devices every 5 seconds is impossible.
To compare, I've written a small SQL script (actually I extracted the SQL that the C# code runs, using SQL Server Profiler) that does the same thing directly on the server (still running on my own machine):
set statistics io on
go
begin transaction
go
declare @i int = 0;
while @i < 24500 begin
    SET @i = @i + 1
    exec sp_executesql N'USE [DatabaseName]
    INSERT INTO [dbo].[Archive_Dev02_20131015]
    (
        [Timestamp],
        [Data],
        [int1],
        ...
        [int20]
    )
    VALUES
    (
        @timestamp,
        @data,
        @int1,
        ...
        @int20
    )',N'@timestamp datetime,@data binary(118),@int1 int,...,@int20 int',
    @timestamp='2013-10-14 14:31:12.023',
    @data=0xECBD07601C499625262F6DCA7B7F4AF54AD7E074A10880601324D8904010ECC188CDE692EC1D69472329AB2A81CA6556655D661640CCED9DBCF7DE7BEFBDF7DE7BEFBDF7BA3B9D4E27F7DFFF3F5C6664016CF6CE4ADAC99E2180AAC81F3F7E7C1F3F22FEEF5FE347FFFDBFF5BF1FC6F3FF040000FFFF,
    @int1=0,
    ...
    @int20=0
end
commit transaction
This does (imo, but I'm probably wrong ;) ) the same thing, only this time I'm using 24500 iterations to simulate the 12 devices at once. The query takes about 2 seconds. If I use the same number of iterations as the C# version, the query runs in less than a second.
So my first question is: why does it run way faster on SQL server than in C#? Does this have anything to do with the connection (local tcp)?
To make matters more confusing (to me), this code runs twice as slowly on the production server (IBM BladeCenter, 32 GB RAM, fiber connection to the SAN, ... file system operations are really fast). I've tried looking at the SQL Activity Monitor and write performance never goes above 2 MB/sec, but this might as well be normal. I'm a complete newbie to SQL Server (about the polar opposite of a competent DBA, in fact).
Any ideas on how I can make the C# code more performant?

By far the best approach for loading this sort of data is to use a table-valued parameter, and a stored procedure that takes the data. A really simple example of a table type and procedure that uses it would be:
CREATE TYPE [dbo].[StringTable]
AS TABLE ([Value] [nvarchar] (MAX) NOT NULL)
GO
CREATE PROCEDURE [dbo].[InsertStrings]
    @Paths [dbo].[StringTable] READONLY
AS
    INSERT INTO [dbo].[MyTable] ([Value])
    SELECT [Value] FROM @Paths
GO
Then the C# code would be something along the lines of (please bear in mind that I've typed this into the S/O editor so there might be typos):
private static IEnumerable<SqlDataRecord> TransformStringList(ICollection<string> source)
{
    if (source == null || source.Count == 0)
    {
        return null;
    }
    return GetRecords(source,
        () => new SqlDataRecord(new SqlMetaData("Value", SqlDbType.NVarChar, -1)),
        (record, value) => record.SetString(0, value));
}

private static IEnumerable<SqlDataRecord> GetRecords<T>(IEnumerable<T> source, Func<SqlDataRecord> factory, Action<SqlDataRecord, T> hydrator)
{
    SqlDataRecord dataRecord = factory();
    foreach (var value in source)
    {
        hydrator(dataRecord, value);
        yield return dataRecord;
    }
}

private static void InsertStrings(ICollection<string> strings, SqlConnection connection)
{
    using (var transaction = connection.BeginTransaction())
    {
        using (var cmd = new SqlCommand("dbo.InsertStrings"))
        {
            cmd.Connection = connection;
            cmd.Transaction = transaction;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add(new SqlParameter("@Paths", SqlDbType.Structured) { Value = TransformStringList(strings) });
            cmd.ExecuteNonQuery();
        }

        transaction.Commit();
    }
}
This approach has speed that rivals SqlBulkCopy, but it also gives you better control, because the data goes through a procedure where you can apply your own logic, and it makes concurrency a lot easier to deal with.
Edit -> Just for completeness, this approach works on SQL Server 2008 and up. Seeing as there isn't such a thing as SQL Server 2010 I thought I'd better mention that.

In SQL Server:
CREATE TYPE [dbo].[ArchiveData]
AS TABLE (
    [Timestamp] [DateTime] NOT NULL,
    [Data] [VarBinary](MAX) NOT NULL,
    [IntField1] [Int] NOT NULL,
    [...] [Int] NOT NULL,
    [IntField20] [Int] NOT NULL)
GO
Then your code should be something like the code below. This code uses a table-valued parameter to insert all pending data at once, in a single transaction.
Note the omission of the slow and unnecessary USE DATABASE, and the use of verbatim strings (@"") to make the code more readable.
// The insert sql statement.
string sql =
    @"INSERT INTO [dbo].[Archive_Dev02_20131015] (
        [Timestamp],
        [Data],
        [IntField1],
        [...],
        [IntField20])
    SELECT * FROM @data;";

using (SqlCommand cmd = new SqlCommand(sql))
{
    using (SqlTransaction transaction = connection.BeginTransaction())
    {
        cmd.Connection = connection;
        cmd.Transaction = transaction;
        cmd.Parameters.Add(new SqlParameter("@data", SqlDbType.Structured)
        {
            TypeName = "dbo.ArchiveData", // the table type created above
            Value = TransformSamples(samples)
        });
        int affected = cmd.ExecuteNonQuery();
        transaction.Commit();
    }
}
...
private static IEnumerable<SqlDataRecord> TransformSamples(
    {YourSampleType} samples)
{
    var schema = new[]
    {
        new SqlMetaData("Timestamp", SqlDbType.DateTime),
        new SqlMetaData("Data", SqlDbType.VarBinary, -1),
        new SqlMetaData("IntField1", SqlDbType.Int),
        new SqlMetaData("...", SqlDbType.Int),
        new SqlMetaData("IntField20", SqlDbType.Int)
    };

    foreach (var sample in samples)
    {
        var row = new SqlDataRecord(schema);
        row.SetDateTime(0, sample.ReceiveDate);
        row.SetBytes(1, 0, sample.Data, 0, sample.Data.Length);
        row.SetInt32(2, sample.IntValue1);
        row.SetInt32(..., ...);
        row.SetInt32(21, sample.IntValue20);
        yield return row;
    }
}

I've managed to solve my issue by using SqlBulkCopy, as suggested by juharr in one of the comments above.
I've mainly based myself on this post to convert my data to a DataTable that can be bulk-inserted into the database:
Convert generic List/Enumerable to DataTable?
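Roughly, the idea looks like this (a minimal sketch only; the Sample type, its properties, and the column names are stand-ins for my real code, and the linked post shows a generic reflection-based conversion instead):
// Sketch: build a DataTable from the queued samples and bulk-insert it in one call.
private static void BulkSaveSamples(IEnumerable<Sample> samples, SqlConnection connection)
{
    var table = new DataTable();
    table.Columns.Add("Timestamp", typeof(DateTime));
    table.Columns.Add("Data", typeof(byte[]));
    table.Columns.Add("IntField1", typeof(int));
    // ... remaining metadata columns ...

    foreach (var sample in samples)
    {
        table.Rows.Add(sample.ReceiveDate, sample.Data, sample.IntValue1 /*, ... */);
    }

    using (var bulk = new SqlBulkCopy(connection))
    {
        bulk.DestinationTableName = "[dbo].[Archive_Dev02_20131015]";
        bulk.ColumnMappings.Add("Timestamp", "Timestamp");
        bulk.ColumnMappings.Add("Data", "Data");
        bulk.ColumnMappings.Add("IntField1", "IntField1");
        // ... remaining mappings ...
        bulk.WriteToServer(table);
    }
}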
Thanks for all your answers!

Related

Best way to insert millions of records

I'm working with a hosted service in C# ASP.NET Core, LINQ, and T-SQL.
I need to insert records into my database one by one.
Of course this is not a fast operation, but I'm not that experienced in this field, so maybe I'm doing it wrong.
This is my code in my manager:
public void StrategyMassive(string foldpathsave)
{
    using (IServiceScope scope = _services.CreateScope())
    {
        List<string> filesreading = new List<string>();
        VUContext _context = scope.ServiceProvider.GetRequiredService<VUContext>();
        List<string> filesnumber = File.ReadAllLines(foldpathsave).ToList();
        filesreading = filesnumber.ToList();
        filesreading.RemoveRange(0, 2);
        foreach (string singlefile in filesreading)
        {
            //INTERNAL DATA NORMALIZATION
            _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
            _repository.Save(_context);
        }
    }
}
And this is my repository interface:
public void ImportAdd(VUContext _context, AVuTable newVUL, ACs2Table newC2, ACs3Table newC3, ADateTable newDATE)
{
    _context.AVuTable.Add(newVUL);
    _context.ADateTable.Add(newDATE);
    if (newC2 != null)
    {
        _context.ACs2Table.Add(newC2);
    }
    if (newC3 != null)
    {
        _context.ACs3Table.Add(newC3);
    }
}

public void Save(VUContext _context)
{
    _context.SaveChanges();
}
It is all quite simple, I know, so how can I easily speed up this insert while keeping it one record at a time?
Start by NOT using the slowest way to do it.
It starts with the way you actually load the files.
It continues with not using SqlBulkCopy - possibly in multiple threads - to write the data to the database.
What you do is the slowest possible way - because Entity Framework is NOT an ETL tool.
Btw., one transaction per item (SaveChanges) does not help either. It makes a super slow solution really, really, really super slow.
I manage to load around 64k rows per second per thread, with 4-6 threads running in parallel.
In my experience, SqlBulkCopy is the fastest way to do it. filesnumber sounds like a misnomer and I suspect you are reading a list of delimited files to be loaded into SQL Server after some normalization process. That would probably be even faster if you did your normalization on the server side, after loading the data into a temp table first. Here is a sample SqlBulkCopy from a delimited file:
void Main()
{
Stopwatch sw = new Stopwatch();
sw.Start();
string sqlConnectionString = @"server=.\SQLExpress2012;Trusted_Connection=yes;Database=SampleDb";
string path = @"d:\temp\SampleTextFiles";
string fileName = @"combDoubledX.csv";
using (OleDbConnection cn = new OleDbConnection(
"Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+path+
";Extended Properties=\"text;HDR=No;FMT=Delimited\";"))
using (SqlConnection scn = new SqlConnection( sqlConnectionString ))
{
OleDbCommand cmd = new OleDbCommand("select * from "+fileName, cn);
SqlBulkCopy sbc = new SqlBulkCopy(scn, SqlBulkCopyOptions.TableLock,null);
sbc.ColumnMappings.Add(0,"[Category]");
sbc.ColumnMappings.Add(1,"[Activity]");
sbc.ColumnMappings.Add(5,"[PersonId]");
sbc.ColumnMappings.Add(6,"[FirstName]");
sbc.ColumnMappings.Add(7,"[MidName]");
sbc.ColumnMappings.Add(8,"[LastName]");
sbc.ColumnMappings.Add(12,"[Email]");
cn.Open();
scn.Open();
SqlCommand createTemp = new SqlCommand();
createTemp.CommandText = @"if exists
(SELECT * FROM tempdb.sys.objects
WHERE object_id = OBJECT_ID(N'[tempdb]..[##PersonData]','U'))
BEGIN
drop table [##PersonData];
END
create table ##PersonData
(
[Id] int identity primary key,
[Category] varchar(50),
[Activity] varchar(50) default 'NullOlmasin',
[PersonId] varchar(50),
[FirstName] varchar(50),
[MidName] varchar(50),
[LastName] varchar(50),
[Email] varchar(50)
)
";
createTemp.Connection = scn;
createTemp.ExecuteNonQuery();
OleDbDataReader rdr = cmd.ExecuteReader();
sbc.NotifyAfter = 200000;
//sbc.BatchSize = 1000;
sbc.BulkCopyTimeout = 10000;
sbc.DestinationTableName = "##PersonData";
//sbc.EnableStreaming = true;
sbc.SqlRowsCopied += (sender,e) =>
{
Console.WriteLine("-- Copied {0} rows to {1}.[{2} milliseconds]",
e.RowsCopied,
((SqlBulkCopy)sender).DestinationTableName,
sw.ElapsedMilliseconds);
};
sbc.WriteToServer(rdr);
if (!rdr.IsClosed) { rdr.Close(); }
cn.Close();
scn.Close();
}
sw.Stop();
sw.Dump();
}
And few sample lines from that file:
"Computer Labs","","LRC 302 Open Lab","","","10057380","Test","","Cetin","","5550123456","","cb#nowhere.com"
"Computer Labs","","LRC 302 Open Lab","","","123456789","John","","Doe","","5551234567","","jdoe#somewhere.com"
"Computer Labs","","LRC 302 Open Lab","","","012345678","Mary","","Doe","","5556666444","","mdoe#here.com"
You could also create and run a list of Task<>s, each doing a SqlBulkCopy that reads from its own source (SqlBulkCopy.WriteToServer accepts several kinds of sources: IDataReader, DataTable, DataRow[]).
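A minimal sketch of that idea, assuming each input file can be loaded independently (LoadFile is a hypothetical helper wrapping the OleDb-reader + SqlBulkCopy code shown above, and sqlConnectionString is the same connection string):
// Sketch: one Task per input file, each with its own connection and SqlBulkCopy.
var files = new[] { "file1.csv", "file2.csv", "file3.csv", "file4.csv" };
var tasks = files
    .Select(f => Task.Run(() =>
    {
        using (var scn = new SqlConnection(sqlConnectionString))
        {
            scn.Open();
            LoadFile(f, scn); // hypothetical helper: runs SqlBulkCopy.WriteToServer for this file
        }
    }))
    .ToArray();
Task.WaitAll(tasks);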
For faster operation you need to reduce the number of database round trips.
Using the statement-batching feature in EF Core
You can see this feature is available only in EF Core, so you need to migrate to EF Core if you are still using EF 6.
Compare EF Core & EF6
For this feature to work you need to move the Save operation outside of the loop, as in the sketch below.
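A minimal sketch of that change, reusing the names from the question (calling SaveChanges once after the loop lets EF Core batch the INSERTs into far fewer round trips):
foreach (string singlefile in filesreading)
{
    //INTERNAL DATA NORMALIZATION
    // Only add entities to the change tracker here; no database call yet.
    _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
}
// One SaveChanges after the loop; EF Core batches the pending inserts.
_repository.Save(_context);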
Bulk insert
The bulk insert feature is designed to be the fastest way to insert a large number of database records
Bulk Copy Operations in SQL Server
To use it you need the SqlBulkCopy class for SQL Server, and your code needs considerable rework.

SqlBulkInsert with a DataTable to a Linked Server

I'm working with 2 SQL 2008 Servers on different machines. The server names are source.ex.com, and destination.ex.com.
destination.ex.com is linked to source.ex.com and the appropriate permissions are in place for source.ex.com to write to a database called bacon-wrench on destination.ex.com
I've logged into source.ex.com via SSMS and tested this query (successfully):
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (4,6);
In a C# .NET 4.0 WebPage I connect to source.ex.com and perform a similar query (successfully):
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (34,56);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
For small sets of insert statements (say 20 or less) doing something like this performs fine:
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (34,56);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (22,11);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (33,55);
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES (1,2);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
I'm trying to do something like this with around 20000 records. The above method takes 11 minutes to complete -- which I assume is the server screaming at me to make it some kind of bulk operation. From other StackOverflow threads the SqlBulkCopy class was recommended, and it takes a DataTable as a parameter, perfect!
So I build a DataTable and attempt to write it to the server (fail):
DataTable dt = new DataTable();
dt.Columns.Add("PunchID", typeof(int));
dt.Columns.Add("BaconID", typeof(int));
for(int i = 0; i < 20000; i++)
{
//I realize this would make 20000 duplicate
//rows but its not important
dt.Rows.Add(new object[] {
11, 33
});
}
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
using(SqlBulkCopy bulk = new SqlBulkCopy(c))
{
bulk.DestinationTableName = "[destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]";
bulk.ColumnMappings.Add("PunchID", "PunchID");
bulk.ColumnMappings.Add("BaconID", "BaconID");
bulk.WriteToServer(dt);
}
}
EDIT2: The below message is what I'm attempting to fix:
The web page crashes at bulk.WriteToServer(dt); with the error message "Database bacon-wrench does not exist. Please ensure it is typed correctly." What am I doing wrong? How do I change this to get it to work?
EDIT1:
I was able to speed up the query significantly using the below syntax. But it is still very slow for such a small record set.
using(SqlConnection c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
c.Open();
String sql = @"
INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch]
(PunchID, BaconID) VALUES
(34,56),
(22,11),
(33,55),
(1,2);";
using(SqlCommand cmd = new SqlCommand(sql, c))
{
cmd.ExecuteNonQuery();
}
}
If you are using SQL Server 2008+, you can introduce a table user-defined type. Prepare the type, the receiving table, and a stored procedure something like below. The data type and stored procedure are on the local system. I generally have an if statement in the code detecting whether the table is remote or local; if remote I do this, if local I use SqlBulkCopy.
if(TYPE_ID(N'[Owner].[TempTableType]') is null)
begin
    CREATE TYPE [Owner].[TempTableType] AS TABLE ( [PendingID] uniqueidentifier, [Reject] bit)
end

IF NOT EXISTS (SELECT * FROM [LinkedServer].[DatabaseOnLS].sys.tables where name = 'TableToReceive')
    EXEC('
        CREATE TABLE [DatabaseOnLS].[Owner].[TableToReceive] ( [PendingID] uniqueidentifier, [Reject] bit)
    ') AT [LinkedServer]
else
    EXEC('
        TRUNCATE TABLE [DatabaseOnLS].[Owner].[TableToReceive]
    ') AT [LinkedServer]

CREATE PROCEDURE [Owner].[TempInsertTable]
    @newTableType TempTableType readonly
AS
BEGIN
    insert into [LinkedServer].[DatabaseOnLS].[Owner].[TableToReceive] select * from @newTableType
END
In the C# code you can then do something like this to insert the DataTable into the table on the linked server (I'm using an existing UnitOfWork, which already have a connection and transaction):
using (var command = new SqlCommand("TempInsertTable",
oUoW.Database.Connection as SqlConnection) { CommandType = CommandType.StoredProcedure }
)
{
command.Transaction = oUoW.Database.CurrentTransaction as SqlTransaction;
command.Parameters.Add(new SqlParameter("#newTableType", oTempTable));
drResults = command.ExecuteReader();
drResults.Close();
}
After trying a number of things including linked server settings, collations, synonyms, etc., I eventually got to this error message:
Inserting into remote tables or views is not allowed by using the BCP utility or by using BULK INSERT.
Perhaps you can bulk insert to a staging table on your local server (your code works fine for this), then insert from that staging table into your linked server, followed by a local delete of the staging table's rows. You'll have to test for performance.
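A minimal sketch of that idea, assuming a local staging table named dbo.FruitPunchStaging with the same two columns (the staging table name is a placeholder; dt is the DataTable from the question):
// Sketch: bulk copy into a local staging table, push the rows across the linked
// server with one INSERT ... SELECT, then clear the staging table.
using (var c = new SqlConnection(ConfigurationManager.ConnectionStrings["SOURCE"].ConnectionString))
{
    c.Open();

    using (var bulk = new SqlBulkCopy(c))
    {
        bulk.DestinationTableName = "[dbo].[FruitPunchStaging]"; // local table, so SqlBulkCopy is allowed
        bulk.ColumnMappings.Add("PunchID", "PunchID");
        bulk.ColumnMappings.Add("BaconID", "BaconID");
        bulk.WriteToServer(dt);
    }

    const string push = @"
        INSERT INTO [destination.ex.com].[bacon-wrench].[dbo].[tblFruitPunch] (PunchID, BaconID)
        SELECT PunchID, BaconID FROM [dbo].[FruitPunchStaging];
        TRUNCATE TABLE [dbo].[FruitPunchStaging];";

    using (var cmd = new SqlCommand(push, c))
    {
        cmd.ExecuteNonQuery();
    }
}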

ExecuteScalar vs ExecuteNonQuery when returning an identity value

Trying to figure out if it's best to use ExecuteScalar or ExecuteNonQuery if I want to return the identity column of a newly inserted row. I have read this question and I understand the differences there, but when looking over some code I wrote a few weeks ago (whilst heavily borrowing from this site) I found that in my inserts I was using ExecuteScalar, like so:
public static int SaveTest(Test newTest)
{
    var conn = DbConnect.Connection();
    const string sqlString = "INSERT INTO dbo.Tests ( Tester , Premise ) " +
                             " VALUES ( @tester , @premise ) " +
                             "SET @newId = SCOPE_IDENTITY(); ";
    using (conn)
    {
        using (var cmd = new SqlCommand(sqlString, conn))
        {
            cmd.Parameters.AddWithValue("@tester", newTest.tester);
            cmd.Parameters.AddWithValue("@premise", newTest.premise);
            cmd.Parameters.Add("@newId", SqlDbType.Int).Direction = ParameterDirection.Output;
            cmd.CommandType = CommandType.Text;
            conn.Open();
            cmd.ExecuteScalar();
            return (int) cmd.Parameters["@newId"].Value;
        }
    }
}
This works fine for what I need, so I'm wondering
Whether I should be using ExecuteNonQuery here because it is "more proper" for doing inserts?
Would retrieving the identity value be the same either way since I'm using an output parameter?
Are there any performance hits associated with one way or the other?
Is there generally a better way to do this overall?
I'm using Visual Studio 2010, .NET 4.0, and SQL Server 2008r2, in case that makes any difference.
As suggested by Aaron, a stored procedure would make it faster because it saves Sql Server the work of compiling your SQL batch. However, you could still go with either approach: ExecuteScalar or ExecuteNonQuery. IMHO, the performance difference between them is so small, that either method is just as "proper".
Having said that, I don't see the point of using ExecuteScalar if you are grabbing the identity value from an output parameter. In that case, the value returned by ExecuteScalar becomes useless.
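For illustration, a minimal sketch of the tail end of that method using ExecuteNonQuery instead (everything else stays the same as in the question):
// Sketch: identical to the question's code except ExecuteNonQuery is called,
// since the value returned by ExecuteScalar was being ignored anyway.
cmd.Parameters.Add("@newId", SqlDbType.Int).Direction = ParameterDirection.Output;
conn.Open();
cmd.ExecuteNonQuery();                          // no result set needed
return (int)cmd.Parameters["@newId"].Value;     // identity comes back via the output parameter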
An approach that I like because it requires less code, uses ExecuteScalar without output parameters:
public static int SaveTest(Test newTest)
{
    var conn = DbConnect.Connection();
    const string sqlString = "INSERT INTO dbo.Tests ( Tester , Premise ) " +
                             " VALUES ( @tester , @premise ) " +
                             "SELECT SCOPE_IDENTITY()";
    using (conn)
    {
        using (var cmd = new SqlCommand(sqlString, conn))
        {
            cmd.Parameters.AddWithValue("@tester", newTest.tester);
            cmd.Parameters.AddWithValue("@premise", newTest.premise);
            cmd.CommandType = CommandType.Text;
            conn.Open();
            return (int) (decimal) cmd.ExecuteScalar();
        }
    }
}
Happy programming!
EDIT: Note that we need to cast twice: from object to decimal, and then to int (thanks to techturtle for noting this).

"connection has been disabled" error message when quickly creating/deleting databases

Introduction
I'm writing a web application (C#/ASP.NET MVC 3, .NET Framework 4, MS SQL Server 2008, System.Data.Odbc for database connections) and I'm having quite a few issues regarding database creation/deletion.
I have a requirement that the application should be able to create and delete databases.
Problem
The application fails stress testing for that function. More specifically, if the client starts to quickly create, delete, and re-create a database with the same name, then eventually (around the 5th request) the server code throws an OdbcException: 'Connection has been disabled.'. This behavior is observed on all machines the test has been performed on - the exact failing request may not be the 5th, but it is somewhere around that value.
Research
Googling the exception gave very little output - it seems to be a very generic exception and I found no analogous issues. One of the suggestions I found was that my development Windows 7 machine might not be able to handle numerous simultaneous connections because it's not a server OS. I've tried installing our app on Windows 2008 Server - almost no change in behavior, just a few more requests processed before the exception occurs.
Code and additional comments on implementation
Databases are created using a stored procedure like this:
CREATE PROCEDURE [dbo].[sp_DBCreate]
    ...
    @databasename nvarchar(124) -- 124 is max length of database file names
AS
    DECLARE @sql nvarchar(150);
BEGIN
    ...
    -- Create a new database
    SET @sql = N'CREATE DATABASE ' + quotename(@databasename, '[');
    EXEC(@sql);
    IF @@ERROR <> 0
        RETURN -2;
    ...
    RETURN 0;
END
Databases are deleted using the following SP:
CREATE PROCEDURE [dbo].[sp_DomainDelete]
    ...
    @databasename nvarchar(124) -- 124 is max length of database file names
AS
    DECLARE @sql nvarchar(200);
BEGIN
    ...
    -- check if database exists
    IF EXISTS(SELECT * FROM [sys].[databases] WHERE [name] = @databasename)
    BEGIN
        -- drop all active connections
        SET @sql = N'ALTER DATABASE ' + quotename(@databasename, '[') + ' SET SINGLE_USER WITH ROLLBACK IMMEDIATE';
        EXEC(@sql);
        -- Delete database
        SET @sql = N'DROP DATABASE ' + quotename(@databasename, '[');
        EXEC(@sql);
        IF @@ERROR <> 0
            RETURN -1; --error deleting database
    END
    --ELSE database does not exist. consider it deleted.
    RETURN 0;
END
In both SPs I've skipped less relevant parts like sanity checks.
I'm not using any ORMs, all SPs are called from code by using OdbcCommand instances. New OdbcConnection is created for each function call.
I sincerely hope someone can give me a clue to the problem.
UPD: The exact same problem occurs if we just rapidly create a bunch of databases. Thanks to everyone for the suggestions on the database delete code, but I'd prefer to have a solution, or at least a hint, for the more general problem - the one which occurs even without deleting DBs at all.
UPD2: The following code is used for SP calls:
public static int ExecuteNonQuery(string sql, params object[] parameters)
{
try
{
var command = new OdbcCommand();
Prepare(command, new OdbcConnection( GetConnectionString() /*irrelevant*/), null, CommandType.Text, sql,
parameters == null ?
new List<OdbcParameter>().ToArray() :
parameters.Select(p => p is OdbcParameter ? (OdbcParameter)p : new OdbcParameter(string.Empty, p)).ToArray());
return command.ExecuteNonQuery();
}
catch (OdbcException ex)
{
// Logging here
throw;
}
}
public static void Prepare(
OdbcCommand command,
OdbcConnection connection,
OdbcTransaction transaction,
CommandType commandType,
string commandText,
params OdbcParameter[] commandParameters)
{
if (connection.State != ConnectionState.Open)
{
connection.Open();
}
command.Connection = connection;
command.CommandText = commandText;
if (transaction != null)
{
command.Transaction = transaction;
}
command.CommandType = commandType;
if (commandParameters != null)
{
command.Parameters.AddRange(
commandParameters.Select(p => p.Value==null &&
p.Direction == ParameterDirection.Input ?
new OdbcParameter(p.ParameterName, DBNull.Value) : p).ToArray());
}
}
Sample connection string:
Driver={SQL Server}; Server=LOCALHOST;Uid=sa;Pwd=<password here>;
Okay. There may be issues of scope for OdbcConnection, but you also don't appear to be closing connections after you've finished with them. This may mean that you're relying on the pool manager to close off unused connections and return them to the pool as they time out. The using block will automatically close and dispose of the connection when finished, allowing it to be returned to the connection pool.
Try this code:
public static int ExecuteNonQuery(string sql, params object[] parameters)
{
int result = 0;
try
{
var command = new OdbcCommand();
using (OdbcConnection connection = new OdbcConnection(GetConnectionString() /*irrelevant*/))
{
connection.Open();
Prepare(command, connection, null, CommandType.Text, sql,
parameters == null ?
new List<OdbcParameter>().ToArray() :
parameters.Select(p => p is OdbcParameter ? (OdbcParameter)p : new OdbcParameter(string.Empty, p)).ToArray());
result = command.ExecuteNonQuery();
}
}
catch (OdbcException ex)
{
// Logging here
throw;
}
return result;
}

SQL Insert one row or multiple rows data?

I am working on a console application to insert data into a MS SQL Server 2005 database. I have a list of objects to be inserted. Here I use an Employee class as an example:
List<Employee> employees;
What I can do is to insert one object at a time like this:
foreach (Employee item in employees)
{
    string sql = @"INSERT INTO Mytable (id, name, salary)
        values ('@id', '@name', '@salary')";
    // replace @par with values
    cmd.CommandText = sql; // cmd is IDbCommand
    cmd.ExecuteNonQuery();
}
Or I can build a bulk insert query like this:
string sql = #"INSERT INTO MyTable (id, name, salary) ";
int count = employees.Count;
int index = 0;
foreach (Employee item in employees)
{
sql = sql + string.format(
"SELECT {0}, '{1}', {2} ",
item.ID, item.Name, item.Salary);
if ( index != (count-1) )
sql = sql + " UNION ALL ";
index++
}
cmd.CommandType = sql;
cmd.ExecuteNonQuery();
I guess the latter case is going to insert multiple rows of data at once. However, if I have several thousands of rows of data, is there any limit on the length of the SQL query string?
I am not sure if one insert with multiple rows is better than one insert per row of data, in terms of performance.
Any suggestions to do it in a better way?
Actually, the way you have it written, your first option will be faster.
Your second example has a problem in it. You are doing sql = sql + etc. This is going to cause a new string object to be created for each iteration of the loop (check out the StringBuilder class, sketched below). Technically, you are going to be creating a new string object in the first instance too, but the difference is that it doesn't have to copy all the information from the previous string over.
The way you have it set up, SQL Server is going to have to potentially evaluate a massive query when you finally send it, which is definitely going to take some time to figure out what it is supposed to do. I should state that this is dependent on how large the number of inserts you need to do is. If n is small, you are probably going to be OK, but as it grows your problem will only get worse.
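For reference, a minimal sketch of building the same UNION ALL query with a StringBuilder instead of repeated concatenation (it is still open to SQL injection, as noted further down, so treat it purely as an illustration of the string-building point):
// Sketch: StringBuilder avoids re-copying the whole query text on every iteration.
var sb = new StringBuilder("INSERT INTO MyTable (id, name, salary) ");
for (int i = 0; i < employees.Count; i++)
{
    Employee item = employees[i];
    sb.AppendFormat("SELECT {0}, '{1}', {2} ", item.ID, item.Name, item.Salary);
    if (i != employees.Count - 1)
        sb.Append(" UNION ALL ");
}
cmd.CommandText = sb.ToString();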
Bulk inserts are faster than individual ones due to how SQL Server handles batch transactions. If you are going to insert data from C# you should take the first approach and wrap, say, every 500 inserts into a transaction and commit it, then do the next 500 and so on. This also has the advantage that if a batch fails, you can trap those, figure out what went wrong, and re-insert just those. There are other ways to do it, but that would definitely be an improvement over the two examples provided.
var iCounter = 0;
IDbTransaction transaction = conn.BeginTransaction(); // conn is the open IDbConnection
cmd.Transaction = transaction;

foreach (Employee item in employees)
{
    string sql = @"INSERT INTO Mytable (id, name, salary)
        values ('@id', '@name', '@salary')";
    // replace @par with values
    cmd.CommandText = sql; // cmd is IDbCommand
    cmd.ExecuteNonQuery();
    iCounter++;

    if (iCounter >= 500)
    {
        // commit this batch and start a new transaction for the next one
        transaction.Commit();
        transaction = conn.BeginTransaction();
        cmd.Transaction = transaction;
        iCounter = 0;
    }
}

if (iCounter > 0)
    transaction.Commit();
In MS SQL Server 2008 you can create a table user-defined type (UDT) that will contain your rows
CREATE TYPE MyUdt AS TABLE (Id int, Name nvarchar(50), salary int)
then you can use this UDT in your stored procedures and your C# code to do batch inserts.
SP:
CREATE PROCEDURE uspInsert
    (@MyTvp AS MyUdt READONLY)
AS
    INSERT INTO [MyTable]
    SELECT * FROM @MyTvp
C# (imagine that the records you need to insert are already contained in the table "MyTable" of DataSet ds):
using (conn)
{
    SqlCommand cmd = new SqlCommand("uspInsert", conn);
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter myParam = cmd.Parameters.AddWithValue("@MyTvp", ds.Tables["MyTable"]);
    myParam.SqlDbType = SqlDbType.Structured;
    myParam.TypeName = "dbo.MyUdt";
    // Execute the stored procedure
    cmd.ExecuteNonQuery();
}
So, this is the solution.
Finally, I want to warn you against using code like yours (building strings and then executing them), because this way of executing queries is open to SQL injection.
Look at this thread;
I've answered there about table-valued parameters.
Bulk-copy is usually faster than doing inserts on your own.
If you still want to do it in one of your suggested ways, you should make it so that you can easily change the size of the queries you send to the server. That way you can optimize for speed in your production environment later on. Query times may vary a lot depending on the query size.
The maximum batch size for a SQL Server query is listed as 65,536 * the network packet size. The network packet size is 4 KB by default but can be changed. Check out the Maximum Capacity article for SQL 2008 to get the scope. SQL 2005 also appears to have the same limit.
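For instance, a minimal sketch of sending the inserts in configurable chunks (the chunk size of 500 and the BuildInsertSql helper, which would turn one chunk into a single multi-row INSERT, are placeholders to tune and implement for your schema):
// Sketch: send the rows in chunks so the query size is easy to adjust later.
const int chunkSize = 500; // tune after measuring in your environment

for (int offset = 0; offset < employees.Count; offset += chunkSize)
{
    var chunk = employees.Skip(offset).Take(chunkSize).ToList();
    cmd.CommandText = BuildInsertSql(chunk); // hypothetical helper building one multi-row INSERT
    cmd.ExecuteNonQuery();
}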
