Best way to insert millions of records - C#

I'm working with a hosted service in C#, ASP.NET Core, LINQ and T-SQL.
I need to insert records one by one into my database.
Of course this is not a fast operation, but I'm not that experienced in this field, so maybe I'm doing it wrong.
This is my code in my manager:
public void StrategyMassive(string foldpathsave)
{
    using (IServiceScope scope = _services.CreateScope())
    {
        List<string> filesreading = new List<string>();
        VUContext _context = scope.ServiceProvider.GetRequiredService<VUContext>();
        List<string> filesnumber = File.ReadAllLines(foldpathsave).ToList();
        filesreading = filesnumber.ToList();
        filesreading.RemoveRange(0, 2);
        foreach (string singlefile in filesreading)
        {
            // INTERNAL DATA NORMALIZATION producing newVUL, newC2, newC3, newDATE
            _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
            _repository.Save(_context);
        }
    }
}
And this is my repository interface:
public void ImportAdd(VUContext _context, AVuTable newVUL, ACs2Table newC2, ACs3Table newC3, ADateTable newDATE)
{
    _context.AVuTable.Add(newVUL);
    _context.ADateTable.Add(newDATE);
    if (newC2 != null)
    {
        _context.ACs2Table.Add(newC2);
    }
    if (newC3 != null)
    {
        _context.ACs3Table.Add(newC3);
    }
}

public void Save(VUContext _context)
{
    _context.SaveChanges();
}
I know it's all quite simple, so how can I speed up this insert while keeping it one record at a time?

Start by NOT using the slowest way to do it.
It starts with the way you actually load the files.
It goes on with not using SqlBulkCopy - possibly in multiple threads - to write the data to the database.
What you do is the slowest possible way, because Entity Framework is NOT an ETL tool.
Btw., one transaction per item (SaveChanges) does not help either. It makes a super slow solution really really really super slow.
I manage to load around 64k rows per second per thread, with 4-6 threads running in parallel.

In my experience SqlBulkCopy is the fastest way to do it. filesnumber sounds like a misnomer, and I suspect you are reading a list of delimited files to be loaded into SQL Server after some normalization process. It would probably be even faster if you did your normalization on the server side, after loading the data into a temp table first. Here is a sample SqlBulkCopy from a delimited file:
void Main()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    string sqlConnectionString = @"server=.\SQLExpress2012;Trusted_Connection=yes;Database=SampleDb";
    string path = @"d:\temp\SampleTextFiles";
    string fileName = @"combDoubledX.csv";

    using (OleDbConnection cn = new OleDbConnection(
        "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
        ";Extended Properties=\"text;HDR=No;FMT=Delimited\";"))
    using (SqlConnection scn = new SqlConnection(sqlConnectionString))
    {
        // Read the delimited file through the ACE text driver...
        OleDbCommand cmd = new OleDbCommand("select * from " + fileName, cn);

        // ...and stream it into SQL Server with SqlBulkCopy.
        SqlBulkCopy sbc = new SqlBulkCopy(scn, SqlBulkCopyOptions.TableLock, null);
        sbc.ColumnMappings.Add(0, "[Category]");
        sbc.ColumnMappings.Add(1, "[Activity]");
        sbc.ColumnMappings.Add(5, "[PersonId]");
        sbc.ColumnMappings.Add(6, "[FirstName]");
        sbc.ColumnMappings.Add(7, "[MidName]");
        sbc.ColumnMappings.Add(8, "[LastName]");
        sbc.ColumnMappings.Add(12, "[Email]");

        cn.Open();
        scn.Open();

        // (Re)create the global temp table that receives the raw data.
        SqlCommand createTemp = new SqlCommand();
        createTemp.CommandText = @"if exists
            (SELECT * FROM tempdb.sys.objects
             WHERE object_id = OBJECT_ID(N'[tempdb]..[##PersonData]','U'))
            BEGIN
                drop table [##PersonData];
            END
            create table ##PersonData
            (
                [Id] int identity primary key,
                [Category] varchar(50),
                [Activity] varchar(50) default 'NullOlmasin',
                [PersonId] varchar(50),
                [FirstName] varchar(50),
                [MidName] varchar(50),
                [LastName] varchar(50),
                [Email] varchar(50)
            )";
        createTemp.Connection = scn;
        createTemp.ExecuteNonQuery();

        OleDbDataReader rdr = cmd.ExecuteReader();
        sbc.NotifyAfter = 200000;
        //sbc.BatchSize = 1000;
        sbc.BulkCopyTimeout = 10000;
        sbc.DestinationTableName = "##PersonData";
        //sbc.EnableStreaming = true;
        sbc.SqlRowsCopied += (sender, e) =>
        {
            Console.WriteLine("-- Copied {0} rows to {1}.[{2} milliseconds]",
                e.RowsCopied,
                ((SqlBulkCopy)sender).DestinationTableName,
                sw.ElapsedMilliseconds);
        };

        sbc.WriteToServer(rdr);
        if (!rdr.IsClosed) { rdr.Close(); }

        cn.Close();
        scn.Close();
    }
    sw.Stop();
    sw.Dump(); // LINQPad; use Console.WriteLine(sw.Elapsed) outside LINQPad
}
And a few sample lines from that file:
"Computer Labs","","LRC 302 Open Lab","","","10057380","Test","","Cetin","","5550123456","","cb@nowhere.com"
"Computer Labs","","LRC 302 Open Lab","","","123456789","John","","Doe","","5551234567","","jdoe@somewhere.com"
"Computer Labs","","LRC 302 Open Lab","","","012345678","Mary","","Doe","","5556666444","","mdoe@here.com"
You could create and run a list of Task<>s, each doing a SqlBulkCopy reading from a source (SqlBulkCopy supports a series of readers).
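For example, a rough sketch of that idea (one connection and one SqlBulkCopy per task; the connection string, destination table and reader factory are placeholders, not taken from the question):
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

static void BulkLoadInParallel(string sqlConnectionString, string[] files,
                               Func<string, IDataReader> readerFactory)
{
    Task[] tasks = files.Select(file => Task.Run(() =>
    {
        using (var scn = new SqlConnection(sqlConnectionString))
        using (IDataReader rdr = readerFactory(file)) // any IDataReader over one source file
        {
            scn.Open();
            using (var sbc = new SqlBulkCopy(scn))
            {
                sbc.DestinationTableName = "dbo.PersonData"; // placeholder destination
                sbc.BulkCopyTimeout = 10000;
                sbc.WriteToServer(rdr);
            }
        }
    })).ToArray();

    Task.WaitAll(tasks);
}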

For faster operation you need to reduce the number of database round trips.
Using the statement batching feature in EF Core
This feature is only available in EF Core, so you need to migrate to EF Core if you are still using EF 6.
Compare EF Core & EF6
For this feature to work you need to move the Save operation outside of the loop.
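Applied to the loop in the question, that roughly means something like this (a sketch only, reusing the names from the question):
foreach (string singlefile in filesreading)
{
    // internal data normalization producing newVUL, newC2, newC3, newDATE
    _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
}
_repository.Save(_context); // a single SaveChanges, so EF Core can batch the INSERTs
For very large files it may be worth calling Save every few thousand rows instead, so the change tracker does not grow without bound.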
Bulk insert
The bulk insert feature is designed to be the fastest way to insert a large number of database records.
Bulk Copy Operations in SQL Server
To use it you need to use the SqlBulkCopy class for SQL Server and your code needs considerable rework.
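One possible shape of that rework (a sketch only; it assumes the normalized values for one destination table can be collected into a DataTable, and the column names, delimiter and table name are placeholders):
// Collect the normalized rows into an in-memory DataTable...
var table = new DataTable();
table.Columns.Add("C2", typeof(string));       // placeholder columns
table.Columns.Add("C3", typeof(string));
table.Columns.Add("Date", typeof(DateTime));

foreach (string singlefile in filesreading)
{
    string[] parts = singlefile.Split(';');    // assumption: delimited input
    table.Rows.Add(parts[0], parts[1], DateTime.Parse(parts[2]));
}

// ...then stream it to SQL Server in one go instead of row-by-row inserts.
string connectionString = _context.Database.GetDbConnection().ConnectionString; // EF Core extension
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulk = new SqlBulkCopy(connection))
    {
        bulk.DestinationTableName = "dbo.AVuTable"; // one bulk copy per destination table
        bulk.BatchSize = 5000;
        bulk.WriteToServer(table);
    }
}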

Related

Best performance in reading millions of records of data

I have a database with a large amount of data (millions of rows) that is also updated during the day with large amounts of data. I have a backup of this database for reporting, so generating reports does not affect the performance of the main database.
For syncing the backup database with the main database, I wrote a Windows service which queries the main database and inserts new data into the backup database... each time the query gets 5000 rows from the main database...
EDIT:
the query is like below:
const string cmdStr = "SELECT * FROM [RLCConvertor].[dbo].[RLCDiffHeader] WHERE ID >= @Start and ID <= @End";
Here is the code:
using (var conn = new SqlConnection(_connectionString))
{
    conn.Open();
    var cmd = new SqlCommand(cmdStr, conn);
    cmd.Parameters.AddWithValue("@Start", start);
    cmd.Parameters.AddWithValue("@End", end);
    SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
    while (reader.Read())
    {
        var rldDiffId = Convert.ToInt32(reader["ID"].ToString());
        var rlcDifHeader = new RLCDiffHeader
        {
            Tech_head_Type = long.Parse(reader["Tech_head_Type"].ToString()),
            ItemCode = long.Parse(reader["ItemCode"].ToString()),
            SessionNumber = long.Parse(reader["SessionNumber"].ToString()),
            MarketFeedCode = reader["MarketFeedCode"].ToString(),
            MarketPlaceCode = reader["MarketPlaceCode"].ToString(),
            FinancialMarketCode = reader["FinancialMarketCode"].ToString(),
            CIDGrc = reader["CIDGrc"].ToString(),
            InstrumentID = reader["InstrumentID"].ToString(),
            CValMNE = reader["CValMNE"].ToString(),
            DEven = reader["DEven"].ToString(),
            HEven = reader["HEven"].ToString(),
            MessageCodeType = reader["MessageCodeType"].ToString(),
            SEQbyINSTandType = reader["SEQbyINSTandType"].ToString()
        };
        newRLCDiffHeaders.Add(rldDiffId, rlcDifHeader);
    }
    conn.Close();
}
but when I started the service... the performance of the main database got worse... is the code not efficient? Is there any better way? Because I searched and found that a data reader is the best for this case... or should I use DataTable and SqlDataAdapter?
You cannot treat this as a definitive answer or solution to your problem.
Since the comment was getting long, I am providing it as an answer.
Can you try using the concept of ad hoc (distributed) queries?
Using this you can query another database in the following way:
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
'SELECT GroupName, Name, DepartmentID
FROM AdventureWorks2012.HumanResources.Department
ORDER BY GroupName, Name') AS a;
Read more
http://technet.microsoft.com/en-us/library/ms187569.aspx
http://technet.microsoft.com/en-us/library/ms190312.aspx
Since you are using a service, the service account surely has access to read the main DB and insert into the report DB. I suggest you create a stored procedure in your report DB that accesses the main DB using OPENROWSET and does the insert.
The query will be similar to this:
Insert into tbl
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
'SELECT GroupName, Name, DepartmentID
FROM AdventureWorks2012.HumanResources.Department
ORDER BY GroupName, Name') AS a;
From the service, you just need to invoke the stored procedure.
We had a similar issue and solved it with OPENROWSET, though I don't know how much of a performance improvement this will give you. But I suggest you do a POC and analyze it.
Once again, please consider this only a suggestion.
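For example, the service-side call could be as small as this (a sketch; the connection string variable and the procedure name dbo.SyncFromMainDb are hypothetical):
// The service only invokes the stored procedure; the data movement happens server side.
using (var conn = new SqlConnection(_reportDbConnectionString))
using (var cmd = new SqlCommand("dbo.SyncFromMainDb", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 0; // the server-side copy may take a while
    conn.Open();
    cmd.ExecuteNonQuery();
}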

Slow insert performance with large amounts of data (SQL Server / C#)

I'm working with electronic equipment that digitizes waveforms in real time (each device generates around 1000 512-byte arrays per second - we have 12 devices). I've written a client for these devices in C# that for the most part works fine and has no performance issues.
However, one of the requirements for the application is archival, and Microsoft SQL Server 2010 was mandated as the storage mechanism (outside of my control). The database layout is very simple: there is one table per device per day ("Archive_Dev02_20131015" etc.). Each table has an Id column, a timestamp column, a Data column (varbinary) and 20 more integer columns with some metadata. There's a clustered primary key on Id and timestamp, and another separate index on timestamp. My naive approach was to queue all data in the client application and then insert everything into the database at 5-second intervals using SqlCommand.
The basic mechanism looks like this:
using (SqlTransaction transaction = connection.BeginTransaction())
{
    // Beginning of the insert sql statement...
    string sql = "USE [DatabaseName]\r\n" +
                 "INSERT INTO [dbo].[Archive_Dev02_20131015]\r\n" +
                 "(\r\n" +
                 "    [Timestamp], \r\n" +
                 "    [Data], \r\n" +
                 "    [IntField1], \r\n" +
                 "    [...], \r\n" +
                 ") \r\n" +
                 "VALUES \r\n" +
                 "(\r\n" +
                 "    @timestamp, \r\n" +
                 "    @data, \r\n" +
                 "    @int1, \r\n" +
                 "    @..., \r\n" +
                 ")";

    using (SqlCommand cmd = new SqlCommand(sql))
    {
        cmd.Connection = connection;
        cmd.Transaction = transaction;
        cmd.Parameters.Add("@timestamp", System.Data.SqlDbType.DateTime);
        cmd.Parameters.Add("@data", System.Data.SqlDbType.Binary);
        cmd.Parameters.Add("@int1", System.Data.SqlDbType.Int);

        foreach (var sample in samples)
        {
            cmd.Parameters[0].Value = sample.ReceiveDate;
            cmd.Parameters[1].Value = sample.Data; // Data is a byte array
            cmd.Parameters[1].Size = sample.Data.Length;
            cmd.Parameters[2].Value = sample.IntValue1;
            ...
            int affected = cmd.ExecuteNonQuery();
            if (affected != 1)
            {
                throw new Exception("Could not insert sample into the database!");
            }
        }
    }

    transaction.Commit();
}
To summarize: one transaction wrapping a loop that generates insert statements and executes them.
This method turned out to be very, very slow. On my machine (i5-2400 @ 3.1GHz, 8GB RAM, using .NET 4.0 and SQL Server 2008, 2 internal HDs in mirror, everything runs locally), it takes about 2.5 seconds to save the data from 2 devices, so saving 12 devices every 5 seconds is impossible.
To compare, I've written a small SQL script (actually I extracted the code that C# runs, using SQL Server Profiler) that does the same directly on the server (still running on my own machine):
set statistics io on
go

begin transaction
go

declare @i int = 0;
while @i < 24500 begin
    SET @i = @i + 1
    exec sp_executesql N'USE [DatabaseName]
        INSERT INTO [dbo].[Archive_Dev02_20131015]
        (
            [Timestamp],
            [Data],
            [int1],
            ...
            [int20]
        )
        VALUES
        (
            @timestamp,
            @data,
            @compressed,
            @int1,
            ...
            @int20,
        )',N'@timestamp datetime,@data binary(118),@int1 int,...,@int20 int,',
        @timestamp='2013-10-14 14:31:12.023',
        @data=0xECBD07601C499625262F6DCA7B7F4AF54AD7E074A10880601324D8904010ECC188CDE692EC1D69472329AB2A81CA6556655D661640CCED9DBCF7DE7BEFBDF7DE7BEFBDF7BA3B9D4E27F7DFFF3F5C6664016CF6CE4ADAC99E2180AAC81F3F7E7C1F3F22FEEF5FE347FFFDBFF5BF1FC6F3FF040000FFFF,
        @int=0,
        ...
        @int20=0
end

commit transaction
This does (IMO, but I'm probably wrong ;) ) the same thing, only this time I'm using 24500 iterations to simulate the 12 devices at once. The query takes about 2 seconds. If I use the same number of iterations as the C# version, the query runs in less than a second.
So my first question is: why does it run way faster on SQL Server than in C#? Does this have anything to do with the connection (local TCP)?
To make matters more confusing (to me), this code runs twice as slowly on the production server (IBM BladeCenter, 32GB RAM, fibre connection to a SAN, ... filesystem operations are really fast). I've tried looking at the SQL activity monitor and write performance never goes above 2MB/sec, but this might as well be normal. I'm a complete newbie to SQL Server (about the polar opposite of a competent DBA, in fact).
Any ideas on how I can make the C# code more performant?
By far the best approach for loading this sort of data is to use a table-valued parameter, and a stored procedure that takes the data. A really simple example of a table type and procedure that uses it would be:
CREATE TYPE [dbo].[StringTable]
AS TABLE ([Value] [nvarchar] (MAX) NOT NULL)
GO

CREATE PROCEDURE [dbo].[InsertStrings]
    @Paths [dbo].[StringTable] READONLY
AS
    INSERT INTO [dbo].[MyTable] ([Value])
    SELECT [Value] FROM @Paths
GO
Then the C# code would be something along the lines of (please bear in mind that I've typed this into the S/O editor so there might be typos):
private static IEnumerable<SqlDataRecord> TransformStringList(ICollection<string> source)
{
    if (source == null || source.Count == 0)
    {
        return null;
    }
    return GetRecords(source,
        () => new SqlDataRecord(new SqlMetaData("Value", SqlDbType.NVarChar, -1)),
        (record, value) => record.SetString(0, value));
}

private static IEnumerable<SqlDataRecord> GetRecords<T>(IEnumerable<T> source, Func<SqlDataRecord> factory, Action<SqlDataRecord, T> hydrator)
{
    SqlDataRecord dataRecord = factory();
    foreach (var value in source)
    {
        hydrator(dataRecord, value);
        yield return dataRecord;
    }
}

private static void InsertStrings(ICollection<string> strings, SqlConnection connection)
{
    using (var transaction = connection.BeginTransaction())
    {
        using (var cmd = new SqlCommand("dbo.InsertStrings"))
        {
            cmd.Connection = connection;
            cmd.Transaction = transaction;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add(new SqlParameter("@Paths", SqlDbType.Structured) { Value = TransformStringList(strings) });
            cmd.ExecuteNonQuery();
        }
        transaction.Commit();
    }
}
This approach has speed that rivals SqlBulkCopy, but it also gives you better control, through the ability to run the things that you're updating through a procedure, and it also makes it a lot easier to deal with concurrency.
Edit -> Just for completeness, this approach works on SQL Server 2008 and up. Seeing as there isn't such a thing as SQL Server 2010 I thought I'd better mention that.
In SQL Server:
CREATE TYPE [dbo].[ArchiveData]
AS TABLE (
    [Timestamp] [DateTime] NOT NULL,
    [Data] [VarBinary](MAX) NOT NULL,
    [IntField1] [Int] NOT NULL,
    [...] [Int] NOT NULL,
    [IntField20] [Int] NOT NULL)
GO
Then your code should be something like the code below. This code uses a table-valued parameter to insert all pending data at once, in a single transaction.
Note the omission of the slow and unnecessary USE DATABASE, and the use of verbatim strings (@"") to make the code more readable.
// The insert sql statement.
string sql =
    @"INSERT INTO [dbo].[Archive_Dev02_20131015] (
        [Timestamp],
        [Data],
        [IntField1],
        [...],
        [IntField20])
      SELECT * FROM @data;";

using (SqlCommand cmd = new SqlCommand(sql))
{
    using (SqlTransaction transaction = connection.BeginTransaction())
    {
        cmd.Connection = connection;
        cmd.Transaction = transaction;
        cmd.Parameters.Add(new SqlParameter("@data", SqlDbType.Structured)
        {
            Value = TransformSamples(samples)
        });
        int affected = cmd.ExecuteNonQuery();
        transaction.Commit();
    }
}
...
private static IEnumerable<SqlDataRecord> TransformSamples(
    {YourSampleType} samples)
{
    var schema = new[]
    {
        new SqlMetaData("Timestamp", SqlDbType.DateTime),
        new SqlMetaData("Data", SqlDbType.VarBinary, -1),
        new SqlMetaData("IntField1", SqlDbType.Int),
        new SqlMetaData("...", SqlDbType.Int),
        new SqlMetaData("IntField20", SqlDbType.Int)
    };

    foreach (var sample in samples)
    {
        var row = new SqlDataRecord(schema);
        row.SetDateTime(0, sample.ReceiveDate);
        row.SetSqlBinary(1, sample.Data);
        row.SetInt32(2, sample.Data.Length);
        row.SetInt32(..., ...);
        row.SetInt32(21, sample.IntValue20);
        yield return row;
    }
}
I've managed to solve my issue by using SqlBulkCopy as suggested by juharr in one of the comments above.
I've mainly based myself on this post to convert my data to a DataTable that can be bulk inserted into the database:
Convert generic List/Enumerable to DataTable?
Thanks for all your answers!
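For reference, the conversion boils down to something like this (a simplified sketch of the reflection approach from that post, not the exact code used):
// Requires System, System.Collections.Generic, System.Data, System.Reflection.
// Copy the public properties of T into DataTable columns, then bulk insert the table.
static DataTable ToDataTable<T>(IEnumerable<T> items)
{
    var props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
    var table = new DataTable(typeof(T).Name);
    foreach (var p in props)
        table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);

    foreach (var item in items)
    {
        var row = table.NewRow();
        foreach (var p in props)
            row[p.Name] = p.GetValue(item, null) ?? DBNull.Value;
        table.Rows.Add(row);
    }
    return table;
}

// Usage: one WriteToServer per 5-second batch instead of thousands of single INSERTs.
// using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "dbo.Archive_Dev02_20131015" })
//     bulk.WriteToServer(ToDataTable(samples));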

Improving SQLite Performance

Well, I have a file.sql that contains 20,000 insert commands.
Sample from the .sql file:
INSERT INTO table VALUES
(1,-400,400,3,154850,'Text',590628,'TEXT',1610,'TEXT',79);
INSERT INTO table VALUES
(39,-362,400,3,111659,'Text',74896,'TEXT',0,'TEXT',14);
And I am using the following code to create an in-memory SQLite database, pull the values into it, and then calculate the time elapsed:
using (var conn = new SQLiteConnection(@"Data Source=:memory:"))
{
    conn.Open();
    var stopwatch = new Stopwatch();
    stopwatch.Start();

    using (var cmd = new SQLiteCommand(conn))
    {
        using (var transaction = conn.BeginTransaction())
        {
            cmd.CommandText = File.ReadAllText(@"file.sql");
            cmd.ExecuteNonQuery();
            transaction.Commit();
        }
    }

    var timeelapsed = stopwatch.Elapsed.TotalSeconds <= 60
        ? stopwatch.Elapsed.TotalSeconds + " seconds"
        : Math.Round(stopwatch.Elapsed.TotalSeconds / 60) + " minutes";
    MessageBox.Show(string.Format("Time elapsed {0}", timeelapsed));
    conn.Close();
}
Things I have tried:
Using a file database instead of an in-memory one.
Using begin transaction and commit transaction [AS SHOWN IN MY CODE].
Using Firefox's extension named SQLite Manager to test whether the slowdown comes from the script; however, I was surprised that the same 20,000 lines that I am trying to process with my code were pulled into the database in JUST 4ms!!!
Using PRAGMA synchronous = OFF, as well as PRAGMA journal_mode = MEMORY.
Appending begin transaction; and commit transaction; to the beginning and end of the .sql file respectively.
As the SQLite documentation says, SQLite is capable of processing 50,000 commands per second. And that is real, and I made sure of it using SQLite Manager [AS DESCRIBED IN THE THIRD THING THAT I'VE TRIED]; however, my 20,000 commands take 4 minutes, which tells me that there is something wrong.
QUESTION: What is the problem I am facing? Why is the execution so slow?!
The SQLite.NET documentation recommends the following construct for transactions:
using (SQLiteConnection conn = new SQLiteConnection(@"Data Source=:memory:"))
{
    conn.Open();
    using (SQLiteTransaction trans = conn.BeginTransaction())
    {
        using (SQLiteCommand cmd = new SQLiteCommand(conn))
        {
            cmd.CommandText = File.ReadAllText(@"file.sql");
            cmd.ExecuteNonQuery();
        }
        trans.Commit();
    }
    conn.Close();
}
Are you able to manipulate the text file contents into something like:
INSERT INTO table (col01, col02, col03, col04, col05, col06, col07, col08, col09, col10, col11)
SELECT 1,-400,400,3,154850,'Text',590628,'TEXT',1610,'TEXT',79
UNION ALL
SELECT 39,-362,400,3,111659,'Text',74896,'TEXT',0,'TEXT',14
;
Maybe try "batching them" into groups of 100 as a initial test.
http://sqlite.org/lang_select.html
SqlLite seems to support the UNION ALL statement.

Want to run a query on each database: how to calculate the performance difference in C# and in SQL Server

I am working on a SQL Server monitoring product and I have a database query that will fetch details of all tables in all the databases on a SQL Server instance.
For this I have two options.
Fire a query on the database from code, such as select name from [master].sys.sysdatabases
Get the names of all the databases first, then fire my main query on each DB
using "USE <fetched DB name>;" + "mainQuery";
Please check the following code for the same.
public DataTable GetResultsOfAllDB(string query)
{
    SqlConnection con = new SqlConnection(_ConnectionString);
    string locleQuery = "select name from [master].sys.sysdatabases";
    DataTable dtResult = new DataTable("Result");
    SqlCommand cmdData = new SqlCommand(locleQuery, con);
    cmdData.CommandTimeout = 0;
    SqlDataAdapter adapter = new SqlDataAdapter(cmdData);
    DataTable dtDataBases = new DataTable("DataBase");
    adapter.Fill(dtDataBases);
    foreach (DataRow drDB in dtDataBases.Rows)
    {
        if (dtResult.Rows.Count >= 15000)
            break;
        locleQuery = " Use [" + Convert.ToString(drDB[0]) + "]; " + query;
        cmdData = new SqlCommand(locleQuery, con);
        adapter = new SqlDataAdapter(cmdData);
        DataTable dtTemp = new DataTable();
        adapter.Fill(dtTemp);
        dtResult.Merge(dtTemp);
    }
    return dtResult;
}
I will use the system stored procedure EXEC sp_MSforeachdb; the fetched data will be stored in a table data type and then selected from it.
Check the following query for the same:
Declare @TableDetail table
(
    field1 varchar(500),
    field2 int,
    field3 varchar(500),
    field4 varchar(500),
    field5 decimal(18,2),
    field6 decimal(18,2)
)

INSERT @TableDetail EXEC sp_MSforeachdb 'USE [?]; QUERY/COMMAND FOR ALL DATABASES'

Select field1, field2, field3, field4, field5, field6 FROM @TableDetail
Note: in the second option the query takes time, because if the number of databases and tables is huge it will wait until all databases are finished.
Now my question is: which of the above two options is better, and why? Or is there any other solution for the same?
Thanks in advance.
One key difference is that the second option blocks until everything is done. All of the work is done SQL Server side. That has the issue of not being able to give feedback to the user as it runs, and it can potentially time out and is not resilient to network blips. This option can be used as a pure SQL script (some SQL admins like that), whereas the first needs a program.
In the first example, the client is doing iterative, more granular tasks where you can supply feedback to the user. You can also retry in the face of network blips without redoing all of the work. In the first example, you can also use SqlConnectionStringBuilder instead of USE concatenation.
If performance is a concern, you could also potentially parallelize the first one with some locking around adapter.Fill.
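A rough sketch of that idea (one connection and adapter per task, reusing the names from the question; here the lock protects the shared result table during the merge):
// Query each database on its own task and merge the partial results under a lock.
var dbNames = dtDataBases.Rows.Cast<DataRow>()
                              .Select(r => Convert.ToString(r[0]))
                              .ToList();
var dtResult = new DataTable("Result");
object mergeLock = new object();

Parallel.ForEach(dbNames, new ParallelOptions { MaxDegreeOfParallelism = 4 }, dbName =>
{
    using (var conn = new SqlConnection(_ConnectionString))
    using (var cmd = new SqlCommand("USE [" + dbName + "]; " + query, conn))
    using (var adapter = new SqlDataAdapter(cmd))
    {
        var dtTemp = new DataTable();
        adapter.Fill(dtTemp); // Fill opens and closes the connection itself
        lock (mergeLock)
        {
            dtResult.Merge(dtTemp);
        }
    }
});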
Both suck - they are both serial.
Use the first, get rid of the ridiculous objects (DataSet) and use TASKS to parallelize X databases at the same time, X determined by trying out how much load the server can handle.
Finished.
If your queries are simple enough, you can try to generate a single script instead of executing queries in each DB one by one:
select 'DB1' as DB, Field1, Field2, ...
from [DB1]..[TableOrViewName]
union all
select 'DB2' as DB, Field1, Field2, ...
from [DB2]..[TableOrViewName]
union all
...
Everything looks fine. I just want to add using statements for the IDisposable objects:
public DataTable GetResultsOfAllDB(string query)
{
    using (SqlConnection con = new SqlConnection(_ConnectionString))
    {
        string locleQuery = "select name from [master].sys.sysdatabases";
        DataTable dtResult = new DataTable("Result");
        using (SqlCommand cmdData = new SqlCommand(locleQuery, con))
        {
            cmdData.CommandTimeout = 0;
            using (SqlDataAdapter adapter = new SqlDataAdapter(cmdData))
            using (DataTable dtDataBases = new DataTable("DataBase"))
            {
                adapter.Fill(dtDataBases);
                foreach (DataRow drDB in dtDataBases.Rows)
                {
                    if (dtResult.Rows.Count >= 15000)
                        break;
                    locleQuery = " Use [" + Convert.ToString(drDB[0]) + "]; " + query;
                    // a new command and adapter per database, so each one gets disposed
                    using (SqlCommand cmdDb = new SqlCommand(locleQuery, con))
                    using (SqlDataAdapter adapterDb = new SqlDataAdapter(cmdDb))
                    using (DataTable dtTemp = new DataTable())
                    {
                        adapterDb.Fill(dtTemp);
                        dtResult.Merge(dtTemp);
                    }
                }
                return dtResult;
            }
        }
    }
}

How to do a batch update?

I am wondering, is there a way to do batch updating? I am using MS SQL Server 2005.
I saw a way with the SqlDataAdapter, but it seems like you first have to do the select statement with it, then fill some dataset and make changes to the dataset.
Now I am using LINQ to SQL to do the select, so I want to try to keep it that way. However it is too slow for massive updates. So is there a way that I can keep my LINQ to SQL (for the select part) but use something different to do the mass update?
Thanks
Edit
I am interested in this staging table way, but I am not sure how to do it, and it's still not clear to me how it will be faster, since I don't understand how the update part works.
So can anyone show me how this would work and how to deal with concurrent connections?
Edit2
This was my latest attempt at trying to do a mass update using XML; however it uses too many resources and my shared hosting does not allow it to go through. So I need a different way, which is why I am now looking into a staging table.
using (TestDataContext db = new TestDataContext())
{
    UserTable[] testRecords = new UserTable[2];
    for (int count = 0; count < 2; count++)
    {
        UserTable testRecord = new UserTable();
        if (count == 1)
        {
            testRecord.CreateDate = new DateTime(2050, 5, 10);
            testRecord.AnotherField = true;
        }
        else
        {
            testRecord.CreateDate = new DateTime(2015, 5, 10);
            testRecord.AnotherField = false;
        }
        testRecords[count] = testRecord;
    }

    StringBuilder sBuilder = new StringBuilder();
    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
    XmlSerializer serializer = new XmlSerializer(typeof(UserTable[]));
    serializer.Serialize(sWriter, testRecords);

    using (SqlConnection con = new SqlConnection(connectionString))
    {
        string sprocName = "spTEST_UpdateTEST_TEST";
        using (SqlCommand cmd = new SqlCommand(sprocName, con))
        {
            cmd.CommandType = System.Data.CommandType.StoredProcedure;
            SqlParameter param1 = new SqlParameter("@UpdatedProdData", SqlDbType.VarChar, int.MaxValue);
            param1.Value = sBuilder.Remove(0, 41).ToString();
            cmd.Parameters.Add(param1);
            con.Open();
            int result = cmd.ExecuteNonQuery();
            con.Close();
        }
    }
}
@Fredrik Johansson I am not sure what you're saying will work. It seems to me you want me to make an update statement for each record. I can't do that, since I will need to update 1 to 50,000+ records and I will not know how many until that point.
Edit 3
So this is my SP now. I think it should be able to handle concurrent connections, but I wanted to make sure.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_MassUpdate]
    @BatchNumber uniqueidentifier
AS
BEGIN
    update prod
    set ProductQty = 50
    from Product prod
    join StagingTbl stage on prod.ProductId = stage.ProductId
    where stage.BatchNumber = @BatchNumber

    DELETE FROM StagingTbl
    WHERE BatchNumber = @BatchNumber
END
You can use the SqlDataAdapter to do a batch update. It doesn't matter how you fill your dataset - L2SQL or whatever, you can use different methods to do the update. Just define the query to run using the data in your datatable.
The key here is the UpdateBatchSize. The data adapter will send the updates in batches of whatever size you define. You need to experiment with this value to see what number works best, but typically numbers of 500-1000 do best. SQL can then optimize the update and execute a little faster. Note that when doing batch updates, you cannot update the row source of the datatable.
I use this method to do updates of 10-100K and it usually runs in under 2 minutes. It will depend on what you are updating though.
Sorry, this is in VB….
Using da As New SqlDataAdapter
    da.UpdateCommand = conn.CreateCommand
    da.UpdateCommand.CommandTimeout = 300
    da.AcceptChangesDuringUpdate = False
    da.ContinueUpdateOnError = False
    da.UpdateBatchSize = 1000 'Experiment for best performance
    da.UpdateCommand.UpdatedRowSource = UpdateRowSource.None 'Needed if UpdateBatchSize > 1

    sql = "UPDATE YourTable"
    sql += " SET YourField = @YourField"
    sql += " WHERE ID = @ID"
    da.UpdateCommand.CommandText = sql

    da.UpdateCommand.Parameters.Clear()
    da.UpdateCommand.Parameters.Add("@YourField", SqlDbType.SmallDateTime).SourceColumn = "YourField"
    da.UpdateCommand.Parameters.Add("@ID", SqlDbType.SmallDateTime).SourceColumn = "ID"

    da.Update(ds.Tables("YourTable"))
End Using
Another option is to bulk copy to a temp table, and then run a query to update the main table from it. This may be faster.
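A sketch of that idea, using a permanent staging table (as in the question's Edit 3) rather than a temp table; the table name "ProductStaging" and its columns are assumptions:
// Bulk copy the changed rows into a staging table, then do one set-based UPDATE join.
static void MassUpdate(string connectionString, DataTable changesTable)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.ProductStaging" })
        {
            bulk.WriteToServer(changesTable); // DataTable with ProductId + the new values
        }

        const string updateSql = @"
            UPDATE prod
            SET    prod.ProductQty = stage.ProductQty
            FROM   Product prod
            JOIN   dbo.ProductStaging stage ON prod.ProductId = stage.ProductId;
            TRUNCATE TABLE dbo.ProductStaging;";

        using (var cmd = new SqlCommand(updateSql, conn))
        {
            cmd.ExecuteNonQuery();
        }
    }
}
A batch-number column, as in the stored procedure shown in Edit 3, keeps concurrent callers from stepping on each other; the sketch above assumes a single caller.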
As allonym said, use SqlBulkCopy, which is very fast (I found speed improvements of over 200x - from 1500 secs to 6 s). However, you can use the DataTable and DataRow classes to provide data to SqlBulkCopy (which seems easier). Using SqlBulkCopy this way has the added advantage of being .NET 3.0 compliant as well (LINQ was added only in 3.5).
Check out http://msdn.microsoft.com/en-us/library/ex21zs8x%28v=VS.100%29.aspx for some sample code.
Use SqlBulkCopy, which is lightning-fast. You'll need a custom IDataReader implementation which enumerates over your LINQ query results. Look at http://code.msdn.microsoft.com/LinqEntityDataReader for more info and some potentially suitable IDataReader code.
You have to work with the expression trees directly, but it's doable. In fact, it's already been done for you, you just have to download the source:
Batch Updates and Deletes with LINQ to SQL
The alternative is to just use stored procedures or ad-hoc SQL queries using the ExecuteMethodCall and ExecuteCommand methods of the DataContext.
You can use SqlDataAdapter to do a batch update even if the datatable is filled manually/programmatically (from LINQ or any other source).
Just remember to manually set the RowState for the rows in the datatable. Use dataRow.SetModified() for this.
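For instance, a minimal sketch (the column names and the KeyValuePair input are just for illustration; the adapter is assumed to be configured with an UpdateCommand as in the VB example above):
// Fill a DataTable, mark every row as Modified, and let the adapter batch the UPDATEs.
static void BatchUpdate(SqlDataAdapter adapter, IEnumerable<KeyValuePair<int, DateTime>> changes)
{
    var table = new DataTable();
    table.Columns.Add("ID", typeof(int));
    table.Columns.Add("YourField", typeof(DateTime));

    foreach (var change in changes)
        table.Rows.Add(change.Key, change.Value);

    table.AcceptChanges();            // newly added rows start as Added; reset them to Unchanged...
    foreach (DataRow row in table.Rows)
        row.SetModified();            // ...then flag them as Modified so Update() issues UPDATEs

    adapter.UpdateBatchSize = 1000;
    adapter.Update(table);
}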
