How to do a batch update? - C#

I am wondering: is there a way to do batch updating? I am using MS SQL Server 2005.
I saw a way with the SqlDataAdapter, but it seems like you first have to do the select statement with it, then fill some dataset, and make changes to the dataset.
Right now I am using LINQ to SQL to do the select, and I want to keep it that way. However, it is too slow for massive updates. So is there a way I can keep my LINQ to SQL (for the select part) but use something different to do the mass update?
Thanks
Edit
I am interested in the staging table approach, but I am not sure how to do it, and it is still not clear to me why it would be faster, since I don't understand how the update part works.
So can anyone show me how this would work, and how to deal with concurrent connections?
Edit 2
This was my latest attempt at doing a mass update using XML; however, it uses too many resources and my shared hosting does not allow it to go through. So I need a different way, which is why I am now looking into a staging table.
using (TestDataContext db = new TestDataContext())
{
    UserTable[] testRecords = new UserTable[2];
    for (int count = 0; count < 2; count++)
    {
        UserTable testRecord = new UserTable();
        if (count == 1)
        {
            testRecord.CreateDate = new DateTime(2050, 5, 10);
            testRecord.AnotherField = true;
        }
        else
        {
            testRecord.CreateDate = new DateTime(2015, 5, 10);
            testRecord.AnotherField = false;
        }
        testRecords[count] = testRecord;
    }

    StringBuilder sBuilder = new StringBuilder();
    System.IO.StringWriter sWriter = new System.IO.StringWriter(sBuilder);
    XmlSerializer serializer = new XmlSerializer(typeof(UserTable[]));
    serializer.Serialize(sWriter, testRecords);

    using (SqlConnection con = new SqlConnection(connectionString))
    {
        string sprocName = "spTEST_UpdateTEST_TEST";
        using (SqlCommand cmd = new SqlCommand(sprocName, con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter param1 = new SqlParameter("@UpdatedProdData", SqlDbType.VarChar, int.MaxValue);
            // Strip the XML declaration from the start of the serialized document
            param1.Value = sBuilder.Remove(0, 41).ToString();
            cmd.Parameters.Add(param1);
            con.Open();
            int result = cmd.ExecuteNonQuery();
            con.Close();
        }
    }
}
@Fredrik Johansson: I am not sure what you're saying will work. It seems like you want me to make an update statement for each record. I can't do that, since I will need to update 1 to 50,000+ records, and I won't know how many until that point.
Edit 3
So this is my SP now. I think it should be able to handle concurrent connections, but I wanted to make sure.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[sp_MassUpdate]
    @BatchNumber uniqueidentifier
AS
BEGIN
    UPDATE prod
    SET ProductQty = 50
    FROM Product prod
    JOIN StagingTbl stage ON prod.ProductId = stage.ProductId
    WHERE stage.BatchNumber = @BatchNumber

    DELETE FROM StagingTbl
    WHERE BatchNumber = @BatchNumber
END
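For reference, this is roughly how I plan to drive the staging table from C#: bulk-copy the changed rows in with a fresh batch GUID, then call the proc above. This is a minimal sketch; I'm assuming StagingTbl has ProductId and BatchNumber columns to match the proc, and changedProductIds stands in for the IDs from my LINQ select.
Guid batchNumber = Guid.NewGuid();

DataTable staging = new DataTable();
staging.Columns.Add("ProductId", typeof(int));
staging.Columns.Add("BatchNumber", typeof(Guid));
foreach (int productId in changedProductIds)    // IDs gathered from the LINQ select
    staging.Rows.Add(productId, batchNumber);

using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();

    // Bulk-load the staging rows; this stays fast even for 50,000+ rows
    using (SqlBulkCopy bulk = new SqlBulkCopy(con))
    {
        bulk.DestinationTableName = "StagingTbl";
        bulk.ColumnMappings.Add("ProductId", "ProductId");
        bulk.ColumnMappings.Add("BatchNumber", "BatchNumber");
        bulk.WriteToServer(staging);
    }

    // One set-based update server-side; the GUID keeps concurrent batches apart
    using (SqlCommand cmd = new SqlCommand("sp_MassUpdate", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@BatchNumber", SqlDbType.UniqueIdentifier).Value = batchNumber;
        cmd.ExecuteNonQuery();
    }
}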

You can use the SqlDataAdapter to do a batch update. It doesn't matter how you fill your dataset: L2SQL or whatever, you can use different methods to do the update. Just define the query to run using the data in your datatable.
The key here is UpdateBatchSize. The data adapter will send the updates in batches of whatever size you define. You need to experiment with this value to see what number works best, but typically numbers of 500-1000 do best. SQL Server can then optimize the update and execute a little faster. Note that when doing batch updates, you cannot update the row source of the datatable.
I use this method to do updates of 10-100K rows, and it usually runs in under 2 minutes. It will depend on what you are updating, though.
Sorry, this is in VB...
Using da As New SqlDataAdapter
    da.UpdateCommand = conn.CreateCommand
    da.UpdateCommand.CommandTimeout = 300
    da.AcceptChangesDuringUpdate = False
    da.ContinueUpdateOnError = False
    da.UpdateBatchSize = 1000 'Experiment for best performance
    da.UpdateCommand.UpdatedRowSource = UpdateRowSource.None 'Needed if UpdateBatchSize > 1

    sql = "UPDATE YourTable"
    sql += " SET YourField = @YourField"
    sql += " WHERE ID = @ID"
    da.UpdateCommand.CommandText = sql

    da.UpdateCommand.Parameters.Clear()
    da.UpdateCommand.Parameters.Add("@YourField", SqlDbType.SmallDateTime).SourceColumn = "YourField"
    da.UpdateCommand.Parameters.Add("@ID", SqlDbType.Int).SourceColumn = "ID"

    da.Update(ds.Tables("YourTable"))
End Using
Another option is to bulk copy into a temp table, and then run a query to update the main table from it. This may be faster.

As allonym said, use SqlBulkCopy, which is very fast (I found speed improvements of over 200x: from 1500 seconds to 6 seconds). However, you can also use the DataTable and DataRow classes to provide data to SqlBulkCopy (which seems easier). Using SqlBulkCopy this way has the added advantage of being .NET 3.0 compliant as well (LINQ was only added in 3.5).
Check out http://msdn.microsoft.com/en-us/library/ex21zs8x%28v=VS.100%29.aspx for some sample code.
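If you go the DataTable route, a minimal sketch might look like the following. The UserTable columns are borrowed from the question, and the destination table name and the db.UserTables property are assumptions; adjust them to your schema.
// Project the LINQ to SQL results into a DataTable, then bulk-copy it.
DataTable table = new DataTable();
table.Columns.Add("CreateDate", typeof(DateTime));
table.Columns.Add("AnotherField", typeof(bool));

using (TestDataContext db = new TestDataContext())
{
    foreach (UserTable row in db.UserTables)
        table.Rows.Add(row.CreateDate, row.AnotherField);
}

using (SqlConnection con = new SqlConnection(connectionString))
{
    con.Open();
    using (SqlBulkCopy bulk = new SqlBulkCopy(con))
    {
        // Typically you would bulk-copy into a staging table and update from there.
        bulk.DestinationTableName = "StagingTbl";
        bulk.WriteToServer(table);
    }
}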

Use SqlBulkCopy, which is lightning fast. You'll need a custom IDataReader implementation that enumerates over your LINQ query results. Look at http://code.msdn.microsoft.com/LinqEntityDataReader for more info and some potentially suitable IDataReader code.

You have to work with the expression trees directly, but it's doable. In fact, it has already been done for you; you just have to download the source:
Batch Updates and Deletes with LINQ to SQL
The alternative is to just use stored procedures or ad-hoc SQL queries using the ExecuteMethodCall and ExecuteCommand methods of the DataContext.
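For the ExecuteCommand route, here is a minimal sketch; the table and column names are illustrative, borrowed from the question.
using (TestDataContext db = new TestDataContext())
{
    // One set-based UPDATE runs server-side; no entities are materialized.
    int rowsAffected = db.ExecuteCommand(
        "UPDATE UserTable SET AnotherField = {0} WHERE CreateDate < {1}",
        true, new DateTime(2015, 5, 10));
}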

You can use SqlDataAdapter to do a batch update even if the datatable is filled manually/programmatically (from LINQ or any other source).
Just remember to manually set the RowState for the rows in the datatable. Use dataRow.SetModified() for this.
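A minimal sketch of that trick (column names are illustrative):
// Rows added via Rows.Add() have RowState = Added, so an UpdateCommand would
// skip them; reset to Unchanged first, then flag each row as Modified.
DataTable table = new DataTable("YourTable");
table.Columns.Add("ID", typeof(int));
table.Columns.Add("YourField", typeof(DateTime));

foreach (var item in linqResults)       // any source, e.g. a LINQ query
    table.Rows.Add(item.Id, item.YourField);

table.AcceptChanges();                  // every row becomes Unchanged
foreach (DataRow row in table.Rows)
    row.SetModified();                  // now the adapter will issue UPDATEs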

Related

Is there a better way to fill this autocomplete? C#

I have a textbox that autocompletes from values in a SQL Server database. I also created a stored procedure, which is very simple:
Stored procedure code
My code is this:
public AutoCompleteStringCollection AutoCompleteFlight(TextBox flight)
{
    using (SqlConnection connection = new SqlConnection(ConnectionLoader.ConnectionString("Threshold")))
    {
        AutoCompleteStringCollection flightCollection = new AutoCompleteStringCollection();
        connection.Open();
        SqlCommand flights = new SqlCommand("AutoComplete_Flight", connection);
        flights.CommandType = CommandType.StoredProcedure;
        SqlDataReader readFlights = flights.ExecuteReader();
        while (readFlights.Read())
        {
            flightCollection.Add(readFlights["Flight_Number"].ToString());
        }
        return flight.AutoCompleteCustomSource = flightCollection;
    }
}
Is there a point to having this stored procedure, since it's such a simple query? Or am I doing this wrong, since it still has to use the data reader and insert the results into the collection?
My previous code before the stored procedure was:
using (SqlConnection connection = new SqlConnection(ConnectionLoader.ConnectionString("Threshold")))
{
    AutoCompleteStringCollection flightCollection = new AutoCompleteStringCollection();
    connection.Open();
    SqlCommand flights = new SqlCommand("SELECT DISTINCT Flight_Number FROM Ramp_Board", connection);
    SqlDataReader readFlights = flights.ExecuteReader();
    while (readFlights.Read())
    {
        flightCollection.Add(readFlights["Flight_Number"].ToString());
    }
    return flight.AutoCompleteCustomSource = flightCollection;
}
Is the second piece of code better, or are they both wrong and there is a far better way of doing this?
"Better way" is a little undefined.
If you are looking for a performance answer of stored procedure or not, I'm not sure it matters all that much with that small of a data set and a simple query. Stored procedures shine when there are complex operations to perform that can limit back and forth with the server or limit the amount of data returned. In your case, the server side effort is the same either way, and the amount of data returned is also the same. #Niel points out that the procedures can be updated server side without changing your deployed code. This is another useful feature of Stored procedures that you probably will not need for this scenario though.
If you are looking for an alternate code answer then you could use a DataAdapter instead of a DataReader. There are many articles on this site that talk about the performance of the two, and most of them agree that they are more or less the same. The only exception is if you dont't plan on reading all of the rows. In your case, you are reading the whole table, so they are effectively the same.
SqlCommand sqlCmd = new SqlCommand("SELECT * FROM SomeTable", connection);
SqlDataAdapter sqlDA= new SqlDataAdapter();
sqlDA.SelectCommand = sqlCmd;
DataTable table = new DataTable();
// Fill table from SQL using the command and connection
sqlDA.Fill(table);
// Fill autoComplete from table
autoComplete.AddRange(table.AsEnumerable().Select(dr => dr["ColumnName"].ToString()).ToArray());
If you decide to use this kind of LINQ statement, it is best to set the column to not allow nulls, or to add a WHERE that filters nulls. I'm not sure how, or if, AutoCompleteStringCollection handles nulls.
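For example, you could filter the nulls out in the LINQ statement itself (a sketch, reusing the illustrative "ColumnName" from above):
autoComplete.AddRange(table.AsEnumerable()
    .Where(dr => !dr.IsNull("ColumnName"))              // skip NULLs before ToString()
    .Select(dr => dr["ColumnName"].ToString())
    .ToArray());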

Get output 'inserted' on update with Entity Framework

SQL Server provides output for inserted and updated records via the 'inserted' keyword.
I have a table representing a processing queue. I use the following query to lock a record and get the ID of the locked record:
UPDATE TOP (1) GlobalTrans
SET LockDateTime = GETUTCDATE()
OUTPUT inserted.ID
WHERE LockDateTime IS NULL
This will output a column named ID with all the updated record IDs (a single ID in my case). How can I translate this into EF in C# to execute the update and get the ID back?
Entity Framework has no way of doing that.
You could do it the ORM way, by selecting all the records, setting their LockDateTime, and writing them back. That probably is not safe for what you want to do, because by default it's not one single transaction.
You can spawn your own transactions and use RepeatableRead as the isolation level (see the sketch after these options). That should work. Depending on what your database does in the background, it might be overkill, though.
You could write the SQL by hand. That defeats the purpose of Entity Framework, but it should be just as safe as it was before, as far as the locking mechanism is concerned.
You could also put it into a stored procedure and call that. It's a little bit better than the above version, because at least somebody will compile it and check that the table and column names are correct.
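Here is a minimal sketch of the RepeatableRead option, assuming a GlobalTrans entity set on your context; the context name is a placeholder and this is untested, so treat it as a starting point rather than the definitive answer.
// Requires a reference to System.Transactions.
using (var scope = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.RepeatableRead }))
using (var db = new MyDbContext())      // MyDbContext is a placeholder name
{
    // Grab one unlocked row; RepeatableRead keeps it stable until we commit.
    var row = db.GlobalTrans.FirstOrDefault(t => t.LockDateTime == null);
    if (row != null)
    {
        row.LockDateTime = DateTime.UtcNow;
        db.SaveChanges();
        // row.ID now identifies the locked record.
    }
    scope.Complete();
    // Note: concurrent workers can deadlock here and will need retry logic.
}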
Simple example #1, to get a data table back:
I did this directly against the connection, changing command.ExecuteNonQuery() to command.ExecuteReader():
var connection = DbContext().Database.Connection as SqlConnection;
using (var command = connection.CreateCommand())
{
    command.CommandText = sql;
    command.CommandTimeout = 120;
    command.Parameters.Add(param);
    using (var reader = command.ExecuteReader())
    {
        var resultTable = new DataTable();
        resultTable.Load(reader);
        return resultTable;
    }
}
FYI: if you don't have an OUTPUT clause in your SQL, it will return an empty data table.
Example #2, to return entities:
This is a bit more complicated, but it does work, using a SQL statement with an OUTPUT inserted.* clause:
var className = typeof(T).Name;
var container = ObjContext().MetadataWorkspace.GetEntityContainer(UnitOfWork.ObjContext().DefaultContainerName, DataSpace.CSpace);
var setName = (from meta in container.BaseEntitySets where meta.ElementType.Name == className select meta.Name).First();
var results = ObjContext().ExecuteStoreQuery<T>(sql, setName, trackingEnabled ? MergeOption.AppendOnly : MergeOption.NoTracking).ToList();
T being the entity type being worked on.
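If you are on DbContext (EF 4.1+), a usage sketch for the original locking query is simpler, since Database.SqlQuery maps the OUTPUT column straight to a primitive:
// Executes the UPDATE ... OUTPUT inserted.ID and returns the IDs in one round trip.
var lockedIds = db.Database.SqlQuery<int>(
    @"UPDATE TOP (1) GlobalTrans
      SET LockDateTime = GETUTCDATE()
      OUTPUT inserted.ID
      WHERE LockDateTime IS NULL").ToList();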

SqlBulkCopy succeeds but inserts no records

I am trying to insert a large number of records into a variety of tables. I cannot fit all of the records into memory at once, so instead, I am using an IDataReader implementation to fit some of the data into memory and then dump it into the database with SqlBulkCopy.
The problem I am experiencing is that the second time I try to write to a table, SqlBulkCopy will succeed but fail to actually insert the records. I thought it was a transaction issue at first, but I disabled all transactions on my connection and am still seeing the same problem. I can also independently confirm the size of the tables before and after, both inside the code and outside of it.
Here is a code snippet:
long before = GetCount(tableName);
DataServerConnection conn = GetConnection();
using (var batch = conn.HasTransaction
    ? new SqlBulkCopy((SqlConnection)conn.IDbConnection, SqlBulkCopyOptions.Default, (SqlTransaction)conn.Transaction)
    : new SqlBulkCopy((SqlConnection)conn.IDbConnection))
{
    batch.DestinationTableName = tableName;
    batch.WriteToServer(reader);
}
long after = GetCount(tableName);
if ((reader.Count + before) != after)
{
    throw new Exception($"Not all records inserted: Before = {before}, After = {after}, Reader Count = {reader.Count}, Expected = {reader.Count + before}");
}
Any ideas what I am missing? GetCount(tableName) is doing a simple
SELECT COUNT(*) FROM [{tableName}]
reader is a basic IDataReader implementation that I have verified works in other places on millions of records. GetConnection() returns a wrapper around the connection, which saves me from having to manage my connections constantly.
I'm not sure how your reader variable is declared, so I'll tell you how I do it sometimes (and this doesn't mean it is the best way to do it).
First I declare a table adapter based on the dataset I have:
DataSetTableAdapter.MyTableAdapter dataTable = new DataSetTableAdapter.MyTableAdapter();
Then, I create the connection.
SqlConnection sql = new SqlConnection(...);
And then the BulkCopy variable:
SqlBulkCopy insertData = new SqlBulkCopy(sql);
Once I have this, I start adding rows to the table adapter like this:
dataTable.AddMyTableRow(...);
When I am finished, I do the bulk insertion:
sql.Open();
insertData.DestinationTableName = "MyTable";
insertData.WriteToServer(dataTable);
sql.Close();
Let me know if this helps you.

Best performance when reading millions of records of data

I have a database with a large amount of data (millions of rows) that is also updated during the day with large amounts of new data. I have a backup of this database for reporting, so that getting reports does not affect the performance of the main database.
For syncing the backup database with the main database, I wrote a Windows service which queries the main database and inserts the new data into the backup database... each query gets 5000 rows from the main database...
EDIT:
The query looks like this:
const string cmdStr = "SELECT * FROM [RLCConvertor].[dbo].[RLCDiffHeader] WHERE ID >= @Start AND ID <= @End";
Here is the code:
using (var conn = new SqlConnection(_connectionString))
{
    conn.Open();
    var cmd = new SqlCommand(cmdStr, conn);
    cmd.Parameters.AddWithValue("@Start", start);
    cmd.Parameters.AddWithValue("@End", end);
    SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
    while (reader.Read())
    {
        var rldDiffId = Convert.ToInt32(reader["ID"].ToString());
        var rlcDifHeader = new RLCDiffHeader
        {
            Tech_head_Type = long.Parse(reader["Tech_head_Type"].ToString()),
            ItemCode = long.Parse(reader["ItemCode"].ToString()),
            SessionNumber = long.Parse(reader["SessionNumber"].ToString()),
            MarketFeedCode = reader["MarketFeedCode"].ToString(),
            MarketPlaceCode = reader["MarketPlaceCode"].ToString(),
            FinancialMarketCode = reader["FinancialMarketCode"].ToString(),
            CIDGrc = reader["CIDGrc"].ToString(),
            InstrumentID = reader["InstrumentID"].ToString(),
            CValMNE = reader["CValMNE"].ToString(),
            DEven = reader["DEven"].ToString(),
            HEven = reader["HEven"].ToString(),
            MessageCodeType = reader["MessageCodeType"].ToString(),
            SEQbyINSTandType = reader["SEQbyINSTandType"].ToString()
        };
        newRLCDiffHeaders.Add(rldDiffId, rlcDifHeader);
    }
    conn.Close();
}
But when I started the service, the performance of the main database got worse. Is the code not efficient? Is there a better way? I searched and found that a DataReader is best for this case... or should I use a DataTable and SqlDataAdapter?
Please don't treat this as a definitive answer or a complete solution to your problem.
Since this grew too big for a comment, I am providing it as a suggestion.
Can you try using the concept of ad hoc queries?
Using this, you can query another database in the following way:
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
'SELECT GroupName, Name, DepartmentID
FROM AdventureWorks2012.HumanResources.Department
ORDER BY GroupName, Name') AS a;
Read more
http://technet.microsoft.com/en-us/library/ms187569.aspx
http://technet.microsoft.com/en-us/library/ms190312.aspx
Since you are using a service, the service account surely has access to read from the main DB and insert into the report DB. I suggest you have an SP in your report DB that accesses the main DB using OPENROWSET and inserts into it.
The query will be something like this:
Insert into tbl
SELECT a.*
FROM OPENROWSET('SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
'SELECT GroupName, Name, DepartmentID
FROM AdventureWorks2012.HumanResources.Department
ORDER BY GroupName, Name') AS a;
From the service, you need to invoke the SP.
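A minimal sketch of that call from C# (the proc and connection-string names are illustrative):
// Assumes a proc named dbo.SyncFromMainDb wrapping the OPENROWSET insert above.
using (var conn = new SqlConnection(_reportDbConnectionString))
using (var cmd = new SqlCommand("dbo.SyncFromMainDb", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 600;   // cross-server inserts can run long
    conn.Open();
    cmd.ExecuteNonQuery();
}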
We had a similar issue, and it was solved with OPENROWSET; I don't know how much of a performance improvement it will give you, but I suggest you do a POC and analyze it.
Once again, please consider this as a suggestion.

How to know when to stop filling an OracleDataAdapter

I'm using the ODP.NET dll in a project that is accessing Oracle.
Users can type any SQL into a text box, which is then executed against the DB. I've been trying to use the OracleDataAdapter to populate a DataTable with the result set, but I want to be able to return the result set in stages (for large select queries).
An example of my problem is...
If a select query returns 13 rows of data, the code snippet below will execute without issue until the fourth time oda.Fill is called (the start row is 15, which doesn't exist), I presume because it is calling into a reader that has closed, or something similar.
It then will throw a System.InvalidOperationException with the message - Operation is not valid due to the current state of the object.
How can I find out how many rows in total the command will eventually contain (so that I don't encounter the exception)?
OracleDataAdapter oda = new OracleDataAdapter(oracleCommand);
oda.Requery = false;

DataTable dt = new DataTable();
var dts = new DataTable[] { dt };

oda.Fill(0, 5, dts);
var a = dts[0].Rows.Count;
oda.Fill(a, 5, dts);
var b = dts[0].Rows.Count;
oda.Fill(b, 5, dts);
var c = dts[0].Rows.Count;
oda.Fill(c, 5, dts);
var d = dts[0].Rows.Count;
Note: I've omitted the connection and Oracle command objects for brevity.
EDIT 1:
I've just thought that I could wrap the SQL entered by the user in another query and execute it...
SELECT COUNT(*) FROM (...intial query in here...)
but this isn't exactly a clean solution, and surely there is a method somewhere that I haven't seen?
Thanks in advance.
For paging in Oracle, see: http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html
There is no way to know the record set count without running a separate COUNT(*) query. This is by design. The DataReader and DataAdapter are forward-only and read-only.
If efficiency is a concern (i.e., large record sets), one should let the database do the paging and not ask the OracleDataAdapter to run the full query. Imagine if Google filled a DataTable with all 1M+ results for each user search! The following article addresses this concern, although the examples are in SQL Server's dialect:
http://www.asp.net/data-access/tutorials/efficiently-paging-through-large-amounts-of-data-cs
I've revised my example below to allow paging on any SQL query. The calling procedure is responsible for keeping track of the user's current page and page size. If the result set is less than the requested page size, there are no more pages.
Of course, running custom SQL from user input is a huge security risk. But that wasn't the question at hand.
Good luck! --Brett
DataTable GetReport(string sql, int pageIndex, int pageSize)
{
    DataTable table = new DataTable();
    int rowStart = pageIndex * pageSize + 1;
    int rowEnd = (pageIndex + 1) * pageSize;
    string qry = string.Format(
        @"select *
          from (select rownum ""ROWNUM"", a.*
                from ({0}) a
                where rownum <= :rowEnd)
          where ""ROWNUM"" >= :rowStart
        ", sql);
    try
    {
        using (OracleConnection conn = new OracleConnection(_connStr))
        {
            OracleCommand cmd = new OracleCommand(qry, conn);
            cmd.Parameters.Add(":rowEnd", OracleDbType.Int32).Value = rowEnd;
            cmd.Parameters.Add(":rowStart", OracleDbType.Int32).Value = rowStart;
            cmd.CommandType = CommandType.Text;
            conn.Open();
            OracleDataAdapter oda = new OracleDataAdapter(cmd);
            oda.Fill(table);
        }
    }
    catch (Exception)
    {
        throw;
    }
    return table;
}
You could add an analytic COUNT to your query:
SELECT foo, bar, COUNT(*) OVER () AS TheCount FROM YourTable WHERE ...;
That way the count of the entire result set is returned with each row in TheCount, and you can set your loop to terminate accordingly.
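A sketch of a Fill loop that terminates using TheCount; it reuses the oda adapter from the question and assumes the user's query has been amended as above:
// Page through the adapter until we've fetched every row reported by TheCount.
var dts = new DataTable[] { new DataTable() };
int pageSize = 5;
int total = int.MaxValue;                   // unknown until the first page arrives

while (dts[0].Rows.Count < total)
{
    int fetched = oda.Fill(dts[0].Rows.Count, pageSize, dts);
    if (dts[0].Rows.Count > 0)
        total = Convert.ToInt32(dts[0].Rows[0]["TheCount"]);
    if (fetched == 0)                       // safety net: no more rows came back
        break;
}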
To gain control over the fill-DataTable loop, you need to own the loop yourself.
Then build your own function to fill the DataTable using an OracleDataReader.
To get column information, you can use dataReader.GetSchemaTable.
To fill the table:
MyTable.BeginLoadData()
Dim Values(MySchema.Rows.Count - 1) As Object
Do While MyReader.Read()
    MyReader.GetValues(Values)
    MyTable.Rows.Add(Values)
    'Include here your control over the load count
Loop
MyTable.EndLoadData()
