SqlBulkCopy WriteToServerAsync using IDataReader - c#

I'm trying to use SqlBulkCopy to insert the contents of a CSV file into a SQL Server database from an IDataReader (the specific implementation of IDataReader is CsvDataReader from the CsvHelper package). The following is my code:
using (var bulkCopy = new SqlBulkCopy(sqlConnection, SqlBulkCopyOptions.UseInternalTransaction | SqlBulkCopyOptions.KeepNulls, null))
{
    bulkCopy.DestinationTableName = tableName;
    bulkCopy.BatchSize = 4000;
    bulkCopy.EnableStreaming = true;
    await bulkCopy.WriteToServerAsync(dataReader, cancellationToken).ConfigureAwait(false);
}
When trying to run this code I receive the following exception:
System.InvalidOperationException: Synchronous operations are disallowed. Call ReadAsync or set AllowSynchronousIO to true instead.
I believe this is a result of the fact that IDataReader doesn't have a ReadAsync method.
Is there a way to solve this problem?

I ended up writing a wrapper around CsvReader that inherits from DbDataReader. I left most of the abstract methods unimplemented, because I only needed a few of them for my purposes.
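For anyone hitting the same wall, here's a minimal sketch of what such a wrapper can look like. This is not the exact code from the answer: the class name is invented, it assumes a recent CsvHelper where CsvReader exposes Read(), ReadAsync(), GetField(int) and HeaderRecord, and it expects csv.Read() plus csv.ReadHeader() to have been called before the wrapper is handed to SqlBulkCopy. The member that matters is ReadAsync: because DbDataReader exposes it as a virtual, WriteToServerAsync gets a genuinely asynchronous path instead of falling back to the synchronous reads that the host rejects.

using System;
using System.Collections;
using System.Data.Common;
using System.Threading;
using System.Threading.Tasks;
using CsvHelper;

public sealed class CsvDbDataReader : DbDataReader
{
    private readonly CsvReader _csv;

    public CsvDbDataReader(CsvReader csv) => _csv = csv;

    // The members SqlBulkCopy actually exercises for a plain CSV load:
    public override bool Read() => _csv.Read();
    public override Task<bool> ReadAsync(CancellationToken cancellationToken) => _csv.ReadAsync();
    public override int FieldCount => _csv.HeaderRecord.Length;
    public override object GetValue(int ordinal) => _csv.GetField(ordinal);
    public override string GetString(int ordinal) => _csv.GetField(ordinal);
    public override string GetName(int ordinal) => _csv.HeaderRecord[ordinal];
    public override int GetOrdinal(string name) => Array.IndexOf(_csv.HeaderRecord, name);
    public override Type GetFieldType(int ordinal) => typeof(string); //CSV fields are strings; SqlBulkCopy converts to the destination types
    public override bool IsDBNull(int ordinal) => string.IsNullOrEmpty(_csv.GetField(ordinal)); //treating empty as null is a choice, not a rule
    public override int GetValues(object[] values)
    {
        for (var i = 0; i < FieldCount; i++) values[i] = GetValue(i);
        return FieldCount;
    }

    // Plausible fixed answers for a forward-only CSV source:
    public override int Depth => 0;
    public override bool HasRows => true;
    public override bool IsClosed => false;
    public override int RecordsAffected => -1;
    public override bool NextResult() => false;
    public override string GetDataTypeName(int ordinal) => nameof(String);
    public override IEnumerator GetEnumerator() => new DbEnumerator(this);

    // Left unimplemented, as in the answer above; implement any that throw at runtime.
    public override object this[int ordinal] => throw new NotImplementedException();
    public override object this[string name] => throw new NotImplementedException();
    public override bool GetBoolean(int ordinal) => throw new NotImplementedException();
    public override byte GetByte(int ordinal) => throw new NotImplementedException();
    public override long GetBytes(int ordinal, long dataOffset, byte[] buffer, int bufferOffset, int length) => throw new NotImplementedException();
    public override char GetChar(int ordinal) => throw new NotImplementedException();
    public override long GetChars(int ordinal, long dataOffset, char[] buffer, int bufferOffset, int length) => throw new NotImplementedException();
    public override DateTime GetDateTime(int ordinal) => throw new NotImplementedException();
    public override decimal GetDecimal(int ordinal) => throw new NotImplementedException();
    public override double GetDouble(int ordinal) => throw new NotImplementedException();
    public override float GetFloat(int ordinal) => throw new NotImplementedException();
    public override Guid GetGuid(int ordinal) => throw new NotImplementedException();
    public override short GetInt16(int ordinal) => throw new NotImplementedException();
    public override int GetInt32(int ordinal) => throw new NotImplementedException();
    public override long GetInt64(int ordinal) => throw new NotImplementedException();
}

Usage is then simply: await bulkCopy.WriteToServerAsync(new CsvDbDataReader(csv), cancellationToken);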

Related

Can't select more than 700000 rows from SQL Server using C#

I can't fetch more than 700,000 rows from SQL Server using C#; I get an out-of-memory exception. Please help me out.
This is my code:
using (SqlConnection sourceConnection = new SqlConnection(constr))
{
    sourceConnection.Open();
    SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);
    reader = commandSourceData.ExecuteReader();
}

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
{
    bulkCopy.DestinationTableName = "destinationTable";
    try
    {
        // Write from the source to the destination.
        bulkCopy.WriteToServer(reader);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        reader.Close();
    }
}
I made a small console app based on the given solution 1, but it ends with the same exception. I have also captured screenshots of my process memory before processing and after; after adding the command timeout on the read side, RAM peaks. (Screenshots not reproduced here.)
That code should not cause an OOM exception. When you pass a DataReader to SqlBulkCopy.WriteToServer you are streaming the rows from the source to the destination. Somewhere else you are retaining stuff in memory.
SqlBulkCopy.BatchSize controls how often SQL Server commits the rows loaded at the destination, limiting the lock duration and the log file growth (if not minimally logged and in simple recovery mode). Whether you use one batch or not should have no impact on the amount of memory used either in SQL Server or in the client.
Here's a sample that copies 10M rows without growing memory:
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace SqlBulkCopyTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var src = "server=localhost;database=tempdb;integrated security=true";
            var dest = src;
            var sql = "select top (1000*1000*10) m.* from sys.messages m, sys.messages m2";
            var destTable = "dest";

            using (var con = new SqlConnection(dest))
            {
                con.Open();
                var cmd = con.CreateCommand();
                cmd.CommandText = $"drop table if exists {destTable}; with q as ({sql}) select * into {destTable} from q where 1=2";
                cmd.ExecuteNonQuery();
            }

            Copy(src, dest, sql, destTable);

            Console.WriteLine("Complete. Hit any key to exit.");
            Console.ReadKey();
        }

        static void Copy(string sourceConnectionString, string destinationConnectionString, string query, string destinationTable)
        {
            using (SqlConnection sourceConnection = new SqlConnection(sourceConnectionString))
            {
                sourceConnection.Open();
                SqlCommand commandSourceData = new SqlCommand(query, sourceConnection);
                var reader = commandSourceData.ExecuteReader();

                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
                {
                    bulkCopy.BulkCopyTimeout = 60 * 10;
                    bulkCopy.DestinationTableName = destinationTable;
                    bulkCopy.NotifyAfter = 10000;
                    bulkCopy.SqlRowsCopied += (s, a) =>
                    {
                        var mem = GC.GetTotalMemory(false);
                        Console.WriteLine($"{a.RowsCopied:N0} rows copied. Memory {mem:N0}");
                    };

                    // Write from the source to the destination.
                    bulkCopy.WriteToServer(reader);
                }
            }
        }
    }
}
Which outputs:
. . .
9,830,000 rows copied. Memory 1,756,828
9,840,000 rows copied. Memory 798,364
9,850,000 rows copied. Memory 4,042,396
9,860,000 rows copied. Memory 3,092,124
9,870,000 rows copied. Memory 2,133,660
9,880,000 rows copied. Memory 1,183,388
9,890,000 rows copied. Memory 3,673,756
9,900,000 rows copied. Memory 1,601,044
9,910,000 rows copied. Memory 3,722,772
9,920,000 rows copied. Memory 1,642,052
9,930,000 rows copied. Memory 3,763,780
9,940,000 rows copied. Memory 1,691,204
9,950,000 rows copied. Memory 3,812,932
9,960,000 rows copied. Memory 1,740,356
9,970,000 rows copied. Memory 3,862,084
9,980,000 rows copied. Memory 1,789,508
9,990,000 rows copied. Memory 3,903,044
10,000,000 rows copied. Memory 1,830,468
Complete. Hit any key to exit.
NB: Per DavidBrowne's answer, it seems I'd misunderstood how the batching of the SqlBulkCopy class works. The refactored code may still be useful to you, so I've not deleted this answer (as the code is still valid), but the answer is not to set the BatchSize as I'd believed. Please see David's answer for an explanation.
Try something like this; the key being setting the BatchSize property to limit how many rows you deal with at once:
using (SqlConnection sourceConnection = new SqlConnection(constr))
{
    sourceConnection.Open();
    SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);
    using (var reader = commandSourceData.ExecuteReader()) //add a using statement for your reader so you don't need to worry about close/dispose
    {
        //keep the connection open or we'll be trying to read from a closed connection
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
        {
            bulkCopy.BatchSize = 1000; //write a few pages at a time rather than all at once, lowering the memory impact; see https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.batchsize?view=netframework-4.7.2
            bulkCopy.DestinationTableName = "destinationTable";
            try
            {
                // Write from the source to the destination.
                bulkCopy.WriteToServer(reader);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                throw; //we've caught the top-level Exception rather than something specific, so once we've logged it, rethrow it for a proper handler to deal with up the call stack
            }
        }
    }
}
Note that because the SqlBulkCopy class takes an IDataReader as an argument, we don't need to download the full data set. Instead, the reader gives us a way to pull back records as required (hence leaving the connection open after creating the reader). When we call SqlBulkCopy's WriteToServer method, it internally loops, pulling BatchSize new records from the reader and pushing them to the destination table, repeating until the reader has no pending records. This works differently to, say, a DataTable, where we'd have to populate the data table with the full set of records up front, rather than being able to read more back as required.
One potential risk of this approach is that, because we have to keep the connection open, any locks on our source are kept in place until we close our reader. Depending on the isolation level and whether other queries are trying to access the same records, this may cause blocking; whereas the data table approach would have taken a one-off copy of the data into memory and then closed the connection, avoiding any blocks. If this blocking is a concern you should look at changing the isolation level of your query, or applying hints; exactly how you approach that will depend on your requirements though.
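For example, one hedged option (assuming the source database has ALLOW_SNAPSHOT_ISOLATION enabled, which is an assumption about your environment) is to put the source session under snapshot isolation before creating the reader, so the long-lived read takes row versions instead of shared locks:

sourceConnection.Open();
//assumes ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON has been run on the source
using (var setIso = new SqlCommand("SET TRANSACTION ISOLATION LEVEL SNAPSHOT;", sourceConnection))
{
    setIso.ExecuteNonQuery();
}
//the SELECT and the bulk copy then proceed exactly as above, without blocking writers

A table hint such as WITH (NOLOCK) on the source query is the cruder alternative, with the usual dirty-read caveats.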
NB: In reality, instead of running the above code as is, you'd want to refactor things a bit, so the scope of each method is contained. That way you can reuse this logic to copy other queries to other tables.
You'd also want to make the batch size configurable rather than hard-coded so you can adjust to a value that gives a good balance of resource usage vs performance (which will vary based on the host's resources).
You may also want to use async methods, to allow other parts of your program to progress whilst you're waiting on data to flow from/to your databases.
Here's a slightly amended version:
public async Task<SqlDataReader> ExecuteReaderAsync(string connectionString, string query)
{
    SqlConnection connection = null; //initialise to null so the catch block can safely dispose
    SqlCommand command = null;
    try
    {
        connection = new SqlConnection(connectionString); //not in a using as we want to keep the connection open until our reader's finished with it.
        connection.Open();
        command = new SqlCommand(query, connection);
        return await command.ExecuteReaderAsync(CommandBehavior.CloseConnection); //tell our reader to close the connection when done.
    }
    catch
    {
        //if we have an issue before we've returned our reader, dispose of our objects here
        command?.Dispose();
        connection?.Dispose();
        //then rethrow the exception
        throw;
    }
}
public async Task CopySqlDataAsync(string sourceConnectionString, string sourceQuery, string destinationConnectionString, string destinationTableName, int batchSize)
{
    using (var reader = await ExecuteReaderAsync(sourceConnectionString, sourceQuery))
        await CopySqlDataAsync(reader, destinationConnectionString, destinationTableName, batchSize);
}

public async Task CopySqlDataAsync(IDataReader sourceReader, string destinationConnectionString, string destinationTableName, int batchSize)
{
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        bulkCopy.BatchSize = batchSize;
        bulkCopy.DestinationTableName = destinationTableName;
        await bulkCopy.WriteToServerAsync(sourceReader);
    }
}
public void CopySqlDataExample()
{
    try
    {
        var constr = ""; //todo: define connection string; ideally pulling from config
        var constr2 = ""; //todo: define connection string #2; ideally pulling from config
        var batchSize = 1000; //todo: replace hardcoded batch size with value from config

        var task = CopySqlDataAsync(constr, "select * from XXXX", constr2, "destinationTable", batchSize);
        task.Wait(); //waits for the task to complete; any exceptions surface as an AggregateException
    }
    catch (AggregateException es)
    {
        var e = es.InnerExceptions[0]; //get the wrapped exception
        Console.WriteLine(e.Message);
        //throw; //to rethrow the AggregateException
        ExceptionDispatchInfo.Capture(e).Throw(); //to rethrow the wrapped exception (requires System.Runtime.ExceptionServices)
    }
}
Something went horribly wrong in your design if you even try to process 700k rows in C#. That you fail at this is to be expected.
If this is data retrieval for display: there is no way the user will be able to process that amount of data, and filtering down from 700k rows in the GUI is just a waste of time and bandwidth. 25-100 rows at once is about the limit. Do filtering or pagination on the query side so you do not end up retrieving orders of magnitude more than you can actually process.
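For the display case, here's a hedged sketch of query-side paging (OFFSET/FETCH requires SQL Server 2012 or later, and the Id ordering column, pageNumber and pageSize are assumptions for illustration):

using (var con = new SqlConnection(constr))
using (var cmd = new SqlCommand(
    @"SELECT * FROM XXXX
      ORDER BY Id
      OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY;", con))
{
    cmd.Parameters.Add("@skip", SqlDbType.Int).Value = pageNumber * pageSize;
    cmd.Parameters.Add("@take", SqlDbType.Int).Value = pageSize; //e.g. 100 rows per page
    con.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            //bind just this page to the GUI
        }
    }
}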
If this is some form of bulk insert or bulk modification: do that kind of operation in SQL Server, not in your code. Retrieving, processing in C# and then posting back just adds layers of overhead. If you add the two-way network transfer, you will easily triple the time this will take.
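For the bulk case, if both tables happen to live on the same server (an assumption; across two servers you would need a linked server, or bulk copy after all), the whole move can stay inside SQL Server as a single statement:

using (var con = new SqlConnection(constr))
using (var cmd = new SqlCommand("INSERT INTO destinationTable SELECT * FROM XXXX;", con))
{
    cmd.CommandTimeout = 600; //a copy of this size can easily exceed the 30-second default
    con.Open();
    cmd.ExecuteNonQuery(); //the 700k rows never leave the server
}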

SqlBulkCopy Wcf Rest Moving Data between 2 sql server tables

I am trying to move data from some tables in one SQL Server database to another SQL Server database, and I am planning to write a WCF REST service to do it, implemented with SqlBulkCopy. The functionality, on a button click: copy Table1 data from the source SQL Server to Table1 in the destination SQL Server, and the same for Table2 and Table3. I am blocked on a couple of things. Is SQL bulk copy the best option with a WCF REST service to transfer data? (I was asked not to use SSIS in this task.) Also, if there is any exception while moving data from source to destination, the destination data should be reverted, like a transaction. How do I implement this transaction functionality? Any pointer would help.
Based on the info you gave, your solution would be something like the sample code I inserted. Pointers for SqlBulkCopy: BCopyTutorial1, BCopyTutorial2. SqlBulkCopy with transaction scope examples: TrasactionBulkCopy.
My version is a simplified version of these, based on your question.
Interface:
[ServiceContract]
public interface IYourBulkCopyService
{
    [OperationContract]
    void PerformBulkCopy();
}
Implementation:
public class YourBulkCopyService : IYourBulkCopyService
{
    public void PerformBulkCopy()
    {
        string sourceCs = ConfigurationManager.AppSettings["SourceCs"];
        string destinationCs = ConfigurationManager.AppSettings["DestinationCs"];

        // Open a connection to the source database.
        using (SqlConnection sourceConnection = new SqlConnection(sourceCs))
        {
            sourceConnection.Open();

            // Get data from the source table as a SqlDataReader.
            SqlCommand commandSourceData = new SqlCommand(
                "SELECT * FROM YourSourceTable", sourceConnection);
            SqlDataReader reader = commandSourceData.ExecuteReader();

            // Set up the bulk copy object inside the transaction.
            using (SqlConnection destinationConnection = new SqlConnection(destinationCs))
            {
                destinationConnection.Open();
                using (SqlTransaction transaction = destinationConnection.BeginTransaction())
                {
                    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(
                        destinationConnection, SqlBulkCopyOptions.KeepIdentity, transaction))
                    {
                        bulkCopy.BatchSize = 10;
                        bulkCopy.DestinationTableName = "YourDestinationTable";

                        // Write from the source to the destination.
                        try
                        {
                            bulkCopy.WriteToServer(reader);
                            transaction.Commit();
                        }
                        catch (Exception ex)
                        {
                            // If any error, roll back.
                            Console.WriteLine(ex.Message);
                            transaction.Rollback();
                        }
                        finally
                        {
                            reader.Close();
                        }
                    }
                }
            }
        }
    }
}

"Invalid attempt to call Read when reader is closed" when using a SqlDataReader

1) I have the following code:
private static SqlDataReader gCandidateList = null;

public SqlDataReader myCandidateList
{
    set
    {
        gCandidateList = value;
    }
    get
    {
        return gCandidateList;
    }
}
2) In FormA I have:
sqlConn.ConnectionString = mySettings.myConnString;
sqlConn.Open();
SqlCommand cmdAvailableCandidate = new SqlCommand(tempString, sqlConn);
SqlDataReader drAvailableCandidate = cmdAvailableCandidate.ExecuteReader();
mySettings.myCandidateList = drAvailableCandidate;
sqlConn.Close();
3) In FormB I want to reuse the data saved in myCandidateList, so I use:
SqlDataReader drCandidate = mySettings.myCandidateList;
drCandidate.Read();
4) I then got the error "Invalid attempt to call Read when reader is closed."
5) I tried mySettings.myCandidateList.Read() in (3) above and again received the same error message.
6) How can I re-open SqlDataReader drCandidate to read data?
7) I would very much appreciate your advice and help, please.
You can't read from a reader once the connection is closed or disposed. If you want to use those rows (the fetched result) later in your code, you need to copy them into a List or DataTable.
For instance,
System.Data.DataTable dt = new System.Data.DataTable();
dt.Load(drAvailableCandidate);
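FormB can then iterate over the cached table instead of the dead reader (the column name below is a placeholder):

foreach (System.Data.DataRow row in dt.Rows)
{
    var name = row["CandidateName"]; //placeholder column name
    //...
}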
If you want to use the data reader at a later stage, you have to pass CommandBehavior.CloseConnection to the ExecuteReader method and leave the connection open until the reader has been read. Your code in FormA should be changed as below:
sqlConn.ConnectionString = mySettings.myConnString;
sqlConn.Open();
SqlCommand cmdAvailableCandidate = new SqlCommand(tempString, sqlConn);
SqlDataReader drAvailableCandidate = cmdAvailableCandidate.ExecuteReader(CommandBehavior.CloseConnection);
mySettings.myCandidateList = drAvailableCandidate;
//note: no sqlConn.Close() here; the connection now closes automatically when the reader is disposed
Make sure to dispose of the data reader once it has been used, as the connection to the database will be held open until the data reader is closed. It is better to change your code in FormB as below.
using (mySettings.myCandidateList)
{
mySettings.myCandidateList.Read();
}
You're closing the connection before you attempt to read from the reader. That won't work.
When you call Close on the SqlConnection object (sqlConn.Close();) it closes the connection and your data reader. That is why you are getting the error when you try to read from your SqlDataReader from FormB.
What you need to do is change the definition of your myCandidateList property so that it instead returns a representation of the data you have extracted from your drAvailableCandidate reader.
Essentially what you need to do is iterate through the rows in the drAvailableCandidate object, extract the values and cache them in your property for later retrieval.
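Here's a minimal sketch of that idea, reusing the question's names where possible (the List-of-rows shape and the LoadCandidates helper are inventions for illustration):

private static List<object[]> gCandidateList = new List<object[]>();

public static List<object[]> myCandidateList
{
    get { return gCandidateList; }
}

//call this in FormA instead of storing the reader itself
public static void LoadCandidates(SqlDataReader drAvailableCandidate)
{
    gCandidateList.Clear();
    while (drAvailableCandidate.Read())
    {
        var row = new object[drAvailableCandidate.FieldCount];
        drAvailableCandidate.GetValues(row); //copy the current row's values out of the reader
        gCandidateList.Add(row);
    }
}

FormB then reads from myCandidateList with no live connection required.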
Just to add to the answers already given, if you're using async/await then it's easy to get caught out with this by not awaiting an operation inside a using block of a SqlConnection. For example, doing the following can give the reported error
public Task GetData()
{
using(new SqlConnection(connString))
{
return SomeAsyncOperation();
}
}
The problem here is that we're not awaiting the operation inside the using, therefore it's being disposed of before we actually execute the underlying async operation. Fairly obvious, but it has caught me out before.
The correct thing to do is to await inside the using.
public async Task GetData()
{
using(new SqlConnection(connString))
{
await SomeAsyncOperation();
}
}

How to use SqlBulkCopy with SMO and transactions

I'm trying to create a table using SMO, and then use the SqlBulkCopy object to inject a bunch of data into that table. I can do this without using a transaction like this:
Server server = new Server(new ServerConnection(new SqlConnection(connectionString)));
var database = server.Databases["MyDatabase"];

using (SqlConnection connection = server.ConnectionContext.SqlConnectionObject)
{
    try
    {
        connection.Open();

        Table table = new Table(database, "MyNewTable");
        // --- Create the table and its columns --- //

        SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(connection);
        sqlBulkCopy.DestinationTableName = "MyNewTable";
        sqlBulkCopy.WriteToServer(dataTable);
    }
    catch (Exception)
    {
        throw;
    }
}
Basically I want to perform the above using a SqlTransaction object and committing it when the operation has been completed (Or rolling it back if it fails).
Can anyone help?
2 things:
A - The SqlBulkCopy operation is already transaction-based by default. That means the copy itself is encapsulated in a transaction and succeeds or fails as a unit.
B - The ServerConnection object has BeginTransaction, CommitTransaction and RollBackTransaction methods.
You should be able to use those methods in your code above, but I suspect that if there is an issue with the table creation, your try/catch will handle it appropriately.
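For what it's worth, here's a hedged sketch of the transactional shape using plain ADO.NET rather than SMO for the table creation (a deliberate swap, since SqlBulkCopy's transaction overload wants the ADO.NET SqlTransaction object; the column list is invented):

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (SqlTransaction transaction = connection.BeginTransaction())
    {
        try
        {
            //create the table inside the transaction (T-SQL here instead of SMO)
            using (var create = new SqlCommand(
                "CREATE TABLE MyNewTable (Id int NOT NULL, Name nvarchar(100) NULL);",
                connection, transaction))
            {
                create.ExecuteNonQuery();
            }

            //bulk load inside the same transaction
            using (var sqlBulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
            {
                sqlBulkCopy.DestinationTableName = "MyNewTable";
                sqlBulkCopy.WriteToServer(dataTable); //dataTable as in the question
            }

            transaction.Commit(); //both the table and its data appear together...
        }
        catch
        {
            transaction.Rollback(); //...or neither does
            throw;
        }
    }
}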

Using statement question

I have two questions.
1) Should you always use a using statement on a connection? So, I would use it on the connection and then another one on a reader within the connection? So I would be using two using statements.
2) Let's say you use the using statement on the connection and also on a reader being returned on the connection, so you have two using statements. Does it create two Try{}Finally{} blocks or just one?
Thanks!
Be careful here. You should always have a using statement on any local object that implements IDisposable. That includes not only connections and readers, but also the command. But it can be tricky sometimes exactly where that using statement goes; if you're not careful it can cause problems. For example, in the code that follows, the using statement will close your reader before you ever get to use it:
SqlDataReader MyQuery()
{
    string sql = "some query";
    using (var cn = new SqlConnection("connection string"))
    using (var cmd = new SqlCommand(sql, cn))
    {
        cn.Open();
        using (var rdr = cmd.ExecuteReader())
        {
            return rdr; //the bug being demonstrated: rdr is disposed as this using block exits
        }
    }
}
Instead, you have four options. One is to wait to create the using block until you call the function:
SqlDataReader MyQuery()
{
    string sql = "some query";
    using (var cn = new SqlConnection("connection string"))
    using (var cmd = new SqlCommand(sql, cn))
    {
        cn.Open();
        return cmd.ExecuteReader();
    }
}

using (var rdr = MyQuery())
{
    while (rdr.Read())
    {
        //...
    }
}
Of course, you still have to be careful with your connection there, and it means remembering to write a using block everywhere you use the function.
Option two is to just process the query results in the method itself, but that breaks the separation of your data layer from the rest of the program. A third option is for your MyQuery() function to accept an argument of type Action that you can call inside the while (rdr.Read()) loop, but that's just awkward.
I generally prefer option four: turn the data reader into an IEnumerable, like this:
IEnumerable<IDataRecord> MyQuery()
{
    string sql = "some query";
    using (var cn = new SqlConnection("connection string"))
    using (var cmd = new SqlCommand(sql, cn))
    {
        cn.Open();
        using (var rdr = cmd.ExecuteReader())
        {
            while (rdr.Read())
                yield return rdr;
        }
    }
}
Now everything will be closed correctly, and the code that handles it is all in one place. You also get a nice bonus: your query results will work well with any of the linq operators.
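For example (a quick sketch; the Name and Amount columns are invented):

var bigOnes = MyQuery()
    .Where(r => r.GetDecimal(r.GetOrdinal("Amount")) > 1000m)
    .Select(r => (string)r["Name"])
    .ToList();

Just remember each IDataRecord is the same live reader, so materialise what you need (as the ToList above does) rather than holding on to the records themselves.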
Finally, here's something new I'm playing with for the next time I get to build a completely new project: it combines the IEnumerable approach with passing in a delegate argument:
//part of the data layer
private static IEnumerable<IDataRecord> Retrieve(string sql, Action<SqlParameterCollection> addParameters)
{
//DL.ConnectionString is a private static property in the data layer
// depending on the project needs, it can be implementing to read from a config file or elsewhere
using (var cn = new SqlConnection(DL.ConnectionString))
using (var cmd = new SqlCommand(sql, cn))
{
addParameters(cmd.Parameters);
cn.Open();
using (var rdr = cmd.ExecuteReader())
{
while (rdr.Read())
yield return rdr;
}
}
}
And then I'll use it within the data layer like this:
public IEnumerable<IDataRecord> GetFooChildrenByParentID(int ParentID)
{
    //I could easily use a stored procedure name instead, and provide overloads for command types.
    return Retrieve(
        @"SELECT c.*
          FROM [ParentTable] p
          INNER JOIN [ChildTable] c ON c.ParentID = p.ID
          WHERE p.ID = @ParentID", p =>
        {
            p.Add("@ParentID", SqlDbType.Int).Value = ParentID;
        }
    );
}
1) Should you always use a using statement on a connection? So, I would use it on the connection and then another one on a reader within the connection? So I would be using two using statements.
Yes, because they implement IDisposable. And don't forget a using statement on the command too:
using (DbConnection connection = GetConnection())
using (DbCommand command = connection.CreateCommand())
{
command.CommandText = "SELECT FOO, BAR FROM BAZ";
connection.Open();
using (DbDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
....
}
}
}
2) Let's say you use the using statement on the connection and also a reader being returned on the connection. So you have two using statements. Does it create two Try{}Finally{} blocks or just one?
Each using statement will create its own try/finally block
You should always use a using statement when an object implements IDisposable. This includes connections.
It will create two nested try{}finally{} blocks.
Special point on 1): you need to specifically avoid that technique when the connection is used in asynchronous ADO.NET methods like BeginExecuteReader, because more than likely you will fall out of scope and try to dispose the connection while the async operation is still in progress. This is similar to the case where you are using class variables rather than local variables. Oftentimes the connection reference is stored in a class used as the "control block" for the asynchronous operation.
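A sketch of that "control block" shape (note that BeginExecuteReader on older .NET Framework versions also needs Asynchronous Processing=true in the connection string, an assumption worth checking for your version):

class QueryControlBlock
{
    private SqlConnection _cn; //fields, not locals: they must outlive the Begin call
    private SqlCommand _cmd;

    public void Start(string connString, string sql)
    {
        _cn = new SqlConnection(connString);
        _cn.Open();
        _cmd = new SqlCommand(sql, _cn);
        _cmd.BeginExecuteReader(OnCompleted, null);
    }

    private void OnCompleted(IAsyncResult ar)
    {
        using (var rdr = _cmd.EndExecuteReader(ar))
        {
            while (rdr.Read()) { /* consume rows */ }
        }
        _cmd.Dispose();
        _cn.Dispose(); //only now is it safe to dispose the connection
    }
}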
To answer each one:
1) Yes, this would be best practice to dispose both as soon as possible.
2) using() will create two blocks, wrapped in each other in the same order. It will dispose the inner object (the reader) first, then dispose the object from the outer using (the connection).
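Roughly, the expansion of two stacked using statements looks like this (a sketch of the lowering, not the compiler's exact output):

var cn = new SqlConnection("connection string");
try
{
    var cmd = new SqlCommand(sql, cn);
    try
    {
        //...body: open the connection, execute and read...
    }
    finally
    {
        if (cmd != null) ((IDisposable)cmd).Dispose();
    }
}
finally
{
    if (cn != null) ((IDisposable)cn).Dispose();
}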
This article will probably be of interest to you: How to Implement IDisposable and Finalizers: 3 Easy Rules
