SQL Insert statements in C# on Azure DB - Running very slow

I'm working on an importer in our web application. With the code I currently have, it runs fine and within reason when connecting to a local SQL Server. I'm also creating a .sql script that users can download.
Example 1
40k records, 8 columns: 1 minute 30 seconds to 2 minutes
When I move it to production on an Azure App Service, it runs VERY slowly.
Example 2
40k records, 8 columns: 15 to 18 minutes
The current database is set to pricing tier Standard S2 (50 DTUs).
Here is the code:
using (var sqlConnection = new SqlConnection(connectionString))
{
try
{
var generatedScriptFilePathInfo = GetImportGeneratedScriptFilePath(trackingInfo.UploadTempDirectoryPath, trackingInfo.FileDetail);
using (FileStream fileStream = File.Create(generatedScriptFilePathInfo.GeneratedScriptFilePath))
{
using (StreamWriter writer = new StreamWriter(fileStream))
{
sqlConnection.Open();
sqlTransaction = sqlConnection.BeginTransaction();
await writer.WriteLineAsync("/* Insert Scripts */").ConfigureAwait(false);
foreach (var item in trackingInfo.InsertSqlScript)
{
errorSqlScript = item;
using (var cmd = new SqlCommand(item, sqlConnection, sqlTransaction))
{
cmd.CommandTimeout = 800;
cmd.CommandType = CommandType.Text;
await cmd.ExecuteScalarAsync().ConfigureAwait(false);
}
currentRowLine++;
rowsProcessedUpdateEveryXCounter++;
rowsProcessedTotal++;
// append insert statement to the file
await writer.WriteLineAsync(item).ConfigureAwait(false);
}
// write out a couple of blank lines to separate insert statements from post scripts (if there are any)
await writer.WriteLineAsync(string.Empty).ConfigureAwait(false);
await writer.WriteLineAsync(string.Empty).ConfigureAwait(false);
}
}
}
catch (OverflowException exOverFlow)
{
sqlTransaction.Rollback();
sqlTransaction.Dispose();
trackingInfo.IsSuccessful = false;
trackingInfo.ImportMetricUpdateError = new ImportMetricUpdateErrorDTO(trackingInfo.ImportMetricId)
{
ErrorLineNbr = currentRowLine + 1, // add one to go ahead and count the record we are on to sync up with the file
ErrorMessage = string.Format(CultureInfo.CurrentCulture, "{0}", ImporterHelper.ArithmeticOperationOverflowFriendlyErrorText),
ErrorSQL = errorSqlScript,
RowsProcessed = currentRowLine
};
await LogImporterError(trackingInfo.FileDetail, exOverFlow.ToString(), currentUserId).ConfigureAwait(false);
await UpdateImportAfterFailure(trackingInfo.ImportMetricId, exOverFlow.Message, currentUserId).ConfigureAwait(false);
return trackingInfo;
}
catch (Exception ex)
{
sqlTransaction.Rollback();
sqlTransaction.Dispose();
trackingInfo.IsSuccessful = false;
trackingInfo.ImportMetricUpdateError = new ImportMetricUpdateErrorDTO(trackingInfo.ImportMetricId)
{
ErrorLineNbr = currentRowLine + 1, // add one to go ahead and count the record we are on to sync up with the file
ErrorMessage = string.Format(CultureInfo.CurrentCulture, "{0}", ex.Message),
ErrorSQL = errorSqlScript,
RowsProcessed = currentRowLine
};
await LogImporterError(trackingInfo.FileDetail, ex.ToString(), currentUserId).ConfigureAwait(false);
await UpdateImportAfterFailure(trackingInfo.ImportMetricId, ex.Message, currentUserId).ConfigureAwait(false);
return trackingInfo;
}
}
Questions
Is there any way to speed this up on Azure? Or is the only way to upgrade the DTUs?
We are looking into SQL Bulk Copy as well. Will this help, or will it still be slow on Azure? https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?redirectedfrom=MSDN&view=dotnet-plat-ext-5.0
Desired results
Run at the same speed as when running against a local SQL Server database

For now, I updated my code to batch the insert statements based on the record count. If the count is over 10k, it batches them by dividing the total by 10.
This helped performance BIG TIME on our Azure instance: I was able to add 40k records within 30 seconds. I also think part of the issue was how many different slots our Azure App Service uses.
We will also probably move to SqlBulkCopy later on, as users need to import larger Excel files; a minimal sketch follows the code below.
Thanks everyone for the help and insights!
// apply the insert SQL scripts if found.
if (string.IsNullOrWhiteSpace(trackingInfo.InsertSqlScript.ToString()) == false)
{
int? updateEveryXRecords = GetProcessedEveryXTimesForApplyingInsertStatementsValue(trackingInfo.FileDetail);
trackingInfo.FileDetail = UpdateImportMetricStatus(trackingInfo.FileDetail, ImportMetricStatus.ApplyingInsertScripts, currentUserId);
int rowsProcessedUpdateEveryXCounter = 0;
int rowsProcessedTotal = 0;
await UpdateImportMetricsRowsProcessed(trackingInfo.ImportMetricId, rowsProcessedTotal, trackingInfo.FileDetail.ImportMetricStatusHistories).ConfigureAwait(false);
bool isBulkMode = trackingInfo.InsertSqlScript.Count >= 10000;
await writer.WriteLineAsync("/* Insert Scripts */").ConfigureAwait(false);
int insertCounter = 0;
int bulkCounter = 0;
int bulkProcessingAmount = 0;
int lastInsertCounter = 0;
if (isBulkMode == true)
{
bulkProcessingAmount = trackingInfo.InsertSqlScript.Count / 10;
}
await LogInsertBulkStatus(trackingInfo.FileDetail, isBulkMode, trackingInfo.InsertSqlScript.Count, bulkProcessingAmount, currentUserId).ConfigureAwait(false);
StringBuilder sbInsertBulk = new StringBuilder();
foreach (var item in trackingInfo.InsertSqlScript)
{
if (isBulkMode == false)
{
errorSqlScript = item;
using (var cmd = new SqlCommand(item, sqlConnection, sqlTransaction))
{
cmd.CommandTimeout = 800;
cmd.CommandType = CommandType.Text;
await cmd.ExecuteScalarAsync().ConfigureAwait(false);
}
currentRowLine++;
rowsProcessedUpdateEveryXCounter++;
rowsProcessedTotal++;
// append insert statement to the file
await writer.WriteLineAsync(item).ConfigureAwait(false);
// Update database with the insert statement created count to alert the user of the status.
if (updateEveryXRecords.HasValue)
{
if (updateEveryXRecords.Value == rowsProcessedUpdateEveryXCounter)
{
await UpdateImportMetricsRowsProcessed(trackingInfo.ImportMetricId, rowsProcessedTotal, trackingInfo.FileDetail.ImportMetricStatusHistories).ConfigureAwait(false);
rowsProcessedUpdateEveryXCounter = 0;
}
}
}
else
{
sbInsertBulk.AppendLine(item);
if (bulkCounter < bulkProcessingAmount)
{
errorSqlScript = string.Format(CultureInfo.CurrentCulture, "IsBulkMode is True | insertCounter = {0}", insertCounter);
bulkCounter++;
}
else
{
// display to the end user
errorSqlScript = string.Format(CultureInfo.CurrentCulture, "IsBulkMode is True | currentInsertCounter value = {0} | lastInsertCounter (insertCounter when the last bulk insert occurred): {1}", insertCounter, lastInsertCounter);
await ApplyBulkInsertStatements(sbInsertBulk, writer, sqlConnection, sqlTransaction, trackingInfo, rowsProcessedTotal).ConfigureAwait(false);
bulkCounter = 0;
sbInsertBulk.Clear();
lastInsertCounter = insertCounter;
}
rowsProcessedTotal++;
}
insertCounter++;
}
// get the remaining records after finishing the forEach insert statement
if (isBulkMode == true)
{
await ApplyBulkInsertStatements(sbInsertBulk, writer, sqlConnection, sqlTransaction, trackingInfo, rowsProcessedTotal).ConfigureAwait(false);
}
}
/// <summary>
/// Applies the bulk insert statements.
/// </summary>
/// <param name="sbInsertBulk">The sb insert bulk.</param>
/// <param name="wrtier">The wrtier.</param>
/// <param name="sqlConnection">The SQL connection.</param>
/// <param name="sqlTransaction">The SQL transaction.</param>
/// <param name="trackingInfo">The tracking information.</param>
/// <param name="rowsProcessedTotal">The rows processed total.</param>
/// <returns>Task</returns>
private async Task ApplyBulkInsertStatements(
StringBuilder sbInsertBulk,
StreamWriter writer,
SqlConnection sqlConnection,
SqlTransaction sqlTransaction,
ProcessImporterTrackingDTO trackingInfo,
int rowsProcessedTotal)
{
var bulkInsertStatements = sbInsertBulk.ToString();
using (var cmd = new SqlCommand(bulkInsertStatements, sqlConnection, sqlTransaction))
{
cmd.CommandTimeout = 800;
cmd.CommandType = CommandType.Text;
await cmd.ExecuteScalarAsync().ConfigureAwait(false);
}
// append insert statement to the file
await writer.WriteLineAsync(bulkInsertStatements).ConfigureAwait(false);
// Update database with the insert statement created count to alert the user of the status.
await UpdateImportMetricsRowsProcessed(trackingInfo.ImportMetricId, rowsProcessedTotal, trackingInfo.FileDetail.ImportMetricStatusHistories).ConfigureAwait(false);
}
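Since question 2 asked about SqlBulkCopy, here is a minimal sketch of the direction we plan to take. The method name, batch size, and the assumption that the DataTable is named after the destination table are illustrative, not our production code.
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;
public static async Task BulkInsertAsync(string connectionString, DataTable rows)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync().ConfigureAwait(false);
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            // Assumes the DataTable is named after the destination table.
            bulkCopy.DestinationTableName = rows.TableName;
            bulkCopy.BatchSize = 5000;      // rows sent per batch
            bulkCopy.BulkCopyTimeout = 800; // seconds, matching the CommandTimeout used above
            await bulkCopy.WriteToServerAsync(rows).ConfigureAwait(false);
        }
    }
}
Because SqlBulkCopy streams all the rows in one operation, it avoids the per-statement network round trip that made the original loop slow against Azure.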

Related

There is already an open DataReader associated with this Command which must be closed first - AGAIN

I have a client running a web site portal which uses some legacy code that periodically causes an issue. The system will work for days, even weeks, and then the worker process dies and no longer serves any data; you have to perform an IISRESET to get it working again.
I have found numerous postings about this error and in my mind none of the solutions or explanations fit my code.
Here is the method in question that causes my error
/// <summary>
/// Returns Data from Database using Table Name using a field list (if supplied)
/// Otherwise will return all fields.
/// </summary>
/// <param name="TableName">The TableName rquired</param>
/// <param name="WHERE">Where clause if required</param>
/// <param name="FieldNames">String array of required field names (if any)</param>
/// <returns>Dictionary List of results.</returns>
public List<Dictionary<string, object>> Select(string TableName, string WHERE, string[] FieldNames, int TopRecords = -1, string OrderBy = null)
{
string query = string.Empty;
string sFieldNames = string.Empty;
if (FieldNames.Length > 0)
{
sFieldNames = string.Join(", ", FieldNames);
query = string.Format("SELECT {2}{0} FROM {1} ", sFieldNames, TableName, TopRecords > -1 ? "TOP (" + TopRecords + ") " : "");
if (!string.IsNullOrEmpty(WHERE))
{
query += string.Format(" WHERE {0}", WHERE);
}
}
else
{
// Select ALL fields
query = string.Format("SELECT {1}* FROM {0} ", TableName, TopRecords > -1 ? "TOP (" + TopRecords + ") " : "");
if (!string.IsNullOrEmpty(WHERE))
{
query += string.Format(" WHERE {0}", WHERE);
}
}
if (!string.IsNullOrEmpty(OrderBy))
{
query += " ORDER BY " + OrderBy;
}
//Open connection
if (this.OpenConnection() == true)
{
//System.Diagnostics.Debug.WriteLine( "SQL : " + query );
//Create Command
using (SqlCommand cmd = new SqlCommand(query, DBConnection))
{
//Create a data reader and Execute the command
//Read the data and store them in the list
List<Dictionary<string, object>> ResultsSet = null;//Create a list to store the result
using (SqlDataReader dataReader = cmd.ExecuteReader())
{
ResultsSet = new List<Dictionary<string, object>>();
while (dataReader.Read())
{
Dictionary<string, object> ROW = new Dictionary<string, object>();
for (int i = 0; i < dataReader.FieldCount; i++)
{
if (dataReader[i].GetType().ToString() == "System.Byte[]")
{
ROW.Add(dataReader.GetName(i), (byte[])dataReader[i]);
}
else
{
ROW.Add(dataReader.GetName(i), dataReader[i] + string.Empty);
}
}
ResultsSet.Add(ROW);
}
dataReader.Close(); //close Data Reader
cmd.Dispose(); // Only added today - have to wait for some time to see if it fails
}
return ResultsSet;
}
}
else
{
return null;
}
}
Many solutions state you cannot re-use a connection to perform updates, but this method does not perform any. I am pretty sure it is obvious that it only fetches data from the database and no updates are performed.
I don't want to use MARS unless I have absolutely no choice.
Looking for pointers as to what I might have missed
Connection string
<add name="APP.Properties.Settings.DB" connectionString="server=trs-app;User Id=username;password=xxx;Persist Security Info=False;database=TRailS;Pooling=True" providerName="System.Data.SqlClient"/>
OpenConnection Method
//open connection to database
public bool OpenConnection()
{
try
{
if (DBConnection.State != ConnectionState.Open)
{
while (DBConnection.State == ConnectionState.Connecting)
{
// Do Nothing
}
DBConnection.Open();
System.Diagnostics.Debug.WriteLine("SQL Connection Opened");
}
return true;
}
catch (SqlException ex)
{
switch (ex.Number)
{
case 0:
LastError = "Cannot connect to server. Contact administrator";
return false;
case 1045:
LastError = "Invalid username/password, please try again";
return false;
}
LastError = "Unknown Error : " + ex.Message;
return false;
}
}
Just spotted this in the DAL Class - is this the cause!!!
private static SqlConnection DBConnection;
The solution might be to remove static from the SqlConnection variable (DBConnection) and implement the IDisposable pattern in the DAL class as suggested:
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
protected virtual void Dispose(bool disposing)
{
if (disposing == true)
{
DBConnection.Close(); // call close here to close connection
}
}
~MSSQLConnector()
{
Dispose(false);
}
The code you have here ... while not perfect (see Zohar's comment - it is actively dangerous, in fact), does treat the SqlDataReader correctly - it is making use of using, etc. So: if this is throwing this error, there are three possibilities:
1. Of the code we can see, one of the operations inside the while (dataReader.Read()) has side-effects that cause another SQL operation to be executed on the same connection; frankly, I suspect this is unlikely based on the code shown.
2. There is code that we can't see that already has an open reader before this method is called:
- this could be because some code higher in the call-stack is doing another similar while (dataReader.Read()) {...} - typical in "N+1" scenarios
- or it could be because something that happened earlier (but is no longer in the same call-stack) executed a query and left the reader dangling
3. Your this.OpenConnection() is sharing a connection between different call contexts without any consideration of what is going on (a textbook example of this would be a static connection, or a connection on some kind of "provider" that is shared between multiple call contexts).
2 and 3 are the most likely options. Unfortunately, diagnosing and fixing that requires a lot of code that we can't see.
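To make option 3 concrete, here is a contrived sketch (not the poster's actual code) of how a shared static connection produces exactly this error under concurrent use:
// Contrived illustration: two callers sharing one static SqlConnection.
// If caller A still has a reader open when caller B executes a command on
// the same connection, B throws "There is already an open DataReader
// associated with this Command which must be closed first."
private static SqlConnection SharedConnection = new SqlConnection(connectionString);
public void ReadSomething(string query)
{
    using (var cmd = new SqlCommand(query, SharedConnection))
    using (var reader = cmd.ExecuteReader()) // fails if another reader is already open
    {
        while (reader.Read())
        {
            // ... process rows; a second query on SharedConnection here would also fail
        }
    }
}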
After reading all the comments and reviewing the code in light of the information provided, I have refactored the DAL class so that every method in the class now creates its connection in a using statement. I understand IIS will handle the connection pool for this.
I am also closing the DB connection in code too (I know it's not required with a using statement, but it's just for neatness).
I have a small monitoring application that periodically refreshes the login page of the portal to watch for an outage, logs it, and performs an IISReset, so I can monitor whether the problem goes away after all.
Thanks for all the input.
//Open connection
using (SqlConnection DBConnection = new SqlConnection(connectionString))
{
DBConnection.Open();
//System.Diagnostics.Debug.WriteLine( "SQL : " + query );
//Create Command
using (SqlCommand cmd = new SqlCommand(query, DBConnection))
{
//Create a data reader and Execute the command
//Read the data and store them in the list
List<Dictionary<string, object>> ResultsSet = null;//Create a list to store the result
using (SqlDataReader dataReader = cmd.ExecuteReader())
{
ResultsSet = new List<Dictionary<string, object>>();
while (dataReader.Read())
{
Dictionary<string, object> ROW = new Dictionary<string, object>();
for (int i = 0; i < dataReader.FieldCount; i++)
{
if (dataReader[i].GetType().ToString() == "System.Byte[]")
{
ROW.Add(dataReader.GetName(i), (byte[])dataReader[i]);
}
else
{
ROW.Add(dataReader.GetName(i), dataReader[i] + string.Empty);
}
}
ResultsSet.Add(ROW);
}
}
DBConnection.Close();
return ResultsSet;
}
}
}

Using ExecuteNonQueryAsync and Reporting Progress

I thought I was trying to do something very simple. I just want to report a running number on the screen so the user gets the idea that the SQL Stored Procedure that I'm executing is working and that they don't get impatient and start clicking buttons.
The problem is that I can't figure out how to actually report progress around the ExecuteNonQueryAsync command. If I put the reporting loop before the command, it gets stuck in the loop and never executes the command; if I put the loop after the command, the command executes first and result no longer equals zero, so the loop never runs.
Any thoughts, comments, ideas would be appreciated. Thank you so much!
int i = 0;
lblProcessing.Text = "Transactions " + i.ToString();
int result = 0;
while (result==0)
{
i++;
if (i % 500 == 0)
{
lblProcessing.Text = "Transactions " + i.ToString();
lblProcessing.Refresh();
}
}
// Yes - I know - the code never gets here - that is the problem!
result = await cmd.ExecuteNonQueryAsync();
The simplest way to do this is to use a second connection to monitor the progress, and report on it. Here's a little sample to get you started:
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Text;
using System.Threading.Tasks;
namespace Microsoft.Samples.SqlServer
{
public class SessionStats
{
public long Reads { get; set; }
public long Writes { get; set; }
public long CpuTime { get; set; }
public long RowCount { get; set; }
public long WaitTime { get; set; }
public string LastWaitType { get; set; }
public string Status { get; set; }
public override string ToString()
{
return $"Reads {Reads}, Writes {Writes}, CPU {CpuTime}, RowCount {RowCount}, WaitTime {WaitTime}, LastWaitType {LastWaitType}, Status {Status}";
}
}
public class SqlCommandWithProgress
{
public static async Task ExecuteNonQuery(string ConnectionString, string Query, Action<SessionStats> OnProgress)
{
using (var rdr = await ExecuteReader(ConnectionString, Query, OnProgress))
{
rdr.Dispose();
}
}
public static async Task<DataTable> ExecuteDataTable(string ConnectionString, string Query, Action<SessionStats> OnProgress)
{
using (var rdr = await ExecuteReader(ConnectionString, Query, OnProgress))
{
var dt = new DataTable();
dt.Load(rdr);
return dt;
}
}
public static async Task<SqlDataReader> ExecuteReader(string ConnectionString, string Query, Action<SessionStats> OnProgress)
{
var mainCon = new SqlConnection(ConnectionString);
using (var monitorCon = new SqlConnection(ConnectionString))
{
mainCon.Open();
monitorCon.Open();
var cmd = new SqlCommand("select ##spid session_id", mainCon);
var spid = Convert.ToInt32(cmd.ExecuteScalar());
cmd = new SqlCommand(Query, mainCon);
var monitorQuery = #"
select s.reads, s.writes, r.cpu_time, s.row_count, r.wait_time, r.last_wait_type, r.status
from sys.dm_exec_requests r
join sys.dm_exec_sessions s
on r.session_id = s.session_id
where r.session_id = #session_id";
var monitorCmd = new SqlCommand(monitorQuery, monitorCon);
monitorCmd.Parameters.Add(new SqlParameter("#session_id", spid));
var queryTask = cmd.ExecuteReaderAsync( CommandBehavior.CloseConnection );
var cols = new { reads = 0, writes = 1, cpu_time =2,row_count = 3, wait_time = 4, last_wait_type = 5, status = 6 };
while (!queryTask.IsCompleted)
{
var firstTask = await Task.WhenAny(queryTask, Task.Delay(1000));
if (firstTask == queryTask)
{
break;
}
using (var rdr = await monitorCmd.ExecuteReaderAsync())
{
await rdr.ReadAsync();
var result = new SessionStats()
{
Reads = Convert.ToInt64(rdr[cols.reads]),
Writes = Convert.ToInt64(rdr[cols.writes]),
RowCount = Convert.ToInt64(rdr[cols.row_count]),
CpuTime = Convert.ToInt64(rdr[cols.cpu_time]),
WaitTime = Convert.ToInt64(rdr[cols.wait_time]),
LastWaitType = Convert.ToString(rdr[cols.last_wait_type]),
Status = Convert.ToString(rdr[cols.status]),
};
OnProgress(result);
}
}
return queryTask.Result;
}
}
}
}
Which you would call something like this:
class Program
{
static void Main(string[] args)
{
Run().Wait();
}
static async Task Run()
{
var constr = "server=localhost;database=tempdb;integrated security=true";
var sql = #"
set nocount on;
select newid() d
into #foo
from sys.objects, sys.objects o2, sys.columns
order by newid();
select count(*) from #foo;
";
using (var rdr = await SqlCommandWithProgress.ExecuteReader(constr, sql, s => Console.WriteLine(s)))
{
if (!rdr.IsClosed)
{
while (rdr.Read())
{
Console.WriteLine("Row read");
}
}
}
Console.WriteLine("Hit any key to exit.");
Console.ReadKey();
}
}
Which outputs:
Reads 0, Writes 0, CPU 1061, RowCount 0, WaitTime 0, LastWaitType SOS_SCHEDULER_YIELD, Status running
Reads 0, Writes 0, CPU 2096, RowCount 0, WaitTime 0, LastWaitType SOS_SCHEDULER_YIELD, Status running
Reads 0, Writes 0, CPU 4553, RowCount 11043136, WaitTime 198, LastWaitType CXPACKET, Status suspended
Row read
Hit any key to exit.
You're not going to be able to get ExecuteNonQueryAsync to do what you want here. To do what you're looking for, the result of the method would have to be either row by row or in chunks incremented during the SQL call, but that's not how submitting a query batch to SQL Server works or really how you would want it to work from an overhead perspective. You hand a SQL statement to the server and after it is finished processing the statement, it returns the total number of rows affected by the statement.
Do you just want to let the user know that something is happening, and you don't actually need to display current progress?
If so, you could just display a ProgressBar with its Style set to Marquee.
If you want this to be a "self-contained" method, you could display the progress bar on a modal form, and include the form code in the method itself.
E.g.
public void ExecuteNonQueryWithProgress(SqlCommand cmd) {
Form f = new Form() {
Text = "Please wait...",
Size = new Size(400, 100),
StartPosition = FormStartPosition.CenterScreen,
FormBorderStyle = FormBorderStyle.FixedDialog,
MaximizeBox = false,
ControlBox = false
};
f.Controls.Add(new ProgressBar() {
Style = ProgressBarStyle.Marquee,
Dock = DockStyle.Fill
});
f.Shown += async (sender, e) => {
await cmd.ExecuteNonQueryAsync();
f.Close();
};
f.ShowDialog();
}
That is an interesting question. I have had to implement similar things in the past. In our case the priority was to:
Keep client side responsive in case the user doesn't want to stick around and wait.
Update the user of action and progress.
What I would do is use threading to run the process in the background like:
HostingEnvironment.QueueBackgroundWorkItem(ct => FunctionThatCallsSQLandTakesTime(p, q, s));
Then, using a way to estimate the work time, I would increment a progress bar client-side on a clock; see the sketch below. For this, query your data for a variable that has a linear relationship to the work time needed by FunctionThatCallsSQLandTakesTime.
For example: the number of active users this month drives the time FunctionThatCallsSQLandTakesTime takes. For every 10,000 users it takes 5 minutes, so you can update your progress bar accordingly.
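As a rough sketch of that estimate-and-tick idea (GetActiveUserCountForMonth, progressBar1, and the 10,000-users-per-5-minutes ratio are hypothetical placeholders from the example above):
// Estimate a duration from a driver variable, then advance the bar on a timer.
int activeUsers = GetActiveUserCountForMonth();            // hypothetical helper
double estimatedSeconds = (activeUsers / 10000.0) * 300.0; // 5 minutes per 10,000 users
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
var timer = new System.Windows.Forms.Timer { Interval = 1000 };
timer.Tick += (sender, e) =>
{
    // Cap at 99% so the bar only completes when the background work reports done.
    double fraction = Math.Min(stopwatch.Elapsed.TotalSeconds / estimatedSeconds, 0.99);
    progressBar1.Value = (int)(fraction * progressBar1.Maximum);
};
timer.Start();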
I'm wondering if this might be a reasonable approach:
IAsyncResult result = cmd2.BeginExecuteNonQuery();
int count = 0;
while (!result.IsCompleted)
{
count++;
if (count % 500 == 0)
{
lblProcessing.Text = "Transactions " + i.ToString();
lblProcessing.Refresh();
}
// Wait for 1/10 second, so the counter
// does not consume all available resources
// on the main thread.
System.Threading.Thread.Sleep(100);
}
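For what it's worth, here is a sketch of the same polling idea against the async API, inside an async method, so the UI thread is not blocked by Thread.Sleep (this assumes the same cmd2 and lblProcessing as above):
// Poll an in-flight async command without blocking the UI thread.
Task<int> queryTask = cmd2.ExecuteNonQueryAsync();
int count = 0;
while (!queryTask.IsCompleted)
{
    count++;
    if (count % 5 == 0) // roughly every half second given the delay below
    {
        lblProcessing.Text = "Transactions " + count.ToString();
    }
    await Task.Delay(100); // yields to the UI instead of freezing it
}
int result = await queryTask; // observes any exception and returns rows affected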

Create a local db in winform and get data from server in the background

I'm looking for a way to enhance the performance of my application, which makes heavy use of a database hosted on a remote server. Because the application has to access the database remotely, it is slow. I was thinking about creating a local database and populating it from the server when the application starts, then pushing updates to the hosted MySQL database on a regular basis: every hour, or when the user decides to log out. The main issue I have is that there will be 10-20 users. They don't update the same kind of data, but how will I know which tables have been updated so that I can apply those changes to the hosted database? Is there any article or link with further explanation of this issue?
My application is a C# Windows Forms application and the database is MySQL.
One of the queries that I have and takes too long to execute is this one:
/**
* getting the schedule based on the submitted id
* */
public static Schedule2 getTeachersSchedule(String therapistID, int weekday, int period, int school_year)
{
// connecting to mysql database
try
{
using (MySqlConnection myConn = new MySqlConnection(connectionString))
{
using (MySqlCommand command = new MySqlCommand("Select * FROM student_schedule2, weekday, school_year, period, task where school_year_id = school_year_id_fk AND therapist_id_fk =" + therapistID + " AND weekday_id = weekday_id_fk AND period_id=period_id_fk AND task_id=task_id_fk AND weekday_id=" + weekday + " AND period_id=" + period + " AND school_year_id =" + school_year, myConn))
{
MySqlDataReader reader;
myConn.Open();
reader = command.ExecuteReader();
Schedule2 schedule = null;
while (reader.Read())
{
schedule = new Schedule2();
schedule.ID = reader.GetInt32("student_schedule2_id");
try
{
if (reader["student_id_fk"] != DBNull.Value)
schedule.student = getStudent(reader.GetString("student_id_fk"));
else
schedule.student = null;
Weekday weekDay = new Weekday();
weekDay.ID = reader.GetInt32("weekday_id");
weekDay.weekdayName = reader.GetString("weekday_name");
schedule.weekday = weekDay;
schedule.semesterName = reader.GetString("semester_name");
Period periodObj = new Period();
SchoolYear schoolYEar = new SchoolYear();
periodObj.ID = reader.GetInt32("period_id");
periodObj.period_name = reader.GetString("period_name");
schedule.period = periodObj;
schoolYEar.ID = reader.GetInt32("school_year_id");
schoolYEar.year_name = reader.GetString("school_year_name");
schedule.schoolYear = schoolYEar;
Task course = new Task();
course.ID = reader.GetInt32("task_id");
course.taskName = reader.GetString("task_name");
schedule.task = course;
schedule.therapist = getTherapist(reader.GetString("therapist_id_fk"));
}
catch
{
}
}
myConn.Close();
return schedule;
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
return null;
}
}
Your approach will be prone to stale data and will most likely be slower, since all the data will be sent over the network.
I suggest you look at your SQL queries and determine why they are slow. Adding indexes to the tables will help enormously; for the query above, for example, a composite index on student_schedule2 covering the filtered columns (therapist_id_fk, weekday_id_fk, period_id_fk, school_year_id_fk) would spare the database a full table scan.

Export a large CSV file in parallel to SQL server

I have a large CSV file... 10 columns, 100 million rows, roughly 6 GB in size on my hard disk.
I want to read this CSV file line by line and then load the data into a Microsoft SQL server database using SQL bulk copy.
I have read couple of threads on here and also on the internet. Most people suggest that reading a CSV file in parallel doesn't buy much in terms of efficiency as the tasks/threads contend for disk access.
What I'm trying to do is read the CSV line by line and add the rows to a blocking collection. Once a 100K-row buffer is full, spin up a new task/thread to write the data to SQL Server using the SqlBulkCopy API.
I have written this piece of code, but at run time I hit an error that says "Attempt to invoke bulk copy on an object that has a pending operation." This scenario looks like something that could be easily solved with the .NET 4.0 TPL, but I'm not able to get it to work. Any suggestions on what I'm doing wrong?
public static void LoadCsvDataInParalleToSqlServer(string fileName, string connectionString, string table, DataColumn[] columns, bool truncate)
{
const int inputCollectionBufferSize = 1000000;
const int bulkInsertBufferCapacity = 100000;
const int bulkInsertConcurrency = 8;
var sqlConnection = new SqlConnection(connectionString);
sqlConnection.Open();
var sqlBulkCopy = new SqlBulkCopy(sqlConnection.ConnectionString, SqlBulkCopyOptions.TableLock)
{
EnableStreaming = true,
BatchSize = bulkInsertBufferCapacity,
DestinationTableName = table,
BulkCopyTimeout = (24 * 60 * 60),
};
BlockingCollection<DataRow> rows = new BlockingCollection<DataRow>(inputCollectionBufferSize);
DataTable dataTable = new DataTable(table);
dataTable.Columns.AddRange(columns);
Task loadTask = Task.Factory.StartNew(() =>
{
foreach (DataRow row in ReadRows(fileName, dataTable))
{
rows.Add(row);
}
rows.CompleteAdding();
});
List<Task> insertTasks = new List<Task>(bulkInsertConcurrency);
for (int i = 0; i < bulkInsertConcurrency; i++)
{
insertTasks.Add(Task.Factory.StartNew((x) =>
{
List<DataRow> bulkInsertBuffer = new List<DataRow>(bulkInsertBufferCapacity);
foreach (DataRow row in rows.GetConsumingEnumerable())
{
if (bulkInsertBuffer.Count == bulkInsertBufferCapacity)
{
SqlBulkCopy bulkCopy = x as SqlBulkCopy;
var dataRows = bulkInsertBuffer.ToArray();
bulkCopy.WriteToServer(dataRows);
Console.WriteLine("Inserted rows " + bulkInsertBuffer.Count);
bulkInsertBuffer.Clear();
}
bulkInsertBuffer.Add(row);
}
},
sqlBulkCopy));
}
loadTask.Wait();
Task.WaitAll(insertTasks.ToArray());
}
private static IEnumerable<DataRow> ReadRows(string fileName, DataTable dataTable)
{
using (var textFieldParser = new TextFieldParser(fileName))
{
textFieldParser.TextFieldType = FieldType.Delimited;
textFieldParser.Delimiters = new[] { "," };
textFieldParser.HasFieldsEnclosedInQuotes = true;
while (!textFieldParser.EndOfData)
{
string[] cols = textFieldParser.ReadFields();
DataRow row = dataTable.NewRow();
for (int i = 0; i < cols.Length; i++)
{
if (string.IsNullOrEmpty(cols[i]))
{
row[i] = DBNull.Value;
}
else
{
row[i] = cols[i];
}
}
yield return row;
}
}
}
Don't.
Parallel access may or may not give you a faster read of the file (it won't, but I'm not going to fight that battle...), but one thing is certain: parallel writes won't give you a faster bulk insert. That is because a minimally logged bulk insert (i.e. the really fast bulk insert) requires a table lock. See Prerequisites for Minimal Logging in Bulk Import:
Minimal logging requires that the target table meets the following conditions:
...
- Table locking is specified (using TABLOCK).
...
Parallel inserts, by definition, cannot obtain concurrent table locks. QED. You are barking up the wrong tree.
Stop getting your sources from random findings on the internet. Read The Data Loading Performance Guide; it is the guide to performant data loading.
I would recommend you stop reinventing the wheel. Use SSIS; this is exactly what it is designed to handle.
http://joshclose.github.io/CsvHelper/
https://efbulkinsert.codeplex.com/
If possible, I suggest you read your file into a List<T> using the aforementioned CsvHelper and write to your DB using bulk insert as you are doing, or EFBulkInsert, which I have used and is amazingly fast.
using CsvHelper;
public static List<T> CSVImport<T,TClassMap>(string csvData, bool hasHeaderRow, char delimiter, out string errorMsg) where TClassMap : CsvHelper.Configuration.CsvClassMap
{
errorMsg = string.Empty;
MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(csvData));
StreamReader streamReader = new StreamReader(memStream);
var csvReader = new CsvReader(streamReader);
csvReader.Configuration.RegisterClassMap<TClassMap>();
csvReader.Configuration.DetectColumnCountChanges = true;
csvReader.Configuration.IsHeaderCaseSensitive = false;
csvReader.Configuration.TrimHeaders = true;
csvReader.Configuration.Delimiter = delimiter.ToString();
csvReader.Configuration.SkipEmptyRecords = true;
List<T> items = new List<T>();
try
{
items = csvReader.GetRecords<T>().ToList();
}
catch (Exception ex)
{
while (ex != null)
{
errorMsg += ex.Message + Environment.NewLine;
foreach (var val in ex.Data.Values)
errorMsg += val.ToString() + Environment.NewLine;
ex = ex.InnerException;
}
}
return items;
}
Edit - I don't understand what you are doing with the bulk insert. You want to bulk insert the whole list or DataTable, not row-by-row.
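A minimal sketch of that point, with a placeholder table name: build up the DataTable (or convert the List<T>) first, then hand the whole thing to a single WriteToServer call.
// One bulk insert for the entire DataTable, not one insert per row.
// TableLock enables the minimally logged path described in the answer above.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = "dbo.Employee"; // placeholder
        bulkCopy.BatchSize = 100000;
        bulkCopy.WriteToServer(dataTable); // all rows in one call
    }
}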
You can create a stored procedure and pass the file location like below
CREATE PROCEDURE [dbo].[CSVReaderTransaction]
@Filepath varchar(100)=''
AS
-- STEP 1: Start the transaction
BEGIN TRANSACTION
-- STEP 2 & 3: checking @@ERROR after each statement
EXEC ('BULK INSERT Employee FROM ''' +@Filepath
+''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'' )')
-- Rollback the transaction if there were any errors
IF @@ERROR <> 0
BEGIN
-- Rollback the transaction
ROLLBACK
-- Raise an error and return
RAISERROR ('Error in inserting data into employee Table.', 16, 1)
RETURN
END
COMMIT TRANSACTION
You can also add a BATCHSIZE option in the same WITH clause as FIELDTERMINATOR and ROWTERMINATOR, e.g. WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 100000).

Make sure to insert and update row in database

A bit of pseudocode for you; the system itself is much more verbose:
using (var insertCmd = new SqlCommand("insert new row in database, selects the ID that was inserted", conn)) {
using (var updateCmd = new SqlCommand("update the row with @data1 where id = @idOfInsert", conn)) {
// Got a whole lot of inserts AND updates to process - those two have to be separated in this system
// I have to make sure all the data that was readied earlier in the system gets inserted
// My MS SQL Server is known to throw timeout errors, no matter how long the SqlCommand.CommandTimeout is.
for (int i = 0; i < 100000; i++) {
if (i % 100 == 99) { // every 100th item
// sleep for 10 seconds, so the SQL server isn't locked while I do my work
System.Threading.Thread.Sleep(1000 * 10);
}
var id = insertCmd.ExecuteScalar().ToString();
updateCmd.Parameters.AddWithValue("@data1", i);
updateCmd.Parameters.AddWithValue("@idOfInsert", id);
updateCmd.ExecuteNonQuery();
}
}
}
How would I make sure that the ExecuteScalar and ExecuteNonQuery are able to recover from exceptions? I have thought of using (I'M VERY SORRY) a goto and exceptions for flow control, such as this:
Restart:
try {
updateCmd.ExecuteNonQuery();
} catch (SqlException) {
System.Threading.Thread.Sleep(1000 * 10); // sleep for 10 seconds
goto Restart;
}
Is there another way to do it, completely?
Instead of goto you can use a loop.
bool sqlQueryHasNotSucceeded = true;
while (sqlQueryHasNotSucceeded)
{
try
{
updateCmd.ExecuteNonQuery();
sqlQueryHasNotSucceeded = false;
}
catch(Exception e)
{
LogError(e);
System.Threading.Thread.Sleep(1000 * 10);
}
}
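If you'd rather not retry forever, a bounded variant of the same loop (the attempt count and delay here are arbitrary choices) might look like:
// Bounded retries with a growing delay between attempts.
const int maxAttempts = 5;
bool succeeded = false;
for (int attempt = 1; attempt <= maxAttempts && !succeeded; attempt++)
{
    try
    {
        updateCmd.ExecuteNonQuery();
        succeeded = true;
    }
    catch (SqlException e)
    {
        LogError(e);
        if (attempt == maxAttempts)
        {
            throw; // give up after the final attempt
        }
        System.Threading.Thread.Sleep(1000 * 10 * attempt); // back off a bit more each time
    }
}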
