I have the following code, which I'm using to test whether it's possible to combine a transaction with the NotifyAfter property to raise an event (I have already tried substituting the event for one I create and raise myself, but it only gets raised after all the rows have been copied). The following link suggests that it's not possible:
MSDN
Has anyone had any experience with this? Thanks
using (SqlConnection connection = new SqlConnection(connectionString))
{
connection.Open();
try
{
using (SqlBulkCopy copy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity |SqlBulkCopyOptions.UseInternalTransaction))
{
//Column mapping for the required columns.
for (int count = 0; count < numberOfColumns; count++)
{
copy.ColumnMappings.Add(count, count);
}
//SQLBulkCopy parameters.
copy.DestinationTableName = dataTableName;
copy.BatchSize = batchSize;
copy.SqlRowsCopied += new SqlRowsCopiedEventHandler(OnSqlRowsCopied);
copy.NotifyAfter = 5;
copy.WriteToServer(fullDataTable);
}
}
//Error(s) occurred while trying to commit the transaction.
catch (InvalidOperationException transactionEx)
{
//uploadTransaction.Rollback();
status = "The current transaction has been rolled back due to an error. \n\r" + transactionEx.Message;
MessageBox.Show(status, "Error Message:");
alreadyCaught = true;
throw;
}
}
I would presume that, because of the transaction, the processing only occurs once the transaction is committed, hence you won't get the event raised until after that.
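For reference, the OnSqlRowsCopied handler wired up in the question isn't shown; a minimal sketch of such a handler, combined with an explicit external transaction so that the commit point is under your control (the handler body and the external-transaction usage are assumptions, not the original poster's code), might look like this:
// Sketch: explicit external transaction plus the SqlRowsCopied handler the question wires up.
// NotifyAfter controls how often the handler is invoked; Commit/Rollback timing is now explicit.
private static void OnSqlRowsCopied(object sender, SqlRowsCopiedEventArgs e)
{
    Console.WriteLine("Rows copied so far: " + e.RowsCopied);
}

private static void CopyWithExternalTransaction(string connectionString, DataTable fullDataTable,
    string dataTableName, int batchSize)
{
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (SqlTransaction transaction = connection.BeginTransaction())
        using (SqlBulkCopy copy = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction))
        {
            copy.DestinationTableName = dataTableName;
            copy.BatchSize = batchSize;
            copy.NotifyAfter = 5;
            copy.SqlRowsCopied += OnSqlRowsCopied;
            try
            {
                copy.WriteToServer(fullDataTable);
                transaction.Commit();
            }
            catch
            {
                transaction.Rollback();
                throw;
            }
        }
    }
}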
Related
I have a Discord bot that gets its data from an SQLite database. I am using the System.Data.SQLite namespace.
My problem is this part of the code:
m_dbConnection.Open();
SQLiteDataReader sqlite_datareader;
SQLiteCommand sqlite_cmd;
sqlite_cmd = m_dbConnection.CreateCommand();
sqlite_cmd.CommandText = SQLCommand; //SQLCommand is a command parameter
sqlite_datareader = sqlite_cmd.ExecuteReader();
while (sqlite_datareader.Read())
{
int i = 0;
while (true)
{
try
{
string temp = "";
try
{
temp = sqlite_datareader.GetString(i).ToString();
}
catch (Exception e)
{
Console.WriteLine(e.Message);
try
{
temp = sqlite_datareader.GetInt32(i).ToString();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
break;
}
}
output.Add(temp);
i++;
}
catch (Exception)
{
break;
}
}
}
For this example the variable SQLCommand is "SELECT Money FROM Users WHERE UserId = 12345 AND ServerID = 54321".
When I execute this command in an SQL editor, I get the value "10", so the command works. But when I pass this command to my method to fetch the same data I just got in the editor, I get the error "Specified cast is not valid." at the line temp = sqlite_datareader.GetString(i).ToString();.
The value of i is 0, to get the very first row that the SQL command selected. I don't know why this happens; every other SQLite command works and gives me what I want. Why isn't this one working too?
Try using it this way
while (sqlite_datareader.Read())
{
for (int i = 0; i < sqlite_datareader.FieldCount; i++)
{
var colName = sqlite_datareader.GetName(i);   // column name
var colValue = sqlite_datareader[i];          // column value as object, no type guessing needed
}
}
Please note that
while (sqlite_datareader.Read()){..}
the purpose of the above statement is to fetch all the rows.
Therefore I would like to point out the problems in your code:
1) while (true) { ... } is an infinite loop. Of course, in this scenario it will eventually hit a break and exit, but it is still not good practice.
2) int i = 0; is declared per row and incremented inside the inner loop with no upper bound. Say you have 100 rows and 10 columns: i keeps being incremented past the last column index, and trying to read an invalid column index gives you an error.
Wrapping your code in try/catch (or nested try/catch) statements works around the issue, but it's a nasty solution.
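Since the query in the question returns a single value, a simpler option (a sketch, assuming m_dbConnection is already open, SQLCommand holds the query text and output is the same list used above) is ExecuteScalar, which sidesteps the GetString/GetInt32 guessing entirely:
// Sketch: read one value without guessing between GetString and GetInt32.
using (SQLiteCommand cmd = m_dbConnection.CreateCommand())
{
    cmd.CommandText = SQLCommand;             // e.g. "SELECT Money FROM Users WHERE ..."
    object result = cmd.ExecuteScalar();      // first column of the first row, or null if nothing matched
    if (result != null && result != DBNull.Value)
    {
        output.Add(Convert.ToString(result)); // works whether SQLite returns the column as INTEGER or TEXT
    }
}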
I have a large CSV file... 10 columns, 100 million rows, roughly 6 GB in size on my hard disk.
I want to read this CSV file line by line and then load the data into a Microsoft SQL server database using SQL bulk copy.
I have read couple of threads on here and also on the internet. Most people suggest that reading a CSV file in parallel doesn't buy much in terms of efficiency as the tasks/threads contend for disk access.
What I'm trying to do is read the CSV line by line and add the rows to a blocking collection of 100K rows. Once this collection is full, I spin up a new task/thread to write the data to SQL Server using the SqlBulkCopy API.
I have written this piece of code, but I'm hitting a runtime error that says "Attempt to invoke bulk copy on an object that has a pending operation." This scenario looks like something that could easily be solved using the .NET 4.0 TPL, but I'm not able to get it to work. Any suggestions on what I'm doing wrong?
public static void LoadCsvDataInParalleToSqlServer(string fileName, string connectionString, string table, DataColumn[] columns, bool truncate)
{
const int inputCollectionBufferSize = 1000000;
const int bulkInsertBufferCapacity = 100000;
const int bulkInsertConcurrency = 8;
var sqlConnection = new SqlConnection(connectionString);
sqlConnection.Open();
var sqlBulkCopy = new SqlBulkCopy(sqlConnection.ConnectionString, SqlBulkCopyOptions.TableLock)
{
EnableStreaming = true,
BatchSize = bulkInsertBufferCapacity,
DestinationTableName = table,
BulkCopyTimeout = (24 * 60 * 60),
};
BlockingCollection<DataRow> rows = new BlockingCollection<DataRow>(inputCollectionBufferSize);
DataTable dataTable = new DataTable(table);
dataTable.Columns.AddRange(columns);
Task loadTask = Task.Factory.StartNew(() =>
{
foreach (DataRow row in ReadRows(fileName, dataTable))
{
rows.Add(row);
}
rows.CompleteAdding();
});
List<Task> insertTasks = new List<Task>(bulkInsertConcurrency);
for (int i = 0; i < bulkInsertConcurrency; i++)
{
insertTasks.Add(Task.Factory.StartNew((x) =>
{
List<DataRow> bulkInsertBuffer = new List<DataRow>(bulkInsertBufferCapacity);
foreach (DataRow row in rows.GetConsumingEnumerable())
{
if (bulkInsertBuffer.Count == bulkInsertBufferCapacity)
{
SqlBulkCopy bulkCopy = x as SqlBulkCopy;
var dataRows = bulkInsertBuffer.ToArray();
bulkCopy.WriteToServer(dataRows);
Console.WriteLine("Inserted rows " + bulkInsertBuffer.Count);
bulkInsertBuffer.Clear();
}
bulkInsertBuffer.Add(row);
}
},
sqlBulkCopy));
}
loadTask.Wait();
Task.WaitAll(insertTasks.ToArray());
}
private static IEnumerable<DataRow> ReadRows(string fileName, DataTable dataTable)
{
using (var textFieldParser = new TextFieldParser(fileName))
{
textFieldParser.TextFieldType = FieldType.Delimited;
textFieldParser.Delimiters = new[] { "," };
textFieldParser.HasFieldsEnclosedInQuotes = true;
while (!textFieldParser.EndOfData)
{
string[] cols = textFieldParser.ReadFields();
DataRow row = dataTable.NewRow();
for (int i = 0; i < cols.Length; i++)
{
if (string.IsNullOrEmpty(cols[i]))
{
row[i] = DBNull.Value;
}
else
{
row[i] = cols[i];
}
}
yield return row;
}
}
}
Don't.
Parallel access may or may not give you a faster read of the file (it won't, but I'm not going to fight that battle...), but one thing is certain: parallel writes won't give you a faster bulk insert. That is because minimally logged bulk insert (i.e. the really fast bulk insert) requires a table lock. See Prerequisites for Minimal Logging in Bulk Import:
Minimal logging requires that the target table meets the following conditions:
...
- Table locking is specified (using TABLOCK).
...
Parallel inserts, by definition, cannot obtain concurrent table locks. QED. You are barking up the wrong tree.
Stop getting your information from random findings on the internet. Read The Data Loading Performance Guide; it is the guide to performant data loading.
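Along those lines, a single-writer version (a sketch; it reuses the ReadRows method and parameter names from the question, and the batch size is an illustrative value) keeps the table lock and still avoids holding the whole 6 GB file in memory:
// Sketch: one sequential bulk-copy writer so the TABLOCK hint is honoured and there is
// never a second operation pending on the same SqlBulkCopy instance.
// using System.Data; using System.Data.SqlClient;
public static void LoadCsvDataToSqlServer(string fileName, string connectionString,
    string table, DataColumn[] columns)
{
    const int flushEvery = 100000;                    // illustrative batch size
    var dataTable = new DataTable(table);
    dataTable.Columns.AddRange(columns);

    using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = table;
        bulkCopy.BatchSize = flushEvery;
        bulkCopy.BulkCopyTimeout = 0;                 // no time limit for a long-running load

        foreach (DataRow row in ReadRows(fileName, dataTable))
        {
            dataTable.Rows.Add(row);
            if (dataTable.Rows.Count == flushEvery)
            {
                bulkCopy.WriteToServer(dataTable);    // sequential writes on a single SqlBulkCopy
                dataTable.Rows.Clear();               // keep memory usage flat
            }
        }
        if (dataTable.Rows.Count > 0)                 // flush the final partial batch
        {
            bulkCopy.WriteToServer(dataTable);
        }
    }
}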
I would recommend that you stop reinventing the wheel. Use SSIS; this is exactly what it is designed to handle.
http://joshclose.github.io/CsvHelper/
https://efbulkinsert.codeplex.com/
If it's possible for you, I suggest you read your file into a List<T> using the aforementioned CsvHelper and write to your database using bulk insert as you are doing, or EFBulkInsert, which I have used and is amazingly fast.
using CsvHelper;
public static List<T> CSVImport<T,TClassMap>(string csvData, bool hasHeaderRow, char delimiter, out string errorMsg) where TClassMap : CsvHelper.Configuration.CsvClassMap
{
errorMsg = string.Empty;
MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(csvData));
StreamReader streamReader = new StreamReader(memStream);
var csvReader = new CsvReader(streamReader);
csvReader.Configuration.RegisterClassMap<TClassMap>();
csvReader.Configuration.DetectColumnCountChanges = true;
csvReader.Configuration.IsHeaderCaseSensitive = false;
csvReader.Configuration.TrimHeaders = true;
csvReader.Configuration.Delimiter = delimiter.ToString();
csvReader.Configuration.SkipEmptyRecords = true;
List<T> items = new List<T>();
try
{
items = csvReader.GetRecords<T>().ToList();
}
catch (Exception ex)
{
while (ex != null)
{
errorMsg += ex.Message + Environment.NewLine;
foreach (var val in ex.Data.Values)
errorMsg += val.ToString() + Environment.NewLine;
ex = ex.InnerException;
}
}
return items;
}
Edit: I don't understand what you are doing with the bulk insert. You want to bulk insert the whole list or DataTable, not row by row.
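For example, once CSVImport has produced the List<T>, the whole list can go to the server in a single WriteToServer call. A rough sketch (the reflection-based mapping and the BulkInsert name are mine, not part of CsvHelper or EFBulkInsert; property names are assumed to match the destination table's column names):
// Sketch: bulk insert a whole List<T> in one call instead of row-by-row.
// using System; using System.Collections.Generic; using System.Data;
// using System.Data.SqlClient; using System.Linq; using System.Reflection;
public static void BulkInsert<T>(List<T> items, string connectionString, string destinationTable)
{
    var props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
    var table = new DataTable(destinationTable);
    foreach (var p in props)
    {
        // DataTable columns cannot be Nullable<T>, so unwrap nullable property types
        table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
    }
    foreach (var item in items)
    {
        table.Rows.Add(props.Select(p => p.GetValue(item, null) ?? DBNull.Value).ToArray());
    }

    using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = destinationTable;
        foreach (DataColumn col in table.Columns)
        {
            bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName); // map by name, not ordinal
        }
        bulkCopy.WriteToServer(table); // one call for the entire list
    }
}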
You can create a stored procedure and pass the file location like below:
CREATE PROCEDURE [dbo].[CSVReaderTransaction]
@Filepath varchar(100) = ''
AS
-- STEP 1: Start the transaction
BEGIN TRANSACTION
-- STEP 2 & 3: checking @@ERROR after each statement
EXEC ('BULK INSERT Employee FROM ''' + @Filepath
+''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'' )')
-- Rollback the transaction if there were any errors
IF @@ERROR <> 0
BEGIN
-- Rollback the transaction
ROLLBACK
-- Raise an error and return
RAISERROR ('Error in inserting data into employee Table.', 16, 1)
RETURN
END
COMMIT TRANSACTION
You can also add the BATCHSIZE option in the WITH clause, alongside FIELDTERMINATOR and ROWTERMINATOR.
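Calling the procedure from C# might then look like this (a sketch; the connection string and file path are placeholders, and the path must be one the SQL Server machine itself can see):
// Sketch: invoke the stored procedure above from C#.
// using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.CSVReaderTransaction", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.CommandTimeout = 0;                                              // BULK INSERT of a large file can take a while
    command.Parameters.AddWithValue("@Filepath", @"C:\data\employees.csv");  // placeholder path, as seen by the server
    connection.Open();
    command.ExecuteNonQuery();
}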
Currently playing around with Dapper, I'm trying to insert values into the database as follows:
using (var sqlCon = new SqlConnection(Context.ReturnDatabaseConnection()))
{
sqlCon.Open();
try
{
var emailExists = sqlCon.Query<UserProfile>(@"SELECT UserId FROM User_Profile WHERE EmailAddress = @EmailAddress",
new { EmailAddress = userRegister.EmailAddress.Trim() }).FirstOrDefault();
if (emailExists == null) // No profile exists with the email passed in, so insert the new user.
{
userProfile.UniqueId = Guid.NewGuid();
userProfile.Firstname = userRegister.Firstname;
userProfile.Surname = userRegister.Surname;
userProfile.EmailAddress = userRegister.EmailAddress;
userProfile.Username = CreateUsername(userRegister.Firstname);
userProfile.Password = EncryptPassword(userRegister.Password);
userProfile.AcceptedTerms = true;
userProfile.AcceptedTermsDate = System.DateTime.Now;
userProfile.AccountActive = true;
userProfile.CurrentlyOnline = true;
userProfile.ClosedAccountDate = null;
userProfile.JoinedDate = System.DateTime.Now;
userProfile.UserId = SqlMapperExtensions.Insert(sqlCon, userProfile); // Error on this line
Registration.SendWelcomeEmail(userRegister.EmailAddress, userRegister.Firstname); // Send welcome email to new user.
}
}
catch (Exception e)
{
Console.WriteLine(e);
}
finally
{
sqlCon.Close();
}
}
The error I get is
ExecuteNonQuery requires the command to have a transaction when the connection
assigned to the command is in a pending local transaction. The Transaction
property of the command has not been initialized.
I have googled this error, but I misunderstood the answers provided.
From the error message I assume that you have started a transaction that was neither committed nor rolled back. The real cause for this error message is elsewhere.
I suggest you log the requests in Context.ReturnDatabaseConnection() and trace which requests precede this error.
I also advise you to look through your code for all transactions and check that they are correctly completed (committed or rolled back).
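If the insert genuinely needs to run inside a transaction, one option (a sketch, assuming the Insert here is Dapper.Contrib's SqlMapperExtensions, whose Insert overload accepts an IDbTransaction) is to open the transaction explicitly and pass it to every Dapper call made on that connection:
// Sketch: make the transaction explicit so no command ends up on a connection
// with a pending local transaction it doesn't know about.
using (var sqlCon = new SqlConnection(Context.ReturnDatabaseConnection()))
{
    sqlCon.Open();
    using (var tran = sqlCon.BeginTransaction())
    {
        try
        {
            // ... existence check and userProfile population as in the question,
            //     passing "transaction: tran" to the Query call as well ...
            userProfile.UserId = SqlMapperExtensions.Insert(sqlCon, userProfile, tran);
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}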
A bit of pseudocode for you, the system itself is much more verbose:
using (var insertCmd = new SqlCommand("insert new row in database, selects the ID that was inserted", conn)) {
using (var updateCmd = new SqlCommand("update the row with @data1 where id = @idOfInsert", conn)) {
// Got a whole lot of inserts AND updates to process - those two has to be seperated in this system
// I have to make sure all the data that has been readied earlier in the system are inserted
// My MS sql server is known to throw timeout errors, no matter how long the SqlCommand.CommandTimeout is.
for (int i = 0; i < 100000; i++) {
if (i % 100 == 99) { // every 100th item
// sleep for 10 seconds, so the sql server isn't locked while I do my work
System.Threading.Thread.Sleep(1000 * 10);
}
var id = insertCmd.ExecuteScalar().ToString();
updateCmd.Parameters.Clear(); // avoid re-adding the same parameter names on every iteration
updateCmd.Parameters.AddWithValue("@data1", i);
updateCmd.Parameters.AddWithValue("@idOfInsert", id);
updateCmd.ExecuteNonQuery();
}
}
}
How would I make sure that the ExecuteScalar and ExecuteNonQuery are able to recover from exceptions? I have thought of using (I'M VERY SORRY) a goto and exceptions for flow control, such as this:
Restart:
try {
updateCmd.ExecuteNonQuery();
} catch (SqlException) {
System.Threading.Thread.Sleep(1000 * 10); // sleep for 10 seconds
goto Restart;
}
Is there another way to do it, completely?
Instead of goto you can use a loop.
bool sqlQueryHasNotSucceeded = true;
while (sqlQueryHasNotSucceeded)
{
try
{
updateCmd.ExecuteNonQuery();
sqlQueryHasNotSucceeded = false;
}
catch(Exception e)
{
LogError(e);
System.Threading.Thread.Sleep(1000 * 10);
}
}
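A bounded variant of the same idea avoids retrying forever (a sketch; maxRetries and the 10-second delay are illustrative values, and LogError is the same logging method used above):
// Sketch: retry the command a limited number of times, rethrowing once the budget is spent.
const int maxRetries = 5;
for (int attempt = 1; ; attempt++)
{
    try
    {
        updateCmd.ExecuteNonQuery();
        break;                                    // success, stop retrying
    }
    catch (SqlException e)
    {
        if (attempt >= maxRetries)
            throw;                                // give up after the last attempt
        LogError(e);
        System.Threading.Thread.Sleep(1000 * 10); // wait before the next attempt
    }
}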
I've written a small console app that I point to a folder containing DBF/FoxPro files.
It then creates a table in SQL based on each dbf table, then does a bulk copy to insert the data into SQL. It works quite well for the most part, except for a few snags..
1) Some of the FoxPro tables contain 5,000,000+ records and the connection expires before the insert completes.
Here is my connection string:
<add name="SQL" connectionString="data source=source_source;persist security info=True;user id=DBFToSQL;password=DBFToSQL;Connection Timeout=20000;Max Pool Size=200" providerName="System.Data.SqlClient" />
Error message:
"Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding."
CODE:
using (SqlConnection SQLConn = new SqlConnection(SQLString))
using (OleDbConnection FPConn = new OleDbConnection(FoxString))
{
ServerConnection srvConn = new Microsoft.SqlServer.Management.Common.ServerConnection(SQLConn);
try
{
FPConn.Open();
string dataString = String.Format("Select * from {0}", tableName);
using (OleDbCommand Command = new OleDbCommand(dataString, FPConn))
using (OleDbDataReader Reader = Command.ExecuteReader(CommandBehavior.SequentialAccess))
{
tbl = new Table(database, tableName, "schema");
for (int i = 0; i < Reader.FieldCount; i++)
{
col = new Column(tbl, Reader.GetName(i), ConvertTypeToDataType(Reader.GetFieldType(i)));
col.Nullable = true;
tbl.Columns.Add(col);
}
tbl.Create();
BulkCopy(Reader, tableName);
}
}
catch (Exception ex)
{
// LogText(ex, #"C:\LoadTable_Errors.txt", tableName);
throw; // rethrow without resetting the stack trace
}
finally
{
SQLConn.Close();
srvConn.Disconnect();
}
}
private DataType ConvertTypeToDataType(Type type)
{
switch (type.ToString())
{
case "System.Decimal":
return DataType.Decimal(18, 38);
case "System.String":
return DataType.NVarCharMax;
case "System.Int32":
return DataType.Int;
case "System.DateTime":
return DataType.DateTime;
case "System.Boolean":
return DataType.Bit;
default:
throw new NotImplementedException("ConvertTypeToDataType Not implemented for type : " + type.ToString());
}
}
private void BulkCopy(OleDbDataReader reader, string tableName)
{
using (SqlConnection SQLConn = new SqlConnection(SQLString))
{
SQLConn.Open();
SqlBulkCopy bulkCopy = new SqlBulkCopy(SQLConn);
bulkCopy.DestinationTableName = "schema." + tableName;
try
{
bulkCopy.WriteToServer(reader);
}
catch (Exception ex)
{
//LogText(ex, #"C:\BulkCopy_Errors.txt", tableName);
}
finally
{
SQLConn.Close();
reader.Close();
}
}
}
My 2nd & 3rd errors are the following:
I understand what the issues are, but I'm not so sure how to rectify them.
2) "The provider could not determine the Decimal value. For example, the row was just created, the default for the Decimal column was not available, and the consumer had not yet set a new Decimal value."
3) SqlDateTime overflow. Must be between 1/1/1753 12:00:00 AM and 12/31/9999 11:59:59 PM.
I found a result on Google that indicated what the issue is: [A]... and a possible workaround [B] (but I'd like to keep my decimal values as decimal and dates as date, as I'll be doing further calculations against the data).
What I want to do as a solution:
1.) Either increase the connection time (but I don't think I can increase it any more than I have), or alternatively, is it possible to split the OleDbDataReader's results and do an incremental bulk insert? (See the sketch after this list.)
2.) Is it possible to have the bulk copy ignore rows with errors, or have the records that error out logged to a CSV file or something to that extent?
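For the first point, a minimal sketch (the batch size is an illustrative value). Note that it is SqlBulkCopy.BulkCopyTimeout, not the Connection Timeout in the connection string, that governs how long WriteToServer is allowed to run:
// Sketch: let the bulk copy run without a time limit and commit in batches,
// rather than raising the connection string's Connection Timeout.
using (SqlConnection sqlConn = new SqlConnection(SQLString))
{
    sqlConn.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConn))
    {
        bulkCopy.DestinationTableName = "schema." + tableName;
        bulkCopy.BulkCopyTimeout = 0;      // 0 = no time limit for the copy operation
        bulkCopy.BatchSize = 50000;        // send the reader's rows to the server in increments
        bulkCopy.NotifyAfter = 50000;      // optional progress reporting
        bulkCopy.SqlRowsCopied += (s, e) => Console.WriteLine(e.RowsCopied + " rows copied");
        bulkCopy.WriteToServer(Reader);    // 'Reader' is the OleDbDataReader opened in the code above
    }
}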
So where you have the for statement, I would probably break it up to take so many at a time:
int i = 0;
int MaxCount = 1000;
while (i < Reader.FieldCount)
{
var tbl = new Table(database, tableName, "schema");
for (int j = 0; j < MaxCount && i < Reader.FieldCount; j++)
{
col = new Column(tbl, Reader.GetName(i), ConvertTypeToDataType(Reader.GetFieldType(i)));
col.Nullable = true;
tbl.Columns.Add(col);
i++;
}
tbl.Create();
BulkCopy(Reader, tableName);
}
So, "i" keeps track of the overall count, "j" keeps track of the incremental count (ie your max at one time count) and when you have created your 'batch', you create the table and Bulk Copy it.
Does that look like what you would expect?
Cheers,
Chris.
This is my current attempt at the bulk copy method. It works for about 90% of the tables, but I get an OutOfMemory exception with the bigger tables... I'd like to split the reader's data into smaller sections without having to pass it into a DataTable and store it in memory first (which is the cause of the OutOfMemory exception on the bigger result sets).
UPDATE
I modified the code below to match how it looks in my solution. It ain't pretty, but it works. I'll definitely do some refactoring and update my answer again.
private void BulkCopy(OleDbDataReader reader, string tableName, Table table)
{
Console.WriteLine(tableName + " BulkCopy Started.");
try
{
DataTable tbl = new DataTable();
List<Type> typeList = new List<Type>();
foreach (Column col in table.Columns)
{
tbl.Columns.Add(col.Name, ConvertDataTypeToType(col.DataType));
typeList.Add(ConvertDataTypeToType(col.DataType));
}
int batch = 1;
int counter = 0;
DataRow tblRow = tbl.NewRow();
while (reader.Read())
{
counter++;
int colcounter = 0;
foreach (Column col in table.Columns)
{
try
{
tblRow[colcounter] = reader[colcounter];
}
catch (Exception)
{
tblRow[colcounter] = GetDefault(typeList[colcounter]); // use the current column's default, not the first column's
}
colcounter++;
}
tbl.LoadDataRow(tblRow.ItemArray, true);
if (counter == BulkInsertIncrement)
{
Console.WriteLine(tableName + " :: Batch >> " + batch);
counter = PerformInsert(tableName, tbl, batch);
batch++;
}
}
if (counter > 0)
{
Console.WriteLine(tableName + " :: Batch >> " + batch);
PerformInsert(tableName, tbl, counter);
}
tbl = null;
Console.WriteLine("BulkCopy Success!");
}
catch (Exception ex)
{
Console.WriteLine("BulkCopy Fail!");
SharedLogger.Write(ex, #"C:\BulkCopy_Errors.txt", tableName);
Console.WriteLine(ex.Message);
}
finally
{
reader.Close();
reader.Dispose();
}
Console.WriteLine(tableName + " BulkCopy Ended.");
Console.WriteLine("*****");
Console.WriteLine("");
}
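For what it's worth, an alternative sketch that avoids buffering rows into a DataTable at all (assuming the destination table's columns line up with the reader's, and reusing the SQLString and BulkInsertIncrement names from above) is to hand the OleDbDataReader straight to SqlBulkCopy and let BatchSize do the chunking:
// Sketch: stream the OleDbDataReader directly into SqlBulkCopy so only one batch of rows
// is in flight at a time. The trade-off: a single bad value fails its batch, so this gives
// up the per-column fallback handling used in the method above.
private void BulkCopyStreaming(OleDbDataReader reader, string tableName)
{
    using (var sqlConn = new SqlConnection(SQLString))
    {
        sqlConn.Open();
        using (var bulkCopy = new SqlBulkCopy(sqlConn))
        {
            bulkCopy.DestinationTableName = "schema." + tableName;
            bulkCopy.BulkCopyTimeout = 0;             // no time limit for the large tables
            bulkCopy.BatchSize = BulkInsertIncrement; // commit in the same increments as above
            bulkCopy.WriteToServer(reader);           // rows are pulled from the reader as needed
        }
    }
    reader.Close();
}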