DataSet Inserts take progressively longer - c#

Similar to this question, I am iterating through a DataTable and using the data to fill a new DataSet for the purposes of a data migration.
The migration inserts into a DataSet, and every 5000 records the added rows are saved to the database using ErikEJ's SqlCeBulkCopy method.
My problem is that for the first batch of records (roughly the first 5000) the average time taken per record is around 150-200 milliseconds, but it gradually increases; by record 11000 this figure is around 475 milliseconds.
I have a typed DataSet with EnforceConstraints turned off.
The actual database write always takes less than a second, so I am pretty sure it is not the database itself. That leaves the code taking longer to run on each iteration, which could be down to the code itself or to something I am not realising about DataSets.
Could the DataSet be increasing the time because it maintains indexes or keys that are not turned off by setting EnforceConstraints = false?
One other thought: I check whether a record exists before inserting it, and I have tried both the LINQ methods .Any() and FirstOrDefault() != null.
I iterate through a DataTable; for each record I read some values and then pass them to this method:
private int MigrateItems(string reference, string brand, string captureSite, string captureOperator, DateTime captureDate, DateTime addedDate, DateTime updatedDate, bool retain)
{
    // prepare the inputs
    reference = reference.Trim();
    int brandID = -1, databaseUpdateID = -1, captureID = -1, insertedRowID = -1;

    // get the foreign keys
    brandID = MigrateBrands(brand);
    databaseUpdateID = MigrateDatabaseUpdates(reference);
    captureID = MigrateCaptures(captureSite, captureOperator, captureDate);

    // if the item doesn't exist then add it
    bool exists = dataSet.Item.FirstOrDefault(a => string.Equals(a.Reference, reference, StringComparison.CurrentCultureIgnoreCase)) != null;
    if (!exists)
    {
        var insertedRow = dataSet.Item.AddItemRow(brandID, databaseUpdateID, captureID, reference, retain, updatedDate, addedDate);
        insertedRowID = insertedRow.ID;
    }
    else
    {
        insertedRowID = dataSet.Item.Single(a => string.Equals(a.Reference, reference, StringComparison.CurrentCultureIgnoreCase)).ID;
    }

    return insertedRowID;
}
Once 5000 records have been iterated, or all records have been processed, I call this method:
private void BulkInsertData()
{
    using (var bulkCopier = new SqlCeBulkCopy(connectionString))
    {
        bulkCopier.DestinationTableName = dataSet.Brand.TableName;
        bulkCopier.WriteToServer(dataSet.Brand.Where(a => a.RowState == DataRowState.Added).AsEnumerable());
        // (same code for all the tables)

        // change all row states to unchanged
        dataSet.AcceptChanges();
    }
}
I'm using the following:
C#
Visual Studio 2012
SQL Server CE 4.0
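As a point of reference for the existence check discussed above, the sketch below shows a case-insensitive HashSet cache that could replace the per-record LINQ scan over the typed Item table. It is only an illustration, written on the assumption that every reference passes through MigrateItems; it is not the code from the question.

using System;
using System.Collections.Generic;

// Sketch only (an illustration, not the question's code): a case-insensitive
// cache of the references already added to dataSet.Item, so the per-record
// existence check is a hash lookup instead of a LINQ scan over the growing table.
public class ReferenceCache
{
    private readonly HashSet<string> references =
        new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);

    // Returns true if the reference had not been seen before and records it.
    public bool TryAdd(string reference)
    {
        return references.Add(reference.Trim());
    }

    public bool Contains(string reference)
    {
        return references.Contains(reference.Trim());
    }
}

In MigrateItems the check would then be cache.Contains(reference) instead of the FirstOrDefault scan, with cache.TryAdd(reference) called when a row is added.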

Related

Remove duplicate rows from c# datatable using dataview.totable() not working for empty row values

I need to check existing DataRows (across all columns) for possible duplicates in a DataTable before adding a new row. Currently I am using the approach of adding the new DataRow (via ItemArray) to the table dt1 and comparing the row count of the existing table (dt1) with the new table (dt1.DefaultView.ToTable(true)). However, this does not seem to work when the rows contain empty/null values:
| Col1 | Col2 |
|------|------|
| Abc  |      |
|      | Xyz  |
When I try to add a new DataRow - (Abc, ) - it gets interpreted as a distinct row. Is there a way to handle such empty/null values using DataView.ToTable(), or a LINQ approach to check whether a row already exists (values of all columns need to be checked), or any other easier way? *The DataTable columns change dynamically, so I can't rely on column names.
Gee, either always put in DBNull for a "", or always put in "" for DBNull.
If you don't adopt a set of HARD rules in your data, then coding this out will forevermore be a real pain. Computers and code HAVE to have solid rules, and if your design is, well, sort of blank but sort of null, the whole process becomes painful. This rule (be 100% defined in your rules) applies to all code - not just this example.
I think for this case I would just "code right" through this problem if we're going to ignore the above. My choice would be to always use null, and to use a DataView against the table to filter and check in one operation (see the sketch after the code below).
However, due to the flip-flop between "" and nulls?
Then just brute-force code this out.
This works:
My table from SQL server:
SELECT FirstName, LastName, HotelName from tblHotels
Above is shoved into a data table - can have nulls.
So, now our code is this:
object[] MySearch = new[] { txtFirst.Text, txtLast.Text, txtHotel.Text };
bool bolFound = false;
foreach (DataRow MyRow in MyTable.Rows)
{
    if (FindRow(MySearch, MyRow))
    {
        bolFound = true;
        break;
    }
}
Response.Write("<h2>Results of search = " + bolFound.ToString() + "</h2>");
public bool FindRow(object[] MySearch, DataRow OneRow)
{
    object[] ar2 = OneRow.ItemArray;
    for (int i = 0; i < MySearch.Length; i++)
    {
        if (ar2[i] == DBNull.Value)
        {
            // a DBNull in the table only matches an empty string in the search values
            if (!Equals(MySearch[i], ""))
                return false;
        }
        else if (!Equals(MySearch[i], ar2[i]))
        {
            return false;
        }
    }
    return true;
}
So you just loop the rows, and your test converts any DBNull in the DataTable to "". You ASSUME that your search array will ALWAYS use "" and never a DBNull.
The above is less than pretty, but it's not a huge amount of code.
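For the DataView route mentioned above (filter and check in one operation), here is a minimal hedged sketch. It assumes every column involved is string-typed and that the search values use "" rather than DBNull; the ISNULL expression makes a DBNull cell compare equal to an empty string. The helper name is made up for illustration.

using System;
using System.Collections.Generic;
using System.Data;

public static class RowSearch
{
    // Sketch only: builds a RowFilter that treats DBNull and "" as the same value,
    // then checks for a match in a single DataView evaluation. Column names are
    // read from the table at runtime because the question says they change dynamically.
    public static bool RowExists(DataTable table, object[] search)
    {
        var parts = new List<string>();
        for (int i = 0; i < search.Length; i++)
        {
            string col = "[" + table.Columns[i].ColumnName + "]";
            string val = Convert.ToString(search[i]).Replace("'", "''");
            parts.Add("ISNULL(" + col + ", '') = '" + val + "'");
        }

        using (var view = new DataView(table))
        {
            view.RowFilter = string.Join(" AND ", parts);
            return view.Count > 0;
        }
    }
}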

Insert and ignore duplications

I have a list of rows that I want to insert in one batch (add X rows, single call to SaveChanges). Unfortunately, from time to time, some of the items in the list already exist. Since the insertion process is taking place in a transaction (all or nothing), nothing gets added.
Code to show the idea:
using (var context = new CacheDbContext())
{
    context.Counter.Add(new Counter
    {
        Id = "test-3",
        CounterType = "test",
        Expiry = DateTime.UtcNow.AddHours(1),
        Value = 0
    });
    context.Counter.Add(new Counter
    {
        Id = "test-2",
        CounterType = "test",
        Expiry = DateTime.UtcNow.AddHours(1),
        Value = 0
    });
    await context.SaveChangesAsync().ConfigureAwait(false);
}
My goal is to do the insertion and, if one or more items already exist, ignore them.
The naive solution is to check whether each ID exists before inserting it. This will work, but it has poor performance. I want to execute multiple inserts with one call.
I know this is possible with SQL like this:
INSERT INTO table_name(c1)
VALUES(c1)
ON DUPLICATE KEY UPDATE c1 = VALUES(c1) + 1;
If EF could translate my INSERT statement into something like this, it would be good enough for me.
Is this possible?
Any other solution is welcome.
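One hedged sketch of doing the duplicate check in a single round trip: load the IDs that already exist with one query, add only the missing counters, and call SaveChangesAsync once. Counter and CacheDbContext come from the question; the helper itself is only an illustration and does not make EF ignore duplicates at the database level.

// Sketch only: one query for the existing IDs, one SaveChangesAsync for the new rows
// (ToListAsync comes from System.Data.Entity or Microsoft.EntityFrameworkCore,
// plus System.Linq and System.Threading.Tasks).
public static async Task AddMissingCountersAsync(IList<Counter> counters)
{
    using (var context = new CacheDbContext())
    {
        var ids = counters.Select(c => c.Id).ToList();

        // Single round trip: which of these IDs are already stored?
        var existingIds = await context.Counter
            .Where(c => ids.Contains(c.Id))
            .Select(c => c.Id)
            .ToListAsync()
            .ConfigureAwait(false);

        var newCounters = counters.Where(c => !existingIds.Contains(c.Id)).ToList();
        context.Counter.AddRange(newCounters);

        await context.SaveChangesAsync().ConfigureAwait(false);
    }
}

This is still racy if another writer inserts the same ID between the check and the save, so a unique key plus retry (or a database-side upsert/MERGE) is the robust version.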

Linq-To-Sql: Why my changeset is always zero?

I have a query similar to the one on the MSDN site:
// Query the database for the row to be updated.
var query =
    from ord in db.Orders
    where ord.OrderID == 11000
    select ord;

// Execute the query, and change the column values
// you want to change.
foreach (Order ord in query)
{
    ord.ShipName = "Mariner";
    ord.ShipVia = 2;
    // Insert any additional changes to column values.
}

// Submit the changes to the database.
try
{
    db.SubmitChanges();
}
catch (Exception e)
{
    Console.WriteLine(e);
    // Provide for exceptions.
}
What I want now is a way to know the number of rows affected by the last update command. I've tried using:
int affectedRows = dc.GetChangeSet().Updates.Count;
in various ways, but this instruction always gives me 0, even when the table is correctly updated.
dc.GetChangeSet() tells you how many changes your LINQ to SQL context is planning to make when you call SubmitChanges(). It does not track the number of affected rows as reported by the database.
If you call int affectedRows = dc.GetChangeSet().Updates.Count; before calling SubmitChanges, you will see how many rows it expects to have affected. After calling SubmitChanges, there are no more pending changes, so you'll always get a zero count.
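A minimal sketch of that ordering, assuming the same db DataContext and Orders table as the question:

// Inspect the pending change set *before* SubmitChanges.
var order = db.Orders.First(o => o.OrderID == 11000);
order.ShipName = "Mariner";

int pendingUpdates = db.GetChangeSet().Updates.Count;   // 1 at this point
Console.WriteLine("Pending updates: " + pendingUpdates);

db.SubmitChanges();

// After SubmitChanges the change set is empty again, so this prints 0.
Console.WriteLine("Pending updates now: " + db.GetChangeSet().Updates.Count);

GetChangeSet() also exposes Inserts and Deletes collections if you need those counts, but none of them reflect the row count reported back by the database.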

Fastest way to import from Excel to MVC3 Application

I'm working on an import from a CSV file to my ASP.NET MVC3/C#/Entity Framework Application.
Currently this is my code, but I'm looking to optimise:
var excel = new ExcelQueryFactory(file);
var data = from c in excel.Worksheet(0)
           select c;
var dataList = data.ToList();

List<FullImportExcel> importList = new List<FullImportExcel>();

foreach (var s in dataList.ToArray())
{
    if ((s[0].ToString().Trim().Length < 6) && (s[1].ToString().Trim().Length < 7))
    {
        FullImportExcel item = new FullImportExcel();
        item.Carrier = s[0].ToString().Trim();
        item.FlightNo = s[1].ToString().Trim();
        item.CodeFlag = s[2].ToString().Trim();
        //etc etc (50 more columns here)
        importList.Add(item);
    }
}

PlannerEntities context = new PlannerEntities();
context.Configuration.AutoDetectChangesEnabled = false;
int count = 0;
foreach (var item in importList)
{
    ++count;
    context = AddToFullImportContext(context, item, count, 100, true);
}
private PlannerEntities AddToFullImportContext(PlannerEntities context, FullImportExcel entity, int count, int commitCount, bool recreateContext)
{
    context.Set<FullImportExcel>().Add(entity);
    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new PlannerEntities();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }
    return context;
}
This works fine, but isn't as quick as it could be, and the import that I'm going to need to do will be a minimum of 2 million lines every month. Are there any better methods out there for bulk imports?
Am I better off avoiding EF altogether and using SqlConnection and inserting that way?
Thanks
I do like how you're only committing records every X number of records (100 in your case.)
I've recently written a system that once a month, needed to update the status of upwards of 50,000 records in one go - this is updating each record and inserting an audit record for each updated record.
Originally I wrote this with the entity framework, and it took 5-6 minutes to do this part of the task. SQL Profiler showed me it was doing 100,000 SQL queries - one UPDATE and one INSERT per record (as expected I guess.)
I changed this to a stored procedure which takes a comma-separated list of record IDs, the status and user ID as parameters, which does a mass-update followed by a mass-insert. This now takes 5 seconds.
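For the stored-procedure route described above, the call site might look something like the hedged sketch below; the procedure name and parameter names are invented for illustration and are not from the original post.

// Sketch only: pass a comma-separated list of IDs plus the status and user ID
// to a mass-update stored procedure (assumes System.Collections.Generic,
// System.Data and System.Data.SqlClient).
public static void UpdateStatusBatch(string connectionString, IEnumerable<int> recordIds, string status, int userId)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.UpdateStatusBatch", conn))   // hypothetical procedure name
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@RecordIds", string.Join(",", recordIds));
        cmd.Parameters.AddWithValue("@Status", status);
        cmd.Parameters.AddWithValue("@UserId", userId);

        conn.Open();
        cmd.ExecuteNonQuery();
    }
}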
In your case, for this number of records, I'd recommend creating a BULK IMPORT file and passing that over to SQL to import.
http://msdn.microsoft.com/en-us/library/ms188365.aspx
For large numbers of inserts in SQL Server, Bulk Copy is the fastest way. You can use the SqlBulkCopy class to access Bulk Copy from code. You have to create an IDataReader for your List, or you can use the IDataReader for inserting generic Lists that I have written.
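As a hedged alternative to the generic-list IDataReader mentioned above, SqlBulkCopy can also be fed a DataTable built from the parsed rows. The sketch below assumes the FullImportExcel type from the question and a destination table named FullImportExcels, which is an assumption.

// Sketch only: bulk-copies parsed rows via a DataTable
// (assumes System.Collections.Generic, System.Data and System.Data.SqlClient).
public static void BulkInsert(string connectionString, IEnumerable<FullImportExcel> items)
{
    var table = new DataTable();
    table.Columns.Add("Carrier", typeof(string));
    table.Columns.Add("FlightNo", typeof(string));
    table.Columns.Add("CodeFlag", typeof(string));
    // ...add the remaining columns so they match the destination table...

    foreach (var item in items)
    {
        table.Rows.Add(item.Carrier, item.FlightNo, item.CodeFlag);
    }

    using (var bulkCopy = new SqlBulkCopy(connectionString))
    {
        bulkCopy.DestinationTableName = "FullImportExcels";   // assumed table name
        bulkCopy.BatchSize = 5000;
        bulkCopy.WriteToServer(table);
    }
}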
Thanks to Andy for the heads up - this was the code used in SQL, with a little help from the ever-helpful Pinal Dave - http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/ :)
DECLARE @bulkinsert NVARCHAR(2000)
DECLARE @filepath NVARCHAR(100)

SET @filepath = 'C:\Users\Admin\Desktop\FullImport.csv'

SET @bulkinsert =
    N'BULK INSERT FullImportExcel2s FROM ''' +
    @filepath +
    N''' WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'')'

EXEC sp_executesql @bulkinsert
Still got a bit of work to do to work it into the code, but we're down to 25 seconds for 50000 rows instead of an hour, so a huge improvement!

SQLException : String or binary data would be truncated

I have C# code which executes a lot of insert statements in a batch. While executing these statements, I got a "String or binary data would be truncated" error and the transaction rolled back.
To find out which insert statement caused this, I would need to run the inserts one by one against SQL Server until I hit the error.
Is there a clever way to find out which statement and which field caused the issue using exception handling (SqlException)?
In general, there isn't a way to determine which particular statement caused the error. If you're running several, you could watch profiler and look at the last completed statement and see what the statement after that might be, though I have no idea if that approach is feasible for you.
In any event, one of your parameter variables (and the data inside it) is too large for the field it's trying to store data in. Check your parameter sizes against column sizes and the field(s) in question should be evident pretty quickly.
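If the inserts are done with parameterised SqlCommands, a hedged sketch of that check could look like this; the helper is illustrative only.

// Sketch only: after catching the SqlException, compare each string parameter's
// declared Size with the length of the value assigned to it
// (assumes System and System.Data.SqlClient).
public static void ReportOversizedParameters(SqlCommand cmd)
{
    foreach (SqlParameter p in cmd.Parameters)
    {
        var text = p.Value as string;
        if (text != null && p.Size > 0 && text.Length > p.Size)
        {
            Console.WriteLine("{0}: value is {1} chars but declared size is {2}",
                p.ParameterName, text.Length, p.Size);
        }
    }
}

Note this only catches values that exceed the size declared on the parameter; if a parameter was declared larger than the column, you still need to compare against the column definition itself.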
This type of error occurs when the SQL Server column's datatype has a length that is less than the length of the data being entered.
This type of error generally occurs when you try to store more characters than you specified for the column in the database table. In this case, for example, you specify
transaction_status varchar(10)
but you actually try to store
_transaction_status
which contains 19 characters. That's why you get this type of error with this code.
Generally it means you are inserting a value that is longer than the maximum allowed length. For example, the column can only hold up to 200 characters, but you are inserting a 201-character string.
BEGIN TRY
    INSERT INTO YourTable (col1, col2) VALUES (@val1, @val2)
END TRY
BEGIN CATCH
    -- print or insert into error log or return param or etc...
    PRINT '@val1=' + ISNULL(CONVERT(varchar, @val1), '')
    PRINT '@val2=' + ISNULL(CONVERT(varchar, @val2), '')
END CATCH
For SQL 2016 SP2 or higher follow this link
For older versions of SQL do this:
Get the query that is causing the problems (you can also use SQL Profiler if you don't have the source)
Remove all WHERE clauses and other unimportant parts until you are basically just left with the SELECT and FROM parts
Add WHERE 0 = 1 (this will select only the table structure)
Add INTO [MyTempTable] just before the FROM clause
You should end up with something like
SELECT
    Col1, Col2, ..., [ColN]
INTO [MyTempTable]
FROM
    [Tables etc.]
WHERE 0 = 1
This will create a table called MyTempTable in your DB that you can compare to your target table structure i.e. you can compare the columns on both tables to see where they differ. It is a bit of a workaround but it is the quickest method I have found.
It depends on how you are making the Insert Calls. All as one call, or as individual calls within a transaction? If individual calls, then yes (as you iterate through the calls, catch the one that fails). If one large call, then no. SQL is processing the whole statement, so it's out of the hands of the code.
I have created a simple way of finding the offending fields:
Get the column width of all the columns of the table we're trying to insert into or update. (I'm getting this info directly from the database.)
Compare those column widths to the width of the values we're trying to insert/update.
Assumptions/limitations:
The column names of the table in the database match the C# entity fields. For example, if you have a varchar column named SourceData in the database, you need your entity to have a property with the same name:
public class SomeTable
{
    // Other fields
    public string SourceData { get; set; }
}
You're inserting/updating 1 entity at a time. This will be clearer in the demo code below. (If you're doing bulk inserts/updates, you might want to either modify it or use some other solution.)
Step 1:
Get the column width of all the columns directly from the database:
// For this, I took help from the Microsoft docs website:
// https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.getschema?view=netframework-4.7.2#System_Data_SqlClient_SqlConnection_GetSchema_System_String_System_String___
private static Dictionary<string, int> GetColumnSizesOfTableFromDatabase(string tableName, string connectionString)
{
    var columnSizes = new Dictionary<string, int>();

    using (var connection = new SqlConnection(connectionString))
    {
        // Connect to the database then retrieve the schema information.
        connection.Open();

        // You can specify the Catalog, Schema, Table Name, Column Name to get the specified column(s).
        // You can use four restrictions for Column, so you should create a 4 members array.
        String[] columnRestrictions = new String[4];

        // For the array, 0-member represents Catalog; 1-member represents Schema;
        // 2-member represents Table Name; 3-member represents Column Name.
        // Now we specify the Table_Name and Column_Name of the columns that we want to get schema information for.
        columnRestrictions[2] = tableName;

        DataTable allColumnsSchemaTable = connection.GetSchema("Columns", columnRestrictions);

        foreach (DataRow row in allColumnsSchemaTable.Rows)
        {
            var columnName = row.Field<string>("COLUMN_NAME");
            //var dataType = row.Field<string>("DATA_TYPE");
            var characterMaxLength = row.Field<int?>("CHARACTER_MAXIMUM_LENGTH");

            // I'm only capturing columns whose datatype is "varchar" or "char", i.e. their CHARACTER_MAXIMUM_LENGTH won't be null.
            if (characterMaxLength != null)
            {
                columnSizes.Add(columnName, characterMaxLength.Value);
            }
        }

        connection.Close();
    }

    return columnSizes;
}
Step 2:
Compare the column widths with the width of the values we're trying to insert/ update:
public static Dictionary<string, string> FindLongBinaryOrStringFields<T>(T entity, string connectionString)
{
    var tableName = typeof(T).Name;
    Dictionary<string, string> longFields = new Dictionary<string, string>();
    var objectProperties = GetProperties(entity);
    //var fieldNames = objectProperties.Select(p => p.Name).ToList();
    var actualDatabaseColumnSizes = GetColumnSizesOfTableFromDatabase(tableName, connectionString);

    foreach (var dbColumn in actualDatabaseColumnSizes)
    {
        var maxLengthOfThisColumn = dbColumn.Value;
        var currentValueOfThisField = objectProperties.Where(f => f.Name == dbColumn.Key).First()?.GetValue(entity, null)?.ToString();

        if (!string.IsNullOrEmpty(currentValueOfThisField) && currentValueOfThisField.Length > maxLengthOfThisColumn)
        {
            longFields.Add(dbColumn.Key, $"'{dbColumn.Key}' column cannot take the value of '{currentValueOfThisField}' because the max length it can take is {maxLengthOfThisColumn}.");
        }
    }

    return longFields;
}

public static List<PropertyInfo> GetProperties<T>(T entity)
{
    // The DeclaredOnly flag makes sure you only get properties of the object, not from the classes it derives from.
    var properties = entity.GetType()
                           .GetProperties(System.Reflection.BindingFlags.Public
                                        | System.Reflection.BindingFlags.Instance
                                        | System.Reflection.BindingFlags.DeclaredOnly)
                           .ToList();

    return properties;
}
Demo:
Let's say we're trying to insert someTableEntity of SomeTable class that is modeled in our app like so:
public class SomeTable
{
    [Key]
    public long TicketID { get; set; }
    public string SourceData { get; set; }
}
And it's inside our SomeDbContext like so:
public class SomeDbContext : DbContext
{
    public DbSet<SomeTable> SomeTables { get; set; }
}
In the database, this table has the SourceData field defined as varchar(16).
Now we'll try to insert a value that is longer than 16 characters into this field and capture the resulting information:
public void SaveSomeTableEntity()
{
    var connectionString = "server=SERVER_NAME;database=DB_NAME;User ID=SOME_ID;Password=SOME_PASSWORD;Connection Timeout=200";
    using (var context = new SomeDbContext(connectionString))
    {
        var someTableEntity = new SomeTable()
        {
            SourceData = "Blah-Blah-Blah-Blah-Blah-Blah"
        };

        context.SomeTables.Add(someTableEntity);

        try
        {
            context.SaveChanges();
        }
        catch (Exception ex)
        {
            if (ex.GetBaseException().Message == "String or binary data would be truncated.\r\nThe statement has been terminated.")
            {
                var badFieldsReport = "";
                List<string> badFields = new List<string>();

                // YOU GOT YOUR FIELDS RIGHT HERE:
                var longFields = FindLongBinaryOrStringFields(someTableEntity, connectionString);

                foreach (var longField in longFields)
                {
                    badFields.Add(longField.Key);
                    badFieldsReport += longField.Value + "\n";
                }
            }
            else
                throw;
        }
    }
}
The badFieldsReport will have this value:
'SourceData' column cannot take the value of 'Blah-Blah-Blah-Blah-Blah-Blah' because the max length it can take is 16.
It could also be because you're trying to put a null value back into the database, so one of your transactions could have nulls in it.
Most of the answers here are to do the obvious check, that the length of the column as defined in the database isn't smaller than the data you are trying to pass into it.
Several times I have been bitten by going into SQL Management Studio, doing a quick:
sp_help 'mytable'
and being confused for a few minutes until I realize the column in question is an nvarchar, and the length reported by sp_help is in bytes, which is double the number of characters it can hold because it's a double-byte (Unicode) datatype.
i.e. if sp_help reports nvarchar Length 40, you can store 20 characters max.
Check out this gist: https://gist.github.com/mrameezraja/9f15ad624e2cba8ac24066cdf271453b
public Dictionary<string, string> GetEvilFields(string tableName, object instance)
{
    Dictionary<string, string> result = new Dictionary<string, string>();
    var tableType = this.Model.GetEntityTypes().First(c => c.GetTableName().Contains(tableName));
    if (tableType != null)
    {
        int i = 0;
        foreach (var property in tableType.GetProperties())
        {
            var maxlength = property.GetMaxLength();
            var prop = instance.GetType().GetProperties().FirstOrDefault(_ => _.Name == property.Name);
            if (prop != null)
            {
                var length = prop.GetValue(instance)?.ToString()?.Length;
                if (length > maxlength)
                {
                    result.Add($"{i}.Evil.Property", prop.Name);
                    result.Add($"{i}.Evil.Value", prop.GetValue(instance)?.ToString());
                    result.Add($"{i}.Evil.Value.Length", length?.ToString());
                    result.Add($"{i}.Evil.Db.MaxLength", maxlength?.ToString());
                    i++;
                }
            }
        }
    }
    return result;
}
With LINQ to SQL I debugged it by logging the context, e.g. Context.Log = Console.Out
Then I scanned the SQL to check for any obvious errors; there were two:
-- @p46: Input Char (Size = -1; Prec = 0; Scale = 0) [some long text value1]
-- @p8: Input Char (Size = -1; Prec = 0; Scale = 0) [some long text value2]
The last one I found by scanning the table schema against the values; the field was nvarchar(20) but the value was 22 chars:
-- @p41: Input NVarChar (Size = 4000; Prec = 0; Scale = 0) [1234567890123456789012]
In our own case, I increased the allowed field size in the SQL table, which was smaller than the total number of characters posted from the front end. That resolved the issue.
Simply use this:
MessageBox.Show(cmd4.CommandText.ToString());
in C#.NET and it will show you the main query. Copy it and run it in the database.
