Fastest way to copy data from one table to another using LINQ - C#

I have a table on another SQL Server which I need to copy data from overnight. The structure of the destination table is very similar, so I was just going to use something like the code below.
Source - http://forums.asp.net/t/1322979.aspx/1
I have not tried this yet, but is there a better/quicker way to do this in linq?
// there exist two tables: list and listSecond
DataClassesDataContext dataClass = new DataClassesDataContext(); // create an instance of the DataContext
var str = from a in dataClass.lists select a;
foreach (var val in str) // iterate over the rows from list and insert them into listSecond
{
    listSecond ls = new listSecond();
    ls.ID = val.ID;
    ls.pid = val.pid;
    ls.url = val.url;
    dataClass.listSeconds.InsertOnSubmit(ls);
}
dataClass.SubmitChanges();
Response.Write("success");

Using LINQ to insert large amounts of data is not a good idea, except perhaps for complicated schemas that need a lot of transformation before being copied. It will create a separate INSERT statement for each row, in addition to logging them all in the transaction log.
A much faster solution can be found here - it uses SqlBulkCopy, which pushes large amounts of data to the server in a single bulk operation, without per-row statements and with minimal transaction logging to slow it down. It will be an order of magnitude faster, and I'm telling you this from personal experience with both methods.
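For illustration, here is a minimal sketch of the SqlBulkCopy approach for the tables in the question (the connection strings, the SELECT list and the 5000-row batch size are placeholders, not something from the original post). It streams rows from the source server straight into the destination table without generating per-row INSERT statements:

using (var source = new SqlConnection(sourceConnectionString))
using (var destination = new SqlConnection(destinationConnectionString))
{
    source.Open();
    destination.Open();

    using (var cmd = new SqlCommand("SELECT ID, pid, url FROM dbo.list", source))
    using (var reader = cmd.ExecuteReader())
    using (var bulkCopy = new SqlBulkCopy(destination))
    {
        bulkCopy.DestinationTableName = "dbo.listSecond";
        bulkCopy.BatchSize = 5000;                 // commit in chunks rather than one huge batch
        bulkCopy.ColumnMappings.Add("ID", "ID");   // map columns explicitly in case names differ
        bulkCopy.ColumnMappings.Add("pid", "pid");
        bulkCopy.ColumnMappings.Add("url", "url");
        bulkCopy.WriteToServer(reader);            // streams the reader; no per-row INSERTs
    }
}

This assumes System.Data.SqlClient; for the overnight run you can wrap it in a small console app or scheduled job.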

Related

Need to get the data during a SaveChanges() within EF Core

I need data for a LINQ query that is not already saved to the database. Here is my code:
foreach (var item in BusProc)
{
var WorkEffortTypeXref = new WorkEffortTypeXref
{
SourceDataEntityTypeName = "BusProc",
SourceDataEntityTypeId = item.BusProcId,
};
_clDBContext.WorkEffortTypeXref.AddRange(WorkEffortTypeXref);
}
but I need this data in the SQL database before I do a LINQ join query on the data. However, I don't really want to do an OnSave() after this function, because I want to keep the whole process transactional.
This is the LINQ I need to execute. What is the best way to do this?
var linqquery = from bupr in BusProc
join wrtx in WorkEffortTypeXref on bupr.BusProcId equals wrtx.SourceDataEntityTypeId
// where wrtx.SourceDataEntityTypeName == "BusProc"
select new
{
wrtx.TargetDataEntityTypeId
};
First, try to compile your code. This code won't compile, and as-is it isn't comprehensible. You are treating WorkEffortTypeXref as a class, a singular object and a list all in the first section of code, so we really can't know what it is supposed to be.
Now, as I understand your question, you want to query information that is being added to a table (currently stored in memory as a collection of some sort), but you want to query it before it is added? What if the table has other rows that match your query? What if the insert violates a constraint and therefore fails? LINQ can query an in-memory collection, but you have to choose: are you querying the collection that you have in memory (which isn't yet a row of your table), or the database (which has all of the rules/constraints/etc. that databases provide)? Until the records are saved to your table, they aren't the same thing.
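To make the choice concrete, here is a rough sketch of both options, reusing the names from the question (BusProc, WorkEffortTypeXref, _clDBContext; _clDBContext.BusProc as a DbSet is assumed). The explicit transaction in the second option is my assumption about how to keep the process transactional, not something stated in the original post:

// Option A: query the in-memory collection (LINQ to Objects) before anything is saved.
var xrefs = BusProc.Select(item => new WorkEffortTypeXref
{
    SourceDataEntityTypeName = "BusProc",
    SourceDataEntityTypeId = item.BusProcId
}).ToList();

var inMemoryResult = (from bupr in BusProc
                      join wrtx in xrefs on bupr.BusProcId equals wrtx.SourceDataEntityTypeId
                      select new { wrtx.TargetDataEntityTypeId }).ToList();

// Option B: save inside an explicit transaction, run the query against the database,
// then commit (or roll back), so the whole process stays transactional.
using (var tx = _clDBContext.Database.BeginTransaction())
{
    _clDBContext.WorkEffortTypeXref.AddRange(xrefs);
    _clDBContext.SaveChanges();   // rows are visible to this connection but not yet committed

    var dbResult = (from bupr in _clDBContext.BusProc
                    join wrtx in _clDBContext.WorkEffortTypeXref
                        on bupr.BusProcId equals wrtx.SourceDataEntityTypeId
                    select new { wrtx.TargetDataEntityTypeId }).ToList();

    tx.Commit();
}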

C# Efficiently delete 50000 records in batches using SQLBulkCopy or equivalent library

I'm using this library to perform bulk delete in batches like following:
while (castedEndedItems.Any())
{
var subList = castedEndedItems.Take(4000).ToList();
DBRetry.Do(() => EFBatchOperation.For(ctx, ctx.SearchedUserItems).Where(r => subList.Any(a => a == r.ItemID)).Delete(), TimeSpan.FromSeconds(2));
castedEndedItems.RemoveRange(0, subList.Count);
Console.WriteLine("Completed a batch of ended items");
}
As you can see, I take a batch of 4000 items to delete at once and pass them as an argument to the query...
I'm using this library to perform bulk delete:
https://github.com/MikaelEliasson/EntityFramework.Utilities
However, the performance is absolutely terrible... I tested the application a couple of times, and to delete 80000 records, for example, it literally takes 40 minutes!?
I should note that the parameter by which I'm deleting (ItemID) is of type varchar(400), and it's indexed for performance reasons....
Is there any other library that I could possibly use, or a way to tweak this query to make it work faster? Currently the performance is absolutely terrible.. :/
If you are prepared to use a stored procedure then you can do this without any external library:
Create the sproc using a table-valued parameter @ids
Define a SQL table type for that table-valued parameter (just an id column, assuming a simple PK)
In the sproc use
delete from table where id in (select id from @ids);
In your application, create a DataTable and populate it to match the SQL table type
Pass the DataTable as a command parameter when calling the sproc.
This answer illustrates the process.
Any other option will need to do the equivalent of this – or something less efficient.
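If it helps, here is a rough sketch of that approach with plain ADO.NET, using illustrative names for the table type and procedure (ItemIdList, DeleteSearchedUserItems); the SQL shown in the comment has to be created on the server first:

// SQL side (run once):
//   CREATE TYPE dbo.ItemIdList AS TABLE (ItemID varchar(400) NOT NULL PRIMARY KEY);
//   CREATE PROCEDURE dbo.DeleteSearchedUserItems @ids dbo.ItemIdList READONLY
//   AS
//       DELETE FROM dbo.SearchedUserItems WHERE ItemID IN (SELECT ItemID FROM @ids);

var table = new DataTable();
table.Columns.Add("ItemID", typeof(string));
foreach (var id in castedEndedItems)
    table.Rows.Add(id);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.DeleteSearchedUserItems", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@ids", table);
    parameter.SqlDbType = SqlDbType.Structured;   // table-valued parameter
    parameter.TypeName = "dbo.ItemIdList";
    connection.Open();
    command.ExecuteNonQuery();                    // one round trip for the whole batch
}

With the ids streamed in as a single table-valued parameter, the server can delete the whole batch in one set-based statement instead of thousands of discrete operations.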
Any EF solution here is probably going to perform lots of discrete operations. Instead, I would suggest manually building your SQL in a loop, something like:
using (var cmd = db.CreateCommand())
{
    int index = 0;
    var sql = new StringBuilder("delete from [SomeTable] where [SomeId] in (");
    foreach (var item in items)
    {
        if (index != 0) sql.Append(',');
        var name = "@id_" + index++;
        sql.Append(name);
        cmd.Parameters.AddWithValue(name, item.SomeId);
    }
    cmd.CommandText = sql.Append(");").ToString();
    cmd.ExecuteNonQuery();
}
You may need to loop this in batches, though, as there is an upper limit on the number of parameters allowed on a command (around 2,100 for SQL Server).
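Continuing the sketch above, batching could look roughly like this (the 1,000-row batch size is arbitrary, and items/db are the same placeholders as before):

const int batchSize = 1000;
for (int offset = 0; offset < items.Count; offset += batchSize)
{
    var batch = items.Skip(offset).Take(batchSize).ToList();
    using (var cmd = db.CreateCommand())
    {
        var sql = new StringBuilder("delete from [SomeTable] where [SomeId] in (");
        for (int i = 0; i < batch.Count; i++)
        {
            if (i != 0) sql.Append(',');
            var name = "@id_" + i;            // stay well under the parameter limit per command
            sql.Append(name);
            cmd.Parameters.AddWithValue(name, batch[i].SomeId);
        }
        cmd.CommandText = sql.Append(");").ToString();
        cmd.ExecuteNonQuery();
    }
}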
If you don't mind the extra dependency, you could use the NuGet package Z.EntityFramework.Plus.
The code is roughly as follows:
using Z.EntityFramework.Plus;
[...]
using (yourDbContext context = new yourDbContext())
{
    context.yourDbSet.Where(yourWhereExpression).Delete();
}
It is simple and efficient. The documentation contains exact numbers about the performance.
Regarding licensing: as far as I know, version 1.8 has an MIT license: https://github.com/zzzprojects/EntityFramework-Plus/blob/master/LICENSE
The newer versions are not free to use.

How to insert records from one database into another database in LINQ

I have 2 databases with tables.
I want to insert records from the first database's table into the second database's table using LINQ. I have created 2 dbml files with 2 data contexts, but I am unable to write the code that inserts the records.
I have a list of records:
using(_TimeClockDataContext)
{
var Query = (from EditTime in _TimeClockDataContext.tblEditTimes
orderby EditTime.ScanDate ascending
select new EditTimeBO
{
EditTimeID = EditTime.EditTimeID,
UserID = Convert.ToInt64(EditTime.UserID),
ScanCardId = Convert.ToInt64(EditTime.ScanCardId),
}).ToList();
return Query;
}
Now I want to insert these records into a table which is in _Premire2DataContext.
If you want to "copy" records from one database to another using LINQ then you need two database contexts, one for the database you are reading from, and one for the database you are writing to.
EditTime[] sourceRows;
using (var sourceContext = CreateSourceContext())
{
sourceRows = ReadRows(sourceContext);
}
using (var destinationContext = CreateDestinationContext())
{
WriteRows(destinationContext, sourceRows);
}
You now just need to fill in the implementations for ReadRows and WriteRows using standard LINQ to SQL code. The code for writing rows should look a bit like this.
void WriteRows(TimeClockDataContext destinationContext, EditTime[] rows)
{
    foreach (var row in rows)
    {
        destinationContext.tblEditTimes.InsertOnSubmit(row);
    }
    destinationContext.SubmitChanges();
}
Note that as long as the schema is the same, you can use the same context type and therefore the same entity objects - so when reading records we ideally want to return the correct entity type, and reading is going to look a bit like this:
EditTime[] ReadRows(TimeClockDataContext context)
{
    return (
        from EditTime in context.tblEditTimes
        orderby EditTime.ScanDate ascending
        select EditTime
    ).ToArray();
}
You can use an array or a list - it doesn't really matter. I've used an array mostly because the syntax is shorter. Note that we return the original EditTime objects rather than create new ones as this means we can add those objects directly to the second data context.
I've not compiled any of this code yet, so it might contain some typos. Also, apologies if I've made some obvious errors - it's been a while since I last used LINQ to SQL.
If you have foreign keys or the second database has a different schema then things get more complicated, but the fundamental process remains the same - read from one context (using standard LINQ to SQL) and store the results somewhere, then add the rows to the second context (using standard LINQ to SQL).
Also note that this isn't necessarily going to be particularly quick. If performance is an issue then you should look into using bulk inserts in the WriteRows method, or potentially even use linked servers to do the entire thing in SQL.
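If you go the bulk-insert route, a possible variant of WriteRows using SqlBulkCopy might look like this (the column names and types are guessed from the question's code, so treat it as a sketch rather than working code):

void WriteRowsBulk(string destinationConnectionString, EditTime[] rows)
{
    var table = new DataTable();
    table.Columns.Add("EditTimeID", typeof(long));
    table.Columns.Add("UserID", typeof(long));
    table.Columns.Add("ScanCardId", typeof(long));
    table.Columns.Add("ScanDate", typeof(DateTime));
    foreach (var row in rows)
        table.Rows.Add(row.EditTimeID, row.UserID, row.ScanCardId, row.ScanDate);

    using (var bulkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        bulkCopy.DestinationTableName = "dbo.tblEditTimes";
        bulkCopy.WriteToServer(table);   // one bulk operation instead of one INSERT per row
    }
}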

Can you get DataReader-like streaming using Linq-to-SQL?

I've been using Linq-to-SQL for quite a while and it works great. However, lately I've been experimenting with using it to pull really large amounts of data and am running across some issues. (Of course, I understand that L2S may not be the right tool for this particular kind of processing, but that's why I'm experimenting - to find its limits.)
Here's a code sample:
var buf = new StringBuilder();
var dc = new DataContext(AppSettings.ConnectionString);
var records = from a in dc.GetTable<MyReallyBigTable>() where a.State == "OH" select a;
var i = 0;
foreach (var record in records) {
buf.AppendLine(record.ID.ToString());
i += 1;
if (i > 3) {
break; // Takes forever...
}
}
Once I start iterating over the data, the query executes as expected. When stepping through the code, I enter the loop right away which is exactly what I hoped for - that means that L2S appears to be using a DataReader behind the scenes instead of pulling all the data first. However, once I get to the break, the query continues to run and pull all the rest of the records. Here are my questions for the SO community:
1.) Is there a way to stop Linq-to-SQL from finishing execution of a really big query in the middle the way you can with a DataReader?
2.) If you execute a large Linq-to-SQL query, is there a way to prevent the DataContext from filling up with change tracking information for every object returned. Basically, instead of filling up memory, can I do a large query with short object lifecycles the way you can with DataReader techniques?
I'm okay if this isn't functionality built-in to the DataContext itself and requires extending the functionality with some customization. I'm just looking to leverage the simplicity and power of Linq for large queries for nightly processing tasks instead of relying on T-SQL for everything.
1.) Is there a way to stop Linq-to-SQL from finishing execution of a really big query in the middle the way you can with a DataReader?
Not quite. Once the query is finally executed, the underlying SQL statement returns a result set of matching records. The query is deferred up until that point, but not during traversal.
For your example you could simply use records.Take(3), but I understand your actual logic to halt the process might be external to SQL or not easily translatable.
You could use a combination approach by building a strongly typed LINQ query then executing it with old fashioned ADO.NET. The downside is you lose the mapping to the class and have to manually deal with the SqlDataReader results. An example of this is shown below:
var query = from c in dc.Customers
            where c.ID < 15
            select c;
using (var command = dc.GetCommand(query))
{
command.Connection.Open();
using (var reader = command.ExecuteReader())
{
int i = 0;
while (reader.Read())
{
Customer c = new Customer();
c.ID = reader.GetInt32(reader.GetOrdinal("ID"));
c.Name = reader.GetString(reader.GetOrdinal("Name"));
Console.WriteLine("{0}: {1}", c.ID, c.Name);
i++;
if (i > 3)
break;
}
}
}
2.) If you execute a large Linq-to-SQL query, is there a way to prevent the DataContext from filling up with change tracking information for every object returned?
If your intention for a particular query is to use it for read-only purposes then you could disable object tracking to increase performance by setting the DataContext.ObjectTrackingEnabled property to false:
using (var dc = new MyDataContext())
{
dc.ObjectTrackingEnabled = false;
// do stuff
}
You can also read this MSDN topic: How to: Retrieve Information As Read-Only (LINQ to SQL).

Is it possible to insert large amount of data using linq-to-sql?

I need to insert a large amount of data into SQL Server 2008. My project is based on LINQ to SQL.
I process a CSV file with 100,000 rows. Each row is mapped to an Order object. Order also contains collections of Category and Code objects. I need to map each row to an object in order to validate it.
Then I need to insert all these objects into database.
List<Order> orders = Import("test.csv");
db.Orders.InsertAllOnSubmit(orders);
db.SubmitChanges();
OR
foreach(Order order in orders)
db.Orders.InsertOnSubmit(order);
db.SubmitChanges();
Both ways are slow. Is there any workaround? I may use an approach other than L2SQL for this task.
I read about the SqlBulkCopy class - would it handle inserting child entities as well?
Try using smaller transactions.
foreach (List<Order> orderbatch in orders.Batch(100))
{
    db.Orders.InsertAllOnSubmit(orderbatch);
    db.SubmitChanges();
}
public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
    List<T> result = new List<T>();
    foreach (T t in source)
    {
        result.Add(t);
        if (result.Count == batchSize)
        {
            yield return result;
            result = new List<T>();
        }
    }
    if (result.Any())
    {
        yield return result;
    }
}
As #Brian points out LINQ to SQL does not do bulk insert, but this blog talks about away to get it to work.
The author seems to have added the code since I first read it (it's from 2008).
This CSV reader was really fast for me: http://www.codeproject.com/KB/database/CsvReader.aspx
But yes, a bulk copy operation using only SQL Server would be faster if you have the option to use one.
LINQ to SQL doesn't have a bulk update capability that I'm aware of... you have to iterate through.
HTH.
I think it is better to insert objects in groups of, for instance, 1000 objects, and then dispose of the session.
Performance here is a balance between two extremes: on one side, memory overuse caused by keeping all 100,000 objects in memory; on the other, the time spent creating a session and reconnecting to the database.
By the way, there is no significant difference between session.InsertAllOnSubmit(data) and foreach (var i in data) session.InsertOnSubmit(i).
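A rough sketch of that idea, reusing the Batch() helper from the earlier answer (MyDataContext and the 1,000-object group size are illustrative placeholders):

foreach (List<Order> orderBatch in orders.Batch(1000))
{
    // A fresh DataContext per group keeps the change tracker, and therefore memory, small.
    using (var db = new MyDataContext(connectionString))
    {
        db.Orders.InsertAllOnSubmit(orderBatch);
        db.SubmitChanges();
    }
}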
