Efficient way of inserting items into two tables using EF6 DbFirst - C#

I have two tables, e.g. Users (UserID, CompanyID, Name, ...) and Companies (CompanyID, Name, ...). I have to process XML import data which generates a great number of items for each table. When a (new) user in the XML file references a new company, the company item must be inserted as well. When a user (or company) item in the XML contains changed data, the corresponding row in the database must be updated.
Creating and updating the items with EF6 is quite simple, but SaveChanges() takes extremely long. I searched Google and Stack Overflow and found several similar topics about using the data context and bulk inserting items. They are useful, but in my case, when I bulk insert a new company its ID is unknown to me, so I cannot bulk insert the related user items as well. What is the right way to solve this?
Gather all the new company items first, bulk insert them, then read back all the information and IDs (otherwise, how would I know the new company items' IDs)?
Then bulk insert all the user items?
My second question: is there a common way to generate the DataTable structure for the bulk insert from an EF6 class? Can I write a generic bulk insert method using the DB First table classes?
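A minimal sketch of such a generic helper, assuming the destination table and column names match the entity's scalar property names (a DB First model usually pluralizes the table name, so the mapping may need adjusting); it builds the DataTable via reflection and loads it with SqlBulkCopy:

// Sketch only: builds a DataTable from the scalar properties of an entity type and bulk inserts it.
// Requires System.Data, System.Data.SqlClient, System.Linq and System.Reflection.
public static void BulkInsert<T>(IEnumerable<T> items, string connectionString, string destinationTable)
{
    // Only simple scalar properties; navigation properties and collections are skipped.
    PropertyInfo[] props = typeof(T).GetProperties()
        .Where(p => p.PropertyType.IsValueType || p.PropertyType == typeof(string))
        .ToArray();

    var table = new DataTable(destinationTable);
    foreach (PropertyInfo p in props)
    {
        table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
    }

    foreach (T item in items)
    {
        table.Rows.Add(props.Select(p => p.GetValue(item) ?? DBNull.Value).ToArray());
    }

    using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = destinationTable })
    {
        foreach (DataColumn column in table.Columns)
        {
            bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
        }
        bulk.WriteToServer(table);
    }
}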

Related

Fastest way to get the distinction/delta between a local table and the values of a database table

I have the following requirements:
I have many different database tables, for example a table of material numbers. There are many thousands of rows in the database table (>100,000).
A user can import data from an Excel file with thousands of rows.
The difference between the Excel file and the database table should be inserted, and the already existing material numbers should be updated.
Is there a faster way than this to get the already existing values in Entity Framework:
var allreadyExistingMaterialnumbersInDatabase = myDbContext.Materialnumbers.Where(x=>myExcelListMaterialnumbers.Contains(x.Materialnumber)).ToList();
After that step I have to update all the data returned by the code example, and then I have to insert the difference between the list allreadyExistingMaterialnumbersInDatabase and myExcelListMaterialnumbers.
Thanks for your help.
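A minimal sketch of the lookup-then-split step described above, assuming the entity type is Materialnumber with a string Materialnumber key (names taken from the question; the chunk size is arbitrary), so that each Contains() translates to a reasonably sized IN clause:

// Sketch only: query the existing keys in chunks, then split the Excel list into updates and inserts.
const int chunkSize = 1000; // arbitrary; tune for your database
var existing = new List<Materialnumber>();

foreach (var chunk in myExcelListMaterialnumbers
    .Select((value, index) => new { value, index })
    .GroupBy(x => x.index / chunkSize, x => x.value))
{
    var keys = chunk.ToList();
    existing.AddRange(myDbContext.Materialnumbers
        .Where(x => keys.Contains(x.Materialnumber))
        .ToList());
}

// Everything from Excel that is not in 'existing' has to be inserted; the rest gets updated.
var existingKeys = new HashSet<string>(existing.Select(x => x.Materialnumber));
var toInsert = myExcelListMaterialnumbers.Where(k => !existingKeys.Contains(k)).ToList();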

C# Fast Upload Related Tables

I am working on a dynamic loader. Based on database control tables that define the flat text files, I can read a single file with multiple record types and load it into database tables. The tables are related and use identity primary keys. Everything currently works but runs really slowly, as would be expected given that it is all done with single insert statements. I am working on optimizing the process and can't find an 'easy' or 'best practice' answer on the web.
My current project deals with 8 tables, but to simplify I will use a customers/orders example.
Let's look at the two customers below; the data would repeat for each set of customers and orders in the data file. Parent records always come before child records. The first field is the record type, and each record type has a different definition of the fields that follow. This is all specified in the control tables.
CUST|Joe Green|123 Main St
ORD|Pancakes|5
ORD|Nails|2
CUST|John Deere|456 Park Pl
ORD|Tires|4
Current code will:
Insert customer Joe Green and return an ID (using OUTPUT Inserted.Id in the insert statement; see the sketch below).
Insert the Pancakes and Nails orders, attaching the returned ID.
Insert customer John Deere and return an ID.
Insert the Tires order with the returned ID.
This runs painfully slow. If this could be optimized without my having to change much code, that would be ideal, but I can't think of how.
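For reference, a rough sketch of that row-by-row pattern (table and column names are placeholders, and connection is assumed to be an open SqlConnection):

// Sketch of the current pattern: one round-trip per record, returning the identity via OUTPUT INSERTED.Id.
using (var cmd = new SqlCommand(
    @"INSERT INTO Customers (Name, Address)
      OUTPUT INSERTED.Id
      VALUES (@name, @address)", connection))
{
    cmd.Parameters.AddWithValue("@name", "Joe Green");
    cmd.Parameters.AddWithValue("@address", "123 Main St");
    int customerId = (int)cmd.ExecuteScalar(); // used as the FK on the ORD rows that follow
}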
So the solution? I was thinking DataTables... Here is what I am thinking of so far:
Create a transaction.
Lock all tables that are part of the 'file definition', in this case Customers and Orders.
Get the max ID for each table and increment by one to get starting IDs for all tables.
Create a DataTable for each table.
Execute as currently set up, but instead of issuing insert statements, add rows to the DataTables.
After the data is read, bulk upload the tables in the correct order based on their relationships.
Unlock the tables.
End the transaction.
I was wondering, before I go down this path, whether anyone has worked out a better solution. I am also considering a custom script component in SSIS. I have seen posts and blogs about holding off on committing a transaction, but each parent record has only a few child records and the tree can get up to 4 levels deep (think order details and products). Because I need the parent record's ID, I have to commit the insert of parent records. I have also considered managing the IDs myself rather than using identity columns, but I do not want to add that extra management if I can avoid it.
UPDATE based on answer, for clarification / context.
A typical text file has:
- one file header record
- 5 facility records that relate to the file header
- 7,000 customers (accounts)
- 5-10 notes per customer
- 1-5 payments at the account level
- 1-5 adjustments at the account level
- 5-20 orders per customer
- 5-20 order details per order
- 1-5 payments at the order level
- 1-5 adjustments at the order level
- one file trailer record related to the file header
Keys
- File Header -> Facility -> Customer (Account)
- File Header -> FileTrailer
- Customer -> Notes
- Customer -> Payments
- Customer -> Adjustments
- Customer -> Orders
- Order -> OrderDetails
- Order -> Payments
- Order -> Adjustments
There are a few more tables involved but this should give an idea of the overall context.
Data sample (... = more fields, .... = more records):
HEADER|F1|F2|...
FACILITY|F1|F2|..
CUSTOMER|F1|F2|...
NOTE|F1|F2|....
....
ORDER|F1|F2|...
ORDERDETAIL|F1|F2|...
.... ORDER DETAILS
ORDERPYMT|F1|F2|...
....
ORDERADJ|F1|F2|...
....
CUSTOMERPYMT|F1|F2|...
....
CUSTOMERADJ|F1|F2|...
....
(The structure repeats for each facility)
TRAILER|F1|F2|...
Inserting related tables with low data volumes should normally not be a problem. If they are slow, we will need more context to answer your question.
If you are encountering problems because you have many records to insert, you will probably have to look at SqlBulkCopy.
If you prefer not to manage your IDs yourself, the cleanest way I know of is to work with temporary placeholder ID columns (sketched below):
Create and fill DataTables with your data and a tempId column you fill yourself, leaving the foreign keys blank.
SqlBulkCopy the primary table.
Update the secondary DataTable with the generated foreign keys by looking up the primary keys of the previously inserted rows through your tempId column.
Upload the secondary table.
Repeat until done.
Remove the temporary ID columns (optional).
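A rough sketch of that sequence for a Customers/Orders pair, assuming each destination table has an identity primary key plus a nullable TempId column added for the load (all table, column and connection names are placeholders):

// Sketch only: load the parents, read back their identities via TempId,
// patch the children's foreign keys, then load the children.
var customers = new DataTable("Customers");
customers.Columns.Add("TempId", typeof(int));   // placeholder key we control
customers.Columns.Add("Name", typeof(string));
customers.Columns.Add("Address", typeof(string));

var orders = new DataTable("Orders");
orders.Columns.Add("CustomerTempId", typeof(int)); // refers to TempId above
orders.Columns.Add("CustomerId", typeof(int));     // real FK, filled in later
orders.Columns.Add("Product", typeof(string));
orders.Columns.Add("Quantity", typeof(int));

// ... fill the DataTables while parsing the file ...

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // 1. Bulk copy the parents, including the TempId column.
    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "Customers" })
    {
        bulk.ColumnMappings.Add("TempId", "TempId");
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.ColumnMappings.Add("Address", "Address");
        bulk.WriteToServer(customers);
    }

    // 2. Read back TempId -> identity so the children can be patched.
    var idByTempId = new Dictionary<int, int>();
    using (var cmd = new SqlCommand("SELECT TempId, Id FROM Customers WHERE TempId IS NOT NULL", connection))
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            idByTempId[reader.GetInt32(0)] = reader.GetInt32(1);
        }
    }

    // 3. Fill in the real foreign keys on the child rows.
    foreach (DataRow order in orders.Rows)
    {
        order["CustomerId"] = idByTempId[(int)order["CustomerTempId"]];
    }

    // 4. Bulk copy the children; the CustomerTempId helper column is simply not mapped.
    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "Orders" })
    {
        bulk.ColumnMappings.Add("CustomerId", "CustomerId");
        bulk.ColumnMappings.Add("Product", "Product");
        bulk.ColumnMappings.Add("Quantity", "Quantity");
        bulk.WriteToServer(orders);
    }
}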

How to use Autocreated DataSet - TableAdapterManager

(Sorry for my bad English)
I have imported an Access database into a C# WinForms project (.NET 4.0) in Visual Studio 2013. It automatically created a .cs file with a DataSet, TableAdapters and a TableAdapterManager.
I import data from the database into the DataSet without errors. I can manipulate the data and save the changes back to the database with TableAdapterManager.UpdateAll().
But now I am trying to insert new data, with a relation between tables.
For example, a database like mine:
Parent table:
autonum key
string parentname
Child table:
autonum key
string childname
int parentKey
First try:
I create a new record with parentTable.AddparenttableRow(data ...) and get a parentRow.
I create a new record with childTable.AddchildtableRow(parentRow, data ...).
But if I call TableAdapterManager.UpdateAll(), I get an error: "can't add or modify a record because a related record is required in parentTable" (not the real message, it's a translation). I thought AddchildtableRow created the correct relation. And another problem appears: because of the error, the database isn't modified (which is good), but the records I added are still in the DataSet's tables.
So I tried another method: TableAdapterManager.tablenameTableAdapter.Insert().
First I insert a parentRow without any problem. But when I want to insert a childRow, the Insert function asks for the parent key, and I don't have it (the parent Insert call doesn't return the key).
My question is: how can I use the DataSet, TableAdapter and TableAdapterManager to insert records into the DataSet AND the database, inside a transaction (if there is an error, the data should neither be written to the database nor added to the DataSet)? And, more generally, how do I use these classes correctly?
Look at the generated typed dataset code. Switch from the default TableAdapterManager.UpdateOrderOption.InsertUpdateDelete to UpdateInsertDelete (msdn). For hierarchical updates you have to merge the new values for your identity columns back in (msdn). Also see this post. Because the ADO.NET DataSet is disconnected, it prevents collisions by assigning negative placeholder values to IDENTITY columns; it cannot know a positive value that is guaranteed not to collide. See also "Managing an @@IDENTITY Crisis" for parent-child relations. The typed dataset technology also had issues with circular table references.
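A rough sketch of the hierarchical insert with the generated classes; MyDataSet, ParentTableAdapter, AddParentRow and friends are placeholders for whatever the dataset designer generated for your schema, and which UpdateOrder value you need depends on your model:

// Sketch: the designer-generated names below are placeholders.
var ds = new MyDataSet();

var manager = new MyDataSetTableAdapters.TableAdapterManager
{
    ParentTableAdapter = new MyDataSetTableAdapters.ParentTableAdapter(),
    ChildTableAdapter = new MyDataSetTableAdapters.ChildTableAdapter(),
    // Switch the save order as suggested above if the default order causes FK errors.
    UpdateOrder = MyDataSetTableAdapters.TableAdapterManager.UpdateOrderOption.UpdateInsertDelete
};

// New rows get negative placeholder values in their AutoIncrement key columns;
// passing the parent row to AddChildRow wires up the relation for you.
MyDataSet.ParentRow parent = ds.Parent.AddParentRow("parent name");
ds.Child.AddChildRow(parent, "child name");

try
{
    // UpdateAll saves all tables inside a single transaction; with the relation set to cascade
    // updates, the database-generated parent key replaces the placeholder on the child rows.
    manager.UpdateAll(ds);
}
catch
{
    // Nothing was written to the database; also drop the pending rows
    // so they do not linger in the DataSet after the failure.
    ds.RejectChanges();
    throw;
}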

Recommend usage of temp table or table variable in Entity Framework 4. Update Performance Entity framework

I need to update a bit field in a table and set this field to true for a specific list of Ids in that table.
The Ids are passed in from an external process.
I guess in pure SQL the most efficient way would be to create a temp table, populate it with the IDs, then join the main table with it and set the bit field accordingly.
I could create a SPROC that takes the IDs, but there could be 200,000 - 300,000 rows that need this flag set, so it's probably not the most efficient way. Using an IN statement has limitations regarding the amount of data that can be passed, and performance.
How can I achieve the above using the Entity Framework?
I guess it's possible to create a SPROC that creates a temp table, but this would not exist from the model's perspective.
Is there a way to dynamically add entities at run time? [Or is this approach just going to cause headaches?]
I'm making the assumption here that populating a temp table with 300,000 rows and doing a join would be quicker than calling a SPROC 300,000 times :)
[The Ids are Guids]
Is there another approach that I should consider?
For data volumes like 300k rows, I would forget EF. I would do this by having a table such as:
BatchId RowId
Where RowId is the PK of the row we want to update, and BatchId just refers to this "run" of 300k rows (to allow multiple runs at once, etc.).
I would generate a new BatchId (this could be anything unique - a Guid leaps to mind) and use SqlBulkCopy to insert the records into this table, i.e.
100034 17
100034 22
...
100034 134556
I would then use a single sproc to do the join and update (and delete the batch from the table).
SqlBulkCopy is the fastest way of getting this volume of data to the server; you won't drown in round-trips. EF is object-oriented: nice for lots of scenarios, but not this one.
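A minimal sketch of that flow, assuming a BatchRows(BatchId, RowId) staging table, a MainTable with a Flag bit column, and an idsFromExternalProcess collection (all names are placeholders; the UPDATE could just as well live in the sproc mentioned above):

// Sketch only: bulk copy the (BatchId, RowId) pairs, then run one set-based UPDATE.
Guid batchId = Guid.NewGuid();

var rows = new DataTable();
rows.Columns.Add("BatchId", typeof(Guid));
rows.Columns.Add("RowId", typeof(Guid));
foreach (Guid id in idsFromExternalProcess)
{
    rows.Rows.Add(batchId, id);
}

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "BatchRows" })
    {
        bulk.ColumnMappings.Add("BatchId", "BatchId");
        bulk.ColumnMappings.Add("RowId", "RowId");
        bulk.WriteToServer(rows);
    }

    const string sql = @"
        UPDATE m SET m.Flag = 1
        FROM MainTable m
        JOIN BatchRows b ON b.RowId = m.Id
        WHERE b.BatchId = @batchId;
        DELETE FROM BatchRows WHERE BatchId = @batchId;";

    using (var cmd = new SqlCommand(sql, connection))
    {
        cmd.Parameters.AddWithValue("@batchId", batchId);
        cmd.ExecuteNonQuery();
    }
}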
I'm accepting Marc's response as the answer, but I'd just like to give a little detail on how we implemented the requirement.
Marc's response helped greatly in the formulation of our solution.
We had an aim/guideline to stay within the Entity Framework and not use SPROCs, and although our solution may not suit others, it has worked for us.
We created an Item table in the database with BatchId [uniqueidentifier] and ItemId [varchar] columns.
This table was added to the EF model, so we did not use temporary tables.
On upload, this table is populated with the IDs (we find the inserts are quick enough using EF).
We then use context.ExecuteStoreCommand to run the SQL that joins the Item table with the main table and updates the bit field in the main table for the records that exist for the batch ID created specifically for that session, roughly as sketched below.
We finally clear this table for that batch ID.
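A sketch of that update step (ObjectContext API, since this is EF4; table and column names are placeholders):

// Sketch: set-based update and cleanup through the EF4 ObjectContext, no SPROC involved.
context.ExecuteStoreCommand(
    @"UPDATE m SET m.Flag = 1
      FROM MainTable m
      JOIN Item i ON i.ItemId = m.Id
      WHERE i.BatchId = {0};

      DELETE FROM Item WHERE BatchId = {0};",
    batchId);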
We got the performance we needed while keeping to our no-SPROC goal (which not all of us agree with :) but it's a democracy).
Our exact requirements are a little more complex, but as far as getting good update performance out of the Entity Framework given our specific restrictions, it works fine.
Liam

Effective one-to-many inserts in C# and MsSql

I have two tables that look like this:
News: (ID, Title, TagID)
Tags: (ID, Tag)
Each news item can only have one tag. What is the most effective way to handle inserts into the News table? The Tags table has around 50,000 rows.
I'm only doing bulk inserts of approx. 300 news items at a time, around 2 times per hour. I assume that I need some in-memory cache for the tags?
If a tag is not in the Tags table, I need to insert it and set TagID to the newly inserted ID.
Hope you'll get the idea!
What version of SQL Server are you using in the background?
If you're using SQL Server 2008, I would recommend bulk-loading the tags and news for each day into temporary working tables and then using the MERGE statement to update the actual Tags and News tables from those working tables. I'd use the C# SqlBulkCopy class for that.
MERGE allows you to easily insert only those items that have changed, and possibly update those that already exist, all in one single, handy SQL statement.
If you're on SQL Server 2005 or below, you can do basically the same, but you'll have to write some code (C# or T-SQL) to manually check what needs to be inserted from your temp bulk-load tables and what is already present.
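A rough sketch of the 2008 approach for the Tags side, with a placeholder #TagsStaging table and assuming Tag has a unique index; the News side would follow the same staging-plus-MERGE pattern:

// Sketch only: bulk load the distinct tag names into a staging table, then MERGE into Tags.
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    using (var create = new SqlCommand("CREATE TABLE #TagsStaging (Tag nvarchar(100) NOT NULL)", connection))
    {
        create.ExecuteNonQuery();
    }

    var staging = new DataTable();
    staging.Columns.Add("Tag", typeof(string));
    foreach (string tag in incomingTags.Distinct())
    {
        staging.Rows.Add(tag);
    }

    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "#TagsStaging" })
    {
        bulk.WriteToServer(staging);
    }

    // Insert only the tags that do not exist yet; existing ones are left alone.
    const string merge = @"
        MERGE Tags AS target
        USING #TagsStaging AS source ON target.Tag = source.Tag
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Tag) VALUES (source.Tag);";

    using (var cmd = new SqlCommand(merge, connection))
    {
        cmd.ExecuteNonQuery();
    }
}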
Marc
I presume that with each news item you'll get a list of strings that are the supposed "tags". From the structure you've given, you can only have one tag on each news item? That seems unusual, but the below applies anyway.
If your Tags table has an index on the Tag column, the searches will be really fast, and the database will take care of the caching anyway, so don't worry about caching. You'll be amazed how much the database can speed things up when you have indexes in the right places.
For each tag, do a SELECT from Tags WHERE Tag = @tag; if no row is returned, insert the tag, otherwise use the ID you've found. Wrap this in a proc and run it for each news INSERT.
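A minimal sketch of that lookup-or-insert step done in code rather than a proc (column and parameter names are placeholders; with concurrent writers you would want a transaction, or the MERGE approach above):

// Sketch: resolve a tag name to its ID, inserting the tag first if it is new.
static int GetOrCreateTagId(SqlConnection connection, string tag)
{
    using (var select = new SqlCommand("SELECT ID FROM Tags WHERE Tag = @tag", connection))
    {
        select.Parameters.AddWithValue("@tag", tag);
        object existing = select.ExecuteScalar();
        if (existing != null)
        {
            return (int)existing;
        }
    }

    using (var insert = new SqlCommand(
        "INSERT INTO Tags (Tag) OUTPUT INSERTED.ID VALUES (@tag)", connection))
    {
        insert.Parameters.AddWithValue("@tag", tag);
        return (int)insert.ExecuteScalar();
    }
}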
