I have two tables with a one-to-many relationship: ORDER and ORDER_DETAILS. I am using C# to call a stored procedure that does some processing and eventually returns the orders with their corresponding details.
So the question is: which of the two approaches below is more efficient?
Select the orders joined with the order details, then shape the data in C#.
Return two result sets, one for the orders and one for the order details, then build up the data in C#.
My guess is that since the join will repeat the same ORDER columns for each order detail, the second option is best.
What are your views on the above?
@Steve asked the right question; you need to clarify that.
But in most cases I would go for the first option: join the two tables at the database end and select only the columns the front end needs.
That way you transfer less data, and in a normal scenario it should be faster than fetching data from both tables and assembling it in the front end. But without knowing your exact context, it might not be the best possible solution.
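If you do end up returning two result sets, stitching them together client-side is straightforward with SqlDataReader.NextResult(). Here is a minimal sketch, assuming a hypothetical procedure dbo.GetOrdersWithDetails that SELECTs the orders first and the details second; all names and column positions are invented.

using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

class Order
{
    public int Id;
    public string Customer;
    public List<(string Item, int Qty)> Details = new();
}

static Dictionary<int, Order> LoadOrders(string connStr)
{
    var orders = new Dictionary<int, Order>();
    using var conn = new SqlConnection(connStr);
    using var cmd = new SqlCommand("dbo.GetOrdersWithDetails", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    conn.Open();
    using var reader = cmd.ExecuteReader();

    // Result set 1: one row per order.
    while (reader.Read())
        orders[reader.GetInt32(0)] = new Order
        {
            Id = reader.GetInt32(0),
            Customer = reader.GetString(1)
        };

    // Result set 2: one row per detail, keyed by OrderId in column 0.
    reader.NextResult();
    while (reader.Read())
        orders[reader.GetInt32(0)].Details.Add((reader.GetString(1), reader.GetInt32(2)));

    return orders;
}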
I am working on a dynamic loader. Based on a database table that defines the flat text files, I can read a single file with multiple record types and load it into database tables. The tables are related and use identity primary keys. Everything currently works but runs really slowly, as would be expected given that it is all accomplished with single insert statements. I am working on optimizing the process and can't find an 'easy' or 'best practice' answer on the web.
My current project deals with 8 tables, but to simplify I will use a customers/orders example.
Let's look at the two customers below; the data would repeat for each set of customers and orders in the data file. Parent records always come before child records. The first field is the record type, and each record type has a different definition of the fields that follow. This is all specified in the control tables.
CUST|Joe Green|123 Main St
ORD|Pancakes|5
ORD|Nails|2
CUST|John Deere|456 Park Pl
ORD|Tires|4
The current code will:
Insert customer Joe Green and return an ID (using OUTPUT Inserted.Id in the insert statement).
Insert orders Pancakes and Nails, attaching the returned ID.
Insert customer John Deere and return an ID.
Insert order Tires with the returned ID.
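For context, the per-row pattern described above looks roughly like this (a hedged sketch; the connection string, table, and column names are all made up):

using Microsoft.Data.SqlClient;

// connectionString is assumed to be defined elsewhere.
using var conn = new SqlConnection(connectionString);
conn.Open();

// Parent insert: OUTPUT Inserted.Id returns the generated identity value.
using var custCmd = new SqlCommand(
    @"INSERT INTO dbo.Customers (Name, Address)
      OUTPUT Inserted.Id
      VALUES (@name, @address)", conn);
custCmd.Parameters.AddWithValue("@name", "Joe Green");
custCmd.Parameters.AddWithValue("@address", "123 Main St");
int customerId = (int)custCmd.ExecuteScalar();

// Child insert: one round-trip per child row -- this is the bottleneck.
using var ordCmd = new SqlCommand(
    "INSERT INTO dbo.Orders (CustomerId, Item, Qty) VALUES (@cid, @item, @qty)", conn);
ordCmd.Parameters.AddWithValue("@cid", customerId);
ordCmd.Parameters.AddWithValue("@item", "Pancakes");
ordCmd.Parameters.AddWithValue("@qty", 5);
ordCmd.ExecuteNonQuery();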
This runs painfully slowly. Ideally I could optimize this without having to change much code, but I can't think of how.
So the solution? I was thinking DataTables... Here is what I am thinking of so far:
Create a transaction
Lock all tables that are part of the 'file definition' (in this case Customers and Orders)
Get the max ID for each table and increment by one to get starting IDs for all tables
Create a DataTable for each table
Execute as currently set up, but instead of issuing insert statements, add rows to the DataTables
After the data is read, bulk upload the tables in the correct order based on their relationships
Unlock the tables
End the transaction
I was wondering, before I go down this path, whether anyone has worked out a better solution. I am also considering a custom script component in SSIS. I have seen posts and blogs about holding off on committing a transaction, but each parent record has only a few child records, and the tree can get up to four levels deep (think order details and products). Because I need the parent record's ID, I need to commit the insert of parent records. I have also considered managing the IDs myself rather than using identity columns, but I do not want to add that extra management if I can avoid it.
UPDATE, based on the answer, for clarification/context:
A typical text file has:
- one file header record
- 5 facility records that relate to the file header
- 7,000 customers (accounts)
- 5-10 notes per customer
- 1-5 payments at the account level
- 1-5 adjustments at the account level
- 5-20 orders per customer
- 5-20 order details per order
- 1-5 payments at the order level
- 1-5 adjustments at the order level
- one file trailer record related to the file header
Keys
- File Header -> Facility -> Customer (Account)
- File Header -> FileTrailer
- Customer -> Notes
- Customer -> Payments
- Customer -> Adjustments
- Customer -> Orders
- Order -> OrderDetails
- Order -> Payments
- Order -> Adjustments
There are a few more tables involved but this should give an idea of the overall context.
Data sample (... = more fields, .... = more records):
HEADER|F1|F2|...
FACILITY|F1|F2|..
CUSTOMER|F1|F2|...
NOTE|F1|F2|....
....
ORDER|F1|F2|...
ORDERDETAIL|F1|F2|...
.... ORDER DETAILS
ORDERPYMT|F1|F2|...
....
ORDERADJ|F1|F2|...
....
CUSTOMERPYMT|F1|F2|...
....
CUSTOMERADJ|F1|F2|...
....
(The structure repeats for each facility)
TRAILER|F1|F2|...
Inserting into related tables with low data volumes should normally not be a problem. If your inserts are slow, we will need more context to answer your question.
If you are encountering problems because you have many records to insert, you will probably have to look at SqlBulkCopy.
If you prefer not to manage your IDs yourself, the cleanest way I know of is to work with temporary placeholder ID columns:
Create and fill DataTables with your data, a tempId column you fill yourself, and the foreign keys left blank
SqlBulkCopy the primary table
Update the secondary DataTable with the generated foreign keys, by looking up the primary keys of the previously inserted rows through your tempId column
Upload the secondary table
Repeat until done
Remove the temporary ID columns (optional)
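Here is a minimal sketch of those steps for the Customers/Orders example, assuming SQL Server with Microsoft.Data.SqlClient; the table shapes (dbo.Customers with an identity Id plus a nullable TempId column, dbo.Orders with a CustomerId FK, and a CustomerTempId column in the orders DataTable) are assumptions for illustration, not your actual schema.

using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

static void BulkLoad(string connStr, DataTable customers, DataTable orders)
{
    using var conn = new SqlConnection(connStr);
    conn.Open();

    // 1. Bulk copy the parent rows, including the TempId values assigned while parsing.
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Customers" })
    {
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.ColumnMappings.Add("Address", "Address");
        bulk.ColumnMappings.Add("TempId", "TempId");
        bulk.WriteToServer(customers);
    }

    // 2. Read back the server-generated identity values, keyed by TempId.
    var idMap = new Dictionary<int, int>();
    using (var cmd = new SqlCommand(
        "SELECT TempId, Id FROM dbo.Customers WHERE TempId IS NOT NULL", conn))
    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            idMap[reader.GetInt32(0)] = reader.GetInt32(1);

    // 3. Patch the child rows' foreign keys (orders carries a CustomerTempId
    //    column filled during parsing), then bulk copy the children.
    foreach (DataRow row in orders.Rows)
        row["CustomerId"] = idMap[(int)row["CustomerTempId"]];

    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Orders" })
    {
        bulk.ColumnMappings.Add("CustomerId", "CustomerId");
        bulk.ColumnMappings.Add("Item", "Item");
        bulk.ColumnMappings.Add("Qty", "Qty");
        bulk.WriteToServer(orders);
    }

    // 4. Optional: clear the placeholder column for the next run.
    using var clear = new SqlCommand("UPDATE dbo.Customers SET TempId = NULL", conn);
    clear.ExecuteNonQuery();
}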
Is there a way to join SQL tables dynamically?
For example, a function that looks at the tables provided and generates join statements between primary and foreign keys.
I have items in a WinForms ListView as:
TABLE1.NAME
TABLE2.AGE
TABLE3.ADDRESS
In the database, I have 3 tables:
TABLE1
------
- TBL1_ID
- NAME
TABLE2
------
- TBL_ID2
- TBL1_ID_FK
- AGE
TABLE3
------
- TBL_ID3
- TBL2_ID_FK
- ADDRESS
The output that I am trying to achieve should look like this:
SELECT TABLE1.NAME, TABLE2.AGE, TABLE3.ADDRESS
FROM TABLE1
JOIN TABLE2 ON TBL1_ID_FK = TBL1_ID
JOIN TABLE3 ON TBL2_ID_FK = TBL_ID2
You probably don't want to do this... but if you REALLY think you need to, here is an idea of an approach:
Read and understand the T-SQL SELECT syntax:
https://msdn.microsoft.com/en-US/en-en/library/ms189499.aspx
Break down your problem into the fragments of that syntax: you need to create a select_list, a table_source, and most likely a WHERE clause.
Create a data structure that represents your schema (how your tables are to be joined, and which columns are in which table).
For the column selection, iterate through all selected columns and find the tables they reside in. Store the table selection in a temporary list. Keep in mind that a column selection also needs to hold information about its association to the other selected columns: for example, a Person can be the buyer of some goods or the seller, and in both cases the records are stored in the same table. The columns alone don't hold enough information about which way to join the Person table (on the buyer side or the seller side?).
For the table selection, pick one of the tables as the starting point and make it the first table of your FROM clause. Iterate through all selected tables and find ALL paths along your schema definition for connecting/joining them, keeping those that are valid for the associations of your column selection. Walk along the paths and add each table with its associated join condition to the FROM clause, giving each a distinct name. Optionally, then reduce the joins based on the tables and associations (you need only one pair for each).
Update the column selection to reference the names given to the joined tables, using the association information of each table to match the right columns.
In the end, actually generate text for all the fragments and put them together in the right order...
As you can see, a GENERAL solution is hard work. If you can water down your expectations, especially on the association side, you can greatly reduce the complexity, but then it will not work for every situation.
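To make the watered-down case concrete, here is a small sketch of the path idea under heavy simplifying assumptions: exactly one unambiguous FK edge between any two tables (no buyer/seller-style ambiguity), and every selected table directly adjacent to one already joined. The FkEdge type and all names are invented for illustration.

using System;
using System.Collections.Generic;
using System.Linq;

// One FK relationship: FkTable.FkColumn references PkTable.PkColumn.
public record FkEdge(string FkTable, string FkColumn, string PkTable, string PkColumn);

public static class JoinBuilder
{
    // Builds the FROM clause for the selected tables, assuming each table
    // is directly adjacent (via one FK edge) to a table already joined.
    public static string BuildFrom(IReadOnlyList<FkEdge> schema, IReadOnlyList<string> tables)
    {
        var joined = new HashSet<string> { tables[0] };
        var sql = "FROM " + tables[0];
        foreach (var target in tables.Skip(1))
        {
            var edge = schema.FirstOrDefault(e =>
                    (joined.Contains(e.PkTable) && e.FkTable == target) ||
                    (joined.Contains(e.FkTable) && e.PkTable == target))
                ?? throw new InvalidOperationException("No join path to " + target);
            sql += $" JOIN {target} ON {edge.FkTable}.{edge.FkColumn} = {edge.PkTable}.{edge.PkColumn}";
            joined.Add(target);
        }
        return sql;
    }
}

For the question's three tables, seeding the schema with the edges ("TABLE2", "TBL1_ID_FK", "TABLE1", "TBL1_ID") and ("TABLE3", "TBL2_ID_FK", "TABLE2", "TBL_ID2") and calling JoinBuilder.BuildFrom(schema, new[] { "TABLE1", "TABLE2", "TABLE3" }) yields the FROM/JOIN portion of the query shown above.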
Or use an ORM, which will save you literally thousands of hours...
With EF, for example, you can use a user-defined expression tree to project the result you want, and EF takes care of the statement for you.
Just to name a few:
LINQ to SQL: https://msdn.microsoft.com/en-us/library/bb425822.aspx
Entity Framework: http://www.entityframeworktutorial.net/what-is-entityframework.aspx
I'm using LINQ to SQL and I'm happy with it, so I would recommend it if you want to code instead of "designing" things.
Although I consider this bad practice, sometimes you have to create the SQL statement dynamically. In that case you build the command as a string and then run it using SQL Server's EXEC command.
Here is an example (strictly SQL statements, for SQL Server):
DECLARE @cmd varchar(max) = 'SELECT * FROM Name'
-- apply your logic to modify the select statement and create the final query
EXEC (@cmd)
I'm trying to join two tables from different servers, but it keeps throwing this exception:
This method supports LINQ to Entities infrastructure and is not intended to be used directly from your code.
Here is my query:
var myList = (from myTableA in _dbProvider.ContextOne.TableA
              join myTableB in _dbProvider.ContextOne.TableB on myTableA.ProductId equals myTableB.Oid
              join myTableC in _dbProvider.ContextTwo.TableC on myTableB.Id equals myTableC.ProductId
              where
              select myTableC.Name).Distinct().ToList();
What does that mean?
I did find another solution: getting the data separately from each table into lists and then joining them, but it's very greedy in terms of time.
Is there any other solution?
You can't join two tables from two different servers. Definitely not from EF.
Your best bet is to fetch the data in two separate lists and then join them together using LINQ to Objects.
Let me make up an example: you have 1,000,000 invoices in one table, each with about 10 items, for a total of 10,000,000 invoice details on another server. You need the invoices and their details for the first 10 invoices created on 2015-05-04.
You send a query to the first DB getting only those 10 invoices, extract their IDs, and use them to query the roughly 100 matching rows from the other server. This is only about two times slower than making a single join query.
In some cases this becomes impossible (you have conditions on both tables) and you need to bring back more rows, but in simple scenarios it works.
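In code, the two-step version of that example might look like the sketch below; the context and entity names are hypothetical, and the one EF-specific point is that ids.Contains(...) is translated to an IN clause on the second server.

// Step 1: first server -- fetch only the 10 invoices we care about.
// (Ordering by Id as an approximation of creation order.)
var date = new DateTime(2015, 5, 4);
var invoices = contextOne.Invoices
    .Where(i => i.CreatedOn == date)
    .OrderBy(i => i.Id)
    .Take(10)
    .ToList();

var ids = invoices.Select(i => i.Id).ToList();

// Step 2: second server -- fetch just the matching detail rows (~100).
var details = contextTwo.InvoiceDetails
    .Where(d => ids.Contains(d.InvoiceId))
    .ToList();

// Step 3: join the two in-memory lists with LINQ to Objects.
var result = from i in invoices
             join d in details on i.Id equals d.InvoiceId
             select new { i.Id, d.ProductName, d.Quantity };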
I need to update a bit field in a table, setting it to true for a specific list of IDs in that table.
The IDs are passed in from an external process.
I guess in pure SQL the most efficient way would be to create a temp table, populate it with the IDs, then join the main table to it and set the bit field accordingly.
I could create a SPROC that takes the IDs, but there could be 200-300,000 rows involved that need this flag set, so that's probably not the most efficient way. Using an IN statement has limitations on the amount of data that can be passed, and performance problems.
How can I achieve the above using the Entity Framework?
I guess it's possible to create a SPROC that creates a temp table, but the temp table would not exist from the model's perspective.
Is there a way to dynamically add entities at runtime? (Or is this approach just going to cause headaches?)
I'm making the assumption, though, that populating a temp table with 300,000 rows and doing a join would be quicker than calling a SPROC 300,000 times :)
(The IDs are GUIDs.)
Is there another approach I should consider?
For data volumes like 300k rows, I would forget EF. I would do this by having a table such as:
BatchId  RowId
where RowId is the PK of the row we want to update, and BatchId just refers to this "run" of 300k rows (to allow multiple runs at once, etc.).
I would generate a new BatchId (this could be anything unique; a GUID leaps to mind) and use SqlBulkCopy to insert the records into this table, i.e.
100034  17
100034  22
...
100034  134556
I would then use a single sproc to do the join and update (and delete the batch from the table).
SqlBulkCopy is the fastest way of getting this volume of data to the server; you won't drown in round-trips. EF is object-oriented: nice for lots of scenarios, but not this one.
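A rough sketch of that flow, assuming a staging table dbo.UpdateBatch(BatchId uniqueidentifier, RowId uniqueidentifier) and a sproc dbo.ApplyBatch that does the join/update/delete; these names, and the flag column in the comment, are hypothetical:

using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

static void FlagRows(string connStr, IEnumerable<Guid> rowIds)
{
    var batchId = Guid.NewGuid();

    // Build the batch in memory: one (BatchId, RowId) pair per row to flag.
    var table = new DataTable();
    table.Columns.Add("BatchId", typeof(Guid));
    table.Columns.Add("RowId", typeof(Guid));
    foreach (var id in rowIds)
        table.Rows.Add(batchId, id);

    using var conn = new SqlConnection(connStr);
    conn.Open();

    // One bulk round-trip instead of 300k individual statements.
    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.UpdateBatch" })
        bulk.WriteToServer(table);

    // The sproc would do something like:
    //   UPDATE m SET m.Flag = 1
    //   FROM dbo.MainTable m JOIN dbo.UpdateBatch b ON b.RowId = m.Id
    //   WHERE b.BatchId = @BatchId;
    //   DELETE FROM dbo.UpdateBatch WHERE BatchId = @BatchId;
    using var cmd = new SqlCommand("dbo.ApplyBatch", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    cmd.Parameters.AddWithValue("@BatchId", batchId);
    cmd.ExecuteNonQuery();
}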
I'm marking Marc's response as the answer, but I'd just like to give a little detail on how we implemented the requirement.
Marc's response helped greatly in the formulation of our solution.
We had an aim/guideline to keep within the Entity Framework and not use SPROCs, and although our solution may not suit others, it has worked for us.
We created an Item table in the database with BatchId (uniqueidentifier) and ItemId (varchar) columns.
This table was added to the EF model, so we did not use temporary tables.
On upload, this table is populated with the incoming IDs (we find inserts are quick enough using EF).
We then use context.ExecuteStoreCommand to run SQL that joins the Item table to the main table and updates the bit field in the main table for records that exist for the batch ID created specifically for that session (a sketch follows).
We finally clear this table for that batch ID.
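The core of it looked something like the sketch below (simplified; the entity and column names are invented, and context is an EF 4 ObjectContext, which is where ExecuteStoreCommand lives):

// Stage the incoming IDs through the EF-mapped Item table.
var batchId = Guid.NewGuid();
foreach (var id in incomingIds)   // IEnumerable<string>
    context.Items.AddObject(new Item { BatchId = batchId, ItemId = id });
context.SaveChanges();

// One set-based UPDATE via the join, then clear the batch.
context.ExecuteStoreCommand(
    @"UPDATE m SET m.IsFlagged = 1
      FROM dbo.MainTable m
      JOIN dbo.Item i ON i.ItemId = m.Id
      WHERE i.BatchId = {0};
      DELETE FROM dbo.Item WHERE BatchId = {0};", batchId);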
We get the performance while keeping within our no-SPROC goal. (Which not all of us agree with :) but it's a democracy.)
Our exact requirements are a little more complex, but insofar as we needed good update performance from the Entity Framework given our specific restrictions, it works fine.
Liam
Given my lack of SQL Server experience, and considering that this task is a common one for line-of-business applications, I'd like to ask: is there a standard, common way of doing the following database operation?
Assume we have two tables connected by a one-to-many relationship, for example SalesOrderHeader and SalesOrderLines:
http://s43.radikal.ru/i100/1002/1d/c664780e92d5.jpg
The field SalesHeaderNo is the PK in the SalesOrderHeader table and an FK in the SalesOrderLines table.
In a front-end app, a user selects some number of records in the SalesOrderHeader table, for example using a date range, or an IsSelected field set by clicking checkbox fields in a GridView. The user then performs some operation (let's say just "move to another table") on the selected range of sales orders.
My question is:
How, in this case, can I reach the child records in the SalesOrderLines table to perform the same operation (in our case, "move to another table") over those child records in as easy, correct, fast, and elegant a way as possible?
If you're okay with a T-SQL-based solution (as opposed to C#/LINQ), you could do something like this:
-- define a table to hold the primary keys of the selected master rows
DECLARE #MasterIDs TABLE (HeaderNo INT)
-- fill that table somehow, e.g. by passing in values from a C# app (one option is sketched below)
INSERT INTO dbo.NewTable(LineCodeNo, Item, Quantity, Price)
SELECT SalesLineCodeNo, Item, Quantity, Price
FROM dbo.SalesOrderLine sol
INNER JOIN #MasterIDs m ON m.HeaderNo = sol.SalesHeaderNo
With this, you can insert a whole set of rows from your child table into a new table based on selection criteria.
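One possible way to do the "fill that table somehow" step from C# is a table-valued parameter. This is a hedged sketch assuming a user-defined table type dbo.IdList (CREATE TYPE dbo.IdList AS TABLE (HeaderNo INT)) and a proc dbo.MoveSelectedOrders that declares @MasterIDs dbo.IdList READONLY; none of these exist in the question.

using System.Data;
using Microsoft.Data.SqlClient;

// Build one row per selected SalesOrderHeader key.
var table = new DataTable();
table.Columns.Add("HeaderNo", typeof(int));
foreach (var id in selectedHeaderIds)
    table.Rows.Add(id);

using var cmd = new SqlCommand("dbo.MoveSelectedOrders", conn)
{
    CommandType = CommandType.StoredProcedure
};
var p = cmd.Parameters.AddWithValue("@MasterIDs", table);
p.SqlDbType = SqlDbType.Structured;   // marks this as a table-valued parameter
p.TypeName = "dbo.IdList";
cmd.ExecuteNonQuery();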
Your question is still a bit vague to me, in that I'm not exactly sure what "move to another table" entails. Does it mean there is another table with the exact schema of both your sample tables?
However, here's a stab at a solution. When a user commits on a SalesOrderHeader record, some operation will be performed that looks like:
Update SalesOrderHeader
Set....
Where SalesOrderHeaderNo = #SalesOrderHeaderNo
Or
Insert SomeOtherTable
Select ...
From SalesOrderHeader
Where SalesOrderHeaderNo = #SalesOrderHeaderNo
In that same operation, is there a reason you can't also do something to the line items, such as:
Insert SomeOtherTableItems
Select ...
From SalesOrderLineItems
Where SalesOrderHeaderNo = #SalesOrderHeaderNo
I don't know about "Best Practices", but this is what I use:
var header = db.SalesOrderHeaders.SingleOrDefault(h => h.SaleHeaderNo == 14);
IEnumerable<SalesOrderLine> list = header.SalesOrderLines.AsEnumerable();
// now your list contains the "many" records for the header
foreach (SalesOrderLine line in list)
{
// some code
}
I tried to model it after your table design, but the names may be a little different.
Now whether this is the "best practices" way, I am not sure.
EDITED: Noticed that you want to update them all, possibly move to another table. Since LINQ-To-SQL can't do bulk inserts/updates, you would probably want to use T-SQL for that.
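If you do drop to T-SQL for the set-based part, LINQ to SQL's DataContext.ExecuteCommand can run it without leaving your code; the archive table and columns below are hypothetical:

// Copy all child rows for one header in a single set-based statement,
// using the same db DataContext as above.
db.ExecuteCommand(
    @"INSERT INTO dbo.SalesOrderLinesArchive (SalesHeaderNo, Item, Quantity, Price)
      SELECT SalesHeaderNo, Item, Quantity, Price
      FROM dbo.SalesOrderLines
      WHERE SalesHeaderNo = {0}", 14);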