Joining two tables from different servers using linq - c#

I'm trying to join two tables from different servers, but it keeps throwing this exception:
This method supports LINQ to Entities infrastructure and is not intended to be used directly from your code.
Here is my query:
var myList = (from myTableA in _dbProvider.ContextOne.TableA
join myTableB in _dbProvider.ContextOne.TableB on myTableA.ProductId equals myTableB.Oid
join myTableC in _dbProvider.ContextTwo.TableC on myTableB.Id equals myTableC.ProductId
where
select myTableC.Name).Distinct().ToList();
What does that mean?
I did find another solution: fetching the data from each table separately into lists and then joining them, but it is very slow.
Is there any other solution?

You can't join two tables from two different servers. Definitely not from EF.
Your best bet is to fetch only the data you need into two separate lists and then join them using LINQ to Objects.
Let me give a hypothetical example: you have 1,000,000 invoices in one table, each with about 10 items, for a total of 10,000,000 invoice detail rows on another server. You need the invoices and their details for the first 10 invoices created on 2015-5-4.
You send a query to the first database that fetches only those 10 invoices, extract their IDs, and use them to query about 100 rows from the other server. This is only about two times slower than making a single join query.
In some cases this becomes impossible (you have conditions on both tables) and you need to bring back more rows, but in simple scenarios it works.
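The two-step fetch described above could be sketched roughly as follows. The Invoice and InvoiceDetail classes, the CrossServerJoin helper, and its parameters are hypothetical names made up for illustration; the two IQueryable arguments stand in for sets exposed by two separate contexts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types; these names do not come from the question.
public class Invoice
{
    public int Id { get; set; }
    public DateTime CreatedOn { get; set; }
}

public class InvoiceDetail
{
    public int InvoiceId { get; set; }
    public string Item { get; set; }
}

public static class CrossServerJoin
{
    // invoices and details are assumed to come from two different contexts/servers.
    public static List<(int InvoiceId, string Item)> FirstInvoicesWithDetails(
        IQueryable<Invoice> invoices,
        IQueryable<InvoiceDetail> details,
        DateTime day,
        int take)
    {
        // Compute the upper bound outside the query so the provider only sees a constant.
        DateTime nextDay = day.AddDays(1);

        // Query 1 (first server): fetch only the few invoices that are needed.
        List<Invoice> firstInvoices = invoices
            .Where(i => i.CreatedOn >= day && i.CreatedOn < nextDay)
            .OrderBy(i => i.Id)
            .Take(take)
            .ToList();

        List<int> ids = firstInvoices.Select(i => i.Id).ToList();

        // Query 2 (second server): fetch only the details for those invoice ids.
        List<InvoiceDetail> matchingDetails = details
            .Where(d => ids.Contains(d.InvoiceId))
            .ToList();

        // Both lists are now in memory, so this join runs as LINQ to Objects.
        return (from i in firstInvoices
                join d in matchingDetails on i.Id equals d.InvoiceId
                select (InvoiceId: i.Id, Item: d.Item)).ToList();
    }
}
```

The point of the design is that each ToList() materializes a small, filtered result before the join, so neither provider is asked to translate a query that spans both contexts.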

Related

Join few tables using IQueryable

Let's say I need to make a complicated query to database (it includes 4 tables). I've written 4 different methods that query 4 different tables and I know it's not good:
var groups = *method that consists of .Where(), .Select(), etc. and queries groups*
var marks = *method that has a parameter GROUPS and queries another table*
...
So it's like a ladder: to query another table, I need the results of the previous query (that has another context :))
I know about the Join method, but how can I give it the results of previous query?
Thank you in advance
I'm not sure I understand the question correctly, but when it comes to joining multiple tables in one query, you can use model relationships (you can create them yourself or generate them with the scaffold command). You will then be able to use a relation in a LINQ query.

Having join or multiple resultsets

I have two tables with a one-to-many relationship: ORDER and ORDER_DETAILS. I am using C# to call a stored procedure that does some processing and eventually returns the orders with their corresponding details.
So the question is which of the two options below is more efficient.
1. Select the orders joined with the order details, and then read the data in C#.
2. Return two result sets, one for the orders and one for the order details, and then build up the data in C#.
My guess is that since the join repeats the same order columns for every order detail, the 2nd option is best.
What are your views on the above?
@Steve asked the right question. You need to clarify that.
But in most cases I would go for the first option: joining both tables at the database end and selecting only the columns the front end needs.
This way you transfer less data, and in a normal scenario it should be faster than getting the data from both tables and building it up in the front end. Without knowing your exact context, though, it might not be the best possible solution.

Split t-sql results into data tables, by non-numerical value in a field

I am writing a C# program to output T-SQL records into separate tabs in an Excel spreadsheet, split by the person the records belong to.
I have seen that I can have many data tables in a single data set, and turn each into a separate tab (how to store multiple DataTables into single DataSet in c#?), so now I need to populate my data tables.
I do not have a fixed list of people, it will vary each time the program is run, and a person could have any number of records assigned to them.
Is there a way of doing this using SQL / C# using something like order or group by; or do I have to get my results, pick up the list of people, then loop each SQL query for that specific person and feed that into a new data table?
Thought I'd ask if anyone knew a short way before I did it the long way, because this can't be an uncommon thing to do; so I suspect there must be a simpler way.
Normally you get one DataTable per SELECT statement.
However, you could just select everything and then use LINQ to group the data and fill your DataTables. See if this is any help.
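A minimal sketch of that idea, assuming everything has already been selected into a single DataTable. The PersonTabs class and the personColumn parameter are hypothetical; pass whichever column of your result identifies the person.

```csharp
using System;
using System.Data;
using System.Linq;

public static class PersonTabs
{
    // Splits one result table into a DataSet holding one DataTable per person.
    // personColumn is a hypothetical parameter: the name of the column to group on.
    public static DataSet SplitByPerson(DataTable source, string personColumn)
    {
        var result = new DataSet();

        // Group the rows in memory by the value in the person column.
        var groups = source.Rows.Cast<DataRow>()
            .GroupBy(row => Convert.ToString(row[personColumn]));

        foreach (var group in groups)
        {
            // Clone copies the column definitions but none of the rows.
            DataTable table = source.Clone();
            table.TableName = group.Key;

            foreach (DataRow row in group)
            {
                table.ImportRow(row);
            }

            result.Tables.Add(table);
        }

        return result;
    }
}
```

Each table in the returned DataSet is named after the person it belongs to, so it can be mapped to one spreadsheet tab. Spreadsheet tab names have their own length and character restrictions, which this sketch does not handle.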
It depends on the table structure, for both the source and the destination.
If you have multiple source tables, you can append them together with the UNION statement, which returns the distinct rows across all tables. Use UNION ALL to keep duplicate rows.
SELECT customer_key, customer_name, customer_address
FROM table_1
WHERE customer_key = @Customer_key
UNION (or UNION ALL)
SELECT customer_key, customer_name, customer_address
FROM table_2
WHERE customer_key = @Customer_key
UNION etc..

c# vs mysql: calling function in sql select statement vs fetching data and calling same function in c#

We run a product website where some products carry a guarantee. The guarantee applies only to products from particular dealer ids. The 2 tables are:
Product table with columns as id, name, cityId, dealerId, price. This table has all the products.
GuaranteeDealers table with column as dealerId. This has all dealer with guaranteed products.
We want to get all products with info if it is guaranteed or not. The query looks like:
APPROACH1: Get isGuaranteed from sql function to server(c#) side:
select id, name, cityId, dealerId, price, isGuaranteed = isGuaranteed( dealerId) from customers
isGuaranteed is a SQL function that checks whether dealerId is in the guaranteeDealers table. If it is, it returns 1; otherwise 0.
I have 50000 products and 500 such dealers and this query takes too long to execute.
OR
APPROACH2: Get list of dealers and set isGuaranteed flag in c#(server) side.
select id, name, cityId, dealerId, price. Map these to c# list of products
select dealerId from guaranteeDealers table to c# list of dealers.
Iterate product records in c# and set the isGuaranteed flag by c# function that checks if product's dealerId is in c# list of guaranteeDealers.
This takes much less time than approach 1.
Both approaches look similar to me, so can someone explain why executing a function in a SELECT statement takes so long in MySQL? Also, which is the correct one to use, approach 1 or 2?
Q: "why it takes so long time to execute function in select statement in mysql?"
In terms of performance, executing a correlated subquery 50,000 times will eat our lunch, and if we're not careful, it will eat our lunchbox too.
That subquery will be executed for each and every row returned by the outer query. That's like executing 50,000 separate, individual SELECT statements. And that's going to take time.
Hiding a correlated subquery inside a MySQL stored program (function) doesn't help. That just adds overhead on each execution of the subquery, and makes things slower. If we strip out the function and bring that subquery inline, we are probably looking at something like this:
SELECT p.id
     , p.name
     , p.cityId
     , p.dealerId
     , p.price
     , IFNULL( ( SELECT 1
                   FROM guaranteeDealers d
                  WHERE d.dealerId = p.dealerId
                  LIMIT 1
               )
       ,0) AS isGuarantee
  FROM products p
 ORDER BY ...
For each and every row returned from products (that isn't filtered out by a predicate e.g. condition in the WHERE clause), this is essentially telling MySQL to execute a separate SELECT statement. Run a query to look to see if the dealerID is found in the guaranteeDealers table. And that happens for each row.
If the outer query is only returning a couple of rows, then that's only a couple of extra SELECT statements to execute, and we aren't really going to notice the extra time. But when we return tens (or hundreds) of thousands of rows, that starts to add up. And it gets expensive, in terms of the total amount of time all those query executions take.
And if we "hide" that subquery in a MySQL stored program (function), that adds more overhead, introducing a bunch of context switches. From query executing in the database context, calling a function that switches over to the stored program engine which executes the function, which then needs to run a database query, which switches back to the database context to execute the query and return a resultset, switching back to the stored program environment to process the resultset and return a value, and then switching back to the original database context, to get the returned value. If we have to do that a couple of times, no big whoop. Repeat that tens of thousands of times, and that overhead is going to add up.
(Note that native MySQL built-in functions don't have this same context switching overhead. The native functions are compiled code that execute within the database context. Which is a big reason we favor native functions over MySQL stored programs.)
If we want improved performance, we need to ditch the processing RBAR (row by agonizing row), which gets excruciatingly slow for large sets. We need to approach the problem set-wise rather than row-wise.
We can tell MySQL what set to return, and let it figure out the most efficient way to return that. Rather than us round tripping back and forth to the database, executing individual SQL statements to get little bits of the set piecemeal, using instructions that dictate how MySQL should prepare the set.
In answer to the question
Q: "which approach is correct"
both approaches are "correct" in as much as they return the set we're after.
The second approach is "better" in that it significantly reduces the number of SELECT statements that need to be executed (2 statements rather than 50,001 statements).
In terms of the best approach, we are usually better off letting MySQL do the "matching" of rows, rather than doing the matching in client code. (Why unnecessarily clutter up our code doing an operation that can usually be much more efficiently accomplished in the database.) Yes, sometimes we need to do the matching in our code. And occasionally it turns out to be faster.
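For the cases where the matching does end up in client code, as in the question's approach 2, a rough sketch might look like the following. The Product class and the GuaranteeFlags helper are hypothetical; the two inputs are assumed to be the results of the two SELECT statements, already loaded into memory.

```csharp
using System.Collections.Generic;

// Hypothetical product type mirroring the columns the matching needs.
public class Product
{
    public int Id { get; set; }
    public int DealerId { get; set; }
    public bool IsGuaranteed { get; set; }
}

public static class GuaranteeFlags
{
    // products: rows from the product query.
    // guaranteeDealerIds: the dealerId values from the guaranteeDealers query.
    public static void Apply(IEnumerable<Product> products, IEnumerable<int> guaranteeDealerIds)
    {
        // A HashSet makes each membership check a hash lookup
        // instead of a scan over the whole dealer list.
        var guaranteed = new HashSet<int>(guaranteeDealerIds);

        foreach (Product product in products)
        {
            product.IsGuaranteed = guaranteed.Contains(product.DealerId);
        }
    }
}
```

With 50,000 products and 500 dealers this is two queries plus one pass over the products, rather than one lookup query per row.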
But sometimes, we can write just one SELECT statement that specifies the set we want returned, and let MySQL have a go at it. And if it's slow, we can do some tuning, looking at the execution plan, making sure suitable indexes are available, and tweaking the query.
Given the information in the question about the set to be returned, and assuming that dealerId is unique in the guaranteeDealers table: if our "test" is whether a matching row exists in the guaranteeDealers table, we can use an OUTER JOIN operation and an expression in the SELECT list that returns 0 or 1, depending on whether a matching row was found.
SELECT p.id
     , p.name
     , p.cityId
     , p.dealerId
     , p.price
     , IF(d.dealerId IS NULL,0,1) AS isGuarantee
  FROM products p
  LEFT
  JOIN guaranteeDealers d
    ON d.dealerId = p.dealerId
 ORDER BY ...
For optimal performance, we are going to want to have suitable indexes defined. At a minimum (if there isn't already such an index defined):
ON guaranteeDealers (dealerId)
If other tables are also involved in producing the result we are after, then we want to include those tables in the query we execute as well. That gives the MySQL optimizer a chance to come up with the most efficient plan to return the entire set, rather than constraining MySQL to performing individual operations that return bits piecemeal.
select id, name, cityId, customers.dealerId, price,
guaranteeDealers.dealerId is not null as isGuaranteed
from customers left join guaranteeDealers
on guaranteeDealers.dealerId = customers.dealerId
No need to call a function.
Note I have used customers because that is the table you used in your question - although I suspect you might have meant products.
Approach 1 is the better one because it reduces the size of the resultset being transferred from the database server to the application server. Its performance problem is caused by the isGuaranteed function, which is being executed once per row (of the customers table, which looks like it might be a typo). An approach like this would be much more performant:
select p.id, p.name, p.cityId, p.dealerId, p.price, gd.dealerId is not null as isGuaranteed
from Product p
left join GuaranteeDealers gd on p.dealerId = gd.dealerId

Optimize multiple SQL Selects in C#

I have a part of a program which has to display data from multiple tables, based on different queries. So far it looks like this (keep in mind that every subsequent SELECT is based on something we got from A):
SELECT * FROM TABLE A WHERE ID = ...
SELECT [8 fields] FROM TABLE B WHERE ...
SELECT [5 fields] FROM TABLE C WHERE ...
SELECT [1 field] FROM TABLE D WHERE ...
SELECT [1 field] FROM TABLE E WHERE ...
SELECT [1 field] FROM TABLE F WHERE ...
SELECT [1 field] FROM TABLE G WHERE ...
SELECT [1 field] FROM TABLE H WHERE ...
SELECT [2 fields] FROM TABLE I WHERE ...
After that, I take the results and create different objects or put in different fields with them.
Thing is, between clicking the button and getting the window to show, I have a delay of about 2 seconds.
Keep in mind, this is a very big database, with millions of records. Changing the DB is out of the question, unfortunately.
I am searching only by Primary Keys, I have no way to restrict the search even more than that.
The connection is opened from the start, I don't close/reopen it after each statement.
Joining just Table A and Table B takes a lot longer than two separate SELECTs: up to 1.5 seconds, while running the SELECTs sequentially brings it down to about 300 ms.
I still find it to be quite a long time, given that the first query executes in around 53 ms in the DBMS.
I am using the ODBC driver in C#, Net Framework 4. The database itself is a DB2, however, using the DB2 native driver has given us a plethora of problems and IBM has been less than helpful about it.
Also, whenever I select only a few fields, I create the needed object using only those and leave the rest at their defaults.
Is there any way I could improve this?
Thank you in advance,
Andrei
Edit: The diagnostic tool says something along the lines of:
--Two queries in another part of the program, we can ignore these, as they are not usually there-- 0.31 s
First query - 0.75 s
Second query - 0.87s
Third query - 0.95s
Fourth query - 0.99s
Fifth query - 1.00s
Sixth query - 1.04s
Seventh query - 1.08s
Eighth query - 1.10s
Ninth query - 1.12s
Program output - 1.81s
There is overhead to constructing query strings and executing them. When running multiple similar queries, you want to be sure that they are compiled once and then the execution plan is re-used.
However, it is likely that the best approach is to create a single query that returns multiple columns. Naively, this would look like:
select . . .
from a join
b
on . . . join
c
on . . . join
. . .
However, the joins might be left joins. The query might be more complex if you are joining along different dimensions that might produce Cartesian products.
The key point is that the database will optimize a single SQL query internally. This is (generally) more efficient than constructing and running multiple separate queries.
