The following query uses a join operation; I want the same query to execute without using a join. Is it possible to do so? If yes, then how can I do it?
select jname, jcode from heardt inner join judge ON heardt.jud1 = jcode
The reason I am trying to do this is that I am using an ODBC connection for MySQL, and queries with joins are not completing at all: the web page loads for an infinitely long time and never gives any output. That is why I want to try it without a join.
I don't know your rationale; I find JOINs much easier to read, but you can replace them by joining (no pun intended) the tables in the WHERE clause.
select jname
, jcode
from heardt
, judge
where heardt.jud1 = judge.jcode
There is no additional filter on that query, so it might return many rows. That could cause a slowdown, depending on the number of records in your tables.
You should consider limiting the number of records the query returns.
Something else you need to check is whether there is an index on the jcode field.
Select jname, jud1 from heardt where jud1 is not null
EDIT: Ok, this was quick. So: Why do you need the 'join'?
The query Select jname, jud1 from heardt where jud1 is not null shows that jud1 has a value, but not that the value is valid. The join (or the WHERE clause joining the tables) validates the relationship between the tables.
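If the goal really is to avoid join syntax entirely, and jname is a heardt column (as the query above assumes), a subquery can both return the data and validate jud1 against judge. A sketch; since jud1 = jcode for every matched row, returning jud1 under the jcode alias gives the same result set as the original join:
select jname, jud1 as jcode
from heardt
where jud1 in (select jcode from judge)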
If your query takes a very long time to execute with the join in place, it is most likely that you do not have the correct indexes on the tables, causing a table scan to take place instead of an indexed search.
I am using an ODBC connection for MySQL, and queries with joins are not completing: the web page loads for an infinitely long time and never gives any output. That is why I want to try it without a join.
That's probably not because your JOIN is not getting executed, but because your JOIN query is taking too long. And that's probably because you don't have the correct index defined (an index, preferably a clustered index, on judge.jcode).
If the join is still taking too long after adding such an index, you could consider precaching the query with a table or an indexed view (the latter, however, is not supported in MySQL).
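For example, a minimal sketch using the table and column names from the question (ix_judge_jcode and heardt_judge are hypothetical names, and the summary table would have to be refreshed whenever the underlying data changes). In MySQL/InnoDB the primary key is the clustered index, so making jcode the primary key of judge is the closest equivalent of a clustered index:
ALTER TABLE judge ADD PRIMARY KEY (jcode);
-- or, if jcode cannot be the primary key:
CREATE INDEX ix_judge_jcode ON judge (jcode);

-- precache the joined result into a summary table:
CREATE TABLE heardt_judge AS
SELECT jname, jcode
FROM heardt INNER JOIN judge ON heardt.jud1 = jcode;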
If you are able to run it in SQL Manager, you should be able to run it over the ODBC connection; if not, there is something wrong with the way you are instantiating that connection in C#.
Can you post the C# code you are using, so we can give you a better-informed answer?
As Lieven pointed out, I think his solution is a good one:
select jname
, jcode
from heardt
, judge
where heardt.jud1 = judge.jcode
But you should create indexes on the fields you are joining on, so the result will be returned much more quickly. Add:
Create index a1 on heardt(jud1);
Create index a2 on judge(jcode);
I think this is the best possible option.
Related
We are a product website with several products having a guarantee. The guarantee is only applicable for a few products with particular dealer ids. The two tables are:
Product table with columns id, name, cityId, dealerId, price. This table has all the products.
GuaranteeDealers table with a single column, dealerId. This has all dealers with guaranteed products.
We want to get all products, with information on whether each is guaranteed or not. The query looks like:
APPROACH 1: Get isGuaranteed from a SQL function and return it to the server (C#) side:
select id, name, cityId, dealerId, price, isGuaranteed = isGuaranteed( dealerId) from customers
isGuaranteed is a SQL function that checks if dealerId is in the table guaranteeDealers. If yes, it returns 1, else 0.
I have 50000 products and 500 such dealers and this query takes too long to execute.
OR
APPROACH 2: Get the list of dealers and set the isGuaranteed flag on the C# (server) side.
select id, name, cityId, dealerId, price and map these to a C# list of products.
select dealerId from the guaranteeDealers table into a C# list of dealers.
Iterate the product records in C# and set the isGuaranteed flag with a C# function that checks whether the product's dealerId is in the C# list of guarantee dealers.
This takes far less time compared to approach 1.
While both approaches look similar to me, can someone explain why it takes so long to execute a function in a SELECT statement in MySQL? Also, which is the correct approach, 1 or 2?
Q: "why it takes so long time to execute function in select statement in mysql?"
In terms of performance, executing a correlated subquery 50,000 times will eat our lunch, and if we're not careful, it will eat our lunchbox too.
That subquery will be executed for each and every row returned by the outer query. That's like executing 50,000 separate, individual SELECT statements. And that's going to take time.
Hiding a correlated subquery inside a MySQL stored program (function) doesn't help. That just adds overhead on each execution of the subquery, and makes things slower. If we strip out the function and bring that subquery inline, we are probably looking at something like this:
SELECT p.id
, p.name
, p.cityId
, p.dealerId
, p.price
, IFNULL( ( SELECT 1
FROM guaranteeDealers d
WHERE d.dealerId = p.dealerID
LIMIT 1
)
,0) AS isGuarantee
FROM products p
ORDER BY ...
For each and every row returned from products (that isn't filtered out by a predicate, e.g. a condition in the WHERE clause), this is essentially telling MySQL to execute a separate SELECT statement: run a query to see if the dealerId is found in the guaranteeDealers table. And that happens for each row.
If the outer query is only returning a couple of rows, then that's only a couple of extra SELECT statements to execute, and we aren't really going to notice the extra time. But when we return tens (or hundreds) of thousands of rows, that starts to add up. And it gets expensive, in terms of the total amount of time all those query executions take.
And if we "hide" that subquery in a MySQL stored program (function), that adds more overhead, introducing a bunch of context switches. From query executing in the database context, calling a function that switches over to the stored program engine which executes the function, which then needs to run a database query, which switches back to the database context to execute the query and return a resultset, switching back to the stored program environment to process the resultset and return a value, and then switching back to the original database context, to get the returned value. If we have to do that a couple of times, no big whoop. Repeat that tens of thousands of times, and that overhead is going to add up.
(Note that native MySQL built-in functions don't have this same context switching overhead. The native functions are compiled code that execute within the database context. Which is a big reason we favor native functions over MySQL stored programs.)
If we want improved performance, we need to ditch the RBAR (row by agonizing row) processing, which gets excruciatingly slow for large sets. We need to approach the problem set-wise rather than row-wise.
We can tell MySQL what set to return and let it figure out the most efficient way to return it, rather than round-tripping back and forth to the database, executing individual SQL statements to fetch little bits of the set piecemeal with instructions that dictate how MySQL should prepare it.
In answer to the question
Q: "which approach is correct"
both approaches are "correct" in as much as they return the set we're after.
The second approach is "better" in that it significantly reduces the number of SELECT statements that need to be executed (2 statements rather than 50,001 statements).
In terms of the best approach, we are usually better off letting MySQL do the "matching" of rows, rather than doing the matching in client code. (Why unnecessarily clutter up our code with an operation that can usually be accomplished much more efficiently in the database?) Yes, sometimes we need to do the matching in our code. And occasionally it turns out to be faster.
But sometimes, we can write just one SELECT statement that specifies the set we want returned, and let MySQL have a go at it. And if it's slow, we can do some tuning, looking at the execution plan, making sure suitable indexes are available, and tweaking the query.
Given the information in the question about the set to be returned, and assuming that dealerId is unique in the guaranteeDealers table: if our "test" is whether a matching row exists in the guaranteeDealers table, we can use an OUTER JOIN operation, with an expression in the SELECT list that returns 0 or 1 depending on whether a matching row was found.
SELECT p.id
, p.name
, p.cityId
, p.dealerId
, p.price
, IF(d.dealerId IS NULL,0,1) AS isGuarantee
FROM products p
LEFT
JOIN guaranteeDealers d
ON d.dealerId = p.dealerId
ORDER BY ...
For optimal performance, we are going to want to have suitable indexes defined. At a minimum (if there isn't already such an index defined):
ON guaranteeDealers (dealerId)
If there are also other tables involved in producing the result we are after, then we want to involve those tables in the query we execute as well. That gives the MySQL optimizer a chance to come up with the most efficient plan for returning the entire set, rather than constraining MySQL to performing individual operations that return bits of the set piecemeal.
select id, name, cityId, customers.dealerId, price,
guaranteeDealers.dealerId is not null as isGuaranteed
from customers left join guaranteeDealers
on guaranteeDealers.dealerId = customers.dealerId
No need to call a function.
Note I have used customers because that is the table you used in your question - although I suspect you might have meant products.
Approach 1 is the better one because it reduces the size of the resultset being transferred from the database server to the application server. Its performance problem is caused by the isGuaranteed function, which is being executed once per row (of the customers table, which looks like it might be a typo). An approach like this would be much more performant:
select p.id, p.name, p.cityId, p.dealerId, p.price, gd.dealerId is not null as isGuaranteed
from Product p
left join GuaranteeDealers gd on p.dealerId = gd.dealerId
I'm running Entity Framework 5 in my C# .NET application, against a MySQL database using the MySQL .NET connector 6.6.5.
Usually, when EF isn't fast enough, I resort to either stored procedures or direct SQL execution with context.Database.SqlQuery. However, I recently had an issue with a SQL call actually taking far longer than its EF equivalent, and I wondered if anybody knows why this is.
Here's the (slow) SQL query:
public sbyte? getFirstRouteTypeFromStop(string primaryCode) {
string sql = string.Format("SELECT r.route_type FROM stoptimes st INNER JOIN trips t ON st.trip_id = t.trip_id INNER JOIN routes r ON t.route_id = r.route_id WHERE st.stop_id = '{0}' LIMIT 1;", primaryCode);
return context.Database.SqlQuery<sbyte?>(sql).FirstOrDefault();
}
Here's the (fast) EF code:
public sbyte? getFirstRouteTypeFromStop(string primaryCode) {
return context.stoptimes.Where(st => st.stop_id.Equals(primaryCode)).FirstOrDefault().trip.route.route_type;
}
This method gets called repeatedly in a loop, and EF is a LOT faster (at least 1000%).
Why?
Important Notes:
The MySQL database has all these columns appropriately indexed.
When the native SQL query is run directly in MySQL it seems to execute much faster than when run in the C# app - I suspect this is quite an important observation.
You are using INNER JOINs in the SqlQuery, which logically means:
total rows considered = (no. of rows in stoptimes) * (no. of rows in trips) * (no. of rows in routes)
and only then is "WHERE st.stop_id = '{0}'" applied to that huge list.
I suspect this is the problem in the SQL query: if the optimizer doesn't push the filter down, the inner joins build a huge intermediate result and the records are filtered out of that.
Whereas the EF code filters on the stoptimes table first:
Where(st => st.stop_id.Equals(primaryCode))
so the filtering is fast, and the trip and route are then fetched only for the single selected record.
Note: try using a LEFT OUTER JOIN; it may make your query faster.
hope it helps...
Maybe you have all or some Stoptimes (and their navigation properties like trip and route) already in memory. If this is the case, EF is hitting the database only once (or at least not in every iteration of the loop), while with SqlQuery you go to the database in every iteration of the loop. Maybe primaryCode repeats a lot in the loop? Or maybe you retrieve Stoptimes earlier in your app without disposing the context?
Try changing the SQL to:
SELECT r.route_type
FROM stoptimes st
JOIN trips t
ON st.trip_id = t.trip_id
AND st.stop_id = '{0}'
JOIN routes r
ON t.route_id = r.route_id
LIMIT 1;
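Whichever form you use, prefixing the statement with EXPLAIN in the MySQL client (with a sample stop id such as 'SOME_STOP_ID' standing in for the parameter) will show whether the indexes on stop_id, trip_id and route_id are actually being used:
EXPLAIN SELECT r.route_type
FROM stoptimes st
JOIN trips t
ON st.trip_id = t.trip_id
AND st.stop_id = 'SOME_STOP_ID'
JOIN routes r
ON t.route_id = r.route_id
LIMIT 1;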
I'm using Entity Framework 5, ObjectContext and POCOs on my data access layer. I have a generic respository implementation and I have a method that queries the database with paging using Skip() and Take(). Everything works fine, except that the query performance is very slow when skipping a lot of rows (I'm talking about 170k rows)
This is an excerpt of my query on Linq to Entities:
C# Code:
ObjectContext oc = TheOBJEntitiesFactory.CreateOBJEntitiesContext(connection);
var idPred = oc.CreateObjectSet<view_Trans>("view_Trans").AsQueryable();
idPred = idPred.OrderBy(sortColumn, sortDirection.ToLower().Equals("desc"));
var result = idPred.Skip(iDisplayStart).Take(iDisplayLength);
return new PagedResult<view_Trans>(result, totalRecords);
In the query as translated to Transact-SQL, I noticed that instead of applying the ROW_NUMBER() clause to the view directly, it's making a sub-query and applying ROW_NUMBER() to the results of the sub-query...
example:
select top(10) extent1.A, extent1.B, extent1.C from (
select extent1.A, extent1.B, extent1.C,
row_number() OVER (ORDER BY [Extent1].[A] DESC) AS [row_number]
from (
select A, B, C from table as extent1) as extent1) as extent1
WHERE [Extent1].[row_number] > 176610
ORDER BY [Extent1].[A] DESC
This takes about 165 seconds to complete. Any idea on how to improve the performance of the translated query statement?
For those not following the comments above: I suspected the problem was not the extra SELECT, since that extra SELECT is present on many, many EF queries which do not take 165s to run. I eventually noticed that his ObjectSet referenced a VIEW and wondered if that might be part of the problem. After some experimentation, he narrowed the problem down to a LEFT JOIN inside the view. I suggested that he run the Database Tuning Advisor on that query; he did, and the two indexes it suggested fixed the problem.
One reason for the slowness is probably that your SQL is ordering your rows twice.
To control the query, the only option I know of is to call idPred.SqlQuery("Select ...", params). This will allow you to write your own optimized query for the data request.
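One such hand-written query is keyset ("seek") paging, which avoids numbering and skipping 170k rows altogether. A T-SQL sketch reusing the columns from the example above (@lastA is a hypothetical parameter holding the last sort-key value of the previous page, and it assumes column A is unique and indexed):
SELECT TOP (10) A, B, C
FROM view_Trans
WHERE A < @lastA -- seek past the previous page (descending order)
ORDER BY A DESC;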
I'm trying to write the following LINQ-Entities query:
Get a list of Questions that have been answered, ordered by most recently answered.
So, basically it's a 1..* between Question and Answer.
So I tried to write the query in SQL first, so that I understood it, and here's what I came up with:
WITH [Answers] AS
(
SELECT QuestionId,
CreatedOn,
ROW_NUMBER() OVER
(
PARTITION BY QuestionId
ORDER BY CreatedOn DESC
) As [Rank]
FROM dbo.Answers
)
select a.*
from dbo.questions a
inner join answers on a.questionid = answers.questionid
where answers.rank = 1
order by answers.createdon desc
Now, I have no idea if it's even possible to do this with LINQ.
Of course, the query above might be the wrong way to go about it, so don't think of this as a simple T-SQL to LINQ-to-Entities translation.
I'm just looking for a way to write a LINQ-to-Entities query for the above requirement.
Any ideas?
EDIT
Here's what I've tried so far:
var query = questions
.Where(q => q.Answers.Any())
.OrderByDescending(
q => q.Answers.OrderByDescending(
a => a.CreatedOn).FirstOrDefault());
Just hopeful, I guess. The following error was received:
DbSortClause expressions must have a type that is order comparable.
Parameter name: key
EDIT
I should also mention that I need to eager-load the Answers in the final result set, e.g.:
return ctx.Questions.Include(q => q.Answers)
from question in context.Questions
where question.Answers.Any()
let max = question.Answers.Max(a=>a.CreatedOn)
orderby max descending
select question
EDIT: since you want to eager-load answers, you might want to either resort to doing this entire query in SQL Server and exposing it to EF as a stored procedure, or add a LastAnswerOn column to the questions table. This will make your query much simpler and more efficient, and you will be able to use it in EF without problems.
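A sketch of the LastAnswerOn variant in T-SQL (column and table names assumed from the question; keeping the column up to date, e.g. with a trigger on Answers or in application code, is omitted here):
ALTER TABLE dbo.Questions ADD LastAnswerOn datetime NULL;

-- the whole requirement then collapses to:
SELECT q.*
FROM dbo.Questions q
WHERE q.LastAnswerOn IS NOT NULL
ORDER BY q.LastAnswerOn DESC;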
Replace that FirstOrDefault (which does not make sense there) with First, or replace the entire sub-query with Max on the date. It should work.
Good luck with EF, it is so broken ;-) In many cases where LINQ to SQL would work, with EF I had to give up, fetch the data, and do the computation locally.
I'm trying to load data from Oracle to SQL Server (sorry for not writing this before).
I have a table (actually a view which has data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them directly in the SELECT query.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: My table has 20 columns, and I call 5 different functions on at least 6-7 columns.
Some functions compare the parameters passed against an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select returns too many records for a DataSet and I get an OutOfMemory exception.
My function does selects and then performs logic, for example:
Function(c_x2, eid)
  Select col1
    into p_x1
    from tableP
   where eid = eid; -- note: the parameter and the column share the name eid

  IF (p_x1 IS NULL) THEN
    ret_var := 'INITIAL';
  ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
    ret_var := 'RL';
    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname)
    VALUES
      (p_x1, c_x2, 'RL', eid, 'PackageProcName');
  ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
    ret_var := 'GL';
    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname)
    VALUES
      (p_x1, c_x2, 'GL', eid, 'PackageProcName');
  END IF;

  RETURN ret_var;
I'm getting each row and performing the logic in C#, and then inserting.
If possible, INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
This will run significantly faster than a single query where you loop over the result set and issue an INSERT for each row.
EDIT, as for the OP's question edit:
You should be able to replace the function call with plain SQL in your query. Mimic the 'INITIAL' case using a LEFT JOIN to tableP, and the 'RL' or 'GL' values can be calculated using CASE.
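A sketch of that rewrite, using the names from the function in the question (c_x2 stands for whichever column is passed as the second argument; the audit INSERTs would have to be handled separately, since this SELECT no longer performs them):
SELECT t.id,
       CASE
         WHEN p.col1 IS NULL                THEN 'INITIAL'
         WHEN p.col1 = 'L' AND t.c_x2 = 'A' THEN 'RL'
         WHEN p.col1 = 'A' AND t.c_x2 = 'L' THEN 'GL'
       END AS ret_var
FROM Table1 t
LEFT JOIN tableP p ON p.eid = t.id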
EDIT, based on the OP's recent comments:
Since you are loading data from Oracle into SQL Server, this is what I would do: most people who could help have moved on and will not read this question again, so open a new question where you say 1) you need to load data from Oracle (version) to SQL Server (version), and 2) you are currently loading it with one query, processing each row in C# and inserting it into SQL Server, and it is slow; include all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL the way it does in C# and the like. While the approach makes maintenance easier, performance suffers because the sub-selects execute for every row returned.
A better approach would be to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of a real one) in the SELECT:
SELECT x.id,
x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
I prefer that to having to incorporate the function logic into the queries, for the sake of maintenance, but sometimes it can't be helped.
Personally, I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does it take the view to execute without any of the function calls? Try running the command:
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description you are making approximately 5 million function calls. They had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are defined using the DETERMINISTIC keyword, Oracle has a chance of optimizing away some of the calls.
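For example, a sketch of the declaration only (valid only if the function has no side effects and its result doesn't depend on changing data, so the audit INSERTs shown in the question would have to be removed or moved elsewhere first):
CREATE OR REPLACE FUNCTION x1 (p_id IN VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC -- promises the same output for the same input,
                -- letting Oracle skip repeated calls
IS
  v_result VARCHAR2(100);
BEGIN
  SELECT col1 INTO v_result FROM tableP WHERE eid = p_id;
  RETURN v_result;
EXCEPTION
  WHEN NO_DATA_FOUND THEN RETURN 'INITIAL';
END;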
Is there a way of reducing the number of function calls? The functions are being called once the view has been evaluated and the million rows of data are available. But are all the input values only available at the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?
select
f.dim_id,
d.dim_col_1,
long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)
select
f.dim_id,
d.dim_col_1,
d.dim_col_2
from large_fact_table f
join (
select
dim_id,
dim_col_1,
long_slow_function(dim_col_2) as dim_col_2
from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally the second query should run quicker, as it calls the function fewer times (once per dimension row rather than once per joined fact row).
The performance issue could be in any of these places and until you investigate the issue, it would be difficult to know where to direct your tuning efforts.
A couple of tips:
Don't load all records into RAM but process them one by one.
Try to run as many functions on the client as possible. Databases are really slow at executing user-defined functions.
If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way, so you can read single records from both connections and perform whatever you need on them.
If your functions always return the same result for the same input, use a computed column or a materialized view (see the sketch after this list). The database will run the function once and save the result in a table somewhere. That will make INSERT slow but SELECT quick.
Create a sorted index on your table.
Introduction to SQL Server Indexes; other RDBMSes are similar.
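A sketch of the materialized-view tip in Oracle (mv_func_results and the refresh policy are assumptions; the functions may also need to be declared DETERMINISTIC and free of DML for this to be allowed):
CREATE MATERIALIZED VIEW mv_func_results
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT id, x1(id) AS x1_result
FROM Table1;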
Edit, since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, with SQL always go set-based. I assumed you already did that, hence my tip to start using indexing.