Consider that a normal LINQ expression would be something like this
(this is just a sample to make things easier to understand):
IQueryable<Person> personQuery = (from ppl in PersonContext
                                  select ppl).AsQueryable();
List<Person> personList = personQuery.Where(x => x.age == 13).ToList();
So if I decided to put the first part of the LINQ query inside a stored procedure,
things would work out something like this:
IQueryable<Person> personQuery = PersonContext.sp_RetrievePerson().AsQueryable();
List<Person> personList = personQuery.Where(x => x.age == 13).ToList();
So for the question, I believe that the first method only sends the SQL call when ToList() is called.
In other words, the query sent to SQL Server for execution will be
Select * from Person where age = 13
But for the second method, how many times will this query be sent for execution?
If it is only sent once, does that make calling the stored procedure redundant, given that stored procedures are known for faster execution? And what will the query sent to SQL Server look like?
I am not sure about this one, but PersonContext.sp_RetrievePerson() returns an ObjectResult, which internally uses a DbDataReader. That means the stored procedure is called once, and
personQuery.Where(x => x.age == 13) iterates over the resulting rows and filters out the non-matching objects.
By using a stored procedure for this you might get a small performance gain when querying the database, but you then have to evaluate every object in the Person table AFTER reading it from the database. So I think that in this scenario a stored procedure only makes sense if you provide an age parameter to it, so the results are already filtered in the database.
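A minimal sketch of that idea, assuming a function import named sp_RetrievePersonByAge that maps to a procedure taking an @age parameter (both names are hypothetical):

// The hypothetical procedure filters by @age on the server,
// so only the matching rows are read from the database.
List<Person> personList = PersonContext.sp_RetrievePersonByAge(13).ToList();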
We run a product website with several products, some of which have a guarantee. The guarantee only applies to a few products with particular dealerIds. The two tables are:
Product table with columns id, name, cityId, dealerId, price. This table has all the products.
GuaranteeDealers table with column dealerId. This has all dealers with guaranteed products.
We want to get all products, with information on whether each one is guaranteed or not. The query looks like:
APPROACH 1: Get isGuaranteed from a SQL function, returned to the server (C#) side:
select id, name, cityId, dealerId, price, isGuaranteed = isGuaranteed(dealerId) from customers
isGuaranteed is a SQL function that checks whether dealerId is in the guaranteeDealers table. If yes, it returns 1, else 0.
I have 50,000 products and 500 such dealers, and this query takes too long to execute.
OR
APPROACH 2: Get the list of dealers and set the isGuaranteed flag on the C# (server) side:
select id, name, cityId, dealerId, price, and map these to a C# list of products.
select dealerId from the guaranteeDealers table into a C# list of dealers.
Iterate over the product records in C# and set the isGuaranteed flag with a C# function that checks whether the product's dealerId is in the C# list of guaranteeDealers.
This takes much less time than approach 1.
While both approaches look similar to me, can someone explain why it takes so long to execute a function in a SELECT statement in MySQL? Also, which is the correct approach, 1 or 2?
Q: "why it takes so long to execute a function in a SELECT statement in mysql?"
In terms of performance, executing a correlated subquery 50,000 times will eat our lunch, and if we're not careful, it will eat our lunchbox too.
That subquery will be executed for each and every row returned by the outer query. That's like executing 50,000 separate, individual SELECT statements. And that's going to take time.
Hiding a correlated subquery inside a MySQL stored program (function) doesn't help. That just adds overhead on each execution of the subquery, and makes things slower. If we strip out the function and bring that subquery inline, we are probably looking at something like this:
SELECT p.id
     , p.name
     , p.cityId
     , p.dealerId
     , p.price
     , IFNULL( ( SELECT 1
                   FROM guaranteeDealers d
                  WHERE d.dealerId = p.dealerId
                  LIMIT 1
                )
              , 0 ) AS isGuaranteed
  FROM products p
 ORDER BY ...
For each and every row returned from products (that isn't filtered out by a predicate, i.e. a condition in the WHERE clause), this essentially tells MySQL to execute a separate SELECT statement: run a query to see whether the dealerId is found in the guaranteeDealers table. And that happens for each row.
If the outer query is only returning a couple of rows, then that's only a couple of extra SELECT statements to execute, and we aren't really going to notice the extra time. But when we return tens (or hundreds) of thousands of rows, that starts to add up. And it gets expensive, in terms of the total amount of time all those query executions take.
And if we "hide" that subquery in a MySQL stored program (function), that adds more overhead, introducing a bunch of context switches. From query executing in the database context, calling a function that switches over to the stored program engine which executes the function, which then needs to run a database query, which switches back to the database context to execute the query and return a resultset, switching back to the stored program environment to process the resultset and return a value, and then switching back to the original database context, to get the returned value. If we have to do that a couple of times, no big whoop. Repeat that tens of thousands of times, and that overhead is going to add up.
(Note that native MySQL built-in functions don't have this same context switching overhead. The native functions are compiled code that execute within the database context. Which is a big reason we favor native functions over MySQL stored programs.)
If we want improved performance, we need to ditch the processing RBAR (row by agonizing row), which gets excruciatingly slow for large sets. We need to approach the problem set-wise rather than row-wise.
We can tell MySQL what set to return and let it figure out the most efficient way to return it, rather than round-tripping back and forth to the database, executing individual SQL statements to fetch little bits of the set piecemeal, with instructions that dictate how MySQL should prepare the set.
In answer to the question
Q: "which approach is correct"
both approaches are "correct" in as much as they return the set we're after.
The second approach is "better" in that it significantly reduces the number of SELECT statements that need to be executed (2 statements rather than 50,001).
In terms of the best approach, we are usually better off letting MySQL do the "matching" of rows rather than doing the matching in client code. (Why unnecessarily clutter up our code with an operation that can usually be accomplished much more efficiently in the database?) Yes, sometimes we need to do the matching in our code. And occasionally it turns out to be faster.
But sometimes, we can write just one SELECT statement that specifies the set we want returned, and let MySQL have a go at it. And if it's slow, we can do some tuning, looking at the execution plan, making sure suitable indexes are available, and tweaking the query.
Given the information in the question about the set to be returned, and assuming that dealerId is unique in the guaranteeDealers table: if our "test" is whether a matching row exists in the guaranteeDealers table, we can use an OUTER JOIN operation, with an expression in the SELECT list that returns 0 or 1 depending on whether a matching row was found.
SELECT p.id
     , p.name
     , p.cityId
     , p.dealerId
     , p.price
     , IF(d.dealerId IS NULL, 0, 1) AS isGuaranteed
  FROM products p
  LEFT
  JOIN guaranteeDealers d
    ON d.dealerId = p.dealerId
 ORDER BY ...
For optimal performance, we are going to want suitable indexes defined. At a minimum (if there isn't already such an index defined):
ON guaranteeDealers (dealerId)
If there are other tables involved in producing the result we are after, then we want to involve those tables in the query we execute too. That gives the MySQL optimizer a chance to come up with the most efficient plan to return the entire set, rather than constraining MySQL to performing individual operations that return bits piecemeal.
select id, name, cityId, customers.dealerId, price,
       guaranteeDealers.dealerId is not null as isGuaranteed
from customers left join guaranteeDealers
on guaranteeDealers.dealerId = customers.dealerId
No need to call a function.
Note I have used customers because that is the table you used in your question - although I suspect you might have meant products.
Approach 1 is the better one because it reduces the size of the resultset being transferred from the database server to the application server. Its performance problem is caused by the isGuaranteed function, which is executed once per row (of the customers table, which looks like it might be a typo for products). An approach like this would be much more performant:
select p.id, p.name, p.cityId, p.dealerId, p.price,
       gd.dealerId is not null as isGuaranteed
from Product p
left join GuaranteeDealers gd on p.dealerId = gd.dealerId
Given the EF LINQ below, will all the records from the stored proc usp_GetTestRecords() come across and then get filtered?
TestRecordsDBEntities dataContext = new TestRecordsDBEntities();
var tests = dataContext.usp_GetTestRecords();
var filtered = tests.Where(x => x.GroupId == groupId)
                    .OrderByDescending(y => y.Name)
                    .ToList();
Yes, all the data will first be fetched into memory and then filtered on the client side. Using a stored procedure with EF is not a good idea here: you lose the advantage of lazy, eager, or explicit loading. However, if you let EF generate the queries for you, the query will be compiled with all the filters and executed on the server.
As an alternative, you may want to consider creating a table-valued function rather than a stored proc. The advantage is that the resultset of the function can be joined with other tables on the server side. The disadvantage is that you are limited in what you can do inside a function, and the database cannot use indexes on the function's results the way it can with indexed views.
See more about using TVFs with EF at http://blogs.msdn.com/b/efdesign/archive/2011/01/21/table-valued-function-support.aspx
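A rough sketch of why that matters, assuming the TVF is exposed as a function import named fn_GetTestRecords (a hypothetical name):

// With a table-valued function the query stays composable:
// the Where and OrderByDescending below are translated to SQL and run on the server.
var filtered = dataContext.fn_GetTestRecords()
                          .Where(x => x.GroupId == groupId)
                          .OrderByDescending(y => y.Name)
                          .ToList();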
Sprocs will always return all the records produced by the query associated with them. If you want to add a where clause, add a parameter to your sproc and perform the filtering in SQL. Check this for more information: How to pass Parameters to Stored Procedure from Entity Framework?
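As a sketch of that, if usp_GetTestRecords were given a @groupId parameter (an assumption; the proc in the question has none), the call would look something like this:

// The filtering now happens inside the procedure, on the server;
// only the ordering still runs on the client.
var filtered = dataContext.usp_GetTestRecords(groupId)
                          .OrderByDescending(y => y.Name)
                          .ToList();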
MSDN docs say that I can write a stored procedure like this
CREATE PROCEDURE MultipleResultTypesSequentially
AS
select * from products
select * from customers
then read it from LINQ like this
IMultipleResults sprocResults =
    db.MultipleResultTypesSequentially();
// First read products.
foreach (Product prod in sprocResults.GetResult<Product>())
{
Console.WriteLine(prod.ProductID);
}
// Next read customers.
foreach (Customer cust in sprocResults.GetResult<Customer>())
{
Console.WriteLine(cust.CustomerID);
}
What if one of my select statements returns something other than a regular table object, such as a join or a selection of certain columns?
And how do I let LINQ know that I want to read the next SELECT? Basically, what I'm wondering is: does this example from MSDN read Products first and then Customers because they are written in that order in the stored procedure, or does writing .GetResult<Customer>() tell C# to find the result that maps to the Customer type? Also, what would the foreach loop look like for such an unknown type?
I did a quick test with a similar stored procedure and found that if your foreach loops are not in the correct order, an InvalidOperationException is thrown, so it doesn't look like C# is able to find the correct result based on the type used in GetResult<>.
As for your select statement returning something other than a table: if you drag a stored procedure from the Server Explorer onto the Linq to Sql designer, the designer will autogenerate a class based on the procedure's output. I created a procedure with a couple of joined tables, neither of which existed in my project, and Linq to Sql created a type for me, named like StoredProcedureNameResult; so a procedure called GetCreditCard got a type named GetCreditCardResult.
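So the foreach for the "unknown" type looks just like the strongly typed loops above, only with the designer-generated class. A sketch (CardNumber is illustrative; the real property names come from the columns of the procedure's SELECT):

// GetCreditCardResult is the class the designer generated for the procedure's output shape.
foreach (GetCreditCardResult row in sprocResults.GetResult<GetCreditCardResult>())
{
    Console.WriteLine(row.CardNumber);
}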
I am writing a Silverlight client that interacts with a SQL database via ASP.NET 2.0 web services, which I am also developing. One of the services needs to return the results of a stored procedure, and it works well. However, one parameter of the stored procedure needs to be retrieved from a different location before the stored procedure can be executed, and this additional request to the database causes an obvious slowdown (evident when I cache the retrieved value rather than obtaining it on every call).
Unfortunately, caching the value isn't valid for my situation, and I'd rather combine the retrieval of this value and the subsequent stored procedure execution into a single query so that the server can optimise the request. However, my SQL is not strong and I haven't the faintest idea how to go about this.
The value, let's call it tasktype, is referenced via a single key, id. The stored procedure, getrecords, takes a few arguments including tasktype, but it can be assumed that the other argument values are known at the time of calling the query. The stored procedure returns a table of records.
Thanks for any help.
Well, it could be something like:
cmd.CommandType = CommandType.Text;
cmd.Parameters.AddWithValue("@id", ...); // your id arg
cmd.Parameters.AddWithValue(..., ...);   // your other args...
cmd.CommandText = @"
DECLARE @TaskType int -- or whatever
SELECT @TaskType = ... -- some existing query based on @id
EXEC getrecords @TaskType, ...
";
However, you will perhaps have to clarify how to get a task-type.
You should be able to consume this either as an IDataReader, or using DataTable.Load.
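For example, a minimal sketch of the DataTable route, reusing the cmd set up above:

// Load the procedure's resultset into a DataTable in one step.
DataTable records = new DataTable();
using (var reader = cmd.ExecuteReader())
{
    records.Load(reader);
}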
You should be able to create a UDF that will get the value of tasktype and call this from within your data retrieval sproc.
Something like
CREATE FUNCTION dbo.TaskType()
RETURNS int
AS
BEGIN
    RETURN ( SELECT ... ) -- stuff that gets the task type
END
then from within your data retrieval sproc call
DECLARE @tasktype int
SELECT @tasktype = dbo.TaskType()
or something like that... might need a bit of reworking :-)
I have some complex stored procedures that may return many thousands of rows, and take a long time to complete.
Is there any way to find out how many rows are going to be returned before the query executes and fetches the data?
This is with Visual Studio 2005, a Winforms application and SQL Server 2005.
You mentioned your stored procedures take a long time to complete. Is the majority of the time taken up during the process of selecting the rows from the database or returning the rows to the caller?
If it is the latter, maybe you can create a mirror version of your SP that just gets the count instead of the actual rows. If it is the former, well, there isn't really that much you can do since it is the act of finding the eligible rows which is slow.
A solution to your problem might be to re-write the stored procedure so that it limits the result set to some number, like:
SELECT TOP 1000 * FROM tblWHATEVER
in SQL Server, or
SELECT * FROM tblWHATEVER WHERE ROWNUM <= 1000
in Oracle. Or implement a paging solution so that the result set of each call is acceptably small.
Make a stored proc to count the rows first:
SELECT COUNT(*) FROM table
Unless there's some aspect of the business logic of your app that allows calculating this, no. The database is going to have to do all the where and join logic to figure out how many rows there are, and that's the vast majority of the time spent in the SP.
You can't get the rowcount of a procedure without executing the procedure.
You could make a different procedure that accepts the same parameters, the purpose of which is to tell you how many rows the other procedure should return. However, the steps required by this procedure would normally be so similar to those of the main procedure that it should take just about as long as just executing the main procedure.
You would have to write a different version of the stored procedure to get a row count. This one would probably be much faster because you could eliminate joins to tables you aren't filtering against, remove ordering, etc. For example, if your stored proc executed SQL such as:
select firstname, lastname, email, orderdate from
customer inner join productorder on customer.customerid = productorder.customerid
where orderdate > @orderdate order by lastname, firstname;
your counting version would be something like:
select count(*) from productorder where orderdate > @orderdate;
Not in general.
Through knowledge of the operation of the stored procedure, you may be able to get either an estimate or an accurate count (for instance, if the "core" or "base" table of the query can be counted quickly, but it is complex joins and/or summaries which drive the time upwards).
But you would have to call the counting SP first and then the data SP, or you could look at using a multiple-result-set SP.
It could take as long to get a row count as to get the actual data, so I wouldn't advocate performing a count in most cases.
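If you do go the multiple-result-set route, the client side might look like this sketch (GetRecordsWithCount is a hypothetical proc that SELECTs the count first and the data second, and connection is an open SqlConnection):

using (SqlCommand cmd = new SqlCommand("dbo.GetRecordsWithCount", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        reader.Read();
        int rowCount = reader.GetInt32(0); // first resultset: the count
        reader.NextResult();               // second resultset: the actual rows
        while (reader.Read())
        {
            // process each data row here
        }
    }
}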
Some possibilities:
1) Does SQL Server expose its query optimiser's findings in some way? I.e. can you parse the query and then obtain an estimate of the rowcount? (I don't know SQL Server.)
2) Perhaps, based on the criteria the user gives you, you can make some estimations of your own. For example, if the user enters 'S%' in the customer surname field to query orders, you could determine that this matches 7% (say) of the customer records, and extrapolate that the query may return about 7% of the order records.
Going on what Tony Andrews said in his answer, you can get an estimated query plan for the call to your query with:
SET SHOWPLAN_TEXT OFF
GO
SET SHOWPLAN_ALL ON
GO
-- Replace with the call to your stored procedure
select * from MyTable
GO
SET SHOWPLAN_ALL OFF
GO
This should return a table, or several tables, which will let you get the estimated row count of your query.
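If you want to pick the estimate up programmatically, something along these lines should work as a sketch (cmd is a SqlCommand on an open connection): with SHOWPLAN_ALL on, SQL Server returns the query plan itself as a resultset, which includes an EstimateRows column.

cmd.CommandText = "SET SHOWPLAN_ALL ON";
cmd.ExecuteNonQuery();
cmd.CommandText = "select * from MyTable"; // replace with the call to your stored procedure
using (var reader = cmd.ExecuteReader())   // returns plan rows, not data; nothing is executed
{
    while (reader.Read())
    {
        object estimate = reader["EstimateRows"];
        if (estimate != DBNull.Value)
        {
            Console.WriteLine("Estimated rows: " + Convert.ToDouble(estimate));
            break; // the first plan node's estimate approximates the resultset size
        }
    }
}
cmd.CommandText = "SET SHOWPLAN_ALL OFF";
cmd.ExecuteNonQuery();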
You need to analyze the returned data set to determine a logical (meaningful) primary key for the result set being returned. In general this WILL be much faster than the complete procedure, because the server is not constructing a result set from the data in all the columns of each row of each table; it is simply counting the rows. In general, it may not even need to read the actual table rows off disk to do this; it may simply need to count index nodes.
Then write another SQL statement that only includes the tables necessary to generate those key columns (hopefully a subset of the tables in the main SQL query), with the same where clause and the same filtering predicate values.
Then add another optional parameter to the stored proc called, say, @CountsOnly, with a default of false (0), like so:
Alter Procedure <storedProcName>
    @param1 Type,
    -- other current params
    @CountsOnly TinyInt = 0
As
Set NoCount On
If @CountsOnly = 1
    Select Count(*)
    From TableA A
    Join TableB B On etc. etc...
    Where <here put all filtering predicates>
Else
    <here put the old SQL that returns the complete resultset with all the data>
Return 0
You can then call the same stored proc with @CountsOnly set to 1 to get just the count of records. Old code that calls the proc will still function as it used to, since the parameter defaults to false (0) when it is not included.
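Calling it for just the count is then a matter of passing the flag; a small sketch, assuming an open SqlConnection named connection and with @param1 standing in for the proc's real parameters:

using (SqlCommand cmd = new SqlCommand("dbo.MyStoredProc", connection)) // hypothetical proc name
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@param1", someValue); // the proc's normal parameters
    cmd.Parameters.AddWithValue("@CountsOnly", 1);     // take the Count(*) branch only
    int rowCount = (int)cmd.ExecuteScalar();
}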
It's at least technically possible to run a procedure that puts the result set in a temporary table. Then you can find the number of rows before you move the data from the server to the application, and you save having to create the result set twice.
But I doubt it's worth the trouble unless creating the result set takes a very long time, and in that case it may be big enough that the temp table itself becomes a problem. Almost certainly, the time to move the big table over the network will be many times what is needed to create it.
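A rough sketch of that idea, using INSERT ... EXEC to capture the procedure's rows into a temp table in a single batch (the procedure name and column list are illustrative, and cmd is an existing SqlCommand on an open connection):

// One round trip: capture the rows once, return the count first, then the data.
cmd.CommandText = @"
    CREATE TABLE #results (id int, name varchar(100)); -- match the proc's resultset shape
    INSERT INTO #results EXEC dbo.MyLongRunningProc;   -- run the procedure once
    SELECT COUNT(*) FROM #results;                     -- first resultset: the row count
    SELECT * FROM #results;                            -- second resultset: the data
";
using (var reader = cmd.ExecuteReader())
{
    reader.Read();
    int rowCount = reader.GetInt32(0); // decide here whether fetching the data is worthwhile
    reader.NextResult();
    while (reader.Read())
    {
        // read the data rows
    }
}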