I know this was partly asked before, but none of the questions completely answer this.
What happens when one uses LINQ to SQL to retrieve data from the database?
I've read the following questions:
Optimizing a LINQ to SQL query
Linq-To-Sql optimization for queries
What is unclear to me is: at which point is the database accessed? When are the queries run?
If I run the following query, how will it translate to a SQL query?
DatabaseDataContext db = new DatabaseDataContext();
var users = (from x in db.Users
where x.Rank > 10
orderby x.RegistrationDate descending
select x)
.Skip(pageIndex * recordCount)
.Take(recordCount);
And then, later, if I try to access some property of some user, how will the query be constructed (this is partly answered here)?
var temp = users.ToList()[0].SomeProperty;
Basically, what I really want to know is how LINQ to SQL works under the hood: how it gets from the C# statement to the SQL, and how that SQL is optimized.
The LINQ to SQL framework takes your LINQ query, which is in fact an expression tree, and converts that expression tree into a pure SQL query. See How to: Use Expression Trees to Build Dynamic Queries
In fact, an expression tree can be translated into whatever language or database you need. Different providers implement IQueryable for different databases (Oracle, SQLite, etc.). Note that LINQ to SQL is effectively short for LINQ to SQL Server. Entity Framework/LINQ to Entities, on the other hand, can be extended more easily to other databases.
The main point here is the IQueryable interface, which carries an expression tree, together with the provider implementation. For an example of how to implement a provider, i.e. how to translate an expression tree into a query, see LINQ: Building an IQueryable Provider
Here is a snippet that will give you a flavor of what happens under the hood:
if (select.OrderBy != null && select.OrderBy.Count > 0)
{
    this.AppendNewLine(Indentation.Same);
    sb.Append("ORDER BY ");
    for (int i = 0, n = select.OrderBy.Count; i < n; i++)
    {
        OrderExpression exp = select.OrderBy[i];
        if (i > 0)
        {
            sb.Append(", ");
        }
        this.Visit(exp.Expression);
        if (exp.OrderType != OrderType.Ascending)
        {
            sb.Append(" DESC");
        }
    }
}
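To make the "expression tree" idea concrete, here is a minimal sketch of what a provider has to work with. It only uses System.Linq.Expressions; the class and method names (ExpressionTreeDemo, Describe) are made up for illustration and are not part of LINQ to SQL.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionTreeDemo
{
    // Hypothetical helper: describes the tree behind the lambda x => x > 10.
    public static string Describe()
    {
        // Assigning a lambda to Expression<Func<...>> makes the compiler emit
        // a data structure describing the code instead of a compiled delegate.
        Expression<Func<int, bool>> filter = x => x > 10;

        var body = (BinaryExpression)filter.Body;
        var left = (ParameterExpression)body.Left;
        var right = (ConstantExpression)body.Right;

        // A provider visits nodes like these and emits text such as "x > 10".
        return $"{left.Name} {body.NodeType} {right.Value}";
    }
}
```

Calling ExpressionTreeDemo.Describe() returns "x GreaterThan 10": the parameter, the operator and the constant are all available as data, which is what a visitor like the ORDER BY snippet above walks over.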
The queries are run as soon as you demand the result.
var qry = (from x in db.Users where x.Rank > 10 orderby x.RegistrationDate descending
select x);
At this point the query has not run, because you haven't used the result.
Put it in a foreach or convert it to a List and the query is forced to materialize.
The rule of thumb is:
Whenever GetEnumerator is called on an IQueryable, the query is forced to materialize (which means "go to the database and get the actual records").
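The rule of thumb can be observed without a database. The following is a sketch that assumes an in-memory list wrapped with AsQueryable() as a stand-in for a table; the names (DeferredExecutionDemo, Run, ranks) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    public static int[] Run()
    {
        // In-memory stand-in for db.Users (an assumption for this sketch).
        var ranks = new List<int> { 5, 20, 30 };

        // Building the query only records the filter; nothing is evaluated yet.
        IQueryable<int> query = ranks.AsQueryable().Where(r => r > 10);

        // Changing the source after the query was defined...
        ranks.Add(40);

        // ...is still visible here, because ToArray() calls GetEnumerator
        // and only then runs the query.
        return query.ToArray();
    }
}
```

DeferredExecutionDemo.Run() returns 20, 30 and 40. The value 40 was added after the query was written, which shows that the query ran at enumeration time rather than at definition time.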
All you want to know is answered in the article on MSDN about LINQ to SQL: http://msdn.microsoft.com/en-us/library/bb425822.aspx
By the way, if you're only going to use part of your result, as in your code above, it's better to modify your query like so:
var prop = (from x in db.Users
where x.Rank > 10
orderby x.RegistrationDate descending
select x.SomeProperty)
.Skip(pageIndex * recordCount)
.First();
Optimization you do in your query is often more important than how the system performs peephole optimization under the hood...
Related
I have an IQueryable that has a list of pages.
I want to do: Pages.OrderByDescending(o => CalculateSort(o.page));
The CalculateSort method looks roughly like this; here is a plain-English version:
public int calculatesort(page p)
{
int rating = (from r in db.rating select r). sum();
int comments = //query database for comments;
float timedecayfactor = math.exp(-page.totalhoursago);
return sortscore = (rating +comments)* timedecayfactor;
}
When I run code similar to the above, an error is thrown saying that the method calculatesort cannot be converted to SQL.
How can I convert the function above into something SQL understands, so that I can use it to sort the pages?
Is this not a good approach for large data? Is there another way to sort result sets other than dynamically at the database?
I haven't slept for days trying to fix this one :(
Your code is nowhere near compiling, so I'm guessing a lot here, but I hope this gives you an idea nonetheless.
As several have posted, you need to give LINQ to SQL an expression tree. With query syntax that is what happens (by compiler magic):
from p in pages
let rating = (from r in db.rating
where r.PageId == p.PageId
select r.Value).Sum()
let comments = (from c in db.Comments
where c.PageId == p.PageId
select 1).Count()
let timedecayfactor = Math.Exp(-(p.totalhoursago))
orderby (rating + comments)*timedecayfactor descending
select p;
I haven't actually tried this against a database; there are simply too many unknowns in your code, so there might still be parts that can't be translated.
The error occurs because LINQ cannot convert custom code/methods into SQL. It can convert only Expression<Func<>> objects into SQL.
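As a rough illustration of why a custom method cannot be translated, the sketch below shows what such a call looks like inside an expression tree. CalculateSort here is a hypothetical stand-in with a made-up body, and MethodCallNodeDemo is an invented name.

```csharp
using System;
using System.Linq.Expressions;

static class MethodCallNodeDemo
{
    // Hypothetical stand-in for the custom sort method; the body is made up.
    static int CalculateSort(int pageId) => pageId * 2;

    public static string Describe()
    {
        Expression<Func<int, int>> key = p => CalculateSort(p);

        // The tree contains a single "call this method" node. The method body
        // is compiled IL, not part of the tree, so a provider sees only the
        // method's identity and has nothing to turn into SQL.
        var call = (MethodCallExpression)key.Body;
        return $"{key.Body.NodeType}:{call.Method.Name}";
    }
}
```

MethodCallNodeDemo.Describe() returns "Call:CalculateSort". A provider can translate such a node only if it already knows a SQL equivalent for that particular method.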
In your case, you have a complex logic to do while sorting, so it might make sense to do it using a Stored Procedure, if you want to do it in the DB Layer.
Or load all the objects into main memory and run the calculate sort method on the objects in memory.
EDIT:
I don't have the code, so describing it in English is the best I can do:
Have a table with a structure capable of temporarily storing all of the current user's data.
Have a calculated field in the Pages table that holds the value calculated from all the non-user-specific fields.
Write a stored procedure that uses values from these two sources (temp table and calculated field) to actually do the sort.
Delete the temp table as the last step of the stored proc.
You can read about stored procs here and here
var comments = db.comments.Where(...);
Pages.OrderByDescending(p => (db.rating.Sum(r => r.rate) + comments.Count()) * Math.Exp(-p.totalhoursago));
LINQ is expecting CalculateSort to return a "queryable" expression in order to generate its own SQL.
You can embed your 'calculatesort' method in this lambda expression. (I replaced your variables with constants in order to compile in my environment.)
public static void ComplexSort(IQueryable<string> Pages)
{
    Pages.OrderByDescending(p =>
    {
        int rating = 99; // (from r in db.rating select r). sum();
        int comments = 33; // query database for comments;
        double timedecayfactor = Math.Exp(88);
        return (rating + comments) * timedecayfactor;
    });
}
Also, you can even try to run that in parallel (since .NET 4.0) by replacing the first line with
Pages.AsParallel().OrderByDescending(p =>
Yes, echoing the previous answers: LINQ to SQL doesn't know how to translate the CalculateSort method. You should switch from LINQ to SQL to ordinary LINQ to Objects before using a custom method.
Try adding AsEnumerable where you call CalculateSort:
Pages.AsEnumerable().OrderByDescending(o => CalculateSort(o.page));
Then you're fine to use the OrderByDescending extension method.
UPDATE:
LINQ to SQL always translates the query in your code into an expression tree. This is almost the same concept as the AST of any programming language. These expression trees are then translated into SQL specific to SQL Server, because LINQ to SQL currently supports only SQL Server 2005 and 2008.
I'm trying to create a LINQ provider. I'm using the guide LINQ: Building an IQueryable provider series, and I have added the code up to LINQ: Building an IQueryable Provider - Part IV.
I am getting a feel for how it works and the idea behind it. Now I'm stuck on a problem that is not about the code but about my understanding.
I'm firing off this statement:
QueryProvider provider = new DbQueryProvider();
Query<Customer> customers = new Query<Customer>(provider);
int i = 3;
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name}).Where(p => p.Id == 2 | p.Id == i).ToList();
Somehow the code, or the expression, knows that the Where comes before the Select. But how, and where?
Nothing in the code reorders the expression; in fact, ToString() in debug mode shows that the Select comes before the Where.
I was trying to make the code fail. Normally I would write the Where first and then the Select.
So how does the expression sort this out? I have not made any changes to the code in the guide.
The expressions are "interpreted", "translated" or "executed" in the order you write them, so the Where does not come before the Select.
If you execute:
var newLinqCustomer = customers.Select(c => new { c.Id, c.Name})
.Where(p => p.Id == 2 | p.Id == i).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the anonymous type.
If you execute:
var newLinqCustomer = customers.Where(p => p.Id == 2 | p.Id == i)
.Select(c => new { c.Id, c.Name}).ToList();
Then the Where is executed on the IEnumerable or IQueryable of the customer type.
The only thing I can think of is that maybe you're seeing some generated SQL where the SELECT and WHERE have been reordered? In that case I'd guess that there's an optimisation step somewhere in the (e.g.) LINQ to SQL provider that takes SELECT Id, Name FROM (SELECT Id, Name FROM Customer WHERE Id=2 || Id=#i) and converts it to SELECT Id, Name FROM Customer WHERE Id=2 || Id=#i, but this must be a provider-specific optimisation.
No, in the general case (such as LINQ to Objects) the Select will be executed before the Where. Think of it as a pipeline: your first step is a transformation, the second a filter. Not the other way round, as would be the case if you wrote Where...Select.
Now, a LINQ provider has the freedom to walk the expression tree and optimize it as it sees fit, as long as it does not change the semantics of the expression. This means that a smart LINQ to SQL provider would try to pull as many where clauses as it can into the SQL query to reduce the amount of data travelling over the network. However, keep Stuart's example in mind: not all query providers are clever, partly because ruling out side effects from query reordering is not as easy as it seems.
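The order in which the operators are recorded can be checked directly on the expression tree. This is a sketch over an in-memory array wrapped with AsQueryable() (an assumption standing in for the custom provider from the guide); OperatorOrderDemo and Run are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

static class OperatorOrderDemo
{
    public static string Run()
    {
        // In-memory stand-in for the Customer source.
        var customers = new[]
        {
            new { Id = 1, Name = "Ann" },
            new { Id = 2, Name = "Bob" },
        }.AsQueryable();

        var query = customers.Select(c => new { c.Id, c.Name })
                             .Where(p => p.Id == 2);

        // The outermost node is the operator written last; each call's first
        // argument is the source expression it was applied to.
        var names = new List<string>();
        Expression node = query.Expression;
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0];
        }

        names.Reverse(); // innermost (first written) operator first
        return string.Join(" -> ", names);
    }
}
```

OperatorOrderDemo.Run() returns "Select -> Where": the tree preserves the written order, and any reordering seen in generated SQL comes from the provider's own translation step.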
As I understand it, when I use LINQ extension methods (with lambda expression syntax) on an IQueryable that is in fact an instance of ObjectSet, they are translated into LINQ to SQL queries. What I mean is that the command
IQueryable<User> users = db.UserSet;
var users32YearsOld = users.Where(user => user.Age == 32);
is exactly the same as
IQueryable<User> users = db.UserSet;
var users32YearsOld = from user in users where user.Age == 32 select user;
So neither of them hits the database until users32YearsOld is enumerated in a for loop or similar. (I hope I understand this correctly.)
But what happens if I declare that ObjectSet not as IQueryable but as IEnumerable, so that its static type is IEnumerable?
IEnumerable<User> users = db.UserSet;
var users32YearsOld = users.Where(user => user.Age == 32);
Is it going to hit the database immediately (and if so, when: on the first line or on the second)? Or is it going to behave like the previous command, that is, not hit the database until users32YearsOld is enumerated? Will there be any difference if I use the following instead?
IEnumerable<User> users = db.UserSet;
var users32YearsOld = from user in users where user.Age == 32 select user;
Thank you
Undeleting my answer because I just tested it and it works exactly as I described:
None of the queries mentioned will hit the database, because there is no enumeration. The difference between an IQueryable query and an IEnumerable query is that with IQueryable the filtering is executed on the database server, whereas with IEnumerable all objects are loaded from the database into memory and the filtering is done in .NET code (LINQ to Objects). As you can imagine, that is usually a performance killer.
I wrote a simple test in my project:
[TestMethod]
public void Test()
{
    // ObjectQuery<Department> converted to IEnumerable<Department>
    IEnumerable<Department> departments = CreateUnitOfWork().GetRepository<Department>().GetQuery();
    // No query execution here - Enumerable also has deferred execution
    var query = departments.Where(d => d.Id == 1);
    // Queries ALL DEPARTMENTS here and executes First on the retrieved result set
    var result = departments.First();
}
Here's a simple explanation:
IEnumerable<User> usersEnumerable = db.UserSet;
IQueryable<User> usersQueryable = db.UserSet;
var users = /* one of usersEnumerable or usersQueryable */;
var age32StartsWithG = users.Where(user => user.Age == 32)
.Where(user => user.Name.StartsWith("G"));
If you use usersEnumerable, when you start enumerating over it, the two Wheres will be run in sequence; first the ObjectSet will fetch all objects and the objects will be filtered down to those of age 32, and then these will be filtered down to those whose name starts with G.
If you use usersQueryable, the two Wheres will return new objects which will accumulate the selection criteria, and when you start enumerating over it, it will translate all of the criteria to a query. This makes a noticeable difference.
Normally, you don't need to worry, since you'll either say var users or ObjectSet users when you declare your variable, which means that C# will know that you are interested in invoking the most specific method that's available on ObjectSet, and the IQueryable query operator methods (Where, Select, ...) are more specific than the IEnumerable methods. However, if you pass around objects to methods that take IEnumerable parameters, they might end up invoking the wrong methods.
You can also use the way this works to your advantage by using the AsEnumerable() and AsQueryable() methods to start using the other approach. For example, var groupedPeople = users.Where(user => user.Age > 15).AsEnumerable().GroupBy(user => user.Age); will pull down the right users with a database query and then group the objects locally.
As others have said, it's worth repeating that nothing happens until you start enumerating the sequences (with foreach). You should now understand why it couldn't be any other way: if all results were retrieved at once, you couldn't build up queries to be translated into a more efficient query (like an SQL query).
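The point about the declared type choosing between the two sets of operators can be demonstrated with a small sketch. It again assumes an in-memory array wrapped with AsQueryable() in place of db.UserSet; OverloadBindingDemo and Run are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OverloadBindingDemo
{
    public static (bool QueryableResult, bool EnumerableResult) Run()
    {
        IQueryable<int> asQueryable = new[] { 31, 32, 33 }.AsQueryable();
        IEnumerable<int> asEnumerable = asQueryable; // same object, different static type

        // Binds to Queryable.Where: the predicate is kept as an expression tree.
        var q = asQueryable.Where(age => age == 32);

        // Binds to Enumerable.Where: the predicate is a compiled delegate that
        // runs in memory over whatever the source yields.
        var e = asEnumerable.Where(age => age == 32);

        return (q is IQueryable<int>, e is IQueryable<int>);
    }
}
```

OverloadBindingDemo.Run() returns (true, false). Only the first result is still a query that a provider could translate; the second has already left the provider's hands, even though both start from the same underlying object.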
The difference is that with IEnumerable, multiple filters are applied in memory one at a time. For example, from 100 elements the first filter yields 20, and the second filter then narrows those down to the 10 you need. This makes one query to the database but downloads unnecessary data. IQueryable also makes one query, but downloads only the 10 required items. The following link gives some excellent examples of how these queries work:
https://filteredcode.wordpress.com/2016/04/29/ienumerable-vs-iqueryable-part-2-practical-questions/
You are correct about IQueryable. As for IEnumerable, it would hit the database immediately upon assigning IEnumerable user.
There is no real difference between using Linq Extensions vs. syntax in the example you provided. Sometimes one or the other will be more convenient (see linq-extension-methods-vs-linq-syntax), but IMO it's more about personal preference.
Let's say I have the following query:
int x = 5;
var result = from p in db.products
where p.CategoryId == x
select p;
int count = result.Count();
List<product> products = result.ToList();
That's what I have now. But additionally I need to get a DataReader from result:
// that's what I need:
var reader = ConvertSubSonicLinqQueryToDataReader(result);
How can I convert the LINQ statement into something I can work with?
A DataReader or a DbCommand, or even plain SQL with a list of parameters.
I know SubSonic can do that (since it translates the query to plain SQL anyway), but I haven't found anything in the publicly accessible methods yet.
Any suggestions?
Converting the LINQ query is the wrong approach. LINQ returns results at a level of abstraction higher than a DataReader works at.
There's also the issue of deferred execution so your LINQ query may not be executed as a single SQL statement anyway.
Rather than use a LINQ statement, why not just use a SqlQuery instead?
var qry = new Select().From(Product.Schema).Where(Product.CategoryIdColumn).IsEqualTo(x);
return qry.ExecuteReader();
Edit:
I've just seen that you're using SubSonic 3 (not 2, which the code above is for), but the point about potential misuse of LINQ and duplication of work still stands.
The code that creates objects from the DataReader can be found in DbDataProvider.ToEnumerable. It's called from DbQueryProvider's Execute method (line 227). The best way to "understand" the LINQ magic is to place some breakpoints on DbQueryProvider methods.
I would like to run a LINQ query like this:
var words = from p in db.Words
where p.Document.Corpus.Name == corpus
//where LevenshteinDistance(p.Text.ToCharArray(), word.ToCharArray()) < threshold
select p;
But if I place the "LevenshteinDistance" function in there it will generate an error:
NotSupportedException: Method 'Char[] ToCharArray()' has no supported translation to SQL.
Is there a correct way to do this?
LINQ to SQL tries to translate the entire expression into SQL. If you want to run your distance function on SQL Server, you'll need to define a SQL Server UDF and map a custom CLR method to that. If you're content to get all the results and then filter client-side on the distance function, use AsEnumerable():
var words = (from p in db.Words
where p.Document.Corpus.Name == corpus
select p)
.AsEnumerable()
.Where(p => /* distance function */ < threshold);
AsEnumerable switches the rest of the query to LINQ to Objects: everything before it is translated to SQL and executed when the results are enumerated, while the remainder, including your distance delegate, runs client-side (instead of being translated to SQL).
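For completeness, here is a sketch of the client-side variant with an actual distance function. It assumes an in-memory word list wrapped with AsQueryable() in place of db.Words, and uses a textbook dynamic-programming Levenshtein distance, which is not necessarily the implementation the question had in mind. All names (ClientSideFilterDemo, Levenshtein, Run) are invented for the example.

```csharp
using System;
using System.Linq;

static class ClientSideFilterDemo
{
    // Textbook dynamic-programming Levenshtein distance (an assumption; the
    // question does not show its own implementation).
    static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    public static string[] Run()
    {
        // In-memory stand-in for db.Words.
        IQueryable<string> words =
            new[] { "kitten", "sitting", "mitten", "banana" }.AsQueryable();

        return words
            .Where(w => w.Length > 5)                   // the part a real provider would translate
            .AsEnumerable()                             // everything below runs client-side
            .Where(w => Levenshtein(w, "kitten") < 3)   // custom method is fine here
            .ToArray();
    }
}
```

ClientSideFilterDemo.Run() returns "kitten" and "mitten". Against a real database, keep in mind that every row passing the first Where is transferred to the client before the distance filter runs, so it pays to make the translatable part as selective as possible.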