I can't wrap my brain around this, I don't get why this behaves likes this.
I have made an OData query that returns a collection of 175 items
from c in navClient.Item where c.No != "" && c.Last_Date_Modified > dtfrom select c
navClient.Item is a System.Data.Services.Client.DataServiceQuery
However I want to take first 100 items from the collection using .Take(100) but get 0 items. It isn't until I do .Take(121) I get my first item, which is the first item in the collection, .Take(122) returns the first two items and so on.
Any idea why this behaves like this?
Edit:
Doing ToList first then Take(100) returns first 100 as expected.
My only theory right know is that the table I'm running my query against is just a temp table that is out of sync with the database.
You have described what looks like an issue in the server implementation that handles your request. This behaviour would occur if the .Take(100) was evaluated before the filter criteria. This issues can easily occur in the client or the server implementations. The .ToList() works because it brings back the entire collection to the client and then applies the filter criteria with standard linq to objects evaluation.
While not the cause today...
as a general rule, whenever you specify .Take() or .Skip() you should have also specified the explicit .OrderBy(), when the order is ambiguous in a limiting or paging query so too can be the results.
There are 3 common levels at play here:
Client Query Resolution
Your linq query is first resolved into a URL that will be used to make the request to the API, you should first check that the URL is correctly constructed.
You should be expecting a URL similar to:
/odata/Item?$top=100&$filter=No ne '' AND Last_Date_Modified gt '2020-01-20T17:24:21.3605918+11:00'
You can inspect the URL using the .RequestUri property on the query, try something like the following to capture it:
var query = from c in navClient.Item
where c.No != "" && c.Last_Date_Modified > dtfrom
select c;
query = query.Take(100);
System.Diagnostics.Debug.WriteLine(((DataServiceQuery<Item>)query).RequestUri);
The URL needs to through both the $Top and the $filter query options to the server.
API Controller
The client is expecting the controller to handle or pass through both the $filter criteria and the $top expression to the underlying data store.
Many default EF or Unit of work based OData implementations would simply return a proper IQueryable<T> expression using deferred execution. However if the backend store uses a repository pattern or the controller is otherwise constructing the dataset then that code may need to explicitly handle the $filter criteria before applying the $top
There are many simple OData controller examples out there that have a mocked List<T> or otherwise IEnumerable<T> backend, the red flag is that if in the controller code you see a .ToList() or a .AsQueryable() on the main expression, or it returns an IEnumerable<T> response, then it indicates that the expression is not using deferred execution at all. This means that it most likely needs to manage the query options manually.
OData Query Expression Resolver
Especially in the .Net Implementation, the EnableQueryAttribute by default applies the query options to the final IQueryable output immediately before or effectively as part of the serialization process.
So the $top and $filter (and $orderby,$select,$expand) will be applied again even if the controller method has already evaluated these options. This isn't normally a problem, its a validation fail-safe and means that you do not strictly have to return IQueryable<T> from your controller methods at all.
Due to this, if our backend supports deferred IQueryable<T> linq expressions (like DbSet<T> in an EF DbContext) then in the controller implementation we do not usually do anything with the query options, you let the EnableQueryAttribute process it for you.
However, if the controller were to process the paging expression first .Take(100) and not apply the $filter correctly or at all, then the EnableQueryAttribute would apply the criteria to a list that had already been restricted, in your example, perhaps the first 100 items
What is the difference between returning IQueryable<T> vs. IEnumerable<T>, when should one be preferred over the other?
IQueryable<Customer> custs = from c in db.Customers
where c.City == "<City>"
select c;
IEnumerable<Customer> custs = from c in db.Customers
where c.City == "<City>"
select c;
Will both be deferred execution and when should one be preferred over the other?
Yes, both will give you deferred execution.
The difference is that IQueryable<T> is the interface that allows LINQ-to-SQL (LINQ.-to-anything really) to work. So if you further refine your query on an IQueryable<T>, that query will be executed in the database, if possible.
For the IEnumerable<T> case, it will be LINQ-to-object, meaning that all objects matching the original query will have to be loaded into memory from the database.
In code:
IQueryable<Customer> custs = ...;
// Later on...
var goldCustomers = custs.Where(c => c.IsGold);
That code will execute SQL to only select gold customers. The following code, on the other hand, will execute the original query in the database, then filtering out the non-gold customers in the memory:
IEnumerable<Customer> custs = ...;
// Later on...
var goldCustomers = custs.Where(c => c.IsGold);
This is quite an important difference, and working on IQueryable<T> can in many cases save you from returning too many rows from the database. Another prime example is doing paging: If you use Take and Skip on IQueryable, you will only get the number of rows requested; doing that on an IEnumerable<T> will cause all of your rows to be loaded in memory.
The top answer is good but it doesn't mention expression trees which explain "how" the two interfaces differ. Basically, there are two identical sets of LINQ extensions. Where(), Sum(), Count(), FirstOrDefault(), etc all have two versions: one that accepts functions and one that accepts expressions.
The IEnumerable version signature is: Where(Func<Customer, bool> predicate)
The IQueryable version signature is: Where(Expression<Func<Customer, bool>> predicate)
You've probably been using both of those without realizing it because both are called using identical syntax:
e.g. Where(x => x.City == "<City>") works on both IEnumerable and IQueryable
When using Where() on an IEnumerable collection, the compiler passes a compiled function to Where()
When using Where() on an IQueryable collection, the compiler passes an expression tree to Where(). An expression tree is like the reflection system but for code. The compiler converts your code into a data structure that describes what your code does in a format that's easily digestible.
Why bother with this expression tree thing? I just want Where() to filter my data.
The main reason is that both the EF and Linq2SQL ORMs can convert expression trees directly into SQL where your code will execute much faster.
Oh, that sounds like a free performance boost, should I use AsQueryable() all over the place in that case?
No, IQueryable is only useful if the underlying data provider can do something with it. Converting something like a regular List to IQueryable will not give you any benefit.
Yes, both use deferred execution. Let's illustrate the difference using the SQL Server profiler....
When we run the following code:
MarketDevEntities db = new MarketDevEntities();
IEnumerable<WebLog> first = db.WebLogs;
var second = first.Where(c => c.DurationSeconds > 10);
var third = second.Where(c => c.WebLogID > 100);
var result = third.Where(c => c.EmailAddress.Length > 11);
Console.Write(result.First().UserName);
In SQL Server profiler we find a command equal to:
"SELECT * FROM [dbo].[WebLog]"
It approximately takes 90 seconds to run that block of code against a WebLog table which has 1 million records.
So, all table records are loaded into memory as objects, and then with each .Where() it will be another filter in memory against these objects.
When we use IQueryable instead of IEnumerable in the above example (second line):
In SQL Server profiler we find a command equal to:
"SELECT TOP 1 * FROM [dbo].[WebLog] WHERE [DurationSeconds] > 10 AND [WebLogID] > 100 AND LEN([EmailAddress]) > 11"
It approximately takes four seconds to run this block of code using IQueryable.
IQueryable has a property called Expression which stores a tree expression which starts being created when we used the result in our example (which is called deferred execution), and at the end this expression will be converted to an SQL query to run on the database engine.
Both will give you deferred execution, yes.
As for which is preferred over the other, it depends on what your underlying datasource is.
Returning an IEnumerable will automatically force the runtime to use LINQ to Objects to query your collection.
Returning an IQueryable (which implements IEnumerable, by the way) provides the extra functionality to translate your query into something that might perform better on the underlying source (LINQ to SQL, LINQ to XML, etc.).
A lot has been said previously, but back to the roots, in a more technical way:
IEnumerable is a collection of objects in memory that you can enumerate - an in-memory sequence that makes it possible to iterate through (makes it way easy for within foreach loop, though you can go with IEnumerator only). They reside in the memory as is.
IQueryable is an expression tree that will get translated into something else at some point with ability to enumerate over the final outcome. I guess this is what confuses most people.
They obviously have different connotations.
IQueryable represents an expression tree (a query, simply) that will be translated to something else by the underlying query provider as soon as release APIs are called, like LINQ aggregate functions (Sum, Count, etc.) or ToList[Array, Dictionary,...]. And IQueryable objects also implement IEnumerable, IEnumerable<T> so that if they represent a query the result of that query could be iterated. It means IQueryable don't have to be queries only. The right term is they are expression trees.
Now how those expressions are executed and what they turn to is all up to so called query providers (expression executors we can think them of).
In the Entity Framework world (which is that mystical underlying data source provider, or the query provider) IQueryable expressions are translated into native T-SQL queries. Nhibernate does similar things with them. You can write your own one following the concepts pretty well described in LINQ: Building an IQueryable Provider link, for example, and you might want to have a custom querying API for your product store provider service.
So basically, IQueryable objects are getting constructed all the way long until we explicitly release them and tell the system to rewrite them into SQL or whatever and send down the execution chain for onward processing.
As if to deferred execution it's a LINQ feature to hold up the expression tree scheme in the memory and send it into the execution only on demand, whenever certain APIs are called against the sequence (the same Count, ToList, etc.).
The proper usage of both heavily depends on the tasks you're facing for the specific case. For the well-known repository pattern I personally opt for returning IList, that is IEnumerable over Lists (indexers and the like). So it is my advice to use IQueryable only within repositories and IEnumerable anywhere else in the code. Not saying about the testability concerns that IQueryable breaks down and ruins the separation of concerns principle. If you return an expression from within repositories consumers may play with the persistence layer as they would wish.
A little addition to the mess :) (from a discussion in the comments))
None of them are objects in memory since they're not real types per se, they're markers of a type - if you want to go that deep. But it makes sense (and that's why even MSDN put it this way) to think of IEnumerables as in-memory collections whereas IQueryables as expression trees. The point is that the IQueryable interface inherits the IEnumerable interface so that if it represents a query, the results of that query can be enumerated. Enumeration causes the expression tree associated with an IQueryable object to be executed.
So, in fact, you can't really call any IEnumerable member without having the object in the memory. It will get in there if you do, anyways, if it's not empty. IQueryables are just queries, not the data.
In general terms I would recommend the following:
Return IQueryable<T> if you want to enable the developer using your method to refine the query you return before executing.
Return IEnumerable if you want to transport a set of Objects to enumerate over.
Imagine an IQueryable as that what it is - a "query" for data (which you can refine if you want to). An IEnumerable is a set of objects (which has already been received or was created) over which you can enumerate.
In general you want to preserve the original static type of the query until it matters.
For this reason, you can define your variable as 'var' instead of either IQueryable<> or IEnumerable<> and you will know that you are not changing the type.
If you start out with an IQueryable<>, you typically want to keep it as an IQueryable<> until there is some compelling reason to change it. The reason for this is that you want to give the query processor as much information as possible. For example, if you're only going to use 10 results (you've called Take(10)) then you want SQL Server to know about that so that it can optimize its query plans and send you only the data you'll use.
A compelling reason to change the type from IQueryable<> to IEnumerable<> might be that you are calling some extension function that the implementation of IQueryable<> in your particular object either cannot handle or handles inefficiently. In that case, you might wish to convert the type to IEnumerable<> (by assigning to a variable of type IEnumerable<> or by using the AsEnumerable extension method for example) so that the extension functions you call end up being the ones in the Enumerable class instead of the Queryable class.
There is a blog post with brief source code sample about how misuse of IEnumerable<T> can dramatically impact LINQ query performance: Entity Framework: IQueryable vs. IEnumerable.
If we dig deeper and look into the sources, we can see that there are obviously different extension methods are perfomed for IEnumerable<T>:
// Type: System.Linq.Enumerable
// Assembly: System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
// Assembly location: C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Core.dll
public static class Enumerable
{
public static IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
return (IEnumerable<TSource>)
new Enumerable.WhereEnumerableIterator<TSource>(source, predicate);
}
}
and IQueryable<T>:
// Type: System.Linq.Queryable
// Assembly: System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
// Assembly location: C:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Core.dll
public static class Queryable
{
public static IQueryable<TSource> Where<TSource>(
this IQueryable<TSource> source,
Expression<Func<TSource, bool>> predicate)
{
return source.Provider.CreateQuery<TSource>(
Expression.Call(
null,
((MethodInfo) MethodBase.GetCurrentMethod()).MakeGenericMethod(
new Type[] { typeof(TSource) }),
new Expression[]
{ source.Expression, Expression.Quote(predicate) }));
}
}
The first one returns enumerable iterator, and the second one creates query through the query provider, specified in IQueryable source.
The main difference between “IEnumerable” and “IQueryable” is about where the filter logic is executed. One executes on the client side (in memory) and the other executes on the database.
For example, we can consider an example where we have 10,000 records for a user in our database and let's say only 900 out which are active users, so in this case if we use “IEnumerable” then first it loads all 10,000 records in memory and then applies the IsActive filter on it which eventually returns the 900 active users.
While on the other hand on the same case if we use “IQueryable” it will directly apply the IsActive filter on the database which directly from there will return the 900 active users.
I would like to clarify a few things due to seemingly conflicting responses (mostly surrounding IEnumerable).
(1) IQueryable extends the IEnumerable interface. (You can send an IQueryable to something which expects IEnumerable without error.)
(2) Both IQueryable and IEnumerable LINQ attempt lazy loading when iterating over the result set. (Note that implementation can be seen in interface extension methods for each type.)
In other words, IEnumerables are not exclusively "in-memory". IQueryables are not always executed on the database. IEnumerable must load things into memory (once retrieved, possibly lazily) because it has no abstract data provider. IQueryables rely on an abstract provider (like LINQ-to-SQL), although this could also be the .NET in-memory provider.
Sample use case
(a) Retrieve list of records as IQueryable from EF context. (No records are in-memory.)
(b) Pass the IQueryable to a view whose model is IEnumerable. (Valid. IQueryable extends IEnumerable.)
(c) Iterate over and access the data set's records, child entities and properties from the view. (May cause exceptions!)
Possible Issues
(1) The IEnumerable attempts lazy loading and your data context is expired. Exception thrown because provider is no longer available.
(2) Entity Framework entity proxies are enabled (the default), and you attempt to access a related (virtual) object with an expired data context. Same as (1).
(3) Multiple Active Result Sets (MARS). If you are iterating over the IEnumerable in a foreach( var record in resultSet ) block and simultaneously attempt to access record.childEntity.childProperty, you may end up with MARS due to lazy loading of both the data set and the relational entity. This will cause an exception if it is not enabled in your connection string.
Solution
I have found that enabling MARS in the connection string works unreliably. I suggest you avoid MARS unless it is well-understood and explicitly desired.
Execute the query and store results by invoking resultList = resultSet.ToList() This seems to be the most straightforward way of ensuring your entities are in-memory.
In cases where the you are accessing related entities, you may still require a data context. Either that, or you can disable entity proxies and explicitly Include related entities from your DbSet.
I recently ran into an issue with IEnumerable v. IQueryable. The algorithm being used first performed an IQueryable query to obtain a set of results. These were then passed to a foreach loop, with the items instantiated as an Entity Framework (EF) class. This EF class was then used in the from clause of a Linq to Entity query, causing the result to be IEnumerable.
I'm fairly new to EF and Linq for Entities, so it took a while to figure out what the bottleneck was. Using MiniProfiling, I found the query and then converted all of the individual operations to a single IQueryable Linq for Entities query. The IEnumerable took 15 seconds and the IQueryable took 0.5 seconds to execute. There were three tables involved and, after reading this, I believe that the IEnumerable query was actually forming a three table cross-product and filtering the results.
Try to use IQueryables as a rule-of-thumb and profile your work to make your changes measurable.
We can use both for the same way, and they are only different in the performance.
IQueryable only executes against the database in an efficient way. It means that it creates an entire select query and only gets the related records.
For example, we want to take the top 10 customers whose name start with ‘Nimal’. In this case the select query will be generated as select top 10 * from Customer where name like ‘Nimal%’.
But if we used IEnumerable, the query would be like select * from Customer where name like ‘Nimal%’ and the top ten will be filtered at the C# coding level (it gets all the customer records from the database and passes them into C#).
In addition to first 2 really good answers (by driis & by Jacob) :
IEnumerable
interface is in the System.Collections namespace.
The IEnumerable object represents a set of data in memory and can move on this data only forward. The query represented by the IEnumerable object is executed immediately and completely, so the application receives data quickly.
When the query is executed, IEnumerable loads all the data, and if we need to filter it, the filtering itself is done on the client side.
IQueryable interface is located in the System.Linq namespace.
The IQueryable object provides remote access to the database and allows you to navigate through the data either in a direct order from beginning to end, or in the reverse order. In the process of creating a query, the returned object is IQueryable, the query is optimized. As a result, less memory is consumed during its execution, less network bandwidth, but at the same time it can be processed slightly more slowly than a query that returns an IEnumerable object.
What to choose?
If you need the entire set of returned data, then it's better to use IEnumerable, which provides the maximum speed.
If you DO NOT need the entire set of returned data, but only some filtered data, then it's better to use IQueryable.
In addition to the above, it's interesting to note that you can get exceptions if you use IQueryable instead of IEnumerable:
The following works fine if products is an IEnumerable:
products.Skip(-4);
However if products is an IQueryable and it's trying to access records from a DB table, then you'll get this error:
The offset specified in a OFFSET clause may not be negative.
This is because the following query was constructed:
SELECT [p].[ProductId]
FROM [Products] AS [p]
ORDER BY (SELECT 1)
OFFSET #__p_0 ROWS
and OFFSET can't have a negative value.
Usually the distinction between LINQ to SQL and LINQ to Objects isn't much of an issue, but how can I determine which is happening?
It would be useful to know when writing the code, but I fear one can only be sure at run time sometimes.
It's not micro optimization to make the distinction between Linq-To-Sql and Linq-To-Objects. The latter requires all data to be loaded into memory before you start filtering it. Of course, that can be a major issue.
Most LINQ methods are using deferred execution, which means that it's just building the query but it's not yet executed (like Select or Where). Few others are executing the query and materialize the result into an in-memory collection (like ToLIst or ToArray). If you use AsEnumerable you are also using Linq-To-Objects and no SQL is generated for the parts after it, which means that the data must be loaded into memory (yet still using deferred execution).
So consider the following two queries. The first selects and filters in the database:
var queryLondonCustomers = from cust in db.customers
where cust.City == "London"
select cust;
whereas the second selects all and filters via Linq-To-Objects:
var queryLondonCustomers = from cust in db.customers.AsEnumerable()
where cust.City == "London"
select cust;
The latter has one advantage: you can use any .NET method since it doesn't need to be translated to SQL (e.g. !String.IsNullOrWhiteSpace(cust.City)).
If you just get something that is an IEnumerable<T>, you can't be sure if it's actually a query or already an in-memory object. Even the try-cast to IQueryable<T> will not tell you for sure what it actually is because of the AsQueryable-method. Maybe you could try-cast it to a collection type. If the cast succeeds you can be sure that it's already materialized but otherwise it doesn't tell you if it's using Linq-To-Sql or Linq-To-Objects:
bool isMaterialized = queryLondonCustomers as ICollection<Customer> != null;
Related: EF ICollection Vs List Vs IEnumerable Vs IQueryable
The first solution comes into my mind is checking the query provider.
If the query is materialized, which means the data is loaded into memory, EnumerableQuery(T) is used. Otherwise, a special query provider is used, for example, System.Data.Entity.Internal.Linq.DbQueryProvider for entityframework.
var materialized = query
.AsQueryable()
.Provider
.GetType()
.GetGenericTypeDefinition() == typeof(EnumerableQuery<>);
However the above are ideal cases because someone can implement a custom query provider behaves like EnumerableQuery.
I had the same question, for different reasons.
Judging purely on your title & initial description (which is why google search brought me here).
Pre compilation, given an instance that implements IQueryable, there's no way to know the implementation behind the interface.
At runtime, you need to check the instance's Provider property like #Danny Chen mentioned.
public enum LinqProvider
{
Linq2SQL, Linq2Objects
}
public static class LinqProviderExtensions
{
public static LinqProvider LinqProvider(this IQueryable query)
{
if (query.Provider.GetType().IsGenericType && query.Provider.GetType().GetGenericTypeDefinition() == typeof(EnumerableQuery<>))
return LinqProvider.Linq2Objects;
if (typeof(ICollection<>).MakeGenericType(query.ElementType).IsAssignableFrom(query.GetType()))
return LinqProvider.Linq2Objects;
return LinqProvider.Linq2SQL;
}
}
In our case, we are adding additional filters dynamically, but ran into issues with different handling of case-sensitivity/nullreference handling on different providers.
Hence, at runtime we had to tweak the filters that we add based on the type of provider, and ended up adding this extension method:
Using EF core in net core 6
To see if the provider is an EF provider, use the following code:
if (queryable.Provider is Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider)
{
// Queryable is backed by EF and is not an in-memory/client-side queryable.
}
One could get the opposite by testing the provider against System.Linq.EnumerableQuery (base type of EnumerableQuery<T> - so you don't have to test generics).
This is useful if you have methods like EF.Functions.Like(...) which can only be executed in the database - and you want to branch to something else in case of client-side execution.
I'm using Microsoft's Entity Framework as an ORM and am wondering how to solve the following problem.
I want to get a number of Product objects from the Products collection where the Product.StartDate is greater than today. (This is a simplified version of the whole problem.)
I currently use:
var query = dbContext.Products.Where(p => p.StartDate > DateTime.Now);
When this is executed, after using ToList() for example on query, it works and the SQL created is effectively:
SELECT * FROM Product WHERE StartDate > (GetDate());
However, I want to move the predicate to a function for better maintainability, so I tried this:
private Func<Product, bool> GetFilter()
{
Func<Product, bool> filter = p => p.StartDate > DateTime.Now;
return filter;
}
var query = dbContext.Products.Where(GetFilter());
This also works from a code point of view insofar as it returns the same Product set but this time the SQL created is analogous to:
SELECT * FROM Product;
The filter is moved from the SQL Server to the client making it much less efficient.
So my questions are:
Why is this happening, why does the LINQ parser treat these two formats so differently?
What can I do to take advantage of having the filter separate but having it executed on the server?
You need to use an Expression<Func<Product, bool>> in order for it to work like you intend. A plain Func<Product, bool> tells LINQ that you want it to run the Where in MSIL in your program, not in SQL. That's why the SQL is pulling in the whole table, then your .NET code is running the predicate on the entire table.
You are returning a Func, but to inject the predicate into the SQL, LINQ requires an expression tree. It should work if you change the return type of your method (and of your local variable, of course) to Expression<Func<Product, bool>>.
Since in second case filter func maybe arbitrary LINQ to EF can't parse you filter to SQL and has to resolve it on client side.
As far as I understand the only thing LINQ supports, which Scala currently doesn't with its collection library, is the integration with a SQL Database.
As far as I understand LINQ can "accumulate" various operations and can give "the whole" statement to the database when queried to process it there, preventing that a simple SELECT first copies the whole table into data structures of the VM.
If I'm wrong, I would be happy to be corrected.
If not, what is necessary to support the same in Scala?
Wouldn't it possible to write a library which implements the collection interface, but doesn't have any data structures backing it but a String which gets assembled with following collection into the required Database statement?
Or am I completely wrong with my observations?
As the author of ScalaQuery, I don't have much to add to Stilgar's explanation. The part of LINQ which is missing in Scala is indeed the expression trees. That is the reason why ScalaQuery performs all its computations on Column and Table types instead of the basic types of those entities.
You declare a table as a Table object with a projection (tuple) of its columns, e.g.:
class User extends Table[(Int, String)] {
def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
def name = column[String]("name")
def * = id ~ name
}
User.id and User.name are now of type Column[Int] and Column[String] respectively. All computations are performed in the Query monad (which is a more natural representation of database queries than the SQL statements that have to be created from it). Take the following query:
val q = for(u <- User if u.id < 5) yield u.name
After some implicit conversions and desugaring this translates to:
val q:Query[String] =
Query[User.type](User).filter(u => u.id < ConstColumn[Int](5)).map(u => u.name)
The filter and map methods do not have to inspect their arguments as expression trees in order to build the query, they just run them. As you can see from the types, what looks superficially like "u.id:Int < 5:Int" is actually "u.id:Column[Int] < u.id:Column[Int]". Running this expression results in a query AST like Operator.Relational("<", NamedColumn("user", "id"), ConstColumn(5)). Similarly, the "filter" and "map" methods of the Query monad do not actually perform filtering and mapping but instead build up an AST that describes these operations.
The QueryBuilder then uses this AST to construct the actual SQL statement for the database (with a DBMS-specific syntax).
An alternative approach has been taken by ScalaQL which uses a compiler plugin to work directly with expression trees, ensure that they only contain the language subset which is allowed in database queries, and construct the queries statically.
I should mention that Scala does have experimental support for expression trees. If you pass an anonymous function as an argument to a method expecting a parameter of type scala.reflect.Code[A], you get an AST.
scala> import scala.reflect.Code
import scala.reflect.Code
scala> def codeOf[A](code: Code[A]) = code
codeOf: [A](code:scala.reflect.Code[A])scala.reflect.Code[A]
scala> codeOf((x: Int) => x * x).tree
res8: scala.reflect.Tree=Function(List(LocalValue(NoSymbol,x,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))),Apply(Select(Ident(LocalValue(NoSymbol,x,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))),Method(scala.Int.$times,MethodType(List(LocalValue(NoSymbol,x$1,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))),PrefixedType(ThisType(Class(scala)),Class(scala.Int))))),List(Ident(LocalValue(NoSymbol,x,PrefixedType(ThisType(Class(scala)),Class(scala.Int)))))))
This has been used in the bytecode generation library 'Mnemonics', which was presented by its author Johannes Rudolph at Scala Days 2010.
With LINQ the compiler checks to see if the lambda expression is compiled to IEnumerable or to IQueryable. The first works like Scala collections. The second compiles the expression to an expression tree (i.e. data structure). The power of LINQ is that the compiler itself can translate the lambdas to expression trees. You can write a library that builds expression trees with interface similar to what you have for collection but how are you goning to make the compiler build data structures (instead of JVM code) from lambdas?
That being said I am not sure what Scala provides in this respect. Maybe it is possible to build data structures out of lambdas in Scala but in any case I believe you need a similar feature in the compiler to build support for databases. Mind you that databases are not the only underlying data source that you can build providers for. There are numerous LINQ providers to stuff like Active Directory or the Ebay API for example.
Edit:
Why there cannot be just an API?
In order to make queries you do not only use the API methods (filter, Where, etc...) but you also use lambda expressions as arguments of these methods .Where(x => x > 3) (C# LINQ). The compilers translate the lambdas to bytecode. The API needs to build data structures (expression trees) so that you can translate the data structure to the underlying data source. Basically you need the compiler to do this for you.
Disclaimer 1:
Maybe (just maybe) there is some way to create proxy objects that execute the lambdas but overload the operators to produce data structures. This would result in slightly worse performance than the actual LINQ (runtime vs compile time). I am not sure if such a library is possible. Maybe the ScalaQuery library uses similar approach.
Disclaimer 2:
Maybe the Scala language actually can provide the lambdas as an inspectable objects so that you can retrieve the expression tree. This would make the lambda feature in Scala equivalent to the one in C#. Maybe the ScalaQuery library uses this hypothetical feature.
Edit 2:
I did a bit of digging. It seems like ScalaQuery uses the library approach and overloads a bunch of operators to produce the trees at runtime. I am not entirely sure about the details because I am not familiar with Scala terminology and have hard time reading the complex Scala code in the article:
http://szeiger.de/blog/2008/12/21/a-type-safe-database-query-dsl-for-scala/
Like every object which can be used in or returned from a query, a table is parameterized with the type of the values it represents. This is always a tuple of individual column types, in our case Integers and Strings (note the use of java.lang.Integer instead of Int; more about that later). In this respect SQuery (as I’ve named it for now) is closer to HaskellDB than to LINQ because Scala (like most languages) does not give you access to an expression’s AST at runtime. In LINQ you can write queries using the real types of the values and columns in your database and have the query expression’s AST translated to SQL at runtime. Without this option we have to use meta-objects like Table and Column to build our own AST from these.
Very cool library indeed. I hope in the future it gets the love it deserves and becomes real production ready tool.
You probably want something like http://scalaquery.org/. It does exactly what #Stilgar answer suggests, except it's only SQL.
check out http://squeryl.org/