How to build () => new { x.prop} lambda expression dynamically? - c#

How can I dynamically create the LINQ expression below?
IQueryable abc = QueryData.Select(a => new { a, TempData = a.customer.Select(b => b.OtherAddress).ToList()[0] }).OrderBy(a => a.TempData).Select(a => a.a);
public class Orders
{
public long OrderID { get; set; }
public string CustomerID { get; set; }
public int EmployeeID { get; set; }
public double Freight { get; set; }
public string ShipCountry { get; set; }
public string ShipCity { get; set; }
public Customer[] customer {get; set;}
}
public class Customer
{
public string OtherAddress { get; set; }
public int CustNum { get; set; }
}
Actual data:
List<Orders> order = new List<Orders>();
Customer[] cs = { new Customer { CustNum = 5, OtherAddress = "Hello" }, new
Customer { CustNum = 986, OtherAddress = "Other" } };
Customer[] cso = { new Customer { OtherAddress = "T", CustNum = 5 }, new
Customer { CustNum = 777, OtherAddress = "other" } };
order.Add(new Orders(code + 1, "ALFKI", i + 0, 2.3 * i, "Mumbari", "Berlin", cs));
order.Add(new Orders(code + 2, "ANATR", i + 2, 3.3 * i, "Sydney", "Madrid", cso));
order.Add(new Orders(code + 3, "ANTON", i + 1, 4.3 * i, "NY", "Cholchester", cs));
order.Add(new Orders(code + 4, "BLONP", i + 3, 5.3 * i, "LA", "Marseille", cso));
order.Add(new Orders(code + 5, "BOLID", i + 4, 6.3 * i, "Cochin", "Tsawassen", cs));
public Orders(long OrderId, string CustomerId, int EmployeeId, double Freight, string ShipCountry, string ShipCity, Customer[] Customer = null)
{
this.OrderID = OrderId;
this.CustomerID = CustomerId;
this.EmployeeID = EmployeeId;
this.Freight = Freight;
this.ShipCountry = ShipCountry;
this.ShipCity = ShipCity;
this.customer = Customer;
}
If I sort by the OtherAddress field at index 0, only the Customer field gets sorted. I need to sort the whole order data based on the OtherAddress field.
I have tried the below way:
private static IQueryable PerformComplexDataOperation<T>(this IQueryable<T> dataSource, string select)
{
string[] selectArr = select.Split('.');
ParameterExpression param = Expression.Parameter(typeof(T), "a");
Expression property = param;
for (int i = 0; i < selectArr.Length; i++)
{
int n;
if (int.TryParse(selectArr[i + 1], out n))
{
int index = Convert.ToInt16(selectArr[i + 1]);
property = Expression.PropertyOrField(Expression.ArrayIndex(Expression.PropertyOrField(property, selectArr[i]), Expression.Constant(index)), selectArr[i + 2]);
i = i + 2;
}
else property = Expression.PropertyOrField(property, selectArr[i]);
}
var TempData = dataSource.Select(Expression.Lambda<Func<T, object>>(property, param));
IQueryable<object> data = dataSource.Select(a => new { a, TempData = property});// Expression.Lambda<Func<T, object>>(property, param) });
return data;
}
Method call : PerformComplexDataOperation(datasource, "customer.0.OtherAddress")
I can get the value from this line: var TempData = dataSource.Select(Expression.Lambda<Func<T, object>>(property, param));
But I can't get the values in dataSource.Select(a => new { a, TempData = property});
It works when we use the code below:
var TempData = dataSource.Select(Expression.Lambda<Func<T, object>>(property, param)).ToList();
IQueryable<object> data = dataSource.Select((a, i) => new { a, TempData = TempData[i] });
Is this a proper solution?

XY problem?
This feels like it's a case of the XY problem. Your solution is contrived (no offense intended), and the problem you're trying to solve is not apparent by observing your proposed solution.
However, I do think there is technical merit to your question when I read the intention of your code as opposed to your described intention.
Redundant steps
IQueryable abc = QueryData
.Select(a => new {
a,
TempData = a.customer.Select(b => b.OtherAddress).ToList()[0] })
.OrderBy(a => a.TempData)
.Select(a => a.a);
First of all, when you inline this into a single chained command, TempData becomes a redundant step. You could simply shift the first TempData logic (from the first Select) directly into the OrderBy lambda:
IQueryable abc = QueryData
.OrderBy(a => a.customer.Select(b => b.OtherAddress).ToList()[0])
.AsQueryable();
As you can see, this also means that you no longer need the second Select (since it existed only to undo the earlier Select)
Parametrization and method abstraction
You mentioned you're looking for a usage similar to:
PerformComplexDataOperation(datasource, "customer.0.OtherAddress")
However, this doesn't quite make sense, since you've defined an extension method:
private static IQueryable PerformComplexDataOperation<T>(this IQueryable<T> dataSource, string select)
I think you need to reconsider your intended usage, and also the method as it is currently defined.
Minor note, the return type of the method should be IQueryable<T> instead of IQueryable. Otherwise, you lose the generic type definition that LINQ tends to rely on.
Based on the method signature, your expected usage should be myData = myData.PerformComplexDataOperation("customer.0.OtherAddress").
Strings are easy hacks to allow you to circumvent an otherwise strongly typed system. While your string usage is technically functional, it is non-idiomatic and it opens the door to unreadable and/or bad code.
Using strings leads to a contrived string parsing logic. Look at your method definition, and count how many lines are there simply to parse the string and translate that into actual code again.
Strings also mean that you get no Intellisense, which can cause unseen bugs further down the line.
So let's not use strings. Let's look back at how I initially rewrote the OrderBy:
.OrderBy(a => a.customer.Select(b => b.OtherAddress).ToList()[0])
When you consider OrderBy as an ordinary method, no different from any custom method you and I can develop, then you should understand that a => a.customer.Select(b => b.OtherAddress).ToList()[0] is nothing more than a parameter that's being passed.
The type of this parameter is Func<A,B>, where:
A equals the type of your entity. So in this case, A is the same as T in your existing method.
B equals the type of your sorting value.
OrderBy(x => x.MyIntProp) means that B is of type int.
OrderBy(x => x.MyStringProp) means that B is of type string.
OrderBy(x => x.Customer) means that B is of type Customer.
Generally speaking, the type of B doesn't matter for you (since it will only be used by LINQ's internal ordering method).
Let's look at a very simple extension method that uses a parameter for its OrderBy:
public static IQueryable<A> OrderData<A, B>(this IQueryable<A> data, Func<A, B> orderbyClause)
{
return data
.OrderBy(orderbyClause)
.AsQueryable();
}
Using the method looks like:
IQueryable<MyEntity> myData = GetData(); //assume this returns a correct value
myData = myData.OrderData(x => x.MyIntProperty);
Notice how I did not need to specify either of the generic type arguments when calling the method.
A is already known to be MyEntity, because we're calling the method on an object of type IQueryable<MyEntity>.
B is already known to be an int, since the used lambda method returns a value of type int (from MyIntProperty)
As it stands, my example method is just a boring wrapper that does nothing different from the existing OrderBy method. But you can change the method's logic to suit your needs, and actually make it meaningfully different from the existing OrderBy method.
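One caveat about the wrapper above: because its parameter is a Func<A, B>, the call binds to Enumerable.OrderBy, so the sort runs in memory (which is why the AsQueryable() call is needed afterwards). A minimal sketch of the same wrapper taking an Expression<Func<A, B>> instead, so that it binds to Queryable.OrderBy and the provider can still translate the ordering. MyEntity, its properties and the sample data are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, used only for this illustration.
public class MyEntity
{
    public int MyIntProperty { get; set; }
    public string Name { get; set; }
}

public static class QueryExtensions
{
    // Taking an expression tree (not a compiled delegate) makes the call bind to
    // Queryable.OrderBy, so the query stays composable by the provider.
    public static IQueryable<A> OrderData<A, B>(
        this IQueryable<A> data, Expression<Func<A, B>> orderByClause)
    {
        return data.OrderBy(orderByClause);
    }
}

public static class Demo
{
    public static List<string> SortedNames()
    {
        IQueryable<MyEntity> myData = new List<MyEntity>
        {
            new MyEntity { MyIntProperty = 3, Name = "c" },
            new MyEntity { MyIntProperty = 1, Name = "a" },
            new MyEntity { MyIntProperty = 2, Name = "b" },
        }.AsQueryable();

        // The call site is unchanged: the compiler converts the lambda
        // to an expression tree instead of a delegate.
        return myData.OrderData(x => x.MyIntProperty).Select(x => x.Name).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SortedNames())); // a, b, c
    }
}
```

The calling code looks the same in both versions. Only the parameter type decides whether the ordering is handed to the query provider or executed in memory.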
Your expectations
Your description of your goals makes me think that you're expecting too much.
I need to sort "customer.0.OtherAddress" nested file compared to whole base data. But it sorted only for that field. For this case, I find that field value and stored it to TempData. Then Sorting the TempData field.
i need to sort the parent nodes not an sibling alone. QueryData.Select(a => new { a, TempData = a.customer.Select(b => b.OtherAddress).ToList()[0] }).OrderBy(a => a.TempData).Select(a => a.a); I sorting a original data based on temp data. Then i split the original data alone.
It's not possible to sort an entire nested data structure based on a single OrderBy call. OrderBy only sorts the collection on which you call OrderBy, nothing else.
If you have a list of Customer entities, who each have a list of Address entities, then you are working with many lists (a list of customers and several lists of addresses). OrderBy will only sort the list that you ask it to sort, it will not look for any nested lists.
You mention that your TempData solution works. I actually wrote an entire answer contesting that notion (it should be functionally similar to my suggested alternatives, and it should always order the original list, not any nested list), until I noticed that you've made it work for a very insidious and non-obvious reason:
.Select(a => new {
a,
TempData = a.customer.Select(b => b.OtherAddress).ToList()[0]
})
You are calling .ToList(), which changes how the code behaves. You started off with an IQueryable<>, which means that LINQ was preparing an SQL command to retrieve the data when you enumerate it.
This is the goal of an IQueryable<>. Instead of pulling all the data into memory and then filtering it according to your specifications, it instead constructs a complex SQL query, and will then only need to execute a single (constructed) query.
The execution of that constructed query occurs when you try to access the data (obviously, the data needs to be fetched if you want to access it). A common method of doing so is by enumerating the IQueryable<> into an IEnumerable<>.
This is what you've done in the Select lambda. Instead of asking LINQ to enumerate your list of orders, you've asked it to enumerate every list of addresses from every customer from every order in the list of orders.
But in order to know which addresses need to be enumerated, LINQ must first know which customers it's supposed to get the addresses from. And to find out which customers it needs, it must first figure out which orders you're working with. The only way it can figure all of that out is by enumerating everything.
My initial suggestion, that you should avoid using the TempData solution, is still valid. It's a redundant step that serves no functional purpose. However, the enumeration that also takes place may actually be of use to you here, because it changes LINQ's behavior slightly. You claim that it fixes your problem, so I'm going to take your statement at face value and assume that the slightly different behavior between LINQ-to-SQL and LINQ-to-Entities solves your problem.
You can keep the enumeration and still omit the TempData workaround:
IQueryable abc = QueryData
.OrderBy(a => a.customer.Select(b => b.OtherAddress).ToList()[0])
.AsEnumerable()
.AsQueryable();
Some footnotes:
You can use ToList() instead of AsEnumerable(), the result is the same.
When you use First() or Single(), enumeration will inherently take place, so you don't need to call AsEnumerable() beforehand.
Notice that I cast the result to an IEnumerable<>, but then I immediately re-cast it to IQueryable<>. Once a collection has been enumerated, any further operation on it will occur in-memory. Casting it back to an IQueryable<> does not change the fact that the collection has already been enumerated.
But does it work?
Now, I think that this still doesn't sort all of your nested lists with a single call. However, you claim it does. If you still believe that it does, then you don't need to read on (because your problem is solved). Otherwise, the following may be useful to you.
SQL, and by extension LINQ, has made it possible to sort a list based on information that is not found in the list. This is essentially what you're doing: you're asking LINQ to sort a list of orders based on a related address (regardless of whether you want the addresses to be retrieved from the database or not!). You're not asking it to sort the customers, or the addresses. You're only asking it to sort the orders.
Your sort logic feels a bit dirty to me. You are supplying an Address entity to your OrderBy method, without specifying any of its (value type) properties. But how are you expecting your addresses to be sorted? By alphabetical street name? By database id? ...
I would expect you to be more explicit about what you want, e.g. OrderBy(x => x.Address.Street).ThenBy(x => x.Address.HouseNumber) (this is a simplified example).
After enumeration, since all the (relevant) data is in-memory, you can start ordering all the nested lists. For example:
foreach(var order in myOrders)
{
order.Customer.Addresses = order.Customer.Addresses.OrderBy(x => x.Street).ToList();
}
This orders all the lists of addresses. It does not change the order of the list of orders.
Do keep in mind that if you want to order data in-memory, that you do in fact need the data to be present in-memory. If you never loaded the customer's addresses, you can't use addresses as a sorting argument.
Ordering the list of orders should be done before enumeration. It's generally faster to have it handled by your SQL database, which is what happens when you're working with LINQ-to-SQL.
Ordering nested lists should be done after enumeration, because the order of these lists is unrelated to the original IQueryable<Order>, which only focused on sorting the orders, not its nested related entities (during enumeration, the included entities such as Customer and Address are retrieved without ordering them).

You can transform your OrderBy so you don't need an anonymous type (though I like the Perl/Lisp Schwartzian Transform) and then it is straightforward to create dynamically (though I am not sure how dynamically you mean).
Using the new expression:
var abc = QueryData.OrderBy(a => a.customer[0].OtherAddress);
Not being sure what you mean by dynamic, you can create the lambda
x => x.OrderBy(a => a.customer[0].OtherAddress)
using Expression as follows:
var parmx = Expression.Parameter(QueryData.GetType(), "x");
var parma = Expression.Parameter(QueryData[0].GetType(), "a");
var abc2 = Expression.Lambda(Expression.Call(MyExtensions.GetMethodInfo((IEnumerable<Orders> x)=>x.OrderBy(a => a.customer[0].OtherAddress)),
new Expression[] { parmx,
Expression.Lambda(Expression.Property(Expression.ArrayIndex(Expression.Property(parma, "customer"), Expression.Constant(0)), "OtherAddress"), parma) }),
parmx);
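If "dynamically" means starting from a path string such as "customer.0.OtherAddress", the key selector can be built from the string and passed straight to OrderBy, with no anonymous type or TempData step. The following is a minimal sketch under some assumptions: OrderByPath is a hypothetical helper name, numeric path segments are treated as array indexes (so they only work on arrays such as Customer[]), and the classes are trimmed-down copies of the ones in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Customer
{
    public string OtherAddress { get; set; }
    public int CustNum { get; set; }
}

public class Orders
{
    public long OrderID { get; set; }
    public Customer[] customer { get; set; }
}

public static class DynamicOrdering
{
    // Hypothetical helper: turns "customer.0.OtherAddress" into
    // a => a.customer[0].OtherAddress and applies Queryable.OrderBy with it.
    public static IQueryable<T> OrderByPath<T>(this IQueryable<T> source, string path)
    {
        ParameterExpression param = Expression.Parameter(typeof(T), "a");
        Expression body = param;
        foreach (string segment in path.Split('.'))
        {
            int index;
            if (int.TryParse(segment, out index))
                body = Expression.ArrayIndex(body, Expression.Constant(index)); // arrays only
            else
                body = Expression.PropertyOrField(body, segment);
        }

        LambdaExpression keySelector = Expression.Lambda(body, param);

        // Equivalent to Queryable.OrderBy<T, TKey>(source, keySelector),
        // with TKey only known at run time (body.Type).
        MethodCallExpression call = Expression.Call(
            typeof(Queryable), "OrderBy",
            new[] { typeof(T), body.Type },
            source.Expression, Expression.Quote(keySelector));

        return source.Provider.CreateQuery<T>(call);
    }
}

public static class Demo
{
    public static List<long> SortedOrderIds()
    {
        var orders = new List<Orders>
        {
            new Orders { OrderID = 1, customer = new[] { new Customer { OtherAddress = "Hello" } } },
            new Orders { OrderID = 2, customer = new[] { new Customer { OtherAddress = "T" } } },
            new Orders { OrderID = 3, customer = new[] { new Customer { OtherAddress = "Alpha" } } },
        };

        // Whole orders are sorted by the first customer's OtherAddress.
        return orders.AsQueryable()
            .OrderByPath("customer.0.OtherAddress")
            .Select(o => o.OrderID)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SortedOrderIds())); // 3, 1, 2
    }
}
```

The sketch runs against LINQ to Objects. Whether a database provider can translate the array index depends on that provider, so treat that part as untested.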

Related

Sum multiple fields with select new and where raises error 'Model' does not have definition of 'Sum'

First timer to C# and want to see if i can change a query. Basically, I want an aggregated values of the two currency columns in one go. There are Car dealer, with each dealer having a list of cars in their possession. So I want to get list of car dealers along with a total cost of all the cars under them. For now what I am doing is I loop over each dealer, get the cars then loop the cars over summing each cars price, which sounds inefficient to me.
public object DealerObj(Dealer d)
{
var cars = d.DealerCars.Select(cc => new { cc.Cost }).ToList();
decimal totalCost = 0;
foreach (var car in cars){
totalCost += car.Cost;
}
return new {
d.DealerName,
totalCost = totalCost
};
}
While I am getting the correct results, the query appears slower. Is there a way to do a Sum() on the results and avoid the loop, something like below
var cars = d.DealerCars.Select(cc => new { total = cc.Sum(s=>s.Cost) });
It appears to me that you just need to do this:
public object DealerObj(Dealer d)
{
return new
{
d.DealerName,
totalCost = d.DealerCars.Sum(cc => cc.Cost),
};
}
It seems like a fairly pointless method. I would love to know how you are using the above code, especially how you're turning the returned object into something useful.
Based on this:
There are Car dealer, with each dealer having a list of cars in their
possession. So I want to get list of car dealers along with a total
cost of all the cars under them.
The function that you posted and Enigmativity has correctly updated to get the sum would be used to iterate over a dealer at a time.
If you have a collection of Dealers outside of this method call, in this case a List<Dealer> as an example:
List<Dealer> dealers = loadDealers();
var sumObjects = dealers.Select(d => DealerObj(d)).ToList();
and with Enigmativity's change to get the Sum for the cars under the dealer, you would be good to go. Though I would recommend defining a suitable model (DTO or ViewModel) for this kind of summary rather than trying to pass anonymous types around and/or object references, and giving the method a more descriptive name.
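As a rough illustration of the DTO suggestion, here is a minimal sketch. DealerSummary, DealerReports.Summarize and the sample data are hypothetical names and values. Dealer and Car are reduced to the members the question mentions, with Cost assumed to be a decimal.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public decimal Cost { get; set; }
}

public class Dealer
{
    public string DealerName { get; set; }
    public List<Car> DealerCars { get; set; }
}

// Hypothetical summary model replacing the anonymous type / object return value.
public class DealerSummary
{
    public string DealerName { get; set; }
    public decimal TotalCost { get; set; }
}

public static class DealerReports
{
    // Same logic as DealerObj, but with a typed result and a descriptive name.
    public static DealerSummary Summarize(Dealer d)
    {
        return new DealerSummary
        {
            DealerName = d.DealerName,
            TotalCost = d.DealerCars.Sum(c => c.Cost),
        };
    }
}

public static class Demo
{
    public static List<DealerSummary> Summaries()
    {
        var dealers = new List<Dealer>
        {
            new Dealer
            {
                DealerName = "North",
                DealerCars = new List<Car> { new Car { Cost = 10000m }, new Car { Cost = 20000m } },
            },
            new Dealer { DealerName = "South", DealerCars = new List<Car>() },
        };
        return dealers.Select(DealerReports.Summarize).ToList();
    }

    public static void Main()
    {
        foreach (DealerSummary s in Summaries())
            Console.WriteLine(s.DealerName + ": " + s.TotalCost);
    }
}
```

Callers can then read TotalCost directly instead of casting or reflecting over an object.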
If instead you want to run an operation across a dealers collection directly:
List<Dealer> dealers = loadDealers(); // Load these from data source.
var sumObjects = dealers
.SelectMany(d => d.DealerCars)
.GroupBy(c => c.Dealer.DealerName)
.Select(g => new
{
DealerName = g.Key,
TotalCost = g.Sum(c => c.Cost)
}).ToList();
This assumes all dealers are uniquely named. If not, you would want to group by c.Dealer and then select g.Key.Name along with anything to differentiate the multiple dealers /w the same name.
If you're dealing with a list of dealer objects that are loaded as a whole into memory then I would probably stick with the method approach, as it's a bit cleaner to read and interpret what is going on.
Where the 2nd approach would come into play is if you are dealing with something that uses IQueryable such as Entity Framework to retrieve your data. The first approach would need to load all dealers AND their related details into memory entirely to work without a potentially significant performance hit if related details for each dealer needed to be lazy loaded after the fact. (Extra round trips to the DB) Using the 2nd approach against an IQueryable (DbSet / EF Linq expression) means that it can be translated down to SQL provided you follow some key rules, resulting in a single, far more efficient query to the database.
For instance: Given an EF context:
using(var context = new AppContext())
{
var sumObjects = context.Dealers
.SelectMany(d => d.DealerCars)
.GroupBy(c => c.Dealer.DealerName)
.Select(g => new
{
DealerName = g.Key,
TotalCost = g.Sum(c => c.Cost)
}).ToList();
}
Alternatively for that to work efficiently with something like a repository method, that repository would need to return IQueryable<Dealer> rather than IEnumerable<Dealer>. EF will only compose these down to an efficient query if using IQueryable. If you use IEnumerable the query will be executed before these criteria are considered and you would potentially be facing lazy load hits.
If your methods are using IEnumerable<Dealer> or List<Dealer> or arrays etc. and no idea or need for IQueryable then sticking with the method per Dealer will probably be the easiest to follow.
From the documentation:
Computes the sum of a sequence of numeric values
So you can only call Sum() on an IEnumerable, like below.
var totalCost = d.DealerCars.Sum(p => p.Cost);
/* From my point of view, you should also filter with a Where condition based on your parameter:
var totalCost = d.DealerCars.Where(p => p.DealerID == d.DealerID)
.Sum(p => p.Cost); */
return new { d.DealerName, totalCost = totalCost };

How to filter a List<T> if it contains specific class data?

I need help with filtering list data in c#.
I got 3 class named Product.cs, Storage.cs and Inventory.cs.
public class Storage{
string StorageId;
string Name;
}
public class Inventory{
string InventoryId;
string StorageId;
string ProductId;
}
I got the filled List<Storage> mStorages, List<Product> mProduct and List<Inventory> mInventories.
I am having trouble printing the mStorages entries that contain a specific productId, which can only be obtained from mInventories.
So, I tried this:
List<Storage> mFilteredStorage;
for(int i=0;i<mStorages.Count;i++){
if(mStorages[i] contain (productId from inventories)){
mFilteredStorage.add(mstorages[i]);
}
So I can get mFilteredStorage that contains specific product from inventories. (in inventories there are lot of product id).
What should I do to get that filteredStorage? I tried to use list.contains() but it only return true and at last there are duplicated storage at mFilteredStorage.
Really need your help guys. Thanks in advance.
I suggest you to read about lambda-expressions, that is what you are looking for.
mFilteredStorage.AddRange(mStorages.Where(storage => inventories.Any(inventory => inventory.productId == storage.productId)).ToList());
This returns you a list with your filtered conditions. So right after Where you iterate over each item in your list, I called this item storage. (you can name those what ever you want to) Then we iterate over your object inventories with another lambda expression. This, the second lambda expression, returns either true if any of inventories's productIds match the productId of the current iterating object of mStorages or false if they don't match.
So once the productIds match, you can imagine the code like the following:
mStorages.Where(storage => true);
And once the result of the second lambda expression is true, storage will be added to the IEnumerable you will get as a result of the Where method.
Since we get an IEnumerable as return, but we want to add those Storage objects to mFilteredStorage, I convert the IEnumerable to a list, by:
/*(the return object we get from the `Where` method)*/.ToList();
You can use LINQ to accomplish your goal. Since Storage has no ProductId, the query will match by StorageId.
var filteredStoragesQry =
from storage in mStorages
where inventories.Any(inventory => inventory.StorageId == storage.StorageId)
select storage;
mFilteredStorages = filteredStoragesQry.ToList();
This query is for LINQ to objects, but it will also work in Entity Framework, when you replace mStorages and inventories by the respective DbSet objects from the context.
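To restrict the result to storages that hold one specific product, the inventory list can be filtered first. A minimal in-memory sketch follows, assuming the fields from the question are exposed as public properties. StoragesHoldingProduct and the sample IDs are hypothetical. Because the outer query runs over mStorages, each storage appears at most once even when several inventory rows match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Storage
{
    public string StorageId { get; set; }
    public string Name { get; set; }
}

public class Inventory
{
    public string InventoryId { get; set; }
    public string StorageId { get; set; }
    public string ProductId { get; set; }
}

public static class StorageFilter
{
    // Hypothetical helper: storages that have at least one inventory row for productId.
    public static List<Storage> StoragesHoldingProduct(
        List<Storage> storages, List<Inventory> inventories, string productId)
    {
        // Collect the matching storage ids once; the set removes duplicates.
        var storageIds = new HashSet<string>(
            inventories.Where(i => i.ProductId == productId).Select(i => i.StorageId));

        return storages.Where(s => storageIds.Contains(s.StorageId)).ToList();
    }
}

public static class Demo
{
    public static List<string> Run()
    {
        var storages = new List<Storage>
        {
            new Storage { StorageId = "S1", Name = "Main" },
            new Storage { StorageId = "S2", Name = "Annex" },
            new Storage { StorageId = "S3", Name = "Depot" },
        };
        var inventories = new List<Inventory>
        {
            new Inventory { InventoryId = "I1", StorageId = "S1", ProductId = "P1" },
            new Inventory { InventoryId = "I2", StorageId = "S1", ProductId = "P1" }, // second row, same storage
            new Inventory { InventoryId = "I3", StorageId = "S2", ProductId = "P2" },
            new Inventory { InventoryId = "I4", StorageId = "S3", ProductId = "P1" },
        };
        return StorageFilter.StoragesHoldingProduct(storages, inventories, "P1")
            .Select(s => s.Name)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // Main, Depot
    }
}
```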
mStorages.Join(mInventories, x => x.StorageId, y => y.StorageId, (x, y) => new { Storage = x, ProductId = y.ProductId})
.Where(z => z.ProductId == "specificProductId").Select(z => z.Storage).ToList()
I ended with this code.
mFilteredStorage = tempStorage.GroupBy(s => s.Id).Select(group => group.First()).ToList()
This code is what I want to show.

Lambda expression to return one result for each distinct value in list

I currently have a large list of a class object and I am currently using the following lambda function to return elements that meet the condition.
var call = callList.Where(i => i.ApplicationID == 001).ToList();
This will return a list of objects that all have an id of 001.
I am now curious as to what different ApplicationIDs there are. So I would like a lambda function that will look into this list and return a list where all the element have a different ApplicationID but only fetches one of those.
If i understand your question you can try:
var list = callList.GroupBy(x => x.ApplicationID).Select(x => x.First()).ToList();
So if you have a list like:
AppID:1, AppID:1, AppID:2, AppID:2, AppID:3, AppID:3
Will return:
AppID:1 AppID:2 AppID:3
You can use either First or FirstOrDefault to get back one result
var call = callList.First(i => i.ApplicationID == 001);
If no call exists with an ApplicationID of 001 this will throw an exception. If this may be expected consider using:
var call = callList.FirstOrDefault(i => i.ApplicationID == 001);
Here null will be returned if no such call exists and you can handle accordingly in your code.
To find out what other ApplicationId's exist you can query:
var Ids = callList.Where(i => i.ApplicationID != 001).Select(i => i.ApplicationID).Distinct();
You are saying
I am now curious as to what different ApplicationIDs there are. So I
would like a lambda function that will look into this list and return
a list where all the element have a different ApplicationID but only
fetches one of those.
I would suggest that is never something you'd actually want. You either don't care about the elements, you care about all of them, or you care about a specific one. There are few (none?) situations where you care about a random one from the list.
Without knowing about which specific one you care, I can't give you a solution for that version. Allesandro has given you a solution for the random one.
When you only care about the distinct ID's you would end up with
callList.Select(c => c.ApplicationID).Distinct()
which just gives you all ApplicationIDs.
if you care about all of them, you'd end up with
callList.GroupBy(c => c.ApplicationID)
this will give you an IEnumerable<IGrouping<String, Thingy>> (where Thingy is the type of whatever the type of elements of callList is.)
This means you now have a collection of ApplicationID -> collection of Thingy's. For each distinct ApplicationID you'll have a "List" (actually IEnumerable) of every element that has that ApplicationID
If you care about the Thingy that, for example, has the lowest value of property Foo, you would want
callList.GroupBy(c => c.ApplicationID)
.Select(group => group.OrderBy(thingy => thingy.Foo).First())
Here you first Group them by ApplicationID, and then for each list of thingies with the same ApplicationID you Select the first one of them if you Order them by Foo.
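The three cases above can be tried on a small in-memory list. This is a minimal sketch: Thingy, Foo and the sample values are placeholders, and ApplicationID is assumed to be an int here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder element type; Foo stands for whatever property decides which element you want.
public class Thingy
{
    public int ApplicationID { get; set; }
    public int Foo { get; set; }
}

public static class Demo
{
    private static readonly List<Thingy> CallList = new List<Thingy>
    {
        new Thingy { ApplicationID = 1, Foo = 5 },
        new Thingy { ApplicationID = 1, Foo = 2 },
        new Thingy { ApplicationID = 2, Foo = 9 },
        new Thingy { ApplicationID = 3, Foo = 4 },
        new Thingy { ApplicationID = 3, Foo = 7 },
    };

    // Only the distinct ids.
    public static List<int> DistinctIds()
    {
        return CallList.Select(c => c.ApplicationID).Distinct().ToList();
    }

    // Every element, bucketed per id: returns the number of elements in each group.
    public static List<int> GroupSizes()
    {
        return CallList.GroupBy(c => c.ApplicationID).Select(g => g.Count()).ToList();
    }

    // One specific element per id: the one with the lowest Foo.
    public static List<int> LowestFooPerId()
    {
        return CallList.GroupBy(c => c.ApplicationID)
            .Select(group => group.OrderBy(thingy => thingy.Foo).First())
            .Select(thingy => thingy.Foo)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", DistinctIds()));    // 1, 2, 3
        Console.WriteLine(string.Join(", ", GroupSizes()));     // 2, 1, 2
        Console.WriteLine(string.Join(", ", LowestFooPerId())); // 2, 9, 4
    }
}
```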
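There is a way to use Distinct in the query, but it requires you to define value equality yourself. Let's assume your type is called CallClass. The parameterless Distinct() uses the type's own equality, so the class should implement IEquatable<CallClass> (with a matching GetHashCode) rather than IEqualityComparer<CallClass>:
class CallClass : IEquatable<CallClass>
{
public int ApplicationId { get; set; }
//other properties etc.
public bool Equals(CallClass other)
{
return other != null && ApplicationId == other.ApplicationId;
}
public override bool Equals(object obj)
{
return Equals(obj as CallClass);
}
public override int GetHashCode()
{
// Must agree with Equals: equal ApplicationIds produce equal hash codes.
return ApplicationId.GetHashCode();
}
}
Now you're able to query values distinctly:
var call = callList.Distinct().ToList();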

Detecting "near duplicates" using a LINQ/C# query

I'm using the following queries to detect duplicates in a database.
Using a LINQ join doesn't work very well because Company X may also be listed as CompanyX, therefore I'd like to amend this to detect "near duplicates".
var results = result
.GroupBy(c => new {c.CompanyName})
.Select(g => new CompanyGridViewModel
{
LeadId = g.First().LeadId,
Qty = g.Count(),
CompanyName = g.Key.CompanyName,
}).ToList();
Could anybody suggest a way in which I have better control over the comparison? Perhaps via an IEqualityComparer (although I'm not exactly sure how that would work in this situation)
My main goals are:
To list the first record with a subset of all duplicates (or "near duplicates")
To have some flexibility over the fields and text comparisons I use for my duplicates.
For your explicit "ignoring spaces" case, you can simply call
var results = result.GroupBy(c => c.Name.Replace(" ", ""))...
However, in the general case where you want flexibility, I'd build up a library of IEqualityComparer<Company> classes to use in your groupings. For example, this should do the same in your "ignore space" case:
public class CompanyNameIgnoringSpaces : IEqualityComparer<Company>
{
public bool Equals(Company x, Company y)
{
return x.Name.Replace(" ", "") == y.Name.Replace(" ", "");
}
public int GetHashCode(Company obj)
{
return obj.Name.Replace(" ", "").GetHashCode();
}
}
which you could use as
var results = result.GroupBy(c => c, new CompanyNameIgnoringSpaces())...
It's pretty straightforward to do similar things containing multiple fields, or other definitions of similarity, etc.
Just note that your definition of "similar" must be transitive, e.g. if you're looking at integers you can't define "similar" as "within 5", because then you'd have "0 is similar to 5" and "5 is similar to 10" but not "0 is similar to 10". (It must also be reflexive and symmetric, but that's more straightforward.)
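One way to get transitivity for free is to define "similar" as "has the same normalized form": both Equals and GetHashCode then go through one normalization function. Below is a minimal sketch of that idea on the name string itself. NormalizedNameComparer and the sample names are hypothetical, and ignoring letter case is an extra assumption beyond the "ignore spaces" rule discussed above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two names are "near duplicates" when they are equal
// after removing spaces and ignoring case (the case rule is an added assumption).
public sealed class NormalizedNameComparer : IEqualityComparer<string>
{
    private static string Normalize(string s)
    {
        return (s ?? string.Empty).Replace(" ", "").ToUpperInvariant();
    }

    public bool Equals(string x, string y)
    {
        return Normalize(x) == Normalize(y);
    }

    // Hashing the normalized form keeps GetHashCode consistent with Equals.
    public int GetHashCode(string s)
    {
        return Normalize(s).GetHashCode();
    }
}

public static class Demo
{
    public static List<KeyValuePair<string, int>> Run()
    {
        var names = new List<string> { "Company X", "CompanyX", "company x", "Acme" };

        // The group key is the first spelling seen for each group.
        return names
            .GroupBy(n => n, new NormalizedNameComparer())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();
    }

    public static void Main()
    {
        foreach (var pair in Run())
            Console.WriteLine(pair.Key + " x" + pair.Value);
    }
}
```

Swapping in a different Normalize function (stripping punctuation, trimming legal suffixes, and so on) changes the notion of similarity without touching the grouping query.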
Okay, so since you're looking for different permutations you could do something like this:
Bear in mind this was written in the answer so it may not fully compile, but you get the idea.
var results = result
.Where(g => CompanyNamePermutations(g.Key.CompanyName).Contains(g.Key.CompanyName))
.GroupBy(c => new {c.CompanyName})
.Select(g => new CompanyGridViewModel
{
LeadId = g.First().LeadId,
Qty = g.Count(),
CompanyName = g.Key.CompanyName,
}).ToList();
private static List<string> CompanyNamePermutations(string companyName)
{
// build your permutations here
// so to build the one in your example
return new List<string>
{
companyName,
string.Join("", companyName.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
};
}
In this case you need to define where the work is going to take place i.e. fully on the server, in local memory or a mixture of both.
In local memory:
In this case we have two routes, to pull back all the data and just do the logic in local memory, or to stream the data and apply the logic piecewise. To pull all the data just ToList() or ToArray() the base table. To stream the data would suggest using ToLookup() with custom IEqualityComparer, e.g.
public class CustomEqualityComparer: IEqualityComparer<String>
{
public bool Equals(String str1, String str2)
{
//custom logic
}
public int GetHashCode(String str)
{
// custom logic
}
}
//result
var results = result.ToLookup(r => r.Name,
new CustomEqualityComparer())
.Select(r => ....)
Fully on the server:
Depends on your provider and what it can successfully map. E.g. if we define a near duplicate as one with an alternative delimiter one could do something like this:
private char[] delimiters = new char[]{' ','-','*'}
var results = result.GroupBy(r => delimiters.Aggregate( d => r.Replace(d,'')...
Mixture:
In this case we are splitting the work between the two. Unless you come up with a nice scheme this route is most likely to be inefficient. E.g. if we keep the logic on the local side, build groupings as a mapping from a name into a key and just query the resulting groupings we can do something like this:
var groupings = result.Select(r => r.Name)
//pull into local memory
.ToArray()
//do local grouping logic...
//Query results
var results = result.GroupBy(r => groupings[r]).....
Personally I usually go with the first option, pulling all the data for small data sets and streaming large data sets (empirically I found streaming with logic between each pull takes a lot longer than pulling all the data then doing all the logic)
Notes: Dependent on the provider ToLookup() is usually immediate execution and in construction applies its logic piecewise.

NHibernate 3 LINQ - how to create a valid parameter for Average()

Say I have a very simple entity like this:
public class TestGuy
{
public virtual long Id {get;set;}
public virtual string City {get;set;}
public virtual int InterestingValue {get;set;}
public virtual int OtherValue {get;set;}
}
This contrived example object is mapped with NHibernate (using Fluent) and works fine.
Time to do some reporting. In this example, "testGuys" is an IQueryable with some criteria already applied.
var byCity = testGuys
.GroupBy(c => c.City)
.Select(g => new { City = g.Key, Avg = g.Average(tg => tg.InterestingValue) });
This works just fine. In NHibernate Profiler I can see the correct SQL being generated, and the results are as expected.
Inspired by my success, I want to make it more flexible. I want to make it configurable so that the user can get the average of OtherValue as well as InterestingValue. Shouldn't be too hard, the argument to Average() seems to be a Func<TestGuy,int> (since the values are ints in this case). Easy peasy. Can't I just create a method that returns a Func<TestGuy,int> based on some condition and use that as an argument?
var fieldToAverageBy = GetAverageField(SomeEnum.Other);
private Func<TestGuy,int> GetAverageField(SomeEnum someCondition)
{
switch(someCondition)
{
case SomeEnum.Interesting:
return tg => tg.InterestingValue;
case SomeEnum.Other:
return tg => tg.OtherValue;
}
throw new InvalidOperationException("Not in my example!");
}
And then, elsewhere, I could just do this:
var byCity = testGuys
.GroupBy(c => c.City)
.Select(g => new { City = g.Key, Avg = g.Average(fieldToAverageBy) });
Well, I thought I could do that. However, when I do enumerate this, NHibernate throws a fit:
Object of type 'System.Linq.Expressions.ConstantExpression' cannot be converted to type 'System.Linq.Expressions.LambdaExpression'.
So I am guessing that behind the scenes, some conversion or casting or some such thing is going on that in the first case accepts my lambda, but in the second case makes into something NHibernate can't convert to SQL.
My question is hopefully simple - how can my GetAverageField function return something that will work as a parameter to Average() when NHibernate 3.0 LINQ support (the .Query() method) translates this to SQL?
Any suggestions welcome, thanks!
EDIT
Based on the comments from David B in his answer, I took a closer look at this. My assumption that Func<TestGuy,int> would be the right return type was based on the IntelliSense I got for the Average() method. It seems to be based on the Enumerable type, not the Queryable one. That's strange. I need to look a bit closer at this.
The GroupBy method has the following return signature:
IQueryable<IGrouping<string,TestGuy>>
That means it should give me an IQueryable, all right. However, I then move on to the next line:
.Select(g => new { City = g.Key, Avg = g.Average(tg => tg.InterestingValue) });
If I check the IntelliSense for the g variable inside the new { } object definition, it is actually listed as being of type IGrouping<string,TestGuy> - NOT IQueryable<IGrouping<string,TestGuy>>. This is why the Average() method called is the Enumerable one, and why it won't accept the Expression parameter suggested by David B.
So somehow my group value has apparently lost its status as an IQueryable somewhere.
Slightly interesting note:
I can change the Select to the following:
.Select(g => new { City = g.Key, Avg = g.AsQueryable<TestGuy>().Average(fieldToAverageBy) });
And now it compiles! Black magic! However, that doesn't solve the issue, as NHibernate now doesn't love me anymore and gives the following exception:
Could not parse expression '[-1].AsQueryable()': This overload of the method 'System.Linq.Queryable.AsQueryable' is currently not supported, but you can register your own parser if needed.
What baffles me is that this works when I give the lambda expression to the Average() method, but that I can't find a simple way to represent the same expression as an argument. I am obviously doing something wrong, but can't see what...!?
I am at my wits' end. Help me, Jon Skeet, you're my only hope! ;)
You won't be able to call a "local" method within your lambda expression. If this were a simple non-nested clause, it would be relatively simple - you'd just need to change this:
private Func<TestGuy,int> GetAverageField(SomeEnum someCondition)
to this:
private Expression<Func<TestGuy,int>> GetAverageField(SomeEnum someCondition)
and then pass the result of the call into the relevant query method, e.g.
var results = query.Select(GetAverageField(SomeEnum.Other));
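As a rough sketch of that simple, non-nested case: the example below passes the returned expression straight to Queryable.Average on an IQueryable<TestGuy>. The members of SomeEnum and the class names AverageFields and SimpleAverageDemo are assumptions made for illustration, and the sketch has only been reasoned through against an in-memory IQueryable, not against NHibernate.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Copy of the entity from the question.
public class TestGuy
{
    public virtual long Id { get; set; }
    public virtual string City { get; set; }
    public virtual int InterestingValue { get; set; }
    public virtual int OtherValue { get; set; }
}

// Assumed shape of the enum; the question does not show its definition.
public enum SomeEnum { Interesting, Other }

public static class AverageFields
{
    // Returns an expression tree rather than a compiled delegate, so a
    // LINQ provider can inspect which property is being selected.
    public static Expression<Func<TestGuy, int>> GetAverageField(SomeEnum someCondition)
    {
        switch (someCondition)
        {
            case SomeEnum.Interesting:
                return tg => tg.InterestingValue;
            case SomeEnum.Other:
                return tg => tg.OtherValue;
        }
        throw new InvalidOperationException("Not in my example!");
    }
}

public static class SimpleAverageDemo
{
    // The expression is handed directly to the Queryable overload of Average;
    // it is not invoked from inside another lambda.
    public static double AverageOf(IQueryable<TestGuy> query, SomeEnum field)
    {
        return query.Average(AverageFields.GetAverageField(field));
    }
}
```

This works because the expression is an argument to the query method itself. Inside a nested lambda such as the Select over groups, the same call is made on an IGrouping, where only the Enumerable overload taking a Func is available.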
In this case, however, you'll need to build the whole expression tree up for the Select clause - the anonymous type creation expression, the extraction of the Key, and the extraction of the average field part. It's not going to be fun, to be honest. In particular, by the time you've built up your expression tree, that's not going to be statically typed in the same way as a normal query expression would be, due to the inability to express the anonymous type in a declaration.
If you're using .NET 4, dynamic typing may help you, although you'd pay the price of not having static typing any more, of course.
One option (horrible though it may be) would be to try to use a sort of "template" of the anonymous type projection expression tree (e.g. always using a single property), and then build a copy of that expression tree, inserting the right expression instead. Again, it's not going to be fun.
Marc Gravell may be able to help more on this - it does sound like the kind of thing which should be possible, but I'm at a loss as to how to do it elegantly at the moment.
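One possible way to sketch the "template" idea, assuming .NET 4 or later (where System.Linq.Expressions.ExpressionVisitor is public). The names SelectorInliner, Inline and ParameterReplacer are hypothetical. The template is written as a two-parameter lambda whose second parameter is a placeholder for the selector; a visitor then replaces that placeholder with the real lambda expression, so the resulting tree has the same shape as the hand-written query with an inline lambda. This has only been reasoned through against an in-memory IQueryable; whether NHibernate translates the result to SQL is untested here.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Copy of the entity from the question.
public class TestGuy
{
    public virtual long Id { get; set; }
    public virtual string City { get; set; }
    public virtual int InterestingValue { get; set; }
    public virtual int OtherValue { get; set; }
}

// Hypothetical helper that splices a selector expression into a template.
public static class SelectorInliner
{
    // template: (g, f) => ... where f stands in for the selector.
    // selector: the expression to put wherever f is used.
    // TResult is inferred, so the template may return an anonymous type.
    public static Expression<Func<IGrouping<string, TestGuy>, TResult>> Inline<TResult>(
        Expression<Func<IGrouping<string, TestGuy>, Func<TestGuy, int>, TResult>> template,
        Expression<Func<TestGuy, int>> selector)
    {
        var visitor = new ParameterReplacer(template.Parameters[1], selector);
        Expression body = visitor.Visit(template.Body);
        return Expression.Lambda<Func<IGrouping<string, TestGuy>, TResult>>(
            body, template.Parameters[0]);
    }

    // Replaces every use of one parameter with another expression.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _placeholder;
        private readonly Expression _replacement;

        public ParameterReplacer(ParameterExpression placeholder, Expression replacement)
        {
            _placeholder = placeholder;
            _replacement = replacement;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _placeholder ? _replacement : base.VisitParameter(node);
        }
    }
}

// Usage, with testGuys being an IQueryable<TestGuy>:
//
//   var projection = SelectorInliner.Inline(
//       (g, f) => new { City = g.Key, Avg = g.Average(f) },
//       tg => tg.OtherValue);
//   var byCity = testGuys.GroupBy(c => c.City).Select(projection);
```

The selector passed as the second argument could equally come from a method that returns Expression<Func<TestGuy,int>> based on some condition. The design keeps the anonymous type in ordinary C# code and only manipulates the one node that has to vary.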
Eh? The parameter to Queryable.Average is not Func<T, U>. It's Expression<Func<T, U>>.
The way to do this is:
private Expression<Func<TestGuy,int>> GetAverageExpr(SomeEnum someCondition)
{
switch(someCondition)
{
case SomeEnum.Interesting:
return tg => tg.InterestingValue;
case SomeEnum.Other:
return tg => tg.OtherValue;
}
throw new InvalidOperationException("Not in my example!");
}
Followed by:
Expression<Func<TestGuy, int>> averageExpr = GetAverageExpr(someCondition);
var byCity = testGuys
.GroupBy(c => c.City)
.Select(g => new { City = g.Key, Avg = g.Average(averageExpr) });
