How to join unknown number of lists in LINQ - c#

I have three lists of different types :
List<Customer> customerList = new List<Customer>();
List<Product> productList = new List<Product>();
List<Vehicle> vehicleList = new List<Vehicle>();
I also have this list
List<string> stringList = new List<string> {"AND","OR"};
Since the first element of stringList is AND, I want to inner join customerList with productList. Then I want to outer join vehicleList with the result (the query below keeps every customer/product pair, i.e. a left outer join), such as:
from cust in customerList
join prod in productList on cust.ProductId equals prod.Id
join veh in vehicleList on prod.VehicleId equals veh.Id into v
from veh in v.DefaultIfEmpty()
select new {customerName = cust.Name, customerVehicle=veh.VehicleName}
I want to do this in an automated way. Let's say I have N lists and N-1 ANDs and ORs: how can I join them? Besides, there can be many lists of the same type. Is such a thing even possible? If not, what can I do to get closer to what I need? Thanks in advance.
EDIT :
I'm holding the lists and their types in a Dictionary like this :
var listDict = new Dictionary<Type, object>();
So I can iterate inside this dictionary if necessary.

UPDATE 5-15-17:
To recap, what I am proposing is an example where we want to:
Pass in a list of N Table objects.
Pass in a list of N-1 join clauses describing how to join them. E.g. with 2 tables you need a single join, with 3 you need 2, and so on.
Be able to pass in a predicate to go up or down the chain to narrow the scope.
What I would propose is to do all of this in SQL and pass SQL an XML object that it can parse. However, to keep things a little simpler and not deal with XML serialization too, let's stick with strings that hold one or many values. Say we have a structure, going off of the above, like this:
/*
CREATE TABLE Customer ( Id INT IDENTITY, CustomerName VARCHAR(64), ProductId INT)
INSERT INTO Customer VALUES ('Acme', 1),('Widgets', 2)
CREATE TABLE Product (Id INT IDENTITY, ProductName VARCHAR(64), VehicleId INT)
Insert Into Product Values ('Shirt', 1),('Pants', 2)
CREATE TABLE VEHICLE (Id INT IDENTITY, VehicleName VARCHAR(64))
INSERT INTO dbo.VEHICLE VALUES ('Car'),('Truck')
CREATE TABLE Joins (Id INT IDENTITY, OriginTable VARCHAR(32), DestinationTable VARCHAR(32), JoinClause VARCHAR(32))
INSERT INTO Joins VALUES ('Customer', 'Product', 'ProductId = Id'),('Product', 'Vehicle', 'VehicleId = Id')
--Data as is if I joined all three tables
CustomerId CustomerName ProductId ProductName VehicleId VehicleName
1 Acme 1 Shirt 1 Car
2 Widgets 2 Pants 2 Truck
*/
This structure is pretty simplistic: every relationship is a one-to-one key relationship, whereas a real schema could have other identifiers. The key to making things work is to maintain a table that describes HOW these tables relate. I called this table Joins. Now I can create a dynamic proc like so:
CREATE PROC pDynamicFind
(
@Tables varchar(256)
, @Joins VARCHAR(256)
, @Predicate VARCHAR(256)
)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @SQL NVARCHAR(MAX) =
'With x as
(
SELECT
a.Id
, {nameColumns}
From {joins}
Where {predicate}
)
SELECT *
From x
UNPIVOT (Value FOR TableName In ({nameColumns})) AS unpt
'
DECLARE @Tbls TABLE (id INT IDENTITY, tableName VARCHAR(256), joinType VARCHAR(16))
DECLARE @Start INT = 2
DECLARE @alphas VARCHAR(26) = 'abcdefghijklmnopqrstuvwxyz'
--Comma separated into temp table (realistically most people create a function to do this so you don't have to do it over and over again)
WHILE LEN(@Tables) > 0
BEGIN
IF PATINDEX('%,%', @Tables) > 0
BEGIN
INSERT INTO @Tbls (tableName) VALUES (RTRIM(LTRIM(SUBSTRING(@Tables, 0, PATINDEX('%,%', @Tables)))))
SET @Tables = SUBSTRING(@Tables, LEN(SUBSTRING(@Tables, 0, PATINDEX('%,%', @Tables)) + ',') + 1, LEN(@Tables))
END
ELSE
BEGIN
INSERT INTO @Tbls (tableName) VALUES (RTRIM(LTRIM(@Tables)))
SET @Tables = NULL
END
END
--Have to iterate over this one separately
WHILE LEN(@Joins) > 0
BEGIN
IF PATINDEX('%,%', @Joins) > 0
BEGIN
Update @Tbls SET joinType = (RTRIM(LTRIM(SUBSTRING(@Joins, 0, PATINDEX('%,%', @Joins))))) WHERE id = @Start
SET @Joins = SUBSTRING(@Joins, LEN(SUBSTRING(@Joins, 0, PATINDEX('%,%', @Joins)) + ',') + 1, LEN(@Joins))
SET @Start = @Start + 1
END
ELSE
BEGIN
Update @Tbls SET joinType = (RTRIM(LTRIM(@Joins))) WHERE id = @Start
SET @Joins = NULL
SET @Start = @Start + 1
END
END
DECLARE @Join VARCHAR(256) = ''
DECLARE @Cols VARCHAR(256) = ''
--Determine dynamic columns and joins
Select
@Join += CASE WHEN joinType IS NULL THEN t.tableName + ' ' + SUBSTRING(@alphas, t.id, 1)
ELSE ' ' + joinType + ' JOIN ' + t.tableName + ' ' + SUBSTRING(@alphas, t.id, 1) + ' ON ' + SUBSTRING(@alphas, t.id-1, 1) + '.' + REPLACE(j.JoinClause, '= ', '= ' + SUBSTRING(@alphas, t.id, 1) + '.' )
END
, @Cols += CASE WHEN joinType IS NULL THEN t.tableName + 'Name' ELSE ' , ' + t.tableName + 'Name' END
From @Tbls t
LEFT JOIN Joins j ON t.tableName = j.DestinationTable
SET @SQL = REPLACE(@SQL, '{joins}', @Join)
SET @SQL = REPLACE(@SQL, '{nameColumns}', @Cols)
SET @SQL = REPLACE(@SQL, '{predicate}', @Predicate)
--PRINT @SQL
EXEC sp_executesql @SQL
END
GO
I now have a stubbed query, so to speak, in which I can replace the source of the FROM clause, the column I filter on, and the value I filter by. I would get results from it like this:
EXEC pDynamicFind 'Customer, Product', 'Inner', 'CustomerName = ''Acme'''
EXEC pDynamicFind 'Customer, Product, Vehicle', 'Inner, Inner', 'VehicleName = ''Car'''
Now what about setting that up in EF and using it in code? You can add procs to EF and read the data through the context. The problem this addresses is that I am now giving back a fixed object shape no matter how many columns I add. If my pattern is always '(table)Name' across N tables, I can normalize the result by unpivoting, which yields one row per table for each joined row. Performance may get worse as result sets grow, but you can make as many joins as you want as long as the tables follow a similar structure.
The point I am making is that SQL is ultimately what gets your data, and doing crazy joins in LINQ is at times more work than it's worth. If you have a small result set and a small db, you are probably fine. This is just an example of how you could get completely different objects out of SQL using dynamic SQL, and how fast it can do something once the code for the proc is written. It is just one way to skin a cat, and I am sure there are many. Whatever road you go down with dynamic joins, you are going to need some kind of normalization standard, factory pattern or similar that says: I can have N inputs that always yield the same X object no matter what. I do this through a vertical result set, but if you want a column other than 'Name' you will have to write more code for that as well. However, the way I built this, if you want the description but want the predicate to be on, say, a date field, that works fine.
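The join-building step of the proc can also be sketched on the C# side. The following is a minimal sketch, not the answer's implementation: `JoinRule` and `FromClauseBuilder` are hypothetical names, the rules play the role of the Joins table, and the aliases `t0`, `t1`, ... replace the proc's letter aliases. It assumes each table joins to the one listed immediately before it, as in the proc.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical in-memory counterpart of one row in the Joins table.
public sealed class JoinRule
{
    public string OriginTable { get; set; }
    public string DestinationTable { get; set; }
    public string OriginColumn { get; set; }
    public string DestinationColumn { get; set; }
}

public static class FromClauseBuilder
{
    // Only these keywords may appear in the generated SQL.
    private static readonly HashSet<string> AllowedJoinTypes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "INNER", "LEFT", "RIGHT" };

    // Builds "A t0 INNER JOIN B t1 ON t0.x = t1.y ..." from N tables and N-1 join types.
    public static string Build(IList<string> tables, IList<string> joinTypes, IList<JoinRule> rules)
    {
        if (tables.Count == 0)
            throw new ArgumentException("At least one table is required.");
        if (joinTypes.Count != tables.Count - 1)
            throw new ArgumentException("Expected exactly one join type per additional table.");

        var sql = new StringBuilder();
        sql.Append(tables[0]).Append(" t0");

        for (int i = 1; i < tables.Count; i++)
        {
            string joinType = joinTypes[i - 1];
            if (!AllowedJoinTypes.Contains(joinType))
                throw new ArgumentException("Unsupported join type: " + joinType);

            // Find the rule that links the previous table to this one.
            JoinRule rule = null;
            foreach (JoinRule candidate in rules)
            {
                if (candidate.OriginTable.Equals(tables[i - 1], StringComparison.OrdinalIgnoreCase) &&
                    candidate.DestinationTable.Equals(tables[i], StringComparison.OrdinalIgnoreCase))
                {
                    rule = candidate;
                    break;
                }
            }
            if (rule == null)
                throw new InvalidOperationException(
                    "No join rule from " + tables[i - 1] + " to " + tables[i] + ".");

            sql.Append(' ').Append(joinType.ToUpperInvariant()).Append(" JOIN ")
               .Append(tables[i]).Append(" t").Append(i)
               .Append(" ON t").Append(i - 1).Append('.').Append(rule.OriginColumn)
               .Append(" = t").Append(i).Append('.').Append(rule.DestinationColumn);
        }
        return sql.ToString();
    }
}
```

Calling `FromClauseBuilder.Build` with the tables Customer, Product, Vehicle, the join types Inner and Left, and rules matching the two Joins rows yields a FROM clause joining all three. Because a table name is only emitted when a rule mentions it (once there are at least two tables), the rules double as a whitelist; a single-table call is not checked that way in this sketch.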

If you always want the same set of output columns, then write your query ahead of time:
select *
from
customerList c
inner join
productList p on c.ProductId = p.Id
inner join
vehicleList v on p.VehicleId = v.Id
Then append a dynamic where. At its simplest, just replace 'CustomerCity:' with 'c.city' and so on, so that what they wrote becomes valid SQL (Danger danger: if your user is not to be trusted then you must must must make your SQL injection proof. At the very least scan it for DML, or limit the keywords they can provide. Better would be to parse it into fields, parameterise it properly and add the values they provide to parameters)
Simple (ugh) we let the SQL parser do some work:
string whereClause = userInput;
whereClause = whereClause.Replace("CustomerCity:", "c.City = '");
whereClause = whereClause.Replace("VehicleNumber:", "v.Number = ");
//and so on
whereClause = whereClause.Replace(" AND", "' AND");
//some logic here to go through the string and close up those apostrophes
Ugly, and fragile. And hackable (if you care).
Parsing would be better:
sqlCommand.CommandText = "SELECT ... WHERE ";
string[] whereBits = userInput.Split(' ');
//maps the friendly names the user types to the real column names
var parameters = new Dictionary<string, string>();
parameters["customercity"] = "c.City";
parameters["vehiclenumber"] = "v.Number";
foreach (var token in whereBits) {
var frags = token.Split(':');
string friendlyName = frags[0].ToLower();
//handle here the AND and OR -> append to sql command text and continue the loop
if (parameters.ContainsKey(friendlyName)) {
sqlCommand.CommandText += parameters[friendlyName] + " = @" + friendlyName;
sqlCommand.Parameters.AddWithValue("@" + friendlyName, frags[1]);
}
}
//now you should have an sql that looks like
//SELECT ... WHERE c.City = @customercity ...
// and a params collection that looks like:
//sql.Params[0] => ("@customercity", "Seattle", varchar)...
One thing to consider: will your user be able to construct that query and get the results they want? What, in a user's mind, does CustomerCity:Seattle OR ProductType:Computer AND VehicleNumber:8 AND CustomerName:Jason mean anyway? Everyone in Seattle, plus every Jason whose Computer is in vehicle 8?
Everyone in Seattle or who has a computer, but they must have vehicle 8 and be called Jason?
Without defined precedence, queries could just turn out garbage in the user's hands.
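One way to make the precedence explicit is to parse the filter into OR-separated groups of AND-ed terms, which is the usual SQL reading (AND binds tighter than OR). The sketch below is only an illustration under assumptions: `FilterParser`, the column whitelist (including `p.Type` and `c.Name`) and the `@p0`, `@p1`, ... parameter names are hypothetical, and values containing spaces are not supported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterParser
{
    // Hypothetical whitelist mapping friendly names to real columns.
    private static readonly Dictionary<string, string> Columns =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["CustomerCity"] = "c.City",
            ["ProductType"] = "p.Type",
            ["VehicleNumber"] = "v.Number",
            ["CustomerName"] = "c.Name"
        };

    // Turns "A:1 OR B:2 AND C:3" into "(a = @p0) OR (b = @p1 AND c = @p2)"
    // and records each parameter value in the supplied dictionary.
    public static string Parse(string input, IDictionary<string, string> parameters)
    {
        var orGroups = new List<string>();
        var currentGroup = new List<string>();
        bool expectTerm = true;

        foreach (string token in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (expectTerm)
            {
                int colon = token.IndexOf(':');
                if (colon <= 0)
                    throw new FormatException("Expected Field:Value but found '" + token + "'.");

                string column;
                if (!Columns.TryGetValue(token.Substring(0, colon), out column))
                    throw new FormatException("Unknown field in '" + token + "'.");

                string name = "@p" + parameters.Count;
                parameters.Add(name, token.Substring(colon + 1));
                currentGroup.Add(column + " = " + name);
                expectTerm = false;
            }
            else
            {
                if (token.Equals("OR", StringComparison.OrdinalIgnoreCase))
                {
                    // OR closes the current AND group and starts a new one.
                    orGroups.Add(string.Join(" AND ", currentGroup));
                    currentGroup = new List<string>();
                }
                else if (!token.Equals("AND", StringComparison.OrdinalIgnoreCase))
                {
                    throw new FormatException("Expected AND or OR but found '" + token + "'.");
                }
                expectTerm = true;
            }
        }

        if (expectTerm)
            throw new FormatException("The filter is empty or ends with an operator.");

        orGroups.Add(string.Join(" AND ", currentGroup));
        return string.Join(" OR ", orGroups.Select(g => "(" + g + ")"));
    }
}
```

The user's text never reaches the SQL string: only whitelisted column names and generated parameter names do, and the values travel in the parameter dictionary, from which they can be added to the command's parameter collection.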

I think it would have been better if you just describe what the requirement is, instead of asking how to implement this strange design.
Performance isn't a problem... now. But that is how it always starts...
Anyways, I do not think performance has to be an issue. But that depends on the relations between tables. In your example there are lists with only one foreign key. Each customer has one product and each product has one vehicle. Resulting in one record.
But what happens if one vehicle has multiple products, from multiple customers? If you allow to combine tables in all kinds of ways, you're bound to create a Cartesian Product somewhere. Resulting in 1000s or more rows.
And how are you going to implement multiple relations between objects? Suppose there are users, and customer has the fields UpdatedByUser and CreatedByUser. How do you know which user maps to which field?
And what about numeric fields? It seems that you are treating all fields as string.
If you want to allow users to build queries, according to the relations in the database and existing fields, the best thing to do may be to write (generic) code to build your own expression trees. Using reflection you can show properties, etc. That may also result in the best queries.
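As a rough illustration of building an expression tree from a property name, the sketch below creates an equality predicate for a dotted property path. `FilterExpressions`, `CustomerEntity` and `ProductEntity` are hypothetical names used only for this example; the value is converted with `Convert.ChangeType`, which assumes simple types such as int or string.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entities, for illustration only.
public class ProductEntity
{
    public string Name { get; set; }
}

public class CustomerEntity
{
    public string Name { get; set; }
    public int? ProductId { get; set; }
    public ProductEntity Product { get; set; }
}

public static class FilterExpressions
{
    // Builds x => x.<path> == value, e.g. path "Product.Name".
    public static Expression<Func<T, bool>> PropertyEquals<T>(string propertyPath, object value)
    {
        ParameterExpression parameter = Expression.Parameter(typeof(T), "x");

        // Walk the path one member at a time: x.Product, then x.Product.Name.
        Expression member = parameter;
        foreach (string part in propertyPath.Split('.'))
        {
            member = Expression.PropertyOrField(member, part);
        }

        // Convert the raw value (often a string typed by the user) to the member's type.
        Type targetType = Nullable.GetUnderlyingType(member.Type) ?? member.Type;
        object converted = Convert.ChangeType(value, targetType);
        Expression constant = Expression.Constant(converted, member.Type);

        return Expression.Lambda<Func<T, bool>>(Expression.Equal(member, constant), parameter);
    }
}
```

The resulting expression can be passed to `Where` on an `IQueryable<CustomerEntity>`, or compiled with `.Compile()` for in-memory lists. When compiled and run in memory, a null `Product` on the path throws a NullReferenceException; a query provider translating the expression to SQL would typically not have that problem.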
But you may also consider to use MongoDB instead of Sql Server. If relations are not that important, then a relational database may not be the right place to store data. You may also consider to use the Full-text search feature in Sql Server.
If you want to use Sql Server then you should take advantage of the navigation properties that are present in Entity Framework 6 (code first). You think that is not what you need, but I think it can be very easy.
First you'll need to create a model and entities. Please note that you should not use the [Required] attribute for foreign keys. Because if you do, this will be translated to an inner join.
Next take the table you want to query:
var ctx = new Model();
//ctx.Configuration.ProxyCreationEnabled = false;
var q = ctx.Customers.AsQueryable();
// parse the 'parameters' to build the query
q = q.Include("Product");
// You'll have to build the include string
q = q.Include("Product.Vehicle");
var res = q.FirstOrDefault();
This will get all the data you'll need, all using left joins. In order to 'convert' a left join to an inner join you filter the foreign key to be not null:
var res = q.FirstOrDefault(cust => cust.ProductId != null);
So all you need is the table where you want to start, and then you can build the query any way you like. You can even parse a string such as Customer AND Product OR Vehicle instead of using separate lists.
The variable res contains the customer, which links to Product. But res should be the result of a select:
var res = q.Select(r => new { CustName = r.Name, ProductName = r.Product.Name }).FirstOrDefault();
In the question there is no mention of filters, but in the comments there is. In case you want to add filters you can also think of building your query like this:
q = q.Where(cust => cust.Name.StartsWith("a"));
if (someCondition)
q = q.Where(cust => cust.Product.Name.StartsWith("a"));
var res = q.ToList();
This is just to give you an idea how you can take advantage of EF6 (code-first). You don't have to think about the joins, since these are already defined and automatically picked up.

Decompose your LINQ query expression into method syntax (see "How to Convert LINQ Comprehension Query Syntax to Method Syntax using Lambda") and you will get:
customerList.Join(productList, cust => cust.ProductId, prod => prod.Id, (cust, prod) => new { cust = cust, prod = prod })
.GroupJoin(vehicleList, cp => cp.prod.VehicleId, veh => veh.Id, (cp, v) => new { cp = cp, v = v })
.SelectMany(cv => cv.v.DefaultIfEmpty(), (cv, veh) => new { customerName = cv.cp.cust.Name, customerVehicle = veh.VehicleName });
Besides listDict, you will need a keyArr holding the key selectors for each join:
keyArr[0] = { OuterKey = cust => cust.ProductId, InnerKey = prod => prod.Id };
keyArr[1] = ...
Then loop over listDict with something like the following (pseudocode: the result type changes after every join, so it will not compile as written):
var result = customerList;
int i = 0;
foreach (var ld in listDict)
{
//use this
result = result.Join(ld, keyArr[i].OuterKey, keyArr[i].InnerKey, (cust, prod) => new { cust = cust, prod = prod });
//or this or both depends on the query
result = result.GroupJoin(ld, cp => cp.prod.VehicleId, veh => veh.Id, (cp, v) => new { cp = cp, v = v });
i++;
}
// need to define concrete class for each table
// and grouping result after each join
//and finally
result.SelectMany(cv => cv.v.DefaultIfEmpty(), (cv, veh) => new { customerName = cv.cp.cust.Name, customerVehicle = veh.VehicleName });
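The GroupJoin, SelectMany and DefaultIfEmpty combination used above is the standard way to express a left outer join in method syntax, and it can be wrapped once so that each step of a dynamic chain only has to choose between Join and the wrapper. This is a minimal sketch; `JoinExtensions` and `LeftOuterJoin` are hypothetical names, not framework methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinExtensions
{
    // Left outer join for LINQ to Objects: every outer item is kept, and
    // inner is null when no inner item has a matching key.
    public static IEnumerable<TResult> LeftOuterJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
        where TInner : class
    {
        return outer
            // Pair each outer item with the (possibly empty) group of matches.
            .GroupJoin(inner, outerKeySelector, innerKeySelector,
                (o, matches) => new { Outer = o, Matches = matches })
            // Flatten the groups; DefaultIfEmpty supplies a single null for empty ones.
            .SelectMany(pair => pair.Matches.DefaultIfEmpty(),
                (pair, i) => resultSelector(pair.Outer, i));
    }
}
```

Because the inner item may be null, the result selector has to guard its member access, for example `(cp, veh) => new { customerVehicle = veh?.VehicleName }`.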

The following code solves your problem.
First we need data, so I build some sample lists of three different types. My solution can handle multiple tables of the same data type.
Then I build the list of join specifications, specifying the tables, join fields and join type.
Warning: the order of the specifications matters (it must follow a topological order). The first join joins two tables. Each subsequent join must join one new table to one of the tables already joined.
var joinSpecs = new IJoinSpecification[] {
JoinSpecification.Create(list1, list2, v1 => v1.Id, v2 => v2.ForeignKeyTo1, JoinType.Inner),
JoinSpecification.Create(list2, list3, v2 => v2.Id, v3 => v3.ForeignKeyTo2, JoinType.LeftOuter)
};
then you just execute the joins:
//Creating LINQ query
IEnumerable<Dictionary<object, object>> result = null;
foreach (var joinSpec in joinSpecs) {
result = joinSpec.PerformJoin(result);
}
//Executing the LINQ query
var finalResult = result.ToList();
The result is a list of dictionaries containing the joined items, so the access looks like this: rowDict[table1].Column2. You can even have multiple tables of same type - this system handles that easily.
Here is how you do the final projection of your joined data:
var resultWithColumns = (
from row in finalResult
let item1 = row.GetItemFor(list1)
let item2 = row.GetItemFor(list2)
let item3 = row.GetItemFor(list3)
select new {
Id1 = item1?.Id,
Id2 = item2?.Id,
Id3 = item3?.Id,
Value1 = item1?.Value,
Value2 = item2?.Value,
Value3 = item3?.Value
}).ToList();
The full code:
using System;
using System.Collections.Generic;
using System.Linq;
public class Type1 {
public int Id { get; set; }
public int Value { get; set; }
}
public class Type2 {
public int Id { get; set; }
public string Value { get; set; }
public int ForeignKeyTo1 { get; set; }
}
public class Type3 {
public int Id { get; set; }
public string Value { get; set; }
public int ForeignKeyTo2 { get; set; }
}
public class Program {
public static void Main() {
//Data
var list1 = new List<Type1>() {
new Type1 { Id = 1, Value = 1 },
new Type1 { Id = 2, Value = 2 },
new Type1 { Id = 3, Value = 3 }
//4 is missing
};
var list2 = new List<Type2>() {
new Type2 { Id = 1, Value = "1", ForeignKeyTo1 = 1 },
new Type2 { Id = 2, Value = "2", ForeignKeyTo1 = 2 },
//3 is missing
new Type2 { Id = 4, Value = "4", ForeignKeyTo1 = 4 }
};
var list3 = new List<Type3>() {
new Type3 { Id = 1, Value = "1", ForeignKeyTo2 = 1 },
//2 is missing
new Type3 { Id = 3, Value = "2", ForeignKeyTo2 = 2 },
new Type3 { Id = 4, Value = "4", ForeignKeyTo2 = 4 }
};
var joinSpecs = new IJoinSpecification[] {
JoinSpecification.Create(list1, list2, v1 => v1.Id, v2 => v2.ForeignKeyTo1, JoinType.Inner),
JoinSpecification.Create(list2, list3, v2 => v2.Id, v3 => v3.ForeignKeyTo2, JoinType.LeftOuter)
};
//Creating LINQ query
IEnumerable<Dictionary<object, object>> result = null;
foreach (var joinSpec in joinSpecs) {
result = joinSpec.PerformJoin(result);
}
//Executing the LINQ query
var finalResult = result.ToList();
//This is just to illustrate how to get the final projection columns
var resultWithColumns = (
from row in finalResult
let item1 = row.GetItemFor(list1)
let item2 = row.GetItemFor(list2)
let item3 = row.GetItemFor(list3)
select new {
Id1 = item1?.Id,
Id2 = item2?.Id,
Id3 = item3?.Id,
Value1 = item1?.Value,
Value2 = item2?.Value,
Value3 = item3?.Value
}).ToList();
foreach (var row in resultWithColumns) {
Console.WriteLine(row.ToString());
}
//Outputs:
//{ Id1 = 1, Id2 = 1, Id3 = 1, Value1 = 1, Value2 = 1, Value3 = 1 }
//{ Id1 = 2, Id2 = 2, Id3 = 3, Value1 = 2, Value2 = 2, Value3 = 2 }
}
}
public static class RowDictionaryHelpers {
public static IEnumerable<Dictionary<object, object>> CreateFrom<T>(IEnumerable<T> source) where T : class {
return source.Select(item => new Dictionary<object, object> { { source, item } });
}
public static T GetItemFor<T>(this Dictionary<object, object> dict, IEnumerable<T> key) where T : class {
return dict[key] as T;
}
public static Dictionary<object, object> WithAddedItem<T>(this Dictionary<object, object> dict, IEnumerable<T> key, T item) where T : class {
var result = new Dictionary<object, object>(dict);
result.Add(key, item);
return result;
}
}
public interface IJoinSpecification {
IEnumerable<Dictionary<object, object>> PerformJoin(IEnumerable<Dictionary<object, object>> sourceData);
}
public enum JoinType {
Inner = 1,
LeftOuter = 2
}
public static class JoinSpecification {
public static JoinSpecification<TLeft, TRight, TKeyType> Create<TLeft, TRight, TKeyType>(IEnumerable<TLeft> LeftTable, IEnumerable<TRight> RightTable, Func<TLeft, TKeyType> LeftKeySelector, Func<TRight, TKeyType> RightKeySelector, JoinType JoinType) where TLeft : class where TRight : class {
return new JoinSpecification<TLeft, TRight, TKeyType> {
LeftTable = LeftTable,
RightTable = RightTable,
LeftKeySelector = LeftKeySelector,
RightKeySelector = RightKeySelector,
JoinType = JoinType,
};
}
}
public class JoinSpecification<TLeft, TRight, TKeyType> : IJoinSpecification where TLeft : class where TRight : class {
public IEnumerable<TLeft> LeftTable { get; set; } //Must already exist
public IEnumerable<TRight> RightTable { get; set; } //Newly joined table
public Func<TLeft, TKeyType> LeftKeySelector { get; set; }
public Func<TRight, TKeyType> RightKeySelector { get; set; }
public JoinType JoinType { get; set; }
public IEnumerable<Dictionary<object, object>> PerformJoin(IEnumerable<Dictionary<object, object>> sourceData) {
if (sourceData == null) {
sourceData = RowDictionaryHelpers.CreateFrom(LeftTable);
}
return
from joinedRowsObj in sourceData
join rightRow in RightTable
on joinedRowsObj.GetItemFor(LeftTable).ApplyIfNotNull(LeftKeySelector) equals rightRow.ApplyIfNotNull(RightKeySelector)
into rightItemsForLeftItem
from rightItem in rightItemsForLeftItem.DefaultIfEmpty()
where JoinType == JoinType.LeftOuter || rightItem != null
select joinedRowsObj.WithAddedItem(RightTable, rightItem)
;
}
}
public static class FuncExtensions {
public static TResult ApplyIfNotNull<T, TResult>(this T item, Func<T, TResult> func) where T : class {
return item != null ? func(item) : default(TResult);
}
}
The code outputs:
{ Id1 = 1, Id2 = 1, Id3 = 1, Value1 = 1, Value2 = 1, Value3 = 1 }
{ Id1 = 2, Id2 = 2, Id3 = 3, Value1 = 2, Value2 = 2, Value3 = 2 }
P.S. The code absolutely lacks any error checking to make it more compact and easier to read.

I think there are several reasons why you (and other answers and comments so far) are struggling with the solution. Primarily, as stated, you do not have enough meta information to successfully construct the complex relationship of the overall operation.
Absent Metadata
In looking at your inline LINQ example, specifically to quote:
from cust in customerList
join prod in productList on cust.ProductId equals prod.Id
join veh in vehicleList on prod.VehicleId equals veh.Id into v
from veh in v.DefaultIfEmpty()
select new {customerName = cust.Name, customerVehicle=veh.VehicleName}
... if we are to parse the knowledge that is inherently stated in the above code, we'll identify the following:
1. There are 3 separate data sets (of non-homogeneous types, though this is more evident from your List<T> examples at the beginning of the question) that serve as sources of data. This meta information is available in the List<T> setups as sources to LINQ, and thus this part is not an issue.
2. The join order and type of join (i.e. AND implies .Join() and OR implies .GroupJoin()). This meta information is more or less also available for the list approach setup.
3. The relationship between the types, and the key to be used to compare one type to another. That is, that customer relates to product (as opposed to vehicle) and that customer-product relationship is defined as Customer.ProductId = Product.Id; or that vehicle relates to product (as opposed to customer) and that relationship is defined as Product.VehicleId = Vehicle.Id. This meta information, with the list setup presented in your question, is NOT available.
4. Projection of the resulting (interim and final) data set members. The example is not specific about whether each data set is represented by a unique model (i.e. for all List<T>s, each T is unique) or if repeats are possible. Because inline LINQ allows you to reference a specific data set, having two data sets of the same type is not an issue when defined statically, because each data set is referenced by name and thus the relationship is clear. If a type can appear more than once, and if metadata is available to determine type relationships dynamically, the trouble creeps in that you don't know which of the multiple instances of the same type to relate to. In other words, if it is possible to have Person join Friends join Person join Car, it is not clear if Car should be matched to the first Person or the second Person. One possibility is to assume that in such cases you resolve the relationship to the last instance of Person. Needless to say, your lists setup doesn't have this meta information. For the purposes of this answer going forward, I'll assume that all types are unique and do not repeat.
Unlike the intersect example you referenced in comments, whereas Intersect is a parameter-less operator (besides the other set to intersect over), Join operator requires parameter(s) to identify the relationship by which to relate to the other data set. I.e. the parameter(s) is the meta information described in point 3 above.
Metadata
To close the gaps identified above is not simple, but is not insurmountable either. One approach is to simply annotate the data model types with relationship meta data. Something along the lines of:
class Vehicle
{
public int Id;
}
// PrimaryKey="Id" - Id refers to Vehicle.Id, not Product.Id
[RelationshipLink(BelongsTo=typeof(Vehicle), PrimaryKey="Id", ForeignKey="VehicleId")]
class Product
{
public int Id;
public int VehicleId;
}
// PrimaryKey="Id" - Id refers to Product.Id, not Customer.Id
[RelationshipLink(BelongsTo=typeof(Product), PrimaryKey="Id", ForeignKey="ProductId")]
class Customer
{
public int Id;
public int ProductId;
}
This way, as you loop through the data sets as you're setting up joins, using reflection you can examine what type this data set is related to and how, lookup previous data sets for matching data type, and, again using reflection, setup .Join's or .GroupJoins key selectors for matching the relationship of instances of data.
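To make the reflection step more concrete, here is a minimal sketch of declaring such an attribute and reading it back to obtain key selectors. It is an assumption-laden illustration: `RelationshipLinkAttribute` is defined here because it is not a framework type, `RelationshipReader` is a hypothetical helper, the keys are public fields as in the example classes, and `BelongsTo` on Product is taken to point at Vehicle.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute carrying the relationship metadata.
[AttributeUsage(AttributeTargets.Class)]
public sealed class RelationshipLinkAttribute : Attribute
{
    public Type BelongsTo { get; set; }
    public string PrimaryKey { get; set; }
    public string ForeignKey { get; set; }
}

public class Vehicle
{
    public int Id;
}

// Assumption: Product.VehicleId points at Vehicle.Id.
[RelationshipLink(BelongsTo = typeof(Vehicle), PrimaryKey = "Id", ForeignKey = "VehicleId")]
public class Product
{
    public int Id;
    public int VehicleId;
}

public static class RelationshipReader
{
    private static RelationshipLinkAttribute GetLink(Type type)
    {
        var link = type.GetCustomAttribute<RelationshipLinkAttribute>();
        if (link == null)
            throw new InvalidOperationException(type.Name + " has no RelationshipLink attribute.");
        return link;
    }

    private static FieldInfo GetRequiredField(Type type, string name)
    {
        FieldInfo field = type.GetField(name);
        if (field == null)
            throw new InvalidOperationException(type.Name + " has no public field " + name + ".");
        return field;
    }

    // Selector that reads the annotated type's foreign key, e.g. Product.VehicleId.
    public static Func<object, object> ForeignKeySelector(Type type)
    {
        FieldInfo field = GetRequiredField(type, GetLink(type).ForeignKey);
        return instance => field.GetValue(instance);
    }

    // Selector that reads the related type's primary key, e.g. Vehicle.Id.
    public static Func<object, object> PrimaryKeySelector(Type type)
    {
        RelationshipLinkAttribute link = GetLink(type);
        FieldInfo field = GetRequiredField(link.BelongsTo, link.PrimaryKey);
        return instance => field.GetValue(instance);
    }
}
```

Both selectors return boxed values, which still compare by value inside `.Join` because the default equality comparer for `object` calls `Equals` on the boxed integers.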
Interim Projections
In static definitions of LINQ statements (be it using inline join or extension method .Join) you control what result of the join looks like and how data is merged and transformed into a shape (aka another model) convenient for subsequent operations (usually by use of anonymous objects). With dynamic set up, this is very difficult if not altogether impossible because you'd need to know what to keep, what not, how to resolve name collision of data models' properties, etc.
To solve this issue, you can probably propagate all interim results (aka projections) as a Dictionary<Type, object>, and simply carry through full models, each tracked by its type. The reason you want to make it easy to track by type is so that when you join the previous interim result with the next dataset, and need to build the primary/foreign key functions, you have an easy means to look up the type that you discover from the [RelationshipLink] metadata.
The final projection of the result, again, is not really stated in your question, but you need some way of dynamically determining which part of the very wide result you want (or all of it), or how to transform its shape into whatever the function consuming the results of the giant join expects.
Algorithm
Finally, we can put the whole thing together. The code below is going to be just high-level of algorithm in C#-pseudocode, and not full C#. See footnote.
var datasets = GetListsOfDatasets().ToArray(); // i.e. the function that returns customerList, productList, vehicleList, etc as a set of List<T>'s
var joins = datasets.First().Select(item => new Dictionary<Type, object> {[item.GetType()] = item});
var joinTypes = new Queue<string>(stringList); // the "AND", "OR" that tells how to join the next one. Convert to queue so we can dequeue from the front. Better make it enum rather than string.
foreach (var dataset in datasets.Skip(1))
{
var outerKeyMember = GetPrimaryKeyMember(dataset.GetGenericEnumerableUnderlyingType());
var innerKeyMember = GetForeignKeyMember(dataset.GetGenericEnumerableUnderlyingType());
var joinType = joinTypes.Dequeue();
joins = joinType == "AND"
? joins.Join(
dataset,
outerKey => ReflectionGetValue(outerKeyMember.Member, outerKey[outerKeyMember.Type]),
innerKey => ReflectionGetValue(innerKeyMember.Member, innerKey),
(outer, inner) => {
outer[inner.GetType()] = inner;
return outer;
})
: joins.GroupJoin(/* similar key selection as above */)
.SelectMany(i => i); // Flatten the list from IGrouping<T> back to IEnumerable<T>
}
var finalResult = joins.Select(v => /* TODO: whatever you want to project out, and however you dynamically want to determine what you want out */);
/////////////////////////////////////
public static Type GetGenericEnumerableUnderlyingType<T>(this IEnumerable<T> source)
{
return typeof(T);
}
public TypeAndMemberInfo GetPrimaryKeyMember(Type type)
{
// TODO
// Using reflection examine type, look for RelationshipLinkAttribute, and examine PrimaryKey specified on the attribute.
// Then reflect over BelongsTo declared type and find member declared as PrimaryKey
return new TypeAndMemberInfo {Type = __belongsToType, Member = __relationshipLinkAttribute.PrimaryKey.AsMemberInfo };
}
public TypeAndMemberInfo GetForeignKeyMember(Type type)
{
// TODO Very similar to GetPrimaryKeyMember, but for this type and this type's foreign key annotation marker.
}
public object ReflectionGetValue(MemberInfo member, object instance)
{
// TODO using reflection as member to return value belonging to instance.
}
So the high-level idea is that you take the first data set and wrap each member of the set with dictionary that specifies the type of the member and the member instance itself. Then, for each next dataset, you discover the underlying model type of the dataset, using reflection lookup the relationship metadata that tells you how to relate it to another type (that should have already been exposed in previous processed dataset or the code will blow up because join won't have anything to get key values from), lookup instance of the type from the outer enumerable's dictionary, get that instance and discovered key and get that instance's value as the value for outer key, and very similar reflect and discover value of the inner's foreign key member, and let .Join do the rest of the joining. Keep looping to the end, with each iteration projection carrying full instances of each model.
Once done with all datasets, define what you want out of it using .Select with whatever definition you want, and execute the complex LINQ to pump the data.
Performance Considerations
To perform a join, it means that at least one data-set must be fully read so that key membership may be probed into it while processing the other data-set for matches.
Modern DB engines like SQL Server are able to process joins of extremely large data sets because they go the extra step of having the ability to persist out interim results rather than build up everything in memory, and pull from disk as needed. As such, billions of items join billions of items does not blow up due to free memory starvation - once memory pressure is identified, the interim data and matched results are temporarily persisted to tempdb (or whatever disk storage that backs memory).
Here, default LINQ .Join is an in-memory operator. Large enough data set will blow memory and cause OutOfMemoryException. If you foresee processing many joins resulting in very large datasets, you may need to write your own implementation of .Join and .GroupJoin that use some sort of disk paging to store one data set in format that can be easily probed for membership when trying to match items from the other set, so as to relieve the memory pressure and use disk for memory.
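The memory cost comes from the shape of a hash join: one side is loaded into a lookup in full, and the other side is streamed past it. The sketch below illustrates that shape with `ToLookup`; `HashJoin` and `InnerJoin` are hypothetical names, and this is an illustration of the idea rather than the framework's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashJoin
{
    // Inner join that materialises 'inner' in memory and streams 'outer'.
    public static IEnumerable<TResult> InnerJoin<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        // The whole inner sequence is read here; this is the part that can exhaust memory.
        ILookup<TKey, TInner> lookup = inner.ToLookup(innerKeySelector);

        // The outer sequence is only enumerated one item at a time.
        foreach (TOuter outerItem in outer)
        {
            // Indexing a lookup with a missing key yields an empty sequence.
            foreach (TInner innerItem in lookup[outerKeySelector(outerItem)])
            {
                yield return resultSelector(outerItem, innerItem);
            }
        }
    }
}
```

A disk-backed variant would replace the lookup with something that pages groups in and out, and, where possible, it helps to put the smaller data set on the inner side.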
Voila!
Footnotes
First, because your question (sans comments) is asked in the domain of simple LINQ (meaning IEnumerable, not IQueryable and not SQL or stored procs), I have limited the scope of the answer strictly to that domain to follow the spirit of the question. This is not to say that at a higher level this problem doesn't lend itself well to a solution in some other domain.
Second, even though SO rules are for good, compile-able, working code in answers, the reality of this solution is that it is probably at least a few hundred lines of code, and would require many lines of code to do reflection. How to do reflection in C# is, obviously, beyond the scope of the question. As such, the code presented is pseudocode and focuses on the algorithm, reducing non-pertinent parts to comments describing what happens and leaving the implementation to the OP (or those finding this useful in the future).

Related

LINQ - Simulating multiple columns in IN clause

In oracle I can do the following query:
SELECT *
FROM Tabl Tabb
WHERE (tabb.Col1, tabb.Col2) IN ( (1,2), (3,4))
Consider that I have the following entity:
public class Tabb
{
public int Col1 {get; set; }
public int Col2 {get; set; }
// other props
}
and criteria class
public class Search
{
public int Col1 {get; set; }
public int Col2 {get; set; }
}
I need to write:
public IEnumerable<Tabb> Select(IEnumerable<Search> s)
{
var queryable = this.context.Tabbs;
return queryable.Where(/* some */).ToList();
}
How can I select the entities for which the search collection contains a Search instance with the same values of Col1 and Col2?
EDIT:
var result = from x in entity
join y in entity2
on new { x.field1, x.field2 } equals new { y.field1, y.field2 }
It doesn't work (as I expected): in my case entity2 is not an entity table, it is a static collection, so EF throws an exception (something like: cannot find mapping layer to type Search[]).
There are a few ways, which all have pros and cons, and are sometimes a little bit tricky...
Solution 1
You enumerate the EF part first and do the matching in memory (of course, depending on the size of your data, this might be a very bad idea).
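A minimal sketch of Solution 1, assuming the Tabb and Search classes from the question. The names PairFilter and SelectInMemory are made up for illustration, and the rows parameter stands in for this.context.Tabbs:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Tabb { public int Col1 { get; set; } public int Col2 { get; set; } }
public class Search { public int Col1 { get; set; } public int Col2 { get; set; } }

// Hypothetical helper, not part of EF.
public static class PairFilter
{
    public static List<Tabb> SelectInMemory(IQueryable<Tabb> rows, IEnumerable<Search> criteria)
    {
        // Value tuples compare by value, so a HashSet gives cheap pair lookups.
        var wanted = new HashSet<(int, int)>(criteria.Select(s => (s.Col1, s.Col2)));

        // AsEnumerable() switches to LINQ to Objects: every row is fetched
        // from the source before the pair filter runs.
        return rows.AsEnumerable()
                   .Where(t => wanted.Contains((t.Col1, t.Col2)))
                   .ToList();
    }
}
```

The whole table crosses the wire here, which is why this only suits small tables.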
Solution 2
You concatenate your fields with an element you're sure (hum) you won't find in your fields, and use a Contains on concatenated EF data.
var joinedCollection =entity2.Select(m => m.field1 + "~" + m.field2);
var result = entity.Where(m => joinedCollection.Contains(m.field1 + "~" + m.field2));
of course, this would be a little bit more complicated if field1 and field2 are not string, you'll have to use something like that
SqlFunctions.StringConvert((double)m.field1) + "~" + //etc.
Solution 3
you do this in two steps, assuming you will have "not too many results" with a partial match (on only one field)
var field1Collection = entity2.Select(m => m.field1);
var result = entity.Where(m => field1Collection.Contains(m.field1)).ToList();
then you make the "complete join" on the two enumerated lists...
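The two steps might look like the sketch below, again using the question's Tabb and Search classes. TwoStepFilter is a hypothetical name, and whether the first Where is turned into an IN clause depends on the provider; here it is only exercised against an in-memory IQueryable:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Tabb { public int Col1 { get; set; } public int Col2 { get; set; } }
public class Search { public int Col1 { get; set; } public int Col2 { get; set; } }

// Hypothetical helper illustrating the two-step approach.
public static class TwoStepFilter
{
    public static List<Tabb> Select(IQueryable<Tabb> rows, IReadOnlyCollection<Search> criteria)
    {
        // Step 1: coarse filter on Col1 only. A Contains over a local list of
        // primitive values is the part intended to run in the database.
        var col1Values = criteria.Select(s => s.Col1).Distinct().ToList();
        var candidates = rows.Where(t => col1Values.Contains(t.Col1)).ToList();

        // Step 2: exact (Col1, Col2) pair match on the rows already in memory.
        var wanted = new HashSet<(int, int)>(criteria.Select(s => (s.Col1, s.Col2)));
        return candidates.Where(t => wanted.Contains((t.Col1, t.Col2))).ToList();
    }
}
```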
Solution 4
use a stored procedure / generated raw sql...
Just understood the problem better. You want all rows where the columns match, may be this will help:
myDBTable.Where(x =>
myStaticCollection.Any(y => y.Col2 == x.Col2) &&
myStaticCollection.Any(y => y.Col1 == x.Col1))
.ToList()
.Select(x => new Search { Col1 = x.Col1, Col2 = x.Col2 });
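This says: I want each row where some Col2 in my static collection matches the database row's Col2 AND some Col1 matches the database row's Col1. Note that the two matches can come from different items in the collection, so this can also return rows whose (Col1, Col2) pair is not itself in the collection.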
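If rows must match on exact (Col1, Col2) pairs, one option (not taken from the answers here) is to build the predicate as an expression tree that ORs together one pair comparison per search item. The sketch below uses only System.Linq.Expressions; PairPredicate and MatchAny are made-up names, and whether a given EF version translates the result to SQL is an assumption to verify:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Tabb { public int Col1 { get; set; } public int Col2 { get; set; } }
public class Search { public int Col1 { get; set; } public int Col2 { get; set; } }

// Hypothetical helper: builds
// x => (x.Col1 == a1 && x.Col2 == b1) || (x.Col1 == a2 && x.Col2 == b2) || ...
public static class PairPredicate
{
    public static Expression<Func<Tabb, bool>> MatchAny(IEnumerable<Search> criteria)
    {
        var x = Expression.Parameter(typeof(Tabb), "x");
        Expression body = null;
        foreach (var s in criteria)
        {
            var pair = Expression.AndAlso(
                Expression.Equal(Expression.Property(x, nameof(Tabb.Col1)), Expression.Constant(s.Col1)),
                Expression.Equal(Expression.Property(x, nameof(Tabb.Col2)), Expression.Constant(s.Col2)));
            body = body == null ? pair : Expression.OrElse(body, pair);
        }
        // An empty criteria list matches nothing.
        return Expression.Lambda<Func<Tabb, bool>>(body ?? Expression.Constant(false), x);
    }
}
```

It would be used as queryable.Where(PairPredicate.MatchAny(s)).ToList(). The predicate grows with the number of search items, so very long criteria lists may hit query size limits.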
this.context.Searches.Join(
this.context.Tabbs,
s => s.Col2,
t => t.Col2,
(search, tab) => new {
search,
tab
});
This will bring back IEnumerable<'a> containing a search and a tab
This guy is doing something similar LINK
var result = from x in entity
join y in entity2
on new { x.field1, x.field2 } equals new { y.field1, y.field2 }
Once you have your result then you want to enumerate that to make sure you're hitting the database and getting all your values back. Once they're in memory, then you can project them into objects.
result.ToList().Select(a => new MyEntity { MyProperty = a.Property });

Linq and RESTful services: how to best merge data from multiple tables in a resultset

I'm experimenting with pulling data from multiple datasets using RESTful services. I'm hooking up to the Cloud version of Northwind, and attempting to use Linq to get the equivalent of this:
SELECT TOP 20 p.ProductName, p.ProductID, s.SupplierID, s.CompanyName AS Supplier,
s.ContactName, s.ContactTitle, s.Phone
FROM Products p
JOIN Suppliers s on p.SupplierID = s.SupplierID
ORDER BY ProductName
So, I define a class to hold my data:
public class ProductSuppliers
{
public string ProductName;
public int ProductID;
public string SupplierName;
public string ContactName;
public string ContactPosition;
public string ContactPhone;
}
And hook into the Northwind service:
NorthwindEntities dc = new NorthwindEntities (new
Uri("http://services.odata.org/Northwind/Northwind.svc/"));
After trying to set up a join, not being able to get it to work, and wandering around in the back corridors of MSDN for a while, I find that Linq joins aren't supported by the OData spec. Which seems obvious once you think about it, given the limitations of URI syntax.
Of course, the usual thing to do is stored procs and views on the server side anyway, handling any sort of joins there. However, I wanted to work out some sort of solution for a situation like this one, where you don't have the capability of creating stored procs or views.
My naive solution has all the elegance of medieval battlefield surgery, and it has to scale horribly. I pulled the two tables as two separate List objects, then iterated one, used Find to locate the matching ID in the other, and Added a combined record into my Product. Here's the code:
public List<ProductSuppliers> GetProductSuppliers()
{
var result = new List<ProductSuppliers>();
ProductSuppliers ps;
var prods =
(
from p in dc.Products
orderby p.ProductName
select p
).ToList();
var sups =
(
from s in dc.Suppliers
select s
).ToList();
foreach (var p in prods)
{
int cIndex = sups.IndexOf(sups.Find(x => x.SupplierID == p.SupplierID));
ps = new ProductSuppliers()
{
ProductName = p.ProductName,
ProductID = p.ProductID,
SupplierName = sups[cIndex].CompanyName,
ContactName = sups[cIndex].ContactName,
ContactPosition = sups[cIndex].ContactTitle,
ContactPhone = sups[cIndex].Phone
};
result.Add(ps);
}
return result;
}
There has to be something better than this, doesn't there? Is there something obvious I'm missing?
[Edit] I've looked at the link someone gave me on the Expand method, and that works...sort of. Here's the code change:
var sups =
(
from s in dc.Suppliers.Expand("Products")
select s
).ToList();
This gives me a list of Suppliers with Products for each in a sublist (dc.Suppliers[0].Products[0], etc.). While I could get what I want from there, I'd still have to iterate the entire list to invert the values (wouldn't I?), so it doesn't look like a more scaleable solution. Also, I can't apply Expand to the Products table to include Suppliers (Changing the from clause in prods to from p in dc.Products.Expand("Suppliers") results in a helpful "An Error occurred while processing this request."). So, it doesn't look like I can expand products to include lookup values from Suppliers, since it looks like expanding is expanding parents to include children, not looking up parent values in a list of children. Is there a way to use Expand (or is there some other mechanism besides client-side manipulation of the two tables) to include lookup values from a foreign key table?
The best you can do is described in this SO answer to a similar question. Not what you expected either, since you're required to make multiple roundtrips to the service.
If you don't control the server-side of things (or you don't want to use SPs/views/joins there) you are forced to use one of these mechanisms.
Anyway, at the very least you can improve the products-suppliers matching in your code to this:
var results = from p in prods
join s in sups on p.SupplierID equals s.SupplierID
select new ProductSuppliers()
{
ProductName = p.ProductName,
ProductID = p.ProductID,
SupplierName = s.CompanyName,
ContactName = s.ContactName,
ContactPosition = s.ContactTitle,
ContactPhone = s.Phone
};
You still need to retrieve all records and join in-memory, though.
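The inner join above drops any product that has no matching supplier. If those products should be kept, a dictionary lookup is one way to do the in-memory matching. This is a sketch with simplified stand-in records (Product, Supplier and ProductSupplierRow are not the generated Northwind types), and the nullable SupplierID is an assumption about the data:

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the service's entity types (C# 9 records).
public record Product(int ProductID, string ProductName, int? SupplierID);
public record Supplier(int SupplierID, string CompanyName);
public record ProductSupplierRow(string ProductName, int ProductID, string SupplierName);

public static class Matching
{
    public static List<ProductSupplierRow> Combine(IEnumerable<Product> prods, IEnumerable<Supplier> sups)
    {
        // Build the lookup once instead of scanning the supplier list per product.
        var bySupplierId = sups.ToDictionary(s => s.SupplierID);
        var result = new List<ProductSupplierRow>();
        foreach (var p in prods)
        {
            Supplier s = null;
            if (p.SupplierID.HasValue)
                bySupplierId.TryGetValue(p.SupplierID.Value, out s);
            // SupplierName stays null when no supplier matches.
            result.Add(new ProductSupplierRow(p.ProductName, p.ProductID, s?.CompanyName));
        }
        return result;
    }
}
```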

LINQ - Place part of a query that is always the same in a separate class and use in other linq queries

Current situation
Currently I have this Linq query:
return from function in this.context.Functions
join fp in this.context.FunctionParameters on function.FunctionID equals fp.FunctionID into functionParameters
from functionParameter in functionParameters.DefaultIfEmpty()
join d in this.context.Descriptions on new
{
ID = (int)function.FunctionID,
languageID = languageID,
originID = (byte)origin
}
equals new
{
ID = d.ValueID,
languageID = d.LanguageID,
originID = d.OriginID
}
into entityWithDescription
from x in entityWithDescription.DefaultIfEmpty()
select new FunctionDTO()
{
Function = function,
Description = x
};
This returns the functions with their parameters and the specific descriptions. So, a select with two left outer joins.
This is all good and works.
The problem
I have multiple objects that have a description. The description table has no relationship with these objects (so no FK).
So there is a part of the above query that is always the same, namely the join query to the description table:
join d in this.context.Descriptions on new
{
ID = (int)function.FunctionID,
languageID = languageID,
originID = (byte)origin
}
equals new
{
ID = d.ValueID,
languageID = d.LanguageID,
originID = d.OriginID
}
into entityWithDescription
The variables languageID and origin are two parameters that are passed on with the method. The FunctionID is a property in my Function class, ie a property in my entity model. So that is a
public partial class Function
{
public byte FunctionID { get; set; }
/** Other properties **/
}
My question
Is it possible to create a separate class with the part of the linq query that is always the same? So that I don't have to duplicate the same code all over again?
What I already tried
var query = from function in this.context.Functions
join fp in this.context.FunctionParameters on function.FunctionID equals fp.FunctionID into functionParameters
from functionParameter in functionParameters.DefaultIfEmpty()
select function;
var testResult = this.context.Descriptions.GetDescriptionsByJoin(query, languageID, origin);
And the duplicate code in a separate class:
public static IQueryable<IEnumerable<Description>> GetDescriptionsByJoin(
this IDbSet<Description> descriptions, IQueryable<ITranslatable> query, byte languageID, OriginEnum origin)
{
return from q in query
join d in descriptions on new
{
ID = q.ValueID,
languageID = languageID,
originID = (byte)origin
}
equals new
{
ID = d.ValueID,
languageID = d.LanguageID,
originID = d.OriginID
}
into entityWithDescription
select entityWithDescription;
}
But this gave me the following error:
The specified type member is not supported in LINQ to Entities. Only initializers, entity members, and entity navigation properties are supported
I know that I get this error because I use my 'valueID' as a parameter in my join statement and that variable can't be found in my entity model (valueID is a property in an interface 'ITranslatable' that all my classes that have descriptions will implement).
Thanks in advance!
Greetings
Loetn
I think what you are after is LinqKit.
EF is funny about the expressions it is able to translate into SQL. LinqKit has a built-in expression visitor that helps with these things.
You don't end up creating separate classes for the queries, but rather separate methods. You can then chain them together using the LinqKit-provided extension methods.
Never seen it used with query syntax though (from x in foo where bar select x), I have always used it with extension method syntax (foo.Where(bar)).
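As an alternative that does not need LinqKit or the ITranslatable interface, the shared join could be moved into a generic extension method that takes the ID selector as an expression. This is only a sketch: LeftJoinDescriptions is a made-up name, the Description class is reduced to the columns mentioned in the question plus a hypothetical Text property, and it has been exercised only against in-memory queryables, not against EF:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Description
{
    public int ValueID { get; set; }
    public byte LanguageID { get; set; }
    public byte OriginID { get; set; }
    public string Text { get; set; } // hypothetical payload column
}

public static class DescriptionJoinExtensions
{
    // Group-joins any source to the descriptions for one language and origin.
    // The caller supplies the entity's own ID property, so no interface member
    // appears in the query.
    public static IQueryable<TResult> LeftJoinDescriptions<T, TResult>(
        this IQueryable<T> source,
        IQueryable<Description> descriptions,
        Expression<Func<T, int>> idSelector,
        byte languageID,
        byte originID,
        Expression<Func<T, IEnumerable<Description>, TResult>> resultSelector)
    {
        var filtered = descriptions
            .Where(d => d.LanguageID == languageID && d.OriginID == originID);
        return source.GroupJoin(filtered, idSelector, d => d.ValueID, resultSelector);
    }
}
```

A call for the Function entity might then read this.context.Functions.LeftJoinDescriptions(this.context.Descriptions, f => (int)f.FunctionID, languageID, (byte)origin, (f, ds) => new FunctionDTO { Function = f, Description = ds.FirstOrDefault() }).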

How to get last category given a following route alias in a self referencing table

I currently solve my problem with code like this:
string permalink = "computers/hp/computers"; //example for similarity
List<string> aliases = permalink.Split('/').ToList();
Category cat = db.Categories.SingleOrDefault(c => c.Alias == aliases.First());
aliases.Remove(aliases.First());
foreach (string alias in aliases)
{
cat = cat.Categories.SingleOrDefault(c => c.Alias == alias);
}
return cat;
But this sends many queries.
How can I do it in a single query?
If I understand what you want, you can use the Enumerable.Aggregate method. You will have to start with a 'root' category, that encompasses all of db.Categories. That's pretty easy to mock up though. Try this:
var aliases = permalink.Split('/');
var category = new Category // start with a 'root' category
{
Id = 0,
Categories = db.Categories
};
var cat = aliases.Aggregate(category, (c, a) => c.Categories.SingleOrDefault(x => x.Alias == a));
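A self-contained version of the Aggregate idea over in-memory objects, as a sketch: the Category class is reduced to the two members used, PermalinkResolver is a made-up name, and the root holds only the top-level categories so that repeated aliases deeper in the tree do not break SingleOrDefault. Against lazily loaded EF navigation properties this would likely still issue one query per level:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Category
{
    public string Alias { get; set; }
    public List<Category> Categories { get; set; } = new List<Category>();
}

public static class PermalinkResolver
{
    // Follows each alias in turn; returns null as soon as a step has no match.
    public static Category Resolve(IEnumerable<Category> topLevel, string permalink)
    {
        // Synthetic root whose children are the top-level categories.
        var root = new Category { Categories = topLevel.ToList() };
        return permalink.Split('/')
            .Aggregate(root, (current, alias) =>
                current?.Categories.SingleOrDefault(c => c.Alias == alias));
    }
}
```

The null-conditional operator keeps a missing segment from throwing a NullReferenceException on the next step.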
Firstly, if the category table is small it is sometimes better to just grab the whole table and do the selection in memory (perhaps using p.w.s.g's answer).
If the table is large, then a Stored procedure would probably be better than Linq.
But, if you really want to do it in Linq, then I think the only way is to repeatedly add a join to same table.
The following is assuming that your relationship is between fields called ParentID and Id. I have also changed your string permalink to better illustrate the order.
You first need a little helper class
public class Info
{
public Category category;
public int? recordID;
}
then your main code
string permalink ="computers1/hp/computers2";
var aliases = permalink.Split('/');
var query = dc.Categories.Where(r=>r.Alias == aliases[aliases.Length-1])
.Select(r=> new Info { category = r, recordID = r.ParentID});
for(int i = aliases.Length -2 ; i >= 0; i--)
{
string alias = aliases[i];
query = query.Join(dc.Categories ,
a => a.recordID , b => b.Id , (a,b) => new { a , b} )
.Where(r=>r.b.Alias == alias)
.Select(r=> new Info { category = r.a.category, recordID = r.b.ParentID});
}
return query.SingleOrDefault().category;
As you can see, the lambda syntax of join is (IMHO) horrendous and I usually try to avoid it, but I can't think of any way of avoiding it here.
Since I can't test it, it could be totally wrong (maybe I've mixed up the ID, ParentID or my a's and b's ), so it is important to test this and to test how it performs.
I think the sql produced should be something like
SELECT * from Categories AS t0
INNER JOIN Categories AS t1 ON t0.ParentID = t1.id
INNER JOIN Categories AS t2 ON t1.ParentID = t2.id
WHERE t2.Alias = 'computers1'
AND t1.Alias = 'hp'
AND t0.Alias = 'computers2'
The more sections or aliases, then the more joins there are.
Now that you've seen all that, you probably want to avoid using this method :-)
I'll probably just add to your confusion :), but let me just throw an idea...
Let me say up front that this doesn't work exactly per your specs. It's not the solution, but it might help you simplify things a bit.
var query =
(from cat in db.Categories
where cat.Alias == "mainAalias"
from subcat in cat.Categories
where aliases.Contains(subcat.Alias)
orderby subcat.Alias descending
select subcat);
query.FirstOrDefault(); // or something
This should produce one relatively simple query
(e.g. SELECT...FROM...JOIN...WHERE... AND...IN...ORDERBY...).
e.g. if you give it 'cat1', 'cat2'...'cat6' - out of cat1 - to cat100 - it gives 'cat6'...'cat1' (I mean the aliases)
However it has a major 'flaw' with the 'sorting' - your specs require a sort that is the order of 'aliases' as they come - which is a bit unfortunate for queries. If you could somehow enforce, or define an order, that could be translated to SQL this (or similar) might work.
I'm assuming - that your 'aliases' are pre-sorted in an ascending
order - for this query to work. Which they are not, and I'm aware of
that.
But I think that your idea is not clearly defined here (and why all of us are having problems) - think through, and optimize - simplify your requirements - and let your C# tier help e.g. by pre-sorting.
You could also try some form of 'grouping' per cat.Alias etc. - but I think the same 'sorting problem' persists.

How can I query this hierarchical data using LINQ?

I have 3 kinds of objects: Agency, BusinessUnit and Client (each with their own respective table)
In terms of hierarchy, Agencies own BusinessUnits, and BusinessUnits own Clients.
I have 3 C# POCO Objects to represent them (I usually select new {} into them, rather than use the LINQ generated classes):
public class Agency
{
public IEnumerable<BusinessUnit> BusinessUnits { get; set; }
}
public class BusinessUnit
{
public IEnumerable<Client> Clients { get; set; }
}
public class Client
{
public int NumberOfAccounts { get; set; }
public Decimal AmountOfPlacement { get; set; }
public Decimal AvgBalance { get; set; }
public Double NeuPlacementScore { get; set; }
}
You can see that Agencies contain a list of BusinessUnits, and BusinessUnits contain a list of Clients.
I also have a mapping table called BAC_Map in the database which says which owns which, and it looks something like this:
How can I construct a query, so I can query for and return a list of Agencies? Meaning that, I want each Agency to have its list of BusinessUnit objects set, and I want each BusinessUnit to have its list of Clients set.
I can do basic LINQ queries, but this is a little over my head concerning the Map table and the multiple? queries.
How could I construct a method like GetAllAgencies() which would query, for not only all agencies, but populate its BusinessUnits that Agency owns, and the Clients those BusinessUnits own?
Edit: Any tips or info is appreciated. Do I need to do joins? Does this need to be multiple queries to return an Agency list, with its submembers populated?
If you drop all four tables (Agency, BusinessUnit, Client, Map) on the linq to sql designer, and draw relationships from Map to the other three, there will be some useful properties on Map.
//construct a query to fetch the row/column shaped results.
var query =
from m in db.map
//where m.... ?
let a = m.Agency
let b = m.BusinessUnit
let c = m.Client
// where something about a or b or c ?
select new {
AgencyID = a.AgencyID,
AgencyName = a.Name,
BusinessUnitID = b.BusinessUnitID,
ClientID = c.ClientID,
NumberOfAccounts = c.NumberOfAccounts,
Score = c.Score
};
//hit the database
var rawRecords = query.ToList();
//shape the results further into a hierarchy.
List<Agency> results = rawRecords
.GroupBy(x => x.AgencyID)
.Select(g => new Agency()
{
Name = g.First().AgencyName,
BusinessUnits = g
.GroupBy(y => y.BusinessUnitID)
.Select(g2 => new BusinessUnit()
{
Clients = g2
.Select(z => new Client()
{
NumberOfAccounts = z.NumberOfAccounts,
Score = z.Score
})
})
})
.ToList();
If appropriate filters are supplied (see the commented-out where clauses), then only the needed portions of the tables will be pulled into memory. This is standard SQL joining at work here.
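The shaping step can be tried in isolation on flat rows. In this sketch MapRow plays the role of the anonymous type returned by the database query, the ID properties on the POCOs are assumptions (the classes in the question do not show them), and each level is materialized with ToList so the hierarchy is built once:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Client { public int ClientID { get; set; } public int NumberOfAccounts { get; set; } }
public class BusinessUnit { public int BusinessUnitID { get; set; } public List<Client> Clients { get; set; } }
public class Agency { public int AgencyID { get; set; } public List<BusinessUnit> BusinessUnits { get; set; } }

// One flat row per (agency, business unit, client) combination.
public record MapRow(int AgencyID, int BusinessUnitID, int ClientID, int NumberOfAccounts);

public static class HierarchyBuilder
{
    public static List<Agency> Build(IEnumerable<MapRow> rows) =>
        rows.GroupBy(r => r.AgencyID)
            .Select(a => new Agency
            {
                AgencyID = a.Key,
                BusinessUnits = a
                    .GroupBy(r => r.BusinessUnitID)
                    .Select(b => new BusinessUnit
                    {
                        BusinessUnitID = b.Key,
                        Clients = b.Select(r => new Client
                        {
                            ClientID = r.ClientID,
                            NumberOfAccounts = r.NumberOfAccounts
                        }).ToList()
                    })
                    .ToList()
            })
            .ToList();
}
```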
I created your tables in a SQL Server database, and tried to recreate your scenario in LinqPad. I ended up with the following LINQ statements, which basically result in the same structure of your POCO classes:
var map = from bac in BAC_Maps
join a in Agencies on bac.Agency_ID equals a.Agency_ID
join b in BusinessUnits on bac.Business_Unit_ID equals b.Business_Unit_ID
join c in Clients on bac.Client_ID equals c.Client_ID
select new
{
AgencyID = a.Agency_ID,
BusinessUnitID = b.Business_Unit_ID,
Client = c
};
var results = from m in map.ToList()
group m by m.AgencyID into g
select new
{
BusinessUnits = from m2 in g
group m2 by m2.BusinessUnitID into g2
select new
{
Clients = from m3 in g2
select m3.Client
}
};
results.Dump();
Note that I called map.ToList() in the second query. This actually resulted in a single, efficient query. My initial attempt did not include .ToList(), and resulted in nine separate queries to produce the same results. The query generated by the .ToList() version is as follows:
SELECT [t1].[Agency_ID] AS [AgencyID], [t2].[Business_Unit_ID] AS [BusinessUnitID], [t3].[Client_ID], [t3].[NumberOfAccounts], [t3].[AmountOfPlacement], [t3].[AvgBalance], [t3].[NeuPlacementScore]
FROM [BAC_Map] AS [t0]
INNER JOIN [Agencies] AS [t1] ON [t0].[Agency_ID] = [t1].[Agency_ID]
INNER JOIN [BusinessUnits] AS [t2] ON [t0].[Business_Unit_ID] = [t2].[Business_Unit_ID]
INNER JOIN [Clients] AS [t3] ON [t0].[Client_ID] = [t3].[Client_ID]
Here is a screenshot of the results: http://img411.imageshack.us/img411/5003/agencybusinessunitclien.png
If you are doing this with direct LINQ to SQL, there is no way to do this without some kind of recursion, whether you do it yourself or you hide it behind an extension method. Recursive SQL is very bad (many round trips, many single queries).
There are two options here. One is to pull the entire table(s) with the hierarchy into memory and use LINQ to Objects on it. Leave the "details" tables in SQL. If you have less than several thousand entities, this is probably the most efficient way to go. You can keep a single copy of the table(s) in cache and refresh them when necessary. When you need to fetch more detailed data from the DB for a single record, you can reattach that entity from your cached hierarchy to a new DataContext and fetch it.
The other option is to use a more complex relationship model in your database. Storing only the parent (the adjacency list model) by nature demands recursion, but you can use the nested set model to construct a single query which can span many levels of the hierarchy. This will mean your LINQ to SQL queries become less intuitive (querying against Entity.Right and Entity.Left isn't quite as pretty as Parent or Children...) but you can do in one query what might take hundreds or thousands in the literal recursive approach.
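To illustrate the Left/Right idea, here is a small in-memory sketch. Node, NestedSet and Descendants are made-up names; the same range predicate is what a single SQL query would use:

```csharp
using System.Collections.Generic;
using System.Linq;

// Each node stores an interval; a child's interval lies inside its parent's.
public record Node(string Name, int Left, int Right);

public static class NestedSet
{
    // All descendants at any depth, found with one range filter and no recursion.
    public static List<Node> Descendants(IEnumerable<Node> nodes, Node parent) =>
        nodes.Where(n => n.Left > parent.Left && n.Right < parent.Right)
             .OrderBy(n => n.Left)
             .ToList();
}
```

For a tree stored as Agency (1, 10), BU1 (2, 5), Client1 (3, 4), BU2 (6, 9), Client2 (7, 8), the descendants of Agency are BU1, Client1, BU2 and Client2 in that order. The cost is on the write side: inserting or moving a node means renumbering intervals.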
