Running the same LINQ query on multiple IQueryable in parallel?

Situation: I have a List<IQueryable<MyDataStructure>>. I want to run a single LINQ query on each of them, in parallel, and then combine the results.
Question: How do I create a LINQ query that I can pass as a parameter?
Example code:
Here's some simplified code. First, I have the collection of IQueryable<string>:
public List<IQueryable<string>> GetQueries()
{
var set1 = (new List<string> { "hello", "hey" }).AsQueryable();
var set2 = (new List<string> { "cat", "dog", "house" }).AsQueryable();
var set3 = (new List<string> { "cat", "dog", "house" }).AsQueryable();
var set4 = (new List<string> { "hello", "hey" }).AsQueryable();
var sets = new List<IQueryable<string>> { set1, set2, set3, set4 };
return sets;
}
I would like to find all the words which start with letter 'h'. With a single IQueryable<string> this is easy:
query.Where(x => x.StartsWith("h")).ToList()
But I want to run the same query against all the IQueryable<string> objects in parallel and then combine the results. Here's one way to do it:
var result = new ConcurrentBag<string>();
Parallel.ForEach(queries, query =>
{
var partOfResult = query.Where(x => x.StartsWith("h")).ToList();
foreach (var word in partOfResult)
{
result.Add(word);
}
});
Console.WriteLine(result.Count);
But I want a more generic solution, so that I can define the LINQ operation separately and pass it as a parameter to a method. Something like this:
var query = Where(x => x.FirstName.StartsWith("d") && x.IsRemoved == false)
.Select(x => x.FirstName)
.OrderBy(x => x.FirstName);
var queries = GetQueries();
var result = Run(queries, query);
But I'm at a loss on how to do this. Any ideas?

So the first thing that you wanted was a way of taking a sequence of queries, executing all of them, and then getting the flattened list of results. That's simple enough:
public static IEnumerable<T> Foo<T>(IEnumerable<IQueryable<T>> queries)
{
return queries.AsParallel()
.Select(query => query.ToList())
.SelectMany(results => results);
}
For each query we execute it (call ToList on it) and it's done in parallel, thanks to AsParallel, and then the results are flattened into a single sequence through SelectMany.
The other thing that you wanted to do was to add a number of query operations to each query in a sequence of queries. This doesn't need to be parallelized (thanks to deferred execution, the calls to Where, OrderBy, etc. take almost no time) and can just be done through Select:
var queries = GetQueries().Select(query =>
query.Where(x => x.FirstName.StartsWith("d")
&& !x.IsRemoved)
.Select(x => x.FirstName)
// After the Select each element is already the first name (a string)
.OrderBy(name => name));
var results = Foo(queries);
Personally I don't really see a need to combine these two methods. You can make a method that does both, but they're really rather separate concepts so I don't see a need for it. If you do want them combined though, here it is:
public static IEnumerable<TResult> Bar<TSource, TResult>(
IEnumerable<IQueryable<TSource>> queries,
Func<IQueryable<TSource>, IQueryable<TResult>> selector)
{
return queries.Select(selector)
.AsParallel()
.Select(query => query.ToList())
.SelectMany(results => results);
}
Feel free to make either Foo or Bar extension methods if you want. Also, you really should give them more descriptive names if you're going to use them.
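To tie this back to the string example from the question, here is a minimal sketch of the combined method in use. RunAll is just a hypothetical rename of Bar with a ToList at the end, and the two in-memory word lists are stand-ins for real query sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelQueryDemo
{
    // Same shape as Bar above; the name RunAll is made up for this sketch.
    public static List<TResult> RunAll<TSource, TResult>(
        IEnumerable<IQueryable<TSource>> queries,
        Func<IQueryable<TSource>, IQueryable<TResult>> selector)
    {
        return queries.Select(selector)                  // deferred: only composes the queries
                      .AsParallel()
                      .Select(query => query.ToList())   // each query executes here, in parallel
                      .SelectMany(results => results)    // flatten into one sequence
                      .ToList();
    }

    public static void Main()
    {
        var queries = new List<IQueryable<string>>
        {
            new List<string> { "hello", "hey" }.AsQueryable(),
            new List<string> { "cat", "dog", "house" }.AsQueryable(),
        };

        var words = RunAll(queries, q => q.Where(x => x.StartsWith("h")));

        // PLINQ does not promise any particular order here, so sort before printing.
        Console.WriteLine(string.Join(", ", words.OrderBy(w => w, StringComparer.Ordinal)));
        // prints "hello, hey, house"
    }
}
```

Note that the order of the combined results is not guaranteed unless you add AsOrdered() or sort afterwards.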

First - given your current implementation, there is no reason to use IQueryable<T> - you could just use IEnumerable<T>.
You could then write a method which takes an IEnumerable<IEnumerable<T>> and a Func<IEnumerable<T>, IEnumerable<U>>, to build a result:
IEnumerable<IEnumerable<U>> QueryMultiple<T,U>(IEnumerable<IEnumerable<T>> inputs, Func<IEnumerable<T>,IEnumerable<U>> mapping)
{
return inputs.AsParallel().Select(i => mapping(i));
}
You could then use this as:
void Run()
{
IEnumerable<IEnumerable<YourType>> inputs = GetYourObjects();
// The query ends by selecting FirstName, so the result type is string
Func<IEnumerable<YourType>, IEnumerable<string>> query = i =>
i.Where(x => x.FirstName.StartsWith("d") && x.IsRemoved == false)
.Select(x => x.FirstName)
.OrderBy(name => name);
var results = QueryMultiple(inputs, query);
}
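One thing to keep in mind with this approach: Where, Select and OrderBy on IEnumerable<T> are deferred, so a mapping like the one above only builds the sequences inside the parallel Select and the actual work happens later, when the caller enumerates them. A possible variant that forces execution inside the parallel section is sketched below. QueryMultipleEager is a made-up name and the word lists are sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerQueryDemo
{
    // Hypothetical variant of QueryMultiple that materializes each result
    // while still inside the parallel query.
    public static List<List<U>> QueryMultipleEager<T, U>(
        IEnumerable<IEnumerable<T>> inputs,
        Func<IEnumerable<T>, IEnumerable<U>> mapping)
    {
        return inputs.AsParallel()
                     .AsOrdered()                               // keep results in input order
                     .Select(input => mapping(input).ToList())  // ToList runs the mapping now
                     .ToList();
    }

    public static void Main()
    {
        var inputs = new List<IEnumerable<string>>
        {
            new[] { "hello", "hey" },
            new[] { "cat", "dog", "house" },
        };

        var results = QueryMultipleEager(inputs, words => words.Where(w => w.StartsWith("h")));

        foreach (var list in results)
            Console.WriteLine(string.Join(", ", list));
        // prints "hello, hey" and then "house"
    }
}
```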

Related

LINQ: how to get an intersection of two sets of ints?

There must be a way to compare two sets of results while staying in LINQ. Here's my existing code that uses a HashSet to do the comparison after two separate queries:
public static void AssertDealershipsShareTransactionGatewayCredentialIds(long DealershipLocationId1,
long DealershipLocationId2)
{
using (var sqlDatabase = new SqlDatabaseConnection())
{
var DealershipCredentials1 =
sqlDatabase.Tables.DealershipLocationTransactionGateway
.Where(x => x.DealershipLocationId == DealershipLocationId1)
.Select(x => x.TransactionGatewayCredentialId);
var DealershipCredentials2 =
sqlDatabase.Tables.DealershipLocationTransactionGateway
.Where(x => x.DealershipLocationId == DealershipLocationId2)
.Select(x => x.TransactionGatewayCredentialId);
var doSetsOfCredentialsMatch = new HashSet<int>(DealershipCredentials1).SetEquals(DealershipCredentials2);
Assert.IsTrue(doSetsOfCredentialsMatch,
"The sets of TransactionGatewayCredentialIds belonging to each Dealership did not match");
}
}
Ideas? Thanks.
Easy answer (This will make 1, possibly 2 database calls, both of which only return a boolean):
if (list1.Except(list2).Any() || list2.Except(list1).Any())
{
... They did not match ...
}
Better answer (This will make 1 database call returning a boolean):
var DealershipCredentials1 =
sqlDatabase.Tables.DealershipLocationTransactionGateway
.Where(x => x.DealershipLocationId == DealershipLocationId1)
.Select(x => x.TransactionGatewayCredentialId);
var DealershipCredentials2 =
sqlDatabase.Tables.DealershipLocationTransactionGateway
.Where(x => x.DealershipLocationId == DealershipLocationId2)
.Select(x => x.TransactionGatewayCredentialId);
if (DealershipCredentials1.GroupJoin(DealershipCredentials2,a=>a,b=>b,(a,b)=>!b.Any())
.Union(
DealershipCredentials2.GroupJoin(DealershipCredentials1,a=>a,b=>b,(a,b)=>!b.Any())
).Any(a=>a))
{
... They did not match ...
}
The second method works by unioning a left outer join that returns a boolean indicating if any unmatching records were found with a right outer join that does the same. I haven't tested it, but in theory, it should return a simple boolean from the database.
Another approach, which is essentially the same as the first but wrapped in a single LINQ expression, so it will only ever make 1 database call:
if (list1.Except(list2).Union(list2.Except(list1)).Any())
{
}
And another approach:
var common=list1.Intersect(list2);
if (list1.Except(common).Union(list2.Except(common)).Any()) {}
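For what it's worth, the Except/Union check is easy to try out against in-memory collections. The sketch below uses LINQ to Objects only, so it says nothing about how a database provider translates the expression. SameSet is a made-up helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetComparisonDemo
{
    // True when both sequences contain the same distinct values.
    public static bool SameSet(IEnumerable<int> first, IEnumerable<int> second)
    {
        // Values only in first, plus values only in second (the symmetric difference).
        return !first.Except(second).Union(second.Except(first)).Any();
    }

    public static void Main()
    {
        Console.WriteLine(SameSet(new[] { 1, 2, 3 }, new[] { 3, 2, 1, 1 })); // True
        Console.WriteLine(SameSet(new[] { 1, 2, 3 }, new[] { 1, 2 }));       // False
    }
}
```

Like HashSet.SetEquals, this ignores duplicates and ordering.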

Create Multiple Objects Single LINQ EF Method

List<MyObject> objects = await item.tables.ToAsyncEnumerable()
.Where(p => p.field1 == value)
.Select(p => new MyObject(p.field1,p.field2))
.ToList();
^ I have something like that, but what I'm wondering is: is there any way to add a second object creation in the same select, e.g. new MyObject(p.field3,p.field4), and add it to the same list? Order does not matter.
I know I could do this with multiple calls to the database or by splitting the lists into sections, but is there a way to do this in a single line?
You could create it as a tuple.
List<Tuple<MyObject1, MyObject2>> pairs = query.Select(x => Tuple.Create(
new MyObject1
{
// fields
},
new MyObject2
{
//fields
}))
.ToList();
From my testing in Linqpad, it seems that this will only hit the database once.
Alternatively, you could just select all the fields you know you'll need from the database to create both:
var myList = query.Select(x => new { FieldA = x.FieldA, FieldB = x.FieldB }).ToList(); //hits db once
var object1s = myList.Select(x => new MyObject1(x.FieldA));
var object2s = myList.Select(x => new MyObject1(x.FieldB));
var bothLists = object1s.Concat(object2s).ToList();
What you'd want to do is use the SelectMany method in LINQ, which flattens the items of an array into the result sequence. The array can be created inline, as seen below.
List<MyObject> objects = await item.tables.ToAsyncEnumerable()
.Where(p => p.field1 == value)
.SelectMany(p => new []{new MyObject(p.field1,p.field2), new MyObject(p.field3,p.field4)})
.ToList();
Hope that solves your problem!
If you use query syntax instead of method chaining, you can use the let operator to accomplish this. Note that the SQL generated may not be exactly performant as this article shows, but it should work for you if you're after a subquery.
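As a rough illustration of the let approach, here is a sketch that runs in LINQ to Objects. Row and MyObject are made-up types standing in for the table row and the target class, and the sketch does not check whether EF can translate this shape to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table row.
public sealed class Row
{
    public int Field1 { get; set; }
    public int Field2 { get; set; }
    public int Field3 { get; set; }
    public int Field4 { get; set; }
}

// Hypothetical stand-in for the object being created.
public sealed class MyObject
{
    public MyObject(int a, int b) { A = a; B = b; }
    public int A { get; }
    public int B { get; }
}

public static class LetDemo
{
    public static List<MyObject> Expand(IEnumerable<Row> rows, int value)
    {
        var query =
            from p in rows
            where p.Field1 == value
            // let names the intermediate array once per row
            let pair = new[] { new MyObject(p.Field1, p.Field2), new MyObject(p.Field3, p.Field4) }
            from obj in pair          // second from flattens the array (SelectMany)
            select obj;
        return query.ToList();
    }

    public static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Field1 = 1, Field2 = 2, Field3 = 3, Field4 = 4 },
            new Row { Field1 = 9, Field2 = 9, Field3 = 9, Field4 = 9 },
            new Row { Field1 = 1, Field2 = 5, Field3 = 6, Field4 = 7 },
        };

        Console.WriteLine(Expand(rows, 1).Count); // prints 4
    }
}
```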
You could try creating an array of objects and then flattening with SelectMany:
List<MyObject> objects = await item.tables.ToAsyncEnumerable()
.Where(p => p.field1 == value)
.Select(p => new [] {
new MyObject(p.field1,p.field2),
new MyObject(p.field3,p.field4)
})
.SelectMany(g => g)
.ToList();
But I suspect you'll have problems getting EF to translate that to a query.

C# predicate list passed to Linq Where clause

I have a long Linq Where clause that I would like to populate with a predicate list.
List<Expression<Func<Note, bool>>> filters = new List<Expression<Func<Note, bool>>>();
filters.Add(p => p.Title != null && p.Title.ToLower().Contains(searchString));
filters.Add(p => p.Notes != null && p.Notes.ToLower().Contains(searchString));
filters.Add(GlobalSearchUser((List < User > users = new List<User>() { p.user1, p.user2, p.user3, p.user4 }), searchString));
notes = dataAccess.GetList<Note>(pn => pn.ProjectVersionID == projectVersionID, filterExtensions.ToArray())
.Where(filters.ToArray()).Take(10).ToList();
However I'm getting this error:
cannot convert from 'System.Linq.Expressions.Expression<System.Func<project.Contracts.DTOs.Note,bool>>[]' to 'System.Func<project.Contracts.DTOs.Note,bool>'
The error is on the .Where clause. With the .Where removed, it compiles just fine.
I think the great answer from Hogan can be simplified and shortened a bit by using the Any and All LINQ methods.
To get items that fulfill all the conditions:
var resultAll = listOfItems.Where(p => filters.All(f => f(p)));
And to get the items that fulfill any condition:
var resultAny = listOfItems.Where(p => filters.Any(f => f(p)));
There are at least two errors in your code:
List<Expression<Func<Note, bool>>> filters = new List<Expression<Func<Note, bool>>>();
change it to
List<Func<Note, bool>> filters = new List<Func<Note, bool>>();
You don't need Expression trees here. You are using IEnumerable<>, not IQueryable<>
notes = dataAccess.GetList<Note>(pn => pn.ProjectVersionID == projectVersionID, filterExtensions.ToArray())
.Where(filters.ToArray()).Take(10).ToList();
Here, .Where() accepts a single predicate at a time. You could:
notes = dataAccess.GetList<Note>(pn => pn.ProjectVersionID == projectVersionID, filterExtensions.ToArray())
.Where(x => filters.All(f => f(x))).Take(10).ToList();
or various other solutions, like:
var notesEnu = dataAccess.GetList<Note>(pn => pn.ProjectVersionID == projectVersionID, filterExtensions.ToArray())
.AsEnumerable();
foreach (var filter in filters)
{
notesEnu = notesEnu.Where(filter);
}
notes = notesEnu.Take(10).ToList();
This works because chained .Where() calls are implicitly combined with &&.
You have to loop over your filters and run a test on each one.
You can do it with LINQ like this, to return true if any of your filters is true:
.Where(p => { foreach (var f in filters) if (f(p)) return true; return false; })
or like this, to return true only if all of your filters are true:
.Where(p => { foreach (var f in filters) if (!f(p)) return false; return true; })
You can't just pass an array of predicates to the where method. You need to either iterate over the array and keep calling Where() for each expression in the array, or find a way to merge them all together into one expression and use that. You'll want to use LinqKit if you go the second route.
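Here is a small self-contained sketch of the All/Any idea on plain strings. It assumes the filters are compiled Func<T, bool> delegates applied to an in-memory sequence (not expression trees sent to a query provider). MatchAll and MatchAny are made-up helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateListDemo
{
    // Keeps items that satisfy every filter (logical AND).
    public static List<string> MatchAll(IEnumerable<string> items, IEnumerable<Func<string, bool>> filters)
        => items.Where(item => filters.All(f => f(item))).ToList();

    // Keeps items that satisfy at least one filter (logical OR).
    public static List<string> MatchAny(IEnumerable<string> items, IEnumerable<Func<string, bool>> filters)
        => items.Where(item => filters.Any(f => f(item))).ToList();

    public static void Main()
    {
        var filters = new List<Func<string, bool>>
        {
            s => s.StartsWith("h"),
            s => s.Length > 3,
        };
        var words = new[] { "hey", "hello", "house", "cat", "tiger" };

        Console.WriteLine(string.Join(", ", MatchAll(words, filters))); // hello, house
        Console.WriteLine(string.Join(", ", MatchAny(words, filters))); // hey, hello, house, tiger
    }
}
```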

How to use .Where with lambda / IQueryable

Edit: The code works fine; it was another bug.
I had commented out the //department.IdAgency = reader.GetByte(2); line in the method that creates departmentList. When I removed the //, the IQueryable<string> with .Where worked fine. Sorry for the inconvenience!
static List<Department> CreateDepartmentList(IDataReader reader)
{
List<Department> departmentList = new List<Department>();
Department department = null;
while (reader.Read())
{
department = new Department();
department.Id = reader.GetByte(0);
department.Name = reader.GetString(1);
//department.IdAgency = reader.GetByte(2);
if (!reader.IsDBNull(3))
{ department.IdTalkGroup = reader.GetInt16(3); }
departmentList.Add(department);
}
return departmentList;
}
Original question:
I have an IQueryable<string> query, that works. But how do I use .Where?
IQueryable<string> query = departmentList.AsQueryable()
.OrderBy(x => x.Name)
.Select(x => x.Name);
I have tried this, but it does not work:
IQueryable<string> query = departmentList.AsQueryable()
.OrderBy(x => x.Name)
.Where(x => x.IdAgency == idAgencySelected[0])
.Select(x => x.Name);
All the .Where() call does is apply a filtering method to each element on the list, thus returning a new IEnumerable.
So, for some IQueryable<string>...
IEnumerable<string> results = SomeStringList.Where(s => s.Contains("Department"));
...You would get a list of strings that contain the word department.
In other words, by passing it some boolean condition that can be applied to a member of the queryable collection, you get a subset of the original collection.
The reason your second block of code does not work, is because you're calling a method or property that does not belong to string. You may want to consider querying against the more complex type, if it has identifier data, and then take the names of the elements and add them to some list instead.

How can I specify to use Linq ThenBy clause only when there is a tie?

I have a linq query (not database-related) with OrderBy and ThenBy
var sortedList = unsortedList
.OrderBy(foo => foo.Bar) //this property access is relatively fast
.ThenBy(foo => foo.GetCurrentValue()) //this method execution is slow
Getting foo.Bar is fast, but executing foo.GetCurrentValue() is very slow. The return value only matters if some members have equal Bar values, which happens rarely but must be handled when it does. Is it possible to execute the ThenBy clause only when it's needed to break a tie between equal Bar values (i.e. it would not be executed at all if the foo.Bar values are unique)?
Also, Bar is actually a bit slow as well, so I'd prefer not to invoke it twice for the same object.
Since you are not in a database, and you need a tight control over the sorting, you could use a single OrderBy with a custom IComparer that accesses only what it needs, and does not perform unnecessary evaluations.
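One way this could look is sketched below. It is only an illustration under a few assumptions: Foo here is a made-up class with an int Bar and a call counter added for the demo, and SortEntry and LazyTieBreakComparer are hypothetical helper types. Each item is wrapped once so that Bar is read a single time, and the slow value sits behind a Lazy<int> that the comparer only touches when two different entries have equal Bar values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up element type; Calls counts slow invocations for the demo.
public sealed class Foo
{
    public int Bar { get; set; }
    public int Current { get; set; }
    public int Calls { get; private set; }

    public int GetCurrentValue()
    {
        Calls++;
        return Current;
    }
}

// Pairs an item with its primary key and a lazily computed tie-break key.
public sealed class SortEntry
{
    public SortEntry(Foo item)
    {
        Item = item;
        Bar = item.Bar;                                      // read once
        Tie = new Lazy<int>(() => item.GetCurrentValue());   // runs at most once, on demand
    }

    public Foo Item { get; }
    public int Bar { get; }
    public Lazy<int> Tie { get; }
}

public sealed class LazyTieBreakComparer : IComparer<SortEntry>
{
    public int Compare(SortEntry x, SortEntry y)
    {
        // A sort may compare an element with itself; never pay for that.
        if (ReferenceEquals(x, y)) return 0;

        int primary = x.Bar.CompareTo(y.Bar);
        if (primary != 0) return primary;

        // Only reached on a genuine tie between two different entries.
        return x.Tie.Value.CompareTo(y.Tie.Value);
    }
}

public static class LazySortDemo
{
    public static List<Foo> Sort(IEnumerable<Foo> items)
        => items.Select(f => new SortEntry(f))
                .OrderBy(e => e, new LazyTieBreakComparer())
                .Select(e => e.Item)
                .ToList();

    public static void Main()
    {
        var items = new List<Foo>
        {
            new Foo { Bar = 2, Current = 9 },
            new Foo { Bar = 1, Current = 5 },
            new Foo { Bar = 1, Current = 3 },
            new Foo { Bar = 3, Current = 7 },
        };

        foreach (var f in Sort(items))
            Console.WriteLine($"Bar={f.Bar} Current={f.Current} slowCalls={f.Calls}");
    }
}
```

In this sample only the two items with Bar = 1 ever have GetCurrentValue called, once each.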
This is a bit clumsy, but I'm sure it can be improved - maybe it won't be done in one linq statement, but it should work:
var sortedList2 = unsortedList
.OrderBy(foo => foo.Bar)
.GroupBy(foo => foo.Bar);
var result = new List<Foo>();
foreach (var s in sortedList2)
{
if (s.Count() > 1)
{
var ordered = s
.OrderBy(el => el.GetCurrentValue());
result.AddRange(ordered);
}
else
{
result.AddRange(s);
}
}
UPDATE:
We can argue if that's an improvement, but it looks more concise at least:
var list3 = (from s in sortedList2
let x = s.Count()
select x == 1
? s.Select(el => el)
: s.OrderBy(el => el.GetCurrentValue()))
.SelectMany(n => n);
UPDATE2:
You can use Skip(1).Any() instead of Count() - this should avoid the enumeration of the whole sequence I guess.
var query = unsortedList
.GroupBy(foo => foo.Bar)
.OrderBy(g => g.Key)
.SelectMany(g => g.Skip(1).Any() ? g.OrderBy(foo => foo.GetCurrentValue()) : g);
This has the obvious downside of not returning IOrderedEnumerable<Foo>
I changed Joanna Turban's solution and developed the following extension method:
public static IEnumerable<TSource> OrderByThenBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> orderBy, Func<TSource, TKey> thenBy)
{
var sorted = source
.Select(s => new Tuple<TSource, TKey>(s, orderBy(s)))
.OrderBy(s => s.Item2)
.GroupBy(s => s.Item2);
var result = new List<TSource>();
foreach (var s in sorted)
{
if (s.Count() > 1)
result.AddRange(s.Select(p => p.Item1).OrderBy(thenBy));
else
result.Add(s.First().Item1);
}
return result;
}
Try this one
var sortedList = unsortedList.OrderBy(foo => foo.Bar);
if(some_Condition)
{
// ThenBy keeps the primary Bar ordering; a second OrderBy would replace it
sortedList = sortedList.ThenBy(foo => foo.GetCurrentValue());
}
