SelectMany Anonymous Type and Skip Iterations - c#

I've been trying for a long time to find a "clean" pattern to handle a .SelectMany with anonymous types when you don't always want to return a result. My most common use case looks like this:
We have a list of customers that I want to do reporting on.
Each customer's data resides in a separate database, so I do a parallel .SelectMany
In each lambda expression, I gather results for the customer toward the final report.
If a particular customer should be skipped, I need to return an empty list.
I whip these up often for quick reporting, so I'd prefer an anonymous type.
For example, the logic may look something like this:
//c is a customer
var context = GetContextForCustomer(c);
// look up some data, myData using the context connection
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
return null;
This could be implemented as a foreach statement:
var results = new List<WhatType?>();
foreach (var c in customers) {
var context = GetContextForCustomer(c);
if (someCondition)
results.AddRange(myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 }));
}
Or it could be implemented with a .SelectMany that is pre-filtered with a .Where:
customers
.Where(c => someCondition)
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
})
.ToList();
There are problems with both of these approaches. The foreach solution requires initializing a List to store the results, and you have to define the type. The .SelectMany with .Where is often impractical because the logic for someCondition is fairly complex and depends on some data lookups. So my ideal solution would look something like this:
customers
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
continue? return null? return empty list?
})
.ToList();
What do I put in the else line to skip a return value? None of the solutions I can come up with work or are ideal:
continue doesn't compile because it's not an active foreach loop
return null causes an NRE
return empty list requires me to initialize a list of anonymous type again.
Is there a way to accomplish the above that is clean, simple, and neat, and satisfies all my (picky) requirements?

You could return an empty IEnumerable<dynamic>. Here's an example (without your customers and someCondition, because I don't know what they are, but of the same general form as your example):
new int[] { 1, 2, 3, 4 }
.AsParallel()
.SelectMany(i => {
if (i % 2 == 0)
return Enumerable.Repeat(new { i, squared = i * i }, i);
else
return Enumerable.Empty<dynamic>();
})
.ToList();
So, with your objects and someCondition, it would look like
customers
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
return Enumerable.Empty<dynamic>();
})
.ToList();

Without knowing what someCondition and myData look like...
Why don't you just Select and Where the contexts as well:
customers
.Select(c => GetContextForCustomer(c))
.Where(ctx => someCondition)
.SelectMany(ctx =>
myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 }));
EDIT: I just realized you need to carry both the customer and context further, so you can do this:
customers
.Select(c => new { Customer = c, Context = GetContextForCustomer(c) })
.Where(x => someCondition(x.Context))
.SelectMany(x =>
myData.Select(d => new { CustomerID = x.Customer, X1 = d.x1, X2 = d.x2 }));

You can try following:
customers
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
return Enumerable.Empty<int>().Select(x => new { CustomerID = 0, X1 = "defValue", X2 = "defValue" });
})
.ToList();
All anonymous types with the same set of properties (the same names and types, in the same order) are combined into one anonymous class by the compiler. That's why both your Select and the one on Enumerable.Empty will return the same T.
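To make the point about shared anonymous types concrete, here is a minimal sketch. The helper name EmptyLike and the sample values are made up for illustration and are not part of the answer above; the helper simply lets the compiler infer the element type from a sample object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two anonymous objects with the same property names and types, in the same order,
// share a single compiler-generated type within one assembly.
var a = new { CustomerID = 1, X1 = "a", X2 = "b" };
var b = new { CustomerID = 0, X1 = "defValue", X2 = "defValue" };
bool sameType = a.GetType() == b.GetType();

// Changing the property order produces a different anonymous type.
var c = new { X1 = "a", CustomerID = 1, X2 = "b" };
bool differentType = a.GetType() != c.GetType();

// Hypothetical helper: returns an empty sequence whose element type is
// inferred from the sample argument (the sample value itself is ignored).
static IEnumerable<T> EmptyLike<T>(T sample) => Enumerable.Empty<T>();

var results = new[] { 1, 2, 3, 4 }
    .SelectMany(i => i % 2 == 0
        ? Enumerable.Repeat(new { CustomerID = i, X1 = "x", X2 = "y" }, i)
        : EmptyLike(new { CustomerID = 0, X1 = "", X2 = "" }))
    .ToList();

Console.WriteLine($"{sameType} {differentType} {results.Count}");
```

Because both branches produce the same element type, the result stays strongly typed, which is the main difference from returning Enumerable.Empty<dynamic>().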

You can create your own variation of the SelectMany LINQ method which supports nulls:
public static class EnumerableExtensions
{
public static IEnumerable<TResult> NullableSelectMany<TSource, TResult> (
this IEnumerable<TSource> source,
Func<TSource, IEnumerable<TResult>> selector)
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
foreach (TSource item in source) {
IEnumerable<TResult> results = selector(item);
if (results != null) {
foreach (TResult result in results)
yield return result;
}
}
}
}
Now you can return null in the selector lambda.

The accepted answer returns dynamic. The cleanest approach would be to move the filtering logic into a Where, which makes the whole thing look better in a LINQ context. Since you specifically rule that out in the question, and I'm not a fan of delegates written over multiple lines in a LINQ call, I would try this, though one can argue it's more hacky.
var results = new
{
customerID = default(int), //notice the casing of property names
x1 = default(U), //whatever types they are
x2 = default(V)
}.GetEmptyListOfThisType();
foreach (var customerID in customers) {
var context = GetContextForCustomer(customerID);
if (someCondition)
results.AddRange(myData.Select(x => new { customerID, x.x1, x.x2 }));
}
public static List<T> GetEmptyListOfThisType<T>(this T item)
{
return new List<T>();
}
Notice the appropriate use of property names, which match the other variable names, so you don't have to write the property names a second time in the Select call.

Related

how to calculate average for each item in a list with LINQ?

How can I calculate the average for each student? I tried the following, but the average does not work. How could I do it? With a GroupJoin and then a GroupBy? I look forward to seeing solutions, thanks.
var listadoA = alumnos.Join(examenes,
a => a._id,
e => e._alumnoId,
(a, e) => new
{
NombreAlumno = a._nombre,
Examenes = examenes,
Notas = e._nota,
}).Where(p => p.Examenes.Count() >= 1).OrderBy(p => p.NombreAlumno).ToList();
foreach (var obj in listadoA){
var promedio = obj.Average(p => p.Nota);
Console.Write($"\nAlumno = {obj.NombreAlumno}, Promedio ={promedio}");
}
class Examen{
public double _nota{get;set;}
public int _alumnoId {get;set;}
public int cursoId{get;set;}
public Examen(int id, double nota, int idMateria){
this._alumnoId = id;
this.cursoId = idMateria;
this._nota = nota;
}
public override string ToString(){
return ($"Alumno = {this._alumnoId}, Nota = {this._nota}, Curso = {this.cursoId}");
}
public static List<Examen> GetLista(){
return new List<Examen>(){
new Examen(2,5,1),
new Examen(4,7,5),
new Examen(4,9,3),
new Examen(3,10,4),
new Examen(7,5,3),
new Examen(2,8,4),
new Examen(6,9,5),
new Examen(9,7,1),
new Examen(6,5,4),
new Examen(9,1,4),
new Examen(7,9,5),
};
}
}
I'm a bit short on time to test it but I think it should work with a few small tweaks. If I've made any typos, let me know:
var listadoA = alumnos.GroupJoin(examenes,
a => a._id,
e => e._alumnoId,
(a, eGroup) => new
{
Alumno = a,
Examenes = eGroup
}).Where(p => p.Examenes.Count() >= 1).OrderBy(p => p.Alumno._nombre).ToList();
foreach (var obj in listadoA){
var promedio = obj.Examenes.Average(e => e._nota);
Console.Write($"\nAlumno = {obj.Alumno._nombre}, Promedio ={promedio}");
}
I'm curious why your fields starting with an underscore are publicly accessible; that's the naming convention for a private field. They should really be exposed through public properties. Also, I've assumed that "nota" is the exam score.
EDIT
The answer was originally posted before clarifications were made about the classes involved. I have kept the English naming, because it might be clearer for a wider audience. This also helps clarify my answer with regard to @Caius Jard's pertinent comment.
Using Linq to object: (no use of Entity Framework)
This code makes the following trade-off compared to using a join: it might be less performant, but it is simpler (you don't even have to understand what a join is).
AveragesByStudents = Students
.Select(s => new
{
StudentName = s.Name,
Notes = Exams
.Where(e => e.StudentId == s.Id)
.Select(e => e.Note)
.ToList()
})
.Select(s => new
{
s.StudentName,
Average = s.Notes.Any() ? s.Notes.Average() : (double?)null
});
With this example, you obtain all the students, even if they have no notes (in that case their average is null). You could do it in one select, but it would not be more readable.
With the following example, you obtain only the students that have notes, so their average cannot be null.
AveragesByStudents = Students
.Select(s => new
{
StudentName = s.Name,
Notes = Exams
.Where(e => e.StudentId == s.Id)
.Select(e => e.Note)
})
.Where(s => s.Notes.Any())
.Select(s => new
{
s.StudentName,
Average = s.Notes.Average()
});
Add a ToList() at the end of queries if you want to materialize.
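For comparison, the same per-student average can be sketched with GroupJoin in a self-contained form. The tuples and names below are invented sample data standing in for the Students and Exams collections, and the note type is assumed to be double.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; the real classes are not shown in full above.
var students = new[]
{
    (Id: 2, Name: "Ana"),
    (Id: 4, Name: "Luis"),
    (Id: 5, Name: "Marta"), // has no exams
};
var exams = new[]
{
    (StudentId: 2, Note: 5.0),
    (StudentId: 2, Note: 8.0),
    (StudentId: 4, Note: 7.0),
    (StudentId: 4, Note: 9.0),
};

var averages = students
    .GroupJoin(exams, s => s.Id, e => e.StudentId,
        (s, studentExams) => new
        {
            StudentName = s.Name,
            // Average() throws on an empty sequence, so "no exams" is mapped to null explicitly.
            Average = studentExams.Any() ? studentExams.Average(e => e.Note) : (double?)null
        })
    .OrderBy(x => x.StudentName)
    .ToList();

foreach (var x in averages)
{
    string label = x.Average.HasValue ? x.Average.Value.ToString("0.0") : "n/a";
    Console.WriteLine($"{x.StudentName}: {label}");
}
```

GroupJoin keeps students without exams (with an empty group), so this variant matches the first query above; adding a Where on studentExams.Any() would reproduce the second.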

How to get distinct values with corresponding data from IEnumerable

I need to be able to return only the records that have a unique AccessionNumber with its corresponding LoginId, so that in the end the data looks something like:
A1,L1
A2,L1
A3,L2
However, my issue is with this line of code, because Distinct() returns an IEnumerable of string and not an IEnumerable of string[]. Therefore, the compiler complains about string not containing a definition for AccessionNumber and LoginId.
yield return new[] { record.AccessionNumber, record.LoginId };
This is the code that I am trying to execute:
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.Select(x => x.AccessionNumber).Distinct();
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
}
That's because you are selecting only the AccessionNumber property with the line below:
var z = data.Select(x => x.AccessionNumber).Distinct();
You probably want to select entire StudentAssessmentTestData record
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString()).Distinct();
foreach (var record in data)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
Instead of using Distinct, use GroupBy. This:
var z = data.Select(x => x.AccessionNumber).Distinct();
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
should be something like this:
return data.GroupBy(x => x.AccessionNumber)
.Select(r => new[] { r.Key, r.First().LoginId });
The GroupBy() call ensures only unique entries for AccessionNumber, and the First() ensures that only the first LoginId with that AccessionNumber is returned.
This assumes that your data is sorted in a way that if there are multiple logins with the same AccessionNumber, the first login is correct.
If you want to choose distinct values based on a certain property you can do it in several ways.
If it is always the same property you wish to use for comparison, you can override the Equals and GetHashCode methods in the StudentAssessmentTestData class, which lets the Distinct method recognize how instances differ from each other. An example can be found in this question.
However, you can also implement a custom IEqualityComparer<T> for your implementation, for example the following version
// Custom comparer taking generic input parameter and a delegate function to do matching
public class CustomComparer<T> : IEqualityComparer<T> {
private readonly Func<T, object> _match;
public CustomComparer(Func<T, object> match) {
_match = match;
}
// compares the values returned by the match delegate for both arguments
public bool Equals(T data1, T data2) {
return object.Equals(_match(data1), _match(data2));
}
// overly simplistic implementation
public int GetHashCode(T data) {
var matchValue = _match(data);
if (matchValue == null) {
return 42.GetHashCode();
}
return matchValue.GetHashCode();
}
}
This class can then be used as an argument for the Distinct function, for example in this way
// compare by access number
var accessComparer = new CustomComparer<StudentTestData>(d => d.AccessionNumber );
// compare by login id
var loginComparer = new CustomComparer<StudentTestData>(d => d.LoginId );
foreach (var d in data.Distinct( accessComparer )) {
Console.WriteLine( "{0}, {1}", d.AccessionNumber, d.LoginId);
}
foreach (var d in data.Distinct( loginComparer )) {
Console.WriteLine( "{0}, {1}", d.AccessionNumber, d.LoginId);
}
A full example you can find in this dotnetfiddle
Add a LinqExtension method DistinctBy as below.
public static class LinqExtensions
{
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> seenKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (seenKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
}
Use it in your code like this:
var z = data.DistinctBy(x => x.AccessionNumber);
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.DistinctBy(x => x.AccessionNumber);
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
}
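If the project targets .NET 6 or later (an assumption, since the question does not state the framework version), Enumerable.DistinctBy is built in and the custom extension above would not be needed. A minimal sketch with made-up rows standing in for the StudentAssessmentTestData records:

```csharp
using System;
using System.Linq;

// Hypothetical rows; only the two properties used in the question are modeled.
var data = new[]
{
    new { AccessionNumber = "A1", LoginId = "L1" },
    new { AccessionNumber = "A1", LoginId = "L2" },
    new { AccessionNumber = "A2", LoginId = "L1" },
    new { AccessionNumber = "A3", LoginId = "L2" },
};

// One record is kept per AccessionNumber, then projected to the string[] shape
// that the original method returns.
var rows = data
    .DistinctBy(x => x.AccessionNumber)
    .Select(x => new[] { x.AccessionNumber, x.LoginId })
    .ToList();

foreach (var row in rows)
    Console.WriteLine(string.Join(",", row));
```

Keeping a hand-written DistinctBy with the same signature in scope alongside the built-in one can make the call ambiguous, so only one of the two should be visible.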
This is the code that finally worked:
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
var data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.GroupBy(x => new{x.AccessionNumber})
.Select(x => new StudentAssessmentTestData(){ AccessionNumber = x.Key.AccessionNumber, LoginId = x.FirstOrDefault().LoginId});
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
}
This returns a sequence that looks similar to this:
Acc1, Login1
Acc2, Login1
Acc3, Login2
Acc4, Login1
Acc5, Login3
You can try this. It works for me.
IEnumerable<StudentAssessmentTestData> data = DataGetter.GetTestData("MyTestData");
data = data.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString());
var z = data.GroupBy(x => x.AccessionNumber).SelectMany(y => y.Take(1));
foreach (var record in z)
{
yield return new[] { record.AccessionNumber, record.LoginId };
}
I'm not 100% sure what you're asking. You either want (1) only records with a unique AccessionNumber , if two or more records had the same AccessionNumber then don't return them, or (2) only the first record for each AccessionNumber.
Here are both options:
(1)
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
return
DataGetter
.GetTestData("MyTestData")
.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString())
.GroupBy(x => x.AccessionNumber)
.Where(x => !x.Skip(1).Any())
.SelectMany(x => x)
.Select(x => new [] { x.AccessionNumber, x.LoginId });
}
(2)
internal static IEnumerable<string[]> GetTestDataForSpecificItemType(ItemTypes itemTypeCode)
{
return
DataGetter
.GetTestData("MyTestData")
.Where(x => x.ItemTypeCode.Trim() == itemTypeCode.ToString())
.GroupBy(x => x.AccessionNumber)
.SelectMany(x => x.Take(1))
.Select(x => new [] { x.AccessionNumber, x.LoginId });
}

group by multiple columns (Dynamically) of datatable by linq Query

I want to group by multiple columns in a DataTable using a LINQ query.
I tried this:
var _result = from row in tbl.AsEnumerable()
group row by new
{
id=row.Field<object>(_strMapColumn),
value=row.Field<object>(_strValueColumn),
} into g
select new
{
_strMapColumn = g.Key.id,
ToolTip = g.Sum(r => r.Field<Double>(__strToolTip[1])),
};
It works fine. My question is: I have 10 column names in a strToolTip array, and I want to access those 10 column names dynamically, as in a for loop. Is that possible?
I want something like this:
select new
{_strMapColumn = g.Key.id,
for(int index = 1; index <= 10; index++)
{
ToolTip+index = g.Sum(r => getDoubleValue(r.Field<Double>(__strToolTip[1])))
}
};
I also want to add a data type dynamically. Please suggest how to solve this.
LINQ queries are new to me.
You could group by a Dictionary and pass a custom comparer:
public class MyComparer : IEqualityComparer<Dictionary<string, object>> {
public bool Equals(Dictionary<string, object> a, Dictionary<string, object> b) {
if (a == b) { return true; }
if (a == null || b == null || a.Count != b.Count) { return false; }
return !a.Except(b).Any();
}
}
IEnumerable<string> columnsToGroupBy = ...
var rows = tbl.AsEnumerable();
var grouped = rows.GroupBy(r => columnsToGroupBy.ToDictionary(c => c, c => r[c]), new MyComparer());
var result = grouped.Select(g => {
// whatever logic you want with each grouping
var id = g.Key["id"];
var sum = g.Sum(r => r.Field<int>("someCol"));
});
Thanks to ChaseMedallion, I got dynamic grouping working.
Equals method was not enough, I had to add GetHashCode to MyComparer as well:
public int GetHashCode(Dictionary<string, object> a)
{
return a.ToString().ToLower().GetHashCode();
}
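One caveat about the GetHashCode above: Dictionary<TKey, TValue> does not override ToString(), so the call returns the type name and every key gets the same hash code. Grouping still works, because Equals makes the final decision, but all keys land in one bucket. The following is a rough sketch of a content-based alternative; the function name ContentHash is made up, and System.HashCode is assumed to be available (.NET Core 2.1 or later).

```csharp
using System;
using System.Collections.Generic;

var first = new Dictionary<string, object> { ["id"] = 1, ["value"] = "a" };
var second = new Dictionary<string, object> { ["id"] = 2, ["value"] = "b" };

// Both calls return the same type-name string, regardless of the contents.
bool sameText = first.ToString() == second.ToString();

// Hypothetical replacement for the body of GetHashCode: XOR the per-entry hashes
// so the result does not depend on enumeration order, which keeps it consistent
// with the order-insensitive Equals in MyComparer.
static int ContentHash(Dictionary<string, object> d)
{
    int hash = 0;
    foreach (var pair in d)
    {
        int keyHash = pair.Key.GetHashCode();
        int valueHash = pair.Value?.GetHashCode() ?? 0;
        hash ^= HashCode.Combine(keyHash, valueHash);
    }
    return hash;
}

// Same entries as "first", inserted in a different order.
var reordered = new Dictionary<string, object> { ["value"] = "a", ["id"] = 1 };

Console.WriteLine(sameText);
Console.WriteLine(ContentHash(first) == ContentHash(reordered));
```

Dictionaries with equal entries always produce equal hashes here, which is the only hard requirement; different contents will usually, though not always, hash differently.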

Compare lists with Linq, select records where ID appears in just one list

I am new to Linq and am trying to filter records from two lists based on a field.
Each list has an ID, I want to take any record where the ID appears in one list but not the other.
I was able to do this with just a list of the IDs as follows:
List1 = _class1.getList1();
List2 = _class2.getList2();
(for introduction purposes I am using a class I would like to get rid of that has a list of the data and also a list of just the ID's, I should be able to do this with just the list of data though in two statements comparing list1 to list2 and vice versa)
var inList1ButNot2 = List1.IDList.Except(List2.IDList);
var inList2ButNot1 = List2.IDList.Except(List1.IDList);
Where I'm running into trouble is using the data list getting the comparison of the second list's ID field. I believe it should be something like:
var inList1ButNot2 = DataList1.Select(x => x.ID)
.Except(DataList2.Select(y => y.ID));
The problem with that is that I'm not getting the entire record just the field I am comparing, do I need to individually select each field afterwards or is there a way in the statement to select the record if ID appears in one list but not the other?
So what you really want here is an ExceptBy method; you want to be able to perform an Except on a projection of each element, rather than on the element itself. Here is an implementation of such a method:
public static IEnumerable<TSource> ExceptBy<TSource, TKey>(
this IEnumerable<TSource> source,
IEnumerable<TSource> other,
Func<TSource, TKey> selector,
IEqualityComparer<TKey> comparer = null)
{
comparer = comparer ?? EqualityComparer<TKey>.Default;
var set = new HashSet<TKey>(other.Select(selector), comparer);
foreach (var item in source)
if (set.Add(selector(item)))
yield return item;
}
Now you can do:
var inList1ButNot2 = DataList1.ExceptBy(DataList2, item => item.ID);
var inList2ButNot1 = DataList2.ExceptBy(DataList1, item => item.ID);
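On .NET 6 or later (an assumption about the target framework), a built-in Enumerable.ExceptBy exists, but its second argument is a sequence of keys rather than a sequence of records, so the call differs slightly from the custom method above. A small sketch with invented sample records:

```csharp
using System;
using System.Linq;

// Hypothetical records; only ID matters for the comparison.
var dataList1 = new[]
{
    new { ID = 1, Name = "a" },
    new { ID = 2, Name = "b" },
    new { ID = 3, Name = "c" },
};
var dataList2 = new[]
{
    new { ID = 1, Name = "a" },
    new { ID = 4, Name = "d" },
};

// The built-in overload takes the keys to exclude, so the other list is
// projected to its IDs before being passed in.
var inList1ButNot2 = dataList1.ExceptBy(dataList2.Select(y => y.ID), x => x.ID).ToList();
var inList2ButNot1 = dataList2.ExceptBy(dataList1.Select(x => x.ID), y => y.ID).ToList();

Console.WriteLine(string.Join(",", inList1ButNot2.Select(x => x.ID)));
Console.WriteLine(string.Join(",", inList2ButNot1.Select(x => x.ID)));
```

Both results contain whole records, which is what the question asks for; like the custom version, the built-in method also drops later elements whose key repeats within the first sequence.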
There may be a better way to do this, but:
var inList1ButNot2 = DataList1.Where(x => !(DataList2.Any(y => y.ID == x.ID)));
NB: I free-handed that, so there may be a typo.
var list1 = new List<Asd>();
var list2 = new List<Asd>();
var asd = new Asd() {Id = 1, Name = "asd"};
var asd2 = new Asd() {Id = 2, Name = "asd"};
var asd3 = new Asd() {Id = 3, Name = "asd"};
var asd4 = new Asd() {Id = 4, Name = "asd"};
var asd5 = new Asd() {Id = 5, Name = "asd"};
list1.Add(asd);
list1.Add(asd2);
list1.Add(asd3);
list2.Add(asd);
list2.Add(asd4);
list2.Add(asd5);
var onlyInFirstList = list1.Where(x => !list2.Any(y => y == x));
var onlyInSecondList = list2.Where(x => !list1.Any(y => y == x));
This should work, not perfect but working :)
You might try something like this.
public void Test(List<List1> list1, List<List2> list2)
{
var result = from l1 in list1
where list2.All(l2 => l1.Id != l2.Id)
select l1;
}
Or, if you have properties from both (I'm assuming they're different types?) you could return an anonymous type and also bring back properties from the callers instance
public void Test(List<List1> list1, List<List2> list2)
{
var result = from l1 in list1
where list2.All(l2 => l1.Id != l2.Id)
select new
{
l1.Id,
l1.OtherField1,
Test = 10.5, //Example declare new field
SomethingElse = this.PropertyXyz //Set new field = instance property
};
}
Rather than rewrite the logic in Except (and all the other set operations), but allowing for something that can be reused for other classes and selectors, consider something like the following:
private class LambdaComparer<T, U> : IEqualityComparer<T>
{
private Func<T, U> selector;
public LambdaComparer(Func<T, U> selector)
{
this.selector = selector;
}
public bool Equals(T x, T y)
{
if (x == null && y == null) return true;
if (x == null || y == null) return false;
return EqualityComparer<U>.Default.Equals(selector(x), selector(y));
}
public int GetHashCode(T obj)
{
if (obj == null) return 0;
return EqualityComparer<U>.Default.GetHashCode(selector(obj));
}
}
var inList1ButNot2 = List1.IDList.Except(
List2.IDList,
new LambdaComparer<ClassWithID, int>(w => w.ID));
you can try something like this
list1 = list1.Union(list2).Distinct().ToList();

how to deal with exception in LINQ Select statement

I have a LINQ query as follows
m_FOO = rawcollection.Select(p=> p.Split(' ')).Select(p =>
{
int thing = 0;
try
{
thing = CalculationThatCanFail(p[1]);
}
catch{}
return new { Test = p[0], FooThing = thing};
})
.GroupBy(p => p.Test)
.ToDictionary(p => p.Key, s => s.Select(q => q.FooThing).ToList());
So, the CalculationThatCanFail throws sometimes. I don't want to put null in and then filter that out with another Where statement later, and a junk value is equally unacceptable. Does anyone know how to handle this cleanly? Thanks.
EDIT: There's a good reason for the double Select statement. This example was edited for brevity
It's not clear to me from the question whether you mean you don't want to use null for FooThing, or you don't want to use null for the entire anonymously typed object. In any case, would this fit the bill?
m_FOO = rawcollection.Select(p=> p.Split(' ')).Select(p =>
{
int thing = 0;
try
{
thing = CalculationThatCanFail(p[1]);
return new { Test = p[0], FooThing = thing};
}
catch
{
return null;
}
})
.Where(p => p != null)
.GroupBy(p => p.Test)
.ToDictionary(p => p.Key, s => s.Select(q => q.FooThing).ToList());
For these situations I use a Maybe type (similar to this one) for calculations that may or may not return a value, instead of nulls or junk values. It would look like this:
Maybe<int> CalculationThatMayHaveAValue(string x)
{
try
{
return CalculationThatCanFail(x);
}
catch
{
return Maybe<int>.None;
}
}
//...
var xs = ps.Select(p =>
{
Maybe<int> thing = CalculationThatMayHaveAValue(p[1]);
return new { Test = p[0], FooThing = thing};
})
.Where(x => x.FooThing.HasValue);
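Since the Maybe type itself is not shown, here is a rough runnable sketch of the same idea that swaps in the built-in int? (Nullable<int>) for Maybe<int>, with null playing the role of None. The int.Parse body is only a stand-in for CalculationThatCanFail, and TryCalculate is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real calculation: parsing throws on bad input.
static int CalculationThatCanFail(string s) => int.Parse(s);

// Wraps the throwing call; a null result means "no value".
static int? TryCalculate(string s)
{
    try { return CalculationThatCanFail(s); }
    catch (FormatException) { return null; }
}

var rawcollection = new[] { "a 1", "a x", "b 2", "a 3" };

Dictionary<string, List<int>> result = rawcollection
    .Select(line => line.Split(' '))
    .Select(p => new { Test = p[0], FooThing = TryCalculate(p[1]) })
    .Where(x => x.FooThing.HasValue)   // drops the rows whose calculation failed
    .GroupBy(x => x.Test)
    .ToDictionary(g => g.Key, g => g.Select(x => x.FooThing.Value).ToList());

Console.WriteLine(string.Join(",", result["a"]));
```

Catching only the expected exception type, rather than using a bare catch, keeps unrelated failures visible; which exceptions the real calculation throws is not known from the question.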
