C# GroupBy - Creating multiple grouping levels


Given the following class:
public class Transaction
{
    public string Category { get; set; }
    public string Form { get; set; }
}
How do I get a grouping of transactions that are grouped by both the Category and the Form?
Basically, I want the output to look like this:
Category 1
    Form 1
        Transaction1
        Transaction2
        Transaction3
        ...
    Form 2
        Transaction1
        Transaction2
        Transaction3
        ...
Category 2
    Form 1
        Transaction1
        Transaction2
        Transaction3
        ...
    Form 2
        Transaction1
        Transaction2
        Transaction3
        ...

Here is an example using nested foreach loops. I'm not sure how you would do this in a single chain of LINQ statements, maybe with lots of SelectManys?
var transactions = new[] {
    new { Category = "1", Form = "1", Title = "Trans1" },
    new { Category = "1", Form = "1", Title = "Trans2" },
    new { Category = "1", Form = "1", Title = "Trans3" },
    new { Category = "1", Form = "2", Title = "Trans1" },
    new { Category = "1", Form = "2", Title = "Trans2" },
    new { Category = "1", Form = "2", Title = "Trans3" },
    new { Category = "2", Form = "1", Title = "Trans1" },
    new { Category = "2", Form = "1", Title = "Trans2" },
    new { Category = "2", Form = "1", Title = "Trans3" },
    new { Category = "1", Form = "3", Title = "Trans1" },
    new { Category = "1", Form = "3", Title = "Trans2" },
    new { Category = "1", Form = "3", Title = "Trans3" },
};
foreach (var byCategory in transactions.GroupBy(x => x.Category))
{
    Console.WriteLine(byCategory.Key);
    foreach (var byForm in byCategory.GroupBy(x => x.Form))
    {
        Console.WriteLine("\t" + byForm.Key);
        foreach (var trans in byForm)
        {
            Console.WriteLine("\t\t" + trans.Title);
        }
    }
}
Just because I was curious what it would look like, I came up with the following. YOU SHOULD NOT USE THIS IN PRODUCTION CODE, as it is ridiculous (if you do have a data structure like this, it should be broken up into something like Dictionary<CategoryName, FormGroup> or something else with meaningful types):
Dictionary<string, Dictionary<string, List<string>>> tooManyDictionaries = transactions
    .GroupBy(x => x.Category)
    .ToDictionary(
        catGroup => catGroup.Key,
        catGroup => catGroup
            .GroupBy(x => x.Form)
            .ToDictionary(
                formGroup => formGroup.Key,
                formGroup => formGroup.Select(x => x.Title).ToList()));
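A self-contained sketch of the nested-dictionary shape, written as C# top-level statements, with value tuples standing in for the anonymous objects (the sample data is hypothetical). It also walks the result to print the same indented outline as the nested foreach version:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: value tuples stand in for the transactions.
var transactions = new[]
{
    (Category: "1", Form: "1", Title: "Trans1"),
    (Category: "1", Form: "1", Title: "Trans2"),
    (Category: "1", Form: "2", Title: "Trans1"),
    (Category: "2", Form: "1", Title: "Trans1"),
    (Category: "1", Form: "3", Title: "Trans1"),
};

// Category -> Form -> titles, built with two nested GroupBy/ToDictionary calls.
Dictionary<string, Dictionary<string, List<string>>> byCategory = transactions
    .GroupBy(t => t.Category)
    .ToDictionary(
        catGroup => catGroup.Key,
        catGroup => catGroup
            .GroupBy(t => t.Form)
            .ToDictionary(
                formGroup => formGroup.Key,
                formGroup => formGroup.Select(t => t.Title).ToList()));

// Print the same indented outline as the nested foreach loops.
foreach (var (category, forms) in byCategory)
{
    Console.WriteLine(category);
    foreach (var (form, titles) in forms)
    {
        Console.WriteLine("\t" + form);
        foreach (var title in titles)
            Console.WriteLine("\t\t" + title);
    }
}
```

Deconstructing each KeyValuePair in the foreach keeps the printing loop as readable as the original nested-GroupBy version.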

I ended up with the following, because the grouping needs to be complete before iterating over the collection.
Seed Some Transactions
var cats = new[] { "Category 1", "Category 2", "Category 3" };
var frms = new[] { "Form 1", "Form 2", "Form 3" };
var transactions = new List<Transaction>();
for (var i = 0; i <= 150; i++)
{
    transactions.Add(new Transaction
    {
        Category = i % 2 == 0 ? cats[0] : i % 3 == 0 ? cats[1] : cats[2],
        Form = i % 5 == 0 ? frms[0] : i % 7 == 0 ? frms[1] : frms[2]
    });
}
The Grouping
var groupedTransactions = transactions.GroupBy(x => x.Category)
    .Select(x => new
    {
        Category = x.Key,
        Forms = x.ToList()
            .GroupBy(y => y.Form)
    });
Write it to the Console
foreach (var group in groupedTransactions.OrderBy(x => x.Category))
{
    Console.WriteLine(group.Category);
    foreach (var form in group.Forms.OrderBy(x => x.Key))
    {
        Console.WriteLine("\t" + form.Key);
        foreach (var transaction in form)
        {
            // assumes Transaction also has an Id property (not shown in the class above)
            Console.WriteLine("\t\t" + transaction.Id);
        }
    }
}
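A runnable sketch of the seed-then-group approach as C# top-level statements. It assumes value tuples with a hypothetical Id element in place of the Transaction class, since the question's class has no Id. Both grouping levels are sorted and materialized with ToList, so the grouping query runs once, before the printing loop:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var cats = new[] { "Category 1", "Category 2", "Category 3" };
var frms = new[] { "Form 1", "Form 2", "Form 3" };

// Same seeding rule as above; the Id element is a hypothetical addition.
var transactions = new List<(int Id, string Category, string Form)>();
for (var i = 0; i <= 150; i++)
{
    transactions.Add((
        i,
        i % 2 == 0 ? cats[0] : i % 3 == 0 ? cats[1] : cats[2],
        i % 5 == 0 ? frms[0] : i % 7 == 0 ? frms[1] : frms[2]));
}

// Sort and materialize both levels so the grouping runs once, up front.
var grouped = transactions
    .GroupBy(t => t.Category)
    .OrderBy(g => g.Key)
    .Select(g => new
    {
        Category = g.Key,
        Forms = g.GroupBy(t => t.Form).OrderBy(f => f.Key).ToList()
    })
    .ToList();

foreach (var group in grouped)
{
    Console.WriteLine(group.Category);
    foreach (var form in group.Forms)
    {
        Console.WriteLine("\t" + form.Key);
        foreach (var t in form)
            Console.WriteLine("\t\t" + t.Id);
    }
}
```

With this seeding rule, the 76 even values of i go to Category 1, the 25 odd multiples of 3 go to Category 2, and the remaining 50 go to Category 3.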

Basically, the first grouping key should contain every piece of information that you will need to group by later; then, on each subsequent grouping, remove one piece of information that is relevant to that grouping level. Do this as many times as you need.
ValueTuple can help a lot, since it lets you have a composite key that can be passed to another type. Otherwise, with anonymous types, you'll need to rely on type inference to pass the groups to something else. One issue with tuples, though, is that C# tuple syntax has no single-element form, so in that case you need to group by the single property directly rather than use a tuple.
If you already have a hierarchical relationship in your data structure, then grouping by a Tuple might be unnecessary.
var groups =
    transactions
        .GroupBy(tran => (
            Category: tran.Category,
            Form: tran.Form
        ))
        // outer level: Category, matching the desired Category > Form output
        .GroupBy(group => group.Key.Category)
        .ToList();
The type gets complicated very fast, so use type inference and refactoring tools to avoid having to spell out the specific type when possible. For example, the query above alone results in the type:
List<IGrouping<string, IGrouping<(string Category, string Form), Transaction>>>
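A self-contained sketch of the composite-key approach as C# top-level statements, with value tuples standing in for Transaction (the sample data is hypothetical). Tuple element names are inferred from the projected members, so g.Key.Category compiles without spelling the names out:

```csharp
using System;
using System.Linq;

// Hypothetical sample data shaped like Transaction, plus a Title for printing.
var transactions = new[]
{
    (Category: "Category 1", Form: "Form 1", Title: "T1"),
    (Category: "Category 1", Form: "Form 2", Title: "T2"),
    (Category: "Category 1", Form: "Form 1", Title: "T3"),
    (Category: "Category 2", Form: "Form 1", Title: "T4"),
};

// Level 1: group by every field needed later (a composite ValueTuple key).
// Level 2: regroup those groups by one part of the key (Category);
// each inner group keeps its full (Category, Form) key.
var groups = transactions
    .GroupBy(t => (t.Category, t.Form))
    .GroupBy(g => g.Key.Category)
    .ToList();

foreach (var categoryGroup in groups)
{
    Console.WriteLine(categoryGroup.Key);
    foreach (var formGroup in categoryGroup)
    {
        Console.WriteLine("\t" + formGroup.Key.Form);
        foreach (var t in formGroup)
            Console.WriteLine("\t\t" + t.Title);
    }
}
```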

Related

Left Outer join Extension method linq

I would like to translate these queries into extension-method syntax,
and also merge them into a single query using extension methods.
var variants = _ctx.Varianti.Where(i => i.attivo == 0);
var allProducts = await (from p in _ctx.Articoli
                         where p.cat == 1
                         join v in variants on p.code equals v.code into gj
                         from articoli in gj.DefaultIfEmpty()
                         select new {
                             Codart = p.Codart,
                             Codvar = articoli.Codvar,
                         }).ToListAsync();
My classes
class Articolo {
    public string Codart { get; set; } // key
    public double price { get; set; }
}
class Variante {
    public string Codart { get; set; } // key
    public string Codvar { get; set; } // key
    public int attivo { get; set; }
}
I need to return products like this:
Prod1-Variant1
Prod2-(no variant)
prod3-Variant1
prod4-Variant1
prod4-Variant2
prod5-(no variant)
I should filter only variants with attivo == 0,
and include every product that has no variant.
The code works well, but I need to optimize it into a single query to the database,
and also write it with extension methods.
In T-SQL it would be:
SELECT dbo.Articoli.Codart,
       Codvar
FROM dbo.Articoli
LEFT OUTER JOIN dbo.Varianti
    ON dbo.Articoli.Codart = dbo.Varianti.Codart
    AND dbo.Varianti.attivo = 0 -- in WHERE, this condition would drop products that have no variant
WHERE (Cat = 1)

I'm still not sure what the problem is. Here is an example of how to "left-outer-join" the products and variants and select a new object:
// Articolo and Variante are assumed here to expose a Code property (Codart in the question).
List<Articolo> products = new List<Articolo>()
{
    new Articolo() { Code = "1", price = 1 },
    new Articolo() { Code = "2", price = 1 },
    new Articolo() { Code = "3", price = 1 },
    new Articolo() { Code = "4", price = 1 },
    new Articolo() { Code = "5", price = 1 },
};
List<Variante> variants = new List<Variante>()
{
    new Variante() { Code = "1", attivo = 0, Codvar = "v1" },
    new Variante() { Code = "3", attivo = 0, Codvar = "v1" },
    new Variante() { Code = "4", attivo = 0, Codvar = "v1" },
    new Variante() { Code = "4", attivo = 0, Codvar = "v2" },
    new Variante() { Code = "5", attivo = 1, Codvar = "v2" },
};
var result = products // Our "left" list
    .GroupJoin( // Join to a "one"-to-"null or many" list
        variants.Where(v => v.attivo == 0), // Our "right" list
        p => p.Code, // select key in left list
        v => v.Code, // select key in right list
        (p, v) => // for every product "p" we have a list of variants "v"
            v.Any() ? // do we have variants for our product?
                v.Select(s => new // for every variant we build our new product
                {
                    Code = p.Code,
                    FirstVariant = s.Codvar,
                })
            : // if we got no variants, we build a "no variant" product
                new[] { new {
                    Code = p.Code,
                    FirstVariant = "No Variant"
                } } // here we got a list of product-variants per product ("list of lists")
    ).SelectMany(s => s); // We want one list of all product variants
foreach (var item in result)
{
    Console.WriteLine("Code: {0}, FirstVar: {1}", item.Code, item.FirstVariant);
}
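The question asks for the method-syntax form of join ... into ... DefaultIfEmpty(). The usual translation is GroupJoin followed by SelectMany over DefaultIfEmpty(). The sketch below runs it in memory as C# top-level statements on hypothetical tuples shaped like the example data. Codvar is projected before DefaultIfEmpty() because, for a value tuple, the default element would be an empty tuple rather than null. EF Core documents this GroupJoin/SelectMany/DefaultIfEmpty shape as translatable to a LEFT JOIN, but it is worth checking the generated SQL for your provider:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data mirroring the example above (tuples instead of classes).
var products = new[] { "1", "2", "3", "4", "5" };
var variants = new[]
{
    (Code: "1", Attivo: 0, Codvar: "v1"),
    (Code: "3", Attivo: 0, Codvar: "v1"),
    (Code: "4", Attivo: 0, Codvar: "v1"),
    (Code: "4", Attivo: 0, Codvar: "v2"),
    (Code: "5", Attivo: 1, Codvar: "v2"),
};

// Left outer join in method syntax:
// GroupJoin pairs each product with its (possibly empty) list of active variants;
// SelectMany + DefaultIfEmpty flattens it, yielding one row with a null
// variant when a product has none.
var result = products
    .GroupJoin(
        variants.Where(v => v.Attivo == 0),
        p => p,
        v => v.Code,
        (p, vs) => new { Code = p, Variants = vs })
    .SelectMany(
        x => x.Variants.Select(v => v.Codvar).DefaultIfEmpty(),
        (x, codvar) => new { x.Code, Codvar = codvar })
    .ToList();

foreach (var item in result)
    Console.WriteLine($"{item.Code}-{item.Codvar ?? "(no variant)"}");
```

GroupJoin preserves the order of the outer sequence, so rows come out grouped by product, matching the Prod1-Variant1 / Prod2-(no variant) layout the question asks for.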

Get the element within the array whose occurrence is 4 times

I have an array of strings, and I want to find the elements that occur 4 or more times within the array.
My code:
internal static string ProcessArray(string[] numArray)
{
    string response = string.Empty;
    var duplicates = numArray
        .GroupBy(e => e)
        .Where(e => e.Count() >= 4)
        .Select(e => e.First())
        .ToList();
    //do some further business logic
    return response;
}
So duplicates should give me a list of the strings that occur that often.
I am calling this from the method below:
public static string GetDuplicates()
{
    // note: " 1" (with a leading space) is a different string from "1"
    string[] s = new string[] { " 1", "1", "2", "2", "2", "1", "3", "2", "1" };
    string result = ProcessArray(s);
    return result;
}
It only returns 2 in the list; the correct result should be 1 and 2.

var values = new string[] { "1", "1", "2", "2", "2", "1", "3", "2", "1" };
var groups = values.GroupBy(i => i).Select(i => new { Number = i.Key, Count = i.Count() });
foreach (var item in groups)
{
    if (item.Count >= 4) // "4 or more", as the question asks
    {
        Console.WriteLine(item.Number);
    }
}
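The asker's array contains " 1" with a leading space, which GroupBy treats as a different key from "1"; that is why only "2" reached four occurrences. A sketch (C# top-level statements) that returns the qualifying keys as a list, assuming, hypothetically, that surrounding whitespace should be ignored, so keys are trimmed before grouping:

```csharp
using System;
using System.Linq;

var values = new[] { " 1", "1", "2", "2", "2", "1", "3", "2", "1" };

// Assumption: values differing only by surrounding whitespace are the same value,
// so each string is trimmed before it is used as a grouping key.
var frequent = values
    .GroupBy(v => v.Trim())
    .Where(g => g.Count() >= 4) // "4 or more" occurrences
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(",", frequent));
```

Selecting g.Key instead of g.First() returns the normalized (trimmed) value rather than whichever spelling happened to come first.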

Linq: Sum() non-integer values

This is a continuation of my previous question:
Linq (GroupBy and Sum) over List<List<string>>
I have a query like so:
var content = new List<List<string>>
{
    new List<string>{ "book", "code", "columnToSum" },
    new List<string>{ "abc", "1", "10" },
    new List<string>{ "abc", "1", "5" },
    new List<string>{ "cde", "1", "6" },
};
var headers = content.First();
var result = content.Skip(1)
    .GroupBy(s => new { Code = s[headers.IndexOf("code")], Book = s[headers.IndexOf("book")] })
    .Select(g => new
    {
        Book = g.Key.Book,
        Code = g.Key.Code,
        Total = g.Select(s => int.Parse(s[headers.IndexOf("columnToSum")])).Sum()
    });
This works fine, but how can I handle the case where columnToSum is empty? For example, this gives me the error "Input string was not in a correct format", because int.Parse fails:
var content = new List<List<string>>
{
    new List<string>{ "book", "code", "columnToSum" },
    new List<string>{ "abc", "1", "10" },
    new List<string>{ "abc", "1", "" },
    new List<string>{ "cde", "1", "6" },
};
How can I handle this scenario gracefully?

Why don't you just add a zero onto the front of the string?
s => int.Parse("0" + s[headers.IndexOf("columnToSum")])
Of course, it's a big hack. But it will solve your problem quickly and (quite) readably if the only exceptional case you're really worried about is the empty string.
I wonder where you're getting these empty strings from. If it's something you have control over like a SQL query, why don't you just change your query to give "0" for no value? (As long as the empty column isn't used in a different sense somewhere else in your code.)

One option is to use string.All(Char.IsDigit) as a pre-check:
Total = g.Select(s => !string.IsNullOrEmpty(s[headers.IndexOf("columnToSum")]) &&
                      s[headers.IndexOf("columnToSum")].All(Char.IsDigit)
                          ? int.Parse(s[headers.IndexOf("columnToSum")])
                          : 0).Sum()
Another would be to use int.TryParse:
// the parsed value comes back through the out variable, so no second Parse call is needed
Total = g.Select(s => int.TryParse(s[headers.IndexOf("columnToSum")], out var val) ? val : 0).Sum()
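A complete runnable version of the TryParse approach (C# top-level statements), using the empty-cell sample from the question. Two small changes from the snippets above are design choices, not requirements: the column indexes are looked up once instead of on every row, and TryParse uses an inline out variable, so no shared outer variable is captured by the lambda:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var content = new List<List<string>>
{
    new List<string> { "book", "code", "columnToSum" },
    new List<string> { "abc", "1", "10" },
    new List<string> { "abc", "1", "" },
    new List<string> { "cde", "1", "6" },
};

var headers = content.First();
// Look the column indexes up once.
int bookIdx = headers.IndexOf("book");
int codeIdx = headers.IndexOf("code");
int sumIdx = headers.IndexOf("columnToSum");

var result = content.Skip(1)
    .GroupBy(row => new { Code = row[codeIdx], Book = row[bookIdx] })
    .Select(g => new
    {
        g.Key.Book,
        g.Key.Code,
        // Cells that don't parse as int ("" included) contribute 0.
        Total = g.Sum(row => int.TryParse(row[sumIdx], out var n) ? n : 0)
    })
    .ToList();

foreach (var r in result)
    Console.WriteLine($"{r.Book} {r.Code} {r.Total}");
```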

This code skips rows whose columnToSum is empty, which has the same effect as treating an empty string as 0:
Total = g.Where(s => !String.IsNullOrEmpty(s[headers.IndexOf("columnToSum")])).Select(s => int.Parse(s[headers.IndexOf("columnToSum")])).Sum()

Unfortunately, this isn't going to look very nice...
g.Select(s => !s[headers.IndexOf("columnToSum")].Any(Char.IsDigit) ?
    0 : Int32.Parse(s[headers.IndexOf("columnToSum")])).Sum()
However, you could wrap this up in a nice extension method:
public static class StrExt
{
    public static int IntOrDefault(this string str, int defaultValue = 0)
    {
        return String.IsNullOrEmpty(str) || !str.Any(Char.IsDigit) ? defaultValue : Int32.Parse(str);
    }
}
...
g.Select(s => s[headers.IndexOf("columnToSum")].IntOrDefault()).Sum();
The extension method gives you the flexibility to set whatever default value you want if str is empty or contains no digits; it defaults to 0 if the parameter is omitted.
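The Any(Char.IsDigit) check only guards against strings with no digits at all: an input such as "12a" passes the check, and Int32.Parse then throws a FormatException. A hedged alternative sketch lets int.TryParse decide instead. It is written as a hypothetical local function rather than the extension method above, so the snippet runs as C# top-level statements:

```csharp
using System;
using System.Linq;

// Hypothetical local-function variant of IntOrDefault that relies on int.TryParse
// instead of a character pre-check, so "12a", "-" or null also fall back to the default.
static int IntOrDefault(string str, int defaultValue = 0) =>
    int.TryParse(str, out var value) ? value : defaultValue;

var cells = new[] { "10", "", null, "12a", "-5" };
var total = cells.Sum(c => IntOrDefault(c));
Console.WriteLine(total);
```

Turning it back into an extension method only requires moving it into a static class and adding `this` to the first parameter.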

Using lists here is problematic, and I would parse this into a proper data structure (like a Book class), which I think will clean up the code a bit. If you're parsing CSV files, take a look at FileHelpers; it's a great library for these types of tasks, and it can parse into a data structure for you.
That being said, if you'd still like to continue using this paradigm, I think you can get the code fairly clean by creating two custom methods: one for dealing with the headers (one of the few places I'd use dynamic types to get rid of ugly strings in your code) and one for parsing the ints. You then get something like this:
var headers = GetHeaders(content.First());
var result = from entry in content.Skip(1)
             group entry by new { Code = entry[headers.code], Book = entry[headers.book] } into grp
             select new {
                 Book = grp.Key.Book,
                 Code = grp.Key.Code,
                 Total = grp.Sum(x => ParseInt(x[headers.columnToSum]))
             };
public dynamic GetHeaders(List<string> headersList) {
    IDictionary<string, object> headers = new ExpandoObject();
    for (int i = 0; i < headersList.Count; i++)
        headers[headersList[i]] = i;
    return headers;
}
public int ParseInt(string s) {
    int i;
    if (int.TryParse(s, out i))
        return i;
    return 0;
}

You can use multiple statements in a lambda expression and return a value at the end.
So, instead of
Total = g.Select(s => int.Parse(s[headers.IndexOf("columnToSum")])).Sum()
I would write
Total = g.Select(s => {
    int tempInt = 0;
    int.TryParse(s[headers.IndexOf("columnToSum")], out tempInt);
    return tempInt;
}).Sum()

var content = new List<List<string>>
{
    new List<string>{ "book", "code", "columnToSum" },
    new List<string>{ "abc", "1", "10" },
    new List<string>{ "abc", "1", "5" },
    new List<string>{ "cde", "1", "6" },
};
var headers = content.First();
var result = content.Skip(1)
    .GroupBy(s => new { Code = s[headers.IndexOf("code")], Book = s[headers.IndexOf("book")] })
    .Select(g => new
    {
        Book = g.Key.Book,
        Code = g.Key.Code,
        // substitute "0" for an empty cell before parsing (both branches must be strings)
        Total = g.Select(s => int.Parse(s[headers.IndexOf("columnToSum")] != "" ? s[headers.IndexOf("columnToSum")] : "0")).Sum()
    });

Linq query to group by field1, count field2 and filter by count between values of joined collection

I'm having trouble getting my LINQ query correct. I've been resisting doing this with foreach loops because I'm trying to better understand LINQ.
I have following data in LinqPad.
void Main()
{
    var events = new[] {
        new {ID = 1, EventLevel = 1, PatientID = "1", CodeID = "2", Occurences = 0 },
        new {ID = 2, EventLevel = 2, PatientID = "1", CodeID = "2", Occurences = 0 },
        new {ID = 3, EventLevel = 1, PatientID = "2", CodeID = "1", Occurences = 0 },
        new {ID = 4, EventLevel = 3, PatientID = "2", CodeID = "2", Occurences = 0 },
        new {ID = 5, EventLevel = 1, PatientID = "3", CodeID = "3", Occurences = 0 },
        new {ID = 6, EventLevel = 3, PatientID = "1", CodeID = "4", Occurences = 0 }
    };
    var filter = new FilterCriterion();
    var searches = new List<FilterCriterion.Occurence>();
    searches.Add(new FilterCriterion.Occurence() { CodeID = "1", MinOccurences = 2, MaxOccurences = 3 });
    searches.Add(new FilterCriterion.Occurence() { CodeID = "2", MinOccurences = 2, MaxOccurences = 3 });
    filter.Searches = searches;
    var summary = from e in events
                  let de = new
                  {
                      PatientID = e.PatientID,
                      CodeID = e.CodeID
                  }
                  group e by de into t
                  select new
                  {
                      PatientID = t.Key.PatientID,
                      CodeID = t.Key.CodeID,
                      Occurences = t.Count(d => t.Key.CodeID == d.CodeID)
                  };
    var allCodes = filter.Searches.Select(i => i.CodeID);
    summary = summary.Where(e => allCodes.Contains(e.CodeID));
    // How do I find the original ID property from the "events" collection and how do I
    // eliminate the instances where the Occurences is not between MinOccurences and MaxOccurences.
    foreach (var item in summary)
        Console.WriteLine(item);
}
public class FilterCriterion
{
    public IEnumerable<Occurence> Searches { get; set; }
    public class Occurence
    {
        public string CodeID { get; set; }
        public int? MinOccurences { get; set; }
        public int? MaxOccurences { get; set; }
    }
}
The problem I have is that I need to filter the results by the MinOccurences and MaxOccurences filter properties, and in the end I want the "events" objects whose IDs are 1, 2, 3 and 4.
Thanks in advance if you can provide help.

To access event.ID at the end of processing you need to pass it with your first query. Alter select to this:
// ...
group e by de into t
select new
{
    PatientID = t.Key.PatientID,
    CodeID = t.Key.CodeID,
    Occurences = t.Count(d => t.Key.CodeID == d.CodeID),
    // taking original items with us
    Items = t
};
Having done that, your final query (including occurrences filter) might look like this:
var result = summary
    // get all necessary data, including filter that matched given item
    .Select(Item => new
    {
        Item,
        Filter = searches.FirstOrDefault(f => f.CodeID == Item.CodeID)
    })
    // get rid of those without matching filter
    .Where(i => i.Filter != null)
    // this is your occurrences filtering
    .Where(i => i.Item.Occurences >= i.Filter.MinOccurences
             && i.Item.Occurences <= i.Filter.MaxOccurences)
    // and finally extract original events IDs
    .SelectMany(i => i.Item.Items)
    .Select(i => i.ID);
This produces 1, 2 as result. 3 and 4 are left out as they don't get past occurrences filtering.
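A self-contained sketch of the same filter as C# top-level statements. It swaps in Join for the FirstOrDefault lookup and uses value tuples in place of the anonymous events and of FilterCriterion.Occurence. It also treats a null Min or Max as "no bound", which is an assumption: in the query above, comparing against a null int? simply evaluates to false:

```csharp
using System;
using System.Linq;

var events = new[]
{
    (ID: 1, PatientID: "1", CodeID: "2"),
    (ID: 2, PatientID: "1", CodeID: "2"),
    (ID: 3, PatientID: "2", CodeID: "1"),
    (ID: 4, PatientID: "2", CodeID: "2"),
    (ID: 5, PatientID: "3", CodeID: "3"),
    (ID: 6, PatientID: "1", CodeID: "4"),
};

// (CodeID, Min, Max); a null bound is treated as "no limit" (assumption).
var searches = new (string CodeID, int? Min, int? Max)[]
{
    ("1", 2, 3),
    ("2", 2, 3),
};

var ids = events
    .GroupBy(e => (e.PatientID, e.CodeID))
    // Pair each (patient, code) group with the criterion for its code;
    // groups whose code has no criterion drop out of the inner join.
    .Join(searches, g => g.Key.CodeID, s => s.CodeID, (g, s) => (Group: g, Search: s))
    .Where(x => (x.Search.Min is null || x.Group.Count() >= x.Search.Min)
             && (x.Search.Max is null || x.Group.Count() <= x.Search.Max))
    // Carry the original event IDs through to the end.
    .SelectMany(x => x.Group.Select(e => e.ID))
    .ToList();

Console.WriteLine(string.Join(", ", ids));
```

On this data only patient 1 / code 2 occurs twice, so the result is the IDs 1 and 2, consistent with the answer above.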

I have run your program in linqpad.
My understanding is that you want to filter using filter.MinOccurences and filter.MaxOccurences on the Occurences count of the result data set.
You can add additional filters using a Where clause.
// In the question's classes these bounds live on each FilterCriterion.Occurence in
// filter.Searches, not on FilterCriterion itself, so adapt "filter" accordingly.
if (filter.MinOccurences.HasValue)
    summary = summary.Where(x => x.Occurences >= filter.MinOccurences);
if (filter.MaxOccurences.HasValue)
    summary = summary.Where(x => x.Occurences <= filter.MaxOccurences);

LINQ-to-objects index within a group + for different groupings (aka ROW_NUMBER with PARTITION BY equivalent)

After much Google searching and code experimentation, I'm stumped on a complex C# LINQ-to-objects problem which in SQL would be easy to solve with a pair of ROW_NUMBER()...PARTITION BY functions and a subquery or two.
Here's, in words, what I'm trying to do in code (the underlying requirement is removing duplicate documents from a list):
1. First, group a list by (Document.Title, Document.SourceId), assuming a (simplified) class definition like this:
    class Document
    {
        string Title;
        int SourceId; // sources are prioritized (ID=1 better than ID=2)
    }
2. Within that group, assign each document an index (e.g. Index 0 = 1st document with this title from this source, Index 1 = 2nd document with this title from this source, etc.). I'd love the equivalent of ROW_NUMBER() in SQL!
3. Now group by (Document.Title, Index), where Index was computed in Step #2. For each group, return only one document: the one with the lowest Document.SourceId.
Step #1 is easy (e.g. codepronet.blogspot.com/2009/01/group-by-in-linq.html), but I'm getting stumped on steps #2 and #3. I can't seem to build a red-squiggle-free C# LINQ query to solve all three steps.
Anders Hejlsberg's post on this thread is, I think, the answer to Steps #2 and #3 above, if I could get the syntax right.
I'd prefer to avoid using an external local variable to do the Index computation, as recommended on slodge.blogspot.com/2009/01/adding-row-number-using-linq-to-objects.html, since that solution breaks if the external variable is modified.
Optimally, the group-by-Title step could be done first, so the "inner" groupings (first by Source to compute the index, then by Index to filter out duplicates) can operate on small numbers of objects in each "by title" group, since the # of documents in each by-title group is usually under 100. I really don't want an N² solution!
I could certainly solve this with nested foreach loops, but it seems like the kind of problem which should be simple with LINQ.
Any ideas?

I think jpbochi missed that you want your groupings to be by pairs of values (Title+SourceId then Title+Index). Here's a LINQ query (mostly) solution:
var selectedFew =
    from doc in docs
    group doc by new { doc.Title, doc.SourceId } into g
    from docIndex in g.Select((d, i) => new { Doc = d, Index = i })
    group docIndex by new { docIndex.Doc.Title, docIndex.Index } into g
    select g.Aggregate((a, b) => (a.Doc.SourceId <= b.Doc.SourceId) ? a : b);
First we group by Title+SourceId (I use an anonymous type because the compiler builds a good hashcode for the grouping lookup). Then we use Select to attach the grouped index to the document, which we use in our second grouping. Finally, for each group we pick the lowest SourceId.
Given this input:
var docs = new[] {
    new { Title = "ABC", SourceId = 0 },
    new { Title = "ABC", SourceId = 4 },
    new { Title = "ABC", SourceId = 2 },
    new { Title = "123", SourceId = 7 },
    new { Title = "123", SourceId = 7 },
    new { Title = "123", SourceId = 7 },
    new { Title = "123", SourceId = 5 },
    new { Title = "123", SourceId = 5 },
};
I get this output:
{ Doc = { Title = ABC, SourceId = 0 }, Index = 0 }
{ Doc = { Title = 123, SourceId = 5 }, Index = 0 }
{ Doc = { Title = 123, SourceId = 5 }, Index = 1 }
{ Doc = { Title = 123, SourceId = 7 }, Index = 2 }
Update: I just saw your question about grouping by Title first. You can do this using a subquery on your Title groups:
var selectedFew =
    from doc in docs
    group doc by doc.Title into titleGroup
    from docWithIndex in
    (
        from doc in titleGroup
        group doc by doc.SourceId into idGroup
        from docIndex in idGroup.Select((d, i) => new { Doc = d, Index = i })
        group docIndex by docIndex.Index into indexGroup
        select indexGroup.Aggregate((a, b) => (a.Doc.SourceId <= b.Doc.SourceId) ? a : b)
    )
    select docWithIndex;

To be honest, I'm quite confused by your question. Maybe you could explain what you're trying to solve. Anyway, I'll try to answer what I understood.
1) First, I'll assume that you already have a list of documents grouped by Title+SourceId. For testing purposes, I hardcoded a list as follows:
var docs = new [] {
    new { Title = "ABC", SourceId = 0 },
    new { Title = "ABC", SourceId = 4 },
    new { Title = "ABC", SourceId = 2 },
    new { Title = "123", SourceId = 7 },
    new { Title = "123", SourceId = 5 },
};
2) To put an index on every item, you can use the Select extension method, passing a Func<TSource, int, TResult> selector function, like this:
var docsWithIndex
    = docs
        .Select((d, i) => new { Doc = d, Index = i });
3) From what I understood, the next step would be to group the last result by Title. Here's how to do it:
var docsGroupedByTitle
    = docsWithIndex
        .GroupBy(a => a.Doc.Title);
The GroupBy function (used above) returns an IEnumerable<IGrouping<string,DocumentWithIndex>>. Since a group is enumerable too, we now have an enumerable of enumerables.
4) Now, for each of the groups above, we'll get only the item with the minimum SourceId. To do this we'll need 2 levels of nesting. In LINQ, the outer level is a selection (for each group, get one of its items), and the inner level is an aggregation (get the item with the lowest SourceId):
var selectedFew
    = docsGroupedByTitle
        .Select(
            g => g.Aggregate(
                (a, b) => (a.Doc.SourceId <= b.Doc.SourceId) ? a : b
            )
        );
Just to ensure that it works, I tested it with a simple foreach:
foreach (var a in selectedFew) Console.WriteLine(a);
//The result will be:
//{ Doc = { Title = ABC, SourceId = 0 }, Index = 0 }
//{ Doc = { Title = 123, SourceId = 5 }, Index = 4 }
I'm not sure that's what you wanted. If not, please comment on the answer and I can fix it. I hope this helps.
Obs.: All the classes used in my tests were anonymous. So, you don't really need to define a DocumentWithIndex type. Actually, I haven't even declared a Document class.

Method Based Syntax:
var selectedFew = docs.GroupBy(doc => new { doc.Title, doc.SourceId }, doc => doc)
    .SelectMany(grouping => grouping.Select((doc, index) => new { doc, index }))
    .GroupBy(anon => new { anon.doc.Title, anon.index })
    .Select(grouping => grouping.Aggregate((a, b) => a.doc.SourceId <= b.doc.SourceId ? a : b));
Would you say the above is the equivalent Method based syntax?
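To check a chain of this shape on concrete data, here is a self-contained sketch (C# top-level statements) that runs the same GroupBy / indexed Select / GroupBy / Aggregate steps on the sample docs from the first answer, with value tuples in place of anonymous types. On that input it yields the same four rows shown in that answer's output: ABC/0 at index 0, 123/5 at indexes 0 and 1, and 123/7 at index 2:

```csharp
using System;
using System.Linq;

var docs = new[]
{
    (Title: "ABC", SourceId: 0),
    (Title: "ABC", SourceId: 4),
    (Title: "ABC", SourceId: 2),
    (Title: "123", SourceId: 7),
    (Title: "123", SourceId: 7),
    (Title: "123", SourceId: 7),
    (Title: "123", SourceId: 5),
    (Title: "123", SourceId: 5),
};

var selectedFew = docs
    // Step 1: partition by (Title, SourceId).
    .GroupBy(d => (d.Title, d.SourceId))
    // Step 2: ROW_NUMBER() within each partition, via the indexed Select overload.
    .SelectMany(g => g.Select((doc, index) => (Doc: doc, Index: index)))
    // Step 3: regroup by (Title, Index) and keep the lowest SourceId in each group.
    .GroupBy(x => (x.Doc.Title, x.Index))
    .Select(g => g.Aggregate((a, b) => a.Doc.SourceId <= b.Doc.SourceId ? a : b))
    .ToList();

foreach (var x in selectedFew)
    Console.WriteLine($"{x.Doc.Title} {x.Doc.SourceId} {x.Index}");
```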

I implemented an extension method. It supports multiple partition by fields as well as multiple order conditions.
public static IEnumerable<TResult> Partition<TSource, TKey, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<IEnumerable<TSource>, IOrderedEnumerable<TSource>> sorter,
    Func<TSource, int, TResult> selector)
{
    AssertUtilities.ArgumentNotNull(source, "source");
    return source
        .GroupBy(keySelector)
        .Select(arg => sorter(arg).Select(selector))
        .SelectMany(arg => arg);
}
Usage:
var documents = new[]
{
    new { Title = "Title1", SourceId = 1 },
    new { Title = "Title1", SourceId = 2 },
    new { Title = "Title2", SourceId = 15 },
    new { Title = "Title2", SourceId = 14 },
    new { Title = "Title3", SourceId = 100 }
};
var result = documents
    .Partition(
        arg => arg.Title, // partition by
        arg => arg.OrderBy(x => x.SourceId), // order by
        (arg, rowNumber) => new { RowNumber = rowNumber, Document = arg }) // select
    .Where(arg => arg.RowNumber == 0)
    .Select(arg => arg.Document)
    .ToList();
Result:
{ Title = "Title1", SourceId = 1 },
{ Title = "Title2", SourceId = 14 },
{ Title = "Title3", SourceId = 100 }
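When only the first row per partition is needed (RowNumber == 0 above), the partition-and-number step can be collapsed. A sketch as C# top-level statements, assuming .NET 6 or later, where Enumerable.MinBy returns an element with the smallest key in each group; this is a swapped-in technique, not the Partition helper itself, and it gives up the row numbers in exchange for brevity:

```csharp
using System;
using System.Linq;

var documents = new[]
{
    (Title: "Title1", SourceId: 1),
    (Title: "Title1", SourceId: 2),
    (Title: "Title2", SourceId: 15),
    (Title: "Title2", SourceId: 14),
    (Title: "Title3", SourceId: 100),
};

// "ROW_NUMBER() = first per partition" becomes "smallest SourceId per Title group".
// Enumerable.MinBy requires .NET 6 or later.
var firstPerTitle = documents
    .GroupBy(d => d.Title)
    .Select(g => g.MinBy(d => d.SourceId))
    .ToList();

foreach (var d in firstPerTitle)
    Console.WriteLine($"{d.Title} {d.SourceId}");
```

If later row numbers are needed (for example, the second-best source per title), the Partition helper above, or the indexed Select overload, is still the better fit.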
