LINQ group by sum of property - c#

I'm trying to create a LINQ query which is a derivative of SelectMany.
I have N items:
new {
{ Text = "Hello", Width = 2 },
{ Text = "Something else", Width = 1 },
{ Text = "Another", Width = 1 },
{ Text = "Extra-wide", Width = 3 },
{ Text = "Random", Width = 1 }
}
I would like the result to be a List<List<object>>(), where:
List<List<object>> = new {
// first "row"
{
{ Text = "Hello", Width = 2 },
{ Text = "Something else", Width = 1 },
{ Text = "Another", Width = 1 }
},
// second "row"
{
{ Text = "Extra-wide", Width = 3 },
{ Text = "Random", Width = 1 }
}
}
So the items are grouped into "rows" where Sum(width) in the internal List is less than or equal to a number (maxWidth - in my instance, 4).
It's kinda a derivative of GroupBy, but the GroupBy is dependent on earlier values in the array - which is where I get stumped.
Any ideas would be appreciated.

We can combine the idea behind LINQ's Aggregate method with a GroupWhile-style method: consecutive items are grouped while a condition holds, and an aggregate value is built up for the current group so that the predicate can use it:
public static IEnumerable<IEnumerable<T>> GroupWhileAggregating<T, TAccume>(
this IEnumerable<T> source,
TAccume seed,
Func<TAccume, T, TAccume> accumulator,
Func<TAccume, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
TAccume accume = accumulator(seed, iterator.Current);
while (iterator.MoveNext())
{
accume = accumulator(accume, iterator.Current);
if (predicate(accume, iterator.Current))
{
list.Add(iterator.Current);
}
else
{
yield return list;
list = new List<T>() { iterator.Current };
accume = accumulator(seed, iterator.Current);
}
}
yield return list;
}
}
Using this grouping method we can now write:
var query = data.GroupWhileAggregating(0,
(sum, item) => sum + item.Width,
(sum, item) => sum <= 4);

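You can sort of do that with the Batch method from the MoreLinq library, which is available as a NuGet package. Note that Batch splits by item count rather than by summed Width, so it only approximates the requested grouping. The result is a List<IEnumerable<object>>. Here is the code: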
class Obj
{
public string Text {get;set;}
public int Width {get;set;}
}
void Main()
{
var data = new [] {
new Obj { Text = "Hello", Width = 2 },
new Obj { Text = "Something else", Width = 1 },
new Obj { Text = "Another", Width = 1 },
new Obj { Text = "Extra-wide", Width = 3 },
new Obj { Text = "Random", Width = 1 }
};
var maxWidth = data.Max (d => d.Width );
var result = data.Batch(maxWidth).ToList();
result.Dump(); // Dump is a LINQPad method
}

I don't think you can do that with LINQ. One alternative approach would be the following:
var data = ... // original data
var newdata = new List<List<object>>();
int csum = 0;
var crow = new List<object>();
foreach (var o in data) {
if (crow.Count > 0 && csum + o.Width > 4) { //check if the current element fits into the current row (never emit an empty row)
newdata.Add(crow); //if not add current row to list
csum = 0;
crow = new List<object>(); //and create new row
}
crow.Add(o); //add current object to current row
csum += o.Width;
}
if (crow.Count > 0) //last row
newdata.Add(crow);
EDIT: The other answer suggests using Batch from the MoreLinq library. In fact, the source code above does more or less what Batch does, except that it sums the desired property instead of counting the elements in each batch. One could generalize my code with a custom selector to make the "batch size" more flexible.
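The generalization with a custom selector could look roughly like the sketch below. BatchBySum is a hypothetical helper name (it is not part of LINQ or MoreLinq), and the sketch assumes integer weights and that an item heavier than the limit should simply sit alone in its own row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchingSketch
{
    // Hypothetical helper: splits consecutive items into rows whose summed
    // weight (as returned by the selector) does not exceed maxSum.
    public static IEnumerable<List<T>> BatchBySum<T>(
        this IEnumerable<T> source, Func<T, int> weight, int maxSum)
    {
        var row = new List<T>();
        int sum = 0;
        foreach (var item in source)
        {
            int w = weight(item);
            // Only close the current row if it already holds something, so an
            // oversized item ends up alone instead of producing an empty row.
            if (row.Count > 0 && sum + w > maxSum)
            {
                yield return row;
                row = new List<T>();
                sum = 0;
            }
            row.Add(item);
            sum += w;
        }
        if (row.Count > 0)
            yield return row;
    }
}

public static class BatchingDemo
{
    public static void Main()
    {
        var data = new[]
        {
            new { Text = "Hello", Width = 2 },
            new { Text = "Something else", Width = 1 },
            new { Text = "Another", Width = 1 },
            new { Text = "Extra-wide", Width = 3 },
            new { Text = "Random", Width = 1 }
        };
        foreach (var row in data.BatchBySum(d => d.Width, 4))
            Console.WriteLine(string.Join(", ", row.Select(d => d.Text)));
        // prints:
        // Hello, Something else, Another
        // Extra-wide, Random
    }
}
```

Because the running sum is tracked in a variable, each item is visited once, and the selector makes the same method usable for widths, file sizes, or any other integer weight.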

Related

Find the most frequently occurrence combinations for list of items

In my ASP.NET C# application, I have the following list of item combinations. I want to list the most frequently occurring combinations.
Item1
Item1, Item2
Item3
Item1, Item3, Item2
Item3, Item1
Item2, Item1
According to the above example, I should get the output below.
The most frequently occurring combinations are:
Item1 & Item2 - No of occurrences are 3 (#2, #4 & #6)
Item1 & Item3 - No of occurrences are 2 (#4 & #5)
My structure is as below.
public class MyList
{
public List<MyItem> MyItems { get; set; }
}
public class MyItem
{
public string ItemName { get; set; }
}
Off the top of my head, I would map all possible combinations using a hash where ab is the same as ba (for example, you could order the items alphabetically and then hash them) and then just count the occurrences of each hash...
You can create a weighted graph from your list, with the weight between two nodes representing the frequency of co-occurrence. This StackExchange post has some information, and you can learn about adjacency matrices in this previous SO post.
In my opinion, it would be wise to use
HashSet<Tuple<Item1, Item2>> to represent a connection and have its value stored in a dictionary.
For multiple items, the problem is similar to finding the most frequently traversed path in a graph traversal algorithm.
For very large data sets, though, I recommend using SSAS and SSIS services through SQL statements and analysis queries, driven dynamically from C#, to create a market basket analysis, which should generate the desired statistics for you.
Here is a quick and dirty way to do this to get you started. You should probably use hash tables for performance, but I think Dictionaries are easier to visualize.
Fiddle: https://dotnetfiddle.net/yofkLf
public static void Main()
{
List<MyItem[]> MyItems = new List<MyItem[]>()
{
new MyItem[] { new MyItem("Item1") },
new MyItem[] { new MyItem("Item1"), new MyItem("Item2") },
new MyItem[] { new MyItem("Item3") },
new MyItem[] { new MyItem("Item1"), new MyItem("Item3"), new MyItem("Item2") },
new MyItem[] { new MyItem("Item3"), new MyItem("Item1") },
new MyItem[] { new MyItem("Item2"), new MyItem("Item1") }
};
Dictionary<Tuple<string, string>, int> results = new Dictionary<Tuple<string, string>, int>();
foreach (MyItem[] arr in MyItems)
{
// Iterate through the items in the array. Then, iterate through the items after that item in the array to get all combinations.
for (int i = 0; i < arr.Length; i++)
{
string s1 = arr[i].ItemName;
for (int j = i + 1; j < arr.Length; j++)
{
string s2 = arr[j].ItemName;
// Order the Tuple so that (Item1, Item2) is the same as (Item2, Item1).
Tuple<string, string> t = new Tuple<string, string>(s1, s2);
if (string.Compare(s1, s2) > 0)
{
t = new Tuple<string, string>(s2, s1);
}
if (results.ContainsKey(t))
{
results[t]++;
}
else
{
results[t] = 1;
}
}
}
}
// And here are your results.
// You can always use Linq to sort the dictionary by values.
foreach (var v in results)
{
Console.WriteLine(v.Key.ToString() + " = " + v.Value.ToString());
// Outputs:
// (Item1, Item2) = 3
// (Item1, Item3) = 2
// (Item2, Item3) = 1
}
}
...
public class MyItem
{
public string ItemName { get; set; }
public MyItem(string ItemName)
{
this.ItemName = ItemName;
}
}
Of course this would be different if you didn't have that string property in MyItems.
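As a follow-up to the comment about sorting the dictionary by values, here is a minimal sketch of that step. The MostFrequent name and the minCount threshold are assumptions made for illustration; the dictionary contents are the counts printed by the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairRanking
{
    // Hypothetical helper: keeps pairs seen at least minCount times,
    // ordered from most to least frequent.
    public static List<KeyValuePair<Tuple<string, string>, int>> MostFrequent(
        Dictionary<Tuple<string, string>, int> results, int minCount)
    {
        return results
            .Where(kvp => kvp.Value >= minCount)
            .OrderByDescending(kvp => kvp.Value)
            .ToList();
    }

    public static void Main()
    {
        // Same counts as the dictionary built in the answer above.
        var results = new Dictionary<Tuple<string, string>, int>
        {
            { Tuple.Create("Item1", "Item2"), 3 },
            { Tuple.Create("Item1", "Item3"), 2 },
            { Tuple.Create("Item2", "Item3"), 1 }
        };
        foreach (var kvp in MostFrequent(results, 2))
            Console.WriteLine(kvp.Key + " = " + kvp.Value);
        // prints:
        // (Item1, Item2) = 3
        // (Item1, Item3) = 2
    }
}
```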
Here's a rough O(N^2) approach:
Iterate over the outer collection (the List<List<Item>>)
Come up with a way to define the current row, call it rowId
Now iterate the known row ids (inner iteration).
Count when one of these is a complete subset of the other; either the current row is contained in a previous set, or the previous set is contained in the current row. (This is the solution you want.) This works by incrementing the count of the rows previously seen if they are a subset of the current row, or tracking the number of times the current row is a subset of the previously seen combinations and setting that at the end of each inner iteration.
Some assumptions:
You don't care about every possible combination of items, only combinations that have already been seen.
Items have a unique identifier
Like I said above, this is an O(N^2) approach, so performance may be a concern. There are also two checks for subset membership, which may be a performance issue. I'm also just joining and splitting ids as strings; you can probably get a more efficient solution by setting up another dictionary that tracks ids. There's also some room for improvement with Dictionary.TryGetValue. Extracting the sets of items you want is left as an exercise for the reader, but should be a straightforward OrderBy(..).Where(...) operation. But this should get you started.
public class MyItem
{
public string ItemName { get; set; }
}
class Program
{
public static void GetComboCount()
{
var itemsCollection = new List<List<MyItem>>() {
new List<MyItem>() { new MyItem() { ItemName = "Item1" } },
new List<MyItem>() { new MyItem() { ItemName = "Item1" }, new MyItem() { ItemName = "Item2" } },
new List<MyItem>() { new MyItem() { ItemName = "Item3" } },
new List<MyItem>() { new MyItem() { ItemName = "Item1" }, new MyItem() { ItemName = "Item3" }, new MyItem() { ItemName = "Item2" } },
new List<MyItem>() { new MyItem() { ItemName = "Item3" }, new MyItem() { ItemName = "Item1" } },
new List<MyItem>() { new MyItem() { ItemName = "Item2" }, new MyItem() { ItemName = "Item1" } }
};
var comboCount = new Dictionary<string, int>();
foreach (var row in itemsCollection)
{
var ids = row.Select(x => x.ItemName).OrderBy(x => x);
var rowId = String.Join(",", ids);
var rowIdCount = ids.Count();
var seen = false;
var comboCountList = comboCount.ToList();
int currentRowCount = 1;
foreach (var kvp in comboCountList)
{
var key = kvp.Key;
if (key == rowId)
{
seen = true;
currentRowCount++;
continue;
}
var keySplit = key.Split(',');
var keyIdCount = keySplit.Length;
if (ids.Where(x => keySplit.Contains(x)).Count() == keyIdCount)
{
comboCount[kvp.Key] = kvp.Value + 1;
}
else if (keySplit.Where(x => ids.Contains(x)).Count() == rowIdCount)
{
currentRowCount++;
}
}
if (!seen)
{
comboCount.Add(rowId, currentRowCount);
}
else
{
comboCount[rowId] = currentRowCount;
}
}
foreach (var kvp in comboCount)
{
Console.WriteLine(String.Format("{0}: {1}", kvp.Key, kvp.Value));
}
}
static void Main(string[] args)
{
GetComboCount();
}
}
console output:
Item1: 5
Item1,Item2: 3
Item3: 3
Item1,Item2,Item3: 1
Item1,Item3: 2
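The two follow-ups mentioned in that answer (Dictionary.TryGetValue and the final filter-and-sort step) might be sketched as below. Increment and TopCombos are hypothetical helper names, and treating "contains a comma" as "has at least two items" is an assumption that relies on the comma-joined row ids used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComboCountSketch
{
    // TryGetValue leaves 'current' at 0 when the key is missing, so a single
    // lookup covers both the "seen before" and "first time" cases.
    public static void Increment(Dictionary<string, int> counts, string key)
    {
        counts.TryGetValue(key, out int current);
        counts[key] = current + 1;
    }

    // Keeps combinations of two or more items (the key contains a comma)
    // that were counted at least minCount times, most frequent first.
    public static List<KeyValuePair<string, int>> TopCombos(
        Dictionary<string, int> counts, int minCount)
    {
        return counts
            .Where(kvp => kvp.Key.Contains(",") && kvp.Value >= minCount)
            .OrderByDescending(kvp => kvp.Value)
            .ToList();
    }

    public static void Main()
    {
        // The counts printed by the answer above.
        var comboCount = new Dictionary<string, int>
        {
            { "Item1", 5 },
            { "Item1,Item2", 3 },
            { "Item3", 3 },
            { "Item1,Item2,Item3", 1 },
            { "Item1,Item3", 2 }
        };
        foreach (var kvp in TopCombos(comboCount, 2))
            Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value);
        // prints:
        // Item1,Item2: 3
        // Item1,Item3: 2
    }
}
```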

Find variables in a list of array

Let's say I have a list of arrays with the contents below:
var listArray = new List<string[]>();
1st array = {code, ID_1, PK_1, ID_2, PK_2} //Somehow like a header
2nd array = {85734, 32343, 1, 66544, 2}
3rd array = {59382, 23324, 1, 56998, 2}
4th array = {43234, 45334, 1, 54568, 2}
and these arrays will be added into 'listArray'.
listArray.Add(array);
What should I do to match a value inside the list?
e.g: if ID_1 of the array is '32343', ID_2 = '66544'.
// create
var listArray = new List<string[]>();
string whatIWantToFind = "1234";
string[] mySearchArray = new string[] {"1234", "234234", "324234"};
// fill your array here...
// search
foreach(string[] listItem in listArray)
{
// if you want to check a single item inside...
foreach(string item in listItem)
{
// you can compare
if(item == whatIWantToFind)
{
}
// or check if it contains
if(item.Contains(whatIWantToFind))
{
}
}
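// to compare everything (arrays of different length never match)...
bool isMatch = listItem.Length == mySearchArray.Length; // 'checked' is a C# keyword, so use another name
for(int i = 0; isMatch && i < listItem.Length; i++)
{
if(!listItem[i].Equals(mySearchArray[i]))
{
isMatch = false;
}
}
// aha! this is the one
if(isMatch) {}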
}
If you create a class that contains all the data for one array, you can make a master array of those objects. For instance:
public class ListItem {
public string code, ID_1, PK_1, ID_2, PK_2;
}
And then you can use this class:
var listArray = new List<ListItem>();
listArray.Add(new ListItem(){ code = "85734", ID_1 = "32343", PK_1 = "1", ID_2 = "66544", PK_2 = "2"});
listArray.Add(......);
Then, to find the data, you can use a field accessor on the objects in the array:
foreach(var item in listArray)
{
if (item.ID_1.Equals("32343") && item.ID_2.Equals("66544"))
Console.WriteLine("Found item.");
}
var listArray = new List<string[]>
{
new []{ "code", "ID_1", "PK_1", "ID_2", "PK_2"},
new []{ "85734", "32343", "1", "66544", "2"},
new []{"59382", "23324", "1", "56998", "2"}
};
var index = listArray.First().ToList().IndexOf("ID_1");
var result = listArray.Where((a, i) => i > 0 && a[index] == "32343").ToList();
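The header-lookup idea in the snippet above can be extended to match several columns at once, which is what the question's ID_1 / ID_2 example seems to ask for. This is only a sketch: FindRows and the criteria dictionary are names made up here, and it assumes the first array is always the header and that every criteria key exists in it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderLookup
{
    // Hypothetical helper: returns the data rows whose named columns
    // all equal the requested values.
    public static List<string[]> FindRows(
        List<string[]> table, IDictionary<string, string> criteria)
    {
        // Map each header name to its column index.
        var columns = table[0]
            .Select((name, index) => new { name, index })
            .ToDictionary(x => x.name, x => x.index);

        // Skip the header row; a criteria key that is not a header
        // throws KeyNotFoundException.
        return table
            .Skip(1)
            .Where(row => criteria.All(c => row[columns[c.Key]] == c.Value))
            .ToList();
    }

    public static void Main()
    {
        var listArray = new List<string[]>
        {
            new[] { "code", "ID_1", "PK_1", "ID_2", "PK_2" },
            new[] { "85734", "32343", "1", "66544", "2" },
            new[] { "59382", "23324", "1", "56998", "2" }
        };
        var criteria = new Dictionary<string, string>
        {
            { "ID_1", "32343" },
            { "ID_2", "66544" }
        };
        foreach (var row in FindRows(listArray, criteria))
            Console.WriteLine(string.Join(", ", row));
        // prints: 85734, 32343, 1, 66544, 2
    }
}
```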

unequal size lists to merge

I have searched without success for a situation similar to the following.
I have two lists, list A and list B.
List A is composed of 10 objects created from ClassA which contains only strings.
List B is composed of 100 objects created from ClassB which only contains decimals.
List A is the header information.
List B is the data information.
The relationship between the two lists is:
Row 1 of list A corresponds to rows 1-10 of list B.
Row 2 of list A corresponds to rows 11-20 of list B.
Row 3 of list A corresponds to rows 21-30 of list B.
etc.........
How can I combine these two lists so that when I display them on the console the user will see a header row followed immediately by the corresponding 10 data rows?
I apologize if this has been answered before.
Ok, that should work. Let me know in case I got anything wrong.
List<ClassA> listA = GetListA(); // ...
List<ClassB> listB = GetListB(); // ...
if(listB.Count % listA.Count != 0)
throw new Exception("Unable to match listA to listB");
var datasPerHeader = listB.Count / listA.Count;
for(int i = 0; i < listA.Count;i++)
{
ClassA header = listA[i];
IEnumerable<ClassB> datas = listB.Skip(datasPerHeader*i).Take(datasPerHeader);
Console.WriteLine(header.ToString());
foreach(var data in datas)
{
Console.WriteLine("\t{0}", data.ToString());
}
}
Here is some code that should fulfill your request - I am going to find a link for the partition extension as I can't find it in my code anymore:
void Main()
{
List<string> strings = Enumerable.Range(1,10).Select(x=>x.ToString()).ToList();
List<decimal> decimals = Enumerable.Range(1,100).Select(x=>(Decimal)x).ToList();
var detailsRows = decimals.Partition(10)
.Select((details, row) => new {HeaderRow = row, DetailsRows = details});
var headerRows = strings.Select((header, row) => new {HeaderRow = row, Header = header});
var final = headerRows.Join(detailsRows, x=>x.HeaderRow, x=>x.HeaderRow, (header, details) => new {Header = header.Header, Details = details.DetailsRows});
}
public static class Extensions
{
public static IEnumerable<List<T>> Partition<T>(this IEnumerable<T> source, Int32 size)
{
for (int i = 0; i < Math.Ceiling(source.Count() / (Double)size); i++)
yield return new List<T>(source.Skip(size * i).Take(size));
}
}
That Partition method is the one that does the grunt work...
And here is the link to the article - LINK
EDIT 2
Here is better code for the Main() method... Rushed to answer and forgot brain:
void Main()
{
List<string> strings = Enumerable.Range(1,10).Select(x=>x.ToString()).ToList();
List<decimal> decimals = Enumerable.Range(1,100).Select(x=>(Decimal)x).ToList();
var detailsRows = decimals.Partition(10);
var headerRows = strings; //just renamed for clarity from other code
var final = headerRows.Zip(detailsRows, (header, details) => new {Header = header, Details = details});
}
This should be pretty straight forward unless I'm missing something.
var grouped = ListA.Select((value, index) =>
new {
ListAItem = value,
ListBItems = ListB.Skip(index * 10).Take(10)
})
.ToList();
This returns a list of anonymous-type objects you can loop through.
foreach (var group in grouped)
{
Console.WriteLine("List A: {0}", group.ListAItem.Name);
foreach (var listBItem in group.ListBItems)
{
Console.WriteLine("List B: {0}", listBItem.Name);
}
}
The easiest way may be something like this:
var listA = new List<string>() { "A", "B", "C", ... };
var listB = new List<decimal>() { 1m, 2m, 3m, ... };
double ratio = ((double)listA.Count) / listB.Count;
var results =
from i in Enumerable.Range(0, listB.Count)
select new { A = listA[(int)Math.Truncate(i * ratio)], B = listB[i] };
Or in fluent syntax:
double ratio = ((double)listA.Count) / listB.Count;
var results = Enumerable.Range(0, listB.Count)
.Select(i => new { A = listA[(int)Math.Truncate(i * ratio)], B = listB[i] });
Of course if you know you will always have 10 items in listB for each item in listA, you can simplify this to:
var results =
from i in Enumerable.Range(0, listB.Count)
select new { A = listA[i / 10], B = listB[i] };
Or in fluent syntax:
var results = Enumerable.Range(0, listB.Count)
.Select(i => new { A = listA[i / 10], B = listB[i] });
This will return a result set like
{ { "A", 1 },
{ "A", 2 },
{ "A", 3 }
..,
{ "A", 10 },
{ "B", 11 },
{ "B", 12 },
{ "B", 13 },
...
{ "B", 20 },
{ "C", 21 },
...
{ "J", 100 }
}
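On newer runtimes, the same header-plus-rows pairing can be sketched with Enumerable.Chunk and Zip. This assumes .NET 6 or later (where Chunk was introduced) and, as in the answers above, plain strings and decimals standing in for ClassA and ClassB; Interleave is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class HeaderDataSketch
{
    // Hypothetical helper: returns each header followed by its block of data
    // rows, ready to be written to the console line by line.
    public static List<string> Interleave(
        IEnumerable<string> headers, IEnumerable<decimal> data, int rowsPerHeader)
    {
        var lines = new List<string>();
        // Chunk splits the data into arrays of rowsPerHeader items;
        // Zip pairs the n-th header with the n-th chunk.
        var blocks = data.Chunk(rowsPerHeader);
        foreach (var pair in headers.Zip(blocks, (header, rows) => new { header, rows }))
        {
            lines.Add(pair.header);
            foreach (var value in pair.rows)
                lines.Add("\t" + value.ToString(CultureInfo.InvariantCulture));
        }
        return lines;
    }

    public static void Main()
    {
        var headers = new List<string> { "A", "B" };
        var data = Enumerable.Range(1, 6).Select(x => (decimal)x).ToList();
        foreach (var line in Interleave(headers, data, 3))
            Console.WriteLine(line);
    }
}
```

Zip stops at the shorter sequence, so surplus headers or surplus data blocks are silently dropped; a length check like the one in the first answer is still worth keeping if that matters.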

LINQ: Split list into groups according to weight/size

I've seen many examples using LINQ of how to divide a list into sub-lists according to a maximum number of items in each list. But in this case I'm interested in dividing a list into sub-lists using sizemb as a weight, with a max total file size per list of 9 MB.
public class doc
{
public string file;
public int sizemb;
}
var list = new List<doc>()
{
new doc { file = "dok1", sizemb = 5 },
new doc { file = "dok2", sizemb = 5 },
new doc { file = "dok3", sizemb = 5 },
new doc { file = "dok4", sizemb = 4 },
};
int maxTotalFileSize = 9;
The above list should then be divided into 3 lists. If any 'files' are more than 9mb they should be in their own list.
I made a non LINQ-version here:
var lists = new List<List<doc>>();
foreach (var item in list)
{
//Try and place the document into a sub-list
var availableSlot = lists.FirstOrDefault(p => (p.Sum(x => x.sizemb) + item.sizemb) <= maxTotalFileSize);
if (availableSlot == null)
lists.Add(new List<doc>() { item });
else
availableSlot.Add(item);
}
You could use this method:
IEnumerable<IList<doc>> SplitDocumentList(IEnumerable<doc> allDocuments, int maxMB)
{
var lists = new List<IList<doc>>();
var list = new List<doc>();
foreach (doc document in allDocuments)
{
int totalMB = list.Sum(d => d.sizemb) + document.sizemb;
if (list.Count > 0 && totalMB > maxMB) // never emit an empty list for an oversized first document
{
lists.Add(list);
list = new List<doc>();
}
list.Add(document);
}
if (list.Count > 0)
lists.Add(list);
return lists;
}
Here's a demo: http://ideone.com/OkXw7C
dok1
dok2
dok3,dok4
You can use the Aggregate function to do that. GroupBy only works when grouping by a key value, not by an arbitrary condition that decides when to start a new group.
list.Aggregate(new List<List<doc>>(), (acc, d) => {
// start a new group for the first item, or when adding d would exceed 9 MB
if (acc.Count == 0 || acc.Last().Sum(x => x.sizemb) + d.sizemb > 9) {
acc.Add(new List<doc>());
}
acc.Last().Add(d);
return acc;
}
)

Group a list into groups of 3 and select max of each group

I have a list of lists of dynamic which is currently being filtered through this:
var CPUdataIWant = from s in rawData
where s.stat.Contains("CPU")
select s;
//CPUDataIWant is a List<List<dynamic>>.
I have 86000 values in each inner list.
And what I need to do, is group the values into groups of 3, select the max of that group, and insert that into another list of List of dynamic, or just filter it out of CPUDataIWant.
So an example of what I want would be:
Raw data = 14,5,7,123,5,1,43,87,9
And my processed value would be:
ProceData = [14,5,7], [123,5,1], [43,87,9]
ProceData = [14,123,87]
Doesn't have to be using linq but the easier the better.
EDIT: Ok, I explained what I wanted a bit poorly.
here's what I have:
List<List<object>>
In this List, I'll have X amount of Lists called A.
In A I'll have 86000 values, let's say they're ints for now.
What I'd like, is to have
List<List<object>>
But instead of 86000 values in A, I want roughly 28,667 (86000 / 3), which would be made from the max of every 3 values in A.
IEnumerable<int> filtered = raw.Select((x, i) => new { Index = i, Value = x }).
GroupBy(x => x.Index / 3).
Select(x => x.Max(v => v.Value));
or, if you plan to use it more often
public static IEnumerable<int> SelectMaxOfEvery(this IEnumerable<int> source, int n)
{
int i = 0;
int currentMax = 0;
foreach (int d in source)
{
if (i++ == 0)
currentMax = d;
else
currentMax = Math.Max(d, currentMax);
if (i == n)
{
i = 0;
yield return currentMax;
}
}
if (i > 0)
yield return currentMax;
}
//...
IEnumerable<int> filtered = raw.SelectMaxOfEvery(3);
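For the nested List<List<...>> shape described in the question's edit, the same idea can be applied to every inner list. The sketch below assumes .NET 6 or later (for Enumerable.Chunk) and int values, as the question allows; MaxOfEvery is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxOfChunks
{
    // Hypothetical helper: for each inner list, takes the max of every
    // n consecutive values. A trailing group shorter than n is still
    // reduced to its own max.
    public static List<List<int>> MaxOfEvery(List<List<int>> lists, int n)
    {
        return lists
            .Select(inner => inner.Chunk(n).Select(chunk => chunk.Max()).ToList())
            .ToList();
    }

    public static void Main()
    {
        var raw = new List<List<int>>
        {
            new List<int> { 14, 5, 7, 123, 5, 1, 43, 87, 9 }
        };
        foreach (var inner in MaxOfEvery(raw, 3))
            Console.WriteLine(string.Join(",", inner));
        // prints: 14,123,87
    }
}
```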
Old-school way of doing things makes it quite simple (although it's not as compact as LINQ):
// Based on this spec: "CPUDataIWant is a List<List<dynamic>>"
// and on the example, which states that the contents are numbers.
//
List<List<dynamic>> filteredList = new List<List<dynamic>>();
foreach (List<dynamic> innerList in CPUDataIWant)
{
List<dynamic> innerFiltered = new List<dynamic>();
// Only complete groups of three are processed; a trailing group of one or two elements is skipped.
for (int i = 0; i + 2 < innerList.Count; i += 3)
{
// compare all three elements so the true maximum is kept
dynamic max = innerList[i];
if (innerList[i + 1] > max)
max = innerList[i + 1];
if (innerList[i + 2] > max)
max = innerList[i + 2];
innerFiltered.Add(max);
}
filteredList.Add(innerFiltered);
}
This should give the desired result:
var data = new List<dynamic> { 1, 2, 3, 3, 10, 1, 5, 2, 8 };
var firsts = data.Where((x, i) => i % 3 == 0);
var seconds = data.Where((x, i) => (i + 2) % 3 == 0);
var thirds = data.Where((x, i) => (i + 1) % 3 == 0);
var list = firsts.Zip(
seconds.Zip(
thirds, (x, y) => Math.Max(x, y)
),
(x, y) => Math.Max(x, y)
).ToList();
List now contains:
3, 10, 8
Or generalized to an extension method:
public static IEnumerable<T> ReduceN<T>(this IEnumerable<T> values, Func<T, T, T> map, int N)
{
int counter = 0;
T previous = default(T);
foreach (T item in values)
{
counter++;
if (counter == 1)
{
previous = item;
}
else if (counter == N)
{
yield return map(previous, item);
counter = 0;
}
else
{
previous = map(previous, item);
}
}
if (counter != 0)
{
yield return previous;
}
}
Used like this:
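data.ReduceN((a, b) => Math.Max(a, b), 3).ToList() // a lambda is needed here: the Math.Max method group does not convert to Func<dynamic, dynamic, dynamic>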
If you felt a need to use Aggregate you could do it like this:
(tested with LINQPad)
class Holder
{
public dynamic max = null;
public int count = 0;
}
void Main()
{
var data = new List<dynamic>
{new { x = 1 }, new { x = 2 }, new { x = 3 },
new { x = 3 }, new { x = 10}, new { x = 1 },
new { x = 5 }, new { x = 2 }, new { x = 1 },
new { x = 1 }, new { x = 9 }, new { x = 3 },
new { x = 11}, new { x = 10}, new { x = 1 },
new { x = 5 }, new { x = 2 }, new { x = 12 }};
var x = data.Aggregate(
new LinkedList<Holder>(),
(holdList,inItem) =>
{
if ((holdList.Last == null) || (holdList.Last.Value.count == 3))
{
holdList.AddLast(new Holder { max = inItem, count = 1});
}
else
{
if (holdList.Last.Value.max.x < inItem.x)
holdList.Last.Value.max = inItem;
holdList.Last.Value.count++;
}
return holdList;
},
(holdList) => { return holdList.Select((h) => h.max );} );
x.Dump("We expect 3,10,5,9,11,12");
}
