Group by a predefined set of keys using LINQ - c#

I've faced the following problem using LINQ. Say I have a collection of numbers in the range zero to five: [1,2,4,5,0,3,1 ...]. There could be any number of those in that array. What I want is to transform that array into the following structure: [{number:0, count:5},{number:1, count:3}, {number:2, count:0}....]. If I use GroupBy I miss the entry for number 2. Is there an elegant and efficient way of doing this using LINQ?

You need to perform an outer join between your collection and a "fixed" collection containing numbers 0 to 5 first. Then group that and do the counting.

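var arrayOfNumbers = new int[] {1, 5, 4, 5, 1, 0, 2, 3, 4, 5, 1,1,1};
// The "fixed" collection: every number that must appear in the output
var allNumbers = Enumerable.Range(0, 6);
var result =
from k in allNumbers
join n in arrayOfNumbers on k equals n into g // group join: a key with no matches gets an empty group
select new { Number = k, Count = g.Count() };
foreach (var item in result)
{
Console.WriteLine(String.Format("Number {0}: count {1}", item.Number, item.Count));
}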
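For comparison, here is a minimal sketch of another way to get the zero counts, written as a C# 9+ top-level program with the question's 0 to 5 range hard-coded as an assumption. ToLookup groups the input once, and indexing an ILookup with a key that never occurred returns an empty sequence instead of throwing, so the missing numbers simply get a count of 0. The sample array deliberately leaves out 2.

```csharp
using System;
using System.Linq;

var arrayOfNumbers = new[] { 1, 5, 4, 5, 1, 0, 3, 4, 5, 1, 1, 1 }; // note: no 2

// Group the input once; lookup[k] is an empty sequence for an absent key.
var lookup = arrayOfNumbers.ToLookup(n => n);

// Drive the result from the fixed key set 0..5, not from the data.
var result = Enumerable.Range(0, 6)
    .Select(k => new { Number = k, Count = lookup[k].Count() })
    .ToList();

foreach (var item in result)
    Console.WriteLine($"Number {item.Number}: count {item.Count}");
```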

List<KeyValuePair<string, int>> pairs = new List<KeyValuePair<string,int>>();
for(int i = 0; i <= 5; i++)
{
pairs.Add(new KeyValuePair<string, int>("number:" + i, arrayOfNums.Count(x => x == i)));
}
Where arrayOfNums is your array of values between 0 and 5.
The results are stored in a List<KeyValuePair<string, int>> in the format you want.
You could improve upon this: instead of using the hardcoded 5 in the loop, you could use the highest value in the array (arrayOfNums.Max()).
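A sketch of that improvement, assuming the values are non-negative and the array is non-empty (Max() throws on an empty sequence). It counts everything in a single pass with a Dictionary, then walks 0..Max() so that absent numbers still get an entry. It is written as a C# 9+ top-level program, keeps the answer's "number:" key format, and the sample data is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] arrayOfNums = { 1, 5, 4, 5, 1, 0, 3, 4, 5, 1, 1, 1 };

// One pass over the input to count every value.
var counts = new Dictionary<int, int>();
foreach (int n in arrayOfNums)
{
    counts.TryGetValue(n, out int c); // c is 0 when n has not been seen yet
    counts[n] = c + 1;
}

// Upper bound taken from the data instead of a hard-coded 5;
// keys in 0..max that never occurred get a count of 0.
int max = arrayOfNums.Max();
var pairs = new List<KeyValuePair<string, int>>();
for (int i = 0; i <= max; i++)
{
    int count = counts.TryGetValue(i, out int found) ? found : 0;
    pairs.Add(new KeyValuePair<string, int>("number:" + i, count));
}

foreach (var p in pairs)
    Console.WriteLine($"{p.Key} count {p.Value}");
```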

Related

c# Dictionary with HashSet<int> as value get intersection of all

I have a Dictionary with a HashSet as the value. I have an int[] with the keys for which I want to get the Count of common values in the HashSets.
Here is a piece of code that works, but in a very inefficient way, as it requires creating a HashSet and modifying it in memory before the final Count.
Dictionary<int, HashSet<int>> d = new Dictionary<int, HashSet<int>>();
HashSet<int> s1 = new HashSet<int>() { 3, 4, 5, 6, 7, 8, 9 };
HashSet<int> s2 = new HashSet<int>() { 1, 2, 3, 4, 5, 8 };
HashSet<int> s3 = new HashSet<int>() { 1, 3, 5, 10, 15, 20 };
HashSet<int> s4 = new HashSet<int>() { 1, 20 };
d.Add(10, s1);
d.Add(15, s2);
d.Add(20, s3);
d.Add(25, s4);
// List of keys from which I need the intersection of the HashSet's
int[] l = new int[3] { 10, 15, 20 };
// Get an IEnumerable of the selected Dictionary entries (keys 10, 15, 20 select s1, s2 and s3)
var hashlist = d.Where(x => l.Contains(x.Key));
// Create a new HashSet to contain the intersection of all the HashSet's
HashSet<int> first = new HashSet<int>(hashlist.First().Value);
foreach (var hash in hashlist.Skip(1))
first.IntersectWith(hash.Value);
// Show the number of common int's
Console.WriteLine("Common elements: {0}", first.Count);
What I am looking for is an efficient way (LINQ perhaps?) to count the common elements without having to create a new HashSet, as I am running similar code hundreds of millions of times.
It is also important to note that I create a new HashSet to get the intersections as I do not want to modify the original HashSet's.
Best regards,
Jorge
What I am looking for is an efficient way (LINQ perhaps?) to count the common elements
If you really want maximum performance, forget about LINQ. Here is an old-school way with all the optimizations I can think of applied:
// Collect the non empty matching sets, keeping the set with the min Count at position 0
var sets = new HashSet<int>[l.Length];
int setCount = 0;
foreach (var key in l)
{
HashSet<int> set;
if (!d.TryGetValue(key, out set) || set.Count == 0) continue;
if (setCount == 0 || sets[0].Count <= set.Count)
sets[setCount++] = set;
else
{
sets[setCount++] = sets[0];
sets[0] = set;
}
}
int commonCount = 0;
if (setCount > 0)
{
if (setCount == 1)
commonCount = sets[0].Count;
else
{
foreach (var item in sets[0])
{
bool isCommon = true;
for (int i = 1; i < setCount; i++)
if (!sets[i].Contains(item)) { isCommon = false; break; }
if (isCommon) commonCount++;
}
}
}
Console.WriteLine("Common elements: {0}", commonCount);
Hope the code is self-explanatory.
This can definitely be improved:
var hashlist = d.Where(x => l.Contains(x.Key));
By rewriting it as:
var hashlist = l.Select(x => d[x]);
This takes advantage of the Dictionary's hash-based lookup to fetch the value for each key directly, rather than enumerating every dictionary entry and scanning the int[] with Contains() for each one.
Your next big problem is that LINQ is lazy, so by calling First() and Skip(1) separately, you're actually enumerating the collection, and running the previously mentioned Where(…) filter, more than once.
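One behavioral difference is worth knowing: d.Where(x => l.Contains(x.Key)) silently skips keys that are not in the dictionary, whereas d[x] throws a KeyNotFoundException for them. If l may contain such keys, a sketch like the following keeps the direct lookup via TryGetValue, skips missing keys, and copies the first set so the HashSets in d are left untouched. It is a C# 9+ top-level program using the question's sets; key 99 is a made-up missing key.

```csharp
using System;
using System.Collections.Generic;

var d = new Dictionary<int, HashSet<int>>
{
    [10] = new HashSet<int> { 3, 4, 5, 6, 7, 8, 9 },
    [15] = new HashSet<int> { 1, 2, 3, 4, 5, 8 },
    [20] = new HashSet<int> { 1, 3, 5, 10, 15, 20 },
};
int[] l = { 10, 15, 20, 99 }; // 99 is deliberately absent from d

var hashlist = new List<HashSet<int>>();
foreach (int key in l)
{
    // Direct O(1) lookup; keys missing from d are skipped instead of throwing.
    if (d.TryGetValue(key, out var set))
        hashlist.Add(set);
}

// Copy the first set so the HashSets stored in d are never modified.
var common = hashlist.Count > 0 ? new HashSet<int>(hashlist[0]) : new HashSet<int>();
for (int i = 1; i < hashlist.Count; i++)
    common.IntersectWith(hashlist[i]);

Console.WriteLine("Common elements: {0}", common.Count);
```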
To avoid multiple enumerations, you could rewrite this:
HashSet<int> first = new HashSet<int>(hashlist.First().Value);
foreach (var hash in hashlist.Skip(1))
first.IntersectWith(hash.Value);
As:
var intersection = hashlist.Aggregate(
(HashSet<int>)null,
(h, j) =>
{
if (h == null)
h = new HashSet<int>(j);
else
h.IntersectWith(j);
return h;
});
But depending on your precise use case it may just be faster (and easier to understand) to simply materialize the result into a List first, then use a simple for loop:
var hashlist = l.Select(x => d[x]).ToList();
// Copy the first set so the original HashSets are not modified
HashSet<int> first = new HashSet<int>(hashlist[0]);
for (var i = 1; i < hashlist.Count; i++)
first.IntersectWith(hashlist[i]);
Here's a quick benchmark of these various options (times in ms; your results may vary):
Original          2.285680
SelectHashList    1.912829
Aggregate         1.815872
ToListForLoop     1.608565
OrderEnumerator   1.975067 // Scott Chamberlain's answer
EnumeratorOnly    1.732784 // Scott Chamberlain's answer without the call to OrderBy()
AggIntersect      2.046930 // P. Kouvarakis's answer (with compiler error fixed)
JustCount         1.260448 // Ivan Stoev's updated answer
There are a few tricks that could potentially buy you a lot of speed. The biggest one I see is to start with the smallest set first and then work your way up to larger ones; this gives the initial set the smallest possible number of items to intersect, which means fewer lookups.
Also, if you drive the enumerator manually instead of using a foreach, you don't need to enumerate the list twice. (EDIT: I also used the trick p.s.w.g mentioned: select against the dictionary instead of using .Contains().)
Important note: this method will only give you benefits if you are combining a large number of HashSets with a wide range of item counts. The overhead of calling OrderBy is significant, and with a small dataset like the one in your example you are unlikely to see any benefit.
Dictionary<int, HashSet<int>> d = new Dictionary<int, HashSet<int>>();
HashSet<int> s1 = new HashSet<int>() { 3, 4, 5, 6, 7, 8, 9 };
HashSet<int> s2 = new HashSet<int>() { 1, 2, 3, 4, 5, 8 };
HashSet<int> s3 = new HashSet<int>() { 1, 3, 5, 10, 15, 20 };
HashSet<int> s4 = new HashSet<int>() { 1, 20 };
d.Add(10, s1);
d.Add(15, s2);
d.Add(20, s3);
d.Add(25, s4);
// List of keys from which I need the intersection of the HashSet's
int[] l = new int[3] { 10, 15, 20 };
HashSet<int> combined;
//Sort in increasing order by count
//Also used the trick from p.s.w.g's answer to get a better select.
IEnumerable<HashSet<int>> sortedList = l.Select(x => d[x]).OrderBy(x => x.Count);
using (var enumerator = sortedList.GetEnumerator())
{
if (enumerator.MoveNext())
{
combined = new HashSet<int>(enumerator.Current);
}
else
{
combined = new HashSet<int>();
}
while (enumerator.MoveNext())
{
combined.IntersectWith(enumerator.Current);
}
}
// Show the number of common int's
Console.WriteLine("Common elements: {0}", combined.Count);
IntersectWith() is probably as efficient as you can get.
Using LINQ you could make the code cleaner (?):
IEnumerable<int> result = l.Aggregate((IEnumerable<int>)null, (acc, key) => acc == null ? d[key] : acc.Intersect(d[key]));
int commonCount = result?.Count() ?? 0;

Enumerator stuck in endless loop when removing excess items from a List

I have a script that takes an int[] array, converts it to a list and removes all further occurrences of the integers that already occurred at least 2 times.
The problem I have is that when it gets into the loop where I am checking the count of each integers occurrences, I am getting stuck in a loop.
EDIT: What I left out was that the list has to remain in its original order, so that excess numbers are removed from the top down. Sorry if that confused those who already answered!
I thought that changing the value of occursintegerOccurrence would update the count checked by the while loop.
Any ideas on what I'm missing here? Aside from any discernible skill.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Remoting.Messaging;
public class Kata
{
public static void Main()
{
int[] arr = new int[] {1, 2, 1, 4, 5, 1, 2, 2, 2};
int occurrenceLimit = 2;
var intList = arr.ToList();
for (int i = 0; i < intList.Count; i++)
{
var occursintegerOccurrence = intList.Count(n => n == occurrenceLimit);
do
{
occursintegerOccurrence = intList.Count(n => n == occurrenceLimit);
foreach (var x in intList)
{
Console.WriteLine(x);
intList.Remove(intList.LastIndexOf(occurrenceLimit));
// Tried changing the count here too
occursintegerOccurrence = intList.Count(n => n == occurrenceLimit);
}
} while (occursintegerOccurrence > occurrenceLimit);
}
}
}
Here's a fairly concise version, assuming that you want to remove all instances of integers with a count in excess of 2, leaving the remainder of the bag in its original sequence, with preference to retention traversing from left to right:
int[] arr = new int[] {1, 2, 1, 4, 5, 1, 2, 2, 2};
var ints = arr.Select((n, idx) => new {n, idx})
.GroupBy(x => x.n)
.SelectMany(grp => grp.Take(2))
.OrderBy(x => x.idx)
.Select(x => x.n)
.ToList();
Result:
1, 2, 1, 4, 5, 2
It works by using the index overload of Select to project an anonymous type that carries the original index through the grouping, which allows the results to be re-ordered at the end.
The cause of the endless loop is the line
intList.Remove(intList.LastIndexOf(occurrenceLimit));
Here you are removing the value equal to the last index of the occurrenceLimit value (= 2) in the list, which is 8 (the last index of the array, counting from 0).
Since 8 isn't present in the list, nothing is removed, so the loop condition never changes: it is always true and the loop never ends.
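To see the difference between the two List<T> methods in isolation, here is a small sketch written as a C# 9+ top-level program with the question's data: Remove(T) searches for a value and returns false when it is absent, while RemoveAt(int) removes by position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var intList = new List<int> { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
int index = intList.LastIndexOf(2); // 8, the position of the last 2

// Remove(T) looks for the *value* 8; there is none, so nothing changes.
bool removed = intList.Remove(index);
Console.WriteLine($"Remove({index}) returned {removed}; count of 2s is still {intList.Count(n => n == 2)}");

// RemoveAt(int) treats its argument as a position and drops the last 2.
intList.RemoveAt(index);
Console.WriteLine($"After RemoveAt({index}): {string.Join(", ", intList)}");
```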
This method works for any value of occurrenceLimit, but I think StuartLC's solution is better:
int[] arr = new int[] { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
// Every slot starts out as null; only the kept occurrences get filled in
int?[] arr2 = new int?[arr.Length];
int occurrenceLimit = 2;
var ints = arr.Distinct().ToList();
ints.ForEach(i => {
int ndx = 0;
for (int occ = 0; occ < occurrenceLimit; occ++){
ndx = Array.IndexOf(arr, i, ndx);
if (ndx < 0) break;
arr2[ndx++] = i;
}
});
List<int?> intConverted = arr2.ToList();
intConverted.RemoveAll(i => !i.HasValue);
This may help you:
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
int[] arr = new int[] { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
int occurrenceLimit = 2;
var newList = new List<Vm>();
var result=new List<Vm>();
for (int i = 0; i < arr.Length; i++)
{
var a = new Vm {Value = arr[i], Index = i};
result.Add(a);
}
foreach (var item in result.GroupBy(x => x.Value))
{
newList.AddRange(item.Select(x => x).Take(occurrenceLimit));
}
Console.WriteLine(string.Join(",",newList.OrderBy(x=>x.Index).Select(a=>a.Value)));
Console.ReadKey();
}
}
public class Vm
{
public int Value { get; set; }
public int Index { get; set; }
}
}
I did the following:
I created a Vm class with 2 properties (Value and Index) in order to save the index of each value in the array.
I group by value and take the first occurrenceLimit (2) occurrences of each value.
I order the result list based on the initial index.
It can be done by defining your own iterator method, which counts the occurrences seen so far:
using System;
using System.Collections.Generic;
using System.Linq;
static class Test {
static IEnumerable<int> KeepNoMoreThen(this IEnumerable<int> source, int limit) {
Dictionary<int, int> counts = new Dictionary<int, int>();
foreach(int current in source) {
int count;
counts.TryGetValue(current, out count);
if(count<limit) {
counts[current]=count+1;
yield return current;
}
}
}
static void Main() {
int[] arr = new int[] { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
int occurrenceLimit = 2;
List<int> result = arr.KeepNoMoreThen(occurrenceLimit).ToList();
result.ForEach(Console.WriteLine);
}
}
var removal = arr.GroupBy (a =>a ).Where (a =>a.Count()>2).Select(a=>a.Key).ToArray();
var output = arr.Where (a =>!removal.Contains(a)).ToList();
removal is an array of the items which appear more than twice.
output is the original list with those items removed.
[Update: I just discovered that this handles the problem as originally specified, not as later clarified.]
A single pass over the input array, maintaining an occurrence-count dictionary, should do the job in O(N) time:
int[] arr = new int[] { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
int occurrenceLimit = 2;
var counts = new Dictionary<int, int>();
var result = arr.Where(n =>
{
int count;
if (counts.TryGetValue(n, out count) && count >= occurrenceLimit) return false;
counts[n] = ++count;
return true;
}).ToList();
Your code is stuck in an infinite loop because you are using List.Remove(), and the Remove() method removes an item by matching against the item you pass in. But you are passing in a list index instead of a list item, so you are getting unintended results. What you want to use is List.RemoveAt(), which removes an item by matching against the index.
So your code is stuck in an infinite loop because intList.LastIndexOf(occurrenceLimit) is returning 8, then Remove() looks for the item 8 in the list, but it doesn't find it so it returns false and your code continues to run. Changing this line:
intList.Remove(intList.LastIndexOf(occurrenceLimit));
to
intList.RemoveAt(intList.LastIndexOf(occurrenceLimit));
will "fix" your code and it will no longer get stuck in an infinite loop. It would then have the expected behavior of throwing an exception because you are modifying a collection that you are iterating through in a foreach.
As for your intended solution, I have rewritten your code with some changes, but kept most of it instead of rewriting it entirely using LINQ or other magic. You had some issues:
1) You were counting the number of times occurrenceLimit was found in the list, not the number of times an item was found in the list. I fixed this by comparing against intList[i].
2) You were using Remove() instead of RemoveAt().
3) Your foreach and do/while need some work. I went with a while to simplify the initial case, and then used a for loop so I can modify the list (you cannot modify a list that you are iterating over in a foreach). In this for loop I iterate up to the number of occurrences minus occurrenceLimit, to remove all but the first occurrenceLimit of them. Your initial logic was missing this, and if your code had worked as intended you would have removed every single one.
static void Main(string[] args)
{
int[] arr = new int[] { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
int occurrenceLimit = 2;
var intList = arr.ToList();
// Interestingly, this `.Count` property updates during the for loop iteration,
// so even though we are removing items inside this `for` loop, we do not run off the
// end of the list as Count is constantly updated.
// Doing `var count = intList.Count`, `for (... i < count ...)` would blow up.
for (int i = 0; i < intList.Count; i++)
{
// Find the number of times the item at index `i` occurs
int occursintegerOccurrence = intList.Count(n => n == intList[i]);
// If `occursintegerOccurrence` is greater than `occurrenceLimit`
// then remove all but the first `occurrenceLimit` number of them
while (occursintegerOccurrence > occurrenceLimit)
{
// We are not enumerating the list, so we can remove items at will.
for (var ii = 0; ii < occursintegerOccurrence - occurrenceLimit; ii++)
{
var index = intList.LastIndexOf(intList[i]);
intList.RemoveAt(index);
}
occursintegerOccurrence = intList.Count(n => n == intList[i]);
}
}
// Verify the results
foreach (var item in intList)
{
Console.Write(item + " ");
}
Console.WriteLine(Environment.NewLine + "Done");
Console.ReadLine();
}
Here's a pretty optimal solution:
var list = new List<int> { 1, 2, 1, 4, 5, 1, 2, 2, 2 };
var occurrenceLimit = 2;
list.Reverse(); // Reverse list to make sure we remove LAST elements
// We will store the count of each element's occurrences here
var counts = new Dictionary<int, int>();
for (int i = list.Count - 1; i >= 0; i--)
{
var elem = list[i];
if (counts.ContainsKey(elem)) // If we already faced this element, increment its occurrence count
{
counts[elem]++;
if (counts[elem] > occurrenceLimit) // If it occurred more than occurrenceLimit times, remove it from the list
list.RemoveAt(i);
}
else
counts.Add(elem, 1); // We haven't faced this element yet, so add it with an occurrence count of 1
}
list.Reverse(); // Again reverse list
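The key point is that you have to traverse the list backwards to be able to remove items while iterating by index. If you remove items inside a foreach, you get an InvalidOperationException telling you the collection was modified; if you remove them in a forward for loop, the remaining elements shift left and the next one gets skipped. Going backwards, a removal only shifts elements you have already visited, so you can remove elements as you wish without affecting the rest of the traversal.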

How to count how many times each number from an int[] occurs in an IEnumerable<int>?

I have an array of ints (call it A) and an IEnumerable<int> (call it B):
B - 1,2,4,8,289
A - 2,2,56,2,4,33,4,1,8
I need to count how many times each number from A occurs in B and sum the results.
For example:
B - 1,2,4,8,289
A - 2,2,56,2,4,33,4,1,8
result = 1+3+2+1+0
What is an elegant way to implement it?
With LINQ it is easy:
int count = A
.Where(x => B.Contains(x))
.Count();
This counts how many elements of A are contained in B.
As Yuval Itzchakov points out, this can be simplified like this:
int count = A.Count(x => B.Contains(x));
I need to count how many times each number from A occurs in B and sum the results.
You can get both the count and the sum as follows:
List<int> b = new List<int>() { 1,2,4,8,289 };
List<int> a = new List<int>() { 2,2,56,2,4,33,4,1,8 };
var subset = a.Where(i => b.Contains(i));
var count = subset.Count(); // 7
var sum = subset.Sum(); // 23
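Note that I reuse the same LINQ expression to get both the count and the sum. Because it is lazily evaluated, the filter actually runs once for each call; add .ToList() if you want it to run only once.
One might be tempted to use a HashSet<int> in place of a List<int> because its .Contains operation is faster. However, HashSet is a set: if the same number is added multiple times, only one copy of that number remains. That is fine for b, which is only used for lookups, but a must stay a List<int> so that its duplicates are counted.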
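Since b is only used for lookups and its values are distinct, a HashSet<int> is safe there while a stays a list. A minimal sketch under that assumption (C# 9+ top-level program, the question's data), which also materializes the filtered sequence once so Count and Sum don't each rerun the filter:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var b = new HashSet<int> { 1, 2, 4, 8, 289 };          // lookup side: distinct values
var a = new List<int> { 2, 2, 56, 2, 4, 33, 4, 1, 8 }; // duplicates matter here

// Materialize the matches once; HashSet.Contains is an O(1) lookup.
List<int> subset = a.Where(x => b.Contains(x)).ToList();
int count = subset.Count;
int sum = subset.Sum();
Console.WriteLine($"count = {count}, sum = {sum}");
```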
Sweet and simple, a one-line solution.
Why don't you try it:
int sum = 0;
A.ToList().ForEach(a=>sum +=B.Count(b=>b==a));
Console.Write(sum);
You can swap A and B and it will still work.
With LINQ you can do it like this:
var B = new List<int>{ 1, 2, 4, 8, 289 };
var A = new List<int> { 2, 2, 56, 2, 4, 33, 4, 1, 8 };
var repetitionSum = B.Select(b => A.Count(a => a == b)).Sum(); //result = 7
And if you want, you can get the individual repetition list like this
var repetition = B.Select(b => A.Count(a => a == b)).ToList();
// { 1, 3, 2, 1, 0 }
It is not clear whether you want to know the occurrences of each number or the final count (your text and your example differ). Here is the code to get the number of appearances of each number:
using System;
using System.Linq;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
int[] a = new []{1,2,3};
int[] b = new []{1,2,2,3};
Dictionary<int, int> aDictionary = a.ToDictionary(i=>i, i => 0);
foreach(int i in b)
{
if(aDictionary.ContainsKey(i))
{
aDictionary[i]++;
}
}
foreach(KeyValuePair<int, int> kvp in aDictionary)
{
Console.WriteLine(kvp.Key + ":" + kvp.Value);
}
}
}
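Applied to the data from the question, the same dictionary idea gives both the per-number counts and their sum in O(|A| + |B|) time, instead of the O(|A| * |B|) of the nested Count() calls above. This is a sketch written as a C# 9+ top-level program; it assumes B has no duplicate values, since ToDictionary throws an ArgumentException on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var B = new List<int> { 1, 2, 4, 8, 289 };
var A = new List<int> { 2, 2, 56, 2, 4, 33, 4, 1, 8 };

// One counter per number in B, all starting at zero.
Dictionary<int, int> counts = B.ToDictionary(key => key, key => 0);

// Single pass over A; values that are not in B are ignored.
foreach (int a in A)
{
    if (counts.TryGetValue(a, out int current))
        counts[a] = current + 1;
}

// Print in B's order rather than relying on dictionary enumeration order.
foreach (int value in B)
    Console.WriteLine($"{value}: {counts[value]}");

int total = counts.Values.Sum();
Console.WriteLine($"total = {total}");
```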

How to search for a number in a list of arrays of numbers based on the first index of each array using LINQ?

I have a list of arrays of numbers. I am searching for the two arrays where my search number falls between the numbers positioned in index 0. Then return the number positioned in index 1 from the second array. (Assume numbers in index 0 are sorted already and there are no duplicates)
My wrong solution for LINQPad:
The value of 'found' should be 3 because 9 falls between 4 and 10 in the second and third arrays. Then I take the second of those two arrays and return 3, which is at index 1 of that array.
List<int[]> list = new List<int[]> { new[] { 1, 5 }, new[] { 4, 6 }, new[] { 10, 3} , new[] { 15, 8} };
int searchFor = 9;
int found = list.Where(n => searchFor >= n[0] && searchFor <= n[0]).Select(i => i[1]).FirstOrDefault();
found.Dump(); //should be 3 instead of 0.
Try this:
int found = list.Zip(list.Skip(1), (x, y) => x[0]<=searchFor&&y[0]>=searchFor?y[1]:0).FirstOrDefault(o=>o!=0);
Well, my logic is a little different, but it gets the result you want. I would recommend just using a Dictionary if you are doing key/value-pair stuff like this. It makes things simpler in my opinion, and if you have no repeating keys this should work fine.
// Use a dictionary instead of arrays if you are just storing two int values
var dic = new Dictionary<int, int>();
dic.Add(1, 5);
dic.Add(4, 6);
dic.Add(10, 3);
dic.Add(15, 8);
int searchFor = 9;
// Don't need to find this really
int low = (from l in dic
where l.Key <= searchFor
select l.Key).Max();
// Just need this
int found = (from h in dic
where h.Key >= searchFor
orderby h.Key // take the value of the smallest matching key, not the smallest value
select h.Value).First();
Console.WriteLine("Low: " + low);
Console.WriteLine("Found: " + found);
How about
var found = list.First(l => l[0] > searchFor)[1];
It should do the trick, as I assume that the list is ordered by the first element of each array.
If not, then
var found = list.OrderBy(l=>l[0]).First(l => l[0] > searchFor)[1];
should also work.
The expression in your Where clause keeps arrays whose first element is both less than or equal to 9 and greater than or equal to 9. Since it can't be less and greater at the same time, it actually keeps only arrays that have 9 as their first element.
For the given data this results in an empty sequence. FirstOrDefault therefore returns the default (0 for integers).
You actually have to look for the first array whose first element is greater than or equal to 9:
int[] result = list.FirstOrDefault(arr => arr[0] >= searchFor);
if (result == null)
{
Console.WriteLine("Not found!");
}
else
{
Console.WriteLine(result[1]);
}
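Because the question guarantees that the first elements are sorted and have no duplicates, a binary search can replace the linear scan when you search the same list many times. A sketch, assuming the same "greater than or equal" semantics as the answer above: FindSecond is a hypothetical helper name, and extracting the keys into an array is an O(n) step that only pays off if it is reused across searches. Array.BinarySearch returns the bitwise complement of the next larger element's index when there is no exact match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int[]> { new[] { 1, 5 }, new[] { 4, 6 }, new[] { 10, 3 }, new[] { 15, 8 } };

// Copy the first elements once; assumed sorted ascending with no duplicates.
int[] keys = list.Select(arr => arr[0]).ToArray();

// Hypothetical helper: returns element [1] of the first array whose [0] is >= value,
// or null when value is larger than every key.
int? FindSecond(int value)
{
    // BinarySearch returns the match index, or the bitwise complement
    // of the index of the next larger element when there is no match.
    int idx = Array.BinarySearch(keys, value);
    if (idx < 0) idx = ~idx;
    return idx < list.Count ? list[idx][1] : (int?)null;
}

int? found = FindSecond(9);
Console.WriteLine(found.HasValue ? found.Value.ToString() : "Not found!");
```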

How to find the Mode in Array C#? [duplicate]

This question already has answers here:
Find character with most occurrences in string?
I want to find the mode of an array. I know that I have to use nested loops to check each value and see how often that element appears in the array. Then I have to count the number of times the second element appears. The code below doesn't work; can anyone help me, please?
for (int i = 0; i < x.length; i ++)
{
x[i]++;
int high = 0;
for (int i = 0; i < x.length; i++)
{
if (x[i] > high)
high = x[i];
}
}
Using nested loops is not a good way to solve this problem. It will have a run time of O(n^2) - much worse than the optimal O(n).
You can do it with LINQ by grouping identical values and then finding the group with the largest count:
int mode = x.GroupBy(v => v)
.OrderByDescending(g => g.Count())
.First()
.Key;
This is both simpler and faster. But note that (unlike LINQ to SQL) LINQ to Objects on the .NET Framework doesn't optimize the OrderByDescending when only the first result is needed: it fully sorts the groups, which is an O(n log n) operation. (.NET Core and later versions do special-case an ordering followed by First() with a single linear scan.)
You might want this O(n) algorithm instead. It first iterates once through the groups to find the maximum count, and then once more to find the first corresponding key for that count:
var groups = x.GroupBy(v => v).ToList(); // materialize so the grouping runs only once
int maxCount = groups.Max(g => g.Count());
int mode = groups.First(g => g.Count() == maxCount).Key;
You could also use the MaxBy extension method from MoreLINQ (or the built-in Enumerable.MaxBy on .NET 6 and later) to further improve the solution so that it only requires iterating through all elements once.
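Without MoreLINQ or .NET 6, the same single scan over the groups can be written with Aggregate. A minimal sketch, written as a C# 9+ top-level program with the sample data from the non-LINQ answer; note that Aggregate without a seed throws an InvalidOperationException on empty input.

```csharp
using System;
using System.Linq;

int[] x = { 1, 2, 1, 2, 4, 3, 2 };

// GroupBy is a single O(n) pass; Aggregate then scans the groups once,
// keeping the group with the highest count (no sorting involved).
// Strict '>' keeps the earlier group on ties, and GroupBy yields groups
// in order of each key's first appearance.
var best = x.GroupBy(v => v)
            .Select(g => (Value: g.Key, Count: g.Count()))
            .Aggregate((acc, next) => next.Count > acc.Count ? next : acc);

int mode = best.Value;
Console.WriteLine($"The mode is {mode} ({best.Count} occurrences)");
```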
A non LINQ solution:
int[] x = new int[] { 1, 2, 1, 2, 4, 3, 2 };
Dictionary<int, int> counts = new Dictionary<int, int>();
foreach( int a in x ) {
if ( counts.ContainsKey(a) )
counts[a] = counts[a]+1;
else
counts[a] = 1;
}
int result = int.MinValue;
int max = int.MinValue;
foreach (int key in counts.Keys) {
if (counts[key] > max) {
max = counts[key];
result = key;
}
}
Console.WriteLine("The mode is: " + result);
As a beginner, this might not make too much sense, but it's worth providing a LINQ based solution.
int mode = x
.GroupBy(i => i) //place all identical values into groups
.OrderByDescending(g => g.Count()) //order groups by the size of the group desc
.Select(g => g.Key) //key of the group is representative of items in the group
.First(); //first in the list is the most frequent (modal) value
Say the x array has the following items:
int[] x = { 1, 2, 6, 2, 3, 8, 2, 2, 3, 4, 5, 6, 4, 4, 4, 5, 39, 4, 5 };
a. Getting the highest value:
int high = x.OrderByDescending(n => n).First();
b. Getting the mode:
int mode = x.GroupBy(i => i) //Grouping same items
.OrderByDescending(g => g.Count()) //ordering the groups by frequency, most frequent first
.Select(g => g.Key) //selecting key of the group
.FirstOrDefault(); //Finally, taking the most frequent value
