When comparing two key-value dictionary sets in C#: set A and set B, what is the best way to enumerate keys present in set A but missing from set B and vice-versa?
For example:
A = { 1, 2, 5 }
B = { 2, 3, 5 }
Comparing B with A, missing keys = { 1 } and new keys = { 3 }.
Using Dictionary<...,...> objects, one can enumerate all keys in B and test each against set A using A.ContainsKey(key), but it feels like there should be a better way that might involve a sorted set?
I'm aware of two built-in ways of doing set differences.
1) Enumerable.Except
Produces the set difference of two sequences by using the default equality comparer to compare values.
Example:
IEnumerable<int> a = new int[] { 1, 2, 5 };
IEnumerable<int> b = new int[] { 2, 3, 5 };
foreach (int x in a.Except(b))
{
Console.WriteLine(x); // prints "1"
}
2a) HashSet<T>.ExceptWith
Removes all elements in the specified collection from the current HashSet<T> object.
HashSet<int> a = new HashSet<int> { 1, 2, 5 };
HashSet<int> b = new HashSet<int> { 2, 3, 5 };
a.ExceptWith(b);
foreach (int x in a)
{
Console.WriteLine(x); // prints "1"
}
2b) HashSet<T>.SymmetricExceptWith
Modifies the current HashSet<T> object to contain only elements that are present either in that object or in the specified collection, but not both.
HashSet<int> a = new HashSet<int> { 1, 2, 5 };
HashSet<int> b = new HashSet<int> { 2, 3, 5 };
a.SymmetricExceptWith(b);
foreach (int x in a)
{
Console.WriteLine(x); // prints "1" and "3"
}
If you need something more performant, you'll probably need to roll your own collection type.
Use SortedDictionary : logic is A.Except(A.Intersect(B)).
Don't worry overmuch about performance until you've determined it is an issue for your datasets.
You can use the Except method.
Dictionary<string, string> dic1 = new Dictionary<string, string>() { { "rabbit", "hat" }, { "frog", "pond" }, { "cat", "house" } };
Dictionary<string, string> dic2 = new Dictionary<string, string>() { { "rabbit", "hat" }, { "dog", "house"}, {"cat", "garden"}};
var uniqueKeys = dic1.Keys.Except(dic2.Keys);
foreach (var item in uniqueKeys)
{
Console.WriteLine(item);
}
So there are several answers here that will work. But your original question is best addressed in two parts:
Q) When comparing two key-value dictionary sets in C#: set A and set B, what is the best way to enumerate keys present in set A but missing from set B and vice-versa? Using Dictionary<...,...> objects, one can enumerating all values in B and test against set A using A.ContainsKey(key);, ...
If you're starting with two dictionaries, this may be the best approach. Doing anything else would require creating a copy of the keys from both sets, which makes most alternatives more expensive.
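A minimal sketch of the ContainsKey approach described above. The class and method names (KeyDiff, KeysMissingFrom) are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class KeyDiff
{
    // Returns the keys of 'source' that do not appear in 'other'.
    // Each ContainsKey call is a hash lookup, so no copy of either
    // key collection is needed beyond the result list itself.
    public static List<TKey> KeysMissingFrom<TKey, TValue>(
        IDictionary<TKey, TValue> source,
        IDictionary<TKey, TValue> other)
    {
        var result = new List<TKey>();
        foreach (TKey key in source.Keys)
        {
            if (!other.ContainsKey(key))
            {
                result.Add(key);
            }
        }
        return result;
    }
}
```

With the keys from the question (A = { 1, 2, 5 }, B = { 2, 3, 5 }), KeysMissingFrom(A, B) gives the missing key 1 and KeysMissingFrom(B, A) gives the new key 3.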
Q) ...but it feels like there should be a better way that might involve a sorted set?
Yes, this can be done with a sorted list easily enough. Create two lists, kept sorted on insertion using BinarySearch, then walk set 1 while searching set 2, etc.
See this SetList Complement and Subtract operations:
http://csharptest.net/browse/src/Library/Collections/SetList.cs#234
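As a rough sketch of the sorted-list idea, assuming both key lists are already sorted ascending and contain no duplicates: this version uses a merge-style walk over both lists instead of a binary search per element, and it is not the linked SetList code. The names SortedKeyDiff and Diff are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SortedKeyDiff
{
    // Walks two ascending, duplicate-free lists in lockstep and collects
    // the items that occur in only one of them.
    public static void Diff<T>(
        IList<T> sortedA, IList<T> sortedB,
        List<T> onlyInA, List<T> onlyInB) where T : IComparable<T>
    {
        int i = 0, j = 0;
        while (i < sortedA.Count && j < sortedB.Count)
        {
            int cmp = sortedA[i].CompareTo(sortedB[j]);
            if (cmp < 0)
            {
                onlyInA.Add(sortedA[i++]); // smaller item cannot occur later in B
            }
            else if (cmp > 0)
            {
                onlyInB.Add(sortedB[j++]); // smaller item cannot occur later in A
            }
            else
            {
                i++; // present in both lists
                j++;
            }
        }
        // Whatever remains in either list has no counterpart in the other.
        while (i < sortedA.Count) onlyInA.Add(sortedA[i++]);
        while (j < sortedB.Count) onlyInB.Add(sortedB[j++]);
    }
}
```

The walk visits each element once, so it is linear in the combined length, but it only pays off if the keys are kept sorted anyway.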
I do have a HashSet in C# which looks like following:
HashSet<List<int>> _hash = new HashSet<List<int>>();
Now, I have inserted a value into it as below:
_hash.Add(new List<int> {1,7});
When I write the following code after the code above:
_hash.Contains(new List<int>{1,7});
I was expecting it to return true since the same value had just been added, but it returned false, which confused me. Moreover, when I have a hash set of lists, how do I make sure there are no duplicates in it before I add a new value?
I thought the entire reason for using a HashSet is to avoid duplication, but it seems that this one allows it.
Now, to put it into perspective, all I want is this: when I have a List<List<int>>, how can I make sure that each element (List<int>) going into the List<List<int>> is unique?
You can create your own comparable read-only collection.
public class ComparableReadOnlyCollection<T> : ReadOnlyCollection<T>
{
public ComparableReadOnlyCollection(IList<T> list)
: base(list.ToArray())
{
}
public override bool Equals(object other)
{
return
other is IEnumerable<T> otherEnumerable &&
otherEnumerable.SequenceEqual(this);
}
public override int GetHashCode()
{
int hash = 43;
unchecked {
foreach (T item in this) {
hash = 19 * hash + item.GetHashCode();
}
}
return hash;
}
}
Note that ReadOnlyCollection<T> is just a wrapper for the original list. If you modify this list, the ReadOnlyCollection<T> reflects those changes. My implementation copies the original list to an array to make it really immutable.
But be aware that if the elements T are of a reference type, you can still modify members of the original objects! So be careful.
This test works as expected:
var hashSet = new HashSet<ComparableReadOnlyCollection<int>>();
hashSet.Add(new ComparableReadOnlyCollection<int>(new [] { 1, 7 }));
Console.WriteLine(hashSet.Contains(new ComparableReadOnlyCollection<int>(new [] { 1, 7 })));
Console.WriteLine(hashSet.Contains(new ComparableReadOnlyCollection<int>(new [] { 7, 1 })));
Console.WriteLine(hashSet.Contains(new ComparableReadOnlyCollection<int>(new [] { 1, 7, 0 })));
hashSet.Add(new ComparableReadOnlyCollection<int>(new [] { 1, 7 }));
hashSet.Add(new ComparableReadOnlyCollection<int>(new [] { 1, 7, 0 }));
hashSet.Add(new ComparableReadOnlyCollection<int>(new [] { 7, 1 }));
Console.WriteLine(hashSet.Count);
Console.ReadKey();
It prints
True
False
False
3
Note that it does not print 4, as there cannot be duplicates in the set.
2nd solution:
After reading your edit, I am not sure what you really want. Did you mean to create a HashSet<int> instead of a HashSet<List<int>> and to compare the elements of the lists instead of the lists themselves?
HashSet<int> _hash = new HashSet<int>(new List<int> { 1, 1, 2, 3, 5, 8, 13 });
Now the hash set contains the numbers { 1, 2, 3, 5, 8, 13 }. Set elements are always unique.
You can then test
var hash2 = new HashSet<int> { 3, 8 };
if (_hash.IsSupersetOf(hash2)) {
Console.WriteLine("_hash contains 3 and 8");
}
or, what is equivalent:
if (hash2.IsSubsetOf(_hash)) {
Console.WriteLine("_hash contains 3 and 8");
}
3rd solution:
What about a List<HashSet<int>>? Because now, you can apply set operations to each element of the list (which is a hash set).
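A small sketch of that third idea, assuming element order inside each inner collection does not matter. The helper name AddIfNew is hypothetical; HashSet<T>.SetEquals is the built-in set comparison.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueSets
{
    // Adds 'candidate' only if no set already in 'sets' has the same elements.
    // Returns true when the candidate was added.
    public static bool AddIfNew(List<HashSet<int>> sets, IEnumerable<int> candidate)
    {
        foreach (HashSet<int> existing in sets)
        {
            if (existing.SetEquals(candidate))
            {
                return false; // an equal set is already stored
            }
        }
        sets.Add(new HashSet<int>(candidate));
        return true;
    }
}
```

Unlike the sequence-based comparison in the first solution, this treats { 1, 7 } and { 7, 1 } as the same element, and each insertion scans the whole list.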
I have this List<List<int>>:
{{1,2},{1,3},{1,4},{2,3},{2,4},{3,4}}
In this list there are 6 lists, which contain the numbers from 1 to 4, and each number occurs 3 times.
I want to filter it in order to get:
{{1,2},{1,3},{2,4},{3,4}}
Here each number occurs 2 times.
The lists are generated dynamically, and I want to be able to filter them dynamically as well, based on the number of occurrences.
Edit-More Details
I need to count how many times a number is contained in the List<List<int>>; for the above example it is 3. Then I want to exclude lists from the List<List<int>> in order to reduce that count from 3 to 2.
The main issue for me was to find a way to not block my computer :), and also to get each number to appear exactly 2 times (mandatory).
Well, if it's always a combination of 2 numbers, and each number has to appear N times in the list, then depending on N you are going to have:
4 (different digits) x 2 (times they have to appear) = 8 digits = 4 pairs
4 x 3 (times) = 12 digits = 6 pairs
4 x 4 (times) = 16 digits = 8 pairs
That means that from the 6 pairs we must select the 4 pairs that match the criteria.
So, based on basic combinatorics (https://www.khanacademy.org/math/probability/probability-and-combinatorics-topic/permutations/v/permutation-formula),
we have 6!/2! = (6*5*4*3*2*1)/(2*1) = 360 possible permutations.
Basically, there are 360 different ordered ways to put the second list together.
Because the order of the items in the list doesn't matter, the number of possible combinations is 6!/(2!*4!) = 15.
https://www.khanacademy.org/math/probability/probability-and-combinatorics-topic/combinations-combinatorics/v/combination-formula
So there are 15 candidate answers to your question,
which means you only need to loop at most 15 times.
There are only 15 ways to choose 4 items out of a list of 6.
This seems to solve your "killing the machine" problem.
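To double-check the two counts above, here is a small sketch that evaluates both formulas. The class and method names are made up for illustration:

```csharp
using System;

public static class Combinatorics
{
    // Ordered selections of k items out of n: n! / (n - k)!
    public static long Permutations(int n, int k)
    {
        long result = 1;
        for (int i = 0; i < k; i++)
        {
            result *= n - i;
        }
        return result;
    }

    // Unordered selections of k items out of n: n! / (k! * (n - k)!)
    public static long Combinations(int n, int k)
    {
        long result = 1;
        for (int i = 1; i <= k; i++)
        {
            // The running product is always divisible by i, so this stays exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }
}
```

Permutations(6, 4) returns 360 and Combinations(6, 4) returns 15, matching the numbers worked out by hand.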
So the next question is: how do we find all the possible combinations?
Let's enumerate all the possible selections of items that we can pick from the input array,
for example the 1st, 2nd, 3rd and 4th,
i.e. 1,2,3,4 ... 1,2,3,5 ... 1,2,3,6 ...
All the combinations can be generated with the following (from https://stackoverflow.com/a/10629938/444149):
static IEnumerable<IEnumerable<T>> GetKCombs<T>(IEnumerable<T> list, int length) where T : IComparable
{
if (length == 1) return list.Select(t => new T[] { t });
return GetKCombs(list, length - 1)
.SelectMany(t => list.Where(o => o.CompareTo(t.Last()) > 0),
(t1, t2) => t1.Concat(new T[] { t2 }));
}
and invoke it with the following (because there are 6 items to pick from, whose indices are 0, 1, 2, 3, 4 and 5):
var possiblePicks = GetKCombs(new List<int> { 0, 1, 2, 3, 4, 5 }, 4);
We get 15 possible combinations.
So now we try taking 4 elements out of the first list and check whether they match the criteria; if not, we take another combination.
var data = new List<List<int>>
{
new List<int> { 1,2 },
new List<int> { 1,3 },
new List<int> { 1,4 },
new List<int> { 2,3 },
new List<int> { 2,4 },
new List<int> { 3,4 }
};
foreach (var picks in possiblePicks)
{
var listToTest = new List<List<int>>(4);
foreach (var i in picks)
listToTest.Add(data[i]);
var ok = Check(listToTest, 2);
if (ok)
break;
}
private bool Check(List<List<int>> listToTest, int limit)
{
Dictionary<int, int> ret = new Dictionary<int, int>();
foreach (var inputElem in listToTest)
{
foreach (var z in inputElem)
{
var returnCount = ret.ContainsKey(z) ? ret[z] : 0;
if (!ret.ContainsKey(z))
ret.Add(z, returnCount + 1);
else
ret[z]++;
}
}
return ret.All(p => p.Value == limit);
}
I'm sure this can be further optimized to minimize the number of iterations over 'listToTest'.
Also, this is a lazy implementation (IEnumerable), so if the very first (or second) combination happens to be successful, it stops iterating.
I accepted Marty's answer because it fixed my issue. However, when trying to use his method for larger lists, I found myself blocking my computer again, so I started looking for another method and ended up with this one:
var main = new List<HashSet<int>> {
new HashSet<int> {1,2},
new HashSet<int> {1,3},
new HashSet<int> {1,4},
new HashSet<int> {2,3},
new HashSet<int> {2,4},
new HashSet<int> {3,4} };
var items = new HashSet<int>(from l in main from p in l select p); //=>{1,2,3,4}
for (int i =main.Count-1;i-->0; )
{
var occurence=items.Select(a=> main.Where(x => x.Contains(a)).Count()).ToList();
var occurenceSum = 0;
foreach(var j in main[i])
{
occurenceSum += occurence[j - 1];
if (occurenceSum==6) //if both items have occurence=3, then the sum=6, then I can remove that list!
{
main.RemoveAt(i);
}
}
}
How can I find the mode of a list of numbers? I know the logic of it (I think) but I don't know how to implement that logic or convert what my brain thinks into workable code.
This is what I know:
I need to have a loop that goes through the list one time to see how many times a number is repeated and an array to save the times a number is repeated. I also need to tell my program to discard the lesser amount once a larger one is found.
A linq approach, more concise but almost certainly less efficient than Yeldar Kurmangaliyev's:
int FindMode(IEnumerable<int> data)
{
return data
.GroupBy(n => n)
.Select(x => new { x.Key, Count = x.Count() })
.OrderByDescending(a => a.Count)
.First()
.Key;
}
This does not handle the case where data is empty, nor where there are two or more data points with the same frequency in the data set.
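One possible way to cover both gaps, sketched under the assumption that returning every tied value (and an empty list for empty input) is acceptable behaviour. FindModes is a hypothetical name, not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModeFinder
{
    // Returns every value that occurs with the highest frequency.
    // An empty input yields an empty list instead of throwing.
    public static List<int> FindModes(IEnumerable<int> data)
    {
        var groups = data
            .GroupBy(n => n)
            .Select(g => new { g.Key, Count = g.Count() })
            .ToList();
        if (groups.Count == 0)
        {
            return new List<int>();
        }
        int max = groups.Max(g => g.Count);
        return groups
            .Where(g => g.Count == max)
            .Select(g => g.Key)
            .ToList();
    }
}
```

For { 1, 3, 3, 3, 7, 7 } this returns a list containing only 3; for { 1, 1, 2, 2 } it returns both 1 and 2.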
Yes, you are right:
Suppose we have a list of numbers:
List<int> myValues = new List<int>(new int[] { 1, 3, 3, 3, 7, 7 } );
You need to have a loop that goes through the list one time:
foreach (var val in myValues)
{
}
to see how many times a number is repeated, and a dictionary (rather than an array) to save the number of times each number is repeated:
Dictionary<int, int> repetitions = new Dictionary<int, int>();
foreach (var val in myValues)
{
if (repetitions.ContainsKey(val))
repetitions[val]++; // Met it one more time
else
repetitions.Add(val, 1); // Met it once, because it is not in dict.
}
Now your dictionary repetitions stores, for each key, exactly how many times that number is repeated.
Then you need to find the mode record (i.e. the entry with the highest number of repetitions, i.e. the highest value) and take it. LINQ will help us: sort the entries by value and take the last one, or sort them in descending order and take the first one. When there is a single mode, both give the same result, and their performance is comparable.
var modeRecord = repetitions.OrderByDescending(x => x.Value).First();
// or
var modeRecord = repetitions.OrderBy(x => x.Value).Last();
Here it is! Here we have a mode:
List<int> myValues = new List<int>(new int[] { 1, 3, 3, 3, 7, 7 } );
Dictionary<int, int> repetitions = new Dictionary<int, int>();
foreach (var val in myValues)
{
if (repetitions.ContainsKey(val))
repetitions[val]++; // Met it one more time
else
repetitions.Add(val, 1); // Met it once, because it is not in dict.
}
var modeRecord = repetitions.OrderByDescending(x => x.Value).First();
Console.WriteLine("Mode is {0}. It occurs {1} times in the list", modeRecord.Key, modeRecord.Value);
Your mode calculation logic is good. All you need is following your own instructions in a code :)
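Here's an alternative LINQ approach. Note that it tracks the longest run of consecutive equal values, so it yields the mode only when equal values are adjacent (for example, when the input is sorted):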
var values = new int[] { 1, 3, 3, 3, 7, 7 };
var mode =
values
.Aggregate(
new { best = 0, best_length = 0, current = 0, current_length = 0 },
(a, n) =>
{
var current_length = 1 + (a.current == n ? a.current_length : 0);
var is_longer = current_length > a.best_length;
return new
{
best = is_longer ? n : a.best,
best_length = is_longer ? current_length : a.best_length,
current = n,
current_length,
};
}).best;
I have got this structure
var values = new Dictionary<int, List<Guid>>();
And I have to determine whether all dictionary elements have the same set of List<Guid>.
I don't need to know exactly which ones are different, just to answer the question.
So it looks like
List A { 1, 2, 3} List B { 1, 2, 3} List C { 1, 2, 3} the same and have no difference.
and
List A { 3, 2, 3} List B { 1, 2, 3} List C { 1, 2, 3} are not the same.
I have no clue where I can start it.
Initially I thought of converting each List<Guid> to a string and just doing a distinct operation over them.
But is this a good approach?
Thank you!
I'd create a HashSet<Guid> from one of the values (any) and then check that all of the others are equal to it:
// TODO: Handle the dictionary being empty
var firstSet = new HashSet<Guid>(values.First().Value);
var allEqual = values.All(pair => firstSet.SetEquals(pair.Value));
This assumes that:
The order within each list is unimportant
The number of times each GUID appears in the list is unimportant
(i.e. you really are thinking of them as sets, not lists, at least for this part of the code)
In other words, if you have guids A and B, the code above assumes that { A, B, B } is equivalent to { B, A }.
SequenceEqual() might be what you're looking for. Combine it with Enumerable.All() and you can get a boolean answer to whether all elements of your dictionary contain the same lists. For instance:
values.Values.All(list => values.Values.All(list2 => list2.SequenceEqual(list)));
I have two lists which I want to check for corresponding numbers.
for example
List<int> a = new List<int>(){1, 2, 3, 4, 5};
List<int> b = new List<int>() {0, 4, 8, 12};
Should give the result 4.
Is there an easy way to do this without too much looping through the lists?
I'm on 3.0 for the project where I need this so no Linq.
You can use the .net 3.5 .Intersect() extension method:-
List<int> a = new List<int>() { 1, 2, 3, 4, 5 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> common = a.Intersect(b).ToList();
Jeff Richter's excellent PowerCollections has Set with Intersections. Works all the way back to .NET 2.0.
http://www.codeplex.com/PowerCollections
Set<int> set1 = new Set<int>(new[]{1,2,3,4,5});
Set<int> set2 = new Set<int>(new[]{0,4,8,12});
Set<int> set3 = set1.Intersection(set2);
You could do it the way that LINQ does it, effectively - with a set. Now before 3.5 we haven't got a proper set type, so you'd need to use a Dictionary<int,int> or something like that:
Create a Dictionary<int, int> and populate it from list a using the element as both the key and the value for the entry. (The value in the entry really doesn't matter at all.)
Create a new list for the intersections (or write this as an iterator block, whatever).
Iterate through list b, and check with dictionary.ContainsKey: if it does, add an entry to the list or yield it.
That should be O(N+M) (i.e. linear in both list sizes)
Note that that will give you repeated entries if list b contains duplicates. If you wanted to avoid that, you could always change the value of the dictionary entry when you first see it in list b.
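The steps above could look roughly like this. It is a sketch without LINQ, and the class name ListIntersection is hypothetical. The dictionary value is used as a "reported" flag so that duplicates in list b are not returned twice:

```csharp
using System;
using System.Collections.Generic;

public static class ListIntersection
{
    public static List<int> Intersect(IEnumerable<int> a, IEnumerable<int> b)
    {
        // Key: element of list a. Value: false until the element has been
        // reported as part of the intersection, true afterwards.
        var seen = new Dictionary<int, bool>();
        foreach (int x in a)
        {
            seen[x] = false; // the indexer tolerates duplicates in a
        }

        var result = new List<int>();
        foreach (int x in b)
        {
            bool reported;
            if (seen.TryGetValue(x, out reported) && !reported)
            {
                result.Add(x);
                seen[x] = true; // skip this element if b contains it again
            }
        }
        return result;
    }
}
```

Each list is traversed once and every lookup is a hash lookup, which is where the O(N+M) estimate comes from.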
You can sort the second list and loop through the first one and for each value do a binary search on the second one.
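A minimal sketch of this sort-then-search idea, using List<T>.Sort and List<T>.BinarySearch. The class name is hypothetical, and the second list is copied first so the caller's list is not reordered:

```csharp
using System;
using System.Collections.Generic;

public static class BinarySearchIntersection
{
    public static List<int> Intersect(List<int> first, List<int> second)
    {
        // Sort a copy so the caller's list keeps its original order.
        var sorted = new List<int>(second);
        sorted.Sort();

        var result = new List<int>();
        foreach (int x in first)
        {
            // BinarySearch returns a non-negative index when the value is found.
            if (sorted.BinarySearch(x) >= 0)
            {
                result.Add(x);
            }
        }
        return result;
    }
}
```

If the first list contains duplicates, a matching value is reported once per occurrence.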
If both lists are sorted, you can easily do this in O(n) time with a modified merge step from merge sort: simply "remove" (step a counter past) the lower of the two leading numbers; if they are ever equal, save that number to the result list and "remove" both of them. It takes at most n(1) + n(2) steps. This of course assumes the lists are sorted, but sorting integer arrays isn't exactly expensive: O(n log(n)), I think. If you'd like, I can throw together some code for this, but the idea is pretty simple.
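The merge-style walk might be sketched as follows, assuming both inputs are already sorted in ascending order. The names are illustrative only:

```csharp
using System;
using System.Collections.Generic;

public static class SortedIntersection
{
    public static List<int> Intersect(IList<int> sortedA, IList<int> sortedB)
    {
        var result = new List<int>();
        int i = 0, j = 0;
        while (i < sortedA.Count && j < sortedB.Count)
        {
            if (sortedA[i] < sortedB[j])
            {
                i++; // "remove" the lower leading number
            }
            else if (sortedA[i] > sortedB[j])
            {
                j++;
            }
            else
            {
                result.Add(sortedA[i]); // equal: keep it and advance both
                i++;
                j++;
            }
        }
        return result;
    }
}
```

Every iteration advances at least one index, so the loop runs at most sortedA.Count + sortedB.Count times.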
Tested on 3.0
List<int> a = new List<int>() { 1, 2, 3, 4, 5, 12, 13 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> intersection = new List<int>();
Dictionary<int, int> dictionary = new Dictionary<int, int>();
a.ForEach(x => { if(!dictionary.ContainsKey(x))dictionary.Add(x, 0); });
b.ForEach(x => { if(dictionary.ContainsKey(x)) dictionary[x]++; });
foreach(var item in dictionary)
{
if(item.Value > 0)
intersection.Add(item.Key);
}
In a comment on the question, the author said that there will be "Max 15 in the first list and 20 in the second list".
In this case I wouldn't bother with optimizations and would just use List.Contains.
For larger lists, a hash can be used to take advantage of O(1) lookup, which leads to an O(N+M) algorithm, as Jon noted.
A hash requires additional space. To reduce memory usage, we should hash the shortest list.
List<int> a = new List<int>() { 1, 2, 3, 4, 5 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
List<int> shortestList;
List<int> longestList;
if (a.Count > b.Count)
{
shortestList = b;
longestList = a;
}
else
{
shortestList = a;
longestList = b;
}
Dictionary<int, bool> dict = new Dictionary<int, bool>();
shortestList.ForEach(x => dict[x] = true); // the indexer tolerates duplicate values, whereas Add would throw
foreach (int i in longestList)
{
if (dict.ContainsKey(i))
{
Console.WriteLine(i);
}
}
var c = a.Intersect(b);
This only works in 3.5. I just saw your requirement, my apologies.
The method recommended by ocdecio is a good one if you're going to implement it from scratch. Comparing its time complexity with that of the naive method, we see:
Sort/binary search method:
T ~= O(n log n) + O(n) * O(log n) ~= O(n log n)
Looping through both lists (naive method):
T ~= O(n) * O(n) ~= O(n ^ 2)
There may be a quicker method, but I am not aware of it. Hopefully that should justify choosing his method.
(Previous answer - changed IndexOf to Contains, as IndexOf casts to an array first)
Seeing as it's two small lists the code below should be fine. Not sure if there's a library with an intersection method like Java has (although List isn't a set so it wouldn't work), I know as someone pointed out the PowerCollection library has one.
List<int> a = new List<int>() {1, 2, 3, 4, 5};
List<int> b = new List<int>() {0, 4, 8, 12};
List<int> result = new List<int>();
for (int i=0;i < a.Count;i++)
{
if (b.Contains(a[i]))
result.Add(a[i]);
}
foreach (int i in result)
Console.WriteLine(i);
Update 2: HashSet was a dumb answer as it's 3.5 not 3.0
Update: HashSet seems like the obvious answer:
// Method 2 - HashSet from System.Core
HashSet<int> aSet = new HashSet<int>(a);
HashSet<int> bSet = new HashSet<int>(b);
aSet.IntersectWith(bSet);
foreach (int i in aSet)
Console.WriteLine(i);
Here is a method that removes duplicate strings. Change this to accommodate int and it will work fine.
public List<string> removeDuplicates(List<string> inputList)
{
Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
List<string> finalList = new List<string>();
foreach (string currValue in inputList)
{
if (!uniqueStore.ContainsKey(currValue))
{
uniqueStore.Add(currValue, 0);
finalList.Add(currValue);
}
}
return finalList;
}
Update: Sorry, I am actually combining the lists and then removing duplicates. I am passing the combined list to this method. Not exactly what you are looking for.
Wow. The answers thus far look very complicated. Why not just use :
List<int> a = new List<int>() { 1, 2, 3, 4, 5, 12, 13 };
List<int> b = new List<int>() { 0, 4, 8, 12 };
...
public List<int> Dups(List<int> a, List<int> b)
{
List<int> ret = new List<int>();
foreach (int x in b)
{
if (a.Contains(x))
{
ret.Add(x);
}
}
return ret;
}
This seems much more straight-forward to me... unless I've missed part of the question. Which is entirely possible.