Suppose I have two collections as follows:
Collection1:
"A1"
"A1"
"M1"
"M2"
Collection2:
"M2"
"M3"
"M1"
"A1"
"A1"
"A2"
All the values are strings. I want to know whether all the elements in Collection1 are contained in Collection2, but I have no guarantee on the order, and a collection may have multiple entries with the same value. In this case, Collection2 does contain Collection1 because Collection2 has two A1's, an M1 and an M2. There's the obvious way: sorting both collections and popping off values as I find matches, but I was wondering if there's a faster, more efficient way to do this. Again, with the initial collections I have no guarantee on the order or on how many times a given value will appear.
EDIT: Changed set to collection just to clear up that these aren't sets as they can contain duplicate values
The most concise way I know of (it only checks membership, so it does not account for how many times a value is repeated):
//determine if Set2 contains all of the elements in Set1
bool containsAll = Set1.All(s => Set2.Contains(s));
Yes, there is a faster way, provided you're not space-constrained. (See space/time tradeoff.)
The algorithm:
Just insert all the elements of Set2 into a hash table (in .NET 3.5, that's a HashSet<string>), then go through the elements of Set1 and check whether each one is in the hash table. This method is faster (Θ(m + n) expected time, where m and n are the sizes of the two collections), but uses O(n) extra space for the hash table built from Set2.
Alternatively, just say:
bool isSuperset = new HashSet<string>(set2).IsSupersetOf(set1);
Edit 1:
For those people concerned about the possibility of duplicates (and hence the misnomer "set"), the idea can easily be extended:
Just make a new Dictionary<string, int> representing the count of each word in the super-list (add one to the count each time you see an instance of an existing word, and add the word with a count of 1 if it's not in the dictionary), and then go through the sub-list and decrement the count each time. If every word exists in the dictionary and the count is never zero when you try to decrement it, then the subset is in fact a sub-list; otherwise, you had too many instances of a word (or it didn't exist at all), so it's not a real sub-list.
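The counting idea described above could be sketched roughly as follows. This is only an illustration: the helper name ContainsAllWithCounts and its parameter names are made up here, and the example is written as C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (name is an assumption, not from the answer):
// true when every value in sub occurs in super at least as often as in sub.
static bool ContainsAllWithCounts(IEnumerable<string> super, IEnumerable<string> sub)
{
    // Count how often each value occurs in the super-list.
    var counts = new Dictionary<string, int>();
    foreach (string s in super)
    {
        counts.TryGetValue(s, out int seen);   // seen stays 0 if s is new
        counts[s] = seen + 1;
    }

    // Use up one occurrence for every element of the sub-list.
    foreach (string s in sub)
    {
        if (!counts.TryGetValue(s, out int left) || left == 0)
            return false;                      // missing, or already used up
        counts[s] = left - 1;
    }
    return true;
}

var collection1 = new List<string> { "A1", "A1", "M1", "M2" };
var collection2 = new List<string> { "M2", "M3", "M1", "A1", "A1", "A2" };

Console.WriteLine(ContainsAllWithCounts(collection2, collection1)); // True
Console.WriteLine(ContainsAllWithCounts(collection1, collection2)); // False
```

Both passes are linear, so the expected running time stays proportional to the combined size of the two collections.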
Edit 2:
If the strings are very big and you're concerned about space efficiency, and an algorithm that works with (very) high probability works for you, then try storing a hash of each string instead. It's technically not guaranteed to work, but the probability of it not working is pretty darn low.
The problem I see with the HashSet, Intersect, and other Set theory answers is that you do contain duplicates, and "A set is a collection that contains no duplicate elements". Here's a way to handle the duplicate cases.
var list1 = new List<string> { "A1", "A1", "M1", "M2" };
var list2 = new List<string> { "M2", "M3", "M1", "A1", "A1", "A2" };
// Remove returns true if it was able to remove it, and it won't be there to be matched again if there's a duplicate in list1
bool areAllPresent = list1.All(i => list2.Remove(i));
EDIT: I renamed from Set1 and Set2 to list1 and list2 to appease Mehrdad.
EDIT 2: The comment implies it, but I wanted to explicitly state that this does alter list2. Only do it this way if you're using it as a comparison or control but don't need the contents afterwards.
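If list2 is still needed afterwards, one possible variation is to run the same check against a throwaway copy. This is a sketch only; the variable name scratch is an assumption, and the example uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list1 = new List<string> { "A1", "A1", "M1", "M2" };
var list2 = new List<string> { "M2", "M3", "M1", "A1", "A1", "A2" };

// Work on a copy so that list2 keeps its contents.
var scratch = new List<string>(list2);

// Remove returns false as soon as an item of list1 cannot be found any more.
bool areAllPresent = list1.All(item => scratch.Remove(item));

Console.WriteLine(areAllPresent); // True
Console.WriteLine(list2.Count);   // 6
Console.WriteLine(scratch.Count); // 2
```

Keep in mind that List<T>.Remove scans the list linearly, so this stays quadratic in the worst case; the copy only protects the original.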
Check out LINQ's Intersect (note that it returns each common value only once):
string[] set1 = {"A1", "A1", "M1", "M2" };
string[] set2 = { "M2", "M3", "M1", "A1", "A1", "A2" };
var matching = set1.Intersect(set2);
foreach (string x in matching)
{
    Console.WriteLine(x);
}
A similar one, which yields one bool per element of set1:
string[] set1 = new string[] { "a1","a2","a3","a4","a5","aa","ab" };
string[] set2 = new string[] {"m1","m2","a4","a6","a1" };
var a = set1.Select(set => set2.Contains(set));
I have two lists of strings. I want to compare each element in one list with the other, and if at least one of them matches, do some processing; otherwise do nothing.
I don't know how to do this. I have the following lists, and the code I used was SequenceEqual, but my lead said it's wrong because it only checks whether the two lists are equal. I couldn't disagree, and I want to achieve the functionality described above. Please help. As you can see, order doesn't matter: here 123 is in both lists but at different positions, so it matches and some processing should be done.
List<string> list1 = new List<string> () { "123", "234" };
List<string> list2 = new List<string> () { "333", "234" , "123"};
You can use the Any method for this:
var matchFound = list1.Any(x => list2.Contains(x));
Now you can put a conditional block on matchFound; if it is true, you can do whatever processing is required.
If you want a case-insensitive comparison, you will need to use String.Equals, where you can specify that case does not matter when comparing.
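A case-insensitive version might look something like this. The sample values are made up for illustration, and OrdinalIgnoreCase is just one reasonable choice of comparison; a culture-aware one may suit your data better.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data chosen for illustration only.
var list1 = new List<string> { "abc", "234" };
var list2 = new List<string> { "333", "ABC" };

// The default comparison is case-sensitive, so "abc" and "ABC" do not match.
bool caseSensitive = list1.Any(x => list2.Contains(x));

// Variant 1: String.Equals with an explicit StringComparison.
bool viaEquals = list1.Any(x =>
    list2.Any(y => string.Equals(x, y, StringComparison.OrdinalIgnoreCase)));

// Variant 2: the LINQ Contains overload that takes an IEqualityComparer<string>.
bool viaComparer = list1.Any(x => list2.Contains(x, StringComparer.OrdinalIgnoreCase));

Console.WriteLine(caseSensitive); // False
Console.WriteLine(viaEquals);     // True
Console.WriteLine(viaComparer);   // True
```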
You can use Intersect to find common elements:
var intersecting = list1.Intersect(list2);
If you just want to know if there are common elements append .Any():
bool atLeastOneCommonElement = intersecting.Any();
If you want to process them:
foreach (var commonElement in intersecting)
{
    // do something ...
}
You could check with Intersect and Any
var matchFound = list1.Intersect(list2).Any();
For example,
List<string> list1 = new List<string>{ "123", "234" };
List<string> list2 = new List<string>{ "333", "234" , "123"};
var result = list1.Intersect(list2).Any();
Output True
List<string> list3 = new List<string>{"5656","8989"};
result = list1.Intersect(list3).Any();
Output False
You need to take all the items that match in both lists and then run your code for each match, like this:
foreach (var item in list1.Where(x => list2.Contains(x)))
{
    // do some processing here
    Console.WriteLine($"Match found: {item}");
}
In the code above, the foreach body runs once for each item of list1 that is also present in list2.
Output (for the lists in the question):
Match found: 123
Match found: 234
Use LINQ to find the matches, and then check whether the result contains any elements:
var intersect = list1.Where(el1 => list2.Any(el2 => el2 == el1));
var isMatch = intersect.Any();
I'm trying to create a Unit Test that compares two lists of string arrays.
I tried creating two of the exact same List<string[]> objects, but when I use CollectionAssert.AreEqual(expected, actual);, the test fails:
[TestMethod]
public void TestList()
{
    List<string[]> expected = new List<string[]> {
        new string[] { "John", "Smith", "200" },
        new string[] { "John", "Doe", "-100" }
    };
    List<string[]> actual = new List<string[]> {
        new string[] { "John", "Smith", "200" },
        new string[] { "John", "Doe", "-100" }
    };
    CollectionAssert.AreEqual(expected, actual);
}
I've also tried Assert.IsTrue(expected.SequenceEqual(actual));, but that fails as well.
Both these methods work if I am comparing two Lists of strings or two arrays of strings, but they do not work when comparing two Lists of arrays of strings.
I'm assuming these methods are failing because they are comparing two Lists of object references instead of the array string values.
How can I compare the two List<string[]> objects and tell if they are really the same?
It is failing because the items in your list are objects (string[]), and since you did not specify how CollectionAssert.AreEqual should compare the elements in the two sequences, it falls back to the default behavior, which is to compare references. If you were to change your lists to the following, for example, you would find that the test passes, because now both lists reference the same arrays:
var first = new string[] { "John", "Smith", "200" };
var second = new string[] { "John", "Smith", "200" };
List<string[]> expected = new List<string[]> { first, second};
List<string[]> actual = new List<string[]> { first, second};
To avoid referential comparisons, you need to tell CollectionAssert.AreEqual how to compare the elements. You can do that by passing in an IComparer when you call it:
CollectionAssert.AreEqual(expected, actual, StructuralComparisons.StructuralComparer);
CollectionAssert.AreEqual(expected, actual); fails, because it compares object references. expected and actual refer to different objects.
Assert.IsTrue(expected.SequenceEqual(actual)); fails for the same reason. This time the contents of expected and actual are compared, but the elements are themselves different array references.
Maybe try to flatten both sequences using SelectMany:
var expectedSequence = expected.SelectMany(x => x).ToList();
var actualSequence = actual.SelectMany(x => x).ToList();
CollectionAssert.AreEqual(expectedSequence, actualSequence);
As Enigmativity correctly noticed in his comment, SelectMany may give a positive result when the number of arrays and/or their elements are different, but flattening the lists will result in an equal number of elements. It is safe only in the case when you always have the same number of arrays and elements in these arrays.
The best solution would be to check both the items in each sub-collection as well as the number of items in each respective sub-collection.
Try with this:
bool equals = expected.Count == actual.Count &&
    Enumerable.Range(0, expected.Count).All(i => expected[i].Length == actual[i].Length &&
        expected[i].SequenceEqual(actual[i]));
Assert.IsTrue(equals);
This will check that:
Both the lists have the same length
All the pair of sub-collections in both lists have the same length
The items in each pair of sub-collections are the same
Note: using SelectMany isn't a good idea, since it can produce a false positive if the second list has the same items but spread across different sub-collections. That is, it would consider the two lists equal even if the second one had the same items all in a single sub-collection.
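To make that false positive concrete, here is a small sketch. The data is adapted from the question, with the second list deliberately packed into one array; the Zip-based check is just one way to compare pair by pair and is not taken from the answers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var expected = new List<string[]> {
    new[] { "John", "Smith", "200" },
    new[] { "John", "Doe", "-100" }
};
// Same six strings, but all packed into a single sub-array.
var actual = new List<string[]> {
    new[] { "John", "Smith", "200", "John", "Doe", "-100" }
};

// Flattening loses the sub-array boundaries, so the sequences look equal.
bool flattenedEqual = expected.SelectMany(x => x)
    .SequenceEqual(actual.SelectMany(x => x));

// Comparing list length first and then each pair of sub-arrays catches it.
bool structuralEqual = expected.Count == actual.Count &&
    expected.Zip(actual, (e, a) => e.SequenceEqual(a)).All(ok => ok);

Console.WriteLine(flattenedEqual);  // True
Console.WriteLine(structuralEqual); // False
```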
var list1 = new[] { 1, 2, 3, 4, 5 };
var list2 = new int[] { };
var x = list2.Except(list1).ToList();
This doesn't return all the elements from list1; x is actually an empty list. According to MSDN, it should return all the elements in list1. Why doesn't it, and what am I missing?
A sequence that contains the set difference of the elements of two sequences.
It works just fine: it returns all items from list2 which do not exist in list1. And because list2 is already empty, the result list is empty as well.
Look at the description of the first parameter on MSDN:
An IEnumerable<T> whose elements that are not also in second will be returned.
The wording might be confusing on MSDN, but refer to set theory to understand it.
A sequence that contains the set difference of the elements of two sequences.
A set difference is also called a relative complement, and is "the set of elements in B but not in A".
With that in mind, an empty array .Except anything is still an empty array.
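A quick way to see that the direction matters is to run Except both ways round. This is a minimal sketch using the arrays from the question; the result variable names are made up.

```csharp
using System;
using System.Linq;

var list1 = new[] { 1, 2, 3, 4, 5 };
var list2 = new int[] { };

// Elements of list2 that are not in list1: list2 is empty, so nothing can come out.
var fromEmpty = list2.Except(list1).ToList();

// Elements of list1 that are not in list2: nothing is excluded.
var fromFull = list1.Except(list2).ToList();

Console.WriteLine(fromEmpty.Count);            // 0
Console.WriteLine(string.Join(",", fromFull)); // 1,2,3,4,5
```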
You probably want
var list1 = new[] { 1, 2, 3, 4, 5 };
var list2 = new int[] { };
var x = list2.Union(list1).ToList();
Since you mentioned that you are expecting
should return all the elements in list1
Hi there. I was hoping I could ask for some advice in regards to a problem I am struggling with.
I have a List with more than a thousand values, and there are some duplicates: not exact duplicates, but discrepancies based on upper and lower case.
So, for example, I would have
Training and training in the same list, or
Vision and Values and Vision and values.
So there are various instances where there are minor discrepancies based on case differences.
How could I go about removing these 'excess' values?
Use LINQ:
var listWithDups = new List<string> { "blah", "Blah", "etc", "etc." };
var listWithoutDups = listWithDups.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
Tried this in LinqPad:
var list = new List<String> { "Hello", "World", "HELLO", "beautiful", "WORLD" };
var l = list.Distinct(StringComparer.CurrentCultureIgnoreCase).ToList();
Console.WriteLine(l);
I would add all entries to a HashSet.
A HashSet is a collection that stores at most one of every item added to it.
You'd pass an "ignore case" equality comparer into the HashSet constructor; StringComparer already provides one, so there is no need to write your own.
Like:
var set = new HashSet<string>(yourListWithDuplicates, StringComparer.CurrentCultureIgnoreCase);
I have two string arrays
string[] a = ...
string[] b = ...
I want to remove any items from a that also exist in b or return a new array with only those items that exist only in a.
So, for example, if
a={"a", "b", "c"};
and,
b={"b"}
then the result should be
{"a", "c"}
Is there a neat lambda expression or Linq or something I can use to do this?
Thanks,
Sachin
I believe Except will do what you want. Remember, Except, like most LINQ Extension methods, will not modify the existing collection. It will return a new collection.
string[] c = a.Except(b).ToArray();
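One thing worth knowing is that Except treats its inputs as sets, so repeated values in a come out only once. If repeats should survive, filtering with Where is a possible alternative. The sketch below adds a duplicate "a" to the question's data to show the difference; the variable names viaExcept, viaWhere and exclude are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] a = { "a", "b", "c", "a" };
string[] b = { "b" };

// Except yields distinct values only: the repeated "a" appears once.
string[] viaExcept = a.Except(b).ToArray();

// Filtering against a HashSet keeps repeats and the original order.
var exclude = new HashSet<string>(b);
string[] viaWhere = a.Where(x => !exclude.Contains(x)).ToArray();

Console.WriteLine(string.Join(",", viaExcept)); // a,c
Console.WriteLine(string.Join(",", viaWhere));  // a,c,a
Console.WriteLine(a.Length);                    // 4 (a itself is unchanged)
```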