How to check whether the contents of more than two collections are the same - C#

I have a List. For valid reasons, I duplicate the List many times and use it for different purposes. At some point I need to check whether the contents of all these collections are the same.
I know how to do this, but being a fan of "shorthand" coding (LINQ...), I would like to know if I can check this EFFICIENTLY in the fewest lines of code.
List<string> original = new List<string>(), duplicate1 = new List<string>(),
duplicate2 = new List<string>(), duplicate3 = new List<string>(),
duplicate4 = new List<string>();
//...some code.....
bool isEqual = duplicate4.SequenceEqual(duplicate3)
&& duplicate3.SequenceEqual(duplicate2)
&& duplicate2.SequenceEqual(duplicate1)
&& duplicate1.SequenceEqual(original);//can we do it better than this
UPDATE
CodeInChaos pointed out certain scenarios I hadn't thought of (duplicates and the order of the list). Though SequenceEqual will take care of duplicates, the order of the list can be a problem, so I am changing the code as follows. I need to copy the lists for this, because Remove mutates them.
List<List<string>> copy = new List<List<string>> { duplicate1, duplicate2,
duplicate3, duplicate4 };
bool isEqual = (original.All(x => (copy.All(y => y.Remove(x))))
&& copy.All(n => n.Count == 0));
UPDATE2
Thanks to Eric: using a HashSet can be very efficient, as follows. This won't cover duplicates though.
List<HashSet<string>> copy2 = new List<HashSet<string>>{new HashSet<string>(duplicate1),
new HashSet<string>(duplicate2),
new HashSet<string>(duplicate3),
new HashSet<string>(duplicate4)};
HashSet<string> originalHashSet = new HashSet<string>(original);
bool eq = copy2.All(x => originalHashSet.SetEquals(x));
UPDATE3
Thanks to Eric: the original code in this post with SequenceEqual will work with sorting. As SequenceEqual considers the order of the collections, they need to be sorted before calling SequenceEqual. I guess this is not much of a problem, as sorting is pretty fast (n log n).
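As a rough sketch of the sort-then-compare idea (not code from the original post): the helper below sorts copies with an ordinal comparer and then calls SequenceEqual, so both order and the caller's lists are left alone while duplicates still count. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedComparison
{
    // Hypothetical helper: true when every list in 'others' holds the same
    // items as 'original', with the same multiplicities, in any order.
    public static bool AllSameContents(List<string> original, params List<string>[] others)
    {
        // OrderBy produces a sorted copy, so the input lists are not modified.
        List<string> sortedOriginal = original.OrderBy(s => s, StringComparer.Ordinal).ToList();

        return others.All(other =>
            other.Count == sortedOriginal.Count &&   // cheap early exit
            other.OrderBy(s => s, StringComparer.Ordinal).SequenceEqual(sortedOriginal));
    }
}
```

A call would look like `SortedComparison.AllSameContents(original, duplicate1, duplicate2, duplicate3, duplicate4)`. Each comparison costs one sort, so this is O(k * n log n) for k lists of n items.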
UPDATE4
As per Brian's suggestion, I can use a lookup for this.
var originallkup = original.ToLookup(i => i);
var lookuplist = new List<ILookup<string, string>>
{ duplicate4.ToLookup(i=> i),
duplicate3.ToLookup(i=> i),
duplicate2.ToLookup(i=> i),
duplicate1.ToLookup(i=> i)
};
bool isequal = (lookuplist.Sum(x => x.Count) == (originallkup.Count * 4)) &&
(originallkup.All(x => lookuplist.All(i => i[x.Key].Count() == x.Count())));
Thank you all for your responses.

I have a List. I duplicate the List many times and use it for different purposes. At some point I need to check if the contents of all these collections are same.
A commenter then asks:
Is the order important? Or just the content?
And you respond:
only the content is important
In that case you are using the wrong data structure in the first place. Use a HashSet<T>, not a List<T>, to represent an unordered collection of items that must be cheaply compared for set equality.
Once you have everything in hash sets instead of lists, you can simply use their SetEquals method to see if any pair of sets is unequal.
Alternatively: keep everything in lists, until the point where you want to compare for equality. Initialize a hash set from one of the lists, and then use SetEquals to compare that hash set to every other list.
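The "keep the lists, compare through one hash set" alternative could look roughly like this. It is only a sketch: the class and method names are invented here, and it deliberately ignores duplicates and order, which is what set equality means.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SetComparison
{
    // Hypothetical helper: builds one HashSet from 'original' and checks
    // every other list against it with SetEquals.
    public static bool AllSetEqual(List<string> original, params List<string>[] others)
    {
        var originalSet = new HashSet<string>(original);

        // SetEquals accepts any IEnumerable<T>, so the other lists
        // do not have to be converted to hash sets first.
        return others.All(other => originalSet.SetEquals(other));
    }
}
```

Note that `{ "a", "a", "b" }` and `{ "b", "a" }` are reported as equal by this check, so it only fits when duplicates really do not matter.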

I honestly can't think of a more efficient solution, but as for reducing the number of lines of code, give this a bash:
var allLists = new List<List<string>>() { original, duplicate1, duplicate2, duplicate3, duplicate4 };
bool allEqual = allLists.All(l => l.SequenceEqual(original));
Or, use the Any operator - might be better in terms of performance.
bool allEqual = !allLists.Any(l => !l.SequenceEqual(original));
EDIT: Confirmed, Any will stop enumerating the source once it determines a value. Thank you MSDN.
EDIT # 2: I have been looking into the performance of SequenceEqual. This guy has a nice post comparing SequenceEqual to a more imperative function. I modified his example to work with List<string> and my findings match his. It would appear that as far as performance is concerned, SequenceEqual isn't high on the list of preferred methods.

You can use reflection to create a generic comparer and always use it. Look at this thread, which has a lot of code that can help you: Comparing two collections for equality irrespective of the order of items in them

Related

Using Where and ForEach to modify specific elements in a list

If I have a list of objects that have the properties fruitName and numFruits and I want to pluralize the fruitName where numFruits is greater than 1, is it possible to do that in a single statement by chaining together Where and Foreach?
Something like:
fruitList.Where(fl => fl.numFruits > 1).ForEach(fl => fl.fruitName = fl.fruitName + "s");
I tried the above and it doesn't work. It complains that System.Collections.Generic.IEnumerable doesn't contain a definition for ForEach.
Typically you want to use foreach the language construct when possible. Eric Lippert has a blog post going into additional detail as to why.
Loops are good when you are doing modifications as it makes finding those modifications easier.
foreach (var fl in fruitList.Where(fl => fl.numFruits > 1))
{
fl.fruitName = fl.fruitName + "s";
}
Is more straightforward and accomplishes the same task.
If you really want a one-liner (it will be harder to maintain) and want to keep the original list intact but only modify some of the elements, you'll have to use a full anonymous function. If you need multiple statements (a block of code), you'll need to include the braces and statement-terminating semicolons like below:
fruitList.ForEach(fl => { fl.fruitName = fl.numFruits > 1 ? fl.fruitName + "s" : fl.fruitName; });
This works on the original list (no subset) and does basically the exact same thing a structured foreach would do.
There's a good blog post by Eric Lippert on why there is no “ForEach” sequence operator extension method, essentially the reason is:
The first reason is that doing so violates the functional programming
principles that all the other sequence operators are based upon.
Clearly the sole purpose of a call to this method is to cause side
effects. The purpose of an expression is to compute a value, not to
cause a side effect. The purpose of a statement is to cause a side
effect. The call site of this thing would look an awful lot like an
expression (though, admittedly, since the method is void-returning,
the expression could only be used in a “statement expression”
context.) It does not sit well with me to make the one and only
sequence operator that is only useful for its side effects.
If you wanted to do this in a single statement you could use a .Select()
var newFruitList = fruitList.Where(fl => fl.numFruits > 1).Select(fl => fl.fruitName + "s");
As @Tim Schmelter suggested, you can use ToList() to convert to a list and then use the ForEach method on the result. Although ToList() may return a shorter list because of the filter, the original objects themselves are changed, and fruitList keeps all of its elements.
fruitList.Where(fl => fl.numFruits > 1).ToList().ForEach(fl => fl.fruitName = fl.fruitName + "s");
// fruitList still has all elements
You can use the static Array.ForEach method to update the list.
Array.ForEach(fruitList.Where(fl => fl.numFruits > 1).ToArray(), x => { x.fruitName += "s"; });
Given that "append an s" doesn't actually give you the correct answer for many fruits, any approach that does that will give you an incorrect answer, no matter how well it does it.
Consider using a lookup table to map singulars to plurals (and vice versa) instead:
using System;
using System.Collections.Generic;
using System.Linq;
public class Test
{
private static Dictionary<string, string> fruitLookup =
new Dictionary<string, string>
{
{"blueberry", "blueberries"},
{"peach", "peaches"},
{"apple", "apples"}
};
public static void Main()
{
var fruitList = new List<string> {"blueberry", "peach", "apple"};
// Here is your one-line conversion:
var plurals = fruitList.Select(f => fruitLookup[f]).ToList();
foreach (var p in plurals)
{
Console.WriteLine(p);
}
}
}

A better way to loop through lists

So I have a couple of different lists that I'm trying to process and merge into 1 list.
Below is a snippet of code for which I want to see if there is a better way.
The reason why I'm asking is that some of these lists are rather large. I want to see if there is a more efficient way of doing this.
As you can see, I'm looping through a list, and the first thing I do is check whether the CompanyId exists in the list. If it does, I find the item in the list that I'm going to process.
pList is my processing list. I'm adding the values from my different lists into this list.
I'm wondering if there is a "better way" of accomplishing the Exists and Find.
bool tstFind = false;
foreach (parseAC item in pACList)
{
tstFind = pList.Exists(x => (x.CompanyId == item.key.ToString()));
if (tstFind == true)
{
pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
//Processing done here. pItem gets updated here
...
}
}
Just as a side note, I'm going to be researching a way to use joins to see if that is faster. But I haven't gotten there yet. The above code is my first cut at solving this issue and it appears to work. However, since I have the time I want to see if there is a better way still.
Any input is greatly appreciated.
Time Findings:
My current Find and Exists code takes about 84 minutes to loop through the 5.5M items in the pACList.
Using pList.FirstOrDefault(x => x.CompanyId == item.key.ToString()); takes 54 minutes to loop through the 5.5M items in the pACList.
You can retrieve item with FirstOrDefault instead of searching for item two times (first time to define if item exists, and second time to get existing item):
var tstFind = pList.FirstOrDefault(x => x.CompanyId == item.key.ToString());
if (tstFind != null)
{
//Processing done here. pItem gets updated here
}
Yes, use a hash table so that your algorithm is O(n + m) instead of the O(n*m) it is right now.
var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);
foreach (parseAC item in pACList)
{
if (pListByCompanyId.ContainsKey(item.key.ToString()))
{
pItem = pListByCompanyId[item.key.ToString()];
//Processing done here. pItem gets updated here
...
}
}
You can iterate through a filtered list using LINQ:
foreach (parseAC item in pACList.Where(i=>pList.Any(x => (x.CompanyId == i.key.ToString()))))
{
pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
//Processing done here. pItem gets updated here
...
}
Using lists for this type of operation is O(MxN) (M is the count of pACList, N is the count of pList). Additionally, you are searching pList twice for every item: once with Exists and once with Find. To avoid that, use pList.FirstOrDefault as recommended by @lazyberezovsky.
However, if possible I would avoid using lists. A Dictionary indexed by the key you're searching on would greatly improve the lookup time.
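A minimal sketch of the dictionary approach, using TryGetValue so that each item costs a single hash lookup. The `Company` and `AcRecord` types and the `Hits` counter are stand-ins invented for this example (the real types are not shown in the question), and it assumes CompanyId is unique and non-null, since ToDictionary throws otherwise.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Company   // hypothetical stand-in for the pList item type
{
    public string CompanyId { get; set; } = "";
    public int Hits { get; set; }   // placeholder for "processing done here"
}

public class AcRecord  // hypothetical stand-in for parseAC
{
    public int Key { get; set; }
}

public static class DictionaryMerge
{
    public static void Process(List<Company> pList, List<AcRecord> pACList)
    {
        // Build the index once: O(N). Throws if two items share a CompanyId.
        Dictionary<string, Company> byCompanyId = pList.ToDictionary(x => x.CompanyId);

        foreach (AcRecord item in pACList)   // O(M) lookups overall
        {
            Company match;
            if (byCompanyId.TryGetValue(item.Key.ToString(), out match))
            {
                match.Hits++;   // update the matched item in place
            }
        }
    }
}
```

Because the dictionary holds references to the same objects as pList, updating `match` updates the item in the list as well.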
Doing a linear search on the list for each item in another list is not efficient for large data sets. What is preferable is to put the keys into a Table or Dictionary that can be much more efficiently searched to allow you to join the two tables. You don't even need to code this yourself, what you want is a Join operation. You want to get all of the pairs of items from each sequence that each map to the same key.
Either pull out the implementation of the method below, or change Foo and Bar to the appropriate types and use it as a method.
public static IEnumerable<Tuple<Bar, Foo>> Merge(IEnumerable<Bar> pACList
, IEnumerable<Foo> pList)
{
return pACList.Join(pList, item => item.Key.ToString()
, item => item.CompanyID.ToString()
, (a, b) => Tuple.Create(a, b));
}
You can use the results of this call to merge the two items together, as they will have the same key.
Internally the method will create a lookup table that allows for efficient searching before actually doing the searching.
Convert pList to a HashSet, then query pHashSet.Contains(). Complexity: O(N) + O(n).
Sort pList on CompanyId and use Array.BinarySearch(). Complexity: O(N log N) + O(n log N).
If the maximum company id is not prohibitively large, simply create an array in which the item with company id i sits at the i-th position. Lookup is then a plain array index, which is hard to beat.
Here N is the size of pList and n is the size of pACList.

How to compare two sorted large lists efficiently in C#?

I have got two generic lists with 20,000 and 30,000 objects in each list.
class Employee
{
string name;
double salary;
}
List<Employee> newEmployeeList = new List<Employee>() {....} // contains 20,000 objects
List<Employee> oldEmployeeList = new List<Employee>() {....} // contains 30,000 objects
Lists can also be sorted by name if it improves the speed.
I want to compare these two lists to find:
employees whose name and salary both match
employees whose name matches but whose salary does not
What is the fastest way to compare such large data lists with above conditions?
I would sort both newEmployeeList and oldEmployeeList lists by name - O(n*log(n)). And then you can use linear algorithm to search for matches. So the total would be O(n+n*log(n)) if both lists are about the same size. This should be faster than O(n^2) "brute force" algorithm.
I'd probably recommend the two lists be stored in a Dictionary<string, Employee> based on the name to begin with, then you can iterate over the keys in one and lookup to see if they exist and the salaries match in the other. This would also save the cost of sorting them later or putting them in a more efficient structure.
This is pretty much O(n): linear to build both dictionaries, and linear to go through the keys of one and look them up in the other, since O(n + m + n) reduces to O(n) when the lists are of similar size.
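A rough sketch of the dictionary idea follows. It indexes only the old list and streams the new one, which is a small variation on building both dictionaries. The `Employee` shape with public properties and the `EmployeeComparison` class are assumptions made for this example, and names are assumed to be unique within the old list.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee   // assumed shape; the question's fields are private
{
    public string Name { get; set; } = "";
    public double Salary { get; set; }
}

public static class EmployeeComparison
{
    public static void Compare(
        IEnumerable<Employee> newEmployees,
        IEnumerable<Employee> oldEmployees,
        out List<Employee> sameSalary,
        out List<Employee> differentSalary)
    {
        sameSalary = new List<Employee>();
        differentSalary = new List<Employee>();

        // One pass to index the old list by name (throws on duplicate names).
        Dictionary<string, Employee> oldByName = oldEmployees.ToDictionary(e => e.Name);

        // One pass over the new list, with an O(1) expected lookup per item.
        foreach (Employee current in newEmployees)
        {
            Employee previous;
            if (!oldByName.TryGetValue(current.Name, out previous)) continue;

            if (previous.Salary == current.Salary) sameSalary.Add(current);
            else differentSalary.Add(current);
        }
    }
}
```

Employees that appear in only one of the lists end up in neither result, which matches the two categories asked for in the question.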
But, if you must use List<T> to hold the lists for other reasons, you could also use the Join() LINQ method, and build a new list with a Match field that tells you whether they were a match or mismatch...
var results = newEmpList.Join(
oldEmpList,
n => n.Name,
o => o.Name,
(n, o) => new
{
Name = n.Name,
Salary = n.Salary,
Match = o.Salary == n.Salary
});
You can then filter this with a Where() clause for Match or !Match.
Update: I assume (by the title of your question) that the 2 lists are already sorted. Perhaps they're stored in a database with a clustered index or something. This answer, therefore, relies on that assumption.
Here is an implementation that has O(n) complexity, and is also very fast, AND is pretty simple too.
I believe this is a variant of the Merge Algorithm.
Here's the idea:
Start enumerating both lists
Compare the 2 current items.
If they match, add to your results.
If the 1st item is "smaller", advance the 1st list.
If the 2nd item is "smaller", advance the 2nd list.
Since both lists are known to be sorted, this will work very well. This implementation assumes that name is unique in each list.
var comparer = StringComparer.OrdinalIgnoreCase;
var namesAndSalaries = new List<Tuple<Employee, Employee>>();
var namesOnly = new List<Tuple<Employee, Employee>>();
// Create 2 iterators; one for old, one for new:
using (IEnumerator<Employee> A = oldEmployeeList.GetEnumerator()) {
using (IEnumerator<Employee> B = newEmployeeList.GetEnumerator()) {
// Start enumerating both:
if (A.MoveNext() && B.MoveNext()) {
while (true) {
int compared = comparer.Compare(A.Current.name, B.Current.name);
if (compared == 0) {
// Names match
if (A.Current.salary == B.Current.salary) {
namesAndSalaries.Add(Tuple.Create(A.Current, B.Current));
} else {
namesOnly.Add(Tuple.Create(A.Current, B.Current));
}
if (!A.MoveNext() || !B.MoveNext()) break;
} else if (compared < 0) { // Compare may return any negative value, not only -1
// Keep searching A
if (!A.MoveNext()) break;
} else {
// Keep searching B
if (!B.MoveNext()) break;
}
}
}
}
}
One of the fastest possible solutions on sorted lists is to use BinarySearch to find an item in the other list.
But as others have mentioned, you should measure it against your project requirements, as performance often tends to be a subjective thing.
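For illustration, a binary-search lookup on a name-sorted list might be sketched like this. `EmployeeRecord` and the helper names are hypothetical, and the list must have been sorted with the same comparer that is passed to BinarySearch, otherwise the result is meaningless.

```csharp
using System.Collections.Generic;
using System.Linq;

public class EmployeeRecord   // hypothetical stand-in for the Employee class
{
    public string Name { get; set; } = "";
    public double Salary { get; set; }
}

public static class BinarySearchMatch
{
    // A single comparer shared by sorting and searching.
    private static readonly IComparer<EmployeeRecord> ByName =
        Comparer<EmployeeRecord>.Create((a, b) => string.CompareOrdinal(a.Name, b.Name));

    // Returns a copy of 'source' sorted by name.
    public static List<EmployeeRecord> SortByName(IEnumerable<EmployeeRecord> source)
    {
        List<EmployeeRecord> copy = source.ToList();
        copy.Sort(ByName);
        return copy;
    }

    // Returns the entry with the same name as 'probe', or null if there is none.
    public static EmployeeRecord FindByName(List<EmployeeRecord> sortedByName, EmployeeRecord probe)
    {
        // List<T>.BinarySearch returns a negative number when nothing matches.
        int index = sortedByName.BinarySearch(probe, ByName);
        return index >= 0 ? sortedByName[index] : null;
    }
}
```

Looking up every item of one list in the other this way costs O(n log m), which sits between the dictionary approach and the nested-loop approach.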
You could create a Dictionary using
var lookupDictionary = list1.ToDictionary(x=>x.name);
That would give you close to O(1) lookup and a close to O(n) behavior if you're looking up values from a loop over the other list.
(I'm assuming here that ToDictionary is O(n) which would make sense with a straight forward implementation, but I have not tested this to be the case)
This would make for a very straight forward algorithm, and I'm thinking going below O(n) with two unsorted lists is pretty hard.

C# Sort List Based on Another List

I have a class that has multiple List<> contained within it. Its basically a table stored with each column as a List<>. Each column does not contain the same type. Each list is also the same length (has the same number of elements).
For example:
I have 3 List<> objects; one List, two List, and three List.
//Not syntactically correct
List<DateTime> one = new List...{4/12/2010, 4/9/2006, 4/13/2008};
List<double> two = new List...{24.5, 56.2, 47.4};
List<string> three = new List...{"B", "K", "Z"};
I want to be able to sort list one from oldest to newest:
one = {4/9/2006, 4/13/2008, 4/12/2010};
So to do this I moved element 0 to the end.
I then want to sort list two and three the same way; moving the first to the last.
So when I sort one list, I want the data in the corresponding index in the other lists to also change in accordance with how the one list is sorted.
I'm guessing I have to overload IComparer somehow, but I feel like there's a shortcut I haven't realized.
I've handled this design in the past by keeping or creating a separate index list. You first sort the index list, and then use it to sort (or just access) the other lists. You can do this by creating a custom IComparer for the index list. What you do inside that IComparer is to compare based on indexes into the key list. In other words, you are sorting the index list indirectly. Something like:
// This is the compare function for the separate *index* list.
int Compare (object x, object y)
{
return KeyList[(int) x].CompareTo(KeyList[(int) y]);
}
So you are sorting the index list based on the values in the key list. Then you can use that sorted index list to re-order the other lists. If this is unclear, I'll try to add a more complete example when I get in a situation to post one.
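One way the index-list idea could be fleshed out is sketched below, using a Comparison lambda instead of a full IComparer class. `IndexSort`, `SortedIndexes` and `Reorder` are names made up for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexSort
{
    // Returns the positions of 'keys' in ascending key order.
    // The key list itself is not modified.
    public static int[] SortedIndexes<TKey>(IList<TKey> keys) where TKey : IComparable<TKey>
    {
        int[] indexes = Enumerable.Range(0, keys.Count).ToArray();

        // Compare two indexes by the keys they point at.
        Array.Sort(indexes, (x, y) => keys[x].CompareTo(keys[y]));
        return indexes;
    }

    // Builds a new list by picking items of 'source' in index order.
    public static List<T> Reorder<T>(IList<T> source, int[] indexes)
    {
        var result = new List<T>(indexes.Length);
        foreach (int i in indexes) result.Add(source[i]);
        return result;
    }
}
```

With the lists from the question, `SortedIndexes(one)` yields 1, 2, 0, and passing that array to `Reorder` for each of `one`, `two` and `three` keeps the columns aligned. Array.Sort is not a stable sort, so rows with equal keys may come out in either order.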
Here's a way to do it using LINQ and projections. The first query generates an array with the original indexes reordered by the datetime values; in your example, the newOrdering array would have members:
{ 4/9/2006, 1 }, { 4/13/2008, 2 }, { 4/12/2010, 0 }
The second set of statements generate new lists by picking items using the reordered indexes (in other words, items 1, 2, and 0, in that order).
var newOrdering = one
.Select((dateTime, index) => new { dateTime, index })
.OrderBy(item => item.dateTime)
.ToArray();
// now, order each list
one = newOrdering.Select(item => one[item.index]).ToList();
two = newOrdering.Select(item => two[item.index]).ToList();
three = newOrdering.Select(item => three[item.index]).ToList();
I am sorry to say, but this feels like a bad design. Especially because List<T> does not guarantee element order before you have called one of the sorting operations (so you have a problem when inserting):
From MSDN:
The List is not guaranteed to be
sorted. You must sort the List
before performing operations (such as
BinarySearch) that require the List
to be sorted.
In many cases you won't run into trouble based on this, but you might, and if you do, it could be a very hard bug to track down. For example, I think the current framework implementation of List<T> maintains insert order until sort is called, but it could change in the future.
I would seriously consider refactoring to use another data structure. If you still want to implement sorting based on this data structure, I would create a temporary object (maybe using an anonymous type), sort this, and re-create the lists (see this excellent answer for an explanation of how).
First you should create a Data object to hold everything.
private class Data
{
public DateTime DateTime { get; set; }
public int Int32 { get; set; }
public string String { get; set; }
}
Then you can sort like this.
var l = new List<Data>();
l.Sort(
(a, b) =>
{
// Compare matching properties; ties fall through to the next one.
var r = a.DateTime.CompareTo(b.DateTime);
if (r == 0)
{
r = a.Int32.CompareTo(b.Int32);
if (r == 0)
{
r = a.String.CompareTo(b.String);
}
}
return r;
}
);
I wrote a sort algorithm that does this for Nito.LINQ (not yet released). It uses a simple-minded QuickSort to sort the lists, and keeps any number of related lists in sync. Source code starts here, in the IList<T>.Sort extension method.
Alternatively, if copying the data isn't a huge concern, you could project it into a LINQ query using the Zip operator (requires .NET 4.0 or Rx), order it, and then pull each result out:
List<DateTime> one = ...;
List<double> two = ...;
List<string> three = ...;
var combined = one.Zip(two, (first, second) => new { first, second })
.Zip(three, (pair, third) => new { pair.first, pair.second, third });
var ordered = combined.OrderBy(x => x.first);
var orderedOne = ordered.Select(x => x.first);
var orderedTwo = ordered.Select(x => x.second);
var orderedThree = ordered.Select(x => x.third);
Naturally, the best solution is to not separate related data in the first place.
Using generic arrays, this can get a bit cumbersome.
One alternative is using the Array.Sort() method that takes an array of keys and an array of values to sort. It first sorts the key array into ascending order and makes sure the array of values is reorganized to match this sort order.
If you're willing to incur the cost of converting your List<T>s to arrays (and then back), you could take advantage of this method.
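A sketch of the Array.Sort(keys, items) route for the three lists in the question could look like the following. The `KeyedArraySort` class is invented for this example. Since each call reorders the key array it is given, the second call gets its own copy of the unsorted keys.

```csharp
using System;
using System.Collections.Generic;

public static class KeyedArraySort
{
    // Sorts 'one' ascending and applies the same reordering to 'two' and 'three'.
    public static void SortTogether(List<DateTime> one, List<double> two, List<string> three)
    {
        DateTime[] keys = one.ToArray();
        DateTime[] keysCopy = (DateTime[])keys.Clone();  // untouched keys for the second sort
        double[] twoItems = two.ToArray();
        string[] threeItems = three.ToArray();

        Array.Sort(keys, twoItems);        // sorts keys, drags twoItems along
        Array.Sort(keysCopy, threeItems);  // same keys, so same reordering for threeItems

        // Copy the results back into the original lists.
        one.Clear(); one.AddRange(keys);
        two.Clear(); two.AddRange(twoItems);
        three.Clear(); three.AddRange(threeItems);
    }
}
```

This copies every column to an array and back, so it trades memory for not having to write a comparer. With duplicate keys the relative order of tied rows is not guaranteed, because Array.Sort is not stable.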
Alternatively, you could use LINQ to combine the values from multiple arrays into a single anonymous type using Zip(), sort the list of anonymous types using the key field, and then split that apart into separate arrays.
If you want to do this in-place, you would have to write a custom comparer and create a separate index array to maintain the new ordering of items.
I hope this could help :
one.Sort(delegate(DateTime d1, DateTime d2)
{
// List<T>.Sort sorts in place and returns void; d1.CompareTo(d2) orders oldest to newest.
return d1.CompareTo(d2);
});

Better way of searching through lists than using foreach

List<FullAsset> vclAsset;
List<string> callsigns;
foreach (FullAsset fa in vclAsset)
{
if (callsigns.Contains(fa.asset.callsign))
{
//do something
}
}
Is there a more elegant way to do the above? A FullAsset object contains an Asset object which in turn has a string "Callsign." Each callsign will be unique, so my list callsigns will only have one of each string, and no two FullAsset objects will share an Asset.callsign variable.
In a nutshell I want to pull all the FullAssets that have a certain callsign, but using a foreach seems clumsy (given that the number of FullAssets that could be contained in said list potentially has no upper limit).
You could use a lambda expression, something like this:
foreach (FullAsset fa in vclAsset.Where(a => callsigns.Contains(a.asset.callsign)))
{
// do something
}
If your keys are unique, you can use a Dictionary or a Hashtable to speed up searching.
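As a sketch of that idea: putting the callsigns into a HashSet turns each Contains call from a linear scan into a hash lookup. The `Asset` and `FullAsset` classes below are assumed shapes based on the question's description, and `CallsignFilter` is a made-up helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Asset       // assumed shape, based on the question's description
{
    public string Callsign { get; set; } = "";
}

public class FullAsset   // assumed shape, based on the question's description
{
    public Asset Asset { get; set; } = new Asset();
}

public static class CallsignFilter
{
    // Returns the assets whose callsign appears in 'callsigns'.
    public static List<FullAsset> Matching(IEnumerable<FullAsset> assets, IEnumerable<string> callsigns)
    {
        // Built once; each Contains below is then O(1) on average.
        var wanted = new HashSet<string>(callsigns);
        return assets.Where(fa => wanted.Contains(fa.Asset.Callsign)).ToList();
    }
}
```

The total cost is roughly one pass over the callsigns plus one pass over the assets, instead of scanning the callsign list once per asset.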
If you only want to find a certain item, you can use the List<T>.Find method and supply a predicate.
FullAsset result = vclAsset.Find
(fa => callsigns.Contains(fa.asset.callsign));
or
List<FullAsset> results = vclAsset.FindAll
(fa => callsigns.Contains(fa.asset.callsign));
If you are using .Net 3.5, LINQ Where may be a better solution, as others have stated, since it returns an enumerator (lazy evaluation) vs a full List.
Sure, using linq.
var assets = vclAsset.Where(fullA => callsigns.Contains(fullA.asset.callsign));
assets will be some enumerable structure.
I can recommend 100 Linq samples for inspiration and learning
Not sure if it counts as more elegant but you can use linq...
var results = from fa in vclAsset
where callsigns.Contains(fa.asset.callsign)
select fa;
var result = vclAsset.Where(x=>callsigns.Any(y=>x.asset.callsign==y));
P.s. I would rename vclAsset and asset/callsign properties.
You can also use the Join function to do this.
var sortedList = vclAsset.Join(callsigns,
x => x.asset.callsign, x => x,
(x, y) => x);
This is the list of vclAssets that have the listed callsign.
