Compare two List<int> - c#

I am writing a small program to compare two lists. If a value appears in both, I add it to the list dups; if it does not, I add it to distinct. I noticed that some of my values are added and some are not, and after debugging for a while, I am not certain what the problem is. Can someone shed a little light? Thanks.
List<int> groupA = new List<int>();
List<int> groupB = new List<int>();
List<int> dups = new List<int>();
List<int> distinct = new List<int>();
groupA.Add(2);
groupA.Add(24);
groupA.Add(5);
groupA.Add(72);
groupA.Add(276);
groupA.Add(42);
groupA.Add(92);
groupA.Add(95);
groupA.Add(266);
groupA.Add(42);
groupA.Add(92);
groupB.Add(5);
groupB.Add(42);
groupB.Add(95);
groupA.Sort();
groupB.Sort();
for (int a = 0; a < groupA.Count; a++)
{
for (int b = 0; b < groupB.Count; b++)
{
groupA[a].CompareTo(groupB[b]);
if (groupA[a] == groupB[b])
{
dups.Add(groupA[a]);
groupA.Remove(groupA[a]);
groupB.Remove(groupB[b]);
}
}
distinct.Add(groupA[a]);
}

I would use the Intersect and Except methods:
dups = groupA.Intersect(groupB).ToList();
distinct = groupA.Except(groupB).ToList();

When you remove an item from a list, every element after it moves down one index.
In essence, a for loop that keeps incrementing its counter skips the element that slides into the removed slot.
Try using a while loop, and manually increment the counter only when you are not deleting an item.
For example, the following code is incorrect:
List<int> nums = new List<int>{2, 4, 6, 7, 8, 10, 11};
for (int i = 0; i < nums.Count; i++)
{
if (nums[i] % 2 == 0)
nums.Remove(nums[i]);
}
It will leave the list as {4, 7, 10, 11} instead of just {7, 11}.
It will not remove the value 4 because, when the value 2 is removed (for i=0), the nums list goes from
//index 0 1 2 3 4 5 6
nums = {2, 4, 6, 7, 8, 10, 11}
to
//index 0 1 2 3 4 5
nums = {4, 6, 7, 8, 10, 11}
The loop body finishes, i is incremented to 1, and the next item referenced is nums[1], which is not 4 as one would intuitively expect, but 6. So in effect the value 4 is skipped, and the check is never executed for it.
You should be very careful whenever you modify a collection you are iterating over. For example, a foreach statement will throw an exception if you even try this. In this case you could use a while loop, like
List<int> nums = new List<int>{2, 4, 6, 7, 8, 10, 11};
int i = 0;
while (i < nums.Count)
{
if (nums[i] % 2 == 0)
{
nums.Remove(nums[i]);
}
else
{
i++; //only increment if you are not removing an item
//otherwise re-run the loop for the same value of i
}
}
or you could even adjust the for loop, like
for (int i = 0; i < nums.Count; i++)
{
if (nums[i] % 2 == 0)
{
nums.Remove(nums[i]);
i--; //decrement the counter, so that it will stay in place
//when it is incremented at the end of the loop
}
}
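A third option, not part of the original answer, is to walk the list from the end, so that a removal only shifts elements that have already been visited. This is a minimal sketch; the class and method names are made up for illustration.

```csharp
using System.Collections.Generic;

public static class RemoveWhileIterating
{
    // Removes every even number in place by walking the list backwards,
    // so removing an item never shifts an element that is still to be visited.
    public static void RemoveEvens(List<int> nums)
    {
        for (int i = nums.Count - 1; i >= 0; i--)
        {
            if (nums[i] % 2 == 0)
            {
                nums.RemoveAt(i); // remove by index rather than by value
            }
        }
    }
}
```

Called on the list {2, 4, 6, 7, 8, 10, 11}, this should leave {7, 11}.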
Alternatively you could use LINQ, like this:
distinct.AddRange(groupA);
distinct.AddRange(groupB);
distinct = distinct.Distinct().ToList();
and
dups.AddRange(groupA);
dups.AddRange(groupB);
dups = dups.GroupBy(i => i)
.Where(g => g.Count() > 1)
.Select(g => g.Key)
.ToList();
Note that the LINQ code will not alter your existing groupA and groupB lists. If you just want to remove the duplicates within each list, you could just do
groupA = groupA.Distinct().ToList();
groupB = groupB.Distinct().ToList();

You can easily do it with Linq:
List<int> dups = groupA.Intersect(groupB).ToList();
List<int> distinct = groupA.Except(groupB).ToList();
(assuming I correctly understood what you were trying to do)
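One thing worth knowing about Intersect and Except is that both behave like set operations: each value is reported once, even if it occurs several times in the input. The sketch below wraps the two calls in hypothetical helper methods (the names are mine) so that behaviour is easy to try out.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SetStyleComparison
{
    // Values present in both lists, each reported once.
    public static List<int> Common(List<int> a, List<int> b) => a.Intersect(b).ToList();

    // Values present in a but not in b, each reported once.
    public static List<int> OnlyInFirst(List<int> a, List<int> b) => a.Except(b).ToList();
}
```

With groupA and groupB from the question, Common should contain 5, 42 and 95, and OnlyInFirst should contain 2, 24, 72, 92, 266 and 276, with 92 appearing only once although groupA holds it twice.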

The title of the question is "Compare two List", so some people who only want a true/false result will land on this question.
For that, use the Enumerable.SequenceEqual method:
if (listA.SequenceEqual(listB))
{
// they are equal
}
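Keep in mind that SequenceEqual compares element by element, so the order matters. If the two lists should count as equal whenever they hold the same values in any order, one possible sketch is to compare sorted copies. The helper name is an assumption of mine, not a framework method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListEquality
{
    // True when both lists hold the same values with the same multiplicities,
    // regardless of order. OrderBy works on copies, so the inputs are untouched.
    public static bool SameItemsAnyOrder(List<int> a, List<int> b)
    {
        if (a.Count != b.Count) return false;
        return a.OrderBy(x => x).SequenceEqual(b.OrderBy(x => x));
    }
}
```

For example, {1, 2, 3} and {3, 2, 1} are not SequenceEqual, but SameItemsAnyOrder should return true for them.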

You need to find the missing elements in both:
List<int> onlyInA = groupA.Except(groupB).ToList();
List<int> onlyInB = groupB.Except(groupA).ToList();
Or in a single LINQ expression:
List<int> missing = groupA.Except(groupB).Union(groupB.Except(groupA)).ToList();
Note: as with all LINQ, it's worth pointing out that this may not be the most efficient way to do it. All list iterations have a cost. A longer-hand way of sorting both lists and then iterating over them together could be quicker if the lists were REALLY large...
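The sort-then-walk idea can be sketched roughly as follows. This is an illustration under my own naming, not code from the answer, and whether it actually beats Except and Intersect depends on the data and would need measuring.

```csharp
using System.Collections.Generic;

public static class SortedCompare
{
    // Sorts copies of both lists and walks them in step.
    // common: distinct values found in both lists.
    // missing: distinct values found in exactly one of them.
    public static (List<int> common, List<int> missing) Compare(List<int> a, List<int> b)
    {
        var x = new List<int>(a); x.Sort();
        var y = new List<int>(b); y.Sort();
        var common = new List<int>();
        var missing = new List<int>();
        int i = 0, j = 0;
        while (i < x.Count || j < y.Count)
        {
            // takeX / takeY: the smallest remaining value sits at x[i] / y[j]
            bool takeX = j >= y.Count || (i < x.Count && x[i] <= y[j]);
            bool takeY = i >= x.Count || (j < y.Count && y[j] <= x[i]);
            int value = takeX ? x[i] : y[j];
            if (takeX && takeY) common.Add(value);
            else missing.Add(value);
            // skip every repeat of this value in both lists
            while (i < x.Count && x[i] == value) i++;
            while (j < y.Count && y[j] == value) j++;
        }
        return (common, missing);
    }
}
```

Both result lists come out sorted, because the walk always consumes the smallest remaining value first.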

Related

How to remove all 1 from given list in C# program?

I have a List<int> that contains multiple 1 values, and I want to remove all of them. Is this the correct way to do it?
var numbers = new List<int>() { 1, 2, 3, 4, 5, 1, 1, 1, 6, 7 };
for (var i = 0; i < numbers.Count; i++)
{
if (numbers[i] == 1)
{
numbers.Remove(numbers[i]);
}
}
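No. The loop skips the element that follows each removed item, so consecutive 1 values are missed. Use the RemoveAll method instead: you specify a predicate which is called for each item in the list, and every item for which it returns true is removed.
For example:
numbers.RemoveAll(n => n == 1);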
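To see the difference, the sketch below runs the loop from the question and RemoveAll on copies of the same input. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class RemoveOnes
{
    // The loop from the question: the index advances even after a removal,
    // so the element that slides into the vacated slot is never examined.
    public static List<int> WithForwardLoop(List<int> source)
    {
        var numbers = new List<int>(source);
        for (var i = 0; i < numbers.Count; i++)
        {
            if (numbers[i] == 1)
            {
                numbers.Remove(numbers[i]);
            }
        }
        return numbers;
    }

    // RemoveAll tests every element once and returns how many it removed.
    public static List<int> WithRemoveAll(List<int> source, out int removed)
    {
        var numbers = new List<int>(source);
        removed = numbers.RemoveAll(n => n == 1);
        return numbers;
    }
}
```

For the list in the question, the forward loop leaves one 1 behind ({2, 3, 4, 5, 1, 6, 7}), while RemoveAll removes all four and leaves {2, 3, 4, 5, 6, 7}.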

Every combination of "1 item from each of N collections"

I have a nested List<List<int>> data structure, and I would like to iterate over every possible combination of the inmost int elements, such as that in each combination exactly one value from each inner List<int> is used. For example, please consider the following nested list:
var listOfLists = new List<List<int>>()
{
new List<int>() { 1, 2, 3, 4, 9 },
new List<int>() { 0, 3, 4, 5 },
new List<int>() { 1, 6 }
};
The first few combinations would yield:
1 0 1 // Indices: 0 0 0
1 0 6 // Indices: 0 0 1
1 3 1 // Indices: 0 1 0
1 3 6 // Indices: 0 1 1
2 0 1 // Indices: 1 0 0
2 0 6 // Indices: 1 0 1
2 3 1 // Indices: 1 1 0
...
How could I accomplish this?
My initial approach was to make permutations of indices, but the lengths of inner List<int> lists are not necessarily equal. Another approach I can think of is multiplying the length of each inner List<int>, then using the modulo and division operators combined with Math.Floor to determine indices, but I'm not sure how exactly this could be implemented when N collections are present.
I've answered several similar questions, all of which basically use a variation of the same algorithm. Here is a modified version of the one from "Looking at each combination in jagged array":
public static class Algorithms
{
public static IEnumerable<T[]> GetCombinations<T>(this IReadOnlyList<IReadOnlyList<T>> input)
{
var result = new T[input.Count];
var indices = new int[result.Length];
for (int pos = 0, index = 0; ;)
{
for (; pos < result.Length; pos++, index = 0)
{
indices[pos] = index;
result[pos] = input[pos][index];
}
yield return result;
do
{
if (pos == 0) yield break;
index = indices[--pos] + 1;
}
while (index >= input[pos].Count);
}
}
}
Note that, to avoid allocations, the above method yields one and the same array instance each time. This is perfect if you just want to count the combinations or process them with a foreach loop or LINQ query without storing the results. For instance:
foreach (var combination in listOfLists.GetCombinations())
{
// do something with the combination
}
If you indeed need to store the results, you can always use ToList:
var allCombinations = listOfLists.GetCombinations().Select(c => c.ToList()).ToList();
How about using LINQ?
var listOfLists = new List<List<int>>()
{
new List<int>() { 1, 2, 3, 4, 9 },
new List<int>() { 0, 3, 4, 5 },
new List<int>() { 1, 6 }
};
var result = from l in listOfLists[0]
from y in listOfLists[1]
from z in listOfLists[2]
select new List<int>()
{
l,
y,
z
};
This will of course only work for this specific list as there are 3 lists in your list of lists.
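The same nested-from idea can be generalised to any number of inner lists by folding over the outer list with Aggregate. The following is a sketch of that well-known pattern; the class and method names are my own choice.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CartesianProduct
{
    // Builds the product one inner list at a time: every partial combination
    // is extended with every item of the next list.
    public static IEnumerable<IEnumerable<T>> Of<T>(IEnumerable<IEnumerable<T>> lists)
    {
        // Start with a single empty combination.
        IEnumerable<IEnumerable<T>> seed = new[] { Enumerable.Empty<T>() };
        return lists.Aggregate(
            seed,
            (partials, next) =>
                from partial in partials
                from item in next
                select partial.Concat(new[] { item }));
    }
}
```

For the listOfLists in the question, CartesianProduct.Of<int>(listOfLists) should produce 5 * 4 * 2 = 40 combinations, starting with 1 0 1 and 1 0 6. Unlike the array-reusing iterator, every combination here is a separate sequence, at the price of more allocations.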

All possible combinations from given sets but without repetition of sets’ internal elements [duplicate]

This question already has answers here:
Generating all Possible Combinations
(12 answers)
Closed 9 years ago.
I have some fixed sets of integers, with distinct increasing values in each set, for example:
{1, 2, 4}, {1, 3}, {2, 4}, {2, 7}, {3, 6}, {5, 6}, {3, 5, 7}.
Is there an algorithm, or better yet C# code, that generates all possible combinations of the given sets without repeating any of their internal integers, i.e.
[{1, 2, 4}, {3, 6}] <- all integers are unique
[{1, 2, 4}, {3, 5, 7}] <- all integers are unique
[{1, 3}, {2, 4}, {5, 6}] <- all integers are unique
and so on.
====================================
Here is possible organization of input data:
var set1 = new HashSet<int>() { 1, 2, 4 };
var set2 = new HashSet<int>() { 1, 3 };
var set3 = new HashSet<int>() { 2, 4 };
var set4 = new HashSet<int>() { 2, 7 };
var set5 = new HashSet<int>() { 3, 6 };
var set6 = new HashSet<int>() { 5, 6 };
var set7 = new HashSet<int>() { 3, 5, 7 };
var inputList = new List<HashSet<int>>();
inputList.Add(set1);
inputList.Add(set2);
inputList.Add(set3);
inputList.Add(set4);
inputList.Add(set5);
inputList.Add(set6);
inputList.Add(set7);
I need to obtain a list (or collection) of all possible lists (i.e. combinations) of sets from inputList with unique integers in each internal list (combination).
This question was marked as duplicate and as a question which already has an answer here:
“Generating all Possible Combinations (8 answers)”. However, to my mind, it is essentially a DIFFERENT question:
the input is not two arrays of the same length but rather a list of sets with a different number of elements in each set;
equal elements may be present in different sets.
Personally I would approach this as a two part problem:
Find the unique numbers in the list of numbers you have been provided. To do this in C# you could use a HashSet, or you can use LINQ (which I assume you want because you tagged this with linq)
Form all possible combinations of those numbers. This is already answered here: Getting all possible combinations from a list of numbers
Edit / Correction
So, delving further into the problem the goal here is to find combinations of the SETS not the individual numbers within those sets (as I had originally indicated above). With that in mind, I would do the following:
Create a square, two dimensional array of booleans. Each boolean represents whether or not the set at that index conflicts with the set at the other index. Fill this in first.
Use standard combinatorics (in the same way as mentioned above), but combine it with checks against the array of booleans to discard invalid results.
I believe this should actually work. I (may) create an example here in a moment!
The Code
IMPORTANT: This isn't pretty or optimized in any sense of the word, but it does work and it does illustrate a way to solve this problem. One simple optimization would be to pass the Boolean array into the recursive step where the combinations are generated and stop the recursion as soon as a conflict is generated. That would save a ton of time and computing power. That said, I'll leave that as an exercise to the reader.
Just paste the following into a new console application and it should work. Best of luck!
static void Main(string[] args)
{
// create/read sets here
var integers = new List<int[]>(){new []{1,2,4}, new []{1,3}, new []{2, 4}, new []{2, 7}, new []{3, 6}, new []{5, 6}, new []{3, 5, 7}};
// allocate/populate booleans - loop may be able to be refined
var conflicts = new bool[integers.Count, integers.Count];
for (var idx = 0; idx < integers.Count; idx++)
for (var cmpIdx = 0; cmpIdx < integers.Count; cmpIdx++)
conflicts[idx, cmpIdx] = integers[idx].Any(x => integers[cmpIdx].Contains(x));
// find combinations (index combinations)
var combinations = GetCombinations(integers.Count);
// remove invalid entries
for (var idx = 0; idx < combinations.Count; idx++)
if (HasConflict(combinations[idx], conflicts))
combinations.RemoveAt(idx--);
// print out the final result
foreach (var combination in combinations) PrintCombination(combination, integers);
// pause
Console.ReadKey();
}
// get all combinations
static List<Combination> GetCombinations(int TotalLists)
{
var result = new List<Combination>();
for (var combinationCount = 1; combinationCount <= TotalLists; combinationCount++)
result.AddRange(GetCombinations(TotalLists, combinationCount));
return result;
}
static List<Combination> GetCombinations(int TotalLists, int combinationCount)
{
return GetCombinations(TotalLists, combinationCount, 0, new List<int>());
}
// recursive combinatorics - loads of alternatives including LINQ Cartesian products
static List<Combination> GetCombinations(int TotalLists, int combinationCount, int minimumStart, List<int> currentList)
{
// stops endless recursion - forms final result
var result = new List<Combination>();
if (combinationCount <= 0)
{
if ((currentList ?? new List<int>()).Count > 0)
{
result.Add(new Combination() { sets = currentList });
return result;
}
return null;
}
for (var idx = minimumStart; idx <= TotalLists - combinationCount; idx++)
{
var nextList = new List<int>();
nextList.AddRange(currentList);
nextList.Add(idx);
var combinations = GetCombinations(TotalLists, combinationCount - 1, idx + 1, nextList);
if (combinations != null) result.AddRange(combinations);
}
return result;
}
// print the combination
static void PrintCombination(Combination value, List<int[]> sets)
{
StringBuilder serializedSets = new StringBuilder();
foreach (var idx in value.sets)
{
if (serializedSets.Length > 0) serializedSets.Append(", ");
else serializedSets.Append("{");
serializedSets.Append("{");
for (var setIdx = 0; setIdx < sets[idx].Length; setIdx++)
{
if (setIdx > 0) serializedSets.Append(", ");
serializedSets.Append(sets[idx][setIdx].ToString());
}
serializedSets.Append("}");
}
serializedSets.Append("}");
Console.WriteLine(serializedSets.ToString());
}
static bool HasConflict(Combination value, bool[,] conflicts)
{
for (var idx = 0; idx < value.sets.Count; idx++)
for (var cmpIdx = idx + 1; cmpIdx < value.sets.Count; cmpIdx++)
if (conflicts[value.sets[idx], value.sets[cmpIdx]]) return true;
return false;
}
// internal class to manage combinations
class Combination { public List<int> sets; }
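The optimization left as an exercise, abandoning a branch as soon as a conflict appears, could look roughly like the sketch below. It is my own variation rather than the answer's code: instead of passing the Boolean array into the recursion, it keeps a running HashSet of the integers already used, and all names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DisjointSetCombinations
{
    // Returns every non-empty selection of set indices whose sets share no value.
    public static List<List<int>> Find(IReadOnlyList<int[]> sets)
    {
        var results = new List<List<int>>();
        Extend(sets, 0, new List<int>(), new HashSet<int>(), results);
        return results;
    }

    private static void Extend(IReadOnlyList<int[]> sets, int start,
        List<int> chosen, HashSet<int> used, List<List<int>> results)
    {
        for (var idx = start; idx < sets.Count; idx++)
        {
            // Conflict with an already chosen set: prune this branch immediately.
            if (sets[idx].Any(v => used.Contains(v))) continue;

            chosen.Add(idx);
            used.UnionWith(sets[idx]);
            results.Add(new List<int>(chosen)); // record a copy of the selection
            Extend(sets, idx + 1, chosen, used, results);
            used.ExceptWith(sets[idx]);         // undo before trying the next candidate
            chosen.RemoveAt(chosen.Count - 1);
        }
    }
}
```

Each result is a list of indices into the input, in the same spirit as the Combination class above. Because conflicting branches are never expanded, no invalid combination is generated and filtered out afterwards.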

Reordering an array in a customized way

I have an array of items that prints to PDF in the following order.
Let's say, for example:
lines = {1, 2, 3,
4, 5, 6,
7, 8, 9,
10}
is the content of my array.
However I want to change the order of the items in the array to
{1, 4, 7,
2, 5, 8,
3, 6, 9,
10}
Then I pass this array to my print engine. Basically if there are more than 3 items in the array, my new code should reorder it.
Could somebody help me figure out the logic for that?
Thanks
Order the lines by the modulus of the line index with the number of rows.
public static ICollection<T> Sort<T>(ICollection<T> lines, int columns)
{
var rows = lines.Count/columns;
if (rows == 0)
{
return lines;
}
return lines.Select((line, i) => new {line, i})
.OrderBy(item => item.i < columns*rows ? item.i%rows : rows)
.Select(item => item.line)
.ToList();
}
Edit: Alternatively you can use an iterator method and the list's indexer instead of LINQ:
public static IEnumerable<T> Sort<T>(IList<T> lines, int columns)
{
var rows = lines.Count/columns;
for (var i = 0; i < lines.Count; i++)
{
var index = rows > 0 && i < columns*rows
? (i%columns)*rows + i/columns
: i;
yield return lines[index];
}
}
Assuming the requirement is "in a linear array, treat every 9 elements as a 3x3 matrix and transpose each such group, keeping the remainder as-is":
// assuming T[] items;
var toTranspose = (items.Count() / 9) * 9;
var remap = new int[]{1, 4, 7, 2, 5, 8, 3, 6, 9 };
var result = Enumerable.Range(0, toTranspose)
.Select(pos => items[(pos / 9) * 9 + (remap[pos % 9] - 1)])
.Concat(items.Skip(toTranspose))
.ToArray();
Summary of code:
get the number of items that need to be moved (the number of complete groups of 9 items, items.Count() / 9, multiplied by the group size)
keep the custom transformation in the remap array (note that the indexes are copied as-is from the sample and are therefore one-based, hence the -1 when computing the index)
for each element index below toTranspose, get the source element from the corresponding group by applying the transformation from remap
finally, Concat the remainder.
Notes:
one can easily provide a custom transformation or inline the transposition if needed.
the transformation can't be applied to the last partial group, as elements would have to go to non-existent positions.
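As an illustration of inlining the transposition, the sketch below computes the source index directly instead of looking it up in a hard-coded remap array, so it works for any square block size. The names and the size parameter are assumptions of mine.

```csharp
using System.Collections.Generic;

public static class BlockTranspose
{
    // Transposes each complete block of size*size items (read as a square
    // matrix stored row by row) and leaves a trailing partial block as-is.
    public static T[] Apply<T>(IReadOnlyList<T> items, int size)
    {
        var block = size * size;
        var toTranspose = (items.Count / block) * block;
        var result = new T[items.Count];
        for (var pos = 0; pos < items.Count; pos++)
        {
            if (pos >= toTranspose)
            {
                result[pos] = items[pos]; // remainder keeps its position
                continue;
            }
            var offset = pos % block;     // position inside the current block
            var row = offset / size;
            var col = offset % size;
            // output cell (row, col) takes input cell (col, row)
            result[pos] = items[pos - offset + col * size + row];
        }
        return result;
    }
}
```

With the ten sample items and size 3, this should give 1, 4, 7, 2, 5, 8, 3, 6, 9, 10.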

Find Same patterns in lists

Assume we have below lists:
List<int> Journey1 = new List<int>() { 1, 2, 3, 4, 5 };
List<int> Journey2 = new List<int>() { 2, 3, 4, 6, 7, 3, 4 };
List<int> Journey3 = new List<int>() { 6, 7, 1 };
List<int> Journey4 = new List<int>() { 3, 1, 4 };
And the patterns are:
2, 3, 4 -> Journey1, Journey2;
6, 7 -> Journey2, Journey3;
1 -> Journey2, Journey3, Journey4;
5 -> Journey1;
3, 4 -> Journey2;
3 -> Journey4;
4 -> Journey4;
We have 5000 lists, and each has around 200 items, so the patterns can have between 1-200 items and can be seen in 1-5000 lists.
Therefore I need very fast way of pattern matching.
Without precomputation and with a naive on-the-fly search:
var matchedJourneys = journeys.Where(x => ContainsPattern(x, mypattern));
bool ContainsPattern(List<int> list, List<int> pattern)
{
for(int i = 0; i < list.Count - (pattern.Count - 1); i++)
{
var match = true;
for(int j = 0; j < pattern.Count; j++)
if(list[i + j] != pattern[j])
{
match = false;
break;
}
if(match) return true;
}
return false;
}
This will execute at most 200 million equality checks on your numbers (5000 lists x 200 items x 200 pattern items). But since most comparisons fail before the whole pattern has been checked, the real figure could be (just a guess) ~5 million equality operations when checking all the lists. That's a few hundred milliseconds.
It all depends on what 'very fast' means for you. If that's too slow, you will need a much more complicated approach ...
I am not sure what you want as output. I just made an attempt.
I suggest that you make a list of lists, instead of declaring individual list variables.
List<List<int>> journeys = new List<List<int>>();
journeys.Add(new List<int>() { 1, 2, 3, 4, 5 });
journeys.Add(new List<int>() { 2, 3, 4, 6, 7, 3, 4 });
journeys.Add(new List<int>() { 6, 7, 1 });
journeys.Add(new List<int>() { 3, 1, 4 });
I assumed that the numbers range from 0 to 255. With this query
var result = Enumerable.Range(0, 256)
.Select(number => new
{
number,
listIndexes = journeys
.Select((list, index) => new { index, list })
.Where(a => a.list.Contains(number))
.Select(a => a.index)
.ToList()
})
.Where(b => b.listIndexes.Count > 0)
.ToList();
and this test loop
foreach (var item in result) {
Console.Write("Number {0} occurs in list # ", item.number);
foreach (var index in item.listIndexes) {
Console.Write("{0} ", index);
}
Console.WriteLine();
}
you will get this result
Number 1 occurs in list # 0 2 3
Number 2 occurs in list # 0 1
Number 3 occurs in list # 0 1 3
Number 4 occurs in list # 0 1 3
Number 5 occurs in list # 0
Number 6 occurs in list # 1 2
Number 7 occurs in list # 1 2
Where the lists are numbered starting at zero.
For a brute-force approach, you can try using polynomial hash functions to speed up the sub-sequence matches. An insane number of comparisons is still required, but at least the cost of each match could be almost constant irrespective of the sub-sequence length.
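One way to read the polynomial-hash suggestion is a Rabin-Karp style search with a rolling hash. The sketch below is my interpretation, not the answerer's code: the multiplier is an arbitrary choice, and the hash simply wraps around modulo 2^64.

```csharp
using System.Collections.Generic;

public static class RollingHashSearch
{
    private const ulong Multiplier = 1000003UL; // arbitrary choice (assumption)

    // Compares the hash of each window with the pattern hash and verifies
    // element by element only when the two hashes agree.
    public static bool ContainsPattern(IReadOnlyList<int> list, IReadOnlyList<int> pattern)
    {
        int n = list.Count, m = pattern.Count;
        if (m == 0) return true;
        if (m > n) return false;

        unchecked // arithmetic wraps modulo 2^64, which acts as the hash modulus
        {
            ulong patternHash = 0, windowHash = 0, highPower = 1;
            for (int k = 0; k < m; k++)
            {
                patternHash = patternHash * Multiplier + (ulong)pattern[k];
                windowHash = windowHash * Multiplier + (ulong)list[k];
                if (k > 0) highPower *= Multiplier; // ends as Multiplier^(m-1)
            }

            for (int start = 0; ; start++)
            {
                if (windowHash == patternHash && Matches(list, pattern, start)) return true;
                if (start + m >= n) return false;
                // slide the window: drop list[start], append list[start + m]
                windowHash = (windowHash - (ulong)list[start] * highPower) * Multiplier
                             + (ulong)list[start + m];
            }
        }
    }

    private static bool Matches(IReadOnlyList<int> list, IReadOnlyList<int> pattern, int start)
    {
        for (int j = 0; j < pattern.Count; j++)
        {
            if (list[start + j] != pattern[j]) return false;
        }
        return true;
    }
}
```

Updating the window hash takes constant time per position, so the pattern length only matters when hashing the pattern once and when a hash hit has to be verified.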
In your case there are opportunities to benefit from pattern preprocessing as well as text preprocessing (http://en.wikipedia.org/wiki/String_searching_algorithm).
For instance, constructing a trie of all the suffixes of a list (and therefore of all its contiguous subsequences) allows you to query that list for a given pattern in time proportional to the pattern length.
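A naive version of such a trie for a single list might look like the sketch below. It is only an illustration with names of my own choosing. It stores every suffix explicitly, so it needs on the order of n^2 nodes for a list of n items, and a compressed suffix tree or suffix array would be the more economical choice for 5000 lists.

```csharp
using System.Collections.Generic;

public sealed class SuffixTrie
{
    private sealed class Node
    {
        public readonly Dictionary<int, Node> Children = new Dictionary<int, Node>();
    }

    private readonly Node root = new Node();

    // Inserts every suffix of the list, so every contiguous run of items
    // becomes a path that starts at the root.
    public SuffixTrie(IReadOnlyList<int> list)
    {
        for (int start = 0; start < list.Count; start++)
        {
            var node = root;
            for (int k = start; k < list.Count; k++)
            {
                if (!node.Children.TryGetValue(list[k], out var child))
                {
                    child = new Node();
                    node.Children.Add(list[k], child);
                }
                node = child;
            }
        }
    }

    // Follows one edge per pattern item, so the lookup cost depends on the
    // pattern length rather than on the list length.
    public bool Contains(IReadOnlyList<int> pattern)
    {
        var node = root;
        foreach (var value in pattern)
        {
            if (!node.Children.TryGetValue(value, out node)) return false;
        }
        return true;
    }
}
```

Built once from Journey2, new SuffixTrie(Journey2).Contains(new[] { 6, 7 }) should return true, and the same trie can then answer any number of pattern queries.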
