Strange result when comparing two Lists - c#

I want to compare two lists and check whether List2 contains any of the items in List1, but I get unexpected results. Please see my code below.
test code class
class Program
{
    static void Main(string[] args)
    {
        bool loop = true;
        int textcount = 1;
        while (loop)
        {
            var collection1 = GetCollection();
            var collection2 = GetCollection();
            Console.WriteLine("Test No " + textcount.ToString());
            Console.WriteLine("Collection 1 =" + String.Join(", ", collection1.ToArray()));
            Console.WriteLine("Collection 2 =" + String.Join(", ", collection2.ToArray()));
            System.Diagnostics.Stopwatch watch = new System.Diagnostics.Stopwatch();
            watch.Start();
            var hasitem = collection1.Any(item => collection2.Contains(item));
            watch.Stop();
            Console.WriteLine(hasitem.ToString() + " Found in " + watch.ElapsedTicks.ToString());
            watch.Reset();
            watch.Start();
            var hasAtLeastOne = collection1.Intersect(collection2).Any();
            watch.Stop();
            Console.WriteLine(hasAtLeastOne.ToString() + " With Intersect Found in " + watch.ElapsedTicks.ToString());
            textcount++;
            Console.ReadKey();
        }
    }
    static Random ran = new Random();
    private static IEnumerable<int> GetCollection()
    {
        for (int i = 0; i < 5; i++)
        {
            yield return ran.Next(i, 20);
        }
    }
}
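The results are very confusing. See the last few results: in tests 7, 8 and 10 the two checks disagree, and the reported answer often does not match the printed collections.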
Test No 1
Collection 1 =10, 8, 18, 6, 11
Collection 2 =3, 12, 18, 13, 6
True Found in 3075
True With Intersect Found in 15297
Test No 2
Collection 1 =18, 13, 7, 18, 5
Collection 2 =12, 18, 8, 3, 5
True Found in 22
True With Intersect Found in 100
Test No 3
Collection 1 =1, 6, 15, 7, 9
Collection 2 =16, 15, 14, 14, 12
True Found in 21
True With Intersect Found in 23
Test No 4
Collection 1 =3, 16, 7, 4, 19
Collection 2 =6, 3, 15, 15, 9
True Found in 21
True With Intersect Found in 56
Test No 5
Collection 1 =18, 18, 9, 17, 10
Collection 2 =17, 12, 4, 3, 11
True Found in 25
True With Intersect Found in 20
Test No 6
Collection 1 =9, 9, 2, 17, 19
Collection 2 =17, 2, 18, 3, 15
False Found in 109
False With Intersect Found in 41
Test No 7
Collection 1 =3, 15, 3, 5, 5
Collection 2 =2, 2, 11, 7, 6
True Found in 22
False With Intersect Found in 15
Test No 8
Collection 1 =7, 14, 17, 14, 18
Collection 2 =18, 4, 7, 18, 16
False Found in 28
True With Intersect Found in 19
Test No 9
Collection 1 =3, 9, 6, 18, 9
Collection 2 =10, 3, 17, 17, 18
True Found in 28
True With Intersect Found in 22
Test No 10
Collection 1 =15, 18, 2, 9, 8
Collection 2 =10, 15, 3, 10, 19
False Found in 135
True With Intersect Found in 128
Test No 11
Collection 1 =6, 2, 17, 18, 18
Collection 2 =14, 16, 14, 6, 4
False Found in 20
False With Intersect Found in 17

The problem is that what you call a "collection" is actually an unstable sequence of items that changes every time it is enumerated. The reason for this is the way you implemented GetCollection. Using yield return basically returns a blueprint for how to create the sequence. It doesn't return the sequence itself.
And so, every time that blueprint is enumerated, it is used to create a new sequence.
You can verify this by simply outputting your "collections" twice. You will see that the values are different each time.
And that's the reason why your test yields completely arbitrary results.
You enumerate each collection at least three times:
The first enumeration happens when you output it to the console.
The second enumeration happens in the test with Any and Contains. Because this starts a new enumeration, new random numbers are generated. (collection2 is in fact enumerated again for every Contains call.)
The third enumeration happens in the Intersect test. This creates yet another set of random numbers.
To fix it, create a stable sequence by calling ToArray() on the result of GetCollection.
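The fix described above can be sketched as follows. This is a minimal illustration, not the asker's full test harness: the class name StableSequenceDemo and the helper ChecksAgree are hypothetical names introduced here, while GetCollection is the iterator from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StableSequenceDemo
{
    private static readonly Random Ran = new Random();

    // Same iterator as in the question: every enumeration draws new random numbers.
    public static IEnumerable<int> GetCollection()
    {
        for (int i = 0; i < 5; i++)
        {
            yield return Ran.Next(i, 20);
        }
    }

    // Hypothetical helper: materializes both sequences once with ToArray(),
    // then runs both checks against the same, fixed data.
    public static bool ChecksAgree()
    {
        int[] collection1 = GetCollection().ToArray();
        int[] collection2 = GetCollection().ToArray();

        bool viaAny = collection1.Any(item => collection2.Contains(item));
        bool viaIntersect = collection1.Intersect(collection2).Any();
        return viaAny == viaIntersect;
    }

    public static void Main()
    {
        IEnumerable<int> lazy = GetCollection();
        // Two enumerations of the same lazy sequence usually print different values.
        Console.WriteLine(string.Join(", ", lazy));
        Console.WriteLine(string.Join(", ", lazy));

        // With stable arrays the two checks can no longer disagree.
        Console.WriteLine(ChecksAgree());
    }
}
```

Because both checks now read the same arrays, any remaining difference between them is purely one of timing.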

This is your problem:
private static IEnumerable<int> GetCollection()
{
    for (int i = 0; i < 5; i++)
    {
        yield return ran.Next(i, 20);
    }
}
Make it into this:
private static List<int> GetCollection()
{
    return new List<int>
    {
        ran.Next(0, 20),
        ran.Next(1, 20),
        ran.Next(2, 20),
        ran.Next(3, 20),
        ran.Next(4, 20)
    };
}
And your problem will disappear.
The long explanation is that when a method builds its IEnumerable with yield return, every LINQ call that enumerates the result runs the iterator again from the start (after all, that's what an IEnumerable does, right?). Since each run does a yield return <some random number>, you can expect unstable results. It is best either to save a reference to the .ToArray() result or to just return a stable list.

Related

Fix returned number

I want to know how to clamp/range/fix (I don't know what to call it) a number.
For example, I want to always fix a number to a multiple of 10, so it should work like this:
If you get number: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 must be fixed at 0.
If you get number: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 must be fixed at 10.
If you get number: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 must be fixed at 20.
And so on.
I know that it must be easy, but I can't figure it out by myself. Thank you in advance.
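Divide by the desired precision (e.g. 10) and then multiply by the same precision again. Integer division in C# truncates toward zero, so for non-negative values it returns the largest integer that is not greater than the exact quotient, i.e. it floors the value. (For negative values it rounds up toward zero instead.)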
public static int FloorToPrecision(int value, int precision) {
    return (value / precision) * precision;
}
e.g. Console.WriteLine(FloorToPrecision(17, 10)); prints out 10.
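If negative inputs matter, a variant that always rounds down to the next lower multiple might look like the sketch below. The name FloorToMultiple is hypothetical and not part of the answer above; it relies on the fact that the C# % operator gives a remainder with the sign of the dividend.

```csharp
using System;

public static class Rounding
{
    // Hypothetical helper: rounds value down to the nearest multiple of `multiple`,
    // also for negative values (e.g. -3 becomes -10, not 0).
    public static int FloorToMultiple(int value, int multiple)
    {
        if (multiple <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(multiple), "Must be positive.");
        }

        int remainder = value % multiple;   // sign follows value in C#
        if (remainder < 0)
        {
            remainder += multiple;          // shift into the range 0..multiple-1
        }
        return value - remainder;
    }

    public static void Main()
    {
        Console.WriteLine(FloorToMultiple(17, 10));
        Console.WriteLine(FloorToMultiple(-3, 10));
    }
}
```

For non-negative inputs this gives the same result as FloorToPrecision.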

Describe array transition with specific commands

Let's suppose we have two integer arrays, for example:
var A = new int[] {1, 4, 6, 12, 44};
var B = new int[] {2, 4, 6, 44, 45};
The problem is to describe the transition from A to B as a sequence of steps.
Starting with the array A, continue as follows:
STEP 1 : remove index = 0 // result = {4, 6, 12, 44}
STEP 2 : insert index = 0 value = 2 // result = {2, 4, 6, 12, 44}
STEP 3 : remove index = 3 // result = {2, 4, 6, 44}
STEP 4 : insert index = 4 value = 45 // result = {2, 4, 6, 44, 45}
And we now have the array B
My question is: how can I design an algorithm, in any programming language or pseudocode, that generates these steps programmatically for given arrays A and B?
The common step structure is:
STEP N : insert/remove index = i [value = v]
Of course, removing all elements from A and then inserting all elements from B is a solution, and there may be more than one solution for given A and B, but I am looking for the transition with the fewest steps.
There is the concept of the Levenshtein distance. It is used for comparing two strings (however, you can also apply the same principle to integer arrays): it is the minimal number of edits needed to change one string into another, where one edit is either a deletion, an insertion or a substitution.
The problem can be solved efficiently using dynamic programming. The Wikipedia article shows such an approach.
You can use the same approach for your problem. But since in your case you are not allowed to substitute an element (only remove and insert operations), you can even simplify the recursion a (tiny) bit:
def edit_distance(s, len_s, t, len_t):
    if len_s == 0:
        return len_t
    if len_t == 0:
        return len_s
    if s[len_s-1] == t[len_t-1]:
        return edit_distance(s, len_s-1, t, len_t-1)
    else:
        return min(edit_distance(s, len_s-1, t, len_t) + 1,
                   edit_distance(s, len_s, t, len_t-1) + 1)
The code above is plain recursion without dynamic programming, so it recomputes the same subproblems over and over. To make it efficient, you need to add memoization or fill a table bottom-up.
Also, the code only computes the number of steps. If you also want to list the steps, you have to store the complete table and backtrack the solution.
Time and memory complexity of the approach: O(len_s * len_t).
Here is an example using your two arrays [1, 4, 6, 12, 44] and [2, 4, 6, 44, 45]. This is the table you get if you apply dynamic programming (e.g. with a bottom-up approach) to each possible combination of prefixes of the two arrays. Rows correspond to prefixes of the first array, columns to prefixes of the second.
0 1 2 3 4 5
1 2 3 4 5 6
2 3 2 3 4 5
3 4 3 2 3 4
4 5 4 3 4 5
5 6 5 4 3 4
At the bottom right we see that 4 is the optimal number of steps to make both arrays equal. Now we can backtrack and look at the recursive formula again. Since the last two elements are not equal, it has to be an insert/remove operation. We can see in the table that the optimal number of steps for [1, 4, 6, 12], [2, 4, 6, 44, 45] is 5, and for [1, 4, 6, 12, 44], [2, 4, 6, 44] it is 3. So the optimal thing here is to remove the last element of the second array, or in other words to insert 45 into the first one.
Now we can think about the last step that resulted in [1, 4, 6, 12, 44], [2, 4, 6, 44]. Since the last elements are equal, the step is clear. We keep both of them and perform no insert or remove operation.
So what was the last step in [1, 4, 6, 12], [2, 4, 6]? The table shows that the optimal value 3 originated from the position [1, 4, 6], [2, 4, 6], which means removing 12 from the first array.
And so on.
Interestingly there can be multiple optimal solutions. Here I show you one possible path (which corresponds exactly to your solution):
0-1 2 3 4 5
  |
1 2 3 4 5 6
   \
2 3 2 3 4 5
     \
3 4 3 2 3 4
      |
4 5 4 3 4 5
       \
5 6 5 4 3-4
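The table-and-backtrack procedure described above can be sketched in C# roughly as follows. This is one possible implementation under the insert/remove-only assumption, not the only one: the names ArrayTransition and GetSteps are hypothetical, and when several optimal paths exist the tie-breaking in the backtrack simply prefers a removal.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayTransition
{
    // Returns a shortest list of steps that turns a into b.
    // Each step is (IsInsert, Index, Value); Value is unused for removals.
    public static List<(bool IsInsert, int Index, int Value)> GetSteps(int[] a, int[] b)
    {
        int n = a.Length, m = b.Length;

        // d[i, j] = minimal number of inserts/removes to turn a[0..i) into b[0..j).
        var d = new int[n + 1, m + 1];
        for (int i = 0; i <= n; i++) d[i, 0] = i;
        for (int j = 0; j <= m; j++) d[0, j] = j;
        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                d[i, j] = a[i - 1] == b[j - 1]
                    ? d[i - 1, j - 1]
                    : Math.Min(d[i - 1, j], d[i, j - 1]) + 1;
            }
        }

        // Backtrack from the bottom-right corner: 'K' = keep, 'R' = remove, 'I' = insert.
        var ops = new List<(char Kind, int Value)>();
        int x = n, y = m;
        while (x > 0 || y > 0)
        {
            if (x > 0 && y > 0 && a[x - 1] == b[y - 1]) { ops.Add(('K', 0)); x--; y--; }
            else if (x > 0 && (y == 0 || d[x - 1, y] <= d[x, y - 1])) { ops.Add(('R', 0)); x--; }
            else { ops.Add(('I', b[y - 1])); y--; }
        }
        ops.Reverse();

        // Replay the operations front to back so that every index refers to
        // the array as it looks after all previous steps have been applied.
        var steps = new List<(bool IsInsert, int Index, int Value)>();
        int pos = 0;
        foreach (var op in ops)
        {
            if (op.Kind == 'K') pos++;
            else if (op.Kind == 'R') steps.Add((false, pos, 0));
            else { steps.Add((true, pos, op.Value)); pos++; }
        }
        return steps;
    }

    public static void Main()
    {
        var steps = GetSteps(new[] { 1, 4, 6, 12, 44 }, new[] { 2, 4, 6, 44, 45 });
        for (int k = 0; k < steps.Count; k++)
        {
            var s = steps[k];
            Console.WriteLine(s.IsInsert
                ? $"STEP {k + 1} : insert index = {s.Index} value = {s.Value}"
                : $"STEP {k + 1} : remove index = {s.Index}");
        }
    }
}
```

For the two arrays from the question this produces four steps: insert 2 at index 0, remove index 1, remove index 3, insert 45 at index 4. That is the same path as in the diagram, with the first two steps applied in the opposite order to the question's own solution.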

Fastest way for Linq to find duplicate Lists?

Given a data structure of:
class TheClass
{
    public int NodeID;
    public double Cost;
    public List<int> NodeIDs;
}
And a List with data:
27 -- 10.0 -- 1, 5, 27
27 -- 10.0 -- 1, 5, 27
27 -- 10.0 -- 1, 5, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
27 -- 10.0 -- 1, 4, 25, 26, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
35 -- 10.0 -- 1, 4, 13, 14, 35
I want to reduce it to the unique NodeIDs lists
27 -- 10.0 -- 1, 5, 27
27 -- 15.5 -- 1, 4, 13, 14, 27
27 -- 10.0 -- 1, 4, 25, 26, 27
35 -- 10.0 -- 1, 4, 13, 14, 35
Then I'll be summing the Cost column (Node 27 total cost: 10.0 + 15.5 + 10.0 = 35.5) -- that part is straightforward.
What is the fastest way to remove the duplicate rows / find uniques?
The production data set will have NodeIDs lists of 100 to 200 IDs each, with about 1,500 rows in the List, around 500 of them unique.
I'm 100% focused on speed -- if adding some other data would help, I'm happy to (I've tried hashing the lists into a SHA value, but that turned out slower than my current grunt exhaustive search).
.GroupBy(x=> string.Join(",", x.NodeIDs)).Select(x=>x.First())
That should be faster for big data than Distinct.
If you want to remove duplicate objects according to equal lists you could create a custom IEqualityComparer<T> for lists and use that for Enumerable.GroupBy. Then you just need to create new instances of your class for each group and sum up Cost.
Here is a possible implementation (from):
public class ListEqualityComparer<T> : IEqualityComparer<List<T>>
{
    public bool Equals(List<T> lhs, List<T> rhs)
    {
        return lhs.SequenceEqual(rhs);
    }
    public int GetHashCode(List<T> list)
    {
        unchecked
        {
            int hash = 23;
            foreach (T item in list)
            {
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}
and here is a query that selects one (unique) instance per group:
var nodes = new List<TheClass>(); // fill ....
var uniqueAndSummedNodes = nodes
    .GroupBy(n => n.NodeIDs, new ListEqualityComparer<int>())
    .Select(grp => new TheClass
    {
        NodeID = grp.First().NodeID, // just use the first, change accordingly
        Cost = grp.Sum(n => n.Cost),
        NodeIDs = grp.Key
    });
nodes = uniqueAndSummedNodes.ToList();
This implementation uses SequenceEqual, which takes the order and the number of occurrences of each number in the list into account.
Edit: I've only just seen that you don't want to sum up the Costs within each group but the Cost across all groups. That's simple:
double totalCost = nodes.Sum(n => n.Cost);
If you don't want to sum up the costs within a group, replace
...
Cost = grp.Sum(n => n.Cost),
with
...
Cost = grp.First().Cost, // presumes that all are the same
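To see the comparer-based grouping at work on the sample rows from the question, a self-contained sketch could look like this. The class DuplicateListDemo and its methods UniqueRows and CostPerNode are hypothetical names, TheClass is given public properties here as an assumption, and no claim is made about how this performs on the production data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TheClass
{
    public int NodeID { get; set; }
    public double Cost { get; set; }
    public List<int> NodeIDs { get; set; }
}

// Compares lists element by element, in order.
public class ListEqualityComparer<T> : IEqualityComparer<List<T>>
{
    public bool Equals(List<T> lhs, List<T> rhs) => lhs.SequenceEqual(rhs);

    public int GetHashCode(List<T> list)
    {
        unchecked
        {
            int hash = 23;
            foreach (T item in list)
            {
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}

public static class DuplicateListDemo
{
    private static TheClass Row(int nodeId, double cost, params int[] ids) =>
        new TheClass { NodeID = nodeId, Cost = cost, NodeIDs = ids.ToList() };

    // The seven sample rows from the question.
    public static List<TheClass> SampleRows() => new List<TheClass>
    {
        Row(27, 10.0, 1, 5, 27),
        Row(27, 10.0, 1, 5, 27),
        Row(27, 10.0, 1, 5, 27),
        Row(27, 15.5, 1, 4, 13, 14, 27),
        Row(27, 10.0, 1, 4, 25, 26, 27),
        Row(27, 15.5, 1, 4, 13, 14, 27),
        Row(35, 10.0, 1, 4, 13, 14, 35),
    };

    // Keeps the first row of every distinct NodeIDs list.
    public static List<TheClass> UniqueRows(IEnumerable<TheClass> rows) =>
        rows.GroupBy(r => r.NodeIDs, new ListEqualityComparer<int>())
            .Select(g => g.First())
            .ToList();

    // Sums Cost per NodeID over the unique rows.
    public static Dictionary<int, double> CostPerNode(IEnumerable<TheClass> uniqueRows) =>
        uniqueRows.GroupBy(r => r.NodeID)
                  .ToDictionary(g => g.Key, g => g.Sum(r => r.Cost));

    public static void Main()
    {
        var unique = UniqueRows(SampleRows());
        foreach (var pair in CostPerNode(unique))
        {
            Console.WriteLine($"Node {pair.Key} total cost: {pair.Value}");
        }
    }
}
```

On the sample data this leaves four unique rows and a total of 35.5 for node 27, matching the figures in the question.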

copy an array from x to y not x to array.length

I've been trying to copy arrays so that I can crunch the data in an array with multiple threads, which obviously requires splitting the array into smaller chunks (let's say 1 array -> 4 quarters, i.e. 4 arrays).
The only method I can find copies from a specified (int) start point all the way to the end of the array, which defeats the point of using multiple threads to crunch the data.
Here is pseudo code to show what I wish to do.
int array { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }
int split1 { 0, 1, 2, 3 }
int split2 { 4, 5, 6, 7 }
int split3 { 8, 9, 10, 11 }
int split4 { 12, 13, 14, 15 }
Or, let's say the length of the array can't be split up evenly:
int array { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 }
int split1 { 0, 1, 2, 3 }
int split2 { 4, 5, 6, 7 }
int split3 { 8, 9, 10, 11 }
int split4 { 12, 13, 14, 15, 16}
The only method I can find copies from a specified (int) start point all the way to the end of the array, which defeats the point of using multiple threads to crunch the data.
It's a shame you didn't show which method that was. Array.Copy has various overloads for copying part of an array to another array. This one is probably the most helpful:
public static void Copy(
    Array sourceArray,
    int sourceIndex,
    Array destinationArray,
    int destinationIndex,
    int length
)
Alternatively, look at Buffer.BlockCopy, which has basically the same signature - but the values are all in terms of bytes rather than array indexes. It also only works with arrays of primitives.
Another alternative would be not to create copies of the array at all - if each thread knows which segment of the array it should work with, it can access that directly. You should also look into Parallel.ForEach (and similar methods) as a way of parallelizing operations easily at a higher level.
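As an illustration of the five-argument Array.Copy overload, the split from the question could be sketched like this. The names ArraySplitter and Split are hypothetical, and putting the leftover elements into the last chunk is just the layout the question's pseudocode suggests.

```csharp
using System;

public static class ArraySplitter
{
    // Hypothetical helper: copies source into `parts` new arrays of equal length;
    // any leftover elements go into the last chunk.
    public static int[][] Split(int[] source, int parts)
    {
        if (parts <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(parts), "Must be positive.");
        }

        int baseLength = source.Length / parts;
        var chunks = new int[parts][];
        for (int p = 0; p < parts; p++)
        {
            int start = p * baseLength;
            int length = (p == parts - 1) ? source.Length - start : baseLength;
            chunks[p] = new int[length];
            // Copy `length` elements starting at `start`, not everything up to the end.
            Array.Copy(source, start, chunks[p], 0, length);
        }
        return chunks;
    }

    public static void Main()
    {
        int[] data = new int[17];
        for (int i = 0; i < data.Length; i++) data[i] = i;

        foreach (int[] chunk in Split(data, 4))
        {
            Console.WriteLine(string.Join(", ", chunk));
        }
    }
}
```

If the threads only read or update their own range, passing each thread a start index and a length into the original array avoids the copies altogether, as suggested above.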

How to determine the number of right bit-shifts needed for a power of two value?

I have a function that receives a power of two value.
I need to convert it to an enum range (0, 1, 2, 3, and so on), and then shift it back to the power of two range.
0 1
1 2
2 4
3 8
4 16
5 32
6 64
7 128
8 256
9 512
10 1024
... and so on.
If my function receives a value of 1024, I need to convert it to 10. What is the best way to do this in C#? Should I just keep dividing by 2 in a loop and count the iterations?
I know I can put it back with (1 << 10).
Just use the logarithm of base 2:
Math.Log(/* your number */, 2)
For example, Math.Log(1024, 2) returns 10.
Update:
Here's a rather robust version that checks if the number passed in is a power of two:
public static int Log2(uint number)
{
    var isPowerOfTwo = number > 0 && (number & (number - 1)) == 0;
    if (!isPowerOfTwo)
    {
        throw new ArgumentException("Not a power of two", "number");
    }
    return (int)Math.Log(number, 2);
}
The check for number being a power of two is taken from http://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
There are more tricks to find log2 of an integer on that page, starting here:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
This is probably the fastest algorithm when your CPU doesn't have a bit scan instruction or you can't access that instruction:
unsigned int v; // find the number of trailing zeros in 32-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
See this paper if you want to know how it works; basically, it's just a perfect hash.
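Since the question asks about C#, here is a sketch of two integer-only variants that avoid floating-point rounding altogether: the simple shift loop the asker mentions, and a C# port of the De Bruijn lookup shown above. The class and method names (PowerOfTwo, ShiftCountLoop, ShiftCountDeBruijn) are hypothetical. On .NET Core 3.0 and later, System.Numerics.BitOperations also offers Log2 and TrailingZeroCount, which may be preferable where available.

```csharp
using System;

public static class PowerOfTwo
{
    // Lookup table from the De Bruijn bit hack quoted above.
    private static readonly int[] DeBruijnPosition =
    {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };

    private static void EnsurePowerOfTwo(uint number)
    {
        if (number == 0 || (number & (number - 1)) != 0)
        {
            throw new ArgumentException("Not a power of two", nameof(number));
        }
    }

    // Counts how many right shifts reduce the value to 1.
    public static int ShiftCountLoop(uint number)
    {
        EnsurePowerOfTwo(number);
        int shifts = 0;
        while (number > 1)
        {
            number >>= 1;
            shifts++;
        }
        return shifts;
    }

    // Same result via the De Bruijn multiplication; the multiplication is meant to wrap.
    public static int ShiftCountDeBruijn(uint number)
    {
        EnsurePowerOfTwo(number);
        unchecked
        {
            uint lowestBit = number & (~number + 1);   // equivalent to v & -v in C
            return DeBruijnPosition[(lowestBit * 0x077CB531U) >> 27];
        }
    }

    public static void Main()
    {
        Console.WriteLine(ShiftCountLoop(1024));
        Console.WriteLine(ShiftCountDeBruijn(1024));
    }
}
```

Both methods return 10 for an input of 1024, and the result can be turned back into the power of two with 1u << result.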
Use _BitScanForward (a Microsoft C/C++ compiler intrinsic). It does exactly this.
