Alternative to hashing for quick comparison to avoid conflicts - c#

I'm implementing a caching table to avoid having to perform a costly operation that creates a generic object from a set of parameters describing it. Once an object is requested, a hash of these parameters is computed, and a Dictionary containing the already created objects is queried to check whether a copy has already been created, in which case it is returned without the need to create it again.
My problem lies in the fact that, since the parameters describing these objects can be many, collisions in the hashing function are unavoidable (and too frequent), but on the other hand retrieving these objects is a performance-critical operation and I cannot afford full comparisons against all existing descriptions to search among the already created objects.
I've tried many different hashing functions, but since the nature of the parameters is unknown the results are unreliable.
What solutions other than hashing are there to this caching problem, or can hashing be used differently to avoid conflicts?
C# description of the problem:
class ObjectDescriptor
{
// description made of a list of parameters of unknown type
public object[] Fields;
// hashing procedure that may have conflicts
public override int GetHashCode()
{
int hash = 1009;
for (int i = 0; i < Fields.Length; i++)
{
unchecked { hash = hash * 9176 + Fields[i].GetHashCode(); }
}
return hash;
}
}
abstract class ObjectCache<T>
{
private Dictionary<int, T> indexedObjects = new Dictionary<int, T>();
// this operation is called many times and must be fast
public T Get(ObjectDescriptor descr)
{
T cachedValue;
if(!indexedObjects.TryGetValue(descr.GetHashCode(), out cachedValue))
{
cachedValue = CreateObject(descr);
indexedObjects[descr.GetHashCode()] = cachedValue;
}
return cachedValue;
}
// costly operation
protected abstract T CreateObject(ObjectDescriptor desc);
}

I'll leave here the solution I ended up using. It is based on the fact that conflicts can be avoided by storing the whole values of multiple fields in a single hash where possible:
byte b1 = 42, b2 = 255;
int naiveHash = CombineHash(b1.GetHashCode(), b2.GetHashCode()); // will always have conflicts
int typeAwareHash = (b1 << 8) + b2; // no conflicts
To know how many bits are required by a field, I require fields to implement IObjectDescriptorField:
interface IObjectDescriptorField
{
int GetHashCodeBitCount();
}
I then updated the ObjectDescriptor class to use a HashCodeBuilder class:
class ObjectDescriptor
{
public IObjectDescriptorField[] Fields;
public override int GetHashCode()
{
HashCodeBuilder hash = new HashCodeBuilder();
for (int i = 0; i < Fields.Length; i++)
{
hash.AddBits(Fields[i].GetHashCode(), Fields[i].GetHashCodeBitCount());
}
return hash.GetHashCode();
}
}
HashCodeBuilder stacks up bits until all 32 are used, and then falls back to a simple hash combination function like before:
public class HashCodeBuilder
{
private const int HASH_SEED = 352654597;
private static int Combine(int hash1, int hash2)
{
return ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ hash2;
}
private int hashAccumulator;
private int bitAccumulator;
private int bitsLeft;
public HashCodeBuilder()
{
hashAccumulator = HASH_SEED;
bitAccumulator = 0;
bitsLeft = 32;
}
public void AddBits(int bits, int bitCount)
{
if (bitsLeft < bitCount)
{
hashAccumulator = Combine(hashAccumulator, bitAccumulator);
bitAccumulator = 0;
bitsLeft = 32;
}
bitAccumulator = (bitAccumulator << bitCount) + bits;
bitsLeft -= bitCount;
}
public override int GetHashCode()
{
return Combine(hashAccumulator, bitAccumulator);
}
}
This solution of course still has conflicts if more than 32 bits are used, but it worked for me because many of the fields were just bools or enums with few values, which greatly benefit from being combined like this.
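As an illustration of how a field can report its bit count (this example type is mine, not part of the original solution), a bool-backed field needs only a single bit in the packed hash:
// Hypothetical field type: a bool contributes exactly 1 bit to the packed hash.
class BoolField : IObjectDescriptorField
{
    public bool Value;

    public int GetHashCodeBitCount()
    {
        return 1;
    }

    public override int GetHashCode()
    {
        return Value ? 1 : 0;
    }
}
An enum with, say, five values would similarly return 3 from GetHashCodeBitCount() and its underlying integer from GetHashCode().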

Related

Wrapping C# primitives in struct performance implications

I am writing some code for geometry processing, Delaunay triangulation to be more specific, and I need it to be fast, so I use simple arrays of primitives as my data structure to represent the triangulation information. Here is a sample of it:
private readonly float2[] points;
private readonly int[] pointsHalfEdgeStartCount;
private readonly int[] pointsIncomingHalfEdgeIndexes;
So let's say I want to iterate quickly through all the incoming half-edges of the point at index p; I just do this using the precomputed arrays:
int count = pointsHalfEdgeStartCount[p * 2 + 1];
for (int i = 0; i < count; i++)
{
var e = pointsIncomingHalfEdgeIndexes[pointsHalfEdgeStartCount[p * 2] + i];
}
// pointsHalfEdgeStartCount[p * 2] is the start index
And this is fast enough, but it does not feel safe or very clear. So I had the idea of wrapping my indexes into structs to make things clearer while retaining the performance, something like this:
public readonly struct Point
{
public readonly int index;
public readonly DelaunayTriangulation delaunay;
public Point(int index, DelaunayTriangulation delaunay)
{
this.index = index;
this.delaunay = delaunay;
}
public int GetIncomingHalfEdgeCount() => delaunay.pointsEdgeStartCount[index * 2 + 1];
public HalfEdge GetIncomingHalfEdge(int i)
{
return new HalfEdge(
delaunay,
delaunay.pointsIncomingHalfEdgeIndexes[delaunay.pointsEdgeStartCount[index * 2] + i]
);
}
//... other methods
}
Then I can just do so:
int count = p.GetIncomingHalfEdgeCount();
for (int i = 0; i < count; i++)
{
var e = p.GetIncomingHalfEdge(i);
}
However it was kind of killing my performance, being a lot slower (around 10 times) in a benchmark I did, iterating over all the points and over all their incoming half-edges. I guess this is because storing a reference to the Delaunay triangulation in each Point struct was an obvious waste and slowed down all the operations involving points, since there is twice the amount of data to move around.
I could make the DelaunayTriangulation a static class, but that was not practical for other reasons, so I did this:
public readonly struct Point
{
public readonly int index;
public Point(int index) => this.index = index;
public int GetIncomingHalfEdgeCount(DelaunayTriangulation delaunay) => delaunay.pointsEdgeStartCount[index * 2 + 1];
public HalfEdge GetIncomingHalfEdge(DelaunayTriangulation delaunay, int i)
{
return new HalfEdge(
delaunay.pointsIncomingHalfEdgeIndexes[delaunay.pointsEdgeStartCount[index * 2] + i]
);
}
//... other methods
}
I can just do so:
int count = p.GetIncomingHalfEdgeCount(delaunay);
for (int i = 0; i < count; i++)
{
var e = p.GetIncomingHalfEdge(delaunay, i);
}
It was quite a lot faster, but still 2.5 times slower than the first method using simple ints. I wondered if it could be because I was getting an int in the first method while I got a HalfEdge struct in the other methods (a struct similar to the Point struct: it contains only an index as data and a couple of methods), and indeed the difference between the plain int and the faster struct vanished when I used the e int to instantiate a new HalfEdge struct, though I am not sure why that is so costly. Weirder still, I explored, for clarity's sake, the option of writing the methods inside the Delaunay class instead of the Point struct:
// In the DelaunayTriangulation class:
public int GetPointIncomingHalfEdgeCount(Point p) => pointsEdgeStartCount[p.index * 2 + 1];
public HalfEdge GetPointIncomingHalfEdge(Point p, int i)
{
return new HalfEdge(
pointsIncomingHalfEdgeIndexes[pointsEdgeStartCount[p.index * 2] + i]
);
}
And I used it like this:
int count = delaunay.GetPointIncomingHalfEdgeCount(p);
for (int i = 0; i < count; i++)
{
var e = delaunay.GetPointIncomingHalfEdge(p, i);
}
And it was 3 times slower than the previous method! I have no idea why.
I tried to use disassembly to see what machine code was generated, but I failed to do so (I am working with Unity3D). Am I condemned to rely on plain ints in arrays and sane variable naming, and to give up on having some compile-time type checking (is this int really a point index?)?
I am not even bringing up other questions, such as why it is even slower when I try to use IEnumerable types with yield, like so:
public IEnumerable<int> GetPointIncomingHalfEdges(Point p)
{
int start = pointsEdgeStartCount[p.index * 2]; // this should be a slight optimization right ?
int count = pointsEdgeStartCount[p.index * 2 + 1];
for (int i = 0; i < count; i++)
{
yield return pointsIncomingHalfEdgeIndexes[start + i];
}
}
I have added a compiler directive for aggressive inlining and it seems to make up for the discrepancies in time! For some reason the compiler fails to inline correctly for:
var e = delaunay.GetPointIncomingHalfEdge(p, i);
While it managed to do so with
var e = p.GetIncomingHalfEdge(delaunay, i);
Why? I do not know. However, it would be far easier if I were able to see how the code is compiled, and I could not find out how to do that. I will look into that, maybe open another question, and if I find a better explanation I will come back!
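For reference, a minimal sketch of the aggressive-inlining hint mentioned above, applied to one of the accessors (this is the standard MethodImplAttribute from System.Runtime.CompilerServices; it assumes, as in the post, that the DelaunayTriangulation fields are accessible to the struct):
using System.Runtime.CompilerServices;
public readonly struct Point
{
    public readonly int index;

    public Point(int index) => this.index = index;

    // Hint to the JIT that this small accessor should be inlined at the call site.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int GetIncomingHalfEdgeCount(DelaunayTriangulation delaunay)
        => delaunay.pointsEdgeStartCount[index * 2 + 1];
}
Whether the JIT actually honors the hint still has to be checked with a profiler, as in the measurements above.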

filtering out list of unordered int from a bigger list

I have an unordered list of ints: between 80 and 140 items, with each value between 0 and 175.
I'm generating a list of those lists, about 5 to 10 million of them.
I need to process, as fast as possible, every unique sequence (excluding duplicates); the order of the items doesn't matter.
The way I'm doing it right now is computing a hash of all the values of a list and inserting it into a HashSet.
The two hot spots while profiling are the ToArray() (HOTSPOT1) and Array.Sort() (HOTSPOT2).
Is there a better way of doing that task, or a better alternative to fix the two hot spots? Speed is important.
Small demo; I tried to replicate the real code as much as possible:
using System;
using System.Collections.Generic;
using System.Linq;
namespace ConsoleApp1
{
class Example
{
//some other properties
public int Id { get; set; }
}
class Program
{
static void Main(string[] args)
{
var checkedUnlock = new HashSet<int>();
var data = FakeData();
foreach (List<Example> subList in data)
{
var hash = CalcHash(subList.Select(x => x.Id).ToArray()); // HOTSPOT1
var newHash = checkedUnlock.Add(hash);
if (newHash)
{
//do something
}
}
}
static int CalcHash(int[] value)
{
Array.Sort(value); // HOTSPOT2
int hash;
unchecked // https://stackoverflow.com/a/263416/40868
{
hash = (int)2166136261;
var i = value.Length;
while (i-- > 0)
hash = (hash * 16777619) ^ value[i];
}
return hash;
}
//don't look at this, this is just to fake data
static List<List<Example>> FakeData()
{
var data = new List<List<Example>>();
var jMax = 10; //normally between 80 and 140
var idMax = 25; //normally between 0 and 175
var rnd = new Random(42);
var ids = Enumerable.Range(0, idMax).ToArray();
for (int i = 0; i < 500000; ++i)
{
//force duplicate
if(i % 50000 == 0)
{
ids = Enumerable.Range(0, idMax).ToArray();
rnd = new Random(42);
}
for (int r = 0; r < idMax; ++r)
{
int randomIndex = rnd.Next(idMax);
int temp = ids[randomIndex];
ids[randomIndex] = ids[r];
ids[r] = temp;
}
var subList = new List<Example>();
data.Add(subList);
for (int j = 0; j < jMax; ++j)
{
subList.Add(new Example() { Id = ids[j] });
}
}
return data;
}
}
}
So you have an array that can contain up to 140 items, and all values are in the range 0 through 175. All values in the array are unique, and order doesn't matter. That is, the array [20, 90, 16] will be considered the same as [16, 20, 90].
Given that, you can represent a single array as a set of 175 bits. Better, you can create the set without having to sort the input array.
You represent a set in C# as a BitArray. To compute the hash code of your array, you create the set, and then you iterate over the set to get the hash code. It looks something like this:
private BitArray HashCalcSet = new BitArray(175);
int CalcHash(int[] a, int startIndex)
{
// construct the set
HashCalcSet.SetAll(false);
for (var i = startIndex; i < a.Length; ++i)
{
HashCalcSet[a[i]] = true;
}
// compute the hash
int hash = unchecked((int)2166136261);
for (var i = 174; i >= 0; --i)
{
if (HashCalcSet[i])
{
hash = (hash * 16777619) ^ i;
}
}
return hash;
}
That eliminates the sorting as well as the ToArray. You have to loop over the BitArray a couple of times, but three passes over the BitArray is quite possibly faster than sorting.
One problem I see with your solution is in how you're using the HashSet. You have this code:
var hash = CalcHash(subList.Select(x => x.Id).ToArray()); // HOTSPOT1
var newHash = checkedUnlock.Add(hash);
if (newHash)
{
//do something
}
That code mistakenly assumes that if the hash codes for two arrays are equal, then the arrays are equal. You're generating a 32-bit hash code for a 175-bit quantity. There will definitely be hash collisions. You're going to end up saying that two of your arrays are identical, when they aren't.
If that is a concern to you, let me know and I can edit my answer to provide a solution.
Allowing for comparison
If you want the ability to compare items for equality, rather than just checking if their hash codes are the same, you need to create an object that has Equals and GetHashCode methods. You'll insert that object into your HashSet. The simplest of those objects would contain the BitArray I described above, and methods that operate on it. Something like:
class ArrayObject
{
private BitArray theBits;
private int hashCode;
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
{
return false;
}
ArrayObject other = (ArrayObject)obj;
// compare two BitArray objects
for (var i = 0; i < theBits.Length; ++i)
{
if (theBits[i] != other.theBits[i])
return false;
}
return true;
}
public override int GetHashCode()
{
return hashCode;
}
public ArrayObject(int hash, BitArray bits)
{
theBits = bits;
hashCode = hash;
}
}
The idea being that you construct the BitArray and the hash code in the method as described above (although you'll have to allocate a new BitArray for each call), and then create and return one of these ArrayObject instances.
Your HashSet becomes HashSet<ArrayObject>.
The above works, but it's a bit of a memory hog. You could reduce the memory requirement by creating a class that contains just three long integers. Instead of using a BitArray, you manipulate the bits directly. You map the bits so that values 0 through 63 modify bits 0 through 63 of the first number, values 64 through 127 correspond to bits 0 through 63 of the second number, etc. You don't have to save a separate hash code then, because it's easy to compute from the three longs, and equality comparison becomes a lot easier, too.
The class looks something like this. Understand, I haven't tested the code, but the idea should be sound.
class ArrayObject2
{
private long l1;
private long l2;
private long l3;
public ArrayObject2(int[] theArray)
{
for (int i = 0; i < theArray.Length; ++i)
{
var value = theArray[i];
long bitVal = 1L << (value % 64);
if (value < 64) l1 |= bitVal;
else if (value < 128) l2 |= bitVal;
else l3 |= bitVal;
}
}
public override bool Equals(object obj)
{
var other = obj as ArrayObject2;
if (other == null) return false;
return l1 == other.l1 && l2 == other.l2 && l3 == other.l3;
}
public override int GetHashCode()
{
// very simple, and not very good hash function.
return (int)l1;
}
}
As I commented in the code, the hash function there isn't great. It will work, but you can do better with a little research.
This approach has the advantage of using less memory than the BitArray or the Boolean array. It'll probably be slower than the array of bools. It might be faster than the BitArray code. But whatever the case, it'll keep you from making the mistaken assumption that identical hash codes mean identical arrays.
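If you do want a stronger hash than (int)l1, one option (my sketch, not part of the original answer) is to mix all three longs with the same FNV-style constants used elsewhere in this thread; it would replace ArrayObject2.GetHashCode:
public override int GetHashCode()
{
    // Mix all three longs instead of truncating the first one.
    unchecked
    {
        int hash = (int)2166136261;
        hash = (hash * 16777619) ^ l1.GetHashCode();
        hash = (hash * 16777619) ^ l2.GetHashCode();
        hash = (hash * 16777619) ^ l3.GetHashCode();
        return hash;
    }
}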
I think you can save some time by re-using one array of a bigger size instead of allocating a new array every time, which causes extra memory traffic and garbage collection.
That would require a custom sorting implementation which knows that even though the array can hold 1000 items, only the first 80 items need to be sorted for the current run (and the same for the hash). It looks like quicksort operating on a subrange of ids should work fine. A quick sample of the idea (haven't tested it in detail):
int[] buffer = new int[1000];
foreach (List<Example> subList in data)
{
for (int i = 0; i < subList.Count; i++)
{
buffer[i] = subList[i].Id;
}
var hash = CalcHashEx(buffer, 0, subList.Count - 1);
var newHash = checkedUnlock.Add(hash);
if (newHash)
{
//do something
}
}
public static void QuickSort(int[] elements, int left, int right)
{
int i = left, j = right;
int pivot = elements[(left + right) / 2];
while (i <= j)
{
while (elements[i] < pivot)
{
i++;
}
while (elements[j] > pivot)
{
j--;
}
if (i <= j)
{
// Swap
int tmp = elements[i];
elements[i] = elements[j];
elements[j] = tmp;
i++;
j--;
}
}
if (left < j)
{
QuickSort(elements, left, j);
}
if (i < right)
{
QuickSort(elements, i, right);
}
}
static int CalcHashEx(int[] value, int startIndex, int endIndex)
{
QuickSort(value, startIndex, endIndex);
int hash;
unchecked // https://stackoverflow.com/a/263416/40868
{
hash = (int)2166136261;
var i = endIndex + 1;
while (i-- > 0)
hash = (hash * 16777619) ^ value[i];
}
return hash;
}
This version of CalcHash() lets you remove the .ToArray() and replaces Array.Sort() with something that can act on a sequence rather than needing the entire set, so it addresses both hot spots.
static int CalcHash(IEnumerable<int> value)
{
value = value.OrderByDescending(i => i);
int hash;
unchecked // https://stackoverflow.com/a/263416/40868
{
hash = (int)2166136261;
foreach(var item in value)
{
hash = (hash * 16777619) ^ item;
}
}
return hash;
}
I'm not sure how OrderByDescending() will fare in comparison. I suspect it will be slower than Array.Sort(), but still be an over-all win because of eliminating ToArray()... but you'll need to run the profiler again to know for sure.
There may also be improvement you can get from eliminating or reducing branching, via .GroupBy(), and running the code on the .First() item in each group:
var groups = data.GroupBy(sub => CalcHash(sub.Select(x => x.Id)));
foreach(List<Example> subList in groups.Select(g => g.First()))
{
//do something
}
Going to put this here since it makes no sense to put it in a comment.
So far what I have done is create an array of booleans, setting the index of each item present to true, and I replaced the CalcHash body with:
unchecked
{
hash = (int)2166136261;
var i = theMaxLength;
while (i-- > 0)
if(testing[i]) //the array of boolean
{
hash = (hash * 16777619) ^ i;
testing[i] = false;
}
}
Doing so I removed the ToArray() and the Array.Sort() completely; this solution is kind of built from the dlxeon/Jim/Joel answers.
I dropped the runtime by about 20-25%, which is great.
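For completeness, a minimal end-to-end sketch of that boolean-array variant as I read it from the fragments above (the value 176 for theMaxLength and the surrounding loop are my assumptions; the other names mirror the earlier demo):
const int theMaxLength = 176;          // values range from 0 to 175
var testing = new bool[theMaxLength];  // reused across all sub-lists
var checkedUnlock = new HashSet<int>();
foreach (List<Example> subList in data)
{
    // mark the values present in this sub-list
    for (int k = 0; k < subList.Count; k++)
        testing[subList[k].Id] = true;
    int hash;
    unchecked
    {
        hash = (int)2166136261;
        var i = theMaxLength;
        while (i-- > 0)
            if (testing[i]) // the array of booleans
            {
                hash = (hash * 16777619) ^ i;
                testing[i] = false; // reset for the next sub-list
            }
    }
    if (checkedUnlock.Add(hash))
    {
        // do something
    }
}
No ToArray() and no Array.Sort(): the boolean array acts as the order-free canonical form, at the cost of the same hash-collision caveat discussed above.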

C# How to reduce repeat code?

I've largely taught myself C#. I took a class in college that didn't help much; sure, I learned how to follow a book. So I've created tools here and there, learning by example online. Stack Overflow is my favorite site for that; usually the community is helpful.
Anyway, my question is this: I'm creating a WPF program that does multiple calculations on a number of decimals, and I've been getting by reusing the same calculation code over and over again. It works fine, but I know there is a simpler way of doing it with far fewer lines; I just seem to be missing that knowledge.
Here is an example of how I would do things:
int int1 = 4;
int int2 = 2;
int int3 = 6;
int int4 = 8;
int calc = 0;
int calc2 = 0;
int calc3 = 0;

calc = int1 * int4;
calc2 = int1 * int2;
calc3 = int3 * int3;

if (calc >= calc3)
{
    // do something
}
else
{
    calc = 33;
}
if (calc2 >= calc3)
{
    // do something
}
else
{
    calc2 = 33;
}
if (calc3 >= calc2)
{
    // do something
}
else
{
    calc3 = 33;
}
if (calc3 >= calc)
{
    // do something
}
else
{
    calc2 = 33;
}
I hope that is clear enough. I can repeat the code, but I am just unsure how to use C# better; I know it has built-in ways of reducing repeated code, I just don't know how to find them.
Any help or examples are appreciated.
The simplest solution that pops out to me is to turn it into a method. (I will leave the access modifier for the function up to you...it depends on where you will be reusing this code)
int CustomCompare(int leftHandSide, int rightHandSide)
{
    if (leftHandSide >= rightHandSide)
    {
        // do something
    }
    else
    {
        leftHandSide = 33;
    }
    return leftHandSide;
}
You would just pass in your variables:
calc = CustomCompare(calc, calc3);
You could even change the do something portion to be a custom action that you pass in if you want. Take a look at Action in MSDN
int CustomCompare(int leftHandSide, int rightHandSide, Action doSomething)
{
    if (leftHandSide >= rightHandSide)
    {
        doSomething();
    }
    else
    {
        leftHandSide = 33;
    }
    return leftHandSide;
}
...
calc = CustomCompare(calc, calc3,
    () => { /* do some stuff that will be executed inside the method */ });
And Func can allow you to return a value from that doSomething action
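For example, one possible shape of a Func-based version (this signature is just an illustration, not taken from the answer above):
int CustomCompare(int leftHandSide, int rightHandSide, Func<int, int> doSomething)
{
    if (leftHandSide >= rightHandSide)
    {
        // the caller-supplied function computes and returns the new value
        return doSomething(leftHandSide);
    }
    return 33;
}
// usage: calc = CustomCompare(calc, calc3, value => value * 2);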
The simplest way to reuse code among methods of the same class like that is to define a private method for that calculation. This way you'd be able to reference that code by calling a method, rather than by copy-pasting some code. In fact, every time you copy-paste, you know you're missing a method.
If you need to share code among related classes, you can make a protected method in the base class.
Finally, for project-wide "horizontal" reuse you can define a static helper class and add your methods to it as public static. This way every class in your project would be able to reuse your calculation.
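A minimal sketch of such a helper (the class and method names here are made up for illustration):
// Any class in the project can call MathHelpers.CompareOrDefault(...).
public static class MathHelpers
{
    public static int CompareOrDefault(int leftHandSide, int rightHandSide, int fallback = 33)
    {
        if (leftHandSide >= rightHandSide)
        {
            // do something
            return leftHandSide;
        }
        return fallback;
    }
}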
How about creating a private method within your class, then calling the method when you need a calculation done? This eliminates rewriting code over and over again.
Example:
int int1 = 4;
int calc1 = 0;
Calculation(int1, calc1);
int int2 = 2;
int calc2 = 0;
Calculation(int2, calc2);
//private method
private void Calculation(int integer, int calculation)
{
//calculate
}
Side note: I prefer to arrange all variables first, then act (function calls, etc.) on them, based on the Arrange-Act-Assert pattern associated with unit testing. However, I did it this way to emphasize my point.
Err.... call a function?
doCalc(4, 2, 6, 8);
public static void doCalc(int int1, int int2, int int3, int int4)
{
    int calc = int1 * int4;
    int calc2 = int1 * int2;
    int calc3 = int3 * int3;

    if (calc >= calc3)
    {
        // do something
    }
    else
    {
        calc = 33;
    }
    if (calc2 >= calc3)
    {
        // do something
    }
    else
    {
        calc2 = 33;
    }
    if (calc3 >= calc2)
    {
        // do something
    }
    else
    {
        calc3 = 33;
    }
    if (calc3 >= calc)
    {
        // do something
    }
    else
    {
        calc2 = 33;
    }
}
Also, mind the indentation: when you start a new scope, add a few spaces to what's inside it.

How can I make a hashcode for a custom data structure?

I've made a custom "Coordinate" data structure which defines the position of an object according to a certain system.
A coordinate is defined as follows:
public class Coordinate
{
public int X;
public int Y;
private int face;
public int Face
{
get { return face; }
set
{
if (value >= 6 | value < 0)
throw new Exception("Invalid face number");
else
face = value;
}
}
private int shell;
public int Shell
{
get { return shell; }
set
{
if (value < 0)
throw new Exception("No negative shell value allowed");
else
shell = value;
}
}
public Coordinate(int face, int x, int y, int shell)
{
this.X = x;
this.Y = y;
this.face = face;
this.shell = shell;
}
public static Coordinate operator +(Coordinate a, Coordinate b)
{
return new Coordinate(a.Face + b.Face, a.X + b.X, a.Y + b.Y, a.Shell + b.Shell);
}
public override bool Equals(object obj)
{
Coordinate other = (obj as Coordinate);
if (other == null)
return false;
else
return (Face == other.Face && Shell == other.Shell && X == other.X && Y == other.Y);
}
}
Or, to summarize, it contains an int Face (0 to 5), an int X, int Y, and int Shell. X, Y, and Shell are all bound below at 0 (inclusive).
I have no experience at all in hash codes. I need to compare them to see if they are equal. I tried this:
private const int MULTIPLIER = 89;
[...]
int hashCode = 1;
hashCode = MULTIPLIER * hashCode + obj.X.GetHashCode();
hashCode = MULTIPLIER * hashCode + obj.Y.GetHashCode();
hashCode = MULTIPLIER * hashCode + obj.Face.GetHashCode();
hashCode = MULTIPLIER * hashCode + obj.Shell.GetHashCode();
return hashCode;
Going off something I found while Googling. But when I try to compile the code with this method, I'm pretty sure it runs into collisions, as it never finishes building. Probably getting into all sorts of messy loops thinking a bunch of the coordinates are the same or somesuch.
I'm sorry this question is rather elementary, but for some reason I'm stumped. I'm just looking for advice on how to write this hash code so that it doesn't collide.
While this is not the best way, it can be a good enough approach:
public override int GetHashCode()
{
return string.Format("{0}-{1}-{2}-{3}", X, Y, Face, Shell).GetHashCode();
}
Update:
Take a look at this article: http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
Basically, when writing hashcode functions, you need to make sure that:
you don't have stale hashcodes (i.e. the state of the object shouldn't change after a hashcode has been generated, such that the hashcode would change if regenerated)
objects with equal values return the same hashcodes
the same object always returns the same hashcode (if it's not modified) -- deterministic
Also, it's great, but not necessary, if:
your hashcodes are uniformly dispersed over the possible values (source: Wikipedia)
You don't need to ensure that different objects return different hashcodes. It's only frowned upon because it can decrease performance of things like Hashtables (if you have lots of collisions).
However, if you still want your hashcode function to return unique values, then you want to know about perfect hashing.
If you use .NET Core 2.1+, you can use the HashCode struct's Combine method; it's very easy to use and efficient.
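For example, a minimal sketch for the Coordinate class above (assuming .NET Core 2.1+, where System.HashCode is available):
public override int GetHashCode()
{
    // Combines the same fields that participate in Equals into one well-mixed hash.
    return HashCode.Combine(X, Y, Face, Shell);
}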

Sorting Complex Numbers

I have a struct called "Complex" in my project (I built it using C#) and, as the name implies, it's a struct for complex numbers. The struct has a built-in method called "Modulus" so that I can calculate the modulus of a complex number. Things are quite easy up to now.
The thing is, I create an array of this struct and I want to sort the array according to the modulus of the complex numbers it contains (greater to smaller). Is there a way to do that? (Any algorithm suggestions are welcome.)
Thank you!!
Complex[] complexArray = ...
Complex[] sortedArray = complexArray.OrderByDescending(c => c.Modulus()).ToArray();
First of all, you can increase performance by comparing squared moduli instead of moduli.
You don't need the square root: "sqrt(a * a + b * b) >= sqrt(c * c + d * d)" is equivalent to "a * a + b * b >= c * c + d * d".
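A quick numeric sanity check (my own example): for 3 + 4i and 1 + 2i, 3 * 3 + 4 * 4 = 25 >= 1 * 1 + 2 * 2 = 5, and likewise sqrt(25) = 5 >= sqrt(5) ≈ 2.24, so comparing squared moduli preserves the ordering while skipping the square root.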
Then, you can write a comparer to sort complex numbers.
public class ComplexModulusComparer :
IComparer<Complex>,
IComparer
{
public static readonly ComplexModulusComparer Default = new ComplexModulusComparer();
public int Compare(Complex a, Complex b)
{
return a.ModulusSquared().CompareTo(b.ModulusSquared());
}
int IComparer.Compare(object a, object b)
{
return ((Complex)a).ModulusSquared().CompareTo(((Complex)b).ModulusSquared());
}
}
You can write also the reverse comparer, since you want from greater to smaller.
public class ComplexModulusReverseComparer :
IComparer<Complex>,
IComparer
{
public static readonly ComplexModulusReverseComparer Default = new ComplexModulusReverseComparer();
public int Compare(Complex a, Complex b)
{
return - a.ModulusSquared().CompareTo(b.ModulusSquared());
}
int IComparer.Compare(object a, object b)
{
return - ((Complex)a).ModulusSquared().CompareTo(((Complex)b).ModulusSquared());
}
}
To sort an array you can then write two nice extension methods...
public static void SortByModulus(this Complex[] array)
{
Array.Sort(array, ComplexModulusComparer.Default);
}
public static void SortReverseByModulus(this Complex[] array)
{
Array.Sort(array, ComplexModulusReverseComparer.Default);
}
Then in your code...
Complex[] myArray ...;
myArray.SortReverseByModulus();
You can also implement the IComparable, if you wish, but a more correct and formal approach is to use the IComparer from my point of view.
public struct Complex :
IComparable<Complex>
{
public double R;
public double I;
public double Modulus() { return Math.Sqrt(R * R + I * I); }
public double ModulusSquared() { return R * R + I * I; }
public int CompareTo(Complex other)
{
return this.ModulusSquared().CompareTo(other.ModulusSquared());
}
}
And then you can write the ReverseComparer that can apply to every kind of comparer
public class ReverseComparer<T> :
IComparer<T>
{
private IComparer<T> comparer;
public static readonly ReverseComparer<T> Default = new ReverseComparer<T>();
public ReverseComparer() :
this(Comparer<T>.Default)
{
}
public ReverseComparer(IComparer<T> comparer)
{
this.comparer = comparer;
}
public int Compare(T a, T b)
{
return - this.comparer.Compare(a, b);
}
}
Then when you need to sort....
Complex[] array ...;
Array.Sort(array, ReverseComparer<Complex>.Default);
or in case you have another IComparer...
Complex[] array ...;
Array.Sort(array, new ReverseComparer<Complex>(myothercomparer));
RE-EDIT:
OK, I performed some speed tests.
Compiled with C# 4.0, in release mode, launched with all instances of Visual Studio closed.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace TestComplex
{
class Program
{
public struct Complex
{
public double R;
public double I;
public double ModulusSquared()
{
return this.R * this.R + this.I * this.I;
}
}
public class ComplexComparer :
IComparer<Complex>
{
public static readonly ComplexComparer Default = new ComplexComparer();
public int Compare(Complex x, Complex y)
{
return x.ModulusSquared().CompareTo(y.ModulusSquared());
}
}
private static void RandomComplexArray(Complex[] myArray)
{
// We use always the same seed to avoid differences in quicksort.
Random r = new Random(2323);
for (int i = 0; i < myArray.Length; ++i)
{
myArray[i].R = r.NextDouble() * 10;
myArray[i].I = r.NextDouble() * 10;
}
}
static void Main(string[] args)
{
// We perform some first operation to ensure JIT compiled and optimized everything before running the real test.
Stopwatch sw = new Stopwatch();
Complex[] tmp = new Complex[2];
for (int repeat = 0; repeat < 10; ++repeat)
{
sw.Start();
tmp[0] = new Complex() { R = 10, I = 20 };
tmp[1] = new Complex() { R = 30, I = 50 };
ComplexComparer.Default.Compare(tmp[0], tmp[1]);
tmp.OrderByDescending(c => c.ModulusSquared()).ToArray();
sw.Stop();
}
int[] testSizes = new int[] { 5, 100, 1000, 100000, 250000, 1000000 };
for (int testSizeIdx = 0; testSizeIdx < testSizes.Length; ++testSizeIdx)
{
Console.WriteLine("For " + testSizes[testSizeIdx].ToString() + " input ...");
// We create our big array
Complex[] myArray = new Complex[testSizes[testSizeIdx]];
double bestTime = double.MaxValue;
// Now we execute repeatCount times our test.
const int repeatCount = 15;
for (int repeat = 0; repeat < repeatCount; ++repeat)
{
// We fill our array with random data
RandomComplexArray(myArray);
// Now we perform our sorting.
sw.Reset();
sw.Start();
Array.Sort(myArray, ComplexComparer.Default);
sw.Stop();
double elapsed = sw.Elapsed.TotalMilliseconds;
if (elapsed < bestTime)
bestTime = elapsed;
}
Console.WriteLine("Array.Sort best time is " + bestTime.ToString());
// Now we perform our test using linq
bestTime = double.MaxValue; // i forgot this before
for (int repeat = 0; repeat < repeatCount; ++repeat)
{
// We fill our array with random data
RandomComplexArray(myArray);
// Now we perform our sorting.
sw.Reset();
sw.Start();
myArray = myArray.OrderByDescending(c => c.ModulusSquared()).ToArray();
sw.Stop();
double elapsed = sw.Elapsed.TotalMilliseconds;
if (elapsed < bestTime)
bestTime = elapsed;
}
Console.WriteLine("linq best time is " + bestTime.ToString());
Console.WriteLine();
}
Console.WriteLine("Press enter to quit.");
Console.ReadLine();
}
}
}
And here the results:
For 5 input ...
Array.Sort best time is 0,0004
linq best time is 0,0018
For 100 input ...
Array.Sort best time is 0,0267
linq best time is 0,0298
For 1000 input ...
Array.Sort best time is 0,3568
linq best time is 0,4107
For 100000 input ...
Array.Sort best time is 57,3536
linq best time is 64,0196
For 250000 input ...
Array.Sort best time is 157,8832
linq best time is 194,3723
For 1000000 input ...
Array.Sort best time is 692,8211
linq best time is 1058,3259
Press enter to quit.
My machine is an Intel i5, 64-bit Windows 7.
Sorry! I made a small stupid bug in the previous edit!
ARRAY.SORT OUTPERFORMS LINQ, yes by a very small amount, but as suspected, this amount grows with n, seemingly in a not-so-linear way. It seems to me to be both code overhead and a memory problem (cache misses, object allocation, GC... I don't know).
You can always use SortedList :) Assuming modulus is int:
var complexNumbers = new SortedList<int, Complex>();
complexNumbers.Add(number.Modulus(), number);
public struct Complex: IComparable<Complex>
{
//complex rectangular number: a + bi
public decimal A;
public decimal B;
//synonymous with absolute value, or in geometric terms, distance
public decimal Modulus() { ... }
//CompareTo() is the default comparison used by most built-in sorts;
//all we have to do here is pass through to Decimal's IComparable implementation
//via the results of the Modulus() methods
public int CompareTo(Complex other){ return this.Modulus().CompareTo(other.Modulus()); }
}
You can now use any sorting method you choose on any collection of Complex instances: Array.Sort(), List.Sort(), Enumerable.OrderBy() (it doesn't use your IComparable, but if Complex were a member of a containing class you could sort the containing class by its Complex members without having to go the extra level down to comparing moduli), etc.
You stated you wanted to sort in descending order; you may consider multiplying the results of the Modulus() comparison by -1 before returning it. However, I would caution against this as it may be confusing; you would have to use a method that normally gives you descending order to get the list in ascending order. Instead, most sorting methods allow you to specify either a sorting direction, or a custom comparison which can still make use of the IComparable implementation:
//This will use your Comparison, but reverse the sort order based on its result
myEnumerableOfComplex.OrderByDescending(c=>c);
//This explicitly negates your comparison; you can also use b.CompareTo(a)
//which is equivalent
myListOfComplex.Sort((a, b) => a.CompareTo(b) * -1);
//DataGridView objects use a SortDirection enumeration to control and report
//sort order
myGridViewOfComplex.Sort(myGridViewOfComplex.Columns["ComplexColumn"], ListSortDirection.Descending);
