Sort List<T> by reference

Sort List<T> by reference - c#

I would like to sort a List<T> by reference. Why? I want to compare several lists and find which elements are unique to a given list. Sorting them first in some canonical fashion and then stepping through all lists simultaneously seems a good way to do this. This essentially means that I need to make aa IComparer<T> that returns a non-zero integer for different references (and is consistent). The order does have to mean anything, but just provide a fixed order for all T for at least the time it takes me to get the job done.
GetHashCode can return the same value for two different objects on a 64-bit system, so that is out as a way to do a comparison.
IntPtr from AddrOfPinnedObject() might work since it has the correct size, but I have to pin the object for the duration of the process.
Making a Dictionary<T, long> where the long is issued first-come first-served is what I'm going with now, but I want to know if there are better options. The hash lookup seems unnecessary.
What is the best way to define ReferenceComparer<T> : IComparer<T> to give me that fixed ordering?
Here is the dictionary implementation:
public class ReferenceComparer<T> : IComparer<T>
{
private long nextAvailable = 0;
private readonly Dictionary<T, long> objectToLong = new Dictionary<T, long>(new AsObjectEqualityComparer<T>());
public int Compare(T x, T y)
{
long xLong;
if (!objectToLong.TryGetValue(x, out xLong))
{
xLong = nextAvailable;
objectToLong[x] = xLong;
nextAvailable = nextAvailable + 1;
}
long yLong;
if (!objectToLong.TryGetValue(y, out yLong))
{
yLong = nextAvailable;
objectToLong[y] = yLong;
nextAvailable = nextAvailable + 1;
}
return xLong.CompareTo(yLong);
}
}

Related

Looking for something like a HashSet, but with a range of values for the key?

I'm wondering if there is something like HashSet, but keyed by a range of values.
For example, we could add an item which is keyed to all integers between 100 and 4000. This item would be returned if we used any key between 100 and 4000, e.g. 287.
I would like the lookup speed to be quite close to HashSet, i.e. O(1). It would be possible to implement this using a binary search, but this would be too slow for the requirements. I would like to use standard .NET API calls as much as possible.
Update
This is interesting: https://github.com/mbuchetics/RangeTree
It has a time complexity of O(log(N)) where N is number of intervals, so it's not exactly O(1), but it could be used to build a working implementation.

I don't believe there's a structure for it already. You could implement something like a RangedDictionary:
class RangedDictionary {
private Dictionary<Range, int> _set = new Dictionary<Range, int>();
public void Add(Range r, int key) {
_set.Add(r, key);
}
public int Get(int key) {
//find a range that includes that key and return _set[range]
}
}
struct Range {
public int Begin;
public int End;
//override GetHashCode() and Equals() methods so that you can index a Dictionary by Range
}
EDIT: changed to HashSet to Dictionary

Here is a solution you can try out. However it assumes some points :
No range overlaps
When you request for a number, it is effectively inside a range (no error check)
From what you said, this one is O(N), but you can make it O(log(N)) with little effort I think.
The idea is that a class will handle the range thing, it will basically convert any value given to it to its range's lower boundary. This way your Hashtable (here a Dictionary) contains the low boundaries as keys.
public class Range
{
//We store all the ranges we have
private static List<int> ranges = new List<int>();
public int value { get; set; }
public static void CreateRange(int RangeStart, int RangeStop)
{
ranges.Add(RangeStart);
ranges.Sort();
}
public Range(int value)
{
int previous = ranges[0];
//Here we will find the range and give it the low boundary
//This is a very simple foreach loop but you can make it better
foreach (int item in ranges)
{
if (item > value)
{
break;
}
previous = item;
}
this.value = previous;
}
public override int GetHashCode()
{
return value;
}
}
Here is to test it.
class Program
{
static void Main(string[] args)
{
Dictionary<int, int> myRangedDic = new Dictionary<int,int>();
Range.CreateRange(10, 20);
Range.CreateRange(50, 100);
myRangedDic.Add(new Range(15).value, 1000);
myRangedDic.Add(new Range(75).value, 5000);
Console.WriteLine("searching for 16 : {0}", myRangedDic[new Range(16).value].ToString());
Console.WriteLine("searching for 64 : {0}", myRangedDic[new Range(64).value].ToString());
Console.ReadLine();
}
}
I don't believe you really can go below O(Log(N)) because there is no way for you to know immediately in which range a number is, you must always compare it with a lower (or upper) bound.
If you had predetermined ranges, that would have been easier to do. i.e. if your ranges are every hundreds, it is really easy to find the correct range of any number by calculating it modulo 100, but here we can assume nothing, so we must check.
To go down to Log(N) with this solution, just replace the foreach with a loop that will look at the middle of the array, then split it in two every iteration...

Converting a method to use any Enum

My Problem:
I want to convert my randomBloodType() method to a static method that can take any enum type. I want my method to take any type of enum whether it be BloodType, DaysOfTheWeek, etc. and perform the operations shown below.
Some Background on what the method does:
The method currently chooses a random element from the BloodType enum based on the values assigned to each element. An element with a higher value has a higher probability to be picked.
Code:
public enum BloodType
{
// BloodType = Probability
ONeg = 4,
OPos = 36,
ANeg = 3,
APos = 28,
BNeg = 1,
BPos = 20,
ABNeg = 1,
ABPos = 5
};
public BloodType randomBloodType()
{
// Get the values of the BloodType enum and store it in a array
BloodType[] bloodTypeValues = (BloodType[])Enum.GetValues(typeof(BloodType));
List<BloodType> bloodTypeList = new List<BloodType>();
// Create a list where each element occurs the approximate number of
// times defined as its value(probability)
foreach (BloodType val in bloodTypeValues)
{
for(int i = 0; i < (int)val; i++)
{
bloodTypeList.Add(val);
}
}
// Sum the values
int sum = 0;
foreach (BloodType val in bloodTypeValues)
{
sum += (int)val;
}
//Get Random value
Random rand = new Random();
int randomValue = rand.Next(sum);
return bloodTypeList[randomValue];
}
What I have tried so far:
I have tried to use generics. They worked out for the most part, but I was unable to cast my enum elements to int values. I included a example of a section of code that was giving me problems below.
foreach (T val in bloodTypeValues)
{
sum += (int)val; // This line is the problem.
}
I have also tried using Enum e as a method parameter. I was unable to declare the type of my array of enum elements using this method.

(Note: My apologies in advance for the lengthy answer. My actual proposed solution is not all that long, but there are a number of problems with the proposed solutions so far and I want to try to address those thoroughly, to provide context for my own proposed solution).
In my opinion, while you have in fact accepted one answer and might be tempted to use either one, neither of the answers provided so far are correct or useful.
Commenter Ben Voigt has already pointed out two major flaws with your specifications as stated, both related to the fact that you are encoding the enum value's weight in the value itself:
You are tying the enum's underlying type to the code that then must interpret that type.
Two enum values that have the same weight are indistinguishable from each other.
Both of these issues can be addressed. Indeed, while the answer you accepted (why?) fails to address the first issue, the one provided by Dweeberly does address this through the use of Convert.ToInt32() (which can convert from long to int just fine, as long as the values are small enough).
But the second issue is much harder to address. The answer from Asad attempts to address this by starting with the enum names and parsing them to their values. And this does indeed result in the final array being indexed containing the corresponding entries for each name separately. But the code actually using the enum has no way to distinguish the two; it's really as if those two names are a single enum value, and that single enum value's probability weight is the sum of the value used for the two different names.
I.e. in your example, while the enum entries for e.g. BNeg and ABNeg will be selected separately, the code that receives these randomly selected value has no way to know whether it was BNeg or ABNeg that was selected. As far as it knows, those are just two different names for the same value.
Now, even this problem can be addressed (but not in the way that Asad attempts to…his answer is still broken). If you were, for example, to encode the probabilities in the value while still ensuring unique values for each name, you could decode those probabilities while doing the random selection and that would work. For example:
enum BloodType
{
// BloodType = Probability
ONeg = 4 * 100 + 0,
OPos = 36 * 100 + 1,
ANeg = 3 * 100 + 2,
APos = 28 * 100 + 3,
BNeg = 1 * 100 + 4,
BPos = 20 * 100 + 5,
ABNeg = 1 * 100 + 6,
ABPos = 5 * 100 + 7,
};
Having declared your enum values that way, then you can in your selection code divide the enum value by 100 to obtain its probability weight, which then can be used as seen in the various examples. At the same time, each enum name has a unique value.
But even solving that problem, you are still left with problems related to the choice of encoding and representation of the probabilities. For example, in the above you cannot have an enum that has more than 100 values, nor one with weights larger than (2^31 - 1) / 100; if you want an enum that has more than 100 values, you need a larger multiplier but that would limit your weight values even more.
In many scenarios (maybe all the ones you care about) this won't be an issue. The numbers are small enough that they all fit. But that seems like a serious limitation in what seems like a situation where you want a solution that is as general as possible.
And that's not all. Even if the encoding stays within reasonable limits, you have another significant limit to deal with: the random selection process requires an array large enough to contain for each enum value as many instances of that value as its weight. Again, if the values are small maybe this is not a big problem. But it does severely limit the ability of your implementation to generalize.
So, what to do?
I understand the temptation to try to keep each enum type self-contained; there are some obvious advantages to doing so. But there are also some serious disadvantages that result from that, and if you truly ever try to use this in a generalized way, the changes to the solutions proposed so far will tie your code together in ways that IMHO negate most if not all of the advantage of keeping the enum types self-contained (primarily: if you find you need to modify the implementation to accommodate some new enum type, you will have to go back and edit all of the other enum types you're using…i.e. while each type looks self-contained, in reality they are all tightly coupled with each other).
In my opinion, a much better approach would be to abandon the idea that the enum type itself will encode the probability weights. Just accept that this will be declared separately somehow.
Also, IMHO is would be better to avoid the memory-intensive approach proposed in your original question and mirrored in the other two answers. Yes, this is fine for the small values you're dealing with here. But it's an unnecessary limitation, making only one small part of the logic simpler while complicating and restricting it in other ways.
I propose the following solution, in which the enum values can be whatever you want, the enum's underlying type can be whatever you want, and the algorithm uses memory proportionally only to the number of unique enum values, rather than in proportion to the sum of all of the probability weights.
In this solution, I also address possible performance concerns, by caching the invariant data structures used to select the random values. This may or may not be useful in your case, depending on how frequently you will be generating these random values. But IMHO it is a good idea regardless; the up-front cost of generating these data structures is so high that if the values are selected with any regularity at all, it will begin to dominate the run-time cost of your code. Even if it works fine today, why take the risk? (Again, especially given that you seem to want a generalized solution).
Here is the basic solution:
static T NextRandomEnumValue<T>()
{
KeyValuePair<T, int>[] aggregatedWeights = GetWeightsForEnum<T>();
int weightedValue =
_random.Next(aggregatedWeights[aggregatedWeights.Length - 1].Value),
index = Array.BinarySearch(aggregatedWeights,
new KeyValuePair<T, int>(default(T), weightedValue),
KvpValueComparer<T, int>.Instance);
return aggregatedWeights[index < 0 ? ~index : index + 1].Key;
}
static KeyValuePair<T, int>[] GetWeightsForEnum<T>()
{
object temp;
if (_typeToAggregatedWeights.TryGetValue(typeof(T), out temp))
{
return (KeyValuePair<T, int>[])temp;
}
if (!_typeToWeightMap.TryGetValue(typeof(T), out temp))
{
throw new ArgumentException("Unsupported enum type");
}
KeyValuePair<T, int>[] weightMap = (KeyValuePair<T, int>[])temp;
KeyValuePair<T, int>[] aggregatedWeights =
new KeyValuePair<T, int>[weightMap.Length];
int sum = 0;
for (int i = 0; i < weightMap.Length; i++)
{
sum += weightMap[i].Value;
aggregatedWeights[i] = new KeyValuePair<T,int>(weightMap[i].Key, sum);
}
_typeToAggregatedWeights[typeof(T)] = aggregatedWeights;
return aggregatedWeights;
}
readonly static Random _random = new Random();
// Helper method to reduce verbosity in the enum-to-weight array declarations
static KeyValuePair<T1, T2> CreateKvp<T1, T2>(T1 t1, T2 t2)
{
return new KeyValuePair<T1, T2>(t1, t2);
}
readonly static KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
{
CreateKvp(BloodType.ONeg, 4),
CreateKvp(BloodType.OPos, 36),
CreateKvp(BloodType.ANeg, 3),
CreateKvp(BloodType.APos, 28),
CreateKvp(BloodType.BNeg, 1),
CreateKvp(BloodType.BPos, 20),
CreateKvp(BloodType.ABNeg, 1),
CreateKvp(BloodType.ABPos, 5),
};
readonly static Dictionary<Type, object> _typeToWeightMap =
new Dictionary<Type, object>()
{
{ typeof(BloodType), _bloodTypeToWeight },
};
readonly static Dictionary<Type, object> _typeToAggregatedWeights =
new Dictionary<Type, object>();
Note that the work of actually selecting a random value is simply a matter of choosing a non-negative random integer less than the sum of the weights, and then using a binary search to find the appropriate enum value.
Once per enum type, the code will build the table of values and weight-sums that will be used for the binary search. This result is stored in a cache dictionary, _typeToAggregatedWeights.
There are also the objects that have to be declared and which will be used at run-time to build this table. Note that the _typeToWeightMap is just in support of making this method 100% generic. If you wanted to write a different named method for each specific type you wanted to support, that could still used a single generic method to implement the initialization and selection, but the named method would know the correct object (e.g. _bloodTypeToWeight) to use for initialization.
Alternatively, another way to avoid the _typeToWeightMap while still keeping the method 100% generic would be to have the _typeToAggregatedWeights be of type Dictionary<Type, Lazy<object>>, and have the values of the dictionary (the Lazy<object> objects) explicitly reference the appropriate weight array for the type.
In other words, there are lots of variations on this theme that would work fine. But they will all have essentially the same structure as above; semantics would be the same and performance differences would be negligible.
One thing you'll notice is that the binary search requires a custom IComparer<T> implementation. That is here:
class KvpValueComparer<TKey, TValue> :
IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
public readonly static KvpValueComparer<TKey, TValue> Instance =
new KvpValueComparer<TKey, TValue>();
private KvpValueComparer() { }
public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
{
return x.Value.CompareTo(y.Value);
}
}
This allows the Array.BinarySearch() method to correct compare the array elements, allowing a single array to contain both the enum values and their aggregated weights, but limiting the binary search comparison to just the weights.

Assuming your enum values are all of type int (you can adjust this accordingly if they're long, short, or whatever):
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof (TEnum), curr);
return agg.Concat(Enumerable.Repeat((TEnum)value,(int)value)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
Here's how you would use it:
var rng = new Random();
var randomBloodType = RandomEnumValue<BloodType>(rng);
People seem to have their knickers in a knot about multiple indistinguishable enum values in the input enum (for which I still think the above code provides expected behavior). Note that there is no answer here, not even Peter Duniho's, that will allow you to distinguish enum entries when they have the same value, so I'm not sure why this is being considered as a metric for any potential solutions.
Nevertheless, an alternative approach that doesn't use the enum values as probabilities is to use an attribute to specify the probability:
public enum BloodType
{
[P=4]
ONeg,
[P=36]
OPos,
[P=3]
ANeg,
[P=28]
APos,
[P=1]
BNeg,
[P=20]
BPos,
[P=1]
ABNeg,
[P=5]
ABPos
}
Here is what the attribute used above looks like:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class PAttribute : Attribute
{
public int Weight { get; private set; }
public PAttribute(int weight)
{
Weight = weight;
}
}
and finally, this is what the method to get a random enum value would like:
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof(TEnum), curr);
FieldInfo fi = typeof (TEnum).GetField(curr);
var weight = ((PAttribute)fi.GetCustomAttribute(typeof(PAttribute), false)).Weight;
return agg.Concat(Enumerable.Repeat((TEnum)value, weight)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
(Note: if this code is performance critical, you might need to tweak this and add caching for the reflection data).

Some of this you can do and some of it isn't so easy. I believe the following extension method will do what you describe.
static public class Util {
static Random rnd = new Random();
static public int PriorityPickEnum(this Enum e) {
// The approved types for an enum are byte, sbyte, short, ushort, int, uint, long, or ulong
// However, Random only supports a int (or double) as a max value. Either way
// it doesn't have the range for uint, long and ulong.
//
// sum enum
int sum = 0;
foreach (var x in Enum.GetValues(e.GetType())) {
sum += Convert.ToInt32(x);
}
var i = rnd.Next(sum); // get a random value, it will form a ratio i / sum
// enums may not have a uniform (incremented) value range (think about flags)
// therefore we have to step through to get to the range we want,
// this is due to the requirement that return value have a probability
// proportional to it's value. Note enum values must be sorted for this to work.
foreach (var x in Enum.GetValues(e.GetType()).OfType<Enum>().OrderBy(a => a)) {
i -= Convert.ToInt32(x);
if (i <= 0) return Convert.ToInt32(x);
}
throw new Exception("This doesn't seem right");
}
}
Here is an example of using this extension:
BloodType bt = BloodType.ABNeg;
for (int i = 0; i < 100; i++) {
var v = (BloodType) bt.PriorityPickEnum();
Console.WriteLine("{0}: {1}({2})", i, v, (int) v);
}
This should work pretty well for enum's of type byte, sbyte, ushort, short and int. Once you get beyond int (uint, long, ulong) the problem is the Random class. You can adjust the code to use doubles generated by Random, which would cover uint, but the Random class just doesn't have the range to cover long and ulong. Of course you could use/find/write a different Random class if this is important.

Is there a class in C# to handle a couple of INT (range of 2 INT- 1-10)

I am quite new to C# and I was wondering if there is a Class or a data structure or the best way to handle the following requirement...
I need to handle a COUPLE of int that represent a range of data (eg. 1 - 10 or 5-245) and I need a method to verify if an Int value is contained in the range...
I believe that in C# there is a class built in the framework to handle my requirement...
what I need to do is to verify if an INT (eg. 5) is contained in the range of values Eg (1-10) ...
in the case that I should discover that there is not a class to handle it, I was thinking to go with a Struct that contain the 2 numbers and make my own Contain method to test if 5 is contained in the range 1-10)

in the case that I should discover that there is not a class to handle
it, I was thinking to go with a Struct that contain the 2 numbers and
make my own Contain method to test if 5 is contained in the range
1-10)
That's actually a great idea as there's no built-in class for your scenario in the BCL.

You're looking for a range type; the .Net framework does not include one.
You should make an immutable (!) Int32Range struct, as you suggested.
You may want to implement IEnumerable<int> to allow users to easily loop through the numbers in the range.
You need to decide whether each bound should be inclusive or exclusive.
[Start, End) is probably the most obvious choice.
Whatever you choose, you should document it clearly in the XML comments.

Nothing exists that meets your requirements exactly.
Assuming I understood you correctly, the class is pretty simple to write.
class Range
{
public int Low {get; set;}
public int High {get; set;}
public bool InRange(int val) { return val >= Low && val <= High; }
}
A Tuple<int,int> would get you part of the way but you'd have to add an extension method to get the extra behavior. The downside is that the lower- and upper-bounds are implicitly Item1 and Item2 which could be confusing.
// written off-the-cuff, may not compile
public static class TupleExtension
{
public static bool InRange(Tuple<int, int> this, int queryFor)
{
return this.Item1 >= queryFor && this.Item2 <= queryFor;
}
}

You could create an extension if you want to avoid making a new type:
public static class Extensions
{
public static bool IsInRange(this int value, int min, int max)
{
return value >= min && value <= max;
}
}
Then you could do something like:
if(!value.IsInRange(5, 545))
throw new Exception("Value is out of range.");

i think you can do that with an array.
some nice examples and explanation can be found here:
http://www.dotnetperls.com/int-array

Nothing built in AFAIK, but (depending on the size of the range) an Enumerable.Range would work (but be less than optimal, as you're really storing every value in the range, not just the endpoints). It does allow you to use the LINQ methods (including Enumerable.Contains), though - which may come in handy.
const int START = 5;
const int END = 245;
var r = Enumerable.Range(START, (END - START)); // 2nd param is # of integers
return r.Contains(100);
Personally, I'd probably go ahead and write the class, since it's fairly simple (and you can always expose an IEnumerable<int> iterator via Enumerable.Range if you want to do LINQ over it)

How to get results efficiently out of an Octree/Quadtree?

I am working on a piece of 3D software that has sometimes has to perform intersections between massive numbers of curves (sometimes ~100,000). The most natural way to do this is to do an N^2 bounding box check, and then those curves whose bounding boxes overlap get intersected.
I heard good things about octrees, so I decided to try implementing one to see if I would get improved performance.
Here's my design:
Each octree node is implemented as a class with a list of subnodes and an ordered list of object indices.
When an object is being added, it's added to the lowest node that entirely contains the object, or some of that node's children if the object doesn't fill all of the children.
Now, what I want to do is retrieve all objects that share a tree node with a given object. To do this, I traverse all tree nodes, and if they contain the given index, I add all of their other indices to an ordered list.
This is efficient because the indices within each node are already ordered, so finding out if each index is already in the list is fast. However, the list ends up having to be resized, and this takes up most of the time in the algorithm. So what I need is some kind of tree-like data structure that will allow me to efficiently add ordered data, and also be efficient in memory.
Any suggestions?

Assuming you keep the size of the OctTree as a property of the tree, you should be able to preallocate a list that is larger than the number of things you could possibly put it in. Preallocating the size will keep the resize from happening as long as the size is larger than you need. I assume that you are using a SortedList to keep your ordered results.
var results = new SortedList<Node>( octTree.Count );
// now find the node and add the points
results = result.TrimToSize(); // reclaim space as needed
An alternative would be to augment your data structure keeping the size of the tree below the current node in the node itself. Then you'd be able to find the node of interest and directly determine what the size of the list needs to be. All you'd have to do is modify the insert/delete operations to update the size of each of the ancestors of the node inserted/deleted at the end of the operation.

SortedDictionary (.NET 2+) or SortedSet (.NET 4 only) is probably what you want. They are tree structures.
SortedList is a dumb class which is no different from List structurally.
However, it is still not entirely clear to me why you need this list as sorted.
Maybe if you could elaborate on this matter we could find a solution where you don't need sorting at all. For example a simple HashSet could do. It is faster at both lookups and insertions than SortedList or any of the tree structures if hashing is done properly.
Ok, now when it is clear to me that you wanted sorted lists merging, I can try to write an implementation.
At first, I implemented merging using SortedDictionary to store heads of all the arrays. At each iteration I removed the smallest element from the dictionary and added the next one from the same array. Performance tests showed that overhead of SortedDictionary is huge, so that it is almost impossible to make it faster than simple concatenation+sorting. It even struggles to match SortedList performance on small tests.
Then I replaced SortedDictionary with custom-made binary heap implementation. Performance improvement was tremendous (more than 6 times). This Heap implementation even manages to beat .Distinct() (which is usually the fastest) in some tests.
Here is my code:
class Heap<T>
{
public Heap(int limit, IComparer<T> comparer)
{
this.comparer = comparer;
data = new T[limit];
}
int count = 0;
T[] data;
public void Add(T t)
{
data[count++] = t;
promote(count-1);
}
IComparer<T> comparer;
public int Count { get { return count; } }
public T Pop()
{
T result = data[0];
fill(0);
return result;
}
bool less(T a, T b)
{
return comparer.Compare(a,b)<0;
}
void fill(int index)
{
int child1 = index*2+1;
int child2 = index*2+2;
if(child1 >= Count)
{
data[index] = data[--count];
if(index!=count)
promote(index);
}
else
{
int bestChild = child1;
if(child2 < Count && less(data[child2], data[child1]))
{
bestChild = child2;
}
data[index] = data[bestChild];
fill(bestChild);
}
}
void promote(int index)
{
if(index==0)
return;
int parent = (index-1)/2;
if(less(data[index], data[parent]))
{
T tmp = data[parent];
data[parent] = data[index];
data[index] = tmp;
promote(parent);
}
}
}
struct ArrayCursor<T>
{
public T [] Array {get;set;}
public int Index {get;set;}
public bool Finished {get{return Array.Length == Index;}}
public T Value{get{return Array[Index];}}
}
class ArrayComparer<T> : IComparer<ArrayCursor<T>>
{
IComparer<T> comparer;
public ArrayComparer(IComparer<T> comparer)
{
this.comparer = comparer;
}
public int Compare (ArrayCursor<T> a, ArrayCursor<T> b)
{
return comparer.Compare(a.Value, b.Value);
}
}
static class HeapMerger
{
public static IEnumerable<T> MergeUnique<T>(this T[][] arrays)
{
bool first = true;
T last = default(T);
IEqualityComparer<T> eq = EqualityComparer<T>.Default;
foreach(T i in Merge(arrays))
if(first || !eq.Equals(last,i))
{
yield return i;
last = i;
first = false;
}
}
public static IEnumerable<T> Merge<T>(this T[][] arrays)
{
var map = new Heap<ArrayCursor<T>>(arrays.Length, new ArrayComparer<T>(Comparer<T>.Default));
Action<ArrayCursor<T>> tryAdd = (a)=>
{
if(!a.Finished)
map.Add(a);
};
for(int i=0;i<arrays.Length;i++)
tryAdd(new ArrayCursor<T>{Array=arrays[i], Index=0});
while(map.Count>0)
{
ArrayCursor<T> lowest = map.Pop();
yield return lowest.Value;
lowest.Index++;
tryAdd(lowest);
}
}
}

How to initialize a List<T> to a given size (as opposed to capacity)?

.NET offers a generic list container whose performance is almost identical (see Performance of Arrays vs. Lists question). However they are quite different in initialization.
Arrays are very easy to initialize with a default value, and by definition they already have certain size:
string[] Ar = new string[10];
Which allows one to safely assign random items, say:
Ar[5]="hello";
with list things are more tricky. I can see two ways of doing the same initialization, neither of which is what you would call elegant:
List<string> L = new List<string>(10);
for (int i=0;i<10;i++) L.Add(null);
or
string[] Ar = new string[10];
List<string> L = new List<string>(Ar);
What would be a cleaner way?
EDIT: The answers so far refer to capacity, which is something else than pre-populating a list. For example, on a list just created with a capacity of 10, one cannot do L[2]="somevalue"
EDIT 2: People wonder why I want to use lists this way, as it is not the way they are intended to be used. I can see two reasons:
One could quite convincingly argue that lists are the "next generation" arrays, adding flexibility with almost no penalty. Therefore one should use them by default. I'm pointing out they might not be as easy to initialize.
What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advanced and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list.

List<string> L = new List<string> ( new string[10] );

I can't say I need this very often - could you give more details as to why you want this? I'd probably put it as a static method in a helper class:
public static class Lists
{
public static List<T> RepeatedDefault<T>(int count)
{
return Repeated(default(T), count);
}
public static List<T> Repeated<T>(T value, int count)
{
List<T> ret = new List<T>(count);
ret.AddRange(Enumerable.Repeat(value, count));
return ret;
}
}
You could use Enumerable.Repeat(default(T), count).ToList() but that would be inefficient due to buffer resizing.
Note that if T is a reference type, it will store count copies of the reference passed for the value parameter - so they will all refer to the same object. That may or may not be what you want, depending on your use case.
EDIT: As noted in comments, you could make Repeated use a loop to populate the list if you wanted to. That would be slightly faster too. Personally I find the code using Repeat more descriptive, and suspect that in the real world the performance difference would be irrelevant, but your mileage may vary.

Use the constructor which takes an int ("capacity") as an argument:
List<string> = new List<string>(10);
EDIT: I should add that I agree with Frederik. You are using the List in a way that goes against the entire reasoning behind using it in the first place.
EDIT2:
EDIT 2: What I'm currently writing is a base class offering default functionality as part of a bigger framework. In the default functionality I offer, the size of the List is known in advanced and therefore I could have used an array. However, I want to offer any base class the chance to dynamically extend it and therefore I opt for a list.
Why would anyone need to know the size of a List with all null values? If there are no real values in the list, I would expect the length to be 0. Anyhow, the fact that this is cludgy demonstrates that it is going against the intended use of the class.

Create an array with the number of items you want first and then convert the array in to a List.
int[] fakeArray = new int[10];
List<int> list = fakeArray.ToList();

If you want to initialize the list with N elements of some fixed value:
public List<T> InitList<T>(int count, T initValue)
{
return Enumerable.Repeat(initValue, count).ToList();
}

Why are you using a List if you want to initialize it with a fixed value ?
I can understand that -for the sake of performance- you want to give it an initial capacity, but isn't one of the advantages of a list over a regular array that it can grow when needed ?
When you do this:
List<int> = new List<int>(100);
You create a list whose capacity is 100 integers. This means that your List won't need to 'grow' until you add the 101th item.
The underlying array of the list will be initialized with a length of 100.

This is an old question, but I have two solutions. One is fast and dirty reflection; the other is a solution that actually answers the question (set the size not the capacity) while still being performant, which none of the answers here do.
Reflection
This is quick and dirty, and should be pretty obvious what the code does. If you want to speed it up, cache the result of GetField, or create a DynamicMethod to do it:
public static void SetSize<T>(this List<T> l, int newSize) =>
l.GetType().GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance).SetValue(l, newSize);
Obviously a lot of people will be hesitant to put such code into production.
ICollection<T>
This solution is based around the fact that the constructor List(IEnumerable<T> collection) optimizes for ICollection<T> and immediately adjusts the size to the correct amount, without iterating it. It then calls the collections CopyTo to do the copy.
The code for the List<T> constructor is as follows:
public List(IEnumerable<T> collection) {
....
ICollection<T> c = collection as ICollection<T>;
if (collection is ICollection<T> c)
{
int count = c.Count;
if (count == 0)
{
_items = s_emptyArray;
}
else {
_items = new T[count];
c.CopyTo(_items, 0);
_size = count;
}
}
So we can completely optimally pre-initialize the List to the correct size, without any extra copying.
How so? By creating an ICollection<T> object that does nothing other than return a Count. Specifically, we will not implement anything in CopyTo which is the only other function called.
private struct SizeCollection<T> : ICollection<T>
{
public SizeCollection(int size) =>
Count = size;
public void Add(T i){}
public void Clear(){}
public bool Contains(T i)=>true;
public void CopyTo(T[]a, int i){}
public bool Remove(T i)=>true;
public int Count {get;}
public bool IsReadOnly=>true;
public IEnumerator<T> GetEnumerator()=>null;
IEnumerator IEnumerable.GetEnumerator()=>null;
}
public List<T> InitializedList<T>(int size) =>
new List<T>(new SizeCollection<T>(size));
We could in theory do the same thing for AddRange/InsertRange for an existing array, which also accounts for ICollection<T>, but the code there creates a new array for the supposed items, then copies them in. In such case, it would be faster to just empty-loop Add:
public void SetSize<T>(this List<T> l, int size)
{
if(size < l.Count)
l.RemoveRange(size, l.Count - size);
else
for(size -= l.Count; size > 0; size--)
l.Add(default(T));
}

Initializing the contents of a list like that isn't really what lists are for. Lists are designed to hold objects. If you want to map particular numbers to particular objects, consider using a key-value pair structure like a hash table or dictionary instead of a list.

You seem to be emphasizing the need for a positional association with your data, so wouldn't an associative array be more fitting?
Dictionary<int, string> foo = new Dictionary<int, string>();
foo[2] = "string";

The accepted answer (the one with the green check mark) has an issue.
The problem:
var result = Lists.Repeated(new MyType(), sizeOfList);
// each item in the list references the same MyType() object
// if you edit item 1 in the list, you are also editing item 2 in the list
I recommend changing the line above to perform a copy of the object. There are many different articles about that:
String.MemberwiseClone() method called through reflection doesn't work, why?
https://code.msdn.microsoft.com/windowsdesktop/CSDeepCloneObject-8a53311e
If you want to initialize every item in your list with the default constructor, rather than NULL, then add the following method:
public static List<T> RepeatedDefaultInstance<T>(int count)
{
List<T> ret = new List<T>(count);
for (var i = 0; i < count; i++)
{
ret.Add((T)Activator.CreateInstance(typeof(T)));
}
return ret;
}

You can use Linq to cleverly initialize your list with a default value. (Similar to David B's answer.)
var defaultStrings = (new int[10]).Select(x => "my value").ToList();
Go one step farther and initialize each string with distinct values "string 1", "string 2", "string 3", etc:
int x = 1;
var numberedStrings = (new int[10]).Select(x => "string " + x++).ToList();

string [] temp = new string[] {"1","2","3"};
List<string> temp2 = temp.ToList();

After thinking again, I had found the non-reflection answer to the OP question, but Charlieface beat me to it. So I believe that the correct and complete answer is https://stackoverflow.com/a/65766955/4572240
My old answer:
If I understand correctly, you want the List<T> version of new T[size], without the overhead of adding values to it.
If you are not afraid the implementation of List<T> will change dramatically in the future (and in this case I believe the probability is close to 0), you can use reflection:
public static List<T> NewOfSize<T>(int size) {
var list = new List<T>(size);
var sizeField = list.GetType().GetField("_size",BindingFlags.Instance|BindingFlags.NonPublic);
sizeField.SetValue(list, size);
return list;
}
Note that this takes into account the default functionality of the underlying array to prefill with the default value of the item type. All int arrays will have values of 0 and all reference type arrays will have values of null. Also note that for a list of reference types, only the space for the pointer to each item is created.
If you, for some reason, decide on not using reflection, I would have liked to offer an option of AddRange with a generator method, but underneath List<T> just calls Insert a zillion times, which doesn't serve.
I would also like to point out that the Array class has a static method called ResizeArray, if you want to go the other way around and start from Array.
To end, I really hate when I ask a question and everybody points out that it's the wrong question. Maybe it is, and thanks for the info, but I would still like an answer, because you have no idea why I am asking it. That being said, if you want to create a framework that has an optimal use of resources, List<T> is a pretty inefficient class for anything than holding and adding stuff to the end of a collection.

A notice about IList:
MSDN IList Remarks:
"IList implementations fall into three categories: read-only, fixed-size, and variable-size. (...). For the generic version of this interface, see
System.Collections.Generic.IList<T>."
IList<T> does NOT inherits from IList (but List<T> does implement both IList<T> and IList), but is always variable-size.
Since .NET 4.5, we have also IReadOnlyList<T> but AFAIK, there is no fixed-size generic List which would be what you are looking for.

This is a sample I used for my unit test. I created a list of class object. Then I used forloop to add 'X' number of objects that I am expecting from the service.
This way you can add/initialize a List for any given size.
public void TestMethod1()
{
var expected = new List<DotaViewer.Interface.DotaHero>();
for (int i = 0; i < 22; i++)//You add empty initialization here
{
var temp = new DotaViewer.Interface.DotaHero();
expected.Add(temp);
}
var nw = new DotaHeroCsvService();
var items = nw.GetHero();
CollectionAssert.AreEqual(expected,items);
}
Hope I was of help to you guys.

A bit late but first solution you proposed seems far cleaner to me : you dont allocate memory twice.
Even List constrcutor needs to loop through array in order to copy it; it doesn't even know by advance there is only null elements inside.
1.
- allocate N
- loop N
Cost: 1 * allocate(N) + N * loop_iteration
2.
- allocate N
- allocate N + loop ()
Cost : 2 * allocate(N) + N * loop_iteration
However List's allocation an loops might be faster since List is a built-in class, but C# is jit-compiled sooo...

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Sort List<T> by reference - c#

Related

Looking for something like a HashSet, but with a range of values for the key?

Converting a method to use any Enum

Is there a class in C# to handle a couple of INT (range of 2 INT- 1-10)

How to get results efficiently out of an Octree/Quadtree?

How to initialize a List<T> to a given size (as opposed to capacity)?

Categories

Resources