I was experimenting with ArrayPool in C#.
I wanted to create my own pool with a maximum of 5 arrays and a maximum array size of 1050000.
I used the ArrayPool.Create() method.
I am not able to understand one thing - I am trying to rent from the pool 10 times in the snippet below, and although I specified the maximum number of arrays to be 5, why is it not showing any error?
Also, I specified the maximum length to be 1050000. Then how am I able to rent a 4200000-element array without any error?
byte[] buffer;
ArrayPool<byte> pool = ArrayPool<byte>.Create(1050000, 5);
for (int i = 0; i < 10; i++)
{
buffer = pool.Rent(4200000);
}
The options passed to ArrayPool.Create don't imply you cannot receive an array larger than those limits. Instead they are used to control the bucketing algorithm of the ConfigurableArrayPool. The second argument is the maximum number of slots in a bucket and the first is the maximum size of any array. This value is capped by an internal constant of 1,048,576, which is already smaller than your 1,050,000.
When you Rent from the array pool, the algorithm will attempt to locate an array in one of the buckets/slots. The number of these buckets (and their internal slots) are what become limited by the values you passed in. If the pool doesn't have an array of the minimum size requested either because all slots are in use or because the requested size is greater than the maximum, it will instead allocate a new one (without pooling it) and return that.
In short, when you request an array larger than the (capped) size you passed in to the Create method you will incur an allocation and receive an array that does not participate in the pool. Calling Return with this array will not place it back into the pool; instead it will be "dropped".
Keep in mind however that these rules only apply to the built-in array pool. You (or someone else) could write an implementation that caps the size of the returned array or even throws -- though I'd argue that those might not be considered well-behaved (at least without supporting doc).
Update based on your comments:
While it's true that there is no parameter corresponding directly to the number of buckets, there is one indirectly. The number of buckets is calculated from the maximum array size you pass in; the maximum number of buckets is determined based on powers of 2 and some other logic.
The documentation defines no exceptions for the Rent method (though there is at least one which can be thrown - ArgumentOutOfRangeException for negative array sizes).
Looking at the source code for the Create method - it returns a ConfigurableArrayPool. Its Rent method will try to find an array matching the request, and if there is no suitable one it will just allocate a new one:
// The request was for a size too large for the pool. Allocate an array of exactly the requested length.
// When it's returned to the pool, we'll simply throw it away.
buffer = new T[minimumLength];
So both parameters (maxArrayLength and maxArraysPerBucket) are used just to control what the ArrayPool will actually store and reuse, not how much can be allocated (and that makes sense: usually you don't want your application to fail while allocating memory if there is enough memory available to it). Everything else remains under the control of the GC, so the ArrayPool will not end up storing a lot of uncollectable memory.
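To see that behaviour concretely, here is a minimal sketch (using the same pool parameters as in the question) showing that an oversized Rent succeeds, but the resulting array is not pooled:
using System;
using System.Buffers;

class Program
{
    static void Main()
    {
        ArrayPool<byte> pool = ArrayPool<byte>.Create(1050000, 5);

        // Larger than the (capped) maximum array length: no exception, just a fresh
        // allocation that does not participate in the pool.
        byte[] big = pool.Rent(4200000);
        pool.Return(big); // silently dropped rather than stored in a bucket

        byte[] bigAgain = pool.Rent(4200000);
        Console.WriteLine(ReferenceEquals(big, bigAgain)); // False - the first array was not pooled

        // Within the limits: returning and renting again typically yields the same instance.
        byte[] small = pool.Rent(1000);
        pool.Return(small);
        byte[] smallAgain = pool.Rent(1000);
        Console.WriteLine(ReferenceEquals(small, smallAgain)); // typically True
    }
}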
Related
Why do I need to use Add() to add elements to a List? Why can't I use indexing to do it? When I traverse the elements of the List I do it using indexes.
int head = -1;
List<char> arr = new List<char>();
public void push(char s)
{
++head;
arr[head] = s;//throws runtime error.
arr.Add(s);
}
It doesn't throw any error at compile time, but it throws an exception at runtime (ArgumentOutOfRangeException).
++head;
arr[head] = s;
This attempts to set the first element (index 0) of the list to s, but there is no element there yet because you've not added anything, or set the length of the list.
When you create an array, you define a length, so each item has a memory address that can be assigned to.
Lists are useful when you don't know how many items you're going to have, or what their index is going to be.
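A minimal sketch of the fix: let Add do the appending, and use the indexer only for positions that already exist.
List<char> arr = new List<char>();

public void Push(char s)
{
    arr.Add(s);                    // Add appends the element and grows the list as needed
    char top = arr[arr.Count - 1]; // indexing is valid here because the element now exists
}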
Arrays are fixed in size. Once you allocate one, you cannot add or remove "slots" from it. So if you need it to be bigger, you need to:
Detect that you need a bigger array.
Allocate a new, bigger array
Copy all existing values to the new, bigger array.
Start using the bigger array from now on, everywhere.
All that Lists do is automate that precise process. A List will automatically detect that it needs to grow during Add() and then do steps 2-4 automagically. It is even responsible for picking the initial size and deciding by how much to grow (to avoid having to grow too often).
They could in theory just react to list[11000] by growing the size to 11000. But chances are very high that this value is a huge mistake, and preventing the programmer from making huge mistakes is what half the classes and compiler rules (like strong typing) are there for. So they force you to use Add() so that such a mistake cannot happen.
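To illustrate, here is a rough sketch of those four steps (this is not the actual List<T> source, just the idea):
char[] items = new char[4];
int count = 0;

void Append(char value)
{
    if (count == items.Length)                      // 1. detect that a bigger array is needed
    {
        char[] bigger = new char[items.Length * 2]; // 2. allocate a new, bigger array
        Array.Copy(items, bigger, count);           // 3. copy the existing values over
        items = bigger;                             // 4. use the bigger array from now on
    }
    items[count++] = value;
}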
Actually, calling myArray[2] = ... does not add the element; it just assigns the object to the specified index within the array. If the index is beyond the array's size you'd get an IndexOutOfRangeException, much as with a List<T>. So in the case of an array, too, using the indexer assumes you actually have that many elements:
var array = new int[3];
array[5] = 4; // bang
This is because arrays have a fixed size which you can't change. If you assign an object to an index beyond the array's size you get essentially the same exception as for a List<T>; there's no difference here.
The only real difference here is that when using new int[3] you have an array of size 3 with indices up to 2, and you can call array[2]. However, this would just return the default value - in the case of int that is zero. When using new List<int>(3), in contrast, you don't actually have three elements. In fact the list has no items at all, and calling list[2] throws the exception. The parameter to a list is just the capacity, a hint to the runtime about the size of the underlying array and when it needs to be resized - an ability your array does not even have.
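A quick sketch of that difference:
var array = new int[3];
Console.WriteLine(array[2]);    // 0 - the slot exists and holds the default value

var list = new List<int>(3);    // capacity 3, but Count is still 0
Console.WriteLine(list.Count);  // 0
// Console.WriteLine(list[2]);  // would throw ArgumentOutOfRangeException

list.Add(10);
Console.WriteLine(list[0]);     // 10 - index 0 exists now because an element was added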
A list is an array wrapper, where the internal array size is managed by its methods. The constructor that takes a capacity simply creates an array of that size internally, but the Count property (which reflects the number of elements that have been added) will be zero. So in essence, zero slots in the array have been assigned a value.
The size of an array is managed by you, the programmer. That is why you have to call static methods like System.Array.Resize (notice that the array argument is ref) if you want to resize an array yourself. That method allocates a new chunk of memory for the new size.
So to sum up, the list essentially manages an array for you, and as such the tradeoff is that you can only access as many array-like slots as have been added to it.
I have huge transient arrays created rapidly. Some are kept, some are GC'd. This fragments the heap, and the app consumes approx. 2.5x more memory than it truly needs, resulting in an OutOfMemoryException.
As a solution, I would prefer to have one gigantic array (PointF[]) and do the allocation and management of segments by myself. But I wonder how I could make two (or more) arrays share the same memory space.
PointF[] giganticList = new PointF[100];
PointF[] segment = ???;
// I want the segment length to be 20 and starting e.g at position 50
// within the gigantic list
I am thinking of a trick like the winning answer of this SO question.
Would that be possible? The problem is that the length and the number of the segment arrays are known only at runtime.
Assuming you are sure that your OutOfMemoryException could be avoided, and your approach of having it all in memory isn't the actual issue (the GC is pretty good at stopping this happening if memory is available) ...
Here is your first problem. I'm not sure the CLR supports any single object larger than 2 GB.
Crucial Edit - gcAllowVeryLargeObjects changes this on 64-bit systems - try this before rolling your own solution.
Secondly, you are talking about "some are kept, some are GC'd", i.e. you want to be able to reallocate elements of your array once you are done with a "child array".
Thirdly, I'm assuming that the PointF[] giganticList = new PointF[100]; in your question is meant to be more like PointF[] giganticList = new PointF[1000000];?
Also consider using MemoryFailPoint as this allows you to "demand" memory and check for exceptions instead of crashing with OutOfMemoryException.
EDIT Perhaps most importantly you are now entering a land of trade-offs. If you do this you could start losing advantages such as the jitter optimising loops by eliminating array bounds checks (for (int i = 0; i < myArray.Length; i++) gets optimised, while int length = 5; for (int i = 0; i < length; i++) doesn't). If you have computation-heavy code, this could hurt you. You are also going to have to work far harder to process different child arrays in parallel with each other. Creating copies of the child arrays, or sections of them, or even items inside them, is still going to allocate more memory which will be GC'd.
This is possible by wrapping the array, and tracking which sections are used for which child arrays. You are essentially talking about allocating a huge chunk of memory, and then reusing parts of it without putting the onus on the GC. You can take advantage of ArraySegment<T>, but that comes with its own potential issues like exposing the original array to all callers.
This is not going to be simple, but it is possible. Likely as not each time you remove a child array you will want to defragment your master array by shifting other child arrays to close the gaps (or do that when you have run out of contiguous segments).
A simple example would look something like the (untested, don't blame me if your computer leaves home and your cat blows up) pseudocode below. There are two other approaches, I mention those at the end.
public class ArrayCollection {
List<int> startIndexes = new List<int>();
List<int> lengths = new List<int>();
const int Beeellion = 100;
PointF[] giganticList = new PointF[Beeellion];
public ArraySegment<PointF> this[int childIndex] {
get {
// Care with this method, ArraySegment exposes the original array, which callers could then
// do bad things to
return new ArraySegment<PointF>(giganticList, startIndexes[childIndex], lengths[childIndex]);
}}
// returns the index of the child array
public int AddChild(int length) {
// TODO: needs to take account of lists with no entries yet
int startIndex = startIndexes.Last() + lengths.Last(); // Last() requires using System.Linq
// TODO: check that startIndex + length does not exceed giganticList.Length
// If it is then
// find the smallest unused block which is larger than the length requested
// or defrag our unused array sections
// otherwise throw out of memory
startIndexes.Add(startIndex); // will need inserts for defrag operations
lengths.Add(length); // will need inserts for defrag operations
return startIndexes.Count - 1; // inserts will need to return inserted index
}
public ArraySegment<PointF> GetChildAsSegment(int childIndex) {
// Care with this method, ArraySegment exposes the original array, which callers could then
// do bad things to
return new ArraySegment<PointF>(giganticList, startIndexes[childIndex], lengths[childIndex]);
}
public void SetChildValue(int childIndex, int elementIndex, PointF value) {
// TODO: needs to take account of lists with no entries yet, or invalid childIndex
// TODO: check and PREVENT buffer overflow (see warning) here and in other methods
// e.g.
if (elementIndex >= lengths[childIndex]) throw new YouAreAnEvilCallerException();
int falseZeroIndex = startIndexes[childIndex];
giganticList[falseZeroIndex + elementIndex] = value;
}
public PointF GetChildValue(int childIndex, int elementIndex) {
// TODO: needs to take account of lists with no entries yet, bad child index, element index
int falseZeroIndex = startIndexes[childIndex];
return giganticList[falseZeroIndex + elementIndex];
}
public void RemoveChildArray(int childIndex) {
startIndexes.RemoveAt(childIndex);
lengths.RemoveAt(childIndex);
// TODO: possibly record the unused segment in another pair of start, length lists
// to allow for defraging in AddChildArray
}
}
Warning The above code effectively introduces buffer overflow vulnerabilities if, for instance, you don't check the requested elementIndex against the length of the child array in methods like SetChildValue. You must understand this and prevent it before trying to do this in production, especially if combining these approaches with use of unsafe.
Now, this could be extended to return pseudo-index public PointF this[int index] methods for child arrays, enumerators for the child arrays etc., but as I say, this is getting complex and you need to decide if it really will solve your problem. Most of your time will be spent on the reuse (first), defrag (second), expand (third), throw OutOfMemory (last) logic.
This approach also has the advantage that you could allocate many 2GB subarrays and use them as a single array, if my comment about the 2GB object limit is correct.
This assumes you don't want to go down the unsafe route and use pointers, but the effect is the same, you would just create a wrapper class to manage child arrays in a fixed block of memory.
Another approach is to use the hashset/dictionary approach. Allocate your entire (massive 2GB array) and break it into chunks (say 100 array elements). A child array will then have multiple chunks allocated to it, and some wasted space in its final chunk. This will have the impact of some wasted space overall (depending on your average "child length vs. chunk length" predictions), but the advantage that you could increase and decrease the size of child arrays, and remove and insert child arrays with less impact on your fragmentation.
Noteworthy References:
Large arrays in 64 bit .NET 4: gcAllowVeryLargeObjects
MemoryFailPoint - allows you to "demand" memory and check for exceptions instead of crashing with OutOfMemoryException after the fact
Large Arrays, and LOH Fragmentation. What is the accepted convention?
3GB process limit on 32 bit, see: 3_GB_barrier, Server Fault /3GB considerations and AWE/PAE
buffer overflow vulnerability, and why you can get this in C#
Other examples of accessing arrays as a different kind of array or structure. The implementations of these might help you develop your own solution
BitArray Class
BitVector32 Structure
NGenerics - clever uses of array like concepts in some of the members of this library, particularly the general structures such as ObjectMatrix and Bag
C# array of objects, very large, looking for a better way
Array optimisation
Eric Gu - Efficiency of iteration over arrays? - note the age of this, but the approach of how to look for JIT optimisation is still relevant even with .NET 4.0 (see Array Bounds Check Elimination in the CLR? for example)
Dave Detlefs - Array Bounds Check Elimination in the CLR
Warning - pdf: Implicit Array Bounds Checking on 64-bit Architectures
LinkedList - allows you to reference multiple disparate array buckets in a sequence (tying together the chunks in the chunked bucket approach)
Parallel arrays and use of unsafe
Parallel Matrix Multiplication With the Task Parallel Library (TPL), particularly UnsafeSingle - a square or jagged array represented by a single array is the same class of problem you are trying to solve.
buffer overflow vulnerability, and why you can get this in C# (yes I have mentioned this three times now, it is important)
Your best bet here is probably to use multiple ArraySegment<PointF> all on the same PointF[] instance, but at different offsets, and have your calling code take note of the relative .Offset and .Count. Note that you would have to write your own code to allocate the next block, and look for gaps, etc - essentially your own mini-allocator.
You can't treat the segments just as a PointF[] directly.
So:
PointF[] giganticList = new PointF[100];
// I want the segment length to be 20 and starting e.g at position 50
// within the gigantic list
var segment = new ArraySegment<PointF>(giganticList, 50, 20);
As a side note: another approach might be to use a pointer to the data - either from an unmanaged allocation, or from a managed array that has been pinned (note: you should try to avoid pinning), but : while a PointF* can convey its own offset information, it cannot convey length - so you'd need to always pass both a PointF* and Length. By the time you've done that, you might as well have just used ArraySegment<T>, which has the side-benefit of not needing any unsafe code. Of course, depending on the scenario, treating the huge array as unmanaged memory may (in some scenarios) still be tempting.
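For example, here is a small sketch of consuming such a segment; the consuming code has to honour .Offset and .Count itself:
PointF[] giganticList = new PointF[100];
var segment = new ArraySegment<PointF>(giganticList, 50, 20);

// Write through the segment into the shared backing array.
for (int i = 0; i < segment.Count; i++)
{
    segment.Array[segment.Offset + i] = new PointF(i, i);
}
// giganticList[50] through giganticList[69] now hold the values written above.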
I've encountered a problem, which is best illustrated with this code segment:
public static void Foo(long RemoveLocation)
{
// Code body here...
// MyList is a List type collection object.
MyList.RemoveAt(RemoveLocation);
}
Problem: RemoveLocation is a long. The RemoveAt method takes only int types. How do I get around this problem?
Solutions I'd prefer to avoid (because it's crunch time on the project):
Splitting MyList into two or more lists; that would require rewriting a lot of code.
Using int instead of long.
If there was a way you could group similar items together, could you bring the total down below the limit? E.g. if your data contains lots of repeated X,Y coords, you might be able to reduce the number of elements and still keep one list, by creating a frequency count field. e.g. (x,y,count)
In theory, the maximum number of elements in a list is int.MaxValue, which is about 2 billion.
However, it is very inefficient to use the list type to store an extremely large number of elements. It simply has not been designed for that, and you're better off with a tree-like data structure.
For instance, if you look at Mono's implementation of the list types, you'll see that they're using a single array to hold the elements, and I assume .NET's version does the same. Since the maximum size of an object in .NET is 2 GB, the actual maximum number of elements is 2 GB divided by the element size. So, for instance, a list of strings on a 64-bit machine could hold at most about 268 million elements.
When using the mutable (non-readonly) list types, this array needs to be re-allocated to a larger size (usually using twice the old size) when adding items, requiring the entire contents to be copied. This is very inefficient.
In addition to this, having very large objects can also have a negative impact on the garbage collector.
Update
If you really need a very large list, you could simply write your own data type, for instance using an array or large arrays as internal storage.
There are also some useful comments about this here:
http://blogs.msdn.com/b/joshwil/archive/2005/08/10/450202.aspx
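If you really do need to index past int.MaxValue, a minimal sketch of such a custom type (hypothetical, not a framework class) could spread the elements over fixed-size chunks and accept a long index:
using System;
using System.Collections.Generic;

// Sketch only: a "big list" addressed by long, backed by fixed-size chunks.
public class ChunkedList<T>
{
    private const int ChunkSize = 1 << 20;          // about one million elements per chunk
    private readonly List<T[]> chunks = new List<T[]>();

    public long Count { get; private set; }

    public void Add(T item)
    {
        if (Count % ChunkSize == 0)
            chunks.Add(new T[ChunkSize]);           // grow by whole chunks, no huge copies
        chunks[(int)(Count / ChunkSize)][Count % ChunkSize] = item;
        Count++;
    }

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return chunks[(int)(index / ChunkSize)][index % ChunkSize];
        }
    }
}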
I know that it takes 4 bytes to store a uint in memory, but how much space in memory does it take to store List<uint> for, say, x number of uints?
How does this compare to the space required by uint[]?
There is no per-item overhead of a List<T> because it uses a T[] to store its data. However, a List<T> containing N items may have 2N elements in its T[]. Also, the List data structure itself has probably 32 or more bytes of overhead.
You probably won't notice much difference between T[] and List<T>, but you can use
System.GC.GetTotalMemory(true);
before and after an object allocation to obtain an approximate memory usage.
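For instance, a rough measurement sketch (the absolute numbers will vary by runtime and bitness):
long before = GC.GetTotalMemory(true);
uint[] array = new uint[1000000];
long after = GC.GetTotalMemory(true);
Console.WriteLine("uint[1000000]: ~" + (after - before) + " bytes");
GC.KeepAlive(array);                       // keep the array alive past the measurement

before = GC.GetTotalMemory(true);
List<uint> list = new List<uint>(1000000); // pre-sized so the backing array is allocated once
for (uint i = 0; i < 1000000; i++) list.Add(i);
after = GC.GetTotalMemory(true);
Console.WriteLine("List<uint> with 1000000 items: ~" + (after - before) + " bytes");
GC.KeepAlive(list);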
List<> uses an array internally, so a List<uint> should take O(4 bytes * n) space, just like a uint[]. There may be some more constant overhead in comparison to an array, but you should normally not care about this.
Depending on the specific implementation (this may be different when using Mono as a runtime instead of the MS .NET runtime), the internal array will be bigger than the number of actual items in the list. E.g.: a list of 5 elements has an internal array that can store 10, a list of 10000 elements may have an internal array of size 11000. So you can't generally say that the internal array will always be twice as big, or 5% bigger, than the number of list elements; it may also depend on the size.
Edit: I've just seen, Hans Passant has described the growing behaviour of List<T> here.
So, if you have a collection of items that you want to append to, and you can't know the size of this collection at the time the list is created, use a List<T>. It is specifically designed for this case. It provides fast O(1) random access to the elements, and has very little memory overhead (one internal array). It is, on the other hand, very slow at removing or inserting in the middle of the list. If you need those operations often, use a LinkedList<T>, which however has more memory overhead (per item!). If you know the size of your collection from the beginning, and you know that it won't change (or will change only a very few times), use arrays.
I'm using an application which uses a number of large dictionaries (up to 10^6 elements), the size of which is unknown in advance (though I can guess in some cases). I'm wondering how the dictionary is implemented, i.e. how bad the effect is if I don't give an initial estimate of the dictionary size. Does it internally use a (self-growing) array the way List does? In that case, letting the dictionaries grow might leave a lot of large un-referenced arrays on the LOH.
Using Reflector, I found the following: The Dictionary keeps the data in a struct array. It keeps a count on how many empty places are left in that array. When you add an item and no empty place is left, it increases the size of the internal array (see below) and copies the data from the old array to the new array.
So I would suggest you should use the constructor in which you set the initial size if you know there will be many entries.
EDIT: The logic is actually quite interesting: There is an internal class called HashHelpers to find primes. To speed this up, it also has stored some primes in a static array from 3 up to 7199369 (some are missing; for the reason, see below). When you supply a capacity, it finds the next prime (same value or larger) from the array, and uses that as initial capacity. If you give it a larger number than in its array, it starts checking manually.
So if nothing is passed as capacity to the Dictionary, the starting capacity is three.
Once the capacity is exceeded, it multiplies the current capacity by two and then finds the next larger prime using the helper class. That is why not every prime is needed in the array: primes "too close together" aren't really needed.
So if we pass no initial value, we would get (I checked the internal array):
3
7
17
37
71
163
353
761
1597
3371
7013
14591
30293
62851
130363
270371
560689
1162687
2411033
4999559
Once we pass this size, the next step falls outside the internal array, and it will manually search for larger primes. This will be quite slow. You could initialize with 7199369 (the largest value in the array), or consider if having more than about 5 million entries in a Dictionary might mean that you should reconsider your design.
MSDN says: "Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table." and further on "the capacity is automatically increased as required by reallocating the internal array."
But you get fewer reallocations if you give an initial estimate. If you have all items from the beginning, the LINQ method ToDictionary might be handy.
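A small sketch of both options (the item type and values here are just placeholders):
using System.Collections.Generic;
using System.Linq;

class Item { public int Id; public string Name; }

class Demo
{
    static void Main()
    {
        // Pre-sizing avoids repeated rehash/reallocation while the dictionary fills up.
        var byId = new Dictionary<int, string>(1000000);

        // If all items exist up front, ToDictionary builds the dictionary in one call.
        var items = new List<Item>
        {
            new Item { Id = 1, Name = "a" },
            new Item { Id = 2, Name = "b" }
        };
        Dictionary<int, string> lookup = items.ToDictionary(x => x.Id, x => x.Name);
    }
}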
Hashtables normally have something called a load factor, that will increase the backing bucket store if this threshold is reached. IIRC the default is something like 0.72. If you had perfect hashing, this can be increased to 1.0.
Also when the hashtable needs more buckets, the entire collection has to be rehashed.
The best way for me would be to use the .NET Reflector.
http://www.red-gate.com/products/reflector/
Use the disassembled code to see the implementation.
JSON as dictionary
{
"Details":
{
"ApiKey": 50125
}
}
The model should contain a property of type Dictionary<string, string>:
public Dictionary<string, string> Details { get; set; }
Implement a foreach block over the dictionary with KeyValuePair as the element type:
foreach (KeyValuePair<string, string> dict in Details)
{
switch (dict.Key)
{
case nameof(settings.ApiKey):
int.TryParse(dict.Value, out int apiKey);
settings.ApiKey = apiKey;
break;
default:
break;
}
}
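Putting it together, here is a hedged end-to-end sketch (assuming Newtonsoft.Json is the serializer in use; the Model and Settings class names are illustrative):
using System;
using System.Collections.Generic;
using Newtonsoft.Json; // assumption: Json.NET is used for deserialization

public class Model
{
    // Matches the "Details" object in the JSON above.
    public Dictionary<string, string> Details { get; set; }
}

public class Settings
{
    public int ApiKey { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        string json = "{ \"Details\": { \"ApiKey\": 50125 } }";
        Model model = JsonConvert.DeserializeObject<Model>(json);

        var settings = new Settings();
        foreach (KeyValuePair<string, string> entry in model.Details)
        {
            switch (entry.Key)
            {
                case nameof(settings.ApiKey):
                    int.TryParse(entry.Value, out int apiKey);
                    settings.ApiKey = apiKey;
                    break;
            }
        }
        Console.WriteLine(settings.ApiKey); // 50125
    }
}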