Memory usage of Dictionaries in C# - c#

I have some code that I added a nested dictionary to, of the following format
Dictionary<string, Dictionary<string, Dictionary<string, float>>>
After doing so I noticed the memory usage of my application shot up SIGNIFICANTLY. These dictionaries are keyed on strings that are often repeated, and there are many of these dictionaries, on the order of tens of thousands.
In order to address this problem I hypothesized that the repeated strings were eating up a significant amount of memory. My solution was to hash the strings and use an integer instead (I would keep one copy of a rainbow table so I could reverse the hash when necessary)
Dictionary<int, Dictionary<int, Dictionary<int, float>>>
So I went to a memory profiler to see what kind of size reduction I could get. To my shock, I found that the string storage was actually smaller in size (both normal and inclusive).
This doesn't make intuitive sense to me. Even if the compiler was smart enough to only store one copy of the string and use a reference, I would think that reference would be a pointer which is double the size of an int. I also didn't use any String.Intern methods so I don't know how this would have been accomplished (also is String.Intern the right method here?)
I'm very confused as to what's happening under the hood, any help would be appreciated

If your keys and values are objects, there's approximately 20 bytes of overhead for each element of a dictionary, plus several more bytes per dictionary. This is in addition to the space consumed by the keys and values themselves. If you have value types as keys and values, then it's 12 bytes plus the space consumed by the key and value for each item in the dictionary. This is if the number of elements equals the internal dictionary capacity. But typically there is more capacity than elements, so there is wasted space.
The wasted space will generally be a higher relative percentage if you have lots of dictionaries with a small number of elements than if you had one dictionary with many elements. If I go by your comment, your dictionaries with 8 elements will have a capacity of 11, those with 2 elements will have a capacity of 3, and those with 10 will have a capacity of 11.
If I understand your nesting counts, then a single top level dictionary will represent 184 dictionary elements. But if we count unused capacity, it's closer to 200 as far as space consumption. 200 * 20 = 4000 bytes for each top level dictionary. How many of those do you have? You say tens of thousands of them in thousands of objects. Every 10,000 is going to consume about 38 MB of dictionary overhead. Add to that the objects stored in the dictionary.
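The arithmetic above can be sketched in a few lines. This is only a back-of-envelope estimator: the 20-byte per-entry figure and the rule "capacity is the smallest prime at or above the element count, starting at 3" are taken from this answer as assumptions, not measured from any particular runtime, and the class and method names are made up for illustration.

```csharp
using System;

// Back-of-envelope estimator. The per-entry cost and the capacity rule are
// assumptions taken from the answer, not guarantees about any .NET version.
public static class DictionaryOverheadEstimate
{
    // Smallest prime >= n, never below 3 (trial division is fine for small n).
    public static int SmallestPrimeAtLeast(int n)
    {
        for (int candidate = Math.Max(n, 3); ; candidate++)
        {
            if (IsPrime(candidate)) return candidate;
        }
    }

    private static bool IsPrime(int value)
    {
        if (value < 2) return false;
        for (int d = 2; (long)d * d <= value; d++)
        {
            if (value % d == 0) return false;
        }
        return true;
    }

    // Estimated bookkeeping bytes for one dictionary holding `count` items,
    // excluding the keys and values themselves.
    public static long EstimateBytes(int count, int bytesPerEntry = 20)
    {
        return (long)SmallestPrimeAtLeast(count) * bytesPerEntry;
    }
}
```

Under these assumptions, DictionaryOverheadEstimate.EstimateBytes(8) gives 220 bytes (capacity 11), against 160 bytes if the capacity matched the element count exactly. The relative waste shrinks as a dictionary gets larger, which is the point made above about many small dictionaries.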
A possible explanation of why your attempt to make it smaller by managing the hash codes didn't help would be that there are not a lot of duplicated references to your keys. Replacing an object reference key with an int key doesn't change the dictionary overhead amount, and you're adding the storage of your new collection of hash codes.

Related

How double hashing works in case of the .NET Dictionary?

The other day I was reading that article on CodeProject
And I got hard times understanding a few points about the implementation of the .NET Dictionary (considering the implementation here without all the optimizations in .NET Core):
Note: If will add more items than the maximum number in the table
(i.e 7199369), the resize method will manually search the next prime
number that is larger than twice the old size.
Note: The reason that the sizes are being doubled while resizing the
array is to make the inner-hash table operations to have asymptotic
complexity. The prime numbers are being used to support
double-hashing.
So I tried to remember my old CS classes back a decade ago with my good friend wikipedia:
Open Addressing
Separate Chaining
Double Hashing
But I still don't see how this relates to double hashing (which is a collision resolution technique for open-addressed hash tables), except for the fact that the Resize() method doubles the number of entries, rounded up to a prime number (chosen based on the current/old size). And to be honest, I don't really see the benefit of "doubling" the size, or what "asymptotic complexity" refers to (I guess the article meant O(n) when the underlying array (entries) is full and has to be resized).
First, if you double the size with or without using a prime, is it not really the same?
Second, to me, the .NET hash table uses a separate chaining technique when it comes to collision resolution.
I guess I must have missed a few things and I would like to have someone who can shed the light on those two points.
I got my answer on Reddit, so I am gonna try to summarize here:
Collision Resolution Technique
First off, it seems that the collision resolution is using Separate Chaining technique and not Open addressing technique and therefore there is no Double Hashing strategy:
The code goes as follows:
private struct Entry
{
    public int hashCode;  // Lower 31 bits of hash code, -1 if unused
    public int next;      // Index of next entry, -1 if last
    public TKey key;      // Key of entry
    public TValue value;  // Value of entry
}
It's just that instead of having a dedicated storage (a list or whatnot) per bucket for all the entries sharing the same hashcode / index, everything is stored in the same entries array.
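To make the "chains inside one array" idea concrete, here is a deliberately simplified sketch. It is not the real Dictionary<TKey, TValue>: the class name TinyChainedMap is hypothetical, keys are fixed to string and values to float, and there is no removal and no resizing. It only shows how a buckets array of indices plus a next field gives separate chaining without a per-bucket list.

```csharp
using System;

// Simplified illustration of separate chaining inside one entries array.
// NOT the real Dictionary implementation: no removal, no resizing.
public sealed class TinyChainedMap
{
    private struct Entry
    {
        public int HashCode; // lower 31 bits of the hash code
        public int Next;     // index of the next entry in the same chain, -1 if last
        public string Key;
        public float Value;
    }

    private readonly int[] _buckets;   // bucket -> index of first entry, -1 if empty
    private readonly Entry[] _entries; // every chain lives in this one array
    private int _count;

    public TinyChainedMap(int capacity)
    {
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
        for (int i = 0; i < _buckets.Length; i++) _buckets[i] = -1;
    }

    public int Count { get { return _count; } }

    public void Add(string key, float value)
    {
        if (_count == _entries.Length)
            throw new InvalidOperationException("Map is full (this sketch does not resize).");

        int hash = key.GetHashCode() & 0x7FFFFFFF;
        int bucket = hash % _buckets.Length;

        // Walk the chain to reject duplicate keys.
        for (int i = _buckets[bucket]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].HashCode == hash && _entries[i].Key == key)
                throw new ArgumentException("Duplicate key.");
        }

        // The new entry becomes the head of the chain for its bucket.
        _entries[_count] = new Entry { HashCode = hash, Next = _buckets[bucket], Key = key, Value = value };
        _buckets[bucket] = _count;
        _count++;
    }

    public bool TryGetValue(string key, out float value)
    {
        int hash = key.GetHashCode() & 0x7FFFFFFF;
        for (int i = _buckets[hash % _buckets.Length]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].HashCode == hash && _entries[i].Key == key)
            {
                value = _entries[i].Value;
                return true;
            }
        }
        value = 0f;
        return false;
    }
}
```

With a capacity of 3, adding "a", "b" and "c" will usually put at least two keys in the same bucket, and TryGetValue still finds each one by following the Next indices.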
Prime Number
About the prime number, the answer lies here: https://cs.stackexchange.com/a/64191/42745. It's all about common factors:
Therefore, to minimize collisions, it is important to reduce the number of common factors between m and the elements of K. How can this
be achieved? By choosing m to be a number that has very few factors: a
prime number.
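A small experiment illustrates the quoted argument. The helper below (ModulusDemo is a made-up name) counts how many distinct buckets are hit when keys that are all multiples of some step are reduced modulo the table size m.

```csharp
using System.Collections.Generic;

public static class ModulusDemo
{
    // Number of distinct buckets hit by the keys 0, step, 2*step, ...
    // (count keys in total) when reduced modulo bucketCount.
    public static int DistinctBuckets(int step, int count, int bucketCount)
    {
        var used = new HashSet<int>();
        for (int i = 0; i < count; i++)
        {
            used.Add((i * step) % bucketCount);
        }
        return used.Count;
    }
}
```

For 100 keys that are multiples of 4, ModulusDemo.DistinctBuckets(4, 100, 8) is 2, because 4 and 8 share a factor, while ModulusDemo.DistinctBuckets(4, 100, 7) is 7, so every bucket of the prime-sized table is used.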
Doubling the underlying entries array size
It helps to avoid too many resize operations (i.e. copies) by growing the array by a large enough number of slots each time.
See that answer: https://stackoverflow.com/a/2369504/4636721
Hash-tables could not claim "amortized constant time insertion" if,
for instance, the resizing was by a constant increment. In that case
the cost of resizing (which grows with the size of the hash-table)
would make the cost of one insertion linear in the total number of
elements to insert. Because resizing becomes more and more expensive
with the size of the table, it has to happen "less and less often" to
keep the amortized cost of insertion constant.

Lookup time of Dictionary.ContainsKey() [duplicate]

This question already has an answer here:
What is performance of ContainsKey and TryGetValue?
(1 answer)
Closed 9 years ago.
As I have read on wikipedia that hash tables have on average O(1) search time.
So lets say I have a very large dictionary that contains maybe tens of millions of records.
If I use Dictionary.ContainsKey to extract the value against a given key, will its lookup time really be 1, or would it be like log n or something else due to some different internal implementation by .NET?
Big Oh notation doesn't tell you how long something takes. It tells you how it scales.
Easiest one to envision is searching for an item in a List<>, it has O(n) complexity. If it takes, on average, 2 milliseconds to find an item in a list with a million elements then you can expect it to take 4 milliseconds if the list has two million elements. It scales linearly with the size of the list.
O(1) predicts constant time for finding an element in a dictionary. In other words, it doesn't depend on the size of the dictionary. If the dictionary is twice as big, it doesn't take twice as long to find the element, it takes (roughly) as much time. The "roughly" means that it actually does take a bit longer, it is amortized O(1).
It would still be close to O(1), because it would still not depend on the number of the entries, but on the numbers of the collisions you have. Indexing an array is still O(1), no matter how many items you have.
Also, there seems to be a top limit on size of Dictionary caused by the implementation: How is the c#/.net 3.5 dictionary implemented?
Once we pass this size, the next step falls outside the internal array, and it will manually search for larger primes. This will be quite slow. You could initialize with 7199369 (the largest value in the array), or consider if having more than about 5 million entries in a Dictionary might mean that you should reconsider your design.
What is the key? If the key is Int32 then yes it will be close to order 1.
Lookups only get slower than order 1 if there are hash collisions.
Int32 as a key will have zero hash collisions but that does not guarantee zero hash bucket collisions.
Be careful of keys that produce hash collisions.
KVP and tuple can create a lot of hash collisions and are not good candidates for key.
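The difference between a hash collision and a bucket collision can be shown with two ints. The mapping used below (clear the sign bit, then take the remainder by the bucket count) is an assumption made for illustration and BucketDemo is a hypothetical name; the only fact relied on about the framework is that Int32.GetHashCode() returns the value itself.

```csharp
public static class BucketDemo
{
    // Assumed mapping for illustration: strip the sign bit, then take the
    // remainder by the number of buckets.
    public static int BucketOf(int key, int bucketCount)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % bucketCount;
    }
}
```

The keys 1 and 12 have different hash codes, yet with 11 buckets BucketDemo.BucketOf(1, 11) and BucketDemo.BucketOf(12, 11) are both 1, so the two keys end up in the same chain.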

Does C# Array resize take longer the bigger the array?

In my application I do System.Array.Resize once per frame. Initially I set my arrays to a maximum possible size, and then Resize them to something smaller. In some cases it may be a lot smaller, in others it may be just a little smaller. It appears to me though that the more elements there are to resize, the longer it takes. Perhaps my observations are wrong, and that is why I am asking here.
It should do yes, resizing involves allocating new memory to the size you want and copying the old array into the new one. The larger the array, the more to copy.
From MSDN:
This method allocates a new array with the specified size, copies
elements from the old array to the new one, and then replaces the old
array with the new one.
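A quick way to see the behaviour described in the quote is to check whether the array reference changes. ResizeDemo is a made-up helper name; Array.Resize and object.ReferenceEquals are the only framework calls used.

```csharp
using System;

public static class ResizeDemo
{
    // Returns true when Array.Resize handed back a different array object,
    // which means a new allocation plus a copy took place.
    public static bool ResizeAllocatesNewArray(int oldLength, int newLength)
    {
        int[] data = new int[oldLength];
        int[] before = data;
        Array.Resize(ref data, newLength);
        return !object.ReferenceEquals(before, data);
    }
}
```

Shrinking from 8 to 4 elements returns true, so even making an array smaller allocates and copies. Resizing to the same length returns false, because in that case the method leaves the array alone.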
Without knowing too much about the code, try using List<T> to manage the list and the resizing you need to do and when you need to provide it to Unity, call list.ToArray();.
This will still create the array and copy it, but only once per frame.
As other answers note, "resizing" an array requires copying all the elements, which is an O(N) operation when N gets large. Note that there are a number of approaches that can be used for copying arrays, with differing "setup" and "per-item" costs. A small array-copy operation may be processed 4 bytes at a time (or in some cases, one byte at a time), while a larger array operation would use special 16-byte operations to do most of the copying. These operations are limited to writing aligned 16-byte chunks of memory at a time. Depending upon source and destination alignment, a large array operation might require copying four groups of four bytes (the last byte of which will overlap the next group), many groups of 16 bytes, and four more groups of four bytes (the first byte of which will overlap the previous group). Determining how to subdivide the groups is a little tricky, so for smaller block-copy requests it's more efficient to use one- or four-byte operations.
Note that the real key to minimizing the expense of array resizing is to do it as seldom as possible. Whenever the List<T> type has to expand the size of its array, it doubles it. If its array starts at 16 items, then at the time it doubles the array to 256 elements, 128 slots will be empty, 64 elements will have been copied once, 32 will have been copied twice, 16 will have been copied three times, and the original 16 will have been copied four times. Note that while some elements will end up being copied lg(N) times, the total number of element copy operations in the process of building a list of size N will always be less than 2N.
There's no way to access the backing array of a List<T> as an array, but it's fairly easy to re-implement the class in such a way as to expose the array, and make sure any methods that accept an array as a parameter allow one to specify the length of the portion to be used (instead of just accessing the Length property of the array).
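A minimal sketch of such a re-implementation might look like the following. GrowableBuffer<T>, Items and ElementsCopied are hypothetical names; the class doubles its array when full, exposes the backing array together with a Count, and tracks how many element copies the growth has cost so the "less than 2N" claim can be checked.

```csharp
using System;

// Hypothetical List<T>-like buffer that exposes its backing array.
public sealed class GrowableBuffer<T>
{
    private T[] _items;
    private int _count;

    public GrowableBuffer(int initialCapacity = 16)
    {
        _items = new T[Math.Max(1, initialCapacity)];
    }

    // Backing array; only Items[0] .. Items[Count - 1] hold valid data.
    public T[] Items { get { return _items; } }
    public int Count { get { return _count; } }

    // Total number of element copies caused by growth so far.
    public long ElementsCopied { get; private set; }

    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            // Double when full, copying the existing elements once.
            var bigger = new T[_items.Length * 2];
            Array.Copy(_items, bigger, _count);
            ElementsCopied += _count;
            _items = bigger;
        }
        _items[_count++] = item;
    }

    // Reuse the same array for the next frame; stale slots are not cleared.
    public void Clear()
    {
        _count = 0;
    }
}
```

Adding 1000 items to a buffer that starts at 16 slots ends with a 1024-element array and 1008 element copies in total, below the 2N bound. A consumer that accepts (array, length) can then be given buffer.Items and buffer.Count without any per-frame resize.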
Yes. Array resizing is an O(n) operation. It has to copy each element into the new array.
Maybe it would be better, however, if you did not use arrays? What are the arrays used for? There might be a better data structure suitable for your application.

How much to grow buffer in a StringBuilder-like C module?

In C, I'm working on a "class" that manages a byte buffer, allowing arbitrary data to be appended to the end. I'm now looking into automatic resizing as the underlying array fills up using calls to realloc. This should make sense to anyone who's ever used Java or C# StringBuilder. I understand how to go about the resizing. But does anyone have any suggestions, with rationale provided, on how much to grow the buffer with each resize?
Obviously, there's a trade off to be made between wasted space and excessive realloc calls (which could lead to excessive copying). I've seen some tutorials/articles that suggest doubling. That seems wasteful if the user manages to supply a good initial guess. Is it worth trying to round to some power of two or a multiple of the alignment size on a platform?
Does any one know what Java or C# does under the hood?
In C# the strategy used to grow the internal buffer used by a StringBuilder has changed over time.
There are three basic strategies for solving this problem, and they have different performance characteristics.
The first basic strategy is:
Make an array of characters
When you run out of room, create a new array with k more characters, for some constant k.
Copy the old array to the new array, and orphan the old array.
This strategy has a number of problems, the most obvious of which is that it is O(n^2) in time if the string being built is extremely large. Let's say that k is a thousand characters and the final string is a million characters. You end up reallocating the string at 1000, 2000, 3000, 4000, ... and therefore copying 1000 + 2000 + 3000 + 4000 + ... + 999000 characters, which sums to on the order of 500 million characters copied!
This strategy has the nice property that the amount of "wasted" memory is bounded by k.
In practice this strategy is seldom used because of that n-squared problem.
The second basic strategy is
Make an array
When you run out of room, create a new array with k% more characters, for some constant k.
Copy the old array to the new array, and orphan the old array.
k% is usually 100%; if it is then this is called the "double when full" strategy.
This strategy has the nice property that its amortized cost is O(n). Suppose again the final string is a million characters and you start with a thousand. You make copies at 1000, 2000, 4000, 8000, ... and end up copying 1000 + 2000 + 4000 + 8000 ... + 512000 characters, which sums to about a million characters copied; much better.
The strategy has the property that the amortized cost is linear no matter what percentage you choose.
This strategy has the downsides that sometimes a copy operation is extremely expensive, and that you can be wasting up to k% of the final string length in unused memory.
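The two sums used for the first two strategies can be reproduced with a short simulation. GrowthCost and its method names are made up for this sketch; the model simply assumes the whole old buffer is copied every time the capacity grows.

```csharp
using System;

public static class GrowthCost
{
    // Total characters copied while growing from initialCapacity until the
    // capacity reaches finalLength, using the supplied growth rule.
    public static long CharactersCopied(long initialCapacity, long finalLength, Func<long, long> grow)
    {
        long capacity = initialCapacity;
        long copied = 0;
        while (capacity < finalLength)
        {
            copied += capacity;       // the full old buffer is copied on each growth
            capacity = grow(capacity);
        }
        return copied;
    }

    // First strategy: add a constant k characters each time.
    public static long WithConstantIncrement(long initialCapacity, long finalLength, long k)
    {
        return CharactersCopied(initialCapacity, finalLength, c => c + k);
    }

    // Second strategy with k% = 100%: double when full.
    public static long WithDoubling(long initialCapacity, long finalLength)
    {
        return CharactersCopied(initialCapacity, finalLength, c => c * 2);
    }
}
```

For a million-character string starting at 1000 characters, GrowthCost.WithConstantIncrement(1000, 1000000, 1000) is 499,500,000 characters copied, while GrowthCost.WithDoubling(1000, 1000000) is 1,023,000.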
The third strategy is to make a linked list of arrays, each array of size k. When you overflow an existing array, a new one is allocated and appended to the end of the list.
This strategy has the nice property that no operation is particularly expensive, the total wasted memory is bounded by k, and you don't need to be able to locate large blocks in the heap on a regular basis. It has the downside that finally turning the thing into a string can be expensive as the arrays in the linked list might have poor locality.
The string builder in the .NET framework used to use a double-when-full strategy; it now uses a linked-list-of-blocks strategy.
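The third strategy can be sketched as below. This is an illustration of the idea only, not how System.Text.StringBuilder is actually written: ChunkedBuilder is a hypothetical class that uses a List<char[]> instead of a hand-rolled linked list, and every chunk has the same fixed size k.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the "list of fixed-size blocks" strategy.
public sealed class ChunkedBuilder
{
    private readonly List<char[]> _chunks = new List<char[]>();
    private readonly int _chunkSize;
    private int _usedInLast; // characters used in the most recent chunk
    private int _length;     // total characters appended

    public ChunkedBuilder(int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        _chunkSize = chunkSize;
    }

    public int Length { get { return _length; } }
    public int ChunkCount { get { return _chunks.Count; } }

    public void Append(string text)
    {
        int offset = 0;
        while (offset < text.Length)
        {
            if (_chunks.Count == 0 || _usedInLast == _chunkSize)
            {
                // Overflow: allocate a new block; existing blocks are never copied.
                _chunks.Add(new char[_chunkSize]);
                _usedInLast = 0;
            }
            int n = Math.Min(_chunkSize - _usedInLast, text.Length - offset);
            text.CopyTo(offset, _chunks[_chunks.Count - 1], _usedInLast, n);
            _usedInLast += n;
            offset += n;
            _length += n;
        }
    }

    // The one expensive step: gather every block into the final string.
    public override string ToString()
    {
        var result = new char[_length];
        int written = 0;
        foreach (char[] chunk in _chunks)
        {
            int n = Math.Min(_chunkSize, _length - written);
            Array.Copy(chunk, 0, result, written, n);
            written += n;
        }
        return new string(result);
    }
}
```

With a chunk size of 4, appending "hello" and then " world" allocates three chunks and ToString() returns "hello world". The wasted memory is at most the unused tail of the last chunk, which is the bound by k mentioned above.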
You generally want to keep the growth factor a little smaller than the golden mean (~1.6). When it's smaller than the golden mean, the discarded segments will be large enough to satisfy a later request, as long as they're adjacent to each other. If your growth factor is larger than the golden mean, that can't happen.
I've found that reducing the factor to 1.5 still works quite nicely, and has the advantage of being easy to implement in integer math (size = (size + (size << 1))>>1; -- with a decent compiler you can write that as (size * 3)/2, and it should still compile to fast code).
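The golden-mean argument can be checked with an idealized model. The assumptions are loud ones: every earlier block has been freed, all freed blocks are adjacent and can be coalesced, and the current block must stay alive while it is copied into the new one. GrowthFactorDemo and its method names are hypothetical.

```csharp
public static class GrowthFactorDemo
{
    // Integer form of "grow by 1.5x" from the answer.
    public static int GrowByHalf(int size)
    {
        return (size * 3) / 2;
    }

    // Idealized model (see assumptions in the text). Returns the number of the
    // first growth step whose request fits into the space of previously freed
    // blocks, or -1 if that never happens within maxSteps.
    public static int FirstReusableStep(double factor, int maxSteps)
    {
        double freed = 0.0;   // total size of blocks already released
        double current = 1.0; // block currently in use
        for (int step = 1; step <= maxSteps; step++)
        {
            double next = current * factor;
            if (freed >= next) return step;
            freed += current; // after the copy, the old block is released
            current = next;
        }
        return -1;
    }
}
```

In this model a factor of 1.5 can reuse freed space at the fifth growth step, while a factor of 2 never can, because the freed total always stays one unit short of the current block, let alone the next request.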
I seem to recall a conversation some years ago on Usenet, in which P.J. Plauger (or maybe it was Pete Becker) of Dinkumware, saying they'd run rather more extensive tests than I ever did, and reached the same conclusion (so, for example, the implementation of std::vector in their C++ standard library uses 1.5).
When working with expanding and contracting buffers, the key property you want is to grow or shrink by a multiple of your size, not a constant difference.
Consider the case where you have a 16 byte array, increasing its size by 128 bytes is overkill; however, if instead you had a 4096 byte array and increased it by only 128 bytes, you would end up copying a lot.
I was taught to always double or halve arrays. If you really have no hint as to the size or maximum, multiplying by two ensures that you have a lot of capacity for a long time, and unless you're working on a resource constrained system, allocating at most twice the space isn't too terrible. Additionally, keeping things in powers of two can let you use bit shifts and other tricks and the underlying allocation is usually in powers of two.
Does any one know what Java or C# does under the hood?
Have a look at the following link to see how it's done in Java's StringBuilder from JDK11, in particular, the ensureCapacityInternal method.
https://java-browser.yawk.at/java/11/java.base/java/lang/AbstractStringBuilder.java#java.lang.AbstractStringBuilder%23ensureCapacityInternal%28int%29
It's implementation-specific, according to the documentation, but starts with 16:
The default capacity for this implementation is 16, and the default
maximum capacity is Int32.MaxValue.
A StringBuilder object can allocate more memory to store characters
when the value of an instance is enlarged, and the capacity is
adjusted accordingly. For example, the Append, AppendFormat,
EnsureCapacity, Insert, and Replace methods can enlarge the value of
an instance.
The amount of memory allocated is implementation-specific, and an
exception (either ArgumentOutOfRangeException or OutOfMemoryException)
is thrown if the amount of memory required is greater than the maximum
capacity.
Based on some other .NET framework things, I would suggest multiplying it by 1.1 each time the current capacity is reached. If extra space is needed, just have an equivalent to EnsureCapacity that will expand it to the necessary size manually.
Translate this to C.
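I would probably maintain a List<List<string>>.
class StringBuilder
{
    private readonly List<List<string>> list = new List<List<string>>();

    // Store the incoming chunk as-is; nothing already stored is copied.
    public void Append(List<string> listOfCharsToAppend)
    {
        list.Add(listOfCharsToAppend);
    }
}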
This way you are just maintaining a list of Lists and allocating memory on demand rather than allocating memory well ahead.
List in the .NET framework uses this algorithm: if an initial capacity is specified, it creates a buffer of this size; otherwise no buffer is allocated until the first item(s) are added, which allocates space equal to the number of items added, but no less than 4. When more space is needed, it allocates a new buffer with 2x the previous capacity and copies all items from the old buffer to the new buffer. Earlier, StringBuilder used a similar algorithm.
In .NET 4, StringBuilder allocates an initial buffer of the size specified in the constructor (the default size is 16 characters). When the allocated buffer is too small, nothing is copied. Instead it fills the current buffer to the rim, then creates a new instance of StringBuilder, which allocates a buffer of size MAX(length_of_remaining_data_to_add, MIN(length_of_all_previous_buffers, 8000)), so at least all of the remaining data fits into the new buffer and the total size of all buffers is at least doubled. The new StringBuilder keeps a reference to the old StringBuilder, and so the individual instances form a linked list of buffers.

Compact data structure for storing a large set of integral values

I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple:
Add a new value
Iterate over all of the values.
There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int> or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false positives.
I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions.
EDIT: To answer your questions:
No, the items don't need to be sorted
By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this.
There could be a decent number of these sets in memory at once (~100).
Use the BitArray. It uses only some 6MB of memory; the only real problem is that iteration is Theta(N), i.e. you have to walk the entire range. Locality of reference is good though and you can allocate the entire structure in one operation.
As for wasting space: you waste 6MB in the worst case.
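A minimal sketch of the BitArray approach, supporting only the two operations the question asks for. BitArrayIdSet and ApproximateBytes are made-up names; System.Collections.BitArray is the only framework type relied on.

```csharp
using System.Collections;
using System.Collections.Generic;

public sealed class BitArrayIdSet
{
    private readonly BitArray _bits;

    // One bit per possible id in the range 0..maxId.
    public BitArrayIdSet(int maxId)
    {
        _bits = new BitArray(maxId + 1);
    }

    public void Add(int id)
    {
        _bits[id] = true;
    }

    // Iteration walks the entire range, not just the ids that are present.
    public IEnumerable<int> Ids()
    {
        for (int i = 0; i < _bits.Length; i++)
        {
            if (_bits[i]) yield return i;
        }
    }

    // Payload size in bytes, ignoring object headers.
    public static long ApproximateBytes(int maxId)
    {
        return ((long)maxId + 8) / 8;
    }
}
```

BitArrayIdSet.ApproximateBytes(50000000) is 6,250,001 bytes, which is the "some 6MB" figure above, and it is the same whether the set holds one id or fifty million.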
EDIT: ok, you've lots of sets and you're serializing. For serializing on disk, I suggest 6MB files :)
For sending over the wire, just iterate and consider sending ranges instead of individual elements. That does require a sorting structure.
You need lots of these sets. Consider if you have 600MB to spare. Otherwise, check out:
Bytewise tries: O(1) insert, O(n) iteration, much lower constant factors than bitwise tries
A custom hash table, perhaps Google sparsehash through C++/CLI
BSTs storing ranges/intervals
Supernode BSTs
It would depend on the distribution of the sizes of your sets. Unless you expect most of the sets to be (close to) the minimum you've specified, I'd probably use a bitset. To cover a range up to 50,000,000, a bitset ends up ~6 megabytes.
Compared to storing the numbers directly, this is marginally larger for the minimum size set you've specified (~6 megabytes instead of ~4), but considerably smaller for the maximum size set (1/32nd the size).
The second possibility would be to use a delta encoding. For example, instead of storing each number directly, store the difference between that number and the previous number that was included. Given a maximum magnitude of 50,000,000 and a minimum size of 1,000,000 items, the average difference between one number and the next is ~50. This means you can theoretically store the difference in <6 bits on average. I'd probably use the 7 least significant bits directly, and if you need to encode a larger gap, set the msb and (for example) store the size of the gap in the lower 7 bits plus the next three bytes. That can't happen very often, so in most cases you're using only one byte per number, for about 4:1 compression compared to storing numbers directly. In the best case this would use ~1 megabyte for a set, and in the worst about 50 megabytes -- 4:1 compression compared to storing numbers directly.
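The encoding described above can be sketched as follows. The exact byte layout is my reading of the description, so treat it as an assumption: gaps below 128 take one byte with the most significant bit clear, larger gaps take four bytes with the most significant bit of the first byte set and the remaining 31 bits holding the gap. DeltaCodec is a hypothetical name, and the input must already be sorted in ascending order.

```csharp
using System;
using System.Collections.Generic;

public static class DeltaCodec
{
    // Encodes non-negative ids sorted ascending as gaps from the previous id.
    public static byte[] Encode(IEnumerable<int> sortedIds)
    {
        var output = new List<byte>();
        int previous = 0;
        foreach (int id in sortedIds)
        {
            int gap = id - previous;
            if (gap < 0) throw new ArgumentException("Ids must be non-negative and sorted ascending.");
            if (gap < 128)
            {
                output.Add((byte)gap);                          // one byte, msb clear
            }
            else
            {
                output.Add((byte)(0x80 | ((gap >> 24) & 0x7F))); // msb set + top 7 bits
                output.Add((byte)((gap >> 16) & 0xFF));
                output.Add((byte)((gap >> 8) & 0xFF));
                output.Add((byte)(gap & 0xFF));
            }
            previous = id;
        }
        return output.ToArray();
    }

    // Rebuilds the ids by summing the decoded gaps.
    public static List<int> Decode(byte[] data)
    {
        var ids = new List<int>();
        int previous = 0;
        int i = 0;
        while (i < data.Length)
        {
            int gap;
            if ((data[i] & 0x80) == 0)
            {
                gap = data[i];
                i += 1;
            }
            else
            {
                gap = ((data[i] & 0x7F) << 24) | (data[i + 1] << 16) | (data[i + 2] << 8) | data[i + 3];
                i += 4;
            }
            previous += gap;
            ids.Add(previous);
        }
        return ids;
    }
}
```

Encoding the ids 3, 10, 200 and 50,000,000 produces 10 bytes (two one-byte gaps and two four-byte gaps) instead of the 16 bytes needed for four raw Int32 values, and Decode returns the original ids.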
If you don't mind a little bit of extra code, you could use an adaptive scheme -- delta encoding for small sets (up to 6,000,000 numbers), and a bitmap for larger sets.
I think the answer depends on what you mean by "passing around" and what you're trying to accomplish. You say you are only adding to the list: how often do you add? How fast will the list grow? What is an acceptable overhead for memory use, versus the time to reallocate memory?
In your worst case, 50,000,000 32-bit numbers = 200 megabytes using the most efficient possible data storage mechanism. Assuming you may end up with this much use in your worst case scenario, is it OK to use this much memory all the time? Is that better than having to reallocate memory frequently? What's the distribution of typical usage patterns? You could always just use an int[] that's pre-allocated to the whole 50 million.
As far as access speed for your operations, nothing is faster than iterating and adding to a pre-allocated chunk of memory.
From OP edit: There could be a decent number of these sets in memory at once (~100).
Hey now. You need to store 100 sets of 1 to 50 million numbers in memory at once? I think the bitset method is the only possible way this could work.
That would be 600 megabytes. Not insignificant, but unless they are (typically) mostly empty, it seems very unlikely that you would find a more efficient storage mechanism.
Now, if you don't use bitsets, but rather use dynamically sized constructs, and they could somehow take up less space to begin with, you're talking about a real ugly memory allocation/deallocation/garbage collection scenario.
Let's assume you really need to do this, though I can only imagine why. So your server's got a ton of memory, just allocate as many of these 6 megabyte bitsets as you need and recycle them. Allocation and garbage collection are no longer a problem. Yeah, you're using a ton of memory, but that seems inevitable.
