In my application I do System.Array.Resize once per frame. Initially I set my arrays to a maximum possible size, and then Resize them to something smaller. In some cases it may be a lot smaller, in others it may be just a little smaller. It appears to me though that the more elements there are to resize, the longer it takes. Perhaps my observations are wrong, and that is why I am asking here.
Yes, it should. Resizing involves allocating a new array of the size you want and copying the old array into it, so the larger the array, the more there is to copy.
From MSDN:
This method allocates a new array with the specified size, copies
elements from the old array to the new one, and then replaces the old
array with the new one.
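A minimal sketch of what that quote means in practice. The class and method names here are made up for illustration. It checks that Array.Resize leaves the variable pointing at a different array object whenever the size actually changes:

```csharp
using System;

public static class ResizeDemo
{
    // Returns true if Array.Resize replaced the array with a newly allocated one.
    // (When newSize equals the current length, Array.Resize does nothing.)
    public static bool ResizeAllocatesNewArray(int oldSize, int newSize)
    {
        int[] data = new int[oldSize];
        int[] before = data;              // remember the original object
        Array.Resize(ref data, newSize);  // may allocate and copy
        return !ReferenceEquals(before, data);
    }
}
```

Shrinking from 8 to 4 elements still allocates and copies, so ResizeDemo.ResizeAllocatesNewArray(8, 4) is true, while ResizeAllocatesNewArray(8, 8) is false.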
Without knowing too much about the code, try using a List<T> to manage the items and the resizing, and call list.ToArray() when you need to hand an array to Unity.
This will still create the array and copy it, but only once per frame.
As other answers note, "resizing" an array requires copying all the elements, which is an O(N) operation when N gets large. Note that there are a number of approaches that can be used for copying arrays, with differing "setup" and "per-item" costs. A small array-copy operation may be processed 4 bytes at a time (or in some cases, one byte at a time), while a larger array operation would use special 16-byte operations to do most of the copying. These operations are limited to writing aligned 16-byte chunks of memory at a time. Depending upon source and destination alignment, a large array operation might require copying four groups of four bytes (the last byte of which will overlap the next group), many groups of 16 bytes, and four more groups of four bytes (the first byte of which will overlap the previous group). Determining how to subdivide the groups is a little tricky, so for smaller block-copy requests it's more efficient to use one- or four-byte operations.
Note that the real key to minimizing the expense of array resizing is to do it as seldom as possible. Whenever the List<T> type has to expand the size of its array, it doubles it. If its array starts at 16 items, then at the time it doubles the array to 256 elements, 128 slots will be empty, 64 elements will have been copied once, 32 twice, 16 three times, and the original 16 four times. Note that while some elements will end up being copied lg(N) times, the total number of element copy operations in the process of building a list of size N will always be less than 2N.
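The "less than 2N" claim is easy to check with a small simulation. This is only a sketch that counts copies under a double-when-full policy. It is not List<T>'s actual source, and the names are invented for the example:

```csharp
using System;

public static class DoublingCost
{
    // Counts element copies made while appending `count` items one at a time
    // to an array that starts at `initialCapacity` and doubles when full.
    public static long CopiesForAppends(int count, int initialCapacity)
    {
        long copies = 0;
        long capacity = initialCapacity;
        for (long size = 0; size < count; size++)
        {
            if (size == capacity)
            {
                copies += size;   // every existing element moves to the new array
                capacity *= 2;
            }
        }
        return copies;
    }
}
```

Starting at 16, the 129th append triggers the resize to 256, and DoublingCost.CopiesForAppends(129, 16) returns 240 (16 + 32 + 64 + 128), which is below 2 * 129.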
There's no way to access the backing array of a List<T> as an array, but it's fairly easy to re-implement the class in such a way as to expose the array, and make sure any methods that accept an array as a parameter allow one to specify the length of the portion to be used (instead of just accessing the Length property of the array).
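A rough sketch of such a re-implementation, assuming you only need Add and Clear. ExposedList, Buffer and Count are hypothetical names, not part of the framework. Callers must use Count rather than Buffer.Length to know how much of the array is valid:

```csharp
using System;

public sealed class ExposedList<T>
{
    private T[] items;

    // Number of valid elements at the start of Buffer.
    public int Count { get; private set; }

    public ExposedList(int capacity = 16)
    {
        items = new T[Math.Max(1, capacity)];
    }

    // The backing array itself; only the first Count entries are meaningful.
    public T[] Buffer => items;

    public void Add(T item)
    {
        if (Count == items.Length)
            Array.Resize(ref items, items.Length * 2); // double when full
        items[Count++] = item;
    }

    // Resets the logical length without reallocating. Old entries stay in the
    // array (and, for reference types, stay reachable) until overwritten.
    public void Clear()
    {
        Count = 0;
    }
}
```

With this shape the once-per-frame resize goes away: call Clear() at the start of the frame, Add() as needed, and pass Buffer together with Count to whatever consumes the data.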
Yes. Array resizing is an O(n) operation. It has to copy each element into the new array.
Maybe it would be better, however, if you did not use arrays? What are the arrays used for? There might be a better data structure for your application.
I have some code that I added a nested dictionary to, of the following format
Dictionary<string, Dictionary<string, Dictionary<string, float>>>
After doing so I noticed the memory usage of my application shot up SIGNIFICANTLY. These dictionaries are keyed on strings that are often repeated, and there are many of these dictionaries, on the order of 10's of thousands.
In order to address this problem I hypothesized that the repeated strings were eating up a significant amount of memory. My solution was to hash the strings and use an integer instead (I would keep one copy of a rainbow table so I could reverse the hash when necessary)
Dictionary<int, Dictionary<int, Dictionary<int, float>>>
So I went to a memory profiler to see what kind of size reduction I could get. To my shock, I found that the string storage was actually smaller in size (both normal and inclusive).
This doesn't make intuitive sense to me. Even if the compiler was smart enough to only store one copy of the string and use a reference, I would think that reference would be a pointer which is double the size of an int. I also didn't use any String.Intern methods so I don't know how this would have been accomplished (also is String.Intern the right method here?)
I'm very confused as to what's happening under the hood, any help would be appreciated
If your keys and values are objects, there's approximately 20 bytes of overhead for each element of a dictionary, plus several more bytes per dictionary. This is in addition to the space consumed by the keys and values themselves. If you have value types as keys and values, then it's 12 bytes plus the space consumed by the key and value for each item in the dictionary. This assumes the number of elements equals the internal dictionary capacity. Typically there is more capacity than elements, so some space is wasted.
The wasted space will generally be a higher relative percentage if you have lots of dictionaries with a small number of elements than if you had one dictionary with many elements. If I go by your comment, your dictionaries with 8 elements will have a capacity of 11, those with 2 elements will have a capacity of 3, and those with 10 will have a capacity of 11.
If I understand your nesting counts, then a single top level dictionary will represent 184 dictionary elements. But if we count unused capacity, it's closer to 200 as far as space consumption goes. 200 * 20 = 4000 bytes for each top level dictionary. How many of those do you have? You say tens of thousands of them in thousands of objects. Every 10,000 is going to consume about 38 MB of dictionary overhead. Add to that the objects stored in the dictionary.
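The arithmetic behind that 38 MB figure, as a small sketch. The helper name is invented, and the 200-slot and 20-byte inputs are the estimates from this answer, not measured values:

```csharp
public static class DictionaryOverhead
{
    // Estimated overhead in megabytes (1 MB = 1024 * 1024 bytes) for
    // `topLevelCount` top-level dictionaries, each accounting for
    // `slotsPerTopLevel` entry slots of `bytesPerSlot` bytes.
    public static double EstimateMegabytes(int slotsPerTopLevel, int bytesPerSlot, int topLevelCount)
    {
        long bytes = (long)slotsPerTopLevel * bytesPerSlot * topLevelCount;
        return bytes / (1024.0 * 1024.0);
    }
}
```

DictionaryOverhead.EstimateMegabytes(200, 20, 10000) works out to 40,000,000 bytes, or roughly 38.1 MB.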
A possible explanation of why your attempt to make it smaller by managing the hash codes did not help is that there are not many duplicated references to your keys. Replacing an object reference key with an int key doesn't change the dictionary overhead, and you're adding the storage of your new collection of hash codes.
Array.Copy does the copy work but won't tell me how many elements were copied. In situations where either array is not large enough, so that fewer elements than I asked for were copied, I need to know how many. So is there some API to achieve this, or do I have to do the calculation in my own code? Thanks.
Array.Copy will fail if not enough space exists in the destination array. Nothing will be copied in that case. If you think otherwise you must be misinterpreting something.
In any case you have to perform the calculations yourself.
If I understand you correctly, you want to copy one array into a new one and then look how many items were copied?
This is futile. Array.Copy() will either copy every item, or no item at all. You will get an ArgumentException if your destination array is smaller than your source array.
MSDN tells you the following:
The sourceArray and destinationArray parameters must have the same number of dimensions. In addition, destinationArray must already have been dimensioned and must have a sufficient number of elements to accommodate the copied data.
Or if you use the Array.Copy Method (Array, Int64, Array, Int64, Int64) overload you need to have enough space in your target array to accommodate the source array:
The sourceArray and destinationArray parameters must have the same number of dimensions. In addition, destinationArray must already have been dimensioned and must have a sufficient number of elements starting from the destinationIndex position to accommodate the copied data.
Either way, there is no way that only part of the items get copied, at least not with Array.Copy
It'll copy however many elements you tell it to. Array.Copy() takes a parameter specifying the number of elements to copy.
You can use the Length property of each array to check the size.
According to the documentation, every overload of Array.Copy has a length parameter, so you know beforehand how many elements will be copied.
For example, if we call it like this:
Array.Copy(sourceArray, destinationArray, length);
then, as the documentation says: "if length is greater than the number of elements in sourceArray or length is greater than the number of elements in destinationArray the ArgumentException will be thrown".
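Since the calculation has to be done by the caller, here is one way it might look. CopyAsManyAsFit is a hypothetical helper, not a framework method. It clamps the requested length to what both arrays can hold and returns the number actually copied:

```csharp
using System;

public static class ArrayCopyHelper
{
    // Copies at most `requested` elements, limited by the space remaining in
    // either array, and returns how many elements were copied.
    public static int CopyAsManyAsFit(Array source, int sourceIndex,
                                      Array destination, int destinationIndex,
                                      int requested)
    {
        int available = Math.Min(source.Length - sourceIndex,
                                 destination.Length - destinationIndex);
        int count = Math.Max(0, Math.Min(requested, available));
        Array.Copy(source, sourceIndex, destination, destinationIndex, count);
        return count;
    }
}
```

Asking for 10 elements from a 5-element source into a 3-element destination copies 3 and returns 3. Invalid start indices still throw from Array.Copy, as they should.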
In C, I'm working on a "class" that manages a byte buffer, allowing arbitrary data to be appended to the end. I'm now looking into automatic resizing as the underlying array fills up using calls to realloc. This should make sense to anyone who's ever used Java or C# StringBuilder. I understand how to go about the resizing. But does anyone have any suggestions, with rationale provided, on how much to grow the buffer with each resize?
Obviously, there's a trade off to be made between wasted space and excessive realloc calls (which could lead to excessive copying). I've seen some tutorials/articles that suggest doubling. That seems wasteful if the user manages to supply a good initial guess. Is it worth trying to round to some power of two or a multiple of the alignment size on a platform?
Does any one know what Java or C# does under the hood?
In C# the strategy used to grow the internal buffer used by a StringBuilder has changed over time.
There are three basic strategies for solving this problem, and they have different performance characteristics.
The first basic strategy is:
Make an array of characters
When you run out of room, create a new array with k more characters, for some constant k.
Copy the old array to the new array, and orphan the old array.
This strategy has a number of problems, the most obvious of which is that it is O(n^2) in time if the string being built is extremely large. Let's say that k is a thousand characters and the final string is a million characters. You end up reallocating the string at 1000, 2000, 3000, 4000, ... and therefore copying 1000 + 2000 + 3000 + 4000 + ... + 999000 characters, which sums to on the order of 500 million characters copied!
This strategy has the nice property that the amount of "wasted" memory is bounded by k.
In practice this strategy is seldom used because of that n-squared problem.
The second basic strategy is
Make an array
When you run out of room, create a new array with k% more characters, for some constant k.
Copy the old array to the new array, and orphan the old array.
k% is usually 100%; if it is then this is called the "double when full" strategy.
This strategy has the nice property that its amortized cost is O(n). Suppose again the final string is a million characters and you start with a thousand. You make copies at 1000, 2000, 4000, 8000, ... and end up copying 1000 + 2000 + 4000 + 8000 ... + 512000 characters, which sums to about a million characters copied; much better.
The strategy has the property that the amortized cost is linear no matter what percentage you choose.
This strategy has the downsides that sometimes a copy operation is extremely expensive, and you can be wasting up to k% of the final string length in unused memory.
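To make the difference between the first two strategies concrete, here is a small sketch that totals the characters copied under any growth rule. The names are invented for the example. It models only the copying, not the allocator:

```csharp
using System;

public static class GrowthComparison
{
    // Total characters copied while a buffer grows from `start` until its
    // capacity reaches `target`, using `next` to compute each new capacity.
    public static long CharactersCopied(long start, long target, Func<long, long> next)
    {
        long copied = 0;
        for (long capacity = start; capacity < target; capacity = next(capacity))
            copied += capacity;   // a full buffer is copied into its replacement
        return copied;
    }
}
```

With the numbers used above, CharactersCopied(1000, 1000000, c => c + 1000) gives 499,500,000, while CharactersCopied(1000, 1000000, c => c * 2) gives 1,023,000.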
The third strategy is to make a linked list of arrays, each array of size k. When you overflow an existing array, a new one is allocated and appended to the end of the list.
This strategy has the nice property that no operation is particularly expensive, the total wasted memory is bounded by k, and you don't need to be able to locate large blocks in the heap on a regular basis. It has the downside that finally turning the thing into a string can be expensive as the arrays in the linked list might have poor locality.
The string builder in the .NET framework used to use a double-when-full strategy; it now uses a linked-list-of-blocks strategy.
You generally want to keep the growth factor a little smaller than the golden mean (~1.6). When it's smaller than the golden mean, the discarded segments will be large enough to satisfy a later request, as long as they're adjacent to each other. If your growth factor is larger than the golden mean, that can't happen.
I've found that reducing the factor to 1.5 still works quite nicely, and has the advantage of being easy to implement in integer math (size = (size + (size << 1))>>1; -- with a decent compiler you can write that as (size * 3)/2, and it should still compile to fast code).
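The golden-mean argument can be checked numerically. The sketch below assumes an idealized allocator in which all previously freed blocks are adjacent and can be coalesced, and the old block is released only after its contents have been copied. The names are illustrative:

```csharp
public static class GrowthReuse
{
    // Returns the number of the first reallocation whose request fits in the
    // combined space of all previously freed blocks, or -1 if that never
    // happens within `maxSteps` reallocations. Sizes are in units of the
    // initial block size.
    public static int FirstReusableStep(double factor, int maxSteps)
    {
        double freed = 0.0;    // total size of blocks already released
        double current = 1.0;  // size of the block currently in use
        for (int step = 1; step <= maxSteps; step++)
        {
            double next = current * factor;
            if (freed >= next)
                return step;
            freed += current;  // the old block is released after the copy
            current = next;
        }
        return -1;
    }
}
```

Under those assumptions FirstReusableStep(1.5, 40) returns 5, while factors of 1.7 and 2.0 return -1: the freed space never catches up with the next request.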
I seem to recall a conversation some years ago on Usenet in which P.J. Plauger (or maybe it was Pete Becker) of Dinkumware said they'd run rather more extensive tests than I ever did, and reached the same conclusion (so, for example, the implementation of std::vector in their C++ standard library uses 1.5).
When working with expanding and contracting buffers, the key property you want is to grow or shrink by a multiple of your size, not a constant difference.
Consider the case where you have a 16 byte array: increasing its size by 128 bytes is overkill. If instead you had a 4096 byte array and increased it by only 128 bytes, you would end up copying a lot.
I was taught to always double or halve arrays. If you really have no hint as to the size or maximum, multiplying by two ensures that you have a lot of capacity for a long time, and unless you're working on a resource constrained system, allocating at most twice the space isn't too terrible. Additionally, keeping things in powers of two can let you use bit shifts and other tricks and the underlying allocation is usually in powers of two.
Does any one know what Java or C# does under the hood?
Have a look at the following link to see how it's done in Java's StringBuilder from JDK11, in particular, the ensureCapacityInternal method.
https://java-browser.yawk.at/java/11/java.base/java/lang/AbstractStringBuilder.java#java.lang.AbstractStringBuilder%23ensureCapacityInternal%28int%29
It's implementation-specific, according to the documentation, but starts with 16:
The default capacity for this implementation is 16, and the default
maximum capacity is Int32.MaxValue.
A StringBuilder object can allocate more memory to store characters
when the value of an instance is enlarged, and the capacity is
adjusted accordingly. For example, the Append, AppendFormat,
EnsureCapacity, Insert, and Replace methods can enlarge the value of
an instance.
The amount of memory allocated is implementation-specific, and an
exception (either ArgumentOutOfRangeException or OutOfMemoryException)
is thrown if the amount of memory required is greater than the maximum
capacity.
Based on some other .NET framework things, I would suggest multiplying it by 1.1 each time the current capacity is reached. If extra space is needed, just have an equivalent to EnsureCapacity that will expand it to the necessary size manually.
Translate this to C.
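I would probably maintain a List<List<string>>.
using System.Collections.Generic;

class StringBuilder
{
    // Each Append stores a reference to the caller's list; nothing is copied.
    private readonly List<List<string>> list = new List<List<string>>();

    public void Append(List<string> listOfCharsToAppend)
    {
        list.Add(listOfCharsToAppend);
    }
}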
This way you are just maintaining a list of Lists and allocating memory on demand rather than allocating memory well ahead.
List<T> in the .NET Framework uses this algorithm: if an initial capacity is specified, it creates a buffer of that size; otherwise no buffer is allocated until the first item(s) are added, at which point it allocates space equal to the number of items added, but no less than 4. When more space is needed, it allocates a new buffer with twice the previous capacity and copies all items from the old buffer to the new one. StringBuilder used to use a similar algorithm.
In .NET 4, StringBuilder allocates an initial buffer of the size specified in the constructor (the default is 16 characters). When the allocated buffer is too small, nothing is copied. Instead it fills the current buffer to the brim, then creates a new StringBuilder instance, which allocates a buffer of size MAX(length_of_remaining_data_to_add, MIN(length_of_all_previous_buffers, 8000)), so that all the remaining data fits in the new buffer and the total size of all buffers at least doubles (until the 8000-character cap takes over). The new StringBuilder keeps a reference to the old one, so the individual instances form a linked list of buffers.
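The sizing rule quoted above, written out as code. This is a sketch of the formula as stated in this answer, not the framework's actual source. ChunkSizing and NextChunkLength are made-up names:

```csharp
using System;

public static class ChunkSizing
{
    // Cap taken from the formula above.
    private const int MaxChunkSize = 8000;

    // Size of the next chunk: large enough for the pending data, otherwise as
    // big as everything allocated so far, but never more than the cap.
    public static int NextChunkLength(int remainingToAdd, int totalPreviousLength)
    {
        return Math.Max(remainingToAdd, Math.Min(totalPreviousLength, MaxChunkSize));
    }
}
```

For example, NextChunkLength(5, 16) is 16 (total capacity doubles), NextChunkLength(5, 20000) is 8000 (the cap applies), and NextChunkLength(9000, 16) is 9000 (the pending data decides).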
I'm working on a genetic algorithm project where I encode my chromosome in a binary string in which each 32 bits represents a value. The problem is that the function I'm optimizing has over 3000 parameters, which means I have over 96000 bits in my bit string, and the manipulations I do on it are simply too slow...
I have proceeded as follows: I have a binary class where I'm creating a boolean array that represents my big binary string. Then I manipulate this binary string with various shifts and moves and such.
My question is, is there a better way to do this? The speed is just killing. I'm sure it would be fine if I only did this on one bit string, but I have to do the manipulations on 25 bit strings for way over 1000 generations.
What I would do is take a step back. Your analysis seems to be wedded to an implementation detail, namely that you have chosen bool[] as how you represent a string of bits.
Clear your mind of bools and arrays and make a complete list of the operations you actually need to perform, how frequently they happen, and how fast they have to be. Ideally consider whether your speed requirement is average speed or worst case speed. (There are many data structures that attain high average speed by having one expensive operation for every thousand cheap operations; if having any expensive operations is unacceptable then you need to know that up front.)
Once you have that list, you can then do research on what data structures work well.
For example, suppose your list of operations is:
construct bit sequences on the order of 32 bits
concatenate on the order of 3000 bit sequences together to form new bit sequences
insert new bit sequences into existing long bit sequences at specific locations, quickly
Given just that list of operations, I'd think that the data structure you want is a catenable deque. Catenable deques support fast insertion on either end, and can be broken up into two deques efficiently. Inserting stuff into the middle of a deque is then easily done: break the deque up, insert the item into the end of one half, and join them back up again.
However, if you then add another operation to the problem, say, "search for a particular bit string anywhere in the 90000-bit sequence, and find the result in sublinear time" then just a catenable deque isn't going to do it. Searching a deque is slow. There are other data structures that support that operation.
If I understood correctly you are encoding the bit array in a bool[]. The first obvious optimisation would be to change this to int[] (or even long[]) and take advantage of bit operations on a whole machine word, where possible.
For example, this would make shifts more efficient by roughly a factor of 4.
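A sketch of what a word-at-a-time shift might look like, assuming the bit string is stored in a uint[] with words[0] holding the least significant 32 bits. The class name and layout are assumptions made for the example. Shifts of 32 bits or more would additionally need whole words moved:

```csharp
using System;

public static class WordBits
{
    // Shifts the whole bit string left by `count` bits (0 <= count < 32).
    // Bits shifted out of the most significant word are discarded.
    public static void ShiftLeft(uint[] words, int count)
    {
        if (count < 0 || count >= 32)
            throw new ArgumentOutOfRangeException(nameof(count));
        if (count == 0 || words.Length == 0)
            return;

        // Walk from the top word down so each word still sees the
        // unshifted value of the word below it.
        for (int i = words.Length - 1; i > 0; i--)
            words[i] = (words[i] << count) | (words[i - 1] >> (32 - count));
        words[0] <<= count;
    }
}
```

Each loop iteration moves 32 bits at once, so shifting a 96000-bit string touches 3000 words instead of 96000 array elements.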
Is the BitArray class no help?
A BitArray would probably be faster than a boolean array but you would still not get built-in support to shift 96000 bits.
GPUs are extremely good at massive bit operations. Maybe Brahma, CUDA.NET, or Accelerator could be of service?
Let me understand; currently, you're using a sequence of 32-bit values for a "chromosome". Are we talking about DNA chromosomes or neuroevolutionary algorithmic chromosomes?
If it's DNA, you deal with 4 values; A,C,G,T. That can be coded in 2 bits, making a byte able to hold 4 values. Your 3000-element chromosome sequence can be stored in a 750-element byte array; that's nothing, really.
Your two most expensive operations are to and from the compressed bitstream. I would recommend a byte-keyed enum:
public enum DnaMarker : byte { A, C, G, T };
Then, you go from 4 of these to a byte with one operation:
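public static byte ToByteCode(this DnaMarker[] markers)
{
    // Packs four 2-bit markers into one byte; markers[0] ends up in the two highest bits.
    byte output = 0;
    for (int i = 0; i < 4; i++)
        output = (byte)((output << 2) | (byte)markers[i]);
    return output;
}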
... and parse them back out with something like this:
public static DnaMarker[] ToMarkers(this byte input)
{
    // Reverses ToByteCode: marker i occupies bits (7 - 2i) and (6 - 2i).
    var result = new DnaMarker[4];
    for (int i = 0; i < 4; i++)
        result[i] = (DnaMarker)((input >> (2 * (3 - i))) & 0x3);
    return result;
}
You might see a slight performance increase using four parameters (output if necessary) versus allocating and using an array in the heap. But, you lose the iteration which makes the code more compact.
Now, because you're packing four markers into each byte, if your sequence length isn't always an exact multiple of four you'll end up "padding" the end of the last byte with zero values (A). Working around this is messy, but if you store a 32-bit integer giving the exact number of markers, you can simply discard any extra markers you find in the byte stream.
From here, possibilities are endless; you can convert the enum array to a string by simply calling ToString() on each one, and likewise you can feed in a string and get an enum array by iterating through using Enum.Parse().
And always remember, unless memory is at a premium (usually it isn't), it's almost always faster to deal with the data in an easily-usable format instead of the most compact format. The one big exception is in network transmission; if you had to send 750 bytes vs 12KB over the Internet, there's an obvious advantage in the smaller size.
I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple:
Add a new value
Iterate over all of the values.
There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int> or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom filter would be great, if only I could tolerate the false positives.
I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions.
EDIT: To answer your questions:
No, the items don't need to be sorted
By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this.
There could be a decent number of these sets in memory at once (~100).
Use the BitArray. It uses only some 6MB of memory; the only real problem is that iteration is Theta(N), i.e. you have to walk the entire range. Locality of reference is good though and you can allocate the entire structure in one operation.
As for wasting space: you waste 6MB in the worst case.
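A minimal sketch of the BitArray approach, covering just the two operations from the question. IdBitSet is a hypothetical wrapper name. The constructor argument should be one more than the largest id:

```csharp
using System.Collections;
using System.Collections.Generic;

public sealed class IdBitSet
{
    private readonly BitArray bits;

    // maxIdExclusive: one past the largest id that can be stored.
    public IdBitSet(int maxIdExclusive)
    {
        bits = new BitArray(maxIdExclusive);
    }

    // Adding an id that is already present has no effect.
    public void Add(int id)
    {
        bits[id] = true;
    }

    // Walks the entire range, yielding the ids that are set in ascending order.
    public IEnumerable<int> Ids()
    {
        for (int i = 0; i < bits.Length; i++)
            if (bits[i])
                yield return i;
    }
}
```

For ids in the range 0-50,000,000 that would be new IdBitSet(50000001), at one bit per possible id regardless of how many ids are actually present.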
EDIT: ok, you've lots of sets and you're serializing. For serializing on disk, I suggest 6MB files :)
For sending over the wire, just iterate and consider sending ranges instead of individual elements. That does require a sorting structure.
You need lots of these sets. Consider if you have 600MB to spare. Otherwise, check out:
Bytewise tries: O(1) insert, O(n) iteration, much lower constant factors than bitwise tries
A custom hash table, perhaps Google sparsehash through C++/CLI
BSTs storing ranges/intervals
Supernode BSTs
It would depend on the distribution of the sizes of your sets. Unless you expect most of the sets to be (close to) the minimum you've specified, I'd probably use a bitset. To cover a range up to 50,000,000, a bitset ends up ~6 megabytes.
Compared to storing the numbers directly, this is marginally larger for the minimum size set you've specified (~6 megabytes instead of ~4), but considerably smaller for the maximum size set (1/32nd the size).
The second possibility would be to use a delta encoding. For example, instead of storing each number directly, store the difference between that number and the previous number that was included. This requires keeping the ids in sorted order. Given a maximum magnitude of 50,000,000 and a minimum size of 1,000,000 items, the average difference between one number and the next is ~50. This means you can theoretically store the difference in <6 bits on average. I'd probably use the 7 least significant bits directly, and if you need to encode a larger gap, set the msb and (for example) store the size of the gap in the lower 7 bits plus the next three bytes. That shouldn't happen very often, so in most cases you're using only one byte per number, for about 4:1 compression compared to storing numbers directly. In the best case this would use ~1 megabyte for a set, and in the worst about 50 megabytes.
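A sketch of that encoding, assuming the ids are non-negative, distinct and already sorted ascending. DeltaCodec is an invented name, and the byte layout is one possible reading of the scheme described above: one byte for gaps below 128, otherwise a marker bit followed by a 31-bit big-endian gap:

```csharp
using System;
using System.Collections.Generic;

public static class DeltaCodec
{
    public static byte[] Encode(IEnumerable<int> sortedIds)
    {
        var output = new List<byte>();
        int previous = 0;
        foreach (int id in sortedIds)
        {
            int gap = id - previous;
            if (gap < 0)
                throw new ArgumentException("ids must be sorted ascending");
            if (gap < 128)
            {
                output.Add((byte)gap);                   // msb clear: 7-bit gap
            }
            else
            {
                output.Add((byte)(0x80 | (gap >> 24)));  // msb set + top 7 bits
                output.Add((byte)(gap >> 16));
                output.Add((byte)(gap >> 8));
                output.Add((byte)gap);
            }
            previous = id;
        }
        return output.ToArray();
    }

    public static List<int> Decode(byte[] data)
    {
        var ids = new List<int>();
        int previous = 0;
        for (int i = 0; i < data.Length; )
        {
            int gap = data[i++];
            if ((gap & 0x80) != 0)
            {
                // Four-byte form: rebuild the 31-bit gap.
                gap = ((gap & 0x7F) << 24) | (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
                i += 3;
            }
            previous += gap;
            ids.Add(previous);
        }
        return ids;
    }
}
```

Encoding the ids 3, 10, 500 and 50,000,000 produces 10 bytes (two short gaps and two long ones), and Decode returns the same four ids.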
If you don't mind a little bit of extra code, you could use an adaptive scheme -- delta encoding for small sets (up to 6,000,000 numbers), and a bitmap for larger sets.
I think the answer depends on what you mean by "passing around" and what you're trying to accomplish. You say you are only adding to the list: how often do you add? How fast will the list grow? What is an acceptable overhead for memory use, versus the time to reallocate memory?
In your worst case, 50,000,000 32-bit numbers = 200 megabytes, even in a plain array with no per-item overhead. Assuming you may end up with this much use in your worst case scenario, is it OK to use this much memory all the time? Is that better than having to reallocate memory frequently? What's the distribution of typical usage patterns? You could always just use an int[] that's pre-allocated to the whole 50 million.
As far as access speed for your operations, nothing is faster than iterating and adding to a pre-allocated chunk of memory.
From OP edit: There could be a decent number of these sets in memory at once (~100).
Hey now. You need to store 100 sets of 1 to 50 million numbers in memory at once? I think the bitset method is the only possible way this could work.
That would be 600 megabytes. Not insignificant, but unless they are (typically) mostly empty, it seems very unlikely that you would find a more efficient storage mechanism.
Now, if you don't use bitsets, but rather use dynamically sized constructs, and they could somehow take up less space to begin with, you're talking about a real ugly memory allocation/deallocation/garbage collection scenario.
Let's assume you really need to do this, though I can only imagine why. So your server's got a ton of memory, just allocate as many of these 6 megabyte bitsets as you need and recycle them. Allocation and garbage collection are no longer a problem. Yeah, you're using a ton of memory, but that seems inevitable.