Ok, maybe I'm just lazy but this might be a cool question to have on the interwebs.
I know that Buffer.BlockCopy(...) is faster than Array.Copy(...) when working with byte[]. I was about to write a CloneBuffer helper that would create an array the same size as a source array then copy the source array into it using Buffer.BlockCopy(...) when I instead wrote:
public void Send(byte[] data) {
// Copy caller-provided buffer
var buf = data.ToArray();
// Start async send here and return immediately
}
Does anyone know if the ToArray() method is special-cased for byte[], or if this is going to be slower than BlockCopy?
You can look into the Microsoft .NET assemblies using a reflector program, such as ILSpy.
This tells me that the implementation of System.Linq.Enumerable::ToArray() is:
public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
// ...
return new Buffer<TSource>(source).ToArray();
}
And the constructor of the internal struct Buffer<T> does:
If the source enumerable implements ICollection<T>, then:
allocate an array of Count elements, and
use CopyTo() to copy the collection into the array.
Otherwise:
allocate an array of 4 elements, and
start enumerating the IEnumerable, storing each value in the array.
Is the array too small?
Create a new array that has twice the size of the old one,
and copy the old array's content into the new one,
then use the new array instead, and continue.
And Buffer<T>.ToArray() simply returns the inner array if its size matches the number of elements in it; otherwise copies the inner array to a new array with the exact size.
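To make the steps above concrete, here is a minimal sketch of the same logic. It is a reconstruction from the description, not the actual framework source, and the names ToArraySketch and ToArrayLikeLinq are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ToArraySketch
{
    // Hypothetical helper that mirrors the steps described above.
    public static T[] ToArrayLikeLinq<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Known size: allocate once and let the collection copy itself.
        var collection = source as ICollection<T>;
        if (collection != null)
        {
            var exact = new T[collection.Count];
            collection.CopyTo(exact, 0);
            return exact;
        }

        // Unknown size: start with 4 slots and double whenever the array is full.
        var items = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            if (count == items.Length)
            {
                var bigger = new T[items.Length * 2];
                Array.Copy(items, 0, bigger, 0, count);
                items = bigger;
            }
            items[count++] = item;
        }

        // Return the inner array if it is exactly full; otherwise trim to size.
        if (count == items.Length) return items;
        var result = new T[count];
        Array.Copy(items, 0, result, 0, count);
        return result;
    }

    public static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5 };
        // byte[] implements ICollection<byte>, so this takes the single-allocation path.
        byte[] copy = ToArrayLikeLinq(data);
        Console.WriteLine(copy.Length);
    }
}
```

Since a byte[] takes the first branch, the doubling loop never runs for the Send(byte[] data) case in the question.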
Note that this Buffer<T> class is internal and not related to the Buffer class you mentioned.
All copying is done using Array.Copy().
So, to conclude: all copying is done using Array.Copy() and there is no optimization for byte arrays. But I don't know whether it is slower than Buffer.BlockCopy(). The only way to know is to measure.
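If you want to measure it yourself, a rough Stopwatch harness along these lines could be a starting point. This is only a sketch: the buffer size and iteration count are arbitrary assumptions, the helper names are made up, and a proper benchmarking library would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class CopyTiming
{
    // Hypothetical helper: the CloneBuffer idea from the question.
    public static byte[] CloneWithBlockCopy(byte[] source)
    {
        var copy = new byte[source.Length];
        Buffer.BlockCopy(source, 0, copy, 0, source.Length);
        return copy;
    }

    // Hypothetical helper: the same clone using Array.Copy.
    public static byte[] CloneWithArrayCopy(byte[] source)
    {
        var copy = new byte[source.Length];
        Array.Copy(source, copy, source.Length);
        return copy;
    }

    // Times repeated calls to one clone strategy.
    public static TimeSpan Time(Func<byte[], byte[]> clone, byte[] source, int iterations)
    {
        clone(source); // warm-up call so JIT compilation is not part of the timing
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) clone(source);
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        var data = new byte[64 * 1024]; // assumed buffer size, pick one that matches your workload
        new Random(42).NextBytes(data);
        const int iterations = 10000;

        Console.WriteLine("ToArray:   " + Time(d => d.ToArray(), data, iterations));
        Console.WriteLine("Array.Copy: " + Time(CloneWithArrayCopy, data, iterations));
        Console.WriteLine("BlockCopy: " + Time(CloneWithBlockCopy, data, iterations));
    }
}
```

Run it in a release build and try several buffer sizes, since the relative cost may differ between small and large arrays.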
Yes, it is going to be slower.
When you look at the documentation for the System.Array methods, there is no definition for System.Array.ToArray(). In fact, looking at the inheritance/interface tree, we have to go all the way back to Enumerable.ToArray(), the extension method on IEnumerable<T>, before we find this method. Since this was implemented with only the features of IEnumerable to work with, it can't know the size of the resulting array when it begins executing. Instead, it uses a doubling algorithm to build up the array as it runs. So you might end up creating and throwing away several arrays over the course of making the copy, and copying those initial items several times in the course of destroying and recreating each intermediate buffer.
If you want a simpler, naive implementation, at least look at Array.CopyTo(). And remember: I said, "If".
Looking over the source of List<T>, it seems that there's no good way to access the private _items array of items.
What I need is basically a dynamic list of structs, which I can then modify in place. From my understanding, because C# 6 doesn't yet support ref return types, you can't have a List<T> return a reference to an element, which requires copying of the whole item, for example:
struct A {
public int X;
}
void Foo() {
var list = new List<A> { new A { X = 3 } };
list[0].X++; // this fails to compile, because the indexer returns a copy
// a proper way to do this would be
var copy = list[0];
copy.X++;
list[0] = copy;
var array = new A[] { new A { X = 3 } };
array[0].X++; // this works just fine
}
Looking at this, it's both clunky from a syntax point of view and possibly much slower than modifying the data in place. (Unless the JIT can do some magic optimizations for this specific case? But I doubt they could be relied on in the general case, unless it's a special standardized optimization.)
Now if List<T>._items was protected, one could at least subclass List<T> and create a data structure with specific modify operations available. Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
EDIT: I do not want any form of boxing or any form of reference semantics. This code is intended for very high performance, and the reason I'm using an array of structs is to have them tightly packed in memory (and not scattered around the heap, resulting in cache misses).
I want to modify the structs in place because it's part of a performance-critical algorithm that stores some of its data in those structs.
Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
Neither.
There isn't, and can't be, a data structure in .NET that avoids the structure copy, because deep integration with the C# language is needed to get around the "indexed getter makes a copy" issue. So you're right to think in terms of directly accessing the array.
But you don't have to build your own dynamic array from scratch. Many List<T>-like operations such as Resize and bulk movement of items are provided for you as static methods on type System.Array. They come in generic flavors, so no boxing is involved.
The unfortunate thing is that the high-performance Buffer.BlockCopy, which should work on any blittable type, actually contains a hard-coded check for primitive types and refuses to work on any structure.
So just go with T[] (plus int Count -- array length isn't good enough because trying to keep capacity equal to count is very inefficient) and use System.Array static methods when you would otherwise use methods of List<T>. If you wrap this as a PublicList<T> class, you can get reusability and both the convenience of methods for Add, Insert, Sort as well as direct element access by indexing directly on the array. Just exercise some restraint and never store the handle to the internal array, because it will become out-of-date the next time the list needs to grow its capacity. Immediate direct access is perfectly fine though.
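As a rough illustration of the idea, a minimal PublicList<T> might look like the following. It is a sketch rather than a complete collection: only Add and RemoveAt are shown, the initial capacity of 4 and the doubling growth policy are assumptions, and the struct A is the one from the question.

```csharp
using System;

public class PublicList<T>
{
    // Exposed on purpose so callers can modify elements in place.
    // Do not cache this reference: Add may replace the array when it grows.
    public T[] Items = new T[4];
    public int Count;

    public void Add(T item)
    {
        if (Count == Items.Length)
            Array.Resize(ref Items, Items.Length * 2); // grow capacity, keep Count separate
        Items[Count++] = item;
    }

    public void RemoveAt(int index)
    {
        if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
        Count--;
        // Shift the tail down by one; Array.Copy handles overlapping ranges correctly.
        Array.Copy(Items, index + 1, Items, index, Count - index);
        Items[Count] = default(T);
    }
}

public struct A
{
    public int X;
}

public static class PublicListDemo
{
    public static void Main()
    {
        var list = new PublicList<A>();
        list.Add(new A { X = 3 });
        list.Items[0].X++;                  // modifies the stored struct, no copy
        Console.WriteLine(list.Items[0].X); // prints 4
    }
}
```

Indexing Items directly is what avoids the copy; going through an indexer on the class would bring the original problem back.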
While looking at the implementation of List<T>.AddRange, I found something odd that I do not understand.
Sourcecode, see line 727 (AddRange calls InsertRange)
T[] itemsToInsert = new T[count];
c.CopyTo(itemsToInsert, 0);
itemsToInsert.CopyTo(_items, index);
Why does it copy the collection into a temporary array (itemsToInsert) first and then copy that temporary array into the actual _items array?
Is there any reason behind this, or is it just a leftover from copying ArrayList's source, where the same thing happens?
My guess is that this is to hide the existence of the internal backing array. There is no way to obtain a reference to that array which is intentional. The List class does not even promise that there is such an array. (Of course, for performance and for compatibility reasons it will always be implemented with an array.)
Someone could pass in a crafted ICollection<T> that remembers the array that it is passed. Now callers can mess with the internal array of List and start depending on List internals.
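To illustrate what such a crafted collection could look like, here is a sketch. SpyCollection<T> is a made-up name, and which array it ends up capturing depends on the List<T> implementation you run against: with the source quoted in the question it would only ever see the temporary array, while an implementation that copies directly would hand over its backing array.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection that remembers the array passed to CopyTo.
public sealed class SpyCollection<T> : ICollection<T>
{
    private readonly T[] _data;
    public T[] Captured; // whatever array the caller handed to CopyTo

    public SpyCollection(params T[] data) { _data = data; }

    public int Count { get { return _data.Length; } }
    public bool IsReadOnly { get { return true; } }

    public void CopyTo(T[] array, int arrayIndex)
    {
        Captured = array; // keep a reference for later mischief
        Array.Copy(_data, 0, array, arrayIndex, _data.Length);
    }

    public bool Contains(T item) { return Array.IndexOf(_data, item) >= 0; }
    public IEnumerator<T> GetEnumerator() { return ((IEnumerable<T>)_data).GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return _data.GetEnumerator(); }

    public void Add(T item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(T item) { throw new NotSupportedException(); }
}

public static class SpyDemo
{
    public static void Main()
    {
        var spy = new SpyCollection<int>(1, 2, 3);
        var list = new List<int> { 0 };
        list.AddRange(spy);

        // A length of 3 means the spy only saw a temporary array sized to the new items.
        // A larger length suggests it was given the list's own backing array.
        Console.WriteLine(spy.Captured.Length);
    }
}
```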
Contrast this with MemoryStream which has a documented way to access the internal buffer (and shoot yourself with it): GetBuffer().
Why is it that I cannot use the normal array functions in C#, like:
string[] k = {"Hello" , "There"};
k.RemoveAt(index); //Not possible
Code completion comes up with suggestions like All<>, Any<>, Cast<> or Average<>, but no function to remove strings from the array. This happens with all kinds of arrays. Is this because my build target is set to .NET 4.5.1?
You cannot "Add" or "Remove" items from an array, nor should you, as arrays are defined to be a fixed size. The functions you mention (All, Any) are there because an array (T[]) implements IEnumerable<T>, and so you get access to the LINQ extensions.
While an array does implement IList<T>, the methods that would change its size (Add, Insert, RemoveAt and so on) throw a NotSupportedException. In your case, to "remove" the string, just do:
k[index] = String.Empty; //Or null, whichever you prefer
The length of an array is fixed when it's created and doesn't change; it represents a block of memory. Arrays do actually implement IList/IList<T>, but only partially: any method that tries to change the size of the array is only available after casting and will throw an exception. Arrays are used internally in most collections.
If you need to add and remove arbitrarily and have fast access by index, you should use a List<T>, which uses a resizing array internally.
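A small sketch of both behaviours, in case it helps. The class and method names here are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class FixedSizeDemo
{
    // Returns true if removing through the IList<string> view of the array throws.
    public static bool RemoveAtThrows(string[] array, int index)
    {
        try
        {
            ((IList<string>)array).RemoveAt(index); // only reachable after casting
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string[] k = { "Hello", "There" };
        Console.WriteLine(RemoveAtThrows(k, 0)); // prints True

        var list = new List<string>(k); // copies the array into a resizable list
        list.RemoveAt(0);
        Console.WriteLine(list.Count);  // prints 1
        Console.WriteLine(k.Length);    // prints 2, the array itself is unchanged
    }
}
```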
I have been playing around with the BlockingCollection class, and I was wondering why the ToArray() Method is an O(n) operation. Coming from a Java background, the ArrayList's ToArray() method runs in O(1), because it just returns the internal array it uses (elementData). So why in the world do they iterate through all of the items, and create a new Array in the IEnumerable.ToArray method, when they could just override it and return the internal array the collection uses?
Coming from a Java background, the ArrayList's ToArray() method runs in O(1), because it just returns the internal array it uses (elementData).
No, it really doesn't. It creates a copy of the array. From the docs for ArrayList.toArray:
Returns an array containing all of the elements in this list in proper sequence (from first to last element).
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array). The caller is thus free to modify the returned array.
So basically, the premise of your question is flawed in the Java sense.
Now, beyond that, Enumerable.ToArray (the extension method on IEnumerable<T>) in general would be O(N), as there's no guarantee that the sequence is even backed by an array. When the source implements ICollection<T>, it uses ICollection<T>.CopyTo to make things more efficient, but this is an implementation-specific detail and still doesn't transform it into an O(1) operation.
ArrayList.toArray is not O(1), and it does not just return its internal array. Did you read the API specification?
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array). The caller is thus free to modify the returned array.
First, there's no array to return. BlockingCollection<T> uses an object of type IProducerConsumerCollection<T> for its internal storage, and there's no guarantee that the concrete type being used will be backed by an array. For example the default constructor uses a ConcurrentQueue<T>, which stores its data in a linked list of arrays. Even in the odd case where there is an array which represents the full contents of the collection hiding somewhere in there it won't be exposed through the IProducerConsumerCollection<T> interface.
Second, even assuming there were an array to be returned in the first place (which there isn't), it wouldn't be a safe thing to do. If the calling code made any modifications to the array it would corrupt the internal state of the collection.
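A short sketch of why the copy matters: because ToArray() hands back a snapshot, scribbling over it leaves the collection untouched. The helper name TamperAndTake is made up for the example.

```csharp
using System;
using System.Collections.Concurrent;

public static class SnapshotDemo
{
    // Overwrites the snapshot returned by ToArray(), then takes the next real item.
    public static int TamperAndTake(BlockingCollection<int> collection)
    {
        int[] snapshot = collection.ToArray();
        for (int i = 0; i < snapshot.Length; i++) snapshot[i] = -1;
        return collection.Take();
    }

    public static void Main()
    {
        var queue = new BlockingCollection<int>(); // backed by a ConcurrentQueue<int> by default
        queue.Add(1);
        queue.Add(2);
        Console.WriteLine(TamperAndTake(queue)); // prints 1, not -1
    }
}
```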
If my understanding of deep and shallow copying is correct my question is an impossible one.
If you have an array (a[10]) and perform a shallow copy (b[20]) wouldn't this be impossible as the data in b wouldn't be contiguous?
If I've got this completely wrong, could someone advise a fast way to imitate (in C#) C++'s ability to do a realloc in order to resize an array?
NOTE
I'm looking at the .Clone() and .Copy() members of the System.Array object.
You can't resize an existing array, however, you can use:
Array.Resize(ref arr, newSize);
This allocates a new array, copies the data from the old array into the new array, and updates the arr variable (which is passed by-ref in this case). Is that what you mean?
However, any other references still pointing at the old array will not be updated. A better option might be to work with List<T> - then you don't need to resize it manually, and you don't have the issue of out-of-date references. You just Add/Remove etc. Generally, you don't tend to use arrays directly very often. They have their uses, but they aren't the default case.
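A quick sketch of the stale-reference issue, with made-up names, in case it is not obvious what "not updated" means in practice:

```csharp
using System;

public static class ResizeDemo
{
    // Resizes a local copy of the reference and returns the new array.
    // The caller's original array is left as it was.
    public static int[] Grow(int[] original, int newSize)
    {
        int[] arr = original;
        Array.Resize(ref arr, newSize); // arr now points at a newly allocated array
        return arr;
    }

    public static void Main()
    {
        int[] alias = { 1, 2, 3 };
        int[] resized = Grow(alias, 5);
        resized[0] = 42;

        Console.WriteLine(resized.Length);                 // prints 5
        Console.WriteLine(alias.Length);                   // prints 3
        Console.WriteLine(alias[0]);                       // prints 1
        Console.WriteLine(ReferenceEquals(alias, resized)); // prints False
    }
}
```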
Re your comments;
boxing: List<T> doesn't box. That is one of the points about generics; under the hood, List<T> is a wrapper around T[], so a List<int> has an int[] - no boxing. The older ArrayList is a wrapper around object[], so that does box; of course, boxing isn't as bad as you might assume anyway.
workings of Array.Resize: if I recall, it finds the size of T, then uses Buffer.BlockCopy to blit the contents. The actual details are hidden by an internal call, but essentially, after allocating a new array, it is a blit (memcpy) of the data between the two arrays, so it should be pretty quick. Note that for reference types this only copies the reference, not the object on the heap. However, if you are resizing regularly, List<T> would usually be a lot simpler (and quicker, unless you basically re-implement what List<T> does with spare capacity to minimise the number of resizes).