Why can't I use the normal array functions in C#, like:
string[] k = {"Hello" , "There"};
k.RemoveAt(index); //Not possible
Code completion comes up with suggestions like All<>, Any<>, Cast<> or Average<>, but no function to remove strings from the array. This happens with all kinds of arrays. Is this because my build target is set to .NET 4.5.1?
You cannot "Add" or "Remove" items from an array, nor should you, as arrays are defined to be a fixed size. The functions you mention (All, Any) are there because an array type such as string[] implements IEnumerable<T> (here IEnumerable<string>), so you get access to the LINQ extension methods.
While it does implement IList<T>, the members that would change its size (Add, Insert, Remove, RemoveAt) throw a NotSupportedException. In your case, to "remove" the string, just do:
k[index] = String.Empty; //Or null, whichever you prefer
The length of an array is fixed when it's created and doesn't change; it represents a single block of memory. Arrays do actually implement IList/IList<T>, but only partially: any method that tries to change the array's size is only available after casting and will throw an exception. Arrays are used internally in most collections.
If you need to add and remove arbitrarily and have fast access by index, you should use a List<T>, which uses a resizing array internally.
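A small sketch of the difference described above: the conversion to IList<string> compiles, but RemoveAt throws NotSupportedException at run time, while copying the elements into a List<string> gives you a working RemoveAt. It uses C# 9 top-level statements, which is an assumption about your compiler; with the .NET 4.5.1 target from the question you would wrap the statements in a classic Main method.

```csharp
using System;
using System.Collections.Generic;

string[] k = { "Hello", "There" };

// The array implements IList<string> (explicitly), so this conversion compiles...
IList<string> asList = k;
bool threw = false;
try
{
    asList.RemoveAt(0);
}
catch (NotSupportedException)
{
    // ...but arrays are fixed-size, so size-changing members throw.
    threw = true;
}

// List<T> copies the elements into its own resizable backing array.
var list = new List<string>(k);
list.RemoveAt(0);

Console.WriteLine(threw);      // True
Console.WriteLine(list.Count); // 1
Console.WriteLine(list[0]);    // There
Console.WriteLine(k.Length);   // 2: the original array is unchanged
```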
Although it's perhaps a bizarre thing to want to do, I need to create an Array in .NET with a lower bound > 0. At first this seems to be possible, using:
Array.CreateInstance(typeof(Object), new int[] {2}, new int[] {9});
This produces the desired result (an array of objects with a lower bound set to 9). However, the created array instance can no longer be passed to other methods expecting Object[], giving me an error saying that:
System.Object[*] cannot be cast into a System.Object[]. What is this difference in array types, and how can I overcome it?
Edit: test code:
Object x = Array.CreateInstance(typeof(Object), new int[] {2}, new int[] {9});
Object[] y = (Object[])x;
Which fails with: "Unable to cast object of type 'System.Object[*]' to type 'System.Object[]'."
I would also like to note that this approach DOES work when using multiple dimensions:
Object x = Array.CreateInstance(typeof(Object), new int[] {2,2}, new int[] {9,9});
Object[,] y = (Object[,])x;
Which works fine.
The reason you can't cast from one to the other is that this would be evil.
Let's say you create an array of object[5..9] and you pass it to a function F as an object[].
How would the function know that this is a 5..9 array? F is expecting a general array, but it's getting a constrained one. You could say it's possible for it to know, but this is still unexpected, and people don't want to do all sorts of boundary checks every time they want to use a simple array.
An array is the simplest structure in programming; making it too complicated makes it unusable. You probably need another structure.
What you could do is write a class that is a constrained collection mimicking the behaviour you want. That way, all users of that class will know what to expect.
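using System.Collections;
using System.Collections.Generic;
// Exposes a zero-based T[] through inclusive bounds, e.g. indexes 5..9.
class ConstrainedArray<T> : IEnumerable<T>
{
    private readonly T[] array;
    public ConstrainedArray(int min, int max)
    {
        Min = min;
        Max = max;
        array = new T[max - min + 1]; // inclusive range: 5..9 holds 5 items
    }
    public int Min { get; private set; }
    public int Max { get; private set; }
    // Translate the caller's index into the zero-based backing array.
    public T this[int index]
    {
        get { return array[index - Min]; }
        set { array[index - Min] = value; }
    }
    public IEnumerator<T> GetEnumerator()
    {
        // T[] implements IEnumerable<T>.GetEnumerator explicitly, so cast first.
        return ((IEnumerable<T>)array).GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return array.GetEnumerator();
    }
}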
I'm not sure why that can't be passed as Object[], but wouldn't it be easier to just create a real class that wraps an array and handles your "weird logic" in there?
You'd get the benefits of using a real reference object, where you could add "intelligence" to your class.
Edit: How are you casting your Array? Could you post some more code? Thanks.
Just store your lower bound in a const offset integer, and subtract that value from whatever your source returns as the index.
Also: this is an old VB6 feature. I think there might be an attribute to help support it.
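A minimal sketch of the constant-offset suggestion, assuming the external source uses 9-based indexes. LowerBound, Set and Get are hypothetical names for illustration; the point is that the storage stays a plain zero-based object[], which can still be passed to anything expecting Object[]. Written with C# 9 top-level statements and local functions.

```csharp
using System;

// Hypothetical lower bound of the external source (e.g. VB6-style indexes 9 and 10).
const int LowerBound = 9;

// A plain zero-based object[], so it can still be passed to any method expecting Object[].
object[] items = new object[2];

// Translate source indexes into zero-based array indexes by subtracting the offset.
void Set(int sourceIndex, object value) => items[sourceIndex - LowerBound] = value;
object Get(int sourceIndex) => items[sourceIndex - LowerBound];

Set(9, "first");
Set(10, "second");

Console.WriteLine(Get(9));   // first
Console.WriteLine(items[1]); // second
```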
The .NET CLR differentiates between two internal array object formats: SZ arrays (single-dimensional, zero-based) and MD arrays. MD arrays can be multi-dimensional (or single-dimensional, as with Object[*]) and store their lower bounds in the object.
The reason for this difference is two-fold:
Efficient code generation for single-dimensional arrays requires that there is no lower bound. Having a lower bound is incredibly uncommon. We would not want to sacrifice significant performance in the common case for this rarely used feature.
Most code expects arrays with zero lower bound. We certainly don't want to pollute all of our code with checking the lower bound or adjusting loop bounds.
These concerns are solved by making a separate CLR type for SZ arrays. This is the type that almost all practically occurring arrays are using.
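The type distinction can be observed directly with Type.MakeArrayType: the parameterless overload produces the SZ type object[], while MakeArrayType(1) produces the general rank-1 type object[*], which is exactly the type of the array created in the question. A sketch using C# 9 top-level statements; the variable names are only for illustration.

```csharp
using System;

Type sz = typeof(object[]);                     // SZ array type: single-dimensional, zero-based
Type mdRank1 = typeof(object).MakeArrayType(1); // general array type of rank 1: object[*]
Type mdRank2 = typeof(object).MakeArrayType(2); // general array type of rank 2: object[,]

Console.WriteLine(sz);                           // System.Object[]
Console.WriteLine(mdRank1);                      // System.Object[*]
Console.WriteLine(sz == mdRank1);                // False: same element type and rank, different CLR type
Console.WriteLine(mdRank2 == typeof(object[,])); // True: rank 2+ arrays only have the general form

// The array from the question, with lower bound 9, has the object[*] type.
Array x = Array.CreateInstance(typeof(object), new[] { 2 }, new[] { 9 });
Console.WriteLine(x.GetType() == mdRank1);       // True
```

This also explains why the two-dimensional cast in the question works: there is no separate SZ form for rank 2, so object[,] covers every lower bound.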
I know it's an old question, but to explain it fully:
If a type (in this case a single-dimensional array with a lower bound > 0) can't be created by typed code, then an instance of that type created through reflection can't be consumed by typed code either.
What you have noticed is already in the documentation:
https://learn.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/specifying-fully-qualified-type-names
Note that from a runtime point of view, MyArray[] != MyArray[*], but for multidimensional arrays, the two notations are equivalent. That is, Type.GetType("MyArray [,]") == Type.GetType("MyArray[*,*]") evaluates to true.
In C#/VB/... you can keep that reflected array in an object (or System.Array) variable, pass it around as such, and use only reflection or the untyped System.Array members (GetValue, SetValue) to access its items.
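A short sketch of that untyped access: the array is held as System.Array, its items are read and written with SetValue/GetValue using the real (9-based) indexes, and the loop bounds come from GetLowerBound/GetUpperBound. C# 9 top-level statements are assumed.

```csharp
using System;

// Rank-1 array with lower bound 9, so its valid indexes are 9 and 10.
Array x = Array.CreateInstance(typeof(object), new[] { 2 }, new[] { 9 });

// SetValue/GetValue take the "real" index and apply the stored lower bound.
x.SetValue("first", 9);
x.SetValue("second", 10);

// Loop using the bounds recorded in the array object itself.
for (int i = x.GetLowerBound(0); i <= x.GetUpperBound(0); i++)
{
    Console.WriteLine($"{i}: {x.GetValue(i)}"); // 9: first, then 10: second
}

Console.WriteLine(x is object[]); // False: it is an object[*], not an object[]
```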
Now you may ask "why is there a LowerBound at all?" Well, COM objects aren't .NET; they could be written in old VB6, whose arrays could have a LowerBound of 1 (or anything else: VB6 had such freedom, or curse, depending on whom you ask). To access the first element of such an object you would actually need to use comObject(1) instead of comObject(0). So the reason to check the lower bound is that, when enumerating such an object, you need to know where to start, since the element functions of the COM object expect the first element to be at the LowerBound value and not at zero (0). It was reasonable to support the same logic on such instances. Imagine getting the value of the first element at 0, and then passing that element to a COM method with an index value of 1, or even of 2001: the code would be very confusing.
To put it simply: it's mostly for legacy support only!
Looking over the source of List<T>, it seems that there's no good way to access the private _items array of items.
What I need is basically a dynamic list of structs, which I can then modify in place. From my understanding, because C# 6 doesn't yet support ref return types, you can't have a List<T> return a reference to an element, which requires copying of the whole item, for example:
struct A {
    public int X;
}
void Foo() {
    var list = new List<A> { new A { X = 3 } };
    list[0].X++; // this fails to compile, because the indexer returns a copy
    // a proper way to do this would be
    var copy = list[0];
    copy.X++;
    list[0] = copy;
    var array = new A[] { new A { X = 3 } };
    array[0].X++; // this works just fine
}
Looking at this, it's both clunky from a syntax point of view and possibly much slower than modifying the data in place (unless the JIT can do some magic optimizations for this specific case? But I doubt they could be relied on in the general case, unless it's a special standardized optimization?)
Now if List<T>._items were protected, one could at least subclass List<T> and create a data structure with specific modify operations available. Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
EDIT: I do not want any form of boxing or any form of reference semantics. This code is intended for very high performance, and the reason I'm using an array of structs is to have them tightly packed in memory (and not scattered around the heap, resulting in cache misses).
I want to modify the structs in place because it's part of a performance-critical algorithm that stores some of its data in those structs.
Is there another data structure in .NET that allows this, or do I have to implement my own dynamic array?
Neither.
There isn't, and can't be, a data structure in .NET that avoids the structure copy, because deep integration with the C# language is needed to get around the "indexed getter makes a copy" issue. So you're right to think in terms of directly accessing the array.
But you don't have to build your own dynamic array from scratch. Many List<T>-like operations, such as resizing (Array.Resize<T>) and bulk movement of items (Array.Copy), are provided for you as static methods on type System.Array. The generic ones work on T[] directly, and copying between arrays of the same element type moves the raw elements, so no boxing is involved.
The unfortunate thing is that the high-performance Buffer.BlockCopy, which should work on any blittable type, actually contains a hard-coded check for primitive types and refuses to work on any structure.
So just go with T[] (plus int Count -- array length isn't good enough because trying to keep capacity equal to count is very inefficient) and use System.Array static methods when you would otherwise use methods of List<T>. If you wrap this as a PublicList<T> class, you can get reusability and both the convenience of methods for Add, Insert, Sort as well as direct element access by indexing directly on the array. Just exercise some restraint and never store the handle to the internal array, because it will become out-of-date the next time the list needs to grow its capacity. Immediate direct access is perfectly fine though.
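A minimal sketch of the PublicList<T> idea, under stated assumptions: the field names Items and Count, the initial capacity of 4 and the doubling growth are choices made for this illustration, not an existing API. It is not a drop-in replacement for List<T> (no enumerator, no version checks). The struct A mirrors the one from the question. C# 9 top-level statements come first, followed by the type declarations.

```csharp
using System;

var list = new PublicList<A>();
list.Add(new A { X = 1 });
list.Add(new A { X = 2 });

// Index the backing array directly: the struct is modified in place, no copy.
list.Items[0].X++;

Console.WriteLine(list.Items[0].X); // 2
Console.WriteLine(list.Count);      // 2

struct A
{
    public int X;
}

// Hypothetical wrapper: a T[] plus a Count, grown with System.Array helpers.
class PublicList<T>
{
    // Public on purpose for in-place access. Never cache this reference:
    // Add may replace it with a larger array.
    public T[] Items = new T[4];
    public int Count;

    public void Add(T item)
    {
        if (Count == Items.Length)
        {
            // Double the capacity so repeated Adds stay amortized O(1).
            Array.Resize(ref Items, Items.Length * 2);
        }
        Items[Count++] = item;
    }

    public void RemoveAt(int index)
    {
        if ((uint)index >= (uint)Count) throw new ArgumentOutOfRangeException(nameof(index));
        Count--;
        // Shift the tail left by one; Array.Copy handles overlapping ranges.
        Array.Copy(Items, index + 1, Items, index, Count - index);
        Items[Count] = default!; // clear the now-unused slot
    }
}
```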
Ok, maybe I'm just lazy but this might be a cool question to have on the interwebs.
I know that Buffer.BlockCopy(...) is faster than Array.Copy(...) when working with byte[]. I was about to write a CloneBuffer helper that would create an array the same size as a source array then copy the source array into it using Buffer.BlockCopy(...) when I instead wrote:
public void Send(byte[] data) {
// Copy caller-provided buffer
var buf = data.ToArray();
// Start async send here and return immediately
}
Does anyone know if the ToArray() method is special-cased for byte[], or if this is going to be slower than BlockCopy?
You can look into the Microsoft .NET assemblies using a reflector program, such as ILSpy.
This tells me that the implementation of System.Linq.Enumerable::ToArray() is:
public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source)
{
// ...
return new Buffer<TSource>(source).ToArray();
}
And the constructor of the internal struct Buffer<T> does the following:
- If the source enumerable implements ICollection<T>, then:
  - allocate an array of Count elements, and
  - use CopyTo() to copy the collection into the array.
- Otherwise:
  - allocate an array of 4 elements, and
  - start enumerating the IEnumerable, storing each value in the array.
  - Is the array too small? Create a new array that has twice the size of the old one, copy the old array's contents into the new one, then use the new array instead, and continue.
And Buffer<T>.ToArray() simply returns the inner array if its size matches the number of elements in it; otherwise copies the inner array to a new array with the exact size.
Note that this Buffer<T> class is internal and not related to the Buffer class you mentioned.
All copying is done using Array.Copy().
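To make the two paths concrete, here is a simplified re-creation of the algorithm described above. It is a sketch written for this explanation, not the actual framework source; SketchToArray and Numbers are hypothetical names. It shows why a byte[] source takes the single-copy path (arrays implement ICollection<T>) while an iterator takes the doubling path. C# 9 top-level statements with local functions are assumed.

```csharp
using System;
using System.Collections.Generic;

// Simplified re-creation of the algorithm described above (not the actual framework source).
static T[] SketchToArray<T>(IEnumerable<T> source)
{
    // Known size: allocate exactly Count elements and copy once.
    if (source is ICollection<T> collection)
    {
        var exact = new T[collection.Count];
        collection.CopyTo(exact, 0);
        return exact;
    }

    // Unknown size: start with 4 slots and double whenever the buffer is full.
    var buffer = new T[4];
    int count = 0;
    foreach (T item in source)
    {
        if (count == buffer.Length)
        {
            var bigger = new T[buffer.Length * 2];
            Array.Copy(buffer, bigger, count);
            buffer = bigger;
        }
        buffer[count++] = item;
    }

    // Return the buffer as-is if it is exactly full, otherwise trim with one final copy.
    if (count == buffer.Length) return buffer;
    var result = new T[count];
    Array.Copy(buffer, result, count);
    return result;
}

// An iterator method is not an ICollection<T>, so it takes the doubling path.
static IEnumerable<int> Numbers(int n)
{
    for (int i = 0; i < n; i++) yield return i;
}

byte[] data = { 1, 2, 3 };
byte[] fromArray = SketchToArray(data);          // byte[] implements ICollection<byte>: one CopyTo
int[] fromIterator = SketchToArray(Numbers(10)); // grows 4 -> 8 -> 16, then trims to 10

Console.WriteLine(string.Join(",", fromArray)); // 1,2,3
Console.WriteLine(fromIterator.Length);         // 10
```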
So, to conclude: all copying is done using Array.Copy() and there is no optimization for byte arrays. But I don't know whether it is slower than Buffer.BlockCopy(). The only way to know is to measure.
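A rough way to do that measurement is a Stopwatch loop like the one below. CloneBuffer is the hypothetical helper the question describes; CloneWithArrayCopy and Time are names made up for this sketch. Numbers from a loop like this are noisy (no statistics, GC and CPU frequency effects), so treat them as indicative only. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper from the question: same-sized array filled with Buffer.BlockCopy.
static byte[] CloneBuffer(byte[] source)
{
    var copy = new byte[source.Length];
    Buffer.BlockCopy(source, 0, copy, 0, source.Length); // offsets and count are in bytes
    return copy;
}

static byte[] CloneWithArrayCopy(byte[] source)
{
    var copy = new byte[source.Length];
    Array.Copy(source, copy, source.Length);
    return copy;
}

static TimeSpan Time(Func<byte[], byte[]> clone, byte[] data, int iterations)
{
    clone(data); // warm-up call so JIT compilation is not timed
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++) clone(data);
    return sw.Elapsed;
}

var payload = new byte[64 * 1024];
new Random(42).NextBytes(payload);

Console.WriteLine($"ToArray:    {Time(d => d.ToArray(), payload, 10_000)}");
Console.WriteLine($"Array.Copy: {Time(CloneWithArrayCopy, payload, 10_000)}");
Console.WriteLine($"BlockCopy:  {Time(CloneBuffer, payload, 10_000)}");
```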
Yes, it is going to be slower.
When you look at the documentation for the System.Array methods, there is no definition for System.Array.ToArray(). In fact, looking at the inheritance/interface tree, we have to go all the way back to IEnumerable<T>.ToArray() (the Enumerable.ToArray() extension method) before we find this method. Since it was implemented with only the features of IEnumerable<T> to work with, in the general case it can't know the size of the resulting array when it begins executing. Instead, it uses a doubling algorithm to build up the array as it runs. So you might end up creating and throwing away several arrays over the course of making the copy, and copying the initial items several times as each intermediate buffer is destroyed and recreated.
If you want a simpler, naive implementation, at least look at Array.CopyTo(). And remember: I said, "If".
I have been playing around with the BlockingCollection class, and I was wondering why the ToArray() Method is an O(n) operation. Coming from a Java background, the ArrayList's ToArray() method runs in O(1), because it just returns the internal array it uses (elementData). So why in the world do they iterate through all of the items, and create a new Array in the IEnumerable.ToArray method, when they could just override it and return the internal array the collection uses?
Coming from a Java background, the ArrayList's ToArray() method runs in O(1), because it just returns the internal array it uses (elementData).
No, it really doesn't. It creates a copy of the array. From the docs for ArrayList.toArray:
Returns an array containing all of the elements in this list in proper sequence (from first to last element).
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array). The caller is thus free to modify the returned array.
So basically, the premise of your question is flawed in the Java sense.
Now, beyond that, Enumerable.ToArray (the extension method on IEnumerable<T>) in general would be O(N), as there's no guarantee that the sequence is even backed by an array. When the sequence implements ICollection<T> (as every IList<T> does), it uses ICollection<T>.CopyTo to make things more efficient, but this is an implementation-specific detail and still doesn't make it an O(1) operation.
ArrayList.toArray is not O(1), and it does not just return its internal array. Did you read the API specification?
The returned array will be "safe" in that no references to it are maintained by this list. (In other words, this method must allocate a new array). The caller is thus free to modify the returned array.
First, there's no array to return. BlockingCollection<T> uses an object of type IProducerConsumerCollection<T> for its internal storage, and there's no guarantee that the concrete type being used will be backed by an array. For example the default constructor uses a ConcurrentQueue<T>, which stores its data in a linked list of arrays. Even in the odd case where there is an array which represents the full contents of the collection hiding somewhere in there it won't be exposed through the IProducerConsumerCollection<T> interface.
Second, even assuming there were an array to be returned in the first place (which there isn't), it wouldn't be a safe thing to do. If the calling code made any modifications to the array it would corrupt the internal state of the collection.
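A short sketch of what ToArray actually gives you on a BlockingCollection<T>: a new array holding a snapshot of the items. Writing to that array does not touch the collection, and later additions do not appear in it. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Collections.Concurrent;

// The default constructor stores the items in a ConcurrentQueue<T>.
var collection = new BlockingCollection<int>();
collection.Add(1);
collection.Add(2);

// ToArray copies the current contents into a brand-new array (a snapshot).
int[] snapshot = collection.ToArray();
snapshot[0] = 100; // changes only the copy

collection.Add(3); // later additions do not show up in the snapshot

Console.WriteLine(string.Join(",", snapshot));             // 100,2
Console.WriteLine(string.Join(",", collection.ToArray())); // 1,2,3
```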
I usually find myself doing something like:
string[] things = arrayReturningMethod();
int index = things.ToList<string>().FindIndex(s => s.Equals("FOO"));
//do something with index
return things.Distinct(); //which returns an IEnumerable<string>
And I find all this mix-up of types/interfaces a bit confusing, and it tickles my potential-performance-problem antennae (which I ignore until proven right, of course).
Is this idiomatic and proper C# or is there a better alternative to avoid casting back and forth to access the proper methods to work with the data?
EDIT:
The question is actually twofold:
When is it proper to use either the IEnumerable interface or an array or a list (or any other IEnumerable implementing type) directly (when accepting parameters)?
Should you freely move between IEnumerables (implementation unknown) and lists, IEnumerables and arrays, and arrays and Lists, or is that non-idiomatic (there are better ways to do it), non-performant (not typically relevant, but might be in some cases), or just plain ugly (unmaintainable, unreadable)?
In regards to performance...
Converting from List to T[] involves copying all the data from the original list to a newly allocated array.
Converting from T[] to List also involves copying all the data from the original array to a newly allocated List.
Converting from either List or T[] to IEnumerable is an implicit upcast, which costs essentially nothing: you get the same object, seen through the interface.
Converting (casting) from IEnumerable to List involves a downcast, which costs a runtime type check of a few CPU cycles.
Converting (casting) from IEnumerable to T[] also involves a downcast.
You can't cast an IEnumerable to T[] or List unless it was a T[] or List respectively to begin with. You can use the ToArray or ToList functions, but those will also result in a copy being made.
Accessing all the values in order from start to end in a T[] will, in a straightforward loop, be optimized to use straightforward pointer arithmetic -- which makes it the fastest of them all.
Accessing all the values in order from start to end in a List involves a check on each iteration to make sure that you aren't accessing a value outside the array's bounds, and then the actual accessing of the array value.
Accessing all the values in an IEnumerable involves creating an enumerator, calling its MoveNext() method, which advances to the next element, and then reading the Current property, which gives you the actual value and sticks it in the variable that you specified in your foreach statement. Generally, this isn't as bad as it sounds.
Accessing an arbitrary value in an IEnumerable involves starting at the beginning and calling MoveNext() as many times as you need to get to that value. Generally, this is as bad as it sounds.
In regards to idioms...
In general, IEnumerable is useful for public properties, function parameters, and often for return values -- and only if you know that you're going to be using the values sequentially.
For instance, if you had a function PrintValues written as PrintValues(List<T> values), it would only be able to deal with List values, so the user would first have to convert if, for instance, they were using a T[]. Likewise if the function were PrintValues(T[] values). But if it were PrintValues(IEnumerable<T> values), it would be able to deal with Lists, T[]s, stacks, dictionaries, strings, sets, etc.: any collection that implements IEnumerable<T>, which is practically every generic collection.
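A sketch of that PrintValues example. FormatValues is a hypothetical helper added so the result is easy to check; PrintValues just writes its output. One IEnumerable<T> parameter accepts arrays, lists, stacks, strings and sets without any conversion at the call site. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;

// Accepting IEnumerable<T> lets callers pass any sequence without converting it first.
static string FormatValues<T>(IEnumerable<T> values) => string.Join(" ", values);
static void PrintValues<T>(IEnumerable<T> values) => Console.WriteLine(FormatValues(values));

PrintValues(new[] { 1, 2, 3 });              // T[]: 1 2 3
PrintValues(new List<string> { "a", "b" });  // List<T>: a b
PrintValues(new Stack<int>(new[] { 1, 2 })); // Stack<T> enumerates from the top: 2 1
PrintValues("abc");                          // string is IEnumerable<char>: a b c
PrintValues(new HashSet<int> { 42 });        // HashSet<T>: 42
```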
In regards to internal use...
Use a List only if you're not sure how many items will need to be in it.
Use a T[] if you know how many items will need to be in it, but need to access the values in an arbitrary order.
Stick with the IEnumerable if that's what you've been given and you just need to use it sequentially. Many functions will return IEnumerables. If you do need to access values from an IEnumerable in an arbitrary order, use ToArray().
Also, note that casting is different from using ToArray() or ToList(): the latter involves copying the values, which is indeed a performance and memory hit if you have a lot of elements. The former simply says "A dog is an animal, so like any animal, it can eat" (upcast) or "This animal happens to be a dog, so it can bark" (downcast). Likewise, all Lists and T[]s are IEnumerables, but only some IEnumerables are Lists or T[]s.
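The difference between casting and copying can be checked with ReferenceEquals: both casts below hand back the very same List<int> object, while ToList allocates a new one, and an `as` conversion to the wrong type yields null instead of a copy. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };

IEnumerable<int> asSequence = numbers;  // upcast: same object, no copy
var backToList = (List<int>)asSequence; // downcast: runtime type check, still the same object
List<int> copied = asSequence.ToList(); // ToList allocates a new list and copies the items

Console.WriteLine(ReferenceEquals(numbers, backToList)); // True
Console.WriteLine(ReferenceEquals(numbers, copied));     // False

// The sequence is not really an int[], so `as` returns null rather than converting.
var notAnArray = asSequence as int[];
Console.WriteLine(notAnArray == null); // True
```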
A good rule of thumb is to always use IEnumerable (when declaring your variables/method parameters/method return types/properties/etc.) unless you have a good reason not to. It is by far the most type-compatible with other (especially extension) methods.
Well, you've got two apples and an orange that you are comparing.
The two apples are the array and the List.
An array in C# is a C-style array that has garbage collection built in. The upside of using them is that they have very little overhead, assuming you don't need to move things around. The downside is that they are not as efficient when you are adding things, removing things, and otherwise changing the array around, as memory gets shuffled around.
A List is a C#-style dynamic array (similar to the vector<> class in C++). There is a little more overhead, but it is more convenient when you need to add and remove items, because it grows its contiguous backing array for you (inserting or removing in the middle still shifts the elements that follow).
The best comparison I could give is saying that arrays are to Lists as strings are to StringBuilders.
The orange is 'IEnumerable'. This is not a datatype, but rather it is an interface. When a class implements the IEnumerable interface, it allows that object to be used in a foreach() loop.
When you return the list (as you did in your example), you were not converting the list to an IEnumerable. A list already is an IEnumerable object.
EDIT: When to convert between the two:
It depends on the application. There is very little that can be done with an array that cannot be done with a List, so I would generally recommend the List. Probably the best thing to do is to make a design decision that you are going to use one or the other, that way you don't have to switch between the two. If you rely on an external library, abstract it away to maintain consistent usage.
Hope this clears a little bit of the fog.
Looks to me like the problem is that you haven't bothered learning how to search an array. Hint: Array.IndexOf or Array.BinarySearch depending on whether the array is sorted.
You're right that converting to a list is a bad idea: it wastes space and time and makes the code less readable. Also, blindly upcasting to IEnumerable slows matters down and also completely prevents use of certain algorithms (such as binary search).
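Applied to the code in the question, a sketch of searching the array in place: Array.IndexOf and Array.FindIndex replace the ToList().FindIndex round-trip, and Array.BinarySearch is shown on a sorted copy, since it is only valid on sorted data. The sample values are made up; C# 9 top-level statements are assumed.

```csharp
using System;

string[] things = { "BAR", "FOO", "BAZ" };

// Linear search on the array itself: no List<T> copy needed.
int index = Array.IndexOf(things, "FOO");                        // 1
int byPredicate = Array.FindIndex(things, s => s.Equals("FOO")); // 1, same result via a predicate

// BinarySearch requires sorted data; sort a copy so `things` keeps its order.
string[] sorted = (string[])things.Clone();
Array.Sort(sorted, StringComparer.Ordinal);                                  // BAR, BAZ, FOO
int sortedIndex = Array.BinarySearch(sorted, "FOO", StringComparer.Ordinal); // 2

Console.WriteLine($"{index} {byPredicate} {sortedIndex}"); // 1 1 2
```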
I try to avoid rapidly jumping between data types if it can be avoided.
It must be the case that each situation similar to the one you described is sufficiently different to prevent a dogmatic rule about transforming your types; however, it is generally good practice to select a data structure that provides, as closely as possible, the interface you need without having to copy elements needlessly to new data structures.
When to use what?
I would suggest returning the most specific type, and taking in the most flexible type.
Like this:
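public int[] DoSomething(IEnumerable<int> inputs)
{
    //... (the throw below is only a placeholder so the sketch compiles)
    throw new System.NotImplementedException();
}
public List<int> DoSomethingElse(IList<int> inputs)
{
    //...
    throw new System.NotImplementedException();
}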
That way you can call methods on List<T> for whatever you get back from the method, in addition to treating it as an IEnumerable. For the inputs, be as flexible as possible, so you don't dictate to the users of your method what kind of collection to create.
You're right to ignore the 'performance problem' antennae until you actually have a performance problem. Most performance problems come from doing too much I/O or too much locking or doing one of them wrong, and none of these apply to this question.
My general approach is:
1. Use T[] for 'static' or 'snapshot'-style information. Use it for things where calling .Add() wouldn't make sense anyway, and you don't need the extra methods List<T> gives you.
2. Accept IEnumerable<T> if you don't really care what you're given and don't need a constant-time .Length/.Count.
3. Only return IEnumerable<T> when you're doing simple manipulations of an input IEnumerable<T> or when you specifically want to make use of the yield syntax to do your work lazily.
4. In all other cases, use List<T>. It's just too flexible.
Corollary to #4: don't be afraid of ToList(). ToList() is your friend. It forces the IEnumerable<T> to evaluate right then (useful for when you're stacking several where clauses). Don't go nuts with it, but feel free to call it once you've built up your full where clause before you do the foreach over it (or the like).
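A small sketch of why ToList() helps there: a deferred LINQ query re-runs its where clauses every time it is enumerated, while the List<T> produced by ToList() is evaluated once. The evaluations counter is an instrumentation device added only for this illustration. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;
var source = new List<int> { 1, 2, 3, 4, 5, 6 };

// A deferred query: nothing runs until it is enumerated, and it re-runs on every enumeration.
IEnumerable<int> query = source
    .Where(n => { evaluations++; return n % 2 == 0; })
    .Where(n => n > 2);

int firstCount = query.Count();  // enumerates the query: 6 calls of the first predicate
int secondCount = query.Count(); // enumerates it again: 6 more calls
int lazyEvaluations = evaluations;

evaluations = 0;
List<int> materialized = query.ToList();             // evaluate once, right here
int total = materialized.Sum() + materialized.Sum(); // reading the List<T> re-runs no predicates
int eagerEvaluations = evaluations;

Console.WriteLine($"{lazyEvaluations} {eagerEvaluations}"); // 12 6
```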
Of course, this is just a rough guideline. Just please try to follow the same pattern in the same codebase -- code styles that jump around make it harder for maintenance coders to get into your frame of mind.