Array.Copy vs Skip and Take in C#

I was browsing this question and some similar ones:
Getting a sub-array from an existing array
In many places I read answers like the ones given there, which use Array.Copy to produce the sub-array.
What I am wondering is: why are Skip and Take not constant-time operations for arrays?
In turn, if they were constant-time operations, wouldn't Skip and Take (without calling ToArray() at the end) have the same running time, but without the overhead of doing an Array.Copy, and also be more space efficient?

You have to differentiate between the work that the Skip and Take methods do, and the work of consuming the data that the methods return.
The Skip and Take methods themselves are O(1) operations, as the work they do does not scale with the input size. They just set up an enumerator that is capable of returning items from the array.
It's when you use the enumerator that the work is done. That is an O(n) operation, where n is the number of items that the enumerator produces. Because the enumerators read from the array, they don't contain a copy of the data, so you have to keep the array intact for as long as you are using the enumerator.
(If you use Skip on a collection that is not accessible by index, unlike an array, getting the first item is an O(n) operation, where n is the number of items skipped.)
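For example (a minimal sketch; the array contents are illustrative), the Skip/Take call itself is cheap, and the cost is only paid when the result is enumerated:

using System;
using System.Collections.Generic;
using System.Linq;

int[] data = Enumerable.Range(0, 1_000_000).ToArray();

// O(1): just sets up an enumerator over the existing array; nothing is copied.
IEnumerable<int> slice = data.Skip(100).Take(50);

// O(n) in the items produced: the work happens here, during enumeration,
// and it reads from the original array, which must stay intact.
foreach (int value in slice)
    Console.WriteLine(value);

// By contrast, Array.Copy pays the copying cost up front
// but yields an independent array:
int[] copy = new int[50];
Array.Copy(data, 100, copy, 0, 50);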

Related

How to set the Count of a List in C# without altering its Capacity?

I know exactly how many items I want to keep in a list; they are ordered, and I only need the list to end at a specific index I already know. But I don't want to alter the Capacity or use TrimExcess to make it smaller, because otherwise, after adding an item again, the list will double in size again.
How can I set the Count instead of using Remove, RemoveAt, or RemoveRange?
My priority is optimization of speed for this operation.
Important: I know I can use an array, but I am not allowed to. Also, I'm adding items and removing them all the time; I just want the capacity to stay around a similar amount, which I don't know exactly but which will stabilize.
If you remove elements, the Capacity won't change. So if you don't use TrimExcess(), the Capacity will only ever increase (to the maximum you ever used for this list), and there's no performance penalty for removing elements. You can set the initial capacity in the constructor, which is a good idea if you know how many elements you'll be using (or have an estimate for it), because that removes the overhead of the repeated doubling while initially building up the list.
Note: Insert/Remove in a list is still O(n), because the elements eventually need to be copied around (unless you operate only at the tail end of the list).
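For completeness, a minimal sketch of the usual idiom for this: RemoveRange on the tail of the list changes Count but leaves Capacity untouched (the count of 5 here is just an example of the position you already know):

using System;
using System.Collections.Generic;

var list = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };
int keep = 5; // the count you want the list to end at

// Removing the tail shifts nothing and does not shrink the backing array.
list.RemoveRange(keep, list.Count - keep);

Console.WriteLine(list.Count);    // 5
Console.WriteLine(list.Capacity); // still 8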
Use an array and (in C# 8.0 onwards) use Indices and Ranges with slicing.
https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-8#indices-and-ranges
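For instance (a small sketch; note that applying a range to an array allocates a new array and copies the slice, whereas spans give a view without copying):

using System;

int[] numbers = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

int[] firstFive = numbers[..5]; // copies elements 0..4 into a new array
int last = numbers[^1];         // index from the end: 9

// Span slicing views the same memory without allocating:
Span<int> window = numbers.AsSpan(2, 5); // elements at indices 2..6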

Uses of the BinarySearch method with ArrayList

In C# we have ArrayList to store various types of data and objects.
ArrayList has an IndexOf method, which returns the index of the first occurrence within the ArrayList.
ArrayList also has a BinarySearch method, which searches a sorted ArrayList for an element and returns its index.
My query :
To my understanding, BinarySearch and IndexOf do the same task. In what scenario is the BinarySearch method useful? I understand that BinarySearch needs a sorted ArrayList. So can we say that when the ArrayList is sorted one should use BinarySearch, and when it is not sorted one should use IndexOf? Also, on a sorted ArrayList, which method gives higher performance: BinarySearch or IndexOf?
Definitions
BinarySearch works by progressively halving the size of the search space. It relies on the data first being ordered by the search key to accomplish this. Its performance is logarithmic: O(log n).
IndexOf performs a linear search to find the index of the item: O(n).
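A minimal sketch of both calls (the values are illustrative); note that BinarySearch is only meaningful after Sort, and that it returns the bitwise complement of the insertion point when the value is absent:

using System;
using System.Collections;

var list = new ArrayList { 42, 7, 19, 3, 88 };

int linear = list.IndexOf(19); // O(n), works on unsorted data

list.Sort();                       // required before BinarySearch
int index = list.BinarySearch(19); // O(log n) on the sorted list
if (index < 0)
    Console.WriteLine($"Not found; it would insert at {~index}");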
Consequences
This effectively means that in an ArrayList of 1000 values, where the item being sought is at the end of the list, IndexOf would have to examine 1000 values to find the result. BinarySearch would first check the middle value, then the middle of the remaining values and so on, effectively only examining 10 items in total before returning the correct result.
Of course, in practice it is unlikely that the sought item will always be at the end of the list, so 1000 comparisons is only the worst-case scenario for a linear search. If the item were the first one in the list, IndexOf would out-perform BinarySearch.
As with all algorithms, which to use depends heavily on what you are trying to accomplish and the nature of your data.
If your data is unsorted and you do not want to change the order of the items in your ArrayList, or if comparing items is an expensive operation, BinarySearch could be far more computationally expensive than IndexOf, despite performing fewer comparisons per search, because of the need to make a copy of the ArrayList and sort that copy first.
If the item you need to find generally tends to be one of the first items in your ArrayList (on average) then IndexOf would probably be the best option to use.
Similarly, if you have a very small array (on the order of 10 items), BinarySearch will not yield significantly better results.
Code relying on BinarySearch may also be more difficult to maintain: your code must document the fact that keeping the data ordered is essential to the correct operation of the application, otherwise another developer might later alter the code in a way that re-orders the data, invalidating the binary search and breaking the application.
If your data is already sorted (i.e. it doesn't need to be sorted just for the purposes of searching), then BinarySearch will almost always outperform IndexOf when searching a list of more than a handful of values. But the performance gain might be completely insignificant in an application that is also performing other non-trivial tasks (such as I/O bound activities).
Recommendation
In general, one should favour the simpler operation that has no prerequisites or side effects (i.e. IndexOf) until it becomes apparent (through profiling) that BinarySearch would significantly improve the application's usability or efficiency.
Whenever you choose an algorithm, always document the reason why. It will help other developers understand the code, and sometimes that "other" developer will be you, reviewing code years after you've forgotten why you chose one algorithm over another in the first place.

How large per list item does List<uint> get in .NET 4.0?

I know that it takes 4 bytes to store a uint in memory, but how much space in memory does it take to store List<uint> for, say, x number of uints?
How does this compare to the space required by uint[]?
There is no per-item overhead of a List<T> because it uses a T[] to store its data. However, a List<T> containing N items may have 2N elements in its T[]. Also, the List data structure itself has probably 32 or more bytes of overhead.
You probably won't notice much difference between T[] and List<T>, but you can use
System.GC.GetTotalMemory(true);
before and after an object allocation to obtain an approximate memory usage.
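A rough sketch of that measurement (GC.GetTotalMemory(true) forces a collection first, so the figure is approximate but comparable between runs):

using System;
using System.Collections.Generic;

long before = GC.GetTotalMemory(forceFullCollection: true);

var list = new List<uint>(1_000_000);
for (uint i = 0; i < 1_000_000; i++)
    list.Add(i);

long after = GC.GetTotalMemory(forceFullCollection: true);
GC.KeepAlive(list); // keep the list alive past the measurement

Console.WriteLine($"~{after - before} bytes for 1,000,000 uints");
// Expect roughly 4 MB plus a small constant for the List object itself.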
List<> uses an array internally, so a List<uint> should take O(4 bytes * n) space, just like a uint[]. There may be some constant overhead in comparison to an array, but you should normally not care about this.
Depending on the specific implementation (this may differ when using Mono as a runtime instead of the MS .NET runtime), the internal array will be bigger than the number of actual items in the list. E.g. a list of 5 elements might have an internal array that can store 10, while a list of 10000 elements may have an internal array of size 11000. So you can't generally say that the internal array will always be twice as big, or 5% bigger, than the number of list elements; it may also depend on the size.
Edit: I've just seen, Hans Passant has described the growing behaviour of List<T> here.
So, if you have a collection of items that you want to append to, and you can't know the size of this collection at the time the list is created, use a List<T>. It is specifically designed for this case. It provides fast, O(1), random access to the elements, and has very little memory overhead (the internal array). It is, on the other hand, very slow at removing or inserting in the middle of the list. If you need those operations often, use a LinkedList<T>, which, however, then has more memory overhead (per item!). If you know the size of your collection from the beginning, and you know that it won't change (or will change only a very few times), use arrays.
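To see the growth behaviour described above, you can watch Capacity change as items are added (the exact numbers are an implementation detail; current MS runtimes start at 4 and double):

using System;
using System.Collections.Generic;

var list = new List<uint>();
int lastCapacity = -1;

for (int i = 0; i < 100; i++)
{
    list.Add((uint)i);
    if (list.Capacity != lastCapacity)
    {
        lastCapacity = list.Capacity;
        Console.WriteLine($"Count={list.Count,3}  Capacity={list.Capacity,3}");
    }
}
// Typical output: Capacity steps through 4, 8, 16, 32, 64, 128.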

C# LINQ First() faster than ToArray()[0]?

I am running a test.
It looks like:
method 1)
List<int> ErrorCodes = new List<int> { 1, 2, 4, ..... }; // assume 1000k
var result = ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).First();
method 2)
List<int> ErrorCodes = new List<int> { 1, 2, 4, ..... }; // assume 1000k
var result = ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)).ToArray()[0];
Why is method 2 so slow compared to method 1?
You have a jar containing a thousand coins, many of which are dimes. You want a dime. Here are two methods for solving your problem:
Pull coins out of the jar, one at a time, until you get a dime. Now you've got a dime.
Pull coins out of the jar, one at a time, putting the dimes in another jar. If that jar turns out to be too small, move them, one at a time, to a larger jar. Keep on doing that until you have all of the dimes in the final jar. That jar is probably too big. Manufacture a jar that is exactly big enough to hold that many dimes, and then move the dimes, one at a time, to the new jar. Now start taking dimes out of that jar. Take out the first one. Now you've got a dime.
Is it now clear why method 1 is a whole lot faster than method 2?
Erm... because you are creating an extra array (rather than just using the iterator). The first approach stops after the first match (Where is a non-buffered streaming API). The second loads all the matches into an array (presumably with several re-sizes), then takes the first item.
As a side note; you can create infinite sequences; the first approach would still work, the second would run forever (or explode).
It could also be:
var result = ErrorCodes.First(x => ReturnedErrorCodes.Contains(x));
(that won't make it any faster, but is perhaps easier to read)
Because of deferred execution.
The code ErrorCodes.Where(x => ReturnedErrorCodes.Contains(x)) doesn't return a collection of integers, instead it returns an expression that is capable of returning a stream of integers. It doesn't do any actual work until you start reading integers from it.
The ToArray method will consume the entire stream and put all the integers in an array. This means that every item in the entire list has to be compared to the error codes.
The First method on the other hand will only get the first item from the stream, and then stop reading from the stream. This will make it a lot faster, because it will stop comparing items from the list to the error codes as soon as it finds a match.
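You can observe this directly with an iterator that logs each element it produces (a hedged sketch; the names are illustrative):

using System;
using System.Collections.Generic;
using System.Linq;

static IEnumerable<int> Logged(IEnumerable<int> source)
{
    foreach (int x in source)
    {
        Console.WriteLine($"producing {x}");
        yield return x;
    }
}

var codes = new List<int> { 1, 2, 3, 4, 5 };

int first = Logged(codes).Where(x => x > 2).First();
// Prints "producing 1" through "producing 3" and stops at the first match.

int viaArray = Logged(codes).Where(x => x > 2).ToArray()[0];
// Prints "producing 1" through "producing 5": ToArray drains the whole stream.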
Because ToArray() copies the entire sequence to an array.
Method 2 has to iterate over the whole sequence to build an array, and then returns the first element.
Method 1 just iterates over enough of the sequence to find the first matching element.
ToArray() walks through the whole sequence it has been given and creates an array out of it.
If you don't call ToArray(), First() lets Where() return just the first item that matches and immediately returns.
First() has complexity O(1); ToArray()[0] has complexity O(n) + 1. Conceptually:
var e = array.GetEnumerator();

// First(): advance once and return
e.MoveNext();
return e.Current;

// ToArray(): walk the entire sequence
// (with yield, taking [0] would be as fast as First)
while (e.MoveNext())
{
    yield return e.Current;
}
Because in the second example, you are actually converting the IEnumerable<T> to an array, whereas in the first example, no conversion is taking place.
In method 2 the entire sequence must be converted to an array first. Also, it seems awkward to use array indexing when First() is so much more readable.
This makes sense: ToArray involves a copy, which is always going to be more expensive, since LINQ can't make any guarantees about how you're going to use your array, while First() can just return the single element at the beginning of the sequence.

Array that can be resized fast

I'm looking for a kind of array data-type that can easily have items added, without a performance hit.
System.Array - resizing (Redim Preserve) copies the entire contents from the old array to the new one, so it is as slow as the number of existing elements
System.Collections.ArrayList - good enough?
System.Collections.IList - good enough?
Just to summarize a few data structures:
System.Collections.ArrayList: untyped data structures are obsolete. Use List(of t) instead.
System.Collections.Generic.List(of t): this represents a resizable array. This data structure uses an internal array behind the scenes. Adding items to a List is O(1) as long as the underlying array hasn't been filled; otherwise it's O(n) to resize the internal array and copy the elements over.
List<int> nums = new List<int>(3); // creates a resizable array
// which can hold 3 elements
nums.Add(1);
// adds item in O(1). nums.Capacity = 3, nums.Count = 1
nums.Add(2);
// adds item in O(1). nums.Capacity = 3, nums.Count = 2
nums.Add(3);
// adds item in O(1). nums.Capacity = 3, nums.Count = 3
nums.Add(4);
// adds item in O(n). List doubles the size of the internal array, so
// nums.Capacity = 6, nums.Count = 4
Adding items is only efficient when adding to the back of the list. Inserting in the middle forces the array to shift all items forward, which is an O(n) operation. Deleting items is also O(n), since the array needs to shift items backward.
System.Collections.Generic.LinkedList(of t): if you don't need random or indexed access to items in your list, for example you only plan to add items and iterate from first to last, then a LinkedList is your friend. Inserts and removals are O(1), lookup is O(n).
You should use the generic List<> (System.Collections.Generic.List) for this. Appending to it runs in amortized constant time.
It also shares the following features with Arrays.
Fast random access (you can access any element in the list in O(1))
It's quick to loop over
Slow to insert and remove objects at the start or middle (since it has to copy, I believe, the entire list)
If you need quick insertions and deletions in the beginning or end, use either linked-list or queues
Would the LinkedList<T> structure work for you? It's not (in some cases) as intuitive as a straight array, but it is very quick.
AddLast to append to the end
AddBefore/AddAfter to insert into the list
AddFirst to insert at the beginning
It's not so quick for random access, though, as you have to iterate over the structure to reach your items. However, it has .ToList() and .ToArray() methods to grab a copy of the structure in list/array form, so for read access you could do that in a pinch. The performance increase of the inserts may or may not outweigh the performance decrease from the need for random access. It will depend entirely on your situation.
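For example, a minimal sketch of those calls:

using System;
using System.Collections.Generic;

var list = new LinkedList<string>();

list.AddLast("middle");                            // append to the end
list.AddFirst("start");                            // insert at the beginning
LinkedListNode<string> node = list.Find("middle");
list.AddAfter(node, "end");                        // O(1) once you hold a node

foreach (string s in list)
    Console.WriteLine(s);                          // start, middle, end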
There's also this reference which will help you decide which is the right way to go:
When to use a linked list over an array/array list?
What is "good enough" for you? What exactly do you want to do with that data structure?
No array structure (i.e. one with O(1) indexed access) allows insertion in the middle without an O(n) runtime; insertion at the end is O(n) worst case and O(1) amortized for self-resizing arrays like ArrayList.
Maybe hash tables (amortized O(1) access and insertion anywhere, but O(n) worst case for insertion) or trees (O(log n) for access and insertion anywhere, guaranteed) are better suited.
If speed is your problem, I don't see how the selected answer is any better than using a raw array. Although a List resizes itself, it uses the exact same mechanism you would use to resize an array (and should take just a touch longer), UNLESS you are always adding to the end, in which case the List is a bit smarter, because it allocates a chunk at a time instead of just one element.
If you often add near the beginning/middle of your collection and don't index into the middle/end very often, you probably want a Linked List. That will have the fastest insert time and will have great iteration time, it just sucks at indexing (such as looking at the 3rd element from the end, or the 72nd element).
