C# List remove from end, really O(n)?

I've read a couple of articles stating that List.RemoveAt() is in O(n) time.
If I do something like:
var myList = new List<int>();
/* Add many ints to the list here. */
// Remove item at end of list:
myList.RemoveAt(myList.Count - 1); // Does this line run in O(n) time?
Removing from the end of the list should be O(1), as it just needs to decrement the list count.
Do I need to write my own class to have this behavior, or does removing the item at the end of a C# list already perform in O(1) time?

In general List<T>.RemoveAt is O(N) because every element after the index has to be shifted down one slot in the backing array. But for the specific case of removing from the end of the list no shifting is needed, so it is consequently O(1).
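If you want to make that intent explicit at call sites, a tiny extension method does the job. This is only a convenience sketch (the RemoveLast name is my own, not something in the BCL):
public static class ListRemoveExtensions
{
    public static void RemoveLast<T>(this List<T> list)
    {
        // RemoveAt on the last index skips the Array.Copy shift, so this is O(1).
        list.RemoveAt(list.Count - 1);
    }
}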

Removing the last item is actually an O(1) operation, since only in that case the List doesn't shift any following items in the array. Here is the code from Reflector:
this._size--;
if (index < this._size) // this statement is false if index equals last index in List
{
    Array.Copy(this._items, index + 1, this._items, index, this._size - index);
}
this._items[this._size] = default(T);

This should give you an idea
public void RemoveAt(int index) {
    if ((uint)index >= (uint)_size) {
        ThrowHelper.ThrowArgumentOutOfRangeException();
    }
    _size--;
    if (index < _size) {
        Array.Copy(_items, index + 1, _items, index, _size - index);
    }
    _items[_size] = default(T);
    _version++;
}

When speaking asymptotically, O(N) is the worst-case time complexity of the method itself, where N is the count. It cannot perform worse than that.
Practically, it is on the order of O(N - I) (ignoring constant-time overhead), where I is the index. This follows because all the items beyond the given index I need to be shifted one position toward the front of the List.
To see this intuitively: if N is 100 and the index is 99 (the last element), no elements need to be shifted; the last element is simply deleted (the count is decreased without moving any data).
Similarly, when N is 100 and the index is 0 (the first element), 99 shifts have to be made.
Run the following code and see for yourself:
int size = 1000000;
var list1 = new List<int>();
var list2 = new List<int>();
for (int i = 0; i < size; i++)
{
    list1.Add(i);
    list2.Add(i);
}
var sw = Stopwatch.StartNew();
for (int i = 0; i < size; i++)
{
    list1.RemoveAt(size - 1);
    list1.Add(0);
}
sw.Stop();
Console.WriteLine("Time elapsed: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i = 0; i < size; i++)
{
    list2.RemoveAt(0);
    list2.Add(0);
}
sw.Stop();
Console.WriteLine("Time elapsed: {0}", sw.ElapsedMilliseconds);

It seems to me that if this was actually relevant to your application, you could have measured it in less time than it took to ask the question. And now you have at least two contradictory answers, so you'll have to test it anyway.
The point I'm trying to make is that unless the MSDN docs say that RemoveAt is O(1) for items at the end of the list, you can't really count on it working that way, and it might change in any given .NET update. For that matter, the behavior could be different for different types, for all you know.
If List is the "natural" data structure to use, then use it. If removing items from the List ends up being a hot spot in your profiling, then maybe it's time to implement your own class.
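For example, if all you ever add and remove is at the end, a Stack<T> expresses that directly and doesn't rely on any particular List<T> implementation detail. A minimal sketch, not a drop-in replacement for every List<T> use:
var stack = new Stack<int>();
stack.Push(1);
stack.Push(2);
int last = stack.Pop(); // removes and returns the most recently added item in O(1)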

Related

How to Order By or Sort an integer List and select the Nth element

I have a list, and I want to select the fifth highest element from it:
List<int> list = new List<int>();
list.Add(2);
list.Add(18);
list.Add(21);
list.Add(10);
list.Add(20);
list.Add(80);
list.Add(23);
list.Add(81);
list.Add(27);
list.Add(85);
But OrderByDescending is not working for this int list...
int fifth = list.OrderByDescending(x => x).Skip(4).First();
Depending on how serious it is for the list to have fewer than 5 elements, you have two options.
If the list should never have fewer than 5 elements, I would treat it as an exceptional case and catch the exception:
int fifth;
try
{
    fifth = list.OrderByDescending(x => x).ElementAt(4);
}
catch (ArgumentOutOfRangeException)
{
    //Handle the exception
}
If you expect that it may have fewer than 5 elements, you could fall back to a default value and check for that:
int fifth = list.OrderByDescending(x => x).ElementAtOrDefault(4);
if (fifth == 0)
{
    //handle default
}
This is still somewhat flawed, because the fifth element could legitimately be 0. This can be solved by casting the list to a list of nullable ints before the LINQ query:
var newList = list.Select(i => (int?)i).ToList();
int? fifth = newList.OrderByDescending(x => x).ElementAtOrDefault(4);
if (fifth == null)
{
    //handle default
}
Without LINQ expressions:
int result;
if (list != null && list.Count >= 5)
{
    list.Sort();
    result = list[list.Count - 5];
}
else // define behavior when list is null OR has less than 5 elements
This performs better than the LINQ expressions, although the LINQ solutions presented in my second answer are convenient and reliable.
In case you need extreme performance for a huge List of integers, I'd recommend a more specialized algorithm, like in Matthew Watson's answer.
Attention: The List gets modified when the Sort() method is called. If you don't want that, you must work with a copy of your list, like this:
List<int> copy = new List<int>(original);
List<int> copy = original.ToList();
The easiest way to do this is to just sort the data and take N items from the front. This is the recommended way for small data sets - anything more complicated is just not worth it otherwise.
However, for large data sets it can be a lot quicker to do what's known as a Partial Sort.
There are two main ways to do this: Use a heap, or use a specialised quicksort.
The article I linked describes how to use a heap. I shall present a partial sort below:
public static IList<T> PartialSort<T>(IList<T> data, int k) where T : IComparable<T>
{
    int start = 0;
    int end = data.Count - 1;
    while (end > start)
    {
        var index = partition(data, start, end);
        var rank = index + 1;
        if (rank >= k)
        {
            end = index - 1;
        }
        else if ((index - start) > (end - index))
        {
            quickSort(data, index + 1, end);
            end = index - 1;
        }
        else
        {
            quickSort(data, start, index - 1);
            start = index + 1;
        }
    }
    return data;
}
static int partition<T>(IList<T> lst, int start, int end) where T : IComparable<T>
{
    T x = lst[start];
    int i = start;
    for (int j = start + 1; j <= end; j++)
    {
        if (lst[j].CompareTo(x) > 0) // Sorts descending, so the largest items end up at the front. Use "< 0" for ascending order.
        {
            i = i + 1;
            swap(lst, i, j);
        }
    }
    swap(lst, start, i);
    return i;
}
static void swap<T>(IList<T> lst, int p, int q)
{
    T temp = lst[p];
    lst[p] = lst[q];
    lst[q] = temp;
}
static void quickSort<T>(IList<T> lst, int start, int end) where T : IComparable<T>
{
    if (start >= end)
        return;
    int index = partition(lst, start, end);
    quickSort(lst, start, index - 1);
    quickSort(lst, index + 1, end);
}
Then to access the 5th largest element in a list you could do this:
PartialSort(list, 5);
Console.WriteLine(list[4]);
For large data sets, a partial sort can be significantly faster than a full sort.
Addendum
See here for another (probably better) solution that uses a QuickSelect algorithm.
This LINQ approach retrieves the 5th biggest element OR throws an exception WHEN the list is null or contains less than 5 elements:
int fifth = list?.Count >= 5 ?
list.OrderByDescending(x => x).Take(5).Last() :
throw new Exception("list is null OR has not enough elements");
This one retrieves the 5th biggest element OR null WHEN the list is null or contains less than 5 elements:
int? fifth = list?.Count >= 5 ?
list.OrderByDescending(x => x).Take(5).Last() :
default(int?);
if(fifth == null) // define behavior
This one retrieves the 5th biggest element OR the smallest element WHEN the list contains less than 5 elements:
if(list == null || list.Count <= 0)
throw new Exception("Unable to retrieve Nth biggest element");
int fifth = list.OrderByDescending(x => x).Take(5).Last();
All these solutions are reliable, they should NEVER throw "unexpected" exceptions.
PS: I'm using .NET 4.7 in this answer.
Here is a C# implementation of the QuickSelect algorithm to select the nth element of an unordered IList<>.
You have to put all the code contained on that page in a static class, like:
public static class QuickHelpers
{
// Put the code here
}
Given that "library" (in truth a big fat block of code), then you can:
int resA = list.QuickSelect(2, (x, y) => Comparer<int>.Default.Compare(y, x));
int resB = list.QuickSelect(list.Count - 1 - 2);
Now... Normally QuickSelect would select the nth lowest element. We reverse it in two ways:
For resA we create a reverse comparer based on the default int comparer, by swapping the parameters of the Compare method. Note that the index is 0-based, so there is a 0th, 1st, 2nd element and so on.
For resB we use the fact that the 0th element in reverse order is the (list.Count - 1)th element in forward order, so we count from the back: the highest element would be at list.Count - 1 in an ordered list, the next one at list.Count - 1 - 1, then list.Count - 1 - 2, and so on.
Theoretically, using QuickSelect should be better than ordering the list and then picking the nth element: ordering a list is on average an O(N log N) operation and picking the nth element is then O(1), so the composite is an O(N log N) operation, while QuickSelect is on average O(N). Clearly there is a but: the O notation doesn't show the constant factor, so an O(k1 * N log N) with a small k1 could be better than an O(k2 * N) with a big k2. Only multiple real-life benchmarks can tell us (you) what is better, and it depends on the size of the collection.
A small note about the algorithm:
As with quicksort, quickselect is generally implemented as an in-place algorithm, and beyond selecting the k'th element, it also partially sorts the data. See selection algorithm for further discussion of the connection with sorting.
So it modifies the ordering of the original list.
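If you don't want your original list disturbed, run the selection on a copy; the call below is just illustrative, since QuickSelect here is the extension method from the linked page:
var scratch = list.ToList(); // shallow copy, so the original ordering is preserved
int fifthLargest = scratch.QuickSelect(4, (x, y) => Comparer<int>.Default.Compare(y, x));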

Remove list elements at given indices

I have a list which contains some items of type string.
List<string> lstOriginal;
I have another list which contains indices which should be removed from the first list.
List<int> lstIndices;
I tried to do the job with the RemoveAt() method,
foreach (int indice in lstIndices)
{
    lstOriginal.RemoveAt(indice);
}
but it crashes and tells me that the "index is Out of Range."
You need to sort the indexes that you would like to remove from largest to smallest in order to avoid removing something at the wrong index.
foreach (int indice in lstIndices.OrderByDescending(v => v))
{
    lstOriginal.RemoveAt(indice);
}
Here is why: let's say you have a list of five items, and you'd like to remove the items at indexes 2 and 4. If you remove the item at 2 first, the item that was at index 4 would be at index 3, and index 4 would no longer be in the list at all (causing your exception). If you go backwards, all indexes are still there up to the moment when you're ready to remove the corresponding item.
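Concretely, here is a tiny demonstration with made-up values, showing both the failure and the fix:
var items = new List<string> { "a", "b", "c", "d", "e" };
var indices = new List<int> { 2, 4 };
// Ascending order: RemoveAt(2) leaves { a, b, d, e }, then RemoveAt(4) throws ArgumentOutOfRangeException.
// Descending order: RemoveAt(4) leaves { a, b, c, d }, then RemoveAt(2) leaves { a, b, d }.
foreach (int i in indices.OrderByDescending(v => v))
{
    items.RemoveAt(i);
}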
How are you populating the list of indices? There's a much more efficient RemoveAll method that you might be able to use. For example, instead of this:
var indices = new List<int>();
int index = 0;
foreach (var item in data)
{
    if (SomeFunction(item))
        indices.Add(index);
    index++;
}
//then some logic to remove the items
you could do this:
data.RemoveAll(item => SomeFunction(item));
This minimizes the copying of items to new positions in the array; each item is copied only once.
You could also use a method group conversion in the above example, instead of a lambda:
data.RemoveAll(SomeFunction);
The reason this is happening is that when you remove an item from the list, the index of each item after it effectively decreases by one. So if you remove items in increasing index order and some items near the end of the original list are to be removed, those indices become invalid, because the list has grown shorter as the earlier items were removed.
The easiest solution is to sort your index list in decreasing order (highest index first) and then iterate across that, as shown above. Alternatively, if the index list is sorted in increasing order, you can compensate for the items already removed by subtracting the loop counter:
for (int i = 0; i < indices.Count; i++)
{
    items.RemoveAt(indices[i] - i);
}
Here is my in-place removal of the given indices as a handy extension method. It copies each surviving item only once, so it is much more performant when a large number of indices is to be removed.
It also throws an ArgumentOutOfRangeException when an index to remove is out of bounds.
public static class ListExtensions
{
    public static void RemoveAllIndices<T>(this List<T> list, IEnumerable<int> indices)
    {
        //do not remove Distinct() call here, it's important
        var indicesOrdered = indices.Distinct().ToArray();
        if (indicesOrdered.Length == 0)
            return;
        Array.Sort(indicesOrdered);
        if (indicesOrdered[0] < 0 || indicesOrdered[indicesOrdered.Length - 1] >= list.Count)
            throw new ArgumentOutOfRangeException();
        int indexToRemove = 0;
        int newIdx = 0;
        for (int originalIdx = 0; originalIdx < list.Count; originalIdx++)
        {
            if (indexToRemove < indicesOrdered.Length && indicesOrdered[indexToRemove] == originalIdx)
            {
                indexToRemove++;
            }
            else
            {
                list[newIdx++] = list[originalIdx];
            }
        }
        list.RemoveRange(newIdx, list.Count - newIdx);
    }
}
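Usage then looks like this, reusing the names from the question:
lstOriginal.RemoveAllIndices(lstIndices);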
var array = lstOriginal.ConvertAll(item => new int?(item)).ToArray();
lstIndices.ForEach(index => array[index] = null);
lstOriginal = array.Where(item => item.HasValue).Select(item => item.Value).ToList();
lstIndices.OrderByDescending(p => p).ToList().ForEach(p => lstOriginal.RemoveAt((int)p));
As a side note, in foreach statements it is better not to modify the IEnumerable that the foreach is iterating over. The out-of-range error is probably a result of this kind of situation.

What's the best way to remove items from an ordered collection?

I have a list of items to remove from an ordered collection in C#.
What's the best way to go about this?
If I remove an item in the middle, the indexes change, but what if I want to remove multiple items?
To avoid index changes, start at the end and go backwards to index 0.
Something along these lines:
for (int i = myList.Count - 1; i >= 0; i--)
{
    if (NeedToDelete(myList[i]))
    {
        myList.RemoveAt(i);
    }
}
What is the type of the collection? If it implements ICollection<T>, you can just run a loop over the list of items to remove and call the .Remove() method on the collection.
For Example:
object[] itemsToDelete = GetObjectsToDeleteFromSomewhere();
ICollection<object> orderedCollection = GetCollectionFromSomewhere();
foreach (object item in itemsToDelete)
{
    orderedCollection.Remove(item);
}
If the collection is a List<T> you can also use the RemoveAll method:
list.RemoveAll(x => otherlist.Contains(x));
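If otherlist is large, note that Contains on a List<T> is a linear scan, so the RemoveAll above costs O(n * m). A small tweak (a sketch, assuming the element type has sensible equality semantics) keeps it roughly linear by putting the items to delete in a HashSet<T> first:
var toRemove = new HashSet<T>(otherlist);
list.RemoveAll(toRemove.Contains);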
Assuming that the list of items to delete is relatively short, you can first sort it. Then traverse the source list while keeping an index into the delete list that advances as its items are matched.
Suppose that the source list is haystack and the list of items to delete is needle:
needle.Sort(); // not needed if it's known that `needle` is sorted
// haystack is known to be sorted
int haystackIdx = 0;
int needleIdx = 0;
while (needleIdx < needle.Count && haystackIdx < haystack.Count)
{
    if (haystack[haystackIdx] < needle[needleIdx])
        haystackIdx++;
    else if (haystack[haystackIdx] > needle[needleIdx])
        needleIdx++;
    else
        haystack.RemoveAt(haystackIdx);
}
This way you traverse both haystack and needle only once, plus the time to sort needle, provided deletion is O(1) (which is often the case for linked lists and similar collections). If the collection is a List<...>, deletion needs O(collection size) time because of the data shifts, so you are better off starting from the end of both collections and moving toward the beginning:
needle.Sort(); // not needed if it's known that `needle` is sorted
// haystack is known to be sorted
int haystackIdx = haystack.Count - 1;
int needleIdx = needle.Count - 1;
while (needleIdx >= 0 && haystackIdx >= 0)
{
    if (haystack[haystackIdx] > needle[needleIdx])
        haystackIdx--;
    else if (haystack[haystackIdx] < needle[needleIdx])
        needleIdx--;
    else
        haystack.RemoveAt(haystackIdx--);
}

Clear all array list data

Why doesn't the code below clear all array list data?
Console.WriteLine("Before cleaning:" + Convert.ToString(ID.Count));
//ID.Count = 20
for (int i = 0; i < ID.Count; i++)
{
    ID.RemoveAt(i);
}
Console.WriteLine("After cleaning:" + Convert.ToString(ID.Count));
//ID.Count = 10
Why is 10 printed to the screen?
Maybe there is another special function, which deletes everything?
You're only actually calling RemoveAt 10 times. When i reaches 10, ID.Count will be 10 as well. You could fix this by doing:
int count = ID.Count;
for (int i = 0; i < count; i++)
{
    ID.RemoveAt(0);
}
This is an O(n²) operation though, as removing an entry from the start of the list involves copying everything else.
More efficiently (O(n)):
int count = ID.Count;
for (int i = 0; i < count; i++)
{
    ID.RemoveAt(ID.Count - 1);
}
or equivalent but simpler:
while (ID.Count > 0)
{
    ID.RemoveAt(ID.Count - 1);
}
But using ID.Clear() is probably more efficient than all of these, even though it is also O(n).
`ArrayList.Clear()`
removes all elements from the list.
`ArrayList.RemoveAt(i)`
removes the element at index i.
ArrayList.Clear Method
Removes all elements from the ArrayList.
For more detail: http://msdn.microsoft.com/en-us/library/system.collections.arraylist.clear.aspx
After removing 10 items, ID.Count == 10 and i == 10, so the loop stops.
Use ID.Clear() to remove all items in the array list.
Use the Clear() method,
or
change ID.RemoveAt(i); to ID.RemoveAt(0);
Whenever an element is removed from the collection, the indexes of the elements after it change. Hence when you call ID.RemoveAt(0), the element that was at index 1 moves to index 0, so you remove at index 0 again (like dequeuing) until you reach the last element. However, if you want to remove all the elements at once, you are better off using the Clear() method.
Your code does:
ID.RemoveAt(0); // 20 items, removes what is now at index 0
ID.RemoveAt(1); // 19 items, removes what is now at index 1
...
ID.RemoveAt(9); // 11 items, removes what is now at index 9, leaving 10
// at this point you have already removed 10 items, so i (10) is no longer
// less than ID.Count (also 10) and the loop stops, leaving 10 elements behind
Generally speaking, your method removes every second element from the list.
Use ArrayList.Clear instead, as others have mentioned.

Copy an array backwards? Array.Copy?

I have a List<T> that I want to be able to copy to an array backwards, meaning start from the end of the list (index List.Count - 1) and copy maybe 5 items, working backwards. I could do this with a simple reverse for loop; however, there is probably a faster/more efficient way of doing this, so I thought I should ask. Can I use Array.Copy somehow?
Originally I was using a Queue as that pops it off in the correct order I need, but I now need to pop off multiple items at once into an array and I thought a list would be faster.
It looks like Array.Reverse has a native code path for reversing an array; when that path doesn't apply it falls back to a simple for loop. In my testing Array.Reverse is very slightly faster than a simple for loop: reversing a 1,000,000 element array 1,000 times takes about 600 ms with Array.Reverse versus about 800 ms with a for loop.
I wouldn't recommend performance as a reason to use Array.Reverse though. It's a very minor difference, which you'll lose the minute you load the result into a List, which will loop through the array again. Regardless, you shouldn't worry about performance until you've profiled your app and identified the bottlenecks.
public static void Test()
{
    var a = Enumerable.Range(0, 1000000).ToArray();
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        Array.Reverse(a);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed Array.Reverse: " + stopwatch.ElapsedMilliseconds);
    stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 1000; i++)
    {
        MyReverse(a);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed MyReverse: " + stopwatch.ElapsedMilliseconds);
}
private static void MyReverse(int[] a)
{
    int j = a.Length - 1;
    for (int i = 0; i < j; i++, j--)
    {
        int z = a[i];
        a[i] = a[j];
        a[j] = z;
    }
}
It is not possible to do this faster than a simple for loop.
You can accomplish it any number of ways, but the fastest way is to get the elements in exactly the manner you already are. You can use Array.Reverse, Array.Copy, etc., or you can use LINQ and extension methods; both are valid alternatives, but they shouldn't be any faster.
In one of your comments:
Currently we are pulling out one result and committing it to a database one at a time
There is a big difference between using a for loop to iterate backwards over a List<T> and committing records to a database one at a time. The former is fine; nobody's endorsing the latter.
Why not just iterate first to populate an array, and then send that fully populated array to the database?
var myArray = new T[numItemsYouWantToSend];
int arrayIndex = 0;
for (int i = myList.Count - 1; arrayIndex < myArray.Length; --i) {
    if (i < 0) break;
    myArray[arrayIndex++] = myList[i];
}
UpdateDatabase(myArray);
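If you'd rather avoid the manual loop, LINQ can express the same "last N items, read backwards" copy, at the cost of some enumerator overhead. A sketch, assuming the same numItemsYouWantToSend count and that the list may hold fewer items than that; unlike the loop above, the resulting array simply has as many elements as were available rather than trailing default slots:
var myArray = myList
    .Skip(Math.Max(0, myList.Count - numItemsYouWantToSend))
    .Reverse()
    .ToArray();
UpdateDatabase(myArray);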
