Take & remove elements from collection - c#

What's the most performant way to remove n elements from a collection and add those removed n elements to an already existing, different, collection?
Currently I've got this:
var entries = collection.Take(5).ToList();
foreach(var entry in entries)
collection.Remove(entry);
otherCollection.AddRange(entries);
However, this doesn't look performant at all to me (multiple linear algorithms instead of only one).
A possible solution may of course change the collection implementation - as long as the following requirements are met:
otherCollection must implement IEnumerable<T>, it is currently of type List<T>
collection must implement ICollection<T>, it is currently of type LinkedList<T>
Hint: entries do not necessarily implement Equals() or GetHashCode().
What's the most performant way to reach my goal?
As it has been obviously too hard to understand my performance considerations, here once more my code example:
var entries = collection.Take(1000).ToList(); // 1000 steps
foreach(var entry in entries) // 1000 * 1 steps (as Remove finds the element always immediately at the beginning)
collection.Remove(entry);
otherCollection.AddRange(entries); // another 1000 steps
= 3000 steps in total => I want to reduce it to a single 1000 steps.

The previous function only returns half results. You should use:
public static IEnumerable<T> TakeAndRemove<T>(Queue<T> queue, int count)
{
for (int i = 0; i < count && queue.Count > 0; i++)
yield return queue.Dequeue();
}

With your use case the best data structure seems to be a queue. When using a queue your method can look like this:
public static IEnumerable<T> TakeAndRemove<T>(Queue<T> queue, int count)
{
count = Math.Min(queue.Count, count);
for (int i = 0; i < count; i++)
yield return queue.Dequeue();
}
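If the collection has to stay a LinkedList<T> (as in the question), the same single-pass idea can be sketched as follows. This is only a sketch under assumptions: the class and method names (LinkedListTransfer, MoveFirst) are made up for illustration, and it relies on LinkedList<T>.RemoveFirst being O(1), so no Equals() or GetHashCode() is needed on the entries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answers.
public static class LinkedListTransfer
{
    // Moves up to `count` elements from the front of `source` to the end of `target`.
    // Each element is read once and removed with RemoveFirst (O(1)),
    // so the transfer is a single pass of at most `count` steps.
    public static void MoveFirst<T>(LinkedList<T> source, List<T> target, int count)
    {
        count = Math.Min(count, source.Count);
        for (int i = 0; i < count; i++)
        {
            target.Add(source.First.Value);
            source.RemoveFirst();
        }
    }
}
```

Calling LinkedListTransfer.MoveFirst(collection, otherCollection, 1000) would then replace the Take/Remove/AddRange sequence from the question.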

Related

What is the difference between for and foreach?

What is the major difference between for and foreach loops?
In which scenarios can we use for and not foreach and vice versa.
Would it be possible to show with a simple program?
Both seem the same to me. I can't differentiate them.
A for loop is a construct that says "perform this operation n times".
A foreach loop is a construct that says "perform this operation against each value/object in this IEnumerable".
You can use foreach if the object you want to iterate over implements the IEnumerable interface. You need to use for if you can access the object only by index.
I'll try to answer this in a more general approach:
foreach is used to iterate over each element of a given set or list (anything implementing IEnumerable) in a predefined manner. You can't influence the exact order (other than skipping entries or canceling the whole loop), as that's determined by the container.
foreach (String line in document) { // iterate through all elements of "document" as String objects
Console.Write(line); // print the line
}
for is just another way to write a loop that has code executed before entering the loop and once after every iteration. It's usually used to loop through code a given number of times. Contrary to foreach here you're able to influence the current position.
int i, j;
for (i = 0, j = 0; i < 100 && j < 10; ++i) { // set i and j to 0, then loop as long as i is less than 100 and j is less than 10, increasing i after each iteration
if (i % 8 == 0) { // skip all numbers that can be divided by 8 and count them in j
++j;
continue;
}
Console.Write(i);
}
Console.Write(j); // j was declared before the loop, so it is still in scope here
If possible and applicable, always use foreach rather than for (assuming there's some array index). Depending on internal data organisation, foreach can be a lot faster than using for with an index (esp. when using linked lists).
Everybody gave you the right answer with regard to foreach, i.e. it's a way to loop through the elements of something implementing IEnumerable.
On the other side, for is much more flexible than what is shown in the other answers. In fact, for is used to execute a block of statements for as long as a specified condition is true.
From Microsoft documentation:
for (initialization; test; increment)
statement
initialization
Required. An expression. This expression is executed only once, before the loop is executed.
test
Required. A Boolean expression. If test is true, statement is executed. If test is false, the loop is terminated.
increment
Required. An expression. The increment expression is executed at the end of every pass through the loop.
statement
Optional. Statement to be executed if test is true. Can be a compound statement.
This means that you can use it in many different ways. Classic school examples are the sum of the numbers from 1 to 10:
int sum = 0;
for (int i = 0; i <= 10; i++)
sum = sum + i;
But you can use it to sum the numbers in an Array, too:
int[] anArr = new int[] { 1, 1, 2, 3, 5, 8, 13, 21 };
int sum = 0;
for (int i = 0; i < anArr.Length; i++)
sum = sum + anArr[i];
(this could have been done with a foreach, too):
int[] anArr = new int[] { 1, 1, 2, 3, 5, 8, 13, 21 };
int sum = 0;
foreach (int anInt in anArr)
sum = sum + anInt;
But you can use it for the sum of the even numbers from 1 to 10:
int sum = 0;
for (int i = 0; i <= 10; i = i + 2)
sum = sum + i;
And you can even invent some crazy thing like this one:
int i = 65;
string s; // declared outside the loop so it can be printed afterwards
for (s = string.Empty; s != "ABC"; s = s + Convert.ToChar(i++).ToString()) ;
Console.WriteLine(s);
for loop:
1) need to specify the loop bounds( minimum or maximum).
2) executes a statement or a block of statements repeatedly
until a specified expression evaluates to false.
Ex1:-
int k = 0;
for (int x = 1; x <= 9; x++){
k = k + x ;
}
foreach statement:
1)do not need to specify the loop bounds minimum or maximum.
2)repeats a group of embedded statements for
a)each element in an array
or b) an object collection.
Ex2:-
int k = 0;
int[] tempArr = new int[] { 0, 2, 3, 8, 17 };
foreach (int i in tempArr){
k = k + i ;
}
foreach is almost equivalent to:
var enumerator = list.GetEnumerator();
while (enumerator.MoveNext()) {
var element = enumerator.Current;
// loop body goes here
}
To support the "foreach" pattern, a class must provide a GetEnumerator() method that returns an object with a MoveNext() method and a Current property. A Reset() method is part of the IEnumerator interface, but foreach never calls it.
Indeed, you need to implement neither IEnumerable nor IEnumerator.
Some derived points:
foreach does not need to know the collection length, so it can iterate through a "stream" or any kind of "element producer".
foreach calls virtual methods on the iterator (most of the time), so it can perform less well than for.
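As a small illustration of the pattern-based behaviour described above, here is a sketch of a type that foreach accepts even though it implements neither IEnumerable nor IEnumerator. The type names (Countdown, CountdownEnumerator, CountdownDemo) are hypothetical and exist only for this example.

```csharp
using System.Collections.Generic;

// Hypothetical type: implements neither IEnumerable nor IEnumerator.
public class Countdown
{
    private readonly int _start;

    public Countdown(int start) { _start = start; }

    // foreach only looks for a public GetEnumerator() method.
    public CountdownEnumerator GetEnumerator()
    {
        return new CountdownEnumerator(_start);
    }
}

// The enumerator only needs MoveNext() and Current; there is no Reset().
public class CountdownEnumerator
{
    private int _current;

    public CountdownEnumerator(int start) { _current = start + 1; }

    public int Current { get { return _current; } }

    public bool MoveNext()
    {
        if (_current <= 1) return false; // nothing left above zero
        _current--;
        return true;
    }
}

public static class CountdownDemo
{
    // Collects start, start - 1, ..., 1 by running foreach over a Countdown.
    public static List<int> Collect(int start)
    {
        var values = new List<int>();
        foreach (int value in new Countdown(start))
        {
            values.Add(value);
        }
        return values;
    }
}
```

Because the enumerator here is a concrete class rather than an interface, the compiler can bind MoveNext() and Current directly, which is one reason the "virtual call" cost mentioned above does not always apply.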
It depends on what you are doing, and what you need.
If you are iterating through a collection of items, and do not care about the index values then foreach is more convenient, easier to write and safer: you can't get the number of items wrong.
If you need to process every second item in a collection for example, or process them in the reverse order, then a for loop is the only practical way.
The biggest differences are that a foreach loop processes an instance of each element in a collection in turn, while a for loop can work with any data and is not restricted to collection elements alone. This means that a for loop can modify a collection - which is illegal and will cause an error in a foreach loop.
For more detail, see MSDN : foreach and for
Difference Between For and For Each Loop in C#
A for loop executes a block of code until an expression returns false, while a foreach loop executes a block of code for each item in an object collection.
For loop can execute with object collections or without any object collections while ForEach loop can execute with object collections only.
The for loop is a normal loop construct which can be used for multiple purposes, whereas foreach is designed to work only on collections or IEnumerable objects.
foreach is useful if you have an array or other IEnumerable collection of data, but for can be used to access elements of an array that can be accessed by their index.
A for loop is useful when you have an indication or determination, in advance, of how many times you want a loop to run. As an example, if you need to perform a process for each day of the week, you know you want 7 loops.
A foreach loop is when you want to repeat a process for all pieces of a collection or array, but it is not important specifically how many times the loop runs. As an example, you are formatting a list of favorite books for users. Every user may have a different number of books, or none, and we don't really care how many it is, we just want the loop to act on all of them.
The for loop executes a statement or a block of statements repeatedly until a specified expression evaluates to false.
There is a need to specify the loop bounds (minimum or maximum). Following is a code example of a simple for loop that starts 0 till <= 5.
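The code example that sentence refers to did not survive in this copy. A loop matching the description (start at 0, continue while the counter is <= 5) might look like the sketch below; the class and method names are made up for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical example, reconstructed from the description only.
public static class ForLoopBounds
{
    // Visits 0, 1, 2, 3, 4, 5 and returns the visited values in order.
    public static List<int> CountZeroToFive()
    {
        var visited = new List<int>();
        for (int i = 0; i <= 5; i++) // lower bound 0, upper bound 5 (inclusive)
        {
            visited.Add(i);
        }
        return visited;
    }
}
```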
Now we look at foreach in detail. What looks like a simple loop on the outside is actually driven by a more complex data structure called an enumerator:
An enumerator is a data structure with a Current property, a MoveNext method, and a Reset method. The Current property holds the value of the current element, and every call to MoveNext advances the enumerator to the next item in the sequence.
Enumerators are great because they can handle any iterative data structure. In fact, they are so powerful that all of LINQ is built on top of enumerators.
But the disadvantage of enumerators is that they require calls to Current and MoveNext for every element in the sequence. All those method calls add up, especially in mission-critical code.
Conversely, the for-loop only has to call get_Item for every element in the list. That’s one method call less than the foreach-loop, and the difference really shows.
So when should you use a foreach-loop, and when should you use a for-loop?
Here’s what you need to do:
When you’re using LINQ, use foreach
When you’re working with very large computed sequences of values, use foreach
When performance isn’t an issue, use foreach
But if you want top performance, use a for-loop instead
The major difference between the for and foreach loops in C# is best understood from how they work:
The for loop:
The for loop's variable is usually an integer counter, although any type is allowed.
The for loop executes the statement or block of statements repeatedly until the specified expression evaluates to false.
In a for loop we have to specify the loop's boundary (maximum or minimum). We can say this is the limitation of the for loop.
The foreach loop:
In a foreach loop, the loop variable has the same type as the elements of the array.
The foreach statement repeats a group of embedded statements for each element in an array or an object collection.
In a foreach loop, you do not need to specify the loop bounds (minimum or maximum). We can say this is the advantage of the foreach loop.
I prefer the FOR loop in terms of performance. FOREACH is a little slow when you go with more number of items.
If you perform more business logic with the instance then FOREACH performs faster.
Demonstration:
I created a list of 10000000 instances and looping with FOR and FOREACH.
Time took to loop:
FOREACH -> 53.852ms
FOR -> 28.9232ms
Below is the sample code.
class Program
{
static void Main(string[] args)
{
List<TestClass> lst = new List<TestClass>();
for (int i = 1; i <= 10000000; i++)
{
TestClass obj = new TestClass() {
ID = i,
Name = "Name" + i.ToString()
};
lst.Add(obj);
}
DateTime start = DateTime.Now;
foreach (var obj in lst)
{
//obj.ID = obj.ID + 1;
//obj.Name = obj.Name + "1";
}
DateTime end = DateTime.Now;
var first = end.Subtract(start).TotalMilliseconds;
start = DateTime.Now;
for (int j = 0; j<lst.Count;j++)
{
//lst[j].ID = lst[j].ID + 1;
//lst[j].Name = lst[j].Name + "1";
}
end = DateTime.Now;
var second = end.Subtract(start).TotalMilliseconds;
Console.WriteLine("FOREACH -> " + first + "ms");
Console.WriteLine("FOR -> " + second + "ms");
}
}
public class TestClass
{
public long ID { get; set; }
public string Name { get; set; }
}
If I uncomment the code inside the loop:
Then, time took to loop:
FOREACH -> 2564.1405ms
FOR -> 2753.0017ms
Conclusion
If you do more business logic with the instance, then FOREACH is recommended.
If you are not doing much logic with the instance, then FOR is recommended.
Many answers are already there, I just need to identify one difference which is not there.
for loop is fail-safe while foreach loop is fail-fast.
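Fail-fast iteration throws an exception if the collection is modified while it is being iterated: ConcurrentModificationException in Java (used in the example below), InvalidOperationException for a C# foreach over a List<T>.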
However, fail-safe iteration keeps the operation safe from failing even if the iteration goes in infinite loop.
public class ConcurrentModification {
public static void main(String[] args) {
List<String> str = new ArrayList<>();
for(int i=0; i<1000; i++){
str.add(String.valueOf(i));
}
/**
* this for loop is fail-safe. It goes into infinite loop but does not fail.
*/
for(int i=0; i<str.size(); i++){
System.out.println(str.get(i));
str.add(i+ " " + "10");
}
/**
* throws ConcurrentModificationexception
for(String st: str){
System.out.println(st);
str.add("10");
}
*/
/* throws ConcurrentModificationException
Iterator<String> itr = str.iterator();
while(itr.hasNext()) {
System.out.println(itr.next());
str.add("10");
}*/
}
}
Hope this helps to understand the difference between for and foreach loop through different angle.
I found a good blog to go through the differences between fail-safe and fail-fast, if anyone interested:
You can use foreach for a simple array like
int[] test = { 0, 1, 2, 3, ...};
And you can use for when you have a jagged (array-of-arrays) structure
int[][] test = { new[] {1,2,3,4},
new[] {5,2,6,5,8} };
foreach syntax is quick and easy. for syntax is a little more complex, but is also more flexible.
foreach is useful when iterating all of the items in a collection. for is useful when iterating over all or a subset of items.
The foreach iteration variable, which provides each collection item, is read-only, so we can't assign a new value to it as the items are iterated (members of a reference-type item can still be changed). Using the for syntax, we can replace the items as needed.
Bottom line- use foreach to quickly iterate all of the items in a collection. Use for to iterate a subset of the items of the collection or to modify the items as they are iterated.
simple difference between for and foreach
A for loop works with values. It has an initialization, a condition and an increment, and you usually need to know how many times the loop will repeat.
A foreach loop works with objects and enumerators. There is no need to know how many times the loop will repeat.
The foreach statement repeats a group of embedded statements for each element in an array or an object collection that implements the System.Collections.IEnumerable or System.Collections.Generic.IEnumerable interface. The foreach statement is used to iterate through the collection to get the information that you want, but can not be used to add or remove items from the source collection to avoid unpredictable side effects. If you need to add or remove items from the source collection, use a for loop.
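The restriction described above can be seen in a short sketch. The names (ModifyWhileIterating, ForeachRemovalThrows, RemoveEvens) are hypothetical; the behaviour shown is that of List<T>, whose enumerator throws InvalidOperationException once the list has been modified during a foreach.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, not from the original answers.
public static class ModifyWhileIterating
{
    // Tries to remove even numbers inside foreach.
    // Returns true if the enumerator rejected the modification.
    public static bool ForeachRemovalThrows(List<int> numbers)
    {
        try
        {
            foreach (int n in numbers)
            {
                if (n % 2 == 0) numbers.Remove(n); // modifies the list mid-iteration
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Removes even numbers with an index-based for loop.
    // Iterating backwards keeps the remaining indexes valid after each RemoveAt.
    public static void RemoveEvens(List<int> numbers)
    {
        for (int i = numbers.Count - 1; i >= 0; i--)
        {
            if (numbers[i] % 2 == 0) numbers.RemoveAt(i);
        }
    }
}
```

For a List<T>, RemoveAll with a predicate is usually the simpler choice; the for loop is shown only because it is the alternative the text names.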
One important thing related to foreach is that the foreach iteration variable cannot be updated (assigned a new value) in the loop body.
for example :
List<string> myStrlist = new List<string>() { "Sachin", "Ganguly", "Dravid" };
foreach(string item in myStrlist)
{
item += " cricket"; // ***Not Possible***
}
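If the intent of the example above is to change the stored strings, an index-based for loop can do it, since it writes through the list's indexer instead of the iteration variable. A minimal sketch, with a hypothetical helper name (AppendSuffix.AppendToAll):

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration.
public static class AppendSuffix
{
    // Replaces every element with itself plus `suffix`.
    public static void AppendToAll(List<string> items, string suffix)
    {
        for (int i = 0; i < items.Count; i++)
        {
            items[i] += suffix; // reads items[i], then stores the new string back
        }
    }
}
```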

Most efficient sorting algorithm for sorted sub-sequences

I have several sorted sequences of numbers of type long (ascending order) and want to generate one master sequence that contains all elements in the same order. I look for the most efficient sorting algorithm to solve this problem. I target C#, .Net 4.0 and thus also welcome ideas targeting parallelism.
Here is an example:
s1 = 1,2,3,5,7,13
s2 = 2,3,6
s3 = 4,5,6,7,8
resulting Sequence = 1,2,2,3,3,4,5,5,6,6,7,7,8,13
Edit: When there are two (or more) identical values then the order of those two (or more) does not matter.
Just merge the sequences. You do not have to sort them again.
There is no .NET Framework method that I know of to do a K-way merge. Typically, it's done with a priority queue (often a heap). It's not difficult to do, and it's quite efficient. Given K sorted lists, together holding N items, the complexity is O(N log K).
I show a simple binary heap class in my article A Generic Binary Heap Class. In Sorting a Large Text File, I walk through the creation of multiple sorted sub-files and using the heap to do the K-way merge. Given an hour (perhaps less) of study, and you can probably adapt that to use in your program.
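For readers who cannot consult those articles, a heap-based K-way merge can be sketched as below. This is not the code from the articles: it assumes .NET 6 or later, where System.Collections.Generic.PriorityQueue<TElement, TPriority> is available (on .NET 4.0, as in the question, you would need your own heap), and the class name KWayMerge is made up.

```csharp
using System.Collections.Generic;

// Sketch only; assumes .NET 6+ for PriorityQueue<TElement, TPriority>.
public static class KWayMerge
{
    // Lazily merges K ascending sequences into one ascending sequence.
    // The heap holds at most one enumerator per input, keyed by its current value,
    // so each of the N output items costs O(log K).
    public static IEnumerable<long> Merge(IEnumerable<IEnumerable<long>> sortedSequences)
    {
        var heap = new PriorityQueue<IEnumerator<long>, long>();

        foreach (var sequence in sortedSequences)
        {
            var enumerator = sequence.GetEnumerator();
            if (enumerator.MoveNext())
                heap.Enqueue(enumerator, enumerator.Current);
            else
                enumerator.Dispose(); // empty input, nothing to merge
        }

        while (heap.TryDequeue(out var smallest, out var value))
        {
            yield return value;
            if (smallest.MoveNext())
                heap.Enqueue(smallest, smallest.Current); // re-insert with its next value
            else
                smallest.Dispose();
        }
    }
}
```

Equal values may come out in any relative order, which the question explicitly allows.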
You just have to merge your sequences like in a merge sort.
And this is parallelizable:
merge sequences (1 and 2 in 1/2), (3 and 4 in 3/4), …
merge sequences (1/2 and 3/4 in 1/2/3/4), (5/6 and 7/8 in 5/6/7/8), …
…
Here is the merge function :
int j = 0;
int k = 0;
for(int i = 0; i < size_merged_seq; i++)
{
if (j < size_seq1 && (k >= size_seq2 || seq1[j] <= seq2[k])) // also take from seq1 once seq2 is exhausted
{
merged_seq[i] = seq1[j];
j++;
}
else
{
merged_seq[i] = seq2[k];
k++;
}
}
Easy way is to merge them with each other one by one. However, this will require O(n*k^2) time, where k is number of sequences and n is the average number of items in sequences. However, using divide and conquer approach you can lower this time to O(n*k*log k). The algorithm is as follows:
Divide the k sequences into k/2 groups of 2 sequences each (plus 1 group of 1 sequence if k is odd).
Merge the sequences in each group. Thus you will get about k/2 new sequences.
Repeat until you get single sequence.
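The three steps above can be sketched as follows. The names (PairwiseMerge, MergeTwo, MergeAll) are hypothetical, and the sketch works on List<long> to match the question's element type.

```csharp
using System.Collections.Generic;

// Sketch of the divide-and-conquer merge described above.
public static class PairwiseMerge
{
    // Standard two-way merge of two ascending lists.
    public static List<long> MergeTwo(List<long> a, List<long> b)
    {
        var merged = new List<long>(a.Count + b.Count);
        int i = 0, j = 0;
        while (i < a.Count && j < b.Count)
            merged.Add(a[i] <= b[j] ? a[i++] : b[j++]);
        while (i < a.Count) merged.Add(a[i++]);
        while (j < b.Count) merged.Add(b[j++]);
        return merged;
    }

    // Merges the sequences in rounds: each round pairs up neighbours,
    // halving the number of sequences until one remains.
    // Note: with a single input sequence, that same list instance is returned.
    public static List<long> MergeAll(List<List<long>> sequences)
    {
        if (sequences.Count == 0) return new List<long>();

        var current = sequences;
        while (current.Count > 1)
        {
            var next = new List<List<long>>((current.Count + 1) / 2);
            for (int i = 0; i + 1 < current.Count; i += 2)
                next.Add(MergeTwo(current[i], current[i + 1]));
            if (current.Count % 2 == 1)
                next.Add(current[current.Count - 1]); // odd one out moves on unmerged
            current = next;
        }
        return current[0];
    }
}
```

The merges within one round do not depend on each other, so they are natural candidates for running in parallel if that turns out to be worthwhile.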
UPDATE:
It turns out that, despite all the algorithms, the simple way is still faster:
private static List<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedBunches)
{
var list = sortedBunches.SelectMany(bunch => bunch).ToList();
list.Sort();
return list;
}
And for legacy purposes...
Here is the final version by prioritizing:
private static IEnumerable<T> MergeSorted<T>(IEnumerable<IEnumerable<T>> sortedInts) where T : IComparable<T>
{
var enumerators = new List<IEnumerator<T>>(sortedInts.Select(ints => ints.GetEnumerator()).Where(e => e.MoveNext()));
enumerators.Sort((e1, e2) => e1.Current.CompareTo(e2.Current));
while (enumerators.Count > 1)
{
yield return enumerators[0].Current;
if (enumerators[0].MoveNext())
{
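// Move the advanced enumerator back to its sorted position among the others.
for (int i = 0; i + 1 < enumerators.Count && enumerators[i].Current.CompareTo(enumerators[i + 1].Current) > 0; i++)
{
var tmp = enumerators[i];
enumerators[i] = enumerators[i + 1];
enumerators[i + 1] = tmp;
}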
}
else
{
enumerators.RemoveAt(0);
}
}
if (enumerators.Count == 0)
yield break; // every input sequence was empty
do
{
yield return enumerators[0].Current;
} while (enumerators[0].MoveNext());
}

List.Sort and Bubble Sort, which is faster? (Closed)

For example, I have a List
List<int> list = new List<int>();
list.Add(1);
list.Add(5);
list.Add(7);
list.Add(3);
list.Add(17);
list.Add(10);
list.Add(13);
list.Add(9);
I use List.Sort method like this
private static int Compare(int x, int y)
{
if (x == y)
return 0;
else if (x > y)
return -1;
else
return 1;
}
list.Sort(Compare);
I use bubble sort like this
private static void Sort(List<int> list)
{
int size = list.Count; // Count is the number of items; Capacity can be larger
for (int i = 1; i < size; i++)
{
for (int j = 0; j < (size - i); j++)
{
if (list[j] > list[j+1])
{
int temp = list[j];
list[j] = list[j+1];
list[j+1] = temp;
}
}
}
}
My question is the same as the title: which one is faster?
Thank you
On the whole, bubble sort will be slower than almost anything else, including List.Sort which is implemented with a quick sort algorithm.
Bubble sort is simple to implement, but it's not very efficient. The List.Sort method uses QuickSort, which is a more complex and also more efficient algorithm.
However, when you have very few items in your list, like in your example, the efficiency of the algorithm doesn't really matter. What matters is how it's implemented and how much overhead there is, so you would just have to use the Stopwatch class to time your examples. This will of course only tell you which is faster for the exact list that you are testing, so it's not very useful for choosing an algorithm to use in an application.
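A timing run with the Stopwatch class, as suggested above, could be sketched like this. The class and method names (SortTiming, MeasureMilliseconds, RandomList, Run) and the list size and seed are assumptions chosen for illustration; the numbers it prints depend entirely on the machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical timing harness, not a rigorous benchmark.
public static class SortTiming
{
    // Bubble sort as in the question, using Count for the bounds.
    public static void BubbleSort(List<int> list)
    {
        for (int i = 1; i < list.Count; i++)
        {
            for (int j = 0; j < list.Count - i; j++)
            {
                if (list[j] > list[j + 1])
                {
                    int temp = list[j];
                    list[j] = list[j + 1];
                    list[j + 1] = temp;
                }
            }
        }
    }

    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMilliseconds(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }

    // Builds a reproducible list of pseudo-random integers.
    public static List<int> RandomList(int count, int seed)
    {
        var random = new Random(seed);
        var list = new List<int>(count);
        for (int i = 0; i < count; i++) list.Add(random.Next());
        return list;
    }

    // Times both sorts on identical copies of the same data.
    public static void Run()
    {
        var forBubble = RandomList(5000, 42);
        var forBuiltIn = new List<int>(forBubble);
        Console.WriteLine("Bubble sort: {0} ms", MeasureMilliseconds(() => BubbleSort(forBubble)));
        Console.WriteLine("List.Sort:   {0} ms", MeasureMilliseconds(() => forBuiltIn.Sort()));
    }
}
```

A single run like this includes JIT warm-up, so repeating the measurement several times gives a fairer picture.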
Besides, when there are very few items in the list, it doesn't really matter which algorithm is faster because it takes so little time anyway. You should consider how many items there will be in the actual implementation, and if the number of items will grow over time.
Have a look at the documentation of List<T>.Sort():
On average, this method is an O(n log n) operation, where n is Count; in the worst case it is an O(n ^ 2) operation.
Since bubble sort is O(n ^ 2) (both on average and in the worst case), you can expect List<T>.Sort() to be (much) faster on large data sets. The speed of sorting 8 elements (as you have) is usually so minuscule (even using bubble sort), that it doesn't matter what you use.
What could affect the speed in this case is the fact that you use a delegate with List<T>.Sort(), but not with your bubble sort. Invoking delegates is relatively slow, so you should try to avoid them if possible when you are micro-optimizing (which you shouldn't do most of the time).

C# List remove from end, really O(n)?

I've read a couple of articles stating that List.RemoveAt() is in O(n) time.
If I do something like:
var myList = new List<int>();
/* Add many ints to the list here. */
// Remove item at end of list:
myList.RemoveAt(myList.Count - 1); // Does this line run in O(n) time?
Removing from the end of the list should be O(1), as it just needs to decrement the list count.
Do I need to write my own class to have this behavior, or does removing the item at the end of a C# list already perform in O(1) time?
In general List<T>::RemoveAt is O(N) because of the need to shift elements after the index up a slot in the array. But for the specific case of removing from the end of the list no shifting is needed and it is consequently O(1)
Removing last item will actually be O(1) operation since only in this case List doesn't shift next items in array. Here is a code from Reflector:
this._size--;
if (index < this._size) // this statement is false if index equals last index in List
{
Array.Copy(this._items, index + 1, this._items, index, this._size - index);
}
this._items[this._size] = default(T);
This should give you an idea
public void RemoveAt(int index) {
if ((uint)index >= (uint)_size) {
ThrowHelper.ThrowArgumentOutOfRangeException();
}
_size--;
if (index < _size) {
Array.Copy(_items, index + 1, _items, index, _size - index);
}
_items[_size] = default(T);
_version++;
}
When speaking asymptotically, O(N) is the worst case time complexity of the method itself, where N is the count. It cannot perform worse than that.
Practically, it would be in the order of O(N-I) (ignoring constant time overhead), where I is the index. This follows because all the items beyond the given index I need to be shifted one position toward the front of the List.
To see this intuitively, if N is 100 and index is 99 (last element), then there are no elements that need to be 'shifted' just the last element is deleted (or simply the count is decreased without changing the size of data structure).
Similarly, when N is 100, and index is 0 (first element), 99 shifts have to be made.
Run the following code and see for yourself:
int size = 1000000;
var list1 = new List<int>();
var list2 = new List<int>();
for (int i = 0; i < size; i++)
{
list1.Add(i);
list2.Add(i);
}
var sw = Stopwatch.StartNew();
for (int i = 0; i < size; i++)
{
list1.RemoveAt(size-1);
list1.Add(0);
}
sw.Stop();
Console.WriteLine("Time elapsed: {0}", sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();
for (int i = 0; i < size; i++)
{
list2.RemoveAt(0);
list2.Add(0);
}
sw.Stop();
Console.WriteLine("Time elapsed: {0}", sw.ElapsedMilliseconds);
It seems to me that if this was actually relevant to your application, you could have measured it in less time than it took to ask the question. And now you have at least two contradictory answers, so you'll have to test it anyway.
The point I'm trying to make is that unless the MSDN docs say that removeAt is O(1) for items at the end of the list, you couldn't really count on it working that way, and it might change in any given .NET update. For that matter, the behavior could be different for different types, for all you know.
If List is the "natural" data structure to use, then use it. If removing items from the List ends up being a hot spot in your profiling, then maybe it's time to implement your own class.

What's the best way to remove items from an ordered collection?

I have a list of items to remove from an ordered collection in C#.
what's the best way in going about this?
If I remove an item in the middle, the index changes but what If I want to remove multiple items?
To avoid index changes, start at the end and go backwards to index 0.
Something along these lines:
for(int i = myList.Count - 1; i >= 0; i--)
{
if(NeedToDelete(myList[i]))
{
myList.RemoveAt(i);
}
}
What is the type of the collection? If it inherits from ICollection, you can just run a loop over the list of items to remove, then call the .Remove() method on the collection.
For Example:
object[] itemsToDelete = GetObjectsToDeleteFromSomewhere();
ICollection<object> orderedCollection = GetCollectionFromSomewhere();
foreach (object item in itemsToDelete)
{
orderedCollection.Remove(item);
}
If the collection is a List<T> you can also use the RemoveAll method:
list.RemoveAll(x => otherlist.Contains(x));
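When otherlist is itself a list, each Contains call is a linear search, so the line above costs roughly (list size) x (otherlist size) comparisons. One possible refinement, sketched here with a hypothetical helper name (BulkRemoval.RemoveAllIn), is to copy the items to delete into a HashSet<T> first. This assumes the element type has meaningful Equals() and GetHashCode() implementations.

```csharp
using System.Collections.Generic;

// Hypothetical helper; assumes T has sensible Equals()/GetHashCode().
public static class BulkRemoval
{
    // Removes every element of `list` that also occurs in `itemsToRemove`.
    // Returns the number of removed elements. The relative order of the rest is kept.
    public static int RemoveAllIn<T>(List<T> list, IEnumerable<T> itemsToRemove)
    {
        var lookup = new HashSet<T>(itemsToRemove); // average O(1) membership test
        return list.RemoveAll(item => lookup.Contains(item));
    }
}
```

RemoveAll compacts the list in one pass, so this avoids the repeated shifting that calling Remove or RemoveAt in a loop would cause.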
Assuming that the list of items to delete is relatively short, you can first sort that list. Then traverse the source list, keeping an index into the list of items to delete that tracks the next item to remove.
Suppose that the source list is haystack and the list of items to delete is needle:
needle.Sort(); // not needed if it's known that `needle` is sorted
// haystack is known to be sorted
int haystackIdx = 0;
int needleIdx = 0;
while (needleIdx < needle.Count && haystackIdx < haystack.Count)
{
if (haystack[haystackIdx] < needle[needleIdx])
haystackIdx++;
else if (haystack[haystackIdx] > needle[needleIdx])
needleIdx++;
else
haystack.RemoveAt(haystackIdx);
}
This way you have only 1 traversal of both haystack and needle, plus the time of sorting the needle, provided the deletion is O(1) (which is often the case for linked lists and the collections like that). If the collection is a List<...>, deletion will need O(collection size) because of data shifts, so you'd better start from the end of both collections and move to the beginning:
needle.Sort(); // not needed if it's known that `needle` is sorted
// haystack is known to be sorted
int haystackIdx = haystack.Count - 1;
int needleIdx = needle.Count - 1;
while (needleIdx >= 0 && haystackIdx >= 0)
{
if (haystack[haystackIdx] > needle[needleIdx])
haystackIdx--;
else if (haystack[haystackIdx] < needle[needleIdx])
needleIdx--;
else
haystack.RemoveAt(haystackIdx--);
}
