Lock-free iterating / indexing jagged arrays - C#

For parallel work on an acceleration data structure I currently use a SpinLock, but I would like to design the algorithm lock-free.
The data structure is a jagged array where each inner array has a different size.
The worker threads should fetch the next element in the current inner array and increment the index; when the index runs past the end of the inner array, they should move on to the next index in the outer array:
for (int i = 0; i < arr.Length; ++i)
{
    for (int j = 0; j < arr[i].Length; ++j)
    {
        DoWork(arr[i][j]);
    }
}
I can't think of a way to do this lock-free except to increment a shared index and then sum up the lengths of the inner arrays to locate the element:
int sharedIndex = -1;

// --- In the worker thread ---------------------
bool loop = false;
do
{
    int index = Interlocked.Increment(ref sharedIndex);
    int count = 0;
    loop = false;

    for (int i = 0; i < arr.Length; ++i)
    {
        count += arr[i].Length;
        if (count > index)
        {
            var remaining = index - (count - arr[i].Length);
            DoWork(arr[i][remaining]);
            loop = true;
            break;
        }
    }
} while (loop);
Is there a way to avoid looping over the entire outer array while still remaining lock-free? I can't atomically increment two indexes (the outer and the inner one) at the same time.

Can you divide up work by having each thread do one to four outer iterations between synchronization steps? If outer_size / chunk_size / threads is at least 4 or so (or maybe greater than the expected ratio between your shortest and longest inner arrays), scheduling of work is dynamic enough that you should usually avoid having one thread running for a long time on a very long array while the other threads have all finished.
(If a chunk size of 1 row, i.e. one inner array, is coarse enough for efficiency, you can simply do that. You say that DoWork is slow enough that even a shared counter for single elements might not be a problem.)
That might still be a risk if the very last inner array is longer than the others. Depending on how common that is, and/or how important it is to avoid that worst-case scenario, you might look at the inner sizes ahead of time and sort or partition them to start working on the longest inner arrays first, so at the end the differences between threads finishing are the differences in lengths of the shorter arrays. (e.g. real-time where limiting the worst case is more important than speeding up the average, vs. a throughput-oriented use-case. Also if there's anything useful for other threads to be doing with free CPU cores if you don't schedule this perfectly.)
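A minimal sketch of that chunked row-claiming (the shared counter, chunk size, and names are illustrative assumptions, not from the question):

int nextRow = 0;          // shared row counter
const int ChunkSize = 2;  // tune so rows / ChunkSize / threads >= ~4

// --- In the worker thread ---------------------
while (true)
{
    // Claim a chunk of whole rows with one atomic operation.
    int start = Interlocked.Add(ref nextRow, ChunkSize) - ChunkSize;
    if (start >= arr.Length) break;

    int end = Math.Min(start + ChunkSize, arr.Length);
    for (int i = start; i < end; ++i)
        for (int j = 0; j < arr[i].Length; ++j)
            DoWork(arr[i][j]);
}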
Atomically incrementing a shared counter for every inner element would serialize all threads on that, so unless processing each inner element was very expensive, it would be much slower than single-threaded without synchronization.
I'm assuming you don't need to start work on each element in sequential order, since even a shared counter wouldn't guarantee that (a thread could sleep after incrementing, with another thread starting the element after).
If you are going to search, start from your previous position.
If you do want to use a single shared counter, instead of linear searching from the start of the outer array every time, only search from your previous position. The shared counter is monotonically increasing, so the next position will usually be later this row, or into the very next. Should be more efficient to do that than to search from the start every time.
e.g. keep 3 variables: prev_index, prev_i, and prev_j. If j = prev_j + (index - prev_index) is still within the current inner array, you're done; this is likely the common case. Otherwise, move to the next row and recompute by subtracting arr[i].Length until you have a j that's in-bounds for that i.
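A sketch of that worker loop, reusing the question's sharedIndex (the per-thread variable names are illustrative):

// Per-thread state: the (i, j) position that the previously claimed index mapped to.
int prevIndex = -1, prevI = 0, prevJ = -1;

while (true)
{
    int index = Interlocked.Increment(ref sharedIndex);

    // Advance from the previous position instead of scanning from row 0.
    int i = prevI;
    int j = prevJ + (index - prevIndex);
    while (i < arr.Length && j >= arr[i].Length)
    {
        j -= arr[i].Length; // move past the rest of this row (and any empty rows)
        ++i;
    }
    if (i >= arr.Length) break; // all work has been claimed

    prevIndex = index; prevI = i; prevJ = j;
    DoWork(arr[i][j]);
}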
Theodor Zoulias suggested pre-computing an array with a running total (aka prefix-sum) of the lengths. Good idea, but searching from the previous position probably makes that unnecessary, unless your rows are typically very short and you have lots of threads. In that case each step might involve multiple rows, so you could linear search from your previous position in a running-total array a bit more efficiently.
Per-row position counter: other threads can help finish a long row
If dividing work between threads only by rows isn't fine-grained enough, you could still mostly do that (with low contention), but create a way for threads to go back and help with unfinished long rows once there are no more fresh rows.
So you start as I proposed, with each thread claiming a whole row via a shared counter. When it gets to the end of a row, it does an atomic fetch_add (Interlocked.Increment) on that counter to get a new row to work on. Once you run out of fresh rows, threads can go back and look for arrays with arr[i].work_pos < arr[i].length.
Inside each row, you have a struct with the array itself (which records the length), and an atomic current-position counter, and another atomic counter for the number of threads currently working on this sub-array.
While working on an inner array, a thread atomically increments the position-within-array counter for that inner array, using that as the position of the next DoWork. So it's still a full memory barrier between every DoWork call (or unroll to claim 2 at a time and then do them), but contention is greatly reduced for most of the total run time because this will be the only thread incrementing that counter. (Until later threads jump in and start helping)
An atomic RMW on a cache line that stays hot in this core's L1d cache is much cheaper than an atomic RMW on a line when we have to request it from another core. So we want the per-row struct to be allocated separately, ideally contiguous with the row data like in C struct { _Atomic size_t work_pos; size_t len; atomic_int thread_active; struct work arr[]; }; with a "flexible array member" (so arbitrary-length array is contiguous with the end of the struct), or another level of indirection to just have a pointer/reference to an array.
Or if you can use the first 2 elements of an array of integers for this atomic bookkeeping, that also works. The outer array should be an array of references to these structs, not by value where multiple control blocks will share a cache line. False sharing would be about as bad as true sharing. And having pairs of threads contend with each other would be nearly as bad as all threads contending for the same single counter, if DoWork is slow enough that either way there's usually just one request for it in flight by one core.
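A C# sketch of such a per-row control block (a class, so the outer array holds references and each row's hot counters live in their own heap object; field names are illustrative, and it assumes using System.Threading):

// One control block per row, allocated separately so the counters of
// different rows don't share a cache line.
class Row
{
    public int WorkPos;        // next unclaimed index; advanced with Interlocked
    public int ThreadsActive;  // threads currently working on this row
    public readonly int[] Items;

    public Row(int[] items) { Items = items; }

    // Claim the next element of this row; false means the row is finished.
    public bool TryClaim(out int j)
    {
        j = Interlocked.Increment(ref WorkPos) - 1;
        return j < Items.Length;
    }
}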
Then the fun part comes at the end, when an Interlocked.Increment on the rowIndex returns an index past the end. Then that thread has to find an in-progress row to help with. Ideally this help could be evenly distributed over the still-working threads.
Perhaps we should have an array that records which row each thread is working on, with an entry for each thread? So threads looking for a place to help can scan through that and find the array with the highest work_left / threads_working. (That's why I suggested a thread-count member in the control block). Races in atomic stores of a pointer/reference to this array vs. readers reading one entry at a time aren't a problem; if an array was almost done, we wouldn't have wanted to pick it anyway, and we'll find somewhere useful to join.
If you naively just search backward from the end of the outer array, new threads will pile on to the last incomplete row, even if it's almost done, and create lots of contention for its atomic counters. But you also don't want to have to search over the whole outer array every time, if it could be large-ish. (If not, if rows are long but there aren't many of them, then that's fine.)
Reading the atomic work_pos counter that another thread is using will disturb that thread, as it loses exclusive ownership so its next Interlocked.Increment will be slower. So we'd like to avoid threads needing to find new rows to jump in on too frequently.
If we had a good heuristic for them to say that a row looks "good enough" and jump in immediately, instead of looking at all active / incomplete rows every time, that could reduce contention. But only if it's a good enough heuristic to make good choices.
Another way to reduce contention is to minimize how often a thread gets to the end of a row. Choosing the row with the largest work_left / threads_working should achieve that, as that's a decent approximation of which row will be completed last.
Multiple threads choosing at the same time might all pick the same row, but I don't think we can be perfect (or it would be too expensive). We can detect this when they use Interlocked.Increment to add themselves to the number of threads working on the row. Falling back to the row with the second-longest estimated time could be appropriate, or re-checking whether this is still the estimated-slowest row with the extra workers counted.
This doesn't have to be perfect; this is all just cleanup at the end of things, after running with minimal contention most of the time. As long as DoWork isn't too cheap relative to inter-thread latency, it's not a disaster if we sometimes have a bit more contention than was necessary.
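A sketch of that end-game selection, assuming the Row control block sketched above plus the per-thread current-row array suggested earlier (the scoring heuristic is my own, and races only make it approximate):

// Pick the row that looks like it will finish last: the one with the
// largest remaining-work-per-worker estimate.
static Row FindRowToHelp(Row[] currentRowOfThread)
{
    Row best = null;
    double bestScore = 0;
    foreach (Row row in currentRowOfThread)
    {
        if (row == null) continue;
        int left = row.Items.Length - Volatile.Read(ref row.WorkPos);
        if (left <= 0) continue;
        double score = (double)left / (row.ThreadsActive + 1);
        if (score > bestScore) { bestScore = score; best = row; }
    }
    if (best != null)
        Interlocked.Increment(ref best.ThreadsActive); // join the chosen row
    return best; // null means all rows are finished
}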
Some heuristic for a thread stopping itself before all the work is done could also be useful, if there are other things a CPU core could be doing (or other work for this thread to pick up, in a pool of worker threads).

You could optimize your current algorithm by doing a binary search of a precomputed array that contains the accumulated lengths of all the arrays up to each index. So for example if you have a jagged array of 10 inner arrays with lengths 8, 9, 5, 4, 0, 0, 6, 4, 4, 7, then the precomputed array will contain the values 0, 8, 17, 22, 26, 26, 26, 32, 36, 40. A binary search will get you directly to the inner array that corresponds to the index you are searching for, doing only O(log n) comparisons.
Here is an implementation of this idea:
// --- Preparation ------------------------------
int[] indices = new int[arr.Length];
indices[0] = 0;
for (int i = 1; i < arr.Length; i++)
    indices[i] = indices[i - 1] + arr[i - 1].Length;
int sumLength = arr.Sum(inner => inner.Length); // requires using System.Linq
int sharedIndex = -1;

// --- In the worker thread ---------------------
while (true)
{
    int index = Interlocked.Increment(ref sharedIndex);
    if (index >= sumLength) break;
    int outerIndex = Array.BinarySearch(indices, index);
    if (outerIndex < 0) outerIndex = (~outerIndex) - 1;
    while (arr[outerIndex].Length == 0) outerIndex++; // Skip empty arrays
    int innerIndex = index - indices[outerIndex];
    DoWork(arr[outerIndex][innerIndex]);
}

Related

Interlocked.Increment Method by a Certain Number Interval

We have a concurrent, multithreaded program.
How would I make a sample number increase by an interval of +5 every time? Does Interlocked.Increment have an overload for an interval? I don't see one listed.
Microsoft Interlocked.Increment Method
// Attempt to make it increase by 5
private int NumberTest;
for (int i = 1; i <= 5; i++)
{
    NumberTest = Interlocked.Increment(ref NumberTest);
}
This is another question it's based off of:
C# Creating global number which increase by 1
I think you want Interlocked.Add:
Adds two integers and replaces the first integer with the sum, as an atomic operation.
int num = 0;
Interlocked.Add(ref num, 5);
Console.WriteLine(num);
Adding (i.e. +=) is not and cannot be an atomic operation (as you know). Unfortunately, there is no way to achieve this without enforcing a full fence; on the bright side, these are fairly optimised at a low level. However, there are several other ways you can ensure integrity (especially since this is just an add):
The use of Interlocked.Add (the sanest solution)
Apply an exclusive lock (or Monitor.Enter) outside the for loop.
AutoResetEvent to ensure threads do the task one by one (meh, sigh).
Create a temp int in each thread and, once finished, add the temp onto the sum under an exclusive lock or similar (see the sketch after this list).
The use of ReaderWriterLockSlim.
Parallel.For with per-thread accumulation and an Interlocked sum at the end, same as 4.
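A sketch combining options 4 and 6: per-thread accumulation with a single atomic add per worker (the iteration count of 1000 is illustrative; assumes using System.Threading and System.Threading.Tasks):

int sum = 0;

Parallel.For(0, 1000,
    () => 0,                                   // per-thread running total
    (i, state, local) => local + 5,            // no shared state in the hot loop
    local => Interlocked.Add(ref sum, local)); // one atomic add per worker thread

// sum == 5000 when the loop completes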

C# (C++ would be cool too) Fastest way to find differences in two large arrays/lists with indexes

More Details:
For this problem, I'm specifically looking for the fastest way to do this, in general and specifically in C#. I don't necessarily mean the "theoretical"/algorithmic fastest; instead, I'm looking for practical implementation speed. In this specific situation, the arrays only have about 1000 elements each, which seems very small, but this computation is going to run very rapidly and compare many arrays (it blows up in size very quickly). I ultimately need the indexes of each element that is different.
I can obviously do a very simple implementation like:
public List<int> FindDifferences(List<double> Original, List<double> NewList)
{
    List<int> Changes = new List<int>();
    for (int i = 0; i < Original.Count; i++)
    {
        if (Original[i] != NewList[i])
        {
            Changes.Add(i);
        }
    }
    return Changes;
}
But from what I can see, this will be really slow overall since it has to iterate once through each item in the list. Is there anything I can do to speed this up? Specifically, is there a way to do something like a parallel foreach that generates a list of the indexes of changes? I saw what I think was a similar question asked before, but I didn't quite understand the answer. Or would there be another way to run the calculation on all items of the list simultaneously (or somehow clustered)?
Assumptions
Each array or list being compared contains data of the same type (double, int, or string), so if array1 holds strings and is compared to array2, I know for certain that array2 will only hold strings, and it will be of the same size (in terms of item count; I can see if maybe they are the same byte count too if that could come into play).
The vast majority of the items in these comparisons will remain the same. My resultant "differences" list will probably only contain a few (1-10) items, if any.
Concerns
1) After a comparison is made (old and new list in the block above), the new list will overwrite the old list. If computation is slower than the rate at which new messages (new lists to compare) arrive, I can have a problem with collisions:
Let's say I have three lists: A, B, and C. A would be my global/"current state" list. When a message containing a new list (B) is received, B is the list A would be compared to.
In an ideal world, A would be compared to B, I would receive a list of integers representing the indexes that contain elements different between the two. After the method computes and returns this index list, A would become B(the values of B overwrite the values of A as my "current state"). When I receive another message(C), this would be compared to my new current state(A, but with the values previously belonging to B), I'd receive the list of differences and C's values would overwrite A's and become the new current state. If the comparison between A and B is still calculating when C is received, I would need to make sure the new calculation either:
Doesn't happen until after A and B's comparison finish and A is overwritten with its new values. or
The comparison is instead made between B and C, with C overwriting A after the comparison finishes(the difference list is fired off elsewhere, so I'd still receive both change lists)
2) If this comparison between lists can't be sped up, is there somewhere else I can speed up instead? The messages I'm receiving come as an object with three values: an ASCII-encoded byte array, a long string (the already-parsed byte array), and a "type" (the name of the list it corresponds to, so I know the data type of its contents). I currently ignore the byte array and parse the string by splitting it at newline characters.
I know this is inefficient, but I have trouble converting the byte array into ints or doubles. The doubles because they have a lot of "noise" (a value of 1.50 could end up coming in as 1.4976789, so I actually have to round it to get its "real" value); the ints because there is no 0 padding, so I don't know the length to chunk the byte array into. Below is an example of what I'm doing:
public List<string> ListFromString(string request)
{
    List<string> fulllist = request.Split('\n').ToList<string>();
    return fulllist.GetRange(1, fulllist.Count - 2); // There's always a label tacked on the beginning, so I start from 1
}

public List<double> RequestListAsDouble(string request)
{
    List<string> RequestAsString = ListFromString(request);
    List<double> RequestListAsDouble = new List<double>();
    foreach (string requestElement in RequestAsString)
    {
        double requestElementAsDouble = Math.Round(Double.Parse(requestElement), 2);
        RequestListAsDouble.Add(requestElementAsDouble);
    }
    return RequestListAsDouble;
}
Your single-threaded comparison of the two parsed lists is probably the fastest way to do it. It is certainly the easiest. As noted by another poster, you can get some speed advantage by pre-allocating the size of the "Changes" list to be some percentage of the size of your input list.
If you want to try parallel comparisons, you should set up "N" threads in advance and have them wait for a starting event, where "N" is the number of real processors on your system. Each thread should compare a portion of the lists and write its answers to the shared output list "Changes" under a lock or via interlocked operations. On completion, the threads go back to sleep, waiting for the next starting event.
When all the threads have gone back to their starting positions, the main thread can pick up the "Changes" and pass it along. Repeat with the next list.
Be sure to clean up all the worker threads when your application is supposed to exit - or it won't exit.
There is a lot of overhead in starting and ending threads. It is all too easy to lose all the processing speed to that overhead. That's why you would want a pool of worker threads already set up and waiting on an event flag. Threads only improve processing speed up to the number of real CPUs in the system.
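A sketch of that idea using Parallel.For, which reuses pooled worker threads rather than starting fresh ones; per-thread partial lists keep the hot loop free of locking (method and variable names are illustrative):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static List<int> FindDifferencesParallel(List<double> original, List<double> newList)
{
    var partials = new ConcurrentBag<List<int>>();

    Parallel.For(0, original.Count,
        () => new List<int>(),              // per-thread partial result
        (i, state, local) =>
        {
            if (original[i] != newList[i]) local.Add(i);
            return local;
        },
        local => { if (local.Count > 0) partials.Add(local); });

    // Merge the per-thread partial results and restore index order.
    var result = partials.SelectMany(p => p).ToList();
    result.Sort();
    return result;
}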
A small optimization would be to initialize the results list with the capacity of the original:
https://msdn.microsoft.com/en-us/library/4kf43ys3(v=vs.110).aspx
If the size of the collection can be estimated, using the List(Int32) constructor and specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List.
List<int> Changes = new List<int>(Original.Count);

Concurrency issue: parallel writes

One day I was trying to get a better understanding of threading concepts, so I wrote a couple of test programs. One of them was:
using System;
using System.Threading.Tasks;

class Program
{
    static volatile int a = 0;

    static void Main(string[] args)
    {
        Task[] tasks = new Task[4];
        for (int h = 0; h < 20; h++)
        {
            a = 0;
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = new Task(() => DoStuff());
                tasks[i].Start();
            }
            Task.WaitAll(tasks);
            Console.WriteLine(a);
        }
        Console.ReadKey();
    }

    static void DoStuff()
    {
        for (int i = 0; i < 500000; i++)
        {
            a++;
        }
    }
}
I hoped I would be able to see outputs less than 2000000. The model in my imagination was the following: multiple threads read variable a at the same time, so all local copies of a are the same; the threads increment it, the writes happen, and one or more increments are "lost" this way.
The output, however, contradicts this reasoning. One sample output (from a Core i5 machine):
2000000
1497903
1026329
2000000
1281604
1395634
1417712
1397300
1396031
1285850
1092027
1068205
1091915
1300493
1357077
1133384
1485279
1290272
1048169
704754
If my reasoning were true, I would see 2000000 occasionally and sometimes numbers a bit less. But what I see is 2000000 occasionally and numbers way less than 2000000. This indicates that what happens behind the scenes is not just a couple of "increment losses" but something more. Could somebody explain the situation to me?
Edit:
When I was writing this test program I was fully aware of how I could make it thread-safe, and I was expecting to see numbers less than 2000000. Let me explain why I was surprised by the output. First, assume that the reasoning above is correct. Second assumption (this very well can be the source of my confusion): if conflicts happen (and they do), then these conflicts are random, and I expect a somewhat normal distribution for these random event occurrences. In that case the first line of the output says: out of 500000 experiments, the random event never occurred. The second line says: the random event occurred at least 167365 times. The difference between 0 and 167365 is just too big (almost impossible with a normal distribution). So the case boils down to the following:
One of the two assumptions (the "increment loss" model or the "somewhat normally distributed parallel conflicts" model) is incorrect. Which one is it, and why?
The behavior stems from the fact that you are using the volatile keyword while not locking access to the variable a when using the increment operator (++). (You still get a random distribution when not using volatile, but using volatile changes the nature of the distribution, which is explored below.)
When using the increment operator, it's the equivalent of:
a = a + 1;
In this case, you're actually doing three operations, not one:
1. Read the value of a
2. Add 1 to the value of a
3. Assign the result of step 2 back to a
While the volatile keyword serializes access, in the above case, it's serializing access to three separate operations, not serializing access to them collectively, as an atomic unit of work.
Because you're performing three operations when incrementing instead of one, you have additions that are being dropped.
Consider this:
Time    Thread 1                Thread 2
----    --------                --------
0       read a (1)              read a (1)
1       evaluate a + 1 (2)      evaluate a + 1 (2)
2       write result to a (3)   write result to a (3)
Or even this:
Time    a    Thread 1             Thread 2             Thread 3
----    -    --------             --------             --------
0       1    read a                                    read a
1       1    evaluate a + 1 (2)
2       2    write back to a
3       2                         read a
4       2                         evaluate a + 1 (3)
5       3                         write back to a
6       3                                              evaluate a + 1 (2)
7       2                                              write back to a
Note in particular steps 5-7: thread 2 has written a value back to a, but because thread 3 has an old, stale value, it actually overwrites the results that previous threads have written, essentially wiping out any trace of those increments.
As you can see, as you add more threads, you have a greater potential to mix up the order in which the operations are being performed.
volatile will prevent you from corrupting the value of a due to two writes happening at the same time, or a corrupt read of a due to a write happening during a read, but it doesn't do anything to handle making the operations atomic in this case (since you're performing three operations).
In this case, volatile ensures that the distribution of the value of a is between 0 and 2,000,000 (four threads * 500,000 iterations per thread) because of this serialization of access to a. Without volatile, you run the risk of a being anything as you can run into corruption of the value a when reads and/or writes happen at the same time.
Because you haven't synchronized access to a for the entire increment operation, the results are unpredictable, as you have writes that are being overwritten (as seen in the previous example).
What's going on in your case?
For your specific case you have many writes that are being overwritten, not just a few; since you have four threads each looping 500,000 times (two million increments in total), theoretically all the writes could be overwritten (expand the second example to four threads and then add a few million rows for the loop iterations).
While it's not really probable, you shouldn't expect to avoid dropping a tremendous number of writes.
Additionally, Task is an abstraction. In reality (assuming you are using the default scheduler), it uses the ThreadPool class to get threads to process your requests. The ThreadPool is ultimately shared with other operations (some internal to the CLR, even in this case), and even then, it does things like work-stealing and using the current thread for operations, and at some point it drops down to the operating system to get a thread to perform work.
Because of this, you can't assume that there's a random distribution of overwrites that will be skipped, as there's always going to be a lot more going on that will throw whatever order you expect out the window; the order of processing is undefined, the allocation of work will never be evenly distributed.
If you want to ensure that additions won't be overwritten, then you should use the Interlocked.Increment method in the DoStuff method, like so:
for (int i = 0; i < 500000; i++)
{
    Interlocked.Increment(ref a);
}
This will ensure that all writes will take place, and your output will be 2000000 twenty times (as per your loop).
It also invalidates the need for the volatile keyword, as you're making the operations you need atomic.
The volatile keyword is good when the operation that you need to make atomic is limited to a single read or write.
If you have to do anything more than a read or a write, then the volatile keyword is too granular, you need a more coarse locking mechanism.
In this case, it's Interlocked.Increment, but if you have more to do than that, then the lock statement will more than likely be what you rely on.
I don't think it's anything else happening - it's just happening a lot. If you add 'locking' or some other synch technique (Best thread-safe way to increment an integer up to 65535) you'll reliably get the full 2,000,000 increments.
Each task is calling DoStuff() as you'd expect.
private static object locker = new object();

static void DoStuff()
{
    for (int i = 0; i < 500000; i++)
    {
        lock (locker)
        {
            a++;
        }
    }
}
Try increasing the amounts; the timespan is simply too short to draw any conclusions from. Remember that normal IO is in the range of milliseconds, and just one blocking IO op in this case would render the results useless.
Something along the lines of this is better (or why not int.MaxValue?):
static void DoStuff()
{
    for (int i = 0; i < 50000000; i++) // 50 000 000
        a++;
}
My results ("correct" being 400 000 000):
63838940
60811151
70716761
62101690
61798372
64849158
68786233
67849788
69044365
68621685
86184950
77382352
74374061
58356697
70683366
71841576
62955710
70824563
63564392
71135381
Not really a normal distribution, but we are getting there. Bear in mind that this is roughly 35% of the correct amount.
I can explain my results by the fact that I am running on 2 physical cores (viewed as 4 due to hyper-threading), which means that if it is optimal to do an "HT switch" during the actual addition, at least 50% of the additions will be "removed" (if I remember the implementation of HT correctly, it would be e.g. modifying one thread's data in the ALU while loading/saving another thread's data). The remaining 15% is due to the program actually running on 2 cores in parallel.
My recommendations
post your hardware
increase the loop count
vary the TaskCount
hardware matters!

Interview - Write a program to remove even elements

I was asked this today, and I know the answer is simple, but the interviewer kept the twist for last.
Question
Write a program to remove even numbers stored in an ArrayList containing 1 - 100.
I just said wow.
Here you go, this is how I implemented it:
ArrayList source = new ArrayList(100);
for (int i = 1; i < 100; i++)
{
    source.Add(i);
}

for (int i = 0; i < source.Count; i++)
{
    if (Convert.ToInt32(source[i]) % 2 == 0)
    {
        source.RemoveAt(i);
    }
}
// source contains only odd elements
The twist
He asked me for the computational complexity of this and to give him an equation. I said it's linear, directly proportional to N (the input size).
He said: hmm, so that means I need to wait longer to get results when the input size increases, am I right? Yes sir, you are.
Tune it for me, make it Log(N), try as much as you can, he said. I failed miserably at this part.
Hence I come here for the right logic, answer, or algorithm to do this.
Note: he wanted no LINQ, no extra bells and whistles. Just plain loops or other logic to do it.
I dare say that the complexity is in fact O(N^2), since removal in arrays is O(N) and it can potentially be called for each item.
So you have O(N) for the traversal of the array(list) and O(N) for each removal => O(N) * O(N).
Since it does not seem clear, I'll explain the reasoning. At each step a removal of an item may take place (assuming the worst case in which every item must be removed). In an array the removal is done by shifting. Hence, to remove the first item, I need to shift all the following N-1 items by one position to the left:
1 2 3 4 5 6...
<---
2 3 4 5 6...
Now, at each iteration I need to shift, so I'm doing N-1 + N-2 + ... + 1 + 0 shifts, which gives a result of (N) * (N-1) / 2 (arithmetic series) giving a final complexity of O(N^2).
Let's think of it this way:
The number of delete actions you are doing is, necessarily, half of the array length (if the elements are stored in an array). So the complexity is at least O(N).
The question you received makes me suppose that your interviewer wanted you to reason about different ways of storing the numbers.
Usually when you have log complexity you are working with different structures, like graphs or trees.
The only way I can think of to get logarithmic complexity is having the numbers stored in a tree (an ordered tree, a B-tree... we could elaborate on this), but that is actually outside the constraints of your interview (storing numbers in an array).
Does it make sense to you?
You can get noticeably better performance if you keep two indexes, one to the current read position and one to the current write position.
int read = 0;
int write = 0;
The idea is that read looks at each member of the array in turn; write keeps track of the current end of the list. When we find a member we want to delete, we move read forwards, but not write.
for (read = 0; read < source.Count; read++)
{
    if (Convert.ToInt32(source[read]) % 2 != 0)
    {
        source[write] = source[read];
        write += 1;
    }
}
Then at the end, tell the ArrayList that its new length is the current value of write.
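ArrayList has no settable length, but RemoveRange can drop the tail in one call; a sketch:

source.RemoveRange(write, source.Count - write);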
This takes you from your original O(n^2) down to O(n).
(note: I haven't tested this)
Without changing the data structure or making some assumption about the way items are stored inside the ArrayList, I can't see how you'll avoid checking the parity of each and every member (hence at least O(n) complexity). Perhaps the interviewer simply wanted you to tell him it's impossible.
If you really have to use an ArrayList and actively have to remove the entries (instead of not adding them in the first place), not incrementing by i + 1 but by i + 2 removes the need to check whether the value is odd.
for (int i = source.Count - 1; i > 0; i = i - 2)
{
    source.RemoveAt(i);
}
Edit: I know this will only work if source contains the entries from 1-100 in sequential order.
The problem with the given solution is that it starts from the beginning, so the entire list must be shifted each time an item is removed:
Initial List:      1, 2, 3, 4, 5, ..., 98, 99
                      /  /  /      ///  /
After 1st removal: 1, 3, 4, 5, ..., 98, 99, <empty>
                         /  /      ///  /
After 2nd removal: 1, 3, 5, ..., 98, 99, <empty>, <empty>
I've used the slashes to try to show how the list shifts after each removal.
You can reduce the complexity (and eliminate the bug I mentioned in the comments) simply by reversing the order of removal:
for (int i = source.Count - 1; i >= 0; --i)
{
    if (Convert.ToInt32(source[i]) % 2 == 0)
    {
        // With sequential data the next element down is odd, so there's
        // no need to re-check it during the next iteration.
        source.RemoveAt(i--);
    }
}
It is possible IF you have unlimited parallel threads available to you.
Suppose that we have an array with n elements. Assign one thread per element. Assume all threads act in perfect sync.
Each thread decides whether its element is even or odd. (Time O(1).)
Determine how many elements below it in the array are odd. (Time O(log(n)).)
Mark a 0 or 1 in a second array depending on whether the element at the same index is even or odd. Each entry is now a count of odds at that spot.
If your index is odd, add the previous number. Now each entry is a count of odds in the current block of 2, up to itself.
If your index mod 4 is 2, add the value at the index below; if it is 3, add the answer from 2 indexes below. Now each entry is a count of odds in the current block of 4, up to itself.
Continue this pattern with blocks of 2^i (if you're in the top half, add the count for the bottom half) log2(n) times; now each entry in this array is the count of odds up to and including itself.
Each CPU inserts its value into the correct slot.
Truncate the array to the right size.
I am willing to bet that something like this is the answer your friend has in mind.
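A rough sequential simulation of that scan-and-scatter idea, assuming the ArrayList source from the question (illustrative only: each inner loop would be one lockstep pass with one thread per index, and a real parallel version needs a barrier between passes):

int n = source.Count;
int[] flags = new int[n];
for (int i = 0; i < n; i++)
    flags[i] = Convert.ToInt32(source[i]) % 2;   // 1 = odd (keep), 0 = even

// Hillis-Steele inclusive scan: after log2(n) passes, scan[i] holds the
// number of odd elements at indexes 0..i.
int[] scan = (int[])flags.Clone();
for (int stride = 1; stride < n; stride *= 2)
{
    int[] next = (int[])scan.Clone();
    for (int i = stride; i < n; i++)             // one "thread" per i
        next[i] = scan[i] + scan[i - stride];
    scan = next;
}

// Scatter: every kept element knows its final slot without coordination.
object[] result = new object[n > 0 ? scan[n - 1] : 0];
for (int i = 0; i < n; i++)
    if (flags[i] == 1)
        result[scan[i] - 1] = source[i];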

Does looping occur at the same speed for all systems?

Does looping in C# occur at the same speed on all systems? If not, how can I control the looping speed to make the experience consistent on all platforms?
You can set a minimum time for the time taken to go around a loop, like this:
for (int i = 0; i < 10; i++)
{
    System.Threading.Thread.Sleep(100);
    // ... rest of your code ...
}
The Sleep call will take a minimum of 100 ms (you cannot say what the maximum will be), so your loop will take at least 1 second to run 10 iterations.
Bear in mind that it's counter to the normal way of Windows programming to sleep on your user-interface thread, but this might be useful to you for a quick hack.
You can never depend on the speed of a loop. Although all existing compilers strive to make loops as efficient as possible, and so they probably produce very similar results (given enough development time), the compilers are not the only thing influencing this.
And even leaving everything else aside, different machines have different performance. No two machines will yield the exact same speed for a loop. In fact, even starting the program twice on the same machine will yield slightly different performance. It depends on what other programs are running, how the CPU is feeling today, and whether or not the moon is shining.
No, loops do not run at the same speed on all systems. There are so many factors to this question that it cannot be appreciably answered without code.
This is a simple loop:
int j = 0;
for (int i = 0; i < 100; i++)
{
    j = j + i;
}
This loop is very simple: it's merely a set of load, add, and store operations, with a jump and a compare. That is only a few micro-ops, and it will be really fast. However, the speed of those micro-ops depends on the processor. If the processor can do one micro-op in 1 billionth of a second (roughly one gigahertz), then the loop will take approximately 6 * 100 micro-ops (this is all rough estimation; there are so many factors involved that I'm only going for an approximation), or 6 * 100 billionths of a second, which is slightly less than one millionth of a second for the entire loop. You can barely measure this with most operating system functions.
I wanted to demonstrate the speed of looping. I referenced above a processor doing 1 billion micro-ops per second. Now consider a processor that can do 4 billion micro-ops per second. That processor would be (roughly) four times faster than the first one, and we didn't change the code.
Does this answer the question?
For those who want to mention that the compiler might loop unroll this, ignore that for the sake of the learning.
One way of controlling this is by using the Stopwatch to control when you do your logic. See this example code:
int noofrunspersecond = 30;
long ticks1 = 0;
long ticks2 = 0;
double interval = (double)Stopwatch.Frequency / noofrunspersecond;

while (true)
{
    ticks2 = Stopwatch.GetTimestamp();
    if (ticks2 >= ticks1 + interval)
    {
        ticks1 = Stopwatch.GetTimestamp();
        // perform your logic here
    }
    Thread.Sleep(1);
}
This will make sure that the logic is performed at the given interval, as long as the system can keep up. If you try to execute 100 times per second, then depending on the logic performed, the system might not manage to run it 100 times a second; in other cases this should work just fine.
This kind of logic is good for getting smooth animations that will not speed up or slow down on different systems, for example.
