For a university assignment, we have to take a serial implementation of a Rush Hour solver and make it parallel. The solver uses a BFS implementation.
Here is a part of the default bfs implementation:
// Initialize empty queue
Queue<Tuple<byte[], Solution>> q = new Queue<Tuple<byte[], Solution>>();
// By default, the solution is "no solution"
foundSolution = new NoSolution();
// Place the starting position in the queue
q.Enqueue(Tuple.Create(vehicleStartPos, (Solution)new EmptySolution()));
AddNode(vehicleStartPos);
// Do BFS
while (q.Count > 0)
{
    Tuple<byte[], Solution> currentState = q.Dequeue();
    // Generate successors and push them onto the queue if they haven't been seen before
    foreach (Tuple<byte[], Solution> next in Sucessors(currentState))
    {
        // Did we reach the goal?
        if (next.Item1[targetVehicle] == goal)
        {
            q.Clear();
            foundSolution = next.Item2;
            break;
        }
        // If we haven't seen this node before, add it to the Trie and Queue to be expanded
        if (!AddNode(next.Item1))
            q.Enqueue(next);
    }
}
Console.WriteLine(foundSolution);
Console.ReadLine();
I managed to turn this into a parallel version like this:
ConcurrentQueue<Tuple<byte[], Solution>> q = new ConcurrentQueue<Tuple<byte[], Solution>>();
foundSolution = new NoSolution();
q.Enqueue(Tuple.Create(vehicleStartPos, (Solution)new EmptySolution()));
AddNode(vehicleStartPos);
while (q.Count > 0 && !solutionFound)
{
    Tuple<byte[], Solution> currentState;
    if (!q.TryDequeue(out currentState))
        continue; // guard the dequeue: the queue may have been emptied since the check
    Parallel.ForEach(Sucessors(currentState), (next) =>
    {
        // Did we reach the goal?
        if (next.Item1[targetVehicle] == goal)
        {
            solutionFound = true;
            foundSolution = next.Item2;
            return;
        }
        // If we haven't seen this node before, add it to the Trie and Queue to be expanded
        if (!AddNode(next.Item1))
            q.Enqueue(next);
    });
}
As you can see, I tried to implement a parallel foreach loop over a ConcurrentQueue. I get the feeling the ConcurrentQueue itself works well, but it synchronizes internally and thus costs too much time, making this parallel implementation way slower than the serial one.
I was thinking about having a wait-free, or at least lock-free, queue so I can save that bit of time, but I am not sure how to implement such a thing. Could you give some insight into whether this would be feasible, and whether it would be faster than using a regular Queue? Or maybe a different concurrent data structure would better suit the situation; I'm not sure how well a ConcurrentBag and the like would fit in. Could you shed some light on this?
Also, after searching for parallel BFS implementations, I couldn't find any. What are some general tips and hints for people like me wanting to implement BFS in parallel? What are some good alternatives to the queue to make it thread-safe?
EDIT1:
I managed to implement tasks like this:
int taskNumbers = Environment.ProcessorCount;
Task[] tasks = new Task[taskNumbers];
// Set up the cancellation token
ctSource = new CancellationTokenSource();
for (int i = 0; i < taskNumbers; i++)
    tasks[i] = new Task(() =>
    {
        try { Traverse(); }
        catch { }
    },
    ctSource.Token);
for (int i = 0; i < taskNumbers; i++)
    tasks[i].Start();
Task.WaitAll(tasks);
ctSource.Dispose();
They call a traverse method, which looks like this:
private static void Traverse()
{
    ctSource.Token.ThrowIfCancellationRequested();
    while (q.Count > 0)
    {
        Tuple<byte[], Solution> currentState;
        if (q.TryDequeue(out currentState))
        {
            foreach (Tuple<byte[], Solution> next in Sucessors(currentState))
            {
                // Did we reach the goal?
                if (next.Item1[targetVehicle] == goal)
                {
                    ctSource.Cancel();
                    foundSolution = next.Item2;
                    return;
                }
                // If we haven't seen this node before, add it to the Trie and Queue to be expanded
                if (!AddNode(next.Item1))
                    q.Enqueue(next);
            }
        }
        if (ctSource.IsCancellationRequested)
            ctSource.Token.ThrowIfCancellationRequested();
    }
}
Yet I am having trouble figuring out the condition for the while loop in the Traverse method. The current condition allows tasks to exit the loop too early. As far as I know, I don't have a complete list of all nodes available, so I can't compare the visited tree against a list of all nodes. Beyond that, I don't have any other ideas for how to keep tasks looping through the while loop until I have found an answer or until there are no more new nodes. Could you help me out?
Thanks @Brian Malehorn for your help so far. I managed to get the performance of the parallel BFS version up to almost equal that of the serial version. All I need now is to make the tasks stay in the while loop, I think.
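One common way to keep workers from exiting early is to count outstanding work instead of testing the queue: a counter of items that are queued or currently being expanded only reaches zero once the whole search space is exhausted. Below is a minimal sketch of a reworked Traverse along those lines; pendingWork is a hypothetical field, everything else reuses the names from the code above:
// Items queued or currently being expanded. Starts at 1 for the start position.
private static int pendingWork = 1;

private static void Traverse()
{
    Tuple<byte[], Solution> currentState;
    while (Thread.VolatileRead(ref pendingWork) > 0 && !ctSource.IsCancellationRequested)
    {
        if (!q.TryDequeue(out currentState))
        {
            Thread.Yield(); // queue momentarily empty, but another worker may still be expanding
            continue;
        }
        foreach (Tuple<byte[], Solution> next in Sucessors(currentState))
        {
            if (next.Item1[targetVehicle] == goal)
            {
                foundSolution = next.Item2;
                ctSource.Cancel();
                return;
            }
            if (!AddNode(next.Item1))
            {
                Interlocked.Increment(ref pendingWork); // one more queued item
                q.Enqueue(next);
            }
        }
        Interlocked.Decrement(ref pendingWork); // this item is now fully expanded
    }
}
Because each child increments the counter before its parent decrements, the counter can only hit zero when nothing is queued and nothing is in flight, so no task leaves the loop too early.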
The problem isn't your queue, the problem is you're parallelizing the wrong thing. You're parallelizing adding the successors to the queue, when you should be parallelizing the Sucessors() call.
That is, Sucessors() should only be called from a worker thread, never in the "main" thread.
For example, suppose Sucessors() takes 1 second to run, and you're searching this tree:
o
/ \
/ \
o o
/ \ / \
o o o o
The fastest you can search this tree is 3 seconds. How long will your code take?
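For illustration, one way to keep Sucessors() on worker threads is a level-synchronous parallel BFS: expand the whole current frontier in parallel, collect the next frontier, repeat. A minimal sketch against the question's code, assuming AddNode (the Trie) is thread-safe:
var frontier = new List<Tuple<byte[], Solution>>
{
    Tuple.Create(vehicleStartPos, (Solution)new EmptySolution())
};
AddNode(vehicleStartPos);
foundSolution = new NoSolution();

while (frontier.Count > 0 && foundSolution is NoSolution)
{
    var nextFrontier = new ConcurrentBag<Tuple<byte[], Solution>>();
    Parallel.ForEach(frontier, (state, loop) =>
    {
        foreach (var next in Sucessors(state)) // the expensive call, now on worker threads
        {
            if (next.Item1[targetVehicle] == goal)
            {
                foundSolution = next.Item2;
                loop.Stop();
                return;
            }
            if (!AddNode(next.Item1))
                nextFrontier.Add(next);
        }
    });
    frontier = nextFrontier.ToList();
}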
My code is a bit complex, but the core is starting threads like this:
Thread task = new Thread(new ParameterizedThreadStart(x => { ThreadReturn = BuildChildNodes(x); }));
task.Start((NodeParameters)tasks[0]);
It should work, but when I check my CPU usage I get barely 10%, so I assume it's only using one core, barely.
ThreadReturn, by the way, is a value I use a setter on, to get some kind of event for when a thread is ready:
public object ThreadReturn
{
    set
    {
        lock (thisLock)
        {
            NodeReturn result = (NodeReturn)value;
            if (result.states.Count == 0) return;
            Content[result.level + 1].AddRange(result.states);
            if (result.level + 1 >= MaxDepth) return;
            for (int i = 0; i < result.states.Count; i++)
            {
                Thread newTask = new Thread(new ParameterizedThreadStart(x => ThreadReturn = BuildChildNodes(x)));
                NodeParameters param = new NodeParameters()
                {
                    level = result.level + 1,
                    node = Content[result.level + 1].Count - (i + 1),
                    turn = SkipOpponent ? StartTurn : !result.turn
                };
                if (tasks.Count > 100)
                    unstarted.Add(param);
                else
                {
                    newTask.Start(param);
                    tasks.Add(newTask);
                }
            }
        }
    }
}
I got some crazy error about a mark stack overflow, so I limited the maximum number of parallel threads by putting the extra ones into a second list...
I'm not well versed in multithreading, so this code is a little bit messy... maybe you can show me a better way that actually uses my cores.
By the way: it's not the lock's fault. I tried without it before, with the same result.
Edit: this is my code from before I went to the Thread class. I find it more suitable:
Content.Clear();
Content.Add(new List<T> { Root });
for (var i = 0; i < maxDepth; i++)
    Content.Add(new List<T>());
Task<object> firstTask = new Task<object>(x => BuildChildNodes(x), new NodeParameters() { level = 0, node = 0, turn = Turn });
firstTask.Start();
tasks.Add(firstTask);
while (tasks.Count > 0 && Content.Last().Count == 0)
{
    Task.WaitAny(tasks.ToArray());
    for (int task = tasks.Count - 1; task >= 0; task--)
    {
        if (tasks[task].IsCompleted)
        {
            NodeReturn result = (NodeReturn)tasks[task].Result;
            tasks.RemoveAt(task);
            Content[result.level + 1].AddRange(result.states);
            if (result.level + 1 >= maxDepth) continue;
            for (int i = 0; i < result.states.Count; i++)
            {
                Task<object> newTask = new Task<object>(x => BuildChildNodes(x), (object)new NodeParameters() { level = result.level + 1, node = Content[result.level + 1].Count - (i + 1), turn = SkipOpponent ? Turn : !result.turn });
                newTask.Start();
                tasks.Add(newTask); // new tasks must be tracked, or WaitAny never sees them
            }
        }
    }
}
For every state I calculate the children, and in my main thread I put them into my state tree while waiting for the tasks to finish. Please assume I'd actually use the return value of WaitAny; I did a git reset and now... well... it's gone ^^
Edit:
Okay, I don't know what exactly I did wrong, but... in general, everything was a total mess. I have now implemented the deep construction method, and maybe because there's much less... "traffic" now, my whole code runs in 200 ms. So thanks for this!
I don't know if I should delete this question out of embarrassment, or if you guys want to post answers so I can rate them positively; you really helped me a lot :)
Disregarding all the other issues you have here, essentially your lock ruins the show.
What you are saying is: "Hey, random person, go and do some stuff! Just make sure you don't do it at the same time as anyone else" (the lock). You could have 1000 threads, but only one thread is going to be active at a time, on one core, hence your results.
Here are some other thoughts.
Get the gunk out of the setter; this would fail any sane code review.
Use Tasks instead of Threads.
Think about what needs thread safety, and elegantly lock only what needs it. Take a look at Interlocked for atomic numeric manipulation.
Take a look at the concurrent collections; you may get more mileage out of them.
Simplify your code.
I can't give any more advice as it's just about impossible to know what you are trying to do.
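To make the Tasks-plus-concurrent-collections advice concrete, here is a minimal sketch. BuildChildNodes, NodeParameters, and NodeReturn are the asker's types; the seed list initialParams and the overall wiring are hypothetical:
// Expand nodes with Tasks and accumulate results in a concurrent
// collection instead of funnelling everything through a locked setter.
var results = new ConcurrentBag<NodeReturn>();
var pending = new List<Task>();

foreach (NodeParameters param in initialParams) // hypothetical seed list
{
    pending.Add(Task.Factory.StartNew(p =>
    {
        NodeReturn r = (NodeReturn)BuildChildNodes(p);
        results.Add(r); // no lock needed; ConcurrentBag is thread-safe
    }, param));
}

Task.WaitAll(pending.ToArray());
// Merge 'results' into the tree on one thread, after the parallel phase.
The point of the split is that the parallel phase does only independent work, so nothing needs a lock, and the single-threaded merge afterwards needs no synchronization at all.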
The original post contained a problem that I managed to solve, at the cost of introducing a lot of issues with shared mutable state. Now I'm wondering whether it can be done in a purely functional way.
Requests can be processed in a certain order.
For each order i there is an effectiveness E(i)
Processing requests should follow three conditions:
There should be no delay between acquiring the first request and processing it
There should be no delay between processing some request and processing the next request
When there are several possible orders of processing requests, the one with the highest effectiveness should be chosen
Concrete example:
For an infinite list of integers, print them so that prime numbers generally come earlier than non-prime numbers
The effectiveness of an ordering is inversely related to the number of times we had a prime in the queue but printed a non-prime
My first solution in C# (not for primes, obviously) used some classes sharing mutable state through a concurrent priority queue. It was ugly, because I had to manually subscribe classes to events and unsubscribe them, check that the queue was not exhausted by one intermediate consumer before another consumer processed it, and so on.
To refactor it, I chose the Reactive Extensions library, which seemed to address the issues with state. I then realized that I couldn't use it in the following setup:
The source function accepts nothing and returns IObservable<Request>
The process function accepts IObservable<Request> and returns nothing
I have to write a reorder function, which reorders requests on their way from source to process.
Internally reorder has a ConcurrentPriorityQueue of orders. It should handle two scenarios:
While process is busy processing, reorder finds better orderings and updates the queue
When process requests a new item, reorder returns the first element from the queue
The problem was that if reorder returned IObservable<Request>, it was unaware of whether items had been requested from it or not.
If reorder called OnNext immediately upon receiving an item, it didn't reorder anything and violated condition 3.
If it made sure it had found the best ordering, it violated conditions 1 and 2, because process could become idle.
If reorder returned ISubject<Request>, it exposed the option to call OnError and OnCompleted to the consumer.
If reorder had returned the queue itself, I would have been back where I started.
The problem was that a cold Observable.Create was not lazy enough: it started exhausting the queue with all the requests as soon as a subscription was made, but only the results of the first few were used.
The solution I came up with is to return an observable of request functions, i.e. IObservable<Func<Task<int>>> instead of IObservable<int>.
It works when there is only one subscriber, but if more requests are consumed than there are numbers generated by source, they will be awaited forever.
This issue could probably be solved by introducing caching, but then a consumer that consumed the queue quickly would have side effects on all other consumers, because it would freeze the queue in a less effective ordering than it would have after some waiting.
So I will post the solution to the original question, but it's not really a valuable answer, because it introduces a lot of problems.
This demonstrates why functional reactive programming and side effects don't mix well. On the other hand, it seems I now have an example of a practical problem that is impossible to solve in a pure functional way. Or don't I? If the Order function accepted an optimizationLevel as a parameter, it would be pure. Can we somehow implicitly convert time to optimizationLevel to make this pure as well?
I'd very much like to see such a solution, in C# or any other language.
Problematic solution. Uses ConcurrentPriorityQueue from this repo.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Reactive.Linq;
using DataStructures;
using System.Threading;

namespace LazyObservable
{
    class Program
    {
        /// <summary>
        /// Compares tuples by second element, then by first in reverse
        /// </summary>
        class PriorityComparer<TElement, TPriority> : IComparer<Tuple<TElement, TPriority>>
            where TPriority : IComparable<TPriority>
        {
            Func<TElement, TElement, int> fallbackComparer;

            public PriorityComparer(IComparer<TElement> comparer = null)
            {
                if (comparer != null)
                {
                    fallbackComparer = comparer.Compare;
                }
                else if (typeof(IComparable<TElement>).IsAssignableFrom(typeof(TElement))
                         || typeof(IComparable).IsAssignableFrom(typeof(TElement)))
                {
                    fallbackComparer = (a, b) => -Comparer<TElement>.Default.Compare(a, b);
                }
                else
                {
                    fallbackComparer = (_1, _2) => 0;
                }
            }

            public int Compare(Tuple<TElement, TPriority> x, Tuple<TElement, TPriority> y)
            {
                if (x == null && y == null)
                {
                    return 0;
                }
                if (x == null || y == null)
                {
                    return x == null ? -1 : 1;
                }
                int res = x.Item2.CompareTo(y.Item2);
                if (res == 0)
                {
                    res = fallbackComparer(x.Item1, y.Item1);
                }
                return res;
            }
        }

        const int N = 100;

        static IObservable<int> Source()
        {
            return Observable.Interval(TimeSpan.FromMilliseconds(1))
                             .Select(x => (int)x)
                             .Where(x => x <= 100);
        }

        static bool IsPrime(int x)
        {
            if (x <= 1)
            {
                return false;
            }
            if (x == 2)
            {
                return true;
            }
            int limit = ((int)Math.Sqrt(x)) + 1;
            for (int i = 2; i < limit; ++i)
            {
                if (x % i == 0)
                {
                    return false;
                }
            }
            return true;
        }

        static IObservable<Func<Task<int>>> Order(IObservable<int> numbers)
        {
            ConcurrentPriorityQueue<Tuple<int, int>> queue = new ConcurrentPriorityQueue<Tuple<int, int>>(new PriorityComparer<int, int>());
            numbers.Subscribe(x =>
            {
                queue.Add(new Tuple<int, int>(x, 0));
            });
            numbers
                .ForEachAsync(x =>
                {
                    Console.WriteLine("Testing {0}", x);
                    if (IsPrime(x))
                    {
                        if (queue.Remove(new Tuple<int, int>(x, 0)))
                        {
                            Console.WriteLine("Accelerated {0}", x);
                            queue.Add(new Tuple<int, int>(x, 1));
                        }
                    }
                });
            Func<Task<int>> requestElement = async () =>
            {
                while (queue.Count == 0)
                {
                    await Task.Delay(30);
                }
                return queue.Take().Item1;
            };
            return numbers.Select(_ => requestElement);
        }

        static void Process(IObservable<Func<Task<int>>> numbers)
        {
            numbers
                .Subscribe(async x =>
                {
                    await Task.Delay(1000);
                    Console.WriteLine(await x());
                });
        }

        static void Main(string[] args)
        {
            Console.WriteLine("init");
            Process(Order(Source()));
            //Process(Source());
            Console.WriteLine("called");
            Console.ReadLine();
        }
    }
}
To summarize (conceptually):
You have requests that come in irregularly (from source), and a single processor (function process) that can handle them.
The processor should have no downtime.
You're implicitly going to need some sort of queue-ish collection to manage the case where the requests come in faster than the processor can process.
In the event that there are multiple requests queued up, ideally you should order them by some effectiveness function; however, the reordering shouldn't be the cause of downtime (function reorder).
Is all this correct?
Assuming it is, source can be of type IObservable<Request>; that sounds fine. reorder, though, sounds like it should really return an IEnumerable<Request>: process wants to work on a pull basis. It wants to pull the highest-priority request once it frees up, to wait for the next request if the queue is empty, and to start immediately otherwise. That sounds like a job for IEnumerable, not IObservable.
public IObservable<Request> requester();
public IEnumerable<Request> reorder(IObservable<Request> requester);
public void process(IEnumerable<Request> requestEnumerable);
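For what it's worth, here is a sketch of reorder under that pull-based design. It reuses the question's ConcurrentPriorityQueue and PriorityComparer; the SemaphoreSlim handoff and the Effectiveness function are assumptions, and the dynamic repriorization (the Remove/Add dance from the question) is omitted for brevity:
// Pull-based reorder: blocks the consumer only when the queue is empty;
// otherwise it hands out the best-ranked request immediately.
public static IEnumerable<Request> Reorder(IObservable<Request> source)
{
    var queue = new ConcurrentPriorityQueue<Tuple<Request, int>>(
        new PriorityComparer<Request, int>());
    var available = new SemaphoreSlim(0);

    source.Subscribe(r =>
    {
        queue.Add(Tuple.Create(r, Effectiveness(r))); // Effectiveness: hypothetical
        available.Release();
    });

    while (true)
    {
        available.Wait();                // block until at least one request is queued
        yield return queue.Take().Item1; // the best ordering known right now
    }
}
Because the enumerable only dequeues when the consumer pulls, any requests that arrive while process is busy get reordered in the queue in the meantime, which is exactly the behavior conditions 1 to 3 ask for.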
I have a web service I need to query, and it supports pagination for its data. Due to the amount of data I need to fetch and how that service is implemented, I intended to do a series of concurrent HTTP web requests to accumulate this data.
Say I have a number of threads and a page size: how could I assign each thread a starting point that doesn't overlap with the other threads? It's been a long time since I took parallel programming, and I'm floundering a bit. I know I could find my starting point with something like start = N / numThreads * threadNum, but I don't know N. Right now I just spin up X threads, and each loops until it gets no more data. The problem is that they tend to overlap and I end up with duplicate data. I need unique data and not to waste requests.
Right now I have code that looks something like this. This is one of many attempts, and I see why it's wrong, but it's better to show something. The goal is to collect pages of data from a web service in parallel:
int limit = pageSize;
data = new List<RequestStuff>();
List<Task> tasks = new List<Task>();
for (int i = 0; i < numThreads; i++)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        try
        {
            List<RequestStuff> someData;
            do
            {
                int start;
                lock (myLock)
                {
                    start = data.Count;
                }
                someData = GetDataFromService(start, limit);
                lock (myLock)
                {
                    if (someData != null && someData.Count > 0)
                    {
                        data.AddRange(someData);
                    }
                }
            } while (hasData); // hasData is never assigned; part of what's wrong
        }
        catch (AggregateException ex)
        {
            // Exception things
        }
    }));
}
Task.WaitAll(tasks.ToArray());
Any inspiration to solve this without race conditions? I need to stick to .NET 4 if that matters.
I'm not sure there's a way to do this without wasting some requests, unless you know the actual limit. The code below might help eliminate the duplicate data, as you will only query each index once:
private int _index = -1; // -1 so the first request starts at 0
private bool _shouldContinue = true;

public IEnumerable<RequestStuff> GetAllData()
{
    var tasks = new List<Task<RequestStuff>>();
    while (_shouldContinue)
    {
        // tasks must be started, or WaitAll below never returns;
        // in practice you would also throttle how many are in flight
        tasks.Add(Task.Factory.StartNew(() => GetDataFromService(GetNextIndex())));
    }
    Task.WaitAll(tasks.ToArray());
    return tasks.Select(t => t.Result).ToList();
}

private RequestStuff GetDataFromService(int id)
{
    // Get the data.
    // If there's no data returned, set _shouldContinue to false.
    // Return the RequestStuff.
}

private int GetNextIndex()
{
    return Interlocked.Increment(ref _index);
}
It could also be improved by adding cancellation tokens to cancel any requests you know to be wasteful, i.e., if index 4 returns nothing you can cancel all queries on indexes above 4 that are still active.
Or, if you could make a reasonable guess at the maximum index, you might be able to implement an algorithm to pinpoint the exact limit before retrieving any data. This would probably only be more efficient if your guess was fairly accurate, though.
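A bounded variant of the same shared-index idea uses a fixed number of workers, which avoids creating unbounded tasks. This is only a sketch using .NET 4 APIs; numThreads, pageSize, and GetDataFromService(start, limit) are from the question, while nextPage, done, and results are hypothetical names:
// Each worker claims the next unclaimed page with Interlocked.Increment,
// so no two workers ever fetch the same page.
int nextPage = -1;
var done = new CancellationTokenSource(); // signalled when a page comes back empty
var results = new ConcurrentBag<RequestStuff>();

Task[] workers = new Task[numThreads];
for (int i = 0; i < numThreads; i++)
{
    workers[i] = Task.Factory.StartNew(() =>
    {
        while (!done.IsCancellationRequested)
        {
            int page = Interlocked.Increment(ref nextPage); // claim a unique page
            List<RequestStuff> chunk = GetDataFromService(page * pageSize, pageSize);
            if (chunk == null || chunk.Count == 0)
            {
                done.Cancel(); // past the end of the data; all workers wind down
                break;
            }
            foreach (RequestStuff item in chunk)
                results.Add(item);
        }
    });
}
Task.WaitAll(workers);
At worst each worker wastes one request past the end of the data, which matches the observation above that some waste is unavoidable when the total count is unknown.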
Are you attempting to force parallelism on the part of the remote service by issuing multiple concurrent requests? Paging is generally used to limit the amount of data returned to only what is needed, but if you need all of the data, then attempting to page it first and then reconstruct it later seems like a poor design. Your code becomes needlessly complex and difficult to maintain, you'll likely just move the bottleneck from code you control to somewhere else, and you've now introduced data integrity issues (what happens if these threads access different versions of the data you are trying to query?). By increasing the complexity and the number of calls, you are also increasing the likelihood of problems occurring (e.g. one of the connections gets dropped).
Can you state the problem you are attempting to solve, so that perhaps we can help architect a better solution instead?
After doing some research, I'm asking for any feedback on how to effectively remove two items at a time from a concurrent collection. My situation involves incoming messages over UDP which are currently being placed into a BlockingCollection. Once there are two Users in the collection, I need to safely take two of them and process them. I've seen several different techniques, including some ideas listed below. My current implementation is below, but I'm thinking there's a cleaner way to do this while ensuring that Users are processed in groups of two. That's the only restriction in this scenario.
Current Implementation:
private int userQueueCount = 0;
public BlockingCollection<User> UserQueue = new BlockingCollection<User>();

public void JoinQueue(User u)
{
    UserQueue.Add(u);
    Interlocked.Increment(ref userQueueCount);
    if (userQueueCount > 1)
    {
        IEnumerable<User> users = UserQueue.Take(2);
        if (users.Count() == 2)
        {
            Interlocked.Decrement(ref userQueueCount);
            Interlocked.Decrement(ref userQueueCount);
            // ... do some work with the users, but if only one
            // is removed I'll run into problems
        }
    }
}
What I would like to do is something like this, but I cannot currently test it in a production situation to ensure integrity:
Parallel.ForEach(UserQueue.Take(2), (u) => { ... });
Or better yet:
public void JoinQueue(User u)
{
    UserQueue.Add(u);
    // if needed? increment
    Interlocked.Increment(ref userQueueCount);
    UserQueue.CompleteAdding();
}
Then implement this somewhere:
Task.Factory.StartNew(() =>
{
    while (userQueueCount > 1) // or (UserQueue.Count > 1), if that's safe?
    {
        IEnumerable<User> users = UserQueue.Take(2);
        // ... do stuff
    }
});
The problem with this is that I'm not sure I can guarantee that between checking the condition (Count > 1) and calling Take(2), the UserQueue still has at least two items to process. Incoming UDP messages are processed in parallel, so I need a way to safely pull items off the blocking/concurrent collection in pairs of two.
Is there a better/safer way to do this?
Revised Comments:
The intended goal of this question is really just to achieve a stable, thread-safe method of processing items off a concurrent collection in .NET 4.0. It doesn't have to be pretty; it just has to be stable at processing items in unordered pairs of two in a parallel environment.
Here is what I'd do, in rough code:
ConcurrentQueue<User> gameQueue = new ConcurrentQueue<User>(); // a BlockingCollection works too (it's just a blocking ConcurrentQueue by default anyway)

public void OnUserStartedGame(User joiningUser)
{
    User waitingUser;
    if (this.gameQueue.TryDequeue(out waitingUser)) // if there's someone waiting, we'll get him
        this.MatchUsers(waitingUser, joiningUser);
    else
        this.QueueUser(joiningUser); // it doesn't matter if someone else has entered the queue by now; we're using a queue and it will sort itself out
}

private void QueueUser(User user)
{
    this.gameQueue.Enqueue(user);
}

private void MatchUsers(User first, User second)
{
    // not sure what you do here
}
The basic idea is that if someone wants to start a game and there's someone in your queue, you match them and start a game; if there's no one, you add them to the queue.
At best you'll only have one user in the queue at a time, but if not, well, that's not too bad either, because as other users start games the waiting ones will gradually be removed, and no new ones will be added until the queue is empty again.
If I couldn't put pairs of users into the collection for some reason, I would use a ConcurrentQueue and TryDequeue two items at a time; if I can only get one, I put it back. Wait as necessary.
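A sketch of that dequeue-two-or-put-back idea (hypothetical names; note the put-back loses strict FIFO order, which is acceptable here since the question only needs unordered pairs):
private readonly ConcurrentQueue<User> queue = new ConcurrentQueue<User>();

public bool TryTakePair(out User first, out User second)
{
    second = null;
    if (!queue.TryDequeue(out first))
        return false; // nobody waiting at all
    if (!queue.TryDequeue(out second))
    {
        queue.Enqueue(first); // only one available: put him back and retry later
        first = null;
        return false;
    }
    return true;
}
With several concurrent consumers you would still guard this with a consumer-side lock, otherwise two consumers can each grab one user and both put theirs back.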
I think the easiest solution here is to use locking: you will have one lock for all consumers (producers won't use any locks), which will make sure you always take the users in the correct order:
User firstUser;
User secondUser;
lock (consumerLock)
{
    firstUser = userQueue.Take();
    secondUser = userQueue.Take();
}
Process(firstUser, secondUser);
Another option would be to have two queues: one for single users and one for pairs of users, and a process that transfers them from the first queue to the second one.
If you don't mind wasting another thread, you can do this with two BlockingCollections:
while (true)
{
    var firstUser = incomingUsers.Take();
    var secondUser = incomingUsers.Take();
    userPairs.Add(Tuple.Create(firstUser, secondUser));
}
You don't have to worry about locking here, because the queue for single users will have only one consumer, and the consumers of pairs can now use simple Take() safely.
If you do care about wasting a thread and can use TPL Dataflow, you can use BatchBlock<T>, which combines incoming items into batches of n items, where n is configured at the time of creation of the block, so you can set it to 2.
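A sketch of the BatchBlock approach (System.Threading.Tasks.Dataflow is a separate package; MatchUsers is borrowed from the earlier answer, and the consumer loop here is just one possible wiring):
// BatchBlock groups posted items into arrays of the configured size.
var pairs = new BatchBlock<User>(2);

// Producer side, e.g. in the UDP handler:
//   pairs.Post(user);

// Consumer side: Receive() blocks until a complete pair is available.
Task.Factory.StartNew(() =>
{
    while (true)
    {
        User[] pair = pairs.Receive(); // BatchBlock<User> outputs User[]
        MatchUsers(pair[0], pair[1]);
    }
});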
Maybe this can help:
public static IList<T> TakeMulti<T>(this BlockingCollection<T> me, int count = 100) where T : class
{
    T last = null;
    if (me.Count == 0)
    {
        last = me.Take(); // blocks when the queue is empty
    }
    var result = new List<T>(count);
    if (last != null)
    {
        result.Add(last);
    }
    // if you want to take more items at this time:
    //if (me.Count < count / 2)
    //{
    //    Thread.Sleep(1000);
    //}
    while (me.Count > 0 && result.Count < count) // '<' so no more than 'count' items are returned
    {
        result.Add(me.Take());
    }
    return result;
}
I wanted to parallelize a piece of code, but the code actually got slower, probably because of the overhead of Barrier and BlockingCollection. There would be two threads, where the first finds pieces of work which the second one operates on. Neither operation is much work, so the overhead of a safe handoff quickly outweighs the benefit of the two threads.
So I thought I would try to write some code myself to be as lean as possible, without using Barrier etc. It does not behave consistently, however. Sometimes it works, sometimes it doesn't, and I can't figure out why.
This code is just the mechanism I use to try to synchronize the two threads. It doesn't do anything useful; it's just the minimum amount of code needed to reproduce the bug.
So here's the code:
// node in linkedlist of work elements
class WorkItem {
    public int Value;
    public WorkItem Next;
}

static void Test() {
    WorkItem fst = null; // first element

    Action create = () => {
        WorkItem cur = null;
        for (int i = 0; i < 1000; i++) {
            WorkItem tmp = new WorkItem { Value = i }; // create new comm class
            if (fst == null) fst = tmp; // if it's the first add it there
            else cur.Next = tmp;        // else add to back of list
            cur = tmp;                  // this is the current one
        }
        cur.Next = new WorkItem { Value = -1 }; // -1 means stop element
#if VERBOSE
        Console.WriteLine("Create is done");
#endif
    };

    Action consume = () => {
        //Thread.Sleep(1); // this also seems to cure it
#if VERBOSE
        Console.WriteLine("Consume starts"); // especially this one seems to matter
#endif
        WorkItem cur = null;
        int tot = 0;
        while (fst == null) { } // busy wait for first one
        cur = fst;
#if VERBOSE
        Console.WriteLine("Consume found first");
#endif
        while (true) {
            if (cur.Value == -1) break; // if stop element break
            tot += cur.Value;
            while (cur.Next == null) { } // busy wait for next to be set
            cur = cur.Next; // move to next
        }
        Console.WriteLine(tot);
    };

    try { Parallel.Invoke(create, consume); }
    catch (AggregateException e) {
        Console.WriteLine(e.Message);
        foreach (var ie in e.InnerExceptions) Console.WriteLine(ie.Message);
    }
    Console.WriteLine("Consume done..");
    Console.ReadKey();
}
The idea is to have a linked list of work items. One thread adds items to the back of that list, and another thread reads them, does something, and polls the Next field to see if it is set. As soon as it is set, it moves to the next item and processes it. It polls the Next field in a tight busy loop, because it should be set very quickly. Going to sleep, context switching, etc. would kill the benefit of parallelizing the code.
The time it takes to create a work item is quite comparable to the time it takes to execute it, so the cycles wasted should be quite small.
When I run the code in release mode, sometimes it works, and sometimes it does nothing. The problem seems to be in the 'consume' thread; the 'create' thread always seems to finish. (You can check by fiddling with the Console.WriteLines.)
It has always worked in debug mode. In release it's about 50% hit and miss. Adding a few Console.WriteLines helps the success ratio, but even then it's not 100%. (The #define VERBOSE stuff.)
When I add the Thread.Sleep(1) in the 'consume' thread it also seems to be fixed. But not being able to reproduce a bug is not the same thing as knowing for sure it's fixed.
Does anyone here have a clue as to what goes wrong here? Is it some optimization that creates a local copy, or something that doesn't get updated? Something like that?
There's no such thing as a partial update, right? Like a data race, but where one thread is half done writing and the other thread reads the partially written memory? Just checking...
Looking at it, I think it should just work. I guess every few runs the threads arrive in a different order and that makes it fail, but I don't get how. And how could I fix this without slowing it down?
Thanks in advance for any tips,
Gert-Jan
I do my damn best to avoid the utter minefield of closure/stack interaction at all costs.
This is PROBABLY a (language-level) race condition, but without reflecting Parallel.Invoke I can't be sure. Basically, sometimes fst is being changed by create() and sometimes not. Ideally, it should NEVER be changed (if C# had good closure behaviour). It could come down to which threads Parallel.Invoke chooses to run create() and consume() on. If create() runs on the main thread, it might change fst before consume() takes a copy of it. Or create() might be running on a separate thread and operating on a copy of fst. Basically, as much as I love C#, it is an utter pain in this regard, so just work around it and treat all variables involved in a closure as immutable.
To get it working:
// Replace
WorkItem fst = null;
// with
WorkItem fst = WorkItem.GetSpecialBlankFirstItem();

// And replace
if (fst == null) fst = tmp;
// with
if (fst.Next == null) fst.Next = tmp;
A thread is allowed by the spec to cache a value indefinitely.
See "Can a C# thread really cache a value and ignore changes to that value on other threads?" and also http://www.yoda.arachsys.com/csharp/threads/volatility.shtml
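Concretely, the usual fix for the code in this question is to make the cross-thread fields volatile, so the busy-wait loops must re-read memory on every iteration instead of reusing a cached value. A sketch; the Channel wrapper is hypothetical and stands in for the captured local fst, since locals can't be declared volatile:
// node in a linked list of work elements, with volatile cross-thread fields
class WorkItem {
    public int Value;
    public volatile WorkItem Next; // the consumer busy-waits on this
}

// hypothetical wrapper replacing the captured local 'fst'
class Channel {
    public volatile WorkItem First;
}

// consumer side: every read of First/Next is now a volatile read,
// so the JIT may not hoist it out of the busy-wait loops
static int Consume(Channel channel) {
    while (channel.First == null) { } // guaranteed to observe the producer's write
    WorkItem cur = channel.First;
    int tot = 0;
    while (cur.Value != -1) {
        tot += cur.Value;
        while (cur.Next == null) { } // re-read on each iteration
        cur = cur.Next;
    }
    return tot;
}
Alternatives with the same effect are Thread.VolatileRead/Thread.VolatileWrite or explicit Thread.MemoryBarrier() calls around the shared reads and writes.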