After doing some research, I'm asking for feedback on how to effectively remove two items at a time from a concurrent collection. My situation involves incoming messages over UDP which are currently being placed into a BlockingCollection. Once there are two Users in the collection, I need to safely Take two users and process them. I've seen several different techniques, including some ideas listed below. My current implementation is below, but I'm thinking there's a cleaner way to do this while ensuring that users are processed in groups of two. That's the only restriction in this scenario.
Current Implementation:
private int userQueueCount = 0;
public BlockingCollection<User> UserQueue = new BlockingCollection<User>();

public void JoinQueue(User u)
{
    UserQueue.Add(u);
    Interlocked.Increment(ref userQueueCount);
    if (userQueueCount > 1)
    {
        IEnumerable<User> users = UserQueue.Take(2);
        if (users.Count() == 2)
        {
            Interlocked.Decrement(ref userQueueCount);
            Interlocked.Decrement(ref userQueueCount);
            // ... do some work with the users, but if only one
            // is removed I'll run into problems
        }
    }
}
What I would like to do is something like this but I cannot currently test this in a production situation to ensure integrity.
Parallel.ForEach(UserQueue.Take(2), (u) => { ... });
Or better yet:
public void JoinQueue(User u)
{
    UserQueue.Add(u);
    Interlocked.Increment(ref userQueueCount); // if needed?
    UserQueue.CompleteAdding();
}
Then implement this somewhere:
Task.Factory.StartNew(() =>
{
    while (userQueueCount > 1) // or (UserQueue.Count > 1), if that's safe?
    {
        IEnumerable<User> users = UserQueue.Take(2);
        // ... do stuff
    }
});
The problem with this is that I'm not sure I can guarantee that, between the condition (Count > 1) and the Take(2), the UserQueue still has at least two items to process. Incoming UDP messages are processed in parallel, so I need a way to safely pull items off of the blocking/concurrent collection in pairs of two.
Is there a better/safer way to do this?
Revised Comments:
The intended goal of this question is really just to achieve a stable, thread-safe method of processing items off of a concurrent collection in .NET 4.0. It doesn't have to be pretty; it just has to be stable in the task of processing items in unordered pairs of two in a parallel environment.
Here is what I'd do, in rough code:
ConcurrentQueue<User> gameQueue = new ConcurrentQueue<User>(); // could use a BlockingCollection too (it's just a blocking ConcurrentQueue by default anyway)
public void OnUserStartedGame(User joiningUser)
{
    User waitingUser;
    if (this.gameQueue.TryDequeue(out waitingUser)) // if there's someone waiting, we'll get him
        this.MatchUsers(waitingUser, joiningUser);
    else
        this.QueueUser(joiningUser); // it doesn't matter if there's already someone in the queue by now because, well, we are using a queue and it will sort itself out
}

private void QueueUser(User user)
{
    this.gameQueue.Enqueue(user);
}

private void MatchUsers(User first, User second)
{
    // not sure what you do here
}
The basic idea is that if someone wants to start a game and there's someone in your queue, you match them and start a game; if there's no one, add them to the queue.
At best you'll only have one user in the queue at a time, but if not, that's not too bad either, because as other users start games the waiting ones will gradually be removed and no new ones added until the queue is empty again.
If I could not put pairs of users into the collection for some reason, I would use a ConcurrentQueue and try to TryDequeue two items at a time; if I can get only one, put it back. Wait as necessary.
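A rough sketch of that approach (the names are placeholders; note that putting the lone user back sends him to the back of the queue, which is fine here since the pairs are unordered):
// Rough sketch of the "dequeue two, or put the one back" idea; names are
// placeholders. Re-enqueueing the lone user puts him at the back of the queue,
// which is acceptable because the pairs are unordered.
private readonly ConcurrentQueue<User> waitingUsers = new ConcurrentQueue<User>();

private bool TryMatchPair()
{
    User first, second;
    if (!waitingUsers.TryDequeue(out first))
        return false;                    // nobody waiting

    if (!waitingUsers.TryDequeue(out second))
    {
        waitingUsers.Enqueue(first);     // no partner yet, put him back and wait
        return false;
    }

    MatchUsers(first, second);           // hypothetical pairing method
    return true;
}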
I think the easiest solution here is to use locking: you will have one lock for all consumers (producers won't use any locks), which will make sure you always take the users in the correct order:
User firstUser;
User secondUser;

lock (consumerLock)
{
    firstUser = userQueue.Take();
    secondUser = userQueue.Take();
}

Process(firstUser, secondUser);
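Fleshed out a little, the consumer side might look like the sketch below (the field names and the Process method are placeholders; producers just call userQueue.Add with no locking):
// Sketch of a full consumer around the snippet above (names are placeholders).
// Producers only ever call userQueue.Add(user); consumerLock serializes the two
// Take() calls so a pair can never be split between competing consumers.
private readonly BlockingCollection<User> userQueue = new BlockingCollection<User>();
private readonly object consumerLock = new object();

private void ConsumePairs()
{
    while (true)
    {
        User firstUser, secondUser;
        lock (consumerLock)
        {
            firstUser = userQueue.Take();   // blocks until a user arrives
            secondUser = userQueue.Take();  // blocks until a partner arrives
        }
        Process(firstUser, secondUser);     // pairing logic goes here
    }
}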
Another option would be to have two queues: one for single users and one for pairs of users, and a process that transfers them from the first queue to the second one.
If you don't mind wasting another thread, you can do this with two BlockingCollections:
while (true)
{
    var firstUser = incomingUsers.Take();
    var secondUser = incomingUsers.Take();
    userPairs.Add(Tuple.Create(firstUser, secondUser));
}
You don't have to worry about locking here, because the queue for single users will have only one consumer, and the consumers of pairs can now use simple Take() safely.
If you do care about wasting a thread and can use TPL Dataflow, you can use BatchBlock<T>, which combines incoming items into batches of n items, where n is configured at the time of creation of the block, so you can set it to 2.
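A minimal sketch of that approach (assuming the TPL Dataflow package, System.Threading.Tasks.Dataflow, is referenced; the User type and the pairing method are placeholders taken from the earlier snippets):
// Minimal sketch of the BatchBlock approach. BatchBlock<User>(2) groups posted
// users into arrays of exactly two, which flow into an ActionBlock for pairing.
using System.Threading.Tasks.Dataflow;

public class PairMatcher
{
    private readonly BatchBlock<User> batchBlock = new BatchBlock<User>(2);
    private readonly ActionBlock<User[]> processBlock;

    public PairMatcher()
    {
        processBlock = new ActionBlock<User[]>(pair => MatchUsers(pair[0], pair[1]));
        batchBlock.LinkTo(processBlock); // every completed batch flows straight into processing
    }

    public void JoinQueue(User u)
    {
        batchBlock.Post(u); // thread-safe; can be called directly from the UDP handlers
    }

    private void MatchUsers(User first, User second)
    {
        // pairing logic goes here
    }
}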
Maybe this can help:
public static IList<T> TakeMulti<T>(this BlockingCollection<T> me, int count = 100) where T : class
{
    T last = null;
    if (me.Count == 0)
    {
        last = me.Take(); // blocks when the queue is empty
    }

    var result = new List<T>(count);
    if (last != null)
    {
        result.Add(last);
    }

    // If you want to wait for more items to arrive before returning:
    //if (me.Count < count / 2)
    //{
    //    Thread.Sleep(1000);
    //}

    while (me.Count > 0 && result.Count < count)
    {
        result.Add(me.Take());
    }

    return result;
}
Related
I have two methods as below:
private void MethodB_GetId()
{
    // Calling MethodA_GetAll continuously on different threads,
    // let's say for Id = 1 to 100.
}

private void MethodA_GetAll()
{
    List<string> lst;
    lock (_locker)
    {
        lst = SomeService.Get(); // this Get returns all 100 ids in one shot
        // Some other processing, and then return the result.
    }
}
Now the client is calling MethodB_GetId continuously, fetching data for ids 1 to 100 at random. (It requires some of the data from these 100 ids, not all of it.)
MethodA_GetAll gets all the data from the network (maybe a cache or a database) in one shot and returns the whole collection to method B, which then extracts the records it is interested in.
Now if MethodA_GetAll calls GetAll() multiple times and fetches the same records each time, that is useless, so I can put a lock around it: while one thread is fetching the records, the others will be blocked.
Say MethodA_GetAll is called for Id = 1 and acquires the lock; all the others are then waiting for the lock to be released.
What I want is that once the data is available from any one thread, the other threads just don't make the call again.
Solution options:
1. Make the list global to that class and thread-safe. (I don't have that option.)
I somehow need thread 1 to tell all the other threads: I already have the records, don't go fetching them again.
something like:
lock (_locker && lst != null) // not valid; lst is local to every thread here
{
    // only fetch the records if this condition is satisfied
}
Please excuse me for framing the question poorly; I posted this in a bit of a hurry.
It sounds like you want to create a thread-safe cache. One way to do this is to use Lazy<T>.
Here's an example for a cache of type List<string>:
public sealed class DataProvider
{
    public DataProvider()
    {
        _cache = new Lazy<List<string>>(createCache);
    }

    public void DoSomethingThatNeedsCachedList()
    {
        var list = _cache.Value;

        // Do something with list.
        Console.WriteLine(list[10]);
    }

    readonly Lazy<List<string>> _cache;

    List<string> createCache()
    {
        // Dummy implementation.
        return Enumerable.Range(1, 100).Select(x => x.ToString()).ToList();
    }
}
When you need to access the cached value, you just access _cache.Value. If it hasn't yet been created, then the method you passed to the Lazy<T>'s constructor will be called to initialise it. In the example above, this is the createCache() method.
This is done in a threadsafe manner, so that if two threads try to access the cached value simultaneously when it hasn't been created yet, one of the threads will actually end up calling createCache() and the other thread will be blocked until the cached value has been initialised.
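For illustration, a hypothetical caller can hit the provider from many threads at once and createCache() will still only run once:
// Illustration only: even with many threads touching the provider concurrently,
// Lazy<T>'s default thread-safety mode guarantees createCache() runs exactly once.
var provider = new DataProvider();
Parallel.For(0, 20, i => provider.DoSomethingThatNeedsCachedList());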
You can try double-checked locking on lst:
private List<string> lst;

private void MethodA_GetAll()
{
    if (lst == null)
    {
        lock (_locker)
        {
            if (lst == null)
            {
                // do your thing
            }
        }
    }
}
So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently I'm saying, if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();

Parallel.ForEach(inputrecords, input =>
{
    lock (StackLock)
    {
        if (OutRecs.Count >= 50000)
        {
            Save(OutRecs);
            OutRecs.Clear();
        }
    }
    OutRecs.Push(CreateOutputRecord(input));
});

if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to the output collection? If not, is there a better way to do this?
Your lock will work correctly, but it will not be very efficient because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so performing a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
    // collection of objects to iterate over
    inputrecords,
    // delegate to initialize thread-local data
    () => new List<OutRecord>(),
    // body of loop
    (inputrecord, loopstate, localstorage) =>
    {
        localstorage.Add(CreateOutputRecord(inputrecord));
        if (localstorage.Count > 1000)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
        return localstorage;
    },
    // finally block gets executed after each thread exits
    localstorage =>
    {
        if (localstorage.Count > 0)
        {
            // Save() must be thread-safe, or you'll need to wrap it in a lock
            Save(localstorage);
            localstorage.Clear();
        }
    });
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name
{
    void WriteRecord(T record);
    void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there's too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private bool _flushing;

    protected BufferedRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity && !_flushing)
            Flush();
    }

    public void Flush()
    {
        _flushing = true;
        try
        {
            var recordsToWrite = new List<T>();
            while (_buffer.TryDequeue(out T dequeued))
            {
                recordsToWrite.Add(dequeued);
            }
            if (recordsToWrite.Any())
                WriteRecords(recordsToWrite);
        }
        finally
        {
            _flushing = false;
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, Flush can keep dequeuing records even as new ones are being added.
That Flush method could be anything specific to how you write your records. Instead of this being an abstract class the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
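For example, a concrete writer might be as small as the sketch below (ConsoleRecordWriter is a made-up name, just to show how the abstract class gets completed; a real implementation would write to a database or file instead of the console):
// Made-up example of completing the abstract class above.
public class ConsoleRecordWriter<T> : BufferedRecordWriter<T>
{
    public ConsoleRecordWriter(int maxCapacity = 100) : base(maxCapacity) { }

    protected override void WriteRecords(IEnumerable<T> records)
    {
        foreach (var record in records)
            Console.WriteLine(record);
    }
}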
The original post contained a problem I managed to solve, but my solution introduced a lot of issues with shared mutable state. Now I'm wondering if it can be done in a purely functional way.
Requests can be processed in a certain order.
For each order i there is an effectiveness E(i)
Processing requests should follow three conditions:
There should be no delay between acquiring the first request and processing it
There should be no delay between processing some request and processing next request
When there are several orders of processing requests, the one with highest effectiveness should be chosen
Concrete example:
For an infinite list of integers, print them so that prime numbers generally come earlier than non-prime numbers.
The effectiveness of an ordering is inversely related to the number of times we had primes in the queue but printed a non-prime.
My first solution in C# (not for primes, obviously) used some classes with shared mutable state in the form of a concurrent priority queue. It was ugly, because I had to manually subscribe classes to events and unsubscribe them, check that the queue was not exhausted by one intermediate consumer before another consumer processed it, and so on.
To refactor it, I chose the Reactive Extensions library, which seemed to address the issues with state. I then realized that I couldn't use it in the following case:
The source function accepts nothing and returns IObservable<Request>
The process function accepts IObservable<Request> and returns nothing
I have to write a reorder function, which reorders requests on their way from source to process.
Internally reorder has a ConcurrentPriorityQueue of orders. It should handle two scenarios:
When process is busy processing, reorder finds better orderings and updates the queue
When process requests a new item, reorder returns the first element from the queue
The problem was that if reorder returned IObservable<Request>, it was unaware of whether items had been requested from it or not.
If reorder called OnNext immediately upon receiving an item, it did not reorder anything and violated condition 3.
If it ensured that it had found the best ordering, it violated conditions 1 and 2, because process could become idle.
If reorder returned ISubject<Request>, it exposed the option to call OnError and OnCompleted to the consumer.
If reorder returned the queue itself, I would have been back where I started.
The problem was that cold Observable.Create was not lazy enough. It started exhausting the queue with all the requests as soon as a subscription was made, but only the results of the first ones were used.
The solution I came up with is to return an observable of requests, i.e. IObservable<Func<Task<int>>> instead of IObservable<int>.
It works when there is only one subscriber, but if more requests are consumed than there are numbers generated by source, they will be awaited forever.
This issue can probably be solved by introducing caching, but then a consumer that drained the queue quickly would have side effects on all the other consumers, because it would freeze the queue in a less effective ordering than it would reach after some waiting.
So I will post a solution to the original question, but it's not really a valuable answer, because it introduces a lot of problems.
This demonstrates why functional reactive programming and side effects don't mix well. On the other hand, it seems I now have an example of a practical problem that is impossible to solve in a purely functional way. Or do I? If the Order function accepted optimizationLevel as a parameter, it would be pure. Can we somehow implicitly convert time to optimizationLevel to make this pure as well?
I'd like to see such a solution very much. In C# or any other language.
Problematic solution. Uses ConcurrentPriorityQueue from this repo.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Reactive.Linq;
using DataStructures;
using System.Threading;

namespace LazyObservable
{
    class Program
    {
        /// <summary>
        /// Compares tuples by second element, then by first in reverse.
        /// </summary>
        class PriorityComparer<TElement, TPriority> : IComparer<Tuple<TElement, TPriority>>
            where TPriority : IComparable<TPriority>
        {
            Func<TElement, TElement, int> fallbackComparer;

            public PriorityComparer(IComparer<TElement> comparer = null)
            {
                if (comparer != null)
                {
                    fallbackComparer = comparer.Compare;
                }
                else if (typeof(IComparable<TElement>).IsAssignableFrom(typeof(TElement))
                    || typeof(IComparable).IsAssignableFrom(typeof(TElement)))
                {
                    fallbackComparer = (a, b) => -Comparer<TElement>.Default.Compare(a, b);
                }
                else
                {
                    fallbackComparer = (_1, _2) => 0;
                }
            }

            public int Compare(Tuple<TElement, TPriority> x, Tuple<TElement, TPriority> y)
            {
                if (x == null && y == null)
                {
                    return 0;
                }
                if (x == null || y == null)
                {
                    return x == null ? -1 : 1;
                }
                int res = x.Item2.CompareTo(y.Item2);
                if (res == 0)
                {
                    res = fallbackComparer(x.Item1, y.Item1);
                }
                return res;
            }
        }

        const int N = 100;

        static IObservable<int> Source()
        {
            return Observable.Interval(TimeSpan.FromMilliseconds(1))
                .Select(x => (int)x)
                .Where(x => x <= 100);
        }

        static bool IsPrime(int x)
        {
            if (x <= 1)
            {
                return false;
            }
            if (x == 2)
            {
                return true;
            }
            int limit = ((int)Math.Sqrt(x)) + 1;
            for (int i = 2; i < limit; ++i)
            {
                if (x % i == 0)
                {
                    return false;
                }
            }
            return true;
        }

        static IObservable<Func<Task<int>>> Order(IObservable<int> numbers)
        {
            ConcurrentPriorityQueue<Tuple<int, int>> queue =
                new ConcurrentPriorityQueue<Tuple<int, int>>(new PriorityComparer<int, int>());

            numbers.Subscribe(x =>
            {
                queue.Add(new Tuple<int, int>(x, 0));
            });

            numbers
                .ForEachAsync(x =>
                {
                    Console.WriteLine("Testing {0}", x);
                    if (IsPrime(x))
                    {
                        if (queue.Remove(new Tuple<int, int>(x, 0)))
                        {
                            Console.WriteLine("Accelerated {0}", x);
                            queue.Add(new Tuple<int, int>(x, 1));
                        }
                    }
                });

            Func<Task<int>> requestElement = async () =>
            {
                while (queue.Count == 0)
                {
                    await Task.Delay(30);
                }
                return queue.Take().Item1;
            };

            return numbers.Select(_ => requestElement);
        }

        static void Process(IObservable<Func<Task<int>>> numbers)
        {
            numbers
                .Subscribe(async x =>
                {
                    await Task.Delay(1000);
                    Console.WriteLine(await x());
                });
        }

        static void Main(string[] args)
        {
            Console.WriteLine("init");
            Process(Order(Source()));
            //Process(Source());
            Console.WriteLine("called");
            Console.ReadLine();
        }
    }
}
To summarize (conceptually):
You have requests that come in irregularly (from source), and a single processor (function process) that can handle them.
The processor should have no downtime.
You're implicitly going to need some sort of queue-ish collection to manage the case where the requests come in faster than the processor can process.
In the event that there are multiple requests queued up, ideally you should order them by some effectiveness function; however, the re-ordering shouldn't be the cause of downtime (function reorder).
Is all this correct?
Assuming it is: the source being of type IObservable<Request> sounds fine. reorder, though, sounds like it should really return an IEnumerable<Request>: process wants to work on a pull basis. It wants to pull the highest-priority request once it frees up, and wait for the next request if the queue is empty, but start immediately. That sounds like a job for IEnumerable, not IObservable.
public IObservable<Request> requester();
public IEnumerable<Request> reorder(IObservable<Request> requester);
public void process(IEnumerable<Request> requestEnumerable);
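A rough sketch of that shape, assuming a thread-safe priority queue with Add/Take (such as the ConcurrentPriorityQueue used in the question) and ignoring completion, errors, and unsubscription:
// Rough sketch only: the ConcurrentPriorityQueue constructor and Add/Take
// methods are assumed from the repo linked in the question. The iterator
// blocks on the semaphore, so process only waits when nothing is queued.
public IEnumerable<Request> Reorder(IObservable<Request> requests)
{
    var queue = new ConcurrentPriorityQueue<Request>();
    var available = new SemaphoreSlim(0);

    requests.Subscribe(r =>
    {
        queue.Add(r);
        available.Release();      // signal the (single) consumer
    });

    while (true)
    {
        available.Wait();          // blocks only while the queue is empty
        yield return queue.Take(); // hands out the best request known so far
    }
}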
I have a web service I need to query and it takes a value that supports pagination for its data. Due to the amount of data I need to fetch and how that service is implemented I intended to do a series of concurrent http web requests to accumulate this data.
Say I have a number of threads and a page size; how can I have each thread pick a starting point that doesn't overlap with the other threads? It's been a long time since I took parallel programming and I'm floundering a bit. I know I could find my start point with something like start = N / numThreads * threadNum, however I don't know N. Right now I just spin up X threads and each loops until it gets no more data. The problem is that they tend to overlap and I end up with duplicate data. I need unique data and not to waste requests.
Right now I have code that looks something like this. This is one of many attempts, and I see why this is wrong, but it's better to show something. The goal is to collect pages of data from a web service in parallel:
int limit = pageSize;
data = new List<RequestStuff>();
List<Task> tasks = new List<Task>();

for (int i = 0; i < numThreads; i++)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        try
        {
            List<RequestStuff> someData;
            do
            {
                int start;
                lock (myLock)
                {
                    start = data.Count;
                }
                someData = GetDataFromService(start, limit);
                lock (myLock)
                {
                    if (someData != null && someData.Count > 0)
                    {
                        data.AddRange(someData);
                    }
                }
            } while (hasData);
        }
        catch (AggregateException ex)
        {
            // Exception things
        }
    }));
}

Task.WaitAll(tasks.ToArray());
Any inspiration to solve this without race conditions? I need to stick to .NET 4 if that matters.
I'm not sure there's a way to do this without wasting some requests unless you know the actual limit. The code below might help eliminate the duplicate data, as you will only query each index once:
private int _index = -1; // -1 so the first request starts at 0
private bool _shouldContinue = true;

public IEnumerable<RequestStuff> GetAllData()
{
    var tasks = new List<Task<RequestStuff>>();
    while (_shouldContinue)
    {
        tasks.Add(Task<RequestStuff>.Factory.StartNew(() => GetDataFromService(GetNextIndex())));
    }
    Task.WaitAll(tasks.ToArray());
    return tasks.Select(t => t.Result).ToList();
}

private RequestStuff GetDataFromService(int id)
{
    // Get the data.
    // If there's no data returned, set _shouldContinue to false.
    // Return the RequestStuff.
}

private int GetNextIndex()
{
    return Interlocked.Increment(ref _index);
}
It could also be improved by adding cancellation tokens to cancel any indexes you know to be wasteful, i.e., if index 4 returns nothing you can cancel all queries on indexes above 4 that are still active.
Or if you could make a reasonable guess at the max index you might be able to implement an algorithm to pinpoint the exact limit before retrieving any data. This would probably only be more efficient if your guess was fairly accurate though.
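A rough sketch of that idea, using a hypothetical PageHasData(index) probe (e.g. a request for a single record at that offset): grow the candidate index exponentially until a page comes back empty, then binary-search for the last page that has data.
// Rough sketch; PageHasData is a hypothetical cheap probe (e.g. ask for one
// record at that page index and see whether anything comes back).
private int FindLastPageWithData()
{
    // Double the candidate index until we hit a page with no data.
    int upper = 1;
    while (PageHasData(upper))
        upper *= 2;

    // Binary-search between the last known good index and the empty one.
    int lower = upper / 2, lastGood = 0;
    while (lower <= upper)
    {
        int mid = lower + (upper - lower) / 2;
        if (PageHasData(mid))
        {
            lastGood = mid;
            lower = mid + 1;
        }
        else
        {
            upper = mid - 1;
        }
    }
    return lastGood;
}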
Are you attempting to force parallelism on the part of the remote service by issuing multiple concurrent requests? Paging is generally used to limit the amount of data returned to only that which is needed, but if you need all of the data, then attempting to first page and then reconstruct it later seems like a poor design. Your code becomes needlessly complex, difficult to maintain, you'll likely just move the bottleneck from code you control to somewhere else, and now you've introduced data integrity issues (what happens if all of these threads access different versions of the data you are trying to query?). By increasing the complexity and number of calls, you are also increasing the likelihood of problems occurring (eg. one of the connections gets dropped).
Can you state the problem you are attempting to solve so perhaps instead we can help architect a better solution?
So, I've got a new problem...
I'm writing a multithreaded proxy checker in C#.
I'm using BackgroundWorkers to solve the multithreading problem.
But I have problems coordinating and assigning the proxies left in the queue to the running workers. It works most of the time, but sometimes no result comes back, so some proxies get 'lost' during the process.
This list represents the queue and is filled with the ids of the proxies in a ListView.
private List<int> queue = new List<int>();

private int GetNextinQueue()
{
    if (queue.Count > 0)
    {
        lock (lockqueue)
        {
            int temp = queue[0];
            queue.Remove(temp);
            return temp;
        }
    }
    else
        return -1;
}
Above is my method to get the next proxy in the queue. I'm using the lock statement to prevent race conditions, but I am unsure whether it's enough or whether it slows the process down because it makes the other threads wait...
(lockqueue is an object used just for locking)
So my question is: how is it possible that some proxies are not getting checked (even if the ping fails, the check should return something, but sometimes there's just nothing), and how can I optimize this code for performance?
Here's the rest of the code that I consider important for this question: http://pastebin.com/iJduX82b
If something is missing, just write a comment.
Thanks :)
The check for queue.Count should be performed within the lock statement. Otherwise you may check that queue.Count > 0, but by the time you are able to enter the lock, another thread may have removed an item from the queue, and you will then call Remove on a possibly empty queue.
You could modify it to:
private int GetNextinQueue()
{
    lock (lockqueue)
    {
        if (queue.Count > 0)
        {
            int temp = queue[0];
            queue.Remove(temp);
            return temp;
        }
        else
            return -1;
    }
}
Basically, if you want to guard access to a data structure with a lock, make sure that you guard ALL reads and writes to that structure for thread-safety.
A couple of things:
All accesses to the queue field need to go inside a lock (lockqueue) block -- this includes the if (queue.Count > 0) line above. It's not a question of performance: your application won't work if you don't acquire the lock wherever necessary.
From your pastebin, the call to RunWorkerAsync looks suspicious. Currently each BackgroundWorker shares the same arguments array; you need to give each one its own copy.
Try this instead:
private int GetNextinQueue()
{
    int ret = -1;
    lock (queue)
    {
        if (queue.Count > 0)
        {
            int temp = queue[0];
            queue.Remove(temp);
            ret = temp;
        }
    }
    return ret;
}
I wouldn't worry about performance with this - it's true that other threads will block here if one thread holds the lock, but removing an int from a List doesn't take very long.
Also, you don't really need a lockqueue object - since queue is the object you want to lock access to, just use it.
If you're interested in simple elegance, use a queue:
private Queue<int> queue = new Queue<int>();

private int GetNextinQueue()
{
    lock (queue) { return queue.Count > 0 ? queue.Dequeue() : -1; }
}