I have a requirement to read each record (line) of a large data file and then apply various validation rules to each of these lines. Rather than just apply the validations sequentially, I decided to see if I could use some pipelining to help speed things up. I need to apply the same set of business validation rules (5 at the moment) to all items in my collection. As there is no need to return output from each validation process, I don't need to worry about passing values from one validation routine to the next. I do, however, need to make the same data available to all my validation steps, and to do this I came up with copying the same data (records) to 5 different buffers, one for each of the validation stages.
Below is the code I have so far. I have little confidence in this approach and wanted to know if there is a better way of doing this. I appreciate any help you can give. Thanks in advance.
public static void LoadBuffers(List<BlockingCollection<FlattenedLoadDetail>> outputs,
    BlockingCollection<StudentDetail> students)
{
    try
    {
        // GetConsumingEnumerable consumes items as they arrive;
        // enumerating the collection directly only iterates a snapshot.
        foreach (var student in students.GetConsumingEnumerable())
        {
            foreach (var stub in student.RecordYearDetails)
                foreach (var buffer in outputs)
                    buffer.Add(stub);
        }
    }
    finally
    {
        foreach (var buffer in outputs)
            buffer.CompleteAdding();
    }
}
public void Process(BlockingCollection<StudentRecordDetail> StudentRecords)
{
    // Validate header record before proceeding
    if (!IsHeaderRecordValid)
        throw new Exception("Invalid Header Record Found.");

    const int buffersize = 20;
    var buffer1 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var buffer2 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var buffer3 = new BlockingCollection<FlattenedLoadDetail>(buffersize);
    var buffer4 = new BlockingCollection<FlattenedLoadDetail>(buffersize);

    var taskmonitor = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.NotOnCanceled);
    using (var loadUpStartBuffer = taskmonitor.StartNew(() => LoadBuffers(
        new List<BlockingCollection<FlattenedLoadDetail>> { buffer1, buffer2, buffer3, buffer4 },
        StudentRecords)))
    {
        var recordcreateDateValidationStage =
            taskmonitor.StartNew(() => ValidateRecordCreateDateActivity.Validate(buffer1));
        var uniqueStudentIDValidationStage =
            taskmonitor.StartNew(() => ValidateUniqueStudentIDActivity.Validate(buffer2));
        var SSNNumberRangeValidationStage =
            taskmonitor.StartNew(() => ValidateDocSequenceNumberActivity.Validate(buffer3));
        var SSNRecordNumberMatchValidationStage =
            taskmonitor.StartNew(() => ValidateStudentSSNRecordNumberActivity.Validate(buffer4));

        Task.WaitAll(loadUpStartBuffer, recordcreateDateValidationStage, uniqueStudentIDValidationStage,
            SSNNumberRangeValidationStage, SSNRecordNumberMatchValidationStage);
    }
}
In fact, if I could tie the tasks together in such a way that once one fails, all the others stop, that would help me a lot, but I am a newbie to this pattern and am still trying to figure out the best way to handle this problem. Should I just throw caution to the wind and have each of the validation steps load an output buffer to be passed on to a subsequent task? Is that a better way to go?
The first question you need to answer for yourself is whether you want to improve latency or throughput.
The strategy you depicted takes a single item and performs parallel calculations on it. This means that an item is serviced very fast, but at the expense of other items that are left waiting for their turn.
Consider an alternative concurrent approach. You can treat the entire validation process as a sequential operation, but simultaneously service more than one item in parallel.
It seems to me that in your case you will benefit more from the latter approach, especially from the perspective of simplicity, and since I am guessing that latency is not as important here.
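For illustration, here is a minimal sketch of that second approach. It assumes (hypothetically, this is not in your code) that each validation activity exposes a per-item Validate(FlattenedLoadDetail) overload in addition to the buffer-driven one. A nice side effect is that Parallel.ForEach stops scheduling new iterations once any iteration throws, which loosely gives you the "once one fails, all the others stop" behaviour you asked about:
// Sketch only: each item runs through the whole rule set sequentially,
// while many items are validated in parallel on different threads.
Parallel.ForEach(StudentRecords.GetConsumingEnumerable(), student =>
{
    foreach (var stub in student.RecordYearDetails)
    {
        // per-item overloads are assumed here, not taken from the original code
        ValidateRecordCreateDateActivity.Validate(stub);
        ValidateUniqueStudentIDActivity.Validate(stub);
        ValidateDocSequenceNumberActivity.Validate(stub);
        ValidateStudentSSNRecordNumberActivity.Validate(stub);
    }
});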
What I'm trying to accomplish is a C# application that will read logs from the Windows Event Log and store them somewhere else. This has to be fast, since some of the devices where it will be installed generate a high volume of logs per second.
I have tried three approaches so far:
Local WMI: it didn't work well; there are too many errors and exceptions caused by the size of the collections that need to be loaded.
EventLogReader: I thought this was the perfect solution, since it allows you to query the event log however you like using XPath expressions. The problem is that getting the content of the message for each log (by calling FormatDescription()) takes way too much time for long collections.
E.g.: I can read 12k logs in 0.11s if I just iterate over them.
If I add a line to store the message for each log, it takes nearly 6 minutes to complete exactly the same operation, which is totally crazy for such a low number of logs.
I don't know if there's any kind of optimization that can be done to EventLogReader to get the message faster; I couldn't find anything in the MS documentation or on the Internet.
I also found that you can read the log entries using a class called EventLog. However, this approach does not allow you to apply any kind of filter, so you basically have to load the entire list of logs into memory and then filter it according to your needs.
Here's an example:
EventLog eventLog = EventLog.GetEventLogs()
    .FirstOrDefault(el => el.Log.Equals("Security", StringComparison.OrdinalIgnoreCase));
var newEntries = (from entry in eventLog.Entries.OfType<EventLogEntry>()
                  orderby entry.TimeWritten ascending
                  where entry.TimeWritten > takefrom
                  select entry);
Despite being faster in terms of getting the message, memory usage might be high, and I don't want to cause any issues on the devices where this solution will be deployed.
Can anybody help me with this? I cannot find any workarounds or approaches to achieve something like this.
Thank you!
You can give the EventLogReader class a try. See https://learn.microsoft.com/en-us/previous-versions/bb671200(v=vs.90).
It is better than the EventLog class because accessing the EventLog.Entries collection has the nasty property that its count can change while you are reading from it. What is even worse is that the reading happens on an IO threadpool thread, which can crash your application with an unhandled exception. At least that was the case some years ago.
The EventLogReader also gives you the ability to supply a query string to filter for the events you are interested in. That is the way to go if you write a new application.
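For example, something like this should pull only Error-level events (Level 2) from the Application log (a sketch; the XPath filter syntax is the same one Event Viewer's XML filter uses):
var query = new EventLogQuery("Application", PathType.LogName, "*[System/Level=2]");
using (var reader = new EventLogReader(query))
{
    EventRecord ev;
    while ((ev = reader.ReadEvent()) != null)
    {
        Console.WriteLine(ev.FormatDescription());
    }
}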
Here is an application which shows how you can parallelize reading:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Eventing.Reader;
using System.Linq;
using System.Threading.Tasks;

namespace EventLogReading
{
    class Program
    {
        static volatile bool myHasStoppedReading = false;

        static void ParseEventsParallel()
        {
            var sw = Stopwatch.StartNew();
            var query = new EventLogQuery("Application", PathType.LogName, "*");
            const int BatchSize = 100;
            ConcurrentQueue<EventRecord> events = new ConcurrentQueue<EventRecord>();

            // Reader task: enqueue every BatchSize-th event as a bookmark
            // for the conversion threads to seek to.
            var readerTask = Task.Factory.StartNew(() =>
            {
                using (EventLogReader reader = new EventLogReader(query))
                {
                    EventRecord ev;
                    int count = 0;
                    while ((ev = reader.ReadEvent()) != null)
                    {
                        if (count % BatchSize == 0)
                        {
                            events.Enqueue(ev);
                        }
                        count++;
                    }
                }
                myHasStoppedReading = true;
            });

            ConcurrentQueue<KeyValuePair<string, EventRecord>> eventsWithStrings =
                new ConcurrentQueue<KeyValuePair<string, EventRecord>>();

            // Each conversion thread opens its own reader, seeks to a bookmark
            // and formats a batch of events independently of the other threads.
            Action conversion = () =>
            {
                EventRecord ev;
                using (var reader = new EventLogReader(query))
                {
                    while (!myHasStoppedReading || !events.IsEmpty)
                    {
                        if (events.TryDequeue(out ev))
                        {
                            reader.Seek(ev.Bookmark);
                            for (int i = 0; i < BatchSize; i++)
                            {
                                ev = reader.ReadEvent();
                                if (ev == null)
                                {
                                    break;
                                }
                                eventsWithStrings.Enqueue(
                                    new KeyValuePair<string, EventRecord>(ev.FormatDescription(), ev));
                            }
                        }
                    }
                }
            };

            Parallel.Invoke(Enumerable.Repeat(conversion, 8).ToArray());
            sw.Stop();
            Console.WriteLine($"Got {eventsWithStrings.Count} events with strings in {sw.Elapsed.TotalMilliseconds:N3}ms");
        }

        static void ParseEvents()
        {
            var sw = Stopwatch.StartNew();
            List<KeyValuePair<string, EventRecord>> parsedEvents = new List<KeyValuePair<string, EventRecord>>();
            using (EventLogReader reader = new EventLogReader(new EventLogQuery("Application", PathType.LogName, "*")))
            {
                EventRecord ev;
                while ((ev = reader.ReadEvent()) != null)
                {
                    parsedEvents.Add(new KeyValuePair<string, EventRecord>(ev.FormatDescription(), ev));
                }
            }
            sw.Stop();
            Console.WriteLine($"Got {parsedEvents.Count} events with strings in {sw.Elapsed.TotalMilliseconds:N3}ms");
        }

        static void Main(string[] args)
        {
            ParseEvents();
            ParseEventsParallel();
        }
    }
}
Got 20322 events with strings in 19,320.047ms
Got 20323 events with strings in 5,327.064ms
This gives a decent speedup of a factor of 4. I needed to use some tricks to get faster, because for some strange reason the class ProviderMetadataCachedInformation is not thread safe and internally uses a lock(this) around the Format method, which defeats parallel reading.
The key trick is to open the event log again in the conversion threads and then read a batch of events of the query there via the event bookmark API. That way you can format the strings independently.
Update 1
I have landed a change in .NET 5 which increases performance by a factor of 3 up to 20. See https://github.com/dotnet/runtime/issues/34568.
You can also copy the EventLogReader class from .NET Core and use that one instead, which will give you the same speedup.
The full saga is described in my blog post: https://aloiskraus.wordpress.com/2020/07/20/ms-performance-hud-analyze-eventlog-reading-performance-in-realtime/
We discussed reading the existing logs a bit in the comments; you can access the Security log like this:
var eventLog = new EventLog("Security");
for (int i = 0; i < eventLog.Entries.Count; i++)
{
    Console.WriteLine($"{eventLog.Entries[i].Message}");
}
This might not be the cleanest (performance-wise) way of doing it, but I doubt any other will be faster, as you yourself have already found out by trying out different techniques.
A small edit due to Alois's post: EventLogReader is not faster out of the box than EventLog, especially when using the for-loop mechanism shown in the code block above. I think EventLog is faster there: it only accesses the entries inside the loop using their index, and the Entries collection is just a reference, whereas EventLogReader performs a query first and loops through that result, which should be slower. As commented on Alois's post: if you don't need the query option, just use the EventLog variant. If you do need querying, use EventLogReader, as it can query at a lower level than you could with EventLog (only LINQ queries, which are of course slower than filtering during the look-up).
To prevent you from having this hassle again in the future, and because you said you are running a service, I'd use the EntryWritten event of the EventLog class:
var eventLog = new EventLog("Security")
{
EnableRaisingEvents = true
};
eventLog.EntryWritten += EventLog_EntryWritten;
// .. read existing logs or do other work ..
private static void EventLog_EntryWritten(object sender, EntryWrittenEventArgs e)
{
    Console.WriteLine($"received new entry: {e.Entry.Message}");
}
Note that you must set EnableRaisingEvents to true for the event to fire whenever a new entry is logged. It is also good practice (performance-wise, too) to hand the work off to, for example, a Task, so the handler returns quickly and the system won't lock itself up while queuing calls to your event.
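For example, the handler above reworked to offload the work (a sketch; ProcessEntry is a placeholder for whatever storage work you actually do):
private static void EventLog_EntryWritten(object sender, EntryWrittenEventArgs e)
{
    var entry = e.Entry; // capture the entry before leaving the handler
    Task.Factory.StartNew(() => ProcessEntry(entry)); // return from the handler quickly
}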
This approach works fine if you want to retrieve all newly created events. If you want to retrieve newly created events but apply a query (filter) to them, you can check out the EventLogWatcher class; but in your case, when there are no constraints, I'd just use the EntryWritten event, because you don't need filters, and for plain old simplicity.
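For completeness, a rough EventLogWatcher sketch, here filtering to Error-level entries (adjust the XPath to taste):
var watcherQuery = new EventLogQuery("Security", PathType.LogName, "*[System/Level=2]");
using (var watcher = new EventLogWatcher(watcherQuery))
{
    watcher.EventRecordWritten += (s, e) =>
    {
        if (e.EventRecord != null)
            Console.WriteLine(e.EventRecord.FormatDescription());
    };
    watcher.Enabled = true;
    Console.ReadLine(); // keep the watcher alive while waiting for events
}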
I have a method which takes an argument, runs it against a database, retrieves records, processes them, and saves the processed records to a new table. Running the method from the service with one parameter works. What I am trying to achieve now is to make the parameter dynamic. I have implemented a method to retrieve the parameters and it works fine. Now I am trying to run the method in parallel for each of the parameters provided. My current implementation is:
WorkerClass WorkerClass = new WorkerClass();
var ParametersList = WorkerClass.GetParams();
foreach (var item in ParametersList)
{
    WorkerClass WorkerClass2 = new WorkerClass();
    Parallel.Invoke(
        () => WorkerClass2.ProcessAndSaveMethod(item)
    );
}
In the above implementation, I think defining a new WorkerClass2 defeats the whole point of Parallel.Invoke, but I ran into data mix-ups when using the already defined WorkerClass. The reason for the mix-up is that the Oracle connection is opened inside the Init() method of the class and static DataTable DataCollectionList; is defined at class level, which creates the issue.
Inside the method ProcessAndSaveMethod(item) I have:
OracleCommand Command = new OracleCommand(Query, OracleConnection);
OracleDataAdapter Adapter = new OracleDataAdapter(Command);
Adapter.Fill(DataCollectionList);
Inside Init():
try
{
    OracleConnection = new OracleConnection(Passengers.OracleConString);
    DataCollectionList = new DataTable();
    OracleConnection.Open();
    return true;
}
catch (Exception ex)
{
    OracleConnection.Close();
    DataCollectionList.Clear();
    return false;
}
And the method isn't run in parallel as I intended. Is there another way to implement this?
To run it in parallel you need to call Parallel.Invoke only once with all the tasks to be completed:
Parallel.Invoke(
    ParametersList.Select(item =>
        new Action(() => WorkerClass2.ProcessAndSaveMethod(item))
    ).ToArray()
);
If you have a list of somethings and want it processed in parallel, there really is no easier way than PLINQ:
var parametersList = SomeObject.SomeFunction();
var resultList = parametersList.AsParallel()
    .Select(item => new WorkerClass().ProcessAndSaveMethod(item))
    .ToList();
The fact that you build up a new connection and use a lot of variables local to the one item you process is fine. It's actually the preferred way to do multi-threading: keep as much local to the thread as you can.
That said, you have to measure if multi-threading is actually the fastest way to solve your problem. Maybe you can do your processing sequentially and then do all your database stuff in one go with bulk inserts, temporary tables or whatever is suited to your specific problem. Splitting a task into smaller tasks for more processors to run is not always faster. It's a tool and you need to find out if that tool is helping in your specific situation.
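For instance, here is a rough sketch of the "process sequentially, write once" idea. It assumes the unmanaged ODP.NET provider (which offers OracleBulkCopy) and a hypothetical per-item AppendProcessedRows helper; neither appears in the original code:
// Process everything sequentially, accumulating results in one DataTable...
var results = new DataTable("PROCESSED_RECORDS");
foreach (var item in parametersList)
{
    new WorkerClass().AppendProcessedRows(item, results); // hypothetical helper
}

// ...then push all rows to the database in a single bulk operation.
using (var bulk = new OracleBulkCopy(Passengers.OracleConString))
{
    bulk.DestinationTableName = "PROCESSED_RECORDS";
    bulk.WriteToServer(results);
}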
I achieved parallel processing using the code below, and also avoided the null reference exception from DbCon.Open() caused by connection-pool exhaustion by setting the MaxDegreeOfParallelism option:
Parallel.ForEach(ParametersList, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, item =>
{
    WorkerClass Worker = new WorkerClass();
    Worker.ProcessAndSaveMethod(item);
});
I have a web service I need to query and it takes a value that supports pagination for its data. Due to the amount of data I need to fetch and how that service is implemented I intended to do a series of concurrent http web requests to accumulate this data.
Say I have a number of threads and a page size; how could I assign each thread a starting point that doesn't overlap with the other threads? It's been a long time since I took parallel programming and I'm floundering a bit. I know I could find my start point with something like start = N/numThreads * threadNum, however I don't know N. Right now I just spin up X threads and each loops until it gets no more data. The problem is they tend to overlap and I end up with duplicate data. I need unique data and not to waste requests.
Right now I have code that looks something like this. This is one of many attempts, and I see why this is wrong, but it's better to show something. The goal is to collect pages of data from a web service in parallel:
int limit = pageSize;
data = new List<RequestStuff>();
List<Task> tasks = new List<Task>();
for (int i = 0; i < numThreads; i++)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        try
        {
            List<RequestStuff> someData;
            bool hasData = true;
            do
            {
                int start;
                lock (myLock)
                {
                    start = data.Count;
                }
                someData = GetDataFromService(start, limit);
                lock (myLock)
                {
                    if (someData != null && someData.Count > 0)
                    {
                        data.AddRange(someData);
                    }
                    else
                    {
                        hasData = false;
                    }
                }
            } while (hasData);
        }
        catch (AggregateException ex)
        {
            // Exception things
        }
    }));
}
Task.WaitAll(tasks.ToArray());
Task.WaitAll(tasks.ToArray());
Any inspiration to solve this without race conditions? I need to stick to .NET 4 if that matters.
I'm not sure there's a way to do this without wasting some requests unless you know the actual limit. The code below might help eliminate the duplicate data, as you will only query each index once:
private int _index = -1; // -1 so the first request starts at 0
private volatile bool _shouldContinue = true;

public IEnumerable<RequestStuff> GetAllData()
{
    var tasks = new List<Task<RequestStuff>>();
    while (_shouldContinue)
    {
        // StartNew so the task actually runs; a task created with "new Task"
        // never executes unless Start() is called on it.
        tasks.Add(Task.Factory.StartNew(() => GetDataFromService(GetNextIndex())));
    }
    Task.WaitAll(tasks.ToArray());
    return tasks.Select(t => t.Result).ToList();
}

private RequestStuff GetDataFromService(int id)
{
    // Get the data.
    // If there's no data returned, set _shouldContinue to false.
    // Return the RequestStuff.
}

private int GetNextIndex()
{
    return Interlocked.Increment(ref _index);
}
It could also be improved by adding cancellation tokens to cancel any indexes you know to be wasteful; i.e., if index 4 returns nothing, you can cancel all queries on indexes above 4 that are still active.
Or, if you could make a reasonable guess at the max index, you might be able to implement an algorithm to pinpoint the exact limit before retrieving any data. This would probably only be more efficient if your guess was fairly accurate, though.
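A sketch of that pinpointing idea, using an exponential probe followed by a binary search. PageHasData(index) is a hypothetical helper that issues a cheap request for a single item at the given page index:
private int FindLastPage()
{
    // Exponential probe: double the index until we fall off the end of the data.
    int low = 0, high = 1;
    while (PageHasData(high))
    {
        low = high;
        high *= 2;
    }
    // Binary search between the last known-good page and the first empty one.
    while (low + 1 < high)
    {
        int mid = low + (high - low) / 2;
        if (PageHasData(mid)) low = mid;
        else high = mid;
    }
    return low; // index of the last page that returned data
}
This costs roughly 2·log2(N) probe requests up front, after which every page in [0, N] can be fetched exactly once with no overlap.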
Are you attempting to force parallelism on the part of the remote service by issuing multiple concurrent requests? Paging is generally used to limit the amount of data returned to only what is needed, but if you need all of the data, then paging first and reconstructing it later seems like a poor design. Your code becomes needlessly complex and difficult to maintain, you'll likely just move the bottleneck from code you control to somewhere else, and now you've introduced data integrity issues (what happens if these threads see different versions of the data you are querying?). By increasing the complexity and the number of calls, you also increase the likelihood of problems occurring (e.g. one of the connections gets dropped).
Can you state the problem you are attempting to solve so perhaps instead we can help architect a better solution?
I've been trying to create an observable which streams a state-of-the-world (snapshot) from a repository cache, followed by live updates from a separate feed. The catch is that the snapshot call is blocking, so the updates have to be buffered during that time.
This is what I've come up with, a little simplified. The GetStream() method is the one I'm concerned with. I'm wondering whether there is a more elegant solution. Assume GetDataFeed() pulses updates to the cache all day long.
private static readonly IConnectableObservable<long> _updateStream;

static Program() // static constructor of the (assumed) Program class
{
    _updateStream = GetDataFeed().Publish();
    _updateStream.Connect();
}
static void Main(string[] args)
{
    _updateStream.Subscribe(Console.WriteLine);
    Console.ReadLine();
    GetStream().Subscribe(l => Console.WriteLine("Stream: " + l));
    Console.ReadLine();
}
public static IObservable<long> GetStream()
{
    return Observable.Create<long>(observer =>
    {
        var bufferedStream = new ReplaySubject<long>();
        _updateStream.Subscribe(bufferedStream);
        var data = GetSnapshot();
        // This returns the ticks from GetSnapshot,
        // followed by the buffered ticks from _updateStream,
        // followed by any subsequent ticks from _updateStream.
        data.ToObservable().Concat(bufferedStream).Subscribe(observer);
        return Disposable.Empty;
    });
}
private static IObservable<long> GetDataFeed()
{
    var feed = Observable.Interval(TimeSpan.FromSeconds(1));
    return Observable.Create<long>(observer =>
    {
        feed.Subscribe(observer);
        return Disposable.Empty;
    });
}
Popular opinion opposes Subjects as they are not 'functional', but I can't find a way of doing this without a ReplaySubject. The Replay filter on a hot observable wouldn't work because it would replay everything (potentially a whole day's worth of stale updates).
I'm also concerned about race conditions. Is there a way to guarantee sequencing of some sort, should an earlier update be buffered before the snapshot? Can the whole thing be done more safely and elegantly with other RX operators?
Thanks.
-Will
Whether you use a ReplaySubject or the Replay function really makes no difference; Replay uses a ReplaySubject under the hood. I'll also note that you are leaking subscriptions like mad, which can cause a resource leak. Also, you put no limit on the size of the replay buffer: if you watch the observable all day long, that replay buffer will keep growing and growing. You should put a limit on it to prevent that.
Here is an updated version of GetStream. In this version I take the simplistic approach of just limiting the Replay to the most recent 1 minute of data. This assumes that GetSnapshot will always complete and the observer will observe the results within that minute. Your mileage may vary and you can probably improve upon this scheme, but at least this way, when you have watched the observable all day long, the buffer will not have grown unbounded and will still only contain a minute's worth of updates.
public static IObservable<long> GetStream()
{
    return Observable.Create<long>(observer =>
    {
        var updateStreamSubscription = new SingleAssignmentDisposable();
        var sequenceDisposable = new SingleAssignmentDisposable();
        var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);

        // start buffering the updates
        var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
        updateStreamSubscription.Disposable = bufferedStream.Connect();

        // now retrieve the initial snapshot data
        var data = GetSnapshot();

        // subscribe to the snapshot followed by the buffered data
        sequenceDisposable.Disposable = data.ToObservable().Concat(bufferedStream).Subscribe(observer);

        // return the composite disposable which will unsubscribe when the observer wishes
        return subscriptions;
    });
}
As for your questions about race conditions and filtering out "old" updates: if your snapshot data includes some sort of version information, and your update stream also provides version information, then you can effectively measure the latest version returned by your snapshot query and then filter the buffered stream to ignore values for older versions. Here is a rough example:
public static IObservable<long> GetStream()
{
    return Observable.Create<long>(observer =>
    {
        var updateStreamSubscription = new SingleAssignmentDisposable();
        var sequenceDisposable = new SingleAssignmentDisposable();
        var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);

        // start buffering the updates
        var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
        updateStreamSubscription.Disposable = bufferedStream.Connect();

        // now retrieve the initial snapshot data
        var data = GetSnapshot();
        var snapshotVersion = data.Length > 0 ? data[data.Length - 1].Version : 0;
        var filteredUpdates = bufferedStream.Where(update => update.Version > snapshotVersion);

        // subscribe to the snapshot followed by the filtered buffered data
        sequenceDisposable.Disposable = data.ToObservable().Concat(filteredUpdates).Subscribe(observer);

        // return the composite disposable which will unsubscribe when the observer wishes
        return subscriptions;
    });
}
I have successfully used this pattern when merging live updates with a stored snapshot. I haven't yet found an elegant Rx operator that already does this without any race conditions. But the above method could probably be turned into such. :)
Edit: Note I have left out error handling in the examples above. In theory the call to GetSnapshot could fail and you'd leak the subscription to the update stream. I suggest wrapping everything after the CompositeDisposable declaration in a try/catch block, and in the catch handler, ensure you call subscriptions.Dispose() before re-throwing the exception.
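Something along these lines (sketch only):
var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);
try
{
    // ... connect to the replayed stream, call GetSnapshot, subscribe, as above ...
}
catch
{
    subscriptions.Dispose(); // don't leak the update-stream subscription
    throw;
}
return subscriptions;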
After doing some research, I'm asking for feedback on how to effectively remove two items from a concurrent collection. My situation involves incoming messages over UDP which are currently being placed into a BlockingCollection. Once there are two users in the collection, I need to safely take two users and process them. I've seen several different techniques, including some ideas listed below. My current implementation is below, but I'm thinking there's a cleaner way to do this while ensuring that users are processed in groups of two. That's the only restriction in this scenario.
Current Implementation:
private int userQueueCount = 0;
public BlockingCollection<User> UserQueue = new BlockingCollection<User>();

public void JoinQueue(User u)
{
    UserQueue.Add(u);
    Interlocked.Increment(ref userQueueCount);
    if (userQueueCount > 1)
    {
        IEnumerable<User> users = UserQueue.Take(2);
        if (users.Count() == 2)
        {
            Interlocked.Decrement(ref userQueueCount);
            Interlocked.Decrement(ref userQueueCount);
            // ... do some work with users, but if only one
            // is removed I'll run into problems
        }
    }
}
What I would like to do is something like this, but I cannot currently test it in a production situation to ensure integrity:
Parallel.ForEach(UserQueue.Take(2), (u) => { ... });
Or better yet:
public void JoinQueue(User u)
{
    UserQueue.Add(u);
    // if needed? increment
    Interlocked.Increment(ref userQueueCount);
    UserQueue.CompleteAdding();
}
Then implement this somewhere:
Task.Factory.StartNew(() =>
{
    while (userQueueCount > 1) // or (UserQueue.Count > 1), if that's safe?
    {
        IEnumerable<User> users = UserQueue.Take(2);
        // ... do stuff
    }
});
The problem with this is that I'm not sure I can guarantee that between the condition (Count > 1) and the Take(2) the UserQueue still has at least two items to process. Incoming UDP messages are processed in parallel, so I need a way to safely pull items off of the blocking/concurrent collection in pairs of two.
Is there a better/safer way to do this?
Revised Comments:
The intended goal of this question is really just to achieve a stable/thread-safe method of processing items off of a concurrent collection in .NET 4.0. It doesn't have to be pretty; it just has to be stable for the task of processing items in unordered pairs in a parallel environment.
Here is what I'd do in rough Code:
private ConcurrentQueue<User> gameQueue = new ConcurrentQueue<User>(); // can use a BlockingCollection too (as it's just a blocking ConcurrentQueue by default anyway)
public void OnUserStartedGame(User joiningUser)
{
    User waitingUser;
    if (this.gameQueue.TryDequeue(out waitingUser)) // if there's someone waiting, we'll get him
        this.MatchUsers(waitingUser, joiningUser);
    else
        this.QueueUser(joiningUser); // it doesn't matter if there's already someone in the queue by now because, well, we are using a queue and it will sort itself out.
}

private void QueueUser(User user)
{
    this.gameQueue.Enqueue(user);
}

private void MatchUsers(User first, User second)
{
    // not sure what you do here
}
The basic idea is that if someone wants to start a game and there's someone in your queue, you match them and start a game; if there's no one, you add them to the queue.
At best you'll only have one user in the queue at a time, but if not, well, that's not too bad either, because as other users start games, the waiting ones will gradually be removed and no new ones added until the queue is empty again.
If I could not put pairs of users into the collection for some reason, I would use a ConcurrentQueue and try to TryDequeue two items at a time; if I can get only one, put it back. Wait as necessary.
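A sketch of that put-it-back approach (safe as long as there is a single consumer; with multiple consumers the re-enqueued user can be interleaved and ordering is lost):
private bool TryDequeuePair(ConcurrentQueue<User> queue, out User first, out User second)
{
    second = null;
    if (!queue.TryDequeue(out first))
        return false; // queue empty
    if (queue.TryDequeue(out second))
        return true; // got a pair
    queue.Enqueue(first); // only one user available: put him back and report failure
    first = null;
    return false;
}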
I think the easiest solution here is to use locking: you will have one lock for all consumers (producers won't use any locks), which will make sure you always take the users in the correct order:
User firstUser;
User secondUser;
lock (consumerLock)
{
    firstUser = userQueue.Take();
    secondUser = userQueue.Take();
}
Process(firstUser, secondUser);
Another option, would be to have two queues: one for single users and one for pairs of users and have a process that transfers them from the first queue to the second one.
If you don't mind wasting another thread, you can do this with two BlockingCollections:
while (true)
{
    var firstUser = incomingUsers.Take();
    var secondUser = incomingUsers.Take();
    userPairs.Add(Tuple.Create(firstUser, secondUser));
}
You don't have to worry about locking here, because the queue for single users will have only one consumer, and the consumers of pairs can now use simple Take() safely.
If you do care about wasting a thread and can use TPL Dataflow, you can use BatchBlock<T>, which combines incoming items into batches of n items, where n is configured at the time of creation of the block, so you can set it to 2.
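A minimal BatchBlock sketch (requires the TPL Dataflow library, i.e. using System.Threading.Tasks.Dataflow; Process here stands in for whatever you do with a matched pair):
var batchBlock = new BatchBlock<User>(2); // groups posted users into arrays of 2
var processBlock = new ActionBlock<User[]>(pair => Process(pair[0], pair[1]));
batchBlock.LinkTo(processBlock, new DataflowLinkOptions { PropagateCompletion = true });

// producers just post users as they arrive:
batchBlock.Post(incomingUser);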
Maybe this can help:
public static IList<T> TakeMulti<T>(this BlockingCollection<T> me, int count = 100) where T : class
{
    T last = null;
    if (me.Count == 0)
    {
        last = me.Take(); // blocks while the collection is empty
    }
    var result = new List<T>(count);
    if (last != null)
    {
        result.Add(last);
    }
    // if you want to wait for more items at this point:
    // if (me.Count < count / 2)
    // {
    //     Thread.Sleep(1000);
    // }
    while (me.Count > 0 && result.Count < count) // "< count" so at most count items are returned
    {
        result.Add(me.Take());
    }
    return result;
}