Rx produce and consume on different threads - c#

I have tried to simplify my issue with the sample code below. I have a producer thread constantly pumping in data, and I am trying to batch it with a time delay between batches so that the UI has time to render it. But the result is not as expected: the producer and consumer appear to be on the same thread.
I don't want the batch buffer to sleep on the thread that is producing. I tried SubscribeOn but it did not help much. What am I doing wrong here, and how do I get this to print different thread IDs on the producer and consumer threads?
static void Main(string[] args)
{
    var stream = new ReplaySubject<int>();

    Task.Factory.StartNew(() =>
    {
        int seed = 1;
        while (true)
        {
            Console.WriteLine("Thread {0} Producing {1}",
                Thread.CurrentThread.ManagedThreadId, seed);
            stream.OnNext(seed);
            seed++;
            Thread.Sleep(TimeSpan.FromMilliseconds(500));
        }
    });

    stream.Buffer(5).Do(x =>
    {
        Console.WriteLine("Thread {0} sleeping to create time gap between batches",
            Thread.CurrentThread.ManagedThreadId);
        Thread.Sleep(TimeSpan.FromSeconds(2));
    })
    .SubscribeOn(NewThreadScheduler.Default).Subscribe(items =>
    {
        foreach (var item in items)
        {
            Console.WriteLine("Thread {0} Consuming {1}",
                Thread.CurrentThread.ManagedThreadId, item);
        }
    });

    Console.Read();
}

Understanding the difference between ObserveOn and SubscribeOn is key here. See ObserveOn and SubscribeOn - where the work is being done for an in-depth explanation of these.
Also, you absolutely don't want to use a Thread.Sleep in your Rx. Or anywhere. Ever. Do is almost as evil, but Thread.Sleep is almost always totally evil. Buffer has several overloads you want to use instead - these include a time-based overload, and an overload that accepts both a count limit and a time limit and returns a buffer when either is reached. Time-based buffering introduces the necessary concurrency between producer and consumer - that is, it delivers the buffer to its subscriber on a separate thread from the producer.
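For example, a minimal sketch along those lines, reusing the stream from the question (it assumes using System.Reactive.Linq and System.Reactive.Concurrency):
// Sketch only: emit a buffer every 2 seconds or every 5 items, whichever comes first,
// and move consumption off the producing thread with ObserveOn.
stream
    .Buffer(TimeSpan.FromSeconds(2), 5)
    .ObserveOn(TaskPoolScheduler.Default)
    .Subscribe(items =>
    {
        foreach (var item in items)
        {
            Console.WriteLine("Thread {0} Consuming {1}",
                Thread.CurrentThread.ManagedThreadId, item);
        }
    });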
Also see these questions and answers which have good discussions on keeping consumers responsive (in the context of WPF here, but the points are generally applicable).
Process lots of small tasks and keep the UI responsive
Buffer data from database cursor while keeping UI responsive
The last question above specifically uses the time-based buffer overload. As I said, using Buffer or ObserveOn in your call chain will allow you to add concurrency between producer and consumer. You still need to take care that the processing of a buffer is fast enough that a queue doesn't build up on the buffer subscriber.
If queues do build up, you'll need to think about means of applying backpressure, dropping updates and/or conflating the updates. This is a big topic, too broad for in-depth discussion here - but basically you either:
Drop events. There have been many ways discussed to tackle this in Rx. I currently like Ignore incoming stream updates if last callback hasn't finished yet, but also see With Rx, how do I ignore all-except-the-latest value when my Subscribe method is running - there are many other discussions of this.
Signal the producer out of band to tell it to slow down or send conflated updates, or
Introduce an operator that does in-stream conflation - like a smarter Buffer that could compress events to, for example, only include the latest price on a stock item. You can author operators that are sensitive to the time that OnNext invocations take to process, for example.
See if proper buffering helps first, then think about throttling/conflating events at the source (a UI can only show so much information anyway) - then consider smarter conflation, as this can get quite complex. https://github.com/AdaptiveConsulting/ReactiveTrader is a good example of a project using some advanced conflation techniques.

Although the other answers are correct, I'd like to identify your actual problem as perhaps a misunderstanding of the behavior of Rx. Putting the producer to sleep blocks subsequent calls to OnNext; it seems as though you're assuming Rx automatically calls OnNext concurrently, but in fact it doesn't, for very good reasons. Rx has a contract that requires serialized notifications.
See §§4.2, 6.7 in the Rx Design Guidelines for details.
Ultimately, it looks as though you're trying to implement the BufferIntrospective operator from Rxx. This operator allows you to pass in a concurrency-introducing scheduler, similar to ObserveOn, to create a concurrency boundary between a producer and a consumer. BufferIntrospective is a dynamic backpressure strategy that pushes out heterogeneously-sized batches based on the changing latencies of an observer. While the observer is processing the current batch, the operator buffers all incoming concurrent notifications. To accomplish this, the operator takes advantage of the fact that OnNext is a blocking call (per the §4.2 contract) and for that reason this operator should be applied as close to the edge of the query as possible, generally immediately before you call Subscribe.
As James described, you could call it a "smart buffering" strategy itself, or see it as the baseline for implementing such a strategy; e.g., I've also defined a SampleIntrospective operator that drops all but the last notification in each batch.

ObserveOn is probably what you want. It takes a SynchronizationContext as an argument, which should be the SynchronizationContext of your UI. If you don't know how to get it, see Using SynchronizationContext for sending events back to the UI for WinForms or WPF.
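For example, a minimal sketch assuming this code runs on the UI thread (RenderBatch is a hypothetical method that updates your UI):
// Capture the UI context while on the UI thread...
var uiContext = SynchronizationContext.Current;

stream
    .Buffer(TimeSpan.FromSeconds(2), 5)
    .ObserveOn(uiContext)                       // ...and marshal each buffer back to it
    .Subscribe(items => RenderBatch(items));    // RenderBatch: hypothetical UI update method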

Related

Thread Contention on a ConcurrentDictionary in C#

I have a C# .NET program that uses an external API to process events for real-time stock market data. I use the API callback feature to populate a ConcurrentDictionary with the data it receives on a stock-by-stock basis.
I have a set of algorithms that each run in a constant loop until a terminal condition is met. They are called like this (but all from separate calling functions elsewhere in the code):
Task.Run(() => ExecutionLoop1());
Task.Run(() => ExecutionLoop2());
...
Task.Run(() => ExecutionLoopN());
Each one of those functions calls SnapTotals():
public void SnapTotals()
{
    foreach (KeyValuePair<string, MarketData> kvpMarketData in
        new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime))
    {
        ...
The Handler.MessageEventHandler.Realtime object is the ConcurrentDictionary that is updated in real-time by the external API.
At a certain specific point in the day, there is an instant burst of data that comes in from the API. That is the precise time I want my ExecutionLoop() functions to do some work.
As I've grown the program and added more of those execution loop functions, and grown the number of elements in the ConcurrentDictionary, the performance of the program as a whole has seriously degraded. Specifically, those ExecutionLoop() functions all seem to freeze up and take much longer to meet their terminal condition than they should.
I added some logging to all of the functions above, and to the function that updates the ConcurrentDictionary. From what I can gather, the ExecutionLoop() functions appear to access the ConcurrentDictionary so often that they block the API from updating it with real-time data. The loops are dependent on that data to meet their terminal condition so they cannot complete.
I'm stuck trying to figure out a way to re-architect this. I would like for the thread that updates the ConcurrentDictionary to have a higher priority but the message events are handled from within the external API. I don't know if ConcurrentDictionary was the right type of data structure to use, or what the alternative could be, because obviously a regular Dictionary would not work here. Or is there a way to "pause" my execution loops for a few milliseconds to allow the market data feed to catch up? Or something else?
Your basic approach is sound except for one fatal flaw: the loops are all hitting the same dictionary at the same time via iterators, sets, and gets. So you must do one thing: in SnapTotals, iterate over a copy of the concurrent dictionary.
When you iterate over Handler.MessageEventHandler.Realtime, or even over new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime), you are using the ConcurrentDictionary<>'s iterator, which, even though it is thread-safe, is going to be using the dictionary for the entire period of iteration (including however long it takes to process each and every entry in the dictionary). That is most likely where the contention occurs.
Making a copy of the dictionary is much faster, so it should lower contention.
Change SnapTotals to
public void SnapTotals()
{
    var copy = Handler.MessageEventHandler.Realtime.ToArray();
    foreach (var kvpMarketData in copy)
    {
        ...
Now, each ExecutionLoopX can execute in peace without write-side contention (your API updates) and without read-side contention from the other loops. The write-side can execute without read-side contention as well.
The only "contention" should be for the short duration needed to do each copy.
And by the way, the dictionary copy (an array) is not thread-safe; it's just a plain array, but that is OK because each task is executing in isolation on its own copy.
I think that your main problem is not the ConcurrentDictionary, but the large number of ExecutionLoopX methods. Each of these methods saturates a CPU core, and since there are more methods than cores on your machine, the whole CPU is saturated. My assumption is that if you find a way to limit the degree of parallelism of the ExecutionLoopX methods to a number smaller than Environment.ProcessorCount, your program will behave and perform better. Below is my suggestion for implementing this limitation.
The main obstacle is that currently your ExecutionLoopX methods are monolithic: they can't be separated into pieces that can be interleaved. My suggestion is to change their return type from void to async Task, and place an await Task.Yield(); inside the outer loop. This way it becomes possible to execute them in steps, with each step being the code from one await to the next.
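For example, a rough sketch of the shape I mean (the loop body and terminal condition are placeholders, since the real ones aren't shown in the question):
async Task ExecutionLoop1()
{
    while (!TerminalConditionMet())   // placeholder for the real terminal condition
    {
        DoOneIterationOfWork();       // placeholder for one iteration of the real work
        await Task.Yield();           // yield so the limited scheduler can interleave other loops
    }
}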
Then create a TaskScheduler with limited concurrency, and a TaskFactory that uses this scheduler:
int maxDegreeOfParallelism = Environment.ProcessorCount - 1;
TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair(
    TaskScheduler.Default, maxDegreeOfParallelism).ConcurrentScheduler;
TaskFactory taskFactory = new TaskFactory(scheduler);
Now you can parallelize the execution of the methods by starting the tasks with the taskFactory.StartNew method instead of Task.Run:
List<Task> tasks = new();
tasks.Add(taskFactory.StartNew(() => ExecutionLoop1(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop2(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop3(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop4(data)).Unwrap());
//...
Task.WaitAll(tasks.ToArray());
The .Unwrap() is needed because taskFactory.StartNew returns a nested task (Task<Task>). The Task.Run method does this unwrapping internally when the action is asynchronous.
An online demo of this idea can be found here.
The Environment.ProcessorCount - 1 configuration means that one CPU core will be available for other work, like the communication with the external API and the updating of the ConcurrentDictionary.
A more cumbersome implementation of the same idea, using iterators and the Parallel.ForEach method instead of async/await, can be found in the first revision of this answer.
If you're not squeamish about mixing operations in a task, you could redesign so that instead of task A doing A things, task B doing B things, task C doing C things, etc., you reduce the number of tasks to the number of processors and thus run fewer concurrently, greatly easing contention.
So, for example, say you have just two processors. Make a "general purpose/pluggable" task wrapper that accepts delegates. So, wrapper 1 would accept delegates to do A and B work. Wrapper 2 would accept delegates to do C and D work. Then ask each wrapper to spin up a task that calls the delegates in a loop over the dictionary.
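A rough sketch of what such a wrapper could look like (all names here are illustrative, not from the question):
// Runs several kinds of work in a single task, looping over a snapshot of the shared data.
class WorkerWrapper
{
    private readonly Action<KeyValuePair<string, MarketData>>[] _workItems;

    public WorkerWrapper(params Action<KeyValuePair<string, MarketData>>[] workItems)
    {
        _workItems = workItems;
    }

    public Task RunAsync(ConcurrentDictionary<string, MarketData> data)
    {
        return Task.Run(() =>
        {
            foreach (var entry in data.ToArray())   // iterate a snapshot, as suggested above
                foreach (var work in _workItems)
                    work(entry);                    // e.g. A-work and B-work share one loop
        });
    }
}

// Usage on a two-core box: two wrappers instead of four separate tasks.
// var w1 = new WorkerWrapper(DoAWork, DoBWork);
// var w2 = new WorkerWrapper(DoCWork, DoDWork);
// Task.WaitAll(w1.RunAsync(dict), w2.RunAsync(dict));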
This would of course need to be measured. What I am proposing is, say, 4 tasks each doing 4 different types of processing. This is 4 units of work per loop over 4 loops. This is not the same as 16 tasks each doing 1 unit of work. In that case you have 16 loops.
16 loops intuitively would cause more contention than 4.
Again, this is a potential solution that should be measured. There is one drawback for sure: you will have to ensure that a piece of work within a task doesn't affect any of the others.

.NET Disruptor async patterns

I am using Disruptor-net in a C# application. I'm having some trouble understanding how to do async operations in the disruptor pattern.
Assuming I have a few event handlers, and the last one in the chain hands a message off to my business logic processors, how do I handle async operations inside of my business logic processor? When my business logic needs to do some database insert, does it hand a message off to my output disruptor, which does the insert, then publishes a new message on my input disruptor with all the state to continue the transaction?
In addition, within my output disruptor, would I use Tasks? I'm 99.9% sure I'd want to use tasks so I don't have a ton of event handlers blocking on async operations. How does that fit in with the disruptor pattern then? It seems kind of weird to just do something like this in my EventHandler:
void OnEvent(MyEvent evt, long sequence, bool endOfBatch)
{
    db.InsertAsync(evt).ContinueWith(task => inputDisruptor.Publish(task));
}
The Disruptor has the following features:
Dedicated threads, which can be pinned / shielded / prioritized for better performance.
Explicit queues, which can be monitored and generate backpressure.
In-order message processing.
No heap allocations, which can help reduce GC pauses, or even remove them if your own code does not generate heap allocations.
Your code sample does not really follow the Disruptor philosophy:
Task.ContinueWith runs asynchronously by default, so the continuation will use thread-pool threads.
Because you are using the thread-pool, you have no guarantee on the continuation execution order. Even if you use TaskContinuationOptions.ExecuteSynchronously, you have no guarantee that InsertAsync will invoke the continuations in-order.
You are creating an implicit queue with all the pending insert operations. This queue is hidden and does not generate backpressure.
I will put aside the fact that your code generates heap allocations. You will not benefit from the "no GC pauses" effect, but that is probably very acceptable for your use case.
Also, please note that batching is crucial to support high-throughput for IO operations. You should really use the Disruptor batches in your event handler.
I will simplify the problem to 3 event handlers:
PreInsertEventHandler: pre-insert logic (not shown here)
InsertEventHandler: insert logic
PostInsertEventHandler: post-insert logic
Of course, I am assuming that the post-insert logic must be run only after insert completion.
If your goal is to save the events in InsertEventHandler and to block until completion before processing the event in the next handler, you should probably just wait in InsertEventHandler.
InsertEventHandler:
void OnEvent(MyEvent evt, long sequence, bool endOfBatch)
{
    _pendingInserts.Add((evt, task: db.InsertAsync(evt)));

    if (endOfBatch)
    {
        var insertSucceeded = Task.WaitAll(_pendingInserts.Select(x => x.task).ToArray(), _insertTimeout);

        foreach (var (pendingEvent, _) in _pendingInserts)
        {
            pendingEvent.InsertSucceeded = insertSucceeded;
        }

        _pendingInserts.Clear();
    }
}
Of course, if your DB API exposes a bulk-insert method, it might be better to add the events in a list and to save them all at the end of the batch.
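For example, a sketch of that variation (InsertManyAsync is a stand-in for whatever bulk API your DB client actually exposes; the other names come from the handler above):
private readonly List<MyEvent> _batch = new List<MyEvent>();

void OnEvent(MyEvent evt, long sequence, bool endOfBatch)
{
    _batch.Add(evt);

    if (endOfBatch)
    {
        // One bulk insert per Disruptor batch; wait (with a timeout) for it to complete.
        bool insertSucceeded = db.InsertManyAsync(_batch).Wait(_insertTimeout);

        foreach (var pendingEvent in _batch)
        {
            pendingEvent.InsertSucceeded = insertSucceeded;
        }

        _batch.Clear();
    }
}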
There are many other options, like waiting in PostInsertEventHandler, or queueing the insert results in another Disruptor, each coming with its own pros and cons. A SO answer might not be the best place to discuss and analyze all of them.

Improving performance of Parallel.For in C# with more methods

Recently I've stumbled upon a Parallel.For loop that performs way better than a regular for loop for my purposes.
This is how I use it:
Parallel.For(0, values.Count, i => Products.Add(GetAllProductByID(values[i])));
It made my application work a lot faster, but still not fast enough. My question to you guys is:
Does Parallel.ForEach perform faster than Parallel.For?
Is there some "hybrid" method with which I can combine my Parallel.For loop to perform even faster (i.e. use more CPU power)? If yes, how?
Can someone help me out with this?
If you want to play with parallelism, I suggest using Parallel LINQ (PLINQ) instead of Parallel.For / Parallel.ForEach, e.g.
var Products = Enumerable
    .Range(0, values.Count)
    .AsParallel()
    //.WithDegreeOfParallelism(10) // <- if you want, say 10 threads
    .Select(i => GetAllProductByID(values[i]))
    .ToList(); // <- this is thread safe now
With the help of the With* methods (e.g. WithDegreeOfParallelism) you can try tuning your implementation.
There are two related concepts: asynchronous programming and multithreading. Basically, to do things "in parallel" or asynchronously, you can either create new threads or work asynchronously on the same thread.
Keep in mind that either way you'll need some mechanism to prevent race conditions. From the Wikipedia article I linked to, a race condition is defined as follows:
A race condition or race hazard is the behavior of an electronic, software or other system where the output is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the order the programmer intended.
As a few people have mentioned in the comments, you can't rely on the standard List class to be thread-safe - i.e. it might behave in unexpected ways if you're updating it from multiple threads. Microsoft now offers special "built-in" collection classes (in the System.Collections.Concurrent namespace) that behave in the expected way if you're updating them asynchronously or from multiple threads.
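For instance, a minimal sketch applied to the loop from the question (Product is assumed to be the item type returned by GetAllProductByID):
using System.Collections.Concurrent;
using System.Threading.Tasks;

var products = new ConcurrentBag<Product>();   // thread-safe, unlike a plain List<Product>

Parallel.For(0, values.Count, i =>
    products.Add(GetAllProductByID(values[i])));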
For well-documented libraries (and Microsoft's generally pretty good about this in their documentation), the documentation will often explicitly state whether the class or method in question is thread-safe. For example, in the documentation for System.Collections.Generic.List, it states the following:
Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
In terms of asynchronous programming (vs. multithreading), my standard illustration of this is as follows: suppose you go to a restaurant with 10 people. When the waiter comes by, the first person he asks for an order isn't ready; however, the other 9 people are. Thus, the waiter asks the other 9 people for their orders and then comes back to the original guy. (It's definitely not the case that they'll get a second waiter to wait for the original guy to be ready to order, and doing so probably wouldn't save much time anyway.) That's how async/await typically works (the exception being that some of the Task Parallel Library calls, like Task.Run(...), actually do execute on other threads - in our illustration, bringing in a second waiter - so make sure you check the documentation for which is which).
Basically, which you choose (asynchronously on the same thread or creating new threads) depends on whether you're trying to do something that's I/O-bound (i.e. you're just waiting for an operation to complete or for a result) or CPU-bound.
If your main purpose is to wait for a result from eBay, it would probably be better to work asynchronously in the same thread, as you may not get much of a performance benefit from multithreading. Think back to our analogy: bringing in a second waiter just to wait for the first guy to be ready to order isn't necessarily any better than just having the one waiter come back to him.
I'm not sitting in front of an IDE so forgive me if this syntax isn't perfect, but here's an approximate idea of what you can do:
public async Task GetResults(int[] productIDsToGet)
{
    var tasks = new List<Task>();

    foreach (int productID in productIDsToGet)
    {
        Task task = GetResultFromEbay(productID);
        tasks.Add(task);
    }

    // Wait for all of the tasks to complete
    await Task.WhenAll(tasks);
}

private async Task GetResultFromEbay(int productIdToGet)
{
    // Get result asynchronously from eBay
}

How can two threads access a common array of buffers with minimal blocking ? (c#)

I'm working on an image processing application where I have two threads on top of my main thread:
1 - CameraThread that captures images from the webcam and writes them into a buffer
2 - ImageProcessingThread that takes the latest image from that buffer for filtering.
The reason why this is multithreaded is because speed is critical and I need to have CameraThread to keep grabbing pictures and making the latest capture ready to pick up by ImageProcessingThread while it's still processing the previous image.
My problem is about finding a fast and thread-safe way to access that common buffer and I've figured that, ideally, it should be a triple buffer (image[3]) so that if ImageProcessingThread is slow, then CameraThread can keep on writing on the two other images and vice versa.
What sort of locking mechanism would be the most appropriate for this to be thread-safe ?
I looked at the lock statement, but it seems like it would make a thread block while waiting for another one to finish, and that would defeat the point of triple buffering.
Thanks in advance for any idea or advice.
J.
This could be a textbook example of the Producer-Consumer Pattern.
If you're going to be working in .NET 4, you can use the IProducerConsumerCollection<T> and associated concrete classes to provide your functionality.
If not, have a read of this article for more information on the pattern, and this question for guidance in writing your own thread-safe implementation of a blocking First-In First-Out structure.
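For example, a minimal sketch using BlockingCollection<T> (which wraps an IProducerConsumerCollection<T>); Image, CaptureImage and ProcessImage are placeholders for your own types and methods:
using System.Collections.Concurrent;
using System.Threading.Tasks;

var buffer = new BlockingCollection<Image>(boundedCapacity: 3);   // at most 3 pending frames

// Camera thread: producer
Task.Factory.StartNew(() =>
{
    while (!buffer.IsAddingCompleted)
        buffer.Add(CaptureImage());            // blocks only if 3 frames are already pending
});

// Image processing thread: consumer
Task.Factory.StartNew(() =>
{
    foreach (var image in buffer.GetConsumingEnumerable())
        ProcessImage(image);                   // blocks until a frame is available
});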
Personally I think you might want to look at a different approach for this. Rather than writing to a centralized "buffer" that you have to manage access to, could you switch to an approach that uses events? Once the camera thread has "received" an image, it could raise an event that passes the image data off to the process that actually handles the image processing.
An alternative would be to use a Queue, which is a FIFO (First In, First Out) data structure. It is not thread-safe, so you would have to lock it, but your locking time would be very minimal just to put the item in the queue. There are also other queue classes out there that are thread-safe that you could use.
Using your approach there are a number of issues that you would have to contend with: blocking as you access the array, limitations on what happens after you run out of available array slots, and so on.
Given the amount of processing needed for a picture, I don't think that a simple locking scheme would be your bottleneck. Measure before you start wasting time on the wrong problem.
Be very careful with 'lock-free' solutions; they are always more complicated than they look.
And you need a Queue, not an array.
If you can use .NET 4, I would use the ConcurrentQueue.
You will have to run some performance metrics, but take a look at lock free queues.
See this question and its associated answers, for example.
In your particular application, though, your processor is only really interested in the most recent image. In effect this means you only really want to maintain a queue of two items (the new item and the previous item) so that there is no contention between reading and writing. You could, for example, have your producer remove old entries from the queue once a new one is written.
Edit: having said all this, I think there is a lot of merit in what is said in Mitchel Sellers's answer.
I would look at using a ReaderWriterLockSlim, which allows fast reads and upgradable locks for writes.
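A minimal sketch of that idea, assuming the buffer lives in some class shared by both threads (the field and method names are illustrative):
private readonly ReaderWriterLockSlim _bufferLock = new ReaderWriterLockSlim();
private byte[] _latestImage;

// Camera thread: writer
public void PublishImage(byte[] image)
{
    _bufferLock.EnterWriteLock();
    try { _latestImage = image; }
    finally { _bufferLock.ExitWriteLock(); }
}

// Image processing thread: reader
public byte[] GetLatestImage()
{
    _bufferLock.EnterReadLock();
    try { return _latestImage; }
    finally { _bufferLock.ExitReadLock(); }
}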
This isn't a direct answer to your question, but it may be better to rethink your concurrency model. Locks are a terrible way to synchronize anything - too low level, error prone, etc. Try to rethink your problem in terms of message-passing concurrency:
The idea here is that each thread is its own tightly contained message loop, and each thread has a "mailbox" for sending and receiving messages -- we're going to use the term MailboxThread to distinguish these types of objects from plain jane threads.
So instead of having two threads accessing the same buffer, you instead have two MailboxThreads sending and receiving messages between one another (pseudocode):
let filter =
    while true
        let image = getNextMsg()   // blocks until the next message is received
        process image

let camera(filterMailbox) =
    while true
        let image = takePicture()
        filterMailbox.SendMsg(image)   // sends a message asynchronously

let filterMailbox = Mailbox.Start(filter)
let cameraMailbox = Mailbox.Start(camera(filterMailbox))
Now your processing threads don't know or care about any buffers at all. They just wait for messages and process them whenever they're available. If you send too many messages for the filterMailbox to handle, those messages get enqueued to be processed later.
The hard part here is actually implementing your MailboxThread object. Although it requires some creativity to get right, it's wholly possible to implement these types of objects so that they only hold a thread open while processing a message, and release the executing thread back to the thread pool when there are no messages left to handle (this implementation allows you to terminate your application without dangling threads).
The advantage here is that threads send and receive messages without worrying about locking or synchronization. Behind the scenes, you need to lock your message queue between enqueuing and dequeuing a message, but that implementation detail is completely transparent to your client-side code.
Just an idea.
Since we're talking about only two threads, we can make some assumptions.
Let's use your triple buffer idea. Assuming there is only 1 writer and 1 reader thread, we can toss a "flag" back and forth in the form of an integer. Both threads will continuously spin but update their buffers.
WARNING: This will only work for 1 reader thread
Pseudo Code
Shared Variables:
    int Status = 0;   // 0 = ready to write; 1 = ready to read
    Buffer1 = new bytes[]
    Buffer2 = new bytes[]
    Buffer3 = new bytes[]
    BufferTmp = null

thread1
{
    while (true)
    {
        WriteData(Buffer1);
        if (Status == 0)
        {
            BufferTmp = Buffer1;
            Buffer1 = Buffer2;
            Buffer2 = BufferTmp;
            Status = 1;
        }
    }
}

thread2
{
    while (true)
    {
        ReadData(Buffer3);
        if (Status == 1)
        {
            BufferTmp = Buffer2;   // swap the hand-off buffer (Buffer2) with the read buffer
            Buffer2 = Buffer3;
            Buffer3 = BufferTmp;
            Status = 0;
        }
    }
}
Just remember, your WriteData method shouldn't create new byte objects, but update the current one. Creating new objects is expensive.
Also, you may want a Thread.Sleep(1) in an ELSE branch to accompany the IF statements; otherwise, on a single-core CPU, a spinning thread will increase the latency before the other thread gets scheduled. For example, the write thread may spin 2-3 times before the read thread gets scheduled, because the scheduler sees the write thread doing "work".

Implementing multithreading in C# (code review)

Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there are any obvious problems. No one I work with knows multithreading, so I'm really just on my own on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();

    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems;

    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems;

    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if
            (
                !this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id)
            ){
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method

    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }

        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************

        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have lightweight, sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large-scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items. Then fork a bunch of workers that pop items off that queue to process. I use a blocking queue, so that when there are no items to process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
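If you're on .NET 4 or later, a rough sketch of that master/worker shape using BlockingCollection<int> as the blocking queue might look like this (the Validate call is the existing routine from the question; everything else is illustrative):
using System;
using System.Collections.Concurrent;
using System.Threading;

var workQueue = new BlockingCollection<int>();          // item ids to validate

// Fork a fixed set of workers; they block while the queue is empty.
var workers = new Thread[Environment.ProcessorCount];
for (int i = 0; i < workers.Length; i++)
{
    workers[i] = new Thread(() =>
    {
        foreach (int id in workQueue.GetConsumingEnumerable())
            Validate(id);                                // the existing validation routine
    });
    workers[i].Start();
}

// Master thread: push ids as the database produces them.
// workQueue.Add(id);
// ...and when all 65K items have been handed out:
// workQueue.CompleteAdding();                           // lets the workers drain and exit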
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU-bound, then I would create no more than 1 worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems collections before queueing a work item, you do not add the item to the validatingItems collection until the new thread spins up. That means there could be a time gap where another thread calls ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add the item to the validatingItems collection just before queuing the work item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued, and THEN add it to the validatingItems collection.
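Putting both points together, a corrected ValidateItem might look something like this (a sketch only, not tested):
public void ValidateItem(int id)
{
    lock (locker)
    {
        if (this.validatedItems.Contains(id) || this.validatingItems.Contains(id))
            return;

        // Claim the id before queueing, so no other caller can queue it again.
        this.validatingItems.Add(id);

        bool queued = ThreadPool.QueueUserWorkItem(sender => this.Validate(id));
        if (!queued)
        {
            // Queueing failed: roll back the claim and surface the error.
            this.validatingItems.Remove(id);
            throw new InvalidOperationException("Could not queue validation for item " + id);
        }
    }
}
Validate would then no longer need to add the id to validatingItems at the top; it would only move the id from validatingItems to validatedItems when it finishes.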
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is found by not checking the return value of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list and handle the error (probably by throwing an exception).
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
When using the thread pool, I also find it more difficult to monitor the execution of the pooled threads as they perform the work. So, unless it's fire-and-forget, I am in favor of more controlled execution. I know you mentioned that your app exits after the 65K items are all completed. How are you monitoring your threads to determine if they have completed their work - i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think by queuing your items up and having your own worker threads consume off that queue, you can gain more control. Albeit, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - DataFlow. You can find examples of implementing Producer-Consumer there; in your case the database would be the producer of items to validate, and the validation routine becomes the consumer.
I would also recommend using ConcurrentDictionary<TKey, TValue> as a "concurrent" hash set, where you just populate the keys with no values :). You can potentially make your code lock-free.
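A hedged sketch of that "concurrent hash set" idea, applied to the fields from the code above (byte is just a throwaway value type; only the keys matter):
using System.Collections.Concurrent;

private ConcurrentDictionary<int, byte> validatedItems = new ConcurrentDictionary<int, byte>();
private ConcurrentDictionary<int, byte> validatingItems = new ConcurrentDictionary<int, byte>();

public void ValidateItem(int id)
{
    // TryAdd is atomic: it returns false if another caller has already claimed this id,
    // so no lock is needed around the check-then-queue sequence.
    if (!validatedItems.ContainsKey(id) && validatingItems.TryAdd(id, 0))
    {
        ThreadPool.QueueUserWorkItem(_ => this.Validate(id));
    }
}

// In Validate, add the id to validatedItems *before* removing it from validatingItems,
// so there is never a moment when a finished id is in neither set.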
