What can happen in two concurrent NetworkStream.BeginWrite calls? - c#

I have two methods on my Sender class:
public void SendMessage(OutgoingMessage msg)
{
try
{
stream.BeginWrite(msg.TcpData, 0, 16, messageSentCallback, msg);
}
catch
{
// ...
}
}
private void messageSentCallback(IAsyncResult result)
{
stream.EndWrite(result);
if (result.IsCompleted)
onDataSent(result.AsyncState as OutgoingMessage);
}
Other parts of the program can call the SendMessage() method (if they have access to the Sender). Since the program runs in a multithreaded environment, multiple threads have access to the Sender object.
I have 2 questions:
Q1) Would making two concurrent calls to the SendMessage method be able to mess up the TCP communication (by filling the TCP outgoing buffer with mixed data)?
Q2) Would enclosing the stream.BeginWrite() call in a lock { } block solve this problem?
As far as I understand, the call to BeginWrite simply stores the data into the TCP outgoing buffer. Is that right?

Yes, a lock is required to avoid problems. However, I would switch to a different approach, both to solve the concurrency problems and to make the thread interaction easier to reason about.
You could have a shared queue where several threads put requests that need to be written to the stream. A single thread then reads requests from the queue and makes write operations. Now it's much easier to understand what is going on and you don't have to worry about synchronizing the writes. You could use one of the concurrent collections like ConcurrentQueue.
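The shared-queue idea above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the class name QueuedStreamWriter and its members are hypothetical, and it uses BlockingCollection (which wraps a ConcurrentQueue by default) so that the single writer thread can block while the queue is empty.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: many threads enqueue, exactly one loop writes to the stream.
public sealed class QueuedStreamWriter : IDisposable
{
    // BlockingCollection uses a ConcurrentQueue internally unless told otherwise.
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private readonly Stream _stream;
    private readonly Task _writer;

    public QueuedStreamWriter(Stream stream)
    {
        _stream = stream;
        // LongRunning hints that this loop should get a dedicated thread.
        _writer = Task.Factory.StartNew(WriteLoop, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any thread; each message is written as one unit.
    public void Enqueue(byte[] message) => _queue.Add(message);

    private void WriteLoop()
    {
        // Blocks while the queue is empty and ends once CompleteAdding has been
        // called and the queue has been drained.
        foreach (byte[] message in _queue.GetConsumingEnumerable())
            _stream.Write(message, 0, message.Length);
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // let the writer drain the queue and exit
        _writer.Wait();
        _queue.Dispose();
    }
}
```

Callers would replace direct stream writes with Enqueue(msg.TcpData); only the writer loop ever touches the stream, so no lock around the write itself is needed.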

MSDN says:
As long as there is one unique thread for the write operations and one
unique thread for the read operations, there will be no
cross-interference between read and write threads and no
synchronization is required.
Which means that if you have more than one thread sending data, you should use a lock to make sure only one thread calls BeginWrite at a time, so that data is sent without any interference.
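A minimal sketch of the locking approach, under some assumptions: the class name LockedSender is hypothetical, and it uses the synchronous Stream.Write inside the lock rather than BeginWrite, simply because that makes the "one writer at a time" guarantee easy to see.

```csharp
using System.IO;

// Hypothetical wrapper: serializes all writes to one stream.
public sealed class LockedSender
{
    // Private lock object, so no outside code can lock on it by accident.
    private readonly object _writeLock = new object();
    private readonly Stream _stream;

    public LockedSender(Stream stream)
    {
        _stream = stream;
    }

    public void Send(byte[] data)
    {
        // Only one thread at a time can be inside this block, so the bytes
        // of two messages cannot be interleaved on the stream.
        lock (_writeLock)
        {
            _stream.Write(data, 0, data.Length);
        }
    }
}
```

The trade-off is that a slow write blocks every other sender for its duration, which is what the queue-based and SocketAsyncEventArgs suggestions try to avoid.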

If you would like to minimize blocking and maintain high concurrency with multiple writer threads, I would recommend using Socket.SendAsync, which accepts a SocketAsyncEventArgs.
You could preallocate a number of SocketAsyncEventArgs objects (each with its associated buffer space) to use as writers. Then, rather than a lock, you would use a SemaphoreSlim that allows a number of 'simultaneous' writes, pushing the synchronization lower down the protocol stack.
Here is a Code Gallery sample that could get you started (also demonstrates pooling for your buffers.)
Here is a codeproject article that also demonstrates its use.
Good luck!

Related

C# timed semaphore to synchronize two threads and a single output buffer

ADDED PREFACE
Here I want to better explain the scenario of my application.
I need a Windows service to "convert" a SerialPort into a TCP port. For instance, let's say I have a serial ticket printer connected to a COM port that accepts a raw ASCII stream, and I want to access it through TCP sockets from the network. The result should be that the serial printer becomes a network printer; my service should link many TCP sockets to the COM port.
This is the scheme:
The main problem is that the COM port has a single connection, but I can have many simultaneous connections from network clients. I need to synchronize writes to the COM port, and to read output from the COM port and copy it to all connected TCP clients.
With TCP connections I cannot know when a write stream is really finished, because a network client can send a print job without closing its connection and send another job after a while.
Serial printers are inline printers with no start/end command; they simply receive ASCII characters and print them in the order received.
This is why I need to be sure that network input will not be mixed, and why I want a timer that can determine that a job has really ended before releasing the synchronized write lock.
ORIGINAL QUESTION
I have two threads: A, B.
Both threads have to write to a single output buffer through the WriteToOutput() method, and I want to be sure that the output will not be mixed if both A and B want to write to the output at the same time.
First, I need a simple semaphore:
private object locker = new object();
public void WriteToOutput(byte[] threadBuffer)
{
lock (locker)
{
//... copy threadBuffer to outputBuffer
}
}
But I need a little more safety to separate the outputs, because a thread can empty its buffer only for it to be filled again right after the lock is released.
So in case of contention, if thread A gets the lock, I want the second thread B to wait for a while, let's say a tick of 1 s. If thread A wants to write something more during this time, it has priority, and B has to wait another tick. If thread A does not write for an entire tick, then it can really release the lock and thread B can get it.
Just for correction - that's a monitor, not a semaphore.
As for the rest, this sounds like a weird multi-threaded design, and it's going to be brittle and unreliable. Make it obvious when it's safe to release the shared resource - relying on any kind of timing for synchronization is a terrible idea.
The problem is that the WriteToOutput method is obviously not a good point for the synchronization! If you need to ensure multiple writes from the same thread are serialized, you need to move your synchronization point somewhere else. Or, pass a Stream instead of byte[], and read that until it's closed inside the lock - this will effectively do the same thing, move the responsibility to the callee. Just make sure you don't lock it up forever by forgetting to close the stream :) Another alternative would be to use a BlockingCollection<byte[]>. It's hard to tell what's the best option when we don't really know what you're actually trying to do.
EDIT:
Okay, serial port communication is about the only proper use of timing like this I can think of. Of course, it can also be a bit tricky to handle the communication on a non-realtime system.
The best way to solve this would be to have a single endpoint for all your access to the serial port which would handle the communication and synchronization. Instead of calling the method from your other threads, you would just post data that the endpoint would read. However, this requires you to have a way of identifying the other threads - and I'm not sure if you have something like that (perhaps the EndPoint of the TCP socket?). The simplest way would be using the BlockingCollection:
private readonly object _syncObject = new object();
public void SendData(BlockingCollection<byte[]> data)
{
lock (_syncObject)
{
byte[] buffer;
while (data.TryTake(out buffer, TimeSpan.FromSeconds(1)))
{
// Send the data
}
}
}
This will keep reading and sending data from the queue as long as another buffer arrives within one second of the previous one. If it takes more than a second, the method will exit and another thread will have a chance.
In the socket receive thread, you'd declare the blocking collection - this will vary based on your implementation of the receive code. If you have a single instance of some class for each of the different sockets, you can just declare it as an instance field. If not, you could use ThreadLocal. This assumes you're using manual threads, one per socket - if not, you'll need a different storage.
private readonly BlockingCollection<byte[]> _dataQueue = new BlockingCollection<byte[]>();
private void ReceiveHandler(byte[] data)
{
// This assumes the byte array passed is already a copy
_dataQueue.Add(data);
SendData(_dataQueue);
}
This is definitely not the best way to handle this, but it's certainly the simplest I can think of right now - it's barely any code at all, and it only uses lock and BlockingCollection.
I'd take a look at ReaderWriterLockSlim.
https://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim(v=vs.110).aspx

Is this a correct use of multithreading design? (C#)

I'm building a small chat program that consists of a server and client. The server keeps a list of clients that it interacts with.
I've got two worker threads on the server. One handles incoming client connections. The other handles incoming client messages.
Now, since both threads interact with a List called 'clients', I've done something like this.
// The clients list looks something like this...
List<TcpClient> clients;
// This is running on one thread.
ConnectionHandler()
{
while(true)
{
// Wait for client to connect, etc. etc.
// Now, add the client to my clients List.
lock(clients)clients.Add(myNewClient);
}
}
// This is running on another thread.
ClientHandler()
{
while(true)
{
lock(clients)
{
/*
This will be handling things like incoming messages
and clients disconnecting (clients being removed from
the 'clients' List)
*/
}
}
}
Is this a correct use of locks to prevent my List from being altered by two different threads at once?
I haven't had any problems doing this so far, but I just want to make sure it's correct.
This is correct, but make sure that ClientHandler does not hold the lock for too long. It should never hold the lock while blocking (e.g. on an I/O operation on a socket). If you violate this rule, your throughput will be destroyed, even though the code remains correct.
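One common way to follow that rule is to copy the list while holding the lock and do the slow work on the copy. The sketch below assumes a hypothetical ClientRegistry class; it is generic so the example does not depend on real sockets, but TClient could be TcpClient.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry: the lock protects only the list itself.
public sealed class ClientRegistry<TClient>
{
    private readonly object _sync = new object();
    private readonly List<TClient> _clients = new List<TClient>();

    public void Add(TClient client)
    {
        lock (_sync) _clients.Add(client);
    }

    public bool Remove(TClient client)
    {
        lock (_sync) return _clients.Remove(client);
    }

    public void ForEachClient(Action<TClient> action)
    {
        TClient[] snapshot;
        lock (_sync)
        {
            // Hold the lock only long enough to copy the current members.
            snapshot = _clients.ToArray();
        }

        // Potentially slow work (socket I/O, etc.) runs without the lock,
        // so the connection thread can keep adding clients meanwhile.
        foreach (TClient client in snapshot)
            action(client);
    }
}
```

Because the callback runs on a snapshot, it may see a client that another thread has just removed, so the callback still has to tolerate a disconnected socket.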
Do you have a single writer and multiple readers? Have a look at ReaderWriterLock and these collections.
Looks kind of OK. Chat servers are perilously tricky for the multithreading-challenged. Exceptions could be raised inside locks, for example when a server-client socket object gets a disconnect but, before its thread can remove the object from the list, another thread locks the list and tries to write to the disconnected socket.
A note (on top): since you don't show how the field is initialized (i.e. I can't see how you do it, or when you might destroy and re-initialize it), make sure you're always locking the same instance; see, e.g., "Lock on an object that might change during code execution".
Looks fine to me but I would make this correction:
private readonly List<TcpClient> clients = new List<TcpClient>();
You can also create the list in the constructor but keep it as readonly. This is key to making sure you're locking on the same object. Otherwise, if you happen to recreate the clients list, your code would stop being thread-safe.

How can two threads access a common array of buffers with minimal blocking ? (c#)

I'm working on an image processing application where I have two threads on top of my main thread:
1 - CameraThread that captures images from the webcam and writes them into a buffer
2 - ImageProcessingThread that takes the latest image from that buffer for filtering.
The reason why this is multithreaded is because speed is critical and I need to have CameraThread to keep grabbing pictures and making the latest capture ready to pick up by ImageProcessingThread while it's still processing the previous image.
My problem is about finding a fast and thread-safe way to access that common buffer and I've figured that, ideally, it should be a triple buffer (image[3]) so that if ImageProcessingThread is slow, then CameraThread can keep on writing on the two other images and vice versa.
What sort of locking mechanism would be the most appropriate for this to be thread-safe ?
I looked at the lock statement but it seems like it would make a thread block-waiting for another one to be finished and that would be against the point of triple buffering.
Thanks in advance for any idea or advice.
J.
This could be a textbook example of the Producer-Consumer Pattern.
If you're going to be working in .NET 4, you can use the IProducerConsumerCollection<T> and associated concrete classes to provide your functionality.
If not, have a read of this article for more information on the pattern, and this question for guidance in writing your own thread-safe implementation of a blocking First-In First-Out structure.
Personally, I think you might want to look at a different approach. Rather than writing to a centralized "buffer" that you have to manage access to, could you switch to an approach that uses events? Once the camera thread has "received" an image, it could raise an event that passes the image data off to the code that actually handles the image processing.
An alternative would be to use a Queue, which is a FIFO (first in, first out) data structure. It is not thread-safe, so you would have to lock it, but the lock would only be held for the very short time it takes to put an item in the queue. There are also other queue classes out there that are thread-safe that you could use.
Using your approach, there are a number of issues that you would have to contend with: blocking while you are accessing the array, deciding what happens after you run out of available array slots, and so on.
Given the amount of processing needed for a picture, I don't think that a simple locking scheme would be your bottleneck. Measure before you start wasting time on the wrong problem.
Be very careful with 'lock-free' solutions; they are always more complicated than they look.
And you need a Queue, not an array.
If you can use .NET 4, I would use the ConcurrentQueue.
You will have to run some performance metrics, but take a look at lock free queues.
See this question and its associated answers, for example.
In your particular application, though, your processor is only really interested in the most recent image. In effect this means you only really want to maintain a queue of two items (the new item and the previous item) so that there is no contention between reading and writing. You could, for example, have your producer remove old entries from the queue once a new one is written.
Edit: having said all this, I think there is a lot of merit in what is said in Mitchel Sellers's answer.
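If only the most recent image matters, one possible simplification of the idea above is a single shared slot that the producer overwrites and the consumer empties. This is a sketch of a swapped-in technique (an atomic reference exchange), not what the answer literally describes; the class name LatestValue is hypothetical.

```csharp
using System.Threading;

// Hypothetical single-slot holder for "newest item wins" hand-off.
public sealed class LatestValue<T> where T : class
{
    private T _latest;

    // Publishes a new item and returns the one it replaced (null if the
    // consumer had already taken it), so the producer can recycle that buffer.
    public T Publish(T item) => Interlocked.Exchange(ref _latest, item);

    // Takes the newest item, or returns null if nothing has been published
    // since the last call.
    public T TryTake() => Interlocked.Exchange(ref _latest, null);
}
```

The camera thread would call Publish for each frame and the processing thread would call TryTake when it is ready for more work. Neither call blocks, and frames that arrive while the processor is busy are simply skipped.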
I would look at using a ReaderWriterLockSlim which allows fast read and upgradable locks for writes.
This isn't a direct answer to your question, but it may be better to rethink your concurrency model. Locks are a terrible way to synchronize anything -- too low level, error prone, etc. Try to rethink your problem in terms of message-passing concurrency:
The idea here is that each thread is its own tightly contained message loop, and each thread has a "mailbox" for sending and receiving messages -- we're going to use the term MailboxThread to distinguish these types of objects from plain-Jane threads.
So instead of having two threads accessing the same buffer, you have two MailboxThreads sending and receiving messages between one another (pseudocode):
let filter =
    while true
        let image = getNextMsg() // blocks until the next message is received
        process image
let camera(filterMailbox) =
    while true
        let image = takePicture()
        filterMailbox.SendMsg(image) // sends a message asynchronously
let filterMailbox = Mailbox.Start(filter)
let cameraMailbox = Mailbox.Start(camera(filterMailbox))
Now your processing threads don't know or care about any buffers at all. They just wait for messages and process them whenever they're available. If you send too many messages for the filterMailbox to handle, those messages get enqueued to be processed later.
The hard part here is actually implementing your MailboxThread object. Although it requires some creativity to get right, it's wholly possible to implement these types of objects so that they only hold a thread open while processing a message, and release the executing thread back to the thread pool when there are no messages left to handle (this implementation allows you to terminate your application without dangling threads).
The advantage here is that threads send and receive messages without worrying about locking or synchronization. Behind the scenes, you need to lock your message queue while enqueuing or dequeuing a message, but that implementation detail is completely transparent to your client-side code.
Just an idea.
Since we're talking about only two threads, we can make some assumptions.
Let's use your triple buffer idea. Assuming there is only 1 writer and 1 reader thread, we can toss a "flag" back and forth in the form of an integer. Both threads will continuously spin but update their buffers.
WARNING: This will only work for 1 reader thread
Pseudo Code
Shared Variables:
int Status = 0; //0 = ready to write; 1 = ready to read
Buffer1 = New bytes[]
Buffer2 = New bytes[]
Buffer3 = New bytes[]
BufferTmp = null
thread1
{
while(true)
{
WriteData(Buffer1);
if (Status == 0)
{
BufferTmp = Buffer1;
Buffer1 = Buffer2;
Buffer2 = BufferTmp;
Status = 1;
}
}
}
thread2
{
while(true)
{
ReadData(Buffer3);
if (Status == 1)
{
BufferTmp = Buffer2;
Buffer2 = Buffer3;
Buffer3 = BufferTmp;
Status = 0;
}
}
}
Just remember, your WriteData method shouldn't create new byte objects, but update the current one. Creating new objects is expensive.
Also, you may want a Thread.Sleep(1) in an ELSE branch to accompany the IF statements. Otherwise, on a single-core CPU, a spinning thread will increase the latency before the other thread gets scheduled. E.g. the write thread may spin 2-3 times before the read thread gets scheduled, because the scheduler sees the write thread doing "work".

C# thread pool limiting threads

Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread as it will need to be executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data each time) varies. It can be anywhere from 1 to 200.
Everything I've read seems to indicate the ThreadPool class being the prime candidate. Here is where things get tricky. I might need to fire off this thing, say, 100 times, but I can only have 3 threads at most running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event-driven approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3
static void Main()
{
for (int i = 0; i < 10; i++)
new Thread(Go).Start();
}
static void Go()
{
while (true)
{
s.WaitOne();
Thread.Sleep(100); // Only 3 threads can get here at once
s.Release();
}
}
}
Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you do this:
// wait your turn (decrement); acquire before the try so that
// Release only runs if the wait actually succeeded
S.WaitOne();
try
{
// do your thing
}
finally
{
// release so others can go (increment)
S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
This solution isn't perfect.
If you want something a little cleaner, and more efficient, I'd recommend going with a BlockingQueue approach wherein you enqueue the work you want performed into a global Blocking Queue object.
Meanwhile, you have three threads (which you created--not in the threadpool), popping work out of the queue to perform. This isn't that tricky to setup and is very fast and simple.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
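The "three threads popping work out of a blocking queue" approach described above could look roughly like this in .NET 4 or later. It is a minimal sketch under assumptions: BoundedWorkers and RunAll are hypothetical names, BlockingCollection stands in for the "BlockingQueue" the answer mentions, and each job is assumed to handle its own exceptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class BoundedWorkers
{
    // Runs every job on one of workerCount dedicated threads and returns
    // once all jobs have finished. At most workerCount jobs run at a time.
    public static void RunAll(IEnumerable<Action> jobs, int workerCount)
    {
        using (var queue = new BlockingCollection<Action>())
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Blocks while the queue is empty; the loop ends after
                    // CompleteAdding once the queue has been drained.
                    // Assumption: jobs do not throw (an unhandled exception
                    // on a worker thread would bring down the process).
                    foreach (Action job in queue.GetConsumingEnumerable())
                        job();
                });
                worker.Start();
                workers.Add(worker);
            }

            foreach (Action job in jobs)
                queue.Add(job);
            queue.CompleteAdding();

            foreach (Thread worker in workers)
                worker.Join();
        }
    }
}
```

For the question's scenario, each job would wrap one web request and the call would be BoundedWorkers.RunAll(jobs, 3). The process-wide ThreadPool settings are left untouched.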
It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.

Implementing multithreading in C# (code review)

Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there are any obvious problems. No one I work with knows multithreading, so I'm really just on my own on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
/// <summary>
/// The object to lock on in this class, for multithreading purposes.
/// </summary>
private static object locker = new object();
/// <summary>Items that have been validated.</summary>
private HashSet<int> validatedItems;
/// <summary>Items that are currently being validated.</summary>
private HashSet<int> validatingItems;
/// <summary>Remove an item from the index if its links are bad.</summary>
/// <param name="id">The ID of the item.</param>
public void ValidateItem(int id)
{
lock (locker)
{
if
(
!this.validatedItems.Contains(id) &&
!this.validatingItems.Contains(id)
){
ThreadPool.QueueUserWorkItem(sender =>
{
this.Validate(id);
});
}
}
} // method
private void Validate(int itemId)
{
lock (locker)
{
this.validatingItems.Add(itemId);
}
// *********************************************
// Time-consuming routine to validate an item...
// *********************************************
lock (locker)
{
this.validatingItems.Remove(itemId);
this.validatedItems.Add(itemId);
}
} // method
} // class
The thread pool is a convenient choice if you have light weight sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items. Then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items to process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU bound, then I would create no more than 1 worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems sets before queuing a work item, you do not add the item to validatingItems until the new thread spins up. That means there could be a time gap in which another thread calls ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add the item to validatingItems just before queuing the work item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued and THEN add it to validatingItems.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is not checking the return value of QueueUserWorkItem. Queuing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems set and handle the error (probably by throwing an exception).
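Putting the two fixes described above together, a corrected version might look like the sketch below. It is an illustration only: the class name ItemValidationSketch, the injected validate delegate (a stand-in for the time-consuming routine), the bool return value, and the IsValidated helper are assumptions that are not in the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class ItemValidationSketch
{
    private readonly object locker = new object();
    private readonly HashSet<int> validatedItems = new HashSet<int>();
    private readonly HashSet<int> validatingItems = new HashSet<int>();
    private readonly Action<int> validate; // stand-in for the slow validation routine

    public ItemValidationSketch(Action<int> validate)
    {
        this.validate = validate;
    }

    // Returns true if this call started a validation, false if the id was
    // already validated or in progress, or if queuing failed.
    public bool ValidateItem(int id)
    {
        lock (locker)
        {
            // Claim the id inside the same lock as the check, so two callers
            // cannot both pass the check. HashSet.Add returns false for duplicates.
            if (validatedItems.Contains(id) || !validatingItems.Add(id))
                return false;
        }

        bool queued = false;
        try
        {
            queued = ThreadPool.QueueUserWorkItem(_ => Validate(id));
        }
        finally
        {
            // Undo the claim if the work item was never queued.
            if (!queued)
                lock (locker) validatingItems.Remove(id);
        }
        return queued;
    }

    private void Validate(int id)
    {
        validate(id);
        lock (locker)
        {
            validatingItems.Remove(id);
            validatedItems.Add(id);
        }
    }

    public bool IsValidated(int id)
    {
        lock (locker) return validatedItems.Contains(id);
    }
}
```

The lock is still never held while the slow validation runs; it only guards the two sets.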
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
When using the thread pool, I also find it more difficult to monitor the execution of threads from the pool as they perform the work. So, unless it's fire and forget, I am in favor of more controlled execution. I know you mentioned that your app exits after the 65K items are all completed. How are you monitoring your threads to determine if they have completed their work -- i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think by queuing your items up and having your own worker threads consume off that queue, you can gain more control. Admittedly, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into the Task Parallel Library Dataflow documentation on MSDN. You can find examples of implementing Producer-Consumer there; in your case, the database producing items to validate would be the producer and the validation routine would be the consumer.
Also recommend using ConcurrentDictionary<TKey, TValue> as a "Concurrent" hash set where you just populate the keys with no values :). You can potentially make your code lock-free.
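As a rough sketch of the "concurrent hash set" idea: ConcurrentDictionary.TryAdd succeeds for only one caller per key, so it can replace the lock-protected check-then-add. The wrapper name ClaimedIds and the unused byte value are illustrative choices.

```csharp
using System.Collections.Concurrent;

// Hypothetical set-like wrapper: only the keys matter, the byte value is a dummy.
public sealed class ClaimedIds
{
    private readonly ConcurrentDictionary<int, byte> _ids =
        new ConcurrentDictionary<int, byte>();

    // True only for the first caller that claims this id, even under contention.
    public bool TryClaim(int id) => _ids.TryAdd(id, 0);

    public bool Contains(int id) => _ids.ContainsKey(id);
}
```

ValidateItem could then queue work only when TryClaim(id) returns true, without taking any explicit lock.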
