For the last 48 hours I have been trying to understand multithreading and socket programming. I managed to implement socket programming as long as I did not use multithreading. I am new to both topics and have already asked two or three questions about them on Stack Overflow.
After a lot of googling I found an article that explains socket programming and multithreading, but I still have many doubts about it, and I got stuck at Figure 5 of the article.
private void AcceptConnections()
{
    while (true)
    {
        // Accept a connection
        Socket socket = _serverSocket.Accept();
        ConnectionInfo connection = new ConnectionInfo();
        connection.Socket = socket;
        // Create the thread for the receives.
        connection.Thread = new Thread(ProcessConnection);
        connection.Thread.IsBackground = true;
        connection.Thread.Start(connection);
        // Store the socket
        lock (_connections) _connections.Add(connection);
    }
}
In the very last line you can see a lock has been taken and 3-4 lines above a delegate ProcessConnection is bound.
At this point, I am not clear how this lock works. What happens behind the scenes when the lock is taken? Why did the author use a lock here? What would have happened if no lock had been taken? How does the ProcessConnection thread work? What things are happening simultaneously?
I got confused by all these questions.
I know there is a list of questions here, but it would be a great help if you could assist me in understanding the methodology of working with multithreading.
connection.Thread.Start(connection) starts a new thread with a call to ProcessConnection, passing connection as the state argument. Execution in the current thread continues immediately with the next line while ProcessConnection is executed in the new thread.
ProcessConnection gets the Socket object from the ConnectionInfo object passed to it by AcceptConnections and waits to receive data from the socket. When it receives data, it loops through all of the other ConnectionInfo objects in the connections collection and sends this data to each of them in sequence.
So what's running concurrently here? Well, we have the initial thread (call it Thread 0) executing AcceptConnections in an endless loop. And then for each socket connection that we've accepted, we have a thread executing ProcessConnection.
The locks are needed because ProcessConnection uses foreach to loop through the known connections to send them data. If Thread 0 were to add a new connection to the collection while the collection is being enumerated in the foreach, an InvalidOperationException would be thrown in ProcessConnection.
The lock does prevent the concurrency issue in this case, but it also causes a potential performance problem. It doesn't only prevent AcceptConnections from modifying the collection while ProcessConnection is enumerating it. It also prevents any two threads executing ProcessConnection from enumerating the collection at the same time. A better choice in this case would be a ReaderWriterLockSlim which would allow multiple threads to read the collection concurrently.
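As a rough illustration of the ReaderWriterLockSlim suggestion, here is a minimal sketch. The ConnectionRegistry class and its members are hypothetical names made up for this example; they do not come from the article.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper around the shared list; not the article's code.
public sealed class ConnectionRegistry<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();

    // Writers are exclusive: no reader or other writer runs at the same time.
    public void Add(T item)
    {
        _rw.EnterWriteLock();
        try { _items.Add(item); }
        finally { _rw.ExitWriteLock(); }
    }

    // Readers may overlap with each other, but never with a writer.
    public void ForEach(Action<T> action)
    {
        _rw.EnterReadLock();
        try
        {
            foreach (T item in _items) action(item);
        }
        finally { _rw.ExitReadLock(); }
    }
}
```

With this shape, AcceptConnections would call Add and each ProcessConnection thread would call ForEach. The callback passed to ForEach should not call Add on the same registry, because the default lock does not allow taking the write lock while the read lock is held.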
I'm assuming _connections is a List<ConnectionInfo>. Lists are not thread-safe, and this thread adds items to that list. If another thread removed an item at the same time, the results would be unpredictable. So you have to make sure no other thread can access the list at the same time, using a lock.
connection.Thread.Start(connection); starts a new thread that will begin running immediately or some time soon. The current thread (the code you're seeing here) has no control over it. The new thread is given a ConnectionInfo object, though, so it knows which socket to work on. While the current thread keeps listening for new clients, the ProcessConnection function handles the recently accepted client.
In C#, and I think in CLR in general, every object might have a monitor associated with it. Here _connections is a collection that is possibly shared with the threads started from this very function (they probably remove connections from the collection when they are done). Collections in C# are not synchronized by default, you have to do it explicitly, thus the lock(_connections) statement to prevent races on the collection.
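To make the "behind the scenes" part concrete: a lock statement is, roughly, shorthand for Monitor.Enter and Monitor.Exit wrapped in try/finally. The sketch below spells that out by hand. The LockExpansion class and AddLocked method are illustrative names only.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class LockExpansion
{
    // Equivalent in effect to: lock (list) { list.Add(value); }
    public static void AddLocked(List<int> list, int value)
    {
        bool lockTaken = false;
        try
        {
            // Blocks until no other thread holds the monitor of 'list'.
            Monitor.Enter(list, ref lockTaken);
            list.Add(value);
        }
        finally
        {
            // The monitor is released even if Add throws.
            if (lockTaken) Monitor.Exit(list);
        }
    }
}
```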
Consider two threads running simultaneously: A is reading and B is writing. While A is in the middle of its reading code, its CPU time slice ends and thread B continues.
Is there any way to keep A from giving up the CPU until it finishes, while B can still start or continue?
You need to understand that you have almost no control over when CPU is given back and to whom it is given. The operating system does that. To have control on that, you'd need to be the operating system. The only things you can usually do are:
start a thread
set thread priority, so that some threads are more likely to get CPU time than others
put a thread to sleep immediately and ask the operating system to wake it up upon some condition, maybe with some timeout (waiting time limit)
As a special case, or a typical use case, the last point is often also provided as a shorthand:
put a thread to sleep immediately for a specified amount of time
By "sleep" I mean that this thread is paused and will not get any CPU time, even if all CPUs are idle, unless the thread is woken up by the OS due to some condition.
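The handful of controls listed above map onto .NET calls roughly as in the following sketch. ThreadControlDemo is a made-up name, and ManualResetEventSlim is just one of several ways to express "sleep until a condition, with a timeout".

```csharp
using System;
using System.Threading;

public static class ThreadControlDemo
{
    public static bool Run()
    {
        using (var signal = new ManualResetEventSlim(false))
        {
            bool woken = false;
            var worker = new Thread(() =>
            {
                // Sleep until signalled, or give up after 5 seconds.
                woken = signal.Wait(TimeSpan.FromSeconds(5));
            });
            // Only a hint to the scheduler; it guarantees no ordering.
            worker.Priority = ThreadPriority.BelowNormal;
            worker.Start();

            Thread.Sleep(50); // shorthand: sleep for a fixed amount of time
            signal.Set();     // the condition the worker is waiting for
            worker.Join();    // sleep until the worker has finished
            return woken;
        }
    }
}
```

Run returns true when the worker was woken by the signal rather than by the timeout.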
Furthermore, in a typical case, there is no "thread A and thread B that switch CPU time between them", but there is "lots of threads from various processes and the operating system itself, and you two threads". This means that when your thread A loses the CPU, most probably it will not be the thread B that gets the time now. Some other thread from somewhere else will get it, and at some future point of time, maybe your thread A or maybe thread B will get it back.
This means that there is very little you can be sure of. You can be sure that your threads are
either dead
or sleeping
or proceeding 'forward' in a hard to determine order
If you need to ensure that some threads are synchronized, you must either not start them simultaneously, or put them to sleep at precise moments and wake them up in a precise order.
You've just said in comments:
You know, if in the middle of A CPU time finishes, data that has been retrieved is not complete
This means that you need to ensure that thread B does not try to touch the data before thread A finishes writing it. But also, if you think about it, you need to ensure that thread A doesn't start writing next data if the thread B is now reading previous data.
This means synchronization. This means that threads A and B must wait if the other thread is touching the data. This means that they need to be put to sleep and woken up when the other thread finishes.
In C#, the easiest way to do that is to use the lock(x) statement. When a thread enters a lock() section, it proceeds only if it is able to get the lock. If not, it is put to sleep. It can't get the lock if another thread was faster and got it first. However, a thread releases the lock when it leaves the section. At that point, one of the sleeping threads is woken up and given the lock.
lock(fooo) { // <- this line means 'acquire the lock or sleep'
    iam.doing(myjob);
    very.important(things);
    thatshouldnt.be.interrupted();
    byother(threads);
} // <- this line means 'release the lock'
So, when a thread gets through the lock(fooo){ line, you can't be sure it won't be interrupted. Oh, surely it will be. The OS will switch threads back and forth, to other processes, and so on. But you can be sure that no other thread of your app will be inside that code block. If other threads try to get inside while your thread holds the lock, they immediately fall asleep at the lock line. One of them will be woken up later, when your thread leaves that block.
There's one more thing. The lock() statement requires a parameter. I wrote fooo there. You need to pass something that will act as the lock. It can be any object, even a plain object:
private object thelock = new object();
private void dosomething()
{
    lock(thelock)
    {
        foobarize(thebaz);
    }
}
However, you must ensure that all threads use the same lock instance. Writing code like
private void dosomething()
{
    object thelock = new object();
    lock(thelock)
    {
        foobarize(thebaz);
    }
}
is nonsense, since every thread executing those lines will try to lock on its own new object instance, will see it as "free" (it's new, just created, no one took it earlier), and will immediately get into the protected code block.
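A small, self-contained way to see the shared lock instance at work is a counter incremented from several threads. This is only a sketch; SharedLockDemo, TheLock and Run are names invented for the illustration.

```csharp
using System.Threading;

public static class SharedLockDemo
{
    // One lock instance shared by every thread.
    private static readonly object TheLock = new object();
    private static int _counter;

    public static int Run(int threadCount, int incrementsPerThread)
    {
        _counter = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int n = 0; n < incrementsPerThread; n++)
                {
                    lock (TheLock)
                    {
                        _counter++; // read-modify-write, protected by the lock
                    }
                }
            });
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
        return _counter;
    }
}
```

Because every increment happens under the same lock, Run(4, 10000) returns 40000. If each thread created its own lock object inside the loop, increments could be lost and the total could come out lower.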
Now, you wrote about using ConcurrentQueue. This class provides safety mechanisms against concurrency problems. You can be sure that adding, reading, or removing items from that queue is already safe; you don't need to add synchronization for that. If you observe any ill effects, then most probably you put an item into the collection and then kept modifying that item. A concurrent collection will not guard you against that. It can only make sure that add/remove/etc. are safe. It has no knowledge of or control over what you do to the items.
In short, if some thread B tries to read items from the collection, then in thread A this is NOT safe:
concurrentcoll.Add(item);
item.x = 5;
item.foobarize();
but this is safe:
item.x = 5;
item.foobarize();
concurrentcoll.Add(item);
// and do not touch the Item anymore here.
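The same "finish the item, then hand it over" rule, written against an actual ConcurrentQueue, might look like the sketch below. Message and PublishDemo are hypothetical types used only for this example.

```csharp
using System.Collections.Concurrent;

// Hypothetical item type for the example.
public sealed class Message
{
    public int X;
    public string Text;
}

public static class PublishDemo
{
    public static readonly ConcurrentQueue<Message> Queue =
        new ConcurrentQueue<Message>();

    // Producer side: complete all mutation before enqueueing.
    public static void Produce(int x, string text)
    {
        var item = new Message();
        item.X = x;
        item.Text = text;
        Queue.Enqueue(item); // from here on the producer must not touch 'item'
    }

    // Consumer side: returns false when the queue is currently empty.
    public static bool TryConsume(out Message item)
    {
        return Queue.TryDequeue(out item);
    }
}
```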
I'm creating a client-server structure where the server has a thread for every client.
That specific thread only sends and receives data. In the main thread of the server I want to read the input that the client thread received. But it's possible that the input is being modified by the client thread at the same time the main thread is reading it. How would I prevent this? I have read about locks but have no idea how to implement them in this situation.
A second part of my question: the client thread is a loop that constantly reads from a NetworkStream, and thus blocks until it can read something. Can I call a function from my main thread (one that would send something through that NetworkStream) that the existing, looping client thread must execute?
Sorry, I can't give any code right now, but I think it's clear enough?
It sounds like a producer-consumer design might be a good fit for your problem. In general terms, the client threads will put any received data into a (thread safe) queue and not modify it after that - any new data that arrives will go to a new slot in the queue. The main thread can then wait for new data in any of the queues and process it once it arrives. The main thread could either check on all the queues periodically, or (better) receive some sort of notification when data is placed in a queue, so that it can sleep while nothing is happening and won't eat CPU time.
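One possible shape for such a producer-consumer setup uses BlockingCollection from System.Collections.Concurrent. This is a sketch under assumptions: a single stand-in "client" thread produces three strings, and ProducerConsumerDemo is an invented name.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ProducerConsumerDemo
{
    public static List<string> Run()
    {
        var received = new List<string>();
        using (var queue = new BlockingCollection<string>())
        {
            // Stand-in for a client thread that receives network data.
            var client = new Thread(() =>
            {
                for (int i = 0; i < 3; i++) queue.Add("message " + i);
                queue.CompleteAdding(); // no more data will arrive
            });
            client.Start();

            // Consumer: blocks while the queue is empty (no busy waiting)
            // and leaves the loop once CompleteAdding has been called.
            foreach (string msg in queue.GetConsumingEnumerable())
            {
                received.Add(msg);
            }
            client.Join();
        }
        return received;
    }
}
```

The consuming loop sleeps inside the collection until an item arrives, which is the "notification instead of periodic checking" variant described above.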
Since you ask about locks: here is a basic lock-based implementation as an alternative to queues. Perhaps it will help you understand the principle:
class IncomingClientData
{
    private List<byte> m_incomingData = new List<byte>();
    private readonly object m_lock = new object();

    public void Append(IEnumerable<byte> data)
    {
        lock(m_lock)
        {
            m_incomingData.AddRange(data);
        }
    }

    public List<byte> ReadAndClear()
    {
        lock(m_lock)
        {
            List<byte> result = m_incomingData;
            m_incomingData = new List<byte>();
            return result;
        }
    }
}
In this example, your client threads would call Append with the data they have received, and the main thread could collect all the data that arrived since the last check by calling ReadAndClear.
This is made thread-safe by locking all the code in both functions on m_lock, which is just a regular plain object - you can lock on any object in C#, but I believe this can be confusing and actually lead to subtle bugs if used carelessly, so I almost always use a dedicated object to lock on. Only one thread at a time can hold the lock on an object, so the code of those functions will only run in one thread at a time. For example, if your main thread calls ReadAndClear while the client thread is still busy appending data to the list, the main thread will wait until the client thread leaves the Append function.
It's not required to make a new class for this, but it can prevent accidents, because we can carefully control how the shared state is being accessed. For example, we know that it is safe to return the internal list in ReadAndClear() because there can be no other reference to the same list at that time.
Now for your second question: Just plain calling a method won't ever cause the method to run on a different thread, no matter which class the method is in. Invoke is a special feature of the WinForms UI thread, you'd have to implement that functionality yourself if you want to Invoke something in your worker threads. Internally, Invoke works by placing the code you want to run into a queue of all things that are supposed to run on the UI thread, including e.g. UI events. The UI thread itself is basically a loop that always pulls the next piece of work from that queue, and then performs that work, then takes the next item from the queue and so on. That is also why you shouldn't do long work in an event handler - as long as the UI thread is busy running your code, it won't be able to process the next items in its queue, so you'll hold up all the other work items / events that occur.
If you want your client threads to run a certain function, you have to actually provide the code for that - e.g. have the client threads check some queue for commands from the main thread.
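A minimal sketch of such a command queue follows. CommandLoop, Post and Stop are hypothetical names, and the sketch assumes a dedicated worker thread whose only job is to run queued commands. A thread that is blocked in a NetworkStream read cannot service the queue at the same time, so in practice sending is often given its own thread like this one.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class CommandLoop
{
    private readonly BlockingCollection<Action> _commands =
        new BlockingCollection<Action>();
    private readonly Thread _thread;

    public CommandLoop()
    {
        _thread = new Thread(() =>
        {
            // Runs each posted command on this worker thread, in order.
            foreach (Action command in _commands.GetConsumingEnumerable())
                command();
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    // Called from any thread, e.g. the main thread.
    public void Post(Action command) { _commands.Add(command); }

    // Lets the worker finish the queued commands, then waits for it to exit.
    public void Stop()
    {
        _commands.CompleteAdding();
        _thread.Join();
    }
}
```

The main thread would call Post(() => ...) with the code that sends on the stream; that code then executes on the worker thread, not on the caller's thread.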
I'm building a small chat program that consists of a server and client. The server keeps a list of clients that it interacts with.
I've got two worker threads on the server. One handles incoming client connections. The other handles incoming client messages.
Now, since both threads interact with a List called 'clients', I've done something like this.
// The clients list looks something like this...
List<TcpClient> clients;

// This is running on one thread.
ConnectionHandler()
{
    while(true)
    {
        // Wait for client to connect, etc. etc.
        // Now, add the client to my clients List.
        lock(clients) clients.Add(myNewClient);
    }
}

// This is running on another thread.
ClientHandler()
{
    while(true)
    {
        lock(clients)
        {
            /*
            This will be handling things like incoming messages
            and clients disconnecting (clients being removed from
            the 'clients' List)
            */
        }
    }
}
Is this a correct use of locks to prevent my List from being altered by two different threads at once?
I haven't had any problems doing this so far, but I just want to make sure it's correct.
This is correct, but make sure that ClientHandler does not hold the lock for too long. It should never hold the lock while blocking (e.g. caused by an IO operation on a socket). If you violate this rule you will find your throughput being destroyed (still maintaining correctness).
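One common way to keep the lock short is to copy the list while holding the lock and do the slow work on the copy afterwards. The sketch below assumes a generic ClientList wrapper and a caller-supplied send callback; both are invented for the illustration and are not the asker's code.

```csharp
using System;
using System.Collections.Generic;

public sealed class ClientList<T>
{
    private readonly List<T> _clients = new List<T>();
    private readonly object _sync = new object();

    public void Add(T client) { lock (_sync) _clients.Add(client); }

    public bool Remove(T client) { lock (_sync) return _clients.Remove(client); }

    public void Broadcast(Action<T> send)
    {
        T[] snapshot;
        lock (_sync)
        {
            snapshot = _clients.ToArray(); // short critical section: copy only
        }
        foreach (T client in snapshot)
        {
            send(client); // potentially slow I/O runs with no lock held
        }
    }
}
```

The trade-off is that a client removed right after the snapshot was taken may still be passed to send once, so the callback has to tolerate a disconnected client.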
Do you have a single writer and multiple readers? Have a look at ReaderWriterLock and these collections.
Looks kind of OK. Chat servers are perilously tricky for those not used to multithreading. Exceptions could be raised inside locks, for example when a server-side client socket gets a disconnect but, before its thread can remove the object from the list, another thread locks the list and tries to write to the disconnected socket.
A note (on top): since you don't show how you initialize the field (i.e. I don't see how you do it, or whether you might destroy and re-initialize it, etc.), make sure you're always locking the same instance. See, for example, "Lock on an object that might change during code execution".
Looks fine to me but I would make this correction:
private readonly List<TcpClient> clients = new List<TcpClient>();
You can also create the list in the constructor, but keep it readonly. This is key to making sure you're always locking on the same object. Otherwise, if you happened to recreate the clients list, your code would stop being thread-safe.
EDIT
The good news is that the weird behavior explained below is not related to the ConcurrentBag; that is, threads related to the concurrent bag are eventually freed up. Yet the threads themselves are kept alive for some reason. In the example code given I clearly create a thread and destroy all references to it, yet a garbage collection does not pick it up. In fact, the only moments I have discovered so far when threads get totally destroyed are when the concurrent bag itself gets collected (if I don't collect the concurrent bag, the threads stay alive), or when we create a number of other threads.
ORIGINAL (The original problem and some motivation. The sourcecode below this part explains the important aspects)
This is a repeat of a question I asked months ago regarding the ConcurrentBag (Possible memoryleak in ConcurrentBag?). Apparently, the ConcurrentBag doesn't behave as it should, and now I'm worried about the stability of some running legacy code. This question is a response to my findings when answering the following question: How can I free-up memory used by a Parallel.Task?
The scenario hasn't changed a bit. I have a web service which handles messages. Clients can send in messages using some public API. These requests are handled by a thread from the .NET ThreadPool, which in turn adds the message to a ConcurrentBag. Next, there are concurrent tasks consuming messages from the ConcurrentBag and handling them. There are peak hours with many messages being added and consumed, and there are moments when nobody is doing anything. So the thread pool changes its number of active threads quite extensively during the running time (the desired running time is 'forever').
Now it turns out that as soon as a thread calls ConcurrentBag.Add (or any method on the ConcurrentBag), the thread is kept alive by a reference held internally in the ConcurrentBag and is only released when the ConcurrentBag itself is actually cleaned up by the GC. In my scenario, this will lead to an unbounded number of 'waste' threads that are never cleaned up, since the ConcurrentBag is alive throughout the application's lifetime.
The previously given solution of simply emptying the bag doesn't help either, since that is not where the problem lies. The problem (presumably) is that ConcurrentBag doesn't call Dispose on the ThreadLocal it holds for the current thread, since there is no way for the ConcurrentBag to know when a thread ends. (Yet it should do cleanup whenever you access the bag again.)
That said, should we stop using ConcurrentBag in most cases, or are there solutions I can add to work around this problem?
My test code:
Action collectAll = () =>
{
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
};
ConcurrentBag<object> bag = new ConcurrentBag<object>();
Thread producerThread = new Thread(new ThreadStart(delegate
{
// produce 1 item
bag.Add(new object());
}));
// create a weakreference to the newly created thread so we can track its status
WeakReference trackingReference = new WeakReference(producerThread);
// start the thread and let it complete
producerThread.Start();
producerThread.Join();
// thread can now be set to null, after a full GC collect, we assume that the thread is gone
producerThread = null;
collectAll();
Console.WriteLine("Thread is still alive: " + trackingReference.IsAlive); // returns true
// consume all items from the bag and collect again, the thread should surely be disposed by now
object output;
bag.TryTake(out output);
collectAll();
Console.WriteLine("Thread is still alive: " + trackingReference.IsAlive); // returns true
// ok, finally remove all references to the Bag
bag = null;
collectAll();
Console.WriteLine("Thread is still alive: " + trackingReference.IsAlive); // returns false
Given the following code snippet (found somewhere while learning threading):
public class BlockingQueue<T>
{
    private readonly object sync = new object();
    private readonly Queue<T> queue;

    public BlockingQueue()
    {
        queue = new Queue<T>();
    }

    public void Enqueue(T item)
    {
        lock (sync)
        {
            queue.Enqueue(item);
            Monitor.PulseAll(sync);
        }
    }

    public T Dequeue()
    {
        lock (sync)
        {
            while (queue.Count == 0)
                Monitor.Wait(sync);
            return queue.Dequeue();
        }
    }
}
What I want to understand is:
Why is there a while loop?
while (queue.Count == 0)
    Monitor.Wait(sync);
And what is wrong with this:
if(queue.Count == 0)
    Monitor.Wait(sync);
In fact, whenever I see similar code it uses a while loop. Can anyone please help me understand why one is used over the other?
Thank you.
You need to understand what Pulse, PulseAll, and Wait are doing. The Monitor maintains two queues: the waiting queue and the ready queue. When a thread calls Wait it is moved into the waiting queue. When a thread calls Pulse it moves one and only one thread from the waiting queue to the ready queue. When a thread calls PulseAll it moves all threads from the waiting queue to the ready queue. Threads in the ready queue are eligible to reacquire the lock at any moment, but only after the current holder releases it of course.
Based on this knowledge it is fairly easy to understand why you must recheck the queue count when using PulseAll. It is because all dequeueing threads will eventually wake and will want to attempt to extract an item from queue. But, what if there is only one item in the queue to begin with? Obviously, we must recheck the queue count to avoid dequeueing an empty queue.
So what would be the conclusion if you had used Pulse instead of PulseAll? There would still be a problem with the simple if check. The reason is because a thread from the ready queue is not necessarily going to be the next thread to acquire the lock. That is because the Monitor does not give preference to a Wait call above an Enter call.
The while loop is a fairly standard pattern when using Monitor.Wait. This is because pulsing a thread does not have semantic meaning by itself. It is only a signal that the lock state has changed. When threads wake up after blocking on Wait they should recheck the same condition that was originally used to block the thread to see if the thread can now proceed. Sometimes it cannot and so it should block some more.
The best rule of thumb here is that if there is doubt about whether to use an if check or a while check then always choose a while loop because it is safer. In fact, I would take this to the extreme and suggest to always use a while loop because there is no inherent advantage in using the simpler if check and because the if check is almost always the wrong choice anyway. A similar rule holds for choosing whether to use Pulse or PulseAll. If there is doubt about which one to use then always choose PulseAll.
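To see the pattern under load, the sketch below reuses the BlockingQueue from the question (field initialization inlined) and adds a hypothetical BlockingQueueDemo harness with four consumers competing for 100 items. The harness is an assumption made for illustration, not part of the original code.

```csharp
using System.Collections.Generic;
using System.Threading;

public class BlockingQueue<T>
{
    private readonly object sync = new object();
    private readonly Queue<T> queue = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (sync)
        {
            queue.Enqueue(item);
            Monitor.PulseAll(sync); // wake every waiting consumer
        }
    }

    public T Dequeue()
    {
        lock (sync)
        {
            // Recheck after every wake-up: another consumer may already
            // have taken the item that caused the pulse.
            while (queue.Count == 0)
                Monitor.Wait(sync);
            return queue.Dequeue();
        }
    }
}

public static class BlockingQueueDemo
{
    public static long Run()
    {
        var q = new BlockingQueue<int>();
        long sum = 0;
        var consumers = new Thread[4];
        for (int i = 0; i < consumers.Length; i++)
        {
            consumers[i] = new Thread(() =>
            {
                // Each consumer takes exactly 25 of the 100 items.
                for (int n = 0; n < 25; n++)
                    Interlocked.Add(ref sum, q.Dequeue());
            });
            consumers[i].Start();
        }
        for (int v = 0; v < 100; v++) q.Enqueue(v);
        foreach (Thread t in consumers) t.Join();
        return Interlocked.Read(ref sum);
    }
}
```

Run returns 4950, the sum of 0 through 99, because every item is dequeued exactly once. Replacing the while with an if would let a woken consumer call queue.Dequeue() on an empty queue, which throws InvalidOperationException.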
You have to keep checking whether the queue is still empty. Using only if would check it once, wait for a while, then dequeue. What if the queue is still empty at that time? BANG! Queue underflow error...
With an if condition, queue.Count == 0 is not checked again once the waiting thread reacquires the lock, so a queue underflow error is possible. Because of concurrency we have to recheck the condition every time the thread wakes up, which is why the check is written as a loop.
The reason it could go wrong on Unix is the spurious wake-up, possibly caused by OS signals. It is a side effect that is not guaranteed never to happen on Windows either. This is not legacy; it is how the OS works, at least if monitors are implemented in terms of condition variables.
Definition: a spurious wake-up is the rescheduling of a thread sleeping at a condition-variable wait site that was not triggered by an action from the current program's threads (like Pulse()).
This inconvenience could be masked in managed languages, e.g. by the queues. Before returning from Wait(), the framework could check that the running thread was actually requested for scheduling; if it does not find itself in a run queue, it can go back to sleep, hiding the problem.
if (queue.Count == 0)
will do.
Using the while loop pattern for a "wait for and check condition" context is a legacy leftover, I think, because non-Windows, non-.NET monitor variables can be triggered without an actual Pulse.
In .NET, your private monitor variable cannot be triggered without the queue being filled, so you don't need to worry about queue underflow after waiting on the monitor. But it is really not a bad habit to use a while loop for "wait for and check condition".