Retrieve data from threads - C#

I am trying to run two separate threads, A and B. A and B work on completely different data, and A only needs a small part of B's data. Both need to run all the time. How can I retrieve the data from thread B without interrupting B?
I am new to multithreading; could you explain with examples?

That's not how threads work: threads don't “own” data (most of the time). You can access data that was used or created on another thread just like any other data, but doing so can be very dangerous.
The problem is that most data structures are not ready to be accessed from more than one thread at the same time (they are not thread-safe). There are several ways to fix that:
1. Use lock (or some other synchronization construct) to access the shared resource. This makes sure that only one thread accesses the resource at a time, so it's safe. This is the most general approach (it works every time), probably the most common solution, and the one that is easiest to get right (just lock on the same lock object every time you access the resource). But it can hurt performance, because threads may spend a lot of time waiting on each other.
2. Don't share data between threads. If you have several operations that you want to run in parallel, some of which require resource A and others resource B, run those that require A on one thread and those that require B on another. This way, only one thread ever accesses A or B, so it's safe. A variant of this is giving each thread its own copy of the resource.
3. Use special thread-safe data structures. For example, .NET 4 has a whole namespace of thread-safe collections: System.Collections.Concurrent.
4. Use immutable data structures. If a structure never changes, it's safe to access it from several threads at the same time. For example, this is why it's safe to share a string between several threads.
5. Use special constructs that avoid locking, like Interlocked operations or volatile operations. This is how most of the structures from #3 are implemented internally, and it can perform much better than #1. But it's also very hard to get right, which is why you should avoid it unless you really know what you're doing.
You have several options, and it can all be confusing. But the best option is usually to just use a lock to access the shared resource, or to use a thread-safe structure from a library; doing that is not hard. If you find that's not enough, you can move to the more advanced alternatives, but they will be hard to get right.
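For the original question, here is a minimal sketch of option 1, assuming thread B periodically publishes a small status (a count and the name of the last item) that thread A needs. `SharedStatus`, `Publish`, `Read` and `SharedStatusDemo` are hypothetical names. B holds the lock only for the moment it copies two fields, so it keeps running essentially uninterrupted, and A always gets both fields from the same update.

```csharp
using System;
using System.Threading;

// Hypothetical holder for the small piece of B's data that A needs.
public sealed class SharedStatus
{
    private readonly object _sync = new object();
    private long _itemsProcessed;   // written by thread B
    private string _lastItem = "";  // written by thread B

    // Called by thread B: hold the lock only long enough to copy the new values in.
    public void Publish(long itemsProcessed, string lastItem)
    {
        lock (_sync)
        {
            _itemsProcessed = itemsProcessed;
            _lastItem = lastItem;
        }
    }

    // Called by thread A: returns both fields as one consistent snapshot.
    public (long ItemsProcessed, string LastItem) Read()
    {
        lock (_sync)
        {
            return (_itemsProcessed, _lastItem);
        }
    }
}

public static class SharedStatusDemo
{
    // Starts a hypothetical "thread B" that keeps working while the caller (thread A) reads.
    public static (long ItemsProcessed, string LastItem) Run(int iterations)
    {
        var status = new SharedStatus();
        var threadB = new Thread(() =>
        {
            for (int i = 1; i <= iterations; i++)
            {
                // ... B's own work on its own data would go here ...
                status.Publish(i, "item " + i);
            }
        });
        threadB.Start();

        // Thread A can call Read() at any time without stopping B.
        var snapshot = status.Read();
        Console.WriteLine($"A saw {snapshot.ItemsProcessed} items so far");

        threadB.Join();
        return status.Read();
    }
}
```

Because the lock only protects a two-field copy, B and A rarely wait on each other; the expensive work stays outside the lock.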

Related

Trying to multithread using a global UdpClient object: are collisions possible?

I'm building a p2p sharing system that will open a lot of sockets on the same ports. Right now I'm using a global UdpClient whose Receive and SendAsync methods are called on different threads with different endpoints. I'm not using a mutex yet, which is why I'm asking whether collisions are possible with this object if I'm not changing any of its state.
So far I've tried only one example, and it doesn't seem to collide, but I don't trust one example enough to draw a conclusion.
As far as I can see, UdpClient is not thread safe. Thread safe objects should specifically mention that in the documentation, and UdpClient does not seem to do that.
So without any kind of synchronization, your code is most likely not safe. Testing is not sufficient, since multithreading bugs are notoriously difficult to reproduce. When you write multithreaded code, you need to ensure that any shared data is synchronized appropriately.
Using it within a lock is probably safe. But that is not a guarantee: UI objects, for example, are only safe to use from the thread that created them. Unfortunately, such restrictions are not always well documented. A problem with locks is that they block the thread, so locks are best used for very short and fast sections of code, not while doing long-running operations like IO. And the C# compiler does not even let you await inside a lock statement (error CS1996).
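Since a lock can't be held across an await, one commonly used substitute (not mentioned in the answer) is SemaphoreSlim with a count of 1, whose WaitAsync method can be awaited. A minimal sketch, assuming a hypothetical `sendAsync` delegate that stands in for the real UdpClient.SendAsync call so the example runs without a network; `SerializedSender` is also a hypothetical name.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Serializes async operations on a shared object that is not thread-safe.
public sealed class SerializedSender
{
    // Initial count 1, maximum 1: at most one caller is inside the protected section at a time.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Func<byte[], Task> _sendAsync; // hypothetical stand-in for UdpClient.SendAsync

    public SerializedSender(Func<byte[], Task> sendAsync) => _sendAsync = sendAsync;

    public async Task SendAsync(byte[] datagram)
    {
        await _gate.WaitAsync();   // asynchronous wait; unlike lock, this may be awaited
        try
        {
            await _sendAsync(datagram);
        }
        finally
        {
            _gate.Release();       // always release, even if the send throws
        }
    }
}
```

The same caveat as for lock applies: this only helps if every piece of code that touches the shared object goes through the same gate.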
Another pattern is to use one or more concurrent queues, i.e. threads put messages on the queue, and another thread reads from the queue and sends the messages. There are many possible designs, and the best design will really depend on the specific application. However, designing concurrent systems is difficult, and I would recommend trying to create modules that are fairly independent, so you can understand and test a single module, without having to understand the entire program.
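A minimal sketch of the queue design, assuming the socket call is passed in as a hypothetical `send` delegate (in the question it would wrap UdpClient.Send); `SendQueue` is a hypothetical name. Any thread may enqueue datagrams; one dedicated thread drains the BlockingCollection and is the only thread that touches the socket, so the socket itself needs no lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// One dedicated thread owns the shared resource; other threads only enqueue work.
public sealed class SendQueue : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private readonly Thread _sender;

    // 'send' is a hypothetical stand-in for the real socket call.
    public SendQueue(Action<byte[]> send)
    {
        _sender = new Thread(() =>
        {
            // GetConsumingEnumerable blocks while the queue is empty
            // and ends once CompleteAdding has been called and the queue is drained.
            foreach (var datagram in _queue.GetConsumingEnumerable())
            {
                send(datagram); // only this thread ever touches the socket
            }
        });
        _sender.IsBackground = true;
        _sender.Start();
    }

    // Safe to call from any number of threads.
    public void Enqueue(byte[] datagram) => _queue.Add(datagram);

    // Stop accepting new items, let the sender drain the queue, then wait for it to finish.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _sender.Join();
        _queue.Dispose();
    }
}
```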
Memory is safe to read concurrently. But the same does not extend to objects, since many objects may mutate internal state when reading. Some types, like List<T>, specifically mention that concurrent reads are safe. So make sure you check the documentation before using any object concurrently.

Threadpool management of shared variables in .NET

Let's say I have a timer (e.g. a System.Timers.Timer), and we know each Elapsed event will be queued to the thread pool. If events come rapidly enough, how does the thread pool manage access to shared variables (e.g. a global int counter)? Does it use semaphores/locks under the hood?
Or does it do nothing, and simply make a copy of the shared variables when each work item starts, so that the last thread to finish sets the final value?
Unfortunately I can't really test this, because the order in which the Elapsed events fire is not guaranteed (e.g. a counter variable is not reliable); they may fire out of order.
Thanks
You have to manage multi-threaded access to shared variables yourself.
There are many answers on StackOverflow and Google explaining how to do this, search for "thread safety C#".
I've worked on huge projects with many potential threading issues, and the code I write just works. I'm damn good at writing thread safe code these days, as I've already made all of the possible mistakes.
If you are just learning to write thread-safe code, then it's easy to get overwhelmed by the huge amount of information out there. You might find pages that cover 8 different types of synchronization primitives. You will find huge discussions on the topic, and only half of them will be helpful.
If you are following the learning curve for the first time, I would recommend that you ignore said noise for now, and instead focus on mastering these two rules first:
Rule 1
If two or more threads access some shared primitive (like a long or a Dictionary or a List) and at least one of them writes to it, put a lock around every access to it. Aim for a situation where, when the lock is released, the data structure is completely updated. This is the heart of writing thread-safe code: all other rules for threading can be derived from this one.
Example:
using System.Collections.Generic;

public static class SharedState
{
    // The lock object is created once (static readonly) and shared by every thread that touches dict.
    private static readonly object _dictLock = new object();

    // This data structure can be accessed by multiple threads, so every access must hold _dictLock.
    public static readonly Dictionary<string, int> dict = new Dictionary<string, int>();

    public static void AddHelloIfMissing()
    {
        lock (_dictLock)
        {
            if (!dict.ContainsKey("Hello"))
            {
                dict.Add("Hello", 42);
            }
        } // Lock exits: data structure is now completely 100% updated. Google "atomic access C#".
    }
}
Rule 2
Try not to have locks within locks. This can create deadlocks if the locks are entered in the wrong order. If you only lock around the primitives (e.g. dictionary, long, string, etc), then this shouldn't be an issue.
Guideline 1
If you are just learning, use nothing but lock (see how to use lock). It's difficult to go wrong if you use just this, as the lock is automatically released when the lock block exits, even if an exception is thrown. You can graduate to other types of locks, like reader-writer locks, later on. Don't bother with ConcurrentDictionary or Interlocked.Increment yet; focus on getting the basics correct.
Guideline 2
Try to spend as little time in locks as possible. Don't put a lock around a huge block of code; put locks around the smallest possible portions of the code, usually around a dictionary or a long. A lock is blindingly fast unless it's contested, so this technique tends to work well for creating thread-safe code that is fast.
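A minimal sketch of Guideline 2, assuming a hypothetical `PriceCache` whose `ComputePrice` step stands in for real I/O or computation: the expensive work runs outside the lock, and only the dictionary access is locked.

```csharp
using System.Collections.Generic;

public static class PriceCache
{
    private static readonly object _lock = new object();
    private static readonly Dictionary<string, decimal> _prices = new Dictionary<string, decimal>();

    public static void Refresh(string symbol)
    {
        // Slow work happens outside the lock, so other threads are not kept waiting.
        decimal price = ComputePrice(symbol);

        // Only the dictionary update is inside the lock.
        lock (_lock)
        {
            _prices[symbol] = price;
        }
    }

    public static bool TryGet(string symbol, out decimal price)
    {
        // Reads are locked too, because Dictionary is not safe to read while another thread writes.
        lock (_lock)
        {
            return _prices.TryGetValue(symbol, out price);
        }
    }

    // Hypothetical placeholder for an expensive computation or I/O call.
    private static decimal ComputePrice(string symbol) => symbol.Length * 10m;
}
```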
Cause of 95% of meaningful threading issues?
In my experience, the single biggest cause of thread-unsafe code is Dictionary. Even ConcurrentDictionary is not immune: it needs manual locking to be correct if one logical access is spread over multiple lines (for example, a check followed by an update). If you get this right, you will eliminate 95% of meaningful threading issues in your code.
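To illustrate the multi-line problem: `if (!d.ContainsKey(k)) d[k] = 1; else d[k]++;` is a race even on a ConcurrentDictionary, because another thread can run between the calls. Besides wrapping the sequence in a lock as described above, ConcurrentDictionary offers single-call methods such as AddOrUpdate and GetOrAdd. A minimal sketch counting words from several threads (`WordCounter` is a hypothetical name):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WordCounter
{
    public static ConcurrentDictionary<string, int> Count(IReadOnlyList<string> words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            // One call does the check and the update, so no increment is lost.
            // The update delegate may run more than once under contention, so keep it free of side effects.
            counts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });
        return counts;
    }
}
```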
The thread pool can't magically make your shared mutable variables thread-safe. It has no control over them and it does not even know they exist.
Be aware of the fact that timer ticks can happen concurrently (even at low frequencies) and after the timer has been disposed. You need to perform any synchronization necessary.
The thread pool itself is thread-safe in the sense that it can successfully process concurrent work items (which is kind of the point).
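For the timer in the question, a minimal sketch of doing that synchronization yourself, following the lock-first advice from the previous answer. `TickCounter` is a hypothetical name; `Attach` subscribes to System.Timers.Timer.Elapsed, whose handler may run on several thread-pool threads at once. Reads also take the lock, so they see the latest value.

```csharp
// Hypothetical counter updated from timer ticks that may run concurrently on pool threads.
public sealed class TickCounter
{
    private readonly object _lock = new object();
    private int _ticks;

    // Elapsed is raised on a thread-pool thread, and ticks can overlap.
    public void Attach(System.Timers.Timer timer) => timer.Elapsed += (sender, e) => RecordTick();

    public void RecordTick()
    {
        lock (_lock)
        {
            _ticks++; // read-modify-write, so it must be protected
        }
    }

    public int Ticks
    {
        get { lock (_lock) { return _ticks; } }
    }
}
```

To check it without waiting on a real timer, you can call RecordTick from Parallel.For to simulate many overlapping ticks.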

ReaderWriterLock vs lock{}

Please explain the main differences and when I should use each.
The focus is on multi-threaded web applications.
lock allows only one thread to execute the code at a time. ReaderWriterLock may allow multiple threads to read at the same time or give one thread exclusive access for writing, so it might be more efficient. If you are using .NET 3.5 or later, ReaderWriterLockSlim is even faster. So if your shared resource is read more often than it is written, use ReaderWriterLockSlim. A good example is a file that you read very often (on each request) and whose contents you update rarely. When you read from the file, you enter a read lock so that many requests can read it at once, and when you decide to write, you enter a write lock. Using a lock on the file basically means that you can serve one request at a time.
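A minimal sketch of that pattern with ReaderWriterLockSlim, using a hypothetical in-memory `DocumentCache` in place of the file from the answer. Readers share the lock; a writer waits for current readers to leave and then gets exclusive access. Each Enter is paired with an Exit in a finally block, so an exception cannot leave the lock held.

```csharp
using System;
using System.Threading;

// Hypothetical cache of a rarely changed, often read document.
public sealed class DocumentCache : IDisposable
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private string _contents = "";

    // Many requests may be inside Read() at the same time.
    public string Read()
    {
        _rw.EnterReadLock();
        try
        {
            return _contents;
        }
        finally
        {
            _rw.ExitReadLock();
        }
    }

    // A writer waits for current readers to leave, then has exclusive access.
    public void Update(string newContents)
    {
        _rw.EnterWriteLock();
        try
        {
            _contents = newContents;
        }
        finally
        {
            _rw.ExitWriteLock();
        }
    }

    public void Dispose() => _rw.Dispose();
}
```

For a single string reference this is only illustrative; the pattern pays off when a read or write touches several fields or a structure that is not thread-safe, such as a Dictionary.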
Consider using ReaderWriterLock if you have lots of threads that only need to read the data, these threads are getting blocked waiting for the lock, and you don’t often need to change the data.
However ReaderWriterLock may block a thread that is waiting to write for a long time.
Therefore only use ReaderWriterLock after you have confirmed you get high contention for the lock in “real life” and you have confirmed you can’t redesign your locking design to reduce how long the lock is held for.
Also consider whether you can store the shared data in a database and let it take care of all the locking, as this is much less likely to leave you tracking down hard-to-find bugs, provided a database is fast enough for your application.
In some cases you may also be able to use the ASP.NET cache to handle shared data, and just remove the item from the cache when the data changes. The next read can then put a fresh copy in the cache.
Remember:
"The best kind of locking is the locking you don't need (i.e. don't share data between threads)."
Monitor and the underlying "syncblock" that can be associated with any reference object (the mechanism underneath C#'s lock) support exclusive execution: only one thread can ever hold the lock. This is simple and efficient.
ReaderWriterLock (or, in .NET 3.5, the better ReaderWriterLockSlim) provides a more complex model. Avoid it unless you know it will be more efficient (i.e. you have performance measurements to back you up).
The best kind of locking is the locking you don't need (i.e. don't share data between threads).
ReaderWriterLock allows you to have multiple threads hold the ReadLock at the same time... so that your shared data can be consumed by many threads at once. As soon as a WriteLock is requested no more ReadLocks are granted and the code waiting for the WriteLock is blocked until all the threads with ReadLocks have released them.
The WriteLock can only ever be held by one thread, allowing your 'data updates' to appear atomic from the point of view of the consuming parts of your code.
The lock statement, on the other hand, only allows one thread to enter at a time, with no allowance for threads that are simply trying to consume the shared data.
ReaderWriterLockSlim is a newer, better-performing version of ReaderWriterLock, with better support for recursion and the ability to have a thread move smoothly from a lock that is essentially a ReadLock to the WriteLock (UpgradeableReadLock).
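A minimal sketch of the upgradeable mode, assuming a hypothetical `LookupTable` that is mostly read and occasionally filled in. Only one thread at a time can hold the upgradeable read lock, which makes its check-then-add safe, while threads calling TryGet can keep reading concurrently until the upgrade to the write lock happens.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical lookup table: most calls only read, occasionally a missing entry is added.
public sealed class LookupTable : IDisposable
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<int, string> _items = new Dictionary<int, string>();

    // Plain readers: any number can hold the read lock together.
    public bool TryGet(int key, out string value)
    {
        _rw.EnterReadLock();
        try
        {
            return _items.TryGetValue(key, out value);
        }
        finally
        {
            _rw.ExitReadLock();
        }
    }

    public string GetOrAdd(int key, Func<int, string> create)
    {
        // Only one thread at a time may hold the upgradeable lock,
        // but it is compatible with threads holding ordinary read locks.
        _rw.EnterUpgradeableReadLock();
        try
        {
            if (_items.TryGetValue(key, out var existing))
            {
                return existing;
            }

            // Upgrade: waits until the plain readers have left, then grants exclusive access.
            _rw.EnterWriteLock();
            try
            {
                var value = create(key);
                _items.Add(key, value);
                return value;
            }
            finally
            {
                _rw.ExitWriteLock();
            }
        }
        finally
        {
            _rw.ExitUpgradeableReadLock();
        }
    }

    public void Dispose() => _rw.Dispose();
}
```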
ReaderWriterLock/Slim is specifically designed to help you lock efficiently in a multiple-consumer/single-producer scenario. Doing so with the lock statement is possible, but not efficient. RWL/S gets the upper hand by being able to spin aggressively to acquire the lock. That also helps you avoid lock convoys, a problem with the lock statement where a thread gives up its time slice when it cannot acquire the lock and falls behind because it won't be rescheduled for a while.
It is true that ReaderWriterLockSlim is FASTER than ReaderWriterLock. But the memory consumption of ReaderWriterLockSlim is downright outrageous. Attach a memory profiler and see for yourself. I would pick ReaderWriterLock over ReaderWriterLockSlim any day.
I would suggest looking through http://www.albahari.com/threading/part4.aspx#_Reader_Writer_Locks. It talks about ReaderWriterLockSlim (which you want to use instead of ReaderWriterLock).

What is meant by a 'thread-safe' object?

I have used the generic queue from the C# collections, and everyone says that it is better to use System.Collections.Generic.Queue because of thread safety.
Please advise: is Queue the right choice, and how is it thread-safe?
"Thread safe" is a bit of an unfortunate term because it doesn't really have a solid definition. Basically it means that certain operations on the object are guaranteed to behave sensibly when the object is being operated on via multiple threads.
Consider the simplest example: a counter. Suppose you have two threads that are incrementing a counter. If the sequence of events goes:
1. Thread one reads from counter, gets zero.
2. Thread two reads from counter, gets zero.
3. Thread one increments zero, writes one to counter.
4. Thread two increments zero, writes one to counter.
Then notice how the counter has "lost" one of the increments. Simple increment operations on counters are not threadsafe; to make them threadsafe you can use locks, or Interlocked.Increment.
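The lost-increment scenario above can be reproduced in a few lines. This sketch (hypothetical `CounterDemo`) runs the same loop with counter++ and with Interlocked.Increment; only the second is guaranteed to reach the full total. The unsafe result can vary from run to run, which is exactly why testing alone doesn't prove thread safety.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Unsafe: counter++ is read, add, write; two threads can read the same old value.
    public static int UnsafeCount(int threads, int incrementsPerThread)
    {
        int counter = 0;
        Parallel.For(0, threads, t =>
        {
            for (int i = 0; i < incrementsPerThread; i++) counter++;
        });
        return counter; // can be less than threads * incrementsPerThread
    }

    // Safe: Interlocked.Increment performs the read-add-write as one atomic operation.
    public static int SafeCount(int threads, int incrementsPerThread)
    {
        int counter = 0;
        Parallel.For(0, threads, t =>
        {
            for (int i = 0; i < incrementsPerThread; i++) Interlocked.Increment(ref counter);
        });
        return counter; // always threads * incrementsPerThread
    }
}
```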
Similarly with queues. Non-threadsafe queues can "lose" enqueues the same way that non-threadsafe counters can lose increments. Worse, non-threadsafe queues can even crash or produce crazy results if you use them improperly in a multi-threaded scenario.
The difficulty with "thread safe" is that it is not clearly defined. Does it simply mean "will not crash"? Does it mean that sensible results will be produced? For example, suppose you have a "threadsafe" collection. Is this code correct?
if (!collection.IsEmpty) Console.WriteLine(collection[0]);
No. Even if the collection is "threadsafe", that doesn't mean that this code is correct; another thread could have made the collection empty after the check but before the writeline and therefore this code could crash, even if the object is allegedly "threadsafe". Actually determining that every relevant combination of operations is threadsafe is an extremely difficult problem.
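One way to avoid that particular check-then-act gap is a collection whose API combines the check and the action into a single call. A minimal sketch with ConcurrentQueue<T> (`QueueDemo` is a hypothetical name): TryPeek and TryDequeue either succeed or report failure atomically, so this code cannot crash on a queue that another thread just emptied. The peeked value may still be stale by the time it is used, so this removes the crash, not the need to think about which combinations of operations are correct.

```csharp
using System.Collections.Concurrent;

public static class QueueDemo
{
    // The check and the read happen in one call, so no other thread can empty
    // the queue between them (unlike "if (!IsEmpty) ... [0]").
    public static string DescribeHead(ConcurrentQueue<string> queue)
    {
        return queue.TryPeek(out var head) ? "head: " + head : "queue is empty";
    }

    // Same idea for removal: TryDequeue either removes an item or reports that there was none.
    public static int DrainCount(ConcurrentQueue<string> queue)
    {
        int drained = 0;
        while (queue.TryDequeue(out _)) drained++;
        return drained;
    }
}
```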
Now to come to your actual situation: anyone who is telling you "you should use the Queue class, it is better because it is threadsafe" probably does not have a clear idea of what they're talking about. First off, Queue is not threadsafe. Second, whether Queue is threadsafe or not is completely irrelevant if you are only using the object on a single thread! If you have a collection that is going to be accessed on multiple threads, then, as I indicated in my example above, you have an extremely difficult problem to solve, regardless of whether the collection itself is "threadsafe". You have to determine that every combination of operations you perform on the collection is also threadsafe. This is a very difficult problem, and if it is one you face, then you should use the services of an expert on this difficult topic.
A type that is thread safe can be safely accessed from multiple threads without concern for concurrency. This usually means that the type is read-only.
Interestingly enough, Queue<T> is not thread safe - it can support concurrent reads as long as the queue isn't modified but that isn't the same thing as thread safety.
In order to think about thread safety consider what would happen if two threads were accessing a Queue<T> and a third thread came along and began either adding to or removing from this Queue<T>. Since this type does not restrict this behavior it is not thread safe.
In dealing with multithreading, you usually have to deal with concurrency issues. The term "concurrency issues" refers to issues that are specifically introduced by the possibility of interleaving instructions from two different execution contexts on a resource shared by both. Here, in terms of thread safety, the execution contexts are two threads within a process; however, in related subjects they might be processes.
Thread safety measures are put in place to achieve two goals primarily. First is to regain determinism with regard to what happens if the threads context-switch (which is otherwise controlled by the OS and thus basically nondeterministic in user-level programs), to prevent certain tasks from being left half-finished or two contexts writing to the same location in memory one after the other. Most measures simply use a little bit of hardware-supported test-and-set instructions and the like, as well as software-level synchronization constructs to force all other execution contexts to stay away from a data type while another one is doing work that should not be interrupted.
Usually, objects that are read-only are thread-safe. Many objects that are not read-only are able to have data accesses (read-only) occur with multiple threads without issue, if the object is not modified in the middle. But this is not thread safety. Thread safety is when all manner of things are done to a data type to prevent any modifications to it by one thread from causing data corruption or deadlock even when dealing with many concurrent reads and writes.

Only one of multiple threads to execute a particular code path

I have multiple threads starting at roughly the same time, all executing the same code path. Each thread needs to write records to a table in a database. If the table doesn't exist, it should be created. Obviously two or more threads could see the table as missing and try to create it.
What is the preferred approach to ensure that this particular block of code is executed only once, by only one thread?
While I'm writing in C# on .NET 2.0, I assume that the approach would be framework/language neutral.
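Something like this should work...
class TableInitializer
{
    // static so that every instance (and therefore every thread) shares the same lock
    private static readonly object lockObject = new object();
    private static bool tableReady; // only read or written while holding lockObject

    public void CreateTableIfNotPresent()
    {
        lock (lockObject)
        {
            if (tableReady) return; // another thread already did the work
            // check for table presence and create it if necessary,
            // all inside this block
            tableReady = true;
        }
    }
}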
Have your threads call the CreateTableIfNotPresent function. The lock block ensures that no two threads can execute the code inside it concurrently, so no thread will see the table as not present while another is creating it. The lock object must be shared by all the threads (hence static); otherwise each instance would lock on its own object.
This is a classic application for either a Mutex or a Semaphore.
A mutex ensures that a specific piece of code (or several pieces of code) can only be run by a single thread at a time. You could be clever and use a different mutex for each table, or simply constrain the whole initialisation block to one thread at a time.
A semaphore (or set of semaphores) could perform exactly the same function.
Most lock implementations will use a mutex internally, so look at what lock code is already available in the language or libraries you are using.
@ebpower has it right that in certain applications it would actually be more efficient to catch the exception caused by an attempt to create the same table multiple times, though this may not be the case in your example.
However there are many other ways of proceeding. For example, you could use a single-threaded ExecutorService (sorry, I could only find a Java reference) that has responsibility for creating any tables that your worker threads discover are missing. If it gets two requests for the same table, it simply ignores the later ones.
A variant on a Memoizer (remembering table references, creating them first if necessary) would also work under the circumstances. The book Java Concurrency In Practice walks through the implementation of a nice Memoizer class, but this would be pretty simple to port to any other language with effective concurrency building blocks.
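In C#, a memoizer along those lines can be built from ConcurrentDictionary and Lazy<T>; Lazy<T> plays roughly the role that FutureTask plays in the Java version. This is a sketch, not the book's code: `Memoizer` is a hypothetical name, and for the question the key would be the table name and the factory a hypothetical function that checks for the table and creates it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Remembers one result per key; the factory runs at most once per key even under contention.
public sealed class Memoizer<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _cache =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();
    private readonly Func<TKey, TValue> _factory;

    public Memoizer(Func<TKey, TValue> factory) => _factory = factory;

    public TValue Get(TKey key)
    {
        // GetOrAdd may create several Lazy wrappers in a race, but only one is stored and returned,
        // and ExecutionAndPublication lets only one thread run the factory for that stored wrapper.
        var lazy = _cache.GetOrAdd(key,
            k => new Lazy<TValue>(() => _factory(k), LazyThreadSafetyMode.ExecutionAndPublication));
        return lazy.Value;
    }
}
```

The discarded Lazy wrappers are never evaluated, which is what keeps the factory from running twice.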
This is what Semaphores are for.
You may not even need to bother with locks since your database shouldn't let you create multiple tables with the same name. Why not just catch the appropriate exceptions and if two threads try to create the same table, one wins and continues on, while the other recovers and continues on.
I'd use a thread sync object such as ManualResetEvent, though it sounds to me like you're inviting a race condition, which may mean you have a design problem.
Some posts have suggested Mutexes; this is overkill unless your threads are running in different processes.
Others have suggested using locks; this is fine, but locking can lead to over-pessimistic locks on data, which can negate the benefit of using threads in the first place.
A more fundamental question is why are you doing it this way at all? What benefit does threading bring to the problem domain? Does concurrency solve your problem?
You may want to try a static constructor to get a reference to the table.
According to MSDN (.NET 2.0), "A static constructor is used to initialize any static data, or to perform a particular action that needs to be performed once only."
Also, the CLR automatically guarantees that a static constructor executes only once per AppDomain and is thread-safe.
For more info, check Chapter 8 of CLR via C# by Jeffrey Richter.
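A minimal sketch of the static-constructor approach, with a hypothetical `OrdersTable` class; the table check and creation are represented by a comment. The CLR runs the static constructor once, before the first access to a static member, and other threads that touch the type wait until it completes. `InitCount` exists only to make that observable.

```csharp
// Hypothetical holder: the CLR runs the static constructor once, before first use,
// and blocks other threads that touch the type until it has finished.
public static class OrdersTable
{
    public static readonly string Name;
    public static int InitCount; // for demonstration only: how many times the constructor ran

    static OrdersTable()
    {
        InitCount++;
        // Hypothetical: check whether the table exists and create it if it does not.
        Name = "Orders";
    }
}
```

One caveat: if the static constructor throws (for example, because the database is unreachable), the type stays unusable for the rest of the AppDomain, and every later access throws a TypeInitializationException.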
