It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I have read something about thread safety, but I want to understand which operations I need to put a lock around.
For example, let's say I want a thread-safe queue.
If the dequeue operation returns the first element if there is one, when do I need a lock? Let's say I'm using an abstract linked list for the entries.
Should write actions be locked? Or reading ones? Or both?
I hope someone can explain this to me or give me some links.
Synchronization in concurrent scenarios is a very wide topic. Essentially whenever two or more threads have some shared state between them (counter, data structure) and at least one of them mutates this shared state concurrently with a read or another mutation from a different thread, the results may be inconsistent. In such cases you will need to use some form of synchronization (of which locks are a flavor).
Now going to your question, a typical code that does a dequeue is the following (pseudocode):
if(queue is not empty)
queue.dequeue
which may be executed concurrently by multiple threads. Although some queue implementations internally synchronize both the emptiness check and the dequeue operation, that is not enough: a thread executing the above code may be interrupted between the check and the actual dequeue, so it may find the queue empty when it reaches the dequeue even though the check returned true. A lock over the entire sequence is needed:
lock(locker)
{
if(queue is not empty)
queue.dequeue
}
Note that the above may be implemented as a single thread-safe operation by some data structures, but I'm just trying to make a point here.
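To make the locked check-then-dequeue concrete, here is a minimal C# sketch. The LockedQueue<T> wrapper and its member names are made up for illustration; they are not from any library.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper class, for illustration only.
public sealed class LockedQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _locker = new object();

    public void Enqueue(T item)
    {
        lock (_locker)              // writers take the same lock as readers
        {
            _items.Enqueue(item);
        }
    }

    // The emptiness check and the removal run under one lock, so no other
    // thread can empty the queue between the check and the dequeue.
    public bool TryDequeue(out T item)
    {
        lock (_locker)
        {
            if (_items.Count > 0)
            {
                item = _items.Dequeue();
                return true;
            }
            item = default(T);
            return false;
        }
    }
}
```

Callers then use the return value of TryDequeue instead of checking for emptiness themselves.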
The best guide to locking and threading I have found is this page (it is the text I consult when working with locking and threading):
http://www.albahari.com/threading/
You want the section "Locking and Thread Safety", but read the rest as well; it is very well written.
For a basic overview see MSDN: Thread Synchronization. For a more detailed introduction I recommend reading Amazon: Concurrent Programming on Windows.
You need locks on objects that are subject to non-atomic operations.
Adding an object to a list -> non-atomic
Assigning a value to a byte or an int -> atomic
As the simplest rule of thumb, all shared mutable data requires holding a lock while you access it.
You need a lock when writing, because you need to ensure that no two threads are writing the same fields at the same time.
You need a lock when reading, because another thread could be halfway through writing the data, so it could be in an inconsistent state. Inconsistent data can produce incorrect output or crashes.
Locks have their own set of problems associated with them (Google for "dining philosophers"), so I tend to avoid using explicit locks whenever possible. Higher-level building blocks like ConcurrentQueue<> are less error-prone, but you should still read the documentation.
Another simple way to avoid locks is to make a copy of the input data for your background process. Or even better, use immutable input (data that cannot change).
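As an illustration of the higher-level building blocks mentioned above, here is a small sketch that drains a ConcurrentQueue<int> from several tasks without any explicit lock. The class and method names (ConcurrentQueueDemo, DrainInParallel) are invented for this example.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrentQueueDemo
{
    // Fills a queue with `count` items, drains it with `workers` tasks,
    // and returns how many items were dequeued in total.
    public static int DrainInParallel(int count, int workers)
    {
        var queue = new ConcurrentQueue<int>();
        for (int i = 0; i < count; i++) queue.Enqueue(i);

        int dequeued = 0;
        var tasks = new Task[workers];
        for (int w = 0; w < workers; w++)
        {
            tasks[w] = Task.Run(() =>
            {
                int item;
                // TryDequeue performs the emptiness check and the removal
                // as one thread-safe operation.
                while (queue.TryDequeue(out item))
                {
                    Interlocked.Increment(ref dequeued);
                }
            });
        }
        Task.WaitAll(tasks);
        return dequeued;
    }
}
```

Each item is handed to exactly one task, so the returned count should equal the number of items enqueued.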
The basic rules of locking
Changing the same thing simultaneously does not fly
Reading a thing that is being changed does not fly
Reading the same thing simultaneously does fly
Changing different things simultaneously might fly
Locking needs to prevent situations that do not fly. This can be done in many ways, and C# gives you a lot of tools for this: among others, the concurrent collection types such as ConcurrentDictionary and ConcurrentQueue, but also ReaderWriterLockSlim and more.
You might find this free PDF from Microsoft useful. It's called 'An Introduction to Programming with C# Threads':
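The four rules map fairly directly onto a reader/writer lock: many readers may hold it at once, but a writer holds it alone. A minimal sketch with ReaderWriterLockSlim follows; the SharedTable class is hypothetical and only serves as an example.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical example type: a dictionary guarded by a reader/writer lock.
public sealed class SharedTable
{
    private readonly Dictionary<string, int> _data = new Dictionary<string, int>();
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();

    // Any number of threads may read at the same time.
    public bool TryGet(string key, out int value)
    {
        _rw.EnterReadLock();
        try { return _data.TryGetValue(key, out value); }
        finally { _rw.ExitReadLock(); }
    }

    // A writer waits until all readers have left and then runs alone.
    public void Set(string key, int value)
    {
        _rw.EnterWriteLock();
        try { _data[key] = value; }
        finally { _rw.ExitWriteLock(); }
    }
}
```

Whether this beats a plain lock depends on the ratio of reads to writes, so it is worth measuring before committing to it.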
http://research.microsoft.com/pubs/70177/tr-2005-68.pdf
Or this somewhat more humorous article:
http://www.codeproject.com/Articles/114262/6-ways-of-doing-locking-in-NET-Pessimistic-and-opt
I am currently working on a multithreaded C# application.
In my case, I have a list/dictionary, which is assigned and filled in the main-thread while the application is starting up. The list will never be modified again. I only use the list to get objects.
Do I have to use locks?
lock(list) { var test = list[0]; }
or can I access the object directly?
I know, if I access the object in the list, the object has to be thread-safe.
Reading is not a problem, but be aware that unexpected behavior can appear if another thread is writing or deleting while you are reading.
If the list is prepared beforehand and never changed afterwards, you can access the objects without locking, which is also faster.
But you must make sure that no modification can happen while you read from the collection.
If you also need write operations, you need synchronization, for example with ReaderWriterLockSlim, or have a look at the System.Collections.Concurrent namespace.
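A minimal sketch of the read-only case, assuming the list is completely built before any worker thread starts. Here a static initializer takes care of that, and the ReadOnlyLookup class is made up for the example.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ReadOnlyLookup
{
    // Built once by the static initializer, before any reader can see it,
    // and never modified afterwards.
    private static readonly IReadOnlyList<int> Items = BuildItems();

    private static IReadOnlyList<int> BuildItems()
    {
        var list = new List<int>();
        for (int i = 0; i < 100; i++) list.Add(i * 2);
        return list.AsReadOnly();   // read-only wrapper: callers cannot add or remove
    }

    public static long SumFromManyThreads()
    {
        long total = 0;
        Parallel.For(0, Items.Count, i =>
        {
            int value = Items[i];               // plain read, no lock needed
            Interlocked.Add(ref total, value);  // the shared counter does need protection
        });
        return total;
    }
}
```

Note that only the list itself is read without a lock; the running total is shared mutable state and is updated atomically.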
As long as you don't change the content of the list/array, there is no immediate need for locks.
But I would suggest implementing some synchronization (like locks) anyway. Can you be sure that you won't change your application in the coming years so that the content is modified later at runtime?
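A lock is used to avoid dirty reads when another thread may be writing. Since you never change the list, it can be read without a lock.
If you really want to catch unexpected changes while debugging (by getting an error when they happen), you can declare the field as readonly and expose the list through a read-only wrapper such as ReadOnlyCollection<T> (const cannot be applied to a list in C#).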
As others have mentioned, reading is not a problem. You said you populate this collection at startup, but you have not said at what point you start reading, so there can still be unexpected behavior if readers start before the collection is fully populated. In that case you have to use a thread-safe collection; for example, you can use a BlockingCollection for this purpose.
The MSDN article on thread-safe collections explains more.
Closed 9 years ago.
From a performance standpoint, is it more beneficial to read large amounts of data from an XML file or by looping through an array?
I have around 2,000 datasets I need to loop through and do calculations with, so I'm just wondering if it would be better to import all XML data and process it as an array (single large import) or to import each dataset sequentially (many small imports).
Thoughts and suggestions?
If I have interpreted your question correctly, you need to load 2,000 sets of data from one file, and then process them all. So you have to read all the data and process all the data. At a basic level there is the same amount of work to do.
So I think the question is "How can I finish the same processing earlier?"
Consider:
How much memory will the data use? If it's going to be more than 1.5GB of RAM, then you will not be able to process it in a single pass on a 32-bit PC, and even on 64-bit PCs you're likely to see virtual memory paging killing performance. In either of these cases, streaming the data in smaller chunks is a necessity.
Conversely if the data is small (e.g. 2000 records might only be 200kB for all I know), then you may get better I/O performance by reading it in one chunk, or it will load so fast compared to the processing time that there is no point trying to optimise it.
Are the records independent? (so they don't need to be processed in a particular order, and you don't need one record present in memory in order to process another one) If so, and if the loading time is significant overall, then the "best" approach may be to parallelise the operation - If you can process some data while you are loading more data in the background, you will utilise the hardware better and do the same work in less time. So you probably want to consider splitting your loading and processing onto different threads.
But spreading the processing onto many threads might not help you if loading takes much longer than processing, as your processing threads may be starved of data while waiting for I/O - so using 1 processing thread may be just as fast as using 3 or 7. And there's no point in creating more threads than you have available CPU cores. If going multithreaded, I'd write it to use a configurable/dynamic number of threads and then do some testing to determine what the optimum approach will be.
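The split between loading and processing can be sketched with a BlockingCollection: one task loads datasets while a configurable number of workers process them. This is only an outline; loadDataset and process are placeholder delegates supplied by the caller, and the bounded capacity of 16 is an arbitrary choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LoadProcessPipeline
{
    // loadDataset and process are placeholders for the real loading and
    // calculation code. Returns the sum of all per-dataset results.
    public static long Run(int datasetCount, int workerCount,
                           Func<int, int[]> loadDataset, Func<int[], long> process)
    {
        // Bounded so the loader cannot run arbitrarily far ahead of the workers.
        using (var pending = new BlockingCollection<int[]>(16))
        {
            long total = 0;

            var loader = Task.Run(() =>
            {
                try
                {
                    for (int i = 0; i < datasetCount; i++)
                        pending.Add(loadDataset(i));
                }
                finally
                {
                    pending.CompleteAdding();   // lets the workers finish
                }
            });

            var workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(() =>
            {
                // Blocks while the collection is empty; ends after CompleteAdding.
                foreach (var dataset in pending.GetConsumingEnumerable())
                    Interlocked.Add(ref total, process(dataset));
            })).ToArray();

            loader.Wait();
            Task.WaitAll(workers);
            return total;
        }
    }
}
```

Because workerCount is a parameter, it is easy to time the same run with 1, 3 or 7 workers and see where the loader becomes the bottleneck.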
But before you consider all of that, you might want to consider writing a brute force approach and see what the performance is like. Do you even need to optimise it?
And if the answer is "yes, I desperately need to optimise it", then can you reconsider the data format? XML is a very useful but grossly inefficient format. If you have a performance critical case, is there anything you can do to reduce the XML size (e.g. simply using shorter element names can make a massive difference on large files), or even use a much more compact and easily read binary format?
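If the XML format has to stay, the data can at least be streamed one record at a time instead of being loaded as a whole document. Below is a minimal sketch using XmlReader together with XNode.ReadFrom; the element name is passed in by the caller, since the real schema is not known here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class DatasetStreaming
{
    // Yields each element named elementName one at a time, so only one
    // record needs to be in memory at once.
    public static IEnumerable<XElement> StreamDatasets(TextReader input, string elementName)
    {
        using (var reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

For a file, the input could be a StreamReader over the file stream; for about 2,000 small records, loading the whole document may well be just as fast, so this is worth measuring first.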
I'm using the following configuration:
.NET framework 4.5
Windows Server 2008 R2
HP DL360p Gen8 (2 * Xeon E5-2640, x64)
I have the following field somewhere in my program:
protected int HedgeVolume;
I access this field from several threads. I assume that, as I have a multiprocessor system, it's possible that these threads are executing on different processors.
What should I do to guarantee that any time I use this field the most recent value is read? And to make sure that when I write a value it becomes available to all other threads immediately?
What should I do?
just leave field as is.
declare it volatile
use Interlocked class to access the field
use .NET 4.5 Volatile.Read, Volatile.Write methods to access the field
use lock
I only need the simplest way to make my program work on this configuration; I don't need my program to work on other computers, servers or operating systems. I also want minimal latency, so I'm looking for the fastest solution that will always work on this standard configuration (multiprocessor Intel x64, .NET 4.5).
Your question is missing one key element... How important is the integrity of the data in that field?
volatile gives you performance, but it does not make a read-modify-write sequence atomic: while one thread is in the middle of an update, another thread can read the old value and then overwrite the first thread's change. If the data is sensitive, that can cause bugs that are very hard to track down. However, if you only do quick updates that overwrite the value without reading it, and you don't mind occasionally seeing data that is a few milliseconds out of date, go for it.
lock guarantees that only one thread can access the field at a time. You can put it only on the methods that write the field and leave the reading method alone (although unlocked reads may then see a slightly stale value). The downside is that it is slow and may block a thread while another is performing its task. However, you can be sure your data stays valid.
Interlocked exists to make simple operations such as increment, exchange and compare-and-swap atomic, so a context switch in the middle of an update cannot leave the variable half-updated. My opinion? Don't use it unless you know exactly why you would be using it and exactly how to use it. It gives you options, but with great options come great problems. It might not do what you think it does, and it won't prevent parallel threads from performing their tasks simultaneously.
You want to use Volatile.Read().
As you are running on x86, all writes in C# are the equivalent of Volatile.Write(); you would only need to call it explicitly on Itanium.
Volatile.Read() will ensure that you get the latest copy regardless of which thread last wrote it.
There is a fantastic write-up on this, "C# Memory Model Explained".
In summary, it says:
On some processors, not only must the compiler avoid certain
optimizations on volatile reads and writes, it also has to use special
instructions. On a multi-core machine, different cores have different
caches. The processors may not bother to keep those caches coherent by
default, and special instructions may be needed to flush and refresh
the caches.
Hopefully that much is obvious: besides the need for volatile to stop the compiler from optimising the access away, the processor has to be considered as well.
However, in C# all writes are volatile (unlike say in Java),
regardless of whether you write to a volatile or a non-volatile field.
So, the above situation actually never happens in C#. A volatile write
updates the thread’s cache, and then flushes the entire cache to main
memory.
You do not need Volatile.Write(). A more authoritative source is Joe Duffy's CLR Memory Model. However, you may need it to stop the compiler from reordering the write.
Since all C# writes are volatile, you can think of all writes as going
straight to main memory. A regular, non-volatile read can read the
value from the thread’s cache, rather than from main memory.
You need Volatile.Read()
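A minimal sketch of what that looks like for the HedgeVolume field from the question. The surrounding Hedger class and its method names are invented for the example; only the field comes from the question.

```csharp
using System.Threading;

// Hypothetical containing class; only HedgeVolume is from the question.
public class Hedger
{
    protected int HedgeVolume;

    // Unconditional write. Volatile.Write stops the compiler and CPU from
    // moving earlier reads and writes after this store.
    public void Publish(int newVolume)
    {
        Volatile.Write(ref HedgeVolume, newVolume);
    }

    // Volatile.Read stops later reads and writes from being moved before
    // this load, so the value cannot be served from a stale register copy.
    public int ReadLatest()
    {
        return Volatile.Read(ref HedgeVolume);
    }
}
```

This covers plain reads and unconditional writes only. Something like HedgeVolume = HedgeVolume + 1 is still a read-modify-write and needs Interlocked or a lock.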
When you start designing a concurrent program, you should consider these options in order of preference:
1) Isolation: each thread has its own private data
2) Immutability: threads can see shared state, but it never changes
3) Mutable shared state: protect all access to shared state with locks
If you get to (3), then how fast do you actually need this to be?
Acquiring an uncontested lock takes on the order of 10 ns (10^-8 seconds); that's fast enough for most applications and is the easiest way to guarantee correctness.
Using any of the other options you mention takes you into the realm of low-lock programming, which is insanely difficult to get correct.
If you want to learn how to write concurrent software, you should read these:
Intro: Joe Albahari's free e-book - will take about a day to read
Bible: Joe Duffy's "Concurrent Programming on Windows" - will take about a month to read
Depends on what you DO. For reading only, volatile is easiest; Interlocked allows a little more control. Lock is unnecessary, as it is more granular than the problem you describe. Not sure about Volatile.Read/Write; I have never used them.
volatile - bad, there are some issues (see Joe Duffy's blog)
if all you do is read the value or unconditionally write a value - use Volatile.Read and Volatile.Write
if you need to read and subsequently write an updated value - use the lock syntax. You can, however, achieve the same effect without lock using the Interlocked class, but this is more complex (it involves CompareExchange calls to ensure that the value you are updating has not been modified since the read, plus logic to retry if it was).
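For reference, the CompareExchange-and-retry pattern described in the last point looks roughly like this. The helper name InterlockedUpdate.Update is made up; Interlocked.CompareExchange and Volatile.Read are the real framework calls.

```csharp
using System;
using System.Threading;

public static class InterlockedUpdate
{
    // Atomically replaces `location` with update(current value) and returns
    // the new value. The update function may run more than once.
    public static int Update(ref int location, Func<int, int> update)
    {
        while (true)
        {
            int observed = Volatile.Read(ref location);
            int desired = update(observed);

            // Stores `desired` only if `location` still equals `observed`.
            // The return value is what was in `location` before the call.
            if (Interlocked.CompareExchange(ref location, desired, observed) == observed)
                return desired;

            // Another thread changed the value in between: retry.
        }
    }
}
```

Because the update function can be re-run on contention, it should be free of side effects. With lock, the same logic is a plain read followed by a write inside the locked block.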
From this I understand that you want to be able to read the last value that was written to a field. Let's make an analogy with the data concurrency problem in SQL. If you want to be able to read the last value of a field, you must use atomic instructions. If someone is writing the field, all other threads must be blocked from reading until that thread has finished the write transaction. After that, every read of the field will be safe. The problem is not so much with reading as with writing. A lock on that field whenever it is written should be enough, if you ask me ...
First have a look here: Volatile vs. Interlocked vs. lock
The volatile modifier surely is a good option for a multi-core CPU.
But is this enough? It depends on how you calculate the new HedgeVolume value!
If your new HedgeVolume does not depend on the current HedgeVolume, then you're done with volatile.
But if HedgeVolume[x] = f(HedgeVolume[x-1]), then you need some thread synchronisation to guarantee that HedgeVolume doesn't change while you calculate and assign the new value. Both lock and Interlocked scenarios would be suitable in this case.
I had a similar question and found this article to be extremely helpful. It's a very long read, but I learned a LOT!
I will try to make this question as generic as possible, but I will give a brief introduction to my actual problem -
I am trying to implement a concurrent skiplist for a priority queue. Each 'node' has a value and an array of 'forward' nodes, where node.forward[i] represents the next node on the i-th level of the skiplist. For write access (i.e. insertions and deletions), I use a SpinLock (I still have to determine whether that is the best lock to use).
My question is essentially, when I need a read access for a traversal,
node = node.forward[i]
What kind of thread safety do I need around something like this? If another thread is modifying node.forward[i] at exactly the same time that I read (with no current locking mechanism for read), what can happen here?
My initial thought is to have a ReaderWriterLockSlim on the getter and setter of the indexer for Forward. Will there be too much unnecessary locking in this scenario?
Edit: Or would it be best to instead use an Interlocked.Exchange for all of my reads?
If another thread is modifying node.forward[i] at exactly the same time that I read (with no current locking mechanism for read), what can happen here?
It really depends on the implementation. It's possible to use Interlocked.Exchange when setting "forward" in a way that can prevent the references from being invalid (as it can make the "set" atomic), but there is no guarantee of which reference you'd get on read. However, with a naive implementation, anything can happen, including getting bad data.
My initial thought is to have a ReaderWriterLockSlim on the getter and setter of the indexer for Forward.
This is likely to be a good place to start. It will be fairly easy to make a properly synchronized collection using a ReaderWriterLockSlim, and functional is always the first priority.
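As a rough sketch of that starting point, a node with a lock-guarded indexer could look like the following. SkipNode<T> and its members are hypothetical, and a lock per node is only one possible design (a single lock for the whole list is simpler and may be enough).

```csharp
using System.Threading;

// Hypothetical node type for illustration; not the asker's actual code.
public sealed class SkipNode<T>
{
    private readonly SkipNode<T>[] _forward;
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();

    public T Value { get; private set; }

    public SkipNode(T value, int levels)
    {
        Value = value;
        _forward = new SkipNode<T>[levels];
    }

    // Indexer for the forward pointers: many concurrent readers,
    // one writer at a time.
    public SkipNode<T> this[int level]
    {
        get
        {
            _rw.EnterReadLock();
            try { return _forward[level]; }
            finally { _rw.ExitReadLock(); }
        }
        set
        {
            _rw.EnterWriteLock();
            try { _forward[level] = value; }
            finally { _rw.ExitWriteLock(); }
        }
    }
}
```

This only protects each individual read or write of a forward pointer. An insertion or deletion that has to update several pointers consistently still needs synchronization at a higher level.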
Will there be too much unnecessary locking in this scenario?
There's no way to know without seeing how you implement it and, more importantly, how it's going to be used. Depending on your usage, you can profile and look for optimization opportunities if necessary at that point.
On a side note - you might want to reconsider using node.Forward[i] as opposed to more of a "linked list" approach here. Any access to Forward[i] is likely to require a fair bit of synchronization to iterate through the skip list i steps, all of which will need some synchronization to prevent errors if there are concurrent writes anywhere between node and i elements beyond node. If you only look ahead one step, you can (potentially) reduce the amount of synchronization required.
When I said atomic, I meant that a set of instructions will execute without any context switch to another thread in the same process (other kinds of switches have to happen, of course). The only solution I came up with is to suspend all threads except the currently executing one before that part and resume them after it. Is there a more elegant way?
The reason I want to do that is to collect a coherent state of objects running on multiple threads. However, their code cannot be changed (they're already compiled), so I cannot insert mutexes, semaphores, etc in it. The atomic operation is of course state collecting (i.e. copying some variables).
There are some atomic operations in the Interlocked class but it only provides a few very simple operations. It can't be used to create an entire atomic block of code.
I'd advise using locking carefully to make sure that your code will still work even if the context changes.
Well, you can use locks, but you can't prevent context switching exactly. But if your threads lock on the same object, then the threads waiting obviously won't be running, so there's no context switching involved since there's nothing to run.
You might want to look at this page too.
No. You can surround a block of code with a Monitor to make it thread-safe, but you cannot make general code snippets atomic.
object lck = new object();
lock(lck)
{
// thread safe code goes in here
}
No, that goes against multitasking.
The exceptions are very simple operations like incrementing, which are not the subject of your question.
It is possible to obtain a global state from a shared memory composed of a collection (array) of atomic single-writer/multi-reader registers. The solution is simple but not trivial. You can read the algorithm published in the paper "Atomic Snapshots of Shared Memory", or you can read chapter 4 of the book "The Art of Multiprocessor Programming", where you can get ideas for an implementation in Java. Of course, once you are familiar with the idea, you should be able to port it to any other language. Sorry if my English is not good enough.