Should I lock a datatable in multithread paradigm?

Should I lock a datatable in multithread paradigm? - c#

In a project of windows services (C# .Net Platform), I need a suggestion.
In the project I have class named Cache, in which I keep some data that I need frequently. There is a thread that updates cache after each 30 minutes. Where as there are multiple threads which use cache data.
In the cache class, there are getter and setter functions which are used by user threads and cache updater thread respectively. No one uses data objects like tables directly, because they are private members.
From the above context, do you think I should use locking functionality in the cache class?

The effects of not using locks when writing to a shared memory location (like cache) really depend on the application. If the code was used in banking software the results could be catastrophic.
As a rule o thumb - when multiple threads access the same location, even if only one tread writes and all the other read, you should use locks (for write operation). What can happen is that one thread will start reading data, get swiped out by the updater thread; So it'll potentially end up using a mixture of old and new data. If that really as an impact depends on the application and how sensible it is.

Key Point: If you don't lock on the reads there's a chance your read won't see the changes. A lock will force your read code to get values from main memory rather than pulling data from a cache or register. To avoid actually locking you could use Thread.MemoryBarrier(), which will do the same job without overhead of actually locking anything.
Minor Points: Using lock would prevent a read from getting half old data and half new data. If you are reading more than one field, I recommend it. If you are really clever, you could keep all the data in an immutable object and return that object to anyone calling the getter and so avoid the need for a lock. (When new data comes in, you create a new immutable object, then replace the old with the new in one go. Use a lock for the actual write, or, if you're still feeling really clever, make the field referencing the object volatile.)
Also: when your getter is called, remember it's running on many other threads. There's a tendency to think that once something is running the Cache class's code it's all on the same thread, and it's not.

Related

Using a double buffer technique for concurrent reading and writing?

I have a relatively simple case where:
My program will be receiving updates via Websockets, and will be using these updates to update it's local state. These updates will be very small (usually < 1-1000 bytes JSON so < 1ms to de-serialize) but will be very frequent (up to ~1000/s).
At the same time, the program will be reading/evaluating from this local state and outputs its results.
Both of these tasks should run in parallel and will run for the duration for the program, i.e. never stop.
Local state size is relatively small, so memory usage isn't a big concern.
The tricky part is that updates need to happen "atomically", so that it does not read from a local state that has for example, written only half of an update. The state is not constrained to using primitives and could contain arbitrary classes AFAICT atm, so I cannot solve it by something simple like using Interlocked atomic operations. I plan on running each task on its own thread, so a total of two threads in this case.
To achieve this goal I thought to use a double buffer technique, where:
It keeps two copies of the state so one can be read from while the other is being written to.
The threads could communicate which copy they are using by using a lock. i.e. Writer thread locks copy when writing to it; reader thread requests access to lock after it's done with current copy; writer thread sees that reader thread is using it so it switches to other copy.
Writing thread keeps track of state updates it's done on the current copy so when it switches to the other copy it can "catch up".
That's the general gist of the idea, but the actual implementation will be a bit different of course.
I've tried to lookup whether this is a common solution but couldn't really find much info, so it's got me wondering things like:
Is it viable, or am I missing something?
Is there a better approach?
Is it a common solution? If so what's it commonly referred to as?
(bonus) Is there a good resource I could read up on for topics related to this?
Pretty much I feel I've run into a dead-end where I cannot find (because I don't know what to search for) much more resources and info to see if this approach is "good". I plan on writing this in .NET C#, but I assume the techniques and solutions could translate to any language. All insights appreciated.

You actually need four buffers/objects. Two buffers/objects are owned by the reader, one by the writer, and one in the mailbox.
The reader -- each time he finishes a group of atomic operations on his newer object, he uses interlocked exchange to swap his older object handle (pointer or index doesn't matter) with the mailbox one. Then he looks at the newly obtained object and compares the sequence number to the object he just read (and is still holding) to find out which is newer.
The writer -- writes a complete copy of latest data into his object, then uses interlocked exchange to swap his newly written object with the mailbox one.
As you can see, the writer can steal the mailbox object at any time, but never the one that the reader is using, so read operations stay atomic. And the reader can steal the mailbox object at any time, but never the one the writer is using, so write operations stay atomic.
As long as the interlocked-exchange function produces the correct memory fence (release for the swap done in the writer thread, acquire for the reader thread), the objects can themselves be arbitrarily complex.

If I understand correctly, the writes themselves are synchronous. If so, then maybe it's not necessary to keep two copies or even to use locks.
Maybe something like this could work?
State state = populateInitialState();
...
// Reader thread
public State doRead() {
return makeCopyOfState(state);
}
...
// Writer thread
public void updateState() {
State newState = makeCopyOfState(state);
// make changes in newState
state = newState;
}

It looks like you are using the input-process-output pattern in a multithreaded pipeline. Sometimes the input and processing phases (or processing and output phases) are merged when the problem is simple.
You have added a C# tag so using something like a BlockingCollection might be a useful way to communicate between the input and output threads. Since the local state is relatively small (your words) then posting a data-object containing a copy of the local state from the input thread to the output thread could be a simple solution. This follows a share-nothing philosophy which satisfies the atomic requirement because a snapshot of the current state is queued. The "catch up" capability is satisfied because the queue contains the backlog of state changes.
Generally, Messaging Patterns and Conversation Patterns are useful resources when trying to work out what to communicate and how to communicate between 2 or more threads (or processes, services, servers, etc).

Best way of dealing with shared state in a real time system in dotnet core background service

I have a background service IHostedService in dotnet core 3.1 that takes requests from 100s of clients(machines in a factory) using sockets (home rolled). My issue is that multiple calls can come in on different threads to the same method on a class which has access to an object (shared state). This is common in the codebase. The requests also have to be processed in the correct order.
The reason that this is not in a database is due to performance reasons (real time system). I know I can use a lock, but I don't want to have locks all over the code base.
What is a standard way to handle this situation. Do you use an in-memory database? In-memory cache? Or do I just have to add locks everywhere?
public class Machine
{
public MachineState {get; set;}
// Gets called by multiple threads from multiple clients
public bool CheckMachineStatus()
{
return MachineState.IsRunning;
}
// Gets called by multiple threads from multiple clients
public void SetMachineStatus()
{
MachineState = Stopped;
}
}
Update
Here's an example. I have a console app that talks to a machine via sockets, for weighing products. When the console app initializes it will load data into memory (information about the products being weighed). All of this is done on the main thread, to keep data integrity.
When a call comes in from the weigh-er on Thread 1, it will get switched to the main thread to access the product information, and to finish any other work like raising events for other parts of the system.
Currently this switching from Thread 1,2, ...N to the main thread is done by a home rolled solution, and was done to avoid having locking code all over the code base. This was written in .Net 1.1 and since moving to dotnet core 3.1. I thought there might be a framework, library, tool, technique etc that might handle this for us, or just a better way.
This is an existing system that I'm still learning. Hope this makes sense.

Using an in-memory database is an option, as long as you are willing to delegate all concurrency-inducing situations to the database, and do nothing using code. For example if you must update a value in the database depending on some condition, then the condition should be checked by the database, not by your own code.
Adding locks everywhere is also an option, that will almost certainly lead to unmaintanable code quite quickly. The code will probably be riddled with hidden bugs from the get-go, bugs that you will discover one by one over time, usually under the most unfortunate of circumstances.
You must realize that you are dealing with a difficult problem, with no magic solutions available. Managing shared state in a multithreaded application has always been a source of pain.
My suggestion is to encapsulate all this complexity inside thread-safe classes, that the rest of your application can safely invoke. How you make these classes thread-safe depends on the situation.
Using locks is the most flexible option, but not always the most efficient because it has the potential of creating contention.
Using thread-safe collections, like the ConcurrentDictionary for example, is less flexible because the thread-safety guarantees they offer are limited to the integrity of their internal state. If for example you must update one collection based on a condition obtained from another collection, then the whole operation can not be made atomic by just using thread-safety collections. On the other hand these collections offer better performance than the simple locks.
Using immutable collections, like the ImmutableQueue for example, is another interesting option. They are less efficient both memory and CPU wise than the concurrent collections (adding/removing is in many cases O(Log n) instead of O(1)), and not more flexible than them, but they are very efficient specifically at providing snapshots of actively processed data. For updating atomically an immutable collection, there is the handy ImmutableInterlocked.Update method available. It updates a reference of an immutable collection with an updated version of the same collection, without using locks. In case of contention with other threads it may invoke the supplied transformation multiple times, until it wins the race.

Is it safe to thread concurrent database queries?

I'm trying to improve upon this program that I wrote for work. Initially I was rushed, and they don't care about performance or anything. So, I made a horrible decision to query an entire database(a SQLite database), and then store the results in lists for use in my functions. However, I'm now considering having each of my functions threaded, and having the functions query only the parts of the database that it needs. There are ~25 functions. My question is, is this safe to do? Also, is it possible to have that many concurrent connections? I will only be PULLING information from the database, never inserting or updating.

The way I've had it described to me[*] is to have each concurrent thread open its own connection to the database, as each connection can only process one query or modification at a time. The group of threads with their connections can then perform concurrent reads easily. If you've got a significant problem with many concurrent writes causing excessive blocking or failure to acquire locks, you're getting to the point where you're exceeding what SQLite does for you (and should consider a server-based DB like PostgreSQL).
Note that you can also have a master thread open the connections for the worker threads if that's more convenient, but it's advised (for your sanity's sake if nothing else!) to only actually use each connection from one thread.
[* For a normal build of SQLite. It's possible to switch things off at build time, of course.]

SQLite has no write concurrency, but it supports arbitrarily many connections that read at the same time.
Just ensure that every thread has its own connection.

25 simultanious connections is not a smart idea. That's a huge number.
I usually create a multi-layered design for this problem. I send all requests to the database through a kind of ObjectFactory class that has an internal cache. The ObjectFactory will forward the request to a ConnectionPoolHandler and will store the results in its cache. This connection pool handler uses X simultaneous connections but dispatches them to several threads.
However, some remarks must be made before applying this design. You first have to ask yourself the following 2 questions:
Is your application the only application that has access to this
database?
Is your application the only application that modifies data in this database?
If the first question is negatively, then you could encounter locking issues. If your second question is answered negatively, then it will be extremely difficult to apply caching. You may even prefer not to implement any caching it all.
Caching is especially interesting in case you are often requesting objects based on a unique reference, such as the primary key. In that case you can store the most often used objects in a Map. A popular collection for caching is an "LRUMap" ("Least-Recently-Used" map). The benifit of this collection is that it automatically arranges the most often used objects to the top. At the same time it has a maximum size and automatically removes items from the map that are rarely ever used.
A second advantage of caching is that each object exists only once. For example:
An Employee is fetched from the database.
The ObjectFactory converts the resultset to an actual object instance
The ObjectFactory immediatly stores it in cache.
A bit later, a bunch of employees are fetched using an SQL "... where name like "John%" statement.
Before converting the resultset to objects, the ObjectFactory first checks if the IDs of these records are perhaps already stored in cache.
Found a match ! Aha, this object does not need to be recreated.
There are several advantages to having a certain object only once in memory.
Last but not least in Java there is something like "Weak References". These are references that are references that in fact can be cleaned up by the garbage collector. I am not sure if it exists in C# and how it's called. By implementing this, you don't even have to care about the maximum amount of cached objects, your garbage collector will take care of it.

guarantee that up-to-date value of variable is always visible to several threads on multi-processor system

I'm using such configuration:
.NET framework 4.5
Windows Server 2008 R2
HP DL360p Gen8 (2 * Xeon E5-2640, x64)
I have such field somewhere in my program:
protected int HedgeVolume;
I access this field from several threads. I assume that as I have multi-processor system it's possible that this threads are executing on different processors.
What should I do to guarantee that any time I use this field the most recent value is "read"? And to make sure that when I "write" value it become available to all other threads immediately?
What should I do?
just leave field as is.
declare it volatile
use Interlocked class to access the field
use .NET 4.5 Volatile.Read, Volatile.Write methods to access the field
use lock
I only need simplest way to make my program work on this configuration I don't need my program to work on another computers or servers or operation systems. Also I want minimal latency so I'm looking for fastest solution that will always work on this standard configuration (multiprocessor intel x64, .net 4.5).

Your question is missing one key element... How important is the integrity of the data in that field?
volatile gives you performance, but if a thread is currently writing changes to the field, you won't get that data until it's done, so you might access out of date information, and potentially overwrite changes another thread is currently doing. If the data is sensitive, you might get bugs that would get very hard to track. However, if you are doing very quick update, overwrite the value without reading it and don't care that once in a while you get outdated (by a few ms) data, go for it.
lock guaranty that only one thread can access the field at a time. You can put it only on the methods that write the field and leave the reading method alone. The down side is, it is slow, and may block a thread while another is performing its task. However, you are sure your data stay valid.
Interlock exist to shield yourself from the scheduler context switch. My opinion? Don't use it unless you know exactly why you would be using it and exactly how to use it. It gives options, but with great options comes great problematic. It prevents a context switch while a variable is being update. It might not do what you think it does and won't prevent parallel threads from performing their tasks simultaneously.

You want to use Volatile.Read().
As you are running on x86, all writes in C# are the equivalent of Volatile.Write(), you only need to use this for Itanium.
Volatile.Read() will ensure that you get the latest copy regardless of which thread last wrote it.
There is a fantastic write up here, C# Memory Model Explained
Summary of it includes,
On some processors, not only must the compiler avoid certain
optimizations on volatile reads and writes, it also has to use special
instructions. On a multi-core machine, different cores have different
caches. The processors may not bother to keep those caches coherent by
default, and special instructions may be needed to flush and refresh
the caches.
Hopefully that much is obvious, other than the need for volatile to stop the compiler from optimising it, there is the processor as well.
However, in C# all writes are volatile (unlike say in Java),
regardless of whether you write to a volatile or a non-volatile field.
So, the above situation actually never happens in C#. A volatile write
updates the thread’s cache, and then flushes the entire cache to main
memory.
You do not need Volatile.Write(). More authoratitive source here, Joe Duffy CLR Memory Model. However, you may need it to stop the compiler reordering it.
Since all C# writes are volatile, you can think of all writes as going
straight to main memory. A regular, non-volatile read can read the
value from the thread’s cache, rather than from main
You need Volatile.Read()

When you start designing a concurrent program, you should consider these options in order of preference:
1) Isolation: each thread has it's own private data
2) Immutability: threads can see shared state, but it never changes
3) Mutable shared state: protect all access to shared state with locks
If you get to (3), then how fast do you actually need this to be?
Acquiring an uncontested lock takes in the order of 10ns ( 10-8 seconds ) - that's fast enough for most applications and is the easiest way to guarantee correctness.
Using any of the other options you mention takes you into the realm of low-lock programming, which is insanely difficult to get correct.
If you want to learn how to write concurrent software, you should read these:
Intro: Joe Albahari's free e-book - will take about a day to read
Bible: Joe Duffy's "Concurrent Programming on Windows" - will take about a month to read

Depends what you DO. For reading only, volatile is easiest, interlocked allows a little more control. Lock is unnecessary as it is more ganular than the problem you describe. Not sure about Volatile.Read/Write, never used them.

volatile - bad, there are some issues (see Joe Duffy's blog)
if all you do is read the value or unconditionally write a value - use Volatile.Read and Volatile.Write
if you need to read and subsequently write an updated value - use the lock syntax. You can however achieve the same effect without lock using the Interlocked classes functionality, but this is more complex (involves CompareExchange s to ensure that you are updating the read value i.e. has not been modified since the read operation + logic to retry if the value was modified since the read).

From this i can understand that you want to be able to read the last value that it was writtent in a field. Lets make an analogy with the sql concurency problem of the data. If you want to be able to read the last value of a field you must make atomic instructions. If someone is writing a field all of the threads must be locked for reading until that thread finished the writing transaction. After that every read on that thread will be safe. The problem is not with reading as it is with writing. A lock on that field whenever its writtent should be enough if you ask me ...

First have a look here: Volatile vs. Interlocked vs. lock
The volatile modifier shurely is a good option for a multikernel cpu.
But is this enough? It depends on how you calculate the new HedgeVolume value!
If your new HedgeVolume does not depend on current HedgeVolume then your done with volatile.
But if HedgeVolume[x] = f(HedgeVolume[x-1]) then you need some thread synchronisation to guarantee that HedgeVolume doesn't change while you calculate and assign the new value. Both, lock and Interlocked szenarios would be suitable in this case.

I had a similar question and found this article to be extremely helpful. It's a very long read, but I learned a LOT!

C# locks and newbie multithreading questions

Some newbie questions about multi-threading in .NET which I think will help reinforce some concepts I'm trying to absorb - I've read several multi-threading material (including the Albahari ebook) but feel I just need some confirmation of some questions to help drive these concepts home
A lock scope protects a shared region of code - suppose there is a thread executing a method that increments a simple integer variable x in a loop - however this won't protect code elsewhere that might also alter variable x eg in another method on another thread ...
Since this is two different regions of code potentially affecting the same variable, do we solve this by locking both regions of code using the same lock variable for both lock scopes around variable x? If you locked both regions of code with different lock variables, this would not protect the variable correct?
To further this example, using the same lock variable, what would happen if for some reason, code in one method went into some infinite loop and never relinquished the lock variable - how could the second region of code in the other method detect this?
How does the choice of lock variable influence the behavior of the lock? I've read numerous posts on this subject already but can never seem to find a definitive answer - in some instances people explicitly use an object variable specifically for this purpose, other times people use lock(this) and finally there've been times I've seen people use a type object.
How do the different choices of lock variables influence the behavior / scope of the lock and what scenarios would it make sense to use one over the other?
suppose you have a hashtable wrapped in a class exposing add, remove, get and some sort of Calculate method (say each object represents a quantity and this method sums each value) and all these methods are locked - however, once a reference to an object in that collection is made available to other code and passed around an application, this object (not the hashtable) would now be outside the lock scope surrounding the methods of that class ..how could you then protect access / updates to those actual objects taken from the hashtable, which could interfere with the Calculate method?
Appreciate any heuristics provided that would help reinforce these concepts for me - thanks!

1) Yes
2) That's a deadlock
3) The parts of your code you want to block are an implementation detail of your class. Exposing the lock object by using lock(this) or lock(this.GetType()) is asking for trouble since now external code can lock the same object and block your code unintentionally or maliciously. The lock object should be private.
4) It isn't very clear what you mean, you certainly wouldn't want to expose the Hashtable directly. Just keep it as a private field of the class, encapsulating it.
However, the odds that you can safely expose your class to client code using threads go down very rapidly with the number of public methods and properties you expose. You'll quickly get to a point where only the client code can properly take a lock. Fine-grained locking creates lots of opportunities for threading races when the client code is holding on to property values. Say a Count property value you return. By the time it uses the value, like in a for loop, the Count property might have changed. Only the most careful design can avoid these traps, a serious headache.
Furthermore, fine-grained locking is very inefficient since it inevitably is done in the most inner parts of your code. Locks are not that expensive, a rough 100 cpu cycles, but it quickly adds up. Especially wasted effort if the class object isn't actually used in multiple threads.
You then have no option but to declare your class thread-unsafe and the client code needs to use it in a thread-safe manner. Also the core reason that so many .NET classes are not thread-safe. This is the biggest reason that threading is so hard to get right, the programmer least likely to do it correctly is responsible for doing the most difficult thing.

1)
You are correct. You must use the same lock object to protect two distinct area's of code that for example increment the variable x.
2)
This is known as a deadlock and is one of the difficulties with multithreaded programming. There are algorithms which can be used to prevent deadlocks such as the Bankers Algorithm.
3)
Some languages make locking easy, for example in .Net you can just create an object and use it as the shared lock. This is good for synchronising code within a given process. Lock(this) just applies the lock to the object in question. However try to avoid this, instead create a private object and use that. Lock(this) can lead to deadlocking situations. The lock object underneath is probably just a wrapper around a Critical Section. If you wanted to protect a resource across different processes you would need a much heavier named Mutex, this requires a lock on a kernel object and is expensive, so do not use unless you must.
4)You need to make sure locking is applied there as well. But surely when people call methods on this reference they call the methods which employ synchronisation.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.