I'm having a hard time wrapping my head around accessing a singleton class with multiple threads.
This article has given me a nice starting point to get my singleton thread safe: http://csharpindepth.com/Articles/General/Singleton.aspx
My singleton class is supposed to treat a group of files as a single unit of data, but process them in parallel.
I store information of each file in a dictionary and return to the calling thread a unique key (which will be created using a DateTime and a random number) so that each thread can later refer to its own file.
public string AddFileForProcessing(FileForProcessing file)
{
var id = CreateUniqueFileId();
var resultFile = CreateResultFileFor(file);
//These collections are written here and only read elsewhere
_files.Add(id, file);
_results.Add(id, resultFile);
return id;
}
Then later threads call methods passing this id.
public void WriteProcessResultToProperFile(string id, string[] processingResult)
{
//locate the proper file in dictionary using id and then write information...
File.AppendAllLines(_results[id].FileName, processingResult);
}
Those methods will be accessed inside a class that:
a) Responds to a FileWatcher's Created event and creates threads that call AddFileForProcessing:
public void ProcessIncomingFile(object sender, EventArgs e)
{
var file = ((FileProcessingEventArg)e).File;
ThreadPool.QueueUserWorkItem(
item =>
{
ProcessFile(file);
});
}
b) Inside ProcessFile, I add the file to the dictionary and start processing.
private void ProcessFile(FileForProcessing file)
{
var key = filesManager.AddFileForProcessing(file);
var records = filesManager.GetRecordsCollection(key);
for (var i = 0; i < records.Count; i++)
{
//Do my processing here
filesManager.WriteProcessResultToProperFile(key, processingResult);
}
}
Now I don't get what happens when two threads call these methods, given they're both using the same instance.
Each thread will call AddFileForProcessing and WriteProcessResultToProperFile with a different parameter. Does that make them two different calls?
Since each call operates on a file that is uniquely identified by an id belonging to a single thread (i.e., no file will suffer multiple accesses), can I leave this method as is, or do I still have to "lock" my method?
Yes, as long as you only read from the shared dictionary all should be fine. And you can process the files in parallel as long as they are different files, as you correctly mention.
The documentation explains:
A Dictionary<TKey, TValue> can support multiple readers concurrently, as long as the collection is not modified.
So you can't do anything in parallel while anyone might be calling AddFileForProcessing (without a lock). Calls only to WriteProcessResultToProperFile are fine. This implies that if you want to call AddFileForProcessing in parallel, you need locks in both methods (in fact, in all code that touches this dictionary).
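A minimal sketch of the locked variant described above, with FileForProcessing stubbed out and Guid.NewGuid standing in for the DateTime-plus-random id (these stand-ins are assumptions, not the original code):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stub for the question's file type.
public sealed class FileForProcessing
{
    public string FileName { get; }
    public FileForProcessing(string fileName) { FileName = fileName; }
}

public sealed class FilesManager
{
    private readonly object _sync = new object();
    private readonly Dictionary<string, FileForProcessing> _files =
        new Dictionary<string, FileForProcessing>();
    private readonly Dictionary<string, string> _results =
        new Dictionary<string, string>(); // id -> result file name

    public string AddFileForProcessing(FileForProcessing file)
    {
        lock (_sync) // writers take the lock...
        {
            var id = Guid.NewGuid().ToString(); // stand-in for CreateUniqueFileId()
            _files.Add(id, file);
            _results.Add(id, file.FileName + ".result");
            return id;
        }
    }

    public string GetResultFileName(string id)
    {
        lock (_sync) // ...and readers must take the same lock
        {
            return _results[id];
        }
    }
}
```

Alternatively, replacing the dictionaries with ConcurrentDictionary removes the need for explicit locks around individual adds and reads.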
I have two methods as below
private void MethodB_GetId()
{
//Calling Method A continuously on different threads
//Let's say its calling for Id = 1 to 100
}
private void MethodA_GetAll()
{
List<string> lst;
lock(_locker)
{
lst = SomeService.Get(); //This call returns all 100 ids in one shot.
//Some other processing and then return result.
}
}
Now the client is calling MethodB_GetId continuously, fetching data for ids 1 to 100 at random. (It requires some of the data from these 100 ids, not all of it.)
MethodA_GetAll gets all the data from the network (maybe a cache or database) in one shot and returns the whole collection to method B; method B then extracts the records it is interested in.
Now if MethodA_GetAll() calls GetAll() multiple times, fetching the same records is useless, so I can put a lock around it: while one thread is fetching records, the others will be blocked.
Say MethodA_GetAll called for Id = 1 acquires the lock and all the others wait for the lock to be released.
What I want is: once the data is available from any one thread, just don't make the call again.
Solution option:
1. Make the List global to the class and thread-safe. (I don't have that option.)
I need some way for thread 1 to tell all the other threads: I already have the records, don't go fetching them again.
something like
lock(_locker && lst != null) //Pseudocode: won't compile, and lst is local to every thread
{
//If this satisfy then only fetch records
}
Please excuse me for poorly framing the question. I posted this in a bit of a hurry.
It sounds like you want to create a threadsafe cache. One way to do this is to use Lazy<T>.
Here's an example for a cache of type List<string>:
public sealed class DataProvider
{
public DataProvider()
{
_cache = new Lazy<List<string>>(createCache);
}
public void DoSomethingThatNeedsCachedList()
{
var list = _cache.Value;
// Do something with list.
Console.WriteLine(list[10]);
}
readonly Lazy<List<string>> _cache;
List<string> createCache()
{
// Dummy implementation.
return Enumerable.Range(1, 100).Select(x => x.ToString()).ToList();
}
}
When you need to access the cached value, you just access _cache.Value. If it hasn't yet been created, then the method you passed to the Lazy<T>'s constructor will be called to initialise it. In the example above, this is the createCache() method.
This is done in a threadsafe manner, so that if two threads try to access the cached value simultaneously when it hasn't been created yet, one of the threads will actually end up calling createCache() and the other thread will be blocked until the cached value has been initialised.
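A minimal sketch (class and member names are invented here) demonstrating that the factory runs exactly once even when many threads race to read the value; ExecutionAndPublication, the default thread-safety mode of Lazy<T>, guarantees this:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class LazyCacheDemo
{
    private static int _factoryCalls;

    private static readonly Lazy<List<string>> Cache = new Lazy<List<string>>(() =>
    {
        // Count how many times the factory actually runs.
        Interlocked.Increment(ref _factoryCalls);
        return new List<string> { "alpha", "beta", "gamma" };
    });

    public static int FactoryCalls => _factoryCalls;

    public static List<string> GetCache() => Cache.Value;
}
```

After something like `Parallel.For(0, 8, _ => LazyCacheDemo.GetCache());`, FactoryCalls will be 1: only one of the racing threads ran the factory, and the rest blocked until the value was published.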
You can try double-checked locking on lst:
private List<string> lst;
private void MethodA_GetAll()
{
if (lst == null)
{
lock (_locker)
{
if (lst == null)
{
// do your thing
}
}
}
}
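For the double-checked pattern to publish the list safely across threads, the field should be volatile, so a thread that sees a non-null reference also sees the fully built list. A sketch with SomeService stubbed out (the stub and its call counter are assumptions for illustration):

```csharp
using System.Collections.Generic;

// Hypothetical stub standing in for the question's SomeService.
public static class SomeService
{
    public static int Calls; // not thread-safe itself; fine for a single-threaded demo
    public static List<string> Get()
    {
        Calls++;
        return new List<string> { "1", "2", "3" };
    }
}

public sealed class Fetcher
{
    private readonly object _locker = new object();
    private volatile List<string> _lst; // volatile publishes the write safely

    public List<string> GetAll()
    {
        if (_lst == null)               // fast path: no lock once initialized
        {
            lock (_locker)
            {
                if (_lst == null)       // re-check under the lock
                    _lst = SomeService.Get();
            }
        }
        return _lst;
    }
}
```

However many threads call GetAll, SomeService.Get() runs at most once; all later calls return the cached list without taking the lock.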
So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since collection of data can get quite large I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently I'm saying, if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();
Parallel.ForEach(inputrecords, input =>
{
lock(StackLock)
{
if (OutRecs.Count >= 50000)
{
Save(OutRecs);
OutRecs.Clear();
}
}
OutRecs.Push(CreateOutputRecord(input));
});
if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to output collection? If not is there a better way to do this?
Your lock will work correctly, but it will not be very efficient, because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so performing a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
// collection of objects to iterate over
inputrecords,
// delegate to initialize thread-local data
() => new List<OutRecord>(),
// body of loop
(inputrecord, loopstate, localstorage) =>
{
localstorage.Add(CreateOutputRecord(inputrecord));
if (localstorage.Count > 1000)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
return localstorage;
},
// finally block gets executed after each thread exits
localstorage =>
{
if (localstorage.Count > 0)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
});
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name.
{
void WriteRecord(T record);
void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there are too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
private readonly int _maxCapacity;
private bool _flushing;
public BufferedRecordWriter(int maxCapacity = 100)
{
_maxCapacity = maxCapacity;
}
public void WriteRecord(T record)
{
_buffer.Enqueue(record);
if (_buffer.Count >= _maxCapacity && !_flushing)
Flush();
}
public void Flush()
{
_flushing = true;
try
{
var recordsToWrite = new List<T>();
while (_buffer.TryDequeue(out T dequeued))
{
recordsToWrite.Add(dequeued);
}
if(recordsToWrite.Any())
WriteRecords(recordsToWrite);
}
finally
{
_flushing = false;
}
}
protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue, Flush can keep dequeuing records even as new ones are added.
That Flush method could be anything specific to how you write your records. Instead of this being an abstract class the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.
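As a rough, hypothetical sketch of that split (all names here are invented, not from the answer above), the "final" writer can be injected as a delegate, so the batching logic never knows whether records end up in a database, a file, or a test list:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public interface IRecordWriter<T>
{
    void WriteRecord(T record);
    void Flush();
}

public sealed class BatchingRecordWriter<T> : IRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly Action<IReadOnlyList<T>> _sink; // injected final destination
    private readonly int _maxCapacity;

    public BatchingRecordWriter(Action<IReadOnlyList<T>> sink, int maxCapacity = 100)
    {
        _sink = sink;
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity)
            Flush();
    }

    public void Flush()
    {
        // Drain the queue into a batch and hand it to the sink.
        var batch = new List<T>();
        while (_buffer.TryDequeue(out var item))
            batch.Add(item);
        if (batch.Count > 0)
            _sink(batch);
    }
}
```

In production the sink might be `batch => BulkInsert(batch)`; in a unit test it can simply append to a list, which makes the batching behaviour trivial to verify.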
I am using a ConcurrentDictionary (ongoingConnectionDic) in my code:
I check if a serial port number exists in the Dictionary.
If not existing, I add it into dictionary.
I perform communication with the serial port.
I remove the element from the ongoingConnectionDic.
If existing, I put the thread in wait.
My question is: can I ensure that when I perform a read operation, no other thread is simultaneously writing/updating the value? So, am I reading the most recent value of the dictionary?
If not, how do I achieve what I want?
Sample program:
class Program
{
// Dictionary in question
private static ConcurrentDictionary<string, string> ongoingPrinterJobs =
new ConcurrentDictionary<string, string>();
private static void sendPrint(string printerName)
{
if (ongoingPrinterJobs.ContainsKey(printerName))
{
// Add to pending list and run a thread to finish pending jobs by calling print();
}
else
{
ongoingPrinterJobs.TryAdd(printerName, ""); // -- Add it to the dictionary so that no other thread can
// use the printer
ThreadPool.QueueUserWorkItem(new WaitCallback(print), printerName);
}
}
private static void print(object stateInfo)
{
string printerName = (stateInfo as string);
string dummy;
// do printing work
// Remove from dictionary
ongoingPrinterJobs.TryRemove(printerName, out dummy);
}
static void Main(string[] args)
{
// Run threads here in random to print something on different printers
// Sample run with 10 printers
Random r = new Random();
for ( int i = 0 ; i < 10 ; i++ )
{
sendPrint(r.Next(0, 10).ToString());
}
    }
}
The concurrent collections take a "snapshot" of the collection upon enumeration. This is to prevent the enumerator from becoming invalid if another thread comes along and writes to the collection.
A method such as ContainsKey may enumerate over the items in the dictionary (you'd have to look at the implementation), in which case, you may be reading stale data.
All concurrent collections allow you to do is ensure you can enumerate over a collection even if another thread writes to it while you're enumerating. This wasn't the case with the standard collections.
With that said, as others have mentioned in their comments, other issues of thread safety must still be considered (E.g. race conditions).
The only way to prevent someone inserting a value into the collection after you've attempted to read a value but before writing a value is to lock the collection prior to reading the value, ensuring synchronized access to the collection throughout the entire transaction (i.e. the read and the subsequent write of a value).
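In the printer sample above, one way to close that check-then-act gap without a full lock is to rely on the return value of TryAdd, which is itself atomic: exactly one of the racing threads gets true. A sketch adapted from the sample (the wrapper class is invented for illustration):

```csharp
using System.Collections.Concurrent;

public static class PrinterGate
{
    private static readonly ConcurrentDictionary<string, string> OngoingJobs =
        new ConcurrentDictionary<string, string>();

    // Returns true if the caller acquired the printer, false if it was busy.
    // TryAdd is atomic, so no separate ContainsKey check is needed.
    public static bool TryAcquire(string printerName) =>
        OngoingJobs.TryAdd(printerName, "");

    public static void Release(string printerName) =>
        OngoingJobs.TryRemove(printerName, out _);
}
```

sendPrint would then become `if (PrinterGate.TryAcquire(printerName)) { /* queue the print */ } else { /* add to pending list */ }`, with Release called when the job finishes.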
I have a C# program that has a list that does writes and reads in separate threads. The write is user initiated and can change the data at any random point in time. The read runs in a constant loop. It doesn't matter if the read is missing data in any given loop, as long as the data it does receive is valid and it gets the new data in a future loop.
After considering ConcurrentBag, I settled on using locks for a variety of reasons (simplicity being one of them). After implementing the locks, a coworker mentioned to me that using temporary references to point to the old List in memory would work just as well, but I am concerned about what will happen if the new assignment and the reference assignment would happen at the same time.
Q: Is the temporary reference example below thread safe?
Update: User input provides a list of strings which are used in DoStuff(). You can think of these strings as a definition of constants and as such the strings need to be persisted for future loops. They are not deleted in DoStuff(), only read. UserInputHandler is the only thread that will ever change this list and DoStuff() is the only thread that will ever read from this list. Nothing else has access to it.
Additionally, I am aware of the Concurrent namespace and have used most of the collections in it in other projects, but I have chosen not to use them here because of the extra code complexity they add (i.e. ConcurrentBag doesn't have a simple Clear() function, etc.). A simple lock is good enough in this situation. The question is only whether the second example below is thread safe.
Lock
static List<string> constants = new List<string>();
//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
lock(constants)
{
constants.Clear();
foreach(var constant in userProvidedConstants)
{
constants.Add(constant);
}
}
}
//Thread B
public void DoStuff()
{
lock(constants)
{
//Do read only actions with constants here
foreach(var constant in constants)
{
//readonly actions
}
}
}
Reference
static List<string> constants = new List<string>();
//Thread A
public void UserInputHandler(List<string> userProvidedConstants)
{
lock(constants)
{
constants = new List<string>();
foreach(var constant in userProvidedConstants)
{
constants.Add(constant);
}
}
}
//Thread B
public void DoStuff()
{
var constantsReference = constants;
//Do read only actions with constantsReference here
foreach(var constant in constantsReference)
{
//readonly actions
}
}
This is not safe without the lock. Copying the reference to the list doesn't really do anything for you in this context. It's still quite possible for the list that you are currently iterating to be mutated in another thread while you are iterating it, causing all sorts of possible badness.
I think what you're looking for is BlockingCollection. Check out the following link for getting starting using it:
http://msdn.microsoft.com/en-us/library/dd997371%28v=vs.110%29.aspx
Here's an example of using BlockingCollection. ThreadB won't start enumerating the BlockingCollection until there are items available, and when it runs out of items, it will stop enumerating until more items become available (or until the IsCompleted property returns true)
private static readonly BlockingCollection<int> Items = new BlockingCollection<int>();
//ThreadA
public void LoadStuff()
{
Items.Add(1);
Items.Add(2);
Items.Add(3);
Items.CompleteAdding(); //Signal that no more items will be added, so consumers can finish
}
//ThreadB
public void DoStuff()
{
foreach (var item in Items.GetConsumingEnumerable())
{
//Do stuff here
}
}
Lock Free is dangerous and not portable. Don't do it. If you need to read on how to do lock-free, you probably shouldn't be doing it.
I think I misunderstood the question. I was under the strange impression that the list was only ever added to, or that only the most recent version is what matters. No idea how I came to that when he explicitly shows a "Clear()" call.
I apologize for the confusion.
This code is being disputed, use at your own risk, but I'm quite sure it should work on x86/x64, but no clue about ARM
You could do something like this
//Suggested to just use volatile instead of memorybarrier
static volatile T _MyList = new ReadOnlyList<T>();
void Load(){
T LocalList = _MyList.Copy();
LocalList.Add(1);
LocalList.Add(2);
LocalList.Add(3);
_MyList = LocalList.ReadOnly(); //Making it more clear
}
DoStuff(){
T LocalList = _MyList;
foreach(t tmp in LocalList)
{ /* read-only work */ }
}
This should work well for heavy read workloads. If you have more than one writer that modifies _MyList, you'll need to figure out a way to synchronize them.
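A minimal, compilable sketch of this copy-on-write idea (all names illustrative; int stands in for the element type): writers build a fresh snapshot and publish it through a volatile field, readers grab one reference and iterate it without locking.

```csharp
using System;
using System.Collections.Generic;

public sealed class CopyOnWriteList
{
    // volatile so readers always see the most recently published snapshot
    private volatile IReadOnlyList<int> _items = Array.Empty<int>();
    private readonly object _writeLock = new object(); // serializes writers only

    public void Replace(IEnumerable<int> newItems)
    {
        lock (_writeLock)
        {
            // Build the new list fully, then publish it in one reference write.
            _items = new List<int>(newItems).AsReadOnly();
        }
    }

    public int Sum()
    {
        var snapshot = _items; // single read of the volatile field
        var total = 0;
        foreach (var i in snapshot) total += i; // safe: snapshot never mutates
        return total;
    }
}
```

Because the published list is never mutated after publication, a reader iterating an old snapshot is never invalidated; it simply picks up the new snapshot on its next loop, which matches the "stale reads are fine" requirement in the question.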
I have a service that I am rewriting to use threading. I understand that state from one thread should not be accessed by another, but I'm a little confused by what constitutes 'state'. Does that mean any field/property/method outside of the method scope?
Specifically, my service looks something like this:
public class MyService
{
private IRepository<MyClass> repository;
private ILogger log;
...
public void MyMethod()
{
...
var t = new Thread(MyMethodAsync);
t.Start(someState);
}
//Is this OK???
public void MyMethodAsync(object state)
{
var someState = (MyState)state;
log.Log("Starting");
var someData = repository.GetSomeData(someState.Property);
//process data
log.Log("Done");
}
//Or should I be doing this:
public void MyMethodAsync2(object state)
{
var someState = (MyState)state;
lock(log){
log.Log("Starting"); }
lock(repository){
var someData = repository.GetSomeData(someState.Property);}
//process data
lock(log){
log.Log("Done"); }
}
}
Er...nope, you don't need to lock resources that are read-only. The purpose of locking them is so that if you need to check the value of a resource before writing it then another resource can't change the value between your read and your write. i.e.:
SyncLock MyQueue
If MyQueue.Length = 0 Then
PauseFlag.Reset
End If
End SyncLock
If we check the length of our queue, and another thread adds an item before we set the pause flag, our process-queue thread could sit paused even though an item was added between the length check and the flag being set...
If all resources are only accessing the queue in a read only fashion (not that I could think of a single useful application of a read-only queue) then there's no need to lock it.
"State" is all the data contained in the class, and the real issue as far as concurrency goes is write access, so your intuition is right.
Even worse, locking read-only structures is a good way to create deadlocks.