Part of a C# application I'm writing requires collecting data from a service provider's database for each account associated with a user. When the user logs into the app, a call is made to start updating the accounts from the service provider's database. Since many operations are performed on the third party's end, getting their information can take a while, so I don't want to wait for one account to finish before starting the next. My question is: are there any issues (threading issues, perhaps) with calling an asynchronous method inside of a loop?
The only loop-specific issue arises when anonymous methods capture the loop variable: each iteration creates a new instance of the anonymous method object, but all of them refer to the same loop variable, so they see its value change as the loop executes. To avoid this, make a copy of the loop variable inside the loop:
foreach (var thing in collection)
{
    var copy = thing;
    Action a = () =>
    {
        // refer to copy, not thing
    };
}
2017-04-25: By the way, this issue was fixed in C# 5.0: foreach now automatically performs the above transformation.
The loop itself is no problem, but starting (too) many threads might be. See if your requirements allow using the ThreadPool.
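To make this concrete, here is a minimal sketch of kicking off all account updates from a loop without waiting for each one. UpdateAccountAsync is a hypothetical stand-in for the slow call to the service provider; the Task machinery schedules the work on the ThreadPool:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class AccountRefresher
{
    // Hypothetical stand-in for the slow call into the service provider.
    public static Task UpdateAccountAsync(string accountId, Action<string> onDone)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(50);   // simulate slow third-party work
            onDone(accountId);
        });
    }

    // Kick off every account update without waiting for the previous one.
    public static Task RefreshAllAsync(IEnumerable<string> accounts, Action<string> onDone)
    {
        var tasks = new List<Task>();
        foreach (var account in accounts)
        {
            // Since C# 5.0 the foreach variable is safe to capture directly;
            // on C# 4 or earlier you would copy it first (var copy = account;).
            tasks.Add(UpdateAccountAsync(account, onDone));
        }
        return Task.WhenAll(tasks); // completes when all updates are done
    }
}
```

The caller can await (or Wait on) the combined task only at the point where all account data is actually needed.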
Related
I am using a thread-safe third party library to retrieve data from a historian.
The operating mode for a typical scenario is the following:
Library instance;
Result[] Process(string[] itemNames) {
    var itemIds = instance.ReserveItems(itemNames);
    Result[] results = instance.ProcessItems(itemIds);
    instance.ReleaseItems(itemIds);
    return results;
}
Library is a class that is expensive to instantiate, so it is used here as a singleton (instance), and it works perfectly against multiple threads.
However, I sometimes notice that a Result is marked as failed ("item not found") when multiple threads execute Process with itemNames arrays that share some common items. Because the library is very badly documented, this was unexpected.
By intensively logging, I have deduced that a thread could release an item at the same time another one is about to process it.
After a couple of emails to the library's vendor, I learnt that instance shares a single list of reserved items between threads, and that it is necessary to synchronize the calls...
Decompiling part of the library confirmed this: there is a class-level m_items list that is used by both ReserveItems and ReleaseItems.
So I envision the following workaround:
Result[] Process(string[] itemNames) {
    lock (instance) {
        var itemIds = instance.ReserveItems(itemNames);
        Result[] results = instance.ProcessItems(itemIds);
        instance.ReleaseItems(itemIds);
        return results;
    }
}
But it seems a bit too violent to me.
As this library works perfectly when different items are processed by multiple threads, how can I perform more fine-grained synchronization and avoid the performance penalty?
EDIT - 2018-11-09
I noticed that the whole ProcessItems method body of the Library is enclosed in a lock statement...
So any attempt at fine-grained synchronization around it is futile. I ended up enclosing my Process method body in a lock statement as well; the performance penalty is, as expected now, not perceptible at all.
You could implement a lock per item ID. That could take the form of a Dictionary<string, object> where the value is the lock object (new object()).
If you want to process the same item ID on multiple threads at the same time without blocking everything in case of conflict, you could track more state in the dictionary value to do that. As an example, you could use a Dictionary<string, Lazy<Result>>. The first thread to need an item ID would initialize and directly consume the lazy. Other threads can then detect that an operation is in progress on that item ID and also consume the lazy.
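A minimal sketch of the per-item-ID lock idea (not from the original answer's code; the dictionary of lock objects is built with GetOrAdd so that two threads asking for the same ID always receive the same lock object):

```csharp
using System;
using System.Collections.Concurrent;

static class PerItemLocks
{
    // One lock object per item ID; GetOrAdd guarantees a single
    // shared instance even when two threads race on the same key.
    private static readonly ConcurrentDictionary<string, object> s_locks =
        new ConcurrentDictionary<string, object>();

    // Serializes work on the same item ID; different IDs run in parallel.
    public static void ProcessItem(string itemId, Action work)
    {
        object gate = s_locks.GetOrAdd(itemId, _ => new object());
        lock (gate)
        {
            work();
        }
    }
}
```

One caveat of this sketch: the lock dictionary grows without bound if item IDs keep changing, so long-running services would need an eviction strategy.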
If I have a ConcurrentDictionary and use TryGetValue within an if statement, does this make the if statement's contents thread-safe? Or must you still lock within the if statement?
Example:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    //Users is a list.
    client.Users.Add(item);
}
or do I have to do:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    lock (client)
    {
        //Users is a list.
        client.Users.Add(item);
    }
}
Yes, you have to lock inside the if statement. The only guarantee you get from ConcurrentDictionary is that its own methods are thread-safe.
The accepted answer could be misleading, depending on your point of view and the scope of thread safety you are trying to achieve. This answer is aimed at people who stumble on this question while learning about threading and concurrency:
It's true that locking on the output of the dictionary retrieval (the Client object) makes some of the code thread safe, but only the code that is accessing that retrieved object within the lock. In the example, it's possible that another thread removes that object from the dictionary after the current thread retrieves it. (Even though there are no statements between the retrieval and the lock, other threads can still execute in between.) Then, this code would add the Client object to the Users list even though it is no longer in the concurrent dictionary. That could cause an exception, synchronization, or race condition.
It depends on what the rest of the program is doing. But in the scenario I'm describing, it would be safer to put the lock around the entire dictionary retrieval. And then a regular dictionary might be faster and simpler than a concurrent dictionary, as long as you always lock on it while using it!
While both of the current answers are technically true, I think there is potential for them to be a little misleading, and they don't convey ConcurrentDictionary's big strengths. Maybe the OP's original way of solving the problem with locks worked in that specific circumstance, but this answer is aimed more generally at people learning about ConcurrentDictionary for the first time.
ConcurrentDictionary is designed so that you don't have to use locks. It has several specialty methods built around the idea that some other thread could modify the object in the dictionary while you're currently working on it. For a simple example, the TryUpdate method lets you check whether a key's value has changed between when you got it and the moment you're trying to update it. If the value you've got matches the value currently in the ConcurrentDictionary, you can update it and TryUpdate returns true. If not, TryUpdate returns false. The documentation for TryUpdate can make this a little confusing because it doesn't make explicitly clear why there is a comparison value, but that's the idea behind it.

If you want a little more control around adding or updating, you can use one of the overloads of AddOrUpdate to either add a value for a key if it doesn't exist at the moment you're trying to add it, or update the value if some other thread has already added one for the specified key. The context of whatever you're trying to do will dictate the appropriate method to use. The point is that, rather than locking, you should look at the specialty methods ConcurrentDictionary provides and prefer those over trying to come up with your own locking solution.
In the case of OP's original question, I would suggest that instead of this:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    //Users is a list.
    client.Users.Add(item);
}
One might try the following instead*:
ConcurrentDictionary<Guid, Client> m_Clients;
Client originalClient;
if (m_Clients.TryGetValue(clientGUID, out originalClient))
{
    //The Client object will need to implement IEquatable if more
    //than an object instance comparison needs to be done. This
    //sample code assumes that Client implements IEquatable.

    //If copying a Client is not trivial, you'll probably want to
    //also implement a simple type of copy in a method of the Client
    //object. This sample code assumes that the Client object has
    //a ShallowCopy method to do this copy for simplicity's sake.
    Client modifiedClient = originalClient.ShallowCopy();

    //Make whatever modifications to modifiedClient need to get
    //made...
    modifiedClient.Users.Add(item);

    //Now update the value in the ConcurrentDictionary
    if (!m_Clients.TryUpdate(clientGUID, modifiedClient, originalClient))
    {
        //Do something if the Client object was updated in between
        //when it was retrieved and when the code here tries to
        //modify it.
    }
}
*Note: in the example above I'm using TryUpdate for ease of demonstrating the concept. In practice, if you need to make sure an object gets added if it doesn't exist or updated if it does, AddOrUpdate would be the ideal option, because that method handles all of the looping required to check for add vs. update and take the appropriate action.
It might seem a little harder at first, because it may be necessary to implement IEquatable and, depending on how instances of Client need to be copied, some sort of copying functionality. But it pays off in the long run if you're working with ConcurrentDictionary, and the objects within it, in any serious way.
I am working on a class library that logs audit details of a web application to several types of data sources (file, XML, database), based on policies defined in the web configuration file.
My Audit log method has a signature similar to this:
public static void LogInfo(User user, Module module, List lst);
The web application uses this method to log important details such as warnings, errors and even exception details.
Since a single workflow makes more than 700 calls to these methods, I thought of making them asynchronous. I used the simple QueueUserWorkItem method from the ThreadPool class:
ThreadPool.QueueUserWorkItem(o => LogInfo(user, module, lst));
but this does not ensure the order in which the work items are processed. Even though all my information was logged, the ordering was messed up: in my text file the logs were not in the order in which they were called.
Is there a way I can control the ordering of the threads being called using QueueUserWorkItem?
I don't think you can specify ordering when using QueueUserWorkItem.
To run the logging in parallel (on some background thread), you could use ConcurrentQueue<T>. This is a thread-safe collection that can be accessed from multiple threads. You could create one work item (or a thread) that reads elements from the collection and writes them to a file. Your main application would add items to the collection. The fact that you're adding items to the collection from a single thread should guarantee that they will be read in the right order.
To keep things simple, you can store Action values in the queue:
ConcurrentQueue<Action> logOperations = new ConcurrentQueue<Action>();

// To add a logging operation from the main thread:
logOperations.Enqueue(() => LogInfo(user, module, lst));
The background task can just take Actions from the queue and run them:
// Start this when you create the `logOperations` collection
ThreadPool.QueueUserWorkItem(o => {
    Action op;
    // Repeatedly take log operations & run them;
    // sleep briefly when the queue is empty so new items can arrive
    while (true)
    {
        while (logOperations.TryDequeue(out op)) op();
        Thread.Sleep(50);
    }
});
If you need to stop the background processor (that writes data to the log), you can create a CancellationTokenSource and end the while loop when the token is cancelled (by the main thread). This can be checked using the IsCancellationRequested property (see MSDN).
One way of solving this would be to put your data in a queue and have a single task picking from that queue and writing the entries in order. If you are using .NET 4.0 you could use ConcurrentQueue, which is thread-safe; otherwise a simple Queue with proper locks would work as well.
The thread consuming the queue could then periodically check for any element inside the queue, and for each one of them it could log. This way the lengthy operation (logging) could be in its own thread, whereas in the main thread you do simply adds.
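On .NET 4.0 and later, BlockingCollection<T> (which wraps a ConcurrentQueue<T> by default) can make the consumer side simpler than polling. A sketch of the single-consumer idea, with an in-memory list standing in for the log file:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class OrderedLogger
{
    private readonly BlockingCollection<string> _queue =
        new BlockingCollection<string>(new ConcurrentQueue<string>());
    private readonly Task _consumer;
    public List<string> Written = new List<string>(); // stands in for the log file

    public OrderedLogger()
    {
        // Single consumer: messages are written in exactly the order enqueued.
        _consumer = Task.Run(() =>
        {
            foreach (string message in _queue.GetConsumingEnumerable())
                Written.Add(message);
        });
    }

    public void Log(string message) => _queue.Add(message);

    public void Shutdown()
    {
        _queue.CompleteAdding(); // consumer loop ends once the queue drains
        _consumer.Wait();
    }
}
```

Because all writes funnel through one consumer, enqueue order from a single producer thread is preserved in the output; GetConsumingEnumerable blocks while the queue is empty, so no sleep/poll loop is needed.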
imagine the simplest DB access code with some in-memory caching -
if exists in cache
    return object
else
    get from DB
    add to cache
    return object
Now, if the DB access takes a second and I have, say, 5 ASP.Net requests/threads hitting that same code within that second, how can I ensure only the first one does the DB call? I have a simple thread lock around it, but that simply queues them up in an orderly fashion, allowing each to call the DB in turn. My data repositories basically read in entire tables in one go, so we're not talking about Get by Id data requests.
Any ideas on how I can do this? Thread wait handles sound almost what I'm after but I can't figure out how to code it.
Surely this must be a common scenario?
Existing pseudocode:
lock (threadLock)
{
    get collection of entities using Fluent NHib
    add collection to cache
}
Thanks,
Col
You've basically answered your own question. The lock() is fine: it prevents other threads from proceeding into that code while any thread is inside it. Then, inside the lock, perform your first pseudo-code: check whether the value is cached already; if not, retrieve it and cache it. The next thread will then come in, check the cache, find the value available and use it.
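The check-inside-the-lock pattern described above can be sketched as follows (a hypothetical illustration; the load delegate stands in for "get collection of entities using Fluent NHib", and LoadCount exists only to demonstrate that the DB is hit once):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SimpleCache
{
    private readonly object _gate = new object();
    private volatile List<string> _entities; // null until loaded
    public int LoadCount;                    // for demonstration only

    public List<string> GetEntities(Func<List<string>> load)
    {
        // Cheap check without the lock for the common warm-cache case.
        var cached = _entities;
        if (cached != null) return cached;

        lock (_gate)
        {
            // Re-check inside the lock: another thread may have loaded
            // the data while we were waiting to acquire it.
            if (_entities == null)
            {
                _entities = load();
                Interlocked.Increment(ref LoadCount);
            }
            return _entities;
        }
    }
}
```

Only the first thread through performs the load; the threads queued behind it find the cache populated when they re-check inside the lock, so the DB is queried exactly once even when many requests arrive while the cache is cold.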
Surely this must be a common scenario?
Not necessarily as common as you may think.
In many similar caching scenarios:
the race condition you describe doesn't happen frequently (it requires multiple requests to arrive when the cache is cold)
the data returned from the database is readonly, and data returned by multiple requests is essentially interchangeable.
the cost of querying the database is not so prohibitive that it matters.
But if in your scenario you absolutely need to prevent this race condition, then use a lock as suggested by Roger Perkins.
I'd use Monitor/Mutex over lock. Using lock you need to specify a resource (you may also use the this pointer, which is not recommended).
Try the following instead:
Mutex myMutex = new Mutex();
// if you want it system-wide, use a named mutex:
// Mutex myMutex = new Mutex(false, "SomeUniqueName");

myMutex.WaitOne();
// or
//if (myMutex.WaitOne(<ms>))
//{
//    //thread has access
//}
//else
//{
//    //thread has no access
//}

<INSERT CODE HERE>

myMutex.ReleaseMutex();
I don't know whether a general solution or an established algorithm exists.
I personally use the code pattern below to solve problems like this.
1) Define an integer variable that can be accessed by all threads.
int accessTicket = 0;
2) Modify code block
int myTicket = accessTicket;
lock (threadLock)
{
    if (myTicket == accessTicket)
    {
        ++accessTicket;
        //get collection of entities using Fluent NHib
        //add collection to cache
    }
}
UPDATE
The purpose of this code is not to prevent duplicate DB access or duplicate caching; we can do that with a normal thread lock.
By using the access ticket like this, we prevent other threads from redoing work that has already been finished.
UPDATE#2
Note that there IS a lock (threadLock) in the code.
Please look carefully before commenting or voting down.
Good morning,
At the startup of the application I am writing I need to read about 1,600,000 entries from a file to a Dictionary<Tuple<String, String>, Int32>. It is taking about 4-5 seconds to build the whole structure using a BinaryReader (using a FileReader takes about the same time). I profiled the code and found that the function doing the most work in this process is BinaryReader.ReadString(). Although this process needs to be run only once and at startup, I would like to make it as quick as possible. Is there any way I can avoid BinaryReader.ReadString() and make this process faster?
Thank you very much.
Are you sure that you absolutely have to do this before continuing?
I would examine the possibility of hiving off the task to a separate thread which sets a flag when finished. Then your startup code simply kicks off that thread and continues on its merry way, pausing only when both:
the flag is not yet set; and
no more work can be done without the data.
Often, the illusion of speed is good enough, as anyone who has coded up a splash screen will tell you.
Another possibility, if you control the data, is to store it in a more binary form so you can just blat it all in with one hit (i.e., no interpretation of the data, just read in the whole thing). That, of course, makes it harder to edit the data from outside your application but you haven't stated that as a requirement.
If it is a requirement or you don't control the data, I'd still look into my first suggestion above.
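The "kick off the load and pause only when the data is needed" idea maps naturally onto a Task started at application startup. A minimal sketch (LoadDictionary is a hypothetical stand-in for the 1,600,000-entry BinaryReader loop):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class Startup
{
    // Started once at application startup; nothing blocks yet.
    private static readonly Task<Dictionary<Tuple<string, string>, int>> s_data =
        Task.Run(() => LoadDictionary());

    // Stands in for the slow BinaryReader-based load of 1.6M entries.
    private static Dictionary<Tuple<string, string>, int> LoadDictionary()
    {
        var d = new Dictionary<Tuple<string, string>, int>();
        d[Tuple.Create("a", "b")] = 1; // placeholder for the real file contents
        return d;
    }

    // The first caller that actually needs the data blocks here (and only here).
    public static Dictionary<Tuple<string, string>, int> Data => s_data.Result;
}
```

The rest of the startup path runs unimpeded; accessing Data before the load finishes simply blocks that one caller until the Task completes, which is exactly the flag-plus-pause behaviour described above.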
If you think that reading the file line by line is the bottleneck, and depending on its size, you can try to read it all at once:
// read the entire file at once
string entireFile = System.IO.File.ReadAllText(path);
If this doesn't help, you can try to add a separate thread with a semaphore, which would start reading in the background immediately when the program is started, but block the requesting thread the moment you try to access the data.
This is called a Future, and you have an implementation in Jon Skeet's miscutil library.
You call it like this at the app startup:
// following line invokes "DoTheActualWork" method on a background thread.
// DoTheActualWork returns an instance of MyData when it's done
Future<MyData> calculation = new Future<MyData>(() => DoTheActualWork(path));
And then, some time later, you can access the value in the main thread:
// following line blocks the calling thread until
// the background thread completes
MyData result = calculation.Value;
If you look at the Future's Value property, you can see that it blocks at the AsyncWaitHandle if the thread is still running:
public TResult Value
{
    get
    {
        if (!IsCompleted)
        {
            _asyncResult.AsyncWaitHandle.WaitOne();
            _lock.WaitOne();
        }
        return _value;
    }
}
If strings are repeated across tuples, you could reorganize your file to list all the distinct strings at the start, and store references to those strings (integers) in the body of the file. Your main Dictionary would not have to change, but you would need a temporary Dictionary during startup holding all the distinct strings (values) and their references (keys).
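Even without reorganizing the file, the same repeated-strings observation can be exploited at load time: keep one canonical instance per distinct string value so repeated tuple components share a single object, which reduces allocations and GC pressure. A hypothetical sketch of such a read-side string pool (not part of the original suggestion, just the same idea applied in memory):

```csharp
using System.Collections.Generic;

class StringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>();

    // Returns a canonical instance for each distinct string value,
    // so repeated tuple components share one object instead of
    // allocating a fresh string per BinaryReader.ReadString() call.
    public string Intern(string s)
    {
        if (_pool.TryGetValue(s, out string canonical))
            return canonical;
        _pool[s] = s;
        return s;
    }
}
```

You would call Intern on each string read from the file before putting it into the tuple key; this is similar in spirit to string.Intern, but the pool's lifetime is under your control and it can be discarded after startup.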