I'm using LINQ to SQL and after I submit some changes I want to spawn a thread which looks through all the changes and updates our Lucene index as necessary. My code looks vaguely like:
new Thread(() => UpdateIndex(context.GetChangeSet())).Start();
Sometimes though I get an InvalidOperationException, which I think is because context.GetChangeSet() is not thread-safe, and so if the change set is modified in one thread while another thread is enumerating through it, problems arise.
Is there a "thread-safe" version of GetChangeSet()? Or some way I can do ChangeSet.clone() or something?
Instance members of the DataContext class are not thread-safe.
In order to avoid race conditions you should invoke the DataContext.GetChangeSet method from the same thread that makes the modifications tracked by the DataContext instance. For example:
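using System;
using System.Data.Linq;
using System.Threading;

public class CustomerDao : IDisposable
{
    private DataContext context;

    public CustomerDao()
    {
        this.context = new DataContext("SomeConnectionString");
    }

    public void Insert(Customer instance)
    {
        this.context.GetTable<Customer>().InsertOnSubmit(instance);
        this.SubmitAndStartUpdateIndex();
    }

    public void Delete(Customer instance)
    {
        this.context.GetTable<Customer>().DeleteOnSubmit(instance);
        this.SubmitAndStartUpdateIndex();
    }

    public void Dispose()
    {
        if (this.context != null)
        {
            this.context.Dispose();
        }
    }

    private void SubmitAndStartUpdateIndex()
    {
        // Snapshot the pending changes on the thread that owns the DataContext;
        // after SubmitChanges the change set would be empty.
        ChangeSet changes = this.context.GetChangeSet();
        this.context.SubmitChanges();

        // Only the ChangeSet is handed to the pool thread, never the DataContext,
        // and only once the submit has succeeded.
        ThreadPool.QueueUserWorkItem(
            state => this.UpdateIndex((ChangeSet)state), changes);
    }

    private void UpdateIndex(ChangeSet changes)
    {
        // Update the Lucene index from changes.Inserts, changes.Updates and changes.Deletes.
    }
}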
This assumes that the Insert and Delete methods are being called on a given instance of the CustomerDao class from a single thread.
I only needed to extract a small amount of data from each object, so I ended up just extracting the text, putting it into a new object, and sending that new object off. This saved me a lot of trouble from having to deal with locking everywhere else, but I think Enrico's answer is probably the "real" one, so I'm leaving his marked as the solution.
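A minimal sketch of that copy-then-hand-off approach, assuming the index only needs an id and some text per entity. `IndexEntry`, `IndexHandoff`, `Snapshot` and `StartUpdateIndex` are hypothetical names; in the real code the input would come from `GetChangeSet()` on the thread that owns the `DataContext`. Because the worker only sees its own copies, no locking is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical immutable record of just the fields the Lucene index needs.
public sealed class IndexEntry
{
    public IndexEntry(int id, string text) { Id = id; Text = text; }
    public int Id { get; }
    public string Text { get; }
}

public static class IndexHandoff
{
    // Call on the thread that owns the DataContext, e.g. with
    // changeSet.Inserts.OfType<Customer>() and a selector that copies the text.
    public static List<IndexEntry> Snapshot<T>(IEnumerable<T> changed, Func<T, IndexEntry> extract)
    {
        return changed.Select(extract).ToList();
    }

    // The worker only receives the copies, so it never touches the DataContext
    // or the live entity objects.
    public static void StartUpdateIndex(List<IndexEntry> entries, Action<List<IndexEntry>> updateIndex)
    {
        ThreadPool.QueueUserWorkItem(_ => updateIndex(entries));
    }
}
```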
Related
Every so often I hit upon this problem and ignore it, but it started gnawing at me today.
private readonly object _syncRoot = new object();
private List<int> NonconcurrentObject { get; } = new List<int>();
public void Fiddle()
{
lock (_syncRoot)
{
// ...some code...
NonconcurrentObject.Add(1);
Iddle();
}
}
public void Twiddle()
{
lock (_syncRoot)
{
// ...some different code...
NonconcurrentObject.Add(2);
Iddle();
}
}
private void Iddle()
{
// NOT THREADSAFE! DO NOT CALL THIS WITHOUT LOCKING ON _syncRoot
// ......lots of code......
NonconcurrentObject.Add(3);
}
I have multiple public methods of a class with some code that is not inherently threadsafe (the List above is a trivial example). I want to use helper methods for the code shared between them (as anyone would), but in splitting off the shared code I'm faced with a dilemma: do I use recursive locking in the helper methods or not? If I do, my code is wasteful and possibly less performant. If I don't (as above), the helper method is no longer threadsafe and open to a nasty race condition if called by some other method in the future.
How can I (elegantly and robustly) signal that a method isn't threadsafe?
You use doc comments.
///<remarks>not thread safe</remarks>
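A doc comment documents the contract but cannot enforce it. One option, sketched here as an addition rather than part of the answer, is to pair the comment with a debug-time check using `Monitor.IsEntered` (available since .NET Framework 4.5), so a future caller that forgets the lock fails in testing. The class name `Fiddler` and the `Count` property are hypothetical; the other members mirror the question's code.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public class Fiddler
{
    private readonly object _syncRoot = new object();
    private List<int> NonconcurrentObject { get; } = new List<int>();

    public void Fiddle()
    {
        lock (_syncRoot)
        {
            NonconcurrentObject.Add(1);
            Iddle();
        }
    }

    public int Count
    {
        get { lock (_syncRoot) { return NonconcurrentObject.Count; } }
    }

    /// <summary>Shared helper for the public methods.</summary>
    /// <remarks>Not thread-safe: the caller must hold the lock on <c>_syncRoot</c>.</remarks>
    private void Iddle()
    {
        // In Debug builds, fails if the current thread does not hold _syncRoot.
        Debug.Assert(Monitor.IsEntered(_syncRoot), "Iddle called without holding _syncRoot");
        NonconcurrentObject.Add(3);
    }
}
```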
You could use custom attributes to mark methods that are not thread safe.
The advantage over comments is that it gives you options for further processing (via reflection) if you wish to do so at a later date.
using System;
[AttributeUsage(AttributeTargets.Method)]
public sealed class NotThreadSafeAttribute : Attribute
{
//...
}
public class MyClass
{
[NotThreadSafe]
public void MyMethod()
{
//...
}
}
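To illustrate the reflection option, here is a sketch that lists every method carrying the attribute. It assumes the attribute is declared as `NotThreadSafeAttribute` (the conventional suffix, which still allows `[NotThreadSafe]` at the use site); `ThreadSafetyReport`, `FindNotThreadSafe` and `SafeMethod` are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method)]
public sealed class NotThreadSafeAttribute : Attribute { }

public class MyClass
{
    [NotThreadSafe]
    public void MyMethod() { }

    public void SafeMethod() { }
}

public static class ThreadSafetyReport
{
    // Returns "Type.Method" for every method in the assembly marked [NotThreadSafe].
    public static List<string> FindNotThreadSafe(Assembly assembly)
    {
        const BindingFlags all = BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.Instance | BindingFlags.Static |
                                 BindingFlags.DeclaredOnly;
        return assembly.GetTypes()
            .SelectMany(t => t.GetMethods(all))
            .Where(m => m.IsDefined(typeof(NotThreadSafeAttribute), false))
            .Select(m => m.DeclaringType.Name + "." + m.Name)
            .ToList();
    }
}
```

Such a report could, for example, run in a unit test that fails when a public method is marked `[NotThreadSafe]`.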
You could add the _Unsafe suffix to your utility methods that are not protected with locks.
Advantages: it reminds you that you are doing dangerous things and must be extra careful. A small mistake could cost you days of debugging in the future.
Disadvantages: Not very pretty, and can be confused with the unsafe keyword.
private void Iddle_Unsafe()
{
NonconcurrentObject.Add(3);
}
public void Twiddle()
{
lock (_syncRoot)
{
NonconcurrentObject.Add(2);
Iddle_Unsafe();
}
}
There is an extended implementation of command pattern to support multi-commands (groups) in C#:
var ctx= //the context object I am sharing...
var commandGroup1 = new MultiItemCommand(ctx, new List<ICommand>
{
new Command1(ctx),
new Command2(ctx)
});
var commandGroup2 = new MultiItemCommand(ctx, new List<ICommand>
{
new Command3(ctx),
new Command4(ctx)
});
var groups = new MultiCommand(new List<ICommand>
{
commandGroup1,
commandGroup2
}, null);
Now, the execution looks like:
groups.Execute();
I am sharing the same context (ctx) object.
The web app's execution plan needs to run the commandGroup1 and commandGroup2 groups on different threads. Specifically, commandGroup2 will be executed on a new thread and commandGroup1 on the main thread.
Execution now looks like:
//In Main Thread
commandGroup1.Execute();
//In the new Thread
commandGroup2.Execute();
How can I thread-safely share the same context object (ctx), so as to be able to rollback the commandGroup1 from the new Thread ?
Is t.Start(ctx); enough or do I have to use lock or something?
Some code implementation example is here
The provided sample code certainly leaves open a large number of questions about your particular use case; however, I will attempt to outline the general strategy for implementing this type of problem in a multi-threaded environment.
Does the context or its data get modified in a coupled, non-atomic way?
For example, would any of your commands do something like:
Context.Data.Item1 = "Hello"; // Setting both values is required, only
Context.Data.Item2 = "World"; // setting one would result in invalid state
Then absolutely you would need to utilize lock(...) statements somewhere in your code. The question is where.
What is the thread-safety behavior of your nested controllers?
In the linked GIST sample code, the CommandContext class has properties ServerController and ServiceController. If you are not the owner of these classes, then you must carefully check the documentation on the thread-safety of these classes as well.
For example, if your commands running on two different threads perform calls such as:
Context.ServiceController.Commit(); // On thread A
Context.ServiceController.Rollback(); // On thread B
There is a strong possibility that these two actions cannot be invoked concurrently if the creator of the controller class was not expecting multi-threaded usage.
When to lock and what to lock on
Take the lock whenever you need to perform multiple actions that must happen completely or not at all, or when invoking long-running operations that do not expect concurrent access. Release the lock as soon as possible.
Also, locks should only be taken on read-only or constant properties or fields. So before you do something like:
lock(Context.Data)
{
// Manipulate data sub-properties here
}
Remember that it is possible to swap out the object that Data is pointing to. The safest implementation is to provide dedicated locking objects:
internal readonly object dataSyncRoot = new object();
internal readonly object serviceSyncRoot = new object();
internal readonly object serverSyncRoot = new object();
for each sub-object that requires exclusive access and use:
lock(Context.dataSyncRoot)
{
// Manipulate data sub-properties here
}
There is no magic bullet for when and where to take the locks, but in general, the higher up in the call stack you put them, the simpler and safer your code will probably be, at the expense of performance, since both threads can no longer execute simultaneously. The further down you place them, the more concurrent your code can be, but the harder it becomes to get right.
Aside: taking and releasing an uncontended lock costs almost nothing, so there is no need to worry about that; the real cost is threads waiting on each other.
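Putting the dedicated-lock advice and the Item1/Item2 example together, a sketch of a context whose coupled fields are only ever written and read as a pair under one lock. `Data`, `Item1`, `Item2` and `CommandContext` echo the names in the answer, but their shapes here, and the `SetGreeting`/`ReadGreeting` methods, are assumptions for illustration, not the types from the linked gist.

```csharp
using System;

// Hypothetical data object with two fields that must change together.
public sealed class Data
{
    public string Item1 { get; set; }
    public string Item2 { get; set; }
}

public sealed class CommandContext
{
    // Dedicated, readonly lock object: unlike Data, it can never be swapped out.
    internal readonly object dataSyncRoot = new object();

    public Data Data { get; private set; } = new Data();

    // Both assignments happen inside one lock, so no reader sees only one of them.
    public void SetGreeting(string item1, string item2)
    {
        lock (dataSyncRoot)
        {
            Data.Item1 = item1;
            Data.Item2 = item2;
        }
    }

    // Readers take the same lock so they always observe a consistent pair.
    public string ReadGreeting()
    {
        lock (dataSyncRoot)
        {
            return Data.Item1 + " " + Data.Item2;
        }
    }
}
```

Commands running on different threads would call these methods (or lock `dataSyncRoot` themselves for larger multi-step updates) instead of touching `Data` directly.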
Assume we have a MultiCommand class that aggregates a list of ICommands and at some point must execute all commands asynchronously. All commands must share the context. Each command could change the context state, but there is no set order!
The first step is to kick off all ICommand Execute methods passing in the CTX. The next step is to set up an event listener for new CTX Changes.
using System.Collections.Generic;
using System.Windows.Input;

public class MultiCommand
{
    private readonly List<ICommand> list;
    public List<ICommand> Commands { get { return list; } }
    public CommandContext SharedContext { get; set; }

    // Chain to the main constructor so the list is never null and the
    // listener is always hooked up.
    public MultiCommand() : this(new List<ICommand>()) { }

    public MultiCommand(List<ICommand> list)
    {
        this.list = list;
        // Hook up listener for new command contexts from other tasks
        XEvents.CommandCTX += OnCommandCTX;
    }

    private void OnCommandCTX(object sender, CommandContext e)
    {
        // Some other task finished; update SharedContext
        SharedContext = e;
    }

    public MultiCommand Add(ICommand cc)
    {
        list.Add(cc);
        return this;
    }

    internal void Execute()
    {
        list.ForEach(cmd => cmd.Execute(SharedContext));
    }

    public static MultiCommand New()
    {
        return new MultiCommand();
    }
}
Each command handles the asynchronous part similar to this:
using System;
using System.Threading.Tasks;
using System.Windows.Input;

internal class Command1 : ICommand
{
    public event EventHandler CanExecuteChanged;

    public bool CanExecute(object parameter)
    {
        throw new NotImplementedException();
    }

    public async void Execute(object parameter)
    {
        var ctx = (CommandContext)parameter;
        var newCTX = await Task.Run(() =>
        {
            // This lambda runs on a thread-pool thread and closes over ctx.
            // ctx is the same object the other commands received, so this write
            // is visible to them (and can race with them) unless it is locked.
            ctx.Data = GetNewData();
            return ctx;
        });
        // After the await completes, publish the updated context to listeners.
        newCTX.NotifyNewCommmandContext();
    }

    private RequiredData GetNewData()
    {
        throw new NotImplementedException();
    }
}
Finally we set up a common event handler and notification system.
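using System;
using System.Runtime.CompilerServices;

public static class XEvents
{
    public static event EventHandler<CommandContext> CommandCTX;

    public static void NotifyNewCommmandContext(this CommandContext ctx, [CallerMemberName] string caller = "")
    {
        // ?. reads the delegate once, so a concurrent unsubscribe cannot
        // cause a NullReferenceException between the check and the call.
        CommandCTX?.Invoke(caller, ctx);
    }
}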
Further abstractions are possible in each Command's execute function. But we won't discuss that now.
Here's what this design does and doesn't do:
It allows any finished task to publish its updated context back to the MultiCommand class, on the thread that started the command.
This assumes no workflow-based state is necessary. The question merely indicated that a set of tasks only had to run asynchronously, not in a particular order.
No concurrency manager is necessary because we rely on each command's closure and the completion of its asynchronous task to return the new context on the thread that created it. (That holds only when the await captures a SynchronizationContext, such as a UI thread's; otherwise the continuation runs on a thread-pool thread.)
If you need concurrency control, that implies the context state matters; that design is similar to this one but different, and is easily implemented using functions and callbacks for the closure.
As long as each context is only used by a single thread at a time, there is no problem with using it from multiple threads.
Using what I judged to be the best of all worlds from the excellent "Implementing the Singleton Pattern in C#" article, I have been successfully using the following class to persist user-defined data in memory (for very rarely modified data):
public class Params
{
static readonly Params Instance = new Params();
Params()
{
}
public static Params InMemory
{
get
{
return Instance;
}
}
private IEnumerable<Localization> _localizations;
public IEnumerable<Localization> Localizations
{
get
{
return _localizations ?? (_localizations = new Repository<Localization>().Get());
}
}
public int ChunkSize
{
get
{
// Loc uses the Localizations impl
return LC.Loc("params.chunksize").To<int>();
}
}
public void RebuildLocalizations()
{
_localizations = null;
}
// other similar values coming from the DB and staying in-memory,
// and their refresh methods
}
My usage would look something like this:
var allLocs = Params.InMemory.Localizations; //etc
Whenever I update the database, RebuildLocalizations gets called, so only part of my in-memory store is rebuilt. I have a single production environment out of about 10 that seems to misbehave when RebuildLocalizations is called: it does not refresh at all, but this also seems to be intermittent and very odd altogether.
My current suspicion is the singleton, although I think it does the job well, and all the unit tests show that the singleton mechanism, the refresh mechanism and the RAM performance all work as expected.
That said, I am down to these possibilities:
This customer is lying when he says their environment is not using load balancing, a setup in which I would not expect the in-memory store to work properly (right?)
There is some non-standard pool configuration in their IIS which I am testing against (maybe in a Web Garden setting?)
The singleton is failing somehow, but not sure how.
Any suggestions?
.NET 3.5 so not much parallel juice available, and not ready to use the Reactive Extensions for now
Edit1: as per the suggestions, would the getter look something like:
public IEnumerable<Localization> Localizations
{
get
{
lock(_localizations) {
return _localizations ?? (_localizations = new Repository<Localization>().Get());
}
}
}
To expand on my comment, here is how you might make the Localizations property thread safe:
public class Params
{
private readonly object _lock = new object();
private IEnumerable<Localization> _localizations;
public IEnumerable<Localization> Localizations
{
get
{
lock (_lock) {
if ( _localizations == null ) {
_localizations = new Repository<Localization>().Get();
}
return _localizations;
}
}
}
public void RebuildLocalizations()
{
lock(_lock) {
_localizations = null;
}
}
// other similar values coming from the DB and staying in-memory,
// and their refresh methods
}
There is no point in creating a thread safe singleton, if your properties are not going to be thread safe.
You should either lock around the assignment of the _localizations field, or instantiate it in your singleton's constructor (preferred). Any advice that applies to singleton instantiation also applies to this lazily instantiated property.
The same thing further applies to all properties (and their properties) of Localization. If this is a singleton, any thread can access it at any time, and simply locking the getter will again not be enough.
For example, consider this case:
   Thread 1                                  Thread 2
   // both threads access the singleton, but you are "safe" because you locked
1. var loc1 = Params.Localizations;          var loc2 = Params.Localizations;
   // do stuff                               // thread 2 calls the same property...
2. var value = loc1.ChunkSize;               var chunk = LC.Loc("params.chunksize");
   // invalidate                             // ...there is a slight pause here...
3. loc1.RebuildLocalizations();
                                             // ...and gets the wrong value
4.                                           var value = chunk.To<int>();
If you are only reading these values, then it might not matter, but you can see how you can easily get in trouble with this approach.
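Remember that with threading, you never know whether a different thread will execute something between two instructions. Only individual reads and writes of references and of primitive values of 32 bits or less (such as int, bool or float) are guaranteed to be atomic; any sequence of operations is not.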
This means that, in this line here:
return LC.Loc("params.chunksize").To<int>();
is, as far as threading is concerned, equivalent to:
var loc = LC.Loc("params.chunksize");
Thread.Sleep(1); // anything can happen here :-(
return loc.To<int>();
Any thread can jump in between Loc and To.
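A common way to sidestep both problems, sketched here under the assumption that the values fit in a dictionary loaded in one go: load eagerly in the constructor, build any replacement completely, then publish it with a single reference assignment (which is atomic). A getter reads the field once and works with that snapshot, so a rebuild in between cannot mix old and new data. `LocalizationStore`, `LoadAll` and the string-to-string map are hypothetical stand-ins for the question's `Repository<Localization>` and `LC.Loc`; the code sticks to APIs available in .NET 3.5.

```csharp
using System;
using System.Collections.Generic;

public sealed class LocalizationStore
{
    public static readonly LocalizationStore InMemory = new LocalizationStore();

    // Replaced wholesale and never mutated after publication; volatile so
    // every thread sees the latest reference.
    private volatile Dictionary<string, string> _values;

    private LocalizationStore()
    {
        _values = LoadAll(); // eager load instead of a lazy getter
    }

    // Hypothetical loader standing in for new Repository<Localization>().Get().
    private static Dictionary<string, string> LoadAll()
    {
        return new Dictionary<string, string> { { "params.chunksize", "500" } };
    }

    // Builds the new map first, then swaps it in with one atomic reference write.
    public void RebuildLocalizations()
    {
        _values = LoadAll();
    }

    public int ChunkSize
    {
        get
        {
            // Read the field once; the lookup and the parse use the same snapshot.
            Dictionary<string, string> snapshot = _values;
            return int.Parse(snapshot["params.chunksize"]);
        }
    }
}
```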
I'm trying to design a class, and I'm having issues accessing some of the nested fields; I also have some concerns about how thread-safe the whole design is. I would like to know if anyone has a better idea of how this should be designed, or what changes should be made.
using System;
using System.Collections;
namespace SystemClass
{
public class Program
{
static void Main(string[] args)
{
System system = new System();
//Seems like an awkward way to access all the members
dynamic deviceInstance = (((DeviceType)((DeviceGroup)system.deviceGroups[0]).deviceTypes[0]).deviceInstances[0]);
Boolean checkLocked = deviceInstance.locked;
//Seems like this method for accessing fields might have problems with multithreading
foreach (DeviceGroup dg in system.deviceGroups)
{
foreach (DeviceType dt in dg.deviceTypes)
{
foreach (dynamic di in dt.deviceInstances)
{
checkLocked = di.locked;
}
}
}
}
}
public class System
{
public ArrayList deviceGroups = new ArrayList();
public System()
{
//API called to get names of all the DeviceGroups
deviceGroups.Add(new DeviceGroup("Motherboard"));
}
}
public class DeviceGroup
{
public ArrayList deviceTypes = new ArrayList();
public DeviceGroup() {}
public DeviceGroup(string deviceGroupName)
{
//API called to get names of all the Devicetypes
deviceTypes.Add(new DeviceType("Keyboard"));
deviceTypes.Add(new DeviceType("Mouse"));
}
}
public class DeviceType
{
public ArrayList deviceInstances = new ArrayList();
public bool deviceConnected;
public DeviceType() {}
public DeviceType(string DeviceType)
{
//API called to get hardwareIDs of all the device instances
deviceInstances.Add(new Mouse("0001"));
deviceInstances.Add(new Keyboard("0003"));
deviceInstances.Add(new Keyboard("0004"));
//Start thread CheckConnection that updates deviceConnected periodically
}
public void CheckConnection()
{
//API call to check connection and returns true
this.deviceConnected = true;
}
}
public class Keyboard
{
public string hardwareAddress;
public bool keypress;
public bool deviceConnected;
public Keyboard() {}
public Keyboard(string hardwareAddress)
{
this.hardwareAddress = hardwareAddress;
//Start thread to update deviceConnected periodically
}
public void CheckKeyPress()
{
//if API returns true
this.keypress = true;
}
}
public class Mouse
{
public string hardwareAddress;
public bool click;
public Mouse() {}
public Mouse(string hardwareAddress)
{
this.hardwareAddress = hardwareAddress;
}
public void CheckClick()
{
//if API returns true
this.click = true;
}
}
}
Making a class thread-safe is a heck of a difficult thing to do.
The first, naive approach that many attempt is just adding a lock and ensuring that no code touches mutable data without holding it. By that I mean that everything in the class that is subject to change has to first lock the locking object before touching the data, whether reading from it or writing to it.
However, if this is your solution, then you should probably not do anything at all to the code, just document that the class is not thread-safe and leave it to the programmer that uses it.
Why?
Because you've effectively just serialized all access to it. Two threads that try to use the class at the same time will block, even if they are touching separate parts of it. One of the threads will be given access; the other will wait until the first one is done.
This is actually discouraging multi-threaded usage of your class, so in this case you're adding overhead of locking to your class, and not actually getting any benefits from it. Yes, your class is now "thread safe", but it isn't actually a good thread-citizen.
The other way is to start adding granular locks, or writing lock-free constructs (seriously hard), so that if two parts of the object aren't always related, each part has its own lock. This allows multiple threads that access different parts of the data to run in parallel without blocking one another.
This becomes hard wherever you need to work on more than one part of the data at a time, as you need to be super-careful to take the locks in the right order, or suffer deadlocks. It should be your class' responsibility to ensure the locks are taken in the right order, not the code that uses the class.
As for your specific example, it looks to me as though the only parts that will change from background threads are the "is the device connected" boolean values. In this case I would make those fields volatile, or use a lock around each access. If, however, the list of devices will change from background threads, you're going to run into problems pretty fast.
You should first try to identify all the parts that will be changed by background threads, and then devise scenarios for how you want the changes to propagate to other threads, how to react to the changes, etc.
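A minimal sketch of the volatile-flag suggestion for one of the question's classes, assuming the connection check comes from a periodic `System.Threading.Timer`. The `checkConnection` delegate (standing in for the question's API call), the `period` parameter and the `HardwareIds` list are assumptions; the list is filled once in the constructor and exposed read-only, so only the flag changes after construction.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class DeviceType : IDisposable
{
    private readonly Func<bool> checkConnection; // hypothetical stand-in for the API call
    private readonly Timer timer;
    private volatile bool deviceConnected;       // written by the timer, read by anyone

    public DeviceType(string name, IEnumerable<string> hardwareIds,
                      Func<bool> checkConnection, TimeSpan period)
    {
        Name = name;
        HardwareIds = new List<string>(hardwareIds).AsReadOnly(); // fixed after construction
        this.checkConnection = checkConnection;
        // Timer callbacks run on thread-pool threads; each one just overwrites the flag.
        timer = new Timer(_ => deviceConnected = this.checkConnection(), null, TimeSpan.Zero, period);
    }

    public string Name { get; }
    public IReadOnlyList<string> HardwareIds { get; }
    public bool DeviceConnected => deviceConnected;

    public void Dispose() => timer.Dispose();
}
```

Using a typed, read-only `List<T>` instead of `ArrayList` also removes the casts the question found awkward.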
Sorry if this has been answered elsewhere... I have found a lot of posts on similar things but not the same.
I want to ensure that only one instance of an object exists at a time BUT I don't want that object to be retained past its natural life-cycle, as it might be with the Singleton pattern.
I am writing some code where processing of a list gets triggered (by external code that I have no control over) every minute. Currently I just create a new 'processing' object each time and it gets destroyed when it goes out of scope, as per normal. However, there might be occasions when the processing takes longer than a minute, and so the next trigger will create a second instance of the processing class in a new thread.
Now, I want to have a mechanism whereby only one instance can be around at a time... say, some sort of factory whereby it'll only allow one object at a time. A second call to the factory will return null, instead of a new object, say.
So far my (crappy) solution is to have a Factory type object as a nested class of the processor class:
class XmlJobListProcessor
{
private static volatile bool instanceExists = false;
public static class SingletonFactory
{
private static object lockObj = new object();
public static XmlJobListProcessor CreateListProcessor()
{
if (!instanceExists)
{
lock (lockObj)
{
if (!instanceExists)
{
instanceExists = true;
return new XmlJobListProcessor();
}
return null;
}
}
return null;
}
}
private XmlJobListProcessor() { }
....
}
I was thinking of writing an explicit destructor for the XmlJobListProcessor class that reset the 'instanceExists' field to false.
I realise this is a seriously terrible design. The factory should be a class in its own right; it's only nested so that both it and the instance destructor can access the volatile boolean...
Anyone have any better ways to do this? Cheers
I know .NET 4 is not as widely used, but eventually it will be and you'll have:
private static readonly Lazy<XmlJobListProcessor> _instance =
new Lazy<XmlJobListProcessor>(() => new XmlJobListProcessor());
Then you have access to it via _instance.Value, which is initialized the first time it's requested.
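Spelled out, with the thread-safety mode made explicit (`ExecutionAndPublication`, which is also what the single-factory `Lazy<T>` constructor uses by default). The `Instance` property and the construction counter are illustrative additions, used only to show that the factory runs once even when many threads ask at the same time. Note that this keeps one instance for the life of the process rather than one per run.

```csharp
using System;
using System.Threading;

public sealed class XmlJobListProcessor
{
    private static int constructed; // illustration only: counts factory runs

    // ExecutionAndPublication: the factory runs at most once, even under contention.
    private static readonly Lazy<XmlJobListProcessor> _instance =
        new Lazy<XmlJobListProcessor>(() => new XmlJobListProcessor(),
                                      LazyThreadSafetyMode.ExecutionAndPublication);

    private XmlJobListProcessor()
    {
        Interlocked.Increment(ref constructed);
    }

    public static XmlJobListProcessor Instance => _instance.Value;

    public static int ConstructedCount => Interlocked.CompareExchange(ref constructed, 0, 0);
}
```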
Your original example uses double-check locking, which should be avoided at all costs.
See msdn Singleton implementation on how to do initialize the Singleton properly.
Just make one and keep it around; don't destroy and recreate it every minute.
"minimize the moving parts"
I would instantiate the class and keep it around. I certainly wouldn't use a destructor (if you mean a finalizer such as ~XmlJobListProcessor()); finalizers add GC overhead. In addition, if a process takes longer than a minute, what do you do with the data that was supposed to be processed if you just return null?
Keep the instance alive, and possibly build a buffer mechanism to continue taking input while the processor class is busy. You can check to see:
if ( isBusy == true )
{
// add data to bottom of buffer
}
else
{
// call processing
}
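The `isBusy`/buffer idea above, made concrete as a sketch: a lock protects a `Queue<T>`, and whichever caller finds the processor idle drains the queue, including items added while it was busy, so at most one item is processed at a time and nothing is dropped. `BufferedProcessor<T>`, `Submit` and the `process` callback are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public sealed class BufferedProcessor<T>
{
    private readonly object sync = new object();
    private readonly Queue<T> buffer = new Queue<T>();
    private readonly Action<T> process;
    private bool isBusy;

    public BufferedProcessor(Action<T> process) { this.process = process; }

    public void Submit(T item)
    {
        lock (sync)
        {
            buffer.Enqueue(item);   // add data to the bottom of the buffer
            if (isBusy) return;     // the caller that is already draining will pick it up
            isBusy = true;
        }

        while (true)
        {
            T next;
            lock (sync)
            {
                if (buffer.Count == 0) { isBusy = false; return; }
                next = buffer.Dequeue();
            }
            try
            {
                process(next);      // runs outside the lock
            }
            catch
            {
                lock (sync) { isBusy = false; } // let a later Submit resume draining
                throw;
            }
        }
    }
}
```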
I take everyone's point about not re-instantiating the processor object and BillW's point about a queue, so here is my bastardized mashup solution:
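public static class PRManager
{
    private static readonly XmlJobListProcessor instance = new XmlJobListProcessor();
    private static readonly object lockobj = new object();

    public static void ProcessList(SPList list)
    {
        bool acquired = false;
        try
        {
            Monitor.TryEnter(lockobj, ref acquired);
            if (acquired)
            {
                instance.ProcessList(list);
            }
        }
        catch (ArgumentNullException)
        {
        }
        finally
        {
            // Only release the lock if this call actually took it; Monitor.Exit
            // on a lock the thread does not own throws SynchronizationLockException.
            if (acquired)
            {
                Monitor.Exit(lockobj);
            }
        }
    }
}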
The processor is retained long-term as a static member (here, long-term object retention is not a problem since it has no state variables etc.). If another thread already holds the lock on lockobj, the request just isn't processed and the calling thread goes on with its business.
Cheers for the feedback guys. Stackoverflow will ensure my internship! ;D
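For comparison with PRManager, a lock-free variant of the same skip-if-busy gate, using `Interlocked.CompareExchange` on an int flag instead of `Monitor.TryEnter`. This is a swapped-in technique rather than the poster's code; `SkipIfBusyGate` and `TryRun` are hypothetical names. Unlike a Monitor, this gate is not re-entrant: a nested call on the same thread is also skipped.

```csharp
using System;
using System.Threading;

public sealed class SkipIfBusyGate
{
    private int running; // 0 = idle, 1 = a call is in progress

    // Runs work and returns true, or returns false at once if another call is running.
    public bool TryRun(Action work)
    {
        if (Interlocked.CompareExchange(ref running, 1, 0) != 0)
            return false;
        try
        {
            work();
            return true;
        }
        finally
        {
            Interlocked.Exchange(ref running, 0); // reopen the gate even if work throws
        }
    }
}
```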