I have a program that we'd like to multi-thread at a certain point. We're using CSLA for our business rules. At one location in our program we iterate over a BusinessList object and run some sanity checks against the data one row at a time. When we up the row count to about 10k rows, the process takes some time to run (about a minute). Naturally this sounds like a perfect place to use a bit of TPL and make it multi-threaded.
I've done a fair amount of multithreaded work through the years, so I understand the pitfalls of switching from single to multithreaded code. I was surprised to find that the code bombed within the CSLA routines themselves. It seems to be related to the code behind the CSLA PropertyInfo classes.
All of our business object properties are defined like this:
public static readonly PropertyInfo<string> MyTextProperty = RegisterProperty<string>(c => c.MyText);
public string MyText
{
    get { return GetProperty(MyTextProperty); }
    set { SetProperty(MyTextProperty, value); }
}
Is there something I need to know about multithreading and CSLA? Are there any caveats that aren't found in any written documentation? (I haven't found anything as of yet.)
--EDIT---
BTW: the way I implemented my multithreading was by throwing all the rows into a ConcurrentBag and then spawning 5 or so tasks that just grab objects from the bag until the bag is empty. So I don't think the problem is in my code.
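For illustration, here is a minimal sketch of that pattern; `MyRow`, `rows` and `CheckRow` are placeholders, not the actual code:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Throw all rows into a ConcurrentBag up front.
var bag = new ConcurrentBag<MyRow>(rows);

// Spawn 5 tasks that each drain the bag until it is empty.
var workers = Enumerable.Range(0, 5)
    .Select(_ => Task.Run(() =>
    {
        while (bag.TryTake(out var row))
        {
            CheckRow(row); // the per-row sanity check
        }
    }))
    .ToArray();

Task.WaitAll(workers);
```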
As you've discovered, the CSLA.NET framework is not thread-safe.
To solve your particular problem, I would make use of the Wintellect Power Threading library; either the AsyncEnumerator/SyncGate combo or the ReaderWriterGate on its own.
The Power Threading library will allow you to queue 'read' and 'write' requests to a shared resource (your CSLA.NET collection). At any one moment, only a single 'write' request will be allowed access to the shared resource, all without thread-blocking the queued 'read' or 'write' requests. It's very clever and super handy for safely accessing shared resources from multiple threads. You can spin up as many threads as you wish and the Power Threading library will synchronise the access to your CSLA.NET collection.
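If you'd rather avoid an external dependency, the BCL's ReaderWriterLockSlim enforces the same single-writer/multiple-reader discipline, albeit with blocking waits rather than queued continuations. A sketch, with `businessList` standing in for the CSLA.NET collection:

```csharp
using System.Threading;

var gate = new ReaderWriterLockSlim();

// Many threads may hold the read lock at the same time.
gate.EnterReadLock();
try
{
    // ... read from businessList ...
}
finally { gate.ExitReadLock(); }

// Only one thread at a time may hold the write lock,
// and it excludes all readers while held.
gate.EnterWriteLock();
try
{
    // ... mutate businessList ...
}
finally { gate.ExitWriteLock(); }
```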
Related
I have a background service (IHostedService) in .NET Core 3.1 that takes requests from 100s of clients (machines in a factory) using sockets (home-rolled). My issue is that multiple calls can come in on different threads to the same method on a class which has access to an object (shared state). This is common in the codebase. The requests also have to be processed in the correct order.
The reason this is not in a database is performance (it's a real-time system). I know I can use a lock, but I don't want to have locks all over the code base.
What is a standard way to handle this situation. Do you use an in-memory database? In-memory cache? Or do I just have to add locks everywhere?
public class MachineState
{
    public bool IsRunning { get; set; }

    public static readonly MachineState Stopped = new MachineState();
}

public class Machine
{
    public MachineState MachineState { get; set; } = MachineState.Stopped;

    // Gets called by multiple threads from multiple clients
    public bool CheckMachineStatus()
    {
        return MachineState.IsRunning;
    }

    // Gets called by multiple threads from multiple clients
    public void SetMachineStatus()
    {
        MachineState = MachineState.Stopped;
    }
}
Update
Here's an example. I have a console app that talks to a machine via sockets, for weighing products. When the console app initializes it will load data into memory (information about the products being weighed). All of this is done on the main thread, to keep data integrity.
When a call comes in from the weigher on Thread 1, it will get switched to the main thread to access the product information and to finish any other work, like raising events for other parts of the system.
Currently this switching from Thread 1, 2, ... N to the main thread is done by a home-rolled solution, written to avoid having locking code all over the code base. It dates from .NET 1.1, and since moving to .NET Core 3.1 I thought there might be a framework, library, tool, technique, etc. that could handle this for us, or just a better way.
This is an existing system that I'm still learning. Hope this makes sense.
Using an in-memory database is an option, as long as you are willing to delegate all concurrency-inducing situations to the database and handle none of them in your own code. For example, if you must update a value in the database depending on some condition, then the condition should be checked by the database, not by your own code.
Adding locks everywhere is also an option, one that will almost certainly lead to unmaintainable code quite quickly. The code will probably be riddled with hidden bugs from the get-go, bugs that you will discover one by one over time, usually under the most unfortunate of circumstances.
You must realize that you are dealing with a difficult problem, with no magic solutions available. Managing shared state in a multithreaded application has always been a source of pain.
My suggestion is to encapsulate all this complexity inside thread-safe classes, that the rest of your application can safely invoke. How you make these classes thread-safe depends on the situation.
Using locks is the most flexible option, but not always the most efficient, because it has the potential to create contention.
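For instance, a minimal thread-safe wrapper class of the kind suggested above (the names are invented for illustration):

```csharp
public class ThreadSafeMachine
{
    private readonly object _gate = new object();
    private bool _isRunning;

    // Safe to call from any thread; the lock guarantees readers
    // never observe a half-written state.
    public bool IsRunning
    {
        get { lock (_gate) { return _isRunning; } }
    }

    public void Stop()
    {
        lock (_gate) { _isRunning = false; }
    }
}
```

The rest of the application calls `IsRunning` and `Stop` without ever knowing a lock is involved.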
Using thread-safe collections, like ConcurrentDictionary for example, is less flexible, because the thread-safety guarantees they offer are limited to the integrity of their internal state. If, for example, you must update one collection based on a condition obtained from another collection, then the whole operation cannot be made atomic just by using thread-safe collections. On the other hand, these collections offer better performance than simple locks.
Using immutable collections, like ImmutableQueue for example, is another interesting option. They are less efficient both memory- and CPU-wise than the concurrent collections (adding/removing is in many cases O(log n) instead of O(1)), and no more flexible than them, but they are very efficient specifically at providing snapshots of actively processed data. For updating an immutable collection atomically, there is the handy ImmutableInterlocked.Update method. It updates a reference to an immutable collection with an updated version of the same collection, without using locks. In case of contention with other threads it may invoke the supplied transformation multiple times, until it wins the race.
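A sketch of ImmutableInterlocked.Update in action (the queue and item type are placeholders):

```csharp
using System.Collections.Immutable;

public static class WorkQueue
{
    private static ImmutableQueue<string> _queue = ImmutableQueue<string>.Empty;

    public static void Enqueue(string item)
    {
        // Atomically swaps in a new queue; on contention with another
        // thread the lambda is simply re-run until the
        // compare-and-swap wins.
        ImmutableInterlocked.Update(ref _queue, q => q.Enqueue(item));
    }
}
```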
In a project of windows services (C# .Net Platform), I need a suggestion.
In the project I have a class named Cache, in which I keep some data that I need frequently. There is a thread that updates the cache every 30 minutes, and there are multiple threads which use the cache data.
In the cache class there are getter and setter functions, used by the user threads and the cache-updater thread respectively. No one uses data objects like tables directly, because they are private members.
From the above context, do you think I should use locking functionality in the cache class?
The effects of not using locks when writing to a shared memory location (like cache) really depend on the application. If the code was used in banking software the results could be catastrophic.
As a rule of thumb: when multiple threads access the same location, even if only one thread writes and all the others read, you should use locks (for the write operation). What can happen is that one thread starts reading data, gets swapped out by the updater thread, and so potentially ends up using a mixture of old and new data. Whether that really has an impact depends on the application and how sensitive it is.
Key Point: If you don't lock on the reads there's a chance your read won't see the changes. A lock will force your read code to get values from main memory rather than pulling data from a cache or register. To avoid actually locking you could use Thread.MemoryBarrier(), which will do the same job without the overhead of actually taking a lock.
Minor Points: Using lock would prevent a read from getting half old data and half new data. If you are reading more than one field, I recommend it. If you are really clever, you could keep all the data in an immutable object and return that object to anyone calling the getter and so avoid the need for a lock. (When new data comes in, you create a new immutable object, then replace the old with the new in one go. Use a lock for the actual write, or, if you're still feeling really clever, make the field referencing the object volatile.)
Also: when your getter is called, remember it's running on many other threads. There's a tendency to think that once something is running the Cache class's code it's all on the same thread, and it's not.
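The immutable-object approach from the "minor points" above can be sketched like this (CacheData and its fields are placeholders for the real cached tables):

```csharp
using System;
using System.Collections.Generic;

public sealed class CacheData
{
    // Set once at construction; the object is never mutated afterwards.
    public readonly DateTime LoadedAt;
    public readonly IReadOnlyDictionary<string, decimal> Rates;

    public CacheData(DateTime loadedAt, IReadOnlyDictionary<string, decimal> rates)
    {
        LoadedAt = loadedAt;
        Rates = rates;
    }
}

public class Cache
{
    // volatile so readers always see the latest published snapshot.
    private volatile CacheData _current;

    // Readers: no lock needed, each snapshot is internally consistent.
    public CacheData Get() => _current;

    // Updater thread (every 30 minutes): build a new snapshot,
    // then publish it with a single reference write.
    public void Refresh(IReadOnlyDictionary<string, decimal> freshRates)
    {
        _current = new CacheData(DateTime.UtcNow, freshRates);
    }
}
```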
I have a thread that is doing some processing. I would like to be able to stop this thread during execution, somehow save its position (and the state of the objects it is operating on), and then continue from that location at a later date (so after my computer has restarted).
In C# this isn't possible, right? And if not, what is the proper design to achieve this functionality?
So my original wish was to have something like
class Foo : Task {
    override void Execute() {
        // example task
        while (someCondition) {
            ...do stuff...
        }
    }
}
and be able to pause/save at any point within that function. When the function ends, everyone knows it is complete.
As an alternative, perhaps this is the better way to do it
class Foo : Task {
    override void Execute(State previousState) {
        // set someCondition, other stuff
        // IsPaused = false;
        previousState.setUpStuff();

        // example task
        while (someCondition) {
            ...do stuff...
            if (base.IsPauseRequested) {
                base.UpdateState(); // this would be a bit different, but just to get the idea
                base.IsPaused = true;
                return;
            }
        }
        base.RaiseNotifyTaskComplete();
    }
}
So the first case is a lot simpler for other people who need to inherit my base class as they only have to implement the Execute function. However, in the second case, they have to consider the previous state and also manage where good pause points exist. Is there a better way to do this?
What you want could be accomplished by a serializable state machine. Basically, you change your local variables into fields in a class and add a field that keeps the state – the position in the code of the original method. This class will be [Serializable] and it will have one method like MoveNext(), which does a piece of work and returns. When working, you call this method in a loop. When you want to stop, you wait until the current call finishes, break out of the loop and then serialize the state machine to the disk.
Based on the complexity of the original method and how often you do want to “checkpoint” (when the MoveNext() method returns and you can choose to continue or not), the state machine could be as simple as having just one state, or quite complicated.
The C# compiler does a very similar transformation when it compiles iterator blocks (and C# 5's async methods). But it's not meant for this purpose and it doesn't mark the generated class [Serializable], so I don't think you could use it directly. Although reading some articles about how this transformation is actually done might help you do the same yourself.
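A minimal sketch of such a serializable state machine; the work it does and the split into states are invented purely for illustration:

```csharp
using System;

[Serializable]
public class WorkStateMachine
{
    // Former local variables, promoted to serializable fields.
    private int _i;
    private long _sum;

    // The "position in the code" of the original method.
    private int _state;

    public bool IsDone => _state == 2;

    // Does one piece of work per call; safe to stop between calls.
    public void MoveNext()
    {
        switch (_state)
        {
            case 0:                      // initialization
                _i = 0;
                _sum = 0;
                _state = 1;
                break;
            case 1:                      // one loop iteration per call
                _sum += _i;
                if (++_i >= 1000) _state = 2;
                break;
        }
    }
}
```

You call MoveNext() in a loop while checking IsDone; to pause, you break out of the loop and serialize the instance to disk, then deserialize and resume the loop later.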
This can be fairly easily achieved using Windows Workflow Foundation (WF)... it has all the plumbing to explicitly pause and resume tasks (and it takes care of the persistence for you). Check out this link.
It probably won't be suitable for what you want, but it may be worth investigating.
I cannot answer for C#, but in general this question is called Persistence and there is no general easy way to solve it (unless the language and operating system provides it). You cannot reason in terms of one thread, because the thread you are considering is referencing some other (global or heap) data. That question is also related to garbage collection (because both garbage collectors and persistence mechanism tend to scan the entire live heap) and databases (notably NoSQL ones). So read the GC handbook and textbooks on operating systems, see also websites such as OSDEV.
See also this draft report, read Queinnec's Lisp in Small Pieces book, and study the source code of existing persistent software, including RefPerSys (and of course Mono, an open source implementation for C#)
Things become even more challenging when you want to persist a network (or graph) of processes in the context of cloud computing. Then search the web for agent oriented programming.
At the conceptual level your question is related to continuation-passing style and to callbacks, a topic on which you'll find many ACM-sponsored conferences (e.g. PLDI, ISMM)
You could probably set something like this up with an expression tree or method chain. Set up lambdas or small methods for the smallest "atomic" units of work that cannot be interrupted. Then, chain them together with a "supervisor" that will execute each of these chunks in order, but can be told to stop what it is doing in between instructions, save its position along the chain, return and wait to be resumed. If you wanted the name of a pattern, you might call it a variation on Visitor.
You want to serialize the state of your object when the process is paused, and then deserialize it once it restarts.
A very naive implementation would be to create two static methods on your Task class, Serialize/Deserialize, which use LINQ to XML to read/write the state of your object. When a task is paused, call Serialize, which dumps the object as XML to disk; when it's restarted, call Deserialize, which reads the XML back. There is also the XmlSerializer class, which may be more robust.
Regardless, this is a complex problem with a conceptually simple solution.
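A sketch of the XmlSerializer route; the TaskState fields are invented for illustration:

```csharp
using System.IO;
using System.Xml.Serialization;

public class TaskState
{
    public int CurrentIndex { get; set; }
    public bool SomeCondition { get; set; }
}

public static class TaskPersistence
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(TaskState));

    // Dump the state to disk when the task is paused.
    public static void Serialize(TaskState state, string path)
    {
        using (var stream = File.Create(path))
            Serializer.Serialize(stream, state);
    }

    // Read it back when the task is restarted.
    public static TaskState Deserialize(string path)
    {
        using (var stream = File.OpenRead(path))
            return (TaskState)Serializer.Deserialize(stream);
    }
}
```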
I'm presently working on a side-by-side application (C#, WinForms) that injects messages into an application via COM.
This application uses multiple foreach statements, polling entity metrics from the application that accepts COM. A ListBox is used to list each entity, and when a user selects one from this list, a thread is created and executed, calling a method that retrieves the required data.
When a user selects a different entity from the list, the running thread is aborted and a new thread is created for the newly selected entity.
I've spent a day looking into my threading and memory usage, and have come to the conclusion that everything is fine. There are never more than 6 threads running concurrently (all unique, for executing different members), and according to the Windows Task Manager my application never peaks above 10% CPU or 29 MB of memory.
The only thing coming to mind is that the COM object you are using is designed to run in a single threaded apartment (STA). If that is the case then it will not matter how many threads you start; they will all eventually get serialized when calling into this COM object. And if your machine has multiple cores then you will definitely see less than 100% usage. 10% seems awfully low though. I would not be surprised to see something around 25% which would basically represent one pegged core of a quad core system, but the 10% figure might require another explanation. If your code or the COM object itself is waiting for IO operations to complete that might explain more of the low throughput.
In WinForms you can do SuspendLayout() and ResumeLayout(). If you are inserting a lot of items (or in general doing a lot of screen updates) you would first call SuspendLayout(), then do all of your updates, and then call ResumeLayout().
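A sketch of that batching pattern; `listBox1` and `items` are placeholders, and the BeginUpdate/EndUpdate pair is an extra ListBox-specific addition that suppresses repainting while items are added:

```csharp
// Stop layout work while we batch-update the control,
// then resume and let a single layout pass run at the end.
listBox1.SuspendLayout();
try
{
    listBox1.BeginUpdate();   // suppress repainting during the adds
    foreach (var item in items)
        listBox1.Items.Add(item);
    listBox1.EndUpdate();
}
finally
{
    listBox1.ResumeLayout();
}
```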
You don't mention what's slow, so it's very difficult to say anything with certainty. However, since you say that you insert items into a listbox, I'll make a complete guess and ask how many items is that each time? It can be very slow to insert a lot of items into a list box.
If that's the case, you could speed it up by instead of listing each entity in one listbox, only list a set of categories there and then when the user selects a category you'll populate another listbox with the entities related to that category.
I've seen a lot of discussion about this subject on here.
If I have a static class w/ static methods that connects to a database or a server, is it a bad idea to use this in a multi-user environment (like a web page)? Would this make a new user's thread wait for previous users' threads to finish their calls before accepting a new one?
What would be the implications of this with multi-threading, also?
Thx!
If each static method is fully responsible for acquiring its resources and then disposing its resources within the scope of the method call (no shared state), then you shouldn't have any problem with threading that you wouldn't have using instance classes. I would suggest, however, that the bigger problem is that a reliance on public static methods (in static or non-static classes) creates many other design problems down the road.
First of all, you're binding very tightly to an implementation, which is always bad.
Second, testing all of the classes that depend on your static methods becomes very difficult to do, because you're locked to a single implementation.
Third, it does become very easy to create non-thread-safe methods, since static methods can only have static state (which is shared across all method calls).
Static methods do not have any special behaviour in respect to multithreading. That is, you can expect several "copies" of the method running at the same time. The same goes for static variables - different threads can access them all at once, there is no waiting there. And unless you're careful, this can create chaos.
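A classic illustration of that chaos, with a hypothetical counter:

```csharp
using System.Threading;

public static class Counter
{
    private static int _count;

    // NOT thread-safe: _count++ is a read-modify-write sequence,
    // so two threads can read the same value and lose an increment.
    public static void Increment() => _count++;

    // Thread-safe alternative: a single atomic hardware increment.
    public static void IncrementSafe() => Interlocked.Increment(ref _count);

    public static int Value => _count;
}
```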
Yes it's a bad idea.
When you use one connection for all your users, if someone performs an action that requires, let's say, 15 seconds just for database access, all other users will have to wait in order to connect to the database.
I'm a little weirded out by this question, as to why you have so much static going on.
But I think you're asking about threading issues, so I would say go check out some of the docs on threading:
http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx
Static only defines the scope in which the method is defined, and how it is bound/called. It has nothing to do with multithreading.
You need to be careful with static fields. They are shared by all threads. Threads do not wait for each other, so you need locks to make access safe.
But if your application is a bit more complex than Hello World, you should consider making your methods non-static and using object-oriented patterns instead.
If you do it right, it won't be a problem. If you do it wrong, it has the potential to force sequential access to the resource.
Sometimes the difference between right and wrong can be very subtle and hard to spot, but the main thing is that no method should rely on or lock any "state" (members) of the class.
If you use one static connection to access the database, you will have to synchronize method calls. Multiple threads asking the database for data over a single connection will ... ehhmmm ... mess things up. So you are serializing all threads' data access and this will have a large impact on the performance.
If every call opens its own connection, you do not need to serialize all threads because there is no shared connection. Creating a connection per request is still an expensive design.
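The connection-per-call shape looks like this (the connection string and query are placeholders; in practice ADO.NET's built-in connection pooling makes the repeated opens much cheaper than they appear):

```csharp
using System.Data.SqlClient;

public static class UserRepository
{
    // Each call gets its own connection, so nothing is shared
    // across threads and no synchronization is needed here.
    public static int GetUserCount()
    {
        using (var conn = new SqlConnection("<your connection string>"))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Users", conn))
        {
            conn.Open();
            return (int)cmd.ExecuteScalar();
        }
    }
}
```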
If you use a static connection pool you will reduce this performance impact because you only need to serialize the access to the connection pool.
Further, statics are in general not a good design decision; they make unit testing very complicated. You should consider using the Singleton or Monostate pattern instead.
I use static methods for lookup objects. I can manage all lookup objects in one place (using caching) for the ASP.NET application, and all callers access them through the static methods.
This way, I do not need to instantiate lookup objects every time I need them, and it reduces the need to call the DB, which enhances performance.
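A thread-safe sketch of that static lookup cache using Lazy&lt;T&gt;; `LoadFromDb` and the country data are placeholders:

```csharp
using System;
using System.Collections.Generic;

public static class LookupCache
{
    // Lazy<T> guarantees LoadFromDb runs exactly once, even if many
    // request threads hit the property concurrently on first use.
    private static readonly Lazy<IReadOnlyDictionary<int, string>> _countries =
        new Lazy<IReadOnlyDictionary<int, string>>(LoadFromDb);

    public static IReadOnlyDictionary<int, string> Countries => _countries.Value;

    private static IReadOnlyDictionary<int, string> LoadFromDb()
    {
        // Placeholder for the real database call.
        return new Dictionary<int, string> { { 1, "US" }, { 2, "DE" } };
    }
}
```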