I am not familiar with multithreading. Imagine I have a method that does an intensive search on a string and returns two lists of integers as out parameters.
public static void CalcModel(string s, out List<int> startPos, out List<int> len)
{
// Do some intensive search
}
Searching a long string is very time-consuming, so I want to split the string into several fragments, search them on multiple threads, and recombine the results (adjusting startPos accordingly).
How do I integrate multithreading into this kind of process? Thanks.
I forgot to mention the following two things:
I want to set a string length cutoff and let the code decide how many fragments it needs.
I had a hard time associating the startPos of each fragment (in the original string) with its thread. How can I do that?
Rather than get too bogged down in details, generally, you send each thread a "return object." Once you've started all the threads, you block on them and wait until they are all finished.
While each thread is running, the thread modifies its work object and terminates when it has produced the output.
So roughly this (I can't tell exactly how you want to split it up, so perhaps you can modify this):
public class WorkItem {
public string InputString;
public List<int> startPos;
public List<int> len;
}
public static void CalcLotsOfStrings(string s, out List<int> startPos, out List<int> len)
{
WorkItem wi1 = new WorkItem();
wi1.InputString = s;
Thread t1 = new Thread(InternalCalcThread1);
t1.Start(wi1);
WorkItem wi2 = new WorkItem();
wi2.InputString = s;
Thread t2 = new Thread(InternalCalcThread2);
t2.Start(wi2);
// Block until both threads have finished; wi1 and wi2 must not be read before this point
t1.Join();
t2.Join();
// Hand the results back through the out parameters
// (only wi1 is populated in this sketch, see InternalCalcThread1)
startPos = wi1.startPos;
len = wi1.len;
}
public static void InternalCalcThread1(object item)
{
WorkItem w = item as WorkItem;
w.startPos = new List<int>();
w.len = new List<int>();
// Do work here - populate the work item data
}
public static void InternalCalcThread2(object item)
{
// Do work here
}
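To tie this back to the cutoff and startPos questions, here is a minimal sketch of one way to do it, not a definitive implementation. FragmentSearch, FragmentWork and CalcModelThreaded are hypothetical names, and CalcModel is only a stand-in for the real search (it reports every 'x' as a match of length 1). Each work item carries the offset of its fragment in the original string, so the positions can be shifted back after Join. The sketch also assumes no match straddles a fragment boundary; a real search would probably need overlapping fragments.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class FragmentSearch
{
    // Hypothetical work item: one fragment plus its offset in the original string.
    private sealed class FragmentWork
    {
        public string Fragment;
        public int Offset;
        public List<int> StartPos;
        public List<int> Len;
    }

    // Stand-in for the expensive search (assumption): every 'x' is a match of length 1.
    public static void CalcModel(string s, out List<int> startPos, out List<int> len)
    {
        startPos = new List<int>();
        len = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == 'x') { startPos.Add(i); len.Add(1); }
        }
    }

    public static void CalcModelThreaded(string s, int cutoff, out List<int> startPos, out List<int> len)
    {
        if (cutoff <= 0) throw new ArgumentOutOfRangeException(nameof(cutoff));

        var items = new List<FragmentWork>();
        var threads = new List<Thread>();

        // The cutoff decides how many fragments (and threads) are needed.
        for (int offset = 0; offset < s.Length; offset += cutoff)
        {
            var item = new FragmentWork
            {
                Fragment = s.Substring(offset, Math.Min(cutoff, s.Length - offset)),
                Offset = offset
            };
            items.Add(item);

            var thread = new Thread(state =>
            {
                var w = (FragmentWork)state;
                CalcModel(w.Fragment, out w.StartPos, out w.Len);
            });
            thread.Start(item);
            threads.Add(thread);
        }

        // Block until every thread has finished before touching the results.
        foreach (var t in threads) t.Join();

        // Merge in fragment order, shifting positions back to the original string.
        startPos = new List<int>();
        len = new List<int>();
        foreach (var w in items)
        {
            foreach (int p in w.StartPos) startPos.Add(p + w.Offset);
            len.AddRange(w.Len);
        }
    }
}
```

One thread per fragment is only reasonable for a handful of fragments; with a small cutoff on a very long string, a bounded approach such as Parallel.For is likely a better fit.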
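You can try Parallel.Invoke, but I am not sure how these methods will perform. A lambda cannot capture out parameters, so each call writes to its own local variables:
// firstHalf and secondHalf are the two fragments of s
List<int> startPos1 = null, len1 = null;
List<int> startPos2 = null, len2 = null;
Parallel.Invoke(
() => CalcModel(firstHalf, out startPos1, out len1),
() => CalcModel(secondHalf, out startPos2, out len2)
);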
Creating and running multiple threads is easy. All you need is a method that acts as the starting point for a thread.
Suppose you have the CalcModel method as defined in your original post; then you only have to do:
// instantiate the thread with a method as its starting point;
// CalcModel takes parameters, so the call is wrapped in a lambda
List<int> startPos = null, len = null;
Thread t = new Thread(() => CalcModel(s, out startPos, out len));
// run the thread
t.Start();
However, if you want the thread to return some values, you may need a little trick, because a thread's start method cannot hand values back through a return statement.
You can 'wrap' the thread in its own class and let it store its data in the class's fields:
public class ThreadClass {
public string FieldA;
public string FieldB;
//...
// Run is an instance method so that _run can write to this instance's fields
public void Run () {
Thread t = new Thread(new ThreadStart(_run));
t.Start();
}
private void _run() {
//...
FieldA = "someData";
FieldB = "otherData";
//...
}
}
That's only a very rough example to illustrate the idea. It doesn't include any thread synchronization or thread control.
I would say the more difficult task is working out how to split your CalcModel method so that it can be parallelized and, maybe more importantly, how the partial results can be joined together to form a single final result.
I have a queue of jobs which can be populated by multiple threads (ConcurrentQueue<MyJob>). I need to execute these jobs continuously and asynchronously (not on the main thread), but on only one thread at a time. I've tried something like this:
public class ConcurrentLoop {
private static ConcurrentQueue<MyJob> _concurrentQueue = new ConcurrentQueue<MyJob>();
private static Task _currentTask;
private static object _lock = new object();
public static void QueueJob(MyJob job)
{
_concurrentQueue.Enqueue(job);
checkLoop();
}
private static void checkLoop()
{
if ( _currentTask == null || _currentTask.IsCompleted )
{
lock (_lock)
{
if ( _currentTask == null || _currentTask.IsCompleted )
{
_currentTask = Task.Run(() =>
{
MyJob current;
while( _concurrentQueue.TryDequeue( out current ) )
{
//Do something
}
});
}
}
}
}
}
In my opinion this code has a problem: if the task is about to finish (TryDequeue has returned false but the task has not been marked as completed yet) and I get a new job at that moment, the job will not be executed. Am I right? If so, how do I fix this?
Your problem statement looks like a producer-consumer problem, with a caveat that you only want a single consumer.
There is no need to reimplement such functionality manually.
Instead, I suggest using BlockingCollection -- by default it wraps a ConcurrentQueue, and you consume it from a single dedicated thread.
Note, that this may or may not be suitable for your use case.
Something like:
_blockingCollection = new BlockingCollection<your type>(); // you may want to create bounded or unbounded collection
_consumingThread = new Thread(() =>
{
foreach (var workItem in _blockingCollection.GetConsumingEnumerable()) // blocks when there is no more work to do, continues whenever a new item is added.
{
// do work with workItem
}
});
_consumingThread.Start();
Multiple producers (tasks or threads) can add work items to the _blockingCollection without any problem, and there is no need to worry about synchronizing producers and consumer.
When you are done producing, call _blockingCollection.CompleteAdding() (any Add after this call throws InvalidOperationException, so it is advised to stop all producers beforehand).
You should probably also call _consumingThread.Join() somewhere afterwards, to wait for the consuming thread to process the remaining items and exit.
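Putting those pieces together, a minimal sketch might look like the following. SingleConsumerQueue and its handler parameter are hypothetical names chosen for illustration; the sketch assumes the handler does not throw and that Dispose is called only after all producers have stopped.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class SingleConsumerQueue<T> : IDisposable
{
    private readonly BlockingCollection<T> _items = new BlockingCollection<T>();
    private readonly Thread _consumer;

    public SingleConsumerQueue(Action<T> handler)
    {
        _consumer = new Thread(() =>
        {
            // Blocks while the collection is empty; the loop ends once
            // CompleteAdding has been called and the collection is drained.
            foreach (T item in _items.GetConsumingEnumerable())
            {
                handler(item);
            }
        });
        _consumer.IsBackground = true;
        _consumer.Start();
    }

    // Safe to call from any number of producer threads.
    public void Enqueue(T item) => _items.Add(item);

    public void Dispose()
    {
        _items.CompleteAdding(); // no more items will be accepted
        _consumer.Join();        // wait until the remaining items are processed
        _items.Dispose();
    }
}
```

Because every item goes through the one consuming thread, the handler never runs concurrently with itself, which avoids the race in the original checkLoop.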
I would use Microsoft's Reactive Framework Team's Reactive Extensions (NuGet "System.Reactive") for this. It's a lovely abstraction.
public class ConcurrentLoop
{
private static Subject<MyJob> _jobs = new Subject<MyJob>();
private static IDisposable _subscription =
_jobs
.Synchronize()
.ObserveOn(Scheduler.Default)
.Subscribe(job =>
{
//Do something
});
public static void QueueJob(MyJob job)
{
_jobs.OnNext(job);
}
}
This nicely synchronizes all incoming jobs into a single stream and pushes the execution on to Scheduler.Default (which is basically the thread-pool), but because it has serialized all input only one can happen at a time. The nice thing about this is that it releases the thread if there is a significant gap between the values. It's a very lean solution.
To clean up you just need to call either _jobs.OnCompleted(); or _subscription.Dispose();.
Given some code like so
public class CustomCollectionClass : Collection<CustomData> {}
public class CustomData
{
string name;
bool finished;
string result;
}
public async Task DoWorkInParallel(CustomCollectionClass collection)
{
// collection can be retrieved from a DB, may not exist.
if (collection == null)
{
collection = new CustomCollectionClass();
foreach (var data in myData)
{
collection.Add(new CustomData()
{
name = data.Name;
});
}
}
// This part doesn't feel safe. Not sure what to do here.
var processTasks = myData.Select(o =>
this.DoWorkOnItemInCollection(collection.Single(d => d.name = o.Name))).ToArray();
await Task.WhenAll(processTasks);
await SaveModifedCollection(collection);
}
public async Task DoWorkOnItemInCollection(CustomData data)
{
await DoABunchOfWorkElsewhere();
// This doesn't feel safe either. Lock here?
data.finished = true;
data.result = "Parallel";
}
As I noted in a couple of inline comments, the above doesn't feel safe to me, but I'm not sure. I have a collection of elements, and I'd like to assign a unique element to each parallel task and let that task modify only that element based on the work it does. The end result is that I want to save the collection after the individual, distinct elements have been modified in parallel. If this isn't a safe way to do it, how best should I go about it?
Your code is the right way to do this, assuming starting DoABunchOfWorkElsewhere() multiple times is itself safe.
You don't need to worry about your LINQ query, because it doesn't actually run in parallel. All it does is to invoke DoWorkOnItemInCollection() multiple times. Those invocations may work in parallel (or not, depending on your synchronization context and the implementation of DoABunchOfWorkElsewhere()), but the code you showed is safe.
Your code above should work without issue. You are passing one item to each worker. I'm not so sure about the async modifier, though. You might just return a Task, and then in your method do:
public Task DoWorkOnItemInCollection(CustomData data)
{
return Task.Run(() => {
DoABunchOfWorkElsewhere().Wait();
data.finished = true;
data.result = "Parallel";
});
}
Be careful with a large number of items, though: each Task.Run here blocks a thread-pool thread in Wait(), so you can exhaust the pool. The pool does not discard the extra work items; they queue up until a thread becomes free, and the resulting stalls can be difficult to debug later.
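If the number of items is a concern, one option is to cap how many operations are in flight without blocking any threads. The following is a minimal sketch under that assumption; Throttled and ForEachAsync are hypothetical names, and SemaphoreSlim is swapped in here as the throttling mechanism rather than anything the original code used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs work(item) for every item, with at most maxConcurrency operations in flight.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();      // waits asynchronously for a free slot
                try { await work(item); }
                finally { gate.Release(); }  // always give the slot back
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

With the question's types this could be called as await Throttled.ForEachAsync(collection, 4, DoWorkOnItemInCollection); where 4 is an arbitrary limit. Each task still touches only its own element, so no lock is needed around the field writes.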
I have done this before. Instead of handing the whole collection to some magic LINQ, it might be easier to treat it as a classic consumer problem:
class ParallelWorker<T>
{
private Action<T> Action;
private Queue<T> Queue = new Queue<T>();
private object QueueLock = new object();
private void DoWork()
{
while(true)
{
T item;
lock(this.QueueLock)
{
if(this.Queue.Count == 0) return; //exit thread
item = this.Queue.Dequeue();
}
try { this.Action(item); }
catch { /*...*/ }
}
}
public void DoParallelWork(IEnumerable<T> items, int maxDegreesOfParallelism, Action<T> action)
{
this.Action = action;
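// Queue<T> has no AddRange, so rebuild the queue from the items
this.Queue = new Queue<T>(items);
List<Thread> threads = new List<Thread>();
// start at most maxDegreesOfParallelism worker threads
for(int i = 0; i < maxDegreesOfParallelism; i++)
{
// DoWork takes no argument, so use ThreadStart rather than ParameterizedThreadStart
ThreadStart threadStart = new ThreadStart(DoWork);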
Thread thread = new Thread(threadStart);
thread.Start();
threads.Add(thread);
}
foreach(Thread thread in threads)
{
thread.Join();
}
}
}
This was done IDE free, so there may be typos.
I'm going to suggest that you use Microsoft's Reactive Framework (NuGet "System.Reactive", formerly "Rx-Main") to do this task.
Here's the code:
public void DoWorkInParallel(CustomCollectionClass collection)
{
var query =
from x in collection.ToObservable()
from r in Observable.FromAsync(() => DoWorkOnItemInCollection(x))
select x;
query.Subscribe(x => { }, ex => { }, async () =>
{
await SaveModifedCollection(collection);
});
}
Done. That's it. Nothing more.
I have to say though, that when I tried to get your code to run it was full of bugs and issues. I suspect that the code you posted isn't your production code, but an example you wrote specifically for this question. I suggest that you try to make a running compilable example before posting.
Nevertheless, my suggestion should work for you with a little tweaking.
It is multi-threaded and thread-safe, and it cleanly saves the modified collection when done.
I have multiple instances of a class with a function that runs a process lasting more than an hour. I need to allow at most 2 processes to run at a time across all instances; if 2 are already running, a new one has to wait until the number of running processes drops below 2. So I came up with this:
public class SomeClass
{
private static int _ProcessesRunningCount=0;
public int ProcessesRunningCount
{
get {return Interlocked.CompareExchange(ref _ProcessesRunningCount, 0, 0); }
}
public void StartProcessing()
{
if (ProcessesRunningCount < 2)
{
Interlocked.Increment(ref _ProcessesRunningCount);
Task.Factory.StartNew(() => Process());
}
else
{
//wait and start after _ProcessesRunningCount gets to less than 2
}
}
private void Process()
{
//Do the processing
System.Threading.Thread.Sleep(100000);
Interlocked.Decrement(ref _ProcessesRunningCount);
}
}
However I am not sure how to achieve the wait part, and not sure if that is a good way to do it, but I don't want to create a manager class that handles everything
example
var A = new SomeClass();
var B = new SomeClass();
var C = new SomeClass();
var D = new SomeClass();
A.StartProcessing(); //process will start
B.StartProcessing(); //process will start
C.StartProcessing(); //process will wait until _ProcessesRunningCount goes under 2
D.StartProcessing(); //process will wait until _ProcessesRunningCount goes under 2
You can use a semaphore to limit the number of processes you spin up. There's an example on MSDN that should fit right into your current design. A semaphore is similar to a mutex (lock), but it allows more than 1 thread to access the critical section. The Thread in the example will start a Process and should block until the process exits.
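As a rough illustration of the semaphore idea (this is not the MSDN example), here is a minimal sketch using a static SemaphoreSlim shared by all instances. The constructor taking an Action is a hypothetical stand-in for the hour-long process, and making StartProcessing return a Task is an assumption about how callers want to wait.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class SomeClass
{
    // Shared by all instances: at most two holders at any time.
    private static readonly SemaphoreSlim _slots = new SemaphoreSlim(2, 2);

    private readonly Action _work;

    // The work delegate is a hypothetical stand-in for the long-running process.
    public SomeClass(Action work) { _work = work; }

    public async Task StartProcessing()
    {
        // Waits (without blocking a thread) until fewer than two processes are running.
        await _slots.WaitAsync();
        try
        {
            await Task.Factory.StartNew(_work, TaskCreationOptions.LongRunning);
        }
        finally
        {
            // Free the slot even if the work throws.
            _slots.Release();
        }
    }
}
```

With this shape, A.StartProcessing() and B.StartProcessing() start immediately, while the calls on C and D return tasks that begin their work as soon as a slot is released. No manager class is needed because the semaphore itself is the shared state.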
I have a program where I'm receiving events and want to process them in batches, so that all items that come in while I'm processing the current batch will appear in the next batch.
The simple TimeSpan and count based Buffer methods in Rx will give me multiple batches of items instead of giving me one big batch of everything that has come in (in cases when the subscriber takes longer than the specified TimeSpan or more than N items come in and N is greater than count).
I looked at using the more complex Buffer overloads that take Func<IObservable<TBufferClosing>> or IObservable<TBufferOpening> and Func<TBufferOpening, IObservable<TBufferClosing>>, but I can't find examples of how to use these, much less figure out how to apply them to what I'm trying to do.
Does this do what you want?
var xs = new Subject<int>();
var ys = new Subject<Unit>();
var zss =
xs.Buffer(ys);
zss
.ObserveOn(Scheduler.Default)
.Subscribe(zs =>
{
Thread.Sleep(1000);
Console.WriteLine(String.Join("-", zs));
ys.OnNext(Unit.Default);
});
ys.OnNext(Unit.Default);
xs.OnNext(1);
Thread.Sleep(200);
xs.OnNext(2);
Thread.Sleep(600);
xs.OnNext(3);
Thread.Sleep(400);
xs.OnNext(4);
Thread.Sleep(300);
xs.OnNext(5);
Thread.Sleep(900);
xs.OnNext(6);
Thread.Sleep(100);
xs.OnNext(7);
Thread.Sleep(1000);
My Result:
1-2-3
4-5
6-7
What you need is something to buffer the values; when the worker is ready, it asks for the current buffer and then resets it. This can be done with a combination of Rx and Task:
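class TicTac<Stuff> {
private readonly object gate = new object();
// RunContinuationsAsynchronously keeps the awaiting worker from resuming inside our lock
private TaskCompletionSource<bool> signal = NewSignal();
private List<Stuff> buffer = new List<Stuff>();
private static TaskCompletionSource<bool> NewSignal(){
return new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
}
public void Push(Stuff stuff){
lock(gate){
buffer.Add(stuff);
signal.TrySetResult(true); // wake the worker if it is waiting
}
}
public async Task<List<Stuff>> Items(){
Task ready;
lock(gate){ ready = signal.Task; }
await ready; // completes once at least one item has been pushed
lock(gate){
// hand over everything buffered so far and start a fresh buffer
List<Stuff> list = buffer;
buffer = new List<Stuff>();
signal = NewSignal();
return list;
}
}
}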
then
var tictac = new TicTac<double>();
IObservable<double> source = ....
source.Subscribe(x=>tictac.Push(x));
Then in your worker
while(true){
var items = await tictac.Items();
Thread.Sleep(100);
foreach (var item in items){
Console.WriteLine(item);
}
}
The way I have done this before is to pull up the ObserveOn method in DotPeek/Reflector, take the queuing concept it uses, and adapt it to our requirements. For example, in UI applications with fast-ticking data (like finance) the UI thread can get flooded with events and sometimes it can't update quickly enough. In these cases we want to drop all events except the last one (for a particular instrument), so we changed the internal Queue of ObserveOn to a single value of T (look for ObserveLatestOn(IScheduler)). In your case you want the Queue, but you want to push the whole queue, not just the first value. This should get you started.
Kind of an expansion of @Enigmativity's answer. I have used this to solve the problem:
public static IObservable<(Action ready, IReadOnlyList<T> values)> BufferUntilReady<T>(this IObservable<T> stream)
{
var gate = new BehaviorSubject<Guid>(Guid.NewGuid());
void Ready() => gate.OnNext(Guid.NewGuid());
return stream.Publish(shared => shared
.Buffer(gate.CombineLatest(shared, ValueTuple.Create)
.DistinctUntilChanged(new AnyEqualityComparer<Guid, T>()))
.Where(x => x.Any())
.Select(x => ((Action) Ready, (IReadOnlyList<T>) x)));
}
public class AnyEqualityComparer<T1, T2> : IEqualityComparer<(T1 a, T2 b)>
{
public bool Equals((T1 a, T2 b) x, (T1 a, T2 b) y) => Equals(x.a, y.a) || Equals(x.b, y.b);
public int GetHashCode((T1 a, T2 b) obj) => throw new NotSupportedException();
}
The subscriber receives a Ready() function to be called when ready to receive next buffer. I don't observe each buffer on the same thread to avoid cycles, but I guess you could break it some other place, if you need each buffer to be handled on the same thread.
I have a method named InitializeCRMService() which returns an IOrganizationService object. Now I am defining a different method named GetConnection(string thread) which calls InitializeCRMService() based on the parameter passed to it. If the string passed to GetConnection is single, it will start a single-threaded instance of the InitializeCRMService() method, but if the string passed is multiple, I need to use a thread pool and pass the method to QueueUserWorkItem. The method InitializeCRMService has no input parameters; it just returns a service object. Please find below the code block in the GetConnection method:
public void GetConnection(string thread)
{
ParallelOptions ops = new ParallelOptions();
if(thread.Equals("one"))
{
Parallel.For(0, 1, i =>
{
dynamic serviceObject = InitializeCRMService();
});
}
else if (thread.Equals("multi"))
{
// HERE I NEED TO IMPLEMENT MULTITHREADING USING THREAD POOL
// AND NOT PARALLEL FOR LOOP......
// ThreadPool.QueueUserWorkItem(new WaitCallback(InitializeCRMService));
}
}
Please note my method InitializeCRMService() has a return type of Service Object.
Please tell me how do I implement it.
Since you want to execute InitializeCRMService in the ThreadPool when a slot is available, and you are executing this only once, the solution depends on what you want to do with the return value of InitializeCRMService.
If you only want to ignore it, I have two options so far.
Option 1
public void GetConnection(string thread)
{
//I found that ops is not being used
//ParallelOptions ops = new ParallelOptions();
if(thread.Equals("one"))
{
Parallel.For(0, 1, i =>
{
//You don't really need to have a variable
/*dynamic serviceObject =*/ InitializeCRMService();
});
}
else if (thread.Equals("multi"))
{
ThreadPool.QueueUserWorkItem
(
new WaitCallback
(
(_) =>
{
//You don't really need to have a variable
/*dynamic serviceObject =*/ InitializeCRMService();
}
)
);
}
}
On the other hand, if you need to pass it somewhere to store it and reuse it later, you can do it like this:
public void GetConnection(string thread)
{
//I found that ops is not being used
//ParallelOptions ops = new ParallelOptions();
if(thread.Equals("one"))
{
Parallel.For(0, 1, i =>
{
//It seems to me a good idea to take the same path here too
//dynamic serviceObject = InitializeCRMService();
Store(InitializeCRMService());
});
}
else if (thread.Equals("multi"))
{
ThreadPool.QueueUserWorkItem
(
new WaitCallback
(
(_) =>
{
Store(InitializeCRMService());
}
)
);
}
}
Where Store would be something like this:
private void Store(dynamic serviceObject)
{
//store serviceObject somewhere you can use it later.
//Depending on your situation you may want to
// set a flag or use a ManualResetEvent to notify
// that serviceObject is ready to be used.
//Any preprocessing can be done here too.
//Take care of thread affinity,
// since this may come from the ThreadPool
// and the consuming thread may be another one,
// you may need some synchronization.
}
Now, if you need to allow clients of your class to access serviceObject, you can take the following approach:
//Note: I marked it as partial because there may be other code not shown here
// in particular I will not write the method GetConnection again. That said...
// you can have it all in a single block in a single file without using partial.
public partial class YourClass
{
private dynamic _serviceObject;
private void Store(dynamic serviceObject)
{
_serviceObject = serviceObject;
}
public dynamic ServiceObject
{
get
{
return _serviceObject;
}
}
}
But this doesn't take care of all the cases, in particular if you want to have a thread wait for serviceObject to be ready:
public partial class YourClass
{
private ManualResetEvent _serviceObjectWaitHandle = new ManualResetEvent(false);
private dynamic _serviceObject;
private void Store(dynamic serviceObject)
{
_serviceObject = serviceObject;
//If you need to do some work as soon as _serviceObject is ready...
// then it can be done here, this may still be the thread pool thread.
//If you need to call something like the UI...
// you will need to use BeginInvoke or a similar solution.
_serviceObjectWaitHandle.Set();
}
public void WaitForServiceObject()
{
//You may also expose other overloads, just for convenience.
//This will wait until Store is executed
//When _serviceObjectWaitHandle.Set() is called
// this will let other threads pass.
_serviceObjectWaitHandle.WaitOne();
}
public dynamic ServiceObject
{
get
{
return _serviceObject;
}
}
}
Still, I haven't covered all the scenarios. For instance... what happens if GetConnection is called multiple times? We need to decide whether we want to allow that, and if we do, what do we do with the old serviceObject (do we need to call something to dismiss it)? This can be problematic if we allow multiple threads to call GetConnection at once. So by default I will say that we don't, but we don't want to block the other threads either...
The solution? Follows:
//This is another part of the same class
//This one includes GetConnection
public partial class YourClass
{
//1 if GetConnection has been called, 0 otherwise
private int _initializingServiceObject;
public void GetConnection(string thread)
{
if (Interlocked.CompareExchange(ref _initializingServiceObject, 1, 0) == 0)
{
//Go on, it is the first time GetConnection is called
//I found that ops is not being used
//ParallelOptions ops = new ParallelOptions();
if(thread.Equals("one"))
{
Parallel.For(0, 1, i =>
{
//It seems to me a good idea to take the same path here too
//dynamic serviceObject = InitializeCRMService();
Store(InitializeCRMService());
});
}
else if (thread.Equals("multi"))
{
ThreadPool.QueueUserWorkItem
(
new WaitCallback
(
(_) =>
{
Store(InitializeCRMService());
}
)
);
}
}
}
}
Finally, if we allow multiple threads to use _serviceObject, and _serviceObject is not thread-safe, we can run into trouble. Using a monitor or a read-write lock are two alternatives for solving that.
Do you remember this?
public dynamic ServiceObject
{
get
{
return _serviceObject;
}
}
OK, you want the caller to access _serviceObject only inside a context that prevents other threads from entering (see System.Threading.Monitor), make sure it stops using it, and then leave that context.
Now consider that the calling thread could still store a copy of the _serviceObject reference somewhere, leave the synchronization, and then do something with _serviceObject while another thread is using it.
I'm used to thinking of every corner case when it comes to threading. But if you have control over the calling threads, you can do it very well with just the property shown above. If you don't... let's talk about it; I warn you, it can be extensive.
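For the Monitor route, a minimal sketch could look like this. GuardedService and Use are hypothetical names, and the generic parameter stands in for the dynamic service object. The C# lock statement is shorthand for Monitor.Enter and Monitor.Exit, so only one caller at a time runs against the wrapped object.

```csharp
using System;

public sealed class GuardedService<TService>
{
    private readonly object _gate = new object();
    private readonly TService _service;

    public GuardedService(TService service) { _service = service; }

    // Runs the caller's code while holding the lock, so no two callers
    // use the wrapped service at the same time.
    public TResult Use<TResult>(Func<TService, TResult> action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        lock (_gate)
        {
            return action(_service);
        }
    }
}
```

This does not remove the leak described above: a caller can still copy the reference out of the delegate and use it after the lock is released, so it only helps when the calling code cooperates.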
Option 2
This is a totally different behaviour. The comment Damien_The_Unbeliever made on your question made me think that you may have intended to return serviceObject. In that case it is not shared among threads, and it is OK to have multiple serviceObject instances at a time. Any synchronization needed is left to the caller.
Ok, this may be what you have been looking for:
public void GetConnection(string thread, Action<dynamic> callback)
{
if (ReferenceEquals(callback, null))
{
throw new ArgumentNullException("callback");
}
//I found that ops is not being used
//ParallelOptions ops = new ParallelOptions();
if(thread.Equals("one"))
{
Parallel.For(0, 1, i =>
{
callback(InitializeCRMService());
});
}
else if (thread.Equals("multi"))
{
ThreadPool.QueueUserWorkItem
(
new WaitCallback
(
(_) =>
{
callback(InitializeCRMService());
}
)
);
}
}
How should the callback look? Well, as long as it is not shared between threads, it is OK. Why? Because each thread that calls GetConnection passes its own callback Action and will receive a different serviceObject, so there is no risk that what one thread does to its serviceObject affects what another does to its own (since it is not the same serviceObject).
Unless you want to have one thread call this and then share it with other threads, in which case it is the caller's problem, to be resolved in another place at another time.
One last thing: you could use an enum to represent the options you currently pass in the string thread. In fact, since there are only two options, you may consider using a bool, unless more cases may appear in the future.
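A minimal sketch of the enum idea follows. ConnectionMode, Connections and the initialize parameter are hypothetical names; initialize stands in for InitializeCRMService. The sketch also swaps Task.Run in for ThreadPool.QueueUserWorkItem (both queue work to the thread pool) so that the result can be returned as a Task instead of through a callback.

```csharp
using System;
using System.Threading.Tasks;

public enum ConnectionMode
{
    Single,
    Multi
}

public static class Connections
{
    // initialize is a hypothetical stand-in for InitializeCRMService.
    public static Task<TService> GetConnection<TService>(ConnectionMode mode, Func<TService> initialize)
    {
        if (initialize == null) throw new ArgumentNullException(nameof(initialize));

        switch (mode)
        {
            case ConnectionMode.Single:
                // Run on the calling thread and wrap the result in a completed task.
                return Task.FromResult(initialize());
            case ConnectionMode.Multi:
                // Queue the call on the thread pool; the task completes with the result.
                return Task.Run(initialize);
            default:
                throw new ArgumentOutOfRangeException(nameof(mode));
        }
    }
}
```

A caller would then write something like var service = await Connections.GetConnection(ConnectionMode.Multi, InitializeCRMService); and an unknown mode fails loudly instead of silently doing nothing, as an unrecognized string would.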