hi i am working on an assignment and i should implement a queue which handles jobs waiting to be processed (producer-consumer problem). I have to develop a better queue that works more efficiently than the FIFO queue. There are parameters that describe the waiting time before the starvation occurs, the time they need to process after the queue is over for them. consumers come at a specified time, can wait for specified time and they take some time to execute whatever they wanna do when their turn has come. can you help me with a better queue rather than FIFO method?
First of all you are trying to solve different problems at the same time, if you want to improve the performance of the regular queue, you can implement a queue based on a priority of the elements(a heap) if you want to maintain the priority of the regular queue you can put a priority based on an integer, increasing that number every time you add an element into the heap.
Here I am attaching the first link that I found on google for
Priority queue. The order of the insertion is O(log n) if you use a Binary Heap
Now if you want to implement that queue allowing concurrency, you need to isolate the common resource(for example the basic structure where the heap store the elements).
Albahari is a good reference to see how producer-consumer works with the concurrency.
And here are all the classes that you can use to implement the concurrency for producer-consumer Concurrency sheet
I am adding an example with one of those Types
//BlockingCollection with a fix number of Products to put, it works with 10 items max on the collection
class Program
{
private static int counter = 1;
private static BlockingCollection<Product> products =
new BlockingCollection<Product>(10);
static void Main(string[] args)
{
//three producers
Task.Run(() => Producer());
Task.Run(() => Producer());
Task.Run(() => Producer());
Task.Run(() => Consumer());
Console.ReadLine();
}
static void Producer()
{
while (true)
{
var product = new Product()
{
Number = counter,
Name = "Product " + counter++
};
//Adding one element
Console.WriteLine("Producing: " + product);
products.Add(product);
Thread.Sleep(2000);
}
}
static void Consumer()
{
while (true)
{
//wait until exist one element
if (products.Count == 0)
continue;
var product = products.Take();
Console.WriteLine("Consuming: " + product);
Thread.Sleep(2000);
}
}
}
public class Product
{
public int Number { get; set; }
public string Name { get; set; }
public override string ToString()
{
return Name;
}
}
Related
I have an IEnumerable<customClass> object that has roughly 10-15 entries, so not a lot, but I'm running into a System.IO.FileNotFoundException when I try and do
Parallel.Foreach(..some linq query.., object => { ...stuff....});
with the enumerable. Here is the code I have that sometimes works, other times doesn't:
IEnumerable<UserIdentifier> userIds = script.Entries.Select(x => x.UserIdentifier).Distinct();
await Task.Factory.StartNew(() =>
{
Parallel.ForEach(userIds, async userId =>
{
Stopwatch watch = new Stopwatch();
watch.Start();
_Log.InfoFormat("user identifier: {0}", userId);
await Task.Factory.StartNew(() =>
{
foreach (ScriptEntry se in script.Entries.Where(x => x.UserIdentifier.Equals(userId)))
{
// // Run the script //
_Log.InfoFormat("waiting {0}", se.Delay);
Task.Delay(se.Delay);
_Log.InfoFormat("running SelectionInformation{0}", se.SelectionInformation);
ExecuteSingleEntry(se);
_Log.InfoFormat("[====== SelectionInformation {1} ELAPSED TIME: {0} ======]", watch.Elapsed,
se.SelectionInformation.Verb);
}
});
watch.Stop();
_Log.InfoFormat("[====== TOTAL ELAPSED TIME: {0} ======]", watch.Elapsed);
});
});
When the function ExecuteSingleEntry is ran, there is a function a few calls deep within that function that creates a temp directory and files. It seems to me, that when I run the parallel.foreach the function is getting slammed at once by numerous calls (I'm testing 5 at once currently but need to handle about 10) and isn't creating some of the files I need. But if I hit a break point in the file creation function and just F5 every time it gets hit I don't have any problems with a file not found exception being thrown.
So, my question is, how can I achieve running a subset of my scripts.Entries in parallel based on the user id within the script entries with a delay of 1 second between each different user id entries being started?
and a script entry is like:
UserIdentifier: 141, SelectionInformation: class of stuff, Ids: list of EntryIds, Names: list of Entry Names
And each user identifier can appear 1 or more times in the array. I want to start all the different user identifiers, more or less, at once. Then Task out the different SelectionInformation's tied to a script entry.
scripts.Entries is an array of ScriptEntry, which is as follows:
[DataMember]
public TimeSpan Delay { get; set; }
[DataMember]
public SelectionInformation Selection { get; set; }
[DataMember]
public long[] Ids { get; set; }
[DataMember]
public string Names { get; set; }
[DataMember]
public long UserIdentifier { get; set; }
I referenced: Parallel.ForEach vs Task.Factory.StartNew to obtain the
Task.Factory.StartNew(() => Parallel.Foreach({ }) ) so my UI doesn't lock up on me
There are a few principles to apply:
Prefer Task.Run over Task.Factory.StartNew. I describe on my blog why StartNew is dangerous; Run is a much safer, more modern alternative.
Don't pass an async lambda to Parallel.ForEach. It doesn't make sense, and it won't work right.
Task.Delay doesn't do anything by itself. You either have to await it or use the synchronous version (Thread.Sleep).
(In fact, in your case, the internal StartNew is meaningless; it's already parallel, and the code - running on a thread pool thread - is trying to start a new operation on a thread pool thread and immediately asynchronously await it???)
After applying these principles:
await Task.Run(() =>
{
Parallel.ForEach(userIds, userId =>
{
Stopwatch watch = new Stopwatch();
watch.Start();
_Log.InfoFormat("user identifier: {0}", userId);
foreach (ScriptEntry se in script.Entries.Where(x => x.UserIdentifier.Equals(userId)))
{
// // Run the script //
_Log.InfoFormat("waiting {0}", se.Delay);
Thread.Sleep(se.Delay);
_Log.InfoFormat("running SelectionInformation{0}", se.SelectionInformation);
ExecuteSingleEntry(se);
_Log.InfoFormat("[====== SelectionInformation {1} ELAPSED TIME: {0} ======]", watch.Elapsed,
se.SelectionInformation.Verb);
}
watch.Stop();
_Log.InfoFormat("[====== TOTAL ELAPSED TIME: {0} ======]", watch.Elapsed);
});
});
Scope:
I am currently implementing an application that uses Amazon SQS Service as a provider of data for this program to process.
Since I need a parallel processing over the messages dequeued from this queue, this is what I've did.
Parallel.ForEach (GetMessages (msgAttributes), new ParallelOptions { MaxDegreeOfParallelism = threadCount }, message =>
{
// Processing Logic
});
Here's the header of the "GetMessages" method:
private static IEnumerable<Message> GetMessages (List<String> messageAttributes = null)
{
// Dequeueing Logic... 10 At a Time
// Yielding the messages to the Parallel Loop
foreach (Message awsMessage in messages)
{
yield return awsMessage;
}
}
How will this work ?:
My initial thought about how this would work was that the GetMessagesmethod would be executed whenever the thread's had no work (or a good number of threads had no work, something like an internal heuristic to measure this). That being said, to me, the GetMessages method would than, distribute the messages to the Parallel.For working threads, which would process the messages and wait for the Parallel.For handler to give them more messages to work.
Problem? I was wrong...
The thing is that, I was wrong. Still, I have no idea on what's happening in this situation.
The number of messages being dequeued is way too high, and it grews by powers of 2 every time they get dequeued. The dequeueing count (messsages) goes as following:
Dequeue is Called: Returns 80 Messages
Dequeue is Called: Returns 160 Messages
Dequeue is Called: Returns 320 Messages (and so forth)
After a certain point, the number of messages being dequeued, or, in this case, waiting to be processed by my application is too high and I end up running out of memory.
More Information:
I am using thread-safe InterLocked operations to increment counters mentioned above.
The number of threads being used is 25 (for the Parallel.Foreach)
Each "GetMessages" will return up to 10 messages (as an IEnumerable, yielded).
Question: What exactly is happening on this scenario ?
I am having a hard-time trying to figure out what exactly is going on. Is my GetMessages method being invoked by each thread once it finishes the "Processing Loop", hence, leading to more and more messages being dequeued ?
Is the call to the "GetMessages", made by a single thread, or is it being called by multiple threads ?
I think there's an issue with Parallel.ForEach partitioning... Your question is a typical producer / consumer scenario. For such a case, you should have independent logics for dequeuing on one side, and processing on the other. It will respect separation of concerns, and will simplify debugging.
BlockingCollection<T> will let you to separate boths : on one side, you add items to be processed, and on the other, you consume them. Here's an example of how to implement it :
You will need the ParallelExtensionsExtras nuget package for BlockingCollection<T> workload partitioning (.GetConsumingEnumerable() in the process method).
public static class ProducerConsumer
{
public static ConcurrentQueue<String> SqsQueue = new ConcurrentQueue<String>();
public static BlockingCollection<String> Collection = new BlockingCollection<String>();
public static ConcurrentBag<String> Result = new ConcurrentBag<String>();
public static async Task TestMethod()
{
// Here we separate all the Tasks in distinct threads
Task sqs = Task.Run(async () =>
{
Console.WriteLine("Amazon on thread " + Thread.CurrentThread.ManagedThreadId.ToString());
while (true)
{
ProducerConsumer.BackgroundFakedAmazon(); // We produce 50 Strings each second
await Task.Delay(1000);
}
});
Task deq = Task.Run(async () =>
{
Console.WriteLine("Dequeue on thread " + Thread.CurrentThread.ManagedThreadId.ToString());
while (true)
{
ProducerConsumer.DequeueData(); // Dequeue 20 Strings each 100ms
await Task.Delay(100);
}
});
Task process = Task.Run(() =>
{
Console.WriteLine("Process on thread " + Thread.CurrentThread.ManagedThreadId.ToString());
ProducerConsumer.BackgroundParallelConsumer(); // Process all the Strings in the BlockingCollection
});
await Task.WhenAll(c, sqs, deq, process);
}
public static void DequeueData()
{
foreach (var i in Enumerable.Range(0, 20))
{
String dequeued = null;
if (SqsQueue.TryDequeue(out dequeued))
{
Collection.Add(dequeued);
Console.WriteLine("Dequeued : " + dequeued);
}
}
}
public static void BackgroundFakedAmazon()
{
Console.WriteLine(" ---------- Generate 50 items on amazon side ---------- ");
foreach (var data in Enumerable.Range(0, 50).Select(i => Path.GetRandomFileName().Split('.').FirstOrDefault()))
SqsQueue.Enqueue(data + " / ASQS");
}
public static void BackgroundParallelConsumer()
{
// Here we stay in Parallel.ForEach, waiting for data. Once processed, we are still waiting the next chunks
Parallel.ForEach(Collection.GetConsumingEnumerable(), (i) =>
{
// Processing Logic
String processedData = "Processed : " + i;
Result.Add(processedData);
Console.WriteLine(processedData);
});
}
}
You can try it from a console app like this :
static void Main(string[] args)
{
ProducerConsumer.TestMethod().Wait();
}
I have a program where I'm receiving events and want to process them in batches, so that all items that come in while I'm processing the current batch will appear in the next batch.
The simple TimeSpan and count based Buffer methods in Rx will give me multiple batches of items instead of giving me one big batch of everything that has come in (in cases when the subscriber takes longer than the specified TimeSpan or more than N items come in and N is greater than count).
I looked at using the more complex Buffer overloads that take Func<IObservable<TBufferClosing>> or IObservable<TBufferOpening> and Func<TBufferOpening, IObservable<TBufferClosing>>, but I can't find examples of how to use these, much less figure out how to apply them to what I'm trying to do.
Does this do what you want?
var xs = new Subject<int>();
var ys = new Subject<Unit>();
var zss =
xs.Buffer(ys);
zss
.ObserveOn(Scheduler.Default)
.Subscribe(zs =>
{
Thread.Sleep(1000);
Console.WriteLine(String.Join("-", zs));
ys.OnNext(Unit.Default);
});
ys.OnNext(Unit.Default);
xs.OnNext(1);
Thread.Sleep(200);
xs.OnNext(2);
Thread.Sleep(600);
xs.OnNext(3);
Thread.Sleep(400);
xs.OnNext(4);
Thread.Sleep(300);
xs.OnNext(5);
Thread.Sleep(900);
xs.OnNext(6);
Thread.Sleep(100);
xs.OnNext(7);
Thread.Sleep(1000);
My Result:
1-2-3
4-5
6-7
What you need is something to buffer the values and then when the worker
is ready it asks for the current buffer and then resets it. This can
be done with a combination of RX and Task
class TicTac<Stuff> {
private TaskCompletionSource<List<Stuff>> Items = new TaskCompletionSource<List<Stuff>>();
List<Stuff> in = new List<Stuff>();
public void push(Stuff stuff){
lock(this){
if(in == null){
in = new List<Stuff>();
Items.SetResult(in);
}
in.Add(stuff);
}
}
private void reset(){
lock(this){
Items = new TaskCompletionSource<List<Stuff>>();
in = null;
}
}
public async Task<List<Stuff>> Items(){
List<Stuff> list = await Items.Task;
reset();
return list;
}
}
then
var tictac = new TicTac<double>();
IObservable<double> source = ....
source.Subscribe(x=>tictac.Push(x));
Then in your worker
while(true){
var items = await tictac.Items();
Thread.Sleep(100);
for each (item in items){
Console.WriteLine(item);
}
}
The way I have done this before is to pull up the ObserveOn method in DotPeek/Reflector and take that queuing concept that it has and adapt it to our requirements. For example, in UI applications with fast ticking data (like finance) the UI thread can get flooded with events and sometimes it cant update quick enough. In these cases we want to drop all events except the last one (for a particular instrument). In this case we changed the internal Queue of the ObserveOn to a single value of T (look for ObserveLatestOn(IScheduler)). In your case you want the Queue, however you want to push the whole queue not just the first value. This should get you started.
Kind of an expansion of #Enigmativity's answer. I have used this to solve the problem:
public static IObservable<(Action ready, IReadOnlyList<T> values)> BufferUntilReady<T>(this IObservable<T> stream)
{
var gate = new BehaviorSubject<Guid>(Guid.NewGuid());
void Ready() => gate.OnNext(Guid.NewGuid());
return stream.Publish(shared => shared
.Buffer(gate.CombineLatest(shared, ValueTuple.Create)
.DistinctUntilChanged(new AnyEqualityComparer<Guid, T>()))
.Where(x => x.Any())
.Select(x => ((Action) Ready, (IReadOnlyList<T>) x)));
}
public class AnyEqualityComparer<T1, T2> : IEqualityComparer<(T1 a, T2 b)>
{
public bool Equals((T1 a, T2 b) x, (T1 a, T2 b) y) => Equals(x.a, y.a) || Equals(x.b, y.b);
public int GetHashCode((T1 a, T2 b) obj) => throw new NotSupportedException();
}
The subscriber receives a Ready() function to be called when ready to receive next buffer. I don't observe each buffer on the same thread to avoid cycles, but I guess you could break it some other place, if you need each buffer to be handled on the same thread.
I am not familiar with multithreading. Image I have a method to do some intensive search on a string, and return 2 lists of integers as out parameters.
public static void CalcModel(string s, out List<int> startPos, out List<int> len)
{
// Do some intensive search
}
The search on long string is very time consuming. So I want to split the string into several fragments, search with multithreads, and recombine the result (adjust the startPos accordingly).
How to integrate multithreading in this kinda process? Thanks
I forgot to mention the following two things:
I want to set a string length cutoff, and let the code to decide how many fragments it needs.
I had a hard time to associate the startPos of each fragments (on the original string) with the thread. How can I do that?
Rather than get too bogged down in details, generally, you send each thread a "return object." Once you've started all the threads, you block on them and wait until they are all finished.
While each thread is running, the thread modifies its work object and terminates when it has produced the output.
So roughly this (I can't tell exactly how you want to split it up, so perhaps you can modify this):
public class WorkItem {
public string InputString;
public List<int> startPos;
public List<int> len;
}
public static void CalcLotsOfStrings(string s, out List<int> startPos, out List<int> len)
{
WorkItem wi1 = new WorkItem();
wi1.InputString = s;
Thread t1 = new Thread(InternalCalcThread1);
t1.Start(wi1);
WorkItem wi2 = new WorkItem();
wi2.InputString = s;
Thread t2 = new Thread(InternalCalcThread2);
t2.Start(wi2);
// You can now wait for the threads to complete or start new threads
// When you're done, wi1 and wi2 will be filled with the updated data
// but make sure not to use them until the threads are done!
}
public static void InternalCalcThread1(object item)
{
WorkItem w = item as WorkItem;
w.startPos = new List<int>();
w.len = new List<int>();
// Do work here - populate the work item data
}
public static void InternalCalcThread2(object item)
{
// Do work here
}
You can try this, but I am not sure about the performance on these methods
Parallel.Invoke(
() => CalcModel(s,startPos, len),
() => CalcModel(s,startPos, len)
);
To create and run multiple threads is a very easy task. All you need is method which acts as a starting point for a thread.
Suppose you have the CalcModel method as defined in your original post then you only have to do:
// instantiate the thread with a method as starting point
Thread t = new Thread(new ThreadStart(CalcModel));
// run the thread
t.Start();
However if you want the thread to return some values you might apply a little trick because you can't return values directly like you do it with a return statement or out parameters.
You can 'wrap' the thread in its own class and let him store its data in the class's fields:
public class ThreadClass {
public string FieldA;
public string FieldB;
//...
public static void Run () {
Thread t = new Thread(new ThreadStart(_run));
t.Start();
}
private void _run() {
//...
fieldA = "someData";
fieldB = "otherData"
//...
}
}
That's only a very rough example to illustrate the idea. I doesn't include any parts for thread synchronization or thread control.
I would say the more difficult task would be to think about splitting your CalcModel method in a way that it can be parallelized and then maybe more important how the partially results can be joined together to form one single end solution.
The code below continues to create threads, even when the queue is empty..until eventually an OutOfMemory exception occurs. If i replace the Parallel.ForEach with a regular foreach, this does not happen. anyone know of reasons why this may happen?
public delegate void DataChangedDelegate(DataItem obj);
public class Consumer
{
public DataChangedDelegate OnCustomerChanged;
public DataChangedDelegate OnOrdersChanged;
private CancellationTokenSource cts;
private CancellationToken ct;
private BlockingCollection<DataItem> queue;
public Consumer(BlockingCollection<DataItem> queue) {
this.queue = queue;
Start();
}
private void Start() {
cts = new CancellationTokenSource();
ct = cts.Token;
Task.Factory.StartNew(() => DoWork(), ct);
}
private void DoWork() {
Parallel.ForEach(queue.GetConsumingPartitioner(), item => {
if (item.DataType == DataTypes.Customer) {
OnCustomerChanged(item);
} else if(item.DataType == DataTypes.Order) {
OnOrdersChanged(item);
}
});
}
}
I think Parallel.ForEach() was made primarily for processing bounded collections. And it doesn't expect collections like the one returned by GetConsumingPartitioner(), where MoveNext() blocks for a long time.
The problem is that Parallel.ForEach() tries to find the best degree of parallelism, so it starts as many Tasks as the TaskScheduler lets it run. But the TaskScheduler sees there are many Tasks that take a very long time to finish, and that they're not doing anything (they block) so it keeps on starting new ones.
I think the best solution is to set the MaxDegreeOfParallelism.
As an alternative, you could use TPL Dataflow's ActionBlock. The main difference in this case is that ActionBlock doesn't block any threads when there are no items to process, so the number of threads wouldn't get anywhere near the limit.
The Producer/Consumer pattern is mainly used when there is just one Producer and one Consumer.
However, what you are trying to achieve (multiple consumers) more neatly fits in the Worklist pattern. The following code was taken from a slide for unit2 slide "2c - Shared Memory Patterns" from a parallel programming class taught at the University of Utah, which is available in the download at http://ppcp.codeplex.com/
BlockingCollection<Item> workList;
CancellationTokenSource cts;
int itemcount
public void Run()
{
int num_workers = 4;
//create worklist, filled with initial work
worklist = new BlockingCollection<Item>(
new ConcurrentQueue<Item>(GetInitialWork()));
cts = new CancellationTokenSource();
itemcount = worklist.Count();
for( int i = 0; i < num_workers; i++)
Task.Factory.StartNew( RunWorker );
}
IEnumberable<Item> GetInitialWork() { ... }
public void RunWorker() {
try {
do {
Item i = worklist.Take( cts.Token );
//blocks until item available or cancelled
Process(i);
//exit loop if no more items left
} while (Interlocked.Decrement( ref itemcount) > 0);
} finally {
if( ! cts.IsCancellationRequested )
cts.Cancel();
}
}
}
public void AddWork( Item item) {
Interlocked.Increment( ref itemcount );
worklist.Add(item);
}
public void Process( Item i )
{
//Do what you want to the work item here.
}
The preceding code allows you to add worklist items to the queue, and lets you set an arbitrary number of workers (in this case, four) to pull items out of the queue and process them.
Another great resource for the Parallelism on .Net 4.0 is the book "Parallel Programming with Microsoft .Net" which is freely available at: http://msdn.microsoft.com/en-us/library/ff963553
Internally in the Task Parallel Library, the Parallel.For and Parallel.Foreach follow a hill-climbing algorithm to determine how much parallelism should be utilized for the operation.
More or less, they start with running the body on one task, move to two, and so on, until a break-point is reached and they need to reduce the number of tasks.
This works quite well for method bodies that complete quickly, but if the body takes a long time to run, it may take a long time before the it realizes it needs to decrease the amount of parallelism. Until that point, it continues adding tasks, and possibly crashes the computer.
I learned the above during a lecture given by one of the developers of the Task Parallel Library.
Specifying the MaxDegreeOfParallelism is probably the easiest way to go.