Getting weird result while using Task Parallel Library? - c#

I am trying to do some filtering using the TPL. I have simplified the code so that it just filters numbers based on a set of conditions. Here is the code:
public static void Main (string[] args)
{
IEnumerable<int> allData = getIntData ();
Console.WriteLine ("Complete Data display");
foreach (var item in allData) {
Console.Write(item);
Console.Write(" | ");
}
Console.WriteLine ();
filterAllDatas (ref allData, getConditions ());
foreach (var item in allData) {
Console.Write(item);
Console.Write(" | ");
}
Console.WriteLine ();
}
static void filterAllDatas(ref IEnumerable<int> data, IEnumerable<Func<int,bool>> conditions)
{
List<int> filteredData = data.ToList ();
List<Task> tasks = new List<Task>();
foreach (var item in data.AsParallel()) {
foreach (var condition in conditions.AsParallel()) {
tasks.Add(Task.Factory.StartNew(() => {
if (condition(item)) {
filteredData.Remove(item);
}
}));
}
}
Task.WaitAll(tasks.ToArray());
data = filteredData.AsEnumerable ();
}
static IEnumerable<Func<int,bool>> getConditions()
{
yield return (a) => { Console.WriteLine("modulo by 2"); return a % 2 == 0;};
yield return (a) => { Console.WriteLine("modulo by 3"); Thread.Sleep(3000); return a % 3 == 0;};
}
static IEnumerable<int> getIntData ()
{
for (int i = 0; i < 10; i++) {
yield return i;
}
}
This is simple code that filters out the integers divisible by two or three. If I remove the Thread.Sleep call, the code works perfectly, but with it in place it does not.
Normally, that is without Thread.Sleep, both conditions execute 10 times, i.e. once for every number. But if I add Thread.Sleep, the first condition executes 7 times and the second executes 13 times. Because of this, a few numbers skip a condition. I tried to debug it but found nothing that points to an issue in my code.
Is there a good way to achieve this, so that the filter conditions run asynchronously and in parallel on the data to improve performance?
The code is for demo purposes only.
FYI: I am currently using Mono with Xamarin Studio on a Windows machine.
Please let me know if any further details are needed.

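I would guess it has to do with how your task's lambda closes over the loop variable condition. The lambda runs later on another thread, and with compilers older than C# 5 a foreach variable is shared across iterations, so the task may see a different value than the one it was created with. Try copying it into a local first:
foreach (var condition in conditions.AsParallel()) {
var tasksCondition = condition;
tasks.Add(Task.Factory.StartNew(() => {
if (tasksCondition(item)) {
filteredData.Remove(item);
}
}));
}
Note you're also closing over the loop variable item, which could cause similar problems.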

First, you can change your getConditions method to see what's happening inside:
static IEnumerable<Func<int, bool>> getConditions()
{
yield return (a) => { Console.WriteLine(a + " modulo by 2"); return a % 2 == 0; };
yield return (a) => { Console.WriteLine(a + " modulo by 3"); Thread.Sleep(3000); return a % 3 == 0; };
}
And if you stop capturing the foreach variables, it will work:
static void filterAllDatas(ref IEnumerable<int> data, IEnumerable<Func<int, bool>> conditions)
{
List<int> filteredData = data.ToList();
List<Task> tasks = new List<Task>();
foreach (var item in data.AsParallel())
{
var i = item;
foreach (var condition in conditions.AsParallel())
{
var c = condition;
tasks.Add(Task.Factory.StartNew(() =>
{
if (c(i))
{
filteredData.Remove(i);
}
}));
}
}
Task.WaitAll(tasks.ToArray());
data = filteredData.AsEnumerable();
}
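Separately from the captured-variable issue, List<int>.Remove is not thread-safe, so calling it from several tasks at once can still misbehave. A minimal sketch of an alternative, assuming the goal is simply to drop every number that matches at least one condition: let PLINQ do the filtering and never mutate a shared list. FilterDemo and FilterAll are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterDemo
{
    // Keeps the items for which no condition returns true.
    public static List<int> FilterAll(IEnumerable<int> data, IEnumerable<Func<int, bool>> conditions)
    {
        // Materialize once so the condition iterator is not re-run for every item.
        var conditionList = conditions.ToList();
        return data.AsParallel()
                   .AsOrdered() // preserve the source order in the output
                   .Where(item => !conditionList.Any(c => c(item)))
                   .ToList();
    }

    public static void Main()
    {
        var conditions = new List<Func<int, bool>> { a => a % 2 == 0, a => a % 3 == 0 };
        var result = FilterAll(Enumerable.Range(0, 10), conditions);
        Console.WriteLine(string.Join(" | ", result)); // prints "1 | 5 | 7"
    }
}
```

Note that Any short-circuits, so a later condition is not evaluated for an item that an earlier condition already matched. If every condition has to run for every item, that part would need to change.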


Bad Performance Parallel.ForEach

I have a Parallel.ForEach loop that performs terribly. I don't understand what in the code makes it so slow: it ran for about 15 minutes before I stopped it. I'm not sure whether it is some kind of memory leak or something similar, but the sequential version I had before converting it to parallel worked fine with the same base code.
Note that sourceFileRead and outboundFileRead are two big lists of around 100,000 IrfRecord items each. That's why I wanted to make this run in parallel, but it seems like I'm doing something very wrong?
private async Task Compare(ConcurrentBag<IrfRecord> sourceFileRead, ConcurrentBag<IrfRecord> outboundFileRead)
{
await Task.Run(() =>
{
Parallel.ForEach(sourceFileRead, (src) =>
{
var matched = outboundFileRead.FirstOrDefault(ob => ob.Key == src.Key);
if (matched == null)
{
ReportMissing(src.Key);
return;
}
CompareEntities(src, matched);
});
});
}
private void CompareEntities(IrfRecord src, IrfRecord matched)
{
for (var i = 0; i < src.Body.Count - 1; i++)
{
if (src.Body[i].Settings.ToIgnore) continue;
if (src.Body[i].Value != matched.Body[i].Value)
{
ReportDiff(src.Body[i], matched.Body[i], src.Key);
}
}
}
private void ReportMissing(string srcKey)
{
_differencesCount.AddOrUpdate("Missing", new List<string> { srcKey },
(key, existingVal) =>
{
existingVal.Add(srcKey);
return existingVal;
});
}
private void ReportDiff(IrfRecordProperty srcProp, IrfRecordProperty obProp, string srcKey)
{
_differencesCount.AddOrUpdate(srcProp.Settings.Name, new List<string> { srcKey + $" LC Value: {srcProp.Value}, D Value: {obProp.Value}" },
(key, existingVal) =>
{
existingVal.Add(srcKey + $" LC Value: {srcProp.Value}, D Value: {obProp.Value}");
return existingVal;
});
}
Am I using Parallel.ForEach the wrong way?
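One likely cost in the code above is that FirstOrDefault scans outboundFileRead linearly for every source record, so two lists of around 100,000 items can mean billions of key comparisons. A minimal sketch of a keyed lookup instead, assuming keys are unique within the outbound list; Record, CompareDemo and FindMissing are hypothetical stand-ins, not the original types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for IrfRecord; only the Key matters here.
public class Record
{
    public string Key { get; set; }
}

public static class CompareDemo
{
    // Returns the keys present in source but absent from outbound, sorted.
    public static List<string> FindMissing(IEnumerable<Record> source, IEnumerable<Record> outbound)
    {
        // Build the index once instead of scanning the list per source record.
        // Assumes keys are unique; ToDictionary throws on duplicates.
        var byKey = outbound.ToDictionary(r => r.Key);
        var missing = new ConcurrentBag<string>();

        // Concurrent reads of a Dictionary are safe as long as nothing writes to it.
        Parallel.ForEach(source, src =>
        {
            if (!byKey.TryGetValue(src.Key, out var matched))
            {
                missing.Add(src.Key);
                return;
            }
            // The per-record comparison (CompareEntities in the question) would go here.
        });

        return missing.OrderBy(k => k).ToList();
    }

    public static void Main()
    {
        var source = new[] { "a", "b", "c" }.Select(k => new Record { Key = k });
        var outbound = new[] { "a", "c" }.Select(k => new Record { Key = k });
        Console.WriteLine(string.Join(", ", FindMissing(source, outbound))); // prints "b"
    }
}
```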

Difficulties retrieving List<int> within List<List<int>>

I am having trouble with the below code:
foreach (var result in results)
{
foreach (int individualresult in result)
{
//Operation
}
}
'results' is a List<List<int>> and I am trying to retrieve each integer from each list within the 'results' list of lists (sorry if that's confusing). However, when I run the code no errors are raised, but it doesn't get any further than the first line.
I've put it in a try/catch and it doesn't pick up any exceptions or errors, so I am flummoxed as to why it isn't working. Additionally, I have tried changing var to List<int>, but that didn't change anything either.
Any and all help will be appreciated. Thanks
Try this code in a console application and check the output. I assume that either your results is empty or that each individual list in your results is empty.
Console.WriteLine("Start");
if (results == null || !results.Any())
Console.WriteLine("No result received!");
foreach (var result in results)
{
Console.WriteLine("new result-set");
if (result == null || !result.Any())
Console.WriteLine(" Result: Empty list!");
foreach (var individualresult in result)
{
Console.WriteLine(" Result: " + individualresult);
}
}
Console.WriteLine("End");
Console.ReadLine();
See this example; it might help you:
class Program
{
static void Main(string[] args)
{
List<int> list1d = new List<int>();
List<List<int>> list2d = new List<List<int>>();
for (int i = 0; i < 10; i++)
{
list1d = new List<int>();
for (int j = 0; j < 10; j++)
{
list1d.Add(i * j + i);
}
list2d.Add(list1d);
}
foreach (var result in list2d)
{
foreach (var i in result)
{
Console.WriteLine(i);
}
}
Console.ReadKey();
}
}

Finish two tasks then printing something

I have three tasks: a producer, a consumer, and a last one that prints something after the first two have finished. However, the code never reaches the last task, so nothing is printed.
while (true)
{
ThreadEvent.WaitOne(waitingTime, false);
lock (SyncVar)
{
collection = new BlockingCollection<string>(4);
Task producer = Task.Run(() =>
{
if (list.Count > 0)
Console.WriteLine("Block begin");
while (!collection.IsAddingCompleted)
{
var firstItem = list.FirstOrDefault();
collection.TryAdd(firstItem);
list.Remove(firstItem);
}
collection.CompleteAdding();
});
Task consumer = Task.Run(() => DoConsume());
Task endTask = consumer.ContinueWith(i => Console.WriteLine("Block end"));// not print this line, why?
Task.WaitAll(producer, consumer, endTask);
if (ThreadState != State.Running) break;
}
}
Please have a look at my code logic.
EDIT:
DoConsume is more complicated:
public void DoConsume()
{
if (collection.Count > 0)
Console.WriteLine("There are {0} channels to be processed.", collection.Count);
var workItemBlock = new ActionBlock<string>(
workItem =>
{
bool result =ProcessEachChannel(workItem);
});
foreach (var workItem in collection.GetConsumingEnumerable())
{
workItemBlock.Post(workItem);
}
workItemBlock.Complete();
}
The problem is that your producer will never complete:
// This will run until after CompleteAdding is called
while (!collection.IsAddingCompleted)
{
var firstItem = list.FirstOrDefault();
collection.TryAdd(firstItem);
list.Remove(firstItem);
}
//... which doesn't happen until after the loop
collection.CompleteAdding();
It looks like you're just trying to add all of the items in your list, which should be as simple as:
Task producer = Task.Run(() =>
{
if (list.Count > 0)
Console.WriteLine("Block begin");
while(list.Any())
{
var firstItem = list.First();
collection.TryAdd(firstItem);
list.Remove(firstItem);
}
collection.CompleteAdding();
});
Or, a simpler method:
Task producer = Task.Run(() =>
{
if (list.Count > 0)
Console.WriteLine("Block begin");
foreach(var item in list)
{
collection.TryAdd(item);
}
list.Clear();
collection.CompleteAdding();
});
I used Reed Copsey's code but the error is still there. I just can't figure out why.
I think the flaw in my code is at while (!collection.IsAddingCompleted).
The collection has a bound of 4. Suppose there are two items left in the collection: the condition collection.IsAddingCompleted is never met, so the code cannot leave the while loop.
I rewrote the code and it seems fine now. It is similar to the MSDN example. I used Take to retrieve the elements from the collection.
while (true)
{
ThreadEvent.WaitOne(waitingTime, false);
lock (SyncVar)
{
collection = new BlockingCollection<string>(4);
DoWork dc = new DoWork();
Task consumer = Task.Run(() =>
{
while (!collection.IsCompleted)
{
string data = "";
try
{
if (collection.Count > 0)
data = collection.Take();
}
catch (InvalidOperationException e)
{
Console.WriteLine(e.Message);
}
if (data != "")
{
bool result = dc.DoConsume(data);
}
}
});
Task producer = Task.Run(() =>
{
if (list.Count > 0)
Console.WriteLine("Block begin");
foreach (var item in list)
{
collection.Add(item);
}
list.Clear();
collection.CompleteAdding();
});
Task endTask = consumer.ContinueWith(i => Console.WriteLine("Block end"));
Task.WaitAll(producer, consumer, endTask);
if (ThreadState != State.Running) break;
}
}
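One detail worth noting about the snippets above: on a bounded BlockingCollection, TryAdd returns false immediately when the collection is full, so an item can be dropped silently, whereas Add blocks until there is room. A minimal sketch of the producer/consumer pair, assuming a single producer and a single consumer; PipelineDemo and Run are hypothetical names, and the consumer just records items instead of calling the real processing code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipelineDemo
{
    // Pushes every item through a bounded collection and returns what the consumer saw.
    public static List<string> Run(IEnumerable<string> items)
    {
        var processed = new List<string>(); // touched only by the single consumer task
        using (var collection = new BlockingCollection<string>(4))
        {
            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (var item in items)
                        collection.Add(item); // blocks while the collection is full
                }
                finally
                {
                    // Always signal completion, otherwise the consumer waits forever.
                    collection.CompleteAdding();
                }
            });

            Task consumer = Task.Run(() =>
            {
                // Blocks for new items and ends once adding is complete and the collection is empty.
                foreach (var item in collection.GetConsumingEnumerable())
                    processed.Add(item);
            });

            Task.WaitAll(producer, consumer);
        }
        return processed;
    }

    public static void Main()
    {
        var result = Run(new[] { "a", "b", "c", "d", "e", "f" });
        Console.WriteLine(string.Join(", ", result)); // prints "a, b, c, d, e, f"
        Console.WriteLine("Block end");
    }
}
```

With GetConsumingEnumerable there is no need to poll Count or catch InvalidOperationException from Take.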

Incrementing an IEnumerator/IEnumerable while using yield

I am trying to iterate through a collection using yield and, if the collection is empty, call an increment method that fetches the next set of results. When the increment method reports that there are no more results, the iterator should end with yield break;
I cannot use (I think) a standard IEnumerator with MoveNext() etc., as the increment method returns two different types of data.
I have tried the example below but it stops after one iteration. I am hoping there is a much easier way to do this (or at least that it is possible and I just have a bug).
static void Main(string[] args)
{
var query = new Query();
foreach(var s in query.Q1())
{
Console.WriteLine(s);
}
foreach (var s in query.Q2())
{
Console.WriteLine(s);
}
Console.ReadLine();
}
public class Query
{
int i = 0;
bool complete;
List<string> q1 = new List<string>();
List<string> q2 = new List<string>();
public IEnumerable<string> Q1()
{
if (complete)
{
yield break;
}
if (!q1.Any() && !complete)
{
Increment();
}
if (q1.Any())
{
foreach (var s in q1)
{
yield return s;
}
}
}
public IEnumerable<string> Q2()
{
if (complete)
{
yield break;
}
if (!q2.Any() && !complete)
{
Increment();
}
if (q2.Any())
{
foreach (var s in q2)
{
yield return s;
}
}
}
void Increment()
{
if (i < 10)
{
// simulate getting two types of data back (parent and two children) from datasource
q1.Add((1 * (i + 1)).ToString());
q2.Add("A: " + (1 * (i + 1)).ToString());
q2.Add("B: " + (1 * (i + 1)).ToString());
i++;
}
else
{
complete = true;
}
}
}
result:
1
A: 1
B: 1
Any ideas on a better way of doing this or where I am going wrong?
EDIT
Here is my rough and ready fix:
public IEnumerable<string> Q1()
{
var index = 0;
if (!complete)
{
while (!complete)
{
var count = q1.Count();
if (index + 1 == count)
{
for (var x = index; index < count; index++)
{
yield return q1[index];
}
}
else
{
Increment();
}
}
}
else
{
foreach (var s in q1)
{
yield return s;
}
}
}
Each iterator calls Increment at most once, because there is no loop around the call. When you enumerate Q1, q1 is empty, so Increment runs once and adds one item to q1 and two items to q2. Q1 then passes the check
if (q1.Any())
yields that single item, and ends.
When you enumerate the Q2 iterator, q2 is no longer empty, so Increment is skipped and the iterator ends after
if (q2.Any())
{
foreach (var s in q2)
{
yield return s;
}
}
This foreach loop is executed only once and it returns only the two items which were added to q2 during the single Increment call in the Q1 iterator.
It's not very clear what you want to achieve, but here is how you can use a loop to generate the return values of an iterator:
public IEnumerable<string> Q2()
{
for (int i = 1; i <= 10; i++) // start from 1
{
yield return i.ToString(); // do not multiply by 1
yield return "A: " + i; // .ToString() is not necessary
yield return "B: " + i;
}
}
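If the data really does arrive in batches from a data source, the same loop idea applies: keep fetching inside the iterator until the source reports that it is exhausted. A minimal sketch, assuming a hypothetical fetchPage delegate that returns an empty list when there is no more data; PagingDemo and ReadAll are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public static class PagingDemo
{
    // Lazily yields every item, fetching the next page only when the caller asks for more.
    public static IEnumerable<string> ReadAll(Func<int, List<string>> fetchPage)
    {
        for (int page = 0; ; page++)
        {
            var batch = fetchPage(page);
            if (batch.Count == 0)
                yield break; // an empty page signals that the source is exhausted

            foreach (var s in batch)
                yield return s;
        }
    }

    public static void Main()
    {
        // Simulated data source: three pages with two children each, then nothing.
        Func<int, List<string>> fetch = p => p < 3
            ? new List<string> { "A: " + (p + 1), "B: " + (p + 1) }
            : new List<string>();

        Console.WriteLine(string.Join(", ", ReadAll(fetch)));
        // prints "A: 1, B: 1, A: 2, B: 2, A: 3, B: 3"
    }
}
```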

LINQ: take a sequence of elements from a collection

I have a collection of objects and need to take batches of 100 objects and do some work with them until there are no objects left to process.
Instead of looping through each item and grabbing 100 elements, then the next hundred and so on, is there a nicer way of doing it with LINQ?
Many thanks
static void test(IEnumerable<object> objects)
{
while (objects.Any())
{
foreach (object o in objects.Take(100))
{
}
objects = objects.Skip(100);
}
}
:)
int batchSize = 100;
var batched = yourCollection.Select((x, i) => new { Val = x, Idx = i })
.GroupBy(x => x.Idx / batchSize,
(k, g) => g.Select(x => x.Val));
// and then to demonstrate...
foreach (var batch in batched)
{
Console.WriteLine("Processing batch...");
foreach (var item in batch)
{
Console.WriteLine("Processing item: " + item);
}
}
This will partition the list into a list of lists of however many items you specify.
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
{
int i = 0;
List<T> list = new List<T>(size);
foreach (T item in source)
{
list.Add(item);
if (++i == size)
{
yield return list;
list = new List<T>(size);
i = 0;
}
}
if (list.Count > 0)
yield return list;
}
I don't think LINQ is really suitable for this sort of processing. It is mainly useful for performing operations on whole sequences rather than splitting or modifying them. I would do this by accessing the underlying IEnumerator<T>, since any method using Take and Skip is going to be quite inefficient.
public static void Batch<T>(this IEnumerable<T> items, int batchSize, Action<IEnumerable<T>> batchAction)
{
if (batchSize < 1) throw new ArgumentException();
List<T> buffer = new List<T>();
using (var enumerator = (items ?? Enumerable.Empty<T>()).GetEnumerator())
{
while (enumerator.MoveNext())
{
buffer.Add(enumerator.Current);
if (buffer.Count == batchSize)
{
batchAction(buffer);
buffer.Clear();
}
}
//execute for remaining items
if (buffer.Count > 0)
{
batchAction(buffer);
}
}
}
var batchSize = 100;
for (var i = 0; i < Math.Ceiling(yourCollection.Count() / (decimal)batchSize); i++)
{
var batch = yourCollection
.Skip(i*batchSize)
.Take(batchSize);
// Do something with batch
}
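For what it's worth, on .NET 6 or later the framework has this built in as Enumerable.Chunk, which splits a sequence into arrays of at most the given size. A minimal sketch, assuming .NET 6+ is available; ChunkDemo and Batches are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Splits the source into arrays of at most `size` items (requires .NET 6 or later).
    public static int[][] Batches(IEnumerable<int> source, int size)
    {
        return source.Chunk(size).ToArray();
    }

    public static void Main()
    {
        foreach (var batch in Batches(Enumerable.Range(1, 7), 3))
        {
            Console.WriteLine(string.Join(",", batch));
        }
        // prints:
        // 1,2,3
        // 4,5,6
        // 7
    }
}
```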
