Usage of ConcurrentBag<T> vs List<T> - C#

I am currently working with parallel tasks in C#, and I am unsure whether to use ConcurrentBag<T> or List<T>.
Here is my code:
public async Task<ConcurrentDictionary<string, ConcurrentBag<T>>> MethodA(SearchResults<Result> response)
{
var a = new ConcurrentDictionary<string, ConcurrentBag<DeviceAlertAlarm>>();
var tasks = response.GetResults().Select(async (result) =>
{
var b = new List<T>();
// do something
a["xxx"] = b;
});
await Task.WhenAll(tasks);
return a;
}
My question is about the line var b = new List<T>();
Is ConcurrentBag<T> mandatory in multithreaded code, or can I use List<T>? Which is the better choice with respect to performance?
Which one is better in the part of the code above: ConcurrentBag<T> or List<T>?

Because your inner list is never used concurrently, you do not need a ConcurrentBag<T> here.
I have commented your example a bit. Judging by what your code appears to do, I would use ICollection<T> or IEnumerable<T> as the value type, i.e. ConcurrentDictionary<string, IEnumerable<T>>.
// IEnumerable, List, Collection ... is enough here.
public async Task<ConcurrentDictionary<string, IEnumerable<T>>> MethodA(SearchResults<Result> response)
{
var a = new ConcurrentDictionary<string, IEnumerable<DeviceAlertAlarm>>();
var tasks = response.GetResults().Select(async (result) =>
{
//
// This list just exists and is accessed in this 'task/thread/delegate'
var b = new List<T>();
//
// do something ...
//
// The surrounding ConcurrentDictionary is the only object that is
// used *concurrently*, and it is thread-safe, so this assignment is safe
a["xxx"] = b;
});
await Task.WhenAll(tasks);
return a;
}
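To make the point above concrete, here is a minimal, self-contained sketch. The class name PerTaskListDemo, the method BuildAsync, and the integer payload are all hypothetical stand-ins for the code in the question. Each task fills its own private List<int>, and only the ConcurrentDictionary is shared between tasks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PerTaskListDemo
{
    // Hypothetical helper: builds one list per key, each inside its own task.
    public static async Task<ConcurrentDictionary<string, IReadOnlyList<int>>> BuildAsync(
        IEnumerable<string> keys)
    {
        var shared = new ConcurrentDictionary<string, IReadOnlyList<int>>();

        var tasks = keys.Select(key => Task.Run(() =>
        {
            // Only this task ever touches 'local', so a plain List<int> is fine.
            var local = new List<int>();
            for (var i = 0; i < 1000; i++)
            {
                local.Add(i);
            }

            // The dictionary is the only shared object; its indexer is thread-safe.
            shared[key] = local;
        }));

        await Task.WhenAll(tasks);
        return shared;
    }

    public static async Task Main()
    {
        var result = await BuildAsync(new[] { "a", "b", "c" });
        foreach (var pair in result.OrderBy(p => p.Key))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value.Count} items");
        }
    }
}
```

This only holds while each list is written by a single task and published once. If several tasks had to append to the same list under the same key, a List<T> would no longer be safe and a concurrent collection or a lock would be needed. Also note that if two tasks assign the same key, the last write wins.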

Related

I don't understand why my sort isn't working

My algorithm:
1) I start with an empty completed-task list and look for a task that has no dependencies (here, task_e).
When I find it, I add its name to the completed list (I only need the name, because I compare dependency names with task names) and move the task to the first position.
2) The first completed task, task_e, can now satisfy the dependencies of other tasks.
Next I must find every task whose dependencies are all completed and process it.
When that is done, I add it to the completed list and carry on with the remaining tasks.
That is my algorithm.
public static void Sort(Task[] tasks)
{
List<String> completedTaskName = new List<String>();
Task temp;
for (int i = 0; i < tasks.Length; i++)
{
for (int j = i; j < tasks.Length; j++)
{
if (!tasks[j].Dependencies.Except(completedTaskName).Any())
{
temp = tasks[i];
tasks[i] = tasks[j];
tasks[j] = temp;
completedTaskName.Add(tasks[i].Name);
}
}
}
}
but it does not produce the correct result when I sort:
new Task("a", "b", "c"),
new Task("b"),
new Task("c", "b"),
Do you necessarily need to sort the array in place? If you can instead make a copy and append the tasks to a list in the order they would complete, it becomes much easier.
Using the same LINQ methods as in your example, you can simply move the tasks from one list to another in the order they can be completed based on their dependencies, like the following.
private static List<Task> Sort(Task[] tasks)
{
var completedTasks = new List<Task>();
var uncompletedTasks = tasks.ToList();
while (uncompletedTasks.Any())
{
var taskToComplete = uncompletedTasks
.FirstOrDefault(task => !task.Dependencies.Except(completedTasks.Select(x => x.Name)).Any());
if (taskToComplete == null)
{
// Cross dependency between tasks
Console.WriteLine($"Cross dependency between the tasks: {string.Join(", ", uncompletedTasks.Select(task => task.Name))}");
break;
}
completedTasks.Add(taskToComplete);
uncompletedTasks.Remove(taskToComplete);
}
return completedTasks;
}
Then instead of Sort(tasks) simply do var sortedTasks = Sort(tasks)
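The same idea can be tried as a self-contained sketch. The question does not show its Task class, so the WorkItem class below is a hypothetical stand-in (also chosen to avoid clashing with System.Threading.Tasks.Task), and throwing on a cycle instead of printing is my own choice, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Task class.
public sealed class WorkItem
{
    public WorkItem(string name, params string[] dependencies)
    {
        Name = name;
        Dependencies = dependencies;
    }

    public string Name { get; }
    public IReadOnlyList<string> Dependencies { get; }
}

public static class DependencyOrderDemo
{
    // Repeatedly picks the first pending item whose dependencies are all done.
    public static List<WorkItem> Order(IEnumerable<WorkItem> items)
    {
        var pending = items.ToList();
        var done = new List<WorkItem>();
        var doneNames = new HashSet<string>();

        while (pending.Count > 0)
        {
            var next = pending.FirstOrDefault(
                item => item.Dependencies.All(doneNames.Contains));
            if (next == null)
            {
                // Nothing can run: a cycle, or a dependency that is not in the input.
                throw new InvalidOperationException(
                    "Circular or missing dependency among: " +
                    string.Join(", ", pending.Select(p => p.Name)));
            }

            pending.Remove(next);
            done.Add(next);
            doneNames.Add(next.Name);
        }

        return done;
    }

    public static void Main()
    {
        var ordered = Order(new[]
        {
            new WorkItem("a", "b", "c"),
            new WorkItem("b"),
            new WorkItem("c", "b"),
        });
        Console.WriteLine(string.Join(", ", ordered.Select(i => i.Name))); // prints "b, c, a"
    }
}
```

The loop rescans the pending list on every pass, so it is quadratic in the number of items. That is fine for small inputs like the one in the question.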
Comments to the OP have already pointed out some problems in your code:
you are trying to create a sorted List, but you discard it
you are using Task as your class name, which is not a good idea, given that a Task class already exists in the .NET Framework
Anyway, you should not try to implement your own sorting algorithm; the .NET Framework already implements (a few) good algorithms for you. Don't reinvent the wheel.
You just need to specify how two objects (two Tasks in your case) should be compared.
You can create your own TaskComparer implementing IComparer<Task>, and use the comparer in (e.g.) LINQ OrderBy or List<Task>.Sort.
Note that such a comparer only looks at direct dependencies: two tasks that are linked only through a third task compare as equal, so for arbitrary dependency graphs the result is not guaranteed to respect every dependency.
Something like this should work:
public class TaskComparer: IComparer<Task>
{
public virtual int Compare(Task t1, Task t2)
{
// t1 depends on t2, so t1 is considered "bigger" and sorts after t2
if (t1.Dependencies.Contains(t2.Name))
return 1;
// t2 depends on t1, so t1 is considered "smaller" and sorts before t2
if (t2.Dependencies.Contains(t1.Name))
return -1;
return 0;
}
}
public static void Main()
{
// The following array is an example of specific tasks and dependencies between them.
// For example the following constructor:
// new Task("task_a", "task_c")
// means that task_a may be started only after task_c is complete
var tasks = new[]
{
new Task("task_a", "task_c"),
new Task("task_b", "task_c"),
new Task("task_c", "task_e"),
new Task("task_d", "task_a", "task_e"),
new Task("task_e"),
};
var sortedList = tasks.OrderBy(t => t, new TaskComparer()).ToList();
foreach (Task t in sortedList)
Console.WriteLine(t.Name);
Console.WriteLine();
// another set of data
tasks = new Task[]
{
new Task("task_a", "task_b", "task_c"),
new Task("task_b"),
new Task("task_c", "task_b"),
};
sortedList = tasks.OrderBy(t => t, new TaskComparer()).ToList();
foreach (Task t in sortedList)
Console.WriteLine(t.Name);
}
Output:
task_e
task_c
task_a
task_b
task_d
task_b
task_c
task_a

C# Multithreading String Array

I feel super confused... I am trying to implement an asynchronous C# call to a Web API to translate a list of values; the result I expect is another list with a one-to-one correspondence. We don't care about order; we are only interested in speed, and to our knowledge the servers are capable of handling the load.
private object ReadFileToEnd(string filePath)
{
//file read logic and validations...
string[] rowData = new string[4]; //array with initial value
rowData = translateData(rowData);
}
private async Task<List<string>> translateData(string[] Collection)
{
//The resulting string collection.
List<string> resultCollection = new List<string>();
Dictionary dict = new Dictionary();
foreach (string value in Collection)
{
Person person = await Task.Run(() => dict.getNewValue(param1, param2, value.Substring(0, 10)));
value.Remove(0, 10);
resultCollection.Add(person.Property1 + value);
}
return resultCollection;
}
I might have other problems, like the return type; I am just not getting it to work. My main focus is the multithreading and returning a string array. The main thread comes from ReadFileToEnd(...). I already noticed that if I add await there, I also have to add async to that function, and I am trying not to change too much.
Use Parallel.ForEach to iterate, and remove the await call inside each loop iteration.
private IEnumerable<string> translateData(string[] Collection)
{
//The resulting string collection.
var resultCollection = new ConcurrentBag<string>();
Dictionary dict = new Dictionary();
Parallel.ForEach(Collection,
value =>
{
var person = dict.getNewValue(param1, param2, value.Substring(0, 10));
// Strings are immutable: Remove returns a new string, so keep the result.
var remainder = value.Remove(0, 10);
resultCollection.Add(person.Property1 + remainder);
});
return resultCollection;
}
Your attempt at parallelism is not correct. Awaiting each Task.Run call inside the loop pauses the iteration until that call returns, so the requests still run one after another and nothing is gained.
Hope this helps!
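If changing the call chain to async is acceptable, another option is to start all the calls first and await them together with Task.WhenAll. The sketch below is only an illustration under assumptions: LookupAsync is a hypothetical stand-in for the Web API call (it just waits and upper-cases the key), and every row is assumed to be at least 10 characters long.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class TranslateDemo
{
    // Hypothetical stand-in for the real Web API call.
    private static async Task<string> LookupAsync(string key)
    {
        await Task.Delay(50); // simulates network latency
        return key.ToUpperInvariant();
    }

    public static async Task<string[]> TranslateAsync(string[] rows)
    {
        // Select starts one lookup per row; nothing is awaited inside a loop.
        var tasks = rows.Select(async row =>
        {
            var key = row.Substring(0, 10);
            var rest = row.Substring(10);
            var translated = await LookupAsync(key);
            return translated + rest;
        });

        // WhenAll waits for all lookups; results come back in input order.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var rows = new[] { "aaaaaaaaaa-one", "bbbbbbbbbb-two" };
        foreach (var line in await TranslateAsync(rows))
        {
            Console.WriteLine(line);
        }
    }
}
```

For I/O-bound work such as HTTP requests this avoids tying up a thread per request, at the cost of making the callers async as well.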

Thread-safe changes to a ConcurrentDictionary

I am populating a ConcurrentDictionary in a Parallel.ForEach loop:
var result = new ConcurrentDictionary<int, ItemCollection>();
Parallel.ForEach(allRoutes, route =>
{
// Some heavy operations
lock(result)
{
if (!result.ContainsKey(someKey))
{
result[someKey] = new ItemCollection();
}
result[someKey].Add(newItem);
}
});
How do I perform the last steps in a thread-safe manner without using the lock statement?
EDIT: Assume that ItemCollection is thread-safe.
I think you want GetOrAdd, which is explicitly designed to either fetch an existing item, or add a new one if there's no entry for the given key.
var collection = result.GetOrAdd(someKey, _ => new ItemCollection());
collection.Add(newItem);
As noted in the question comments, this assumes that ItemCollection is thread-safe.
You need to use the GetOrAdd method.
var result = new ConcurrentDictionary<int, ItemCollection>();
int someKey = ...;
var newItem = ...;
ItemCollection collection = result.GetOrAdd(someKey, _ => new ItemCollection());
collection.Add(newItem);
Assuming ItemCollection.Add is not thread-safe, you will need a lock, but you can reduce the size of the critical region.
var collection = result.GetOrAdd(someKey, k => new ItemCollection());
lock(collection)
collection.Add(...);
Update: Since it seems to be thread-safe, you don't need the lock at all
var collection = result.GetOrAdd(someKey, k => new ItemCollection());
collection.Add(...);
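The GetOrAdd pattern from the answers above can be tried in a small self-contained sketch. ItemCollection is not shown in the question, so ConcurrentBag<int> is used here as a stand-in for a thread-safe collection, and the grouping by last digit is just made-up example work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class GetOrAddDemo
{
    // Groups the numbers 0..count-1 by their last digit, in parallel and without a lock.
    public static ConcurrentDictionary<int, ConcurrentBag<int>> GroupByLastDigit(int count)
    {
        var result = new ConcurrentDictionary<int, ConcurrentBag<int>>();

        Parallel.ForEach(Enumerable.Range(0, count), n =>
        {
            // GetOrAdd always returns the instance that is stored in the dictionary.
            var bucket = result.GetOrAdd(n % 10, _ => new ConcurrentBag<int>());
            bucket.Add(n); // safe because ConcurrentBag<T> is thread-safe
        });

        return result;
    }

    public static void Main()
    {
        var groups = GroupByLastDigit(1000);
        Console.WriteLine(groups.Count);                     // prints 10
        Console.WriteLine(groups.Values.Sum(b => b.Count));  // prints 1000
    }
}
```

One detail worth knowing: under contention the value factory passed to GetOrAdd can run more than once for the same key, but only one of the created values is stored and returned. That is harmless for a cheap, side-effect-free factory like the one here.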

How to multithread a process method

I am trying to use multithreading to process a list of results faster. I tried using Parallel.ForEach, but when the process method runs I do not receive the correct results.
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
var listOfFooLists = CreateListOfFooLists(Foos);
var bulkProcessorResults = new List<BulkProcessorResult>();
Parallel.ForEach(listOfFooLists, FooList =>
{
foreach (var Foo in FooList)
{
var processClaimResult = _processor.Process(Foo);
var bulkProcessorResult = new BulkProcessorResult()
{
ClaimStatusId = (int) processClaimResult.ClaimStatusEnum,
Property1 = Foo.Property1
};
bulkProcessorResults.Add(bulkProcessorResult);
}
});
return bulkProcessorResults;
}
If I use a normal foreach I get the correct output. If I use the above code, every result has a status of 2, when there should be three with a status of 1 and one with a status of 3.
I am really new to threading so any help would be great.
The most obvious issue is that you're working with multiple threads (okay, this is somewhat hidden by calling Parallel.ForEach, but you should be aware that it achieves parallelism by using multiple threads/tasks) but you're using a List<T>, which isn't a thread-safe collection class:
A List<T> can support multiple readers concurrently, as long as the collection is not modified. Enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with one or more write accesses, the only way to ensure thread safety is to lock the collection during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization
Rather than implementing your own synchronization, though, and whilst not altering much else in your code, I would switch to using a ConcurrentQueue<T>:
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
var listOfFooLists = CreateListOfFooLists(Foos);
var bulkProcessorResults = new ConcurrentQueue<BulkProcessorResult>();
Parallel.ForEach(listOfFooLists, FooList =>
{
foreach (var Foo in FooList)
{
var processClaimResult = _processor.Process(Foo);
var bulkProcessorResult = new BulkProcessorResult()
{
ClaimStatusId = (int) processClaimResult.ClaimStatusEnum,
Property1 = Foo.Property1
};
bulkProcessorResults.Enqueue(bulkProcessorResult);
}
});
return bulkProcessorResults;
}
How about treating the entire thing as a Parallel LINQ (PLINQ) query?
private IEnumerable<BulkProcessorResult> GetProccessResults(List<Foo> Foos)
{
var listOfFooLists = CreateListOfFooLists(Foos);
return listOfFooLists.AsParallel()
.SelectMany(FooList => FooList)
.Select(Foo =>
new BulkProcessorResult {
ClaimStatusId = (int)_processor.Process(Foo).ClaimStatusEnum,
Property1 = Foo.Property1
}).ToList();
}
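As a runnable illustration of the PLINQ shape above, here is a minimal sketch with made-up data. Squaring integers stands in for _processor.Process, which is not shown in the question, and the names PlinqDemo and SquareAll are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlinqDemo
{
    // Flattens the batches and transforms every element in parallel.
    public static List<int> SquareAll(IEnumerable<List<int>> batches)
    {
        return batches.AsParallel()
                      .SelectMany(batch => batch)
                      .Select(n => n * n) // stand-in for the real per-item work
                      .ToList();          // PLINQ collects the results safely
    }

    public static void Main()
    {
        var batches = new[]
        {
            new List<int> { 1, 2, 3 },
            new List<int> { 4, 5 },
        };

        var squares = SquareAll(batches);
        Console.WriteLine(squares.Count); // prints 5
        Console.WriteLine(squares.Sum()); // prints 55
    }
}
```

The order of the results is not guaranteed unless AsOrdered() is added after AsParallel(). Also, neither PLINQ nor a concurrent collection helps if the per-item work itself is not thread-safe; if _processor.Process keeps shared state, that might be the cause of the wrong statuses and would need its own fix.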

Rx Amb extension

I'm working with the Reactive framework for Silverlight and would like to achieve the following.
I am trying to create a typical data provider for a Silverlight client that also takes advantage of the caching framework available in MS Ent Lib. The scenario requires that I check the cache for the key-value pair before hitting the WCF data client.
By using the Rx extension Amb, I am able to pull the data from the cache or the WCF data client, whichever returns first, but how can I stop the WCF client from executing the call if the value is in the cache?
I would also like to consider race conditions, e.g. if the first subscriber requests some data and the provider is fetching it from the WCF data client (async), how do I prevent subsequent async requests from doing the same thing (at this stage, the cache has yet to be populated)?
I had exactly the same problem. I solved it with an extension method with the following signature:
IObservable<R> FromCacheOrFetch<T, R>(
this IObservable<T> source,
Func<T, R> cache,
Func<IObservable<T>, IObservable<R>> fetch,
IScheduler scheduler) where R : class
Effectively what this did was take in the source observable and return an observable that would match each input value with its output value.
To get each output value it would check the cache first. If the value exists in the cache it used that. If not it would spin up the fetch function only on values that weren't in the cache. If all of the values were in the cache then the fetch function would never be spun up - so no service connection set up penalty, etc.
I'll give you the code, but it's based on a slightly different version of the extension method that uses a Maybe<T> monad - so you might find you need to fiddle with the implementation.
Here it is:
public static IObservable<R> FromCacheOrFetch<T, R>(this IObservable<T> source, Func<T, R> cache, Func<IObservable<T>, IObservable<R>> fetch, IScheduler scheduler)
where R : class
{
return source.FromCacheOrFetch<T, R>(t => cache(t).ToMaybe(null), fetch, scheduler);
}
public static IObservable<R> FromCacheOrFetch<T, R>(this IObservable<T> source, Func<T, Maybe<R>> cache, Func<IObservable<T>, IObservable<R>> fetch, IScheduler scheduler)
{
var results = new Subject<R>();
var disposables = new CompositeDisposable();
var loop = new EventLoopScheduler();
disposables.Add(loop);
var sourceDone = false;
var pairsDone = true;
var exception = (Exception)null;
var fetchIn = new Subject<T>();
var fetchOut = (IObservable<R>)null;
var pairs = (IObservable<KeyValuePair<int, R>>)null;
var lookup = new Dictionary<T, int>();
var list = new List<Maybe<R>>();
var cursor = 0;
Action checkCleanup = () =>
{
if (sourceDone && pairsDone)
{
if (exception == null)
{
results.OnCompleted();
}
else
{
results.OnError(exception);
}
loop.Schedule(() => disposables.Dispose());
}
};
Action dequeue = () =>
{
while (cursor != list.Count)
{
var mr = list[cursor];
if (mr.HasValue)
{
results.OnNext(mr.Value);
cursor++;
}
else
{
break;
}
}
};
Action<KeyValuePair<int, R>> nextPairs = kvp =>
{
list[kvp.Key] = Maybe<R>.Something(kvp.Value);
dequeue();
};
Action<Exception> errorPairs = ex =>
{
fetchIn.OnCompleted();
pairsDone = true;
exception = ex;
checkCleanup();
};
Action completedPairs = () =>
{
pairsDone = true;
checkCleanup();
};
Action<T> sourceNext = t =>
{
var mr = cache(t);
list.Add(mr);
if (mr.IsNothing)
{
lookup[t] = list.Count - 1;
if (fetchOut == null)
{
pairsDone = false;
fetchOut = fetch(fetchIn.ObserveOn(Scheduler.ThreadPool));
pairs = fetchIn.Select(x => lookup[x]).Zip(fetchOut, (i, r2) => new KeyValuePair<int, R>(i, r2));
disposables.Add(pairs.ObserveOn(loop).Subscribe(nextPairs, errorPairs, completedPairs));
}
fetchIn.OnNext(t);
}
else
{
dequeue();
}
};
Action<Exception> errorSource = ex =>
{
sourceDone = true;
exception = ex;
fetchIn.OnCompleted();
checkCleanup();
};
Action completedSource = () =>
{
sourceDone = true;
fetchIn.OnCompleted();
checkCleanup();
};
disposables.Add(source.ObserveOn(loop).Subscribe(sourceNext, errorSource, completedSource));
return results.ObserveOn(scheduler);
}
Example usage would look like this:
You would have a source of the indices that you want to fetch:
IObservable<X> source = ...
You would have a function that can get values from the cache and an action that can put them in (and both should be thread-safe):
Func<X, Y> getFromCache = x => ...;
Action<X, Y> addToCache = (x, y) => ...;
Then you would have the actual call to go get the data from your database or service:
Func<X, Y> getFromService = x => ...;
Then you could define fetch like so:
Func<IObservable<X>, IObservable<Y>> fetch =
xs => xs.Select(x =>
{
var y = getFromService(x);
addToCache(x, y);
return y;
});
And finally you can make your query by calling the following:
IObservable<Y> results =
source.FromCacheOrFetch(
getFromCache,
fetch,
Scheduler.ThreadPool);
Of course you would need to subscribe to the result to make the computation take place.
Clearly Amb is not the right way to go, since that will hit both the cache and the service every time. What does EntLib return if the cache lookup is a miss?
Note that Observable.Timeout is a reasonable alternative:
cache(<parameters>).Timeout(TimeSpan.FromSeconds(1), service(<parameters>));
But clearly it's not a great idea to rely on a timeout if what you want is to process the return value from EntLib and act on it appropriately.
I'm not seeing why this is necessarily a Reactive Extensions problem.
A simple approach, which is probably less fully featured than @Enigmativity's solution, could be something along the lines of:
public IObservable<TResult> GetCachedValue<TKey, TResult>(TKey key, Func<TKey, IObservable<TResult>> getFromCache, Func<TKey, IObservable<TResult>> getFromSource)
{
// Take(1) ends the sequence after the first value, so the source is only subscribed to on a cache miss.
return getFromCache(key).Concat(getFromSource(key)).Take(1);
}
This is just a loosely formed idea, you'd need to add:
A mechanism to add the item to the cache, or assume getFromSource caches the result
Some kind of thread safety to prevent multiple hits on the source for the same uncached key (if required)
getFromCache would need to return Observable.Empty<TResult>() if the item wasn't in the cache.
But if you want something simple, it's not a bad place to start.
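For the thread-safety item in the list above (preventing multiple hits on the source for the same uncached key), one common technique outside Rx is to store Lazy<T> values in a ConcurrentDictionary. The sketch below is not taken from any of the answers and does not use Rx at all; the class name OnceCache and its members are hypothetical, and it uses a synchronous fetch function to stay self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical cache that calls the fetch function at most once per key.
public sealed class OnceCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> _entries =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();
    private readonly Func<TKey, TValue> _fetch;

    public OnceCache(Func<TKey, TValue> fetch)
    {
        _fetch = fetch;
    }

    public TValue Get(TKey key)
    {
        // Several Lazy instances may be created under contention, but only one
        // is stored, and only the stored one ever has its Value read.
        var entry = _entries.GetOrAdd(
            key,
            k => new Lazy<TValue>(() => _fetch(k), LazyThreadSafetyMode.ExecutionAndPublication));
        return entry.Value;
    }
}

public static class OnceCacheDemo
{
    // Requests the same key from many threads and reports how often fetch ran.
    public static int CountFetches(int callers)
    {
        var fetches = 0;
        var cache = new OnceCache<string, int>(key =>
        {
            Interlocked.Increment(ref fetches);
            Thread.Sleep(20); // simulates a slow service call
            return key.Length;
        });

        Parallel.For(0, callers, _ => cache.Get("customer"));
        return fetches;
    }

    public static void Main()
    {
        Console.WriteLine(CountFetches(100)); // prints 1
    }
}
```

Callers that arrive while the first fetch is still running block until it finishes and then share its result. With this thread-safety mode a failed fetch is also cached (the same exception is rethrown on later reads), so a real implementation would probably want to evict failed entries.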
