C# Multithreading String Array - c#

I feel super confused... I am trying to implement an asynchronous C# call to a Web API to translate a list of values; the result I expect is another list, in a one-to-one fashion. We don't care about order; we are just interested in speed, and to our knowledge the servers are capable of handling the load.
private object ReadFileToEnd(string filePath)
{
//file read logic and validations...
string[] rowData = new string[4]; //array with initial value
rowData = translateData(rowData);
}
private async Task<List<string>> translateData(string[] Collection)
{
//The resulting string collection.
List<string> resultCollection = new List<string>();
Dictionary dict = new Dictionary();
foreach (string value in Collection)
{
Person person = await Task.Run(() => dict.getNewValue(param1, param2, value.Substring(0, 10)));
value.Remove(0, 10);
resultCollection.Add(person.Property1 + value);
}
return resultCollection;
}
I might have other problems, like the return type; I just can't get it to work. My main focus is the multithreading and returning a string array. The main thread comes from ReadFileToEnd(...). I already noticed that if I add await there, I also have to add async to that function, and I am trying not to change too much.

Use a Parallel ForEach to iterate and remove the await call inside each loop iteration.
private IEnumerable<string> translateData(string[] Collection)
{
//The resulting string collection.
var resultCollection = new ConcurrentBag<string>();
Dictionary dict = new Dictionary();
Parallel.ForEach(Collection,
value =>
{
var person = dict.getNewValue(param1, param2, value.Substring(0, 10));
// Strings are immutable: Remove returns a new string, so use its result.
resultCollection.Add(person.Property1 + value.Remove(0, 10));
});
return resultCollection;
}
Your attempt at parallelism is not correct. You gain nothing if, every time you send a request to the translator, you stop the current iteration and wait for the result without continuing the loop.
Hope this helps!
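If the translation call can be made awaitable, an alternative to Parallel.ForEach is to start all requests first and await them together with Task.WhenAll. The following is only a sketch under that assumption: `TranslateAsync` is a hypothetical stand-in for the Web API call (it is not in the original code), and every input value is assumed to be at least 10 characters long, as in the question.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class TranslateSketch
{
    // Hypothetical stand-in for the Web API call; not part of the original code.
    static async Task<string> TranslateAsync(string key)
    {
        await Task.Delay(10); // simulates network latency
        return key.ToUpperInvariant();
    }

    public static async Task<string[]> TranslateDataAsync(string[] collection)
    {
        // Start every request without awaiting inside a loop,
        // so the calls can overlap instead of running one after another.
        var tasks = collection.Select(async value =>
        {
            string translated = await TranslateAsync(value.Substring(0, 10));
            return translated + value.Remove(0, 10);
        });

        // WhenAll returns the results in the same order as the tasks.
        return await Task.WhenAll(tasks);
    }
}
```

Note that this requires the caller to be async as well (or to block on the returned task), which the question was trying to avoid.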

Related

How is this parallel for not processing all elements?

I've created this normal for loop:
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
Dictionary<string, Dictionary<string, bool>> filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
foreach (var item in files)
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
}
return filesAnalyzed;
}
The loop just checks whether each file in the variable "files" has all the dependencies specified in the variable "dependencies".
The "files" variable should contain only unique elements because it is used as the key for the result, a dictionary, but I check this before calling the method.
The for loop works correctly and all elements are processed on a single thread, so I wanted to improve performance by switching to a parallel for loop. The problem is that not all the elements from the "files" variable are processed in the parallel for (in my test case I get 30 elements instead of 53).
I've tried increasing the timespan, and removing all the "Monitor.TryEnter" code in favor of a plain lock(filesAnalyzed), but I still got the same result.
I'm not very familiar with the parallel for, so it might be something in the syntax I'm using.
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
Parallel.For<KeyValuePair<string, Dictionary<string, bool>>>(
//start index
0,
//end index
files.Count(),
// initialization?
()=>new KeyValuePair<string, Dictionary<string, bool>>(),
(index, loop, result) =>
{
var temp = new KeyValuePair<string, Dictionary<string, bool>>(
files.ElementAt(index),
AnalyzeFile(files.ElementAt(index), dependencies));
return temp;
}
,
//finally
(x) =>
{
if (Monitor.TryEnter(filesAnalyzed, new TimeSpan(0, 0, 30)))
{
try
{
filesAnalyzed.Add(x.Key, x.Value);
}
finally
{
Monitor.Exit(filesAnalyzed);
}
}
}
);
return filesAnalyzed;
}
any feedback is appreciated
Assuming the code inside AnalyzeFile and dependencies is thread safe, how about something like this:
var filesAnalyzed = files
.AsParallel()
.Select(x => new{Item = x, File = AnalyzeFile(x, dependencies)})
.ToDictionary(x => x.Item, x=> x.File);
Rewrite your normal loop this way:
Parallel.ForEach(files, item =>
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
You should also use ConcurrentDictionary instead of Dictionary to make the whole process thread-safe.
You can simplify your code a lot if you use Parallel LINQ instead :
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = ( from item in files.AsParallel()
let result=AnalyzeFile(item, dependencies)
select (Item:item,Result:result)
).ToDictionary( it=>it.Item,it=>it.Result);
return filesAnalyzed;
}
I used tuple syntax in this case to avoid noise. It also cuts down on allocations.
Using method syntax, the same can be written as :
var filesAnalyzed = files.AsParallel()
.Select(item => (Item: item, Result: AnalyzeFile(item, dependencies)))
.ToDictionary(it => it.Item, it => it.Result);
Dictionary<> isn't thread-safe for modification. If you wanted to use Parallel.ForEach without locking, you'd have to use ConcurrentDictionary
var filesAnalyzed = new ConcurrentDictionary<string,Dictionary<string,bool>>();
Parallel.ForEach(files, item => {
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
In this case at least, there is no benefit in using Parallel over PLINQ.
It is hard to say exactly what is going wrong without debugging the code. Just looking at it, though, I would have used a ConcurrentDictionary for the filesAnalyzed variable instead of a normal Dictionary, and gotten rid of the Monitor.
I would also check whether the same key already exists in the filesAnalyzed dictionary; it could be that you are trying to add a key-value pair with a key that has already been added.
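A likely cause of the missing elements, going by the documented behavior of Parallel.For<TLocal>: the final delegate (localFinally) runs once per task, not once per element, and it receives only the last local value that task produced. If the loop body returns just the current result, every earlier result of that task is lost. The sketch below keeps a per-task list and merges it at the end. `Analyze` is a hypothetical stand-in for AnalyzeFile, and the value type is simplified to int for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class LocalStateSketch
{
    // Hypothetical stand-in for AnalyzeFile; the real method is not shown in the question.
    static int Analyze(string file) => file.Length;

    public static Dictionary<string, int> AnalyzeAll(IList<string> files)
    {
        var merged = new Dictionary<string, int>();
        Parallel.For(
            0, files.Count,
            // One private list per task, so the loop body needs no locking.
            () => new List<KeyValuePair<string, int>>(),
            (index, loopState, local) =>
            {
                local.Add(new KeyValuePair<string, int>(files[index], Analyze(files[index])));
                return local; // handed to this task's next iteration
            },
            // Runs once per task: merge everything that task collected.
            local =>
            {
                lock (merged)
                {
                    foreach (var pair in local)
                        merged.Add(pair.Key, pair.Value);
                }
            });
        return merged;
    }
}
```

The PLINQ and ConcurrentDictionary versions above are shorter; this variant mainly shows how the local-state overload is meant to be used.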

Is it the same to iterate over Linq expression result than to assign it first to a variable?

This is difficult to explain in words, so I will use code examples.
Let's suppose I already have a list of clients that I want to filter.
Basically, I want to know if this:
foreach(var client in list.Where(c=>c.Age > 20))
{
//Do something
}
is the same as this:
var filteredClients = list.Where(c=>c.Age > 20);
foreach(var client in filteredClients)
{
//Do something
}
I've been told that the first approach executes the .Where() in every iteration.
I'm sorry if this is a duplicate; I couldn't find any related question.
Thanks in advance.
Yes, both those examples are functionally identical. One just stores the result from Enumerable.Where in a variable before accessing it while the other just accesses it directly.
To really see why this will not make a difference, you have to understand what a foreach loop essentially does. The code in your examples (both of them) is basically equivalent to this (I’ve assumed a known type Client here):
IEnumerable<Client> x = list.Where(c=>c.Age > 20);
// foreach loop
IEnumerator<Client> enumerator = x.GetEnumerator();
while (enumerator.MoveNext())
{
Client client = enumerator.Current;
// Do something
}
So what actually happens here is the IEnumerable result from the LINQ method is not consumed directly, but an enumerator of it is requested first. And then the foreach loop does nothing else than repeatedly asking for a new object from the enumerator and processing the current element in each loop body.
Looking at this, it doesn't matter whether the x in the above code is really an x (i.e. a previously stored variable) or the list.Where() call itself. Only the enumerator object, which is created just once, is used in the loop.
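One way to check this yourself is to count how often the Where expression and its predicate actually run. This is a small sketch; `Adults`, `QueryCalls` and `PredicateCalls` are names made up for the illustration, and plain ints stand in for the Client type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WhereOnceSketch
{
    public static int QueryCalls;      // how often the Where(...) expression was evaluated
    public static int PredicateCalls;  // how often the filter ran

    // Hypothetical helper that wraps the Where call so evaluations can be counted.
    static IEnumerable<int> Adults(List<int> ages)
    {
        QueryCalls++;
        return ages.Where(age => { PredicateCalls++; return age > 20; });
    }

    public static int Run()
    {
        QueryCalls = 0;
        PredicateCalls = 0;
        int seen = 0;
        // The expression after "in" is evaluated once, before the first iteration.
        foreach (int age in Adults(new List<int> { 10, 20, 30, 40 }))
            seen++;
        return seen;
    }
}
```

After `Run()`, `QueryCalls` is 1 and `PredicateCalls` is 4: the expression in the foreach header is evaluated once, and the predicate runs once per source element.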
Now to cover that SharePoint example which Colin posted. It looks like this:
SPList activeList = SPContext.Current.List;
for (int i=0; i < activeList.Items.Count; i++)
{
SPListItem listItem = activeList.Items[i];
// do stuff
}
This is a fundamentally different thing though. Since this is not using a foreach loop, we do not get that one enumerator object which we use to iterate through the list. Instead, we repeatedly access activeList.Items: Once in the loop body to get an item by index, and once in the continuation condition of the for loop where we get the collection’s Count property value.
Unfortunately, Microsoft does not follow its own guidelines all the time, so even if Items is a property on the SPList object, it actually is creating a new SPListItemCollection object every time. And that object is empty by default and will only lazily load the actual items when you first access an item from it. So above code will eventually create a large amount of SPListItemCollections which will each fetch the items from the database. This behavior is also mentioned in the remarks section of the property documentation.
This generally violates Microsoft’s own guidelines on choosing a property vs a method:
Do use a method, rather than a property, in the following situations.
The operation returns a different result each time it is called, even if the parameters do not change.
Note that if we used a foreach loop for that SharePoint example again, then everything would have been fine, since we would have again only requested a single SPListItemCollection and created a single enumerator for it:
foreach (SPListItem listItem in activeList.Items.Cast<SPListItem>())
{ … }
They are not quite the same:
Here is the original C# code:
static void ForWithVariable(IEnumerable<Person> clients)
{
var adults = clients.Where(x => x.Age > 20);
foreach (var client in adults)
{
Console.WriteLine(client.Age.ToString());
}
}
static void ForWithoutVariable(IEnumerable<Person> clients)
{
foreach (var client in clients.Where(x => x.Age > 20))
{
Console.WriteLine(client.Age.ToString());
}
}
Here is the C# code that ILSpy produces when decompiling the compiled Intermediate Language (IL):
private static void ForWithVariable(IEnumerable<Person> clients)
{
Func<Person, bool> arg_21_1;
if ((arg_21_1 = Program.<>c.<>9__1_0) == null)
{
arg_21_1 = (Program.<>c.<>9__1_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithVariable>b__1_0));
}
IEnumerable<Person> enumerable = clients.Where(arg_21_1);
foreach (Person current in enumerable)
{
Console.WriteLine(current.Age.ToString());
}
}
private static void ForWithoutVariable(IEnumerable<Person> clients)
{
Func<Person, bool> arg_22_1;
if ((arg_22_1 = Program.<>c.<>9__2_0) == null)
{
arg_22_1 = (Program.<>c.<>9__2_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithoutVariable>b__2_0));
}
foreach (Person current in clients.Where(arg_22_1))
{
Console.WriteLine(current.Age.ToString());
}
}
As you can see, there is a key difference:
IEnumerable<Person> enumerable = clients.Where(arg_21_1);
A more practical question, however, is whether the differences hurt performance. I concocted a test to measure that.
class Program
{
public static void Main()
{
Measure(ForEachWithVariable);
Measure(ForEachWithoutVariable);
Console.ReadKey();
}
static void Measure(Action<List<Person>, List<Person>> action)
{
var clients = new[]
{
new Person { Age = 10 },
new Person { Age = 20 },
new Person { Age = 30 },
}.ToList();
var adultClients = new List<Person>();
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < 1E6; i++)
action(clients, adultClients);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds.ToString());
Console.WriteLine($"{adultClients.Count} adult clients found");
}
static void ForEachWithVariable(List<Person> clients, List<Person> adultClients)
{
var adults = clients.Where(x => x.Age > 20);
foreach (var client in adults)
adultClients.Add(client);
}
static void ForEachWithoutVariable(List<Person> clients, List<Person> adultClients)
{
foreach (var client in clients.Where(x => x.Age > 20))
adultClients.Add(client);
}
}
class Person
{
public int Age { get; set; }
}
After several runs of the program, I was not able to find any significant difference between ForEachWithVariable and ForEachWithoutVariable. They were always close in time, and neither was consistently faster than the other. Interestingly, if I change 1E6 to just 1000, the ForEachWithVariable is actually consistently slower, by about 1 millisecond.
So, I conclude that for LINQ to Objects, there is no practical difference. The same type of test could be run if your particular use case involves LINQ to Entities (or SharePoint).

Using Parallel.ForEach to speed up processing of file but can't return in correct order

So I'm trying to use a Parallel.ForEach loop to speed up my processing of a file, but I can't figure out how to make it build the output in an ordered fashion. This is the code I have so far:
string[] lines = File.ReadAllLines(fileName);
List<string> list_lines = new List<string>(lines);
Parallel.ForEach(list_lines, async line =>
{
processedData += await processSingleLine(line);
});
As you can see, it doesn't preserve order in any way. I have looked for something that fits my case, but haven't found anything I could get anywhere near working.
Preferably, I'd like each line to be processed in parallel while processedData is built up in the same order the lines were sent out. I realize this might be beyond my current skill level, so any advice would be welcome.
EDIT:
After reading the answers below, I tried two methods:
ConcurrentDictionary<int, string> result = new ConcurrentDictionary<int, string>();
Parallel.For(0, lines.Length, i =>
{
// process your data and save to dict
result[i] = processData(lines[i]);
});
and
ConcurrentDictionary<int, string> result = new ConcurrentDictionary<int, string>();
for (var i = 0; i < lines.Length; i++)
{
result[i] = lines[i];
}
Array.Clear(lines,0, lines.Length);
Parallel.ForEach(result, line =>
{
result[line.Key] = encrypt(line.Value, key);
});
Yet both appear to use only about one core (on a 4-core processor), about 30% of the total in Task Manager, whereas before I implemented the ordering it was using close to 80% of the CPU.
You can try using Parallel.For instead of Parallel.ForEach. Then you will have indexes for your lines. I.e.:
string[] lines = File.ReadAllLines(fileName);
// use thread safe collection for catching the results in parallel
ConcurrentDictionary<int, Data> result = new ConcurrentDictionary<int, Data>();
Parallel.For(0, lines.Length, i =>
{
// process your data and save to dict
result[i] = processData(lines[i]);
});
// having data in dict you can easily retrieve initial order
Data[] orderedData = new Data[lines.Length];
for(var i=0; i<lines.Length; i++)
{
orderedData[i] = result[i];
}
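Since every index is written exactly once, a simpler variant of the code above is to write straight into a pre-allocated array and skip the dictionary and the copy step. This is a sketch, not a drop-in replacement: `Process` is a hypothetical stand-in for processData / processSingleLine, and it is assumed to be synchronous and thread-safe.

```csharp
using System;
using System.Threading.Tasks;

static class OrderedSketch
{
    // Hypothetical stand-in for the per-line processing in the question.
    static string Process(string line) => line.ToUpperInvariant();

    public static string[] ProcessInOrder(string[] lines)
    {
        var results = new string[lines.Length];
        Parallel.For(0, lines.Length, i =>
        {
            // Each iteration writes only to its own slot, so no lock
            // or concurrent collection is needed.
            results[i] = Process(lines[i]);
        });
        return results; // results[i] corresponds to lines[i]
    }
}
```

The ordered output can then be assembled in one step, for example with `string.Concat(OrderedSketch.ProcessInOrder(lines))`.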
EDIT: As was said in the comments under your question, you can't use async methods here. Parallel.ForEach takes a synchronous delegate, so an async lambda becomes async void: the loop only starts the operations and can return before they have completed. If you want to parallelize asynchronous code, you can use multiple Task.Run calls, like here:
string[] lines = File.ReadAllLines(fileName);
var tasks = lines.Select(
l => Task.Run<Data>(
async () => {
return await processAsync(l);
})).ToList();
var results = await Task.WhenAll(tasks);
NOTE: Should work, but didn't check it.
I believe AsOrdered() does what you want. Note that it is a PLINQ operator applied after AsParallel(); it is not a member of Parallel.ForEach.
Taking the data structure list_lines and the method processSingleLine from your code, the following should preserve the order and have parallel execution:
var parallelQuery = from line in list_lines.AsParallel().AsOrdered()
select processSingleLine(line);
foreach (var processedLine in parallelQuery)
{
Console.Write(processedLine);
}

Create sequence consisting of multiple property values

I have an existing collection of objects with two properties of interest. Both properties are of the same type. I want to create a new sequence consisting of the property values. Here's one way (I'm using tuples instead of my custom type for simplicity):
var list = new List<Tuple<string, string>>
{ Tuple.Create("dog", "cat"), Tuple.Create("fish", "frog") };
var result =
list.SelectMany(x => new[] {x.Item1, x.Item2});
foreach (string item in result)
{
Console.WriteLine(item);
}
Results in:
dog
cat
fish
frog
This gives me the results I want, but is there a better way to accomplish this (in particular, without the need to create arrays or collections)?
Edit:
This also works, at the cost of iterating over the collection twice:
var result = list.Select(x => x.Item1).Concat(list.Select(x => x.Item2));
If you want to avoid creating another collection, you could yield the results instead.
void Main()
{
var list = new List<Tuple<string, string>>
{ Tuple.Create("dog", "cat"), Tuple.Create("fish", "frog") };
foreach (var element in GetSingleList(list))
{
Console.WriteLine (element);
}
}
// A reusable extension method would be a better approach.
IEnumerable<T> GetSingleList<T>(IEnumerable<Tuple<T,T>> list) {
foreach (var element in list)
{
yield return element.Item1;
yield return element.Item2;
}
}
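The reusable extension method mentioned in the comment above could look roughly like this. The class name `TupleExtensions` and the method name `Flatten` are made up for the sketch.

```csharp
using System;
using System.Collections.Generic;

static class TupleExtensions
{
    // Hypothetical extension method: lazily yields Item1, then Item2, of each tuple.
    // No intermediate array or list is created.
    public static IEnumerable<T> Flatten<T>(this IEnumerable<Tuple<T, T>> source)
    {
        foreach (var pair in source)
        {
            yield return pair.Item1;
            yield return pair.Item2;
        }
    }
}
```

With this in scope, the original query becomes `var result = list.Flatten();`, and the source is still enumerated only once.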
I think your approach is fine and I would stick with that. The use of the array nicely gets the job done when using SelectMany, and the final result is an IEnumerable<string>.
There are some alternate approaches, but I think they're more verbose than your approach.
Aggregate approach:
var result = list.Aggregate(new List<string>(), (seed, t) =>
{
seed.Add(t.Item1);
seed.Add(t.Item2);
return seed;
});
result.ForEach(Console.WriteLine);
ForEach approach:
var result = new List<string>();
list.ForEach(t => { result.Add(t.Item1); result.Add(t.Item2); });
result.ForEach(Console.WriteLine);
In both cases a new List<string> is created.

Is there a way to know I am getting the last element in the foreach loop

I need to do special treatment for the last element in the collection. I am wondering if I can know I hit the last element when using foreach loop.
Only way I know of is to increment a counter and compare with length on exit, or when breaking out of loop set a boolean flag, loopExitedEarly.
There isn't a direct way. You'll have to keep buffering the next element.
IEnumerable<Foo> foos = ...
Foo prevFoo = default(Foo);
bool elementSeen = false;
foreach (Foo foo in foos)
{
if (elementSeen) // If prevFoo is not the last item...
ProcessNormalItem(prevFoo);
elementSeen = true;
prevFoo = foo;
}
if (elementSeen) // Required because foos might be empty.
ProcessLastItem(prevFoo);
Alternatively, you could use the underlying enumerator to do the same thing:
using (var erator = foos.GetEnumerator())
{
if (!erator.MoveNext())
return;
Foo current = erator.Current;
while (erator.MoveNext())
{
ProcessNormalItem(current);
current = erator.Current;
}
ProcessLastItem(current);
}
It's a lot easier when working with collections that reveal how many elements they have (typically the Count property from ICollection or ICollection<T>) - you can maintain a counter (alternatively, if the collection exposes an indexer, you could use a for-loop instead):
int numItemsSeen = 0;
foreach(Foo foo in foos)
{
if(++numItemsSeen == foos.Count)
ProcessLastItem(foo);
else ProcessNormalItem(foo);
}
If you can use MoreLinq, it's easy:
foreach (var entry in foos.AsSmartEnumerable())
{
if(entry.IsLast)
ProcessLastItem(entry.Value);
else ProcessNormalItem(entry.Value);
}
If efficiency isn't a concern, you could do:
Foo[] fooArray = foos.ToArray();
foreach(Foo foo in fooArray.Take(fooArray.Length - 1))
ProcessNormalItem(foo);
ProcessLastItem(fooArray.Last());
Unfortunately not, I would write it with a for loop like:
string[] names = { "John", "Mary", "Stephanie", "David" };
int iLast = names.Length - 1;
for (int i = 0; i <= iLast; i++) {
Debug.Write(names[i]);
Debug.Write(i < iLast ? ", " : Environment.NewLine);
}
And yes, I know about String.Join :).
I see others already posted similar ideas while I was typing mine, but I'll post it anyway:
void Enumerate<T>(IEnumerable<T> items, Action<T, bool> action) {
IEnumerator<T> enumerator = items.GetEnumerator();
if (!enumerator.MoveNext()) return;
bool foundNext;
do {
T item = enumerator.Current;
foundNext = enumerator.MoveNext();
action(item, !foundNext);
}
while (foundNext);
}
...
string[] names = { "John", "Mary", "Stephanie", "David" };
Enumerate(names, (name, isLast) => {
Debug.Write(name);
Debug.Write(!isLast ? ", " : Environment.NewLine);
});
Not without jumping through flaming hoops (see above). But you can just use the enumerator directly (slightly awkward because of C#'s enumerator design):
IEnumerator<string> it = foo.GetEnumerator();
for (bool hasNext = it.MoveNext(); hasNext; ) {
string element = it.Current;
hasNext = it.MoveNext();
if (hasNext) { // normal processing
Console.Out.WriteLine(element);
} else { // special case processing for last element
Console.Out.WriteLine("Last but not least, " + element);
}
}
Notes on the other approaches I see here: Mitch's approach requires having access to a container which exposes its size. J.D.'s approach requires writing a method in advance, then doing your processing via a closure. Ani's approach spreads loop management all over the place. John K's approach involves creating numerous additional objects, or (second method) only allows additional post-processing of the last element, rather than special-case processing.
I don't understand why people don't use the Enumerator directly in a normal loop, as I've shown here. K.I.S.S.
This is cleaner with Java iterators, because their interface uses hasNext rather than MoveNext. You could easily write an extension method for IEnumerable that gave you Java-style iterators, but that's overkill unless you write this kind of loop a lot.
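For comparison, here is a sketch of such an extension method. It is not a full Java-style iterator; it takes the simpler route of pairing each element with an is-last flag by reading one element ahead. `WithIsLast` and `LastItemExtensions` are hypothetical names, and the value-tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

static class LastItemExtensions
{
    // Hypothetical helper: yields each element together with a flag
    // that is true only for the final element of the sequence.
    public static IEnumerable<(T Item, bool IsLast)> WithIsLast<T>(this IEnumerable<T> source)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;    // empty sequence: nothing to yield
            T current = e.Current;
            while (e.MoveNext())
            {
                yield return (current, false); // a successor exists, so not the last
                current = e.Current;
            }
            yield return (current, true);
        }
    }
}
```

Usage stays inside an ordinary foreach: `foreach (var (name, isLast) in names.WithIsLast()) Debug.Write(name + (isLast ? Environment.NewLine : ", "));`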
Does the special treatment have to happen while iterating in the foreach loop? Can't you do it while adding to the collection? If you can, write your own custom collection:
public class ListCollection : List<string>
{
string _lastitem;
public new void Add(string item) // "new" makes hiding List<string>.Add explicit
{
//TODO: Do special treatment on the new Item, new item should be last one.
//Not applicable for filter/sort
base.Add(item);
}
}
List<int> numbers = new ....;
int last = numbers.Last();
Stack<int> stack = new ...;
stack.Peek();
update
var numbers = new int[] { 1, 2,3,4,5 };
var enumerator = numbers.GetEnumerator();
object last = null;
bool hasElement = true;
do
{
hasElement = enumerator.MoveNext();
if (hasElement)
{
last = enumerator.Current;
Console.WriteLine(enumerator.Current);
}
else
Console.WriteLine("Last = {0}", last);
} while (hasElement);
Console.ReadKey();
Deferred Execution trick
Build a class that encapsulates the values to be processed and the processing function for deferred execution purpose. We will end up using one instance of it for each element processed in the loop.
// functor class
class Runner {
string ArgString {get;set;}
object ArgContext {get;set;}
// CTOR: encapsulate args and a context to run them in
public Runner(string str, object context) {
ArgString = str;
ArgContext = context;
}
// This is the item processor logic.
public void Process() {
// process ArgString normally in ArgContext
}
}
Use your functor in the foreach loop to effect deferred execution by one element:
// intended to track previous item in the loop
var recent = default(Runner); // see Runner class above
// normal foreach iteration
foreach(var str in listStrings) {
// is deferred because this executes recent item instead of current item
if (recent != null)
recent.Process(); // run recent processing (from previous iteration)
// store the current item for next iteration
recent = new Runner(str, context);
}
// now the final item remains unprocessed - you have a choice
if (want_to_process_normally)
recent.Process(); // just like the others
else
do_something_else_with(recent.ArgString, recent.ArgContext);
This functor approach uses more memory, but it saves you from having to count the elements in advance. That can help when the number of elements is not known up front.
OR
Shorter Workaround
If you want to apply special processing to the last element after processing them all in a regular way ....
// example using strings
var recentStr = default(string);
foreach(var str in listStrings) {
recentStr = str;
// process str normally
}
// now apply additional special processing to recentStr (last)
It's a potential workaround.
