Converting a for loop into Task.Parallel.For

I have a method bool isExistImage(int i). Its job is to detect whether a page contains an image and return true or false accordingly.
I have a PDF of 100+ pages, which I split into single-page files, and I pass only the file name to the method. The file names are the page numbers of the main PDF file: 1, 2, 3, ..., 125, ...
After detecting the images, my method correctly saves the list of pages. For that I used this code:
ArrayList array1 = new ArrayList();
for (int i = 1; i < pdf.Length; i++)
{
    if (isExistImage(i))
    {
        array1.Add(i);
    }
}
This process runs for more than an hour (because of the work done inside the isExistImage() method). I can assure you that no objects or variables are shared outside the method's scope.
So, to shorten the time, I used a Parallel.For loop. Here is what I did:
System.Threading.Tasks.Parallel.For(1, pdf.Length, i =>
{
    if (isExistImage(i))
        array1.Add(i);
});
But this does not work properly. Sometimes the image detection is right, but most of the time it's wrong. When I use the non-parallel for loop, it's always right.
I don't understand what the problem is here. What should I apply? Is there a technique I am missing?

Your problem is that ArrayList (and most other .Net collections) is not thread-safe.
There are several ways to fix this, but I think that in this case, the best option is to use PLINQ:
// Range takes (start, count), so this yields 1 .. pdf.Length - 1, like the original loop
List<int> pagesWithImages = ParallelEnumerable.Range(1, pdf.Length - 1)
    .Where(i => isExistImage(i))
    .ToList();
This will use multiple threads to call the (weirdly named) isExistImage method, which is exactly what you want, and then return a List<int> containing the indexes that matched the condition.
The returned list won't be sorted. If you want that, add AsOrdered() before the Where().
BTW, you really shouldn't be using ArrayList. If you want a list of integers, use List<int>.
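As one of the other possible fixes, here is a minimal sketch that keeps Parallel.For but collects the results in a thread-safe collection. The hasImage delegate is a hypothetical stand-in for isExistImage, and pageCount stands in for pdf.Length; neither name comes from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PageScanner
{
    // Returns the page numbers in [1, pageCount) for which hasImage is true,
    // sorted ascending. hasImage is a hypothetical stand-in for isExistImage
    // and must itself be safe to call from several threads at once.
    public static List<int> FindPagesWithImages(int pageCount, Func<int, bool> hasImage)
    {
        var hits = new ConcurrentBag<int>();   // safe for concurrent Add calls

        // The upper bound is exclusive, matching for (i = 1; i < pageCount; i++).
        Parallel.For(1, pageCount, i =>
        {
            if (hasImage(i))
                hits.Add(i);
        });

        // ConcurrentBag has no defined order, so sort before returning.
        return hits.OrderBy(i => i).ToList();
    }
}
```

It could be called as PageScanner.FindPagesWithImages(pdf.Length, isExistImage). This only removes the race on the result list; if isExistImage touches shared state internally, the results can still be wrong.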

ArrayList isn't thread safe; look into the concurrent collections in System.Collections.Concurrent.
Is isExistImage thread safe? I.e., are you locking before updating any member variables?

Related

Push Item to the end of an array

No, I can't use generic Collections. What I am trying to do is pretty simple actually. In php I would do something like this
$foo = [];
$foo[] = 1;
What I have in C# is this
var foo = new int [10];
// yeah that's pretty much it
Now I can do something like foo[foo.Length - 1] = 1, but that obviously won't work. Another option is foo[foo.Count(x => x.HasValue)] = 1, along with a nullable int in the declaration. But there has to be a simpler way to do this trivial task.
This is homework and I don't want to explain to my teacher (and possibly the entire class) what foo[foo.Count(x => x.HasValue)] = 1 is and why it works etc.

The simplest way is to create a new class that holds the index of the inserted item:
public class PushPopIntArray
{
    private int[] _vals = new int[10];
    private int _nextIndex = 0;

    public void Push(int val)
    {
        if (_nextIndex >= _vals.Length)
            throw new InvalidOperationException("No more values left to push");
        _vals[_nextIndex] = val;
        _nextIndex++;
    }

    public int Pop()
    {
        if (_nextIndex <= 0)
            throw new InvalidOperationException("No more values left to pop");
        _nextIndex--;
        return _vals[_nextIndex];
    }
}
You could add overloads to get the entire array, or to index directly into it if you wanted. You could also add overloads or constructors to create different sized arrays, etc.

In C#, arrays cannot be resized dynamically. You can use Array.Resize (but this will probably be bad for performance) or use an ArrayList instead.
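A minimal sketch of the Array.Resize route, assuming the array starts empty and grows by one slot per push. The ArrayPush class and its Push method are names invented for this illustration.

```csharp
using System;

public static class ArrayPush
{
    // Appends value by growing the array by one slot. Array.Resize allocates
    // a new array and copies the old elements, so each call is O(n).
    public static void Push(ref int[] array, int value)
    {
        Array.Resize(ref array, array.Length + 1);
        array[array.Length - 1] = value;
    }
}
```

Usage would look like int[] foo = new int[0]; ArrayPush.Push(ref foo, 1); after which foo has length 1 and foo[0] is 1. The copy on every push is the performance cost mentioned above.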

But there has to be a simpler way around this trivial task.
Nope. Not every language makes everything equally easy; this is why collections were invented. C# <> Python <> PHP <> Java. Pick whichever suits you best, but don't expect equivalent effort when moving from one language to another.

foo[foo.Length] won't work because index foo.Length is outside the array.
The last item is at index foo.Length - 1.
Beyond that, an array is a fixed-size structure; if you expect it to work the same as in PHP, you're simply mistaken.

Originally I wrote this as a comment, but I think it contains enough important points to warrant writing it as an answer.
You seem to be under the impression that C# is an awkward language because you stubbornly insist on using an array while having the requirement that you should "push items onto the end", as evidenced by this comment:
Isn't pushing items into the array kind of the entire purpose of the data structure?
To answer that: no, the purpose of the array data structure is to have a contiguous block of pre-allocated memory to mimic the original array structure in C(++) that you can easily index and perform pointer arithmetic on.
If you want a data structure that supports certain operations, such as pushing elements onto the end, consider a System.Collections.Generic.List<T>, or, if you insist on avoiding generics, a System.Collections.ArrayList. Both are backed by an array that grows automatically, but in general the whole point of the C# library is that you don't need to concern yourself with such details: the List<T> class has certain guarantees on its operations (e.g. appending is amortized O(1), inserting in the middle is O(n), and retrieval by index is O(1), just like an array), and how the storage is managed behind the scenes is irrelevant to the caller.
Don't try to compare PHP and C# by comparing PHP arrays with C# arrays - they have different programming paradigms and the way to solve a problem in one does not necessarily carry over to the other.
To answer the question as written, I see two options then:
1. Use arrays the awkward way. Either create an array of Nullable<int>s and accept some overhead and unpleasant LINQ statements for insertion; or keep an additional counter (preferably wrapped up in a class together with the array) to keep track of the last assigned element.
2. Use a proper data structure with appropriate guarantees on the operations that matter, such as List<T>, which is effectively the (much better, optimised) built-in version of the counter-plus-array approach in option 1.
I understand that the latter option is not feasible for you because of the constraints imposed by your teacher, but then do not be surprised that things are harder than the canonical way in another language, if you are not allowed to use the canonical way in this language.
Afterthought:
A hybrid alternative that just came to mind, is using a List for storage and then just calling .ToArray on it. In your insert method, just Add to the list and return the new array.

C# optimization calling a method and returning its value for each item in the list

Currently, I have a list of items that I want to loop through. For each item I want to call another method and add its return values to another list.
Below is the code I implemented:
List<SomeClass> lst = new List<SomeClass>();
foreach (var i in someList)
{
    lst.AddRange(obj.ReturnsAnEnumerableOf_MyClass(i.someId));
}
I'm not sure if calling a method within the AddRange method is a good idea or not.
So, I want to optimize this into something better using a line like the code below:
var j = someList.ForEach(i => obj.ReturnsAnEnumerableOf_MyClass(i.someId));
But I get the error below when executing the above one-liner:
Cannot assign void to an implicitly typed variable
How can I optimize the above code?

As per the MSDN documentation, the ForEach method returns void. That is the cause of the error you get.
If you want to convert the above code to a one-liner, you can use:
var j = someList.SelectMany(i => obj.ReturnsAnEnumerableOf_MyClass(i.someId)).ToList();
j will be a List<T> instance, where T is the type of the list ReturnsAnEnumerableOf_MyClass returns.
Also, as a side note, there isn't much optimization going on around here: performance wise, both ways should take the same time. It's more of a preference of readability.
The performance should be identical because both ways carry out the same amount of work (the number of operations in ReturnsAnEnumerableOf_MyClass() multiplied by the length of someList), so there shouldn't be any difference.
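To make the equivalence concrete, here is a small sketch that runs both versions side by side. ItemsFor is a hypothetical stand-in for obj.ReturnsAnEnumerableOf_MyClass, and plain ints and strings replace the question's types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Hypothetical stand-in for obj.ReturnsAnEnumerableOf_MyClass(id):
    // yields id items labelled "id-1" .. "id-id".
    public static IEnumerable<string> ItemsFor(int id)
    {
        for (int n = 1; n <= id; n++)
            yield return id + "-" + n;
    }

    // Loop version: append each returned sequence to one list.
    public static List<string> WithLoop(IEnumerable<int> ids)
    {
        var result = new List<string>();
        foreach (int id in ids)
            result.AddRange(ItemsFor(id));
        return result;
    }

    // LINQ version: SelectMany flattens the sequences in the same order.
    public static List<string> WithSelectMany(IEnumerable<int> ids)
    {
        return ids.SelectMany(id => ItemsFor(id)).ToList();
    }
}
```

Both methods call ItemsFor once per input element and produce the same list, which is why the choice comes down to readability.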

Looping through associated Lists in a class?

I have a class that has several List<T> objects in it. These Lists are "associated" so that the first items in each are related, and the second ones, and so on (kind of like fields within a single record). I want to loop through the Lists together to alter some of the data simultaneously per "record".
With a foreach loop, I can loop through one List without tracking the record via i or some such. However, I don't know how to simultaneously access the related items in the other Lists. Do I have to count it out using a variable like i, or is there a better way? I'm still pretty new to generics and class-based programming. Am I totally missing a better way to arrange this data?

So this is kind of a fun problem... Note that I suspect some different data modeling might have been able to get around this issue, but if you stored the related items together in a Tuple you could get away from having sync'ed lists... It seems very dangerous to have these sync'ed lists and rely on the fact that they should all correspond at "i" in that any sorting, grouping, or paging (Skip/Take) could break this paradigm.
If you stored them in a List<Tuple<ItemTypeFromList1, ItemTypeFromList2, ... ItemTypeFromListN>>, you could keep the items together in a single list, do a single iteration over that list, and act on the N items in each tuple as appropriate.

Use a standard for loop and an index (your i) that will allow you to access the same element in each array. There is no better way to do it.

How about collecting all the data for a 'row' in a single class and placing instances of this class in a single list, as opposed to the multiple lists you are trying to keep in sync?
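A minimal sketch of that idea, assuming each "record" has a name and an age. The Person class, its properties, and the UpdateAll method are invented for illustration; the question does not say what the lists hold.

```csharp
using System.Collections.Generic;

// Hypothetical record type: one instance replaces one index position
// across the formerly separate lists.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class RecordDemo
{
    // A single foreach touches every field of each record together,
    // so no index variable is needed and nothing can drift out of sync.
    public static void UpdateAll(List<Person> people)
    {
        foreach (Person p in people)
        {
            p.Name = p.Name.Trim();
            p.Age++;
        }
    }
}
```

Sorting or filtering the single list then keeps each record intact, which is the risk with parallel lists.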

The easiest way I can think of would be to use a standard for-loop. When the index is important I always prefer for-loops instead of foreach.
for (int i = 0; i < list1.Count; i++)
{
    list1[i].someMethod();
    list2[i].someMethod();
    // ...
}
I assume all lists are of equal length when they are related as you say.
You might want to look into grouping the related items together in a single class and then have only one list, instead of multiple.

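Try using the following code:
foreach (var i in firstList)
{
    // Note: LastIndexOf scans the list on every call, and if firstList
    // contains duplicates it always returns the index of the last match.
    var s1 = secondList[firstList.LastIndexOf(i)];
    var s2 = thirdList[firstList.LastIndexOf(i)];
}
Hope this is the answer you want. :)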

How to iterate through Dictionary without using foreach

I am not sure the title formulates it well, so sorry.
I basically have a bunch of elements listing targets for a communication. I placed them in a dictionary, though I am open to moving them to a different data structure. My problem is that I have a tree-like structure where a key is a branch and each branch has many leaves. Both the branch and the leaves have names stored in strings (they cannot be numeric).
private Dictionary<string, string[]> targets;
For each element in the dictionary I must send a communication, and when the target answers I go to the next target and start over. So after searching I am faced with these dilemmas:
I cannot use the usual foreach because I need to keep the pointer in memory to pass it between threads.
Since dictionaries are random access, it is difficult to keep a pointer.
When I receive a communication I must verify that it originates from a target, so I like the dictionary.contains method for that.
I am fairly new to C#, so the answer is probably obvious, but I am having a hard time finding a data structure that fits my needs. What would be the simplest solution? Can somebody suggest anything?
Thank you.
EDIT
I think my post has confused many, and they are sort of stuck on the terms pointers and threads. By threads I don't mean that they are parallel, simply that I cannot use a foreach or a loop, as the next thread that does the next iteration is triggered by incoming communication. This mechanism cannot be changed at the moment; only the iteration can. By pointer I wasn't referring to the memory pointers often used in C; I just meant something that points to where you are in a list. Sorry, I am a Java programmer, so I might be using confusing terms.
I noticed the Enumerator is often inherited and that it can be used with structures such as Dictionary and LinkedList. The examples I find talk about this substructure being encapsulated, and show foreach loops as examples.
Would it be possible to use GetEnumerator() in such a way that the enumerator would remember the current position even when accessed through a different thread?
I am off to test these on my own, but any input from more experienced people is always appreciated!

I think you need to rework your architecture a bit; the Dictionary itself is probably not the data structure you need for an ordered iteration.
I would consider moving your tree into a linked list instead.
When you kick off your communications, I would suggest having your threads call back a delegate to update your list data, or another shared data structure that keeps track of where you are in the communication process.
static LinkedList<LeafItem> TreeList = new LinkedList<LeafItem>();
foreach (LeafItem li in TreeList) {
    Thread newThread = new Thread(
        new ParameterizedThreadStart(Work.DoWork));
    newThread.Start(li);
}

You can enumerate over this in parallel using Parallel.ForEach method (from .NET 4). It has been backported as part of the Rx Framework for use in .NET 3.5sp1.
Note - this doesn't actually use one thread per item, but rather partitions the work using the thread pool, based on the hardware thread count of the system on which you're executing (which is usually better...). In .NET 4, it takes advantage of the ThreadPool's new hill climbing and work stealing algorithms, so is very efficient.

This one is a slight long shot, and I suspect I've messed it up somewhere here :/
Basically, the idea is to create a custom IEnumerator for your dictionary that contains a static variable keeping the "location" of the enumeration, so it can be continued later.
The following is some skeleton code for something that does work for pausing and restarting.
public class MyDictEnumerator<T> : IEnumerator<T>
{
    private List<T> Dict;
    private static int curLocation = -1;

    public MyDictEnumerator(List<T> dictionary)
    {
        Dict = dictionary;
    }

    public T Current
    {
        get { return Dict[curLocation]; }
    }

    public void Dispose()
    { }

    object System.Collections.IEnumerator.Current
    {
        get { return Dict[curLocation]; }
    }

    public bool MoveNext()
    {
        curLocation++;
        if (curLocation >= Dict.Count)
            return false;
        return true;
    }

    public void Reset()
    {
        curLocation = -1;
    }
}
Then to use:
MyDictEnumerator<KeyValuePair<string, int>> enumer = new MyDictEnumerator<KeyValuePair<string, int>>(test.ToList());
while (enumer.MoveNext())
{
    Console.WriteLine(enumer.Current.Value);
}
I'll admit that this isn't the cleanest way of doing it. But if you break out of the enumerator and create a new one on another thread, then it will continue at the same point (I think :/)
I hope this helps.

Edit: from your comments:
My algorithm is more like: get the first target; send the message to the first target; thread DIES. Catch a port reception event; check if it's the right target; do some actions; go to the next target; start the loop over.
If you want to process the items asynchronously but not in parallel, you should be able to achieve this by copying the dictionary's keys to a Queue<string> and passing both to the callback that handles your asynchronous responses.
Your completion handler pseudo-code might look like this:
// first extract your dictionary, key, and queue from whatever state
// object you're using to pass data back to the completion event
if (dictionary.ContainsKey(key)) {
    // process the response
}
if (queue.Count > 0) {
    string nextKey = queue.Dequeue();
    string[] messages = dictionary[nextKey];
    // send the messages, along with your state data and this callback
}
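The queue idea above can be sketched as a small class that owns both the dictionary and the queue. Everything here is an assumption made for illustration: the TargetWalker name, the send delegate standing in for whatever actually transmits a message, and the premise that replies arrive one at a time rather than concurrently.

```csharp
using System;
using System.Collections.Generic;

// Walks the targets one at a time; each valid reply advances to the next.
// All names are hypothetical; the send delegate stands in for the real I/O.
public class TargetWalker
{
    private readonly Dictionary<string, string[]> _targets;
    private readonly Queue<string> _pending;
    private readonly Action<string, string[]> _send;

    public TargetWalker(Dictionary<string, string[]> targets, Action<string, string[]> send)
    {
        _targets = targets;
        _pending = new Queue<string>(targets.Keys);  // snapshot of the keys
        _send = send;
    }

    // Sends to the next pending target; returns false when none are left.
    public bool SendNext()
    {
        if (_pending.Count == 0)
            return false;
        string key = _pending.Dequeue();
        _send(key, _targets[key]);
        return true;
    }

    // Call from the reception event: replies from unknown senders are
    // ignored, replies from known targets trigger the next send.
    public void OnReply(string sender)
    {
        if (_targets.ContainsKey(sender))
            SendNext();
    }
}
```

The queue plays the role of the "pointer": it survives between events because the walker object does, regardless of which thread raises the next event. If replies could ever arrive concurrently, SendNext and OnReply would need a lock around the queue access.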

Hopefully simple question about modifying dictionaries in C#

I have a huge dictionary of blank values in a variable called current like so:
struct movieuser {blah blah blah}
Dictionary<movieuser, float> questions = new Dictionary<movieuser, float>();
So I am looping through this dictionary and need to fill in the "answers", like so:
for (var k = questions.Keys.GetEnumerator(); k.MoveNext(); )
{
    questions[k.Current] = retrieveGuess(k.Current.userID, k.Current.movieID);
}
Now, this doesn't work, because I get an InvalidOperationException from trying to modify the dictionary I am looping through. However, you can see that the code should work fine - since I am not adding or deleting any values, just modifying the value. I understand, however, why it is afraid of my attempting this.
What is the preferred way of doing this? I can't figure out a way to loop through a dictionary WITHOUT using iterators.
I don't really want to create a copy of the whole array, since it is a lot of data and will eat up my RAM like it's still Thanksgiving.
Thanks,
Dave

Matt's answer, getting the keys first, separately is the right way to go. Yes, there'll be some redundancy - but it will work. I'd take a working program which is easy to debug and maintain over an efficient program which either won't work or is hard to maintain any day.
Don't forget that if you make MovieUser a reference type, the array will only be the size of as many references as you've got users - that's pretty small. A million users will only take up 4MB or 8MB on x64. How many users have you really got?
Your code should therefore be something like:
IEnumerable<MovieUser> users = RetrieveUsers();
IDictionary<MovieUser, float> questions = new Dictionary<MovieUser, float>();
foreach (MovieUser user in users)
{
    questions[user] = RetrieveGuess(user);
}
If you're using .NET 3.5 (and can therefore use LINQ), it's even easier:
IDictionary<MovieUser, float> questions =
    RetrieveUsers().ToDictionary(user => user, user => RetrieveGuess(user));
Note that if RetrieveUsers() can stream the list of users from its source (e.g. a file) then it will be efficient anyway, as you never need to know about more than one of them at a time while you're populating the dictionary.
A few comments on the rest of your code:
Code conventions matter. Capitalise the names of your types and methods to fit in with other .NET code.
You're not calling Dispose on the IEnumerator<T> produced by the call to GetEnumerator. If you just use foreach your code will be simpler and safer.
MovieUser should almost certainly be a class. Do you have a genuinely good reason for making it a struct?

Is there any reason you can't just populate the dictionary with both keys and values at the same time?
foreach (var key in someListOfKeys)
{
    questions.Add(key, retrieveGuess(key.userID, key.movieID));
}

Store the dictionary keys in a temporary collection, then loop over the temporary collection and use each key as your indexer parameter. This should get you around the exception.
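A minimal sketch of the temporary-collection approach, written generically. The DictionaryFill class and the compute delegate (a stand-in for retrieveGuess) are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryFill
{
    // Overwrites every value with compute(key). The loop iterates a copy of
    // the keys, so the dictionary is never enumerated while it is modified.
    public static void FillValues<TKey>(Dictionary<TKey, float> dict, Func<TKey, float> compute)
    {
        foreach (TKey key in dict.Keys.ToList())   // ToList() takes the snapshot
            dict[key] = compute(key);
    }
}
```

Only the keys are copied, not the values, so the extra memory is one list entry per dictionary entry for the duration of the loop.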
