How to iterate through Dictionary without using foreach - c#

I am not sure the title phrases this well, so apologies.
I basically have a bunch of elements listing targets for a communication. I placed them in a dictionary, though I am open to moving them to a different data structure. My problem is that I have a tree-like structure where a key is a branch and each branch has many leaves. Both the branches and the leaves have names stored as strings (they cannot be numeric).
private Dictionary<string, string[]> targets;
For each element in the dictionary I must send a communication, and when the target answers I go to the next target and start over. After searching, I am faced with these dilemmas:
I cannot use the usual foreach, because I need to keep the pointer in memory to pass it between threads.
Since dictionaries are random access, it is difficult to keep a pointer.
When I receive a communication I must verify that it originates from a target, so I like the Dictionary.ContainsKey method for that.
I am fairly new to C#, so the answer is probably obvious, but I am having a hard time finding a data structure that fits my needs. What would be the simplest solution? Can anybody suggest anything?
Thank you.
EDIT
I think my post has confused many people, who got stuck on the terms "pointers" and "threads". By threads I don't mean that they run in parallel, simply that I cannot use a foreach or any other loop, because the thread that performs the next iteration is triggered by an incoming communication. This mechanism cannot be changed at the moment; only the iteration can. By pointer I wasn't referring to the memory pointers often used in C; I just meant something that points to where you are in a list. Sorry, I am a Java programmer, so I might be using confusing terms.
I noticed that IEnumerator is implemented by many collections, including Dictionary and LinkedList. The examples I find treat the enumerator as an encapsulated detail and only show foreach loops.
Would it be possible to use GetEnumerator() in such a way that the enumerator remembers the current position, even when it is accessed from a different thread?
I am off to test these on my own, but any input from more experienced people is always appreciated!
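Keeping the enumerator itself as the "pointer" can work, provided it is stored somewhere that outlives a single callback (such as a field) and only one thread advances it at a time. A minimal sketch under those assumptions; TargetCursor, Advance and IsTarget are hypothetical names. Keep in mind that adding entries to a Dictionary while enumerating invalidates the enumerator, and Dictionary does not guarantee an enumeration order.

```csharp
using System.Collections.Generic;

// Hypothetical helper: keeps the dictionary's enumerator in a field so the
// current position survives between reply callbacks, whichever thread runs them.
public class TargetCursor
{
    private readonly Dictionary<string, string[]> targets;

    // Stored as the interface type on purpose: Dictionary's enumerator is a struct,
    // and calling MoveNext on a readonly struct field would only advance a copy.
    private readonly IEnumerator<KeyValuePair<string, string[]>> position;
    private readonly object sync = new object();

    public TargetCursor(Dictionary<string, string[]> targets)
    {
        this.targets = targets;
        position = targets.GetEnumerator();
    }

    // Call from the reply handler: moves to the next target, or returns false when done.
    public bool Advance(out KeyValuePair<string, string[]> next)
    {
        lock (sync) // only one thread may advance the enumerator at a time
        {
            if (position.MoveNext())
            {
                next = position.Current;
                return true;
            }
            next = default(KeyValuePair<string, string[]>);
            return false;
        }
    }

    // Check whether an incoming message comes from one of the branch keys.
    public bool IsTarget(string name)
    {
        return targets.ContainsKey(name);
    }
}
```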

I think you need to rework your architecture a bit; the Dictionary itself is probably not the data structure to use for an ordered iteration.
I would consider moving your tree into a linked list instead.
When you kick off your communications, I would suggest having your threads call back a delegate that updates your list data, or another shared data structure that keeps track of where you are in the communication process.
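// LeafItem and Work.DoWork are placeholders for your own types.
static LinkedList<LeafItem> TreeList = new LinkedList<LeafItem>();

foreach (LeafItem li in TreeList)
{
    // Work.DoWork must match ParameterizedThreadStart, i.e. void DoWork(object state).
    Thread newThread = new Thread(new ParameterizedThreadStart(Work.DoWork));
    newThread.Start(li);
}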
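With a LinkedList<T>, the "pointer" from the question can literally be a LinkedListNode<T>: keep the current node in a field and follow its Next property when a reply arrives. A minimal sketch, assuming the branch names are what you iterate over; TargetList, Start and MoveNext are hypothetical names, and the class is not synchronized, so it assumes replies are handled one at a time as the question describes.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: the current LinkedListNode acts as the "pointer" into the list.
public class TargetList
{
    private readonly LinkedList<string> targets;
    private LinkedListNode<string> current; // null before Start and after the last target

    public TargetList(IEnumerable<string> names)
    {
        targets = new LinkedList<string>(names);
    }

    // Position on the first target; returns null if the list is empty.
    public string Start()
    {
        current = targets.First;
        return current?.Value;
    }

    // Call when the current target has answered; returns the next target, or null when done.
    public string MoveNext()
    {
        current = current?.Next;
        return current?.Value;
    }
}
```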

You can enumerate over this in parallel using the Parallel.ForEach method (from .NET 4). It has been backported as part of the Rx Framework for use in .NET 3.5 SP1.
Note: this doesn't actually use one thread per item; rather, it partitions the work using the thread pool, based on the hardware thread count of the system you're executing on (which is usually better). In .NET 4, it takes advantage of the ThreadPool's new hill-climbing and work-stealing algorithms, so it is very efficient.
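A minimal sketch of Parallel.ForEach over the question's Dictionary<string, string[]>, assuming a hypothetical send callback. Note that this contacts many targets concurrently, which is not the one-at-a-time behavior the question's edit asks for; it only fits if the targets can be contacted in parallel.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSend
{
    // Sends to every leaf of every branch, letting Parallel.ForEach partition the
    // entries over thread-pool threads. 'send' is a hypothetical stand-in for the
    // real communication call and must be safe to call from several threads.
    public static int SendAll(Dictionary<string, string[]> targets, Action<string, string> send)
    {
        int sent = 0;
        Parallel.ForEach(targets, entry =>
        {
            foreach (string leaf in entry.Value)
            {
                send(entry.Key, leaf);
                Interlocked.Increment(ref sent); // shared counter, so update it atomically
            }
        });
        return sent;
    }
}
```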

This one is a bit of a long shot, and I suspect I've messed something up here :/
Basically, the idea is to create a custom IEnumerator for your dictionary that contains a static variable keeping the "location" of the enumeration, so it can be continued.
The following is some skeleton code for something that does work for pausing and restarting.
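using System.Collections.Generic;

public class MyDictEnumerator<T> : IEnumerator<T>
{
    private List<T> Dict;

    // Static on purpose: the position is shared by every MyDictEnumerator<T> with the
    // same T, so a new instance (e.g. on another thread) resumes where the last one stopped.
    // It is not synchronized, so only one thread should advance it at a time.
    private static int curLocation = -1;

    public MyDictEnumerator(List<T> dictionary)
    {
        Dict = dictionary;
    }

    public T Current
    {
        get { return Dict[curLocation]; }
    }

    public void Dispose()
    { }

    object System.Collections.IEnumerator.Current
    {
        get { return Dict[curLocation]; }
    }

    public bool MoveNext()
    {
        curLocation++;
        if (curLocation >= Dict.Count)
            return false;
        return true;
    }

    public void Reset()
    {
        curLocation = -1;
    }
}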
Then to use:
// 'test' is assumed to be a Dictionary<string, int>; ToList() needs "using System.Linq;".
MyDictEnumerator<KeyValuePair<string, int>> enumer =
    new MyDictEnumerator<KeyValuePair<string, int>>(test.ToList());
while (enumer.MoveNext())
{
    Console.WriteLine(enumer.Current.Value);
}
I'll admit that this isn't the cleanest way of doing it. But if you break out of the enumeration and create a new enumerator on another thread, it will continue at the same point (I think :/).
I hope this helps.

Edit: from your comments:
My algorithm is more like: get the first target; send the message to the first target; the thread DIES; catch a port reception event; check if it's the right target; do some actions; go to the next target; start the loop over.
If you want to process the items asynchronously but not in parallel, you should be able to achieve this by copying the dictionary's keys to a Queue<string> and passing both to the callback that handles your asynchronous responses.
Your completion handler might look roughly like this (pseudo-code):
// First extract your dictionary, key (the sender of this response) and queue
// from whatever state object you're using to pass data back to the completion event.
if (dictionary.ContainsKey(key))
{
    // process the response
}
if (queue.Count > 0)
{
    string nextKey = queue.Dequeue(); // a new name: 'key' is already declared above
    string[] messages = dictionary[nextKey];
    // send the messages, along with your state data and this callback
}
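The answer's approach, copying the keys into a Queue<string> and advancing from the completion handler, can be sketched end to end as below. Assumptions: SendState, TargetSequencer and OnReply are hypothetical names, and Task.Run stands in for the real asynchronous send and reply. Only one message is in flight at a time, so the queue is never touched by two threads at once.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical state object passed along with each send, as the answer suggests.
public class SendState
{
    public Dictionary<string, string[]> Targets;
    public Queue<string> Pending;
    public List<string> Log = new List<string>();
    public ManualResetEventSlim Done = new ManualResetEventSlim(false);
}

public static class TargetSequencer
{
    public static SendState Start(Dictionary<string, string[]> targets)
    {
        var state = new SendState
        {
            Targets = targets,
            Pending = new Queue<string>(targets.Keys) // copy the keys once, up front
        };
        SendNext(state);
        return state;
    }

    // Sends to the next queued target; signals Done when the queue is empty.
    private static void SendNext(SendState state)
    {
        if (state.Pending.Count == 0)
        {
            state.Done.Set();
            return;
        }
        string key = state.Pending.Dequeue();
        string[] messages = state.Targets[key];
        state.Log.Add("sent " + messages.Length + " to " + key);
        // Stand-in for the real asynchronous send: the "reply" arrives later on a pool thread.
        Task.Run(() => OnReply(state, key));
    }

    // Completion handler: check the sender, process the reply, then move on.
    private static void OnReply(SendState state, string sender)
    {
        if (state.Targets.ContainsKey(sender))
        {
            state.Log.Add("reply from " + sender);
        }
        SendNext(state);
    }
}
```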

Related

C# Closures and a self made SpinLock.RecursiveEnter

Actually, what I'm simply trying to achieve is to get to know multithreading in C#.
So I have this class called WeakeningEvictionary{TKey, TValue}, which has a private Dictionary{TKey, CachedValue{TValue}} that functions as the cache. CachedValue is a wrapper that holds a strong reference and a WeakReference to TValue. After a predefined time, a Task is created to null out the strong reference and keep the value only through the WeakReference. I also have a HashSet that keeps track of which key/value pairs to evict (added to when weakening happens, removed from when SetValue is called). Immediately after the GC has done its job, another Task is created to evict all those pairs.
Strictly speaking I wouldn't need a recursive lock for this, but I ran into issues when some stored information is requested recursively because a construction sequence required it.
So I came up with this code (updated; it was previously an extension method that could not work):
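// _spinLock must be a non-readonly SpinLock field (SpinLock is a mutable struct)
// with thread-owner tracking enabled, otherwise IsHeldByCurrentThread throws.
public void RecursiveEnter(Action action)
{
    if (_spinLock.IsHeldByCurrentThread)
    {
        action();
    }
    else
    {
        bool gotLock = false;
        try
        {
            _spinLock.Enter(ref gotLock); // blocking until acquired
            action();
        }
        finally
        {
            // release the lock even if action() throws
            if (gotLock) _spinLock.Exit();
        }
    }
}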
So what I'm trying to do now is:
private void Evict()
{
    RecursiveEnter(() =>
    {
        foreach (TKey key in toEvict)
        {
            _dict.Remove(key);
        }
    });
}
Alright what if I use
And my question is: what are the risks? And are closures known to cause issues when used by threads in this way?
Thanks for your Input ;-)
Right off the bat, the extension-method version was 100% not going to work: SpinLock is a value type, so you must pass it by reference (RecursiveEnter(ref SpinLock spinLock, Action action)), not by value.
See for example https://learn.microsoft.com/en-us/dotnet/api/system.threading.spinlock?view=netframework-4.7.2#remarks
I'm not sure this is the best thing for you to use: you should start with a higher-level primitive (maybe a ReaderWriterLockSlim) and refine things only with careful testing and understanding.
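Following the suggestion to start from a higher-level primitive: a minimal sketch of a dictionary guarded by a ReaderWriterLockSlim created with LockRecursionPolicy.SupportsRecursion, so that code running under the write lock can re-enter it on the same thread. RecursiveCache, GetOrAdd and Evict are hypothetical names; this is not the question's WeakeningEvictionary, and it leaves out the weak-reference handling entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical cache guarded by a ReaderWriterLockSlim that lets the thread
// holding the lock enter it again (the recursive case from the question).
public class RecursiveCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> dict = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim rwLock =
        new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);

    public bool TryGet(TKey key, out TValue value)
    {
        rwLock.EnterReadLock();
        try { return dict.TryGetValue(key, out value); }
        finally { rwLock.ExitReadLock(); }
    }

    // The factory runs under the write lock and may call GetOrAdd again on the
    // same thread; without SupportsRecursion that nested call would throw.
    public TValue GetOrAdd(TKey key, Func<TValue> factory)
    {
        rwLock.EnterWriteLock();
        try
        {
            TValue value;
            if (!dict.TryGetValue(key, out value))
            {
                value = factory();
                dict[key] = value;
            }
            return value;
        }
        finally { rwLock.ExitWriteLock(); }
    }

    public void Evict(IEnumerable<TKey> keys)
    {
        rwLock.EnterWriteLock();
        try
        {
            foreach (TKey key in keys) dict.Remove(key);
        }
        finally { rwLock.ExitWriteLock(); }
    }
}
```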

Finding the index of an entry in a queue

I currently have a queue of objects of a class. The class definition looks something like this:
class myClass
{
    object obj; // reference to an object
    float value;
}
I have overridden the Equals method of myClass. It does something like this:
this.obj.Equals(otherObj.obj) && this.value.Equals(otherObj.value)
So when this returns true, I want to be able to know where this entry is located in the queue. I can do a foreach over the queue and increment a counter on every iteration until I find it, but I was wondering whether there is a better way to do this (faster, constant time if possible). I need a queue for its FIFO behavior (I could use a list and sort it, but for the sake of this question, let's assume we are using a queue).
Any help would be appreciated!
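Queue<T> has no IndexOf method, so the counter-based foreach described above is the straightforward approach, and it is O(n). Constant-time lookup would need an extra structure kept in sync with the queue, which is awkward because every Dequeue shifts all positions. A sketch of the linear search as a hypothetical extension method; it relies on EqualityComparer<T>.Default, which falls back to the overridden Equals:

```csharp
using System.Collections.Generic;

public static class QueueExtensions
{
    // Linear search: returns the 0-based position counted from the front of the
    // queue (0 = the next item Dequeue would return), or -1 if not found.
    public static int IndexOf<T>(this Queue<T> queue, T item)
    {
        var comparer = EqualityComparer<T>.Default;
        int index = 0;
        foreach (T element in queue) // Queue<T> enumerates from front to back
        {
            if (comparer.Equals(element, item)) return index;
            index++;
        }
        return -1;
    }
}
```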

Converting a for loop into Task.Parallel.For

I have a method bool IsExistImage(int i). Its task is to detect an image and return a bool indicating whether it exists.
I have a PDF of 100+ pages, which I split, and I pass only the file name to the method. The file names are actually the page numbers of the main PDF file, like 1, 2, 3, ..., 125, ...
After detecting the image, my method correctly saves the list of pages. For that I used this code:
ArrayList array1 = new ArrayList();
for (int i = 1; i < pdf.Length; i++)
{
    if (isExistImage(i))
    {
        array1.Add(i);
    }
}
This process runs for more than an hour (obviously because of the internal work in the isExistImage() method). I can assure you that no objects/variables are global outside the method scope.
So, to shorten the time, I used a Parallel.For loop. Here is what I did:
System.Threading.Tasks.Parallel.For(1, pdf.Length, i =>
{
    if (isExistImage(i))
        array1.Add(i); // called from several threads at once
});
But this is not working properly. Sometimes the image detection is right, but most of the time it's wrong. When I use a non-parallel for loop, it's always right.
I don't understand what the problem is here. What should I apply? Is there any technique I am missing?
Your problem is that ArrayList (and most other .Net collections) is not thread-safe.
There are several ways to fix this, but I think that in this case, the best option is to use PLINQ:
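// Range takes (start, count): this yields 1 .. pdf.Length - 1, the same pages as the original loop.
List<int> pagesWithImages = ParallelEnumerable.Range(1, pdf.Length - 1)
    .Where(i => isExistImage(i))
    .ToList();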
This will use multiple threads to call the (weirdly named) isExistImage method, which is exactly what you want, and then return a List<int> containing the indexes that matched the condition.
The returned list won't be sorted. If you want that, add AsOrdered() before the Where().
BTW, you really shouldn't be using ArrayList. If you want a list of integers, use List<int>.
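A self-contained version of the PLINQ approach with AsOrdered(), assuming a hypothetical hasImage delegate in place of isExistImage (which must itself be thread-safe). It passes a count of pageCount - 1 so the pages checked match the original loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageScan
{
    // Checks pages 1 .. pageCount - 1 in parallel and returns the matches in
    // ascending order, because AsOrdered() preserves the source order.
    // 'hasImage' is a hypothetical stand-in for isExistImage.
    public static List<int> PagesWithImages(int pageCount, Func<int, bool> hasImage)
    {
        int count = Math.Max(0, pageCount - 1); // Range takes a count, not an end index
        return ParallelEnumerable.Range(1, count)
            .AsOrdered()
            .Where(i => hasImage(i))
            .ToList();
    }
}
```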
ArrayList isn't thread-safe; look into the concurrent collections in System.Collections.Concurrent.
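With a concurrent collection, the question's Parallel.For loop only needs its ArrayList swapped for a thread-safe container. A sketch with ConcurrentBag<int> (PageScanBag and hasImage are hypothetical names); because the bag does not preserve order, the results are sorted at the end:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PageScanBag
{
    // Same loop bounds as the question's Parallel.For, but results go into a
    // ConcurrentBag, which is safe to Add to from several threads.
    public static List<int> PagesWithImages(int pageCount, Func<int, bool> hasImage)
    {
        var found = new ConcurrentBag<int>();
        Parallel.For(1, pageCount, i =>
        {
            if (hasImage(i)) found.Add(i);
        });
        return found.OrderBy(i => i).ToList(); // the bag is unordered, so sort afterwards
    }
}
```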
Is isExistImage thread-safe? I.e., are you locking before updating any member variables?

Function should get called only once

I have a C# function which returns a list of states. I want this function to be called only once, like a static variable.
public List<string> GetStateList()
{
    List<string> lstState = new List<string>();
    lstState.Add("State1");
    lstState.Add("State2");
    lstState.Add("State3");
    return lstState;
}
I'm calling this function from many places, and since the state list is always the same, I want it to be built only once; the next time the function is called, it should not re-create the whole list.
How can I achieve this in C#?
Memoise it. It'll still be called multiple times, but only do the full work once:
private List<string> _states; // if GetStateList() doesn't depend on object
                              // state, then this can be static.
public List<string> GetStateList()
{
    if (_states == null)
    {
        List<string> lstState = new List<string>();
        lstState.Add("State1");
        lstState.Add("State2");
        lstState.Add("State3");
        _states = lstState;
    }
    return _states;
}
Depending on threading issues, you may wish to either:
Lock on the whole thing. Guaranteed single execution.
Lock on the assignment to _states. There may be some needless work in the early period, but all callers will receive the same object.
Allow for early callers to overwrite each other.
While the last may seem the most wasteful, it can be the best in the long run: after the initial period where different calls may needlessly overwrite each other, it becomes simpler and faster from that point on. It really depends on how much work is done, and how often it may be called concurrently before _states is assigned.
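On .NET 4 and later, Lazy<T> packages the first two options above: LazyThreadSafetyMode.ExecutionAndPublication runs the factory once under a lock, and LazyThreadSafetyMode.PublicationOnly lets racing threads each run it but publishes only the first result to everyone. A sketch, with StateProvider and BuildStates as hypothetical names; note it still hands the same mutable list to every caller.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class StateProvider
{
    // Option 1 ("lock on the whole thing"): the factory runs at most once.
    // Swapping in LazyThreadSafetyMode.PublicationOnly gives option 2 instead.
    private static readonly Lazy<List<string>> states =
        new Lazy<List<string>>(BuildStates, LazyThreadSafetyMode.ExecutionAndPublication);

    private static List<string> BuildStates()
    {
        return new List<string> { "State1", "State2", "State3" };
    }

    public List<string> GetStateList()
    {
        return states.Value; // built on first access, cached afterwards
    }
}
```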
One issue with reusing a list is that callers can modify this list, which will affect any pre-existing references to it. For such a small amount of data, this isn't likely to save you very much in the long run. I'd probably be content to just return a new array each time.
I certainly wouldn't bother with lazy instantiation; populate it in a static constructor and be done:
using System;
using System.Collections.ObjectModel;

public static class States {
    static States() {
        All = Array.AsReadOnly(new string[] { "state1", "state2", "state3" });
    }
    public static readonly ReadOnlyCollection<string> All;
}
Now it's thread-safe, (relatively) tamper-proof, and above all, simple.

using ThreadPools to search through object lists

I have these container objects (let's call them Container) in a list. Each Container object in turn holds a list of DataItem objects (or derived types). In a typical scenario a user will have 15-20 Container objects with 1000-5000 DataItems each. Then there are some DataMatcher objects that can be used for different types of searches. These mostly work fine (I have several hundred unit tests on them), but to make my WPF application feel snappy and responsive, I decided to use the ThreadPool for this task. So I have a DataItemCommandRunner which runs on a Container object and applies each delegate from a list it takes as a parameter to each DataItem in turn; I use the ThreadPool to queue up one work item per Container, so that in theory the search should be as efficient as possible on multi-core computers.
This is basically done in a DataItemUpdater class that looks something like this:
public class DataItemUpdater
{
    private Container ch;
    private IEnumerable<DataItemCommand> cmds;

    public DataItemUpdater(Container container, IEnumerable<DataItemCommand> commandList)
    {
        ch = container;
        cmds = commandList;
    }

    public void RunCommandsOnContainer(object useless)
    {
        Thread.CurrentThread.Priority = ThreadPriority.AboveNormal;
        foreach (DataItem di in ch.ItemList)
        {
            foreach (var cmd in cmds)
            {
                cmd(di); // apply each command to the current DataItem
            }
        }
        //Console.WriteLine("Done running for {0}", ch.DisplayName);
    }
}
(The useless object parameter for RunCommandsOnContainer is because I am experimenting with this with and without using threads, and one of them requires some parameter. Also, setting the priority to AboveNormal is just an experiment as well.)
This works fine for all but one scenario - when I use the AllWordsMatcher object type that will look for DataItem objects containing all words being searched for (as opposed to any words, exact phrase or regular expression for instance).
This is a pretty simple somestring.Contains(eachWord)-based object, backed by unit tests. But herein lies some hairy strangeness.
When the RunCommandsOnContainer runs using ThreadPool threads, it will return insane results. Say I have a string like this:
var someString = "123123123 - just some numbers";
And I run this:
var res = someString.Contains("data");
When it runs, this will actually return true quite a lot. I have debugging information that shows it returning true for empty strings and other strings that simply do not contain the data. It will also sometimes return false even when the string actually contains the data being looked for.
The kicker in all this? Why do I suspect the ThreadPool and not my own code?
When I run the RunCommandsOnContainer() command for each Container in my main thread (i.e. locking the UI and everything), it works 100% correctly - every time! It never finds anything it shouldn't, and it never skips anything it should have found.
However, as soon as I use the ThreadPool, it starts finding a lot of items it shouldn't, while sometimes not finding items it should.
I realize this is a complex problem (it is painful trying to debug, that's for sure!), but any insight into why and how to fix this would be greatly appreciated!
Thanks!
Rune
It's a bit hard to see from the fragment you're posting, but judging by the symptoms I would look at the AllWordsMatcher (look for static state). If AllWordsMatcher is stateful you should also check that you're creating a new instance for each thread.
More generally, I'd look at all the instances involved in the matching/searching process, specifically the working objects used when running multithreaded. From past experience, the problem usually lies there. (It's easy to focus too much on the object graph representing your business data, Container/DataItem in this case.)
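To illustrate the advice about static or shared state, a sketch of an all-words matcher that keeps no mutable state at all: its only field is set in the constructor and never written again, so one instance can be shared across ThreadPool threads. AllWordsMatcherSketch is a hypothetical stand-in; the question does not show the real AllWordsMatcher.

```csharp
using System;
using System.Linq;

// Hypothetical stateless matcher: everything Matches needs is either a readonly
// field set in the constructor or a local, so concurrent calls cannot interfere.
public sealed class AllWordsMatcherSketch
{
    private readonly string[] words; // never modified after construction

    public AllWordsMatcherSketch(string query)
    {
        words = query.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public bool Matches(string text)
    {
        // No fields are written here. A static or instance field used as scratch
        // space (e.g. a "current word" or match counter) is the kind of shared state
        // that could produce the erratic results described in the question.
        return words.All(w => text.Contains(w));
    }
}
```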
