No idea if this is possible, but rather than iterating over a dictionary and modifying entries sequentially based on some condition, I was wondering whether it is possible to do this in parallel.
For example, rather than:
Dictionary<int, byte> dict = new Dictionary<int, byte>();
for (int i = 0; i < dict.Count; i++)
{
dict[i] = 255;
}
I'd like something like:
Dictionary<int, byte> dict = new Dictionary<int, byte>();
dict.Parallel(x=>x, <condition>, <function_to_apply>);
I realise that in order to build the indices for modifying the dict, we would need to iterate and build a list of ints... but I was wondering if there was some sneaky way to do this that would be both faster and more concise than the first example.
I could of course iterate through the dict and for each entry, spawn a new thread and run some code, return the value and build a new, updated dictionary, but that seems really overkill.
The reason I'm curious is that the <function_to_apply> might be expensive.
Assuming you don't need the dictionary while it is being rebuilt, building a new, updated dictionary is not that much work:
var newDictionary = dictionary.AsParallel()
.Select(kvp =>
/* do whatever here as long as
it works with the local kvp variable
and not the original dict */
new
{
Key = kvp.Key,
NewValue = function_to_apply(kvp.Key, kvp.Value)
})
.ToDictionary(x => x.Key,
x => x.NewValue);
Then lock whatever sync object you need and swap the new and old dictionaries.
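To make the pattern concrete, here is a minimal self-contained sketch that also applies the question's <condition>. ExpensiveTransform and the even-key condition used in the usage are hypothetical stand-ins for <function_to_apply> and <condition>, and the transform is assumed to be thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelRebuild
{
    // Hypothetical stand-in for <function_to_apply>; assumed pure and thread-safe.
    public static byte ExpensiveTransform(int key, byte value) => (byte)(value ^ 0xFF);

    // Builds a NEW dictionary in parallel. The source must not be modified while this runs;
    // entries that fail the condition are copied through unchanged.
    public static Dictionary<int, byte> Rebuild(
        Dictionary<int, byte> source,
        Func<int, byte, bool> condition,
        Func<int, byte, byte> transform)
    {
        return source
            .AsParallel()
            .Select(kvp => new KeyValuePair<int, byte>(
                kvp.Key,
                condition(kvp.Key, kvp.Value) ? transform(kvp.Key, kvp.Value) : kvp.Value))
            .ToDictionary(x => x.Key, x => x.Value);
    }
}
```

Usage would be something like `var updated = ParallelRebuild.Rebuild(dict, (k, v) => k % 2 == 0, ParallelRebuild.ExpensiveTransform);`. Because a new dictionary is built, nothing writes to the original concurrently; the caller only has to swap the reference under its own lock.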
First of all, I mostly agree with others recommending ConcurrentDictionary<> - it is designed to be thread-safe.
But if you are an adventurous coder ;) and performance is super-critical for you, you could sometimes do what (I suppose) you are trying to do, provided that no keys are added to or removed from the dictionary during your parallel manipulations:
int keysNumber = 1000000;
Dictionary<int, string> d = Enumerable.Range(1, keysNumber)
.ToDictionary(x => x, x => (string)null);
Parallel.For(1, keysNumber + 1, k => { d[k] = "Value" + k; /*Some complex logic might go here*/ });
To verify data consistency after these operations you can add:
Debug.Assert(d.Count == keysNumber);
for (int i = 1; i <= keysNumber; i++)
{
Debug.Assert(d[i] == "Value" + i);
}
Console.WriteLine("Successful");
WHY IT WORKS:
Basically, we created the dictionary in advance from a SINGLE main thread and then populated it in parallel. What allows us to do that is that the current Dictionary implementation (Microsoft does not guarantee this, but it will most likely never change) defines its structure solely by its keys, and values are just assigned to the corresponding cells. Since each key is assigned a new value from a single thread, we do not have a race condition, and since navigating the hash table concurrently does not alter it, everything works fine.
But you should be really careful with such code and have very good reasons not to use ConcurrentDictionary.
PS: My main point is not even the "hack" of using Dictionary concurrently, but to draw attention to the fact that not every data structure needs to be concurrent. I have seen ConcurrentDictionary<int, ConcurrentStack<...>> where each stack object in the dictionary could be accessed only from a single thread; that is overkill and doesn't improve your performance. Just keep in mind what you are affecting and what can go wrong in multithreading scenarios.
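For comparison, a minimal sketch of the same parallel fill using ConcurrentDictionary, which relies only on its documented thread-safety rather than on Dictionary internals. The ConcurrentFill and Fill names are illustrative; with ConcurrentDictionary the keys do not need to be created up front.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentFill
{
    public static ConcurrentDictionary<int, string> Fill(int keysNumber)
    {
        var d = new ConcurrentDictionary<int, string>();

        // The indexer setter is thread-safe, so keys may be added concurrently here,
        // which would not be safe with a plain Dictionary.
        Parallel.For(1, keysNumber + 1, k => { d[k] = "Value" + k; /* complex logic here */ });
        return d;
    }
}
```

The trade-off is the one described above: ConcurrentDictionary pays for locking and extra allocations on every write, which is exactly what the pre-populated Dictionary trick avoids.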
I've created this normal for loop:
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
Dictionary<string, Dictionary<string, bool>> filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
foreach (var item in files)
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
}
return filesAnalyzed;
}
The loop just checks if each file that is in the variable "files" has all the dependencies specified in the variable "dependencies".
The "files" variable should contain only unique elements, because each element is used as a key in the resulting dictionary; I check this before calling the method.
The sequential loop works correctly and processes all elements on a single thread, so I wanted to improve performance by switching to a Parallel.For loop. The problem is that not all the elements in "files" are processed by the parallel loop (in my test case I get 30 results instead of 53).
I've tried increasing the timespan, and also removing all the "Monitor.TryEnter" code and using just a lock(filesAnalyzed), but I still got the same result.
I'm not very familiar with Parallel.For, so it might be something in the syntax I'm using.
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
Parallel.For<KeyValuePair<string, Dictionary<string, bool>>>(
//start index
0,
//end index
files.Count(),
// initialization?
()=>new KeyValuePair<string, Dictionary<string, bool>>(),
(index, loop, result) =>
{
var temp = new KeyValuePair<string, Dictionary<string, bool>>(
files.ElementAt(index),
AnalyzeFile(files.ElementAt(index), dependencies));
return temp;
}
,
//finally
(x) =>
{
if (Monitor.TryEnter(filesAnalyzed, new TimeSpan(0, 0, 30)))
{
try
{
filesAnalyzed.Add(x.Key, x.Value);
}
finally
{
Monitor.Exit(filesAnalyzed);
}
}
}
);
return filesAnalyzed;
}
Any feedback is appreciated.
Assuming the code inside AnalyzeFile and dependencies is thread-safe, how about something like this:
var filesAnalyzed = files
.AsParallel()
.Select(x => new { Item = x, File = AnalyzeFile(x, dependencies) })
.ToDictionary(x => x.Item, x => x.File);
Rewrite your normal loop this way:
Parallel.ForEach(files, item =>
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
You should also use ConcurrentDictionary instead of Dictionary to make the whole process thread-safe.
You can simplify your code a lot if you use Parallel LINQ instead:
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = ( from item in files.AsParallel()
let result=AnalyzeFile(item, dependencies)
select (Item:item,Result:result)
).ToDictionary( it=>it.Item,it=>it.Result);
return filesAnalyzed;
}
I used tuple syntax in this case to avoid noise. It also cuts down on allocations, since value tuples are structs while anonymous types are classes.
Using method syntax, the same can be written as:
var filesAnalyzed = files.AsParallel()
.Select(item => (Item: item, Result: AnalyzeFile(item, dependencies)))
.ToDictionary(it => it.Item, it => it.Result);
Dictionary<> isn't thread-safe for modification. If you wanted to use Parallel.ForEach without locking, you'd have to use ConcurrentDictionary:
var filesAnalyzed = new ConcurrentDictionary<string, Dictionary<string, bool>>();
Parallel.ForEach(files, item => {
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
In this case at least, there is no benefit in using Parallel over PLINQ.
It's hard to say exactly what is going wrong without debugging the code. Just looking at it, though, I would have used a ConcurrentDictionary for the filesAnalyzed variable instead of a normal Dictionary and got rid of the Monitor.
I would also check whether the same key already exists in the filesAnalyzed dictionary; it could be that you are trying to add a kvp with a key that has already been added to the dictionary.
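One likely cause of the missing items: with the localInit/localFinally overload of Parallel.For, localFinally runs once per worker task, not once per index, and the question's body returns a fresh KeyValuePair on every call, discarding the previous local value. Only the last file each worker handled would then reach filesAnalyzed, which would fit seeing 30 results instead of 53. (A Monitor.TryEnter timeout would also drop items silently.) A sketch of the thread-local pattern that accumulates instead, with a hypothetical AnalyzeFile stand-in:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FileAnalysis
{
    // Hypothetical stand-in for the question's AnalyzeFile; assumed thread-safe.
    public static Dictionary<string, bool> AnalyzeFile(string file, IEnumerable<string> dependencies) =>
        dependencies.ToDictionary(dep => dep, dep => file.Contains(dep));

    public static Dictionary<string, Dictionary<string, bool>> AnalyzeFiles(
        IEnumerable<string> files, IEnumerable<string> dependencies)
    {
        var fileArray = files.ToArray(); // index once instead of calling ElementAt per iteration
        var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>(fileArray.Length);
        var sync = new object();

        Parallel.For(
            0, fileArray.Length,
            // localInit: one private list per worker task
            () => new List<KeyValuePair<string, Dictionary<string, bool>>>(),
            // body: append to the worker's list and hand the SAME list back
            (index, loopState, local) =>
            {
                var file = fileArray[index];
                local.Add(new KeyValuePair<string, Dictionary<string, bool>>(
                    file, AnalyzeFile(file, dependencies)));
                return local;
            },
            // localFinally: runs once per worker, so merge everything it collected
            local =>
            {
                lock (sync)
                {
                    foreach (var pair in local) filesAnalyzed.Add(pair.Key, pair.Value);
                }
            });

        return filesAnalyzed;
    }
}
```

The key change is that the body returns the accumulated local state rather than a new value, so nothing is lost before localFinally runs.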
I am using Linq to Sql.
Here is the code:
Dictionary<string, int> allResults;
using (var dc= new MyDataContext())
{
dc.CommandTimeout = 0;
allResults = dc.MyTable.ToDictionary(x => x.Text, x => x.Id);
}
It runs on a 64-bit machine and is compiled as AnyCPU. It throws a System.OutOfMemoryException.
This accesses an SQL Server database. The Id field maps to an SQL Server int field, and the Text field maps to a Text (nvarchar(max)) field. Running select COUNT(*) from TableName returns 1,173,623 records, and running select sum(len(Text)) from TableName returns 48,915,031. Since int is a 32-bit integer, the ids should take only 4.69 MB of space and the strings less than 1 GB, so we are not even bumping against the 2 GB per-object limit.
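Spelling out that estimate, assuming .NET strings store UTF-16 (2 bytes per character) and ignoring per-object, per-entry and LINQ to SQL overhead:

$$1{,}173{,}623 \times 4\,\text{B} = 4{,}694{,}492\,\text{B} \approx 4.69\,\text{MB}$$

$$48{,}915{,}031 \times 2\,\text{B} = 97{,}830{,}062\,\text{B} \approx 97.8\,\text{MB}$$

So the raw character data itself is on the order of 100 MB.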
I then change the code in this way:
Dictionary<string, int> allResults;
using (var dc = new MyDataContext())
{
Dictionary<string, int> threeHundred;
dc.CommandTimeout = 0;
var tot = dc.MyTable.Count();
allResults = new Dictionary<string, int>(tot);
int skip = 0;
int takeThis = 300000;
while (skip < tot)
{
threeHundred = dc.MyTable.Skip(skip).Take(takeThis).ToDictionary(x => x.Text, x => x.Id);
skip = skip + takeThis;
allResults = allResults.Concat(threeHundred).ToDictionary(x => x.Key, x => x.Value);
threeHundred = null;
GC.Collect();
}
}
I found that garbage collection does not help here, and that the OutOfMemoryException is thrown on the first line of the while loop once skip = 900,000.
What is wrong and how do I fix this?
Without getting into your calculations of how much it should take in memory (as there could be issues of encoding that could easily double the size of the data), I'll try to give a few pointers.
Starting with the cause of the issue - my guess is that the threeHundred dictionary is causing a lot of allocations.
When you add items to a dictionary like above, the dictionary has no way of knowing how many items it should pre-allocate, so it repeatedly reallocates and copies all its data into larger internal storage as it grows.
Set an initial size (using the constructor) on the threeHundred dictionary before adding any items to it.
Please read this article I've published which goes in-depth into Dictionary internals - I'm sure it will shed some light on those symptoms.
http://www.codeproject.com/Articles/500644/Understanding-Generic-Dictionary-in-depth
In addition, when populating this large amount of data, I suggest taking full control of the process.
My suggestion:
Pre-allocate slots in the Dictionary (using a Count query directly on the DB, and passing it to the Dictionary ctor)
Work with DataReader for populating those items without loading all of the query result into memory.
If you know for a fact in advance (this is VERY important) that there are many duplicated strings, consider using string.Intern; only do this when there really are many duplicates, and test to see how it behaves.
Memory-profile the code: you should see only ONE allocation for the Dictionary, plus one string per item returned by the query (int is a value type, so it is not allocated on the heap as a separate object; it sits inside the Dictionary's entries).
Either way, check whether you are running as a 32-bit or a 64-bit process: for .NET 4.5 AnyCPU executables, Visual Studio enables "Prefer 32-bit" by default (check it in Task Manager or in the project properties).
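A minimal sketch of the first two suggestions, assuming the row count comes from a separate COUNT query (as dc.MyTable.Count() does in the question) and that rows arrive one at a time, for example from a DataReader loop. PresizedLoad and the (Text, Id) row tuple are hypothetical names, not part of LINQ to SQL.

```csharp
using System.Collections.Generic;

public static class PresizedLoad
{
    // rows is a hypothetical stand-in for a streaming source, e.g. a DataReader loop doing
    // yield return (reader.GetString(0), reader.GetInt32(1));
    public static Dictionary<string, int> Load(int expectedCount, IEnumerable<(string Text, int Id)> rows)
    {
        // Sized once up front, so the internal arrays are not regrown while loading.
        var result = new Dictionary<string, int>(expectedCount);
        foreach (var (text, id) in rows)
        {
            result.Add(text, id); // throws on a duplicate Text value, as ToDictionary would
        }
        return result;
    }
}
```

Because rows are consumed one at a time, only the dictionary and its strings need to be resident, rather than an intermediate materialized result plus repeated Concat/ToDictionary copies.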
Hope this helps,
Ofir.
Recently I ran into the following exception when using a generic dictionary:
An InvalidOperationException has occurred. A collection was modified
I realized that this error was primarily because of thread safety issues on the static dictionary I was using.
A little background: I currently have an application which has 3 different methods that are related to this issue.
Method A iterates through the dictionary using foreach and returns a value.
Method B adds data to the dictionary.
Method C changes the value of the key in the dictionary.
Sometimes while iterating through the dictionary, data is also being added, which is the cause of this issue. I keep getting this exception in the foreach part of my code where I iterate over the contents of the dictionary. In order to resolve this issue, I replaced the generic dictionary with the ConcurrentDictionary and here are the details of what I did.
Aim: My main objective is to completely remove the exception.
For method B (which adds a new key to the dictionary) I replaced .Add with TryAdd.
For method C (which updates the value of the dictionary) I did not make any changes. A rough sketch of the code is as follows:
static public int ChangeContent(int para)
{
foreach (KeyValuePair<string, CustObject> pair in static_container)
{
if (pair.Value.propA != para ) //Pending cancel
{
pair.Value.data_id = prim_id; //I am updating the content
return 0;
}
}
return -2;
}
For method A, I am simply iterating over the dictionary, and this is where the running code stops (in debug mode) and Visual Studio tells me the error occurred. The code I am using is similar to the following:
static public CustObject RetrieveOrderDetails(int para)
{
foreach (KeyValuePair<string, CustObject> pair in static_container)
{
if (pair.Value.cust_id.Equals(symbol))
{
if (pair.Value.OrderStatus != para)
{
return pair.Value; //Found
}
}
}
return null; //Not found
}
Are these changes going to resolve the exception I am getting?
Edit:
This page states that the GetEnumerator method allows you to traverse the elements in parallel with writes (although it may be outdated). Isn't that the same as using foreach?
For modification of elements, one option is to manually iterate the dictionary using a for loop, e.g.:
Dictionary<string, string> test = new Dictionary<string, string>();
int dictionaryLength = test.Count();
for (int i = 0; i < dictionaryLength; i++)
{
test[test.ElementAt(i).Key] = "Some new content";
}
Be wary, though: if you're also adding to the Dictionary, you must increment dictionaryLength (or decrement it if you remove elements) appropriately.
Depending on what exactly you're doing, and if order matters, you may wish to use a SortedDictionary instead.
You could extend this by refreshing dictionaryLength with test.Count() on each iteration, and by keeping an additional list of the keys you've already modified if there's a danger of missing any; it really depends on what you're doing and what your needs are.
Alternatively, you can take a snapshot of the keys with test.Keys.ToList(); that option works as follows:
Dictionary<string, string> test = new Dictionary<string, string>();
List<string> keys = test.Keys.ToList();
foreach (string key in keys)
{
test[key] = "Some new content";
}
IEnumerable<string> newKeys = test.Keys.ToList().Except(keys);
if (newKeys.Any())
{
// Do it again or whatever.
}
Note that I've also shown an example of how to find out whether any new keys were added between you getting the initial list of keys, and completing iteration such that you could then loop round and handle the new keys.
Hopefully one of these options will suit (or you may even want to mix and match, for example a for loop over the keys that you update as you go instead of the length); as I say, it depends as much on what precisely you're trying to do as on anything else.
Before the foreach, try copying the container to a new instance:
var unboundContainer = static_container.ToList();
foreach (KeyValuePair<string, CustObject> pair in unboundContainer)
Also, I think updating the Value property is not right from a thread-safety perspective; refactor your code to use TryUpdate() instead.
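Regarding the GetEnumerator question in the edit, a small sketch assuming .NET's ConcurrentDictionary, whose enumerator is documented as safe to use while other threads modify the dictionary (it is not a snapshot, so it may or may not see concurrent additions). ChangeValue shows TryUpdate, which replaces a value only if it still equals the expected one. Note that TryUpdate swaps the whole value; it does not make mutating fields of a shared CustObject thread-safe, as method C does. The ConcurrentOrders names are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentOrders
{
    public static int CountWhileAdding(ConcurrentDictionary<string, int> map, int extra)
    {
        // A second task keeps adding keys while this thread enumerates.
        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < extra; i++) map.TryAdd("new" + i, i);
        });

        int seen = 0;
        foreach (var pair in map)
        {
            seen++; // no "collection was modified" exception is expected here
        }

        writer.Wait();
        return seen;
    }

    // Compare-and-swap style update: succeeds only if the current value equals 'expected'.
    public static bool ChangeValue(ConcurrentDictionary<string, int> map, string key, int expected, int updated)
        => map.TryUpdate(key, updated, expected);
}
```

So foreach over a ConcurrentDictionary does use its GetEnumerator and tolerates concurrent writes; the copy-with-ToList approach above is only needed if you want a stable snapshot.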
I have a Dictionary<int, int> and would like to update certain elements all at once based on their current values, e.g. changing all elements with value 10 to having value 14 or something.
I imagined this would be easy with some LINQ/lambda stuff but it doesn't appear to be as simple as I thought. My current approach is this:
List<KeyValuePair<int, int>> kvps = dictionary.Where(d => d.Value == oldValue).ToList();
foreach (KeyValuePair<int, int> kvp in kvps)
{
dictionary[kvp.Key] = newValue;
}
The problem is that dictionary is pretty big (hundreds of thousands of elements) and I'm running this code in a loop thousands of times, so it's incredibly slow. There must be a better way...
This might be the wrong data structure. You are attempting to look up dictionary entries based on their values which is the reverse of the usual pattern. Maybe you could store Sets of keys that currently map to certain values. Then you could quickly move these sets around instead of updating each entry separately.
I would consider writing your own collection type to achieve this whereby keys with the same value actually share the same value instance such that changing it in one place changes it for all keys.
Something like the following (obviously, lots of code omitted here - just for illustrative purposes):
public class SharedValueDictionary : IDictionary<int, int>
{
private List<MyValueObject> values;
private Dictionary<int, MyValueObject> keys;
// Now, when you add a new key/value pair, you actually
// look in the values collection to see if that value already
// exists. If it does, you add an entry to keys that points to that existing object
// otherwise you create a new MyValueObject to wrap the value and add entries to
// both collections.
}
This scenario would require multiple versions of Add and Remove to allow for changing all keys with the same value, changing only one key of a set to be a new value, removing all keys with the same value and removing just one key from a value set. It shouldn't be difficult to code for these scenarios as and when needed.
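A compact sketch of the shared-value idea, using hypothetical names (SharedValueMap, Cell, ReplaceValue) and showing only Set, the indexer and ReplaceValue rather than a full IDictionary<int, int>. Keys that share a value point at one mutable cell, so changing every 10 into 14 relabels a single cell in O(1) when 14 is not already in use; if it is, the two groups are merged by re-pointing the moved keys.

```csharp
using System.Collections.Generic;

public sealed class SharedValueMap
{
    private sealed class Cell
    {
        public int Value;
        public readonly HashSet<int> Keys = new HashSet<int>();
    }

    private readonly Dictionary<int, Cell> _byKey = new Dictionary<int, Cell>();
    private readonly Dictionary<int, Cell> _byValue = new Dictionary<int, Cell>();

    public int this[int key] => _byKey[key].Value;

    public void Set(int key, int value)
    {
        if (_byKey.TryGetValue(key, out var oldCell))
        {
            if (oldCell.Value == value) return;
            oldCell.Keys.Remove(key);
            if (oldCell.Keys.Count == 0) _byValue.Remove(oldCell.Value);
        }
        if (!_byValue.TryGetValue(value, out var cell))
        {
            cell = new Cell { Value = value };
            _byValue[value] = cell;
        }
        cell.Keys.Add(key);
        _byKey[key] = cell;
    }

    // Makes every key currently mapped to oldValue map to newValue.
    public void ReplaceValue(int oldValue, int newValue)
    {
        if (oldValue == newValue || !_byValue.TryGetValue(oldValue, out var cell)) return;
        _byValue.Remove(oldValue);

        if (_byValue.TryGetValue(newValue, out var target))
        {
            // Both values already in use: merge by re-pointing the moved keys (O(k)).
            foreach (var key in cell.Keys)
            {
                target.Keys.Add(key);
                _byKey[key] = target;
            }
        }
        else
        {
            // Fast path: relabel the shared cell without touching any key.
            cell.Value = newValue;
            _byValue[newValue] = cell;
        }
    }
}
```

The extra memory (a cell per distinct value plus per-cell key sets) is the price for turning "update every entry with value X" into a single relabel in the common case.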
You need to generate a new dictionary:
d = d.ToDictionary(w => w.Key, w => w.Value == 10 ? 14 : w.Value);
I think the thing that everybody must be missing is that it is exceeeeedingly trivial:
List<int> keys = dictionary.Keys.Where(d => d == oldValue).ToList();
You are NOT looking up keys by value (as has been offered by others).
Instead, keys.SingleOrDefault() will now by definition return the single key that equals oldValue if it exists in the dictionary. So the whole code should simplify to
if (dictionary.ContainsKey(oldValue))
dictionary[oldValue] = newValue;
That is quick. Now I'm a little concerned that this might indeed not be what the OP intended, but it is what he had written. So if the existing code does what he needs, he will now have a highly performant version of the same :)
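After the edit, the direct approach looks like this. Note that KeyValuePair<TKey, TValue>.Value is read-only, so assigning kvp.Value inside the foreach does not compile; the update has to go through the indexer, over a snapshot of the matching keys so that the dictionary is not modified while it is being enumerated:
foreach (var key in dictionary.Where(d => d.Value == oldValue).Select(d => d.Key).ToList())
{
dictionary[key] = newValue;
}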
I have this code and it all seems to work fine, but I'm not sure why, or whether it's valid.
Dictionary<string, List<string>> test = new Dictionary<string, List<string>>();
while (test.Count > 0)
{
var obj = test.Last();
MyMethod(obj);
test.Remove(obj.Key);
}
Update: Thanks for the answers, I have updated my code to explain why I don't do Dictionary.Clear();
I don't understand why you are trying to process all Dictionary entries in reverse order - but your code is OK.
It might be a bit faster to get a list of all Keys and process the entries by key instead of counting again and again...
E.G.:
var keys = test.Keys.OrderByDescending(o => o).ToList();
foreach (var key in keys)
{
var obj = test[key];
MyMethod(obj);
test.Remove(key);
}
Dictionaries are fast when they are accessed by key. Last() is slower, and counting is not necessary: you can get a list of all (unique) keys.
There is nothing wrong with mutating a collection type in a while loop in this manner. Where you get into trouble is when you mutate a collection during a foreach block, or, more generally, when you use an IEnumerator<T> after the underlying collection is mutated.
Although in this sample it would be a lot simpler to just call test.Clear() :)
That works fine, since you're not iterating over the dictionary while removing items. Each time you check test.Count, the current count is read afresh.
That being said, the above code could be written much simpler and more effectively:
test.Clear();
It works because Count is updated every time you remove an object. Say the count is 3: test.Remove decrements it to 2, and so on, until the count is 0 and you break out of the loop.
Yes, this should be valid, but why not just call Dictionary.Clear()?
All you're doing is taking the last item in the collection and removing it until there are no more items left in the Dictionary.
Nothing out of the ordinary and there's no reason it shouldn't work (as long as emptying the collection is what you want to do).
So, you're just trying to clear the Dictionary, correct? Couldn't you just do the following?
Dictionary<string, List<string>> test = new Dictionary<string, List<string>>();
test.Clear();
This seems like it will work, but it looks extremely expensive. It would be a problem if you were iterating over it with a foreach loop (you can't modify collections while you're iterating over them).
Dictionary.Clear() should do the trick (but you probably already knew that).
Despite your update, you can probably still use clear...
foreach(var item in test) {
MyMethod(item);
}
test.Clear();
Your call to .Last() is going to be extremely inefficient on a large dictionary, and won't guarantee any particular ordering of the processing regardless (the Dictionary is an unordered collection).
I used this code to remove items conditionally.
var dict = new Dictionary<String, float>();
var keys = new String[dict.Count];
dict.Keys.CopyTo(keys, 0);
foreach (var key in keys) {
var v = dict[key];
if (condition) {
dict.Remove(key);
}