C# - Interlocked Increment on a Dictionary-Like List

I know that an int stored in the dictionary won't have a fixed position in memory, so it simply can't work like that.
But the exact same portion of code will be run concurrently with different names, parameters, etc.
I essentially need to pass in a "Name" string and then somehow increment one of the items in my int array.
Dictionary<string, int> intStats = new Dictionary<string, int>();
This dictionary stores all the stats based on the "Name" supplied as the dictionary's string key.
And since I'm using a LOT of multi-threading, I want to keep the int counts as synchronized as possible, which is why I'm attempting to use Interlocked.Increment(ref intStats[theName]);
But unfortunately this won't work.
Are there any alternatives that would work for my situation?

First, I suggest creating a custom type that captures the semantics of your abstract data type. That way you can experiment with different implementations, and that way your call sites become self-documenting.
internal sealed class NameCounter
{
    public int GetCount(string name) { ... }
    public void Increment(string name) { ... }
}
So: what implementation choices might you make, given that this must be threadsafe?
A private Dictionary<string, int> would work, but you'd have to lock the dictionary on every access, which could get expensive.
A private ConcurrentDictionary<string, int> would also work, but keep in mind that you have to use TryUpdate in a loop to make sure you don't lose values.
Make a wrapper type:
internal sealed class MutableInt
{
    public int Value;
}
This is one of the rare cases when you'd want to make a public field. Now make a ConcurrentDictionary<string, MutableInt>, and then call Interlocked.Increment on the public field. Now you don't have to use TryUpdate, but there is still a race here: if two threads both attempt to add the same name for the first time at the same moment, you have to make sure that only one of them wins. Use AddOrUpdate carefully to ensure that this race doesn't happen.
Implement your own concurrent dictionary as a hash table that indexes into an int array, and call Interlocked.Increment on elements of the array. Again, you'll have to be extremely careful when a new name is introduced into the system to ensure that hash collisions are detected in a threadsafe manner.
Hash the string to one of n buckets, but this time the buckets are immutable dictionaries. Each bucket has a lock; lock the bucket, create a new dictionary from the old one, put it back in the bucket, unlock the bucket. If there is contention, increase n until it goes away.
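Below is a minimal sketch of the second and third options, assuming names are only ever added, never removed. `TryUpdateNameCounter` and `WrapperNameCounter` are hypothetical names that follow the `NameCounter` shape from the answer. For the first-insertion race, the wrapper version uses `ConcurrentDictionary.GetOrAdd` rather than `AddOrUpdate`; that is one way of letting exactly one thread win, not necessarily the one the answer had in mind.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Option 2: plain int values, updated with a compare-and-swap style retry loop.
internal sealed class TryUpdateNameCounter
{
    private readonly ConcurrentDictionary<string, int> _counts =
        new ConcurrentDictionary<string, int>();

    public int GetCount(string name) =>
        _counts.TryGetValue(name, out int count) ? count : 0;

    public void Increment(string name)
    {
        while (true)
        {
            if (_counts.TryGetValue(name, out int current))
            {
                // Succeeds only if no other thread changed the value since we read it.
                if (_counts.TryUpdate(name, current + 1, current)) return;
            }
            else if (_counts.TryAdd(name, 1))
            {
                return;
            }
            // Lost a race with another thread: read the value again and retry.
        }
    }
}

// Option 3: one mutable box per name, incremented with Interlocked.
internal sealed class MutableInt
{
    public int Value;
}

internal sealed class WrapperNameCounter
{
    private readonly ConcurrentDictionary<string, MutableInt> _counts =
        new ConcurrentDictionary<string, MutableInt>();

    public int GetCount(string name) =>
        _counts.TryGetValue(name, out MutableInt box) ? Volatile.Read(ref box.Value) : 0;

    public void Increment(string name)
    {
        // If two threads add the same new name at once, the factory may run twice,
        // but only one MutableInt is stored and both threads get that stored instance.
        MutableInt box = _counts.GetOrAdd(name, _ => new MutableInt());
        Interlocked.Increment(ref box.Value);
    }
}
```

Either class can sit behind the `NameCounter` shape, so call sites such as `counter.Increment(theName)` stay the same if you later swap implementations.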


Complexity of searching in a list and in a dictionary

Let's say I have a class:
class C
{
    public int uniqueField;
    public int otherField;
}
This is a very simplified version of the actual problem. I want to store multiple instances of this class, where "uniqueField" should be unique for each instance.
What is better in this case?
a) Dictionary with uniqueField as the key
Dictionary<int, C> d;
or b) List?
List<C> l;
In the first case (a) the same data would be stored twice (as the key and as a field of the class instance). But the question is: is it faster to find an element in a dictionary than in a list? Or are they equally fast?
a)
d[searchedUniqueField]
b)
l.Find(x=>x.uniqueField==searchedUniqueField);
Assuming you've got quite a lot of instances, it's likely to be much faster to find the item in the dictionary. Basically a Dictionary<,> is a hash table, with O(1) lookup other than due to collisions.
Now if the collection is really small, then the extra overhead of finding the hash code, computing the right bucket and then looking through that bucket for matching hash codes, then performing a key equality check can take longer than just checking each element in a list.
If you might have a lot of instances but might not, I'd usually pick the dictionary approach. For one thing it expresses what you're actually trying to achieve: a simple way of accessing an element by a key. The overhead for small collections is unlikely to be very significant unless you have far more small collections than large ones.
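A small sketch of both lookups side by side, reusing the question's class `C`. `LookupDemo` and its method names are made up for illustration; the dictionary version uses `TryGetValue` so a missing key returns null instead of throwing, which is an assumption about how you want misses handled.

```csharp
using System.Collections.Generic;
using System.Linq;

class C
{
    public int uniqueField;
    public int otherField;
}

static class LookupDemo
{
    // Linear scan: compares elements one by one, O(n).
    public static C FindInList(List<C> list, int searchedUniqueField) =>
        list.Find(x => x.uniqueField == searchedUniqueField);

    // Hash lookup: O(1) on average; the d[key] indexer would throw for a missing key.
    public static C FindInDictionary(Dictionary<int, C> d, int searchedUniqueField) =>
        d.TryGetValue(searchedUniqueField, out C found) ? found : null;

    // Builds the index from existing instances; throws if two share a uniqueField.
    public static Dictionary<int, C> Index(IEnumerable<C> items) =>
        items.ToDictionary(c => c.uniqueField);
}
```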
Use Dictionary when the number of lookups greatly exceeds the number of insertions. It is fine to use List when you will always have fewer than four items.
Reference - http://www.dotnetperls.com/dictionary-time
If you want to ensure that your clients cannot create duplicate keys, you may want your class to be responsible for creating the unique key. Once generating the unique key is the class's responsibility, whether to use a dictionary or a list is the client's decision.
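One way to make the class responsible for its own key is a sketch like the following, assuming sequential integer ids are acceptable. `SelfKeyedItem` is a hypothetical stand-in for the question's `C`; the static counter is bumped with `Interlocked.Increment` so ids stay unique even if instances are created on several threads.

```csharp
using System.Threading;

class SelfKeyedItem
{
    // Shared across all instances; Interlocked keeps id generation thread-safe.
    private static int _lastId;

    public SelfKeyedItem(int otherField)
    {
        UniqueField = Interlocked.Increment(ref _lastId);
        OtherField = otherField;
    }

    // Get-only: assigned once by the class itself, so callers cannot create duplicates.
    public int UniqueField { get; }
    public int OtherField { get; set; }
}
```

Since the key is generated internally, the caller can then store instances in either a `List<SelfKeyedItem>` or a `Dictionary<int, SelfKeyedItem>` keyed by `UniqueField`.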

Dictionary with DateTime+string key, and automatic removal of old entries

I have an application that receives certain "events", each uniquely identified by a 12-character string and a DateTime. Each event is associated with a result, which is a string.
I need to keep these events in memory (for a maximum of, for example, 8 hours) and, if I receive the same event a second time, be able to tell that I've already received it (within the last 8 hours).
There will be fewer than 1000 events to store.
I can't use an external storage, it has to be done in memory.
My idea is to use a Dictionary where the key is a class composed of a string and a datetime, the value is the result.
EDIT: the string itself (actually a MAC address) does not uniquely identify the event; it's the MAC AND the DateTime combined that are unique, which is why the key must be formed from both.
The application is a server that receives a certain event from a client: the event is marked on the client by client MAC and by the client datetime (can't use a guid).
It may happen that the client retransmits the same data, and by checking the dictionary for that MAC/Datetime key I would know that I have already received that data.
Then, every hour (for example), I can foreach through the whole collection and remove all the keys where datetime is older than 8 hours.
Can you suggest a better approach to the problem, or better data formats than the ones I have chosen, in terms of performance and cleanliness of the code?
Or a better way to delete old data, with LINQ for example.
Thanks,
Mattia
The event time must not be part of the key; if it is, how are you going to tell that you have already received this event? So you should move to a dictionary where the keys are event names and the values are tuples of date and result.
Once in a while you can trim old data from the dictionary easily with LINQ:
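// Requires using System.Linq; the cutoff is computed once instead of per element.
DateTime cutoff = DateTime.Now.AddHours(-8);
dictionary = dictionary
    .Where(p => p.Value.DateOfEvent >= cutoff)
    .ToDictionary(p => p.Key, p => p.Value);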
If requirements state that updating once per hour is good enough, and you're never having more than 1000 items in the dictionary, your solution should be perfectly adequate and probably the most easily understood by anyone else looking at your code. I'd probably recommend immutable structs for the key instead of classes, but that's it.
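A sketch of what an immutable struct key for the MAC + DateTime pair might look like. `EventKey` is a hypothetical name; `readonly struct` needs C# 7.2 or later, and implementing `IEquatable<T>` lets `Dictionary` compare keys without boxing.

```csharp
using System;

// Immutable key combining the client's MAC address and its event timestamp.
public readonly struct EventKey : IEquatable<EventKey>
{
    public EventKey(string macAddress, DateTime time)
    {
        MacAddress = macAddress;
        Time = time;
    }

    public string MacAddress { get; }
    public DateTime Time { get; }

    public bool Equals(EventKey other) =>
        string.Equals(MacAddress, other.MacAddress, StringComparison.Ordinal) && Time == other.Time;

    public override bool Equals(object obj) => obj is EventKey other && Equals(other);

    public override int GetHashCode()
    {
        unchecked
        {
            // Combine both fields so keys differing only in time land in different buckets.
            int hash = MacAddress == null ? 0 : StringComparer.Ordinal.GetHashCode(MacAddress);
            return (hash * 397) ^ Time.GetHashCode();
        }
    }
}
```

With this, `Dictionary<EventKey, string>` plus the hourly cleanup described in the question stays short and readable.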
If there's a benefit to removing them immediately rather than once per hour, you could do something where you also add a Timer that removes it after exactly 8 hours, but then you've got to deal with thread safety and cleaning up all the timers and such. Likely not worth it.
I'd avoid the OrderedDictionary approach since it's more code, and it may be slower, since removing an entry requires a linear search and a shift of its internal list.
It's a common mantra these days to focus first on keeping code simple and only optimize when necessary. Until you have a known bottleneck and have profiled it, you never know if you're even optimizing the right thing. (And from your description, there's no telling which part will be slowest without profiling it.)
I would go for a Dictionary.
This way you can search for the string very fast (an O(1) operation).
Other collections are slower:
OrderedDictionary: is slower because it stores keys and values as object, which means casting (and boxing/unboxing for value types).
SortedDictionary: performs an O(log n) operation.
All normal arrays and lists: perform a linear O(n) search (about n/2 comparisons on average).
An example:
public class Event
{
    public Event(string macAddress, DateTime time, string data)
    {
        MacAddress = macAddress;
        Time = time;
        Data = data;
    }
    public string MacAddress { get; set; }
    public DateTime Time { get; set; }
    public string Data { get; set; }
}
public class EventCollection
{
    private readonly Dictionary<Tuple<string, DateTime>, Event> _Events = new Dictionary<Tuple<string, DateTime>, Event>();
    public void Add(Event e)
    {
        _Events.Add(new Tuple<string, DateTime>(e.MacAddress, e.Time), e);
    }
    public IList<Event> GetOldEvents(bool autoRemove)
    {
        DateTime old = DateTime.Now - TimeSpan.FromHours(8);
        List<Event> results = new List<Event>();
        foreach (Event e in _Events.Values)
        {
            if (e.Time < old)
                results.Add(e);
        }
        // Clean up (removal happens after enumeration, so the dictionary is not modified mid-loop)
        if (autoRemove)
        {
            foreach (Event e in results)
                _Events.Remove(new Tuple<string, DateTime>(e.MacAddress, e.Time));
        }
        return results;
    }
}
I would use an OrderedDictionary where the key is the 12-character identifier and the result and DateTime are part of the value. Sadly, OrderedDictionary is not generic (keys and values are objects), so you would need to do the casting and type checking yourself. When you need to remove the old events, you can foreach through the OrderedDictionary and stop when you get to a time new enough to keep. This assumes the DateTimes you use are in order when you add them to the dictionary.
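A minimal sketch of that OrderedDictionary approach, assuming events arrive in time order. `EventRecord` and `RecentEvents` are hypothetical names; the casts show the type handling the non-generic collection forces on you. Note that `RemoveAt(0)` shifts the internal list, so each removal is O(n), which is unlikely to matter with fewer than 1000 events.

```csharp
using System;
using System.Collections.Specialized;

sealed class EventRecord
{
    public EventRecord(DateTime time, string result) { Time = time; Result = result; }
    public DateTime Time { get; }
    public string Result { get; }
}

sealed class RecentEvents
{
    // Key: 12-character identifier; value: EventRecord, stored as object.
    private readonly OrderedDictionary _events = new OrderedDictionary();

    public int Count => _events.Count;

    // Returns false if this identifier has already been seen.
    public bool TryAdd(string id, DateTime time, string result)
    {
        if (_events.Contains(id)) return false;
        _events.Add(id, new EventRecord(time, result));
        return true;
    }

    public void RemoveOlderThan(DateTime cutoff)
    {
        // Oldest entry is at index 0 (insertion order); stop at the first one worth keeping.
        while (_events.Count > 0 && ((EventRecord)_events[0]).Time < cutoff)
        {
            _events.RemoveAt(0);
        }
    }
}
```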

Proper class definition and usage - thread safe - ASP.net

I wonder how to define a class properly and use it safely, I mean thread-safely, when thousands of concurrent calls are being made by website visitors.
I wrote something like the code below, but I wonder whether it is built properly.
public static class csPublicFunctions
{
    private static Dictionary<string, clsUserTitles> dicAuthorities;
    static csPublicFunctions()
    {
        dicAuthorities = new Dictionary<string, clsUserTitles>();
        using (DataTable dtTemp = DbConnection.db_Select_DataTable("select * from myTable"))
        {
            foreach (DataRow drw in dtTemp.Rows)
            {
                clsUserTitles tempCLS = new clsUserTitles();
                tempCLS.irAuthorityLevel = Int32.Parse(drw["Level"].ToString());
                tempCLS.srTitle_tr = drw["Title_tr"].ToString();
                tempCLS.srTitle_en = drw["Title_en"].ToString();
                dicAuthorities.Add(drw["authorityLevel"].ToString(), tempCLS);
            }
        }
    }
    public class clsUserTitles
    {
        private string Title_tr;
        public string srTitle_tr
        {
            get { return Title_tr; }
            set { Title_tr = value; }
        }
        private string Title_en;
        public string srTitle_en
        {
            get { return Title_en; }
            set { Title_en = value; }
        }
        private int AuthorityLevel;
        public int irAuthorityLevel
        {
            get { return AuthorityLevel; }
            set { AuthorityLevel = value; }
        }
    }
    public static clsUserTitles returnUserTitles(string srUserAuthority)
    {
        return dicAuthorities[srUserAuthority];
    }
}
The dictionary will be initialized only once; nothing is added, removed, or updated later.
Dictionary supports thread-safe reading. Here is the proof from MSDN:
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.
So, if you only plan to read data from it, it should work. However, I doubt that your dictionary is filled only once and never modified while your application runs. In that case, everyone else in this thread is correct: you need to synchronize access to this dictionary, and it is best to use a ConcurrentDictionary.
Now, I want to say a couple of words about the design itself. If you want to store shared data between users, use the ASP.NET Cache instead, which was designed for such purposes.
A quick look through your code and it seems to me that your first problem will be the publicly available dictionary dicAuthorities. Dictionaries are not thread safe. Depending on what you want to do with that Dictionary, you'll need to implement something that regulates access to it. See this related question:
Making dictionary access thread-safe?
As the others have said, Dictionary<TKey,TValue> is not inherently thread-safe. However, if your usage scenario is:
Fill the dictionary on startup
Use that dictionary as lookup while the application is running
Never add or remove values after startup
then you should be fine.
However, if you use .NET 4.5, I would recommend making the third point explicit by using a ReadOnlyDictionary.
So, your implementation might look like this (I also changed the coding style to be more C#-friendly):
// ReadOnlyDictionary lives in System.Collections.ObjectModel (.NET 4.5+).
private static readonly ReadOnlyDictionary<string, UserTitles> authorities;
static PublicFunctions()
{
    // Fill an ordinary dictionary first, then wrap it so it cannot be modified later.
    Dictionary<string, UserTitles> authoritiesFill = new Dictionary<string, UserTitles>();
    using (DataTable dtTemp = DbConnection.db_Select_DataTable("select * from myTable"))
    {
        foreach (DataRow drw in dtTemp.Rows)
        {
            UserTitles userTitle = new UserTitles
            {
                AuthorityLevel = Int32.Parse(drw["Level"].ToString()),
                TitleTurkish = drw["Title_tr"].ToString(),
                TitleEnglish = drw["Title_en"].ToString()
            };
            authoritiesFill.Add(drw["authorityLevel"].ToString(), userTitle);
        }
    }
    authorities = new ReadOnlyDictionary<string, UserTitles>(authoritiesFill);
}
I've also added a readonly modifier to the declaration itself, because this way you can be sure that it won't be replaced at runtime by another dictionary.
No, your code is not thread-safe.
[EDIT does not apply - set/created inside static constructor] Dictionary (as pointed out in System Down's answer) is not thread-safe while being updated. The dictionary is not read-only, so there is no way to guarantee that it is not modified over time.
[EDIT does not apply - set/created inside static constructor] Initialization is not protected by any locks, so you could end up with multiple initializations at the same time.
Your entries are mutable, so it is very hard to reason about whether you get a consistent value for each entry.
[EDIT does not apply - only modified in static constructor] The field that holds the dictionary is not read-only; depending on the code, you may end up with inconsistent data if you do not cache the reference to the dictionary itself.
Side note: try to follow coding guidelines for C# and call classes starting with upper case MySpecialClass and have names that reflect purpose of the class (or clearly sample names).
EDIT: most of my points do not apply as the only initialization of the dictionary is inside static constructor. Which makes initialization safe from thread-safety point of view.
Note that initialization inside a static constructor happens at a non-deterministic moment "before first use". That can lead to unexpected behavior, e.g. when access to the DB may use the wrong "current" user account.
The answer to your question is no, it's not thread safe. Dictionary is not a thread-safe collection. If you want to use a thread-safe dictionary then use ConcurrentDictionary.
Besides that, it's difficult to say whether your csPublicFunctions is thread-safe or not because it depends on how you handle your database connections inside the call to DbConnection.db_Select_DataTable
The public Dictionary is not the only thread-safety problem.
Yes, filling the dictionary is thread-safe. But any other modification of this dictionary is not thread-safe. As was written above, ConcurrentDictionary could help.
Another problem is that your clsUserTitles class is not thread-safe either.
If clsUserTitles is used only for reading, you could make each property setter of clsUserTitles private and initialize these properties from the clsUserTitles constructor.
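A sketch of that immutable version, using the more C#-style names from the ReadOnlyDictionary answer. It uses get-only auto-properties (C# 6+), a slightly stricter variant of private setters; `UserTitles` and its constructor signature are illustrative assumptions, not the original code.

```csharp
// Immutable replacement for clsUserTitles: values are fixed at construction,
// so instances can be shared between request threads without locking.
public sealed class UserTitles
{
    public UserTitles(int authorityLevel, string titleTurkish, string titleEnglish)
    {
        AuthorityLevel = authorityLevel;
        TitleTurkish = titleTurkish;
        TitleEnglish = titleEnglish;
    }

    public int AuthorityLevel { get; }
    public string TitleTurkish { get; }
    public string TitleEnglish { get; }
}
```

In the static constructor, each row would then become `new UserTitles(Int32.Parse(drw["Level"].ToString()), drw["Title_tr"].ToString(), drw["Title_en"].ToString())`.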

Parallel optimisation of string comparison

I'm trying to optimise the performance of a string comparison operation on each string key of a dictionary used as a database query cache. The current code looks like:
public void Clear(string tableName)
{
    foreach (string key in cache.Keys.Where(key => key.IndexOf(tableName, StringComparison.Ordinal) >= 0).ToList())
    {
        cache.Remove(key);
    }
}
I'm new to using C# parallel features and am wondering what the best way would be to convert this into a parallel operation so that multiple string comparisons can happen 'simultaneously'. The cache can often get quite large so maintenance on it with Clear() can get quite costly.
Make your cache object a ConcurrentDictionary and use TryRemove instead of Remove.
This will make your cache thread-safe; then you can invoke your current foreach loop like this:
Parallel.ForEach(cache.Keys, key =>
{
    if (key.IndexOf(tableName, StringComparison.Ordinal) >= 0)
    {
        // The removed value is not needed, so discard it (C# 7 'out _').
        cache.TryRemove(key, out _);
    }
});
Hope that gives you a starting point.
Your approach can't work well on a Dictionary<string, Whatever> because that class isn't thread-safe for multiple writers, so the simultaneous deletes could cause all sorts of problems.
You will therefore have to use a lock to synchronise the removals, which makes access to the dictionary essentially single-threaded. About the only thing that can safely be done across threads simultaneously is the comparison in the Where.
You could use ConcurrentDictionary because its use of striped locks will reduce this impact. It still doesn't seem the best approach though.
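Following the point that only the comparison in the Where can safely run in parallel, here is a sketch that parallelizes just that step with PLINQ (`AsParallel`) and keeps the removals on the calling thread. `CacheMaintenance` is a hypothetical name, and this assumes no other thread writes to the cache while `Clear` runs, since `Dictionary` only tolerates concurrent readers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CacheMaintenance
{
    public static void Clear<TValue>(Dictionary<string, TValue> cache, string tableName)
    {
        // Parallel phase: read-only string comparisons across the keys.
        List<string> doomed = cache.Keys
            .AsParallel()
            .Where(key => key.IndexOf(tableName, StringComparison.Ordinal) >= 0)
            .ToList();

        // Sequential phase: Dictionary does not support concurrent writers.
        foreach (string key in doomed)
        {
            cache.Remove(key);
        }
    }
}
```

Whether this beats the sequential version depends on the cache size and key lengths, so it is worth measuring before adopting it.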
If you build your keys from strings such that the test is whether a key starts with a sub-key, and removing everything under a sub-key is a frequent need, then you could try using a Dictionary<string, Dictionary<string, Whatever>>. Adding or updating becomes a bit more expensive, but clearing becomes an O(1) removal of just the one value from the higher-level dictionary.
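A sketch of that two-level layout, assuming every cached query can be filed under exactly one table name (queries touching several tables would need to be filed under each, or handled separately). `TableCache` is a hypothetical name and the class is not thread-safe.

```csharp
using System.Collections.Generic;

// Outer key: table name; inner key: the rest of the query key.
public sealed class TableCache<TValue>
{
    private readonly Dictionary<string, Dictionary<string, TValue>> _byTable =
        new Dictionary<string, Dictionary<string, TValue>>();

    public void Set(string tableName, string queryKey, TValue value)
    {
        if (!_byTable.TryGetValue(tableName, out Dictionary<string, TValue> inner))
        {
            inner = new Dictionary<string, TValue>();
            _byTable[tableName] = inner;
        }
        inner[queryKey] = value;
    }

    public bool TryGet(string tableName, string queryKey, out TValue value)
    {
        if (_byTable.TryGetValue(tableName, out Dictionary<string, TValue> inner))
        {
            return inner.TryGetValue(queryKey, out value);
        }
        value = default(TValue);
        return false;
    }

    // Clearing a table is one removal from the outer dictionary, with no key scan.
    public bool Clear(string tableName) => _byTable.Remove(tableName);
}
```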
I've used Dictionaries as caches before, and what I used to do was clean up the cache "on the fly": with each entry I also stored its time of inclusion, and any time an entry was requested I removed the old entries. The performance hit was minimal for me, but if needed you could implement a Queue (of Tuple<DateTime, TKey>, where TKey is the type of the keys in your dictionary) as an index holding these timestamps, so you don't need to iterate over the entire dictionary every time. Anyway, if you have to think about these issues, it's time to consider using a dedicated caching server. For me, Shared Cache (http://sharedcache.codeplex.com) has been good enough.
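A sketch of the timestamp-queue index described above. `TimedCache` is a hypothetical name; it takes the current time as a parameter (assumed never to go backwards) instead of reading `DateTime.Now`, uses value tuples (C# 7), and is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

public sealed class TimedCache<TKey, TValue>
{
    private readonly Dictionary<TKey, (DateTime Added, TValue Value)> _entries =
        new Dictionary<TKey, (DateTime Added, TValue Value)>();

    // Oldest insertion at the front, so eviction never scans the whole dictionary.
    private readonly Queue<Tuple<DateTime, TKey>> _byAge = new Queue<Tuple<DateTime, TKey>>();
    private readonly TimeSpan _maxAge;

    public TimedCache(TimeSpan maxAge) => _maxAge = maxAge;

    public void Set(TKey key, TValue value, DateTime now)
    {
        _entries[key] = (now, value);
        _byAge.Enqueue(Tuple.Create(now, key));
    }

    public bool TryGet(TKey key, DateTime now, out TValue value)
    {
        EvictOlderThan(now - _maxAge);
        if (_entries.TryGetValue(key, out var entry))
        {
            value = entry.Value;
            return true;
        }
        value = default(TValue);
        return false;
    }

    private void EvictOlderThan(DateTime cutoff)
    {
        while (_byAge.Count > 0 && _byAge.Peek().Item1 < cutoff)
        {
            Tuple<DateTime, TKey> oldest = _byAge.Dequeue();
            // Skip stale queue items for keys that were re-set after this timestamp.
            if (_entries.TryGetValue(oldest.Item2, out var entry) && entry.Added == oldest.Item1)
            {
                _entries.Remove(oldest.Item2);
            }
        }
    }
}
```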

What is the best way to implement a property=value collection

I've written a wrapper class around a 3rd party library that requires properties to be set by calling a Config method and passing a string formatted as "Property=Value"
I'd like to pass all the properties in a single call and process them iteratively.
I've considered the following:
creating a property/value class and then creating a List of these objects
building a string of multiple "Property=Value" strings, separating them with a token (maybe "|")
using a hash table
All of these would work (and I'm thinking of using option 1) but is there a better way?
A bit more detail about my query:
The finished class will be included in a library for reuse in other applications. Whilst I don't see threading as a problem at the moment (our apps tend to have just a UI thread and a worker thread), it could become an issue in the future.
Garbage collection will not be an issue.
Access to arbitrary indices of the data source is not currently an issue.
Optimization is not currently an issue, but clearly defining the key/value pairs is important.
As you've already pointed out, any of the proposed solutions will accomplish the task as you've described it. What this means is that the only rational way to choose a particular method is to define your requirements:
Does your code need to support multiple threads accessing the data source simultaneously? If so, using a ConcurrentDictionary, as Yahia suggested, makes sense. Otherwise, there's no reason to incur the additional overhead and complexity of using a concurrent data structure.
Are you working in an environment where garbage collection is a problem (for example, an XNA game)? If so, any suggestion involving the concatenation of strings is going to be problematic.
Do you need O(1) access to arbitrary indices of the data source? If so, your third approach makes sense. On the other hand, if all you're doing is iterating over the collection, there's no reason to incur the additional overhead of inserting into a hashtable; use a List<KeyValuePair<String, String>> instead.
On the other hand, you may not be working in an environment where the optimization described above is necessary; the ability to clearly define the key/value pairs programmatically may be more important to you. In that case, using a Dictionary is a better choice.
You can't make an informed decision as to how to implement a feature without completely defining what the feature needs to do, and since you haven't done that, any answer given here will necessarily be incomplete.
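For the iterate-only case mentioned above, a `List<KeyValuePair<String, String>>` version could look like the sketch below. `ConfigStrings.Format` is a hypothetical helper; it simply yields one "Property=Value" string per pair, in insertion order, with no hashing involved.

```csharp
using System.Collections.Generic;

static class ConfigStrings
{
    // Iteration-only storage: a list keeps insertion order and skips hashing.
    public static IEnumerable<string> Format(List<KeyValuePair<string, string>> pairs)
    {
        foreach (KeyValuePair<string, string> pair in pairs)
        {
            yield return pair.Key + "=" + pair.Value;
        }
    }
}
```

Unlike a dictionary, the list does not reject duplicate property names, so that check would be up to the caller.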
Given your clarifications, I would personally suggest the following:
Avoid making your Config() method thread-safe by default, as per the MSDN guidelines:
By default, class libraries should not be thread safe. Adding locks to create thread-safe code decreases performance, increases lock contention, and creates the possibility for deadlock bugs to occur.
If thread safety becomes important later, make it the caller's responsibility.
Given that you don't have special performance requirements, stick with a dictionary to allow key/value pairs to be easily defined and read.
For simplicity's sake, and to avoid generating lots of unnecessary strings doing concatenations, just pass the dictionary in directly and iterate over it.
Consider the following example:
var configData = new Dictionary<String, String>();
configData["key1"] = "value1";
configData["key2"] = "value2";
myLibraryObject.Config(configData);
And the implementation of Config:
public void Config(Dictionary<String, String> values)
{
    foreach (var kvp in values)
    {
        var configString = String.Format("{0}={1}", kvp.Key, kvp.Value);
        // do whatever
    }
}
You could use Dictionary<string,string>; the items are then of type KeyValuePair<string,string> (this corresponds to your first idea).
You can then use myDict.Select(kvp=>string.Format("{0}={1}",kvp.Key,kvp.Value)) to get a list of strings with the needed formatting.
Use, for example, a ConcurrentDictionary<string,string>: it is thread-safe and fast, since reads are lock-free and writes use fine-grained locking...
You could make a helper class that uses reflection to turn any class into a Property=Value collection
public static class PropertyValueHelper
{
    public static IEnumerable<string> GetPropertyValues(object source)
    {
        Type t = source.GetType();
        foreach (var property in t.GetProperties())
        {
            object value = property.GetValue(source, null);
            if (value != null)
            {
                yield return property.Name + "=" + value.ToString();
            }
            else
            {
                yield return property.Name + "=";
            }
        }
    }
}
You would need to add extra logic to handle enumerations, indexed properties, etc.
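As one example of that extra logic, the sketch below skips indexed properties (which need arguments, so `GetValue(source, null)` would throw for them) and write-only properties. `SafePropertyValueHelper` is a hypothetical name; enum values need no special handling here because `ToString()` already returns the member name.

```csharp
using System.Collections.Generic;
using System.Reflection;

public static class SafePropertyValueHelper
{
    public static IEnumerable<string> GetPropertyValues(object source)
    {
        foreach (PropertyInfo property in source.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Indexers need index arguments and write-only properties have no getter: skip both.
            if (!property.CanRead || property.GetIndexParameters().Length > 0)
            {
                continue;
            }

            object value = property.GetValue(source, null);
            // A null value produces "Name=" with nothing after the equals sign.
            yield return property.Name + "=" + (value?.ToString() ?? string.Empty);
        }
    }
}
```

For instance, passing a `List<int>` now yields entries such as "Count=2" instead of failing on the list's `Item` indexer.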
