What is the best way to implement a property=value collection - c#

I've written a wrapper class around a 3rd party library that requires properties to be set by calling a Config method and passing a string formatted as "Property=Value".
I'd like to pass all the properties in a single call and process them iteratively.
I've considered the following:
Creating a property/value class and then creating a List of these objects
Building a string of multiple "Property=Value" entries, separating them with a token (maybe "|")
Using a hash table
All of these would work (and I'm thinking of using option 1) but is there a better way?
A bit more detail about my query:
The finished class will be included in a library for re-use in other applications. Whilst I don't currently see threading as a problem (our apps tend to just have a UI thread and a worker thread), it could become an issue in the future.
Garbage collection will not be an issue.
Access to arbitrary indices of the data source is not currently an issue.
Optimization is not currently an issue, but clearly defining the key/value pairs is important.

As you've already pointed out, any of the proposed solutions will accomplish the task as you've described it. What this means is that the only rational way to choose a particular method is to define your requirements:
Does your code need to support multiple threads accessing the data source simultaneously? If so, using a ConcurrentDictionary, as Yahia suggested, makes sense. Otherwise, there's no reason to incur the additional overhead and complexity of using a concurrent data structure.
Are you working in an environment where garbage collection is a problem (for example, an XNA game)? If so, any suggestion involving the concatenation of strings is going to be problematic.
Do you need O(1) access to arbitrary indices of the data source? If so, your third approach makes sense. On the other hand, if all you're doing is iterating over the collection, there's no reason to incur the additional overhead of inserting into a hashtable; use a List<KeyValuePair<String, String>> instead.
On the other hand, you may not be working in an environment where the optimization described above is necessary; the ability to clearly define the key/value pairs programmatically may be more important to you, in which case using a Dictionary is a better choice.
You can't make an informed decision as to how to implement a feature without completely defining what the feature needs to do, and since you haven't done that, any answer given here will necessarily be incomplete.
Given your clarifications, I would personally suggest the following:
Avoid making your Config() method thread-safe by default, as per the MSDN guidelines:
By default, class libraries should not be thread safe. Adding locks to create thread-safe code decreases performance, increases lock contention, and creates the possibility for deadlock bugs to occur.
If thread safety becomes important later, make it the caller's responsibility.
Given that you don't have special performance requirements, stick with a dictionary to allow key/value pairs to be easily defined and read.
For simplicity's sake, and to avoid generating lots of unnecessary strings doing concatenations, just pass the dictionary in directly and iterate over it.
Consider the following example:
var configData = new Dictionary<String, String>();
configData["key1"] = "value1";
configData["key2"] = "value2";
myLibraryObject.Config(configData);
And the implementation of Config:
public void Config(Dictionary<String, String> values)
{
    foreach (var kvp in values)
    {
        // Build the "Property=Value" string the 3rd party library expects.
        var configString = String.Format("{0}={1}", kvp.Key, kvp.Value);
        // do whatever
    }
}

You could use Dictionary<string,string>; the items are then of type KeyValuePair<string,string> (this corresponds to your first idea).
You can then use myDict.Select(kvp=>string.Format("{0}={1}",kvp.Key,kvp.Value)) to get a sequence of strings with the needed formatting.
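As a rough sketch of the Select approach above: the class and method names here (ConfigFormatter, ToConfigStrings) are made up for illustration, and Console.WriteLine stands in for the third-party Config call, whose real signature isn't shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConfigFormatter
{
    // Hypothetical helper: turns each key/value pair into a "Property=Value" string.
    public static List<string> ToConfigStrings(IDictionary<string, string> settings)
    {
        return settings
            .Select(kvp => string.Format("{0}={1}", kvp.Key, kvp.Value))
            .ToList();
    }
}

public static class ConfigFormatterDemo
{
    public static void Main()
    {
        var settings = new Dictionary<string, string>
        {
            { "Timeout", "30" },
            { "Mode", "Fast" }
        };

        foreach (var line in ConfigFormatter.ToConfigStrings(settings))
        {
            // Stand-in for the wrapped library's Config(string) call.
            Console.WriteLine(line);
        }
    }
}
```

Calling ToList materializes the strings once; without it, the Select is re-evaluated on every enumeration.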

Use, for example, a ConcurrentDictionary<string,string>: it is thread-safe and really fast, since reads are lock-free and writes use fine-grained locking...

You could make a helper class that uses reflection to turn any class into a Property=Value collection
public static class PropertyValueHelper
{
    public static IEnumerable<string> GetPropertyValues(object source)
    {
        Type t = source.GetType();
        foreach (var property in t.GetProperties())
        {
            object value = property.GetValue(source, null);
            if (value != null)
            {
                yield return property.Name + "=" + value.ToString();
            }
            else
            {
                yield return property.Name + "=";
            }
        }
    }
}
You would need to add extra logic to handle enumerations, indexed properties, etc.

Related

What's the reason for the comparisonValue argument in the ConcurrentDictionary.TryUpdate method?

Why do we need the third argument, comparisonValue, in the ConcurrentDictionary.TryUpdate method?
And why does the update not succeed if the existing value is not equal to comparisonValue? Can't we just replace the existing value with the new one, just like in a normal Dictionary<,>?
This is the signature:
public bool TryUpdate(TKey key, TValue newValue, TValue comparisonValue)
The point is that you're using concurrent dictionary for scenarios with concurrent access to the dictionary. You don't know who (and how) changed the dictionary in the meantime. Passing a comparison value is a very simple and effective way of only doing the change if the state of the dictionary is the same one you expect.
If you expect collisions to be relatively rare, this is a very efficient and performant way of handling shared state (no need for locking, and thus stopping all access). This pattern is the basis of lock-free code; you see it even on the hardware level. You can look up Compare and Exchange (or Compare and Swap) for more information.
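To make the compare-and-swap idea concrete, here is a minimal sketch of a retry loop built on TryUpdate. The CounterUpdater class and the "hits" key are invented for the example; only TryGetValue and TryUpdate come from ConcurrentDictionary itself.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class CounterUpdater
{
    // Atomically increments the counter stored under `key`.
    // Returns false when the key is not present.
    public static bool TryIncrement(ConcurrentDictionary<string, int> counters, string key)
    {
        while (true)
        {
            int current;
            if (!counters.TryGetValue(key, out current))
            {
                return false;
            }

            // Succeeds only if the stored value still equals `current`.
            if (counters.TryUpdate(key, current + 1, current))
            {
                return true;
            }
            // Another thread changed the value in between: re-read and retry.
        }
    }
}

public static class CounterUpdaterDemo
{
    public static void Main()
    {
        var counters = new ConcurrentDictionary<string, int>();
        counters["hits"] = 0;

        // 1000 concurrent increments; none are lost because each write is conditional.
        Parallel.For(0, 1000, i => CounterUpdater.TryIncrement(counters, "hits"));

        Console.WriteLine(counters["hits"]); // prints 1000
    }
}
```

If the loop used the indexer setter instead of TryUpdate, two threads could read the same value and both write current + 1, silently dropping an increment.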
If you want to update a key of a ConcurrentDictionary regardless of its current value, you can just use the set accessor of the indexer:
var dictionary = new ConcurrentDictionary<int, string>();
dictionary[1] = "Hello";
dictionary[2] = "World";
dictionary[1] = "Goodbye";
Console.WriteLine(String.Join(", ", dictionary));
Output:
[1, Goodbye], [2, World]
If each thread is working with an isolated set of keys, updating a ConcurrentDictionary like this might be sufficient. But if multiple threads are competing to update the same keys, chaos might ensue. In those cases it might be desirable to use the TryUpdate method, or more frequently the AddOrUpdate method. These methods let you update the dictionary conditionally, with the check and the update performed as a single atomic operation.
The following question might offer some insights about how this API can be used in practice:
Is there a way to use ConcurrentDictionary.TryUpdate with a lambda expression?
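For the AddOrUpdate case mentioned above, a small sketch might look like the following. WordCounter and CountWords are hypothetical names. Note that the update delegate may be invoked more than once under contention, so it should be free of side effects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WordCounter
{
    // Counts occurrences of each word, with several threads updating the same keys.
    public static ConcurrentDictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new ConcurrentDictionary<string, int>();

        Parallel.ForEach(words, word =>
        {
            // Adds 1 for a new key, otherwise applies the delegate to the existing value.
            counts.AddOrUpdate(word, 1, (key, existing) => existing + 1);
        });

        return counts;
    }
}

public static class WordCounterDemo
{
    public static void Main()
    {
        var counts = WordCounter.CountWords(new[] { "a", "b", "a", "a" });
        Console.WriteLine(counts["a"]); // prints 3
        Console.WriteLine(counts["b"]); // prints 1
    }
}
```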

Parallel optimisation of string comparison

I'm trying to optimise the performance of a string comparison operation on each string key of a dictionary used as a database query cache. The current code looks like:
public void Clear(string tableName)
{
    foreach (string key in cache.Keys.Where(key => key.IndexOf(tableName, StringComparison.Ordinal) >= 0).ToList())
    {
        cache.Remove(key);
    }
}
I'm new to using C# parallel features and am wondering what the best way would be to convert this into a parallel operation so that multiple string comparisons can happen 'simultaneously'. The cache can often get quite large so maintenance on it with Clear() can get quite costly.
Make your cache object a ConcurrentDictionary and use TryRemove instead of Remove.
This will make your cache thread-safe; you can then invoke your current foreach loop like this:
Parallel.ForEach(cache.Keys, key =>
{
    if (key.IndexOf(tableName, StringComparison.Ordinal) >= 0)
    {
        dynamic value; // just because I don't know your dictionary.
        cache.TryRemove(key, out value);
    }
});
Hope that gives you a starting point.
Your approach can't work well on a Dictionary<string, Whatever> because that class isn't thread-safe for multiple writers, so the simultaneous deletes could cause all sorts of problems.
You will therefore have to use a lock to synchronise the removals, which will therefore make the access of the dictionary essentially single-threaded. About the only thing that can be safely done across the threads simultaneously is the comparison in the Where.
You could use ConcurrentDictionary because its use of striped locks will reduce this impact. It still doesn't seem the best approach though.
If your keys are built from strings such that the test is really whether a key starts with a sub-key, and removing everything under a sub-key is a frequent need, then you could try using a Dictionary<string, Dictionary<string, Whatever>>. Adding or updating becomes a bit more expensive, but clearing becomes an O(1) removal of just the one value from the higher-level dictionary.
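A minimal sketch of the two-level dictionary idea, assuming every cached query belongs to exactly one table (which is stricter than the substring match in the question). TableCache and its members are invented names, and the class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache keyed first by table name, then by query key.
public class TableCache<TValue>
{
    private readonly Dictionary<string, Dictionary<string, TValue>> tables =
        new Dictionary<string, Dictionary<string, TValue>>(StringComparer.Ordinal);

    public void Set(string tableName, string queryKey, TValue value)
    {
        Dictionary<string, TValue> entries;
        if (!tables.TryGetValue(tableName, out entries))
        {
            entries = new Dictionary<string, TValue>(StringComparer.Ordinal);
            tables[tableName] = entries;
        }
        entries[queryKey] = value;
    }

    public bool TryGet(string tableName, string queryKey, out TValue value)
    {
        Dictionary<string, TValue> entries;
        if (tables.TryGetValue(tableName, out entries))
        {
            return entries.TryGetValue(queryKey, out value);
        }
        value = default(TValue);
        return false;
    }

    // Drops every cached entry for the table with a single removal.
    public bool Clear(string tableName)
    {
        return tables.Remove(tableName);
    }
}

public static class TableCacheDemo
{
    public static void Main()
    {
        var cache = new TableCache<string>();
        cache.Set("Orders", "SELECT 1", "result A");
        cache.Set("Orders", "SELECT 2", "result B");
        cache.Set("Users", "SELECT 1", "result C");

        cache.Clear("Orders");

        string value;
        Console.WriteLine(cache.TryGet("Orders", "SELECT 1", out value)); // prints False
        Console.WriteLine(cache.TryGet("Users", "SELECT 1", out value));  // prints True
    }
}
```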
I've used dictionaries as caches before, and what I did was clean up the cache "on the fly": with each entry I also store its time of inclusion, and whenever an entry is requested I remove the old entries. The performance hit was minimal for me, but if needed you could implement a Queue (of Tuple<DateTime, TKey>, where TKey is the type of the keys in your dictionary) as an index to hold these timestamps, so that you don't need to iterate over the entire dictionary every time. Anyway, if you're having to think about these issues, it's time to consider using a dedicated caching server. For me, Shared Cache (http://sharedcache.codeplex.com) has been good enough.

Double checked locking on Dictionary "ContainsKey"

My team is currently debating this issue.
The code in question is something along the lines of
if (!myDictionary.ContainsKey(key))
{
    lock (_SyncObject)
    {
        if (!myDictionary.ContainsKey(key))
        {
            myDictionary.Add(key,value);
        }
    }
}
Some of the posts I've seen say that this may be a big NO NO (when using TryGetValue). Yet members of our team say it is ok since "ContainsKey" does not iterate on the key collection but checks if the key is contained via the hash code in O(1). Hence they claim there is no danger here.
I would like to get your honest opinions regarding this issue.
Don't do this. It's not safe.
You could be calling ContainsKey from one thread while another thread calls Add. That's simply not supported by Dictionary<TKey, TValue>. If Add needs to reallocate buckets etc, I can imagine you could get some very strange results, or an exception. It may have been written in such a way that you don't see any nasty effects, but I wouldn't like to rely on it.
It's one thing using double-checked locking for simple reads/writes to a field, although I'd still argue against it - it's another to make calls to an API which has been explicitly described as not being safe for multiple concurrent calls.
If you're on .NET 4, ConcurrentDictionary is probably the way forward. Otherwise, just lock on every access.
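The two alternatives could be sketched roughly as follows. LockedRegistry is a made-up wrapper showing "lock on every access", where reads take the lock too; the demo then shows the ConcurrentDictionary equivalent using TryAdd.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical wrapper: every read and write goes through the same lock.
public class LockedRegistry<TKey, TValue>
{
    private readonly object syncObject = new object();
    private readonly Dictionary<TKey, TValue> items = new Dictionary<TKey, TValue>();

    // Adds the pair only if the key is absent; the check and the add share one lock.
    public bool TryAdd(TKey key, TValue value)
    {
        lock (syncObject)
        {
            if (items.ContainsKey(key))
            {
                return false;
            }
            items.Add(key, value);
            return true;
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (syncObject)
        {
            return items.TryGetValue(key, out value);
        }
    }
}

public static class LockedRegistryDemo
{
    public static void Main()
    {
        var locked = new LockedRegistry<string, int>();
        Console.WriteLine(locked.TryAdd("a", 1)); // prints True
        Console.WriteLine(locked.TryAdd("a", 2)); // prints False

        // On .NET 4 and later, ConcurrentDictionary offers the same check-then-add atomically.
        var concurrent = new ConcurrentDictionary<string, int>();
        Console.WriteLine(concurrent.TryAdd("a", 1)); // prints True
        Console.WriteLine(concurrent.TryAdd("a", 2)); // prints False
    }
}
```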
If you are in a multithreaded environment, you may prefer to look at using a ConcurrentDictionary. I blogged about it a couple of months ago, you might find the article useful: http://colinmackay.co.uk/blog/2011/03/24/parallelisation-in-net-4-0-the-concurrent-dictionary/
This code is incorrect. The Dictionary<TKey, TValue> type does not support simultaneous read and write operations. Even though your Add call is made within the lock, the first ContainsKey call is not. Hence it easily allows a violation of the simultaneous read/write rule and will lead to corruption in your instance.
It doesn't look thread-safe, but it would probably be hard to make it fail.
The iteration vs hash lookup argument doesn't hold; there could be a hash collision, for instance.
If this dictionary is rarely written and often read, then I often employ safe double locking by replacing the entire dictionary on write. This is particularly effective if you can batch writes together to make them less frequent.
For example, this is a cut down version of a method we use that tries to get a schema object associated with a type, and if it can't, then it goes ahead and creates schema objects for all the types it finds in the same assembly as the specified type to minimize the number of times the entire dictionary has to be copied:
public static Schema GetSchema(Type type)
{
    // Fast path: lock-free read against the current dictionary instance.
    if (_schemaLookup.TryGetValue(type, out Schema schema))
        return schema;
    lock (_syncRoot) {
        // Re-check: another thread may have swapped in a new dictionary.
        if (_schemaLookup.TryGetValue(type, out schema))
            return schema;
        // Copy, extend the copy, then publish it by replacing the reference.
        var newLookup = new Dictionary<Type, Schema>(_schemaLookup);
        foreach (var t in type.Assembly.GetTypes()) {
            var newSchema = new Schema(t);
            newLookup.Add(t, newSchema);
        }
        _schemaLookup = newLookup;
        return _schemaLookup[type];
    }
}
So the dictionary in this case will be rebuilt, at most, as many times as there are assemblies with types that need schemas. For the rest of the application lifetime the dictionary accesses will be lock-free. The dictionary copy becomes a one-time initialization cost of the assembly. The dictionary swap is thread-safe because pointer writes are atomic so the whole reference gets switched at once.
You can apply similar principles in other situations as well.

Avoiding array duplication

According to [MSDN: Array usage guidelines](http://msdn.microsoft.com/en-us/library/k2604h5s(VS.71).aspx):
Array Valued Properties
You should use collections to avoid code inefficiencies. In the following code example, each call to the myObj property creates a copy of the array. As a result, 2n+1 copies of the array will be created in the following loop.
[Visual Basic]
Dim i As Integer
For i = 0 To obj.myObj.Count - 1
DoSomething(obj.myObj(i))
Next i
[C#]
for (int i = 0; i < obj.myObj.Count; i++)
DoSomething(obj.myObj[i]);
Other than the change from myObj[] to ICollection myObj, what else would you recommend? Just realized that my current app is leaking memory :(
Thanks;
EDIT: Would forcing C# to pass references w/ ref (safety aside) improve performance and/or memory usage?
No, it isn't leaking memory - it is just making the garbage collector work harder than it might. Actually, the MSDN article is slightly misleading: if the property created a new collection every time it was called, it would be just as bad (memory wise) as with an array. Perhaps worse, due to the usual over-sizing of most collection implementations.
If you know a method/property does work, you can always minimise the number of calls:
var arr = obj.myObj; // var since I don't know the type!
for (int i = 0; i < arr.Length; i++) {
    DoSomething(arr[i]);
}
or even easier, use foreach:
foreach (var value in obj.myObj) {
    DoSomething(value);
}
Both approaches only call the property once. The second is clearer IMO.
Other thoughts; name it a method! i.e. obj.SomeMethod() - this sets expectation that it does work, and avoids the undesirable obj.Foo != obj.Foo (which would be the case for arrays).
Finally, Eric Lippert has a good article on this subject.
Just as a hint for those who haven't used the ReadOnlyCollection mentioned in some of the answers:
[C#]
class XY
{
    private X[] array;
    public ReadOnlyCollection<X> myObj
    {
        get
        {
            return Array.AsReadOnly(array);
        }
    }
}
Hope this might help.
Whenever I have properties that are costly (like recreating a collection on call) I either document the property, stating that each call incurs a cost, or I cache the value as a private field. Property getters that are costly, should be written as methods.
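One way to picture both options (a method for the costly copy, a cached field for the property) is the sketch below. The Report class and its members are hypothetical. The cached array is handed out by reference, so callers could still modify it; a read-only wrapper would be needed to prevent that.

```csharp
using System;
using System.Collections.Generic;

public class Report
{
    private readonly List<int> rows = new List<int> { 1, 2, 3 };
    private int[] cachedRows; // rebuilt lazily after the rows change

    // Named as a method to signal that every call does work (a fresh copy).
    public int[] CopyRows()
    {
        return rows.ToArray();
    }

    // Cheap property: the copy is made once and reused until AddRow invalidates it.
    public int[] Rows
    {
        get
        {
            if (cachedRows == null)
            {
                cachedRows = rows.ToArray();
            }
            return cachedRows;
        }
    }

    public void AddRow(int row)
    {
        rows.Add(row);
        cachedRows = null; // invalidate the cache
    }
}

public static class ReportDemo
{
    public static void Main()
    {
        var report = new Report();
        Console.WriteLine(ReferenceEquals(report.Rows, report.Rows));             // prints True
        Console.WriteLine(ReferenceEquals(report.CopyRows(), report.CopyRows())); // prints False

        report.AddRow(4);
        Console.WriteLine(report.Rows.Length); // prints 4
    }
}
```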
Generally, I try to expose collections as IEnumerable rather than arrays, forcing the consumer to use foreach (or an enumerator).
It will not make copies of the array unless you make it do so. However, simply passing the reference to an array privately owned by an object has some nasty side-effects. Whoever receives the reference is basically free to do whatever he likes with the array, including altering the contents in ways that cannot be controlled by its owner.
One way of preventing unauthorized meddling with the array is to return a copy of the contents. Another (slightly better) is to return a read-only collection.
Still, before doing any of these things you should ask yourself if you are about to give away too much information. In some cases (actually, quite often) it is even better to keep the array private and instead provide methods that operate on the object owning it.
myObj will not create a new item unless you explicitly create one. So, for better memory usage, I recommend using a private collection (List or any other) and exposing an indexer that returns the specified value from the private collection.
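A minimal sketch of the private-collection-plus-indexer suggestion, with an invented Inventory class: callers can read by index and iterate with a for loop, but never receive the list itself, so no copies are made.

```csharp
using System;
using System.Collections.Generic;

public class Inventory
{
    private readonly List<string> items = new List<string>();

    public int Count
    {
        get { return items.Count; }
    }

    // Read-only indexer: returns the element without exposing or copying the list.
    public string this[int index]
    {
        get { return items[index]; }
    }

    public void Add(string item)
    {
        items.Add(item);
    }
}

public static class InventoryDemo
{
    public static void Main()
    {
        var inventory = new Inventory();
        inventory.Add("bolt");
        inventory.Add("nut");

        for (int i = 0; i < inventory.Count; i++)
        {
            Console.WriteLine(inventory[i]);
        }
    }
}
```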

Return collection as read-only

I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
    get
    {
        return data;
    }
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?
If your underlying data is stored as a list, you can use the List(T).AsReadOnly method.
If your data can be enumerated, you can use the Enumerable.ToList method to copy your collection into a List and call AsReadOnly on it.
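A short sketch of what AsReadOnly gives you; the class and variable names are illustrative. The wrapper rejects modification attempts, but it is a live view over the original list rather than a copy.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyViewDemo
{
    // Returns true when the collection refuses an Add through the IList<T> interface.
    public static bool RejectsAdd(IList<string> list)
    {
        try
        {
            list.Add("new item");
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var data = new List<string> { "a", "b" };
        ReadOnlyCollection<string> view = data.AsReadOnly();

        Console.WriteLine(RejectsAdd(view)); // prints True

        // The wrapper reflects later changes to the underlying list.
        data.Add("c");
        Console.WriteLine(view.Count); // prints 3
    }
}
```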
I voted for your accepted answer and agree with it--however might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes in the fact that you can't add code to collections so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.
If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading, all that is necessary is to return an interface which doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which it is unlikely to do without knowing the internals of the property it is accessing.
If however your intent is to prevent the calling code from observing updates from other threads you'll have to fall back to solutions already mentioned, to perform a deep or shallow copy depending on your need.
I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't need the caller (Class B) to see any further changes to the collection, then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
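The first option (clone and hand out) could look roughly like this when combined with the ReaderWriterLockSlim from the question. DataStore, Add and GetSnapshot are hypothetical names; the snapshot is a shallow copy, so it would not protect mutable elements.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Threading;

public class DataStore
{
    private readonly ReaderWriterLockSlim gate = new ReaderWriterLockSlim();
    private readonly List<string> data = new List<string>();

    public void Add(string item)
    {
        gate.EnterWriteLock();
        try
        {
            data.Add(item);
        }
        finally
        {
            gate.ExitWriteLock();
        }
    }

    // Copies the list while holding the read lock, then wraps the copy as read-only.
    // The caller's snapshot is unaffected by later writes.
    public ReadOnlyCollection<string> GetSnapshot()
    {
        gate.EnterReadLock();
        try
        {
            return new List<string>(data).AsReadOnly();
        }
        finally
        {
            gate.ExitReadLock();
        }
    }
}

public static class DataStoreDemo
{
    public static void Main()
    {
        var store = new DataStore();
        store.Add("first");

        ReadOnlyCollection<string> snapshot = store.GetSnapshot();
        store.Add("second");

        Console.WriteLine(snapshot.Count);            // prints 1
        Console.WriteLine(store.GetSnapshot().Count); // prints 2
    }
}
```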
One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.
You can use a copy of the collection instead.
public IList<string> Data {
    get {
        return new List<string>(data);
    }
}
That way it doesn't matter if it gets updated.
You want to use the yield keyword. You loop through the list and return each item with yield return. This allows the consumer to use foreach without modifying the collection.
It would look something like this:
List<string> _Data;
public IEnumerable<string> Data
{
    get
    {
        foreach (string item in _Data)
        {
            yield return item;
        }
    }
}
