I am working with a class, say Widget, that has a large number of numeric real-world attributes (e.g., height, length, weight, cost, etc.). There are different types of widgets (sprockets, cogs, etc.), but each widget shares the exact same attributes (the values will differ by widget, of course, but they all have a height, weight, etc.). I have 1,000s of each type of widget (1,000 cogs, 1,000 sprockets, etc.)
I need to perform a lot of calculations on these attributes (say, calculating the weighted average of the attributes for 1,000s of different widgets). For the weighted averages, I have different weights for each widget type (i.e., I may care more about length for sprockets than for cogs).
Right now, I am storing all the attributes in a Dictionary<string, double> within each widget (the widgets have an enum that specifies their type: cog, sprocket, etc.). I then have some calculator classes that store weights for each attribute as a Dictionary<WidgetType, Dictionary<string, double>>. To calculate the weighted average for each widget, I simply iterate through its attribute dictionary keys like:
double weightedAvg = 0.0;
foreach (string attributeName in widget.Attributes.Keys)
{
double attributeValue = widget.Attributes[attributeName];
double attributeWeight = calculator.Weights[widget.Type][attributeName];
weightedAvg += (attributeValue * attributeWeight);
}
So this works fine and is pretty readable and easy to maintain, but is very slow for 1000s of widgets based on some profiling. My universe of attribute names is known and will not change during the life of the application, so I am wondering what some better options are. The few I can think of:
1) Store attribute values and weights in double[]s. I think this is probably the most efficient option, but then I need to make sure the arrays are always stored in the correct order between widgets and calculators. This also decouples the data from the metadata, so I will need to store an array (?) somewhere that maps between the attribute names and the index into the double[] of attribute values and weights.
2) Store attribute values and weights in immutable structs. I like this option because I don't have to worry about the ordering and the data is "self-documenting". But is there an easy way to loop over these attributes in code? I have almost 100 attributes, so I don't want to hardcode all of those. I could use reflection, but I worry that this would incur an even larger penalty since I am looping over so many widgets and would have to use reflection on each one.
Any other alternatives?
Three possibilities come immediately to mind. The first, which I think you rejected too readily, is to have individual fields in your class. That is, individual double values named height, length, weight, cost, etc. You're right that it would be more code to do the calculations, but you wouldn't have the indirection of dictionary lookup.
Second is to ditch the dictionary in favor of an array. So rather than a Dictionary<string, double>, you'd just have a double[]. Again, I think you rejected this too quickly. You can easily replace the string dictionary keys with an enumeration. So you'd have:
enum WidgetProperty
{
First = 0,
Height = 0,
Length = 1,
Weight = 2,
Cost = 3,
...
Last = 100
}
Given that and an array of double, you can easily go through all of the values for each instance:
for (int i = (int)WidgetProperty.First; i < (int)WidgetProperty.Last; ++i)
{
double attributeValue = widget.Attributes[i];
double attributeWeight = calculator.Weights[widget.Type][i];
weightedAvg += (attributeValue * attributeWeight);
}
Direct array access is going to be significantly faster than accessing a dictionary by string.
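For illustration, here is a minimal sketch of that layout (the Widget and Calculator shapes here are assumptions for the example, reusing the question's WidgetType enum):
class Widget
{
    public WidgetType Type;
    // One slot per WidgetProperty value; the enum fixes the ordering for everyone
    public double[] Attributes = new double[(int)WidgetProperty.Last];
}
class Calculator
{
    // Replaces Dictionary<WidgetType, Dictionary<string, double>> with enum-indexed arrays
    public Dictionary<WidgetType, double[]> Weights = new Dictionary<WidgetType, double[]>();
}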
Finally, you can optimize your dictionary access a little bit. Rather than doing a foreach on the keys and then doing a dictionary lookup, do a foreach on the dictionary itself:
foreach (KeyValuePair<string, double> kvp in widget.Attributes)
{
double attributeValue = kvp.Value;
double attributeWeight = calculator.Weights[widget.Type][kvp.Key];
weightedAvg += (attributeValue * attributeWeight);
}
To calculate weighted averages without looping or reflection, one way would be to calculate the weighted average of the individual attributes and store them somewhere as the widgets are built. This should happen while you are creating each instance of a widget. Below is sample code which needs to be modified to your needs.
Also, for further processing of the widgets themselves, you can use data parallelism. See my other answer in this thread.
public enum WidgetType { Cog, Sprocket }
public class Calculator
{
    // Weights per widget type, keyed by attribute name
    public static Dictionary<WidgetType, Dictionary<string, double>> Weights { get; set; }
}
public class WeightStore
{
    static Dictionary<int, double> widgetWeightedAvg = new Dictionary<int, double>();
    public static void AttWeightedAvgAvailable(double attWeightedAvg, int widgetId)
    {
        if (widgetWeightedAvg.ContainsKey(widgetId))
            widgetWeightedAvg[widgetId] += attWeightedAvg;
        else
            widgetWeightedAvg[widgetId] = attWeightedAvg;
    }
}
public class WidgetAttribute
{
    public string Name { get; }
    public double Value { get; }
    public WidgetAttribute(string name, double value, WidgetType type, int widgetId)
    {
        Name = name;
        Value = value;
        double attWeight = Calculator.Weights[type][name];
        WeightStore.AttWeightedAvgAvailable(Value * attWeight, widgetId);
    }
}
public class CogWidget
{
    public int Id { get; set; }
    public WidgetAttribute Height { get; set; }
    public WidgetAttribute Weight { get; set; }
}
public class Client
{
    public void BuildCogWidgets()
    {
        CogWidget widget = new CogWidget();
        widget.Id = 1;
        widget.Height = new WidgetAttribute("height", 12.22, WidgetType.Cog, widget.Id);
    }
}
As is always the case with data normalization, choosing your normalization level determines a good part of the performance. It looks like you would have to go from your current model to another model, or a mix of the two.
Better performance for your scenario is possible if you do this processing not on the C# side but in the database instead. You then get the benefit of indexes, no data transfer except the wanted result, plus the 100,000s of man-hours already spent on performance optimization.
Use the Data Parallelism support in .NET 4 and above.
https://msdn.microsoft.com/en-us/library/dd537608(v=vs.110).aspx
An excerpt from the above link
When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced
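As a rough sketch of what that looks like here (assuming the widget list and calculator from the question, plus a hypothetical WeightedAverage property to hold each result):
// Each iteration writes only to its own widget, so no locking is needed;
// requires System.Threading.Tasks
Parallel.ForEach(widgets, widget =>
{
    double weightedAvg = 0.0;
    foreach (KeyValuePair<string, double> kvp in widget.Attributes)
    {
        weightedAvg += kvp.Value * calculator.Weights[widget.Type][kvp.Key];
    }
    widget.WeightedAverage = weightedAvg; // hypothetical result slot
});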
I'm writing code in a C# library to do clustering on a (two-dimensional) dataset - essentially breaking the data up into groups or clusters. To be useful, the library needs to take in "generic" or "custom" data, cluster it, and return the clustered data.
To do this, I need to assume that each datum in the dataset being passed in has a 2D vector associated with it (in my case Lat, Lng - I'm working with co-ordinates).
My first thought was to use generic types, and pass in two lists, one list of the generic data (i.e. List<T>) and another of the same length specifying the 2D vectors (i.e. List<Coordinate>, where Coordinate is my class for specifying a lat, lng pair), where the lists correspond to each other by index. But this is quite tedious because it means that in the algorithm I have to keep track of these indices somehow.
My next thought was to use interfaces, where I define an interface
public interface IPoint
{
double Lat { get; }
double Lng { get; }
}
and ensure that the data that I pass in implements this interface (i.e. I can assume that each datum passed in has a Lat and a Lng).
But this isn't really working out for me either. I'm using my C# library to cluster stops in a transit network (in a different project). The class is called Stop, and this class is also from an external library, so I can't implement the interface for that class.
What I did then was inherit from Stop, creating a class called ClusterableStop, which looks like this:
public class ClusterableStop : GTFS.Entities.Stop, IPoint
{
public ClusterableStop(Stop stop)
{
Id = stop.Id;
Code = stop.Code;
Name = stop.Name;
Description = stop.Description;
Latitude = stop.Latitude;
Longitude = stop.Longitude;
Zone = stop.Zone;
Url = stop.Url;
LocationType = stop.LocationType;
ParentStation = stop.ParentStation;
Timezone = stop.Timezone;
WheelchairBoarding = stop.WheelchairBoarding;
}
public double Lat
{
get
{
return this.Latitude;
}
}
public double Lng
{
get
{
return this.Longitude;
}
}
}
which as you can see implements the IPoint interface. Now I use the constructor for ClusterableStop to first convert all Stops in the dataset to ClusterableStops, then run the algorithm and get the result as ClusterableStops.
This isn't really what I want, because I want to do things to the Stops based on which cluster they fall in. I can't do that, because I've actually instantiated new stops, namely ClusterableStops!
I can still achieve what I want, because e.g. I can retrieve the original objects by Id. But surely there is a more elegant way to accomplish all of this? Is this the right way to be using interfaces? It seemed like such a simple idea - passing in and getting back custom data - but it turned out to be so complicated.
Since all you need is to associate a (latitude, longitude) pair with each element of the dataset, you could make a method that takes a delegate which produces the associated position for each datum, like this:
ClusterList Cluster<T>(IList<T> data, Func<int, Coordinate> getCoordinate) {
    for (int i = 0; i != data.Count; i++) {
        T item = data[i];
        Coordinate coord = getCoordinate(i);
        ...
    }
}
It is now up to the caller to decide how Coordinate is paired with each element of data.
Note that the association by list position is not the only option available to you. Another option is to pass a delegate that takes the item, and returns its coordinate:
ClusterList Cluster<T>(IEnumerable<T> data, Func<T, Coordinate> getCoordinate) {
    foreach (var item in data) {
        Coordinate coord = getCoordinate(item);
        ...
    }
}
Although this approach is better than the index-based one, in cases when the coordinates are not available on the object itself it requires the caller to keep some sort of associative container from T to Coordinate, which means T must either play well with hash-based containers or be IComparable<T>. The first approach places no such restrictions on T.
In your case, the second approach is preferable:
var clustered = Cluster(
myListOfStops
, stop => new Coordinate(stop.Latitude, stop.Longitude)
);
Have you considered using Tuples to do the work? Sometimes this is a useful way of associating two classes without creating a whole new class. You can create a list of tuples:
List<Tuple<Point, Stop>>
where Point is the thing you cluster on.
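For example, using the question's Coordinate class in place of Point (a sketch; it assumes a (lat, lng) constructor and that stops is the list of Stop objects):
// Pair each Stop with the coordinate to cluster on - no subclassing required
List<Tuple<Coordinate, Stop>> pairs = stops
    .Select(s => Tuple.Create(new Coordinate(s.Latitude, s.Longitude), s))
    .ToList();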
My Problem:
I want to convert my randomBloodType() method to a static method that can take any enum type. I want my method to take any type of enum, whether it be BloodType, DaysOfTheWeek, etc., and perform the operations shown below.
Some Background on what the method does:
The method currently chooses a random element from the BloodType enum based on the values assigned to each element. An element with a higher value has a higher probability to be picked.
Code:
public enum BloodType
{
// BloodType = Probability
ONeg = 4,
OPos = 36,
ANeg = 3,
APos = 28,
BNeg = 1,
BPos = 20,
ABNeg = 1,
ABPos = 5
};
public BloodType randomBloodType()
{
// Get the values of the BloodType enum and store it in a array
BloodType[] bloodTypeValues = (BloodType[])Enum.GetValues(typeof(BloodType));
List<BloodType> bloodTypeList = new List<BloodType>();
// Create a list where each element occurs the approximate number of
// times defined as its value(probability)
foreach (BloodType val in bloodTypeValues)
{
for(int i = 0; i < (int)val; i++)
{
bloodTypeList.Add(val);
}
}
// Sum the values
int sum = 0;
foreach (BloodType val in bloodTypeValues)
{
sum += (int)val;
}
//Get Random value
Random rand = new Random();
int randomValue = rand.Next(sum);
return bloodTypeList[randomValue];
}
What I have tried so far:
I have tried to use generics. They worked out for the most part, but I was unable to cast my enum elements to int values. I have included an example of the problematic section of code below.
foreach (T val in bloodTypeValues)
{
sum += (int)val; // This line is the problem.
}
I have also tried using Enum e as a method parameter. I was unable to declare the type of my array of enum elements using this method.
(Note: My apologies in advance for the lengthy answer. My actual proposed solution is not all that long, but there are a number of problems with the proposed solutions so far and I want to try to address those thoroughly, to provide context for my own proposed solution).
In my opinion, while you have in fact accepted one answer and might be tempted to use either one, neither of the answers provided so far are correct or useful.
Commenter Ben Voigt has already pointed out two major flaws with your specifications as stated, both related to the fact that you are encoding the enum value's weight in the value itself:
You are tying the enum's underlying type to the code that then must interpret that type.
Two enum values that have the same weight are indistinguishable from each other.
Both of these issues can be addressed. Indeed, while the answer you accepted (why?) fails to address the first issue, the one provided by Dweeberly does address this through the use of Convert.ToInt32() (which can convert from long to int just fine, as long as the values are small enough).
But the second issue is much harder to address. The answer from Asad attempts to address this by starting with the enum names and parsing them to their values. And this does indeed result in the final array being indexed containing the corresponding entries for each name separately. But the code actually using the enum has no way to distinguish the two; it's really as if those two names are a single enum value, and that single enum value's probability weight is the sum of the value used for the two different names.
I.e. in your example, while the enum entries for e.g. BNeg and ABNeg will be selected separately, the code that receives these randomly selected values has no way to know whether it was BNeg or ABNeg that was selected. As far as it knows, those are just two different names for the same value.
Now, even this problem can be addressed (but not in the way that Asad attempts to…his answer is still broken). If you were, for example, to encode the probabilities in the value while still ensuring unique values for each name, you could decode those probabilities while doing the random selection and that would work. For example:
enum BloodType
{
// BloodType = Probability
ONeg = 4 * 100 + 0,
OPos = 36 * 100 + 1,
ANeg = 3 * 100 + 2,
APos = 28 * 100 + 3,
BNeg = 1 * 100 + 4,
BPos = 20 * 100 + 5,
ABNeg = 1 * 100 + 6,
ABPos = 5 * 100 + 7,
};
Having declared your enum values that way, then you can in your selection code divide the enum value by 100 to obtain its probability weight, which then can be used as seen in the various examples. At the same time, each enum name has a unique value.
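For example, the decoding is just integer division, with the remainder keeping the names distinct:
int weight = (int)BloodType.OPos / 100; // 3601 / 100 == 36
int index = (int)BloodType.OPos % 100;  // == 1, unique per name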
But even solving that problem, you are still left with problems related to the choice of encoding and representation of the probabilities. For example, in the above you cannot have an enum that has more than 100 values, nor one with weights larger than (2^31 - 1) / 100; if you want an enum that has more than 100 values, you need a larger multiplier but that would limit your weight values even more.
In many scenarios (maybe all the ones you care about) this won't be an issue. The numbers are small enough that they all fit. But that seems like a serious limitation in what seems like a situation where you want a solution that is as general as possible.
And that's not all. Even if the encoding stays within reasonable limits, you have another significant limit to deal with: the random selection process requires an array large enough to contain for each enum value as many instances of that value as its weight. Again, if the values are small maybe this is not a big problem. But it does severely limit the ability of your implementation to generalize.
So, what to do?
I understand the temptation to try to keep each enum type self-contained; there are some obvious advantages to doing so. But there are also some serious disadvantages that result from that, and if you truly ever try to use this in a generalized way, the changes to the solutions proposed so far will tie your code together in ways that IMHO negate most if not all of the advantage of keeping the enum types self-contained (primarily: if you find you need to modify the implementation to accommodate some new enum type, you will have to go back and edit all of the other enum types you're using…i.e. while each type looks self-contained, in reality they are all tightly coupled with each other).
In my opinion, a much better approach would be to abandon the idea that the enum type itself will encode the probability weights. Just accept that this will be declared separately somehow.
Also, IMHO it would be better to avoid the memory-intensive approach proposed in your original question and mirrored in the other two answers. Yes, this is fine for the small values you're dealing with here. But it's an unnecessary limitation, making only one small part of the logic simpler while complicating and restricting it in other ways.
I propose the following solution, in which the enum values can be whatever you want, the enum's underlying type can be whatever you want, and the algorithm uses memory proportionally only to the number of unique enum values, rather than in proportion to the sum of all of the probability weights.
In this solution, I also address possible performance concerns, by caching the invariant data structures used to select the random values. This may or may not be useful in your case, depending on how frequently you will be generating these random values. But IMHO it is a good idea regardless; the up-front cost of generating these data structures is so high that if the values are selected with any regularity at all, it will begin to dominate the run-time cost of your code. Even if it works fine today, why take the risk? (Again, especially given that you seem to want a generalized solution).
Here is the basic solution:
static T NextRandomEnumValue<T>()
{
KeyValuePair<T, int>[] aggregatedWeights = GetWeightsForEnum<T>();
int weightedValue =
_random.Next(aggregatedWeights[aggregatedWeights.Length - 1].Value),
index = Array.BinarySearch(aggregatedWeights,
new KeyValuePair<T, int>(default(T), weightedValue),
KvpValueComparer<T, int>.Instance);
return aggregatedWeights[index < 0 ? ~index : index + 1].Key;
}
static KeyValuePair<T, int>[] GetWeightsForEnum<T>()
{
object temp;
if (_typeToAggregatedWeights.TryGetValue(typeof(T), out temp))
{
return (KeyValuePair<T, int>[])temp;
}
if (!_typeToWeightMap.TryGetValue(typeof(T), out temp))
{
throw new ArgumentException("Unsupported enum type");
}
KeyValuePair<T, int>[] weightMap = (KeyValuePair<T, int>[])temp;
KeyValuePair<T, int>[] aggregatedWeights =
new KeyValuePair<T, int>[weightMap.Length];
int sum = 0;
for (int i = 0; i < weightMap.Length; i++)
{
sum += weightMap[i].Value;
aggregatedWeights[i] = new KeyValuePair<T,int>(weightMap[i].Key, sum);
}
_typeToAggregatedWeights[typeof(T)] = aggregatedWeights;
return aggregatedWeights;
}
readonly static Random _random = new Random();
// Helper method to reduce verbosity in the enum-to-weight array declarations
static KeyValuePair<T1, T2> CreateKvp<T1, T2>(T1 t1, T2 t2)
{
return new KeyValuePair<T1, T2>(t1, t2);
}
readonly static KeyValuePair<BloodType, int>[] _bloodTypeToWeight =
{
CreateKvp(BloodType.ONeg, 4),
CreateKvp(BloodType.OPos, 36),
CreateKvp(BloodType.ANeg, 3),
CreateKvp(BloodType.APos, 28),
CreateKvp(BloodType.BNeg, 1),
CreateKvp(BloodType.BPos, 20),
CreateKvp(BloodType.ABNeg, 1),
CreateKvp(BloodType.ABPos, 5),
};
readonly static Dictionary<Type, object> _typeToWeightMap =
new Dictionary<Type, object>()
{
{ typeof(BloodType), _bloodTypeToWeight },
};
readonly static Dictionary<Type, object> _typeToAggregatedWeights =
new Dictionary<Type, object>();
Note that the work of actually selecting a random value is simply a matter of choosing a non-negative random integer less than the sum of the weights, and then using a binary search to find the appropriate enum value.
Once per enum type, the code will build the table of values and weight-sums that will be used for the binary search. This result is stored in a cache dictionary, _typeToAggregatedWeights.
There are also the objects that have to be declared and which will be used at run-time to build this table. Note that the _typeToWeightMap is just in support of making this method 100% generic. If you wanted to write a different named method for each specific type you wanted to support, that could still use a single generic method to implement the initialization and selection, but the named method would know the correct object (e.g. _bloodTypeToWeight) to use for initialization.
Alternatively, another way to avoid the _typeToWeightMap while still keeping the method 100% generic would be to have the _typeToAggregatedWeights be of type Dictionary<Type, Lazy<object>>, and have the values of the dictionary (the Lazy<object> objects) explicitly reference the appropriate weight array for the type.
In other words, there are lots of variations on this theme that would work fine. But they will all have essentially the same structure as above; semantics would be the same and performance differences would be negligible.
One thing you'll notice is that the binary search requires a custom IComparer<T> implementation. That is here:
class KvpValueComparer<TKey, TValue> :
IComparer<KeyValuePair<TKey, TValue>> where TValue : IComparable<TValue>
{
public readonly static KvpValueComparer<TKey, TValue> Instance =
new KvpValueComparer<TKey, TValue>();
private KvpValueComparer() { }
public int Compare(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
{
return x.Value.CompareTo(y.Value);
}
}
This allows the Array.BinarySearch() method to correctly compare the array elements, allowing a single array to contain both the enum values and their aggregated weights, while limiting the binary search comparison to just the weights.
Assuming your enum values are all of type int (you can adjust this accordingly if they're long, short, or whatever):
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof (TEnum), curr);
return agg.Concat(Enumerable.Repeat((TEnum)value,(int)value)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
Here's how you would use it:
var rng = new Random();
var randomBloodType = RandomEnumValue<BloodType>(rng);
People seem to have their knickers in a knot about multiple indistinguishable enum values in the input enum (for which I still think the above code provides the expected behavior). Note that there is no answer here, not even Peter Duniho's, that will allow you to distinguish enum entries when they have the same value, so I'm not sure why this is being used as a metric for judging potential solutions.
Nevertheless, an alternative approach that doesn't use the enum values as probabilities is to use an attribute to specify the probability:
public enum BloodType
{
[P(4)]
ONeg,
[P(36)]
OPos,
[P(3)]
ANeg,
[P(28)]
APos,
[P(1)]
BNeg,
[P(20)]
BPos,
[P(1)]
ABNeg,
[P(5)]
ABPos
}
Here is what the attribute used above looks like:
[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public class PAttribute : Attribute
{
public int Weight { get; private set; }
public PAttribute(int weight)
{
Weight = weight;
}
}
and finally, this is what the method to get a random enum value would look like:
static TEnum RandomEnumValue<TEnum>(Random rng)
{
var vals = Enum
.GetNames(typeof(TEnum))
.Aggregate(Enumerable.Empty<TEnum>(), (agg, curr) =>
{
var value = Enum.Parse(typeof(TEnum), curr);
FieldInfo fi = typeof (TEnum).GetField(curr);
var weight = ((PAttribute)fi.GetCustomAttribute(typeof(PAttribute), false)).Weight;
return agg.Concat(Enumerable.Repeat((TEnum)value, weight)); // For int enums
})
.ToArray();
return vals[rng.Next(vals.Length)];
}
(Note: if this code is performance critical, you might need to tweak this and add caching for the reflection data).
Some of this you can do and some of it isn't so easy. I believe the following extension method will do what you describe.
static public class Util {
static Random rnd = new Random();
static public int PriorityPickEnum(this Enum e) {
// The approved types for an enum are byte, sbyte, short, ushort, int, uint, long, or ulong
// However, Random only supports a int (or double) as a max value. Either way
// it doesn't have the range for uint, long and ulong.
//
// sum enum
int sum = 0;
foreach (var x in Enum.GetValues(e.GetType())) {
sum += Convert.ToInt32(x);
}
var i = rnd.Next(sum); // get a random value, it will form a ratio i / sum
// enums may not have a uniform (incremented) value range (think about flags),
// therefore we have to step through to get to the range we want; this is due
// to the requirement that the return value have a probability proportional
// to its value. Note enum values must be sorted for this to work.
foreach (var x in Enum.GetValues(e.GetType()).OfType<Enum>().OrderBy(a => a)) {
i -= Convert.ToInt32(x);
if (i < 0) return Convert.ToInt32(x);
}
throw new Exception("This doesn't seem right");
}
}
Here is an example of using this extension:
BloodType bt = BloodType.ABNeg;
for (int i = 0; i < 100; i++) {
var v = (BloodType) bt.PriorityPickEnum();
Console.WriteLine("{0}: {1}({2})", i, v, (int) v);
}
This should work pretty well for enums of type byte, sbyte, ushort, short, and int. Once you get beyond int (uint, long, ulong) the problem is the Random class. You can adjust the code to use doubles generated by Random, which would cover uint, but the Random class just doesn't have the range to cover long and ulong. Of course you could use/find/write a different Random class if this is important.
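If the larger types ever matter, one possible workaround (my suggestion, not something the extension above does) is to scale NextDouble() into the summed range, accepting some loss of uniformity:
// Sketch: random value in [0, maxExclusive) for sums beyond Int32.MaxValue.
// NextDouble() only carries ~53 bits of precision, so uniformity degrades for huge sums.
static ulong NextUInt64(Random rnd, ulong maxExclusive)
{
    return (ulong)(rnd.NextDouble() * maxExclusive);
}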
I want to save the rate of exchange of all currencies against different base currencies. What is the best and most efficient type or structure to store this? Currently I am using
Dictionary<string, Dictionary<string, decimal>>
Thanks in advance. Please suggest.
Typically I use a variation of the following class.
public class ForexSpotContainer : IEnumerable<FxSpot>
{
[DataMember] private readonly Dictionary<string, FxSpot> _fxSpots;
public FxSpot this[string baseCurrency, string quoteCurrency]
{
get
{
var baseCurrencySpot = _fxSpots[baseCurrency];
var quoteCurrencySpot = _fxSpots[quoteCurrency];
return baseCurrencySpot.Invert()*quoteCurrencySpot;
}
}
public IEnumerator<FxSpot> GetEnumerator()
{
return _fxSpots.Values.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
I tend to then create a Money class and an FxSpot class, and define +-*/ operators for them so that I can do financial calculations in a safe way.
EDIT: In my experience, when working on financial systems, I have always had issues with code like this:
decimal sharePriceOfMsft = 40.30m;
decimal usdEur = 0.75m;
decimal msftInEur = sharePriceOfMsft * usdEur;
Since it always takes a few seconds for me to check whether I should multiply by the spot or divide.
The problem is compounded when I have to use Forex crosses, such as JPYEUR or EURJPY, and hours were lost to subtle bugs from closely related Forex spots.
Also of consequence is the dimensional analysis of the equations. When you multiply lots of numbers together and expect a Price, are you sure you didn't mess up a multiply/divide? By creating a new class for each unit, you get a little more compile-time error checking, which can ultimately save you seconds for each line of code you read (and in a many-thousand-line library that adds up to hours very quickly).
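For illustration, a minimal sketch of such unit types (the members and checks here are my assumptions, not a complete implementation):
public struct Money
{
    public readonly decimal Amount;
    public readonly string Currency;
    public Money(decimal amount, string currency) { Amount = amount; Currency = currency; }
}
public struct FxSpot
{
    public readonly string BaseCurrency;
    public readonly string QuoteCurrency;
    public readonly decimal Rate;
    public FxSpot(string baseCurrency, string quoteCurrency, decimal rate)
    {
        BaseCurrency = baseCurrency;
        QuoteCurrency = quoteCurrency;
        Rate = rate;
    }
    public FxSpot Invert()
    {
        return new FxSpot(QuoteCurrency, BaseCurrency, 1m / Rate);
    }
    // Chaining spots: EURUSD * USDJPY == EURJPY
    public static FxSpot operator *(FxSpot left, FxSpot right)
    {
        if (left.QuoteCurrency != right.BaseCurrency)
            throw new InvalidOperationException("Mismatched currencies");
        return new FxSpot(left.BaseCurrency, right.QuoteCurrency, left.Rate * right.Rate);
    }
    // Converting money fails loudly when the currencies don't line up
    public static Money operator *(Money money, FxSpot spot)
    {
        if (money.Currency != spot.BaseCurrency)
            throw new InvalidOperationException("Mismatched currencies");
        return new Money(money.Amount * spot.Rate, spot.QuoteCurrency);
    }
}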
I don't see a big problem with your approach, unless you want to:
1) use some custom 3rd-party in-memory DB (Redis-like), though that may prove too cumbersome for your case; or
2) derive from your type:
public class MyCustomHolder : Dictionary<string, Dictionary<string, decimal>> {
}
to avoid those long and confusing definitions in the code, and bring more semantics to your code's readers and yourself.
I'm trying to figure out the best way to represent some data. It basically follows the form Manufacturer.Product.Attribute = Value. Something like:
Acme.*.MinimumPrice = 100
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
So the minimum price across all Acme products is 100 except in the case of products A and B. I want to store this data in C# and have some function where GetValue("Acme.ProductC.MinimumPrice") returns 100 but GetValue("Acme.ProductA.MinimumPrice") returns 50.
I'm not sure how to best represent the data. Is there a clean way to code this in C#?
Edit: I may not have been clear. This is configuration data that needs to be stored in a text file then parsed and stored in memory in some way so that it can be retrieved like the examples I gave.
Write the text file exactly like this:
Acme.*.MinimumPrice = 100
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
Parse it into a path/value pair sequence:
foreach (var pair in File.ReadAllLines(configFileName)
.Select(l => l.Split('='))
.Select(a => new { Path = a[0].Trim(), Value = a[1].Trim() }))
{
// do something with each pair.Path and pair.Value
}
Now, there are two possible interpretations of what you want to do. The string Acme.*.MinimumPrice could mean that for any lookup where there is no specific override, such as Acme.Toadstool.MinimumPrice, we return 100 - even though there is nothing referring to Toadstool anywhere in the file. Or it could mean that it should only return 100 if there are other specific mentions of Toadstool in the file.
If it's the former, you could store the whole lot in a flat dictionary, and at look up time keep trying different variants of the key until you find something that matches.
If it's the latter, you need to build a data structure of all the names that actually occur in the path structure, to avoid returning values for ones that don't actually exist. This seems more reliable to me.
So going with the latter option, Acme.*.MinimumPrice is really saying "add this MinimumPrice value to any product that doesn't have its own specifically defined value". This means that you can basically process the pairs at parse time to eliminate all the asterisks, expanding it out into the equivalent of a completed version of the config file:
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
Acme.ProductC.MinimumPrice = 100
The nice thing about this is that you only need a flat dictionary as the final representation and you can just use TryGetValue or [] to look things up. The result may be a lot bigger, but it all depends how big your config file is.
You could store the information more minimally, but I'd go with something simple that works to start with, and give it a very simple API so that you can re-implement it later if it really turns out to be necessary. You may find (depending on the application) that making the look-up process more complicated is worse over all.
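A sketch of that parse-time expansion (assuming the parsed pairs from the loop above were collected into a list called pairs; the other names are illustrative):
// Pass 1: collect specific entries and wildcard rules separately
var values = new Dictionary<string, string>();
var wildcards = new List<string[]>(); // { manufacturer, attribute, value }
foreach (var pair in pairs)
{
    string[] parts = pair.Path.Split('.');
    if (parts[1] == "*")
        wildcards.Add(new[] { parts[0], parts[2], pair.Value });
    else
        values[pair.Path] = pair.Value;
}
// Pass 2: apply each wildcard to every product that actually occurs in the file
var products = new HashSet<string>(values.Keys.Select(k => k.Split('.')[1]));
foreach (string[] w in wildcards)
{
    foreach (string product in products)
    {
        string key = w[0] + "." + product + "." + w[1];
        if (!values.ContainsKey(key))
            values[key] = w[2];
    }
}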
I'm not entirely sure what you're asking, but it sounds like you're saying one of two things:
I need a function that will return a fixed value, 100, for every product ID except for two cases: ProductA and ProductB
In that case you don't even need a data structure. A simple comparison function will do
int GetValue(string key) {
if ( key == "Acme.ProductA.MinimumPrice" ) { return 50; }
else if (key == "Acme.ProductB.MinimumPrice") { return 60; }
else { return 100; }
}
Or you could have been asking
I need a function that will return a value if already defined or 100 if it's not
In that case I would use a Dictionary<string,int>. For example
class DataBucket {
private Dictionary<string,int> _priceMap = new Dictionary<string,int>();
public DataBucket() {
_priceMap["Acme.ProductA.MinimumPrice"] = 50;
_priceMap["Acme.ProductB.MinimumPrice"] = 60;
}
public int GetValue(string key) {
int price = 0;
if ( !_priceMap.TryGetValue(key, out price)) {
price = 100;
}
return price;
}
}
One of the ways - you can create a nested dictionary: Dictionary<string, Dictionary<string, Dictionary<string, object>>>. In your code you would split "Acme.ProductA.MinimumPrice" by dots and get or set the value in the dictionaries corresponding to the split segments.
Another way is to use Linq2Xml: you can create an XDocument with Acme as the root node, products as children of the root, and the attributes stored either as attributes on the products or as child nodes. I prefer the second solution, but it would be slower if you have thousands of products.
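A sketch of the nested-dictionary variant (names are illustrative):
class ConfigStore
{
    private readonly Dictionary<string, Dictionary<string, Dictionary<string, object>>> _store =
        new Dictionary<string, Dictionary<string, Dictionary<string, object>>>();
    public object GetValue(string key)
    {
        string[] parts = key.Split('.'); // e.g. "Acme.ProductA.MinimumPrice"
        return _store[parts[0]][parts[1]][parts[2]];
    }
}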
I would take an OOP approach to this. The way you explain it, all your products are represented by objects, which is good. This seems like a good use of polymorphism.
I would have all products derive from a ProductBase class which has a virtual property with the default:
public virtual decimal MinimumPrice { get { return 100; } }
And then your specific products, such as ProductA, will override that:
public override decimal MinimumPrice { get { return 50; } }
If you were to have a naming system in your app where the app contains, say, 100 actions that create new objects, like:
Blur
Sharpen
Contrast
Darken
Matte
...
and each time you use one of these, a new instance is created with a unique, editable name, like Blur01, Blur02, Blur03, Sharpen01, Matte01, etc. How would you generate the next available unique name so that it's an O(1) operation, or near constant time? Bear in mind that the user can also change the name to a custom name, like RemoveFaceDetails, etc.
It's acceptable to have some constraints, like restricting the number of characters to 100, using letters, numbers, underscores, etc...
EDIT: You can also suggest solutions without "filling the gaps", that is, without reusing already-used-but-deleted names (except the custom ones, of course).
I refer you to Michael A. Jackson's Two Rules of Program Optimization:
Don't do it.
For experts only: Don't do it yet.
Simple, maintainable code is far more important than optimizing for a speed problem that you think you might have later.
I would start simple: build a candidate name (e.g. "Sharpen01"), then loop through the existing filters to see if that name exists. If it does, increment and try again. This is O(N²), but until you get thousands of filters, that will be good enough.
If, sometime later, the O(N²) does become a problem, then I'd start by building a HashSet of existing names. Then you can check each candidate name against the HashSet, rather than iterating. Rebuild the HashSet each time you need a unique name, then throw it away; you don't need the complexity of maintaining it in the face of changes. This would leave your code easy to maintain, while only being O(N).
O(N) will be good enough. You do not need O(1). The user is not going to click "Sharpen" enough times for there to be any difference.
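A sketch of that HashSet approach (names are illustrative):
static string NextUniqueName(string baseName, IEnumerable<string> existingNames)
{
    // Rebuild the set on every call and throw it away afterwards
    var taken = new HashSet<string>(existingNames);
    int i = 1;
    string candidate = baseName + i.ToString("00");
    while (taken.Contains(candidate))
    {
        i++;
        candidate = baseName + i.ToString("00");
    }
    return candidate;
}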
I would create a static integer in the action class that gets incremented and assigned as part of each new instance of the class. For instance:
class Blur
{
private static int count = 0;
private string _name;
public string Name
{
get { return _name; }
set { _name = value; }
}
public Blur()
{
_name = "Blur" + count++.ToString();
}
}
Since count is static, each time you create a new instance, it will be incremented and appended to the default name. O(1) time.
EDIT
If you need to fill in the holes when you delete, I would suggest the following. It would automatically queue up numbers when items are renamed, but it would be more costly overall:
class Blur
{
private static int count = 0;
private static Queue<int> deletions = new Queue<int>();
private string _name;
public string Name
{
get { return _name; }
set
{
_name = value;
Delete();
}
}
private int assigned;
public Blur()
{
if (deletions.Count > 0)
{
assigned = deletions.Dequeue();
}
else
{
assigned = count++;
}
_name = "Blur" + assigned.ToString();
}
public void Delete()
{
if (assigned >= 0)
{
deletions.Enqueue(assigned);
assigned = -1;
}
}
}
Also, when you delete an object, you'll need to call .Delete() on the object.
CounterClass Dictionary version
class CounterClass
{
private int count;
private Queue<int> deletions;
public CounterClass()
{
count = 0;
deletions = new Queue<int>();
}
public string GetNumber()
{
if (deletions.Count > 0)
{
return deletions.Dequeue().ToString();
}
return count++.ToString();
}
public void Delete(int num)
{
deletions.Enqueue(num);
}
}
you can create a Dictionary to look up counters for each string. Just make sure you parse out the index and call .Delete(int) whenever you rename or delete a value.
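For example, the per-name lookup might look like this (an illustrative sketch around the CounterClass above):
// One CounterClass per base name; numbers freed by Delete() are handed out first
static Dictionary<string, CounterClass> counters = new Dictionary<string, CounterClass>();
static string GetNextName(string baseName)
{
    CounterClass counter;
    if (!counters.TryGetValue(baseName, out counter))
    {
        counter = new CounterClass();
        counters[baseName] = counter;
    }
    return baseName + counter.GetNumber();
}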
You can easily do it in O(m), where m is the number of existing instances of the name (and not dependent on n, the number of items in the list).
Look up the string S in question. If S isn't in the list, you're done.
S exists, so construct S+"01" and check for that. Continue incrementing (e.g. next try S+"02") until the name doesn't exist.
This gives you unique names but they're still "pretty" and human-readable.
Unless you expect a large number of duplicates, this should be "near-constant" time because m will be so small.
Caveat: What if the string naturally ends with e.g. "01"? In your case this sounds unlikely so perhaps you don't care. If you do care, consider adding more of a suffix, e.g. "_01" instead of just "01" so it's easier to tell them apart.
You could do something like this:
private Dictionary<string, int> instanceCounts = new Dictionary<string, int>();
private string GetNextName(string baseName)
{
int count;
if (instanceCounts.TryGetValue(baseName, out count))
{
// the thing already exists, so add one to it
count++;
}
else
{
// first instance of this base name
count = 1;
}
// update the dictionary with the new value
instanceCounts[baseName] = count;
// format the number as desired
return baseName + count.ToString("00");
}
You would then just use it by calling GetNextName(...) with the base name you wanted, such as
string myNextName = GetNextName("Blur");
Using this, you wouldn't have to pre-init the dictionary.
It would fill in as you used the various base words.
Also, this is O(1).
I would create a dictionary with a string key and an integer value, storing the next number to use for a given action. This will be almost O(1) in practice.
private IDictionary<String, Int32> NextFreeActionNumbers = null;
private void InitializeNextFreeActionNumbers()
{
this.NextFreeActionNumbers = new Dictionary<String, Int32>();
this.NextFreeActionNumbers.Add("Blur", 1);
this.NextFreeActionNumbers.Add("Sharpen", 1);
this.NextFreeActionNumbers.Add("Contrast", 1);
// ... and so on ...
}
private String GetNextActionName(String action)
{
Int32 number = this.NextFreeActionNumbers[action];
this.NextFreeActionNumbers[action] = number + 1;
return String.Format("{0} {1}", action, number);
}
And you will have to check against collisions with user-edited values. Again, a dictionary might be a smart choice. There is no way around that: whatever way you generate your names, the user can always change an existing name to the next one you generate, unless you include all existing names in the generation scheme. (Or use a special character that is not allowed in user-edited names, but that would not be that nice.)
Because of the comments on reusing the holes, I want to add it here too: don't reuse the holes generated by renaming or deletion. This will confuse the user, because names he deleted or modified will suddenly reappear.
I would look for ways to simplify the problem.
Are there any constraints that can be applied? As an example, would it be good enough if each user could only have one (active) action of each type? Then the actions could be distinguished using the name (or ID) of the user.
Blur (Ben F)
Blur (Adrian H)
Focus (Ben F)
Perhaps this is not an option in this case, but maybe something else would be possible. I would go to great lengths in order to avoid the complexity in some of the proposed solutions!
If you want O(1) time then just track how many instances of each you have. Keep a hashtable with all of the possible objects, when you create an object, increment the value for that object and use the result in the name.
You're definitely not going to want to expose a GUID to the user interface.
Are you proposing an initial name like "Blur04", letting the user rename it, and then raising an error message if the user's custom name conflicts? Or silently renaming it to "CustomName01" or whatever?
You can use a Dictionary to check for duplicates in O(1) time. You can have incrementing counters for each effect type in the class that creates your new effect instances. Like Kevin mentioned, it gets more complex if you have to fill in gaps in the numbering when an effect is deleted.