I am working with a class, say Widget, that has a large number of numeric real-world attributes (e.g., height, length, weight, cost, etc.). There are different types of widgets (sprockets, cogs, etc.), but every widget shares the exact same attributes (the values differ by widget, of course, but they all have a height, a weight, etc.). I have 1,000s of each type of widget (1,000 cogs, 1,000 sprockets, etc.).
I need to perform a lot of calculations on these attributes (say, calculating the weighted average of the attributes for 1,000s of different widgets). For the weighted averages, I have different weights for each widget type (i.e., I may care more about length for sprockets than for cogs).
Right now, I am storing all the attributes in a Dictionary< string, double> within each widget (the widgets have an enum that specifies their type: cog, sprocket, etc.). I then have some calculator classes that store weights for each attribute as a Dictionary< WidgetType, Dictionary< string, double >>. To calculate the weighted average for each widget, I simply iterate through its attribute dictionary keys like:
double weightedAvg = 0.0;
foreach (string attributeName in widget.Attributes.Keys)
{
double attributeValue = widget.Attributes[attributeName];
double attributeWeight = calculator.Weights[widget.Type][attributeName];
weightedAvg += (attributeValue * attributeWeight);
}
So this works fine and is pretty readable and easy to maintain, but is very slow for 1000s of widgets based on some profiling. My universe of attribute names is known and will not change during the life of the application, so I am wondering what some better options are. The few I can think of:
1) Store attribute values and weights in double []s. I think this is probably the most efficient option, but then I need to make sure the arrays are always stored in the correct order between widgets and calculators. This also decouples the data from the metadata so I will need to store an array (?) somewhere that maps between the attribute names and the index into double [] of attribute values and weights.
2) Store attribute values and weights in immutable structs. I like this option because I don't have to worry about the ordering and the data is "self-documenting". But is there an easy way to loop over these attributes in code? I have almost 100 attributes, so I don't want to hardcode all of them. I could use reflection, but I worry that this would cause an even larger performance hit, since I am looping over so many widgets and would have to use reflection on each one.
Any other alternatives?
Three possibilities come immediately to mind. The first, which I think you rejected too readily, is to have individual fields in your class. That is, individual double values named height, length, weight, cost, etc. You're right that it would be more code to do the calculations, but you wouldn't have the indirection of dictionary lookup.
Second is to ditch the dictionary in favor of an array. So rather than a Dictionary<string, double>, you'd just have a double[]. Again, I think you rejected this too quickly. You can easily replace the string dictionary keys with an enumeration. So you'd have:
enum WidgetProperty
{
First = 0,
Height = 0,
Length = 1,
Weight = 2,
Cost = 3,
...
Last = 100
}
Given that and an array of double, you can easily go through all of the values for each instance:
for (int i = (int)WidgetProperty.First; i < (int)WidgetProperty.Last; ++i)
{
double attributeValue = widget.Attributes[i];
double attributeWeight = calculator.Weights[widget.Type][i];
weightedAvg += (attributeValue * attributeWeight);
}
Direct array access is going to be significantly faster than accessing a dictionary by string.
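One way to address the worry about keeping names and indices in sync is to let the enumeration itself act as the name-to-index map. The following is only a sketch: the four-member enum, the ToArray and WeightedSum helpers, and the sample numbers are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Illustrative subset; the real enumeration would list every attribute.
public enum WidgetProperty { Height, Length, Weight, Cost }

public static class AttributeArrays
{
    // Hypothetical helper: copies name-keyed values into an enum-indexed array.
    public static double[] ToArray(IDictionary<string, double> byName)
    {
        var values = new double[Enum.GetNames(typeof(WidgetProperty)).Length];
        foreach (var pair in byName)
        {
            // Enum.Parse throws ArgumentException for an unknown name,
            // so a typo surfaces at load time rather than in the hot loop.
            var property = (WidgetProperty)Enum.Parse(typeof(WidgetProperty), pair.Key, true);
            values[(int)property] = pair.Value;
        }
        return values;
    }

    // Same arithmetic as the loop above, on plain arrays.
    public static double WeightedSum(double[] values, double[] weights)
    {
        double sum = 0.0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i] * weights[i];
        return sum;
    }
}

public static class Program
{
    public static void Main()
    {
        var values = AttributeArrays.ToArray(new Dictionary<string, double>
        {
            { "height", 2.0 }, { "length", 4.0 }, { "weight", 1.0 }, { "cost", 10.0 }
        });
        var weights = new[] { 0.5, 0.25, 0.0, 0.1 };
        Console.WriteLine(AttributeArrays.WeightedSum(values, weights));
    }
}
```

The conversion from names to indices happens once per widget when the data is loaded, so the per-calculation loop touches only arrays.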
Finally, you can optimize your dictionary access a little bit. Rather than doing a foreach on the keys and then doing a dictionary lookup, do a foreach on the dictionary itself:
foreach (KeyValuePair<string, double> kvp in widget.Attributes)
{
double attributeValue = kvp.Value;
double attributeWeight = calculator.Weights[widget.Type][kvp.Key];
weightedAvg += (attributeValue * attributeWeight);
}
To calculate weighted averages without looping or reflection, one way is to compute each attribute's weighted value as the attribute is created and accumulate it in a central store. This happens while you are creating the widget instance. The following sample code will need to be adapted to your needs.
Also, for further processing of the widgets themselves, you can use data parallelism; see my other response in this thread.
public enum WidgetType { Cog, Sprocket }
public class Calculator
{
// Weights per widget type and attribute name; must be populated before any WidgetAttribute is created.
public static readonly Dictionary<WidgetType, Dictionary<string, double>> Weights = new Dictionary<WidgetType, Dictionary<string, double>>();
}
public class WeightStore
{
static Dictionary<int, double> widgetWeightedAvg = new Dictionary<int, double>();
// Adds one attribute's weighted value to the running total for the given widget.
public static void AttWeightedAvgAvailable(double attWeightedAvg, int widgetId)
{
if (widgetWeightedAvg.ContainsKey(widgetId))
widgetWeightedAvg[widgetId] += attWeightedAvg;
else
widgetWeightedAvg[widgetId] = attWeightedAvg;
}
}
public class WidgetAttribute
{
public string Name { get; }
public double Value { get; }
public WidgetAttribute(string name, double value, WidgetType type, int widgetId)
{
Name = name;
Value = value;
double attWeight = Calculator.Weights[type][name];
WeightStore.AttWeightedAvgAvailable(Value * attWeight, widgetId);
}
}
public class CogWidget
{
public int Id { get; set; }
public WidgetAttribute Height { get; set; }
public WidgetAttribute Weight { get; set; }
}
public class Client
{
public void BuildCogWidgets()
{
CogWidget widget = new CogWidget();
widget.Id = 1;
widget.Height = new WidgetAttribute("height", 12.22, WidgetType.Cog, widget.Id);
}
}
As is always the case with data normalization, the normalization level you choose determines a good part of the performance. It looks like you would have to move from your current model to another model, or to a mix.
Better performance for your scenario is possible if you do this processing in the database rather than on the C# side. You then get the benefit of indexes, no data transfer except the wanted result, plus the 100,000s of man-hours already spent on performance optimization.
Use the data parallelism supported by .NET 4 and above.
https://msdn.microsoft.com/en-us/library/dd537608(v=vs.110).aspx
An excerpt from the above link
When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced
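As a rough illustration of how that could apply to the widget calculation, here is a minimal sketch using Parallel.For from System.Threading.Tasks. The array layout (one double[] of attribute values per widget, one shared weight array) and the WeightedSums helper are assumptions made for the example, not the asker's actual types.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelWeights
{
    // Hypothetical helper: one weighted sum per widget, computed in parallel.
    public static double[] WeightedSums(double[][] attributes, double[] weights)
    {
        var results = new double[attributes.Length];
        Parallel.For(0, attributes.Length, i =>
        {
            double sum = 0.0;
            for (int j = 0; j < weights.Length; j++)
                sum += attributes[i][j] * weights[j];
            // Each iteration writes only its own slot, so no locking is needed.
            results[i] = sum;
        });
        return results;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up data: three widgets with two attributes each.
        double[][] attributes =
        {
            new[] { 1.0, 2.0 },
            new[] { 3.0, 4.0 },
            new[] { 5.0, 6.0 }
        };
        double[] weights = { 0.25, 0.75 };
        foreach (double sum in ParallelWeights.WeightedSums(attributes, weights))
            Console.WriteLine(sum);
    }
}
```

Whether this helps depends on how much work each widget represents; for a handful of multiplications per widget, the scheduling overhead may outweigh the gain, so it is worth profiling both versions.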
Good afternoon all!
As part of getting a better grip on most aspects of object-based programming, I've started to attempt something far larger than I have done in the past. With it I'm trying to learn about inheritance, code reuse, using classes far more extensively, and so on.
For this purpose I am trying to piece together all the parts required for a basic RPG/dungeon crawler.
I know this has been done a billion times before, but I find that actually trying to code something like it takes you through a lot more problems than you might think, which is a great way to learn (I think).
For now I have only loaded up a WPF application, since my interest is 95% on being able to piece together the working classes, routines, functions, etc., and not so much on how it will look. I am actually reading up on XNA, but since I am mostly trying to get a grip on the basic workings, I don't want to complicate those aspects with the graphical side of things just yet.
The problem I am now facing is that when I would like a character to attack or defend, it should know which other character the attack came from, or which one it should be pointed at. I figured I could use either a GUID or a manually assigned ID. But the problem is that I don't really know how to implement such a thing.
The thing that I figured was that I could maybe add a reference to an array (Character[]), and have a SearchByID function loop through them to find the right one, and return it. Like so:
internal Character SearchByID(string _ID)
{
foreach(Character charToFind in Character[])
{
if(charToFind.ID == _ID)
return charToFind;
}
}
This of course has to be altered a bit due to the return at the moment, but just to give you an idea.
What I am stuck on is how to create the appropriate array outside of the "Character"-class? I can fill it up just fine, but how do I go about having it added above class level?
The way the "Character"-class is built up is that every new character instantiates from the Character class. The constructor then loads the appropriate values. But other than this, I see no possibility to initialize an array outside of this.
If it is preferable to post the entire code that I have, that will be no problem at all!
Thanks for any insights you may provide me with.
I think you can just use the Character-class and pass other Characters to it, for example:
public class Character
{
public string Name { get; private set; }
public int HitPoints { get; private set; }
public int Offense { get; private set; }
public int Defense { get; private set; }
public Character(string name, int hitPoints, int offense, int defense)
{
Name = name;
HitPoints = hitPoints;
Offense = offense;
Defense = defense;
}
public void Defend(Character source)
{
HitPoints = HitPoints - (source.Offense - Defense);
if (HitPoints <= 0)
{
Console.WriteLine("{0} died", Name);
}
}
public void Attack(Character target)
{
// Here you can call the other character's defend with this char as an attacker
target.Defend(this);
if (target.HitPoints <= 0)
{
Console.WriteLine("{0} killed {1}", Name, target.Name);
}
}
}
The thing with object oriented programming is that you have to start thinking in objects. Objects are like boxes when they're concrete. You can make new ones and give them some properties, like a name, height, width, hitpoints, whatever. You can also let these objects perform actions. Now a simple box won't do much itself, but a character can do various things, so it makes sense to put these actions in the Character-class.
Besides having Characters, you might have a Game-class which manages the game-state, characters, monsters, treasure chests etc...
Now this simple example may cause you to gain HitPoints when your defense is higher than the attacker's offense, but that's details, I'll leave the exact implementation up to you.
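If you do want to rule out the healing case, one possible approach is to clamp the damage at zero before subtracting it. This is just a sketch; the Combat class and Damage helper are made-up names, and you may prefer different rules entirely.

```csharp
using System;

public static class Combat
{
    // Hypothetical helper: damage never drops below zero,
    // so a strong defense cannot add hit points to the defender.
    public static int Damage(int offense, int defense)
    {
        return Math.Max(0, offense - defense);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Combat.Damage(8, 5)); // offense beats defense
        Console.WriteLine(Combat.Damage(3, 5)); // defense beats offense
    }
}
```

Inside Defend, the subtraction would then become HitPoints = HitPoints - Combat.Damage(source.Offense, Defense).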
I guess you want a way to insert characters into an array when they are instantiated.
You can make a static array or list.
So your class, in my opinion, should be:
class Character
{
static List<Character> characterList = new List<Character>(); // all characters are here
public string ID { get; private set; }
public Character(string id,...)
{
//initialize your object
ID = id;
characterList.Add(this); // store them in the list as and when created
}
// static, because the search runs over the shared list rather than one instance
internal static Character SearchByID(string _ID)
{
foreach(Character charToFind in characterList)
{
if(charToFind.ID == _ID)
return charToFind;
}
return null; // no character with that ID
}
}
As you may know, static members are associated with the class, not with an individual object. So when you create a new Character object, it is automatically added to characterList.
Unless you are dealing with separate processes, e.g. client-server, you probably don't want to use "Id"s at all.
Wherever you are passing string _ID around, pass the actual Character instead. This saves you looking it up in an array or whatever.
Post more code, and I can show you what I mean.
You could use a dictionary, instantiated in your controller class:
Dictionary<Guid, Character> _characterList = new Dictionary<Guid, Character>();
Initialise:
var someCharacter = new Character() { stats = something };
var otherCharacter = new Character() { stats = anotherThing };
var char1Id = Guid.NewGuid();
var char2Id = Guid.NewGuid();
_characterList.Add(char1Id, someCharacter);
_characterList.Add(char2Id, otherCharacter);
then, to access characters:
var charToFind = _characterList[char1Id];
or
var charToFind = _characterList.Values.Single(c => c.Name == "Fred The Killer");
or whatever else...
Check out KeyedCollection.
It is like a dictionary where the key is a property of the class.
You will be able to reference a Character with the Characters[id] syntax.
On your Character class, override GetHashCode and Equals for performance.
If you use Int32 for the ID then you will get a perfect hash.
Very fast and O(1).
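To make the suggestion concrete, here is a minimal sketch of a KeyedCollection subclass. KeyedCollection<TKey, TItem> lives in System.Collections.ObjectModel and is abstract; the one member you must supply is GetKeyForItem. The Character shape and the CharacterCollection name are assumptions for the example, and a string key is used here to match the question's string ID.

```csharp
using System;
using System.Collections.ObjectModel;

public class Character
{
    public string Id { get; private set; }
    public string Name { get; private set; }

    public Character(string id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Hypothetical collection type: the key is read from the item itself.
public class CharacterCollection : KeyedCollection<string, Character>
{
    protected override string GetKeyForItem(Character item)
    {
        return item.Id;
    }
}

public static class Program
{
    public static void Main()
    {
        var characters = new CharacterCollection();
        characters.Add(new Character("fred", "Fred The Killer"));
        characters.Add(new Character("wilma", "Wilma The Healer"));

        // Lookup by key; an unknown key throws KeyNotFoundException,
        // so check Contains first when the id may be missing.
        if (characters.Contains("fred"))
            Console.WriteLine(characters["fred"].Name);
    }
}
```

Adding a second item with an existing key throws ArgumentException, which guards against accidental duplicate IDs.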
I have a weird situation happening that I'm not quite understanding.
I have a 'dataset' class that holds various metadata about a monitoring buoy including a list of 'sensors'.
Each 'sensor' has a current 'sensorstate'.
Each 'sensorstate' has a bit of metadata about it (timestamp, reason for change etc) but most importantly it has a Dictionary<DateTime,float> of values.
These sensors generally have upwards of 50k data points (years' worth of 15-minute readings), so I wanted something faster at serializing than the default .NET BinaryFormatter and set up protobuf-net, which serializes fantastically fast.
Unfortunately, my problem occurs on deserialization: my dictionary of values throws an exception saying that an item with the same key has already been added, and the only way I can get it to deserialize is to enable 'OverwriteList'. I'm unsure why: there can't be any duplicate keys when serializing (it's a dictionary), so why are there duplicate keys when I deserialize? This also raises data integrity issues.
Any help in explaining this would be highly appreciated.
(On a side note, when giving ProtoMember attribute ids, do they need to be unique to the class or to the whole project? Also, I'm looking for lossless compression recommendations to use in conjunction with protobuf-net, as the files are getting pretty large.)
Edit:
I've just put my source up on GitHub and here is the class in question
SensorState (Note: it currently has OverwriteList = true in order to have it working for other development)
Here is an example raw data file
I had already tried using the SkipConstructor flag, but even with it set to true I get an exception unless OverwriteList is also true for the values dictionary.
If OverwriteList fixes it, then it suggests to me that the dictionary has some data in it by default, perhaps via a constructor or similar. If it is indeed coming from the constructor, you can disable that with [ProtoContract(SkipConstructor=true)].
If I have misunderstood the above, it may help to illustrate with a reproducible example, if possible.
With regard to the ids, they only need to be unique inside each type, and it is recommended to keep them small (due to "varint" encoding of tags, small keys are "cheaper" than large keys).
If you want to really minimise size, I would actually suggest looking at the content of the data, too. For example, you say that this is 15 minute readings... well, I'm guessing there are occasional gaps, but could you do, for example:
Block (class)
Start Time (DateTime)
Values (float[])
and have a Block for every contiguous run of 15-minute values (the assumption here is that every value is 15 minutes after the last, else a new block is started). So you are storing multiple Block instances in place of a single dictionary. This has the following advantages:
far fewer DateTime values to store
you can use "packed" encoding on the floats, which means it doesn't need to add all the intermediate tags; you do this by marking an array/list as ([ProtoMember({key}, IsPacked = true)]) - noting that it only works on a few basic data-types (not sub-objects)
combined, these two tweaks could yield significant savings
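The grouping step described above could be sketched roughly as follows. This is plain C# with no protobuf-net dependency, so the serialization attributes are left out; the Block fields, the BlockBuilder name, and the fixed 15-minute step are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Block
{
    public DateTime Start;
    public List<float> Values = new List<float>();
}

public static class BlockBuilder
{
    static readonly TimeSpan Step = TimeSpan.FromMinutes(15);

    // Splits timestamped readings into runs where each reading
    // is exactly one step after the previous one.
    public static List<Block> ToBlocks(IDictionary<DateTime, float> points)
    {
        var blocks = new List<Block>();
        Block current = null;
        DateTime expected = DateTime.MinValue;

        foreach (var pair in points.OrderBy(p => p.Key))
        {
            // A gap (or the very first reading) starts a new block.
            if (current == null || pair.Key != expected)
            {
                current = new Block { Start = pair.Key };
                blocks.Add(current);
            }
            current.Values.Add(pair.Value);
            expected = pair.Key + Step;
        }
        return blocks;
    }
}

public static class Program
{
    public static void Main()
    {
        var points = new Dictionary<DateTime, float>
        {
            { new DateTime(2009, 9, 1, 0, 0, 0), 11.04F },
            { new DateTime(2009, 9, 1, 0, 15, 0), 11.04F },
            { new DateTime(2009, 9, 1, 0, 30, 0), 11.01F },
            { new DateTime(2009, 9, 1, 1, 30, 0), 10.98F } // gap before this one
        };
        foreach (var block in BlockBuilder.ToBlocks(points))
            Console.WriteLine("{0:u}: {1} values", block.Start, block.Values.Count);
    }
}
```

Each reading's timestamp can be recovered as Start plus its index times 15 minutes, which is what lets the per-value DateTime be dropped.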
If the data has a lot of strings, you could try GZIP/DEFLATE. You can of course try these either way, but without large amounts of string data I would be cautious of expecting too much extra from compression.
As an update based on the supplied (CSV) data file, there is no inherent problem here handling the dictionary - as shown:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;
class Program
{
static void Main()
{
var data = new Data
{
Points =
{
{new DateTime(2009,09,1,0,0,0), 11.04F},
{new DateTime(2009,09,1,0,15,0), 11.04F},
{new DateTime(2009,09,1,0,30,0), 11.01F},
{new DateTime(2009,09,1,0,45,0), 11.01F},
{new DateTime(2009,09,1,1,0,0), 11F},
{new DateTime(2009,09,1,1,15,0), 10.98F},
{new DateTime(2009,09,1,1,30,0), 10.98F},
{new DateTime(2009,09,1,1,45,0), 10.92F},
{new DateTime(2009,09,1,2,00,0), 10.09F},
}
};
var ms = new MemoryStream();
Serializer.Serialize(ms, data);
ms.Position = 0;
var clone = Serializer.Deserialize<Data>(ms);
Console.WriteLine("{0} points:", clone.Points.Count);
foreach(var pair in clone.Points.OrderBy(x => x.Key))
{
float orig;
data.Points.TryGetValue(pair.Key, out orig);
Console.WriteLine("{0}: {1}", pair.Key, pair.Value == orig ? "correct" : "FAIL");
}
}
}
[ProtoContract]
class Data
{
private readonly Dictionary<DateTime, float> points = new Dictionary<DateTime, float>();
[ProtoMember(1)]
public Dictionary<DateTime, float> Points { get { return points; } }
}
This is where I apologize for ever suggesting it had anything to do with code that wasn't my own doing. And while I'm here, mad props to the team behind protobuf and to Marc Gravell for protobuf-net; it's seriously fast.
What was happening was that in the Sensor class I had some logic to make sure a couple of properties were never null.
[ProtoMember(12)]
public SensorState CurrentState
{
get { return (_currentState == null) ? RawData : _currentState; }
set { _currentState = value; }
}
[ProtoMember(16)]
public SensorState RawData
{
get { return _rawData ?? (_rawData = new SensorState(this, DateTime.Now, new Dictionary<DateTime, float>(), "", true, null)); }
private set { _rawData = value; }
}
While this works fantastically when I'm using the properties, it messes up the serialization process.
The simple fix was to mark the underlying fields for serialization instead.
[ProtoMember(16)]
private SensorState _rawData;
[ProtoMember(12)]
private SensorState _currentState;
I'm trying to figure out the best way to represent some data. It basically follows the form Manufacturer.Product.Attribute = Value. Something like:
Acme.*.MinimumPrice = 100
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
So the minimum price across all Acme products is 100, except in the case of products A and B. I want to store this data in C# and have some function where GetValue("Acme.ProductC.MinimumPrice") returns 100 but GetValue("Acme.ProductA.MinimumPrice") returns 50.
I'm not sure how to best represent the data. Is there a clean way to code this in C#?
Edit: I may not have been clear. This is configuration data that needs to be stored in a text file then parsed and stored in memory in some way so that it can be retrieved like the examples I gave.
Write the text file exactly like this:
Acme.*.MinimumPrice = 100
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
Parse it into a path/value pair sequence:
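foreach (var pair in File.ReadAllLines(configFileName)
.Select(l => l.Split(new[] { '=' }, 2))
.Where(a => a.Length == 2) // skip blank or malformed lines
.Select(a => new { Path = a[0].Trim(), Value = a[1].Trim() })) // trim the spaces around '='
{
// do something with each pair.Path and pair.Value
}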
Now, there are two possible interpretations of what you want to do. The string Acme.*.MinimumPrice could mean that for any lookup where there is no specific override, such as Acme.Toadstool.MinimumPrice, we return 100, even though there is nothing referring to Toadstool anywhere in the file. Or it could mean that it should only return 100 if there are other specific mentions of Toadstool in the file.
If it's the former, you could store the whole lot in a flat dictionary, and at look up time keep trying different variants of the key until you find something that matches.
If it's the latter, you need to build a data structure of all the names that actually occur in the path structure, to avoid returning values for ones that don't actually exist. This seems more reliable to me.
So going with the latter option, Acme.*.MinimumPrice is really saying "add this MinimumPrice value to any product that doesn't have its own specifically defined value". This means that you can basically process the pairs at parse time to eliminate all the asterisks, expanding it out into the equivalent of a completed version of the config file:
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
Acme.ProductC.MinimumPrice = 100
The nice thing about this is that you only need a flat dictionary as the final representation and you can just use TryGetValue or [] to look things up. The result may be a lot bigger, but it all depends how big your config file is.
You could store the information more minimally, but I'd go with something simple that works to start with, and give it a very simple API so that you can re-implement it later if it really turns out to be necessary. You may find (depending on the application) that making the look-up process more complicated is worse over all.
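For comparison, the first interpretation (a flat dictionary with a fallback at lookup time) might look something like the sketch below. The Config class, its method names, and the choice to return null for a missing value are all assumptions for illustration; only the wildcard in the middle (product) segment is handled.

```csharp
using System;
using System.Collections.Generic;

public class Config
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    // Parses lines of the form "Manufacturer.Product.Attribute = Value".
    public void Load(IEnumerable<string> lines)
    {
        foreach (var line in lines)
        {
            var parts = line.Split(new[] { '=' }, 2);
            if (parts.Length != 2) continue; // skip blank or malformed lines
            _values[parts[0].Trim()] = parts[1].Trim();
        }
    }

    // Tries the exact path first, then the same path with the product replaced by "*".
    public string GetValue(string path)
    {
        string value;
        if (_values.TryGetValue(path, out value)) return value;

        var segments = path.Split('.');
        if (segments.Length == 3)
        {
            var wildcard = segments[0] + ".*." + segments[2];
            if (_values.TryGetValue(wildcard, out value)) return value;
        }
        return null; // nothing defined for this path
    }
}

public static class Program
{
    public static void Main()
    {
        var config = new Config();
        config.Load(new[]
        {
            "Acme.*.MinimumPrice = 100",
            "Acme.ProductA.MinimumPrice = 50",
            "Acme.ProductB.MinimumPrice = 60",
            "Acme.ProductC.DefaultColor = Blue"
        });
        Console.WriteLine(config.GetValue("Acme.ProductC.MinimumPrice")); // prints 100
        Console.WriteLine(config.GetValue("Acme.ProductA.MinimumPrice")); // prints 50
    }
}
```

Note that under this interpretation Acme.Toadstool.MinimumPrice also resolves to 100, which is exactly the behaviour the second interpretation is meant to avoid.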
I'm not entirely sure what you're asking, but it sounds like you're saying one of two things. Either:
I need a function that will return a fixed value, 100, for every product ID except for two cases: ProductA and ProductB
In that case you don't even need a data structure. A simple comparison function will do
int GetValue(string key) {
if ( key == "Acme.ProductA.MinimumPrice" ) { return 50; }
else if (key == "Acme.ProductB.MinimumPrice") { return 60; }
else { return 100; }
}
Or you could have been asking
I need a function that will return a value if already defined or 100 if it's not
In that case I would use a Dictionary<string,int>. For example
class DataBucket {
private Dictionary<string,int> _priceMap = new Dictionary<string,int>();
public DataBucket() {
_priceMap["Acme.ProductA.MinimumPrice"] = 50;
_priceMap["Acme.ProductB.MinimumPrice"] = 60;
}
public int GetValue(string key) {
int price = 0;
if ( !_priceMap.TryGetValue(key, out price)) {
price = 100;
}
return price;
}
}
One way is to create a nested dictionary: Dictionary<string, Dictionary<string, Dictionary<string, object>>>. In your code you split "Acme.ProductA.MinimumPrice" on the dots and get or set the value in the dictionary corresponding to the resulting chunks.
Another way is to use Linq2Xml: you can create an XDocument with Acme as the root node and products as children of the root, and store the attributes either as XML attributes on the products or as child nodes. I prefer the second solution, but it would be slower if you have thousands of products.
I would take an OOP approach to this. As you explain it, all your products are represented by objects, which is good. This seems like a good use of polymorphism.
I would have all products derive from a ProductBase, which has a virtual property that supplies the default:
public virtual int MinimumPrice { get { return 100; } }
And then your specific products, such as ProductA, will override that functionality:
public override int MinimumPrice { get { return 50; } }
It seems strange that the language apparently includes no suitable functionality.
I find myself with data that would best be expressed as a multi-dimensional array but it's utterly constant, there is no way anyone could want to change it without also changing the associated code. Faced with such stuff in Delphi the answer is obvious--a constant whose value is the table. However, C# doesn't seem to support anything like this.
Google shows many people griping about this, no good answers.
How do people handle this sort of situation?
(And don't say that constants don't belong in code--the last one I bumped into was all possible permutations of 4 items. Unless the very nature of spacetime changes this is set in stone.)
What happened?? There was an answer that came pretty close, I was asking about a detail and it vanished! Simply declaring an array sort of does the job--the only problem is that the array allocation is going to run every time. The one in front of me contains 96 values--how do I get it to initialize only once? Do I just have to accept scoping it far wider than it should be? (As it stands it's in one 3-line routine that's inside what amounts to an O(n^3) routine.)
ReadOnlyCollection
There's a page in the C# FAQ about this specific thing.
They suggest using a static readonly array:
static readonly int[,] constIntArray = new int[,] { { 1, 2, 3 }, { 4, 5, 6 }};
However, be aware that this is only sort of constant - you can still reassign individual elements within the array. Also, this has to be specified on the class level since it's a static, but it will work fairly well.
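To illustrate that caveat, and one possible way around it for one-dimensional data, here is a small sketch. Array.AsReadOnly wraps an array in a ReadOnlyCollection<T>, whose indexer has no setter. The Tables class and its field names are made up for the example, and flattening the table to one dimension is an assumption that may not suit every access pattern.

```csharp
using System;
using System.Collections.ObjectModel;

public static class Tables
{
    // Initialized once, when the type is first used; the reference is fixed
    // but the individual elements remain writable.
    public static readonly int[,] Grid = { { 1, 2, 3 }, { 4, 5, 6 } };

    // Read-only wrapper over a private copy: callers cannot reassign elements.
    public static readonly ReadOnlyCollection<int> Flat =
        Array.AsReadOnly(new[] { 1, 2, 3, 4, 5, 6 });
}

public static class Program
{
    public static void Main()
    {
        Tables.Grid[0, 0] = 99;               // compiles: readonly protects only the reference
        Console.WriteLine(Tables.Grid[0, 0]); // prints 99

        Console.WriteLine(Tables.Flat[4]);    // prints 5
        // Tables.Flat[4] = 0;                // would not compile: the indexer is get-only
    }
}
```

Because both fields are static, the allocation happens a single time regardless of how often the routine that reads them is called.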
You could use a readonly Hashtable. The only downside is that readonly does not prevent you from changing the value of a particular item in the Hashtable. So it is not truly const.
readonly Hashtable table = new Hashtable(){{1,"One"},{2,"Two"}};
Or an array
public readonly string[,] arry = new string[,]{{"1","2"},{"2","4"}};
Yes, you will need to declare the variable in the appropriate scope so it does not get initialized more than once.
Like they say, just add another layer of indirection. C# doesn't need to provide a specialized data structure as a language primitive, although one does, at times, wish there was a way to make any class immutable, but that's another discussion.
Now you didn't mention if you need to store different things in there. In fact you didn't mention anything other than multi-dimensional and no ability to change the values or the arrays. I don't even know if the access pattern (a single int,int,int indexer) is appropriate.
But in general, for a 3-dimensional jagged array, the following works (but it isn't pretty).
One caveat is the type you construct it with also needs to be immutable, but that's your problem. You can just create your own read-only wrapper.
public static readonly ReadOnlyThreeDimensions<int> MyGlobalThree
= new ReadOnlyThreeDimensions<int>(IntInitializer);
public class ReadOnlyThreeDimensions<T>
{
private T[][][] _arrayOfT;
public ReadOnlyThreeDimensions(Func<T[][][]> initializer)
{
_arrayOfT = initializer();
}
public ReadOnlyThreeDimensions(T[][][] arrayOfT)
{
_arrayOfT = arrayOfT;
}
public T this [int x, int y, int z]
{
get
{
return _arrayOfT[x][y][z];
}
}
}
And then you just need to provide some initializer method, or assign it in a static constructor.
public static int[][][] IntInitializer()
{
return xyz // something that constructs a [][][]
}
Enumerations, surely.
Well, I've taken the approach of the following, it's a little nasty to read but easy to edit.
public struct Something
{
public readonly int Number;
public readonly string Name;
public Something(int num, string name) { this.Number = num; this.Name = name; }
}
public readonly Something[] GlobalCollection = new Something[]
{
new Something(1, "One"),
new Something(2, "Two"),
};