Related
I am working with a class, say Widget, that has a large number of numeric real world attributes (eg, height, length, weight, cost, etc.). There are different types of widgets (sprockets, cogs, etc.), but each widget shares the exact same attributes (the values will be different by widget, of course, but they all have a weight, weight, etc.). I have 1,000s of each type of widget (1,000 cogs, 1,000 sprockets, etc.)
I need to perform a lot of calculations on these attributes (say calculating the weighted average of the attributes for 1000s of different widgets). For the weighted averages, I have different weights for each widget type (ie, I may care more about length for sprockets than for cogs).
Right now, I am storing all the attributes in a Dictionary< string, double> within each widget (the widgets have an enum that specifies their type: cog, sprocket, etc.). I then have some calculator classes that store weights for each attribute as a Dictionary< WidgetType, Dictionary< string, double >>. To calculate the weighted average for each widget, I simply iterate through its attribute dictionary keys like:
double weightedAvg = 0.0;
foreach (string attibuteName in widget.Attributes.Keys)
{
double attributeValue = widget.Attributes[attributeName];
double attributeWeight = calculator.Weights[widget.Type][attributeName];
weightedAvg += (attributeValue * attributeWeight);
}
So this works fine and is pretty readable and easy to maintain, but is very slow for 1000s of widgets based on some profiling. My universe of attribute names is known and will not change during the life of the application, so I am wondering what some better options are. The few I can think of:
1) Store attribute values and weights in double []s. I think this is probably the most efficient option, but then I need to make sure the arrays are always stored in the correct order between widgets and calculators. This also decouples the data from the metadata so I will need to store an array (?) somewhere that maps between the attribute names and the index into double [] of attribute values and weights.
2) Store attribute values and weights in immutable structs. I like this option because I don't have to worry about the ordering and the data is "self documenting". But is there an easy way to loop over these attributes in code? I have almost 100 attributes, so I don't want to hardcode all those in the code. I can use reflection, but I worry that this will cause even a larger penalty hit since I am looping over so many widgets and will have to use reflection on each one.
Any other alternatives?
Three possibilities come immediately to mind. The first, which I think you rejected too readily, is to have individual fields in your class. That is, individual double values named height, length, weight, cost, etc. You're right that it would be more code to do the calculations, but you wouldn't have the indirection of dictionary lookup.
Second is to ditch the dictionary in favor of an array. So rather than a Dictionary<string, double>, you'd just have a double[]. Again, I think you rejected this too quickly. You can easily replace the string dictionary keys with an enumeration. So you'd have:
enum WidgetProperty
{
First = 0,
Height = 0,
Length = 1,
Weight = 2,
Cost = 3,
...
Last = 100
}
Given that and an array of double, you can easily go through all of the values for each instance:
for (int i = (int)WidgetProperty.First; i < (int)WidgetProperty.Last; ++i)
{
double attributeValue = widget.Attributes[i];
double attributeWeight = calculator.Weights[widget.Type][i];
weightedAvg += (attributeValue * attributeWeight);
}
Direct array access is going to be significantly faster than accessing a dictionary by string.
Finally, you can optimize your dictionary access a little bit. Rather than doing a foreach on the keys and then doing a dictionary lookup, do a foreach on the dictionary itself:
foreach (KeyValuePair<string, double> kvp in widget.Attributes)
{
double attributeValue = kvp.Value;
double attributeWeight = calculator.Weights[widget.Type][kvp.Key];
weightedAvg += (attributeValue * attributeWeight);
}
To calculate weighted averages without looping or reflection, one way would be to calculate the weighted average of the individual attributes and store them in some place. This should happen while you are creating instance of the widget. Following is a sample code which needs to be modified to your needs.
Also, for further processing of the the widgets themselves, you can use data parallelism. see my other response in this thread.
public enum WidgetType { }
public class Claculator { }
public class WeightStore
{
static Dictionary<int, double> widgetWeightedAvg = new Dictionary<int, double>();
public static void AttWeightedAvgAvailable(double attwightedAvg, int widgetid)
{
if (widgetWeightedAvg.Keys.Contains(widgetid))
widgetWeightedAvg[widgetid] += attwightedAvg;
else
widgetWeightedAvg[widgetid] = attwightedAvg;
}
}
public class WidgetAttribute
{
public string Name { get; }
public double Value { get; }
public WidgetAttribute(string name, double value, WidgetType type, int widgetId)
{
Name = name;
Value = value;
double attWeight = Calculator.Weights[type][name];
WeightStore.AttWeightedAvgAvailable(Value*attWeight, widgetId);
}
}
public class CogWdiget
{
public int Id { get; }
public WidgetAttribute height { get; set; }
public WidgetAttribute wight { get; set; }
}
public class Client
{
public void BuildCogWidgets()
{
CogWdiget widget = new CogWdiget();
widget.Id = 1;
widget.height = new WidgetAttribute("height", 12.22, 1);
}
}
As it is always the case with data normalization, is that choosing your normalization level determines a good part of the performance. It looks like you would have to go from your current model to another model or a mix.
Better performance for your scenario is possible when you do not process this with the C# side, but with the database instead. You then get the benefit of indexes, no data transfer except the wanted result, plus 100000s of man hours already spent on performance optimization.
Use Data Parallelism supported by the .net 4 and above.
https://msdn.microsoft.com/en-us/library/dd537608(v=vs.110).aspx
An excerpt from the above link
When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced
I want to convert a string which contains 5 zeros ("00000") into a Int so it can be incremented.
Whenever I convert the string to an integer using Convert.ToInt32(), the value becomes 0.
How can I ensure that the integer stays at a fixed length once converted?
I want to be able to increment the value from "00000" to "00001" and so on so that the value appears with that many digits in a database instead of 0 or 1.
If you are going to down vote question, the least you could do is leave feedback on why you did it...
An integer is an integer. Nothing more.
So the int you have has value zero, there is no way to add any "length" metadata or similar.
I guess what you actually want is, that - at some point - there will be a string again. You want that string to represent your number, but with 5 characters and leading zeroes.
You can achieve that using:
string s = i.ToString("00000")
EDIT
You say you want the numbers to appear like that in your DB.
Ask yourself:
Does it really need to be like that in the DB or is it sufficient to format the numbers as soon as you read them from the database?
Depending on that, format them (as shown above), either when writing to or reading from the DB. Of course, the former case will require more storage space and you cannot really use SQL or similar to perform arithmetic operations!
An integer doesn't have a length, it's purely a numerical value. If you want to keep the length, you have to store the information about the length somewhere else.
You can wrap the value in an object that keeps the length in a property. Example:
public class FormattedInt {
private int _value, _length;
public FormattedInt(string source) {
_length = source.Length;
_value = Int32.Parse(source);
}
public void Increase() {
_value++;
}
public override string ToString() {
return _value.ToString(new String('0', _length));
}
}
Usage:
FormattedInt i = new FormattedInt("0004");
i.Increase();
string s = i.ToString(); // 0005
As olydis says above, an int is just a number. It doesn't specify the display format. I don't recommend storing it padded out with zeroes in a database either. It's an int, store it as an int.
For display purposes, in the app, a report, whatever, then pad out the zeroes.
string s = i.ToString("00000");
I am using the below code to generate random keys like the following TX8L1I
public string GetNewId()
{
string result = string.Empty;
Random rnd = new Random();
short codeLength = 5;
string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
StringBuilder builder = new StringBuilder(codeLength);
for (int i = 0; i < codeLength; ++i)
builder.Append(chars[rnd.Next(chars.Length)]);
result = string.Format("{0}{1}", "T", builder.ToString());
return result;
}
Every-time a key is generated a correspondent record is created on the database using the generated result as primary key. Off course this is not safe since a primary key violation might occur.
What's the correct approach to this? Should I load all existing keys and verify against tha t list if the key already exist and if yes generate another one?
Or maybe a more efficient way would be to move this logic into the database side? But even on the database side I still need to check that the random key generated does not exist on the current table.
Any ideas?
Thanks
Move the decision to the database. Add a primary key of type uniqueidentifier. Then its a simple "fire and forget" algorithm.. the database decides what the key is.
I think you should use Random instance outside of your method.
Since Random object is seeded from the system clock, which means that if you call your method several times very quickly, it will use the same seed each time, which means that you'll get the same string at the end.
well my decision to use the random is that I don't want the users to be able to know how the keys are generated (format), since the website is public and I don't want users to be trying to access keys that.
You've merged two problems into one that are really separate:
Identify a row in a database.
Have an identifier that is passed to and from clients, that matches this, but is not predictable.
This is a specific case of a more general set of two problems:
Identify a row in a database.
Have an identifier that is passed to and from clients, that matches this.
In this case, when we don't care about guessing, then the easiest way to deal with it is to just use the same identifier. E.g. for the database row identified by the integer 42 we use the string "42" and the mapping is a trivial int.Parse() or int.TryParse() in one direction and a trivial .ToString() or implicit .ToString() in the other.
And that's therefore the pattern we use without even thinking about it; public IDs and database keys are the same (perhaps with some conversion).
But the best solution to your specific case where you want to prevent guessing is not to change the key, but to change mapping between the key and the public identifier.
First, use auto-incremented integers ("IDENTITY" in SQL Server, and various similar concepts in other databases).
Then, when you are sending the key to the client (i.e. using it in a form value or appending it to a URI) then map it as so:
private const string Seed = "this is my secret seed ¾Ÿˇʣכ ↼⊜┲◗ blah balh";
private static string GetProtectedID(int id)
{
using(var sha = System.Security.Cryptography.SHA1.Create())
{
return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(id.ToString() + Seed)).Select(b => b.ToString("X2"))) + id.ToString();
}
}
For example, with an ID of 123 this produces "989178D90470D8777F77C972AF46C4DED41EF0D9123".
Now map back to a the key with:
private static bool GetIDFromProtectedID(string str, out int id)
{
int chkID;
if(int.TryParse(str.Substring(40), out chkID))
{
using(var sha = System.Security.Cryptography.SHA1.Create())
{
if(string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(chkID.ToString() + Seed)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
{
id = chkID;
return true;
}
}
}
id = 0;
return false;
}
For "989178D90470D8777F77C972AF46C4DED41EF0D9123" this returns true and sets the id argument to 123. For "989178D90470D8777F77C972AF46C4DED41EF0D9122" (because I tried to guess the ID to attack your site) it returns false and sets id to 0. (The correct ID for key 122 is "F8AD0F55CA1B9426D18F684C4857E0C4D43984BA122", which is not easy to guess from having seen that for 123).
If you want, you can remove some of the 40 characters of the output to produce smaller IDs. This makes it less secure (an attacker has fewer possible IDs to brute-force) but should still be reasonable for many uses.
Obviously, you should used a different value for Seed than here, so that someone reading this answer can't use it to predict your ID; change the seed, change the ID. Seed cannot be changed once set without altering every ID in the system. Sometimes that's a good thing (if identifiers are never meant to have long-term value anyway), but normally it's bad (you've just 404d every page in the site that used it).
I would move the logic to the database instead. Even though the database still have to check for the existence of the key, it's already within the database operation at that point, so there is no back-and-forth happening.
Also, if there's an index on this randomly generated key, a simple if exists will be quick enough to determine if the key has been used before or not.
You can do the following:
Generate your string as well as concatenate the following onto it
string newUniqueString = string.Format("{0}{1}", result, DateTime.Now.ToString("yyyyMMddHHmmssfffffff"));
This way, you will never have the same key ever again!
Or use
var StringGuid = Guid.NewGuid();
You've commited a typical sin with Random: one shouldn't create Random instance
each time they want to generate random value (otherwise one'll get badly skewed distribution with many repeats). Put Random away form function:
// Let Generator be thread safe
private static ThreadLocal<Random> s_Generator = new ThreadLocal<Random>(
() => new Random());
public static Random Generator {
get {
return s_Generator.Value;
}
}
// For repetition test. One can remove repetion test if
// number of generated ids << 15000
private HashSet<String> m_UsedIds = new HashSet<String>();
public string GetNewId() {
int codeLength = 5;
string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
while (true) {
StringBuilder builder = new StringBuilder(codeLength);
for (int i = 0; i < codeLength; ++i)
builder.Append(chars[Generator.Next(chars.Length)]); // <- same generator for all calls
result = string.Format("{0}{1}", "T", builder.ToString());
// Test if random string in fact a repetition
if (m_UsedIds.Contains(result))
continue;
m_UsedIds.Add(result);
return result;
}
}
For codeLength = 5 there're Math.Pow(chars.Length, codeLength) possible
strings (Math.Pow(36, 5) == 60466176 = 6e7). According to birthday paradox
you can expect 1st repetition at about 2 * sqrt(possible strings) == 2 * sqrt(6e7) == 15000. If it's OK, you can skip the repetition test, otherwise HashSet<String> could
be a solution.
I am quite new to C# and I was wondering if there is a Class or a data structure or the best way to handle the following requirement...
I need to handle a COUPLE of int that represent a range of data (eg. 1 - 10 or 5-245) and I need a method to verify if an Int value is contained in the range...
I believe that in C# there is a class built in the framework to handle my requirement...
what I need to do is to verify if an INT (eg. 5) is contained in the range of values Eg (1-10) ...
in the case that I should discover that there is not a class to handle it, I was thinking to go with a Struct that contain the 2 numbers and make my own Contain method to test if 5 is contained in the range 1-10)
in the case that I should discover that there is not a class to handle
it, I was thinking to go with a Struct that contain the 2 numbers and
make my own Contain method to test if 5 is contained in the range
1-10)
That's actually a great idea as there's no built-in class for your scenario in the BCL.
You're looking for a range type; the .Net framework does not include one.
You should make an immutable (!) Int32Range struct, as you suggested.
You may want to implement IEnumerable<int> to allow users to easily loop through the numbers in the range.
You need to decide whether each bound should be inclusive or exclusive.
[Start, End) is probably the most obvious choice.
Whatever you choose, you should document it clearly in the XML comments.
Nothing exists that meets your requirements exactly.
Assuming I understood you correctly, the class is pretty simple to write.
class Range
{
public int Low {get; set;}
public int High {get; set;}
public bool InRange(int val) { return val >= Low && val <= High; }
}
A Tuple<int,int> would get you part of the way but you'd have to add an extension method to get the extra behavior. The downside is that the lower- and upper-bounds are implicitly Item1 and Item2 which could be confusing.
// written off-the-cuff, may not compile
public static class TupleExtension
{
public static bool InRange(Tuple<int, int> this, int queryFor)
{
return this.Item1 >= queryFor && this.Item2 <= queryFor;
}
}
You could create an extension if you want to avoid making a new type:
public static class Extensions
{
public static bool IsInRange(this int value, int min, int max)
{
return value >= min && value <= max;
}
}
Then you could do something like:
if(!value.IsInRange(5, 545))
throw new Exception("Value is out of range.");
i think you can do that with an array.
some nice examples and explanation can be found here:
http://www.dotnetperls.com/int-array
Nothing built in AFAIK, but (depending on the size of the range) an Enumerable.Range would work (but be less than optimal, as you're really storing every value in the range, not just the endpoints). It does allow you to use the LINQ methods (including Enumerable.Contains), though - which may come in handy.
const int START = 5;
const int END = 245;
var r = Enumerable.Range(START, (END - START)); // 2nd param is # of integers
return r.Contains(100);
Personally, I'd probably go ahead and write the class, since it's fairly simple (and you can always expose an IEnumerable<int> iterator via Enumerable.Range if you want to do LINQ over it)
I'm trying to figure out the best way to represent some data. It basically follows the form Manufacturer.Product.Attribute = Value. Something like:
Acme.*.MinimumPrice = 100
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
So the minimum price across all Acme products is 100 except in the case of product A and B. I want to store this data in C# and have some function where GetValue("Acme.ProductC.MinimumPrice") returns 100 but GetValue("Acme.ProductA.MinimumPrice") return 50.
I'm not sure how to best represent the data. Is there a clean way to code this in C#?
Edit: I may not have been clear. This is configuration data that needs to be stored in a text file then parsed and stored in memory in some way so that it can be retrieved like the examples I gave.
Write the text file exactly like this:
Acme.*.MinimumPrice = 100
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
Parse it into a path/value pair sequence:
foreach (var pair in File.ReadAllLines(configFileName)
.Select(l => l.Split('='))
.Select(a => new { Path = a[0], Value = a[1] }))
{
// do something with each pair.Path and pair.Value
}
Now, there two possible interpretations of what you want to do. The string Acme.*.MinimumPrice could mean that for any lookup where there is no specific override, such as Acme.Toadstool.MinimumPrice, we return 100 - even though there is nothing referring to Toadstool anywhere in the file. Or it could mean that it should only return 100 if there are other specific mentions of Toadstool in the file.
If it's the former, you could store the whole lot in a flat dictionary, and at look up time keep trying different variants of the key until you find something that matches.
If it's the latter, you need to build a data structure of all the names that actually occur in the path structure, to avoid returning values for ones that don't actually exist. This seems more reliable to me.
So going with the latter option, Acme.*.MinimumPrice is really saying "add this MinimumPrice value to any product that doesn't have its own specifically defined value". This means that you can basically process the pairs at parse time to eliminate all the asterisks, expanding it out into the equivalent of a completed version of the config file:
Acme.ProductA.MinimumPrice = 50
Acme.ProductB.MinimumPrice = 60
Acme.ProductC.DefaultColor = Blue
Acme.ProductC.MinimumPrice = 100
The nice thing about this is that you only need a flat dictionary as the final representation and you can just use TryGetValue or [] to look things up. The result may be a lot bigger, but it all depends how big your config file is.
You could store the information more minimally, but I'd go with something simple that works to start with, and give it a very simple API so that you can re-implement it later if it really turns out to be necessary. You may find (depending on the application) that making the look-up process more complicated is worse over all.
I'm not entirely sure what you're asking but it sounds like you're saying either.
I need a function that will return a fixed value, 100, for every product ID except for two cases: ProductA and ProductB
In that case you don't even need a data structure. A simple comparison function will do
int GetValue(string key) {
if ( key == "Acme.ProductA.MinimumPrice" ) { return 50; }
else if (key == "Acme.ProductB.MinimumPrice") { return 60; }
else { return 100; }
}
Or you could have been asking
I need a function that will return a value if already defined or 100 if it's not
In that case I would use a Dictionary<string,int>. For example
class DataBucket {
private Dictionary<string,int> _priceMap = new Dictionary<string,int>();
public DataBucket() {
_priceMap["Acme.ProductA.MinimumPrice"] = 50;
_priceMap["Acme.ProductB.MinimumPrice"] = 60;
}
public int GetValue(string key) {
int price = 0;
if ( !_priceMap.TryGetValue(key, out price)) {
price = 100;
}
return price;
}
}
One of the ways - you can create nested dictionary: Dictionary<string, Dictionary<string, Dictionary<string, object>>>. In your code you should split "Acme.ProductA.MinimumPrice" by dots and get or set a value to the dictionary corresponding to the splitted chunks.
Another way is using Linq2Xml: you can create XDocument with Acme as root node, products as children of the root and and attributes you can actually store as attributes on products or as children nodes. I prefer the second solution, but it would be slower if you have thousands of products.
I would take an OOP approach to this. The way that you explain it is all your Products are represented by objects, which is good. This seems like a good use of polymorphism.
I would have all products have a ProductBase which has a virtual property that defaults
virtual MinimumPrice { get { return 100; } }
And then your specific products, such as ProductA will override functionality:
override MinimumPrice { get { return 50; } }