fast creation of a calculated column in c#

Let us say I am given a formula as a string from another source. Example:
NewCalculatedColumn = (Column1 * Column2)/Column3
I would like to apply this formula to create a calculated column for data stored in memory as a double array (or a DataTable; I have freedom here).
In this particular example, the array/dataset consists of 3 columns and has thousands of rows. One option is to use DataColumn.Expression, if the data is stored in a DataTable, as discussed here. However, this may not be the most efficient way. Any feedback would be very much appreciated. Many thanks!

DataTable is a much heavier data structure than a list of objects (or the more generic IEnumerable<T>), as indicated here.
So, if you are not forced to use a DataTable, a list of objects like the following can be used:
public class ObjectType
{
    public double Column1 { get; set; }
    public double Column2 { get; set; }
    public double Column3 { get; set; }
    // avoid division by zero, adjust zero comparison threshold as needed
    // also adjust returned value on zero
    // using C# 6.0 specific syntax. If not available, use get { return } syntax
    public double NewCalculatedColumn => Math.Abs(Column3) > 0.0001
        ? (Column1 * Column2) / Column3
        : 0.0;
}
Even if you fetch data as DataTable, you can easily convert it to List<ObjectType> as indicated here.
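The conversion mentioned above could look roughly like the following. This is only a sketch: it assumes the table has three columns named Column1, Column2 and Column3, all typed as double, and the DataTableConverter class name is made up for illustration.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public class ObjectType
{
    public double Column1 { get; set; }
    public double Column2 { get; set; }
    public double Column3 { get; set; }
}

// Hypothetical helper, not part of any library
public static class DataTableConverter
{
    // Copies every DataRow into a plain object.
    // Assumes the three columns exist and hold non-null double values.
    public static List<ObjectType> ToObjectList(DataTable table)
    {
        return table.AsEnumerable()
            .Select(row => new ObjectType
            {
                Column1 = row.Field<double>("Column1"),
                Column2 = row.Field<double>("Column2"),
                Column3 = row.Field<double>("Column3")
            })
            .ToList();
    }
}
```

Row.Field<T> throws if a cell holds DBNull or a different type, so real data may need a nullable type or a default value.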
[EDIT]
Based on a comment: if the expression is dynamic, an external library can be used, e.g. NCalc:
public double NewCalculatedColumn
{
    get
    {
        // you can provide a dynamic expression which contains col1, col2 and col3
        //TODO: add exception handling
        var e = new Expression("(col1 * col2)/col3");
        e.Parameters["col1"] = Column1;
        e.Parameters["col2"] = Column2;
        e.Parameters["col3"] = Column3;
        // Evaluate() returns object, so convert the result explicitly
        return Convert.ToDouble(e.Evaluate());
    }
}
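For comparison, the DataColumn.Expression option mentioned in the question can be sketched as follows. The ExpressionColumnSketch class and the sample row are assumptions made for illustration; only the DataTable calls come from System.Data.

```csharp
using System.Data;

// Hypothetical demo class, not part of any library
public static class ExpressionColumnSketch
{
    // Builds a three-column table plus a computed column driven by a string expression
    public static DataTable BuildTable(string expression)
    {
        var table = new DataTable();
        table.Columns.Add("Column1", typeof(double));
        table.Columns.Add("Column2", typeof(double));
        table.Columns.Add("Column3", typeof(double));
        // The third argument is the expression evaluated for every row
        table.Columns.Add("NewCalculatedColumn", typeof(double), expression);

        // Sample data, made up for this sketch
        DataRow row = table.NewRow();
        row["Column1"] = 2.0;
        row["Column2"] = 3.0;
        row["Column3"] = 4.0;
        table.Rows.Add(row);
        return table;
    }
}
```

Calling BuildTable("(Column1 * Column2) / Column3") fills NewCalculatedColumn without any per-row code, at the cost of the heavier DataTable structure discussed above.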

Related

How can I have a datatable detect multiple columns when iterating through rows in C# with conditionals?

I am new to C# and I have to rebuild a JavaScript program into C#.
This program involves reading a CSV file and iterating through it to detect different values and to produce outputs.
Here is an example of my code:
foreach(DataRow row in results.Rows)
{
if (row["Date"].ToString().Substring(row["Date"].ToString().Length - 16) == "2021 12:00:00 AM") //Checks if the year is 2021 or not in the "date" column
{
if (row["State"].ToString() == "CA") //Looks for CA state in the "state" column, however, it appears to not be finding it?
{ //DOES NEEDED CALCULATIONS
Basically, the code detects the "2021" date just fine in the data table, but it can't find the CA state at all when iterating through the rows, so the required calculations are never done.
Here is what the data table looks like:
[image: DataTable]
Help is greatly appreciated. I have been stuck on this for a while due to my lack of knowledge of C#.

Odds are that there is some extra whitespace in row["State"].
Try this:
foreach(DataRow row in results.Rows)
{
    if (row["Date"].ToString().Substring(row["Date"].ToString().Length - 16) == "2021 12:00:00 AM") //Checks if the year is 2021 or not in the "date" column
    {
        if (row["State"].ToString().Contains("CA")) //Contains still matches when the "state" value has surrounding whitespace
        { //DOES NEEDED CALCULATIONS
That being said, all the previous comments are really helpful for your needs. Don't do your own CSV parsing if you don't have to. Don't work on DateTime as string. Make your own DTOs to represent records, instead of using DataTable.
Example:
record Invoice
{
    public int InvoiceNumber { get; set; }
    public DateTime Date { get; set; }
    public double Amount { get; set; }
    public string State { get; set; }
}
public void DoStuff()
{
    var invoices = ReadInvoiceFile("Your/Path/Here.csv");
    foreach (var invoice in invoices)
    {
        if (invoice.Date.Year != 2021) continue;
        if (invoice.State.Contains("CA"))
        {
            //do CA specific stuff here
        }
    }
}
private List<Invoice> ReadInvoiceFile(string path)
{
    //realistically you would use a 3rd party library to do this
    throw new NotImplementedException();
}
I would also add that you shouldn't be using inline literals in your code (such as 2021 or "CA" in my example). Making your behavior depend on an if statement around a hardcoded state and year also violates the Open-Closed principle, and is a good candidate for refactoring into a factory method. But let's take one step at a time.
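To make the "don't work on DateTime as string" advice concrete, here is a minimal sketch of turning one CSV line into a typed record. The column order, the date format "M/d/yyyy h:mm:ss tt" and the InvoiceParser class are assumptions made for illustration, and the naive Split does not handle quoted fields, which is one reason a CSV library is the better choice in practice.

```csharp
using System;
using System.Globalization;

public record Invoice
{
    public int InvoiceNumber { get; set; }
    public DateTime Date { get; set; }
    public double Amount { get; set; }
    public string State { get; set; }
}

// Hypothetical helper, not part of any library
public static class InvoiceParser
{
    // Assumed column order: InvoiceNumber,Date,Amount,State
    public static Invoice ParseLine(string csvLine)
    {
        string[] parts = csvLine.Split(',');
        return new Invoice
        {
            InvoiceNumber = int.Parse(parts[0].Trim(), CultureInfo.InvariantCulture),
            // Parse the date once instead of comparing substrings later
            Date = DateTime.ParseExact(parts[1].Trim(), "M/d/yyyy h:mm:ss tt",
                                       CultureInfo.InvariantCulture),
            Amount = double.Parse(parts[2].Trim(), CultureInfo.InvariantCulture),
            // Trim removes the stray whitespace that breaks an == "CA" comparison
            State = parts[3].Trim()
        };
    }

    // With typed, trimmed fields the check becomes an exact comparison
    public static bool IsCalifornia2021(Invoice invoice) =>
        invoice.Date.Year == 2021 && invoice.State == "CA";
}
```

Because State is trimmed while parsing, an exact comparison works and Contains("CA") is no longer needed.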

best data structure for storing large number of numeric fields

I am working with a class, say Widget, that has a large number of numeric real world attributes (e.g., height, length, weight, cost, etc.). There are different types of widgets (sprockets, cogs, etc.), but each widget shares the exact same attributes (the values will be different by widget, of course, but they all have a height, weight, etc.). I have 1,000s of each type of widget (1,000 cogs, 1,000 sprockets, etc.)
I need to perform a lot of calculations on these attributes (say calculating the weighted average of the attributes for 1000s of different widgets). For the weighted averages, I have different weights for each widget type (ie, I may care more about length for sprockets than for cogs).
Right now, I am storing all the attributes in a Dictionary< string, double> within each widget (the widgets have an enum that specifies their type: cog, sprocket, etc.). I then have some calculator classes that store weights for each attribute as a Dictionary< WidgetType, Dictionary< string, double >>. To calculate the weighted average for each widget, I simply iterate through its attribute dictionary keys like:
double weightedAvg = 0.0;
foreach (string attributeName in widget.Attributes.Keys)
{
    double attributeValue = widget.Attributes[attributeName];
    double attributeWeight = calculator.Weights[widget.Type][attributeName];
    weightedAvg += (attributeValue * attributeWeight);
}
So this works fine and is pretty readable and easy to maintain, but is very slow for 1000s of widgets based on some profiling. My universe of attribute names is known and will not change during the life of the application, so I am wondering what some better options are. The few I can think of:
1) Store attribute values and weights in double []s. I think this is probably the most efficient option, but then I need to make sure the arrays are always stored in the correct order between widgets and calculators. This also decouples the data from the metadata so I will need to store an array (?) somewhere that maps between the attribute names and the index into double [] of attribute values and weights.
2) Store attribute values and weights in immutable structs. I like this option because I don't have to worry about the ordering and the data is "self documenting". But is there an easy way to loop over these attributes in code? I have almost 100 attributes, so I don't want to hardcode all those in the code. I can use reflection, but I worry that this will cause even a larger penalty hit since I am looping over so many widgets and will have to use reflection on each one.
Any other alternatives?

Three possibilities come immediately to mind. The first, which I think you rejected too readily, is to have individual fields in your class. That is, individual double values named height, length, weight, cost, etc. You're right that it would be more code to do the calculations, but you wouldn't have the indirection of dictionary lookup.
Second is to ditch the dictionary in favor of an array. So rather than a Dictionary<string, double>, you'd just have a double[]. Again, I think you rejected this too quickly. You can easily replace the string dictionary keys with an enumeration. So you'd have:
enum WidgetProperty
{
First = 0,
Height = 0,
Length = 1,
Weight = 2,
Cost = 3,
...
Last = 100
}
Given that and an array of double, you can easily go through all of the values for each instance:
for (int i = (int)WidgetProperty.First; i < (int)WidgetProperty.Last; ++i)
{
double attributeValue = widget.Attributes[i];
double attributeWeight = calculator.Weights[widget.Type][i];
weightedAvg += (attributeValue * attributeWeight);
}
Direct array access is going to be significantly faster than accessing a dictionary by string.
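A self-contained sketch of the enum-indexed array idea, reduced to four attributes. The Count member and the WeightedAverageSketch class are assumptions made for illustration; as in the loop above, the method accumulates value times weight.

```csharp
public enum WidgetProperty
{
    Height = 0,
    Length = 1,
    Weight = 2,
    Cost = 3,
    Count = 4 // number of attributes, used as the array length
}

// Hypothetical helper, not part of any library
public static class WeightedAverageSketch
{
    // Both arrays are indexed by WidgetProperty, so the order can never drift apart
    public static double Compute(double[] values, double[] weights)
    {
        double total = 0.0;
        for (int i = 0; i < (int)WidgetProperty.Count; i++)
        {
            total += values[i] * weights[i];
        }
        return total;
    }
}
```

A single attribute stays readable through the enum, for example values[(int)WidgetProperty.Cost], so the name-to-index mapping lives in one place.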
Finally, you can optimize your dictionary access a little bit. Rather than doing a foreach on the keys and then doing a dictionary lookup, do a foreach on the dictionary itself:
foreach (KeyValuePair<string, double> kvp in widget.Attributes)
{
double attributeValue = kvp.Value;
double attributeWeight = calculator.Weights[widget.Type][kvp.Key];
weightedAvg += (attributeValue * attributeWeight);
}

One way to calculate weighted averages without looping or reflection is to compute each attribute's weighted contribution when the attribute is created and accumulate it in a central store. This should happen while you are creating the widget instance. The following sample code needs to be adapted to your needs.
Also, for further processing of the widgets themselves, you can use data parallelism; see my other response in this thread.
public enum WidgetType { Cog, Sprocket }
public static class Calculator
{
    // Weight of each attribute name per widget type; populate before building widgets
    public static Dictionary<WidgetType, Dictionary<string, double>> Weights { get; } =
        new Dictionary<WidgetType, Dictionary<string, double>>();
}
public class WeightStore
{
    static Dictionary<int, double> widgetWeightedAvg = new Dictionary<int, double>();
    // Adds one attribute's weighted contribution to the widget's running total
    public static void AttWeightedAvgAvailable(double attWeightedAvg, int widgetId)
    {
        if (widgetWeightedAvg.ContainsKey(widgetId))
            widgetWeightedAvg[widgetId] += attWeightedAvg;
        else
            widgetWeightedAvg[widgetId] = attWeightedAvg;
    }
}
public class WidgetAttribute
{
    public string Name { get; }
    public double Value { get; }
    public WidgetAttribute(string name, double value, WidgetType type, int widgetId)
    {
        Name = name;
        Value = value;
        double attWeight = Calculator.Weights[type][name];
        WeightStore.AttWeightedAvgAvailable(Value * attWeight, widgetId);
    }
}
public class CogWidget
{
    public int Id { get; set; }
    public WidgetAttribute height { get; set; }
    public WidgetAttribute weight { get; set; }
}
public class Client
{
    public void BuildCogWidgets()
    {
        CogWidget widget = new CogWidget();
        widget.Id = 1;
        widget.height = new WidgetAttribute("height", 12.22, WidgetType.Cog, widget.Id);
    }
}

As is always the case with data normalization, the normalization level you choose determines a good part of the performance. It looks like you would have to go from your current model to another model or a mix.
Better performance for your scenario is possible when you do this processing in the database rather than on the C# side. You then get the benefit of indexes, no data transfer except the wanted result, plus the 100,000s of man-hours already spent on performance optimization.

Use the data parallelism supported by .NET 4 and above.
https://msdn.microsoft.com/en-us/library/dd537608(v=vs.110).aspx
An excerpt from the above link:
When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced.
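Applied to the widget calculation, a parallel loop might look like the sketch below. The array layout (one double[] of attribute values per widget, in the same order as the weights) and the ParallelWeightedSum class are assumptions made for illustration.

```csharp
using System.Threading.Tasks;

// Hypothetical helper, not part of any library
public static class ParallelWeightedSum
{
    // widgets[i] holds the attribute values of widget i, in the same order as weights
    public static double[] ComputeAll(double[][] widgets, double[] weights)
    {
        var results = new double[widgets.Length];
        Parallel.For(0, widgets.Length, i =>
        {
            double total = 0.0;
            for (int j = 0; j < weights.Length; j++)
            {
                total += widgets[i][j] * weights[j];
            }
            // Each iteration writes only its own slot, so no locking is needed
            results[i] = total;
        });
        return results;
    }
}
```

For a few thousand widgets with about 100 attributes each, the per-item work is small, so it is worth measuring whether the parallel version actually beats a plain for loop.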

Database queries in Entity Framework model - variable equals zero

I have some problems with using the database in my model. I suspect that it's not a good idea to run database queries in the model, but I don't know how to do it better.
Code:
Let's assume that I have an application to analyze football scores. I have an EF model that stores info about a footballer:
public class Player
{
[...]
public virtual ICollection<Goal> HisGoals { get; set; }
public float Efficiency
{
get
{
using(var db = new ScoresContext())
{
var allGoalsInSeason = db.Goals.Count();
return HisGoals.Count / allGoalsInSeason;
}
}
}
}
Problem:
So the case is: I want to have a property in my model called "Efficiency" that returns the quotient of two values. One of them is fetched in real time from the database.
For now this code doesn't work: "Efficiency" equals 0. I stepped through it in the debugger, all the data is correct, and it should return a different value.
What's wrong? Why does it always return zero?
My suspicions:
Maybe I'm wrong, I'm not good at C#, but I think the reason that Efficiency is always zero is that I use the database in it and it is somehow asynchronous. When I read this property, it returns zero first and then calls the database.

I think that your problem lies in dividing integer / integer. In order to get a float, you have to cast the first operand to float, like this:
public float Efficiency
{
get
{
using(var db = new ScoresContext())
{
var allGoalsInSeason = db.Goals.Count();
return (float)HisGoals.Count / allGoalsInSeason;
}
}
}
Dividing int/int always results in an int, which in your case is 0 (4/11, as you said in a comment).
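The difference can be reproduced in isolation with the 4/11 example. The EfficiencySketch class and its method names are made up for this illustration.

```csharp
// Hypothetical helper, not part of any library
public static class EfficiencySketch
{
    // int / int is evaluated as integer division first, then converted to float
    public static float Truncated(int playerGoals, int allGoals) =>
        playerGoals / allGoals;

    // Casting one operand makes the whole division floating point
    public static float Exact(int playerGoals, int allGoals) =>
        (float)playerGoals / allGoals;
}
```

Truncated(4, 11) yields 0, while Exact(4, 11) yields roughly 0.36. Both versions still need a guard for allGoals == 0: the integer division throws a DivideByZeroException, and the float division produces infinity or NaN.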
Second, Entity Framework will cache values, so test this before shipping to production.

how to make calculation between two rows in a specific column in a list

I have this code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
namespace CsvDemo
{
class Program
{
static void Main(string[] args)
{
List<DailyValues> values = File.ReadAllLines("C:\\Users\\Josh\\Sample.csv")
.Skip(1)
.Select(v => DailyValues.FromCsv(v))
.ToList();
}
}
class DailyValues
{
DateTime Date;
decimal Open;
decimal High;
decimal Low;
decimal Close;
decimal Volume;
decimal AdjClose;
public static DailyValues FromCsv(string csvLine)
{
string[] values = csvLine.Split(',');
DailyValues dailyValues = new DailyValues();
dailyValues.Date = Convert.ToDateTime(values[0]);
dailyValues.Open = Convert.ToDecimal(values[1]);
dailyValues.High = Convert.ToDecimal(values[2]);
dailyValues.Low = Convert.ToDecimal(values[3]);
dailyValues.Close = Convert.ToDecimal(values[4]);
dailyValues.Volume = Convert.ToDecimal(values[5]);
dailyValues.AdjClose = Convert.ToDecimal(values[6]);
return dailyValues;
}
}
}
The File.ReadAllLines reads all lines from the CSV file into a string array.
The .Skip(1) skips the header line.
The .Select(v => DailyValues.FromCsv(v)) uses Linq to select each line and create a new DailyValues instance using the FromCsv method. This creates a System.Collections.Generic.IEnumerable<CsvDemo.DailyValues> type.
Finally, the .ToList() converts the IEnumerable to a List to match the type you want.
But my question is how I can make a calculation between two cells. For example, I want to add rows 2 and 3 of column 3 and display the result in the last column.

It seems (although I'm relying on a bit of mind-reading here) that you're trying to have a running balance field on each record of your list. You already have code which projects your raw CSV data into a list of objects:
List<DailyValues> values = File.ReadAllLines("C:\\Users\\Josh\\Sample.csv")
    .Skip(1)
    .Select(v => DailyValues.FromCsv(v))
    .ToList();
What you need to do from there is to project it again into another list, doing the calculation as you go. Obviously this calculation cannot do anything with the first row, as it needs a previous value, so in that instance you set it to zero.
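class DailyValuesWithTotal : DailyValues
{
    public decimal TotalHigh;
}
// Assumes the DailyValues fields are made public so they can be copied here
decimal runningTotal = 0m;
var projectedValues = values.Select((v, i) => new DailyValuesWithTotal
{
    Date = v.Date,
    Open = v.Open,
    High = v.High,
    Low = v.Low,
    Close = v.Close,
    Volume = v.Volume,
    AdjClose = v.AdjClose,
    // The first row has no previous value, so its total starts at zero
    TotalHigh = runningTotal = (i == 0) ? 0m : runningTotal + v.High
}).ToList(); // materialize once so the running total is not recomputed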
The example above keeps a running total of the High field in a new field called TotalHigh.

If I understand your question correctly, you want to add the values of the current and next rows and store the sum in the last column of the current row.
If that's correct, you can do that with a for loop after you read all the data from the file. Just add a property in your class that you will fill in that for loop.
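A minimal sketch of that loop, using a trimmed-down class with only the High column. The DailyValue class, the HighPlusNext property and the RowCalculations helper are names made up for this illustration.

```csharp
using System.Collections.Generic;

// Trimmed-down stand-in for the DailyValues class from the question
public class DailyValue
{
    public decimal High { get; set; }
    // Hypothetical extra property that receives the calculated value
    public decimal HighPlusNext { get; set; }
}

// Hypothetical helper, not part of any library
public static class RowCalculations
{
    // Stores High of the current row plus High of the next row in each row
    public static void FillHighPlusNext(IList<DailyValue> rows)
    {
        // Stop one short of the end: the last row has no next row
        for (int i = 0; i < rows.Count - 1; i++)
        {
            rows[i].HighPlusNext = rows[i].High + rows[i + 1].High;
        }
    }
}
```

The last row keeps the default value 0 here; whether that, or some other marker, is appropriate depends on what the column is used for.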

First of all, I suggest using FileHelpers or another CSV parsing library, since it will make your code a lot easier (especially error handling, handling quotes, etc.).
(I'll use this library in my answer, but you don't have to)
Add a new property to your class to store the calculated value:
[IgnoreFirst(1)]
[DelimitedRecord(",")]
class DailyValues
{
[FieldConverter(ConverterKind.Date, "yyyy-MM-dd")]
public DateTime Date;
public decimal Open;
public decimal High;
public decimal Low;
public decimal Close;
public decimal Volume;
public decimal AdjClose;
[FieldNotInFile]
public decimal SomethingCalculated;
}
Then just loop through your elements and use a temporary variable to keep track of the value of the previous element you're interested in so you can do your calculation:
var engine = new FileHelperEngine(typeof(DailyValues));
var res = engine.ReadFile(@"your:\path") as DailyValues[];
decimal? prev = null;
foreach(var value in res)
{
    if(prev.HasValue)
        // or whatever columns you want to use
        value.SomethingCalculated = value.High + prev.Value;
    prev = value.Open;
}

C#: Store number in database with 2 places of decimal. Eliminate unwanted numbers

I have a field NumberValue1 declared like this:
public double NumberValue1 { get; set; }
NumberValue1 has a datatype of Number in the Oracle database.
I read in a value from an Excel file, which is 22.55 (col8Value is of type object).
I then did this:
NumberValue1 = col8Value == null ? 0 : Math.Round(Convert.ToDouble(col8Value), 2);
When I inserted this into the database, the number stored was:
22.550000000000001
Why is it adding the extra ...00001?
I just want it to show 22.55, which is the number I originally loaded.
Thanks.

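Try using:
private int NumberStoreHundredths;
public double NumberValue1
{
    get
    {
        return ((double)NumberStoreHundredths) / 100;
    }
    set
    {
        // Round before casting: a plain (int) cast truncates, so 0.29 * 100 would be stored as 28
        NumberStoreHundredths = (int)Math.Round(value * 100);
    }
}
Kind of old school, but it should work.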

You don't need to change db column type, just use different datatype in app (hint - use decimal). (Answer by Koka Chernov)
This fixed my problem in 2015.
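The decimal approach from the accepted hint can be sketched like this. The NumberConversion class and the ToTwoPlaces name are made up for illustration; the idea is that the property receiving the result is declared as decimal instead of double.

```csharp
using System;

// Hypothetical helper, not part of any library
public static class NumberConversion
{
    // Converts a cell value read from Excel to a decimal with two places.
    // decimal stores 22.55 exactly, so no trailing ...00001 appears.
    public static decimal ToTwoPlaces(object cellValue)
    {
        if (cellValue == null)
        {
            return 0m;
        }
        return Math.Round(Convert.ToDecimal(cellValue), 2);
    }
}
```

Usage would be along the lines of NumberValue1 = NumberConversion.ToTwoPlaces(col8Value), with NumberValue1 declared as public decimal NumberValue1 { get; set; }. Note that Math.Round on decimal uses banker's rounding by default; pass MidpointRounding.AwayFromZero if the other behavior is wanted.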
