I'm trying to build a module where customers can create custom formulas and evaluate them against their data in a SQL database.
The customer has access to basic arithmetic operators (+, -, *, /) and aggregate functions (MIN, MAX, AVG, SUM). In addition, he is given a set of columns he can apply those operations to, for the sake of the example: (UnitsProduced, ProductionTime, DownTime, etc.). With these tools he can construct formulas such as: SUM(UnitsProduced)/SUM(ProductionTime).
After checking that the formula is mathematically correct and contains only valid columns, I fetch the data from SQL Server via a stored procedure into a DataTable in C# and then call the DataTable.Compute() method on that formula.
The problem arises when the aggregate in the divisor yields 0. I tried padding the provided formula with an IIF condition so that SUM(UnitsProduced)/SUM(ProductionTime) becomes SUM(UnitsProduced)/IIF(SUM(ProductionTime)<>0,SUM(ProductionTime),NULL), and I would expect to get null as the result, but instead it gives me the error:
Cannot evaluate. Expression 'System.Data.FunctionNode' is not an aggregate
After doing some research I found that I cannot use an aggregate function inside a conditional statement. I haven't tried LINQ yet and I don't know whether it will work.
How can I solve this problem?
After some research, my colleagues and I came to an interesting solution.
Basically, you need to find all the divisors in the given formula and build a filter string that excludes all rows where a divisor is 0. The code below also handles aggregates over multiple columns, because Compute can't do that on its own.
The code that takes care of zero divisions and aggregation:
DataTable dt = db.GetDataFromStoredProcedure("GetCustomFormulasData", param);

// Avoid division by zero: filter out rows where any divisor is 0.
List<string> divisors = GetFormulaDivisors(formula);
string formulaFilter = "";
foreach (string divisor in divisors)
    formulaFilter += $" and {divisor}<>0 ";
if (formulaFilter != "")
    formulaFilter = formulaFilter.Remove(0, 5); // remove the first " and "

// Compute can't handle multiple columns in one aggregate,
// so each aggregate argument becomes a computed column.
List<string> aggregations = GetFormulaAggregations(formula);
for (int i = 0; i < aggregations.Count; i++)
{
    string expr = aggregations[i].Substring(4, aggregations[i].Length - 5); // strip the aggregate name and parentheses
    DataColumn dc = new DataColumn
    {
        DataType = typeof(float),
        ColumnName = $"Column{i}",
        Expression = expr
    };
    dt.Columns.Add(dc);
    // Rewrite the formula (and the filter) to aggregate over the new column instead.
    Regex rgx = new Regex(Regex.Escape(expr));
    string temp = rgx.Replace(aggregations[i], dc.ColumnName);
    rgx = new Regex(Regex.Escape(aggregations[i]));
    formula = rgx.Replace(formula, temp);
    if (formulaFilter != "")
        formulaFilter = rgx.Replace(formulaFilter, temp);
}
return float.Parse(dt.Compute(formula, formulaFilter).ToString());
where GetDataFromStoredProcedure, GetFormulaDivisors, and GetFormulaAggregations are helper methods that do as their names suggest (a possible sketch of the two formula parsers follows the example below).
So for the example formula SUM(GoodUnitsProduced+RejectedUnits)/SUM(ProductionTime), the DataTable dt will get the columns (GoodUnitsProduced, RejectedUnits, ProductionTime) from the stored procedure, the divisors list will contain SUM(ProductionTime), and the aggregations list will contain SUM(GoodUnitsProduced+RejectedUnits) and SUM(ProductionTime).
Right before Compute is called, the DataTable will have two additional columns, one for each aggregate in the formula, and the filter will contain the divisors. So formula = SUM(Column0)/SUM(Column1) and formulaFilter = SUM(Column1)<>0, and the problem of zero division is solved.
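For reference, the two formula parsers can be as simple as a couple of regular expressions over the formula text. This is only a minimal sketch of the idea, not our exact implementation; it assumes the four aggregate names above and non-nested parentheses:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FormulaParser
{
    // Matches SUM(...), MIN(...), MAX(...) or AVG(...) with a non-nested argument.
    static readonly Regex AggregateRegex =
        new Regex(@"(SUM|MIN|MAX|AVG)\([^()]*\)", RegexOptions.IgnoreCase);

    // Every distinct aggregate call that appears in the formula.
    public static List<string> GetFormulaAggregations(string formula) =>
        AggregateRegex.Matches(formula).Cast<Match>()
                      .Select(m => m.Value)
                      .Distinct()
                      .ToList();

    // Every distinct aggregate call that appears directly after a '/' (i.e. a divisor).
    public static List<string> GetFormulaDivisors(string formula) =>
        new Regex(@"/\s*((SUM|MIN|MAX|AVG)\([^()]*\))", RegexOptions.IgnoreCase)
            .Matches(formula).Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Distinct()
            .ToList();
}
For SUM(GoodUnitsProduced+RejectedUnits)/SUM(ProductionTime) this yields exactly the two aggregations and the single divisor described above.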
Related
I have a huge dataset that I want to write to Excel, and I need to apply conditional formatting to rows based on business logic. For the data insertion part I am using a data array to populate the sheet, and it works pretty fast. However, I see severe performance degradation when it comes to formatting the rows: it takes more than double the time just to do the formatting.
As of now I am applying formatting to individual rows and looping through a series of rows. However, I am wondering if I can select multiple rows at a time and apply bulk formatting options to those rows.
Here is what I have right now:
foreach (int row in rowsToBeFormatted)
{
    Excel.Range range = (Excel.Range)xlsWorksheet.Range[xlsWorksheet.Cells[row + introFormat, 1], xlsWorksheet.Cells[row + introFormat, 27]];
    range.Font.Size = 11;
    range.Interior.ColorIndex = 15;
    range.Font.Bold = true;
}
And here is a demo of how I am trying to select multiple rows to the range and apply the formatting:
string excelrange = "A3:AA3,A83:AA83,A88:AA88,A94:AA94,A102:AA102,A106:AA106,A110:AA110,...." (string with more than 3000 characters)
xlsWorksheet.get_Range(excelrange).Interior.Color = Color.SteelBlue;
However, I get the following error when I execute the code:
Exception from HRESULT: 0x800A03EC
and there is nothing in the inner exception. Any ideas how I can achieve the desired result?
As per the comments under the question, there's a hard-coded limit of 255 characters for a range string; however, I wasn't able to find any documentation about it. Another commenter suggested using a semicolon as the separator, but the documentation clearly states that a comma should be used as the union operator in a range string:
The name of the range in A1-style notation in the language of the application. It can include the range operator (a colon), the intersection operator (a space), or the union operator (a comma). It can also include dollar signs, but they are ignored. You can use a local defined name in any part of the range. If you use a name, the name is assumed to be in the language of the application.
So where do we go from here? Formatting each range individually is indeed inefficient. The Application interface provides a Union method, but calling it in a loop is as inefficient as individual formatting. So the natural choice is to use the range string up to its limit and thus minimize the number of calls to the COM interface.
You can split the full range to format into chunks, each not exceeding the 255-character limit. I would implement it with an iterator method:
static IEnumerable<string> GetChunks(IEnumerable<string> ranges)
{
    const int MaxChunkLength = 255;
    var sb = new StringBuilder(MaxChunkLength);
    foreach (var range in ranges)
    {
        if (sb.Length > 0)
        {
            if (sb.Length + range.Length + 1 > MaxChunkLength)
            {
                yield return sb.ToString();
                sb.Clear();
            }
            else
            {
                sb.Append(",");
            }
        }
        sb.Append(range);
    }
    if (sb.Length > 0)
    {
        yield return sb.ToString();
    }
}
var rowsToFormat = new[] { 3, 83, 88, 94, 102, 106, 110/*, ...*/ };
var rowRanges = rowsToFormat.Select(row => "A" + row + ":" + "AA" + row);
foreach (var chunk in GetChunks(rowRanges))
{
    var range = xlsWorksheet.Range[chunk];
    // do formatting stuff here
}
The above is 10-15 times faster than individual formatting:
foreach (var rangeStr in rowRanges)
{
    var range = xlsWorksheet.Range[rangeStr];
    // do formatting stuff here
}
I can also see further room for optimization, such as grouping contiguous rows into a single range, but if you are formatting scattered rows (e.g. rows with subtotals), it won't help much. A rough sketch of that grouping idea follows.
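For completeness, here is a minimal sketch of that grouping idea (my own illustration, untested against the worksheet above): runs of consecutive row numbers are collapsed into a single A{start}:AA{end} range before being fed into GetChunks.
static IEnumerable<string> GroupContiguousRows(IEnumerable<int> rows)
{
    int? start = null, prev = null;
    foreach (var row in rows.OrderBy(r => r))
    {
        if (start == null)
        {
            start = prev = row; // first run begins
        }
        else if (row == prev + 1)
        {
            prev = row; // extend the current run
        }
        else
        {
            yield return $"A{start}:AA{prev}"; // close the finished run
            start = prev = row;
        }
    }
    if (start != null)
        yield return $"A{start}:AA{prev}"; // close the last run
}
The merged ranges then feed straight into the chunker, e.g. GetChunks(GroupContiguousRows(rowsToFormat)).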
I'm trying to compute a column in my DataGrid; I will show this in my code below. I keep getting these errors.
I've gone through these links:
1. How To Convert The DataTable Column type?
2. Error while taking SUM() of column in datatable
3. Invalid usage of aggregate function Sum() and Type: String
//These are the columns I add to my DataGrid
MITTRA.Columns.Add("Min_Tol");
MITTRA.Columns.Add("Max_Tol");
MITTRA.Columns.Add("Min_Weight");
MITTRA.Columns.Add("Max_Weight");
// The following code works fine
// (note the MTTRQT column holds data queried from the database)
for (int i = 0; i <= MITTRA.Rows.Count - 1; i++)
{
    string item = MITTRA.Rows[i]["MTITNO"].ToString();
    Tolerancechecking = database_select4.LoadUser_Tolerance(item);
    MITTRA.Rows[i]["Min_Tol"] = Tolerancechecking.Rows[0]["Min_Tol"].ToString();
    MITTRA.Rows[i]["Max_Tol"] = Tolerancechecking.Rows[0]["Max_Tol"].ToString();
    MITTRA.Rows[i]["Min_Weight"] = Convert.ToDecimal(MITTRA.Rows[i]["MTTRQT"]) - ((Convert.ToDecimal(MITTRA.Rows[i]["MTTRQT"]) * Convert.ToDecimal(MITTRA.Rows[i]["Min_Tol"]) / 10));
    MITTRA.Rows[i]["Max_Weight"] = Convert.ToDecimal(MITTRA.Rows[i]["MTTRQT"]) + ((Convert.ToDecimal(MITTRA.Rows[i]["MTTRQT"]) * Convert.ToDecimal(MITTRA.Rows[i]["Max_Tol"]) / 10));
    dataGrid2.Columns.Clear();
    dataGrid2.ItemsSource = null;
    dataGrid2.ItemsSource = Tolerancechecking.DefaultView;
}
//Working SUM computation
Decimal Sum = Convert.ToDecimal(MITTRA.Compute("SUM(MTTRQT)", string.Empty));
MaxTol.Text = Sum.ToString(); /*** This works: I get my value in the text box ***/
Errors when trying the SUM computation on a different column
Initial try
1. Decimal Sum = Convert.ToDecimal(MITTRA.Compute("SUM(Min_Weight)", string.Empty));
Error occurred:
Invalid usage of aggregate function Sum() and Type: String.
Second attempt
2. Decimal Sum = Convert.ToDecimal(MITTRA.Compute("SUM(Convert(Min_Weight,'System.Decimal'))", string.Empty));
3. Decimal Sum = Convert.ToDecimal(MITTRA.Compute("SUM(Convert(Min_Weight,'System.Decimal'))", ""));
Error occurred:
Syntax error in aggregate argument: Expecting a single column argument with possible 'Child' qualifier.
How can I get the Compute SUM function to work?
Thank you @NoChance for the solution. I hope this post can help others too.
/***** This is the part where I declare my DataColumn data type *****/
DataColumn Min_Weight = new DataColumn("Min_Weight");
Min_Weight.DataType = System.Type.GetType("System.Decimal");
MITTRA.Columns.Add(Min_Weight);
/**** This works, finally *****/
Decimal Sum = Convert.ToDecimal(MITTRA.Compute("SUM(Min_Weight)", string.Empty));
MaxTol.Text = Sum.ToString();
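If you would rather keep the column as a string, a LINQ sum that converts each cell also works. A minimal sketch (assumes every Min_Weight cell holds a parseable number, and the System.Data.DataSetExtensions reference for AsEnumerable):
// Parse each cell explicitly instead of letting Compute infer the type.
decimal sum = MITTRA.AsEnumerable()
                    .Sum(row => Convert.ToDecimal(row["Min_Weight"]));
MaxTol.Text = sum.ToString();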
I am trying to use DataTable.Select but I am getting incorrect data.
double marks = 5;
DataRow[] result = dsGrades.Tables[0].Select("Convert(MarksFrom, 'System.Decimal') >=" + marks + " And " + marks + "<= Convert(MarksTo, 'System.Decimal') ");
dsGrades contains rows of mark ranges (MarksFrom, MarksTo).
When marks contains 5.0, I am expecting the row where MarksFrom = 5.0 and MarksTo = 5.9, as 5.0 falls in this range, but the query returns 5 rows.
What's wrong with DataTable.Select? Any help is appreciated.
It would make sense to change your DataColumn types to double; however, even with decimal you don't need a conversion inside the expression.
Note that in your provided example the constraint appears to be backwards: you're specifying that you want MarksFrom greater than or equal to the passed-in amount, which won't return the single row in the range you want.
This should return a single row for any mark passed in:
double marks = 5.0;
DataRow[] result = dsGrades.Tables[0].Select($"{marks} >= MarksFrom AND {marks} <= MarksTo");
Also since you're always only expecting a single match, you could change this to:
DataRow match = table.Select($"{marks} >= MarksFrom AND {marks} <= MarksTo").SingleOrDefault();
SingleOrDefault will throw an InvalidOperationException if more than one result is returned, which may be the desired outcome in this case.
You can do it like this:
double marks = 5.0;
decimal newMarks = Convert.ToDecimal(marks);
var result =
    dsGrades.Tables[0]
        .AsEnumerable()
        .Where(dr => dr.Field<decimal>("MarksFrom") >= newMarks
                  && dr.Field<decimal>("MarksTo") < newMarks + 1);
This could be the solution:
var result = dsGrades.Tables[0].Select("Convert(MarksFrom, 'System.Decimal') >= " + newMarks + " And Convert(MarksTo, 'System.Decimal') < " + (newMarks + 1));
From my comment on the question, explaining the problem:
Getting all the rows where MarksFrom is greater than or equal to 5 returns the first 5 visible rows of the table; the second condition is then checked for each of those rows, and since 5.0 is less than or equal to MarksTo in every one of them, both conditions evaluate to true. That is why the query grabs 5 rows.
You need to do the casting inside the DataTable filter, as below:
DataRow[] drGrdPercntl = dt_GrdPercntil.Select($"{SubjVal} >= Convert(MarksFrom, 'System.Decimal') AND {SubjVal} <= Convert(MarksTo, 'System.Decimal')");
I'm writing a C# application that runs a number of regular expressions (~10) on a lot (~25 million) of strings. I did try to Google this, but any searches for regex with "slows down" are full of tutorials about how backreferencing etc. slows down regexes. I am assuming that this is not my problem, because my regexes start out fast and then slow down.
For the first million or so strings it takes about 60 ms per 1000 strings to run the regular expressions. By the end, it has slowed down to the point where it's taking about 600 ms. Does anyone know why?
It was worse, but I improved it by using instances of Regex instead of the static (cached) methods and by compiling the expressions that I could.
Some of my regexes need to vary; e.g. depending on the user's name it might be
mike said (\w*) or john said (\w*)
My understanding is that it is not possible to compile those regexes and pass in parameters (e.g. saidRegex.Match(inputString, userName)); the kind of reuse I mean is sketched below.
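To illustrate, here is a rough sketch of caching one compiled Regex per user name (saidRegexCache and GetSaidRegex are hypothetical names, not code I actually have):
// One compiled pattern per user, built once and reused,
// instead of recompiling "<name> said (\w*)" for every input string.
static readonly Dictionary<string, Regex> saidRegexCache =
    new Dictionary<string, Regex>();

static Regex GetSaidRegex(string userName)
{
    Regex regex;
    if (!saidRegexCache.TryGetValue(userName, out regex))
    {
        regex = new Regex(Regex.Escape(userName) + @" said (\w*)",
                          RegexOptions.Compiled);
        saidRegexCache[userName] = regex;
    }
    return regex;
}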
Does anyone have any suggestions?
[Edited to accurately reflect speed - was per 1000 strings, not per string]
This may not be a direct answer to your question about RegEx performance degradation - which is somewhat fascinating. However - after reading all of the commentary and discussion above - I'd suggest the following:
Parse the data once, splitting out the matched data into a database table. It looks like you're trying to capture the following fields:
Player_Name | Monetary_Value
If you were to create a database table containing these values per row, and then catch each new row as it is created, parse it, and append it to the table, you could easily do any kind of analysis or calculation against the data without having to parse 25M rows again and again (which is a waste).
Additionally, on the first run, if you were to break the 25M records down into 100,000-record blocks and then run the algorithm 250 times (100,000 x 250 = 25,000,000), you could enjoy all the performance you're describing with no slow-down, because you're chunking up the job.
In other words - consider the following:
Create a database table as follows:
CREATE TABLE PlayerActions (
RowID INT PRIMARY KEY IDENTITY,
Player_Name VARCHAR(50) NOT NULL,
Monetary_Value MONEY NOT NULL
)
Create an algorithm that breaks your 25M rows down into 100k chunks. The example below assumes LINQ / EF5.
public void ParseFullDataSet(IEnumerable<String> dataSource) {
    var rowCount = dataSource.Count();
    // Integer division floors; add one extra set for any remainder.
    var setCount = rowCount / 100000;
    if (rowCount % 100000 != 0)
        setCount++;
    for (int i = 0; i < setCount; i++) {
        var set = dataSource.Skip(i * 100000).Take(100000);
        ParseSet(set);
    }
}
public void ParseSet(IEnumerable<String> dataSource) {
    String playerName = String.Empty;
    decimal monetaryValue = 0.0m;
    // Assume here that the method reflects your RegEx generator.
    String regex = RegexFactory.Generate();
    foreach (String data in dataSource) {
        Match match = Regex.Match(data, regex);
        if (match.Success) {
            playerName = match.Groups[1].Value;
            // Might want to add error handling here.
            monetaryValue = Convert.ToDecimal(match.Groups[2].Value);
            db.PlayerActions.Add(new PlayerAction() {
                // ID = ..., // Set at DB layer using Auto_Increment
                Player_Name = playerName,
                Monetary_Value = monetaryValue
            });
            db.SaveChanges();
            // If not using Entity Framework, use another method to insert
            // a row to your database table.
        }
    }
}
Run the above one time to get all of your pre-existing data loaded up.
Create a hook someplace which allows you to detect the addition of a new row. Every time a new row is created, call:
ParseSet(new List<String>() { newValue });
or if multiples are created at once, call:
ParseSet(newValues); // Where newValues is an IEnumerable<String>
Now you can do whatever computational analysis or data mining you want on the data, without having to worry about performance over 25M rows on-the-fly.
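For instance, with the EF model above, a per-player total becomes a simple query over the stored rows (an illustrative sketch; db is the same context used in ParseSet):
// Aggregate in the database instead of re-running the regexes over 25M strings.
var totals = db.PlayerActions
               .GroupBy(a => a.Player_Name)
               .Select(g => new { Player = g.Key, Total = g.Sum(a => a.Monetary_Value) })
               .OrderByDescending(x => x.Total)
               .ToList();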
Regex does take time to compute. However, you can make the patterns more compact using some tricks.
You can also use the string functions in C# to avoid the regex engine entirely.
The code will be lengthier but might improve performance.
String has several methods for cutting and extracting characters and doing simple pattern matching as you need,
e.g. IndexOfAny, LastIndexOf, Contains...
string str= "mon";
string[] str2= new string[] {"mon","tue","wed"};
if(str2.IndexOfAny(str) >= 0)
{
//success code//
}
I have a scenario at work where we have several different tables of data in a format similar to the following:
Table Name: HingeArms
Hght Part #1 Part #2
33 S-HG-088-00 S-HG-089-00
41 S-HG-084-00 S-HG-085-00
49 S-HG-033-00 S-HG-036-00
57 S-HG-034-00 S-HG-037-00
Where the first column (and possibly more) contains numeric data sorted ascending and represents a range to determine the proper record of data to get (e.g. height <= 33 then Part 1 = S-HG-088-00, height <= 41 then Part 1 = S-HG-084-00, etc.)
I need to look up and select the nearest match given a specified value. For example, given a height of 34.25, I need to get the second record in the set above:
41 S-HG-084-00 S-HG-085-00
These tables are currently stored in a VB.NET Hashtable "cache" of data loaded from a CSV file, where the key for the Hashtable is a composite of the table name and one or more columns from the table that represent the "key" for the record. For example, for the above table, the Hashtable Add for the first record would be:
ht.Add("HingeArms,33","S-HG-088-00,S-HG-089-00")
This seems less than optimal, and I have some flexibility to change the structure if necessary (the cache contains data from other tables where direct lookup is possible; these "range" tables just got dumped in because it was "easy"). I was looking for a "Next" method on a Hashtable/Dictionary to give me the closest matching record in the range, but that's obviously not available in the stock classes in VB.NET.
Any ideas on a way to do what I'm looking for with a Hashtable or in a different structure? It needs to be performant as the lookup will get called often in different sections of code. Any thoughts would be greatly appreciated. Thanks.
A hashtable is not a good data structure for this, because items are scattered around the internal array according to their hash code, not their values.
Use a sorted array or List<T> and perform a binary search, e.g.
Setup:
// HingeArm is assumed to be a simple class with a double Height
// property and string Part1/Part2 properties.
var values = new List<HingeArm>
{
    new HingeArm(33, "S-HG-088-00", "S-HG-089-00"),
    new HingeArm(41, "S-HG-084-00", "S-HG-085-00"),
    new HingeArm(49, "S-HG-033-00", "S-HG-036-00"),
    new HingeArm(57, "S-HG-034-00", "S-HG-037-00"),
};
values.Sort((x, y) => x.Height.CompareTo(y.Height));
var keys = values.Select(x => x.Height).ToList();
Lookup:
var index = keys.BinarySearch(34.25);
if (index < 0)
{
    // No exact match: ~index is the index of the first larger key.
    index = ~index;
}
var result = values[index];
// result == { Height = 41, Part1 = "S-HG-084-00", Part2 = "S-HG-085-00" }
You can use a sorted .NET array in combination with Array.BinarySearch().
If you get a non-negative value, it is the index of the exact match.
Otherwise, if the result is negative, use the formula
int index = ~Array.BinarySearch(sortedArray, value) - 1
to get the index of the previous "nearest" match.
The meaning of "nearest" is defined by the comparer you use; it must be the same comparer used when sorting the array. See:
http://gmamaladze.wordpress.com/2011/07/22/back-to-the-roots-net-binary-search-and-the-meaning-of-the-negative-number-of-the-array-binarysearch-return-value/
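A minimal sketch of those mechanics (the heights array is just for illustration):
double[] heights = { 33, 41, 49, 57 }; // must be sorted

int exact = Array.BinarySearch(heights, 41.0);  // 1: index of the exact match
int miss = Array.BinarySearch(heights, 34.25);  // negative: no exact match

// ~miss is the index of the first element larger than the value,
// so ~miss - 1 is the previous "nearest" match.
int nextLarger = ~miss;       // 1 -> heights[1] == 41
int prevNearest = ~miss - 1;  // 0 -> heights[0] == 33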
How about LINQ to Objects? (This is by no means meant to be a performant solution, by the way.)
var ht = new Dictionary<string, string>();
ht.Add("HingeArms,33", "S-HG-088-00,S-HG-089-00");
ht.Add("HingeArms,41", "S-HG-084-00,S-HG-085-00");
decimal wantedHeight = 34.25m;
var foundIt =
    ht.Select(x => new { Height = decimal.Parse(x.Key.Split(',')[1]), x.Key, x.Value })
      .Where(x => x.Height >= wantedHeight) // nearest record at or above the wanted height
      .OrderBy(x => x.Height)
      .FirstOrDefault();
if (foundIt != null)
{
    // Do something with your item in foundIt
}