User defined formulas, evaluation, and speed

Let's say we have an array of a million elements, and we want to accept user input on what kind of math to apply to each of those elements.
Would the program need to evaluate the user formula (a string) a million times, once for each element? Or can the formula somehow be parsed once, saved, and then applied in a loop to the million elements?
I'm just trying to get a general idea of how this works. Right now it looks like Excel-type programs are always interpreted, which is way too slow for my data research. So what I'm doing is recompiling my user-defined math each and every time I change it. And there seems to be no way of compiling a program and recompiling it while it is itself running, so that it could accept user input, compile it, and run it without quitting. Maybe with DLLs and separate app domains, but then all data has to be marshalled.
I just don't get how one gets top speed for data research when everything is going against you, including the stupid operating system that can't allow the DLL to be unloaded and reloaded because of security concerns, or something. However, maybe I'm just spouting nonsense, since I just began programming a few months ago.

If you are interested in implementing the solution yourself, you could parse the formula into a (binary) tree, where each inner node represents an operator (+, -, *, ...) or a function (cos(), max(), ...) and each leaf represents either a constant or one of the formula's variables.
This can be done in a pretty efficient way and then applied to the entire data set by recursively calculating the value of the root while filling in the correct variables at the leaves.
To figure out how best to do this, this question might provide some pointers (though I am fairly certain the answer provided there is only one of several ways):
Create Binary Tree from Mathematical Expression
Hope this helps or at least is of some interest to other readers!
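The tree idea above can be sketched as follows. This is only a minimal illustration, assuming the formula has already been parsed and has a single variable x; the type names (Node, Constant, Variable, BinaryOp, TreeDemo) are hypothetical, and the tree for "x * 2 + 1" is built by hand where a real parser would produce it from the user's string.

```csharp
using System;

// Hypothetical node types for illustration only.
abstract class Node
{
    public abstract double Evaluate(double x);
}

// Leaf holding a constant value.
sealed class Constant : Node
{
    private readonly double _value;
    public Constant(double value) { _value = value; }
    public override double Evaluate(double x) => _value;
}

// Leaf standing for the formula's single variable.
sealed class Variable : Node
{
    public override double Evaluate(double x) => x;
}

// Inner node: an operator with two child subtrees.
sealed class BinaryOp : Node
{
    private readonly char _op;
    private readonly Node _left, _right;

    public BinaryOp(char op, Node left, Node right)
    {
        _op = op; _left = left; _right = right;
    }

    public override double Evaluate(double x)
    {
        double l = _left.Evaluate(x), r = _right.Evaluate(x);
        switch (_op)
        {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new InvalidOperationException("Unknown operator: " + _op);
        }
    }
}

static class TreeDemo
{
    // The tree is built once; only Evaluate runs per element.
    public static double[] Apply(Node formula, double[] data)
    {
        var result = new double[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = formula.Evaluate(data[i]);
        return result;
    }

    // Hand-built tree for "x * 2 + 1", standing in for parser output.
    public static Node Sample() =>
        new BinaryOp('+',
            new BinaryOp('*', new Variable(), new Constant(2)),
            new Constant(1));
}
```

Calling TreeDemo.Apply(TreeDemo.Sample(), data) parses nothing per element; the cost per element is a handful of virtual calls, which is usually far cheaper than re-reading the string each time.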

If you want user input against a data set and expressions compiled on the fly, then check out a suitable C# REPL.
You should be able to simply type in your transformations etc., and have the REPL compile and apply them. It's a useful tool/technique for trying out solutions prior to coding them up properly in classes/components.

Related

User Friendly way to use formula for calculation

I have an object model that I need the user to be able to create a formula based on, and also use some built-in functions. For example:
AddWorkDays(MyObject.StartDate, 3)
MyObject has various properties that the user may access. We may also need to do some If/Then statements. The users are very familiar with Excel formulas, because that is how they currently do their work.
I see two possible options:
1. Create my own parser using one of the many available C# libraries.
2. Adapt an Excel-based parser to be context-aware of my objects.
The issue with option 1 is I don't want to re-invent the wheel. I would expect someone has already built a parser that can handle basic functions and math operations and is context aware based on class(es) passed in. I can't seem to find something like this.
Option 2 would allow the user to re-use their existing Excel knowledge to build formulas like:
=if(MyObject.Type = "A", AddWorkDays(MyObject.StartDate, 3), AddWorkDays(MyObject.StartDate, 5))
I see XLParser is advertised as being great for parsing Excel formulas, but it seems I would need to add the context-aware part for reading and validating properties on MyObject.
Any experience, examples, warnings, etc. on how to proceed are welcome.

I solved this issue by using the FLEE library: https://github.com/mparlak/Flee
It supports a "Context" in which you define the variables that should be available to the formula. Out of the box it supports regular syntax for things like addition, subtraction, and even a basic IF statement. It is easily extensible with your own functions. I made it Excel-like by defining some of the common functions: AND, OR, MIN, MAX. This took fewer than 100 lines of code.
I use it to process relatively large data sets (200,000 items with 5 formulas for each item) and it processes in under 10 seconds if used correctly.

A .Net math expression COMPILER for Windows phone?

I'm making a function graphing app for Windows Phone where the user inputs a function for the app to draw. I need a fast (here I mean the fastest possible) expression evaluator. I've seen a lot of math parsers out there, but it seems none of them allow for compiling and evaluating separately. I need this because I have to calculate a lot of data points (1000+) at 30 or, even better, 60 fps. All of those I found take a string and parse and evaluate it at the same time. As I'm making this for Windows Phone, I cannot compile C# code directly because of platform restrictions.
It should be able to do something like: 2^2*sin(x/20)+abs(x)/log(x, 2)
SOLVED:
I'm really angry with myself because I couldn't google this out, and finally, when I ask a question here, I find the answer myself.
This did the trick:
http://nicoschertler.wordpress.com/2011/09/22/math-parser-using-lambda-expressions/
It's so good that a 1.5 GHz dual-core phone can run it at 1/4-pixel precision at 60 fps!
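For readers wondering what "compile once, evaluate many times" can look like, here is a minimal sketch using System.Linq.Expressions. It is not the linked article's code, and it assumes a runtime where Expression.Compile is available. The class name CompiledFormula is hypothetical, and the tree for a simplified piece of the example formula, 2 * sin(x / 20) + abs(x), is built by hand where a parser would build it from the user's string.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

static class CompiledFormula
{
    // Looks up a one-argument double overload on System.Math, e.g. Math.Sin(double).
    private static MethodInfo MathMethod(string name) =>
        typeof(Math).GetMethod(name, new[] { typeof(double) })
        ?? throw new MissingMethodException("Math." + name);

    // Builds the expression tree for 2 * sin(x / 20) + abs(x) and compiles it to a delegate.
    // This is the expensive step and should run once per formula, not once per point.
    public static Func<double, double> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(double), "x");

        Expression body = Expression.Add(
            Expression.Multiply(
                Expression.Constant(2.0),
                Expression.Call(MathMethod("Sin"),
                    Expression.Divide(x, Expression.Constant(20.0)))),
            Expression.Call(MathMethod("Abs"), x));

        return Expression.Lambda<Func<double, double>>(body, x).Compile();
    }

    // Evaluates the compiled delegate at x = 0, 1, ..., count - 1.
    public static double[] Sample(Func<double, double> f, int count)
    {
        var ys = new double[count];
        for (int i = 0; i < count; i++)
            ys[i] = f(i);
        return ys;
    }
}
```

The delegate returned by Build() can be kept for as long as the formula is unchanged and called for every frame; only a change to the formula text requires building and compiling again.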

Strange, because there are quite a few parser generators, which in turn give you a parser that is ready to use. It is up to you whether you define evaluation while parsing or just build a math expression for later evaluation.
One disadvantage of a general parser generator is, well, the fact that it is general, so it does more than you ask. On the other hand, it is easy to add something when the need strikes (I wouldn't dare touch anything in the code you linked).
My own framework (i.e. parser generator) contains a very simple calculator (so it does math expression parsing) in two forms: written as C# code (slow) and generated to an action table (fast, though it could be faster of course).
http://sourceforge.net/projects/naivelangtools/

How to allow users to define financial formulas in a C# app

I need to allow my users to be able to define formulas which will calculate values based on data. For example
//Example 1
return GetMonetaryAmountFromDatabase("Amount due") * 1.2;
//Example 2
return GetMonetaryAmountFromDatabase("Amount due") * GetFactorFromDatabase("Discount");
I will need to allow / * + - operations, also to assign local variables and execute IF statements, like so
var amountDue = GetMonetaryAmountFromDatabase("Amount due");
if (amountDue > 100000) return amountDue * 0.75;
if (amountDue > 50000) return amountDue * 0.9;
return amountDue;
The scenario is complicated because I have the following structure:
Customer (a few hundred)
Configuration (about 10 per customer)
Item (about 10,000 per customer configuration)
So I will perform a three-level loop. At each "Configuration" level I will start a DB transaction and compile the formulas; each "Item" will use the same transaction and compiled formulas (there are about 20 formulas per configuration, and each item will use all of them).
This further complicates things because I can't just use the compiler services as it would result in continued memory usage growth. I can't use a new AppDomain per each "Configuration" loop level because some of the references I need to pass cannot be marshalled.
Any suggestions?
--Update--
This is what I went with, thanks!
http://www.codeproject.com/Articles/53611/Embedding-IronPython-in-a-C-Application

IronPython allows you to embed a scripting engine into your application. There are many other solutions. In fact, you can google something like "C# embedded scripting" and find a whole bunch of options. Some are easier than others to integrate, and some are easier than others to write scripts for.
Of course, there is always VBA. But that's just downright ugly.

You could create a simple class at runtime, just by writing your logic into a string or the like, compile it, run it and make it return the calculations you need. This article shows you how to access the compiler from runtime: http://www.codeproject.com/KB/cs/codecompilation.aspx

I faced a similar problem a few years ago. I had a web app with moderate traffic that needed to allow equations, and it needed similar features to yours, and it had to be fast. I went through several ideas.
The first solution involved adding calculated columns to our database. Our tables for the app store the properties in columns (e.g., there's a column for Amount Due, another for Discount, etc.). If the user typed in a formula like PropertyA * 2, the code would alter the underlying table to have a new calculated column. Adding and removing columns this way is messy. It does have a few advantages though: the database (SQL Server) was really fast at doing the calculations; the database handled a lot of error detection for us; and I could pretend that the calculated values were the same as the non-calculated values, which meant that I didn't have to modify any existing code that worked with the non-calculated values.
That worked for a while until we needed the ability for a formula to reference another formula, and SQL Server doesn't allow that. So I switched to a scripting engine. IronPython wasn't very mature back then, so I chose another engine... I can't remember which one right now. Anyway, it was easy to write, but it was a little slow. Not a lot, maybe a few milliseconds per query, but for a web app the time really added up over all the requests.
That was when I decided to write my own parser for the formulas. That is, I have a PlusToken class to add two values, an ItemToken class that corresponds to GetValue("Discount"), etc. When the user enters a new formula, a validator parses the formula, makes sure it's valid (things like, did they reference a column that doesn't exist?), and stores it in a semi-compiled form that's easy to parse later. When the user requests a calculated value, a parser reads the formula, parses it, figures out what data is needed from the database, and computes the final answer. It took a fair amount of work up front, but it works well and it's really fast. Here's what I learned:
If the user enters a formula that leads to a cycle in the formulas, and you try to compute the value of the formula, you'll run out of stack space. If you're running this on a web app, the entire web server will stop working until you reset it. So it's important to detect cycles at the validation stage.
If you have more than a couple formulas, aggregate all the database calls in one place, then request all the data at once. Much faster.
Users will enter wacky stuff into formulas. A parser that provides useful error messages will save a lot of headaches later on.
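The cycle check from the first lesson can be sketched as a depth-first search over "formula references formula" edges. This is a minimal sketch, not the answerer's code: the class name FormulaCycles and the dictionary shape (formula name mapped to the names it references) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

static class FormulaCycles
{
    // dependencies maps each formula name to the names it references.
    // Names that are not keys are treated as plain data columns with no dependencies.
    public static bool HasCycle(IDictionary<string, string[]> dependencies)
    {
        // 1 = currently on the search path, 2 = fully checked and known to be cycle-free.
        var state = new Dictionary<string, int>();
        foreach (var name in dependencies.Keys)
            if (Visit(name, dependencies, state))
                return true;
        return false;
    }

    private static bool Visit(string name, IDictionary<string, string[]> deps,
                              Dictionary<string, int> state)
    {
        if (state.TryGetValue(name, out var s))
            return s == 1; // meeting a name that is still on the path means a cycle

        if (!deps.TryGetValue(name, out var children))
        {
            state[name] = 2; // plain column: nothing to follow
            return false;
        }

        state[name] = 1;
        foreach (var child in children)
            if (Visit(child, deps, state))
                return true;
        state[name] = 2;
        return false;
    }
}
```

Running this when a formula is saved, over the existing formulas plus the new one, lets the validator reject the edit with an error message before any evaluation is attempted. The search itself is recursive, so extremely long dependency chains would need an explicit stack instead.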

If the custom scripts don't get more complex than the ones that you show above, I would agree with Sylvestre: Create your own parser, make a tree and do the logic yourself. You can generate a .Net expression tree or just go through the Syntax tree yourself and make the operations within your own code (Antlr below will help you generate such code).
Then you are in complete control of your references and you are always within C#, so you don't need to worry about memory management (any more than you normally would). IMO Antlr is the best tool for doing this in C#. The site has examples for little languages like the one in your scenario.
But... if this is really just a beginning and in the end you need almost the full power of a proper scripting language, you would need to embed a scripting language into your system. With your numbers, you will have a problem with performance, memory management and probably references, as you noted. There are several approaches, but I cannot really give a recommendation for your scenario: I've never done it at such a scale.

You could build two base classes UnaryOperator (if, square, root...) and BinaryOperator (+ - / *) and build a tree from the expression. Then evaluate the tree for each item.

How do I write user-extendable code? [duplicate]

This question already has answers here:
Adding scripting functionality to .NET applications
(9 answers)
Closed 8 years ago.
As a Perl programmer I can evaluate strings as code if I wish. Can I do the same in C# (with strings or some other object containing user input)?
What I want to accomplish is to create an object whose methods may be predefined in my source code or may be defined at run time by the user, by entering a string that represents C# code for a method or a SQL query. The method call should always return its scalar value as a string. I believe it would be desirable to make available some predefined "system" variables for use in the method call, and some "cleanup" code to validate that a string is actually returned.
Pseudo-structure:
Object Statistic
    string Name;
    functionref Method;
The architecture I have in mind would basically collect these in realtime and add them upon request of the user to the list of statistics that user wants to display. Defined Statistics could be saved to a file and loaded into the main program during initialization. This way the user doesn't have to keep redefining the desired statistic. Edit/Update/Delete of statistics are needed.
If this is successful, then I (the programmer) won't have to go and add new code every time someone decides that they have a new piece of information they want displayed on their stat board that I haven't already written code for.
Any ideas on where to start reading to accomplish this in C#?
The purpose for this capability is a program that displays statistics for a running system/database. Values to watch are not necessarily known at design time, nor how to define the value desired for retrieval. I want to allow the User to define any statistics beyond any I pre-code for the system.

Check out this earlier question.
However, also consider the stability, scalability and security of your solution. If you allow arbitrary execution, your application isn't going to be fun to operate and support. Set up a very well-delimited sandbox for externally provided code.
Also, don't expect ordinary users (whoever they are) to be able coders.
You might want to offer restricted but more manageable extensibility through e.g. dependency injection instead.

I'll echo what Pontus said, especially regarding stability, support costs, and expecting users to write code. I'll add performance to the list as well, since these users are likely to be unaware of the performance implications of their code. For these reasons and others, user-facing eval-like constructs are not considered wise in C#, and probably shouldn't be in Perl either.
Instead, look at the System.AddIn namespace in .Net for providing plugin functionality for your application.

What is the best way to implement precomputed data?

I have a computation that calculates a resulting percentage based on certain input. But these calculations can take quite some time, which can be annoying. Since there are about 12500 possible inputs, I thought it would be a good idea to precompute all the data, and look this up during normal program execution.
My first idea was to just create a simple file which is read at program initialization and populates some arrays. Although this will work, I would like to know if there are some other options? For example that the array is populated during compile time.
BTW, I'm writing my code in C#.

This tutorial here implements a serializer, which you can use to easily convert an object to a binary file and back. Once you have the serializer in hand, you can just create an object that holds all your data and serialize it; when you actually run your program, just deserialize the object and use it.
This has all the benefits of saving an object to the hard drive, with an implementation that is object-agnostic (meaning you don't have to write much code for any object you want to serialize) and outputs in binary (thus saving space, if that is a concern).

A file with data is probably the easiest and most flexible way to implement it.
If you wanted it in memory without having to read it from somewhere, I would write a program to output your data in C#-like CSV format suitable for copying and pasting into an array/collection initializer, and thereby generate the source code for your precomputed data.

Create a program that outputs valid C# code which initializes your lookup tables. Make this part of your build process so that it will automatically create the source file and then build the rest of your project.
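A minimal sketch of such a generator follows. The names TableGenerator and Precomputed are hypothetical, and Compute is a placeholder (a square root) standing in for the real, slow calculation from the question.

```csharp
using System;
using System.Globalization;
using System.Text;

static class TableGenerator
{
    // Placeholder for the real, slow computation; replace with the actual one.
    static double Compute(int input) => Math.Sqrt(input);

    // Returns C# source text that declares a static array holding the precomputed values.
    public static string Generate(int count)
    {
        var sb = new StringBuilder();
        sb.AppendLine("// <auto-generated /> Do not edit by hand.");
        sb.AppendLine("static class Precomputed");
        sb.AppendLine("{");
        sb.AppendLine("    public static readonly double[] Values =");
        sb.AppendLine("    {");
        for (int i = 0; i < count; i++)
        {
            // "R" asks for a round-trippable string; the invariant culture keeps '.' as
            // the decimal separator. NaN or infinity would not be valid C# literals,
            // so this sketch assumes Compute only returns finite values.
            string literal = Compute(i).ToString("R", CultureInfo.InvariantCulture);
            sb.AppendLine("        " + literal + ",");
        }
        sb.AppendLine("    };");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

A small console program run as a build step could write the result with File.WriteAllText to a source file included in the project (the file name is up to you), after which the rest of the code simply reads Precomputed.Values[input].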

As Daniel Lew said, serialize it into a binary file.
If you need speed, go for a Dictionary. A Dictionary is indexed on its key, and should allow rapid lookup even with large amounts of data.

I would always start by considering whether there is any way to avoid precomputing. If there are 12500 possible inputs, how many are required per user request? Will all 12500 be needed at the same time, or will they be spread out in time? If you can get by with calculating a few at a time, I'd do that with lazy initialization. I prefer this solution simply because I'll have fewer issues with it in the long run. What do you do when the persistent format changes, or the data changes? How will you handle it when the file is missing or corrupted? Persisting to a file does not mean less code.
I would serialize such a file to a human-readable format if I had to persist a pre-loaded version. I'd probably use XML serialization since it's simple. But quite often there are issues of invalidation and recalculation. Do the values never change, or only very infrequently?
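The lazy-initialization option mentioned above can be sketched as a small cache that computes each value on first request. LazyTable is a hypothetical name, and the compute delegate stands in for the slow calculation from the question.

```csharp
using System;
using System.Collections.Concurrent;

sealed class LazyTable
{
    private readonly ConcurrentDictionary<int, double> _cache =
        new ConcurrentDictionary<int, double>();
    private readonly Func<int, double> _compute;

    public LazyTable(Func<int, double> compute)
    {
        _compute = compute;
    }

    // Computes the value the first time an input is requested and returns the
    // cached value on later requests.
    public double Get(int input) => _cache.GetOrAdd(input, _compute);

    // Number of inputs computed so far.
    public int CachedCount => _cache.Count;
}
```

One design note: when several threads request the same uncached input at the same moment, GetOrAdd may invoke the delegate more than once, although only one result is stored. For a pure calculation that only costs some duplicated work.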

I agree with mquander and Trent. Use your favorite language or script to generate the whole C# file you need to define your data (no copy-pasting, that's a manual step and error-prone). Add it as a Pre-Build event in Visual Studio. You could even detect that you have an up-to-date file and avoid regeneration for most builds.
There is definitely a way to statically generate almost any data using template metaprogramming in C++, although it can be painful. It's not worth it unless you need many sets of different data in several parts of your program. I am not familiar enough with metaprogramming in C# to evaluate the general effort in your case. You should look into that.
