C# Technology Options for Storing/Managing Calculations Remotely

I'm trying to figure out what options are available for a program running locally to get its calculations, or the results of those calculations, from a remote source.
Problem:
I have a data acquisition application that reads a lot of instruments and collects data while testing equipment. This data then goes through various forms of aggregation (min, max, average, etc.), then several calculations are applied to it, and the result is saved to a database. This process happens several times during a test. This application runs on a machine dedicated to performing this test, but users outside of the test also need to perform the same calculations for experimentation, data analysis, and so on.
In the past, our two applications (the one on the test equipment and the one used by the users) had to be updated every time a calculation changed and then deployed everywhere. This is a pain.
Question:
I'm looking for possible options to solve this problem.
What I've found so far:
1). WCF.
Like:
Only have to update the server and both the programs can now take advantage of the new calculation.
Concern:
The DataContract would contain several classes that would have to be passed to the function(s). (The total size of the data could range from 1 MB to 1 GB depending on the situation.) Not sure if the amount of data is an actual problem at this time.
2). Store compiled DLLs and download/load them.
Query the server for a class library. Download it. Load it into memory and use the calculations.
Like:
Do not have to pass a lot of data back and forth.
Concern:
A DLL would now reside on each and every computer. People may not be forced to update to the correct version, which may cause problems. A DLL on each local person's computer may also pose a security risk.

That is a pretty tricky question and I think there are quite a number of ways you could try to solve it. Here are a few ideas that may get you started:
Use Akka.Net and take a look at Akka.Remote, which may be able to solve your remote-deployment issue. I'm not sure this will fit your needs, though, if you have users deciding to join the cluster ad hoc rather than a fixed set of places to distribute calculations to.
If the calculations are simple mathematical formulas, you could store them as a set of rules and download the formulas, i.e. take more of a rules-engine approach. See, for example, C# expression trees (a minimal sketch follows these ideas).
Browser-based solution. This clearly wins from an ease-of-remote-deployment point of view. There are many options if you don't want to use JavaScript, e.g. PureScript, F# (Fable), ClojureScript, Elm, and possibly even C# with JSIL/Bridge. Moving from desktop to web is quite a big shift though, so make sure you know what you are in for :)
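For the rules-engine idea, here is a minimal sketch (the formula and parameter names are purely illustrative) of building a formula at runtime with System.Linq.Expressions and compiling it to a delegate:

using System;
using System.Linq.Expressions;

// Build "average * factor + 5" as an expression tree at runtime.
ParameterExpression average = Expression.Parameter(typeof(double), "average");
ParameterExpression factor = Expression.Parameter(typeof(double), "factor");

Expression body = Expression.Add(
    Expression.Multiply(average, factor),
    Expression.Constant(5.0));

// Compile once, then reuse the delegate for every data set.
Func<double, double, double> formula =
    Expression.Lambda<Func<double, double, double>>(body, average, factor).Compile();

Console.WriteLine(formula(10.0, 1.2));   // prints 17

The formula definitions themselves (which operators, which inputs) could then be stored on the server and translated into trees like this on each client.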

After trying a few options we found that sending data back and forth was too cumbersome. Instead we chose to load a DLL dynamically with all the calculations we require and use a set of interfaces to define the behaviour.
What I learned, and found quite neat, is that I can store the DLL in the database and load it directly into memory without having to copy it to the hard drive first, which provides an added layer of protection. This uses SQL Server and FILESTREAM. It also provides a sort of version control, in that I can choose which version of the DLL to load from the database and calculate/recalculate the various values.
// Read the assembly bytes straight from the FILESTREAM column.
// ("path" and "transaction" come from the FILESTREAM query -- PathName() and
// GET_FILESTREAM_TRANSACTION_CONTEXT() -- shown below.)
using (SqlFileStream fileStream = new SqlFileStream(path, transaction, System.IO.FileAccess.Read))
{
    byte[] buffer = new byte[fileStream.Length];
    int offset = 0;
    while (offset < buffer.Length)
    {
        int bytesRead = fileStream.Read(buffer, offset, buffer.Length - offset);
        if (bytesRead == 0) break;
        offset += bytesRead;
    }

    // Load the assembly directly from memory -- it never touches the local disk.
    Assembly assembly = Assembly.Load(buffer);
    Type calculationType = assembly.GetType("MyNamespace.MyClass");

    // Get access to the root interface implemented by the downloaded DLL.
    IMyInterface myClass = (IMyInterface)Activator.CreateInstance(calculationType);
}
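For completeness, here is roughly how the path and transaction context used above can be obtained. The table and column names (CalcAssemblies, Dll, Version) and the connectionString/requestedVersion variables are hypothetical, for illustration only:

using System.Data.SqlClient;

// Hypothetical table: CalcAssemblies(Version INT, Dll VARBINARY(MAX) FILESTREAM)
using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();

    // FILESTREAM access requires an explicit transaction.
    using (SqlTransaction sqlTransaction = connection.BeginTransaction())
    {
        string path;
        byte[] transactionContext;

        using (SqlCommand command = new SqlCommand(
            "SELECT Dll.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
            "FROM CalcAssemblies WHERE Version = @version", connection, sqlTransaction))
        {
            command.Parameters.AddWithValue("@version", requestedVersion);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                reader.Read();
                path = reader.GetString(0);
                transactionContext = (byte[])reader[1];
            }
        }

        // Open the SqlFileStream with these two values (as in the snippet above),
        // read and load the assembly, then commit.
        sqlTransaction.Commit();
    }
}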

Related

Best approach to an extendible statistics system

OK, so I need to implement a statistics/data-points/data-sources system.
I basically want to pass data periodically to the 'root' and have it process and update the relevant properties for access throughout the application - as data sources for graphing, labels, status checks etc.
I was wondering if there were some real-world examples of this from users who have handled something like this in the past. I've googled the hell out of this and I keep getting a mixed bag of results as to what I should do, and I hate just programming and 'letting the pieces fall into place'. I need a direction.
Edit for clarity:
The data sources will be:
Local files (xml most likely),
Local sql,
Remotely acquired json data,
Remotely acquired sql.
Types of subsystems (limited list, just for illustrative purposes):
Connection statuses - both bool and text,
Graphing/Gridview data sources,
Processing/Predictive methods (eg probability distribution etc),
General statistical profiles based on client/dept,
General statistical profiles based on date/time/spans,
more...
As I said, a lot of these sources can be used in collaboration to update segments of data, should the need arise (which it likely will). One piece of information can be used across multiple systems, but there will be times when a fetch will be very specific for one point.
I hope that made it a little bit clearer... maybe. I would like to handle all the data processing in one area if possible. It'll be easier to work with as the flow increases over time.
I wrote down some thoughts on it as I brainstormed the idea.
The observer pattern
This pattern seems good; however, it does have some drawbacks in the sense that all subs will be notified instead of selective ones, which will force me to either check the data and then process it, or create multiple observable objects for each type of data and have them cascade to the subs. I definitely like how extendable this is, and it also allows me to sub to multiple types of data sources should the need arise. On the other hand, it also seems like a lot of work to get any sort of results. Paying it forward, as it were.
Strategy
This pattern also seems relevant, but in a different way: store the processing of the raw data separately and just have a parent class that holds all the statistical information (so to speak). I like this because all information is stored centrally and the 'nodes' process it and return it. It allows for easy access and storage; however, the number of properties (unless I split it up, which is likely) will be huge.
Custom events.
Now - I guess this COULD be seen as a reinvention of the first one. But I do like the control it offers.
A combination of observer and strategy.
This could be weird, but hear me out. You have the data passed to your observable object, which cascades it down to the appropriate subs that will process that information for different reasons; each of those subs then applies its own strategy to process the information accordingly and passes the result back to the sub for storage/access.
An example of this would be periodically withdrawing data from some kind of source; this information can be used to update multiple areas of the system (observer), but each area needs it processed a different way (strategy).
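A minimal sketch of that observer/strategy combination might look like this (all type names here are placeholders, not from the original system):

using System;
using System.Collections.Generic;

// The raw data pushed into the root (placeholder shape).
public class DataPoint
{
    public DateTime Timestamp { get; set; }
    public double Value { get; set; }
}

// Strategy: each subscriber decides how to process the data it receives.
public interface IDataProcessingStrategy
{
    void Process(DataPoint point);
}

// Observer: the root pushes every incoming data point to all registered subs.
public class DataRoot
{
    private readonly List<IDataProcessingStrategy> subscribers = new List<IDataProcessingStrategy>();

    public void Subscribe(IDataProcessingStrategy subscriber)
    {
        subscribers.Add(subscriber);
    }

    public void Publish(DataPoint point)
    {
        foreach (IDataProcessingStrategy subscriber in subscribers)
        {
            subscriber.Process(point);   // each sub applies its own strategy
        }
    }
}

// Example strategies: one feeds a graph, another updates a connection status.
public class GraphSourceStrategy : IDataProcessingStrategy
{
    public void Process(DataPoint point) { /* append to a graph series */ }
}

public class ConnectionStatusStrategy : IDataProcessingStrategy
{
    public void Process(DataPoint point) { /* update bool/text status */ }
}

Each source (local XML, local SQL, remote JSON, remote SQL) would only need to call Publish, and each subsystem processes the same data in its own way.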
Is this logic sound, or should I be looking at it a different way? I do need this to be extendable and scalable, as the system could potentially be handling 'large' volumes.
Thoughts? Tried to be specific but remain on topic.
I ended up going with a combination of observer and strategy with a few custom events thrown in. Funny how that works. It actually works very well: lightweight, extendable, and scalable in testing with 'large' (5-7 GB) inputs. Desired results every time. Although assistance didn't happen, I thought I would share the fact that the observer/strategy combination actually worked well.

How can I limit a .NET application to be executed only a specific number of times?

I'm making a C# (WinForms) app that I want the user to be able to execute only a defined number of times (say 100 times). I know it's possible to add an XML or text file to store a count, but that would be easy to edit or "crack"... Is there any way to embed the counter in the code, or maybe some other way that might not be easy to crack, and that also makes it easy later to "update" the membership for another period of 100 executions?
Thanks in advance
There are lots of ways to store a variable. As you've noted, you can write it to a text or xml file. You could write it to the Registry. You could encrypt it and write it in a file somewhere.
Probably the most secure method is to write it on a server and have the application "call home" whenever it wants to run.
Preventing copying is a difficult balancing act - treat your legitimate customers too much like criminals and they'll leave you.
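One simple variant of the "encrypt it and write it to a file" idea is to protect the stored count with the Windows DPAPI (ProtectedData). This is only a sketch; the file location is arbitrary, and a determined user can still delete the file, so it only raises the bar:

using System;
using System.IO;
using System.Security.Cryptography;   // requires a reference to System.Security.dll
using System.Text;

public static class RunCounter
{
    // Arbitrary location for the protected counter file.
    private static readonly string CounterPath = Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
        "MyApp", "runs.bin");

    public static int ReadCount()
    {
        if (!File.Exists(CounterPath)) return 0;
        byte[] protectedBytes = File.ReadAllBytes(CounterPath);
        byte[] plain = ProtectedData.Unprotect(protectedBytes, null, DataProtectionScope.CurrentUser);
        return int.Parse(Encoding.UTF8.GetString(plain));
    }

    public static void WriteCount(int count)
    {
        Directory.CreateDirectory(Path.GetDirectoryName(CounterPath));
        byte[] plain = Encoding.UTF8.GetBytes(count.ToString());
        byte[] protectedBytes = ProtectedData.Protect(plain, null, DataProtectionScope.CurrentUser);
        File.WriteAllBytes(CounterPath, protectedBytes);
    }
}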
If you're talking about memberships, your application may be web connected. If that is the case, you could verify the instance against a web service on your server that holds and increments the count and issues a "OK/Not OK to run" reply.
If you don't want to do this, I have heard of an application that uses steganography to hide relevant details in certain files - you could hide your count in some of your image resources.
Create multiple files containing the counter or the number of times your app will run. Name these files with different file names and store them in different locations so that they are hard for the user to locate, delete, and crack. The reason it is not just one file is that if the user finds one of your files and alters or deletes it, you still have other files that contain the valid information about your app.
If your application is a commercial product, it might be worth having a look at security products from commercial vendors like SafeNet.com, for example.
A few years ago I used the HASP HL hardlock for a project, which worked just fine.
They offer hardware dongles for software protection as well as software based protection (using authentication services over the internet), and combinations of both.
Their products allow for very fine grained control of what you want to allow your users, e.g. how many times an application may be started before it expires (which would be just what you want) or time-expiration, or feature packages, or any combination of it all.
The downside is, that they have very "healthy" licensing prices.
Whether this is worth it will depend on the size and price of your own application.

Memory-mapped file IList implementation, for storing large datasets "in memory"?

I need to perform operations chronologically on huge time series implemented as IList. The data is ultimately stored into a database, but it would not make sense to submit tens of millions of queries to the database.
Currently the in-memory IList triggers an OutOfMemoryException when trying to store more than 8 million (small) objects, though I would need to deal with tens of millions.
After some research, it looks like the best way to do it would be to store data on disk and access it through an IList wrapper.
Memory-mapped files (introduced in .NET 4.0) seem the right interface to use, but I wonder what is the best way to write a class that should implement IList (for easy access) and internally deal with a memory-mapped file.
I am also curious to hear if you know about other ways! I thought, for example, of an IList wrapper using data from db4o (someone mentioned here using a memory-mapped file as the IoAdapterFile, though using db4o probably adds a performance cost vs. dealing directly with the memory-mapped file).
I have come across this question asked in 2009, but it did not yield useful answers or serious ideas.
I found this PersistentDictionary<>, but it only works with strings, and by reading the source code I am not sure it was designed for very large datasets.
More scalable (up to 16 TB), the ESENT PersistentDictionary<> uses the ESENT database engine present in Windows (XP and later) and can store all serializable objects containing simple types.
Disk Based Data Structures, including Dictionary, List and Array with an "intelligent" serializer, looked exactly like what I was looking for, but it did not run smoothly with extremely large datasets, especially as it does not make use of the "native" .NET MemoryMappedFiles yet, and support for 32-bit systems is experimental.
Update 1: I ended up implementing my own version that makes extensive use of .NET MemoryMappedFiles; it is very fast and I will probably release it on Codeplex once I have made it better for more general purpose usages.
Update 2: TeaFiles.Net also worked great for my purpose. Highly recommended (and free).
I see several options:
"in-memory-DB"
For example, SQLite can be used this way: no setup is needed, just deploy the DLL (one or two of them) together with the app, and the rest can be done programmatically.
Load all the data into temporary table(s) in the DB. With unknown (but big) amounts of data I found that this pays off really fast (and processing can usually be done inside the DB, which is even better!). A small sketch of this approach follows this list.
Use a MemoryMappedFile and a fixed structure size (array-like access via offset), but beware that physical memory is the limit unless you use some sort of "sliding window" to map only parts of the file into memory.
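For the in-memory-DB option, a minimal sketch using the System.Data.SQLite ADO.NET provider might look like this (the table and values are illustrative):

using System;
using System.Data.SQLite;   // System.Data.SQLite ADO.NET provider

// Purely in-memory database; it disappears when the connection is closed.
using (var connection = new SQLiteConnection("Data Source=:memory:;Version=3;"))
{
    connection.Open();

    using (var create = new SQLiteCommand("CREATE TABLE Samples (Ticks INTEGER, Value REAL)", connection))
    {
        create.ExecuteNonQuery();
    }

    // Bulk-load the time series inside a transaction for speed.
    using (var transaction = connection.BeginTransaction())
    using (var insert = new SQLiteCommand("INSERT INTO Samples (Ticks, Value) VALUES (@ticks, @value)", connection, transaction))
    {
        insert.Parameters.AddWithValue("@ticks", DateTime.UtcNow.Ticks);
        insert.Parameters.AddWithValue("@value", 42.0);
        insert.ExecuteNonQuery();   // repeat for each element of the series
        transaction.Commit();
    }

    // Processing can then happen inside the DB, e.g. aggregation.
    using (var average = new SQLiteCommand("SELECT AVG(Value) FROM Samples", connection))
    {
        Console.WriteLine(average.ExecuteScalar());
    }
}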
Memory-mapped files are a nice way to do it, but it is going to be very slow if you need to access things randomly.
Your best bet is probably to come up with a fixed structure size when saving to the mapped memory (if you can), and then use the offset as the list item ID. However, deletes/sorting are always a problem.
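A minimal sketch of that fixed-size-record approach over a .NET MemoryMappedFile (not a full IList<T> implementation, just indexed read/write; the Sample struct is illustrative):

using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Fixed-size record so every element sits at a predictable offset.
[StructLayout(LayoutKind.Sequential)]
public struct Sample
{
    public long Ticks;
    public double Value;
}

// Minimal array-like wrapper over a memory-mapped file.
public class MappedList : IDisposable
{
    private static readonly int ItemSize = Marshal.SizeOf(typeof(Sample));
    private readonly MemoryMappedFile file;
    private readonly MemoryMappedViewAccessor accessor;

    public MappedList(string path, long capacity)
    {
        file = MemoryMappedFile.CreateFromFile(path, FileMode.OpenOrCreate, null, capacity * ItemSize);
        accessor = file.CreateViewAccessor();
    }

    public Sample this[long index]
    {
        get
        {
            Sample item;
            accessor.Read(index * ItemSize, out item);
            return item;
        }
        set
        {
            accessor.Write(index * ItemSize, ref value);
        }
    }

    public void Dispose()
    {
        accessor.Dispose();
        file.Dispose();
    }
}

Growing the file, deletes, and sorting still have to be handled separately, as the answer above notes.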

How to allow users to define financial formulas in a C# app

I need to allow my users to be able to define formulas which will calculate values based on data. For example
//Example 1
return GetMonetaryAmountFromDatabase("Amount due") * 1.2;
//Example 2
return GetMonetaryAmountFromDatabase("Amount due") * GetFactorFromDatabase("Discount");
I will need to allow / * + - operations, and also to assign local variables and execute IF statements, like so:
var amountDue = GetMonetaryAmountFromDatabase("Amount due");
if (amountDue > 100000) return amountDue * 0.75;
if (amountDue > 50000) return amountDue * 0.9;
return amountDue;
The scenario is complicated because I have the following structure..
Customer (a few hundred)
Configuration (about 10 per customer)
Item (about 10,000 per customer configuration)
So I will perform a 3-level loop. At each "Configuration" level I will start a DB transaction and compile the formulas; each "Item" will use the same transaction + compiled formulas (there are about 20 formulas per configuration, and each item will use all of them).
This further complicates things because I can't just use the compiler services, as that would result in continued memory usage growth. I can't use a new AppDomain per "Configuration" loop level because some of the references I need to pass cannot be marshalled.
Any suggestions?
--Update--
This is what I went with, thanks!
http://www.codeproject.com/Articles/53611/Embedding-IronPython-in-a-C-Application
IronPython allows you to embed a scripting engine into your application. There are many other solutions; in fact, you can google something like "C# embedded scripting" and find a whole bunch of options. Some are easier than others to integrate, and some are easier than others to write the scripts for.
Of course, there is always VBA. But that's just downright ugly.
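A minimal sketch of embedding IronPython (the variable names and the formula are illustrative; it needs the IronPython and Microsoft.Scripting assemblies):

using IronPython.Hosting;
using Microsoft.Scripting;
using Microsoft.Scripting.Hosting;

ScriptEngine engine = Python.CreateEngine();
ScriptScope scope = engine.CreateScope();

// Expose application values to the script.
scope.SetVariable("amount_due", 60000.0);

// A user-defined formula stored as text (e.g. loaded from the database).
string formula =
    "if amount_due > 100000:\n" +
    "    result = amount_due * 0.75\n" +
    "elif amount_due > 50000:\n" +
    "    result = amount_due * 0.9\n" +
    "else:\n" +
    "    result = amount_due\n";

// Execute the statements in the scope and read the result back.
engine.CreateScriptSourceFromString(formula, SourceCodeKind.Statements).Execute(scope);
double result = scope.GetVariable<double>("result");   // 54000

A ScriptSource can also be Compile()d once and the resulting CompiledCode executed against many scopes, which fits the per-"Configuration" compilation described in the question.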
You could create a simple class at runtime, just by writing your logic into a string or the like, compile it, run it, and make it return the calculations you need. This article shows you how to access the compiler at runtime: http://www.codeproject.com/KB/cs/codecompilation.aspx
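A rough sketch of that approach with CodeDOM (System.CodeDom.Compiler); note the caveat from the question that assemblies compiled this way cannot be unloaded without a separate AppDomain, so memory grows with each compilation:

using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

string source = @"
public static class UserFormula
{
    public static decimal Evaluate(decimal amountDue)
    {
        if (amountDue > 100000) return amountDue * 0.75m;
        if (amountDue > 50000) return amountDue * 0.9m;
        return amountDue;
    }
}";

using (var provider = new CSharpCodeProvider())
{
    var parameters = new CompilerParameters { GenerateInMemory = true };
    CompilerResults results = provider.CompileAssemblyFromSource(parameters, source);
    if (results.Errors.HasErrors)
        throw new InvalidOperationException(results.Errors[0].ErrorText);

    var evaluate = results.CompiledAssembly.GetType("UserFormula").GetMethod("Evaluate");
    decimal value = (decimal)evaluate.Invoke(null, new object[] { 60000m });   // 54000
}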
I faced a similar problem a few years ago. I had a web app with moderate traffic that needed to allow equations, and it needed similar features to yours, and it had to be fast. I went through several ideas.
The first solution involved adding calculated columns to our database. Our tables for the app store the properties in columns (e.g., there's a column for Amount Due, another Discount, etc.). If the user typed in a formula like PropertyA * 2, the code would alter the underlying table to have a new calculated column. It's messy as far as adding and removing columns. It does have a few advantages though: the database (SQL Server) was really fast at doing the calculations; the database handled a lot of error detection for us; and I could pretend that the calculated values were the same as the non-calculated values, which meant that I didn't have to modify any existing code that worked with the non-calculated values.
That worked for a while until we needed the ability for a formula to reference another formula, and SQL Server doesn't allow that. So I switched to a scripting engine. IronPython wasn't very mature back then, so I chose another engine... I can't remember which one right now. Anyway, it was easy to write, but it was a little slow. Not a lot, maybe a few milliseconds per query, but for a web app the time really added up over all the requests.
That was when I decided to write my own parser for the formulas. That is, I have a PlusToken class to add two values, an ItemToken class that corresponds to GetValue("Discount"), etc. When the user enters a new formula, a validator parses the formula, makes sure it's valid (things like, did they reference a column that doesn't exist?), and stores it in a semi-compiled form that's easy to parse later. When the user requests a calculated value, a parser reads the formula, parses it, figures out what data is needed from the database, and computes the final answer. It took a fair amount of work up front, but it works well and it's really fast. Here's what I learned:
If the user enters a formula that leads to a cycle in the formulas, and you try to compute the value of the formula, you'll run out of stack space. If you're running this on a web app, the entire web server will stop working until you reset it. So it's important to detect cycles at the validation stage.
If you have more than a couple formulas, aggregate all the database calls in one place, then request all the data at once. Much faster.
Users will enter wacky stuff into formulas. A parser that provides useful error messages will save a lot of headaches later on.
If the custom scripts don't get more complex than the ones that you show above, I would agree with Sylvestre: Create your own parser, make a tree and do the logic yourself. You can generate a .Net expression tree or just go through the Syntax tree yourself and make the operations within your own code (Antlr below will help you generate such code).
Then you are in complete control of your references, and you are always within C#, so you don't need to worry about memory management (any more than you normally would), etc. IMO Antlr is the best tool for doing this in C#. You can get examples from the site for little languages, like your scenario.
But... if this is really just a beginning and in the end you need almost the full power of a proper scripting language, you would need to go into embedding a scripting language into your system. With your numbers, you will have a problem with performance, memory management and probably references, as you noted. There are several approaches, but I cannot really give one recommendation for your scenario: I've never done it at such a scale.
You could build two base classes UnaryOperator (if, square, root...) and BinaryOperator (+ - / *) and build a tree from the expression. Then evaluate the tree for each item.
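A minimal sketch of such an operator tree (parsing the formula text into the tree is omitted; the IValueSource interface is just a placeholder for wherever the raw values come from):

using System;

// Abstraction over wherever the raw values come from (database, cache, ...).
public interface IValueSource
{
    decimal GetValue(string name);
}

// Base node of the expression tree built from a parsed formula.
public abstract class Node
{
    public abstract decimal Evaluate(IValueSource source);
}

// Leaf that pulls a named value, e.g. GetMonetaryAmountFromDatabase("Amount due").
public class ItemNode : Node
{
    private readonly string name;
    public ItemNode(string name) { this.name = name; }
    public override decimal Evaluate(IValueSource source) { return source.GetValue(name); }
}

public class ConstantNode : Node
{
    private readonly decimal value;
    public ConstantNode(decimal value) { this.value = value; }
    public override decimal Evaluate(IValueSource source) { return value; }
}

// Binary operator (+ - * /) applied to two child nodes.
public class BinaryNode : Node
{
    private readonly char op;
    private readonly Node left, right;

    public BinaryNode(char op, Node left, Node right)
    {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    public override decimal Evaluate(IValueSource source)
    {
        decimal l = left.Evaluate(source);
        decimal r = right.Evaluate(source);
        switch (op)
        {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new InvalidOperationException("Unknown operator: " + op);
        }
    }
}

Example 1 from the question would then be new BinaryNode('*', new ItemNode("Amount due"), new ConstantNode(1.2m)).Evaluate(source), and the tree can be built once per configuration and evaluated for each item.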

Is it better to have one big workflow or several smaller specific ones?

I need to build an app that gets files from a server and moves them to another server. It was suggested that I look into using Windows Workflow Foundation (WF).
I started to build the workflow but it is getting messy and I'm not sure I'm doing it the best way possible.
Here are the basic workflow activities:
Get a list of sources
Determine if source is ftp or disk drive
Get a list of files from the server
If source is ftp then get the file with ftp get else if source is drive then read file from drive
If target is ftp then ftp file to server else if target is drive then write to a drive else if target is web service then post to web service
If source is ftp then delete file with ftp commands else if source is drive then delete file
With one workflow it gets a little busy. I need 2 while loops, one around the integrations and one after I get a file list.
The other thing I thought of was to build multiple workflows: one each for FTPtoFTP, FTPtoDrive, FTPtoWebService, DriveToFTP, DriveToDrive, and DriveToWebService.
Any suggestions?
First, you should consider creating custom Activities for each of the major sections. The custom activities will be Composite Activities that can be composed of many steps. This will help de-clutter things a bit and allow you to continue working with the workflows at a relatively high-level.
The Workflow Designer, while handy, is not really designed to scale very large. As of VS 2008, the best way to work with XAML-based technologies is to use the text editor and read/write the XML directly.
Breaking it down into several workflows might not be the best approach unless you can break it down into a few high-level activities and are working at the XAML level. Keep in mind that if the logic and flow is nearly identical for all of these, you will now have to maintain 6 different workflows. This is a bigger nightmare if your workflows are complex and you need to fix a common logic error across all of them.
You should also consider the use of services. This may allow you to have ONE workflow and ONE set of activities, while the implementation of each step is isolated into a service. In this case, you would need to instantiate one workflow per combination, load the same workflow into each, and inject different activities. Not necessarily the best approach, but something to consider.
First of all, this sounds to me like using WF is adding extra complications to what should be a fairly straightforward process. Although WF can be used to model execution flow, its purpose is to model business flow, and include business rules and logic without putting those into your implementations.
In your example, the business rules seem largely like things which should be dealt with by an app.config file.
However, on the broader question of using one workflow or many: you want each of your workflow tasks to be at approximately the same 'broad scope'.
For instance
WF for building a table
purchase wood
cut wood
cut wood for legs
bevel edges
round cornices
sand twice with different coarseness
assemble table
The steps in the middle are all much more detailed than the steps around them.
So you would consider splitting it up into two separate workflows: a high-level workflow that contains the broad steps, and lower-level workflows that contain the particulars.
So the 'GetDatasource' workflow step would not care (externally) what type of data source it is gathering from; it just returns a set of data to the next step in the workflow.
The same goes for the target: it doesn't care what type of data source the data came from; it only cares what it has to do with the data. So that should be encapsulated as well.
So your Workflow could be three workflows
Highest WF
GetDataSourceWF
DoThingsWithDataWF
Then your DoThingsWithDataWF and GetDataSourceWF Workflows can each be concerned with only the execution context that they need.
EDIT
As pointed out by commenter James Schek, you can use the higher-level workflow to actually kick off your lower-level workflows and manage their execution into each other.
Well, personally I have not used WF yet, though I have done quite a bit of work with workflows before. To me, breaking them up into smaller workflows seems the best way. When you're working with workflows you should try to limit each workflow to a specific task, so that you have a definitive start action, at least one successful route, and at least one failure route. Workflows in general can be very tricky things and it's best to keep each as simple as possible.
As a general rule, anytime things get "messy", you should break them down into smaller parts. I'd definitely recommend breaking it down into several workflows.
