In my small compiler I currently have a hand-made AST.
I was considering the idea of having a visitor that would look for nodes of a certain type X and replace them with nodes of type X'. The trouble is that this doesn't seem easy to implement with the visitor pattern.
The only way I can see to make this work would be to have visit() methods for all kinds of nodes that could possibly have a node of type X as a child, and put my node-replacing logic there, but there may be lots of those nodes. Plus, if I later decide to add a new kind of node, I risk forgetting to handle that new special case in this visitor.
What's the problem I'm trying to solve:
In the current case, my tree has nodes of type FunctionCall that carry only the name of an operation and its parameters.
I'd like to substitute those with a MethodInvocation, with the appropriate OOish transformation:
m(A, B) -> A.m(B)
m(n(A, B), C) -> (A.n(B)).m(C)
Of course this can be done in a thousand different ways, the easiest being a single Call class in which a target may or may not exist, but I'd like to be as explicit as possible (that is, use different kinds of nodes to express different things), if possible.
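One common way around the "visit every possible parent" problem is a rewriting visitor, the same idea Roslyn's CSharpSyntaxRewriter uses: every Visit method returns a node, and a base class rebuilds each node from its rewritten children, so a concrete transformation overrides only the node type it cares about. The sketch below assumes hypothetical node classes (Identifier, FunctionCall, MethodInvocation) standing in for the hand-made AST. Adding a new node type means adding one method to the interface; the compiler then forces the base rewriter to handle it once, and every concrete transformation inherits that default.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical node types standing in for the hand-made AST.
public abstract class Node
{
    public abstract T Accept<T>(INodeVisitor<T> visitor);
}

public sealed class Identifier : Node
{
    public Identifier(string name) => Name = name;
    public string Name { get; }
    public override T Accept<T>(INodeVisitor<T> visitor) => visitor.VisitIdentifier(this);
    public override string ToString() => Name;
}

public sealed class FunctionCall : Node
{
    public FunctionCall(string name, IReadOnlyList<Node> args) { Name = name; Args = args; }
    public string Name { get; }
    public IReadOnlyList<Node> Args { get; }
    public override T Accept<T>(INodeVisitor<T> visitor) => visitor.VisitFunctionCall(this);
    public override string ToString() => $"{Name}({string.Join(", ", Args)})";
}

public sealed class MethodInvocation : Node
{
    public MethodInvocation(Node target, string name, IReadOnlyList<Node> args)
    {
        Target = target; Name = name; Args = args;
    }
    public Node Target { get; }
    public string Name { get; }
    public IReadOnlyList<Node> Args { get; }
    public override T Accept<T>(INodeVisitor<T> visitor) => visitor.VisitMethodInvocation(this);
    // Targets that are not plain identifiers are parenthesized: (A.n(B)).m(C)
    public override string ToString() =>
        (Target is Identifier ? Target.ToString() : $"({Target})")
        + $".{Name}({string.Join(", ", Args)})";
}

public interface INodeVisitor<T>
{
    T VisitIdentifier(Identifier node);
    T VisitFunctionCall(FunctionCall node);
    T VisitMethodInvocation(MethodInvocation node);
}

// Base rewriter: each Visit returns a (possibly new) node, and the defaults
// rebuild a node from its rewritten children. This is the only class that
// has to learn about a newly added node type.
public class NodeRewriter : INodeVisitor<Node>
{
    protected Node Rewrite(Node node) => node.Accept<Node>(this);

    protected IReadOnlyList<Node> RewriteAll(IEnumerable<Node> nodes) =>
        nodes.Select(Rewrite).ToList();

    public virtual Node VisitIdentifier(Identifier node) => node;

    public virtual Node VisitFunctionCall(FunctionCall node) =>
        new FunctionCall(node.Name, RewriteAll(node.Args));

    public virtual Node VisitMethodInvocation(MethodInvocation node) =>
        new MethodInvocation(Rewrite(node.Target), node.Name, RewriteAll(node.Args));
}

// The concrete transformation overrides only the node type it cares about:
// m(A, B) -> A.m(B), applied bottom-up so nested calls are rewritten too.
public sealed class CallToInvocationRewriter : NodeRewriter
{
    public override Node VisitFunctionCall(FunctionCall node)
    {
        var args = RewriteAll(node.Args);
        if (args.Count == 0)
            return new FunctionCall(node.Name, args);   // no receiver to move; left as is
        return new MethodInvocation(args[0], node.Name, args.Skip(1).ToList());
    }
}
```

Applied to the tree for m(n(A, B), C), the rewriter produces a tree that prints as (A.n(B)).m(C). Leaving zero-argument calls as FunctionCall is an assumption of this sketch, since such a call has no receiver to move.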
I am looking for an algorithm or approach to evaluate mathematical expressions that are given as strings. The expressions contain mathematical components but also custom functions. I'd like to implement the algorithm in C#/.NET.
I am aware that Roslyn allows me to evaluate an expression of the kind
"var value = 3+5*11-Math.Sqrt(9);"
I am also familiar with using "node rewriting" to avoid variable declarations, fully qualified function names and the trailing semicolon, so that I can evaluate
"value = 3+5*11-Sqrt(9)"
However, what I want to implement on top of this is to offer custom script functions such as
"value = Ratio(A,B)", where Ratio is a custom function that divides each element of vector A by the corresponding element of vector B and returns a vector of the same length.
or
"value = Sma(A, 10)", where Sma is a custom function that calculates the simple moving average of vector/timeseries A with a lookback window of 10.
Ideally I want to be able to handle more complex expressions such as
"value = Ratio(A,B) * Pi + 0.5 * Spread(C,D) + Sma(E, lookback)", where the parsing engine would respect operator precedence and build a parse tree in order to fetch the values required to evaluate the expression.
I can't wrap my head around how I could solve this kind of problem with Roslyn.
What other approaches are out there to get me started, or am I missing Roslyn features that could help solve this problem?
Assuming that all your expressions are valid C# expressions you can make use of Roslyn in multiple ways.
You could use Roslyn only for parsing. SyntaxFactory.ParseExpression would give you the syntax tree of an expression. Note that your first (var v = expr;) example is not an expression, but a variable declaration. However v = expr is an expression, namely an AssignmentExpressionSyntax. Then you could traverse this AST, and do with each node what you want to do, basically you'd write an interpreter. The benefit of this approach is that you don't have to write your own parser, walking an AST is very simple, and this approach would be flexible, as defining what you do with "unknown" methods would be perfectly up to you.
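A minimal sketch of the interpreter side of this first approach, under stated assumptions: to stay self-contained it walks a small hand-made expression tree (Num, Var, Binary, Call and VectorInterpreter are hypothetical names) rather than Roslyn's syntax nodes, but the dispatch is the same one you would write over the ExpressionSyntax returned by SyntaxFactory.ParseExpression. Every value is a double[], scalars are length-1 vectors that get broadcast, and function names are resolved through a caller-supplied table, which is where Ratio, Sma and the like would plug in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal expression tree. With Roslyn, these cases would be
// LiteralExpressionSyntax, IdentifierNameSyntax, BinaryExpressionSyntax and
// InvocationExpressionSyntax nodes.
public abstract record Expr;
public sealed record Num(double Value) : Expr;
public sealed record Var(string Name) : Expr;
public sealed record Binary(char Op, Expr Left, Expr Right) : Expr;
public sealed record Call(string Name, Expr[] Args) : Expr;

// Every value is a vector; a scalar is a vector of length 1 and is broadcast.
public sealed class VectorInterpreter
{
    private readonly IReadOnlyDictionary<string, double[]> _variables;
    private readonly IReadOnlyDictionary<string, Func<double[][], double[]>> _functions;

    public VectorInterpreter(
        IReadOnlyDictionary<string, double[]> variables,
        IReadOnlyDictionary<string, Func<double[][], double[]>> functions)
    {
        _variables = variables;
        _functions = functions;
    }

    public double[] Eval(Expr expr) => expr switch
    {
        Num n => new[] { n.Value },
        Var v => _variables[v.Name],
        Binary b => Combine(Eval(b.Left), Eval(b.Right), b.Op),
        // What an "unknown" method means is decided entirely by the caller's table.
        Call c => _functions.TryGetValue(c.Name, out var f)
            ? f(c.Args.Select(Eval).ToArray())
            : throw new InvalidOperationException($"Unknown function '{c.Name}'."),
        _ => throw new NotSupportedException(expr.GetType().Name),
    };

    // Element-wise arithmetic with scalar broadcasting.
    private static double[] Combine(double[] a, double[] b, char op)
    {
        int n = Math.Max(a.Length, b.Length);
        if ((a.Length != n && a.Length != 1) || (b.Length != n && b.Length != 1))
            throw new ArgumentException("Vector lengths differ.");
        var result = new double[n];
        for (int i = 0; i < n; i++)
        {
            double x = a[a.Length == 1 ? 0 : i];
            double y = b[b.Length == 1 ? 0 : i];
            result[i] = op switch
            {
                '+' => x + y,
                '-' => x - y,
                '*' => x * y,
                '/' => x / y,
                _ => throw new NotSupportedException($"Operator '{op}'."),
            };
        }
        return result;
    }
}
```

For example, with A = {2, 4, 6}, B = {1, 2, 3} and an element-wise Ratio registered in the function table, evaluating the tree for Ratio(A, B) * 2 + 1 returns {5, 5, 5}. Operator precedence is already encoded in the tree's shape, so the interpreter never has to deal with it; that is the parser's job.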
Use Roslyn for evaluation too. This can be done in multiple flavors: either putting together a valid C# file, and compiling that into an assembly, or you could go through the Scripting API. This approach would basically require a class library that contains the implementation of all your extra methods, like Sma, Spread, ... But these would also be needed in some form in the first approach, so it's not really an extra effort.
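For this second approach, the extra methods only need to live in a class library that the script can see. The sketch below is such a library, assuming the semantics described in the question: Ratio divides element-wise, and Sma pads positions that lack a full lookback window with NaN (that padding is an assumption). The comment at the top shows how it would likely be wired into the Scripting API from the Microsoft.CodeAnalysis.CSharp.Scripting package, which this block itself does not reference; ScriptFunctions and ScriptGlobals are hypothetical names.

```csharp
using System;

// Assumed wiring with the Microsoft.CodeAnalysis.CSharp.Scripting package
// (not referenced by this file; a and b are the caller's vectors):
//   var options = ScriptOptions.Default
//       .AddReferences(typeof(ScriptFunctions).Assembly)
//       .AddImports("ScriptFunctions");   // lets the script call Ratio(...) unqualified
//   double[] value = await CSharpScript.EvaluateAsync<double[]>(
//       "Sma(Ratio(A, B), 10)", options, globals: new ScriptGlobals { A = a, B = b });
public static class ScriptFunctions
{
    // Element-wise a[i] / b[i]; both vectors must have the same length.
    public static double[] Ratio(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Ratio expects vectors of equal length.");
        var result = new double[a.Length];
        for (int i = 0; i < a.Length; i++)
            result[i] = a[i] / b[i];
        return result;
    }

    // Simple moving average over the last `lookback` points, using a running sum.
    // Positions without a full window are NaN (an assumption).
    public static double[] Sma(double[] series, int lookback)
    {
        if (lookback <= 0)
            throw new ArgumentOutOfRangeException(nameof(lookback));
        var result = new double[series.Length];
        double sum = 0;
        for (int i = 0; i < series.Length; i++)
        {
            sum += series[i];
            if (i >= lookback) sum -= series[i - lookback];
            result[i] = i >= lookback - 1 ? sum / lookback : double.NaN;
        }
        return result;
    }
}

// Public fields on the globals object become variables inside the script.
public class ScriptGlobals
{
    public double[] A = Array.Empty<double>();
    public double[] B = Array.Empty<double>();
}
```

One caveat: C# defines no arithmetic operators on double[], so an expression such as Ratio(A,B) * Pi would not compile as a script against this library as written; the functions would need to return a vector type with overloaded operators.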
If the only goal is to evaluate the expression, then I would go with the second approach. If there are extra requirements (which you haven't mentioned), such as being able to produce a simplified form of an expression, then I'd consider the first solution.
If you find a library that does exactly what you need (and the perf is good, and you don't mind the dependency on 3rd party tools, ...), I'd go with that. MathParser.org-mXparser suggested in the comment seems pretty much what you're looking for.
We use string functions like IsNullOrEmpty and IsNullOrWhiteSpace which, as their names show, do more than one job. Isn't that a violation of the SRP?
Shouldn't it rather be something like string.isValid(Enum typeofValidation), using the strategy pattern to choose the correct validation strategy?
Or is it perfectly OK to violate the SRP in utility classes or static classes?
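For comparison, here is one reading of the proposed design: an IsValid method that takes an enum value and uses it to pick a validation strategy. Because string is sealed, it has to be an extension method; StringValidation and IsValid are hypothetical names, not part of the .NET API. Each strategy still delegates to a check like IsNullOrWhiteSpace, so the conditions being tested stay the same; only the place where one is chosen moves.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validation kinds; not part of the .NET string API.
public enum StringValidation
{
    NotNull,
    NotNullOrEmpty,
    NotNullOrWhiteSpace,
}

public static class StringValidationExtensions
{
    // Strategy table: one predicate per enum value.
    private static readonly Dictionary<StringValidation, Func<string, bool>> Strategies = new()
    {
        [StringValidation.NotNull] = s => s != null,
        [StringValidation.NotNullOrEmpty] = s => !string.IsNullOrEmpty(s),
        [StringValidation.NotNullOrWhiteSpace] = s => !string.IsNullOrWhiteSpace(s),
    };

    // Extension methods may be called on a null reference, so null input is handled.
    public static bool IsValid(this string s, StringValidation validation) =>
        Strategies.TryGetValue(validation, out var strategy)
            ? strategy(s)
            : throw new ArgumentOutOfRangeException(nameof(validation));
}
```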
The SRP says that a function or class should have only one reason to change. What is a reason to change? A reason to change is a user who requests changes. So a class or function should have only one user who requests changes.
Now a function that does some calculations and then some formatting has two different users that could request a change. One would request changes to the calculations and the other would request changes to the formatting. Since these users have different needs and will make their requests at different times, we'd like them to be served by different functions.
IsNullOrEmpty(String) is not likely to be serving two different users. The user who cares about null is likely the same user who cares about empty, so IsNullOrEmpty does not violate the SRP.
In object-oriented programming, the single responsibility principle states that every object should have a single responsibility.
You're describing methods, IsNullOrEmpty and IsNullOrWhiteSpace, which are self-describing in what they do; they're not objects. string has a single responsibility: to be responsible for text strings!
Static helpers can perform many tasks if you choose: the whole point of the Single Responsibility principle is to ultimately make your code more maintainable and readable for future teams and yourself. As a comment says, don't overthink it. You're not designing the framework here but just consuming some parts of it that will clean your strings for you, and validate incoming data.
The SRP applies to classes, not methods. Still, it's a good idea to have methods that do one thing only. But you can't take that to extremes. For example, a console application would be fairly useless if its Main method could contain only one statement (and, if the statement is a method call, that method could also contain only one statement, etc., recursively).
Think about the implementation of IsNullOrEmpty:
static bool IsNullOrEmpty(string s)
{
    return ReferenceEquals(s, null) || Equals(s, string.Empty);
}
So, yes, it's doing two things, but they're done in a single expression. If you go to the level of expressions, any boolean expression involving binary boolean operators could be said to be "doing more than one thing" because it is evaluating the truth of more than one condition.
If the names of the methods bother you because they imply too much activity for a single method, wrap them in your own methods with names that imply the evaluation of a single condition. For example:
static bool HasNoVisibleCharacters(string s) { return string.IsNullOrWhiteSpace(s); }
static bool HasNoCharacters(string s) { return string.IsNullOrEmpty(s); }
In response to your comment:
Say I wrote a function like SerializeAndValidate(ObjectToSerializeAndValidate). Clearly this method/class is doing two things, serialization and validation, which is clearly a violation. Sometimes methods in a class lead to a maintenance nightmare, like the serialize-and-validate example above.
Yes, you are right to be concerned about this, but again, you cannot literally have methods that do one thing only. Remember that different methods will deal with different levels of abstraction. You might have a very high-level method that calls SerializeAndValidate as part of a long sequence of actions. At that level of abstraction, it might be very reasonable to think of SerializeAndValidate as a single action.
Imagine writing a set of step-by-step instructions for an experienced user to open a file's "properties" dialogue:
Right-click the file
Choose "Properties"
Now imagine writing the same instructions for someone who's never used a mouse before:
Position the mouse pointer over the file's icon
Press and release the right mouse button
A menu appears. Position the mouse pointer over the word "Properties"
Press and release the left mouse button
When we write computer programs, we need to operate at both levels of abstraction. Or, rather, at any given time, we're operating at one level of abstraction or another, so as not to confuse ourselves. Furthermore, we rely on library code that operates at lower levels of abstraction still.
Methods also allow you to comply with the "do not repeat yourself" principle (often known as "DRY"). If you need to both serialize and validate objects in many parts of your application, you'd want to have a SerializeAndValidate method to reduce duplicative code. You'd be very well advised to implement the method as a simple convenience method:
void SerializeAndValidate(SomeClass obj)
{
    Serialize(obj);
    Validate(obj);
}
This allows you the convenience of calling one method, while preserving the separation of serialization logic from validation logic, which should make the program easier to maintain.
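A runnable version of the same idea, assuming a hypothetical Person type and System.Text.Json for serialization: serialization and validation live in separate classes, each with one reason to change, and the convenience method only composes them. Unlike the snippet above, this sketch validates first so that invalid objects are never serialized; that ordering is a design choice of the sketch, not something the answer prescribes.

```csharp
using System;
using System.Text.Json;

// Hypothetical payload type.
public sealed class Person
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

// One reason to change: the wire format.
public static class PersonSerializer
{
    public static string Serialize(Person person) => JsonSerializer.Serialize(person);
}

// One reason to change: the business rules.
public static class PersonValidator
{
    public static void Validate(Person person)
    {
        if (string.IsNullOrWhiteSpace(person.Name))
            throw new ArgumentException("Name is required.", nameof(person));
        if (person.Age < 0)
            throw new ArgumentException("Age must not be negative.", nameof(person));
    }
}

// Convenience method for call sites that always need both steps.
public static class PersonPipeline
{
    public static string ValidateAndSerialize(Person person)
    {
        PersonValidator.Validate(person);   // throws before any output is produced
        return PersonSerializer.Serialize(person);
    }
}
```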
I don't see this as doing more than one thing. It is just making sure your string passes a required condition.
I'm attempting to build a tree where each node can have an unspecified number of child nodes. In practice the tree will have over a million nodes.
I've managed to construct the tree, but I'm experiencing memory errors due to a full heap when I fill the tree with a few thousand nodes. The reason is that I'm storing each node's children in a Dictionary (or any data structure, for that matter). So at run time I've got thousands of such data structures being created, since each node can have an unspecified number of children and each node's children are stored in such a structure.
Is there another way of doing this? I cannot simply use a variable to store a reference to each child, as there can be an unspecified number of children per node. Thus, it is not like a binary tree, where I could have two variables keeping track of the left and right child respectively.
Please no suggestions for another method of doing this. I've got my reasons for needing to create this tree, and unfortunately I cannot do otherwise.
Thanks!
How many of your nodes will be "leaf" nodes? Perhaps create the data structure that stores children only when a node gets its first child, and keep a null reference otherwise.
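A minimal sketch of that suggestion, assuming children are accessed by position rather than looked up by key (so a List&lt;T&gt; stands in for the Dictionary): the child list is allocated only when a node receives its first child, so leaf nodes carry a single null reference instead of an empty collection. LazyTreeNode is a hypothetical name.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical node type: a leaf holds a null child list instead of an empty collection.
public sealed class LazyTreeNode<T>
{
    private List<LazyTreeNode<T>>? _children;   // allocated on the first AddChild

    public LazyTreeNode(T value) => Value = value;

    public T Value { get; }

    public int ChildCount => _children?.Count ?? 0;

    public bool IsLeaf => ChildCount == 0;

    // Leaves return the shared empty array, so reading Children allocates nothing.
    public IReadOnlyList<LazyTreeNode<T>> Children =>
        (IReadOnlyList<LazyTreeNode<T>>?)_children ?? Array.Empty<LazyTreeNode<T>>();

    public LazyTreeNode<T> AddChild(T value)
    {
        var child = new LazyTreeNode<T>(value);
        (_children ??= new List<LazyTreeNode<T>>()).Add(child);   // create the list only now
        return child;
    }
}
```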
Unless you need to look up the children as a map, I'd use a List<T> (initialized with an appropriate capacity) instead of a Dictionary<,> for the children. It sounds like you may have more requirements than you've explained though, which makes it hard to say.
I'm surprised you're failing after only a few thousand nodes though - you should be able to create a pretty large number of objects before having problems.
I'd also suggest that if you think you'll end up using a lot of memory, make sure you're on a 64-bit machine and make sure your application itself is set to be 64-bit. (That may just be a thin wrapper over a class library, which is fine so long as the class library is set to be 64-bit or AnyCPU.)
What is the main purpose of the extension method Single()?
I know it will throw an exception if more than one element in the sequence matches the predicate, but I still don't understand in which context it could be useful.
Edit:
I do understand what Single does, so you don't need to explain in your answer what this method does.
It's useful for declaratively stating
I want the single element in the list and if more than one item matches then something is very wrong
There are many times when programs need to reduce a set of elements to the one that is interesting based on a particular predicate. If more than one matches, that indicates an error in the program. Without the Single method, a program would need to traverse parts of a potentially expensive list more than once.
Compare
Item i = someCollection.Single(thePredicate);
To
Contract.Requires(someCollection.Where(thePredicate).Count() == 1);
Item i = someCollection.First(thePredicate);
The latter requires two statements and iterates a potentially expensive list twice. Not good.
Note: yes, First is potentially faster because it only has to iterate the enumeration up to the first matching element; the rest of the elements are of no consequence. Single, on the other hand, must keep going after the first match to make sure there is no second one, so whenever it succeeds it has scanned the entire enumeration. If multiple matches are of no consequence to your program and indicate no programming error, then yes, use First.
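A small sketch of the contrast, assuming a hypothetical Customer record whose Id is supposed to be unique: ById uses Single and fails loudly on a duplicate Id, while FirstById silently returns whichever duplicate comes first.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical record; Id is meant to be unique.
public sealed record Customer(int Id, string Name);

public static class CustomerLookup
{
    // Exactly one match expected: throws InvalidOperationException on zero
    // or on more than one match, so a duplicate Id fails fast.
    public static Customer ById(IEnumerable<Customer> customers, int id) =>
        customers.Single(c => c.Id == id);

    // Any number of matches tolerated: returns the first and ignores the rest
    // (still throws if there is no match at all).
    public static Customer FirstById(IEnumerable<Customer> customers, int id) =>
        customers.First(c => c.Id == id);
}
```

With two customers sharing Id 2, ById(list, 2) throws InvalidOperationException, whereas FirstById(list, 2) returns the earlier of the two and the data error goes unnoticed.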
Using Single allows you to document your expectations on the number of results, and to fail early, fail hard if they are wrong. Unless you enjoy long debugging sessions for their own sake, I'd say it's enormously useful for increasing the robustness of your code.
Most LINQ operators (Where, Select, OrderBy, ...) return a sequence, that is, an IEnumerable<T>. To get an actual element, you need a method such as First, Last or Single; you use the latter if you know for sure the sequence contains only one element. An example would be a 1:1 ID:Name mapping in a database.
A Single will return a single instance of the class/object and not a collection. Very handy when you get a single record by Id. I never expect more than one row.
I'm building a GUI component that has a tree-based data model (e.g. a folder structure in the file system). So the GUI component basically has a collection of trees, which are just Node objects that have a key, a reference to a piece of the GUI component (so you can assign values to the Node object and it in turn updates the GUI), and a collection of Node children.
One thing I'd like to do is to be able to set "styles" that apply to each level of nodes (e.g. all top-level nodes are bold, all level-2 nodes are italic, etc.). So I added this to the GUI component object. To add nodes, you call AddChild on a Node object. I would like to apply the style there, since when the node is added I know what level it is at.
The problem is that the style info is only in the containing object (the GUI object), so the Node doesn't know about it. I could add a "pointer" within each Node to the GUI object, but that seems somehow wrong. Or I could hide the Nodes and make the user add nodes only through the GUI object, e.g. gui.AddNode(Node new_node, Node parent), which seems inelegant.
Is there a nicer design for this that I'm missing, or are the couple of ways I mentioned not really that bad?
Adding a ParentNode property to each node is "not really that bad". In fact, it's rather common. Apparently you didn't add that property because you didn't need it originally. Now you need it, so you have good reason to add it.
Alternates include:
Writing a function to find the parent of a child, which is processor intensive.
Adding a separate class of some sort which will cache parent-child relationships, which is a total waste of effort and memory.
Essentially, adding that one pointer into an existing class is a choice to use memory to cache the parent value instead of using processor time to find it. That appears to be a good choice in this situation.
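A minimal sketch of the ParentNode approach (ParentedNode is a hypothetical name, and a node is assumed to be attached to only one parent): AddChild sets the back-reference once, after which a node can compute its level by following Parent links instead of searching the tree.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical tree node that keeps a back-reference to its parent.
public sealed class ParentedNode
{
    private readonly List<ParentedNode> _children = new();

    public ParentedNode(string key) => Key = key;

    public string Key { get; }

    // Null for a top-level node. Costs one reference per node.
    public ParentedNode? Parent { get; private set; }

    public IReadOnlyList<ParentedNode> Children => _children;

    // Assumes a node is attached to at most one parent.
    public ParentedNode AddChild(ParentedNode child)
    {
        child.Parent = this;
        _children.Add(child);
        return child;
    }

    // 0 for top-level nodes; found by walking Parent links, no tree search.
    public int Level => Parent is null ? 0 : Parent.Level + 1;
}
```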
It seems to me that the only thing you need is a Level property on the nodes, and use that when rendering a Node through the GUI object.
But it matters whether your tree elements are presentation-agnostic, like XmlNode, or GUI-oriented, like System.Windows.Forms.TreeNode. The latter has a TreeView property, and there is nothing wrong with that.
I see no reason why you should not have a reference to the GUI object in the node. A node cannot exist outside the GUI object, and it is useful to be able to easily find the GUI object a node is contained in.
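A sketch of that design, with hypothetical TreeGui and StyledNode classes: each node keeps a reference to its owning GUI object and its level, so AddChild can look up the per-level style at the moment the node is created. Making the node constructor internal means nodes can only be created through the GUI object or an existing node, which guarantees the owner reference is always set. Style is a plain string standing in for updating the real GUI element.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the GUI component: it owns the per-level styles.
public sealed class TreeGui
{
    private readonly List<string> _levelStyles;
    private readonly List<StyledNode> _roots = new();

    // Index 0 is the style for top-level nodes, index 1 for their children, etc.
    public TreeGui(params string[] levelStyles) => _levelStyles = new List<string>(levelStyles);

    public IReadOnlyList<StyledNode> Roots => _roots;

    // Levels without a configured style fall back to "normal".
    public string StyleFor(int level) =>
        level < _levelStyles.Count ? _levelStyles[level] : "normal";

    public StyledNode AddRoot(string key)
    {
        var node = new StyledNode(this, key, level: 0);
        _roots.Add(node);
        return node;
    }
}

public sealed class StyledNode
{
    private readonly List<StyledNode> _children = new();

    internal StyledNode(TreeGui owner, string key, int level)
    {
        Owner = owner;
        Key = key;
        Level = level;
        Style = owner.StyleFor(level);   // applied once, when the node is created
    }

    public TreeGui Owner { get; }   // back-reference to the containing GUI object
    public string Key { get; }
    public int Level { get; }
    public string Style { get; }
    public IReadOnlyList<StyledNode> Children => _children;

    public StyledNode AddChild(string key)
    {
        var child = new StyledNode(Owner, key, Level + 1);
        _children.Add(child);
        return child;
    }
}
```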
You may not want to tie the formatting to the level the node is at if your leaf nodes may be at different levels.