Using F# quotations to detect code changes - c#

I need to cache results from heavy calculations done by several different classes inheriting from the same base class. I was doing run-time caching by subclass name. Now I need to store the results on disk/DB to avoid long recalculations when I restart my app, but I need to invalidate cache if I change the code inside Calculate().
type BaseCalculator() =
let output = ConcurrentDictionary<string, Series<DateTime, float>>() // has public getter, the output is cached by the caller of Calculate() method
abstract Calculate: unit->unit
type FirstCalculator() =
inherit BaseCalculator()
override this.Calculate() = ... do heavy work here ...
From this question and answers I have learned that I could use [<ReflectedDefinition>] on my calculate method. But I have never worked with quotations myself before.
The questions are:
Could I use a hash code of quotations to uniquely identify the body of the method, or there are some guids or timestamps inside quotations?
Could [<ReflectedDefinition>] be applied to the abstract method and will it work if I override the method in C#, but call the method from F# runner? (I use reflection to load all implementations of the base class in all dlls in a folder)
Is there other simple and reliable method to detect code changes in an assembly automatically, without quotations? Using last modified time of an assembly file (dll) could work, but any change in an assembly will invalidate all calculators in the assembly. This could work if I separate stable and WIP calculators into separate assemblies, but more granularity is preferred.

Could I use a hash code of quotations to uniquely identify the body of the method, or there are some guids or timestamps inside quotations?
I think this would work. Have a look at FsPickler (which is used by MBrace to serialize quotations). I think it can give you a hash of a quotation too. But keep in mind that other parts of your code might change (e.g. another method in another type).
Could ReflectedDefinition be applied to the abstract method and will it work if I override the method in C#, but call the method from F# runner?
No, the attribute only works on F# methods compiled using the F# compiler.
Is there other simple and reliable method to detect code changes in an assembly automatically, without quotations?
I'm not sure. You could use GetMethodBody method and .NET reflection to get the IL, but this only gives you the immediate body - not including e.g. lambda functions, so changes that happen elsewhere will not be easy to detect.
A completely different approach that might work better would be to keep the calculations in FSX files in plain text and compile them on the fly using F# Compiler Service. Then you could just hash the source of the individual FSX files (on a per-computation basis).

I have found that Mono.Cecil is surprisingly easy to use, and with it I could hash all member bodies of a type, including base type.
This SO question was very helpful as a starting point.

Related

How to translate or convert CompilerGenerated code?

If you try to use decompilers like: jetbrains dotpeek, redgate reflector, telerik justdecompile, whatever.. Sometimes if you need a code to copy or just to understand, it is not possible because are shown somethings like it:
[CompilerGenerated]
private sealed class Class15
{
// Fields
public Class11.Class12 CS$<>8__locals25;
public string endName;
// Methods
public Class15();
public bool <Show>b__11(object intelliListItem_0);
}
I'm not taking about obfuscation, this is happens at any time, I didsome tests (my own code), and occurs using lambdas and iterators. I'm not sure, could anyone give more information about when and why..?
So, by standard Visual Studio not compile $ and <> keywords in c# (like the code above)...
There is a way to translate or convert this decompiled code automatically?
Lambdas are a form of closure which is a posh way of saying it's a unit of code you can pass around like it was an object (but with access to its original context). When the compiler finds a lambda it generates a new type (Type being a class or struct) which encapsulates the code and any fields accessed by the lambda in its original context.
The problem here is, how do you generate code which will never conflict with user written code?
The compiler's answer is to generate code which is illegal in the language you are using, but legal in IL. IL is "Intermediate Language" it's the native language used by the Common Language Runtime. Any language which runs on the CLR (C#, vb.net, F#) compiles into IL. This is how you get to use VB.Net assemblies in C# code and so on.
So this is why the decompilers generate the hideous code you see. Iterators follow the exact same model as do a bunch of other language features that require generated types.
There is an interesting side effect. The Lambda may capture a variable in its original context:
public void TestCapture()
{
StringBuilder b = new StringBuilder();
Action l = () => b.Append("Kitties!");
}
So by capture I mean the variable b here is included in the package that defines the closure.
The compiler tries to be efficient and create as few types as possible, so you can end up with one generated class that supports all the lambdas found in a specific class, including fields for all the captured variables. In this way, if you're not careful, you can accidentally capture something you expect to be released, causing really tricky to trace memory leaks.
Is there an option to change the target framework?... I know with some decompilers they default to the lowest level framework (C# 1.0)

Method analysis using Reflection and CodeDom

The context of this question is too elaborate to describe here and will likely adversely affect responses so I am not including it. I want to assert certain things about a method in a unit test. Some of these things are possible using reflection such as format of the try/finally block, class fields and method local variables, etc. I already know the type and method signature.
protected override void OnTest ()
{
bool result = false;
SomeCOMObject com = null; // System.__ComObject
try
{
}
finally
{
System.Runtime.InteropServices.Marshal.ReleaseComObject(com);
}
return (result);
}
What I have not been able to achieve are things like:
Whether the method contains only a single return (result); statement and whether that statement is the last one in the function.
Whether all variables of type System.__ComObject have been manually de-referenced using System.Runtime.InteropServices.Marshal.ReleaseComObject(object) in the finally block.
Since some of these things are not possible using reflection, and source code text analysis is far from ideal, I turned to CodeDom but have not been able to get a grip on it. I have been told that creating expression trees from source code is not possible. Nor is it possible to create expression trees from the runtime type. If that is correct, how can I leverage CodeDom to achieve things in the list above?
I have used CodeDom in the past for code generation and compiling simple code classes to assemblies. But I have no idea how it could be used to analyze the internals of a method. Please advise.
In general, reflection built into programming languages provides no access to the content of functions. So you pretty much can't do this with reflection.
You might be able to do it if you have access to the byte-code equivalent, but byte code can't really answer questions about the syntax of the method, e.g., "how many return statements exists returning the same expression".
If you want to reason about code, your need to reason about the source code. This means you need access to a parser, and often other useful facts ("what the declaration of X?", "Is the type of X and Y compatible?", "Does data flow from X to Y?"), etc.
Roslyn provides some of this information. There are also commercial solutions (I have one).

When to use run-time type information?

If I have various subclasses of something, and an algorithm which operates on instances of those subclasses, and if the behaviour of the algorithm varies slightly depending on what particular subclass an instance is, then the most usual object-oriented way to do this is using virtual methods.
For example if the subclasses are DOM nodes, and if the algorithm is to insert a child node, that algorithm differs depending on whether the parent node is a DOM element (which can have children) or DOM text (which can't): and so the insertChildren method may be virtual (or abstract) in the DomNode base class, and implemented differently in each of the DomElement and DomText subclasses.
Another possibility is give the instances a common property, whose value can be read: for example the algorithm might read the nodeType property of the DomNode base class; or for another example, you might have different types (subclasses) of network packet, which share a common packet header, and you can read the packet header to see what type of packet it is.
I haven't used run-time-type information much, including:
The is and as keywords in C#
Downcasting
The Object.GetType method in dot net
The typeid operator in C++
When I'm adding a new algorithm which depends on the type of subclass, I tend instead to add a new virtual method to the class hierarchy.
My question is, when is it appropriate to use run-time-type information, instead of virtual functions?
When there's no other way around. Virtual methods are always preferred but sometimes they just can't be used. There's couple of reasons why this could happen but most common one is that you don't have source code of classes you want to work with or you can't change them. This often happens when you work with legacy system or with closed source commercial library.
In .NET it might also happens that you have to load new assemblies on the fly, like plugins and you generally have no base classes but have to use something like duck typing.
In C++, among some other obscure cases (which mostly deal with inferior design choices), RTTI is a way to implement so-called multi methods.
This constructions ("is" and "as") are very familiar for Delphi developers since event handlers usually downcast objects to a common ancestor. For example event OnClick passes the only argurment Sender: TObject regardless of the type of the object, whether it is TButton, TListBox or any other. If you want to know something more about this object you have to access it through "as", but in order to avoid an exception, you can check it with "is" before. This downcasting allows design-type binding of objects and methods that could not be possible with strict class type checking. Imagine you want to do the same thing if the user clicks Button or ListBox, but if they provide us with different prototypes of functions, it could not be possible to bind them to the same procedure.
In more general case, an object can call a function that notifies that the object for example has changed. But in advance it leaves the destination the possibility to know him "personally" (through as and is), but not necessarily. It does this by passing self as a most common ancestor of all objects (TObject in Delphi case)
dynamic_cast<>, if I remember correctly, is depending on RTTI. Some obscure outer interfaces might also rely on RTTI when an object is passed through a void pointer (for whatever reason that might happen).
That being said, I haven't seen typeof() in the wild in 10 years of pro C++ maintenance work. (Luckily.)
You can refer to More Effective C# for a case where run-time type checking is OK.
Item 3. Specialize Generic Algorithms
Using Runtime Type Checking
You can easily reuse generics by
simply specifying new type parameters.
A new instantiation with new type
parameters means a new type having
similar functionality.
All this is great, because you write
less code. However, sometimes being
more generic means not taking
advantage of a more specific, but
clearly superior, algorithm. The C#
language rules take this into account.
All it takes is for you to recognize
that your algorithm can be more
efficient when the type parameters
have greater capabilities, and then to
write that specific code. Furthermore,
creating a second generic type that
specifies different constraints
doesn't always work. Generic
instantiations are based on the
compile-time type of an object, and
not the runtime type. If you fail to
take that into account, you can miss
possible efficiencies.
For example, suppose you write a class that provides a reverse-order enumeration on a sequence of items represented through IEnumerable<T>. In order to enumerate it backwards you may iterate it and copy items into an intermediate collection with indexer access like List<T> and than enumerate that collection using indexer access backwards. But if your original IEnumerable is IList why not take advantage of it and provide more performant way (without copying to intermediate collection) to iterate items backwards. So basically it is a special we can take advantage of but still providing the same behavior (iterating sequence backwards).
But in general you should carefully consider run-time type checking and ensure that it doesn't violate Liskov Substituion Principle.

What is 'unverifiable code' and why is it bad?

I am designing a helper method that does lazy loading of certain objects for me, calling it looks like this:
public override EDC2_ORM.Customer Customer {
get { return LazyLoader.Get<EDC2_ORM.Customer>(
CustomerId, _customerDao, ()=>base.Customer, (x)=>Customer = x); }
set { base.Customer = value; }
}
when I compile this code I get the following warning:
Warning 5 Access to member
'EDC2_ORM.Billing.Contract.Site'
through a 'base' keyword from an
anonymous method, lambda expression,
query expression, or iterator results
in unverifiable code. Consider moving
the access into a helper method on the
containing type.
What exactly is the complaint here and why is what I'm doing bad?
"base.Foo" for a virtual method will make a non-virtual call on the parent definition of the method "Foo". Starting with CLR 2.0, the CLR decided that a non-virtual call on a virtual method can be a potential security hole and restricted the scenarios in which in can be used. They limited it to making non-virtual calls to virtual methods within the same class hierarchy.
Lambda expressions put a kink in the process. Lambda expressions often generate a closure under the hood which is a completely separate class. So the code "base.Foo" will eventually become an expression in an entirely new class. This creates a verification exception with the CLR. Hence C# issues a warning.
Side Note: The equivalent code will work in VB. In VB for non-virtual calls to a virtual method, a method stub will be generated in the original class. The non-virtual call will be performed in this method. The "base.Foo" will be redirected into "StubBaseFoo" (generated name is different).
I suspect the problem is that you're basically saying, "I don't want to use the most derived implementation of Customer - I want to use this particular one" - which you wouldn't be able to do normally. You're allowed to do it within a derived class, and for good reasons, but from other types you'd be violating encapsulation.
Now, when you use an anonymous method, lambda expression, query expression (which basically uses lambda expressions) or iterator block, sometimes the compiler has to create a new class for you behind the scenes. Sometimes it can get away with creating a new method in the same type for lambda expressions, but it depends on the context. Basically if any local variables are captured in the lambda expression, that needs a new class (or indeed multiple classes, depending on scope - it can get nasty). If the lambda expression only captures the this reference, a new instance method can be created for the lambda expression logic. If nothing is captured, a static method is fine.
So, although the C# compiler knows that really you're not violating encapsulation, the CLR doesn't - so it treats the code with some suspicion. If you're running under full trust, that's probably not an issue, but under other trust levels (I don't know the details offhand) your code won't be allowed to run.
Does that help?
copy/pasting from here:
Codesta: C#/CLR has 2 kinds of code, safe and unsafe. What is it trying to provide and how did this affect the virtual machine?
Peter Hallam: For C# the terms are safe and unsafe. The CLR uses the terms verifiable and unverifiable.
When running verifiable code the CLR can enforce security policies; the CLR can prevent verifiable code from doing things that it doesn't have permission to do. When running potentially malicious code, code that was downloaded from the internet for example, the CLR will only run verifiable code, and will ensure that the untrusted code doesn't access anything that it doesn't have permission to access.
The use of standard C style pointers creates unverifiable code. The CLR supports C style pointers natively. Once you've got a C style pointer you can read or write to any byte of memory in the process, so the runtime cannot enforce security policy. Actually it could but the performance penalty would make it impractical.
Now, that does not fully answer your question (i.e. WHY is this now unverifiable code), but at least it explains that "unverifiable" is the CLR-term for "unsafe". I assume that anonymous methods and base classes result in some funky pointer-magic internally.
By the Way: I think that the code snippet does not match the Warning message. The code is talking about a Customer, the Warning is about the Billing. Is it possible to post the actuon code the warning is generated for? Maybe you have something else in that code that would better explain why you get the warning.

Is it considered bad design to do lengthy operations in a constructor?

I am implementing a class to compare directory trees (in C#). At first I implemented the actual comparison in the class's constructor. Like this:
DirectoryComparer c = new DirectoryComparer("C:\\Dir1", "C:\\Dir2");
But it doesn't feel "right" to do a possible lengthy operation in the constructor. An alternative way is to make the constructor private and add a static method like this:
DirectoryComparer c = DirectoryComparer.Compare("C:\\Dir1", "C:\\Dir2");
What do you think? Do you expect a constructor to be "quick"? Is the second example better or is it just complicating the usage of the class?
BTW:
I wont mark any answer as accepted because I don't think there is a correct answer, just preference and taste.
Edit:
Just to clarify my example a little. I'm not only insterested if the directories differs, I'm also interested in how they differ (which files). So a simple int return value wont be enough. The answer by cdragon76.myopenid.com actually is pretty close to what I want (+1 to you).
I would think a combination of the two is the "right" choice, as I would expect the Compare method to return the comparison result, not the comparer itself.
DirectoryComparer c = new DirectoryComparer();
int equality = c.Compare("C:\\Dir1", "C:\\Dir2");
...and as Dana mentions, there is an IComparer interface in .Net that reflects this pattern.
The IComparer.Compare method returns an int since the use of IComparer classes is primarily with sorting. The general pattern though fits the problem of the question in that:
Constructor initializes an instance with (optionally) "configuring" parameters
Compare method takes two "data" parameters, compares them and returns a "result"
Now, the result can be an int, a bool, a collection of diffs. Whatever fits the need.
I prefer the second one.
I expect the constructor to instanciate the class.
The method compare does what it is designed to do.
I think an interface might be what you're after. I would create a class to represent a directory, and have that implement the DirectoryComparer interface. That interface would include the compare method. If C# already has a Comparable interface, you could also just implement that.
In code, your call would be:
D1 = new Directory("C:\");
..
D1.compare(D2);
You should never do anything that might fail in a constructor. You don't want to ever create invalid objects. While you could implement a "zombie" state where the object doesn't do much, it's much better to perform any complex logic in seperate methods.
I agree with the general sentiment of not doing lengthy operations inside constructors.
Additionally, while on the subject of design, I'd consider changing your 2nd example so that the DirectoryComparer.Compare method returns something other than a DirectoryComparer object. (Perhaps a new class called DirectoryDifferences or DirectoryComparisonResult.) An object of type DirectoryComparer sounds like an object you would use to compare directories as opposed to an object that represents the differences between a pair of directories.
Then if you want to define different ways of comparing directories (such as ignoring timestamps, readonly attributes, empty directories, etc.) you could make those parameters you pass to the DirectoryComparer class constructor. Or, if you always want DirectoryComparer to have the exact same rules for comparing directories, you could simply make DirectoryComparer a static class.
For example:
DirectoryComparer comparer = new DirectoryComparer(
DirectoryComparerOptions.IgnoreDirectoryAttributes
);
DirectoryComparerResult result = comparer.Compare("C:\\Dir1", "C:\\Dir2");
Yes, typically a constructor is something quick, it is designed to prepare the object for use, not to actually do operations. I like your second option as it keeps it a one line operation.
You could also make it a bit easier by allowing the constructor to pass the two paths, then have a Compare() method that actually does the processing.
I like the second example because it explains what is exactly happening when you instantiate the object. Plus, I always use the constructor to initialize all of the global settings fro the class.
I think for a general purpose comparer you may on construction only want to specify the files you are comparing and then compare later- this way you can also implement extended logic:
Compare again- what if the directories changed?
Change the files you are comparing by updating the members.
Also, you may want to consider in your implementation receiving messages from your OS when files have been changed in the target directories- and optionally recomparing again.
The point is- you are imposing limits by assuming that this class will only be used to compare once for a single instance of those files.
Therefore, I prefer:
DirectoryComparer = new DirectoryComparer(&Dir1,&Dir2);
DirectoryComparer->Compare();
Or
DirectoryComparer = new DirectoryComparer();
DirectoryComparer->Compare(&Dir1,&Dir2);
I think it's not only okay for a constructor to take as much time as needed to construct a valid object, but the constructor is required to do so. Deferring object creation is very bad as you end up with potentially invalid objects. So, you will have to check an object everytime before you touch it (this is how it is done in the MFC, you have bool IsValid() methods everywhere).
I only see a slight difference in the two ways of creating the object. One can see the new operator as a static function of the class anyway. So, this all boils down to syntactic sugar.
What does the DirectoryComparer class do? What is it's responsibility? From my point of view (which is a C++ programmer's view) it looks like you'd be better off with just using a free function, but I don't think that you can have free functions in C#, can you? I guess you will collect the files which are different in the DirectoryComparer object. If so, you could better create something like an array of files or an equivalent class that's named accordingly.
If you are working with C#, you could use extension methods to create a method for comparing 2 directories that you would attach to the build in DirectoryClass, so it would look some thing like:
Directory dir1 = new Directory("C:\.....");
Directory dir2 = new Directory("D:\.....");
DirectoryCompare c = dir1.CompareTo(dir2);
This would be much clearer implementation.
More on extension methods here.
If an operation may take an unknown amount of time, it is an operation you might want to export into a different thread (so your main thread won't block and can do other things, like showing a spinning progress indicator for example). Other apps may not want to do this, they may want everything within a single thread (e.g. those that have no UI). Moving object creation to a separate thread is a bit awkward IMHO. I'd prefer to create the object (quickly) in my current thread and then just let a method of it run within another thread and once the method finished running, the other thread can die and I can grab the result of this method in my current thread by using another method of the object before dumping the object, since I'm happy as soon as I know the result (or keeping a copy if the result involves more details I may have to consume one at a time).
If the arguments are just going to be processed once then I don't think they belong as either constructor arguments or instance state.
If however the comparison service is going to support some kind of suspendable algorithm or you want to notify listeners when the equality state of two directories changes based on filesystem events or something like that. Then ther directories is part of the instance state.
In neither case is the constructor doing any work other than initializing an instance. In case two above the algorithm is either driven by a client, just like an Iterator for example, or it's driven by the event listening thread.
I generally try to do things like this:
Don't hold state in the instance if it can be passed as arguments to service methods.
Try to design the object with immutable state.
Defining attributes, like those used in equals and hashcode should allways be immutable.
Conceptualy a constructor is a function mapping an object representation to the object it represents.
By the definition above Integer.valueOf(1) is actually more of a constructor than new Integer(1) because Integer.valueOf(1) == Integer.valueOf(1).
,
In either case this concept also means that all the cosntructor arguments, and only the constructor argument, should define the equals behavior of an object.
I would definitely do the second.
Long actions in a constructor are fine if they are actually building the object so it is usable.
Now one thing that I see people do in constructors is call virtual methods. This is BAD since once someone uses you as a base class and overrides one of those functions you will call the base class's version not the derived class once you get into your constructor.
I don't think that talking about abstract terms like "lengthy" have anything to do with the decision if you put something in an constructor or not.
A constructor is something that should be used to initialize an object, a method should be used to "do something", i.e. have a function.

Categories

Resources