Can I tell the compiler to prefer baseTypes for extensions (with same signature) - c#

My issue: with .NET 6, the .StartsWith / .IndexOf / etc. methods on string (in System.Runtime) perform a lot worse than in previous versions of the framework.
The 'fast' implementations now appear to be in the System.MemoryExtensions class, which works on ReadOnlySpan<T>.
Edit: I just found my real issue.
When no StringComparison is specified, the default is CurrentCulture, and CurrentCulture performs ~7 times worse on .NET 6 than on .NET Framework (de-DE). Initially I had compared the runtime's .StartsWith (with CurrentCulture) against the MemoryExtensions version with Ordinal.
Code used (with BenchmarkDotNet):
[SimpleJob(RuntimeMoniker.Net48)]
[SimpleJob(RuntimeMoniker.Net60)]
[MemoryDiagnoser]
public class BenchForFlydog57
{
string somestring = "SomeString";
[Benchmark]
public bool SystemRuntimeStartsWith() => somestring.StartsWith("ABEC");
[Benchmark]
public bool MemoryExtensionStartsWith() => MemoryExtensions.StartsWith(somestring.AsSpan(), "ABEC".AsSpan(), StringComparison.CurrentCulture);
}
Results:
Using StringComparison.Ordinal leads to the following result:
[SimpleJob(RuntimeMoniker.Net48)]
[SimpleJob(RuntimeMoniker.Net60)]
[MemoryDiagnoser]
public class BenchForFlydog57
{
string somestring = "SomeString";
[Benchmark]
public bool SystemRuntimeStartsWith() => somestring.StartsWith("ABEC", StringComparison.Ordinal);
[Benchmark]
public bool MemoryExtensionStartsWith() => MemoryExtensions.StartsWith(somestring.AsSpan(), "ABEC".AsSpan(), StringComparison.Ordinal);
}
This means my issue is solved, and the code even performs better, if we simply pass StringComparison.Ordinal.
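To avoid falling back to the culture-sensitive default by accident, one option is a pair of thin wrappers that always pass StringComparison.Ordinal. This is only a sketch: OrdinalStringExtensions, StartsWithOrdinal and IndexOfOrdinal are hypothetical names, not framework APIs.

```csharp
using System;

// Hypothetical helpers (not part of .NET): they forward to the built-in
// overloads with StringComparison.Ordinal, so the culture-sensitive
// default overloads are never picked by accident.
public static class OrdinalStringExtensions
{
    public static bool StartsWithOrdinal(this string text, string prefix)
        => text.StartsWith(prefix, StringComparison.Ordinal);

    public static int IndexOfOrdinal(this string text, string value)
        => text.IndexOf(value, StringComparison.Ordinal);
}

public static class Program
{
    public static void Main()
    {
        string somestring = "SomeString";
        Console.WriteLine(somestring.StartsWithOrdinal("ABEC")); // False
        Console.WriteLine(somestring.StartsWithOrdinal("Some")); // True
        Console.WriteLine(somestring.IndexOfOrdinal("String"));  // 4
    }
}
```

Ordinal comparison is only a drop-in replacement when culture-specific matching is not actually wanted, so it is worth checking each call site.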

Related

How to deal with optional arguments when wanting to enable nullable reference types?

I see the great advantage of turning on (non-)nullable reference types, but I have quite a few methods with optional parameters and I am wondering what the right way to correct the warnings yielded by the compiler is.
Making the parameter nullable by annotating the type with ? takes all of the goodness away. Another idea is to turn all methods with optional parameters into separate methods, which is quite a lot of work and yields high complexity (exponential explosion of parameter combinations).
I was thinking about something like this, but I really question whether it is a good approach (performance-wise etc.) beyond the first glance:
[Fact]
public void Test()
{
Assert.Equal("nothing", Helper().ValueOrFallbackTo("nothing"));
Assert.Equal("foo", Helper("foo").ValueOrFallbackTo("whatever"));
}
public static Optional<string> Helper(Optional<string> x = default)
{
return x;
}
public readonly ref struct Optional<T>
{
private readonly bool initialized;
private readonly T value;
public Optional(T value)
{
initialized = true;
this.value = value;
}
public T ValueOrFallbackTo(T fallbackValue)
{
return initialized ? value : fallbackValue;
}
public static implicit operator Optional<T>(T value)
{
return new Optional<T>(value);
}
}
This looks like F#'s Option. It can be emulated in C# 8 up to a point with pattern matching expressions. This struct:
readonly struct Option<T>
{
public readonly T Value {get;}
public readonly bool IsSome {get;}
public readonly bool IsNone =>!IsSome;
public Option(T value)=>(Value,IsSome)=(value,true);
public void Deconstruct(out T value)=>(value)=(Value);
}
//Convenience methods, similar to F#'s Option module
static class Option
{
public static Option<T> Some<T>(T value)=>new Option<T>(value);
public static Option<T> None<T>()=>default;
...
}
Should allow code like this :
static string Test(Option<MyClass> opt = default)
{
return opt switch
{
Option<MyClass> { IsNone: true } => "None",
Option<MyClass> (var v) => $"Some {v.SomeText}",
};
}
The first option uses property pattern matching to check for None, while the second one uses positional pattern matching to actually extract the value through the deconstructor.
The nice thing is that the compiler recognizes this as an exhaustive match so we don't need to add a default clause.
Unfortunately, a Roslyn bug prevents this. The linked issue actually tries to create an Option class based on an abstract base class. This was fixed in VS 2019 16.4 Preview 1.
The fixed compiler allows us to omit the parameter or pass a None :
class MyClass
{
public string SomeText { get; set; } = "";
}
...
Console.WriteLine( Test() );
Console.WriteLine( Test(Option.None<MyClass>()) );
var c = new MyClass { SomeText = "Cheese" };
Console.WriteLine( Test(Option.Some(c)) );
This produces :
None
None
Some Cheese
VS 2019 16.4 should come out at the same time as .NET Core 3.1 in a few weeks.
Until then, an uglier solution could be to return IsSome in the deconstructor and use positional pattern matching in both cases:
public readonly struct Option<T>
{
public readonly T Value {get;}
public readonly bool IsSome {get;}
public readonly bool IsNone =>!IsSome;
public Option(T value)=>(Value,IsSome)=(value,true);
public void Deconstruct(out T value,out bool isSome)=>(value,isSome)=(Value,IsSome);
public void Deconstruct(out T value)=>(value)=(Value);
}
And
return opt switch { Option<MyClass> (_ ,false) =>"None",
Option<MyClass> (var v,true) => $"Some {v.SomeText}" , };
Borrowing from F# Options
No matter which technique we use, we can add extension methods to the Option static class that mimic F#'s Option module. For example, Bind, perhaps the most useful method, applies a function to an Option if it has a value and returns an Option, or returns None if there's no value:
public static Option<U> Bind<T,U>(this Option<T> inp,Func<T,Option<U>> func)
{
return inp switch { Option<T> (_ ,false) =>Option.None<U>(),
Option<T> (var v,true) => func(v) ,
};
}
For example, this applies the Format method to an Option<MyClass> to create an Option<string>:
Option<string> Format(MyClass c)
{
return Option.Some($"Some {c.SomeText}");
}
var c=new MyClass { SomeText = "Cheese"};
var opt=Option.Some(c);
var message=opt.Bind(Format);
This makes it easy to create other helper functions, or to chain functions that produce options.
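As a rough illustration of such chaining, here is a self-contained sketch. It is simplified compared to the code above: Bind uses a plain IsSome check instead of positional patterns, and ParseInt and Half are hypothetical steps invented for the example.

```csharp
using System;

public readonly struct Option<T>
{
    public T Value { get; }
    public bool IsSome { get; }

    public Option(T value)
    {
        Value = value;
        IsSome = true;
    }
}

public static class Option
{
    public static Option<T> Some<T>(T value) => new Option<T>(value);
    public static Option<T> None<T>() => default;

    // Simplified Bind: run func only when a value is present, otherwise stay None.
    public static Option<U> Bind<T, U>(this Option<T> input, Func<T, Option<U>> func)
        => input.IsSome ? func(input.Value) : None<U>();
}

public static class Program
{
    // Hypothetical steps: each one may fail and return None.
    public static Option<int> ParseInt(string s)
        => int.TryParse(s, out var n) ? Option.Some(n) : Option.None<int>();

    public static Option<int> Half(int n)
        => n % 2 == 0 ? Option.Some(n / 2) : Option.None<int>();

    public static void Main()
    {
        var ok = Option.Some("84").Bind(s => ParseInt(s)).Bind(n => Half(n));
        var odd = Option.Some("85").Bind(s => ParseInt(s)).Bind(n => Half(n));
        var bad = Option.Some("abc").Bind(s => ParseInt(s)).Bind(n => Half(n));

        Console.WriteLine(ok.IsSome ? ok.Value.ToString() : "None"); // 42
        Console.WriteLine(odd.IsSome);                               // False
        Console.WriteLine(bad.IsSome);                               // False
    }
}
```

Once any step returns None, the remaining steps are skipped, which is what makes the chain safe to write without intermediate checks.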

Moq Verify without It - what kind of compare?

When using Moq with Verify to assert that a certain method has been called with specified parameters, different kinds of syntax are possible; one is the "It" syntax, like this:
mock.Verify(c => c.SomeMethod(It.Is<string>(s => s == ExpectedString)));
What happens here is that the parameter SomeMethod is called with is checked for equality with ExpectedString. Another possible syntax is without "It":
mock.Verify(c => c.SomeMethod(ExpectedString));
which should give the same result. From what I have been able to find on different forums, the difference is that the latter is an identity check (reference equals), except for value types.
However, my question concerns the case where the parameter is of a collection type. In .NET, Equals on Collection<T> is just inherited from object, so the following verify:
mock.Verify(c => c.SomeMethod(new Collection<string> { ExpectedString }));
should not be possible to pass, given that the collection is instantiated in the verify, and thus cannot possibly be the same instance that is instantiated in the production code. Nevertheless, it works, which indicates that Moq does a CollectionAssert or something like that, contrary to what information I could find.
Here is a code example that illustrates the behaviour. The test passes, but I think it should fail if Moq used a reference-equality comparison.
[TestMethod]
public void Test()
{
var mock = new Mock<IPrint>();
const int ExpectedParam = 1;
var test = new TestPrinter { Printer = mock.Object, Expected = ExpectedParam };
test.Do();
mock.Verify(c => c.Print(new Collection<int> { ExpectedParam }));
}
public interface IPrint
{
void Print(Collection<int> numbers);
}
public class TestPrinter
{
public IPrint Printer { get; set; }
public int Expected { get; set; }
public void Do()
{
Printer.Print(new Collection<int> { Expected });
}
}
Does anyone know if this is expected behaviour of Moq (version 4.1)? Was the behaviour changed at some version level?
This is desired behaviour and was added to Moq in January 2009 (version 3.0.203.1).
If Moq finds an IEnumerable, it uses SequenceEqual to compare the actual argument and the argument used in the setup; otherwise it just uses Equals.
Here's the relevant bit of code:
internal class ConstantMatcher : IMatcher
{
...
public bool Matches(object value)
{
if (object.Equals(constantValue, value))
{
return true;
}
if (this.constantValue is IEnumerable && value is IEnumerable)
{
return this.MatchesEnumerable(value);
}
return false;
}
private bool MatchesEnumerable(object value)
{
var constValues = (IEnumerable)constantValue;
var values = (IEnumerable)value;
return constValues.Cast<object>().SequenceEqual(values.Cast<object>());
}
}
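The effect can be reproduced without Moq. The sketch below mirrors the two-step check from the excerpt (Equals first, then element-wise comparison); MatchesLikeConstantMatcher is a hypothetical stand-in written for this example, not Moq's actual API.

```csharp
using System;
using System.Collections;
using System.Collections.ObjectModel;
using System.Linq;

public static class Program
{
    // Hypothetical stand-in for the matcher logic quoted above:
    // Equals first, then an element-by-element comparison for enumerables.
    public static bool MatchesLikeConstantMatcher(object expected, object actual)
    {
        if (object.Equals(expected, actual))
        {
            return true;
        }
        if (expected is IEnumerable e && actual is IEnumerable a)
        {
            return e.Cast<object>().SequenceEqual(a.Cast<object>());
        }
        return false;
    }

    public static void Main()
    {
        var expected = new Collection<int> { 1 };
        var actual = new Collection<int> { 1 };

        // Collection<T> inherits Equals from object, so this is a reference check.
        Console.WriteLine(object.Equals(expected, actual));                                  // False
        Console.WriteLine(MatchesLikeConstantMatcher(expected, actual));                     // True
        Console.WriteLine(MatchesLikeConstantMatcher(expected, new Collection<int> { 2 }));  // False
    }
}
```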

java enums vs C# enums - missing features

In Java I could easily describe an enum with additional data.
I could describe it something like this:
public enum OperatorType
{
GreaterOrEqual (">=", "GreaterOrEqual"),
Greater (">" ,"Greater"),
Less ("<", "Less"),
LessOrEqual ("<=", "LessOrEqual"),
Equal ("==", "Equal"),
Between ("Between", "Between"),
Around ("Around","Around");
private final String symbol;
private final String name;
private OperatorType(final String symbol, final String name) {
this.symbol = symbol;
this.name = name;
}
}
And then add a static method that iterates over values(), adds all the data to a HashMap, and allows retrieving the full enum data from the map using one of its attributes as the key.
In brief, enum is a very developed type in Java.
Now, moving to C#, what are my options?
I want to hold an enum with its attributes, load it into a map, and retrieve by key when I need to. Is there anything to assist with this (like a singleton for each enum, which is not a good idea)?
Thanks.
I would just create a class with public static readonly instances of each type and ditch enums altogether. You can use them as dictionary keys or do whatever you like. If you still intend to map them to an underlying data type (int) then you can create implicit operators for that too.
public class OperatorType
{
private static readonly Dictionary<int, OperatorType> OperatorMapping = new Dictionary<int, OperatorType>();
public static readonly OperatorType GreaterOrEqual = new OperatorType(0, ">=", "GreaterOrEqual");
public static readonly OperatorType Greater = new OperatorType(1, ">", "Greater");
public readonly String symbol;
public readonly String name;
private readonly int underlyingValue;
private OperatorType(int underlyingValue, string symbol, string name) {
this.underlyingValue = underlyingValue;
OperatorMapping[underlyingValue] = this;
this.symbol = symbol;
this.name = name;
}
public static implicit operator int(OperatorType operatorType)
{
return operatorType.underlyingValue;
}
public static implicit operator OperatorType(int value)
{
return OperatorMapping[value];
}
}
Sample usage:
Dictionary<OperatorType, string> operators = new Dictionary<OperatorType, string>();
operators.Add(OperatorType.GreaterOrEqual, "Greater or equal");
Console.WriteLine(operators[OperatorType.GreaterOrEqual]); //"Greater or equal"
OperatorType operatorType = 1;
Console.WriteLine(operatorType.name); //"Greater"
If you don't care about an underlying value, don't include it. Also consider whether or not the Dictionary mapping should be threadsafe for your usage. You can also expose a static IEnumerable<OperatorType> (or other collection) to get all operators defined if you want.
EDIT: On second thought, explicit operators are possibly preferable instead of implicit, both to conform with typical .NET best practices and to better match typical enum conversions.
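For the "retrieve by one of its attributes" part of the question, the same class-based approach can keep a lookup keyed by symbol and expose all instances. This is a minimal sketch; BySymbol, All and TryFromSymbol are names chosen for illustration, and the underlying int value is left out.

```csharp
using System;
using System.Collections.Generic;

public sealed class OperatorType
{
    // Declared before the instances: static fields are initialized in textual
    // order, so the dictionary exists when the constructors below register themselves.
    private static readonly Dictionary<string, OperatorType> BySymbol =
        new Dictionary<string, OperatorType>();

    public static readonly OperatorType GreaterOrEqual = new OperatorType(">=", "GreaterOrEqual");
    public static readonly OperatorType Greater = new OperatorType(">", "Greater");
    public static readonly OperatorType Less = new OperatorType("<", "Less");

    public string Symbol { get; }
    public string Name { get; }

    private OperatorType(string symbol, string name)
    {
        Symbol = symbol;
        Name = name;
        BySymbol.Add(symbol, this);
    }

    // All defined operators (enumeration order is not guaranteed).
    public static IEnumerable<OperatorType> All => BySymbol.Values;

    // Lookup by symbol without throwing for unknown keys.
    public static bool TryFromSymbol(string symbol, out OperatorType result)
        => BySymbol.TryGetValue(symbol, out result);

    public override string ToString() => Name;
}

public static class Program
{
    public static void Main()
    {
        if (OperatorType.TryFromSymbol(">=", out var op))
        {
            Console.WriteLine(op.Name); // GreaterOrEqual
        }
        Console.WriteLine(OperatorType.TryFromSymbol("!=", out _)); // False
        foreach (var o in OperatorType.All)
        {
            Console.WriteLine(o.Symbol + " = " + o.Name);
        }
    }
}
```

The dictionary is only written during static initialization, so reads afterwards do not need extra locking in this sketch.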
The most convenient workaround might be to create an extension method for your enum type that returns the associated symbols.
Something like this:
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
tester t = tester.x;
t.testenums();
Console.ReadKey();
}
}
public static class ext
{
public static void testenums(this tester x)
{
Console.WriteLine(x.ToString());
}
}
public enum tester
{
x,
y
}
}
Of course you can write a more complex extension method, with a return value, etc. This is just an example of how to do it.
You can create an attribute:
public class EnumKeyAttribute : Attribute
{
public string Key { get; set; }
public string Description { get; set; }
public EnumKeyAttribute(string key, string description)
{
this.Key = key;
this.Description = description;
}
}
Then apply it to your enum
public enum OperatorType
{
[EnumKey(">=", "GreaterOrEqual")]
GreaterOrEqual,
[EnumKey(">", "Greater")]
Greater,
[EnumKey("<", "Less")]
Less,
[EnumKey("<=", "LessOrEqual")]
LessOrEqual,
[EnumKey("==", "Equal")]
Equal,
[EnumKey("Between", "Between")]
Between,
[EnumKey("Around", "Around")]
Around
}
To get the attribute data you can use reflection. Below is an example of getting the attribute for "Less"
MemberInfo memberInfo = typeof(OperatorType).GetMember(OperatorType.Less.ToString()).FirstOrDefault();
if(memberInfo != null)
{
EnumKeyAttribute attribute = (EnumKeyAttribute)memberInfo.GetCustomAttributes(typeof(EnumKeyAttribute), false).FirstOrDefault();
Console.WriteLine(attribute.Key);
Console.WriteLine(attribute.Description);
}
But because these enums are not created at runtime you can increase your efficiency by creating a static method that looks up the value in a dictionary. Do this as an extension method for ease of use
public static class KeyFinder
{
private static Dictionary<OperatorType, EnumKeyAttribute> lookupTable =
new Dictionary<OperatorType, EnumKeyAttribute>();
public static EnumKeyAttribute GetKey(this OperatorType type)
{
if (lookupTable.ContainsKey(type))
{
return lookupTable[type];
}
MemberInfo memberInfo = typeof(OperatorType).GetMember(type.ToString()).FirstOrDefault();
if (memberInfo != null)
{
EnumKeyAttribute attribute = (EnumKeyAttribute)memberInfo.GetCustomAttributes(typeof(EnumKeyAttribute), false).FirstOrDefault();
if (attribute != null)
{
lookupTable.Add(type, attribute);
return attribute;
}
}
// add a null value so next time it doesn't use reflection only to find nothing
lookupTable.Add(type, null);
return null;
}
}
So now to get the values you simply do the following:
OperatorType.Less.GetKey().Key
OperatorType.Less.GetKey().Description
Just be careful of null reference exceptions (since it will return null if it can't find an attribute). If you want to find by key you can simply create other extension methods that use the string value as the key.
C# doesn't really have the same feature. However there are several possibilities to get really close (and potentially more flexible as well).
Sticking to regular enums, you could use attributes to enrich with extra information. Of course, this requires reflection to work with that
public enum OperatorType
{
[DisplayName(">=")]
GreaterOrEqual,
// ...
}
There are several patterns to work with this, e.g. http://www.codeproject.com/Articles/28087/DisplayNameAttribute-for-Enumerations, google for more.
Another approach can be to enhance your enumeration types using regular classes:
public class OperatorType
{
public static OperatorType GreaterOrEqual = new OperatorType(">=", "GreaterOrEqual");
// ...
string symbol;
string name;
private OperatorType(string symbol, string name)
{
this.symbol = symbol;
this.name = name;
}
}
This article describes some other ways to work with enum-like types in C#
If you really need the functionality of Java-style enums in C#, I see three reasonable ways to implement it:
Use a C# enum and a static class of helper methods. You lose type safety, but this is an otherwise very workable solution.
Use a C# enum and a set of extension methods. Probably the most idiomatic C# solution, but you still have to deal with the loss of type safety (your extension methods should be able to cope with out-of-range values, even if only by throwing an exception).
Use the type-safe enum pattern that was common in Java before the language gained the enum keyword in Java 5. If you have non-trivial logic for each enum value, this would be my preference.

Increasing performance in custom sorting of a string array

I am trying to find an efficient way to sort an array of strings based on a numeric value within each string element of the array. I am currently using the Array.Sort(array, customComparer) static method (quick sort), with my custom comparer class (sorting in descending order) being:
class StringComparer : IComparer<string>
{
public int Compare(string a, string b)
{
string s1 = a;
string s2 = b;
Match matchA = Regex.Match(s1, @"\d+$");
Match matchB = Regex.Match(s2, @"\d+$");
long numberA = long.Parse(matchA.Value);
long numberB = long.Parse(matchB.Value);
if (numberB - numberA < 0)
{
return -1;
}
else
{
return 1;
}
}
}
This works very well, but sometimes it takes too much time to sort, with an array of 100 000 strings taking more than a minute on a 2.4Ghz processor. I wonder if there is a more efficient way to accomplish the same. For example, implementing a different sorting algorithm or taking another approach like using a dictionary and sorting on the value (the value being the numeric part of the string). Any suggestions? Thanks in advance!
You're parsing the value for each comparison. I would suggest you parse once, to get a string/long pair, sort that, and then extract the string part afterwards.
Note that your existing code has a bug: it will never return 0, for two strings comparing as equal.
Here's an alternative approach using LINQ (which isn't in-place sorting, but is simple.)
var sorted = unsorted.OrderBy(x => long.Parse(Regex.Match(x, @"\d+$").Value))
.ToList();
(OrderBy projects once to get the keys, then compares keys.)
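If in-place sorting of the array matters, the "parse once, then sort pairs" idea can also be sketched with the Array.Sort overload that takes a key array and an item array. SortByTrailingNumberDescending is a hypothetical helper name, and the sketch assumes every string ends in digits, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Program
{
    private static readonly Regex TrailingDigits = new Regex(@"\d+$");

    // Sorts 'items' in place, descending by the number at the end of each string.
    // Each string is parsed exactly once; a string without trailing digits
    // makes long.Parse throw, just like the original comparer.
    public static void SortByTrailingNumberDescending(string[] items)
    {
        long[] keys = Array.ConvertAll(items, s => long.Parse(TrailingDigits.Match(s).Value));
        Array.Sort(keys, items); // ascending by key; items are reordered alongside the keys
        Array.Reverse(items);    // flip to descending order
    }

    public static void Main()
    {
        var data = new[] { "item7", "item120", "item33" };
        SortByTrailingNumberDescending(data);
        Console.WriteLine(string.Join(", ", data)); // item120, item33, item7
    }
}
```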
You are currently running the regex O(n log n) times.
Consider looping once over all strings, extracting the numerical value and adding it to a SortedDictionary<long, string>.
This requires only O(n) executions of the regular expression. The rest of the sorting should be comparable. (Note that this only works if the numeric values are unique, since a SortedDictionary rejects duplicate keys.)
First, you're needlessly parsing the same string over and over (both matching with the regular expression and then parsing the matches). Instead, encapsulate what you have into a custom type so that you only have to parse once.
public class FooString {
private readonly string foo;
private readonly long bar;
public FooString(string foo) {
this.foo = foo;
Match match = Regex.Match(foo, @"\d+$");
this.bar = Int64.Parse(match.Value);
}
public string Foo { get { return this.foo; } }
public long Bar { get { return this.bar; } }
}
I'd even add a Contract.Requires to this class that says that foo must satisfy the regular expression.
Second, you have an IComparer<T> that dies on certain values of T (in your case, strings that don't match the regular expression and can't be parsed to a long). This is generally a bad idea.
So, make the comparer for FooString:
public class FooStringComparer : IComparer<FooString> {
public int Compare(FooString a, FooString b) {
Contract.Requires(a != null);
Contract.Requires(b != null);
return a.Bar.CompareTo(b.Bar);
}
}
Now your sorting will be much faster, because each string is parsed only once instead of on every comparison.
Create the Regex only once with the Compiled option. This will increase the speed.
class StringComparer : IComparer<string>
{
private static Regex _regex = new Regex(@"\d+$", RegexOptions.Compiled);
public int Compare(string a, string b)
{
long numberA = Int64.Parse(_regex.Match(a).Value);
long numberB = Int64.Parse(_regex.Match(b).Value);
return numberA.CompareTo(numberB);
}
}

Constructing custom expression trees while using operators in C#

This question is about constructing custom expression trees in .NET using the operators found in C# (or any other language). I provide the question along with some background information.
For my managed 2-phase 64-bit assembler I need support for expressions. For example, one might want to assemble:
mystring: DB 'hello, world'
TIMES 64-$+mystring DB ' '
The expression 64-$+mystring must not be a string but an actual valid expression with the benefits of syntax and type checking and IntelliSense in VS, something along the lines of:
64 - Reference.CurrentOffset + new Reference("mystring");
This expression is not evaluated when it is constructed. Instead, it is evaluated later in my assembler's context (when it determines the symbol offsets and such). The .NET framework (since .NET 3.5) provides support for expressions trees, and it seems to me that it is ideal for this kind of expressions which are evaluated later or somewhere else.
But I don't know how to ensure that I can use the C# syntax (using +, <<, %, etc..) for constructing the expression tree. I want to prevent things like:
var expression = AssemblerExpression.Subtract(64,
AssemblerExpression.Add(AssemblerExpression.CurrentOffset(),
AssemblerExpression.Reference("mystring")))
How would you go about this?
Note: I need an expression tree to be able to convert the expression into an acceptable custom string representation, and at the same time be able to evaluate it at a point in time other than at its definition.
An explanation of my example: 64-$+mystring. The $ is the current offset, so it is a specific number that is unknown in advance (but known at evaluation time). The mystring is a symbol which may or may not be known at evaluation time (for example when it has not yet been defined). Subtracting a constant C from a symbol S is the same as S + -C. Subtracting two symbols S0 and S1 (S1 - S0) gives the integer difference between the two symbols' values.
However, this question is not really about how to evaluate assembler expressions, but more about how to evaluate any expression that has custom classes in them (for things like the symbols and $ in the example) and how to still ensure that it can be pretty-printed using some visitor (thus keeping the tree). And since the .NET framework has its expression trees and visitors, it would be nice to use those, if possible.
I don't know exactly what you are aiming for, but the following is a rough sketch of an approach that I think would work.
Notes:
I demonstrate only indexed reference expressions (thus ignoring indirect addressing via registers for now; you could add a RegisterInderectReference analogous to the SymbolicReference class). The same goes for your suggested $ (current offset) feature, which would presumably be a register (?).
The code doesn't explicitly show the unary/binary operator- at work either. However, the mechanics are largely the same. I stopped short of adding it because I couldn't work out the semantics of the sample expressions in your question (I'd think that subtracting the address of a known string is not useful, for example).
The approach does not place (semantic) limits: you can offset any ReferenceBase-derived IReference. In practice, you might only want to allow one level of indexing, and defining the operator+ directly on SymbolicReference would be more appropriate.
I have sacrificed coding style for demo purposes (in general, you'll not want to repeatedly Compile() your expression trees, and direct evaluation with .Compile()() looks ugly and confusing). It's left up to the OP to integrate it in a more legible fashion.
The demonstration of the explicit conversion operator is really off-topic. I got carried away slightly (?).
You can observe the code running live on IdeOne.com.
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Linq;
namespace Assembler
{
internal class State
{
public readonly IDictionary<string, ulong> SymbolTable = new Dictionary<string, ulong>();
public void Clear()
{
SymbolTable.Clear();
}
}
internal interface IReference
{
ulong EvalAddress(State s); // evaluate reference to address
}
internal abstract class ReferenceBase : IReference
{
public static IndexedReference operator+(long directOffset, ReferenceBase baseRef) { return new IndexedReference(baseRef, directOffset); }
public static IndexedReference operator+(ReferenceBase baseRef, long directOffset) { return new IndexedReference(baseRef, directOffset); }
public abstract ulong EvalAddress(State s);
}
internal class SymbolicReference : ReferenceBase
{
public static explicit operator SymbolicReference(string symbol) { return new SymbolicReference(symbol); }
public SymbolicReference(string symbol) { _symbol = symbol; }
private readonly string _symbol;
public override ulong EvalAddress(State s)
{
return s.SymbolTable[_symbol];
}
public override string ToString() { return string.Format("Sym({0})", _symbol); }
}
internal class IndexedReference : ReferenceBase
{
public IndexedReference(IReference baseRef, long directOffset)
{
_baseRef = baseRef;
_directOffset = directOffset;
}
private readonly IReference _baseRef;
private readonly long _directOffset;
public override ulong EvalAddress(State s)
{
return (_directOffset<0)
? _baseRef.EvalAddress(s) - (ulong) Math.Abs(_directOffset)
: _baseRef.EvalAddress(s) + (ulong) Math.Abs(_directOffset);
}
public override string ToString() { return string.Format("{0} + {1}", _directOffset, _baseRef); }
}
}
namespace Program
{
using Assembler;
public static class Program
{
public static void Main(string[] args)
{
var myBaseRef1 = new SymbolicReference("mystring1");
Expression<Func<IReference>> anyRefExpr = () => 64 + myBaseRef1;
Console.WriteLine(anyRefExpr);
var myBaseRef2 = (SymbolicReference) "mystring2"; // uses explicit conversion operator
Expression<Func<IndexedReference>> indexedRefExpr = () => 64 + myBaseRef2;
Console.WriteLine(indexedRefExpr);
Console.WriteLine(Console.Out.NewLine + "=== show compiletime types of returned values:");
Console.WriteLine("myBaseRef1 -> {0}", myBaseRef1);
Console.WriteLine("myBaseRef2 -> {0}", myBaseRef2);
Console.WriteLine("anyRefExpr -> {0}", anyRefExpr.Compile().Method.ReturnType);
Console.WriteLine("indexedRefExpr -> {0}", indexedRefExpr.Compile().Method.ReturnType);
Console.WriteLine(Console.Out.NewLine + "=== show runtime types of returned values:");
Console.WriteLine("myBaseRef1 -> {0}", myBaseRef1);
Console.WriteLine("myBaseRef2 -> {0}", myBaseRef2);
Console.WriteLine("anyRefExpr -> {0}", anyRefExpr.Compile()()); // compile() returns Func<...>
Console.WriteLine("indexedRefExpr -> {0}", indexedRefExpr.Compile()());
Console.WriteLine(Console.Out.NewLine + "=== observe how you could add an evaluation model using some kind of symbol table:");
var compilerState = new State();
compilerState.SymbolTable.Add("mystring1", 0xdeadbeef); // raw addresses
compilerState.SymbolTable.Add("mystring2", 0xfeedface);
Console.WriteLine("myBaseRef1 evaluates to 0x{0:x8}", myBaseRef1.EvalAddress(compilerState));
Console.WriteLine("myBaseRef2 evaluates to 0x{0:x8}", myBaseRef2.EvalAddress(compilerState));
Console.WriteLine("anyRefExpr displays as {0:x8}", anyRefExpr.Compile()());
Console.WriteLine("indexedRefExpr displays as {0:x8}", indexedRefExpr.Compile()());
Console.WriteLine("anyRefExpr evaluates to 0x{0:x8}", anyRefExpr.Compile()().EvalAddress(compilerState));
Console.WriteLine("indexedRefExpr evaluates to 0x{0:x8}", indexedRefExpr.Compile()().EvalAddress(compilerState));
}
}
}
C# supports assigning a lambda expression to an Expression<TDelegate>, which will cause the compiler to emit code to create an expression tree representing the lambda expression, which you can then manipulate. E.g.:
Expression<Func<int, int, int>> times = (a, b) => a * b;
You could then potentially take the generated expression tree and convert it into your assembler's syntax tree, but this doesn't seem to be quite what you're looking for, and I don't think you're going to be able to leverage the C# compiler to do this for arbitrary input.
You're probably going to end up having to build your own parser for your assembly language, as I don't think the C# compiler is going to do what you want in this case.
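To make "manipulate" a bit more concrete, here is a small sketch that inspects the tree the compiler builds for the lambda above and then compiles it for later evaluation. BodyKind is a hypothetical helper added only for the example.

```csharp
using System;
using System.Linq.Expressions;

public static class Program
{
    // Hypothetical helper: reports which kind of node sits at the root of a lambda body.
    public static ExpressionType BodyKind(LambdaExpression lambda) => lambda.Body.NodeType;

    public static void Main()
    {
        // The compiler emits code that builds an expression tree instead of a delegate.
        Expression<Func<int, int, int>> times = (a, b) => a * b;

        var body = (BinaryExpression)times.Body;
        Console.WriteLine(BodyKind(times)); // Multiply
        Console.WriteLine(body.Left);       // a
        Console.WriteLine(body.Right);      // b

        // Evaluation is deferred until the tree is compiled and invoked.
        Func<int, int, int> compiled = times.Compile();
        Console.WriteLine(compiled(6, 7));  // 42
    }
}
```

A custom ExpressionVisitor could walk the same nodes to produce an assembler-specific string form, which is roughly what the question asks for.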
Again, not quite sure if this is exactly what you're looking for, but from the starting point of wanting to create some kind of expression tree using C# syntax, I've come up with...
public abstract class BaseExpression
{
// Maybe a Compile() method here?
}
public abstract class NumericExpression : BaseExpression
{
public static NumericExpression operator +(NumericExpression lhs, NumericExpression rhs)
{
return new NumericAddExpression(lhs, rhs);
}
public static NumericExpression operator -(NumericExpression lhs, NumericExpression rhs)
{
return new NumericSubtractExpression(lhs, rhs);
}
public static NumericExpression operator *(NumericExpression lhs, NumericExpression rhs)
{
return new NumericMultiplyExpression(lhs, rhs);
}
public static NumericExpression operator /(NumericExpression lhs, NumericExpression rhs)
{
return new NumericDivideExpression(lhs, rhs);
}
public static implicit operator NumericExpression(int value)
{
return new NumericConstantExpression(value);
}
public abstract int Evaluate(Dictionary<string,int> symbolTable);
public abstract override string ToString();
}
public abstract class NumericBinaryExpression : NumericExpression
{
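// Symbol printed between the operands; each concrete operator class overrides it.
protected abstract string Operator { get; }
protected NumericExpression LHS { get; private set; }
protected NumericExpression RHS { get; private set; }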
protected NumericBinaryExpression(NumericExpression lhs, NumericExpression rhs)
{
LHS = lhs;
RHS = rhs;
}
public override string ToString()
{
return string.Format("{0} {1} {2}", LHS, Operator, RHS);
}
}
public class NumericAddExpression : NumericBinaryExpression
{
protected override string Operator { get { return "+"; } }
public NumericAddExpression(NumericExpression lhs, NumericExpression rhs)
: base(lhs, rhs)
{
}
public override int Evaluate(Dictionary<string,int> symbolTable)
{
return LHS.Evaluate(symbolTable) + RHS.Evaluate(symbolTable);
}
}
public class NumericSubtractExpression : NumericBinaryExpression
{
protected override string Operator { get { return "-"; } }
public NumericSubtractExpression(NumericExpression lhs, NumericExpression rhs)
: base(lhs, rhs)
{
}
public override int Evaluate(Dictionary<string, int> symbolTable)
{
return LHS.Evaluate(symbolTable) - RHS.Evaluate(symbolTable);
}
}
public class NumericMultiplyExpression : NumericBinaryExpression
{
protected override string Operator { get { return "*"; } }
public NumericMultiplyExpression(NumericExpression lhs, NumericExpression rhs)
: base(lhs, rhs)
{
}
public override int Evaluate(Dictionary<string, int> symbolTable)
{
return LHS.Evaluate(symbolTable) * RHS.Evaluate(symbolTable);
}
}
public class NumericDivideExpression : NumericBinaryExpression
{
protected override string Operator { get { return "/"; } }
public NumericDivideExpression(NumericExpression lhs, NumericExpression rhs)
: base(lhs, rhs)
{
}
public override int Evaluate(Dictionary<string, int> symbolTable)
{
return LHS.Evaluate(symbolTable) / RHS.Evaluate(symbolTable);
}
}
public class NumericReferenceExpression : NumericExpression
{
public string Symbol { get; private set; }
public NumericReferenceExpression(string symbol)
{
Symbol = symbol;
}
public override int Evaluate(Dictionary<string, int> symbolTable)
{
return symbolTable[Symbol];
}
public override string ToString()
{
return string.Format("Ref({0})", Symbol);
}
}
public class StringConstantExpression : BaseExpression
{
public string Value { get; private set; }
public StringConstantExpression(string value)
{
Value = value;
}
public static implicit operator StringConstantExpression(string value)
{
return new StringConstantExpression(value);
}
}
public class NumericConstantExpression : NumericExpression
{
public int Value { get; private set; }
public NumericConstantExpression(int value)
{
Value = value;
}
public override int Evaluate(Dictionary<string, int> symbolTable)
{
return Value;
}
public override string ToString()
{
return Value.ToString();
}
}
Now, obviously none of these classes actually do anything (you'd probably want a Compile() method on there amongst others) and not all the operators are implemented, and you can obviously shorten the class names to make it more concise etc... but it does allow you to do things like:
var result = 100 * new NumericReferenceExpression("Test") + 50;
After which, result will be:
NumericAddExpression
- LHS = NumericMultiplyExpression
  - LHS = NumericConstantExpression(100)
  - RHS = NumericReferenceExpression(Test)
- RHS = NumericConstantExpression(50)
It's not quite perfect - if you use the implicit conversions of numeric values to NumericConstantExpression (instead of explicitly casting/constructing them), then depending on the ordering of your terms, some of the calculations may be performed by the built-in operators, and you'll only get the resulting number (you could just call this a "compile-time optimization"!)
To show what I mean, if you were to instead run this:
var result = 25 * 4 * new NumericReferenceExpression("Test") + 50;
in this case, the 25 * 4 is evaluated using built-in integer operators, so the result is actually identical to the above, rather than building an additional NumericMultiplyExpression with two NumericConstantExpressions (25 and 4) on the LHS and RHS.
These expressions can be printed using ToString() and evaluated, if you provide a symbol table (here just a simple Dictionary<string, int>):
var result = 100 * new NumericReferenceExpression("Test") + 50;
var symbolTable = new Dictionary<string, int>
{
{ "Test", 30 }
};
Console.WriteLine("Pretty printed: {0}", result);
Console.WriteLine("Evaluated: {0}", result.Evaluate(symbolTable));
Results in:
Pretty printed: 100 * Ref(Test) + 50
Evaluated: 3050
Hopefully, despite the drawback(s) mentioned, this is something approaching what you were looking for (or I've just wasted the last half hour!)
Are you implementing a two-phase (two-pass) assembler? The purpose of a two-pass assembler
is to handle forward references (e.g., symbols that are undefined when first encountered).
If so, you pretty much don't need to build an expression tree.
In phase I (pass 1), you parse the source text (by any means you like: ad hoc parser, recursive descent, parser generator) and collect values of symbols (in particular, the relative values of labels with respect to the code or data section in which they are contained). If you encounter an expression, you attempt to evaluate it using on-the-fly expression evaluation, typically involving a pushdown stack for subexpressions, and producing a final result. If you encounter a symbol whose value is undefined, you propagate the undefinedness as the expression result. If the assembly operator/command needs the expression value to define a symbol (e.g., X EQU A+2) or to determine offsets into a code/data section (e.g., DS X+23), then the value must be defined or the assembler throws an error. This allows ORG A+B-C to work. Other assembly operators that don't need the value during pass 1 simply ignore the undefined result (e.g., LOAD ABC doesn't care what ABC is, but can determine the length of the LOAD instruction).
In phase II (pass 2), you re-parse the code the same way. This time all the symbols have values, so all expressions should evaluate. Those that had to have a value in phase I are checked against the values produced in phase II to ensure they are identical (otherwise you get a PHASE error). Other assembly operators/instructions now have enough information to generate the actual machine instructions or data initializations.
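As a rough illustration of the two passes described above, here is a minimal sketch. Everything in it is an assumption made for the example: the three-column source format, the `TwoPassSketch` class, the restriction to `+` and `-`, and the rule that every `LOAD` occupies one word. The PHASE check is left out for brevity. Undefined symbols are modelled as `null`, so undefinedness propagates through C#'s lifted arithmetic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature assembler; the opcodes and source layout are illustrative only.
public static class TwoPassSketch
{
    // Evaluates "term (+|-) term ..." left to right. A term is an integer literal or a symbol.
    // Returns null if any symbol is still undefined, so undefinedness becomes the result.
    public static int? Evaluate(string expr, IDictionary<string, int> symbols)
    {
        int? total = 0;
        int sign = 1;
        var tokens = expr.Replace("+", " + ").Replace("-", " - ")
                         .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            if (token == "+") { sign = 1; continue; }
            if (token == "-") { sign = -1; continue; }
            int? value = int.TryParse(token, out var n) ? n
                       : symbols.TryGetValue(token, out var s) ? s
                       : (int?)null;
            total = total + sign * value; // lifted arithmetic: null if value is null
        }
        return total;
    }

    // Returns the evaluated operand of every LOAD line, in source order.
    public static List<int> Assemble((string Label, string Op, string Operand)[] source)
    {
        var symbols = new Dictionary<string, int>();

        // Pass 1: collect symbol values. Only EQU and DS need their operand right now.
        int location = 0;
        foreach (var (label, op, operand) in source)
        {
            if (op == "EQU")
            {
                symbols[label] = Evaluate(operand, symbols)
                    ?? throw new InvalidOperationException("EQU operand undefined in pass 1");
                continue;
            }
            if (label != "") symbols[label] = location;
            if (op == "DS")
                location += Evaluate(operand, symbols)
                    ?? throw new InvalidOperationException("DS operand undefined in pass 1");
            else
                location += 1; // LOAD: the length is known without the operand value
        }

        // Pass 2: every symbol should now be defined, so every operand must evaluate.
        var operands = new List<int>();
        foreach (var (_, op, operand) in source)
        {
            if (op != "LOAD") continue;
            operands.Add(Evaluate(operand, symbols)
                ?? throw new InvalidOperationException("undefined symbol in " + operand));
        }
        return operands;
    }
}
```

In this sketch, a `LOAD BUF+2` that appears before `BUF` is defined is simply skipped over in pass 1 and resolved in pass 2, without any expression tree being kept around.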
The point is, you never have to build an expression tree. You simply evaluate the expression as you encounter it.
If you built a one-pass assembler, you might need to model the expression to allow re-evaluation later. I found it easier to produce reverse polish as a sequence of "PUSH value" and arithop, and store the sequence (equivalent to the expression tree), because it is dense (trees are not) and trivial to evaluate by doing a linear scan using (as above) a small pushdown stack.
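A minimal sketch of that linear scan might look like the following. The `RpnOp`, `RpnItem` and `RpnEvaluator` names are invented for this example, and only the four basic integer operators are covered.

```csharp
using System;
using System.Collections.Generic;

public enum RpnOp { Push, Add, Subtract, Multiply, Divide }

// One entry of the stored sequence: either "PUSH value" or an arithmetic operator.
public readonly struct RpnItem
{
    public RpnOp Op { get; }
    public int Value { get; }
    public RpnItem(RpnOp op, int value = 0) { Op = op; Value = value; }
}

public static class RpnEvaluator
{
    // Single left-to-right scan with a small pushdown stack.
    public static int Evaluate(IEnumerable<RpnItem> program)
    {
        var stack = new Stack<int>();
        foreach (var item in program)
        {
            if (item.Op == RpnOp.Push) { stack.Push(item.Value); continue; }
            int rhs = stack.Pop(); // the right operand was pushed last
            int lhs = stack.Pop();
            switch (item.Op)
            {
                case RpnOp.Add: stack.Push(lhs + rhs); break;
                case RpnOp.Subtract: stack.Push(lhs - rhs); break;
                case RpnOp.Multiply: stack.Push(lhs * rhs); break;
                case RpnOp.Divide: stack.Push(lhs / rhs); break;
                default: throw new InvalidOperationException("unknown operator");
            }
        }
        return stack.Pop();
    }
}
```

For example, the sequence PUSH 100, PUSH 30, Multiply, PUSH 50, Add corresponds to 100 * 30 + 50.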
In fact, what I did was to produce reverse polish that acted as the expression stack itself; during a linear scan, if operands could be evaluated they were replaced by a "PUSH value" command, and the remaining reverse polish was squeezed to remove the bubble. This isn't expensive because most expressions are actually tiny. And it meant that any expression that had to be saved for later evaluation was as small as possible. If you threaded the PUSH identifier commands through the symbol table, then when a symbol becomes defined, you can fill in all the partially evaluated expressions and reevaluate them; the ones that produce a single value are then processed and their space recycled. This allowed me to assemble giant programs on a 4K-word, 16-bit machine, back in 1974, because most forward references don't really reach very far.
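The squeezing step could be sketched roughly as below. This is a reconstruction of the idea under my own assumptions, not the original 1974 implementation: the `Item` and `PartialEvaluator` names are hypothetical, the output list doubles as the expression stack, and the threading of pending expressions through the symbol table is replaced by simply calling `Squeeze` again once more symbols are known.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item: Op is 'V' (push value), 'S' (push symbol) or one of '+', '-', '*', '/'.
public readonly struct Item
{
    public readonly char Op;
    public readonly int Value;
    public readonly string Symbol;
    private Item(char op, int value, string symbol) { Op = op; Value = value; Symbol = symbol; }
    public static Item Val(int value) => new Item('V', value, null);
    public static Item Sym(string symbol) => new Item('S', 0, symbol);
    public static Item Oper(char op) => new Item(op, 0, null);
}

public static class PartialEvaluator
{
    // One linear scan: known symbols become values, and an operator whose two operands
    // are both plain values is folded into a single "push value" item.
    public static List<Item> Squeeze(IEnumerable<Item> program, IDictionary<string, int> symbols)
    {
        var output = new List<Item>(); // also serves as the expression stack
        foreach (var raw in program)
        {
            var item = raw;
            if (item.Op == 'S' && symbols.TryGetValue(item.Symbol, out var known))
                item = Item.Val(known);

            int n = output.Count;
            bool isOperator = item.Op != 'V' && item.Op != 'S';
            if (isOperator && n >= 2 && output[n - 1].Op == 'V' && output[n - 2].Op == 'V')
            {
                int lhs = output[n - 2].Value, rhs = output[n - 1].Value;
                output.RemoveRange(n - 2, 2); // remove the "bubble"
                output.Add(Item.Val(Apply(item.Op, lhs, rhs)));
            }
            else
            {
                output.Add(item); // cannot be evaluated yet: keep as is
            }
        }
        return output;
    }

    private static int Apply(char op, int lhs, int rhs)
    {
        switch (op)
        {
            case '+': return lhs + rhs;
            case '-': return lhs - rhs;
            case '*': return lhs * rhs;
            case '/': return lhs / rhs;
            default: throw new InvalidOperationException("unknown operator " + op);
        }
    }
}
```

With `Test` undefined, the sequence for 25 * 4 * Test + 50 shrinks from seven items to five (100, Test, *, 50, +); squeezing the result again after `Test` is defined leaves a single value.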
