What .NET StringComparer is equivalent SQL's Latin1_General_CI_AS - c#

I am implementing a caching layer between my database and my C# code. The idea is to cache the results of certain DB queries based on the parameters to the query. The database is using the default collation - either SQL_Latin1_General_CP1_CI_AS or Latin1_General_CI_AS, which I believe based on some brief googling are equivalent for equality, just different for sorting.
I need a .NET StringComparer that can give me the same behavior, at least for equality testing and hashcode generation, as the database's collation is using. The goal is to be able to use the StringComparer in a .NET dictionary in C# code to determine whether a particular string key is already in the cache or not.
A really simplified example:
var comparer = StringComparer.??? // What goes here?
private static Dictionary<string, MyObject> cache =
new Dictionary<string, MyObject>(comparer);
public static MyObject GetObject(string key) {
if (cache.ContainsKey(key)) {
return cache[key].Clone();
} else {
// invoke SQL "select * from mytable where mykey = #mykey"
// with parameter #mykey set to key
MyObject result = // object constructed from the sql result
cache[key] = result;
return result.Clone();
}
}
public static void SaveObject(string key, MyObject obj) {
// invoke SQL "update mytable set ... where mykey = #mykey" etc
cache[key] = obj.Clone();
}
The reason it's important that the StringComparer matches the database's collation is that both false positives and false negatives would have bad effects for the code.
If the StringComparer says that two keys A and B are equal when the database believes they are distinct, then there could be two rows in the database with those two keys, but the cache will prevent the second one ever getting returned if asked for A and B in succession - because the get for B will incorrectly hit the cache and return the object that was retrieved for A.
The problem is more subtle if the StringComparer says that A and B are different when the database believes they are equal, but no less problematic. GetObject calls for both keys would be fine, and return objects corresponding to the same database row. But then calling SaveObject with key A would leave the cache incorrect; there would still be a cache entry for key B that has the old data. A subsequent GetObject(B) would give outdated information.
So for my code to work correctly I need the StringComparer to match the database behavior for equality testing and hashcode generation. My googling so far has yielded lots of information about the fact that SQL collations and .NET comparisons are not exactly equivalent, but no details on what the differences are, whether they are limited to only differences in sorting, or whether it is possible to find a StringComparer that is equivalent to a specific SQL collation if a general-purpose solution is not needed.
(Side note - the caching layer is general purpose, so I cannot make particular assumptions about what the nature of the key is and what collation would be appropriate. All the tables in my database share the same default server collation. I just need to match the collation as it exists)

I've recently faced with the same problem: I need an IEqualityComparer<string> that behaves in SQL-like style. I've tried CollationInfo and its EqualityComparer. If your DB is always _AS (accent sensitive) then your solution will work, but in case if you change the collation that is AI or WI or whatever "insensitive" else the hashing will break.
Why? If you decompile Microsoft.SqlServer.Management.SqlParser.dll and look inside you'll find out that CollationInfo internally uses CultureAwareComparer.GetHashCode (it's internal class of mscorlib.dll) and finally it does the following:
public override int GetHashCode(string obj)
{
if (obj == null)
throw new ArgumentNullException("obj");
CompareOptions options = CompareOptions.None;
if (this._ignoreCase)
options |= CompareOptions.IgnoreCase;
return this._compareInfo.GetHashCodeOfString(obj, options);
}
As you can see it can produce the same hashcode for "aa" and "AA", but not for "äå" and "aa" (which are the same, if you ignore diacritics (AI) in majority of cultures, so they should have the same hashcode). I don't know why the .NET API is limited by this, but you should understand where the problem can come from.
To get the same hashcode for strings with diacritics you can do the following: create implementation of IEqualityComparer<T> implementing the GetHashCode that will call appropriate CompareInfo's object's GetHashCodeOfString via reflection because this method is internal and can't be used directly. But calling it directly with correct CompareOptions will produce the desired result:
See this example:
static void Main(string[] args)
{
const string outputPath = "output.txt";
const string latin1GeneralCiAiKsWs = "Latin1_General_100_CI_AI_KS_WS";
using (FileStream fileStream = File.Open(outputPath, FileMode.Create, FileAccess.Write))
{
using (var streamWriter = new StreamWriter(fileStream, Encoding.UTF8))
{
string[] strings = { "aa", "AA", "äå", "ÄÅ" };
CompareInfo compareInfo = CultureInfo.GetCultureInfo(1033).CompareInfo;
MethodInfo GetHashCodeOfString = compareInfo.GetType()
.GetMethod("GetHashCodeOfString",
BindingFlags.Instance | BindingFlags.NonPublic,
null,
new[] { typeof(string), typeof(CompareOptions), typeof(bool), typeof(long) },
null);
Func<string, int> correctHackGetHashCode = s => (int)GetHashCodeOfString.Invoke(compareInfo,
new object[] { s, CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace, false, 0L });
Func<string, int> incorrectCollationInfoGetHashCode =
s => CollationInfo.GetCollationInfo(latin1GeneralCiAiKsWs).EqualityComparer.GetHashCode(s);
PrintHashCodes(latin1GeneralCiAiKsWs, incorrectCollationInfoGetHashCode, streamWriter, strings);
PrintHashCodes("----", correctHackGetHashCode, streamWriter, strings);
}
}
Process.Start(outputPath);
}
private static void PrintHashCodes(string collation, Func<string, int> getHashCode, TextWriter writer, params string[] strings)
{
writer.WriteLine(Environment.NewLine + "Used collation: {0}", collation + Environment.NewLine);
foreach (string s in strings)
{
WriteStringHashcode(writer, s, getHashCode(s));
}
}
The output is:
Used collation: Latin1_General_100_CI_AI_KS_WS
aa, hashcode: 2053722942
AA, hashcode: 2053722942
äå, hashcode: -266555795
ÄÅ, hashcode: -266555795
Used collation: ----
aa, hashcode: 2053722942
AA, hashcode: 2053722942
äå, hashcode: 2053722942
ÄÅ, hashcode: 2053722942
I know it looks like the hack, but after inspecting decompiled .NET code I'm not sure if there any other option in case the generic functionality is needed.
So be sure that you'll not fall into trap using this not fully correct API.
UPDATE:
I've also created the gist with potential implementation of "SQL-like comparer" using CollationInfo.
Also there should be paid enough attention where to search for "string pitfalls" in your code base, so if the string comparison, hashcode, equality should be changed to "SQL collation-like" those places are 100% will be broken, so you'll have to find out and inspect all the places that can be broken.
UPDATE #2:
There is better and cleaner way to make GetHashCode() treat CompareOptions. There is the class SortKey that works correctly with CompareOptions and it can be retrieved using
CompareInfo.GetSortKey(yourString, yourCompareOptions).GetHashCode()
Here is the link to .NET source code and implementation.
UPDATE #3:
If you're on .NET Framework 4.7.1+ you should use new GlobalizationExtensions class as proposed by this recent answer.

Take a look at the CollationInfo class. It is located in an assembly called Microsoft.SqlServer.Management.SqlParser.dll although I am not totally sure where to get this. There is a static list of Collations (names) and a static method GetCollationInfo (by name).
Each CollationInfo has a Comparer. It is not exactly the same as a StringComparer but has similar functionality.
EDIT: Microsoft.SqlServer.Management.SqlParser.dll is a part of the Shared Management Objects (SMO) package. This feature can be downloaded for SQL Server 2008 R2 here:
http://www.microsoft.com/download/en/details.aspx?id=16978#SMO
EDIT: CollationInfo does have a property named EqualityComparer which is an IEqualityComparer<string>.

The following is much simpler:
System.Globalization.CultureInfo.GetCultureInfo(1033)
.CompareInfo.GetStringComparer(CompareOptions.IgnoreCase | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth)
It comes from https://learn.microsoft.com/en-us/dotnet/api/system.globalization.globalizationextensions?view=netframework-4.8
It computes the hashcode correctly given the options given.
You'll still have to trim trailing spaces manually, as they as discarded by ANSI sql but not in .net
Here is a wrapper that trims spaces.
using System.Collections.Generic;
using System.Globalization;
namespace Wish.Core
{
public class SqlStringComparer : IEqualityComparer<string>
{
public static IEqualityComparer<string> Instance { get; }
private static IEqualityComparer<string> _internalComparer =
CultureInfo.GetCultureInfo(1033)
.CompareInfo
.GetStringComparer(CompareOptions.IgnoreCase | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth);
private SqlStringComparer()
{
}
public bool Equals(string x, string y)
{
//ANSI sql doesn't consider trailing spaces but .Net does
return _internalComparer.Equals(x?.TrimEnd(), y?.TrimEnd());
}
public int GetHashCode(string obj)
{
return _internalComparer.GetHashCode(obj?.TrimEnd());
}
static SqlStringComparer()
{
Instance = new SqlStringComparer();
}
}
}

SQL Server's Server.GetStringComparer may be of some use.

Related

Is it possible to compare multiple values with one field? [duplicate]

Any easier way to write this if statement?
if (value==1 || value==2)
For example... in SQL you can say where value in (1,2) instead of where value=1 or value=2.
I'm looking for something that would work with any basic type... string, int, etc.
How about:
if (new[] {1, 2}.Contains(value))
It's a hack though :)
Or if you don't mind creating your own extension method, you can create the following:
public static bool In<T>(this T obj, params T[] args)
{
return args.Contains(obj);
}
And you can use it like this:
if (1.In(1, 2))
:)
A more complicated way :) that emulates SQL's 'IN':
public static class Ext {
public static bool In<T>(this T t,params T[] values){
foreach (T value in values) {
if (t.Equals(value)) {
return true;
}
}
return false;
}
}
if (value.In(1,2)) {
// ...
}
But go for the standard way, it's more readable.
EDIT: a better solution, according to #Kobi's suggestion:
public static class Ext {
public static bool In<T>(this T t,params T[] values){
return values.Contains(t);
}
}
C# 9 supports this directly:
if (value is 1 or 2)
however, in many cases: switch might be clearer (especially with more recent switch syntax enhancements). You can see this here, with the if (value is 1 or 2) getting compiled identically to if (value == 1 || value == 2).
Is this what you are looking for ?
if (new int[] { 1, 2, 3, 4, 5 }.Contains(value))
If you have a List, you can use .Contains(yourObject), if you're just looking for it existing (like a where). Otherwise look at Linq .Any() extension method.
Using Linq,
if(new int[] {1, 2}.Contains(value))
But I'd have to think that your original if is faster.
Alternatively, and this would give you more flexibility if testing for values other than 1 or 2 in future, is to use a switch statement
switch(value)
{
case 1:
case 2:
return true;
default:
return false
}
If you search a value in a fixed list of values many times in a long list, HashSet<T> should be used. If the list is very short (< ~20 items), List could have better performance, based on this test
HashSet vs. List performance
HashSet<int> nums = new HashSet<int> { 1, 2, 3, 4, 5 };
// ....
if (nums.Contains(value))
Generally, no.
Yes, there are cases where the list is in an Array or List, but that's not the general case.
An extensionmethod like this would do it...
public static bool In<T>(this T item, params T[] items)
{
return items.Contains(item);
}
Use it like this:
Console.WriteLine(1.In(1,2,3));
Console.WriteLine("a".In("a", "b"));
You can use the switch statement with pattern matching (another version of jules's answer):
if (value switch{1 or 3 => true,_ => false}){
// do something
}
Easier is subjective, but maybe the switch statement would be easier? You don't have to repeat the variable, so more values can fit on the line, and a line with many comparisons is more legible than the counterpart using the if statement.
In vb.net or C# I would expect that the fastest general approach to compare a variable against any reasonable number of separately-named objects (as opposed to e.g. all the things in a collection) will be to simply compare each object against the comparand much as you have done. It is certainly possible to create an instance of a collection and see if it contains the object, and doing so may be more expressive than comparing the object against all items individually, but unless one uses a construct which the compiler can explicitly recognize, such code will almost certainly be much slower than simply doing the individual comparisons. I wouldn't worry about speed if the code will by its nature run at most a few hundred times per second, but I'd be wary of the code being repurposed to something that's run much more often than originally intended.
An alternative approach, if a variable is something like an enumeration type, is to choose power-of-two enumeration values to permit the use of bitmasks. If the enumeration type has 32 or fewer valid values (e.g. starting Harry=1, Ron=2, Hermione=4, Ginny=8, Neville=16) one could store them in an integer and check for multiple bits at once in a single operation ((if ((thisOne & (Harry | Ron | Neville | Beatrix)) != 0) /* Do something */. This will allow for fast code, but is limited to enumerations with a small number of values.
A somewhat more powerful approach, but one which must be used with care, is to use some bits of the value to indicate attributes of something, while other bits identify the item. For example, bit 30 could indicate that a character is male, bit 29 could indicate friend-of-Harry, etc. while the lower bits distinguish between characters. This approach would allow for adding characters who may or may not be friend-of-Harry, without requiring the code that checks for friend-of-Harry to change. One caveat with doing this is that one must distinguish between enumeration constants that are used to SET an enumeration value, and those used to TEST it. For example, to set a variable to indicate Harry, one might want to set it to 0x60000001, but to see if a variable IS Harry, one should bit-test it with 0x00000001.
One more approach, which may be useful if the total number of possible values is moderate (e.g. 16-16,000 or so) is to have an array of flags associated with each value. One could then code something like "if (((characterAttributes[theCharacter] & chracterAttribute.Male) != 0)". This approach will work best when the number of characters is fairly small. If array is too large, cache misses may slow down the code to the point that testing against a small number of characters individually would be faster.
Using Extension Methods:
public static class ObjectExtension
{
public static bool In(this object obj, params object[] objects)
{
if (objects == null || obj == null)
return false;
object found = objects.FirstOrDefault(o => o.GetType().Equals(obj.GetType()) && o.Equals(obj));
return (found != null);
}
}
Now you can do this:
string role= "Admin";
if (role.In("Admin", "Director"))
{
...
}
public static bool EqualsAny<T>(IEquatable<T> value, params T[] possibleMatches) {
foreach (T t in possibleMatches) {
if (value.Equals(t))
return true;
}
return false;
}
public static bool EqualsAny<T>(IEquatable<T> value, IEnumerable<T> possibleMatches) {
foreach (T t in possibleMatches) {
if (value.Equals(t))
return true;
}
return false;
}
I had the same problem but solved it with a switch statement
switch(a value you are switching on)
{
case 1:
the code you want to happen;
case 2:
the code you want to happen;
default:
return a value
}

Elegant way to query a dictionary in C#

I am trying to create an elegant and extensible way of querying a dictionary which maps an enum to a set of strings.
So I have this class SearchFragments that has the dictionary in it. I then want a method wherein consumers of this class can simply ask "HasAny" and, this is the bit where I am struggling, simply pass in some query like expression and get the boolean answer back.
public class SearchFragments
{
private readonly IDictionary<SearchFragmentEnum, IEnumerable<string>> _fragments;
public SearchFragments()
{
_fragments = new Dictionary<SearchFragmentEnum, IEnumerable<string>>();
}
public bool HasAny(IEnumerable<SearchFragmentEnum> of)
{
int has = 0;
_fragments.ForEach(x => of.ForEach(y => has += x.Key == y ? 1 : 0));
return has >= 1;
}
}
The problem with the way this currently is, is that consumers of this class now have to construct an IEnumerable<SearchFragmentEnum> which can be quite messy.
What I am looking for is that the consuming code will be able to write something along the lines of:
searchFragments.HasAny(SearchFragmentEnum.Name, SearchFragmentEnum.PhoneNumber)
But where that argument list can vary in size (without me having to write method overloads in the SearchFragments class for every possible combination (such that if new values are added to the SearchFragmentEnum at a future date I won't have to update the class.
You can use params[]
public bool HasAny(params SearchFragmentEnum[] of)
{ ...
Sidenote: you know that LIN(Q) queries should just query a source and never cause any side-effects? But your query does unnecessarily increment the integer:
_fragments.ForEach(x => of.ForEach(y => has += x.Key == y ? 1 : 0));
Instead use this (which is also more efficient and more readable):
return _fragments.Keys.Intersect(of).Any();
An even more efficient alternative to this is Sergey's idea:
return of?.Any(_fragments.ContainsKey) == true;
For variable sized arguments in c# you use the params keyword:
public int HasAny(params SearchFragmentEnum[] of)
The .Net API usually offers a couple of overloads of this for performance reasons; the parameters passed are copied into a new array. Explicitely providing overloads for the most common cases avoids this.
public int HasAny(SearchfragmentEnum of1)
public int HasAny(SearchFragmentEnum of1, SearchFragmentEnum of2)
etc.
Instead of using params you could also consider marking your enum with the [Flags] attribute. Parameters could than be passed like HasAny(SearchFragmentEnum.Name | SearchFragmentEnum.PhoneNumber. Examples abundant on StackOverflow (e.g. Using a bitmask in C#)
Use the params keyword to allow a varying number of arguments. Further, you can simplify your code by looping over the smaller of array. Also, you are using a dictionary that has O(1) key check, so it is uneccessary to have an inner loop:
public bool HasAny(params SearchFragmentEnum[] of)
{
foreach(var o in of) {
if (this._fragments.ContainsKey(o))
return true;
}
return false;
}
or shorter with LINQ
public bool HasAny(params SearchFragmentEnum[] of) {
return of?.Any(_fragments.ContainsKey) ?? false;
}

How can I get a hash of the body of a method in .net?

I'd like to generate a hash of a method body or entire class in order to compare it to a previous hash and therefore detect changes at run-time.
I've seen reference to this being achieved in Mono using Mono.Cecil.
Is it possible to do this in Microsoft.NET? If so how?
Or what other approaches have you used?
Background
I'd like to create projections of my entities into a read-only, denormalised database - a kind of local cache if you like.
The transformation algorithm that does this will be pinged whenever a change has occurred to the entities. It will then take the entity in question and generate the projection. However, on start up or as part of the deployment, I'd like to check if the transformation algorithm has changed and if so delete and regenerate the table and then repopulate it using all the entities in the main store.
I've seen this approach suggested in Event Source/CQRS blogs but feel free to comment if you have alternative/better approaches to this requirement.
Incomplete Solution based on Jon Skeet's Answer
The following LinqPad C# Program is an attempt at implementing this:
void Main()
{
GetMethodHash(this.GetType(), "DoOriginal").Dump();
GetMethodHash(this.GetType(), "DoSame").Dump();
GetMethodHash(this.GetType(), "DoWithConstChange").Dump();
GetMethodHash(this.GetType(), "DoWithCodeChange").Dump();
}
// Define other methods and classes here
private string DoOriginal(int data)
{
return string.Format("data = {0}", data);
}
private string DoSame(int data)
{
return string.Format("data = {0}", data);
}
private string DoWithConstChange(int data)
{
return string.Format("data : {0}", data);
}
private string DoWithCodeChange(int data)
{
return string.Format("data = {0} {1}", data, 999);
}
private string GetMethodHash(Type type, string methodName)
{
var method = type.GetMethod(methodName, BindingFlags.NonPublic | BindingFlags.Instance );
var body = method.GetMethodBody();
var il = body.GetILAsByteArray();
using (var sha256 = System.Security.Cryptography.SHA256.Create())
{
byte[] hash = sha256.ComputeHash(il);
string hashBase64 = Convert.ToBase64String(hash);
return string.Format("{0}{1}", methodName.PadRight(20), hashBase64);
}
}
Outputs:
DoOriginal +PyMqwv5PVz7/7EnNL9A676XNw/g8x2KfEUa63wD9oU=
DoSame +PyMqwv5PVz7/7EnNL9A676XNw/g8x2KfEUa63wD9oU=
DoWithConstChange VZX2CkxwUHzJVt8Tn73jmL1I5dKYUFKyD9bPwPsItVk=
DoWithCodeChange 9kZkq0fDHUXqxOmclwso/r2pQLDyNVwTtkIppSkcdQg=
My first investigation with this appeared to produce the same hash between DoOriginal and DoWithConstChange ... but that was before lunch and now it looks okay. So either something odd was going on or I needed the break - I vote the latter.
Next up would be to add in referenced code and structures. Any suggestions?
You can use MethodInfo.GetMethodBody() to get a MethodBody, then create a hash based on whatever you like (e.g. the IL byte array) using whatever kind of hash you want (e.g. SHA-256).
Of course, it's possible for the IL to change without the source code changing (e.g. due to changing compiler) - or vice versa (e.g. if a local variable name was changed but not present in the optimized IL).

Implementing a content-hashable HashSet in C# (like python's `frozenset`)

Brief summary
I want to build a set of sets of items in C#. The inner sets of items have a GetHashCode and Equals method defined by their contents. In mathematical notation:
x = { }
x.Add( { A, B, C } )
x.Add( { A, D } )
x.Add( { B, C, A } )
now x should be{ { A, B, C }, { A, D } }
In python, this could be accomplished with frozenset:
x = set()
x.add( frozenset(['A','B','C']) )
x.add( frozenset(['A','D']) )
x.add( frozenset(['B','C','A']) )
/BriefSummary
I would like to have a hashable HashSet in C#. This would allow me to do:
HashSet<ContentHashableHashSet<int>> setOfSets;
Although there are more sophisticated ways to accomplish this, This can be trivially achieved in practice (although not in the most efficient manner) by adding overriding ContentHashableHashSet.ToString() (outputing the strings of the elements contained in sorted order) and then using then using ContentHashableHashSet.ToString().GetHashCode() as the hash code.
However, if an object is modified after placement in setOfSets, it could result in multiple copies:
var setA = new ContentHashableHashSet<int>();
setA.Add(1);
setA.Add(2);
var setB = new ContentHashableHashSet<int>();
setB.Add(1);
setOfSets.Add(setA);
setOfSets.Add(setB);
setB.Add(2); // now there are duplicate members!
As far as I can see, I have two options: I can derive ContentHashableHashSet from HashSet, but then I will need to make it so that all modifiers throw an exception. Missing one modifier could cause an insidious bug.
Alternatively, I can use encapsulation and class ContentHashableHashSet can contain a readonly HashSet. But then I would need to reimplement all set methods (except modifiers) so that the ContentHashableHashSet can behave like a HashSet. As far as I know, extensions would not apply.
Lastly, I could encapsulate as above and then all set-like functionality will occur by returning the const (or readonly?) HashSet member.
In hindsight, this is reminiscent of python's frozenset. Does anyone know of a well-designed way to implement this in C#?
If I was able to lose ISet functionality, then I would simply create a sorted ImmutableList, but then I would lose functionality like fast union, fast intersection, and sub-linear ( roughly O(log(n)) ) set membership with Contains.
EDIT: The base class HashSet does not have virtual Add and Remove methods, so overriding them will work within the derived class, but will not work if you perform HashSet<int> set = new ContentHashableHashSet<int>();. Casting to the base class will allow editing.
EDIT 2: Thanks to #xanatos for recommending a simple GetHashCode implementation:
The easiest way to calculate the GetHashCode is to simply xor (^) all the gethashcodes of the elements. The xor operator is commutative, so the ordering is irrelevant. For the comparison you can use the SetEquals
EDIT 3: Someone recently shared information about ImmutableHashSet, but because this class is sealed, it is not possible to derive from it and override GetHashCode.
I was also told that HashSet takes an IEqualityComparer as an argument, and so this can be used to provide an immutable, content-hashable set without deriving from ImmutableHashSet; however, this is not a very object oriented solution: every time I want to use a ContentHashableHashSet, I will need to pass the same (non-trivial) argument. As I'm sure you know, this can really wreak havoc with your coding zen, and where I would be flying along in python with myDictionary[ frozenset(mySet) ] = myValue, I will be stuck doing the same thing again and again and again.
Thanks for any help you can provide. I have a temporary workaround (whose problems are mentioned in EDIT 1 above), but I'd mostly like to learn about the best way to design something like this.
Hide the elements of your set of sets so that they can't be changed. That means copying when you add/retrieve sets, but maybe that's acceptable?
// Better make sure T is immutable too, else set hashes could change
public class SetofSets<T>
{
private class HashSetComparer : IEqualityComparer<HashSet<T>>
{
public int GetHashCode(HashSet<T> x)
{
return x.Aggregate(1, (code,elt) => code ^ elt.GetHashCode());
}
public bool Equals(HashSet<T> x, HashSet<T> y)
{
if (x==null)
return y==null;
return x.SetEquals(y);
}
}
private HashSet<HashSet<T>> setOfSets;
public SetofSets()
{
setOfSets = new HashSet<HashSet<T>>(new HashSetComparer());
}
public void Add(HashSet<T> set)
{
setOfSets.Add(new HashSet<T>(set));
}
public bool Contains(HashSet<T> set)
{
return setOfSets.Contains(set);
}
}

How to inject/generate plumbing code into methods decorated with an Attribute?

I was reading through some articles on Caching and Memoization and how to implement it easily using delegates and generics. The syntax was pretty straightforward, and it is surprisingly easy to implement, but I just feel due to the repetitive nature it should be possible to generate code based on an Attribute, instead of having to write the same plumbing code over and over.
Let's say we start off with the default example:
class Foo
{
public int Fibonacci(int n)
{
return n > 1 ? Fibonacci(n-1) + Fibonacci(n-2) : n;
}
}
And then to memoize this:
// Let's say we have a utility class somewhere with the following extension method:
// public static Func<TResult> Memoize<TResult>(this Func<TResult> f)
class Foo
{
public Func<int,int> Fibonacci = fib;
public Foo()
{
Fibonacci = Fibonacci.Memoize();
}
public int fib(int n)
{
return n > 1 ? Fibonacci(n-1) + Fibonacci(n-2) : n;
}
}
I thought, wouldn't it be simpler to just make a code generator that spits out this code, once it finds a tagged method that matches one of the Memoize extension methods. So in stead of writing this plumbing code, I could just add an attribute:
class Foo
{
[Memoize]
public int Fibonacci(int n)
{
return n > 1 ? Fibonacci(n-1) + Fibonacci(n-2) : n;
}
}
Honestly, I know this is looking more like compiler sugar that should be converted by a preprocessor than actual code generation but my question is:
What do you think is the best way to find the methods in a c# source file that have a given attribute, parsing out the parametertypes and returntype, and generating a delegate that matches this fingerprint
What would be the best way to integrate this into the build process, without actually overwriting my code. Is it possible to do some preprocessing on the source files before passing it on to the compiler?
Thanks for any and all ideas.
Update:
I have looked into the Postsharp library as Shay had suggested, and it seemed very suited for the job on non time-critical applications like Transaction Management, Tracing or Security.
However when using it in a time-critical context it proved a LOT slower than the delegate. One million iterations of the Fibonacci example with each implementation resulted in a 80x slower runtime. (0.012ms postsharp vs 0.00015ms delegate per call)
But honestly the result is completely acceptable in the context in which I intend to use it. Thanks for the responses!
Update2:
Apparently the author of Postsharp is working hard on a release 2.0 which will include, among other things, performance improvements in produced code, and compile time.
I bump into this memoizer attribute using postsharp
I've used the following Memoize function in a project of mine:
public class Foo
{
public int Fibonacci(int n)
{
return n > 1 ? Fibonacci(n - 1) + Fibonacci(n - 2) : n;
}
}
class Program
{
public static Func<Т, TResult> Memoize<Т, TResult>(Func<Т, TResult> f) where Т : IEquatable<Т>
{
Dictionary<Т, TResult> map = new Dictionary<Т, TResult>();
return a =>
{
TResult local;
if (!TryGetValue<Т, TResult>(map, a, out local))
{
local = f(a);
map.Add(a, local);
}
return local;
};
}
private static bool TryGetValue<Т, TResult>(Dictionary<Т, TResult> map, Т key, out TResult value) where Т : IEquatable<Т>
{
EqualityComparer<Т> comparer = EqualityComparer<Т>.Default;
foreach (KeyValuePair<Т, TResult> pair in map)
{
if (comparer.Equals(pair.Key, key))
{
value = pair.Value;
return true;
}
}
value = default(TResult);
return false;
}
static void Main(string[] args)
{
var foo = new Foo();
// Transform the original function and render it with memory
var memoizedFibonacci = Memoize<int, int>(foo.Fibonacci);
// memoizedFibonacci is a transformation of the original function that can be used from now on:
// Note that only the first call will hit the original function
Console.WriteLine(memoizedFibonacci(3));
Console.WriteLine(memoizedFibonacci(3));
Console.WriteLine(memoizedFibonacci(3));
Console.WriteLine(memoizedFibonacci(3));
}
}
In my project I needed only functions with a single argument that implement IEquatable<Т> but this can be generalized even further.
Another important remark is that this code is not thread safe. If you need thread safety you will need to synchronize the read/write access to the internal map hashtable.
To specifically address your points:
This would be likely too hard to do
in the way that you describe, as
you'd need a full-blown C# grammar
parser. What might be a more viable
alternative is writing a managed app
which could load the compiled
assembly and extract type
information using Reflection. This
would involve getting all Type
objects in a given assembly, looking
for methods on the types, retrieving
the custom attribute, and then
emitting the memoization code (this
part might be a bit difficult).
If you go the route I mention in #1, you could simply
add a post-build step to run your tool. Visual Studio
(which uses MSBuild underneath) makes this relatively easy.
If you write a plugin to PostSharp instead of using its LAOS library, you would get no performance hit.

Categories

Resources