Unexpected behavior of adjacent wildcards in VB "Like" operator

Unexpected behavior of adjacent wildcards in VB "Like" operator - c#

I am working on some C# code which uses the .NET class Microsoft.VisualBasic.CompilerServices.LikeOperator. (Some additional context: I am porting this code to target .NET Standard 2.0, which does not support that class. It is missing from the .NET Standard 2.0 flavor of the Microsoft.VisualBasic nuget package. So I would like to replace the usage of LikeOperator, but unfortunately there is other code and maybe even end users which depend on the pattern language.)
While playing with LikeOperator.LikeString to make sure that I understand exactly what it does, I got an unexpected result. I used the pattern "foo*?bar" to express that I want to match any string which starts with "foo", ends with "bar", and has one or more characters in between.
However, the following call unexpectedly returns true:
LikeOperator.LikeString("foobar", "foo*?bar", CompareMethod.Text)
As far as I understand, the ? wildcard should force at least one character to be present between foo and bar, so I don't understand this result. Are these adjacent *? wildcards a special case?
edit: the like operator does seem to work as expected when tested in the VB.NET implementation of mono 6.12.0. So there seems to be a difference between the .NET Framework and mono here. Could this actually be a bug in Microsoft's implementation?

Related

Alternatives of CompileToMethod in .Net Standard

I'm now porting some library that uses expressions to .Net Core application and encountered a problem that all my logic is based on LambdaExpression.CompileToMethod which is simply missing in. Here is sample code:
public static MethodInfo CompileToInstanceMethod(this LambdaExpression expression, TypeBuilder tb, string methodName, MethodAttributes attributes)
{
...
var method = tb.DefineMethod($"<{proxy.Name}>__StaticProxy", MethodAttributes.Private | MethodAttributes.Static, proxy.ReturnType, paramTypes);
expression.CompileToMethod(method);
...
}
Is it possible to rewrite it somehow to make it possible to generate methods using Expressions? I already can do it with Emit but it's quite complex and i'd like to avoid it in favor of high-level Expressions.
I tried to use var method = expression.Compile().GetMethodInfo(); but in this case I get an error:
System.InvalidOperationException : Unable to import a global method or
field from a different module.
I know that I can emit IL manually, but I need exactly convert Expression -> to MethodInfo binded to specific TypeBuilder instead of building myself DynamicMethod on it.

It is not an ideal solution but it is worth considering if you don't want to write everything from the scratch:
If you look on CompileToMethod implementation, you will see that under the hood it uses internal LambdaCompiler class.
If you dig even deeper, you willl see that LambdaCompiler uses System.Reflection.Emit to convert lambdas into MethodInfo.
System.Reflection.Emit is supported by .NET Core.
Taking this into account, my proposition is to try to reuse the LambdaCompiler source code. You can find it here.
The biggest problem with this solution is that:
LambdaCompiler is spread among many files so it may be cumbersome to find what is needed to compile it.
LambdaCompiler may use some API which is not supported by .NET Core at all.
A few additional comments:
If you want to check which API is supported by which platform use .NET API Catalog.
If you want to see differences between .NET standard versions use this site.

Disclaimer: I am author of the library.
I have created Expression Compiler, which has similar API to that of Linq Expressions, with slight changes. https://github.com/yantrajs/yantra/wiki/Expression-Compiler
var a = YExpression.Parameter(typeof(int));
var b = YExpression.Parameter(typeof(int));
var exp = YExpression.Lambda<Func<int,int,int>>("add",
YExpression.Binary(a, YOperator.Add, b),
new YParameterExpression[] { a, b });
var fx = exp.CompileToStaticMethod(methodBuilder);
Assert.AreEqual(1, fx(1, 0));
Assert.AreEqual(3, fx(1, 2));
This library is part of JavaScript compiler we have created. We are actively developing it and we have added features of generators and async/await in JavaScript, so Instead of using Expression Compiler, you can create debuggable JavaScript code and run C# code easily into it.

I ran into the same issue when porting some code to netstandard. My solution was to compile the lambda to a Func using the Compile method, store the Func in a static field that I added to my dynamic type, then in my dynamic method I simply load and call the Func from that static field. This allows me to create the lambda using the LINQ Expression APIs instead of reflection emit (which would have been painful), but still have my dynamic type implement an interface (which was another requirement for my scenario).
Feels like a bit of a hack, but it works, and is probably easier than trying to recreate the CompileToMethod functionality via LambdaCompiler.

Attempting to get LambdaCompiler working on .NET Core
Building on Michal Komorowski's answer, I decided to give porting LambdaCompiler to .NET Core a try. You can find my effort here (GitHub link). The fact that the class is spread over multiple files is honestly one of the smallest problems here. A much bigger problem is that it relies on internal parts in the .NET Core codebase.
Quoting myself from the GitHub repo above:
Unfortunately, it is non-trivial because of (at least) the following issues:
AppDomain.CurrentDomain.DefineDynamicAssembly is unavailable in .NET Core - AssemblyBuilder.DefineDynamicAssembly replaces is. This SO answer describes how it can be used.
Assembly.DefineVersionInfoResource is unavailable.
Reliance on internal methods and properties, for example BlockExpression.ExpressionCount, BlockExpression.GetExpression, BinaryExpression.IsLiftedLogical etc.
For the reasons given above, an attempt to make this work as a standalone package is quite unfruitful in nature. The only realistic way to get this working would be to include it in .NET Core proper.
This, however, is problematic for other reasons. I believe the licensing is one of the stumbling stones here. Some of this code was probably written as part of the DLR effort, which it itself Apache-licensed. As soon as you have contributions from 3rd party contributors in the code base, relicensing it (to MIT, like the rest of the .NET Core codebase) becomes more or less impossible.
Other options
I think your best bet at the moment, depending on the use case, is one of these two:
Use the DLR (Dynamic Language Runtime), available from NuGet (Apache 2.0-licensed). This is the runtime that empowers IronPython, which is to the best of my knowledge the only actively maintained DLR-powered language out there. (Both IronRuby and IronJS seems to be effectively abandoned.) The DLR lets you define lambda expressions using Microsoft.Scripting.Ast.LambdaBuilder; this doesn't seem to be directly used by IronPython though. There is also the Microsoft.Scripting.Interpreter.LightCompiler class which seems to be quite interesting.
The DLR unfortunately seems to be quite poorly documented. I think there is a wiki being referred to by the CodePlex site, but it's offline (can probably be accessed by downloading the archive on CodePlex though).
Use Roslyn to compile the (dynamic) code for you. This probably has a bit of a learning curve as well; I am myself not very familiar with it yet unfortunately.
This seems to have quite a lot of curated links, tutorials etc: https://github.com/ironcev/awesome-roslyn. I would recommend this as a starting point. If you're specifically interested in building methods dynamically, these also seem worth reading:
https://gunnarpeipman.com/using-roslyn-to-build-object-to-object-mapper/
http://www.tugberkugurlu.com/archive/compiling-c-sharp-code-into-memory-and-executing-it-with-roslyn
Here are some other general Roslyn reading links. Most of these links are however focused on analyzing C# code here (which is one of the use cases for Roslyn), but Roslyn can be used to generate IL code (i.e. "compile") C# code as well.
The .NET Compiler Platform SDK: https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/
Get started with syntax analysis: https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/get-started/syntax-analysis
Tutorial: Write your first analyzer and code fix: https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/tutorials/how-to-write-csharp-analyzer-code-fix
There is also the third option, which is probably uninteresting for most of us:
Use System.Reflection.Emit directly, to generate the IL instructions. This is the approach used by e.g. the F# compiler.

Is there an equivalent to Expression.Parameter in C# 2.0

I ask because I'm trying to back port something to use with unity which uses C# 2.0. e.g. I have a line like this:
var patternSymbol = Expression.Parameter(typeof (T));
So can I substitute a bit of my own code to make this work? Note: I can change stuff around as I like but the further I get away from the original code the more work I'm likely to run into. Essentially I need to replicate the Expression/Parameter behaviour (from C# 3.0?) in C# 2.0.
(I'm a bit of a C# newbie and finding it difficult to find answers to what are now historical questions like this).

You cannot backport this code to .NET 2.0 because the entire LINQ subsystem is missing. System.Linq.Expressions.Expression class has been introduced as part of LINQ with C# 3.5, which was a big step forward for the .NET framework.
In particular, Expression.Parameter is not useful all by itself. Typically, you would find Expression.Lambda and a Compile calls further down the code, followed by a cast to Func<...>. Imitating these calls would require generating CLR code, which is almost like writing your own compiler. It's rarely worth the effort, though.
You need to see how the old code is used, and build a replacement from scratch. For example, you could replace a compiled expression with an interpreted one, which is considerably easier to build (at the price of lower run-time performance). In any event, you would end up writing an entirely new subsystem, not simply porting the code.

System.Uri.ToString behaviour change after VS2012 install

After installing VS2012 Premium on a dev machine a unit test failed, so the developer fixed the issue. When the changes were pushed to TeamCity the unit test failed. The project has not changed other than the solution file being upgraded to be compatible with VS2012. It still targets .net framework 4.0
I've isolated the problem to an issue with unicode characters being escaped when calling Uri.ToString. The following code replicates the behavior.
Imports NUnit.Framework
<TestFixture()>
Public Class UriTest
<Test()>
Public Sub UriToStringUrlDecodes()
Dim uri = New Uri("http://www.example.org/test?helloworld=foo%B6bar")
Assert.AreEqual("http://www.example.org/test?helloworld=foo¶bar", uri.ToString())
End Sub
End Class
Running this in VS2010 on a machine that does not have VS2012 installed succeeds, running this in VS2010 on a machine with VS2012 installed fails. Both using the latest version of NCrunch and NUnit from NuGet.
The messages from the failed assert are
Expected string length 46 but was 48. Strings differ at index 42.
Expected: "http://www.example.org/test?helloworld=foo¶bar"
But was: "http://www.example.org/test?helloworld=foo%B6bar"
-----------------------------------------------------^
The documentation on MSDN for both .NET 4 and .NET 4.5 shows that ToString should not encode this character, meaning that the old behavior should be the correct one.
A String instance that contains the unescaped canonical representation of the Uri instance. All characters are unescaped except #, ?, and %.
After installing VS2012, that unicode character is being escaped.
The file version of System.dll on the machine with VS2012 is 4.0.30319.17929
The file version of System.dll on the build server is 4.0.30319.236
Ignoring the merits of why we are using uri.ToString(), what we are testing and any potential work around. Can anyone explain why this behavior seems to have changed, or is this a bug?
Edit, here is the C# version
using System;
using NUnit.Framework;
namespace SystemUriCSharp
{
[TestFixture]
public class UriTest
{
[Test]
public void UriToStringDoesNotEscapeUnicodeCharacters()
{
var uri = new Uri(#"http://www.example.org/test?helloworld=foo%B6bar");
Assert.AreEqual(#"http://www.example.org/test?helloworld=foo¶bar", uri.ToString());
}
}
}
A bit of further investigation, if I target .NET 4.0 or .NET 4.5 the tests fail, if I switch it to .NET 3.5 then it succeeds.

There are some changes introduced in .NET Framework 4.5, which is installed along with VS2012, and which is also (to the best of my knowledge) a so called "in place upgrade". This means that it actually upgrades .NET Framework 4.
Furthermore, there are breaking changes documented in System.Uri. One of them says Unicode normalization form C (NFC) will no longer be performed on non-host portions of URIs. I am not sure whether this is applicable to your case, but it could serve as a good starting point in your investigation of the error.

The change is related to problems with earlier .NET versions, which have now changed to become more compliant to the standards. %B6 is UTF-16, but according to the standards UTF-8 should be used in the Uri, meaning that it should be %C2%B6. So as %B6 is not UTF-8 it is now correctly ignored and not decoded.
More details from the connect report quoted in verbatim below.
.NET 4.5 has enhanced and more compatible application of RFC 3987
which supports IRI parsing rules for URI's. IRIs are International
Resource Identifiers. This allows for non-ASCII characters to be in a
URI/IRI string to be parsed.
Prior to .NET 4.5, we had some inconsistent handling of IRIs. We had
an app.config entry with a default of false that you could turn on:
which did some IRI handling/parsing. However, it had some problems. In
particular it allowed for incorrect percent encoding handling.
Percent-encoded items in a URI/IRI string are supposed to be
percent-encoded UTF-8 octets according to RFC 3987. They are not
interpreted as percent-encoded UTF-16. So, handling “%B6” is incorrect
according to UTF-8 and no decoding will occur. The correct UTF-8
encoding for ¶ is actually “%C2%B6”.
If your string was this instead:
string strUri = #"http://www.example.com/test?helloworld=foo%C2%B6bar";
Then it will get normalized in the ToString() method and the
percent-encoding decoded and removed.
Can you provide more information about your application needs and the
use of ToString() method? Usually, we recommend the AbsoluteUri
property of the Uri object for most normalization needs.
If this issue is blocking your application development and business
needs then please let us know via the "netfx45compat at Microsoft dot
com" email address.
Thx,
Networking Team

In that situation you can't do like that.
The main issue is the character "¶".
In .Net we got a problem on character ¶.
You can make a research on that.
Take the uri' parameters one by one.
Add them by one by and compare them.
May be you can use a method for "¶" character to create it or replace it.
For example;
Dim uri = New Uri("http://www.example.org/test?helloworld=foo%B6bar")
Assert.AreEqual("http://www.example.org/test?helloworld=foo¶bar", uri.Host+uri.AbsolutePath+"?"+uri.Query)
that'll work
uri.AbsolutePath: /test
url.Host: http://www.example.org
uri.Query: helloworld=foo¶bar

Best approach to render MediaWiki in C#?

Question:
I want to render MediaWiki syntax (and I mean MediaWiki syntax as used by WikiPedia, not some other wiki format from some other engine such as WikiPlex), and that in C#.
Input: MediaWiki Markup string
Output: HTML string
There are some alternative mediawiki parsers, but nothing in C#, and additionally pinvoking C/C++ looks bleak, because of the structure of those libaries.
As syntax guidance, I use
http://en.wikipedia.org/wiki/Wikipedia:Cheatsheet
My first goal is to render that page's markup correctly.
Markup can be seen here:
http://en.wikipedia.org/w/index.php?title=Wikipedia:Cheatsheet&action=edit
Now, if I use Regex, it's not of much use, because one can't exactly say which tag ends which starting ones, especially when some elements, such as italic, become an attribute of the parent element.
On the other hand, parsing character by character is not a good approach either, because
for example ''' means bold, '' means italic, and ''''' means bold and italic...
I looked into porting some of the other parsers' code, but the java implementations are obscure, and the Python implementations have have a very different regex syntax.
The best approach I see so far would be to port mwlib to IronPython
http://www.mediawiki.org/wiki/Alternative_parsers
But frankly, I'm not looking forward to having the IronPython runtime added as a dependency to my application, and even if I would want to, the documentation is bad at best.

Update per 2017:
You can use ParseoidSharp to get a fully compatible MediaWiki-renderer.
It uses the official Wikipedia Parsoid library via NodeServices.
(NetStandard 2.0)
Since Parsoid is GPL 2.0, and and the GPL-code is invoked in nodejs in a separate process via network, you can even use any license you like ;)
Pre-2017
Problem solved.
As originally assumed, the solution lies in using one of the existing alternative parsers in C#.
WikiModel (Java) works well for that purpose.
First attempt was pinvoke kiwi.
It worked, but but failed because:
kiwi uses char* (fails on anything non-English/ASCII)
not thread safe.
bad because of the need have a native dll in the code for every architecture
(did add x86 and amd64, then it went kaboom on my ARM processor)
Second attempt was mwlib.
That failed because somehow IronPython doesn't work as it should.
Third attempt was Swebele, which essentially turned out to be academic vapoware.
The fourth attempt was using the original mediawiki renderer, using Phalanger. That failed because the MediaWiki renderer is not really modular.
The fifth attempt was using Wiky.php via Phalanger, which worked, but was slow and Wiky.php doesn't very completely implement MediaWiki.
The sixth attempt was using bliki via ikvmc, which failed because of the excessive use of 3rd party libraries ==> it compiles, but yields null-reference exceptions only
The seventh attempt was using JavaScript in C#, which worked but was very slow, plus the MediaWiki functionality implemented was very incomplete.
The 8th attempt was writing an own "parser" via Regex.
But the time required to make it work is just excessive, so I stopped.
The 9th attempt was successful.
Using ikvmc on WikiModel yields a useful dll.
The problem there was the example-code was hoplessly out of date.
But using google and the WikiModel sourcecode, I was able to piece it together.
The end-result can be found here:
https://github.com/ststeiger/MultiWikiParser

Why shouldn't this be possible with regular expressions?
inputString = Regex.Replace(inputString, #"(?:'''''')(.*?)(?:'''''')", #"<strong><em>$1</em></strong>");
inputString = Regex.Replace(inputString, #"(?:''')(.*?)(?:''')", #"<strong>$1</strong>");
inputString = Regex.Replace(inputString, #"(?:'')(.*?)(?:'')", #"<em>$1</em>");
This will, as far as I can see, render all 'Bold and italic', 'Bold' and 'Italic' text.

Here is how I once implemented a solution:
define your regular expressions for Markup->HTML conversion
regular expressions must be non greedy
collect the regular expressions in a Dictionary<char, List<RegEx>>
The char is the first (Markup) character in each RegEx, and RegEx's must be sorted by Markup keyword length desc, e.g. === before ==.
Iterate through the characters of the input string, and check if Dictionary.ContainsKey(char). If it does, search the List for matching RegEx. First matching RegEx wins.
As MediaWiki allows recursive markup (except for <pre> and others), the string inside the markup must also be processed in this fashion recursively.
If there is a match, skip ahead the number of characters matching the RegEx in input string. Otherwise proceed to next character.

Kiwi (https://github.com/aboutus/kiwi, mentioned on http://mediawiki.org/wiki/Alternative_parsers) may be a solution. Since it is C based, and I/O is simply done by stdin/stdout, it should not be too hard to create a "PInvoke"-able DLL from it.

As with the accepted solution I found parsoid is the best way forward as it's the official library - and has the greatest support for the wikimedia markup; that said I found ParseoidSharp to be using obsolete methods such as Microsoft.AspNetCore.NodeServices and really it's just a wrapper for a fairly old version of pasoid's npm package.
Since there is a fairly current version of parsoid in node.js you can use Jering.Javascript.NodeJS to do the same thing as ParseoidSharp, the steps are fairly similar also.
Install nodeJS (
download parsoid https://www.npmjs.com/package/parsoid place the required files in your project.
in powershell cd to your project
npm install
Then it's as simple as
output = StaticNodeJSService.InvokeFromFileAsync(Of String)(HttpContext.Current.Request.PhysicalApplicationPath & "./NodeScripts/parsee.js", args:=New Object() {Markup})
Bonus it's now much easier than ParseoidSharp's method to add the options required, e.g. you'll probably want to set the domain to your own domain.

How to name iOS-related elements in code?

I've been doing a lot of iPhone development lately, and I have a naming issue for which I can't think of a good solution. The problem is that somethimes I have to refer to "iOS" in variable, namespace, or class names. How should I do it? Suppose I have a version of "MyClass" that is designed for iOS. Should I call it:
iOSMyClass?
This is bad. Class names are supposed to start with a capital letter.
IOSMyClass?
This is bad. Now my class looks like an interface.
AppleMyClass?
This is better, but what if I create a version of the class for Macs?
AppleMobileMyClass?
This is better, but it's starting to get pretty verbose.
Any thoughts? I'm developing desktop software using C#.

Suggest going with your second choice, but a slight mod: IosMyClass. Consider the convention of 3 letter acronyms being Pascal cased (i.e. MVC in System.Web.Mvc). Of course, Apple flips that around in its implemention with iOS. However it sounds as if this is for the .NET space, and the desire is to follow its conventions.
There are handfuls of other classes (in the .NET Framework even), that start with the letter I. Admittedly, it's not ideal, as it's so close to the convention of prefixing interfaces with I. However, interfaces start with two upper case letters (I[A-Z]). That identifies them as an interface. So IosMyClass, by convention, is not an interface, but IIosFoo would be.

I know you specify C# as your language here, but I thought I should point out the standard Objective-C naming conventions, which you'll see as you look at other iOS code.
Objective-C lacks the concept of namespaces, so it is generally recommended that you prefix your class names with 2-3 capital letters that signify your company or framework to avoid naming collisions. Apple has been recommending 3 characters lately, because frameworks like Core Plot that use 2 letters have started having collisions with private classes in Apple's frameworks (like CPImage, in this case). For another take on the naming guidelines, see here.
For that reason, I would recommend against the use of a generic iOS prefix, because Apple might use that internally for something and cause your application to break. Admittedly, I don't know how MonoTouch C# classes interact with the Cocoa framework classes, but I'd want to avoid any chance of a problem like this.
Again, if I may point to the Core Plot framework, there is some code within that framework that is specific to Mac and some to iOS, yet separate classes are not used for each platform. Platform-specific methods are extracted from the main class files and placed in categories. These categories have the same name, but the Mac target compiles using one file and the iOS target uses another. Similarly, entire classes that are specific to each platform are given the same name and interface, but put in different files. Which file is used at compilation is determined by the platform target.
Therefore, I'd recommend against giving iOS-specific classes a unique name unless they have no analogue on your other target platforms.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.