Strange behavior in Uri-class (.NET) - c#

Why does the Uri class URL-decode the URL that I pass to its constructor, and how can I prevent this?
Example (look at the querystring value "options"):
string url = "http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3";
Uri uri = new Uri(url); // http://www.example.com/default.aspx?id=1&name=andreas&options=one=1&two=2&three=3
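Update:
// Encoded: ?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
Request.QueryString["options"] // returns "one=1&two=2&three=3"
// Decoded: ?id=1&name=andreas&options=one=1&two=2&three=3
Request.QueryString["options"] // returns only "one=1"
This is my problem :) Once the query string is decoded, the inner & characters split two=2 and three=3 off into parameters of their own.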
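For reference, an options value like the one in the question can be produced with Uri.EscapeDataString, which percent-encodes reserved characters such as = and &. This is a minimal sketch; QueryBuilder and BuildUrl are hypothetical names, and it assumes the base URL already contains a query string. EscapeDataString emits uppercase hex digits (%3D rather than %3d); the two forms are equivalent.

```csharp
using System;

public static class QueryBuilder
{
    // Hypothetical helper: appends a nested "options" value to a URL that already
    // has a query string, escaping it so its '=' and '&' are not read as
    // top-level separators by the receiving page.
    public static string BuildUrl(string baseUrlWithQuery, string options)
    {
        return baseUrlWithQuery + "&options=" + Uri.EscapeDataString(options);
    }
}
```

For example, BuildUrl("http://www.example.com/default.aspx?id=1&name=andreas", "one=1&two=2&three=3") returns the question's URL with options=one%3D1%26two%3D2%26three%3D3.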

Why exactly? You can get at the encoded version using uri.AbsoluteUri; it is ToString() that returns the unescaped form.
EDIT
Console.WriteLine("1) " + uri.AbsoluteUri);
Console.WriteLine("2) " + uri.Query);
OUT:
1) http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
2) ?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
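Since uri.Query keeps the escapes, the options value can be recovered by splitting the query on & and = first and only then unescaping each piece. A minimal sketch, assuming plain name=value pairs (no + for spaces); QueryParser is a hypothetical name, and System.Web's HttpUtility.ParseQueryString does a similar job if that assembly is available.

```csharp
using System;
using System.Collections.Generic;

public static class QueryParser
{
    // Splits the still-encoded query on '&' and '=' first, and only then
    // unescapes each name and value. Decoding before splitting is what turns
    // options=one%3d1%26two%3d2 into several separate parameters.
    public static Dictionary<string, string> Parse(Uri uri)
    {
        var result = new Dictionary<string, string>();
        string query = uri.Query.TrimStart('?');
        if (query.Length == 0) return result;

        foreach (string pair in query.Split('&'))
        {
            if (pair.Length == 0) continue; // tolerate "a=1&&b=2"
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            // '+' as space is a form-encoding convention; this sketch leaves it alone.
            result[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
        }
        return result;
    }
}
```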

I would expect that from a Uri class. I'm quite sure it still does the right thing if you use it with, e.g., the WebClient class (i.e. WebClient.OpenRead(Uri uri)). What's the problem in your case?

This is how .NET behaves internally. Earlier versions offered a Uri constructor that took a Boolean argument telling it whether to escape, but that constructor has been marked obsolete.
The only way around it is hackish: calling a private method directly through reflection:
using System;
using System.Reflection;

string url = "http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3";
Uri uri = new Uri(url);
// CreateThis is a private .NET Framework method, here called as (url, dontEscape: true, kind); it is an internal detail and may differ or be missing in other versions
MethodInfo mi = uri.GetType().GetMethod("CreateThis", BindingFlags.NonPublic | BindingFlags.Instance);
if (mi != null)
    mi.Invoke(uri, new object[] { url, true, UriKind.RelativeOrAbsolute });
This worked for me in a quick test, but it is not ideal, since you are hacking into .NET internal code.


Convert a string or html file to C# HtmlDocument without using WebBrowser or HAP

The only solution I could find was using:
mshtml.HTMLDocument htmldocu = new mshtml.HTMLDocument();
htmldocu.createDocumentFromUrl(url, "");
I am not sure about the performance, though it should be better than loading the HTML file into a WebBrowser and grabbing the HtmlDocument from there. In any case, that code does not work on my machine: the application crashes when it tries to execute the second line.
Does anyone have an efficient approach, or any other way to do this?
NOTE: Please understand that I need the HtmlDocument object for DOM processing. I do not need the html string.
Use the DownloadString method of the WebClient class (System.Net), e.g.:
WebClient client = new WebClient();
string reply = client.DownloadString("http://www.google.com");
After this runs, reply contains the HTML markup returned by http://www.google.com.
See WebClient.DownloadString on MSDN.
In an attempt to answer your actual question from four years ago (at the time of me posting this answer), I'm providing a working solution. I wouldn't be surprised if you found another way to do this, either, so this is mostly for other people searching for a similar solution. Keep in mind, however, that this is considered
- somewhat obsolete (the actual use of HtmlDocument)
- not the best way to handle HTML DOM parsing (the preferred solution is to use HtmlAgilityPack or CsQuery or some other method using actual parsing and not regular expressions)
- extremely hacky and therefore not the safest/most compatible way to do it
- something you really should not be doing
Additionally, keep in mind that HtmlDocument is really just a wrapper for mshtml.HTMLDocument2, so it is technically slower than just using a COM wrapper directly, but I completely understand the use case simply for ease of coding.
If you're cool with all of the above, here's how to accomplish what you want.
using System;
using System.Reflection;
using System.Windows.Forms;

public class HtmlDocumentFactory
{
    private static Type htmlDocType = typeof(System.Windows.Forms.HtmlDocument);
    private static Type htmlShimManagerType = null;
    private static object htmlShimSingleton = null;
    private static ConstructorInfo docCtor = null;

    public static HtmlDocument Create()
    {
        if (htmlShimManagerType == null)
        {
            // get a type reference to the internal HtmlShimManager class
            htmlShimManagerType = htmlDocType.Assembly.GetType(
                "System.Windows.Forms.HtmlShimManager"
            );
            // locate the non-public parameterless constructor for HtmlShimManager
            var shimCtor = htmlShimManagerType.GetConstructor(
                BindingFlags.NonPublic | BindingFlags.Instance, null, Type.EmptyTypes, null
            );
            // create a new HtmlShimManager object and keep it for the rest of the
            // assembly instance
            htmlShimSingleton = shimCtor.Invoke(null);
        }
        if (docCtor == null)
        {
            // get the only constructor for HtmlDocument (which is not public)
            docCtor = htmlDocType.GetConstructors(
                BindingFlags.NonPublic | BindingFlags.Instance
            )[0];
        }
        // create an instance of mshtml's HTMLDocument COM class (which implements
        // IHTMLDocument2) from its class ID
        object htmlDoc2Inst = Activator.CreateInstance(Type.GetTypeFromCLSID(
            new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")
        ));
        var argValues = new object[] { htmlShimSingleton, htmlDoc2Inst };
        // create a new HtmlDocument without involving WebBrowser
        return (HtmlDocument)docCtor.Invoke(argValues);
    }
}
To use it:
var htmlDoc = HtmlDocumentFactory.Create();
htmlDoc.Write("<html><body><div>Hello, world!</div></body></html>");
Console.WriteLine(htmlDoc.Body.InnerText);
// output:
// Hello, world!
I have not tested this code directly; I translated it from an old PowerShell script that needed the same functionality you're requesting. If it fails, let me know. The functionality is there, but the code might need very minor tweaking to work.

How to check that a uri string is valid

How do you check that a uri string is valid (that you can feed it to the Uri constructor)?
So far I only have the following, but for obvious reasons I'd prefer a less brute-force approach:
Boolean IsValidUri(String uri)
{
try
{
new Uri(uri);
return true;
}
catch
{
return false;
}
}
I tried Uri.IsWellFormedUriString but it doesn't seem to like everything that you can throw at the constructor. For example:
String test = @"C:\File.txt";
Console.WriteLine("Uri.IsWellFormedUriString says: {0}", Uri.IsWellFormedUriString(test, UriKind.RelativeOrAbsolute));
Console.WriteLine("IsValidUri says: {0}", IsValidUri(test));
The output will be:
Uri.IsWellFormedUriString says: False
IsValidUri says: True
Update/Answer
The Uri constructor uses kind Absolute by default. This was causing a discrepancy when I tried using Uri.TryCreate and the constructor. You do get the expected outcome if you match the UriKind for both the constructor and TryCreate.
A well-formed URI implies conformance with certain RFCs. The local path in your example is not conformant with these. Read more in the IsWellFormedUriString documentation.
A false result from that method does not imply that the Uri class will not be able to parse the input. While the URI input might not be RFC conformant, it still can be a valid URI.
Update: And to answer your question - as the Uri documentation shows, there is a static method called TryCreate that will attempt exactly what you want and return true or false (and the actual Uri instance if true).
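A minimal sketch of that suggestion, assuming you want to accept exactly what the single-argument constructor accepts (which, per the update above, behaves like UriKind.Absolute). UriValidation and CanConstruct are hypothetical names. Unlike Uri.IsWellFormedUriString, this does not require RFC conformance: for example, an unescaped space in the path is accepted, because the constructor escapes it.

```csharp
using System;

public static class UriValidation
{
    // Hypothetical replacement for the try/catch IsValidUri helper: no exception
    // is thrown for invalid input. Pass the same UriKind you will later use when
    // constructing the Uri; mismatched kinds are what made TryCreate and the
    // constructor appear to disagree.
    public static bool CanConstruct(string candidate, UriKind kind = UriKind.Absolute)
    {
        return Uri.TryCreate(candidate, kind, out _);
    }
}
```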
Since the accepted answer doesn't provide an explicit example, here is some code to validate URIs in C#:
Uri outUri;
if (Uri.TryCreate("ThisIsAnInvalidAbsoluteURI", UriKind.Absolute, out outUri)
&& (outUri.Scheme == Uri.UriSchemeHttp || outUri.Scheme == Uri.UriSchemeHttps))
{
//Do something with your validated Absolute URI...
}
Assuming we only want to support absolute URI and HTTP requests, here is a function that does what you want:
public static bool IsValidURI(string uri)
{
if (!Uri.IsWellFormedUriString(uri, UriKind.Absolute))
return false;
Uri tmp;
if (!Uri.TryCreate(uri, UriKind.Absolute, out tmp))
return false;
return tmp.Scheme == Uri.UriSchemeHttp || tmp.Scheme == Uri.UriSchemeHttps;
}
In my case I just wanted to test the URI, and I didn't want to slow the application down doing it:
Boolean IsValidUri(String uri){
return Uri.IsWellFormedUriString(uri, UriKind.Absolute);
}
Try it:
private bool IsValidUrl(string address)
{
return Uri.IsWellFormedUriString(address, UriKind.RelativeOrAbsolute);
}
In your case the uri argument is an absolute path that refers to a file location, so, per the method's documentation, it returns false. See the Uri.IsWellFormedUriString documentation for details.

HttpWebRequest Url escaping

I know, the title sounds like this question has been addressed many times. But I am struggling with a specific case and I am very confused over it. Hopefully a seasoned C#'er could point me in the correct direction.
I have the code:
string serviceURL = "https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports";
HttpWebRequest dataRequest = (HttpWebRequest)WebRequest.Create(serviceURL);
Now when I inspect dataRequest in QuickWatch, I see:
RequestUri: {https://www.domain.com/service/tables/bucketname/tables/testtable/imports}
It looks like HttpWebRequest has changed both %2F sequences to /. However, the server needs the requested URI to be exactly as serviceURL is written, containing the %2F.
Is there any way to get the HttpWebRequest class to call the Url:
https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports
Many thanks! I am at a complete loss here...
-Brett
Kyle posted the answer in a comment, so to make it official, see the question "GETting a URL with an url-encoded slash".
It's a weird workaround, but it gets the job done.
As long as the problem is %2F being unescaped to "/", there are solutions out there: one involves a hack and, for newer versions of .NET, an app.config setting. Check here: How to make System.Uri not to unescape %2f (slash) in path?
However, I still have to figure out how to prevent it from unescaping some specifically escaped characters, like '(' and ')' (%28 and %29). I have tried all the settings and hacks I could find to prevent the Uri class from delivering a partially unescaped path to the WebRequest. They happily prevent %2F from being unescaped, but not %28 and %29, and possibly not most of the other specifically escaped characters either.
It seems that WebRequest asks the Uri object for a single value when it builds the "GET /path HTTP/1.1" request line: Uri.PathAndQuery, which in turn calls its UriParser's GetComponents.
If you download from MediaFire and the URL contains %28 and %29, you get into an infinite redirect loop, because .NET keeps changing %28 and %29 to '(' and ')' and following the redirect (exception: "Too many automatic redirections were attempted").
So this is a solution for those who are stuck and have not been able to find a way to prevent the unescape of some characters.
The only way I have found to override this (currently using .NET 4.6) and deliver my own PathAndQuery is a combination of inheriting from UriParser and hacking its use.
using System;
using System.Reflection;
using System.Text.RegularExpressions;

public sealed class MyUriParser : System.UriParser
{
    private readonly UriParser _originalParser;
    private readonly MethodInfo _getComponentsMethod;

    public MyUriParser(UriParser originalParser) : base()
    {
        _originalParser = originalParser;
        // GetComponents is protected, so it can only be called on another parser instance via reflection
        _getComponentsMethod = typeof(UriParser).GetMethod("GetComponents", BindingFlags.NonPublic | BindingFlags.Instance);
        if (_getComponentsMethod == null)
        {
            throw new MissingMethodException("UriParser", "GetComponents");
        }
    }

    // Splits the original (still escaped) string into its parts; '@' separates user info from host
    private static readonly Regex rx = new Regex(@"^(?<Scheme>[^:]+):(?://((?<User>[^@/]+)@)?(?<Host>[^@:/?#]+)(:(?<Port>\d+))?)?(?<Path>([^?#]*)?)?(\?(?<Query>[^#]*))?(#(?<Fragment>.*))?$", RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.Singleline);

    protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
    {
        if (components == UriComponents.PathAndQuery)
        {
            // Return the path and query exactly as written in the original string, without unescaping
            var match = rx.Match(uri.OriginalString);
            if (match.Success)
            {
                var path = match.Groups["Path"].Value;
                var query = match.Groups["Query"];
                // Group.Value is never null, so test Success to know whether a query was present
                return query.Success ? $"{path}?{query.Value}" : path;
            }
        }
        // Everything else is delegated to the original parser
        return (string)_getComponentsMethod.Invoke(_originalParser, new object[] { uri, components, format });
    }
}
And then hacking it into the Uri instance by replacing its UriParser with this one.
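public static Uri CreateUri(string url)
{
    var uri = new Uri(url);
    if (url.Contains("%28") || url.Contains("%29"))
    {
        // "m_Syntax", "m_Scheme" and "m_Port" are private .NET Framework field names
        var originalParser = ReflectionHelper.GetValueByReflection(uri, "m_Syntax") as UriParser;
        var parser = new MyUriParser(originalParser);
        // copy the scheme name and default port from the parser being replaced (works for http and https)
        ReflectionHelper.SetValueByReflection(parser, "m_Scheme", ReflectionHelper.GetValueByReflection(originalParser, "m_Scheme"));
        ReflectionHelper.SetValueByReflection(parser, "m_Port", ReflectionHelper.GetValueByReflection(originalParser, "m_Port"));
        ReflectionHelper.SetValueByReflection(uri, "m_Syntax", parser);
    }
    return uri;
}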
Due to the way UriParser works, it normally needs to be registered to have its port and scheme name set, so these two values have to be set by reflection, since we are not registering it the correct way. I have not found a way to register "http", as it already exists. ReflectionHelper is just a class of mine and can easily be replaced with plain reflection code.
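The answer does not show ReflectionHelper. A minimal hypothetical stand-in for its two methods might look like this; it assumes the named members are instance fields. Private fields are only found through the type that declares them, so the lookup walks up the base types (m_Scheme and m_Port are declared on UriParser, not on MyUriParser).

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the answer's ReflectionHelper; its real code is not shown.
public static class ReflectionHelper
{
    // Walks from the object's runtime type up through its base types, because
    // GetField on a derived type does not return private fields of a base type.
    private static FieldInfo FindField(Type type, string fieldName)
    {
        for (Type t = type; t != null; t = t.BaseType)
        {
            FieldInfo field = t.GetField(fieldName,
                BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.DeclaredOnly);
            if (field != null) return field;
        }
        throw new MissingFieldException(type.FullName, fieldName);
    }

    public static object GetValueByReflection(object target, string fieldName)
    {
        return FindField(target.GetType(), fieldName).GetValue(target);
    }

    public static void SetValueByReflection(object target, string fieldName, object value)
    {
        FindField(target.GetType(), fieldName).SetValue(target, value);
    }
}
```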
Then call it like this:
HttpWebRequest dataRequest = (HttpWebRequest)WebRequest.Create(CreateUri(serviceURL));
string serviceURL = Uri.EscapeUriString("https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports");

Alternatives to .NET provided apis regarding uris and urls

I've recently come to the realization that the .NET APIs for working with URLs and URIs frequently come up short on even basic functionality (at least, they don't make it easy), including things such as generating an FQDN URL from a relative path, forcing https or back to http, getting the root of the site, combining relative URLs properly and so forth.
Are there any alternative libraries out there that have put all of these type of functionality in a simple and reliable project?
I've certainly found myself doing much the same URI-manipulation code more than once, in .NET, but I don't see your cases as places it lacks.
Full URI from relative Uri:
new Uri(baseUri, relative) // (works whether relative is a string or a Uri).
Obtaining the actual FQDN:
string host = uri.Host;
string fqdn = host.EndsWith(".") ? host : host + ".";
Forcing https or back to http:
UriBuilder toHttp = new UriBuilder(someUri);
toHttp.Scheme = "http";
toHttp.Port = 80;
return toHttp.Uri;
UriBuilder toHttps = new UriBuilder(someUri);
toHttps.Scheme = "https";
toHttps.Port = 443;
return toHttps.Uri;
Getting the root of the site:
new Uri(startingUri, "/");
Combining relative urls properly:
new Uri(baseUri, relUri); // We had this one already.
Only two of these are more than a single method call, and of those obtaining the FQDN is pretty obscure (unless rather than wanting the dot-ended FQDN you just wanted the absolute URI, in which case we're back to a single method call).
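The trailing slash on the base URI is what usually makes combining relative URLs look broken: Uri follows standard reference resolution, in which the last path segment of the base is replaced unless the base ends in /. A small sketch of that behavior; UriResolution.Resolve is a hypothetical wrapper around the Uri(Uri, string) constructor.

```csharp
using System;

public static class UriResolution
{
    // Hypothetical wrapper: resolves a reference against a base URI with the
    // Uri(Uri, string) constructor and returns the escaped absolute form.
    public static string Resolve(string baseUri, string reference)
    {
        return new Uri(new Uri(baseUri), reference).AbsoluteUri;
    }
}
```

With a base of http://example.com/a/b, the reference c resolves to http://example.com/a/c; with http://example.com/a/b/ it resolves to http://example.com/a/b/c.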
There is a single method version of the HTTPS/HTTP switching, though it's actually more cumbersome since it calls several properties of the Uri object. I can live with it taking a few lines to do this switch.
Still, to provide a new API one need only supply:
public static Uri SetHttpPrivacy(this Uri uri, bool privacy)
{
UriBuilder ub = new UriBuilder(uri);
if(privacy)
{
ub.Scheme = "https";
ub.Port = 443;
}
else
{
ub.Scheme = "http";
ub.Port = 80;
}
return ub.Uri;
}
I really can't see how an API could possibly be any more concise in the other cases.
XUri is a nice class that is part of the open source project from MindTouch
http://developer.mindtouch.com/en/ref/dream/MindTouch.Dream/XUri?highlight=XUri
This article includes a quick sample on how to use it.
http://blog.developer.mindtouch.com/2009/05/18/consuming-rest-services-and-tdd-with-plug/
I am a fan of it. It's a little overkill assembly-wise if you are only going to use the XUri portion, but there are other really nice things in the library too.
I also use a combination of extension methods and the System.IO.Path class.
These are just blurbs for example.
// requires: using System; using System.Collections.Specialized; using System.Web;
public static Uri SecureIfRemote(this Uri uri){
    if(!System.Web.HttpContext.Current.Request.IsSecureConnection &&
       !System.Web.HttpContext.Current.Request.IsLocal){
        // build the secure URI, e.g. with UriBuilder as in the previous answer
        return new UriBuilder(uri) { Scheme = Uri.UriSchemeHttps, Port = 443 }.Uri;
    }
    return uri;
}
public static NameValueCollection ParseQueryString(Uri uri){
    return uri.Query.ParseQueryString();
}
public static NameValueCollection ParseQueryString(this string s)
{
    return HttpUtility.ParseQueryString(s);
}

Alternative to .NET's Uri implementation?

I have a problem with the .NET's Uri implementation. It seems that if the scheme is "ftp", the query part is not parsed as a Query, but as a part of the path instead.
Take the following code for example:
Uri testuri = new Uri("ftp://user:pass@localhost/?passive=true");
Console.WriteLine(testuri.Query); // Outputs an empty string
Console.WriteLine(testuri.AbsolutePath); // Outputs "/%3Fpassive=true"
It seems to me that the Uri class wrongly parses the query part as part of the path. However, if I change the scheme to http, the result is as expected:
Uri testuri = new Uri("http://user:pass@localhost/?passive=true");
Console.WriteLine(testuri.Query); // Outputs "?passive=true"
Console.WriteLine(testuri.AbsolutePath); // Outputs "/"
Does anyone have a solution to this, or know of an alternative Uri class that works as expected?
Well, the problem is not that I am unable to create an FTP connection, but that URIs are not parsed according to RFC 2396.
What I actually intended to do was to create a factory that provides implementations of a generic file transfer interface (containing get and put methods), based on a given connection URI. The URI defines the protocol, user info, host and path, and any properties that need to be passed should go in the query part of the URI (such as the passive mode option for the FTP connection).
However, this proved difficult with the .NET Uri implementation, because it seems to parse the query part of URIs differently depending on the scheme.
So I was hoping that someone knew a workaround to this, or of an alternative to the seemingly broken .NET Uri implementation. Would be nice to know before spending hours implementing my own.
I have been struggling with the same issue for a while. Attempting to replace the existing UriParser for the "ftp" scheme using UriParser.Register throws an InvalidOperationException because the scheme is already registered.
The solution I have come up with involves using reflection to modify the existing ftp parser so that it allows the query string. This is based on a workaround to another UriParser bug.
using System;
using System.Reflection;

// GetSyntax and m_Flags are private .NET Framework members
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", BindingFlags.Static | BindingFlags.NonPublic);
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", BindingFlags.Instance | BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
    UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { "ftp" });
    if (parser != null)
    {
        int flagsValue = (int)flagsField.GetValue(parser);
        // Set the MayHaveQuery flag
        int MayHaveQuery = 0x20;
        if ((flagsValue & MayHaveQuery) == 0) flagsField.SetValue(parser, flagsValue | MayHaveQuery);
    }
}
Run that somewhere in your initialization, and your ftp Uris will have the query string go into the Query parameter, as you would expect, instead of Path.
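If the reflection hack feels too fragile, another option is to take the query off the connection string before System.Uri sees it, so the ftp parser's handling of ? never comes into play. A minimal sketch under stated assumptions: FtpConnectionString.Parse is a hypothetical helper, the first ? is assumed to start the query (so a ? inside the user info would break it), and fragments are ignored.

```csharp
using System;
using System.Collections.Generic;

public static class FtpConnectionString
{
    // Hypothetical helper: separates "?name=value&..." from the address before
    // constructing the Uri, so scheme-specific query handling never applies.
    public static Uri Parse(string connection, out Dictionary<string, string> options)
    {
        options = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        int q = connection.IndexOf('?');
        string address = q < 0 ? connection : connection.Substring(0, q);
        if (q >= 0)
        {
            foreach (string pair in connection.Substring(q + 1).Split('&'))
            {
                if (pair.Length == 0) continue;
                int eq = pair.IndexOf('=');
                string name = eq < 0 ? pair : pair.Substring(0, eq);
                string value = eq < 0 ? "" : pair.Substring(eq + 1);
                options[Uri.UnescapeDataString(name)] = Uri.UnescapeDataString(value);
            }
        }
        return new Uri(address);
    }
}
```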
You should use the FtpWebRequest and FtpWebResponse classes unless you have a specific reason not to.
// requires: using System; using System.IO; using System.Net; (Response.Write assumes an ASP.NET page)
FtpWebRequest fwr = (FtpWebRequest)WebRequest.Create(new Uri("ftp://uri"));
fwr.Method = WebRequestMethods.Ftp.UploadFile;
fwr.Credentials = new NetworkCredential("user", "pass");
// ReadAllBytes reads the whole file; a single Stream.Read call may return fewer bytes than requested
byte[] fileContents = File.ReadAllBytes("localpath");
using (Stream writer = fwr.GetRequestStream())
{
    writer.Write(fileContents, 0, fileContents.Length);
}
using (FtpWebResponse frp = (FtpWebResponse)fwr.GetResponse())
{
    Response.Write(frp.StatusDescription);
}
You have to use a class specific to the FTP protocol, such as FtpWebRequest, which has a Uri property (RequestUri).
I think you should look in those classes for a Uri parser.
