I know, the title sounds like this question has been addressed many times. But I am struggling with a specific case and I am very confused over it. Hopefully a seasoned C#'er could point me in the correct direction.
I have the code:
string serviceURL = "https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports";
HttpWebRequest dataRequest = (HttpWebRequest)WebRequest.Create(serviceURL);
Now when I quickwatch dataRequest, I see that:
RequestUri: {https://www.domain.com/service/tables/bucketname/tables/testtable/imports}
And it looks like the HttpWebRequest has changed both %2F sequences to /. However, the server needs the requested URI to be exactly as serviceURL is written, with the %2F intact.
Is there any way to get the HttpWebRequest class to request the URL:
https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports
Many thanks! I am at a complete loss here...
-Brett
Kyle posted the answer in a comment, so to make it official:
GETting a URL with an url-encoded slash
It's a weird workaround, but it nevertheless gets the job done.
As long as the problem lies in %2F being unescaped to "/", there are solutions out there: one involves a hack, and for newer versions of .NET, an app.config setting. Check here: How to make System.Uri not to unescape %2f (slash) in path?
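For reference, the widely circulated pre-app.config workaround from that question clears the Uri instance's internal "not canonical" flags through reflection. Below is a minimal sketch, assuming the desktop .NET Framework field name m_Flags (an undocumented internal that differs on newer runtimes), with a helper class name made up for illustration:
using System;
using System.Reflection;

static class UriHelper
{
    // Keeps %2F (and dot segments) escaped in PathAndQuery by clearing the
    // internal PathNotCanonical/QueryNotCanonical flags. Pure hack.
    public static void LeaveDotsAndSlashesEscaped(Uri uri)
    {
        // Touch PathAndQuery first so the Uri finishes its internal parsing.
        var unused = uri.PathAndQuery;

        var flagsField = typeof(Uri).GetField("m_Flags",
            BindingFlags.Instance | BindingFlags.NonPublic);
        if (flagsField == null) return; // field renamed on this runtime

        ulong flags = (ulong)flagsField.GetValue(uri);
        flags &= ~(ulong)0x30; // Flags.PathNotCanonical | Flags.QueryNotCanonical
        flagsField.SetValue(uri, flags);
    }
}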
However, I have yet to figure out how to prevent it from unescaping other specifically escaped characters, like '(' and ')' (%28 and %29). I have tried all the settings and hacks I could find to prevent the Uri class from delivering a partially unescaped path to the WebRequest. The solutions will happily prevent %2F from being unescaped, but not %28 and %29, and probably most of the other specifically escaped characters.
It seems like the WebRequest asks the Uri object for exactly one value to build the "GET /path HTTP/1.1" request line: Uri.PathAndQuery, which in turn calls the parser's UriParser.GetComponents.
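You can see this directly by printing PathAndQuery. A quick sketch with a made-up URL, showing the behavior described here on the framework versions in question:
var uri = new Uri("http://www.example.com/files/archive%28part1%29.zip"); // hypothetical URL
Console.WriteLine(uri.OriginalString); // http://www.example.com/files/archive%28part1%29.zip
Console.WriteLine(uri.PathAndQuery);   // /files/archive(part1).zip - %28/%29 already unescaped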
If you want to download from mediafire and the URL contains %28 and %29, you will get into an infinite redirect loop, as .NET keeps changing %28 and %29 to '(' and ')' and following the redirect (exception: "Too many automatic redirections were attempted").
So this is a solution for those who are stuck and have not been able to find a way to prevent the unescaping of certain characters.
The only way I have found to override this (currently using .NET 4.6) and deliver my own PathAndQuery has been a combination of inheriting UriParser and hacking its use.
public sealed class MyUriParser : System.UriParser
{
private UriParser _originalParser;
private MethodInfo _getComponentsMethod;
public MyUriParser(UriParser originalParser) : base()
{
if (_originalParser == null)
{
_originalParser = originalParser;
_getComponentsMethod = typeof(UriParser).GetMethod("GetComponents", BindingFlags.NonPublic | BindingFlags.Instance);
if (_getComponentsMethod == null)
{
throw new MissingMethodException("UriParser", "GetComponents");
}
}
}
private static Regex rx = new Regex(@"^(?<Scheme>[^:]+):(?://((?<User>[^@/]+)@)?(?<Host>[^@:/?#]+)(:(?<Port>\d+))?)?(?<Path>([^?#]*)?)?(\?(?<Query>[^#]*))?(#(?<Fragment>.*))?$", RegexOptions.Compiled | RegexOptions.ExplicitCapture | RegexOptions.Singleline);
private Match m = null;
protected override string GetComponents(Uri uri, UriComponents components, UriFormat format)
{
var original = (string)_getComponentsMethod.Invoke(_originalParser, BindingFlags.InvokeMethod, null, new object[] { uri, components, format }, null);
if (components == UriComponents.PathAndQuery)
{
var reg = rx.Match(uri.OriginalString);
var path = reg.Groups["Path"]?.Value;
var query = reg.Groups["Query"]?.Value;
if (!string.IsNullOrEmpty(query)) return $"{path}?{query}";
return path;
}
return original;
}
}
And then hacking it into the Uri instance by replacing its UriParser with this one.
public static Uri CreateUri(string url)
{
var uri = new Uri(url);
if (url.Contains("%28") || url.Contains("%29"))
{
var originalParser = ReflectionHelper.GetValueByReflection(uri, "m_Syntax") as UriParser;
var parser = new MyUriParser(originalParser);
ReflectionHelper.SetValueByReflection(parser, "m_Scheme", "http");
ReflectionHelper.SetValueByReflection(parser, "m_Port", 80);
ReflectionHelper.SetValueByReflection(uri, "m_Syntax", parser);
}
return uri;
}
Due to the way UriParser works, it normally needs to be registered in order to have its port and scheme name set, so these two values have to be set by reflection since we are not registering it the proper way. I have not found a way to register "http" as it already exists. The ReflectionHelper is just a class I have, but it can be quickly replaced with normal reflection code.
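For completeness, a minimal stand-in for that helper could look something like this (a sketch; the field names it gets used with, m_Syntax/m_Scheme/m_Port, are desktop .NET Framework internals and were renamed in .NET Core):
public static class ReflectionHelper
{
    // Walk up the type hierarchy because some of the fields (m_Scheme, m_Port)
    // are private members of the base UriParser class, not of the derived type.
    private static FieldInfo FindField(object instance, string fieldName)
    {
        for (var type = instance.GetType(); type != null; type = type.BaseType)
        {
            var field = type.GetField(fieldName, BindingFlags.Instance | BindingFlags.NonPublic);
            if (field != null) return field;
        }
        return null;
    }

    public static object GetValueByReflection(object instance, string fieldName)
    {
        return FindField(instance, fieldName)?.GetValue(instance);
    }

    public static void SetValueByReflection(object instance, string fieldName, object value)
    {
        FindField(instance, fieldName)?.SetValue(instance, value);
    }
}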
Then call it like this:
HttpWebRequest dataRequest = (HttpWebRequest)WebRequest.Create(CreateUri(serviceURL));
string serviceURL = Uri.EscapeUriString("https://www.domain.com/service/tables/bucketname%2Ftables%2Ftesttable/imports");
Related
Given a URL as follows:
foo.bar.car.com.au
I need to extract foo.bar.
I came across the following code:
private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
if (host.Split('.').Length > 2)
{
int lastIndex = host.LastIndexOf(".");
int index = host.LastIndexOf(".", lastIndex - 1);
return host.Substring(0, index);
}
}
return null;
}
This gives me foo.bar.car. I want foo.bar. Should I just use split and take elements 0 and 1?
But then there is a possible www. prefix.
Is there an easy way to do this?
Given your requirement (you want the first two levels, not including 'www.'), I'd approach it something like this:
private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
var nodes = host.Split('.');
int startNode = 0;
if(nodes[0] == "www") startNode = 1;
return string.Format("{0}.{1}", nodes[startNode], nodes[startNode + 1]);
}
return null;
}
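A quick usage sketch (the URLs are made up; the Uri constructor needs a scheme, and the host needs enough labels for the indexing to work):
Console.WriteLine(GetSubDomain(new Uri("http://foo.bar.car.com.au"))); // foo.bar
Console.WriteLine(GetSubDomain(new Uri("http://www.bar.car.com.au"))); // bar.car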
I faced a similar problem and, based on the preceding answers, wrote this extension method. Most importantly, it takes a parameter that defines the "root" domain, i.e. whatever the consumer of the method considers to be the root. In the OP's case, the call would be
Uri uri = new Uri("http://foo.bar.car.com.au"); // the Uri constructor needs a scheme
uri.DnsSafeHost.GetSubdomain("car.com.au"); // returns foo.bar
uri.DnsSafeHost.GetSubdomain(); // returns foo.bar.car
Here's the extension method:
/// <summary>Gets the subdomain portion of a url, given a known "root" domain</summary>
public static string GetSubdomain(this string url, string domain = null)
{
var subdomain = url;
if(subdomain != null)
{
if(domain == null)
{
// Since we were not provided with a known domain, assume that second-to-last period divides the subdomain from the domain.
var nodes = url.Split('.');
var lastNodeIndex = nodes.Length - 1;
if(lastNodeIndex > 0)
domain = nodes[lastNodeIndex-1] + "." + nodes[lastNodeIndex];
}
// Verify that what we think is the domain is truly the ending of the hostname... otherwise we're hooped.
if (!subdomain.EndsWith(domain))
throw new ArgumentException("Site was not loaded from the expected domain");
// Quash the domain portion, which should leave us with the subdomain and a trailing dot IF there is a subdomain.
subdomain = subdomain.Replace(domain, "");
// Check if we have anything left. If we don't, there was no subdomain, the request was directly to the root domain:
if (string.IsNullOrWhiteSpace(subdomain))
return null;
// Quash any trailing periods
subdomain = subdomain.TrimEnd(new[] {'.'});
}
return subdomain;
}
You can use the following NuGet package: Nager.PublicSuffix. It uses the Public Suffix List from Mozilla to split the domain.
PM> Install-Package Nager.PublicSuffix
Example
var domainParser = new DomainParser();
var data = await domainParser.LoadDataAsync();
var tldRules = domainParser.ParseRules(data);
domainParser.AddRules(tldRules);
var domainName = domainParser.Get("sub.test.co.uk");
//domainName.Domain = "test";
//domainName.Hostname = "sub.test.co.uk";
//domainName.RegistrableDomain = "test.co.uk";
//domainName.SubDomain = "sub";
//domainName.TLD = "co.uk";
private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
String[] subDomains = host.Split('.');
return subDomains[0] + "." + subDomains[1];
}
return null;
}
OK, first: are you specifically looking at 'com.au' domains, or are these general Internet domain names? Because if it's the latter, there is simply no automatic way to determine how much of the domain is a "site" or "zone" or whatever, and how much is an individual "host" or other record within that zone.
If you need to be able to figure that out from an arbitrary domain name, you will want to grab the list of TLDs from the Mozilla Public Suffix project (http://publicsuffix.org) and use their algorithm to find the TLD in your domain name. Then you can assume that the portion you want ends with the last label immediately before the TLD.
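As a rough illustration of that approach, here is a sketch (the hard-coded suffix set is a stand-in for the real list, which you would load from publicsuffix.org; wildcard and exception rules are ignored):
// Needs System.Linq and System.Collections.Generic.
static string GetSubDomain(string host, ISet<string> publicSuffixes)
{
    var labels = host.Split('.');

    // Find the longest public suffix that ends the host name.
    int suffixLabels = 0;
    for (int i = 1; i < labels.Length; i++)
    {
        var candidate = string.Join(".", labels.Skip(labels.Length - i));
        if (publicSuffixes.Contains(candidate)) suffixLabels = i;
    }

    // The registrable domain is the suffix plus one label; everything before that is the subdomain.
    int subdomainLabels = labels.Length - suffixLabels - 1;
    return subdomainLabels > 0 ? string.Join(".", labels.Take(subdomainLabels)) : null;
}

// GetSubDomain("foo.bar.car.com.au", new HashSet<string> { "au", "com.au" }) returns "foo.bar"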
I would recommend using a regular expression. The following code snippet should extract what you are looking for...
string input = "foo.bar.car.com.au";
var match = Regex.Match(input, @"^\w*\.\w*\.\w*");
var output = match.Value;
In addition to the NuGet Nager.PublicSuffix package specified in this answer, there is also the NuGet Louw.PublicSuffix package, which according to its GitHub project page is a .NET Core library that parses the Public Suffix List and is based on the Nager.PublicSuffix project, with the following changes:
Ported to .NET Core Library.
Fixed library so it passes ALL the comprehensive tests.
Refactored classes to split functionality into smaller focused classes.
Made classes immutable. Thus DomainParser can be used as singleton and is thread safe.
Added WebTldRuleProvider and FileTldRuleProvider.
Added functionality to know if Rule was a ICANN or Private domain rule.
Use async programming model
The page also states that many of the above changes were submitted back to the original Nager.PublicSuffix project.
The only solution I could find was using:
mshtml.HTMLDocument htmldocu = new mshtml.HTMLDocument();
htmldocu.createDocumentFromUrl(url, "");
and I am not sure about the performance; it should be better than loading the HTML file in a WebBrowser and then grabbing the HtmlDocument from there. Anyhow, that code does not work on my machine: the application crashes when it tries to execute the second line.
Has anyone an approach to achieve this efficiently or any other way?
NOTE: Please understand that I need the HtmlDocument object for DOM processing. I do not need the html string.
Use the DownloadString method of the WebClient object. e.g.
WebClient client = new WebClient();
string reply = client.DownloadString("http://www.google.com");
In the above example, after execution, reply will contain the HTML markup of the endpoint http://www.google.com.
WebClient.DownloadString MSDN
In an attempt to answer your actual question from four years ago (at the time of posting this answer), I'm providing a working solution. I wouldn't be surprised if you have found another way to do this by now, so this is mostly for other people searching for a similar solution. Keep in mind, however, that this is considered:
somewhat obsolete (the actual use of HtmlDocument)
not the best way to handle HTML DOM parsing (the preferred solution is to use HtmlAgilityPack or CsQuery or some other method using actual parsing and not regular expressions)
extremely hacky and therefore not the safest/most compatible way to do it
you really should not be doing what I'm about to show
Additionally, keep in mind that HtmlDocument is really just a wrapper for mshtml.HTMLDocument2, so it is technically slower than just using a COM wrapper directly, but I completely understand the use case simply for ease of coding.
If you're cool with all of the above, here's how to accomplish what you want.
public class HtmlDocumentFactory
{
private static Type htmlDocType = typeof(System.Windows.Forms.HtmlDocument);
private static Type htmlShimManagerType = null;
private static object htmlShimSingleton = null;
private static ConstructorInfo docCtor = null;
public static HtmlDocument Create()
{
if (htmlShimManagerType == null)
{
// get a type reference to HtmlShimManager
htmlShimManagerType = htmlDocType.Assembly.GetType(
"System.Windows.Forms.HtmlShimManager"
);
// locate the necessary private constructor for HtmlShimManager
var shimCtor = htmlShimManagerType.GetConstructor(
BindingFlags.NonPublic | BindingFlags.Instance, null, new Type[0], null
);
// create a new HtmlShimManager object and keep it for the rest of the
// assembly instance
htmlShimSingleton = shimCtor.Invoke(null);
}
if (docCtor == null)
{
// get the only constructor for HtmlDocument (which is marked as private)
docCtor = htmlDocType.GetConstructors(
BindingFlags.NonPublic | BindingFlags.Instance
)[0];
}
// create an instance of mshtml.HTMLDocument2 (in the form of
// IHTMLDocument2 using HTMLDocument2's class ID)
object htmlDoc2Inst = Activator.CreateInstance(Type.GetTypeFromCLSID(
new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")
));
var argValues = new object[] { htmlShimSingleton, htmlDoc2Inst };
// create a new HtmlDocument without involving WebBrowser
return (HtmlDocument)docCtor.Invoke(argValues);
}
}
To use it:
var htmlDoc = HtmlDocumentFactory.Create();
htmlDoc.Write("<html><body><div>Hello, world!</div></body></html>");
Console.WriteLine(htmlDoc.Body.InnerText);
// output:
// Hello, world!
I have not tested this code directly -- I have translated it from an old PowerShell script that needed the same functionality you're requesting. If it fails, let me know. The functionality is there, but the code might need very minor tweaking to get working.
Why does the Uri class URL-decode the URL that I send to its constructor, and how can I prevent this?
Example (look at the querystring value "options"):
string url = "http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3";
Uri uri = new Uri(url); // http://www.example.com/default.aspx?id=1&name=andreas&options=one=1&two=2&three=3
Update:
// ?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
Request.QueryString["options"] = one=1&two=2&three=3
// ?id=1&name=andreas&options=one=1&two=2&three=3
Request.QueryString["options"] = one=1
This is my problem :)
Why exactly? You can get to the encoded version using uri.AbsoluteUri.
EDIT
Console.WriteLine("1) " + uri.AbsoluteUri);
Console.WriteLine("2) " + uri.Query);
OUT:
1) http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
2) ?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3
I would expect that from a Uri class. I am quite sure that it still gets you to a good place if you use it with, e.g., the WebClient class (i.e. WebClient.OpenRead(Uri uri)). What's the problem in your case?
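For example, something along these lines should still fetch the page (a sketch reusing the uri variable from the question):
using (var client = new WebClient())
using (var stream = client.OpenRead(uri))       // WebClient takes the Uri as-is
using (var reader = new StreamReader(stream))
{
    string body = reader.ReadToEnd();
}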
This is how the internal code of .NET behaves; in previous versions you could use another constructor of Uri that accepted a boolean telling it whether to escape or not, but it has been deprecated.
The only way around it is hackish: accessing some private method directly by means of reflection:
string url = "http://www.example.com/default.aspx?id=1&name=andreas&options=one%3d1%26two%3d2%26three%3d3";
Uri uri = new Uri(url);
MethodInfo mi = uri.GetType().GetMethod("CreateThis", BindingFlags.NonPublic | BindingFlags.Instance);
if (mi != null)
mi.Invoke(uri, new object[] { url, true, UriKind.RelativeOrAbsolute });
This worked for me in a quick test, but it is not ideal, as you "hack" into .NET internal code.
I've recently come to the realization that the .NET APIs for working with URLs and URIs frequently come up short in achieving even basic functionality (at least easily), including things such as: generating a fully qualified URL from a relative path, forcing HTTPS or back to HTTP, getting the root of the site, combining relative URLs properly, and so forth.
Are there any alternative libraries out there that have put all of these type of functionality in a simple and reliable project?
I've certainly found myself writing much the same URI-manipulation code more than once in .NET, but I don't see your cases as places where it falls short.
Full URI from relative Uri:
new Uri(baseUri, relative) // works whether relative is a string or a Uri
Obtaining the actual FQDN:
string host = uri.Host;
string fqdn = host.EndsWith(".") ? host : host + ".";
Forcing https or back to http:
UriBuilder toHttp = new UriBuilder(someUri);
toHttp.Scheme = "http";
toHttp.Port = 80;
return toHttp.Uri;
UriBuilder toHttps = new UriBuilder(someUri);
toHttps.Scheme = "https";
toHttps.Port = 443;
return toHttps.Uri;
Getting the root of the site:
new Uri(startingUri, "/");
Combining relative urls properly:
new Uri(baseUri, relUri); // We had this one already.
Only two of these are more than a single method call, and of those obtaining the FQDN is pretty obscure (unless rather than wanting the dot-ended FQDN you just wanted the absolute URI, in which case we're back to a single method call).
There is a single method version of the HTTPS/HTTP switching, though it's actually more cumbersome since it calls several properties of the Uri object. I can live with it taking a few lines to do this switch.
Still, to provide a new API one need only supply:
public static Uri SetHttpPrivacy(this Uri uri, bool privacy)
{
UriBuilder ub = new UriBuilder(uri);
if(privacy)
{
ub.Scheme = "https";
ub.Port = 443;
}
else
{
ub.Scheme = "http";
ub.Port = 80;
}
return ub.Uri;
}
I really can't see how an API could possibly be any more concise in the other cases.
XUri is a nice class that is part of the open source project from MindTouch
http://developer.mindtouch.com/en/ref/dream/MindTouch.Dream/XUri?highlight=XUri
This article includes a quick sample on how to use it.
http://blog.developer.mindtouch.com/2009/05/18/consuming-rest-services-and-tdd-with-plug/
I am a fan of it. It is a little overkill assembly-wise if you are just going to use the XUri portion, but there are other really nice things in the library too.
I use a combination of extension methods along with the 'System.IO.Path' object as well.
These are just example blurbs:
public static Uri SecureIfRemote(this Uri uri){
if(!System.Web.HttpContext.Current.Request.IsSecureConnection &&
!System.Web.HttpContext.Current.Request.IsLocal){
// Rebuild the same URI with the https scheme and its default port.
var builder = new UriBuilder(uri){ Scheme = "https", Port = 443 };
return builder.Uri;
}
return uri;
}
public static NameValueCollection ParseQueryString(Uri uri){
return uri.Query.ParseQueryString();
}
public static NameValueCollection ParseQueryString(this string s)
{
//return
return HttpUtility.ParseQueryString(s);
}
I have a problem with .NET's Uri implementation. It seems that if the scheme is "ftp", the query part is not parsed as a Query but as part of the path instead.
Take the following code for example:
Uri testuri = new Uri("ftp://user:pass#localhost/?passive=true");
Console.WriteLine(testuri.Query); // Outputs an empty string
Console.WriteLine(testuri.AbsolutePath); // Outputs "/%3Fpassive=true"
It seems to me that the Uri class wrongfully parses the query part as a part of the path. However changing the scheme to http, the result is as expected:
Uri testuri = new Uri("http://user:pass#localhost/?passive=true");
Console.WriteLine(testuri.Query); // Outputs "?passive=true"
Console.WriteLine(testuri.AbsolutePath); // Outputs "/"
Does anyone have a solution to this, or know of an alternative Uri class that works as expected?
Well, the problem is not that I am unable to create an FTP connection, but that URIs are not parsed according to RFC 2396.
What I actually intended to do was to create a factory that provides implementations of a generic file transfer interface (containing get and put methods) based on a given connection URI. The URI defines the protocol, user info, host and path, and any additional options should be passed through the Query part of the URI (such as the passive mode option for the FTP connection).
However, this proved difficult using the .NET Uri implementation, because it seems to parse the Query part of URIs differently based on the scheme.
So I was hoping that someone knew a workaround for this, or of an alternative to the seemingly broken .NET Uri implementation. It would be nice to know before spending hours implementing my own.
I have been struggling with the same issue for a while. Attempting to replace the existing UriParser for the "ftp" scheme using UriParser.Register throws an InvalidOperationException because the scheme is already registered.
The solution I have come up with involves using reflection to modify the existing ftp parser so that it allows the query string. This is based on a workaround to another UriParser bug.
MethodInfo getSyntax = typeof(UriParser).GetMethod("GetSyntax", System.Reflection.BindingFlags.Static
| System.Reflection.BindingFlags.NonPublic);
FieldInfo flagsField = typeof(UriParser).GetField("m_Flags", System.Reflection.BindingFlags.Instance
| System.Reflection.BindingFlags.NonPublic);
if (getSyntax != null && flagsField != null)
{
UriParser parser = (UriParser)getSyntax.Invoke(null, new object[] { "ftp"});
if (parser != null)
{
int flagsValue = (int)flagsField.GetValue(parser);
// Set the MayHaveQuery attribute
int MayHaveQuery = 0x20;
if ((flagsValue & MayHaveQuery) == 0) flagsField.SetValue(parser, flagsValue | MayHaveQuery);
}
}
Run that somewhere in your initialization, and your ftp Uris will have the query string go into the Query property, as you would expect, instead of the path.
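As a quick sanity check, assuming the snippet above has already run during initialization, the ftp example from the question should now behave like the http case:
Uri testuri = new Uri("ftp://user:pass@localhost/?passive=true");
Console.WriteLine(testuri.Query);        // should now output "?passive=true"
Console.WriteLine(testuri.AbsolutePath); // should now output "/"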
You should use the FtpWebRequest and FtpWebResponse classes unless you have a specific reason not to.
FtpWebRequest fwr = (FtpWebRequest)WebRequest.Create(new Uri("ftp://uri"));
fwr.Method = WebRequestMethods.Ftp.UploadFile;
fwr.Credentials = new NetworkCredential("user", "pass");
FileInfo ff = new FileInfo("localpath");
byte[] fileContents = new byte[ff.Length];
using (FileStream fr = ff.OpenRead())
{
fr.Read(fileContents, 0, Convert.ToInt32(ff.Length));
}
using (Stream writer = fwr.GetRequestStream())
{
writer.Write(fileContents, 0, fileContents.Length);
}
FtpWebResponse frp = (FtpWebResponse)fwr.GetResponse();
Response.Write(frp.StatusDescription);
You have to use a class specific to the FTP protocol, like FtpWebRequest, which has a Uri property (RequestUri).
You should search in those classes for a Uri parser, I think.