Webmaster Tools API C# - URL-encoded SiteID causing 400 response

The Webmaster Tools API requires a SiteID for most operations.
This SiteID is a URL-encoded version of the site's URL, as it appears in the Google Webmaster Tools dashboard.
So why doesn't the following URL work (the dreaded "Bad Request" or "Site Not Found")?
var site = "http://example.com/";
var urlEncoded = HttpUtility.UrlEncode(site);
var url = "https://www.google.com/webmasters/tools/feeds/" + urlEncoded + "/crawlissues/";

Google expects upper-case hex digits in the percent-encoded characters (e.g. %3A), while HttpUtility.UrlEncode produces lower-case ones (e.g. %3a).
See this answer for a "selective ToUpper" method implementation.
(Also, the trailing slash may make a difference: http://x.com/ is not the same as http://x.com.)
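The linked "selective ToUpper" answer is not reproduced here. As a minimal sketch of the same idea (the SiteIdEncoder class is hypothetical), one can upper-case only the two hex digits of each %xx escape that HttpUtility.UrlEncode emits, leaving the rest of the URL untouched. Uri.EscapeDataString already emits upper-case escapes, so it may be a simpler alternative for a site URL; note that it encodes a space as %20 rather than +.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper for building Webmaster Tools SiteIDs.
static class SiteIdEncoder
{
    // Upper-cases only the hex digits of each %xx escape; letters elsewhere
    // in the URL (e.g. "example.com") are left as they are.
    public static string UpperCaseEscapes(string encoded) =>
        Regex.Replace(encoded, "%[0-9a-fA-F]{2}", m => m.Value.ToUpperInvariant());

    // HttpUtility.UrlEncode emits lower-case escapes ("%3a%2f"), so fix them up.
    public static string Encode(string siteUrl) =>
        UpperCaseEscapes(HttpUtility.UrlEncode(siteUrl));
}
```

Usage: `var url = "https://www.google.com/webmasters/tools/feeds/" + SiteIdEncoder.Encode("http://example.com/") + "/crawlissues/";`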

Related

Prevent unescaping url in outbound request

If I do this in .Net Core 3.1:
await new HttpClient().GetAsync("http://test.com/page?parameter=%2D%2E%5F%2E%2D");
then this happens:
GET http://test.com/page?parameter=-._.- HTTP/1.1
but this is what I want:
GET http://test.com/page?parameter=%2D%2E%5F%2E%2D HTTP/1.1
The background is that I get a signed URL from a third party, and I need to use the URL as it is, without unescaping. The server still finds the resource with the unescaped URL, but the signature check fails on the other end because the URL they see in the request is not the URL that was signed.
I can paste the URL into any browser and get the resource, but the signature check fails when I request it programmatically in .Net Core 3.1.
According to the documentation of the Uri class, this unescaping is by design:
Escaped characters (also known as percent-encoded octets) that don't
have a reserved purpose are decoded (also known as being unescaped).
These unreserved characters include uppercase and lowercase letters
(%41-%5A and %61-%7A), decimal digits (%30-%39), hyphen (%2D), period
(%2E), underscore (%5F), and tilde (%7E).
I have tried solutions listed in these questions:
GETting a URL with an url-encoded slash. But the schemeSetting approach does not seem to work for .Net Core 3.1, and neither does the ForceCanonicalPathAndQuery workaround.
How to make System.Uri not to unescape %2f (slash) in path?. Again schemeSetting seems not to work for .Net Core 3.1, and neither does the workaround LeaveDotsAndSlashesEscaped.
So, does anyone know how I can use the signed url as is, non-unescaped, on .Net Core 3.1?
So after fiddling around a bit I came up with this:
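using System;
using System.Reflection;

private static Uri CreateNonUnescapedUri(string url)
{
    // Construct the Uri from only the scheme and authority (e.g. "http://test.com"), so the
    // internal flags indicate that the URL contains nothing that needs unescaping
    int offset = url.IndexOf("://", StringComparison.Ordinal);
    offset = url.IndexOfAny(new[] { '/', '?', '#' }, offset + 3);
    if (offset < 0)
        return new Uri(url); // no path, query or fragment: nothing would be unescaped
    var uri = new Uri(url.Substring(0, offset));
    // Then replace the private "_string" field (an implementation detail of .NET Core 3.1)
    // with the complete URL that would otherwise be unescaped
    FieldInfo field = typeof(Uri).GetField("_string", BindingFlags.Instance | BindingFlags.NonPublic)
        ?? throw new InvalidOperationException("Uri._string not found on this runtime");
    field.SetValue(uri, url);
    return uri;
}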
I tested it on 300 signed URLs.
Of course, changing the internal state of Uri (which is 5600 lines of pure madness) is bound to break in the future, but I need this working by Monday and this is what I've got. Let me know if anyone has a real solution.
Edit April 2022:
In .NET 6 there is a new constructor overload that keeps the original URL as is, using UriCreationOptions:
var uri = new Uri("http://test.com/page?parameter=%2D%2E%5F%2E%2D",
new UriCreationOptions { DangerousDisablePathAndQueryCanonicalization = true });
I have no idea what's supposedly dangerous about it, though.
For .Net Core 3.1 I'm still using the hack above, I never did find a better solution for it.
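For .NET 6 and later, here is a slightly fuller sketch of the UriCreationOptions approach, wrapping the signed URL in an HttpRequestMessage so it can be sent with HttpClient. The SignedUrlRequests class is hypothetical; the constructor overload and the DangerousDisablePathAndQueryCanonicalization property are the ones shown above.

```csharp
using System;
using System.Net.Http;

// Requires .NET 6 or later (UriCreationOptions does not exist in .NET Core 3.1).
static class SignedUrlRequests
{
    // Builds a GET request whose path and query are kept exactly as given, so
    // escapes such as %2D are not decoded before the other end checks the signature.
    public static HttpRequestMessage Create(string signedUrl)
    {
        var uri = new Uri(signedUrl, new UriCreationOptions
        {
            DangerousDisablePathAndQueryCanonicalization = true
        });
        return new HttpRequestMessage(HttpMethod.Get, uri);
    }
}
```

Usage: `using var client = new HttpClient(); var response = await client.SendAsync(SignedUrlRequests.Create(signedUrl));`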

Different results of UTF-8 encoding to URL parameter with particular languages between RawUrl and QueryString in HttpRequest

Update - 31 Aug
using System;
using System.Text;
using System.Web;

string test = "ö";
string unicode1 = HttpUtility.UrlEncode(test, Encoding.Unicode); // UTF-16LE bytes: "%f6%00"
string unicode2 = HttpUtility.UrlEncodeUnicode(test);            // %uXXXX form: "%u00f6"
Console.WriteLine("Result of unicode1: " + unicode1);
Console.WriteLine("Result of unicode2: " + unicode2);
The two results differ. In my case, I encode the parameter with UrlEncode when posting the data, but when the browser requests the resource, the parameter arrives in the unicode2 (%uXXXX) form.
Update - 30 Aug: please click the link to see the HttpRequest trace.
It is strange that the value of the parameter "nm" differs in QueryString. Its original string was "ööö", which is what appears in Url; after UTF-8 encoding it became "%c3%b6%c3%b6%c3%b6" in RawUrl. Normally, RawUrl and QueryString should reflect the same encoding. Does anyone know the reason?
I also encountered an issue where the URL referrer becomes null after clicking the button.
I filled in the text "öööööööö"; you can see the different encodings produced by C# and IE.
The URL encoded by C#, captured by Fiddler:
The URL encoded by IE, as displayed in the IE status bar:
Result of encoding by UTF8: “%c3%b6%c3%b6%c3%b6%c3%b6%c3%b6%c3%b6%c3%b6%c3%b6”
Result of encoding by IE: “%u00f6%u00f6%u00f6%u00f6%u00f6%u00f6%u00f6%u00f6”
Does anyone know why this happens, and can anyone help?
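The two encodings in the question can be reproduced in isolation. This minimal sketch (the EncodingComparison class is hypothetical) contrasts standard UTF-8 percent-encoding with the nonstandard %uXXXX form that UrlEncodeUnicode produces and that the IE capture shows; HttpUtility.UrlDecode accepts both forms.

```csharp
using System;
using System.Text;
using System.Web;

static class EncodingComparison
{
    // Standard form: the UTF-8 bytes, each percent-encoded ("ö" -> "%c3%b6").
    public static string Utf8(string s) => HttpUtility.UrlEncode(s, Encoding.UTF8);

    // Nonstandard %uXXXX form, one escape per UTF-16 code unit ("ö" -> "%u00f6").
#pragma warning disable CS0618 // UrlEncodeUnicode is marked obsolete
    public static string PercentU(string s) => HttpUtility.UrlEncodeUnicode(s);
#pragma warning restore CS0618

    // HttpUtility.UrlDecode understands both forms.
    public static string Decode(string s) => HttpUtility.UrlDecode(s, Encoding.UTF8);
}
```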

Server.HtmlEncode (ASP.NET) equivalent in ASP.NET MVC

I have an ASP.NET MVC controller which calls another service using the HttpClient class.
var url = "some url";
var client = new HttpClient();
var result = await client.GetAsync(url);
The URL I am sending contains some special characters. How can I encode special characters in an ASP.NET MVC controller?
Thanks!
Try this:
url = HttpUtility.UrlEncode(url);
As you are considering a URL that will be used as such for a request (with HttpClient.GetAsync), not as an argument within another URL, you should use Uri.EscapeUriString.
Here is a comparison of three methods for the following URL:
var url = "http://some url?data=x y+z&user=1#ok";
HttpUtility.UrlEncode
Console.WriteLine(HttpUtility.UrlEncode(url));
http%3a%2f%2fsome+url%3fdata%3dx+y%2bz%26user%3d1%23ok
Obviously, this is not desired: the URL got damaged (the slashes are escaped, the space became a +, etc.). The method seems useful for the query part of the URL, but not for the whole thing.
HttpUtility.UrlPathEncode
Console.WriteLine(HttpUtility.UrlPathEncode(url));
http://some%20url?data=x y+z&user=1#ok
This looks useful, although the unescaped space is a bit of a problem in the query part (browsers can deal with it, but it breaks automatic hyperlinking). More importantly, the method is being deprecated:
Do not use; intended only for browser compatibility. Use UrlEncode.
Uri.EscapeUriString
Console.WriteLine(Uri.EscapeUriString(url));
http://some%20url?data=x%20y+z&user=1#ok
This seems to do the job well: %20 is an escape sequence that all modern browsers should support, also when occurring in the query part of the URL.
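A complementary approach, sketched under the assumption that you build the URL yourself rather than receive it as one string: escape each query key and value separately with Uri.EscapeDataString, so a literal + in a value becomes %2B instead of being left for the server to read as a space. (Uri.EscapeUriString has also been marked obsolete since .NET 6, warning SYSLIB0013.) The QueryBuilder class and the example URL are hypothetical.

```csharp
using System;
using System.Linq;

static class QueryBuilder
{
    // Escapes every key and value on its own, so reserved characters inside a
    // value ('+', '&', '=', '#', ...) cannot change the structure of the URL.
    public static string Build(string baseUrl, params (string Key, string Value)[] parameters) =>
        baseUrl + "?" + string.Join("&",
            parameters.Select(p => Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
}
```

For example, `QueryBuilder.Build("http://example.com/some%20url", ("data", "x y+z"), ("user", "1"))` returns `http://example.com/some%20url?data=x%20y%2Bz&user=1`.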
There is no need for explicit encoding in the Razor view engine starting from MVC 3: output is HTML-encoded by default, which is very convenient. If you want to output tags (raw markup) instead, use:
@Html.Raw(myString)
So basically just using Razor comes with encoding by default.
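Outside a view, for example when building a string in controller code, the closest equivalent of Server.HtmlEncode is HttpUtility.HtmlEncode or System.Net.WebUtility.HtmlEncode. A minimal sketch (the HtmlText class is hypothetical):

```csharp
using System.Net;

static class HtmlText
{
    // Encodes <, >, & and quotes so the string renders as text rather than markup,
    // which is what Razor's @ syntax does automatically.
    public static string Encode(string text) => WebUtility.HtmlEncode(text);
}
```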
You should use HttpUtility.UrlPathEncode.
When you use url = HttpUtility.UrlEncode(url), spaces are not handled well (they are encoded as +).

Why is my S3 pre-signed request invalid when I set a response header override that contains a "+"?

I'm using the Amazon .NET SDK to generate a pre-signed URL like this:
public System.Web.Mvc.ActionResult AsActionResult(string contentType, string contentDisposition)
{
    ResponseHeaderOverrides headerOverrides = new ResponseHeaderOverrides();
    headerOverrides.ContentType = contentType;
    if (!string.IsNullOrWhiteSpace(contentDisposition))
    {
        headerOverrides.ContentDisposition = contentDisposition;
    }
    GetPreSignedUrlRequest request = new GetPreSignedUrlRequest()
        .WithBucketName(bucketName)
        .WithKey(objectKey)
        .WithProtocol(Protocol.HTTPS)
        .WithExpires(DateTime.Now.AddMinutes(6))
        .WithResponseHeaderOverrides(headerOverrides);
    string url = S3Client.GetPreSignedURL(request);
    return new RedirectResult(url, permanent: false);
}
This works perfectly, except if my contentType contains a + in it. This happens when I try to get an SVG file, for example, which gets a content type of image/svg+xml. In this case, S3 throws a SignatureDoesNotMatch error.
The error message shows the StringToSign like this:
GET 1234567890 /blah/blabh/blah.svg?response-content-disposition=filename="blah.svg"&response-content-type=image/svg xml
Notice there's a space in the response-content-type, where it now says image/svg xml instead of image/svg+xml. It seems to me like that's what is causing the problem, but what's the right way to fix it?
Should I be encoding my content type? Enclose it within quotes or something? The documentation doesn't say anything about this.
Update
This bug has been fixed as of Version 1.4.1.0 of the SDK.
Workaround
This is a confirmed bug in the AWS SDK, so until they issue a fix I'm going with this hack to make things work:
Specify the content type exactly as you want it to appear in the response header. So, if you want S3 to return a content type of image/svg+xml, set it exactly like this:
ResponseHeaderOverrides headerOverrides = new ResponseHeaderOverrides();
headerOverrides.ContentType = "image/svg+xml";
Now, go ahead and generate the pre-signed request as usual:
GetPreSignedUrlRequest request = new GetPreSignedUrlRequest()
    .WithBucketName(bucketName)
    .WithKey(objectKey)
    .WithProtocol(Protocol.HTTPS)
    .WithExpires(DateTime.Now.AddMinutes(6))
    .WithResponseHeaderOverrides(headerOverrides);
string url = S3Client.GetPreSignedURL(request);
Finally, "fix" the resulting URL with the properly URL encoded value for your content type:
url = url.Replace(contentType, HttpUtility.UrlEncode(contentType));
Yes, it's a dirty workaround but, hey, it works for me! :)
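A slightly more targeted variant of the same workaround, sketched as a hypothetical helper (not part of the AWS SDK): instead of a plain Replace over the whole URL, percent-encode only the + characters inside one query parameter's value, such as response-content-type, and leave the rest of the pre-signed URL untouched. This assumes the SDK emits that value with a literal +, which is what the StringToSign in the question suggests, and relies on the reported observation that %2B in place of + yields a valid signature.

```csharp
using System;

// Hypothetical helper, not part of the AWS SDK.
static class PresignedUrlFix
{
    // Percent-encodes '+' as %2B inside the value of a single query parameter,
    // leaving the rest of the pre-signed URL (including the signature) untouched.
    public static string EscapePlusInParameter(string url, string parameterName)
    {
        string marker = parameterName + "=";
        int start = url.IndexOf("?" + marker, StringComparison.Ordinal);
        if (start < 0)
            start = url.IndexOf("&" + marker, StringComparison.Ordinal);
        if (start < 0)
            return url; // parameter not present
        start += 1 + marker.Length; // skip the '?' or '&' and "name="
        int end = url.IndexOf('&', start);
        if (end < 0)
            end = url.Length;
        string value = url.Substring(start, end - start).Replace("+", "%2B");
        return url.Substring(0, start) + value + url.Substring(end);
    }
}
```

Usage: `url = PresignedUrlFix.EscapePlusInParameter(url, "response-content-type");`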
Strange indeed - I've been able to reproduce this easily, with the following observed behavior:
replacing + in the URL generated by GetPreSignedURL() with its encoded form %2B yields a working URL/signature
this holds true, no matter whether / is replaced with its encoded form %2F or not
encoding the contentType upfront before calling GetPreSignedURL(), e.g. via the HttpUtility.UrlEncode Method, yields invalid signatures regardless of any variation of the generated URL
Given how long this functionality has been available already, this is somewhat surprising, but I'd still consider it to be a bug; accordingly, it might be best to inquire about this in the Amazon Simple Storage Service forum.
Update: I just realized you asked the same question there already and the bug got confirmed indeed, so the correct answer can be figured out over time by monitoring the AWS team response ;)
Update: This bug has been fixed as of Version 1.4.1.0 of the SDK.

HttpWebRequest long URI workaround?

I've encountered an issue with HttpWebRequest that if the URI is over 2048 characters long the request fails and returns a 404 error even though the server is perfectly capable of servicing a request with a URI that long. I know this since the same URI that causes an error if submitted via HttpWebRequest works fine when pasted directly into a browser address bar.
My current workaround is to allow users to set a compatibility flag saying that it's safe to send the parameters as a POST request instead when the URI would be too long, but this is not ideal since the protocol I'm using is RESTful and GET should be used for queries. Plus, there is no guarantee that other implementors of the protocol will accept POSTed queries.
Is there another class in .Net that has equivalent functionality to HttpWebRequest that doesn't suffer from the URI length limit that I could use?
I'm aware of WebClient, but I don't really want to use it, as I need full control over the HTTP headers, which WebClient restricts.
Edit
Because Shoban asked for it:
http://localhost/BBCDemo/sparql/?query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0D%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0D%0APREFIX+dc%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0D%0APREFIX+po%3A+%3Chttp%3A%2F%2Fpurl.org%2Fontology%2Fpo%2F%3E%0D%0APREFIX+timeline%3A+%3Chttp%3A%2F%2Fpurl.org%2FNET%2Fc4dm%2Ftimeline.owl%23%3E%0D%0ASELECT+*+WHERE+{%0D%0A++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+dc%3Atitle+%3Ftitle+.%0D%0A++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Ashort_synopsis+%3Fsynopsis-short+.%0D%0A++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Amedium_synopsis+%3Fsynopsis-med+.%0D%0A++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Along_synopsis+%3Fsynopsis-long+.%0D%0A++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Amasterbrand+%3Fchannel+.%0D%0A++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Agenre+%3Fgenre+.%0D%0A++++%3Fchannel+dc%3Atitle+%3Fchanneltitle+.%0D%0A++++OPTIONAL+{%0D%0A++++++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Abrand+%3Fbrand+.%0D%0A++++++++%3Fbrand+dc%3Atitle+%3Fbrandtitle+.%0D%0A++++}%0D%0A++++OPTIONAL+{%0D%0A++++++++%3Chttp%3A%2F%2Fwww.bbc.co.uk%2Fprogrammes%2Fb00n4d6y%23programme%3E+po%3Aversion+%3Fver+.%0D%0A++++++++%3Fver+po%3Atime+%3Finterval+.%0D%0A++++++++%3Finterval+timeline%3Astart+%3Fstart+.%0D%0A++++++++%3Finterval+timeline%3Aend+%3Fend+.%0D%0A++++}%0D%0A}&default-graph-uri=&timeout=30000
Which is the following encoded onto the querystring:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX po: <http://purl.org/ontology/po/>
PREFIX timeline: <http://purl.org/NET/c4dm/timeline.owl#>
SELECT * WHERE {
    <http://www.bbc.co.uk/programmes/b00n4d6y#programme> dc:title ?title .
    <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:short_synopsis ?synopsis-short .
    <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:medium_synopsis ?synopsis-med .
    <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:long_synopsis ?synopsis-long .
    <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:masterbrand ?channel .
    <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:genre ?genre .
    ?channel dc:title ?channeltitle .
    OPTIONAL {
        <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:brand ?brand .
        ?brand dc:title ?brandtitle .
    }
    OPTIONAL {
        <http://www.bbc.co.uk/programmes/b00n4d6y#programme> po:version ?ver .
        ?ver po:time ?interval .
        ?interval timeline:start ?start .
        ?interval timeline:end ?end .
    }
}
the protocol I'm using is RESTful and GET should be used for queries.
There's no reason POST can't also be used for queries; for really long request data you have to use it, as very long URIs aren't globally supported and never have been. This is one area where HTTP does not live up to the REST ideal.
The reason POST generally isn't used at the plain-HTML level is to stop the browser prompting on reloads and to promote, e.g., bookmarking. But with HttpWebRequest you have neither of those concerns, so go ahead and POST it. Web applications should use a parameter or a URI path part to distinguish write requests from queries, not merely the request method. (Of course, a write request from a GET method should still be denied.)
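A minimal sketch of sending the query in a POST body with HttpWebRequest. The SparqlPost class is hypothetical, and it assumes the endpoint accepts a form-encoded body with a query parameter (the SPARQL 1.1 Protocol defines such a variant, but whether a given server supports it is exactly the uncertainty raised in the question). Only the request body carries the query, so the URI stays short.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: sends a SPARQL query in a POST body instead of the URI.
static class SparqlPost
{
    // Form-encodes the query; WebUtility.UrlEncode turns spaces into '+' and
    // escapes reserved characters such as '{', '}' and '?'.
    public static string BuildFormBody(string sparqlQuery) =>
        "query=" + WebUtility.UrlEncode(sparqlQuery);

    public static HttpWebResponse Send(string endpoint, string sparqlQuery)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(sparqlQuery));
        var request = (HttpWebRequest)WebRequest.Create(endpoint);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        // Headers remain fully under your control here, unlike with WebClient
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        return (HttpWebResponse)request.GetResponse();
    }
}
```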
I don't think HttpWebRequest is actually incompatible with GET URLs of the size you are talking about. I say this based on two things:
In my own work I use HttpWebRequest to send HTTP GET requests longer than 2048 characters without trouble. I'm not sure what my longest ones are, but we're talking 10,000+ characters. (This is primarily between a web application and an instance of Solr running under Tomcat.)
.NET does have some limits on GET URL lengths, but the ones I'm aware of are much higher than 2048 characters. For example, I learned today from my profiler that WebRequest.Create(string url) calls the Uri class constructor, and that is documented to throw a UriFormatException if "the length of uriString exceeds 65534 characters."
I'm not sure where your problem might be, if it's not HttpWebRequest itself. Do you know under what conditions your web service will return HTTP 404 (i.e. "not found")? (I assume the 404 is coming from your web service, rather than being faked inside the depths of .NET.) I'd also want to double-check that the address you're pasting into the browser is actually the same one that's being sent by .NET; as feroze suggested, you should use a network sniffing tool for this. If the two addresses are the same, then maybe next compare how the HTTP headers vary between the .NET case and the browser case. (Incidentally, I personally find Fiddler a bit handier than Wireshark for HTTP debugging tasks along these lines.)
See also this somewhat related question: How does HttpWebRequest differ (functionally) from pasting a URL into an address bar?
Here's a snippet which constructs HttpWebRequest instances with bigger and bigger url values until an exception gets thrown:
using System;
using System.Net;
using System.Text;
...
StringBuilder url = new StringBuilder("http://example.com?p=");
try
{
    for (int i = 1; i < Int32.MaxValue; i++)
    {
        url.Append("0");
        // Only constructs the request (which parses the URI); nothing is sent
        HttpWebRequest request = HttpWebRequest.CreateHttp(url.ToString());
    }
}
catch (Exception ex)
{
    Console.Out.WriteLine("Error occurred at url length: " + url.Length);
    Console.Out.WriteLine(ex.GetType().ToString() + ": " + ex.Message);
    return;
}
Console.Out.WriteLine("Completed without error!");
On my machine (in LINQPad running .Net 4.5), this snippet outputs:
Error occurred at url length: 65520
System.UriFormatException: Invalid URI: The Uri string is too long.
Your query string is invalid according to RFC 3986: the '{' and '}' characters are not allowed unencoded in a URI and must be percent-encoded as %7B and %7D.
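A rough check for this kind of problem, sketched as a hypothetical helper: list the characters in a URI string that RFC 3986 does not allow to appear literally. It treats any '%' as the start of a percent-encoded octet without validating the two hex digits, and it does not check where '[' and ']' appear, so it is a diagnostic aid rather than a full validator.

```csharp
using System.Linq;

static class Rfc3986
{
    // Characters RFC 3986 allows literally: unreserved (ALPHA / DIGIT / "-" / "." / "_" / "~"),
    // gen-delims, sub-delims, and '%' (assumed to start a percent-encoded octet).
    const string Allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~" +
        ":/?#[]@" + "!$&'()*+,;=" + "%";

    // Returns the distinct characters in the string that must be percent-encoded.
    public static char[] DisallowedChars(string uri) =>
        uri.Where(c => Allowed.IndexOf(c) < 0).Distinct().ToArray();
}
```

Applied to the query string in the question, this would flag '{' and '}'.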
