C#: Convert http://www.google.com to google.com - c#

I would like to convert a URL like http://www.google.com to google.com. Likewise, a URL in the form http://google.com should also become google.com.
I need to determine whether two URLs are the same regardless of their format. Is there a way to do this? I am using the Uri class in C#, but the Host value is different for http://www.google.com and http://google.com.

You can use the Uri class to parse URIs.
Example: Get just the domain name from a URL?
You can extract more than the host name, too. Here is a full list of properties you can get from a Uri instance: http://msdn.microsoft.com/en-us/library/system.uri_properties(v=VS.71).aspx
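As a minimal sketch of how Uri.Host could be used for the comparison asked about: the helper below parses each URL, lower-cases the host, and drops a leading "www." label. The class and method names (UrlComparer, NormalizeHost, SameHost) are made up for illustration, and treating www.google.com and google.com as the same site is an assumption that does not hold for every domain.

```csharp
using System;

public static class UrlComparer
{
    // Hypothetical helper: returns the host of an absolute URL,
    // lower-cased and without a leading "www." label.
    public static string NormalizeHost(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);
        string host = uri.Host.ToLowerInvariant();
        const string prefix = "www.";
        return host.StartsWith(prefix, StringComparison.Ordinal)
            ? host.Substring(prefix.Length)
            : host;
    }

    // Compares two URLs by normalized host only; scheme, path,
    // and query are deliberately ignored in this sketch.
    public static bool SameHost(string a, string b)
    {
        return NormalizeHost(a) == NormalizeHost(b);
    }
}
```

With this, UrlComparer.SameHost("http://www.google.com", "http://google.com") compares "google.com" with "google.com". If path or query matter for your notion of "the same URL", they would need to be compared as well.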

Related

Get Image Absolute URL From Some Node in HtmlAgilityPack.HtmlDocument

I want to fetch a webpage from the internet and get the absolute URLs of some images on the page using HtmlAgilityPack in C#.
The problem is...
The website first redirects the URL to another one, and the src attribute in the <img> tag is a relative URL.
Currently, I have code like this:
using HtmlAgilityPack;
HtmlDocument webpageDocument = new HtmlWeb().Load("http://xyz.example.com/");
HtmlNodeCollection nodes = webpageDocument.DocumentNode.SelectNodes("//img");
String url = nodes[0].Attributes["src"].Value.ToString();
The code above fetches a webpage from the given example URL, selects the <img> elements from the DOM tree, and reads the src attribute of the first one.
It works if the <img> has an absolute URL. Unfortunately, the website I want to handle gives me a relative URI (e.g. /img/01.png). I need the absolute URL so that I can do further processing on the image.
So I need to know the base URL for the given src, but I could not find it. In other words, I don't know how to get the location of the webpage after the redirect.
The server side is not mine (I have no control over it).
Consider using ResponseUri (the final URI of the response after redirects) as the base URL. To avoid a second request, download the page yourself and give the HTML Agility Pack parser the string with the page content instead.
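Once the final page address is known, turning the src value into an absolute URL can be sketched with the Uri(Uri, string) constructor. The class name ImageUrlResolver and the example addresses are assumptions for illustration; in real code pageUri would come from the response's ResponseUri rather than a literal.

```csharp
using System;

public static class ImageUrlResolver
{
    // Hypothetical helper: resolves an <img> src value against the
    // address the page was finally served from (after redirects).
    // An already absolute src is returned unchanged by the constructor.
    public static Uri ToAbsolute(Uri pageUri, string src)
    {
        return new Uri(pageUri, src);
    }
}
```

For example, resolving "/img/01.png" against a page served from http://www.example.com/gallery/index.html yields a URL on www.example.com with the path /img/01.png.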

How to pass URL params to a route?

I added a route to my project. Everything works, but if I send a full URL as the first parameter, it does not work correctly.
Get["/{url}/{digit}"]
If I send these parameters to the server, everything works correctly:
localhost:8888/google.com/2
But if I send a parameter with http://www, it does not work:
localhost:8888/https://www.google.com/2
How do I correctly pass a URL parameter to a route? I think it is because Nancy thinks that I am sending three input parameters.
If you really need to use GET instead of POST, try HttpUtility.UrlEncode("https://google.com/2") to URL-encode your URL.
You have to encode the URL that is sent as a parameter.
Use:
var encodedString = Uri.EscapeDataString("https://www.google.com/2");
Then your URL will look like this, and it shouldn't cause any errors:
https%3A%2F%2Fwww.google.com%2F2
Sending the request:
localhost:8888/https%3A%2F%2Fwww.google.com%2F2
Or you can use the
HttpUtility.UrlEncode();
method. For further information have a look at this.
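A small sketch of the encode/decode round trip, using Uri.EscapeDataString and Uri.UnescapeDataString. The class name UrlParam is made up for illustration, and the sketch does not check whether the web framework hands the route segment back still encoded or already decoded.

```csharp
using System;

public static class UrlParam
{
    // Percent-encodes a value so it can travel as a single path segment
    // (':' becomes %3A and '/' becomes %2F).
    public static string Encode(string raw)
    {
        return Uri.EscapeDataString(raw);
    }

    // Reverses Encode on the receiving side.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

The client would call Encode before building the request URL, and the route handler would call Decode on the captured parameter if it arrives still encoded.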
Since you insist on changing only the backend, you could try using a regex to capture your route:
Get["^(?<url>.*)/(?<digit>[0-9]+)$"]
This should match any URL ending with a slash followed by at least one digit, and put everything before that slash in url, like so:
Get["^(?<url>.*)/(?<digit>[0-9]+)$"] = parameters =>
{
    var url = parameters.url;
    var digit = parameters.digit;
};
I am currently unable to verify that this works as you want it to, though. So that you can adjust it yourself, look into how to write regular expressions.
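The capture idea can be checked outside Nancy with plain System.Text.RegularExpressions. This is only a sketch of the pattern with two separate named groups (url and digit); the class name RouteCapture is hypothetical, and it does not show how Nancy itself splits or matches route segments.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RouteCapture
{
    // Everything before the last slash goes into "url",
    // the trailing run of digits goes into "digit".
    private static readonly Regex Pattern =
        new Regex(@"^(?<url>.*)/(?<digit>[0-9]+)$");

    public static bool TryParse(string path, out string url, out int digit)
    {
        url = null;
        digit = 0;
        Match m = Pattern.Match(path);
        if (!m.Success)
        {
            return false;
        }
        url = m.Groups["url"].Value;
        // int.TryParse guards against digit runs too long for an int.
        return int.TryParse(m.Groups["digit"].Value, out digit);
    }
}
```

Because .* is greedy, the split happens at the last slash, so slashes inside the URL part stay in the url group.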

pdf.js how to pass a file response object in c#

I am using the excellent pdf.js tool and it works great. However, I'd like to pass a response object containing the PDF file stream from Amazon S3 to the pdf.js viewer.
In the demo I see it is called like this:
=/pdf/web/viewer.html?file=%2FmypdfFile.pdf
However, looking in viewer.html, pdf.js, or any of its files, I cannot see where it uses the ?file parameter that is passed in the URL. I'd like to replace it with something where I can pass it a response item and it will load up viewer.html.
I'd like to do something like this (pseudocode, sort of):
request = S3.GetObjectRequest(bucket, key);
using GetObjectResponse response = client.getobject(request);
openPDFViewer (response);
Is that doable? response would contain the file, i.e. I can say
response.WriteResponseToFile("c:\mypdf.pdf")
and I get the file out.
"I cannot see where on earth its using the ?file parameter that is passed on the URL. I'd like to replace it with something where I can pass it a response item and it will load up the viewer.html."
If you look into viewer.js, there is a pdfViewOpen method with the following parameters:
pdfViewOpen(url, id, scale, password,pdfDataRangeTransport, args)
The string that you pass after file= is passed in as the url parameter of this method. Inside that method you can change what you want to do with it.
From there you might want to look at the PDFJS.getDocument() method in pdf.js; this leads to the fetchDocument method, from where it is handled by messageHandler.
If you want to handle all of that yourself, you can intercept it earlier. There are already several examples on SO. Here is a link to one:
How to get byte-array data from servlet to pdf.js
IIRC S3 files have URL references, so you don't have to 'preload' the PDF on the server side. Just render out the URL with file=[urlencoded path to s3 file].
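Rendering the viewer link on the server could look roughly like this. The class name PdfViewerLink is made up for illustration, and the sketch assumes you already have a URL for the S3 object that the browser is allowed to fetch.

```csharp
using System;

public static class PdfViewerLink
{
    // Builds "viewer.html?file=<urlencoded pdf url>".
    // viewerPath is wherever viewer.html is hosted in your application.
    public static string Build(string viewerPath, string pdfUrl)
    {
        return viewerPath + "?file=" + Uri.EscapeDataString(pdfUrl);
    }
}
```

Calling Build("/pdf/web/viewer.html", "/mypdfFile.pdf") reproduces the demo link from the question; passing the S3 object's URL instead gives the link described above.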

Should a .NET protocol-less URI show as relative?

All,
In HTML, it is my understanding that a URL that starts with // (e.g. //www.google.com) is a protocol-less URL that should be requested using the same scheme as the one in which the page was served.
However, the following C# code fails:
var uri = new Uri("//www.google.com", UriKind.RelativeOrAbsolute);
Assert.IsTrue(uri.IsAbsoluteUri);
Am I missing something here? At the moment I am rolling my own regex to find out whether a URI is absolute:
return Regex.IsMatch(url, @"^(https?:)?//");
It's not absolute. It's relative to the scheme of the source it is accessed from: HTTP, HTTPS, or something else.
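Since the missing piece is the scheme of the page, one way to get an absolute Uri is to supply that scheme explicitly. The sketch below does this with plain string handling; the class name ProtocolRelative and the pageScheme parameter are assumptions for illustration.

```csharp
using System;

public static class ProtocolRelative
{
    // True for URLs of the form //host/path.
    public static bool IsProtocolRelative(string url)
    {
        return url.StartsWith("//", StringComparison.Ordinal);
    }

    // Prefixes the scheme of the referring page (e.g. "https") to a
    // protocol-relative URL; other input is parsed as given and must
    // already be absolute.
    public static Uri Resolve(string pageScheme, string url)
    {
        string absolute = IsProtocolRelative(url)
            ? pageScheme + ":" + url
            : url;
        return new Uri(absolute, UriKind.Absolute);
    }
}
```

With pageScheme "https", the input //www.google.com becomes an absolute https URI, while an input that already carries its own scheme keeps it.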

Combine complete URL and virtual URL, like a browser does

I have a complete URL like: A: http://www.domain.com/aaa/bbb/ccc/ddd/eee.ext.
I have a relative URL like: B: ../../fff.ext
I’m looking for the easiest way in .NET C# to combine these two URLs and get:
C: http://www.domain.com/aaa/bbb/fff.ext
This is what browsers do: you’re browsing URL A, the page’s HTML has a hyperlink B, and the resulting URL is C.
You'd probably have better luck looking up "PathCanonicalize".
Also, from what I have found, one of the overloaded Uri constructors can handle this:
Uri combined = new Uri(
    new Uri("http://www.domain.com/aaa/bbb/ccc/ddd/eee.ext", UriKind.Absolute),
    "../../fff.ext"
);
Proof is in the pudding
