How do I validate whether a URL location exists or not? - c#

We're hosting SSRS reports on our servers and we are storing their paths in a SQL Server table. From .NET, I want to be able to make sure the path entered is correct.
This should return "true" because there's a report there, for example:
http://server/Reports/Pages/Viewer.aspx?%2fShopFloor%2fProduction+by+Turn&rs:Command=Render
This should return false:
http://I am an invalid location/I am laughing at you/now I'm UEEing
I was looking at WebRequests, but I don't know if that's the route I should be taking.
Any help would be appreciated. Thanks!

You can try making a HEAD request to validate that the resource exists. With a HEAD request you only need the HTTP status code (200 = Success, 404 = Not Found), without consuming bandwidth or excess memory downloading the entire resource. Take a look at the HttpWebRequest class for performing the actual request.
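For example, a minimal sketch along these lines (note that some servers reject HEAD requests, so you may need a GET fallback):

using System;
using System.Net;

// Returns true if the server answers the HEAD request with 200 OK.
public static bool UrlExists(string url)
{
    try
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";   // headers only, no body
        request.Timeout = 5000;    // don't hang on dead hosts

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.StatusCode == HttpStatusCode.OK;
        }
    }
    catch (WebException)
    {
        return false; // 404s, DNS failures, and timeouts all land here
    }
    catch (UriFormatException)
    {
        return false; // "http://I am an invalid location/..." won't even parse
    }
}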

Do an HTTP HEAD request to the URL, which should fetch just the headers. This way you don't need to download the entire page if it exists. From the returned headers (if there are any) you should be able to determine whether it's a correct URL or not.

I'm unsure of the exact layout of your network, but if the .NET application has visibility to the location where the reports are stored, you could use File.Exists(). Otherwise, mellamokb and red-X have the right idea.
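For example (the UNC path here is made up; this only works if the app can reach the file system directly):

// Hypothetical share and report path, purely for illustration.
bool reportExists = System.IO.File.Exists(@"\\reportserver\Reports\ShopFloor\ProductionByTurn.rdl");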

Related

How should I handle users browsing to pages in my site meant only for AJAX? Should I ever use GET?

Similar questions have been asked about when to use POST and when to use GET in an AJAX request, here:
What are the advantages of using a GET request over a POST request?
and here: GET vs. POST ajax requests: When and how to use either?
However, I want to make it clear that that is not exactly what I am asking. I get idempotence, sensitive data, the ability for browsers to retry in the event of an error, and the ability for the browser to cache query string data.
My real scenario is such that I want to prevent my users from being able to simply enter in the URL to my "Compute.cshtml" file (i.e. the file on the server that my jQuery $.ajax function posts to).
I am in a WebMatrix C#.net web-pages environment and I have tried preceding the file name with an underscore (_), but apparently an AJAX request falls under the same restriction the underscore is designed to enforce, so it, of course, breaks the request.
So if I use POST I can simply use this logic:
if (!IsPost) // if this is not a post...
{
    Response.Redirect("~/"); // ...redirect back to home page.
}
If I use GET, I suppose I can send additional data like a string containing the value "AccessGranted" and check it on the other side to see if it equals this value and redirect if not, but this could be easily duplicated through typing in the address bar (not that the data is sensitive on the other side, but...).
Anyway, I suppose I am asking if it is okay to always use POST to handle this logic or what the appropriate way to handle my situation is in regards to using GET or POST with AJAX in a WebMatrix C#.net web-pages environment.
My advice is, don't try to stop them. It's harmless.
You won't have direct links to it, so it won't really come up. (You might want your robots.txt to exclude the whole /api directory, for Google's sake).
It is data they have access to anyway (otherwise you need server-side trimming), so you can't be exposing anything dangerous or sensitive.
The advantages in using GETs for GET-like requests are many, as you linked to (caching, semantics, etc.).
So what's the harm in having that url be accessible via direct browser entry? They can POST directly too, if they're crafty enough, using Fiddler "compose" for example. And having the GETs be accessible via url is useful for debugging.
EDIT: See sites like http://www.robotstxt.org/orig.html for lots of details, but a robots.txt that excluded search engines from your web services directory called /api would look like this:
User-agent: *
Disallow: /api/
Similar to IsPost, you can use IsAjax to determine whether the request was initiated by the XMLHttpRequest object in most browsers.
if (!IsAjax)
{
    Response.Redirect("~/WhatDoYouThinkYoureDoing.cshtml");
}
It checks the request to see if it has an X-Requested-With header with the value XMLHttpRequest, or if there is an item in the Request object with the key X-Requested-With whose value is XMLHttpRequest.
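A sketch of the equivalent manual check, in case you want the same behavior outside WebMatrix (standard ASP.NET Request shown):

// Roughly what IsAjax does: look for the header that jQuery and most
// libraries attach to XMLHttpRequest calls.
bool isAjax = string.Equals(
    Request.Headers["X-Requested-With"],
    "XMLHttpRequest",
    StringComparison.OrdinalIgnoreCase);

if (!isAjax)
{
    Response.Redirect("~/WhatDoYouThinkYoureDoing.cshtml");
}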
One way to detect a direct call is to check for the presence of the HTTP Referer header. Directly typed URLs won't generate a referrer, but you still won't be able to differentiate the call from a simple anchor link.
(Just keep in mind that some browsers don't generate the header for XHR requests.)
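A sketch of that referrer check (remember the header can be absent or spoofed):

// Request.UrlReferrer parses the Referer header; it is null when the URL
// was typed directly or the browser chose not to send the header.
if (Request.UrlReferrer == null)
{
    Response.Redirect("~/");
}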

Set Referer header in asp.net

This should be an easy question, but I've been unable to solve it. I'm trying to change the Referer header prior to redirecting the page of an HttpResponse object. I know this can be done with an HttpWebRequest, but I can't get this to work for a standard Page.Response.
I'm trying to just set the referer header to look like it originated from a temp page on my site (this is for analytics tracking for an external system).
Is this possible to do?
I've tried to use the code below (as well as variations such as Response.AppendHeader and Response.AddHeader), however the Referer always shows as the page that the Request initiated from.
Response.Headers.Add("Referer", "http://test.local/fromA");
Response.Redirect(HttpContext.Current.Request.Url.AbsoluteUri);
If not via .NET, can this be accomplished via JS?
Thanks!
Referer is controlled (and sent) by the client. You can't affect it server-side. There may be some JavaScript that you could emit that'd get the client to do it - but it's probably considered a security flaw, so I wouldn't count on it.
The referrer is set by the client, not the server. It belongs in a request rather than a response, as it points to the URL the request came from.
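If what you control is an outgoing request made from your own server-side code, HttpWebRequest does let you set it (URLs below are the ones from the question; the target is hypothetical):

using System.Net;

// Referer can be set on a request you originate, never on a response.
var request = (HttpWebRequest)WebRequest.Create("http://test.local/target");
request.Referer = "http://test.local/fromA"; // dedicated property for this header
using (var response = request.GetResponse())
{
    // ...
}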

Wrong host returned using HttpContext.Current.Request.Url.Host

We have multiple domains for one of our websites.
e.g. mydomain-uk.com and mydomain.co.uk
I have a handler which creates an XML sitemap and it uses HttpContext.Current.Request.Url.Host to retrieve the host site.
When my browser is on mydomain.co.uk/handler, it retrieves mydomain-uk.com as the host.
How can I ensure it always retrieves mydomain.co.uk ?
Is there a preference order configured somewhere on the server?
The host is taken from the URL of the request, which is logical; you cannot change this.
To work around it, store your canonical URL in a static variable, or better yet in your web.config, and read that value instead of Url.Host.
Hope this helps.
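For example, a minimal sketch, assuming an appSettings entry you add yourself (the key name "CanonicalHost" is made up):

using System.Configuration;

// web.config:
//   <appSettings>
//     <add key="CanonicalHost" value="mydomain.co.uk" />
//   </appSettings>
string host = ConfigurationManager.AppSettings["CanonicalHost"];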
Don't point all of your domains at the website. Have the extra domains perform a 301 redirect to the main domain name. This will also help resolve confusion by search engines when they try to resolve your site as to which site is the original source of your content, and will prevent inbound links from other websites from using a mixture of domains which will only exacerbate the problem.
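A sketch of that redirect in Global.asax.cs (domain names taken from the question; RedirectPermanent issues the 301 and is available from .NET 4):

// Redirect any request for the secondary domain to the canonical one.
protected void Application_BeginRequest(object sender, EventArgs e)
{
    Uri url = Request.Url;
    if (url.Host.Equals("mydomain-uk.com", StringComparison.OrdinalIgnoreCase))
    {
        Response.RedirectPermanent("http://mydomain.co.uk" + url.PathAndQuery);
    }
}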
Don't forget that HttpContext.Current.Request.Url.Host is simply going to return whatever HOST was requested at the time it happened. If the client requested something else, HttpContext.Current.Request will reflect this.

Getting data from a webpage

I have an idea for an App that would really help me out in work but I'm not sure if it's possible.
I want to run a C# desktop application that will ask for a value. When a value is supplied, the application will open a browser, go to a webpage, and enter the value into a form on an online website. The form is then submitted and a new page is loaded that contains a table of results. I then want to extract the table of results from the page source and write code to parse the result values.
It is not important that the user sees this happen in an actual browser. In other words, if there's a way to do it by reading HTTP requests then that's great.
The biggest problem I have is getting the values into the form and then retrieving the page source after the form is submitted and the next page loads.
Any help really appreciated.
Thanks
Provided that you're only using this in a legal context:
Usually, web forms are sent via POST request to the web server, specifically some script that handles it. You can look at the HTML code for the form's page and find out the destination for the form (form's action).
You can then use an HttpWebRequest in C# to "pretend to be the form", sending a POST request with all the required parameters (form-encoded in the request body).
As a result you will get the source code of the destination page as it would be sent to the browser. You can parse this.
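A minimal sketch of that flow (the URL and field name are placeholders for whatever the real form uses):

using System;
using System.IO;
using System.Net;
using System.Text;

// Pretend to be the form: POST the fields the browser would have sent.
string postData = "searchValue=" + Uri.EscapeDataString("12345"); // made-up field name
byte[] body = Encoding.UTF8.GetBytes(postData);

var request = (HttpWebRequest)WebRequest.Create("http://example.com/results"); // the form's action URL
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = body.Length;

using (Stream stream = request.GetRequestStream())
{
    stream.Write(body, 0, body.Length);
}

using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string html = reader.ReadToEnd(); // the results page, ready to parse
}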
This is definitely possible and you don't need to use an actual web browser for this. You can simply use a System.Net.WebClient to send your HTTP request and get an HTTP response.
I suggest using Wireshark (or Firefox + Firebug); it allows you to see HTTP requests and responses. By looking at the HTTP traffic you can see exactly how you should form your HTTP request and which parameters you should be setting.
You don't need to involve the browser with this. WebClient should do all that you require. You'll need to see what's actually being posted when you submit the form with the browser, and then you should be able to make a POST request using the WebClient and retrieve the resulting page as a string.
The docs for the WebClient constructor have a nice example.
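With WebClient the same POST is shorter still (again, the URL and field name are placeholders):

using System.Collections.Specialized;
using System.Net;
using System.Text;

using (var client = new WebClient())
{
    var form = new NameValueCollection
    {
        { "searchValue", "12345" } // made-up field name
    };

    // UploadValues sends a form-encoded POST and returns the response body.
    byte[] result = client.UploadValues("http://example.com/results", form);
    string html = Encoding.UTF8.GetString(result);
}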
See e.g. this question for some pointers on at least the data retrieval side. You're going to know a lot more about the HTTP protocol before you're done with this...
Why would you do this through web pages if you don't even want the user to do anything?
Web pages are purely for interaction with users; if you simply want data transfer, use WCF.
@Brian: using Wireshark will result in a very angry network manager; make sure you are actually allowed to use it.

How can I programmatically tell if a binary file on a website (e.g. image) has changed without downloading it?

How can I programmatically tell if a binary file on a website (e.g. image) has changed without downloading it? Is there a way using HTTP methods (in C# in this case) to check prior to fully downloading it?
Really, you want to look for the Last-Modified header after issuing a HEAD request (rather than a GET). I wrote some code to get the HEAD via WebClient here.
You can check whether the file has changed by requesting it with HEAD.
The returned response headers may include Last-Modified, or an ETag if the web server supports it.
You can do a HEAD request and check the Last-Modified datetime value, as well as the Content-Length.
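A sketch combining those suggestions (placeholder URL; compare the values against ones you saved on a previous check):

using System;
using System.Net;

// HEAD fetches the headers without the body.
var request = (HttpWebRequest)WebRequest.Create("http://example.com/image.png");
request.Method = "HEAD";

using (var response = (HttpWebResponse)request.GetResponse())
{
    DateTime lastModified = response.LastModified;   // Last-Modified header
    long contentLength = response.ContentLength;     // Content-Length header
    string etag = response.Headers["ETag"];          // may be null
    // A change in any of these suggests the file was updated.
}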
