HttpHandler does not handle secondary requests - C#

I want to run my personal websites via an HttpHandler (I have a web server and a static IP at home).
Eventually, I will incorporate a data access layer and domain router into the handler, but for now, I am just trying to use it to return static web content.
I have the handler mapped to all verbs and paths with no access restrictions in IIS 7 on Windows 7.
I have added a little file logging at the beginning of ProcessRequest. Since it is the first thing in the handler, the logging tells me whenever the handler is hit.
At the moment, the handler just returns a single web page that I have already written.
The handler itself is mostly just this:
public void ProcessRequest(HttpContext context)
{
    using (FileStream fs = new FileStream(
        context.Request.PhysicalApplicationPath + "index.htm",
        FileMode.Open))
    {
        fs.CopyTo(context.Response.OutputStream);
    }
}
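For reference, the "all verbs and paths" mapping described above would typically look something like this in web.config for IIS 7's integrated pipeline (a sketch; the handler name and type here are placeholders, not my actual class):

<system.webServer>
  <handlers>
    <!-- Hypothetical registration: routes every verb and every path to the handler -->
    <add name="SiteHandler" verb="*" path="*" type="MySite.SiteHandler, MySite" />
  </handlers>
</system.webServer>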
I understand that this won't work for anything but the one file.
So my issue is this: the HTML file has links to some images in it. I would expect that the browser would come back to the server to get those images as new requests. I would expect those requests to fail (because they'd be mapped to index.htm). But I would expect to see the logging hit at least twice (and potentially hit recursively). However, I only see a single request. The web page comes up and the images are 'X's.
When I refresh the browser, I see another request come through, but only for the root page again. The page is basic HTML; I do not have an ASP.NET application (nor do I want one; I like HTML/CSS/JS).
What do I have to do to get the browser to send more than just the first request? I assume I'm just totally off the mark here, because I wrote an HTTP module first and, strangely, got exactly the same behavior. I'm thinking I need to specify some response headers, but I don't see that in any example.

Handling Authentication for a File Display using a Web Service

This is my first time developing this kind of system, so many of these concepts are very new to me. Any and all help would be appreciated. I'll try to sum up what I'm doing as efficiently as possible.
Background: I have a web application running AngularJS with Bootstrap. The app communicates with the server and DB through a web service programmed using C#. On the site, users can upload files and reference them later using direct links. There's no restriction to file type (yet), so just about anything is allowed.
My Goal: Having direct links creates a big security problem for me, since the documents/images are supposed to be private data. What I would prefer to do is validate a user's credentials when the link is clicked, then load the file in the browser using a more generic URL path.
--Example--
"mysite.com/attachments/1" ---> (Image)
--instead of--
"mysite.com/data/files/importantImg.jpg"
Where I'm At: Not very far. My first thought was to add a page that sends the server request and receives a file byte stream along with a MIME type, which I can reassemble and present to the user. However, I have no idea if this is possible using a web service that sends JSON requests, nor do I have a clue about how the reassembling process would work client-side.
Like I said, I'll take any and all advice. I'd love to learn more about this subject for future projects as well, but for now I just need to be pointed in the right direction.
Your first thought is correct. To do it, you need to use the Response object, and more specifically its AddHeader and Write methods. Of course, this will be a separate page that only handles file downloads, so it will coexist perfectly fine with your JSON web service.
I don't think you want to do this with a web service. Just use a regular IHttpHandler to perform the validation and return the data. So you would have the URL "attachments/1" get rewritten to "attachments/download.ashx?id=1". Once you've verified access, write the data to the response stream. You can use the Content-Disposition header to set the file name.
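A minimal sketch of what that handler could look like (UserCanAccess and the LookUp*/LoadFileBytes calls below are placeholders for your own authentication and data-access code):

using System.Web;

public class DownloadHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Validate the caller before serving anything
        if (!UserCanAccess(context))            // placeholder auth check
        {
            context.Response.StatusCode = 403;  // forbidden; no file is leaked
            return;
        }

        // Look the file up by its generic id, not its physical path
        int id = int.Parse(context.Request.QueryString["id"]);

        // Send it back with the right MIME type and file name
        context.Response.ContentType = LookUpMimeType(id);
        context.Response.AddHeader("Content-Disposition",
            "inline; filename=\"" + LookUpFileName(id) + "\"");
        context.Response.BinaryWrite(LoadFileBytes(id));
    }

    public bool IsReusable { get { return true; } }

    // Stubs so the sketch is self-contained; replace with real implementations
    private bool UserCanAccess(HttpContext context) { return false; }
    private string LookUpMimeType(int id) { return "application/octet-stream"; }
    private string LookUpFileName(int id) { return "file.bin"; }
    private byte[] LoadFileBytes(int id) { return new byte[0]; }
}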

Response.Redirect timeout in NLB environment

I have a custom SharePoint 2010 web part that runs the user through a series of steps in a registration process. At each step, when the required input is completed, the user clicks the Continue button, which is a standard server-side button control. The code-behind does some validation and DB updates before calling Response.Redirect, which refreshes the same page with updated session data.
(Note: the session data is kept in the URL as an encrypted query string parameter, not by the conventional Session object)
This solution works fine in my single-server test environment, but as soon as I deploy it to a load-balanced stage or production environment, some requests simply time out without receiving a response after clicking Continue (ERR_TIMED_OUT).
The web part log shows that the web part is in fact calling Response.Redirect with a valid URL.
This is not a server resource issue: the timeout can be set to a minute or more, and still no response is received.
It only happens when deployed to the load-balanced servers.
Everything works fine when I complete a registration from one of the load-balanced servers themselves - which leads me to believe there is a problem with load balancing and server sessions. I know that when interacting with a load-balanced web application from one of the server nodes in the NLB, all requests go to that particular server.
I know I have faced a similar issue before, but it is several years ago and I cannot remember what the solution was.
try
{
    // Get a clean URL without query string parameters
    string url;
    if (string.IsNullOrEmpty(Request.Url.Query))
        url = Request.Url.AbsoluteUri;
    else
        url = Request.Url.AbsoluteUri.Replace(Request.Url.Query, "");

    // Append the encrypted, serialized session object
    url += "?" + Constants.QueryStringParameterData + "=" + SessionData.Serialize(true);

    _log.Info("Redirecting to url '" + url + "'..");
    Response.Redirect(url);
}
catch (Exception) { } // swallows everything, including the ThreadAbortException that Response.Redirect throws
OK, the problem has been resolved.
It turned out to be UAG (Forefront Unified Access Gateway) doing something in the background, and the way I discovered it was that the links that triggered the postbacks were changed from
http://some_url.com/sites/work/al2343/page.aspx
to
http://some_other_url.domain.com/uniquesigfed6a45cdc95e5fa9aa451d1a37451068d36e625ec2be5d4bc00f965ebc6a721/uniquesig1/sites/work/al2343/page.aspx
(Take note of the "uniquesig" in there)
This was the URL the browser actually tried to redirect to, but because of whatever the issue was with UAG the navigation froze.
I don't know how they fixed it, but at least the problem was not in my component.
One possibility is that Request.Url is the URL as that particular server sees it (something like http://internalServer44/myUrl) instead of the externally visible, load-balanced URL (like http://NlbFarmUrl/myUrl).
In the case of SharePoint it is better to use the SPContext.Current.Site/Web properties to get the base portion of the URL, since those URLs should already be in the externally visible form.
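A sketch of that idea, rebuilding the redirect target from the externally visible web URL instead of Request.Url (this assumes the code runs inside a SharePoint context; Constants and SessionData are the asker's own types from the snippet above):

// Take scheme/host/port from the externally visible SharePoint web URL,
// keep the current request's path, and drop the old query string
Uri externalBase = new Uri(SPContext.Current.Web.Url);

UriBuilder target = new UriBuilder(Request.Url)
{
    Scheme = externalBase.Scheme,
    Host = externalBase.Host,
    Port = externalBase.Port
};

string url = target.Uri.GetLeftPart(UriPartial.Path)
    + "?" + Constants.QueryStringParameterData + "=" + SessionData.Serialize(true);

Response.Redirect(url);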

Server side redirect truncating request payloads

I'm on IIS 6 and I have an ASP.NET 4.0 site that's a single page serving as a SOAP reverse proxy. I have to modify the return content in order to delete a trouble node from the response and add a tracking node.
In order to facilitate its function as a reverse proxy for all addresses, I have the server's 404 set to a custom "URL" of "/default.aspx" (the page for my app).
For requests without a payload - such as ?WSDL URLs - it works perfectly. It requests the proper URL from the target system, gets the response, and sends it back; it's utterly transparent in this regard.
However, when a SOAP request is being made with an input payload, the Request.InputStream in the code is always empty. Empty - with one exception - using SOAPUI, I can override the end point and send the request directly to /default.aspx and it will receive the input payload. Thus, I have determined that the custom 404 handler is - when server-side transferring the request - stripping the payload. I know the payload is being sent - I have even wiresharked it on the server to be sure. But then when I add code to log the contents of Request.InputStream it's blank - even though Request.ContentLength shows the right content length for the original request.
I've also been looking for a good way to use ASP.NET to intercept the requests directly rather than allowing the normal IIS 404 handler to take care of it, but even with a wildcard mapping I can't seem to get the settings right, nor am I fully confident that it would help. (But I'm hoping it would?)
Finally, I don't have corporate permission to install MVC framework.
Thus, I need either some configuration for IIS I am missing to make this work properly or some other method of ensuring that I get the request payload to my web page.
Thanks!
What about using an HTTP Handler mapped to all requests?
You'll need to add a wildcard application mapping as detailed here and correctly configure your HTTP Handler.
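On IIS 6 the ASP.NET side of that would look roughly like the following (a sketch; the type name is a placeholder, and the wildcard application mapping to aspnet_isapi.dll still has to be added in IIS Manager so that non-.aspx requests reach ASP.NET at all):

<system.web>
  <httpHandlers>
    <!-- Hypothetical handler that receives every verb and every path -->
    <add verb="*" path="*" type="SoapProxy.ProxyHandler, SoapProxy" />
  </httpHandlers>
</system.web>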

How can I detect a child request?

I am trying to create an HttpModule in C# which will redirect arbitrary URLs and missing files, and which will perform canonicalization on all URLs that come in. Part of my canonicalization process is to redirect from default documents (such as http://www.contoso.com/default.aspx) to a bare directory (like http://www.contoso.com/).
I have discovered that when an IIS server receives a request for a bare directory, it processes this request normally and then creates a child request for the selected default document. This is producing a redirect loop in my module - the first request goes through just fine, but when it sees the child request, it removes the default document from the URL and redirects back to the bare directory, starting the process over again.
Obviously, all I need to solve this problem is for my module to know when it's seeing a child request, so that it can ignore it. But I cannot find anything online describing how to tell the two requests apart. I found that request headers persist between the two requests, so I tried adding a value to the request headers and then looking for that value. This worked in IIS 7, but apparently IIS 6 won't let you alter request headers, and my code needs to run in both.
These child requests can also be triggered by any Server.Transfer or Server.Execute calls in the code. One trick to detect a child request is to add a custom request header during the first request and check for it later (in the child request). Example:
private bool IsChildRequest(HttpRequest request)
{
    // The breadcrumb header survives into child requests, so its
    // presence means this request was spawned by an earlier one
    var childRequestHeader = request.Headers["x-parent-breadcrumb"];
    if (childRequestHeader != null)
    {
        return true;
    }

    // First pass: mark the request so any child it spawns is detectable
    request.Headers["x-parent-breadcrumb"] = "1"; // arbitrary value
    return false;
}
This works because the request headers are passed to the child request. I initially tried this with HttpContext.Current.Items, but that seemed to get reset for the child request.
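For illustration, here is a minimal sketch of how the helper might be wired into a module like the one described in the question (the module skeleton and event wiring are an assumed shape, not the asker's actual code):

public class CanonicalizationModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += (sender, e) =>
        {
            HttpContext context = ((HttpApplication)sender).Context;

            // Ignore default-document child requests to break the redirect loop
            if (IsChildRequest(context.Request))
                return;

            // ... canonicalization and redirect logic for first-pass requests ...
        };
    }

    public void Dispose() { }

    // Body as shown above; stubbed here to keep the sketch self-contained
    private bool IsChildRequest(HttpRequest request) { return false; }
}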
What's happening with your module is exactly the way it should work. If your default document is Default.aspx, then IIS is bound to issue that child request for Default.aspx, which causes your module to redo its work. However, one thing I don't understand is why you would want http://www.contoso.com/default.aspx to be redirected to http://www.contoso.com in the first place - perhaps you need to redefine your requirement. Or else, if possible, you could have another default page (like http://www.contoso.com/Home.aspx) and have IIS forward the bare requests to that URL.

C# WebClient - View source question

I'm using a C# WebClient to post login details to a page and read all of the results.
The page I am trying to load includes Flash (which, in the browser, renders as HTML). I'm guessing it's Flash to avoid being picked up by search engines?
The Flash content I am interested in is just text (not an image or video), and when I "View Selection Source" in Firefox I do actually see the text, within HTML, that I want to see.
(Interestingly when I view the source for the whole page I do not see the text, within HTML, that I want to see. Could this be related?)
Currently, after I have posted my login details and loaded the HTML back, I see the page, which does NOT show the Flash HTML (as if I had viewed the source for the whole page).
Thanks in advance,
Jim
PS: I should point out that the POST is actually working, my log in is successful.
Fiddler (or a similar tool) is invaluable for tracking down screen-scraping problems like this. Using a normal browser with Fiddler active, look at all the requests being made as you go through the login and navigation process to get to the data you want. Along the way you will likely see one or more things your code is doing differently, which the server is responding to and hence showing you different HTML than it shows a real client.
The list below (think of it as "scraping 101") is what you want to look for. Most of it is probably stuff you're already doing, but I included everything for completeness.
In order to scrape effectively, you may need to deal with one or more of the following:
Cookies and/or hidden fields: When you show up at any page on a site, you'll typically get a session cookie and/or a hidden form field which (in a normal browser) would be propagated back to the server on all subsequent requests. You will likely also get a persistent cookie. On many sites, if a request shows up without the proper cookie (or form field, for sites using "cookieless sessions"), the site will redirect the user to a "no cookies" UI, a login page, or another undesirable location (from the scraper app's perspective). Always make sure you capture the cookies set on the initial request and faithfully send them back to the server on subsequent requests, except when one of those subsequent requests changes a cookie (in which case propagate the new cookie instead).
Authentication tokens: A special case of the above is forms-authentication cookies or hidden fields. Make sure you're capturing the login token (usually a cookie) and sending it back.
POST vs. GET: This is obvious, but make sure you're using the same HTTP method that a real browser does.
Form fields (especially hidden ones!): I'm sure you're doing this already, but make sure to send all the form fields that a real browser does, not just the visible ones. Make sure the fields are HTML-encoded properly.
HTTP headers: You already checked this, but it may make sense to check again just to make sure the (non-cookie) headers are identical. I always start with the exact same headers and then pull them out one by one, keeping only the ones whose removal causes the request to fail or return bogus data. This approach simplifies your scraping code.
Redirects: These can come either from the server or from client script (e.g. "if the user doesn't have the Flash plug-in loaded, redirect to a non-Flash page"). See WebRequest: How to find a postal code using a WebRequest against this ContentType="application/xhtml+xml, text/xml, text/html; charset=utf-8"? for a crazy example of how redirection can trip up a screen-scraper. Note that if you're using .NET for scraping, you'll need to use HttpWebRequest (not WebClient) for redirect-dependent scraping, because by default WebClient doesn't provide a way for your code to attach cookies and headers to the second (post-redirect) request; there is a sketch of this after the list. See the thread above for more details.
Sub-requests (frames, AJAX, Flash, etc.): Often, page elements (not the main HTTP request) will end up fetching the data you want to scrape. You'll be able to figure this out by looking at which HTTP response contains the text you want, and then working backwards until you find what on the page is actually making the request for that content. A few sites do really crazy things in sub-requests, like requesting compressed or encrypted text via AJAX and then using client-side script to decrypt it. If that's the case, you'll need to do a bit more work, like reverse-engineering what the client script is doing.
Ordering: This one is obvious: make HTTP requests in the same order that a browser client does. That doesn't mean you need to make every request (e.g. images); typically you only need to make the requests which return a text/html content type, unless the data you want is not in the HTML but in an AJAX/Flash/etc. request.
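As a concrete illustration of the cookie and redirect points above, here is a minimal sketch using HttpWebRequest with a shared CookieContainer (the URL and form fields are placeholders, not the real site):

using System;
using System.IO;
using System.Net;
using System.Text;

class ScrapeSketch
{
    static void Main()
    {
        // One CookieContainer shared across requests, so session/auth
        // cookies set at login are sent back automatically afterwards
        var cookies = new CookieContainer();

        // 1. POST the login form (URL and field names are placeholders)
        var login = (HttpWebRequest)WebRequest.Create("https://example.com/login");
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;

        byte[] body = Encoding.UTF8.GetBytes("user=me&pass=secret");
        using (Stream s = login.GetRequestStream())
            s.Write(body, 0, body.Length);
        using (login.GetResponse()) { }  // cookies are captured into the container

        // 2. GET the page we actually want, sending the same cookies
        var page = (HttpWebRequest)WebRequest.Create("https://example.com/data");
        page.CookieContainer = cookies;

        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());
    }
}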
(Interestingly when I view the source for the whole page I do not see the text, within HTML, that I want to see. Could this be related?)
This usually means that the discrepancy is caused by some DOM manipulation via JavaScript after the page has loaded. Try turning off JavaScript and see what the page looks like then.
