ASHX renders as broken image - c#

I've got a really vexing problem with an ASHX handler that renders a captcha image. What makes it really vexing is that it was working fine two months ago, and when I went back to it today it had stopped working.
What I've got is a page that throws in a captcha every so often. This is the markup from an example of a challenge:
<img class="challengedtl" src="Challenge.ashx?tkn=0057ea27-4d35-4850-9c6f-7a6fdc9818e2"/>
The GUID references a record in a SQL table that contains the actual content of the captcha as well as the status of the captcha challenge, i.e. whether it has been processed and, if so, whether the user got it right.
On the page where this markup is found, the image displays as a broken jpeg. When I drop a breakpoint in the ASHX ProcessRequest() method I can see that the ASHX is never being called.
When I take the URL out of the src attribute and run it directly from the address bar in my browser, I hit my breakpoint in ProcessRequest and the captcha image is rendered just fine.
I don't believe that my ASHX code is the problem, since it works when I call it directly. The problem seems to be that the ASHX is never being requested by the main page. Given that this was working in February, I am at a loss to explain what is going on.
I know that something has happened to my machine since then. I suspect a Windows Update or a service pack for something. The reason for this is that my captcha processing includes tracking the IP address of the caller. Back when this was working my local host was being registered as 127.0.0.1 (IPv4) but now it is being registered as ::1 (IPv6). Probably a red herring.
Does anyone know what might be causing this or do you have any suggestions for how to troubleshoot this problem?

Is the handler in the same folder as the page containing the HTML you posted above?

Here are the two key parts:
When I drop a breakpoint in the ASHX ProcessRequest() method I can see that the ASHX is never being called.
and
src="Challenge.ashx?tkn=0057ea27-4d35-4850-9c6f-7a6fdc9818e2"
Put those together, and we can surmise that the path in your src attribute is wrong.
It's just an image tag. If the HTML loads, the browser will send a request for that resource. Since your breakpoint is not hit, it can only mean that either you aren't testing somewhere that allows breakpoints, or the request is being sent to the wrong place.
It could be as simple as sending the request to the production version of the site, using the wrong scheme (i.e. https vs. http), or missing a folder or port number somewhere. The browser should be able to give you the entire path of the requested resource -- make sure this matches what you expect.
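One quick way to rule out a path problem is to build the src from an application-root-relative URL so it no longer depends on which folder the page is served from. This is a sketch only, not the poster's markup; challengeToken is a hypothetical server-side variable holding the GUID:
<img class="challengedtl" alt="captcha"
     src='<%= ResolveUrl("~/Challenge.ashx") + "?tkn=" + challengeToken %>' />
Either way, the Network tab of the browser's developer tools will show exactly which URL the img tag requests and what status code comes back, which usually settles the question quickly.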

AG_E_NETWORK_ERROR while downloading some images from web in WindowsPhone

I am using HtmlAgilityPack.
I am downloading articles and images from one web site. 80% of the images download without problems, but some images throw an error; I can see the name of the error in the ImageFailed event.
I am downloading image like that:
Image = new BitmapImage(new Uri(img.Attributes["src"].Value));
I have searched Google and found that this is a genuinely baffling problem.
There's a good chance the Referer header is what's tripping you up. You need to issue the calls yourself (instead of relying on BitmapImage to download the file).
There's a handy snippet/utility that 'extends' XAML and makes this easier to do.
http://blogs.msdn.com/b/swick/archive/2011/08/04/wp7-mango-image-download-with-custom-referer-header.aspx
Edit: Explanation
A lot of sites block requests for images that don't originate from their own pages. That way, if you run http://mysite.com and link directly to images on http://cnn.com, they can block or redirect those hotlinked images.
Now, the reason this blocking works is that the browser controls all calls made from the <img> tag (or from any other mechanism such as AJAX) and adds the Referer HTTP header saying where the request is coming from (http://mysite.com) - so the cnn.com code can block it.
In desktop .NET, the Referer header is not automatically added to the request, which means the call would be blocked by a site that checks for an empty referrer, but not by others that don't.
Now switch to WP7/8, which is based on Silverlight. In Silverlight, the referrer is the site on which the Silverlight control is hosted: if you have an SL control running on http://mysite.com and it makes any HTTP request, the Referer header is automatically set for you to http://mysite.com. There's no way to control that, as far as I know (for security reasons). Windows Phone, while based on SL, does not need to be bound by the same security constraints. However, when they "ported" the code to Windows Phone, they put some value into the referrer - the value is actually the package location inside the phone (you can see this by using Fiddler). It's literally some path (/apps/storage/[guid]) or something like that - I don't recall the exact value. To fix that, you set the referrer to the site yourself on the HTTP headers when making the request.
Hope that makes it clear.
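For reference, here is a rough desktop .NET sketch of the general idea (the URLs are placeholders, and this is not the WP7-specific helper from the blog post above, which handles the async plumbing for you): issue the request yourself so you control the Referer, then hand the bytes to the image.
using System.IO;
using System.Net;

var request = (HttpWebRequest)WebRequest.Create("http://cnn.com/images/photo.jpg");   // hypothetical image URL
request.Referer = "http://cnn.com/some-article.html";   // pretend the request came from the hosting page

using (var response = request.GetResponse())
using (var remote = response.GetResponseStream())
using (var file = File.Create("photo.jpg"))
{
    remote.CopyTo(file);   // on the phone you'd hand the stream to BitmapImage.SetSource instead
}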

Why does a single web request create a second, separate thread call to my default document page?

I'm seeing some very strange behaviour when debugging my web application locally in VS2010. The same user journey/sequence of pages shows the problem in Production too.
Debugging, I'm seeing this:
1. A request for MyPage.aspx is handled by thread_1.
2. There is something on that page that IIS/ASP.NET doesn't like, it seems. I'm slowly removing sections to pin-point exactly what, but there's no JS or anything fancy there, just HTML content, user controls etc.
3. Either way, a separate thread_2 begins processing the Page_Load of my default document, i.e. home.aspx. There is logic in home.aspx.cs that clears some data.
4. So when thread_1 continues processing, its checks against the data above fail, resulting in the user being redirected to an error page.
Can anyone shed any light on why the second thread is created and why it starts to process my default document?
Please note:
I've checked the global methods for errors, e.g. Session_End, Application_Error etc., but found nothing.
With Failed Request Tracing logging enabled I do intermittently see a 401 error, but I don't understand how that would start the processing of my default home page.
Just to sanity check, I placed a new document, test.aspx, at the beginning of my default document list in web.config, and it did get called.
It seems as though something within IIS/ASP.NET is configured to begin processing the default page on an error, but this is new behaviour to me.
I've tried researching this, but the only thing that seems like it could be related is thread agility, and I'm not too sure about that.
It seems like there are two HTTP requests running concurrently. As each request (generally) executes on its own thread, this condition would make sense.
HTTP requests by default do not share state. They operate on different data. For that reason this is not a thread-safety issue.
An exception to this rule is if you explicitly share state e.g. using static variables. You shouldn't do this for various reasons.
To debug the problem, launch Fiddler and examine the HTTP requests being executed. Also examine HttpContext.Current.Request.RawUrl on each of the two concurrent threads.
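One low-tech way to confirm what is triggering the second request (a sketch, assuming you can edit the Page_Load of both MyPage.aspx and home.aspx) is to log the URL, referrer and thread id from each page and compare the two entries:
protected void Page_Load(object sender, EventArgs e)
{
    // Temporary diagnostics: identify which URL/thread produced this page execution.
    System.Diagnostics.Debug.WriteLine(string.Format(
        "Page={0} RawUrl={1} Referrer={2} Thread={3}",
        Request.CurrentExecutionFilePath,
        Request.RawUrl,
        Request.UrlReferrer,
        System.Threading.Thread.CurrentThread.ManagedThreadId));

    // ... existing page logic ...
}
If home.aspx logs MyPage.aspx as its referrer, the second request is being triggered by something in that page's markup rather than by IIS on its own.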
After removing a lot of content from the faulty MyPage.aspx, I came across the guilty line of code: btnShowPost.ImageUrl = SitePath + "post.png"; it sat behind an if statement and was never reached, so the image <asp:Image ID="btnShowPost" runat="server" /> never had its ImageUrl set.
Without it, this is apparently standard browser behaviour: any img, script, css, etc. with a missing or empty src will request the current path as the URL, and IIS will usually resolve that to default.aspx (or whatever the default document is).
See full explanation on this link
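To make the failure mode concrete, here is a sketch using the names from the question (someConditionThatWasNeverTrue is hypothetical): when the assignment never runs, the control renders as <img src="" />, the browser resolves the empty src against the current URL, and IIS answers with the default document.
if (someConditionThatWasNeverTrue)   // this branch never executed
{
    btnShowPost.ImageUrl = SitePath + "post.png";
}

// One defensive fix: make sure ImageUrl always ends up with a value (or hide the control).
if (string.IsNullOrEmpty(btnShowPost.ImageUrl))
{
    btnShowPost.ImageUrl = ResolveUrl("~/images/post.png");   // hypothetical fallback path
}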

How can I debug problems related to (lack of) postback

I have created a custom wizard control that dynamically loads user controls as you progress through it. The wizard is behaving as expected in all environments (PC/Mac) and browsers I have tested, however a client is reporting that she is unable to complete the wizard. What I know about the issue:
It always fails on the same wizard step for this user (not the first step)
When the user clicks the 'next' button in the step, the controller reports that the request was not a postback request (i.e. IsPostBack == false) and displays the first page of the wizard
The client is using a Mac and is accessing the site using the latest version of Safari
If the client switches to Firefox, or even just switches the user agent in Safari to something other than Safari, the problem goes away.
So the problem is that when the client reaches a certain step in the wizard and clicks 'next', instead of re-loading that step to initiate the save event, the controller is merely displaying the first step of the wizard.
The step that fails contains many different form controls including textboxes, dropdowns, checkboxes and a fileupload control. We thought that it might have something to do with invalid characters getting pasted in from Word or something similar but that seems strange seeing as the problem only appears to be happening in Safari.
No exceptions are thrown and the windows event log is not displaying any related errors/warnings.
What I am looking for is ways to diagnose this error. At the moment I've been unable to reproduce the behavior that the client is experiencing but after going on site and seeing it for myself I can verify that it is definitely a valid issue.
Update 26/10/2010:
We installed a proxy on the client's NIC in order to capture the requests and responses. The problem is that when the proxy is running, the client appears not to have the problem any more. Does this behavior make sense to anyone?
Update 27/10/2010:
After investigating the traffic on the client's machine we noticed that the response headers included some entries related to a client-side proxy, and we confirmed that they are in fact running the Squid proxy in their office. To rule out that it had anything to do with the problem we got them to turn it off and then try the wizard again. This time no problems were encountered! So the proxy seems to be interfering with the requests, causing .NET to somehow treat the POST request as a non-postback. The following lines were found in the response headers of a failed request. Can anyone comment on how Squid could cause the behavior we are experiencing and what we can do about it?
Via:1.0 squid-12 (squid/3.1.0.13), 1.0 ClientSiteProxy:3128 (squid/2.7.STABLE4)
X-Cache:MISS from squid-12, MISS from ClientSiteProxy
X-Cache-Lookup:MISS from ClientSiteProxy:3128
If I had to troubleshoot this, I would first take a Fiddler trace (www.fiddlertool.com) on the client and see what the requests are up to. I am not sure whether Fiddler works on the Mac, but any HTTP Watch or Network Monitor type tool should be fine. The reason I am not doubting the code is that it works well in all the other browsers, so the code itself shouldn't be the problem.
Maybe there is something in the code (like adding cookies, etc.) that is upsetting this specific client's browser.
HTH,
Rahul
For the Mac there's HTTPScoop, which lets you debug HTTP POST data; it is similar to Fiddler.
The problem is not solved as such, but we ended up just adding an exception to the client's Squid proxy to bypass our website. The problem seems proxy/IIS/Safari related, but we haven't been able to track it down any further, and the client is happy with this solution as long as the problem doesn't resurface somewhere else. I'll re-post if more information surfaces.
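A possible extra mitigation (a sketch only, not something verified against this particular Squid setup) is to mark the wizard pages as uncacheable so intermediate proxies have less opportunity to serve or transform them, e.g. in the wizard page's Page_Load:
Response.Cache.SetCacheability(HttpCacheability.NoCache);   // sends Cache-Control: no-cache
Response.Cache.SetNoStore();                                // adds no-store
Response.AppendHeader("Pragma", "no-cache");                // for older HTTP/1.0 proxies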

POSTing to a re-written URL on IIS 6 doesn't work

I am working on a site which is programmed in C# .net. It uses a CMS called ADX Studio (a decision which predates my time there) which provides a shonky form of URL Rewriting (as far as I can tell it works by assigning an aspx page as the default 404 handler in IIS).
I have a web form which lives at a rewritten URL. I edited it so that the HTML form's action points back to the rewritten URL:
// Request.RawUrl here looks something like "/Default.aspx?404;http://www.site.com/admin/user/edit/",
// so take the part after the ';' and post back to its path (Last() needs System.Linq).
var u = new Uri(Request.RawUrl.Split(new char[1] { ';' }).Last());
userAdminForm.Action = u.PathAndQuery;
(kind of ugly, but it works given what Request.RawUrl looks like on these rewritten URLs).
The "pretty" URL is something like this:
http://www.site.com/admin/user/edit/
On my development box (Windows XP/ IIS 5) when I initially tried POSTing back to URLs like this I got a HTTP 405 error. I worked around this by adding a script mapping so Aspnet_isapi.dll handles all (*) requests. And everything works fine on my development machine.
I just pushed my changes to the live server (Windows Server 2003 R2 and IIS 6) and the POST fails silently. The page refreshes, but none of my logic (inside the IsPostBack path in the code) gets hit. No errors are displayed; it just doesn't work.
If I remove my code setting the .Action of the form then the postback works but it is posting to the ugly URL corresponding to the physical location of the aspx file rather than my page.
Am I missing a simple way to make this work? I don't want to switch the URL-rewriting method or anything like that; this is a large legacy site that is unfortunately pretty dependent on ADX Studio, so I don't want to do anything that will break it.
[edited because somehow the code above lost its code highlighting]
The issue is that the page's <form> tag is referencing the "ugly" URL as its action. You can resolve that by completely removing the action attribute from the form; browsers will, by default, post back to the same URL they are on, i.e. the "pretty" URL.
This article explains how to accomplish an "actionless" form (~ two thirds of the way down) http://msdn.microsoft.com/en-us/library/ms972974.aspx
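The core of that technique is a form control that simply skips rendering the action attribute; a trimmed sketch (not the article's exact code) looks roughly like this:
using System.Web.UI;
using System.Web.UI.HtmlControls;

// Renders a <form> tag without an action attribute, so the browser posts back
// to whatever URL is currently in the address bar (the "pretty" one).
public class ActionlessForm : HtmlForm
{
    protected override void RenderAttributes(HtmlTextWriter writer)
    {
        writer.WriteAttribute("name", Name);
        writer.WriteAttribute("method", Method);
        writer.WriteAttribute("id", ClientID);
        Attributes.Render(writer);   // any remaining attributes, but no action
    }
}
You then use this control in place of the standard runat="server" form on the page; the article covers the registration details.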
It seems like the problem is the same as it was on IIS 5. I can get it to work by doing the following in the IIS Manager:
Right click on the relevant website and select "Properties"
Choose the "Home Directory" tab
Click "Configuration" down in the "Application settings"
Click "Insert" next to the "Wildcard application maps"
Browse to the location of aspnet_isapi.dll (in my case: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll )
Untick "Check that file exists"
Click "OK" back through the Russian doll of dialogs.
This is basically the same as the approach that I linked to in the question for IIS 5. However, it's not optimal, because IIS now runs every request through ASP.NET (even static files), which can only slow things down. I'd like to be able to specify that ASP.NET only needs invoking for HTTP POST requests, at least.
The weird thing is that IIS 5 gave an HTTP 405 error when POSTing to an extension without a registered ISAPI extension, but IIS 6 just fails silently. And the page is being run through ASP.NET (I can debug with a breakpoint in the Page_Load function), but IsPostBack (and IsCrossPagePostBack) don't get set correctly. Could it be related to the view state? Is there any alternative to my solution described above?
I've come to what I think is an optimal solution for this problem. It turns out that ADXStudio CMS does use the default 404 rule to do some form of URL rewriting. This has a problem with http POST:
when IIS initially executes a custom URL on a 404 error, it changes POST to GET, even if the client does a POST request.
(thanks to elite brains' blog post about setting up IIS6 and ASP.NET MVC).
Rather than creating my own HttpModule, I decided instead to use Ionic's Isapi Rewrite Filter (IIRF) to rewrite my URLs. I then set the 404 error handler in IIS back to the default, and created this IIRF.ini file to rewrite all requests into the same format that the 404 handler produced:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /Default.aspx?404;http://%{HTTP_HOST}$1 [U,L]
And everything seems to work great. The advantage over my previous answer is that the rewrite code is low-level and fast, and the -f and -d conditions mean that if a file or directory actually exists the request isn't rewritten, so static files don't have the overhead of running through .NET.

C# WebClient - View source question

I'm using a C# WebClient to post login details to a page and read the all the results.
The page I am trying to load includes Flash (which, in the browser, ends up injecting HTML). I'm guessing it uses Flash to avoid being picked up by search engines?
The Flash I am interested in is just text (not an image/video), and when I "View Selection Source" in Firefox I do actually see the text, within HTML, that I want to see.
(Interestingly when I view the source for the whole page I do not see the text, within HTML, that I want to see. Could this be related?)
Currently, after I have posted my login details and loaded the HTML back, I see the page without the Flash HTML (as if I had viewed the source for the whole page).
Thanks in advance,
Jim
PS: I should point out that the POST is actually working; my login is successful.
Fiddler (or a similar tool) is invaluable for tracking down screen-scraping problems like this. Using a normal browser with Fiddler active, look at all the requests being made as you go through the login and navigation process to get to the data you want. Somewhere in there you will likely see one or more things that your code is doing differently, which the server responds to by showing you different HTML than it shows a real client.
The list of stuff below (think of it as "scraping 101") is what you want to look for. Most of the stuff below is probably stuff you're already doing, but I included everything for completeness.
In order to scrape effectively, you may need to deal with one or more of the following (a rough request-replay sketch follows this list):
Cookies and/or hidden fields. When you show up at any page on a site, you'll typically get a session cookie and/or a hidden form field which (in a normal browser) would be propagated back to the server on all subsequent requests. You will likely also get a persistent cookie. On many sites, if a request shows up without the proper cookie (or form field, for sites using "cookieless sessions"), the site will redirect the user to a "no cookies" UI, a login page, or another undesirable location (from the scraper app's perspective). Always make sure you capture the cookies set on the initial request and faithfully send them back to the server on subsequent requests, except where a subsequent request changes a cookie (in which case propagate the new cookie instead).
Authentication tokens. A special case of the above is forms-authentication cookies or hidden fields. Make sure you're capturing the login token (usually a cookie) and sending it back.
POST vs. GET. This is obvious, but make sure you're using the same HTTP method that a real browser does.
Form fields (especially hidden ones!). I'm sure you're doing this already, but make sure to send all the form fields that a real browser does, not just the visible ones, and make sure the fields are HTML-encoded properly.
HTTP headers. You already checked this, but it may make sense to check again just to make sure the (non-cookie) headers are identical. I always start with the exact same headers and then pull headers out one by one, keeping only the ones whose removal causes the request to fail or return bogus data. This approach simplifies your scraping code.
Redirects. These can come either from the server or from client script (e.g. "if the user doesn't have the Flash plug-in loaded, redirect to a non-Flash page"). See WebRequest: How to find a postal code using a WebRequest against this ContentType="application/xhtml+xml, text/xml, text/html; charset=utf-8"? for a crazy example of how redirection can trip up a screen-scraper. Note that if you're using .NET for scraping, you'll need to use HttpWebRequest (not WebClient) for redirect-dependent scraping, because by default WebClient doesn't provide a way for your code to attach cookies and headers to the second (post-redirect) request. See that thread for more details.
Sub-requests (frames, AJAX, Flash, etc.). Often, page elements (rather than the main HTTP request) end up fetching the data you want to scrape. You'll be able to figure this out by looking at which HTTP response contains the text you want, and then working backwards until you find what on the page is actually making the request for that content. A few sites do really crazy things in sub-requests, like requesting compressed or encrypted text via AJAX and then using client-side script to decrypt it. If that's the case, you'll need to do a bit more work, like reverse-engineering what the client script is doing.
Ordering. This one is obvious: make HTTP requests in the same order that a browser client does. That doesn't mean you need to make every request (e.g. images); typically you only need to make the requests which return a text/html content type, unless the data you want is not in the HTML and is instead in an AJAX/Flash/etc. request.
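Putting a few of those pieces together, here is a rough .NET sketch (hypothetical URLs and form fields, not the OP's site) of replaying a login POST and then fetching a page with the same cookies a browser would carry:
using System;
using System.IO;
using System.Net;
using System.Text;

var cookies = new CookieContainer();

// 1. POST the login form, sending every field (including hidden ones) a browser would send.
var login = (HttpWebRequest)WebRequest.Create("https://example.com/login.aspx");
login.Method = "POST";
login.ContentType = "application/x-www-form-urlencoded";
login.CookieContainer = cookies;            // captures the session/auth cookies
login.AllowAutoRedirect = true;             // follow the post-login redirect with the same cookies
login.Referer = "https://example.com/login.aspx";

byte[] body = Encoding.UTF8.GetBytes("username=me&password=secret&__VIEWSTATE=placeholder");
using (var stream = login.GetRequestStream())
    stream.Write(body, 0, body.Length);
using (login.GetResponse()) { }             // we only care about the cookies it sets

// 2. GET the page that contains the data, reusing the same CookieContainer.
var page = (HttpWebRequest)WebRequest.Create("https://example.com/members/data.aspx");
page.CookieContainer = cookies;
using (var response = (HttpWebResponse)page.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
    Console.WriteLine(reader.ReadToEnd());
If the text you want still isn't in that final response, it is almost certainly arriving via one of the sub-requests (AJAX/Flash) described above, and Fiddler will show you which one.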
(Interestingly when I view the source for the whole page I do not see the text, within HTML, that I want to see. Could this be related?)
This usually means that the discrepancy is caused by some DOM manipulation via JavaScript after the page has loaded. Try turning off JavaScript and see what the page looks like.
