I am creating an application which will check for broken links in content.
All working apart from you tube links where I get a mixed response, broken links (or codes I have just made up) sometime come up with 200 ok and sometimes they come up as broken.
Is there a different way of checking broken links in youtube?
Im using standard .net/c# code
try
{
HttpWebRequest request = WebRequest.Create(match.Groups[1].ToString()) as HttpWebRequest;
//Setting the Request method HEAD, you can also use GET too.
request.Method = "HEAD";
//Getting the Web Response.
HttpWebResponse response = request.GetResponse() as HttpWebResponse;
//Returns TRUE if the Status code == 200
// result = "true";
result = response.StatusDescription;
response.Close();
// return (response.StatusCode == HttpStatusCode.OK);
}
catch
{
//Any exception will returns false.
result = "false";
}
if(match.Groups[1].ToString().Contains(#"\n"))
{
//....
}
sometime come up with 200 ok and sometimes they come up as broken.
What you are doing is similar to web crawler, youtube will definitely have an anti-crawler mechanism, so if you have been accessing the link uninterrupted, your access may be blocked. In order to reduce the probability of this situation, you can reduce the frequency of visits and simulating the request of real users to visit the site as much as possible
TL;DR version
When a transfer error occurs while writing to the request stream, I can't access the response, even though the server sends it.
Full version
I have a .NET application that uploads files to a Tomcat server, using HttpWebRequest. In some cases, the server closes the request stream prematurely (because it refuses the file for one reason or another, e.g. an invalid filename), and sends a 400 response with a custom header to indicate the cause of the error.
The problem is that if the uploaded file is large, the request stream is closed before I finish writing the request body, and I get an IOException:
Message: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
InnerException: SocketException: An existing connection was forcibly closed by the remote host
I can catch this exception, but then, when I call GetResponse, I get a WebException with the previous IOException as its inner exception, and a null Response property. So I can never get the response, even though the server sends it (checked with WireShark).
Since I can't get the response, I don't know what the actual problem is. From my application point of view, it looks like the connection was interrupted, so I treat it as a network-related error and retry the upload... which, of course, fails again.
How can I work around this issue and retrieve the actual response from the server? Is it even possible? To me, the current behavior looks like a bug in HttpWebRequest, or at least a severe design issue...
Here's the code I used to reproduce the problem:
var request = HttpWebRequest.CreateHttp(uri);
request.Method = "POST";
string filename = "foo\u00A0bar.dat"; // Invalid characters in filename, the server will refuse it
request.Headers["Content-Disposition"] = string.Format("attachment; filename*=utf-8''{0}", Uri.EscapeDataString(filename));
request.AllowWriteStreamBuffering = false;
request.ContentType = "application/octet-stream";
request.ContentLength = 100 * 1024 * 1024;
// Upload the "file" (just random data in this case)
try
{
using (var stream = request.GetRequestStream())
{
byte[] buffer = new byte[1024 * 1024];
new Random().NextBytes(buffer);
for (int i = 0; i < 100; i++)
{
stream.Write(buffer, 0, buffer.Length);
}
}
}
catch(Exception ex)
{
// here I get an IOException; InnerException is a SocketException
Console.WriteLine("Error writing to stream: {0}", ex);
}
// Now try to read the response
try
{
using (var response = (HttpWebResponse)request.GetResponse())
{
Console.WriteLine("{0} - {1}", (int)response.StatusCode, response.StatusDescription);
}
}
catch(Exception ex)
{
// here I get a WebException; InnerException is the IOException from the previous catch
Console.WriteLine("Error getting the response: {0}", ex);
var webEx = ex as WebException;
if (webEx != null)
{
Console.WriteLine(webEx.Status); // SendFailure
var response = (HttpWebResponse)webEx.Response;
if (response != null)
{
Console.WriteLine("{0} - {1}", (int)response.StatusCode, response.StatusDescription);
}
else
{
Console.WriteLine("No response");
}
}
}
Additional notes:
If I correctly understand the role of the 100 Continue status, the server shouldn't send it to me if it's going to refuse the file. However, it seems that this status is controlled directly by Tomcat, and can't be controlled by the application. Ideally, I'd like the server not to send me 100 Continue in this case, but according to my colleagues in charge of the back-end, there is no easy way to do it. So I'm looking for a client-side solution for now; but if you happen to know how to solve the problem on the server side, it would also be appreciated.
The app in which I encounter the issue targets .NET 4.0, but I also reproduced it with 4.5.
I'm not timing out. The exception is thrown long before the timeout.
I tried an async request. It doesn't change anything.
I tried setting the request protocol version to HTTP 1.0, with the same result.
Someone else has already filed a bug on Connect for this issue: https://connect.microsoft.com/VisualStudio/feedback/details/779622/unable-to-get-servers-error-response-when-uploading-file-with-httpwebrequest
I am out of ideas as to what can be a client side solution to your problem. But I still think the server side solution of using a custom tomcat valve can help here. I currently doesn`t have a tomcat setup where I can test this but I think a server side solution here would be along the following lines :
RFC section 8.2.3 clearly states :
Requirements for HTTP/1.1 origin servers:
- Upon receiving a request which includes an Expect request-header
field with the "100-continue" expectation, an origin server MUST
either respond with 100 (Continue) status and continue to read
from the input stream, or respond with a final status code. The
origin server MUST NOT wait for the request body before sending
the 100 (Continue) response. If it responds with a final status
code, it MAY close the transport connection or it MAY continue
to read and discard the rest of the request. It MUST NOT
perform the requested method if it returns a final status code.
So assuming tomcat confirms to the RFC, while in the custom valve you would have recieved the HTTP request header, but the request body would not be sent since the control is not yet in the servlet that reads the body.
So you can probably implement a custom valve, something similar to :
import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;
import org.apache.catalina.valves.ErrorReportValve;
public class CustomUploadHandlerValve extends ValveBase {
#Override
public void invoke(Request request, Response response) throws IOException, ServletException {
HttpServletRequest httpRequest = (HttpServletRequest) request;
String fileName = httpRequest.getHeader("Filename"); // get the filename or whatever other parameters required as per your code
bool validationSuccess = Validate(); // perform filename check or anyother validation here
if(!validationSuccess)
{
response = CreateResponse(); //create your custom 400 response here
request.SetResponse(response);
// return the response here
}
else
{
getNext().invoke(request, response); // to pass to the next valve/ servlet in the chain
}
}
...
}
DISCLAIMER : Again I haven`t tried this to success, need sometime and a tomcat setup to try it out ;).
Thought it might be a starting point for you.
I had the same problem. The server sends a response before the client end of the transmission of the request body, when I try to do async request. After a series of experiments, I found a workaround.
After the request stream has been received, I use reflection to check the private field _CoreResponse of the HttpWebRequest. If it is an object of class CoreResponseData, I take his private fields (using reflection): m_StatusCode, m_StatusDescription, m_ResponseHeaders, m_ContentLength. They contain information about the server's response!
In most cases, this hack works!
What are you getting in the status code and response of the second exception not the internal exception?
If a WebException is thrown, use the Response and Status properties of the exception to determine the response from the server.
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.getresponse(v=vs.110).aspx
You are not saying what exactly version of Tomcat 7 you are using...
checked with WireShark
What do you actually see with WireShark?
Do you see the status line of response?
Do you see the complete status line, up to CR-LF characters at its end?
Is Tomcat asking for authentication credentials (401), or it is refusing file upload for some other reason (first acknowledging it with 100 but then aborting it mid-flight)?
The problem is that if the uploaded file is large, the request stream
is closed before I finish writing the request body, and I get an IOException:
If you do not want the connection to be closed but all the data transferred over the wire and swallowed at the server side, on Tomcat 7.0.55 and later it is possible to configure maxSwallowSize attribute on HTTP connector, e.g. maxSwallowSize="-1".
http://tomcat.apache.org/tomcat-7.0-doc/config/http.html
If you want to discuss Tomcat side of connection handling, you would better ask on the Tomcat users' mailing list,
http://tomcat.apache.org/lists.html#tomcat-users
At .Net side:
Is it possible to perform stream.Write() and request.GetResponse() simultaneously, from different threads?
Is it possible to performs some checks at the client side before actually uploading the file?
hmmm... i don't get it - that is EXACTLY why in many real-life scenarios large files are uploaded in chunks (and not as a single large file)
by the way: many internet servers have size limitations. for instance in tomcat that is representad by maxPostSize (as seen in this link: http://tomcat.apache.org/tomcat-5.5-doc/config/http.html)
so tweaking the server configurations seems like the easy way, but i do think that the right way is to split the file to several requests
EDIT: replace Uri.EscapeDataString with HttpServerUtility.UrlEncode
Uri.EscapeDataString(filename) // a problematic .net implementation
HttpServerUtility.UrlEncode(filename) // the proper way to do it
I am experience a pretty similar problem currently also with Tomcat and a Java client. The Tomcat REST service sends a HTTP returncode with response body before reading the whole request body. The client however fails with IOException. I inserted a HTTP Proxy on the client to sniff the protocol and actually the HTTP response is sent to the client eventually. Most likly the Tomcat closed the request input stream before sending the response.
One solution is to use a different HTTP server like Jetty which does not have this problem. The other solution is a add a Apache HTTP server with AJP in front of Tomcat. Apache HTTP server has a different handling of streams and with that the problem goes away.
My app communicates with an internal web API that requires authentication.
When I send the request I get the 401 challenge as expected, the handshake occurs, the authenticated request is re-sent and everything continues fine.
However, I know that the auth is required. Why do I have to wait for the challenge? Can I force the request to send the credentials in the first request?
My request generation is like this:
private static HttpWebRequest BuildRequest(string url, string methodType)
{
var request = HttpWebRequest.CreateHttp(url);
request.PreAuthenticate = true;
request.AuthenticationLevel = AuthenticationLevel.MutualAuthRequested;
request.Credentials = CredentialCache.DefaultNetworkCredentials;
request.Proxy.Credentials = CredentialCache.DefaultNetworkCredentials;
request.ContentType = CONTENT_TYPE;
request.Method = methodType;
request.UserAgent = BuildUserAgent();
return request;
}
Even with this code, the auth header isn't included in the initial request.
I know how to include the auth info with basic.... what I want to do is to use Windows Auth of the user executing the app (so I can't store the password in a config file).
UPDATE I also tried using a HttpClient and its own .Credentials property with the same result: no auth header is added to the initial request.
The only way I could get the auth header in was to hack it in manually using basic authentication (which won't fly for this use-case)
Ntlm is a challenge/response based authentication protocol. You need to make the first request so that the server can issue the challenge then in the subsequent request the client sends the response to the challenge. The server then verifies this response with the domain controller by giving it the challenge and the response that the client sent.
Without knowing the challenge you can't send the response which is why 2 requests are needed.
Basic authentication is password based so you can short circuit this by sending the credentials with the first request but in my experience this can be a problem for some servers to handle.
More details available here:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa378749(v=vs.85).aspx
I'm not 100% sure, but I suspect that there is no way around this; it's simply the way HttpWebRequest works.
In the online .NET source, function DoSubmitRequestProcessing which is here, you can see this comment just after the start of the function, line 1731:
// We have a response of some sort, see if we need to resubmit
// it do to authentication, redirection or something
// else, then handle clearing out state and draining out old response.
A little further down (line 1795) (some lines removed for brevity)
if (resubmit)
{
if (CacheProtocol != null && _HttpResponse != null) CacheProtocol.Reset();
ClearRequestForResubmit(ntlmFollowupRequest);
...
}
And in ClearRequestForResubmit line 5891:
// We're uploading and need to resubmit for Authentication or Redirect.
and then (Line 5923):
// The second NTLM request is required to use the same connection, don't close it
if (ntlmFollowupRequest) {....}
To my (admittedly n00bish) eyes these comments seem to indicate that the developers decided to follow the "standard" challenge-response protocol for NTML/Kerberos and not include any way of sending authentication headers up-front.
Setting PreAuthenticate is what you want, which you are doing. The very first request will still do the handshake but for subsequent requests it will automatically send the credentials (based on the URL being used). You can read up on it here: http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.preauthenticate(v=vs.110).aspx.
I want to check if an given URL a is link to http://youtube.com.
I know there are lots of various shortened version's of the links (e.g. http://youtu.be), so what I am after is a way to resolve the URL and see if it ends up as http://youtube.com.
A couple of example inputs are:
http://www.youtube.com/v/[videoid]
http://www.youtu.be/watch?v=[videoid]
Does anyone know of a way to do this?
You could perform a HEAD request:
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("http://www.youtu.be/Ddn4MGaS3N4");
request.Method = "HEAD";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) {
Console.WriteLine("Does this resolve to youtube?: {0}", response.ResponseUri.ToString().Contains("youtube.com") ? "Yes" : "No");
}
Appears to work fine. Unsure of edge cases but seems to do the job.
(Note: No error checking here such as 404 errors, etc).
bool isYoutube = false;
string host = new Uri(url).Host;
if (host == "youtube.com" || host == "youtu.be")
{
isYoutube = true;
}
First you may have to check what the hostname is for youtube (I'm just assuming it is http://youtube.com) but after you have that the following code will do what you want;
using System.Net;
IPHostEntry host = Dns.Resolve(theInputHostName);
if (host.HostName == "http://youtube.com")
// it resolves to youtube, do something.
If you want to know whether a given URL redirects (using status codes 301/302) to an YouTube URL, you may either use WebClient/HttWebRequest/whatever directly and check the response, or disable HttpWebRequest.AllowAutoRedirect and traverse all redirects manually (checking the status code and then the Location HTTP header).
I need to check that our visitors are using HTTPS. In BasePage I check if the request is coming via HTTPS. If it's not, I redirect back with HTTPS. However, when someone comes to the site and this function is used, I get the error:
System.Web.HttpException: Server
cannot append header after HTTP
headers have been sent. at
System.Web.HttpResponse.AppendHeader(String
name, String value) at
System.Web.HttpResponse.AddHeader(String
name, String value) at
Premier.Payment.Website.Generic.BasePage..ctor()
Here is the code I started with:
// If page not currently SSL
if (HttpContext.Current.Request.ServerVariables["HTTPS"].Equals("off"))
{
// If SSL is required
if (GetConfigSetting("SSLRequired").ToUpper().Equals("TRUE"))
{
string redi = "https://" +
HttpContext.Current.Request.ServerVariables["SERVER_NAME"].ToString() +
HttpContext.Current.Request.ServerVariables["SCRIPT_NAME"].ToString() +
"?" + HttpContext.Current.Request.ServerVariables["QUERY_STRING"].ToString();
HttpContext.Current.Response.Redirect(redi.ToString());
}
}
I also tried adding this above it (a bit I used in another site for a similar problem):
// Wait until page is copletely loaded before sending anything since we re-build
HttpContext.Current.Response.BufferOutput = true;
I am using c# in .NET 3.5 on IIS 6.
Chad,
Did you try ending the output when you redirect? There is a second parameter that you'd set to true to tell the output to stop when the redirect header is issued. Or, if you are buffering the output then maybe you need to clear the buffer before doing the redirect so the headers are not sent out along with the redirect header.
Brian
This error usually means that something has bee written to the response stream before a redirection is initiated. So you should make sure that the test for https is done fairly high up in the page load function.