Get Embedded PDF File - C#

I have looked through the forums and found many seemingly related questions, but nothing has helped so far. I want to be able to download select PDFs from various websites. Here is a snippet that I'm using successfully for most of the documents I'm interested in:
if (!String.IsNullOrEmpty(filePaths[1]))
{
    var myRequest = (HttpWebRequest)WebRequest.Create(filePaths[1]);
    myRequest.Method = "GET";
    WebResponse myResponse = myRequest.GetResponse();
    // Read the response as UTF-8 text and write it back out with a StreamWriter.
    var sr = new StreamReader(myResponse.GetResponseStream(), Encoding.UTF8);
    var fileBytes = sr.ReadToEnd();
    using (var sw = new StreamWriter("<localfilepath/name>"))
    {
        sw.Write(fileBytes);
    }
}
The problem comes when I try to get this document: http://www.azdor.gov/LinkClick.aspx?fileticket=r_I2VeNlcCQ%3d&tabid=265&mid=921
If I use the above code, I get a DotNetNuke error. I tried using a WebClient, as many other posts have suggested, but I get the same error.
When I use this code:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.UserAgent = @"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0";
request.ContentType = "application/x-unknown";
request.Method = "GET";
using (WebResponse response = request.GetResponse())
{
    using (Stream stream = response.GetResponseStream())
    {
        var sr2 = new StreamReader(stream, Encoding.UTF8); //.ASCII);
        var srt = sr2.ReadToEnd();
        var a = srt.Length;
        using (var sw = new StreamWriter("WebDataTestdocs/testpdf.pdf"))
        {
            sw.Write(srt);
        }
    }
}
I get a file back, but it says it is corrupted. Also, using UTF8 makes the file size bigger than the one I get when downloading from the site directly. If I switch to Encoding.ASCII, the file size is correct, but I still get the corrupted-file error. I can see the English text in the file when I open it with Notepad, so I'm not sure what exactly is corrupted.
Any help that can be offered would be greatly appreciated; I've been at this for quite a while!
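The symptoms above (corruption, and a file size that changes with the encoding) are typical of binary data being pushed through a text encoding: StreamReader/StreamWriter will mangle bytes that are not valid text. A minimal sketch of saving the response as raw bytes instead; the URL and output path are placeholders, and Stream.CopyTo assumes .NET 4 or later:
using System.IO;
using System.Net;

// Placeholder values - substitute the real document URL and a local path.
string url = "http://example.com/some-document.pdf";
string outputPath = "downloaded.pdf";

var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
// Some servers (such as the DotNetNuke LinkClick.aspx handler above) may also want a
// browser-like User-Agent; that part is an assumption, not something this sketch verifies.
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0";

using (WebResponse response = request.GetResponse())
using (Stream responseStream = response.GetResponseStream())
using (FileStream fileStream = File.Create(outputPath))
{
    // Copy the raw bytes straight to disk; no text encoding ever touches the PDF content.
    responseStream.CopyTo(fileStream);
}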

Related

Get Response from HttpWebRequest on Windows Phone 8

I'm trying to do a WebRequest to a site from a Windows Phone application, but it is very important for me to also get the response from the server.
Here is my code:
Uri requestUri = new Uri(string.Format("http://localhost:8099/hello/{0}", metodo));
HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(requestUri);
httpWebRequest.ContentType = "application/xml; charset=utf-8";
httpWebRequest.Method = "POST";
using (var stream = await Task.Factory.FromAsync<Stream>(httpWebRequest.BeginGetRequestStream,
                                                         httpWebRequest.EndGetRequestStream, null))
{
    string xml = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">Ahri</string>";
    byte[] xmlAsBytes = Encoding.UTF8.GetBytes(xml);
    await stream.WriteAsync(xmlAsBytes, 0, xmlAsBytes.Length);
}
Unfortunately, I have no idea how I could get the response from the server.
Does anyone have an idea?
Thanks in advance.
Thanks to @max I found the solution and wanted to share it here.
Here is what my code looks like:
string xml = "<string xmlns=\"http://schemas.microsoft.com/2003/10/Serialization/\">Claor</string>";
Uri requestUri = new Uri(string.Format("http://localhost:8099/hello/{0}", metodo));
string responseFromServer = "no response";

HttpWebRequest httpWebRequest = HttpWebRequest.Create(requestUri) as HttpWebRequest;
httpWebRequest.ContentType = "application/xml; charset=utf-8";
httpWebRequest.Method = "POST";

using (Stream requestStream = await httpWebRequest.GetRequestStreamAsync())
{
    byte[] xmlAsBytes = Encoding.UTF8.GetBytes(xml);
    await requestStream.WriteAsync(xmlAsBytes, 0, xmlAsBytes.Length);
}

WebResponse webResponse = await httpWebRequest.GetResponseAsync();
using (var reader = new StreamReader(webResponse.GetResponseStream()))
{
    responseFromServer = reader.ReadToEnd();
}
I hope it will help someone in the future.
This is a very common question for people new to Windows Phone app development. There are several sites that give tutorials on this, but I want to give a short answer here.
In Windows Phone 8 XAML/Runtime you can do it by using HttpWebRequest or a WebClient. Basically, WebClient is a wrapper around HttpWebRequest.
If you have a small request to make, then use HttpWebRequest. It goes like this:
HttpWebRequest request = HttpWebRequest.Create(requestURI) as HttpWebRequest;
WebResponse response = await request.GetResponseAsync();
using (var reader = new StreamReader(response.GetResponseStream()))
{
    string responseContent = reader.ReadToEnd();
    // Do anything with your content: convert it to XML, JSON or anything else.
}
Although this is a GET request and I see that you want to do a POST request, you only have to modify a few steps to achieve that.
Visit this place for a POST request.
If you want Windows Phone tutorials, you can go here. He writes awesome tutorials.
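The answer above also mentions WebClient as the simpler wrapper but does not show it. For completeness, a minimal sketch of that event-based alternative, assuming the Windows Phone (Silverlight-style) WebClient and a placeholder URI:
var client = new WebClient();

// The callback fires when the download completes; e.Result holds the response body as a string.
client.DownloadStringCompleted += (sender, e) =>
{
    if (e.Error == null)
    {
        string responseContent = e.Result;
        // Do anything with the content here.
    }
};

// Placeholder URI - substitute the real service address.
client.DownloadStringAsync(new Uri("http://localhost:8099/hello/world"));
For a POST, WebClient also offers UploadStringAsync, though the HttpWebRequest code above is what the accepted solution uses.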

Unable to Download HTML of a Specific Website

I am doing web parsing using a C# console application.
My code is:
var req = WebRequest.Create("http://watch.squidtv.net/");
req.BeginGetResponse(r =>
{
    var response = req.EndGetResponse(r);
    var stream = response.GetResponseStream();
    var reader = new StreamReader(stream, true);
    var str = reader.ReadToEnd();
    Console.WriteLine(str);
}, null);
This code runs fine with other URLs, but when I changed the URL to http://watch.squidtv.net/, two problems occurred:
First, it does not download the HTML.
Second, the CPU starts making noise.
Then I changed the code and used a WebClient like this:
var client = new WebClient();
string htmlCode = client.DownloadString("http://watch.squidtv.net");
Console.WriteLine(htmlCode);
But the problem is the same. :(
What can the problem be?
I found the solution.
The problem was in the HTTP headers: the response is gzip-encoded, and the HttpWebRequest was not handling the gzip encoding, which caused the problem. When I used this code, the problem was solved:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("http://watch.squidtv.net/");
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
req.Method = "GET";
req.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))";

string htmlCode;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    htmlCode = reader.ReadToEnd();
}
One idea: possibly you'll have to specify more in your WebRequest so that the SquidTV server knows to send you back the HTML.
Consider that in a browser there are lots of headers that get sent to the server. If you want to take a look, use Fiddler or Wireshark to see all the extra data that gets sent.
Firewalls could be another issue: you may be sending out a request that is not allowed, and thus nothing is coming back. This is where intermediate tools like Wireshark or Fiddler may be useful in seeing whether the request is getting out at all.
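As an illustration of the first suggestion, a minimal sketch of the kind of browser-like headers that can be added; the values are examples, not headers the SquidTV server is known to require:
var req = (HttpWebRequest)WebRequest.Create("http://watch.squidtv.net/");
req.Method = "GET";

// Example browser-like request headers; real browsers send these and more.
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0";
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
req.Headers.Add("Accept-Language", "en-US,en;q=0.5");

// Let the framework advertise gzip/deflate and decompress the response transparently.
req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
Fiddler or Wireshark can then confirm whether these headers actually go out on the wire.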

Omit images from webpage requested through HttpWebRequest

I fetch webpages in order to feed data to my application. However, the pages contain a lot of images which I don't require at all. I only need the text data.
My problem is that the web requests take an unacceptable amount of time. I think the images are also fetched during a web request. Is there any way to eliminate the images and download only the text data?
The following is the code that I am using currently.
var httpWebRequest = HttpWebRequest.Create(url) as HttpWebRequest;
httpWebRequest.Method = "GET";
httpWebRequest.ProtocolVersion = HttpVersion.Version11;
httpWebRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
httpWebRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
httpWebRequest.Proxy = null;
httpWebRequest.KeepAlive = true;
httpWebRequest.Accept = "text/html";

string responseString = null;
var httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse;
using (var responseStream = httpWebResponse.GetResponseStream())
{
    using (var streamReader = new StreamReader(responseStream))
    {
        responseString = streamReader.ReadToEnd();
    }
}
Also, any other optimization suggestions are most welcome.
That assumption is incorrect.
HttpWebRequest does not know anything about HTML or images; it just sends a raw HTTP request for the single URL you give it.
You can use Fiddler to see exactly what's going on.

C# Not getting proper response from HttpWebResponse. Encoding?

I'm trying to fetch some webpages using the code below:
public static string FetchPage(string url)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = "GET";
    req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.0; sv-SE; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 (.NET CLR 3.5.30729";
    req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    req.Headers.Add("Accept-Language", "sv-se,sv;q=0.8,en-us;q=0.5,en;q=0.3");
    req.Headers.Add("Accept-Encoding", "gzip,deflate");
    req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
    req.Headers.Add("Keep-Alive", "115");
    req.Headers.Add("Cache-Control: max-age=0");
    req.AllowAutoRedirect = true;
    req.IfModifiedSince = DateTime.Now;

    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        using (Stream resStream = resp.GetResponseStream())
        {
            StreamReader reader = new StreamReader(resStream);
            return reader.ReadToEnd();
        }
    }
}
Some pages work (W3C, example.com) while most others I've tried do not (BBC.co.uk, CNN.com, etc.). Wireshark shows that I'm getting a proper response.
I've tried setting the encoding of the reader to the expected encoding of the response (UTF-8 for CNN), as well as every other combination, but I've had no luck.
What am I missing here?
The first bytes of my response are always "1f ef bf bd", if you're able to tell something based on that.
I suspect the most likely explanation is that you are getting compressed data and not uncompressing it. Try using a stream filter to deflate/unzip it. See Rick Strahl's blog article for more info.
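A sketch of that stream-filter suggestion, assuming the response really is gzip- or deflate-compressed because of the Accept-Encoding header sent above; GZipStream and DeflateStream come from System.IO.Compression, and setting req.AutomaticDecompression would achieve the same result with less code:
using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (Stream resStream = resp.GetResponseStream())
{
    // Wrap the raw stream in the matching decompression filter before decoding any text.
    Stream decoded = resStream;
    if (resp.ContentEncoding.ToLower().Contains("gzip"))
        decoded = new GZipStream(resStream, CompressionMode.Decompress);
    else if (resp.ContentEncoding.ToLower().Contains("deflate"))
        decoded = new DeflateStream(resStream, CompressionMode.Decompress);

    using (StreamReader reader = new StreamReader(decoded))
    {
        return reader.ReadToEnd();
    }
}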
Loading http://bbc.co.uk worked for me when leaving out the "Accept-Encoding" header:
req.Headers.Add("Accept-Encoding", "gzip,deflate");

HttpWebRequest has empty response requesting a search from Bing

I have the following code that sends an HttpWebRequest to Bing. When I request the URL below, though, it returns what appears to be an empty response when it should be returning a list of results.
var response = string.Empty;
var httpWebRequest = WebRequest.Create("http://www.bing.com/search?q=stackoverflow&count=100") as HttpWebRequest;
httpWebRequest.Method = WebRequestMethods.Http.Get;
httpWebRequest.Headers.Add("Accept-Language", "en-US");
httpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Win32)";
httpWebRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");

using (var httpWebResponse = httpWebRequest.GetResponse() as HttpWebResponse)
{
    Stream stream = null;
    using (stream = httpWebResponse.GetResponseStream())
    {
        if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
            stream = new GZipStream(stream, CompressionMode.Decompress);
        else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
            stream = new DeflateStream(stream, CompressionMode.Decompress);

        var streamReader = new StreamReader(stream, Encoding.UTF8);
        response = streamReader.ReadToEnd();
    }
}
It's pretty standard code for requesting and receiving a web page. Any ideas why the response is empty? Thanks in advance.
EDIT: I had left a query string parameter off the URL; I also had &count=100, which I have now corrected. The request seems to work for count values of 50 and below but returns nothing when larger, even though the same URL works fine in the browser.
It makes me think the issue is that the response is large and HttpWebResponse is not handling that the way I have it set up. Just a guess, though.
This works just fine on my machine. Perhaps you are IP banned from Bing?
Your code works fine on my machine.
I suggest you get yourself a copy of Fiddler and examine the actual HTTP session occurring. It may be a proxy or firewall thing.
