C# Code to download an image from a "not so easy" CDN

I'm trying to download an image from a specific website.
My code has been running in production for months, but it is not able to download images from this specific website.
The image URL I need to download is, for instance:
http://static7.kabum.com.br/produtos/fotos/64297/64297_index_g.jpg
The code I have tried so far:
Method 1 -> (failed)
string url = "http://static7.kabum.com.br/produtos/fotos/64297/64297_index_g.jpg";
var request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = (timeout == 0 ? 30 : timeout) * 1000;
request.KeepAlive = false;
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36";
var response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
    const int BUFFER_SIZE = 16 * 1024;
    var buffer = new byte[BUFFER_SIZE];
    // if the remote file was found, download it
    using (Stream inputStream = response.GetResponseStream())
    using (Stream outputStream = File.Create(fileName, BUFFER_SIZE))
    {
        int bytesRead;
        do
        {
            bytesRead = inputStream.Read(buffer, 0, buffer.Length);
            outputStream.Write(buffer, 0, bytesRead);
        } while (bytesRead != 0);
    }
}
Method 2 -> (also failed)
[..]
using (Image webImage = Image.FromStream(response.GetResponseStream()))
{
    webImage.Save(fileName);
}
[..]
Both methods fail with the following exception:
"Parameter not valid" exception loading System.Drawing.Image
StackTrace = " at System.Drawing.Image.FromStream(Stream stream,
Boolean useEmbeddedColorManagement, Boolean validateImageData) at
System.Drawing.Image.FromStream(Stream stream) at
MonitorLib.Helper.RequestPageHelper.RequestDowloadPage(Boolean proxy,
Strin...
I guess the image data is incomplete or compressed, but the URL works fine in any browser.
Any thoughts?
Thanks a lot, friends.

You could use the WebClient.DownloadFile() method.
var fileName = @"C:\path\to\file.jpg";
var url = "http://static7.kabum.com.br/produtos/fotos/64297/64297_index_g.jpg";
using (var client = new WebClient())
{
    client.DownloadFile(url, fileName);
}

This seems to be a problem with the server responding with a bad header, which browsers are able to ignore and get past. You need to tell your application to do the same, and there are several options for doing that. The question "The server committed a protocol violation. Section=ResponseHeader Detail=CR must be followed by LF, In WinForms?" should point you in the right direction.

Related

Not able to access a page from US as a country

I want to access this URL as a visitor from the US:
http://www.tillys.com/product/Say-What/Short-Dresses/SAY-WHAT--Ribbed-Tank-Midi-Dress/Heather-Grey/285111595
I've tried setting cookies and so on, but the URL still redirects to the site's home page.
I want to know whether there is any way I can access this page.
Below is the function I am using:
public static string getUrlContent (string url)
{
    var myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myHttpWebRequest.Method = "GET";
    myHttpWebRequest.AllowAutoRedirect = true;
    myHttpWebRequest.ContentLength = 0;
    myHttpWebRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    myHttpWebRequest.Headers.Add("Cookie", "=en%5FUS;");
    myHttpWebRequest.UserAgent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36";
    //myHttpWebRequest.Headers.Add("Accept-Encoding", "gzip, deflate, sdch");
    myHttpWebRequest.Headers.Add("Accept-Language", "en-US,en;q=0.8");
    myHttpWebRequest.Headers.Add("Cookie", "wlcme=true");
    //myHttpWebRequest.CookieContainer = new CookieContainer();
    //myHttpWebRequest.Headers.Add("X-Macys-ClientId", "NavApp");
    var response = (HttpWebResponse)myHttpWebRequest.GetResponse();
    var rmyResponseHeaders = response.Headers;
    Console.WriteLine ("Content length is {0}", response.ContentLength);
    Console.WriteLine ("Content type is {0}", response.ContentType);
    // Get the stream associated with the response.
    Stream receiveStream = response.GetResponseStream ();
    // Pipes the stream to a higher level stream reader with the required encoding format.
    StreamReader readStream = new StreamReader (receiveStream, Encoding.UTF8);
    //Console.WriteLine ("Response stream received.");
    // Read the body once; a second ReadToEnd () call would return an empty string.
    var josnStr = readStream.ReadToEnd ();
    Console.WriteLine (josnStr);
    return josnStr;
    //Encoding enc1 = Encoding.GetEncoding(1252);
}

If the site www.tillys.com is using geo-fencing, it will show you different content based on a lookup of the requesting IP address. In that case there is nothing C# or any other language can do about it from your current location.
You'll need to either proxy your request through a VPN (see How to send WebRequest via proxy?) or deploy your code to a data center in the US. For example, if you use Azure you can deploy to several different data centers throughout the world, including several in the US. Once your code is running in the US it should be able to access the US version of the page.
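As a rough sketch of the proxy route, a request can be pointed at a proxy by assigning a WebProxy before the response is requested. The class name and the proxy address used here are placeholders (assumptions for illustration), not a real service; you would substitute a US-based proxy that you actually have access to.

```csharp
using System;
using System.Net;

public static class ProxiedRequestSketch
{
    // Builds (but does not send) a GET request routed through the given proxy.
    // proxyAddress is a hypothetical value such as "http://us-proxy.example.com:8080".
    public static HttpWebRequest Create(string url, string proxyAddress)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Proxy = new WebProxy(proxyAddress);
        return request;
    }
}
```

Calling GetResponse() on the returned request then sends it through the proxy, so the site sees the proxy's IP address rather than yours.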

The underlying connection was closed exception while WebClient DownloadString

Just a piece of code:
WebClient wc = new WebClient();
String str = wc.DownloadString(new Uri("http://content.warframe.com/dynamic/rss.php"));
And I get this exception:
An unhandled exception of type 'System.Net.WebException' occurred in
System.dll
Additional information: The underlying connection was closed: The
connection was closed unexpectedly.
I've heard that this is a bug in .NET (I am using 3.5), but I tried other methods to obtain the content of this link (it's RSS, XML). No luck yet.
var webrequest = (WebRequest)HttpWebRequest.Create(@"http://content.warframe.com/dynamic/rss.php");
var resp = webrequest.GetResponse();
//HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse(); // Won't work either
The code above won't work either; both throw the same exception.
Fiddler logs:
SESSION STATE: Aborted.
Response Entity Size: 512 bytes.
== FLAGS ==================
BitFlags: [ResponseGeneratedByFiddler] 0x100
X-ABORTED-WHEN: Done
X-CLIENTIP: 127.0.0.1
X-CLIENTPORT: 2747
X-EGRESSPORT: 2748
X-FAILSESSION-WHEN: ReadingResponse
X-HOSTIP: 205.185.216.10
X-PROCESSINFO: willitwork.vshost:3300
== TIMING INFO ============
ClientConnected: 10:29:11.706
ClientBeginRequest: 10:29:11.713
GotRequestHeaders: 10:29:11.713
ClientDoneRequest: 10:29:11.713
Determine Gateway: 0ms
DNS Lookup: 164ms
TCP/IP Connect: 74ms
HTTPS Handshake: 0ms
ServerConnected: 10:29:11.953
FiddlerBeginRequest: 10:29:11.953
ServerGotRequest: 10:29:11.953
ServerBeginResponse: 10:29:12.372
GotResponseHeaders: 00:00:00.000
ServerDoneResponse: 10:29:12.372
ClientBeginResponse: 10:29:12.385
ClientDoneResponse: 10:29:12.385
Overall Elapsed: 0:00:00.672
The response was buffered before delivery to the client.
== WININET CACHE INFO ============
This URL is not present in the WinINET cache. [Code: 2]
* Note: Data above shows WinINET's current cache state, not the state at the time of the request.
* Note: Data above shows WinINET's Medium Integrity (non-Protected Mode) cache only.
Also, the 504 does not make sense, since I can get data from the link via Chrome / Firefox / IE...
I got it to work in another language, but I am forced to do it in C# (I've written too much code to rewrite it).
I've added some settings, as Fiddler suggested:
myHttpWebRequest1.ProtocolVersion = HttpVersion.Version11;
myHttpWebRequest1.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";
myHttpWebRequest1.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
At least now I get a 504 error instead of "unknown", but I can still view the content in a web browser, so the 504 error is fake.
Edit: There is no response error when I add
myHttpWebRequest1.Headers["Accept-Encoding"] = "gzip";
but now the output is garbled and unreadable.

I had the same error.
You can add a User-Agent to your HTTP request:
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";

OK, I got this all fixed and working!
static void Main(string[] args)
{
    Uri url = new Uri(@"http://content.warframe.com/dynamic/rss.php");
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    // MAGIC LINE GOES HERE \/
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    // Assign the response object of HttpWebRequest to a HttpWebResponse variable.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (Stream streamResponse = response.GetResponseStream())
        {
            using (StreamReader streamRead = new StreamReader(streamResponse))
            {
                Char[] readBuff = new Char[2000];
                int count = streamRead.Read(readBuff, 0, 2000);
                while (count > 0)
                {
                    String outputData = new String(readBuff, 0, count);
                    Console.Write(outputData);
                    count = streamRead.Read(readBuff, 0, 2000);
                }
            }
        }
    }
    Console.ReadKey();
}
Besides not using the WebClient.DownloadString method, I had to add the decompression line:
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
Thanks for the tips (especially the Fiddler one; the Decode button saved me time in finding out what was wrong).
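If you would rather keep using WebClient, one possible approach is to subclass it and switch on the same decompression setting for the request it creates. This is only a sketch: the class and helper names are made up for illustration, and it has not been tried against the feed in the question.

```csharp
using System;
using System.Net;

// A WebClient variant that asks the underlying HttpWebRequest to
// transparently decompress gzip/deflate response bodies.
public class DecompressingWebClient : WebClient
{
    // Hypothetical helper: enables decompression if the request is an HTTP request.
    public static void EnableDecompression(WebRequest request)
    {
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
    }

    // WebClient calls this for every request it issues.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        EnableDecompression(request);
        return request;
    }
}
```

With this in place, new DecompressingWebClient().DownloadString(...) goes through the overridden GetWebRequest, so the body should arrive already decompressed.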

Check this answer:
The underlying connection was closed: An unexpected error occurred on a receive
So this may work for you:
var webRequest = (HttpWebRequest)HttpWebRequest.Create(@"http://content.warframe.com/dynamic/rss.php");
webRequest.KeepAlive = false;
var resp = webRequest.GetResponse();
EDIT:
You are right; check this instead:
http://msdn.microsoft.com/cs-cz/library/system.net.httpwebrequest.keepalive%28v=vs.110%29.aspx
Here is working code that prints out the received response content:
static void Main(string[] args)
{
    // Create a new HttpWebRequest object. Make sure that
    // a default proxy is set if you are behind a firewall.
    HttpWebRequest myHttpWebRequest1 =
        (HttpWebRequest)WebRequest.Create(@"http://content.warframe.com/dynamic/rss.php");
    myHttpWebRequest1.KeepAlive = false;
    // Assign the response object of HttpWebRequest to a HttpWebResponse variable.
    HttpWebResponse myHttpWebResponse1 =
        (HttpWebResponse)myHttpWebRequest1.GetResponse();
    Console.WriteLine("\nThe HTTP request Headers for the first request are: \n{0}", myHttpWebRequest1.Headers);
    Stream streamResponse = myHttpWebResponse1.GetResponseStream();
    StreamReader streamRead = new StreamReader(streamResponse);
    Char[] readBuff = new Char[256];
    int count = streamRead.Read(readBuff, 0, 256);
    Console.WriteLine("The contents of the Html page are.......\n");
    while (count > 0)
    {
        String outputData = new String(readBuff, 0, count);
        Console.Write(outputData);
        count = streamRead.Read(readBuff, 0, 256);
    }
    Console.WriteLine();
    // Close the Stream object.
    streamResponse.Close();
    streamRead.Close();
    Console.ReadKey();
}

C# using HttpWebRequest Post method doesn't work

Hey, I'm trying to figure out how to use HttpWebRequest to do a POST request to a login page, say Yahoo Mail, and examine the returned page source.
But with my POST method I still get the login page back.
Here is my method:
public static string GetResponse(string sURL, ref CookieContainer cookies, string sParameters)
{
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(sURL);
    httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36";
    httpRequest.CookieContainer = cookies;
    httpRequest.Method = "Post";
    httpRequest.ContentType = "application/x-www-form-urlencoded";
    httpRequest.ContentLength = sParameters.Length;
    httpRequest.AllowAutoRedirect = true;
    using (Stream stream = httpRequest.GetRequestStream())
    {
        stream.Write(Encoding.UTF8.GetBytes(sParameters), 0, sParameters.Length);
    }
    HttpWebResponse httpWebResponse = (HttpWebResponse)httpRequest.GetResponse();
    string sResponse;
    using (Stream stream = httpWebResponse.GetResponseStream())
    {
        StreamReader reader = new StreamReader(stream, System.Text.Encoding.GetEncoding(936));
        sResponse = reader.ReadToEnd();
    }
    return sResponse;
}
The code to call the method is:
string sParameter = ".tries=1&.src=ym&.md5=&.hash=&.js=&.last=&promo=&.intl=us&.lang=en-US&.bypass=&.partner=&.u=eip09319532h1&.v=0&.challenge=3QjvX9eEFtJRrABhZp9kgS9IT.VO&.yplus=&.emailCode=&pkg=&stepid=&.ev=&hasMsgr=0&.chkP=Y&.done=http%3A%2F%2Fmail.yahoo.com&.pd=ym_ver%3D0%26c%3D%26ivt%3D%26sg%3D&.ws=1&.cp=0&nr=0&pad=3&aad=3&login=username%40yahoo.com&passwd=xxxxx&.persistent=&.save=&passwd_raw=";
System.Net.CookieContainer coookies = null ;
string sResponse;
sResponse = GetResponse(sUrl, ref coookies, sParameter);
The string sParameter was obtained by examining the data posted to the server in Firefox's Firebug plugin. But in the parameters I posted above, I masked my user id and password.
I wanted to re-use the session so I passed a CookieContainer object as reference to the method.
It compiles and runs, but the page returned to me is not in the logged-in state.
I have read several similar questions on Stack Overflow, but I still can't make my method work. Your help is appreciated.

Issue WebRequest while reading NetworkStream of TcpListener

Following on from the post How to create a simple proxy in C#? I have been playing around with implementing a basic proxy.
Where I am getting stuck and confused is trying to issue a WebRequest with the information provided in the original request.
Using the following code:
WebRequest webRequest = WebRequest.Create("http://www.google.com");
(webRequest as HttpWebRequest).UserAgent = "MOZILLA/5.0 (WINDOWS NT 6.1; WOW64) APPLEWEBKIT/537.1 (KHTML, LIKE GECKO) CHROME/21.0.1180.75 SAFARI/537.1";
webRequest.Method = "GET";
WebResponse webResponse = webRequest.GetResponse();
Stream responseStream = webResponse.GetResponseStream();
byte[] responseBytes = responseStream.ReadFully();
I can successfully issue a request and return the page content.
However, when I put it inside a proxy request (i.e. a TcpListener) like so:
TcpListener _listener = new TcpListener(IPAddress.Any, 1234);
this._listener.Start();
byte[] bytes = new byte[1024];
while (true)
{
    TcpClient client = this._listener.AcceptTcpClient();
    NetworkStream networkStream = client.GetStream();
    int i = networkStream.Read(bytes, 0, bytes.Length);
    while (i != 0)
    {
        string data = System.Text.Encoding.ASCII.GetString(bytes, 0, i);
        RequestHeader header = new RequestHeader(data.ToUpper());
        WebRequest webRequest = WebRequest.Create(header.URL);
        (webRequest as HttpWebRequest).UserAgent = header.UserAgent;
        webRequest.Method = "GET";
        WebResponse webResponse = webRequest.GetResponse(); //It gets here and never returns
        Stream responseStream = webResponse.GetResponseStream();
        byte[] responseBytes = responseStream.ReadFully();
        networkStream.Write(responseBytes, 0, responseBytes.Length);
        i = networkStream.Read(bytes, 0, bytes.Length);
    }
    client.Close();
}
It blocks at the line WebResponse webResponse = webRequest.GetResponse(); and never returns.
This has definitely got nothing to do with the data provided by the RequestHeader class I created as I've also tried hardcoding the values.
I'm assuming I'm missing something fundamental about the way sockets work in such a scenario and the approach required. Hopefully someone can clarify for me.

Yes: you are assuming that a single read has given you the entire header.
Instead, some kind of state machine should be implemented to parse the incoming HTTP request. The state machine must collect the information about the request and, of course, detect the end of the request; then you process the request (the proxy work) and send the response. Search for "C# http state machine" for examples.
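To make the idea concrete, here is a minimal sketch of such a state machine. It handles only the header part of a request (enough for a GET) by scanning for the blank line, CR LF CR LF, that ends the header. The class and method names are made up for illustration, and reading one byte at a time is chosen for clarity rather than speed.

```csharp
using System;
using System.IO;
using System.Text;

public static class HttpHeaderReader
{
    // Reads from the stream until the CR LF CR LF sequence that ends an HTTP
    // header, and returns the header text without that terminator.
    // Returns null if the stream ends before a complete header has arrived.
    public static string ReadHeader(Stream stream)
    {
        const string terminator = "\r\n\r\n";
        var header = new StringBuilder();
        int state = 0; // how many terminator characters have been matched so far
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            char c = (char)b;
            header.Append(c);
            if (c == terminator[state])
            {
                state++;
            }
            else
            {
                // A stray CR can still be the start of a new terminator.
                state = (c == '\r') ? 1 : 0;
            }
            if (state == terminator.Length)
            {
                header.Length -= terminator.Length;
                return header.ToString();
            }
        }
        return null;
    }
}
```

In the listener loop, the returned text would be handed to the request parser only once ReadHeader comes back non-null; a request with a body would additionally need to honor Content-Length.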

So it turned out to be a proxy issue.
Basically for testing I needed to set the machine proxy to 127.0.0.1:1234 or similar.
This in turn was being used in the default settings when initializing a WebRequest.
So all I needed to do in the end was the following to bypass the proxy.
(webRequest as HttpWebRequest).UserAgent = header.UserAgent;
webRequest.Method = "GET";
webRequest.Proxy = null; //Adding this line cleared the proxy.

C# WebClient - Getting a question-mark-inside-a-square characters instead of øæå when downloading a page

I'm using WebClient to download a web page from a Norwegian website. In the downloaded data, all special characters (øæå) are missing and replaced by a question-mark-type character instead.
I used to have this issue on my own web page before I added a "" in my HTML file; that is present here.
If I open a browser and browse to the address, everything looks fine.
I have used Fiddler to see exactly what headers I need to send, and I am sure I'm sending exactly the same ones as my browser.
So by deduction I believe that WebClient is the offender and somehow mangles the data before returning it to me, and I'm not sure how to stop it from doing this.
For more information, this is my code to get the web page:
string result = string.Empty;
using (WebClient client = new WebClient())
{
    client.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
    client.Headers["Referer"] = "http://mywebsite.no/forum/viewforum.php?f=7";
    client.Headers["Accept-Language"] = "nb-NO";
    client.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; AskTbFXTV5/5.9.1.14019)";
    client.Headers["Accept-Encoding"] = "gzip, deflate";
    using (Stream stream = client.OpenRead(new Uri(textBox1.Text)))
    {
        using (StreamReader reader = new StreamReader(stream))
        {
            result = reader.ReadToEnd();
        }
    }
}
Any tips?

As others have said, you might not have set the correct encoding. See "how to detect encoding of the response body", which shows how to guess the encoding from the response headers or from the HTML META tag in the response body.
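For the response-header half of that advice, a small sketch might look like the following. It pulls the charset parameter out of a Content-Type value and falls back to a caller-supplied encoding when none is usable. The class and method names are illustrative assumptions, not part of any library.

```csharp
using System;
using System.Text;

public static class CharsetSniffer
{
    // Extracts the charset from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header
    // is missing, has no charset parameter, or names an unknown charset.
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType)) return fallback;
        foreach (string part in contentType.Split(';'))
        {
            string p = part.Trim();
            if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                string name = p.Substring("charset=".Length).Trim().Trim('"');
                try { return Encoding.GetEncoding(name); }
                catch (ArgumentException) { return fallback; }
            }
        }
        return fallback;
    }
}
```

With WebClient, the header value should be available from client.ResponseHeaders once the request has been made (for example after OpenRead returns), and the resulting Encoding can then be passed to the StreamReader.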

Have you tried setting the encoding on the response?
string result = string.Empty;
using (WebClient client = new WebClient())
{
    client.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
    client.Headers["Referer"] = "http://mywebsite.no/forum/viewforum.php?f=7";
    client.Headers["Accept-Language"] = "nb-NO";
    client.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; AskTbFXTV5/5.9.1.14019)";
    client.Headers["Accept-Encoding"] = "gzip, deflate";
    using (Stream stream = client.OpenRead(new Uri("")))
    {
        byte[] resultBytes = StreamUtil.ReadToEnd(stream);
        result = System.Text.Encoding.UTF8.GetString(resultBytes);
    }
}
internal class StreamUtil
{
    internal static byte[] ReadToEnd(System.IO.Stream stream)
    {
        byte[] readBuffer = new byte[4096];
        int totalBytesRead = 0;
        int bytesRead;
        while ((bytesRead = stream.Read(readBuffer, totalBytesRead, readBuffer.Length - totalBytesRead)) > 0)
        {
            totalBytesRead += bytesRead;
            if (totalBytesRead == readBuffer.Length)
            {
                // Buffer is full: peek one byte to see whether more data follows, and grow if so.
                int nextByte = stream.ReadByte();
                if (nextByte != -1)
                {
                    byte[] temp = new byte[readBuffer.Length * 2];
                    Buffer.BlockCopy(readBuffer, 0, temp, 0, readBuffer.Length);
                    Buffer.SetByte(temp, totalBytesRead, (byte)nextByte);
                    readBuffer = temp;
                    totalBytesRead++;
                }
            }
        }
        // Trim the buffer to the number of bytes actually read.
        byte[] buffer = readBuffer;
        if (readBuffer.Length != totalBytesRead)
        {
            buffer = new byte[totalBytesRead];
            Buffer.BlockCopy(readBuffer, 0, buffer, 0, totalBytesRead);
        }
        return buffer;
    }
}

Try using a StreamReader constructor that specifies the encoding.
http://msdn.microsoft.com/en-us/library/ms143456.aspx
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx
To figure out the encoding of the page, in Firefox you can right-click and select View Page Info. The encoding should be listed there.
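A minimal sketch of that suggestion follows. The helper name is made up, and the encoding name passed in is whatever the page actually uses; "iso-8859-1" in the usage note is only a guess for an older Norwegian site.

```csharp
using System.IO;
using System.Text;

public static class EncodedReader
{
    // Decodes the whole stream with the named encoding rather than
    // StreamReader's default of UTF-8.
    public static string ReadAll(Stream stream, string encodingName)
    {
        using (var reader = new StreamReader(stream, Encoding.GetEncoding(encodingName)))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's code this would replace the inner StreamReader block, for example result = EncodedReader.ReadAll(stream, "iso-8859-1").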

There are two likely reasons:
You are not using the correct encoding for the StreamReader.
You are displaying the result using a font that doesn't support the characters.
If you know what the encoding is, and know that it will stay the same, you can just provide the encoding when you create the StreamReader object.
If not, you would have to get the first part of the page into a byte buffer, so that you can decode enough of it using a plain ASCII encoding to find a content meta tag and determine the encoding from that. Then you can decode the buffer and the rest of the page using the correct encoding.
As you are saying "question-mark-inside-a-square characters" and not just question marks, I suspect that the problem might actually be in displaying the content, not in decoding it. A decoding problem would produce regular question marks, while fonts contain a special character for missing glyphs that looks exactly as you describe.
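The byte-buffer approach described above could be sketched roughly as follows. It is simplified in that it takes the whole page as a byte array, decodes only the first part as ASCII to look for a charset declaration, and then decodes everything with the encoding it found. The names, the 1024-byte prefix, and the UTF-8 fallback are assumptions made for the example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class MetaCharsetSniffer
{
    // Looks for charset=... in the first part of the page (decoded as ASCII),
    // then decodes the full buffer with that encoding.
    // Falls back to UTF-8 when no usable charset declaration is found.
    public static string Decode(byte[] page)
    {
        int prefixLength = Math.Min(page.Length, 1024);
        string prefix = Encoding.ASCII.GetString(page, 0, prefixLength);
        Match m = Regex.Match(prefix, @"charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
                              RegexOptions.IgnoreCase);
        Encoding encoding = Encoding.UTF8;
        if (m.Success)
        {
            try { encoding = Encoding.GetEncoding(m.Groups[1].Value); }
            catch (ArgumentException) { /* unknown charset name: keep UTF-8 */ }
        }
        return encoding.GetString(page);
    }
}
```

If the decoded string still shows boxes after this, that points back to the font explanation rather than to decoding.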
