I am trying to download this image using C#:
http://www.pinkice.com/data/product_image/1/13954Untitled-1.jpg
When I try to download it using a WebClient, I get an exception saying the underlying connection was closed unexpectedly.
I've tried modifying the headers to simulate Chrome:

var wc = new WebClient();
wc.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
wc.Headers[HttpRequestHeader.AcceptLanguage] = "en-US,en;q=0.8";
wc.Headers[HttpRequestHeader.CacheControl] = "max-age=0";
wc.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.6 (KHTML, like Gecko) Chrome/23.0.1243.2 Safari/537.6";
This did not work. I then tried to see if it even worked with wget:
wget "http://www.pinkice.com/data/product_image/1/14231Untitled-2.jpg"
Which resulted in
HTTP request sent, awaiting response... No data received. Retrying.
Can anyone figure this out?
The code below works:
using (WebClient wc = new WebClient())
{
    wc.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1";
    byte[] buf = wc.DownloadData("http://www.pinkice.com/data/product_image/1/13954Untitled-1.jpg");
    Image bmp = Image.FromStream(new MemoryStream(buf));
}
The problem was that I was reusing the WebClient object. I think it caches something oddly when the If-Modified-Since header produces a 304 HTTP status code. The moral of the story: do not reuse a WebClient object.
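For reference, a minimal sketch of that fix as a helper that creates a fresh WebClient per download (the helper name is mine):

using System.Drawing;
using System.IO;
using System.Net;

static Image DownloadImage(string url)
{
    // A new WebClient per call avoids the stale cached state described above.
    using (var wc = new WebClient())
    {
        wc.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.83 Safari/537.1";
        byte[] buf = wc.DownloadData(url);
        // Note: the MemoryStream must stay alive for the lifetime of the Image.
        return Image.FromStream(new MemoryStream(buf));
    }
}

Image img = DownloadImage("http://www.pinkice.com/data/product_image/1/13954Untitled-1.jpg");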
Related
I'm trying to get the full URL after a redirect. Here is the code:
var documentx = new HtmlWeb().Load(textBox1.Text);
where the textBox1.Text value is "https://xxxx.org/file/download".
After I run that code, the site redirects and the URL changes to:
https://xxxx.org/file/ur344333kd/45rrreew
How can I get the new URL using HtmlAgilityPack in a C# WinForms app? Thanks.
By setting web.CaptureRedirect to true and querying web.ResponseUri, you can get the URL of the final request, the one which actually downloaded the document.
Note: I am sending this UserAgent string, just like my Chrome browser does, because server behavior may change depending on it:
HtmlWeb web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36";
web.CaptureRedirect = true;
HtmlDocument doc = web.Load("http://www.google.com");
Console.WriteLine("Response retrieved from: {0}", web.ResponseUri);
The output is:
Response retrieved from: https://www.google.com/?gws_rd=ssl
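Applied to the URL from the question (the xxxx.org address is the asker's placeholder), the same pattern would be:

HtmlWeb web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36";
web.CaptureRedirect = true;
HtmlDocument doc = web.Load(textBox1.Text); // "https://xxxx.org/file/download"
string finalUrl = web.ResponseUri.ToString(); // e.g. "https://xxxx.org/file/ur344333kd/45rrreew"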
Consider the following URL: "http://www.bestbuy.com". This resource loads quickly and correctly in all browsers and from all locations.
However, basic C# code currently hangs (it ends with a timeout, no matter what timeout value is set) for this URL:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
var req = WebRequest.CreateHttp("http://www.bestbuy.com");
req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36";
req.Timeout = 30000; // you can set any timeout
using (var resp = (HttpWebResponse)req.GetResponse()) // hangs here and finally times out
{
    Console.WriteLine(resp.StatusCode);
    Console.WriteLine(resp.ResponseUri.AbsoluteUri);
}
Fiddle: https://dotnetfiddle.net/M7NZgG
The same code works fine for most other URLs.
I tried different things, but none of them helped:
directly loading the HTTPS version ("https://www.bestbuy.com")
removing the UserAgent, AutomaticDecompression, and SecurityProtocol setters
The HttpClient class also hangs and ends with a timeout for that resource.
In Fiddler, the response is returned quickly, but it looks strange: it is completely empty.
I'm trying to get a specific value from a specific site.
The site periodically updates the value using an Ajax call to
https://www.plus500.co.il/api/LiveData/FeedUpdate?instrumentId=19
(you can navigate to the address and see that you get the XML response).
Using Postman, sending:
GET /api/LiveData/FeedUpdate?instrumentId=19 HTTP/1.1
Host: www.plus500.co.il
Cache-Control: no-cache
Postman-Token: f823c87d-3edc-68ce-e1e7-02a8fc68be7a
I get a valid JSON response.
However, when I try it from C#:
var webRequest = WebRequest.CreateHttp("https://www.plus500.co.il/api/LiveData/FeedUpdate?instrumentId=19");
webRequest.Method = "GET";
using (var response = webRequest.GetResponse())
{...}
the request fails with error code 403 (Forbidden).
When adding
webRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36";
the request fails with error code 500 (Internal Server Error).
Edit: I also initialize with
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 |
                                       SecurityProtocolType.Tls11 |
                                       SecurityProtocolType.Tls |
                                       SecurityProtocolType.Ssl3;
I also tried setting a CookieContainer, but the result is the same 500.
Why do Postman and Chrome successfully query this API while the C# WebRequest does not?
What is the difference?
The reason this is failing is the set of headers that Postman includes in the request by default, which the C# request does not send.
Using a program like Fiddler (https://www.telerik.com/fiddler), you can watch the request and see that the headers from the Postman request are:
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36
yet the headers from C# are just:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Filling in the extra client request headers like this allows it to go through fine:
webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
webRequest.Headers.Add("Accept-Encoding", "gzip deflate,br");
webRequest.Headers.Add("Accept-Language", "en-US,en;q=0.9");
I have a website on a local network that I am trying to write a little client for.
I am trying to use WebClient for this purpose; however, it seems that the website somehow detects it and cuts the connection, which results in a WebException.
To counter this, I have tried adding headers like:
WebClient wc = new WebClient();
wc.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
wc.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml");
wc.Headers.Add("Accept-Encoding", "deflate, sdch, br");
wc.Headers.Add("Accept-Charset", "ISO-8859-1");
wc.Headers.Add("Accept-Language", "en-us;q=0.7,en;q=0.3");
However, the website still cut off the connection, and I noticed that not all headers were sent. I then tried overriding GetWebRequest:
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var castRequest = request as HttpWebRequest;
        if (castRequest != null)
        {
            castRequest.KeepAlive = true;
            castRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
            castRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
            castRequest.Headers.Add("Accept-Encoding", "deflate, sdch, br");
            castRequest.Headers.Add("Accept-Charset", "ISO-8859-1");
            castRequest.Headers.Add("Accept-Language", "en-US,en;q=0.8");
        }
        return request;
    }
}
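(For illustration, the subclass is then used like a plain WebClient; the URL here is a placeholder:)

using (var wc = new MyWebClient())
{
    // Every request made through this instance now carries the headers set above.
    string html = wc.DownloadString("http://intranet-site.local/"); // placeholder URL
}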
This managed to send all the headers; however, I still could not access the website.
I can access the website just fine using any browser like Firefox or Chrome (from which I copied the headers) or even a WebBrowser control, and I can access other websites using WebClient without any issue.
Is there any specific reason why I cannot access this website using WebClient?
Is there anything else I can do to make a WebClient request look more like a browser request?
I figured out that I was looking in the wrong direction.
It seems the website in question does not support the default SecurityProtocol, so I had to enable TLS 1.2:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
For some reason this caused the same issue with another local website I was parsing, which I solved by enabling all TLS versions:
ServicePointManager.SecurityProtocol =
    SecurityProtocolType.Tls
    | SecurityProtocolType.Tls11
    | SecurityProtocolType.Tls12;
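ServicePointManager.SecurityProtocol is a process-wide static setting, so it only needs to be assigned once, before the first request. A sketch of typical usage (the URL is a placeholder):

using System.Net;

// Set once at startup, before any HTTP calls are made.
ServicePointManager.SecurityProtocol =
    SecurityProtocolType.Tls
    | SecurityProtocolType.Tls11
    | SecurityProtocolType.Tls12;

using (var wc = new WebClient())
{
    string html = wc.DownloadString("http://intranet-site.local/"); // placeholder URL
}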
How can I simulate visiting a URL in Chrome that ends in .php? What code is being run, exactly?
For example,
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.example.com/upload.php");
req.GetResponse();
Would the C# code be as simple as this?
This should do the trick.
Set the user agent to one of the Chrome user agent strings; at http://www.useragentstring.com/ you can find different ones:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.example.com/upload.php");
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";
req.GetResponse().Dispose(); // dispose the response when done with it
The documentation for the UserAgent property can be found on MSDN.
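For completeness, a sketch that also reads the response body (variable names are mine). Note that the PHP code itself executes on the server; the client never sees it, only the output it generates:

using System;
using System.IO;
using System.Net;

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.example.com/upload.php");
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36";

using (var resp = (HttpWebResponse)req.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream()))
{
    string body = reader.ReadToEnd(); // the HTML/text produced by the PHP script
    Console.WriteLine(body);
}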