Can anybody tell me how i can download file in my C# program from that URL:
http://www.cryptopro.ru/products/cades/plugin/get_2_0
I try to use WebClient.DownloadFile, but i'm getting only html page instead of file.
Looking in Fiddler the request fails if there is not a legitimate U/A string, so:
WebClient wb = new WebClient();
wb.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.33 Safari/537.36");
wb.DownloadFile("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe", "c:\\xxx\\xxx.exe");
I belive this would do the trick.
WebClient wb = new WebClient();
wb.DownloadFile("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe","file.exe");
If you need to know the download status or use credentials in order to make the request, I'll suggest this solution:
WebClient client = new WebClient();
Uri ur = new Uri("http://remoteserver.do/images/img.jpg");
client.Credentials = new NetworkCredential("username", "password");
client.DownloadProgressChanged += WebClientDownloadProgressChanged;
client.DownloadDataCompleted += WebClientDownloadCompleted;
client.DownloadFileAsync(ur, #"C:\path\newImage.jpg");
And her it is the implementation of the callbacks:
void WebClientDownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
Console.WriteLine("Download status: {0}%.", e.ProgressPercentage);
}
void WebClientDownloadCompleted(object sender, DownloadDataCompletedEventArgs e)
{
Console.WriteLine("Download finished!");
}
Try WebClient.DownloadData
You would get response in the form of byte[] then you can do whatever you want with that.
Sometimes a server would not let you download files with scripts/code. to take care of this you need to set user agent header to fool the server that the request is coming from browser. using the following code, it works. Tested ok
var webClient=new WebClient();
webClient.Headers["User-Agent"] =
"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36";
webClient.DownloadFile("the url","path to downloaded file");
this will work as you expect, and you can download file.
Related
I am trying to download the HTML from a site and parse it. I am actually interested in the OpenGraph data in the head section only. For most sites using the WebClient, HttpClient or HtmlAgilityPack works, but some domains I get 403, for example: westelm.com
I have tried setting up the Headers to be absolutely the same as they are when I use the browser, but I still get 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using(WebClient client = new WebClient()) {
client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
client.Headers["Accept-Encoding"] = "gzip, deflate, br";
client.Headers["Accept-Language"] = "en-US,en;q=0.9";
doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something or the site administrator is protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to resolve the same problem. IDK if you're already fixed this but I tell you how it worked for me
A page was giving me 403 for the same reasons. The thing is: you need to emulate a "web browser" from the code, sending a lot of headers.
I used one of yours headers I wasn't using (like Accept-Language)
I didn't use WebClient though, I used HttpClient to parse the webpage
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
if (response == null)
return string.Empty;
using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
using var streamReader = new StreamReader(decompressedStream);
return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
So Im learning RestSharp
But I'm stuck at this problem which is getting specific string for client cookies here is my code:
var cookieJar = new CookieContainer();
var client = new RestClient("https://server.com")
{
UserAgent =
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36",
};
client.CookieContainer = cookieJar;
var request = new RestRequest(Method.GET);
var cookie = client.CookieContainer.GetCookieHeader(new Uri("https://server.com"));
MessageBox.Show(""+cookie);
and I always get the cookie empty can anyone helps me?
This will set the cookie for your client. After all, you need to do is client.Execute. The code is in C# pretty sure you can make it work for anything else.
string myUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36";
client.CookieContainer.Add(new Cookie("UserAgent", myUserAgent) { Domain = myUri.Host });
I am trying to get the contents of this URL as a string.
https://noembed.com/embed?url=https://www.youtube.com/watch?v=1FLhOGOg2Qg
This is the code I am using:
var html_content = "";
using (var client = new WebClient())
{
client.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");
html_content += client.DownloadString("https://noembed.com/embed?url=https://www.youtube.com/watch?v=1FLhOGOg2Qg");
}
Console.WriteLine(html_content);
Console.ReadLine();
And this is the error I get:
System.Net.WebException was unhandled
HResult=-2146233079
Message=The request was aborted: Could not create SSL/TLS secure channel.
Source=System
I am using this on a WPF application and I am OK with ignoring SSL here. I have already tried other answers for ignoring SSL but none worked. It works with other urls, eg https://www.youtube.com/watch?v=1FLhOGOg2Qg but not with the noembed.com URL.
Add ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
this worked for me :
var html_content = "";
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
using (var client = new WebClient())
{
client.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");
html_content += client.DownloadString("https://noembed.com/embed?url=https://www.youtube.com/watch?v=1FLhOGOg2Qg");
}
Console.WriteLine(html_content);
Console.ReadLine();
output i got :
{"author_url":"https://www.youtube.com/user/nogoodflix","url":"https://www.youtube.com/watch?v=1FLhOGOg2Qg","provider_url":"https://www.youtube.com/","title":"ONE FOR THE MONEY Trailer 2011 Official [HD] Katherine Heigl","author_name":"Streaming Clips","type":"video","height":270,"thumbnail_height":360,"thumbnail_width":480,"provider_name":"YouTube","html":"\nhttps://www.youtube.com/embed/1FLhOGOg2Qg?feature=oembed\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">\n","thumbnail_url":"https://i.ytimg.com/vi/1FLhOGOg2Qg/hqdefault.jpg","version":"1.0","width":480}
I'm trying to crawl data from a aspx page, which have three dropdowns: State, District, and City.They are implemented as the dependency dropdowns with the server side post back.
I Have all the ids of the State, District, and the City.I am writing a Console application using WebClient to Post all three drop-down ids as a form data to the page. But every time it is redirecting to an error page. Can anyone help me to set all the drop-down values at a time with single post call?
Code Snippet:
var formValues = new NameValueCollection();
formValues["__VIEWSTATE"] = Extract("__VIEWSTATE", responseString);
formValues["__EVENTVALIDATION"] = Extract("__EVENTVALIDATION", responseString);
formValues["ddlSelectLanguage"] = "en-US";
formValues["ddlState"] = "19";
formValues["DDLDistrict"] = "237";
formValues["DDLVillage"] = "bcab59fd-35d2-e111-882d-001517f1d35c";
client.Headers.Set(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36");
var responseData = client.UploadValues(firstPage, formValues);
responseString = Encoding.ASCII.GetString(responseData);
I want to get some information from the web with HtmlAgilityPack, the application was normal before I use the application to get the data from this page, the number of the error is 403, And my code is as follows:
string wikipageurl = geturl.Text;
WebClient wc1 = new WebClient();
Stream stream1 = wc1.OpenRead(wikipageurl);
StreamReader sr1 = new StreamReader(stream1, Encoding.UTF8);
showhtml.Text = sr1.ReadToEnd();
I use showhtml textbox to show me the information the application got.
This is how you can do it using HtmlAgilityPack:
HtmlDocutment doc;
HtmlWeb web = new HtmlWeb();
web.OverrideEncoding = Encoding.UTF8;
web.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
doc = web.Load("http://zh.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC%E5%85%83%E5%B8%85%E5%88%97%E8%A1%A8");
showhtml.Text = doc.DocumentNode.OuterHtml;
If you want to do it using WebClient check Oscar Mederos answer
Just try to simulate you're accessing to it through a web browser. For that, you use the User-Agent header:
...
WebClient wc1 = new WebClient();
wc1.Headers.Add(
"User-Agent",
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0"
);
Stream stream1 = wc1.OpenRead(wikipageurl);
...