I'm trying to download a list of files given as URL strings, which point to MP4 videos. This code works when there's only one video to download, but fails when there are multiple items in the list.
When there are multiple, each one still downloads, but it's always 154 KB in size and has no length. It's essentially corrupted and can't be watched, which is exactly what Windows tells me when I open one.
Can anyone help? Is there something I should be doing that I'm not?
public static void DownloadFiles(IList<string> files)
{
    foreach (var file in files)
    {
        DownloadFile(file, file.Location);
    }
}

private static void DownloadFile(string url, string fileName)
{
    using (var webClient = new WebClient())
    {
        webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");
        webClient.DownloadFile(url, fileName);
    }
}
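As posted, the loop can't compile: files is an IList<string>, so file.Location doesn't exist. A minimal sketch of a corrected loop, assuming each entry is a direct URL to an MP4 and the local file name should come from the URL path:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

public static class Downloader
{
    public static void DownloadFiles(IList<string> files)
    {
        foreach (var url in files)
        {
            // "https://example.com/videos/clip.mp4" -> "clip.mp4"
            var fileName = Path.GetFileName(new Uri(url).LocalPath);
            DownloadFile(url, fileName);
        }
    }

    private static void DownloadFile(string url, string fileName)
    {
        using (var webClient = new WebClient())
        {
            webClient.Headers.Add("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36");
            webClient.DownloadFile(url, fileName);
        }
    }
}

Separately, every bad download coming out at exactly 154 KB is a strong hint that the server is answering each request with the same small HTML/error page instead of the video; opening one of the "videos" in a text editor will confirm that quickly.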
I'm making a simple website scraper in C# to retrieve the party names of Supreme Court cases (this is public information), for example from this link: https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html
C# Code:
private static async Task GetHtmlAsync(string docket)
{
    // async Task (rather than async void) lets the caller await and observe exceptions.
    // The docket parameter is unused in this repro; the URL is hardcoded.
    var url = "https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html";
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 OPR/71.0.3770.234");
    var html = await httpClient.GetStringAsync(url);
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);
    Console.WriteLine();
}
The problem is that whenever I run this, it successfully gives back the whole HTML file, but without the data I need, which is enclosed in the element.
[Two screenshots compared the element in the browser with the HTML received at runtime: the data is present in the browser but missing from the runtime response.]
I don't know why, but this way you should get the proper response. Try the following; you might get the answer.
var response = httpClient.GetAsync(url).GetAwaiter().GetResult();
var html = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
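If it still comes back without the data, a sketch like the following can at least rule out transport problems. Assumptions: the EnsureSuccessStatusCode() and length check are additions, and await is used instead of blocking with GetAwaiter().GetResult() (which also works, but can deadlock in UI contexts):

private static async Task GetHtmlAsync(string docket)
{
    var url = "https://www.supremecourt.gov/search.aspx?filename=/docket/docketfiles/html/public/19-8334.html";
    using var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36 OPR/71.0.3770.234");

    var response = await httpClient.GetAsync(url);
    response.EnsureSuccessStatusCode(); // fail loudly instead of parsing an error page

    var html = await response.Content.ReadAsStringAsync();
    var htmlDocument = new HtmlAgilityPack.HtmlDocument();
    htmlDocument.LoadHtml(html);
    Console.WriteLine(html.Length); // sanity check: how much HTML actually came back?
}

If the page fills that element in with client-side script, no header or request tweak will surface it in the raw HTML; the data then has to come from whatever secondary request the page itself makes.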
I am new to C# and wanted to build a little scraper with it to try some things out. I saw a YouTube video on the topic. I am trying to scrape bet365.dk (more specifically this link: https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/).
This is my code:
using System;
using System.Net.Http;
using HtmlAgilityPack;

namespace Bet365Scraper
{
    class Program
    {
        static void Main(string[] args)
        {
            GetHtmlAsync();
            Console.ReadLine();
        }

        private static async void GetHtmlAsync()
        {
            var url = "https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/";
            var httpClient = new HttpClient();
            httpClient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36");
            var html = await httpClient.GetStringAsync(url);
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);
            var htmlBody = htmlDocument.DocumentNode.SelectSingleNode("//body");
            var node = htmlBody.Element("//div[#class='src-ParticipantFixtureDetailsHigher_TeamNames ']");
            Console.WriteLine(node.InnerHtml);
        }
    }
}
I am not sure how to do this, and I find the documentation on HTML Agility Pack's site a bit confusing; I cannot seem to find exactly what I am looking for. Here is what I want to do, given this little piece of the HTML on the bet365 site:
<div class="src-ParticipantFixtureDetailsHigher_TeamNames">
    <div class="src-ParticipantFixtureDetailsHigher_TeamWrapper ">
        <div class="src-ParticipantFixtureDetailsHigher_Team " style="">Færøerne</div>
    </div>
    <div class="src-ParticipantFixtureDetailsHigher_TeamWrapper ">
        <div class="src-ParticipantFixtureDetailsHigher_Team ">Andorra</div>
    </div>
</div>
How can I print out both 'Færøerne' and 'Andorra' from the divs in one go? I am aware that I probably need to use a foreach, but as I said, I'm not too certain how to handle the selectors and such.
I'm not familiar with XPath, but I do know CSS selector syntax (what JavaScript's querySelector uses), so I suggest additionally installing the Fizzler.Systems.HtmlAgilityPack NuGet package.
The HtmlNode.QuerySelector() and QuerySelectorAll() extension methods then become available; they accept CSS selectors.
I also fixed the HttpClient usage (one shared instance instead of one per call).
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack; // provides QuerySelectorAll()

namespace Bet365Scraper
{
    class Program
    {
        // One HttpClient for the whole process; creating one per request exhausts sockets.
        private static readonly HttpClient httpClient = new HttpClient();

        static async Task Main(string[] args)
        {
            httpClient.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36");
            await GetHtmlAsync("https://www.bet365.dk/#/AC/B1/C1/D451/F2/Q1/F^12/");
            Console.ReadLine();
        }

        private static async Task GetHtmlAsync(string url)
        {
            var html = await httpClient.GetStringAsync(url);
            var htmlDocument = new HtmlDocument();
            htmlDocument.LoadHtml(html);

            // CSS class selector: matches every team-name <div>.
            var nodes = htmlDocument.DocumentNode.QuerySelectorAll(".src-ParticipantFixtureDetailsHigher_Team");
            foreach (HtmlNode node in nodes)
            {
                Console.WriteLine(node.InnerText);
            }
        }
    }
}
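If you'd rather stay with plain HtmlAgilityPack and skip the extra package, a roughly equivalent XPath query is sketched below. The concat/normalize-space idiom matches the class token exactly; a naive contains(@class, '..._Team') would also hit the _TeamNames and _TeamWrapper containers, because their class names embed the same prefix.

// Equivalent lookup with XPath only (no Fizzler).
var nodes = htmlDocument.DocumentNode.SelectNodes(
    "//div[contains(concat(' ', normalize-space(@class), ' '), " +
    "' src-ParticipantFixtureDetailsHigher_Team ')]");

if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in nodes)
    {
        Console.WriteLine(node.InnerText);
    }
}

Also worth noting: bet365 pages are heavily script-driven, so the static HTML returned by GetStringAsync may not contain these divs at all. If the query comes back empty, dump the raw html string first before blaming the selector.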
I am trying to download the HTML from a site and parse it. I am actually only interested in the OpenGraph data in the head section. For most sites WebClient, HttpClient, or HtmlAgilityPack works, but on some domains I get a 403, for example westelm.com.
I have tried setting the headers to be exactly the same as they are when I use the browser, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using (WebClient client = new WebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
    client.Headers["Accept-Encoding"] = "gzip, deflate, br";
    client.Headers["Accept-Language"] = "en-US,en;q=0.9";
    doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to resolve the same problem. I don't know if you've already fixed this, but here is how it worked for me.
A page was giving me a 403 for the same reason. The thing is: you need to emulate a web browser from code by sending the right set of headers.
I used one of your headers that I wasn't sending (Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch the page.
// Requires: System, System.IO, System.IO.Compression, System.Net.Http, System.Threading.Tasks
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    if (response == null)
        return string.Empty;

    // Caution: this assumes the server answers with gzip. Since the request
    // advertises "gzip, deflate, br", a deflate- or brotli-encoded response
    // would make GZipStream throw here.
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    using var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress);
    using var streamReader = new StreamReader(decompressedStream);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
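A variation on the same idea, sketched under the assumption that you'd rather not hand-roll decompression: HttpClientHandler.AutomaticDecompression sends the matching Accept-Encoding header for you and transparently undoes gzip/deflate, so the body can be read directly as a string.

var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
using var httpClient = new HttpClient(handler);

using var request = new HttpRequestMessage(HttpMethod.Get,
    new Uri("https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?"));
request.Headers.TryAddWithoutValidation("User-Agent",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

using var response = await httpClient.SendAsync(request);
string html = await response.Content.ReadAsStringAsync();

And since the question was ultimately about OpenGraph data: once the HTML is in hand, the og: tags can be pulled out with HtmlAgilityPack. A sketch, assuming the page uses standard <meta property="og:..." content="..."> tags:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null when nothing matches, so guard before iterating.
var metas = doc.DocumentNode.SelectNodes("//meta[starts-with(@property, 'og:')]");
if (metas != null)
{
    foreach (var meta in metas)
    {
        Console.WriteLine("{0} = {1}",
            meta.GetAttributeValue("property", string.Empty),
            meta.GetAttributeValue("content", string.Empty));
    }
}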
I'm trying to crawl data from an .aspx page which has three dropdowns: State, District, and City. They are implemented as dependent dropdowns with server-side postback.
I have all the IDs for the State, District, and City. I'm writing a console application that uses WebClient to POST all three dropdown IDs as form data to the page, but every time it redirects to an error page. Can anyone help me set all the dropdown values at once, with a single POST call?
Code Snippet:
var formValues = new NameValueCollection();
// Extract(...) is a helper that pulls the hidden-field value out of the
// previously fetched page (responseString).
formValues["__VIEWSTATE"] = Extract("__VIEWSTATE", responseString);
formValues["__EVENTVALIDATION"] = Extract("__EVENTVALIDATION", responseString);
formValues["ddlSelectLanguage"] = "en-US";
formValues["ddlState"] = "19";
formValues["DDLDistrict"] = "237";
formValues["DDLVillage"] = "bcab59fd-35d2-e111-882d-001517f1d35c";
client.Headers.Set(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36");
var responseData = client.UploadValues(firstPage, formValues);
responseString = Encoding.ASCII.GetString(responseData);
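For context on why a single POST tends to fail here: WebForms cascading dropdowns normally fire one postback per selection (via __EVENTTARGET), and every postback returns fresh __VIEWSTATE and __EVENTVALIDATION values that the next request must echo back. A district chosen under a state the server never saw selected is rejected as invalid page state, which often surfaces as a redirect to an error page. A hedged sketch of the sequential approach, reusing the question's client, firstPage, and Extract helper:

// Post one dropdown at a time, the way the browser's AutoPostBack does.
string PostBack(string eventTarget, NameValueCollection values, string html)
{
    values["__EVENTTARGET"] = eventTarget;
    values["__EVENTARGUMENT"] = "";
    values["__VIEWSTATE"] = Extract("__VIEWSTATE", html);
    values["__EVENTVALIDATION"] = Extract("__EVENTVALIDATION", html);
    var data = client.UploadValues(firstPage, values);
    return Encoding.ASCII.GetString(data);
}

var values = new NameValueCollection();
values["ddlSelectLanguage"] = "en-US";

values["ddlState"] = "19";
responseString = PostBack("ddlState", values, responseString);    // server populates districts

values["DDLDistrict"] = "237";
responseString = PostBack("DDLDistrict", values, responseString); // server populates villages

values["DDLVillage"] = "bcab59fd-35d2-e111-882d-001517f1d35c";
responseString = PostBack("DDLVillage", values, responseString);  // final page with all three set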
Can anybody tell me how I can download a file in my C# program from this URL:
http://www.cryptopro.ru/products/cades/plugin/get_2_0
I tried using WebClient.DownloadFile, but I'm getting only an HTML page instead of the file.
Looking at it in Fiddler, the request fails if there is no legitimate User-Agent string, so:
WebClient wb = new WebClient();
wb.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.33 Safari/537.36");
wb.DownloadFile("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe", "c:\\xxx\\xxx.exe");
I believe this would do the trick.
WebClient wb = new WebClient();
wb.DownloadFile("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe","file.exe");
If you need to know the download progress or to use credentials when making the request, I suggest this solution:
WebClient client = new WebClient();
Uri ur = new Uri("http://remoteserver.do/images/img.jpg");
client.Credentials = new NetworkCredential("username", "password");
client.DownloadProgressChanged += WebClientDownloadProgressChanged;
client.DownloadFileCompleted += WebClientDownloadCompleted; // DownloadFileAsync raises DownloadFileCompleted, not DownloadDataCompleted
client.DownloadFileAsync(ur, @"C:\path\newImage.jpg");
And here is the implementation of the callbacks:
void WebClientDownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    Console.WriteLine("Download status: {0}%.", e.ProgressPercentage);
}

void WebClientDownloadCompleted(object sender, AsyncCompletedEventArgs e)
{
    Console.WriteLine("Download finished!");
}
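One usage note: DownloadFileAsync returns immediately, so a console app can reach the end of Main and exit before either callback fires. A simple sketch that blocks until completion, assuming the same client and ur from above:

var done = new System.Threading.ManualResetEvent(false);
client.DownloadFileCompleted += (s, e) => done.Set(); // signal the waiting thread
client.DownloadFileAsync(ur, @"C:\path\newImage.jpg");
done.WaitOne(); // hold the process open until the download finishes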
Try WebClient.DownloadData.
You get the response as a byte[], and then you can do whatever you want with it.
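A minimal sketch of that approach, reusing the plugin URL from the question and writing the bytes straight to disk:

var client = new WebClient();
// Some servers refuse requests without a browser-like User-Agent (see above).
client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.33 Safari/537.36");

// DownloadData returns the raw response body as a byte array.
byte[] data = client.DownloadData("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe");
System.IO.File.WriteAllBytes("cadeplugin.exe", data);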
Sometimes a server will not let you download files with scripts/code. To take care of this, you need to set the User-Agent header to convince the server that the request is coming from a browser. With the following code it works; tested OK:
var webClient = new WebClient();
webClient.Headers["User-Agent"] =
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36";
webClient.DownloadFile("the url", "path to downloaded file");
This will work as you expect, and you can download the file.