I am trying to get the contents of this URL as a string.
https://noembed.com/embed?url=https://www.youtube.com/watch?v=1FLhOGOg2Qg
This is the code I am using:
var html_content = "";
using (var client = new WebClient())
{
    client.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");
    html_content += client.DownloadString("https://noembed.com/embed?url=https://www.youtube.com/watch?v=1FLhOGOg2Qg");
}
Console.WriteLine(html_content);
Console.ReadLine();
And this is the error I get:
System.Net.WebException was unhandled
HResult=-2146233079
Message=The request was aborted: Could not create SSL/TLS secure channel.
Source=System
I am using this in a WPF application, and I am OK with ignoring SSL errors here. I have already tried other answers for ignoring SSL, but none worked. The code works with other URLs, e.g. https://www.youtube.com/watch?v=1FLhOGOg2Qg, but not with the noembed.com URL.
Add ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12; before creating the WebClient.
This worked for me:
var html_content = "";
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
using (var client = new WebClient())
{
    client.Headers.Add("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.1 Safari/537.36");
    html_content += client.DownloadString("https://noembed.com/embed?url=https://www.youtube.com/watch?v=1FLhOGOg2Qg");
}
Console.WriteLine(html_content);
Console.ReadLine();
Output I got:
{"author_url":"https://www.youtube.com/user/nogoodflix","url":"https://www.youtube.com/watch?v=1FLhOGOg2Qg","provider_url":"https://www.youtube.com/","title":"ONE FOR THE MONEY Trailer 2011 Official [HD] Katherine Heigl","author_name":"Streaming Clips","type":"video","height":270,"thumbnail_height":360,"thumbnail_width":480,"provider_name":"YouTube","html":"\nhttps://www.youtube.com/embed/1FLhOGOg2Qg?feature=oembed\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\">\n","thumbnail_url":"https://i.ytimg.com/vi/1FLhOGOg2Qg/hqdefault.jpg","version":"1.0","width":480}
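A possible variation on the fix above, as a minimal sketch: instead of overwriting the protocol list, OR the TLS 1.2 flag into whatever is already enabled, so connections to servers that still need another protocol keep working. The TlsSetup class and EnableTls12 method are hypothetical names used only for illustration, and the sketch assumes .NET Framework 4.5 or later, where SecurityProtocolType.Tls12 is defined.

```csharp
using System;
using System.Net;

static class TlsSetup
{
    // Adds TLS 1.2 to the enabled protocols without dropping the ones already set.
    // Call this once, before the first WebClient/HttpWebRequest is created.
    public static void EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;
    }
}
```

Calling TlsSetup.EnableTls12() at application startup would replace the plain assignment in the snippet above.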
I am trying to download the HTML from a site and parse it. I am actually interested only in the OpenGraph data in the head section. For most sites, using WebClient, HttpClient or HtmlAgilityPack works, but for some domains I get a 403, for example westelm.com.
I have tried setting the headers to be exactly the same as the ones my browser sends, but I still get a 403. Here is some code:
string url = "https://www.westelm.com/m/products/brushed-herringbone-throw-t5792/?";
var doc = new HtmlDocument();
using(WebClient client = new WebClient()) {
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9";
    client.Headers["Accept-Encoding"] = "gzip, deflate, br";
    client.Headers["Accept-Language"] = "en-US,en;q=0.9";
    doc.Load(client.OpenRead(url));
}
At this point, I am getting a 403.
Am I missing something, or is the site administrator protecting the site from API requests?
How can I make this work? Is there a better way to get OpenGraph data from a site?
Thanks.
I used your question to resolve the same problem. I don't know if you have already fixed this, but here is how it worked for me.
A page was giving me a 403 for the same reasons. The thing is: you need to emulate a web browser from the code by sending the headers a browser would send.
I added one of your headers that I wasn't using (Accept-Language).
I didn't use WebClient, though; I used HttpClient to fetch the page:
private static async Task<string> GetHtmlResponseAsync(HttpClient httpClient, string url)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url));
    request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");
    // Only advertise gzip, since that is the only encoding decoded below.
    request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip");
    request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
    request.Headers.TryAddWithoutValidation("Accept-Charset", "UTF-8");
    request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
    using var response = await httpClient.SendAsync(request).ConfigureAwait(false);
    using var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false);
    // Wrap in GZipStream only when the server actually gzip-encoded the body.
    using Stream bodyStream = response.Content.Headers.ContentEncoding.Contains("gzip")
        ? new GZipStream(responseStream, CompressionMode.Decompress)
        : responseStream;
    using var streamReader = new StreamReader(bodyStream);
    return await streamReader.ReadToEndAsync().ConfigureAwait(false);
}
If it helps you, I'm glad. If not, I will leave this answer here to help someone else in the future!
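As an alternative to decompressing the body by hand, HttpClientHandler can do it automatically. The sketch below is only an illustration of that option, not part of the answer above: BrowserLikeClient, CreateHandler and CreateClient are hypothetical names, and the header values are simply the ones already used in this thread.

```csharp
using System.Net;
using System.Net.Http;

static class BrowserLikeClient
{
    // The handler adds Accept-Encoding itself and transparently decodes gzip/deflate bodies.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // Builds a client that sends browser-like headers on every request.
    public static HttpClient CreateClient()
    {
        var client = new HttpClient(CreateHandler());
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36");
        client.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");
        return client;
    }
}
```

With a client built this way, the page can be read directly with GetStringAsync(url) or response.Content.ReadAsStringAsync(), with no manual GZipStream. Whether a given site still answers 403 depends on its own bot protection.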
I have been trying to make a simple proxy checker...
WebProxy myProxy = default(WebProxy);
foreach (string proxy in Proxies)
{
    try
    {
        myProxy = new WebProxy(proxy);
        HttpWebRequest r = HttpWebRequest.Create("http://www.google.com");
        r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36";
        r.Timeout = 3000;
        r.Proxy = myProxy;
        HttpWebResponse re = r.GetResponse();
        Console.WriteLine($"[+] {proxy} Good", ConsoleColor.Green);
    }
    catch (Exception)
    {
        Console.WriteLine($"[-] {proxy} Bad", ConsoleColor.Red);
    }
}
For some reason, on this line:
HttpWebRequest r = HttpWebRequest.Create("http://www.google.com");
I see a little red line under the http, and this is the error I get:
The best overload for Create does not have a parameter named http
How can I fix it? And how can I make it check proxies really fast, rather than one proxy every 5 seconds?
The HttpWebRequest class's Create method takes the URL as a string, not HTML:
HttpWebRequest r = HttpWebRequest.Create("http://www.google.com");
Since Create is actually declared on WebRequest (HttpWebRequest only inherits it) and returns a WebRequest, your code is effectively this:
HttpWebRequest r = WebRequest.Create("http://www.google.com");
But what you want is this:
HttpWebRequest r = (HttpWebRequest)WebRequest.Create("http://www.google.com");
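The question also asked how to check proxies faster than one at a time, which the answer above does not cover. One option, sketched here under assumptions, is to run the checks in parallel with PLINQ so the timeouts overlap instead of adding up. ProxyChecker, IsProxyAlive, CheckAll and the degree of parallelism of 32 are illustrative choices, not something taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

static class ProxyChecker
{
    // Returns true if a request through the proxy completes with 200 OK within the timeout.
    static bool IsProxyAlive(string proxy, int timeoutMs)
    {
        try
        {
            // The cast is needed because WebRequest.Create returns a WebRequest.
            var request = (HttpWebRequest)WebRequest.Create("http://www.google.com");
            request.Proxy = new WebProxy(proxy);
            request.Timeout = timeoutMs;
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (Exception) // malformed proxy address, timeout, refused connection, ...
        {
            return false;
        }
    }

    // Checks all proxies concurrently and maps each proxy to its result.
    public static Dictionary<string, bool> CheckAll(IEnumerable<string> proxies, int timeoutMs = 3000)
    {
        return proxies
            .Distinct()                    // avoid duplicate dictionary keys
            .AsParallel()
            .WithDegreeOfParallelism(32)   // assumption: tune for your machine and network
            .ToDictionary(p => p, p => IsProxyAlive(p, timeoutMs));
    }
}
```

Usage would look like var results = ProxyChecker.CheckAll(Proxies); followed by a loop that prints each entry. How much faster this is in practice depends on how quickly the thread pool grows and on the network, so treat the numbers as a starting point.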
I want to fill my multiline textbox with the text of a web page. This is my code:
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
WebResponse response = request.GetResponse();
Stream data = response.GetResponseStream();
string html = String.Empty;
using (StreamReader sr = new StreamReader(data))
{
    html = sr.ReadToEnd();
}
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var htmlBody = htmlDoc.DocumentNode.SelectSingleNode("//body");
valuetxt.Text = htmlBody.InnerText;
This code works fine for some URLs, but for others (https) it gives me an error:
Could not find file 'C:\Program Files\IIS Express\www.justdial.com
or:
The remote server returned an error: (403) Forbidden
Can anyone help me? Thanks in advance, and sorry for my bad English.
Are you behind a proxy? Even on the open internet, depending on your network configuration, you might need to set credentials on your connection before making the request.
WebRequest request = WebRequest.Create(urltxt.Text.Trim());
request.Credentials = new NetworkCredential("user", "password");
It seems the address in the urltxt variable doesn't have http:// or https:// at the beginning, so it is treated as a relative address, and that is why you get the error.
Add a User-Agent to your request so the HTTPS request is accepted (UserAgent is a property of HttpWebRequest, so cast first):
((HttpWebRequest)request).UserAgent = @"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";
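The two suggestions in this answer (make sure the address is absolute, and send a User-Agent) can be combined in a small helper. This is a minimal sketch: UrlHelper, ToAbsoluteHttpUri and CreateRequest are hypothetical names, and defaulting to https:// when no scheme is typed is an assumption.

```csharp
using System;
using System.Net;

static class UrlHelper
{
    // Prepends a scheme when the user typed none, so "www.example.com" is not treated as relative.
    public static Uri ToAbsoluteHttpUri(string input)
    {
        string text = input.Trim();
        if (!text.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
            !text.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            text = "https://" + text; // assumption: default to HTTPS
        }
        return new Uri(text, UriKind.Absolute);
    }

    // Creates the request with a User-Agent set; the cast exposes the HTTP-specific property.
    public static HttpWebRequest CreateRequest(string input, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(ToAbsoluteHttpUri(input));
        request.UserAgent = userAgent;
        return request;
    }
}
```

In the question's code, WebRequest.Create(urltxt.Text.Trim()) would become UrlHelper.CreateRequest(urltxt.Text, "...") with a browser-like User-Agent string.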
I'm trying to automate a WebSocket service that denies connection unless you send a user agent with the CONNECT request.
I tried sending the upgrade request with HttpWebRequest and setting User-Agent using the property.
Using Fiddler to debug the request this was sent out:
CONNECT *.*.com:443 HTTP/1.1
Host: *.*.com:443
Connection: keep-alive
How do I add the User-Agent string to the CONNECT request and then upgrade to using WebSocket protocol?
My code so far:
public void Login ( Action onEnd = null ) {
    var req = CreateUpgradeRequest();
    var res = GetResponse(req);
}
private HttpWebRequest CreateUpgradeRequest ( ) {
    HttpWebRequest request = WebRequest.Create("https://lobby35.runescape.com/") as HttpWebRequest;
    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36";
    request.Connection = "Upgrade";
    SetWebSocketHeader(request, "Key", "5LENZfSifyj/Rw1ghTvpgw==");
    SetWebSocketHeader(request, "Version", "13");
    SetWebSocketHeader(request, "Extensions", "permessage-deflate; client_max_window_bits");
    SetWebSocketHeader(request, "Protocol", "jagex");
    return request;
}
You cannot use WebRequest to create a WebSocket connection. You will need ClientWebSocket, and you set request headers with ClientWebSocket.Options.SetRequestHeader.
Note, you may have issues adding that header: Setting "User-Agent" HTTP header in ClientWebSocket
Update: Since you cannot add that header with ClientWebSocket, try WebSocket4Net.
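For reference, this is roughly what the ClientWebSocket attempt looks like. It is a sketch under assumptions: WebSocketSketch and CreateSocket are hypothetical names, and the try/catch reflects my understanding that .NET Framework rejects User-Agent as a restricted header with an ArgumentException, while newer runtimes (.NET Core and later) are reported to accept it. Verify the behavior on your own target framework.

```csharp
using System;
using System.Net.WebSockets;

static class WebSocketSketch
{
    // Prepares a socket with the sub-protocol and, where the runtime allows it, a User-Agent header.
    public static ClientWebSocket CreateSocket(string userAgent, string subProtocol)
    {
        var socket = new ClientWebSocket();
        socket.Options.AddSubProtocol(subProtocol);
        try
        {
            socket.Options.SetRequestHeader("User-Agent", userAgent);
        }
        catch (ArgumentException)
        {
            // Assumption: the runtime treats User-Agent as a restricted header.
            // In that case a third-party client such as WebSocket4Net is the fallback.
        }
        return socket;
    }
}
```

The socket would then be connected with await socket.ConnectAsync(new Uri("wss://..."), CancellationToken.None), where the wss:// address is the service's WebSocket endpoint.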
Can anybody tell me how I can download a file in my C# program from this URL:
http://www.cryptopro.ru/products/cades/plugin/get_2_0
I tried to use WebClient.DownloadFile, but I only get an HTML page instead of the file.
Looking in Fiddler, the request fails if there is no legitimate User-Agent string, so:
WebClient wb = new WebClient();
wb.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.33 Safari/537.36");
wb.DownloadFile("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe", "c:\\xxx\\xxx.exe");
I believe this would do the trick:
WebClient wb = new WebClient();
wb.DownloadFile("http://www.cryptopro.ru/products/cades/plugin/get_2_0/cadeplugin.exe","file.exe");
If you need to know the download status or use credentials to make the request, I suggest this solution:
WebClient client = new WebClient();
Uri ur = new Uri("http://remoteserver.do/images/img.jpg");
client.Credentials = new NetworkCredential("username", "password");
client.DownloadProgressChanged += WebClientDownloadProgressChanged;
// DownloadFileAsync raises DownloadFileCompleted (not DownloadDataCompleted).
client.DownloadFileCompleted += WebClientDownloadCompleted;
client.DownloadFileAsync(ur, @"C:\path\newImage.jpg");
And here is the implementation of the callbacks:
void WebClientDownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    Console.WriteLine("Download status: {0}%.", e.ProgressPercentage);
}
// AsyncCompletedEventArgs lives in System.ComponentModel.
void WebClientDownloadCompleted(object sender, AsyncCompletedEventArgs e)
{
    Console.WriteLine("Download finished!");
}
Try WebClient.DownloadData.
You get the response as a byte[], and then you can do whatever you want with it.
Sometimes a server will not let you download files from scripts or code. To get around this, set the User-Agent header so the server believes the request is coming from a browser. I tested the following code and it works:
var webClient=new WebClient();
webClient.Headers["User-Agent"] =
"Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36";
webClient.DownloadFile("the url","path to downloaded file");
This works as you would expect, and the file is downloaded.