Get all images from a website that blocks calls - C#

I'm trying to get all images from a page:
public async Task<PhotoURL> GetImagePortal()
{
strLinkPage = "http://www.propertyguru.com.sg/listing/19077438";
var lstString = new List<string>();
int itotal = default(int);
HttpClient client = new HttpClient();
var doc = new HtmlAgilityPack.HtmlDocument();
string strHtml = await client.GetStringAsync(strLinkPage);
doc.LoadHtml(strHtml);
var pageHtml = doc.DocumentNode;
if (pageHtml != null)
{
var projectRoot = pageHtml.SelectSingleNode("//*[contains(@class,'submain')]");
//var projectChild = projectRoot.SelectSingleNode("div/div[2]");
var imgRoot = projectRoot.SelectSingleNode("//*[contains(@class,'white-bg-padding')]");
var imgChilds = imgRoot.SelectNodes("div[1]/div[1]/ul[1]/li");
itotal = imgChilds.Count();
foreach (var imgItem in imgChilds)
{
string linkImage = imgItem.SelectSingleNode("img").Attributes["src"].Value;
lstString.Add(linkImage);
}
}
return await Task.Run(() => new PhotoURL { total = itotal, URL = lstString });
}
At the line
string strHtml = await client.GetStringAsync(strLinkPage);
I get a 405 Method Not Allowed error.
I have also tried WebClient and HttpWebRequest.
Help me, please!

The site requires a User-Agent header, and since you are using an HttpClient without any options, the site does not treat the request as legitimate (without a user agent it does not look like it is coming from a browser).
Try this:
HttpClient client = new HttpClient();
client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36");
Or any other user-agent string you prefer.
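For reference, here is a minimal sketch of the question's method with that header added (the URL, XPath expressions, and the PhotoURL type are taken from the question and assumed to still be valid):
public async Task<PhotoURL> GetImagePortal()
{
    var strLinkPage = "http://www.propertyguru.com.sg/listing/19077438";
    var lstString = new List<string>();
    using (var client = new HttpClient())
    {
        // The key change: identify the request as coming from a browser.
        client.DefaultRequestHeaders.Add("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36");
        string strHtml = await client.GetStringAsync(strLinkPage);
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(strHtml);
        var imgRoot = doc.DocumentNode.SelectSingleNode("//*[contains(@class,'white-bg-padding')]");
        var imgChilds = imgRoot?.SelectNodes("div[1]/div[1]/ul[1]/li");
        if (imgChilds != null)
        {
            foreach (var imgItem in imgChilds)
                lstString.Add(imgItem.SelectSingleNode("img").Attributes["src"].Value);
        }
    }
    return new PhotoURL { total = lstString.Count, URL = lstString };
}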

Related

send request and receive response with selenium and cookies [duplicate]

I log into the site https://dmarket.com. I want to save the cookies and use them later, so that I do not have to log in again on the next visit.
private void login_Click(object sender, EventArgs e)
{
string login = textBox1.Text;
string password = textBox2.Text;
string steamguard = textBox3.Text;
IWebDriver driver = new ChromeDriver();
driver.Navigate().GoToUrl(@"https://steamcommunity.com/openid/login?openid.claimed_id=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.identity=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0%2Fidentifier_select&openid.mode=checkid_setup&openid.ns=http%3A%2F%2Fspecs.openid.net%2Fauth%2F2.0&openid.realm=https%3A%2F%2Fapi.dmarket.live&openid.return_to=https%3A%2F%2Fapi.dmarket.live%2Fauth%2Fv1%2Fcallback%2Fsteam%2F901e7d34-06c1-44b0-82b4-2f982c058361");
driver.FindElement(By.XPath("//*[@id=\"steamAccountName\"]")).SendKeys(login);
driver.FindElement(By.XPath("//*[@id=\"steamPassword\"]")).SendKeys(password);
driver.FindElement(By.XPath("//*[@id=\"imageLogin\"]")).Click();
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(150);
driver.FindElement(By.XPath("//*[@id=\"twofactorcode_entry\"]")).SendKeys(steamguard);
driver.FindElement(By.XPath("//*[@id=\"login_twofactorauth_buttonset_entercode\"]/div[1]")).Click();
var cookies = driver.Manage().Cookies.AllCookies;
driver.Manage().Cookies.AddCookie(cookies);
}
But an error occurs: Error CS1503 Argument 1: cannot convert from 'System.Collections.ObjectModel.ReadOnlyCollection<OpenQA.Selenium.Cookie>' to 'OpenQA.Selenium.Cookie'. Maybe I'm doing something wrong, or maybe it should be done differently.
Thank you!
From my experience, dealing with cookies at a high level will fail you. To get to the root of the problem, my approach is:
Get a cookie-manager extension for Firefox or whatever browser you are using.
See how many cookie key/value pairs you end up with after logging in.
Install the Fiddler sniffer and see how many of them are sent in requests after login while browsing the website.
Extract those cookies, inject them into HttpClient (or a similar class), and track the requests with Fiddler to see whether they succeed.
Once the raw socket request succeeds, I add the same headers and cookies to the Selenium session and continue doing my Selenium work.
It may be a longer approach, but it has always worked for me. Let me show you an example with an Instagram login:
var ig_did = driver.Manage().Cookies.GetCookieNamed("ig_did");
var sessionid = driver.Manage().Cookies.GetCookieNamed("sessionid");
var mid = driver.Manage().Cookies.GetCookieNamed("mid");
var ig_nrcb = driver.Manage().Cookies.GetCookieNamed("ig_nrcb");
var rur = driver.Manage().Cookies.GetCookieNamed("rur");
var csrftoken = driver.Manage().Cookies.GetCookieNamed("csrftoken");
var ds_user_id = driver.Manage().Cookies.GetCookieNamed("ds_user_id");
string ig_did_value = ig_did.ToString().Substring(0, ig_did.ToString().IndexOf(";")).Replace("ig_did=", "");
string sessionid_value = sessionid.ToString().Substring(0, sessionid.ToString().IndexOf(";")).Replace("sessionid=", "");
string mid_value = mid.ToString().Substring(0, mid.ToString().IndexOf(";")).Replace("mid=", "");
string ig_nrcb_value = ig_nrcb.ToString().Substring(0, ig_nrcb.ToString().IndexOf(";")).Replace("ig_nrcb=", "");
string rur_value = rur.ToString().Substring(0, rur.ToString().IndexOf(";")).Replace("rur=", "");
string ds_user_id_value = ds_user_id.ToString().Substring(0, ds_user_id.ToString().IndexOf(";")).Replace("ds_user_id=", "");
string csrftoken_value = csrftoken.ToString().Substring(0, csrftoken.ToString().IndexOf(";")).Replace("csrftoken=", "");
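A small side note, not from the original answer: the Selenium Cookie object also exposes its value directly via the Value property, so each of those lines could arguably be shortened, for example:
// Sketch: Cookie.Value already holds just the value, so no Substring/Replace is needed.
string sessionid_value = sessionid.Value;
string csrftoken_value = csrftoken.Value;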
Then inject them into HttpClient and sniff the traffic with Fiddler:
var baseAddress = new Uri("https://www.instagram.com");
var cookieContainer = new CookieContainer();
using (var handler = new HttpClientHandler()
{
CookieContainer = cookieContainer,
Proxy = new WebProxy("127.0.0.1:8888", false),
UseProxy = true,
AllowAutoRedirect = true
})
using (var httpclient = new HttpClient(handler) { BaseAddress = baseAddress })
{
httpclient.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36");
httpclient.DefaultRequestHeaders.Add("X-CSRFToken", csrftoken_value);
httpclient.DefaultRequestHeaders.Add("Referer", "My_Instagram_URL");
httpclient.DefaultRequestHeaders.Add("X-IG-App-ID", Ig_app_Id_value);
httpclient.DefaultRequestHeaders.Add("Origin", "https://www.instagram.com");
httpclient.DefaultRequestHeaders.Add("Connection", "keep-alive");
httpclient.DefaultRequestHeaders.Add("X-Requested-With", "XMLHttpRequest");
httpclient.DefaultRequestHeaders.Add("Sec-Fetch-Site", "same-origin");
httpclient.DefaultRequestHeaders.Add("Sec-Fetch-Mode", "cors");
httpclient.DefaultRequestHeaders.Add("Sec-Fetch-Dest", "empty");
cookieContainer.Add(baseAddress, new System.Net.Cookie("ig_did", ig_did_value));
cookieContainer.Add(baseAddress, new System.Net.Cookie("mid", mid_value));
cookieContainer.Add(baseAddress, new System.Net.Cookie("ig_nrcb", ig_nrcb_value));
cookieContainer.Add(baseAddress, new System.Net.Cookie("csrftoken", csrftoken_value));
cookieContainer.Add(baseAddress, new System.Net.Cookie("sessionid", sessionid_value));
cookieContainer.Add(baseAddress, new System.Net.Cookie("rur", rur_value));
string url = "My_Instagram_URL";
var response = await httpclient.GetAsync(url);
}
As I said, it looks like a long approach, but it always works.
Good luck.
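As for the CS1503 error in the question itself: AddCookie accepts a single OpenQA.Selenium.Cookie, not a collection, so cookies have to be added one at a time. A minimal sketch of restoring saved cookies into a later Selenium session (newDriver is a hypothetical second driver instance, and the browser must already be on the cookie's domain before adding it):
// Collect the cookies once after a successful login.
var savedCookies = driver.Manage().Cookies.AllCookies;
// Later, in a new session: navigate to the domain first, then add each cookie individually.
newDriver.Navigate().GoToUrl("https://dmarket.com");
foreach (var cookie in savedCookies)
{
    newDriver.Manage().Cookies.AddCookie(new OpenQA.Selenium.Cookie(cookie.Name, cookie.Value));
}
// Reload so the site picks up the injected cookies.
newDriver.Navigate().Refresh();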

Unable to fetch data using HttpWebRequest or HtmlAgilityPack

I am trying to make a web scraper in C# for NSE. The code works with other sites, but when run against https://www.nseindia.com/ it gives the error: An error occurred while sending the request. Unable to read data from the transport connection: Operation timed out.
I have tried two different approaches, Try1() and Try2().
Can anyone please tell me what I am missing in my code?
class Program
{
public void Try1() {
HtmlWeb web = new HtmlWeb();
HttpStatusCode statusCode = HttpStatusCode.OK;
web.UserAgent = GetUserAgent();
web.PostResponse = (request, response) =>
{
if (response != null)
{
statusCode = response.StatusCode;
Console.WriteLine("Status Code: " + statusCode);
}
};
Task<HtmlDocument> task = web.LoadFromWebAsync(GetURL());
HtmlDocument document = task.Result;
}
public void Try2() {
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(GetURL());
request.UserAgent = GetUserAgent();
request.Accept= "*/*;";
using (var response = (HttpWebResponse)(request.GetResponse()))
{
HttpStatusCode code = response.StatusCode;
if (code == HttpStatusCode.OK)
{
using (StreamReader streamReader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.Load(streamReader);
Console.WriteLine("Document Loaded.");
}
}
}
}
private string GetURL() {
// return "https://html-agility-pack.net/";
return "https://www.nseindia.com/";
}
private string GetUserAgent() {
return "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36";
}
}
You are missing headers such as Accept and a few others, so the site does not respond.
Besides that, I would recommend using HttpClient instead of HttpWebRequest:
public static async Task GetHtmlData(string url)
{
HttpClient httpClient = new HttpClient();
using (var request = new HttpRequestMessage(HttpMethod.Get, new Uri(url)))
{
request.Headers.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml, charset=UTF-8, text/javascript, */*; q=0.01");
request.Headers.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate, br");
request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36 OPR/67.0.3575.137");
request.Headers.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
request.Headers.TryAddWithoutValidation("X-Requested-With", "XMLHttpRequest");
using (var response = await httpClient.SendAsync(request).ConfigureAwait(false))
{
response.EnsureSuccessStatusCode();
using (var responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
using (var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress))
using (var streamReader = new StreamReader(decompressedStream))
{
var result = await streamReader.ReadToEndAsync().ConfigureAwait(false);
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(result);
Console.WriteLine(result);
Console.WriteLine("Document Loaded.");
}
}
}
}
Use it like this:
await GetHtmlData("https://www.nseindia.com/");
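As a side note, the manual GZipStream step only works because the server happens to return gzip-encoded content. If you prefer, decompression can be delegated to the handler instead; a sketch under that assumption (not part of the original answer, and NSE may still insist on the fuller header set shown above):
var handler = new HttpClientHandler
{
    // Let HttpClient transparently decompress gzip/deflate responses.
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var httpClient = new HttpClient(handler);
httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36");
httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,*/*;q=0.8");
string html = await httpClient.GetStringAsync("https://www.nseindia.com/");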

How to Access File Content from C# HTTPWebRequest with AutoRedirect

I'm trying to call a URL that should return an authentication token.
Data is posted to the URL, and after a number of redirects a JSON object with a token is returned.
I'm using C# and WPF.
Here is the excerpt from what I am doing:
HttpWebRequest request1 = (HttpWebRequest)WebRequest.Create(action);
request1.Method = "POST";
StringBuilder sb = new StringBuilder();
String boundary = "-----------------------------1721856231228";
foreach (var elem in elems)
{
String nameStr = elem.GetAttribute("name");
if (nameStr != null && nameStr.Length != 0)
{
String valueStr = elem.GetAttribute("value");
sb.Append("\r\n" + boundary + "\r\n");
sb.Append("Content-Disposition: form-data; name=\"" + nameStr + "\"" + "\r\n");
sb.Append("\r\n");
sb.Append(valueStr);
}
}
sb.Append("\r\n--" + boundary + "--" + "\r\n");
String postData1 = sb.ToString();
request1.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3";
request1.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36";
request1.ContentType = "application/x-www-form-urlencoded; boundary=" + boundary;
request1.ContentLength = postData1.Length;
request1.KeepAlive = true;
request1.AllowAutoRedirect = true;
StreamWriter w = new StreamWriter(request1.GetRequestStream());
w.Write(postData1);
w.Close();
HttpWebResponse response1 = (HttpWebResponse)request1.GetResponse();
StreamReader reader1 = new StreamReader(response1.GetResponseStream());
String responseText1 = reader1.ReadToEnd();
reader1.Close();
response1.Close();
But the response doesn't contain the JSON with a token.
I am using Fiddler and can pause at the end of the above code and the URI that should have the JSON hasn't been called. I can continue executing other code in the debugger, and then later, Fiddler will show the URI as having been called and a File Download popup lets me then download a JSON file that contains the token.
I don't want the popup and I want to be able to capture the JSON data programmatically.
I found by adding the following line to the end of the code above, and just executing that line in the debugger, that Fiddler will report that the token URL has been called (and I can see in Fiddler the correct JSON response):
System.Windows.Forms.Application.DoEvents();
But I don't know how to access this response or how to prevent the file-download popup from appearing.
Maybe something in the KeepAlive setting would help?
Try Newtonsoft.Json to read the response:
using Newtonsoft.Json;
JsonSerializer serializer = new JsonSerializer();
TokenModel tokenModel;
StreamReader reader1 = new StreamReader(response1.GetResponseStream());
using (JsonTextReader reader = new JsonTextReader(reader1))
{
tokenModel = serializer.Deserialize<TokenModel>(reader);
}
Reference: https://www.newtonsoft.com/json/help/html/ReadJson.htm
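For the snippet above to compile, TokenModel is assumed to be a small POCO whose properties mirror the JSON fields returned by the token endpoint; a minimal sketch (the property name here is an assumption, adjust it to the actual response):
public class TokenModel
{
    // Assumed field name - match it to the JSON the endpoint actually returns.
    [JsonProperty("token")]
    public string Token { get; set; }
}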
Or you can use the following complete request/response flow with HttpClient:
var client = new HttpClient();
client.BaseAddress = new Uri("your url");
int _TimeoutSec = 90;
client.Timeout = new TimeSpan(0, 0, _TimeoutSec);
string _ContentType = "application/x-www-form-urlencoded";
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue(_ContentType));
//if you have any content to send use following keyValuePair
var kv = new List<KeyValuePair<string, string>>();
kv.Add(new KeyValuePair<string, string>("key1", "value"));
kv.Add(new KeyValuePair<string, string>("key2", "value"));
var req = new HttpRequestMessage(System.Net.Http.HttpMethod.Post, "your url") { Content = new FormUrlEncodedContent(kv) };
var responseAsyn = client.SendAsync(req);
var response = responseAsyn.GetAwaiter().GetResult();
TokenModel tokenResponse = new TokenModel();
if (response.StatusCode == System.Net.HttpStatusCode.OK)
{
var responseString = response.Content.ReadAsStringAsync().GetAwaiter().GetResult();
tokenResponse = JsonConvert.DeserializeObject<TokenModel>(responseString);
}

I can't use xNet to create a hostname on noip.com

I can't use xNet to create a hostname on noip.com. My POST action returns a redirect to the login page. Why is this?
using (var req = new HttpRequest())
{
req.UserAgent = "Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 950) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Mobile Safari/537.36 Edge/13.10586";
CookieDictionary _cookie = new CookieDictionary(false);
req.Cookies = _cookie;
req.AddHeader("Accept-Language", "vi-VN,vi;q=0.8,en-US;q=0.5,en;q=0.3");
req.AddHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
req.CharacterSet = Encoding.UTF8;
req.Referer = "https://www.noip.com/";
req.KeepAlive = true;
string input = "";
string value = "";
input = req.Get("https://www.noip.com/login", null).ToString();
value = Regex.Match(input, "name=\"csrf-token\" content=\"(.*?)\"").Groups[1].Value;
string param = string.Concat(new object[]
{
"_token=",
value,
"&username=fxnzpkg4hzm#johnpo.gq&password=cuongdzvlne&submit_login_page=1&_token=",
value,
"&Login"
});
// Login noip.com
input = req.Post("https://www.noip.com/login", param, "application/x-www-form-urlencoded").ToString();
req.Referer = "https://my.noip.com/";
req.AddHeader("Origin", "https://my.noip.com");
req.AddHeader("Accept", "application/json");
param = "{\"id\":0,\"target\":\"45.77.254.222\",\"name\":\"" + Path.GetRandomFileName().Replace(".", "") + "\",\"domain\":\"zapto.org\",\"wildcard\":false,\"type\":\"A\",\"ipv6\":\"\",\"url\":{\"scheme\":\"http\",\"is_masq\":false,\"masq_title\":\"\",\"meta_desc\":\"\",\"meta_keywords\":\"\"},\"is_offline\":false,\"offline_settings\":{\"action\":\"noop\",\"ip\":\"\",\"url\":\"\",\"protocol\":\"http\",\"page\":{\"title\":\"\",\"image_url\":\"\",\"text\":\"\",\"email\":\"\"}},\"mx_records\":[]}";
req.AddHeader("Content-Length", Convert.ToString(Encoding.UTF8.GetBytes(param).Length));
// Create hostname
input = req.Post("https://my.noip.com/api/host", param, "application/json").ToString();
File.AppendAllText("kq.html", input);
if (input.Contains("https://www.noip.com/login"))
{
MessageBox.Show("-------------- Error");
}
else
{
MessageBox.Show("-------------- OK");
}
}

HTMLAgilityPack get class innerText

I am trying to get the innerText of a class.
This is my code:
using (HttpClient clientduplicate = new HttpClient())
{
clientduplicate.DefaultRequestHeaders.Add("User-Agent",
"Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident / 6.0)");
using (HttpResponseMessage responseduplicate = await clientduplicate.GetAsync(@"https://www.investing.com/news/stock-market-news/warren-buffett:-i-bought-$12-billion-of-stock-after-trump-won-456954"))
using (HttpContent contentduplicate = responseduplicate.Content)
{
try
{
string resultduplicate = await contentduplicate.ReadAsStringAsync();
var websiteduplicate = new HtmlDocument();
websiteduplicate.LoadHtml(resultduplicate);
var titlesduplicate = websiteduplicate.DocumentNode.Descendants("div").FirstOrDefault(o => o.GetAttributeValue("class", "") == "arial_14 clear WYSIWYG newsPage");
var match = Regex.Match(titlesduplicate.InnerText, @"(.*?)<!--", RegexOptions.Singleline).Groups[1].Value;
Debug.WriteLine(match.TrimStart());
}
catch(Exception ex1)
{
var dialog2 = new MessageDialog(ex1.Message);
await dialog2.ShowAsync();
}
}
}
Now the problem is that this also returns the text on the picture. I can find a workaround, but I was wondering whether there is another, simpler/faster approach.
Also, when I use this on other articles/URLs there are other minor bugs.
There are many ways to do this. One way is to remove the carousel div before getting innerText:
doc.DocumentNode.Descendants("div").FirstOrDefault(_ => _.Id.Equals("imgCarousel"))?.Remove();
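Put together with the variables from the question, the idea is roughly this (a sketch, assuming the carousel div on that page really has the id "imgCarousel"):
var websiteduplicate = new HtmlDocument();
websiteduplicate.LoadHtml(resultduplicate);
// Drop the image-carousel block first so its caption text is not included.
websiteduplicate.DocumentNode.Descendants("div")
    .FirstOrDefault(d => d.Id.Equals("imgCarousel"))?.Remove();
// Then read the article container's text as before.
var article = websiteduplicate.DocumentNode.Descendants("div")
    .FirstOrDefault(d => d.GetAttributeValue("class", "") == "arial_14 clear WYSIWYG newsPage");
Debug.WriteLine(article?.InnerText.Trim());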
