Getting <img src=""> attribute from AliExpress error


Today I was trying to load images from AliExpress products.
I was using this code: string NowImage = HJ.GetElementsByTagName("img")[0].GetAttribute("src");
It worked for the first 8 images but did not load the rest of the images.
For those it returned an empty string.
I checked the HTML of the AliExpress page, and it looks like it should work.
Can someone help me? Thanks for reading.
public bool Search()
{
    WB.DocumentCompleted += WB_SearchCompleted;
    WB.Navigate(URL);
    while (WB.ReadyState != WebBrowserReadyState.Complete)
        Application.DoEvents();
    return true;
}
private void WB_SearchCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlElementCollection HEC = WB.Document.GetElementsByTagName("li");
    foreach (HtmlElement HJ in HEC)
    {
        if (HJ.GetAttribute("qrdata") == "")
            continue;
        NowImage = HJ.GetElementsByTagName("img")[0].GetAttribute("src");
        // For the first 8 images this loads perfectly; after that it
        // returns an empty string.
    }
}
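
One possible explanation, offered as an assumption rather than a confirmed diagnosis: listing pages often lazy-load images below the fold, leaving src empty and keeping the real URL in another attribute until the item scrolls into view. A minimal sketch of a fallback lookup follows. ImageUrlPicker is a hypothetical helper, and the attribute names "image-src" and "data-src" are guesses that need to be checked against the actual page markup.

```csharp
using System;

// Hypothetical helper: tries "src" first, then falls back to attributes that
// lazy-loading scripts commonly use. The fallback names are assumptions.
public static class ImageUrlPicker
{
    private static readonly string[] CandidateAttributes = { "src", "image-src", "data-src" };

    // getAttribute is expected to behave like HtmlElement.GetAttribute,
    // which returns an empty string when the attribute is missing.
    public static string Pick(Func<string, string> getAttribute)
    {
        foreach (string name in CandidateAttributes)
        {
            string value = getAttribute(name);
            if (!string.IsNullOrEmpty(value))
                return value;
        }
        return string.Empty;
    }
}
```

Inside the loop it could be called as NowImage = ImageUrlPicker.Pick(HJ.GetElementsByTagName("img")[0].GetAttribute); so the rest of the handler stays unchanged.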

Related

Get the latest html after ajax call in webbrowser control?

There are a lot of questions like this, but I was not able to find a solution to my problem.
I have a web page that makes an Ajax call after it loads, and that call fills a table with data; it takes maybe 2 seconds.
I want the data inside that table.
When I try to access the table through DocumentText, the table HTML is not there. It still holds the initial HTML that was loaded before the Ajax call.
webBrowser1.Update(); //Didn't work
Then I tried this, which didn't work either:
private void Timer_Tick(object sender, EventArgs e) //Interval of 5000
{
    if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
    {
        HtmlElement element = webBrowser1.Document.GetElementById("tableType3");
        if (element != null)
        {
            String webbrowsercontent = element.InnerHtml;
            timer.Stop();
        }
    }
}
Then I tried this, which also didn't work:
private void WaitTillPageLoadsCompletly(WebBrowser webBrControl)
{
    WebBrowserReadyState loadStatus;
    int waittime = 20000;
    int counter = 0;
    while (true)
    {
        loadStatus = webBrControl.ReadyState;
        Application.DoEvents();
        if ((counter > waittime) || (loadStatus == WebBrowserReadyState.Uninitialized) || (loadStatus == WebBrowserReadyState.Loading) || (loadStatus == WebBrowserReadyState.Interactive))
        {
            break;
        }
        counter++;
    }
    counter = 0;
    while (true)
    {
        loadStatus = webBrControl.ReadyState;
        Application.DoEvents();
        if (loadStatus == WebBrowserReadyState.Complete && webBrControl.IsBusy != true)
        {
            break;
        }
        counter++;
    }
}
While debugging I saw the table contents in WebBrowser1.Document.NativeHtmlDocument2, which can't be accessed because it belongs to an internal class.
Is there any other way to solve my problem?

Have you tried listening to the Ajax onpropertychange event?
I recently visited a website that shows how to handle an Ajax component's onpropertychange event in webBrowser1_DocumentCompleted.
Here is the code; I hope it leads you to a solution.
(The idea is to get the dynamic content that AJAX generates for webBrowser1.Document.GetElementById("abc") by attaching an onpropertychange handler in webBrowser1_DocumentCompleted.)
HTML Code
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title></title>
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
<script>
$.ajaxSetup({
cache: false
});
var aa = function() {
$.get("ajax.php", function(data) {
$("#abc").html(data);
});
};
$(function() {
aa();
setInterval(aa, 2000);
});
</script>
</head>
<body>
<div id="abc"></div>
</body>
</html>
ajax.php
<?php
echo date("H:i:s");
C# code
private void button1_Click(object sender, EventArgs e)
{
    webBrowser1.Navigate("http://127.0.0.1/test.html");
}
private void handlerAbc(Object sender, EventArgs e)
{
    HtmlElement elm = webBrowser1.Document.GetElementById("abc");
    if (elm == null) return;
    Console.WriteLine("elm.InnerHtml(handlerAbc):" + elm.InnerHtml);
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    /* Get the original HTML (method 1) */
    System.IO.StreamReader getReader = new System.IO.StreamReader(webBrowser1.DocumentStream, System.Text.Encoding.Default);
    string htmlA = getReader.ReadToEnd(); // htmlA can only extract the original HTML
    /* Get the original HTML (method 2) */
    string htmlB = webBrowser1.DocumentText; // htmlB can only extract the original HTML
    /* You can use the following method to extract the 'onChanged' AJAX content */
    HtmlElement elm = webBrowser1.Document.GetElementById("abc"); // Get "abc" element by ID
    if (elm != null)
    {
        // Print only after the null check so a missing element cannot throw
        Console.WriteLine("elm.InnerHtml(DocumentCompleted):" + elm.InnerHtml);
        // Listen on the 'abc' onpropertychange event
        elm.AttachEventHandler("onpropertychange", new EventHandler(handlerAbc));
    }
}
Result:
elm.InnerHtml(DocumentCompleted):
elm.InnerHtml(handlerAbc):06:32:36
elm.InnerHtml(handlerAbc):06:32:38
elm.InnerHtml(handlerAbc):06:32:40

I used OpenWebKitSharp to solve the problem of HTML content being rendered by JavaScript. If you can change the library, see this solution: Get final HTML content after javascript finished by Open Webkit Sharp
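
As an alternative to the busy-wait loops in the question, the waiting can be written as an asynchronous poll that gives up after a timeout. This is only a sketch: Poller and WaitUntilAsync are hypothetical names, and it assumes .NET 4.5 or later so that async/await and Task.Delay are available.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not part of the WebBrowser API.
public static class Poller
{
    // Re-checks 'condition' every 'interval' until it returns true or
    // 'timeout' elapses. Returns true if the condition was met, false on timeout.
    public static async Task<bool> WaitUntilAsync(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            await Task.Delay(interval); // yields to the message loop instead of blocking it
        }
    }
}

// Possible use inside an async DocumentCompleted handler:
//
// bool found = await Poller.WaitUntilAsync(
//     () => webBrowser1.Document != null
//           && webBrowser1.Document.GetElementById("tableType3") != null,
//     TimeSpan.FromSeconds(20), TimeSpan.FromMilliseconds(250));
```

When awaited from a WinForms event handler, the continuation resumes on the UI thread, so the condition can touch webBrowser1.Document without Application.DoEvents().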

Winforms Webbrowser control URL Validation

I am trying to validate a WinForms WebBrowser control's URL when a button is clicked. I would like to see whether the browser's current URL matches a certain URL. When I try to run this code, the program freezes.
private void button_Click(object sender, EventArgs e)
{
    // Check to see if web browser is at URL
    if (webBrowser1.Url.ToString != "www.google.com" || webBrowser1.Url.ToString == null)
    {
        // Goto webpage
        webBrowser1.Url = new Uri("www.google.ca");
    }
    else {
        webBrowser1.Document.GetElementById("first").InnerText = "blah";
        webBrowser1.Document.GetElementById("second").InnerText = "blah";
    }
}

Here you go.
private void button1_Click(object sender, EventArgs e)
{
    webBrowser1.Url = new Uri("https://www.google.ca");
    // Check to see if web browser is at URL
    if (webBrowser1.Url != null)
    {
        if (webBrowser1.Url.ToString() != "https://www.google.com" || webBrowser1.Url.ToString() == null)
        {
            // Goto webpage (the scheme is required, otherwise new Uri throws)
            webBrowser1.Url = new Uri("https://www.google.ca");
        }
        else
        {
            webBrowser1.Document.GetElementById("first").InnerText = "blah";
            webBrowser1.Document.GetElementById("second").InnerText = "blah";
        }
    }
}
1) Include the scheme (for example https://) in the URL; new Uri("www.google.ca") throws a UriFormatException.
2) ToString is a method, so call it with parentheses: ToString().
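
Comparing full URL strings is fragile, because the browser may add a trailing slash, a path, or a query string. A sketch of a host-based check is shown below; UrlCheck, IsAtHost and ParseOrNull are hypothetical names introduced only for illustration, while Uri.TryCreate, Uri.Scheme and Uri.Host are standard System.Uri members.

```csharp
using System;

// Hypothetical helper for comparing the browser's current URL by host.
public static class UrlCheck
{
    // True when 'current' is an absolute http(s) URL whose host equals
    // expectedHost, ignoring case. Null and about:blank give false.
    public static bool IsAtHost(Uri current, string expectedHost)
    {
        if (current == null || !current.IsAbsoluteUri) return false;
        if (current.Scheme != Uri.UriSchemeHttp && current.Scheme != Uri.UriSchemeHttps) return false;
        return string.Equals(current.Host, expectedHost, StringComparison.OrdinalIgnoreCase);
    }

    // Parses text into an absolute URI without throwing; returns null on failure,
    // for example when the scheme is missing.
    public static Uri ParseOrNull(string text)
    {
        Uri parsed;
        return Uri.TryCreate(text, UriKind.Absolute, out parsed) ? parsed : null;
    }
}

// Possible use in the click handler:
//
// if (!UrlCheck.IsAtHost(webBrowser1.Url, "www.google.com"))
//     webBrowser1.Navigate("https://www.google.com");
```

Note that the form fields can only be filled in after the navigation has finished, so the else branch of the original handler belongs in DocumentCompleted rather than directly after setting the URL.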

Better approach in C# to search data on a third party web site

Here's my requirement. There is a public website that takes an alphanumeric string as input and retrieves data into a table element (via a button click). The table element has a couple of labels that get populated with the corresponding data. I need a tool or solution that can check whether a particular string exists in the website's database and, if so, retrieve the Ids of all occurrences of that string. Looking at the page source (no JavaScript is used there), I noted the input element name and the button element name, and with the help of existing samples I was able to get a working solution. The code below works, but I want to know whether there is a better and faster approach. I know it has some issues, such as an infinite loop, but I am mainly looking for an alternative that can work quickly for a million records.
namespace SearchWebSite
{
    public partial class Form1 : Form
    {
        bool searched = false;
        long i;
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            i = 1;
            WebBrowser browser = new WebBrowser();
            string target = "http://www.SomePublicWebsite.com";
            browser.Navigate(target);
            browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(XYZ);
        }
        private void XYZ(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            WebBrowser b = null;
            if (searched == false)
            {
                b = (WebBrowser)sender;
                b.Document.GetElementById("txtId").InnerText = "M" + i.ToString();
                b.Document.GetElementById("btnSearch").InvokeMember("click");
                searched = true;
            }
            if (b.ReadyState == WebBrowserReadyState.Complete)
            {
                if (b.Document.GetElementById("lblName") != null)
                {
                    string IdNo = "M" + i.ToString();
                    string DateString = b.Document.GetElementById("lblDate").InnerHtml;
                    string NameString = b.Document.GetElementById("lblName").InnerHtml;
                    if (NameString != null && (NameString.Contains("XXXX") || NameString.Contains("xxxx")))
                    {
                        using (StreamWriter w = File.AppendText("log.txt"))
                        {
                            w.WriteLine("Id {0}, Date {1}, Name {2}", IdNo, DateString, NameString);
                            i = i + 1;
                            searched = false;
                        }
                    }
                    else
                    {
                        i = i + 1;
                        searched = false;
                    }
                }
                else
                {
                    i = i + 1;
                    searched = false;
                }
            }
        }
    }
}

If the page shown after the search button is clicked still contains the txtId and btnSearch controls, then you can use this code snippet. It is not faster, but I think it is the correct form.
public partial class Form1 : Form
{
    bool searched = false;
    long i = 1;
    private string IdNo { get { return "M" + i.ToString(); } }
    public Form1()
    {
        InitializeComponent();
    }
    private void button1_Click(object sender, EventArgs e)
    {
        i = 1;
        WebBrowser browser = new WebBrowser();
        string target = "http://www.SomePublicWebsite.com";
        browser.Navigate(target);
        browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(XYZ);
    }
    private void XYZ(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        WebBrowser b = (WebBrowser)sender;
        if (b.ReadyState == WebBrowserReadyState.Complete)
        {
            if (searched == false)
            {
                DoSearch(b); return;
            }
            if (b.Document.GetElementById("lblName") != null)
            {
                string DateString = b.Document.GetElementById("lblDate").InnerHtml;
                string NameString = b.Document.GetElementById("lblName").InnerHtml;
                if (NameString != null && (NameString.Contains("XXXX") || NameString.Contains("xxxx")))
                    using (StreamWriter w = File.AppendText("log.txt"))
                        w.WriteLine("Id {0}, Date {1}, Name {2}", IdNo, DateString, NameString);
            }
            i++;
            DoSearch(b);
        }
    }
    private void DoSearch(WebBrowser wb)
    {
        wb.Document.GetElementById("txtId").InnerText = IdNo;
        wb.Document.GetElementById("btnSearch").InvokeMember("click");
        searched = true;
    }
}

Navigate URLs using WebBrowser DocumentCompleted

This is the scenario:
1-Navigate to the admin page.
2-Enter the username and password.
3-Navigate to a new page.
4-Fill some text into textareas etc. and post.
5-Repeat steps 3 and 4 until the loop ends.
The code below successfully does steps 1 and 2, but it reaches step 3 before the new page is loaded and throws the error "Object reference not set to an instance of an object" on this line: doc.GetElementById("title").SetAttribute("value", "check1");
I have been trying to get this working for the last 3 days but still can't get past step 3. Any help will be appreciated.
bool AdminPagework = false;
bool postnavigationdone = false;
public Form1()
{
    InitializeComponent();
    webBrowser1.DocumentCompleted +=
        new WebBrowserDocumentCompletedEventHandler(AdminPageCredentials);
    webBrowser1.Navigate("www.website.com/admin");
}
private void AdminPageCredentials(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (AdminPagework == false && (webBrowser1.ReadyState == WebBrowserReadyState.Complete))
    {
        HtmlDocument doc = webBrowser1.Document;
        doc.GetElementById("login").SetAttribute("value", "ADMIN");
        doc.GetElementById("pass").SetAttribute("value", "123");
        doc.GetElementById("submit").InvokeMember("click");
        AdminPagework = true;
        webBrowser1.DocumentCompleted +=
            new WebBrowserDocumentCompletedEventHandler(RedirectToPostPage);
        webBrowser1.Navigate("http://www.website.com/admin/post.php");
    }
}
public void RedirectToPostPage(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if ((postnavigationdone == false) && (webBrowser1.ReadyState == WebBrowserReadyState.Complete))
    {
        HtmlDocument doc = webBrowser1.Document;
        doc.GetElementById("title").SetAttribute("value", "check1");
        doc.GetElementById("content").SetAttribute("value", textBox2.Text);
        doc.GetElementById("post-format-video").InvokeMember("click");
        doc.GetElementById("in-category-64").InvokeMember("click");
        webBrowser1.Document.GetElementById("mm").SetAttribute("value", "01");
        webBrowser1.Document.GetElementById("jj").SetAttribute("value", "01");
        webBrowser1.Document.GetElementById("aa").SetAttribute("value", "2013");
        webBrowser1.Document.GetElementById("hh").SetAttribute("value", "01");
        webBrowser1.Document.GetElementById("mm").SetAttribute("value", "01");
        doc.GetElementById("publish").InvokeMember("click");
        postnavigationdone = true;
    }
}

var titleElement = doc.GetElementById("title");
titleElement.SetAttribute("value","check1");
Try that and see whether the title element is found at all, since the most likely reason for the failure is that there is no element with the id "title" in the document that has just loaded.
I like using the ScrapySharp framework (you'll find it on NuGet) for web automation.
var titleNodes = doc.DocumentNode.CssSelect("div#title").ToList();
foreach(var titleNode in titleNodes)
{
titleNode.SetAttribute("value","check1");
}
By the way, why would you change this attribute anyway? Just curious...
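
Another thing worth checking in the question's code: both handlers stay attached to DocumentCompleted, and Navigate is called right after the login click, so RedirectToPostPage can run against a page that is not the post page. One way to keep the steps in order is a single handler that asks a small state tracker what to do with the page that just loaded. The sketch below is WinForms-independent; FlowStep and NavigationFlow are hypothetical names, and the mapping of steps to pages is an assumption based on the scenario described in the question.

```csharp
using System;

// Which action to perform on the page that has just finished loading.
public enum FlowStep { Login, OpenPostPage, FillPost, Done }

// Hypothetical state tracker; call Next() once per completed page load.
public sealed class NavigationFlow
{
    private int remainingPosts;

    public FlowStep Current { get; private set; }

    public NavigationFlow(int postCount)
    {
        remainingPosts = postCount;
        Current = FlowStep.Login;
    }

    // Returns the step to run now and advances to the step for the next page load.
    public FlowStep Next()
    {
        FlowStep toRun = Current;
        switch (Current)
        {
            case FlowStep.Login:        // admin page loaded: fill credentials, click submit
                Current = remainingPosts > 0 ? FlowStep.OpenPostPage : FlowStep.Done;
                break;
            case FlowStep.OpenPostPage: // previous submit finished: navigate to the post page
                Current = FlowStep.FillPost;
                break;
            case FlowStep.FillPost:     // post page loaded: fill fields, click publish
                remainingPosts--;
                Current = remainingPosts > 0 ? FlowStep.OpenPostPage : FlowStep.Done;
                break;
        }
        return toRun;
    }
}
```

In the single DocumentCompleted handler, a switch on flow.Next() would then run the login code, the Navigate call, or the form-filling code. Because DocumentCompleted can also fire for frames, it is probably safer to ignore events where e.Url differs from webBrowser1.Url before calling Next().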

C# - Get variable from webbrowser generated by javascript

I have downloaded a page with WebBrowser and need to get an email address from it, but the address is generated by JavaScript. In the page source I can find this script:
<script type="text/javascript" charset="utf-8">var i='ma'+'il'+'to';var a='impexta#impexta.sk';document.write(''+a+'');</script>
Everything I have read explains how to invoke a script function, but I don't know its name. What I want is the value of the variable "a".
EDIT: Code before:
...
WebBrowser wb = new WebBrowser();
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
wb.Navigate(url);
for (; wb.ReadyState != WebBrowserReadyState.Complete; )
{
    System.Windows.Forms.Application.DoEvents();
}
...
void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser wb = sender as WebBrowser;
    if (wb != null)
    {
        if (wb.ReadyState == WebBrowserReadyState.Complete)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(wb.DocumentStream);
        }
    }
}

I found an easy solution: just find the right part of the string in the HTML code:
foreach (HtmlNode link in root.SelectNodes("//script"))
{
    if (link.InnerText.Contains("+a+"))
    {
        string[] strs = new string[] { "var a='", "';document.write" };
        strs = link.InnerText.Split(strs, StringSplitOptions.None);
        outMail = System.Net.WebUtility.HtmlDecode(strs[1]);
        if (outMail != "")
        {
            break;
        }
    }
}
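
The same idea can be written with a regular expression, which avoids depending on the exact text that follows the assignment. This is a sketch under the assumption that the value is a single-quoted string literal without embedded quotes; ScriptVariable.Extract is a hypothetical helper, and the sample address in the usage comment is made up.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for pulling a string literal out of inline script text.
public static class ScriptVariable
{
    // Returns the HTML-decoded value assigned to a single-quoted JavaScript
    // variable, or null when no such assignment is found.
    public static string Extract(string scriptText, string variableName)
    {
        string pattern = @"\bvar\s+" + Regex.Escape(variableName) + @"\s*=\s*'([^']*)'";
        Match match = Regex.Match(scriptText, pattern);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[1].Value) : null;
    }
}

// Possible use with a made-up script body:
//
// string script = "var i='ma'+'il'+'to';var a='user&#64;example.com';document.write(a);";
// string mail = ScriptVariable.Extract(script, "a"); // "user@example.com"
```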
