Converting a HtmlAgilityPack.HtmlNode to a string - c#

private void button2_Click(object sender, EventArgs e)
{
MessageBox.Show("In development", "Error", MessageBoxButtons.OK);
HtmlAgilityPack.HtmlWeb hw = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load("https://www.stackoverflow.com");
foreach (HtmlAgilityPack.HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
usercon(link);
}
}
public void usercon(string toprint)
{
richTextBox1.Text += "\r\n";
richTextBox1.Text += toprint;
//richTextBox1.
}
I need to convert link to a string so that it can be passed to the function usercon.
This is my first time using the HtmlAgilityPack.

According to the source code found here:
https://htmlagilitypack.codeplex.com/SourceControl/latest#Release/1_4_0/HtmlAgilityPack/HtmlNode.cs
HtmlNode has an OuterHtml property, which returns the node's markup as a string. The source is also on GitHub.
See also the newer documentation: http://html-agility-pack.net/outer-html
private void button2_Click(object sender, EventArgs e)
{
MessageBox.Show("In development", "Error", MessageBoxButtons.OK);
HtmlAgilityPack.HtmlWeb hw = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = hw.Load("https://www.stackoverflow.com");
foreach (HtmlAgilityPack.HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
usercon(link.OuterHtml);
}
}
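To make the difference between the string-valued members concrete, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the inline markup and the class name are made up so that it runs without network access. Note that SelectNodes returns null rather than an empty collection when nothing matches, so a guard before the loop is worth having.

```csharp
using System;
using HtmlAgilityPack;

class OuterHtmlDemo
{
    static void Main()
    {
        // Hypothetical markup, used instead of a live page.
        var doc = new HtmlDocument();
        doc.LoadHtml("<p><a href=\"/questions\">Questions</a> <a>no href</a></p>");

        // Only anchors that carry an href attribute are selected.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return; // no match: SelectNodes gives null, not an empty list
        }

        foreach (HtmlNode link in links)
        {
            Console.WriteLine(link.OuterHtml);                     // the element including its tags
            Console.WriteLine(link.InnerText);                     // only the text inside the element
            Console.WriteLine(link.GetAttributeValue("href", "")); // only the link target
        }
    }
}
```

Which of the three to pass to usercon depends on what should appear in the text box: the raw markup, the visible link text, or the URL.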

Related

How do I scrape web content async?

Here is what I have tried so far. It works, but the form freezes every time it updates.
private void timer1_Tick(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.roblox.com/catalog/527365852/Dominus-Praefectus");
foreach (var item in doc.DocumentNode.SelectNodes("//*[@id='item-details']/div[1]/div[1]/div[2]/div/span[2]"))
{
textBox1.Text = item.InnerText;
}
}
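One possible way to keep the form responsive is to move the blocking Load call off the UI thread and await it. This is only a sketch under assumptions: the form class, the 5-second interval, and the error handling are invented for illustration, and timer1 is assumed to be a System.Windows.Forms.Timer so that the code after the await resumes on the UI thread.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public class ScrapeForm : Form
{
    private readonly TextBox textBox1 = new TextBox { Dock = DockStyle.Top };
    // Hypothetical interval; fully qualified to avoid clashing with System.Threading.Timer.
    private readonly System.Windows.Forms.Timer timer1 =
        new System.Windows.Forms.Timer { Interval = 5000 };

    public ScrapeForm()
    {
        Controls.Add(textBox1);
        timer1.Tick += timer1_Tick;
        timer1.Start();
    }

    private async void timer1_Tick(object sender, EventArgs e)
    {
        timer1.Stop(); // do not start a second request while one is still running
        try
        {
            var web = new HtmlAgilityPack.HtmlWeb();
            // The download and parse run on a thread-pool thread, not the UI thread.
            var doc = await Task.Run(() =>
                web.Load("https://www.roblox.com/catalog/527365852/Dominus-Praefectus"));

            var nodes = doc.DocumentNode.SelectNodes(
                "//*[@id='item-details']/div[1]/div[1]/div[2]/div/span[2]");
            if (nodes != null) // SelectNodes returns null when nothing matches
            {
                foreach (var item in nodes)
                {
                    textBox1.Text = item.InnerText; // back on the UI thread here
                }
            }
        }
        catch (Exception ex)
        {
            textBox1.Text = "Error: " + ex.Message;
        }
        finally
        {
            timer1.Start();
        }
    }

    [STAThread]
    static void Main() => Application.Run(new ScrapeForm());
}
```

Stopping the timer during the request is a design choice, not a requirement; it just prevents requests from piling up when the site is slow.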

Unable to attribute button to XSLT Transformation (C#)

I am attempting to create an interface that lets me select three XSLT files, merge them, and then apply the merged stylesheet to an XML file. I already have working transformation code and am now creating a user interface for it.
Transformation code:
public static void Transform(string sXmlPath, string sXslPathBody, string sXslPathHead, string sXslPathFoot, string sXslPathMerged)
{
try
{
XNamespace ns = "http://www.w3.org/1999/XSL/Transform";
//load Xml
XPathDocument myXPathDoc = new XPathDocument(sXmlPath);
//Load Body
XElement xslt = XElement.Load(sXslPathBody);
//Add Code To Body
xslt.AddFirst(new XElement(ns + "include", new XAttribute("href", sXslPathFoot)));
xslt.AddFirst(new XElement(ns + "include", new XAttribute("href", sXslPathHead)));
XElement body = xslt.Descendants("body").Single();
body.AddFirst(new XElement(ns + "call-template", new XAttribute("name", "Header")));
body.Add(new XElement(ns + "call-template", new XAttribute("name", "Footer")));
//Save Combined File
//XElement.Save("c:\temp.xlst");
xslt.Save(sXslPathMerged);
XslCompiledTransform myXslTrans = new XslCompiledTransform();
//load Combined File
myXslTrans.Load(sXslPathMerged);
//Merge XML with Combined File
XmlTextWriter myWriter = new XmlTextWriter ("result.html", null);
//transform Xml
myXslTrans.Transform(myXPathDoc, null, myWriter);
myWriter.Close();
}
catch (Exception e)
{
Console.WriteLine("Exception: {0}", e.ToString());
}
}
The transformation code works fine. However, I am trying to tie each stage to a button so that I can select any three XSLT files I wish, as well as any XML file, and then use another button to merge the three XSLT files and create an HTML output.
Code:
public partial class Form1 : Form
{
String HeadFileSelected, BodyFileSelected, FootFileSelected, XmlFileSelected, MergeFiles;
public Form1()
{
InitializeComponent();
}
//Head
private void openFileHead_FileOk(object sender, CancelEventArgs e)
{
}
private void header_Click(object sender, System.EventArgs e)
{
if (openFileHead.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
HeadFileSelected = openFileHead.FileName;
string filename = HeadFileSelected;
textBox2.Text = HeadFileSelected;
}
}
//Body
private void openFileBody_FileOk(object sender, CancelEventArgs e)
{
}
private void body_Click(object sender, System.EventArgs e)
{
if (openFileBody.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
BodyFileSelected = openFileBody.FileName;
string filename = BodyFileSelected;
textBox3.Text = BodyFileSelected;
}
}
//Foot
private void openFileFooter_FileOk(object sender, CancelEventArgs e)
{
}
private void footer_Click(object sender, System.EventArgs e)
{
if (openFileFooter.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
FootFileSelected = openFileFooter.FileName;
string filename = FootFileSelected;
textBox4.Text = FootFileSelected;
}
}
//Xml
private void openFileXml_FileOk(object sender, CancelEventArgs e)
{
}
private void xml_Click(object sender, System.EventArgs e)
{
if (openFileXml.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
XmlFileSelected = openFileXml.FileName;
}
}
//Merge
private void openFileMerge_FileOk(object sender, CancelEventArgs e)
{
}
private void merge_Click(object sender, EventArgs e)
{
//XmlTransformUtil objXMLTrans = new XmlTransformUtil();
XmlTransformUtil.Transform(XmlFileSelected, BodyFileSelected, HeadFileSelected, FootFileSelected, MergeFiles);
}
}
I have managed to get the XML selection and the three XSLT selections to work fine. Only the Merge button is not working, because I cannot supply the String MergeFiles as the sXslPathMerged argument of Transform.
I understand why it does not work, but I don't know the solution that will give me the result I need.
I managed to figure it out: MergeFiles has to be the path where the merged XSLT is saved, and the save was not going anywhere because the variable was null. I fixed it by adding the following to my form:
MergeFiles = "C:\\Users\\user\\Desktop\\TEMP.xslt";
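If the hard-coded desktop path is a concern, one alternative is to build the path from the current user's temp folder. This is only a sketch; the file name merged.xslt and the class name are assumptions, not part of the original code.

```csharp
using System;
using System.IO;

class MergePathDemo
{
    static void Main()
    {
        // Path.GetTempPath() returns the temp folder of the current user,
        // so the result does not depend on a specific user name or desktop.
        string mergeFiles = Path.Combine(Path.GetTempPath(), "merged.xslt"); // hypothetical file name
        Console.WriteLine(mergeFiles);
    }
}
```

In the form, the same expression could be assigned to MergeFiles before Transform is called.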

C# Web Browser control only loads one page, will not work on the second attempt

I have urls in a listbox. I am trying to navigate to a url when it is selected.
private void lstURL_SelectedIndexChanged(object sender, EventArgs e)
{
wbrBrowser.Navigate(lstURL.Text);
lblUrl.Text = lstURL.Text;
lblTitle.Text = "Loading...";
System.Windows.Forms.HtmlDocument document = wbrBrowser.Document;
document.MouseUp += new HtmlElementEventHandler(this.htmlDocument_Click);
}
private void wbrBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
lblTitle.Text = wbrBrowser.Document.Title;
}
private void htmlDocument_Click(object sender, HtmlElementEventArgs e)
{
HtmlElement element = this.wbrBrowser.Document.GetElementFromPoint(e.ClientMousePosition);
var savedId = element.Id;
var uniqueId = Guid.NewGuid().ToString();
element.Id = uniqueId;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(element.Document.GetElementsByTagName("html")[0].OuterHtml);
element.Id = savedId;
var node = doc.GetElementbyId(uniqueId);
var xpath = node.XPath;
lblXpath.Text = xpath;
}
It works the first time I load a page; after that it just freezes, and lblTitle.Text stays at "Loading...".
I have been searching for a while but I can't figure out why this is happening.

Trying to log in to a website through a C# program

I'm new to C#. I looked for this topic in other questions, but they didn't fit my case. I am trying to log in to my school's servers from a C# program that I am writing. I know the source of the page, so I am using the C# WebBrowser control to navigate to it. Then I just want to write the name and password into the input boxes, and this is where I am stuck. Can you please give me any advice?
If you want to look at the page: https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin
Thanks for your advice.
Here is how I used the code (Edit: I added an event handler, but this is my first time using one, and it prompts me with "Object reference not set to an instance of an object"):
private void buttonGo_Click(object sender, EventArgs e)
{
try
{
string input = "https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin";
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(WebBrowser_DocumentCompleted);
webBrowser1.Navigate(input);
HtmlDocument doc = webBrowser1.Document;
HtmlElement userName = doc.GetElementById("UserID");
HtmlElement pass = doc.GetElementById("PIN");
HtmlElement submit = doc.GetElementById("Login");
userName.SetAttribute("value", textID.Text);
pass.SetAttribute("value", textPASS.Text);
submit.InvokeMember("Click");
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
public void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var webBrowser = sender as WebBrowser;
webBrowser.DocumentCompleted -= WebBrowser_DocumentCompleted;
MessageBox.Show(webBrowser.Url.ToString());
}
}
}
I finally solved the problem. I cheated a little, but it works. Here is the working code:
private void buttonGo_Click(object sender, EventArgs e)
{
try
{
string input = "https://suis.sabanciuniv.edu/prod/twbkwbis.P_SabanciLogin";
webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(WebBrowser_DocumentCompleted);
webBrowser1.Navigate(input);
HtmlDocument doc = webBrowser1.Document;
//HtmlElement userName = doc.GetElementById("UserID"); // These lookups did not work because the elements' IDs were hidden; they are kept here to show what failed.
//HtmlElement pass = doc.GetElementById("password");
HtmlElement submit = webBrowser1.Document.Forms[0].Document.All["PIN"].Parent.Parent.Parent.NextSibling.FirstChild;
//userName.SetAttribute("value", textID.Text);
//pass.SetAttribute("value", textPASS.Text);
webBrowser1.Document.Forms[0].All["UserID"].SetAttribute("value", textID.Text);
webBrowser1.Document.Forms[0].All["PIN"].FirstChild.SetAttribute("value", textPASS.Text);
submit.InvokeMember("Click");
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
public void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var webBrowser = sender as WebBrowser;
webBrowser.DocumentCompleted -= WebBrowser_DocumentCompleted;
MessageBox.Show(webBrowser.Url.ToString());
}
You first need to find the input boxes of the username and password fields, either by ID or as nodes. Then set them like this:
HtmlDocument doc = webBrowser1.Document;
HtmlElement email = doc.GetElementById("email");
HtmlElement pass = doc.GetElementById("pass");
HtmlElement submit = doc.GetElementById("LoginButton");
email.SetAttribute("value", "InsertYourEmailHere");
//Same for password
submit.InvokeMember("Click");
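Navigate returns before the page has loaded, so reading webBrowser1.Document immediately afterwards can yield missing elements, which is a likely source of the "Object reference not set" error. A minimal sketch that fills the fields only once the page has finished loading might look like this. The URL, the element IDs, the form class, and the placeholder values are assumptions for illustration.

```csharp
using System;
using System.Windows.Forms;

public class LoginForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };

    public LoginForm()
    {
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnLoginPageLoaded;
        webBrowser1.Navigate("https://example.com/login"); // hypothetical URL
    }

    private void OnLoginPageLoaded(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Unsubscribe so that the page shown after the login does not submit again.
        webBrowser1.DocumentCompleted -= OnLoginPageLoaded;

        HtmlDocument doc = webBrowser1.Document;
        HtmlElement email = doc.GetElementById("email");        // hypothetical IDs
        HtmlElement pass = doc.GetElementById("pass");
        HtmlElement submit = doc.GetElementById("LoginButton");
        if (email == null || pass == null || submit == null)
        {
            return; // the expected elements are not on this page
        }

        email.SetAttribute("value", "InsertYourEmailHere");
        pass.SetAttribute("value", "InsertYourPasswordHere");
        submit.InvokeMember("click");
    }

    [STAThread]
    static void Main() => Application.Run(new LoginForm());
}
```

On pages with frames, DocumentCompleted can fire more than once, so a real program may need to compare e.Url with the browser's Url before unsubscribing.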

C# stopping an infinite foreach loop

This foreach loop checks a webpage, sees if there are any images, and then downloads them. How do I stop it? When I press the button, it continues the loop forever.
private void button1_Click(object sender, EventArgs e)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted +=browser_DocumentCompleted;
browser.Navigate(textBox1.Text);
}
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
HtmlElementCollection imgCollection = browser.Document.GetElementsByTagName("img");
WebClient webClient = new WebClient();
int count = 0; //if available
int maximumCount = imgCollection.Count;
try
{
foreach (HtmlElement img in imgCollection)
{
string url = img.GetAttribute("src");
webClient.DownloadFile(url, url.Substring(url.LastIndexOf('/')));
count++;
if(count >= maximumCount)
break;
}
}
catch { MessageBox.Show("errr"); }
}
Use the break keyword to break out of a loop.
You do not have an infinite loop; you have an exception that is being thrown because of how you are writing the file to disk.
private void button1_Click(object sender, EventArgs e)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += browser_DocumentCompleted;
browser.Navigate("www.google.ca");
}
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
HtmlElementCollection imgCollection = browser.Document.GetElementsByTagName("img");
WebClient webClient = new WebClient();
foreach (HtmlElement img in imgCollection)
{
string url = img.GetAttribute("src");
string name = System.IO.Path.GetFileName(url);
string path = System.IO.Path.Combine(Environment.CurrentDirectory, name);
webClient.DownloadFile(url, path);
}
}
That code works fine in my environment. The issue you seemed to be having was that you were setting the DownloadFile file path to a value like '\myimage.png', and the WebClient could not find that path, so it threw an exception.
The code above saves the file into the current directory under its original file name, including the extension.
Maybe the browser.DocumentCompleted event causes the error: if the page refreshes, the event is fired again. You could try unregistering the handler:
void browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser browser = sender as WebBrowser;
browser.DocumentCompleted -= browser_DocumentCompleted;
HtmlElementCollection imgCollection = browser.Document.GetElementsByTagName("img");
WebClient webClient = new WebClient();
foreach (HtmlElement img in imgCollection)
{
string url = img.GetAttribute("src");
string name = System.IO.Path.GetFileName(url);
string path = System.IO.Path.Combine(Environment.CurrentDirectory, name);
webClient.DownloadFile(url, path);
}
}
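The path problem can be seen without a browser. The sketch below compares the Substring approach from the question with a file name taken from the parsed URL. The sample URL, the helper name LocalPathFor, and the fallback name image.bin are made up, and the helper assumes the src value is an absolute URL.

```csharp
using System;
using System.IO;

class FileNameDemo
{
    // Builds a local target path for an image URL (hypothetical helper).
    static string LocalPathFor(string url, string folder)
    {
        // Uri.LocalPath holds only the path part, so a "?query" suffix is dropped.
        string name = Path.GetFileName(new Uri(url).LocalPath);
        if (string.IsNullOrEmpty(name))
        {
            name = "image.bin"; // hypothetical fallback for URLs ending in '/'
        }
        return Path.Combine(folder, name);
    }

    static void Main()
    {
        string url = "https://example.com/img/logo.png?v=2"; // made-up URL

        // The question's approach keeps the leading slash and the query string.
        Console.WriteLine(url.Substring(url.LastIndexOf('/'))); // prints "/logo.png?v=2"

        // Parsed approach: a plain file name inside a folder that exists.
        Console.WriteLine(LocalPathFor(url, Environment.CurrentDirectory));
    }
}
```

Taking the name from Uri.LocalPath rather than from the raw string is a small extension of the Path.GetFileName idea in the answers, so that query strings do not end up in the file name.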
