System.ArgumentNullException when trying to access span with Xpath (C#) - c#

So i've been trying to get a program working where I get info from google finance regarding different stock stats. So far I have not been able to get information out of spans. As of now I have hardcoded direct access to the apple stock.
Link to Apple stock: https://www.google.com/finance?q=NASDAQ%3AAAPL&ei=NgItWIG1GIftsAHCn4zIAg
What i can't understand is that I receive correct output when I trying it in the chrome console with the following command:
$x("//*[#id=\"appbar\"]//div//div//div//span");
This is my current code in Visual studio 2015 with Html Agility Pack installed(I suspect a fault in currDocNodeCompanyName):
class StockDataAccess
{
HtmlWeb web= new HtmlWeb();
private List<string> testList;
public void FindStock()
{
var histDoc = web.Load("https://www.google.com/finance/historical?q=NASDAQ%3AAAPL&ei=q9IsWNm4KZXjsAG-4I7oCA.html");
var histDocNode = histDoc.DocumentNode.SelectNodes("//*[#id=\"prices\"]//table//tr//td");
var currDoc = web.Load("https://www.google.com/finance?q=NASDAQ%3AAAPL&ei=CdcsWMjNCIe0swGd3oaYBA.html");
var currDocNodeCurrency = currDoc.DocumentNode.SelectNodes("//*[#id=\"ref_22144_elt\"]//div//div");
var currDocNodeCompanyName = currDoc.DocumentNode.SelectNodes("//*[#id=\"appbar\"]//div//div//div//span");
var histDocText = histDocNode.Select(node => node.InnerText);
var currDocCurrencyText = currDocNodeCurrency.Select(node => node.InnerText);
var currDocCompanyName = currDocNodeCompanyName.Select(node => node.InnerText);
List<String> result = new List<string>(histDocText.Take(6));
result.Add(currDocCurrencyText.First());
result.Add(currDocCompanyName.Take(2).ToString());
testList = result;
}
public List<String> ReturnStock()
{
return testList;
}
}
I have been trying the Xpath expression [text] and received an output that i can work with when using the chrome console but not in VS. I have also been experimenting with a foreach-loop, a few suggested it to others.
class StockDataAccess
{
HtmlWeb web= new HtmlWeb();
private List<string> testList;
public void FindStock()
{
///same as before
var currDoc = web.Load("https://www.google.com/finance?q=NASDAQ%3AAAPL&ei=CdcsWMjNCIe0swGd3oaYBA.html");
HtmlNodeCollection currDocNodeCompanyName = currDoc.DocumentNode.SelectNodes("//*[#id=\"appbar\"]//div//div//div//span");
///Same as before
List <string> blaList = new List<string>();
foreach (HtmlNode x in currDocNodeCompanyName)
{
blaList.Add(x.InnerText);
}
List<String> result = new List<string>(histDocText.Take(6));
result.Add(currDocCurrencyText.First());
result.Add(blaList[1]);
result.Add(blaList[2]);
testList = result;
}
public List<String> ReturnStock()
{
return testList;
}
}
I would really appreciate if anyone could point me in the right direction.

If you check the contents of currDoc.DocumentNode.InnerHtml you will notice that there is no element with the id "appbar", therefore the result is correct, since the xpath doesn't return anything.
I suspect that the html element you're trying to find is generated by a script (js for example), and that explains why you can see it on the browser and not on the HtmlDocument object, since HtmlAgilityPack does not render scripts, it only download and parse the raw source code.

Related

Looping through HtmlNodes and collecting data gives me the same result every time

I have an async method which calls a mapper for turning HTML string into an IEnumerable:
public async Task<IEnumerable<MovieRatingScrape>> GetMovieRatingsAsync(string username, int page)
{
var response = await _httpClient.GetAsync($"/betyg/{username}?p={page}");
response.EnsureSuccessStatusCode();
var html = await response.Content.ReadAsStringAsync();
return new MovieRatingsHtmlMapper().Map(html);
}
...
public class MovieRatingsHtmlMapper : HtmlMapperBase<IEnumerable<MovieRatingScrape>>
{
// In reality, this method belongs to base class with signature T Map(string html)
public IEnumerable<MovieRatingScrape> Map(string html)
{
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
return Map(htmlDocument);
}
public override IEnumerable<MovieRatingScrape> Map(HtmlDocument item)
{
var movieRatings = new List<MovieRatingScrape>();
var nodes = item.DocumentNode.SelectNodes("//table[#class='list']/tr");
foreach (var node in nodes)
{
var title = node.SelectSingleNode("//td[1]/a")?.InnerText;
movieRatings.Add(new MovieRatingScrape
{
Date = DateTime.Parse(node.SelectSingleNode("//td[2]")?.InnerText),
Slug = node.SelectSingleNode("//td[1]/a[starts-with(#href, '/film/')]")?
.GetAttributeValue("href", null)?
.Replace("/film/", string.Empty),
SwedishTitle = title,
Rating = node.SelectNodes($"//td[3]/i[{XPathHasClass("fa-star")}]").Count
});
}
return movieRatings;
}
}
The resulting list movieRatings contains copies of the same object, but when I look at the HTML and when I debug and view the HtmlNode node they differ as they are supposed to.
Either I'm blind to something really obvious, or I am hitting some async issue which I do not grasp. Any ideas? I should be getting 50 unique objects out of this call, now I am only getting the first 50 times.
Thank you in advance, Viktor.
Edit: Adding some screenshots to show my predicament. Look at locals InnerHtml (node) and title for item 1 and 2 of the foreach loop.
Edit 2: Managed to reproduce on .NET Fiddle: https://dotnetfiddle.net/A2I4CQ
You need to use .// and not //
Here is the fixed Fiddle: https://dotnetfiddle.net/dZkSRN
// will search anywhere in the document
.// will search anywhere in the current node
i am not super sure how to describe this but your issue is here (i think)
//table[#class='list']/tr"
specifically the //
I experienced the same thing while looking for a span. i had to use something similar
var nodes = htmlDoc.DocumentNode.SelectNodes("//li[#class='itemRow productItemWrapper']");
foreach(HtmlNode node in nodes)
{
var nodeDoc = new HtmlDocument();
nodeDoc.LoadHtml(node.InnerHtml);
string name = nodeDoc.DocumentNode.SelectSingleNode("//span[#class='productDetailTitle']").InnerText;
}

Extent Reports version 3.0.2 - AppendExisting

Below is the code I am trying to use to append all tests to a single report. However, latest test is replacing all the older test reports. So, it's not appending to a single report for some reason. Can you please help me out here?
var htmlReporter = new ExtentHtmlReporter(ResourcesConfig.ReportPath);
extent = new ExtentReports();
extent.AttachReporter(htmlReporter);
htmlReporter.LoadConfig(ResourcesConfig.ReportXMLPath);
**htmlReporter.AppendExisting = true;**
I had a lot of trouble with this as well as the documentation doesn't explain much. I have one method called ReportCreation which runs for every test case and in that method i have the following:
public static ExtentReports ReportCreation(){
System.out.println(extent);
if (extent == null) {
extent = new ExtentReports();
htmlReports = new ExtentHtmlReporter(fileName+ n + "\\extentReportFile.html");
htmlReports.config().setReportName("Pre release Smoke test");
htmlReports.config().setTheme(Theme.STANDARD);
htmlReports.config().setTestViewChartLocation(ChartLocation.BOTTOM);
extent.attachReporter(htmlReports);
}
else {
htmlReports = new ExtentHtmlReporter(fileName+ n+ "\\extentReportFile.html");
htmlReports.setAppendExisting(true);
extent.attachReporter(htmlReports);
}
return extent;
}
So when the first unit test is run, it will create the html report, but the second unit test will see that the report has already been generated and so use the existing one.
I have created a random number generator so that it goes to a different report on every run
public static Random rand = new Random();
public static int n = rand.nextInt(10000)+1;
I was facing the same issue. My solution was using .NET Core so ExtentReports 3 and 4 were not supported.
Instead, I wrote code to merge the results from previous html file to the new html file.
This is the code I used:
public static void GenerateReport()
{
// Publish test results to extentnew.html file
extent.Flush();
if (!File.Exists(extentConsolidated))
{
// Rename extentnew.html to extentconsolidated.html after execution of 1st batch
File.Move(extentLatest, extentConsolidated);
}
else
{
// Append test results to extentconsolidated.html from 2nd batch onwards
_ = AppendExtentHtml();
}
}
public static async Task AppendExtentHtml()
{
var htmlconsolidated = File.ReadAllText(extentConsolidated);
var htmlnew = File.ReadAllText(extentLatest);
var config = Configuration.Default;
var context = BrowsingContext.New(config);
var newdoc = await context.OpenAsync(req => req.Content(htmlnew));
var newlis = newdoc.QuerySelector(#"ul.test-list-item");
var consolidateddoc = await context.OpenAsync(req => req.Content(htmlconsolidated));
var consolidatedlis = consolidateddoc.QuerySelector(#"ul.test-list-item");
foreach (var li in newlis.Children)
{
li.RemoveFromParent();
consolidatedlis.AppendElement(li);
}
File.WriteAllText(extentConsolidated, consolidateddoc.DocumentElement.OuterHtml);
}
This logic bypasses any Extent Report reference and treats the result file as any other html.
Hope this helps.

Parsing Nodes with HTML AgilityPack

I'm trying to get information from that page : http://www.wowhead.com/transmog-sets?filter=3;5;0#transmog-sets
rows look like this when inspecting elements :
I've tried this code but it return me null every time on any nodes:
public class ItemSetsTransmog
{
public string ItemSetName { get; set; }
public string ItemSetId { get; set; }
}
public partial class Fmain : Form
{
DataTable Table;
HtmlWeb web = new HtmlWeb();
public Fmain()
{
InitializeComponent();
initializeItemSetTransmogTable();
}
private async void Fmain_Load(object sender, EventArgs e)
{
int PageNum = 0;
var itemsets = await ItemSetTransmogFromPage(0);
while (itemsets.Count > 0)
{
foreach (var itemset in itemsets)
Table.Rows.Add(itemset.ItemSetName, itemset.ItemSetId);
itemsets = await ItemSetTransmogFromPage(PageNum++);
}
}
private async Task<List<ItemSetsTransmog>> ItemSetTransmogFromPage(int PageNum)
{
String url = "http://www.wowhead.com/transmog-sets?filter=3;5;0#transmog-sets";
if (PageNum != 0)
url = "http://www.wowhead.com/transmog-sets?filter=3;5;0#transmog-sets:75+" + PageNum.ToString();
var doc = await Task.Factory.StartNew(() => web.Load(url));
var NameNodes = doc.DocumentNode.SelectNodes("//*[#id=\"tab - transmog - sets\"]//div//table//tr//td//div//a");
var IdNodes = doc.DocumentNode.SelectNodes("//*[#id=\"tab - transmog - sets\"]//div//table//tr//td//div//a");
// if these are null it means the name/score nodes couldn't be found on the html page
if (NameNodes == null || IdNodes == null)
return new List<ItemSetsTransmog>();
var ItemSetNames = NameNodes.Select(node => node.InnerText);
var ItemSetIds = IdNodes.Select(node => node.InnerText);
return ItemSetNames.Zip(ItemSetIds, (name, id) => new ItemSetsTransmog() { ItemSetName = name, ItemSetId = id }).ToList();
}
private void initializeItemSetTransmogTable()
{
Table = new DataTable("ItemSetTransmogTable");
Table.Columns.Add("ItemSetName", typeof(string));
Table.Columns.Add("ItemSetId", typeof(string));
ItemSetTransmogDataView.DataSource = Table;
}
}
}
why does my script doesn't load any of theses nodes ? how can i fix it ?
Your code does not load these nodes because they do not exist in the HTML that is pulled back by HTML Agility Pack. This is probably because a large majority of the markup you have shown is generated by JavaScript. Just try inspecting the doc.ParsedText property in your ItemSetTransmogFromPage() method.
Html Agility Pack is an HTTP Client/Parser, it will not run scripts. If you really need to get the data using this process then you will need to use a "headless browser" such as Optimus to retrieve the page (caveat: I have not used this library, though a nuget package appears to exist) and then probably use HTML Agility Pack to parse/query the markup.
The other alternative might be to try to parse the JSON that exists on this page (if this provides you with the data that you need, although this appears unlikely).
Small note - I think the id in you xpath should be "tab-transmog-sets" instead of "tab - transmog - sets"

Editing XML node inner text only if has specified value

Hello i need your super help.
Im not soo skilled in C# and i stack for about 6 hours on this. So please if anyone know help me . Thx
I have Xml like this
<COREBASE>
<AGENT>
<AGENT_INDEX>1</AGENT_INDEX>
<AGENT_PORTER_INDEX>
</AGENT_PORTER_INDEX>
<AGENT_NAME>John</AGENT_NAME>
<AGENT_SURNAME>Smith</AGENT_SURNAME>
<AGENT_MOBILE_NUMBER>777777777</AGENT_MOBILE_NUMBER>
</AGENT>
<AGENT>
<AGENT_INDEX>2</AGENT_INDEX>
<AGENT_PORTER_INDEX>1
</AGENT_PORTER_INDEX>
<AGENT_NAME>Charles</AGENT_NAME>
<AGENT_SURNAME>Bukowski</AGENT_SURNAME>
<AGENT_MOBILE_NUMBER>99999999</AGENT_MOBILE_NUMBER>
</AGENT>
</COREBASE>
And I need to select agent by index in windows forms combo box and than edit and save his attributes to xml. I found how to edit and save it but i dont know why but its saved to the first agent and overwrite his attributes in XML but not in the selected one.. :-(
Plese i will be glad for any help
private void buttonEditAgent_Click(object sender, EventArgs e)
{
XmlDocument AgentBaseEdit = new XmlDocument();
AgentBaseEdit.Load("AgentBase.xml");
XDocument AgentBase = XDocument.Load("AgentBase.xml");
var all = from a in AgentBase.Descendants("AGENT")
select new
{
agentI = a.Element("AGENT_INDEX").Value,
porterI = a.Element("AGENT_PORTER_INDEX").Value,
agentN = a.Element("AGENT_NAME").Value,
agentS = a.Element("AGENT_SURNAME").Value,
agentM = a.Element("AGENT_MOBILE_NUMBER").Value,
};
foreach (var a in all)
{
if ("" == textBoxEditAgentIndex.Text.ToString())
{
MessageBox.Show("You must fill Agent Index field !!", "WARNING");
}
else
{
// AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/AGENT_INDEX").InnerText == textBoxEditAgentIndex.Text
if (a.agentI == textBoxEditAgentIndex.Text.ToString())
{
AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/AGENT_INDEX").InnerText = textBoxEditAgentIndex.Text;
AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/AGENT_PORTER_INDEX").InnerText = textBoxEditAgentPorterIndex.Text;
AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/AGENT_NAME").InnerText = textBoxEditAgentName.Text;
AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/AGENT_SURNAME").InnerText = textBoxEditAgentSurname.Text;
AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/AGENT_MOBILE_NUMBER").InnerText = textBoxEditAgentMobile.Text;
AgentBaseEdit.Save("AgentBase.xml");
ClearEditAgentTxtBoxes();
}
}
}
}
Am i on the right way but i dont see the doors or i am totaly wrong ? Thx all. Miko
OK i tried it this way but it didnt changed the inner text
string agentIndex = comboBoxEditAgentI.SelectedItem.ToString();
XmlDocument AgentBaseEdit = new XmlDocument();
AgentBaseEdit.Load("AgentBase.xml");
XDocument AgentBase = XDocument.Load("AgentBase.xml");
var xElemAgent = AgentBase.Descendants("AGENT")
.First(a => a.Element("AGENT_INDEX").Value == agentIndex);
xElemAgent.Element("AGENT_MOBILE_NUMBER").Value = textBoxEditAgentMobile.Text;
xElemAgent.Element("AGENT_SURNAME").Value = textBoxEditAgentSurname.Text;
AgentBaseEdit.Save("AgentBase.xml");
It would be easier if you use Linq2Xml.
int agentIndex = 2;
XDocument xDoc = XDocument.Load(filename);
var xElemAgent = xDoc.Descendants("AGENT")
.First(a => a.Element("AGENT_INDEX").Value == agentIndex.ToString());
//or
//var xElemAgent = xDoc.XPathSelectElement(String.Format("//AGENT[AGENT_INDEX='{0}']",agentIndex));
xElemAgent.Element("AGENT_MOBILE_NUMBER").Value = "5555555";
xDoc.Save(fileName)
PS: namespaces: System.Xml.XPath System.Xml.Linq
It does not work, because you are selecting the first agent explicitly with in each loop
AgentBaseEdit.SelectSingleNode("COREBASE/AGENT/...")
But you can do it easier by reading and changing withing the same xml document. I'm only changing the agent name and replacing it with "test 1", "test 2", ...
XDocument AgentBase = XDocument.Load("AgentBase.xml");
int i = 0;
foreach (XElement el in AgentBase.Descendants("AGENT")) {
el.Element("AGENT_NAME").Value = "test " + ++i;
// ...
}
AgentBase.Save("AgentBase.xml");
UPDATE
However, I'm suggesting you to separate the logic involving the XML handling from the form. Start by creating an Agent class
public class Agent
{
public string Index { get; set; }
public string PorterIndex { get; set; }
public string Name { get; set; }
public string Surname { get; set; }
public string Mobile { get; set; }
}
Then create an interface defining the needed functionality for an agent repository. The advantage of this interface is that it will make it easier later to switch to another kind of repository like a relational database.
public interface IAgentRepository
{
IList<Agent> LoadAgents();
void Save(IEnumerable<Agent> agents);
}
Then create a class that handles the agents. Here is a suggestion:
public class AgentXmlRepository : IAgentRepository
{
private string _xmlAgentsFile;
public AgentXmlRepository(string xmlAgentsFile)
{
_xmlAgentsFile = xmlAgentsFile;
}
public IList<Agent> LoadAgents()
{
XDocument AgentBase = XDocument.Load(_xmlAgentsFile);
var agents = new List<Agent>();
foreach (XElement el in AgentBase.Descendants("AGENT")) {
var agent = new Agent {
Index = el.Element("AGENT_INDEX").Value,
PorterIndex = el.Element("AGENT_PORTER_INDEX").Value,
Name = el.Element("AGENT_NAME").Value,
Surname = el.Element("AGENT_SURNAME").Value,
Mobile = el.Element("AGENT_MOBILE_NUMBER").Value
};
agents.Add(agent);
}
return agents;
}
public void Save(IEnumerable<Agent> agents)
{
var xDocument = new XDocument(
new XDeclaration("1.0", "utf-8", null),
new XElement("COREBASE",
agents.Select(a =>
new XElement("AGENT",
new XElement("AGENT_INDEX", a.Index),
new XElement("AGENT_PORTER_INDEX", a.PorterIndex),
new XElement("AGENT_NAME", a.Name),
new XElement("AGENT_SURNAME", a.Surname),
new XElement("AGENT_MOBILE_NUMBER", a.Mobile)
)
)
)
);
xDocument.Save(_xmlAgentsFile);
}
}
The form can now concentrate on the editing logic. The form does not even need to know what kind of repository to use if you inject the repository in the form constructor (of cause the form constructor must declare a parameter of type IAgentRepository):
var myAgentForm = new AgentForm(new AgentXmlRepository("AgentBase.xml"));
myAgentForm.Show();
UPDATE #2
Note that you cannot change a single item within an XML file. You must load all the agents, make an edit and then rewrite the whole file, even if you edited only one agent.
To do this, you can use my LoadAgents method, then pick an agent from the returned list, edit the agent and finally write the agents list back to the file with my Save method. You can find an agent in the list with LINQ:
Agent a = agents
.Where(a => a.Index == x)
.FirstOrDefault();
This returns null if an agent with the required index does not exist. Since Agent is a reference type, you don’t have to write it back to the list. The list is keeping a reference to the same agent as the variable a.

Parsing XML String in C#

I have looked over other posts here on the same subject and searched Google but I am extremely new to C# NET and at a loss. I am trying to parse this XML...
<whmcsapi version="4.1.2">
<action>getstaffonline</action>
<result>success</result>
<totalresults>1</totalresults>
<staffonline>
<staff>
<adminusername>Admin</adminusername>
<logintime>2010-03-03 18:29:12</logintime>
<ipaddress>127.0.0.1</ipaddress>
<lastvisit>2010-03-03 18:30:43</lastvisit>
</staff>
</staffonline>
</whmcsapi>
using this code..
XDocument doc = XDocument.Parse(strResponse);
var StaffMembers = doc.Descendants("staff").Select(staff => new
{
Name = staff.Element("adminusername").Value,
LoginTime = staff.Element("logintime").Value,
IPAddress = staff.Element("ipaddress").Value,
LastVisit = staff.Element("lastvisit").Value,
}).ToList();
label1.Text = doc.Element("totalresults").Value;
foreach (var staff in StaffMembers)
{
listBox1.Items.Add(staff.Name);
}
I have printed out the contents of strResponse and the XML is definitely there. However, when I click this button, nothing is added to the listBox1 or the label1 so I something is wrong.
Add Root here to start navigating from the root element (whmcsapi):
string label1_Text = doc.Root.Element("totalresults").Value;

Categories

Resources