Bookmark inconsistencies - C#

I have tried to figure this out now for about 30 minutes to no avail.
I got this code:
class BookMarkTest
{
    Application word;
    Document doc;

    public BookMarkTest()
    {
        word = new Application();
        doc = word.Documents.Open("C:\\Users\\Vipar\\Desktop\\TestSkabelon.docx");

        Console.WriteLine("BEFORE:\n");
        foreach (Bookmark b in doc.Bookmarks)
        {
            Console.WriteLine("Name: {0}\n", b.Name);
        }
        Console.WriteLine("Name: {0}, Text: {1}\n", doc.Bookmarks[2].Name, doc.Bookmarks[2].Range.Text);

        test(ref doc);

        Console.WriteLine("------------------\n");
        Console.WriteLine("AFTER:\n");
        foreach (Bookmark b in doc.Bookmarks)
        {
            Console.WriteLine("Name: {0}\n", b.Name);
        }
        Console.WriteLine("Name: {0}, Text: {1}\n", doc.Bookmarks[2].Name, doc.Bookmarks[2].Range.Text);

        doc = null;
        word = null;
    }

    // Fixed: replaces the bookmark's text and re-adds the bookmark instead of losing it.
    public void test(ref Document doc)
    {
        Dictionary<string, Bookmark> bookmarks = new Dictionary<string, Bookmark>();
        foreach (Bookmark b in doc.Bookmarks)
        {
            bookmarks.Add(b.Name, b);
        }
        BookMarkReplaceNative(bookmarks["Titel"], "Min nye titel");
    }

    internal void BookMarkReplaceNative(Bookmark bookmark, string newText)
    {
        object rng = bookmark.Range;
        string name = bookmark.Name;
        bookmark.Range.Text = newText;   // setting the text removes the bookmark...
        doc.Bookmarks.Add(name, rng);    // ...so re-add it over the same range
    }
}
First I check that all the bookmarks are there. There are 3 (I print them out before doing anything else), so I would expect the collection to be 0 = KontraktStart, 1 = Signatur, 2 = Titel. But when I call doc.Bookmarks[2].Name I get Signatur rather than Titel, and I can't figure out why. doc.Bookmarks[3].Name tells me the element doesn't exist, and for some reason I can't call doc.Bookmarks[0].Name either. It's as if an element disappears and is replaced with nothing.
Also, my test() method removes Titel from the Bookmarks collection entirely. I knew this would happen, but how do I go about just replacing the bookmark rather than removing it completely? When I look in my document I can see that the text at the specific bookmark actually changes, but the bookmark itself is removed, which is not desired. FIXED THIS; see the updated code snippet.
So my question is two-fold:
How come elements are disappearing from the Bookmarks collection before I even manipulate the collection?
How do I replace a Bookmark rather than completely remove it? FIXED THIS. ADDED TO CODE SNIPPET
Thanks in advance!

Related

C# IronWebScraper can iterate but unable to access one element at a time

The scraper from this library works with HtmlNodes. It's hard to explain, but I am scraping a tag and then its contents, and I want to handle the contents like an array, which it is by default in this library. The issue is that I can iterate over it with a for loop like any other array, but for some reason I cannot access it by index outside the loop...
this is my code with the website link exactly like the documentation of the library uses:
In the main:
static void Main(string[] args) {
    var scraper = new HelloScraper();
    scraper.Start();
}
then Init:
public override void Init() {
    this.LoggingLevel = WebScraper.LogLevel.None;
    this.Request("https://1337x.to/sort-search/Aquaman/time/desc/1/", Parse);
}
And now the Parse which gives me trouble and I will split it to show what works and what doesn't.
This works:
public override void Parse(Response response) {
    foreach (var torrentLink in response.Css("tr")) {
        HtmlNode[] torrentContents = torrentLink.Css("td");
        for (int i = 0; i < torrentContents.Length; i++) {
            Console.WriteLine($"{i}: {torrentContents[i].InnerText}");
        }
        Console.WriteLine();
    }
}
To make it easier to understand I will talk about a single "torrent" here.
this working piece of code produces:
0: Aquaman IMAX (2019) AC3 5.1 ITA.ENG 1080p H265 sub NUita.eng Sp33dy94 MIRCrew1
1: 7
2: 0
3: 8pm Oct. 2nd
4: 4.2 GB7
5: Sp33dy94
But this piece of code, which simply selects what I need from the same array using indexes I can see working in the for loop above:
public override void Parse(Response response) {
    foreach (var torrentLink in response.Css("tr")) {
        HtmlNode[] torrentContents = torrentLink.Css("td");
        string torrentName = torrentContents[0].InnerText;
        string torrentSeeds = torrentContents[1].InnerText;
        string torrentSize = torrentContents[4].InnerText;
        Console.WriteLine($"{torrentName} --> [Size:{torrentSize} | Seeds:{torrentSeeds}]");
        Console.WriteLine();
    }
}
This produces nothing... The console doesn't display an error, and when I tried to debug it, accessing by index appeared to "point to a null reference".
Maybe I am missing something, but if an array can be accessed by index inside a for loop, it should be accessible outside of it too, am I wrong? What is the issue here?
By the way, I don't know whether 1337x.to allows web scraping, but I don't intend to use this commercially or otherwise; it is just a website I chose to practice with...
After many hours of messing around in the debugger I got it: when I iterate with the for loop, empty arrays are skipped, and the first row was empty (it is the title row of the page's table, which has no cells inside).
Adding a simple if statement to check whether the length is greater than 0 fixes the issue:
public override void Parse(Response response) {
    foreach (var torrentLink in response.Css("tr")) {
        HtmlNode[] torrentContents = torrentLink.Css("td");
        if (torrentContents.Length > 0) {
            string torrentName = torrentContents[0].InnerText;
            string torrentSeeds = torrentContents[1].InnerText;
            string torrentSize = torrentContents[4].InnerText;
            Console.WriteLine($"{torrentName} --> [Size:{torrentSize} | Seeds:{torrentSeeds}]");
            Console.WriteLine();
        }
    }
}
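The fix generalizes beyond IronWebScraper: a for loop bounded by Length simply never runs for a zero-length array, while unconditional indexing throws. A minimal stand-alone sketch, with plain string arrays standing in for the scraper's HtmlNode[] rows (the sample data is invented):

```csharp
using System;
using System.Collections.Generic;

class EmptyRowDemo
{
    // Rows of cell text; the first row is empty, like a header row with no td cells.
    static readonly string[][] Rows =
    {
        new string[0],
        new[] { "Aquaman", "7", "0", "Oct. 2nd", "4.2 GB" },
    };

    // Safe: the loop body never runs for a zero-length row.
    public static List<string> SummarizeWithGuard()
    {
        var lines = new List<string>();
        foreach (var cells in Rows)
        {
            if (cells.Length > 0) // skip empty rows
                lines.Add($"{cells[0]} [Size:{cells[4]} | Seeds:{cells[1]}]");
        }
        return lines;
    }

    // Unsafe: cells[0] throws IndexOutOfRangeException on the empty first row.
    public static bool UnguardedThrows()
    {
        try
        {
            foreach (var cells in Rows)
            {
                var _ = cells[0];
            }
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(string.Join("\n", SummarizeWithGuard()));
        Console.WriteLine(UnguardedThrows()); // True
    }
}
```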

Looping through HtmlNodes and collecting data gives me the same result every time

I have an async method which calls a mapper to turn an HTML string into an IEnumerable:
public async Task<IEnumerable<MovieRatingScrape>> GetMovieRatingsAsync(string username, int page)
{
    var response = await _httpClient.GetAsync($"/betyg/{username}?p={page}");
    response.EnsureSuccessStatusCode();
    var html = await response.Content.ReadAsStringAsync();
    return new MovieRatingsHtmlMapper().Map(html);
}
...
...
public class MovieRatingsHtmlMapper : HtmlMapperBase<IEnumerable<MovieRatingScrape>>
{
    // In reality, this method belongs to the base class with signature T Map(string html)
    public IEnumerable<MovieRatingScrape> Map(string html)
    {
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);
        return Map(htmlDocument);
    }

    public override IEnumerable<MovieRatingScrape> Map(HtmlDocument item)
    {
        var movieRatings = new List<MovieRatingScrape>();
        var nodes = item.DocumentNode.SelectNodes("//table[@class='list']/tr");
        foreach (var node in nodes)
        {
            var title = node.SelectSingleNode("//td[1]/a")?.InnerText;
            movieRatings.Add(new MovieRatingScrape
            {
                Date = DateTime.Parse(node.SelectSingleNode("//td[2]")?.InnerText),
                Slug = node.SelectSingleNode("//td[1]/a[starts-with(@href, '/film/')]")?
                    .GetAttributeValue("href", null)?
                    .Replace("/film/", string.Empty),
                SwedishTitle = title,
                Rating = node.SelectNodes($"//td[3]/i[{XPathHasClass("fa-star")}]").Count
            });
        }
        return movieRatings;
    }
}
The resulting list movieRatings contains copies of the same object, but when I look at the HTML and debug the HtmlNode node, the nodes differ as they are supposed to.
Either I'm blind to something really obvious, or I am hitting some async issue which I do not grasp. Any ideas? I should be getting 50 unique objects out of this call; instead I am getting the first one 50 times.
Thank you in advance, Viktor.
Edit: Adding some screenshots to show my predicament. Look at locals InnerHtml (node) and title for item 1 and 2 of the foreach loop.
Edit 2: Managed to reproduce on .NET Fiddle: https://dotnetfiddle.net/A2I4CQ
You need to use .// and not //
Here is the fixed Fiddle: https://dotnetfiddle.net/dZkSRN
// will search anywhere in the document
.// will search anywhere in the current node
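The same context rule can be demonstrated without HtmlAgilityPack, since the XPath support in System.Xml behaves identically here: `//` restarts from the document root even when called on a node, while `.//` stays inside the node. A small stand-alone sketch (the XML sample is made up):

```csharp
using System;
using System.Xml;

class XPathContextDemo
{
    // Two rows, each with its own <a>; node-relative queries should differ per row.
    const string Xml =
        "<table>" +
        "<tr><td><a>first</a></td></tr>" +
        "<tr><td><a>second</a></td></tr>" +
        "</table>";

    // "//a" ignores the context node and searches the whole document,
    // so even from the second row it finds the document's first <a>.
    public static string AbsoluteFromSecondRow()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode secondRow = doc.SelectNodes("//tr")[1];
        return secondRow.SelectSingleNode("//a").InnerText;
    }

    // ".//a" searches only under the context node, returning that row's own <a>.
    public static string RelativeFromSecondRow()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode secondRow = doc.SelectNodes("//tr")[1];
        return secondRow.SelectSingleNode(".//a").InnerText;
    }

    static void Main()
    {
        Console.WriteLine(AbsoluteFromSecondRow()); // first
        Console.WriteLine(RelativeFromSecondRow()); // second
    }
}
```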
I am not entirely sure how to describe this, but your issue is here (I think):
//table[@class='list']/tr
specifically the //.
I experienced the same thing while looking for a span; I had to use something similar:
var nodes = htmlDoc.DocumentNode.SelectNodes("//li[@class='itemRow productItemWrapper']");
foreach (HtmlNode node in nodes)
{
    var nodeDoc = new HtmlDocument();
    nodeDoc.LoadHtml(node.InnerHtml);
    string name = nodeDoc.DocumentNode.SelectSingleNode("//span[@class='productDetailTitle']").InnerText;
}

How to prevent "stale element" inside a foreach loop?

I'm using Selenium for retrieve data from this site, and I encountered a little problem when I try to click an element within a foreach.
What I'm trying to do
I'm trying to get the table associated to a specific category of odds, in the link above we have different categories:
As you can see from the image, I clicked on Asian handicap -1.75 and the site has generated a table through javascript, so inside my code I'm trying to get that table finding the corresponding element and clicking it.
Code
Actually I have two methods. The first, called GetAsianHandicap, iterates over all categories of odds:
public List<T> GetAsianHandicap(Uri fixtureLink)
{
    // Contains all the categories displayed on the page.
    string[] categories = new string[] { "-1.75", "-1.5", "-1.25", "-1", "-0.75", "-0.5", "-0.25", "0", "+0.25", "+0.5", "+0.75", "+1", "+1.25", "+1.5", "+1.75" };
    foreach (string cat in categories)
    {
        // Get the html of the table for the current category.
        string html = GetSelector("Asian handicap " + cat);
        if (html == string.Empty)
            continue;
        // other code
    }
}
and then the method GetSelector, which clicks on the searched element. This is the design:
public string GetSelector(string selector)
{
    // Get the available table containers (the categories).
    var containers = driver.FindElements(By.XPath("//div[@class='table-container']"));
    // Store the html to return.
    string html = string.Empty;
    foreach (IWebElement container in containers)
    {
        // Container not available for click.
        if (container.GetAttribute("style") == "display: none;")
            continue;
        // Get the container header (contains the description).
        IWebElement header = container.FindElement(By.XPath(".//div[starts-with(@class, 'table-header')]"));
        // Store the table description.
        string description = header.FindElement(By.TagName("a")).Text;
        // The container contains the searched category.
        if (description.Trim() == selector)
        {
            // Get the available links.
            var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
            // Get the element to click.
            IWebElement element = listItems.Where(li => li.Text == selector).FirstOrDefault();
            // The element exists.
            if (element != null)
            {
                // Click on the element to load the table.
                element.Click();
                // Wait a few seconds on ChromeDriver for the table to load.
                driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(20);
                // Get the new html of the page.
                html = driver.PageSource;
            }
            return html;
        }
    }
    return string.Empty;
}
Problem and exception details
When the foreach reach this line:
var listItems = driver.FindElement(By.Id("odds-data-table")).FindElements(By.TagName("a"));
I get this exception:
'OpenQA.Selenium.StaleElementReferenceException' in WebDriver.dll
stale element reference: element is not attached to the page document
Searching for the error tells me that the html page source has changed, but in this case I store the element to click in one variable and the html itself in another, so I can't figure out how to patch this issue.
Someone could help me?
Thanks in advance.
I looked at your code and I think you're making it more complicated than it needs to be. I'm assuming you want to scrape the table that is exposed when you click one of the handicap links. Here's some simple code to do this. It dumps the text of the elements, which ends up unformatted, but you can use this as a starting point and add functionality if you want. I didn't run into any StaleElementReferenceExceptions when running this code, and I never saw the page refresh, so I'm not sure what other people were seeing.
string url = "http://www.oddsportal.com/soccer/europe/champions-league/paok-spartak-moscow-pIXFEt8o/#ah;2";
driver.Url = url;

// Get all the (visible) handicap links and click them to display the tables with odds.
IReadOnlyCollection<IWebElement> links = driver.FindElements(By.XPath("//a[contains(.,'Asian handicap')]")).Where(e => e.Displayed).ToList();
foreach (var link in links)
{
    link.Click();
}

// Print all the odds tables.
foreach (var item in driver.FindElements(By.XPath("//div[@class='table-container']")))
{
    Console.WriteLine(item.Text);
    Console.WriteLine("====================================");
}
I would suggest that you spend some more time learning locators. Locators are very powerful and can save you having to stack nested loops looking for one thing... and then children of that thing... and then children of that thing... and so on. The right locator can find all that in one scrape of the page which saves a lot of code and time.
As you mentioned in the related post, this issue occurs because the site performs an auto refresh.
Solution 1:
If there is an explicit way to trigger the refresh, I would suggest performing that refresh on a periodic basis, or whenever you know you need it.
Solution 2:
Create extension methods for FindElement and FindElements so that they try to get the element within a given timeout:
public static IWebElement FindElement(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.ElementToBeClickable(by));
    }
    return driver.FindElement(by);
}
public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int timeout)
{
    if (timeout > 0)
    {
        return new WebDriverWait(driver, TimeSpan.FromSeconds(timeout)).Until(ExpectedConditions.PresenceOfAllElementsLocatedBy(by));
    }
    return driver.FindElements(by);
}
so your code will use these like this:
var listItems = driver.FindElement(By.Id("odds-data-table"), 30).FindElements(By.TagName("a"),30);
Solution 3:
Handle StaleElementReferenceException using an extension method:
public static IWebElement FindElement(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt - 1; attempt++)
    {
        try
        {
            return driver.FindElement(by);
        }
        catch (StaleElementReferenceException)
        {
            // Element went stale; try again.
        }
    }
    // Last attempt: let any exception propagate.
    return driver.FindElement(by);
}

public static IReadOnlyCollection<IWebElement> FindElements(this IWebDriver driver, By by, int maxAttempt)
{
    for (int attempt = 0; attempt < maxAttempt - 1; attempt++)
    {
        try
        {
            return driver.FindElements(by);
        }
        catch (StaleElementReferenceException)
        {
            // Elements went stale; try again.
        }
    }
    return driver.FindElements(by);
}
Your code will use these like this:
var listItems = driver.FindElement(By.Id("odds-data-table"), 2).FindElements(By.TagName("a"),2);
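The retry idea in Solution 3 can be sketched without Selenium at all, so it runs anywhere. The flaky operation is simulated with a counter, and InvalidOperationException stands in for StaleElementReferenceException; all names here are illustrative, not Selenium API:

```csharp
using System;

class RetryDemo
{
    // Retries an operation that may throw transiently; the final attempt
    // lets the exception propagate instead of swallowing it.
    public static T Retry<T>(Func<T> operation, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (InvalidOperationException) when (attempt < maxAttempts)
            {
                // Transient failure (stand-in for a stale element): try again.
            }
        }
    }

    static void Main()
    {
        int calls = 0;
        // Fails twice, then succeeds, like a DOM that settles after a refresh.
        string result = Retry(() =>
        {
            calls++;
            if (calls < 3) throw new InvalidOperationException("stale");
            return "found";
        }, maxAttempts: 5);
        Console.WriteLine($"{result} after {calls} calls"); // found after 3 calls
    }
}
```

Returning the value from inside the try (rather than discarding it, as the original extension methods did) is what makes the helper usable at call sites.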
Use this:
string description = header.FindElement(By.XPath("strong/a")).Text;
instead of your:
string description = header.FindElement(By.TagName("a")).Text;

Using the List.Add() method is breaking my code out of a foreach loop

So, essentially, I'm running into an interesting issue. When the call to CreateXML() in the following code is made, an XElement is created as intended, but when I attempt to add it to a collection of XElements, instead of continuing the foreach loop from which the CreateXML() call originated, the foreach loop is broken out of and a call is made to WriteXml(). Additionally, though an XElement is created and populated, it is not added to the List. [For clarification, the foreach loops I am referring to live in the ParseDoc() method.]
private List<XElement> _xelemlist;

private void WriteXml()
{
    XElement head = new XElement("header", new XAttribute("headerattributename", "attribute"));
    foreach (XElement xelem in _xelemlist)
    {
        head.Add(xelem);
    }
    XDocument doc = new XDocument();
    doc.Add(head);
}

private void CreateXML(string attname, string att)
{
    XElement xelem = new XElement("name", new XElement("child", new XAttribute(attname, att), segment));
    _xelemlist.Add(xelem);
}

private void ExtractSegment(HtmlNode node)
{
    HtmlAttribute[] segatts = node.Attributes.ToArray();
    string attname = segatts[0].Value.ToString();
    string att = node.InnerText.ToString();
    CreateXML(attname, att);
}

private HtmlDocument ParseDoc(HtmlDocument document)
{
    try
    {
        HtmlNode root = document.DocumentNode.FirstChild;
        foreach (HtmlNode childnode1 in root.SelectNodes(".//child1"))
        {
            foreach (HtmlNode childnode2 in childnode1.SelectNodes(".//child2"))
            {
                ExtractSegment(childnode2);
            }
        }
    }
    catch (Exception e) { }
    WriteXml();
    return document;
}
When I comment out the List.Add() in CreateXML() and step through the code, the foreach loop is not broken out of after the first iteration, and the code works properly.
I have no idea what I'm doing wrong (and yes, the code is instantiated by a public member; I am only posting the methods relevant to my problem). If anyone has come across this sort of behavior before, I would really appreciate a push in the right direction. Specifically: is the problem just poor coding, or is this behavior a result of a property of one of the methods/libraries I am using?
One caveat: I know that I am using HtmlAgilityPack to parse a file and extract information, but a requirement on this code forces me to use XDocument to write said information... don't ask me why.
I have no idea what I'm doing wrong
This, for starters:
catch (Exception e) { }
That's stopping you from seeing what on earth's going on. I strongly suspect you've got a NullReferenceException due to _xelemlist being null, but that's a secondary problem. The main problem is that by pretending everything's fine whatever happens, with no logging whatsoever, the only way of getting anywhere is by debugging, and that's an awful experience when you don't need to go through it.
It's extremely rarely a good idea catch exceptions and swallow them without any logging at all. It's almost never a good idea to do that with Exception.
Whenever you have a problem which is difficult to diagnose, improve your diagnostic capabilities first. That way, when you next run into a problem, it'll be easier to diagnose.
Declare the List this way:
private List<XElement> _xelemlist = new List<XElement>();
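To see why the loops appear to "break out", here is a minimal stand-alone sketch of the combination of a null list and a silent catch (the data is invented): the Add() throws NullReferenceException, and the empty catch swallows it, so execution jumps straight past both loops.

```csharp
using System;
using System.Collections.Generic;

class SwallowedExceptionDemo
{
    // Deliberately nullable, mirroring the never-initialized _xelemlist.
    static List<string> _items;

    // Returns how many outer iterations completed before any hidden failure.
    public static int Run(bool initializeList)
    {
        _items = initializeList ? new List<string>() : null;
        int completed = 0;
        try
        {
            foreach (var outer in new[] { "a", "b", "c" })
            {
                foreach (var inner in new[] { 1, 2 })
                {
                    _items.Add(outer + inner); // throws NullReferenceException when _items is null
                }
                completed++;
            }
        }
        catch (Exception) { } // the silent catch: the loops just stop, with no error shown
        return completed;
    }

    static void Main()
    {
        Console.WriteLine(Run(initializeList: false)); // 0: looks like the loop "broke out"
        Console.WriteLine(Run(initializeList: true));  // 3: with the list initialized, all iterations run
    }
}
```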
In your foreach loop, you are attempting to use XElement head as a list of XElements when you add() to it. This should probably be a list of XElements?
Might I suggest switching to using XmlDocument?
Here is some sample code which I have written for work (changed to protect my work :D), and we are using it rather well.
Code:
XmlDocument doc = new XmlDocument();
XmlNode root;
if (File.Exists(path + "\\MyXmlFile.xml"))
{
    doc.Load(path + "\\MyXmlFile.xml");
    root = doc.SelectSingleNode("//Library");
}
else
{
    XmlDeclaration dec = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
    doc.AppendChild(dec);
    root = doc.CreateElement("Library");
    doc.AppendChild(root);
}
XmlElement book = doc.CreateElement("Book");
XmlElement title = doc.CreateElement("Title");
XmlElement author = doc.CreateElement("Author");
XmlElement isbn = doc.CreateElement("ISBN");
title.InnerText = "Title of a Book";
author.InnerText = "Some Author";
isbn.InnerText = "RandomNumbers";
book.AppendChild(title);
book.AppendChild(author);
book.AppendChild(isbn);
root.AppendChild(book);
doc.Save(path + "\\MyXmlFile.xml");

Files, strings and save

I've been having trouble trying to figure this out. When I think I have it, I get told no. Here is a picture of it.
I am working on the save button. After the user adds the first name, last name and job title, they can save it. If a user loads the file and it comes up in the listbox, that person should be able to click on the name, hit the edit button, and edit it. I have code, but I was told it looked wacky and that the string should hold the first name, last name and job title.
This is getting me really confused as I am learning C#. I know how to use SaveFileDialog, but I am not allowed to use it here. Here is what I am supposed to be doing:
When the user clicks the "Save" button, write the selected record to the file specified in txtFilePath (absolute path, not relative) without truncating the values currently inside.
I am still working on my code since I was told it would be better if the file writes records in groups of three strings. But this is the code I have right now:
private void Save_Click(object sender, EventArgs e)
{
    string path = txtFilePath.Text;
    if (File.Exists(path))
    {
        using (StreamWriter sw = File.CreateText(path))
        {
            foreach (Employee employee in employeeList.Items)
                sw.WriteLine(employee);
        }
    }
    else
    {
        try
        {
            using (StreamWriter sw = File.AppendText(path))
            {
                foreach (var item in employeeList.Items)
                    sw.WriteLine(item.ToString());
            }
        }
        catch
        {
            MessageBox.Show("Please enter something in");
        }
    }
}
Now I cannot use SaveFileDialog or OpenFileDialog. The user should be able to open any file on the C, E or F drive, or wherever it is. I was also told it should be an object, and the program should handle any exceptions that arise.
I know this might be a newbie question, but my mind is stuck as I am still learning how to code with C#. I have been searching and reading but haven't found anything to help me understand how to put all this into one piece of code. If someone could help, or even point me to a better website, I would appreciate it.
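On the "without truncating" requirement specifically: File.CreateText always truncates the file, while a StreamWriter opened in append mode preserves existing content. A minimal sketch, using an invented temp-file path rather than txtFilePath:

```csharp
using System;
using System.IO;

class AppendDemo
{
    // Appends one record per line WITHOUT truncating what is already in the file.
    public static void AppendRecords(string path, params string[] records)
    {
        using (var sw = new StreamWriter(path, append: true)) // append: true is the key
        {
            foreach (var record in records)
                sw.WriteLine(record);
        }
    }

    public static string[] ReadAll(string path) => File.ReadAllLines(path);

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "employees_demo.txt");
        File.Delete(path); // start clean for the demo (no-op if absent)
        AppendRecords(path, "\"Ann\",\"Lee\",\"Engineer\"");
        AppendRecords(path, "\"Bo\",\"Kim\",\"Manager\""); // a second save must not wipe the first
        Console.WriteLine(ReadAll(path).Length); // 2
    }
}
```

By contrast, replacing the StreamWriter line with File.CreateText(path) would leave only the last record in the file.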
There are many, many ways to store data in a file. This code demonstrates 4 methods that are pretty easy to use. But the point is that you should probably be splitting up your data into separate pieces rather than storing them as one long string.
public class MyPublicData
{
    public int id;
    public string value;
}

[Serializable()]
class MyEncapsulatedData
{
    private DateTime created;
    private int length;

    public MyEncapsulatedData(int length)
    {
        created = DateTime.Now;
        this.length = length;
    }

    public DateTime ExpirationDate
    {
        get { return created.AddDays(length); }
    }
}

class Program
{
    static void Main(string[] args)
    {
        string testpath = System.IO.Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "TestFile");

        // Method 1: Automatic XML serialization
        // Requires that the type being serialized and all its serializable members are public
        System.Xml.Serialization.XmlSerializer xs =
            new System.Xml.Serialization.XmlSerializer(typeof(MyPublicData));
        MyPublicData o1 = new MyPublicData() { id = 3141, value = "a test object" };
        MyEncapsulatedData o2 = new MyEncapsulatedData(7);
        using (System.IO.StreamWriter w = new System.IO.StreamWriter(testpath + ".xml"))
        {
            xs.Serialize(w, o1);
        }

        // Method 2: Manual XML serialization
        System.Xml.XmlWriter xw = System.Xml.XmlWriter.Create(testpath + "1.xml");
        xw.WriteStartElement("MyPublicData");
        xw.WriteStartAttribute("id");
        xw.WriteValue(o1.id);
        xw.WriteEndAttribute();
        xw.WriteAttributeString("value", o1.value);
        xw.WriteEndElement();
        xw.Close();

        // Method 3: Automatic binary serialization
        // Requires that the type being serialized be marked with the "Serializable" attribute
        // (note that BinaryFormatter is considered insecure and is obsolete in modern .NET)
        using (System.IO.FileStream f = new System.IO.FileStream(testpath + ".bin", System.IO.FileMode.Create))
        {
            System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf =
                new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
            bf.Serialize(f, o2);
        }

        // Demonstrate how automatic binary deserialization works
        // and prove that it handles objects with private members
        using (System.IO.FileStream f = new System.IO.FileStream(testpath + ".bin", System.IO.FileMode.Open))
        {
            System.Runtime.Serialization.Formatters.Binary.BinaryFormatter bf =
                new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
            MyEncapsulatedData o3 = (MyEncapsulatedData)bf.Deserialize(f);
            Console.WriteLine(o3.ExpirationDate.ToString());
        }

        // Method 4: Manual binary serialization
        using (System.IO.FileStream f = new System.IO.FileStream(testpath + "1.bin", System.IO.FileMode.Create))
        {
            using (System.IO.BinaryWriter w = new System.IO.BinaryWriter(f))
            {
                w.Write(o1.id);
                w.Write(o1.value);
            }
        }

        // Demonstrate how manual binary deserialization works
        using (System.IO.FileStream f = new System.IO.FileStream(testpath + "1.bin", System.IO.FileMode.Open))
        {
            using (System.IO.BinaryReader r = new System.IO.BinaryReader(f))
            {
                MyPublicData o4 = new MyPublicData() { id = r.ReadInt32(), value = r.ReadString() };
                Console.WriteLine("{0}: {1}", o4.id, o4.value);
            }
        }
    }
}
As you are writing the employee objects with WriteLine, the underlying ToString() is being invoked. What you have to do first is customize that ToString() method to fit your needs, in this way:
public class Employee
{
    public string FirstName;
    public string LastName;
    public string JobTitle;

    // all other declarations here
    // ...

    // Override ToString()
    public override string ToString()
    {
        return string.Format("'{0}', '{1}', '{2}'", this.FirstName, this.LastName, this.JobTitle);
    }
}
This way, your writing code still keeps clean and readable.
By the way, there is no reverse equivalent of ToString(), but to follow .NET conventions I suggest you implement an Employee method like:
public static Employee Parse(string s)
{
    // your code here; return a new Employee object
}
You have to determine a way of saving that suits your needs. A simple way to store this info could be CSV:
"Firstname1","Lastname 1", "Jobtitle1"
" Firstname2", "Lastname2","Jobtitle2 "
As you can see, data won't be truncated, since the delimiter " is used to determine string boundaries.
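Putting the two suggestions together, here is a minimal sketch of a ToString/Parse round trip for such a CSV record. The naive Split only works under the stated assumption that fields contain no embedded commas or quotes; the sample names are invented:

```csharp
using System;

public class Employee
{
    public string FirstName;
    public string LastName;
    public string JobTitle;

    // Write the three fields as one quoted, comma-separated record.
    public override string ToString() =>
        string.Format("\"{0}\",\"{1}\",\"{2}\"", FirstName, LastName, JobTitle);

    // Reverse of ToString; assumes no embedded commas or quotes in the fields.
    public static Employee Parse(string line)
    {
        string[] parts = line.Split(',');
        return new Employee
        {
            FirstName = parts[0].Trim().Trim('"'),
            LastName = parts[1].Trim().Trim('"'),
            JobTitle = parts[2].Trim().Trim('"'),
        };
    }
}

class RoundTripDemo
{
    static void Main()
    {
        var e = new Employee { FirstName = "Ann", LastName = "Lee", JobTitle = "Engineer" };
        string line = e.ToString();       // "Ann","Lee","Engineer" (with quotes)
        Employee back = Employee.Parse(line);
        Console.WriteLine(back.JobTitle); // Engineer
    }
}
```

For real CSV data (fields that may contain commas, quotes, or newlines) a proper library such as the CsvHelper mentioned below is the safer choice.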
As shown in this question, using CsvHelper might be an option. But given that this is homework, with the constraints that entails, you might have to create this method yourself. You could put it in Employee (or make it override ToString()) doing something along these lines:
public string GetAsCSV(string firstName, string lastName, string jobTitle)
{
    return string.Format("\"{0}\",\"{1}\",\"{2}\"", firstName, lastName, jobTitle);
}
I'll leave the way how to read the data back in as an exercise to you. ;-)
