I am creating a web part and I am trying to reference XML from the tool part. I have created custom properties, and it works fine if I set the default value to some URL; otherwise it shows a "file not found" message. I want it so that the first time the web part loads, it displays the message "Open the tool part to select the XML".
I am doing it like this:
private string feedXML;
[Browsable(true),
Personalizable(true),
Category("Example Web Parts"),
DefaultValue(""),
WebPartStorage(Storage.Shared),
FriendlyName("MySetting"),
Description("An example setting")]
public string FeedXML
{
get
{ return feedXML; }
set
{ feedXML = value; }
}
string xmlurl = String.Empty;
string _xsl = string.Empty;
// Load the XML
xmlurl = web.GetFileAsString(GetRelativeURL(feedXML)); // <-- throws here because feedXML is null
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlurl);
Since it's the first time the web part is loading, it's quite obvious that feedXML will be null, but I want to display the message "Select XML from the tool part" to the user, as we generally get when adding an OOB web part (like the XML Web Part).
Override the CreateChildControls method; if feedXML is null, create a Label that says "Open the tool pane ..." and add it to the web part's Controls collection.
Also, check the Creating a Web Part with a Custom Tool Part article.
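A minimal sketch of that override, assuming your web part derives from the SharePoint WebPart base class and FeedXML is the property declared above (the prompt text is just an example):

```csharp
// Sketch: show a prompt until FeedXML has been configured from the tool part.
protected override void CreateChildControls()
{
    if (string.IsNullOrEmpty(FeedXML))
    {
        // Nothing configured yet: mimic the OOB "open the tool pane" prompt.
        Label prompt = new Label();
        prompt.Text = "Open the tool pane and select an XML file.";
        Controls.Add(prompt);
        return;
    }

    // FeedXML is set: load and render the XML as before.
    // string xml = web.GetFileAsString(GetRelativeURL(FeedXML));
}
```

Because CreateChildControls runs on every load, the prompt disappears automatically once the user sets the property in the tool part.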
I need to pull part of the HTML from an external URL into another page using Html Agility Pack. I am not sure if I can select a node/element based on id or class name using Agility Pack. So far I have managed to pull the complete page, but I want to target a node/element with a specific id and all of its contents.
protected void WebScrapper()
{
HtmlDocument doc = new HtmlDocument();
var url = @"https://www.itftennis.com/en/tournament/w15-valencia/esp/2022/w-itf-esp-35a-2022/acceptance-list/";
var webGet = new HtmlWeb();
doc = webGet.Load(url);
var baseUrl = new Uri(url);
//doc.LoadHtml(doc);
Response.Write(doc.DocumentNode.InnerHtml);
//Response.Write(doc.DocumentNode.Id("acceptance-list-container"));
//var innerContent = doc.DocumentNode.SelectNodes("/div").FirstOrDefault().InnerHtml;
}
When I use Response.Write(doc.DocumentNode.Id("acceptance-list-container")) it generates an error.
When I use the code below, it generates the error System.ArgumentNullException: Value cannot be null.
doc.DocumentNode.SelectNodes("/div[@id='acceptance-list-container']").FirstOrDefault().InnerHtml;
So far nothing works; if you fix one issue, another one shows up.
The error you get indicates that the SelectNodes() call didn't find any matching nodes and returned null. In cases like this, it is useful to inspect the actual HTML using doc.DocumentNode.InnerHtml.
Your code sample is somewhat messy, and you are probably trying to do too many things at once (what is Response.Write() there for, for example?). Try to focus on one thing at a time if possible.
Here is a simple unit test that can get you started:
using HtmlAgilityPack;
using Xunit;
using Xunit.Abstractions;
namespace Scraping.Tests
{
public class ScrapingTests
{
private readonly ITestOutputHelper _outputHelper;
public ScrapingTests(ITestOutputHelper outputHelper)
{
_outputHelper = outputHelper;
}
[Fact]
public void Test()
{
const string url = @"https://www.itftennis.com/en/tournament/w15-valencia/esp/2022/w-itf-esp-35a-2022/acceptance-list/";
var webGet = new HtmlWeb();
HtmlDocument doc = webGet.Load(url);
string html = doc.DocumentNode.InnerHtml;
_outputHelper.WriteLine(html); // use this if you just want to print something
Assert.Contains("acceptance-list-container", html); // use this if you want to automate an assertion
}
}
}
When I tried that the first time, I got some HTML with an iframe. I visited the page in a browser and I was presented with a google captcha. After completing the captcha, I was able to view the page in the browser, but the HTML in the unit test was still different from the one I got in the browser.
Interestingly enough, the HTML in the unit test contains the following:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
It is obvious that this website has some security measures in place in order to block web scrapers. If you manage to overcome this obstacle and get the actual page's HTML in your program, parsing it and getting the parts that you need will be straightforward.
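Once you get past that obstacle and have the real HTML in hand, the selection itself is a one-liner. A sketch against a stand-in HTML string (the markup below is invented for illustration; note `@id`, not `#id`, in the XPath predicate, and `//` to search at any depth):

```csharp
using System;
using HtmlAgilityPack;

// Stand-in for the HTML you would get back from HtmlWeb.Load(url).
const string html = @"<html><body>
    <div id='acceptance-list-container'><p>Player A</p><p>Player B</p></div>
</body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// '//div[@id=...]' searches anywhere in the document, not just at the root.
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='acceptance-list-container']");
Console.WriteLine(node != null ? node.InnerHtml : "not found");
```

SelectSingleNode returns null when nothing matches, so checking the result before dereferencing it avoids the ArgumentNullException/NullReferenceException chain from the question.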
I am struggling to parse an HTML web page in an htaccess-protected area of my website through a .aspx file in C# (the .aspx file is within the protected area). By debugging the code, I can see that I get the raw page through the HtmlWeb().Load method, but when it comes to getting the HTML node (the SelectSingleNode method), I get a null value.
Here is the sample code I am testing:
protected void Page_Load(object sender, EventArgs e)
{
lbl.Text = getTextFromPage();
}
private string getTextFromPage()
{
var web = new HtmlWeb();
var doc = web.Load("html_address");
HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='reference_to_id']");
if (node != null)
{
return node.InnerText;
}
else
{
return "nothing found";
}
}
I always get a "nothing found" response, since the node object is null. If I remove the .htaccess file (thus removing the protection), everything works perfectly fine, so I suppose something needs to be done in the .htaccess definition. What should I do?
EDIT:
I have included the content of the node in the question:
<div id="reference_to_id"><p>test_test_test</p></div>
I am using the Global Weather Web Service in my ASP.NET application: http://www.webservicex.net/globalweather.asmx?op=GetWeather.
The code works fine, but I want only the temperature to be displayed on the label.
ServiceReference1.GlobalWeatherSoapClient client = new ServiceReference1.GlobalWeatherSoapClient("GlobalWeatherSoap");
string weather = client.GetWeather("Karachi Airport", "Pakistan");
Label1.Text = weather;
The Label control shows the complete data provided by the service (i.e. date, time, country and city name, etc.).
According to the link you provided, the service returns that data as an XML string.
So use it as below:
var doc = XDocument.Parse(weather); // use XDocument.Load if you are reading from an XML file
var location = doc.Root.Element("Location").Value;
var Temperature = doc.Root.Element("Temperature").Value;
Label1.Text = Temperature;
In the same way you can get the other values too, e.g. DewPoint, RelativeHumidity, etc.:
var DewPoint = doc.Root.Element("DewPoint").Value;
var RelativeHumidity = doc.Root.Element("RelativeHumidity").Value;
You can also get it like this:
string weather = client.GetWeather("Karachi Airport", "Pakistan");
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(weather);
XmlNodeList elemlist = xmlDoc.GetElementsByTagName("Temperature");
string temp = elemlist[0].InnerXml;
Background Info: I'm using an ItemCheckedIn receiver in SharePoint 2010, targeting .NET 3.5 Framework. The goal of the receiver is to:
Make sure the properties (columns) of the page match the data in a Content Editor WebPart on the page so that the page can be found in a search using Filter web parts. The pages are automatically generated, so barring any errors they are guaranteed to fit the expected format.
If there is a mismatch, check out the page, fix the properties, then check it back in.
I've kept the receiver from falling into an infinite check-in/check-out loop, although right now it's a very clumsy fix that I'm trying to work on. However, right now I can't work on it because I'm getting a DisconnectedContext error whenever I hit the UpdatePage function:
public override void ItemCheckedIn(SPItemEventProperties properties)
{
// If the main page or machine information is being checked in, do nothing
if (properties.AfterUrl.Contains("home") || properties.AfterUrl.Contains("machines")) return;
// Otherwise make sure that the page properties reflect any changes that may have been made
using (SPSite site = new SPSite("http://san1web.net.jbtc.com/sites/depts/VPC/"))
using (SPWeb web = site.OpenWeb())
{
SPFile page = web.GetFile(properties.AfterUrl);
// Make sure the event receiver doesn't get called infinitely by checking version history
...
UpdatePage(page);
}
}
private static void UpdatePage(SPFile page)
{
bool checkOut = false;
var th = new Thread(() =>
{
using (WebBrowser wb = new WebBrowser())
using (SPLimitedWebPartManager manager = page.GetLimitedWebPartManager(PersonalizationScope.Shared))
{
// Get web part's contents into HtmlDocument
ContentEditorWebPart cewp = (ContentEditorWebPart)manager.WebParts[0];
HtmlDocument htmlDoc;
wb.Navigate("about:blank");
htmlDoc = wb.Document;
htmlDoc.OpenNew(true);
htmlDoc.Write(cewp.Content.InnerText);
foreach (var prop in props)
{
// Check that each property matches the information on the page
string element;
try
{
element = htmlDoc.GetElementById(prop).InnerText;
}
catch (NullReferenceException)
{
break;
}
if (!element.Equals(page.GetProperty(prop).ToString()))
{
if (!prop.Contains("Request"))
{
checkOut = true;
break;
}
else if (!element.Equals(page.GetProperty(prop).ToString().Split(' ')[0]))
{
checkOut = true;
break;
}
}
}
if (!checkOut) return;
// If there was a mismatch, check the page out and fix the properties
page.CheckOut();
foreach (var prop in props)
{
page.SetProperty(prop, htmlDoc.GetElementById(prop).InnerText);
page.Item[prop] = htmlDoc.GetElementById(prop).InnerText;
try
{
page.Update();
}
catch
{
page.SetProperty(prop, Convert.ToDateTime(htmlDoc.GetElementById(prop).InnerText).AddDays(1));
page.Item[prop] = Convert.ToDateTime(htmlDoc.GetElementById(prop).InnerText).AddDays(1);
page.Update();
}
}
page.CheckIn("");
}
});
th.SetApartmentState(ApartmentState.STA);
th.Start();
}
From what I understand, using a WebBrowser is the only way to fill an HtmlDocument in this version of .NET, so that's why I have to use this thread.
In addition, I've done some reading and it looks like the DisconnectedContext error has to do with threading and COM, which are subjects I know next to nothing about. What can I do to prevent/fix this error?
EDIT
As @Yevgeniy.Chernobrivets pointed out in the comments, I could insert an editable field bound to the page column and not worry about parsing any HTML. However, because the current page layout uses an HTML table within a Content Editor WebPart, where that kind of field wouldn't work properly, I'd need to make a new page layout and rebuild my solution from the bottom up, which I would really rather avoid.
I'd also like to avoid downloading anything, as the company I work for normally doesn't allow the use of unapproved software.
You shouldn't do HTML parsing with the WebBrowser class; it is part of Windows Forms and is suited neither for server-side use nor for pure HTML parsing. Try using an HTML parser like Html Agility Pack instead.
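For illustration, a self-contained sketch of what the parsing step could look like with Html Agility Pack: no WebBrowser, no STA thread. The HTML string here stands in for cewp.Content.InnerText, and the 'Machine' id is invented (note the lowercase 'b' in GetElementbyId):

```csharp
using System;
using HtmlAgilityPack;

// Stand-in for cewp.Content.InnerText from the original receiver.
const string content = "<table><tr><td id='Machine'>Press 12</td></tr></table>";

var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(content);

// GetElementbyId returns null rather than throwing when the id is absent,
// so the original NullReferenceException catch becomes a simple null check.
HtmlNode node = htmlDoc.GetElementbyId("Machine");
Console.WriteLine(node != null ? node.InnerText : "id not found");
```

Since the whole parse happens in memory on the current thread, the COM apartment problem that causes the DisconnectedContext error never arises.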
Good day,
I have a question about displaying HTML documents in a Windows Forms application. The app that I'm working on should display information from the database in HTML format. I will try to describe the actions that I have taken (and which failed):
1) I tried to load a "virtual" HTML page that exists only in memory and dynamically change its parameters (webbMain is a WebBrowser control):
public static string CreateBookHtml()
{
StringBuilder sb = new StringBuilder();
//Declaration
sb.AppendLine(@"<?xml version=""1.0"" encoding=""utf-8""?>");
sb.AppendLine(@"<?xml-stylesheet type=""text/css"" href=""style.css""?>");
sb.AppendLine(@"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.1//EN""
""http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"">");
sb.AppendLine(@"<html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"">");
//Head
sb.AppendLine(@"<head>");
sb.AppendLine(@"<title>Exemplary document</title>");
sb.AppendLine(@"<meta http-equiv=""Content-Type"" content=""application/xhtml+xml;
charset=utf-8"" />");
sb.AppendLine(@"</head>");
//Body
sb.AppendLine(@"<body>");
sb.AppendLine(@"<p id=""paragraph"">Example.</p>");
sb.AppendLine(@"</body>");
sb.AppendLine(@"</html>");
return sb.ToString();
}
void LoadBrowser()
{
this.webbMain.Navigate("about:blank");
this.webbMain.DocumentText = CreateBookHtml();
HtmlDocument doc = this.webbMain.Document;
}
This failed: doc.Body is null, and doc.GetElementById("paragraph") returns null too, so I cannot change the paragraph's InnerText property.
Furthermore, this.webbMain.DocumentText is "\0"...
2) I tried to create an HTML file in a specified folder, load it into the WebBrowser, and then change its parameters. The HTML is the same as that created by the CreateBookHtml() method:
private void LoadBrowser()
{
this.webbMain.Navigate("HTML\\BookPage.html");
HtmlDocument doc = this.webbMain.Document;
}
This time this.webbMain.DocumentText contains the HTML data read from the file, but doc.Body returns null again, and I still cannot get an element using the GetElementById() method. Of course, once I have the text I could try a regex to extract specific fields, or other tricks to achieve the goal, but I wonder: is there a simple way to manipulate the HTML? For me, the ideal approach would be to create the HTML text in memory, load it into the WebBrowser control, and then dynamically change its parameters using IDs. Is that possible? Thanks in advance for the answers. Best regards,
Paweł
I worked with the WebBrowser control some time ago and, like you, wanted to load HTML from memory, but I hit the same problem: the body was null. After some investigation, I noticed that the Navigate and NavigateToString methods work asynchronously, so the control needs a little time to load the document; it is not available right after the call to Navigate. So I did something like this (wbChat is the WebBrowser control):
wbChat.NavigateToString("<html><body><div>first line</div></body></html>");
DoEvents();
where DoEvents() is implemented as:
[SecurityPermissionAttribute(SecurityAction.Demand, Flags = SecurityPermissionFlag.UnmanagedCode)]
public void DoEvents()
{
DispatcherFrame frame = new DispatcherFrame();
Dispatcher.CurrentDispatcher.BeginInvoke(DispatcherPriority.Background,
new DispatcherOperationCallback(ExitFrame), frame);
Dispatcher.PushFrame(frame);
}
and it worked for me; after the DoEvents call, I could obtain a non-null body:
mshtml.IHTMLDocument2 doc2 = (mshtml.IHTMLDocument2)wbChat.Document;
mshtml.HTMLDivElement div = (mshtml.HTMLDivElement)doc2.createElement("div");
div.innerHTML = "some text";
mshtml.HTMLBodyClass body = (mshtml.HTMLBodyClass)doc2.body;
if (body != null)
{
body.appendChild((mshtml.IHTMLDOMNode)div);
body.scrollTop = body.scrollHeight;
}
else
Console.WriteLine("body is still null");
I don't know if this is the right way of doing this, but it fixed the problem for me, maybe it helps you too.
Later Edit:
public object ExitFrame(object f)
{
((DispatcherFrame)f).Continue = false;
return null;
}
The DoEvents method is necessary on WPF. For System.Windows.Forms one can use Application.DoEvents().
Another way to do the same thing is:
webBrowser1.DocumentText = "<html><body>blabla<hr/>yadayada</body></html>";
This works without any extra initialization.