I'm fairly new to the HTML Agility Pack. I've been searching and trying many examples but haven't reached a conclusion yet, so I must be doing something wrong. I hope you can assist me.
My goal is to parse the latest news from a website, including image, title and date, which should be pretty simple. I managed to get the image (the background-image in the style attribute of the div), but the divs are nested and for some reason I can't access their values. Here is my code:
using System;
using HtmlAgilityPack;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        var html = @"https://pristontale.eu/";
        HtmlWeb web = new HtmlWeb();
        var doc = web.Load(html);
        var news = doc.DocumentNode.SelectNodes("//div[contains(@class,'index-article-wrapper')]");
        foreach (var item in news){
            var image = Regex.Match(item.GetAttributeValue("style", ""), @"(?<=url\()(.*)(?=\))").Groups[1].Value;
            var title = item.SelectSingleNode("//div[@class='article-title']").InnerText;
            var date = item.SelectSingleNode("//div[@class='article-date']").InnerText;
            Console.WriteLine(image, title, date);
        }
    }
}
This is what the HTML looks like:
<div class="index-article-wrapper" onclick="location.href='article.php?id=2';" style="background-image: url(https://cdn.discordapp.com/attachments/765749063621935104/884439050562461696/1_1.png)">
    <div class="meta-wrapper">
        <div class="article-date">5 Sep, 2021</div>
        <div class="article-title">Server merge v1.264 update</div>
    </div>
</div>
Currently it correctly grabs all 4 news articles, but only the image. How do I get the title and date of each? I have a fiddle here: https://dotnetfiddle.net/BVcAmH
Appreciate the help
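I just realized the code has been correct all along; the only flaw was the Console.WriteLine call. When it receives several arguments, Console.WriteLine treats the first one as a composite format string, so title and date were passed as format arguments that the string never referenced.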
Wrong
Console.WriteLine(image, title, date);
Correct
Console.WriteLine(image + " " + " " + title + " " + date);
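One more thing that may be worth checking: in the HTML Agility Pack, an XPath that starts with // is evaluated from the document root even when it is called on a child node, so item.SelectSingleNode("//div[...]") can return the first article's title and date for every item. Below is a minimal sketch, assuming markup like the snippet in the question. The inline HTML string, the example.com image URLs and the second article are made up for illustration. It uses .// to keep the search inside the current article and string interpolation for the output.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class NewsDemo
{
    public static void Main()
    {
        // Hypothetical sample markup modelled on the snippet in the question.
        const string html = @"
<div class='index-article-wrapper' style='background-image: url(https://example.com/1.png)'>
  <div class='meta-wrapper'>
    <div class='article-date'>5 Sep, 2021</div>
    <div class='article-title'>Server merge v1.264 update</div>
  </div>
</div>
<div class='index-article-wrapper' style='background-image: url(https://example.com/2.png)'>
  <div class='meta-wrapper'>
    <div class='article-date'>1 Sep, 2021</div>
    <div class='article-title'>Second article</div>
  </div>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var news = doc.DocumentNode.SelectNodes("//div[contains(@class,'index-article-wrapper')]");
        foreach (var item in news)
        {
            var image = Regex.Match(item.GetAttributeValue("style", ""), @"(?<=url\()(.*)(?=\))").Groups[1].Value;
            // The leading dot makes the XPath relative to the current article node.
            var title = item.SelectSingleNode(".//div[@class='article-title']").InnerText;
            var date = item.SelectSingleNode(".//div[@class='article-date']").InnerText;
            // Interpolation avoids the format-string overload of Console.WriteLine.
            Console.WriteLine($"{image} {title} {date}");
        }
    }
}
```

If the page can contain wrappers without a title or date, a null check on each SelectSingleNode result would be a sensible addition.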
I am doing something like this; I know this code is incorrect:
string strSrc = Namespace.Properties.Resources.images.ToString();
webBrowser1.DocumentText = @" <img src=""+strSrc+"" />";
You can use this code:
String ExactPath = Path.GetFullPath(Namespace.Properties.Resources.images.image1.jpg);
webBrowser1.DocumentText = @"<img src=""" + ExactPath + @""" />";
I have the code below, which dynamically generates a directory tree in HTML list format. When I try to manipulate the list items with JavaScript to add a '+' to the end of the item, it doesn't work. I know the jQuery is correct, because I have used it on another page on the same server. Is jQuery not able to manipulate data that is dynamically generated server side with ASP.NET?
<script language="C#" runat="server">
    string output;
    protected void Page_Load(object sender, EventArgs e) {
        getDirectoryTree(Request.QueryString["path"]);
        itemWrapper.InnerHtml = output;
    }
    private void getDirectoryTree(string dirPath) {
        try {
            System.IO.DirectoryInfo rootDirectory = new System.IO.DirectoryInfo(dirPath);
            foreach (System.IO.DirectoryInfo subDirectory in rootDirectory.GetDirectories()) {
                output = output + "<ul><li>" + subDirectory.Name + "</li>";
                getDirectoryTree(subDirectory.FullName);
                if (subDirectory.GetFiles().Length != 0) {
                    output = output + "<ul>";
                    foreach (System.IO.FileInfo file in subDirectory.GetFiles()) {
                        output = output + "<li><a href='" + file.FullName + "'>" + file.Name + "</a></li>";
                    }
                }
                output = output + "</ul>";
            }
        } catch (System.UnauthorizedAccessException) {
            //This throws when we don't have access, do nothing and move on.
        }
    }
</script>
I then try to manipulate the output with the following:
<script language="javascript">
    $('li > ul').not('li > ul > li > ul').prev().append('+');
</script>
Just an FYI the code for the div is below:
<div id="itemWrapper" runat="server">
</div>
Have you tried executing your JS after the page loads?
Something like this:
$(function(){
$('li > ul').not('li > ul > li > ul').prev().append('+');
});
It looks like you have a couple of problems here. First, you should put your jQuery code inside $(document).ready, which ensures that the DOM has fully loaded before you try to manipulate it. Second, your selector looks for ul elements that are direct children of li elements, but your code does not generate any such HTML: you have li elements inside ul elements, but not the other way around. Also, if a directory has files in it, you are going to leave some ul elements unclosed, which will break your HTML and JavaScript.
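To illustrate the markup problems described above, here is a minimal sketch of a generator that always closes what it opens and nests each subdirectory's ul inside its li. It is not the asker's code: the class and method names are hypothetical, wrapping the directory name in a span is an assumption made so that prev() has an element to find, and unlike the original it also lists the files of the root directory.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class DirectoryTreeHtml
{
    // Returns a <ul> for dirPath; every subdirectory <li> contains its own nested <ul>.
    public static string Build(string dirPath)
    {
        var sb = new StringBuilder();
        AppendDirectory(new DirectoryInfo(dirPath), sb);
        return sb.ToString();
    }

    private static void AppendDirectory(DirectoryInfo dir, StringBuilder sb)
    {
        DirectoryInfo[] subDirs;
        FileInfo[] files;
        try
        {
            subDirs = dir.GetDirectories();
            files = dir.GetFiles();
        }
        catch (UnauthorizedAccessException)
        {
            return; // No access: skip this directory and emit nothing for it.
        }

        // Avoid emitting an empty <ul></ul> for empty directories.
        if (subDirs.Length == 0 && files.Length == 0) return;

        sb.Append("<ul>");
        foreach (var sub in subDirs)
        {
            // The name goes in a <span> so the nested <ul> has a preceding sibling element.
            sb.Append("<li><span>").Append(WebUtility.HtmlEncode(sub.Name)).Append("</span>");
            AppendDirectory(sub, sb); // The nested <ul> is written inside this <li>.
            sb.Append("</li>");
        }
        foreach (var file in files)
        {
            sb.Append("<li><a href='")
              .Append(WebUtility.HtmlEncode(file.FullName))
              .Append("'>")
              .Append(WebUtility.HtmlEncode(file.Name))
              .Append("</a></li>");
        }
        sb.Append("</ul>");
    }
}
```

In Page_Load this could be used as itemWrapper.InnerHtml = DirectoryTreeHtml.Build(Request.QueryString["path"]);. With this shape, each non-empty directory's ul is a direct child of its li and is preceded by the span, which is the structure the selector in the question looks for.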
I want to put rich text in HTML on the clipboard so when the users paste to Word, it will include the source HTML formatting.
Using the Clipboard.SetText method doesn't work.
Also, I would like that if users paste into a rich editor like Word it will paste formatted text, and if they paste into a plain editor like Notepad it will paste plain text.
When setting HTML text, you need to provide a header that specifies which fragment of the HTML you actually want to paste, while still being able to provide additional styling around it:
Version:0.9
StartHTML:000125
EndHTML:000260
StartFragment:000209
EndFragment:000222
<HTML>
<head>
<title>HTML clipboard</title>
</head>
<body>
<!--StartFragment--><b>Hello!</b><!--EndFragment-->
</body>
</html>
With the header (and correct indexes), calling Clipboard.SetText with TextDataFormat.Html will do the trick.
To handle HTML and plain text pastes, you can’t use the Clipboard.SetText method, as it clears the clipboard each time it’s called; you need to create a DataObject instance, call its SetData method once with HTML and once with plain text, and then set the object to clipboard using Clipboard.SetDataObject.
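The DataObject approach described above can be sketched roughly as follows. This is a minimal Windows Forms sketch, not a complete helper: the class and method names are hypothetical, and it assumes the cfHtml argument already contains the CF_HTML header with correct offsets.

```csharp
using System;
using System.Windows.Forms;

public static class ClipboardSketch
{
    // cfHtml: HTML already wrapped in the CF_HTML header (Version, StartHTML, ...).
    // plainText: the text that plain editors should receive instead.
    public static void CopyHtmlAndText(string cfHtml, string plainText)
    {
        var data = new DataObject();
        data.SetData(DataFormats.Html, cfHtml);    // Used by rich editors such as Word.
        data.SetData(DataFormats.Text, plainText); // Fallback for plain editors such as Notepad.

        // Passing true keeps the data on the clipboard after the application exits.
        Clipboard.SetDataObject(data, true);
    }
}
```

Clipboard access in Windows Forms has to happen on an STA thread, so in a console program the entry point would need the [STAThread] attribute.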
Update
See "Setting HTML/Text to Clipboard revisited" for more details and ClipboardHelper implementation.
I found some code: https://www.experts-exchange.com/questions/21966855/Create-a-hyperlink-in-VB-net-copy-to-clipboard-Should-be-able-to-paste-hyperlink-in-Microsoft-Word-Excel.html
This code handles the problems of updating the start and end indexes.
Converted to C#:
public void AddHyperlinkToClipboard(string link, string description)
{
    const string sContextStart = "<HTML><BODY><!--StartFragment -->";
    const string sContextEnd = "<!--EndFragment --></BODY></HTML>";
    // Header with fixed-width placeholders that are patched with the real offsets below.
    const string m_sDescription = "Version:1.0\r\n" +
        "StartHTML:aaaaaaaaaa\r\n" +
        "EndHTML:bbbbbbbbbb\r\n" +
        "StartFragment:cccccccccc\r\n" +
        "EndFragment:dddddddddd\r\n";
    // The anchor markup was lost from the original post; it is reconstructed here from the method's parameters.
    string sHtmlFragment = "<A HREF=\"" + link + "\">" + description + "</A>";
    string sData = m_sDescription + sContextStart + sHtmlFragment + sContextEnd;
    // Each placeholder is 10 characters wide and is replaced by a 10-digit number, so the offsets stay valid.
    // Lengths are counted in characters, which matches the UTF-8 byte count only for ASCII content.
    sData = sData.Replace("aaaaaaaaaa", m_sDescription.Length.ToString().PadLeft(10, '0'));
    sData = sData.Replace("bbbbbbbbbb", sData.Length.ToString().PadLeft(10, '0'));
    sData = sData.Replace("cccccccccc", (m_sDescription + sContextStart).Length.ToString().PadLeft(10, '0'));
    sData = sData.Replace("dddddddddd", (m_sDescription + sContextStart + sHtmlFragment).Length.ToString().PadLeft(10, '0'));
    Clipboard.SetDataObject(new DataObject(DataFormats.Html, sData), true);
}
Let me share a helper for setting the clipboard data as HTML, which I've just come up with for my little side project @DevComrade:
var dataObject = new DataObject();
dataObject.SetData(DataFormats.Html, ClipboardFormats.ConvertHtmlToClipboardData(html));
Host.SetClipboardDataObject(dataObject);

using System;
using System.Linq;

internal static class ClipboardFormats
{
    static readonly string HEADER =
        "Version:0.9\r\n" +
        "StartHTML:{0:0000000000}\r\n" +
        "EndHTML:{1:0000000000}\r\n" +
        "StartFragment:{2:0000000000}\r\n" +
        "EndFragment:{3:0000000000}\r\n";

    static readonly string HTML_START =
        "<html>\r\n" +
        "<body>\r\n" +
        "<!--StartFragment-->";

    static readonly string HTML_END =
        "<!--EndFragment-->\r\n" +
        "</body>\r\n" +
        "</html>";

    public static string ConvertHtmlToClipboardData(string html)
    {
        var encoding = new System.Text.UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        var data = Array.Empty<byte>();

        // First pass: a header with dummy offsets, used only to measure its length in bytes.
        var header = encoding.GetBytes(String.Format(HEADER, 0, 1, 2, 3));
        data = data.Concat(header).ToArray();

        var startHtml = data.Length;
        data = data.Concat(encoding.GetBytes(HTML_START)).ToArray();

        var startFragment = data.Length;
        data = data.Concat(encoding.GetBytes(html)).ToArray();

        var endFragment = data.Length;
        data = data.Concat(encoding.GetBytes(HTML_END)).ToArray();

        var endHtml = data.Length;

        // Second pass: the real header, which must have the same byte length as the dummy one.
        var newHeader = encoding.GetBytes(
            String.Format(HEADER, startHtml, endHtml, startFragment, endFragment));
        if (newHeader.Length != startHtml)
        {
            throw new InvalidOperationException(nameof(ConvertHtmlToClipboardData));
        }

        Array.Copy(newHeader, data, length: startHtml);
        return encoding.GetString(data);
    }
}
I used this and this as references. Also, kudos to @DaveyBoy for spotting a bug.
Arthur is right about the header, but the important thing to note here is that the data isn't going to be on the clipboard as plain text. You have to use CF_HTML. You can read about that at MSDN: http://msdn.microsoft.com/en-us/library/aa767917(v=vs.85).aspx
To be proper, you'd have a CF_TEXT showing simply: "Hello!", and then CF_HTML with the HTML header and data, as in Arthur's example.
As Arthur mentioned, I used the code from "Setting HTML/Text to Clipboard revisited".
I had to add line feeds to the header to get it to work (in this case in VB):
Private Const Header As String = "Version:0.9" & vbCrLf & "StartHTML:<<<<<<<<1" & vbCrLf & "EndHTML:<<<<<<<<2" & vbCrLf & "StartFragment:<<<<<<<<3" & vbCrLf & "EndFragment:<<<<<<<<4" & vbCrLf & "StartSelection:<<<<<<<<3" & vbCrLf & "EndSelection:<<<<<<<<4"
Hope this helps
I've just been told that my links must stop causing full postbacks, so I added an Ajax UpdatePanel to handle the requests instead. My links are generated in the code-behind and populate a placeholder. I need to update the code so that the links can run server-side code. I have tried adding controls programmatically via Controls.Add, but I couldn't get anything to run server-side code.
while (myReader.Read())
{
    try
    {
        eventListCounter++;
        string strEventTitle = myReader["eventTitle"].ToString();
        string strEventThumb = myReader["eventThumb"].ToString();
        string strEventInfo = myReader["eventInfo"].ToString();
        int eventID = Int32.Parse(myReader["ID"].ToString());
        strHTML += "<div class='styleWeekRedCarpetEventBox'><div class='styleWeekRedCarpetPictureBox'>";
        strHTML += "<a href='RedCarpet.aspx?eventID=" + eventID.ToString() + "' runat='server' id='linkEventShowImageSet'><img Width='125' Height='95' src='Images/" + strEventThumb.ToString() + "' border='0'></a></div><div id='styleWeekRedCarpetPictureBoxText'><p><a href='RedCarpet.aspx?eventID=" + eventID + "' runat='server' id='linkEventShowImageSet'><b>" + strEventTitle.ToString() + "</b><br />" + strEventInfo.ToString() + "</a></p></div></div><br><br>";
        if (eventListCounter == 5)
        {
            strHTML += "</div><div class=\"eventListColumn\">";
            eventListCounter = 0;
        }
    }
    catch (Exception strError)
    {
        Response.Write(strError.ToString());
    }
}
LiteralControl eventListPlaceholder = new LiteralControl(strHTML.ToString());
PlaceHolder2.Controls.Add(eventListPlaceholder);
This looks like the perfect place to use a Repeater control. You should shy away from using Response.Write and let the framework abstract that away for you.
There is a high chance that this will solve the event not being fired server-side.
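As a rough illustration of the Repeater suggestion, here is a hedged sketch of the code-behind, with the matching markup shown as a comment. The names EventRepeater, ShowImageSet, RedCarpetPage and LoadEvents are hypothetical, the sample row is made up, and the column names are taken from the reader fields in the question. Markup inside a LiteralControl is sent to the browser as plain text, so a runat='server' attribute written there never creates a server control; a LinkButton inside an ItemTemplate does.

```csharp
// Markup for the .aspx page (inside the UpdatePanel), shown here as a comment:
//
// <asp:Repeater ID="EventRepeater" runat="server" OnItemCommand="EventRepeater_ItemCommand">
//   <ItemTemplate>
//     <div class="styleWeekRedCarpetEventBox">
//       <asp:LinkButton runat="server" CommandName="ShowImageSet"
//                       CommandArgument='<%# Eval("ID") %>'>
//         <img width="125" height="95" src='Images/<%# Eval("eventThumb") %>' alt="" />
//         <b><%# Eval("eventTitle") %></b><br /><%# Eval("eventInfo") %>
//       </asp:LinkButton>
//     </div>
//   </ItemTemplate>
// </asp:Repeater>

using System;
using System.Data;
using System.Web.UI;
using System.Web.UI.WebControls;

public partial class RedCarpetPage : Page
{
    // In a real project this field is generated in the page's designer file.
    protected Repeater EventRepeater;

    protected void Page_Load(object sender, EventArgs e)
    {
        // Bind only on the first request; view state restores the items on postbacks,
        // which lets ItemCommand fire for the link that was clicked.
        if (!IsPostBack)
        {
            EventRepeater.DataSource = LoadEvents();
            EventRepeater.DataBind();
        }
    }

    protected void EventRepeater_ItemCommand(object source, RepeaterCommandEventArgs e)
    {
        if (e.CommandName == "ShowImageSet")
        {
            int eventID = Convert.ToInt32(e.CommandArgument);
            // Server-side work for the selected event goes here.
        }
    }

    // Hypothetical stand-in for the database query in the question.
    private static DataTable LoadEvents()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("eventTitle", typeof(string));
        table.Columns.Add("eventThumb", typeof(string));
        table.Columns.Add("eventInfo", typeof(string));
        table.Rows.Add(1, "Sample event", "sample.jpg", "Sample description");
        return table;
    }
}
```

Because the LinkButton sits inside the UpdatePanel, its click would be handled as a partial postback rather than a full page reload.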