How can I load only specific elements in AngleSharp? - c#

I'm using AngleSharp to parse HTML5. At the moment I wrap the elements I want to parse in a little bit of HTML to make a valid HTML5 document and then run the parser on that. Is there a better way of doing it, meaning parsing specific elements directly and validating that the structure is indeed HTML5?

Hm, a little example would be nice. But AngleSharp does support fragment parsing, which sounds like the thing you want. In general fragment parsing is also applied when you set properties like InnerHtml, which transform strings to DOM nodes.
You can use the ParseFragment method of the HtmlParser class to get a list of nodes contained in the given source code. An example:
using AngleSharp.Parser.Html;
// ...
var source = "<div><span class=emphasized>Works!</span></div>";
var parser = new HtmlParser();
var nodes = parser.ParseFragment(source, null);//null = no context given
if (nodes.Length == 0)
Debug.WriteLine("Apparently something bad happened...");
foreach (var node in nodes)
{
// Examine the node
}
Usually all nodes will be IText or IElement types. Also comments (IComment) are possible. You will never see IDocument or IDocumentFragment nodes attached to such an INodeList. However, since HTML5 is quite robust it is very likely that you will never experience "errors" using this method.
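To make the node types concrete, here is a minimal sketch of how the "Examine the node" step might look. It assumes the same AngleSharp version as the snippet above; NodeDescriber and Describe are hypothetical names introduced only for illustration.

```csharp
using AngleSharp.Dom;

// Hypothetical helper: turns a node returned by ParseFragment into a short description.
public static class NodeDescriber
{
    public static string Describe(INode node)
    {
        // Elements such as <div> or <span>
        var element = node as IElement;
        if (element != null)
            return "element <" + element.LocalName + ">";

        // Comments such as <!-- note -->
        var comment = node as IComment;
        if (comment != null)
            return "comment: " + comment.Data;

        // Plain text between elements
        var text = node as IText;
        if (text != null)
            return "text: " + text.Data;

        // Anything else is reported by its node type
        return "other: " + node.NodeType;
    }
}
```

Inside the foreach loop above, a call such as Debug.WriteLine(NodeDescriber.Describe(node)) would then show what the fragment parser produced for each top-level node.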
What you can do is to look for (parsing) errors. You need to provide an IConfiguration that exposes an event aggregator, which collects such events. The simplest implementation for aggregating only such events (without possibility of adding / removing multiple handlers) is the following:
using AngleSharp.Events;
// ...
class SimpleEventAggregator : IEventAggregator
{
readonly List<HtmlParseErrorEvent> _errors = new List<HtmlParseErrorEvent>();
public void Publish<TEvent>(TEvent data)
{
var error = data as HtmlParseErrorEvent;
if (error != null)
_errors.Add(error);
}
public List<HtmlParseErrorEvent> Errors
{
get { return _errors; }
}
public void Subscribe<TEvent>(ISubscriber<TEvent> listener) { }
public void Unsubscribe<TEvent>(ISubscriber<TEvent> listener) { }
}
The simplest way to use the event aggregator with a configuration is to instantiate a new (provided) Configuration. Here is a sample snippet:
using AngleSharp;
// ...
var errorEvents = new SimpleEventAggregator();
var config = new Configuration(events: errorEvents);
Please note: Every error that is reported is an "official" error (according to W3C spec.). These errors do not indicate that the provided code is malicious or invalid, just that something is not following the spec and that a fallback had to be applied.
Hope this answers your question. If not, then please let me know.
Update: The answer has been updated for the latest version of AngleSharp.

How can I modify steps/tags with a specflow generator plugin?

I am trying to write a generator plugin for specflow to modify the steps or tags that end up in the generated feature.cs files.
I have tried to follow the method described here:
SpecFlow- is it possible to programmatically add lines to a scenario?
relevant code:
public new CodeNamespace GenerateUnitTestFixture(Feature feature, string testClassName, string targetNamespace)
{
foreach (var scenario in feature.Scenarios)
{
scenario.Steps.Insert(0, new Given { Text = "Given I have <Theme> set as my current theme" });
//add any other steps you need....
}
return base.GenerateUnitTestFixture(feature, testClassName, targetNamespace);
}
The definition of GenerateUnitTestFixture in UnitTestFeatureGenerator has since changed. Using roughly the same method, I am unable to insert steps: the entire GherkinDocument argument passed in to the function is read-only, including the steps, so they cannot be modified or added to.
public new CodeNamespace GenerateUnitTestFixture(SpecFlowDocument document, string testClassName, string targetNamespace)
{
var feature = document.Feature;
foreach (ScenarioDefinition scenarioDefinition in feature.Children)
{
scenarioDefinition.Steps.///Cannot insert
}
return base.GenerateUnitTestFixture(document, testClassName, targetNamespace);
}
The same problem occurs for tags. I can't find much up-to-date documentation on this. Is it still possible to modify steps/tags, or is it no longer supported?

CodeFluent Aspect for Full-Text Index

I'm trying to develop a CodeFluent aspect that marks a property of an entity as a full-text index.
I've found this link, which does something similar to what I'm aiming for.
http://blog.codefluententities.com/2012/11/27/using-the-sql-server-template-producer-to-generate-clustered-indexes/
However, this uses a SQL template producer. Is there any way to set a property to be a full-text index entirely in the aspect itself, so I don't have to install and maintain both a template producer and an aspect for all projects?
Here's the C# aspect code I have so far:
public class FullTextIndexing : IProjectTemplate
{
public static readonly XmlDocument Descriptor;
public const string Namespace = "http://www.softfluent.com/aspects/samples/FullTextIndexing";
static FullTextIndexing()
{
Descriptor = new XmlDocument();
Descriptor.LoadXml(
@"<cf:project xmlns:cf='http://www.softfluent.com/codefluent/2005/1' defaultNamespace='FullTextIndexing'>
<cf:pattern name='Full Text Indexing' namespaceUri='" + Namespace + @"' preferredPrefix='fti' step='Tables'>
<cf:message class='_doc'>CodeFluent Full Text Indexing Aspect</cf:message>
<cf:descriptor name='fullTextIndexing'
typeName='boolean'
category='Full Text Indexing'
targets='Property'
defaultValue='false'
displayName='Full-Text Index'
description='Determines if property should be full text indexed.' />
</cf:pattern>
</cf:project>");
}
public Project Project { get; set; }
public XmlDocument Run(IDictionary context)
{
if (context == null || !context.Contains("Project"))
{
// we are probably called for meta data inspection, so we send back the descriptor xml
return Descriptor;
}
// the dictionary contains at least these two entries
Project = (Project)context["Project"];
// the dictionary contains at least these two entries
XmlElement element = (XmlElement)context["Element"];
Project project = (Project)context["Project"];
foreach (Entity entity in project.Entities)
{
Console.WriteLine(">>PROPERTY LOGGING FOR ENTITY "+entity.Name.ToUpper()+":<<");
foreach (Property property in entity.Properties)
{
Log(property);
if(MustFullTextIndex(property))
{
Console.WriteLine("CHANGING PROPERTY");
property.TypeName = "bool";
Log(property);
}
}
}
// we have no specific Xml to send back, but aspect description
return Descriptor;
}
private static bool MustFullTextIndex(Property property)
{
return property != null && property.IsPersistent && property.GetAttributeValue("fullTextIndexing", Namespace, false);
}
private static void Log(Property property)
{
Console.WriteLine(property.Trace());
}
}
EDIT ONE:
Following Meziantou's answer, I'm trying to create a template producer, but it's giving me compilation errors when I try to add the new template producer to the project producers list, so I'm probably doing it wrong.
The error says:
Cannot convert type 'CodeFluent.Model.Producer' to 'CodeFluent.Producers.SqlServer.TemplateProducer'
Here's the code I have thus far:
public XmlDocument Run(IDictionary context)
{
if (context == null || !context.Contains("Project"))
{
// we are probably called for meta data inspection, so we send back the descriptor xml
return Descriptor;
}
// the dictionary contains at least these two entries
XmlElement element = (XmlElement)context["Element"];
Project project = (Project)context["Project"];
CodeFluent.Producers.SqlServer.TemplateProducer producer = new CodeFluent.Producers.SqlServer.TemplateProducer();
producer.AddNamespace("CodeFluent.Model");
producer.AddNamespace("CodeFluent.Model.Persistence");
producer.AddNamespace("CodeFluent.Producers.SqlServer");
Console.WriteLine(producer.Element);
//TODO: Need to figure out how to modify the actual template's contents
project.Producers.Add(producer); //Error happens here
// we have no specific Xml to send back, but aspect description
return Descriptor;
}
In the sample code, the aspect is used only because it has a descriptor. Descriptors are used by CodeFluent Entities to populate the property grid:
<cf:descriptor name="IsClusteredIndex" typeName="boolean" targets="Property" defaultValue="false" displayName="IsClusteredIndex" />
So when you set the value of this property to true or false, the xml attribute ns:IsClusteredIndex is added or removed from the xml file.
Then the SQL Template reads the value of the attribute to generate the expected SQL file:
property.GetAttributeValue("sa:IsClusteredIndex", false)
So the aspect is not mandatory, but provides a graphical interface friendly way to add/remove the attribute. If you don't need to integrate into the graphical interface, you can safely remove the aspect.
If your goal is to integrate into the graphical interface, you need an aspect (XML or DLL) or a producer. If you don't want to create a producer, you can embed the template into your aspect. During the build, you can extract the SQL template and add the SQL Template producer to the project. This way, everything is located in the aspect.
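As a rough sketch of the "extract the SQL template" step, the following uses only standard .NET to copy an embedded resource out of the aspect assembly. The class name, the method name, and any resource name passed to it are hypothetical; adding the SQL Template producer to the project is not shown here because it depends on the CodeFluent API.

```csharp
using System.IO;
using System.Reflection;

// Hypothetical helper: copies a template embedded in an assembly to a file on disk.
public static class EmbeddedTemplate
{
    // Returns false when the assembly contains no resource with the given name.
    public static bool TryExtract(Assembly assembly, string resourceName, string targetPath)
    {
        using (Stream source = assembly.GetManifestResourceStream(resourceName))
        {
            if (source == null)
                return false; // resource not found, nothing is written

            using (FileStream target = File.Create(targetPath))
            {
                source.CopyTo(target);
            }
            return true;
        }
    }
}
```

The aspect's Run method could call this with its own assembly (for example typeof(FullTextIndexing).Assembly) and a target path in the project directory before the template producer is configured.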

Keeping track of user customizations c#

Good evening. I have an application with a drop-down list of commonly visited websites, which the user can alter.
My question is how I can store these values in such a manner that users can change them.
For example, I as the user decide I want Google to be my first website and YouTube to be my second.
I have considered making a "settings" file; however, is it practical to put 20+ websites into a settings file and then load them at startup? A local database is another option, but this may be overkill for such a simple need.
Please point me in the right direction.
Given that you have already excluded a database (probably for the right reasons, as it may be overkill for a small app), I'd recommend writing the data to a local file, but not as plain text. Preferably, serialize it as either XML or JSON.
This approach has at least two benefits:
- More complex data can be stored in the future. For example, while the order can be implicit, it can also be made explicit, or additional data can be added, such as the last time the URL was used.
- Structured data is easier to validate against random corruption. With a plain text file, it is much harder to ensure integrity.
The best approach is to use the serializer and deserializer classes in C#, which let you work with the file in an object-oriented way. At the same time, you don't need to worry about the details of writing to and reading from files.
Here is some sample code I quickly wrote for you.
using System;
using System.IO;
using System.Collections;
using System.Xml.Serialization;
namespace ConsoleApplication3
{
public class UrlSerializer
{
public static void Write(string filename)
{
URLCollection urls = new URLCollection();
urls.Add(new Url { Address = "http://www.google.com", Order = 1 });
urls.Add(new Url { Address = "http://www.yahoo.com", Order = 2 });
XmlSerializer x = new XmlSerializer(typeof(URLCollection));
// Dispose the writer so the file is flushed and closed
using (TextWriter writer = new StreamWriter(filename))
{
x.Serialize(writer, urls);
}
}
public static URLCollection Read(string filename)
{
var x = new XmlSerializer(typeof(URLCollection));
// Dispose the reader so the file handle is released
using (TextReader reader = new StreamReader(filename))
{
return (URLCollection)x.Deserialize(reader);
}
}
}
public class URLCollection : ICollection
{
public string CollectionName;
private ArrayList _urls = new ArrayList();
public Url this[int index]
{
get { return (Url)_urls[index]; }
}
public void CopyTo(Array a, int index)
{
_urls.CopyTo(a, index);
}
public int Count
{
get { return _urls.Count; }
}
public object SyncRoot
{
get { return this; }
}
public bool IsSynchronized
{
get { return false; }
}
public IEnumerator GetEnumerator()
{
return _urls.GetEnumerator();
}
public void Add(Url url)
{
if (url == null) throw new ArgumentNullException("url");
_urls.Add(url);
}
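}
// Data class for a single entry; XmlSerializer needs public members and a parameterless constructor
public class Url
{
public string Address { get; set; }
public int Order { get; set; }
}
}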
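If a custom collection class is not needed, the same idea can be sketched more compactly with a List<T>. This is a minimal variant under that assumption, not the code above; SiteEntry and SiteStore are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical data class for one saved website.
public class SiteEntry
{
    public string Address { get; set; }
    public int Order { get; set; }
}

// Hypothetical helper that saves and loads the list as XML.
public static class SiteStore
{
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(List<SiteEntry>));

    public static void Save(string path, List<SiteEntry> sites)
    {
        using (var writer = new StreamWriter(path))
        {
            Serializer.Serialize(writer, sites);
        }
    }

    public static List<SiteEntry> Load(string path)
    {
        // First start: no file yet, so begin with an empty list
        if (!File.Exists(path))
            return new List<SiteEntry>();

        using (var reader = new StreamReader(path))
        {
            return (List<SiteEntry>)Serializer.Deserialize(reader);
        }
    }
}
```

The application would call SiteStore.Load at startup to fill the drop-down list and SiteStore.Save whenever the user changes it.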
You clearly need some sort of persistence, for which there are a few options:
1. Local database - As you have noted, total overkill. You are just storing a list, not relational data.
2. Simple text file - Pretty easy, but maybe not the most "professional" way. Using XML serialization to this file would allow for complex data types.
3. Settings file - Are these preferences really settings? If they are, then this makes sense.
4. The Registry - This is great for settings you don't want your users to ever manually mess with. Probably not the best option for a significant amount of data though.
I would go with number 2. It doesn't sound like you need any fancy encoding or security, so just store everything in a text file. *.ini files tend to meet this description, but you can use any extension you want. A settings file doesn't seem like the right place for this scenario.
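A minimal sketch of the plain text option, assuming one URL per line and list order as display order. FavoritesFile is a hypothetical name; the file path is up to the application.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: stores one URL per line in a plain text file.
public static class FavoritesFile
{
    public static void Save(string path, IEnumerable<string> urls)
    {
        // The order of the lines is the order shown in the drop-down list
        File.WriteAllLines(path, urls);
    }

    public static List<string> Load(string path)
    {
        // First start: no file yet, so begin with an empty list
        if (!File.Exists(path))
            return new List<string>();

        return File.ReadAllLines(path)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0) // skip blank lines
            .ToList();
    }
}
```

This keeps the file editable by hand, at the cost of not being able to store anything beyond the URL itself.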

Convert a string or html file to C# HtmlDocument without using WebBrowser or HAP

The only solution I could find was using:
mshtml.HTMLDocument htmldocu = new mshtml.HTMLDocument();
htmldocu.createDocumentFromUrl(url, "");
I am not sure about the performance, but it should be better than loading the HTML file in a WebBrowser and then grabbing the HtmlDocument from there. In any case, that code does not work on my machine: the application crashes when it tries to execute the second line.
Does anyone have an approach to achieve this efficiently, or any other way to do it?
NOTE: Please understand that I need the HtmlDocument object for DOM processing. I do not need the html string.
Use the DownloadString method of the WebClient class, e.g.
WebClient client = new WebClient();
string reply = client.DownloadString("http://www.google.com");
In the above example, after execution, reply will contain the HTML markup returned by the endpoint http://www.google.com.
WebClient.DownloadString MSDN
In an attempt to answer your actual question from four years ago (at the time of me posting this answer), I'm providing a working solution. I wouldn't be surprised if you found another way to do this, either, so this is mostly for other people searching for a similar solution. Keep in mind, however, that this approach is:
- somewhat obsolete (the actual use of HtmlDocument)
- not the best way to handle HTML DOM parsing (the preferred solution is to use HtmlAgilityPack or CsQuery or some other method using actual parsing and not regular expressions)
- extremely hacky and therefore not the safest/most compatible way to do it
- something you really should not be doing
Additionally, keep in mind that HtmlDocument is really just a wrapper for mshtml.HTMLDocument2, so it is technically slower than just using a COM wrapper directly, but I completely understand the use case simply for ease of coding.
If you're cool with all of the above, here's how to accomplish what you want.
using System;
using System.Reflection;
using System.Windows.Forms;
public class HtmlDocumentFactory
{
private static Type htmlDocType = typeof(System.Windows.Forms.HtmlDocument);
private static Type htmlShimManagerType = null;
private static object htmlShimSingleton = null;
private static ConstructorInfo docCtor = null;
public static HtmlDocument Create()
{
if (htmlShimManagerType == null)
{
// get a type reference to HtmlShimManager
htmlShimManagerType = htmlDocType.Assembly.GetType(
"System.Windows.Forms.HtmlShimManager"
);
// locate the necessary private constructor for HtmlShimManager
var shimCtor = htmlShimManagerType.GetConstructor(
BindingFlags.NonPublic | BindingFlags.Instance, null, new Type[0], null
);
// create a new HtmlShimManager object and keep it for the rest of the
// assembly instance
htmlShimSingleton = shimCtor.Invoke(null);
}
if (docCtor == null)
{
// get the only constructor for HtmlDocument (which is marked as private)
docCtor = htmlDocType.GetConstructors(
BindingFlags.NonPublic | BindingFlags.Instance
)[0];
}
// create an instance of mshtml.HTMLDocument2 (in the form of
// IHTMLDocument2 using HTMLDocument2's class ID)
object htmlDoc2Inst = Activator.CreateInstance(Type.GetTypeFromCLSID(
new Guid("25336920-03F9-11CF-8FD0-00AA00686F13")
));
var argValues = new object[] { htmlShimSingleton, htmlDoc2Inst };
// create a new HtmlDocument without involving WebBrowser
return (HtmlDocument)docCtor.Invoke(argValues);
}
}
To use it:
var htmlDoc = HtmlDocumentFactory.Create();
htmlDoc.Write("<html><body><div>Hello, world!</div></body></html>");
Console.WriteLine(htmlDoc.Body.InnerText);
// output:
// Hello, world!
I have not tested this code directly -- I have translated it from an old Powershell script that needed the same functionality you're requesting. If it fails, let me know. The functionality is there but the code might need very minor tweaking to get working.

sitecore RSS caching

I have been working on implementing a custom RSS feed in Sitecore 6.4. My custom behaviour is very limited: all I effectively wanted to do is add a link for the author (our author field is a reference field, so we cannot use the built-in author attribution).
I overrode RenderItem() on the PublicFeed class so that I could make use of my own implementation of the FeedRenderer class (where the author logic is housed). My approach follows the pattern outlined by John West for adding your own rendering behaviour:
public class MyPUblicFeed: PublicFeed
{
protected override SyndicationItem RenderItem(Item item)
{
Assert.ArgumentNotNull(item, "item");
Control rendererControl = FeedUtil.GetFeedRendering(item);
if (rendererControl == null)
{
return null;
}
using (new ContextItemSwitcher(item))
{
var myRenderer= rendererControl as MyFeedRenderer;
if (myRenderer!= null)
{
myRenderer.Database = SitecoreHelper.CurrentDatabase.Name;
return myRenderer.RenderItem();
}
var renderer = rendererControl as Sitecore.Web.UI.WebControls.FeedRenderer;
if (renderer != null)
{
renderer.Database = SitecoreHelper.CurrentDatabase.Name;
return renderer.RenderItem();
}
}
throw new InvalidOperationException("FeedRenderer rendering must be of Sitecore.Web.UI.WebControls.FeedRenderer type");
}
}
And now for my rendering class:
public class MyFeedRenderer: Sitecore.Web.UI.WebControls.FeedRenderer
{
public override SyndicationItem RenderItem()
{
Item item = base.GetItem();
var syndicationItem = base.RenderItem();
//unfortunately we have to parse params again :(
FeedRenderingParameters feedRenderingParameter = FeedRenderingParameters.Parse(base.Parameters);
AddAuthor(syndicationItem, item, feedRenderingParameter);
return syndicationItem;
}
private static void AddAuthor(SyndicationItem syndicationItem, Item item, FeedRenderingParameters feedRenderingParameter)
{
//clear out authors added by base class
syndicationItem.Authors.Clear();
//logic for adding author here
}
}
This all works great, outputting exactly what I want, but the caching element doesn't appear to be working. I have set the Cacheable flag on the actual item itself with a timespan of 01:00:00. This didn't appear to work: if I put a breakpoint in either of the above classes, it is hit every time the feed is requested.
So then I tried to enable caching at a control level, turning caching on with VaryByData for the MyFeedRenderer rendering. Alas, this isn't working either; the breakpoint is hit every time.
Can anyone offer any advice on this matter? The documentation simply recommends turning it on on the actual feed item, not at the rendering level, but neither seems to be working for me. Interestingly, HTML caching is working elsewhere. Is RSS also put into the HTML cache?
Thanks in advance,
Nick
- Ensure the Cacheable checkbox in the feed definition item is checked.
- Ensure that you have published the feed definition item.
- If you do not populate the Cache Duration field in the feed definition item, it should default to one day.
- Feeds appear to cache in Sitecore.Syndication.FeedManager.Cache rather than the site output cache. Inspect that cache object in the Visual Studio debugger after calling your feed, and then again after calling that feed a second time, to try to see if any records appear, and if multiple cache keys appear for the same feed. Investigate the Render() method; if PublicFeed.IsCacheable() returns false (depending on the Cacheable field in the feed definition item), PublicFeed.Render() does not cache.
- Ensure nothing else clears caches between your requests for the feed.
SDN forum thread: http://sdn.sitecore.net/forum/ShowPost.aspx?PostID=40591
