Using XhtmlTextWriter with XmlTextReader

Using XhtmlTextWriter with XmlTextReader - c#

After reading this article, I have decided to update the following code (using XmlDocument) with XmlReader:
Rendering controls
public string Rendering(Control baseControl)
{
StringBuilder stringBuilder = new StringBuilder();
using (StringWriter stringWriter = new StringWriter(stringBuilder))
using (XhtmlTextWriter htmlWriter = new XhtmlTextWriter(stringWriter))
{
baseControl.RenderControl(htmlWriter);
return PretifyWithNewlines(stringBuilder.ToString());
}
}
Adding newline after each node
private string PretifyWithNewlines(string minifiedMarkup)
{
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.XmlResolver = null;
try
{
xmlDocument.LoadXml("<base>" + minifiedMarkup + "</base>");
}
catch // when minifiedMarkup contains the whole HTML with DTD tag defined,
{ // it throws an exception with <base>
xmlDocument.LoadXml(minifiedMarkup);
}
return recursiveOperation(xmlDocument.ChildNodes)
.Replace(Environment.NewLine + Environment.NewLine, Environment.NewLine)
.Replace(Environment.NewLine + "<base>" + Environment.NewLine, "")
.Replace(Environment.NewLine + "</base>" + Environment.NewLine, "");
}
Recursively traverse each node and plant new element
private static string recursiveOperation(XmlNodeList xmlNodeList)
{
string result = "";
foreach (XmlNode currentNode in xmlNodeList)
{
XmlNode clonedNode = currentNode;
string interimMarkup = recursiveOperation(currentNode.ChildNodes);
try
{
clonedNode.InnerXml = interimMarkup;
}
finally
{
result += Environment.NewLine + clonedNode.OuterXml + Environment.NewLine;
}
}
return result;
}
Questions:
Is there a room for optimizing the existing code?
How would I go about directly instantiating XmlTextReader from Control, StringWriter or XhtmlTextWriter object? Or do I really need to render it as a string first then instantiate XmlTextReader?
Edit
As per Jon Skeet's answer, here is the update. The idea is to implode a newline after each element:
Before prettify:
<div class="tag"><span>text<span class="another-span"></span></span></div><div>Text<img src="some/relative/URL/" />namely</div>
After prettify:
<div class="tag">
<span>
text
<span class="another-span"></span>
</span>
</div>
<div>
Text
<img src="some/relative/URL/" />
namely
</div>
Notice how span.another-span keep collapsed while everything else (with child nodes) expanded. The indentation will be asserted by Visual Studio.

Is there a room for optimizing the existing code?
Absolutely. The first place I'd change has nothing to do with how you load the XML - it's string concatenation. I'd change your recursiveOperation method to:
private static string RecursiveOperation(XmlNodeList xmlNodeList)
{
StringBuilder result = new StringBuilder();
foreach (XmlNode currentNode in xmlNodeList)
{
XmlNode clonedNode = currentNode;
// Remove try/finally block - if an exception is thrown your
// result will be lost anyway
string interimMarkup = RecursiveOperation(currentNode.ChildNodes);
clonedNode.InnerXml = interimMarkup;
result.Append(Environment.NewLine)
.Append(clonedNode.OuterXml)
.Append(Environment.NewLine);
}
return result.ToString();
}
It's possible you could optimize this further using a single StringBuilder passed into RecursiveOperation, but I haven't quite got to grips with your code sufficiently to say yet. (It's before the first coffee of the morning.)
In terms of the XML handling itself, you're currently doing a lot of reparsing by setting the OuterXml node in each child (recursively). I suspect if I had a better grasp of what you were doing, it would be feasible to change the whole approach. Given that this is functionality that XmlReader really doesn't have (it wouldn't make sense), it's not clear that you should be taking much notice of the other article at the moment.
How would I go about directly instantiating XmlTextReader from Control, StringWriter or XhtmlTextWriter object? Or do I really need to render it as a string first then instantiate XmlTextReader?
It's not clear what it would even mean to create an XmlTextReader from any of those objects - they're not inherently a source of XML data. I think what you've got already looks reasonable to me.
If you're still concerned about the performance, you should avoid guesswork and use a profiler to measure where the time is being taken. You should set yourself a target first though, otherwise you won't know when you've finished optimizing.

Related

Iterate through web pages and download PDFs

I have a code for crawling through all PDF files on web page and download them to folder. However now it started to drop an error:
System.NullReferenceException HResult=0x80004003 Message=Object
reference not set to an instance of an object. Source=NW Crawler
StackTrace: at NW_Crawler.Program.Main(String[] args) in
C:\Users\PC\source\repos\NW Crawler\NW Crawler\Program.cs:line 16
Pointing to ProductListPage in foreach (HtmlNode src in ProductListPage)
Is there any hint on how to fix this issue? I have tried to implement async/await with no success. Maybe I was doing something wrong tho...
Here is the process to be done:
Go to https://www.nordicwater.com/products/waste-water/
List all links in section (related products). They are: <a class="ap-area-link" href="https://www.nordicwater.com/product/mrs-meva-multi-rake-screen/">MRS MEVA multi rake screen</a>
Proceed to each link and search for PDF files. PDF files are in:
<div class="dl-items">
<a href="https://www.nordicwater.com/wp-content/uploads/2016/04/S1126-MRS-brochure-EN.pdf" download="">
Here is my full code for testing:
using HtmlAgilityPack;
using System;
using System.Net;
namespace NW_Crawler
{
class Program
{
static void Main(string[] args)
{
{
HtmlDocument htmlDoc = new HtmlWeb().Load("https://www.nordicwater.com/products/waste-water/");
HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//a[#class='ap-area-link']//a");
Console.WriteLine("Here are the links:" + ProductListPage);
foreach (HtmlNode src in ProductListPage)
{
htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);
// Thread.Sleep(5000); // wait some time
HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[#class='dl-items']//a");
if (LinkTester != null)
{
foreach (var dllink in LinkTester)
{
string LinkURL = dllink.Attributes["href"].Value;
Console.WriteLine(LinkURL);
string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
var DLClient = new WebClient();
// Thread.Sleep(5000); // wait some time
DLClient.DownloadFileAsync(new Uri(LinkURL), #"C:\temp\" + ExtractFilename);
}
}
}
}
}
}
}

Made a couple of changes to cover the errors you might be seeing.
Changes
Use of src.GetAttributeValue("href", string.Empty) instead of src.Attribute["href"].Value;. If the href is not present or null, you will get Object Reference Not Set to an instance of an object
Check if ProductListPage is valid and not null.
ExtractFileName includes a / in the name. You want to use + 1 in the substring method to skip that 'Last / from index of)'.
Move on to the next iteration if the href is null on either of the loops
Changed the Product List query to //a[#class='ap-area-link'] from //a[#class='ap-area-link']//a. You were searching for <a> within the <a> tag which is null. Still, if you want to query it this way, the first IF statement to check if ProductListPage != null will take care of errors.
HtmlDocument htmlDoc = new HtmlWeb().Load("https://www.nordicwater.com/products/waste-water/");
HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//a[#class='ap-area-link']");
if (ProductListPage != null)
foreach (HtmlNode src in ProductListPage)
{
string href = src.GetAttributeValue("href", string.Empty);
if (string.IsNullOrEmpty(href))
continue;
htmlDoc = new HtmlWeb().Load(href);
HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[#class='dl-items']//a");
if (LinkTester != null)
foreach (var dllink in LinkTester)
{
string LinkURL = dllink.GetAttributeValue("href", string.Empty);
if (string.IsNullOrEmpty(LinkURL))
continue;
string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/") + 1);
new WebClient().DownloadFileAsync(new Uri(LinkURL), #"C:\temp\" + ExtractFilename);
}
}

The Xpath that you used seems to be incorrect. I tried loading the web page in a browser and did a search for the xpath and got no results. I replaced it with //a[#class='ap-area-link'] and was able to find matching elements, screenshot below.

Creating XML from MarkUp HTML

I've got a few web pages that have static data in HTML mark-up tables. By this, I mean, manually maintained text:
<table border="1" >
<tr><th>Number</th><th>Date</th><th>BW</th><th>WW</th><th>%</th><th>Type</th><th>CED</th><th>BW</th><th>WW</th><th>YW</th><th>Mlk</th><th>Me</th></tr>
<tr><td>313</td><td>9/16/2013</td><td>74</td><td>512</td><td>100</td><td>861U</td><td>3</td><td>-1.1</td><td>54</td><td>85</td><td>16</td><td></td></tr>
<tr><td>315</td><td>10/6/2013</td><td>-</td><td>-</td><td>-</td><td>W179</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr>
<tr><td>316</td><td>10/102013</td><td>72</td><td>595</td><td>94.2</td><td>W179</td><td>7</td><td>-2.3</td><td>53</td><td>80</td><td>21</td><td>-3</td></tr>
<tr><td>350</td><td>10/11/2013</td><td>71</td><td>703</td><td>100</td><td>W179</td><td>7</td><td>-2.3</td><td>46</td><td>72</td><td>20</td><td>-5</td></tr>
<tr><td>392</td><td>3/8/2013</td><td>61</td><td>651</td><td>100</td><td>RANGER</td><td>7</td><td>-2.3</td><td>52</td><td>82</td><td>20</td><td>-2</td></tr>
<tr><td>303</td><td>7/3/2013</td><td>63</td><td>-</td><td>97.1</td><td>W179</td><td>8</td><td>-3.2</td><td>N/A</td><td>82</td><td>21</td><td>-8</td></tr>
<tr><td>304</td><td>7/8/2013</td><td>62</td><td>-</td><td>97.1</td><td>W179</td><td>7</td><td>-3.9</td><td>N/A</td><td>69</td><td>20</td><td>-4</td></tr>
<tr><td>397</td><td>3/18/2013</td><td>78</td><td>621</td><td>100</td><td>STATEMENT</td><td>6</td><td>-2.7</td><td>55</td><td>84</td><td>19</td><td>5</td></tr>
<tr><td>395</td><td>3/17/2013</td><td>63</td><td>716</td><td>94.2</td><td>STATEMENT</td><td>5</td><td>-2.7</td><td>54</td><td>85</td><td>19</td><td>5</td></tr>
<tr><td>390</td><td>3/6/2013</td><td>66</td><td>583</td><td>94.2</td><td>ENVY</td><td>2</td><td>-0.6</td><td>55</td><td>80</td><td>23</td><td>2</td></tr>
<tr><td>388</td><td>3/4/2013</td><td>53</td><td>621</td><td>100</td><td>STATEMENT</td><td>10</td><td>-5.1</td><td>49</td><td>82</td><td>20</td><td>2</td></tr>
<tr><td>300</td><td>3/22/2013</td><td>61</td><td>633</td><td>100</td><td>RANGER</td><td>8</td><td>-2.8</td><td>49</td><td>81</td><td>19</td><td>-2</td></tr>
<tr><td>379</td><td>2/1/2013</td><td>55</td><td>518</td><td>100</td><td>STATEMENT</td><td>8</td><td>-4.1</td><td>61</td><td>98</td><td>18</td><td>1</td></tr>
<tr><td>398</td><td>3/20/2013</td><td>62</td><td>664</td><td>100</td><td>RANGER</td><td>6</td><td>-2.3</td><td>53</td><td>83</td><td>20</td><td>0</td></tr>
<tr><td>384</td><td>2/10/2013</td><td>61</td><td>650</td><td>100</td><td>ENVY</td><td>3</td><td>-1</td><td>50</td><td>70</td><td>19</td><td>4</td></tr>
<tr><td>369</td><td>1/30/2013</td><td>76</td><td>651</td><td>100</td><td>STATEMENT</td><td>5</td><td>-2.4</td><td>60</td><td>99</td><td>20</td><td>8</td></tr>
<tr><td>373</td><td>1/21/2013</td><td>71</td><td>433</td><td>100</td><td>STATEMENT</td><td>4</td><td>-1.6</td><td>55</td><td>89</td><td>17</td><td>3</td></tr>
<tr><td>393</td><td>3/10/2013</td><td>63</td><td>717</td><td>100</td><td>STATEMENT</td><td>3</td><td>-4.6</td><td>51</td><td>91</td><td>20</td><td>5</td></tr>
<tr><td>389</td><td>3/8/2013</td><td>72</td><td>723</td><td>88.3</td><td>ENVY</td><td>4</td><td>-0.6</td><td>54</td><td>76</td><td>24</td><td>2</td></tr>
<tr><td>364</td><td>10/1/2012</td><td>60</td><td>574</td><td>100</td><td>RANGER</td><td>1</td><td>0.4</td><td>56</td><td>84</td><td>21</td><td>2</td></tr>
</table>
Currently, I am contemplating using a WebClient.DownloadString to pull all of the text in, and try to create an XML file out of it by parsing each row <tr>.
That sounds tedious, and I would rather not reinvent the wheel. Besides, a few good solutions would give me something to look at for ideas on how to best approach writing my version.
Has anyone come across some code that can do this?
I've started, to give you an idea of what I'm working on:
private const string XML_DATA = "App_Data/page_data.xml";
private const string TABLE_START = "<table>";
private const string TABLE_STOP = "</table>";
private string[] TABLE_ROW = { "<tr>", "</tr>" };
private string[] TABLE_HEAD = { "<th>", "</th>" };
private string[] TABLE_DET = { "<td>", "</td>" };
private void load_data() {
if (!File.Exists(XML_DATA)) {
string HtmlText;
using (var client = new WebClient()) {
HtmlText = client.DownloadString(Server.MapPath("/Sales.aspx"));
}
if (!String.IsNullOrEmpty(HtmlText)) {
var lcTxt = HtmlText.ToLower();
int len0 = TABLE_START.Length;
int tStart = lcTxt.IndexOf(TABLE_START) + len0;
int tStop = lcTxt.IndexOf(TABLE_STOP);
if ((len0 < tStart) && (tStart < tStop)) {
var tableString = HtmlText.Substring(tStart, tStop - tStart);
var tableRows = tableString.Split(TABLE_ROW, StringSplitOptions.RemoveEmptyEntries);
foreach (var row in tableRows) {
if (-1 < row.IndexOf(TABLE_HEAD[0])) {
//
} else {
//
}
}
}
}
}
}
Of course, you can see that is already going to fail, because the Markup using <table border="1">.
Yes, easy to fix, but I'd rather have a working guide that has already been through a lot of debugging steps.
UPDATE: I tried using XmlDocument's LoadXml method, but it can't seem to read basic HTML:

You definitely shouldn't be trying to parse that manually. Other people have already solved that problem.
If your markup is valid XML (and from what you've shown us, it looks like it is), then you can just parse it as XML:
XmlDocument doc = new XmlDocument();
doc.LoadXml(HtmlString);
doc.Save("myfile.xml");
But for that matter, if it's already valid XML markup, and all you need to do is save it as a file, then you don't need to parse it. Just save it:
File.WriteAllText("myfile.xml", HtmlString);

xmlNode.SelectSingleNode always returns same value even though the node changes

I am reading in a bunch of XML files, transforming them and loading the data in to another system.
Previously I had done this using ThreadPool, however the provider of the files and therefore the structure has changed, so I'm now trying Aysync-Await and getting an odd result.
As I process the files I get a list of the xmlNodes and loop over them
foreach (XmlNode currentVenue in venueNodes)
{
Console.WriteLine(currentVenue.OuterXml);
Console.WriteLine(currentVenue.SelectSingleNode(#"//venueName").InnerText);
}
however the second WriteLine always returns the result expected for the first node, example:
<venue venueID="xartrix" lastModified="2012-08-20 10:49:30"><venueName>Artrix</venueName></venue>
Artrix
<venue venueID="xbarins" lastModified="2013-04-29 11:39:07"><venueName>The Barber Institute Of Fine Arts, University Of Birmingham</venueName></venue>
Artrix
<venue venueID="xbirmus" lastModified="2012-11-13 16:41:13"><venueName>Birmingham Museum & Art Gallery</venueName></venue>
Artrix
here is the complete code:
public async Task ProcessFiles()
{
string[] filesToProcess = Directory.GetFiles(_filePath);
List<Task> tasks = new List<Task>();
foreach (string currentFile in filesToProcess)
{
tasks.Add(Task.Run(()=>processFile(currentFile)));
}
await Task.WhenAll(tasks);
}
private async Task processFile(string currentFile)
{
try
{
XmlDocument currentXmlFile = new XmlDocument();
currentXmlFile.Load(currentFile);
//select nodes for processing
XmlNodeList venueNodes = currentXmlFile.SelectNodes(#"//venue");
foreach (XmlNode currentVenue in venueNodes)
{
Console.WriteLine(currentVenue.InnerXml);
Console.WriteLine(currentVenue.SelectSingleNode(#"//venueName").InnerText);
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
Obviously I've missed something, but I cannot see what, can someone point it out please?

SelectSingleNode returns only a single node in document order from the document. #jbl is correct, //venueName starts from the document root. The // xpath operator is the "descendent selector" operator.
I work with XML and XPath often and this is a common mistake. You need to make sure that your context node is correct when calling SelectSingleNode. So, like we just all said, using //venueName gets the first <venueName /> node in document order starting from the root of the document.
In order to get the <venueName /> node that is a child of the current node you're iterating over, you need to use the following code:
foreach (XmlNode currentVenue in venueNodes)
{
Console.WriteLine(currentVenue.OuterXml);
Console.WriteLine(currentVenue.SelectSingleNode(#".//venueName").InnerText); // The '.' means from the current node. Without it, searching starts from the document root, not currentVenue.
}
That should solve your problem.

Doesn't //venueName search from the document root ?
I guess that, combined with SelectSingleNode, will always end-up on the same resulting node (the first venueName node of the document)
You may try replacing //venueName with venueName

using List.Add() method is breaking my code out of a foreach loops

So, essentially, I'm running into an interesting issue where, when the call to the "CreateXML()" function in the following code is made, an xelement is created as intended, but then, when I attempt to add it to a collection of xeleents, instead of continuing the foreach loop from which the call to "CreateXML()" originated, the foreach loop is broken out of, and a call is made to "WriteXML()". Additionally, though an XElement is created and populated, it is not added to the List. [for clarification, the foreach loops I am referring to live in the "ParseDoc()" method]
private List<XElement> _xelemlist;
private void WriteXml()
{
XElement head = new XElement("header", new XAttribute("headerattributename", "attribute"));
foreach (XElement xelem in _xelemlist)
{
head.Add(xelem);
}
XDocument doc = new XDocument();
doc.Add(head);
}
private void CreateXML(string attname, string att)
{
XElement xelem = new XElement("name", new XElement("child", new XAttribute(attname, att), segment));
_xelemlist.Add(xelem);
}
private void ExtractSegment(HtmlNode node)
{
HtmlAttribute[] segatts = node.Attributes.ToArray();
string attname = segatts[0].Value.ToString();
string att = node.InnerText.ToString();
CreateXML(attname, att);
}
private HtmlDocument ParseDoc(HtmlDocument document)
{
try
{
HtmlNode root = document.DocumentNode.FirstChild;
foreach (HtmlNode childnode1 in root.SelectNodes(".//child1"))
{
foreach (HtmlNode childnode2 in node.SelectNodes(".//child2"))
{
ExtractSegment(childnode2);
}
}
}
catch (Exception e) { }
WriteXml();
return document;
}
When I comment out the "List.Add()" in "CreateXML()" and step through the code, the foreach loop is not broken out of after the first iteration, and the code works properly.
I have no idea what I'm doing wrong (And yes, the code is instantiated by a public member, don't worry: I am only posting the relevant internal methods to my problem)... if anyone has come across this sort of behavior before, I would really appreciate a push in the right direction to attempt to correct it... Sepcifically: is the problem just poor coding, or is this behavior a result of a property of one of the methods/libraries I am using?
One Caveat: I know that I am using HTMLAgilityPack to parse a file and extract information, but a requirement on this code forces me to use XDocument to write said information... don't ask me why.

I have no idea what I'm doing wrong
This, for starters:
catch (Exception e) { }
That's stopping you from seeing what on earth's going on. I strongly suspect you've got a NullReferenceException due to _xelemlist being null, but that's a secondary problem. The main problem is that by pretending everything's fine whatever happens, with no logging whatsoever, the only way of getting anywhere is by debugging, and that's an awful experience when you don't need to go through it.
It's extremely rarely a good idea catch exceptions and swallow them without any logging at all. It's almost never a good idea to do that with Exception.
Whenever you have a problem which is difficult to diagnose, improve your diagnostic capabilities first. That way, when you next run into a problem, it'll be easier to diagnose.

Declare the List this way,
private List<XElement> _xelemlist = new List<XElement>();

In your foreach loop, you are attempting to use XElement head as a list of XElements when you add() to it. This should probably be a list of XElements?

Might I suggest switching to using XmlDocument?
Here is some sample code which I have written for work (changed to protect my work :D), and we are using it rather well.
Code:
XmlDocument doc = new XmlDocument();
XmlNode root;
if(File.Exists(path + "\\MyXmlFile.xml"))
{
doc.Load(path + "\\MyXmlFile.xml");
root = doc.SelectSingleNode("//Library");
}
else
{
XmlDeclaration dec = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
doc.AppendChild(dec);
root = doc.CreateElement("Library");
doc.AppendChild(root);
}
XmlElement book = doc.CreateElement("Book");
XmlElement title = doc.CreateElement("Title");
XmlElement author = doc.CreateElement("Author");
XmlElement isbn = doc.CreateElement("ISBN");
title.InnerText = "Title of a Book";
author.InnerText = "Some Author";
isbn.InnerText = "RandomNumbers";
book.AppendChild(title);
book.AppendChild(author);
book.AppendChild(isbn);
root.AppendChild(book);
doc.Save(path + "\\MyXmlFile.xml");

Replace tokens in an aspx page on load

I have an aspx page that contains regular html, some uicomponents, and multiple tokens of the form {tokenname} .
When the page loads, I want to parse the page content and replace these tokens with the correct content. The idea is that there will be multiple template pages using the same codebehind.
I've no trouble parsing the string data itself, (see named string formatting, replace tokens in template) my trouble lies in when to read, and how to write the data back to the page...
What's the best way for me to rewrite the page content? I've been using a streamreader, and the replacing the page with Response.Write, but this is no good - a page containing other .net components does not render correctly.
Any suggestions would be greatly appreciated!

Take a look at System.Web.UI.Adapters.PageAdapter method TransformText - generally it is used for multi device support, but you can postprocess your page with this.

I'm not sure if I'm answering your question, but...
If you can change your notation from
{tokenname}
to something like
<%$ ZeusExpression:tokenname %>
you could consider creating your System.Web.Compilation.ExpressionBuilder.
After reading your comment...
There are other ways of getting access to the current page using ExpressionBuilder: just... create an expression. ;-)
Changing just a bit the sample from MSDN and supposing the code of your pages contain a method like this
public object GetData(string token);
you could implement something like this
public override CodeExpression GetCodeExpression(BoundPropertyEntry entry, object parsedData, ExpressionBuilderContext context)
{
Type type1 = entry.DeclaringType;
PropertyDescriptor descriptor1 = TypeDescriptor.GetProperties(type1)[entry.PropertyInfo.Name];
CodeExpression[] expressionArray1 = new CodeExpression[1];
expressionArray1[0] = new CodePrimitiveExpression(entry.Expression.Trim());
return new CodeCastExpression(
descriptor1.PropertyType,
new CodeMethodInvokeExpression(
new CodeThisReferenceExpression(),
"GetData",
expressionArray1));
}
This replaces your placeholder with a call like this
(string)this.GetData("tokenname");
Of course you can elaborate much more on this, perhaps using a "utility method" to simplify and "protect" access to data (access to properties, no special method involved, error handling, etc.).
Something that replaces instead with (e.g.)
(string)Utilities.GetData(this, "tokenname");
Hope this helps.

Many thanks to those that contributed to this question, however I ended up using a different solution -
Overriding the render function as per this page, except I parsed the page content for multiple different tags using regular expressions.
protected override void Render(HtmlTextWriter writer)
{
if (!Page.IsPostBack)
{
using (System.IO.MemoryStream stream = new System.IO.MemoryStream())
{
using (System.IO.StreamWriter streamWriter = new System.IO.StreamWriter(stream))
{
HtmlTextWriter htmlWriter = new HtmlTextWriter(streamWriter);
base.Render(htmlWriter);
htmlWriter.Flush();
stream.Position = 0;
using (System.IO.StreamReader oReader = new System.IO.StreamReader(stream))
{
string pageContent = oReader.ReadToEnd();
pageContent = ParseTagsFromPage(pageContent);
writer.Write(pageContent);
oReader.Close();
}
}
}
}
else
{
base.Render(writer);
}
}
Here's the regex tag parser
private string ParseTagsFromPage(string pageContent)
{
string regexPattern = "{zeus:(.*?)}"; //matches {zeus:anytagname}
string tagName = "";
string fieldName = "";
string replacement = "";
MatchCollection tagMatches = Regex.Matches(pageContent, regexPattern);
foreach (Match match in tagMatches)
{
tagName = match.ToString();
fieldName = tagName.Replace("{zeus:", "").Replace("}", "");
//get data based on my found field name, using some other function call
replacement = GetFieldValue(fieldName);
pageContent = pageContent.Replace(tagName, replacement);
}
return pageContent;
}
Seems to work quite well, as within the GetFieldValue function you can use your field name in any way you wish.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Using XhtmlTextWriter with XmlTextReader - c#

Related

Iterate through web pages and download PDFs

Creating XML from MarkUp HTML

xmlNode.SelectSingleNode always returns same value even though the node changes

using List.Add() method is breaking my code out of a foreach loops

Replace tokens in an aspx page on load

Categories

Resources