Custom WebViewPage: inject code when a Razor template is rendering - C#

I'm trying to create a custom Razor view base class (inheriting WebViewPage) that will inject a bit of HTML for each view template being rendered (including layouts and partial views) so that I have a reference on the client of where each Razor template starts (I'm not interested in where it ends).
What I have tried so far:
Overriding the Write method (as described in a comment here). This injects code at every Razor write operation, not just once per template (for example, every time you use Html.TextBoxFor).
Overriding the ExecutePageHierarchy method (as described in the post linked above). This throws an error every time it hits the first PopContext call: The "RenderBody" method has not been called for layout page "~/Views/Shared/_Layout.cshtml".

After trying your solution, I had some problems with the rendered HTML of complex pages with partial views.
My issue was that everything was reversed (the order of partial views).
To correct it, I ended up replacing the output stream in the OutputStack:
public override void ExecutePageHierarchy()
{
    // Replace output stream with a fake local stream
    StringWriter fakeOutput = new StringWriter();
    // Save output stack top level stream, and replace with fake local stream
    TextWriter outputStackTopOutput = OutputStack.Pop();
    OutputStack.Push(fakeOutput);
    // Run Razor view engine
    base.ExecutePageHierarchy();
    string content = fakeOutput.ToString();
    // Set back real outputs, and write to the real output
    OutputStack.Pop();
    OutputStack.Push(outputStackTopOutput);
    outputStackTopOutput.Write(content);
}

I think I have an answer to this now:
public abstract class CustomWebViewPage : WebViewPage
{
    public override void ExecutePageHierarchy()
    {
        var layoutReferenceMarkup = @"<script type=""text/html"" data-layout-id=""" + TemplateInfo.VirtualPath + @"""></script>";
        base.ExecutePageHierarchy();
        string output = Output.ToString();
        // If the body tag is present the script tag should be injected into it, otherwise simply append
        if (output.Contains("</body>"))
        {
            Response.Clear();
            Response.Write(output.Replace("</body>", layoutReferenceMarkup + "</body>"));
            Response.End();
        }
        else
        {
            Output.Write(layoutReferenceMarkup);
        }
    }
}

// Note: the typed base class needs to derive from WebViewPage<TModel> (not from the
// non-generic class above), otherwise strongly typed views lose their typed Model and
// Html helpers, so the override is duplicated here.
public abstract class CustomWebViewPage<TModel> : WebViewPage<TModel>
{
    public override void ExecutePageHierarchy()
    {
        var layoutReferenceMarkup = @"<script type=""text/html"" data-layout-id=""" + TemplateInfo.VirtualPath + @"""></script>";
        base.ExecutePageHierarchy();
        string output = Output.ToString();
        if (output.Contains("</body>"))
        {
            Response.Clear();
            Response.Write(output.Replace("</body>", layoutReferenceMarkup + "</body>"));
            Response.End();
        }
        else
        {
            Output.Write(layoutReferenceMarkup);
        }
    }
}
Seems to work, but if anyone has a better solution, please share.
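One practical note: for Razor to actually use a custom base class like this, it has to be registered as the pageBaseType in Views/web.config (MyApp below is a placeholder for your own namespace):

<!-- Views/web.config -->
<system.web.webPages.razor>
  <pages pageBaseType="MyApp.CustomWebViewPage" />
</system.web.webPages.razor>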


Display Each Element From Array Using HTML Helper in C#

This is probably a very simple problem, but I am extremely new to C# / MVC and I have been handed a broken project to fix. So it's time to sink or swim!
I have an array of strings that is being passed from a function to the front end.
The array looks something like:
reports = Directory.GetFiles(@"~\Reports\");
On the front end, I would like it to display each report, but I am not sure how to do that.
This project is using MVC, and I believe the view is called a "Razor view"? I know that it's using an HTML helper.
In essence, I need something like:
@Html.DisplayTextFor(Model.report [And then print every report in the array]);
I hope that makes sense.
If you want to display the file name array you can simply use a foreach:
@foreach(var report in Model.Reports){ @report }
Note that you should add the Reports property to your view model:
public class SampleViewModel
{
    public string[] Reports { get; set; }
}
You could use ViewData or TempData but I find that using the view model is the better way.
You can then populate it:
[HttpGet]
public ActionResult Index()
{
    var model = new SampleViewModel() { Reports = Directory.GetFiles(@"~\Reports\") };
    return View(model);
}
And use it in the view as you see fit.
Here is a simple online example: https://dotnetfiddle.net/5WmX5M
If you'd like to add a null check at the view level you can:
@if (Model.Reports != null)
{
    foreach (var report in Model.Reports) { @report <br> }
}
else
{
    <span>No files found</span>
}
https://dotnetfiddle.net/melMLW
This is never a bad idea, although in this case GetFiles will return an empty array if no files can be found, and I assume a possible IOException is being handled.
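One more caveat, based on the snippet in the question: Directory.GetFiles does not resolve ASP.NET virtual paths, so @"~\Reports\" is treated as a literal relative path. In a controller you would typically map it to a physical path first, for example:

[HttpGet]
public ActionResult Index()
{
    // Server.MapPath resolves the "~" virtual root to a physical directory
    var model = new SampleViewModel { Reports = Directory.GetFiles(Server.MapPath("~/Reports")) };
    return View(model);
}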

Scraping html list data from a dynamic server

Hello guys!
Sorry for the dumb question, this is my last resort. I swear I tried countless other Stack Overflow questions, different frameworks, etc., but those didn't seem to help.
I have the following problem:
A website displays a list of data (there is a TON of div, li, span etc. tags in front of it; it's a big HTML document).
I'm writing a tool that fetches data from a specific list inside a ton of other div tags, downloads it and outputs an Excel file.
The website I'm trying to access is dynamic. So you open the website, it loads a little bit, and then the list appears (probably some JS and stuff).
When I try to download the website via a WebRequest in C#, the HTML I get is almost empty, with a ton of white space, lots of non-HTML stuff, and some garbage data as well.
Now: I'm pretty used to C#, HtmlAgilityPack, and countless other libraries, not so much web-related stuff though. I tried CefSharp, Chromium etc., but unfortunately couldn't get them to work properly.
I want to have HTML in my program to work with that looks exactly like the HTML that you see when you open the dev console in Chrome when visiting the website mentioned above.
The HTML parser works flawlessly there.
This is how I imagine the code could look, simplified.
Extreme C# pseudocode:
WebBrowserEngine web = new WebBrowserEngine();
web.LoadURLuntilFinished(url); // with all the JS executed and stuff
String html = web.getHTML();
web.close();
My goal would be that the string html in the pseudocode looks exactly like the one in the Chrome dev tab.
Maybe there is a solution posted somewhere else, but I swear I couldn't find it; I've been looking for days.
Any help is greatly appreciated.
@SpencerBench is spot on in saying:
It could be that the page is using some combination of scroll state, element visibility, or element positions to trigger content loading. If that's the case, then you'll need to figure out what it is and trigger it programmatically.
To answer the question for your specific use case, we need to understand the behaviour of the page you want to scrape data from, or as I asked in the comments, how do you know the page is "finished"?
However, it's possible to give a fairly generic answer to the question which should act as a starting point for you.
This answer uses Selenium, a package which is commonly used for automating testing of web UIs, but as they say on their home page, that's not the only thing it can be used for.
Primarily it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should) also be automated as well.
The web site I'm scraping
So first we need a web site. I've created one using ASP.NET Core MVC with .NET Core 3.1, although the web site's technology stack isn't important; it's the behaviour of the page you want to scrape that matters. This site has 2 pages, unimaginatively called Page1 and Page2.
Page controllers
There's nothing special in these controllers:
namespace StackOverflow68925623Website.Controllers
{
    using Microsoft.AspNetCore.Mvc;

    public class Page1Controller : Controller
    {
        public IActionResult Index()
        {
            return View("Page1");
        }
    }
}

namespace StackOverflow68925623Website.Controllers
{
    using Microsoft.AspNetCore.Mvc;

    public class Page2Controller : Controller
    {
        public IActionResult Index()
        {
            return View("Page2");
        }
    }
}
API controller
There's also an API controller (i.e. it returns data rather than a view) which the views can call asynchronously to get some data to display. This one just creates an array of the requested number of random strings.
namespace StackOverflow68925623Website.Controllers
{
    using Microsoft.AspNetCore.Mvc;
    using System;
    using System.Collections.Generic;
    using System.Text;

    [Route("api/[controller]")]
    [ApiController]
    public class DataController : ControllerBase
    {
        [HttpGet("Create")]
        public IActionResult Create(int numberOfElements)
        {
            var response = new List<string>();
            for (var i = 0; i < numberOfElements; i++)
            {
                response.Add(RandomString(10));
            }
            return Ok(response);
        }

        private string RandomString(int length)
        {
            var sb = new StringBuilder();
            var random = new Random();
            for (var i = 0; i < length; i++)
            {
                var characterCode = random.Next(65, 91); // A-Z (Next's upper bound is exclusive)
                sb.Append((char)characterCode);
            }
            return sb.ToString();
        }
    }
}
Views
Page1's view looks like this:
@{
    ViewData["Title"] = "Page 1";
}
<div class="text-center">
    <div id="list" />
    <script src="~/lib/jquery/dist/jquery.min.js"></script>
    <script>
        var apiUrl = 'https://localhost:44394/api/Data/Create';
        $(document).ready(function () {
            $('#list').append('<li id="loading">Loading...</li>');
            $.ajax({
                url: apiUrl + '?numberOfElements=20000',
                datatype: 'json',
                success: function (data) {
                    $('#loading').remove();
                    var insert = '';
                    for (var item of data) {
                        insert += '<li>' + item + '</li>';
                    }
                    insert = '<ul id="results">' + insert + '</ul>';
                    $('#list').html(insert);
                },
                error: function (xht, status) {
                    alert('Error: ' + status);
                }
            });
        });
    </script>
</div>
So when the page first loads, it just contains an empty div called list; however, the page loading triggers the function passed to jQuery's $(document).ready function, which makes an asynchronous call to the API controller, requesting an array of 20,000 elements. While the call is in progress, "Loading..." is displayed on the screen, and when the call returns, this is replaced by an unordered list containing the received data. This is written in a way intended to be friendly to developers of automated UI tests, or of screen scrapers, because we can tell whether all the data has loaded by testing whether or not the page contains an element with the ID results.
Page2's view looks like this:
@{
    ViewData["Title"] = "Page 2";
}
<div class="text-center">
    <div id="list">
        <ul id="results" />
    </div>
    <script src="~/lib/jquery/dist/jquery.min.js"></script>
    <script>
        var apiUrl = 'https://localhost:44394/api/Data/Create';
        var requestCount = 0;
        var maxRequests = 20;
        $(document).ready(function () {
            getData();
        });
        function getDataIfAtBottomOfPage() {
            console.log("scroll - " + requestCount + " requests");
            if (requestCount < maxRequests) {
                console.log("scrollTop " + document.documentElement.scrollTop + " scrollHeight " + document.documentElement.scrollHeight);
                if (document.documentElement.scrollTop > (document.documentElement.scrollHeight - window.innerHeight - 100)) {
                    getData();
                }
            }
        }
        function getData() {
            window.onscroll = undefined;
            requestCount++;
            $('#results').append('<li id="loading">Loading...</li>');
            $.ajax({
                url: apiUrl + '?numberOfElements=50',
                datatype: 'json',
                success: function (data) {
                    var insert = '';
                    for (var item of data) {
                        insert += '<li>' + item + '</li>';
                    }
                    $('#loading').remove();
                    $('#results').append(insert);
                    if (requestCount < maxRequests) {
                        window.setTimeout(function () { window.onscroll = getDataIfAtBottomOfPage }, 1000);
                    } else {
                        $('#results').append('<li>That\'s all folks</li>');
                    }
                },
                error: function (xht, status) {
                    alert('Error: ' + status);
                }
            });
        }
    </script>
</div>
This gives a nicer user experience because it requests data from the API controller in multiple smaller chunks, so the first chunk of data appears fairly quickly, and once the user has scrolled down to somewhere near the bottom of the page, the next chunk of data is requested, until 20 chunks have been requested and displayed, at which point the text "That's all folks" is added to the end of the unordered list. However this is more difficult to interact with programmatically because you need to scroll the page down to make the new data appear.
(Yes, this implementation is a bit buggy - if the user gets to the bottom of the page too quickly then requesting the next chunk of data doesn't happen until they scroll up a bit. But the question isn't about how to implement this behaviour in a web page, but about how to scrape the displayed data, so please forgive my bugs.)
The scraper
I've implemented the scraper as an xUnit unit test project, just because I'm not doing anything with the data I've scraped from the web site other than Asserting that it is of the correct length, and therefore proving that I haven't prematurely assumed that the web page I'm scraping from is "finished". You can put most of this code (other than the Asserts) into any type of project.
Having created your scraper project, you need to add the Selenium.WebDriver and Selenium.WebDriver.ChromeDriver NuGet packages.
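With the dotnet CLI, that would be:

dotnet add package Selenium.WebDriver
dotnet add package Selenium.WebDriver.ChromeDriver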
Page Object Model
I'm using the Page Object Model pattern to provide a layer of abstraction between functional interaction with the page and the implementation detail of how to code that interaction. Each of the pages in the web site has a corresponding page model class for interacting with that page.
First, a base class with some code which is common to more than one page model class.
namespace StackOverflow68925623Scraper
{
    using System;
    using OpenQA.Selenium;
    using OpenQA.Selenium.Support.UI;

    public class PageModel
    {
        protected PageModel(IWebDriver driver)
        {
            this.Driver = driver;
        }

        protected IWebDriver Driver { get; }

        public void ScrollToTop()
        {
            var js = (IJavaScriptExecutor)this.Driver;
            js.ExecuteScript("window.scrollTo(0, 0)");
        }

        public void ScrollToBottom()
        {
            var js = (IJavaScriptExecutor)this.Driver;
            js.ExecuteScript("window.scrollTo(0, document.body.scrollHeight)");
        }

        protected IWebElement GetById(string id)
        {
            try
            {
                return this.Driver.FindElement(By.Id(id));
            }
            catch (NoSuchElementException)
            {
                return null;
            }
        }

        protected IWebElement AwaitGetById(string id)
        {
            var wait = new WebDriverWait(Driver, TimeSpan.FromSeconds(10));
            return wait.Until(e => e.FindElement(By.Id(id)));
        }
    }
}
This base class gives us 4 convenience methods:
- Scroll to the top of the page
- Scroll to the bottom of the page
- Get the element with the supplied ID, or return null if it doesn't exist
- Get the element with the supplied ID, or wait for up to 10 seconds for it to appear if it doesn't exist yet
And each page in the web site has its own model class, derived from that base class.
namespace StackOverflow68925623Scraper
{
    using OpenQA.Selenium;

    public class Page1Model : PageModel
    {
        public Page1Model(IWebDriver driver) : base(driver)
        {
        }

        public IWebElement AwaitResults => this.AwaitGetById("results");

        public void Navigate()
        {
            this.Driver.Navigate().GoToUrl("https://localhost:44394/Page1");
        }
    }
}

namespace StackOverflow68925623Scraper
{
    using OpenQA.Selenium;

    public class Page2Model : PageModel
    {
        public Page2Model(IWebDriver driver) : base(driver)
        {
        }

        public IWebElement Results => this.GetById("results");

        public void Navigate()
        {
            this.Driver.Navigate().GoToUrl("https://localhost:44394/Page2");
        }
    }
}
And the Scraper class:
namespace StackOverflow68925623Scraper
{
    using OpenQA.Selenium.Chrome;
    using System;
    using System.Threading;
    using Xunit;

    public class Scraper
    {
        [Fact]
        public void TestPage1()
        {
            // Arrange
            var driver = new ChromeDriver();
            var page = new Page1Model(driver);
            page.Navigate();
            try
            {
                // Act
                var actualResults = page.AwaitResults.Text.Split(Environment.NewLine);

                // Assert
                Assert.Equal(20000, actualResults.Length);
            }
            finally
            {
                // Ensure the browser window closes even if things go pear-shaped
                driver.Quit();
            }
        }

        [Fact]
        public void TestPage2()
        {
            // Arrange
            var driver = new ChromeDriver();
            var page = new Page2Model(driver);
            page.Navigate();
            try
            {
                // Act
                while (!page.Results.Text.Contains("That's all folks"))
                {
                    Thread.Sleep(1000);
                    page.ScrollToBottom();
                    page.ScrollToTop();
                }
                var actualResults = page.Results.Text.Split(Environment.NewLine);

                // Assert - we expect 1001 because of the extra "that's all folks"
                Assert.Equal(1001, actualResults.Length);
            }
            finally
            {
                // Ensure the browser window closes even if things go pear-shaped
                driver.Quit();
            }
        }
    }
}
So, what's happening here?
// Arrange
var driver = new ChromeDriver();
var page = new Page1Model(driver);
page.Navigate();
ChromeDriver is in the Selenium.WebDriver.ChromeDriver package and implements the IWebDriver interface from the Selenium.WebDriver package with the code to interact with the Chrome browser. Other packages are available containing implementations for all popular browsers. Instantiating the driver object opens a browser window, and calling its Navigate method directs the browser to the page we want to test/scrape.
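As an aside (not part of the original answer): if you'd rather not have a visible browser window while scraping, ChromeDriver accepts a ChromeOptions argument and Chrome can be asked to run headless; a small sketch:

// Run Chrome without opening a visible window
var options = new ChromeOptions();
options.AddArgument("--headless");
var driver = new ChromeDriver(options);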
// Act
var actualResults = page.AwaitResults.Text.Split(Environment.NewLine);
Because on Page1 the results element doesn't exist until all the data has been displayed, and no user interaction is required in order for it to be displayed, we use the page model's AwaitResults property to wait for that element to appear and return it once it has appeared.
AwaitResults returns an IWebElement instance representing the element, which in turn has various methods and properties we can use to interact with the element. In this case we use its Text property, which returns the element's contents as a string, without any markup. Because the data is displayed as an unordered list, each element in the list is delimited by a line break, so we can use String's Split method to convert it to a string array.
Page2 needs a different approach. We can't use the presence of the results element to determine whether the data has all been displayed, because that element is on the page right from the start; instead we need to check for the string "That's all folks", which is written at the end of the last chunk of data. Also, the data isn't loaded all in one go, and we need to keep scrolling down in order to trigger the loading of the next chunk.
// Act
while (!page.Results.Text.Contains("That's all folks"))
{
    Thread.Sleep(1000);
    page.ScrollToBottom();
    page.ScrollToTop();
}
var actualResults = page.Results.Text.Split(Environment.NewLine);
Because of the bug in the UI that I mentioned earlier, if we get to the bottom of the page too quickly, the fetch of the next chunk of data isn't triggered, and attempting to scroll down when already at the bottom of the page doesn't raise another scroll event. That's why I'm scrolling to the bottom of the page and then back to the top - that way I can guarantee that a scroll event is raised. You never know, the web site you're trying to scrape data from may itself be buggy.
Once the "That's all folks" text has appeared, we can go ahead and get the results element's Text property and convert it to a string array as before.
// Assert - we expect 1001 because of the extra "that's all folks"
Assert.Equal(1001, actualResults.Length);
This is the bit that won't be in your code. Because I'm scraping a web site which is under my control, I know exactly how much data it should be displaying so I can check that I've got all the data, and therefore that my scraping code is working correctly.
Further reading
Absolute beginner's introduction to Selenium: https://www.guru99.com/selenium-csharp-tutorial.html
(A curiosity in that article is the way that it starts by creating a console application project and later changes its output type to class library and manually adds the unit test packages, when the project could have been created using one of Visual Studio's unit test project templates. It gets to the right place in the end, albeit via a rather odd route.)
Selenium documentation: https://www.selenium.dev/documentation/
Happy scraping!
If you need to fully execute the web page, then a complete browser like CefSharp is your only option.
It could be that the page is using some combination of scroll state, element visibility, or element positions to trigger content loading. If that's the case, then you'll need to figure out what it is and trigger it programmatically. I know that CefSharp can simulate user actions like clicking, scrolling, etc.
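For what it's worth, the asker's pseudocode maps fairly directly onto CefSharp's off-screen browser. Below is a minimal, untested sketch assuming the CefSharp.OffScreen NuGet package; GetSourceAsync returns the DOM as rendered after JavaScript has run, and the fixed delay is a crude stand-in for a proper "page is finished" check of the kind discussed above:

using System;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

class Program
{
    static async Task Main()
    {
        Cef.Initialize(new CefSettings());
        using (var browser = new ChromiumWebBrowser("https://example.com"))
        {
            // Wait until the browser reports that loading has finished
            var loaded = new TaskCompletionSource<bool>();
            browser.LoadingStateChanged += (s, e) =>
            {
                if (!e.IsLoading) loaded.TrySetResult(true);
            };
            await loaded.Task;

            // "Load finished" is not the same as "all JS-driven content rendered";
            // for dynamic pages, polling the DOM for a known marker element
            // is more reliable than a fixed delay like this one
            await Task.Delay(2000);

            string html = await browser.GetSourceAsync();
            Console.WriteLine(html.Length);
        }
        Cef.Shutdown();
    }
}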

Fixing a PDF Accessibility Issue (Alternative description missing for an annotation) in links

I generated an accessible PDF using the pdfHTML add-on in iText 7. To add the link to the PDF I used the below code in the HTML file:
www.google.com
C# code as below:
IList<IElement> elements = HtmlConverter.ConvertToElements(htmlFile, converterProperties);
foreach (IElement element in elements)
{
    doc.Add((IBlockElement)element);
}
The link appeared in the PDF as expected, but the PAC tool gives an error saying "Alternative description missing for annotation".
I saw the same issue already raised here:
Fixing a PDF Accessibility Issue (Alternative description missing for an annotation) when converting an HTML Page to PDF
and
Fixing link error, pdfHTML
But neither mentions what the exact answer is, which is why I raised a new one.
I tried to create a custom tag worker using ATagWorker, but the element appears as a JSoupElementNode in the ProcessEnd method. How do I set accessibility properties for JSoupElementNode elements?
Please help me to resolve this issue.
Thanks
You're on the right track. Typically the getElementResult() function can be overridden to get an object that you can add accessibility attributes to. Links are a bit of a special case because there are both objects like Paragraphs and annotations (a clickable box that overlaps, but isn't directly related to, the Text). This means you have to go through the processEnd() function. At that point getAllElements() will return the sub-elements of the link.
Here is the solution I came up with when working with someone else. Note that it assumes the sub-elements of the link are Text elements, which is true in the typical case, but not necessarily in every case.
Set the converter properties to use your new ATagWorker:

ConverterProperties converterProperties = new ConverterProperties();
converterProperties.setTagWorkerFactory(new DefaultTagWorkerFactory() {
    @Override
    public ITagWorker getCustomTagWorker(
            IElementNode tag, ProcessorContext context) {
        if ("a".equalsIgnoreCase(tag.name())) {
            return new AccessibleATagWorker(tag, context);
        }
        return null;
    }
});
...
HtmlConverter.convertToPdf( ... , ... , converterProperties);
And the custom A tag Worker:
class AccessibleATagWorker extends ATagWorker {
    private String ALTERNATE_DESCRIPTION;

    public AccessibleATagWorker(IElementNode element, ProcessorContext context) {
        super(element, context);
        ALTERNATE_DESCRIPTION = element.getAttribute("title");
    }

    @Override
    public void processEnd(IElementNode element, ProcessorContext context) {
        super.processEnd(element, context);
        List<IPropertyContainer> containedElements = this.getAllElements();
        for (int x = 0; x < containedElements.size(); x++) {
            if (containedElements.get(x) instanceof Text) {
                ((Text) containedElements.get(x)).getAccessibilityProperties().setAlternateDescription(ALTERNATE_DESCRIPTION);
            }
        }
    }
}
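Since the question itself is C# (the .NET port of iText 7), here is a rough, untested translation of the same idea. It assumes the .NET port exposes the same members under Pascal casing (ATagWorker, ProcessEnd, GetAllElements, GetAccessibilityProperties, SetAlternateDescription), which is worth verifying against your iText version:

using iText.Html2pdf.Attach;
using iText.Html2pdf.Attach.Impl.Tags;
using iText.Layout;
using iText.Layout.Element;
using iText.StyledXmlParser.Node;

class AccessibleATagWorker : ATagWorker
{
    private readonly string alternateDescription;

    public AccessibleATagWorker(IElementNode element, ProcessorContext context)
        : base(element, context)
    {
        // Use the anchor's title attribute as the alternate description
        alternateDescription = element.GetAttribute("title");
    }

    public override void ProcessEnd(IElementNode element, ProcessorContext context)
    {
        base.ProcessEnd(element, context);
        foreach (IPropertyContainer contained in GetAllElements())
        {
            if (contained is Text text)
            {
                text.GetAccessibilityProperties().SetAlternateDescription(alternateDescription);
            }
        }
    }
}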

Sitecore RSS caching

I have been working on implementing a custom RSS feed in Sitecore 6.4. My custom behaviour is very limited; all I effectively wanted to do is add a link for the author (our author field is a reference field, so we cannot use the built-in author attribution).
I overrode RenderItem() on the PublicFeed class so that I could make use of my own implementation of the FeedRenderer class (where the author logic is housed). My approach follows the pattern outlined by John West for adding your own rendering behaviour:
public class MyPublicFeed : PublicFeed
{
    protected override SyndicationItem RenderItem(Item item)
    {
        Assert.ArgumentNotNull(item, "item");
        Control rendererControl = FeedUtil.GetFeedRendering(item);
        if (rendererControl == null)
        {
            return null;
        }
        using (new ContextItemSwitcher(item))
        {
            var myRenderer = rendererControl as MyFeedRenderer;
            if (myRenderer != null)
            {
                myRenderer.Database = SitecoreHelper.CurrentDatabase.Name;
                return myRenderer.RenderItem();
            }
            var renderer = rendererControl as Sitecore.Web.UI.WebControls.FeedRenderer;
            if (renderer != null)
            {
                renderer.Database = SitecoreHelper.CurrentDatabase.Name;
                return renderer.RenderItem();
            }
        }
        throw new InvalidOperationException("FeedRenderer rendering must be of Sitecore.Web.UI.WebControls.FeedRenderer type");
    }
}
And now for my rendering class:
public class MyFeedRenderer : Sitecore.Web.UI.WebControls.FeedRenderer
{
    public override SyndicationItem RenderItem()
    {
        Item item = base.GetItem();
        var syndicationItem = base.RenderItem();
        // Unfortunately we have to parse params again :(
        FeedRenderingParameters feedRenderingParameter = FeedRenderingParameters.Parse(base.Parameters);
        AddAuthor(syndicationItem, item, feedRenderingParameter);
        return syndicationItem;
    }

    private static void AddAuthor(SyndicationItem syndicationItem, Item item, FeedRenderingParameters feedRenderingParameter)
    {
        // Clear out authors added by base class
        syndicationItem.Authors.Clear();
        // Logic for adding author here
    }
}
This all works great, outputting exactly what I want, but the caching element doesn't appear to be working. I have set the cacheable flag on the actual item itself with a timespan of 01:00:00. This didn't appear to work; if I put a breakpoint in either of the above classes, it is hit every time the feed is requested.
So then I tried to enable caching at a control level, turning caching on with VaryByData for the MyFeedRenderer rendering. Alas, this isn't working either; the breakpoint is hit every time.
Can anyone offer any advice on this matter? The documentation simply recommends turning it on on the actual feed item, not at the rendering level, but neither seems to be working for me. Interestingly, HTML caching is working elsewhere; is RSS also put into the HTML cache?
Thanks in advance,
Nick
- Ensure the Cacheable checkbox in the feed definition item is checked.
- Ensure that you have published the feed definition item.
- If you do not populate the Cache Duration field in the feed definition item, it should default to one day.
- Feeds appear to cache in Sitecore.Syndication.FeedManager.Cache rather than in the site output cache. Inspect that cache object in the Visual Studio debugger after calling your feed, and then again after calling that feed a second time, to see whether any records appear, and whether multiple cache keys appear for the same feed. Investigate the Render() method: if PublicFeed.IsCacheable() returns false (depending on the Cacheable field in the feed definition item), PublicFeed.Render() does not cache.
- Ensure nothing else clears caches between your requests for the feed.
SDN forum thread: http://sdn.sitecore.net/forum/ShowPost.aspx?PostID=40591

PartialView in web forms

I am working on a Web Forms application and I have some HTML code that I need in 2-3 more places.
I've created an .ascx control, like so, which represents the repeated code:
protected override void Render(HtmlTextWriter writer)
{
    StringBuilder str = new StringBuilder();
    while (connRef.Read())
    {
        str.Append(connRef["some_database_field"]);
    }
    writer.Write(str.ToString());
    base.Render(writer);
}
connRef is a reference to a data reader object that I pass in wherever I need this piece of code to render.
In another control, I use the following code:
MSSqlConn s = new MSSqlConn();
StringBuilder str = new StringBuilder();
s.OpenConn("select * from notes order by note_date desc;");
notes.note c = (notes.note)Page.LoadControl(@"/controls/notes/note.ascx");
c.ID = "note";
c.connRef = s;
while (s.Read())
{
    str.Append(Html.RenderControl(c));
}
s.CloseConn();
Response.Write(str.ToString());
MSSqlConn is my class for the database connection.
RenderControl renders any control as an HTML string.
In the second code snippet, s returns only one record instead of two. For some reason, s seems to close if I pass the reference to another control (c.connRef = s).
Maybe I am missing something, I don't know.
I am sorry if I haven't explained it well.
Your problem is that you're calling the Read() method twice. This is what your code is basically doing:
while (s.Read())
{
    while (connRef.Read())
    {
        str.Append(connRef["some_database_field"]);
    }
}
Every call to the Read() method advances the data reader by one record. You need to take out one of the while loops.
You should also note that if you're passing this data reader to multiple user controls, you will need to reset it to start from the beginning for each new user control.
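Since data readers are forward-only, "resetting" in practice means either re-executing the query for each control or buffering the rows first and passing plain data instead of a live reader. A rough sketch of the buffering approach, reusing the question's MSSqlConn class and assuming a hypothetical Rows property on the control in place of connRef:

// Buffer the rows once, then hand plain data to each control
var rows = new List<string>();
MSSqlConn s = new MSSqlConn();
s.OpenConn("select * from notes order by note_date desc;");
while (s.Read())
{
    rows.Add(Convert.ToString(s["some_database_field"]));
}
s.CloseConn();

notes.note c = (notes.note)Page.LoadControl(@"/controls/notes/note.ascx");
c.Rows = rows; // hypothetical property replacing connRef
Response.Write(Html.RenderControl(c));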
