I have an aspx page that contains regular html, some uicomponents, and multiple tokens of the form {tokenname} .
When the page loads, I want to parse the page content and replace these tokens with the correct content. The idea is that there will be multiple template pages using the same codebehind.
I've no trouble parsing the string data itself, (see named string formatting, replace tokens in template) my trouble lies in when to read, and how to write the data back to the page...
What's the best way for me to rewrite the page content? I've been using a streamreader, and the replacing the page with Response.Write, but this is no good - a page containing other .net components does not render correctly.
Any suggestions would be greatly appreciated!
Take a look at System.Web.UI.Adapters.PageAdapter method TransformText - generally it is used for multi device support, but you can postprocess your page with this.
I'm not sure if I'm answering your question, but...
If you can change your notation from
{tokenname}
to something like
<%$ ZeusExpression:tokenname %>
you could consider creating your System.Web.Compilation.ExpressionBuilder.
After reading your comment...
There are other ways of getting access to the current page using ExpressionBuilder: just... create an expression. ;-)
Changing just a bit the sample from MSDN and supposing the code of your pages contain a method like this
public object GetData(string token);
you could implement something like this
public override CodeExpression GetCodeExpression(BoundPropertyEntry entry, object parsedData, ExpressionBuilderContext context)
{
Type type1 = entry.DeclaringType;
PropertyDescriptor descriptor1 = TypeDescriptor.GetProperties(type1)[entry.PropertyInfo.Name];
CodeExpression[] expressionArray1 = new CodeExpression[1];
expressionArray1[0] = new CodePrimitiveExpression(entry.Expression.Trim());
return new CodeCastExpression(
descriptor1.PropertyType,
new CodeMethodInvokeExpression(
new CodeThisReferenceExpression(),
"GetData",
expressionArray1));
}
This replaces your placeholder with a call like this
(string)this.GetData("tokenname");
Of course you can elaborate much more on this, perhaps using a "utility method" to simplify and "protect" access to data (access to properties, no special method involved, error handling, etc.).
Something that replaces instead with (e.g.)
(string)Utilities.GetData(this, "tokenname");
Hope this helps.
Many thanks to those that contributed to this question, however I ended up using a different solution -
Overriding the render function as per this page, except I parsed the page content for multiple different tags using regular expressions.
protected override void Render(HtmlTextWriter writer)
{
if (!Page.IsPostBack)
{
using (System.IO.MemoryStream stream = new System.IO.MemoryStream())
{
using (System.IO.StreamWriter streamWriter = new System.IO.StreamWriter(stream))
{
HtmlTextWriter htmlWriter = new HtmlTextWriter(streamWriter);
base.Render(htmlWriter);
htmlWriter.Flush();
stream.Position = 0;
using (System.IO.StreamReader oReader = new System.IO.StreamReader(stream))
{
string pageContent = oReader.ReadToEnd();
pageContent = ParseTagsFromPage(pageContent);
writer.Write(pageContent);
oReader.Close();
}
}
}
}
else
{
base.Render(writer);
}
}
Here's the regex tag parser
private string ParseTagsFromPage(string pageContent)
{
string regexPattern = "{zeus:(.*?)}"; //matches {zeus:anytagname}
string tagName = "";
string fieldName = "";
string replacement = "";
MatchCollection tagMatches = Regex.Matches(pageContent, regexPattern);
foreach (Match match in tagMatches)
{
tagName = match.ToString();
fieldName = tagName.Replace("{zeus:", "").Replace("}", "");
//get data based on my found field name, using some other function call
replacement = GetFieldValue(fieldName);
pageContent = pageContent.Replace(tagName, replacement);
}
return pageContent;
}
Seems to work quite well, as within the GetFieldValue function you can use your field name in any way you wish.
Related
I have created a simple web crawler but I want to add the recursion function so that every page that is opened I can get the URLs in this page, but I have no idea how I can do that and I want also to include threads to make it faster.
Here is my code
namespace Crawler
{
public partial class Form1 : Form
{
String Rstring;
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
WebRequest myWebRequest;
WebResponse myWebResponse;
String URL = textBox1.Text;
myWebRequest = WebRequest.Create(URL);
myWebResponse = myWebRequest.GetResponse();//Returns a response from an Internet resource
Stream streamResponse = myWebResponse.GetResponseStream();//return the data stream from the internet
//and save it in the stream
StreamReader sreader = new StreamReader(streamResponse);//reads the data stream
Rstring = sreader.ReadToEnd();//reads it to the end
String Links = GetContent(Rstring);//gets the links only
textBox2.Text = Rstring;
textBox3.Text = Links;
streamResponse.Close();
sreader.Close();
myWebResponse.Close();
}
private String GetContent(String Rstring)
{
String sString="";
HTMLDocument d = new HTMLDocument();
IHTMLDocument2 doc = (IHTMLDocument2)d;
doc.write(Rstring);
IHTMLElementCollection L = doc.links;
foreach (IHTMLElement links in L)
{
sString += links.getAttribute("href", 0);
sString += "/n";
}
return sString;
}
I fixed your GetContent method as follow to get new links from crawled page:
public ISet<string> GetNewLinks(string content)
{
Regex regexLink = new Regex("(?<=<a\\s*?href=(?:'|\"))[^'\"]*?(?=(?:'|\"))");
ISet<string> newLinks = new HashSet<string>();
foreach (var match in regexLink.Matches(content))
{
if (!newLinks.Contains(match.ToString()))
newLinks.Add(match.ToString());
}
return newLinks;
}
Updated
Fixed: regex should be regexLink. Thanks #shashlearner for pointing this out (my mistype).
i have created something similar using Reactive Extension.
https://github.com/Misterhex/WebCrawler
i hope it can help you.
Crawler crawler = new Crawler();
IObservable observable = crawler.Crawl(new Uri("http://www.codinghorror.com/"));
observable.Subscribe(onNext: Console.WriteLine,
onCompleted: () => Console.WriteLine("Crawling completed"));
The following includes an answer/recommendation.
I believe you should use a dataGridView instead of a textBox as when you look at it in GUI it is easier to see the links (URLs) found.
You could change:
textBox3.Text = Links;
to
dataGridView.DataSource = Links;
Now for the question, you haven't included:
using System. "'s"
which ones were used, as it would be appreciated if I could get them as can't figure it out.
From a design standpoint, I've written a few webcrawlers. Basically you want to implement a Depth First Search using a Stack data structure. You can use Breadth First Search also, but you'll likely come into stack memory issues. Good luck.
Good day
I have question about displaying html documents in a windows forms applications. App that I'm working on should display information from the
database in the html format. I will try to describe actions that I have taken (and which failed):
1) I tried to load "virtual" html page that exists only in memory and dynamically change it's parameters (webbMain is a WebBrowser control):
public static string CreateBookHtml()
{
StringBuilder sb = new StringBuilder();
//Declaration
sb.AppendLine(#"<?xml version=""1.0"" encoding=""utf-8""?>");
sb.AppendLine(#"<?xml-stylesheet type=""text/css"" href=""style.css""?>");
sb.AppendLine(#"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.1//EN""
""http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"">");
sb.AppendLine(#"<html xmlns=""http://www.w3.org/1999/xhtml"" xml:lang=""en"">");
//Head
sb.AppendLine(#"<head>");
sb.AppendLine(#"<title>Exemplary document</title>");
sb.AppendLine(#"<meta http-equiv=""Content-Type"" content=""application/xhtml+xml;
charset=utf-8""/ >");
sb.AppendLine(#"</head>");
//Body
sb.AppendLine(#"<body>");
sb.AppendLine(#"<p id=""paragraph"">Example.</p>");
sb.AppendLine(#"</body>");
sb.AppendLine(#"</html>");
return sb.ToString();
}
void LoadBrowser()
{
this.webbMain.Navigate("about:blank");
this.webbMain.DocumentText = CreateBookHtml();
HtmlDocument doc = this.webbMain.Document;
}
This failed, because doc.Body is null, and doc.getElementById("paragraph") returns null too. So I cannot change paragraph InnerText property.
Furthermore, this.webbMain.DocumentText is "\0"...
2) I tried to create html file in specified folder, load it to the WebBrowser and then change its parameters. Html is the same as created by
CreateBookHtml() method:
private void LoadBrowser()
{
this.webbMain.Navigate("HTML\\BookPage.html"));
HtmlDocument doc = this.webbMain.Document;
}
This time this.webbMain.DocumentText contains Html data read from the file, but doc.Body returns null again, and I still cannot take element using
getByElementId() method. Of course, when I have text, I would try regex to get specified fields, or maybe do other tricks to achieve a goal, but I wonder - is there simply way to mainipulate html? For me, ideal way would be to create HTML text in memory, load it into the WebBrowser control, and then dynamically change its parameters using IDs. Is it possible? Thanks for the answers in advance, best regards,
Paweł
I've worked some time ago with the WebControl and like you wanted to load a html from memory but have the same problem, body being null. After some investigation, I noticed that the Navigate and NavigateToString methods work asynchronously, so it needs a little time for the control to load the document, the document is not available right after the call to Navigate. So i did something like (wbChat is the WebBrowser control):
wbChat.NavigateToString("<html><body><div>first line</div></body><html>");
DoEvents();
where DoEvents() is implemeted as:
[SecurityPermissionAttribute(SecurityAction.Demand, Flags = SecurityPermissionFlag.UnmanagedCode)]
public void DoEvents()
{
DispatcherFrame frame = new DispatcherFrame();
Dispatcher.CurrentDispatcher.BeginInvoke(DispatcherPriority.Background,
new DispatcherOperationCallback(ExitFrame), frame);
Dispatcher.PushFrame(frame);
}
and it worked for me, after the DoEvents call, I could obtain a non-null body:
mshtml.IHTMLDocument2 doc2 = (mshtml.IHTMLDocument2)wbChat.Document;
mshtml.HTMLDivElement div = (mshtml.HTMLDivElement)doc2.createElement("div");
div.innerHTML = "some text";
mshtml.HTMLBodyClass body = (mshtml.HTMLBodyClass)doc2.body;
if (body != null)
{
body.appendChild((mshtml.IHTMLDOMNode)div);
body.scrollTop = body.scrollHeight;
}
else
Console.WriteLine("body is still null");
I don't know if this is the right way of doing this, but it fixed the problem for me, maybe it helps you too.
Later Edit:
public object ExitFrame(object f)
{
((DispatcherFrame)f).Continue = false;
return null;
}
The DoEvents method is necessary on WPF. For System.Windows.Forms one can use Application.DoEvents().
Another way to do the same thing is:
webBrowser1.DocumentText = "<html><body>blabla<hr/>yadayada</body></html>";
this works without any extra initialization
I have a need to verify a specific hyperlink exists on a given web page. I know how to download the source HTML. What I need help with is figuring out if a "target" url exists as a hyperlink in the "source" web page.
Here is a little console program to demonstrate the problem:
public static void Main()
{
var sourceUrl = "http://developer.yahoo.com/search/web/V1/webSearch.html";
var targetUrl = "http://developer.yahoo.com/ypatterns/";
Console.WriteLine("Source contains link to target? Answer = {0}",
SourceContainsLinkToTarget(
sourceUrl,
targetUrl));
Console.ReadKey();
}
private static bool SourceContainsLinkToTarget(string sourceUrl, string targetUrl)
{
string content;
using (var wc = new WebClient())
content = wc.DownloadString(sourceUrl);
return content.Contains(targetUrl); // Need to ensure this is in a <href> tag!
}
Notice the comment on the last line. I can see if the target URL exists in the HTML of the source URL, but I need to verify that URL is inside of a <href/> tag. This way I can validate it's actually a hyperlink, instead of just text.
I'm hoping someone will have a kick-ass regular expression or something I can use.
Thanks!
Here is the solution using the HtmlAgilityPack:
private static bool SourceContainsLinkToTarget(string sourceUrl, string targetUrl)
{
var doc = (new HtmlWeb()).Load(sourceUrl);
foreach (var link in doc.DocumentNode.SelectNodes("//a[#href]"))
if (link.GetAttributeValue("href",
string.Empty).Equals(targetUrl))
return true;
return false;
}
The best way is to use a web scraping library with a built in DOM parser, which will build an object tree out of the HTML and let you explore it programmatically for the link entity you are looking for. There are many available - for example Beautiful Soup (python) or scrapi (ruby) or Mechanize (perl). For .net, try the HTML agility pack. http://htmlagilitypack.codeplex.com/
I'm imagining that this will come down to personal preference but which way would you go?
StringBuilder markup = new StringBuilder();
foreach (SearchResult image in Search.GetImages(componentId))
{
markup.Append(String.Format("<div class=\"captionedImage\"><img src=\"{0}\" width=\"150\" alt=\"{1}\"/><p>{1}</p></div>", image.Resolutions[0].Uri, image.Caption));
}
LiteralMarkup.Text = markup.ToString();
Vs.
StringBuilder markup = new StringBuilder();
foreach (SearchResult image in Search.GetImages(componentId))
{
markup.Append(String.Format(#"<div class=""captionedImage""><img src=""{0}"" width=""150"" alt=""{1}""/><p>{1}</p></div>", image.Resolutions[0].Uri, image.Caption));
}
LiteralMarkup.Text = markup.ToString();
Or should I not be doing this at all and using the HtmlTextWriter class instead?
EDIT: Some really good suggestions here. We are on 2.0 framework so LINQ is not available
Another vote for "AppendFormat". Also, for the sake of the server code I might put up with single quotes here to avoid the need to escape anything:
StringBuilder markup = new StringBuilder();
foreach (SearchResult image in Search.GetImages(componentId))
{
markup.AppendFormat(
"<div class='captionedImage'><img src='{0}' width='150' alt='{1}'/><p>{1}</p></div>",
image.Resolutions[0].Uri, image.Caption
);
}
LiteralMarkup.Text = markup.ToString();
Finally, you may also want an additional check in there somewhere to prevent html/xss injection.
Another option is to encapsulate your image in a class:
public class CaptionedHtmlImage
{
public Uri src {get; set;};
public string Caption {get; set;}
CaptionedHtmlImage(Uri src, string Caption)
{
this.src = src;
this.Caption = Caption;
}
public override string ToString()
{
return String.Format(
"<div class='captionedImage'><img src='{0}' width='150' alt='{1}'/><p>{1}</p></div>"
src.ToString(), Caption
);
}
}
This has the advantage of making it easy to re-use and add features to the concept over time. To get real fancy you can turn that class into a usercontrol.
Me personally I would opt for the first option. Just to point out also, a quick code saving tip here. you can use AppendFormat for the StringBuilder.
EDIT: The HtmlTextWriter approach would give you a much more structured result, and if the amount of HTML to generate grew, then this would be an obvious choice. OR the use of an HTML file and using it as a template and maybe string replacements
ASP.NET User Control are another good choice for templating.
I'd prefer:
StringBuilder markup = new StringBuilder();
string template = "<div class=\"captionedImage\"><img src=\"{0}\" width=\"150\" alt=\"{1}\"/><p>{1}</p></div>";
foreach (SearchResult image in Search.GetImages(componentId))
{
markup.AppendFormat(template,image.Resolutions[0].Uri, image.Caption);
}
LiteralMarkup.Text = markup.ToString();
As mentioned in another answer, using AppendFormat().
Take a look at SharpTemplate for html templating, it makes it all a lot easier to read even for small amounts of HTML.
I would prefer doing this in aspx using the ListView Control which exactly is made for this purpose. So your code will remain readable and is seperated (no HTML in the C# Code). With your approach you will get no compilte time XHTML validation warnings.
I have made some helper classes to create html controls, so my code would look like this:
foreach (SearchResult image in Search.GetImages(componentId)) {
ContainerMarkup.Controls.Add(
Tag.Div.CssClass("captionedImage")
.AddChild(Tag.Image(image.Resolutions[0].Uri).Width(150).Alt(Image.Caption))
.AddChild(Tag.Paragraph.Text(Image.Caption)));
}
At first it might not look simpler, but it's easy to work with, creates controls that does proper html encoding of the values, and there is no string escaping issues.
I am working on a Web Forms application and I have some HTML code that I need in 2-3 more places.
I've created .ascx control like so which represents the repeated code:
protected override void Render(HtmlTextWriter writer)
{
StringBuilder str = new StringBuilder();
while (connRef.Read())
{
str.Append(connRef["some_database_field"]);
}
writer.Write(str.ToString());
base.Render(writer);
}
connRef is a reference to DataReader object that I pass where I need this piece of code to render.
In other control, I use following code:
MSSqlConn s = new MSSqlConn();
StringBuilder str = new StringBuilder();
s.OpenConn("select * from notes order by note_date desc;");
notes.note c = (notes.note)Page.LoadControl(#"/controls/notes/note.ascx");
c.ID = "note";
c.connRef = s;
while (s.Read())
{
str.Append(Html.RenderControl(c));
}
s.CloseConn();
Response.Write(str.ToString());
MSSqlConn is my class for database connection.
RenderControl renders any control as a HTML string.
s in second code snippet returns only one record instead of two. For some reason, s closes if I pass the reference to another control (c.connRef = s).
Maybe I am missing something, I don't know.
I am sorry if I hadn't explain it well.
Your problem is that you're calling the Read() method twice. This is what your code is basically doing:
while (s.Read())
{
while (connRef.Read())
{
str.Append(connRef["some_database_field"]);
}
}
Every call to the Read() method advances the data reader by one record. You need to take out one of the while loops.
You should also note that if you're passing this data reader to multiple user controls, you will need to reset it to start from the beginning for each new user control.