I am changing the content of my WebBrowser control using two different methods, but I can only suppress the "this document has been modified" message in one of them.
Method 1:
I just need to add some text at the bottom of the page (I don't get the message):
string s = browser.DocumentText + "<a>Extra Text</a>";
browser.Document.Write(string.Empty);
browser.DocumentText = s;
Method 2:
When I create a new element and add it to the WebBrowser document, I still get the "this document has been modified" message. Can I disable this message?
HtmlElement element = browser.Document.GetElementById("myId");
HtmlElement newElement = browser.Document.CreateElement("a");
newElement.InnerText = "Extra text";
element.AppendChild(newElement);
Try forcing a refresh after the change (VB.NET syntax; in C# write false instead of False):
myWebBrowser.Document.ExecCommand("Refresh", False, "")
I tried finding the element using Get, but it did not work; that's why I tried the GetElement method.
I am trying to enter text into a textbox element found with GetElement in TestStack.White, using C#.
I want to know how to cast the AutomationElement to a UIItem so that I can perform Enter() or Click() on that element.
var all = appWindow.GetElement(SearchCriteria.ByControlType(ControlType.ComboBox)
.AndByText("Model collapsed"));
var element = all.FindFirst(TreeScope.Children,
new PropertyCondition(AutomationElement.NameProperty, "Edit Box collapsed"));
element.enter("");
When I call element.Enter or Click, it gives an error. I think I need to cast it; or is there another way to achieve this? Thank you.
Using the code below, I was able to enter text:
var all = appWindow.GetElement(SearchCriteria.ByControlType(ControlType.ComboBox)
.AndByText(parentValue));
var element = all.FindFirst(TreeScope.Children, new PropertyCondition(AutomationElement.NameProperty, childValue));
TextBox textBox = new TextBox(all, appWindow.ActionListener);
TestStack.White.InputDevices.AttachedKeyboard keyboard = appWindow.Keyboard;
textBox.Click();
keyboard.Enter("test");
Hi, I am working on a data scraping application in C#.
I want to get all the displayed text, but not the HTML tags.
Here's my code:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(@"http://dawateislami.net/books/bookslibrary.do#!section:bookDetail_521.tr");
string str = doc.DocumentNode.InnerText;
InnerText is returning some tags and scripts as well, but I only want the text that is visible to the user.
Please help me.
Thanks
I believe this will solve your problem:
Method 1 – In Memory Cut and Paste
Use a WebBrowser control object to process the web page, and then copy the text from the control.
Use the following code to download the web page:
//Create the WebBrowser control
WebBrowser wb = new WebBrowser();
//Add a new event to process document when download is completed
wb.DocumentCompleted +=
new WebBrowserDocumentCompletedEventHandler(DisplayText);
//Download the webpage
wb.Url = urlPath;
Use the following event code to process the downloaded web page text:
private void DisplayText(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb = (WebBrowser)sender;
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);
textResultsBox.Text = CleanText(Clipboard.GetText());
}
Method 2 – In Memory Selection Object
This is a second method of processing the downloaded web page text. It seems to take just a bit longer (very minimal difference). However, it avoids using the clipboard and the limitations associated with that.
private void DisplayText(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//Get the WebBrowser control and its IHTMLDocument2 (requires a reference to Microsoft.mshtml and "using mshtml;")
WebBrowser wb = (WebBrowser)sender;
IHTMLDocument2 htmlDocument =
wb.Document.DomDocument as IHTMLDocument2;
//Select all the text on the page and create a selection object
wb.Document.ExecCommand("SelectAll", false, null);
IHTMLSelectionObject currentSelection = htmlDocument.selection;
//Create a text range and send the range's text to your text box
IHTMLTxtRange range = currentSelection.createRange() as IHTMLTxtRange;
textResultsBox.Text = range.text;
}
Method 3 – The Elegant, Simple, Slower XmlDocument Approach
A good friend shared this example with me. I am a huge fan of simple, and this example wins the simplicity contest hands down. It was unfortunately very slow compared to the other two approaches.
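The XmlDocument object will load and process a page with only three simple lines of code, as long as the page is well-formed XHTML (ordinary HTML usually makes Load throw an XmlException):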
XmlDocument document = new XmlDocument();
document.Load("http://www.yourwebsite.com");
string allText = document.InnerText;
There you have it! Three simple ways to scrape only displayed text from web pages with no external “packages” involved.
To remove JavaScript and CSS (script and style elements) with HtmlAgilityPack:
foreach(var script in doc.DocumentNode.Descendants("script").ToArray())
script.Remove();
foreach(var style in doc.DocumentNode.Descendants("style").ToArray())
style.Remove();
To remove comments (untested):
foreach(var comment in doc.DocumentNode.Descendants().Where(n => n.NodeType == HtmlNodeType.Comment).ToArray())
comment.Remove();
For removing all HTML tags from a string you can use (requires using System.Text.RegularExpressions;):
string output = Regex.Replace(inputString, "<[^>]*>", string.Empty);
For removing a specific tag (case-insensitive):
string output = Regex.Replace(inputString, "(?i)<td[^>]*>", string.Empty);
Hope it helps :)
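For illustration only, the regex approach above can be sketched as a small JavaScript function. JavaScript is the other language used in this thread; in C# the same patterns work with Regex.Replace. The name stripTags is hypothetical. The extra steps that drop script, style and comment blocks before removing tags are an addition, because their contents would otherwise survive as "text". This is a heuristic, not an HTML parser.

```javascript
// Hypothetical helper: a regex-based sketch, not an HTML parser.
function stripTags(html) {
  return html
    // Drop <script> and <style> blocks together with their contents,
    // otherwise their code would survive as "text".
    .replace(/<script\b[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[\s\S]*?<\/style>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Same pattern as in the answer: remove every remaining tag.
    .replace(/<[^>]*>/g, "")
    // Collapse runs of whitespace left behind by the removed markup.
    .replace(/\s+/g, " ")
    .trim();
}

const page = "<html><head><style>p { color: red; }</style></head>" +
  "<body><p>Hello <b>world</b></p><!-- note --><script>var x = 1;</script></body></html>";
console.log(stripTags(page)); // Hello world
```

Character entities such as &amp; stay encoded. Adjacent elements also run together: "<td>a</td><td>b</td>" becomes "ab". A DOM-based approach such as the HtmlAgilityPack removal above avoids most of these pitfalls.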
IE9 generates blank cells (so-called "ghost cells") with the ASP.NET Repeater control.
I tried a JavaScript regular expression, and also overriding the Render function to run a regular expression, but the page contains a few UpdatePanel controls, and this generates an error:
Error: Sys.WebForms.PageRequestManagerServerErrorException: The message received from the server could not be parsed. (ScriptResource.axd)
I have tried all the well-known fixes for this error.
Please suggest a solution if you have one.
Thank you
protected override void Render(HtmlTextWriter writer)
{
using (HtmlTextWriter htmlwriter = new HtmlTextWriter(new System.IO.StringWriter()))
{
base.Render(htmlwriter);
string html = htmlwriter.InnerWriter.ToString();
if ((ConfigurationManager.AppSettings.Get("RemoveWhitespace") + string.Empty).Equals("true", StringComparison.OrdinalIgnoreCase))
{
//html = Regex.Replace(html, @"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}", string.Empty);
html = Regex.Replace(html, @"(?<=<td[^>]*>)(?>\s+)(?!<table)|(?<!</table>\s*)\s+(?=</td>)", string.Empty);
html = html.Replace(";\n", ";");
}
writer.Write(html.Trim());
}
Another solution, which fails for the Repeater:
var expr = new RegExp('>[ \t\r\n\v\f]*<', 'g');
document.body.innerHTML = document.body.innerHTML.replace(expr, '><');
You can access the Repeater control directly (before it's written to the page and rendered by IE) and remove the cells based on their index.
You need to remove the whitespace between "</td>" and "<td>".
I found a very useful script that prevents unwanted cells in an HTML table when it is rendered in IE9:
function removeWhiteSpaces()
{
$('#myTable').html(function(i, el) {
return el.replace(/>\s*</g, '><');
});
}
Call this JavaScript function when the page loads (i.e., in the onload event).
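The whitespace-collapsing idea behind both snippets can be checked outside the browser with a minimal sketch. The helper name removeWhiteSpaceBetweenTags is hypothetical, and the regex is the one from the jQuery answer. The jQuery wiring appears only as a comment, because it needs a page with jQuery loaded and a table whose id is myTable.

```javascript
// Hypothetical helper; the pattern is the one used in removeWhiteSpaces().
function removeWhiteSpaceBetweenTags(html) {
  // ">\s*<" only matches when nothing but whitespace sits between a ">" and
  // the next "<", so text inside a cell such as "<td> b </td>" is kept.
  return html.replace(/>\s*</g, "><");
}

const row = "<tr>\n  <td>a</td>\n  <td> b </td>\n</tr>";
console.log(removeWhiteSpaceBetweenTags(row)); // <tr><td>a</td><td> b </td></tr>

// In the page (requires jQuery and a table with id="myTable"):
// window.onload = function () {
//   $('#myTable').html(function (i, oldHtml) {
//     return removeWhiteSpaceBetweenTags(oldHtml);
//   });
// };
```

Because .html() replaces the table's contents, event handlers bound directly to elements inside the table are lost. It is safer to run this before attaching any such handlers.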
The user enters an HTML page name in TextBox1, and that page is appended to a panel through a Literal (i.e., the HTML page is loaded into the panel):
string val = TextBox1.Text;
string location = Server.MapPath(".");
string path = location + "\\HTML\\" + val + ".html"; // HTML IS FOLDER NAME IN MY PROJECT
string readText = System.IO.File.ReadAllText(path);
Panel1.Controls.Clear();
Literal lit = new Literal();
lit.Text = readText;
Panel1.Controls.Add(lit);
The HTML page contains a few controls in the form of input elements, for example: <input style="position: relative;" id="T0" onmouseup="mUp(this.id)" class="ui-draggable" onmousedown="mDown(this.id)" value="" type="text">
I have to find those IDs and their text to save them in a database.
How can I find the controls in the panel now?
Give an ID to the control when you add it.
Literal lit = new Literal();
lit.Text = readText;
lit.ID = "myLiteral";
Panel1.Controls.Add(lit);
Then you can get it back as follows:
Literal lit = (Literal)Panel1.FindControl("myLiteral");
Remember that dynamically added controls must be created and added again on every postback that follows, for as long as you want to have access to them.
Give your Literal an ID; then you can access it via FindControl:
Literal myLiteral = Panel1.FindControl("litId") as Literal;
if (myLiteral != null)
{
// ... do something
}
EDIT: (Missed the last part of your question)
You need to call ParseControl(string) on the HTML content, which returns a control, and then add that control (containing all child controls) to the Panel. Then you can use FindControl to locate the child controls. For this method, the controls must be .NET controls (i.e.
Since you did not give the control an ID, you can find it by index with Panel1.Controls[index]; because it is the first control added, you can access it at Panel1.Controls[0].
I am using the code below to inject JavaScript:
HtmlElement head = _wb.Document.GetElementsByTagName("head")[0];
HtmlElement scriptEl = _wb.Document.CreateElement("script");
mshtml.IHTMLScriptElement element = (mshtml.IHTMLScriptElement)scriptEl.DomElement;
element.text = "function zoom(){document.body.style.zoom='150%';}";
head.AppendChild(scriptEl);
Now, can anyone tell me how to remove the added child?
I think explicitly deleting an element is not possible (I did not check the IHTML interfaces for this).
But it can be done in two other ways:
scriptEl.OuterHtml = string.Empty; => removes the whole element, but does not always work
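HtmlElement blankScript = _wb.Document.CreateElement("script");
((mshtml.IHTMLDOMNode)head.DomElement).replaceChild((mshtml.IHTMLDOMNode)blankScript.DomElement, (mshtml.IHTMLDOMNode)scriptEl.DomElement); => replaces your unwanted script with a blank one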