I need to write a page (PHP or .NET are both fine) that displays the unmodified HTML of an element from another page.
The other page may not have valid HTML, but we want it to be returned unmodified. We will not be selecting based on the invalid elements, but will select their parent element and need them returned unmodified.
An example HTML page that my page will be fetching:
<body>
<div>
<p>test1</p>
<br>
<p>test2
<p>test3</p>
</div>
</body>
So far, everything I have tried attempts to fix the HTML: it makes the <br> in the example self-closing, and it closes the second paragraph tag.
Is there anything out there that can do this?
Thanks!
Related
So I'm trying to scrape a website using AngleSharp and want to access a particular button that is nested deep in the site. I have logged out the parsed document html with document.DocumentElement.OuterHtml
but can only see so far into the document:
<div class="l-propertySearch-paginationAndSearchFooter" data-test="pagination">
<div data-bind="component: 'pagination'"></div>
</div>
</div>
However, when I inspect the page in the web browser, I can see the additional layers necessary to access the button:
As you can see, the div with the data-bind attribute "component: 'pagination'" opens up further in the browser, but none of that appears in the log. This is why, I suspect, I can't retrieve the element.
I've experimented with document.QuerySelectorAll("button") and get back a list of buttons, but not the one I'm after. It's as if the particular block I want doesn't exist. Any ideas what I'm doing wrong?
As far as I understand, the button you are looking for is created with JavaScript and does not exist in the original source code. That is why you can't access it with AngleSharp. Right-click on the website, click View page source (Ctrl + U in Chrome), and look for your button there. That is what AngleSharp sees, not the HTML shown in Inspect Element.
I have a contenteditable div in which the user enters data. When they enter a line break, each browser stores the data differently. When I export this data to Word using HtmlToOpenXml, it adds a blank line for the content. I want to avoid that so the HTML page and the Word document look the same.
One option is to replace the <br>, <div> and <p> tags with an empty string and then replace </div> and </p> with <br/> in the C# code using RegEx. But I do not know all the formatting that different browsers use for a contenteditable div, so this implementation may not help.
I would like to know what is the best way to address this or is there any open source tool/dll that helps me with this issue?
For example, the actual contenteditable div data looks like this in each browser:
Chrome -
line1<div>line2</div><div>line3</div>
IE Edge-
<div>line1</div><div>line22</div><div>line3<br></div>
FireFox - I read it uses <p> </p> instead of <div> </div>
Safari - ????
A Solution I found:
You could use RegEx, which I highly recommend in C# for parsing information.
Then, based on the formatting, you could work out which browser produced it and parse its output accordingly, mapping each browser's markup to a common meaning. This will not be easy, but nothing cross-platform ever truly is. I would give an example of how this could be done, but RegEx honestly takes a good amount of work, and an example that parses the markup and detects the browser would be quite a bit of code.
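A much narrower sketch than the approach described above: instead of detecting the browser, it normalises the block and break tags from the Chrome and Edge samples in the question into lines joined by <br/>. It is written in JavaScript to keep it short; the same patterns should carry over to C#'s Regex.Replace. The function name normalizeLineBreaks is hypothetical. It assumes that only <div>, <p> and <br> carry line structure and that the input contains no literal newlines. It also collapses intentionally blank lines, which may or may not be what you want.

```javascript
// Sketch only: collapse contenteditable markup into lines joined by <br/>.
// Assumes only <div>, <p> and <br> carry line structure and that the input
// contains no literal newline characters.
function normalizeLineBreaks(html) {
  return html
    // Every <br> and every block boundary becomes a newline marker.
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/?(?:div|p)\b[^>]*>/gi, "\n")
    // Collapse runs of markers (this also drops blank lines) and trim the ends.
    .replace(/\n+/g, "\n")
    .replace(/^\n|\n$/g, "")
    // Re-join the remaining lines with a single explicit break.
    .split("\n")
    .join("<br/>");
}
```

With this, the Chrome sample and the Edge sample from the question both reduce to the same lineA<br/>lineB<br/>lineC shape, so the exporter sees one format regardless of the browser.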
I have a fairly simple page with a set of jQuery tabs; the content of some of them is loaded via AJAX. I also have a search box in the header of my master page.
When I open the tabbed page the search box works fine. However once I have clicked on one of the ajax tabs the search box fails to work with an "Invalid Viewstate" yellow screen of death.
I believe this is because the ajax page is replacing the __VIEWSTATE hidden input with its own.
How can I stop this behaviour?
UPDATE: I have noticed that the YSOD only appears in IE and Chrome, Firefox doesn't seem to have the same issue. Although how the browser influences the ViewState, I'm not sure.
UPDATE: I've put a cut down version of the site that shows the issue here: http://dropbox.com/s/7wqgjqqdorgp958/stackoverflow.zip
The reason for this behavior is that you fetch the content of the ajaxTab.aspx page asynchronously and paste it into another aspx page. You therefore end up with two hidden fields named __VIEWSTATE, and when the page is posted back to the server their values get mixed up (this may depend on how the browser handles multiple controls with the same name on submit). To resolve this, you can put the second tab's content into a frame:
<div id="tabs">
    <ul>
        <li>Default Tab</li>
        <li>ajax Content</li>
    </ul>
    <div id="tabs-1">
        <p>
            To replicate the error:
            <ul>
                <li>First use the search box top right to search to prove that code is ok</li>
                <li>Then click the second ajax tab, and search again.</li>
                <li>N.B. Chrome / IE give a state error, Firefox does not</li>
            </ul>
        </p>
    </div>
    <iframe id="tabs-2" src="ajaxTab.aspx" style="width:100%;" ></iframe>
</div>
Also, I'm not sure, but there seems to be an error in the Web_UserControls_search control. In my opinion, the NavBarSearchItemNoSearchItem_OnClick method should be refactored as below:
protected void NavBarSearchItemNoSearchItem_OnClick(object sender, EventArgs e)
{
    var searchFieldTbx = NavBarSearchItemNo;
    var navBarSearchCatHiddenField = NavBarSearchCatHiddenField;
    var term = searchFieldTbx != null ? searchFieldTbx.Text : "";
    if (term.Length > 0) //There is actually something in the input box we can work with
    {
        //Response.Redirect(Url.GetUrl("SearchResults", term));
        Response.Redirect(ResolveClientUrl("~/Web/SearchResults.aspx?term=" + term + "&cat=" + navBarSearchCatHiddenField.Value));
    }
}
Note that we resolve the client URL when redirecting to the search results page, and that we pass navBarSearchCatHiddenField.Value, rather than navBarSearchCatHiddenField itself, as the cat parameter.
I guess that you use AJAX to fill the content of the tab. In that case, the content of your tab is replaced by the new content from the AJAX call, and __VIEWSTATE is certainly replaced along with it. Do you use data from ViewState on the server? For the "static tabs", you should prevent them from reloading automatically by using cache: true.
Your issue is that your AJAX call brings in a complete ASPX page, including the form tag and its ViewState. If you remove the form tag from ajaxTab.aspx, you will see that everything works fine. ASP.NET does not know how to handle two form tags in one page, and the same goes for hidden ViewState fields. You cannot bring in a full aspx page via AJAX. Just bring in the content div you want to display and you'll be good to go.
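If changing ajaxTab.aspx on the server is not an option, one way to approximate the same thing on the client is to strip the form wrapper and the hidden state fields from the fetched markup before inserting it into the tab. This is only a sketch under assumptions: stripWebFormsWrapper is a hypothetical helper, and the patterns assume the usual rendered shape of those fields (a hidden input with a double-quoted name starting with two underscores).

```javascript
// Sketch only: remove the parts of a full Web Forms page that clash with
// the host page before the markup is inserted into a tab.
// Assumes hidden state fields are rendered as <input ... name="__XXX" ...>.
function stripWebFormsWrapper(html) {
  return html
    // Drop hidden state fields such as __VIEWSTATE and __EVENTVALIDATION.
    .replace(/<input\b[^>]*\bname="__[A-Z]+"[^>]*>/g, "")
    // Drop the opening and closing <form> tags but keep what is inside them.
    .replace(/<\/?form\b[^>]*>/gi, "");
}
```

The cleaned string can then be placed into the tab panel, so the host page keeps exactly one form and one __VIEWSTATE field. Controls inside the fetched fragment will no longer post back on their own, which matches the advice above to bring in display content only.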
I am not able to identify an element in a page; the lookup gives null. I want to identify an element (a textbox) inside the iframe. I used Selenium WebDriver to click on the element, but it is not able to identify it either.
1. My HTML code is as shown below:
<html>
<head>
<body>
<iframe id="iframeOne">
</iframe>
</body>
</head>
</html>
2. I used JavaScript to find the textbox with document.getElementById('textbox'), but it returns null.
3. I even tried using Selenium WebDriver, but the following gives an object reference error:
IWebElement ClickElement = Wait.Until((d) => webDriver.FindElement(By.Id(parameter1)));
ClickElement.Click();
You cannot put HTML inside an iframe tag; its purpose is to load another page inside the current page. Your input tag should also contain the type of the control. Check the HTML validation errors as well.
The HTML you put inside the iframe tag will be loaded and visible if and only if the browser does not support the iframe tag. So probably never, unless you're using an old Netscape Navigator or IE 4.
Add src attribute to the iframe pointing to the url you want to load. Then you can access elements inside this way:
var frame = document.getElementById('iframeOne');
var frameDocument = frame.contentDocument;
var element = frameDocument.getElementById('xxxx');
There's one thing to take into account, though: accessing contentDocument when the iframe's src is cross-domain might not work as expected.
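To make the cross-domain caveat concrete, here is a small sketch of the comparison the browser's same-origin policy is based on: two URLs share an origin only when scheme, host and port all match. The function name isSameOrigin and the example URLs are made up for illustration, and the check ignores special cases such as sandboxed frames.

```javascript
// Sketch only: would the same-origin policy let the page read this frame?
// Origins match when scheme, host and port are all equal.
function isSameOrigin(pageUrl, frameSrc) {
  const page = new URL(pageUrl);
  // Resolve a relative src against the page URL, as the browser does.
  const frame = new URL(frameSrc, pageUrl);
  return page.origin === frame.origin;
}
```

In the browser, you could call this with window.location.href and frame.src before touching frame.contentDocument, and fall back to another approach when it returns false.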
I have a page that does not have runat="server" set in the <head/> section. I do not have access to modify any of the code in the page.
This page contains a user control which I do have access to. Can I add a <meta/> tag to the head section of the page from the user control? It needs to be server-side so a javascript solution won't work.
One option is to create a Response Filter, and then modify the output before it's sent to the user.
https://web.archive.org/web/20211029043851/https://www.4guysfromrolla.com/articles/120308-1.aspx
You can parse the text in
(this.Page.Controls[0] as LiteralControl).Text
to see where the string <head> starts, and insert whatever text you need in there thus injecting your own code into the page header without it being marked with runat="server".
Please be aware, though, that this is a pretty hacky way of getting your code where it most likely shouldn't be (otherwise the <head> element would have been marked as runat="server" so you could access it normally). It will also break if the head element is later changed to be an ASP.NET control. It might not work with master pages either; there you would have to walk up the control tree looking for the topmost literal control.
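The string manipulation itself is small. The sketch below shows it in JavaScript to keep it short; in the user control, the same steps would run in C# against the LiteralControl text. The function name injectIntoHead and the sample meta tag are hypothetical, and the sketch assumes the opening <head> tag appears literally in the text.

```javascript
// Sketch only: insert extra markup right after the opening <head> tag.
// Returns the input unchanged when no <head> tag is found.
function injectIntoHead(html, markup) {
  // \b keeps tags such as <header> from matching.
  const match = /<head\b[^>]*>/i.exec(html);
  if (match === null) {
    return html;
  }
  const insertAt = match.index + match[0].length;
  return html.slice(0, insertAt) + markup + html.slice(insertAt);
}
```

Returning the text unchanged when no <head> is found keeps the page rendering as before, which seems the safer failure mode for a hack like this.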