C# WebKit and CKEditor iframe scraping

I am working on a project that uses a web browser control in C#. After many struggles I managed to embed WebKit in a WinForms application and load a website with CKEditor in it, but I ran into two issues.
1 The image uploader works, but either it doesn't send its callback or WebKit cannot process it. Is there any way to make it work?
2 When I try to scrape the HTML document to get the iframe with webKitBrowser1.Document.GetElementById("cke_1_contents").LastChild, I get the iframe element, but I have no idea how to get its content because it reports that it has no children.
Can anyone suggest what to do next, or offer an alternative?
I use VS2008 and .NET 3.5.

I can't answer this question in the context of the WebKit-based control, but I suggest that you try the native WinForms WebBrowser control. It works great as the host for CKEditor, once the WebBrowser Feature Control has been implemented.
Then, if I were to scrape a page containing CKEditor, I'd try something like this to get the current editor content (from C#):
dynamic pageDocument = webBrowser.Document.DomDocument;
var ckeDocument = pageDocument.getElementsByClassName("cke_wysiwyg_frame").item(0).contentDocument;
MessageBox.Show((string)ckeDocument.documentElement.outerHTML);

Related

How do I dynamically load HTML into a Winforms WebViewControl?

The Visual Studio WebView component uses the Microsoft Edge browser and is the upgraded version of the WebBrowser control that used older Internet Explorer technology, but the API is different.
Does anyone know the equivalent WebView control call for:
WebBrowser.DocumentText = "<html>Dynamic page content</html>";

I agree with the suggestion given by @aepot that you can use WebView.NavigateToString to load the dynamic HTML.
webView2.NavigateToString("<html> Test code </html>");
If the HTML is in a file, you can do this:
webView2.NavigateToString(System.IO.File.ReadAllText(Application.StartupPath + "/11.html"));
OR
webView2.NavigateToLocal(@"\12.html");

Execute script with HtmlAgilityPack [duplicate]

I'm trying to scrape a particular webpage which works as follows.
First the page loads, then it runs some sort of javascript to fetch the data it needs to populate the page. I'm interested in that data.
If I get the page with HtmlAgilityPack, the script doesn't run, so I get what is essentially a mostly blank page.
Is there a way to force it to run a script, so I can get the data?

You are getting what the server is returning - the same as a web browser. A web browser, of course, then runs the scripts. Html Agility Pack is an HTML parser only - it has no way to interpret the javascript or bind it to its internal representation of the document. If you wanted to run the script you would need a web browser. The perfect answer to your problem would be a complete "headless" web browser. That is something that incorporates an HTML parser, a javascript interpreter, and a model that simulates the browser DOM, all working together. Basically, that's a web browser, except without the rendering part of it. At this time there isn't such a thing that works entirely within the .NET environment.
Your best bet is to use a WebBrowser control and actually load and run the page in Internet Explorer under programmatic control. This won't be fast or pretty, but it will do what you need to do.
Also see my answer to a similar question: Load a DOM and Execute javascript, server side, with .Net which discusses the available technology in .NET to do this. Most of the pieces exist right now but just aren't quite there yet or haven't been integrated in the right way, unfortunately.
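The WebBrowser approach described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the URL is a placeholder, and the 3-second grace period is an arbitrary assumption, needed because DocumentCompleted can fire before the page's own scripts have finished fetching their data. A real scraper would more likely poll the DOM for a specific element.

```csharp
using System;
using System.Windows.Forms;

static class RenderedPageDump
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        // Placeholder URL (assumption); replace with the page to scrape.
        const string url = "http://example.com/";

        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        // Arbitrary grace period (assumption) that gives scripts time to populate the page.
        var settle = new Timer { Interval = 3000 };
        settle.Tick += (sender, e) =>
        {
            settle.Stop();
            // Read the live DOM, which includes whatever the scripts inserted.
            if (browser.Document != null && browser.Document.Body != null)
            {
                Console.WriteLine(browser.Document.Body.OuterHtml);
            }
            Application.ExitThread();
        };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted also fires for frames; restart the timer
            // only once the whole document reports it is complete.
            if (browser.ReadyState == WebBrowserReadyState.Complete)
            {
                settle.Stop();
                settle.Start();
            }
        };

        browser.Navigate(url);
        Application.Run(); // the control needs a message loop to do any work
    }
}
```

The resulting HTML string can then be handed to Html Agility Pack for parsing, so the parser sees the populated page rather than the mostly blank one the server returned.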

You can use Awesomium for this, http://www.awesomium.com/. It works fairly well but has no support for x64 and is not thread safe. I'm using it to scan some web sites 24x7 and it's running fine for at least a couple of days in a row but then it usually crashes.

Selenium webdriver C# - Unable to find the element in a grid developed using angular UI

I am trying to automate a web application built with AngularJS using Selenium WebDriver (C#). I need to click a cell in an Angular UI grid, but I could not locate it by CSS selector or by XPath.
The CSS selector contains a dynamically generated ID: #\31 460691734316-0-uiGrid-00KQ-cell > div
The XPath is dynamic as well: //*[@id="1460691734316-0-uiGrid-00KQ-cell"]/div
I also tried
driver.FindElements(By.CssSelector("*[id^='1460'][id$='cell']"));
but it didn't help.
Any help will be highly appreciated. I can send more details if needed.

For my particular problem, an HTML page containing iframes and built with AngularJS, the following trick saved me a lot of time. In the DOM I could clearly see an iframe wrapping all the content, so the following code should have worked:
driver.switchTo().frame(0);
waitUntilVisibleByXPath("//h2[contains(text(), 'Creative chooser')]");
But it was not working and told me something like "Cannot switch to frame. Window was closed". Then I modified the code to:
driver.switchTo().defaultContent();
driver.switchTo().frame(0);
waitUntilVisibleByXPath("//h2[contains(text(), 'Creative chooser')]");
After this everything went smoothly. Evidently Angular was doing something to the iframes: just after the page loaded, when you would expect the driver to be focused on the default content, it was instead focused on a frame that Angular had already removed. Hope this helps some of you.

What about trying to find the element with Selenium IDE, which is a Firefox plugin?
In the IDE, you can easily find the selector by picking the element in the GUI.

Rather than identifying the element specifically by its ID, could you use the elements around it instead? Is this cell within a table and at a consistent position? Is there a parent element you could more consistently select and iterate through the children in your C# program to identify the appropriate cell you're looking for?
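One way to read that suggestion is to ignore the generated IDs and address a cell by its row and column position. The sketch below assumes the grid renders rows and cells with the class names ui-grid-row and ui-grid-cell; those names, and the GridCells helper itself, are assumptions for illustration and should be checked against the actual DOM.

```csharp
using System.Collections.ObjectModel;
using OpenQA.Selenium;

static class GridCells
{
    // Hypothetical helper: returns the cell at the given zero-based row and
    // column, located by position instead of by its generated ID.
    public static IWebElement FindCell(IWebDriver driver, int rowIndex, int columnIndex)
    {
        // Assumed class name for grid rows; verify it in the page's DOM.
        ReadOnlyCollection<IWebElement> rows =
            driver.FindElements(By.CssSelector(".ui-grid-row"));

        // Assumed class name for the cells inside a row.
        ReadOnlyCollection<IWebElement> cells =
            rows[rowIndex].FindElements(By.CssSelector(".ui-grid-cell"));

        return cells[columnIndex];
    }
}
```

A call such as GridCells.FindCell(driver, 0, 2).Click() would then click the third cell of the first rendered row. If the grid only renders the rows currently scrolled into view, the row index refers to rendered rows, not to rows in the underlying data.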

Webview in CEFSharp to render HTML content

My objective is to render HTML content as text with its styles, indentation, and so on. I built a workaround with CefSharp v1.25.5 using the following code, and it works like a charm.
CefSharp.Wpf.WebView webView= new CefSharp.Wpf.WebView();
webView.LoadHtml("<p> this is <b>paragraph<b></p>");
Since I am developing a 64-bit application, I cannot use that version in it. I found the latest version, CefSharp v37.0.0, which has 64-bit support, but it does not have 'WebView'. I tried the following code without any luck.
CefSharp.Wpf.ChromiumWebBrowser browser = new CefSharp.Wpf.ChromiumWebBrowser();
browser.LoadHtml("<p> this is <b>paragraph<b></p>", "dummy:");
I need to show the HTML content in a web view container on a 64-bit target platform, as the WebView in CefSharp v1.25.5 does.

Hi, there are a few things you should do, all covered in CefSharp Troubleshooting.
Initialise CEF
Add the browser to the Controls collection
I prefer to LoadHtml in a method attached to the browser.IsInitialisedEvent (and only load when the event.IsBrowserInitialised is true)
Hope these help. If there are any other issues, do let us know!

HTML only WebBrowser in C#?

Any way to tell a WebBrowser in C# to show the pages in HTML only? I'm trying to make a web scraper and I don't need pictures that make the process way slower than necessary.

Why are you using a WebBrowser control for page scraping? If you just want the core html of a page, then just do a WebRequest and get the response.
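A minimal sketch of that approach follows. The PageFetcher class and FetchHtml method are illustrative names, not part of any library. The code downloads only the HTML the server returns: no images are fetched and no scripts are run.

```csharp
using System;
using System.IO;
using System.Net;

static class PageFetcher
{
    // Hypothetical helper: returns the raw response body for the given URL.
    public static string FetchHtml(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            // StreamReader assumes UTF-8 here; a page in another
            // encoding would need that encoding passed explicitly.
            return reader.ReadToEnd();
        }
    }
}
```

Calling PageFetcher.FetchHtml with the page's address returns the markup as a string, ready to be parsed. On recent .NET versions WebRequest is marked obsolete in favour of HttpClient, but it matches the .NET Framework era of this question.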

You're going to have to roll your own, basically.
One way would be to build your application in WPF and use a HTML->XAML conversion process and just leave off the tag from being converted.
