Source Code Different When Using CefSharp Function GetSourceAsync() than when Calling ViewSource()

I have the latest version of CefSharp installed. When I call ViewSource(), it opens a Notepad window with the source code. But when I call GetSourceAsync(), the result assigned to var html is very different and is missing the HTML I need, which is shown in the Notepad window. The only workaround I can think of would be to somehow copy the contents of the Notepad window into my app and use that. Does anyone know how to get the HTML as shown in the Notepad window? I'm running the application on Windows 7 Pro, using Visual Studio 2017 Express. Here is my code...
private void WebBrowserFrameLoadEndedAsync(object sender, FrameLoadEndEventArgs e)
{
    // Opens the source in Notepad
    chromeBrowser.ViewSource();
    // Reads the source into a string
    chromeBrowser.GetSourceAsync().ContinueWith(taskHtml =>
    {
        var html = taskHtml.Result;
    });
}
Here is the web page that the browser goes to...
chromeBrowser = new ChromiumWebBrowser("https://www.amazon.com/product-reviews/B084RCFDJ3/ref=acr_search_hist_5?ie=UTF8&filterByStar=five_star&reviewerType=all_reviews#reviews-filter-bar");

It turns out, I was searching the source for the wrong phrase. So now I just call the following...
string source = await chromeBrowser.GetBrowser().MainFrame.GetSourceAsync();

I've gone into detail on the difference between GetSource and ViewSource further down.
Some important things to note about FrameLoadEnd:
It is called for every frame; if your page has multiple frames, it will be called multiple times.
It is called when the initial resources have finished loading; if your website is dynamically created/rendered, your call may be happening too early.
//FrameLoadEnd is called for every frame, if your page has multiple frames then it will be called multiple times.
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
var frame = e.Frame;
var source = await frame.GetSourceAsync();
}
//To only get the main frame source
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
var frame = e.Frame;
if (frame.IsMain)
{
var source = await frame.GetSourceAsync();
}
}
// If your website dynamically generates content then you might need to wait a
// little longer for it to render. Introduce a fixed wait period, this can be
// problematic for a number of reasons.
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
var frame = e.Frame;
if (frame.IsMain)
{
// Wait a little bit of time for the page to load
await System.Threading.Tasks.Task.Delay(500);
var source = await frame.GetSourceAsync();
}
}
Explanation of the difference in behaviour
Firstly, ViewSource() returns immediately; Notepad is launched only after the GetSourceAsync call has already completed.
Both methods send a GetSource message to the render process, which returns a ReadOnlySharedMemoryRegion. The data is read from that shared memory section at a different moment in each case, so each call ends up with a different snapshot in time.
void CefFrameHostImpl::ViewSource() {
SendCommandWithResponse(
"GetSource",
base::BindOnce(&ViewTextCallback, CefRefPtr<CefFrameHostImpl>(this)));
}
void CefFrameHostImpl::GetSource(CefRefPtr<CefStringVisitor> visitor) {
SendCommandWithResponse("GetSource",
base::BindOnce(&StringVisitCallback, visitor));
}
CEF Source reference.
The CefFrameHostImpl::GetSource method, which GetSourceAsync calls, completes very quickly, as it simply creates a string from the shared memory section.
The CefFrameHostImpl::ViewSource method, whilst it returns immediately, is much slower: it takes additional processing to create a file on disk, write that string and spawn Notepad.
HTML source is always a snapshot of the source at a given point in time. For static web pages, timing makes no difference; for dynamically rendered/updated websites, a few hundred milliseconds can mean you get entirely different source.
The two methods convert the shared ReadOnlySharedMemoryRegion into a string at different moments, which means there is a subtle difference in the source you end up getting.

Related

CefSharp offscreen - wait for page for render

I have the following problem. I use CefSharp offscreen for web page automation as follows (I always open one and the same page):
1. Open the page and wait until it renders.*
2. With EvaluateScriptAsync I put a value into an input field, and then with the same method I click a button on the page.
3. Some JS on the page then checks the result and displays a message.
4. When the message is displayed I take a screenshot.**
However, I have two problems:
* My solution has to be independent of Internet speed. I used the BrowserLoadingStateChanged event and the IsLoading property, but even though the events fired, the page had not loaded completely: when I called EvaluateScriptAsync it returned an error because the page was not completely loaded. Sure, I can add something like Thread.Sleep, but it does not always work; it depends strongly on your Internet speed.
** When I try to take a screenshot, it does not always contain the result message displayed by the JS; sometimes there is a loading circle instead of the message. Here again I can use Thread.Sleep, but it does not always work.
Do you have any ideas? Thanks in advance.
private static void BrowserLoadingStateChanged(object sender, LoadingStateChangedEventArgs e)
{
// Check to see if loading is complete - this event is called twice,
// once when loading starts and a second time when it has finished.
if (!e.IsLoading)
{
// Remove the load event handler, because we only want one snapshot of the initial page.
browser.LoadingStateChanged -= BrowserLoadingStateChanged;
Thread.Sleep(1800); // e. g. but it isn't a solution in fact
var scriptTask = browser.EvaluateScriptAsync("document.getElementById('b-7').value = 'something'");
scriptTask = browser.EvaluateScriptAsync("document.getElementById('b-8').click()");
//scriptTask.Wait();
if (browser.IsLoading == false)
{
scriptTask.ContinueWith(t =>
{
//Give the browser a little time to render
//Thread.Sleep(500);
Thread.Sleep(500); // still not a solution
// Wait for the screenshot to be taken.
var task = browser.ScreenshotAsync();
task.ContinueWith(x =>
{
// Make a file to save it to (e.g. C:\Users\jan\Desktop\CefSharp screenshot.png)
var screenshotPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "CefSharp screenshot.png");
Console.WriteLine();
Console.WriteLine("Screenshot ready. Saving to {0}", screenshotPath);
// Save the Bitmap to the path.
// The image type is auto-detected via the ".png" extension.
task.Result.Save(screenshotPath);
// We no longer need the Bitmap.
// Dispose it to avoid keeping the memory alive. Especially important in 32-bit applications.
task.Result.Dispose();
Console.WriteLine("Screenshot saved. Launching your default image viewer...");
// Tell Windows to launch the saved image.
Process.Start(screenshotPath);
Console.WriteLine("Image viewer launched. Press any key to exit.");
}, TaskScheduler.Default);
});
}
}
}

OK, so in my case the best solution was to use JavaScript to check whether an element with a given id exists. If it does, the page is loaded.
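One way to turn that check into something that does not depend on a fixed sleep is to poll it with a timeout. The following is only a sketch under stated assumptions: Poll.UntilAsync is a hypothetical helper (it is not part of CefSharp), and the commented usage assumes that the JavascriptResponse returned by EvaluateScriptAsync exposes Success and Result, and that 'b-7' is the element id from the question.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Poll
{
    // Hypothetical helper: re-checks an async condition until it returns
    // true or the timeout elapses. Returns false on timeout.
    public static async Task<bool> UntilAsync(
        Func<Task<bool>> condition, TimeSpan timeout, TimeSpan interval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            if (await condition()) return true;
            if (clock.Elapsed >= timeout) return false;
            // Yield between checks instead of blocking the thread.
            await Task.Delay(interval);
        }
    }
}

// Possible use with CefSharp (assumption, see the note above):
//
// bool ready = await Poll.UntilAsync(async () =>
// {
//     var r = await browser.EvaluateScriptAsync(
//         "document.getElementById('b-7') !== null");
//     return r.Success && r.Result is bool found && found;
// }, TimeSpan.FromSeconds(30), TimeSpan.FromMilliseconds(250));
```

The same helper could be reused before taking the screenshot, with a script that checks for the result message rather than for the input element.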

I noticed that render time may vary significantly depending on your hardware. It can take up to 5 seconds to render after EvaluateScriptAsync is called. So it is always better to use longer delays before calling ScreenshotAsync() if you do not want to get an outdated screenshot.
Thread.Sleep(5000);

Pass String from python script to C# UI [duplicate]

I'm making a program that controls a game server. One of the functions I'm making, is a live server logfile monitor.
There is a logfile (a simple textfile) that gets updated by the server as it runs.
How do I continuously check the logfile and output its content in a RichTextBox?
I wrote this simple function just to try to get the content of the log. It will of course just get the text row by row and output it to my textbox. It will also lock the program for as long as the loop runs, so I know it's useless.
public void ReadLog()
{
using (StreamReader reader = new StreamReader("server.log"))
{
String line;
// Read and display lines from the file until the end of the file is reached.
while ((line = reader.ReadLine()) != null)
{
monitorTextBox.AppendText(line + "\n");
CursorDown();
}
}
}
But how would you go about solving the live monitoring as simple as possible?
*** EDIT ***
I'm using Prescott's solution. Great stuff.
At the moment I'm using a StreamReader to put the text from the file into my textbox. I ran into a problem: whenever I tried to access any of the GUI controls in my event handler, the program just stopped with no errors or warnings.
I found out that it has to do with threading. I solved that like this:
private void OnChanged(object source, FileSystemEventArgs e)
{
if (monitorTextField.InvokeRequired)
{
monitorTextField.Invoke((MethodInvoker)delegate { OnChanged(source, e); });
}
else
{
StreamReader reader = new StreamReader("file.txt");
monitorTextField.Text = "";
monitorTextField.Text = reader.ReadToEnd();
reader.Close();
CursorDown();
}
}
Now my only problem is that the file.txt is used by the server so I can't access it, since it's "being used by another process". I can't control that process, so maybe I'm out of luck.
But the file can be opened in notepad while the server is running, so somehow it must be possible. Perhaps I can do a temp copy of the file when it updates and read the copy. I don't know.

Check out the System.IO.FileSystemWatcher class:
public static void Watch()
{
    var watch = new FileSystemWatcher();
    watch.Path = @"D:\tmp";
    watch.Filter = "file.txt";
    watch.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite; //more options
    watch.Changed += new FileSystemEventHandler(OnChanged);
    // Start raising events
    watch.EnableRaisingEvents = true;
}
/// Functions:
private static void OnChanged(object source, FileSystemEventArgs e)
{
    if (e.FullPath == @"D:\tmp\file.txt")
    {
        // do stuff
    }
}
Edit: if you know some details about the file, you can work out the most efficient way to get the last line. For example, maybe when you read the file you can wipe out what you've read, so that the next time it's updated you just grab whatever is there and output it. Or perhaps you know that one line is added at a time, in which case your code can jump straight to the last line of the file. Etc.

Although the FileSystemWatcher is the simplest solution, I have found it to be unreliable in practice. Often a file is updated with new contents but the FileSystemWatcher does not fire an event until seconds later, and often it never fires at all.
The only reliable way I have found to approach this is to check for changes to the file on a regular basis using a System.Timers.Timer object and checking the file size.
I have written a small class that demonstrates this available here:
https://gist.github.com/ant-fx/989dd86a1ace38a9ac58
Example Usage
var monitor = new LogFileMonitor(@"c:\temp\app.log", "\r\n");
monitor.OnLine += (s, e) =>
{
// WARNING.. this will be a different thread...
Console.WriteLine(e.Line);
};
monitor.Start();
The only real disadvantage here (apart from a slight performance delay caused by file size checking) is that because it uses a System.Timers.Timer the callback comes from a different thread.
If you are using a Windows Forms or WPF app you could easily modify the class to accept a SynchronizingObject which would ensure the event handler events are called from the same thread.

As @Prescott suggested, use a FileSystemWatcher. Also make sure you open the file with an appropriate FileShare mode (FileShare.ReadWrite seems appropriate), since the file might still be open in the server. If you try to open the file exclusively while it is still in use by another process, the open operation will fail.
Also in order to gain a bit of performance, you could remember the last position up to which you already have read the file and only read the new parts.
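Both suggestions (a shared open mode, and reading only the new part) can be combined in a small reader. This is a minimal sketch, not a tested drop-in: LogTail is a hypothetical class name, the log is assumed to be UTF-8 text, and whether FileShare.ReadWrite is sufficient depends on how the server itself opened the file.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class LogTail
{
    private readonly string _path;
    private long _position; // offset of the first byte not yet read

    public LogTail(string path)
    {
        _path = path;
    }

    // Returns whatever was appended since the previous call ("" if nothing).
    public string ReadNew()
    {
        // FileShare.ReadWrite lets the writing process keep its handle open.
        using (var stream = new FileStream(
            _path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // If the file shrank (truncated or rotated), start over.
            if (stream.Length < _position) _position = 0;
            stream.Seek(_position, SeekOrigin.Begin);
            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                string text = reader.ReadToEnd();
                // Remember where we stopped so the next call reads only new data.
                _position = stream.Position;
                return text;
            }
        }
    }
}
```

ReadNew() could be called from the OnChanged handler or from a timer tick, with the returned text appended to the textbox on the UI thread (for example via Invoke, as in the question's edit).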

Use this answer on another post c# continuously read file.
This one is quite efficient, and it checks once per second if the file size has changed.
You can run it on another thread (or convert it to async code), but in either case you would need to marshal the text back to the main thread to append it to the textbox.

Try adding a Timer and have the Timer.Tick set to an Interval of 1 second. On Timer.Tick you run the function.
private void myTimer_Tick(object sender, EventArgs e)
{
ReadLog();
}

Changing the text of a label at the start and after the code in C#

I have the following code which will fetch some data from a .php file on a website and it will format the data and show it on the form. (Using visual studio)
Sometimes the fetching of data takes some time. So I want a label named U to be changed to "Refreshing..." during the time it fetches the data.
So I used the below code.(I am showing the relevant part)
private void refresh(object sender, MouseEventArgs e)
{
U.Text = "Refreshing ...";
string r = HttpGet("http://www.example.com/?Fetch=OK");
U.Text = "Done";
}
But this code does not change the text to "Refreshing ..."; it only changes it to "Done", even if the fetching takes 1 minute.
What's happening here? How can I make it work?

The best way to handle this is typically to fetch the data asynchronously:
private async void Refresh(object sender, MouseEventArgs e)
{
U.Text = "Refreshing...";
string r = await HttpGetAsync("http://www.example.com/?Fetch=OK"); // Requires an async version
U.Text = "Done";
}
This requires changing your HttpGet method to get the data asynchronously, and return a Task<string> instead of string.
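The question does not show how HttpGet is implemented, so the following is only a sketch of what an async version might look like, assuming HttpClient is available and that a plain GET returning the body as a string is all that is needed. The class name Http is hypothetical.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class Http
{
    // One shared HttpClient instance for the whole application.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical async counterpart of the question's HttpGet helper:
    // issues a GET request and returns the response body as a string.
    public static Task<string> HttpGetAsync(string url)
    {
        return Client.GetStringAsync(url);
    }
}
```

With this in place, the handler would await Http.HttpGetAsync(...) (or the method could be moved into the form so the call stays unqualified). Because the await yields control back to the UI thread, the label has a chance to repaint with "Refreshing..." before the data arrives.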

The issue is that the fetch runs on the UI thread, so the application stops repainting while the resources are loading. Once loading is done, it updates the content, and you only ever see the final text. You should use .NET threading for this, to perform different tasks on different threads.
Assign each function to a different thread: the UI thread must be separate from the thread that loads the resources.
Have a look here, msdn.microsoft.com/en-us/library/system.threading.thread(v=vs.110).aspx

How to detect when a page is being update to IIS

Whenever I update my web app on IIS, any user who is currently using it will see the page become unresponsive, and it won't work again until they refresh the browser. (The update process lasts about 30 seconds.)
I would like to show a notification, such as a JavaScript alert, to let the user know that the page is being updated and ask them to refresh the page after 30 seconds, etc.
I tried to catch the exception in Global.asax, but no exception was thrown in this case.

Consider using app_offline.htm. It is a page that will cause clients to see your IIS app as being down. When you're through updating, just remove the page.

You could create a FileSystemWatcher in Global.asax and then surface the change (update a JS file, for instance) when a file is updated. You could start with this:
using System.IO;
namespace WebApplication1
{
public class Global : System.Web.HttpApplication
{
FileSystemWatcher watcher;
void Application_Start(object sender, EventArgs e)
{
// Code that runs on application startup
watcher = new FileSystemWatcher(this.Context.Server.MapPath("/"));
watcher.Changed += new FileSystemEventHandler(watcher_Changed);
// Events are not raised until this is set to true
watcher.EnableRaisingEvents = true;
}
void watcher_Changed(object sender, FileSystemEventArgs e)
{
//set a value in js file
FileInfo jsFilesChanged = new FileInfo(Path.Combine(this.Context.Server.MapPath("/"), "scripts", "files_changed.js"));
using (StreamWriter jsWriter = (!jsFilesChanged.Exists) ? new StreamWriter(jsFilesChanged.Create()) : new StreamWriter(jsFilesChanged.FullName, false))
{
jsWriter.WriteLine("var changed_file = \"" + e.Name + "\";");
}
}
//.......
}
}
Then in client code include files_changed.js and create a periodic timeout call to check the var changed_file. Also, make sure watcher doesn't get garbage collected.
Some references:
http://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher.aspx
http://www.developerfusion.com/article/84362/extending-filesystemwatcher-to-aspnet/

How the big boys do this:
You need to have a way of posting an alert on a page. Typically this is done by having a table in your database for these alerts. Basically you are just storing some text in there like "hey, the site is going down for maintenance between 8:00am and 8:01am".
On each page load, you check that table and display any messages found in a conspicuous place (like the top).
Prior to pushing an update you add the alert, while giving them enough time to wrap up whatever it is that they are doing.
After the push is complete you clear out the alerts table.
Honestly the main issue you have is simply one of scheduling updates and communicating to the users what's about to happen. You want to do so in a way that isn't a surprise. That said, you might consider enabling the optimizeCompilations flag in order to try and speed up the compilation time of your website when it is first hit after pushing an update.

HTML - How do I know when all frames are loaded?

I'm using .NET WebBrowser control.
How do I know when a web page is fully loaded?
I want to know when the browser is not fetching any more data. (The moment when IE writes 'Done' in its status bar...).
Notes:
The DocumentComplete/NavigateComplete events might occur multiple times for a web site containing multiple frames.
The browser ready state doesn't solve the problem either.
I have tried checking the number of frames in the frame collection and then count the number of times I get DocumentComplete event but this doesn't work either.
this.WebBrowser.IsBusy doesn't work either. It is always 'false' when checking it in the Document Complete handler.

Here's how I solved the problem in my application:
private void wbPost_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url != wbPost.Url)
return;
/* Document now loaded */
}

My approach to doing something when page is completely loaded (including frames) is something like this:
using System.Windows.Forms;
protected delegate void Procedure();
private void executeAfterLoadingComplete(Procedure doNext) {
WebBrowserDocumentCompletedEventHandler handler = null;
handler = delegate(object o, WebBrowserDocumentCompletedEventArgs e)
{
ie.DocumentCompleted -= handler;
Timer timer = new Timer();
EventHandler checker = delegate(object o1, EventArgs e1)
{
if (WebBrowserReadyState.Complete == ie.ReadyState)
{
timer.Dispose();
doNext();
}
};
timer.Tick += checker;
timer.Interval = 200;
timer.Start();
};
ie.DocumentCompleted += handler;
}
From my other approaches I learned some "don't"s:
don't try to bend the spoon ... ;-)
don't try to build an elaborate construct using DocumentComplete, Frames and HtmlWindow.Load events. Your solution will be fragile, if it works at all.
don't use System.Timers.Timer instead of Windows.Forms.Timer; strange errors will begin to occur in strange places if you do, because the timer runs on a different thread than the rest of your app.
don't use just a Timer without DocumentComplete, because it may fire before your page even begins to load and will execute your code prematurely.

Here's my tested version. Just make this your DocumentCompleted Event Handler and place the code that you only want be called once into the method OnWebpageReallyLoaded(). Effectively, this approach determines when the page has been stable for 200ms and then does its thing.
// event handler for when a document (or frame) has completed its download
Timer m_pageHasntChangedTimer = null;
private void webBrowser_DocumentCompleted( object sender, WebBrowserDocumentCompletedEventArgs e ) {
// dynamic pages will often be loaded in parts e.g. multiple frames
// need to check the page has remained static for a while before safely saying it is 'loaded'
// use a timer to do this
// destroy the old timer if it exists
if ( m_pageHasntChangedTimer != null ) {
m_pageHasntChangedTimer.Dispose();
}
// create a new timer which calls the 'OnWebpageReallyLoaded' method after 200ms
// if an additional frame or content is downloaded in the meantime, this timer will be destroyed
// and the process repeated
m_pageHasntChangedTimer = new Timer();
EventHandler checker = delegate( object o1, EventArgs e1 ) {
// only if the page has been stable for 200ms already
// check the official browser state flag, (euphemistically called) 'Ready'
// and call our 'OnWebpageReallyLoaded' method
if ( WebBrowserReadyState.Complete == webBrowser.ReadyState ) {
m_pageHasntChangedTimer.Dispose();
OnWebpageReallyLoaded();
}
};
m_pageHasntChangedTimer.Tick += checker;
m_pageHasntChangedTimer.Interval = 200;
m_pageHasntChangedTimer.Start();
}
private void OnWebpageReallyLoaded() {
/* place your harvester code here */
}

How about using javascript in each frame to set a flag when the frame is complete, and then have C# look at the flags?

I'm not sure it'll work but try to add a JavaScript "onload" event on your frameset like that :
function everythingIsLoaded() { alert("everything is loaded"); }
var frameset = document.getElementById("idOfYourFrameset");
if (frameset.addEventListener)
frameset.addEventListener('load',everythingIsLoaded,false);
else
frameset.attachEvent('onload',everythingIsLoaded);

Can you use jQuery? Then you could easily bind frame ready events on the target frames. See this answer for directions. This blog post also has a discussion about it. Finally there is a plug-in that you could use.
The idea is that you count the number of frames in the web page using:
$("iframe").size()
and then you count how many times the iframe ready event has been fired.

You will get a BeforeNavigate and a DocumentComplete event for the outer web page, as well as for each frame. You know you're done when you get the DocumentComplete event for the outer web page. You should be able to use the managed equivalent of IWebBrowser2::TopLevelContainer() to determine this.
Beware, however, the website itself can trigger more frame navigations anytime it wants, so you never know if a page is truly done forever. The best you can do is keep a count of all the BeforeNavigates you see and decrement the count when you get a DocumentComplete.
Edit: Here's the managed docs: TopLevelContainer.
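The counting idea described above can be sketched as a small class. This is an illustration only: PendingNavigations is a hypothetical name, and it assumes you call Started() from whichever handler corresponds to BeforeNavigate in your wrapper and Completed() from the DocumentComplete handler.

```csharp
using System.Threading;

public sealed class PendingNavigations
{
    private int _pending;

    // Call from the BeforeNavigate handler (outer page and every frame).
    public void Started()
    {
        Interlocked.Increment(ref _pending);
    }

    // Call from the DocumentComplete handler.
    // Returns true when no navigation is outstanding any more.
    public bool Completed()
    {
        return Interlocked.Decrement(ref _pending) <= 0;
    }

    // True while nothing is known to be loading.
    public bool IsIdle
    {
        get { return Volatile.Read(ref _pending) <= 0; }
    }
}
```

As the answer warns, a count of zero only means that everything started so far has finished; the page can still kick off further frame navigations later.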

Here's what finally worked for me:
public bool WebPageLoaded
{
get
{
if (this.WebBrowser.ReadyState != System.Windows.Forms.WebBrowserReadyState.Complete)
return false;
if (this.HtmlDomDocument == null)
return false;
// iterate over all the Html elements. Find all frame elements and check their ready state
foreach (IHTMLDOMNode node in this.HtmlDomDocument.all)
{
IHTMLFrameBase2 frame = node as IHTMLFrameBase2;
if (frame != null)
{
if (!frame.readyState.Equals("complete", StringComparison.OrdinalIgnoreCase))
return false;
}
}
Debug.Print(this.Name + " - I think it's loaded");
return true;
}
}
On each document complete event I run over all the HTML elements and check all available frames (I know it can be optimized). For each frame I check its ready state.
It's pretty reliable but just like jeffamaphone said I have already seen sites that triggered some internal refreshes.
But the above code satisfies my needs.
Edit: every frame can contain frames within it so I think this code should be updated to recursively check the state of every frame.

I just use the webBrowser.StatusText property. When it says "Done", everything is loaded!
Or am I missing something?

Checking for IE.readyState = READYSTATE_COMPLETE should work, but if that's not proving reliable for you and you literally want to know "the moment when IE writes 'Done' in its status bar", then you can do a loop until IE.StatusText contains "Done".

Have you tried WebBrowser.IsBusy property?

I don't have an alternative for you, but I wonder if the IsBusy property being true during the Document Complete handler is because the handler is still running and therefore the WebBrowser control is technically still 'busy'.
The simplest solution would be to have a loop that executes every 100 ms or so until the IsBusy flag is reset (with a max execution time in case of errors). That of course assumes that IsBusy will not be set to false at any point during page loading.
If the Document Complete handler executes on another thread, you could use a lock to send your main thread to sleep and wake it up from the Document Complete thread. Then check the IsBusy flag, re-locking the main thread if it's still true.
