CefSharp offscreen - wait for page for render - c#

I have a problem as below. I use CefSharp offscreen for webpage automation as follows (I open only one and the same page):
1. Open the page and wait until it renders.*
2. With EvaluateScriptAsync I put a value into an input field, and then with the same method I click a button on the page.
3. Some JavaScript on the page then checks the result and displays a message.
4. When the message is displayed I take a screenshot.**
However, I have two problems:
* My solution has to be internet-speed proof. I used the BrowserLoadingStateChanged event and the IsLoading property, but even though the event fired, the page had not loaded completely: when I called EvaluateScriptAsync it returned an error because the page was not fully loaded. Sure, I can add something like Thread.Sleep, but that does not always work; it depends heavily on your internet speed.
** When I take the screenshot it does not always contain the result message displayed by the JavaScript; sometimes there is a loading spinner instead of the message. Here again I can use Thread.Sleep, but it does not always work.
Do you have any ideas? Thanks in advance.
private static void BrowserLoadingStateChanged(object sender, LoadingStateChangedEventArgs e)
{
    // Check to see if loading is complete - this event is called twice, once when loading starts,
    // a second time when it's finished.
    if (!e.IsLoading)
    {
        // Remove the load event handler, because we only want one snapshot of the initial page.
        browser.LoadingStateChanged -= BrowserLoadingStateChanged;
        Thread.Sleep(1800); // e.g. - but it isn't a real solution
        var scriptTask = browser.EvaluateScriptAsync("document.getElementById('b-7').value = 'something'");
        scriptTask = browser.EvaluateScriptAsync("document.getElementById('b-8').click()");
        //scriptTask.Wait();
        if (browser.IsLoading == false)
        {
            scriptTask.ContinueWith(t =>
            {
                // Give the browser a little time to render.
                Thread.Sleep(500); // still not a solution
                // Wait for the screenshot to be taken.
                var task = browser.ScreenshotAsync();
                task.ContinueWith(x =>
                {
                    // Make a file to save it to (e.g. C:\Users\jan\Desktop\CefSharp screenshot.png).
                    var screenshotPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "CefSharp screenshot.png");
                    Console.WriteLine();
                    Console.WriteLine("Screenshot ready. Saving to {0}", screenshotPath);
                    // Save the Bitmap to the path.
                    // The image type is auto-detected via the ".png" extension.
                    task.Result.Save(screenshotPath);
                    // We no longer need the Bitmap.
                    // Dispose it to avoid keeping the memory alive. Especially important in 32-bit applications.
                    task.Result.Dispose();
                    Console.WriteLine("Screenshot saved. Launching your default image viewer...");
                    // Tell Windows to launch the saved image.
                    Process.Start(screenshotPath);
                    Console.WriteLine("Image viewer launched. Press any key to exit.");
                }, TaskScheduler.Default);
            });
        }
    }
}

Ok, so in my case the best solution was to use JavaScript to check whether an element with a given id exists. If it does, the page is loaded.
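A minimal sketch of that idea, assuming the element id `b-7` from the question; the `WaitForElementAsync` helper is a hypothetical name. It polls with EvaluateScriptAsync in short intervals instead of one long Thread.Sleep:

```csharp
// Poll the page until document.getElementById(id) returns an element,
// so later EvaluateScriptAsync calls don't run against a half-loaded DOM.
private static async Task<bool> WaitForElementAsync(ChromiumWebBrowser browser, string id, int timeoutMs = 15000)
{
    var script = string.Format("document.getElementById('{0}') != null", id);
    var sw = System.Diagnostics.Stopwatch.StartNew();
    while (sw.ElapsedMilliseconds < timeoutMs)
    {
        var response = await browser.EvaluateScriptAsync(script);
        if (response.Success && (response.Result as bool? == true))
        {
            return true; // element exists, the page is usable
        }
        await Task.Delay(100); // short poll interval instead of one long sleep
    }
    return false; // timed out - the page never became ready
}

// Usage: wait for the input to exist before filling it in.
// if (await WaitForElementAsync(browser, "b-7")) { /* EvaluateScriptAsync ... */ }
```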

I noticed that render time can vary significantly depending on your hardware. It can take up to 5 seconds to render after EvaluateScriptAsync was called, so it is always better to use a longer delay before calling ScreenshotAsync() if you do not want an outdated screenshot.
Thread.Sleep(5000);

Related

Source Code Different When Using CefSharp Function GetSourceAsync() than when Calling ViewSource()

I have the latest version of CefSharp installed. When I call ViewSource(), it opens a Notepad window with the source code, but when I call GetSourceAsync() the code is very different and is missing the HTML I need from the html variable shown in the Notepad window. The only workaround would be to somehow copy the contents of the Notepad window into my app and use it. Does anyone know how to get the HTML as shown in the Notepad window? I'm running the application on Windows 7 Pro using Visual Studio 2017 Express. Here is my code...
private void WebBrowserFrameLoadEndedAsync(object sender, FrameLoadEndEventArgs e)
{
    chromeBrowser.ViewSource();
    chromeBrowser.GetSourceAsync().ContinueWith(taskHtml =>
    {
        var html = taskHtml.Result;
    });
}
Here is the web page that the browser goes to...
chromeBrowser = new ChromiumWebBrowser("https://www.amazon.com/product-reviews/B084RCFDJ3/ref=acr_search_hist_5?ie=UTF8&filterByStar=five_star&reviewerType=all_reviews#reviews-filter-bar");
It turns out, I was searching the source for the wrong phrase. So now I just call the following...
string source = await chromeBrowser.GetBrowser().MainFrame.GetSourceAsync();
I've gone into detail on the difference between GetSource and ViewSource further down.
Some important things to note about FrameLoadEnd:
- It is called for every frame; if your page has multiple frames it will be called multiple times.
- It is called when the initial resources have finished loading; if your website is dynamically created/rendered then your call may be happening too early.
// FrameLoadEnd is called for every frame; if your page has multiple frames then it will be called multiple times.
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
    var frame = e.Frame;
    var source = await frame.GetSourceAsync();
}

// To only get the main frame source
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
    var frame = e.Frame;
    if (frame.IsMain)
    {
        var source = await frame.GetSourceAsync();
    }
}

// If your website dynamically generates content then you might need to wait a
// little longer for it to render. Introduce a fixed wait period, this can be
// problematic for a number of reasons.
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
    var frame = e.Frame;
    if (frame.IsMain)
    {
        // Wait a little bit of time for the page to load
        await System.Threading.Tasks.Task.Delay(500);
        var source = await frame.GetSourceAsync();
    }
}
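A less fragile alternative to the fixed delay, sketched here as an assumption: poll for a marker element that only appears once the dynamic content has rendered, then grab the source. The `#reviews-filter-bar` selector is taken from the question's URL fragment and may not be the right marker for your page.

```csharp
// Instead of a fixed Task.Delay, poll until a marker element that the page
// renders dynamically is present, then grab the source. The selector here
// is an assumption - use one that only exists once your content is ready.
private async void BrowserFrameLoadEnd(object sender, FrameLoadEndEventArgs e)
{
    var frame = e.Frame;
    if (!frame.IsMain)
    {
        return;
    }

    for (var i = 0; i < 50; i++) // poll for up to ~5 seconds
    {
        var check = await frame.EvaluateScriptAsync("document.querySelector('#reviews-filter-bar') != null");
        if (check.Success && (check.Result as bool? == true))
        {
            break; // marker element present, source should be complete
        }
        await System.Threading.Tasks.Task.Delay(100);
    }
    var source = await frame.GetSourceAsync();
}
```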
Explanation of the difference in behaviour
Firstly, ViewSource() returns immediately; Notepad is launched after the underlying GetSource call has completed.
Both methods send a GetSource message to the render process, which returns a ReadOnlySharedMemoryRegion. The two code paths read the data from that shared memory region at different moments, so each ends up with a different snapshot in time.
void CefFrameHostImpl::ViewSource() {
  SendCommandWithResponse(
      "GetSource",
      base::BindOnce(&ViewTextCallback, CefRefPtr<CefFrameHostImpl>(this)));
}

void CefFrameHostImpl::GetSource(CefRefPtr<CefStringVisitor> visitor) {
  SendCommandWithResponse("GetSource",
                          base::BindOnce(&StringVisitCallback, visitor));
}
CEF Source reference.
The CefFrameHostImpl::GetSource method, which GetSourceAsync calls, completes very quickly as it simply creates a string from the shared memory region.
The CefFrameHostImpl::ViewSource method, whilst it returns immediately, is much slower: it takes additional processing to create a file on disk, write the string to it, and spawn Notepad.
HTML source is always a snapshot for a given point in time. For static web pages the timing makes no difference; for dynamically rendered/updated websites a few hundred milliseconds can mean you get entirely different source.
Because the shared ReadOnlySharedMemoryRegion is converted into a string at different moments, there is a subtle difference in the source you end up getting.

Using await in async methods to prevent next line of code from running

I'm new to using async methods, so I think I'm misunderstanding something. I have a WinForms app with a button, and when the button is clicked, an async method gets called. This must be async, as I need to make javascript calls to a Chromium Web Browser control (using CefSharp). I need to ensure that this javascript has finished running and that the browser has updated before continuing with the next part of the method.
I'm basically trying to capture the entire web page into a single image. My approach was to use javascript to update the scroll position on the page, then take screenshots in each position using Graphics.CopyFromScreen. This mostly works, however occasionally the resulting image will have the wrong 'chunk' of webpage (e.g., the first bitmap is repeated twice). Here is my code:
// Calculate screen sizes, screenshot spacing, etc.
for (int i = 0; i < screenshotCount; i++)
{
    int scrollSize = i == 0 ? -PageHeight : (int)browserControlHeight;
    string script = "(function() { window.scrollBy(0, " + scrollSize.ToString() + ") })();";
    await browser.EvaluateScriptAsync(script);
    // Take screenshot, add to list of bitmaps
}
// Combine resulting list of bitmaps
If I add the following
await Task.Delay(1000);
after the EvaluateScriptAsync() call, the final image comes out correct every time. I'm working on the assumption that the javascript is being called but doesn't complete before the screenshot begins. If this is the case, even adding a delay may not work (what if the javascript takes longer than a second to run?).
Am I misunderstanding the way that async/await works?
No, the issue is not with await. The issue is that the Task returned from EvaluateScriptAsync is marked as completed before you're ready to continue: it completes as soon as the JavaScript telling the browser to scroll has executed, not after the browser has finished re-rendering the screen in response to the scroll command.
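One way to wait for the re-render rather than guessing at a delay, sketched under the assumption that your CefSharp version provides EvaluateScriptAsPromiseAsync: scroll, then resolve a Promise after two requestAnimationFrame callbacks, which only fire once the browser has produced a new frame. The scrollSize variable is the one from the loop above.

```csharp
// Scroll, then resolve only after two requestAnimationFrame callbacks,
// i.e. after the browser has actually painted the new scroll position.
// EvaluateScriptAsPromiseAsync awaits the returned Promise; whether it is
// available depends on your CefSharp version.
string script = @"
    new Promise(resolve => {
        window.scrollBy(0, " + scrollSize + @");
        requestAnimationFrame(() => requestAnimationFrame(resolve));
    });";
await browser.EvaluateScriptAsPromiseAsync(script);
// Now take the screenshot for this scroll position.
```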

Force loop containing asynchronous task to maintain sequence

Something tells me this might be a stupid question and I have in fact approached my problem from the wrong direction, but here goes.
I have some code that loops through all the documents in a folder - The alphabetical order of these documents in each folder is important, this importance is also reflected in the order the documents are printed. Here is a simplified version:
var wordApp = new Microsoft.Office.Interop.Word.Application();
foreach (var file in Directory.EnumerateFiles(folder))
{
    fileCounter++;
    // Print file, referencing a previously instantiated Word application object
    wordApp.Documents.Open(...)
    wordApp.PrintOut(...)
    wordApp.ActiveDocument.Close(...)
}
It seems (and I could be wrong) that the PrintOut code is asynchronous, and the application sometimes gets into a situation where the documents get printed out of order. This is confirmed because if I step through, or place a long enough Sleep() call, the order of all the files is correct.
How should I prevent the next print task from starting before the previous one has finished?
I initially thought that I could use a lock(someObject){} until I remembered that they are only useful for preventing multiple threads accessing the same code block. This is all on the same thread.
There are some events I can wire into on the Microsoft.Office.Interop.Word.Application object: DocumentOpen, DocumentBeforeClose and DocumentBeforePrint
I have just thought that this might actually be a problem with the print queue not being able to accurately distinguish lots of documents that are added within the same second. This can't be the problem, can it?
As a side note, this loop is within the code called from the DoWork event of a BackgroundWorker object. I'm using this to prevent UI blocking and to feedback the progress of the process.
Your event-handling approach seems like a good one. Instead of using a loop, you could add a handler to the DocumentBeforeClose event, in which you would get the next file to print, send it to Word, and continue. Something like this:
List<string> m_files = Directory.EnumerateFiles(folder).ToList();
wordApp.DocumentBeforeClose += ProcessNextDocument;
...
void ProcessNextDocument(...)
{
    string file = null;
    lock (m_files)
    {
        if (m_files.Count > 0)
        {
            // Take from the front of the list to preserve alphabetical order.
            file = m_files[0];
            m_files.RemoveAt(0);
        }
        else
        {
            // Done!
        }
    }
    if (file != null)
    {
        PrintDocument(file);
    }
}

void PrintDocument(string file)
{
    wordApp.Documents.Open(...);
    wordApp.PrintOut(...);
    wordApp.ActiveDocument.Close(...);
}
The first parameter of Application.PrintOut specifies whether the printing should take place in the background or not. By setting it to false it will work synchronously.
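A sketch of the synchronous approach, assuming C# 4+ named arguments for COM calls; the exact open/close options are elided in the question, so only the Background parameter is shown:

```csharp
// Pass Background: false so PrintOut blocks until Word has finished
// spooling the document, keeping the print jobs strictly sequential.
var wordApp = new Microsoft.Office.Interop.Word.Application();
foreach (var file in Directory.EnumerateFiles(folder))
{
    var doc = wordApp.Documents.Open(file);
    doc.PrintOut(Background: false); // synchronous print
    doc.Close(SaveChanges: false);
}
```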

Prevent Background task from updating LiveTile

I really need your help since I have a frustrating problem. I'm downloading data in my periodic agent (OnInvoke). It works fine, but every night the web site I download data from has no data to download. When that happens I want the live tile to remain as it is (instead of being empty), keeping the current data and not getting updated. Then, one or two hours later, when there is data to download and parse, the updates should continue.
I have tried this, but when NotifyComplete is called the code after it still gets executed. Isn't NotifyComplete supposed to stop the rest of the code from executing?
MatchCollection matchesMyData = rxMyData.Matches(strHTML);
foreach (Match matchMyData in matchesMyData)
{
    GroupCollection groupsMyData = matchMyData.Groups;
    // Code for handling downloaded data
}
if (matchesMyData.Count < 1)
{
    ShellToast toast = new ShellToast();
    toast.Title = "No update: ";
    toast.Content = "Webservice returned no data";
    toast.Show();
    NotifyComplete();
}
I also tried the following piece of code, but that stopped my background task and I had to start my app again to re-enable it. Why?
ShellTile TileToFind = ShellTile.ActiveTiles.FirstOrDefault(x => x.NavigationUri.ToString().Contains("TileID=2"));
if (TileToFind != null && intCount > 0)
{
    // Update the live tile
}
So, when no data gets parsed the tile should remain as it is, and an hour or two later, when data gets downloaded, everything should be back to normal with the tile being updated.
Please help, since this is a show stopper right now. Thanks in advance.
Calling NotifyComplete() will not stop the code after the method call from being executed; it just lets the OS know that you are finished. The OS should abort the thread, but there may be time for a few more lines of code to run (the documentation isn't clear on whether the thread that calls NotifyComplete is aborted immediately).
If you add a return statement after the call to NotifyComplete then the tile shouldn't be updated.
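Concretely, the fix is just an early return after signalling completion; a sketch based on the code above:

```csharp
// NotifyComplete only signals the OS; it does not abort your method.
// Return immediately afterwards so no further code (the tile update) runs.
if (matchesMyData.Count < 1)
{
    ShellToast toast = new ShellToast();
    toast.Title = "No update: ";
    toast.Content = "Webservice returned no data";
    toast.Show();
    NotifyComplete();
    return; // without this, execution falls through to the tile update
}
// The tile update code below now runs only when data was actually parsed.
```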

Where to store progress information in ASP.Net web application

I'm creating a page that get uploaded text files and builds them into multiple PDFs. They are just exports from Excel. Each row in the file corresponds to a new PDF that needs to be created.
Anyway, once the files are uploaded I want to begin processing them, but I don't want the user to have to stay on the page, or even still have their session open. For example they could close the browser and come back 10 minutes later, log in, and the progress information will say like 112/200 files processed or something. It will be a lot quicker than that though.
So two questions really, how can I pass this processing job to something (Handler?Thread?) that will continue to run when the page is closed, and will return as soon as the job has started (so the browser isn't stopped)? Secondly, where can I store this information so that when the user comes back to the page, they can see the current progress.
I realise that I can't use sessions, and since it will be processing about a file a second I don't really want to update a DB every second. Is there some way I can do this? Is it possible?
I solved this by using the link provided by astander above. I simply create an object in HttpContext.Application to store progress variables, and then start the method which does my processing on a new Thread.
// Create the new progress object
BatchProgress bs = new BatchProgress(0);
if (Application["BatchProgress"] != null)
{
    // Should never happen - replace any stale progress object
    Application["BatchProgress"] = bs;
}
else
{
    Application.Add("BatchProgress", bs);
}
// Set up a new thread; RunBatch is the method that does all the processing.
ThreadStart ts = new ThreadStart(RunBatch);
Thread t = new Thread(ts);
t.Start();
It then returns after the thread starts and I can use jQuery to get the Application["BatchProgress"] object at regular intervals. At the end of my thread the BatchProgress object has its status set to "Complete", then when jQuery queries it, it sees the complete status and removes the progress object from the application.
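A sketch of what the processing thread might look like; BatchProgress is the answerer's own class, so the Increment and MarkComplete members (and m_rows) are hypothetical names for illustration:

```csharp
// Hypothetical RunBatch skeleton: update the shared progress object as each
// row is processed; the page polls Application["BatchProgress"] via AJAX.
private void RunBatch()
{
    var progress = (BatchProgress)Application["BatchProgress"];
    foreach (var row in m_rows) // m_rows: the parsed upload rows, an assumption
    {
        // ... build the PDF for this row ...
        progress.Increment(); // assumed thread-safe, e.g. via Interlocked.Increment
    }
    progress.MarkComplete(); // the polling page sees this and removes the entry
}
```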
