C# Download a complete weird HTML page - c#

I'm sorry if this question has already been answered, but I have literally spent more than two weeks searching the Internet for a solution to my issue.
Now, I definitely do not perform the best Google searches, and it might seem that my question has several effective answers on the Internet, but I really tried every single solution I found, without any positive results.
What I'm trying to do is simple, and I have done it successfully on many websites:
Navigating to a website using WebBrowser (1).
Waiting for everything to load properly (document completed event).
Download the page using DocumentText property (1).
(1) : I also use WebClient from time to time.
And there it is: I get the HTML page and can use it any way I like. The issue is with one particular website whose full content I cannot obtain, in spite of trying all the different solutions I found. I suspected that this page might need to run several scripts before the full content is available. Then again, I read that WebBrowser runs all the necessary scripts before raising the "completed" event, so apparently that's not the issue. The page I'm asking about is: http://www.coolmod.com/tarjetas-graficas-nvidia-pci-express
After the WebBrowser has loaded the entire page, I tried looking up random elements with the GetElementById method and checking whether I get a null result. When I look for an element that does not belong to the product list, I'm successful. But whenever I look for an element that belongs to the list itself, I always get null. That means the list itself does not load, and I really don't know why. By the way, I do not stop WebBrowser.Navigate() from delivering multiple responses; I let DocumentCompleted fire as many times as it wants, and still the product list does not load, even when I pass the cookies. I even tried copying all the content of the document and pasting it through the clipboard. Here is a simple example of what I'm trying to do:
private void catalogueDownload()
{
    System.Windows.Forms.WebBrowser wb = new System.Windows.Forms.WebBrowser();
    wb.ScriptErrorsSuppressed = true;
    wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Catalogue_DocumentCompleted);
    wb.Navigate("http://www.coolmod.com/tarjetas-graficas-nvidia-pci-express");
}

public void Catalogue_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var wb = sender as System.Windows.Forms.WebBrowser;
    string output = wb.DocumentText;
    File.WriteAllText("testing.html", output);
}
Thanks for giving up your time to read all this.

System.Windows.Forms.WebBrowser is a bit outdated. If I were you, I would consider using an external library for this. Selenium would be my first choice, given that it has all the necessary integrations with the .NET Framework (and with a lot of other languages).

Related

Selenium Webdriver not returning Javascript code

Hi I am new to Selenium Webdriver. I can successfully open a webpage and find elements on it.
In one case I have noted that there is a link on a page that becomes clickable after a while. In Firebug on the Script tab, I can see the code for the javascript that does the timer function.
But using Selenium Webdriver if I issue:
driver.PageSource
I cannot see the source code for the Javascript. Delaying for 30 seconds before requesting the source makes no difference. I have tried finding it with various By options using:
driver.FindElement
and so on, but it isn't there.
How does firebug manage to find and show the Javascript source code? Is there a way that I can coerce Selenium Webdriver to return all code referenced by the page?
Or is there a better approach?
Thanks for any advice!
EDIT---------------------
I tried the following in Firefox:
Dim Driver2 As IWebDriver = New Chrome.ChromeDriver
Driver2.Url = "http://mypage"
Dim js As IJavaScriptExecutor = TryCast(Driver2, IJavaScriptExecutor)
Dim title As String = DirectCast(js.ExecuteScript("return JSON.stringify(window)"), String)
and I got
Permission denied to access property 'toJSON'
I read that this won't work in Firefox, so I tried in Chrome and got
Blocked a frame with origin "http://mypage" from accessing a
cross-origin frame
and from there I found no solutions because, according to this, it's a security restriction: apparently you can't access a cross-origin <iframe> with JavaScript.
I'm starting to think I'm a bit out of my depth here.
PageSource probably doesn't return an exact snapshot of the DOM and everything the page references.
You can instead inspect JavaScript objects using driver.executeScript(), but the burden of analyzing the returned object may be discouraging.
Regardless - Here's a contrived example:
Object result = driver.executeScript("return JSON.stringify(window)");
System.out.println(result.toString());

MSDN OneNote Api: Navigate to never before opened page without opening a OneNote Application Window

My goal is to be able to use C# to programmatically open any .one section file and get all of the section's page ids. In a simple case (one where I have created and recently used the section), this can done with the following code:
using System.IO;
using Microsoft.Office.Interop.OneNote;

class Program
{
    public static void ProcessOnenoteFile()
    {
        Application onenoteApp = new Application();
        string filepath = @"C:\Users\Admin\Documents\OneNote Notebooks\My Notebook\testsection.one";

        // Open the section file and obtain its id.
        string sectionId;
        onenoteApp.OpenHierarchy(filepath, null, out sectionId);

        // Dump the section's page hierarchy as XML.
        string hierarchy;
        onenoteApp.GetHierarchy(sectionId, HierarchyScope.hsPages, out hierarchy);
        File.WriteAllText(@"C:\hierarchy.txt", hierarchy);
    }
}
From here I can parse the xml to find all the pageIds and I am good to go.
The problem, however, is that I want to do this with files that I get from somebody else and have never opened before. When I run the same code on those files, I cannot find any pageIds in the hierarchy, and therefore I cannot process any pages. A fix that seems to work is to use the NavigateTo method to open the section file in OneNote before trying to get the hierarchy.
...
string sectionId;
onenoteApp.OpenHierarchy(filepath, null, out sectionId);
onenoteApp.NavigateTo(sectionId);
string hierarchy;
...
This, however, is quite annoying as I need to open the OneNote application. Since I have many .one section files to process it would be a lot of random information flashing across the screen which is not necessary and might confuse the end users of my program. Is there a way I can achieve the same result of adding pageIds to the hierarchy without needing to open the OneNote Application? At the very least, is there a way I can hide the application?
UPDATE:
I just noticed that using the Publish command also updates the hierarchy with pageIds; however, this solution is still not ideal, as it requires me to make another file.
Also, looking more closely at the XML export, I saw that there is an attribute called "areAllPagesAvailable", which is set to false on all the files I have yet to open in OneNote.
WooHoo! After a couple hours of just playing around and Google Searching the different methods, I have found what I am after.
Solution: SyncHierarchy(sectionId);
...
string sectionId;
onenoteApp.OpenHierarchy(onenoteFile, null, out sectionId, CreateFileType.cftSection);
onenoteApp.SyncHierarchy(sectionId);
string hierarchy;
onenoteApp.GetHierarchy(sectionId, HierarchyScope.hsPages, out hierarchy);
...

End page background working, WP8, C#

I don't know if it is even possible, but is there some way to "end" a page in a Windows Phone 8 app?
My problem is that I am using one delegate (to know when my XML has been downloaded) on multiple pages. It works fine, but when I open one page, it initializes itself; then I go to another page (through the back button), and the new page initializes itself too. Everything is fine, except that the previous page is still listening to the delegate, and that is a really big problem. So I need to get the previous (closed) page into the same state as if it had never been opened.
I will be thankful for any advice (maybe I am thinking about this the wrong way, I don't know; maybe the page just has to be de-initialized).
PS: If necessary I will post the code, but I don't think it is. :)
Okay, here is some code:
In the class which downloads the XML I have a delegate like this:
public delegate void delDownloadCompleted();
public static event delDownloadCompleted eventDownloadCompleted;
This class downloads a few different XML files, depending on the value given to the run(int number) method.
After the download is complete and all the information from the XML has been saved in my local list, I raise the event:
if (eventDownloadCompleted != null)
{
    eventDownloadCompleted();
}
Then I have a few different pages. Each page is used to display specific data from the downloaded XML. So on this specific page I have a method that is fired when the "download class" says it is complete.
XML_DynamicDataChat.delDownloadCompleted delegMetoda = new XML_DynamicDataChat.delDownloadCompleted(inicialiyaceListu);
XML_DynamicDataChat.eventDownloadCompleted += delegMetoda;
This is that "inicializaceListu" method:
private void inicialiyaceListu()
{
    Dispatcher.BeginInvoke(() =>
    {
        model = new datka();
        // model is just the object where I save all the specific lists of information that I got from the XML files.
        chatList9 = model.getChat(1);
        gui_listNovinky.ItemsSource = chatList9;
        gui_loadingGrid.Visibility = Visibility.Collapsed;
    });
}
All of this works fine, but when I go back (with the back button) and open another page showing information from a different downloaded XML, the previous page is still listening for the delegate, and its inicialiyaceListu() method is still fired every time an XML download completes.
So I need to tell the previous page something like: "hey page, you are now closed! Can you shut the **** up and stop working?!?"
I think a specific delegate for each page could solve this, but it is not the correct way to program it.
I solved it nice and easy. It is a really simple solution. I just created a bool variable and set it to false when I go back. In inicializaceListu() I check that variable: if it is true, the method does its work; if it is false, it does nothing.
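For comparison, a different approach from the flag described above is to remove the page's handler from the event when the page is left, so the closed page is never called at all. The following is only a minimal sketch and not the poster's code: Downloader, ChatPage and RefreshCount are hypothetical stand-ins, and on Windows Phone the two methods would presumably be the page's OnNavigatedTo and OnNavigatedFrom overrides.

```csharp
using System;

// Hypothetical stand-in for the class that downloads the XML.
public static class Downloader
{
    public static event Action DownloadCompleted;

    // Called by the downloader once the XML has been parsed and stored.
    public static void RaiseCompleted()
    {
        Action handler = DownloadCompleted;
        if (handler != null)
        {
            handler();
        }
    }
}

// Hypothetical stand-in for one of the pages.
public class ChatPage
{
    // Counts how often the page refreshed itself (illustration only).
    public int RefreshCount { get; private set; }

    // Subscribe when the page becomes active.
    public void OnNavigatedTo()
    {
        Downloader.DownloadCompleted += OnDownloadCompleted;
    }

    // Unsubscribe when the page is left, so it stops reacting to downloads.
    public void OnNavigatedFrom()
    {
        Downloader.DownloadCompleted -= OnDownloadCompleted;
    }

    private void OnDownloadCompleted()
    {
        RefreshCount++;
    }
}
```

Because the event is static, a handler that is never removed also keeps the closed page object reachable, so unsubscribing addresses that as well as the unwanted calls.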

Click first email using selenium

I am trying to login to hotmail, and click the first email that is shown in the emails box.
I get a problem, when I'm trying to find the first email.
I've tried to get it in a lot of different ways, but I've wanted to make a test just by finding all the list items li, so I've made a simple function:
public void clickFirstEmail()
{
    var lis = driver.FindElements(By.TagName("li")); // this is raising the exception
    foreach (var li in lis)
    {
        MessageBox.Show(li.Text);
    }
}
Whenever I try to access some elements, I get this exception: Permission denied to access property '__qosId'
I've seen some answers here on Stack Overflow, but I guess they only apply when you are running a Selenium server.
I start my driver like this:
driver = new FirefoxDriver();
Any ideas on how to get this right ?
The whole plan is to click the first email.
UPDATE: I tried one more time and it worked, but now it gives me this error on the MessageBox.Show line: Element is no longer attached to the DOM.
I am thinking that the page (JavaScript) is constantly loading or changing content, so what can I do about this?
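An "Element is no longer attached to the DOM" error usually means the element was replaced between the moment it was found and the moment it was read. One possible way to cope with that is to repeat the whole lookup when it happens. The sketch below is a generic retry helper, kept free of Selenium so that it is self-contained; Retry.Run and its parameters are hypothetical names, and with Selenium the exception type would presumably be StaleElementReferenceException.

```csharp
using System;

public static class Retry
{
    // Runs the action and returns its result. If it throws TException,
    // the action is run again, up to maxAttempts times in total.
    // The exception from the last attempt is allowed to propagate.
    public static T Run<T, TException>(Func<T> action, int maxAttempts)
        where TException : Exception
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (TException) when (attempt < maxAttempts)
            {
                // Swallow the failure and try the whole lookup again.
            }
        }
    }
}

// Assumed usage with Selenium (not compiled here):
// string text = Retry.Run<string, StaleElementReferenceException>(
//     () => driver.FindElements(By.TagName("li"))[0].Text, 3);
```

The important detail is that the element is located again inside the retried action; retrying only the read on an old reference would fail in the same way.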

How can you programmatically detect if javascript is enabled/disabled in a Windows Desktop Application? (WebBrowser Control)

I have an application which writes HTML to a WebBrowser control in a .NET winforms application.
I want to detect programmatically whether the Internet Settings have JavaScript (or rather Active Scripting) disabled.
I'm guessing you need to use something like WMI to query the IE Security Settings.
EDIT #1: It's important that I know whether JavaScript is enabled before displaying the HTML, so solutions which modify the DOM to display a warning or that use <noscript> tags are not applicable in my particular case. In my case, if JavaScript isn't available I'll display the content in a native RichTextBox instead. I also want to report back to the server application whether JS is enabled, so I can tell the admin who sends alerts that 5% or 75% of users have JS enabled.
Thanks to @Kickaha for the suggestion. Here's a simple method which checks the registry to see if it's set. There are probably some cases where this could throw an exception, so be sure to handle them.
// Requires: using Microsoft.Win32;
const string DWORD_FOR_ACTIVE_SCRIPTING = "1400";
const string VALUE_FOR_DISABLED = "3";
const string VALUE_FOR_ENABLED = "0";

public static bool IsJavascriptEnabled()
{
    bool retVal = true;
    // Get the registry key for Zone 3 (Internet Zone); read-only access is enough.
    using (RegistryKey key = Registry.CurrentUser.OpenSubKey(@"Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones\3", false))
    {
        if (key != null)
        {
            // Fall back to "enabled" when the value is missing.
            object value = key.GetValue(DWORD_FOR_ACTIVE_SCRIPTING, VALUE_FOR_ENABLED);
            if (value.ToString().Equals(VALUE_FOR_DISABLED))
            {
                retVal = false;
            }
        }
    }
    return retVal;
}
Note: in the interest of keeping this code sample short (and because I only cared about the Internet Zone), this method only checks the Internet zone. You can modify the 3 at the end of the OpenSubKey line to change the zone.
If you are having trouble with popups popping up, I've included a solution for you, and if you want to disable or enable JavaScript on the client machine (or even just query whether it is enabled or disabled), I've included that answer as well. Here we go:
Which popup message do you want to disable? If it's the alert message, try the code below, resolving the window or frame object to suit your particular needs. I've just assumed the top-level document, but if you need an iframe you can access it using window.frames(0). for the first frame, and so on (re the JavaScript part). Here is some code, assuming WB is your WebBrowser control:
WB.Document.parentWindow.execScript "window.alert = function () { };", "JScript"
You must run the above code only after the entire page is done loading. I understand this is very difficult to do (and a foolproof version hasn't been published yet); however, I have been doing it reliably for some time now, and you can gather hints on how to do this accurately if you read some of my previous answers labelled "webbrowser" and "webbrowser-control". Getting back to the question at hand: if you want to cancel the .confirm JavaScript message, just replace window.alert with window.confirm (of course, qualifying your window. object with the correct object to reach the document hierarchy you are working with). You can also disable the .print method with the above technique, and the new IE9 .prompt method as well.
If you want to disable JavaScript entirely, you can use the registry to do this. You must make the registry change before the WebBrowser control loads into memory, and every time you change it (on and off) you must reload the WebBrowser control out of and back into memory (or just restart your application).
The registry key is \HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones\ - the value name is 1400; set it to 3 to disable scripting and to 0 to enable it.
Of course, because there are 5 zones under the Zones key, you need to change it either for the active zone or for all zones to be sure. However, you really don't need to do this if all you want is to suppress JS dialog popup messages.
Let me know how you go, and if I can help further.
Here is a suggestion: encode the warning into your webpage by default, and create a script that runs on page load and removes that element. The warning will then be there whenever JavaScript is not allowed to run.
It's a long while since I coded client-side JavaScript to interact with the DOM, so I may be a little out of date, i.e. you will need to fix the details, but I hope I get the general idea across.
<script>
document.getElementById("noJavascriptWarning").innerHTML="";
</script>
and in your HTML body
<DIV id="noJavascriptWarning" name="noJavaScriptWarning">Please enable Javascript</DIV>
