I was asked by a friend to develop a winform app to be able to extract data. I figured it would be easy enough - how wrong I was!
In my winform, I have included a webbrowser control and some buttons. The URL for the webbrowser is http://www.racingpost.com/greyhounds/card.sd and as you can imagine, it is the place to get data for greyhounds. When on the page above, there are a number of links within this area which are specific to a race time. If you click on any of these, it takes you to that race, and its this data that I need to extract. So, my initial thoughts were to get ALL links off the link above, then store them in a list, then just have a button available to take in whatever link it is, and then take the webbrowser to that location. Once there, I can then look to extract the data and store it as needed.
So, in the first instance, I use
//url = link above
wb1.Url = new Uri(url);
grab the data (which are links for each race on that day)
once I have this, use a further button to go to the specific race
wb1.Url = new Uri("http://www.racingpost.com/greyhounds/card.sd#resultday=2015-01-17&raceid=1344640");
then, once there, click another button to capture the data, after which, return to the original link above.
The problem is, it will not go to the location present in the link. BUT, if I click the link manually within the webbrowser, it goes there no problem.
I have looked at the properties of the webbrowser, and these all look fine - although I can't qualify that tbh!
I know if I try to go to the links manually, I can, but if I try to do it through code, it just wont budge. I can only assume I have done something wrong in the code.
Hope some of that makes sense - first posting, so apologies if I made a mess of it. I will provide all code no problem, but cant seem to figure out how to post the code in 'code format'?
//here is the code
public partial class Form1 : Form
{
Uri _url;
public Form1()
{
InitializeComponent();
wb1.Url = new Uri("http://www.racingpost.com/greyhounds/card.sd");
wb1.Navigated +=new WebBrowserNavigatedEventHandler(wb1_Navigated);
}
classmodules.trackUrl tu;
private void btnGrabData_Click(object sender, EventArgs e)
{
classmodules.utility u = new classmodules.utility();
rtb1.Text = u.GetWebData("http://www.racingpost.com/greyhounds/card.sd");
HtmlDocument doc = wb1.Document;
string innerText = (((mshtml.HTMLDocument)(doc.DomDocument)).documentElement).outerHTML;
innerText = Regex.Replace(innerText, #"\r\n?|\n", "");
rtb1.Text = innerText;
tu = new classmodules.trackUrl();
u.splitOLs(ref tu, innerText);
classmodules.StaticUtils su = new classmodules.StaticUtils();
su.SerializeObject(tu, typeof(classmodules.trackUrl)).Save(#"d:\dogsUTL.xml");
classmodules.ExcelProcessor xl = new classmodules.ExcelProcessor();
xl.createExcel(tu);
}
private void wb1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser wb1 = sender as WebBrowser;
this.Text = wb1.Url.ToString();
}
void wb1_Navigated(object sender, WebBrowserNavigatedEventArgs e)
{
_url = e.Url;
}
private void btnGoBack_Click(object sender, EventArgs e)
{
goBack();
}
private void goBack()
{
wb1.Url = new Uri("http://www.racingpost.com/greyhounds/card.sd");
}
private void btnGetRaceData_Click(object sender, EventArgs e)
{
HtmlDocument doc = wb1.Document;
string innerText = (((mshtml.HTMLDocument)(doc.DomDocument)).documentElement).outerHTML;
rtb2.Text = innerText;
}
//###############################
//OK, here is the point where I want to take in the URL and click a button //to instruct the webbrowser to go to that location. I add an initial //counter to 0, and then get the first url from the list, increment the //counter, then when I click the button again, urlNo wil be 1, so then it //tries the second url
int urlNo = 0;
private void btnUseData_Click(object sender, EventArgs e)
{
if (tu.race.Count > urlNo)
{
string url = tu.race[urlNo].url;
wb1.Url = new Uri(url);
lblUrl.Text = url;
urlNo++;
}
else
{
lblUrl.Text = "No More";
}
}
Did you try the Navigate(...) method? In theory, the behavior of Navigate and Url is the same, but I can infer that they behave a bit different.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.navigate(v=vs.110).aspx
Related
I have a winform app with the following functionality:
Has a multiline textbox that contain one URL on each line - about 30 URLs (each URL is different but the webpage is the same (just the domain is different);
I have another textbox in which I can write a command and a button that sends that command to an input field from the webpage.
I have a WebBrowser controller ( I would like to do all the things in one controller )
The webpage consist of a textbox and a button which I want to be clicked after I insert a command in that textbox.
My code so far:
//get path for the text file to import the URLs to my textbox to see them
private void button1_Click(object sender, EventArgs e)
{
OpenFileDialog fbd1 = new OpenFileDialog();
fbd1.Title = "Open Dictionary(only .txt)";
fbd1.Filter = "TXT files|*.txt";
fbd1.InitialDirectory = #"M:\";
if (fbd1.ShowDialog(this) == DialogResult.OK)
path = fbd1.FileName;
}
//import the content of .txt to my textbox
private void button2_Click(object sender, EventArgs e)
{
textBox1.Lines = File.ReadAllLines(path);
}
//click the button from webpage
private void button3_Click(object sender, EventArgs e)
{
this.webBrowser1.Document.GetElementById("_act").InvokeMember("click");
}
//parse the value of the textbox and press the button from the webpage
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
newValue = textBox2.Text;
HtmlDocument doc = this.webBrowser1.Document;
doc.GetElementById("_cmd").SetAttribute("Value", newValue);
}
Now, how can I add all those 30 URLs from my textbox in the same webcontroller so that I can send the same command to all of the textboxes from all the webpages and then press the button for all of them ?
//EDIT 1
So, I have adapted #Setsu method and I've created the following:
public IEnumerable<string> GetUrlList()
{
string f = File.ReadAllText(path); ;
List<string> lines = new List<string>();
using (StreamReader r = new StreamReader(f))
{
string line;
while ((line = r.ReadLine()) != null)
lines.Add(line);
}
return lines;
}
Now, is this returning what it should return, in order to parse each URL ?
If you want to keep using just 1 WebBrowser control, you'd have to sequentially navigate to each URL. Note, however, that the Navigate method of the WebBrowser class is asynchronous, so you can't just naively call it in a loop. Your best bet is to implement an async/await pattern detailed in this answer here.
Alternatively, you CAN have 30 WebBrowser controls and have each one navigate on its own; this is roughly equivalent to having 30 tabs open in modern browsers. Since each WebBrowser is doing identical work, you can just have 1 DocumentCompleted event written to handle a single WebBrowser, and then hook up the others to the same event. Do note that the WebBrowser control has a bug that will cause it to gradually leak memory, and the only way to solve this is to restart the application. Thus, I would recommend going with the async/await solution.
UPDATE:
Here's a brief code sample of how to do the 30 WebBrowsers way (untested as I don't have access to VS right now):
List<WebBrowser> myBrowsers = new List<WebBrowser>();
public void btnDoWork(object sender, EventArgs e)
{
//This method starts navigation.
//It will call a helper function that gives us a list
//of URLs to work with, and naively create as many
//WebBrowsers as necessary to navigate all of them
IEnumerable<string> urlList = GetUrlList();
//note: be sure to sanitize the URLs in this method call
foreach (string url in urlList)
{
WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += webBrowserDocumentCompleted;
browser.Navigate(url);
myBrowsers.Add(browser);
}
}
private void webBrowserDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//check that the full document is finished
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//get our browser reference
WebBrowser browser = sender as WebBrowser;
//get the string command from form TextBox
string command = textBox2.Text;
//enter the command string
browser.Document.GetElementById("_cmd").SetAttribute("Value", command);
//invoke click
browser.Document.GetElementById("_act").InvokeMember("click");
//detach the event handler from the browser
//note: necessary to stop endlessly setting strings and clicking buttons
browser.DocumentCompleted -= webBrowserDocumentCompleted;
//attach second DocumentCompleted event handler to destroy browser
browser.DocumentCompleted += webBrowserDestroyOnCompletion;
}
private void webBrowserDestroyOnCompletion(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//check that the full document is finished
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//I just destroy the WebBrowser, but you might want to do something
//with the newly navigated page
WebBrowser browser = sender as WebBrowser;
browser.Dispose();
myBrowsers.Remove(browser);
}
I have a WebBrowser control which I dinamically refresh/change url based on user input. I don't want to let the user to navigate, so I set AllowNavigation to false. This seems to be OK, however the below link is still "active":
Close Page
The issue here is: If the user clicks it, and confirms closure in the pop-up window I can't manage WebBrowser anymore. Looks like it is closed though the last page is still visible. Also I can't remove this link as the site is not managed by me.
Disable the control? Nope, I have to allow the user to highlight and copy text from the webpage.
Do I have any other option to disable literally ALL links?
#TaW: here is my code based on yours. So I have to set the url from my code and call a custom one:
button_click()
{
webBrowser1_load_URL("http://website/somecheck.php?compname=" + textBoxHost.Text);
}
Here it is the function:
private void webBrowser1_load_URL(string url)
{
string s = GetDocumentText(url.ToString());
s = s.Replace(#"javascript:window.close()", "");
webBrowser1.AllowNavigation = true;
webBrowser1.DocumentText = s;
}
The rest is exaclty what's in your answer:
private void webBrowser1_DocumentCompleted(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
webBrowser1.AllowNavigation = false;
}
public string GetDocumentText(string s)
{
WebBrowser dummy = new WebBrowser(); //(*)
dummy.Url = new Uri(s);
return dummy.DocumentText;
}
Still it's not working. Please help me to spot the issue with my code.
If you have control over the loading of the pages you could grab the pages' text and change the code to disable rogue scripts. The one you showed can simply be deleted. Of course you might have to forsee more than the one..
Obviously this could be eased if you could do without javascript alltogether, but if that is not an option go for those that do real or pseudo-navigation..
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
webBrowser1.AllowNavigation = false;
}
private void loadURL_Click(object sender, EventArgs e)
{
webBrowser1.AllowNavigation = true;
string s = File.ReadAllText(textBox_URL.Text);
s = s.Replace("javascript:window.close()", "");
webBrowser1.DocumentText = s;
}
If the pages are not in the file system, the same trick should work, for instance by loading the URL into a dummy WebBrowser like this:
private void cb_loadURL_Click(object sender, EventArgs e)
{
string s = GetDocumentText(tb_URL.Text);
s = s.Replace("javascript:window.close()", "");
webBrowser1.AllowNavigation = true;
webBrowser1.DocumentText = s;
}
public string GetDocumentText(string s)
{
WebBrowser dummy = new WebBrowser(); //(*)
dummy.Url = new Uri(s);
return dummy.DocumentText;
}
Note: According to this post you can't set the DocumentText quite as freely as one would think; probably a bug.. Instead of creating the dummy each time you can also move the (*) line to class level. Then, no matter how many changes you had to make, you would always have an unchanged version, th user could e.g. save somewhere..
I have a FlowDocumentPageViewer control in my application that programmatically advances through each block and inline element in a FlowDocument (this is because it's part of a typing application and doing so gives visual cues which tell the user what to type). Each time I change the inline element I'm focused on, I want to check what page the inline element is on, and if it's not on the current page, to navigate to the page it is on.
If this is not possible, please suggest any alternate solutions.
Also, if it matters, every inline element I'm dealing with is a Run element.
Are you just trying to automatically navigate to the page? If so we don't need to know the page number and should be able to just use BringIntoView? I'm assuming you have a reference to the block?
The following code navigates to the page the 301st block is on when the button is pressed
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
this.Loaded += new RoutedEventHandler(MainWindow_Loaded);
}
void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
FlowDocument fd = new FlowDocument();
for (int i = 0; i < 1000; i++)
{
fd.Blocks.Add(new Paragraph(new Run(i.ToString())));
}
view.Document = fd;
}
private void Button_Click(object sender, RoutedEventArgs e)
{
(view.Document as FlowDocument) .Blocks.Skip(300).First().BringIntoView();
}
}
If you really, really want the page number you could do:
var previousPage = view.MasterPageNumber;
(view.Document as FlowDocument) .Blocks.Skip(300).First().BringIntoView();
var pageOfControl = view.MasterPageNumber;
view.GoToPage(previousPage);
It didn't flicker or anything in the test app lol! My mate didn't like that though so he suggested:
var ddp = (DynamicDocumentPaginator)view.Document.DocumentPaginator;
var position = ddp.GetObjectPosition(document.Blocks.Skip(300).First());
var page = ddp.GetPageNumber(position);
Just be aware that it is 0 indexed as opposed to the "lol" method which starts at 1
From my app MainPage I go to a website (alternatively: I go to task: send email).
When pressing 'back button' black screen is returned, instead of MainPage. I have tried to find a solution, but have not found one yet. Can anyone help?
I figured out how to solve the problem (which I had but no one else, perhaps!). If you need to make an external link do not do it directly from your Main page. Instead create an intermediate page to go to, from where you link externally (to a website or the email application, for example). Then you come back to this page with the back arrow, and using a logical string, have been here ...have not been here, you can direct back to the main page. See below:
private void Button1_Click(object sender, RoutedEventArgs e)
{
/* instead of putting the code here, you go to another page
EmailComposeTask emailcomposer = new EmailComposeTask();
emailcomposer.To = "Ä;
emailcomposer.Subject = "Customer request";
emailcomposer.Body = "Text:";
emailcomposer.Show();
*/
var b = App.Current as App;
b.Emailat = "email";
NavigationService.Navigate(new Uri("/Page2.xaml",UriKind.Relative));
}
And on this other page:
protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
{
var c = App.Current as App;
string Howisit = c.Emailat;
if (Howisit == "email")
{
EmailComposeTask emailcomposer = new EmailComposeTask();
emailcomposer.To = "";
emailcomposer.Subject = "Customer request";
emailcomposer.Body = "Text:";
emailcomposer.Show();
var b = App.Current as App;
b.Emailat = "stop";
}
if (Howisit == "stop")
{
NavigationService.Navigate(new Uri("/MainPage.xaml", UriKind.Relative));
}
}
That's it!
I'm having issues trying to get the document title from a WebBrowser in C#. It works fine in VB.NET, but it won't give me any properties in C#.
When I type in MyBrowser.Document., the only options I get are 4 methods: Equals, GetHashCode, GetType, and ToString - no properties.
I think it's because I have to assign the document to a new instance first, but I can't find the HTMLDocument class that exists in VB.NET.
Basically what I'm wanting to do is return the Document.Title each time the WebBrowser loads/reloads a page.
Can someone help please? It will be much appreciated!
Here is the code I have at the moment...
private void Link_Click(object sender, RoutedEventArgs e)
{
WebBrowser tempBrowser = new WebBrowser();
tempBrowser.HorizontalAlignment = HorizontalAlignment.Left;
tempBrowser.Margin = new Thickness(-4, -4, -4, -4);
tempBrowser.Name = "MyBrowser";
tempBrowser.VerticalAlignment = VerticalAlignment.Top;
tempBrowser.LoadCompleted += new System.Windows.Navigation.LoadCompletedEventHandler(tempBrowser_LoadCompleted);
tempTab.Content = tempBrowser; // this is just a TabControl that contains the WebBrowser
Uri tempURI = new Uri("http://www.google.com");
tempBrowser.Navigate(tempURI);
}
private void tempBrowser_LoadCompleted(object sender, EventArgs e)
{
if (sender is WebBrowser)
{
MessageBox.Show("Test");
currentBrowser = (WebBrowser)sender;
System.Windows.Forms.HtmlDocument tempDoc = (System.Windows.Forms.HtmlDocument)currentBrowser.Document;
MessageBox.Show(tempDoc.Title);
}
}
This code doesn't give me any errors, but I never see the second MessageBox. I do see the first one though (the "Test" message), so the program is getting to that code block.
Add reference to Microsoft.mshtml
Add event receiver for LoadCompleted
webbrowser.LoadCompleted += new LoadCompletedEventHandler(webbrowser_LoadCompleted);
Then you will have no problems with document not being loaded in order to read values back out
void webbrowser_LoadCompleted(object sender, NavigationEventArgs e)
{
// Get the document title and display it
if (webbrowser.Document != null)
{
mshtml.IHTMLDocument2 doc = webbrowser.Document as mshtml.IHTMLDocument2;
Informative.Text = doc.title;
}
}
You are not using the Windows Forms WebBrowser control. I think you got the COM wrapper for ieframe.dll, its name is AxWebBrowser. Verify that by opening the References node in the Solution Explorer window. If you see AxSHDocVw then you got the wrong control. It is pretty unfriendly, it just gives you an opaque interface pointer for the Document property. You'll indeed only get the default object class members.
Look in the toolbox. Pick WebBrowser instead of "Microsoft Web Browser".
string title = ((HTMLDocument)MyBrowser.Document).Title
Or
HTMLDocument Doc = (HTMLDocument)MyBrowser.Document.Title ;
string title = doc.Title;
LoadCompleted doesn't fire. You should use Navigated event handler instead of it.
webBrowser.Navigated += new NavigatedEventHandler(WebBrowser_Navigated);
(...)
private void WebBrowser_Navigated(object sender, NavigationEventArgs e)
{
HTMLDocument doc = ((WebBrowser)sender).Document as HTMLDocument;
foreach (IHTMLElement elem in doc.all)
{
(...)
}
// you may have to dispose WebBrowser object on exit
}
Finally works well with:
using System.Windows.Forms;
...
WebBrowser CtrlWebBrowser = new WebBrowser();
...
CtrlWebBrowser.Document.Title = "Hello World";
MessageBox.Show( CtrlWebBrowser.Document.Title );