I am scraping a web page and navigating to the correct location, but as I am new to the whole C# world, I am stuck on downloading a PDF file.
The link is hidden behind this element:
var reportDownloadButton = driver.FindElementById("company_report_link");
It is something like: www.link.com/key/489498-654gjgh6-6g5h4jh/link.pdf
How can I download the file to C:\temp\?
Here is my code:
using System.Linq;
using OpenQA.Selenium.Chrome;
namespace WebDriverTest
{
class Program
{
static void Main(string[] args)
{
var chromeOptions = new ChromeOptions();
chromeOptions.AddArguments("headless");
// Initialize the Chrome Driver // chromeOptions
using (var driver = new ChromeDriver(chromeOptions))
{
// Go to the home page
driver.Navigate().GoToUrl("www.link.com");
driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
// Get the page elements
var userNameField = driver.FindElementById("loginForm:username");
var userPasswordField = driver.FindElementById("loginForm:password");
var loginButton = driver.FindElementById("loginForm:loginButton");
// Type user name and password
userNameField.SendKeys("username");
userPasswordField.SendKeys("password");
// and click the login button
loginButton.Click();
driver.Navigate().GoToUrl("www.link2.com");
driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
var reportSearchField = driver.FindElementByClassName("form-control");
reportSearchField.SendKeys("Company");
var reportSearchButton = driver.FindElementById("search_filter_button");
reportSearchButton.Click();
var reportDownloadButton = driver.FindElementById("company_report_link");
reportDownloadButton.Click();
}
}
}
}
EDIT 2:
I am not the sharpest pencil in the Stack Overflow community yet, and I don't understand how to do it with Selenium. I tried it with WebClient instead:
var reportDownloadButton = driver.FindElementById("company_report_link");
var text = reportDownloadButton.GetAttribute("href");
// driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
WebClient client = new WebClient();
// Save the file to desktop for debugging
var desktop = System.Environment.GetFolderPath(System.Environment.SpecialFolder.Desktop);
string fileName = desktop + "\\myfile.pdf";
client.DownloadFile(text, fileName);
However, the web page seems to be a little tricky. I am getting:
System.Net.WebException: 'The remote server returned an error: (401) Unauthorized.'
The debugger points at:
client.DownloadFile(text, fileName);
I think it really needs to simulate right-click and Save Link As, otherwise this download will not work. Also, if I just click the button, it opens the PDF in a new Chrome tab.
EDIT3:
Should it be like this?
using System.Collections.Generic; // needed for Dictionary<string, object>
using System.Linq;
using OpenQA.Selenium.Chrome;
namespace WebDriverTest
{
class Program
{
static void Main(string[] args)
{
// declare chrome options with prefs
var options = new ChromeOptionsWithPrefs();
options.AddArguments("headless"); // we add headless here
// declare prefs
options.prefs = new Dictionary<string, object>
{
{ "download.default_directory", downloadFilePath }
};
// declare driver with these options
//driver = new ChromeDriver(options); we don't need this because we already declare driver below.
// Initialize the Chrome Driver // chromeOptions
using (var driver = new ChromeDriver(options))
{
// Go to the home page
driver.Navigate().GoToUrl("www.link.com");
driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
// Get the page elements
var userNameField = driver.FindElementById("loginForm:username");
var userPasswordField = driver.FindElementById("loginForm:password");
var loginButton = driver.FindElementById("loginForm:loginButton");
// Type user name and password
userNameField.SendKeys("username");
userPasswordField.SendKeys("password");
// and click the login button
loginButton.Click();
driver.Navigate().GoToUrl("www.link.com");
driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
var reportSearchField = driver.FindElementByClassName("form-control");
reportSearchField.SendKeys("company");
var reportSearchButton = driver.FindElementById("search_filter_button");
reportSearchButton.Click();
driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
driver.Navigate().GoToUrl("www.link.com");
// click the link to download
var reportDownloadButton = driver.FindElementById("company_report_link");
reportDownloadButton.Click();
// if clicking does not work, get href attribute and call GoToUrl() -- this may trigger download
var href = reportDownloadButton.GetAttribute("href");
driver.Navigate().GoToUrl(href);
}
}
}
}
You could try setting the download.default_directory Chrome driver preference:
// declare chrome options with prefs
var options = new ChromeOptionsWithPrefs();
// declare prefs
options.prefs = new Dictionary<string, object>
{
{ "download.default_directory", downloadFilePath }
};
// declare driver with these options
driver = new ChromeDriver(options);
// ... run your code here ...
// click the link to download
var reportDownloadButton = driver.FindElementById("company_report_link");
reportDownloadButton.Click();
// if clicking does not work, get href attribute and call GoToUrl() -- this may trigger download
var href = reportDownloadButton.GetAttribute("href");
driver.Navigate().GoToUrl(href);
If reportDownloadButton is a link that triggers a download, then the file should download to the filePath you have set in download.default_directory.
Neither of these threads is in C#, but they discuss a similar issue:
How to control the download of files with Selenium + Python bindings in Chrome
How to use chrome webdriver in selenium to download files in python?
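Clicking the link may return before the file has been fully written, so code that reads the PDF straight away can find nothing. A minimal sketch of a polling helper, assuming Chrome keeps in-progress downloads under a `.crdownload` extension; `DownloadWaiter`, `WaitForFile`, and their parameters are hypothetical names, not part of Selenium:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;

public static class DownloadWaiter
{
    // Polls `directory` until a file matching `pattern` exists and no
    // in-progress Chrome download (*.crdownload) is left, or until
    // `timeout` elapses. Returns the full path of the first match,
    // or null if the timeout is reached first.
    public static string WaitForFile(string directory, string pattern, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            bool inProgress = Directory.EnumerateFiles(directory, "*.crdownload").Any();
            string match = Directory.EnumerateFiles(directory, pattern).FirstOrDefault();
            if (!inProgress && match != null)
            {
                return match;
            }
            if (DateTime.UtcNow >= deadline)
            {
                return null;
            }
            Thread.Sleep(250); // short pause between directory scans
        }
    }
}
```

It would be called after `reportDownloadButton.Click()`, for example `DownloadWaiter.WaitForFile(downloadFilePath, "*.pdf", TimeSpan.FromSeconds(30))`, with `downloadFilePath` being the same directory passed to `download.default_directory`.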
You can use WebClient.DownloadFile for that.
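If the 401 in the question comes from WebClient not sharing the browser's logged-in session (an assumption; the site may authenticate differently), one commonly used workaround is to copy the cookies from the Selenium session into the request. A minimal sketch of the header-building part, where `CookieHeader` is a hypothetical helper name and the Selenium/WebClient wiring is shown in comments only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CookieHeader
{
    // Joins name/value pairs into a single Cookie request header value,
    // e.g. "a=1; b=2".
    public static string Build(IEnumerable<KeyValuePair<string, string>> cookies)
    {
        return string.Join("; ", cookies.Select(c => c.Key + "=" + c.Value));
    }
}

// Assumed usage after logging in with Selenium (untested against the site in question):
//   var pairs = driver.Manage().Cookies.AllCookies
//       .Select(c => new KeyValuePair<string, string>(c.Name, c.Value));
//   using (var client = new System.Net.WebClient())
//   {
//       client.Headers[System.Net.HttpRequestHeader.Cookie] = CookieHeader.Build(pairs);
//       client.DownloadFile(href, @"C:\temp\report.pdf"); // href read from the link element
//   }
```

This only helps when the server accepts the session cookies on their own; if it also checks other headers or tokens, the request may still be rejected.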
I'm trying to launch the Tor Browser via puppeteer-sharp. I am using a .NET Core 3.1 console application and the latest version of puppeteer-sharp. So far, given the executable path, the console application launches the Tor Browser but throws an exception.
using PuppeteerSharp;
using System.Threading;
using System.Threading.Tasks;
namespace puppeteer_tor
{
internal class Program
{
static async Task Main(string[] args)
{
string enableAutomation = "--enable-automation";
string noSandBox = "--no-sandbox";
string disableSetUidSandBox = "--disable-setuid-sandbox";
string[] argumentsWithoutExtension = new string[] { "C:\\Users\\selaka.nanayakkara\\Desktop\\Tor Browser\\Browser\\TorBrowser\\Data\\profile.default", "--proxy-server=socks5://127.0.0.1:9050", "--disable-gpu", "--disable-dev-shm-usage", enableAutomation, disableSetUidSandBox, noSandBox };
var options = new LaunchOptions
{
Headless = false,
ExecutablePath = @"C:\Users\selaka.nanayakkara\Desktop\Tor Browser\Browser\firefox.exe",
Args = argumentsWithoutExtension
};
using (var browser = await Puppeteer.LaunchAsync(options))
{
Thread.Sleep(5000);
var page = await browser.NewPageAsync();
await page.GoToAsync("https://check.torproject.org/");
var element = await page.WaitForSelectorAsync("h1");
var text = element.ToString();
}
}
}
}
The browser launches with an issue and throws this exception:
Failed to launch browser!
The Tor Browser also shows an error screen (screenshot not preserved).
Your help with this issue is much appreciated. Thanks in advance.
Please find the attached code base here.
Set Headless to true and try:
var options = new LaunchOptions
{
Headless = true,
ExecutablePath = @"C:\Program Files\Mozilla Firefox\firefox.exe",
Args = argumentsWithoutExtension
};
After many pitfalls, I was able to get puppeteer-sharp to work with the Tor Browser. For anyone who is interested, here is the code:
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;
using PuppeteerSharp;
using System;
using System.Threading;
using System.Threading.Tasks;
namespace puppeteer_tor
{
internal class Program
{
static async Task Main(string[] args)
{
// Initiating Browser configuration
Console.WriteLine("Initiating Tor Browser");
Browser browser = (Browser)await Puppeteer.LaunchAsync(new LaunchOptions
{
Headless = false,
ExecutablePath = @"C:\Users\selaka.nanayakkara\Desktop\Tor Browser\Browser\firefox.exe",
Product = Product.Firefox,
UserDataDir = @"C:\Users\selaka.nanayakkara\Desktop\Tor Browser\Browser\TorBrowser\Data\profile.default",
DefaultViewport = null,
IgnoreHTTPSErrors = true,
Args = new[] { "-wait-for-browser" }
});
// Enabling proxy connectivity
Console.WriteLine("Initiating Tor proxy");
var page = await browser.PagesAsync();
Page page1 =(Page)page[0];
await page1.ClickAsync("#connectButton");
// Loading geoblocked url.
Console.WriteLine("Navigating to the URL");
Page page3 =(Page)await browser.NewPageAsync();
page3.DefaultNavigationTimeout = 0;
await page3.GoToAsync("http://nebraskalegislature.gov/laws/browse-chapters.php?chapter=20");
// Fetching content from the page.
Console.WriteLine("Fetching content in the URL.");
var content = await page3.GetContentAsync();
Console.WriteLine("Content fetching completed! ");
// Closing Browser
Console.WriteLine("Closing browser.");
await browser.CloseAsync();
}
}
}
Sample git repository : https://github.com/SelakaKithmal/puppeteer-tor
I'm using Selenium ChromeDriver to navigate to pages and it works fine, but on the second request I get intercepted by Incapsula.
If I dispose of the driver every time, it works though.
Here's the current code:
var chromeOptions = new ChromeOptions();
chromeOptions.AddArguments(new List<string>() { "headless" });
var chromeDriverService = ChromeDriverService.CreateDefaultService();
ChromeDriver driver = new ChromeDriver(chromeDriverService, chromeOptions);
The code below is in a loop which iterates over many records:
//extract json variable from page output
ResultModel resultModel = new ResultModel();
driver = new ChromeDriver(chromeDriverService, chromeOptions);
driver.Navigate().GoToUrl($"https://www.website.ca{resultUrl}");
var modelString = driver.ExecuteScript("return JSON.stringify(window.the_variable);", new object[] { });
if (modelString != null)
resultModel = JsonConvert.DeserializeObject<ResultModel>(modelString.ToString());
driver.Dispose();
So this works, but disposing of and re-creating the driver every time slows the process quite a bit.
When I try to simply navigate to the next page after the first request, I get intercepted.
What exactly happens when I dispose of and re-create the driver? Could I spoof that without actually doing it?
Clearing the cookies seems to have helped:
driver.ExecuteChromeCommand("Network.clearBrowserCookies", new Dictionary<string, object>() );
In my MVC web application, I'm using the Selenium C# WebDriver to read some data from an HTML file. The application works properly when I run it through Visual Studio (the HTML file opens in Chrome and is read correctly). But after I publish and host the application on an IIS server, the HTML file does not open in Chrome (the browser does not open at all). Here is my code:
public class CribController : Controller
{
public ActionResult Index()
{
try
{
IWebDriver driver = new ChromeDriver(@"C:\Selenium\");
driver.Navigate().GoToUrl("D:/Crib/toEdit_Foramted V2.html");
string text = driver.Title;
var table = driver.FindElement(By.Id("reportcontainerstyle-Ver2"));
var rowsss = table.FindElements(By.TagName("tr"));
//To get days arrears details
var mainTable = driver.FindElement(By.Name("ConsumerCreditDetails_Version3"));
var subTables = mainTable.FindElements(By.Id("bandstyle-Ver2"));
var rows = driver.FindElements(By.XPath("//table[.//td[normalize-space(.)='Credit Facility (CF) Details']][1]/following-sibling::table[1]//tr[not(@type='table-header')]"));
foreach (IWebElement row in rows)
{
//Some logic here
}
Thread.Sleep(3000);
driver.Close();
}
catch (Exception ex)
{
Logger.LogWriter("WebApplication2.Controllers", ex, "CribController", "Index");
Console.WriteLine(ex);
}
return View();
}
}
Why does this not work after publishing, and how can I solve it?
I think we need more context about the error it throws.
There's a similar question in the Selenium GitHub repository, and this was the response: https://github.com/seleniumhq/selenium/issues/1125#issuecomment-257258747
You can declare the driver like this:
var driverService = ChromeDriverService.CreateDefaultService();
driverService.HideCommandPromptWindow = true;
var options = new ChromeOptions();
options.AddArguments(new List<string> { { "start-maximized" } });
IWebDriver driver;
driver = new ChromeDriver(driverService, options);
Another approach that might help: in the same solution, create a Console Application for the Selenium code and its execution, and call into it from the controller of the MVC project.
I am trying to change Chrome's default homepage (the Google new tab page), but I didn't find a working solution.
What I have tried:
var _options = new ChromeOptions();
_options.AddUserProfilePreference("homepage", "http://www.example.com");
_options.AddUserProfilePreference("homepage_is_newtabpage", true);
_options.AddUserProfilePreference("session.restore_on_startup", 4);
_options.AddUserProfilePreference("session.startup_urls", new List<string>() { "http://in.gr"});
_options.AddArgument("--homepage=http://in.gr");
var _driver = new ChromeDriver(_options);
You can navigate to the page you want after initializing the WebDriver, before you begin the rest of your web automation.
...
var _driver = new ChromeDriver(_options);
_driver.Navigate().GoToUrl("http://in.gr");
Hope that is of some help.
I have problems with setting the default download folder for chrome driver.
I found some information related to this but none of it is working.
This is what I've tried:
var options = new ChromeOptionsWithPrefs();
options.AddArguments("start-maximized");
options.prefs = new Dictionary<string, object> {
{ "download.default_directory", folderName },
{ "download.prompt_for_download", false },
{ "intl.accept_languages", "nl" }};
webdriver = new ChromeDriver(chromedriver_path, options);
and
var options = new ChromeOptions();
options.AddUserProfilePreference("download.default_directory", folderName);
options.AddUserProfilePreference("intl.accept_languages", "nl");
options.AddUserProfilePreference("download.prompt_for_download", "false");
I am using ChromeDriver 2.9 (the latest one) and Chrome version 33.
I also tried to set a default directory in Chrome itself, expecting the default directory to have changed when I start the WebDriver, but that did not work either.
Do you have any idea how I can set this default folder?
Edit: adding declaration:
string folderName = @"C:\Browser";
I was running into trouble doing this with ChromeDriver 2.24 and Selenium 3.0.
For me the following code worked:
var service = ChromeDriverService.CreateDefaultService(driverPath);
var downloadPrefs = new Dictionary<string, object>
{
{"default_directory", @"C:\Users\underscore\MyCustomLocation"},
{"directory_upgrade", true}
};
var options = new ChromeOptions();
options.AddUserProfilePreference("download", downloadPrefs);
return new ChromeDriver(service, options);
Hopefully this helps anyone trying to do it now.
In case this changes in the future: I verified the required format by opening my default Chrome preferences file. The location of this file can be found by browsing to chrome://version and opening the Preferences file at the location specified by Profile Path. This showed that the default "download" key holds an object with these values.
I could then check the changes were applied by opening the preferences file used by the Selenium Chrome browser (again by checking the location from chrome://version).
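The check described above can also be done in code rather than by eye. A minimal sketch, assuming the Preferences file is JSON with a top-level "download" object as described, and assuming System.Text.Json is available (built into .NET Core 3.0+, a NuGet package on .NET Framework); `ChromePreferences` and `GetDownloadDirectory` are hypothetical names:

```csharp
using System;
using System.Text.Json;

public static class ChromePreferences
{
    // Returns download.default_directory from the JSON text of a Chrome
    // Preferences file, or null when the key is absent or not a string.
    public static string GetDownloadDirectory(string preferencesJson)
    {
        using (JsonDocument doc = JsonDocument.Parse(preferencesJson))
        {
            if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                doc.RootElement.TryGetProperty("download", out JsonElement download) &&
                download.ValueKind == JsonValueKind.Object &&
                download.TryGetProperty("default_directory", out JsonElement dir) &&
                dir.ValueKind == JsonValueKind.String)
            {
                return dir.GetString();
            }
            return null;
        }
    }
}

// Assumed usage, with profilePath taken from "Profile Path" on chrome://version:
//   string json = System.IO.File.ReadAllText(System.IO.Path.Combine(profilePath, "Preferences"));
//   Console.WriteLine(ChromePreferences.GetDownloadDirectory(json));
```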
Edit 2
Similarly in order to disable the inbuilt Chrome PDF Viewer which was blocking file downloads, I added the following lines to the configuration:
var pdfViewerPlugin = new Dictionary<string, object>
{
["enabled"] = false,
["name"] = "Chrome PDF Viewer"
};
var pluginsList = new Dictionary<string, object>
{
{ "plugins_list", new [] { pdfViewerPlugin } }
};
var downloadPreferences = new Dictionary<string, object>
{
{"default_directory", launchOptions.DownloadFolder},
{"directory_upgrade", true}
};
var options = new ChromeOptions();
options.AddUserProfilePreference("download", downloadPreferences);
options.AddUserProfilePreference("plugins", pluginsList);
Firefox
Since I wasted another hour on this today, here is the configuration for Firefox (49+) running the same version of Selenium. Note: this won't work with GeckoDriver 0.10.0 and Selenium 3.0.0+; GeckoDriver must be version 0.11.1.
var path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "GeckoBinary");
var service = FirefoxDriverService.CreateDefaultService(path);
service.HideCommandPromptWindow = true;
var profile = new FirefoxProfile();
profile.SetPreference("browser.download.dir", myDownloadLocation);
profile.SetPreference("browser.download.downloadDir", myDownloadLocation);
profile.SetPreference("browser.download.defaultFolder", myDownloadLocation);
profile.SetPreference("browser.helperApps.neverAsk.saveToDisk", ContentTypes.AllTypesSingleLine);
profile.SetPreference("pdfjs.disabled", true);
profile.SetPreference("browser.download.useDownloadDir", true);
profile.SetPreference("browser.download.folderList", 2);
return new FirefoxDriver(service, new FirefoxOptions
{
Profile = profile
}, TimeSpan.FromMinutes(5));
Where ContentTypes.AllTypesSingleLine is just a string containing mime types, e.g.:
application/pdf;application/excel;...
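For illustration, such a helper might look like the sketch below. This is a hypothetical reconstruction, not the answer's actual class: only `application/pdf` and `application/excel` come from the example above, and the other MIME types are assumptions to be replaced with whatever your downloads use.

```csharp
using System;

// Hypothetical stand-in for the ContentTypes helper referenced above.
public static class ContentTypes
{
    // MIME types Firefox should save to disk without prompting.
    private static readonly string[] All =
    {
        "application/pdf",
        "application/excel",
        "application/octet-stream", // assumption: generic binary downloads
        "text/csv"                  // assumption: CSV exports
    };

    // Semicolon-separated list, the format shown in the example above.
    public static readonly string AllTypesSingleLine = string.Join(";", All);
}
```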
As of GeckoDriver 0.11.1 and Selenium 3.0.1 this can be simplified to:
var options = new FirefoxOptions();
options.SetPreference("browser.download.dir", launchOptions.DownloadFolder);
options.SetPreference("browser.download.downloadDir", launchOptions.DownloadFolder);
options.SetPreference("browser.download.defaultFolder", launchOptions.DownloadFolder);
options.SetPreference("browser.helperApps.neverAsk.saveToDisk", ContentTypes.AllTypesSingleLine);
options.SetPreference("pdfjs.disabled", true);
options.SetPreference("browser.download.useDownloadDir", true);
options.SetPreference("browser.download.folderList", 2);
return new FirefoxDriver(service, options, TimeSpan.FromMinutes(5));