How to find broken links of a website with Selenium C#

I am very new to Selenium C#. I was searching for ways to find broken links on a website using Selenium C#. I found a handful of solutions for Java Selenium but none for Selenium C#. It would be really helpful if anyone could post a small code snippet or a link to any document that I could refer to and follow. Thanks in advance.

You can try iterating over the list of 'a' tags and checking for 200 OK in an HTTP request:
IList<IWebElement> links = driver.FindElements(By.TagName("a"));
foreach (IWebElement link in links)
{
    // GetAttribute returns null when the anchor has no href attribute.
    var url = link.GetAttribute("href");
    if (!string.IsNullOrEmpty(url))
    {
        IsLinkWorking(url);
    }
}
bool IsLinkWorking(string url)
{
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // You can set some parameters in the "request" object...
        request.AllowAutoRedirect = true;
        // The using block releases the resources of the response on every path.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            if (response.StatusCode == HttpStatusCode.OK)
            {
                Console.WriteLine("\r\nResponse Status Code is OK and StatusDescription is: {0}", response.StatusDescription);
                return true;
            }
            return false;
        }
    }
    catch
    { // TODO: Check for the right exception here
        return false;
    }
}
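Not every href can be checked with an HTTP request: a page may contain mailto: or javascript: links, for which WebRequest.Create does not return an HttpWebRequest. A minimal sketch of a filter that could run before the status check follows; LinkFilter and IsCheckableLink are hypothetical names, not part of Selenium or .NET.

```csharp
using System;

public static class LinkFilter
{
    // Hypothetical helper: returns true only for absolute http/https URLs.
    // Hrefs such as "mailto:..." or "javascript:void(0)" are rejected
    // because they cannot be verified with an HTTP status code.
    public static bool IsCheckableLink(string href)
    {
        if (string.IsNullOrWhiteSpace(href))
        {
            return false;
        }
        return Uri.TryCreate(href, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

With this in place, the loop could call IsLinkWorking(url) only when LinkFilter.IsCheckableLink(url) returns true.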

IWebDriver webDriver = new ChromeDriver();
webDriver.Navigate().GoToUrl("https://www.google.co.in/maps/");
var urls = webDriver.FindElements(By.TagName("a"));
foreach (var url in urls)
{
    var href = url.GetAttribute("href");
    // Skip anchors without an href, with empty link text, or with "Email" in the text.
    if (string.IsNullOrEmpty(href) || url.Text.Contains("Email") || url.Text == "")
    {
        continue;
    }
    var req = (HttpWebRequest)WebRequest.Create(href);
    try
    {
        using (var response = (HttpWebResponse)req.GetResponse())
        {
            System.Console.WriteLine($"URL: {href} status is :{response.StatusCode}");
        }
    }
    catch (WebException e)
    {
        // e.Response is null when no HTTP response was received (for example, a DNS failure or timeout).
        var errorResponse = e.Response as HttpWebResponse;
        var status = errorResponse != null ? errorResponse.StatusCode.ToString() : e.Status.ToString();
        System.Console.WriteLine($"URL: {href} status is :{status}");
    }
}

Related

Check if a .txt file exists or not on a remote webserver

I'm trying to check whether a .txt file exists at a web URL. This is my code:
static public bool URLExists(string url)
{
    bool result = false;
    WebRequest webRequest = WebRequest.Create(url);
    webRequest.Timeout = 1200; // milliseconds
    webRequest.Method = "HEAD";
    HttpWebResponse response = null;
    try
    {
        response = (HttpWebResponse)webRequest.GetResponse();
        result = true;
    }
    catch (WebException webException)
    {
        //(url + " doesn't exist: " + webException.Message);
    }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }
    return result;
}
If I enter "http://www.example.com/demo.txt", which is not a valid file path and for which the website shows a 404 error page, this code returns true. How can I solve this problem? Thanks in advance.

Use the StatusCode property of the HttpWebResponse object.
response = (HttpWebResponse)webRequest.GetResponse();
if (response.StatusCode == HttpStatusCode.NotFound)
{
    result = false;
}
else
{
    result = true;
}
Look through the list of possible status codes to see which ones you want to interpret as the file not existing.
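Which status codes count as "the file does not exist" is a judgment call. As a sketch, the decision could be factored into a small helper; StatusInterpreter and IndicatesMissingFile are hypothetical names, and treating 410 Gone the same as 404 is an assumption rather than something the question requires. Note also that HttpWebRequest.GetResponse throws a WebException for 4xx and 5xx responses, so in practice the status code is often read from the exception's Response property.

```csharp
using System.Net;

public static class StatusInterpreter
{
    // Hypothetical helper: decides whether a status code should be read
    // as "the remote file does not exist". Including 410 Gone is an
    // assumption; adjust the list to suit your own definition.
    public static bool IndicatesMissingFile(HttpStatusCode status)
    {
        return status == HttpStatusCode.NotFound
            || status == HttpStatusCode.Gone;
    }
}
```

The result of the existence check would then be the negation, for example result = !StatusInterpreter.IndicatesMissingFile(response.StatusCode);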

Windows Forms Bad Request when Trying to get Response from website

I have a list of URLs. The purpose is to check our websites so that we get a notification if any of them is down or offline. That works, except that for some of the URLs the code crashes at this line:
HttpWebResponse httpRes = (HttpWebResponse)httpReq.GetResponse();
The rest work just fine. Can anyone tell me what I'm doing wrong? I've tried URLs with HTTPS, HTTP and even with only www...
public void CheckUrl()//List Of URLs
{
    List<string> urls = new List<string>() {
        "https//:www.example.com/something1/buy",
        "https//:www.example.com/something2/buy",
        "https//:www.example.com/something3/buy",
        "https//:www.example.com/something4/buy",
    };
    //walks through all the URL:s
    foreach (var url in urls)
    {
        //Creating URL Request
        HttpWebRequest httpReq = (HttpWebRequest)WebRequest.Create(url);
        httpReq.AllowAutoRedirect = false;
        try
        {
            WebClient client = new WebClient();
            string downloadString = client.DownloadString(url);
            //Trying to find a response
            HttpWebResponse httpRes = (HttpWebResponse)httpReq.GetResponse();
            if ((httpRes.StatusCode != HttpStatusCode.OK || httpRes.StatusCode != HttpStatusCode.Found) && downloadString.Contains("404 -") || downloadString.Contains("Server Error"))
            {
                // Code for NotFound resources goes here.
                SendVerificationLinkEmail(httpRes.StatusCode.ToString(), url);
                foreach (var number in Numbers)
                {
                    SendSms(url, number);
                }
            }
            //Close the response.
            httpRes.Close();
        }
        catch(Exception e)
        {
            //sending only to admin to check it out first
            SendExeptionUrl(url);
            foreach (var number in Numbers)
            {
                SendSms(url, number);
            }
        }
    }
    Application.Exit();
}

WebException when loading rss feed

I am attempting to load a page I've received from an RSS feed and I receive the following WebException:
Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.
with an inner exception:
Invalid URI: The hostname could not be parsed.
I wrote code that attempts to load the URL via an HttpWebRequest. Following some suggestions I received, when the HttpWebRequest fails I set AllowAutoRedirect to false and manually loop through the redirects until I find out what ultimately fails. Here's the code I'm using; please forgive the gratuitous Console.Write/WriteLine calls:
Uri url = new Uri(val);
bool result = true;
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
string source = String.Empty;
Uri responseURI;
try
{
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
        using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
        {
            responseURI = httpWebResponse.ResponseUri;
            StreamReader reader;
            if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
            {
                reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
            {
                reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
            }
            else
            {
                reader = new StreamReader(httpWebResponse.GetResponseStream());
            }
            source = reader.ReadToEnd();
            reader.Close();
        }
    }
    req.Abort();
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(source);
    result = true;
}
catch (ArgumentException ae)
{
    Console.WriteLine(url + "\n--\n" + ae.Message);
    result = false;
}
catch (WebException we)
{
    Console.WriteLine(url + "\n--\n" + we.Message);
    result = false;
    string urlValue = url.ToString();
    try
    {
        bool cont = true;
        int count = 0;
        do
        {
            req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
            req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
            req.AllowAutoRedirect = false;
            using (System.Net.WebResponse webResponse = req.GetResponse())
            {
                using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
                {
                    responseURI = httpWebResponse.ResponseUri;
                    StreamReader reader;
                    if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
                    {
                        reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
                    {
                        reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
                    }
                    else
                    {
                        reader = new StreamReader(httpWebResponse.GetResponseStream());
                    }
                    source = reader.ReadToEnd();
                    if (string.IsNullOrEmpty(source))
                    {
                        urlValue = httpWebResponse.Headers["Location"].ToString();
                        count++;
                        reader.Close();
                    }
                    else
                    {
                        cont = false;
                    }
                }
            }
        } while (cont);
    }
    catch (UriFormatException uriEx)
    {
        Console.WriteLine(urlValue + "\n--\n" + uriEx.Message + "\r\n");
        result = false;
    }
    catch (WebException innerWE)
    {
        Console.WriteLine(urlValue + "\n--\n" + innerWE.Message+"\r\n");
        result = false;
    }
}
if (result)
    Console.WriteLine("testing successful");
else
    Console.WriteLine("testing unsuccessful");
Since this is currently just test code I hardcode val as http://rss.nytimes.com/c/34625/f/642557/s/3d072012/sc/38/l/0Lartsbeat0Bblogs0Bnytimes0N0C20A140C0A70C30A0Csarah0Ekane0Eplay0Eamong0Eofferings0Eat0Est0Eanns0Ewarehouse0C0Dpartner0Frss0Gemc0Frss/story01.htm
the ending url that gives the UriFormatException is: http:////www-nc.nytimes.com/2014/07/30/sarah-kane-play-among-offerings-at-st-anns-warehouse/?=_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&partner=rss&emc=rss&_r=6&
Now I'm not sure if I'm missing something or if I'm doing the looping wrong, but if I take val and just put that into a browser the page loads fine, and if I take the URL that causes the exception and put it in a browser I get taken to an account login for nytimes.
I have a number of these rss feed urls that are resulting in this problem. I also have a large number of these rss feed urls that have no problem loading at all. Let me know if there is any more information needed to help resolve this. Any help with this would be greatly appreciated.
Could it be that I need to have some sort of cookie capability enabled?

You need to keep track of the cookies while doing all your requests. You can use an instance of the CookieContainer class to achieve that.
At the top of your method I made the following changes:
Uri url = new Uri(val);
bool result = true;
// keep all our cookies for the duration of our calls
var cookies = new CookieContainer();
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(url);
// assign our CookieContainer to the new request
req.CookieContainer = cookies;
string source = String.Empty;
Uri responseURI;
try
{
And in the exception handler where you create a new HttpWebRequest, you do the assignment from our CookieContainer again:
do
{
    req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(urlValue);
    // reuse our cookies!
    req.CookieContainer = cookies;
    req.Headers.Add("Accept-Language", "en-us,en;q=0.5");
    req.AllowAutoRedirect = false;
    using (System.Net.WebResponse webResponse = req.GetResponse())
    {
This makes sure that on each successive call the cookies already present are sent again with the next request. If you leave this out, no cookies are sent, so the site you are trying to visit assumes you are a new, unseen user and sends you down an authentication path.
If you want to keep cookies beyond this method, you could move the cookie container to a public static property so that you can use those cookies program-wide, like so:
public static class Cookies
{
    static readonly CookieContainer _cookies = new CookieContainer();
    public static CookieContainer All
    {
        get
        {
            return _cookies;
        }
    }
}
And to use it in a WebRequest:
var req = (System.Net.HttpWebRequest) WebRequest.Create(url);
req.CookieContainer = Cookies.All;
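To see why sharing one container matters, the following sketch stores a cookie by hand and then asks the container which cookies it would send for a later URL on the same host. No network request is made; CookieDemo, CountCookiesFor, the cookie name and the example.com URLs are all hypothetical and only serve the illustration.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Hypothetical helper: returns how many cookies the container
    // would attach to a request for the given URL.
    public static int CountCookiesFor(CookieContainer cookies, string url)
    {
        return cookies.GetCookies(new Uri(url)).Count;
    }

    // Simulates a server having set a cookie on an earlier response.
    public static CookieContainer CreateContainerWithSessionCookie()
    {
        var cookies = new CookieContainer();
        cookies.Add(new Uri("http://example.com/"), new Cookie("session", "abc123"));
        return cookies;
    }
}
```

A request to another page on the same host that reuses this container would carry the stored cookie, whereas a request with a fresh CookieContainer would carry none.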

Uri.IsWellFormedUriString returns true, but cannot read from a url

I am trying to check if the url http://master.dev.brandgear.net is valid by the following method:
private bool UrlIsValid(string url)
{
    using (var webClient = new WebClient())
    {
        bool response;
        try
        {
            webClient.UseDefaultCredentials = true;
            using (Stream strm = webClient.OpenRead(url))
            {
                response = true;
            }
        }
        catch (WebException we)
        {
            response = false;
        }
        return response;
    }
}
However, I am getting a web exception "404 not found.". I have checked the uri with Uri.IsWellFormedUriString and it is returning true. However, the same url can be opened through a browser. Any idea how to validate it?

I ran your example with the URL http://master.dev.brandgear.net and the exception is raised for me as well. If you open the same URL in a browser (for example Firefox) with the Firebug plugin running and open the Network tab, you will see error 404 (Page not found). Your code is OK, but the server returns 404.

To really get a response, you have to read it from the WebException rather than from the GetResponse or GetResponseStream methods when the 404 error happens. Also use HttpWebRequest and HttpWebResponse in these situations for better control: after the exception occurs, check its Status to see whether it is a ProtocolError and, if so, get the response from the exception:
private bool UrlIsValid(string url)
{
    bool response = false;
    HttpWebResponse rep = null;
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        rep = (HttpWebResponse)request.GetResponse();
    }
    catch (WebException we)
    {
        if (we.Status == WebExceptionStatus.ProtocolError)
            rep = (HttpWebResponse)we.Response;
    }
    if (rep != null)
    {
        try
        {
            using (Stream strm = rep.GetResponseStream())
            {
                response = true;
            }
        }
        catch (WebException ex)
        {
            // Nothing to do: response is already false if we didn't succeed.
            //response = false;
        }
    }
    return response;
}

HttpWebResponse hangs on multiple requests

I have an application that creates many web requests to download the news pages of a website (I've tested it with many websites). After a while I found that the application slows down when fetching the HTML source, and then I found that HttpWebResponse fails to get the response. I'm posting only the function that does this job.
public PageFetchResult Fetch()
{
    PageFetchResult fetchResult = new PageFetchResult();
    try
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(URLAddress);
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        Uri requestedURI = new Uri(URLAddress);
        Uri responseURI = resp.ResponseUri;
        if (Uri.Equals(requestedURI, responseURI))
        {
            string resultHTML = "";
            byte[] reqHTML = ResponseAsBytes(resp);
            if (!string.IsNullOrEmpty(FetchingEncoding))
                resultHTML = Encoding.GetEncoding(FetchingEncoding).GetString(reqHTML);
            else if (!string.IsNullOrEmpty(resp.CharacterSet))
                resultHTML = Encoding.GetEncoding(resp.CharacterSet).GetString(reqHTML);
            resp.Close();
            fetchResult.IsOK = true;
            fetchResult.ResultHTML = resultHTML;
        }
        else
        {
            URLAddress = responseURI.AbsoluteUri;
            relayPageCount++;
            if (relayPageCount > 5)
            {
                fetchResult.IsOK = false;
                fetchResult.ErrorMessage = "Maximum page redirection occured.";
                return fetchResult;
            }
            return Fetch();
        }
    }
    catch (Exception ex)
    {
        fetchResult.IsOK = false;
        fetchResult.ErrorMessage = ex.Message;
    }
    return fetchResult;
}
Any solution would be greatly appreciated.

The Fetch function is called recursively and always creates an HttpWebRequest, but the response is released only when the URL matches. You have to close the request and response in the else branch as well.

I agree with @volody. Also, HttpWebRequest already has a property called MaximumAutomaticRedirections, which defaults to 50. You can set it to 5 to automatically achieve what this code is trying to do anyway; it will raise an exception, which will be handled by your code.
Just set
request.MaximumAutomaticRedirections = 5;
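As a sketch of where that setting could live, the request creation can be wrapped in a small factory so every request gets the same redirect limit. RequestFactory and CreateWithRedirectLimit are hypothetical names; creating the request does not contact the server, so nothing is fetched until GetResponse is called.

```csharp
using System.Net;

public static class RequestFactory
{
    // Hypothetical helper: builds an HttpWebRequest that follows at most
    // maxRedirects redirects before GetResponse throws a WebException.
#pragma warning disable SYSLIB0014 // WebRequest.Create is marked obsolete on newer .NET versions
    public static HttpWebRequest CreateWithRedirectLimit(string url, int maxRedirects)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = true;
        request.MaximumAutomaticRedirections = maxRedirects;
        return request;
    }
#pragma warning restore SYSLIB0014
}
```

Wrapping the GetResponse call in a using block, for example using (var resp = (HttpWebResponse)request.GetResponse()) { ... }, would also release the response on every path, including the redirect case.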
