How to Find Broken Links on a Webpage with Selenium WebDriver

As an automation tester, you often work with the links on a website. Links are one of the most important parts of a webpage, so it's important not to leave any broken links on a site. Testing links manually is tedious and time-consuming. In this post, we'll learn to automate link testing.

1. What is a Broken Link?

A broken link is a URL that is not working or not reachable. A link can stop functioning for several reasons, and the server reports the problem through an HTTP status code. Each code has a different meaning. Let's take a look at some common HTTP status codes.
  • 200 - OK. The request succeeded and the link is working.
  • 400 - Bad Request. The server could not understand the request, for example because the URL is malformed.
  • 403 - Forbidden. The server understood the request but refuses access to the page. (A missing login is usually reported as 401 Unauthorized.)
  • 404 - Not Found. This is the most common one: the page does not exist.
  • 500 - Internal Server Error.
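
As a rough illustration of the codes above, here is a minimal sketch of a lookup helper. The class name HttpStatusGuide and its methods are made up for this example and are not part of Selenium or the JDK; the rule "400 or above means broken" is the same one used later in this post.

```java
// Hypothetical helper for illustration only; not part of Selenium or the JDK.
final class HttpStatusGuide {

    /** Returns a short label for a few common HTTP status codes. */
    static String describe(int code) {
        switch (code) {
            case 200: return "OK";
            case 400: return "Bad Request";
            case 401: return "Unauthorized";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            case 408: return "Request Timeout";
            case 500: return "Internal Server Error";
            default:  return "Other";
        }
    }

    /** Treats any 4xx or 5xx status as a broken link. */
    static boolean isBroken(int code) {
        return code >= 400;
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 403, 404, 500}) {
            String verdict = isBroken(code) ? "broken" : "working";
            System.out.println(code + " " + describe(code) + " -> " + verdict);
        }
    }
}
```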

2. Why Does a Link Break?

A link might stop working for many reasons, such as:
  • The server hosting the URL is down.
  • Human error: a wrong URL was inserted into the HTML code by mistake.

3. How to write Selenium Code to find Broken links

Selenium WebDriver can find all the links present on a web page, and with a little Java code we can check whether each one works. Checking every link on a website manually would be tedious. Let's take a look at the logic we use in our Selenium code:
  1. Get all the links present on the web page using the HTML anchor tag <a>, which is used for creating a link on the webpage.
  2. Store all the links inside a list.
  3. Send an HTTP request to each link and verify the response received.
  4. If the response code is 200, the link is working; if it is 400 or higher, the link is broken.
Let's divide our solution into two parts:
  1. Write Selenium code to get all the links on a web page
  2. Write code to verify whether those links are working
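
Before bringing in a browser, steps 2 to 4 of the logic above can be sketched in plain Java. This is only an outline under stated assumptions: the class BrokenLinkFilter and the fake status lookup are invented for the example, and the example.com URLs are placeholders. In the real program the lookup would send an HTTP request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical outline of "store links, look up status, flag broken ones".
final class BrokenLinkFilter {

    /** Returns the URLs whose status code, as reported by statusLookup, is 400 or higher. */
    static List<String> findBroken(List<String> urls, ToIntFunction<String> statusLookup) {
        List<String> broken = new ArrayList<>();
        for (String url : urls) {
            int status = statusLookup.applyAsInt(url);
            if (status >= 400) {
                broken.add(url);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "https://example.com/",
                "https://example.com/missing");

        // Stand-in for a real HTTP request: pretend "/missing" returns 404.
        ToIntFunction<String> fakeLookup = url -> url.endsWith("/missing") ? 404 : 200;

        System.out.println(findBroken(urls, fakeLookup)); // prints [https://example.com/missing]
    }
}
```

Keeping the status lookup as a parameter makes the filtering logic easy to try out without a network connection.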

3.1. Selenium code to Get All Links from a Web page

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetAllLinks {

 public static void main(String[] args) {

  // Initialize WebDriver object (update the path for your system)
  System.setProperty("webdriver.chrome.driver", "D:\\mydir\\chromedriver.exe");
  WebDriver driver = new ChromeDriver();

  driver.get("https://phptravels.com/demo/");

  // Store all link elements (anchor tag elements in html) in a list
  List<WebElement> links = driver.findElements(By.tagName("a"));

  // Print no. of links stored in list
  System.out.println(links.size());

  // List indexes start at 0, so iterate from 0 to size() - 1
  for (int i = 0; i < links.size(); i++) {
   // Print text of each link
   System.out.println(links.get(i).getText());
  }

  // Close the browser
  driver.quit();
 }

}

Code Explanation:

1. Open the webpage.
2. Create a list of type WebElement and store all elements with the tag name 'a' in it using findElements().
3. Iterate over all the links, from index 0 up to (but not including) the list size.
4. Get the text of each link using getText() and print it.

Now that you have all the links in the list, you can perform different operations and checks on them.
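
One check worth doing at this point is cleaning up the href values before testing them. The sketch below assumes the href strings have already been read from the link elements (for example with getAttribute("href")). The class HrefFilter is hypothetical. It drops missing values, keeps only http and https URLs, and removes duplicates so each URL is tested once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: filters raw href values down to URLs worth checking.
final class HrefFilter {

    /** Keeps unique http(s) URLs in their original order; skips null, mailto:, javascript:, etc. */
    static List<String> checkableLinks(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // anchor without an href attribute
            }
            String trimmed = href.trim();
            if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
                unique.add(trimmed);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "https://example.com/a",
                null,
                "mailto:someone@example.com",
                "javascript:void(0)",
                "https://example.com/a");

        System.out.println(checkableLinks(hrefs)); // prints [https://example.com/a]
    }
}
```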

3.2. Write code to Find Broken links on a Webpage

Now that we have the collection of all the links, the next task is to check whether those links are broken. For this purpose we use Java's HttpURLConnection class, which is part of the java.net package.
To check whether a URL is working, we open an HTTP connection to it using HttpURLConnection and read the response code, much as we would when calling a REST API. If the response code is 200, the URL is working fine; if it is 400 or greater, the URL is broken.

About HttpURLConnection

As the name suggests, it is a URLConnection with support for HTTP-specific features.
Each HttpURLConnection instance is used to make a single request, but the underlying network connection to the HTTP server may be transparently shared by other instances.
Calling the close() methods on the InputStream or OutputStream of an HttpURLConnection after a request may free network resources associated with this instance, but has no effect on any shared persistent connection.
Calling the disconnect() method may close the underlying socket if a persistent connection is otherwise idle at that time.
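
To make the one-request-per-instance pattern concrete, here is a minimal sketch of a status lookup built on HttpURLConnection. It is a variation, not the method used in the program below: it sends a HEAD request so no response body is downloaded, and it returns -1 instead of throwing when the URL is invalid or unreachable. The class name HeadRequestChecker, the -1 convention and the 5-second timeouts are assumptions made for this example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: one HttpURLConnection per request, always disconnected afterwards.
final class HeadRequestChecker {

    /** Returns the HTTP status code for the link, or -1 if it is not a reachable HTTP(S) URL. */
    static int statusOf(String link) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) new URL(link).openConnection();
            connection.setRequestMethod("HEAD"); // headers only, no body
            connection.setConnectTimeout(5000);  // milliseconds to establish the connection
            connection.setReadTimeout(5000);     // milliseconds to wait for the response
            return connection.getResponseCode();
        } catch (IOException | ClassCastException e) {
            // Malformed URL, timeout, unreachable host, or a non-HTTP scheme such as mailto:
            return -1;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; pass your own as the first argument.
        String link = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(link + " -> " + statusOf(link));
    }
}
```

Some servers do not support HEAD and answer with 405, so a fallback to GET may be needed in practice.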
Now it's time to add the verifyLinks() method to our previous code to complete the program.

Selenium code to find broken links on a webpage:

package com.techlistic.testscripts;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;


public class BrokenLinksTest {

public static void main(String[] args) throws IOException {
	// Update your system's path, where Chromedriver.exe is present
	System.setProperty("webdriver.chrome.driver", "D:\\mydir\\chromedriver.exe");

	// Initialize Webdriver Object
	WebDriver driver = new ChromeDriver();
	driver.get("https://phptravels.com/demo/");

	// Store all link elements (anchor tag elements in html) in a list
	List<WebElement> links = driver.findElements(By.tagName("a"));

	// Print no. of links stored in list
	System.out.println(links.size());

	// List indexes start at 0, so iterate from 0 to size() - 1
	for (int i = 0; i < links.size(); i++) {
		WebElement elem = links.get(i);

		// Print text of the link
		System.out.println(elem.getText());

		// Get href attribute
		String linkUrl = elem.getAttribute("href");

		// Skip anchors without an http(s) URL (no href, mailto:, javascript:)
		if (linkUrl == null || !linkUrl.startsWith("http")) {
			continue;
		}

		// Call Verify Links method
		verifyLinks(linkUrl);
	}

	// Close WebDriver
	driver.quit();
}

public static void verifyLinks(String websiteLink) throws IOException {
	// Create URL object and pass website link 
	URL url = new URL(websiteLink);

	// Create URL connection and Get the response code
	HttpURLConnection httpURLConnect=(HttpURLConnection)url.openConnection();
	httpURLConnect.setConnectTimeout(5000);
	httpURLConnect.connect();

	// Verify response code: 400 or above means the link is broken
	if(httpURLConnect.getResponseCode() >= 400){
		System.out.println(websiteLink+" - "
				+httpURLConnect.getResponseMessage()+" is a broken link");
	}
	// Otherwise print the response message (for example, "OK")
	else{
		System.out.println(websiteLink+" - "+httpURLConnect.getResponseMessage());
	}

	// Disconnect URL Connection
	httpURLConnect.disconnect();
	}

}

Code Explanation:

We have already explained the code for getting links from the webpage, so here we'll explain the verifyLinks() method.
  • The verifyLinks() method receives a parameter, websiteLink, which is the URL to be tested.
  • We create an object of the URL class and pass the websiteLink parameter to it.
  • We then call openConnection() on the URL object and cast the result to HttpURLConnection, which opens a connection to the URL over HTTP.
  • We set a connect timeout of 5000 milliseconds, so that if the connection cannot be established within that time, a timeout exception is thrown.
  • Finally, we read the response code using getResponseCode(). If it is 400 or greater, we report the link as broken; otherwise we print the response message.

Conclusion

Links are one of the important components of a website, and broken links leave a poor impression on users. Link testing is therefore an important part of the test plan, but performing it manually is tedious and time-consuming. It's better to automate it using Selenium and HttpURLConnection.


Author
Passionately working as an Automation Developer for more than a decade.
