Advanced Selenium Automation: Real-World Experiments with Sikuli, Appium, Python & Linux

Selenium is often introduced as a simple browser automation tool. Click here, assert there, move on. But once you start working on real-world enterprise applications, Selenium reveals its true personality: flexible, extensible, and surprisingly powerful when combined with the right tools.

Over the years, I’ve used Selenium not just for UI automation, but as the central orchestrator in complex automation frameworks spanning image-based testing, OS-level dialogs, backend systems, performance testing, and even mobile OTP workflows.

In this post, I’ll walk you through some of my hands-on Selenium experiments, explaining the why and the how, and sharing code snippets you can adapt for your own projects.


1. Automating a Map-Based Web Application Using Selenium + Sikuli

The Problem

Map-based applications (Google Maps–like UIs) are notoriously difficult to automate. Many controls are:

  • Canvas-rendered

  • Dynamic

  • Not accessible via DOM locators (XPath, CSS)

The Solution

I integrated Sikuli with Selenium WebDriver (Java).

Sikuli works through image recognition, which makes it a good fit when DOM-level automation fails.
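Conceptually, image recognition of this kind is template matching. A reference image, such as a screenshot of a map marker, is slid across a screenshot, and each position is scored by how closely it matches. The pure-Python sketch below shows the idea on tiny grayscale grids using a sum-of-absolute-differences score. `match_template` and its `min_similarity` threshold are illustrative names, not Sikuli’s API. SikuliX itself relies on OpenCV for matching and applies a configurable minimum similarity, which is why a slightly re-rendered marker can still be found.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels (0-255), row-major


def match_template(screen: Image, template: Image,
                   min_similarity: float = 0.9) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, similarity) of the best match, or None if below the threshold."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    max_diff = 255 * th * tw  # worst possible sum of absolute differences
    best = None
    # Slide the template over every position where it fits entirely on the screen
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            diff = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(th) for j in range(tw)
            )
            similarity = 1.0 - diff / max_diff  # 1.0 means a pixel-perfect match
            if best is None or similarity > best[2]:
                best = (r, c, similarity)
    if best is None or best[2] < min_similarity:
        return None
    return best
```

This brute-force loop is only practical on tiny inputs. Its purpose is to show the scoring idea and why a similarity threshold matters.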

Architecture Overview

  • Selenium handles browser navigation and page-level flows

  • Sikuli handles map clicks, drag-drop, and visual validations


Visual Automation Layer (Sikuli Integration)

This layer is invoked when:

  • UI elements are not accessible via DOM

  • Canvas-based components are used

  • Map-based or image-driven interfaces exist

How It Fits

Flow:

  1. Selenium navigates to the relevant page

  2. Control is handed over to Sikuli

  3. Sikuli performs image-based actions (click, hover, drag)

  4. Control returns to Selenium
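The handoff can be sketched in Python as one test step with a clear boundary between the two layers. `Browser` is a minimal protocol covering the two Selenium WebDriver methods used here (`get` and `save_screenshot`). Each visual action is a plain callable that stands in for an image-based call such as a Sikuli click or drag. `run_visual_flow` and the screenshot file name are hypothetical. If an image is not found, Selenium captures a screenshot before the error propagates, which makes visual failures easier to diagnose.

```python
from typing import Callable, List, Protocol


class Browser(Protocol):
    """The two Selenium WebDriver methods this sketch relies on."""

    def get(self, url: str) -> None: ...

    def save_screenshot(self, filename: str) -> bool: ...


def run_visual_flow(browser: Browser, url: str,
                    visual_actions: List[Callable[[], None]],
                    screenshot_file: str = "visual_failure.png") -> None:
    """Selenium navigates, image-based actions run, then control returns to the caller."""
    # Step 1: Selenium navigates to the relevant page
    browser.get(url)
    try:
        # Steps 2-3: hand control to the image-based layer (click, hover, drag)
        for action in visual_actions:
            action()
    except Exception:
        # A visual step failed: capture the browser state before re-raising
        browser.save_screenshot(screenshot_file)
        raise
    # Step 4: returning here hands control back to the Selenium test code
```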

Architectural Benefit

  • Zero dependency on fragile locators

  • Ideal for maps, charts, video players

  • Keeps Selenium scripts clean and readable


Sample Code (Java + Sikuli)

```java
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.sikuli.script.Pattern;
import org.sikuli.script.Screen;

public class MapAutomation {

    public static void main(String[] args) throws Exception {
        // Set ChromeDriver path (if not in PATH)
        // System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");

        WebDriver driver = new ChromeDriver();
        try {
            // Sikuli searches the physical screen, so the browser must be visible
            // (not headless) and large enough to show the map
            driver.manage().window().maximize();
            driver.get("https://example-map-app.com");

            Screen screen = new Screen();
            Pattern mapIcon = new Pattern("map_marker.png");

            // Wait up to 10 seconds for the marker image to appear, then click it
            screen.wait(mapIcon, 10);
            screen.click(mapIcon);
        } finally {
            // Close the browser even if the image is not found (FindFailed)
            driver.quit();
        }
    }
}
```

Why This Worked

  • Selenium alone could not “see” the map elements

  • Sikuli allowed pixel-level interaction

  • The combination delivered stability without hacks


2. Handling Windows Dialogs Using Selenium + WinIT

The Problem

Browser-native dialogs like:

  • File upload popups

  • Authentication dialogs

  • OS-level alerts

are outside Selenium’s control.

The Solution

I integrated WinIT with Selenium to interact with Windows dialogs directly.


OS-Level Interaction Layer (WinIT Integration)

Selenium cannot interact with:

  • Windows file upload dialogs

  • Authentication popups

  • Native OS alerts

Architecture Flow

  1. Selenium triggers an action that opens an OS dialog

  2. WinIT attaches to the dialog window

  3. Performs required actions (set text, click buttons)

  4. Returns execution to Selenium
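Step 2 has a timing problem. The native dialog appears a moment after Selenium’s click, so the OS-level tool has to wait for it. A minimal Python sketch of that wait is shown below. `list_titles` is a hypothetical callable that returns the titles of open top-level windows, for example a thin wrapper around whatever window enumeration your OS automation tool offers. `clock` and `sleep` are injectable so the helper can be tested without real delays.

```python
import time
from typing import Callable, Iterable


def wait_for_window(list_titles: Callable[[], Iterable[str]], title: str,
                    timeout: float = 10.0, poll_interval: float = 0.5,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until a top-level window titled exactly `title` exists, or time out."""
    deadline = clock() + timeout
    while True:
        # Re-enumerate windows on every attempt; the dialog may open at any moment
        if title in list(list_titles()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

Returning a boolean instead of raising lets the caller decide whether a missing dialog should fail the test or trigger a retry of the Selenium click.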

Design Principle

This layer ensures:

  • Browser automation does not break due to OS limitations

  • No manual intervention in regression suites

  • End-to-end flows remain fully automated


Sample Code (Java + WinIT)

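```java
import com.lazerycode.selenium.util.Query;
import com.lazerycode.selenium.util.SeleniumUtils;

public class UploadDialogHandler {

    // Call this right after Selenium clicks the control that opens the native dialog.
    // The Query/SeleniumUtils calls follow this article's setup; verify the class and
    // method names against the library version you use.
    public static void uploadFile(String filePath) {
        // Identify the native "Open" dialog by its window title and field label
        Query uploadDialog = new Query.Builder()
                .usingTitle("Open")
                .usingText("File name:")
                .build();

        // Type the path into the "File name:" field, then confirm the dialog
        SeleniumUtils.waitForWindowAndSetText(uploadDialog, filePath);
        SeleniumUtils.waitForWindowAndClickButton(uploadDialog, "Open");
    }
}
// Example: UploadDialogHandler.uploadFile("C:\\test\\data.csv");
```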

Key Takeaway

Selenium does not need to be replaced for OS-level testing. It just needs the right companion tool.


3. Performance Testing Using Selenium WebDriver with Python

The Problem

Traditional performance tools don’t always reflect real user journeys.

The Solution

I used Selenium WebDriver with Python to measure:

  • Page load times

  • Feature-level response times

  • User-action latency

This helped catch performance regressions from a user’s perspective.
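For feature-level and user-action timings, a small context manager keeps measurements consistent across tests. The sketch below assumes a dictionary collects results under a label, and `timed` is a hypothetical helper name. It uses `time.perf_counter()`, a monotonic high-resolution clock. The measured block should end with an explicit wait for the visible result; otherwise it only times how long the click command took to return.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration (seconds) of the enclosed block under `label`."""
    start = time.perf_counter()  # monotonic, high-resolution timer
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed actions still report timings
        results[label] = time.perf_counter() - start
```

In a test, wrap the action and its wait together. For example, put `with timed("search", timings):` around a `click()` followed by `WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "results")))`, where `results` is a placeholder for your app’s own element ID.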


Sample Code (Python)

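```python
import time
from selenium import webdriver

driver = webdriver.Chrome()
try:
    start_time = time.perf_counter()  # monotonic timer, unaffected by clock changes

    # With the default "normal" page load strategy, get() returns once
    # document.readyState is "complete", so this approximates full page load
    # (plus a little WebDriver command overhead)
    driver.get("https://example-app.com/login")

    login_time = time.perf_counter() - start_time
    print(f"Login page load time: {login_time:.2f} seconds")
finally:
    driver.quit()
```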

Why This Approach Is Powerful

  • Combines functional + performance testing

  • Works well in CI pipelines

  • Highlights slow UX flows early
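To make these checks useful in a CI pipeline, it helps to read the browser’s own Navigation Timing data, which excludes WebDriver command overhead, and compare it against explicit budgets. In this sketch, the JavaScript calls the standard `performance.getEntriesByType('navigation')` API through Selenium’s `execute_script`. The metric names, `navigation_metrics`, `budget_violations`, and any budget values are assumptions for illustration. `loadEventEnd` stays 0 until the load event has finished, so read the entry only after the page has fully loaded.

```python
from typing import Any, Dict, List

# Reads the browser's Navigation Timing entry for the current page
NAV_TIMING_JS = "return performance.getEntriesByType('navigation')[0].toJSON();"


def navigation_metrics(entry: Dict[str, Any]) -> Dict[str, float]:
    """Derive timings in milliseconds from a PerformanceNavigationTiming entry."""
    start = entry["startTime"]  # 0 for the navigation entry
    return {
        # Time from sending the request to receiving the first response byte
        "server_response_ms": entry["responseStart"] - entry["requestStart"],
        "dom_content_loaded_ms": entry["domContentLoadedEventEnd"] - start,
        "load_ms": entry["loadEventEnd"] - start,
    }


def collect_navigation_metrics(driver: Any) -> Dict[str, float]:
    """Run the script in a Selenium session that has finished loading a page."""
    return navigation_metrics(driver.execute_script(NAV_TIMING_JS))


def budget_violations(metrics: Dict[str, float],
                      budgets: Dict[str, float]) -> List[str]:
    """List every metric that exceeds its budget; an empty list means the page passed."""
    return [
        f"{name}: {metrics[name]:.0f} ms > budget {limit:.0f} ms"
        for name, limit in sorted(budgets.items())
        if name in metrics and metrics[name] > limit
    ]
```

A CI check could then be `assert not budget_violations(collect_navigation_metrics(driver), {"load_ms": 3000})`, where the 3,000 ms budget is whatever threshold your team agrees on.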


4. Frontend + Backend Automation Using Selenium + Paramiko

The Problem

Some test scenarios required:

  • UI actions on the frontend, followed by backend validations on AIX/Linux servers

  • Backend actions on those servers, followed by validations in the UI

The Solution

I integrated Paramiko, a Python SSH library, with Selenium.


Backend Automation Layer (Paramiko via SSH)

This layer enables frontend + backend automation in a single test case.

Typical Use Cases

  • Log validation after UI actions

  • Database or file system checks

  • Triggering backend jobs

  • AIX/Linux command execution

Architectural Flow

  1. Selenium performs UI action (e.g., submit form)

  2. Paramiko connects to backend server via SSH

  3. Executes commands and fetches output

  4. Results are validated within the same test execution
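Step 4 is where the backend checks pay off, and it has two traps. Values copied from the UI must be quoted before they reach a remote shell, and a missing log line should fail just as loudly as an error line. The sketch below makes several assumptions:

  • The application writes a plain-text log at a hypothetical path, `/var/log/app/app.log`.

  • Each relevant log line contains the transaction ID shown in the UI.

  • Errors are marked with an `ERROR` level.

`build_log_query` and `validate_log_lines` are illustrative helpers. `shlex.quote` is Python’s standard-library function for shell-escaping a string.

```python
import shlex
from typing import List


def build_log_query(transaction_id: str, log_path: str = "/var/log/app/app.log") -> str:
    """Build a shell command that greps the log for one UI-generated ID."""
    # shlex.quote stops UI-derived values from being interpreted by the remote shell;
    # -F matches the ID literally and -- ends grep's option parsing
    return f"grep -F -- {shlex.quote(transaction_id)} {shlex.quote(log_path)}"


def validate_log_lines(output: str, transaction_id: str) -> List[str]:
    """Return problems found in the grep output; an empty list means the backend looks healthy."""
    lines = [line for line in output.splitlines() if transaction_id in line]
    if not lines:
        return [f"no log entries for {transaction_id}"]
    return [line for line in lines if "ERROR" in line]
```

With an open Paramiko client, the pieces connect as `_, stdout, _ = ssh.exec_command(build_log_query(order_id))` followed by `assert not validate_log_lines(stdout.read().decode(), order_id)`, where `order_id` is the ID read from the UI.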

Why This Is Powerful

  • Eliminates manual backend verification

  • Enables true end-to-end testing

  • Speeds up root-cause analysis


Sample Code (Python + Paramiko)

```python
import paramiko
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Initialize the browser
driver = webdriver.Chrome()
try:
    driver.get("https://example-app.com")

    # Trigger a backend action via SSH
    with paramiko.SSHClient() as ssh:
        # AutoAddPolicy trusts unknown host keys: acceptable on a disposable test
        # server, but prefer ssh.load_system_host_keys() for shared environments
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect("backend.server.com", username="user", password="pass")

        # exec_command() returns immediately, so block on the exit status to make
        # sure the restart has finished (the SSH user needs permission to restart)
        _, stdout, stderr = ssh.exec_command("systemctl restart app-service")
        if stdout.channel.recv_exit_status() != 0:
            raise RuntimeError("Restart failed: " + stderr.read().decode())

        # Check the service state after the restart
        _, stdout, _ = ssh.exec_command("systemctl status app-service")
        print("Backend status:", stdout.read().decode())

    # Validate the effect in the UI (until() returns the visible element)
    status_element = WebDriverWait(driver, 30).until(
        EC.visibility_of_element_located((By.ID, "status-indicator"))
    )
    print(f"UI Status: {status_element.text}")
finally:
    driver.quit()
```


5. Automating OTP Flows Using Selenium + Appium

The Problem

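OTP-based authentication breaks traditional automation because the one-time code is delivered outside the browser (typically by SMS) and changes on every login.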

The Solution

I combined Selenium (Web) with Appium (Mobile).


Mobile Automation Layer (Appium Integration for OTP)

OTP-based authentication traditionally blocks automation.

Architecture Flow

  1. Selenium triggers OTP generation on the web app

  2. Appium connects to the mobile device or emulator

  3. Reads OTP from SMS or notification

  4. Selenium enters OTP into the web UI

  5. Authentication completes automatically
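Step 3 is the fragile part. Reading “the latest SMS” can pick up a code from an earlier run that is still on the device. A safer rule is to accept only messages received after the OTP was requested. The Python sketch below applies that rule and assumes a 6-digit code. `read_messages` is a hypothetical callable that returns `(received_at, body)` pairs, for example a wrapper that reads SMS text and timestamps through Appium. `extract_fresh_otp` and `wait_for_otp` are illustrative names.

```python
import re
import time
from typing import Callable, Iterable, Optional, Tuple

# A 6-digit code surrounded by word boundaries
OTP_PATTERN = re.compile(r"\b\d{6}\b")

Message = Tuple[float, str]  # (received_at in epoch seconds, SMS body)


def extract_fresh_otp(messages: Iterable[Message], requested_at: float) -> Optional[str]:
    """Return the OTP from the newest message received after the OTP was requested.

    requested_at must come from the same clock as the message timestamps.
    """
    fresh = [m for m in messages if m[0] >= requested_at]
    for _, body in sorted(fresh, key=lambda m: m[0], reverse=True):
        match = OTP_PATTERN.search(body)
        if match:
            return match.group()
    return None


def wait_for_otp(read_messages: Callable[[], Iterable[Message]], requested_at: float,
                 attempts: int = 5, poll_interval: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the device's messages until a fresh OTP shows up."""
    for attempt in range(attempts):
        otp = extract_fresh_otp(read_messages(), requested_at)
        if otp is not None:
            return otp
        if attempt < attempts - 1:  # no point sleeping after the final attempt
            sleep(poll_interval)
    raise TimeoutError(f"No OTP received after {attempts} attempts")
```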

Key Advantage

  • Web and mobile automation run as one logical transaction

  • Enables full regression of secure flows

  • Extremely useful for fintech, banking, and telecom apps


Sample Code (Java + Appium)

```java
import io.appium.java_client.AppiumDriver;
import io.appium.java_client.android.AndroidDriver;
import io.appium.java_client.android.options.UiAutomator2Options;
import org.openqa.selenium.By;
import org.openqa.selenium.TimeoutException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.net.URL;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OTPAutomation {

    private static WebDriver webDriver;
    private static AppiumDriver mobileDriver;
    private static final int MAX_RETRIES = 5;
    private static final int POLL_INTERVAL_SECONDS = 2;
    // 6-digit OTP surrounded by word boundaries
    private static final Pattern OTP_PATTERN = Pattern.compile("\\b\\d{6}\\b");

    public static void main(String[] args) throws Exception {
        try {
            // Step 1: Initialize Selenium WebDriver for the web application
            setupWebDriver();

            // Step 2: Initialize Appium for the mobile device
            setupMobileDriver();

            // Step 3: Navigate to the web application and trigger the OTP
            triggerOTPGeneration();

            // Step 4: Fetch the OTP from the mobile SMS app
            String otp = fetchOTPFromMobile();

            // Step 5: Enter the OTP in the web application
            enterOTPInWebApp(otp);

            // Step 6: Verify successful authentication
            verifyAuthentication();

        } finally {
            // Cleanup
            if (webDriver != null) webDriver.quit();
            if (mobileDriver != null) mobileDriver.quit();
        }
    }

    private static void setupWebDriver() {
        // Set ChromeDriver path (if not in PATH)
        // System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        webDriver = new ChromeDriver();
        webDriver.manage().window().maximize();
        // No implicit wait: the explicit WebDriverWaits below handle synchronization,
        // and mixing implicit and explicit waits can produce unpredictable timeouts
    }

    private static void setupMobileDriver() throws Exception {
        // Appium 2 server URL (Appium 1.x servers used the /wd/hub base path)
        URL appiumServerUrl = new URL("http://127.0.0.1:4723/");

        // Capabilities for the Google Messages app on an Android emulator;
        // UiAutomator2Options already sets platformName=Android and automationName=UiAutomator2
        UiAutomator2Options options = new UiAutomator2Options()
                .setDeviceName("Android Emulator")
                .setAppPackage("com.google.android.apps.messaging")
                .setAppActivity("com.google.android.apps.messaging.ui.ConversationListActivity")
                .setNoReset(true);

        mobileDriver = new AndroidDriver(appiumServerUrl, options);
    }

    private static void triggerOTPGeneration() {
        // Navigate to the login page
        webDriver.get("https://example-app.com/login");

        // Enter the phone number
        webDriver.findElement(By.id("phoneNumber")).sendKeys("+1234567890");

        // Click the "Send OTP" button
        webDriver.findElement(By.id("sendOtpButton")).click();

        // Wait for the "OTP sent" confirmation
        WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));
        wait.until(ExpectedConditions.visibilityOfElementLocated(
            By.xpath("//div[contains(text(),'OTP sent successfully')]")
        ));

        System.out.println("OTP generation triggered successfully");
    }

    private static String fetchOTPFromMobile() throws InterruptedException {
        String otp = null;
        int retryCount = 0;

        // Wait for the SMS to arrive and extract the OTP
        while (retryCount < MAX_RETRIES && otp == null) {
            System.out.println("Checking for OTP SMS (Attempt " + (retryCount + 1) + ")");

            try {
                // Open the conversation list; the resource IDs used here depend on the
                // SMS app and its version, so confirm them with Appium Inspector
                mobileDriver.findElement(By.id("conversation_list")).click();

                // Wait for messages to load
                Thread.sleep(2000);

                // Read the SMS body (the first matching element on screen)
                String smsText = mobileDriver.findElement(By.id("message_text")).getText();

                // Extract the OTP using the 6-digit regex pattern
                Matcher matcher = OTP_PATTERN.matcher(smsText);
                if (matcher.find()) {
                    otp = matcher.group();
                    System.out.println("OTP found: " + otp);
                }

            } catch (Exception e) {
                System.out.println("Error reading SMS: " + e.getMessage());
            }

            if (otp == null) {
                Thread.sleep(POLL_INTERVAL_SECONDS * 1000L);
                retryCount++;

                // Go back so the next attempt reopens a refreshed view
                mobileDriver.navigate().back();
            }
        }

        if (otp == null) {
            throw new RuntimeException("Failed to retrieve OTP after " + MAX_RETRIES + " attempts");
        }

        return otp;
    }

    private static void enterOTPInWebApp(String otp) {
        // Wait for the OTP input field to be visible (until() returns the element)
        WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));
        WebElement otpInput = wait.until(
            ExpectedConditions.visibilityOfElementLocated(By.id("otpInput"))
        );

        // Enter the full OTP into a single input field
        otpInput.clear();
        otpInput.sendKeys(otp);

        System.out.println("OTP entered successfully: " + otp);

        // Click the verify button
        webDriver.findElement(By.id("verifyOtpButton")).click();
    }

    private static void verifyAuthentication() {
        // Wait for successful authentication
        WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));

        try {
            // Check for a successful login indicator
            WebElement successElement = wait.until(
                ExpectedConditions.visibilityOfElementLocated(
                    By.xpath("//div[contains(text(),'Login Successful') or contains(text(),'Welcome')]")
                )
            );

            System.out.println("Authentication successful! Message: " + successElement.getText());

        } catch (TimeoutException e) {
            // Surface the app's error message (if any) and fail the run instead of only logging
            String error = webDriver.findElements(By.xpath("//div[contains(@class,'error')]"))
                    .stream()
                    .findFirst()
                    .map(WebElement::getText)
                    .orElse("no error message shown");
            throw new AssertionError("Authentication failed: " + error, e);
        }
    }
}
```


Final Thoughts: Selenium Is a Platform, Not Just a Tool 

What I learned through these experiments is simple but powerful:

Selenium is limited less by what it can do than by how creatively you use it.

By integrating Selenium with tools like Sikuli, WinIT, Python, Paramiko, and Appium, you can:

  • Solve non-standard automation problems

  • Build resilient enterprise frameworks

  • Cover UI, OS, backend, mobile, and performance testing together

If you’re still using Selenium only for basic UI checks, you’re barely scratching the surface.

--
Author
Vaneesh Behl
Quality Engineering Leader | Automation Enthusiast
