Advanced Selenium Automation: Real-World Experiments with Sikuli, Appium, Python & Linux
Selenium is often introduced as a simple browser automation tool. Click here, assert there, move on. But once you start working on real-world enterprise applications, Selenium reveals its true personality: flexible, extensible, and surprisingly powerful when combined with the right tools.
Over the years, I’ve used Selenium not just for UI automation, but as the central orchestrator in complex automation frameworks spanning image-based testing, OS-level dialogs, backend systems, performance testing, and even mobile OTP workflows.
In this post, I’ll walk you through some of my hands-on Selenium experiments, explaining the why and the how, and sharing code snippets you can adapt to your own projects.
1. Automating a Map-Based Web Application Using Selenium + Sikuli
The Problem
Map-based applications (Google Maps–like UIs) are notoriously difficult to automate. Many controls are:
- Canvas-rendered
- Dynamic
- Not accessible via DOM locators (XPath, CSS)
The Solution
I integrated Sikuli with Selenium WebDriver (Java).
Sikuli locates its targets by image recognition on the screen, which makes it a good fit when DOM-level automation fails.
Architecture Overview
- Selenium handles browser navigation and page-level flows
- Sikuli handles map clicks, drag-drop, and visual validations
Visual Automation Layer (Sikuli Integration)
This layer is invoked when:
- UI elements are not accessible via DOM
- Canvas-based components are used
- Map-based or image-driven interfaces exist
How It Fits
Flow:
- Selenium navigates to the relevant page
- Control is handed over to Sikuli
- Sikuli performs image-based actions (click, hover, drag)
- Control returns to Selenium
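One way to picture this hand-off is a small router that sends each test step to the layer that can execute it. The sketch below is only an illustration in Python: Step, LayerRouter, and the "dom" and "visual" layer labels are hypothetical names, and in a real framework the registered handlers would wrap Selenium and Sikuli calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One test step: a description plus the layer that should execute it."""
    name: str
    layer: str  # hypothetical labels: "dom" for Selenium, "visual" for Sikuli


class LayerRouter:
    """Dispatches each step to the handler registered for its layer."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Step], None]] = {}

    def register(self, layer: str, handler: Callable[[Step], None]) -> None:
        self._handlers[layer] = handler

    def run(self, steps: List[Step]) -> List[Tuple[str, str]]:
        """Execute steps in order and return (layer, name) for each one run."""
        executed = []
        for step in steps:
            handler = self._handlers.get(step.layer)
            if handler is None:
                raise ValueError(f"No handler registered for layer {step.layer!r}")
            handler(step)  # control passes to that layer, then comes back here
            executed.append((step.layer, step.name))
        return executed
```

A test would register one handler per layer and then call run() with an ordered list such as "open map page" (dom), "click marker" (visual), "assert details panel" (dom), so the switch between the two tools stays in one place.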
Architectural Benefit
- Zero dependency on fragile locators
- Ideal for maps, charts, video players
- Keeps Selenium scripts clean and readable
Sample Code (Java + Sikuli)
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.sikuli.script.Pattern;
    import org.sikuli.script.Screen;

    public class MapAutomation {
        public static void main(String[] args) throws Exception {
            // Set ChromeDriver path (if not in PATH)
            // System.setProperty("webdriver.chrome.driver", "path/to/chromedriver.exe");
            WebDriver driver = new ChromeDriver();
            try {
                driver.get("https://example-map-app.com");

                Screen screen = new Screen();
                Pattern mapIcon = new Pattern("map_marker.png");

                // Wait up to 10 seconds for the marker image to appear, then click it
                screen.wait(mapIcon, 10);
                screen.click(mapIcon);
            } finally {
                // Close the browser even if the image is never found
                driver.quit();
            }
        }
    }
Why This Worked
- Selenium alone could not “see” the map elements
- Sikuli allowed pixel-level interaction
- The combination delivered stability without hacks
2. Handling Windows Dialogs Using Selenium + WinIT
The Problem
Native dialogs raised by the browser or the operating system are outside Selenium’s control because they are not part of the page’s DOM. Typical examples:
- File upload popups
- Authentication dialogs
- OS-level alerts
The Solution
I integrated WinIT with Selenium to interact with Windows dialogs directly.
OS-Level Interaction Layer (WinIT Integration)
Selenium cannot interact with:
- Windows file upload dialogs
- Authentication popups
- Native OS alerts
Architecture Flow
- Selenium triggers an action that opens an OS dialog
- WinIT attaches to the dialog window
- Performs required actions (set text, click buttons)
- Returns execution to Selenium
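The "attach, act, return" sequence above usually comes down to polling until the dialog exists and only then driving it. Here is a minimal Python sketch of that idea, not tied to any particular dialog tool: wait_for and upload_via_dialog are hypothetical helpers, find_dialog stands in for whatever call your OS-automation library offers to locate a window, and set_text() and click_button() are assumed methods on the handle it returns.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], Optional[T]], timeout: float = 10.0,
             interval: float = 0.25) -> T:
    """Call probe() until it returns something other than None or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Nothing found within {timeout} seconds")
        time.sleep(interval)


def upload_via_dialog(find_dialog, file_path: str, timeout: float = 10.0) -> None:
    """Wait for the OS file dialog, type the path, and confirm.

    find_dialog is a hypothetical callable returning a dialog handle or None;
    set_text() and click_button() are assumed methods on that handle.
    """
    dialog = wait_for(find_dialog, timeout=timeout)
    dialog.set_text(file_path)
    dialog.click_button("Open")
    # Returning here hands execution back to the Selenium part of the test
```

Raising TimeoutError when the dialog never appears keeps a regression run from hanging on a window that no one is there to close.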
Design Principle
This layer ensures:
- Browser automation does not break due to OS limitations
- No manual intervention in regression suites
- End-to-end flows remain fully automated
Sample Code (Java + WinIT)
    import com.lazerycode.selenium.util.Query;
    import com.lazerycode.selenium.util.SeleniumUtils;

    // Describe the dialog to attach to: window title "Open", containing the "File name:" label
    Query uploadDialog = new Query.Builder()
            .usingTitle("Open")
            .usingText("File name:")
            .build();

    // Set file path in dialog
    SeleniumUtils.waitForWindowAndSetText(uploadDialog, "C:\\test\\data.csv");

    // Confirm the dialog by clicking its "Open" button
    SeleniumUtils.waitForWindowAndClickButton(uploadDialog, "Open");
Key Takeaway
Selenium does not need to be replaced for OS-level testing. It just needs the right companion tool.
3. Performance Testing Using Selenium WebDriver with Python
The Problem
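Traditional performance tools often measure at the request level, so they don’t always reflect what a real user experiences while moving through the application in a browser.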
The Solution
I used Selenium WebDriver with Python to measure:
- Page load times
- Feature-level response times
- User-action latency
This helped catch performance regressions from a user’s perspective.
Sample Code (Python)
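    import time
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        # With the default page load strategy, driver.get() returns once the page's load event has fired
        start_time = time.perf_counter()
        driver.get("https://example-app.com/login")
        login_time = time.perf_counter() - start_time
        print(f"Login page load time: {login_time:.2f} seconds")
    finally:
        driver.quit()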
Why This Approach Is Powerful
- Combines functional + performance testing
- Works well in CI pipelines
- Highlights slow UX flows early
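To make timings like these usable in a CI pipeline, one option is to record several samples per action and compare the median against a budget. The following is a rough sketch under that assumption: LatencyRecorder, its method names, and the budget values are all hypothetical, and the block being timed would contain your own Selenium calls.

```python
import statistics
import time
from contextlib import contextmanager


class LatencyRecorder:
    """Collects wall-clock timings per label and checks them against budgets."""

    def __init__(self):
        self.samples = {}  # label -> list of durations in seconds

    @contextmanager
    def measure(self, label):
        """Time the enclosed block and store the duration under label."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples.setdefault(label, []).append(time.perf_counter() - start)

    def median(self, label):
        return statistics.median(self.samples[label])

    def over_budget(self, budgets):
        """Return {label: median} for every label whose median exceeds its budget."""
        exceeded = {}
        for label, limit in budgets.items():
            if label not in self.samples:
                continue  # nothing was measured for this label
            median = self.median(label)
            if median > limit:
                exceeded[label] = median
        return exceeded


# Intended use inside a test (driver is your Selenium WebDriver instance):
#   recorder = LatencyRecorder()
#   with recorder.measure("login_page"):
#       driver.get("https://example-app.com/login")
#   assert not recorder.over_budget({"login_page": 3.0})
```

Using the median of a few runs rather than a single measurement makes the check less sensitive to one slow outlier on a busy CI agent.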
4. Frontend + Backend Automation Using Selenium + Paramiko
The Problem
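Some test scenarios required both:
- UI actions on the frontend
- Backend validations on AIX/Linux servers
In other scenarios the order was reversed: a backend action came first and the result was validated in the UI.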
The Solution
I integrated Python’s Paramiko module with Selenium.
Backend Automation Layer (Paramiko via SSH)
This layer enables frontend + backend automation in a single test case.
Typical Use Cases
- Log validation after UI actions
- Database or file system checks
- Triggering backend jobs
- AIX/Linux command execution
Architectural Flow
- Selenium performs UI action (e.g., submit form)
- Paramiko connects to backend server via SSH
- Executes commands and fetches output
- Results are validated within the same test execution
Why This Is Powerful
- Eliminates manual backend verification
- Enables true end-to-end testing
- Speeds up root-cause analysis
Sample Code (Python + Paramiko)
    import paramiko
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Initialize
    driver = webdriver.Chrome()
    try:
        driver.get("https://example-app.com")

        # Trigger backend command via SSH
        with paramiko.SSHClient() as ssh:
            # AutoAddPolicy trusts unknown host keys: fine for a lab, not for production
            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            ssh.connect("backend.server.com", username="user", password="pass")

            # Execute backend action and wait for the command to finish
            _, stdout, _ = ssh.exec_command("systemctl restart app-service")
            stdout.channel.recv_exit_status()

            # Verify the service state after the restart
            _, stdout, _ = ssh.exec_command("systemctl status app-service")
            print("Backend status:", stdout.read().decode())

        # Validate UI effect
        status_element = WebDriverWait(driver, 30).until(
            EC.visibility_of_element_located((By.ID, "status-indicator"))
        )
        print(f"UI Status: {status_element.text}")
    finally:
        driver.quit()
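The "log validation after UI actions" use case can be sketched with two small helpers that take an already connected Paramiko SSHClient. This is an illustration under assumptions: run_remote and log_contains are hypothetical names, the log path in the usage note is made up, and the remote host is assumed to have a tail command that accepts -n.

```python
import shlex


def run_remote(ssh, command):
    """Run a command over an open SSH connection and wait for it to finish.

    ssh is expected to behave like paramiko.SSHClient: exec_command() returns
    (stdin, stdout, stderr), and stdout.channel.recv_exit_status() blocks until
    the remote command exits.
    """
    _stdin, stdout, stderr = ssh.exec_command(command)
    # Read the output first so a large response cannot stall the channel
    out = stdout.read().decode()
    err = stderr.read().decode()
    status = stdout.channel.recv_exit_status()
    return status, out, err


def log_contains(ssh, log_path, needle, last_lines=200):
    """Return True if needle appears in the last last_lines lines of log_path."""
    command = f"tail -n {int(last_lines)} {shlex.quote(log_path)}"
    status, out, err = run_remote(ssh, command)
    if status != 0:
        raise RuntimeError(f"{command!r} failed with status {status}: {err.strip()}")
    return any(needle in line for line in out.splitlines())
```

After a Selenium step submits a form, a test could then assert something like log_contains(ssh, "/var/log/app/app.log", "order 42 accepted"), which keeps the UI action and the backend check inside the same test case.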
5. Automating OTP Flows Using Selenium + Appium
The Problem
OTP-based authentication breaks traditional web automation because the one-time code arrives on a separate device, outside the browser.
The Solution
I combined Selenium (Web) with Appium (Mobile).
Mobile Automation Layer (Appium Integration for OTP)
OTP-based authentication traditionally blocks automation.
Architecture Flow
- Selenium triggers OTP generation on the web app
- Appium connects to the mobile device or emulator
- Reads OTP from SMS or notification
- Selenium enters OTP into the web UI
- Authentication completes automatically
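The step that usually needs the most care in this flow is reading the OTP, because the SMS may arrive a few seconds late. Below is a minimal Python sketch of the extract-and-retry logic, assuming a 6-digit code. extract_otp and poll_for_otp are hypothetical helpers, and read_latest_message stands in for whatever Appium call returns the newest message text.

```python
import re
import time

# Exactly six digits, not embedded in a longer run of digits or letters
OTP_PATTERN = re.compile(r"\b\d{6}\b")


def extract_otp(message):
    """Return the first standalone 6-digit code in message, or None."""
    match = OTP_PATTERN.search(message)
    return match.group() if match else None


def poll_for_otp(read_latest_message, retries=5, interval=2.0):
    """Call read_latest_message() up to retries times until it yields an OTP.

    read_latest_message is a hypothetical callable returning the newest SMS
    text (or None if nothing has arrived yet).
    """
    for attempt in range(retries):
        otp = extract_otp(read_latest_message() or "")
        if otp is not None:
            return otp
        if attempt < retries - 1:
            time.sleep(interval)  # give the SMS time to arrive
    raise TimeoutError(f"No OTP found after {retries} attempts")
```

The word boundaries in the pattern keep a phone number or order ID from being mistaken for the code. If the inbox can contain older OTP messages, the caller would also need to make sure read_latest_message only returns messages received after the OTP was requested.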
Key Advantage
- Web and mobile automation run as one logical transaction
- Enables full regression of secure flows
- Extremely useful for fintech, banking, and telecom apps
Sample Code (Java + Selenium + Appium)
    import io.appium.java_client.AppiumDriver;
    import io.appium.java_client.android.AndroidDriver;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.chrome.ChromeDriver;
    import org.openqa.selenium.remote.DesiredCapabilities;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    import java.net.URL;
    import java.time.Duration;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class OTPAutomation {
        private static WebDriver webDriver;
        private static AppiumDriver mobileDriver;
        private static final int MAX_RETRIES = 5;
        private static final int POLL_INTERVAL_SECONDS = 2;

        public static void main(String[] args) throws Exception {
            try {
                // Step 1: Initialize Selenium WebDriver for Web Application
                setupWebDriver();
                // Step 2: Initialize Appium for Mobile Device
                setupMobileDriver();
                // Step 3: Navigate to web application and trigger OTP
                triggerOTPGeneration();
                // Step 4: Fetch OTP from mobile SMS
                String otp = fetchOTPFromMobile();
                // Step 5: Enter OTP in web application
                enterOTPInWebApp(otp);
                // Step 6: Verify successful authentication
                verifyAuthentication();
            } finally {
                // Cleanup
                if (webDriver != null) webDriver.quit();
                if (mobileDriver != null) mobileDriver.quit();
            }
        }

        private static void setupWebDriver() {
            System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
            webDriver = new ChromeDriver();
            webDriver.manage().window().maximize();
            webDriver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
        }

        private static void setupMobileDriver() throws Exception {
            // Appium server URL (/wd/hub is the Appium 1.x base path; Appium 2 serves from / by default)
            URL appiumServerUrl = new URL("http://127.0.0.1:4723/wd/hub");

            // Desired capabilities for Android device
            DesiredCapabilities capabilities = new DesiredCapabilities();
            capabilities.setCapability("platformName", "Android");
            capabilities.setCapability("deviceName", "Android Emulator");
            capabilities.setCapability("automationName", "UiAutomator2");
            capabilities.setCapability("appPackage", "com.google.android.apps.messaging");
            capabilities.setCapability("appActivity",
                    "com.google.android.apps.messaging.ui.ConversationListActivity");
            capabilities.setCapability("noReset", true);

            mobileDriver = new AndroidDriver(appiumServerUrl, capabilities);
        }

        private static void triggerOTPGeneration() {
            // Navigate to login page
            webDriver.get("https://example-app.com/login");

            // Enter phone number
            webDriver.findElement(By.id("phoneNumber")).sendKeys("+1234567890");

            // Click "Send OTP" button
            webDriver.findElement(By.id("sendOtpButton")).click();

            // Wait for OTP sent confirmation
            WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));
            wait.until(ExpectedConditions.visibilityOfElementLocated(
                    By.xpath("//div[contains(text(),'OTP sent successfully')]")
            ));
            System.out.println("OTP generation triggered successfully");
        }

        private static String fetchOTPFromMobile() throws InterruptedException {
            String otp = null;
            int retryCount = 0;

            // Wait for SMS to arrive and extract OTP
            while (retryCount < MAX_RETRIES && otp == null) {
                System.out.println("Checking for OTP SMS (Attempt " + (retryCount + 1) + ")");
                try {
                    // Navigate to SMS app and get the latest message
                    mobileDriver.findElement(By.id("conversation_list")).click();

                    // Wait for messages to load
                    Thread.sleep(2000);

                    // Get the latest SMS body
                    String smsText = mobileDriver.findElement(
                            By.id("message_text")
                    ).getText();

                    // Extract OTP using regex pattern (6-digit OTP)
                    Pattern pattern = Pattern.compile("\\b\\d{6}\\b");
                    Matcher matcher = pattern.matcher(smsText);
                    if (matcher.find()) {
                        otp = matcher.group();
                        System.out.println("OTP found: " + otp);
                    }
                } catch (Exception e) {
                    System.out.println("Error reading SMS: " + e.getMessage());
                }

                if (otp == null) {
                    Thread.sleep(POLL_INTERVAL_SECONDS * 1000);
                    retryCount++;
                    // Refresh SMS view
                    mobileDriver.navigate().back();
                }
            }

            if (otp == null) {
                throw new RuntimeException("Failed to retrieve OTP after " + MAX_RETRIES + " attempts");
            }
            return otp;
        }

        private static void enterOTPInWebApp(String otp) {
            // Wait for OTP input field to be visible
            WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("otpInput")));

            // Enter the full OTP into the single input field
            WebElement otpInput = webDriver.findElement(By.id("otpInput"));
            otpInput.clear();
            otpInput.sendKeys(otp);
            System.out.println("OTP entered successfully: " + otp);

            // Click verify button
            webDriver.findElement(By.id("verifyOtpButton")).click();
        }

        private static void verifyAuthentication() {
            // Wait for successful authentication
            WebDriverWait wait = new WebDriverWait(webDriver, Duration.ofSeconds(10));
            try {
                // Check for successful login indicator
                WebElement successElement = wait.until(
                        ExpectedConditions.visibilityOfElementLocated(
                                By.xpath("//div[contains(text(),'Login Successful') or contains(text(),'Welcome')]")
                        )
                );
                System.out.println("Authentication successful! Message: " + successElement.getText());
            } catch (Exception e) {
                // Check for error message
                try {
                    WebElement errorElement = webDriver.findElement(
                            By.xpath("//div[contains(@class,'error')]")
                    );
                    System.out.println("Authentication failed! Error: " + errorElement.getText());
                } catch (Exception ex) {
                    System.out.println("Authentication status unknown");
                }
            }
        }
    }
Final Thoughts: Selenium Is a Platform, Not Just a Tool
What I learned through these experiments is simple but powerful:
Selenium is not limited by what it can do, but by how creatively you use it.
By integrating Selenium with tools like Sikuli, WinIT, Python, Paramiko, and Appium, you can:
- Solve non-standard automation problems
- Build resilient enterprise frameworks
- Cover UI, OS, backend, mobile, and performance testing together
If you’re still using Selenium only for basic UI checks, you’re barely scratching the surface.
Author
Vaneesh Behl
Quality Engineering Leader | Automation Enthusiast