AI Testing · Halmurat T. · February 12, 2026 · 12 min

Self-Healing Locators Aren't Magic — Here's How They Work

Filed under: self-healing / ai-testing / framework-design / playwright / selenium

Table of Contents
  • Why Locators Break in the First Place
  • How Do Self-Healing Locator Tools Actually Work?
  • Approach 1: Attribute Fallback Chains
  • Approach 2: DOM Tree Similarity Scoring
  • Approach 3: LLM-Based Locator Repair
  • Approach 4: Visual + DOM Hybrid
  • Can You Build Self-Healing Into Your Own Framework?
  • Which Approach Should You Actually Use?

The “AI” inside most self-healing locator tools isn’t what the marketing suggests — it’s a try/catch loop with backup selectors. With every UI redesign, your locators break. Not one or two — dozens. The team spends a full day triaging red tests that have nothing to do with real bugs, swapping #btn-submit for #cta-submit-primary because someone refactored a component library. Self-healing locators promise to fix this automatically. But when you look under the hood of tools like Healenium, Testim, and mabl, the reality is simpler — and more limited — than the pitch.

I’ve spent over a decade building test frameworks in enterprise environments where locator instability is a weekly tax. I’ve evaluated every approach that claims to solve it. Here’s how the four main approaches to self-healing locators actually work, what trade-offs each carries, and which one is worth building into your own framework.

Why Locators Break in the First Place

Before reaching for self-healing, it’s worth understanding what you’re healing. Locators break for three reasons:

  1. CSS/ID refactors — A developer renames .btn-primary to .btn-cta-primary during a design system migration. Your By.cssSelector(".btn-primary") fails immediately. This is the most common cause, and I covered a deeper fix for it in why text-based locators reduce maintenance by 60%.

  2. DOM restructuring — A component gets wrapped in a new <div> for layout purposes. XPath expressions like //div[2]/form/button that depend on tree position silently point to the wrong element — or throw NoSuchElement.

  3. Dynamic attributes — Frameworks like React, Angular, and Vue generate IDs at runtime: input-a7f3e2b. Every page load, every build, the ID is different. No static locator survives this.

Self-healing tools target all three, but each approach handles them differently. The approach that fixes CSS renames doesn’t necessarily handle DOM restructuring, and vice versa.
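
To make the brittleness concrete, here is a small illustration in Selenium Java. The selectors are invented for this example, but the pattern holds: positional locators die in restructures, while text and test-id anchors tend to survive them.

LocatorExamples.java
import org.openqa.selenium.By;

public class LocatorExamples {

    // Brittle: positional XPath; wrapping the form in one new <div> breaks it.
    static final By POSITIONAL = By.xpath("//div[2]/form/button");

    // More stable: anchored to attributes and text that survive restructuring.
    static final By BY_TEST_ID = By.cssSelector("[data-testid='submit']");
    static final By BY_TEXT    = By.xpath("//button[normalize-space()='Submit']");
}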

How Do Self-Healing Locator Tools Actually Work?

Here are the four approaches used by every self-healing tool on the market. Some tools combine multiple approaches, but every implementation reduces to one or more of these.

Approach 1: Attribute Fallback Chains

This is what Healenium does, and it’s the simplest approach by far.

The idea: instead of storing one locator per element, store multiple — an ID, a CSS selector, an XPath, a text match, a data-testid. When the primary locator fails, try the next one in the chain. If any of them find the element, the test continues and logs a healing event. Conceptually, the locator chain looks like this:

conceptual-locator-chain.yml
element: submit-button
locators:
  - type: id
    value: "btn-submit"
  - type: css
    value: "[data-testid='submit']"
  - type: xpath
    value: "//button[contains(text(),'Submit')]"
  - type: text
    value: "Submit"

When #btn-submit breaks after a refactor, Healenium falls back to data-testid, then XPath, then text. It picks the first one that resolves, uses it for the current run, and updates its internal mapping.
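
Here is what that loop looks like in code: a minimal Selenium Java sketch of a fallback chain, built around a hypothetical FallbackLocator helper (an illustration of the pattern, not Healenium’s actual API).

FallbackLocator.java
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

import java.util.List;

public class FallbackLocator {

    // Try each locator in priority order and return the first that resolves.
    public static WebElement find(WebDriver driver, String name, List<By> chain) {
        for (By locator : chain) {
            try {
                WebElement element = driver.findElement(locator);
                // A real implementation would log a healing event whenever
                // anything other than the primary locator succeeds, and
                // persist the working locator for future runs.
                return element;
            } catch (NoSuchElementException stale) {
                // This locator is stale; fall through to the next one.
            }
        }
        throw new NoSuchElementException("All locators in chain failed for: " + name);
    }
}

// Usage, mirroring the YAML chain above:
// WebElement submit = FallbackLocator.find(driver, "submit-button", List.of(
//     By.id("btn-submit"),
//     By.cssSelector("[data-testid='submit']"),
//     By.xpath("//button[contains(text(),'Submit')]")));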

Strengths: Dead simple. Easy to understand, easy to debug. Works well for CSS/ID renames — the most common breakage. You can build this yourself in a weekend.

Weaknesses: If the DOM restructures and all your stored locators become stale, the chain fails completely. There’s no intelligence — it’s just trying alternatives in order. It also can’t handle cases where the fallback locator matches the wrong element (a different button that also says “Submit”).

[ NOTE ]

Healenium is open-source and uses a PostgreSQL backend to store locator mappings. Under the hood, it’s a Selenium proxy that intercepts findElement calls, catches NoSuchElementException, and tries alternatives. That’s the entire “AI.”

Approach 2: DOM Tree Similarity Scoring

This is where the technique gets more interesting. Instead of just trying backup locators, the tool takes a snapshot of the DOM around an element — its tag, attributes, parent chain, sibling context — and stores it as a fingerprint. When the primary locator fails, it walks the current DOM and scores every candidate element against the stored fingerprint.

The algorithm is usually a weighted similarity score:

similarity-scoring.txt
Score = (0.30 × tag_match)
      + (0.25 × attribute_overlap)
      + (0.20 × text_similarity)
      + (0.15 × parent_chain_match)
      + (0.10 × sibling_context)

Threshold: score > 0.75 → accept as healed

If a button moves from div > form > button#submit to section > form > div.actions > button.cta-submit, the tag still matches, the text still matches, the parent chain partially matches, and the sibling context (nearby labels, inputs) is likely similar. The score might come out to 0.82 — above threshold — and the tool picks it.
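
Here is a sketch of that scoring function in Java. The Fingerprint record and the helper metrics are my own illustrative assumptions, not any vendor’s implementation, but the weights mirror the formula above.

SimilarityScorer.java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SimilarityScorer {

    // What the tool stored at recording time: the element's fingerprint.
    public record Fingerprint(String tag, Set<String> attributes, String text,
                              List<String> parentChain, Set<String> siblingTags) {}

    // Jaccard overlap between two sets: |intersection| / |union|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Fraction of the stored parent chain that still matches, root to leaf.
    static double chainOverlap(List<String> stored, List<String> current) {
        if (stored.isEmpty()) return 0.0;
        int n = Math.min(stored.size(), current.size());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (stored.get(i).equals(current.get(i))) matches++;
        }
        return (double) matches / stored.size();
    }

    public static double score(Fingerprint stored, Fingerprint candidate) {
        double s = 0.0;
        s += 0.30 * (stored.tag().equals(candidate.tag()) ? 1.0 : 0.0);
        s += 0.25 * jaccard(stored.attributes(), candidate.attributes());
        // Exact match stands in for a real string-similarity metric here.
        s += 0.20 * (stored.text().equals(candidate.text()) ? 1.0 : 0.0);
        s += 0.15 * chainOverlap(stored.parentChain(), candidate.parentChain());
        s += 0.10 * jaccard(stored.siblingTags(), candidate.siblingTags());
        return s; // accept the candidate as healed when s > 0.75
    }
}

A production version would walk every candidate element on the page, score each one, and keep the best match above the threshold, ideally with an edit-distance metric in place of the exact text match.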

Strengths: Handles DOM restructuring much better than fallback chains. Can survive layout changes, wrapper additions, and component reorganizations. It’s the approach that actually deserves the term “intelligent.”

Weaknesses: Threshold tuning is painful. Set it too low and you match wrong elements. Set it too high and you still break on minor changes. False positives are dangerous — the test silently interacts with the wrong element and passes. Also, computing similarity across the full DOM is expensive on large pages (5,000+ nodes).

Approach 3: LLM-Based Locator Repair

This is the newest approach and the one that actually uses “AI” in the way most people imagine. When a locator breaks, the tool sends context to a large language model — the old locator, the old DOM snapshot, and the current DOM — and asks it to generate a new locator.

The prompt looks something like this:

llm-repair-prompt.txt
The following Selenium locator no longer finds an element:

  By.id("btn-submit")

Previous DOM context around the element:

  <form id="login-form">
    <button id="btn-submit" class="btn-primary">Submit</button>
  </form>

Current DOM:

  <form id="login-form">
    <div class="form-actions">
      <button class="cta-submit-primary" data-action="submit">
        Submit
      </button>
    </div>
  </form>

Generate a new, stable locator for the same element.

The LLM responds with something like By.cssSelector("button[data-action='submit']") or By.xpath("//form[@id='login-form']//button[text()='Submit']"). The tool validates the locator against the current DOM, and if it resolves to exactly one element, it heals.
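
The validation step is the part worth copying regardless of which model you use. A minimal sketch in Selenium Java (the class and method names are mine, not from any tool):

LocatorValidator.java
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class LocatorValidator {

    // Accept an LLM-proposed locator only if it resolves to exactly one
    // element: zero matches means the repair is wrong, more than one means
    // it is ambiguous and could heal onto the wrong element.
    public static boolean isSafeRepair(WebDriver driver, By candidate) {
        List<WebElement> matches = driver.findElements(candidate);
        return matches.size() == 1;
    }
}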

Strengths: By far the most flexible. Can handle arbitrary DOM changes, renamed attributes, restructured components — anything a human tester could figure out by looking at the page. The LLM understands intent, not just structure.

Weaknesses: Latency. An LLM call takes 500ms–2s per broken locator. If 30 locators break in a redesign, that’s up to a minute of healing time per test run. Cost adds up too — at $3-15 per million tokens, a suite that heals frequently gets expensive. And there’s no guarantee the LLM generates a correct locator. I saw a similar trust problem when I let AI write a full test suite — the output looked right but had subtle bugs that required expert review.

[ WARNING ]

LLM-based repair introduces non-determinism into your test infrastructure. The same broken locator might get healed differently on two consecutive runs. For CI pipelines where reproducibility matters, this is a serious concern.

Approach 4: Visual + DOM Hybrid

This is what Testim and mabl use. The tool captures both a visual snapshot (screenshot region) and a DOM fingerprint of each element. When the primary locator breaks, it uses computer vision to find the element visually on the page, then cross-references with DOM similarity to confirm the match.

Think of it as: “The Submit button is a blue rectangle in the bottom-right of the form, with text ‘Submit,’ inside a <form> element.” If the DOM changes but the button still looks the same and is in roughly the same position, the visual matcher finds it. If the button moves but the DOM fingerprint still matches, the DOM matcher finds it.

Strengths: Most resilient to both structural and visual changes. Handles redesigns where everything changes — class names, DOM structure, even the tag itself — as long as the element is still visually recognizable.

Weaknesses: Heavyweight. Requires screenshot capture on every run (or at least on failure), image comparison infrastructure, and significant storage for baseline snapshots. Brittle across themes — dark mode vs. light mode can confuse the visual matcher. And most importantly, these tools are SaaS-only. You can’t build this yourself without investing in a computer vision pipeline, which is a project on its own.

Can You Build Self-Healing Into Your Own Framework?

Yes — for approaches 1 through 3. Here’s the honest assessment:

Fallback chains (Approach 1): Absolutely. This is a wrapper around your existing locator strategy. Intercept findElement, catch the failure, try alternatives. A senior SDET can build this in 2-3 days. If you’ve already moved to text-based locators or data-testid attributes, your fallback chain is shorter and more reliable.

DOM similarity scoring (Approach 2): Feasible but non-trivial. You need a DOM snapshot system, a similarity algorithm, and a threshold tuning process. Budget 2-3 weeks for a solid implementation. The algorithm itself isn’t complex — it’s the edge cases (iframes, shadow DOM, dynamic content) that eat your time.

LLM-based repair (Approach 3): Surprisingly easy to prototype. You need an LLM API call, a prompt template, and a locator validation step. A working proof-of-concept takes a day. But making it production-ready — handling rate limits, caching repairs, dealing with non-determinism, managing cost — takes much longer.
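
Caching is the cheapest of those production concerns to address. A sketch of the idea, assuming a hypothetical llmRepair function that wraps the API call and the validation step:

RepairCache.java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

import org.openqa.selenium.By;

public class RepairCache {

    private final Map<String, By> repairs = new ConcurrentHashMap<>();

    // One LLM call per unique broken locator per run: later failures of the
    // same locator reuse the cached repair, which cuts cost and keeps
    // healing deterministic within a single run.
    public By repair(String brokenLocator, Function<String, By> llmRepair) {
        return repairs.computeIfAbsent(brokenLocator, llmRepair);
    }
}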

Visual hybrid (Approach 4): Don’t build this yourself. The computer vision component alone requires training data, model management, and screenshot infrastructure. Use a vendor tool or skip this approach.

Which Approach Should You Actually Use?

Here’s my recommendation based on what I’ve seen work at scale:

Start with Approach 1 (fallback chains) — it solves 70-80% of locator breakage. Most broken locators in real-world suites are CSS/ID renames, and a well-ordered fallback chain handles these instantly with zero latency, zero cost, and full determinism. Pair it with a strong locator strategy — text-based locators, data-testid attributes, and ARIA roles — and you’ve eliminated the majority of maintenance work before self-healing even kicks in.

Add Approach 2 (DOM similarity) if you’re on a large suite (500+ tests) with frequent UI redesigns. The investment pays off when you have enough locators breaking frequently enough that manual repair costs more than the 2-3 weeks to build the similarity engine. This is the sweet spot for most enterprise teams.

Use Approach 3 (LLM repair) as a last-resort fallback, not a primary strategy. It’s powerful for the 5% of cases where everything else fails, but the latency, cost, and non-determinism make it a poor default. Keep it behind a flag. Run it in a nightly job that proposes locator updates for human review — not in your CI pipeline.

Skip Approach 4 unless you’re already paying for Testim or mabl. The build-vs-buy calculus doesn’t work here. If you’re already on one of these platforms, use the visual healing. If you’re not, the other three approaches cover 95% of the problem.

[ TIP ]

The best self-healing strategy is preventing locator breakage in the first place. Invest in stable locator patterns first — text content, ARIA roles, data-testid — and add self-healing as a safety net, not a crutch.

§ Frequently Asked Questions
+ Is self-healing testing actually AI?

Mostly no. The most common approach — attribute fallback chains — is a simple try/catch with backup locators. DOM similarity scoring uses deterministic algorithms, not machine learning. Only LLM-based repair and visual matching use what most people would consider “AI.” The marketing outpaces the technology.

+ Can I add self-healing to an existing Playwright or Selenium framework?

Yes. Fallback chains and LLM-based repair can be added as wrappers around your existing locator calls without restructuring your framework. DOM similarity requires storing element fingerprints, which needs a snapshot system — more invasive, but still feasible as an incremental addition.

+ Does self-healing hide real bugs?

It can. If a locator heals to the wrong element, your test passes but is now testing the wrong thing. Always log healing events and review them. Treat a healed locator as a tech debt item — an automated locator update that needs human confirmation, not a silent fix.

+ What is the best open-source self-healing tool?

Healenium is the most mature open-source option. It uses fallback chains with a PostgreSQL backend and integrates with Selenium. For Playwright, there’s no established open-source self-healing tool yet — you’ll need to build your own or use a commercial platform like Testim or mabl.

§ Further Reading

  • Playwright MCP vs CLI vs Agents: What to Use in 2026
    Playwright has three ways to talk to AI: MCP, CLI, and Test Agents. Here's the decision framework an enterprise SDET uses to pick the right one for 2026.
  • I Let AI Write My Test Suite — Here's What Broke
    A hands-on experiment with AI-generated Playwright tests: where LLMs save time, where they create false confidence, and the review workflow that works.
  • Claude Code Has 2 Primitives, Not 3 — Use Skills First
    Most engineers think Claude Code has three primitives. It actually has two — skills and subagents. Here's when to use which, with token-cost benchmarks.


