Halmurat T.

Senior SDET

© 2026 Halmurat T.

CI/CD · Halmurat T. · April 23, 2026 · 11 min read

Playwright's New CLI Debugger Finally Works in Headless CI

Filed under: playwright, cicd, debugging, trace-viewer

Table of Contents
  • Why the Playwright debug story had a gap for five years
  • What is Playwright’s --debug=cli mode?
  • The interaction at a glance
  • How do you use --debug=cli inside a CI container?
  • What can you do with npx playwright trace?
  • The three subcommands that unlock scripted triage
  • Post-CI triage: automated Slack summaries from trace zips
  • Why does retain-on-failure-and-retries trace mode matter?
  • Where does this fit in the 2026 Playwright debug stack?
  • One concrete next step

It’s 2 AM. PagerDuty rang. The staging pipeline is blocked because a checkout test is flaking — and only on the ephemeral runner, never locally. You open the Playwright HTML report in your browser, find the red test, click the trace icon, wait for Trace Viewer to load, scrub through 40 actions, find nothing conclusive, download the zip, try to open it locally, realize the runner’s Node version differs, rerun the test with --ui, wait for a browser to spin up, finally spot what looks like a race condition. That loop — start to hypothesis — took 35 minutes. With Playwright 1.59’s new --debug=cli mode, the same loop takes three.

Why the Playwright debug story had a gap for five years

For the last half-decade, Playwright’s official debug story was: UI Mode for local dev, Trace Viewer for post-hoc inspection, and PWDEBUG=1 for a last-ditch browser-based debugger. All of them share the same fatal flaw for enterprise teams: they need a display.

A headless CI container running on an ephemeral GitHub Actions runner or a Kubernetes pod doesn’t have a display. It doesn’t have a browser window. It has a terminal, SSH access if you’re lucky, and a 6-hour timeout. Every production Playwright failure I’ve debugged at scale — at a large Canadian telecom, at a financial services firm running 2,000+ tests — has followed the same painful path: try to reproduce it locally, fail, add logging, push, wait 20 minutes for the pipeline, fail again, repeat.

That’s more than a tooling gap; it’s a workflow tax. And it compounds hard when your suite is big enough that “just reproduce it locally” isn’t a reasonable ask. Playwright 1.59 (shipped April 1, 2026) finally addresses this with two CLI primitives built for headless environments.

What is Playwright’s --debug=cli mode?

--debug=cli is a terminal step-through debugger for Playwright tests. It pauses test execution, prints an attach instruction to stdout, and waits for an engineer to connect from any shell — including an SSH session into the CI runner itself. Unlike --ui, which launches a browser window and requires a display, --debug=cli runs entirely in text. No display, no Xvfb, no forwarded ports.

The interaction at a glance

Here’s what the interaction looks like:

terminal
# Start the paused test — this blocks and prints an attach token
npx playwright test checkout.spec.ts --debug=cli
# Output:
# Test paused. To debug, run:
# playwright-cli attach tw-7af3b2c1
terminal
# In a second shell (or over SSH to the same runner) — attach
playwright-cli attach tw-7af3b2c1
# Inside the CLI debugger:
# step-over — move to the next Playwright action
# step-in — descend into the current action
# continue — run until the next pause or failure
# snapshot before — dump the DOM state before this action
# locate "getByRole('button', { name: 'Place order' })"

The locate command inside the session is particularly useful: you can test selectors against the live DOM state without modifying the test file. Last week I spent a solid 40 minutes stepping through a checkout flow that was passing locally and failing in CI because a third-party payment widget was loading 800ms later on the runner’s network. The CLI debugger’s DOM snapshot at step 14 showed the button was there, but disabled. Pinpointing that part took 4 minutes.

How do you use --debug=cli inside a CI container?

The pattern is a manual-trigger workflow — not something you run on every push. You add a workflow_dispatch GitHub Actions workflow that accepts the test file as input, runs the test with --debug=cli, and keeps the runner alive long enough for you to attach.

.github/workflows/debug.yml
name: Debug failing test
on:
  workflow_dispatch:
    inputs:
      test_path:
        description: 'Test file to debug (e.g. tests/checkout.spec.ts)'
        required: true
jobs:
  debug:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install dependencies
        run: npm ci && npx playwright install --with-deps
      # Third-party action: opens a tmate SSH session in the background and
      # prints the ssh connection string in this step's log
      - name: Open SSH session
        uses: mxschmitt/action-tmate@v3
        with:
          detached: true
          limit-access-to-actor: true
      - name: Start CLI debug session
        # Pass the input through env so it can't inject shell commands
        env:
          TEST_PATH: ${{ inputs.test_path }}
        run: npx playwright test "$TEST_PATH" --debug=cli
        # SSH in with the tmate string, then: playwright-cli attach <token from this log>
        # (or forward the attach token to Slack)
      - name: Upload trace on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: trace
          path: test-results/

GitHub Actions has no built-in “re-run with SSH” option (that is a CircleCI feature), but the third-party mxschmitt/action-tmate action opens a tmate SSH session into the runner, and from that session you can run playwright-cli attach directly. That combination, --debug=cli plus an SSH session, is the closest thing to a production debugger that headless CI has ever offered.
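The attach token only helps if someone on call actually sees it. One way to forward it is a small Node wrapper that starts the paused test, mirrors its output into the CI log, and posts the first attach line to Slack. This is a sketch under assumptions, not part of Playwright: the regex assumes the `playwright-cli attach <token>` line shown earlier and that it is printed the same way when stdout is piped; `runDebugSession`, `extractAttachToken`, and `postToSlack` are hypothetical names, the webhook URL is a Slack incoming webhook you supply, and Node 18+ is assumed for the global `fetch`.

```typescript
import { spawn } from "node:child_process";
import * as readline from "node:readline";

// ASSUMPTION: matches the attach line in the --debug=cli output shown above;
// the exact wording may change between Playwright versions.
const ATTACH_PATTERN = /playwright-cli attach (\S+)/;

/** Returns the attach token from one line of output, or null if there is none. */
function extractAttachToken(line: string): string | null {
  const match = ATTACH_PATTERN.exec(line);
  return match ? match[1] : null;
}

/** Posts a plain-text message to a Slack incoming webhook (Node 18+ global fetch). */
async function postToSlack(webhookUrl: string, text: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}

/**
 * Starts the paused test, mirrors its stdout into the CI log, and forwards
 * the first attach token to Slack. Hypothetical helper, not a Playwright API.
 */
function runDebugSession(testPath: string, webhookUrl: string): void {
  const child = spawn("npx", ["playwright", "test", testPath, "--debug=cli"], {
    stdio: ["inherit", "pipe", "inherit"],
  });
  let sent = false;
  readline.createInterface({ input: child.stdout! }).on("line", (line) => {
    console.log(line); // keep the original output visible in the job log
    const token = extractAttachToken(line);
    if (token !== null && !sent) {
      sent = true;
      postToSlack(webhookUrl, `Debug session ready: playwright-cli attach ${token}`)
        .catch((err) => console.error("Slack post failed:", err));
    }
  });
  child.on("exit", (code) => {
    process.exitCode = code ?? 1;
  });
}
```

In the workflow, the debug step would call `runDebugSession(process.env.TEST_PATH!, process.env.SLACK_WEBHOOK_URL!)` instead of invoking `npx playwright test` directly. Posting only the first token keeps a test that pauses again from spamming the channel.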

[ WARNING ]

--debug=cli is not a substitute for fixing flaky tests. It’s a last-resort tool for the 10% of failures where reproduction is genuinely hard: race conditions tied to runner infrastructure, environment-specific timing issues, network-dependent UI states. If you’re reaching for --debug=cli regularly, the real fix is isolating the environmental difference that causes the divergence, not stepping through it by hand every time.

What can you do with npx playwright trace?

The second big addition in 1.59 is CLI-native trace analysis. Previously, trace.zip files were useful only if you could open them in a browser via npx playwright show-trace or the Playwright Trace Viewer UI. With 1.59, you can inspect them from the command line, which means you can script them.

The three subcommands that unlock scripted triage

Three subcommands matter here:

terminal
# List every recorded action, with optional grep filter
npx playwright trace actions trace.zip --grep="goto|fill|fetch"
# Dump the DOM snapshot at action index 14, before execution
npx playwright trace snapshot trace.zip 14 --name before > snapshot.html
# Interactive TUI (still requires a terminal but not a display)
npx playwright trace open trace.zip

Post-CI triage: automated Slack summaries from trace zips

The enterprise unlock is the actions subcommand piped into a script. A suite running 200 tests per pipeline with a 12% flakiness rate produces roughly 24 failed traces per run. Nobody opens 24 trace zips manually at 2 AM. But a script can. Here’s the post-CI triage pattern I’ve started recommending to teams:

scripts/trace-triage.sh
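#!/usr/bin/env bash
# Runs after every failed CI job. Finds the last network action before
# the failure, saves the "before" snapshot, and posts a summary to Slack.
# Requires jq (preinstalled on GitHub-hosted Ubuntu runners).
set -euo pipefail
TRACE_ZIP="${1:?trace zip path required}"
SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL:?SLACK_WEBHOOK_URL env var required}"
# Find the last network-related action before failure
LAST_ACTION=$(npx playwright trace actions "$TRACE_ZIP" --grep="fetch|goto|route" | tail -1)
SNAPSHOT_FILE="(none)"
if [[ -n "$LAST_ACTION" ]]; then
  # Assumes the first column of each line is the action index
  ACTION_INDEX=$(awk '{print $1}' <<< "$LAST_ACTION")
  # Save the DOM state before that action next to the trace
  SNAPSHOT_FILE="${TRACE_ZIP%.zip}-before-${ACTION_INDEX}.html"
  npx playwright trace snapshot "$TRACE_ZIP" "$ACTION_INDEX" --name before > "$SNAPSHOT_FILE"
else
  LAST_ACTION="(no network action recorded)"
fi
# printf turns \n into real newlines; jq escapes quotes so the JSON stays valid
SUMMARY=$(printf '*Trace triage:* `%s`\nLast network action before failure: `%s`\nBefore snapshot: `%s`' \
  "$(basename "$TRACE_ZIP")" "$LAST_ACTION" "$SNAPSHOT_FILE")
jq -n --arg text "$SUMMARY" '{text: $text}' \
  | curl -sf -X POST -H 'Content-type: application/json' --data @- "$SLACK_WEBHOOK_URL"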

Run this once per trace zip in your CI pipeline’s on-failure step and you get a Slack message with the last network call before every failure, automatically, on every run. A team I worked with at a large retail platform reduced their triage-to-hypothesis time from 25 minutes to under 5 by running a similar script. The engineering cost was two hours. That’s the kind of leverage you get from scriptable trace analysis.
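The triage script handles one zip at a time, while a failing run leaves one `trace.zip` per failed attempt under the output folder. A minimal fan-out sketch in TypeScript, assuming Playwright’s default `test-results/` output directory and that the script lives at `scripts/trace-triage.sh`; `findTraceZips` and `triageAll` are hypothetical helper names, not Playwright APIs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { spawnSync } from "node:child_process";

/** Recursively collects every trace.zip under dir (Playwright's default output is test-results/). */
function findTraceZips(dir: string): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      found.push(...findTraceZips(full));
    } else if (entry.isFile() && entry.name === "trace.zip") {
      found.push(full);
    }
  }
  return found.sort();
}

/**
 * Runs the triage script once per trace zip. A failure on one zip is counted
 * instead of aborting, so the remaining summaries still reach Slack.
 */
function triageAll(resultsDir: string, script = "scripts/trace-triage.sh"): number {
  let failures = 0;
  for (const zip of findTraceZips(resultsDir)) {
    const result = spawnSync("bash", [script, zip], { stdio: "inherit" });
    if (result.status !== 0) failures++;
  }
  return failures;
}

// Hypothetical usage from an on-failure CI step:
// const failed = triageAll("test-results");
// if (failed > 0) console.error(`${failed} trace(s) could not be triaged`);
```

Counting failures rather than throwing on the first one matters here: a single corrupt zip shouldn’t hide the summaries for every other failed test in the run.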

Why does retain-on-failure-and-retries trace mode matter?

This is the third 1.59 addition worth calling out. Previously, the closest trace retention mode was 'retain-on-failure', which saved the trace from the final attempt of a test. With two retries, a test gets three attempts: if the third attempt eventually passed, you got no trace at all. If it still failed, you got only that final attempt’s trace, not the first two, which is often where the actual divergence shows up in flaky tests.

The new 'retain-on-failure-and-retries' mode keeps traces from every attempt:

playwright.config.ts
import { defineConfig } from '@playwright/test';
export default defineConfig({
use: {
trace: 'retain-on-failure-and-retries',
},
retries: process.env.CI ? 2 : 0,
});

For flaky-test diagnosis, this is significant. The interesting data is usually in the divergence between attempts: the point where the same test passes on one run and fails on another. With all three traces, you can run npx playwright trace actions on each and diff the action sequences. If one attempt recorded 42 actions and the next recorded 39, the three missing actions point to where the timing diverged.
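Diffing two attempts by eye gets tedious past a few dozen actions. The sketch below aligns two action lists with a longest-common-subsequence pass and reports what the longer attempt did that the shorter one skipped. It assumes `trace actions` prints one action per line with a leading numeric index (the same assumption the triage script’s `awk '{print $1}'` makes); if the output also includes timings, those would need stripping too. The helper names are hypothetical.

```typescript
import { execFileSync } from "node:child_process";

/** Strips surrounding whitespace and an assumed leading action index ("14 click ..."). */
function normalizeAction(line: string): string {
  return line.trim().replace(/^\d+\s+/, "");
}

/** Reads one action per output line from `npx playwright trace actions` (format assumed). */
function readActions(traceZip: string): string[] {
  const out = execFileSync("npx", ["playwright", "trace", "actions", traceZip], { encoding: "utf8" });
  return out.split("\n").map(normalizeAction).filter((line) => line.length > 0);
}

interface MissingAction {
  index: number; // position in the first attempt's action list
  action: string;
}

/** Actions in `a` with no counterpart in `b`, found via an LCS alignment. */
function missingActions(a: string[], b: string[]): MissingAction[] {
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table: matched pairs advance both sides, unmatched a[i] is reported
  const missing: MissingAction[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length) {
    if (j < b.length && a[i] === b[j]) {
      i++;
      j++;
    } else if (j < b.length && lcs[i][j + 1] >= lcs[i + 1][j]) {
      j++; // b[j] is extra in the second attempt; skip it
    } else {
      missing.push({ index: i, action: a[i] });
      i++;
    }
  }
  return missing;
}

// Hypothetical paths; Playwright typically suffixes retry folders with -retry1, -retry2:
// const first = readActions("test-results/checkout-chromium/trace.zip");
// const retry = readActions("test-results/checkout-chromium-retry1/trace.zip");
// console.table(missingActions(first, retry));
```

An alignment is used instead of a line-by-line comparison because a single skipped action shifts every later index; aligning first means only the genuinely missing actions show up.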

[ TIP ]

The storage cost is real. A suite running 200 tests with 2 retries and 10% flakiness has about 20 flaky tests; if each one burns through all three attempts, that is roughly 60 trace zips per run instead of 20, about 3x the trace data of retain-on-failure alone. Enable retain-on-failure-and-retries on a per-project basis for the flaky tests you’re actively investigating, not as a blanket global setting. I watched this triple a team’s artifact storage bill for a month before they scoped it down.
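Scoping the expensive mode to one spec can be done with two projects that split the suite by file. A config sketch, assuming the flaky spec is `checkout.spec.ts` and Chromium is the only browser you run; the project names are placeholders.

```typescript
import { defineConfig, devices } from '@playwright/test';

// Hypothetical spec under investigation; swap in your own file pattern.
const FLAKY_SPEC = /checkout\.spec\.ts/;

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  use: {
    trace: 'retain-on-failure', // cheaper default for the rest of the suite
  },
  projects: [
    {
      name: 'chromium',
      testIgnore: FLAKY_SPEC,
      use: { ...devices['Desktop Chrome'] },
    },
    {
      // Only this project pays the storage cost of keeping every attempt
      name: 'chromium-flaky-investigation',
      testMatch: FLAKY_SPEC,
      use: { ...devices['Desktop Chrome'], trace: 'retain-on-failure-and-retries' },
    },
  ],
});
```

Project-level `use` overrides the top-level `use`, so every other spec keeps the cheaper default. Once the flake is fixed, delete the second project and the `testIgnore`.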

Where does this fit in the 2026 Playwright debug stack?

Four layers; use the right one for the job:

| Environment | Tool | When to use |
|---|---|---|
| Local dev | --ui (UI Mode) | Writing new tests, iterating on selectors |
| Post-hoc local | Trace Viewer in browser | Forensics on a specific failed run |
| Headless CI (live) | --debug=cli | Reproducing a failure that won’t reproduce locally |
| Headless CI (automated) | npx playwright trace + scripts | Post-CI triage, Slack summaries, failure clustering |

The key mental model: the CLI tools (--debug=cli and npx playwright trace) are not replacements for UI Mode and the browser-based Trace Viewer. They’re complements for the environments where those tools can’t run. UI Mode is still the fastest tool for selector iteration and test authoring. The browser Trace Viewer is still the most ergonomic tool for human forensics on a single failure. But for the 20 failures that landed in your CI queue overnight, a shell script and npx playwright trace actions are the right tools, not 20 manual Trace Viewer sessions.

If you’re still tracking down the root causes of those CI failures, read up on the async trap that causes most flaky tests before you reach for the debugger. And if you’re dealing with a slow suite on top of a flaky one, the strategies for making slow Playwright suites fast apply before this layer.

One concrete next step

Pick one flaky test in your suite today. Open playwright.config.ts, add trace: 'retain-on-failure-and-retries' to that test’s project config (not the global config), and push. The next time that test flakes, you’ll have the full retry sequence on disk. Run npx playwright trace actions on each zip and look for the action count divergence. That’s your hypothesis in under two minutes.
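With retries enabled, Playwright typically writes each retry into its own output folder with a `-retry1`, `-retry2` suffix, so the zips for one test can be grouped and their action counts compared side by side. A sketch that does this, assuming that folder naming and that `trace actions` prints one action per line (a header line, if any, would inflate every count equally); `groupByTest` and `countActions` are hypothetical names.

```typescript
import * as path from "node:path";
import { execFileSync } from "node:child_process";

/**
 * Groups trace.zip paths by test folder, ordered by attempt. Relies on
 * Playwright naming retry output folders "<name>-retry1", "<name>-retry2", ...
 */
function groupByTest(traceZips: string[]): Map<string, string[]> {
  const attempts = new Map<string, { attempt: number; zip: string }[]>();
  for (const zip of traceZips) {
    const folder = path.basename(path.dirname(zip));
    const match = /^(.*)-retry(\d+)$/.exec(folder);
    const key = match ? match[1] : folder;
    const attempt = match ? Number(match[2]) : 0; // the first run has no suffix
    const list = attempts.get(key) ?? [];
    list.push({ attempt, zip });
    attempts.set(key, list);
  }
  const grouped = new Map<string, string[]>();
  attempts.forEach((list, key) => {
    grouped.set(key, list.sort((x, y) => x.attempt - y.attempt).map((e) => e.zip));
  });
  return grouped;
}

/** Counts non-empty lines of `npx playwright trace actions` output (one action per line assumed). */
function countActions(traceZip: string): number {
  const out = execFileSync("npx", ["playwright", "trace", "actions", traceZip], { encoding: "utf8" });
  return out.split("\n").filter((line) => line.trim().length > 0).length;
}

// Hypothetical usage: log each test with its per-attempt action counts.
// groupByTest(zipPaths).forEach((zips, test) => console.log(test, zips.map(countActions)));
```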

If your team still runs the 35-minute manual debug loop, copy the trace-triage.sh script above, swap in your Slack webhook, and wire it into your on-failure CI step. You’ll have automated triage before end of day.


§ Frequently Asked Questions
Is --debug=cli a replacement for UI Mode?

No — UI Mode is still the right tool for local test authoring. --debug=cli is for the environments where UI Mode can’t run: headless CI containers, SSH-only bastion hosts, CI shells without a display.

Do I need to install anything extra for --debug=cli?

No. It ships with Playwright 1.59. The playwright-cli binary is installed by default when you run npm install @playwright/test.

Can `npx playwright trace` replace the Trace Viewer UI?

For automated triage, yes. For human forensics on a single failure, the browser-based Trace Viewer is still more ergonomic. Think of the CLI as what you script, and the UI as what you click.

What's the cost of trace: 'retain-on-failure-and-retries'?

Storage. A suite running 200 tests with 2 retries and 10% flakiness will produce roughly 3x more trace data than retain-on-failure alone. Worth it for a focused debug month, not as a permanent default.

§ Colophon

Halmurat T. — Senior SDET writing about test automation, CI/CD, and QA strategy from 10+ years in the enterprise trenches.

Set in
IBM Plex Sans, Lora, and IBM Plex Mono.
Built with
Astro, MDX, Tailwind CSS & Expressive Code. Served by Vercel.
Privacy
No cookies. No tracking scripts on the main thread — analytics run sandboxed via Partytown.
Source
github.com/Halmurat-Uyghur

© 2026 Halmurat T. · Written in plain text, shipped in plain time.
