Halmurat T.
· 8 min read

The Flaky Test Isn't Flaky — It's a Race Condition


At a large Canadian telecom, we had a Playwright suite — 260 tests, 4 parallel workers — that failed 8-12 times per run. Always different tests. Always passing on retry. The team had already added retries: 2 to the Playwright config and moved on. For six months, nobody questioned it. The retry mask was hiding a real bug that customers were hitting in production.
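For reference, that retry mask is a one-line change in the Playwright config. A sketch of what it looked like (not our exact config):

```typescript
// playwright.config.ts: the one-liner that hid the bug for six months
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: 4,
  retries: 2, // every failure gets two silent second chances
});
```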

The @Retry Anti-Pattern

Here’s the thing about retries: they don’t fix anything. They hide symptoms. And in test automation, hidden symptoms compound.

Every retry is a question you’re choosing not to ask. “Why did this fail?” becomes “did it pass the second time?” and that second question costs you nothing today and everything over time.

I’ve seen teams carry 15-20% retry rates for months. The math is brutal. If you run 300 tests 3 times a day and 15% need a retry, that’s 135 extra test executions daily. At 30 seconds each, you’re burning over an hour of CI time every day on tests that “pass.” Multiply that by a year — that’s 365+ hours of compute time spent re-asking questions you already got honest answers to.
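The arithmetic above is easy to check; these are the illustrative figures from this section, not measurements:

```typescript
// Back-of-envelope cost of a 15% retry rate (figures from the text)
const testsPerRun = 300;
const runsPerDay = 3;
const retryRate = 0.15;
const secondsPerRetry = 30;

const extraRunsPerDay = testsPerRun * runsPerDay * retryRate;        // 135 extra executions
const extraMinutesPerDay = (extraRunsPerDay * secondsPerRetry) / 60; // 67.5 minutes of CI
const extraHoursPerYear = (extraMinutesPerDay * 365) / 60;           // ~410 hours per year

console.log(extraRunsPerDay, extraMinutesPerDay, Math.round(extraHoursPerYear));
```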

But the CI time isn’t even the real cost. The real cost is trust erosion. Once a team accepts that “some tests are just flaky,” every legitimate failure gets the benefit of the doubt. I’ve watched teams retry a genuine regression four times before someone actually read the error message. That’s the compound interest on technical debt.

What a Race Condition Actually Looks Like

A race condition is two operations fighting over the same resource with no coordination on who goes first. In application code, it’s two threads writing to the same variable. In test automation, it’s usually subtler — two tests sharing state they shouldn’t be sharing.

The symptoms are always the same:

  • Passes in isolation, fails in parallel
  • Failures are non-deterministic — different tests, different runs
  • Error messages don’t match the test logic (wrong user, wrong data, unexpected state)
  • Failure rate increases when you add more parallel workers

If that list describes your suite, you don’t have flaky tests. You have shared state leaking between parallel executions.

The War Story: A “Flaky” Login Test

Our failing tests had no pattern — sometimes a checkout test, sometimes a profile update, sometimes a simple dashboard load. The only commonality was that they all involved authenticated flows. The team had accepted it as Playwright being “flaky with auth.” Playwright wasn’t the problem.

The first clue was in the CI history — four consecutive runs, each failing on different authenticated tests:

CI history — 4 consecutive nightly runs (nightly-regression pipeline)

Run #1204  Mar 02  258 passed, 2 failed
  ✗ account-management > update billing address
  ✗ dashboard > load account summary widget

Run #1205  Mar 03  256 passed, 4 failed
  ✗ profile > change notification preferences
  ✗ checkout > apply promo code to subscription
  ✗ billing > download invoice PDF
  ✗ dashboard > verify usage chart renders

Run #1206  Mar 04  259 passed, 1 failed
  ✗ account-management > cancel add-on service

Run #1207  Mar 05  257 passed, 3 failed
  ✗ billing > update payment method
  ✗ profile > upload avatar image
  ✗ checkout > upgrade plan tier

The pattern is invisible if you look at any single run. Line up four runs and it jumps out: every failure is an authenticated flow, no two runs fail on the same test, and every single one passes on retry. That’s not flakiness — that’s contention.

I pulled up the Playwright Trace Viewer on three consecutive failures to confirm. The network timelines told the whole story:

  1. Test A (Worker 1) logs in as testuser@corp.com at T+0ms
  2. Test B (Worker 3) logs in as testuser@corp.com at T+200ms
  3. Test A’s session token gets invalidated when Test B authenticates
  4. Test A tries to load the dashboard at T+500ms — gets a 401, redirected to login

The authentication service enforced single-session. When Test B logged in with the same credentials, it killed Test A’s session. Both tests were correct. The infrastructure was correct. The problem was that all 260 tests shared a single test user account.

Trace Viewer timeline — simplified

Worker 1: login(testuser) --> 200 OK ----------------> GET /dashboard --> 401 Unauthorized
Worker 3: ------> login(testuser) --> 200 OK --------> GET /dashboard --> 200 OK
                  ^
                  Worker 3's login invalidates Worker 1's session here
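The single-session behavior is simple enough to model. Here is a toy sketch of the race (hypothetical class and names, not our actual auth service):

```typescript
// Toy single-session auth: a new login for the same account
// invalidates any existing token (the race our suite kept losing)
class SingleSessionAuth {
  private active = new Map<string, string>(); // email -> current token

  login(email: string): string {
    const token = `${email}:${Date.now()}:${Math.random()}`;
    this.active.set(email, token); // any older token for this email is now dead
    return token;
  }

  isValid(email: string, token: string): boolean {
    return this.active.get(email) === token;
  }
}

const auth = new SingleSessionAuth();
const workerOneToken = auth.login('testuser@corp.com');   // Test A, T+0ms
const workerThreeToken = auth.login('testuser@corp.com'); // Test B, T+200ms

// Test A's next request now fails exactly like the 401 in the trace
console.log(auth.isValid('testuser@corp.com', workerOneToken));
console.log(auth.isValid('testuser@corp.com', workerThreeToken));
```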

Why Shared Test Users Break Everything

This anti-pattern is everywhere. A team creates one test account, hardcodes the credentials, and it works fine in sequential execution. The moment you add parallel workers, you’ve introduced a race condition into your test infrastructure.

The failures aren’t limited to session conflicts. Shared test users cause at least three categories of non-deterministic failure:

1. Session/auth token invalidation. The one we hit. Most auth systems enforce single-session or rotate tokens on new login. Two workers logging in with the same credentials means one always loses.

2. Data contention. Test A creates an order for testuser. Test B queries orders for testuser and finds unexpected data. Test A deletes the order. Test B tries to verify the order and it’s gone.

3. State pollution. Test A changes the user’s profile to “Ontario.” Test B expects the default “British Columbia.” Both tests are correct in isolation, both fail unpredictably in parallel.
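Category 2 is just as easy to reproduce in miniature. A toy order store with a shared account (all names hypothetical):

```typescript
// Toy order store showing data contention on a shared test account
const ordersByUser = new Map<string, string[]>();

function createOrder(email: string, orderId: string): void {
  const existing = ordersByUser.get(email) ?? [];
  ordersByUser.set(email, [...existing, orderId]);
}

function listOrders(email: string): string[] {
  return ordersByUser.get(email) ?? [];
}

// Both "tests" share testuser, so Test B's clean-slate assumption breaks:
createOrder('testuser@corp.com', 'A-100');             // Test A, worker 1
const whatTestBSees = listOrders('testuser@corp.com'); // Test B expected [], sees ['A-100']
```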

The Fix: Test User Isolation

The principle is simple — every parallel worker gets its own isolated test user. The implementation depends on your constraints.

Option 1: Pre-Created User Pool

Create a pool of test users ahead of time and assign one per worker. This is the fastest path if your user provisioning is complex or slow.

src/test/java/core/TestUserPool.java
public class TestUserPool {

    // One user per parallel worker — never shared
    private static final String[] USERS = {
        "testuser-w0@corp.com",
        "testuser-w1@corp.com",
        "testuser-w2@corp.com",
        "testuser-w3@corp.com"
    };

    public static String forWorker(int workerIndex) {
        return USERS[workerIndex % USERS.length];
    }
}
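One caveat with the modulo mapping: if you ever run more workers than pool users, the wrap-around silently reintroduces sharing. The same mapping sketched in TypeScript makes the hazard visible:

```typescript
// Same modulo mapping as the pool above; note the wrap-around hazard
const users = [
  'testuser-w0@corp.com',
  'testuser-w1@corp.com',
  'testuser-w2@corp.com',
  'testuser-w3@corp.com',
];

const forWorker = (workerIndex: number): string => users[workerIndex % users.length];

console.log(forWorker(1)); // testuser-w1@corp.com
console.log(forWorker(4)); // testuser-w0@corp.com: worker 4 now shares with worker 0!
```

If your worker count can grow, size the pool from the same config value, or fail fast when workerIndex exceeds the pool size.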

For Playwright specifically, you can use the built-in workerIndex:

tests/fixtures.ts
import { test as base } from '@playwright/test';

export const test = base.extend<{ testUser: string }>({
  testUser: async ({}, use, workerInfo) => {
    const email = `testuser-w${workerInfo.workerIndex}@corp.com`;
    await use(email);
  },
});

Option 2: Dynamic User Provisioning

If your system supports it, create a fresh user per test or per worker via API. More overhead, but zero chance of collision.

src/test/java/core/TestUserFactory.java
import java.util.UUID;

public class TestUserFactory {

    private static final UserApi userApi = new UserApi(); // your provisioning client
    private static final String DEFAULT_PASSWORD = "Test1234!";

    public static TestUser create() {
        // Unique per invocation — UUID eliminates collisions
        String email = "auto-" + UUID.randomUUID() + "@test.corp.com";
        return userApi.createUser(email, DEFAULT_PASSWORD);
    }
}
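The same idea translates directly to TypeScript using Node's built-in crypto module. A sketch, assuming your own user-creation API sits behind it:

```typescript
import { randomUUID } from 'node:crypto';

// Unique per invocation: collision-free even across parallel workers
function freshTestEmail(): string {
  return `auto-${randomUUID()}@test.corp.com`;
}

console.log(freshTestEmail() !== freshTestEmail()); // two calls, two distinct users
```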

Option 3: Worker-Scoped Setup (Best of Both Worlds)

Create the user once per worker, reuse it across all tests on that worker, and tear it down after. This is what we ended up using — it balances isolation with performance.

tests/fixtures.ts
// 'api' is your API client module; 'TestUser' is its user type
export const test = base.extend<{}, { workerUser: TestUser }>({
  workerUser: [async ({}, use, workerInfo) => {
    // Created once per worker, torn down after all its tests
    const user = await api.createUser({
      email: `worker-${workerInfo.workerIndex}-${Date.now()}@test.corp.com`,
      password: 'Test1234!',
    });
    await use(user);
    await api.deleteUser(user.id);
  }, { scope: 'worker' }],
});

CI Retry Rate

Before: 15% of tests needed retries
After: 0.4% retry rate (only genuine infrastructure flakes)
Net: a 97% reduction

The Organizational Argument

The technical fix took two days. Convincing the team to prioritize it took two weeks. Here’s the argument that worked:

Every retry is an admission that your test suite is giving you unreliable answers. If QA can’t trust the suite, neither can anyone gating releases on it. That means manual verification creeps back in “just to be safe.” A test that lies about its confidence level is worse than no test — and a test that sometimes fails for infrastructure reasons is a test nobody trusts even when it fails for real reasons.

Our team was spending roughly 3 hours per week investigating “flaky” failures that turned out to be retry-masked race conditions. After the fix, that dropped to near zero. But the bigger win was cultural — the team stopped treating test failures as noise and started treating them as signal again.

Reframe: Signal, Not Noise

The next time a test fails intermittently, resist the urge to add a retry. Instead, ask three questions:

  1. Does it pass in isolation but fail in parallel? You have shared state. Check for thread safety violations — shared drivers, shared users, shared data.
  2. Does the error reference data from a different test? Two workers are contending over the same resource.
  3. Does the failure rate scale with parallelism? More workers means more collisions — that is shared state, almost without exception.

Your flaky test is trying to tell you something. The retry is just making sure you never hear it.
