The Flaky Test Isn't Flaky — It's a Race Condition
At a large Canadian telecom, we had a Playwright suite — 260 tests, 4 parallel workers — that failed 8-12 times per run. Always different tests. Always passing on retry. The team had already added `retries: 2` to the Playwright config and moved on. For six months, nobody questioned it. The retries were masking a real bug that customers were hitting in production.
The @Retry Anti-Pattern
Here’s the thing about retries: they don’t fix anything. They hide symptoms. And in test automation, hidden symptoms compound.
Every retry is a question you’re choosing not to ask. “Why did this fail?” becomes “did it pass the second time?” and that second question costs you nothing today and everything over time.
I’ve seen teams carry 15-20% retry rates for months. The math is brutal. If you run 300 tests 3 times a day and 15% need a retry, that’s 135 extra test executions daily. At 30 seconds each, you’re burning over an hour of CI time every day on tests that “pass.” Multiply that by a year — that’s 365+ hours of compute time spent re-asking questions you already got honest answers to.
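That arithmetic is worth sanity-checking. Here's a quick sketch using the figures above (the suite size, run frequency, and retry rate are this team's numbers, not universal constants):

```typescript
// Back-of-the-envelope cost of a 15% retry rate, using the figures above.
const testsPerRun = 300;
const runsPerDay = 3;
const retryRate = 0.15;       // fraction of executions that need one retry
const secondsPerTest = 30;

const executionsPerDay = testsPerRun * runsPerDay;           // 900
const extraExecutionsPerDay = executionsPerDay * retryRate;  // 135
const extraMinutesPerDay =
  (extraExecutionsPerDay * secondsPerTest) / 60;             // 67.5 min of CI per day
const extraHoursPerYear = (extraMinutesPerDay * 365) / 60;   // ~410 hours per year

console.log(extraExecutionsPerDay, extraMinutesPerDay, Math.round(extraHoursPerYear));
```

The "over an hour a day" and "365+ hours a year" claims both fall straight out of the multiplication.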
But the CI time isn’t even the real cost. The real cost is trust erosion. Once a team accepts that “some tests are just flaky,” every legitimate failure gets the benefit of the doubt. I’ve watched teams retry a genuine regression four times before someone actually read the error message. That’s the compound interest on technical debt.
What a Race Condition Actually Looks Like
A race condition is two operations fighting over the same resource with no coordination on who goes first. In application code, it’s two threads writing to the same variable. In test automation, it’s usually subtler — two tests sharing state they shouldn’t be sharing.
The symptoms are always the same:
- Passes in isolation, fails in parallel
- Failures are non-deterministic — different tests, different runs
- Error messages don’t match the test logic (wrong user, wrong data, unexpected state)
- Failure rate increases when you add more parallel workers
If that list describes your suite, you don’t have flaky tests. You have shared state leaking between parallel executions.
The War Story: A “Flaky” Login Test
Our failing tests had no pattern — sometimes a checkout test, sometimes a profile update, sometimes a simple dashboard load. The only commonality was that they all involved authenticated flows. The team had accepted it as Playwright being “flaky with auth.” Playwright wasn’t the problem.
The first clue was in the CI history — four consecutive runs, each failing on different authenticated tests:
```
Run #1204  Mar 02  258 passed  2 failed
  ✗ account-management > update billing address
  ✗ dashboard > load account summary widget
Run #1205  Mar 03  256 passed  4 failed
  ✗ profile > change notification preferences
  ✗ checkout > apply promo code to subscription
  ✗ billing > download invoice PDF
  ✗ dashboard > verify usage chart renders
Run #1206  Mar 04  259 passed  1 failed
  ✗ account-management > cancel add-on service
Run #1207  Mar 05  257 passed  3 failed
  ✗ billing > update payment method
  ✗ profile > upload avatar image
  ✗ checkout > upgrade plan tier
```

The pattern is invisible if you look at any single run. Line up four runs and it jumps out: every failure is an authenticated flow, no two runs fail on the same test, and every single one passes on retry. That's not flakiness — that's contention.
I pulled up the Playwright Trace Viewer on three consecutive failures to confirm. The network timelines told the whole story:
- Test A (Worker 1) logs in as `testuser@corp.com` at `T+0ms`
- Test B (Worker 3) logs in as `testuser@corp.com` at `T+200ms`
- Test A's session token gets invalidated when Test B authenticates
- Test A tries to load the dashboard at `T+500ms` — gets a 401, redirected to login
The authentication service enforced single-session. When Test B logged in with the same credentials, it killed Test A’s session. Both tests were correct. The infrastructure was correct. The problem was that all 260 tests shared a single test user account.
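The single-session behavior is easy to reproduce in miniature. This is a toy sketch, not the actual auth service (`SingleSessionAuth` is a hypothetical stand-in): a session store that keeps exactly one active token per account, the same policy that killed Test A's session:

```typescript
// Toy single-session auth: one active token per account.
// A second login with the same credentials silently invalidates the first.
class SingleSessionAuth {
  private active = new Map<string, string>(); // email -> current token

  login(email: string): string {
    const token = `tok-${Math.random().toString(36).slice(2)}`;
    this.active.set(email, token); // evicts any previous token for this user
    return token;
  }

  isValid(email: string, token: string): boolean {
    return this.active.get(email) === token;
  }
}

const auth = new SingleSessionAuth();
const workerA = auth.login('testuser@corp.com'); // T+0ms
const workerB = auth.login('testuser@corp.com'); // T+200ms, same account
console.log(auth.isValid('testuser@corp.com', workerA)); // false
console.log(auth.isValid('testuser@corp.com', workerB)); // true
```

Worker A's token comes back invalid the moment Worker B logs in: that invalid token is the 401 in the trace.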
```
Worker 1: login(testuser) -----> 200 OK -----> GET /dashboard -----> 401 Unauthorized
Worker 3:       login(testuser) -----> 200 OK -----> GET /dashboard -----> 200 OK
                      ^ Session invalidated here
```

Why Shared Test Users Break Everything
This anti-pattern is everywhere. A team creates one test account, hardcodes the credentials, and it works fine in sequential execution. The moment you add parallel workers, you’ve introduced a race condition into your test infrastructure.
The failures aren’t limited to session conflicts. Shared test users cause at least three categories of non-deterministic failure:
1. Session/auth token invalidation. The one we hit. Most auth systems enforce single-session or rotate tokens on new login. Two workers logging in with the same credentials means one always loses.
2. Data contention. Test A creates an order for testuser. Test B queries orders for testuser and finds unexpected data. Test A deletes the order. Test B tries to verify the order and it’s gone.
3. State pollution. Test A changes the user’s profile to “Ontario.” Test B expects the default “British Columbia.” Both tests are correct in isolation, both fail unpredictably in parallel.
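Category 3 is the easiest to demonstrate. A hypothetical sketch: two "tests" that are each correct alone but can't both be right when they share one profile record:

```typescript
// Hypothetical shared-profile store: both "tests" operate on the same record.
const profile = { province: 'British Columbia' }; // seeded default

// Test A: updates the shared user's province, then verifies the update.
function testA(): boolean {
  profile.province = 'Ontario';
  return profile.province === 'Ontario';
}

// Test B: assumes the seeded default is still in place.
function testB(): boolean {
  return profile.province === 'British Columbia'; // depends on ordering!
}

console.log(testB()); // true  — but only because B ran before A
console.log(testA()); // true
console.log(testB()); // false — A polluted the state B relies on
```

Run them in the other order, or on two workers at once, and Test B fails with an error message that has nothing to do with Test B's logic. That's exactly the "error messages don't match the test" symptom from earlier.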
The Fix: Test User Isolation
The principle is simple — every parallel worker gets its own isolated test user. The implementation depends on your constraints.
Option 1: Pre-Created User Pool
Create a pool of test users ahead of time and assign one per worker. This is the fastest path if your user provisioning is complex or slow.
```java
public class TestUserPool {
    // One user per parallel worker — never shared
    private static final String[] USERS = {
        "testuser-w0@corp.com",
        "testuser-w1@corp.com",
        "testuser-w2@corp.com",
        "testuser-w3@corp.com"
    };

    public static String forWorker(int workerIndex) {
        return USERS[workerIndex % USERS.length];
    }
}
```

For Playwright specifically, you can use the built-in `workerIndex`:
```typescript
import { test as base } from '@playwright/test';

export const test = base.extend<{ testUser: string }>({
  testUser: async ({}, use, workerInfo) => {
    const email = `testuser-w${workerInfo.workerIndex}@corp.com`;
    await use(email);
  },
});
```

Option 2: Dynamic User Provisioning
If your system supports it, create a fresh user per test or per worker via API. More overhead, but zero chance of collision.
```java
public class TestUserFactory {
    public static TestUser create() {
        // Unique per invocation — UUID eliminates collisions
        String email = "auto-" + UUID.randomUUID() + "@test.corp.com";
        return userApi.createUser(email, DEFAULT_PASSWORD);
    }
}
```

Option 3: Worker-Scoped Setup (Best of Both Worlds)
Create the user once per worker, reuse it across all tests on that worker, and tear it down after. This is what we ended up using — it balances isolation with performance.
```typescript
export const test = base.extend<{}, { workerUser: TestUser }>({
  workerUser: [async ({}, use, workerInfo) => {
    // Created once per worker, torn down after all its tests
    const user = await api.createUser({
      email: `worker-${workerInfo.workerIndex}-${Date.now()}@test.corp.com`,
      password: 'Test1234!',
    });
    await use(user);
    await api.deleteUser(user.id);
  }, { scope: 'worker' }],
});
```

CI Retry Rate
Before: 15% of tests needed retries
After: 0.4% retry rate (actual infra flakes)
The Organizational Argument
The technical fix took two days. Convincing the team to prioritize it took two weeks. Here’s the argument that worked:
Every retry is an admission that your test suite is giving you unreliable answers. If QA can’t trust the suite, neither can anyone gating releases on it. That means manual verification creeps back in “just to be safe.” A test that lies about its confidence level is worse than no test — and a test that sometimes fails for infrastructure reasons is a test nobody trusts even when it fails for real reasons.
Our team was spending roughly 3 hours per week investigating “flaky” failures that turned out to be retry-masked race conditions. After the fix, that dropped to near zero. But the bigger win was cultural — the team stopped treating test failures as noise and started treating them as signal again.
Reframe: Signal, Not Noise
The next time a test fails intermittently, resist the urge to add a retry. Instead, ask three questions:
- Does it pass in isolation but fail in parallel? You have shared state. Check for thread safety violations — shared drivers, shared users, shared data.
- Does the error reference data from a different test? Two workers are contending over the same resource.
- Does the failure rate scale with parallelism? More workers = more failures = shared state, guaranteed.
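If you're on Playwright, the first and third questions can be answered straight from the CLI (the spec path below is a placeholder for your suspect test; `--workers`, `--repeat-each`, and `--retries` are standard Playwright test flags):

```shell
# Question 1: does it pass in isolation? Hammer the suspect spec serially,
# with retries disabled so nothing gets masked.
npx playwright test tests/billing.spec.ts --workers=1 --repeat-each=20 --retries=0

# Question 3: does the failure rate scale with parallelism? Compare worker counts.
npx playwright test --workers=2 --retries=0
npx playwright test --workers=8 --retries=0
```

Rock-solid at `--workers=1` and crumbling at `--workers=8` is shared state, not flakiness.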
Your flaky test is trying to tell you something. The retry is just making sure you never hear it.