The Flaky Test Isn't Flaky — It's a Race Condition

At a large Canadian telecom, we had a Playwright suite — 260 tests, 4 parallel workers — that failed 8-12 times per run. Always different tests. Always passing on retry. The team had already added `retries: 2` to the Playwright config and moved on. For six months, nobody questioned it. The retry mask was hiding a real bug that customers were hitting in production.
Why are test retries an anti-pattern?
Retries mask the root cause of test failures instead of fixing them. They turn a clear signal — “something is wrong” — into comfortable silence, and the underlying bug keeps shipping. Here’s what that looks like in practice.
Here’s the thing about retries: they don’t fix anything. They hide symptoms. And in test automation, hidden symptoms compound.
Every retry is a question you’re choosing not to ask. “Why did this fail?” becomes “did it pass the second time?” and that second question costs you nothing today and everything over time.
I’ve seen teams carry 15-20% retry rates for months. The math is brutal. If you run 300 tests 3 times a day and 15% need a retry, that’s 135 extra test executions daily. At 30 seconds each, you’re burning over an hour of CI time every day on tests that “pass.” Multiply that by a year — that’s 365+ hours of compute time spent re-asking questions you already got honest answers to.
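The cost math above is worth making concrete. A minimal sketch, using the illustrative figures from this section (none of these constants are measurements):

```typescript
// Rough CI-cost model for retry-masked flakiness.
// All inputs are the illustrative numbers from the text, not measured values.
const testsPerRun = 300;
const runsPerDay = 3;
const retryRate = 0.15;      // 15% of tests need a retry
const secondsPerTest = 30;

// Extra executions that exist only to re-ask an already-answered question
const extraExecutionsPerDay = testsPerRun * runsPerDay * retryRate; // 135
const extraHoursPerDay = (extraExecutionsPerDay * secondsPerTest) / 3600;
const extraHoursPerYear = extraHoursPerDay * 365;

console.log(extraExecutionsPerDay);         // 135 extra executions per day
console.log(extraHoursPerDay);              // 1.125 hours of CI time per day
console.log(Math.round(extraHoursPerYear)); // ~411 hours per year
```

Plug in your own suite size and retry rate; the shape of the curve is what matters, since every term scales linearly with the retry rate you tolerate.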
But the CI time isn’t even the real cost. The real cost is trust erosion. Once a team accepts that “some tests are just flaky,” every legitimate failure gets the benefit of the doubt. I’ve watched teams retry a genuine regression four times before someone actually read the error message. That’s the compound interest on technical debt.
What does a race condition look like in test automation?
A race condition in test automation occurs when two or more parallel test workers access shared state — like a test user account, a database record, or a session token — without coordination. The result is non-deterministic failures that pass on retry, making them look “flaky” when the real problem is a concurrency conflict in your test infrastructure.
A race condition is two operations fighting over the same resource with no coordination on who goes first. In application code, it’s two threads writing to the same variable. In test automation, it’s usually subtler — two tests sharing state they shouldn’t be sharing.
The symptoms are always the same:
- Passes in isolation, fails in parallel
- Failures are non-deterministic — different tests, different runs
- Error messages don’t match the test logic (wrong user, wrong data, unexpected state)
- Failure rate increases when you add more parallel workers
If that list describes your suite, you don’t have flaky tests. You have shared state leaking between parallel executions.
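You can confirm the diagnosis from the command line before reading a single trace. These are standard Playwright CLI flags; the spec path is a hypothetical example:

```shell
# 1. Run the suspect test alone, many times. A shared-state race
#    almost never reproduces with a single worker.
npx playwright test tests/billing.spec.ts --workers=1 --repeat-each=20

# 2. Re-run the full suite at increasing parallelism.
#    If the failure count climbs with the worker count, you have shared state.
npx playwright test --workers=2
npx playwright test --workers=8
```

If step 1 is green twenty times in a row and step 2 fails more often at 8 workers than at 2, stop hunting for "flakiness" and start hunting for the shared resource.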
The War Story: A “Flaky” Login Test
Our failing tests had no pattern — sometimes a checkout test, sometimes a profile update, sometimes a simple dashboard load. The only commonality was that they all involved authenticated flows. The team had accepted it as Playwright being “flaky with auth.” Playwright wasn’t the problem.
The first clue was in the CI history — four consecutive runs, each failing on different authenticated tests:
```
Run #1204  Mar 02  258 passed  2 failed
  ✗ account-management > update billing address
  ✗ dashboard > load account summary widget

Run #1205  Mar 03  256 passed  4 failed
  ✗ profile > change notification preferences
  ✗ checkout > apply promo code to subscription
  ✗ billing > download invoice PDF
  ✗ dashboard > verify usage chart renders

Run #1206  Mar 04  259 passed  1 failed
  ✗ account-management > cancel add-on service

Run #1207  Mar 05  257 passed  3 failed
  ✗ billing > update payment method
  ✗ profile > upload avatar image
  ✗ checkout > upgrade plan tier
```

The pattern is invisible if you look at any single run. Line up four runs and it jumps out: every failure is an authenticated flow, no two runs fail on the same test, and every single one passes on retry. That’s not flakiness — that’s contention.
I pulled up the Playwright Trace Viewer on three consecutive failures to confirm. The network timelines told the whole story:
- Test A (Worker 1) logs in as `testuser@corp.com` at `T+0ms`
- Test B (Worker 3) logs in as `testuser@corp.com` at `T+200ms`
- Test A’s session token gets invalidated when Test B authenticates
- Test A tries to load the dashboard at `T+500ms` — gets a 401, redirected to login
The authentication service enforced single-session. When Test B logged in with the same credentials, it killed Test A’s session. Both tests were correct. The infrastructure was correct. The problem was that all 260 tests shared a single test user account.
```
Worker 1: login(testuser) -----> 200 OK -----> GET /dashboard -----> 401 Unauthorized
Worker 3: login(testuser) -----> 200 OK -----> GET /dashboard -----> 200 OK
                                 ^
                                 Session invalidated here
```

Why do shared test users break parallel test execution?
Shared test users break parallel execution because most authentication systems enforce single-session policies or rotate tokens on login. When two parallel workers authenticate with the same credentials, one worker’s session gets invalidated by the other, causing unpredictable 401 errors that look like flaky tests.
This anti-pattern is everywhere. A team creates one test account, hardcodes the credentials, and it works fine in sequential execution. The moment you add parallel workers, you’ve introduced a race condition into your test infrastructure.
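In code, the anti-pattern is usually a single hardcoded account that every spec imports. A minimal sketch with hypothetical names (`TEST_USER`, `userForWorker`):

```typescript
// Anti-pattern: one shared account, hardcoded and imported by every spec.
export const TEST_USER = { email: 'testuser@corp.com', password: 'Hunter2!' };

// Looks like a per-worker lookup, but every worker resolves to the same
// credentials. Under a single-session auth policy, whichever worker logs
// in second silently kills the first worker's session.
export function userForWorker(_workerIndex: number): string {
  return TEST_USER.email; // the worker index is ignored; this is the race
}
```

Sequentially this is harmless, which is exactly why it survives code review: the collision only exists once two workers call `userForWorker` at overlapping times.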
The failures aren’t limited to session conflicts. Shared test users cause at least three categories of non-deterministic failure:
1. Session/auth token invalidation. The one we hit. Most auth systems enforce single-session or rotate tokens on new login. Two workers logging in with the same credentials means one always loses.
2. Data contention. Test A creates an order for testuser. Test B queries orders for testuser and finds unexpected data. Test A deletes the order. Test B tries to verify the order and it’s gone.
3. State pollution. Test A changes the user’s profile to “Ontario.” Test B expects the default “British Columbia.” Both tests are correct in isolation, both fail unpredictably in parallel.
How do you fix race conditions in parallel test suites?
The fix is test user isolation — every parallel worker gets its own dedicated test user account so no two workers compete for the same session, data, or state. There are three common implementation patterns depending on your provisioning constraints.
Race conditions from shared state are preventable with a controlled test environment: dedicated test users, seeded data, and isolated infrastructure. The principle is simple — no two parallel workers ever touch the same account. The implementation depends on your constraints.
Option 1: Pre-Created User Pool
Create a pool of test users ahead of time and assign one per worker. This is the fastest path if your user provisioning is complex or slow.
```java
public class TestUserPool {
    // One user per parallel worker — never shared
    private static final String[] USERS = {
        "testuser-w0@corp.com",
        "testuser-w1@corp.com",
        "testuser-w2@corp.com",
        "testuser-w3@corp.com"
    };

    public static String forWorker(int workerIndex) {
        return USERS[workerIndex % USERS.length];
    }
}
```

For Playwright specifically, you can use the built-in `workerIndex`:
```typescript
import { test as base } from '@playwright/test';

export const test = base.extend<{ testUser: string }>({
  testUser: async ({}, use, workerInfo) => {
    const email = `testuser-w${workerInfo.workerIndex}@corp.com`;
    await use(email);
  },
});
```

Option 2: Dynamic User Provisioning
If your system supports it, create a fresh user per test or per worker via API. More overhead, but zero chance of collision.
```java
public class TestUserFactory {
    public static TestUser create() {
        // Unique per invocation — UUID eliminates collisions
        String email = "auto-" + UUID.randomUUID() + "@test.corp.com";
        return userApi.createUser(email, DEFAULT_PASSWORD);
    }
}
```

Option 3: Worker-Scoped Setup (Best of Both Worlds)
Create the user once per worker, reuse it across all tests on that worker, and tear it down after. This is what we ended up using — it balances isolation with performance.
```typescript
export const test = base.extend<{}, { workerUser: TestUser }>({
  workerUser: [
    async ({}, use, workerInfo) => {
      // Created once per worker, torn down after all its tests
      const user = await api.createUser({
        email: `worker-${workerInfo.workerIndex}-${Date.now()}@test.corp.com`,
        password: 'Test1234!',
      });
      await use(user);
      await api.deleteUser(user.id);
    },
    { scope: 'worker' },
  ],
});
```

- Before: 15% of tests needed retries
- After: 0.4% retry rate (actual infra flakes)
The Organizational Argument
The technical fix took two days. Convincing the team to prioritize it took two weeks. Here’s the argument that worked:
Every retry is an admission that your test suite is giving you unreliable answers. If QA can’t trust the suite, neither can anyone gating releases on it. That means manual verification creeps back in “just to be safe.” A test that lies about its confidence level is worse than no test — and a test that sometimes fails for infrastructure reasons is a test nobody trusts even when it fails for real reasons.
Our team was spending roughly 3 hours per week investigating “flaky” failures that turned out to be retry-masked race conditions. After the fix, that dropped to near zero. But the bigger win was cultural — the team stopped treating test failures as noise and started treating them as signal again.
How do you tell if a flaky test is really a race condition?
Flaky tests that are actually race conditions follow a distinct pattern: they pass in isolation, fail non-deterministically in parallel, and the failure rate scales with the number of parallel workers. If those three conditions are true, you’re dealing with shared state, not test instability.
The next time a test fails intermittently, resist the urge to add a retry. Instead, ask three questions:
- Does it pass in isolation but fail in parallel? You have shared state. Check for thread safety violations — shared drivers, shared users, shared data.
- Does the error reference data from a different test? Two workers are contending over the same resource.
- Does the failure rate scale with parallelism? More workers = more failures = shared state, guaranteed.
Your flaky test is trying to tell you something. The retry is just making sure you never hear it.
Frequently Asked Questions
How do I know if my flaky test is caused by a race condition?
If the test passes when run in isolation but fails non-deterministically in parallel, and the failure rate increases as you add more parallel workers, it is almost certainly a race condition caused by shared state — typically shared test user accounts, shared database records, or shared session tokens.
What is the best way to isolate test users in Playwright parallel execution?
The most effective approach is worker-scoped test user fixtures. Create a unique test user per Playwright worker using workerInfo.workerIndex or a UUID-based factory, reuse it across all tests on that worker, and tear it down after. This balances isolation with performance — you avoid session conflicts without the overhead of creating a new user for every single test.
Should I remove retries entirely from my Playwright config?
Not necessarily. A small retry count (1, not 2-3) can handle genuine infrastructure flakes like network timeouts or CI runner hiccups. The problem is using retries to mask a high failure rate. If more than 1-2% of your tests need retries to pass, you have a systematic issue — likely shared state or a race condition — that retries are hiding.
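That policy can be expressed directly in the config: a one-retry budget on CI to absorb genuine infrastructure flakes, and zero retries locally so failures stay loud. A sketch of a standard `playwright.config.ts`:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // One retry on CI absorbs runner hiccups and network timeouts without
  // masking a systematic failure rate. Locally, zero retries: a failure
  // should stop you, not silently re-run.
  retries: process.env.CI ? 1 : 0,
  // Pin parallelism on CI so failure rates are comparable run to run.
  workers: process.env.CI ? 4 : undefined,
});
```

Pairing the retry budget with a fixed CI worker count also keeps your retry-rate metric stable, so a creep past the 1-2% threshold is visible instead of hidden by varying parallelism.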
Can race conditions in test suites hide real production bugs?
Yes. Race conditions in your test infrastructure can mask real application bugs because the team learns to dismiss intermittent failures as “flaky tests.” If a genuine regression causes a test to fail, the retry passes it silently, and the bug ships to production. This is exactly what happened in the case study above — the single-session enforcement that broke our tests was also affecting real customers with multiple browser tabs.
