Halmurat T.

Thread Safety in Parallel Test Execution: The Bug That Took Us 3 Days to Find

We had 800 tests running in parallel across 4 threads on a large retail platform. Once a week — never more, never less — a handful of tests would fail with assertion data that didn’t match any test case. A test verifying a product name would assert on a completely different product. A login test would screenshot a dashboard that belonged to a different test’s user. Every failing test passed when rerun individually. It took us three days to find the root cause: our WebDriver instances were leaking between threads.

The Symptoms That Should Scare You

  • Tests pass solo but fail in parallel. This is the number one signal. If a test passes with --threads 1 but fails with --threads 4, the test logic isn’t the problem — shared state is.
  • Assertion values from a different test. You’re asserting on “Wireless Headphones” but the actual value is “Running Shoes.” That data belongs to a test running on another thread.
  • Screenshots show the wrong page. Your test failed on the checkout page, but the failure screenshot shows a product listing page. The WebDriver instance was shared, and another thread navigated away.
  • Failures cluster around test count thresholds. We noticed failures only happened when the suite had 700+ tests. With fewer tests, the thread contention was too brief to cause visible issues. This is why thread safety bugs slip through for months before they’re caught.
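
The “wrong data” and “wrong page” symptoms can be reproduced without a browser. The sketch below is a minimal, JDK-only illustration, not our real framework: `FakeBrowser`, `SharedStateDemo`, and the page paths are hypothetical stand-ins, and the latches force the unlucky interleaving that normally happens only by chance.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a WebDriver: it only remembers the last page opened.
class FakeBrowser {
    private volatile String currentPage = "about:blank";
    void open(String page) { currentPage = page; }
    String currentPage() { return currentPage; }
}

class SharedStateDemo {
    // Runs two "tests" on two threads and returns the page that test A observes.
    // With shared == true both tests use one browser, as with a plain static field.
    static String pageSeenByTestA(boolean shared) {
        FakeBrowser browserA = new FakeBrowser();
        FakeBrowser browserB = shared ? browserA : new FakeBrowser();
        CountDownLatch aNavigated = new CountDownLatch(1);
        CountDownLatch bNavigated = new CountDownLatch(1);
        String[] seen = new String[1];

        Thread testA = new Thread(() -> {
            browserA.open("/products/wireless-headphones");
            aNavigated.countDown();
            await(bNavigated); // test B navigates before test A checks the page
            seen[0] = browserA.currentPage();
        });
        Thread testB = new Thread(() -> {
            await(aNavigated);
            browserB.open("/products/running-shoes");
            bNavigated.countDown();
        });
        testA.start();
        testB.start();
        join(testA);
        join(testB);
        return seen[0];
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared browser:   " + pageSeenByTestA(true));
        System.out.println("isolated browser: " + pageSeenByTestA(false));
    }
}
```

With a shared browser, test A ends up looking at the running shoes page that test B opened. With one browser per test, it sees its own headphones page.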

The 3 Things That Must Be Thread-Local

In any parallel test execution framework — TestNG, JUnit 5, or Playwright’s built-in parallelism — three categories of state absolutely cannot be shared across threads.

1. The Browser/WebDriver Instance

This is the most common violation and the one that bit us. If two threads share a WebDriver instance, one thread’s navigate() call affects what the other thread sees.

src/test/java/core/DriverFactory.java
// BAD: shared static field, all threads use the same driver
public class DriverFactory {
    private static WebDriver driver; // Every thread reads and writes this

    public static WebDriver getDriver() {
        if (driver == null) {
            driver = new ChromeDriver();
        }
        return driver;
    }
}

The fix is ThreadLocal, which gives each thread its own isolated instance:

src/test/java/core/DriverFactory.java
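// GOOD: each thread gets its own driver instance
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class DriverFactory {
    private static final ThreadLocal<WebDriver> driverThread = new ThreadLocal<>();

    // Lazily creates the driver for the calling thread on first use
    public static WebDriver getDriver() {
        if (driverThread.get() == null) {
            driverThread.set(new ChromeDriver());
        }
        return driverThread.get();
    }

    // CRITICAL: clean up after each test to prevent memory leaks.
    // remove() runs even if quit() throws, so a reused worker thread
    // never hands a dead driver to the next test.
    public static void quitDriver() {
        WebDriver driver = driverThread.get();
        if (driver != null) {
            try {
                driver.quit();
            } finally {
                driverThread.remove();
            }
        }
    }
}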
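
Why the cleanup in quitDriver() matters: parallel runners reuse their worker threads, so a ThreadLocal value set by one test is still there when the next test lands on the same thread. The following is a rough sketch of that effect using only the JDK. `StubDriver`, `StubDriverFactory`, and `DriverLifecycleDemo` are hypothetical names that mirror the shape of the factory above so the example runs without Selenium.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for WebDriver so the sketch runs without Selenium.
class StubDriver {
    volatile boolean closed = false;
    void quit() { closed = true; }
}

class StubDriverFactory {
    private static final ThreadLocal<StubDriver> DRIVER = new ThreadLocal<>();

    static StubDriver getDriver() {
        StubDriver driver = DRIVER.get();
        if (driver == null) {
            driver = new StubDriver();
            DRIVER.set(driver);
        }
        return driver;
    }

    static void quitDriver() {
        StubDriver driver = DRIVER.get();
        if (driver != null) {
            driver.quit();
            DRIVER.remove();
        }
    }
}

class DriverLifecycleDemo {
    // Runs two tests back to back on ONE worker thread and reports whether
    // the second test received the same driver object as the first.
    static boolean secondTestReusesDriver(boolean cleanUp) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<StubDriver> test = () -> {
                try {
                    return StubDriverFactory.getDriver();
                } finally {
                    if (cleanUp) {
                        StubDriverFactory.quitDriver();
                    }
                }
            };
            StubDriver first = worker.submit(test).get();
            StubDriver second = worker.submit(test).get();
            return first == second;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup, driver reused: " + secondTestReusesDriver(false));
        System.out.println("with cleanup, driver reused:    " + secondTestReusesDriver(true));
    }
}
```

Without cleanup, the second test inherits the first test’s driver, along with its cookies, session, and current page. In TestNG, the cleanup call typically lives in an @AfterMethod(alwaysRun = true) hook so it also runs when the test fails.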

2. Test Data

If your tests create data during execution — a user account, an order, a temporary file — that data must be scoped to the thread or the test. Two tests creating a user with the same email on different threads will collide.

src/test/java/data/TestDataFactory.java
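// GOOD: unique test data per call, tagged with the thread ID
import java.util.concurrent.atomic.AtomicLong;

public class TestDataFactory {
    // Thread ID plus timestamp alone can repeat when one thread asks twice in
    // the same millisecond, so a process-wide counter is appended as a tiebreaker.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    private static String uniqueSuffix() {
        return Thread.currentThread().getId()
                + "-" + System.currentTimeMillis()
                + "-" + SEQUENCE.incrementAndGet();
    }

    public static String uniqueEmail() {
        return "test-" + uniqueSuffix() + "@example.com";
    }

    public static String uniqueUsername() {
        return "user-" + uniqueSuffix();
    }
}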

On the retail platform, our thread safety bug was compounded by test data collision. Two threads creating a cart for “testuser@example.com” meant one thread’s cart got the other thread’s products. Making emails unique per thread eliminated an entire class of phantom failures.
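
A cheap way to gain confidence in any unique-data helper is to hammer it from several threads and count the distinct values. The sketch below assumes a hypothetical generator, `UniqueData.email()`, built from a per-run prefix and a process-wide counter. It is an illustration of the check, not the factory shown earlier.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical generator: a per-run prefix plus a process-wide counter.
class UniqueData {
    // The prefix makes collisions with data left over from earlier runs unlikely.
    private static final String RUN_ID = Long.toString(System.currentTimeMillis(), 36);
    // The counter guarantees uniqueness within this JVM, regardless of thread.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    static String email() {
        return "test-" + RUN_ID + "-" + SEQUENCE.incrementAndGet() + "@example.com";
    }
}

class UniquenessCheck {
    // Generates threads * perThread emails concurrently and returns how many
    // distinct values came back. Anything below the total means a collision.
    static int distinctEmails(int threads, int perThread) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(UniqueData.email());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("generator threads timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct emails: " + distinctEmails(4, 1000) + " of 4000");
    }
}
```

The same check works for usernames, order references, or temp file names: if the distinct count is lower than the number of values requested, two tests can end up fighting over the same record.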

3. Reporting Context

If you’re using Extent Reports or a similar thread-aware reporting library, the test context must be thread-local. We covered this in detail in our squad tagging implementation for Extent Reports, where ThreadLocal<ExtentTest> ensures each thread’s results are attributed correctly.

src/test/java/reporting/ReportManager.java
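import com.aventstack.extentreports.ExtentReports;
import com.aventstack.extentreports.ExtentTest;

public class ReportManager {
    // One report for the whole run; reporters are assumed to be attached elsewhere
    private static final ExtentReports extent = new ExtentReports();
    // Each thread tracks the report entry of the test it is currently running
    private static final ThreadLocal<ExtentTest> testThread = new ThreadLocal<>();

    public static void startTest(String name) {
        ExtentTest test = extent.createTest(name);
        testThread.set(test);
    }

    public static ExtentTest getTest() {
        return testThread.get();
    }
}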

Without this, test logs from thread 1 bleed into thread 2’s report entry. The result is a report where the failure logs don’t match the test that actually failed — which sends your team on a debugging detour.
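
To make the attribution idea concrete without pulling in a reporting library, here is a simplified, JDK-only sketch. `ReportContext` and `ReportDemo` are hypothetical names and do not reflect the Extent Reports API. The point is only that each thread resolves “the current test” through its own ThreadLocal slot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified report: log lines grouped by test name.
class ReportContext {
    // Shared across threads, so it must be a concurrent map.
    private static final Map<String, List<String>> REPORTS = new ConcurrentHashMap<>();
    // Each thread remembers which test it is currently running.
    private static final ThreadLocal<String> CURRENT_TEST = new ThreadLocal<>();

    static void startTest(String name) {
        CURRENT_TEST.set(name);
        REPORTS.put(name, new ArrayList<>());
    }

    static void log(String message) {
        // Only the owning thread appends to this list, so ArrayList is enough here.
        REPORTS.get(CURRENT_TEST.get()).add(message);
    }

    static void endTest() {
        CURRENT_TEST.remove();
    }

    static List<String> linesFor(String name) {
        return REPORTS.get(name);
    }
}

class ReportDemo {
    // Runs two tests on two threads, each writing 100 lines to "its" report entry.
    static void runBothTests() {
        Thread login = new Thread(() -> runTest("login", "opened /login"));
        Thread checkout = new Thread(() -> runTest("checkout", "opened /checkout"));
        login.start();
        checkout.start();
        try {
            login.join();
            checkout.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void runTest(String name, String line) {
        ReportContext.startTest(name);
        try {
            for (int i = 0; i < 100; i++) {
                ReportContext.log(line);
            }
        } finally {
            ReportContext.endTest();
        }
    }

    public static void main(String[] args) {
        runBothTests();
        System.out.println("login lines:    " + ReportContext.linesFor("login").size());
        System.out.println("checkout lines: " + ReportContext.linesFor("checkout").size());
    }
}
```

If `CURRENT_TEST` were a plain static String instead, whichever test started last would receive the log lines from both threads.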

TestNG’s Parallel Modes and What Each Means for Shared State

TestNG offers several parallel modes (methods, classes, tests, and instances). The three most common ones are listed below, and each one changes what’s safe to share:

| Mode    | What Runs in Parallel           | Safe to Share Across Tests?                                         |
|---------|---------------------------------|---------------------------------------------------------------------|
| methods | Individual test methods         | Nothing: each method may run on any thread                          |
| classes | Test classes                    | Instance fields are safe within a class, not across classes         |
| tests   | <test> blocks from testng.xml   | Tests within a block are sequential; across blocks, nothing is safe |

testng.xml
<suite name="Regression" parallel="methods" thread-count="4">
    <!-- parallel="methods" is the most aggressive mode and requires full ThreadLocal discipline -->
    <test name="AllTests">
        <classes>
            <class name="tests.LoginTests"/>
            <class name="tests.CheckoutTests"/>
            <class name="tests.SearchTests"/>
        </classes>
    </test>
</suite>

Most teams I work with use parallel="methods" because it gives the best speed improvement. But it’s also the mode that’s least forgiving of shared state. If you’re getting intermittent failures with methods, try switching to classes temporarily. If failures disappear, your problem is shared state between methods — and you need more ThreadLocal.

How to Verify Your Suite Is Actually Thread-Safe

Passing once in parallel doesn’t prove thread safety. Thread safety bugs are probabilistic — they depend on timing, CPU load, and which tests happen to run on the same thread. Here’s how to actually verify:

Terminal
# Run the suite 10 times in parallel. A failure that does not reproduce
# single-threaded points to a thread safety issue.
# Surefire's own user properties are -Dparallel and -DthreadCount; use other
# property names only if your pom.xml maps them to those settings.
for i in {1..10}; do
    echo "Run $i of 10"
    if ! mvn test -Dparallel=methods -DthreadCount=4; then
        echo "FAILED on run $i: possible thread safety issue"
        exit 1
    fi
done
echo "All 10 runs passed: suite is likely thread-safe"

On the retail project, a single parallel run usually passed, so one green run told us nothing. When we ran the suite 10 times back to back, it failed on runs 3, 7, and 9, and every failing test passed when rerun on its own. That pattern is the signature of a thread safety bug. If your suite passes 10 consecutive parallel runs, you can be reasonably confident it’s thread-safe, though “reasonably” is doing heavy lifting. We run 20 iterations before major releases.
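
A back-of-the-envelope calculation shows why one green run proves so little. The sketch below assumes each run fails independently with a fixed probability, which is a simplification, and it treats 3 failures in 10 runs as a rough 30% estimate rather than a measured rate. `FlakeMath` is a hypothetical helper name.

```java
class FlakeMath {
    // Chance that `runs` consecutive runs all pass when each run fails
    // independently with probability `failureRate`.
    static double chanceAllPass(double failureRate, int runs) {
        return Math.pow(1.0 - failureRate, runs);
    }

    // Smallest number of runs that exposes the bug at least once with the
    // given confidence (for example 0.95).
    static int runsNeeded(double failureRate, double confidence) {
        return (int) Math.ceil(Math.log(1.0 - confidence) / Math.log(1.0 - failureRate));
    }

    public static void main(String[] args) {
        System.out.printf("1 run passes with p = %.3f%n", chanceAllPass(0.3, 1));
        System.out.printf("10 runs all pass with p = %.3f%n", chanceAllPass(0.3, 10));
        System.out.println("runs needed at 30% failure rate: " + runsNeeded(0.3, 0.95));
        System.out.println("runs needed at 5% failure rate:  " + runsNeeded(0.05, 0.95));
    }
}
```

Under these assumptions, a bug that fails 30% of runs still lets a single run pass 70% of the time, while 10 clean runs in a row would happen less than 3% of the time. A rarer bug that fails only 5% of runs needs about 59 runs to show up with 95% confidence, which is why 10 passes is evidence rather than proof.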

Parallel test stability: before, random failures at 8+ threads; after, 10/10 consecutive runs passing.

What I’d Do Differently

If I were setting up parallel execution from scratch, I’d make ThreadLocal the default from day one, not something we retrofit after finding bugs. Every base class, factory, and shared utility that holds mutable per-test state would keep it in ThreadLocal storage. The overhead is negligible. The debugging time saved is enormous.

I’d also add a CI gate that runs the suite 5 times in parallel on every PR. Finding thread safety issues before merge is far cheaper than finding them in the nightly run when nobody remembers what they changed.

Understanding coupling and why loose coupling matters helps here too — tightly coupled test infrastructure (shared drivers, shared data, shared state) is exactly what makes parallel execution fragile.

Your Next Step

Check your WebDriver factory. Is the driver stored in a plain static field or in a ThreadLocal? If it’s a plain static field, wrap it in a ThreadLocal this week, even if you’re not running tests in parallel yet. When you eventually turn on parallel execution (and you will, because a 45-minute suite demands it), you’ll thank yourself for having the foundation already in place.

If your suite already runs in parallel, run it 10 times in a row. If even one run fails and the failure doesn’t reproduce single-threaded, you almost certainly have a thread safety bug. The fix is almost always ThreadLocal. The hard part is finding which shared state is leaking, and now you know the three places to look first.
