Parallel Execution Without the Refactor Tax

Every parallel execution guide I’ve read jumps straight to ThreadLocal, DriverFactory redesigns, and test data isolation patterns. That’s one way to do it — and sometimes the right way. But it’s not the only way. There are actually two levels of parallelism in test automation, and picking the wrong one is why teams spend weeks refactoring frameworks that didn’t need it. Understanding the difference took our suite from 4 hours to 30 minutes — without a framework rewrite.
What Are the Two Levels of Parallelism?
Most teams treat “parallel execution” as a single concept. It’s not. There are two fundamentally different approaches, and they have completely different costs.
Thread-level parallelism runs multiple test methods as separate threads inside a single JVM process. All threads share the same memory space. This is what you get with TestNG’s parallel execution modes or JUnit 5’s junit.jupiter.execution.parallel.enabled=true. It’s fast and efficient, but every piece of shared state — your WebDriver instance, test data, reporting context — becomes a potential race condition. This is the level that demands ThreadLocal, user isolation patterns, and careful framework design.
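For concreteness, here's a sketch of what switching it on looks like in TestNG — the suite name, package, and thread count are illustrative, not prescriptive (in JUnit 5 the equivalent is the `junit.jupiter.execution.parallel.enabled=true` property mentioned above):

```xml
<!-- testng.xml: run test methods on up to 5 threads inside one JVM.
     One line of config buys the speed; thread safety is on you. -->
<suite name="regression" parallel="methods" thread-count="5">
  <test name="all-tests">
    <packages>
      <!-- hypothetical package name -->
      <package name="com.example.tests.*"/>
    </packages>
  </test>
</suite>
```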
Process-level parallelism runs tests in completely separate OS processes. Each process has its own memory space, its own JVM (if applicable), and its own state. Processes can’t accidentally share a WebDriver instance because they physically can’t access each other’s memory. No ThreadLocal needed. No DriverFactory rewrite. The isolation is built into the operating system.
| | Thread-Level | Process-Level |
|---|---|---|
| How it works | Multiple threads in one JVM | Separate OS processes, each with its own JVM |
| Memory | Shared heap — all threads see the same objects | Isolated — processes can’t access each other’s memory |
| Isolation | You enforce it (ThreadLocal, careful design) | The OS enforces it (free) |
| Shared state risk | High — static fields, singletons, and shared refs are all race conditions | None — processes physically can’t share state |
| Framework changes needed | ThreadLocal wrappers, user isolation, reporting context scoping | Minimal — fix test data collisions and implicit ordering |
| Resource overhead | Low — threads are lightweight | Higher — each process loads its own JVM |
| Startup time | Fast — threads spin up in milliseconds | Slower — JVM startup per process |
| Debugging | Harder — race conditions are non-deterministic | Easier — failures are isolated to one process |
| Tools | TestNG parallel=methods, JUnit 5 parallel | Maven forkCount, separate user accounts, Docker containers, BrowserStack SDK |
| Best for | New frameworks designed for thread safety | Existing frameworks that need parallel execution without a rewrite |
Why Does Thread-Level Parallelism Require So Much Refactoring?
Thread-level parallelism requires heavy refactoring because all threads share the same heap memory, meaning any static field, singleton, or shared reference becomes a race condition when accessed from multiple test methods simultaneously. You need ThreadLocal discipline across your entire framework to prevent threads from interfering with each other.
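To make that discipline concrete, here is a minimal sketch of the ThreadLocal pattern the refactor demands. A plain `FakeDriver` class stands in for Selenium's `WebDriver` — an assumption for the sake of a self-contained example; the pattern is identical either way:

```java
// Sketch of the ThreadLocal discipline thread-level parallelism demands.
// FakeDriver is a stand-in for WebDriver; the holder pattern is what matters.
public class DriverHolder {

    static class FakeDriver {
        final String owner = Thread.currentThread().getName();
    }

    // Each thread lazily creates its own instance on first get();
    // two threads can never see the same object through this field.
    private static final ThreadLocal<FakeDriver> DRIVER =
            ThreadLocal.withInitial(FakeDriver::new);

    public static FakeDriver get() {
        return DRIVER.get();
    }

    // Must be called in teardown (e.g. @AfterMethod), or pooled CI threads
    // leak stale driver instances into the next test.
    public static void quit() {
        DRIVER.remove();
    }
}
```

The catch is that this holder is only the first domino: every utility, listener, and reporting hook that used to reach for a static driver field now has to go through it, which is where the weeks go.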
I’ve written about this in detail — the 3-day thread safety bug that turned out to be a shared WebDriver instance leaking between threads. The fix was ThreadLocal, but the investigation and refactoring across the framework took weeks.
The traditional parallel execution path means solving three problems simultaneously:
- Thread-safe driver management — wrapping every `WebDriver` in a `ThreadLocal` so threads don't share browser sessions
- Test data isolation — ensuring parallel workers don't share test users or collide on database records
- Infrastructure — standing up a Selenium Grid (or Docker Selenium) to handle concurrent browser sessions
That’s easily 2-3 weeks of work for a mid-size suite. It’s the right investment if you’re building a framework from scratch. But if you’re inheriting a mature framework with hundreds of tests, static driver instances, and a team that’ll push back on a multi-sprint refactor — there’s a faster path.
How Does Process-Level Parallelism Avoid the Refactor Tax?
Process-level parallelism avoids the ThreadLocal refactor entirely because each process runs in its own isolated memory space. Two processes literally cannot share a WebDriver instance — they don’t have access to each other’s heap. The operating system enforces the isolation that ThreadLocal enforces at the language level.
I’ve used process-level parallelism in two very different setups over my career, and both worked for the same fundamental reason.
The Automation Server Pattern
At one enterprise project, we had a dedicated automation server with four OS-level user accounts. Each user account would clone the test repository independently, run a subset of the test suite in its own process, and drop the results into a shared network folder when finished. A simple script at the end merged the results.
```bash
# Each user runs on the same server but as a separate OS process
su - testuser1 -c "cd /home/testuser1/repo && mvn test -Dsuite=group1" &
su - testuser2 -c "cd /home/testuser2/repo && mvn test -Dsuite=group2" &
su - testuser3 -c "cd /home/testuser3/repo && mvn test -Dsuite=group3" &
su - testuser4 -c "cd /home/testuser4/repo && mvn test -Dsuite=group4" &
wait

# Merge results from all four runs
cp /home/testuser*/repo/target/surefire-reports/*.xml /shared/results/
```

No ThreadLocal. No framework changes. Each user account was a completely isolated process with its own JVM, its own WebDriver, its own memory. The only shared resource was the results folder — and that was write-only at the end.
Was it elegant? No. Was it running in production for over a year, cutting our suite time by 4x? Yes.
We didn’t call it “process-level parallelism” at the time. We called it “we need this to run faster and we have a server with 16 cores sitting mostly idle.” But it was the same principle that modern tooling now formalizes.
BrowserStack SDK: Cloud-Managed Process-Level Parallelism
BrowserStack SDK is the same concept, productized. It’s a Java agent that intercepts your WebDriver creation at the JVM level. When your test calls new ChromeDriver(), the SDK replaces it with a remote BrowserStack session — without changing any code. Your tests think they’re running locally. They’re actually running on BrowserStack’s cloud. The official SDK documentation covers the full setup and benefits for TestNG.
Combined with Maven Surefire’s forkCount, each forked JVM process gets its own remote browser session. Same isolation principle as the four-user-account server, but managed by cloud infrastructure instead of bash scripts.
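As a sketch, the Surefire side of that pairing looks like this — the fork count is illustrative, and `reuseForks=false` is an assumption made here to maximize isolation (it forks a fresh JVM per test class, at the cost of more JVM startups):

```xml
<!-- pom.xml: each forked JVM is a separate OS process with its own heap,
     so forks can't share a WebDriver even by accident -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>4</forkCount>
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
```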
The setup is three steps:
1. Add the dependency:
```xml
<dependencies>
  <!-- Existing Selenium, TestNG dependencies stay unchanged -->
  <dependency>
    <groupId>com.browserstack</groupId>
    <artifactId>browserstack-java-sdk</artifactId>
    <version>LATEST</version>
  </dependency>
</dependencies>
```

2. Create the configuration (`browserstack.yml` in the project root):
```yaml
userName: ${BROWSERSTACK_USERNAME}
accessKey: ${BROWSERSTACK_ACCESS_KEY}

framework: testng
parallelsPerPlatform: 5

platforms:
  - os: Windows
    osVersion: 11
    browserName: Chrome
    browserVersion: latest
  - os: OS X
    osVersion: Sonoma
    browserName: Safari
    browserVersion: latest
  - os: Windows
    osVersion: 11
    browserName: Firefox
    browserVersion: latest

browserstackLocal: false
buildName: "regression-${BUILD_NUMBER}"
```

3. Wire up Maven and run:
```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>
      -javaagent:${com.browserstack:browserstack-java-sdk:jar}
    </argLine>
  </configuration>
</plugin>
```

```bash
mvn test
```

Your existing tests — with their existing `new ChromeDriver()` calls, existing page objects, existing assertions — now run on BrowserStack's cloud across multiple parallel sessions. No ThreadLocal. No DriverFactory rewrite. No Selenium Grid to maintain.
How Much Faster Did the Suite Get?
- Before (sequential): 4 hours
- After (parallel): 30 minutes
- Infrastructure maintenance: 0 hours/week
An 8x improvement. And that included Safari and Firefox, which we’d never tested against. We found browser-specific failures that had been hiding in production — rendering issues and API inconsistencies that Chrome silently handled but Safari didn’t. The parallel execution was the primary goal, but the cross-browser coverage alone justified the BrowserStack license.
What Actually Broke (And What I Had to Fix)
Process-level parallelism sidesteps ThreadLocal, but it doesn’t sidestep every problem. Parallel execution — at any level — exposes shortcuts that sequential runs hide. Here’s what I had to work through:
Test data collisions. Multiple tests created records using the same hardcoded identifiers. Sequentially, they ran and cleaned up one at a time. In parallel, they collided. I went through the data setup utilities and added timestamp suffixes to prevent collisions.
```java
public class OrderData {
    public static String uniqueProductId(String baseId) {
        // Timestamp suffix prevents collisions across parallel sessions
        return baseId + "-" + System.currentTimeMillis();
    }
}
```

Implicit test ordering. Several tests assumed another test had already created their test data. Sequential execution masked this because TestNG's default ordering happened to run them in the right order. In parallel, the setup test sometimes ran on a different fork and finished last. I extracted shared setup logic into @BeforeClass methods where it belonged.
Utility class adjustments. A few shared helper classes had static state that didn’t survive process forking cleanly — cached configuration values and shared HTTP client instances that assumed a single execution context. None of these required a full ThreadLocal refactor, but they did need attention.
Config tuning. Getting the right parallelsPerPlatform count took experimentation. Too many parallel sessions and BrowserStack’s queue would back up. Too few and we weren’t getting the speedup we wanted.
The bulk of the tests passed without modification. But “most tests just worked” doesn’t mean “zero effort” — the tests that broke needed real investigation, and the utility fixes added up to a week of targeted work.
How Do You Decide Between Thread-Level and Process-Level Parallelism?
The right choice depends on where you are today and what constraint is tightest — time, money, or control.
| Situation | Level | Approach |
|---|---|---|
| Existing suite, need parallel fast | Process | BrowserStack SDK or multi-process runner — days, not sprints |
| Building a new framework from scratch | Thread | Design for ThreadLocal from day one |
| Running 10+ suites across multiple teams | Either | Self-hosted Grid (Selenoid) for thread-level; dedicated servers for process-level |
| Need cross-browser coverage, not just speed | Process | BrowserStack SDK — browser matrix is a config change |
| Pre-commit or fast-feedback loop | Thread | Local parallel with ThreadLocal — no network latency |
| Budget is tight, have spare hardware | Process | The automation server pattern — 4 users, 4 processes, shared results folder |
The resourceful move is to start with process-level parallelism as a bridge. Get results now, plan the proper ThreadLocal refactor for next quarter. When you eventually build thread-safe infrastructure, you can keep BrowserStack for cross-browser validation and move your primary speed-focused execution to local thread-level parallel.
I’ve seen too many teams spend 3 sprints “preparing for parallel” while their suite stays at 4 hours. The 80% solution running today beats the 100% solution planned for Q3. For more on building stable parallel test infrastructure, explore our test automation guides.
Your Next Step
Look at your current test execution setup and ask: are we even using the right level of parallelism? If your team has been blocked on a ThreadLocal refactor for months, try process-level first. Pick 10 tests, run them in separate processes (even something as simple as two terminal windows running different test groups), and see if they pass without changes. If they do — and most will — you’ve validated the approach without writing a single line of framework code.
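As a sketch, that experiment doesn't even need two terminal windows — two backgrounded Maven processes from two separate clones will do (the group names below are hypothetical; substitute your own TestNG groups, and use separate clones so the builds don't collide on `target/`):

```bash
# Two mvn invocations = two JVM processes = full memory isolation,
# exactly like the four-user automation server, just smaller.
(cd ~/repo-clone-1 && mvn test -Dgroups=checkout) &
(cd ~/repo-clone-2 && mvn test -Dgroups=search) &
wait
```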
Is process-level parallelism always better than thread-level?
No. Thread-level parallelism is more resource-efficient — threads share memory and have less startup overhead than separate processes. If you have a well-designed framework with proper ThreadLocal discipline, thread-level parallelism gives you better performance per CPU core. Process-level is better when you need parallel execution now and can’t afford the framework refactor.
Can I combine both levels of parallelism?
Yes, and many mature teams do. You can run process-level parallelism across machines or cloud sessions while also running thread-level parallelism within each process. But start with one level. Adding both simultaneously doubles the debugging surface when something breaks.
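When you do get there, the configuration is simply the union of the two levels — a sketch with illustrative counts, using Surefire's pass-through properties for the TestNG provider:

```xml
<!-- pom.xml: 2 forked JVMs (process level), each running TestNG
     with 3 threads per fork (thread level) = up to 6 concurrent tests -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>2</forkCount>
    <properties>
      <property>
        <name>parallel</name>
        <value>methods</value>
      </property>
      <property>
        <name>threadcount</name>
        <value>3</value>
      </property>
    </properties>
  </configuration>
</plugin>
```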
Does process-level parallelism work with Playwright?
Yes. Playwright already uses process-level isolation by default — each worker runs in a separate process. This is one reason Playwright suites tend to be more stable in parallel out of the box compared to Selenium with TestNG, which defaults to thread-level parallelism.
What about Docker containers — is that process-level or something else?
Docker containers are process-level parallelism with extra isolation. Each container is an isolated process with its own filesystem, network, and memory. Running tests in separate Docker containers gives you the same isolation benefits as separate OS processes, plus reproducibility and easy cleanup. Tools like Selenoid use this approach.
