Parallel Execution Without the Refactor Tax

Every parallel execution guide I’ve read jumps straight to ThreadLocal, DriverFactory redesigns, and test data isolation patterns. That’s one way to do it — and sometimes the right way. But it’s not the only way. There are actually two levels of parallelism in test automation, and picking the wrong one is why teams spend weeks refactoring frameworks that didn’t need it. Understanding the difference took our suite from 4 hours to 30 minutes — without a framework rewrite.
What Are the Two Levels of Parallelism?
Most teams treat “parallel execution” as a single concept. It’s not. There are two fundamentally different approaches, and they have completely different costs.
Thread-level parallelism runs multiple test methods as separate threads inside a single JVM process. All threads share the same memory space. This is what you get with TestNG’s parallel execution modes or JUnit 5’s junit.jupiter.execution.parallel.enabled=true. It’s fast and efficient, but every piece of shared state — your WebDriver instance, test data, reporting context — becomes a potential race condition. This is the level that demands ThreadLocal, user isolation patterns, and careful framework design.
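For concreteness, here's a sketch of what switching it on looks like in TestNG — the suite name, package, and thread count are illustrative, not prescriptive (in JUnit 5 the equivalent is the `junit.jupiter.execution.parallel.enabled=true` property mentioned above):

```xml
<!-- testng.xml: run test methods on up to 5 threads inside one JVM.
     One line of config buys the speed; thread safety is on you. -->
<suite name="regression" parallel="methods" thread-count="5">
  <test name="all-tests">
    <packages>
      <!-- hypothetical package name -->
      <package name="com.example.tests.*"/>
    </packages>
  </test>
</suite>
```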
Process-level parallelism runs tests in completely separate OS processes. Each process has its own memory space, its own JVM (if applicable), and its own state. Processes can’t accidentally share a WebDriver instance because they physically can’t access each other’s memory. No ThreadLocal needed. No DriverFactory rewrite. The isolation is built into the operating system.
| | Thread-Level | Process-Level |
|---|---|---|
| How it works | Multiple threads in one JVM | Separate OS processes, each with its own JVM |
| Memory | Shared heap — all threads see the same objects | Isolated — processes can’t access each other’s memory |
| Isolation | You enforce it (ThreadLocal, careful design) | The OS enforces it (free) |
| Shared state risk | High — static fields, singletons, and shared refs are all race conditions | None — processes physically can’t share state |
| Framework changes needed | ThreadLocal wrappers, user isolation, reporting context scoping | Minimal — fix test data collisions and implicit ordering |
| Resource overhead | Low — threads are lightweight | Higher — each process loads its own JVM |
| Startup time | Fast — threads spin up in milliseconds | Slower — JVM startup per process |
| Debugging | Harder — race conditions are non-deterministic | Easier — failures are isolated to one process |
| Tools | TestNG parallel=methods, JUnit 5 parallel | Maven forkCount, separate user accounts, Docker containers, BrowserStack SDK |
| Best for | New frameworks designed for thread safety | Existing frameworks that need parallel execution without a rewrite |
Why Does Thread-Level Parallelism Require So Much Refactoring?
Thread-level parallelism requires heavy refactoring because all threads share the same heap memory, meaning any static field, singleton, or shared reference becomes a race condition when accessed from multiple test methods simultaneously. You need ThreadLocal discipline across your entire framework to prevent threads from interfering with each other.
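To make that discipline concrete, here is a minimal sketch of the ThreadLocal pattern the refactor demands. A plain `FakeDriver` class stands in for Selenium's `WebDriver` — an assumption for the sake of a self-contained example; the pattern is identical either way:

```java
// Sketch of the ThreadLocal discipline thread-level parallelism demands.
// FakeDriver is a stand-in for WebDriver; the holder pattern is what matters.
public class DriverHolder {

    static class FakeDriver {
        final String owner = Thread.currentThread().getName();
    }

    // Each thread lazily creates its own instance on first get();
    // two threads can never see the same object through this field.
    private static final ThreadLocal<FakeDriver> DRIVER =
            ThreadLocal.withInitial(FakeDriver::new);

    public static FakeDriver get() {
        return DRIVER.get();
    }

    // Must be called in teardown (e.g. @AfterMethod), or pooled CI threads
    // leak stale driver instances into the next test.
    public static void quit() {
        DRIVER.remove();
    }
}
```

The catch is that this holder is only the first domino: every utility, listener, and reporting hook that used to reach for a static driver field now has to go through it, which is where the weeks go.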
I’ve written about this in detail — the 3-day thread safety bug that turned out to be a shared WebDriver instance leaking between threads. The fix was ThreadLocal, but the investigation and refactoring across the framework took weeks.
The traditional parallel execution path means solving three problems simultaneously:
- Thread-safe driver management — wrapping every `WebDriver` in a `ThreadLocal` so threads don't share browser sessions
- Test data isolation — ensuring parallel workers don't share test users or collide on database records
- Infrastructure — standing up a Selenium Grid (or Docker Selenium) to handle concurrent browser sessions
That’s easily 2-3 weeks of work for a mid-size suite. It’s the right investment if you’re building a framework from scratch. But if you’re inheriting a mature framework with hundreds of tests, static driver instances, and a team that’ll push back on a multi-sprint refactor — there’s a faster path.
How Does Process-Level Parallelism Avoid the Refactor Tax?
Process-level parallelism avoids the ThreadLocal refactor entirely because each process runs in its own isolated memory space. Two processes literally cannot share a WebDriver instance — they don’t have access to each other’s heap. The operating system enforces the isolation that ThreadLocal enforces at the language level.
I’ve used process-level parallelism in two very different setups over my career, and both worked for the same fundamental reason.
The Automation Server Pattern
At one enterprise project, we had a dedicated automation server with four OS-level user accounts. Each user account would clone the test repository independently, run a subset of the test suite in its own process, and drop the results into a shared network folder when finished. A simple script at the end merged the results.
```bash
# Each user runs on the same server but as a separate OS process
su - testuser1 -c "cd /home/testuser1/repo && mvn test -Dsuite=group1" &
su - testuser2 -c "cd /home/testuser2/repo && mvn test -Dsuite=group2" &
su - testuser3 -c "cd /home/testuser3/repo && mvn test -Dsuite=group3" &
su - testuser4 -c "cd /home/testuser4/repo && mvn test -Dsuite=group4" &
wait

# Merge results from all four runs
cp /home/testuser*/repo/target/surefire-reports/*.xml /shared/results/
```

No ThreadLocal. No framework changes. Each user account was a completely isolated process with its own JVM, its own WebDriver, its own memory. The only shared resource was the results folder — and that was write-only at the end.
Was it elegant? No. Was it running in production for over a year, cutting our suite time by 4x? Yes.
We didn’t call it “process-level parallelism” at the time. We called it “we need this to run faster and we have a server with 16 cores sitting mostly idle.” But it was the same principle that modern tooling now formalizes.
BrowserStack SDK: Cloud-Managed Process-Level Parallelism
BrowserStack SDK is the same concept, productized. It’s a Java agent that intercepts your WebDriver creation at the JVM level. When your test calls new ChromeDriver(), the SDK replaces it with a remote BrowserStack session — without changing any code. Your tests think they’re running locally. They’re actually running on BrowserStack’s cloud. The official SDK documentation covers the full setup and benefits for TestNG.
Combined with Maven Surefire’s forkCount, each forked JVM process gets its own remote browser session. Same isolation principle as the four-user-account server, but managed by cloud infrastructure instead of bash scripts.
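As a sketch, the Surefire side of that pairing looks like this — the fork count is illustrative, and `reuseForks=false` is an assumption made here to maximize isolation (it forks a fresh JVM per test class, at the cost of more JVM startups):

```xml
<!-- pom.xml: each forked JVM is a separate OS process with its own heap,
     so forks can't share a WebDriver even by accident -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>4</forkCount>
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
```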
The setup is three steps:
1. Add the dependency:
```xml
<dependencies>
  <!-- Existing Selenium, TestNG dependencies stay unchanged -->
  <dependency>
    <groupId>com.browserstack</groupId>
    <artifactId>browserstack-java-sdk</artifactId>
    <version>LATEST</version>
  </dependency>
</dependencies>
```

2. Create the configuration (`browserstack.yml` in the project root):
```yaml
userName: ${BROWSERSTACK_USERNAME}
accessKey: ${BROWSERSTACK_ACCESS_KEY}

framework: testng
parallelsPerPlatform: 5

platforms:
  - os: Windows
    osVersion: 11
    browserName: Chrome
    browserVersion: latest
  - os: OS X
    osVersion: Sonoma
    browserName: Safari
    browserVersion: latest
  - os: Windows
    osVersion: 11
    browserName: Firefox
    browserVersion: latest

browserstackLocal: false
buildName: "regression-${BUILD_NUMBER}"
```

3. Wire up Maven and run:
```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>
      -javaagent:${com.browserstack:browserstack-java-sdk:jar}
    </argLine>
  </configuration>
</plugin>
```

```bash
mvn test
```

Your existing tests — with their existing `new ChromeDriver()` calls, existing page objects, existing assertions — now run on BrowserStack's cloud across multiple parallel sessions. No ThreadLocal. No DriverFactory rewrite. No Selenium Grid to maintain.
How Much Faster Did the Suite Get?
- Before (sequential): 4 hours
- After (parallel): 30 minutes
- Infrastructure maintenance: 0 hours/week
An 8x improvement. And that included Safari and Firefox, which we’d never tested against. We found browser-specific failures that had been hiding in production — rendering issues and API inconsistencies that Chrome silently handled but Safari didn’t. The parallel execution was the primary goal, but the cross-browser coverage alone justified the BrowserStack license.
What Actually Broke (And What I Had to Fix)
Process-level parallelism sidesteps ThreadLocal, but it doesn’t sidestep every problem. Parallel execution — at any level — exposes shortcuts that sequential runs hide. Here’s what I had to work through:
Test data collisions. Multiple tests created records using the same hardcoded identifiers. Sequentially, they ran and cleaned up one at a time. In parallel, they collided. I went through the data setup utilities and added timestamp suffixes to prevent collisions.
```java
public class OrderData {
    public static String uniqueProductId(String baseId) {
        // Timestamp suffix prevents collisions across parallel sessions
        return baseId + "-" + System.currentTimeMillis();
    }
}
```

Implicit test ordering. Several tests assumed another test had already created their test data. Sequential execution masked this because TestNG's default ordering happened to run them in the right order. In parallel, the setup test sometimes ran on a different fork and finished last. I extracted shared setup logic into @BeforeClass methods where it belonged.
Utility class adjustments. A few shared helper classes had static state that didn’t survive process forking cleanly — cached configuration values and shared HTTP client instances that assumed a single execution context. None of these required a full ThreadLocal refactor, but they did need attention.
Config tuning. Getting the right parallelsPerPlatform count took experimentation. Too many parallel sessions and BrowserStack’s queue would back up. Too few and we weren’t getting the speedup we wanted.
The bulk of the tests passed without modification. But “most tests just worked” doesn’t mean “zero effort” — the tests that broke needed real investigation, and the utility fixes added up to a week of targeted work.
How Do You Decide Between Thread-Level and Process-Level Parallelism?
The right choice depends on where you are today and what constraint is tightest — time, money, or control.
| Situation | Level | Approach |
|---|---|---|
| Existing suite, need parallel fast | Process | BrowserStack SDK or multi-process runner — days, not sprints |
| Building a new framework from scratch | Thread | Design for ThreadLocal from day one |
| Running 10+ suites across multiple teams | Either | Self-hosted Grid (Selenoid) for thread-level; dedicated servers for process-level |
| Need cross-browser coverage, not just speed | Process | BrowserStack SDK — browser matrix is a config change |
| Pre-commit or fast-feedback loop | Thread | Local parallel with ThreadLocal — no network latency |
| Budget is tight, have spare hardware | Process | The automation server pattern — 4 users, 4 processes, shared results folder |
The resourceful move is to start with process-level parallelism as a bridge. Get results now, plan the proper ThreadLocal refactor for next quarter. When you eventually build thread-safe infrastructure, you can keep BrowserStack for cross-browser validation and move your primary speed-focused execution to local thread-level parallel.
I’ve seen too many teams spend 3 sprints “preparing for parallel” while their suite stays at 4 hours. The 80% solution running today beats the 100% solution planned for Q3. For more on building stable parallel test infrastructure, explore our test automation guides.
Your Next Step
Look at your current test execution setup and ask: are we even using the right level of parallelism? If your team has been blocked on a ThreadLocal refactor for months, try process-level first. Pick 10 tests, run them in separate processes (even something as simple as two terminal windows running different test groups), and see if they pass without changes. If they do — and most will — you’ve validated the approach without writing a single line of framework code.
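As a sketch, that experiment doesn't even need two terminal windows — two backgrounded Maven processes from two separate clones will do (the group names below are hypothetical; substitute your own TestNG groups, and use separate clones so the builds don't collide on `target/`):

```bash
# Two mvn invocations = two JVM processes = full memory isolation,
# exactly like the four-user automation server, just smaller.
(cd ~/repo-clone-1 && mvn test -Dgroups=checkout) &
(cd ~/repo-clone-2 && mvn test -Dgroups=search) &
wait
```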
Is process-level parallelism always better than thread-level?
No. Thread-level parallelism is more resource-efficient — threads share memory and have less startup overhead than separate processes. If you have a well-designed framework with proper ThreadLocal discipline, thread-level parallelism gives you better performance per CPU core. Process-level is better when you need parallel execution now and can’t afford the framework refactor.
Can I combine both levels of parallelism?
Yes, and many mature teams do. You can run process-level parallelism across machines or cloud sessions while also running thread-level parallelism within each process. But start with one level. Adding both simultaneously doubles the debugging surface when something breaks.
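When you do get there, the configuration is simply the union of the two levels — a sketch with illustrative counts, using Surefire's pass-through properties for the TestNG provider:

```xml
<!-- pom.xml: 2 forked JVMs (process level), each running TestNG
     with 3 threads per fork (thread level) = up to 6 concurrent tests -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>2</forkCount>
    <properties>
      <property>
        <name>parallel</name>
        <value>methods</value>
      </property>
      <property>
        <name>threadcount</name>
        <value>3</value>
      </property>
    </properties>
  </configuration>
</plugin>
```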
Does process-level parallelism work with Playwright?
Yes. Playwright already uses process-level isolation by default — each worker runs in a separate process. This is one reason Playwright suites tend to be more stable in parallel out of the box compared to Selenium with TestNG, which defaults to thread-level parallelism.
What about Docker containers — is that process-level or something else?
Docker containers are process-level parallelism with extra isolation. Each container is an isolated process with its own filesystem, network, and memory. Running tests in separate Docker containers gives you the same isolation benefits as separate OS processes, plus reproducibility and easy cleanup. Tools like Selenoid use this approach.
