How to Fix Flaky Tests: 7 Proven Patterns That Actually Work

You've detected your flaky tests. You know what they cost you. Now you need to actually fix them. Most teams get stuck here -- they retry, quarantine, and move on. The flaky test stays in the codebase for months, quietly draining CI time and developer trust.

This guide covers 7 specific fix patterns that address the root causes behind 95% of flaky tests. Each pattern includes the symptom, the underlying cause, a concrete fix with code, and how to prevent it from recurring.

Before You Fix: Identify the Root Cause Category

Every flaky test falls into one of these categories. Knowing which one you are dealing with determines which fix pattern to apply. Tools like AI-powered root cause analysis can classify these automatically, but you can also diagnose manually:

  • Race conditions -- test depends on timing of async operations
  • Shared state -- tests pollute each other through shared databases, files, or globals
  • Timing/timeout -- hardcoded sleeps or timeouts that break under load
  • Resource exhaustion -- port conflicts, file handle leaks, memory pressure
  • External dependencies -- network calls, third-party APIs, system clock
  • Order dependence -- test passes only when run after another test
  • Environment drift -- works on macOS, fails on Linux CI runner

Pattern 1: Fix Race Conditions with Explicit Waits

Symptom: Test passes locally (fast machine) but fails intermittently in CI (slower, shared runner). Usually involves async operations: database writes, API calls, event handlers, DOM updates.

Root cause: The test asserts before the async operation completes. On a fast machine, the operation finishes in time. On a loaded CI runner, it does not.

The fix: Replace implicit timing with explicit waits

// BAD: Relies on timing
test('shows notification after save', async () => {
  await userEvent.click(saveButton);
  // This sleep might not be long enough on slow CI
  await new Promise(r => setTimeout(r, 100));
  expect(screen.getByText('Saved')).toBeVisible();
});

// GOOD: Wait for the actual condition
test('shows notification after save', async () => {
  await userEvent.click(saveButton);
  // waitFor retries until condition is true or timeout
  await waitFor(() => {
    expect(screen.getByText('Saved')).toBeVisible();
  });
});

For backend tests, the same principle applies -- wait for the observable side effect, not an arbitrary delay:

# BAD: Sleep and hope
def test_webhook_processes_event():
    post("/webhooks", data=event)
    time.sleep(2)  # Might not be enough
    assert db.query(Event).count() == 1

# GOOD: Poll for the expected state
def test_webhook_processes_event():
    post("/webhooks", data=event)
    wait_until(lambda: db.query(Event).count() == 1,
               timeout=10, interval=0.1)
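
The wait_until helper above is not part of the standard library; a minimal sketch looks like this (the 10-second timeout and 0.1-second interval mirror the call above):

```python
import time

def wait_until(condition, timeout=10, interval=0.1):
    """Poll `condition` until it returns truthy, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")
```

The test returns as soon as the condition holds, so it stays fast on a healthy run and only pays the full timeout when something is actually broken.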

Prevention: Lint for setTimeout and time.sleep in test files. If someone adds a sleep, require a comment explaining why an explicit wait is not possible.

Pattern 2: Isolate Shared State Between Tests

Symptom: Test passes when run alone, fails when run with other tests. Or: test suite passes, then adding a new test breaks an unrelated existing test.

Root cause: Tests share mutable state -- a database, a global variable, a file on disk, an environment variable. One test modifies the state, and another test assumes the original state.

The fix: Reset state before each test

// Database: Use transactions that roll back
beforeEach(async () => {
  // Start a transaction
  await db.query('BEGIN');
});

afterEach(async () => {
  // Roll back everything the test did
  await db.query('ROLLBACK');
});

// Global state: Save and restore
let originalEnv: NodeJS.ProcessEnv;
beforeEach(() => {
  originalEnv = { ...process.env };
});
afterEach(() => {
  process.env = originalEnv;
});

// File system: Use temp directories
beforeEach(() => {
  testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'test-'));
});
afterEach(() => {
  fs.rmSync(testDir, { recursive: true });
});
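
The same save-and-restore idea works in Python. A minimal sketch as a context manager (the helper name is mine, not from a library):

```python
import os
from contextlib import contextmanager

@contextmanager
def preserved_env():
    """Snapshot os.environ on entry and restore it on exit,
    so a test can mutate environment variables freely."""
    original = dict(os.environ)
    try:
        yield
    finally:
        os.environ.clear()
        os.environ.update(original)
```

With pytest, the built-in monkeypatch fixture gives the same guarantee: monkeypatch.setenv undoes itself after each test.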

Prevention: Run tests in random order (Jest's --randomize flag, pytest with the pytest-randomly plugin). If order-dependent tests exist, randomization exposes them immediately instead of hiding them behind a stable run order.

Pattern 3: Replace Hardcoded Timeouts with Adaptive Waits

Symptom: Test has a timeout(5000) that passes locally but fails in CI. Or the test takes exactly N seconds because it is sleeping for N seconds.

Root cause: CI runners have variable performance. A 5-second timeout that works on your M2 MacBook may not be enough on a shared GitHub Actions runner handling 50 concurrent jobs.

The fix: Use condition-based waits with generous timeouts

// BAD: Fixed timeout
test('processes batch', async () => {
  await startBatchJob();
  await delay(5000); // Works locally, fails in CI
  expect(await getJobStatus()).toBe('complete');
}, 10000);

// GOOD: Poll for completion with generous upper bound
test('processes batch', async () => {
  await startBatchJob();
  await waitFor(
    async () => expect(await getJobStatus()).toBe('complete'),
    { timeout: 30000, interval: 500 }
  );
}, 60000); // Generous test timeout -- real duration is usually <2s

The key insight: set the test timeout generously (the outer bound that should never hit), but poll frequently (the inner interval). The test completes as soon as the condition is met, so it is still fast when things work -- and it only times out when something is genuinely broken.
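
One refinement on the inner interval: start polling fast and back off, so the test stays quick when the condition is met early but does not hammer a slow CI runner. A sketch (helper name and defaults are illustrative, not from a library):

```python
import time

def wait_for(check, timeout=30.0, interval=0.1, backoff=1.5, max_interval=2.0):
    """Poll `check` with exponential backoff between attempts:
    fast first checks when things are healthy, gentler polling when slow."""
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
        interval = min(interval * backoff, max_interval)
```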

Pattern 4: Fix Resource Conflicts with Dynamic Allocation

Symptom: "Address already in use" errors. Tests fail in parallel but pass sequentially. Port conflicts, file lock errors.

Root cause: Multiple test processes or parallel test runners compete for the same port, file, or other system resource.

The fix: Allocate resources dynamically

// BAD: Hardcoded port
const server = app.listen(3000);

// GOOD: Let the OS assign a free port
const server = app.listen(0); // Port 0 = OS assigns next available
const port = (server.address() as AddressInfo).port;

// BAD: Shared temp file
const logFile = '/tmp/test-output.log';

// GOOD: Unique temp file per test run
const logFile = path.join(
  os.tmpdir(),
  `test-output-${process.pid}-${Date.now()}.log`
);
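
The OS-assigned-port trick works the same way in Python for backend test servers: bind to port 0 and read back what the OS chose. One caveat worth a comment -- the port could in principle be grabbed between closing this socket and starting your server, so pass port 0 directly to the server when its API allows it:

```python
import socket

def free_port():
    """Bind to port 0 so the OS picks an unused TCP port, then return it.
    Note the small race: prefer handing port 0 straight to your server
    when possible, instead of closing this socket first."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```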

For database tests running in parallel, use unique schema names or separate databases per worker:

# Pytest with unique database per worker
@pytest.fixture
def db(worker_id):
    db_name = f"test_db_{worker_id}"
    create_database(db_name)
    yield connect(db_name)
    drop_database(db_name)

Prevention: Ban hardcoded ports and file paths in test code via linting rules.

Pattern 5: Mock External Dependencies Deterministically

Symptom: Test fails when the network is slow, a third-party API is down, or a rate limit is hit. Works fine locally with fast internet.

Root cause: The test makes real network calls to external services. External services are inherently unreliable -- they go down, they rate-limit, they change their API.

The fix: Mock at the HTTP boundary

// Using MSW (Mock Service Worker) for API mocking
import { setupServer } from 'msw/node';
import { http, HttpResponse } from 'msw';

const server = setupServer(
  http.get('https://api.github.com/repos/:owner/:repo', () => {
    return HttpResponse.json({
      name: 'my-repo',
      full_name: 'owner/my-repo',
      stargazers_count: 42,
    });
  })
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

test('fetches repo info', async () => {
  const repo = await getRepoInfo('owner', 'my-repo');
  expect(repo.stars).toBe(42);
  // Deterministic -- no network call, always passes
});

For time-dependent tests, mock the clock:

// BAD: Depends on real time
test('token expires after 1 hour', async () => {
  const token = createToken();
  await delay(3600000); // Wait an actual hour? No.
  expect(token.isExpired()).toBe(true);
});

// GOOD: Control time
test('token expires after 1 hour', () => {
  jest.useFakeTimers();
  const token = createToken();
  jest.advanceTimersByTime(3600000);
  expect(token.isExpired()).toBe(true);
  jest.useRealTimers();
});
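
The same idea works in Python without a fake-timer library: design the code under test to take the clock as a parameter, and the test controls time directly. The Token class here is a hypothetical sketch, not the one from the example above:

```python
from datetime import datetime, timedelta, timezone

class Token:
    """Expiry check takes `now` as a parameter instead of reading the system clock."""
    def __init__(self, issued_at, ttl=timedelta(hours=1)):
        self.expires_at = issued_at + ttl

    def is_expired(self, now):
        return now >= self.expires_at

# The test advances time by passing different values -- fully deterministic
issued = datetime(2026, 1, 1, tzinfo=timezone.utc)
token = Token(issued)
assert not token.is_expired(issued + timedelta(minutes=59))
assert token.is_expired(issued + timedelta(hours=1))
```

Injecting the clock also removes the need to toggle fake timers on and off around each test.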

Prevention: In CI environments like GitHub Actions, network calls from tests are a red flag. Block external network access in test environments using firewall rules or HTTP interceptors that fail on unmocked requests.

Pattern 6: Fix Order-Dependent Tests with Proper Setup

Symptom: Running tests in a different order causes failures. Adding --randomize to your test runner breaks previously passing tests.

Root cause: Test B implicitly depends on state created by Test A. When Test A runs first, Test B finds the state it needs. When Test B runs first or alone, it fails because the state does not exist.

The fix: Make each test self-contained

// BAD: Test B depends on Test A having run
describe('user management', () => {
  test('A: creates a user', async () => {
    await createUser({ name: 'Alice' });
    expect(await getUser('Alice')).toBeDefined();
  });

  test('B: updates the user', async () => {
    // Fails if Test A didn't run first!
    await updateUser('Alice', { role: 'admin' });
    expect(await getUser('Alice')).toHaveProperty('role', 'admin');
  });
});

// GOOD: Each test creates its own state
describe('user management', () => {
  test('creates a user', async () => {
    await createUser({ name: 'Alice' });
    expect(await getUser('Alice')).toBeDefined();
  });

  test('updates a user', async () => {
    // Self-contained: creates its own user
    await createUser({ name: 'Bob' });
    await updateUser('Bob', { role: 'admin' });
    expect(await getUser('Bob')).toHaveProperty('role', 'admin');
  });
});
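
A small factory that generates unique names makes this pattern robust even when tests run in parallel against a shared database. A sketch (the helper name is mine; create_user is the hypothetical function from the example above):

```python
import uuid

def unique_name(prefix="user"):
    """Return a name that cannot collide with other tests,
    even when suites run in parallel against one database."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"

# e.g. create_user(name=unique_name()) in each test,
# instead of reusing a hardcoded "Alice" across tests
```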

Prevention: Always run tests with randomized order in CI. This is the single most effective prevention measure for order-dependent flakiness.

Pattern 7: Normalize Environment Differences

Symptom: "Works on my machine." Test passes on macOS, fails on Linux. Or passes in one CI environment but fails in another.

Root cause: Tests depend on OS-specific behavior: file path separators, locale settings, timezone, available system commands, filesystem case sensitivity.

The fix: Normalize or abstract environment-specific behavior

// BAD: Hardcoded path separator
expect(result.path).toBe('src/components/Button.tsx');

// GOOD: Use path.join for OS-independent paths
expect(result.path).toBe(path.join('src', 'components', 'Button.tsx'));

// BAD: Assumes specific timezone
expect(formatDate(timestamp)).toBe('2026-03-26 10:00 AM');

// GOOD: Pin the timezone explicitly
// (setting TZ at runtime works on POSIX runners; for Windows,
// set TZ=UTC in the test runner's environment instead)
test('formats date correctly', () => {
  process.env.TZ = 'UTC';
  expect(formatDate(timestamp)).toBe('2026-03-26 10:00 AM');
});

// BAD: Depends on locale-specific sorting
expect(sortedNames).toEqual(['Apple', 'banana', 'Cherry']);

// GOOD: Explicit collation
const sorted = names.sort((a, b) =>
  a.localeCompare(b, 'en', { sensitivity: 'base' })
);

For filesystem case sensitivity (macOS is case-insensitive by default, Linux is case-sensitive):

// BAD: Works on macOS, fails on Linux
import Button from './button'; // File is actually Button.tsx

// GOOD: Match the exact filename case
import Button from './Button';

Prevention: Run CI on the same OS (or container) as production. Use Docker-based CI runners to eliminate environment differences entirely.

Putting It All Together: A Systematic Fix Workflow

When you encounter a flaky test, follow this workflow:

  1. Quarantine it -- move it out of the critical path so it stops blocking merges
  2. Classify the root cause -- use root cause analysis or the categories above
  3. Apply the matching fix pattern -- one of the 7 patterns above covers 95% of cases
  4. Verify the fix -- run the test 50-100 times, stopping at the first failure (for i in {1..100}; do npm test -- --testPathPattern=mytest || break; done)
  5. Un-quarantine -- move it back to the critical path
  6. Add a prevention measure -- lint rule, test config, or team guideline
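
Step 4's shell loop can also be a small script that reports the failure rate instead of stopping at the first failure -- useful for confirming a fix dropped the flake rate to zero. A sketch (the command is whatever runs your single test):

```python
import subprocess

def flake_rate(cmd, runs=100):
    """Run `cmd` repeatedly and return the fraction of failing runs."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, shell=True, capture_output=True)
        if result.returncode != 0:
            failures += 1
    return failures / runs

# e.g. flake_rate("npm test -- --testPathPattern=mytest", runs=100)
# 0.0 means the fix held across every run
```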

Tools like FlakyGuard automate steps 1-3 by detecting flaky tests, quarantining them automatically, and using AI to classify root causes and suggest the specific fix pattern to apply.

The ROI of Fixing vs. Retrying

Most teams default to retrying flaky tests because it is the path of least resistance. But retrying has compounding costs:

  • CI time: Each retry adds 2-5 minutes. With 10 flaky tests retrying 3x each, that is 60-150 extra minutes of CI per day.
  • Developer trust: When tests "randomly" fail, developers stop trusting the test suite. They merge with red CI. Real bugs slip through.
  • Accumulation: Unfixed flaky tests compound. Teams that retry instead of fix end up with 50+ flaky tests within a year. The total cost can reach millions per year for large teams.

Fixing a flaky test takes 30-60 minutes using the patterns above. The fix is permanent. Retrying that same test costs 5 minutes per day, forever. The breakeven is less than 2 weeks.

Key Takeaways

  • Every flaky test has a root cause. The 7 patterns above cover 95% of them.
  • Fix the cause, not the symptom. Retrying is a bandaid, not a fix.
  • Prevention is cheaper than cure -- randomize test order, mock external calls, use dynamic resource allocation.
  • Automate the workflow: detect, quarantine, classify, fix, verify, un-quarantine.