How to Prevent Flaky Tests: 6 Engineering Practices for Reliable Test Suites

The best flaky test is the one you never write. While detecting flaky tests and fixing them are important skills, prevention is significantly cheaper. A flaky test that never makes it into your codebase costs nothing -- no CI failures, no developer interruptions, no hours of debugging an intermittent failure.

This guide covers 6 engineering practices that prevent flakiness at the source. These are not theoretical ideals -- they are concrete, implementable techniques that teams with mature test suites use to keep their CI green.

Why Flaky Tests Are an Engineering Problem, Not a Testing Problem

Most teams treat flaky tests as a testing nuisance -- something QA deals with, or something you retry your way out of. This framing is wrong and expensive.

Flaky tests are a symptom of engineering problems: non-deterministic code, implicit dependencies, shared mutable state, and timing assumptions baked into test logic. Fixing those engineering problems does not just eliminate flakiness -- it produces better code.

The cost of flaky tests compounds over time. A codebase with 5 flaky tests today can easily have 50 a year from now if the practices that produce them go unaddressed. Prevention changes the trajectory.

Practice 1: Design Tests to Be Deterministic by Default

The most common source of flakiness is non-determinism in test setup. A test is non-deterministic when its outcome depends on something outside the test's control: the current time, a random number, network latency, or the order other tests ran.

Concrete rules for deterministic tests:

  • Control time explicitly. Never let tests depend on wall-clock time. Use fake timers or inject a clock interface.
  • Seed random number generators. If your code uses randomness, expose a seeded RNG in tests so outcomes are reproducible (see the sketch after the code example below).
  • Control file timestamps. File modification times can affect test outcomes in build tools and caching tests; set them explicitly in fixtures (or freeze Date.now()) rather than relying on when the test happens to write the file.
  • Avoid assertions on exact timestamps. Assert on relative differences (createdAt < updatedAt) rather than specific values.

// BAD: Non-deterministic -- depends on real time
test('token is recent', () => {
  const token = createToken();
  const now = Date.now();
  expect(token.createdAt).toBeGreaterThan(now - 1000);
  // Fails if test runner is slow, machine is loaded, etc.
});

// GOOD: Inject time, assert deterministically
test('token is created at the given time', () => {
  const fixedTime = new Date('2026-01-01T00:00:00Z').getTime();
  jest.useFakeTimers(); // setSystemTime only takes effect while fake timers are on
  jest.setSystemTime(fixedTime);

  const token = createToken();

  expect(token.createdAt).toBe(fixedTime);
  jest.useRealTimers();
});
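
The same injection pattern covers the seeded-RNG rule above. A minimal sketch, where shuffle and makeSeededRng are illustrative names rather than part of any library:

// Code under test accepts an injectable RNG (production callers keep Math.random)
function shuffle<T>(items: T[], rng: () => number = Math.random): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Tiny linear congruential generator -- deterministic for a given seed
const makeSeededRng = (seed: number) => () => {
  seed = (seed * 1103515245 + 12345) % 2147483648;
  return seed / 2147483648;
};

test('shuffle is reproducible for a given seed', () => {
  const first = shuffle(['a', 'b', 'c', 'd'], makeSeededRng(42));
  const second = shuffle(['a', 'b', 'c', 'd'], makeSeededRng(42));
  expect(second).toEqual(first); // same seed, same order -- on every run
});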

Practice 2: Enforce Strict Test Isolation

Test isolation means each test can run independently, in any order, and produce the same result. Isolation is the single most effective prevention measure because most root causes of flakiness involve tests leaking state into each other.

What isolation looks like in practice:

  • Database: Each test starts with a clean database state. Use transactions that roll back after each test, or truncate tables in beforeEach.
  • Environment variables: Save and restore process.env around tests that modify it. Or use a library like jest-environment-variables.
  • Global state: Modules with singleton state (loggers, caches, connection pools) should expose a reset method, or be recreated per test.
  • File system: Use unique temporary directories per test, never shared paths that persist between runs.

// Enforcing isolation with a test helper
import * as fs from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';

export class TestContext {
  private envSnapshot!: NodeJS.ProcessEnv;
  private tempDir!: string;

  async setup() {
    // Snapshot env
    this.envSnapshot = { ...process.env };
    // Unique temp dir
    this.tempDir = await fs.mkdtemp(path.join(os.tmpdir(), 'test-'));
    // Wrap DB in transaction
    await db.query('BEGIN');
  }

  async teardown() {
    // Restore env
    Object.keys(process.env).forEach(k => delete process.env[k]);
    Object.assign(process.env, this.envSnapshot);
    // Clean temp dir
    await fs.rm(this.tempDir, { recursive: true });
    // Roll back DB changes
    await db.query('ROLLBACK');
  }
}

// In every test file:
const ctx = new TestContext();
beforeEach(() => ctx.setup());
afterEach(() => ctx.teardown());

Enforcement tip: Run tests in randomized order in CI, using jest --randomize or pytest with the pytest-randomly plugin (pytest -p randomly). If any test breaks under randomization, you have found an isolation violation before it becomes a flaky test.
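
A minimal way to wire this in is the CI test script in package.json; a sketch, assuming a Jest version recent enough to support --randomize (the test:ci script name is an assumption):

{
  "scripts": {
    "test": "jest",
    "test:ci": "jest --ci --randomize"
  }
}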

Practice 3: Never Make Real Network Calls in Unit or Integration Tests

Network calls in tests are the single most common source of environment-dependent flakiness. They introduce latency variance, external service outages, rate limits, and DNS resolution failures into your test suite. None of these are bugs in your code, but they make your tests report failures anyway.

The rule: Any test below end-to-end scope should not make real network calls. Mock at the HTTP boundary.

// Setup: intercept all HTTP at the transport layer
import { setupServer } from 'msw/node';
import { http, HttpResponse } from 'msw';

export const server = setupServer();

// In jest.setup.ts:
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }));
// ^^^ onUnhandledRequest: 'error' is the key prevention measure.
// Any test that makes a real network call FAILS IMMEDIATELY.
// This turns network calls from intermittent failures into
// deterministic failures you fix immediately.

afterEach(() => server.resetHandlers());
afterAll(() => server.close());

// In test file:
server.use(
  http.post('https://api.stripe.com/v1/charges', () => {
    return HttpResponse.json({ id: 'ch_test', status: 'succeeded' });
  })
);

test('processes payment', async () => {
  const result = await processPayment({ amount: 1000 });
  expect(result.status).toBe('succeeded');
  // Always passes -- no network, no flakiness
});

The onUnhandledRequest: 'error' configuration is critical. It converts accidental network calls from intermittent flakiness into immediate, deterministic failures that are caught in code review. This is prevention at the architectural level.

Practice 4: Use Dynamic Resource Allocation

Hardcoded ports, file paths, and database names are a common source of flakiness in CI environments like GitHub Actions where multiple test workers run in parallel. When two workers compete for port 3000 or write to /tmp/test.log, one will fail.

The prevention pattern: always allocate dynamically.

// BAD: Hardcoded -- breaks with parallel workers
const server = app.listen(3000);
const dbName = 'test_database';
const logPath = '/tmp/app-test.log';

// GOOD: Dynamic allocation
// Port: let the OS assign the next available one
const server = app.listen(0);
// AddressInfo comes from 'node:net'; read the port once the server is listening
const port = (server.address() as AddressInfo).port;

// Database: unique per worker
const workerId = process.env.JEST_WORKER_ID ?? '1';
const dbName = `test_db_worker_${workerId}`;

// Files: unique per test run
const logPath = path.join(
  os.tmpdir(),
  `app-test-${process.pid}-${Date.now()}.log`
);

For database-heavy test suites running in parallel, consider a test database pool:

// jest.globalSetup.ts: create one DB per worker
// (Jest does not expose the worker count as an env var; set JEST_MAX_WORKERS
// yourself in CI to match the --maxWorkers value you run with.)
export async function setup() {
  const workerCount = parseInt(process.env.JEST_MAX_WORKERS ?? '4', 10);
  for (let i = 1; i <= workerCount; i++) {
    await createDatabase(`test_db_${i}`);
    await runMigrations(`test_db_${i}`);
  }
}

// jest.globalTeardown.ts: clean up
export async function teardown() {
  const workerCount = parseInt(process.env.JEST_MAX_WORKERS ?? '4', 10);
  for (let i = 1; i <= workerCount; i++) {
    await dropDatabase(`test_db_${i}`);
  }
}
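
Each test run then picks its database from the pool using the worker ID that Jest sets. A minimal sketch (connectTo is a hypothetical helper standing in for your DB client setup):

// db-test-helper.ts: every worker gets its own database from the pool
const workerId = process.env.JEST_WORKER_ID ?? '1'; // set by Jest, 1-based
export const testDb = connectTo(`test_db_${workerId}`); // connectTo is hypothetical

// Tests import testDb instead of a shared connection, so parallel
// workers never touch the same database.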

Practice 5: Write Assertions That Are Robust to Irrelevant Variation

Many flaky tests are not flaky because the code is non-deterministic -- they are flaky because the assertions are too strict about details that do not matter for correctness.

Examples of brittle assertions:

  • Asserting the exact order of items in an unordered collection or an unsorted database query result (the order can differ between runs and environments)
  • Asserting on exact floating-point values that have rounding differences across platforms
  • Asserting on log message text that includes timestamps or request IDs
  • Asserting on the exact number of milliseconds an operation takes (timing assertions)
  • Asserting on exact object shapes when only a subset of properties matter (see the sketch after the examples below)

// BAD: Brittle -- order may vary across environments
expect(result.tags).toEqual(['alpha', 'beta', 'gamma']);

// GOOD: Robust -- order doesn't matter
expect(result.tags).toEqual(expect.arrayContaining(['alpha', 'beta', 'gamma']));
expect(result.tags).toHaveLength(3);

// BAD: Floating point equality
expect(calculateTax(100)).toBe(13.000000000000001);

// GOOD: Approximate equality
expect(calculateTax(100)).toBeCloseTo(13.0, 2);

// BAD: Contains timestamp, will always vary
expect(logMessage).toBe('[2026-03-27T10:00:00Z] Processing complete');

// GOOD: Assert on the invariant part
expect(logMessage).toContain('Processing complete');

// BAD: Asserts on duration (timing-dependent)
expect(responseTime).toBeLessThan(50); // ms -- fails on slow CI

// GOOD: Assert on correctness, not performance
// Move performance tests to dedicated benchmarks or use generous thresholds
expect(responseTime).toBeLessThan(2000); // 2s -- should always pass
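
The last bullet, exact object shapes, has the same flavor of fix: assert only on the properties the test is actually about. A minimal sketch using Jest's toMatchObject (the createUser result shape is assumed for illustration):

// BAD: Brittle -- breaks whenever an unrelated field (id, timestamp) changes
expect(createUser('ada')).toEqual({
  id: 'usr_123',
  name: 'ada',
  createdAt: '2026-01-01T00:00:00Z',
});

// GOOD: Assert only on the fields this test cares about
expect(createUser('ada')).toMatchObject({ name: 'ada' });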

Practice 6: Build a Flakiness Prevention Gate Into Code Review

The practices above are most effective when they are enforced during code review, not discovered weeks later when a test starts flaking in production CI. A flakiness prevention gate catches issues when the context is fresh and the fix is cheap.

What to check in code review for new tests:

  • Does the test make a real network call? Any fetch, axios, http.get, or third-party SDK call in a non-E2E test is a flakiness risk. Require mocking.
  • Does the test use a hardcoded port, path, or database name? Flag it and require dynamic allocation.
  • Does the test rely on timing? Any setTimeout, setInterval, sleep, or delay in test code is a red flag. Require explicit waits or fake timers (see the sketch after the checklist below).
  • Does the test clean up after itself? Check that any created files, database records, or modified environment variables are reset in afterEach.
  • Does the test work in isolation? Require that all new tests pass when run alone: jest --testPathPattern=MyTest --runInBand.

# Add to your PR checklist or CONTRIBUTING.md:
## Test Review Checklist
- [ ] No real network calls (all HTTP mocked with MSW/nock/httpretty)
- [ ] No hardcoded ports or file paths
- [ ] No sleeps or fixed timeouts (uses waitFor/polling instead)
- [ ] Cleans up state in afterEach (DB, files, env vars)
- [ ] Passes when run alone (not order-dependent)
- [ ] Passes with --randomize (not sequence-dependent)
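
The "no sleeps or fixed timeouts" item deserves an example, since it is the one reviewers most often let through. A minimal sketch, assuming Testing Library and jest-dom are set up (submitForm and the 'Saved' banner are hypothetical):

import { screen, waitFor } from '@testing-library/dom';

// BAD: Fixed sleep -- too short on a loaded CI runner, wasted time everywhere else
test('shows a success banner (flaky version)', async () => {
  submitForm(); // hypothetical action that triggers an async UI update
  await new Promise(resolve => setTimeout(resolve, 100));
  expect(screen.getByText('Saved')).toBeInTheDocument();
});

// GOOD: Poll for the condition -- waits only as long as needed, up to a timeout
test('shows a success banner', async () => {
  submitForm();
  await waitFor(() => expect(screen.getByText('Saved')).toBeInTheDocument());
});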

Automating this review with linting rules is even more effective:

// .eslintrc.js: lint rules that prevent common flakiness sources
module.exports = {
  rules: {
    // no-restricted-syntax takes all selectors in one array; defining the
    // rule twice would silently overwrite the first entry.
    'no-restricted-syntax': [
      'error',
      {
        // Prevent real timers in tests
        selector: "CallExpression[callee.name='setTimeout']",
        message: 'Use jest.useFakeTimers() or waitFor() in tests instead of setTimeout',
      },
      {
        // Prevent hardcoded ports
        selector: "Literal[value=/^3[0-9]{3}$/]",
        message: 'Avoid hardcoded ports in tests. Use port 0 (OS-assigned) or a port pool.',
      },
    ],
  },
};

Putting Prevention Into a Development Workflow

These six practices work best as a system, not individual rules. Here is how to roll them out without disrupting an existing codebase:

  1. Start with new tests only. Apply the practices to all newly written tests. Do not try to retrofit the entire test suite at once.
  2. Add the code review checklist. This is the highest-leverage step -- it catches flakiness before it is committed, when fixing is cheapest.
  3. Enable randomized test order in CI. Run jest --randomize in your CI pipeline. This immediately surfaces existing order-dependent tests.
  4. Add onUnhandledRequest: 'error' to your MSW setup. This converts accidental network calls into immediate failures.
  5. Detect and quarantine existing flaky tests separately. Prevention governs new tests; detection and quarantine handle the existing backlog.
  6. Add lint rules progressively. Start with the highest-signal rules (network calls, hardcoded ports) and add more over time.

The Prevention Mindset

The most important shift is mindset. Flaky tests are not bad luck -- they are predictable outcomes of specific engineering patterns. When you write a test that sleeps for 100ms, you have created a test that will fail on a sufficiently loaded CI runner. When you write a test that hits a real API, you have created a test that will fail when that API is slow or down.

Recognizing these patterns during test authoring -- not weeks later when the test starts failing intermittently -- is what separates teams with reliable CI from teams that spend hours every sprint debugging "random" failures.

AI-powered flaky test management tools catch what slips through, but prevention means there is less to catch. The goal is a test suite where every test is deterministic, isolated, and fast -- and the developer productivity gains compound over time.

Key Takeaways

  • Prevention is cheaper than detection. A flaky test stopped in code review costs nothing.
  • Design tests to be deterministic: control time, seed randomness, mock the clock.
  • Enforce strict isolation: each test is independent, cleans up after itself, and passes alone.
  • Never make real network calls in unit or integration tests. Mock at the HTTP boundary.
  • Use dynamic resource allocation to eliminate port conflicts and path collisions.
  • Write robust assertions that are not brittle to irrelevant variation.
  • A code review checklist is the highest-leverage prevention tool.