GitHub Actions is the CI system of choice for millions of repositories. But as your test suite grows, flaky tests become the silent productivity killer: builds fail for no reason, developers retry workflows hoping for green, and trust in CI erodes. This guide walks through the most common causes of flaky tests specific to GitHub Actions and how to fix each one.
Why GitHub Actions makes flaky tests worse
GitHub Actions runners are shared, ephemeral virtual machines. Unlike your local dev environment, they have variable CPU and I/O performance, limited memory, and a fresh filesystem every run. Tests that pass reliably on your MacBook can fail intermittently on a runner because of these differences.
The combination of shared infrastructure and ephemeral environments creates several flakiness vectors that don't exist locally:
- CPU throttling: Runners share physical hosts. Your tests might get slower CPU time during peak hours, causing timeout-sensitive assertions to fail.
- Network variability: Tests that call external services or download dependencies may hit rate limits or experience latency spikes.
- Filesystem differences: Case sensitivity, temp directory behavior, and file descriptor limits differ between runners and local environments.
- Concurrency limits: Parallel jobs compete for runner availability, leading to queued workflows and stale caches.
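The last point can be partially mitigated at the workflow level: a concurrency group cancels superseded runs on the same branch so new pushes are not stuck queued behind stale ones. A minimal sketch (the group name is arbitrary):

# Cancel in-progress runs for the same branch when a newer commit is pushed
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true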
Step 1: Identify which tests are flaky
Before you can fix flaky tests, you need to know which ones are flaky. The naive approach -- “it failed, so I'll re-run” -- doesn't scale. You need systematic detection.
Use JUnit XML reports
Most test frameworks can output JUnit XML. Configure your workflow to upload test results as artifacts, then analyze them across runs to spot tests that flip between pass and fail.
# .github/workflows/test.yml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test -- --reporter=junit --outputFile=results.xml
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: results.xml

Track flip rates
A test's flip rate is how often it switches between pass and fail across consecutive runs without code changes. A flip rate above 2% is a strong signal of flakiness. Tools like FlakyGuard automate this tracking by ingesting every CI run and computing flip rates per test.
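As a rough illustration, a flip rate can be computed from a test's ordered run outcomes like this (a simplified sketch; FlakyGuard's exact calculation may differ):

// Count how often consecutive runs of the same test disagree
function flipRate(outcomes) {
  // outcomes: array of 'pass' / 'fail' strings from repeated runs without code changes
  if (outcomes.length < 2) return 0;
  let flips = 0;
  for (let i = 1; i < outcomes.length; i++) {
    if (outcomes[i] !== outcomes[i - 1]) flips++;
  }
  return flips / (outcomes.length - 1);
}

// Example: two flips across nine transitions is roughly 22% -- far above the 2% threshold
console.log(flipRate(['pass', 'pass', 'fail', 'pass', 'pass', 'pass', 'pass', 'pass', 'pass', 'pass']));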
Step 2: Fix timing-dependent tests
Timing issues are the number one cause of flaky tests on GitHub Actions. Assertions that depend on wall-clock time or assume operations complete within a fixed window break when the runner is under load.
Replace sleeps with polling
Never use setTimeout or sleep for synchronization. Instead, poll for the expected condition with a timeout:
// Bad: assumes 500ms is enough
await new Promise(r => setTimeout(r, 500));
expect(element).toBeVisible();
// Good: polls until condition is met or times out
await waitFor(() => expect(element).toBeVisible(), {
  timeout: 5000,
});

Use fake timers for time-sensitive logic
If your code depends on Date.now() or timers, use your test framework's fake timer support. Jest, Vitest, and Playwright all support this:
// Vitest / Jest
vi.useFakeTimers();
vi.setSystemTime(new Date('2026-01-15T10:00:00Z'));
// ... run test ...
vi.useRealTimers();

Step 3: Handle resource constraints
Standard GitHub-hosted runners are modest machines: for private repositories, ubuntu-latest provides 2 CPUs and 7 GB of RAM, while public repositories get 4 CPUs and 16 GB. Tests that work fine on your 32 GB MacBook might OOM or time out on a runner.
Limit test parallelism
Running too many tests in parallel can exhaust memory. Configure your test runner to limit concurrency:
# Jest
jest --maxWorkers=2

# Vitest
vitest --pool=threads --poolOptions.threads.maxThreads=2

# Playwright
npx playwright test --workers=2
Monitor runner resources
Add a diagnostic step to your workflow to track resource usage:
- name: System info
  if: always()
  run: |
    free -m
    nproc
    df -h

Step 4: Eliminate environment drift
GitHub Actions runners update their pre-installed software regularly. A Node.js patch bump or a Chrome version change can break tests silently.
Pin your tool versions
- uses: actions/setup-node@v4
  with:
    node-version: '20.11.1' # Pin exact version, not '20' or '20.x'

- uses: browser-actions/setup-chrome@v1
  with:
    chrome-version: '122.0.6261.94'

Cache dependencies deterministically
Use lockfile-based cache keys to ensure the same dependencies across runs:
- uses: actions/cache@v4
  with:
    path: node_modules
    key: deps-${{ hashFiles('pnpm-lock.yaml') }}

Step 5: Isolate external dependencies
Tests that hit real APIs, databases, or third-party services are inherently flaky in CI. Network issues, rate limiting, and service outages all cause spurious failures.
- Mock external APIs using libraries like msw (Mock Service Worker) or nock (see the sketch after this list).
- Use service containers for databases instead of remote instances. GitHub Actions supports Docker services natively.
- Record and replay HTTP interactions using tools like polly.js or VCR-style patterns.
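For the first item, a minimal sketch using msw (assuming msw v2's Node server API; the URL and response payload are placeholders):

// Intercept an external API call so the test never touches the network
import { http, HttpResponse } from 'msw';
import { setupServer } from 'msw/node';

const server = setupServer(
  // Placeholder endpoint -- replace with the API your code actually calls
  http.get('https://api.example.com/users', () =>
    HttpResponse.json([{ id: 1, name: 'Ada' }])
  )
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

For the second item, GitHub Actions can start the database alongside your job: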
# Use a service container for PostgreSQL
services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: test
    ports:
      - 5432:5432
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5

Step 6: Implement smart retries (but don't rely on them)
GitHub Actions has no built-in retry for individual steps, but you can approximate one with continue-on-error and a conditional follow-up step (or by re-running failed jobs), and many test frameworks ship their own retry mechanisms. Retries are a valid stop-gap, but they mask the underlying problem. Use them alongside root cause investigation, not instead of it.
# Manual retry: run the test step a second time if the first attempt fails
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - run: npm test
        id: test1
        continue-on-error: true
      - run: npm test
        if: steps.test1.outcome == 'failure'

A better approach is quarantining flaky tests so they run in a separate, non-blocking lane while you investigate the root cause.
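A quarantine lane can be as simple as a separate job that runs the flagged tests without blocking the workflow. A minimal sketch, assuming a test:quarantine script that runs only tests tagged as flaky (that script name is an illustration, not a convention):

# Non-blocking quarantine lane for known-flaky tests
jobs:
  quarantined-tests:
    runs-on: ubuntu-latest
    continue-on-error: true # failures here never fail the workflow
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.1'
      - run: npm ci
      - run: npm run test:quarantine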
Step 7: Automate flaky test management
Manual tracking breaks down fast. Once you have more than a handful of flaky tests, you need automation:
- Automatic detection that watches every CI run and flags new flaky tests using flip-rate analysis.
- Automatic quarantine that isolates flaky tests without manual intervention.
- AI-powered diagnosis that classifies the root cause and suggests specific fixes.
- Trend tracking that shows whether your flakiness problem is getting better or worse over time.
This is exactly what FlakyGuard does. Install the GitHub App, and it automatically ingests your GitHub Actions test results, detects flaky tests, quarantines them, and provides AI-powered fix recommendations.
GitHub Actions-specific checklist
Use this checklist when debugging flaky tests in GitHub Actions:
- Check the flip rate: is this test genuinely flaky or a real failure?
- Compare runner specs to your local machine (CPU, RAM, disk).
- Check if the failure correlates with specific runner images or regions.
- Look for timing-dependent assertions (sleeps, fixed timeouts).
- Verify all tool versions are pinned (Node, browser, OS image).
- Check for external service calls that could fail or be rate-limited.
- Review test isolation: do tests share state, ports, or temp files? (See the sketch after this checklist.)
- Check memory usage: are parallel tests exhausting runner RAM?
- Look at cache hit rates: stale or missing caches cause different behavior.
- Review workflow concurrency: are jobs competing for shared resources?
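On the test isolation point, a common fix is to give each test its own temp directory instead of a shared, hard-coded path. A minimal Node.js sketch, assuming a Jest- or Vitest-style setup:

// Create a unique temp directory per test so parallel tests never collide
import { mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import path from 'node:path';

let workDir;

beforeEach(() => {
  workDir = mkdtempSync(path.join(tmpdir(), 'my-test-'));
});

afterEach(() => {
  rmSync(workDir, { recursive: true, force: true });
});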
Stop re-running failed GitHub Actions workflows
FlakyGuard plugs directly into your GitHub Actions pipeline. It detects flaky tests, quarantines them automatically, and gives your team AI-powered fix suggestions. No config files, no test code changes.
Join the Waitlist