
How to Quarantine Flaky Tests Without Losing CI Trust

March 12, 2026 · 10 min read

Your CI pipeline is red again. A developer clicks into the failure log, spends 15 minutes reading stack traces, and concludes: "It was just that flaky test again." They re-run the pipeline, wait another 12 minutes, and it passes. Multiply this by every PR across your team and you are looking at hundreds of lost engineering hours per quarter.

The quarantine pattern solves this by separating flaky test failures from real regressions. Quarantined tests still run, but their failures do not block merges. Your CI stays green and trustworthy while you systematically fix root causes.

What Is Test Quarantine?

Test quarantine is an operational pattern where tests identified as flaky are moved to a separate execution lane. They continue to execute on every CI run, but their results are reported separately from the main test suite. A quarantined test failure produces a "neutral" check status rather than a "failure," so it does not block pull request merges.

This is fundamentally different from skipping or disabling tests. Disabled tests stop running entirely, which means you lose signal -- if the underlying bug gets worse or the test starts passing consistently, you will never know. Quarantine preserves the signal while removing the noise.

Why Teams Need Quarantine

Without quarantine, engineering teams face an impossible choice:

Option A: Ignore flaky failures. Developers learn to re-run CI when it fails. This works until a real regression hides behind a "probably flaky" failure and ships to production.
Option B: Fix every flaky test immediately. Admirable in theory, but flaky test root causes are often deep (race conditions, shared state, environment dependencies) and take hours to diagnose. You cannot halt feature development every time a test flakes.
Option C: Disable flaky tests. Fast but dangerous. Disabled tests stop catching real bugs. Teams that disable aggressively end up with a test suite that passes 100% of the time and catches 60% of regressions.

Quarantine is Option D: keep running the test, stop blocking on it, and create a structured process to fix it. It is the only option that preserves both CI trust and test coverage.

The Quarantine Lifecycle

An effective quarantine system has four phases:

Phase 1: Detection

Before you can quarantine a test, you need to know it is flaky. The most reliable method is flip-rate analysis: tracking how often a test changes between pass and fail on the same code. As a rule of thumb, a test whose flip rate exceeds a threshold in the 0.2 to 0.3 range is almost certainly flaky. For a full breakdown of detection techniques, see our practical guide to detecting flaky tests.
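
As a rough illustration, flip rate can be computed from an ordered pass/fail history. The sketch below assumes one particular definition (the number of pass/fail changes divided by the number of consecutive run pairs); the function name `flip_rate` is hypothetical, and a real tracking system may weight or window the history differently.

```python
from typing import Sequence


def flip_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of consecutive run pairs where the outcome changed.

    `outcomes` is ordered oldest to newest; True means pass, False means fail.
    All runs are assumed to be against the same code.
    """
    if len(outcomes) < 2:
        return 0.0  # not enough history to observe a flip
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return flips / (len(outcomes) - 1)


# A test that alternates has a high flip rate; one that always fails has none.
alternating = flip_rate([True, False, True, False, True])
always_failing = flip_rate([False, False, False, False, False])
```

Under this definition a test that fails on every run scores 0, which is the desired behavior: it is broken, not flaky.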

Detection can be automated. Ingest JUnit XML or test result artifacts from your CI runs into a tracking system, compute flip rates per test, and flag tests that exceed your threshold. No human intervention required.
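
A minimal sketch of the ingestion step, using only the Python standard library. It assumes the common JUnit XML layout (`<testcase>` elements with `classname` and `name` attributes, and `<failure>`, `<error>`, or `<skipped>` children); JUnit XML has no single formal schema, so your test runner's output may differ. The function name `parse_junit` and the sample report are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_junit(xml_text: str) -> Dict[str, bool]:
    """Map 'classname::name' to True (passed) or False (failed or errored)."""
    root = ET.fromstring(xml_text)
    results: Dict[str, bool] = {}
    # iter() finds <testcase> at any depth, so both <testsuites> and
    # bare <testsuite> roots are handled.
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # skipped tests carry no pass/fail signal
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        results[test_id] = not failed
    return results


SAMPLE = """<testsuites>
  <testsuite name="checkout">
    <testcase classname="checkout.CartTest" name="adds_item"/>
    <testcase classname="checkout.CartTest" name="applies_coupon">
      <failure message="expected 90, got 100"/>
    </testcase>
    <testcase classname="checkout.CartTest" name="legacy_flow">
      <skipped/>
    </testcase>
  </testsuite>
</testsuites>"""

results = parse_junit(SAMPLE)
```

Storing one such mapping per CI run gives you the per-test history needed to compute flip rates.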

Phase 2: Auto-Quarantine

When a test exceeds the flakiness threshold, it should be quarantined automatically. Manual quarantine does not scale -- if someone has to file a ticket and tag a test as flaky, it will not happen consistently.

Auto-quarantine requires two things:

A flakiness threshold (e.g., 0.3) and a minimum number of runs (e.g., 5) to avoid false positives on new tests.
A mechanism to separate quarantined test results in CI reporting. In GitHub, this means posting a "neutral" check status instead of "failure" when only quarantined tests fail.

# Example .flakyguard.yml configuration
quarantine:
  threshold: 0.3      # Flakiness score to trigger quarantine
  minRuns: 5           # Minimum runs before evaluating
  disabled: false      # Kill switch

checks:
  disabled: false      # Post GitHub check results
  neutralOnFlaky: true # Neutral (not failure) for quarantined tests

comments:
  disabled: false      # Post PR comments with flaky summary
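
To make the `quarantine` settings above concrete, here is one way the decision could be expressed in code. This is a sketch of the rule as described in this article, not FlakyGuard's actual implementation; `QuarantineConfig` and `should_quarantine` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarantineConfig:
    threshold: float = 0.3   # flakiness score that triggers quarantine
    min_runs: int = 5        # minimum runs before a test is evaluated
    disabled: bool = False   # kill switch


def should_quarantine(score: float, runs: int, cfg: QuarantineConfig) -> bool:
    """Return True when a test should move to the quarantine lane."""
    if cfg.disabled:
        return False
    if runs < cfg.min_runs:
        return False  # too little history; avoids false positives on new tests
    return score > cfg.threshold


cfg = QuarantineConfig()
decision = should_quarantine(score=0.45, runs=12, cfg=cfg)
```

The minimum-run guard comes first so that a brand-new test with one unlucky failure is never quarantined on thin evidence.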

Phase 3: Ownership and Triage

Quarantine without ownership is just a graveyard. Tests go in and never come out. Every quarantined test needs an owner -- someone responsible for investigating and fixing it.

The best way to assign ownership is automatically via CODEOWNERS. If your repository has a CODEOWNERS file, you can resolve the owner of each test file and assign quarantined tests to the responsible team. This turns "someone should fix this" into "the frontend team has 4 quarantined tests with an average flakiness score of 0.45."
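
The sketch below shows the general idea of resolving an owner from a CODEOWNERS file, where the last matching rule wins. It is deliberately simplified: real CODEOWNERS patterns follow gitignore-style rules, while this version uses `fnmatch`, whose `*` also matches `/`, and treats every directory pattern as rooted at the repository top level. The function names and the sample owners are illustrative.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, List[str]]


def parse_codeowners(text: str) -> List[Rule]:
    """Parse 'pattern owner1 owner2 ...' lines, skipping blanks and comments."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def _matches(pattern: str, path: str) -> bool:
    if pattern.endswith("/"):
        # Directory pattern: matches everything beneath it (simplified: rooted).
        return path.startswith(pattern.lstrip("/"))
    if pattern.startswith("/"):
        return fnmatchcase(path, pattern.lstrip("/"))
    # Un-anchored pattern: try the full path, then just the file name.
    return fnmatchcase(path, pattern) or fnmatchcase(path.rsplit("/", 1)[-1], pattern)


def owners_for(path: str, rules: List[Rule]) -> List[str]:
    """Return the owners from the last rule that matches `path`."""
    owners: List[str] = []
    for pattern, rule_owners in rules:
        if _matches(pattern, path):
            owners = rule_owners  # later rules override earlier ones
    return owners


rules = parse_codeowners("""
# Default owner for everything
*        @org/platform
/web/    @org/frontend
*.sql    @org/data
""")
owner = owners_for("web/src/cart.test.ts", rules)
```

For production use, a library that implements the full gitignore matching rules is a safer choice than this approximation.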

Key metrics to surface per owner:

Number of quarantined tests
Average flakiness score
Days in quarantine (tests sitting for 30+ days need escalation)
Estimated CI minutes wasted per week
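
The first three of these metrics can be derived from a simple list of quarantined tests, as in the following sketch. The record fields and the function name `summarize_by_owner` are assumptions made for illustration; estimated CI minutes wasted is omitted because it needs per-run duration data that this sketch does not model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class QuarantinedTest:
    name: str
    owner: str
    score: float           # current flakiness score
    quarantined_on: date


def summarize_by_owner(
    tests: List[QuarantinedTest], today: date, escalate_after_days: int = 30
) -> Dict[str, dict]:
    """Group quarantined tests by owner and compute per-owner metrics."""
    grouped: Dict[str, List[QuarantinedTest]] = defaultdict(list)
    for t in tests:
        grouped[t.owner].append(t)

    summary: Dict[str, dict] = {}
    for owner, items in grouped.items():
        ages = [(today - t.quarantined_on).days for t in items]
        summary[owner] = {
            "count": len(items),
            "avg_score": round(sum(t.score for t in items) / len(items), 2),
            "oldest_days": max(ages),
            # Tests at or past the age limit need escalation.
            "needs_escalation": [
                t.name for t, age in zip(items, ages) if age >= escalate_after_days
            ],
        }
    return summary
```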

Phase 4: Resolution

The goal of quarantine is to be temporary. A healthy quarantine system has a "quarantine age" metric -- the average number of days tests spend quarantined. If this number keeps growing, your process is not working.

Resolution strategies by root cause (see our root cause analysis guide for detailed diagnosis techniques):

Timing issues: Replace setTimeout / sleep() with explicit waitFor() or polling patterns. Increase timeouts for CI environments where CPU is shared.
Test ordering: Ensure each test creates and tears down its own state. Avoid shared mutable fixtures.
Resource contention: Use unique ports, temp directories, or database schemas per test. Mock external services.
Environment differences: Pin timezone, locale, and system clock in tests. In Jest, use jest.useFakeTimers() with jest.setSystemTime(), or the equivalent in your framework.
Network flakiness: Mock all external HTTP calls in unit tests. Use contract tests for integration verification.
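
As an example of the timing fix, a fixed sleep can be replaced with a small polling helper that returns as soon as the condition holds and fails loudly when it does not. This is a generic sketch in Python; the name `wait_for` and its default timeout and interval are illustrative, and most test frameworks ship an equivalent utility.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05
) -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    # monotonic() is unaffected by system clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

Instead of `time.sleep(2)` followed by an assertion, a test would call `wait_for(lambda: job.is_done())` (where `job` stands in for whatever the test is waiting on). The test finishes quickly on a fast machine and still tolerates a slow, shared CI runner.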

Once a test is fixed, it should be unquarantined and monitored. If its flakiness score stays below the threshold over the next 10-20 runs, it remains in the main suite. If it flakes again, it goes back to quarantine automatically.

What Good PR Reporting Looks Like

When quarantine is working, your PR checks tell a clear story:

Green check: All tests passed.
Neutral check: 3 tests failed, all quarantined -- not blocking. Shows test names, flakiness scores, and owners.
Red check: 2 tests failed (real regressions). Plus 1 quarantined failure (not blocking). The developer knows exactly what needs attention.
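
The three outcomes above reduce to a small decision function. In this sketch the return values match the `success`, `neutral`, and `failure` conclusions accepted by GitHub check runs; the function name `check_conclusion` and the test identifiers are hypothetical.

```python
from typing import Iterable, Set


def check_conclusion(failed: Iterable[str], quarantined: Set[str]) -> str:
    """Pick a check conclusion from the failed tests and the quarantine list."""
    failed = list(failed)
    if not failed:
        return "success"  # green: all tests passed
    blocking = [t for t in failed if t not in quarantined]
    # Red if any non-quarantined test failed, otherwise neutral.
    return "failure" if blocking else "neutral"


quarantined = {"cart::applies_coupon", "login::remembers_user"}
conclusion = check_conclusion(["cart::applies_coupon"], quarantined)
```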

The PR comment should include:

A summary: "2 real failures, 3 quarantined failures (not blocking)."
For real failures: test name, error message, and whether this test has any flaky history.
For quarantined failures: test name, flakiness score, and owner.
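
A comment in that shape could be assembled as follows. The layout, the `FAIL` and `QUARANTINED` prefixes, and the function name `build_pr_comment` are illustrative choices, not a prescribed format.

```python
from typing import List, Tuple

RealFailure = Tuple[str, str, bool]          # (test name, error message, has flaky history)
QuarantinedFailure = Tuple[str, float, str]  # (test name, flakiness score, owner)


def _plural(n: int, noun: str) -> str:
    return f"{n} {noun}" + ("" if n == 1 else "s")


def build_pr_comment(
    real: List[RealFailure], quarantined: List[QuarantinedFailure]
) -> str:
    """Render a summary line followed by one line per failing test."""
    lines = [
        f"{_plural(len(real), 'real failure')}, "
        f"{_plural(len(quarantined), 'quarantined failure')} (not blocking)."
    ]
    for name, error, has_flaky_history in real:
        history = "has flaky history" if has_flaky_history else "no flaky history"
        lines.append(f"FAIL {name}: {error} ({history})")
    for name, score, owner in quarantined:
        lines.append(f"QUARANTINED {name}: flakiness {score:.2f}, owner {owner}")
    return "\n".join(lines)
```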

Quarantine Anti-Patterns

Watch out for these common mistakes:

Quarantine without a dashboard. If nobody can see what is quarantined, nobody will fix it. You need a visible, team-accessible view of quarantined tests, their age, and their owners.
No quarantine age limit. Set a policy: tests quarantined for more than 30 days without investigation get escalated. Without this, quarantine becomes a permanent skip list.
Manual-only quarantine. If quarantine requires someone to edit a config file or add an annotation, it will not happen consistently. Automate the detection-to-quarantine pipeline.
Quarantining real failures. This happens when the flakiness threshold is too low or the minimum run count is too small. A test that fails consistently on a specific branch is not flaky -- it is broken. Require at least 5 runs and a flip rate above your threshold (0.2 at the very least) before quarantining.

Measuring Quarantine Health

Track these metrics to ensure your quarantine system is working:

Quarantine rate: Percentage of tests currently quarantined. Healthy teams keep this under 5%. Above 10% signals systemic test quality issues.
Mean time to resolution: Average days from quarantine to fix. Target: under 14 days. Use proven fix patterns to speed up resolution.
Quarantine churn: Tests that bounce in and out of quarantine. These have root causes that were not fully fixed.
CI time saved: Hours of developer time not wasted on investigating flaky failures. This is your ROI metric.
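
The first three metrics can be computed from a log of quarantine events, as sketched below. The `QuarantineRecord` shape and the function names are assumptions for illustration. CI time saved is left out because it depends on how your team estimates investigation time per flaky failure.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class QuarantineRecord:
    test: str
    quarantined_on: date
    resolved_on: Optional[date] = None  # None while still quarantined


def quarantine_rate(active_quarantined: int, total_tests: int) -> float:
    """Percentage of the suite that is currently quarantined."""
    return 100.0 * active_quarantined / total_tests if total_tests else 0.0


def mean_days_to_resolution(records: List[QuarantineRecord]) -> Optional[float]:
    """Average days from quarantine to fix, over resolved records only."""
    durations = [
        (r.resolved_on - r.quarantined_on).days for r in records if r.resolved_on
    ]
    return sum(durations) / len(durations) if durations else None


def churned_tests(records: List[QuarantineRecord]) -> List[str]:
    """Tests quarantined more than once, i.e. fixes that did not hold."""
    counts = Counter(r.test for r in records)
    return sorted(test for test, n in counts.items() if n > 1)
```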

Getting Started

If you are implementing quarantine from scratch:

Start collecting test results. Ingest JUnit XML from every CI run into a database or service.
Compute flip rates. After 5+ runs, identify tests with a flip rate above 0.3.
Separate CI reporting. Use your CI platform's check API to post neutral results for quarantined tests.
Add a dashboard. Even a simple spreadsheet showing quarantined tests, their age, and owners is better than nothing.
Set a review cadence. Weekly quarantine triage meetings until the process is self-sustaining.

For teams ready to go beyond manual triage, see our guide on how AI agents automate the entire quarantine workflow -- from detection to resolution tracking.

Or, use a purpose-built tool. FlakyGuard handles the full quarantine lifecycle automatically: detection via flip-rate analysis, auto-quarantine with configurable thresholds, CODEOWNERS-based ownership assignment, GitHub PR checks with clear quarantine reporting, AI-powered root cause analysis, and team-level analytics.
