More Tests ≠ Better

Rishi Singh

May 5, 2026

In the modern era of AI-powered development, writing tests has never been easier. With a single prompt or a "generate unit tests" click, you can sprout thousands of lines of test code in seconds. It feels productive. It looks great on a dashboard. But as many engineering teams are discovering, there is a point where the quantity of tests stops being a safety net and starts becoming a cage.

When we prioritize the volume of tests over their intent, we aren't building better software—we are building a maintenance nightmare.

1. The "Easy" Generation Debt

The rise of generative AI means that code coverage targets that once took weeks to hit can now be reached in hours. However, this ease of creation is a double-edged sword.

  • The Bloat: It is tempting to generate tests for every trivial function, getter, and setter. These tests assert the obvious, catch almost nothing new, and still have to be read, maintained, and migrated with every refactor.
  • The Infrastructure Bill: More code isn't free. Every extra test increases your execution time and swells your cloud compute bills. If your team runs a bloated suite on every commit, you are burning money on "safety" that often provides no new information.
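The compute bill is easy to estimate on the back of an envelope. The sketch below uses entirely hypothetical numbers (suite length, commit rate, per-minute runner price); plug in your own from your CI provider's billing page.

```python
# Back-of-the-envelope CI cost model.
# All inputs below are hypothetical assumptions, not measurements.
def monthly_ci_cost(suite_minutes, commits_per_day,
                    runner_cost_per_minute, workdays=22):
    """Estimate the monthly compute bill for running the suite on every commit."""
    return suite_minutes * commits_per_day * workdays * runner_cost_per_minute

# Same team, same commit rate, at an assumed $0.008 per runner-minute:
lean = monthly_ci_cost(5, 40, 0.008)      # trimmed 5-minute suite
bloated = monthly_ci_cost(30, 40, 0.008)  # bloated 30-minute suite
print(f"lean: ${lean:.2f}/mo  bloated: ${bloated:.2f}/mo")
```

The point is not the exact dollar figure but the multiplier: every minute you add to the suite is paid on every commit, every day.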

2. The PR Bottleneck and Developer Fatigue

In fast-paced environments, productivity is often measured by the number of Pull Requests (PRs) merged. A massive test suite is the natural enemy of this velocity.

  • The Productivity Cliff: Research on feedback loops (see the DORA findings below) suggests that once test suites exceed 10–20 minutes, developers stop waiting for results and start context-switching, and every switch carries its own cost in lost focus.
  • The Math of Flakiness: If a single test has a 0.1% chance of failing randomly (flakiness), a suite of 1,000 tests will fail nearly 63% of the time. This leads to "alert fatigue," where developers begin to ignore failures, assuming they are "just the flaky tests again."

3. The "Mockery" of Real Testing

To keep large test suites running fast, developers often lean heavily on mocks. If your test mocks the database, the external API, and the internal service logic, you aren't testing your code; you are testing your assumptions about your dependencies.

The Risk: Industry post-mortems show that roughly 35% of production bugs are integration errors. If your tests rely on mocks that assume a behavior that has since changed in the real world, your tests will pass while your production environment crashes.
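Here is a minimal sketch of that failure mode, sometimes called "mock drift." The API, its endpoint, and the `name`/`full_name` fields are hypothetical names for illustration; the mechanism is what matters.

```python
# Sketch of "mock drift": the mock encodes a stale assumption about a
# dependency, so the unit test passes while real integration would fail.
from unittest.mock import Mock

def greeting(api, user_id):
    user = api.get(f"/users/{user_id}")
    return f"Hello, {user['name']}!"  # assumes the payload still has "name"

# Unit test: the mock hard-codes yesterday's response shape, so it passes.
def test_greeting_with_mock():
    api = Mock()
    api.get.return_value = {"name": "Ada"}
    assert greeting(api, 1) == "Hello, Ada!"

test_greeting_with_mock()  # green forever: the mock never drifts

# Meanwhile the real service renamed the field. A test fed a real
# (or recorded) response catches the break immediately.
real_response = {"full_name": "Ada"}  # what production now returns
api = Mock()
api.get.return_value = real_response
try:
    greeting(api, 1)
except KeyError as missing:
    print(f"integration-shaped test fails: missing key {missing}")
```

A mocked test verifies your memory of the contract; an integration test verifies the contract.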

4. Quality vs. Quantity: The Data

Research suggests that while moving from 0% to 70% coverage significantly reduces bugs, the benefits taper off sharply afterward. The "Law of Diminishing Returns" applies heavily to testing.

The Bottom Line: Test for Confidence, Not Credit

A lean suite of 100 tests that covers critical user journeys, handles messy edge cases, and runs in under three minutes is far more valuable than a library of 1,000 tests that merely confirm your boilerplate code works as intended.

Stop counting your tests and start weighing them. Delete redundant checks, favor integration over excessive mocking, and remember: The goal of testing is to ship reliable software, not to hit a vanity metric on a chart.

References & Further Reading

  • Google Engineering: Software Engineering at Google (Curated by Winters, Manshreck, and Wright) – On the "Cost of Flakiness" and "Test Maintenance."
  • Journal of Systems and Software: The Relationship Between Code Coverage and Software Quality – A study on why high coverage doesn't always equal fewer defects.
  • DORA Research (DevOps Research and Assessment): On the correlation between fast feedback loops (Continuous Integration) and high-performing technology organizations.
  • The "Testing Trophy" Model: Originally proposed by Kent C. Dodds, emphasizing integration tests over isolated unit tests.
