Beyond the Limit

Pondering Test Coverage Limits and Thresholds

1729, 1089, 42, 3.14159… History, pop culture and mathematics are littered with magical numbers. The fame of each sequence of digits is established in different ways. People will remember those constants they come to use regularly over time in equations. Or those that dictate a formal limit that they must follow.

What is the right number for test coverage percentage?

All good musings start with a question. What is the right percentage test coverage to enforce? The posing of this quandary by another on social media has got me thinking about test coverage again. What is the right number? Is there a right number at all? This week I revisit test coverage limits, focusing on what the limit should be, and mechanisms to enforce said number.

Test for Echo

As expected, the social media responses to the aforementioned question varied, with typical answers falling between 80 and 100%. My own opinion is that it should be at least 90%. Searching the expanse of the Internet doesn’t give you a concrete answer either, with similar ranges being discussed. Differing opinions on the effectiveness of 100% coverage are also easy to find.

Some engineers still see writing tests as a bonus ball moment, rather than a mandated part of feature development

Perceptions differ vastly across my workplace as well. I would love to say responses are similar to the above, and in certain circles they are thankfully within that range. However, there are still some who see tests as a bonus in the development of new features. Their argument is that the logic is so simple to understand that writing tests is pointless. This lack of craftsmanship marks out those developers unable to own the features they develop, like a flashing green diamond above their Sim.

Test Pilot Blues

Irrespective of an engineer's dedication to the craft, the right number is one that is collectively agreed. Squads should be encouraged to aim high, rather than try to scrape the barrel for the lowest achievable threshold. Utilising lead engineers will help establish a high bar. The sole way to establish N% coverage as dogma is to have the team define N for themselves, and document it as part of the definition of done.

Strong lead developers will also be mindful that the right number depends on the current state of the project. Legacy codebases, such as some that we own, have low test coverage due to a previous lack of dedication to the automated testing cause.

Legacy applications with historically poor coverage can cause developers to aim low in establishing their coverage metrics

Regardless of past sins, new components should not fall foul of the same poor practices. The team should agree a high threshold for all new components together. It should be as high as the team can manage.

Put to the Test

Once test coverage threshold consensus has been achieved, it is vital to enforce the threshold. Coverage regression can be caused by several factors, which have been discussed previously. Of late, differing craftsmanship has been a lesser cause thanks to threshold quality gates, and enforcing of strong coverage practices through regular pull requests.
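To make the idea of a threshold quality gate concrete, here is a minimal sketch in Python. It is not the configuration of any particular coverage tool or CI product. The function name and parameters are hypothetical, and the 90% figure is simply my own preference from earlier, standing in for whatever N the team agrees.

```python
def meets_threshold(covered_lines: int, total_lines: int,
                    threshold_percent: float = 90.0) -> bool:
    """Return True when line coverage is at or above the agreed threshold.

    Hypothetical helper: the counts would come from whatever coverage
    report your build already produces.
    """
    if total_lines == 0:
        # Assumption: a component with nothing to cover passes the gate.
        return True
    coverage_percent = 100.0 * covered_lines / total_lines
    return coverage_percent >= threshold_percent
```

A build step could call this with the totals from the coverage report and fail the pipeline when it returns False.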

Deadlines have been the greatest single contributor to coverage dips in the development of recent features. Even the most diligent of programmers will cut corners when it gets hot in the kitchen. This may be driven by a lack of dedication to the practice of TDD. Tests are still seen as an exercise to be undertaken once something works. This week I’ve seen an engineer writing tests for a feature developed last sprint, raising concerns that their inexperience meant it took considerably longer. This mindset needs to change drastically to ensure testing thresholds are adhered to and instilled among junior developers.

As we inch increasingly closer to deadlines, the first thing developers drop is writing automated tests

Going back to our legacy components, we need to be mindful of coverage dips when striving to improve our adoption. Gradually increasing thresholds is one way, but it can still mean drops in coverage without regular discipline. It also means that once we overachieve, engineers can let coverage fall back to the bare threshold when the going gets tough. The use of delta gates, which fail a build when coverage falls below its previous level, should be considered to prevent coverage falls.
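A delta gate compares coverage against the previous build rather than against a fixed number. The sketch below is an illustration only, not the behaviour of any specific tool. The function name, the parameters and the optional tolerance are assumptions of mine.

```python
def passes_delta_gate(baseline_percent: float, current_percent: float,
                      tolerance: float = 0.0) -> bool:
    """Return True when coverage has not fallen below the baseline.

    Hypothetical helper: baseline_percent is the coverage recorded for the
    previous build, and tolerance is an assumed allowance for small,
    agreed dips.
    """
    return current_percent >= baseline_percent - tolerance


# A component sitting at 85% with a fixed 70% threshold could fall to 72%
# and still pass the threshold gate. The delta gate rejects that drop.
drop_allowed = passes_delta_gate(baseline_percent=85.0, current_percent=72.0)
```

For a legacy codebase, the baseline can start at the current low figure and only ever move upwards as coverage improves.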

The Test of Time

This journey of discovery has helped me realise that there is no single solution to the coverage equation. Teams should strive to enforce a high standard that they can work towards, rather than imposing a minimum standard that has already been achieved.

Teams must agree on test coverage metrics together to build trust and consensus

Collective agreement on what the percentage coverage should be is important. I cannot impose my own 90% preference on the entire team. How can they possibly buy into a number that they don’t consider magical? Factors such as the current state of the codebase can be a starting point. Legacy codebases will require the use of delta gates to ensure an upward trend towards your desired result. It’s by no means the end of the journey. Pick your percentage wisely.

Thanks for reading!