Unconventional Wisdom V15: Test Effectiveness
- Posted by Robin
Participants in my testing seminars routinely rate their organizations very low on measuring and rewarding effective testing. Rewarding is usually thought to be the deal breaker. However, further inquiry continually confirms my analysis that the main reason test effectiveness is not rewarded is that it is not measured.
Conventional Measures of Testing
Participants regularly report a variety of ways their organizations commonly think they are measuring test effectiveness. Frequently cited are the ever-rare projects that come in on-time and in-budget, followed by somewhat less rare testing that finished on-time and in-budget. Other supposed test effectiveness measures participants often report include customers’ satisfaction with project results, tester participation on (typically-Agile) project teams, extent of test automation, and number of defects detected by testing.
Many factors besides testing contribute to a project’s finishing on-time and in-budget. In fact, it seems more likely that more effective testing would lead to overruns due to finding more defects that would necessitate additional time and effort to fix and retest. Testers I meet generally complain that their budgets and schedules are woefully inadequate for performing the testing they believe is necessary. It’s therefore hard to imagine how finishing testing within such prescribed but insufficient constraints equates to more effective testing. Anybody can finish on-time and in-budget if it doesn’t matter what they deliver. Test effectiveness is all about what’s delivered.
Similarly, customer satisfaction is due to many factors, one of which certainly would be few(er) defects, which in turn is due to many factors. While more effective testing may be one factor contributing to customer satisfaction, I seldom find customers mentioning let alone crediting test effectiveness for their satisfaction. People expect software to work without defects and instead mainly base their satisfaction on design aspects.
Participating actively and amicably with other project team members certainly is necessary for testers to be effective. It’s also understandably how many other team members may judge testers on their team. However, while necessary, it’s hardly sufficient to make the testing itself effective. Moreover, effective testing is likely to ruffle feathers by revealing issues that could make other team members uncomfortable. More effective testers perhaps do so more tactfully, but that’s a secondary aspect of their effectiveness.
As “do it yesterday” increasingly dominates development, automation attracts greater attention. Automated tests can be executed more quickly and thus are essential for DevOps and other approaches focused on reducing time between development and deployment. Doing more tests in limited available time surely is an aspect of effectiveness; but automation says nothing about the effectiveness of the tests themselves. Conversely, more effective tests that take too long to run don’t improve test effectiveness.
Number of defects detected surely indicates test effectiveness, doesn’t it? Tests that find 30 defects must be more effective than tests that find 25 defects, right? This is another instance where the answer is “it depends.” Say the 30 defects were in a program with 300 lines of code, whereas the 25 defects were in a 25-line piece of code. By some perspectives, detected defect density is far greater when finding one defect per line of code than one defect per ten lines of code.
Okay, so let’s normalize and talk only of comparable defect densities, such as per thousand lines of code (KLOC). How many defects a tester finds is only half the story, like knowing only how many points your team has scored but not how many the opponent has scored. We also need to know how many defects were missed. For example, finding 25 of 250 defects is far less effective testing than finding 30 of 30 defects.
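To make the normalization concrete, here is a minimal sketch that converts the two detected-defect counts from the example above into defects per KLOC. The function name `defects_per_kloc` is my own illustration, not a standard API.

```python
def defects_per_kloc(defects_found: int, lines_of_code: int) -> float:
    """Return detected defects normalized to a per-thousand-lines rate."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects_found * 1000 / lines_of_code


# The two programs from the example: 30 defects in 300 lines, 25 in 25 lines.
larger_program = defects_per_kloc(defects_found=30, lines_of_code=300)
smaller_program = defects_per_kloc(defects_found=25, lines_of_code=25)

print(f"30 defects in 300 lines: {larger_program:.0f} per KLOC")
print(f"25 defects in 25 lines:  {smaller_program:.0f} per KLOC")
```

Note that this normalizes only the defects that were found. It still says nothing about the defects that were missed.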
Defect Detection Percentage
Defect Detection Percentage (DDP) is the name for the measure of test effectiveness that reflects the ratio of defects found to total defects, i.e., the sum of defects found plus defects missed. DDP is perhaps the most widely recognized measure of test effectiveness among the small minority of testers who actually understand, let alone measure, test effectiveness. Thus, in one way DDP is conventional wisdom; but in comparison to the truly “unconventional we’s dumb” cited by my many testing students, DDP may be too rarely recognized to earn the designation.
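As a small sketch of that definition, DDP can be computed as defects found divided by defects found plus defects missed, expressed as a percentage. The function name is illustrative, and the inputs are the 25-of-250 and 30-of-30 cases mentioned earlier.

```python
def defect_detection_percentage(found: int, missed: int) -> float:
    """DDP = found / (found + missed), as a percentage."""
    total = found + missed
    if total == 0:
        raise ValueError("DDP is undefined when no defects are known")
    return 100 * found / total


# 25 found of 250 total means 225 were missed; 30 of 30 means none were.
print(defect_detection_percentage(found=25, missed=225))
print(defect_detection_percentage(found=30, missed=0))
```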
DDP is not without its challenges, including proneness to misuse. For any given piece of software, one cannot accurately assess total number of defects until it has been in production long enough for essentially all latent defects to be detected. This gets further complicated as code is continually updated with fixes and enhancements. Some of their attendant new defects may be found in conjunction with the updates, often along with existing previously undetected defects, while the additional unfound defects may lurk longer and longer.
This has several effectiveness measurement consequences. First, it’s unwise to make judgments about DDP during arbitrary slices of development. Without a solid end-point where additional defects essentially stop being detected, each DDP calculation effort is as likely to reflect sampling issues as actual DDP test effectiveness.
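The sampling problem can be illustrated with a rough sketch. The numbers below are entirely hypothetical: 30 defects found in testing, followed by a made-up series of production defects reported month by month. The point is only that the apparent DDP keeps shifting until new defects essentially stop turning up.

```python
from itertools import accumulate

# Hypothetical figures for illustration only.
FOUND_IN_TESTING = 30
PRODUCTION_DEFECTS_BY_MONTH = [10, 6, 3, 1, 0, 0]


def ddp_snapshots(found_in_testing, later_defects_by_period):
    """Apparent DDP after each period, using defects known so far."""
    snapshots = []
    for missed_so_far in accumulate(later_defects_by_period):
        total_known = found_in_testing + missed_so_far
        if total_known == 0:
            raise ValueError("DDP is undefined when no defects are known")
        snapshots.append(100 * found_in_testing / total_known)
    return snapshots


snapshots = ddp_snapshots(FOUND_IN_TESTING, PRODUCTION_DEFECTS_BY_MONTH)
for month, ddp in enumerate(snapshots, start=1):
    print(f"month {month}: apparent DDP {ddp:.1f}%")
```

A calculation made at release, before any production defects are reported, would show 100 percent, which is why an early slice says more about when the sample was taken than about the testing.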
Second, in a related vein, meaningful measurement also needs to reflect the point and type of defect injection as well as when and how the defect is detected. Such uncommon granularity is essential for meaningful understanding and improvement of the testing process. Even though most authorities recognize requirements and design defects are the major cause of defects that are not detected until coded, few organizations routinely or reliably collect necessary information to see what’s actually happening, its causes, its extent, or its impact.
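One way such granularity might be recorded is sketched below. The `Defect` record, the phase names, and the sample data are all assumptions made for illustration; a real defect tracker would have its own fields. The sketch counts a defect as detected if it was found anywhere before production, then reports DDP separately for each phase of injection.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    injected_in: str  # phase where the defect was introduced
    detected_in: str  # phase or activity where it was found


def ddp_by_injection_phase(defects):
    """Percentage of defects caught before production, per injection phase."""
    found = Counter()
    total = Counter()
    for defect in defects:
        total[defect.injected_in] += 1
        if defect.detected_in != "production":
            found[defect.injected_in] += 1
    return {phase: 100 * found[phase] / total[phase] for phase in total}


# Hypothetical sample data.
sample = [
    Defect("requirements", "system test"),
    Defect("requirements", "production"),
    Defect("requirements", "production"),
    Defect("requirements", "production"),
    Defect("design", "design review"),
    Defect("design", "production"),
    Defect("coding", "unit test"),
    Defect("coding", "unit test"),
    Defect("coding", "system test"),
    Defect("coding", "production"),
]

for phase, ddp in ddp_by_injection_phase(sample).items():
    print(f"{phase}: {ddp:.0f}% detected before production")
```

In this made-up data the overall DDP is 60 percent, which hides the fact that requirements defects are caught far less often than coding defects.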
Third, in light of the above, DDP is most useful with respect to test processes. Yes, processes are measured by accumulating across included projects; but such test effectiveness measures are much more accurate and relevant for the process as a whole than for evaluating any of its individual included projects.
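A brief sketch of accumulating across projects follows, with hypothetical project names and counts. It pools found and missed defects across all projects rather than averaging the per-project percentages, so that a tiny project does not carry the same weight as a large one.

```python
# Hypothetical (found, missed) counts per project, for illustration only.
PROJECTS = {
    "Project A": (8, 2),
    "Project B": (45, 15),
    "Project C": (3, 3),
}


def pooled_ddp(counts):
    """Process-level DDP from (found, missed) pairs pooled across projects."""
    counts = list(counts)
    found = sum(f for f, _ in counts)
    missed = sum(m for _, m in counts)
    if found + missed == 0:
        raise ValueError("DDP is undefined when no defects are known")
    return 100 * found / (found + missed)


for name, (found, missed) in PROJECTS.items():
    print(f"{name}: {100 * found / (found + missed):.1f}%")
print(f"Process as a whole: {pooled_ddp(PROJECTS.values()):.1f}%")
```

With only six known defects, Project C's figure would swing widely if a single additional defect surfaced, while the pooled figure barely moves.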
More on test effectiveness will be in a future post. Email me to learn more about my related Proactive Testing™ measurement, management, execution, and improvement training and direct advisory assistance.
“It isn’t what we don’t know that gives us trouble, it’s what we know that ain’t so.”
– Will Rogers
Welcome to my Unconventional Wisdom blog. Much of conventional wisdom is valid and usefully time-saving. However, too much instead is mistaken and misleadingly blindly accepted as truth, what I call “conventional we’s dumb”. Each month, I’ll share some alternative possibly unconventional ideas and perspectives I hope you’ll find wise and helpful.
Robin F. Goldsmith, JD helps organizations get the right results right by advising and training business and systems professionals on risk-based Proactive Software Quality Assurance™ and Proactive Testing™, REAL requirements, REAL ROI™, metrics, outsourcing, project and process management. He is author of the book, Discovering REAL Business Requirements for Software Project Success, and the forthcoming book, Cut Creep: Write Right Agile Story and Acceptance Test Requirements Right. Email him at [email protected]