Test Samples, Not Test Cases – Using Sampling in Testing

I have worked with statistical methods in testing since before the millennium, when I used them to determine the quality of six-digit date updates (or, in fact, ever since I studied atomic science for a while). They have really helped me judge whether something is good enough.

  • Testing traditionally takes a lot of time –
    But we do not have enough time, money or people!
  • Testing with statistical methods is quicker –
    Sampling is extremely powerful and saves a lot of time
  • Test results become objective and quantified
  • We can test less and still learn more than usual –
    And we can deliver results much faster…

Fewer Test Cases

Did you know that we only need about 50 test samples – not even test cases – in the largest possible system – to be able to tell, at a 99 % confidence level, if it is good enough (to at least 99 % extent, to be precise)?
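As a rough cross-check, the sample size behind a claim of this kind can be worked out with the simplest zero-defect model: draw items at random, find no faults, and ask how many items that takes. The sketch below assumes independent random draws and a binomial model; the function name is made up for this illustration and is not taken from any standard. Note that the result depends only on the quality level and the confidence wanted, not on the size of the system.

```python
import math


def zero_failure_sample_size(confidence: float, quality: float) -> int:
    """Smallest number of randomly drawn, defect-free samples needed to claim,
    at the given confidence, that at least `quality` of all items are good.

    Assumption: independent draws, binomial model, zero defects found.
    We need quality**n <= 1 - confidence, i.e. n >= ln(1 - C) / ln(q).
    """
    if not (0.0 < confidence < 1.0 and 0.0 < quality < 1.0):
        raise ValueError("confidence and quality must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(quality))


for conf, qual in [(0.95, 0.95), (0.99, 0.99)]:
    n = zero_failure_sample_size(conf, qual)
    print(f"{conf:.0%} confidence in {qual:.0%} quality: {n} clean samples")
```

Under this particular model, 99 % confidence in 99 % quality takes 459 defect-free samples and 95 %/95 % takes 59, so a figure of about 50 sits closer to the latter. Plans read from an AQL table rest on different assumptions and give other numbers.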

Thus, in spite of the general view that we can’t work faster in testing, there are alternatives to full-scale, traditional tests. My suggestion is to use sampling, at least when we expect:

  1. Relatively large amounts of something to examine
  2. Whatever we are examining to be of a similar form
  3. A clear verdict on whether each item is faulty or not
  4. The examination to be a control, not a debugging session
  5. Reasonable confidence, not absolute certainty!

 

This means that sampling can be quite efficient in testing, where we often have large numbers of test cases to run and verify, as well as many items to write test cases for.
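The sample itself has to be drawn at random, otherwise the figures mean little. A minimal sketch of such a draw follows, assuming the test cases can be listed by ID; the ID format, the suite size and the seed are hypothetical.

```python
import random


def draw_test_sample(test_case_ids, sample_size, seed=None):
    """Draw a simple random sample of test cases, without replacement.

    A fixed seed makes the draw repeatable, so the selection can be audited.
    """
    population = list(test_case_ids)
    if sample_size > len(population):
        raise ValueError("sample_size exceeds the number of test cases")
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical suite of 10,000 test case IDs
all_cases = [f"TC-{i:05d}" for i in range(1, 10001)]
sample = draw_test_sample(all_cases, 50, seed=2024)
print(len(sample), "test cases selected, first five:", sample[:5])
```

Every test case has the same chance of being picked here, which is what the simple statistical models assume.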

It can even be an alternative to test automation, e.g. in agile development! Or a way to form an opinion very quickly about a huge requirements specification, or …

Also, it helps when we want to express ourselves in a way that makes the Project Leader happy: we don’t talk about the number of level 2 bugs found, we say that we are at least 99.8 % sure that this system works well enough (and if he wants to know what ‘well enough’ is, it is to an extent of 99.8 %, because that is comparable to the SLA for the servers, etc.). Thus, we take a step up in the value chain, delivering news in a way that the steering committee will understand…
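One way to turn a sample result into a statement of that kind is a one-sided exact (Clopper-Pearson) upper bound on the defect rate. The sketch below finds it by bisection; it assumes a random sample and a binomial model, and all function names are hypothetical. It is meant for small defect counts, which is the normal situation here.

```python
import math


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Intended for small k."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def defect_rate_upper_bound(n, defects, confidence):
    """One-sided exact upper bound on the true defect rate, given `defects`
    faulty items found in a random sample of n items."""
    if not 0 <= defects <= n or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 <= defects <= n and 0 < confidence < 1")
    if defects == n:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # the CDF falls as p rises, so bisection converges
        mid = (lo + hi) / 2
        if binomial_cdf(defects, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def quality_statement(n, defects, confidence):
    """Phrase the sample result the way a steering committee would read it."""
    bound = defect_rate_upper_bound(n, defects, confidence)
    return (f"We are {confidence:.1%} sure that at least "
            f"{1 - bound:.1%} of the items are good.")


print(quality_statement(200, 0, 0.99))
print(quality_statement(200, 1, 0.99))
```

With zero defects the bound reduces to 1 - (1 - confidence)^(1/n); a single defect in the sample loosens it noticeably, which is why the defect count belongs in the report alongside the percentages.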

This is not something new, either; it has been used for decades in the hardware industry. There are standards (ISO 2859, ANSI/ASQ Z1.4, MIL-STD-105E (for free!)) – but nowadays we can get the calculations done online:

http://www.sqconline.com/military-standard-105e-tables-sampling-attributes

 

So, what is the gain?

Typical test usage of sampling:

  • In an extreme case, with a sample of about 200 from a million changes in which we don’t find a single error, the effort shrinks by a factor of 5,000 (1,000,000 / 200), so that testing becomes almost negligible in comparison with other work.
  • In a normal case, with subsystems, deeper testing in some places, etc.: at least a factor of 3 and up to a factor of 10, counting total effort, in my own experience.

Governing factors:

  • What level of quality we want to reach
    (may vary between e.g. different parts of a system)
  • How certain we want to be that the level is reached
    (may also vary between e.g. different system parts)
  • As usual, how much we can/want to plan in advance,
    but we may also use the methods in order to check achieved levels later…
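The last point, checking an achieved level afterwards, can be sketched for the zero-defect case: given how many randomly drawn items passed without a fault, how confident can we be that a wanted quality level is met? The model is the same simple binomial one as before (an assumption), and the system parts and their numbers are made up for the illustration.

```python
def achieved_confidence(clean_samples: int, quality: float) -> float:
    """Confidence that at least `quality` of all items are good, after
    `clean_samples` randomly drawn items all passed (binomial model)."""
    if clean_samples < 0 or not 0.0 < quality < 1.0:
        raise ValueError("need clean_samples >= 0 and 0 < quality < 1")
    return 1.0 - quality ** clean_samples


# Hypothetical system parts: (clean samples examined, quality level wanted)
parts = {
    "billing": (459, 0.99),
    "reporting": (200, 0.99),
    "help pages": (45, 0.95),
}
for name, (n, q) in parts.items():
    print(f"{name}: {achieved_confidence(n, q):.1%} confident of {q:.0%} quality")
```

The quality level and the confidence pull on the sample size together: relaxing either one for a less critical part of the system shortens the work there.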

 

Why don’t we always work like this, then?

  • The level of ambition has to be clear in all parts
    (i.e. the Acceptable Quality Level, AQL).
  • That level can be set too high or too low, due to uncertainty or greed. Working with too few samples might give the wrong impression.
  • A system or other entity might consist of many different, non-comparable units.
  • We may be unlucky – or misunderstand something – and get a misleading sample, for example by sampling among test cases that do not cover the complete functionality.
  • Tests cannot be used to debug lousy (sic!) construction work. Many organizations still believe that tests are meant to do just that…
  • The method may not be accepted by the testers – or by other important stakeholders.
  • Insufficient knowledge of statistics leads to sampling techniques being used in the wrong way.

However, used in the right way, I consider this a very good complement to other techniques, not least because it gives us hard figures to point at in a very cost-efficient way. We know where we stand, and whether we need to dig deeper.

“It is not enough to do your best; you must know what to do, and then do your best.”

— William Edwards Deming

About the Author

Fredrik

Worked with Test and Quality Assurance for over 25 years, as employee and consultant, primarily as a TL or QA in large projects. Favorite area: Statistical methodology.