When can we say that a software is tested "enough"?

The principles and technique to help understand when test coverage can be considered as "enough".

Dec 6, 2023

You’re building a very important feature for your customers. You know that a defect found by your customer in your software will hurt their trust in your product so you wanted to be very careful about this, but how do you know that you’ve tested enough?

This post aims to answer that question, especially for functional black-box testing.

The seven testing principles

First of all, let’s take a look at the seven testing principles that are introduced by the International Software Testing Qualification Board (ISTQB). According to ISTQB, the principles “offer general guidelines common for all testing.”

This will be our foundation in understanding what is considered enough coverage for testing:

Principle	Meaning
Testing is context dependent	Testing is done differently in different contexts, safety-critical software will be tested differently from e-commerce, so keep in mind of the context when testing.
Exhaustive testing is impossible	It is not feasible to test every single possible cases except for trivial cases, so you need to use risk analysis, test techniques, and priorities should be used to focus test efforts.
Defects cluster together	A small number of modules usually contains most of the defects, predicting it is an important input into a risk analysis used to focus the test effort.
Testing shows the presence of defects, not their absence	Because it is costly to test everything, testing is a mechanism to show the presence of defect and can NOT prove absence of defect nor that the software is correct.
Early testing saves time and money	It is better to detect defect early as it helps reduce or eliminate costly changes.
Beware of the pesticide paradox	Repeating the same set of tests will not uncover new defects so test data may need changing and new tests may need to be added.
Absence-of-errors is a fallacy	Despite all the test, a defect-free software is still an unusable software if it does NOT follow the user's requirements.

For this post, I intentionally changed the sequence and rephrased the meaning of the principles so that it is easier to understand them all cohesively—so try reading only the meaning from top to bottom.

If you’d like to, you can see them yourself from the original document from ISTQB.

Now you can see that we can surmise the criteria of what is “enough coverage” as below:

There is no other meaningful test that can be run (redundancy-free), judged by doing risk assessment and testing techniques
The software can be considered as usable by the user
Tests are made with the software’s context in mind:
- the environment (such as viewport, OS, hardware, etc.)
- the user (such as tech-saviness, while multi-tasking, low internet connection, etc.)

The testing techniques

Testing techniques are tools to help us cover enough testing ground. These techniques are generally made to reduce redundancy in tests and focus on areas where the defects are most likely to happen.

Note that the techniques listed below are by no means exhaustive, but only a handful that I have learned and understand so far. I will add more over time.

Just like all other tools, no one tool is right and each shall be applied—solely or in combination with another—accordingly.

Equivalence Partitioning

This technique splits data into multiple valid and invalid partitions that are expected to be processed in the same way and makes a test case from the value in each of the partitions. It ensures that each condition is tested at least once.

Valid partitions should contain values that are accepted by the system, while invalid partitions are values rejected by the system.

The technique considers 100% coverage when all partitions are tested by at least one value from each partition.

The technique works better for data with discrete values. Check the Boundary Value Analysis technique for work with sequential or numerical range data.

INTERESTING THOUGHT

In the fairy tale Goldilocks and the Three Bears, Goldilocks uses similar technique to this when trying the three bear’s properties:

Papa bear’s Son bear’s Mama bear’s

invalid valid invalid

Porridge too hot just right too cold

Chair too hard just right too delicate

Bed too hard just right too delicate

	Papa bear’s	Son bear’s	Mama bear’s
	invalid	valid	invalid
Porridge	too hot	just right	too cold
Chair	too hard	just right	too delicate
Bed	too hard	just right	too delicate

Example

Customer will get:

5% discount for $1–10 purchase

10% discount for $11–100 purchase

15% discount for above

Minimum purchase is $1 and maximum is $500

Instead of testing sporadically and without knowing if the condition is tested enough …

$6 should get a 5% discount
$10 should get a 5% discount
$13 should get a 10% discount
$34 should get a 10% discount
$88 should get a 10% discount
$120 should get a 15% discount
$400 should get a 15% discount

… the technique allows clarity in defining “enough” by demanding exactly 5 tests to ensure all conditions are tested:

$0.5 should be invalid
$6 should get a 5% discount
$34 should get a 10% discount
$400 should get a 15% discount
$700 should be invalid

Step by step

Let’s use the sample above:

Valid input is only letters and not symbols nor numbers

Turn the conditional statement into valid and invalid partitions.

invalid valid invalid

type symbols letters numbers

range ~!@#?/… A to z 0 to 9
Take a value from each partition as a test case
- ’$’ should result in invalid
- ‘F’ should get 5% discount
- ‘4’ should result in invalid

	invalid	valid	invalid
type	symbols	letters	numbers
range	~!@#?/…	A to z	0 to 9

Boundary Value Analysis

Most often, the defect of a numeric range or sequential data can be found on its edges rather than the middle value. This technique is an extension of the Equivalence Partitioning technique that focuses on those edges to make sure the boundaries are set correctly as values on the boundaries are more prone to defect than ones that are within.

In this context, each partition will have valid values and invalid values.

The technique considers 100% coverage if all valid and invalid values for all partitions are tested at least once.

Example

Customer will get:

5% discount for $1–10 purchase

10% discount for $11–100 purchase

15% discount for above

Which resulted in these 6 test cases:

$0 should be no discount
$1 should get a 5% discount
$10 should get a 5% discount
$11 should get a 10% discount
$100 should get a 10% discount
$101 should get a 15% discount

Step by step

Let’s use the sample above:

Valid input is only letters and not symbols nor numbers

Turn the conditional statement into valid and invalid partitions.

discount 0% 5% 10% 15%

range < $1 < $11 < 101 > $101
Take valid and invalid values on the edge of each partition
- partition 0%:
  - invalid: -$0.01
  - valid: $0
  - valid: $0.99
  - invalid: $1
- partition 5%:
  - invalid: $0.99
  - valid: $1
  - valid: $10.99
  - invalid: $11
- partition 10%:
  - invalid: $10.99
  - valid: $11
  - valid: $100.99
  - invalid: $101
- partition 15%:
  - invalid: $100.99
  - valid: $101
Combine and remove the redundant tests
- -$0.01 should be rejected
- $0 should get no discount
- $0.99 should get no discount
- $1 should get a 5% discount
- $10.99 should get a 5% discount
- $11 should get a 10% discount
- $100.99 should get a 10% discount
- $101 should get a 15% discount

discount	0%	5%	10%	15%
range	< $1	< $11	< 101	> $101

Decision Table

Also called the “Cause Effect Table” because it maps a system’s effects and its causes. Laying down each variable and its states (boolean or discrete) alongside the effect it causes.

This is best used to record and test complex business rules that a system must implement.

The technique considers 100% coverage when all the decision rule is covered. Note that the strength of this technique lies in making sure no decisions are left untested, so this may a huge number of cases need to be created and run.

Example

Registration form that asked for email, and password.

Test case	1	2	3	4	5
Conditions
Email	Y	Y	N	Y	N
Password	Y	Y	Y	N	N
Effects
Success message	Y	Y	N	N	N
Error message	N	N	Y	Y	Y

Step by step

Let’s use this as an example:

Test a registration form that:

requires a valid email

requires a valid password which is 8 or more charactes and is a combination of numbers and letters

optional opt-in to newsletter communication via email

List down the conditions and possible effects that may happen—make sure to list down the effect rather than putting it in the table to make it easier to comprehend

Test case
Conditions
Email
Password (8 char with numbers & letters)
Password (8 char, only numbers)
Password (8 char, only letters)
Password (7 char with numbers & letters)
Newsletter opt-in
Effects
Success message
Error message
Send welcoming newsletter

Jot down the different conditions that are met as well as their appropriate effects; you may use:

‘Y’ or ‘N’ to represent ‘Yes’ or ‘No’,
‘T’ or ‘F’ to represent ‘True’ or ‘False’,
‘1’ or ‘0’ to represent boolean,
the discrete value (such as “red”, “green”, “XXL”, etc.) if that is the data type requested, or
’-‘ if not relevant

Test case	1	2	3	4	5	6	7	8
Conditions
Email	Y	Y	N	Y	Y	Y	Y	N
Password (8 char with numbers & letters)	Y	Y	Y	N	-	-	-	N
Password (8 char, only numbers)	-	-	-	-	Y	-	-	-
Password (8 char, only letters)	-	-	-	-	-	Y	-	-
Password (7 char with numbers & letters)	-	-	-	-	-	-	Y	-
Newsletter opt-in	Y	N	-	-	-	-	-	-
Effects
Success message
Error message
Send welcoming newsletter

Jot down the respective effects given the conditions you’ve listed; which can be expressed by:

Use ‘X’, ‘Y’, ‘T’, or ‘1 to mark that an action should occur, and
Blank (preferable for ease of reading), ‘N’, ‘F’, or ‘0’ to indicate the action will not occur.
It is also okay to put a specific message shown in or beside the table if relevant.

Test case	1	2	3	4	5	6	7	8
Conditions
Email	Y	Y	N	Y	Y	Y	Y	N
Password (8 char with numbers & letters)	Y	Y	Y	N	-	-	-	N
Password (8 char, only numbers)	-	-	-	-	Y	-	-	-
Password (8 char, only letters)	-	-	-	-	-	Y	-	-
Password (7 char with numbers & letters)	-	-	-	-	-	-	Y	-
Newsletter opt-in	Y	N	-	-	-	-	-	-
Effects
Success message	X	X
Error message			X-1	X-2	X-2	X-2	X-2	X-2
Send welcoming newsletter	X	X

Notes

Success message: “Your account has been successfully created!”
Error messages:
1. “Email is required.”
2. “Password must contain 8 or more characters and is a combination of number and letter.”

Pairwise

The technique is based on the observation that most faults are caused by interactions of at most two factors. It significantly reduces the amount of test needed simply by making sure a pair of each test are only tested once whenever possible.

This is best applied to settings or configurational features.

The technique considers 100% coverage when all variable value pairs are represented in a test at least once.

Example

Format method: quick, slow
File system:   FAT, FAT32, NTFS
Compression:   on, off

Instead of doing all 12 combinations to test …

quick + FAT + on
quick + FAT + off
quick + FAT32 + on
quick + FAT32 + off
quick + NTFS + on
quick + NTFS + off
slow + FAT + on
slow + FAT + off
slow + FAT32 + on
slow + FAT32 + off
slow + NTFS + on
slow + NTFS + off

… the technique allows us to only do 6 tests with a high degree of confidence since all variables’ values are paired at least once with another variable’s value. That is a 50% decrease.

quick + FAT + on
quick + FAT32 + off
slow + FAT + off
slow + FAT32 + on
quick + NTFS + on
slow + NTFS + off

A bigger set of combinations will result in a bigger efficiency. For example, the combination below has 4,704 combinations, only 62 are needed when using the Pairwise technique. That is a whopping 98.6% decrease!

Type:          Primary, Logical, Single, Span, Stripe, Mirror, RAID-5
Size:          10, 100, 500, 1000, 5000, 10000, 40000
Format method: quick, slow
File system:   FAT, FAT32, NTFS
Cluster size:  512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
Compression:   on, off

Primary + 10 + slow + FAT32 + 16384 + on
Primary + 100 + quick + FAT + 2048 + off
Primary + 100 + quick + NTFS + 512 + off
Primary + 500 + quick + FAT + 65536 + off
Primary + 500 + slow + FAT32 + 4096 + off
Primary + 1000 + slow + FAT + 1024 + on
Primary + 5000 + quick + NTFS + 1024 + off
Primary + 10000 + slow + FAT + 32768 + on
Primary + 40000 + slow + FAT + 8192 + on
Primary + 40000 + quick + NTFS + 32768 + on
Logical + 500 + quick + NTFS + 8192 + off
Logical + 10000 + slow + NTFS + 512 + off
Logical + 40000 + quick + FAT32 + 16384 + off
Logical + 10 + quick + FAT + 2048 + on
Logical + 100 + quick + NTFS + 32768 + off
Logical + 1000 + slow + FAT + 4096 + off
Logical + 10000 + slow + NTFS + 1024 + off
Logical + 5000 + slow + NTFS + 65536 + off
Logical + 10000 + quick + FAT + 4096 + off
Single + 10 + quick + FAT + 512 + off
Single + 10000 + quick + FAT32 + 65536 + off
Single + 100 + quick + FAT + 8192 + on
Single + 5000 + quick + NTFS + 16384 + off
Single + 10 + quick + NTFS + 4096 + on
Single + 1000 + slow + FAT32 + 2048 + off
Single + 40000 + quick + FAT + 1024 + off
Single + 500 + quick + FAT32 + 512 + off
Single + 5000 + quick + FAT + 32768 + on
Span + 5000 + slow + FAT32 + 2048 + on
Span + 10000 + slow + FAT + 16384 + on
Span + 1000 + quick + NTFS + 512 + on
Span + 10 + slow + FAT32 + 32768 + off
Span + 500 + slow + FAT + 1024 + off
Span + 40000 + quick + FAT + 4096 + on
Span + 100 + quick + FAT + 65536 + on
Span + 1000 + quick + NTFS + 65536 + off
Span + 5000 + slow + FAT + 8192 + off
Stripe + 100 + quick + FAT32 + 1024 + on
Stripe + 5000 + slow + NTFS + 32768 + on
Stripe + 5000 + quick + FAT + 4096 + off
Stripe + 500 + quick + NTFS + 16384 + on
Stripe + 10000 + quick + FAT32 + 512 + off
Stripe + 40000 + quick + FAT32 + 65536 + on
Stripe + 1000 + quick + FAT + 8192 + off
Stripe + 10 + quick + NTFS + 2048 + off
Mirror + 1000 + quick + FAT + 32768 + off
Mirror + 100 + slow + NTFS + 65536 + on
Mirror + 10 + quick + FAT32 + 8192 + off
Mirror + 500 + quick + FAT32 + 2048 + on
Mirror + 5000 + quick + FAT + 512 + on
Mirror + 10000 + slow + FAT + 2048 + off
Mirror + 40000 + slow + FAT32 + 4096 + off
Mirror + 10 + slow + FAT32 + 1024 + off
Mirror + 100 + slow + FAT32 + 16384 + off
RAID-5 + 500 + slow + FAT + 32768 + on
RAID-5 + 100 + slow + FAT32 + 4096 + off
RAID-5 + 40000 + quick + NTFS + 2048 + on
RAID-5 + 5000 + quick + NTFS + 1024 + on
RAID-5 + 10000 + slow + NTFS + 8192 + on
RAID-5 + 1000 + quick + FAT + 16384 + on
RAID-5 + 10 + slow + FAT32 + 65536 + off
RAID-5 + 40000 + quick + FAT32 + 512 + on

Step by step

The simplest way is, of course, to use a generator. One that I found incredibly easy to use is the Pairwise Independent Combinatorial Testing (PICT) tool which is a FOSS made by Microsoft.

But if you’d like to do it manually, I’ll guide you on the step by step on how to do it using this as a context:

We need to test our system works well on any of the following configuration:

Format method: quick, slow
File system:   FAT, FAT32, NTFS
Compression:   on, off

You make a pair for each variable, making sure all pair combinations are made.
- quick + FAT
- quick + FAT32
- quick + NTFS
- quick + on
- quick + off
- slow + FAT
- slow + FAT32
- slow + NTFS
- slow + on
- slow + off
- FAT + on
- FAT + off
- FAT32 + on
- FAT32 + off
- NTFS + on
- NTFS + off
Arrange a combination of all 3 sequences using all available pairs
- quick + FAT + on –> quick + FAT & FAT + on & quick + on
- quick + FAT32 + off —> quick + FAT32 & FAT32 + off & quick + off
- slow + FAT + off –> slow + FAT & FAT + off & slow + off
- slow + FAT32 + on –> slow + FAT32 & FAT32 + on & slow + on
- quick + NTFS + on –> quick + NTFS & NTFS + on & redundancy of quick + on
- slow + NTFS + off –> slow + NTFS & NTFS + off & redundancy of slow + off
As you can see, the last two on the list have a repeat combination of quick + on and slow + off, but since this is inevitable, this is a necessary redundancy.

Concluding thoughts

Here are the key takeaways from the above.

No software can be defect-free as it is not viable, so tests should be done just enough to make you confident that it is stable and reliable enough before releasing it to the user
There are testing techniques that help you get an idea of how much can be considered just enough such as:
- Equivalence partitioning, ensures all valid and invalid conditions are tested.
- Boundary Value Analysis, ensure boundaries between value ranges are properly established.
- Decision Table, ensure no decision rules are left untested.
- Pairwise, for combinatorial values.

An additional point to make from my reflection is that, the technique helps us understand what is “enough”, but as the principle said, once it is more stable, it is wise to add more tests to uncover more defects—keeping in mind that a usable software is more important than a fully defect-free software.

A process flows from left to right: test enough, useful, test more to improve

Relevant references

The seven testing principles and some of the testing techniques can be found on ISTQB’s Foundation Level (CTFL) Syllabus
Well-written description of the seven testing principles by Box UK
Pairwise Independent Combinatorial Testing (PICT) tool, a FOSS tool to generate Pairwise tests by Microsoft:
- its official site, and
- its documentation
Good explanation video by Software Testing Mentor on YouTube:
- Equivalence partitioning
- Boundary Value Analysis

End-notes

This post is dedicated to Yustika and Kocoy, the QEs from my previous startup, Farmacare.

Fri Rasyidi

Table of contents

When can we say that a software is tested "enough"?

The seven testing principles

The testing techniques

Equivalence Partitioning

INTERESTING THOUGHT

Example

Step by step

Boundary Value Analysis

Example

Step by step

Decision Table

Example

Step by step

Pairwise

Example

Step by step

Concluding thoughts

Relevant references

End-notes