Testing is pretty uncontroversial these days: most people involved in software engineering will at least pay lip service to the idea that code should always be tested.
Although all tests are equal, some are more equal than others. Over the years I’ve come to think that the best tests share the same overarching design philosophy. Working in a well-tested codebase is truly an eye-opening experience; it is radically different from working in the (more common) “hacked together with duct tape and prayers” kind of project.
The principles I write about here are heavily inspired by Dave Cheney’s Practical Go presentation, so I definitely recommend giving that a read too if you’re interested in writing code people will enjoy working on.
I’ll illustrate these principles with a little tutorial on writing good Go tests.
Unit tests must only care about interfaces the function uses
Let’s imagine an example, a Brewer:
type Brewer struct {
    human   Human
    machine Machine
}

func (b *Brewer) Brew() error {
    cup, err := b.human.GrabCup()
    if err != nil {
        return err
    }
    return b.machine.DoMagic(cup)
}
If you want to unit test the function Brew(), your test should not care about the implementation of GrabCup or DoMagic; you only want to make sure that you handle the errors returned by those functions.
This is because you can’t rely on code the function can’t see: if Brew() depends on the implementation of e.g. GrabCup to perform correctly, someone might unwittingly change it and break your function.
If you do depend on some state in another function, this needs to be made clear in its signature (e.g. return the additional variable you depend on so you can react correctly).
Define your dependencies and enforce them in code
It’s pretty clear that our brewer needs a human and a machine to operate properly, so we need to make sure that a brewer can never exist without them.
Luckily there’s a pretty straightforward way to enforce that: use the constructor pattern.
func NewBrewer(human Human, machine Machine) (*Brewer, error) {
    if !human.IsBarista() {
        return nil, errHumanMustBeBarista
    }
    return &Brewer{
        human:   human,
        machine: machine,
    }, nil
}
From now on every time someone wants to initialize a new Brewer, they will use this function, ensuring that we never perform an operation with an invalid brewer.
We also add a bit of validation, making sure we never initialize a brewer with a human who is not a barista. You wouldn’t want subpar coffee!
Interface your dependencies
As stated earlier, you do not want your function to depend on hidden state elsewhere in the codebase. This is where interfaces come in.
You might be familiar with the concept from another language: interfaces define a set of functions which a struct must implement in order to satisfy the interface type definition. For example, we don’t really care which human the brewer uses to produce coffee.
It could be Jessie, Bob, or Gandalf; as long as they can tell us whether they are a barista and they can grab a cup, they can be used by the brewer. Likewise, the machine doesn’t matter much (for most people): as long as it can perform its magic on the given cup and report whether an error occurred, it can be used to brew.
type Human interface {
    GrabCup() (Cup, error)
    IsBarista() bool
}

type Machine interface {
    DoMagic(Cup) error
}
Mock your dependencies
There’s one last thing we should do before we can unit test our brewer: we need to mock our dependencies.
Since we’ve abstracted away the Human and Machine implementations in our function, we can now create a struct which satisfies those interfaces without any specific internal logic.
For example, let’s create a mock human:
type mockHuman struct {
    GrabCupGiveCup  Cup
    GrabCupError    error
    IsBaristaAnswer bool
}

func (m *mockHuman) GrabCup() (Cup, error) {
    return m.GrabCupGiveCup, m.GrabCupError
}

func (m *mockHuman) IsBarista() bool {
    return m.IsBaristaAnswer
}
The GrabCup() and IsBarista() functions must not contain any logic; they simply return the values we give them when initializing a test.
Now use your mocks in your unit test
The only thing left to do is to use our mock in a test!
func Test_Brew(t *testing.T) {
    mockError := errors.New("cup handle broke")
    wantErr := mockError
    mockHuman := &mockHuman{
        GrabCupError:    mockError,
        IsBaristaAnswer: true,
    }
    mockMachine := &mockMachine{}
    brewer, err := NewBrewer(mockHuman, mockMachine)
    if err != nil {
        t.Fatalf("NewBrewer: %v", err)
    }
    err = brewer.Brew()
    if !errors.Is(err, wantErr) {
        t.Errorf("got %v, want %v", err, wantErr)
    }
}
This simple test does not do much: it makes sure that if our human fails to grab a cup, calling Brew() returns the error we expect.
We want to test that our logic performs correctly under a range of scenarios. What if the human manages to grab the cup but the machine fails to brew it? Do we correctly return an error to the person trying to Brew()?
In order to test a range of scenarios we’re going to need table tests. This pattern allows us to run the same test code multiple times (in parallel) with just a few tweaks, and check that we get the output we expect each time.
func Test_Brew(t *testing.T) {
    t.Parallel()
    mockError := errors.New("oops")
    tests := []struct {
        name             string
        coffeeMachineErr error
        humanErr         error
        humanGiveCup     Cup
        wantMagicedCup   Cup
        wantErr          error
    }{
        {
            name:             "ok",
            coffeeMachineErr: nil,
            humanErr:         nil,
            humanGiveCup:     NewCup(),
            wantMagicedCup:   NewCup(),
            wantErr:          nil,
        },
        {
            name:             "coffee machine broke",
            coffeeMachineErr: mockError,
            humanErr:         nil,
            humanGiveCup:     NewCup(),
            wantMagicedCup:   NewCup(),
            wantErr:          mockError,
        },
        {
            name:             "human broke",
            coffeeMachineErr: nil,
            humanErr:         mockError,
            humanGiveCup:     NewCup(),
            // the machine is never called when the human fails,
            // so it should not have received a cup
            wantMagicedCup: Cup{},
            wantErr:        mockError,
        },
    }
    for _, tt := range tests {
        tt := tt
        t.Run(tt.name, func(t *testing.T) {
            t.Parallel()
            // initializing our mocked dependencies
            mockHuman := &mockHuman{
                GrabCupGiveCup:  tt.humanGiveCup,
                GrabCupError:    tt.humanErr,
                IsBaristaAnswer: true,
            }
            mockMachine := &mockMachine{
                DoMagicGiveErr: tt.coffeeMachineErr,
            }
            brewer, err := NewBrewer(mockHuman, mockMachine)
            if err != nil {
                t.Fatalf("NewBrewer: %v", err)
            }
            // running our test
            err = brewer.Brew()
            // checking that we received what we expected
            if !errors.Is(err, tt.wantErr) {
                t.Errorf("got %v, want %v", err, tt.wantErr)
            }
            if !cmp.Equal(mockMachine.DoMagicGotCup, tt.wantMagicedCup) {
                t.Errorf("mockMachine.DoMagicGotCup and tt.wantMagicedCup didn't match: %s", cmp.Diff(mockMachine.DoMagicGotCup, tt.wantMagicedCup))
            }
        })
    }
}
In this unit test we check that we return the correct information when any of our dependencies fails. Adding a test case is very cheap: just add a value to the test slice. You only need to define what our mocks will return and what you expect Brew() to yield. Once this is done, run go test -run Test_Brew to see if your function behaves as expected.
What if the internals of our mocked dependencies matter?
This unit test will not pick up cases where the functions called by Brew() depend on each other’s state. Imagine if we change the logic to:
func (b *Brewer) Brew() error {
    err := b.human.WalkToTable()
    if err != nil {
        return err
    }
    cup, err := b.human.GrabCup()
    if err != nil {
        return err
    }
    err = b.human.WalkToTable()
    if err != nil {
        return err
    }
    return b.machine.DoMagic(cup)
}
Here, in order for GrabCup() or DoMagic(cup) to succeed, the caller needs to first call WalkToTable(). We can imagine bugs arising from this; for example, we might tell our human to walk to the wrong table. Our unit tests would not pick up those bugs, since we abstracted our dependencies away and only check our own logic.
That is an example of a broken abstraction, and this code must be refactored. If GrabCup() and DoMagic(cup) only succeed when the human is in front of the right table, then this must be part of their own logic. The caller must not be expected to have prepared dependencies which are not defined in the function’s signature. Such broken abstractions invariably lead to bugs, since they add needless complexity which someone will eventually trip over.
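A sketch of the repaired version: the Human implementation handles its own walking inside GrabCup(), so callers can’t get the ordering wrong. (Barista and its fields are hypothetical names, not from the code above, and Cup is again a stand-in struct.)

```go
package main

import "fmt"

// Cup is a stand-in for the article's type.
type Cup struct{ ID int }

// Barista is a hypothetical Human implementation that tracks
// its own position.
type Barista struct {
	atCupTable bool
}

// GrabCup prepares its own preconditions: it walks to the cup table
// itself instead of expecting the caller to have done so.
func (b *Barista) GrabCup() (Cup, error) {
	if !b.atCupTable {
		if err := b.walkToCupTable(); err != nil {
			return Cup{}, err
		}
	}
	return Cup{ID: 1}, nil
}

func (b *Barista) walkToCupTable() error {
	b.atCupTable = true
	return nil
}

func (b *Barista) IsBarista() bool { return true }

func main() {
	b := &Barista{}
	// No WalkToTable call needed before grabbing; the abstraction holds.
	cup, err := b.GrabCup()
	fmt.Println(cup, err)
}
```

With this shape, Brew() can go back to calling GrabCup() and DoMagic() without any hidden ordering requirements.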
Flaky tests
One of the most painful situations to be in is dealing with a suite of flaky tests (tests which sometimes fail, but not always). This happens when some of your tests depend on the same data and either concurrently modify it or fail to reset it to a “clean” state after a run. Flaky tests can come from race conditions, where two threads running in parallel access and mutate a shared resource at the same time. However, in my experience they often arise from a simpler scenario: two tests depend on the same database and fail to “clean up” the data they modified after they run.
There’s a simple way to ensure that you don’t write flaky tests: never rely on data you don’t fully control in your tests. In the world of ephemeral infrastructure there is no need to rely on a “cleanup” function to reset your database after a test. Your CI pipeline will spin up a virtual machine and discard it once the tests are done running, so simply creating a test table in your database at the beginning of your test is enough. The same should be true of local testing: if you need to spin up an instance of PostgreSQL to run integration tests locally, you should use the same easily disposable instances as your CI.
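For instance, a throwaway PostgreSQL instance for local integration tests might be spun up like this (container name, image tag, and password are illustrative):

```shell
# Start a disposable PostgreSQL container; --rm discards the
# container and its data as soon as it stops.
docker run --rm -d --name brew-test-db \
  -e POSTGRES_PASSWORD=test \
  -p 5432:5432 \
  postgres:16

# ...run your integration tests against localhost:5432...

# Tear it down; nothing to clean up afterwards.
docker stop brew-test-db
```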
Of course race conditions can slip through, but when you see them occurring in your CI pipeline you should fix them ASAP. This is because a flaky pipeline gradually devalues the whole test suite, since no one knows how many successful runs mean things are OK. Genuine race conditions should occur very rarely. There are ways to check for race conditions in your code before it gets merged, which I might go through in another post.
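As a starting point, Go ships with a built-in race detector; running the suite with it enabled is often enough to surface these bugs before merge:

```shell
# Run all tests with the race detector enabled; it reports
# conflicting concurrent accesses at runtime.
go test -race ./...
```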
Integration tests come after unit tests
I think a codebase always needs to be covered by unit tests first, before writing integration tests. I’ve seldom seen this idea shared in the wild, since it seems like most people value integration tests more highly than simple unit tests. However, there is no point testing the “system” behavior of components whose individual behaviors are not well defined and tested.
If the design principles outlined so far are followed, integration tests are a “nice to have” and should not pick up major bugs. This is because the contracts between components are made clear via unit tests, and each function correctly encapsulates only the state it cares about.
Some closing thoughts
Tests are not only for correctness; they force you to define exactly how the functions you depend on should behave.
They are a form of documentation: reading a test can tell you a lot about a function’s intended use.
They also make your own code more robust, as you have to think about the ways in which your function will break.
They give confidence to new contributors, who will be less afraid to change a robust system.
I’ve had the pleasure of working with some genuinely well-written codebases which followed these principles, so I hope I’ve at least managed to convince you of a few of those points. The development experience in a codebase with proper abstractions and unit tests is wildly different from one without them.