This is a story, in ten posts, about the evolution of test automation at a fairly large company in the mobile industry.
1) Flaky tests are bad, but evergreens are worse
Many years ago, at the very advent of Android, we found ourselves in a rather exciting situation. The playing field was largely unknown, and we were figuring things out as we went along.
Not only was the Android platform vastly more open and accessible than the old feature-phone platforms, there were also big changes in the development tools: Git, Gerrit, Hudson (later forked as Jenkins) and the whole open-source idea.
One famous quote from those early days boldly stated:
‘We don’t need testing anymore, as no developer will want to risk his or her rep by committing bad code to an open source project’
Even though we may smile at that statement today, I do not relay it to shame anyone, but to illustrate how new this world was to all of us.
Anyway, the chosen development model was, not surprisingly, a decentralized one, with a lot of freedom and responsibility distributed to small agile teams. Most teams responded positively and, supported by a very dedicated and strategically focused tools department, swiftly moved towards continuous integration.
This was so successful that the tools department soon set up screens in the common areas to display the latest aggregated build results. Obviously, it was just a matter of time before last night’s JUnit test results were also aggregated and displayed.
And this is where strange things started to happen.
The screens would show a 99-point-something pass rate across the board. That may impress a visitor, but for the rest of us it was not only confusing, it was corrupt and potentially damaging information. Looking at the screens, you could easily get the impression that the software was all but ready for release. In reality, the teams were busting all sorts of body parts to bring up the platform, merge legacy code and add features at breakneck speed. Having a lot of tests pass constantly under these circumstances is peculiar.
So what happened here, apart from the fact that we were getting a lot of false negatives?
One explanation may lie in how tests are automated.
When you design automated tests, you will take a few shortcuts along the way, both to be able to verify the individual parts of your script or test suite and to speed up development.
Every automation project I have seen so far started with a single test case; whether it is a UI-based script or a unit test does not matter, that is where you start. When that test case runs through and produces a ‘PASS’, you add more tests and make them run just as smoothly, right?
Eventually you will start to automate the full test execution: setting up the environment, installing the test code, executing the tests, gathering and reporting results and so on; you know the drill. At this point it is tempting to comment out all but a few test cases that pass swiftly and reliably, and to focus on the reliability of the automation surrounding the test code.
Now the trap is set. Part of your code aims to make a mess (that would be the test code), whereas the other part aims to run as smoothly as lubricated silk. Being able to tell which part is which is, of course, imperative.
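One way to keep those commented-out tests from quietly vanishing is to report how much of the suite actually ran next to the pass rate. Below is a minimal sketch in Python, assuming the runner can tell how many test cases are registered versus executed; `RunSummary`, `check_run` and the 95 % threshold are hypothetical names and values chosen for illustration, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Counts from one nightly run (hypothetical structure)."""
    registered: int  # test cases that exist in the suite
    executed: int    # test cases that actually ran
    passed: int      # executed test cases that passed

    @property
    def pass_rate(self) -> float:
        # The figure that usually ends up on the screens: passed / executed.
        return self.passed / self.executed if self.executed else 0.0

    @property
    def suite_coverage(self) -> float:
        # The figure that exposes commented-out tests: executed / registered.
        return self.executed / self.registered if self.registered else 0.0


def check_run(summary: RunSummary, min_suite_coverage: float = 0.95) -> list[str]:
    """Return warnings about the run; an empty list means nothing looks suspicious."""
    warnings = []
    if summary.suite_coverage < min_suite_coverage:
        warnings.append(
            f"only {summary.executed} of {summary.registered} tests ran "
            f"({summary.suite_coverage:.0%}); a high pass rate means little"
        )
    return warnings


nightly = RunSummary(registered=400, executed=12, passed=12)
print(f"{nightly.pass_rate:.0%}")  # 100%
print(check_run(nightly))          # ['only 12 of 400 tests ran (3%); a high pass rate means little']
```

A 100 % pass rate over 12 of 400 tests is exactly the kind of number that looks great on a screen in the common area; putting the second ratio beside it makes the shortcut visible.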
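One way to keep the two parts apart in code is to give the automation its own outcome. The sketch below is a hypothetical Python harness, assuming setup steps and test bodies are plain callables and that framework code raises a dedicated `InfrastructureError`; `Outcome`, `run_test` and the device functions are illustrative names only. The design choice that matters is that no exception handler ever returns `PASS`: a broken harness shows up as `ERROR`, a misbehaving product as `FAIL`.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASS = "pass"    # the tested system behaved as expected
    FAIL = "fail"    # the tested system misbehaved: a candidate defect
    ERROR = "error"  # the automation itself broke: fix the harness, not the product


class InfrastructureError(Exception):
    """Raised by (hypothetical) automation APIs that cannot do their job."""


def run_test(setup: Callable[[], None], test: Callable[[], None]) -> Outcome:
    try:
        setup()
    except Exception:
        # Anything that breaks before the test starts belongs to the automation.
        return Outcome.ERROR
    try:
        test()
    except InfrastructureError:
        # The test called into the framework, and the framework gave up.
        return Outcome.ERROR
    except Exception:
        # Assertion failures, crashes, unexpected exceptions: the product is suspect.
        return Outcome.FAIL
    # Only a test that runs to the end without raising counts as a PASS.
    return Outcome.PASS


def device_ready() -> None:
    pass  # stand-in for a successful setup step


def device_offline() -> None:
    raise InfrastructureError("device not reachable")


def wrong_result() -> None:
    raise AssertionError("expected 3, got 2")


print(run_test(device_ready, device_ready))    # Outcome.PASS
print(run_test(device_offline, device_ready))  # Outcome.ERROR
print(run_test(device_ready, wrong_result))    # Outcome.FAIL
```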
It may seem like I am stating the obvious here, but I have been unfortunate enough to see the following code in real life:
… and I never, ever want to see it again. Remember: our task is not to design systems that run smoothly, it is to thoroughly mess up the application under test.
To achieve this kind of bipolarity in the system, there are several measures you can take. To mention a few:
- To start with, separate the automation code from the automated test code. Create a common framework for the automation part and treat it as a normal software project, then publish a set of reliable APIs to the test designer. The automation code must be beyond suspicion, so that errors and failures of the automated test can be assumed to be due to defects in the tested system. More on this in a coming post.
- Train the test designer. Either he or she is a developer and needs a crash course in test techniques, or he or she is a tester and needs training in efficient coding.
- Enforce peer review of the test code, and include the whole development team. Code is code; treat it like that. Do not let the developers in the team get away with ‘I have no experience in testing (it is below my dignity), so I can’t review (be bothered with) this.’
- Measure internal leakage from automated to manual tests, that is, defects that slip past the automated tests and are only caught manually. By mapping test results from automated as well as manual tests against the same software components, you may be able to identify these pesky evergreens (and many other interesting phenomena).
- Get the software project interested; played correctly, the PMs will be a powerful ally in driving test code quality. This is so important that it also deserves a separate post.
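The leakage measurement can be sketched roughly as follows. This is a hypothetical Python example, assuming both automated and manual results can be exported as `(component, passed)` pairs and that a failed manual test counts as a defect found on that component; the leakage definition (share of a component's failures that only manual testing found), the function names and the component names are all assumptions for illustration. Components that automation covers but never fails, while manual testers keep finding defects there, are the evergreen suspects.

```python
from collections import Counter

Result = tuple[str, bool]  # (component, passed): hypothetical record format


def failures(results: list[Result]) -> Counter:
    """Count failed executions per component."""
    return Counter(component for component, passed in results if not passed)


def leakage(automated: list[Result], manual: list[Result]) -> dict[str, float]:
    """Share of each component's failures that only manual testing found (assumed definition)."""
    auto_fail, manual_fail = failures(automated), failures(manual)
    report = {}
    for component in auto_fail.keys() | manual_fail.keys():
        total = auto_fail[component] + manual_fail[component]
        report[component] = manual_fail[component] / total  # total > 0 by construction
    return report


def evergreen_suspects(automated: list[Result], manual: list[Result]) -> list[str]:
    """Components that automation covers but never fails, while manual testing finds defects."""
    covered = {component for component, _ in automated}
    return sorted(
        component
        for component, share in leakage(automated, manual).items()
        if share == 1.0 and component in covered
    )


automated = [("camera", True), ("camera", True), ("contacts", False), ("contacts", True)]
manual = [("camera", False), ("camera", False), ("contacts", False), ("dialer", False)]
print(evergreen_suspects(automated, manual))  # ['camera']
```

Note that "dialer" is not flagged: it leaks completely, but no automated test touches it, so it is a coverage gap rather than an evergreen.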
The test case is the single most important part of your automated test system. If you fail at creating good tests that make the applications under test toss and turn, then you need not bother with the rest.
Make sure to separate the automated test from the test automation. Handling both at the same time easily leads to either evergreens or flaky results.