It’s the Journey, Not the Destination

Once an organization transitions from a start-up to a more mature business, it often finds that its software development velocity stalls when it tries to add new features or refactor problematic code. This is because, without solid automated tests, developers don't know whether they've broken existing behavior.

Management then tries to ameliorate the crisis by prioritizing automated testing. Because higher-level end-to-end (e2e) testing promises more test coverage per line of test code written, it's often pushed as a silver bullet. However, e2e testing comes with serious drawbacks, and decision makers often overestimate its potential to find bugs while underestimating the value of other types of automated tests and quality-assurance strategies.

This post aims to describe some broad categories of automated testing and clarify why we write automated tests in the first place, so that developers and managers can make wise choices in implementing a testing strategy that works.

Types of Automated Testing

In this post I'll define three types of automated testing, though in reality there are many more, such as load testing, security testing, etc. For simplicity's sake, I'll discuss testing in the context of web development.

Unit Tests

Unit tests exercise individual functions: give a function an input, and a unit test asserts that the output is what's expected. These tests are the cheapest and fastest to write, maintain, and run. They are also the most focused, meaning that when a test fails, a developer will most likely know exactly what failed, why, and where. The promise is that if developers compose their apps from functions and test those functions thoroughly, the app will be correct.

However, unit tests don't test how parts of the system interact, so, compared to other types of tests, they don't accurately reflect the state of the system as a whole. Here's an example of a unit test using JavaScript's Jest framework:

import { toTitleCase } from "./utils";

describe("utils", () => {
  describe("toTitleCase", () => {
    it("returns the string in title case", () => {
      expect(toTitleCase("foo bar")).toEqual("Foo Bar");
    });
  });
});

Integration Tests

Integration tests exercise how different modules, functions, or classes interact. Like unit tests, these tests are relatively cheap to write, maintain, and run, though not quite as cheap, because the tester needs to keep several modules in mind when developing them. A developer may need to mock things like HTTP calls or databases, or simply leave those parts of the system out of the tests by using strategies such as interfaces and dependency injection.

Here's an example in Python/Django that leverages Django's built-in TestCase. This class provides a convenient test client that lets the developer transparently simulate HTTP calls against the router, without running a live server (Django handles test-database setup behind the scenes). This effectively tests the controller:

from django.core import mail
from django.test import TestCase

class ContactTests(TestCase):
    def test_post(self):
        # POST to the endpoint:
        response = self.client.post(
            '/contact/',
            {'message': 'I like your site'},
        )
        # Make our assertions:
        self.assertEqual(response.status_code, 200)
        self.assertEqual(len(mail.outbox), 1)
        self.assertEqual(mail.outbox[0].subject, 'Contact Form')
        self.assertEqual(mail.outbox[0].body, 'I like your site')

These tests share many of the advantages and disadvantages of unit tests, but they tend to be a bit more involved and less focused. In return, they provide more test coverage per line of test code written. When the framework and/or language provides good tools (as Django does), they can be quick and easy to write.

E2e Tests

E2e tests are black-box tests that exercise the system as a user would. Writing these tests may involve bootstrapping resources outside of the code under test, such as servers and databases, and/or automating user interactions with a UI.

Here's an example e2e test using Selenium in Python. It goes to python.org, asserts that the page title is what we expect, searches for "pycon", and asserts that we got some results (credit: https://www.geeksforgeeks.org/writing-tests-using-selenium-python/):

import unittest
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# inherit from TestCase and create a new test class
class PythonOrgSearch(unittest.TestCase):

    # initialize the webdriver
    def setUp(self):
        self.driver = webdriver.Firefox()

    # test case method; its name must start with test_
    def test_search_in_python_org(self):
        driver = self.driver

        # load python.org
        driver.get("https://www.python.org")

        # confirm the title contains the keyword "Python"
        self.assertIn("Python", driver.title)

        # locate the search box by its name attribute
        elem = driver.find_element(By.NAME, "q")

        # type the query and submit it
        elem.send_keys("pycon")
        elem.send_keys(Keys.RETURN)

        # assert that the search returned some results
        self.assertNotIn("No results found.", driver.page_source)

    # cleanup method called after every test
    def tearDown(self):
        self.driver.quit()

# execute the script
if __name__ == "__main__":
    unittest.main()

The advantage of e2e tests is that we can test the entire system's behavior, much as a user would experience the app. More code is exercised per line of test code written than with any other type of automated test. However, the tests:

  1. are more complex and harder to write. Imagine, for one, setting up a test environment with a clean web server and database for each test.
  2. tend to be brittle due to timing issues, changes in UI, test interference, etc. (a common mitigation is sketched after this list).
  3. run much slower than unit or integration tests.
  4. are less focused: when something goes wrong, it's hard to know what went wrong. E.g., did the webpage fail to load fully before the assertions ran, or did the test actually find a bug?
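
As an aside, a common way to reduce the timing brittleness mentioned in item 2 is to use explicit waits instead of fixed sleeps. Here's a minimal sketch using Selenium's WebDriverWait, reusing the search box from the earlier example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("https://www.python.org")

# block for up to 10 seconds until the search box actually exists,
# instead of asserting against a page that may not have finished loading
elem = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.NAME, "q"))
)
elem.send_keys("pycon")
driver.quit()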

Where to Focus?

So where do we focus our testing energies?

The test pyramid, popularized by Mike Cohn in his book Succeeding with Agile, is a strategy for distributing testing whereby unit tests make up the majority of the tests, integration tests a smaller share, and e2e tests the smallest minority.

Cohn suggests this test distribution for exactly the reasons I mentioned above: unit tests are cheap and focused, integration tests are a bit more expensive and less focused, and e2e tests are expensive and unfocused.

When I've shown decision makers this pyramid, they nod their heads but continue to push for investing mostly in high-level e2e tests. The reason, I suppose, is that they are desperately trying to stave off the impending catastrophe of unstable software. Organizations learn about this instability from users, who experience the product through the app's UI. If users are complaining that the product is unstable, why not test from the perspective of the end user?

This is entirely logical, if we put aside the cost of e2e tests and assume that the main benefit of testing is reducing bugs and directly stabilizing software. The problem is that this assumption is wrong, though, as I'll show, reliable software is often a welcome by-product of systematic automated testing.

What Automated Testing Gives Us

The problem is that typical managers (and typical developers alike) have a narrow view of the benefits of automated testing. It's not just about finding bugs before users do.

Effective testing should find bugs before users do, but also:

  1. provide feedback to developers as they add to or fix the code base, giving them confidence to constantly improve the code
  2. encourage developers to write better code
  3. provide a controlled environment to reproduce bugs

First, the evidence is unconvincing that e2e tests are particularly effective at finding bugs before users do. Several studies, including one in IEEE Transactions on Software Engineering (SE-12(7):744-751, July 1986), found that code reviews uncovered considerably more bugs per hour than most types of automated testing, including e2e tests. So if finding bugs before users do is important to an organization, then instituting a formal code-review process should be at the top of the list, yet few places that I've worked require one.

Steve McConnell makes similar observations in Code Complete. In fact, according to him, the upper range of bugs found per hour by unit and integration testing is higher than that of e2e testing.

Provides Feedback

Effective testing provides rapid feedback to developers on whether a change has broken existing specifications. And when I say rapid, I mean seconds, not minutes. Developers need to know almost immediately if a refactor has broken previously confirmed behavior. Without this rapid, thorough feedback cycle, developers lose confidence in their ability to adapt the code to new requirements and become reluctant to refactor once the code works.

A critical element of successful, stable software is the confidence to change implementations under the hood. I've found that a winning software process is a modified form of test-driven development (TDD):

  1. Break up the problem into parts.
  2. Write tests, using the tests as a "laboratory" for trying out ideas.
  3. Get the software working fast, implementing it in whatever ugly way possible, and make the tests pass.
  4. Think of a cleaner, better way to implement the solution.
  5. Refactor until the tests pass again (a sketch of steps 2-5 follows).
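
As a minimal sketch of steps 2 through 5, assume a hypothetical slugify helper: the tests come first and act as the laboratory, an ugly-but-working version makes them pass, and a cleaner version replaces it while the same tests stay green:

import re
import unittest

# Step 2: write the tests first and use them as a laboratory.
# (slugify is a hypothetical helper, used only for illustration.)
class SlugifyTests(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

# Step 3: a first, ugly implementation that makes the tests pass:
#
#     def slugify(title):
#         s = title.lower()
#         for ch in ",.!?":
#             s = s.replace(ch, "")
#         return "-".join(s.split())
#
# Steps 4-5: a cleaner implementation; the same tests still pass.
def slugify(title):
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

if __name__ == "__main__":
    unittest.main()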

I am not religious about writing tests first, or about writing tests for everything. Keep in mind that the whole point of TDD is that it's a tool for taking steps as small or as large as a developer needs to make progress and write quality code.

That said, the general practice of TDD is a scientific, confidence-inspiring process for writing stable software. I say process because it emphasizes the journey, not the destination. In the end, a stable product with comprehensive tests is a happy by-product of a sane methodology. And because working this way requires quick, focused feedback, it favors developer-written unit and integration tests over e2e tests.

Encourages Better Code

A second benefit of automated testing is that it encourages writing more modular, composable code. It should be no surprise that a solution composed of small, encapsulated parts has a cleaner, more flexible architecture. It's not just that writing modular code makes testing easier; the very act of testing makes code cleaner and more modular. For example, if developers know they need to test their code, they'll tend to do a few things, like:

  • compose solutions out of bite-sized functions
  • write in a style that pushes side-effects (like network requests, reading files, etc.) to the periphery of the app, so as not to have to "mock the world" when testing (sketched below).
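
For instance, here's a minimal sketch of what pushing side-effects to the periphery can look like in practice; the function names and URL are hypothetical:

import json
import urllib.request

# Pure core: all of the logic, no I/O, trivially unit-testable.
def pick_greeting(user):
    name = user.get("name") or "stranger"
    return f"Hello, {name}!"

# Thin, side-effecting shell: fetches data and delegates to the core.
# The URL is a placeholder; this is the only part of the code that
# would need a network connection (or a mock) to test.
def greet_from_api(url="https://example.com/api/user"):
    with urllib.request.urlopen(url) as response:
        user = json.load(response)
    return pick_greeting(user)

# The core can be tested without mocking anything:
assert pick_greeting({"name": "Ada"}) == "Hello, Ada!"
assert pick_greeting({}) == "Hello, stranger!"
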
A Lab to Reproduce Bugs

Once developers have proper tests in place, the tests act as laboratories for reproducing bugs. Automated tests pay for themselves when something like the following happens (a sketch follows the list):

  1. a user or other developer reports a bug.
  2. the developer grabs the data from a log and writes a failing test using that data.
  3. the developer fixes the bug and the test passes.
  4. the developer runs the app, and the bug is fixed.
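
Here's a hypothetical sketch of steps 2 and 3, assuming a parse_price helper that crashed on an input found in a production log:

import unittest

# parse_price is a hypothetical helper; imagine the production log
# showed it crashing on the raw input " $19.99 ".
def parse_price(raw):
    # the fix: tolerate surrounding whitespace and a currency symbol
    return float(raw.strip().lstrip("$"))

class ParsePriceRegressionTests(unittest.TestCase):
    def test_input_copied_from_production_log(self):
        # this test failed before the fix and now guards against
        # the bug ever silently returning
        self.assertEqual(parse_price(" $19.99 "), 19.99)

if __name__ == "__main__":
    unittest.main()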

The team can use the test suite to communicate bugs to one another. The team gets:

  • documentation of the bug/requirement in the form of a test.
  • a regression test that guarantees that bug will stay fixed, forever.

My Recommendation

I hope I've made the case that companies cannot hope to produce more stable, agile software on the whole by focusing entirely on e2e tests. In fact, e2e tests are sometimes used as a cop-out, an excuse not to improve stability and quality by other means. Instead, organizations should focus on the software-writing process by encouraging fast unit and integration testing during development.

When adding a formal code-review process, participants can ask things like, "How do you know that the code you've just written realizes its requirements, and how do you know it will continue to do so?"

The testing process does not have to remain static. That is, developers can start off writing lots of unit tests, but as the solution solidifies, they can rely more on higher-level integration tests that offer more coverage per line of test code written.

I do not mean to say that e2e tests have no place in the software development process. If companies have the resources and dedicated experts to write comprehensive, stable e2e tests, then by all means they should do so. The fact that I haven't seen e2e testing successfully improve the quality and reliability of software projects where I've worked doesn't mean that it can't be done, or hasn't been. I have, however, found time and time again that the developer-centered process I've outlined above produces more reliable, more flexible software.

It shouldn't be surprising that a systematic, scientific process for writing software will produce quality results. But this takes practice and maturity, both of which defy shortcuts.