Fitting Frontend Design Testing into the Test Pyramid

Published on 14.02.2020 by Hans Christian Reinl

The Test Pyramid is a helpful way of thinking about software testing, popularized in 2012 by Martin Fowler, who credits the idea to Mike Cohn. It takes a bird's-eye view of software testing as a whole and asserts that, for several economic reasons, your project should have more fast and flexible low-level tests (like unit tests) and fewer brittle, slow and expensive high-level tests (like integration tests and UI tests).

With Warhol, we appear to be perched at the top of the test pyramid, as we focus on the look and visual consistency of web projects and don't care much about the finer points of business logic. But it is not as clear-cut as it sounds! Warhol shares a few traits with the kind of tests that Fowler placed at the top of his pyramid, but it can also check low-level design invariants like the color palette, at lightning speed and without much effort on the part of the developer. This post explores how exactly frontend UI testing fits into the test pyramid and what Warhol, the new kid on the block, brings to the table.

The original test pyramid
Fowler's original test pyramid categorized testing into three distinct tiers: unit, service and UI.

According to Fowler's test pyramid, low-level unit testing is the most important and the simplest way to test software in an automated fashion. Humans are not very good at consistently verifying the code they write, so unit testing checks software components individually, down to single functions and classes. Each component can be tested in isolation with a variety of different inputs, which makes tests easy to write and quick to run. Unit testing ensures that there are no glaring implementation errors in individual components and that the components adhere to some basic rules regarding inputs and outputs.
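To make "tested in isolation with a variety of different inputs" concrete, here is a minimal sketch of a table-driven unit test. The function `cartTotalInCents` and its test cases are made up for illustration; any pure function from your own project would do.

```typescript
// Hypothetical unit under test: a pure function with no I/O and no shared state.
interface CartItem {
  priceInCents: number;
  quantity: number;
}

function cartTotalInCents(items: CartItem[]): number {
  return items.reduce((sum, item) => {
    if (item.priceInCents < 0 || item.quantity < 0) {
      throw new RangeError("price and quantity must not be negative");
    }
    return sum + item.priceInCents * item.quantity;
  }, 0);
}

// A table of inputs and expected outputs. Adding a case costs one line.
const unitTestCases: Array<{ name: string; items: CartItem[]; expected: number }> = [
  { name: "empty cart", items: [], expected: 0 },
  { name: "single item", items: [{ priceInCents: 499, quantity: 1 }], expected: 499 },
  { name: "multiple quantities", items: [{ priceInCents: 250, quantity: 4 }], expected: 1000 },
  {
    name: "mixed items",
    items: [
      { priceInCents: 199, quantity: 2 },
      { priceInCents: 1000, quantity: 1 },
    ],
    expected: 1398,
  },
];

// Runs every case and returns a list of human-readable failures (empty means success).
function runUnitTests(): string[] {
  const failures: string[] = [];
  for (const testCase of unitTestCases) {
    const actual = cartTotalInCents(testCase.items);
    if (actual !== testCase.expected) {
      failures.push(`${testCase.name}: expected ${testCase.expected}, got ${actual}`);
    }
  }
  return failures;
}
```

Because the function needs no browser, server or database, the whole table runs almost instantly, which is what makes this tier cheap enough to form the base of the pyramid.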

The middle part of the pyramid is occupied by API testing and service layer testing. While APIs and services can in theory be tested as thoroughly as individual classes and components, this usually does not happen in practice. Higher-level components can have quite a lot of baggage (like web servers) attached to them, so they are usually harder to handle and their tests simply take longer. But as long as there are comprehensive unit tests at the base of the test pyramid, we can get away with not writing as many tests for APIs and services. We know that all the individual parts work as intended in isolation, so we only have to verify that they also work in concert (hence the name integration testing). There is no need to test the API with every possible input once we have established that the API can pass input in general to a unit-tested function.
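As a rough sketch of that division of labor, assume a hypothetical handler `handleTotalRequest` that sits in front of an already unit-tested function (`orderTotalInCents` is a stand-in here). The integration checks only confirm that input reaches the function and that malformed input is rejected; they do not repeat the unit-level input table.

```typescript
// Stand-in for a function that is assumed to have thorough unit tests already.
function orderTotalInCents(prices: number[]): number {
  return prices.reduce((sum, price) => sum + price, 0);
}

interface ApiResponse {
  status: number;
  body: string;
}

// Hypothetical service-layer handler: parse the request, delegate, serialize the result.
function handleTotalRequest(rawBody: string): ApiResponse {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: JSON.stringify({ error: "invalid JSON" }) };
  }
  if (!Array.isArray(parsed) || !parsed.every((p) => typeof p === "number")) {
    return { status: 400, body: JSON.stringify({ error: "expected an array of numbers" }) };
  }
  return { status: 200, body: JSON.stringify({ total: orderTotalInCents(parsed) }) };
}

// Two integration checks are enough to show that the parts work in concert.
const happyPath = handleTotalRequest("[300, 400]");
const malformed = handleTotalRequest("not json");
console.log(happyPath.status, happyPath.body); // 200 {"total":700}
console.log(malformed.status); // 400
```

A real service test would go through an actual HTTP server, which is exactly the baggage that makes this tier slower than the one below it.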

But people don't interact with APIs and service layers, they interact with buttons and widgets. That means we have to talk about user interface testing. For web projects, UI testing sits at the top of the test pyramid, and it is hard and expensive compared with the lower levels of the pyramid. UI testing can include functional testing with something like Selenium. Selenium and similar tools use remote-controlled web browsers to mimic user interaction with a web project. They can perform quite elaborate scripts (open the online shop, search for a product, scroll down, add the product to the cart, perform checkout), but those are famously hard to implement, consume endless resources, take ages to run and are likely to break. But if integration tests have already made sure that our APIs and services work and that they sit on top of unit-tested classes and functions, it makes sense to limit functional UI testing to a few important user interactions like the aforementioned checkout procedure. This is exactly the kind of testing that Fowler placed at the top of the Test Pyramid.

Related to user interface testing is user interface design testing. Where UI testing makes sure that the checkout button works, UI design testing aims to make sure that the checkout button looks as intended. This is way harder than it sounds! The obvious approach is to let remote-controlled browsers take screenshots of said checkout button and let an algorithm compare them pixel by pixel. Many commercial services and open source tools (like wraith, BackstopJS, and spectre) perform this kind of visual regression testing, and they generally work quite well for what they are. But the screenshot approach has significant downsides:

  • Resource consumption and time investment are enormous, as hundreds of megabytes of pixel data need to be created, stored and compared. The developer experience becomes torturous.
  • Screenshots can't differentiate between design changes and content changes. If your web project is multi-language or has any non-deterministic content (because of third-party widgets or user-generated content), false positives will ruin your day even further.
  • There is an inherent chicken-and-egg problem: you want to make sure that the checkout button “looks as intended”, but where can the screenshot tool find the official, definitive checkout button? Certainly not in the pattern library with all its false-positive-causing placeholder content.
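The first two downsides can be illustrated with a small sketch. It models a screenshot as raw RGBA bytes (four per pixel), which is an assumption of this example and not how any particular tool stores its data. The helper names and the tiny one-row "button" are made up for illustration.

```typescript
// A screenshot modelled as uncompressed RGBA data, four bytes per pixel.
interface Screenshot {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Memory needed to hold one uncompressed screenshot.
function uncompressedBytes(width: number, height: number): number {
  return width * height * 4;
}

// Naive pixel-by-pixel comparison: counts pixels whose RGBA values differ.
function countDifferingPixels(a: Screenshot, b: Screenshot): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error("screenshots must have identical dimensions");
  }
  let differing = 0;
  for (let i = 0; i < a.data.length; i += 4) {
    if (
      a.data[i] !== b.data[i] ||
      a.data[i + 1] !== b.data[i + 1] ||
      a.data[i + 2] !== b.data[i + 2] ||
      a.data[i + 3] !== b.data[i + 3]
    ) {
      differing++;
    }
  }
  return differing;
}

// Renders a one-row "button": blue background, dark pixels where label glyphs are drawn.
function renderButtonRow(glyphMask: boolean[]): Screenshot {
  const data = new Uint8ClampedArray(glyphMask.length * 4);
  glyphMask.forEach((isGlyph, x) => {
    const base = x * 4;
    data[base] = isGlyph ? 20 : 0;
    data[base + 1] = isGlyph ? 20 : 102;
    data[base + 2] = isGlyph ? 20 : 204;
    data[base + 3] = 255;
  });
  return { width: glyphMask.length, height: 1, data };
}

// Same design, different label length (think "Buy" versus "Kaufen").
const englishButton = renderButtonRow([false, true, true, false, false, false]);
const germanButton = renderButtonRow([false, true, true, true, true, false]);

console.log(uncompressedBytes(1920, 1080)); // 8294400 bytes for a single full-HD screenshot
console.log(countDifferingPixels(englishButton, germanButton)); // 2
```

The two buttons share every design property, yet the comparison reports a difference because the label changed. And at roughly 8.3 MB of raw data per full-HD screenshot, a few dozen pages in a handful of viewport sizes quickly add up to hundreds of megabytes.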

It is no surprise that automated user interface design testing is even less frequently used than functional user interface testing, even with development teams that are not strapped for brains or resources. Dealing with screenshots is just too painful!

In a sense, UI design testing occupies a place above the tip of the Test Pyramid. It is generally used on an even smaller scale than functional UI testing and is just as resource-hungry and time-consuming. But there is an important difference between functional UI testing and UI design testing: the former stands on top of unit and integration testing, while the latter has no foundation to stand on! Design consistency is obviously business-critical to get right, but in today's world there is nothing between manual testing (which is designers and developers just looking at the screen really hard) and dealing with screenshots. Enter Warhol.

Warhol is a new way to find design bugs in web pages. Warhol does not compare pixels and screenshots, but rather HTML and CSS (which are essentially the precursors to pixels and screenshots). By consuming human-friendly pattern libraries as input, it can extract the rules of your design system from examples and verify that your production web projects stay in line with the pattern libraries' requirements. Warhol is fast and provides high-quality feedback that finds errors deep inside your components' implementations. But how does Warhol fit into the test pyramid?
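To give a feel for what comparing CSS instead of pixels can look like, here is a deliberately simplified sketch. It is not Warhol's implementation or API; the `ButtonStyle` shape, the property list and the function name are all assumptions made for this example. In a browser, values like these could be read with `window.getComputedStyle`.

```typescript
// Hypothetical snapshot of the style properties we care about for one component.
interface ButtonStyle {
  color: string;
  backgroundColor: string;
  fontFamily: string;
  borderRadius: string;
}

interface StyleDeviation {
  property: keyof ButtonStyle;
  expected: string;
  actual: string;
}

const checkedProperties: Array<keyof ButtonStyle> = [
  "color",
  "backgroundColor",
  "fontFamily",
  "borderRadius",
];

// Compares a production element's styles against the pattern library example.
// Text content never enters the comparison, so a translated label cannot trigger a report.
function findStyleDeviations(reference: ButtonStyle, candidate: ButtonStyle): StyleDeviation[] {
  const deviations: StyleDeviation[] = [];
  for (const property of checkedProperties) {
    if (reference[property] !== candidate[property]) {
      deviations.push({
        property,
        expected: reference[property],
        actual: candidate[property],
      });
    }
  }
  return deviations;
}

// Example data: the pattern library's button versus a button found in production.
const patternLibraryButton: ButtonStyle = {
  color: "#ffffff",
  backgroundColor: "#0066cc",
  fontFamily: "Helvetica, sans-serif",
  borderRadius: "4px",
};
const productionButton: ButtonStyle = {
  color: "#ffffff",
  backgroundColor: "#0066cc",
  fontFamily: "Arial, sans-serif",
  borderRadius: "4px",
};

console.log(findStyleDeviations(patternLibraryButton, productionButton));
// one deviation: fontFamily, expected "Helvetica, sans-serif", actual "Arial, sans-serif"
```

Compared with a pixel diff, the result names the exact property that went wrong, and the data involved is a handful of short strings per element.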

When it comes to testing UI components, Warhol could be categorized into the upper half of the test pyramid. It performs its tests on a very high level (up to live production projects), and setting Warhol up for testing components entails at least some work. Your pattern library needs to include an example component that covers all of your use cases, and you then have to configure the component in Warhol's web UI. The tests cover individual components and run in a few seconds, but the setup may take a few minutes.

But Warhol can also work on a level below the Test Pyramid's foundation, at the level of infrastructure and fundamental rules. Apart from components, Warhol can also check basic design invariants like color palettes and typography use for any element, not just for manually defined components! This is a level of testing even below unit tests. It checks only a few rules, but does so for the whole web page. Your test coverage is always 100%, but the tests only verify the most basic and most important style characteristics of all your components. This constitutes a new level on the test pyramid where creating tests takes next to no effort at all and focuses on the tiniest building blocks (individual elements) of your UI.
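A page-wide invariant of this kind can be sketched in a few lines. Again, this is an illustration under assumptions and not Warhol's actual code: the `ElementStyle` shape, the hex color format and the function name are invented for the example.

```typescript
// Hypothetical per-element data, as it might be collected from every element on a page.
interface ElementStyle {
  selector: string;
  color: string;
  backgroundColor: string;
}

interface PaletteViolation {
  selector: string;
  property: "color" | "backgroundColor";
  value: string;
}

// Checks every element against the allowed palette. No per-component setup is needed.
function findPaletteViolations(elements: ElementStyle[], palette: string[]): PaletteViolation[] {
  const allowed = palette.map((entry) => entry.toLowerCase());
  const isAllowed = (value: string): boolean =>
    allowed.some((entry) => entry === value.toLowerCase());
  const colorProperties: Array<"color" | "backgroundColor"> = ["color", "backgroundColor"];

  const violations: PaletteViolation[] = [];
  for (const element of elements) {
    for (const property of colorProperties) {
      const value = element[property];
      if (!isAllowed(value)) {
        violations.push({ selector: element.selector, property, value });
      }
    }
  }
  return violations;
}

// Example data: a three-color palette and three elements, one of them off-palette.
const brandPalette = ["#FFFFFF", "#0066CC", "#141414"];
const pageElements: ElementStyle[] = [
  { selector: "button.checkout", color: "#ffffff", backgroundColor: "#0066cc" },
  { selector: "p.intro", color: "#141414", backgroundColor: "#ffffff" },
  { selector: "a.promo", color: "#0067cc", backgroundColor: "#ffffff" },
];

console.log(findPaletteViolations(pageElements, brandPalette));
// one violation: a.promo uses color "#0067cc", which is not in the palette
```

The rule is defined once and applies to every element, which is where the 100% coverage comes from. The off-by-one blue in `a.promo` is also the kind of deviation that is nearly impossible to spot by eye.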

Warhol does not neatly fit into any of the original test pyramid's three tiers, but occupies a spectrum on and below the test pyramid. This is not that surprising, given that Warhol is a new and rather specialized tool for design testing in web projects. The original test pyramid hails from 2012 and is concerned with software testing in general, so we can hardly expect it to include the latest and most specific solutions to UI testing that 2020 has to offer. The test pyramid still remains a useful way to think about software testing, but in order to accommodate Warhol, it requires the addition of an elevator shaft and a new basement.
