Today I came across a case against Cucumber, and I have a couple of bones to pick. More importantly, I want to elaborate a bit on the author’s comments about enjoying the Gherkin syntax, and expand on an idea there.
First off, I take issue with:
```ruby
Given(/^I am a (\w+)$/) do |user_type|
  # make me this user type
end
```
Have fun grepping for the latter. Chances are someone added a couple other step definitions of the form “I am a”.
- When Cucumber runs, it tells you where each matching step definition is defined.
- Ambiguous regexes produce an error, which Cucumber reports to you.
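To make those two points concrete, here is a toy imitation of that behavior in plain Ruby. It is not how Cucumber is implemented, and `StepRegistry`, `AmbiguousStep`, and `UndefinedStep` are hypothetical names. It records where each definition was made, returns that location when a step matches, and refuses to guess when two regexes match the same step text, which is exactly the overlapping “I am a” situation the grepping complaint worries about.

```ruby
# A toy imitation of Cucumber's step matching, for illustration only.
# StepRegistry, AmbiguousStep, and UndefinedStep are hypothetical names,
# not part of Cucumber's API.
class AmbiguousStep < StandardError; end
class UndefinedStep < StandardError; end

class StepRegistry
  StepDef = Struct.new(:regex, :location, :block)

  def initialize
    @defs = []
  end

  # Record the regex, the file:line of the caller, and the step body.
  def define(regex, &block)
    caller_loc = caller_locations(1, 1).first
    @defs << StepDef.new(regex, "#{caller_loc.path}:#{caller_loc.lineno}", block)
  end

  # Run the single definition that matches +text+, passing the regex
  # captures to its body, and return where that definition lives.
  def run(text)
    matches = @defs.select { |d| d.regex.match?(text) }
    raise UndefinedStep, "Undefined step: #{text.inspect}" if matches.empty?
    if matches.size > 1
      raise AmbiguousStep,
            "Ambiguous match of #{text.inspect}: #{matches.map(&:location).join(', ')}"
    end

    step = matches.first
    step.block.call(*step.regex.match(text).captures)
    step.location
  end
end

registry = StepRegistry.new
registry.define(/^I am a (\w+)$/) { |user_type| @user_type = user_type }
# An overlapping phrasing that someone added later.
registry.define(/^I am an? admin$/) { @user_type = "admin" }

puts registry.run("I am a customer") # prints the file:line of the first define

begin
  registry.run("I am a admin") # matches both regexes
rescue AmbiguousStep => e
  puts e.message
end
```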
Now, on several of his other points (largely about the role of non-technical or semi-technical individuals in writing cukes), I agree. This idea was misguided and doesn’t work well in practice. At best it’s awkward and makes TDD somewhat more cumbersome and inflexible.
It’s worth noting that I am generally opposed to using Cucumber with Capybara, precisely because browser automation in English becomes a brittle mess in a hurry.
The Real Value of Cucumber
The author almost manages to connect the dots and find the real value of Cucumber when he says:
I don’t prefer to use Cucumber for any of my testing, but I do enjoy the Gherkin syntax. Not for testing, but for gathering feature requirements. It provides a very clear and concise way of explaining a feature, without confusion. But that is where the line is drawn.
To my mind, the value of the Gherkin syntax is in capturing the intent of the feature. This is a subtle difference from requirements. I think of the former vs. the latter in a manner similar to how I think of declarative vs. imperative code, but that’s just an abstraction and maybe not a very informative one.
So what is the difference?
Requirements are a moment-in-time snapshot: a means of communicating an objective between parties. They are only really meaningful within the context of the moment, which is rarely well documented or curated. They are an important tool for PMs and the like to get engineers to understand what is needed right now, but they fall down later on, as I’ll explain below.
Intentions are a more context-free way of expressing an objective. They’re less useful for communicating what needs to be built, but they fill the gap that requirements leave.
Where is that gap? Simple: Maintenance.
Imagine this scenario. You are a new engineer on a team, and you have begun to get into the flow and context of the product development cycle. You’re implementing a feature, and in so doing you break a test. Let’s consider three ways the test could be expressed and what your life is like in each case.
Basically, when a test you didn’t write breaks after you make a change, you need to determine if the test breakage is:
- Accidental, and needs to be corrected.
- A legitimate change in behavior, due to different business circumstances, requiring that this broken test be updated to match the new semantics.
- Not relevant because that code isn’t even needed anymore.
The RSpec Way
You have a nice, clean, well-structured set of `it` blocks following the Arrange, Act, Assert pattern. It’s all well organized. But what does it tell you about the test that broke? Only what the expected behavior was at the time it was written, with no context about why it should behave that way.
You now need to go query people to find out what kind of test breakage you have, unless you happen to know that part of the system well – and you often won’t.
Hopefully someone remembers the history of that feature, and all the relevant evolution of the business and system to be able to give you a good answer.
The Cucumber-As-Requirements-Specification Way
Frankly, this is just the same thing, framed slightly differently. It’s a slightly higher-level view – but not much of one. You still lack any notion of the original context that might help you determine the kind of failure you face. So again, you’re left interviewing people and spelunking through code.
The Cucumber-As-Intent-Specification Way
Properly written, this cuke tells you what the high-level intent of the feature is. You generally need far less understanding of the whole of the system to contextualize this information and reach a conclusion about how to proceed.
So, what does it mean for a cuke to be written in terms of “intent”? Let me provide a real-world example from a company I recently worked for. I have anonymized the scenario, which makes it more vague than I’d like, but structurally it should convey my point.
The comment serves to provide some in-the-moment context that may be useful for understanding the intent down the road, and determining if the scenario is still relevant. Ideally you shouldn’t need too many such comments, but it can happen.
This higher-level notation lets me ask more meaningful questions.
- Do these jobs still exist?
- Do we still use a queue that doesn’t guarantee dispatch order for sequences of jobs?
- Do credentials still have fields that need to be idempotent, and ones that do not?
I can go look in our job queues to see if jobs related to Credentials pass through them. If not, then perhaps we don’t use Credentials anymore.
In short, I can come to much stronger conclusions about how to proceed if my change breaks this test, without having to speak to nearly as many people or do as much archeology.
Describing Intent is Hard
It’s really hard to remove yourself from the context you are in.
At the company where the above example originated, we wound up writing a style guide for cukes to minimize confusing or misleading language, and we had peers review the wording of our cukes for clarity. Personally, I was a fan of having the review done by the engineer least involved with the relevant area of the system (while not being totally _un_involved). That seemed to provide the best approximation of someone coming back to read the cuke later on.
Our style guide read like this:
Cucumber Feature Style Notes
- Features are black-box tests, not white-box tests.
- Features describe your intent in building a feature of the system. They describe what matters about the feature.
- Include details that are relevant to what matters about the scenario(s).
- Avoid details that are not relevant to what matters about the scenario(s).
- Someone who is unfamiliar with the details of our system should be able to understand what mattered to you about the code under test.
- Clarity matters more than reuse when it comes to step definitions.
- Reuse can happen through helper methods/classes if need be, but don’t compromise readability/clarity.
- When a `.feature` file has been peer-reviewed, a note should be added to that effect. This note should be removed or otherwise amended when a non-trivial change is made to the feature that could render it less clear than expected.
General Scenario Constraints
- Each `Given` should identify one item required for the test. Thus, five or six `Given`s might be required.
- No more than two `When`s should be required. More suggests you are testing too much or wording it wrong.
- No more than two `Then`s should be required. More suggests you are testing too much or wording it wrong.
- Implicit subjects should be avoided unless the test deals with precisely one object.
  - In other words, be sure that it’s not ambiguous what “it” refers to!
- Tests should be written in the third person to avoid awkward turns of phrase when framing steps in terms of specific objects rather than high-level / user-visible situations.
- `Then`s should be framed in terms of “should”, as in “Then X should have Y property.”
- Feature descriptions are indented two spaces and wrapped to 80 characters.
- Feature comments are indented two spaces, begin with Ruby-style hash comment markers, and are wrapped to 80 characters.
- Scenario and Background declarations are indented four spaces.
- `Given`s are indented 6 spaces.
- `Then`s are indented 7 spaces so they right-justify with `Given`s.
- `But` blocks are indented 8 spaces so they right-justify against the `Then` above that they correspond to, which makes scenarios easier to scan.
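The indentation rules come down to arithmetic: a 6-space `Given`, a 7-space `Then`, and an 8-space `But` all end in column 11. Here is a hypothetical checker (`check_indentation` is an illustrative name, not an existing tool) that applies that arithmetic to a feature’s text. Beyond what the notes spell out, it assumes `When` is aligned like `Then` and `And` like `But`, and it only checks step keywords and Scenario/Background lines.

```ruby
# Hypothetical checker for the indentation rules in the style notes.
# Every step keyword is expected to end in column 11, so its indent is
# 11 minus the keyword's length (Given: 6, When/Then: 7, And/But: 8).
KEYWORD_END_COLUMN = 11
SECTION_INDENT = 4 # Scenario and Background declarations
STEP_KEYWORDS = %w[Given When Then And But].freeze

# Returns [line_number, message] pairs for lines with the wrong indent.
def check_indentation(feature_text)
  problems = []
  feature_text.each_line.with_index(1) do |line, lineno|
    indent = line[/\A */].size
    word = line.strip.split(/\s+/, 2).first.to_s
    expected =
      if STEP_KEYWORDS.include?(word)
        KEYWORD_END_COLUMN - word.size
      elsif word.start_with?("Scenario", "Background")
        SECTION_INDENT
      end
    next if expected.nil? || indent == expected

    problems << [lineno, "#{word} indented #{indent}, expected #{expected}"]
  end
  problems
end

sample = [
  "Feature: Credential sync",
  "    Scenario: Updating a credential",
  "      Given a credential exists",
  "       When it is updated twice",
  "       Then it should keep the latest value",
  "        But the old value should be gone",
  "     Then this step is misaligned",
].join("\n")

check_indentation(sample).each { |lineno, msg| puts "line #{lineno}: #{msg}" }
# prints: line 7: Then indented 5, expected 7
```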
A Small Note on Step Definitions
One other problem with Cucumber can be finding and eliminating dead step definitions. A key part of this is to ensure that your code coverage analysis includes step definitions: a step body that never executes is a good sign of a dead step.
That said, I tend to encourage natural and fluid language in cukes rather than coercing things into the template of existing step definitions, and that can create a bit of a mess.
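Ruby’s standard-library `Coverage` module (the layer that coverage tools such as SimpleCov build on) is enough to sketch the idea. The `Given` below is a hypothetical stand-in for Cucumber’s, the step definitions are written to a temporary file so they are loaded after coverage starts, and only one of the two steps is exercised, so the other step’s body line shows an execution count of zero.

```ruby
require "coverage"
require "tempfile"

# Hypothetical stand-in for Cucumber's Given: it only records definitions.
STEP_DEFS = []
def Given(regex, &block)
  STEP_DEFS << [regex, block]
end

# Coverage must be running before the step definitions are loaded.
Coverage.start

steps = Tempfile.new(["steps", ".rb"])
steps.write(<<~'RUBY')
  Given(/^a credential exists$/) do
    @credential = :created
  end

  Given(/^the legacy importer has run$/) do
    @legacy = :imported
  end
RUBY
steps.close
load steps.path

# "Run" a scenario that only uses the first step.
text = "a credential exists"
_, body = STEP_DEFS.find { |regex, _| regex.match?(text) }
body.call

result = Coverage.result
key = result.keys.find { |path| File.basename(path) == File.basename(steps.path) }
counts = result[key]
counts = counts[:lines] if counts.is_a?(Hash) # newer result format

# Executable lines (count is not nil) that never ran.
dead_lines = counts.each_with_index
                   .select { |count, _| count == 0 }
                   .map { |_, index| index + 1 }
puts "Never-executed lines in the step file: #{dead_lines.inspect}"
steps.unlink
```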
A naive solution would be to expand the regex to cover various linguistic framings.
Unfortunately, you can’t easily find dead phrasings, so you’re stuck with an ever-growing list of alternatives and must resort to trial-and-error removal to find the dead variants.
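As a hypothetical illustration (the step wording and the minimal `When` stand-in are assumptions, not taken from the original cukes), a single definition that folds several phrasings into one regex with alternation might look like this:

```ruby
# Hypothetical stand-in for Cucumber's When: it only records definitions.
STEP_DEFS = {}
def When(regex, &block)
  STEP_DEFS[regex] = block
end

# One definition that accepts several phrasings via alternation. Each new
# wording someone prefers becomes another branch inside the regex.
When(/^the credential (?:is|was|has been) (?:updated|changed|modified)$/) do
  :credential_updated # placeholder for the real step body
end

phrasing_regex = STEP_DEFS.keys.first
[
  "the credential is updated",
  "the credential was changed",
  "the credential has been modified",
  "the credential got updated",
].each do |text|
  puts "#{text.inspect} matches? #{phrasing_regex.match?(text)}"
end
```

Because every phrasing shares one body, line coverage marks that body as executed as long as any single branch is used, so it cannot tell you which alternatives are dead.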
A better option is to hoist the body out and use multiple step definitions.
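A sketch of that approach, using the same kind of hypothetical stand-ins (`When`, `run_step`, and the helper `update_credential!` are illustrative names, not from the original code): the shared body moves into an ordinary method, and each phrasing gets its own thin step definition.

```ruby
# Hypothetical stand-in for Cucumber's When: it only records definitions.
STEP_DEFS = {}
def When(regex, &block)
  STEP_DEFS[regex] = block
end

# Shared state the steps act on (a placeholder for real test state).
STATE = { updates: 0 }

# The hoisted body: the shared logic lives in an ordinary helper method.
def update_credential!(state)
  state[:updates] += 1
end

# Each phrasing gets its own thin step definition that delegates to the
# helper, so each has its own body line for coverage to report on.
When(/^the credential is updated$/) do
  update_credential!(STATE)
end

When(/^the credential was changed$/) do
  update_credential!(STATE)
end

When(/^the credential has been modified$/) do
  update_credential!(STATE)
end

# Run a step by finding the definition whose regex matches the text.
def run_step(text)
  _, body = STEP_DEFS.find { |regex, _| regex.match?(text) }
  raise ArgumentError, "Undefined step: #{text}" unless body

  body.call
end

run_step("the credential is updated")
run_step("the credential has been modified")
puts STATE[:updates] # prints 2
```

The bodies are deliberately on their own lines rather than one-liner `{ ... }` blocks: line coverage counts per line, and a body sharing a line with the `When(...)` call would look executed as soon as the file loads. With separate lines, an unused phrasing shows up on its own, and deleting it leaves the others untouched.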
Wrapping It Up
In my time as an engineer, I have yet to see a test suite that didn’t wind up brittle, obtuse, hard to work with, and just generally painful. That’s not to say the suites had no value – they were often very valuable, especially once a system ceased fitting in one person’s head – but I still firmly believe that things can be improved in a way that’s actually practical and realistic, to get greater benefits at lower cost (overhead).
The trick is that it requires a bit of thought, communication, and understanding among a team. It requires that people take a step back and look at the larger picture. So the only question in my mind is how to make that a cheap operation, with a high enough value to get people to buy into it.