
MrJoy

Just a Coder, Carryin' On Cranky

An Unusual Case For Cucumber


Today I came across a case against Cucumber, and I have a couple bones to pick. More importantly, I want to elaborate a bit on the author's comments about enjoying the Gherkin syntax, and expand on an idea there.

First off, I take issue with:

Given(/^I am a (\w+)$/) do |user_type|
  # make me this user type
end


Have fun grepping for the latter. Chances are someone added a couple other step definitions of the form “I am a”.

  1. When you run Cucumber, its output tells you where each step definition is defined.
  2. When a step matches more than one step definition, Cucumber raises an error and tells you about the ambiguity.
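
To make those two points concrete, here is a toy sketch of the bookkeeping involved. It is not Cucumber's actual implementation, and every name in it (ToyStepRegistry, DEMO_REGISTRY, the error classes) is made up for illustration. It just shows that recording where a pattern was defined, and refusing to pick between two matching patterns, is cheap to do:

```ruby
# Toy model of a step registry. NOT Cucumber's real implementation;
# all class and constant names here are hypothetical.
class AmbiguousStep < StandardError; end
class UndefinedStep < StandardError; end

class ToyStepRegistry
  Definition = Struct.new(:pattern, :location, :body)

  def initialize
    @definitions = []
  end

  # Record the pattern together with the file:line of the caller,
  # so a match can always be traced back to its source.
  def define(pattern, &body)
    caller_frame = caller_locations(1, 1).first
    location = "#{caller_frame.path}:#{caller_frame.lineno}"
    @definitions << Definition.new(pattern, location, body)
  end

  # Return the single definition matching the step text, or raise
  # if the text matches zero or several definitions.
  def lookup(step_text)
    matches = @definitions.select { |d| d.pattern.match?(step_text) }
    raise UndefinedStep, "no definition for #{step_text.inspect}" if matches.empty?
    if matches.size > 1
      places = matches.map(&:location).join(", ")
      raise AmbiguousStep, "#{step_text.inspect} matches definitions at #{places}"
    end
    matches.first
  end
end

DEMO_REGISTRY = ToyStepRegistry.new
DEMO_REGISTRY.define(/^I am a (\w+)$/) { |user_type| user_type }
DEMO_REGISTRY.define(/^I am a logged-in (\w+)$/) { |user_type| user_type }
DEMO_REGISTRY.define(/^I am a(n)? admin$/) { |_article| "admin" }
```

With these three definitions, looking up "I am a editor" finds exactly one definition along with its location, while "I am a admin" matches both the first and the third pattern and raises instead of silently picking one.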

Now, on several of his other points, largely about the role of non-technical or semi-technical individuals in writing cukes, I agree. This idea was misguided and doesn't work well in practice. At best it's awkward and makes TDD somewhat more cumbersome and inflexible.

It's worth noting, I am generally opposed to using Cucumber with Capybara precisely because browser-automation in English becomes a brittle mess in a hurry.

The Real Value of Cucumber

The author almost manages to connect the dots and find the real value of Cucumber when he says:

I don’t prefer to use Cucumber for any of my testing, but I do enjoy the Gherkin syntax. Not for testing, but for gathering feature requirements. It provides a very clear and concise way of explaining a feature, without confusion. But that is where the line is drawn.

So close!

To my mind, the value of the Gherkin syntax is in capturing the intent of the feature. This is a subtle difference from requirements. I think of the former vs. the latter in a manner similar to how I think of declarative vs. imperative code, but that's just an abstraction and maybe not a very informative one.

So what is the difference?

Requirements are a moment-in-time snapshot – a means of communicating an objective between parties. They are only really meaningful in terms of the context of the moment, which is rarely well documented, or curated. They are an important tool for PMs and the like to get engineers to understand what is needed right now, but fall down later on. I'll explain below.

Intentions are a more context-free way of expressing an objective. They're less useful for communicating what needs to be built, but they fill the gap that requirements have.

Where is that gap? Simple: Maintenance.

Imagine this scenario. You are a new engineer on a team, and you have begun to get into the flow and context of the product development cycle. You're implementing a feature, and in so doing, a test breaks. Let's consider three ways the test could be expressed and what your life is like in each scenario.

Basically, when a test you didn't write breaks after you make a change, you need to determine if the test breakage is:

  1. Accidental, and needs to be corrected.
  2. A legitimate change in behavior, due to different business circumstances, requiring that this broken test be updated to match the new semantics.
  3. Not relevant because that code isn't even needed anymore.

The RSpec Way

You have a nice, clean, well-structured set of describe, context, and it blocks following the Arrange, Act, Assert pattern very cleanly. It's all well-organized. But what does it tell you about the test that broke? Just what the expected behavior was at the time it was written – with no context about why it should behave that way.

You now need to go query people to find out what kind of test breakage you have, unless you happen to know that part of the system well – and you often won't.

Hopefully someone remembers the history of that feature, and all the relevant evolution of the business and system to be able to give you a good answer.

The Cucumber-As-Requirements-Specification Way

Frankly, this is just the same thing, framed slightly differently. It's a slightly higher-level view – but not much of one. You still lack any notion of the original context that might help you determine the kind of failure you face. So again, you're left interviewing people and spelunking through code.

The Cucumber-As-Intent-Specification Way

Properly written, this cuke tells you what the high-level intent of the feature is. You generally need far less understanding of the whole of the system to contextualize this information and reach a conclusion about how to proceed.

So, what does it mean for a cuke to be written in terms of “intent”? Let me provide a real-world example from a company I recently worked for. I have anonymized the scenario, which makes this more vague than I'd like but structurally it should convey my point.

Feature: Credential idempotency
  In order to avoid unexpected behavior in credentials when jobs dispatch out-of-order
  Credentials use an edit version as a watermark to ensure late-arriving jobs don't poison the data

    Background:
      Given a credential for a supported vendor

    Scenario: Editing authorization information of a credential increments the version
       When the encrypted data is modified
       Then the edit version of the credential should be incremented

    Scenario: Editing the nickname of a credential does not change the version
       When the nickname is modified
       Then the edit version of the credential should not change

    Scenario: Editing the estimate configuration of a credential does not change the version
       When the estimate details are modified
       Then the edit version of the credential should not change

    Scenario: Setting the account identifier does not change the version
       When a validation success response is received
       Then the edit version of the credential should not change

    Scenario: Changing the state of a credential does not change the version
       When a validation failure response is received
        And a validation error response is received
       Then the edit version of the credential should not change

  # One example of where this could happen is if the user changed a
  # credential's password quickly, several times, and the workers are slow /
  # backlogged.
    Scenario: Jobs relating to different versions of a credential dispatch out of order
       When a change is made that increments the edit version
        And validation jobs arrive out of order
       Then the job for the older version should not modify the credential data

    Scenario: A credential conversion increments the edit version
      Given an unverified amazon credential
       When the credential is converted
       Then the edit version of the credential should be incremented

The comment serves to provide some in-the-moment context that may be useful for understanding the intent down the road, and determining if the scenario is still relevant. Ideally you shouldn't need too many such comments, but it can happen.

This higher-level notation lets me ask more meaningful questions.

  • Do these jobs still exist?
  • Do we still use a queue that doesn't guarantee dispatch order for sequences of jobs?
  • Do credentials still have fields that need to be idempotent, and ones that do not?

I can go look in our job queues to see if jobs related to Credentials pass through them. If not, then perhaps we don't use Credentials anymore.

In short, I can come to much stronger conclusions about how to proceed if my change breaks this test, without having to speak to nearly as many people or do as much archeology.
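
For readers who want to see the "edit version as a watermark" idea from the feature in code, here is a minimal sketch. It is an assumption-laden stand-in, not the anonymized system's code: the Credential class, its method names, and the starting version of 1 are all hypothetical.

```ruby
# Hypothetical in-memory model of the behavior the feature describes.
# None of these names come from the real (anonymized) system.
class Credential
  attr_reader :edit_version, :encrypted_data, :nickname, :state

  def initialize(encrypted_data:, nickname: nil)
    @encrypted_data = encrypted_data
    @nickname = nickname
    @state = :unverified
    @edit_version = 1 # assumed starting value
  end

  # Authorization data changed: bump the watermark.
  def update_encrypted_data(new_data)
    @encrypted_data = new_data
    @edit_version += 1
  end

  # Cosmetic change: the watermark stays put.
  def rename(new_nickname)
    @nickname = new_nickname
  end

  # Apply a validation job's result only if the job was issued for the
  # current edit version. Results for older versions are dropped, and
  # applying a result never changes the edit version itself.
  def apply_validation_result(job_version, new_state)
    return false if job_version < @edit_version

    @state = new_state
    true
  end
end
```

If a job enqueued at version 1 arrives after the encrypted data was edited (version 2), apply_validation_result returns false and leaves the state alone, which is the out-of-order scenario in the cuke.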

Describing Intent is Hard

It's really hard to remove yourself from the context you are in.

At the company where the cuke behind the above example was written, we wound up creating a style guide on how to write cukes, to minimize confusing or misleading language, and had peers review the verbiage of our cukes for clarity. Personally, I was a fan of having the review done by the engineer least involved with the relevant area of the system, while not being totally uninvolved. That seemed to provide the best approximation of someone coming back and reading the cuke later on.

Our style guide read like this:

Cucumber Feature Style Notes

  • Features are black-box tests, not white-box tests.
  • Features describe your intent in building a feature of the system. They describe what matters about the feature.
    • Include details that are relevant to what matters about the scenario(s).
    • Avoid details that are not relevant to what matters about the scenario(s).
    • Someone who is unfamiliar with the details of our system should be able to understand what mattered to you about the code under test.
  • Clarity matters more than reuse when it comes to step definitions.
    • Reuse can happen through helper methods/classes if need be, but don't compromise readability/clarity.
  • When a .feature has been peer-reviewed, a note should be added to that effect. This note should be removed or otherwise amended when a non-trivial change is made to the feature that could render it less clear than expected.

General Scenario Constraints

  • A Given should identify one item required for the test. Thus, five or six givens might be required.
  • No more than two Whens should be required. More suggests you are testing too much or wording it wrong.
  • No more than two Thens should be required. More suggests you are testing too much or wording it wrong.

Writing Style

  • Implicit subjects should be avoided unless the test deals with precisely one object.
    • In other words, be sure that it's not ambiguous what “it” refers to!
  • Tests should be written in the third person to avoid awkward turns of phrase when framing steps in terms of specific objects rather than high-level / user-visible situations.
  • Thens should be framed in terms of “should”. As in “Then X should have Y property.”

Formatting

  • Feature descriptions are indented two spaces, and wrapped to 80 characters.
  • Feature comments are indented two spaces and begin with Ruby-style hash comment markers and are wrapped to 80 characters.
  • Scenario and Background declarations are indented four spaces.
  • Givens are indented 6 spaces.
  • Whens and Thens are indented 7 spaces so they right-justify with Givens.
  • And/But steps are indented 8 spaces so they right-justify against the Given/When/Then above that they correspond to, in order to facilitate scannability.
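
The formatting rules above are mechanical enough to sketch as a small Ruby helper. This is only an illustration, not a tool we actually used: the method and constant names are made up, putting the Feature line at column zero is my assumption (the notes don't say), wrapping to 80 characters is not handled, and the keyword detection is naive (a description line that happens to start with "When" would be treated as a step).

```ruby
# Indentation per the style notes above. Feature at column 0 is an
# assumption; all names here are hypothetical.
INDENT_BY_KEYWORD = {
  "Feature"    => 0,
  "Background" => 4,
  "Scenario"   => 4,
  "Given"      => 6,
  "When"       => 7,
  "Then"       => 7,
  "And"        => 8,
  "But"        => 8
}.freeze

# Feature descriptions and comments fall through to this indent.
DESCRIPTION_INDENT = 2

# Re-indent one line based on its leading capitalized word.
def indent_feature_line(line)
  text = line.strip
  return "" if text.empty?

  keyword = text[/\A[A-Z][a-z]+/]
  indent = INDENT_BY_KEYWORD.fetch(keyword, DESCRIPTION_INDENT)
  (" " * indent) + text
end

# Re-indent a whole feature file given as a string.
def indent_feature(source)
  source.lines.map { |line| indent_feature_line(line) }.join("\n")
end
```

Because Given is five characters, When and Then are four, and And and But are three, indents of 6, 7, and 8 make every step keyword end in the same column, which is what makes the steps easy to scan.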

A Small Note on Step Definitions

One other problem with Cucumber can be finding and eliminating dead step definitions. A key part of this is to ensure that your code coverage analysis includes step definitions. A step body that never executes is a good sign of a dead step.

However, I tend to encourage the use of natural and fluid language in cukes, rather than coercing things into the template of existing step definitions, and that can create a bit of a mess.

A naive solution would be to expand the regex for various linguistic framings:

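# Changing this:
Given(/^I am a (\w+)$/) do |user_type|
  # make me this user type
end

# Into this. Each alternative has its own capture group, so the block has to
# accept both; the group for the alternative that didn't match is nil.
Given(/^(?:I am a (\w+)|A (\w+) is using the system)$/) do |first_form, second_form|
  user_type = first_form || second_form
  # make me this user type
end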

Unfortunately, you can't easily find dead phrasings, so you're stuck with an ever-growing list of alternatives, and resorting to trial-and-error removal of alternatives to find dead variants.

A better option is to hoist the body and use multiple step definitions:

def given_a_user_of_type(user_type)
  # make me this user type
end

# I don't use a single-line proc, or a proc variable because I want the
# coverage tool to show me a red line on the call for variants that aren't
# used anymore.
Given(/^I am a (\w+)$/) do |user_type|
  given_a_user_of_type(user_type)
end

Given(/^A (\w+) is using the system$/) do |user_type|
  given_a_user_of_type(user_type)
end
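
To show why the multi-line blocks matter, here is a small experiment using Ruby's standard-library Coverage module rather than any particular coverage gem. It is a sketch under assumptions: the file contents are a stand-in for a step definition file (plain procs instead of real Cucumber steps), and the names DEMO_STEPS, STEP_FILE_SOURCE, uncalled_lines, and DEAD_LINES are hypothetical.

```ruby
require "coverage"
require "tempfile"

# Stand-in for a step definition file: two multi-line blocks, of which
# only the first is ever called. Plain procs, not real Cucumber steps.
STEP_FILE_SOURCE = <<~RUBY
  DEMO_STEPS = {}
  DEMO_STEPS["I am a"] = proc do |user_type|
    user_type.to_sym
  end
  DEMO_STEPS["is using the system"] = proc do |user_type|
    user_type.to_sym
  end
RUBY

# Load the source under line coverage, run the given block (the "test
# run"), and return the 1-based numbers of lines that never executed.
def uncalled_lines(source)
  file = Tempfile.new(["steps", ".rb"])
  file.write(source)
  file.close

  Coverage.start
  load file.path
  yield
  # Look the file up by basename in case the reported path differs.
  _path, counts = Coverage.result.find do |path, _|
    File.basename(path) == File.basename(file.path)
  end
  counts.each_index.select { |i| counts[i] == 0 }.map { |i| i + 1 }
ensure
  file&.unlink
end

# Only the first phrasing is exercised, as if no feature used the second.
DEAD_LINES = uncalled_lines(STEP_FILE_SOURCE) do
  DEMO_STEPS["I am a"].call("admin")
end
```

The body of the second block (line 6 of the stand-in file) shows up in DEAD_LINES because it was compiled but never run. Had the block been written on a single line, the definition and the body would share a line that executes when the file loads, so the dead variant would be much harder to spot.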

Wrapping It Up

In my time as an engineer, I have yet to see a test suite that didn't wind up brittle, obtuse, hard to work with, and just generally painful. That's not to say the suites had no value – they were often very valuable, especially once a system ceased fitting in one person's head – but I still firmly believe that things can be improved in a way that's actually practical and realistic, to get greater benefits at lower cost (overhead).

The trick is that it requires a bit of thought, communication, and understanding among a team. It requires that people take a step back and look at the larger picture. So the only question in my mind is how to make that a cheap operation, with a high enough value to get people to buy into it.
