Managing Technical Debt

A Perspective on Technology Debt

(Cross-posted to the Cloudability Blog)

As engineers and managers, we often speak of “technology debt” – the accumulated cruft that acts as a drag coefficient, making a system harder to maintain over time. Brittleness creeps in as surely as the sun rises.

It seems to happen for a variety of reasons, but the ones that stand out most to me are:

  • Dusty, long-forgotten corners of the system.
  • Nasty but necessary expedient hacks.
  • Changes in business requirements that result in code being repurposed awkwardly.
  • The system growing larger than can fit in one head, making it impossible to see the “big picture” in enough detail to spot everything that no longer makes sense.
  • Changes in the development styles of the engineers working on the system over time, as new engineers come in, and old engineers leave.
  • Variation in the skill levels of engineers involved in various aspects of the system.

Much thought has been given to technology debt and ways of managing it, but it’s a complicated problem, and most of the techniques I’ve come across require a degree of organizational buy-in and individual discipline that makes it hard to imagine them being deployed well enough to fulfill their vaunted promises. In some cases, there’s a widespread sentiment that fully employing the techniques would actually be a net harm.

In all cases, the proposed solutions are rooted in project management techniques.

Not long ago, I had an idea for addressing the problem – one that I think can be integrated into more or less any project management technique, adds a minimum of overhead, and offers a plethora of benefits.

Treating Technical Debt Like Real Debt

Essentially, my idea is simple: Treat technology debt the way I would treat real debt. Find my debts, and make regular, incremental payments on them.

That sounds easy in the abstract, but as always the devil is in the details.

What does it even mean to make an “incremental payment” on technology debt?

Well, it’s more than just fixing code. It’s about fixing the causes of the code becoming a problem over time. You have multiple debts to pay down:

  • The code itself.
  • The skill gap between the engineers on your team.
  • The silos of information that exist only in one person’s head.
  • The stylistic divergence that leads engineers to clobber each other’s work.

Cloudability and Technology Debt

Like many startups moving at a breakneck pace, Cloudability – just a year old – has already accrued a bit of technology debt. Not enough to land us on TheDailyWTF.com, but enough to make me want to get on top of it.

So at Cloudability, some months ago, I introduced a new process that is already making a world of difference on all fronts.

Introducing: Kill it With Fire Fridays

Kill it With Fire Fridays is a once-weekly meeting, plus a small amount of ‘homework’ for every engineer.

Every Friday, all engineers meet for 1.5 hours – no more, no less. And every engineer gets one hour of ‘homework’. If their homework takes them more than an hour, they’re to stop, put what they’ve got on a branch, and at the next meeting we discuss what made our estimate of the effort incorrect.

The key to making this process work is, of course, buy-in from the very top on down. All the management-level employees are bought into this process and have made sure to plan accordingly.

So… What is it?

In Kill it With Fire Friday, we bring up a variety of topics for discussion, and we burn through a long list of things to make concrete improvements.

The first few KiWFF meetings were devoted to discussions of what we valued in terms of test cases. What makes a test “good” in our minds, and what makes it bad. What tools we wanted to use, and why. We revisited this discussion when we brought on a new engineer with strong opinions on the topic – and we revised our thinking according to the new insights he brought to the table.

Right now, our early burn-down list of work is focused on bringing our test cases in line with what we want them to be, and on making them more comprehensive.

We decided that clarity of intent was a crucial factor in the quality of a test case for us. It mattered that the intent of the code under test be very clearly evident to a reader of the test who did not have any part in writing the code. So one of our standard units of homework is to have engineers read others’ tests and comment.

In the case of Cucumber features, this happens only at the level of *.feature files (for now), and for RSpec examples, the focus is on things like the describe lines and the other pieces that convey intention.
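To make “clarity of intent” concrete, here’s a minimal sketch of the style we aim for in RSpec. The domain and names (CostEstimate, usage_records) are invented for illustration, not lifted from our codebase:

    # A reader who never saw the implementation should be able to tell,
    # from the describe/context/it lines alone, what behavior is promised.
    describe CostEstimate do
      context "when a provider reports no usage for the period" do
        it "returns a zero total rather than raising" do
          estimate = CostEstimate.new(usage_records: [])
          expect(estimate.total).to eq(0)
        end
      end
    end

The point is that the nested descriptions read as a plain-English specification even before you look at the assertion.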

I’m particularly fond of asking the engineers with the least TDD experience to take on these tasks as it’s proven enormously helpful at surfacing hidden assumptions, flaws in our standards, and so forth. It also helps them to get a clearer sense of how to write good tests when they’re called upon to offer suggestions on how to improve the clarity of intent.

Each weekly email summarizing the meeting includes stats about tests passing/failing, code coverage, and how these have changed week-over-week. This is obviously easy to automate with a CI setup, but having the facts in the email gives a nice, simple, up-or-down metric on whether we’re making progress on this area of our technology debt.
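As a sketch of how that week-over-week coverage line might be produced – assuming SimpleCov, which writes its summary to coverage/.last_run.json; the snapshot path below is invented for the example:

    # Reads the current SimpleCov summary, compares it to last week's
    # snapshot, prints the delta, and rolls the snapshot forward.
    require 'json'

    read_pct = lambda do |path|
      JSON.parse(File.read(path)).dig('result', 'covered_percent')
    end

    current  = read_pct.call('coverage/.last_run.json')
    previous = read_pct.call('tmp/coverage_last_week.json')

    puts format('Coverage: %.1f%% (%+.1f%% week-over-week)',
                current, current - previous)
    File.write('tmp/coverage_last_week.json',
               JSON.dump('result' => { 'covered_percent' => current }))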

So, It’s About Testing?

Our emphasis on capturing intent in our tests is a key mechanism for deciding when code can/should go, and when it’s still useful. We expect this to become increasingly valuable for trimming dead complexity as time passes.

That said, there’s a lot more here than just testing.

So What Else Is There, Then?

I’ve also put several static analysis tools in place to look for code that’s awkward, contorted, stylistically heinous, riddled with trivial security risks, or just plain complex, and we use their output as a launching point for finding specific bits of code to review.
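In a Ruby shop, that wiring might look something like the sketch below. The specific tools (flog for complexity scoring, reek for code smells, brakeman for Rails security checks) are plausible choices for illustration, not necessarily the exact ones we run:

    # Rakefile task that surfaces "hot spot" candidates for the next meeting.
    desc 'Surface hot-spot candidates for Kill it With Fire Friday'
    task :kiwff_hotspots do
      sh 'flog -g app lib'  # complexity scores, grouped by class, worst first
      sh 'reek app lib'     # common code smells: long methods, feature envy, etc.
      sh 'brakeman -q'      # known Rails security anti-patterns
    end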

The very first piece of code that stood out as rather complex was a validator plugin I had made for Rails. Its purpose was to ensure that once a datetime field was assigned a non-null value, it could only move in one direction temporally: you could only assign it a value that was earlier in time than the previous one. The code had evolved to be configurable, eventually allowing you to choose only-backward or only-forward movement on a per-field basis.

By virtue of its evolutionary path, it had become one of those pieces of code with several levels of conditional nesting and heavy logic on either side of many of the branches.

Well, with many eyes on it, we noticed a number of things that I had missed in the moment: similarities in branch paths, and logical characteristics that were amenable to a more elegant arrangement. Ultimately, we hoisted a number of conditionals into a ‘bail block’, leaving only one block-structured conditional and a piece of code that is, frankly, much simpler and clearer to read and understand – and that is the consensus of the whole team.
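To illustrate the shape of the refactor – not the actual code; the class name, option names, and exact guards are my reconstruction from memory of the description above – the ‘bail block’ version looks roughly like this:

    # A custom Rails validator that only lets a datetime move one way.
    class MonotonicTimeValidator < ActiveModel::EachValidator
      def validate_each(record, attribute, value)
        previous = record.send("#{attribute}_was")  # value before this change

        # The 'bail block': guard clauses hoisted out of the old nesting.
        return if value.nil?         # unsetting is always allowed
        return if previous.nil?      # first assignment is unconstrained
        return if value == previous  # no movement, nothing to validate

        direction = options.fetch(:direction, :backward)
        moved_ok  = direction == :backward ? value < previous : value > previous

        record.errors.add(attribute, "may only move #{direction} in time") unless moved_ok
      end
    end

Thanks to Rails’ validator-naming convention, a model could then opt in per field with something like:

    validates :occurred_at, monotonic_time: { direction: :backward }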

I got a gentle reminder that no matter how good I think I am, I have off-days where my focus isn’t on every single detail of a problem. The code got better. The other engineers got to see and understand a dusty corner of the system. The more junior coders learned a bit about what makes code more elegant and maintainable.

Everybody wins.

So, It’s Also About Code Reviews?

Well, no.

You see, not all code warrants review. Much of the code is tedious, boring, trivial, and at the fringes of the system where it has a minimal impact on the lives of all involved.

Reviewing code is also, in general, a very draining exercise. People’s eyes glaze over quickly, and they rarely put their full attention into it in the first place.

We have a number of pieces of code in our system that are subject to systematic review by multiple engineers for the sake of security, but that’s separate from KiWFF. We eschew general code reviews because, in our experience, they make everyone miserable with little or no gain in code quality (and possibly even a loss in overall productivity).

The point here is to find the hot spots and focus on those. The interesting bits.

This may be something found entirely by automated tools, or it may be a bigger-picture structure that an engineer runs across – something that feels awkward and wrong – and is brought up for the sake of fomenting architectural simplification (or, if that’s not possible, finding a way to capture and document the need for that awkward complexity for posterity).

A Fluid Process

Kill it With Fire Fridays is meant to be a fluid, dynamic process whose emphasis evolves to match the pain points the engineering team is feeling, and which remains lightweight enough to mesh well with any project management technique we might adopt.

So far, we’ve been using it for three months, and it has already produced measurable benefits. If I’m right, it will continue to lead to better code, better engineers, better team cohesion, and better productivity.

Watch this space to find out how things turn out, one way or the other!
