Truth About Estimates, Story Points and Velocity

“When a measure becomes a target, it ceases to be a good measure.” (Goodhart’s law)


Last week I overheard a conversation between a mid-level manager (Roy*) and a senior manager (Tom*) that reminded me of the quote above, known as Goodhart’s law.

*Names changed to protect identity.

Roy: How can velocity be measured in a meaningful way? Every team has its own definition of a story point.
Tom: True, but there are two meaningful views. First, an improvement in each individual team’s velocity, since that is a direct comparison with its previous velocity. Second, by the law of large numbers, the overall program trend of velocity is a valuable measure as well. But as we all know, you can’t compare velocity across teams or tribes; that is meaningless.
Roy: OK, but even within teams it seems to be a struggle to estimate cost consistently. I know we struggled for years with hours and function points, but the problem hasn’t changed with a new unit of cost.
Tom: That is something we should help with. Chat with our agile coach; she can help.

What is wrong with this conversation?

What are Roy and Tom both missing? What questions should Tom ask Roy to help him understand the concepts behind relative estimation and story points?

Traditional Project Management

Traditional project management comes from a manufacturing background. It has always focused on measuring the tangible: hours per day, hours spent on a particular activity, hours per week, lines of code per hour, lines of code per day, percentage of x completed, percentage of y completed, defects per hour, defects per 100 lines of code and so on. Understandably, traditional management tried to measure time, scope and quality.

In the conversation above, both Tom and Roy are experienced in this kind of project management and have most likely been project managers themselves.

However, after practising traditional project management for years, project managers have forgotten to ask how a measure is supposed to help deliver a project faster, with better quality and, importantly, more cheaply.

This way of thinking is deeply entrenched: PMI has been around for more than 25 years, and PMP certification has existed since 1984.

What is the problem with traditional thinking?

So, looking at the conversation above, what do you think happens when management starts asking for status reports? Typically, the reports show the number of hours consumed, the percentage of the project completed, the number of defects found vs. fixed, etc.

As soon as senior management starts asking for numbers, the entire focus of middle management, including program and project managers, shifts to producing these numbers and, in most cases, to looking good on those numbers and reports. As Goodhart’s law says: when a measure becomes a target, it ceases to be a good measure.

The recent VW saga is a good example of what happens when becoming the number one car manufacturer becomes the target.

So, what’s wrong with the conversation between Tom and Roy?

Why do we use estimates?

In traditional project management, estimates are used for planning, delivery planning, tracking (% complete vs. % remaining), risk mitigation (e.g. 50% of the estimated time used but only 20% of the work done), resource allocation (we need 5 teams to work on this for 6 months), etc.

However, one simple fact about tracking is that you are only reviewing historical information, not observing real outcomes. Observing real outcomes is something Lean and Toyota put a lot of emphasis on.

If you are following traditional project management (even if you call it Agile), estimates are used to track against actuals. For example, say a particular activity is estimated at 100 hours and you work 40 hours a week on it. That means the activity will be completed in two and a half weeks. Good calculation, right?
It doesn’t stop there. Traditional project managers will go further and say: if we have 2 people working on it, that means 80 hours a week, so surely it can be done in one week and two days. If only pigs could fly.
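The arithmetic above can be written as a small function. This is a minimal sketch of the naive linear model being criticised, not a recommended way to plan: the name `naive_duration_weeks` is hypothetical, and the function assumes work divides perfectly among people with no coordination overhead, which is exactly the assumption that fails in practice.

```python
def naive_duration_weeks(effort_hours: float, hours_per_person_week: float, people: int = 1) -> float:
    """Naive linear estimate: total effort divided by total weekly capacity.

    Assumes work is perfectly divisible between people and that every hour
    spent turns directly into finished work (the assumption criticised above).
    """
    if hours_per_person_week <= 0 or people <= 0:
        raise ValueError("capacity must be positive")
    return effort_hours / (hours_per_person_week * people)


# One person, 40 hours a week, on a 100-hour activity.
print(naive_duration_weeks(100, 40))            # 2.5
# Two people: the naive model simply halves the duration.
print(naive_duration_weeks(100, 40, people=2))  # 1.25
```

Doubling `people` always halves the result here (1.25 weeks is one week plus 1.25 working days, which rounds up to the "one week and two days" above, assuming 5-day weeks). Nothing in the model accounts for hand-offs, ramp-up or communication, which is why the two-person figure is the "pigs could fly" step.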

Nine women can’t make a baby in one month

Again, this thinking comes from manufacturing, where it’s easy to count the number of donuts produced and where, in a manual process, more people means more output.

In Scrum or XP, what do we use estimates for? Or rather, do we even need estimates?

Estimates in an Agile Setup

If you are following pure Scrum, your whole team works on a single story at a time: the one that is most important to the business. In this case, tracking numbers are of no use. Either the most critical business functionality is complete and shippable to the customer, or it is simply not done. There is no need to calculate whether it is 50% done or 20% remaining.

Now, in such a team-based setup, even if you have ten teams each working on one story at a time, why would it matter if one team’s velocity was 50 points and the next one’s was 60? All that should matter to the business is: is my most important business functionality getting delivered, and how soon?

See the difference between the two approaches? One approach measures a number that represents the amount of work completed (more accurately, it assumes time spent = amount of work completed). The Lean and Agile approach, by contrast, emphasizes value delivered: the most important functionality, and how fast it can be delivered.

Comparing a team’s velocity with its previous velocity
The first point Tom mentions is an improvement in each individual team’s velocity, as that is a direct comparison with its previous velocity. In a team-based setup like Scrum or XP, when the entire team is focused on delivering one piece of functionality at a time, why would it bother to compare how many story points it delivered last sprint? Such a comparison is meaningless because, like all numbers, it can be gamed. Besides, in complex software development it is not possible to compare two stories directly. In one sprint, a team might spend the entire sprint on one big story and be really happy to deliver it. In another sprint, the team might deliver ten small stories. Hence the concept of relative estimation.

In such a scenario, it is useless for a team to compare its velocity with its own past velocity, and it should not do so.

The overall program trend of velocity and the law of large numbers
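Tom further says that, by the law of large numbers, the overall program trend of velocity (story points) is a valuable measure as well. This is just as useless as the first measure. The law of large numbers averages out random noise; it does not turn story points that mean something different in every team into comparable units, and Tom has already admitted that velocity cannot be compared across teams. What trend are you looking for? Most managers would ask for a positive, upward-moving trend. But can you take this velocity number to the market? Can you show your customers velocity numbers and ask them whether they are happy with them?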

A trend will not tell you what has actually been delivered. What you should be looking for is actual, live, working, defect-free software. For example, if you have ten teams, each working on one story at a time, and at the end of, say, three weeks all ten stories are ready, defect-free and shippable, you can take them to market and sell those features to customers.
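To see why an upward program trend says little about delivery, here is a small sketch with made-up numbers (the sprint data and the function names `program_velocity` and `stories_delivered` are hypothetical, purely for illustration). Two teams finish the same number of stories every sprint, but team B’s point estimates drift upwards, whether through unanchored estimation or gaming. The summed program velocity rises steadily while delivery stays flat.

```python
# Hypothetical data: one entry per sprint, each holding the story points of
# the stories that team A and team B finished in that sprint.
sprints = [
    [[3, 5, 2], [3, 3]],
    [[3, 5, 2], [4, 4]],  # team B now sizes similar stories one point higher
    [[3, 5, 2], [5, 5]],
    [[3, 5, 2], [6, 6]],
]

def program_velocity(sprints):
    """Total story points finished across all teams, per sprint."""
    return [sum(sum(team) for team in sprint) for sprint in sprints]

def stories_delivered(sprints):
    """Total number of stories finished across all teams, per sprint."""
    return [sum(len(team) for team in sprint) for sprint in sprints]

print(program_velocity(sprints))   # [16, 18, 20, 22]
print(stories_delivered(sprints))  # [5, 5, 5, 5]
```

The "positive, upward-moving trend" appears even though nothing more was shipped; only the unit changed.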

Agile estimates (story points, hours, function points, etc.) are for the agile team only, and for no one outside the team.

An agile team is interested in estimates only so that it can spot its own deviations, find the reasons behind them and fix whatever is slowing it down. Instead of asking for velocity numbers, management needs to help teams fix those causes. That is the only use of estimates.
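One way a team might use its estimates for that purpose is sketched below. It assumes the team records a point estimate and the actual working days for each finished story (a hypothetical data layout, not something the article prescribes), computes the team’s usual days-per-point ratio and flags stories that deviate strongly from it as retrospective topics. The 2x threshold is an arbitrary assumption.

```python
from statistics import median

# Hypothetical records a team might keep for its own retrospectives:
# (story id, estimated story points, actual working days to finish)
stories = [
    ("S1", 3, 3.0),
    ("S2", 5, 5.5),
    ("S3", 2, 2.0),
    ("S4", 3, 9.0),  # took far longer than similar-sized stories
    ("S5", 8, 2.0),  # finished far faster than its estimate suggested
]

def flag_deviations(records, factor=2.0):
    """Return ids of stories whose days-per-point ratio differs from the
    team's median ratio by more than `factor` in either direction."""
    ratios = {sid: days / points for sid, points, days in records}
    typical = median(ratios.values())
    return [
        sid for sid, r in ratios.items()
        if r > typical * factor or r < typical / factor
    ]

print(flag_deviations(stories))  # ['S4', 'S5']
```

The output is a list of stories to talk about, not a score. In keeping with the point above, it stays inside the team: the useful result is the conversation about why S4 was slow and S5 was fast.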

One thought on “Truth About Estimates, Story Points and Velocity”

  1. David Owen says:

    First, let’s remember that because stories are about user-visible features and not the software components that we build to enable those features, the actual story size can depend on the order in which the stories are completed. For example, Story A and Story B both need a common component X. If A is done before B, it will appear larger than if it were done after B. So at the very least, we’ll expect to see a high level of variance in your model.

    Second, have you checked your model? For “velocity,” it’s Y = b*X + N(0, sigma^2). Are the residuals normally distributed, as the error term in the model assumes? If they’re not, you can have a variety of problems with the output.

    Third, I hope the team is using something as an anchor, so that their definition of “story point” doesn’t drift over time. Also, that they’re not being incentivized in some way to game their velocity number.

    Last, have you tracked any problems down to individual stories whose actuals came in significantly lower or higher than expected? Was there something different about those stories that you and the team could learn from, maybe to anticipate problems before they happen in a pre-sprint or story-review checklist? Learn from what went wrong, but also from what went right.
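The model in the second point of the comment can be checked with a few lines of code. This sketch assumes X is a story’s point estimate and Y its actual effort in days (the comment does not say which quantities it means). It fits the slope b by least squares with no intercept, matching Y = b*X + N(0, sigma^2), and returns the residuals. The sample data and the name `fit_through_origin` are hypothetical.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for Y = b*X + error with no intercept:
    b = sum(x*y) / sum(x*x). Returns the slope and the residuals y - b*x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = sxy / sxx
    residuals = [y - b * x for x, y in zip(xs, ys)]
    return b, residuals

# Hypothetical data: story points (X) and actual effort in days (Y).
points = [1, 2, 3, 5, 8]
days = [1.5, 2.0, 3.5, 5.0, 9.0]

b, residuals = fit_through_origin(points, days)
print(f"b = {b:.3f}")
print("residuals:", [round(r, 2) for r in residuals])
```

The residuals are what the normality question is about. With real data they could be inspected with a histogram or a normal Q-Q plot, or passed to a formal test such as `scipy.stats.shapiro`. A handful of stories is far too few for any such test to be conclusive.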
