Teaching Product Owners How To Fish

Product owners are responsible for defining WHAT the Agile team needs to create. The team is responsible for deciding HOW to build WHAT the product owner needs.

I don’t think any Agilist would fundamentally disagree with that statement. Where there is inevitably a great deal of discussion and disagreement is in describing the boundary between “WHAT” and “HOW.” Is it thin or thick? How much overlap is there? Does this depend on the nature of the work, or is there a fixed standard? Does it have to be determined for every story? These are often not easy questions to answer.

While reading some neuroscience material yesterday regarding how our brains construct “concepts,” the topic led into the notion of “goal-based concepts.” Since I’m not a neuroscientist but an Agilist looking for ways neuroscientific discoveries can be applied to the implementation of Agile principles and practices, the material had me thinking about how product owners might learn from the idea of “goal-based concepts.” How can I teach them about the “WHAT” such that they better understand the type and amount of detail the team needs in order to figure out the “HOW?”

I came up with a series of short dialogs:

Product Owner (to team): I need a fish.

Team: OK.

(Team goes off to work on “fish” and returns the next day with a catfish.)

PO: That’s not what I wanted! I need a fish!

(Team goes off to work on “fish” and returns the next day with a nurse shark.)

PO: That’s not what I wanted either!

What if the PO had been clearer about not just WHAT she wanted, but the goal for that WHAT? That is, a description of the problem that the WHAT was intended to solve.

PO: I’m opening a fish store. I need a fish.

(Team goes off to work on “fish for fish store” and returns the next day with a goldfish.)

PO: That’s good. But it’s not enough. I need more fish. Different kinds of fish.

(The team goes off to work on “variety of fish for fish store” and returns the next day with guppies, mollies, swordtails, and angelfish.)

PO: Cool!

Or version two:

PO: I’m opening a restaurant. I need a fish.

(Team goes off to work on “fish for restaurant” and returns the next day with Poached Salmon in Dill Sauce.)

PO: That’s good. But it’s not enough. I need more fish. Different kinds of fish.

(The team goes off to work on “variety of fish for restaurant” and returns the next day with Miso-Glazed Chilean Sea Bass, Mediterranean Stuffed Swordfish, and Pan Seared Lemon Tilapia.)

PO: Yum!

Hopefully, you can see that providing a little of the “WHY” for the “WHAT” helped the team do a better job of delivering WHAT the product owner wanted.

Of course, these are simplified examples. There are many additional details the product owner could have supplied that would have helped the team dial in on exactly WHAT she needed. In the first dialog, the team had to make guesses about what the product owner meant by the rather broad concept of “fish.” In the subsequent dialogs, the team at least had the benefit of the context that was of interest to the product owner. In these cases, the team had a better understanding of the goal the product owner had in mind.

In Agile-speak, this context or goal information would be provided in the “As a…” and “…so that…” part of the story. “As a restaurant owner I want fish so that patrons can enjoy a variety of menu options.”

The less clear the product owner is on these elements, the longer it’s going to take for the team to guess what she really wants.

Estimating Effort – Adaptation

I’ve been running the informed intuition (or, if you prefer, “disciplined intuition”) approach to estimating effort for close to nine months now. For the most part, it has gone very well. The primary objective – inspire and support a conversation around the effort needed to complete a story – has most definitely been realized. Along the way the process has shifted to better support both the conversation and the team’s ability to internalize the process.

Originally, it was proposed that teams rate each of the effort characteristics on a sliding scale – 1 to 10 or 1 to 15, or whatever the team decided was most useful. Feedback from the teams led to the discovery that it is easier to evaluate each effort characteristic using the modified Fibonacci scale rather than a sliding scale. This provides continuity across the method in that everything about a story’s effort value is considered using the same scale. It also reinforces the rationale behind the use of the Fibonacci scale and seems to facilitate the team’s ability to internalize the method. They are moving more quickly when deriving effort values.
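To make the mechanics concrete, here is a minimal sketch in Python of what scoring on a single, shared scale might look like. The characteristic names and ratings are hypothetical examples, not part of the method itself:

```python
# Hypothetical sketch: every effort characteristic is rated on the same
# modified Fibonacci scale used for the final effort value, rather than
# a separate sliding scale.

FIBONACCI = [1, 2, 3, 5, 8, 13, 20, 40, 100]  # one common variant of the scale

# Example ratings for a hardware-oriented story (names are illustrative).
ratings = {"complexity": 5, "dependencies": 3, "part sourcing": 8}

assert all(r in FIBONACCI for r in ratings.values())  # one scale throughout

# The ratings seed the conversation; the team then settles on a final
# effort value from the same scale. The average is one possible prompt.
average = sum(ratings.values()) / len(ratings)
suggested = next(f for f in FIBONACCI if f >= average)
print(f"average {average:.1f} -> suggested effort value {suggested}")
```

As with the sliding-scale version, the point is the conversation the numbers provoke, not the arithmetic.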

A second adaptation is the use of several sets of characteristics, depending on the type of story, the predominant functional area represented by the team, and the nature of the work. For example, a story that involves the development of a computer board has a different set of criteria from stories that involve the creation of firmware for the board or the UI/UX features of the hardware product. The sets usually contain 3 or 4 common characteristics, such as “complexity” or “dependencies.” However, the hardware board may include something like “part sourcing” or “compliance testing.” This illustrates the importance of having the team deconstruct what “effort” means in the context of their world. When they determine the characteristics, the follow-on conversations about the effort are much more robust and meaningful.

In essence, this method is a reflection of the product owner’s responsibility for the “what” of the story and the team’s responsibility for figuring out the “how” of the story. “What I want,” says the product owner, “is an estimate of the effort involved to complete this story.” The team’s effort criteria demonstrate to the product owner how they arrive at any particular value.

Strategies for Remote Interviews with Team Candidates

In a recent New York Times column, Adam Grant wrote:

Credentials are overrated, and motivation is underrated. It doesn’t matter how much experience people have if they lack the drive to think creatively, work collaboratively and keep on learning. We’re not just hiring people to do a job today — we’re hiring them to make their team and their organization better tomorrow.

Once upon a time – last century, actually – employers could rely on the conferring of a college degree as evidence of a certain level of competence in the degree subject. In some areas, this is probably still true. Generally speaking, this would apply to the scientific areas of study: chemistry, physics, mathematics, etc. Unfortunately, even these areas are becoming suspect as academic rigor is eroded in the interests of removing perceived barriers to this or that special interest group. To be very clear, I’m referring to the importance of a thorough and complete understanding of the subject. The minefield that academia has become is indeed rife with self-inflicted and often insurmountable barriers to learning. The egregious rise in the cost of tuition, grade inflation, and credential dilution are but a few examples.

There are other factors in play. The speed at which society moves in the 21st Century is simply too fast for the four-year degree to have any hope of staying relevant, let alone keeping up. Almost every major university offers free courses in a wide variety of subjects, so it is possible for a high school graduate to craft the equivalent of a Bachelor’s or Master’s and complete it for a fraction of the cost and in half the time. Ah, but without having completed the paper chase, how can such an industrious individual establish for a potential employer that they have the requisite competence?

Adam Grant has it right. Credentials are overrated. So how can we assess the quality and potential of team candidates? Grant identifies three key mistakes interviewers make in the interview process.

  1. They ask the wrong kinds of questions.
  2. They focus on the wrong criteria.
  3. They’re overly influenced by the best talkers.

If, as Grant suggests, job interviews are broken, then conducting remote job interviews in the midst of a pandemic is significantly more challenging. In this post, I wish to speak to the second mistake identified by Grant: what we can do to identify our criteria, what we can do during an interview to elicit information about the candidate’s qualifications, and a strategy for improving the efficacy of remote job interviews.

Identify Important Criteria

For the sake of example, we’ll engage in a little time travel into the future and imagine having hired the perfect product owner candidate. What tasks encountered in your work day are no longer an issue with the new candidate on board? Is the product backlog now well-maintained and in a healthy state? Does the sprint runway extend out 4-5 (or more) sprints? Has a stable sprint velocity emerged (suggesting that the user stories are of higher quality and understood better by the team)? Do conflicts between areas of the business occur less frequently than in the past? Are stakeholders pleased with the results they see at sprint and increment reviews?

If our example were for a scrum master candidate, we would ask ourselves different questions for eliciting important criteria for the position. Is there less conflict among team members? Does the team understand the purpose and value for determining the effort involved to complete a user story? And again, has a stable sprint velocity emerged?

In addition to considering what hasn’t been working well (and therefore illuminating what skills you want a candidate to bring to the table), it is also important to include what has been working. It will not serve the organization if one set of problems is swapped for another. Perhaps, for example, the previous product owner was well liked by the team and helped the team maintain positive morale, but had a poorly maintained product backlog that prevented a good approximation for a release date. It wouldn’t be much of an improvement if the new product owner kept a healthy product backlog but did so by driving the team as a tyrant might.

Test for Matching Skills

With a good feel for the criteria needed to hire the best candidate, you can then craft a strategy for determining how well the candidate’s abilities satisfy your criteria. Prepare tasks for the candidate that will verify congruity between what a candidate says they can do and what they can actually do. One approach, which I use frequently, is to present the candidate with a series of scenarios, each designed to build on how the candidate responded to the previous scenario. While I may only present a candidate 3-4 scenarios, I have several dozen in the queue and choose the sequence based on how well the candidate responds to each challenge.

For example, for a scrum master role – a high-touch role that requires consummate communication skills, flexibility, and the ability to solve people problems – I may present an initial scenario as follows:

“I’m going to give you several scenarios. You are free to ask any questions you wish about the scenario and state any assumptions you are making in your responses.

You are being considered for a position as scrum master for a team that is developing a healthcare related web application for use in hospitals. This team is responsible for developing the UI/UX components and works closely with another team responsible for much of the database and middle tier components. As a new scrum master, what questions would you ask of anyone in the organization to help you quickly understand what you need to do to become effective as a scrum master for your team?”

There are many things I would hope to hear in the candidate’s answer. To mention a few, I’d like to hear that they want to speak to the product owner, the stakeholders, and, of course, each of the team members. I’d like to hear that they plan to spend time in information-gathering mode rather than work immediately to shape the team into some version of teams they’ve worked with at other jobs. I’d like to hear questions from them about what kinds of metrics the team uses and what those metrics have shown.

There are no right or wrong answers to a scenario like this, just answers that are better than others. And I don’t expect the candidate to deliver an exhaustively thorough response.

From their responses, I might learn that they are a recipe follower or that they are flexible in adapting to the needs of the business while working to establish good scrum practices. I might learn that they really don’t know scrum at all and are only good at parroting text book examples and jargon. I might hear how they would attempt to leverage several things from previous experience while acknowledging those attempts would be experiments and subject to adaptation based on feedback.

Assuming the candidate responded to the first scenario in a way that scores high marks for satisfying my criteria, I might offer the next scenario as follows:

“Assume you have been serving successfully as scrum master for this team for six months now. The product owner calls the team together and says ‘I need to swap out some of the stories in the sprint for work that marketing wants done before the end of the week.’ As scrum master, how would you respond to this development?”

As with the previous scenario, the candidate’s response would be measured against the criteria I have established for the position. Depending on what I’ve heard, I may continue to offer additional scenarios that build on the candidate’s developing experience with the scenario scrum team.

This strategy is pursued until I’m satisfied that the candidate either knows what they claim to know or doesn’t. A short interview does not bode well for the candidate. A long interview does.

(I would be interested in hearing about any questions, comments, or creative ways you’ve applied this strategy.)

Determining Effort Value – Tactics

While the concept and practice is straightforward, shifting a team from intuitive guesses about story points to a more deliberate approach for determining effort value (a.k.a. story points) can be a challenge at first. The following approach may help start the process.

  1. Begin by focusing on product backlog items (PBIs) that the team, using their previous approach, has estimated at 5 or greater. There isn’t much to be gained by applying this approach to PBIs estimated at 1 or 2. PBIs that the team knows are a bigger effort, but may not be able to articulate why that is the case, are good candidates for learning how to apply this technique.
  2. Ask the team how much time it may take to complete a PBI. While I have written before about the importance of excluding time criteria when determining effort values, this can be a good place to start. It is what teams are most familiar with – for better or worse. Teams usually have no problem throwing out a time: 8 hours, 16 hours, etc.
  3. With the time estimate in hand ask the team:

“If you sit in front of your computer and start the clock, will the PBI be done if you do nothing and the estimated time elapses?”

I would hope the team would answer “No.”

  4. With the answer to the first question in hand, ask the following question:

If the passage of time alone won’t get the PBI work completed, what will you be doing (actions and behaviors) to complete the work?

The conversation that follows from this question is the basis for determining the effort criteria the team needs to better describe what they will be doing on their way to completing the PBI. The techniques around establishing effort criteria are described in an earlier post.

Transparency, Source Code Quality, and Metrics

I’ve been reading “Hello World: Being Human in the Age of Algorithms” by Hannah Fry. She relates this story:

In 2012, a number of disabled people in Idaho were informed that their Medicaid assistance was being cut. Although they all qualified for benefits, the state was slashing their financial support – without warning – by as much as 30 per cent, leaving them struggling to pay for their care. This wasn’t a political decision; it was the result of a new ‘budget tool’ that had been adopted by the Idaho Department of Health and Welfare – a piece of software that automatically calculated the level of support that each person should receive.

Unable to understand why their benefits had been reduced, or to effectively challenge the reduction, the residents turned to the American Civil Liberties Union (ACLU) for help.

[The ACLU] began by asking for details on how the algorithm worked, but the Medicaid team refused to explain their calculations. They argued that the software that assessed the cases was a ‘trade secret’ and couldn’t be shared. Fortunately, the judge presiding over the case disagreed. The budget tool that wielded so much power over the residents was then handed over, and revealed to be – not some sophisticated AI, not some beautifully crafted mathematical model, but an Excel spreadsheet.

Within the spreadsheet, the calculations were supposedly based on historical cases, but the data was so badly riddled with bugs and errors that it was, for the most part, entirely useless. Worse, once the ACLU team managed to unpick the equations, they discovered ‘fundamental statistical flaws in the way that the formula itself was structured’. The budget tool had effectively been producing random results for a huge number of people. The algorithm – if you can call it that – was of such poor quality that the court would eventually rule it unconstitutional.

My first thoughts were, “How bad a spreadsheet hack do you gotta be to have your work be declared unconstitutional? And just how many hacks does it take to build an unconstitutional spreadsheet?”

To be fair, math is hard. Government is complex. And I’m comfortable with the assumption that everyone who had a hand in building this spreadsheet had good intentions. Venturing a guess, the breakdown happened at the manager/politician/lawyer level.

It is probable that the complexity of the task quickly overtook the abilities of the spreadsheet author(s) and the capabilities of the tool. Eventually, no single person understood how the whole thing worked. Consequently, making a change in one place affected how the spreadsheet worked in n other places and no one was capable of regression testing the beast. But the manager/politician/lawyer types knew what to do: Hide behind the “trade secret” smoke.

There are many lessons from this story. Plenty of points of failure. What I’m interested in writing about is the importance of transparency and how a good set of performance metrics can help in maintaining transparency.

The externally facing opacity in this story is readily apparent. What we don’t (and probably never will) see is the lack of transparency prevalent internally to the Idaho Department of Health and Welfare and whoever designed and built the spreadsheet tool. I’d bet a round of drinks that neither has heard of Agile, much less employed its principles and practices. These by themselves – when actually practiced long term – go a long way toward establishing a culture of transparency. This is the key: long-term practice. A period of time is needed to change behaviors, mindsets, attitudes, beliefs, and, when necessary, personnel. Even over the long term, implementing an Agile methodology isn’t improvisational theater. A strategy and a way to measure progress is needed.

Which gets me to metrics.

Selecting metrics and tuning them over time is critical to measuring team performance and developing improvement plans. Metrics that inform meaningful actions are the goal. Leave the vanity metrics that verify what managers want to hear or already “know” to the competition.

I’ve encountered my share of overly complex ways to measure the performance of individuals and teams. Often the metrics taken from machine-like task work (for example, assembly line work) are applied to creative or intellectual/knowledge tasks. This type of re-purposing results in, for example, counting lines of code or the number of source code check-ins as an indicator of software developer productivity. It never ends well.

When working to define a set of metrics to track an individual or team’s performance it is more effective to begin by asking several questions.

  • What problems are you trying to solve?
  • What questions will your chosen metrics answer?
  • What questions will your chosen metrics not answer?
  • How, specifically, will you know you can trust your metrics? How will you know when they are right and how will you know when they are wrong?
  • How well do your metrics complement each other? That is, by combining them do you end up with a much better picture of individual or team performance than you do by considering individual metrics?
  • Do your metrics support any planned actions for improvement? Are you collecting actionable metrics or vanity metrics?

Finally, it is important to understand the limits of performance metrics. Displaying velocity charts that have fractions of story points implies an accuracy that simply isn’t there. Significantly adjusting project timelines based on the first three sprints worth of velocity data can have adverse secondary effects on the project.

There is no perfect set of metrics, no divine set of measures that match an impossible standard of perfect objectivity and fairness. The best possible set of metrics is one that supports useful decisions rather than simply instructs managers where to apply the stick. They should help show the way to performance improvement rather than simply report results.

I work to have 3-5 metrics, depending on the individual, the team, and the project. Fewer than 3 and the picture starts to look rather flat. More than 5 and the task of performance monitoring can become overly complicated and cumbersome. Keep it lean and manageable. That way, it’s easier to tell when things aren’t working and your metrics are much less likely to violate your team’s constitutional rights.

How to Develop a Team Identity with A/B Testing

If you’ve ever been fit for prescription glasses, you’ve no doubt had the experience of the eye exam where the doctor flips between different lens strengths and asks “Is this better or worse than before?” It’s basically A/B testing.

This came to mind after reading a research paper authored by Dan Gilbert and Jane Ebert [1] and listening to Gilbert’s TED Talk, “The surprising science of happiness.” The key bit, as described by Gilbert:

Let me first show you an experimental paradigm that’s used to demonstrate the synthesis of happiness among regular old folks. This isn’t mine, it’s a 50-year-old paradigm called the “free choice paradigm.” It’s very simple. You bring in, say, six objects, and you ask a subject to rank them from the most to the least liked. In this case, because this experiment uses them, these are Monet prints. Everybody ranks these Monet prints from the one they like the most to the one they like the least. Now we give you a choice: “We happen to have some extra prints in the closet. We’re going to give you one as your prize to take home. We happen to have number three and number four,” we tell the subject. This is a bit of a difficult choice, because neither one is preferred strongly to the other, but naturally, people tend to pick number three, because they liked it a little better than number four.

Sometime later — it could be 15 minutes, it could be 15 days — the same stimuli are put before the subject, and the subject is asked to re-rank the stimuli. “Tell us how much you like them now.”

The result was that their previous #3 was ranked as #2 and their previous #4 was ranked as #5. This reflects what Gilbert calls “synthetic happiness.” Having been denied their #1 and #2 choices, experiment participants were forced to “settle” for a lesser choice. However, having made the choice, they increased their preference for the lesser choice and thereby synthesized happiness with that choice. Just as interesting, the previous #4 choice was pushed further down the scale, as if to put some distance between it and the previous #3 choice – in effect, distinguishing the decision to take home #3 as clearly the better choice.

All this gave me an idea for something to try with a team I’ve been working with that needed to rehabilitate their team identity into something healthier. Typically, teams sour on the idea of going through an exercise like this. The team I was working with was no exception. They likened it to defining team goals – a largely tedious and uninspiring chore.

I wanted to know if I could present two possible team identity statements – A/B style – of which one would be clearly undesirable and the other more in line with what I suspected the team might be comfortable with. The A/B presentation would keep this simple. (Presenting a selection of six team identity statements, as in the experiment with pictures described by Gilbert, would be a non-starter.)

Offering a choice should compel them to choose one over the other. I’m counting on their brains to do what brains do: when faced with a choice, they make one. If I were to present them with a single identity statement and ask, “How would you like to change this to be more in line with the identity you want?”, I’ve every confidence the room would be filled with silence.

The very first presentation had a blank page on the left and my intentionally lame and inaccurate team goal on the right.

The team was well aware “no goal” wasn’t an option and wouldn’t reflect well on their performance review with HR and management. My theory was that when faced with an empty goal and one that was inaccurate, they’d suggest something, however minimal, that was an improvement on the initial goal. This is what happened and the team then spent a few minutes tuning the goal into something a little less cringe-worthy. This began the process of converting the goal from the scrum master’s goal to the team’s goal.

Then I deliberately let a week or more pass.

At the next presentation, the goal on the left was the goal they chose and tuned previously. The second choice was similar but contained one or two slight modifications intended to move the team’s identity in a more positive and healthy direction. Over the course of several months I tested – A/B/eye-exam style – numerous team goals. “Which goal do you prefer, the one on the right or the one on the left?”

So we had a start. From here on out it was just a matter of improvement. Keying off of things the team said or did, I’d modify the “accepted” goal and present it as an option at the next opportunity.

The key driver in this approach, the hypothesis goes, is to set it up so that the team makes the decisions rather than having something foisted upon them. They are virtually guaranteed to reject or strongly resist the latter. With the former, they have ownership in the decision. To reject their decision is to say, in essence, that they made a bad or wrong choice, a bad or wrong decision. In general, people don’t like to admit such a thing, so they stick with a decision – for better or worse – if it’s a decision they made and are responsible for.

Update (2020.04.13)

Another important element in play with this approach is the anchoring cognitive bias, particularly early on. People are much more comfortable making comparisons between things than they are with coming up with something original. By presenting a blank goal and one that reflects a direction in which I want the team to move – from nothing to something positive – the hypothesis is that the team will assimilate toward more positive goals and that this assimilation will become self-reinforcing over time.

References

[1] D. T. Gilbert, J. E. J. Ebert (2002) Decisions and Revisions: The Affective Forecasting of Changeable Outcomes, Journal of Personality and Social Psychology, Vol. 82, No. 4, 503–514



Time Out!

In Estimating Effort – An Explicitly Implicit Approach I stated that time cannot be one of the attributes the team uses to describe what they mean by “effort.” The importance of this warrants the need for a deeper dive into the rationale behind this rule and how excluding time can lead to better predictability for team performance.

The primary objective for coaching teams to think about effort independent of time constraints is so that they can improve their skills for thinking about the actual work involved. Certainly they will spend time completing the work. But the simple passage of time won’t get the work done. Someone has to actually DO something. That something is the effort.

For example, maybe someone on the team says the product backlog item requires a lot of documentation. It isn’t complex and there aren’t any dependencies; it’s just going to take a lot of time – 7 days, maybe. So they want to give that PBI an effort value of 5 or 8 (or 5 or 8 story points, if that’s what you’re using) because it’s going to take a lot of time.

Remember, the purpose of these criteria is to generate a conversation around what the actual effort is. The criteria are just a set of guideposts that help the team hold a meaningful conversation about the effort.  So when someone on a team insists that they estimate using time, I ask them “What are you doing as the time you’ve estimated is passing? Are you just sitting there, watching the seconds tick away?” Of course they aren’t just sitting there. I’m asking the questions to elicit a comment about the actual work they are doing. Maybe they answer with something a little less vague, like “typing words.” That’s good. “What’s the difference between typing those words in a word processor and typing code in Vim?”

Continuing down this line of inquiry usually leads to the realization that typing documentation has many similar traits to coding. It can be complex. It may have dependencies.  It may require research for accuracy and it certainly will need a lot of debugging (professional writers call this “editing.”) Coders typically don’t like writing documentation. To them it’s just about the tedium of banging something out that’s not as fun as code. Sussing out the effort like this will lead to better acceptance criteria and definition of done associated with the PBI.

The downside of time estimates is that they hide all manner of sins and rabbit holes. The planning fallacy, precision bias, availability heuristic, and survivorship bias are just a few of the mental obstacles guaranteed to reduce the accuracy of time estimates. Or you may have to deal with a team member who wants to estimate using time because they know full well it offers the opportunity to hide slow work. (Gamers gotta game.) When teams have run the gauntlet of effort criteria, they are more likely to end up with a better picture of how much work they are being asked to do when time is excluded from the conversation. Effort criteria force the team to be more explicit about the activities they are engaged with as the clock ticks.

The investment in identifying time-independent effort criteria yields further benefits in the retrospective. Was the team unable to complete a PBI in the sprint? Was all the work finished two days early? Have a look at the effort criteria and ask which of them were a factor in making the PBIs a bigger or smaller effort than initially estimated. This is how teams learn and improve their skill at estimating. The better they are at estimating the more predictable their productivity.

OK, so let’s say you have a team doing a great job of determining the effort needed to complete a PBI and they do so without including time. No doubt, management will be unimpressed. They want time estimates. Good news! We can give them time estimates…in two week increments.

With the team focused on figuring out time-independent effort values for every PBI in the backlog, and an ongoing record of how much effort they can reliably complete in two-week increments, product owners can provide a reasonable forecast for when the release or project will be complete. The team focuses on accurate, time-independent effort estimates. The scrum master and product owner worry about the performance metrics and time projections.
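As a rough illustration of that division of labor, here is a minimal sketch of how a product owner might turn the team’s effort values and observed velocity into a calendar forecast. All of the numbers are hypothetical:

```python
# Hypothetical sketch: convert time-independent effort values plus an
# observed per-sprint velocity into a release forecast in two-week steps.
import math
from datetime import date, timedelta

remaining_effort = 120        # sum of effort values left in the backlog
velocity_per_sprint = 23      # average effort completed per two-week sprint
sprint_length = timedelta(weeks=2)

sprints_remaining = math.ceil(remaining_effort / velocity_per_sprint)
forecast = date.today() + sprints_remaining * sprint_length

print(f"~{sprints_remaining} sprints remaining; forecast completion {forecast}")
```

The team never touches the calendar math; it falls out of their effort estimates and their demonstrated pace.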

It’s surprising how hard of a sell this can be for teams. They are hard wired to think in terms of time because that’s what traditional project management has hounded them for since before coding was a thing. I tell teams, “With Agile and scrum, you no longer have to worry about time. That’s the product owner’s job. But you do have to develop very good skills at estimating effort.” It’s common for them to have a hard time adjusting to the new paradigm.

Estimating Effort – An Explicitly Implicit Approach

It is difficult to make predictions, especially about the future. – Unknown

Sage advice.

So why bother estimating the amount of work needed to complete a product backlog item? After all, since estimates are about the future, the probability is high that they will be wrong. Actually, they may very well be guaranteed to be wrong. It’s just that some of the guesses will be more accurate than others. And if they happen to match what the effort ended up being, they just look like they were “right.”

I’ve written in the past expressing my thoughts about estimating the effort needed to complete product backlog items, particularly with respect to story points. I believe working to find a relative gauge of how well teams are estimating work is important. Without it, cognitive biases such as the optimism bias and planning fallacy can significantly distort a project delivery timeline. However, the phrase “story point” is burdened with a lot of baggage. It has been abused and misused such that invoking the phrase often causes more harm than good.

I’ve been experimenting recently with a different approach to estimating effort. The method I’ll describe in this post got a bit of a boost after listening to a recent interview with Psychologist and Nobel laureate Daniel Kahneman. In this interview, Kahneman describes an experience he had while serving in the Israeli army some sixty years ago. He was assigned the job of setting up an interview process that would determine how well a recruit would do as a combat soldier. For this process, he selected six traits and instructed the interviewers to ask questions designed to evaluate each trait independently and score them. The interviewers were not happy with this approach. As a compromise, Kahneman instructed the interviewers, when they were finished asking about the six traits, to close their eyes and just jot down a number they felt matched how good a soldier the recruit might be. What he discovered:

When we validated the results of the interview, it was a big improvement on what had gone on before. But the other surprise was that the final intuitive judgments added, it was good. It was as good as the average of the six traits, and not the same. It added information, so actually we ended up with a score that was half determined by the specific ratings, and the intuition got half the weight. That, by the way, stayed in the Israeli army for well over 50 years. – Daniel Kahneman

This intuitive evaluation made by the interviewers is similar to what Agile methods ask of development teams when determining a value for “story points.” T-shirt sizes, planning poker, dot voting, affinity mapping, and many similar techniques are all designed to elicit an intuitive sense of the effort involved. If there is a disagreement between team members, then a dialog follows to understand what the discrepancy is all about. This continues until there is alignment on what the team believes the effort to be. When it works, it works well.

So on to the details of the approach I’ve been experimenting with. (It doesn’t have a name yet.) The result of this approach is a number I call the “effort value.” The word “value” is a reference to the actual elementary mathematics value being derived. Much like the answer to the question “What value results from adding 2 and 2?” Answer: 4. The word “value” also suggests an intrinsic worth, something beyond a hard number. My theory is that this will help teams think beyond the mere number and think also about the value they are delivering to stakeholders. The word “point” correlates to a hard number and lacks any association to intrinsic worth or value.

Changing the words introduces a simple and small shift that nonetheless has a significant impact. With the change, teams are more open to considering a different approach to determining estimates.

So how is the effort value derived?

I begin by having the team define 4-5 characteristics or attributes that, to them, describe what they mean by “effort.” It is important for the team to define these attributes. By doing so, they own the definition and it becomes much harder for them to dismiss the attributes as “someone else’s” and thereby object to their use in deriving an effort value. These attributes can be anything that is meaningful to the team. Examples:

  • Complexity – Is the work straightforward (e.g. code a bubble sort function) or does it involve interrelated systems (e.g. code a predictive inventory control algorithm)?
  • Dependencies – How dependent is the product backlog item on other backlog items or other teams?
  • Familiarity – Is this work very similar to work the team has done in the past or something quite new? Tasking a coder with documenting a piece of straightforward code may actually be a difficult effort because the coding language they spend most of their day with is familiar whereas writing clear sentences that non-technical people can understand is unfamiliar.
  • Information – Is the detail in the product backlog item complete? Are the acceptance criteria and definition of done clear?
  • Technical Debt Risk – Does the PBI require any refactoring of related code? Is any technical debt being incurred with the PBI?
  • Design Stability – Is there a lot of discovery and exploration needed to complete the PBI?
  • Confidence for Completing a PBI within the Sprint – This category may roll up several categories.
  • Tedium – Perhaps the effort involves a lot of repetitive copy and paste that nonetheless requires careful attention to avoid simple mistakes.

The team can define any attribute they wish. However, there are a few criteria to consider:

  • Keep the list limited to 4-6 attributes. More than that risks turning the derivation of an effort value into the equivalent of a product backlog item navel-gazing exercise.
  • Time cannot be one of the attributes.
  • The attributes should be reasonable. Assessing a product backlog item’s effort value by evaluating its “aura” or the current position of the stars is generally not useful. On the other hand, I’ve listened to arguments against evaluating estimates in terms of “complexity” as being similarly useless. I see the point of those arguments, but my view is that the attributes must first and foremost be meaningful to the entire team. In the end, it’s an educated guess, and arguments about the definition of terms like “complexity” are counterproductive to the overall intent of deriving an effort value.

Each of these attributes is then given a scale, the same scale for each attribute – 1 to 10, 1 to 15 – whatever the team feels is most appropriate. The team then goes through each attribute and scores the product backlog item against it on that scale. (NB: After nine months of Plan-Do-Check-Adapt, a better approach for scoring the attributes has been determined.) The low number on the scale represents very little impact. If dependency, for example, is one of the attributes, then a 1 might mean that the product backlog item is entirely self-contained. A 10 might represent a case where the product backlog item is dependent on several other product backlog items or perhaps the output from other teams.

When this is done, ask the team where on the modified Fibonacci scale they think this particular product backlog item’s effort value should be. If they’re struggling, you can do the math: find the average of all the attributes and match that number to the modified Fibonacci scale. If the average is a decimal, for example 3.1, match the value to the next highest modified Fibonacci number. In this case the value would be 5. Then ask the team if they feel that number is a good representation of the effort value for the product backlog item.
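For teams that like to see the arithmetic spelled out, here is a minimal sketch of the derivation just described. The attribute names, the 1-10 scale, and the ratings are hypothetical:

```python
# Hypothetical sketch: average the per-attribute ratings, then match the
# result to the next highest value on the modified Fibonacci scale.

FIBONACCI = [1, 2, 3, 5, 8, 13, 20, 40, 100]

ratings = {                # each attribute rated on the team's 1-10 scale
    "complexity": 4,       # some interrelated systems
    "dependencies": 2,     # mostly self-contained
    "familiarity": 6,      # newer territory for the team
    "information": 3,      # acceptance criteria mostly clear
}

average = sum(ratings.values()) / len(ratings)             # 3.75 here
effort_value = next(f for f in FIBONACCI if f >= average)  # rounds up to 5

print(f"average {average:.2f} -> suggested effort value {effort_value}")
```

The number the math produces is only a suggestion; the team still decides whether it feels like a fair representation of the effort.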

This may seem like a lot of unnecessary gyrations, but for technical people it’s a simple process they can understand. The bonus is a number they can calculate. The number isn’t what’s important here. What’s important is the conversation that happens around the attributes and what the team feels about the number that results from the conversation. This exercise is meant to develop their intuitive muscles for considering multiple aspects and dimensions behind the “effort” needed for them to get the work done.

Use this process enough times and eventually calculating the average can be dropped. Continue using it and eventually calculating the numbers for the individual attributes can be dropped as well. I don’t know if it’s a good idea to drop the attributes altogether, since they generate the needed conversation around effort, but it will certainly be valuable to reconsider the list of attributes from time to time so as to fine-tune it to match what the team feels is important.

With this approach I’m turning the estimation process on its head (or back on its feet, if Kahneman is right.) Rather than seek the intuitive response first (e.g. t-shirt size) and elicit details later if there is a mismatch between team members, this method seeks to better prime and develop the team’s intuition about the effort value by having them explicitly consider a list of self-selected attributes (or traits) for effort first and then include an intuitive evaluation for effort.

Don’t try to form an intuition quickly, which was what we normally do. Focus on the separate points, and then when you have the whole profile, then you can have an intuition and it’s going to be better. Because people form intuitions too quickly, and the rapid intuitions are not particularly good. If you delay intuition until you have more information, it’s going to be better. – Daniel Kahneman

Update

See Time Out! and Determining Effort Value – Tactics for additional information on this technique.

The Pull of Well-Crafted Product Visions and Release Goals

There was even a trace of mild exhilaration in their attitude. At least, they had a clear-cut task ahead of them. The nine months of indecision, of speculation about what might happen, of aimless drifting with the pack were over. Now they simply had to get themselves out, however appallingly difficult that might be. [1]

In the early 20th Century, Sir Ernest Shackleton led an expedition attempting to cross the South Pole on foot. He was unsuccessful in that attempt. What he succeeded at, however, was something far more impressive. After nearly two years of battling conditions south of the Antarctic Circle, Shackleton saw to it that all 27 men of his crew made it safely home. As Alfred Lansing notes, “Though they had failed dismally even to come close to the expedition’s original objective, they knew now that somehow they had done much, much more than ever they set out to do.”

There is much I could write about the lessons from Shackleton, his crew, and the Endurance that apply to our own individual endeavors – personal and professional. For the moment, I wish to reflect on the sheer clarity of the goal 28 men had in 1915-1916: To survive, by any means and nothing short of complete dedicated effort.

To be sure, their goal was self-serving – no one can judge them for that – and no product team is ever likely to be placed in a situation of delivering in the face of such high stakes. Indeed, the lessons from Endurance are striking in their contrast to the feeble drama so often brought into product delivery schedules. We call them “death marches,” but we know not of what we speak.

One of the things we can learn from Endurance is the power of a clearly defined objective. Do or die. That’s pretty damn clear. Time and time again, Shackleton’s crew were faced with completing seemingly impossible tasks under the harshest of conditions with the barest of resources and vanishingly small chances for success.

What kept them going? Certainly, the will and desire to live. There were many other factors, too. What interests me in this post is reflected in the opening quote. The emergence of a well-defined task that cleared away the fog of speculation, indecision, and uncertainty. Episodes like this are described multiple times in Lansing’s book.

Why this is important to something like a product vision is that it clearly illustrates a phenomenon I learned about recently called “The Goal Gradient Hypothesis,” which basically says our efforts increase as we get closer to our goals. But here’s the rub. We have to know and understand what the goal is. “Do or die” is clear and leaves little room for misunderstanding. “Let’s go build a killer app,” not so much.

From the research:

We found that members of a café RP accelerated their coffee purchases as they progressed toward earning a free coffee. The goal-gradient effect also generalized to a very different incentive system, in which shorter goal distance led members to visit a song-rating Web site more frequently, rate more songs during each visit, and persist longer in the rating effort. Importantly, in both incentive systems, we observed the phenomenon of postreward resetting, whereby customers who accelerated toward their first reward exhibited a slowdown in their efforts when they began work (and subsequently accelerated) toward their second reward. [2]

Far away goals, like a product vision, are much less motivating than near-term goals, such as sprint goals. And yet it is the product vision that can, if well-crafted and well-communicated, pull a team forward during a postreward resetting period.

But perhaps the most important lesson from the research – as far as product development is concerned – is that incentives matter.  How an organization structures these is important. Since most people fail The Marshmallow Test, rewarding success on smaller goals that lead to a larger goal is likely to help teams stay focused and dedicated in the long run. Rather than one large post-product release celebration, smaller rewards after each successful sprint are more likely to keep teams engaged and productive.

References

[1] Lansing, A. (1957) Endurance: Shackleton’s Incredible Voyage, pg. 80

[2] Kivetz, R., Urminsky, O., Zheng, Y. (2006) The Goal-Gradient Hypothesis Resurrected: Purchase Acceleration, Illusionary Goal Progress, and Customer Retention, Journal of Marketing Research, Vol. XLIII (February 2006), 39–58

Center Point: The Product Backlog

If a rock wall is what is needed, but the only material available is a large boulder, how can you go about transforming the latter into the former?

Short answer: Work.

Longer answer: Lots of work.

Whether the task is to break apart a physical boulder into pieces suitable for building a wall or to break up an idea into actionable tasks, there is a lot of work involved. Especially if a team is inexperienced or lacks the skills needed to successfully complete such a process.

Large ideas are difficult to work with. They are difficult to translate into action until they are broken down into more manageable pieces. That is, descriptions of work that can be organized into manageable work streams.

We choose to go to the Moon in this decade and do the other things, not because they are easy, but because they are hard; because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one we intend to win, and the others, too. – President John Kennedy

That’s a pretty big boulder. In fact, it’s so big it served quite well as a “product vision” for the effort that eventually got men safely to and from the moon in 1969. Even so, Kennedy’s speech called out the effort that it would take to realize the vision. Hard work. Exceptional energy and skill. What followed in the years after Kennedy’s speech in 1962 was a whole lot of boulder-busting activity.

A user story is a brief, simple statement from the perspective of the product’s end user. It’s an invitation for a conversation about what’s needed so that the story can meet all the user’s expectations.

In its simplest form, this is all a product backlog is: a collection of doable user stories derived from a larger vision for the product and ordered in a way that allows a realistic path to completion to be defined. While this is simple, creating and maintaining such a thing is difficult.

Experience has taught me that the single biggest impediment to improving a team’s performance is the lack of a well-defined and maintained product backlog. Sprint velocities remain volatile if design changes or priorities are continually clobbering sprint scope. Team morale suffers if they don’t know what they’re going to be working on sprint-to-sprint or, even worse, if the work they have completed will have to be reworked or thrown out. The list of negative ripple effects from a poor quality product backlog is a long one.

