Paint The Room!

I occasionally deliver internal Agile training classes, and one of the things I have struggled with is explaining the concept of Agile estimation. I’ve been searching for a good game to get the point across, but could never find anything that worked well. Sometimes I could work off a real backlog of items and features supplied by the class participants, but more often than not it was almost impossible to find a set of features that everyone in the room could relate to. So some people would get the point, others would not.

One day in such a training session I had a room full of people from various backgrounds and was silently stressing over how to best cover Agile estimation. Then, somehow, two previously unacquainted neurons connected for the first time and in a flash I had the solution.

This game will teach Relative Estimation, Planning Poker, Story Points, Backlog Refinement, Story Splitting and Story Elaboration all in one short 20-minute session. All you need is some Planning Poker cards. It’s so simple you just have to try it to see how effective it is.

If you’re not familiar with how to use Planning Poker, then this page from Mike Cohn (@mikewcohn) is a good start.

I call it the “Paint the Room” game.

Paint The Room

The game is played in whatever room you are delivering the training in. Chances are your particular room has at least four walls, a floor and a ceiling. Good – that’s all you need! Now here is the setup:

Working as a team we have been given the job of painting the room which we are currently in. The customer’s requirement is (intentionally) vaguely stated as “please paint the room” and we have drawn up an initial project backlog which looks like this:

  • paint the north wall
  • paint the south wall
  • paint the east wall
  • paint the west wall
  • paint the ceiling
  • paint the floor

You might of course give each wall a descriptive name such as “window wall”, “entry door wall” etc. instead of north/south/east/west. Many people can’t tell North from Up.

In the game you will represent the Customer/Product Owner when the team has questions about the job. The team has to come up with an estimate to paint the room. Distribute Planning Poker cards to everyone. Keep the number of participants to the size of a scrum team. Everyone else can just watch and still get the full benefit.

The nice thing about the game is that everyone can relate to the task of painting a wall, even if they have never touched a paintbrush. Furthermore, assumptions about what is involved in doing a good paint job vary widely among white-collar knowledge workers.

Establish the “Anchor Feature”

In relative estimation it is useful to have an anchor feature that we can associate a known story point quantity with. Other features will be estimated relative to the anchor feature.

Pick one wall that looks like it is medium effort relative to all the others and tell the team that “this wall is a 5-story-point wall”. That is your “anchor feature”. All other backlog items will be estimated relative to that wall. This establishes what a 5-story-point wall looks like.

Most engineers initially have a hard time separating the idea of Story Points from person-days when we talk about software features, but have no trouble at all accepting the abstract notion of a unit-less “5 Story Point wall”.

Now we’re ready to estimate.

Estimate one wall

Pick one wall that is a little less complicated than the anchor wall and ask the team to estimate how many story points that wall is relative to the anchor wall. Use the Planning Poker method to get input from each team member.

Simply ask “So assuming our anchor wall is 5 Story Points of effort, how many Story Points of effort is this wall?”

Make sure everyone provides an estimate, and that everyone flips their cards all at once. One of the key benefits of Planning Poker is to protect against the influence of expert estimates which may or may not be reflective of what the team can actually do. It’s essentially a variation of Wideband Delphi estimation. And indeed you might get a wide range of responses, from 1 story point to 8 or even 20.

This is where the fun starts.

Talk about the high and low estimates. The value of Planning Poker is in the conversation which brings out unspoken assumptions, and you will really see that in this exercise. As the high/low estimators explain their reasoning for their estimates, you will hear things like

  • I assumed that we had to put masking tape on the trim and around the edges of the windows
  • I assumed that we had to remove all the covers on the electrical outlets
  • I assumed that we have to cover the floor
  • I assumed that the prep work was done (or was not done) ahead of time

In other words, different people make different assumptions about what the job really entails when the job is imprecisely specified, so they come up with different story point estimates. That is exactly what happens when software engineers with different skills and specializations give estimates on vague requirements. Many times you will find that there is an experienced handyman in the team, and he will be on one end of the scale relative to the others. That is ok, it’s usually that way with software experts too.

The other thing that usually happens is that the team starts spurring each other on with more clarifying questions:

  • are we using rollers or brushes?
  • do we assume that all preparation/masking tape is done before the job?
  • do we all work on one wall at a time, or several in parallel?
  • do we have to purchase supplies first?
  • Should we move the furniture out of the room?

and so on.

The conversation will flow back and forth a little as the team considers the now-spoken assumptions behind the high and low estimates. As the facilitator you just have to sit back and listen as the team starts peeling the layers off the onion. If the team gets stuck, you might volunteer some of the questions above, but never solutions. Your job is to make sure that the team talks about their assumptions and agrees on the how between themselves.

It’s a game, so don’t waste the opportunity to have some fun with it. If you get questions from the team, don’t be shy to throw in new requirements or constraints to make the job harder. “No you can’t take the windows out of the window frames before painting” or “yes of course you must remove the windows from the window frames before painting” and “well yes, of course I want the floor painted in a black-and-white chequer-pattern. What did you think?” are good conversation-starters. The team may groan all they want, but every small requirement change makes the game a little more like reality.

As the team discusses their individual assumptions they will probably identify some new tasks. Most teams end up adding new items to the backlog:

  • remove electrical outlet covers
  • put masking tape around the windows
  • put electrical covers back on
  • Purchase supplies
  • clean up

After a few back-and-forth comments you might get to a point where the team has a new common enhanced understanding:

  • yes we will remove the electrical outlet covers
  • we will use rollers for painting large areas and special brushes around the windows
  • we will not cover the floor (it will be painted last)
  • etc.

Not too different from talking about how a given feature works, what the hidden requirements are and so on. The actual agreements and assumptions are not important, as long as the team ends up with a better shared understanding of the task. The new knowledge is an illustration of Story Elaboration, and you might even want to take the opportunity to specify some Acceptance Criteria, such as “no paint smudges” and “even color application” (might require two coats).

Do another round of Planning Poker estimation. The second round of estimation should have a tighter range since we have now had some conversations and clarification about the “requirements”.

Discuss the high/low estimates again and do one more round of estimations if needed. If you picked a good “anchor wall” and picked a slightly less complicated wall to estimate, then the Story Point estimates should settle around 3 or 5 Story Points. Good enough, let’s move on.
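The rounds above can be sketched in a few lines of Python. This is only an illustration: the names, the votes, the card deck and the "adjacent cards" convergence rule are all assumptions made up for the example, not part of the game as I run it.

```python
# Hypothetical card deck (a subset of a typical Planning Poker deck).
DECK = (1, 2, 3, 5, 8, 13, 20)

# Hypothetical votes for one wall, relative to the 5-point anchor wall.
round_one = {"Ana": 1, "Ben": 3, "Chloe": 5, "Dev": 8, "Eli": 20}
round_two = {"Ana": 3, "Ben": 3, "Chloe": 5, "Dev": 5, "Eli": 3}


def outliers(votes):
    """Return the names of the lowest and the highest estimator."""
    low = min(votes, key=votes.get)
    high = max(votes, key=votes.get)
    return low, high


def has_converged(votes, deck=DECK):
    """Assumed rule: converged when all votes sit on the same or adjacent cards."""
    positions = [deck.index(v) for v in votes.values()]
    return max(positions) - min(positions) <= 1


low, high = outliers(round_one)
print(f"Ask {low} and {high} to explain their assumptions")
print("Round one converged:", has_converged(round_one))
print("Round two converged:", has_converged(round_two))
```

The code only points at who should talk first. The value is still in the conversation that follows.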

Estimate the ceiling

Now that the team has seen the Anchor Wall and produced a converged estimate for a simple wall, you ask the team to place a story point estimate on painting the ceiling. Again, it must be relative to the Anchor Wall of 5 Story Points.

Discuss the high and low estimates. Some people will estimate the ceiling as being easy, others will see added complexity in simply reaching high, using ladders, how to deal with light fixtures etc. In any case you will find more un-spoken assumptions that only come to light when you talk about the estimates.

This is also a good opportunity to point out that the customer requirement initially was just to “paint the room”. Nobody gave specific direction on what to do about things like the light fixtures. It’s a perfect way to illustrate that it’s critical to go back to your customer and clarify what the expectation is. I usually respond to the team, when they ask, that I like the way the light fixtures look so I don’t want paint all over them. Therefore I want the light fixtures removed and then put back in place after the ceiling has been painted. You can then show that it’s ok to split the story into three: (1) remove fixtures, (2) paint the ceiling and (3) reinstall fixtures. That is a lot more work than the initial expectation (do we need an electrician?), but hey that’s R&D. Sometimes we uncover unexpected complexity. Update the project backlog accordingly.

Check the backlog

As you have gone through the estimating process a couple of times you will see that the backlog has grown with new entries, and some stories have been split into smaller stories. That illustrates the mechanics of backlog refinement. In every case the refinement was a direct result of discussing with the customer and/or uncovering hidden assumptions.

Wrap it up

I never go through the whole backlog, but stop the game after estimating two or three backlog items. By then we have covered Relative Estimation, Story Points, Story Elaboration, Story Splitting, Planning Poker and Backlog Refinement. Not bad for 20 minutes of poker!

I’ve run the exercise a few times now, and every time the team has come out with a better understanding of how to use Agile estimation techniques in their projects. Hopefully it works for you too.

If it does, let me know!


Scaling Agile – it’s not the same problem

I was prompted by Al Shalloway’s (@alshalloway) brief tweet this week combined with a long train journey to write this blog post.

That is exactly one of the key ideas in Nassim Nicholas Taleb’s (@nntaleb) Antifragile. The kinds of problem you get at scale are not the same kinds of problems you deal with in small numbers. To quote Nassim Taleb:

“A city is not a large village, a corporation is not a larger small business”.

Likewise a single Agile team does not behave the same way that a project team made up from several Agile teams does. The method and style of communication changes, as does the kind of risks and issues we deal with. It’s a transformation into something new, not a simple multiplication.

Why scale breeds complexity

If interactions grew linearly with the system size, then scale wouldn’t be such a problem. Each time you add a new member to the system, the complexity (C) increases by one since you only add one more possible relationship as visualized below:

[Figure: linear growth]

Unfortunately as real systems grow in scale, their complexity grows much faster than linearly because each component influences or interacts (directly or indirectly) with many others. That is, each member you add to the system can form a relationship with every member already there, so every addition brings more new interactions than the one before it. It’s an old idea that is surprisingly often neglected by those that design systems. You can visualize how the number of possible interactions grows as we add one more circle to each diagram below:

[Figure: geometric growth]

Complexity is a property of the behavior of the system, not the structure of it. That is, it’s not the number of components that makes it complex, but how they interact – as visualized by the number of lines in the diagrams above.
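To put rough numbers on the lines in those diagrams, here is a minimal sketch that counts only the direct one-to-one links between members. That is a simplifying assumption: indirect and group interactions are ignored, so treat the result as a lower bound. The sample sizes are arbitrary.

```python
def pairwise_links(members: int) -> int:
    """Number of distinct pairs among `members` people: n * (n - 1) / 2."""
    return members * (members - 1) // 2


# Arbitrary sample sizes, from a pair up to a large organization.
for size in (2, 5, 9, 12, 150):
    print(f"{size:>3} members -> {pairwise_links(size):>5} possible links")
```

Going from 5 to 9 members roughly doubles the head count but more than triples the number of possible links (10 versus 36).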

Conversely we may have systems that have a complicated structure, but yield simple behavior. For example, a mechanical pocket watch has a complicated structure but very simple and (thankfully) predictable behavior.

So it’s complexity resulting from interactions and behaviors we have to be concerned about. This is why we see different types of problems at different levels of scale.

Aside: scale up and down

In order to be considered “scalable” the transformation must be two-way. I have seen software systems, for example, that are designed to “scale” but can’t run on modest hardware. That is not scaling, that is just big and bulky. Somehow we forget that scaling is not the same as simple expansion. To me a good scaling implementation is a dynamic one, that can scale both up and down. If you can only go one way, it’s not “scaling”.

So that’s the problem: the behavior of a small group is different than that of a large group, which is again different from that of a group of groups… and the effect shows up sooner than you might think because of the number of new communication paths.

The oft-cited Dunbar’s number says that people can have some sort of social relationship (but not necessarily a deep relationship) with up to around 150 people, but we see behavior changes way sooner than that. The upper bound of a Scrum team (9 people) is based on experience and seems to be re-affirmed time and again. Beyond that people see themselves as part of a mass, not a team. In my organizations I never had a manager with more than 10-12 direct reports.

Sometimes trouble shows up at even smaller numbers. It is said that behind every successful man stands a woman, behind every failed man there are two. I think that joke works with Scrum teams and Product Owners also, except it’s not funny.

Scale the problem down

This is where Agile and Antifragile meet again: big problems are best solved when you can scale them down and distribute the difficulty. The secret to successfully executing big projects is not to scale the project team up, it’s to scale the problem domain down.

What I mean by that is that we shouldn’t simply look at the project requirements and then figure out how to scale a team to build the whole thing all at once. Approaches such as Minimum Viable Product (MVP) and applying Design Thinking to focus the problem space can save a lot of time, effort and money by applying resources only to what is critical and important.

What about decision-making? The problem of scaling decision-making is a tough one. Many times we don’t even consider that we can change the way decisions are made, and so we fall back on a central person or core team that is responsible for making all the individual decisions. It’s a fragile setup because these teams are disconnected from what happens on the ground.

Scaling decision-making

To me every scaling problem is a delegation and distribution problem. The most ineffective way of all is when someone decides to distribute workload but is unwilling to delegate authority. Micro-managers would object to the micro-manager label except they are too worn out from trying to apply their bottleneck everywhere at once… and they don’t read my blog anyway. When you’re too close to the tree you can’t see the forest.

So what can we do? First of all, recognize that you can’t possibly keep more than half a dozen balls in the air at one time. Divide and Conquer, distribute, subdivide… do what it takes to bring things into manageable chunks. This is after all what Agile methods do: break things down into manageable smaller pieces of work.

Distribute Distribute Distribute

Distributed systems, whether they are mechanical, software, political or social, have some compelling properties. They are individually self-sufficient, resilient and effective. Centrally organized systems on the other hand are attractive only on paper or at very small scale. They work fine as long as the “central” part is available and capable of managing the information flow, but quickly break down and become a bottle-neck in any but the simplest projects.

So work obviously needs to be decomposed and distributed amongst multiple teams. Nothing new here, whether you distribute according to traditional functional teams or cross-functional Agile teams it has to be done.

It is not enough to just distribute the work, you also have to distribute authority and decision-making when you’re dealing with more than a couple of teams.

Central Mission command, local tactical decisions

One of the most powerful and elegant ideas in Don Reinertsen’s (@DReinertsen) well-equipped arsenal of golden nuggets is the idea of Mission Command. Instead of developing a detailed plan to be followed by all and centrally managed, we set higher-level objectives and let individuals and teams self-organize (and even improvise) and decide how to achieve the objectives.

Central mission command is very much different from central micro-management. We centrally decide on the overall (higher-level) objective to be achieved, then delegate responsibility, authority and decision-making for how to reach the goal to each team – while accountability still rests centrally.

Central command and local decision-making is not enough either. Even when teams self-organize around a central higher-level goal, their individual approaches may create trouble later down the road. So we can distribute decision-making and authority, but how do we ensure that everyone keeps the overall integrity of project and company in mind at any given time? And how do you retain central accountability?

Use decision-rules

As teams self-organize around delegated and common objectives, they don’t just need to meet their goals, they need to do their work and act in accordance with the company’s long-term interest and in concert with other teams. An orchestra only works if everyone plays in time with each other.

I recall with fondness my son’s first tuba recital in middle school band. There were 4 tubas on stage, and he finished first. Well what can I say, he’s competitive and has since grown into a splendid college athlete. He knows how to play in beautiful concert with his team on the field now but I still smile when thinking about that first recital.

Orchestration and alignment is needed both for the project goals and in the general backdrop of the company. For example, a team might decide to achieve their part of the project goal by using an open-source software package. Although that may result in achieving the project objectives, the team may have inadvertently put the company’s intellectual property at risk. Not all open-source licenses are the same. By simply using “freely” available software, you may be automatically entering into an agreement where your company must agree to make parts of their system software freely available to the rest of the world.

You can’t manage dozens of individuals by central decision-making, but on the other hand you can’t simply let everyone loose on their own either. One way to effectively sub-divide and delegate is to create a set of decision-rules that aligns everyone on how to make decisions. Think of it as a set of guide-rails that prevents you from falling off the path.

Identifying decision-rules is not the same as identifying who is responsible for making decisions and establishing an escalation path. That is not scalable either. It is all about agreeing on how teams will make local decisions on their own, what they can decide on their own and what must be decided centrally. Such rules can enable teams to make decisions consistent with the overall command objectives. These rules are set centrally, and are how the “central command” retains accountability for the decisions made in the project.

One of the best examples of decision-rules I have heard about came out of the Boeing-777 development program. When you design airplanes you have to make tradeoffs between weight, cost and space. If you increase the weight of your subsystem then it becomes a big deal since every ounce translates into increased operations cost for the customer. Every subsystem and every engineer is faced with these kinds of tradeoffs on a daily basis. How do you manage the choices of several thousand engineers all at once? Obviously a central design review committee wouldn’t work effectively. So the project team set “budgets” for each subsystem development team as to how much weight, cost and space they were allocated. If a team needed to exceed their weight allocation, they could do that as long as they found another team that would be willing to trade some of their weight against, for example, some additional cost. That way the weight/cost/space constraints of the 777 airplane were always satisfied overall. The individual teams could trade allowances between themselves as long as they stayed within the overall budget. They didn’t have to get every tradeoff approved centrally – they used decision-rules.
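The trading rule can be sketched as a small piece of Python. Everything here is hypothetical: the team names, the units and the numbers are invented for illustration and do not come from the 777 program. The point is only that a trade between two teams leaves the overall totals unchanged, which is why nobody has to approve it centrally.

```python
from dataclasses import dataclass


@dataclass
class Allowance:
    """Hypothetical per-team budget; units and values are illustrative only."""
    weight_kg: int
    cost_usd: int


def trade(budgets, giver, taker, weight_kg, cost_usd):
    """Move weight allowance from giver to taker in exchange for cost allowance."""
    if budgets[giver].weight_kg < weight_kg or budgets[taker].cost_usd < cost_usd:
        raise ValueError("a team cannot trade away more allowance than it holds")
    budgets[giver].weight_kg -= weight_kg
    budgets[taker].weight_kg += weight_kg
    budgets[taker].cost_usd -= cost_usd
    budgets[giver].cost_usd += cost_usd


def totals(budgets):
    """Overall (weight, cost) allowance summed across all teams."""
    return (sum(b.weight_kg for b in budgets.values()),
            sum(b.cost_usd for b in budgets.values()))


budgets = {
    "avionics": Allowance(weight_kg=400, cost_usd=2_000_000),
    "cabin": Allowance(weight_kg=900, cost_usd=1_500_000),
}
before = totals(budgets)
# Avionics needs 25 kg more, and pays the cabin team in cost allowance.
trade(budgets, giver="cabin", taker="avionics", weight_kg=25, cost_usd=100_000)
after = totals(budgets)
print("Overall budget unchanged:", before == after)
```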

So instead of creating an elaborate escalation path for decision-making, create the guide-rails within which decisions can be made at the right level in alignment with the overall mission of the company.

The Agile Manifesto and the principles behind it form another example of such a decision-making framework. For every choice or decision to be made, every engineer or team can ask themselves, for example, “does what I am about to do satisfy the principle of simplicity?”, or “if we take this approach will we be able to show progress in terms of working software?”. Similarly your company and project will have a different set of decision-rules that guide teams and individuals in making local decisions.

So is that it?

No that’s not all of it by a long mile. But it’s a start. If you can

  • understand that different levels of scale require a different approach, and
  • distribute both work and decision-making authority, and
  • create a good set of decision-rules that can be used to align everyone,

…then you’ve laid the foundations for a much less stressful environment at scale.

The Formula-1 Pit Stop: Lean counter-counter-intuition

What do F1 racing and Lean Product Development have in common? Not much at the surface… but if you, as I do, see everything as systems then you can’t help but notice some interesting things. In this case I found an apparent counter-example to Lean that is Lean despite going against the grain of traditional Lean Thinking. Hmmm… we’re into double-negatives here but stay with me.

The racing analogy is interesting to me not because “Lean=Speed” but because someone questioned the “obvious right way” and came up with a counter-intuitive better solution.

I am always amazed every time I catch even a glimpse of Formula-1 racing. The cars fly around the track at well over 300 kilometers per hour, pull 1.45g during acceleration and 4g when braking. High speeds, tight turns and frequent acceleration and braking wear hard on car and driver, but even more so on the tires which aren’t even engineered to last a whole race.

An F1 tire these days is designed to only last for about 120 kilometers on average (it’s a weight vs. durability tradeoff), but most F1 races are at least 305 kilometers long. That means you need to change tires 2 or 3 times in a race that is won or lost by fractions of a second.

I’m fascinated by this, of course, because the pit stop is the biggest impediment to continuous flow around the track. If you could make one less pit stop than your competitor you would be several seconds ahead, turning 10th place into victory. For a long time that was the strategy: go easy around the curves so as to conserve tires and fuel. A lower average speed would win the race as long as you could avoid making too many pit stops. So far it sounds Lean, right? Slow down and you’ll finish the race faster. No surprise there: Lean solutions are usually counter-intuitive.

Twisting and Turning

Well, every good story has a twist and our little F1-Lean analogy is no different.

In the mid-1980s someone took a step back and looked at the whole end-to-end system and realized that the “economy” of racing could be improved. Start the race with only half a tank of fuel, and the car would be much lighter and go faster. Stop worrying about conserving tires and instead push the car to the limits on the track. The penalty for this strategy is an added number of pit stops.

Not a problem – you just need to minimize the time spent in each pit stop.

It’s a tradeoff curve, as always. If you can continuously reduce the amount of time needed to refuel and swap tires, then at some point down the curve the wear-and-tear vs. pit stop balance will shift.
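A back-of-the-envelope model makes the shift visible. All numbers below are hypothetical (race length, lap times, number of stops) and the model ignores everything except lap time and time lost in the pits, so read it as a sketch of the tradeoff, not as real race data.

```python
def race_time(laps, lap_seconds, stops, pit_seconds):
    """Total race time: time on track plus time lost in the pits."""
    return laps * lap_seconds + stops * pit_seconds


def break_even_pit_seconds(laps, slow_lap, fast_lap, extra_stops):
    """Pit stop duration at which the extra stops exactly cancel the faster laps."""
    return laps * (slow_lap - fast_lap) / extra_stops


LAPS = 60          # hypothetical race length in laps
SLOW_LAP = 91.0    # hypothetical lap time when conserving tires and fuel
FAST_LAP = 90.0    # hypothetical lap time when pushing a lighter car

for pit in (40.0, 30.0, 20.0):
    conserve = race_time(LAPS, SLOW_LAP, stops=1, pit_seconds=pit)
    push = race_time(LAPS, FAST_LAP, stops=3, pit_seconds=pit)
    print(f"pit stop {pit:>4.0f}s: conserve {conserve:.0f}s, push {push:.0f}s")

print("break-even pit stop:",
      break_even_pit_seconds(LAPS, SLOW_LAP, FAST_LAP, extra_stops=2), "seconds")
```

With these made-up numbers, conserving wins while a pit stop costs more than 30 seconds, and pushing wins as soon as the crew gets below that.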

Now it gets interesting. Because we now have two different models of “good” racing strategy, we have to choose – but how? We have to take a systems view, and make the decisions based on the objective merits of each strategy, not by intuition or personal preference.

Yes but we’re talking about Product Development, right?

You see analogies in Product Development all the time. Whenever there is a bottle-neck in our process we have to decide to either fix/improve the bottle-neck, or try to avoid it. Not all bottle-necks are solvable or even visible. Some are disguised as “the way we always do things here”. Most companies have settled on a particular pattern of which bottle-necks (departments/phases) are reasonable (acceptable as the cost of doing business) and which ones aren’t. Companies that structure the development flow through phases and gates accept the overhead associated with functional departments as “not perfect but the best way to do things”.

Lean Thinkers challenge this view all the time: figure out where the flow stops and then improve it. Usually it comes down to finding local optimizations and then reducing or eliminating tasks in the name of overall flow. For example, eliminating pit stops so that the race can flow un-interrupted.

And sometimes Lean Thinkers have to challenge themselves, to avoid getting stuck in the “best” Lean solution.

So where did Formula-1 end up?

Have a look at this Ferrari pit stop from 2013. Mid-race refueling is no longer allowed in F1, so now it’s all about how quickly you can get 4 new tires on the car. You can feel the anticipation as you watch the pit crew waiting for the car to arrive.

That pit stop took 2.1 seconds. It’s a huge improvement from the minute-plus pit stops of the early days of F1 in the 1950s. Pit crews spend a lot of time and money to squeeze out every millisecond they can from their process. You can almost visualize a value stream map on the garage wall and the team swarming to find and reduce the next bit of waste from the process.

Is it a fair analogy?

But, many will say, that’s not a fair analogy. Obviously they have a lot of specialized equipment and a huge crew of specialists standing by. This is high-stakes racing and has nothing to do with Lean or product development.

Well that’s the whole point behind this blog post. It’s a classic example of how Lean thinking differs from mass-production thinking. It just manifests itself in a different environment. Speed (and safety) matters in F1, and speed (and safety) matters in Lean.

Systems of systems: Lean Fractals

There is a very direct and obvious application of Lean Thinking and Value Streams to what happens in the pit. If you have seen a Value Stream Map, you can easily understand how that helps weed out waste and inefficiencies in the process flow of safely and quickly changing the tires.

But there is another higher-level system at play, which includes both the pit stop and the track. This system-of-systems has a different kind of flow economy, working off a different set of aggregated information.

If we ignore the system-of-systems effect and blindly apply Lean tools it can lead us astray. Pre-1980s racing solved for Lean flow based on how the costs were incurred back then, i.e. relatively slow pit stops. Slow down around the track and finish in first place. However the cost of any activity changes over time, and the technology and capability of the pit crew improved such that the basic assumptions behind “the best way to race” had to change.

In this case they had to question some long-held beliefs and assumptions in order to get to a new level of performance. Similarly we find the biggest improvements hiding in plain sight when we look at our overall product development system and question our traditional way of working. Our world is full of systems-of-systems.

I draw two lessons from the F1-Lean analogy:

Question the Status Quo

The obvious approach isn’t always right, and you won’t see it until you look at the whole end-to-end system. It is counter-intuitive that adding one more time-consuming pit stop to your race will speed things up overall… but it does – up to a point.

Even if you settle on a “best” Lean solution, this might also be a local optimum… keep looking and keep questioning. The journey through solution space is not always a linear one.

Invest in non-mainstream efforts

In order to lean out the value stream you may have to invest and focus on non-mainstream efforts such as tooling and supporting activities. This is the stuff of overhead. It’s hard to justify increasing the overhead cost when the normal pressure is to reduce expenses. But in a Lean system the right overhead isn’t a liability – it’s what enables the mainstream to go fast. The F1 teams had to shift investment away from car and engine design and onto the less-glorious pit crew tools and processes.

The difference is in the kind of overhead: instead of traditional overhead which is needed to manage the waste generated by stitching together the work of separate departments, Lean “overhead” is there to make the value stream flow faster at higher quality.

Yes it’s a fair analogy

To return to the point above, yes the analogy is totally fair. You can usually make your product development progress much faster if you invest in ancillary processes and tools with an eye towards end-to-end Flow. The Ferrari team in the video clip shows 21 crew members, all working frantically for less than 3 seconds. It takes 3 people per tire to do the job. Wasteful resource utilization? Not if time matters and your objective is to get the car through the pit stop quickly. Sure, we obviously can’t allocate 21-person support teams for all tasks, but the idea is to figure out what it takes to achieve the best end-to-end flow and then invest accordingly.

When I first started developing software we had bi-weekly load builds because it was such a difficult thing to get a clean build with 40 engineers all submitting several weeks of code changes, and build servers were very expensive. We avoided load build “pit stops” until they were absolutely needed.

Gradually the situation changed, and we now have efficient continuous build and test environments and the ability to submit code changes daily. The “economy” of our pit stop changed. It was initially not easy convincing management to invest in extra build servers, tools and staff – but eventually we met the point on the tradeoff-curve where investing in ancillary things like load builds made more and more sense. Now every serious software group has a DevOps setup.

Local vs. Global Optimization

There is a balance between local and global optimizations, and the balance can shift over time. You can’t even grasp the concept of that balance unless you look at the end-to-end system flow. Chances are that you are only looking at improvements within your own department. If you are, then you might routinely leave improvements of an order of magnitude or more on the table. Look one level up, at the system-of-systems, and see what you can find.

If it’s a Pipeline, it’s leaking

Many times we view the Product Development System as a Pipeline where we pour effort and energy in, and out comes a product sometime later. You’ve probably used this analogy before, talking about “products in the pipeline” or “the R&D pipeline”.

Pipe-1

Seems pretty intuitive, and I use that analogy too. Except I recently thought perhaps the analogy isn’t quite right. If you’re working in a Waterfall or phase-gate process, it’s not a single pipeline. It’s a series of smaller pipe lengths which are joined together by hand-offs:

pipe-2

The trouble with hand-offs is that they generate waste. Throughout the journey, more energy can be lost in hand-offs than actually makes it out of the pipeline. In every joint, effort and energy leaks out.

Pipe-3

I find this analogy is a little more fitting, and although it’s a simple visual it helps make the point about hand-offs at the simplest possible level. The discussion usually turns to “what are the leaks and how do we stop them?” and there is your entry to discuss Lean and Waste.

What do you think?

Agile at Boeing in the 1990s – the 777 Program

The year 1995 recorded two seemingly unrelated events: the entry of the first Boeing 777 airplane into commercial service and the introduction of Scrum to the world. As the twists and turns through history go, Boeing was Agile before its time.

I love the Boeing 777. I have flown more than a few miles over the years, and for the majority of them it has been the 777 that carried me and millions of other passengers safely across oceans and continents. For most of us the flying experience is judged by the quality of the food and in-flight entertainment options, and whether the flight is on-time or not. We don’t pay much attention to what airplane we’re in and just want to get to where we need to go, but somehow I have grown fond of the 777. It’s like an old reliable friend now, so when I see the 777 on my itinerary I don’t mind the flying part so much.

But this blog isn’t about someone’s creepy love story with an airplane – although that could be interesting enough. It’s about Product Development.

How do you develop an airplane like the 777? It turns out the story of the 777 development program is even more interesting than the plane itself. The 777 program included elements that were both Iterative and even Agile. I enjoyed learning about the program from various sources and found it hard to reduce it all into a single blog post. There is a lot of information out there if you are interested, including a really good 5-part PBS documentary “21st Century Jet – the Building of the 777”. See the bottom of this blog post for some good places to start.

The 777 Development Program

In the late 80’s Boeing was trying to decide on how to fill the product line gap between the 767 and 747. One option was to evolve the 767 design but in the end it was determined that a completely new aircraft was needed. It’s a big investment – an estimated $5BN for the 777. Even though Boeing was in a strong financial position in 1990, a $5BN expenditure would sink any company if it wasn’t successful.

It’s tempting to avoid any kind of risk when you’re in that position but Boeing broke many barriers in this next-generation aircraft design, both in engineering and manufacturing. The 777 was Boeing’s first “fly-by-wire” (computer-assisted control) aircraft, had the first fully computerized cabin, and was also the first airplane ever designed using 3D CAD systems. It seems impossible today to understand how it could be done any other way, but before the 777, airplanes were designed with 2D drawings and form and fit could not be tested until the first prototype was built.

The old tried and true approach had its share of familiar problems: in Boeing’s previous development effort, on the 767, an estimated 13,000 individual design changes (large, small and tiny) were made to the door assemblies at various stages in the development process. The 777 would be even worse if something didn’t change.

10,000 engineers were involved in the 777 program spread across the world in the US, Europe, Japan and Australia. Talk about coordination nightmares and opportunity for error. Large-scale project, indeed.

The 777 program was incredibly successful, as evidenced by the timelines and the results. Conceptual design started in January 1990. Manufacturing of the first prototype started in early 1993. The 777 had its first flight on June 12, 1994 and less than a year later (May 1995) the first plane was delivered to United Airlines. Boeing estimates that the number of changes and errors was reduced by 80% compared to previous design projects.

Another first for Boeing: their very first airplane delivery was accepted by United Airlines on the first walkthrough, with only a handful of smaller defects to be noted. That’s beyond remarkable for any complex system, and difficult to comprehend given the safety concerns and reliability requirements that an airplane has to satisfy. A defect-free delivery of the very first airplane on-time as promised speaks volumes about any complex development program. I don’t understand airplanes but I know complex systems don’t come together very easily.

If that wasn’t enough… the very first 777 prototype plane built (WA001) was good enough to be updated and eventually put in service by Cathay Pacific in 2000. Not bad for a prototype!

Alan Mulally

The engineering story of the 777 is also the story of Alan Mulally. The management team was initially led by Phil Condit as the executive in charge of the 777 program and Alan Mulally as the director of engineering. Alan Mulally would later replace Phil Condit as the person overall in charge, and it is Mulally’s management style that formed the framework of the 777 program and the “Working Together” model.

A hint of Agile

The first hint of Lean/Agile mindset can be seen in one of Mulally’s weekly program review meetings. A partial glimpse of what looks like program/meeting rules is barely visible, but you can make out at least some of the principles:

  • Plan for zero overtime [Agile: sustainable pace]
  • Weekly DBT reviews [Agile: weekly scrums]
  • Panic Early [Agile: fail fast]
  • Quality and/is Schedule [Agile: Quality is a critical ingredient]
  • Make decisions faster [Agile: act and learn fast]

meeting-rules

Working Together

The labels “Agile” and “Scrum” were not yet born when Boeing kicked off development of the 777 in January of 1990. However the management team knew that something had to change. Boeing’s environment had become bureaucratic and department-focused. Specialists in various departments would design their own parts and then it was up to the manufacturing team (the system integrators) to figure out how to make it all come together. It was a “throw-it-over-the-wall” environment where miscommunication and disconnects were a constant problem.

This time, Boeing would work closely with their customers to design the airplane, and would also tear down the walls between departments by organizing their own workforce across discipline boundaries. People that were normally separated by organizations or development phases would now be engaged together and at the same time, talking and collaborating in real-time.

“Working Together” essentially boils down to “cross-functional” teams but it meant more than that. It meant working closely -really closely- with the customer airlines and suppliers. It signaled a radical departure from the bureaucratic project organization of the past and set completely new expectations going forward. Today we would call such a thing an “Agile Transformation”.

“Working Together” foreshadowed Agile, and the model is -unfortunately- still lightyears ahead of the majority of product development teams on the planet.

Although the 777 program was planned and executed in an overall phase-gate process with key milestones, integration points and deadlines, we can see many elements and attitudes which are definitely Lean/Agile-inspired, if not outright Agile. Looking at the 777 program through the lens of the Agile Manifesto we can recognize many familiar concepts.

Individuals and Interactions over Processes and Tools

The Working Together model pulled people from many disciplines together in what we today call cross-functional teams. This was a radical departure from the way things used to work, where design engineers would design their piece in isolation, then throw their designs over the wall to the manufacturing team – waterfall-style. This was the mode of operation in Boeing in 1990, and chances are this is also the mode of operation in your company today if you work on a project of any significant size.

To address the problem of communication, Condit and Mulally organized the program in cross-functional Design-Build Teams (DBTs). There were almost 250 DBTs on the program. DBTs were formed around functional areas of the airplane: for example there was one DBT for the engine, one for the cargo door, one for the passenger door, one for the leading wing edge, one for the trailing wing edge, one for the flaps, one for the rudder etc.

Each DBT was staffed with the right people that could carry a particular design from concept through manufacturing and maintenance. Teams had design, manufacturing, tooling, finance, materials, maintenance, subcontractor representatives and even customer representatives to make sure that every aspect of operation, manufacturing and maintenance was covered.

This gives us a hint as to how we can organize Agile at large scale. You can’t possibly coordinate or co-locate 10,000 engineers, but you can make sure that the individual DBTs are co-located and then coordinate the DBTs. DBTs were organized in a cascading hierarchy according to how functions of the airplane could be decomposed. For example, there were ten DBTs responsible for the wing’s trailing edge, including a DBT each for Inboard Flap, Outboard Flap, Aileron, Spoilers and so on.

One potential problem of decomposing the project into DBTs is that individual teams lose track of the whole design. Weekly DBT reviews helped counteract that, but Boeing realized that the teams always need to be bonded together by a higher-level goal. When you are invested and care about something, you look out and make sure your work, and the work of the person next to you, is at the highest level.

To achieve that alignment Mulally did something incredible which I still have a hard time imagining. He pulled together all 10,000 engineers working on the 777 program for all-hands meetings once a quarter. By mid-1993 the 9th (!) all-team meeting convened in Seattle. That’s 9 of those in 30 months…

We can only speculate how much it cost Boeing to bring everyone on the team together once per quarter, but Boeing understood the real economics of product development: total team alignment allows you to reduce errors and disconnects and avoids prolonged delays. The cost of bringing everyone together for 2 days (10,000 x 2 days of salary plus travel) is much smaller than the cost of a one-month delay to the program (10,000 x 30 days of salary plus downstream fallout).

When making resource vs. time decisions, you always need to consider the burn-rate and cost of delay for the project. The burn-rate of a 10,000-person program is much higher than the cost of these all-team meetings.
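As a rough illustration of that comparison, here is a back-of-the-envelope sketch in Python. The headcount and the 2-day and 30-day durations come from the text above; the daily cost and travel cost per person are made-up placeholder values, not Boeing figures, so only the shape of the result matters.

```python
# Back-of-the-envelope comparison of an all-team meeting vs. a schedule slip.
# HEADCOUNT and the durations come from the post; the dollar values are
# hypothetical placeholders chosen only to make the arithmetic concrete.
HEADCOUNT = 10_000          # engineers on the program
DAILY_COST = 500            # assumed fully loaded cost per person-day, USD
TRAVEL_PER_PERSON = 1_000   # assumed travel cost per attendee, USD


def all_team_meeting_cost(days=2):
    """Salary for the meeting days plus travel, for everyone on the program."""
    return HEADCOUNT * (days * DAILY_COST + TRAVEL_PER_PERSON)


def delay_cost(days=30):
    """Burn-rate cost of a schedule slip; downstream fallout is left out."""
    return HEADCOUNT * days * DAILY_COST


meeting = all_team_meeting_cost()
delay = delay_cost()
print(f"One all-team meeting: ${meeting:,}")
print(f"One-month delay:      ${delay:,}")
print(f"Delay / meeting ratio: {delay / meeting:.1f}x")
```

Even with travel assumed to cost as much as the two days of salary, a one-month slip costs several times more than a meeting under these assumptions, and that is before counting any downstream fallout.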

Working Software (or planes) over Comprehensive Documentation

It’s all about creating fast feedback loops in order to discover problems early while they are still correctable. Agile is a Fail-Fast model.

Quick iterations and virtual integration

It is of course not practical -yet- to build and test an airplane incrementally in Sprints, but in the early 1990s CAD software was just being introduced. The 777 was the first airplane to be designed almost entirely using 3D CAD software instead of 2D drawings. This allowed Boeing to simulate form and fit quickly instead of building the traditional physical mock-ups which took both time and resources. Now for the first time 3D CAD models could be fit together and verified almost in real-time, before the first prototype was built.

Prototyping

The 777 program built 9 working airplane prototypes, compared to the usual 6 for a traditional airplane development program. Considering that the list price of a 777 would be in the $100M range, the extra 3 prototypes must have put a big dent into the development budget. However if you value fast feedback loops, then the cost of not building the extra 3 airplane prototypes would be even greater.

I don’t know for sure of course, but assuming that the 9 prototypes each cost at least $100M, the prototype expenditure for the 777 program was at least $900M, or roughly 18% of the estimated $5BN budget.
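The prototype share can be checked with a few lines of Python. This is only a sketch using the figures quoted in this post: the $5BN program estimate and an assumed $100M per prototype (the post gives only a list-price range, so the per-prototype cost is a guess, not a known number).

```python
# Share of the program budget spent on prototypes, using the post's figures.
PROGRAM_BUDGET = 5_000_000_000     # estimated 777 program cost, USD
COST_PER_PROTOTYPE = 100_000_000   # assumed cost per prototype, USD


def prototype_share(count):
    """Fraction of the program budget consumed by `count` prototypes."""
    return count * COST_PER_PROTOTYPE / PROGRAM_BUDGET


print(f"9 prototypes (777 program): {prototype_share(9):.0%}")
print(f"6 prototypes (traditional): {prototype_share(6):.0%}")
print(f"Extra 3 prototypes:         {prototype_share(3):.0%}")
```

Under these assumptions the three additional prototypes account for about 6% of the budget, which is the price Boeing paid for the faster feedback loops.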

Not all subsystems needed a full-scale prototype. To test the maneuverability and visibility of the new flight-deck layout, a flight-deck prototype was built and mounted on a wheeled frame, which allowed the team to taxi around the airport to get a feel for the handling, controls and visibility of the new flight-deck design.

Mock-ups and test-beds

If you can test early, then do so – especially for the risky new components.

A prototype of Pratt & Whitney’s new engine was fitted to an old 747 and given a test flight. It’s a costly experiment which was debated internally, but in this case it was justified: a surge (engine back-fire) was experienced on that first flight, and was discovered early enough that Pratt & Whitney could address the problem without delay to the 777 program.

The new Fly-by-Wire system was similarly tested on a 757 airplane first to make sure that everything would work smoothly on the 777.

These added prototypes and test beds resulted in positive improvements for the customers too. The accelerated testing schedule made possible by the additional prototypes made a difference in getting the necessary FAA certifications in record-time, allowing for a much faster customer deployment of the plane.

Customer Collaboration over Contract Negotiation

Boeing needed a lead customer for the 777 and after fierce competition, United Airlines selected Boeing and the 777 for a $22BN deal that effectively launched the 777 program from concept into reality.

United Airlines and Boeing did write a formal contract, but the essential agreement upon which United Airlines would award Boeing their business was the famous “Condit-Guyette” memo. This hand-written note, drawn up in the early morning hours by United Airlines executive James Guyette, punctuated several days of competitive negotiations between Airbus, McDonnell-Douglas and Boeing.

Signed by both Boeing and United Airlines executives, it stated that Boeing and United would work together to design a new service-ready airplane.

Thanks to the PBS documentary flashing the original memo on the screen for a few seconds, we can reverse-engineer what it actually said:


The Condit-Guyette memo, transcribed

B777 Objectives

United + Boeing + Pratt-Whitney

In order to launch on-time a truly great airplane we have a responsibility to work together to design, produce and introduce an airplane that exceeds the expectations of flight crews, cabin crews and maintenance and support teams and ultimately our passengers and shippers.

From day one:

– Best dispatch reliability in the industry

– Greatest customer appeal in the industry

– User friendly and everything works

October 15, 1990

Signed by United Airlines, Pratt-Whitney and Boeing executives


Seems quite reasonable from a customer perspective yet unrealistic from a product development standpoint. What a great way to kick off a project, and it doesn’t get any clearer than that: customer collaboration over everything else.

Responding to Change over Following a Plan

Although the design of the 777 followed a master production plan, we can see a few ways where Boeing and its suppliers expected and responded to change within the boundaries of that plan.

As many as 8 airline customers had full-time representatives sitting alongside Boeing in Seattle, with British Airways peaking at 75 people integrated with Boeing on the 777 program. 1,500 design features were reviewed with the airlines, and changes were made to 300 of them as a result.

As another example, for the first time Pratt & Whitney (the jet engine manufacturer) held several open design reviews where they invited airplane mechanics from customer airlines to critique and provide feedback about the serviceability of their new engine. This helped reduce human error in maintenance, and helped build confidence with the customer airlines that they would have a reliable and serviceable engine on the 777.

There are many other stories of requirement change, such as the rudder team in Australia which had to endure changing requirements twice after rudder manufacturing had started. Those are the big and visible ones. But how would we deal with all the small day-to-day changes?

On a day-to-day basis the DBTs would deal with new and emerging information, and had put in place fast feedback loops between manufacturing and design engineers. Change requests which normally took weeks to process were handled in a single day with the DBT approach. When you can collaborate quickly, it is also much easier to implement (and undo) changes.

Postscript

The 777 program is a technical and commercial success, and has garnered numerous innovation awards. Unfortunately Boeing did not push the “Working Together” model across the company. The amount of change, training and continuous attention experienced in the 777 program was quite high, so it was decided to leave it optional for each development program to decide. Of the next 3 development programs, only one implemented the “Working Together” model. Judging from the 787 impressions (numerous delays, early quality issues, fleet groundings), it seems a lot of the 777 lessons have been forgotten or ignored.

Alan Mulally himself later joined Ford in 2006 as President and CEO. There he continued the corporate turnaround with similar management methods. It just goes to show: principles are transferable but practices are not.

References and Resources

You can find out quite a bit about the 777 development program if you just do a bit of searching. Here are some of the links/resources which I came across:

Domain Dependence

Occasionally I come across a golden nugget that puts things in perspective. Usually it’s a simple and beautiful idea which elicits the “duh yes of course” gut response when it’s first explained but then continues to reverberate and come back into focus again and again. Ideas such as this one articulated by Nassim Nicholas Taleb.

“Some people can understand an idea in one domain, say, medicine, and fail to recognize it in another, say, socioeconomic life. Or they get it in the classroom, but not in the more complicated texture of the street. Humans somehow fail to recognize situations outside the contexts in which they usually learn about them.” – N. N. Taleb

He refers to this as the domain dependence of our mind. He puts forth a few examples, including that of the businessman who pays the hotel porter to carry his luggage upstairs, then heads for the gym to lift weights – mimicking the natural action of lifting a suitcase. Simply put – we put blinders on and fail to recognize familiar patterns in unfamiliar situations.

It explains a lot of the frustration that comes about in process improvements and agile transformations. At least for those of us that like to think and speak in analogies, Taleb’s words ring true. We are simply too accustomed to thinking about things in our familiar ways to see the simple truth staring at us from the other side of the mirror. It gets worse as our expertise deepens. Expert knowledge is deep, and therefore becomes more and more domain-specific. The groove wears deeper and obscures our view of the world. It gets harder and harder to look outside our own domain for solutions to the problems we are working on.

But for innovation that’s exactly where we need to look – outside of our own domain.

We experience the pattern clearly when we discuss Lean/Agile concepts with someone that has a firm waterfall view of the world. Most people can understand and agree with the concepts, but many also have clear convictions that those same ideas simply don’t fit in their area.

There are a couple of mental shifts that have to be made in order to recognize foreign ideas and implement them locally. First you have to take a particular situation and understand the general idea and principle behind it. That takes some abstract thinking abilities which some people are better at than others. Nonetheless you need to extract the general principle at play. Then, that same general idea needs to be applied in a different way in the new domain. The application may look different but the principle behind it is the same.

For example the idea that an apple will fall from the tree to the earth is well-known and accepted in the domain of everyday things; the idea that the earth actually pulls the apple towards the planet’s center until it tears loose from the tree is harder to imagine.

At a different level, we have almost no trouble learning about the gravitational pull of planets. Once we understand the principle of gravity explained using planets as the domain we can start to see the commonality. The only difference between the moon and the apple here is scale – it’s just that you can’t see the similarity unless you shift your viewpoint significantly. Once we look at them both from outer space, it’s a bit easier to visualize the apple as a really tiny moon. Now the apple doesn’t fall down any more. It gets pulled inwards towards earth’s center.

Or for a more obvious example – since you are reading this blog – consider the difficulty of seeing the translation of Lean Manufacturing methods to Product Development. The Software crowd are starting to get it, but the Hardware/FPGA folks? “That won’t work in our special domain!” (BTW to me that’s not a problem at all – it’s an opportunity to be the first HW/FPGA developers to outpace the competition). Check out www.agilesoc.com and @nosnhojn as an example of pioneering work here.

Domain dependence, then, is one hurdle that we repeatedly have to cross. That is especially true for Lean/Agile transformation because the Lean principles (which are fundamental to Agile methodology) are so counter-intuitive to “common” knowledge. Lean/Agile principles are really sensitive to domain-dependence because of that.

But there is a bit of good news: if you can shift the viewpoint and get to the principle behind the practice, then this can be your detour around domain dependence. Talking about general principles without being weighed down by “common knowledge” will open the door. Once you have developed an understanding of how general principles work (like gravity) then it is a bit easier to move forward since rejecting a fundamental truth just because it looks different is, in a word, irrational. Of course calling someone irrational would not be a rational thing to do in most situations, so find another way to make the point. Patience helps; most of the time you just have to wait for the gears to turn and the point will make itself.

So, domain dependence is a real obstacle and it’s usually not recognized as a factor when we gauge resistance to new ideas. Remember that simply shifting the viewpoint and focusing on the general principles at play is one way to reduce the impact of domain-dependence. Plotting that course is a tough job which falls to you – the change agent.

Why not SAFe?

So let me get this out right away: I’m a SAFe proponent. But hang on – it doesn’t make me a “pure-Agile” non-supporter.

I have my PSM certification from scrum.org and my SPC certification from Scaled Agile Academy, so I feel comfortable enough to comment on the discussion. The discussion is not as intense as it was a few weeks ago, but I sense that the residual effect is for some to bypass SAFe as even an option to consider.

It’s too bad that what would otherwise be a healthy discussion surrounding SAFe and “pure Agile” is causing division. Peer review and feedback are the best way to ruthlessly drive things forward and evolve, but if we allow discussion to degrade into both (or one) sides digging their heels in and simply trying to convince the rest of the world that their approach is best instead of understanding each other and moving forward, the value of discourse quickly dissipates.

On this front, Dean and SAFe score the point: every time I have heard Dean speak about SAFe, he has been very clear that there is a lot to be gained from being pragmatic and tolerant of others’ views. By not taking a polarizing “my-way-or-the-highway” stance, the creators of SAFe already have an advantage here that will work in their favor in the long run.

So what is it about SAFe that makes it worth the investment, that other Agile approaches haven’t covered well enough yet? In my view…

SAFe addresses the people not on the Agile team.

This insight by itself should be enough to pique your interest. Most Lean/Agile material I’ve seen focuses on how to implement Agile methodologies, but very little attention is paid to the stakeholders outside the team. Yes I know, there are exceptions out there, but generally I feel that we don’t pay enough attention to the folks outside the immediate development team.

SAFe addresses these stakeholders as first-class citizens, not as impediments to progress that have to be “turned around”. That’s a smart thing. Those “impediments” often sign your paycheck, and if they are not comfortable with what’s happening, your Agile Adventure will be short-lived or at least very limited in reach. Let’s invite them to contribute in a format they can relate to and be effective in.

SAFe has the right ingredients and building blocks

  • Lean principles as the leadership backdrop
  • Scrum as the proven team-level project management framework
  • XP-inspired coding practices
  • Don Reinertsen’s Principles of Product Development Flow

You may have a different set of favorites, but when I saw the above 4 ingredients in the same recipe, I definitely paid attention. Ever since I read and re-read Don’s book I’ve been on the lookout for applications of this really wonderful and clarifying set of ideas.

SAFe is for large programs.

The “S” in SAFe stands for “Scaled” – a point which I feel is lost on some of the SAFe critics. I have spent all my working life in large organizations in both software development and project/program management roles, and I can confidently say that if you want to manage anything at a scale beyond a handful of people you will need some sort of framework to keep everything aligned. Simply relying on a shared set of values and a common goal isn’t enough. Multiple teams simply won’t self-organize in a single direction. It doesn’t matter whether you are doing Agile or not – large groups of people working together are going to require alignment and guidance to enable some sort of orchestrated delivery. As Reinertsen points out: “there is more to be achieved from overall alignment than local optimization”.

Statistically we will have some success-stories where overall alignment happens in an organic manner, but don’t count on it for your project. You N. N. Taleb fans out there know what I mean.

SAFe pulls it all together and communicates a holistic view

Again, selling Lean/Agile to folks outside the Agile teams is where you either achieve an Agile enterprise transformation, or you don’t. You don’t have to “sell” Agile to the teams themselves, they are usually the ones that get onboard first. If you want to scale Agile to the enterprise-level, you have to sell the idea to those that have the most influence.

The value of having a single-page graphic which acknowledges the various roles and the main flow of events can’t be overstated. Here, for the first time, we have a map of the world in which everyone in the existing organization can see a place for themselves in an Agile setup, post-transformation. That immediately defuses a lot of the potential friction and apprehension associated with introducing Agile to the C-level office and the middle management layer. You know, the ones who decide what you can and can’t do in the enterprise. Kind of important to get those folks on your side. Showing the full context of an Agile Release Train is definitely much easier than selling the idea of “Agile self-organizing” teams.

SAFe may not be for you

…and that’s ok.

If you’re starting out and introducing Agile to a small organization, you don’t need or want SAFe. If you’re working with a small set of teams, it’s probably not for you. That’s ok – SAFe starts to make sense at scale. Keep that in mind as you grow.

SAFe is not a one-size-fits-all solution. Neither would you want one. Personally I just need a model that fits my situation and environment. Most of us work in one company, and in one environment. Chances are slim that we will need a large-scale Agile solution one day and a small-team solution the next. If SAFe works for what I need to do, then that is fine. If not, then move on and find something else. But please don’t make it your mission to convince others (who you don’t even know) that some approach or other is not an option for them. SAFe is indeed an option, and so is “pure Agile” and any other variant such as DAD, but only you can decide which one is right for you.

Use SAFe (or any other model) responsibly

A lot of the SAFe criticism I’ve seen seems to assume that our teams don’t think for themselves, and that SAFe would be adopted blindly and without real-world context.

Isn’t that anti-Agile? Thinking people can understand when to apply the mechanisms offered by SAFe and when to do something else. Not all aspects of SAFe will work for you, but it would be a shame to throw out 100% of the model just because there is 10% that doesn’t work for you. Use the parts that are helpful to you, discard the rest (for now). Whatever defined approach you choose, you’ll need to tune it to your particular situation anyway. I think Dean and the SAFe crew would agree.

…which brings me to the point of this blog post

Although a polarizing discussion is entertaining and can provoke some great insights, at some point we reach diminishing returns, and instead of improving we alienate. It would be too bad if this argument causes some folks to bypass the good work done by the SAFe team. SAFe can provide a huge head-start for some companies, especially those who need to visualize a pathway out of the waterfall.

If you are rejecting SAFe on the basis of comments made by some Agile rock-stars, pause for a moment and put it all in context. What alternatives are they suggesting? In most companies, it’s not enough to simply rely on self-organization around a shared goal and scrum-of-scrums. At some point at scale you will eventually need some sort of framework and holistic planning involving the rest of the enterprise, whether it’s called SAFe or not.

If you are embracing SAFe because it looks less threatening to your existing organization and an easy fix, you should probably take a step back and ask yourself if you really understand what Lean/Agile is about. Your organization will likely have to change its ways, and it can be a difficult transformation.

Ultimately, I don’t see that SAFe competes for territory with pure Agile/Scrum but rather complements our current practices to enable larger-scale Agile projects. There is a big world out there that largely still hasn’t transitioned to Agile, and the sandbox is big enough for everyone to play. Let’s move Agile adoption forward in any way we can, even if it doesn’t fit a purist’s view of what Scrum should be.

If SAFe brings Lean/Scrum/Agile within reach for companies that would otherwise not even entertain the idea, how can that be a bad thing?