The Formula-1 Pit Stop: Lean counter-counter-intuition

What do F1 racing and Lean Product Development have in common? Not much on the surface… but if you, as I do, see everything as systems then you can’t help but notice some interesting things. In this case I found an apparent counter-example to Lean that is Lean despite going against the grain of traditional Lean Thinking. Hmmm… we’re into double-negatives here but stay with me.

The racing analogy is interesting to me not because “Lean=Speed” but because someone questioned the “obvious right way” and came up with a counter-intuitive better solution.

I am always amazed every time I catch even a glimpse of Formula-1 racing. The cars fly around the track at well over 300 kilometers per hour, pull 1.45g during acceleration and 4g when braking. High speeds, tight turns and frequent acceleration and braking wear hard on car and driver, but even more so on the tires, which aren’t even engineered to last a whole race.

An F1 tire these days is designed to last only about 120 kilometers on average (it’s a weight vs. durability tradeoff), but most F1 races are at least 305 kilometers long. That means you need to change tires 2 or 3 times in a race that is won or lost by fractions of a second.

I’m fascinated by this, of course, because the pit stop is the biggest impediment to continuous flow around the track. If you could make one less pit stop than your competitor you would be several seconds ahead, turning 10th place into victory. For a long time that was the strategy: go easy around the curves so as to conserve tires and fuel. A lower average speed would win the race as long as you could avoid making too many pit stops. So far it sounds Lean, right? Slow down and you’ll finish the race faster. No surprise there: Lean solutions are usually counter-intuitive.

Twisting and Turning

Well, every good story has a twist and our little F1-Lean analogy is no different.

In the mid-1980s someone took a step back and looked at the whole end-to-end system and realized that the “economy” of racing could be improved. Start the race with only half a tank of fuel, and the car would be much lighter and go faster. Stop worrying about conserving tires and instead push the car to the limits on the track. The penalty for this strategy is an added number of pit stops.

Not a problem – you just need to minimize the time spent in each pit stop.

It’s a tradeoff curve, as always. If you can continuously reduce the amount of time needed to refuel and swap tires, then at some point down the curve the wear-and-tear vs. pit stop balance will shift.

Now it gets interesting. Because we now have two different models of “good” racing strategy, we have to choose – but how? We have to take a systems view, and make the decisions based on the objective merits of each strategy, not by intuition or personal preference.
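To make the tradeoff concrete, here is a minimal sketch of a toy race model. Every number in it (lap count, base lap time, the per-lap penalty for nursing tires and fuel over a long stint, and the two pit stop durations) is a made-up assumption for illustration, not real F1 data. It only shows that the best number of stops shifts as each stop gets cheaper.

```python
def race_time(stops, pit_loss, laps=60, base_lap=90.0, penalty_per_stint_lap=0.05):
    """Total race time in seconds for a given number of pit stops (toy model).

    Assumption: the longer a stint, the more the driver has to conserve
    tires and carry fuel, so every lap in that stint gets slower.
    """
    stint_length = laps / (stops + 1)
    lap_time = base_lap + penalty_per_stint_lap * stint_length
    return laps * lap_time + stops * pit_loss


def best_stops(pit_loss, max_stops=5):
    """Number of stops (0..max_stops) that minimizes total race time."""
    return min(range(max_stops + 1), key=lambda s: race_time(s, pit_loss))


if __name__ == "__main__":
    for pit_loss in (100.0, 20.0):  # hypothetical slow vs. fast pit stop, in seconds
        print(f"pit stop costs {pit_loss:.0f}s -> best strategy: {best_stops(pit_loss)} stop(s)")
```

With these assumed numbers a 100-second stop favors running the whole race without stopping, while a 20-second stop favors two stops. The strategy flips purely because the cost of the pit stop changed.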

Yes but we’re talking about Product Development, right?

You see analogies in Product Development all the time. Whenever there is a bottle-neck in our process we have to decide to either fix/improve the bottle-neck, or try to avoid it. Not all bottle-necks are solvable or even visible. Some are disguised as “the way we always do things here”. Most companies have settled on a particular pattern of which bottle-necks (departments/phases) are reasonable (acceptable as the cost of doing business) and which ones aren’t. Companies that structure the development flow through phases and gates accept the overhead associated with functional departments as “not perfect but the best way to do things”.

Lean Thinkers challenge this view all the time: figure out where the flow stops and then improve it. Usually it comes down to finding local optimizations and then reducing or eliminating tasks in the name of overall flow. For example, eliminating pit stops so that the race can flow un-interrupted.

And sometimes Lean Thinkers have to challenge themselves, to avoid getting stuck in the “best” Lean solution.

So where did Formula-1 end up?

Have a look at this Ferrari pit stop from 2013. Mid-race refueling is no longer allowed in F1, so now it’s all about how quickly you can get 4 new tires on the car. You can feel the anticipation as you watch the pit crew waiting for the car to arrive.

That pit stop took 2.1 seconds. It’s a huge improvement from the minute-plus pit stops of the early days of F1 in the 1950s. Pit crews spend a lot of time and money to squeeze out every millisecond they can from their process. You can almost visualize a value stream map on the garage wall and the team swarming to find and reduce the next bit of waste from the process.

Is it a fair analogy?

But, many will say, that’s not a fair analogy. Obviously they have a lot of specialized equipment and a huge crew of specialists standing by. This is high-stakes racing and has nothing to do with Lean or product development.

Well that’s the whole point behind this blog post. It’s a classic example of how Lean thinking differs from mass-production thinking. It just manifests itself in a different environment. Speed (and safety) matters in F1, and speed (and safety) matters in Lean.

Systems of systems: Lean Fractals

There is a very direct and obvious application of Lean Thinking and Value Streams to what happens in the pit. If you have seen a Value Stream Map, you can easily understand how that helps weed out waste and inefficiencies in the process flow of safely and quickly changing the tires.

But there is another higher-level system at play, which includes both the pit stop and the track. This system-of-systems has a different kind of flow economy, working off a different set of aggregated information.

If we ignore the system-of-systems effect and blindly apply Lean tools, it can lead us astray. Pre-1980s racing solved for Lean flow based on how the costs were incurred back then, i.e. relatively slow pit stops. Slow down around the track and finish in first place. However the cost of any activity changes over time, and the technology and capability of the pit crew improved such that the basic assumptions behind “the best way to race” had to change.

In this case they had to question some long-held beliefs and assumptions in order to get to a new level of performance. Similarly we find the biggest improvements hiding in plain sight when we look at our overall product development system and question our traditional way of working. Our world is full of systems-of-systems.

I draw two lessons from the F1-Lean analogy:

Question the Status Quo

The obvious approach isn’t always right, and you won’t see it until you look at the whole end-to-end system. It is counter-intuitive that adding one more time-consuming pit stop to your race will speed things up overall… but it does – up to a point.

Even if you settle on a “best” Lean solution, this might also be a local optimum… keep looking and keep questioning. The journey through solution space is not always a linear one.

Invest in non-mainstream efforts

In order to lean out the value stream you may have to invest and focus on non-mainstream efforts such as tooling and supporting activities. This is the stuff of overhead. It’s hard to justify increasing the overhead cost when the normal pressure is to reduce expenses. But in a Lean system the right overhead isn’t a liability – it’s what enables the mainstream to go fast. The F1 teams had to shift investment away from car and engine design and onto the less-glorious pit crew tools and processes.

The difference is in the kind of overhead: instead of traditional overhead which is needed to manage the waste generated by stitching together the work of separate departments, Lean “overhead” is there to make the value stream flow faster at higher quality.

Yes it’s a fair analogy

To return to the point above, yes the analogy is totally fair. You can usually make your product development progress much faster if you invest in ancillary processes and tools with an eye towards end-to-end Flow. The Ferrari team in the video clip shows 21 crew members, all working frantically for less than 3 seconds. It takes 3 people per tire to do the job. Wasteful resource utilization? Not if time matters and your objective is to get the car through the pit stop quickly. Sure, we obviously can’t allocate 21-person support teams for all tasks, but the idea is to figure out what it takes to achieve the best end-to-end flow and then invest accordingly.

When I first started developing software we had bi-weekly load builds because it was such a difficult thing to get a clean build with 40 engineers all submitting several weeks of code changes, and build servers were very expensive. We avoided load build “pit stops” until they were absolutely needed.

Gradually the situation changed, and we now have efficient continuous build and test environments and the ability to submit code changes daily. The “economy” of our pit stop changed. It was initially not easy convincing management to invest in extra build servers, tools and staff – but eventually we met the point on the tradeoff-curve where investing in ancillary things like load builds made more and more sense. Now every serious software group has a DevOps setup.

Local vs. Global Optimization

There is a balance between local and global optimizations, and the balance can shift over time. You can’t even grasp the concept of that balance unless you look at the end-to-end system flow. Chances are that you are only looking at improvements within your own department. If you are, then you might routinely leave improvements of an order of magnitude or more on the table. Look one level up, at the system-of-systems, and see what you can find.

Lean Systems: Antifragile Applied

“Systems subjected to randomness—and unpredictability—build a mechanism beyond the robust to opportunistically reinvent themselves each generation”

– Nassim Nicholas Taleb

In a previous post I introduced the concept of Antifragility – systems that benefit from shocks, randomness and disorder. Classifying the world in the triad of Fragile – Robust – Antifragile helps us understand and manage the potential impact of the uncertainty surrounding us.

It’s initially hard to imagine that anything useful could benefit from disorder, so the first thing to realize is that although objects and things can be Fragile or Robust, they can’t be Antifragile. Systems, on the other hand (which of course includes Product Development Systems) are made up of multiple interacting components. Systems exhibit behavior as they respond to their surroundings, and can be Fragile, Robust or Antifragile. It is this ability to respond and interact that opens the door to antifragility. Antifragility can be seen as a type of evolutionary mechanism, continuously picking the best of the available options. So, when we look for examples of antifragility we need to look at systems, not objects.

Stressors: the fuel of Antifragility

A stressor is something that puts a strain on the system, pulls it away from its equilibrium. It’s the system’s response to stressors that classifies the system as either Fragile, Robust or Antifragile.

A system that gets weaker from the encounter with the stressor is Fragile. For example, a pyramid scheme collapses when exposed to the light of day. Not only dictators (the individual) but the foundation of the dictatorship (the system) crumbles when the forces of democratic thought are applied. The best-laid project plan with all its Gantt charts has a best-before date sometime before the first problem is discovered.

Robust systems neither get weaker nor stronger in the presence of a stressor. Most government bureaucracies seem to fall in this category – their inability to learn and evolve astounds me, as does their unequaled staying power. Many companies operate in this way too. New ideas get rejected and expelled by the corporate immune system, allowing the company structure to stay the same even in the face of certain bankruptcy. Remember Kodak? GM?

Antifragile systems on the other hand enjoy randomness and stressors, at least up to a point. Shocks and disruption make them stronger because they keep the system alert and in shape. Stressors exercise and improve the system the same way physical activity stresses and improves your body. Strength training, for instance, involves pushing your muscles just past their breaking point. Your body is able to repair this damage and even over-shoots in the repair effort. The result is that you are left with a little more muscle mass than you had before. This is how Schwarzenegger became Schwarzenegger and Ahnold was again a cool and acceptable name for your first-born. Without these stressors the system would stagnate, much like a couch-potato grows the wrong kind of body mass and ends up with clogged arteries.

Of course, there is a limit to how much stress is beneficial. Running at a reasonable effort level puts you in better shape; the first marathoner supposedly expired at the goal line, having historically over-exerted himself to deliver with his last gasp the one-word message to the king: “victory”.

(hang on – if they won the battle, then why the life-and-death rush? Good news would still be reasonably good the next morning, right?)

The next important thing to understand about Antifragile Systems is that they work in layers. It is not enough that individual members get stronger, the system as a whole needs to be able to survive and thrive. It needs to be able to learn and select.

It’s in the DNA of the System

Going back to our example of Mother Nature as the ultimate antifragile system, we can observe that the individual members of a species are inherently fragile. In fact, each member will eventually die off, no matter how strong it is. There is a natural turnover to make room for the newer and more fit members. By natural selection and replacement of individuals the system becomes more and more fit. There is a layering effect here. Individual members (at the lowest layer) compete with each other. The strong propagate their DNA and have (presumably) stronger offspring, the weaker gradually (or abruptly, as the case may be) exit the gene pool. The system as a whole (at a higher layer) grows stronger as a result. The system survives the demise of each of its members because the information that makes up the system is preserved in its DNA, surviving generation after generation of individuals.

By evolution such a system improves gradually even if there is no master plan and things happen at random. The system continuously Inspects and Adapts, and the current “best recipe” is carried forward in our DNA. As long as we recognize and seize opportunity, even a random walk will be beneficial. Antifragile systems love errors and variation for that reason.

Lean Systems: Fragile?

Lean systems are called Lean because they deliberately operate with very small error margins. For example, Lean Manufacturing systems are sometimes called “zero-inventory” systems because they have almost no buffer inventory to absorb variations and problems at individual stations. If there is a problem somewhere on the production line, the whole system could shut down. This is by design: in a tightly coupled system small problems are amplified to make them painfully obvious, and every problem becomes an urgent matter.

In one sense Lean systems are therefore very fragile to disorder and error so one might be tempted to simply put Lean in the Fragile category. But it’s not that simple. The antifragility of Lean is in the DNA of the system.

Lean Systems: Antifragile

So we need to reconcile the apparent fragility of the small operating margins of a Lean system with the claim that Lean systems are antifragile.

I like Steven Spear’s (The High Velocity Edge) summary of a good Lean implementation:

  1. Build a system of “dynamic discovery” designed to reveal operational problems and weaknesses as they arise
  2. Attack and solve problems when and where they occur, converting weaknesses into strengths
  3. Disseminate knowledge gained from solving local problems throughout the company as a whole
  4. Lead by developing capabilities 1, 2 and 3

The ingenuity and beauty of Lean is that even small problems become intolerable at the system level. Lean Systems use this fragile tight coupling as a way to accelerate system-level learning. If a problem develops, it immediately becomes painfully obvious that something is wrong.

Rather than working around or ignoring these small problems, the team in charge is obligated to immediately seize the opportunity to improve the way the system works before the small problem becomes a big problem. A good lean team will swarm the problem to get it fixed, and put in place measures to ensure that similar problems don’t occur in the future. The result is that the particular process step which failed now has improved and is less likely to fail in the future.

Antifragile systems love errors, and so do Lean systems. The fragility of small error tolerances acts as a forcing function which brings problems to the surface, causing the old faulty processing step to evolve and be replaced with a new and more fit one. Each small failure alters the DNA of the Lean system just a little bit, evolving and improving. One more problem spot has been eliminated, and the probability of future defects is reduced.
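The mechanism can be sketched as a small simulation. This is a toy model under loud assumptions, not a description of any real production line: the five process steps, their 20% starting failure rate, and the rule that every fix halves a step’s failure rate are all invented for illustration. The function name `run_line` and its parameters are hypothetical as well.

```python
import random


def run_line(cycles, failure_rates, improvement=0.5, seed=0):
    """Toy stop-the-line model.

    Each cycle the work passes through every step in order. The first step
    that fails stops the whole line (tight coupling), and the team fixes
    that step, cutting its failure rate by `improvement` (an assumed factor).
    Returns the final failure rates and the number of stoppages.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = list(failure_rates)
    stoppages = 0
    for _ in range(cycles):
        for step, rate in enumerate(rates):
            if rng.random() < rate:
                stoppages += 1
                rates[step] = rate * (1.0 - improvement)
                break  # the line stops; the rest of this cycle is lost
    return rates, stoppages


if __name__ == "__main__":
    start = [0.2] * 5  # hypothetical starting failure rate per step
    final, stops = run_line(cycles=200, failure_rates=start)
    print("stoppages:", stops)
    print("failure rates before:", start)
    print("failure rates after: ", [round(r, 4) for r in final])
```

Every stoppage hurts in the moment, yet each one leaves the line with a lower chance of failing again. A line with large buffers would have absorbed the same failures quietly and learned nothing from them.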

So here is a perfect example of a system that is designed to evolve over time, to learn from mistakes and to grow more capable after each error. It needs no top-down direction other than living the Lean Principles. There is no master plan, yet Lean systems evolve on their own to become the most competitive and effective man-made systems we have on our planet.

Evolving. Learning. Antifragile. Lean. Wonderful.

If it’s a Pipeline, it’s leaking

Many times we view the Product Development System as a Pipeline where we pour effort and energy in, and out comes a product sometime later. You’ve probably used this analogy before, talking about “products in the pipeline” or “the R&D pipeline”.

Seems pretty intuitive, and I use that analogy too. Except I recently thought perhaps the analogy isn’t quite right. If you’re working in a Waterfall or phase-gate process, it’s not a single pipeline. It’s a series of smaller pipe lengths which are joined together by hand-offs.

The trouble with hand-offs is that they generate waste. Throughout the journey there can be more energy lost in hand-offs than actually makes it out of the pipeline. In every joint, effort and energy leaks out.

I find this analogy is a little more fitting, and although it’s a simple visual it helps make the point about hand-offs at the simplest possible level. The discussion usually turns to “what are the leaks and how do we stop them” and there is your entry to discuss Lean and Waste.
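A quick back-of-the-envelope sketch shows how fast small leaks compound. The 15% loss per hand-off is purely an assumed figure for illustration, as is the idea that every joint leaks the same fraction.

```python
def delivered_fraction(handoffs, loss_per_handoff):
    """Fraction of the original effort that survives a chain of hand-offs,
    assuming each hand-off leaks the same share of whatever reaches it."""
    return (1.0 - loss_per_handoff) ** handoffs


if __name__ == "__main__":
    loss = 0.15  # hypothetical: 15% of the effort leaks out at every joint
    for joints in range(0, 7):
        share = delivered_fraction(joints, loss)
        print(f"{joints} hand-offs -> {share:.0%} of the effort comes out the far end")
```

Under that assumption, five hand-offs are enough for more than half of the effort to leak out before anything reaches the end of the pipeline.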

What do you think?

Agile at Boeing in the 1990s – the 777 Program

The year 1995 recorded two seemingly very unrelated events: the entry of the first Boeing 777 airplane into commercial service and the introduction of Scrum to the world. As the twists and turns through history go, Boeing was Agile before its time.

I love the Boeing 777. I have flown more than a few miles over the years, and for the majority of them it has been the 777 that carried me and millions of other passengers safely across oceans and continents. For most of us the flying experience is judged by the quality of the food and in-flight entertainment options, and whether the flight is on-time or not. We don’t pay much attention to what airplane we’re in and just want to get to where we need to go, but somehow I have grown fond of the 777. It’s like an old reliable friend now, so when I see the 777 on my itinerary I don’t mind the flying part so much.

But this blog isn’t about someone’s creepy love story with an airplane – although that could be interesting enough. It’s about Product Development.

How do you develop an airplane like the 777? It turns out the story of the 777 development program is even more interesting than the plane itself. The 777 program included elements that were both Iterative and even Agile. I enjoyed learning about the program from various sources and found it hard to reduce it all into a single blog post. There is a lot of information out there if you are interested, including a really good 5-part PBS documentary “21st Century Jet – the Building of the 777”. See the bottom of this blog post for some good places to start.

The 777 Development Program

In the late 80’s Boeing was trying to decide on how to fill the product line gap between the 767 and 747. One option was to evolve the 767 design but in the end it was determined that a completely new aircraft was needed. It’s a big investment – an estimated $5BN for the 777. Even though Boeing was in a strong financial position in 1990, a $5BN expenditure would sink any company if it wasn’t successful.

It’s tempting to avoid any kind of risk when you’re in that position but Boeing broke many barriers in this next-generation aircraft design, both in engineering and manufacturing. The 777 was Boeing’s first “fly-by-wire” (computer-assisted control) aircraft, had the first fully computerized cabin, and was also the first airplane ever designed using 3D CAD systems. It seems impossible today to imagine how it could be done any other way, but before the 777, airplanes were designed with 2D drawings, and form and fit could not be tested until the first prototype was built.

The old tried and true approach had its share of familiar problems: in Boeing’s previous development effort, on the 767, an estimated 13,000 individual design changes (large, small and tiny) were made to the door assemblies at various stages in the development process. The 777 would be even worse if something didn’t change.

10,000 engineers were involved in the 777 program spread across the world in the US, Europe, Japan and Australia. Talk about coordination nightmares and opportunity for error. Large-scale project, indeed.

The 777 program was incredibly successful, as evidenced by the timelines and the results. Conceptual design started in January 1990. Manufacturing of the first prototype started in early 1993. The 777 had its first flight on June 12, 1994 and less than a year later (May 1995) the first plane went into service with United Airlines. Boeing estimates that the number of changes and errors was reduced by 80% compared to previous design projects.

Another first for Boeing: their very first airplane delivery was accepted by United Airlines on the first walkthrough, with only a handful of smaller defects to be noted. That’s beyond remarkable for any complex system, and difficult to comprehend given the safety concerns and reliability requirements that an airplane has to satisfy. A defect-free delivery of the very first airplane on-time as promised speaks volumes about any complex development program. I don’t understand airplanes but I know complex systems don’t come together very easily.

If that wasn’t enough… the very first 777 prototype plane built (WA001) was good enough to be updated and eventually put in service by Cathay Pacific in 2000. Not bad for a prototype!

Alan Mulally

The engineering story of the 777 is also the story of Alan Mulally. The management team was initially led by Phil Condit as the executive in charge of the 777 program and Alan Mulally as the director of engineering. Alan Mulally would later replace Phil Condit as the person overall in charge, and it is Mulally’s management style that formed the framework of the 777 program and the “Working Together” model.

A hint of Agile

The first hint of Lean/Agile mindset can be seen in one of Mulally’s weekly program review meetings. A partial glimpse of what looks like program/meeting rules is barely visible, but you can make out at least some of the principles:

  • Plan for zero overtime [Agile: sustainable pace]
  • Weekly DBT reviews [Agile: weekly scrums]
  • Panic Early [Agile: fail fast]
  • Quality and/is Schedule [Agile: Quality is a critical ingredient]
  • Make decisions faster [Agile: act and learn fast]

Working Together

The labels “Agile” and “Scrum” were not yet born when Boeing kicked off development of the 777 in January of 1990. However the management team knew that something had to change. Boeing’s environment had become bureaucratic and department-focused. Specialists in various departments would design their own parts and then it was up to the manufacturing team (the system integrators) to figure out how to make it all come together. It was a “throw-it-over-the-wall” environment where miscommunication and disconnects were a constant problem.

This time, Boeing would work closely with their customers to design the airplane, and would also tear down the walls between departments by organizing their own workforce across discipline boundaries. People that were normally separated by organizations or development phases would now be engaged together and at the same time, talking and collaborating in real-time.

“Working Together” essentially boils down to “cross-functional” teams but it meant more than that. It meant working closely -really closely- with the customer airlines and suppliers. It signaled a radical departure from the bureaucratic project organization of the past and set completely new expectations going forward. Today we would call such a thing an “Agile Transformation”.

“Working Together” foreshadowed Agile, and the model is -unfortunately- still lightyears ahead of the majority of product development teams on the planet.

Although the 777 program was planned and executed in an overall phase-gate process with key milestones, integration points and deadlines, we can see many elements and attitudes which are definitely Lean/Agile-inspired, if not outright Agile. Looking at the 777 program through the lens of the Agile Manifesto we can recognize many familiar concepts.

Individuals and Interactions over Processes and Tools

The Working Together model pulled people from many disciplines together in what we today call cross-functional teams. This was a radical departure from the way things used to work, where design engineers would design their piece in isolation, then throw their designs over the wall to the manufacturing team – waterfall-style. This was the mode of operation in Boeing in 1990, and chances are this is also the mode of operation in your company today if you work on a project of any significant size.

To address the problem of communication, Condit and Mulally organized the program in cross-functional Design-Build Teams (DBTs). There were almost 250 DBTs on the program. DBTs were formed around functional areas of the airplane. For example, there was one DBT for the engine, one for the cargo door, one for the passenger door, one for the leading wing edge, one for the trailing wing edge, one for the flaps, one for the rudder etc.

Each DBT was staffed with the right people that could carry a particular design from concept through manufacturing and maintenance. Teams had design, manufacturing, tooling, finance, materials, maintenance, subcontractor representatives and even customer representatives to make sure that every aspect of operation, manufacturing and maintenance was covered.

This gives us a hint as to how we can organize Agile at large scale. You can’t possibly coordinate or co-locate 10,000 engineers, but you can make sure that the individual DBTs are co-located and then coordinate the DBTs. DBTs were organized in a cascading hierarchy according to how functions of the airplane could be decomposed. For example, there were ten DBTs responsible for the wing’s trailing edge, including a DBT each for Inboard Flap, Outboard Flap, Aileron, Spoilers and so on.

One potential problem of decomposing the project into DBTs is that individual teams lose track of the whole design. Weekly DBT reviews helped counteract that, but Boeing realized that the teams always need to be bonded together by a higher-level goal. When you are invested and care about something, you look out and make sure your work, and the work of the person next to you, is at the highest level.

To achieve that alignment Mulally did something incredible which I still have a hard time imagining. He pulled together all 10,000 engineers working on the 777 program for all-hands meetings once a quarter. By mid-1993 the 9th (!) all-team meeting convened in Seattle. That’s 9 of those in 30 months…

We can only speculate how much it cost Boeing to bring everyone on the team together once per quarter, but Boeing understood the real economics of product development: total team alignment allows you to reduce errors and disconnects and avoids prolonged delays. The cost of bringing everyone together for 2 days (10,000 x 2 days of salary plus travel) is much smaller than the cost of a one-month delay to the program (10,000 x 30 days of salary plus downstream fallout).

When making resource vs. time decisions, you always need to consider the burn-rate and cost of delay for the project. The burn-rate of a 10,000-person program is much higher than the cost of these all-team meetings.
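Here is the same comparison as a rough calculation. Only the headcount and the day counts come from the text above; the daily salary cost and the travel cost per person are hypothetical placeholders, not Boeing figures, and downstream fallout from a delay is left out entirely.

```python
# All dollar figures below are hypothetical placeholders, not Boeing data.
PEOPLE = 10_000
DAILY_COST_PER_PERSON = 500      # assumed fully loaded salary cost per day, USD
TRAVEL_COST_PER_PERSON = 1_000   # assumed travel cost per all-team meeting, USD


def salary_cost(people, days, daily_cost):
    """Salary burned by `people` over `days` at `daily_cost` each."""
    return people * days * daily_cost


# Burn-rate: what one day of the whole program costs in salary alone.
daily_burn = salary_cost(PEOPLE, 1, DAILY_COST_PER_PERSON)

# Cost of one 2-day all-team meeting, including travel.
meeting_cost = salary_cost(PEOPLE, 2, DAILY_COST_PER_PERSON) + PEOPLE * TRAVEL_COST_PER_PERSON

# Cost of a one-month (30-day) delay, salary only.
delay_cost = salary_cost(PEOPLE, 30, DAILY_COST_PER_PERSON)

# Days of delay the meeting has to prevent in order to pay for itself.
break_even_days = meeting_cost / daily_burn

if __name__ == "__main__":
    print(f"daily burn-rate:  ${daily_burn:,}")
    print(f"all-team meeting: ${meeting_cost:,}")
    print(f"one-month delay:  ${delay_cost:,}")
    print(f"break-even:       {break_even_days:.1f} days of avoided delay")
```

Under these assumed numbers, an all-team meeting pays for itself if it prevents about four days of delay. The exact figures matter less than the shape of the comparison: the meeting is a fixed cost, while a delay keeps charging the full burn-rate every day.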

Working Software (or planes) over Comprehensive Documentation

It’s all about creating fast feedback loops in order to discover problems early while they are still correctable. Agile is a Fail-Fast model.

Quick iterations and virtual integration

It is of course not practical -yet- to build and test an airplane incrementally in Sprints, but in the early 1990s CAD software was just being introduced. The 777 was the first airplane to be designed almost entirely using 3D CAD software instead of 2D drawings. This allowed Boeing to simulate form and fit quickly instead of building the traditional physical mock-ups which took both time and resources. Now for the first time 3D CAD models could be fit together and verified almost in real-time, before the first prototype was built.

Prototyping

The 777 program built 9 working airplane prototypes, compared to the usual 6 for a traditional airplane development program. Considering that the list price of a 777 would be in the $100M range, the extra 3 prototypes must have put a big dent into the development budget. However if you value fast feedback loops, then the cost of not building the extra 3 airplane prototypes would be even greater.

I don’t know for sure of course, but assuming that the 9 prototypes each cost at least $100M, the prototype expenditure for the 777 program was at least $900M, or roughly 18% of the estimated $5BN total budget.

Not all subsystems needed a full-scale prototype. To test the maneuverability and visibility of the new flight-deck layout, a flight-deck prototype was built and mounted on a wheeled frame, which allowed the team to taxi around the airport to get a feel for the handling, controls and visibility of the new flight-deck design.

Mock-ups and test-beds

If you can test early, then do so – especially for the risky new components.

A prototype of Pratt & Whitney’s new engine was fitted to an old 747 and given a test flight. It’s a costly experiment which was debated internally, but in this case it was justified: a surge (engine back-fire) was experienced on that first flight, and was discovered early enough that Pratt & Whitney could address the problem without delay to the 777 program.

The new Fly-by-Wire system was similarly tested on a 757 airplane first to make sure that everything would work smoothly on the 777.

These added prototypes and test beds resulted in positive improvements for the customers too. The accelerated testing schedule made possible by the additional prototypes made a difference in getting the necessary FAA certifications in record-time, allowing for a much faster customer deployment of the plane.

Customer Collaboration over Contract Negotiation

Boeing needed a lead customer for the 777 and after fierce competition, United Airlines selected Boeing and the 777 for a $22BN deal that effectively launched the 777 program from concept into reality.

United Airlines and Boeing did write a formal contract, but the essential agreement upon which United Airlines would award Boeing their business was the famous “Condit-Guyette” memo. This hand-written note, drawn up in the early morning hours by United Airlines executive James Guyette, punctuated several days of competitive negotiations between Airbus, McDonnell-Douglas and Boeing.

Signed by both Boeing and United Airlines executives, it stated that Boeing and United would work together to design a new service-ready airplane.

Thanks to the PBS documentary flashing the original memo on the screen for a few seconds, we can reverse-engineer what it actually said:


The Condit-Guyette memo, transcribed

B777 Objectives

United + Boeing + Pratt-Whitney

In order to launch on-time a truly great airplane we have a responsibility to work together to design, produce and introduce an airplane that exceeds the expectations of flight crews, cabin crews and maintenance and support teams and ultimately our passengers and shippers.

From day one:

– Best dispatch reliability in the industry

– Greatest customer appeal in the industry

– User friendly and everything works

October 15, 1990

Signed by United Airlines, Pratt-Whitney and Boeing executives


Seems quite reasonable from a customer perspective yet unrealistic from a product development standpoint. What a great way to kick off a project, and it doesn’t get any clearer than that: customer collaboration over everything else.

Responding to Change over Following a Plan

Although the design of the 777 followed a master production plan, we can see a few ways where Boeing and its suppliers expected and responded to change within the boundaries of that plan.

As many as 8 airline customers had full-time representatives sitting alongside Boeing in Seattle, with British Airways peaking at 75 people integrated with Boeing on the 777 program. 1,500 design features were reviewed with the airlines, and changes were made to 300 of them as a result.

As another example, for the first time Pratt & Whitney (the jet engine manufacturer) held several open design reviews where they invited airplane mechanics from customer airlines to critique and provide feedback about the serviceability of their new engine. This helped reduce human error in maintenance, and helped build confidence with the customer airlines that they would have a reliable and serviceable engine on the 777.

There are many other stories of requirement change, such as the rudder team in Australia which had to endure changing requirements twice after rudder manufacturing had started. Those are the big and visible ones. But how would we deal with all the small day-to-day changes?

On a day-to-day basis the DBTs would deal with new and emerging information, and had put in place fast feedback loops between manufacturing and design engineers. Change requests which normally took weeks to process were handled in a single day with the DBT approach. When you can collaborate quickly, it is also much easier to implement (and undo) changes.

Postscript

The 777 program is a technical and commercial success, and has garnered numerous innovation awards. Unfortunately Boeing did not push the “Working Together” model across the company. The amount of change, training and continuous attention experienced in the 777 program was quite high, so it was decided to leave it optional for each development program to decide. Of the next 3 development programs, only one implemented the “Working Together” model. Judging from the 787 impressions (numerous delays, early quality issues and fleet groundings), it seems a lot of the 777 lessons have been forgotten or ignored.

Alan Mulally himself later joined Ford in 2006 as President and CEO. There he continued the corporate turnaround with similar management methods. It just goes to show: principles are transferable but practices are not.

References and Resources

You can find out quite a bit about the 777 development program if you just do a bit of searching. Here are some of the links/resources which I came across:

I don’t watch Netflix but I am a Big Fan

Move over Apple and Google, I think I have a new favorite company.

I only occasionally watch movies so I don’t have a Netflix subscription, but the company caught my attention at the Agile2013 conference. Gareth Bowles gave a very interesting talk on Netflix’s “self-service build and deployment” infrastructure. What intrigued me was the level of empowerment and the trust model in place, centered on “freedom and responsibility”. Subsequently I’ve noticed more and more reports on Netflix that fill in the pieces for a more complete picture.

Unmatched levels of personal freedom and trust – although with corresponding levels of accountability, a conspicuous lack of pre-deployment verification of new features, and a company that goes out of its way to disable its own product in front of its customers to force itself to get better. It’s not crazy, it’s Netflix – and it seems to work.

Managing 700 engineers working on a product line which serves 44 million picky customers in 40 countries obviously requires a lot of strict governance, verification, quality checks, processes and oversight, or… perhaps something completely different?

I can only observe from the outside, but from where I stand, the Netflix approach boils down to: assume success-path the majority of the time and deal quickly with the rare failure cases when they happen. Invest in the necessary infrastructure so that you can achieve high quality at high speed with low overhead. And stick to it.

The result is real Agility, but it takes commitment and conviction. The Netflix approach is not for everyone, but it should provide inspiration for us all to think about novel and counter-intuitive solutions.

Look at Netflix from the viewpoint of the values promoted by the Agile Manifesto and Lean Product Development (as summarized by Dean Leffingwell’s SAFe House of Lean).

Agile Manifesto and Lean Values

“Individuals and interactions over processes and tools”
“The most efficient and effective method of conveying information to and within a development team is face-to-face conversation”

Netflix houses all of their 700 engineers in Los Gatos, California – part of Silicon Valley. They only hire senior staff and pay “top-of-market” compensation. There are no outsourcing or low-cost development sites to balance the burn-rate. They must have the most expensive labor force in the most expensive labor market. If you can stomach the burn-rate, you can have a highly skilled co-located group in the U.S. It would be hard to imagine a more expensive setup, but Netflix have understood that it’s not about the labor rate, it’s about the ROI on your R&D dollars. To enable quality and agility, they are willing to pay a high premium.

“Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done”
“Our highest priority is to satisfy the customer through early and continuous delivery of valuable software”

The HR policies and resulting culture of Netflix create a high-trust, high-performance and high-accountability environment. There is a breed of engineers that tend to like this kind of environment and perform at their best when they know they are trusted to freely work on something that creates value.

For example, any engineer at Netflix can push a code change to the live network at any time. Within two hours the change has gone through automated testing and through to deployment, into the customer’s hands. That is truly a high trust model considering the huge customer base served and the potential business damage that could be done.

It’s not as wild-west as you might imagine. Rather than going through layers of verification before deployment, Netflix manages the introduction of the new code carefully. Netflix pushes the change to a small number of customers first, monitors the behavior and either pulls the change back or widens the deployment based on continuous monitoring of a select set of metrics. Deploy with a limited scope, then either widen the scope or pull back depending on the results. I would imagine that engineers who repeatedly make mistakes don’t last long at Netflix, but they are open about that and if you can take the responsibility then you partake in the benefits. Not everyone’s cup of tea, for sure, but it enables fast and continuous delivery of new functionality.
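To make the "deploy small, watch, then widen or pull back" loop concrete, here is a minimal sketch in Python. It is my own illustration and not Netflix's tooling: the stage sizes, the 2% error threshold and the names staged_rollout and error_rate_at are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    final_fraction: float   # share of customers on the new code when we stopped
    rolled_back: bool       # True if the change was pulled back
    history: List[float]    # fractions that were actually tried, in order


def staged_rollout(
    error_rate_at: Callable[[float], float],
    stages: Sequence[float] = (0.01, 0.05, 0.25, 1.0),  # hypothetical stage sizes
    max_error_rate: float = 0.02,                        # hypothetical threshold
) -> RolloutResult:
    """Widen the deployment stage by stage; pull back on the first bad reading.

    error_rate_at(fraction) stands in for "monitor a select set of metrics"
    while that fraction of customers is running the new code.
    """
    history: List[float] = []
    for fraction in stages:
        history.append(fraction)
        if error_rate_at(fraction) > max_error_rate:
            # Metrics look bad: pull the change back from everyone.
            return RolloutResult(0.0, True, history)
    # Every stage looked healthy, so the change is fully deployed.
    return RolloutResult(stages[-1], False, history)


# A healthy change reaches everyone; a change that misbehaves at 5% is pulled back.
healthy = staged_rollout(lambda fraction: 0.001)
faulty = staged_rollout(lambda fraction: 0.10 if fraction >= 0.05 else 0.001)
```

The point of the sketch is the economics: a faulty change only ever reaches a small slice of customers, so the cost of a mistake stays small even without a heavy verification phase up front.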

Lean Goal: sustainably shortest lead time. Best quality and value to people and society.

With a 2-hour build and release cycle it is hard to imagine any quicker path to deployed software. I don’t have any quality data, but you couldn’t maintain a working system like this with poor quality. Netflix added 4 million subscribers in the last quarter of 2013, which couldn’t happen if their product had a poor reputation or didn’t perform as advertised.

Lean Pillar: respect for the individual

Netflix’s development and administrative policies are heavily weighted towards freedom in a trusted environment. There is corresponding responsibility, of course, but for those who accept and thrive in this situation, it translates into high trust and respect for the individual. If you don’t perform at the top of your game then your future at Netflix is less than certain. I would normally not say that this is a respectful approach, but Netflix are quite clear on their core values and the competitive nature of their environment. Anyone going to Netflix goes there with eyes wide open, and in such a case I think it actually is an ok, open and honest way to go about it. You may not agree with the core values, but there’s nothing disrespectful about it.

Lean Pillar: Product Development Flow

Netflix has managed to take a huge step forward in achieving overall flow by (1) not batching individual code changes together for verification but releasing in small increments, and (2) removing the customary big-bang integration/verification phase. Not all code changes will break something. In fact, Netflix has recognized that there are many more passed tests than failed tests in the average project. If 90% of the tests pass, then why burden the project with anything but the 10% that reveal failures? Since we don’t know where the 10% hides, the straightforward thing to do is just to test it all. If you already have high quality in place, then the Netflix approach of releasing and then finding and resolving the 10% failure cases quickly is elegant. And probably less costly. Certainly it is faster for the 90% of features that deploy without problems.

Lean Foundation: Leadership

Kudos to the management team at Netflix. They have to really commit and have conviction that a counter-intuitive approach will work. Instead of putting more and more heavy layers of inspection and verification into their process, they are erring on the side of being too light. Instead of managing with a traditional heavy-handed approval culture, they focus on enabling smooth flow and high speed. You can see it working in a small startup… but a company with more than $4 Billion annual revenue?

Companies tend to calcify and become bureaucratic as they grow, yet Netflix has some of the most relaxed business policies around. Rather than degrading into a mess, in the right environment this can enable high performance.

High quality is necessary to delight customers and achieve high development speeds. If you really want high quality, then you have to pay for it. In most companies this means heavy verification cycles before final release. At Netflix this means (among other things) the high labor overhead described above and Chaos Monkey.

Chaos Monkey

What a great concept! Chaos Monkey is a software program that continuously runs and disables pieces of the Netflix application. When you unexpectedly terminate part of an application you get unexpected behavior and… chaos. A good application can deal with partial shutdown gracefully, but most don’t. It’s simply too hard to predict what combination of problems will eventually crash your system. So, Chaos Monkey runs continuously and does its mad thing until something crashes. Chaos Monkey keeps regular office hours, so this way Netflix can be prepared and deal with the problems during the workday instead of in the middle of the night when an emergency really strikes.

Maybe you don’t think this is a fair test. It’s a corner case that will almost never happen. That is what most engineers I know would say and they are right. But the real world is not fair, and unexpected problems will eventually happen. If you choose to ignore the unfair cases, that is your choice – but you accept a more fragile solution and you need a bigger customer support operation.

Neat approach, but what really sold me on the idea is that Chaos Monkey is not for controlled lab environments – it runs on the live production network which serves customers. Netflix runs on Amazon Web Services (AWS), and the cloud environment can be unpredictable. Instead of relying on AWS for resilience, the vulnerabilities identified by Chaos Monkey are fixed and the Netflix application itself recovers from unexpected failures. In the first year of existence Chaos Monkey terminated 65,000 live virtual instances on AWS. As far as I can tell Netflix has a much higher availability record than other services on AWS.
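The core idea is simple enough to sketch in a few lines of Python. This is only my illustration of the concept, not the real Chaos Monkey code: the 9-to-5 weekday window, the instance names and the functions in_office_hours and pick_victim are assumptions for the example, and nothing here actually talks to AWS.

```python
import random
from datetime import datetime
from typing import List, Optional


def in_office_hours(now: datetime, start_hour: int = 9, end_hour: int = 17) -> bool:
    """True on Monday to Friday between start_hour and end_hour (assumed window)."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def pick_victim(instances: List[str], now: datetime, rng: random.Random) -> Optional[str]:
    """Choose one running instance to terminate, or None outside office hours.

    A real tool would go on to call the cloud provider to kill the instance;
    this sketch stops at the selection step.
    """
    if not instances or not in_office_hours(now):
        return None
    return rng.choice(instances)


# Hypothetical instance names, for illustration only.
fleet = ["api-1", "api-2", "recommendations-1"]
rng = random.Random(0)

weekday_morning = pick_victim(fleet, datetime(2014, 1, 15, 10, 0), rng)  # a Wednesday
middle_of_night = pick_victim(fleet, datetime(2014, 1, 15, 3, 0), rng)
weekend = pick_victim(fleet, datetime(2014, 1, 18, 10, 0), rng)          # a Saturday
```

Random selection is the whole trick: nobody gets to decide which component is "safe" from failure, so every team has to design for recovery.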

It’s gutsy, an inventive solution by the R&D team, and it also reflects a fundamental commitment from the management team to continuously improving the resilience of Netflix. What other company do you know which intentionally breaks their own product while in the hands of the customer?

The bottom line

The Netflix approach is novel, counter-intuitive, quality-focused in a different way and it seems to pay off. Could you replicate the Netflix culture in your company? Probably not. But maybe you, like me, take inspiration from the Netflix story to not go the easy and traditional route, but look for innovative solutions even (or especially!) if it breaks with conventional knowledge. It should give you pause to think about your current setup and what economic model you are following: are you simply chasing the lowest possible labor rate, or are you more concerned about the overall return on investment?

Lean/Agile Product Development requires a different investment mindset. If you invest with product development flow in mind (regardless of your outsourcing situation) then the benefits are not 5% or 10% improvement, but 50% to 100% or more. The solution is not in lowest possible labor rate (although that always helps) but in the highest possible ROI on the next R&D dollar.

Welcome Post

Hello and Welcome to the Lean Viking’s blog.

Here you will find thoughts and ideas on Lean and Agile Product Development, inspired by my affinity for minimalistic designs, Lean principles and Agile methods.

There’s lots of information out there about how Lean and Agile development works. But how do you get there if you are entrenched in a waterfall-based culture today? And what if you’re working in a large-scale environment? Wishing for Agile won’t make it so, especially if you are trying to scale up beyond a handful of teams. Agile at large scale is difficult… but is there really a viable long-term alternative? Organizations move to Agile because things aren’t working well in the Waterfall model. Those troubles are magnified at scale. The good news is that improvements are magnified as well, and scale economics work their wonders.

There is much hard work to do and a lot of it involves changing fundamental behavior. But what could be more rewarding than moving a whole organization to a new level of performance?

My hope is that you will follow this blog as an active participant. The purpose of this blog isn’t to give me a soapbox platform. I am blogging because I want to share my thoughts and hear what y’all have to say about Agile at scale.

Cheers,

Odd