The Formula-1 Pit Stop: Lean counter-counter-intuition

What do F1 racing and Lean Product Development have in common? Not much on the surface… but if you, as I do, see everything as systems then you can’t help but notice some interesting things. In this case I found an apparent counter-example to Lean that is Lean despite going against the grain of traditional Lean Thinking. Hmmm… we’re into double-negatives here but stay with me.

The racing analogy is interesting to me not because “Lean=Speed” but because someone questioned the “obvious right way” and came up with a counter-intuitive better solution.

I am always amazed every time I catch even a glimpse of Formula-1 racing. The cars fly around the track at well over 300 kilometers per hour, pull 1.45g during acceleration and 4g when braking. High speeds, tight turns and frequent acceleration and braking wear hard on car and driver, but even more so on the tires, which aren’t even engineered to last a whole race.

An F1 tire these days is designed to only last for about 120 kilometers on average (it’s a weight vs. durability tradeoff), but most F1 races are at least 305 kilometers long. That means you need to change tires 2 or 3 times in a race that is won or lost by fractions of a second.

I’m fascinated by this, of course, because the pit stop is the biggest impediment to continuous flow around the track. If you could make one fewer pit stop than your competitor you would be several seconds ahead, turning 10th place into victory. For a long time that was the strategy: go easy around the curves so as to conserve tires and fuel. A lower average speed would win the race as long as you could avoid making too many pit stops. So far it sounds Lean, right? Slow down and you’ll finish the race faster. No surprise there: Lean solutions are usually counter-intuitive.

Twisting and Turning

Well, every good story has a twist and our little F1-Lean analogy is no different.

In the mid-1980s someone took a step back and looked at the whole end-to-end system and realized that the “economy” of racing could be improved. Start the race with only half a tank of fuel, and the car would be much lighter and go faster. Stop worrying about conserving tires and instead push the car to the limits on the track. The penalty for this strategy is an added number of pit stops.

Not a problem – you just need to minimize the time spent in each pit stop.

It’s a tradeoff curve, as always. If you can continuously reduce the amount of time needed to refuel and swap tires, then at some point down the curve the wear-and-tear vs. pit stop balance will shift.
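To make the shift along that curve concrete, here is a minimal sketch in Python. It is a toy model, not real F1 data: the lap count, base lap time, per-lap tire wear penalty and the function names are all assumptions chosen for illustration, and fuel weight is ignored entirely. Each lap on a worn set of tires is a little slower than the one before, and each pit stop resets the wear at a fixed time cost.

```python
def race_time(stops, pit_seconds, laps=60, base_lap=90.0, wear_per_lap=0.08):
    """Total race time in seconds for a given number of evenly spaced pit stops.

    All numbers are illustrative assumptions, not real F1 data:
    each lap on the same set of tires is `wear_per_lap` seconds slower
    than the previous one, and every stop costs `pit_seconds`.
    """
    stints = stops + 1
    # Split the laps as evenly as possible across the stints.
    stint_lengths = [laps // stints + (1 if i < laps % stints else 0)
                     for i in range(stints)]
    driving = 0.0
    for length in stint_lengths:
        # Lap k of a stint (k = 0, 1, ...) takes base_lap + k * wear_per_lap,
        # so the wear penalty over a stint sums to wear_per_lap * length * (length - 1) / 2.
        driving += length * base_lap + wear_per_lap * length * (length - 1) / 2
    return driving + stops * pit_seconds


def best_stop_count(pit_seconds, max_stops=5):
    """Number of stops that minimizes total race time under this toy model."""
    return min(range(max_stops + 1), key=lambda s: race_time(s, pit_seconds))


# As the pit stop gets faster, the best strategy moves from 0 stops to 3.
for pit_seconds in (80, 30, 20, 10):
    print(pit_seconds, best_stop_count(pit_seconds))
```

With these made-up numbers an 80-second stop is never worth it, while a 10-second stop makes three stops the fastest way to the finish. The exact crossover points mean nothing; the point is that the “best” strategy is a function of the pit stop cost, and it moves when that cost moves.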

Now it gets interesting. Because we now have two different models of “good” racing strategy, we have to choose – but how? We have to take a systems view, and make the decisions based on the objective merits of each strategy, not by intuition or personal preference.

Yes but we’re talking about Product Development, right?

You see analogies in Product Development all the time. Whenever there is a bottle-neck in our process we have to decide to either fix/improve the bottle-neck, or try to avoid it. Not all bottle-necks are solvable or even visible. Some are disguised as “the way we always do things here”. Most companies have settled on a particular pattern of which bottle-necks (departments/phases) are reasonable (acceptable as the cost of doing business) and which ones aren’t. Companies that structure the development flow through phases and gates accept the overhead associated with functional departments as “not perfect but the best way to do things”.

Lean Thinkers challenge this view all the time: figure out where the flow stops and then improve it. Usually it comes down to finding local optimizations and then reducing or eliminating tasks in the name of overall flow. For example, eliminating pit stops so that the race can flow un-interrupted.

And sometimes Lean Thinkers have to challenge themselves, to avoid getting stuck in the “best” Lean solution.

So where did Formula-1 end up?

Have a look at this Ferrari pit stop from 2013. Mid-race refueling is no longer allowed in F1, so now it’s all about how quickly you can get 4 new tires on the car. You can feel the anticipation as you watch the pit crew waiting for the car to arrive.

That pit stop took 2.1 seconds. It’s a huge improvement from the minute-plus pit stops of the early days of F1 in the 1950s. Pit crews spend a lot of time and money to squeeze out every millisecond they can from their process. You can almost visualize a value stream map on the garage wall and the team swarming to find and reduce the next bit of waste from the process.

Is it a fair analogy?

But, many will say, that’s not a fair analogy. Obviously they have a lot of specialized equipment and a huge crew of specialists standing by. This is high-stakes racing and has nothing to do with Lean or product development.

Well that’s the whole point behind this blog post. It’s a classic example of how Lean thinking differs from mass-production thinking. It just manifests itself in a different environment. Speed (and safety) matters in F1, and speed (and safety) matters in Lean.

Systems of systems: Lean Fractals

There is a very direct and obvious application of Lean Thinking and Value Streams to what happens in the pit. If you have seen a Value Stream Map, you can easily understand how that helps weed out waste and inefficiencies in the process flow of safely and quickly changing the tires.

But there is another higher-level system at play, which includes both the pit stop and the track. This system-of-systems has a different kind of flow economy, working off a different set of aggregated information.

If we ignore the system-of-systems effect and blindly apply Lean tools, it can lead us astray. Pre-1980s racing solved for Lean flow based on how the costs were incurred back then, i.e. relatively slow pit stops. Slow down around the track and finish in first place. However, the cost of any activity changes over time, and the technology and capability of the pit crew improved such that the basic assumptions behind “the best way to race” had to change.

In this case they had to question some long-held beliefs and assumptions in order to get to a new level of performance. Similarly we find the biggest improvements hiding in plain sight when we look at our overall product development system and question our traditional way of working. Our world is full of systems-of-systems.

I draw two lessons from the F1-Lean analogy:

Question the Status Quo

The obvious approach isn’t always right, and you won’t see it until you look at the whole end-to-end system. It is counter-intuitive that adding one more time-consuming pit stop to your race will speed things up overall… but it does – up to a point.

Even if you settle on a “best” Lean solution, this might also be a local optimum… keep looking and keep questioning. The journey through solution space is not always a linear one.

Invest in non-mainstream efforts

In order to lean out the value stream you may have to invest and focus on non-mainstream efforts such as tooling and supporting activities. This is the stuff of overhead. It’s hard to justify increasing the overhead cost when the normal pressure is to reduce expenses. But in a Lean system the right overhead isn’t a liability – it’s what enables the mainstream to go fast. The F1 teams had to shift investment away from car and engine design and onto the less-glorious pit crew tools and processes.

The difference is in the kind of overhead: instead of traditional overhead which is needed to manage the waste generated by stitching together the work of separate departments, Lean “overhead” is there to make the value stream flow faster at higher quality.

Yes it’s a fair analogy

To return to the point above, yes the analogy is totally fair. You can usually make your product development progress much faster if you invest in ancillary processes and tools with an eye towards end-to-end Flow. The Ferrari team in the video clip shows 21 crew members, all working frantically for less than 3 seconds. It takes 3 people per tire to do the job. Wasteful resource utilization? Not if time matters and your objective is to get the car through the pit stop quickly. Sure, we obviously can’t allocate 21-person support teams for all tasks, but the idea is to figure out what it takes to achieve the best end-to-end flow and then invest accordingly.

When I first started developing software we had bi-weekly load builds because it was such a difficult thing to get a clean build with 40 engineers all submitting several weeks of code changes, and build servers were very expensive. We avoided load build “pit stops” until they were absolutely needed.

Gradually the situation changed, and we now have efficient continuous build and test environments and the ability to submit code changes daily. The “economy” of our pit stop changed. It was initially not easy convincing management to invest in extra build servers, tools and staff – but eventually we met the point on the tradeoff-curve where investing in ancillary things like load builds made more and more sense. Now every serious software group has a DevOps setup.

Local vs. Global Optimization

There is a balance between local and global optimizations, and the balance can shift over time. You can’t even grasp the concept of that balance unless you look at the end-to-end system flow. Chances are that you are only looking at improvements within your own department. If you are, then you might routinely leave improvements of an order of magnitude or more on the table. Look one level up, at the system-of-systems, and see what you can find.

Lean Systems: Antifragile Applied

“Systems subjected to randomness—and unpredictability—build a mechanism beyond the robust to opportunistically reinvent themselves each generation”

– Nassim Nicholas Taleb

In a previous post I introduced the concept of Antifragility – systems that benefit from shocks, randomness and disorder. Classifying the world in the triad of Fragile – Robust – Antifragile helps us understand and manage the potential impact of the uncertainty surrounding us.

It’s initially hard to imagine that anything useful could benefit from disorder, so the first thing to realize is that although objects and things can be Fragile or Robust, they can’t be Antifragile. Systems, on the other hand (which of course includes Product Development Systems), are made up of multiple interacting components. Systems exhibit behavior as they respond to their surroundings, and can be Fragile, Robust or Antifragile. It is this ability to respond and interact that opens the door to antifragility. Antifragility can be seen as a type of evolutionary mechanism, continuously picking the best of the available options. So, when we look for examples of antifragility we need to look at systems, not objects.

Stressors: the fuel of Antifragility

A stressor is something that puts a strain on the system, pulls it away from its equilibrium. It’s the system’s response to stressors that classifies the system as either Fragile, Robust or Antifragile.

A system that gets weaker from the encounter with the stressor is Fragile. For example, a pyramid scheme collapses when exposed to the light of day. Not only dictators (the individual) but the foundation of the dictatorship (the system) crumbles when the forces of democratic thought are applied. The best-laid project plan with all its Gantt charts has a best-before date sometime before the first problem is discovered.

Robust systems neither get weaker nor stronger in the presence of a stressor. Most government bureaucracies seem to fall in this category – their inability to learn and evolve astounds me, as does their unequaled staying power. Many companies operate in this way too. New ideas get rejected and expelled by the corporate immune system, allowing the company structure to stay the same even in the face of certain bankruptcy. Remember Kodak? GM?

Antifragile systems on the other hand enjoy randomness and stressors, at least up to a point. Shocks and disruption make them stronger because they keep the system alert and in shape. Stressors exercise and improve the system the same way physical activity stresses and improves your body. Strength training, for instance, involves pushing your muscles just past their breaking point. Your body is able to repair this damage and even over-shoots in the repair effort. The result is that you are left with a little more muscle mass than you had before. This is how Schwarzenegger became Schwarzenegger and Ahnold was again a cool and acceptable name for your first-born. Without these stressors the system would stagnate, much like a couch-potato grows the wrong kind of body mass and ends up with clogged arteries.

Of course, there is a limit to how much stress is beneficial. Running at a reasonable effort level puts you in better shape; the first marathoner supposedly expired at the goal line, having historically over-exerted himself to deliver with his last gasp the one-word message to the king: “victory”.

(hang on – if they won the battle, then why the life-and-death rush? Good news would still be reasonably good the next morning, right?)

The next important thing to understand about Antifragile Systems is that they work in layers. It is not enough that individual members get stronger, the system as a whole needs to be able to survive and thrive. It needs to be able to learn and select.

It’s in the DNA of the System

Going back to our example of Mother Nature as the ultimate antifragile system, we can observe that the individual members of a species are inherently fragile. In fact, each member will eventually die off, no matter how strong it is. There is a natural turnover to make room for the newer and more fit members. By natural selection and replacement of individuals the system becomes more and more fit. There is a layering effect here. Individual members (at the lowest layer) compete with each other. The strong propagate their DNA and have (presumably) stronger offspring, the weaker gradually (or abruptly, as the case may be) exit the gene pool. The system as a whole (at a higher layer) grows stronger as a result. The system survives the demise of each of its members because the information that makes up the system is preserved in its DNA, surviving generation after generation of individuals.

By evolution such a system improves gradually even if there is no master plan and things happen at random. The system continuously Inspects and Adapts, and the current “best recipe” is carried forward in our DNA. As long as we recognize and seize opportunity, even a random walk will be beneficial. Antifragile systems love errors and variation for that reason.

Lean Systems: Fragile?

Lean systems are called Lean because they deliberately operate with very small error margins. For example, Lean Manufacturing systems are sometimes called “zero-inventory” systems because they have almost no buffer inventory to absorb variations and problems at individual stations. If there is a problem somewhere on the production line, the whole system could shut down. This is by design: in a tightly coupled system small problems are amplified to make them painfully obvious, and every problem becomes an urgent matter.

In one sense Lean systems are therefore very fragile to disorder and error so one might be tempted to simply put Lean in the Fragile category. But it’s not that simple. The antifragility of Lean is in the DNA of the system.

Lean Systems: Antifragile

So we need to reconcile the apparent fragility of the small operating margins of a Lean system with the claim that Lean systems are antifragile.

I like Steven Spear’s (The High Velocity Edge) summary of a good Lean implementation:

  1. Build a system of “dynamic discovery” designed to reveal operational problems and weaknesses as they arise
  2. Attack and solve problems when and where they occur, converting weaknesses into strengths
  3. Disseminate knowledge gained from solving local problems throughout the company as a whole
  4. Lead by developing capabilities 1, 2 and 3

The ingenuity and beauty of Lean is that even small problems become intolerable at the system level. Lean Systems use this fragile tight coupling as a way to accelerate system-level learning. If a problem develops, it immediately becomes painfully obvious that something is wrong.

Rather than working around or ignoring these small problems, the team in charge is obligated to immediately seize the opportunity to improve the way the system works before the small problem becomes a big problem. A good lean team will swarm the problem to get it fixed, and put in place measures to ensure that similar problems don’t occur in the future. The result is that the particular process step which failed now has improved and is less likely to fail in the future.

Antifragile systems love errors, and so do Lean systems. The fragility of small error tolerances acts as a forcing function which brings problems to the surface, causing the old faulty processing step to evolve and be replaced with a new and more fit one. Each small failure alters the DNA of the Lean system just a little bit, evolving and improving. One more problem spot has been eliminated, and the probability of future defects is reduced.
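One way to picture that forcing function is a small Python sketch. Everything in it is a hypothetical illustration: the number of process steps, the starting failure probability and the `learn` rate are made-up parameters, and real improvement is never this smooth. It simply compares a system that works around its problems with one that removes a share of the root causes after every run.

```python
def expected_defects(cycles, p_fail=0.10, steps=50, learn=0.0):
    """Expected number of defects over `cycles` production runs.

    Hypothetical model: each of `steps` process steps fails with
    probability `p_fail` per run. After every run the team removes a
    fraction `learn` of the remaining failure probability
    (learn=0.0 means problems are worked around, never fixed).
    """
    total = 0.0
    for _ in range(cycles):
        total += p_fail * steps   # expected failures in this run
        p_fail *= (1.0 - learn)   # root causes fixed, so the next run fails less often
    return total


print(expected_defects(10))             # work around problems: defects keep coming
print(expected_defects(10, learn=0.2))  # fix root causes: defects taper off
```

Under these assumptions the system that learns from each failure sees fewer than half as many defects over ten runs as the one that absorbs them, and the gap widens with every additional run.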

So here is a perfect example of a system that is designed to evolve over time, to learn from mistakes and to grow more capable after each error. It needs no top-down direction other than living the Lean Principles. There is no master plan, yet Lean systems evolve on their own to become the most competitive and effective man-made systems we have on our planet.

Evolving. Learning. Antifragile. Lean. Wonderful.

If it’s a Pipeline, it’s leaking

Many times we view the Product Development System as a Pipeline where we pour effort and energy in, and out comes a product sometime later. You’ve probably used this analogy before, talking about “products in the pipeline” or “the R&D pipeline”.

Pipe-1

Seems pretty intuitive, and I use that analogy too. Except I recently thought perhaps the analogy isn’t quite right. If you’re working in a Waterfall or phase-gate process, it’s not a single pipeline. It’s a series of smaller pipe lengths which are joined together by hand-offs:

pipe-2

The trouble with hand-offs is that they generate waste. Throughout the journey there can be more energy lost in hand-offs than actually makes it out of the pipeline. In every joint, effort and energy leak out.

Pipe-3
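The leaking joints can be put into rough numbers with a short Python sketch. The 15% loss per hand-off is purely an assumed figure for illustration, as is the idea that every joint leaks the same share; the function name is hypothetical too.

```python
def delivered_fraction(handoffs, loss_per_handoff):
    """Fraction of the input effort that survives a chain of hand-offs.

    Assumes every hand-off loses the same share of whatever reaches it;
    both the loss figure and the uniformity are illustrative assumptions.
    """
    return (1.0 - loss_per_handoff) ** handoffs


for handoffs in (1, 3, 5, 7):
    print(handoffs, round(delivered_fraction(handoffs, 0.15), 2))
```

Because the losses compound, five hand-offs at an assumed 15% each already leak more than half of what went in, which is the sense in which more can be lost in the joints than comes out the far end.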

I find this analogy is a little more fitting, and although it’s a simple visual it helps make the point about hand-offs at the simplest possible level. The discussion usually turns to “what are the leaks and how do we stop them” and there is your entry to discuss Lean and Waste.

What do you think?