The Formula-1 Pit Stop: Lean counter-counter-intuition

What do F1 racing and Lean Product Development have in common? Not much on the surface… but if you, as I do, see everything as systems then you can’t help but notice some interesting things. In this case I found an apparent counter-example to Lean that is Lean despite going against the grain of traditional Lean Thinking. Hmmm… we’re into double-negatives here but stay with me.

The racing analogy is interesting to me not because “Lean=Speed” but because someone questioned the “obvious right way” and came up with a counter-intuitive better solution.

I am always amazed every time I catch even a glimpse of Formula-1 racing. The cars fly around the track at more than 300 kilometers per hour, pull 1.45g during acceleration and 4g when braking. High speeds, tight turns and frequent acceleration and braking wear hard on car and driver, but even more so on the tires, which aren’t even engineered to last a whole race.

An F1 tire these days is designed to only last for about 120 kilometers on average (it’s a weight vs. durability tradeoff), but most F1 races are at least 305 kilometers long. That means you need to change tires 2 or 3 times in a race that is won or lost by fractions of a second.

I’m fascinated by this, of course, because the pit stop is the biggest impediment to continuous flow around the track. If you could make one less pit stop than your competitor you would be several seconds ahead, turning 10th place into victory. For a long time that was the strategy: go easy around the curves so as to conserve tires and fuel. A lower average speed would win the race as long as you could avoid making too many pit stops. So far it sounds Lean, right? Slow down and you’ll finish the race faster. No surprise there: Lean solutions are usually counter-intuitive.

Twisting and Turning

Well, every good story has a twist and our little F1-Lean analogy is no different.

In the mid-1980s someone took a step back and looked at the whole end-to-end system and realized that the “economy” of racing could be improved. Start the race with only half a tank of fuel, and the car would be much lighter and go faster. Stop worrying about conserving tires and instead push the car to the limits on the track. The penalty for this strategy is an added number of pit stops.

Not a problem – you just need to minimize the time spent in each pit stop.

It’s a tradeoff curve, as always. If you can continuously reduce the amount of time needed to refuel and swap tires, then at some point down the curve the wear-and-tear vs. pit stop balance will shift.
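To make the tradeoff curve concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration, not real F1 data: a 60-lap race, a 90-second base lap, tires that lose a tenth of a second per lap of age, and a fixed time lost per pit stop. The function names race_time and best_stops are hypothetical as well.

```python
def race_time(stops, pit_loss_s, base_lap_s=90.0, laps=60, wear_penalty_s=0.1):
    """Total race time in seconds for a number of evenly spaced pit stops.

    Toy model (all numbers are illustrative assumptions): every lap on a set
    of tires is wear_penalty_s slower than the lap before it, and every pit
    stop costs a fixed pit_loss_s.
    """
    stints = stops + 1
    base, extra = divmod(laps, stints)
    # Spread the laps as evenly as possible over the stints.
    stint_lengths = [base + 1] * extra + [base] * (stints - extra)
    driving = 0.0
    for n in stint_lengths:
        # Lap k of a stint (k = 0, 1, ...) takes base_lap_s + wear_penalty_s * k.
        driving += n * base_lap_s + wear_penalty_s * n * (n - 1) / 2
    return driving + stops * pit_loss_s


def best_stops(pit_loss_s, max_stops=5, **model):
    """Number of stops (0..max_stops) with the lowest total race time."""
    return min(range(max_stops + 1),
               key=lambda stops: race_time(stops, pit_loss_s, **model))


if __name__ == "__main__":
    for pit_loss in (60.0, 25.0, 10.0):
        print(f"pit stop costs {pit_loss:>4.0f}s -> best plan: "
              f"{best_stops(pit_loss)} stop(s)")
```

In this toy model a 60-second pit stop favors a single stop, while a 10-second stop favors three. The absolute numbers mean nothing; the point is that the "best" strategy moves as the cost of the pit stop falls.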

Now it gets interesting. Because we now have two different models of “good” racing strategy, we have to choose – but how? We have to take a systems view, and make the decisions based on the objective merits of each strategy, not by intuition or personal preference.

Yes but we’re talking about Product Development, right?

You see analogies in Product Development all the time. Whenever there is a bottle-neck in our process we have to decide to either fix/improve the bottle-neck, or try to avoid it. Not all bottle-necks are solvable or even visible. Some are disguised as “the way we always do things here”. Most companies have settled on a particular pattern of which bottle-necks (departments/phases) are reasonable (acceptable as the cost of doing business) and which ones aren’t. Companies that structure the development flow through phases and gates accept the overhead associated with functional departments as “not perfect but the best way to do things”.

Lean Thinkers challenge this view all the time: figure out where the flow stops and then improve it. Usually it comes down to finding local optimizations and then reducing or eliminating tasks in the name of overall flow. For example, eliminating pit stops so that the race can flow un-interrupted.

And sometimes Lean Thinkers have to challenge themselves, to avoid getting stuck in the “best” Lean solution.

So where did Formula-1 end up?

Have a look at this Ferrari pit stop from 2013. Mid-race refueling is no longer allowed in F1, so now it’s all about how quickly you can get 4 new tires on the car. You can feel the anticipation as you watch the pit crew waiting for the car to arrive.

That pit stop took 2.1 seconds. It’s a huge improvement from the minute-plus pit stops of the early days of F1 in the 1950s. Pit crews spend a lot of time and money to squeeze out every millisecond they can from their process. You can almost visualize a value stream map on the garage wall and the team swarming to find and reduce the next bit of waste from the process.

Is it a fair analogy?

But, many will say, that’s not a fair analogy. Obviously they have a lot of specialized equipment and a huge crew of specialists standing by. This is high-stakes racing and has nothing to do with Lean or product development.

Well that’s the whole point behind this blog post. It’s a classic example of how Lean thinking differs from mass-production thinking. It just manifests itself in a different environment. Speed (and safety) matters in F1, and speed (and safety) matters in Lean.

Systems of systems: Lean Fractals

There is a very direct and obvious application of Lean Thinking and Value Streams to what happens in the pit. If you have seen a Value Stream Map, you can easily understand how that helps weed out waste and inefficiencies in the process flow of safely and quickly changing the tires.

But there is another higher-level system at play, which includes both the pit stop and the track. This system-of-systems has a different kind of flow economy, working off a different set of aggregated information.

If we ignore the system-of-systems effect and blindly apply Lean tools, it can lead us astray. Pre-1980s racing solved for Lean flow based on how the costs were incurred back then, i.e. relatively slow pit stops. Slow down around the track and finish in first place. However, the cost of any activity changes over time, and the technology and capability of the pit crew improved such that the basic assumptions behind “the best way to race” had to change.

In this case they had to question some long-held assumptions in order to get to a new level of performance. Similarly we find the biggest improvements hiding in plain sight when we look at our overall product development system and question our traditional way of working. Our world is full of systems-of-systems.

I draw two lessons from the F1-Lean analogy:

Question the Status Quo

The obvious approach isn’t always right, and you won’t see it until you look at the whole end-to-end system. It is counter-intuitive that adding one more time-consuming pit stop to your race will speed things up overall… but it does – up to a point.

Even if you settle on a “best” Lean solution, this might also be a local optimum… keep looking and keep questioning. The journey through solution space is not always a linear one.

Invest in non-mainstream efforts

In order to lean out the value stream you may have to invest and focus on non-mainstream efforts such as tooling and supporting activities. This is the stuff of overhead. It’s hard to justify increasing the overhead cost when the normal pressure is to reduce expenses. But in a Lean system the right overhead isn’t a liability – it’s what enables the mainstream to go fast. The F1 teams had to shift investment away from car and engine design and onto the less-glorious pit crew tools and processes.

The difference is in the kind of overhead: instead of traditional overhead which is needed to manage the waste generated by stitching together the work of separate departments, Lean “overhead” is there to make the value stream flow faster at higher quality.

Yes it’s a fair analogy

To return to the point above, yes the analogy is totally fair. You can usually make your product development progress much faster if you invest in ancillary processes and tools with an eye towards end-to-end Flow. The Ferrari team in the video clip shows 21 crew members, all working frantically for less than 3 seconds. It takes 3 people per tire to do the job. Wasteful resource utilization? Not if time matters and your objective is to get the car through the pit stop quickly. Sure, we obviously can’t allocate 21-person support teams for all tasks, but the idea is to figure out what it takes to achieve the best end-to-end flow and then invest accordingly.

When I first started developing software we had bi-weekly load builds because it was such a difficult thing to get a clean build with 40 engineers all submitting several weeks of code changes, and build servers were very expensive. We avoided load build “pit stops” until they were absolutely needed.

Gradually the situation changed, and we now have efficient continuous build and test environments and the ability to submit code changes daily. The “economy” of our pit stop changed. It was initially not easy convincing management to invest in extra build servers, tools and staff – but eventually we met the point on the tradeoff-curve where investing in ancillary things like load builds made more and more sense. Now every serious software group has a DevOps setup.

Local vs. Global Optimization

There is a balance between local and global optimizations, and the balance can shift over time. You can’t even grasp the concept of that balance unless you look at the end-to-end system flow. Chances are that you are only looking at improvements within your own department. If you are, then you might routinely leave improvements of an order of magnitude or more on the table. Look one level up, at the system-of-systems, and see what you can find.

If it’s a Pipeline, it’s leaking

Many times we view the Product Development System as a Pipeline where we pour effort and energy in, and out comes a product sometime later. You’ve probably used this analogy before, talking about “products in the pipeline” or “the R&D pipeline”.

[Figure: the product development system as a single pipeline]

Seems pretty intuitive, and I use that analogy too. Except I recently thought perhaps the analogy isn’t quite right. If you’re working in a Waterfall or phase-gate process, it’s not a single pipeline. It’s a series of smaller pipe lengths which are joined together by hand-offs:

[Figure: a series of shorter pipes joined together by hand-offs]

The trouble with hand-offs is that they generate waste. Throughout the journey there can be more energy lost in hand-offs than actually makes it out of the pipeline. In every joint, effort and energy leaks out.

[Figure: effort and energy leaking out at every joint in the pipeline]

I find this analogy is a little more fitting, and although it’s a simple visual it helps make the point about hand-offs at the simplest possible level. The discussion usually turns to “what are the leaks and how do we stop them?” and there is your entry to discuss Lean and Waste.
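If you want to put a number on the leaking pipeline, a back-of-the-envelope sketch in Python could look like this. It assumes, purely for illustration, that every hand-off loses the same fixed share of whatever reaches it; the 15% figure and the function name delivered_fraction are made up, not measurements.

```python
def delivered_fraction(handoffs, loss_per_handoff):
    """Share of the effort poured in that survives a chain of hand-offs.

    Assumes every joint leaks the same fraction of what reaches it,
    so the losses compound: (1 - loss) ** handoffs.
    """
    return (1.0 - loss_per_handoff) ** handoffs


if __name__ == "__main__":
    loss = 0.15  # illustrative assumption: 15% leaks out at each joint
    for joints in range(7):
        print(f"{joints} hand-offs: {delivered_fraction(joints, loss):.0%} delivered")
```

With these assumed numbers, four hand-offs still deliver a little over half of the effort, but at five hand-offs (0.85 to the fifth power, roughly 0.44) more has leaked out than makes it through the pipeline.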

What do you think?

Why not SAFe?

So let me get this out right away: I’m a SAFe proponent. But hang on – it doesn’t make me a “pure-Agile” non-supporter.

I have my PSM certification from scrum.org and my SPC certification from Scaled Agile Academy, so I feel comfortable enough to comment on the discussion. The discussion is not as intense as it was a few weeks ago, but I sense that the residual effect is for some to bypass SAFe as even an option to consider.

It’s too bad that what would otherwise be a healthy discussion surrounding SAFe and “pure Agile” is causing division. Peer review and feedback are the best way to ruthlessly drive things forward and evolve, but if we allow discussion to degrade into one or both sides digging their heels in and simply trying to convince the rest of the world that their approach is best, instead of understanding each other and moving forward, the value of discourse quickly dissipates.

On this front, Dean and SAFe score the point: every time I have heard Dean speak about SAFe, he has been very clear that there is a lot to be gained from being pragmatic and tolerant of others’ views. Rather than taking a polarizing “my-way-or-the-highway” stance, the creators of SAFe already have an advantage here that will work in their favor in the long run.

So what is it about SAFe that makes it worth the investment, that other Agile approaches haven’t covered well enough yet? In my view…

SAFe addresses the people not on the Agile team.

This insight by itself should be enough to pique your interest. Most Lean/Agile material I’ve seen focuses on how to implement Agile methodologies, but very little attention is paid to the stakeholders outside the team. Yes I know, there are exceptions out there, but generally I feel that we don’t pay enough attention to the folks outside the immediate development team.

SAFe addresses these stakeholders as first-class citizens, not as impediments to progress that have to be “turned around”. That’s a smart thing. Those “impediments” often sign your paycheck, and if they are not comfortable with what’s happening, your Agile Adventure will be short-lived or at least very limited in reach. Let’s invite them to contribute in a format they can relate to and be effective in.

SAFe has the right ingredients and building blocks

  • Lean principles as the leadership backdrop
  • Scrum as the proven team-level project management framework
  • XP-inspired coding practices
  • Don Reinertsen’s Principles of Product Development Flow

You may have a different set of favorites, but when I saw the above 4 ingredients in the same recipe, I definitely paid attention. Ever since I read and re-read Don’s book I’ve been on the lookout for applications of this really wonderful and clarifying set of ideas.

SAFe is for large programs.

The “S” in SAFe stands for “Scaled” – a point which I feel is lost on some of the SAFe critics. I have spent all my working life in large organizations in both software development and project/program management roles, and I can confidently say that if you want to manage anything at a scale beyond a handful of people you will need some sort of framework to keep everything aligned. Simply relying on a shared set of values and a common goal isn’t enough. Multiple teams simply won’t self-organize in a single direction. It doesn’t matter whether you are doing Agile or not – large groups of people working together are going to require alignment and guidance to enable some sort of orchestrated delivery. As Reinertsen points out: “there is more to be achieved from overall alignment than local optimization”.

Statistically we will have some success-stories where overall alignment happens in an organic manner, but don’t count on it for your project. You N. N. Taleb fans out there know what I mean.

SAFe pulls it all together and communicates a holistic view

Again, selling Lean/Agile to folks outside the Agile teams is where you either achieve an Agile enterprise transformation, or you don’t. You don’t have to “sell” Agile to the teams themselves, they are usually the ones that get onboard first. If you want to scale Agile to the enterprise-level, you have to sell the idea to those that have the most influence.

The value of having a single-page graphic which acknowledges the various roles and the main flow of events can’t be overstated. Here, for the first time, we have a map of the world in which everyone in the existing organization can see a place for themselves in an Agile setup, post-transformation. That immediately defuses a lot of the potential friction and apprehension associated with introducing Agile to the C-level office and the middle management layer. You know, the ones who decide what you can and can’t do in the enterprise. Kind of important to get those folks on your side. Showing the full context of an Agile Release Train is definitely much easier than selling the idea of “Agile self-organizing” teams.

SAFe may not be for you

…and that’s ok.

If you’re starting out and introducing Agile to a small organization, you don’t need or want SAFe. If you’re working with a small set of teams, it’s probably not for you. That’s ok – SAFe starts to make sense at scale. Keep that in mind as you grow.

SAFe is not a one-size-fits-all solution. Neither would you want one. Personally I just need a model that fits my situation and environment. Most of us work in one company, and in one environment. Chances are slim that we will need a large-scale Agile solution one day and a small-team solution the next. If SAFe works for what I need to do, then that is fine. If not, then move on and find something else. But please don’t make it your mission to convince others (who you don’t even know) that some approach or other is not an option for them. SAFe is indeed an option, and so is “pure Agile” and any other variant such as DAD, but only you can decide which one is right for you.

Use SAFe (or any other model) responsibly

A lot of the SAFe criticism I’ve seen seems to assume that our teams don’t think for themselves, and that SAFe would be adopted blindly and without real-world context.

Isn’t that anti-Agile? Thinking people can understand when to apply the mechanisms offered by SAFe and when to do something else. Not all aspects of SAFe will work for you, but it would be a shame to throw out 100% of the model just because there is 10% that doesn’t work for you. Use the parts that are helpful to you, discard the rest (for now). Whatever defined approach you choose, you’ll need to tune it to your particular situation anyway. I think Dean and the SAFe crew would agree.

…which brings me to the point of this blog post

Although a polarizing discussion is entertaining and can provoke some great insights, at some point we reach diminishing returns, and instead of improving we alienate. It would be too bad if this argument causes some folks to bypass the good work done by the SAFe team. SAFe can provide a huge head-start for some companies, especially those who need to visualize a pathway out of the waterfall.

If you are rejecting SAFe on the basis of comments made by some Agile rock-stars, pause for a moment and put it all in context. What alternatives are they suggesting? In most companies, it’s not enough to simply rely on self-organization around a shared goal and scrum-of-scrums. At some point at scale you will eventually need some sort of framework and holistic planning involving the rest of the enterprise, whether it’s called SAFe or not.

If you are embracing SAFe because it looks less threatening to your existing organization and an easy fix, you should probably take a step back and ask yourself if you really understand what Lean/Agile is about. Your organization will likely have to change its ways, and it can be a difficult transformation.

Ultimately, I don’t see that SAFe competes for territory with pure Agile/Scrum but rather complements our current practices to enable larger-scale Agile projects. There is a big world out there that largely still hasn’t transitioned to Agile, and the sandbox is big enough for everyone to play. Let’s move Agile adoption forward in any way we can, even if it doesn’t fit a purist’s view of what Scrum should be.

If SAFe brings Lean/Scrum/Agile within reach for companies that would otherwise not even entertain the idea, how can that be a bad thing?

I don’t watch Netflix but I am a Big Fan

Move over Apple and Google, I think I have a new favorite company.

I only occasionally watch movies so I don’t have a Netflix subscription, but the company caught my attention at the Agile2013 conference. Gareth Bowles gave a very interesting talk on Netflix’s “self-service build and deployment” infrastructure. What intrigued me was the level of empowerment and the trust model in place, centered on “freedom and responsibility”. Subsequently I’ve noticed more and more reports on Netflix that fill in the pieces for a more complete picture.

Unmatched levels of personal freedom and trust – although with corresponding levels of accountability, a conspicuous lack of pre-deployment verification of new features, and a company that goes out of their way to disable their own product in front of their customers to force themselves to get better. It’s not crazy, it’s Netflix – and it seems to work.

Managing 700 engineers working on a product line which serves 44 million picky customers in 40 countries obviously requires a lot of strict governance, verification, quality checks, processes and oversight, or… perhaps something completely different?

I can only observe from the outside, but from where I stand, the Netflix approach boils down to: assume success-path the majority of the time and deal quickly with the rare failure cases when they happen. Invest in the necessary infrastructure so that you can achieve high quality at high speed with low overhead. And stick to it.

The result is real Agility, but it takes commitment and conviction. The Netflix approach is not for everyone, but it should provide inspiration for us all to think about novel and counter-intuitive solutions.

Look at Netflix from the viewpoint of the values promoted by the Agile Manifesto and Lean Product Development (as summarized by Dean Leffingwell’s SAFe House of Lean).

Agile Manifesto and Lean Values

“Individuals and interactions over processes and tools”
“The most efficient and effective method of conveying information to and within a development team is face-to-face conversation”

Netflix houses all of their 700 engineers in Los Gatos, California – part of Silicon Valley. They only hire senior staff and pay “top-of-market” compensation. There are no outsourcing or low-cost development sites to balance the burn-rate. They must have the most expensive labor force in the most expensive labor market. If you can stomach the burn-rate, you can have a highly skilled co-located group in the U.S. It would be hard to imagine a more expensive setup, but Netflix have understood that it’s not about the labor rate, it’s about the ROI on your R&D dollars. To enable quality and agility, they are willing to pay a high premium.

“Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done”
“Our highest priority is to satisfy the customer through early and continuous delivery of valuable software”

The HR policies and resulting culture of Netflix create a high-trust, high-performance and high-accountability environment. There is a breed of engineers that tend to like this kind of environment and perform at their best when they know they are trusted to freely work on something that creates value.

For example, any engineer at Netflix can push a code change to the live network at any time. Within two hours the change has gone through automated testing and through to deployment, into the customer’s hands. That is truly a high trust model considering the huge customer base served and the potential business damage that could be done.

It’s not as wild-west as you might imagine. Rather than going through layers of verification before deployment, Netflix manages the introduction of the new code carefully. Netflix pushes the change to a small number of customers first, monitors the behavior and either pulls the change back or widens the deployment based on continuous monitoring of a select set of metrics. Deploy with a limited scope, then either widen the scope or pull back depending on the results. I would imagine that engineers who repeatedly make mistakes don’t last long at Netflix, but they are open about that and if you can take the responsibility then you partake in the benefits. Not everyone’s cup of tea, for sure, but it enables fast and continuous delivery of new functionality.
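The "deploy narrow, then widen or pull back" loop can be sketched in a few lines of Python. To be clear, this is my own illustration of the idea and not how Netflix actually implements it: the rollout steps, the single error-rate metric, the tolerance factor and the function name next_rollout_step are all assumptions.

```python
def next_rollout_step(current_pct, error_rate, baseline_error_rate,
                      tolerance=1.5, steps=(1, 5, 25, 100)):
    """Decide what share of customers (in percent) should get the change next.

    Hypothetical rule: if the monitored error rate is more than `tolerance`
    times the baseline, return 0 to signal a rollback. Otherwise move to the
    next, wider step, or stay at the current step if already fully deployed.
    """
    if error_rate > baseline_error_rate * tolerance:
        return 0  # metrics degraded: pull the change back
    for pct in steps:
        if pct > current_pct:
            return pct  # metrics look healthy: widen the deployment
    return current_pct  # nothing wider left to roll out to


if __name__ == "__main__":
    # Healthy metrics at 1% of customers: widen to the next step.
    print(next_rollout_step(1, error_rate=0.010, baseline_error_rate=0.010))
    # Error rate doubled at 5% of customers: roll back.
    print(next_rollout_step(5, error_rate=0.020, baseline_error_rate=0.010))
```

A real setup would watch several metrics over time rather than one snapshot, but the decision at each step is the same: widen or pull back.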

Lean Goal: sustainably shortest lead time. Best quality and value to people and society.

With a 2-hour build and release cycle it is hard to imagine any quicker path to deployed software. I don’t have any quality data, but you couldn’t maintain a working system like this with poor quality. Netflix added 4 million subscribers in the last quarter of 2013, which couldn’t happen if their product had a poor reputation or didn’t perform as advertised.

Lean Pillar: respect for the individual

Netflix’s development and administrative policies are heavily weighted towards freedom in a trusted environment. There is corresponding responsibility, of course, but for those who accept and thrive in this situation, it translates into high trust and respect for the individual. If you don’t perform at the top of your game then your future at Netflix is less than certain. I would normally not say that this is a respectful approach, but Netflix are quite clear on their core values and the competitive nature of their environment. Anyone going to Netflix goes there with eyes wide open, and in such a case I think it actually is an ok, open and honest way to go about it. You may not agree with the core values, but there’s nothing disrespectful about it.

Lean Pillar: Product Development Flow

Netflix has managed to take a huge step forward in achieving overall flow by (1) not batching individual code changes together for verification but releasing in small increments, and (2) removing the customary big-bang integration/verification phase. Not all code changes will break something. In fact, Netflix has recognized that there are many more passed tests than failed tests in the average project. If 90% of the tests pass, then why burden the project with anything but the 10% that reveal failures? Since we don’t know where the 10% hides, the straightforward thing to do is just to test it all. If you already have high quality in place, then the Netflix approach of releasing and then finding and resolving the 10% failure cases quickly is elegant. And probably less costly. Certainly it is faster for the 90% of features that deploy without problems.

Lean Foundation: Leadership

Kudos to the management team at Netflix. They have to really commit and have conviction that a counter-intuitive approach will work. Instead of putting more and more heavy layers of inspection and verification into their process, they are erring on the side of being too light. Instead of managing with a traditional heavy-handed approval culture, they focus on enabling smooth flow and high speed. You can see it working in a small startup… but a company with more than $4 Billion annual revenue?

Companies tend to calcify and become bureaucratic as they grow, yet Netflix has some of the most relaxed business policies around. Rather than degrading into a mess, in the right environment this can enable high performance.

High quality is necessary to delight customers and achieve high development speeds. If you really want high quality, then you have to pay for it. In most companies this means heavy verification cycles before final release. At Netflix this means (among other things) the high labor overhead described above and Chaos Monkey.

Chaos Monkey

What a great concept! Chaos Monkey is a software program that continuously runs and disables pieces of the Netflix application. When you unexpectedly terminate part of an application you get unexpected behavior and… chaos. A good application can deal with partial shutdown gracefully, but most don’t. It’s simply too hard to predict what combination of problems will eventually crash your system. So, Chaos Monkey runs continuously and does its mad thing until something crashes. Chaos Monkey keeps regular office hours, so this way Netflix can be prepared and deal with the problems during the workday instead of in the middle of the night when an emergency really strikes.
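For the curious, the core idea fits in a few lines of Python. This is only a toy sketch of the concept, not Netflix’s actual tool: the office-hours window, the instance names and the functions within_office_hours and pick_victim are all hypothetical, and the sketch only picks a victim rather than terminating anything.

```python
import random
from datetime import datetime


def within_office_hours(now, start_hour=9, end_hour=15):
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def pick_victim(instances, now, rng=random):
    """Pick one running instance to terminate, or None outside office hours.

    `instances` is any collection of instance identifiers; the actual
    termination call is deliberately left out of this sketch.
    """
    if not instances or not within_office_hours(now):
        return None
    return rng.choice(list(instances))


if __name__ == "__main__":
    running = ["api-1", "api-2", "recs-1", "player-1"]  # made-up names
    victim = pick_victim(running, datetime.now())
    print("would terminate:", victim)
```

The office-hours check is the part I like most: the failures are random, but they land at a time when people are at their desks and ready to respond.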

Maybe you don’t think this is a fair test. It’s a corner case that will almost never happen. That is what most engineers I know would say and they are right. But the real world is not fair, and unexpected problems will eventually happen. If you choose to ignore the unfair cases, that is your choice – but you accept a more fragile solution and you need a bigger customer support operation.

Neat approach, but what really sold me on the idea is that Chaos Monkey is not for controlled lab environments – it runs on the live production network which serves customers. Netflix runs on Amazon Web Services (AWS), and the cloud environment can be unpredictable. Instead of relying on AWS for resilience, the vulnerabilities identified by Chaos Monkey are fixed and the Netflix application itself recovers from unexpected failures. In the first year of existence Chaos Monkey terminated 65,000 live virtual instances on AWS. As far as I can tell Netflix has a much higher availability record than other services on AWS.

It’s gutsy, an inventive solution by the R&D team and also reflects a fundamental commitment from the management team to continuously improving the resilience of Netflix. What other company do you know which intentionally breaks their own product while in the hands of the customer?

The bottom line

The Netflix approach is novel, counter-intuitive, quality-focused in a different way and it seems to pay off. Could you replicate the Netflix culture in your company? Probably not. But maybe you, like me, take inspiration from the Netflix story to not go the easy and traditional route, but look for innovative solutions even (or especially!) if it breaks with conventional knowledge. It should give you pause to think about your current setup and what economic model you are following: are you simply chasing the lowest possible labor rate, or are you more concerned about the overall return on investment?

Lean/Agile Product Development requires a different investment mindset. If you invest with product development flow in mind (regardless of your outsourcing situation) then the benefits are not 5% or 10% improvement, but 50% to 100% or more. The solution is not in lowest possible labor rate (although that always helps) but in the highest possible ROI on the next R&D dollar.

Welcome Post

Hello and Welcome to the Lean Viking’s blog.

Here you will find thoughts and ideas on Lean and Agile Product Development, inspired by my affinity for minimalistic designs, Lean principles and Agile methods.

There’s lots of information out there about how Lean and Agile development works. But how do you get there if you are entrenched in a waterfall-based culture today? And what if you’re working in a large-scale environment? Wishing for Agile won’t make it so, especially if you are trying to scale up beyond a handful of teams. Agile at large scale is difficult… but is there really a viable long-term alternative? Organizations move to Agile because things aren’t working well in the Waterfall model. Those troubles are magnified at scale. The good news is that improvements are magnified as well, and scale economics work their wonders.

There is much hard work to do and a lot of it involves changing fundamental behavior. But what could be more rewarding than moving a whole organization to a new level of performance?

My hope is that you will follow this blog as an active participant. The purpose of this blog isn’t to give me a soapbox platform. I am blogging because I want to share my thoughts and hear what y’all have to say about Agile at scale.

Cheers,

Odd