Bosch Connected World Hackathon

There is no shortage of things to do in Berlin, and if you’re curious about high-tech, there are plenty of opportunities to dive in and get connected to that culture. In my almost two years here I’ve had a great time spending my spare time at events at startup accelerators and co-working spaces, meetups, conferences and even hackathons.

When I saw the notice for a hackathon sponsored by Bosch I wasn’t sure what to think. To me, Bosch means spark plugs and power tools. Turns out there is a lot more to Bosch than that, and they are developing a big presence in the IoT space. So I decided I had to have a look at the Bosch Connected World conference in Berlin. It was all about connected devices, and I would have loved to sit in on the conference talks.

But there were also 4 separate hackathon tracks and I was curious about those. I signed up for the Connected Car hackathon. I’ve been learning Android mobile app development in my spare time lately, so it was an opportunity for me to see how my skills stacked up.


The way these things work is that the general topic of the hackathon is introduced, then anyone can pitch ideas to the audience. After the pitches are made, people circulate around the room and form teams to work on each idea. Then the teams design and code up their idea, working until the final presentation on the afternoon of the next day.

I decided to pitch a simple idea and was a bit surprised and very relieved that 4 university students from Stuttgart joined in and we formed a team. Since it was a Connected Car hackathon, we decided to integrate an Android app with the Bosch MySpin environment so that it would run on the car’s entertainment system.

The Bosch guys even brought a Jaguar with the MySpin system installed. We never quite managed to get our app installed in the car, but we ran it on a prototype unit instead.


After two days of hacking we were able to retrieve GPS coordinates from one phone, and display Google Maps driving directions on another MySpin-connected mobile phone. Very cool! I was somewhat relieved that I was able to contribute my part and feel that I’m keeping pace with the industry.

The event was really well organized at Cafe Moskau, which turned out to be a great venue. Cafe Moskau was built in 1964 and is situated in the former East Berlin. The architecture of the building is fantastic, such a departure from the stern soviet-era concrete buildings lining the streets around it. Great lunch and dinner foods and refreshments served by an attentive staff – this was definitely not your average hackathon. There must have been more than 200 people hacking away in several different rooms. There were different hackathons for Connected Car, Industrie 4.0 (a big topic in Germany, I think it’s termed Industrial IoT elsewhere), power tools and cloud-connected sensors. Check out the Twitter hashtag #BCX16 to see more on the event, or have a look at this video.

At the end of the two days we enjoyed a nice wrap-up dinner courtesy of Bosch where hackers and conference attendees mingled. I must say they did a good job putting this whole event on. And a big thanks to the hackathon organizers – they clearly put a lot of effort into it, and it all went off without a glitch.

 

Paint The Room!

I occasionally deliver internal Agile training classes, and one of the things I have struggled with is explaining the concept of Agile estimation. I’ve been searching for a good game to get the point across, but could never find anything that worked well. Sometimes I could work off a real backlog of items and features supplied by the class participants, but more often than not it was almost impossible to find a set of features that everyone in the room could relate to. So some people would get the point, others would not.

One day in such a training session I had a room full of people from various backgrounds and was silently stressing over how to best cover Agile estimation. Then, somehow, two previously unacquainted neurons connected for the first time and in a flash I had the solution.

This game will teach Relative Estimation, Planning Poker, Story Points, Backlog Refinement, Story Splitting and Story Elaboration all in one short 20-minute session. All you need is some Planning Poker cards. It’s so simple you just have to try it to see how effective it is.

If you’re not familiar with how to use Planning Poker, then this page from Mike Cohn (@mikewcohn) is a good start.

I call it the “Paint the Room” game.

Paint The Room

The game is played in whatever room you are delivering the training in. Chances are your particular room has at least four walls, a floor and a ceiling. Good – that’s all you need! Now here is the setup:

Working as a team we have been given the job of painting the room which we are currently in. The customer’s requirement is (intentionally) vaguely stated as “please paint the room” and we have drawn up an initial project backlog which looks like this:

  • paint the north wall
  • paint the south wall
  • paint the east wall
  • paint the west wall
  • paint the ceiling
  • paint the floor

You might of course give each wall a descriptive name such as “window wall”, “entry door wall” etc. instead of north/south/east/west. Many people can’t tell North from Up.

In the game you will represent the Customer/Product Owner when the team has questions about the job. The team has to come up with an estimate to paint the room. Distribute Planning Poker cards to everyone. Keep the number of participants to the size of a scrum team. Everyone else can just watch and still get the full benefit.

The nice thing about the game is that everyone can relate to the task of painting a wall, even if they have never touched a paintbrush. Furthermore, everyone’s assumptions about what is involved in doing a good paint job vary widely among white-collar knowledge workers.

Establish the “Anchor Feature”

In relative estimation it is useful to have an anchor feature that we can associate a known story point quantity with. Other features will be estimated relative to the anchor feature.

Pick one wall that looks like it is medium effort relative to all the others and tell the team that “this wall is a 5-story point wall”. That is your “anchor feature”. All other backlog items will be estimated relative to that wall. This establishes what a 5-story-point wall looks like.

Most engineers initially have a hard time separating the idea of Story Points from person-days when we talk about software features, but have no trouble at all accepting the abstract notion of a unit-less “5 Story Point wall”.

Now we’re ready to estimate.

Estimate one wall

Pick one wall that is a little less complicated than the anchor wall and ask the team to estimate how many story points that wall is relative to the anchor wall. Use the Planning Poker method to get input from each team member.

Simply ask “So assuming our anchor wall is 5 Story Points of effort, how many Story Points of effort is this wall?”

Make sure everyone provides an estimate, and that everyone flips their cards all at once. One of the key benefits of Planning Poker is to protect against the influence of expert estimates, which may or may not reflect what the team can actually do. It’s essentially a variation of Wideband Delphi estimation. And indeed you might get a wide range of responses, from 1 story point to 8 or even 20.
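If it helps to see the mechanics spelled out, here is a minimal sketch of one poker round in Python (the names, votes and the crude convergence rule are all made up for illustration): everyone reveals at once, and the highest and lowest estimators get flagged for discussion.

```python
# A minimal sketch of one Planning Poker round (illustrative only, not a real library).
FIB_DECK = [1, 2, 3, 5, 8, 13, 20, 40, 100]  # typical card values

def poker_round(estimates):
    """Reveal all estimates at once and flag the outliers for discussion."""
    low = min(estimates, key=estimates.get)
    high = max(estimates, key=estimates.get)
    # crude "close enough" rule: converged when the spread is within a factor of two
    converged = estimates[high] <= 2 * estimates[low]
    return low, high, converged

# Example: estimating the "window wall" relative to the 5-point anchor wall
votes = {"Anna": 3, "Ben": 5, "Carla": 20, "Dev": 2}
assert all(v in FIB_DECK for v in votes.values())  # only legal card values allowed
low, high, converged = poker_round(votes)
print(low, high, converged)  # Dev Carla False -> ask Dev and Carla to explain, then re-vote
```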

This is where the fun starts.

Talk about the high and low estimates. The value of Planning Poker is in the conversation which brings out unspoken assumptions, and you will really see that in this exercise. As the high/low estimators explain their reasoning for their estimates, you will hear things like

  • I assumed that we had to put masking tape on the trim and around the edges of the windows
  • I assumed that we had to remove all the covers on the electrical outlets
  • I assumed that we have to cover the floor
  • I assumed that the prep work was done (or was not done) ahead of time

In other words, different people make different assumptions about what the job really entails when the job is imprecisely specified, so they come up with different story point estimates. That is exactly what happens when software engineers with different skills and specializations give estimates on vague requirements. Many times you will find that there is an experienced handyman in the team, and he will be on one end of the scale relative to the others. That is ok, it’s usually that way with software experts too.

The other thing that usually happens is that team members start spurring each other on with more clarifying questions:

  • are we using rollers or brushes?
  • do we assume that all preparation/masking tape work is done before the job?
  • do we all work on one wall at a time, or several in parallel?
  • do we have to purchase supplies first?
  • do we move the furniture out of the room?

and so on.

The conversation will flow back and forth a little as the team considers the now-spoken assumptions behind the high and low estimates. As the facilitator you just have to sit back and listen as the team starts peeling the layers off the onion. If the team gets stuck, you might volunteer some of the questions above, but never solutions. Your job is to make sure that the team talks about their assumptions and agree on the how between themselves.

It’s a game, so don’t waste the opportunity to have some fun with it. If you get questions from the team, don’t be shy to throw in new requirements or constraints to make the job harder. “No you can’t take the windows out of the window frames before painting” or “yes of course you must remove the windows from the window frames before painting” and “well yes, of course I want the floor painted in a black-and-white chequer-pattern. What did you think?” are good conversation-starters. The team may groan all they want, but every small requirement change makes the game a little more like reality.

As the team discusses their individual assumptions they will probably identify some new tasks. Most teams end up adding new items to the backlog:

  • remove electrical outlet covers
  • put masking tape around the windows
  • put electrical covers back on
  • purchase supplies
  • clean up

After a few back-and-forth comments you might get to a point where the team has a new common enhanced understanding:

  • yes we will remove the electrical outlet covers
  • we will use rollers for painting large areas and special brushes around the windows
  • we will not cover the floor (it will be painted last)
  • etc.

Not too different from talking about how a given feature works, what the hidden requirements are and so on. The actual agreements and assumptions are not important, as long as the team ends up with a better shared understanding of the task. The new knowledge is an illustration of Story Elaboration, and you might even want to take the opportunity to specify some Acceptance Criteria, such as “no paint smudges” and “even color application” (might require two coats).

Do another round of Planning Poker estimation. The second round of estimation should have a tighter range since we have now had some conversations and clarification about the “requirements”.

Discuss the high/low estimates again and do one more round of estimations if needed. If you picked a good “anchor wall” and picked a slightly less complicated wall to estimate, then the Story Point estimates should settle around 3 or 5 Story Points. Good enough, let’s move on.

Estimate the ceiling

Now that the team has seen the Anchor Wall and produced a converged estimate for a simple wall, you ask the team to place a story point estimate on painting the ceiling. Again, it must be relative to the Anchor Wall of 5 Story Points.

Discuss the high and low estimates. Some people will estimate the ceiling as being easy, others will see added complexity in simply reaching high, using ladders, how to deal with light fixtures etc. In any case you will find more un-spoken assumptions that only come to light when you talk about the estimates.

This is also a good opportunity to point out that the customer requirement initially was just to “paint the room”. Nobody gave specific direction on what to do about things like the light fixtures. It’s a perfect way to illustrate that it’s critical to go back to your customer and clarify what the expectation is. I usually respond to the team, when they ask, that I like the way the light fixtures look so I don’t want paint all over them. Therefore I want the light fixtures removed and then put back in place after the ceiling has been painted. You can then show that it’s ok to split the story into three: (1) remove fixtures, (2) paint the ceiling and (3) reinstall fixtures. That is a lot more work than the initial expectation (do we need an electrician?), but hey that’s R&D. Sometimes we uncover unexpected complexity. Update the project backlog accordingly.

Check the backlog

As you have gone through the estimating process a couple of times you will see that the backlog has grown with new entries, and some stories have been split into smaller stories. That illustrates the mechanics of backlog refinement. In every case the refinement was a direct result of discussing with the customer and/or uncovering hidden assumptions.

Wrap it up

I never go through the whole backlog, but stop the game after estimating two or three backlog items. By then we have covered Relative Estimation, Story Points, Story Elaboration, Story Splitting, Planning Poker and Backlog Refinement. Not bad for 20 minutes of poker!

I’ve run the exercise a few times now, and every time the team has come out with a better understanding of how to use Agile estimation techniques in their projects. Hopefully it works for you too.

If it does, let me know!

ThingsCon 2015

Living in Berlin has its benefits. Lots of things are happening, especially now that winter has finally given way to spring. One of the nice surprises lately was ThingsCon – the small but premier European conference on the Internet of Things. I wasn’t sure what to expect but I registered anyway. Berlin has so much energy around the startup and technology scene it’s just insane. ThingsCon turned out to be another one of those events that buzzed with energy.

The two days of talks and workshops focused not only on the technology of the Internet of Things, but also -and more importantly- on the social responsibility and implications of the Things we put on the Internet.

The opening keynote by Warren Ellis set the tone. He doesn’t care how we build this stuff or how cool the next gizmo is. He just wants to get home without worrying that the connected world will crash and shut him out of his connected house. He doesn’t want to fiddle with buttons or operate his light switch from his iPhone. He just wants it to work, and he doesn’t want to be bothered with the technology part. He is, as he put it, “your worst nightmare – I am your products’ and services’ most likely customer.”

It’s not all doom and gloom for him though. He is excited by the prospects, and he pleaded with his audience of technologists to keep people, not technology, at the center of what they build.

It was probably the best opening perspective that a crowd of young, excited and energetic product innovators could hear. In the end what matters is how these products enhance and enrich our lives. No amount of tech-cool or big-data monetization can make up for a deficient user experience.

This was the second year of ThingsCon and I can tell that this conference will grow. I am definitely going back next year to keep track of how this field is maturing. I was expecting a very “thing-centric” technology conference but was delighted to see that for the most part the human side and the personal aspect took center stage. Yes, lots of attention to the technology of IoT, but most of the talks and workshops I attended framed the technology inside the user experience.

The Internet of Things is definitely a darling buzzword, but at the same time it is difficult to imagine how an exponential growth of connectedness could not have some fundamental impact on your daily life. If it is the next evolution of the Internet, it will be the Invisible Web. The more we push technology for technology’s sake, the less adoption we will see. To me it’s not so much about making new cool gadgets. The best technology, as far as I am concerned, is invisible.

That is the challenge for the folks working in the IoT space: not to highlight the new gizmo or technology, but to embed it invisibly in a natural and socially responsible way so as to improve and enrich our lives.

Socially responsible. That phrase kept coming back to me as I sat through workshops and talks and spoke with the folks around me. The IoT manifesto draft looks a little “thing-centric” in its first form, but in the workshop critique all the comments were human-centered, so it’s a nice signal that the folks involved with the technology recognize that there is a lot more to the challenge than just “the thing”.

Two days isn’t enough, but I got a first sense for where the IoT technology curve is right now. I also got a much better appreciation for the coming struggle of making sure the technology, devices and the data they collect are managed in an ethical and responsible manner. Once you give personal devices permission to collect information about you as you go about your daily life, the privacy issues surface very quickly.

We will have to find a balance between personalized services (which most of us want, but which require sharing of personal information) and privacy (which we also want, but which precludes any personalization of service). Ease of use must be a foremost concern, but security and privacy could make a seamless experience difficult to achieve.

At the moment I am visualizing the IoT conundrum as the 4 corners of a square with conflicting attributes:

Personalized  –  Secure  –  Easy to use  –  Private.

If you can’t have all 4 of them, then where does the compromise start? In those 4 constraints, innovation waits. Until ThingsCon 2016…

Scaling Agile – it’s not the same problem

I was prompted by Al Shalloway’s (@alshalloway) brief tweet this week combined with a long train journey to write this blog post.

That is exactly one of the key ideas in Nassim Nicholas Taleb’s (@nntaleb) Antifragile. The kinds of problems you get at scale are not the same kinds of problems you deal with in small numbers. To quote Nassim Taleb:

“A city is not a large village, a corporation is not a larger small business”.

Likewise a single Agile team does not behave the same way that a project team made up from several Agile teams does. The method and style of communication changes, as does the kind of risks and issues we deal with. It’s a transformation into something new, not a simple multiplication.

Why scale breeds complexity

If interactions grew linearly with the system size, then scale wouldn’t be such a problem. Each time you add a new member to the system, the complexity (C) increases by one since you only add one more possible relationship, as visualized below:

[figure: linear growth]

Unfortunately, as real systems grow in scale, their complexity grows combinatorially because each component influences or interacts (directly or indirectly) with many others. That is, each member you add to the system brings a possible relationship with every existing member. It’s an old idea that is surprisingly often neglected by those that design systems. You can visualize how the number of possible interactions grows as we add one more circle to each diagram below:

[figure: geometric growth]

Complexity is a property of the behavior of the system, not the structure of it. That is, it’s not the number of components that makes it complex, but how they interact – as visualized by the number of lines in the diagrams above.
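For the arithmetically inclined, the number of possible pairwise communication paths in a fully connected group of n people is n(n-1)/2, which is why a handful of extra people adds a disproportionate number of interactions. A quick sketch:

```python
# Possible pairwise communication paths in a fully connected group of n people.
def pairwise_paths(n):
    return n * (n - 1) // 2

for n in (2, 5, 9, 20, 50, 150):
    print(f"{n:>4} people: {n - 1:>4} links if growth were linear, "
          f"{pairwise_paths(n):>6} possible pairwise paths")
# 9 people (a full Scrum team) already have 36 paths; 50 people have 1,225; 150 have 11,175
```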

Conversely, we may have systems that have a complicated structure, but yield simple behavior. For example, a mechanical pocket watch has a complicated structure but very simple and (thankfully) predictable behavior.

So it’s complexity resulting from interactions and behaviors we have to be concerned about. This is why we see different types of problems at different levels of scale.

Aside: scale up and down

In order to be considered “scalable” the transformation must be two-way. I have seen software systems, for example, that are designed to “scale” but can’t run on modest hardware. That is not scaling, that is just big and bulky. Somehow we forget that scaling is not the same as simple expansion. To me a good scaling implementation is a dynamic one, that can scale both up and down. If you can only go one way, it’s not “scaling”.

So that’s the problem: the behavior of a small group is different than that of a large group, which is again different from that of a group of groups… and the effect shows up sooner than you might think because of the number of new communication paths.

The oft-cited Dunbar’s number says that people can have some sort of social relationship (but not necessarily a deep relationship) with up to a maximum of around 150 people, but we see behavior changes way sooner than that. The upper bound of a Scrum team (9 people) is based on experience and seems to be re-affirmed time and again. Beyond that people see themselves as part of a mass, not a team. In my organizations I never had a manager with more than 10-12 direct reports.

Sometimes trouble shows up at even smaller numbers. It is said that behind every successful man stands a woman, behind every failed man there are two. I think that joke works with Scrum teams and Product Owners also, except it’s not funny.

Scale the problem down

This is where Agile and Antifragile meet again: big problems are best solved when you can scale them down and distribute the difficulty. The secret to successfully executing big projects is not to scale the project team up, it’s to scale the problem domain down.

What I mean by that is that we shouldn’t simply look at the project requirements and then figure out how to scale a team to build the whole thing all at once. Approaches such as Minimum Viable Product (MVP) and applying Design Thinking to focus the problem space can save a lot of time, effort and money by applying resources only to what is critical and important.

What about decision-making? The problem of scaling decision-making is a tough one. Many times we don’t even consider that we can change the way decisions are made, and so we fall back on a central person or core team that is responsible for making all the individual decisions. It’s a fragile setup because these teams are disconnected from what happens on the ground.

Scaling decision-making

To me every scaling problem is a delegation and distribution problem. The most ineffective way of all is when someone decides to distribute workload but is unwilling to delegate authority. Micro-managers would object to the micro-manager label except they are too worn out from trying to apply their bottleneck everywhere at once… and they don’t read my blog anyway. When you’re too close to the tree you can’t see the forest.

So what can we do? First of all, recognize that you can’t possibly keep more than half a dozen balls in the air at one time. Divide and Conquer, distribute, subdivide… do what it takes to bring things into manageable chunks. This is after all what Agile methods do: break things down into manageable smaller pieces of work.

Distribute Distribute Distribute

Distributed systems, whether they are mechanical, software, political or social, have some compelling properties. They are individually self-sufficient, resilient and effective. Centrally organized systems on the other hand are attractive only on paper or at very small scale. They work fine as long as the “central” part is available and capable of managing the information flow, but they quickly break down and become a bottle-neck in any but the simplest projects.

So work obviously needs to be decomposed and distributed amongst multiple teams. Nothing new here: whether you distribute according to traditional functional teams or cross-functional Agile teams, it has to be done.

It is not enough to just distribute the work, you also have to distribute authority and decision-making when you’re dealing with more than a couple of teams.

Central Mission command, local tactical decisions

One of the most powerful and elegant ideas in Don Reinertsen’s (@DReinertsen) well-equipped arsenal of golden nuggets is the idea of Mission Command. Instead of developing a detailed plan to be followed by all and centrally managed, we set higher-level objectives and let individuals and teams self-organize (and even improvise) and decide how to achieve the objectives.

Central mission command is very much different from central micro-management. We centrally decide on the overall (higher-level) objective to be achieved, then delegate responsibility, authority and decision-making for how to reach the goal to each team – while accountability still rests centrally.

Central command and local decision-making is not enough either. Even when teams self-organize around a central higher-level goal, their individual approaches may create trouble later down the road. So we can distribute decision-making and authority, but how do we ensure that everyone keeps the overall integrity of project and company in mind at any given time? And how do you retain central accountability?

Use decision-rules

As teams self-organize around delegated and common objectives, they don’t just need to meet their goals, they need to do their work and act in accordance with the company’s long-term interest and in concert with other teams. An orchestra only works if everyone plays in time with each other.

I recall with fondness my son’s first tuba recital in middle school band. There were 4 tubas on stage, and he finished first. Well what can I say, he’s competitive and has since grown into a splendid college athlete. He knows how to play in beautiful concert with his team on the field now but I still smile when thinking about that first recital.

Orchestration and alignment is needed both for the project goals and in the general backdrop of the company. For example, a team might decide to achieve their part of the project goal by using an open-source software package. Although that may result in achieving the project objectives, the team may have inadvertently put the company’s intellectual property at risk. Not all open-source licenses are the same. By simply using “freely” available software, you may be automatically entering into an agreement where your company must agree to make parts of their system software freely available to the rest of the world.

You can’t manage dozens of individuals by central decision-making, but on the other hand you can’t simply let everyone loose on their own either. One way to effectively sub-divide and delegate is to create a set of decision-rules that aligns everyone on how to make decisions. Think of it as a set of guide-rails that prevents you from falling off the path.

Identifying decision-rules is not the same as identifying who is responsible for making decisions and establishing an escalation path. That is not scalable either. It is all about agreeing on how teams will make local decisions, what they can decide on their own and what must be decided centrally. Such rules can enable teams to make decisions consistent with the overall command objectives. These rules are set centrally, and this is how the “central command” retains accountability for the decisions made in the project.

One of the best examples of decision-rules I have heard about came out of the Boeing-777 development program. When you design airplanes you have to make tradeoffs between weight, cost and space. If you increase the weight of your subsystem then it becomes a big deal since every ounce translates into increased operations cost for the customer. Every subsystem and every engineer is faced with these kinds of tradeoffs on a daily basis. How do you manage the choices of several thousand engineers all at once? Obviously a central design review committee wouldn’t work effectively. So the project team set “budgets” for each subsystem development team as to how much weight, cost and space they were allocated. If a team needed to exceed their weight allocation, they could do that as long as they found another team that would be willing to trade some of their weight against, for example, some additional cost. That way the weight/cost/space constraints of the 777 airplane were always satisfied overall. The individual teams could trade allowances between themselves as long as they stayed within the overall budget. They didn’t have to get every tradeoff approved centrally – they used decision-rules.
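To make the idea of a decision-rule concrete, here is a hypothetical sketch of the “trade within the overall budget” rule (the team names, units and numbers are all invented, not Boeing’s actual process): teams can trade weight allowance locally, and the rule itself guarantees the program-level constraint is never violated, so no central approval is needed.

```python
# Hypothetical sketch of a "trade within the overall budget" decision rule,
# loosely inspired by the 777 weight-budget story (names and numbers are made up).
class Team:
    def __init__(self, name, weight_budget, weight_used):
        self.name = name
        self.weight_budget = weight_budget  # kg allocated to this team
        self.weight_used = weight_used      # kg currently consumed by the design

    def slack(self):
        return self.weight_budget - self.weight_used

def trade_weight(giver, taker, amount):
    """Local decision rule: a trade is allowed only if the giver has enough unused
    allocation, so the overall program budget is preserved without escalation."""
    if giver.slack() >= amount:
        giver.weight_budget -= amount
        taker.weight_budget += amount
        return True
    return False  # no slack available -> find another trading partner or escalate

flaps = Team("flaps", weight_budget=120.0, weight_used=118.0)
rudder = Team("rudder", weight_budget=80.0, weight_used=85.0)   # 5 kg over budget
print(trade_weight(flaps, rudder, 5.0))  # False: flaps only has 2 kg of slack
print(trade_weight(flaps, rudder, 2.0))  # True: rudder is now only 3 kg over, keeps looking
```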

So instead of creating an elaborate escalation path for decision-making, create the guide-rails within which decisions can be made at the right level in alignment with the overall mission of the company.

The Agile Manifesto and the principles behind it are another example of such a decision-making framework. For every choice or decision to be made, every engineer or team can ask themselves, for example, “does what I am about to do satisfy the principle of simplicity?”, or “if we take this approach will we be able to show progress in terms of working software?”. Similarly your company and project will have a different set of decision-rules that guide teams and individuals in making local decisions.

So is that it?

No that’s not all of it by a long mile. But it’s a start. If you can

  • understand that different levels of scale require a different approach, and
  • distribute both work and decision-making authority, and
  • create a good set of decision-rules that can be used to align everyone,

…then you’ve laid the foundations for a much less stressful environment at scale.

The Formula-1 Pit Stop: Lean counter-counter-intuition

What do F1 racing and Lean Product Development have in common? Not much at the surface… but if you, as I do, see everything as systems then you can’t help but notice some interesting things. In this case I found an apparent counter-example to Lean that is Lean despite going against the grain of traditional Lean Thinking. Hmmm… we’re into double-negatives here but stay with me.

The racing analogy is interesting to me not because “Lean=Speed” but because someone questioned the “obvious right way” and came up with a counter-intuitive better solution.

I am always amazed every time I catch even a glimpse of Formula-1 racing. The cars fly around the track at well over 200 miles per hour, pull 1.45g during acceleration and 4g when braking. High speeds, tight turns and frequent acceleration and braking wear hard on car and driver, but even more so on the tires, which aren’t even engineered to last a whole race.

An F1 tire these days is designed to last only about 120 kilometers on average (it’s a weight vs. durability tradeoff), but most F1 races are at least 305 kilometers long. That means you need to change tires 2 or 3 times in a race that is won or lost by fractions of a second.

I’m fascinated by this, of course, because the pit stop is the biggest impediment to continuous flow around the track. If you could make one less pit stop than your competitor you would be several seconds ahead, turning 10th place into victory. For a long time that was the strategy: go easy around the curves so as to conserve tires and fuel. A lower average speed would win the race as long as you could avoid making too many pit stops. So far it sounds Lean, right? Slow down and you’ll finish the race faster. No surprise there: Lean solutions are usually counter-intuitive.

Twisting and Turning

Well, every good story has a twist and our little F1-Lean analogy is no different.

In the mid-1980s someone took a step back, looked at the whole end-to-end system and realized that the “economy” of racing could be improved. Start the race with only half a tank of fuel, and the car would be much lighter and go faster. Stop worrying about conserving tires and instead push the car to the limits on the track. The penalty for this strategy is an added number of pit stops.

Not a problem – you just need to minimize the time spent in each pit stop.

It’s a tradeoff curve, as always. If you can continuously reduce the amount of time needed to refuel and swap tires, then at some point down the curve the wear-and-tear vs. pit stop balance will shift.
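A toy model makes the tradeoff visible. Assume (with entirely invented numbers) that pushing the car saves a second and a half per lap but costs two extra stops; whether that pays off depends only on how much time each stop costs:

```python
# Toy race-time model: total time = laps * lap_time + stops * time_lost_per_stop
def race_time(laps, lap_time, stops, time_lost_per_stop):
    return laps * lap_time + stops * time_lost_per_stop

laps = 60
conserve      = race_time(laps, lap_time=95.0, stops=1, time_lost_per_stop=60.0)  # 5760.0 s
push_slow_pit = race_time(laps, lap_time=93.5, stops=3, time_lost_per_stop=60.0)  # 5790.0 s
push_fast_pit = race_time(laps, lap_time=93.5, stops=3, time_lost_per_stop=25.0)  # 5685.0 s

# With slow pit stops, conserving the tires wins; drill the pit crew hard enough
# and the balance shifts in favor of pushing the car flat out.
print(conserve, push_slow_pit, push_fast_pit)
```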

Now it gets interesting. Because we now have two different models of “good” racing strategy, we have to choose – but how? We have to take a systems view, and make the decisions based on the objective merits of each strategy, not by intuition or personal preference.

Yes but we’re talking about Product Development, right?

You see analogies in Product Development all the time. Whenever there is a bottle-neck in our process we have to decide to either fix/improve the bottle-neck, or try to avoid it. Not all bottle-necks are solvable or even visible. Some are disguised as “the way we always do things here”. Most companies have settled on a particular pattern of which bottle-necks (departments/phases) are reasonable (acceptable as the cost of doing business) and which ones aren’t. Companies that structure the development flow through phases and gates accept the overhead associated with functional departments as “not perfect but the best way to do things”.

Lean Thinkers challenge this view all the time: figure out where the flow stops and then improve it. Usually it comes down to finding local optimizations and then reducing or eliminating tasks in the name of overall flow. For example, eliminating pit stops so that the race can flow uninterrupted.

And sometimes Lean Thinkers have to challenge themselves, to avoid getting stuck in the “best” Lean solution.

So where did Formula-1 end up?

Have a look at this Ferrari pit stop from 2013. Mid-race refueling is no longer allowed in F1, so now it’s all about how quickly you can get 4 new tires on the car. You can feel the anticipation as you watch the pit crew waiting for the car to arrive.

That pit stop took 2.1 seconds. It’s a huge improvement from the minute-plus pit stops of the early days of F1 in the 1950s. Pit crews spend a lot of time and money to squeeze out every millisecond they can from their process. You can almost visualize a value stream map on the garage wall and the team swarming to find and reduce the next bit of waste from the process.

Is it a fair analogy?

But, many will say, that’s not a fair analogy. Obviously they have a lot of specialized equipment and a huge crew of specialists standing by. This is high-stakes racing and has nothing to do with Lean or product development.

Well that’s the whole point behind this blog post. It’s a classic example of how Lean thinking differs from mass-production thinking. It just manifests itself in a different environment. Speed (and safety) matters in F1, and speed (and safety) matters in Lean.

Systems of systems: Lean Fractals

There is a very direct and obvious application of Lean Thinking and Value Streams to what happens in the pit. If you have seen a Value Stream Map, you can easily understand how that helps weed out waste and inefficiencies in the process flow of safely and quickly changing the tires.

But there is another higher-level system at play, which includes both the pit stop and the track. This system-of-systems has a different kind of flow economy, working off a different set of aggregated information.

If we ignore the system-of-systems effect and blindly apply Lean tools, they can lead us astray. Pre-1980s racing solved for Lean flow based on how the costs were incurred back then, i.e. relatively slow pit stops. Slow down around the track and finish in first place. However the cost of any activity changes over time, and the technology and capability of the pit crew improved such that the basic assumptions behind “the best way to race” had to change.

In this case they had to question some long-held beliefs and assumptions in order to get to a new level of performance. Similarly, we find the biggest improvements hiding in plain sight when we look at our overall product development system and question our traditional way of working. Our world is full of systems-of-systems.

I draw two lessons from the F1-Lean analogy:

Question the Status Quo

The obvious approach isn’t always right, and you won’t see it until you look at the whole end-to-end system. It is counter-intuitive that adding one more time-consuming pit stop to your race will speed things up overall… but it does – up to a point.

Even if you settle on a “best” Lean solution, this might also be a local optimum… keep looking and keep questioning. The journey through solution space is not always a linear one.

Invest in non-mainstream efforts

In order to lean out the value stream you may have to invest and focus on non-mainstream efforts such as tooling and supporting activities. This is the stuff of overhead. It’s hard to justify increasing the overhead cost when the normal pressure is to reduce expenses. But in a Lean system the right overhead isn’t a liability – it’s what enables the mainstream to go fast. The F1 teams had to shift investment away from car and engine design and onto the less-glorious pit crew tools and processes.

The difference is in the kind of overhead: instead of traditional overhead which is needed to manage the waste generated by stitching together the work of separate departments, Lean “overhead” is there to make the value stream flow faster at higher quality.

Yes it’s a fair analogy

To return to the point above, yes the analogy is totally fair. You can usually make your product development progress much faster if you invest in ancillary processes and tools with an eye towards end-to-end Flow. The Ferrari team in the video clip shows 21 crew members, all working frantically for less than 3 seconds. It takes 3 people per tire to do the job. Wasteful resource utilization? Not if time matters and your objective is to get the car through the pit stop quickly. Sure, we obviously can’t allocate 21-person support teams for all tasks, but the idea is to figure out what it takes to achieve the best end-to-end flow and then invest accordingly.

When I first started developing software we had bi-weekly load builds because it was such a difficult thing to get a clean build with 40 engineers all submitting several weeks of code changes, and build servers were very expensive. We avoided load build “pit stops” until they were absolutely needed.

Gradually the situation changed, and we now have efficient continuous build and test environments and the ability to submit code changes daily. The “economy” of our pit stop changed. It was initially not easy convincing management to invest in extra build servers, tools and staff – but eventually we met the point on the tradeoff-curve where investing in ancillary things like load builds made more and more sense. Now every serious software group has a DevOps setup.

Local vs. Global Optimization

There is a balance between local and global optimizations, and the balance can shift over time. You can’t even grasp the concept of that balance unless you look at the end-to-end system flow. Chances are that you are only looking at improvements within your own department. If you are, then you might routinely leave improvements of an order of magnitude or more on the table. Look one level up, at the system-of-systems, and see what you can find.

Lean Systems: Antifragile Applied

“Systems subjected to randomness—and unpredictability—build a mechanism beyond the robust to opportunistically reinvent themselves each generation”

– Nassim Nicholas Taleb

In a previous post I introduced the concept of Antifragility – systems that benefit from shocks, randomness and disorder. Classifying the world in the triad of Fragile – Robust – Antifragile helps us understand and manage the potential impact of the uncertainty surrounding us.

It’s initially hard to imagine that anything useful could benefit from disorder, so the first thing to realize is that although objects and things can be Fragile or Robust, they can’t be Antifragile. Systems, on the other hand (which of course includes Product Development Systems), are made up of multiple interacting components. Systems exhibit behavior as they respond to their surroundings, and can be Fragile, Robust or Antifragile. It is this ability to respond and interact that opens the door to antifragility. Antifragility can be seen as a type of evolutionary mechanism, continuously picking the best of the available options. So, when we look for examples of antifragility we need to look at systems, not objects.

Stressors: the fuel of Antifragility

A stressor is something that puts a strain on the system, pulls it away from its equilibrium. It’s the system’s response to stressors that classifies the system as either Fragile, Robust or Antifragile.

A system that gets weaker from the encounter with the stressor is Fragile. For example, a pyramid scheme collapses when exposed to the light of day. Not only dictators (the individual) but the foundation of the dictatorship (the system) crumbles when the forces of democratic thought are applied. The best-laid project plan with all its gantt-charts has a best-before date sometime before the first problem is discovered.

Robust systems neither get weaker nor stronger in the presence of a stressor. Most government bureaucracies seem to fall in this category – their inability to learn and evolve astounds me, as does their unequaled staying power. Many companies operate in this way too. New ideas get rejected and expelled by the corporate immune system, allowing the company structure to stay the same even in the face of certain bankruptcy. Remember Kodak? GM?

Antifragile systems on the other hand enjoy randomness and stressors, at least up to a point. Shocks and disruption make them stronger because they keep the system alert and in shape. Stressors exercise and improve the system the same way physical activity stresses and improves your body. Strength training, for instance, involves pushing your muscles just past their breaking point. Your body is able to repair this damage and even over-shoots in the repair effort. The result is that you are left with a little more muscle mass than you had before. This is how Schwarzenegger became Schwarzenegger and Ahnold was again a cool and acceptable name for your first-born. Without these stressors the system would stagnate, much like a couch-potato grows the wrong kind of body mass and ends up with clogged arteries.

Of course, there is a limit to how much stress is beneficial. Running at a reasonable effort level puts you in better shape; the first marathoner supposedly expired at the goal line, having over-exerted himself to deliver with his last gasp the one-word message to the king: “victory”.

(hang on – if they won the battle, then why the life-and-death rush? Good news would still be reasonably good the next morning, right?)

The next important thing to understand about Antifragile Systems is that they work in layers. It is not enough that individual members get stronger, the system as a whole needs to be able to survive and thrive. It needs to be able to learn and select.

It’s in the DNA of the System

Going back to our example of Mother Nature as the ultimate antifragile system, we can observe that the individual members of a species are inherently fragile. In fact, each member will eventually die off, no matter how strong it is. There is a natural turnover to make room for the newer and more fit members. By natural selection and replacement of individuals the system becomes more and more fit. There is a layering effect here. Individual members (at the lowest layer) compete with each other. The strong propagate their DNA and have (presumably) stronger offspring; the weaker gradually (or abruptly, as the case may be) exit the gene pool. The system as a whole (at a higher layer) grows stronger as a result. The system survives the demise of each of its members because the information that makes up the system is preserved in its DNA, surviving generation after generation of individuals.

By evolution such a system improves gradually even if there is no master plan and things happen at random. The system continuously Inspects and Adapts, and the current “best recipe” is carried forward in our DNA. As long as we recognize and seize opportunity, even a random walk will be beneficial. Antifragile systems love errors and variation for that reason.

Lean Systems: Fragile?

Lean systems are called Lean because they deliberately operate with very small error margins. For example, Lean Manufacturing systems are sometimes called “zero-inventory” systems because they have almost no buffer inventory to absorb variations and problems at individual stations. If there is a problem somewhere on the production line, the whole system could shut down. This is by design: in a tightly coupled system small problems are amplified to make them painfully obvious, and every problem becomes an urgent matter.

In one sense Lean systems are therefore very fragile to disorder and error so one might be tempted to simply put Lean in the Fragile category. But it’s not that simple. The antifragility of Lean is in the DNA of the system.

Lean Systems: Antifragile

So we need to reconcile the apparent fragility of the small operating margins of a Lean system with the claim that Lean systems are antifragile.

I like Steven Spear’s (The High Velocity Edge) summary of a good Lean implementation:

  1. Build a system of “dynamic discovery” designed to reveal operational problems and weaknesses as they arise
  2. Attack and solve problems when and where they occur, converting weaknesses into strengths
  3. Disseminate knowledge gained from solving local problems throughout the company as a whole
  4. Lead by developing capabilities 1, 2 and 3

The ingenuity and beauty of Lean is that even small problems become intolerable at the system level. Lean Systems use this fragile tight coupling as a way to accelerate system-level learning. If a problem develops, it immediately becomes painfully obvious that something is wrong.

Rather than working around or ignoring these small problems, the team in charge is obligated to immediately seize the opportunity to improve the way the system works before the small problem becomes a big problem. A good lean team will swarm the problem to get it fixed, and put in place measures to ensure that similar problems don’t occur in the future. The result is that the particular process step which failed now has improved and is less likely to fail in the future.

Antifragile systems love errors, and so do Lean systems. The fragility of small error tolerances acts as a forcing function which brings problems to the surface, causing the old faulty processing step to evolve and be replaced with a new and more fit one. Each small failure alters the DNA of the Lean system just a little bit, evolving and improving. One more problem spot has been eliminated, and the probability of future defects is reduced.

So here is a perfect example of a system that is designed to evolve over time, to learn from mistakes and to grow more capable after each error. It needs no top-down direction other than living the Lean Principles. There is no master plan, yet Lean systems evolve on their own to become the most competitive and effective man-made systems we have on our planet.

Evolving. Learning. Antifragile. Lean. Wonderful.

If it’s a Pipeline, it’s leaking

Many times we view the Product Development System as a Pipeline where we pour effort and energy in, and out comes a product sometime later. You’ve probably used this analogy before, talking about “products in the pipeline” or “the R&D pipeline”.


Seems pretty intuitive, and I use that analogy too. Except I recently thought perhaps the analogy isn’t quite right. If you’re working in a Waterfall or phase-gate process, it’s not a single pipeline. It’s a series of smaller pipe lengths which are joined together by hand-offs.


The trouble with hand-offs is that they generate waste. Throughout the journey there can be more energy lost in hand-offs than actually makes it out of the pipeline. In every joint, effort and energy leaks out.


I find this analogy is a little more fitting, and although it’s a simple visual it helps make the point about hand-offs at the simplest possible level. The discussion usually turns to “what are the leaks and how we stop them” and there is your entry to discuss Lean and Waste.
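To put a rough number on it: if a fixed fraction of the effort leaks away at every joint, the losses compound. A back-of-the-envelope sketch (the 15% figure is just an assumption for illustration):

```python
# Fraction of the original effort that survives a chain of hand-offs,
# assuming each hand-off loses the same fraction of what reaches it.
def surviving_effort(handoffs, loss_per_handoff):
    return (1.0 - loss_per_handoff) ** handoffs

for h in (1, 3, 5, 8):
    print(f"{h} hand-offs at 15% loss each -> "
          f"{surviving_effort(h, 0.15):.0%} comes out the far end")
# 1 -> 85%, 3 -> 61%, 5 -> 44%, 8 -> 27%
```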

What do you think?

Agile at Boeing in 1990s – the 777 Program

The year 1995 recorded two seemingly unrelated events: the entry of the first Boeing 777 airplane into commercial service and the introduction of Scrum to the world. As the twists and turns of history go, Boeing was Agile before its time.

I love the Boeing 777. I have flown more than a few miles over the years, and for the majority of them it has been the 777 that carried me and millions of other passengers safely across oceans and continents. For most of us the flying experience is judged by the quality of the food and in-flight entertainment options, and whether the flight is on-time or not. We don’t pay much attention to what airplane we’re in and just want to get to where we need to go, but somehow I have grown fond of the 777. It’s like an old reliable friend now, so when I see the 777 on my itinerary I don’t mind the flying part so much.

But this blog isn’t about someone’s creepy love story with an airplane – although that could be interesting enough. It’s about Product Development.

How do you develop an airplane like the 777? It turns out the story of the 777 development program is even more interesting than the plane itself. The 777 program included elements that were both Iterative and even Agile. I enjoyed learning about the program from various sources and found it hard to reduce it all into a single blog post. There is a lot of information out there if you are interested, including a really good 5-part PBS documentary “21st Century Jet – the Building of the 777”. See the bottom of this blog post for some good places to start.

The 777 Development Program

In the late 1980s Boeing was trying to decide how to fill the product line gap between the 767 and 747. One option was to evolve the 767 design, but in the end it was determined that a completely new aircraft was needed. It’s a big investment – an estimated $5BN for the 777. Even though Boeing was in a strong financial position in 1990, a $5BN expenditure would sink any company if it wasn’t successful.

It’s tempting to avoid any kind of risk when you’re in that position, but Boeing broke many barriers in this next-generation aircraft design, both in engineering and manufacturing. The 777 was Boeing’s first “fly-by-wire” (computer-assisted control) aircraft, had the first fully computerized cabin, and was also the first airplane ever designed using 3D CAD systems. It seems impossible today to imagine doing it any other way, but before the 777, airplanes were designed with 2D drawings, and form and fit couldn’t be tested until the first prototype was built.

The old tried and true approach had its share of familiar problems: in Boeing’s previous development effort, on the 767, an estimated 13,000 individual design changes (large, small and tiny) were made to the door assemblies at various stages in the development process. The 777 would be even worse if something didn’t change.

Some 10,000 engineers were involved in the 777 program, spread across the US, Europe, Japan and Australia. Talk about coordination nightmares and opportunity for error. Large-scale project, indeed.

The 777 program was incredibly successful, as evidenced by the timelines and the results. Conceptual design started in January 1990. Manufacturing of the first prototype started in early 1993. The 777 had its first flight on June 12, 1994 and less than a year later (May 1995) the first plane went into service with United Airlines. Boeing estimates that the number of changes and errors was reduced by 80% compared to previous design projects.

Another first for Boeing: the very first airplane delivered was accepted by United Airlines on the first walkthrough, with only a handful of minor defects noted. That’s beyond remarkable for any complex system, and difficult to comprehend given the safety concerns and reliability requirements that an airplane has to satisfy. A defect-free delivery of the very first airplane on-time as promised speaks volumes about any complex development program. I don’t understand airplanes but I know complex systems don’t come together very easily.

If that wasn’t enough… the very first 777 prototype plane built (W001) was good enough to be updated and eventually put in service by Cathay Pacific in 2000. Not bad for a prototype!

Alan Mulally


The engineering story of the 777 is also the story of Alan Mulally. The management team was initially led by Phil Condit as the executive in charge of the 777 program and Alan Mulally as the director of engineering. Alan Mulally would later replace Phil Condit as the person overall in charge, and it is Mulally’s management style that formed the framework of the 777 program and the “Working Together” model.

A hint of Agile

The first hint of a Lean/Agile mindset can be seen in the meeting rules posted at Mulally’s weekly program reviews. You can make out at least some of the principles:

  • Plan for zero overtime [Agile: sustainable pace]
  • Weekly DBT reviews [Agile: weekly scrums]
  • Panic Early [Agile: fail fast]
  • Quality and/is Schedule [Agile: Quality is a critical ingredient]
  • Make decisions faster [Agile: act and learn fast]


Working Together

The labels “Agile” and “Scrum” were not yet born when Boeing kicked off development of the 777 in January of 1990. However, the management team knew that something had to change. Boeing’s environment had become bureaucratic and department-focused. Specialists in various departments would design their own parts and then it was up to the manufacturing team (the system integrators) to figure out how to make it all come together. It was a “throw-it-over-the-wall” environment where miscommunication and disconnects were a constant problem.

This time, Boeing would work closely with their customers to design the airplane, and would also tear down the walls between departments by organizing their own workforce across discipline boundaries. People that were normally separated by organizations or development phases would now be engaged together and at the same time, talking and collaborating in real-time.

“Working Together” essentially boils down to “cross-functional” teams but it meant more than that. It meant working closely -really closely- with the customer airlines and suppliers. It signaled a radical departure from the bureaucratic project organization of the past and set completely new expectations going forward. Today we would call such a thing an “Agile Transformation”.

“Working Together” foreshadowed Agile, and the model is -unfortunately- still lightyears ahead of the majority of product development teams on the planet.

Although the 777 program was planned and executed in an overall phase-gate process with key milestones, integration points and deadlines, we can see many elements and attitudes which are definitely Lean/Agile-inspired, if not outright Agile. Looking at the 777 program through the lens of the Agile Manifesto we can recognize many familiar concepts.

Individuals and Interactions over Processes and Tools

The Working Together model pulled people from many disciplines together in what we today call cross-functional teams. This was a radical departure from the way things used to work, where design engineers would design their piece in isolation, then throw their designs over the wall to the manufacturing team – waterfall-style. This was the mode of operation in Boeing in 1990, and chances are this is also the mode of operation in your company today if you work on a project of any significant size.

To address the problem of communication, Condit and Mulally organized the program in cross-functional Design-Build Teams (DBT). There were almost 250 DBT’s on the program. DBTs were formed around functional areas of the airplane, for example there was one DBT for the engine, one for the cargo door, one for the passenger door, one for the leading wing edge, one for the trailing wing edge, one for the flaps, one for the rudder etc.

Each DBT was staffed with the right people to carry a particular design from concept through manufacturing and maintenance. Teams had design, manufacturing, tooling, finance, materials, maintenance and subcontractor representatives – and even customer representatives – to make sure that every aspect of operation, manufacturing and maintenance was covered.

This gives us a hint as to how we can organize Agile at large scale. You can’t possibly coordinate or co-locate 10,000 engineers, but you can make sure that the individual DBTs are co-located and then coordinate the DBTs. DBTs were organized in a cascading hierarchy according to how the functions of the airplane could be decomposed. For example, there were ten DBTs responsible for the wing’s trailing edge, including one each for the inboard flap, outboard flap, aileron, spoilers and so on.

One potential problem of decomposing the project into DBTs is that individual teams lose track of the whole design. Weekly DBT reviews helped counteract that, but Boeing realized that the teams always need to be bonded together by a higher-level goal. When you are invested and care about something, you look out and make sure your work, and the work of the person next to you, is at the highest level.

To achieve that alignment Mulally did something incredible which I still have a hard time imagining. He pulled together all 10,000 engineers working on the 777 program for all-hands meetings once a quarter. By mid-1993 the 9th (!) all-team meeting convened in Seattle. That’s 9 of those in 30 months…

All-team

We can only speculate how much it cost Boeing to bring everyone on the team together once per quarter, but Boeing understood the real economics of product development: total team alignment reduces errors and disconnects and avoids prolonged delays. The cost of bringing everyone together for 2 days (10,000 x 2 days of salary plus travel) is much smaller than the cost of a one-month delay to the program (10,000 x 30 days of salary plus downstream fallout).

When making resource vs. time decisions, you always need to consider the burn-rate and cost of delay for the project. The burn-rate of a 10,000-person program is much higher than the cost of these all-team meetings.
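
To make the comparison concrete, here is a minimal back-of-the-envelope sketch in Python. Every cost figure in it is a hypothetical placeholder I made up for illustration – not Boeing’s actual salary or travel numbers – but the conclusion holds for any realistic values.

# Back-of-the-envelope: a 2-day all-team meeting vs. one month of program delay.
# All cost figures are hypothetical placeholders, not Boeing's actual numbers.

ENGINEERS = 10_000
DAILY_COST_PER_PERSON = 500   # assumed fully loaded cost per person per day ($)
TRAVEL_PER_PERSON = 300       # assumed travel/venue cost per attendee ($)

# Cost of pulling everyone together for a 2-day all-team meeting
meeting_cost = ENGINEERS * (2 * DAILY_COST_PER_PERSON + TRAVEL_PER_PERSON)

# Cost of a one-month (30-day) delay: the whole program keeps burning salary,
# ignoring downstream fallout such as late-delivery penalties.
delay_cost = ENGINEERS * 30 * DAILY_COST_PER_PERSON

print(f"2-day all-team meeting: ${meeting_cost / 1e6:.0f}M")
print(f"One month of delay:     ${delay_cost / 1e6:.0f}M")

With these made-up numbers the meeting costs about $13M while a single month of delay burns about $150M – an order of magnitude more, before you even count the downstream fallout.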

Working Software (or planes) over Comprehensive Documentation

It’s all about creating fast feedback loops in order to discover problems early while they are still correctable. Agile is a Fail-Fast model.

Quick iterations and virtual integration

CAD

It is of course not practical – yet – to build and test an airplane incrementally in Sprints, but in the early 1990s 3D CAD software was just being introduced. The 777 was the first airplane to be designed almost entirely using 3D CAD software instead of 2D drawings. This allowed Boeing to simulate form and fit quickly instead of building the traditional physical mock-ups, which took both time and resources. For the first time, 3D CAD models could be fit together and verified almost in real time, before the first prototype was built.

Prototyping

The 777 program built 9 working airplane prototypes, compared to the usual 6 for a traditional airplane development program. Considering that the list price of a 777 would be in the $100M range, the extra 3 prototypes must have put a big dent in the development budget. However, if you value fast feedback loops, then the cost of not building those extra 3 airplane prototypes would be even greater.

I don’t know the actual figures of course, but assuming that the 9 prototypes cost at least $100M each – roughly $900M in total – the prototype expenditure for the 777 program was in excess of 20% of the total budget.

Flight-deck

Not all subsystems needed a full-scale prototype. To test the maneuverability and visibility of the new flight-deck layout, a flight-deck prototype was built and mounted on a wheeled frame, which allowed the team to taxi it around the airport and get a feel for the handling, controls and visibility of the new flight-deck design.

Mock-ups and test-beds

If you can test early, then do so – especially for the risky new components.

A prototype of Pratt & Whitney’s new engine was fitted to an old 747 and given a test flight. It was a costly experiment, one that was debated internally, but in this case it paid off: a surge (engine back-fire) occurred on that first flight, and it was discovered early enough that Pratt & Whitney could address the problem without delaying the 777 program.

The new Fly-by-Wire system was similarly tested on a 757 airplane first to make sure that everything would work smoothly on the 777.

These added prototypes and test beds paid off for the customers too. The accelerated testing schedule made possible by the additional prototypes helped secure the necessary FAA certifications in record time, allowing for a much faster entry into service.

Customer Collaboration over Contract Negotiation

Boeing needed a lead customer for the 777, and after fierce competition United Airlines selected Boeing and the 777 for a $22BN deal that effectively launched the 777 program from concept into reality.

United Airlines and Boeing did write a formal contract, but the essential agreement upon which United Airlines awarded Boeing their business was the famous “Condit-Guyette” memo. This hand-written note, drawn up in the early morning hours by United Airlines executive James Guyette, capped several days of competitive negotiations between Airbus, McDonnell-Douglas and Boeing.

Signed by both Boeing and United Airlines executives, it stated that Boeing and United would work together to design a new service-ready airplane.

Thanks to the PBS documentary flashing the original memo on the screen for a few seconds, we can reverse-engineer what it actually said:


The Condit-Guyette memo, transcribed

B777 Objectives

United + Boeing + Pratt-Whitney

In order to launch on-time a truly great airplane we have a responsibility to work together to design, produce and introduce an airplane that exceeds the expectations of flight crews, cabin crews and maintenance and support teams and ultimately our passengers and shippers.

From day one:

– Best dispatch reliability in the industry

– Greatest customer appeal in the industry

– User friendly and everything works

October 15, 1990

Signed by United Airlines, Pratt-Whitney and Boeing executives


It seems quite reasonable from a customer perspective, yet almost unrealistic from a product development standpoint. What a great way to kick off a project, and it doesn’t get any clearer than that: customer collaboration over everything else.

Responding to Change over Following a Plan

Although the design of the 777 followed a master production plan, we can see a few ways in which Boeing and its suppliers expected and responded to change within the boundaries of that plan.

As many as 8 airline customers had full-time representatives sitting alongside Boeing in Seattle, with British Airways peaking at 75 people integrated with Boeing on the 777 program. 1,500 design features were reviewed with the airlines, and changes were made to 300 of them as a result.

As another example, for the first time Pratt & Whitney (the jet engine manufacturer) held several open design reviews where they invited airplane mechanics from customer airlines to critique and provide feedback about the serviceability of their new engine. This helped reduce human error in maintenance, and helped build confidence with the customer airlines that they would have a reliable and serviceable engine on the 777.

There are many other stories of requirement change, such as the rudder team in Australia, which had to endure changing requirements twice after rudder manufacturing had started. Those are the big and visible ones. But what about all the small day-to-day changes?

On a day-to-day basis the DBTs dealt with new and emerging information, having put in place fast feedback loops between manufacturing and design engineers. Change requests which normally took weeks to process were handled in a single day with the DBT approach. When you can collaborate that quickly, it is also much easier to implement (and undo) changes.

Postscript

The 777 program is a technical and commercial success, and has garnered numerous innovation awards. Unfortunately, Boeing did not push the “Working Together” model across the company. The amount of change, training and continuous attention required by the 777 program was quite high, so it was left up to each subsequent development program to decide whether to adopt the model. Of the next 3 development programs, only one implemented “Working Together”. Judging from early impressions of the 787 (numerous delays, early quality issues, fleet groundings), it seems a lot of the 777 lessons have been forgotten or ignored.

Alan Mulally himself later joined Ford in 2006 as President and CEO, where he drove a corporate turnaround with similar management methods. It just goes to show: principles are transferable, but practices are not.

References and Resources

You can find out quite a bit about the 777 development program if you just do a bit of searching. Here are some of the links and resources which I came across:

AntifrAgile

“Wind extinguishes a candle and energizes fire. Likewise with randomness, uncertainty, chaos: you want to use them, not hide from them. You want to be the fire and wish for the wind.” – Nassim Nicholas Taleb

Agile and Antifragile… one word you know, and another perhaps you don’t.

Taleb Books

Antifragile. The word is so new it doesn’t appear in any dictionary… yet. It was introduced to the world by one of my all-time favorite authors, Nassim Nicholas Taleb, in his 2012 book “Antifragile: Things That Gain from Disorder”.

One might imagine that the list of “things that gain from disorder” could be written on a single post-it note with room to spare. Short in length and narrow in scope indeed, but the more you dig the more applications you see. There are numerous parallels to product development here, waiting to be picked.

When someone offers you a way to benefit from disorder instead of being harmed by it, wouldn’t you be curious? Sign me up! Should be enough material for a few blog posts. Not sure how many – all I know is that this is part 1.

Disorder Defined

Things that gain from disorder… intriguing. Better clarify what is meant by disorder then. Disorder can be many things, so Taleb provides a rough summary of the “extended disorder family”:

(i) uncertainty, (ii) variability, (iii) imperfect, incomplete knowledge, (iv) chance, (v) chaos, (vi) volatility, (vii) disorder, (viii) entropy, (ix) time, (x) the unknown, (xi) randomness, (xii) turmoil, (xiii) stressor, (xiv) error, (xv) dispersion of outcomes, (xvi) unknowledge.

As a shorthand, I’ll refer to the family as either disorder or randomness, depending on context or the whim of the moment.

To a traditional Project Manager, these family members are all harbingers of bad news, like an unwelcome visit from the Mafia. To the Developer, they are facts of life. The “project management” activity aims to wrestle the disorder family to the ground and squeeze out as much predictability, control and stability as possible.

That, as Taleb likes to say, is a “sucker game”.

It’s a sucker game because the problem is unsolvable. We’ll never eliminate or control randomness; if we could, it wouldn’t keep coming at us at inconvenient times from unexpected directions. The winning game is not to tame randomness (we can’t), but to manage our exposure to the effects of disorder by assuming that change and volatility are ever-present. Here you can already see the first parallels to Agile: we embrace change and new, emerging information. We adapt to a changing environment. Variability and uncertainty are not to be resisted, but used.

We need a way, then, to help us get a handle on this disorder thing. You can’t exploit what you can’t see. And so we introduce the Triad.

Fragile – Robust – Antifragile

Everything in the world can be classified into The Triad – Fragile, Robust or Antifragile – according to how it responds to disorder.

Something that breaks when subjected to disorder is Fragile. For example, a porcelain teacup which is dropped onto the tile floor has nothing to gain and probably will break.

The Robust neither breaks nor gains from disorder. Robust or resilient things retain their shape after being exposed to shocks. That is good, as they don’t get any worse – but they don’t improve either.

We typically think of the opposite of Fragile as robust or resilient, but as Taleb was the first to articulate, the opposite of breaking under stress is not staying the same – it is getting better. What kinds of things could possibly benefit from randomness and disorder? Quite a lot, as it turns out, and Taleb coined the word Antifragile to give the concept a name. Things that are antifragile respond positively to disorder and randomness. They grow and get stronger.

OK, we are engineers with logical minds trained in linear thought. This can be hard to wrap your mind around, so let’s look at some examples.

The easiest approach is to look at the opposite end of the spectrum first. Fragility is easily detected. Anything that looks like a line of dominoes that have to fall in place perfectly is fragile because any kind of variation or instability (one domino out of order is enough) will upset the plan.

For example, if you are traveling by air and your itinerary depends on making 3 different connections with no allowance for delay, then your plan is fragile to delays. The more such dependencies you have, the more fragile your trip will be.

Think about all the fun project Gantt-charts with carefully laid-out dependencies you have been exposed to in your lifetime. None of them (none!) executed according to plan. The bigger your project is, the more moving parts you have. The more moving parts, the more fragile you become, because you have more opportunities for things to go wrong. Size makes you fragile (this will be a separate blog post for sure). You can hide fragility behind buffers, but that doesn’t reduce the fragility of your reality. It just lets you pretend it doesn’t exist, and delivers a bigger surprise when things go really wrong. This we mistake for confidence.
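
To put a number on the domino effect, here is a tiny sketch of my own (a toy model, not anything from Taleb): if a plan needs N tasks to each hit their date, and each task does so independently with probability p, the whole plan survives only with probability p raised to the power of N.

# Toy model of schedule fragility: a plan with N serial dependencies
# only holds if every single task hits its date.

def on_time_probability(p_per_task: float, n_tasks: int) -> float:
    """Probability that all n_tasks independent tasks finish on time."""
    return p_per_task ** n_tasks

for n in (1, 5, 20, 100):
    p = on_time_probability(0.95, n)
    print(f"{n:>3} dependent tasks, each 95% reliable -> plan survives with p = {p:.2f}")

# Prints roughly: 0.95 for 1 task, 0.77 for 5, 0.36 for 20, and 0.01 for 100.

Even with very reliable individual tasks, a long enough dependency chain makes an on-time finish the exception rather than the rule – which is exactly what all those Gantt-charts keep demonstrating.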

Robustness in R&D typically comes in the form of buffers or backup plans. Both are expensive, because in both cases we buy stability with schedule time or resources. Since most R&D projects are people-intensive (easily 80% of project costs come from salaries), this is the most expensive way to purchase perceived predictability.

Mother Nature is Antifragile. Here is a system that depends on random events. Evolution cannot do its work in a perfectly stable system where there is no variation: if there is no genetic variation, there is no way to improve the species. Genetic variation (mutation – say, sharper teeth for a carnivore) is the raw material for the improvement of a species. Evolution is fueled by randomness. It’s a volume business, for sure, and most variations don’t result in meaningful improvements. But once in a while a new ability surfaces which ends up dominating. Here we see at least two members of the disorder family at play: variability and time.
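
Taleb’s more technical framing of the Triad is convexity: fragile things respond concavely to shocks (more to lose than to gain), antifragile things respond convexly (more to gain than to lose). Here is a toy simulation of that idea – my own sketch with made-up payoff curves, purely to illustrate the classification:

import random

random.seed(1)

# Subject three toy "systems" to the same random shocks and compare
# their average outcome to the calm-world baseline of zero.

def fragile(x):       # concave response: shocks hurt more than they help
    return -x * x

def robust(x):        # flat response: shocks neither hurt nor help
    return 0.0

def antifragile(x):   # convex response: shocks help more than they hurt
    return x * x

shocks = [random.gauss(0, 1) for _ in range(100_000)]

for name, system in (("Fragile", fragile), ("Robust", robust), ("Antifragile", antifragile)):
    average = sum(system(x) for x in shocks) / len(shocks)
    print(f"{name:>12}: average outcome under volatility = {average:+.2f}")

# The fragile system ends up below its calm-world baseline, the robust one
# stays put, and the antifragile one ends up above it -- it gained from disorder.

Crank up the volatility of the shocks and the gap widens in both directions: the fragile system loses even more, the antifragile one gains even more.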

AntifrAgile

There is a lot more in common between Agile and Antifragile than contracting the two words into AntifrAgile – although the word combination is cute and rather serendipitous. And maybe prophetic.

Yes, Agile methods embrace change as a fact of life, but the dive into AntifrAgile goes much deeper than that. It boils down to a different way of looking at risk – and therefore also acting differently on a daily basis. We stop trying to predict or eliminate random events, and instead put ourselves in a position where surprise is not an unwelcome guest but rather the wind that energizes the fire.

As we shall see later, like Mother Nature we can exploit disorder in product development rather than be harmed by it. Taleb outlines several strategies for doing so in other parts of life, and I’ll try to relate them to the Lean/Agile mindset. As we draw the parallels, you might get a better appreciation of how randomness is also Agile’s friend. And perhaps in the process we’ll tread into uncharted Agile waters. Who knows.

Summary

OK, that is almost enough for this first post on Agile and Antifragile. Maybe you can start to see the trajectory of how Agile and Antifragile intersect. They overlap in so many places, sharing much of the same DNA.

So what’s the big deal? Understanding how Antifragility can work to your advantage will unlock more options and maybe even push our application of Agile methods forward in new directions. I am a big believer in adopting and adapting ideas from other areas, and if we can overcome our domain dependence then we have one more source of inspiration to draw on which will help us move away from the Fragile world of Gantt-charts and into the more dynamic world of Agile.

And hey, it doesn’t hurt to get independent confirmation from other parts of life about the sensibility of Agile, and to see that Agile’s way of dealing with uncertainty has merit outside of Software Development.

A Tale of Two Systems

Ok, a cheesy headline. But it fits.

I recently relocated to Berlin and part of the fun is to get set up with new… everything. A few trips to Ikea, hardware stores, grocery stores, the local Bürgeramt (government bureaucracy is alive and well) etc… and then the real fun: communications. We need new mobile phones and internet access.

I like the convenience and simplicity of the one-stop-shop model so we visited the local telecom provider in our area. No problem, they can help with all of that. It didn’t take long to find ourselves citizens of two worlds.

First of all, I don’t need or want a land-line. It’s 2014 and we just use mobile phones and Skype. Neither did we really need or want TV since we watch online on-demand on our own schedule. But… you can’t get Internet without TV or TV without a land phone line anywhere in the world these days. So now we occupy space in the public phone system for a number that will never ring, and we have lots of German TV channels we will never watch. But it’s nice to know they’re there, I suppose.

But that’s a digression. Step one is to get mobile phones, and that was easy. Pick a plan, pick a phone and less than 30 minutes later (most of which was spent filling in the application forms) we had mobile phones in hand, activated and ready to go. Great!

Not so easy for cable TV and internet. The store clerk patiently explained that we would receive a set-top media receiver via mail courier in two weeks. Then, two days after that, the actual service would be activated and away we go. The DSL modem for the internet connection is yet another piece of equipment which, incidentally, the TV set-top box depends on. We picked up the DSL modem in-store.

One company, two very different levels of service. Mentally I form a picture of a company that has one logo but two distinct business units and logistical setups, probably as a result of a merger along the way.

The mobile unit is new, formed around internet-time service expectations and delivery capability. Same thing for internet connectivity. It’s new, not saddled with legacy procedures and expectations.

The cable TV unit is stuck with old infrastructure, not just for the physical connection but also in terms of logistics. I imagine someone has to take the order, re-type it into another system, print it on a dot-matrix line printer and send it to Nuremberg or wherever, where someone puts a stamp on it, makes two copies that get inserted into thick dusty binders never to be seen again, and sends another paper work order via internal mail to the service activation center, where an operator finally flicks the electronic switch.

Well maybe not exactly like that, but perhaps not too far off. It’s 2014 and there is a 2-week waiting period to get a simple service turned up, where it appears that most of the time is spent simply waiting in line.

One company, two service models. I guess the merger wasn’t as thorough as the company logo might suggest.

Predictably, as anyone familiar with complex systems will know, the longer lead-time model with more links in the logistics chain is more prone to problems and surprises.

Somewhere along the way our set-top box got lost in the shuffle, and nobody was able to find it or tell us when it would arrive. “Maybe tomorrow,” we heard a few times, the clerk’s voice unable to convey the confidence we were hoping for. We couldn’t order a new one because the previous one was already “in the system and on its way”. The fun part is that as we traced the steps, we discovered that the service provider had actually handed the set-top box to the courier service the same day we placed the order. So the courier was, in effect, told to hold the package in the delivery warehouse for two weeks before even shipping it. It then got lost for almost a week before it arrived. We finally had internet working 3 weeks after the order was placed.

I check the calendar. Yes, it’s 2014. But at least we’re in Berlin and living life in a way that’s not possible anywhere else in the world. Some things are inconvenient, yes, but easily purged from active memory by blogging about it. Kind of feels like we passed the first test. Now we move on and become Berliners – whatever that is.