How ordinary failure could have a seismic effect on an industrial giant

Earlier this year, a Boeing aircraft's door plug fell out in flight – all because crucial bolts were missing. The incident shows why simple failures like this are often a sign of larger problems, says John Downer.

On 6 January, as Alaska Airlines Flight 1282 (a Boeing 737 Max 9) was climbing out of Portland, Oregon, a large section of the aircraft's structure, a fuselage door-plug, broke free in flight. With the plug gone, the cabin violently decompressed with a clamorous boom and gale that ripped headrests from their moorings. The mother of a teenage boy seated just in front of the rupture clung to him as his shirt was torn from his body and sucked into the void.

Comment & Analysis

John Downer is Associate Professor in Science and Technology Studies at the University of Bristol, and the author of "Rational Accidents."

Nobody died in the harrowing incident, somewhat miraculously, but it was a very close call. If the seats directly next to the failed fuselage section had not been empty, or the seatbelt light had not been lit, the event could well have been deadly.

Dangerous failures in modern jetliners are extremely uncommon events in general, but even in this context, the plug blowout looks unusual and concerning. Preliminary reports strongly indicate that its proximate cause was shockingly mundane: it seems that Boeing simply failed to secure the plug correctly. The errant door-plug was missing four crucial bolts when it was discovered in a residential neighborhood, and subsequent inspections have reportedly revealed improperly bolted plugs on other aircraft fuselages.

If the missing bolt theory is confirmed when the safety investigation concludes, then it will be the sheer ordinariness of the failure that sets it apart. When jetliners fail for mechanical reasons, those reasons tend to be much more complicated and interesting (at least from an engineering perspective) than missing bolts. For a flight to be imperiled by such a prosaic and eminently avoidable manufacturing or maintenance error is an anomaly with ominous implications.

To understand what I mean here, it helps to put the incident into context, and for that it helps to step back and think briefly about the inherent difficulties of making jetliners as reliable as we have come to expect. Extreme reliability is hard, especially in complex technologies that operate in unforgiving environments. This is intuitive enough. But the nature of the challenges it poses, and the manner in which the aviation industry has managed those challenges, are both widely misunderstood.

In recent years, Boeing has had to deal with serious issues with its 737 Max short-haul airliner (Credit: Getty Images)

The extreme reliability that we expect of jetliners poses meaningfully different challenges than the "normal" reliability we expect of almost any other system. In essence, this is because the challenge of designing a complex system that doesn't fail very often lies in knowing the intricacies of that system and its operation. (Which is simply to say that the better engineers understand how a system can fail, the more able they are to prevent it from failing.)

But the depth of knowledge needed to lower a system's failure-rate, and the difficulty of achieving that knowledge, don't scale in a linear way. Like climbing a high mountain without oxygen, each step becomes progressively more difficult. So, doubling a system's reliability takes more than twice the effort, and so on.

To appreciate this relationship, consider the work of building a system that is reliable 99.99% of the time (ie one that fails no more than once in every 10,000 hours of operation). To achieve this, engineers need to understand how the system will behave over that period of time: the external conditions it might face, how its many elements will interact with those conditions, and a great deal else. And for that they need abstractions – theories, tests, models – that are representative enough of the real world to accurately capture the kinds of eventualities that might occur only once in every 10,000 hours.

The real world is "messy" in ways that engineering abstractions never perfectly reproduce, however, so achieving such representativeness can be challenging. A lot of unexpectedly catastrophic things can happen in 10,000 hours. Perhaps an unusual environmental condition might stress a material in an unanticipated way, causing it to corrode or fatigue. Or an obscure combination of inputs might cause essential software components to crash or behave erratically. We don't know what we don't know, as the old truism goes, so these things are difficult to anticipate.

Now consider what happens as the reliability required of the system rises from 99.99% to 99.999%. To achieve this new benchmark, engineers need to account for eventualities that might occur, not every 10,000 hours, but every 100,000 hours. That's a much larger set of even more obscure possibilities. And so it goes: each new decimal in this "march of nines" represents an order-of-magnitude rise in the obscurity of the factors that engineers need to capture in their abstractions and accommodate in their designs.
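The arithmetic behind the "march of nines" can be made concrete with a minimal sketch. It assumes the reading used above, in which 99.99% reliability means no more than one failure in every 10,000 hours of operation; the function name `hours_per_failure` is purely illustrative.

```python
def hours_per_failure(nines: int) -> int:
    """Hours of operation per expected failure at a reliability of `nines` nines.

    Assumes the reading used in the text: 99.99% (four nines) means no more
    than one failure in every 10,000 hours, so each extra nine multiplies
    the interval by ten.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    return 10 ** nines


if __name__ == "__main__":
    # Four nines (99.99%), five nines (99.999%), and nine nines.
    for nines in (4, 5, 9):
        print(f"{nines} nines: one failure per {hours_per_failure(nines):,} hours")
```

Each added nine stretches the window engineers must reason about tenfold. On this reading, a mean time to failure of a billion hours corresponds to roughly nine nines.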

With each increment, therefore, it becomes increasingly likely that their reliability calculations will be undone by something esoteric and improbable that eludes their understanding of how the system functions: some property, or combination of circumstances that nobody thought to test or model. (Elsewhere, I have proposed we call such failures "rational accidents", partly because they arise from rationally held but nevertheless erroneous beliefs, and partly because it is rational, epistemologically, to expect them to occur.)

This is the context in which we should understand the safety of jetliners. Viewed through the lens of epistemological uncertainty and its hidden dangers, civil aviation's safety record over the last few decades is little short of astonishing. The rate of airliner accidents attributable to technological failure implies that their critical systems have mean-times-to-failure not of 10,000 hours or even 100,000 hours, but north of a billion hours.

When reckoning with failures over this kind of timescale, even extraordinarily obscure considerations can become critical: unexpected interactions or phenomena that might only show up with a particular phase of the moon or alignment of the stars.

David Calhoun (right), Boeing's CEO, has had to answer questions about the company's corporate culture at Senate hearings this year (Credit: Getty Images)

As a 20th-Century engineering achievement, therefore, the sheer ordinariness and tedium of commercial flight is on par with the exceptionality and drama of the Apollo Moon landings. And if the laurels for this collective achievement had to be laid at the feet of a single organisation, then it would have to be Boeing. The company was at the vanguard of the jetliner revolution, and was the leading manufacturer of civil airliners through most of the Jet Age that followed. Its revolutionary B-29 Superfortress, the wartime development of which cost more than the Manhattan Project, pioneered many of the core technologies and design principles that made modern jetliners possible.

The processes by which Boeing and its peers achieved this lofty reliability are routinely misrepresented. We have been conditioned to think of engineering as an objective, rule-governed process, and the work of making jetliners reliable is firmly couched in this language. So it is, we are told, that the awesome mundanity of modern air travel is built on ever-more-detailed engineering analyses and rigorous regulatory oversight: tests, models, measurements, and calculations.

Like sausages and scriptures, however, these formal practices look increasingly spurious when the circumstances of their production are examined closely. Not even the most exhaustive tests and models could hope to accurately identify and reproduce every subtlety of a jetliner's real-world performance over billions of hours of operation. It would be an impossible undertaking.

While rigorous analysis and oversight are sufficient in most engineering circumstances, therefore, their usefulness wanes long before they can deliver the kind of performance jetliners demand. Engineers working in this domain need to push past the limits and uncertainties of their abstractions, and herein lies the true challenge of extreme reliability.

Examined closely, the industry navigated this challenge by leveraging a series of pragmatic but ultimately unquantifiable practices, which, stripped to their essence, amount to a slow process of learning from experience. Engineers calculated and measured everything that could realistically be calculated and measured, then they gradually whittled away at the uncertainties that remained by interrogating failures for marginal insights that had eluded their tests and models. They incrementally made jetliners more reliable over time, in other words, by leveraging their real-world experience.

This learning process sounds simple, but it was actually a painful, expensive, decades-long grind that depended for its success on several longstanding and often challenging organisational commitments. For example, it necessitated a costly dedication to researching the industry's failures and close calls, and an institutionalised readiness to accept findings of fault (something that organisations naturally tend to resist).

Perhaps most significantly, it depended on a deep-rooted adherence to a consistent and stable jetliner design paradigm: a willingness to greatly delay, or forgo entirely, implementing tantalising innovations – new materials, architectures, technologies – that, on paper, promised significant competitive advantages.

These vital practices and commitments could never be wholly legislated, audited, and enforced by third parties due to the nuanced and necessarily subjective judgments on which they hinged. Regulators might demand that "new" designs be subjected to more scrutiny than "light modifications" of prior designs, for instance, but they could never perfectly define what constituted a "light modification". And, while their rules might require that special precautions be taken for "safety-critical" components, the "criticality" of specific components will always be a matter of interpretation.

The fact that these vital practices and commitments were necessarily subjective, and so to some extent unenforceable, made the organisational cultures that framed them extremely important. The people making strategic decisions at companies like Boeing needed to understand the significance of the choices they were making, and to do that they needed to be able to see past the rule-governed objectivity that frames the safety discourse around modern aviation. They had to realise that in this domain, if in few others, simply ticking every box was not enough.

The loss of a cabin door from a Boeing 737 Max in January this year happened because of a simple failure to secure bolts (Credit: Getty Images)

They also needed to be willing, and able, to prioritise expensive, counterintuitive practices over shorter-term economic incentives, and justify their decisions to stakeholders without appeals to quantitative rigour. This made aviation-grade reliability a huge management challenge as well as an engineering challenge.

So how does this understanding of aviation reliability help us make sense of Boeing's recent problems with its 737? Seen through this lens, the door-plug drama looks highly unusual in that it appears to have been an avoidable error. This is stranger than it seems. On the rare occasions when jetliner failures are attributable to the airplane's manufacturer, they are almost always "rational accidents", with root causes hidden in the uncertainties of experts' understanding of the system. If the insecure plug was due to missing bolts, then this was something else. Securing bolts properly is about the lowest-hanging fruit of high-reliability engineering. It is the kind of thing that manufacturers ought to be catching with their elaborate rules and oversight, long before they even begin their "march of nines".

We should always hesitate to draw large conclusions from small samples, but a failure this ordinary lends credence to increasingly pervasive accounts of Boeing as a company that has gradually lost its way. One expert review commissioned by the FAA into the company found evidence of a "disconnect" between senior management and their staff, while others have pointed to its culture and priorities being increasingly dominated by MBAs rather than the engineers of old.

It is especially significant when that failure is seen in conjunction with the 2018 and 2019 737 Max disasters, which, unlike the door-plug blowout, were rooted in avoidable shortcomings in the design (rather than the maintenance or manufacture) of the airplane. Those incidents led to 737 Max aircraft being grounded worldwide for more than a year and to demands for Boeing to improve its safety record. In July 2024, Boeing pled guilty to a criminal fraud conspiracy charge after the US Department of Justice found the company had violated a deal to reform its safety and quality monitoring and reporting.

It is unsurprising, therefore, that Boeing's operations have been under the magnifying glass, giving rise to multiple investigations and Senate hearings. No organisational practices ever match their idealised representations, so scrutiny of this kind can create misleading impressions of deviance. Even accounting for this, however, the testimony arising from these investigations has painted a damning picture. Witness after witness has spoken of a company that has increasingly prioritised profit over excellence: cutting corners, weakening its unionised workforce, outsourcing delicate work it used to do in-house and then squeezing its subcontractors with price cuts and more. (Boeing has responded to these accusations, with a spokesperson saying: "Feedback from our employees makes us better and we strongly encourage employees to report any concerns. Boeing employees can anonymously report through a variety of channels including our Speak Up portal or directly to the FAA. When we receive reports, we act swiftly and take necessary action to ensure our airplanes meet our specifications and regulatory requirements.")

This is probably the Alaska Airlines incident's real significance. Boeing will surely remedy any specific problem with missing or unsecured bolts; it would be truly incredible if that mistake were ever seen again.

Legislators and regulators are demanding that Boeing address the "gaps in Boeing's safety journey", but organisational cultures have far more inertia than we usually imagine. They come to be reflected in choices about personnel, procedures, and performance metrics. They are inscribed in manufacturing strategies – regarding outsourcing, for example – through which they become embedded in contracts, budgets, prices, and profit margins.

They can even shape a corporation's geography. In 2001 Boeing relocated its corporate headquarters away from Seattle, where it builds its aircraft, to Chicago, distancing its managers from its engineers. (And in 2022 it moved them again to Washington DC.) Reconfiguring Boeing's "culture" will mean grappling with all these considerations and many more. To the extent that it will be possible at all, it will be like turning a supertanker.

Even if Boeing’s culture can be changed, however, its legacy will remain. By shaping the company's practices and priorities, that culture will have shaped the designs and manufacture of its airplanes. Most critics tend to date the beginning of Boeing's alleged decline to 1997, when it merged with McDonnell Douglas. Since then, it has introduced three new jetliners – the 787 Dreamliner, the 737 Max, and the (soon to be released) 777X – and it has built the 787 and Max in significant numbers. These aircraft represent huge investments for the airlines that purchased them, and to recoup their costs those airlines will need to operate them intensively for many years. If it transpires that their long-term safety has been compromised by shortcomings in their design or construction, recouping those costs could prove dangerous or impracticable.

Since World War II, Boeing has been at the vanguard of airliner development, and remains one of the biggest aerospace companies in the world (Credit: Getty Images)

The star-crossed service history of the 737 Max speaks eloquently to this danger. The fact that extreme reliability hinges on attending to considerations that are marginal to the point of being negligible means it can take time for any shortcomings to manifest as failures. This is just to say that a manufacturing deficiency capable of imperiling a jetliner once-in-every-100,000-hours is unlikely to fell any airplanes immediately, especially since new jetliners enter service gradually. The fact that the Maxes have had accidents and near-misses so early in their production runs raises questions for the future. (By the end of June 2024, Boeing had delivered 1,555 of its 737 Max aircraft to customers.)
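A rough sketch shows why a once-in-every-100,000-hours deficiency tends to surface slowly while a fleet is still small. It assumes failures arrive at a constant average rate (a simple exponential model, which is an assumption made here for illustration). The delivery rate and monthly flying hours are hypothetical figures, not Boeing data.

```python
import math


def cumulative_fleet_hours(months: int, deliveries_per_month: int,
                           hours_per_aircraft_per_month: float) -> float:
    """Total hours flown by a fleet that grows by a fixed number of aircraft
    each month (hypothetical ramp-up; every aircraft flies the same hours)."""
    total = 0.0
    in_service = 0
    for _ in range(months):
        in_service += deliveries_per_month  # new aircraft join this month
        total += in_service * hours_per_aircraft_per_month
    return total


def prob_at_least_one_failure(fleet_hours: float, hours_per_failure: float) -> float:
    """Chance of one or more failures, assuming a constant failure rate."""
    return 1.0 - math.exp(-fleet_hours / hours_per_failure)


if __name__ == "__main__":
    # Hypothetical: 10 deliveries a month, 300 flying hours per aircraft per month.
    for months in (1, 6, 12):
        hours = cumulative_fleet_hours(months, 10, 300)
        p = prob_at_least_one_failure(hours, 100_000)
        print(f"after {months:2d} months: {hours:9,.0f} fleet hours, "
              f"P(at least one failure) = {p:.2f}")
```

Under these made-up numbers, the chance of seeing the deficiency is small in the first month and grows only as fleet hours accumulate. That is the sense in which early accidents and near-misses are a worrying signal.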

The record of the 787 Dreamliner is more opaque. Launched in 2004, the airplane has had multiple issues and technical problems that led to emergency landings or groundings. In 2013, for example, all 787s worldwide were grounded for several months due to concerns over the safety of their lithium-ion batteries, which had caused fires aboard several aircraft. To date, however, there have been no catastrophic accidents involving a 787, and this could be reasonably construed as impressive, especially given the number of aircraft now in operation (at the end of June 2024, 1,132 787s had been delivered by Boeing). Certainly, it would be a remarkable accomplishment in almost any other technological domain.

Civil aviation is a unique domain, however, and relative to the expectations by which it operates, the 787's record is far from dispositive. Many shortcomings of design or manufacture take time to manifest.

Time will eventually tell all, of course, but Boeing’s hard-earned and well-deserved reputation for excellence has undeniably been put under scrutiny by its recent travails. This matters. Civil aviation isn’t like most engineering domains because the extraordinary reliability we demand of it leaves so little room for deficiency or error. The equanimity with which we eat packaged nuts and watch movies at 40,000 feet is a much rarer and more delicate achievement than we realise.

* A shorter version of this story was previously published on MIT Press Reader.
