Recent outages prove need for transparency around internet infrastructure

Society has taken leap of faith in cloud computing but analysis and regulation are limited

Other, more mature, industries have developed practices to cultivate and disseminate best practice, to analyse, publish and so learn from incidents and accidents. Photograph: Leon Neal/Getty Images

In his 2001 book, Fooled by Randomness, the former risk analyst and financial options trader Nassim Nicholas Taleb observed that an entire collection of thought can collapse if any one of its fundamental assumptions is disproved. As one example, he noted that for at least 13 centuries, Europeans had believed all swans to be white, since all known European historical archives recorded that swans always have white feathers. However, in 1697, the Dutch sea captain Willem de Vlamingh, on a rescue mission searching for survivors from a compatriot ship lost two years earlier, explored a river on the coast of “New Holland” and was astonished to observe black swans. He named the estuary Swan River, and today the Australian city of Perth stands on its shores.

Taleb’s subsequent 2007 book, The Black Swan, documents numerous examples of what he termed “black swan events”, each of which was outside the realm of historical expectations and had considerable impact when it occurred, yet in hindsight might have been entirely predictable (and even avoidable).

On Tuesday last week, substantial disruption and damage were caused worldwide by a black swan event in the infrastructure of the internet. An unnamed customer of the American services provider Fastly, in San Francisco, made a routine (and entirely legitimate) change to the settings for its Fastly service. The change triggered a bug within the Fastly software, which then for several hours crippled a considerable number of news websites, public information websites (including those of both the White House and the UK government), and various ecommerce websites.

Then, last Friday, one of Fastly’s chief competitors, Cloudflare, also based in San Francisco, failed for many of its customers in Los Angeles and Chicago, including users of the popular messaging service Discord. Although service was restored reasonably quickly, the incident was a further reminder of the fragile nature of internet infrastructure after the Fastly event earlier the same week.

The internet has become a pervasive tool across much of the planet, at the heart of business systems and ecommerce, global news and social media commentary, health and public welfare. No lives are known to have been directly lost due to last week’s outages, but some ecommerce sites complained of damage to their revenues beyond just minor inconvenience. Fortunately, the failures were moderately short-lived, but for smaller enterprises and start-ups that have come to trust internet infrastructure for both their sales and supplies, any lengthy loss of service would potentially be catastrophic given their relatively limited balance sheets.

There has been a leap of faith in the last two decades, actively promoted by major internet service providers, to put “everything” into the “cloud” and so dispense with local computing resources and software applications run on-site within an organisation’s premises. Vendors advocate that their software be used on a subscription basis in the cloud rather than purchased and owned, and that computing should be an operational expense rather than capital investment. The implication is that system-wide failure by the major software vendors is highly unlikely. However, last week has shown that while the facade may be elegant, some service architectures may be but a house of cards. Third-party independent verification of the reliability claims and assertions from cloud vendors is currently limited. Has the time come for the internet industry to finally wise up?

Other, more mature, industries have developed practices to cultivate and disseminate best practice, to analyse, publish and so learn from incidents and accidents. The aviation industry established its first accidents investigation committee just nine years after the first powered flight, in an initiative taken in 1912 by the Royal Aero Club. The maritime industry has a long history of accident investigation, in part driven by insurers keen to understand how and why incidents had occurred. Ethics in the medical sector emphasise that, in appraising a situation, the first duty is to ensure that no (further) harm is done.

The investigation of incidents need not necessarily result in assignment of blame and liability. Rather, the emphasis is more frequently on identifying the flaws in processes which led up to the event, and auditing the preventative procedures which were supposed to preclude the incident from happening. The goal is not only to ensure that the accident is not repeated, but also to ensure that related accidents cannot occur anywhere in the future. Incident reports are published to partners and competitors alike and so made widely available across an industry. Whistle-blowing legislation is in place in many countries, and professional and regulatory bodies can add their considerable weight and power when appropriate. Despite change from continuous innovation and advances in technology, a learning culture emerges throughout the industry, as peer pressure and professionalism disdain any who are seen to repeat the well-publicised mistakes of the past.

It is time for the specialists in the internet industry to become transparent, open and professional. The rest of us have placed tremendous faith, confidence and investment in their artefacts and, while mistakes may occur, recurrence is frankly no longer an acceptable option.
