The Software Apocalypse
Late last year, The Atlantic published a great article, The Coming Software Apocalypse, that took a hard look at the crossroads of software ubiquity, safety, and subject expertise (does anyone really understand how anything works anymore?). Technology has evolved so quickly that the average American can easily forget both how amazingly complex software is and that technology once existed without it altogether.
In 2014, the entire State of Washington experienced a six-hour blackout of its 911 system. During that time, anyone who dialed 911 heard a busy signal, a frightening sound if, say, you were alone in your house during an (alleged) breaking and entering. The story cites exactly that case as an example of why a 911 outage is a bad thing (the homeowner called at least 37 times). It was later discovered that the outage was caused by a glitch in the code that kept a running count of incoming calls for recordkeeping. Each new call was assigned a unique number from that counter, and the developers had capped the counter at a seemingly arbitrary number in the millions, a number that was eventually reached. On the day the counter hit its upper limit, new calls could not be assigned a unique number and were rejected. Insert chaos.
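To make the failure mode concrete, here is a minimal Python sketch of a capped call counter. It is not the actual 911 routing code; the class name, the function name, and the tiny limit are all made up for illustration. The point is only that a hard-coded ceiling with no alarm attached fails silently once it is reached.

```python
class CallIdAllocator:
    """Hypothetical allocator that hands out one unique ID per incoming call."""

    def __init__(self, limit):
        self.limit = limit  # hard-coded ceiling (an assumption for this sketch)
        self.next_id = 0

    def assign(self):
        # Once the ceiling is reached, no ID is issued. Nothing here raises
        # an alarm, so the failure is silent.
        if self.next_id >= self.limit:
            return None
        call_id = self.next_id
        self.next_id += 1
        return call_id


def route_call(allocator):
    """Route a call only if it can be given a unique ID."""
    call_id = allocator.assign()
    if call_id is None:
        return "busy signal"
    return f"routed as call {call_id}"


allocator = CallIdAllocator(limit=3)  # tiny limit so the effect is visible
results = [route_call(allocator) for _ in range(5)]
print(results)
```

With a limit of 3, the first three calls are routed and every call after that gets a busy signal. The code never crashes and never complains; it does exactly what it was written to do.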
The programmers did not immediately understand the problem, in part because the counter had never been deemed critical. And because it was not deemed critical, it was never assigned an alarm. There was a time when emergency calls were handled locally, by people. The promise of innovation led the system to shift away from mechanical or human operation and rely more and more on code.
“When we had electromechanical systems, we used to be able to test them exhaustively. We used to be able to think through all the things it could do, all the states it could get into,” states Nancy Leveson, a professor of aeronautics and astronautics at MIT. Think about a small system (a gravity-fed sewer wet well, an elevator, a railroad crossing): you can jot down all of its modes of operation, both likely and unlikely, on a single sheet of paper. And for each of those items you can visually inspect, observe, and verify appropriate (and inappropriate) responses to operating scenarios and externalities.
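As a rough illustration of what "testing exhaustively" means, the Python sketch below lists every state of a toy railroad crossing and checks each one against a safety rule. The three variables and the rule itself are assumptions made for this example, not a real signaling specification.

```python
from itertools import product

# Assumed state variables for a toy railroad crossing.
TRAIN_PRESENT = (False, True)
GATE = ("up", "down")
LIGHTS = ("off", "on")


def is_safe(train_present, gate, lights):
    # Assumed safety rule: a train in the crossing requires the gate down
    # and the lights on. With no train, any combination is acceptable.
    if train_present:
        return gate == "down" and lights == "on"
    return True


# Every combination of the three variables: the whole "sheet of paper".
all_states = list(product(TRAIN_PRESENT, GATE, LIGHTS))
unsafe_states = [state for state in all_states if not is_safe(*state)]

print(len(all_states), "states in total")
for state in unsafe_states:
    print("unsafe:", state)
```

Three two-valued variables give only 8 states, so every one of them can be inspected by hand. Each additional two-valued variable doubles the count, and at 40 such variables there are already more than a trillion combinations, which is roughly where a sheet of paper stops being useful.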
Software is different. By editing the text in a file somewhere (it does not have to be local to the hardware), the same processor or controller can become a smart speaker, a self-driving car, or a logistics control system. As the article states, “the flexibility is software’s miracle and its curse. Because it can be changed cheaply, software is constantly changed; and because it is unmoored from anything physical - a program that is a thousand times more complex than another takes up the same actual space - it tends to grow without bound.” “The problem is that we are building systems that are beyond our ability to intellectually manage,” says Leveson.
Because software is different, it is hard to say that it “broke” the way an armature or a fitting breaks. The idea of “failure” takes on a different meaning when applied to software. Did the 911 system software fail, or did it do exactly what the code told it to do? It failed because it was told to do the wrong thing. Just as a bolt can be fastened wrong or a support arm can be designed wrong, wrong software will lead to a “failure”.
As software-based technology continues to advance, we as engineers need to keep all of this in the back of our minds. It is challenging to be just a single-discipline engineer these days. To really excel in your (our) field, you must be able to think beyond your specialty to fully grasp the true nature of your design decisions.