The Silver Bullet or Why Software Is Bad

Why Software Is So Bad
  The 'No Silver Bullet' Syndrome
  Vested Interest
Why the Experts Are So Wrong
  There Is a Silver Bullet After All
  Targeting the Wrong Complexity
The Silver Bullet
  Algorithmic vs. Signal-Driven Systems
  Plug Compatible Connectors
  Event Ordering Is Critical
  Imitate Nature's Parallelism
  Software ICs with a Twist
  Failure Localization
  Boosting Productivity
  Slaying the Werewolf
  Software vs. Hardware
  World Safety in the Balance


Abstract: There is something fundamentally wrong with the way we create software. This article describes a silver bullet solution to the problem of software reliability and productivity. The solution requires a fundamental change in the way we program our computers. I will argue that the main reason that software is so unreliable and so hard to produce has to do with a custom that is as old as the computer: the practice of using the algorithm as the basis of software construction. I will argue further that moving to a pure signal-based, synchronous software model will result in at least an order of magnitude improvement in both reliability and productivity.

 

Why Software Is So Bad

The 'No Silver Bullet' Syndrome

In a recent article on the software reliability crisis published by Technology Review, the author blames the problem on everything from bad planning and business decisions to bad programmers. The proposed solution: bring in the lawyers. Not once does the article mention that there might be something fundamentally wrong with the way we develop software. The reason for this omission has to do in part with a highly influential paper titled "No Silver Bullet: Essence and Accidents of Software Engineering" that was published in 1987 by the now-famous computer scientist Frederick P. Brooks. Dr. Brooks writes:

But, as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.

...

Not only are there no silver bullets now in view, the very nature of software makes it unlikely that there will be any--no inventions that will do for software productivity, reliability, and simplicity what electronics, transistors, and large-scale integration did for computer hardware.

Little does Dr. Brooks suspect that the last part of the above paragraph holds the key to the very silver bullet that he so vehemently denies. No other paper in the annals of software engineering has had a more deleterious effect on humanity's efforts to find a comprehensive solution to the software reliability crisis than Dr. Brooks' paper. Almost single-handedly, it succeeded in convincing the entire computer world that there is no hope in trying to find a solution. It is a rather unfortunate chapter in the history of programming. Many human beings have died and will certainly die and tens of billions of dollars have been and will be wasted as a result.

The end result is that most of the burden of ensuring reliability is placed squarely on the programmer's shoulders. An entire reliability industry has sprouted with countless experts and tool vendors touting various labor-intensive engineering recipes, theories and practices. But more than twenty years after people began to refer to the problem as a crisis, the unreliability and low productivity problem is worse than ever. As the Technology Review article points out, the cost has been staggering.

Vested Interest

Software experts (such as the folks at Cigital) have a vested interest in seeing that the crisis lasts as long as possible. It is their raison d'être. Computer scientists and many programmers love Brooks' ideas because an insoluble software crisis affords them a well-paid job and a lifetime career as reliability engineers. Not that these folks do not bring worthwhile improvements to the table. They do. But looking for a solution that will bring Dr. Brooks' order-of-magnitude improvement in reliability and productivity is not on their agenda. They deny that such a breakthrough is even possible. Brooks' paper is their New Testament and 'no silver bullet' their mantra. Worst of all, they are sincere in their convictions.

Be that as it may, calling in the lawyers and hiring more software experts schooled in an ancient paradigm will not solve the problem. It will only be costlier (lawyers, experts and trained engineers do not work for beans) and, in the end, deadlier. The reason is threefold. First, the complexity and ubiquity of software continue to grow unabated. Second, the threat of lawsuits means that the cost (in time and money) of software development will skyrocket. Third, the incremental stop-gap measures offered by the experts are not designed to get to the heart of the problem. They are designed to provide short-term relief while keeping the experts employed. In the meantime, the crisis continues.

Why ancient paradigm? Because the root cause of the crisis is as old as Lady Ada Lovelace and Charles Babbage of analytical engine fame, as I explain below.

 

Why the Experts Are So Wrong

There Is a Silver Bullet After All

The brains of humans and animals are the existence proof that there is a silver bullet. Robustness and reliability are measured in terms of defects vs. complexity. Because of its sheer astronomical complexity, the brain is the most reliable complex system in the world. In fact, the more complex the brain gets (as it learns), the more reliable it becomes. By contrast, the reliability of software gets worse as its complexity increases. Any handcrafted software with the complexity of the brain would be so riddled with bugs as to be unusable. Conversely, given their low relative complexity, any handcrafted software with the reliability of the brain would almost never fail.

The brain is proof that extreme complexity does not imply a lack of robustness. Its reliability is many, many orders of magnitude greater than that of today's software. Just going up and down a flight of stairs without running into people or driving a car around town (taxi drivers do it all day long, every day) without getting into an accident is staggeringly more complex than anything any software application in existence can accomplish. Imagine how complex it is to be able to recognize someone's face under all sorts of lighting conditions, velocities and orientations. We do it all the time. How complex are the tasks that a lioness must perform when hunting prey? Or a hummingbird flying around in search of nectar? Robotics researchers continually marvel at the amazing robustness of the complex behavior displayed by creatures as primitive as bees and cockroaches.

Sure we make mistakes, but the things that we do are so complex, especially the little things that we are oblivious to, that our mistakes pale in comparison to our successes. Biological nervous systems are proof that the reliability of a behaving system (which is what a computer program is) does not have to be inversely proportional to its complexity.

But the brain is not the only proof that we have of the existence of a silver bullet. We all know about the amazing reliability of integrated circuits. In all the years that I have owned and used computers, only once did a CPU fail on me and it was because its cooling fan stopped working. I replaced the fan and everything went back to working perfectly. No one can seriously deny that a modern CPU is a very complex device, what with some of the high-end chips from Intel, Motorola and AMD sporting hundreds of millions of transistors. Moore's law does not seem to have a significant effect on hardware reliability since, to my knowledge, the reliability of CPUs did not degrade over the years as they increased in speed and complexity.

Targeting the Wrong Complexity

These unarguable facts squarely and decisively refute Fred Brooks' 'No Silver Bullet' arguments insofar as they relate to behavioral complexity and reliability. This is not to say that Brooks' arguments are wrong, but that they do not apply to complexity in general. They apply only to a specific form of complexity, the complexity of algorithmic software. Dr. Brooks had a particular type of software in mind when he wrote his famous paper even though he may have been under the impression that all software is created equal. In the remainder of this article, I will argue that all the effort in time and money being spent on making software more reliable is being targeted at the wrong complexity. And it is a particularly insidious and intractable form of complexity, one that humanity, fortunately, does not have to live with. Switch to the right complexity and the problem goes away.

The million (billion?) dollar question is, what is it about biological nervous systems and integrated circuits that makes them so reliable in spite of their complexity? But even more important, can we emulate it in our software?

 

The Silver Bullet

Algorithmic vs. Signal-Driven Systems

The primary reason for bad software is that it is based on the algorithm, a practice that is as old as Lovelace and Babbage (*). The brain, by contrast, is a parallel signal-based system. The reliability of the brain is due primarily to two reasons:

a) The strict enforcement of precise signal timing through synchronization. Neurons fire at the right time, under the right temporal conditions. Timing is consistent because of the brain's parallel architecture. A similar case can be made with regard to integrated circuits.

b) The highly distributed nature of the brain's elementary components. This means that the localized malfunctions of a few (or even many) components will not cause the catastrophic failure of the entire nervous system.

By contrast, an algorithmic function or subroutine is not unlike a chain. Break a link and the entire chain is broken. With algorithmic software it is virtually impossible to guarantee the timing of various processes because the execution times of software subroutines or functions vary unpredictably. They vary mainly because of a construct called conditional branching, a necessary decision mechanism used in algorithmic sequences. But that is not all. While a subroutine is being processed, the calling function essentially goes into a coma. The use of threads and message passing between threads does somewhat alleviate the problem but the multithreading solution is way too coarse and unwieldy to make a difference in highly complex applications. The inherent uncertainty of algorithmic systems leads to program decisions happening at the wrong time, under the wrong conditions.

Every time a programmer runs an algorithm, he or she is sending signals even though he or she may not realize it. During execution, every statement or operation in procedural code is essentially sending a signal to the next statement, saying: 'I'm finished, now it's your turn.' My thesis is that this sort of rigid signaling is dangerous and is prone to errors. Why? Because, within any given procedure, communication (signaling) is limited to two objects (code statements) at a time. More often than not, this is fine, but there are occasions when a particular event or action must be communicated to several objects simultaneously. Algorithmic development environments make it hard to attach orthogonal signaling branches to a sequential thread and therein lies the problem. The burden is on the programmer to remember to add code to handle delayed reaction cases: something that occurred earlier in the procedure needs to be addressed at the earliest opportunity. Every so often we either forget to do so or we fail to spot the dependency.

Indeed, many parts of a program may depend critically on changes to a variable or property. It frequently happens that the variable is modified by a statement in a procedural code, unbeknownst to the rest of the program. By the time the other parts learn of the modification, the damage is usually done. Blind code leads to wrong assumptions which often result in catastrophic failures. This kind of problem does not exist in a parallel, change-driven system because every change to a variable is immediately communicated to all objects that are affected by the change. All objects must, in a sense, have eyes in the back of their heads. The liberal use of change detectors or sensors will solve this problem by taking care of behavioral side effects as they happen. (See Project COSA).
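To make the idea of a change detector concrete, here is a minimal sketch in Python. It is not the COSA mechanism. The names Watched, attach and tank_level are invented for illustration, and the sensors are ordinary callbacks that a conventional interpreter still runs one after another. The point is only that the code which changes the value cannot forget to tell the objects that depend on it.

```python
class Watched:
    """A value that signals every attached sensor whenever it changes.

    Hypothetical illustration only; not taken from Project COSA.
    """

    def __init__(self, value):
        self._value = value
        self._sensors = []

    def attach(self, sensor):
        """Register a callable that takes (old_value, new_value)."""
        self._sensors.append(sensor)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        old = self._value
        if new == old:
            return  # nothing changed, so no signal is sent
        self._value = new
        for sensor in self._sensors:
            sensor(old, new)  # every dependent object learns of the change


log = []
tank_level = Watched(50)
tank_level.attach(lambda old, new: log.append(f"display: {old} -> {new}"))
tank_level.attach(lambda old, new: log.append("alarm") if new > 90 else None)

tank_level.value = 95  # both sensors react
tank_level.value = 95  # same value again: no signal
print(log)  # ['display: 50 -> 95', 'alarm']
```

In this sketch the assignment and the notification are one operation, so no part of the program is left "blind" to the modification.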

Many people have suggested that we should componentize software in the hope that we can do for software what integrated circuits have done for hardware. This is all well and good and certainly a giant step in the right direction. But, even though the use of software components (e.g., Microsoft's ActiveX® controls, Java beans, etc.) in the last decade has automated some of the pain out of programming, the reliability problem is still with us. The reason should be obvious: software components are constructed with things that are utterly alien to a hardware chip designer: algorithms. A thoroughly tested algorithmic software component may work fine in one application and fail in another. The most likely reason is that its temporal behavior is not consistent. It varies from one environment to another.

Plug Compatible Connectors

Another known reason for bad software has to do with connection types. In the brain, signal pathways are not connected willy-nilly. Connections are made according to their types. Refer, for example, to the retinotopic mapping of the visual cortex. We should do the same with software. All message connectors should have message types, and all connectors should be either male or female to ensure robust connectivity and automated compatibility. The use of pre-built, snap-together, plug-compatible components should automate over 90% of the software development process and would turn everyday users into software developers.
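A rough sketch of what plug compatibility could look like in code follows. Everything in it is an assumption made for illustration: the Connector class, the connect function, the use of a Python class as the "message type", and the convention that a male connector plugs into a female one. The sketch only shows that an incompatible connection can be refused at the moment it is attempted rather than discovered later at run time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Connector:
    """Hypothetical typed connector; gender is 'male' or 'female'."""
    gender: str
    message_type: type
    peer: Optional["Connector"] = None


def connect(a, b):
    """Join two connectors only if they are plug compatible."""
    if {a.gender, b.gender} != {"male", "female"}:
        raise TypeError("need one male and one female connector")
    if a.message_type is not b.message_type:
        raise TypeError("message types differ")
    a.peer, b.peer = b, a


temperature_out = Connector("male", float)
temperature_in = Connector("female", float)
text_in = Connector("female", str)

connect(temperature_out, temperature_in)  # accepted: genders and types match

rejected = None
try:
    connect(temperature_out, text_in)     # refused: float does not fit str
except TypeError as err:
    rejected = str(err)
```

A graphical composition tool could run the same check when the user drags a wire between two components, which is one way the "automated compatibility" described above might be enforced.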

Some may say that typed message connectors are not new and they are correct. Objects that communicate via connectors have indeed been tried before. However, as I mentioned earlier, in a pure signal-based system, objects will not contain algorithms. Algorithm-like sequences can be composed by stringing primitive objects together. In fact, the only pure algorithmic code that should exist in the entire system is a small OS microkernel. No new executable code should be allowed. The microkernel should run everything. Furthermore, the parallelism and the signaling mechanism should be implemented and enforced at the operating system level in such a way as to be completely transparent to the software designer. (Again, see Project COSA).

Event Ordering Is Critical

Timing is of the essence but, as I explained above, the use of algorithms plays havoc with event ordering. To ensure temporal order consistency, the prescribed ordering of every operation or action in a software application must be maintained throughout the life of the application. Nothing should be allowed to happen before or after its time. In a signal-based, synchronous software development environment, the enforcement of order is not something that developers or designers need to be concerned with because it is a natural consequence of the system's parallelism.

Note that the term 'timing', as used in these pages, does not mean that operations should be synchronized to a real time clock. It means that the prescribed logical or relative order of operations must be enforced throughout the life of the system.
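The difference between logical order and clock time can be illustrated with a small sketch. It assumes a very simple model that is not described in this article: signals travel along named links, and every link takes exactly one logical cycle to cross. The function run and the component names are hypothetical.

```python
def run(links, start, max_cycles=10):
    """Propagate a signal through `links` one logical cycle at a time.

    `links` maps a component name to the components it signals.
    Returns a list of (cycle, component) pairs in firing order.
    """
    trace = []
    current = [start]           # components that fire in this cycle
    cycle = 0
    while current and cycle < max_cycles:
        upcoming = []           # components that will fire in the next cycle
        for node in current:
            trace.append((cycle, node))
            upcoming.extend(links.get(node, []))
        current = upcoming
        cycle += 1
    return trace


# Hypothetical wiring: a button feeds a debouncer, which feeds two objects.
links = {
    "button": ["debounce"],
    "debounce": ["counter", "lamp"],
    "counter": ["display"],
}
trace = run(links, "button")
print(trace)
# [(0, 'button'), (1, 'debounce'), (2, 'counter'), (2, 'lamp'), (3, 'display')]
```

The cycle numbers depend only on the wiring, not on how long any component takes to execute, so "counter" and "lamp" always react in the same cycle and "display" always reacts after both.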

Using the parallel synchronous approach, the more complex the system gets during development, the more reliable it will become. This is because adding new signal pathways and connections also adds new temporal constraints to the system, making it more robust.

Imitate Nature's Parallelism

To solve the crisis we must imitate nature. Objects in nature behave synchronously. Why should software objects be any different? We must stop using the algorithm as the basis of computer programming. We must abandon our linguistic and textual past and embrace a visual future. We must forever stop thinking of the computer as a machine for the execution of instruction sequences. The computer should be viewed as a communication system, i.e., a collection of synchronously interacting objects. In other words, software should be more like hardware with various interacting objects and modules working in parallel, sending and receiving messages from one another through their input and output connections.

Even though software is inherently sequential due to the von Neumann architecture of our computers, thanks to the high speed of modern processors, we can easily emulate the parallelism of integrated circuits or the nervous system in software. This is not new. We can already emulate nature's parallelism in our artificial neural networks, cellular automata and other types of simulation applications. For example, chip manufacturers emulate their chip designs in software.
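As a small example of emulating parallelism on a sequential machine, here is the two-buffer update commonly used in cellular automata. The rule chosen (each cell becomes the exclusive-or of its two neighbors, on a ring) and the function name step are arbitrary choices for illustration. Whether a signal-based system would use exactly this scheme is an assumption.

```python
def step(cells):
    """Advance every cell by one cycle as if all cells updated at once.

    New values are written to a second list, so no cell ever sees a
    neighbor's value from the cycle that is still being computed.
    """
    n = len(cells)
    nxt = [0] * n
    for i in range(n):
        left = cells[(i - 1) % n]   # neighbors wrap around the ring
        right = cells[(i + 1) % n]
        nxt[i] = left ^ right
    return nxt


cells = [0, 0, 0, 1, 0, 0, 0]
history = [cells]
for _ in range(3):
    cells = step(cells)
    history.append(cells)

print(history[1])  # [0, 0, 1, 0, 1, 0, 0]
```

The loop visits the cells one at a time, yet the result is the same as if they had all fired simultaneously, because the order in which the loop visits them cannot affect the outcome.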

Software ICs with a Twist

In a 1995 article titled "What if there's a Silver Bullet..." Dr. Brad Cox wrote the following:

Building applications (rack-level modules) solely with tightly-coupled technologies like subroutine libraries (block-level modules) is logically equivalent to wafer-scale integration, something that hardware engineering can barely accomplish to this day. So seven years ago, Stepstone began to play a role analogous to the silicon chip vendors, providing chip-level software components, or Software-ICs[TM], to the system-building community.

While I agree with the use of modules for software composition, I take issue with Dr. Cox's analogy, primarily because subroutine libraries have no analog in integrated circuit design. The biggest difference between hardware and conventional software is that the former operates in a parallel, signal-based universe where timing is systematic and consistent, whereas the latter uses sequential algorithms which result in haphazard timing. I believe that this is the main reason that hardware is orders of magnitude more reliable than software.

Another reason that electronic logic circuits are so much more reliable than software is that timing problems are immediately nipped in the bud due to the inherent parallelism of hardware. The circuits simply will not work properly if the timing is wrong. But once the timing gets to the point where it is working correctly, it will continue to do so for the life of the circuit, barring some physical failure. So automatic timing enforcement and system-wide parallelism and synchronicity are some of the keys to reliability.

Failure Localization

An algorithmic program is more like a chain, and like a chain, it is as strong as its weakest link. Break any link and the entire chain is broken. This brittleness can be somewhat alleviated by the use of multiple communicating threads. A malfunctioning thread usually does not affect the proper functioning of other threads. Failure localization is a very effective way to increase a system's fault tolerance. But the sad reality is that, even though threaded operating systems are the norm in the software industry, our systems are still susceptible to catastrophic failures. Why? The answer is that threads do not entirely eliminate algorithmic coding. They encapsulate algorithms into concurrent programs running on the same computer. Another even more serious problem with threads is that they are asynchronous. Synchronous communication between interacting objects is a must for reliability.

Threads can carry a heavy price because of the performance overhead associated with context switching. Increasing the number of threads in a system so as to encapsulate elementary operations quickly becomes unworkable. The performance hit would be tremendous. Fortunately, there is a simple parallelization technique that does away with threads altogether. It is commonly used in such applications as cellular automata, neural networks, simulations and other programs. See Project COSA for more details.

Boosting Productivity

How can the adoption of a pure signal-based, synchronous paradigm increase productivity? The answer has to do with signals and how they relate to software components. 

A concurrent and synchronous software model lends itself well to a pure graphical development environment for composing software. No cryptic language and no complex algorithms are needed. It is much easier to follow signal activation pathways on a diagram than it is to decipher someone's obscure algorithmic code spread over multiple text files. The application designer can get a much better feel for the flow of things as every signal propagates from one object to another using unidirectional pathways.

The above gains in productivity will be due mainly to an increase in program comprehensibility. But what will really boost productivity by at least an order of magnitude will be the far smaller number of bugs to fix. It is common knowledge that most of the average programmer's development time is spent debugging. A signal-based reactive system is not only easier to debug, but the system itself will find most of the hidden bugs automatically (see Project COSA). But that is not all. The use of reusable, plug-compatible components eliminates a huge number of opportunities for mistakes.

In summary, a signal-based reactive environment facilitates safe, automated software development and thus opens up software development to the lay public.

Slaying the Werewolf

Frederick Brooks is right about one thing. There is indeed no silver bullet that can solve the reliability problem of algorithmic systems. But, as I mentioned previously, what Brooks and others fail to consider is that his arguments apply only to algorithmic software. The bullet should be used to slay the beast once and for all, not to alleviate the symptoms of its incurable illness.

Software vs. Hardware

Software should not be radically different from hardware. Rather, it should serve as an extension to it. Software should emulate the functionality of hardware by adding only what is lacking in it: flexibility and ease of modification. In the future, when we develop technologies for non-von Neumann machines that can sprout new physical signal pathways and new self-executing objects on the fly, the distinction between software and hardware will no longer exist.

World Safety in the Balance

In conclusion, we can solve the software reliability and productivity crisis. To do so, we must acknowledge that there is something rotten at the core of software engineering. We must understand that using the algorithm as the basis of computer programming is the main cause of the problem. The algorithmic approach is the last of the stumbling blocks that are preventing us from achieving an effective and safe componentization of software comparable to what has been done in hardware. It is the reason that current quality control measures will always fail in the end. Adopting a pure, signal-based, software construction and execution model will ensure that our systems get more robust and reliable as they grow larger and more complex. This is critical to world safety. Given the ubiquity of software in every aspect of modern life, there is always the danger that major disasters and even nasty wars will be triggered as a result of faulty software systems. Software has become too much a part of our everyday life to be entrusted to the vagaries of an outdated and flawed paradigm. We need a new approach worthy of the twenty-first century. The world cannot afford to continue doing business as usual.


 

 


------

*Lady Ada Lovelace and Charles Babbage invented the sequential stored program (or table of instructions) for the analytical engine around 1842. But the idea of using a step-by-step procedure in a machine is at least as old as Jacquard's punched cards, which were used to operate the first automated loom in 1801. The Persian mathematician al Khowarizmi is credited with having invented the algorithm in 825 AD. The word algorithm derives from al Khowarizmi.

 

Last Update: 7/20/2002

Send all comments to:  louis.savain@sbcglobal.net

©2002 Louis Savain