A strategic inflection point is a time in the life of business when its fundamentals are about to change. That change can mean an opportunity to rise to new heights. But it may just as likely signal the beginning of the end.
Andrew S. Grove, Only the Paranoid Survive (1996)
In the mid-1980s Intel sniffed the winds of change and made a dramatic decision to abandon its memory business, dynamic random access memory (DRAM), and concentrate on microprocessors. Former Intel CEO Andy Grove describes this abrupt change in Intel’s strategy as an inflection point. As we know from mathematics, an inflection point is where a curve changes its curvature, from bending upward to bending downward or the reverse. In the case of Intel, the memory business was no longer profitable, and yet the future profitability of microprocessor chips was far from certain. It was a choice between dying along with the memory market or possibly dying with a product whose market was still unproven. We now know Grove was right, but he could have been wrong. Such is the life of an entrepreneur.
Intel is now about 50 years old, and its first inflection point was recognized and then mostly forgotten. But the company, and the industry that grew up with it, now faces another inflection point: the demise of Dennard scaling, the 1974 observation that the power density of MOS transistors stays roughly constant as they are scaled down in size, so that a chip of fixed area consumes about the same power from one generation to the next.
The signs are already appearing. Heat and energy dissipation are becoming serious problems, and the cost of building the next fabrication facility is rocketing to $14 billion or more.
Intel missed the transition from 10nm to 7nm process nodes, the current state of the art. Industry-leading Intel has no significant market share in mobile computing, in massively parallel GPUs (graphics processing units), or in TPUs (tensor processing units), which are specialized lower-precision processors for computing the node functions of an artificial neural network. Add to this the company’s continued failure to meet deadlines for matching its competitors at the 7nm and 5nm nodes, and Intel’s domination of enterprise computing with its x86 architecture is under siege from faster and cooler competing products.
While the enterprise segment is thriving because of cloud computing, the future is leaning heavily toward the internet of things (IoT) rather than continued growth of cloud servers. What can be done? This burning question is forcing the semiconductor industry, as well as researchers at numerous university labs, to reevaluate everything.
I call this the 50-year inflection point. Nobody wants it, but everybody has to confront it.
By whatever measure of performance the industry uses, existing CMOS technology is reaching its heat-energy limits. If we want to track Moore’s law for another 50 years, computer hardware must make a technology jump from CMOS to something else. What might that be?
Technology Jumping Options
The 50-year inflection point is just as uncertain and ill-defined as the first inflection point described by Grove. There are options, but which one can an entire industry turn to, as it once turned to CMOS? There is no definitive answer, which is both good and bad news. Rather, it appears there are a handful of bets to be made by the next generation of entrepreneurs following in Grove’s footsteps.
The easiest path is the one we are already on: changing the architectural paradigm from von Neumann machines to massively parallel architectures, with many processors working in parallel to support deep learning algorithms. Instead of multiple instruction pipelines, this connectionist machine contains multiple cores, which may be stacked in three dimensions. Moreover, the cores can be simpler than von Neumann arithmetic units, because machine learning relies heavily on threshold computation. Connectionist algorithms typically require simple operations, but lots of them. For example, a TPU mainly needs to compute weighted sums and a simple activation function, such as the sigmoid, at relatively low precision to simulate neural network behavior. A neuron “fires” when the weighted sum of its inputs exceeds a threshold defined by the classical sigmoid or a similarly simple function. A massive number of such simple cores fit where a much larger CPU used to fit, typically on the order of three to five thousand.
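To make the threshold idea concrete, here is a minimal Python sketch of a single artificial neuron, assuming inputs and weights rounded to 8-bit fixed point with a step of 1/64. That number format is a hypothetical choice for illustration, not the format of any particular TPU. The weighted sum is computed at low precision and passed through the sigmoid; the neuron “fires” when the sigmoid output exceeds 0.5, which happens exactly when the weighted sum is positive.

```python
import math

def sigmoid(x: float) -> float:
    """Classical logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def quantize(x: float, bits: int = 8, scale: float = 1.0 / 64) -> float:
    """Round x to a signed fixed-point value with `bits` bits and step `scale`.
    This low-precision format is a hypothetical example."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(lo, min(hi, round(x / scale)))  # clamp to the representable range
    return q * scale

def neuron(inputs, weights, bias=0.0, threshold=0.5, bits=8):
    """One neuron: low-precision weighted sum followed by a sigmoid.
    Returns (activation, fired)."""
    total = bias
    for x, w in zip(inputs, weights):
        total += quantize(x, bits) * quantize(w, bits)  # multiply-accumulate
    activation = sigmoid(total)
    return activation, activation > threshold
```

For example, neuron([1.0, 0.5], [0.8, -0.4]) rounds the weights to 0.796875 and -0.40625, giving a weighted sum of 0.59375, so the neuron fires. An accelerator repeats this multiply-accumulate pattern an enormous number of times in parallel, which is why simple, low-precision cores pay off.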
To get the most out of the large number of cores, applications must be reformulated as massively parallel processes. Fortunately, this is not difficult for many applications, such as HTTP serving, AI, image and voice recognition, and certain optimization problems. Unfortunately, heat-energy dissipation remains a problem that grows worse as engineers push for more performance. Supercomputers based on this paradigm consume enormous amounts of electricity, and it has been noted that bitcoin mining on massively parallel hardware consumes as much energy as a small nation.
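As a minimal illustration of reformulating a task as independent parallel pieces, the Python sketch below splits a toy grayscale image (a list of pixel rows) into horizontal bands and brightens each band in a separate worker. The function names are hypothetical. Because CPython threads share one interpreter lock, this shows the decomposition rather than a real speedup; swapping in concurrent.futures.ProcessPoolExecutor, or running the same pattern on GPU cores, is what would actually spread the work across processors.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten_tile(tile, amount=10):
    """Process one independent band: add `amount` to every pixel, clamped to 255."""
    return [[min(255, p + amount) for p in row] for row in tile]

def split_rows(image, n_tiles):
    """Split an image (list of pixel rows) into at most n_tiles horizontal bands."""
    step = max(1, -(-len(image) // n_tiles))  # ceiling division
    return [image[i:i + step] for i in range(0, len(image), step)]

def brighten_parallel(image, n_workers=4):
    """Brighten each band in its own worker, then stitch the bands back together."""
    tiles = split_rows(image, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(brighten_tile, tiles))  # map preserves band order
    return [row for tile in results for row in tile]
```

The bands share no data, so no worker waits on another; that independence is what makes HTTP serving or image recognition easy to spread over thousands of cores.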
The next path is adiabatic reversible computing, whose objective is to conserve rather than consume energy at the lowest level of computation. The concept is easy to understand, but the technology is not. Conventional logic operations such as AND, XOR, and NOR take two input bits and produce one output bit, so at each step one bit of information is erased. By Landauer’s principle, erasing a bit dissipates at least kT ln 2 of heat, where k is Boltzmann’s constant and T is the absolute temperature. Prototypes of reversible circuits have been shown to conserve heat-energy by keeping the would-be lost bits as so-called garbage bits instead of dissipating them, but no commercial products have been developed for sale. On the positive side, reversible computing offers an avenue around the heat-energy problem currently facing CMOS machines.
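The difference between an irreversible gate and a reversible one can be checked by brute force. In this Python sketch, AND maps four input patterns onto only two outputs, so it cannot be undone, while the Toffoli (controlled-controlled-NOT) gate, a standard reversible gate, maps its eight inputs to eight distinct outputs. With its third input fixed at 0, the Toffoli gate computes a AND b on its third output and passes a and b through unchanged; those extra outputs play the role of garbage bits. The sketch also evaluates Landauer’s bound, kT ln 2, the minimum heat released when one bit is erased.

```python
import math
from itertools import product

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def and_gate(a, b):
    """Two bits in, one bit out: some information is necessarily lost."""
    return a & b

def toffoli(a, b, c):
    """Reversible CCNOT gate: flips c only when a and b are both 1."""
    return a, b, c ^ (a & b)

def is_reversible(gate, n_inputs):
    """A gate is reversible if distinct inputs always give distinct outputs."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    return len(set(outputs)) == len(outputs)

def landauer_limit(temperature_k=300.0):
    """Minimum heat in joules dissipated when one bit is erased: k * T * ln 2."""
    return BOLTZMANN * temperature_k * math.log(2)
```

Here is_reversible(and_gate, 2) is False and is_reversible(toffoli, 3) is True. At 300 K, landauer_limit() is about 2.87 × 10⁻²¹ joules per erased bit.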
Adiabatic reversible architectures raise interesting research questions. For example, is it necessary to invent entirely new operating systems and languages that conserve rather than consume energy? Entropy increases (and so heat is dissipated) when the old value of C is overwritten by the assignment C = A + B, because that old value is lost. It is an unresolved question whether bit loss at the programming language level compiles into energy loss at the circuit level when the circuits are adiabatic reversible. An adiabatic reversible compiler must produce code with additional garbage variables, so that energy is conserved and reused later in the program rather than dissipated. Assuming this problem can be solved by advanced compiler technology, the next question is how fast an adiabatic reversible computer can operate. Speed sells, and the market for lower-performing, energy-conserving hardware may not support the investment needed to transition an entire industry to adiabatic reversible computing.
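The assignment example can be sketched in Python, with a dictionary standing in for machine state. Overwriting C is irreversible: two different starting values of C end in the same state. The sketch shows two reversible alternatives, both hypothetical illustrations rather than the output of any real compiler: an in-place update C += A + B, which can be undone by C -= A + B because A and B are preserved, and an assignment that pushes the old value of C onto a garbage stack instead of erasing it. Reversible languages such as Janus are built around the first style of in-place update.

```python
def assign(state):
    """Irreversible: C = A + B overwrites (erases) the old value of C."""
    return {**state, "C": state["A"] + state["B"]}

def add_into(state):
    """Reversible: C += A + B. A and B are untouched, so the step can be undone."""
    return {**state, "C": state["C"] + state["A"] + state["B"]}

def add_into_inverse(state):
    """Inverse of add_into: C -= A + B restores the original C."""
    return {**state, "C": state["C"] - state["A"] - state["B"]}

def assign_with_garbage(state, garbage):
    """C = A + B, but the old C is saved on a garbage stack instead of erased."""
    garbage.append(state["C"])
    return {**state, "C": state["A"] + state["B"]}

def undo_assign(state, garbage):
    """Run assign_with_garbage backward by popping the saved value of C."""
    return {**state, "C": garbage.pop()}
```

The garbage stack grows with every overwrite, which is the memory cost a reversible compiler would have to manage.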
The next option is to take the much bigger leap into quantum computing. Currently there are two approaches, which I designate analog and digital. An analog quantum computer, like the D-Wave machine, stores and processes information in the form of quantum states. Specifically, information is recorded in qubits, which may be realized as the spins of electrons or, in the D-Wave machine, as currents circulating in superconducting loops. Unlike bits, qubits can occupy multiple states at the same time, a property called superposition. Think of these states as analogous to the frequencies in white light: a ray of white light contains all frequencies of visible radiation at once. Similarly, a collection of qubits is said to be coherent if it remains in an entangled superposition state. The qubits are said to collapse into one state, ideally their ground state, when coherence is lost. It is as if a filter were inserted into the white light so that we see only red.
The trick is to maintain coherence long enough to do useful work, and then allow the superposition states to collapse into a ground state that encodes the solution to a mathematical problem. Suppose a room full of people is asked to form a line in alphabetical order by last name, without instructions on how to go about it. At first, the motion of people moving around appears chaotic; they are in multiple states. Eventually, the line emerges from the chaos. This is the ground state, reached by each person “collapsing” into their proper place in line. The ordered line is a minimum state following collapse. Likewise, the qubits of a quantum computer settle into a minimum state following collapse, if coherence and superposition are carefully controlled.
(For the reader who understands simulated annealing algorithms, the settling down to a collapsed state is very much like the gradual “cooling” of simulated annealing.)
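The room-full-of-people analogy maps directly onto classical simulated annealing, sketched below in Python. The energy of a line is the number of pairs of people standing in the wrong order, so the alphabetical line is the ground state with energy zero. Each step proposes swapping two neighbors and accepts the swap by the Metropolis rule: always if the energy falls, and with probability exp(−ΔE/T) if it rises, while the temperature T cools geometrically. This is a classical algorithm, not a simulation of a quantum annealer, and the schedule parameters are arbitrary choices for illustration.

```python
import math
import random

def inversions(line):
    """Energy: number of pairs standing in the wrong order (0 means sorted)."""
    return sum(1 for i in range(len(line)) for j in range(i + 1, len(line))
               if line[i] > line[j])

def anneal_line(names, steps=20000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing toward alphabetical order. Needs at least two names.
    Returns (best_line, best_energy)."""
    rng = random.Random(seed)
    line = list(names)
    rng.shuffle(line)                      # start from a chaotic arrangement
    energy = inversions(line)
    best, best_energy = list(line), energy
    for step in range(steps):
        # Geometric cooling schedule from t_start down toward t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(len(line) - 1)   # propose swapping neighbors i, i+1
        line[i], line[i + 1] = line[i + 1], line[i]
        new_energy = inversions(line)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy            # accept the move (Metropolis rule)
            if energy < best_energy:
                best, best_energy = list(line), energy
        else:
            line[i], line[i + 1] = line[i + 1], line[i]  # reject: swap back
    return best, best_energy
```

Calling anneal_line(["Turing", "Grove", "Shor", "Moore", "Noyce", "Dennard"]) should return the alphabetical line with energy 0. Early on, at high temperature, the line wanders through disordered states much like the milling crowd; as it cools, uphill moves become rare and the line settles.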
Qubit machines are analog in the sense that computation is achieved by formulating a problem in terms of the Schrödinger equation or an energy minimization equation. For example, analog quantum computers have been used to solve small instances of the NP-hard traveling salesman problem. The solution corresponds to the minimum energy state of the entangled, coherent qubits as they settle into an energy well formed by their ground states.
More significantly, quantum computers have factored small numbers P into primes. Shor’s algorithm does so on gate-based machines by finding the period of a modular function, but factoring can also be posed as an energy minimization suited to an analog machine: the quantity (P − N×M)² is never negative and reaches its minimum, zero, exactly when N and M are factors of P. Initially, the quantum computer is simultaneously in all possible states (N, M), where the smaller factor N need not exceed the square root of P. As the superposition collapses under conditions dictated by (P − N×M)², the collapsed ground state “solves” the minimization: the energy is zero, and N and M are factors of P. Of course this is a vastly simplified example, because P may have more than two prime factors. For example, 105 − 3×5×7 = 0, so three factors must be found.
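The energy landscape for factoring can be explored classically by exhaustive search, which is what the Python sketch below does in place of an annealer. It lists every candidate state (N, M) with 2 ≤ N ≤ M, restricting N to at most √P, and returns the states whose energy (P − N×M)² is zero. This is not Shor’s algorithm, and brute-force search is only feasible for small P; the point is to show what the ground states look like, including the several ground states that appear when P has more than two prime factors.

```python
import math

def factor_energy(p, n, m):
    """Cost to minimize: never negative, and zero exactly when n * m == p."""
    return (p - n * m) ** 2

def ground_states(p):
    """Every candidate (n, m) with 2 <= n <= m whose energy is zero.
    The smaller factor n never exceeds sqrt(p), and m never exceeds p // 2.
    An empty list means p is prime (or smaller than 4)."""
    candidates = [(n, m)
                  for n in range(2, math.isqrt(p) + 1)
                  for m in range(n, p // 2 + 1)]
    return [(n, m) for n, m in candidates if factor_energy(p, n, m) == 0]
```

Here ground_states(15) returns [(3, 5)], while ground_states(105) returns [(3, 35), (5, 21), (7, 15)]: each ground state is a factor pair, and further searches on 35, 21, or 15 would be needed to reach the primes 3, 5, and 7.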
Note: When a quantum computer is able to quickly factor large numbers using Shor’s algorithm, RSA encryption will be rendered insecure, and a closely related form of Shor’s algorithm, which computes discrete logarithms, threatens the Diffie-Hellman-Merkle key exchange scheme as well. Fortunately, quantum key distribution protocols already exist that can replace Diffie-Hellman-Merkle key exchange with a quantum technology whose security rests on physics rather than on number-theoretic problems involving primes. Quantum key exchange is another story.
So far, analog quantum computers must be kept extremely cold to maintain coherence for useful periods of time. They are unlikely ever to fit in a smartphone or a watch. In addition, they are difficult to program, because programming means formulating a problem as an equation in quantum mechanics.
Quantum dot computers (QDCs) are digital rather than analog and are based on room-temperature quantum dots rather than entangled qubits. A quantum dot is a tiny region in which an electron is confined, or squeezed into a very small volume, so that it has only a few allowed states and can move only by tunneling. Instead of an entangled, superposed qubit, the most common design confines two electrons in a cell of four quantum dots arranged at the corners of a square. Coulomb repulsion pushes the two electrons to opposite corners, and the two possible diagonal arrangements encode the zero and one states. Under the right conditions, a cell flips when its neighbor flips, or not, depending on the desired operation.
These cells, each confined to a very small space (100nm or less), are used as building blocks for a cellular automaton that implements Boolean logic. For this reason, the quantum dot computer is called a QCA, or quantum-dot cellular automaton. QCA logic is conventionally built from a three-input majority gate and an inverter, which together can implement any Boolean operation on bits. In fact, adiabatic reversible QCAs have been designed to combine low heat-energy properties with programmable digital logic.
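QCA designs conventionally build logic from two primitives, a three-input majority gate and an inverter. The Python sketch below models only the logic, not the physics: fixing one majority input at 0 yields AND, fixing it at 1 yields OR, and three majority gates plus two inverters form a full adder. This is a known majority-gate construction offered as an illustration, not the design of any particular device.

```python
def majority(a, b, c):
    """Three-input majority vote: the output follows at least two of the inputs."""
    return 1 if a + b + c >= 2 else 0

def inverter(a):
    """Logical NOT."""
    return 1 - a

def qca_and(a, b):
    return majority(a, b, 0)  # third input pinned to 0 gives AND

def qca_or(a, b):
    return majority(a, b, 1)  # third input pinned to 1 gives OR

def majority_full_adder(a, b, cin):
    """Full adder from three majority gates and two inverters. Returns (sum, carry)."""
    cout = majority(a, b, cin)
    s = majority(inverter(cout), majority(a, b, inverter(cin)), cin)
    return s, cout
```

For every input combination, sum + 2 × carry equals a + b + cin, which is the defining property of a full adder.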
Researchers in Iran have designed such a machine, but as far as this author knows, nobody has built a working one. Figure 1 illustrates their design of a full adder built as a quantum-dot cellular automaton. Each cell holds one bit and interacts with its neighbors through electrostatic (Coulomb) forces rather than by passing current. The GAR (garbage) bits retain information that would otherwise be erased, so that the QCA is adiabatic, ideally losing no heat-energy to the outside. This QCA computes the binary sum of a and b, with carry-in and carry-out.
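The Figure 1 design is not reproduced here, but the role of garbage bits can be shown with a different, well-known reversible construction: two cascaded Peres gates, where a Peres gate maps (a, b, c) to (a, a XOR b, (a AND b) XOR c). In this hypothetical Python sketch, the adder takes a, b, carry-in, and one constant input (an ancilla) set to 0, and returns four outputs: two garbage bits (a and a XOR b), the sum, and the carry-out. The garbage outputs keep the mapping one-to-one, so no information is erased.

```python
def peres(a, b, c):
    """Peres gate: (a, b, c) -> (a, a XOR b, (a AND b) XOR c). It is reversible."""
    return a, a ^ b, (a & b) ^ c

def reversible_full_adder(a, b, cin, ancilla=0):
    """Two cascaded Peres gates. With ancilla = 0 the outputs are
    (garbage a, garbage a XOR b, sum, carry-out)."""
    g1, p, t = peres(a, b, ancilla)   # p = a ^ b, t = (a & b) ^ ancilla
    g2, s, cout = peres(p, cin, t)    # s = a ^ b ^ cin, cout = ((a ^ b) & cin) ^ t
    return g1, g2, s, cout
```

Because (a AND b) and ((a XOR b) AND cin) can never both be 1, XOR-ing them gives the same result as OR-ing them, so the last output is the usual carry. Across all 16 combinations of the four inputs, the outputs are all distinct, which is what makes the circuit reversible.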
Which option should be pursued for the next 50 years?
Nobody knows. However, QCA has many attractive properties that the other options lack. Quantum dots can function at room temperature, and adiabatic reversible cellular automata are easy to design. It is unclear how fast they can be made to switch, and of course the entire design stack of the CMOS industry would have to be converted.
But then, what is the alternative?