VCU Scholars Compass

Information processors process information in a variety of ways. The human brain processes information through a highly interconnected system of neurons and synapses, while a digital computer processes information by having a binary switch toggle on and off in response to a stream of binary bits. The “switch” is the most primitive unit of the modern computer. The better it is (faster, more energy efficient, more reliable, etc.), the more advanced is the computer hardware. Energy efficiency, however, is more important than any other attribute, not so much because energy is costly, but because too much energy dissipation prevents the increase in switch density on a chip that is necessary to make the chip more powerful. Reducing dissipation entails radically new and often revolutionary approaches for implementing the switch. One such approach is to encode digital bit information in the spin polarization of a single electron (or ensemble of electrons) and then use two mutually antiparallel polarizations to represent the binary bits 0 and 1. Switching between the bits can be accomplished by simply flipping the polarizations of the spins, which takes very little energy. Such switches are extremely energy efficient if designed properly, but they are somewhat slower than traditional transistor-based switches and can be more error prone. This paper discusses the pros and cons of spin-based switches and introduces the reader to the most recent advancements in information processing predicated on encoding information in electron spin polarization.


Introduction
Information processors (computers, cell phones, digital watches, personal communicators, etc.) pervade our everyday lives. This paper, for example, is typed on a desktop computer and the author used his cell phone several times during the typing of this paper. The information overload that our society deals with routinely requires ever-increasing computational prowess that can only be attained by packing more and more computing devices in a chip. Since the chip area is limited by considerations of cost, convenience, and practicality, one must increase the density of devices in a chip to keep abreast of the ever-increasing demands of computing. This was foreseen by the visionary cofounder of Intel Corporation who postulated the famous Moore's law [1] stipulating that the density of devices in a chip must double roughly every 18 months. In the past, Moore's law has been sustained; the density has increased roughly by a factor of 2 every 18 months, but a calamity looms on the horizon. What might stop device downscaling in accordance with Moore's law is not so much the difficulty of fabricating smaller and smaller devices, nor is it the fact that classical laws of physics will be defunct when device dimensions approach atomic scales, but it is the unmanageable energy and heat dissipation associated with switching of a device. Present transistors dissipate about 0.2 fJ of energy (∼50,000kT at room temperature; k = Boltzmann constant and T = absolute temperature) when they switch in isolation in ∼100 ps. Therefore, the power dissipated per device during a switching event is about 2 μW. The Pentium IV chip of circa 2000 had a transistor density of $10^{8}$/cm$^2$ [2] and even if 10% of them switched at any given time, the power dissipation would have been 20 W/cm$^2$. That is roughly what the Pentium IV chip actually dissipated. Now imagine what would happen if transistor density increased in accordance with Moore's law. By the year 2020, the density would be $8 \times 10^{11}$/cm$^2$ and the dissipation would have increased to 164 kW/cm$^2$. There is no known heat sinking technology that can remove that much heat from a chip. Surely, the chip would melt! This is the major problem threatening electronics today.
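The arithmetic behind these figures can be retraced with a short script. This is only a sketch: the function names are illustrative, room temperature is assumed to be 300 K, and the 2020 density is assumed to come from a whole number of 18-month doublings since 2000 (13 of them), which is what reproduces the quoted $8 \times 10^{11}$/cm$^2$.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def energy_in_kT(energy_j, temperature_k=300.0):
    """Express an energy in units of kT (300 K is an assumed room temperature)."""
    return energy_j / (K_B * temperature_k)


def power_per_device(energy_j=0.2e-15, switch_time_s=100e-12):
    """Power drawn by one device while it switches, in watts."""
    return energy_j / switch_time_s


def power_density(devices_per_cm2, active_fraction=0.1):
    """Dissipation in W/cm^2 when a fraction of the devices switch at once."""
    return devices_per_cm2 * active_fraction * power_per_device()


def projected_density(density_2000, year, doubling_months=18):
    """Density after a whole number of doublings since 2000 (assumed rounding)."""
    doublings = (year - 2000) * 12 // doubling_months
    return density_2000 * 2 ** doublings


density_2020 = projected_density(1e8, 2020)
print(f"0.2 fJ is about {energy_in_kT(0.2e-15):.0f} kT at 300 K")
print(f"power per switching device: {power_per_device() * 1e6:.1f} uW")
print(f"year 2000: {power_density(1e8):.0f} W/cm^2")
print(f"year 2020: {density_2020:.2e} /cm^2, "
      f"{power_density(density_2020) / 1e3:.0f} kW/cm^2")
```

Changing `active_fraction` or `doubling_months` shows how sensitive the projection is to those two assumptions.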
Excessive energy dissipation is virtually unavoidable in all charge-based digital switches like transistors that encode binary bit information in the amount of charge stored in the device. Charge is a scalar quantity that has magnitude but no direction. Hence, if binary bits 0 and 1 are to be encoded with charge, they must be represented by two different amounts of charge $Q_1$ and $Q_2$. Switching between the bits will then necessitate changing the amount of charge in the device by an amount $\Delta Q = |Q_1 - Q_2|$ in some time $\Delta t$, leading to the flow of current $I = \Delta Q/\Delta t$ and the associated energy dissipation $I^2 R \Delta t = (\Delta Q)^2 R/\Delta t$, where $R$ is the resistance in the path of the current. One can reduce this dissipation by increasing $\Delta t$ (switching slowly) or by decreasing $\Delta Q$, but neither is desirable since the former makes the switch slow and sluggish, while the latter makes the switch vulnerable to noise since it decreases the separation between the 0- and 1-states by bringing the two closer together.
The "spin" of an electron is a quantum-mechanical property and can be crudely thought of as the tiny magnetic moment associated with the electron spinning about its axis. It is a pseudovector that has a fixed magnitude of $\hbar/2$ ($\hbar$ = reduced Planck's constant) and a variable direction or polarization. If the electron is placed in a magnetic field, only two polarizations are allowed, and they can be viewed as stable and metastable. The polarization parallel to the field will be stable and that antiparallel to the field will be metastable. These two polarizations can encode the binary bits 0 and 1. Switching between them will involve merely flipping the spin, without moving the electron in space or causing current flow, as shown in Figure 1. This eliminates the $I^2 R \Delta t$ dissipation, but does not eliminate dissipation altogether since the two spin states are nondegenerate and separated in energy by the Zeeman splitting energy $g\mu_B B$ ($g$ = Landé g-factor, $\mu_B$ = Bohr magneton, $B$ = flux density of the magnetic field). Therefore, the minimum energy dissipation in flipping a spin is $g\mu_B B$ per bit flip event. This energy, however, can be a lot smaller than the $I^2 R \Delta t = (\Delta Q)^2 R/\Delta t$ incurred in switching a transistor switch.
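To get a feel for the two energy scales, the sketch below evaluates the Zeeman splitting $g\mu_B B$ and the charge-switching dissipation $(\Delta Q)^2 R/\Delta t$. The values $g = 2$ and $B = 1$ T are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
MU_B = 9.2740100783e-24   # Bohr magneton, J/T
K_B = 1.380649e-23        # Boltzmann constant, J/K


def zeeman_splitting(b_tesla, g=2.0):
    """Energy gap g * mu_B * B between the two spin states, in joules."""
    return g * MU_B * b_tesla


def charge_switching_energy(delta_q, resistance, delta_t):
    """(dQ)^2 * R / dt dissipated when charge dQ flows through R in time dt."""
    return delta_q ** 2 * resistance / delta_t


spin_energy = zeeman_splitting(1.0)   # assumed: g = 2, B = 1 T
transistor_energy = 0.2e-15           # 0.2 fJ, the transistor figure quoted earlier

print(f"Zeeman splitting at 1 T: {spin_energy:.3e} J")
print(f"transistor / spin energy ratio: {transistor_energy / spin_energy:.2e}")
print(f"Zeeman splitting in units of kT at 300 K: {spin_energy / (K_B * 300.0):.4f}")
```

The comparison concerns the energy scale only. Whether a given splitting is large enough compared with kT to hold a bit reliably is a separate question, which the dissipation estimates later in the paper address.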

Single Spin Logic (SSL)
The notion of using the bistable spin polarizations of a single electron placed in a magnetic field to encode the binary bits 0 and 1 is at the heart of an exotic idea known as Single Spin Logic (SSL) [3][4][5][6][7][8].
In SSL, single conduction band electrons are confined in semiconductor quantum dots that are delineated on a wafer. The entire wafer is placed in a dc magnetic field generated by a permanent magnet or an electromagnet. This global magnetic field defines the spin quantization axis and makes the spin polarization of every conduction electron bistable, that is, only polarizations parallel and antiparallel to this field are stable or metastable in each dot. This is the first step in making binary switches.
In order to ensure single electron occupancy in every dot, we have to ensure that the Fermi level (or chemical potential) in each dot is above the lowest spin split level in the conduction band but below all other levels. In that case, the Pauli Exclusion Principle and Fermi-Dirac statistics would guarantee that there will be only one conduction band electron (or quasi-free electron) in each dot at low temperatures. One way to make this happen is to make sure that the energy cost to accommodate a second electron in any dot, which is roughly $e^2/2C$ ($e$ = electron charge and $C$ = capacitance of the dot), is prohibitively large and exceeds the thermal energy kT by many times. This would prevent a second electron from getting into any dot. Single electron occupancy in an array of ∼$10^{8}$ dots has been demonstrated experimentally [9].
Figure 1: (Left panel): in a transistor, the device is "on" when there are charge carriers (electrons) at the semiconductor-insulator interface (channel) to carry current between the source and drain contacts. A negative voltage applied to the gate depletes the channel of charge carriers and turns the device "off." These two states encode the binary bits 0 and 1. (Right panel): the binary bits 0 and 1 are encoded in the up- and down-spin polarizations of an electron.
The wavefunction of the lone conduction band electron in any dot is sufficiently delocalized that the wavefunctions of electrons in nearest neighbor dots overlap in space. Valence band electrons are tightly bound to their parent atoms and have localized wavefunctions that do not overlap with the wavefunctions of other electrons. Therefore, they play no role in what is discussed next.
Because of the overlap between the wavefunctions of nearest neighbor conduction band electrons, their spins can interact via exchange. Spin-spin interaction between second nearest neighbors is much weaker since exchange interaction strength decays exponentially with distance [10]. For our purpose, we can ignore second- or more-distant-neighbor interactions altogether.
It is possible to align the spins in certain chosen dots (designated as input dots) in desired directions (parallel or antiparallel to the global magnetic field) using external agents, such as local magnetic fields. (Local magnetic fields for this purpose can be generated with spin-polarized scanning tunneling microscope tips or even current lines if sufficient lithographic resolution is achievable.) This is how one "writes" input data into the array. The arrival of the inputs takes the interacting array into a many-body excited state. The system is then allowed to relax to the thermodynamic ground state by coupling to the surrounding thermal bath. (The coupling of a single isolated electron to the thermal bath is weak, but the collective coupling of many interacting electrons to the thermal bath is much stronger. Hence the entire spin system should relax to the ground state much faster than an isolated spin.) When the ground state is reached by emitting phonons, magnons, and so forth, the spin orientations in certain other chosen dots (designated as output dots) will represent the result of a specific computation in response to the input bits. The quantum dots are arranged in space in such a way that the nature of the nearest neighbor interactions guarantees this occurrence. Thus, computation is carried out by engineering the spin-spin interactions by choosing an appropriate layout of the quantum dots, which determines the nature of the spin-spin interactions. In many ways, this is a "collective computation" model, similar to neural networks.
Once the system has fully relaxed to the ground state, the result of the computation (spin orientations in output ports) can be read using a variety of schemes, all of which have been experimentally demonstrated [11][12][13] (reading). Since this is an "all-hardware" computer with no involvement of any "software," it is extremely fast in producing the final result. The disadvantage, however, is that a particular computer can do only one specific computation since the computer is entirely hard wired and is not easily reconfigured for a different task. The precise placements of the quantum dots on the wafer determine the nature of the exchange interactions and hence the specific computational task that can be carried out by the spin array. The layout is the key; it determines uniquely what kind of computation is performed.
Note that SSL is an equilibrium system where the spins are not intentionally maintained out of equilibrium. In fact, computation is performed by letting the excited spins thermodynamically relax to the ground state by coupling with the thermal bath (phonons). Therefore, this paradigm has some in-built noise immunity because the ground state is always the most stable. On the flip side, it does not exploit any possible advantage of nonequilibrium dynamics in computing that has been discussed in [28][29][30]. Maintaining a system perennially out of equilibrium would have, however, consumed additional energy, although that need not have been dissipated in the chip.
Equilibrium statistics mandates that the absolute minimum energy dissipated in a single irreversible logic operation should be the Landauer-Shannon limit kT ln 2 [31,32]. However, reaching this limit requires complicated switching dynamics (e.g., time modulated potentials) and extreme timing synchronization between various components of the switching cycle [31,32]. If that precision is unattainable, no time-modulated potential is available, and switching is carried out in a simple abrupt step, then the minimum energy dissipation will be
$$kT \ln(1/p),$$
where $p$ is the static bit error probability (the probability that the bit flips spontaneously). It turns out that the energy dissipated in any irreversible logic operation in an SSL NAND gate (described below) is given precisely by the above expression [5]. This is actually a remarkable result since it shows that no paradigm can better the SSL in dissipation for an irreversible logic operation carried out nonadiabatically without elaborate time-modulated potentials and ultraprecise timing mechanisms, since SSL operates at the thermodynamic limit.

The SSL NAND Gate for General Purpose Computing.
There are many ways to carry out general purpose computing (GPC), but the most common one is to use Boolean logic gates. In order to build a universal computing machine employing Boolean logic, we must construct combinational and sequential logic circuits by employing universal logic gates (e.g., the NAND gate). We will then interconnect them with "spin wires" that ferry spin signals between them unidirectionally. The two ingredients, NAND gates and unidirectional spin wires, are all that are required to implement a universal computing machine.
An SSL NAND gate is implemented with a linear array of three quantum dots each containing a single conduction band electron. The array is placed in a global static magnetic field that defines the spin quantization axis, that is, the spin in any dot will be aligned either parallel or antiparallel to it. If a spin is parallel to the field, we will assume that it encodes the binary bit 1 and if it is antiparallel, it encodes the binary bit 0.
The NAND gate realization is shown in Figure 2. The two peripheral dots in the array are treated as input ports whose resident spins are aligned to conform to input bits (either 0 or 1) with external agents that can generate local magnetic fields. The central dot is the output port and its resident spin's polarization encodes the output bit.
It was rigorously shown in [5] that the ground state spin configuration in this system is antiferromagnetic, that is, spins in nearest neighbor quantum dots will be mutually antiparallel as long as the exchange interaction strength between nearest neighbors is greater than one-half the Zeeman splitting energy in any dot due to the global magnetic field, and the local magnetic field applied to the input dots is much stronger than the global magnetic field. In that case, whenever the two inputs are 1, the output must be 0 to preserve the antiferromagnetic ordering, and similarly whenever the two inputs are 0, the output must be 1. When one input is 1 and the other is 0, a tie seemingly occurs. This tie, however, is broken by the global magnetic field, which will generate a slight preference for the spin in the output dot to be aligned parallel to the field when the two inputs are dissimilar. (This is assuming that the Landé g-factor (gyromagnetic ratio) of the dot material is positive.) Since spin orientation parallel to the global magnetic field encodes logic bit 1, the output will be 1 whenever the two input bits are different. Thus, the input-output relation of this system obeys the truth table of the NAND gate as shown in Figure 2.
Figure 2: A linear array of three equally spaced quantum dots with nearest neighbor exchange interaction implements a NAND gate when placed in a global magnetic field. Spin polarization parallel to the field ("upspin") encodes bit 1 and spin polarization antiparallel to the field ("downspin") encodes bit 0. The following conditions must be satisfied: (i) the exchange interaction strength must exceed one-half of the Zeeman splitting caused by the global magnetic field and (ii) the local magnetic field used to align input spins must cause a Zeeman splitting in the input dot far greater than the Zeeman splitting due to the global magnetic field.

Theory of the SSL NAND Gate.
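To show that the 3-dot system indeed acts as described (i.e., performs the NAND logic operation), one must resort to rigorous quantum mechanics and consider the many-body Hamiltonian of the array. One can describe the system with a Hubbard Hamiltonian which will have 29 independent basis states. However, if we assume single electron occupancy in each dot (the dots are so small and have such small capacitance $C$ that the energy cost to add a second electron to any dot, which is $e^2/2C$, is prohibitively large), then the Hubbard Hamiltonian can be reduced to a much simpler Heisenberg Hamiltonian [4,7] which has only 8 independent (orthonormal) basis states. This Hamiltonian is given by
$$H_{\text{Heisenberg}} = \sum_{\langle ij \rangle} J^{\parallel}_{ij}\, \sigma_{zi}\sigma_{zj} + \sum_{\langle ij \rangle} J^{\perp}_{ij} \left( \sigma_{xi}\sigma_{xj} + \sigma_{yi}\sigma_{yj} \right) - \sum_{\text{input dots}} h_i\, \sigma_{zi} - Z \sum_{i} \sigma_{zi}, \qquad (2)$$
where the σ-s are Pauli spin matrices. The Zeeman terms are written here with signs chosen so that, for positive $h_i$ and $Z$, the upspin state is the lower-energy one, in keeping with the bit convention used in this article. We adopt the convention that the local magnetic field needed to align spins in input dots, and the global magnetic field, are always along the z-axis.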
The last two terms in the above Hamiltonian account for the Zeeman energies associated with the local and global fields. The first two terms account for exchange interaction between nearest neighbors (the angular brackets denote summation over nearest neighbors). We will assume the isotropic case when $J^{\perp}_{ij} = J^{\parallel}_{ij} = J$, where $J$ is the exchange energy, which is nonzero if the wavefunctions in dots $i$ and $j$ overlap in space.
The spins in the quantum dots are polarized in either the +z or −z direction by the global magnetic field (conforming to bits 1 or 0), and we designate the corresponding states as "upspin" (↑) and "downspin" (↓) states, respectively. Recall that the downspin state (aligned antiparallel to the global magnetic field) encodes bit 0 and the upspin state (parallel to the global field) encodes bit 1.
Obviously, there are 8 independent 3-spin basis states representing the spin configurations in the 3-dot array, which are |↑↑↑⟩, |↑↑↓⟩, |↑↓↑⟩, |↑↓↓⟩, |↓↑↑⟩, |↓↑↓⟩, |↓↓↑⟩, |↓↓↓⟩. In this state representation, the first arrow in every ket is the spin polarization in the left dot, the second arrow is the spin polarization in the central dot, and the third in the right dot. These eight basis functions form a complete orthonormal set. The matrix elements $\langle \phi_m | H_{\text{Heisenberg}} | \phi_n \rangle$ are given in the matrix below, where the φ-s are the 3-electron basis states enumerated above. In spinor notation, $|{\uparrow}\rangle = |1\rangle = [1 \;\; 0]^T$ and $|{\downarrow}\rangle = |0\rangle = [0 \;\; 1]^T$. In the above matrix, $Z$ is one-half of the Zeeman splitting energy associated with the global magnetic field (i.e., $2Z = g\mu_B |B_{\text{global}}|$). Similarly, $h$ is one-half of the Zeeman splitting caused by the local magnetic field in an input dot. If the local magnetic field is in the same direction as the global field and writes bit 1, then the corresponding $h$ is positive; otherwise, it is negative. The quantity $J$ is always positive.
Reference [5] evaluated the eigenenergies and eigenfunctions of the above Hamiltonian for the 4 possible input bit combinations to the NAND gate (0, 0), (0, 1), (1, 0), and (1, 1). It was found that the ground state wavefunctions in the four cases approach the desired states |↓↑↓⟩, |↓↑↑⟩, |↑↑↓⟩, |↑↓↑⟩, respectively, provided $h_L, h_R > J$, and $J > Z/2$. Thus, the ground state spin polarization in the output dot is always the NAND function of the spin polarizations in the input dots, provided the Zeeman splitting caused by the local magnetic fields that "write" input bits in the input dots is much larger than the strength of exchange coupling between nearest neighbors, and the latter, in turn, is larger than one-fourth of the Zeeman splitting caused by the global magnetic field. Therefore, the NAND gate is indeed realized by three spins with nearest neighbor exchange coupling if we satisfy the conditions $h_L, h_R > J$, and $J > Z/2$. Since the NAND gate is universal, any arbitrary combinational or sequential circuit can be implemented by interconnecting NAND gates with a "spin wire" shown in Figure 3.
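The ground-state claim can be checked numerically with a small, dependency-free sketch. It is not the calculation of [5]: it assumes isotropic exchange $J$ and Zeeman terms written as $-h\sigma_z$ and $-Z\sigma_z$ (so that upspin is favoured for positive $h$ and $Z$), uses arbitrary illustrative values $J = 1$, $Z = 0.5$, $|h| = 20$, and finds the ground state by plain power iteration on a shifted matrix rather than with a library eigensolver. All function names are hypothetical.

```python
from itertools import product


def heisenberg_matrix(J, Z, h_left, h_right):
    """8x8 matrix in the basis |left, centre, right>, with bit 1 = upspin."""
    states = list(product((1, 0), repeat=3))   # order: uuu, uud, udu, ..., ddd
    index = {s: i for i, s in enumerate(states)}
    H = [[0.0] * 8 for _ in range(8)]
    for s in states:
        i = index[s]
        sz = [2 * b - 1 for b in s]            # +1 for upspin, -1 for downspin
        H[i][i] = (J * (sz[0] * sz[1] + sz[1] * sz[2])   # sigma_z sigma_z exchange
                   - h_left * sz[0] - h_right * sz[2]    # local fields (assumed sign)
                   - Z * sum(sz))                        # global field (assumed sign)
        for a, b in ((0, 1), (1, 2)):
            if s[a] != s[b]:                   # xx + yy swaps antiparallel neighbours
                t = list(s)
                t[a], t[b] = t[b], t[a]
                H[i][index[tuple(t)]] += 2.0 * J
    return states, H


def ground_state(H, iterations=2000):
    """Power iteration on (shift * I - H); converges to the lowest eigenvector."""
    n = len(H)
    shift = max(sum(abs(x) for x in row) for row in H)   # Gershgorin bound
    v = [1.0] * n
    for _ in range(iterations):
        w = [shift * v[i] - sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def nand_output(a, b, J=1.0, Z=0.5, h=20.0):
    """Central-dot bit of the dominant configuration in the ground state."""
    states, H = heisenberg_matrix(J, Z, h if a else -h, h if b else -h)
    v = ground_state(H)
    dominant = max(range(8), key=lambda i: abs(v[i]))
    return states[dominant][1]


for a, b in product((0, 1), repeat=2):
    print(a, b, "->", nand_output(a, b))
```

With these parameters the local fields pin the input spins, so the dominant configuration carries the input bits on the outer dots and the NAND result on the centre dot. Reducing `h` towards `J`, or `J` below `Z / 2`, is a quick way to see where the stated conditions start to matter.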
A spin wire is a linear array of quantum dots, each containing a single electron, with tunable nearest neighbor exchange interaction. Between each pair, there is a metal gate that is electrically accessed. (This is a lithographically challenging job since the spacing between dots will hardly exceed 10 nm for sufficient exchange coupling strength. However, lithography has now progressed to the point where this is no longer infeasible.) When a positive potential is applied to the gate, it lowers the potential barrier between the flanking dots and allows their resident electrons' wavefunctions to overlap in space. This turns on the exchange coupling only when the gate pad is activated. Without the positive gate potential, the barrier between dots is so high that exchange coupling is insignificant and the two dots are decoupled. Thus, we can turn the exchange interaction on and off with the gate pad potential.
We will describe in the next subsection how a spin polarization state can be unidirectionally propagated from left to right along the spin wire using a 3-phase clock. Unidirectionality is of paramount importance since, in order to work properly, an input stage in a logic circuit must drive an output stage and not the other way around [33]. In other words, there should be no feedback from the output to the input. In a transistor-based circuit, this is automatically ensured since there is inherent "isolation" between the input and output terminals of a transistor that enforces a master-slave relation between the input and output, forcing logic signal to propagate unidirectionally at all times. Unfortunately, that is not the case with SSL since exchange interaction, which plays the role of interconnecting wire between successive spins, is intrinsically bidirectional. Therefore, we must enforce unidirectionality in some other way. Since we cannot impose unidirectionality in space, we must impose it in time, using a "clock" [6]. This is actually an old idea that has been used to steer logic bits unidirectionally in charge-coupled-device- (CCD-) based shift registers. There, a "push" clock and a "drop" clock are used to enforce unidirectional bit propagation. (This also requires a 3-phase clock [34,35].) In SSL, the clock signal is a sequence of positive voltage pulses that are applied to the gates interposed between each pair of dots. The arrival of a positive voltage pulse temporarily lowers the potential barrier between two adjacent quantum dots and exchange couples their spins. By sequentially exchange-coupling three adjacent dots at a time using a 3-phase clock, the spin state of the leftmost dot can be propagated unidirectionally from left to right in a bucket-brigade fashion [8].
Figure 3: A "spin wire" for transmitting binary bit information encoded in single electron spins unidirectionally. The gate pads for clocking (which are flared for the ease of contacting) are raised to high potential pairwise at a time with a 3-phase clock to make the logic bit propagate unidirectionally through the chain.
There are other possible clocking schemes for spin wires, one of which is due to Bennett [36]. That scheme can be adapted to SSL as follows. Let us say that we wish to propagate the state (spin polarization) of the nth dot in a chain to the right unidirectionally. We will then rotate the spins of the (n + 1)th dot and (n + 2)th dot by ∼90° to the right by an external agent. When that agent is withdrawn from the (n + 1)th dot but not the (n + 2)th dot, the (n + 1)th dot finds that its exchange interactions with its left and right neighbors are unequal since one neighbor's spin is pointing down and the other's spin is pointing to the right (see Figure 4). This breaks the tie and allows the (n + 1)th dot's spin to flip up because of the net exchange interaction it experiences. (The "flipping up" happens because it reduces the total energy of the system in this case.) In the next step, the (n + 3)th dot's spin is rotated to the right and the rotating agent is removed from the (n + 2)th dot. The latter's spin then flips down owing to exchange interaction and the logic bit has propagated from the nth dot to the (n + 2)th dot unidirectionally.
The next important question is what agent can possibly rotate the spin of a targeted dot to the right by 90°. (Whether rotation is to the right or to the left makes no difference. Obviously, either one will work.) That agent is an electric field. The field causes Rashba spin-orbit interaction [37] in the dot [38,39] and that can rotate the spin by ∼90° [40] and implement Bennett clocking. However, it takes a very large voltage to rotate the spin by large angles with this strategy [40], which makes this approach of rotating the spin with a dc voltage extremely energy inefficient. A more energy-efficient approach is to place all the dots in a microwave field and apply a much smaller dc voltage pulse to turn on a slight Rashba interaction in a target dot to increase or decrease slightly the total spin splitting energy in that dot caused by the global magnetic field [38,39]. This can make the total spin splitting energy in the target dot resonant with the photon energy in the global microwave field (ac magnetic field) [41]. Only the target dot's spin will couple with the microwave field since its spin splitting energy is resonant with the photon energy. This will rotate the spin in the target dot by an arbitrary angle θ due to Rabi oscillation [42][43][44]:
$$\theta = \frac{g \mu_B B_{ac}}{\hbar}\, \tau,$$
where $B_{ac}$ is the amplitude of the ac magnetic field and $\tau$ is the dc pulse duration (the duration for which the dot is resonant with the global microwave field). By adjusting $\tau$ and $B_{ac}$, one can make θ = 90°. However, this approach may also require a considerable dc pulse amplitude, albeit less than what would be needed to rotate the spin with the dc potential alone, thereby making it still energy-inefficient. Moreover, the notion of placing a computer within a microwave cavity in order to obtain a sufficiently large $B_{ac}$ is not particularly appealing from an engineering perspective and hence not entirely practical. Thus, the optimum scheme may still be the first approach where the potential barriers between three adjacent dots are lowered with voltage pulses to exchange couple the trio at a time.

SSL Spin Wire.
A spin wire can not only ferry spin logic bits unidirectionally, but it can also perform the role of fan-out where a signal is split into multiple paths in order to drive multiple stages. This is shown in Figure 4(f).
It is obvious that the same strategy can implement fan-in as well.
Finally, one last requirement that wires must satisfy is the function of "crossover" where two wires cross each other in space without interfering with one another. Combinational logic circuits (e.g., adders and subtractors) do not always need crossover, but sequential circuits (e.g., flip-flops) will require feedback of an output state to an input state, and therefore crossover. This is the most challenging requirement and normally will be implemented with multiple layers of dots where a dot in one layer is sufficiently distant from the nearest dot in the closest layer to avoid significant exchange coupling. As a result, combinational logic is usually easier to implement in SSL than sequential logic.

The Toffoli-Fredkin Gate with SSL.
The NAND gate is a universal Boolean logic gate, but it is logically irreversible, meaning that we cannot infer the input bits if we have knowledge of only the output bit. For example, if the output bit is 0, then we can state with certainty that the input bits must have been (1, 1), but if the output bit is 1, then we could not tell whether the inputs were (1, 0), (0, 1), or (0, 0). A logically reversible universal gate is the Toffoli-Fredkin gate [45] which has three inputs A, B, and C and three outputs A′, B′, and C′. Knowledge of the output bits of this gate allows us to infer the input bits uniquely. This is often a very desirable trait since it is believed that logically reversible gates can, in principle, be physically reversible and not dissipate any energy at all [31,32].
The truth table of the Toffoli-Fredkin (T-F) gate is as in Table 1.
It is clear that the input-output relation can be expressed as
$$A' = A, \qquad B' = B, \qquad C' = C \oplus (A \cdot B),$$
where ⊕ represents the logical exclusive OR operation and the • represents the logical AND operation. Therefore, two of the output bits (A′, B′) replicate the corresponding input bits (A, B), called the control bits, while the third bit C replicates itself unless both A and B are 1. In the latter case, it flips. Note that the gate is logically reversible since we can uniquely deduce the input bits A, B, and C from the output bits A′, B′, and C′.
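The rule just described (control bits pass through, and the target bit flips only when both controls are 1) is small enough to enumerate. The sketch below lists all eight input combinations and checks logical reversibility in two ways: the outputs are all distinct, and applying the gate twice restores the input. The function name `tf_gate` is illustrative only.

```python
from itertools import product


def tf_gate(a, b, c):
    """Return (A', B', C') with A' = A, B' = B, C' = C XOR (A AND B)."""
    return a, b, c ^ (a & b)


inputs = list(product((0, 1), repeat=3))
outputs = [tf_gate(*bits) for bits in inputs]

for bits, out in zip(inputs, outputs):
    print(bits, "->", out)

# Reversibility: the map is one-to-one, and the gate is its own inverse.
is_one_to_one = len(set(outputs)) == len(inputs)
is_self_inverse = all(tf_gate(*tf_gate(*bits)) == bits for bits in inputs)
print("one-to-one:", is_one_to_one, "| self-inverse:", is_self_inverse)
```

By contrast, a NAND gate maps three different input pairs to the same output 1, which is exactly why it cannot be inverted.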
The Toffoli-Fredkin (T-F) gate can be realized with the same 3-dot array as the NAND gate. The spin orientations in the two peripheral dots will represent the control bits A and B, while that in the central dot will represent the target bit C. The dots are placed in a global magnetic field pointing in the upspin direction as in Figure 2. As before, spin orientations antiparallel to the global magnetic field (downspin) will represent bit 0 and those parallel to the global field (upspin) will represent bit 1. The same Hamiltonian as in (2) will represent the system. It can be shown [46] that as long as one-half of the Zeeman splitting caused by the local magnetic fields that orient the spins in the input dots greatly exceeds the exchange coupling energy, that is, $h_A = h_B \gg J$ and $J > Z/2$, where $Z$ is one-half of the Zeeman splitting due to the global magnetic field, the T-F gate can be implemented. (Reference [46] had a different convention where the magnetic field pointed in the downspin direction and spin polarization parallel to the field represented bit 0. The convention used in this article is equally valid.)
Provided the above conditions are met, one can show [46] that when A = B = 0, the ground state spin configuration in the array approaches the many-body state |↓↑↓⟩ (antiferromagnetic ordering) and the first excited state is approximately |↓↓↓⟩ (ferromagnetic ordering). Therefore, when the array is in the ground state, C = 1 and when it is in the first excited state, C = 0. It can also be shown that when A = B = 1, the ground state is approximately |↑↓↑⟩ (antiferromagnetic) and the first excited state is approximately |↑↑↑⟩ (ferromagnetic). This time, C = 0 in the ground state and C = 1 in the first excited state. Finally, when A and B are logic complements of each other (dissimilar control bits), C = 1. Of course, we expect all of these to happen in any case since we expect the system to behave as a NAND gate. However, what we intend to focus on now is the energy differences between the first excited state and the ground state for the four different control bit combinations (1, 1), (0, 1), (1, 0), and (0, 0) since that will be the key to implementing the T-F gate with this array.
If we designate the energy difference between the first excited state and the ground state of the 3-spin system as $\Delta E_{A,B}$ for different control bit combinations (A, B), then we can show that [46]
$$\Delta E_{A=0,\,B=0} = 4J + 2Z.$$
The key to implementing the T-F gate is the fact that the energy difference between the first excited state and the ground state of the array depends on the control bits A and B since $\Delta E_{A=0,\,B=0} \neq \Delta E_{A=1,\,B=0} \neq \Delta E_{A=1,\,B=1}$. Note also that in every case, the difference between the excited state and the ground state of the array is only in the spin polarization of the central dot. Hence, we can view $\Delta E_{A,B}$ as essentially the spin splitting energy in the central dot for different states of the control bits A and B. The fact that the spin splitting energy in the central dot depends on the spin polarizations of the electrons in the peripheral dots is a consequence of exchange interaction.
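How the central-dot splitting depends on the control bits can be illustrated in a simplified limit. The sketch below is an assumption-laden estimate, not the result of [46]: it pins the peripheral spins (the limit of very strong local fields), drops the transverse exchange terms so that only $J\sigma_z\sigma_z$ and the global Zeeman term $-Z\sigma_z$ act on the central spin, and uses illustrative values $J = 1$, $Z = 0.5$. The names are hypothetical.

```python
from itertools import product


def central_energy(s_left, s_centre, s_right, J, Z):
    """Energy terms that involve the central spin (spins are +1 up, -1 down)."""
    return J * s_centre * (s_left + s_right) - Z * s_centre


def central_gap(a, b, J=1.0, Z=0.5):
    """Splitting of the central spin for pinned control bits a and b."""
    s_left = 1 if a else -1
    s_right = 1 if b else -1
    up = central_energy(s_left, +1, s_right, J, Z)
    down = central_energy(s_left, -1, s_right, J, Z)
    return abs(up - down)


J, Z = 1.0, 0.5
gaps = {(a, b): central_gap(a, b, J, Z) for a, b in product((0, 1), repeat=2)}
for bits, gap in gaps.items():
    print(bits, "gap =", gap)

# In this limit the gaps reduce to simple closed forms.
print("4J + 2Z =", 4 * J + 2 * Z, "| 4J - 2Z =", 4 * J - 2 * Z, "| 2Z =", 2 * Z)
```

In this limit the (0, 0) case reproduces $4J + 2Z$, and the other control combinations give different splittings, which is the property a frequency-selective pulse relies on. If $J$ happened to equal $Z$, two of the splittings would coincide, so that choice would have to be avoided.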
In order to implement the truth table of the T-F gate, we will excite the 3-dot system with a π-pulse of angular frequency $\omega = \Delta E_{A=1,\,B=1}/\hbar$, which means that we will turn on an ac magnetic field of angular frequency $\omega$ and amplitude $B_{ac}$ for a duration $\tau_\pi$ such that $(2 g \mu_B B_{ac}/h)\,\tau_\pi = 1$. This will make the spin in the central dot flip (i.e., rotate by an angle θ = π) if the spins in the peripheral dots are both upspin. Hence, if and only if A = B = 1, C will flip (from 0 to 1 or 1 to 0). Otherwise, it will retain its previous state. This realizes the truth table of the T-F gate. Note that no energy is dissipated in the operation of the gate since the flipping of the spin in the central dot occurs by coherently absorbing a photon from the ac magnetic field (microwave).
There have been numerous ideas for physical implementation of the T-F gate [47][48][49]. What we have described above is the first SSL implementation.

Energy Dissipation in SSL.
We mentioned at the outset that SSL should be very energy efficient and dissipate very little energy to carry out logic operations. It therefore behooves us to provide some concrete estimates of energy dissipation.
There are two sources of energy dissipation in generic SSL: internal dissipation in the gate while it switches in response to changed input bits, and dissipation in the clock that steers bits unidirectionally in a spin wire. We examine both below.
2.5.1. Gate Dissipation. Reference [5] showed that the energy dissipated in a NAND gate operation is approximately $g\mu_B|B_{global}|$, which also happens to be the energy difference between the two antiparallel spin states in any isolated dot that is not subjected to any external field other than the global field. Furthermore, it was shown that if the coupled spin system is in thermal equilibrium and governed by Boltzmann statistics, then the energy $g\mu_B|B_{global}|$ is also equal to kT ln(1/p), where p is the probability of gate error caused by spins straying from the many-body ground state (which represents the correct gate result) into many-body excited states by absorbing phonons or magnons. (This result, although obvious for an isolated spin, is not obvious for a 3-spin system forming a NAND gate. Reference [5] proved this result rigorously.) Remarkably, this energy, kT ln(1/p), is the minimum energy that any irreversible gate must dissipate in a single logic operation as long as the gate is in thermodynamic equilibrium with the environment and the switching is carried out abruptly, without any time-modulated potential, by taking the system from one state to another.
The energy dissipated in a gate operation, as well as the strength of the global magnetic field, is therefore determined by how much gate error probability can be tolerated at a given temperature. If the error probability cannot exceed $10^{-15}$, then the energy dissipated in a gate operation will be $kT\ln(10^{15}) \approx 34.5kT$.
For nonadiabatic clocking, the energy dissipated in the clock will be ∼$CV^2$, where C is the capacitance of the clock pad and V is the amplitude of the clock pulse. This energy depends on the clocking mechanism. It will be very high for Bennett clocking and presumably much lower if we merely modulate the tunneling barrier between neighboring dots. In any case, it should be considerably larger than the thermal energy kT to protect against thermal noise [50]. Let us assume that the clock amplitude V is 10 times larger than the thermal voltage fluctuation on the clock pad, which is $\sqrt{kT/C}$, resulting in a signal-to-noise ratio of 10 : 1 or 20 dB. Therefore, the clock dissipation will be ∼100kT per cycle. In principle, this energy can be reduced to zero by using an RLC circuit (a resistor in series with a parallel combination of an inductor and a capacitor) to carry out the clocking, where the dot acts as the capacitor. The clock should be a sinusoid whose frequency is the resonant frequency of the RLC circuit. However, it is technologically challenging to string an inductor across a quantum dot of diameter ∼10 nm, making this somewhat impractical.
It should be clear now that there are two sources of dissipation in an SSL circuit: the clock and the gate. The former could dissipate about 100kT per clock cycle and the latter dissipates kT ln(1/p) per bit flip, which will be 34.5kT if we operate with a bit error probability of $10^{-15}$. Therefore, the total dissipation per clock cycle per bit is potentially ∼134.5kT, which is considerably less than the ∼50,000kT that present CMOS transistors dissipate [51].
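The arithmetic behind these estimates can be checked with a short Python sketch. It expresses every energy in units of kT and uses only the figures quoted above (error probability $10^{-15}$, a 10 : 1 clock signal-to-noise ratio, and ∼50,000kT for CMOS); the function names are illustrative, not taken from the cited references.

```python
import math


def gate_dissipation_kT(p_error: float) -> float:
    """Minimum gate dissipation kT*ln(1/p), in units of kT."""
    return math.log(1.0 / p_error)


def clock_dissipation_kT(voltage_snr: float) -> float:
    """Nonadiabatic clock dissipation C*V^2 in units of kT.

    With V = voltage_snr * sqrt(kT/C), C*V^2 = voltage_snr**2 * kT,
    so the pad capacitance drops out.
    """
    return voltage_snr ** 2


gate = gate_dissipation_kT(1e-15)   # about 34.5
clock = clock_dissipation_kT(10.0)  # 100
total = gate + clock                # about 134.5
cmos_kT = 50_000.0                  # figure quoted in the text

if __name__ == "__main__":
    print(f"gate  : {gate:.1f} kT")
    print(f"clock : {clock:.1f} kT")
    print(f"total : {total:.1f} kT")
    print(f"CMOS / SSL ratio: {cmos_kT / total:.0f}")
```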
2.6. The Speed of SSL. The speed of SSL (i.e., the maximum allowable clock frequency) is determined by four factors: (1) the speed with which an input bit can be written in an input port by the writing agent, (2) the speed with which an output bit can be read in an output port by the reader, (3) the gate switching speed, and (4) whether or not the architecture is pipelined. If the architecture is pipelined, then the clock speed is limited by the lowest of the other three speeds.

Pipelining in SSL.
Fortunately, SSL is a pipelined architecture. The clock in SSL not only propagates signals unidirectionally, but it also invariably makes the architecture pipelined. To understand this concept, consider the spin wire in Figure 3. The input bit is applied to the leftmost dot by aligning its spin in the up-direction with an external agent. This is done during the first clock cycle. In the next cycle, the potentials in the first two gate pads are raised to cause nearest neighbor exchange coupling between the first three dots, which then order their spins in the antiferromagnetic configuration. In the third cycle, the potential in the first gate pad is lowered, while that in the second gate pad is held, and that in the third gate pad is raised to cause nearest neighbor coupling between the second, third, and fourth dots. This ensures antiferromagnetic ordering within this latter trio, which successfully orients the fourth dot's spin antiparallel to the input spin. In the fourth cycle, the potential at the second gate pad is lowered, that in the third gate pad is held high, and that in the fourth gate pad is raised, which successfully transfers the input bit applied at the first dot to the fifth dot, thereby ensuring unidirectional signal propagation along the wire.
The point to note here is that as soon as the potential in the first gate pad is lowered in the third cycle, the first dot is decoupled from the chain, and the input applied to this dot can then be changed without affecting successful replication of the original input bit in the fifth dot as described above. In other words, the input can be changed during the fourth cycle regardless of how long the chain is. During the fifth clock cycle, when the first and second gate pads' potentials are raised again to exchange couple the first three dots, the original input bit has already propagated down the chain (to the sixth dot) and is decoupled from the input side, since the third gate potential has been lowered in the fifth cycle, which decouples the input side from the output side. Thus, the traveling bit will not be affected by the new input. In other words, a new input bit can be fed to the spin wire before the earlier input makes it to the very end of the wire. Therefore, the input bits can be pipelined. The reader should be able to determine that in this case, the input bit rate will be only one-third of the clock rate.
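The clocking sequence just described can be tabulated with a small Python sketch. It assumes, following the description above, that a pair of adjacent gate pads is raised at pads (1, 2) in the second cycle, that this raised pair advances by one pad per cycle, and that a new pair is launched every third cycle; the function name and the 1-based indexing are choices made here for illustration.

```python
def raised_pads(cycle: int, n_pads: int) -> set:
    """Return the gate pads held high during a given clock cycle.

    Pads and cycles are numbered from 1. Cycle 1 is reserved for
    writing the input bit, so no pad is raised then.
    """
    raised = set()
    for start in range(1, n_pads):
        # Cycle in which this coupling window sat at pads (1, 2).
        launch_cycle = cycle - (start - 1)
        # Windows are launched in cycles 2, 5, 8, ... (every third cycle).
        if launch_cycle >= 2 and (launch_cycle - 2) % 3 == 0:
            raised.update({start, start + 1})
    return raised


if __name__ == "__main__":
    for cycle in range(1, 8):
        print(cycle, sorted(raised_pads(cycle, n_pads=6)))
```

In this sketch the third pad is low during the fifth cycle, which is what isolates the traveling bit from the newly applied input, and a new window enters the wire once every three cycles, matching the one-third input bit rate.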
The pipelining, however, comes with a serious fabrication penalty, since gate pads must now be interposed between every pair of dots in order to apply a local potential independently between any chosen pair to exchange couple them. We call this scheme of clocking "granular clocking" since every pair has its own clock pad. This increases the fabrication complexity and cost and limits the bit density on a chip. However, the alternative is a nonpipelined architecture, which will be extremely slow and hence unacceptable.
One intriguing possibility to have the best of both worlds (pipelined and yet no separate clock pad for each pair) is to launch a guided electromagnetic wave in a waveguide built underneath a spin wire. When the crest of the wave arrives at a set of dots, the corresponding gate pad voltages are raised. Since we need to address two neighboring gate pads at a time, the wavelength of this wave should be roughly the distance spanned by four gate pads in order to maintain pipelining. This distance may be roughly 100 nm, requiring ultraviolet waves. This idea allows pipelining of data without requiring separate electrical connections to every gate pad and therefore appears to be very attractive. However, it is also fraught with some danger, since the magnetic field in the electromagnetic wave may interfere with the spin states.
Another possibility is to launch a traveling magnetic field pulse in a waveguide buried underneath the spin wire. This field is not collinear with the global field. A quantum dot positioned at the crest of this pulse experiences a net magnetic field that is at an angle with the global field. The spin in this dot will align with the local field and hence will be slanted with respect to the global field. If the input bit propagates synchronously with this pulse, it can propagate unidirectionally in the wake of the pulse. This method too does not require individual connections to every quantum dot to implement a pipelined architecture.

The Clock Speed in SSL.
Once it has been established that SSL is a pipelined architecture, we have to next determine the writing speed, the reading speed, and the gate switching speed in order to ascertain which is the slowest among them. The slowest speed will determine the maximum allowable clock speed.

Writing Speed.
The speed with which an input bit can be written in an input port depends on the flux density of the local field $B_{local}$. Reference [5] showed that this field must be strong enough that the Zeeman splitting it causes in the input dot is at least 20 times larger than the exchange coupling strength between dots. The latter can be about 1 meV in semiconductor dots [52]. Therefore, in InSb quantum dot systems,
$$g\mu_B B_{local} \geq 20\ \text{meV}, \quad \text{that is,} \quad B_{local} \gtrsim 7\ \text{Tesla}, \qquad (7)$$
where we have assumed the g-factor of bulk InSb, which is −51. The g-factor in quantum dots can be smaller in magnitude than in bulk, which will increase $B_{local}$. This analysis clearly shows that writing of bits calls for a Herculean feat, since generating ∼7 Tesla of magnetic field locally is a very tall order. There are some materials like InSb$_{1-x}$N$_x$ which reportedly have g-factors as large as 900 in the bulk [53]. Assuming that the same g-factor can be retained in quantum dots, the value of $B_{local}$ needs to be only ∼0.4 Tesla if one employs InSb$_{1-x}$N$_x$ quantum dots as hosts for the spins. Generating field strengths of this magnitude locally is still quite demanding.
The time required to complete the "writing" of input bits in isolated input dots is of the order of $h/(2g\mu_B B_{local})$. The value of $g\mu_B B_{local}$ in (7) yields the writing time as ∼0.1 ps, which is indeed very fast and clearly will not be the limiting factor for clock speed.
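The field and timing estimates above follow from the stated inputs (exchange strength of 1 meV, a Zeeman splitting 20 times larger, and g-factors of −51 and 900). A minimal Python sketch, assuming standard values for the Bohr magneton and Planck's constant and using hypothetical function names, is:

```python
MU_B_EV_PER_T = 5.7883818e-5   # Bohr magneton in eV/T
H_EV_S = 4.135667696e-15       # Planck constant in eV*s


def local_field_tesla(g_factor: float,
                      exchange_ev: float = 1e-3,
                      zeeman_to_exchange: float = 20.0) -> float:
    """Field for which g*mu_B*B equals zeeman_to_exchange * J."""
    zeeman_ev = zeeman_to_exchange * exchange_ev
    return zeeman_ev / (abs(g_factor) * MU_B_EV_PER_T)


def writing_time_s(exchange_ev: float = 1e-3,
                   zeeman_to_exchange: float = 20.0) -> float:
    """Writing time h / (2 * g*mu_B*B_local) for the same splitting."""
    zeeman_ev = zeeman_to_exchange * exchange_ev
    return H_EV_S / (2.0 * zeeman_ev)


if __name__ == "__main__":
    print(f"InSb (g = -51): {local_field_tesla(-51):.2f} T")
    print(f"g = 900       : {local_field_tesla(900):.2f} T")
    print(f"writing time  : {writing_time_s() * 1e12:.2f} ps")
```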

2.6.4. Reading.
There are many strategies to "read" the spin polarization of single electrons in quantum dots [11][12][13], among which the scheme of [13] is best suited to SSL. In [13], the reading time was of the order of a millisecond. This time is determined by the speed with which electrons can tunnel in and out of the dot, and therefore one should be able to increase this speed dramatically with better engineered structures. Again, this should not be the limiting factor to determine clock speed.

Gate Switching Speed.
The gate switching speed is determined by how long it takes for a gate to complete a logic operation. That, in turn, depends on how fast the coupled spin system can relax to the ground state when coupled with the external thermal bath. This time is much shorter than the spin relaxation time of a single isolated spin for essentially the same reasons that the ensemble averaged spin dephasing time of many interacting spins is orders of magnitude shorter than the dephasing time of a single isolated spin [54,55]. There are no reports of any measurement of spin relaxation times in coupled (as opposed to isolated) quantum dots. However, there are numerous ways to shorten this time, for example, by implanting magnetic impurities in the barriers. It should be possible to reduce this time to ∼1 ns.
It is clear now that among the three speeds, the gate switching speed and the reading speed are the slowest and therefore will determine the clock speed. Assuming reading times and gate switching times of ∼1 nanosecond, the maximum clock frequency will be ∼1/(1 ns) = 1 GHz.

The Gate Error Probability in SSL.
There are two types of gate error in SSL: (1) the intrinsic error caused by the coupled spin system in a gate occupying thermally excited states instead of the ground state with probability p; (2) the extrinsic error caused by a spin in a dot flipping spontaneously during a clock period (due to coupling with the environment). The probability of the latter is given by
$$p_{extrinsic} = 1 - e^{-T_c/T_1}, \qquad (9)$$
where $T_c$ is the clock period and $T_1$ is the spin flip time of an isolated spin. Spin flip times of an isolated spin as long as 1 second have been demonstrated in GaAs quantum dots at very low temperatures of 120 mK [56] and in organic nanostructures at much higher temperatures of 100 K [57].
Assuming $T_c$ = 1 ns and $T_1$ = 1 s at the operating temperature, $p_{extrinsic} \approx 10^{-9}$, which is acceptable.
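As a quick numerical check of this estimate, the sketch below evaluates the spontaneous flip probability under the assumption that flips follow a simple exponential decay law, $1 - e^{-T_c/T_1}$; that functional form and the function name are assumptions made for illustration.

```python
import math


def extrinsic_error(clock_period_s: float, spin_flip_time_s: float) -> float:
    """Probability that an isolated spin flips within one clock period.

    Uses 1 - exp(-Tc/T1); expm1 keeps precision when Tc << T1.
    """
    return -math.expm1(-clock_period_s / spin_flip_time_s)


if __name__ == "__main__":
    p = extrinsic_error(clock_period_s=1e-9, spin_flip_time_s=1.0)
    print(f"p_extrinsic = {p:.3e}")
```

For $T_c \ll T_1$ the result is very close to the ratio $T_c/T_1$, which is why a 1 ns clock period and a 1 s spin flip time give a probability of about $10^{-9}$.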
2.8. The Temperature of Operation of SSL. Reference [5] showed that if we want a fixed intrinsic error probability p, then the temperature of operation is determined by the condition (the condition for SSL to work is that $J > g\mu_B B/4$)
$$kT\ln(1/p) = g\mu_B|B_{global}| = 2J, \qquad (10)$$
where J is the energy of exchange coupling between neighboring dots. Assuming J = 1 meV, which is achievable with today's quantum dot technology [52], the maximum operating temperature turns out to be ∼1 K if we operate with an intrinsic error probability of $10^{-9}$. This is a very low temperature and requires He-3 cooling, which is a serious disadvantage and essentially precludes SSL from being a serious contender for general purpose computing (although niche applications are still a possibility). Room temperature operation with such low error probability would have required exchange coupling strengths in excess of 300 meV, which is not presently achievable with semiconductor quantum dot technology.
Had we operated at room temperature with the presently achievable J = 1 meV, then the bit error probability would have been $p = e^{-2J/kT}$ = 92.6%, which is clearly unacceptable. At a temperature of 4.2 K (which requires He-4 cooling instead of the more demanding He-3 cooling), the bit error probability would have been $4 \times 10^{-3}$, which may be acceptable in some situations if significant error correction resources are available.
A recent development has altered this prognosis dramatically. It has been shown that graphene nanoflakes can implement SSL-type logic gates with much higher exchange interaction strength (2J = 180 meV), which allows room-temperature operation with a bit error probability $p = e^{-2J/kT}$ = 0.1% [58]. This is a very exciting and promising route for SSL and may revive interest in SSL, since it establishes a clear pathway for practical implementation.
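The error probabilities quoted above can be reproduced from the relation $p = e^{-2J/kT}$. The Python sketch below assumes room temperature means 300 K and uses a standard value of the Boltzmann constant; the function names are illustrative.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def intrinsic_error(two_j_ev: float, temperature_k: float) -> float:
    """Bit error probability p = exp(-2J/kT)."""
    return math.exp(-two_j_ev / (K_B_EV_PER_K * temperature_k))


def max_temperature_k(two_j_ev: float, p_error: float) -> float:
    """Temperature at which exp(-2J/kT) equals the target error probability."""
    return two_j_ev / (K_B_EV_PER_K * math.log(1.0 / p_error))


if __name__ == "__main__":
    print(f"J = 1 meV, 300 K : p = {intrinsic_error(2e-3, 300.0):.3f}")
    print(f"J = 1 meV, 4.2 K : p = {intrinsic_error(2e-3, 4.2):.2e}")
    print(f"2J = 180 meV, 300 K: p = {intrinsic_error(0.18, 300.0):.2e}")
    print(f"T_max for p = 1e-9, J = 1 meV: {max_temperature_k(2e-3, 1e-9):.2f} K")
```

The same function shows directly why the exchange strength is the decisive parameter: raising 2J from 2 meV to 180 meV moves the room-temperature error probability from above 90% to roughly one in a thousand.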
Equation (10) also yields the value of the global dc magnetic field required for operating at 1 K with an error probability of $10^{-9}$. In an InSb quantum dot with g = −51, $|B_{global}|$ will be 0.7 Tesla, which is easily achieved. If the quantum dot material has a g-factor of 900 [53], then the required strength of $|B_{global}|$ is only 0.04 Tesla. These field strengths can be easily achieved with permanent magnets.

Current Experimental Status of SSL.
To our knowledge, SSL has never been demonstrated experimentally, but the pathways to low temperature demonstration are clear. This architecture requires the delineation of an array of quantum dots, each containing a single electron, in specific topological patterns on a wafer. Neighboring dots must be spaced within ∼10 nm to allow significant exchange coupling between nearest neighbor spins, and gate pads must be inserted between every pair of dots to allow clocking. The lithography is undoubtedly challenging, but not daunting to the point of being unrealistic.
Numerous groups have demonstrated arrays of quantum dots with single electron occupancy [9], and manipulation of single electron spins in isolated quantum dots has also been demonstrated by a number of groups recently [14][15][16][17][18][19][20][21][22][23][24][25][26][27]. These results inspire hope that SSL, which only requires single electron dots with nearest neighbor exchange coupling, is within the reach of current technology. The only major challenge is the alignment of gate pads between every pair of dots with a high degree of reliability. Recent demonstration of field effect transistors with 6 nm gate length [59] shows that lithography is advancing to the level where such challenges can be met.

Nanomagnetic Logic: Computing with Spin Ensembles
The major drawback of SSL is that it requires cryogenic operation because (i) exchange interaction between spins confined in semiconductor quantum dots is very weak, and yet it has to exceed the thermal energy kT many times over in order to have a small error probability p (see (10)); (ii) higher temperatures increase the spontaneous spin flip rate $1/T_1$ dramatically and hence increase the extrinsic error probability $p_{extrinsic}$ rapidly (see (9)). These two limitations make SSL a low-temperature technology. Therefore, it behooves us to look at other systems that behave like SSL but are much more error-resilient (have much smaller $p_{extrinsic}$ at any temperature) and do not necessarily operate with exchange interaction. One such system is an array of single-domain nanomagnets, each consisting of roughly $10^4$ spins, all of which rotate or flip in unison under external stimuli. Thus, all the ∼$10^4$ spins act like one giant classical spin with ∼$10^4$ times the magnetic moment [60,61]. The single-domain nanomagnets interact with each other via dipole coupling, which can easily be ∼1000 times stronger than exchange coupling. Furthermore, the magnetization of a nanomagnet is much more stable than the spin polarization of a single electron, that is, $p_{extrinsic}$ is much smaller at any given temperature. One can replicate SSL with nanomagnets instead of single electron spins. These systems have been termed magnetic quantum cellular automata [62,63] and are essentially nanomagnetic versions of SSL, with a single-domain nanomagnet replacing a single spin and dipole interaction replacing exchange interaction.
While a single electron's spin is made bistable by placing it in a magnetic field, a nanomagnet's magnetization orientation cannot be made bistable in the same fashion. Instead, one can make the shape of the magnet "anisotropic," as in an elliptical cylinder whose major axis dimension exceeds that of the minor axis. Because of the anisotropic shape, the magnetization vector of this magnet has two (mutually antiparallel) stable orientations along the major axis, which is called the "easy axis" since it is easier for the magnetization to align along this axis than along any other direction. Only these two orientations are stable because of the so-called "shape anisotropy energy" of the magnet, which makes the minimum energy state correspond to magnetization alignment along the easy axis. Thus, just like the spin of a single electron placed in a magnetic field, a shape-anisotropic nanomagnet has two stable states: parallel and antiparallel to the easy axis. Unlike in the case of a single spin, however, where the stable and metastable states were not energetically degenerate and were separated by the Zeeman splitting energy $g\mu_B B$, here the two states are energetically degenerate. We can intentionally make them nondegenerate by applying a magnetic field along the easy axis, but that is not necessary.
The minimum energy barrier separating the two stable states in a shape-anisotropic single-domain nanomagnet is related to the degree of shape anisotropy and is given by
$$E_b = \frac{\mu_0}{2} M_s^2 \Omega \left( N_{d-yy} - N_{d-zz} \right), \qquad (12)$$
where $\mu_0$ is the permeability of free space, $M_s$ is the saturation magnetization of the magnet per unit volume (∼5 × $10^5$ A/m for common materials like nickel and cobalt), Ω is the nanomagnet's volume, and $N_{d-yy}$, $N_{d-zz}$ are the demagnetization factors along the y- and z-axes, respectively.
The demagnetization factors for elliptical cylinders are given in [64] as functions of a, b, and l, where a is the major axis, b is the minor axis, and l is the thickness of the nanomagnet. Note that Ω = (π/4)abl. If we choose a = 105 nm, b = 95 nm, and l = 6 nm, then the minimum energy barrier in a nickel or cobalt nanomagnet shaped like an elliptical cylinder is ∼34kT at room temperature.
The probability that the magnetization of the shape-anisotropic magnet will spontaneously flip in a period of time τ is $p^{magnet}_{extrinsic} = 1 - e^{-\tau/\tau_r}$, where $\tau_r$ is the "magnetic retention time" given by $\tau_r = \tau_0 \exp[E_b/kT]$, with $\tau_0$ being the inverse of the "attempt frequency," which is between 1 ps and 1 ns [65]. Therefore, at room temperature (kT = 26 meV), $\tau_r$ is between 588 and 588,000 seconds since $E_b$ = 34kT. For τ = 1 ns (or a clock frequency of 1 GHz), we have the condition $\tau \ll \tau_r$. That makes $p^{magnet}_{extrinsic} \approx \tau/\tau_r = (\tau/\tau_0)e^{-E_b/kT} \sim e^{-E_b/kT}$, which is $e^{-34} = 1.7 \times 10^{-15}$ at room temperature. Thus, clearly, room temperature operation is possible with very high error-resilience. Note that $p^{magnet}_{extrinsic} \ll p^{single\text{-}spin}_{extrinsic}$. This is what makes a nanomagnet, consisting of many interacting spins, much more robust than a single isolated spin.
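A short Python sketch of the retention-time estimate follows. It assumes the Arrhenius-type relation $\tau_r = \tau_0 e^{E_b/kT}$ stated above together with an exponential decay law for the flip probability; the function names are hypothetical, and the barrier is entered directly in units of kT.

```python
import math


def retention_time_s(barrier_kT: float, tau0_s: float) -> float:
    """Magnetic retention time tau_r = tau_0 * exp(E_b / kT)."""
    return tau0_s * math.exp(barrier_kT)


def magnet_flip_probability(tau_s: float, barrier_kT: float, tau0_s: float) -> float:
    """Probability of a spontaneous flip within time tau: 1 - exp(-tau / tau_r)."""
    return -math.expm1(-tau_s / retention_time_s(barrier_kT, tau0_s))


if __name__ == "__main__":
    for tau0 in (1e-12, 1e-9):
        tau_r = retention_time_s(34.0, tau0)
        p = magnet_flip_probability(1e-9, 34.0, tau0)
        print(f"tau0 = {tau0:.0e} s: tau_r = {tau_r:.3g} s, p(1 ns) = {p:.2e}")
```

Because the barrier enters exponentially, small changes in $E_b/kT$ shift the retention time noticeably, so the printed values should be read as order-of-magnitude estimates.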
It will be natural to assume that if a single-domain nanomagnet contains ∼$10^4$ spins, then the energy dissipated in flipping the magnetization of the magnet will be ∼$10^4$ times higher than in flipping a single spin, that is, the minimum dissipation will be NkT ln(1/p), where N (∼$10^4$) is the number of spins in the magnet and p is the probability of spontaneously flipping a single spin ($p = p^{single\text{-}spin}_{extrinsic} \gg p^{magnet}_{extrinsic}$).
The authors of [61] have shown this assumption to be flawed. In a single-domain magnet, all the N spins collectively behave as one giant single spin [60] and rotate together in unison because the strong exchange interaction among them keeps them mutually parallel at all times. As long as the exchange interaction strength is much larger than kT, this will happen at any temperature T. Thus, there is a single degree of freedom for the spins and not N independent degrees of freedom. As a result, the minimum energy dissipated to switch a single spin and the minimum energy dissipated to switch a single magnet consisting of many spins are roughly the same, that is, in both cases, this energy is ∼kT ln(1/p) and not NkT ln(1/p). This remarkable result makes the idea of replacing a single spin with a single magnet worth pursuing.
The above discussion reveals why magnet-based switches are potentially much more energy efficient than transistor-based switches. In a nanotransistor, where there are N charges (information carriers) in the channel, the minimum energy dissipation will indeed be NkT ln(1/p) because each charge represents an independent degree of freedom, but in a single-domain nanomagnet, it can be only ∼kT ln(1/p). Thus, the magnet has an intrinsic advantage over the transistor, particularly when $N \gg 1$. To summarize, there are two reasons why magnets may replace transistors in digital logic systems: (1) the elimination of the $I^2R$ dissipation (in principle, no current flow should be needed to switch a magnet), and (2) the collective interaction between spins, which makes the minimum energy dissipation in a magnet much less than that in a transistor when both contain the same number of information carriers (electron charges or electron spins). Magnets also suffer from no "leakage," unlike transistors, which increases their energy efficiency even more.

Switching a Nanomagnet: Penny-Wise and Pound-Foolish.
There are two sources of energy dissipation in switching nanomagnets: (1) the internal energy dissipated when the magnetization flips (its minimum value is kT ln(1/p), but the actual value may be somewhat higher); (2) the energy dissipated in the switching circuitry, which depends on the method of switching.
The internal energy dissipation in a magnet is typically small because of the collective interaction between spins as discussed, but unless one is judicious in the choice of the switching methodology, the energy dissipated in the external switching circuit may become overwhelming and completely erase the magnet's advantage over the transistor. Thus, in order to avoid being penny-wise and pound-foolish, one must employ energy efficient switching strategies for flipping the magnetizations of single-domain shape-anisotropic nanomagnets.
The traditional method of switching nanomagnets is to generate a local magnetic field in the vicinity of a magnet with a current [63,66]. The current flows in a loop circling the magnet. The magnetic field H generated by this current is given by Ampere's law:
$$I = \oint \vec{H} \cdot d\vec{l}, \qquad I_{min} = 2\pi r H_{min}, \qquad (14)$$
where the line integral is taken around the loop of radius r in which the current I flows. The last equation relates the minimum current $I_{min}$ needed to flip the magnetization to the minimum magnetic field $H_{min}$ that can overcome the energy barrier $E_b$ in (12) and make the magnetization switch from one stable state to the other. We can estimate $H_{min}$ by equating the magnetic energy of this field to the energy barrier:
$$\mu_0 M_s H_{min} \Omega = E_b, \qquad (15)$$
where Ω is the nanomagnet's volume. We will assume that $E_b$ = 30kT at room temperature (this makes the error probability associated with spontaneous flipping of magnetization $e^{-30} \approx 10^{-13}$ at room temperature) and $M_s = 10^5$ A/m (typical for cobalt or nickel). If the nanomagnet is shaped like an elliptical cylinder, the dimensions that yield this value of $E_b$ (see (12) and (13)) are a = 105 nm, b = 95 nm, and l = 6 nm.

Hence, Ω = (π/4)abl = 47,000 nm$^3$. Equation (15) then yields the value of $|H_{min}|$ as 21,262 A/m = 267 Oe. From (14), we get $I_{min}$ = 13 mA, assuming the loop radius to be 100 nm. Therefore, the energy dissipated to flip a bit per clock cycle (assuming a switching time Δt of 1 ns) is $I_{min}^2 R \Delta t$ = 1.7 pJ = 4 × $10^8$ kT at room temperature, assuming the resistance of the loop to be 10 ohms. This is two orders of magnitude larger than the energy dissipated to switch a transistor in a circuit with a switching delay of the same 1 ns. Therefore, this method of switching nanomagnets (generating a local magnetic field with a current) is clearly energy-inefficient and must be avoided.
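The chain of estimates above can be retraced numerically. The sketch below assumes that the field energy balance takes the form $\mu_0 M_s H_{min} \Omega = E_b$ and that Ampere's law around a circular loop gives $I_{min} = 2\pi r H_{min}$; these forms are assumptions chosen because they reproduce the quoted values to within rounding, and room temperature is taken as 300 K.

```python
import math

MU_0 = 4.0e-7 * math.pi        # permeability of free space, H/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
T_ROOM = 300.0                 # assumed room temperature, K

# Parameters quoted in the text
a, b, l = 105e-9, 95e-9, 6e-9  # ellipse axes and thickness, m
M_S = 1e5                      # saturation magnetization, A/m
BARRIER_KT = 30.0              # energy barrier in units of kT
LOOP_RADIUS = 100e-9           # m
LOOP_RESISTANCE = 10.0         # ohm
SWITCH_TIME = 1e-9             # s

kT = K_B * T_ROOM
volume = (math.pi / 4.0) * a * b * l              # elliptical cylinder volume
h_min = BARRIER_KT * kT / (MU_0 * M_S * volume)   # A/m
i_min = 2.0 * math.pi * LOOP_RADIUS * h_min       # A
energy_j = i_min ** 2 * LOOP_RESISTANCE * SWITCH_TIME

if __name__ == "__main__":
    print(f"volume = {volume * 1e27:.0f} nm^3")
    print(f"H_min  = {h_min:.0f} A/m")
    print(f"I_min  = {i_min * 1e3:.1f} mA")
    print(f"energy = {energy_j * 1e12:.2f} pJ = {energy_j / kT:.2e} kT")
```

Since the dissipated energy scales as the square of the current, even a modest reduction in the required field would lower the energy substantially, but not by the many orders of magnitude needed to compete with a transistor.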
A second method of switching nanomagnets is to pass a spin-polarized current through them. This either delivers a spin transfer torque [67][68][69][70][71] or induces domain wall motion [72], resulting in a magnetization flip. The energy dissipated in this method is also of the order of $10^8$ kT [73], although there is a report of switching a nanomagnet with domain wall motion in ∼2 ns while dissipating only about $10^4$ kT of energy [74]. Nonetheless, these methods unfortunately do not make magnetic switches energy efficient enough to be poised to replace transistors and thereby merit serious attention. It is therefore imperative to find better schemes for switching magnets, since the switching circuitry has turned out to be the Achilles' heel.

Hybrid Spintronics and Straintronics.
Recently, we devised an extremely energy efficient scheme for switching nanomagnets that employs multiferroics. This actually raises hopes that nanomagnets may indeed some day replace transistors as binary switches in digital logic circuits. Multiferroics [75] are sometimes multiphase materials, for example, a bilayer consisting of a single-domain magnetostrictive (magnet) layer overlying a piezoelectric layer. Consider the elliptical multiferroic in Figure 5. A voltage applied across the piezoelectric layer as shown generates uniaxial stress along the major axis of the piezoelectric through $d_{31}$ coupling, provided the entire multiferroic structure is clamped to prevent expansion and contraction along the in-plane hard axis (minor axis of the ellipse). The associated strain is transferred elastically to the magnetostrictive layer, generating stress in it and rotating its magnetization by large angles [76][77][78][79][80][81][82][83][84]. If the strain is withdrawn at the right juncture, rotation by ∼180° is possible with >99.99% probability even in the presence of thermal noise at room temperature [85]. The switching takes less than 1 ns to complete, making this strategy one of the most energy efficient, and yet relatively fast, switching methodologies extant. Because we are rotating spins within the magnet with electrically generated strain, we have termed this approach hybrid spintronics and straintronics [83]. We will discuss this next.
Consider the magnet in Figure 5 shaped like an elliptical cylinder whose cross-section is in the y-z plane. The z-axis is along the major axis of the ellipse and is the easy axis of magnetization. The stable magnetization orientations are of course along the ±z-axis. There are two hard axes: the y-axis is the in-plane hard axis and the x-axis is the out-of-plane hard axis. Because the thickness of the magnet is much smaller than the lateral dimensions, the x-axis will be "harder" than the y-axis.
We will adopt spherical coordinates for analysis and assume that the magnetization vector's direction is the radial direction. Hence, the magnetization orientation is specified by the coordinates (r, θ, φ), where r is fixed. The polar angle θ is the angle subtended by the magnetization vector with the +z-axis, and the azimuthal angle φ is the angle subtended by the projection of the vector on the x-y plane with the +x-axis. Thus, [θ = 0°, 180°] corresponds to the stable orientations along the easy axis, while [φ = 90°, 270°] corresponds to the plane of the magnet. The coordinate system is shown in Figure 5.
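The total potential energy of the shape-anisotropic magnetostrictive nanomagnet is the sum of the shape- and stress-anisotropy energies:
$$E(t) = \underbrace{\frac{\mu_0}{2} M_s^2 \Omega \left[ N_{d-xx}\sin^2\theta(t)\cos^2\phi(t) + N_{d-yy}\sin^2\theta(t)\sin^2\phi(t) + N_{d-zz}\cos^2\theta(t) \right]}_{\text{shape anisotropy energy}} \; \underbrace{-\, \frac{3}{2}\lambda_s \sigma(t)\, \Omega \cos^2\theta(t)}_{\text{stress anisotropy energy}},$$
where $\lambda_s$ is the magnetostrictive coefficient and σ(t) is the time-dependent stress. We assume the magnet to be polycrystalline so that we can ignore magnetocrystalline anisotropy energy.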
Because of the inequality $N_{d-xx} \gg N_{d-yy} > N_{d-zz}$, it is clear that in the absence of stress, the minimum energy configurations are θ = 0°, 180° and φ = 90°, 270°. Therefore, the stable orientations of the unstressed shape-anisotropic nanomagnet's magnetization are along the ±z-axis. However, in the presence of stress, the minimum energy orientation will shift to θ = 90° and φ = 90°, 270° if the product $\lambda_s \sigma(t)$ is negative and the stress is sufficiently high to overcome the shape anisotropy energy barrier $E_b$, that is, to make $(3/2)|\lambda_s \sigma(t)|\Omega > E_b$. The potential energy profile as a function of the polar angle is shown in Figure 6 for φ = 90°. Note that by applying sufficient stress, one can move the potential energy minimum from θ = 0°, 180° to θ = 90° in the magnet's plane. In other words, a sufficient amount of stress will rotate the magnetization from the easy axis to the in-plane hard axis.
[Figure 5 caption, partial: ... that prevent expansion/contraction of the multiferroic in the y-direction. An electrostatic potential applied across the piezoelectric generates uniaxial strain in that layer, which is transferred almost entirely to the magnetostrictive layer if the latter layer is much thinner than the former. This will generate uniaxial stress in the magnet and rotate its magnetization away from the stable z-axis (easy axis) towards the y-axis (in-plane hard axis), ultimately resulting in the magnetization flipping if the voltage is turned off as soon as the magnetization vector enters the x-y plane.]
[Figure 6: Potential energy versus polar angle θ (0° to 180°) for increasing stress.]
If the voltage is turned off (and stress withdrawn) as soon as θ reaches 90°, then the torque resulting from the out-of-plane motion of the magnetization vector will continue to rotate the magnetization past θ = 90° and make it approach θ = 180°, resulting in a "flip."
In the above discussion, we have avoided some subtle issues. For example, if the initial orientation of the magnetization vector is exactly along the easy axis, then no amount of stress can budge it, since the torque on the magnetization vector, which is proportional to the gradient of the energy E(θ(t), φ(t)) in θ- and φ-space, vanishes. However, thermal fluctuations can dislodge the magnetization vector slightly from the easy axis, whereupon the torque resulting from stress and shape anisotropy rotates the magnetization vector away from the easy axis towards the in-plane hard axis and ultimately accomplishes switching.
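The shift of the energy minimum under stress can be illustrated with a toy Python model. It assumes that, within the magnet's plane (φ = 90°) and up to an additive constant, the energy can be written as $E(\theta) = E_b \sin^2\theta + S \cos^2\theta$, where $S = (3/2)|\lambda_s\sigma|\Omega$ is the stress anisotropy energy for negative $\lambda_s\sigma$. This functional form, the symbol S, and the function names are assumptions introduced for illustration; energies are in units of $E_b$.

```python
import math


def in_plane_energy(theta_deg: float, stress_energy: float, barrier: float = 1.0) -> float:
    """Toy in-plane energy: shape term barrier*sin^2 plus stress term S*cos^2."""
    theta = math.radians(theta_deg)
    return barrier * math.sin(theta) ** 2 + stress_energy * math.cos(theta) ** 2


def minimum_angles(stress_energy: float, barrier: float = 1.0) -> list:
    """Return the polar angles (whole degrees, 0..180) of minimum energy."""
    angles = list(range(0, 181))
    energies = [in_plane_energy(a, stress_energy, barrier) for a in angles]
    e_min = min(energies)
    return [a for a, e in zip(angles, energies) if abs(e - e_min) < 1e-12]


if __name__ == "__main__":
    for s in (0.0, 0.5, 2.0):
        print(f"S/E_b = {s}: minima at {minimum_angles(s)} degrees")
```

In this model the minima sit at 0° and 180° while S is below the barrier and jump to 90° once S exceeds it. The model captures only the static energy landscape, not the out-of-plane dynamics that carry the magnetization past 90° after the stress is withdrawn.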

Nanomagnetic Logic.
A logic system has two components: (1) universal logic gates such as NAND or NOR and (2) a unidirectional "wire" for ferrying logic bits without feedback from the input stage to the output. These two components are sufficient to implement any combinational or sequential logic circuit.
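To make the universality claim concrete, the following sketch builds NOT, AND, OR, and XOR purely out of a two-input NAND function. It is an abstract Boolean illustration only; the function names are chosen here for convenience and nothing in it models the nanomagnets themselves.

```python
def nand(a, b):
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)

def not_(a):
    # NOT is a NAND with both inputs tied together.
    return nand(a, a)

def and_(a, b):
    # AND is an inverted NAND.
    return not_(nand(a, b))

def or_(a, b):
    # De Morgan: a OR b = NAND(NOT a, NOT b).
    return nand(not_(a), not_(b))

def xor(a, b):
    # Standard four-NAND construction of XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Print the truth table of each derived gate.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b), and_(a, b), or_(a, b), xor(a, b))
```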
Universal Gate. A NAND gate can be implemented in a way reminiscent of the approach adopted in SSL, and a specific nanomagnetic implementation of a NAND gate with fan-in and fan-out is shown in Figure 7. The array is placed in a global magnetic field $B_{\mathrm{global}}$ such that the magnetostatic energy due to this field, $M_s B_{\mathrm{global}} \Omega$ (where $M_s$ is the saturation magnetization of the magnet per unit volume and $\Omega$ is the magnet volume), is smaller than the shape anisotropy energy and dipole interaction energy. Because of the specific layout employed, dipole interaction between the magnets ensures that the output bit is always the NAND function of the two input bits for any of the four input combinations (0, 0), (0, 1), (1, 0), and (1, 1) [86]. Bits will propagate unidirectionally through this gate if the four groups of magnets (classified into groups I, II, III, and IV) are clocked sequentially with a sinusoidal 4-phase clock whose phases are shifted from each other by 90° [86].
The internal energy dissipated in the four magnets constituting the basic NAND gate is ∼500kT at room temperature per clock cycle and the energy dissipated in the entire 12-magnet array to perform one logic operation is ∼1250kT [86]. The energy dissipated in the clocking circuit is negligible in comparison and can be made essentially zero if the clocking is performed with a parallel LC circuit with a resistance R in series, where the clock frequency is the resonant frequency of the LC circuit [86].
Logic Wire. A logic wire is implemented with a linear array of nanomagnets where the line joining the centers of adjacent magnets is parallel to the in-plane hard axis of the magnets (see, e.g., the three magnets for fan-in in Figure 7). Bits are propagated unidirectionally through the wire (or chain) by stressing the magnets sequentially pairwise using a 3-phase clock just as in the case of SSL. This implements Bennett clocking for unidirectional logic bit propagation. The stress rotates the magnetization of any magnet by 90°, aligning it temporarily along the in-plane hard axis just as shown in Figure 4. Reference [81] has shown rigorously that Bennett clocking by this method is not only possible, but consumes very little energy per bit in every clock cycle. The voltage required to rotate the magnetization by ∼90° is about 200 mV if the magnetostrictive material is nickel (weakly magnetostrictive) and roughly 10 mV if the magnetostrictive material is Terfenol-D (strongly magnetostrictive) [81].
In order to calculate the energy dissipation in Bennett clocking as a function of switching speed, one needs to solve the time-dependent problem of switching (or magnetization dynamics) using the Landau-Lifshitz-Gilbert (LLG) equation [87] that describes the magnetization dynamics. Stress acts like an effective magnetic field that gives rise to two kinds of motion: (1) precessional motion about the field (which will lift the magnetization vector out of the plane of the magnet) and (2) damping motion that will tend to align the magnetization along the effective field. The precessional motion is nondissipative while the damping motion is dissipative. Materials like nickel have small damping because of weak coupling to dissipative processes, while Terfenol-D has much stronger damping. However, Terfenol-D has much stronger magnetostriction and hence requires much less stress than nickel to switch. As a result, Terfenol-D is much more energy efficient than nickel when used in the magnetostrictive layer of a multiferroic switch.
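The two kinds of motion can be seen in a minimal macrospin sketch of the LLG equation. It is written in the Landau-Lifshitz form with dimensionless time and a constant effective field along +z, integrated with a simple explicit Euler step followed by renormalization. The field direction, the time step, and the damping values used below are illustrative assumptions, not parameters taken from the references, and the sketch omits stress, shape anisotropy, and thermal noise.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def llg_step(m, h, alpha, dt):
    """One Euler step of dm/dt = -[m x h + alpha * m x (m x h)] / (1 + alpha^2).

    The first term is the precession about h (nondissipative); the second is
    the damping term that pulls m towards h (dissipative).
    """
    prefactor = 1.0 / (1.0 + alpha ** 2)
    precession = cross(m, h)
    damping = cross(m, precession)
    new = tuple(mi - dt * prefactor * (p + alpha * d)
                for mi, p, d in zip(m, precession, damping))
    norm = math.sqrt(sum(c * c for c in new))
    return tuple(c / norm for c in new)  # keep |m| = 1

def relax(alpha, steps=20000, dt=0.01):
    """Start with m along +x and let it relax in a unit field along +z."""
    m = (1.0, 0.0, 0.0)
    h = (0.0, 0.0, 1.0)
    for _ in range(steps):
        m = llg_step(m, h, alpha, dt)
    return m

# Hypothetical damping constants: a weakly damped and a strongly damped magnet.
for alpha in (0.01, 0.1):
    print(alpha, relax(alpha, steps=2000))
```

With the same field and the same elapsed time, the more strongly damped magnet ends up closer to the field direction, while the weakly damped one mostly precesses.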

Nanomagnetic Memory.
A memory element implemented with a multiferroic nanomagnet is shown in Figure 8. The bit information is stored in the magnetization orientation of the soft magnetostrictive magnet shaped like an elliptical cylinder. The two (mutually antiparallel) orientations along the major axis are the stable states and encode bits 0 and 1. The reading and writing schemes are described in the caption of Figure 8. The memory is addressed via a cross-bar architecture shown in the right panel of Figure 8. When writing bits, the voltage applied between the crossbars should be able to not only rotate the magnetization, but rotate it by ∼180°, resulting in a bit flip. This is indeed possible if we withdraw the stress as soon as the projection of the magnetization vector on the magnet's plane reaches close to the in-plane hard axis, that is, the magnetization vector enters the plane defined by the in-plane and out-of-plane hard axes. Not only is this possible at 0 K, but solution of the stochastic Landau-Lifshitz-Gilbert equation [88] has shown that it is possible at room temperature as well, despite thermal noise [89].

Energy Dissipation in Straintronics.
Reference [89] and the later work by our group have shown that the total energy dissipated per bit flip in hybrid spintronic/straintronic memory is about 400kT at room temperature if we switch in ∼1 ns. We can reduce this energy by a factor of 10 or more if we switch slower, for example, in 10 ns. Thus, in a chip with $10^8$ logic switches per square centimeter, the power dissipated is 0.17 mW/cm² at a clock rate of 100 MHz, if 10% of the devices switch at any given time (10% activity level); this figure corresponds to roughly 40kT dissipated per bit flip. This opens up unprecedented applications. Chips with such low-power requirements can run by scavenging energy from the environment without requiring a battery. There are numerous energy harvesting schemes that can harvest this level of energy from energy radiated by cable TV, 3G networks, and environmental vibrations [90][91][92][93][94]. Furthermore, devices of this type are ideally suited for medically implanted devices, such as processors implanted in an epileptic patient's brain that monitor brain signals and warn of an impending seizure. These processors can run by harvesting energy from the patient's head movements or from electromagnetic radiation in the environment, without ever requiring a battery. Another possible application of such processors is in distributed sensor networks for structural health monitoring that can run off the power harvested from mechanical vibrations in the structure (buildings, bridges) induced by wind or passing traffic.
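The power estimate quoted above can be checked with a few lines of arithmetic. The sketch assumes room temperature means T = 300 K and that the 0.17 mW/cm² figure uses the reduced energy of about 40kT per bit flip (400kT divided by 10 for the slower 10 ns switching); the function name and its parameters are chosen here for illustration.

```python
K_B = 1.380649e-23  # Boltzmann constant in J/K

def power_density_w_per_cm2(energy_kt, density_per_cm2, clock_hz,
                            activity, temperature_k=300.0):
    """Power per cm^2 = (energy per flip) x (devices per cm^2)
    x (fraction switching per cycle) x (cycles per second)."""
    energy_joule = energy_kt * K_B * temperature_k
    return energy_joule * density_per_cm2 * activity * clock_hz

# Assumed slow-switching case: ~40 kT per flip, 1e8 devices/cm^2, 100 MHz, 10% activity.
slow = power_density_w_per_cm2(40, 1e8, 100e6, 0.10)
# For comparison, the 400 kT figure quoted for ~1 ns switching at the same clock rate.
fast = power_density_w_per_cm2(400, 1e8, 100e6, 0.10)
print(f"{slow * 1e3:.3f} mW/cm^2, {fast * 1e3:.3f} mW/cm^2")
```

Under these assumptions the slow-switching case reproduces the quoted value to within rounding, and the 400kT case is ten times larger.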

Other Spin-Based Logic and Memory Ideas.
An idea that is closely related to hybrid spintronics and straintronics and has been advanced by its proponents as an energy efficient computing paradigm is reconfigurable array magnetic automata (RAMA), which visualizes pillars of nanomagnets embedded in a piezoelectric (or ferroelectric) matrix [95]. Because of the shape anisotropy of the pillars, magnetization up or down along the axis of a pillar are the two stable states. Nearest neighbor pillars interact via dipole interaction and hence two neighbors have antiferromagnetic ordering. By exploiting the dipole coupling between nearest neighbors, a NAND gate can be implemented in the usual way as shown in Figure 9.
Application of an electric field in the piezoelectric (along the pillar axis) generates strain that strains the pillars and hence produces stress anisotropy energy which rotates the pillar's magnetization by up to 90°. Such rotations have been demonstrated in BiFeO₃-based piezoelectrics interfaced with magnetostrictive materials [96]. This can implement Bennett clocking and hence a unidirectional logic wire, thus fulfilling the requirements of a complete logic system in nanomagnetic logic. However, implementing a memory is much more difficult and could be very costly in terms of energy dissipation.
In hybrid spintronics and straintronics, it is possible to rotate the magnetization by 180° and not just 90° if we withdraw the stress at or close to the exact juncture when the magnetization vector's projection on the magnet's plane aligns along the in-plane hard axis. What makes it happen is the out-of-plane dynamics of the magnetization vector that generates a helpful torque to rotate the magnetization from 90° to 180° [85]. This out-of-plane dynamics, crucial for a complete bit flip or the 180° rotation, is either absent or very weak in a pillar, making bit flip via stress nearly impossible. Therefore, the only way to implement memory with RAMA is to apply a local magnetic field in the direction of the intended magnetization when a bit is to be written. This is indeed the method advanced by the proponents of RAMA [95]. Unfortunately, local magnetic fields are not only challenging to produce, but dissipate enormous energy as already discussed. Hence, RAMA-based memory is not likely to be very energy efficient, unlike hybrid spintronics and straintronics.

3.4.1.
All-Spin Logic. Another interesting "spintronic" idea that has received significant attention has been termed "all-spin logic" [97][98][99][100]. A basic element in this paradigm is shown in Figure 10 where two identical magnets are placed on a spatially asymmetric conducting channel. The channel is "asymmetric" since the ground terminal is closer to the left magnet than to the right one.
The currents flowing through the two magnets under the common bias voltage $V_{\mathrm{bias}}$ are $I_1 = V_{\mathrm{bias}}/R_1$ and $I_2 = V_{\mathrm{bias}}/R_2$, where $R_2 > R_1$ since the second current path is longer. As a result, $I_1 > I_2$, which means that there is in-built nonreciprocity. Since the current injected by (or extracted from) the left magnet designated as $M_{\mathrm{in}}$ is larger than the current injected by (or extracted from) the right magnet designated $M_{\mathrm{out}}$, the left magnet's magnetization serves as the input determining the magnetization of the right magnet which acts as the output. We will explain this shortly.
As usual, logic bits are encoded in the two stable magnetization orientations of either magnet, each shaped like an elliptical cylinder. Let us first consider the situation when $V_{\mathrm{bias}}$ is negative. Magnet $M_{\mathrm{in}}$ then injects net spin-polarized current $(I_1 - I_2)$ into magnet $M_{\mathrm{out}}$ where the spin polarization of this current is that of the majority spins in magnet $M_{\mathrm{in}}$, meaning that the spin polarization is parallel to the magnetization of $M_{\mathrm{in}}$. This happens because $I_1 > I_2$ and hence the current injected by the left magnet overshadows that by the right. As a result, there is net flow of spin-polarized electrons from $M_{\mathrm{in}}$ into $M_{\mathrm{out}}$. These spin-polarized carriers exert a spin transfer torque on the electrons in magnet $M_{\mathrm{out}}$ and turn their spin polarizations in the direction of the majority spins in $M_{\mathrm{in}}$. As a result, the magnetization of $M_{\mathrm{out}}$ becomes parallel to that of $M_{\mathrm{in}}$ and this is the COPY operation, where the bit encoded by the input magnet $M_{\mathrm{in}}$ is "copied" into the output magnet $M_{\mathrm{out}}$.
When $V_{\mathrm{bias}}$ is positive, majority spins are extracted from $M_{\mathrm{in}}$ which must be replenished by electrons with the same spin polarization flowing in from $M_{\mathrm{out}}$. As a result, $M_{\mathrm{out}}$ becomes deficient in these spins and gradually the spins whose polarizations are antiparallel to the magnetization of $M_{\mathrm{in}}$ become the majority in $M_{\mathrm{out}}$. Therefore, the magnetization of the output magnet becomes antiparallel to that of the input. This is logical inversion or the NOT operation. Therefore, the structure in Figure 10 can perform either the COPY operation or the NOT operation by simply reversing the polarity of the bias voltage. This lends itself to applications in ring oscillators [100]. Note that placing the ground terminal closer to the input magnet has endowed this system with built-in nonreciprocity. Since this makes $I_1 > I_2$, we have isolation between input and output; the input commands the output and not the other way around. As a result, no Bennett clocking is needed for unidirectional logic propagation and that saves the energy in the Bennett clock. However, as we have shown, the dissipation in the Bennett clock is negligible and can be made close to zero by employing resonantly excited LCR circuits, so this energy saving is not a major advantage. What might be an advantage is the elimination of clock connections to individual devices, which is lithographically taxing.
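The COPY/NOT behavior just described can be summarized in a purely behavioral sketch. It captures only the logical outcome (negative bias copies, positive bias inverts) and none of the spin transport physics; the function names and the example bias values are hypothetical.

```python
def all_spin_output(input_bit, v_bias):
    """Steady-state output bit of the two-magnet element.

    Negative bias: the output magnet ends up parallel to the input (COPY).
    Positive bias: the output magnet ends up antiparallel to the input (NOT).
    """
    if input_bit not in (0, 1):
        raise ValueError("input_bit must be 0 or 1")
    if v_bias == 0:
        raise ValueError("a nonzero bias is needed to drive the output")
    return input_bit if v_bias < 0 else 1 - input_bit

def chain(input_bit, biases):
    """Feed the bit through successive stages, each with its own bias polarity."""
    bit = input_bit
    for v in biases:
        bit = all_spin_output(bit, v)
    return bit

print(all_spin_output(1, -0.1))      # COPY stage
print(all_spin_output(1, +0.1))      # NOT stage
print(chain(1, [+0.1, +0.1, +0.1]))  # three inverting stages in series
```

An odd number of inverting stages returns the complement of the input, which is the property a ring oscillator relies on when the last output is fed back to the first input.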

Note that there is an isolation layer (or isolation trench) under each magnet which electrically isolates $M_{\mathrm{in}}$ from the magnet to the right of $M_{\mathrm{out}}$ and also $M_{\mathrm{out}}$ from the magnet to the left of $M_{\mathrm{in}}$. The right side of $M_{\mathrm{in}}$ is the "talking" side of that magnet that talks to magnet $M_{\mathrm{out}}$ and the left side of $M_{\mathrm{out}}$ is the "listening" side of $M_{\mathrm{out}}$ that listens to $M_{\mathrm{in}}$. Similarly, the right side of $M_{\mathrm{out}}$ will be the talking side that will talk to the magnet to the right of $M_{\mathrm{out}}$. Thus, there is a master-slave relation between any pair of magnets: the left magnet is the master that talks to the slave magnet on the right, which always listens to the master.
The most attractive feature of all-spin logic, in the opinion of this author, is the inherent non-reciprocity. The reason why magnetic quantum cellular automata type of architectures lack non-reciprocity (and therefore require a Bennett clock) is that they use dipole interaction to communicate between magnets and that interaction is inherently bidirectional. One could, in principle, progressively increase the distance between nanomagnets in a magnetic quantum cellular automata "wire" to achieve unidirectionality in space (and therefore avoid Bennett clocking), but ultimately the dipole interaction will become too weak to communicate bit information. All-spin logic does not use bidirectional interaction between magnets and hence can achieve nonreciprocity. Note that hybrid spintronics and straintronics do not have to lack non-reciprocity. As long as we do not use a bidirectional interaction to communicate between magnets (i.e., avoid magnetic quantum cellular automata type of architectures), we can fashion nonreciprocal circuits out of multiferroics and avoid Bennett clocking as well. An example of this will be presented in a forthcoming publication.
Reference [97] has shown how a universal logic gate can be configured in all-spin logic.It is considerably more complex than what we have discussed in the context of SSL or hybrid spintronics/straintronics or RAMA.
The energy dissipation in all-spin logic was briefly addressed in [99]. As always, there are two components to the energy dissipation: the internal energy dissipated in the magnets and the energy dissipated by the currents that switch the magnets. The power associated with the latter will be roughly $(I_1 - I_2)^2 (R_1 + R_2)$.
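As a worked example of these expressions, the sketch below evaluates $I_1 = V_{\mathrm{bias}}/R_1$, $I_2 = V_{\mathrm{bias}}/R_2$, and the estimate $(I_1 - I_2)^2 (R_1 + R_2)$. The bias voltage and resistances are hypothetical values picked only to make the arithmetic easy to follow; they are not taken from the references. Note that the expression has units of power, so an energy follows only after multiplying by a switching duration, which is likewise an assumed input here.

```python
def branch_currents(v_bias, r1, r2):
    """Currents through the two magnets for a common bias voltage."""
    return v_bias / r1, v_bias / r2

def current_dissipation_power(v_bias, r1, r2):
    """Rough dissipation estimate (I1 - I2)^2 * (R1 + R2), in watts."""
    i1, i2 = branch_currents(v_bias, r1, r2)
    return (i1 - i2) ** 2 * (r1 + r2)

# Hypothetical numbers: 0.1 V bias, R1 = 100 ohm, R2 = 200 ohm (R2 > R1).
V_BIAS, R1, R2 = 0.1, 100.0, 200.0
i1, i2 = branch_currents(V_BIAS, R1, R2)
power = current_dissipation_power(V_BIAS, R1, R2)

# Assumed switching duration of 1 ns to convert power into energy.
energy = power * 1e-9
print(i1, i2, power, energy)
```

Because $R_2 > R_1$, the first current is always the larger one, which is the nonreciprocity the text relies on.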

Figure 1: (Left panel) The on and off states of a metal-insulator-semiconductor field effect transistor (MISFET). The device is "on" when there are charge carriers (electrons) at the semiconductor-insulator interface (channel) to carry current between the source and drain contacts. A negative voltage applied to the gate depletes the channel of charge carriers and turns the device "off." These two states encode the binary bits 0 and 1. (Right panel) The binary bits 0 and 1 are encoded in the up- and down-spin polarizations of an electron.

Figure 4: A Bennett clocked spin wire. (a) Ground state of an array of nearest-neighbor exchange coupled dots forming a "spin wire;" (b) the nth spin is flipped and a tie results between the nth, (n + 1)th and (n + 2)th spins; (c) the (n + 1)th and (n + 2)th spins are rotated to the right by ∼90° with external potentials applied selectively to these two dots which resolves the tie; (d) the potential is withdrawn from the (n + 1)th dot, whose resident spin flips down because of unequal exchange interactions with the right and left neighbors, while the spin in the (n + 3)th dot is rotated by ∼90° by a potential applied to that dot; (e) the potential from the (n + 2)th dot is withdrawn, resulting in the host spin flipping up; (f) "fan out" operation via a bifurcated spin wire (the gate pads between the dots are not shown here for clarity).

Figure 5: A two-phase multiferroic nanomagnet shaped like an elliptical cylinder. It consists of a magnetostrictive layer elastically coupled to a piezoelectric layer. There are clamps (not shown) that prevent expansion/contraction of the multiferroic in the y-direction. An electrostatic potential applied across the piezoelectric generates uniaxial strain in that layer, which is transferred almost entirely to the magnetostrictive layer if the latter layer is much thinner than the former. This will generate uniaxial stress in the magnet and rotate its magnetization away from the stable z-axis (easy axis) towards the y-axis (in-plane hard axis), ultimately resulting in the magnetization flipping if the voltage is turned off as soon as the magnetization vector enters the x-y plane.

Figure 6: Potential energy in the plane of the magnet (φ = 90°) as a function of the polar angle θ, for increasing stress.

Figure 7: A nanomagnetic realization of a NAND gate with fan-in and fan-out. The four magnets within the shaded region constitute the basic NAND gate and the remaining eight magnets are used for fan-in and fan-out. Upspin represents logic bit 1 and downspin logic bit 0. In a linear array, if the line joining the centers of the magnets is parallel to the in-plane hard axis, then dipole interaction between the magnets ensures that the ordering is antiferromagnetic, whereas if that line is parallel to the easy axis, then the ordering is ferromagnetic. Note that very specific distances have to be maintained between the magnets and that the arrangement here is different from that in SSL. This figure is adapted from [86] with permission from the Institute of Physics. The magnets are clocked with a 4-phase sinusoidal clock in order to propagate bits unidirectionally from the input to the output port. Each group of magnets labeled I, II, III, and IV is clocked with one phase and the clock phases are shifted from each other by 90°.

Figure 8: A memory element in a cross-bar configuration for reading and writing bits. In order to read a stored bit, a spin-valve structure is used. This structure consists of a soft magnetostrictive layer separated from a permanently magnetized hard magnet by a thin spacer layer. Let us say that the magnetization orientation of the hard magnet represents bit 1. If the soft layer stores bit 1, its magnetization is parallel to that of the hard layer and the vertical resistance of the spin-valve structure will be small. If the soft layer stores bit 0, its magnetization is antiparallel to that of the hard layer and the spin-valve's resistance will be larger. Thus, by reading the spin-valve's ac resistance with a small signal, we can read the stored bit. The resistance is measured between the upper and lower crossbars. For writing, the stored bit is first read. If it is the desired bit, no action is taken. Otherwise, the bit is flipped by applying a potential between the upper and lower crossbars. This potential is dropped mostly across the piezoelectric since the magnets are metallic and the spacer layer is ultrathin.

Figure 9: (Left panel) the RAMA architecture and (right panel) a NAND realization.
= |gμ B B global |), while 2h L and 2h R are Zeeman splitting energies in the left and right input dots caused by the local magnetic fields that write input data (2h L = |gμ B B left local |; 2h R = |gμ B B

Table 1: Truth table of the Toffoli-Fredkin gate.