The ability to work with singularquantum systems will be necessary if we are able to harness the power ofquantum mechanics and apply it to computational and information theories.

(atomtrap – isolate a single atom and probe its behaviour)Study of information processingtasks that can be accomplished using quantum mechanical systemsA classical computer is simply anextravagant calculator, a device whose memory can store a sting of numbers, 0sand 1s, known as bits. These devices can perform operations and building theseup gives us an algorithm. In 19** Somebody Moore presented a paper detailing Information can be encoded In 1965 Gordon Moore observedthat transistors, (info about what a transistor does here) were shrinking atsuch a rate that every year double the number could fit onto a chip, ten yearslater this slowed to doubling every two years, and since this result was statedover 50 years ago the prediction has held true, until now as it begins to slow.Moore’s company, Intel, now suggests that the currently used silicontransistors will reach their limit in around five years. What does this meanfor computing as we know it? Over the last 50 years we’ve seen a computationalrevolution, where previously (info about what we could do in the beginning witha computer – army room sized) we can – and over x% of the worlds population -now carry this around in their pockets. We have become accustomed to continualimprovement and development and see some exciting advances for the future, withthe advent of machine learning, self-driving cars and (another new use ofcomputing), will we be able to continue this without the ever shrinkingtransistors and chip capability?From Morse Code to Classical computing and the next step into thequantum world.

How does a classical computerwork?In Claude Shannon’s 1948 paperthe notion of information was defined and quantified. The smallest unit ofinformation, the binary digit, simply the bit, was defined and Shannon showedit was possible to encode any message using only two values, 0s and 1s in astring. A further way to see this is using the idea of the game 20 questions,the only possible responses are ‘yes’ or ‘no’ and yet we can gain a huge amountof information about the thing the ‘sender’ is trying to communicate. If weextend this further and remove the limit of the number of questions we are ableto describe everything, simply using this string of 0s and 1s. Throughout its vast development, communicationhas been fraught with problems. From the days when fires were lit acrosshilltops to transfer news in the fog and rain, to analogue systems gainingnoise and becoming distorted to the current state of classical computingseemingly reaching its limit, what is the next step? In this paper we look atthe abilities of classical computers and their mathematical limits and how wecompare and contrast this with the developing quantum computers, consideringtheir possible uses and capabilities. Since Claude Shannon’s 1948 revolutionarypaper on information theory the developments of computation have been rapid.

A Classical ComputerWhilst computers have developedto be able to do a whole host of tasks, they still remain relatively simplemachines operating via two key functions which use the transistor. Firstly, atransistor can store a bit, a value of either 0 or 1, and with a string of just8 bits we can encode around 255 characters, both the upper and lower casealphabets, the numbers zero to nine and a number of the most commonly usedsymbols. Secondly, logic gates are used to undertake computations usingtransistors which are connected together, resulting in a new string of bits.

Itis these two processes, storage and transmission of information, whichShannon’s Mathematical Theories In Computation paper discuss, that underlie thetheory of classical information. In order to efficiently use the power oftransistors we need to compress the given data into as small a piece aspossible. In 1965 Gordon Moore observed thattransistors, were shrinking at such a rate that every year double the numbercould fit onto a chip, ten years later this slowed to doubling every two years,and since this result was stated over 50 years ago the prediction has heldtrue, until now as it begins to slow. Moore’s company, Intel, now suggests thatthe currently used silicon transistors will reach their limit in around fiveyears. What does this mean for computing as we know it? Over the last 50 yearswe’ve seen a computational revolution, where previously (info about what wecould do in the beginning with a computer – army room sized) we can – and overx% of the worlds population – now carry this around in their pockets. We havebecome accustomed to continual improvement and development and are on the edgeof some exciting advances for the future, with the advent of machine learning,self-driving cars and (another new use of computing), but will we be able tocontinue this once we reach the limit of ever shrinking transistors and chipcapability?In order to consider the limitswhich Shannon determined on both data compression and data transmission werequire a background of the various types of entropy defined.

In this sectionwe will introduce the notion of entropy, joint entropy, conditional entropy aswell as mutual information from the classical point of view. From there we willdiscuss Shannon’s Noiseless Channel Capacity Theorem and Shannon’s NoisyChannel Capacity Theorem. In Shannon’s 1948 paper the notionof information as discrete data was presented, allowing the idea of entropy toform, that is, how much information we gain by obtaining the outcome of an event.A useful example of this idea is a coin toss. Suppose we have a fair coin,where the likelihood of obtaining heads upon tossing the coin is 0.5, then wegain the maximum about of information when we see the result, we could not havepredicted the outcome.

Now suppose the coin is heavily biased, where obtaininga head is 0.9, then we are not surprised when heads occurs, we have gained lessinformation. Shannon required that for aseries of possible events with probabilities p1, p2, … pn to find a measureH(p1, p2, …, pn) of uncertainty regarding the outcome1. Therespective probabilities must be continuous2. Ifall the probabilities are equal pi = 1/n then n should be a monotonicallyincreasing function3. Fortwo sources of outcomes A and B, H(A,B) = H(A) + H(B|A) i.e. the associatedmeasure of uncertainty is the weighted sum of the two separately.

The only function satisfyingthese properties is of the form H(Pi) = -K sum_{i=1}^n p_i log p_i where p_iis the probability distribution of our series of possible outcomes. Movingforwards we will work with log_2, which gives us the binary entropy and removesthe constant K which is a unit of measure. Considering the entropy of a casewhich has two possible outcomes with probabilities p and q = 1-p shows someinteresting properties of the binary entropy function. Plotting H = -p log p + qlog q gives (figure) which shows entropy to always be positive, and only equalto zero when there is no uncertainty in an event happening. Further, H ismaximised when p = q, showing that uncertainty maximises entropy. From here we can define the jointentropy, suppose we have two sources X and Y with discrete outcomes, thenH(X,Y) = – sum p(x,y) log p(x,y), if the outcomes are perfectly independentthen we can describe the joint entropy as H(X,Y) = H(X) + H(Y), however, in generalH(X,Y) < H(X) + H(Y) since.The conditional entropy allows usto find the information gained if we know that the outcome of one event dependsupon the outcome of another, in this case H(Y|X) = -sum_{x, y} p(x,y) log p(y|x)= -sum_x p(x) H(Y|X =x ) CODES Figure 2 shows how data istransmitted from a source to its output.

Firstly, the source is encoded, itthen travels through a channel, which may be susceptible to noise which canlead to error in the source, this is then decoded back to a form which thereceiver can then understand. Shannon’s two theories depend upon the idea of redundancy,the first, his Noiseless Channel Coding Theorem dictates the limit ofcompression, that is how much redundancy can be removed from the source whilst allowingit to be reliably decoded at the output end. The second, the Noisy ChannelCoding Theorem, gives a mathematical limit for how much redundancy we must addto each source to ensure that it can be reliably recovered after it has beentransmitted through a potentially noisy channel which has introduced an elementof error.

It was these two theorems thathelped revolutionise classical computing, allowing engineers to create systemswhereby the limits were utilised. Suppose we have a finite alphabet$A := {a_1, a_2, …, a_n}$ with an associated probability distribution $P_i = (p_1,p_2, …, p_n)$ and $sum P_i = 1$ and a string of characters $x_1, x_2, …, x_nin A^n $ from a memoryless source (such that no character depends upon thosewhich precede or follow it). Shannon showed that if $n (much bigger) 1$ theinformation gained by a message of n characters with entropy $H(A)$ can be compressedinto $nH(A) bits$ (an excellent proof is presented in ).Now, for the second theorem, supposeagain that $A$ is our source alphabet and let $B$ be the alphabet at thereceiving end of the system, with a channel over which the source istransmitted. The capacity of the channel $C:= Sup_{p_{X}}$ where $I(X;Y)$ isthe mutual information, as defined in section.

Now for our decoder to reliably decodethe source it receives this mutual information must be within a certain limit,that is, we require there to be a certain amount of correlation between the inputand output across a channel. The rate $R$ of a code $c$ is defined $R:= frac{log|c|}{n}$i.e. the number of informative bits the number of bits transferred. To bereliable we require $C > R$ ie the capacity of the channel is greater thanthe rate of information transferred. It is widely believed that allclassical systems can be described using quantum mechanics, and so it isnatural to view computing, which we consider in a macroscopic world, on amicroscopic scale and in terms of quantum mechanics.

Just as a classicalcomputer is built from wires and transistors that perform computations aquantum computer uses wires to transfer information and quantum gates tomanipulate this quantum information, and it is the behaviour of the qubit, thesmallest possible quantum unit, that allows us to explore new methods incomputing. The most simple quantum systemcan be described as a two dimensional Hilbert space spanned by the orthonormalbasis ${|0>, |1>} and alongside the basis states of $|0>$ and $|1>$,analogous to $0$ and $1$, the system can have any other state which is a linearsuperposition of $a|0> + b|1>$ with $|a|^2 + |b|^2 =1$. A useful way to considerthis is to view the pure states of a singular qubit as the points on a Bloch sphere,where the classical bits would be the north and south pole, a qubit can take onany point on the sphere. The fact that a qubit can take on any of an infinitenumber of possible states, as opposed to the two classical choices, allowsquantum computing to be different to that of classical computing. Many aspects of classicalinformation theory have analogous properties in the quantum world, there are,however, other ideas which are impossible in the classical world which allow usto view quantum computing as a different and more useful capability. Von Neumann this is not the case in thequantum world, where a state can take the form of any linear superposition ofthe form $a|0> + b|1>$ In classical computing it ispossible to divide a system into its subsystems allowing us to decode and thusdeconstruct the entire system, this is not the case in the quantum world.Entanglement is the property that allows us to know everything about our fullsystem with no ability to decompose it to its parts, leaving us uncertain aboutthe individual components.