The ability to work with individual quantum systems, such as an atom trap that isolates a single atom so its behaviour can be probed, will be necessary if we are to harness the power of quantum mechanics and apply it to computation and information theory. Quantum computation and quantum information is the study of the information processing tasks that can be accomplished using quantum mechanical systems.

A classical computer is simply an extravagant calculator, a device whose memory can store a string of numbers, 0s and 1s, known as bits. These devices can perform operations on those bits, and building these operations up gives us an algorithm.


From Morse Code to Classical Computing and the Next Step into the Quantum World

How does a classical computer

work?

In Claude Shannon’s 1948 paper the notion of information was defined and quantified. The smallest unit of information, the binary digit, or simply the bit, was defined, and Shannon showed it was possible to encode any message as a string using only two values, 0s and 1s. A further way to see this is through the game of twenty questions: the only possible responses are ‘yes’ or ‘no’, and yet we can gain a huge amount of information about the thing the ‘sender’ is trying to communicate. If we extend this further and remove the limit on the number of questions, we are able to describe anything, simply using a string of 0s and 1s.
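The twenty-questions analogy can be made concrete: each yes/no answer is worth one bit, and a halving strategy singles out one of $n$ items in about $\log_2 n$ questions. A minimal sketch (the animal list and function names are illustrative, not from the source):

```python
import math

def questions_needed(n_items):
    """Number of yes/no questions needed to single out one of n items."""
    return math.ceil(math.log2(n_items))

def identify(items, target):
    """Identify `target` by repeatedly asking: 'is it in the first half?'"""
    questions = 0
    lo, hi = 0, len(items)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1          # each membership check is one yes/no question
        if target in items[lo:mid]:
            hi = mid
        else:
            lo = mid
    return items[lo], questions

animals = ["ant", "bat", "cat", "dog", "eel", "fox", "gnu", "owl"]
print(questions_needed(len(animals)))   # 3 questions suffice for 8 items
print(identify(animals, "fox"))         # ('fox', 3)
```

Each answer halves the remaining possibilities, which is exactly why a string of yes/no answers, i.e. of 0s and 1s, suffices to describe anything.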

Throughout its long development, communication has been fraught with problems: from the days when fires were lit across hilltops to carry news through fog and rain, to analogue systems gaining noise and becoming distorted, to the current state of classical computing seemingly reaching its limit. What is the next step? In this paper we look at the abilities of classical computers and their mathematical limits, and compare and contrast these with the developing quantum computers, considering their possible uses and capabilities. Since Claude Shannon’s revolutionary 1948 paper on information theory, the development of computation has been rapid.

A Classical Computer

Whilst computers have developed to perform a whole host of tasks, they remain relatively simple machines operating via two key functions, both built on the transistor. Firstly, a transistor can store a bit, a value of either 0 or 1; with a string of just 8 bits we can encode 256 distinct values, enough for both the upper and lower case alphabets, the digits zero to nine and a number of the most commonly used symbols. Secondly, logic gates built from connected transistors are used to undertake computations, resulting in a new string of bits. It is these two processes, the storage and transmission of information, which Shannon’s ‘A Mathematical Theory of Communication’ discusses, that underlie the theory of classical information. In order to use the power of transistors efficiently we need to compress the given data into as small a form as possible.
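The 8-bit encoding of characters described above can be sketched directly; here characters are mapped to bit strings and back (an 8-bit ASCII-style code is assumed):

```python
def char_to_bits(ch):
    """Encode a single character as an 8-bit string (assumes ASCII range)."""
    return format(ord(ch), "08b")

def bits_to_char(bits):
    """Decode an 8-bit string back to its character."""
    return chr(int(bits, 2))

message = "Qubit"
encoded = "".join(char_to_bits(c) for c in message)
print(encoded)        # 40 bits in total: 8 per character
print(len(encoded))   # 40

# Decode by reading the string back 8 bits at a time
decoded = "".join(bits_to_char(encoded[i:i + 8])
                  for i in range(0, len(encoded), 8))
print(decoded)        # Qubit
```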

In 1965 Gordon Moore observed that transistors, the tiny electronic switches from which chips are built, were shrinking at such a rate that every year double the number could fit onto a chip; ten years later he revised this to doubling every two years, and in the more than 50 years since, the prediction has held true, only now beginning to slow. Moore’s company, Intel, now suggests that the currently used silicon transistors will reach their limit in around five years. What does this mean for computing as we know it? Over the last 50 years we have seen a computational revolution: where the earliest computers filled entire rooms, a large fraction of the world’s population now carries far greater computing power in their pockets. We have become accustomed to continual improvement and development and are on the edge of some exciting advances, with the advent of machine learning and self-driving cars, but will we be able to continue this once we reach the limit of ever-shrinking transistors and chip capability?

In order to consider the limits which Shannon determined on both data compression and data transmission we require some background on the various notions of entropy. In this section we will introduce entropy, joint entropy and conditional entropy, as well as mutual information, from the classical point of view. From there we will discuss Shannon’s noiseless channel coding theorem and his noisy channel coding theorem.

In Shannon’s 1948 paper the notion of information as discrete data was presented, allowing the idea of entropy to form: how much information we gain by learning the outcome of an event. A useful example of this idea is a coin toss. Suppose we have a fair coin, where the probability of obtaining heads is 0.5; then we gain the maximum amount of information when we see the result, since we could not have predicted the outcome. Now suppose the coin is heavily biased, with probability 0.9 of obtaining heads; then we are not surprised when heads occurs, and we have gained less information.

Shannon required that a measure $H(p_1, p_2, \dots, p_n)$ of the uncertainty in the outcome of a series of possible events with probabilities $p_1, p_2, \dots, p_n$ should satisfy:

1. $H$ should be continuous in the probabilities $p_i$.

2. If all the probabilities are equal, $p_i = 1/n$, then $H$ should be a monotonically increasing function of $n$.

3. For two sources of outcomes $A$ and $B$, $H(A,B) = H(A) + H(B|A)$, i.e. the uncertainty of the joint outcome is the uncertainty of the first source plus the weighted sum of the uncertainties of the second given each outcome of the first.

The only function satisfying these properties is of the form $H = -K \sum_{i=1}^{n} p_i \log p_i$, where $(p_1, \dots, p_n)$ is the probability distribution over our possible outcomes. Moving forwards we will work with $\log_2$, which gives the binary entropy measured in bits and absorbs the constant $K$, which amounts only to a choice of unit.

Considering the entropy of a case which has two possible outcomes, with probabilities $p$ and $q = 1-p$, shows some interesting properties of the binary entropy function. Plotting $H = -p \log p - q \log q$ gives (figure), which shows the entropy is always non-negative, and equal to zero only when there is no uncertainty in the outcome. Further, $H$ is maximised when $p = q = 1/2$, showing that uncertainty maximises entropy.
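These properties, and the coin-toss discussion above, can be checked numerically with a short sketch of the binary entropy function:

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with 0 log 0 taken to be 0."""
    if p in (0.0, 1.0):
        return 0.0
    q = 1.0 - p
    return -p * math.log2(p) - q * math.log2(q)

print(binary_entropy(0.5))   # 1.0: a fair coin yields one full bit per toss
print(binary_entropy(0.9))   # ~0.469: the biased coin surprises us less
print(binary_entropy(1.0))   # 0.0: a certain outcome carries no information
```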

From here we can define the joint entropy: suppose we have two sources $X$ and $Y$ with discrete outcomes, then $H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y)$. If the outcomes are independent then the joint entropy is $H(X,Y) = H(X) + H(Y)$; in general, however, $H(X,Y) \le H(X) + H(Y)$, with equality if and only if $X$ and $Y$ are independent, since knowledge of one source can only reduce our uncertainty about the other.
The conditional entropy measures the uncertainty remaining in one event once the outcome of another is known; in this case $H(Y|X) = -\sum_{x,y} p(x,y) \log p(y|x) = \sum_x p(x) H(Y|X=x)$.
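The chain rule $H(X,Y) = H(X) + H(Y|X)$ and the subadditivity bound can be verified on a small example; the joint distribution below is an arbitrary illustrative choice, not taken from the source:

```python
import math

# A hypothetical joint distribution p(x, y) over two binary sources.
p = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

def H(dist):
    """Shannon entropy of an iterable of probabilities, in bits."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

# Marginal distributions of X and Y
px = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

H_xy = H(p.values())
H_x = H(px.values())
H_y = H(py.values())

# Chain rule: H(Y|X) = H(X,Y) - H(X)
H_y_given_x = H_xy - H_x
print(H_xy)                  # 1.75 bits
print(H_y_given_x)
print(H_xy <= H_x + H_y)     # True: subadditivity
```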
Codes
Figure 2 shows how data is transmitted from a source to its output. Firstly, the source is encoded; it then travels through a channel, which may be susceptible to noise introducing errors, before being decoded back into a form which the receiver can understand. Shannon's two theorems depend upon the idea of redundancy. The first, his noiseless channel coding theorem, dictates the limit of compression: how much redundancy can be removed from the source whilst still allowing it to be reliably decoded at the output end. The second, the noisy channel coding theorem, gives a mathematical limit on how much redundancy we must add to the source to ensure that it can be reliably recovered after transmission through a potentially noisy channel which may have introduced errors.

It was these two theorems that helped revolutionise classical computing, allowing engineers to create systems that approach these limits.
Suppose we have a finite alphabet $A := \{a_1, a_2, \dots, a_k\}$ with an associated probability distribution $(p_1, p_2, \dots, p_k)$, $\sum_i p_i = 1$, and a string of characters $x_1, x_2, \dots, x_n \in A^n$ from a memoryless source (so that no character depends upon those which precede or follow it). Shannon showed that for $n \gg 1$ the information carried by a message of $n$ characters from a source with entropy $H(A)$ can be compressed into $nH(A)$ bits (an excellent proof is presented in ).
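To make the $nH(A)$ limit concrete, here is a small sketch with a hypothetical four-letter source whose probabilities are dyadic, so that a prefix-free code matched to the source achieves the entropy exactly (the source and its code table are our illustrative choices):

```python
import math

# A hypothetical four-letter source with dyadic probabilities
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

entropy = -sum(p * math.log2(p) for p in probs.values())
print(entropy)              # 1.75 bits per character

# A prefix-free code matched to the source: shorter words for likelier letters
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
expected_len = sum(probs[ch] * len(code[ch]) for ch in probs)
print(expected_len)         # 1.75: meets Shannon's nH(A) limit per character

# A fixed-length code would need 2 bits per character regardless of probabilities
print(math.ceil(math.log2(len(probs))))   # 2
```

For a long message of $n$ characters, the matched code uses about $1.75n$ bits against $2n$ for the naive fixed-length encoding, exactly the compression the theorem promises.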
Now, for the second theorem, suppose again that $A$ is our source alphabet and let $B$ be the alphabet at the receiving end of the system, with a channel over which the source is transmitted. The capacity of the channel is $C := \sup_{p_X} I(X;Y)$, where $I(X;Y)$ is the mutual information, as defined in section. Now for our decoder to reliably decode the source it receives, this mutual information must be within a certain limit; that is, we require a certain amount of correlation between the input and output of the channel. The rate $R$ of a code $c$ is defined as $R := \frac{\log|c|}{n}$, i.e. the number of information bits divided by the number of bits transmitted. For reliable communication we require $C > R$, i.e. the capacity of the channel must exceed the rate at which information is transferred.
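For the binary symmetric channel, which flips each transmitted bit with probability $p$, the supremum is achieved by a uniform input and the capacity works out to $C = 1 - H(p)$. A sketch of the resulting rate check (the function names are ours):

```python
import math

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(flip_prob):
    """Capacity of a binary symmetric channel: C = 1 - H(p)."""
    return 1.0 - binary_entropy(flip_prob)

def reliable(k, n, flip_prob):
    """A code sending k information bits in n channel bits has rate R = k/n;
    Shannon's theorem says reliable communication requires R < C."""
    return k / n < bsc_capacity(flip_prob)

print(bsc_capacity(0.11))      # ~0.5: about half a bit survives per channel use
print(reliable(1, 3, 0.11))    # True: a rate-1/3 code can be made reliable
print(reliable(2, 3, 0.11))    # False: rate 2/3 exceeds the capacity
```

Note that a channel which flips bits half the time has capacity zero: its output is pure noise, carrying no correlation with the input at all.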

It is widely believed that all classical systems can be described using quantum mechanics, and so it is natural to view computing, which we usually consider in the macroscopic world, on a microscopic scale and in terms of quantum mechanics. Just as a classical computer is built from wires and transistors that perform computations, a quantum computer uses wires to transfer quantum information and quantum gates to manipulate it, and it is the behaviour of the qubit, the smallest possible unit of quantum information, that allows us to explore new methods in computing.

The simplest quantum system can be described as a two-dimensional Hilbert space spanned by the orthonormal basis $\{|0\rangle, |1\rangle\}$. Alongside the basis states $|0\rangle$ and $|1\rangle$, analogous to the classical $0$ and $1$, the system can be in any linear superposition $a|0\rangle + b|1\rangle$ with $|a|^2 + |b|^2 = 1$. A useful way to picture this is to view the pure states of a single qubit as the points of the Bloch sphere: where the classical bits correspond only to the north and south poles, a qubit can sit at any point on the sphere. The fact that a qubit can take on any of an infinite number of possible states, as opposed to the two classical choices, is what allows quantum computing to differ from classical computing.
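A minimal sketch of such a state, with amplitudes chosen by us for illustration (an equal superposition with a relative phase):

```python
import cmath
import math

# A hypothetical qubit state a|0> + b|1>
a = 1 / math.sqrt(2)
b = cmath.exp(1j * math.pi / 4) / math.sqrt(2)

# A valid state must be normalised: |a|^2 + |b|^2 = 1
norm = abs(a) ** 2 + abs(b) ** 2
print(norm)        # ~1.0

# Measuring in the {|0>, |1>} basis gives outcome probabilities
# equal to the squared amplitude magnitudes (the Born rule).
p0, p1 = abs(a) ** 2, abs(b) ** 2
print(p0, p1)      # ~0.5 each: the relative phase is invisible in this basis
```

The relative phase, invisible to a measurement in this basis, is precisely the extra freedom a point on the Bloch sphere carries beyond a classical probabilistic bit.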

Many aspects of classical information theory have analogous properties in the quantum world; there are, however, other ideas which are impossible classically, and it is these that allow us to view quantum computing as a different and more powerful capability.

Von Neumann Entropy

A classical bit always holds a definite value of 0 or 1; this is not the case in the quantum world, where a state can take the form of any linear superposition $a|0\rangle + b|1\rangle$.

In classical computing it is possible to divide a system into its subsystems, decode each part, and thus reconstruct the entire system; this is not the case in the quantum world. Entanglement is the property by which we can know everything about the full system yet be unable to decompose it into its parts, leaving us uncertain about the individual components.
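The canonical example is a Bell state: the joint two-qubit state is known exactly (it is pure), yet each qubit on its own is maximally mixed, no better than a fair coin. A sketch of this using an explicit partial trace (the index convention is ours):

```python
import math

# The Bell state (|00> + |11>)/sqrt(2) in the basis {|00>, |01>, |10>, |11>}
s = 1 / math.sqrt(2)
psi = [s, 0.0, 0.0, s]

# Density matrix rho = |psi><psi| of the full two-qubit system
rho = [[psi[i] * psi[j] for j in range(4)] for i in range(4)]

# Reduced state of the first qubit: trace out the second.
# Index i = 2*a + b encodes first-qubit value a and second-qubit value b.
rho_A = [[sum(rho[2 * a + b][2 * c + b] for b in range(2)) for c in range(2)]
         for a in range(2)]
print(rho_A)   # diagonal entries ~0.5, off-diagonal 0: the maximally mixed state
```

The full state is perfectly determined, but the reduced state of either qubit is the identity over two, carrying one full bit of uncertainty; this is exactly the situation that has no classical counterpart.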