[Lecture Notes by Prof Yuen and possibly others, 2000-2001]

Information Theory
------------------

Claude Shannon's mathematical theory of information was a major breakthrough in communication engineering: it allows system designers to estimate the capacity of communication channels and the requirements of information sources. It also has some application in data storage (since storing information away and getting it back is rather similar to sending and receiving it), and possibly in psychology. Relations with other subjects have also been suggested, though so far without much important impact.

The basic observation is that information removes uncertainty, and if we can measure uncertainty, we can measure information. For example, if you do not know whether someone is male or female, the two possibilities are equally probable; if you do not know the age of a person, and it is equally likely to be anywhere from 0 to 127, then your uncertainty is greater, so the information value of knowing the age is greater than that of knowing the gender. How much greater? Today we know that one requires 1 bit of information and the other requires 7 bits. However, it is not as simple as that: we may already have some information, such as that it is very unlikely for a person to be over 100; and if we are talking about an NUS professor, the chance of him/her being below 30 or above 70 would be low, so the age is actually worth less than 7 bits. Mathematically, a non-uniform probability distribution has less uncertainty than a uniform distribution over the same set, and we have formulae that produce the exact information value from the probability distribution; in the case of a uniform distribution over a set of size N, the information value is of course log N (to base 2, giving bits).

The second observation is that sending information over a communication channel may add uncertainty and hence reduce information, since what you receive might be different from what was sent. You can model the channel's behaviour in terms of probability distributions too: if x is sent, there is some probability of receiving x1, x2, x3... instead. This can be done not just for single values but for whole functions, and you can work out a theoretical information transmission capacity from the way a channel changes a finite-duration function into a longer one because of slow channel response and resonances, or equivalently, from the way it suppresses high frequencies (because of its inability to respond fast enough) and enhances certain frequencies (because of resonances).

The important implication of Shannon's results is that there are inherent limits on our ability to design signals so as to get more information through a communication channel: you can represent information using pulses of various levels and shapes, modulate different carrier frequencies, recode using different number systems, etc., but the ultimate capacity of a channel is determined by its send-receive probability distributions. However, you can use signal design for other objectives. For example, though the received signal may be different from the sent signal, you can get information across without error (or more correctly, with infinitesimal probability of error), as long as the amount is less than the channel capacity.
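To make the uncertainty calculations above concrete, here is a minimal Python sketch (not part of the original notes; the skewed age distribution is a made-up illustration) computing the Shannon entropy H = -sum p*log2(p) for the examples in the text:

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum p*log2(p), skipping zero terms."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Gender: two equally likely outcomes -> 1 bit.
    print(entropy([0.5, 0.5]))              # 1.0

    # Age uniform over 0..127: 128 equally likely outcomes -> log2(128) = 7 bits.
    print(entropy([1/128] * 128))           # 7.0

    # A skewed (hypothetical) age distribution: 90% of the mass on ages 30..70,
    # 10% spread over the remaining 87 ages -> strictly less than 7 bits.
    skewed = [0.9 / 41] * 41 + [0.1 / 87] * 87
    print(entropy(skewed))                  # about 5.9, i.e. less than 7 bits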
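As for modelling a channel by its send-receive probabilities: the simplest textbook model (used here purely as an illustration; the notes do not work out a specific channel) is the binary symmetric channel, which flips each transmitted bit with probability p. Its capacity comes out as C = 1 - H2(p) bits per use, where H2 is the entropy of a p-versus-(1-p) coin:

    import math

    def h2(p):
        """Binary entropy function H2(p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity of a binary symmetric channel with flip probability p."""
        return 1 - h2(p)

    print(bsc_capacity(0.0))   # 1.0 -- noiseless, one full bit per use
    print(bsc_capacity(0.1))   # ~0.53 -- noise destroys almost half the capacity
    print(bsc_capacity(0.5))   # 0.0 -- output independent of input, no information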
This error-free transmission is achieved by using the extra transmission capacity to send information about the information, so that errors caused by the channel produce inconsistencies in the received signal, which allow the errors to be detected and corrected. Alternatively, you can use the spare capacity to send transformed versions of your signals so that their content is secure against eavesdroppers.

A secure code system is usually based on a secret transform algorithm and a secret reverse algorithm, but an idea important for e-business is the public key system: the transform algorithm is released to the public so that other people can send messages to you which only you can read, since eavesdroppers do not know the reverse algorithm. Such a public-secret transform-reverse pair is however difficult to design, and in practice most schemes use reverse transforms which could in theory be worked out from the forward transforms, though with impractical complexity. For example, I publicize a matrix for everyone to transform their numerical messages by matrix multiplication, and I know the inverse matrix, which I multiply into your transformed message to recover the original; others could derive the inverse, but the cost is N**3, while the cost of transforming or reversing a message is only N**2. This difference is rather too small, and it would be easy to "break the code", but there are schemes where the gap between "authorized use" and "unauthorized use" is exponential.

We also have the opposite arrangement: I release the inverse algorithm to the public, and send messages to the people I do business with using the secret transform which others do not know. Anyone can apply the public inverse algorithm and so can read my messages, which are not meant to be secret; however, others cannot send messages pretending to be me, because they do not know my transform algorithm: their messages, when reversed using the algorithm I gave out, make no sense. Hence, my messages are authenticated by the publicly available inverse algorithm. Further, I would not be able to deny having sent those messages, because nobody else could have produced messages reversible by that algorithm. So you see that encryption methods work both ways.
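Returning to the first idea above, spending spare capacity on "information about the information": the crudest such scheme (a sketch for illustration only; practical codes are far more efficient) is the repetition code, where every bit is sent three times and the receiver takes a majority vote, so a single flipped copy shows up as an inconsistency within its group and is out-voted:

    def encode(bits, n=3):
        """Repetition code: spend extra capacity by sending each bit n times."""
        return [b for b in bits for _ in range(n)]

    def decode(received, n=3):
        """Majority vote within each group of n copies; disagreement inside a
        group reveals a channel error, and the vote corrects it."""
        out = []
        for i in range(0, len(received), n):
            group = received[i:i + n]
            out.append(1 if sum(group) > n // 2 else 0)
        return out

    msg = [1, 0, 1, 1]
    sent = encode(msg)          # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
    sent[4] = 1                 # the channel flips one copy of the second bit
    print(decode(sent) == msg)  # True: the error is detected and corrected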
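The matrix scheme described above can likewise be sketched in a few lines; the matrix, the message, and the sizes below are all invented for illustration:

    import numpy as np

    # The forward matrix A plays the role of the published transform; its
    # inverse is the secret reverse transform.
    N = 4
    rng = np.random.default_rng(0)
    A = rng.integers(1, 10, size=(N, N)).astype(float)
    while abs(np.linalg.det(A)) < 1e-9:        # ensure A is invertible
        A = rng.integers(1, 10, size=(N, N)).astype(float)
    A_inv = np.linalg.inv(A)                   # kept secret by the key holder

    message = np.array([72.0, 101.0, 108.0, 112.0])  # "Help" as character codes
    ciphertext = A @ message                   # anyone can transform: N**2 work
    recovered = A_inv @ ciphertext             # the key holder reverses cheaply
    print(np.allclose(recovered, message))     # True

An attacker can of course compute the inverse of A as well, at N**3 cost; as the notes say, this polynomial gap between authorized and unauthorized use is far too small, which is why practical public key systems rely on (conjectured) exponential gaps instead.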