[Lecture Notes by Prof Yuen and possibly others, 2000-2001]

Information Theory
------------------

Claude Shannon's mathematical theory of information was a major breakthrough in communication engineering: it allows system designers to estimate the capacity of communication channels and the requirements of information sources. It also has some application in data storage (since storing information away and getting it back is rather similar to sending and receiving it), and possibly in psychology. Relations with other subjects have also been suggested, though so far without much important impact.

The basic observation is that information removes uncertainty, and if we can measure uncertainty, we can measure information. For example, if you do not know whether someone is male or female, the two possibilities are equally probable; if you do not know the age of a person, and it is equally likely to be anywhere from 0 to 127, then your uncertainty is greater, so the information value of knowing the age is greater than that of knowing the gender. How much greater? Today we know that one requires 1 bit of information and the other requires 7 bits. However, it is not as simple as that: we may already have some information, such as that it is very unlikely for a person to be over 100; and if we are talking about an NUS professor, the chance of him/her being below 30 or above 70 would be low, so the age is actually worth less than 7 bits. Mathematically, a non-uniform probability distribution has less uncertainty than a uniform distribution over the same set, and we have formulae that produce the exact information value from the probability distribution; in the case of a uniform distribution over a set of size N, the information value is of course log N (to base 2, giving bits).

The second observation is that sending information over a communication channel may add uncertainty and hence reduce information, since what you receive might be different from what was sent. You can model the channel's behaviour in terms of probability distributions too: if x is sent, there is some probability of receiving x1, x2, x3... instead. This can be done not just for single values but for whole functions, and you can work out a theoretical information transmission capacity from the way a channel changes a finite-duration function into a longer one because of slow channel response and resonances, or equivalently, from the way it suppresses high frequencies (because of its inability to respond fast enough) and enhances certain frequencies (because of resonances).

The important implication of Shannon's results is that there are inherent limits on our ability to design signals so as to get more information through a communication channel: you can represent information using pulses of various levels and shapes, modulate different carrier frequencies, recode using different number systems, etc., but the ultimate capacity of a channel is determined by its send-receive probability distributions. However, you can use signal design for other objectives. For example, though the received signal may be different from the sent signal, you can get information across without error (or more correctly, with infinitesimal probability of error), as long as the amount is less than the channel capacity.
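To make the uncertainty calculations above concrete, here is a minimal Python sketch (not part of the original notes; the skewed age distribution is a made-up illustration) computing the Shannon entropy H = -sum p*log2(p) for the examples in the text:

    import math

    def entropy(probs):
        """Shannon entropy in bits: H = -sum p*log2(p), skipping zero terms."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Gender: two equally likely outcomes -> 1 bit.
    print(entropy([0.5, 0.5]))              # 1.0

    # Age uniform over 0..127: 128 equally likely outcomes -> log2(128) = 7 bits.
    print(entropy([1/128] * 128))           # 7.0

    # A skewed (hypothetical) age distribution: 90% of the mass on ages 30..70,
    # 10% spread over the remaining 87 ages -> strictly less than 7 bits.
    skewed = [0.9 / 41] * 41 + [0.1 / 87] * 87
    print(entropy(skewed))                  # about 5.9, i.e. less than 7 bits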
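As for modelling a channel by its send-receive probabilities: the simplest textbook model (used here purely as an illustration; the notes do not work out a specific channel) is the binary symmetric channel, which flips each transmitted bit with probability p. Its capacity comes out as C = 1 - H2(p) bits per use, where H2 is the entropy of a p-versus-(1-p) coin:

    import math

    def h2(p):
        """Binary entropy function H2(p) in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity of a binary symmetric channel with flip probability p."""
        return 1 - h2(p)

    print(bsc_capacity(0.0))   # 1.0 -- noiseless, one full bit per use
    print(bsc_capacity(0.1))   # ~0.53 -- noise destroys almost half the capacity
    print(bsc_capacity(0.5))   # 0.0 -- output independent of input, no information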
This error-free transmission is achieved by using the extra transmission capacity to send information about the information, so that errors caused by the channel produce inconsistencies in the received signal, which allow the errors to be detected and corrected. Alternatively, you can use the spare capacity to send transformed versions of your signals so that their content is secure against eavesdroppers.

A secure code system is usually based on a secret transform algorithm and a secret reverse algorithm, but an idea important for e-business is the public key system: the transform algorithm is released to the public so that other people can send messages to you which only you can read, since eavesdroppers do not know the reverse algorithm. Such a public-secret transform-reverse pair is however difficult to design, and in practice most schemes use reverse transforms which could in theory be worked out from the forward transforms, though with impractical complexity. For example, I publicize a matrix for everyone to transform their numerical messages by matrix multiplication, and I know the inverse matrix, which I multiply into your transformed message to recover the original; others could derive the inverse, but the cost is N**3, while the cost of transforming or reversing a message is only N**2. This difference is rather too small, and it would be easy to "break the code", but there are schemes where the gap between "authorized use" and "unauthorized use" is exponential.

We also have the opposite arrangement: I release the inverse algorithm to the public, and send messages to the people I do business with using the secret transform which others do not know. Anyone can apply the public inverse algorithm and so can read my messages, which are not meant to be secret; however, others cannot send messages pretending to be me, because they do not know my transform algorithm: their messages, when reversed using the algorithm I gave out, make no sense. Hence, my messages are authenticated by the publicly available inverse algorithm. Further, I would not be able to deny having sent those messages, because nobody else could have produced messages reversible by that algorithm. So you see that encryption methods work both ways.
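Returning to the first idea above, spending spare capacity on "information about the information": the crudest such scheme (a sketch for illustration only; practical codes are far more efficient) is the repetition code, where every bit is sent three times and the receiver takes a majority vote, so a single flipped copy shows up as an inconsistency within its group and is out-voted:

    def encode(bits, n=3):
        """Repetition code: spend extra capacity by sending each bit n times."""
        return [b for b in bits for _ in range(n)]

    def decode(received, n=3):
        """Majority vote within each group of n copies; disagreement inside a
        group reveals a channel error, and the vote corrects it."""
        out = []
        for i in range(0, len(received), n):
            group = received[i:i + n]
            out.append(1 if sum(group) > n // 2 else 0)
        return out

    msg = [1, 0, 1, 1]
    sent = encode(msg)          # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
    sent[4] = 1                 # the channel flips one copy of the second bit
    print(decode(sent) == msg)  # True: the error is detected and corrected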
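The matrix scheme described above can likewise be sketched in a few lines; the matrix, the message, and the sizes below are all invented for illustration:

    import numpy as np

    # The forward matrix A plays the role of the published transform; its
    # inverse is the secret reverse transform.
    N = 4
    rng = np.random.default_rng(0)
    A = rng.integers(1, 10, size=(N, N)).astype(float)
    while abs(np.linalg.det(A)) < 1e-9:        # ensure A is invertible
        A = rng.integers(1, 10, size=(N, N)).astype(float)
    A_inv = np.linalg.inv(A)                   # kept secret by the key holder

    message = np.array([72.0, 101.0, 108.0, 112.0])  # "Help" as character codes
    ciphertext = A @ message                   # anyone can transform: N**2 work
    recovered = A_inv @ ciphertext             # the key holder reverses cheaply
    print(np.allclose(recovered, message))     # True

An attacker can of course compute the inverse of A as well, at N**3 cost; as the notes say, this polynomial gap between authorized and unauthorized use is far too small, which is why practical public key systems rely on (conjectured) exponential gaps instead.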