Books on information theory tend to fall into one of two extreme categories. There are large academic textbooks that cover the subject with great depth and rigor. Probably the best known of these is the book by Cover and Thomas. At the other extreme are the popular books such as the ones by Pierce and Gleick. They provide a very superficial introduction to the subject, enough to engage in cocktail party conversation but little else. This book attempts to bridge these two extremes.

This book is written for someone who is at least semi-mathematically literate and wants a concise introduction to some of the major concepts in information theory. The level of mathematics needed is very elementary. A rudimentary grasp of logarithms, probability, and basic algebra is all that is required. Two chapters at the end of the book provide a review of everything the reader needs to know about logarithms and discrete probability to get the most out of the book. Very little attention is given to mathematical proof. Instead the results are presented in a way that makes them almost obvious or at least plausible.

We start in the introduction with a discussion of how information theory has its roots in the field of communication systems design. This leads to the question of how to quantify information and how a logarithmic measure is the most sensible. The concept of entropy is introduced at this point but only for the case where all messages are equally probable. The introduction ends with two examples of how information concepts come up in areas seemingly unrelated to communication. The first is a number guessing game and the second is the problem of finding a counterfeit coin.

The next chapter looks at the problem of encoding messages as efficiently as possible. This is the source coding or data compression problem. The idea of prefix free codes and the Kraft-McMillan inequality are introduced. It is shown how the entropy is a lower limit for the average length of a code word.

The following two chapters discuss specific coding techniques. The first is Huffman coding. Three detailed examples of constructing a Huffman code are worked out. Software for constructing Huffman codes can be found on the book's website.

The next coding chapter is on a powerful technique called arithmetic coding. This technique encodes a string of symbols as a single code word. For long strings of symbols it gets very close to the per symbol entropy.

There is a long chapter devoted just to the concept of entropy. How to calculate joint and conditional entropy is covered along with a detailed example of how to use them. There is a discussion of mutual information, what it means and how it is calculated. This is followed by a simple example of how to calculate the entropy of a Markov chain. The chapter ends with an elementary example using the principle of maximum entropy to infer a probability distribution.

Calculating the entropy of English is covered in the next chapter. It shows how to use the statistics of n-grams to get a series of increasingly accurate estimates for the entropy. It also shows how to use the statistics of words to estimate the entropy.

The next chapter covers channel capacity and the noisy channel coding theorem. It shows in general how to calculate channel capacity but only the capacity of a binary symmetric channel is worked out in detail. There is a brief discussion of the noisy channel coding theorem with no proofs. The chapter ends with an unusual example of the use of channel capacity in gambling and investing. This is called the Kelly gambling system.

The final chapter is a brief introduction to the topic of error correction or channel coding. Repetition codes, parity check codes, and Hamming codes are covered.

We hope this book is useful for someone looking for a fast introduction to most of the major topics in information theory. An introduction that is concise but not superficial.