
Entropy (Shannon)

Average level of information/uncertainty.



Core idea

Overview

Shannon entropy quantifies the average level of uncertainty, surprise, or information inherent in a random variable's possible outcomes. It provides the theoretical foundation for data compression by defining the minimum average number of bits required to represent a message.

When to use: Use this formula to determine the limits of lossless data compression or to measure the unpredictability of a discrete probability distribution. It applies when the set of possible outcomes is finite and the outcomes are generated independently with known probabilities.

Why it matters: It is the fundamental metric of information theory, underpinning the efficiency of modern digital communications, from ZIP files to streaming video. By exposing the statistical structure of data, it allows storage and transmission bandwidth to be optimized.
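To make the definition concrete, here is a minimal Python sketch (not part of the original page) that computes the entropy of a known discrete distribution; the probability lists are illustrative.

```python
import math

def shannon_entropy(probabilities):
    """Return the Shannon entropy, in bits, of a discrete distribution.

    `probabilities` is a sequence of outcome probabilities that sum to 1.
    Outcomes with p = 0 are skipped, treating 0 * log2(0) as 0.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Four equally likely outcomes: maximum uncertainty, log2(4) = 2 bits per symbol.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# A skewed distribution is more predictable, so its entropy is lower (about 1.36 bits).
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))
```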

Symbols

Variables

  • H = Entropy, measured in bits
  • p = Probability of an outcome (a dimensionless value between 0 and 1)

Walkthrough

Derivation

Formula: Shannon Entropy

Shannon entropy measures the average uncertainty (information content) of a discrete random variable, using the probabilities of its outcomes.

  • X is a discrete random variable with outcomes x and probabilities p(x) = P(X = x).
  • Terms with p(x) = 0 contribute 0 (treat 0 log₂ 0 as 0).

Step 1: State the entropy formula.

H(X) = -Σₓ p(x) log₂ p(x)

This sums the probability-weighted information log₂(1/p(x)) across all outcomes, giving the expected information per symbol.

Step 2: Interpret the units.

Using base-2 logarithms means the entropy is measured in bits (binary digits).

Note: Maximum entropy occurs when all outcomes are equally likely.
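As a quick check of that note, substituting a uniform distribution (p(x) = 1/n for each of n outcomes) into the formula gives the maximum value:

```latex
H(X) = -\sum_{i=1}^{n} \frac{1}{n}\log_2\frac{1}{n} = \log_2 n
```

For example, 8 equally likely symbols need log₂ 8 = 3 bits per symbol on average.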

Result

H(X) = -Σₓ p(x) log₂ p(x), measured in bits.

Source: AQA A-Level Computer Science — Data Representation

Free formulas

Rearrangements

Solve for: Entropy (Shannon)

Simplify Shannon's Entropy formula from its general summation form to the specific case of binary entropy, where there are only two possible outcomes.

Difficulty: 2/5
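For reference, the finished rearrangement described above is the binary entropy function: with only two outcomes, of probabilities p and 1 - p, the sum collapses to two terms:

```latex
H(p) = -p\log_2 p - (1 - p)\log_2(1 - p)
```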


Visual intuition

Graph

The graph of entropy (H) against the probability (p) for a binary source is a concave curve that starts and ends at zero (at p = 0 and p = 1). It reaches a single maximum of 1 bit at p = 0.5, reflecting the peak uncertainty in a binary system.

Graph type: concave (symmetric about p = 0.5)
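A small Python sketch (illustrative, not from the page) that samples the binary entropy curve confirms this shape:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with the p = 0 and p = 1 cases set to 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The values rise from 0, peak at exactly 1 bit when p = 0.5, and fall back to 0.
for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}  ->  H = {binary_entropy(p):.3f} bits")
```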

Why it behaves this way

Intuition

Shannon entropy quantifies the 'spread' or 'flatness' of a probability distribution: a more uniform distribution (all outcomes equally likely) has higher entropy, while a distribution concentrated on one or a few outcomes has lower entropy and is easier to predict.

H(X)
Shannon entropy, representing the average uncertainty or information content of a random variable X. A higher H(X) means the outcomes of X are more unpredictable or 'surprising' on average, requiring more bits to describe.

p(x)
The probability of a specific outcome x from the set of all possible outcomes for the random variable X. It states how likely a particular event x is to occur; less likely events (small p(x)) carry more individual information.

log₂ p(x)
The logarithm (base 2) of the probability of an outcome x. This term, when negated, represents the 'self-information' or 'surprise' of outcome x. Since p(x) is between 0 and 1, log₂ p(x) is always negative or zero; outcomes with very low probability have a large negative log₂ p(x), meaning they are very 'surprising' and carry a lot of information when they occur.

Signs and relationships

  • Negative sign: The logarithm log₂ p(x) is negative for probabilities p(x) between 0 and 1. The leading minus sign ensures that the information content -log₂ p(x) is a non-negative quantity, representing a number of bits.
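For example, an outcome with probability 1/8 carries

```latex
-\log_2\left(\tfrac{1}{8}\right) = 3 \text{ bits}
```

of self-information, while a certain outcome (p = 1) carries 0 bits.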

Free study cues

Insight

Canonical usage

Shannon entropy quantifies information in units determined by the base of the logarithm used, most commonly bits (for base 2 logarithm).

Common confusion

A common confusion is treating 'bits' as a physical unit rather than a specific unit of information derived from the base-2 logarithm. Another is forgetting that probabilities must sum to 1 and are dimensionless.

Dimension note

Shannon entropy is a dimensionless quantity representing the average information content or uncertainty. The probabilities p(x) are themselves dimensionless, and the logarithm of a dimensionless quantity is also dimensionless; the 'bit' is a counting unit for information rather than a physical dimension.

Unit systems

  • H(X): bits - The unit 'bit' is specific to information theory and arises from using a base-2 logarithm. Other bases yield 'nats' (natural logarithm) or 'hartleys' (base-10 logarithm).
  • p(x): dimensionless - Probabilities are dimensionless values between 0 and 1, summing to 1 over all possible outcomes.
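Because changing the logarithm base only rescales the result, the same entropy can be converted between these units:

```latex
H_{\text{nats}} = (\ln 2)\, H_{\text{bits}} \approx 0.693\, H_{\text{bits}},
\qquad
H_{\text{hartleys}} = (\log_{10} 2)\, H_{\text{bits}} \approx 0.301\, H_{\text{bits}}
```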

One free problem

Practice Problem

A fair coin has two outcomes, heads and tails, each with a probability of 0.5. Calculate the Shannon entropy of a single coin flip.

Probability (p) = 0.5

Solve for: Entropy H (in bits)

Hint: When outcomes are equally likely (p = 0.5 for binary), entropy is at its maximum value.
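If you want to check your answer numerically, a minimal sketch using Python's standard math module is:

```python
import math

# Fair coin: two outcomes, each with probability 0.5.
probabilities = [0.5, 0.5]
H = -sum(p * math.log2(p) for p in probabilities)
print(H)  # 1.0 bit, the maximum possible for a binary source
```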


Where it shows up

Real-World Context

When measuring the uncertainty of a biased coin, Entropy (Shannon) is used to calculate the entropy H from the probability (p) of each outcome. The result matters because it quantifies how unpredictable the process is, supporting a likelihood estimate and a risk or decision statement rather than treating any single outcome as certain.
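As an illustration (the 0.9/0.1 split is a made-up example, not from the original page), a coin that lands heads 90% of the time has noticeably less than 1 bit of entropy:

```latex
H = -0.9\log_2 0.9 - 0.1\log_2 0.1 \approx 0.137 + 0.332 \approx 0.47 \text{ bits}
```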

Study smarter

Tips

  • Entropy is maximized when all outcomes are equally likely.
  • The units are in bits when the logarithm is base 2.
  • Entropy is always zero or positive; it is zero only when one outcome is certain.
  • Use the change of base formula: log₂(x) = ln(x) / ln(2); a quick numeric check is sketched below.
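A minimal check of that change-of-base identity in Python (the value of x is arbitrary):

```python
import math

x = 0.3
print(math.log(x) / math.log(2))  # change of base: ln(x) / ln(2), about -1.737
print(math.log2(x))               # Python's built-in base-2 log gives the same value
```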

Avoid these traps

Common Mistakes

  • Using the natural log instead of log₂, which gives the answer in nats rather than bits.
  • Forgetting one of the two terms in the binary case: both p and (1 - p) must appear.


References

Sources

  1. Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379–423.
  2. Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley-Interscience.
  3. MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.
  4. Wikipedia: Shannon entropy