KL Divergence (Bernoulli) Calculator

D_KL(p||q) for Bernoulli distributions.

Overview

The Bernoulli KL divergence measures the relative entropy between two Bernoulli distributions, quantifying the information lost when distribution q is used to approximate distribution p. It is an asymmetric measure of statistical distance between two distributions over the same binary outcome space; because it is not symmetric, it is not a true metric.
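
With natural logarithms, so the result is in nats, the formula is

D_KL(p || q) = p · ln(p / q) + (1 − p) · ln((1 − p) / (1 − q))

where a term whose probability is zero (p = 0 or p = 1) contributes nothing, following the convention 0 · ln 0 = 0.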

Variables

D_KL = KL Divergence, p = True Probability, q = Model Probability

  • KL Divergence (D_KL): measured in nats (natural logarithm)
  • True Probability (p): dimensionless, with 0 ≤ p ≤ 1
  • Model Probability (q): dimensionless; it must lie strictly between 0 and 1 for the divergence to stay finite

When To Use

When to use: This equation is essential when evaluating the performance of binary classifiers or when comparing a theoretical model to observed binary frequencies. It is frequently applied in machine learning as a component of loss functions like Binary Cross-Entropy and in the context of information-theoretic model selection.

Why it matters: It provides a rigorous way to measure the 'surprise', or extra cost, incurred by assuming one set of probabilities when the reality is different. In coding terms, the divergence is the expected extra message length (in nats) paid for encoding data under the wrong model, so minimizing it reduces that overhead and keeps predictive models as close as possible to the true data-generating process.
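
As a concrete sketch (the helper name bernoulli_kl_nats is our own, not part of the calculator), a minimal Python implementation of the divergence in nats looks like this:

```python
import math

def bernoulli_kl_nats(p: float, q: float) -> float:
    """Return D_KL(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    if not (0.0 <= p <= 1.0 and 0.0 < q < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q < 1")
    # Convention: a term with probability 0 contributes nothing (0 * ln 0 = 0).
    heads = 0.0 if p == 0.0 else p * math.log(p / q)
    tails = 0.0 if p == 1.0 else (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return heads + tails
```

For context, binary cross-entropy decomposes as H(p, q) = H(p) + D_KL(p || q); since the entropy of the true labels H(p) is fixed, minimizing cross-entropy during training is equivalent to minimizing this divergence.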

Common Mistakes

  • Swapping p and q (changes the value).
  • Assuming KL is a distance metric (it isn’t symmetric; see the numerical check below).
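
A quick numerical check, a sketch using made-up probabilities, shows why the order of arguments matters:

```python
import math

def kl(p, q):
    # D_KL(p || q) for Bernoulli distributions, in nats
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

print(round(kl(0.9, 0.5), 4))  # 0.3681
print(round(kl(0.5, 0.9), 4))  # 0.5108 -- swapping p and q changes the value
```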

Practice Problem

A coin is known to have a true probability of landing heads of p = 0.5. If a researcher models this coin with an estimated probability q = 0.2, calculate the resulting KL Divergence in nats.

True Probability: p = 0.5
Model Probability: q = 0.2

Solve for: KL Divergence (in nats)

Hint: Plug the values into the formula using natural logarithms for both the p/q and (1-p)/(1-q) terms.
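
If you want to check your own arithmetic numerically (this sketch is not a substitute for the step-by-step reasoning), the formula from the hint plugs in directly:

```python
import math

p, q = 0.5, 0.2  # values from the problem statement
d_kl = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
print(f"{d_kl:.4f} nats")
```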

The full worked solution stays in the interactive walkthrough.
