Information Gain Calculator
Reduction in entropy.
Formula first
Overview
Information Gain measures the reduction in uncertainty, or entropy, in a dataset after it is partitioned on a specific attribute. It is the splitting criterion used by ID3 and, in its normalized gain-ratio form, by C4.5 to choose the best feature at each node of a decision tree.
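Written out (the standard formulation; the symbols are defined in the next section), the gain from splitting a set S on an attribute A is the parent's entropy minus the size-weighted entropies of the children:

$$
\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v),
\qquad
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
$$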
Symbols
Variables
IG = Information Gain; H(S) = entropy of the parent node S; H(S_v) = entropy of child node S_v, weighted by its share of the samples, |S_v| / |S|
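As a concrete illustration, here is a minimal Python sketch of the formula above. The helper names (`entropy`, `information_gain`) and the plain-list label encoding are choices made for this example, not a fixed API; the sample counts reproduce the classic 14-example weather dataset from Mitchell (1997).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent_labels)
    weighted_children = sum(
        (len(group) / total) * entropy(group) for group in child_label_groups
    )
    return entropy(parent_labels) - weighted_children

# Parent node: 9 positive, 5 negative examples -> H ~= 0.940 bits.
parent = ["yes"] * 9 + ["no"] * 5
# Split on "Wind": weak -> [6+, 2-], strong -> [3+, 3-].
children = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
print(round(information_gain(parent, children), 3))  # 0.048
```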
Apply it well
When To Use
When to use: Apply this metric during the construction of decision-tree models to evaluate the predictive power of candidate features. It is most effective with categorical targets, where the goal is to maximize class purity in the resulting subsets.
Why it matters: Selecting the features with the highest Information Gain lets a tree reach pure leaves in fewer levels, reducing computational complexity. Shallower trees are also less prone to overfitting and keep the most informative patterns near the root, where they influence every prediction.
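In practice you rarely compute the gain by hand during training; for instance, scikit-learn's tree grower scores candidate splits by information gain when criterion="entropy" is selected. A short sketch (the toy arrays are made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical integer-encoded features; y happens to equal feature f0.
X = [[0, 1], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]]
y = [0, 1, 0, 1, 1, 0]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=["f0", "f1"]))  # splits on f0 first
```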
Avoid these traps
Common Mistakes
- Adding entropies instead of subtracting.
- Mixing log bases (bits vs. nats) between parent and child entropies; see the sketch after this list.
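The second pitfall is easy to demonstrate: entropy in bits (log base 2) and in nats (natural log) differ by a factor of ln 2, so subtracting a nat-valued child entropy from a bit-valued parent entropy produces a number with no meaning. A minimal sketch:

```python
from math import log, log2

p = [9 / 14, 5 / 14]  # class proportions in one node

h_bits = -sum(q * log2(q) for q in p)  # ~0.940 bits
h_nats = -sum(q * log(q) for q in p)   # ~0.652 nats (same entropy, different unit)

print(h_bits - h_nats)           # wrong: ~0.288 "gain" from a node vs. itself
print(h_bits - h_nats / log(2))  # right: convert nats to bits first -> ~0.0
```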
One free problem
Practice Problem
A dataset has an initial entropy of 0.940 bits. After splitting it based on a specific feature, the weighted average entropy of the child nodes is 0.693 bits. Calculate the Information Gain.
Solve for: IG
Hint: Subtract the entropy of the children from the entropy of the parent node.
The full worked solution stays in the interactive walkthrough.
References
Sources
- Wikipedia: Information gain (decision tree)
- Wikipedia: Entropy (information theory)
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer.
- Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.