Pearson's Product-Moment Correlation Coefficient
A statistical measure that quantifies the strength and direction of the linear relationship between two continuous interval or ratio variables.
Core idea
Overview
Pearson's r takes a value between -1 and +1, where +1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear correlation (a non-linear relationship may still exist). In geographical research, it is used to test hypotheses about how two variables, such as distance from a CBD and property prices, covary across a landscape. The coefficient measures only linear association; testing its significance additionally assumes that both variables are approximately normally distributed.
When to use: Use when analyzing two sets of interval or ratio data to determine if a linear trend exists between them.
Why it matters: It allows geographers to move beyond visual inspection of scatter graphs and put a number on the strength and direction of a relationship between environmental or social variables, which can then be tested for statistical significance.
Symbols
Variables
r = Correlation Coefficient, n = Sample size, x = Variable 1 data points, y = Variable 2 data points
Walkthrough
Derivation
Derivation of Pearson's Product-Moment Correlation Coefficient
The formula is derived from the definition of the correlation coefficient as the covariance of two variables divided by the product of their standard deviations. It simplifies the algebraic expression of the Pearson coefficient for easier computational use.
- The relationship between the two variables is linear.
- The data points are paired as (x, y) observations.
- The variables are measured on an interval or ratio scale.
Defining the Correlation Coefficient
Start with the population definition where r is the covariance divided by the product of the standard deviations.
Note: The 1/n terms cancel out during simplification.
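Written out, the definition described in this step can be sketched as follows. The page's own equation is not shown, so the notation here (the means \bar{x}, \bar{y} and the index i) is an assumption.

```latex
r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
  = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum (x_i-\bar{x})^2 \,\sum (y_i-\bar{y})^2}}
```

The numerator carries one factor of 1/n and the denominator carries two factors of the square root of 1/n, which multiply to 1/n, so the factors cancel.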
Expanding the Covariance Term
Expand the brackets and apply the summation to each term, using the fact that summing a constant (such as the mean) over n observations gives n times that constant.
Note: Recall that n·x̄ = Σx and n·ȳ = Σy.
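As a sketch of this expansion, using the same assumed notation (\bar{x} and \bar{y} for the means):

```latex
\sum (x_i-\bar{x})(y_i-\bar{y})
  = \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}
  = \sum x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y}
  = \sum x_i y_i - n\bar{x}\bar{y}
```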
Simplifying the Covariance Expression
Substitute the definitions of the means (x-bar and y-bar) into the expanded covariance expression to clear the denominators.
Note: This creates the numerator of the final formula.
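Substituting \bar{x} = \sum x / n and \bar{y} = \sum y / n into the expanded covariance term gives, as a sketch:

```latex
\sum xy - n\bar{x}\bar{y}
  = \sum xy - n\cdot\frac{\sum x}{n}\cdot\frac{\sum y}{n}
  = \sum xy - \frac{(\sum x)(\sum y)}{n}
  = \frac{n\sum xy - (\sum x)(\sum y)}{n}
```

The remaining 1/n is the factor that cancels against the denominator in the next step.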
Simplifying the Variance Denominator
Apply the same algebraic expansion to the variance terms for x and y. When substituted back into the denominator, the 'n' factors cancel out.
Note: Ensure you calculate the sum of the squares (Σx²) and the square of the sum ((Σx)²) separately to avoid errors.
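The same expansion applied to the variance terms can be sketched as:

```latex
\sum (x_i-\bar{x})^2 = \sum x^2 - n\bar{x}^2 = \frac{n\sum x^2 - (\sum x)^2}{n},
\qquad
\sum (y_i-\bar{y})^2 = \frac{n\sum y^2 - (\sum y)^2}{n}

\sqrt{\sum (x_i-\bar{x})^2 \,\sum (y_i-\bar{y})^2}
  = \frac{1}{n}\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}
```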
Final Assembly
Combine the simplified numerator and denominator to arrive at the computational formula.
Note: This form is often called the 'computational formula' because it is more efficient for manual calculation.
Result
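Assembled from the steps described in the derivation, and consistent with the numerator nΣxy - (Σx)(Σy) quoted under "Signs and relationships", the computational formula reads:

```latex
r = \frac{n\sum xy - (\sum x)(\sum y)}
         {\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
```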
Source: AQA/Edexcel A-Level Geography Specification - Quantitative Skills: Statistical Analysis
Free formulas
Rearrangements
Solve for
Make r the subject
The formula is already defined with r as the subject.
Difficulty: 1/5
Solve for
Make n the subject
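Squaring both sides gives an equation that is quadratic in n when the sums are treated as fixed, so n can be isolated with the quadratic formula. In practice the sums themselves depend on n, and squaring can introduce an extraneous root that must be checked against the original formula.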
Difficulty: 5/5
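A sketch of that rearrangement, treating every sum as a fixed number. This is an algebraic exercise, not a way to recover a sample size from real data.

```latex
r^2\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]
  = \left[n\sum xy - (\sum x)(\sum y)\right]^2

\left[r^2 \sum x^2 \sum y^2 - (\sum xy)^2\right] n^2
  - \left[r^2\left(\sum x^2 (\sum y)^2 + \sum y^2 (\sum x)^2\right) - 2\sum xy\,\sum x\,\sum y\right] n
  + (r^2 - 1)(\sum x)^2(\sum y)^2 = 0
```

The second line has the form an² + bn + c = 0, so the quadratic formula applies. Any root should be checked against the original, unsquared formula.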
Solve for
Make Σxy the subject
Isolate the numerator term by multiplying by the denominator and rearranging.
Difficulty: 3/5
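Multiplying the computational formula through by its denominator and isolating the Σxy term gives, as a sketch:

```latex
\sum xy = \frac{(\sum x)(\sum y)
          + r\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}{n}
```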
Why it behaves this way
Intuition
Think of the data as a cloud of points on a scatter graph. This equation calculates how well those points fit onto a straight line. Imagine trying to draw a 'best-fit' line through the cloud: the numerator measures how much the x and y values 'move together' (covariance), while the denominator acts as a scaling factor (standard deviations) to normalize the result, ensuring the value always sits between -1 and 1 regardless of the units used.
Signs and relationships
- Numerator (nΣxy - (Σx)(Σy)): If the numerator is positive, x and y increase together (positive correlation). If negative, one increases as the other decreases (negative correlation).
- Square root in denominator: This forces the result into the -1 to +1 range by dividing the covariance by the product of the two variables' individual standard deviations (normalization).
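A minimal Python sketch of the two points above: the sign of the numerator sets the direction, and the denominator removes the units. The function name `pearson_r` and the distance and price values are hypothetical, made up for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the computational formula (sums of raw values)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired, so they need equal length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)   # sum of the squares
    sum_y2 = sum(y * y for y in ys)
    # Numerator: its sign gives the direction of the relationship.
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: scales the result so that it is unit-free.
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable has zero variance")
    return numerator / denominator


# Hypothetical data: distance from a CBD (km) and property price (thousands).
distance_km = [1, 2, 3, 4, 5, 6]
price_k = [420, 390, 400, 350, 330, 300]

r_km = pearson_r(distance_km, price_k)

# The same distances expressed in metres: r should be unchanged.
distance_m = [d * 1000 for d in distance_km]
r_m = pearson_r(distance_m, price_k)

print(r_km, r_m)
```

Prices fall as distance rises in this made-up sample, so the numerator (and therefore r) is negative, and converting kilometres to metres leaves r the same up to floating-point rounding.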
One free problem
Practice Problem
Given a small sample where n=5, Σx=15, Σy=20, Σxy=70, Σx²=55, and Σy²=90, calculate Pearson's r.
Solve for: r
Hint: Calculate the numerator first, then the denominator parts separately.
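To check a hand calculation, the hint can be followed step by step in a short Python sketch. The function name `pearson_r_from_sums` and its parameter names are assumptions chosen for this example. It takes the summary totals directly instead of the raw data.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from summary totals, following the hint's order of work."""
    # Step 1: numerator.
    numerator = n * sum_xy - sum_x * sum_y
    # Step 2: the two denominator parts, calculated separately.
    x_part = n * sum_x2 - sum_x ** 2   # uses the sum of squares AND the squared sum
    y_part = n * sum_y2 - sum_y ** 2
    # Step 3: combine.
    return numerator / math.sqrt(x_part * y_part)


r = pearson_r_from_sums(n=5, sum_x=15, sum_y=20, sum_xy=70, sum_x2=55, sum_y2=90)
print(round(r, 3))
```

Run it only after attempting the problem by hand, then compare the printed value with your own.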
Where it shows up
Real-World Context
Investigating the correlation between the distance of settlements from a river (x) and the average annual flood depth (y) to determine flood risk zones.
Study smarter
Tips
- Always plot a scatter graph first to check for linearity before calculating r.
- Use a sufficiently large sample (n); with only a few points, a single outlier can dominate the value of r.
- Remember that correlation does not imply causation.
Avoid these traps
Common Mistakes
- Forgetting to square the sum (Σx)² versus summing the squares Σx².
- Applying the test to non-linear relationships (e.g., exponential growth patterns).
- Ignoring the impact of extreme outliers which can heavily bias the result.
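The last two traps can be illustrated with a short Python sketch. All values are hypothetical and made up for illustration. The helper `pearson_r` here uses the deviation (definition) form of the coefficient, which is algebraically equivalent to the computational formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means (definition form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Trap: one extreme outlier. Five made-up points with no clear trend...
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)

# ...then the same points plus a single extreme observation.
r_with = pearson_r(xs + [30], ys + [30])

# Trap: a non-linear relationship. y = x^2 is a perfect relationship,
# but on a range symmetric about zero it has no linear component.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(r_without, r_with, r_curve)
```

In this sketch the single added point moves r from a weak value to one close to +1, while the perfectly curved data gives an r of essentially zero. Both cases are easy to spot on a scatter graph, which is why plotting first is recommended.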
References
Sources
- Pearson, K. (1896). Mathematical Contributions to the Theory of Evolution.
- Burt, J. E., Barber, G. M., & Rigby, D. L. (2009). Elementary Statistics for Geographers.
- AQA/Edexcel A-Level Geography Specification - Quantitative Skills: Statistical Analysis