Pearson's Correlation (r)
Strength and direction of the linear relationship between two variables.
Core idea
Overview
Pearson's correlation coefficient, often denoted as r, measures the strength and direction of the linear relationship between two continuous variables. It is calculated by dividing the covariance of the two variables by the product of their individual standard deviations, effectively scaling the result to a range between -1 and +1.
When to use: Apply this metric when examining the linear association between two interval or ratio-level variables. Approximate normality matters mainly when r is tested for statistical significance, not for computing the coefficient itself. It suits data where the relationship is roughly linear and you want a measure that does not depend on the units of measurement. It should not be used for clearly non-linear relationships, and it is sensitive to extreme outliers.
Why it matters: In psychological research, r allows for the quantification of relationships between abstract constructs like intelligence and academic performance. This coefficient is vital for assessing the reliability of diagnostic tools and predicting behavioral outcomes based on known variables. It serves as the foundation for more complex multivariate analyses like regression.
Symbols
Variables
r = Pearson's r, Cov = Covariance, SD_x = SD of X, SD_y = SD of Y
Walkthrough
Derivation
Derivation/Understanding of Pearson's Correlation (r)
This derivation explains how Pearson's correlation coefficient (r) is developed from the concept of covariance to provide a standardized, interpretable measure of the linear relationship between two variables.
Assumptions:
- The relationship between the two variables (X and Y) is linear.
- Both variables are measured on an interval or ratio scale.
Understanding Covariance:
Covariance measures the extent to which two variables, X and Y, vary together. A positive covariance means they tend to increase or decrease together, while a negative covariance means one tends to increase as the other decreases.
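As a rough illustration of the sign behaviour described above, the sketch below computes a sample covariance by hand. The function name `sample_covariance`, the n - 1 denominator, and the study-hours data are all assumptions made for this example; they do not come from the source.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator (an assumed convention)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum the products of paired deviations from each mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores_up = [52, 55, 61, 64, 68]    # tends to rise with hours
scores_down = [68, 64, 61, 55, 52]  # tends to fall as hours rise

cov_up = sample_covariance(hours, scores_up)
cov_down = sample_covariance(hours, scores_down)
print(cov_up, cov_down)  # 10.25 -10.25
```

The two results have the same size and opposite signs because the second score list is simply the first one reversed.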
The Need for Standardization:
The raw value of covariance is influenced by the scales and units of the variables X and Y. This makes it difficult to interpret the strength of a relationship or compare it across different datasets, as there's no fixed range for covariance.
Standardizing with Standard Deviations:
To create a universally interpretable measure, we need to standardize the covariance. This is achieved by dividing it by a measure of the individual variability of X and Y, which are their respective standard deviations (SD_x and SD_y).
The Pearson's r Formula:
By dividing the covariance by the product of the standard deviations of X and Y, we obtain Pearson's r. This results in a unitless value ranging from -1 to +1, indicating the strength and direction of the linear relationship.
Result
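Written with the symbols listed under Variables on this page, the quotient described in the derivation can be expressed as follows. The notation is a reconstruction from those symbols; other sections of this page write the same quantities as cov(X,Y), \sigma_X, and \sigma_Y.

```latex
r = \frac{\mathrm{Cov}(X, Y)}{SD_x \cdot SD_y}, \qquad -1 \le r \le +1
```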
Source: AQA Psychology for A Level Year 2 by Cara Flanagan, Dave Berry, Jo Hayward, and Rob Liddle
Free formulas
Rearrangements
Solve for
Pearson's Correlation (r)
This equation defines Pearson's correlation coefficient, `r`, in terms of the covariance of two variables, X and Y, and their respective standard deviations. The steps demonstrate how to express the formula using common shorthand notation.
Difficulty: 2/5
Visual intuition
Graph
The natural graph for this relationship is a scatter plot, with one variable on the x-axis and the other on the y-axis. The closer the points fall to a straight line, the closer r is to -1 or +1; a wide, shapeless cloud of points gives an r near 0. The direction of the line's slope gives the sign of r, but its steepness does not indicate the strength of the correlation.
Graph type: linear
Why it behaves this way
Intuition
Imagine a scatter plot where each point represents a pair of (X, Y) values. Pearson's r describes how closely these points cluster around a straight line and whether that line slopes upwards (positive r) or downwards (negative r).
Signs and relationships
- \text{cov}(X,Y): The sign of the covariance directly determines the sign of Pearson's r. A positive covariance indicates a positive linear relationship (variables tend to move in the same direction), while a negative covariance indicates a negative linear relationship (variables tend to move in opposite directions).
- \sigma_X \sigma_Y: Standard deviations are always non-negative. Their product in the denominator scales the covariance, transforming it into a standardized, unitless measure ranging from -1 to +1.
Free study cues
Insight
Canonical usage
Pearson's correlation coefficient (r) is a dimensionless quantity, always reported as a pure number between -1 and +1, irrespective of the units of the variables X and Y.
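A small sketch can make the unit independence concrete. It assumes sample statistics with an n - 1 denominator and uses invented study data; the helper names `covariance` and `pearson_r` are hypothetical. Re-expressing X in minutes instead of hours multiplies the covariance by 60 but should leave r unchanged.

```python
import math


def covariance(xs, ys):
    """Sample covariance using the n - 1 denominator (an assumed convention)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # the variance of X is Cov(X, X)
    sd_y = math.sqrt(covariance(ys, ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    return covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
minutes = [h * 60 for h in hours]  # same measurements, different unit

cov_hours = covariance(hours, scores)
cov_minutes = covariance(minutes, scores)
r_hours = pearson_r(hours, scores)
r_minutes = pearson_r(minutes, scores)

print(cov_hours, cov_minutes)  # the covariance changes with the unit
print(r_hours, r_minutes)      # r stays the same up to rounding error
```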
Common confusion
Students sometimes mistakenly try to assign units to the correlation coefficient 'r' or confuse its interpretation with coefficients that have different ranges or assumptions (e.g., Spearman's rho).
Dimension note
Pearson's r is inherently dimensionless because it is a ratio of the covariance of X and Y (which has units of [unit X] * [unit Y]) to the product of their standard deviations (which also has units of [unit X] * [unit Y]), so the units cancel.
One free problem
Practice Problem
A psychologist finds that the covariance between study hours (X) and test scores (Y) is 12.0. If the standard deviation for study hours is 4.0 and for test scores is 5.0, what is the correlation coefficient?
Solve for: r
Hint: Divide the covariance by the product of the two standard deviations.
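The hint can be checked with a few lines of code. This is a minimal sketch that plugs the values given in the problem into the covariance-over-standard-deviations formula; the function name `r_from_summary` is a hypothetical helper, not part of any library.

```python
def r_from_summary(cov_xy, sd_x, sd_y):
    """Pearson's r from a covariance and two standard deviations."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return cov_xy / (sd_x * sd_y)


# Values taken from the practice problem: Cov = 12.0, SD_x = 4.0, SD_y = 5.0.
r = r_from_summary(12.0, 4.0, 5.0)
print(round(r, 2))  # 0.6
```

Here 12.0 / (4.0 * 5.0) = 12.0 / 20.0 = 0.6, a positive value well inside the -1 to +1 range.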
Study smarter
Tips
- Always visualize data with a scatterplot to ensure the relationship is linear.
- Remember that r values only range from -1.0 to +1.0.
- Outliers can significantly inflate or deflate the resulting correlation.
- Correlation measures association, but never proves a cause-and-effect link.
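The tip about outliers can be illustrated with a short sketch. The data are invented, and `pearson_r` is a hypothetical helper written for this example. Five points with almost no linear trend are compared with the same points plus a single extreme observation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired deviations (the n - 1 factors cancel out)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with very little linear trend.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)

# The same data with one extreme point added at (20, 20).
r_with = pearson_r(x + [20], y + [20])

print(round(r_without, 2), round(r_with, 2))  # 0.14 0.97
```

In this example a single point moves r from near zero to near +1. An outlier lying away from an otherwise clear trend can pull r towards zero in the same way, which is why a scatterplot check is worth doing first.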
Avoid these traps
Common Mistakes
- Using it on non-linear data.
- Assuming correlation implies causation.
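The first mistake can be made concrete with a sketch using invented data: Y is completely determined by X through a curve, yet r comes out as zero because the relationship has no linear component. The helper `pearson_r` is hypothetical and written only for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired deviations (the n - 1 factors cancel out)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect but non-linear (U-shaped) relationship: y = x squared.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_curve = pearson_r(x, y)
print(r_curve)  # 0.0
```

An r of zero here does not mean the variables are unrelated, only that a straight line does not describe the relationship.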
References
Sources
- Wikipedia: Pearson correlation coefficient
- Aron, A., Aron, E. N., & Coups, E. Statistics for Psychology (8th ed.).
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.
- Howell, D. C. (2013). Statistical Methods for Psychology (8th ed.). Wadsworth Cengage Learning.
- Flanagan, C., Berry, D., Hayward, J., & Liddle, R. AQA Psychology for A Level Year 2.