
Pearson's Correlation (r)

Strength and direction of the linear relationship between two variables.


Core idea

Overview

Pearson's correlation coefficient, often denoted as r, measures the strength and direction of the linear relationship between two continuous variables. It is calculated by dividing the covariance of the two variables by the product of their individual standard deviations, effectively scaling the result to a range between -1 and +1.

When to use: Apply this metric when examining the linear association between two variables measured on an interval or ratio scale. It suits data where you expect a roughly constant rate of change and want a measure that does not depend on the units of measurement. Significance tests for r usually also assume that both variables are approximately normally distributed. It should not be used for non-linear relationships or for data with extreme outliers.

Why it matters: In psychological research, r allows relationships between abstract constructs, such as intelligence and academic performance, to be quantified. It is widely used to assess the reliability of diagnostic tools and tests (for example, test-retest reliability) and to predict behavioral outcomes from known variables. It also serves as the foundation for more complex analyses such as regression.

Symbols

Variables

  • r: Pearson's r (the correlation coefficient)
  • Cov: covariance of X and Y
  • SD_x: standard deviation (SD) of X, also written σ_X
  • SD_y: standard deviation (SD) of Y, also written σ_Y

Walkthrough

Derivation

Deriving Pearson's Correlation (r)

This derivation explains how Pearson's correlation coefficient (r) is developed from the concept of covariance to provide a standardized, interpretable measure of the linear relationship between two variables.

Assumptions:
  • The relationship between the two variables (X and Y) is linear.
  • Both variables are measured on an interval or ratio scale.
Step 1. Understanding Covariance:

Covariance measures the extent to which two variables, X and Y, vary together. A positive covariance means they tend to increase or decrease together, while a negative covariance means one tends to increase as the other decreases.
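In symbols, the sample covariance is usually written cov(X,Y) = Σ(x_i - x̄)(y_i - ȳ) / (n - 1). A hand-rolled Python sketch of that formula follows; the `covariance` helper and the data are hypothetical, and the n - 1 divisor is an assumption (dividing by n instead gives the population covariance).

```python
def covariance(xs, ys):
    """Sample covariance: summed cross-products of deviations from the means, over n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Each term is positive when x and y sit on the same side of their means,
    # and negative when they sit on opposite sides.
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / (n - 1)

x = [1, 2, 3, 4, 5]
print(covariance(x, [2, 4, 5, 7, 9]))  # Y rises with X: positive covariance
print(covariance(x, [9, 7, 5, 4, 2]))  # Y falls as X rises: negative covariance
```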

Step 2. The Need for Standardization:

The raw value of covariance is influenced by the scales and units of the variables X and Y. This makes it difficult to interpret the strength of a relationship or compare it across different datasets, as there's no fixed range for covariance.

Step 3. Standardizing with Standard Deviations:

To create a universally interpretable measure, we need to standardize the covariance. This is done by dividing it by the individual variability of X and Y, as measured by their respective standard deviations (σ_X and σ_Y).

Step 4. The Pearson's r Formula:

By dividing the covariance by the product of the standard deviations of X and Y, we obtain Pearson's r. This results in a unitless value ranging from -1 to +1, indicating the strength and direction of the linear relationship.

Result: r = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}

Source: AQA Psychology for A Level Year 2 by Cara Flanagan, Dave Berry, Jo Hayward, and Rob Liddle
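Writing the sample covariance and standard deviations out in full shows why r can also be computed directly from deviation scores. Assuming the covariance and both standard deviations use the same n - 1 divisor, that divisor cancels:

```latex
r = \frac{\text{cov}(X,Y)}{\sigma_X \sigma_Y}
  = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```

The same cancellation happens if every quantity is divided by n instead, so the sample and population versions give the same r.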

Free formulas

Rearrangements


Pearson's Correlation (r)

This equation defines Pearson's correlation coefficient, `r`, in terms of the covariance of two variables, X and Y, and their respective standard deviations. In the shorthand used on this page, r = Cov / (SD_x × SD_y).
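Multiplying both sides by σ_X σ_Y gives the covariance, and dividing that result by the remaining factors gives each standard deviation. The last two arrangements assume r ≠ 0 (and, as always, non-zero standard deviations):

```latex
\text{cov}(X,Y) = r\,\sigma_X \sigma_Y
\qquad
\sigma_X = \frac{\text{cov}(X,Y)}{r\,\sigma_Y}
\qquad
\sigma_Y = \frac{\text{cov}(X,Y)}{r\,\sigma_X}
```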


Visual intuition

Graph


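The relationship is shown on a scatter plot, with one variable on the x-axis and the other on the y-axis (in a correlational study these are co-variables rather than independent and dependent variables). Pearson's r summarizes how closely the points cluster around a straight line. The direction in which that line slopes gives the sign of the correlation, but the strength depends on how tightly the points cluster around the line, not on how steep the line is. Only when r is exactly +1 or -1 do all the points fall on a straight line.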


Why it behaves this way

Intuition

Imagine a scatter plot where each point represents a pair of (X, Y) values. Pearson's r describes how closely these points cluster around a straight line and whether that line slopes upwards (positive r) or downwards (negative r).

  • r: Pearson's correlation coefficient, quantifying the strength and direction of a linear relationship between two continuous variables. A value of +1 indicates a perfect positive linear relationship (as one variable increases, the other increases at a constant rate), -1 a perfect negative linear relationship (as one variable increases, the other decreases at a constant rate), and 0 no linear relationship.
  • cov(X,Y): Covariance of variables X and Y, a measure of how much the two variables vary together. If X and Y tend to be above or below their respective means at the same time, the covariance is positive. If one tends to be above its mean while the other is below, it is negative.
  • σ_X: Standard deviation of variable X, representing the typical spread of data points around the mean of X.
  • σ_Y: Standard deviation of variable Y, representing the typical spread of data points around the mean of Y.

Dividing the covariance by σ_X σ_Y normalizes it, ensuring that r is a unitless measure of association, independent of the scales or units of X and Y.

Signs and relationships

  • \text{cov}(X,Y): The sign of the covariance directly determines the sign of Pearson's r. A positive covariance indicates a positive linear relationship (variables tend to move in the same direction), while a negative covariance indicates a negative linear relationship (as one variable increases, the other tends to decrease).
  • \sigma_X \sigma_Y: Standard deviations are always non-negative. Their product in the denominator scales the covariance, transforming it into a standardized, unitless measure ranging from -1 to +1.

Free study cues

Insight

Canonical usage

Pearson's correlation coefficient (r) is a dimensionless quantity, always reported as a pure number between -1 and +1, irrespective of the units of the variables X and Y.

Common confusion

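Students sometimes mistakenly try to assign units to the correlation coefficient r, or confuse its interpretation with that of coefficients based on different assumptions. For example, Spearman's rho shares the -1 to +1 range but measures the monotonic association between ranks rather than the linear association between raw scores.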

Dimension note

Pearson's r is inherently dimensionless because it is a ratio of the covariance of X and Y (which has units of [unit X] * [unit Y]) to the product of their standard deviations (which also has units of [unit X] * [unit Y]), so the units cancel.

One free problem

Practice Problem

A psychologist finds that the covariance between study hours (X) and test scores (Y) is 12.0. If the standard deviation for study hours is 4.0 and for test scores is 5.0, what is the correlation coefficient?

  • Covariance: 12.0
  • SD of X: 4.0
  • SD of Y: 5.0


Hint: Divide the covariance by the product of the two standard deviations.
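Following the hint, the arithmetic is r = 12.0 / (4.0 × 5.0) = 12.0 / 20.0 = 0.6, a positive correlation between study hours and test scores. The same calculation in Python:

```python
# Values given in the practice problem.
covariance = 12.0
sd_x = 4.0  # SD of study hours
sd_y = 5.0  # SD of test scores

# Pearson's r: covariance divided by the product of the standard deviations.
r = covariance / (sd_x * sd_y)
print(r)  # 0.6
```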


Study smarter

Tips

  • Always visualize data with a scatterplot to ensure the relationship is linear.
  • Remember that r values only range from -1.0 to +1.0.
  • Outliers can significantly inflate or deflate the resulting correlation.
  • Correlation measures association, but never proves a cause-and-effect link.

Avoid these traps

Common Mistakes

  • Using it on non-linear data.
  • Assuming correlation implies causation.


References

Sources

  1. Aron, A., Aron, E. N., & Coups, E. Statistics for Psychology (8th ed.).
  2. Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.
  3. Flanagan, C., Berry, D., Hayward, J., & Liddle, R. AQA Psychology for A Level Year 2.
  4. Howell, D. C. (2013). Statistical Methods for Psychology (8th ed.). Wadsworth Cengage Learning.
  5. Wikipedia. Pearson correlation coefficient (also titled "Pearson product-moment correlation coefficient").