
Pearson Correlation Coefficient

Measures the strength and direction of the linear relationship between two continuous variables.


Core idea

Overview

The Pearson product-moment correlation coefficient (r) quantifies the degree to which two continuous variables are linearly related. Its value ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship. It's a fundamental statistic for exploring bivariate associations in social science research.

When to use: Apply it when examining the linear association between two continuous variables measured at the interval or ratio level. It is common in studies exploring relationships between social attitudes, economic indicators, educational attainment, or health outcomes.

Why it matters: Essential for understanding how social phenomena co-vary. It helps sociologists identify potential predictors, explore theoretical relationships, and inform the development of more complex models like regression. It's a key descriptive statistic and a precursor to many inferential analyses.

Symbols

Variables

  • Cov(X,Y): covariance of X and Y (units of X × units of Y)
  • SD_X: standard deviation of X (units of X)
  • SD_Y: standard deviation of Y (units of Y)
  • r: Pearson's correlation coefficient (dimensionless)

Walkthrough

Derivation

Formula: Pearson Correlation Coefficient

Derives the standardized measure of linear association between two continuous variables.

  • Variables are continuous.
  • Relationship is linear.
  • Data are approximately normally distributed (for inferential tests).

Step 1. Start with covariance:

Covariance measures how much two variables change together, but its magnitude depends on the variables' scales.
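For reference, a common sample definition of covariance is shown below, assuming n paired observations (x_i, y_i) with sample means x̄ and ȳ and the n − 1 (sample) denominator. Some texts use n instead (population covariance); r comes out the same either way, provided the standard deviations use the same convention, because the factor cancels. The second line shows the scale dependence mentioned above: rescaling X by a constant c rescales the covariance by c.

```latex
\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad
\operatorname{Cov}(cX, Y) = c\,\operatorname{Cov}(X,Y)
```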

Step 2. Standardize by the standard deviations:

Dividing covariance by the product of the individual standard deviations standardizes the measure, making it unitless and interpretable between -1 and +1.

Result:

$$r = \frac{\operatorname{Cov}(X,Y)}{SD_X \, SD_Y}$$

Source: Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240-242.


Rearrangements

Solve for Cov(X,Y)

Make Cov(X,Y) the subject of the Pearson correlation coefficient formula.

$$\operatorname{Cov}(X,Y) = r \, SD_X \, SD_Y$$

Difficulty: 2/5

Solve for SD_X

Make SD_X the subject of the Pearson correlation coefficient formula (this requires r ≠ 0).

$$SD_X = \frac{\operatorname{Cov}(X,Y)}{r \, SD_Y}$$

Difficulty: 2/5

Solve for SD_Y

Make SD_Y the subject of the Pearson correlation coefficient formula (this requires r ≠ 0).

$$SD_Y = \frac{\operatorname{Cov}(X,Y)}{r \, SD_X}$$

Difficulty: 2/5
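The three rearrangements can be expressed as small Python functions. This is a sketch only: the function names are hypothetical, and the zero checks reflect the assumption that SD_X or SD_Y cannot be recovered when r (or the other standard deviation) is zero, since the division is then undefined.

```python
def covariance_from_r(r, sd_x, sd_y):
    """Cov(X, Y) = r * SD_X * SD_Y."""
    return r * sd_x * sd_y


def sd_x_from_r(cov, r, sd_y):
    """SD_X = Cov(X, Y) / (r * SD_Y); undefined when r or SD_Y is 0."""
    if r == 0 or sd_y == 0:
        raise ValueError("cannot solve for SD_X when r or SD_Y is zero")
    return cov / (r * sd_y)


def sd_y_from_r(cov, r, sd_x):
    """SD_Y = Cov(X, Y) / (r * SD_X); undefined when r or SD_X is 0."""
    if r == 0 or sd_x == 0:
        raise ValueError("cannot solve for SD_Y when r or SD_X is zero")
    return cov / (r * sd_x)


# Round trip with illustrative values: r = 0.5, SD_X = 4, SD_Y = 6
cov = covariance_from_r(0.5, 4, 6)
print(cov)                       # 12.0
print(sd_x_from_r(cov, 0.5, 6))  # 4.0
print(sd_y_from_r(cov, 0.5, 4))  # 6.0
```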


Why it behaves this way

Intuition

Visualize a scatter plot where each point represents one observation's values on the two variables. The Pearson correlation coefficient describes how closely these points cluster around a straight line, and whether that line slopes upward (positive r) or downward (negative r).

  • r: The Pearson product-moment correlation coefficient, quantifying the strength and direction of a linear relationship between two continuous variables. A value close to +1 indicates a strong positive linear association (as one variable increases, the other tends to increase at a roughly constant rate).
  • Cov(X, Y): The covariance between X and Y, a measure of how much the two variables vary together. If X and Y tend to increase or decrease together, their covariance is positive; if one tends to increase while the other decreases, it is negative. Its magnitude depends on the units of X and Y.
  • SD_X: The standard deviation of X, a measure of the typical spread of data points around the mean of X. A larger SD_X means the values of X are more spread out from their average. It helps normalize the covariance, making r unitless.
  • SD_Y: The standard deviation of Y, a measure of the typical spread of data points around the mean of Y. A larger SD_Y means the values of Y are more spread out from their average. It helps normalize the covariance, making r unitless.

Signs and relationships

  • Cov(X, Y): The sign of the covariance directly determines the sign of 'r'. A positive covariance indicates a positive linear relationship, meaning as one variable increases, the other tends to increase.
  • SD_X SD_Y: The product of the standard deviations is always positive (for variables that are not constant), so the sign of 'r' is determined solely by the covariance. This term normalizes the covariance, scaling 'r' to a value between -1 and +1 and making it a standardized, unit-free measure of linear association.
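Why the scaling lands exactly between -1 and +1 can be seen from the Cauchy–Schwarz inequality applied to the centred deviations. The sketch below assumes the same sample (n − 1) convention for covariance and standard deviations; dividing both sides of the first inequality by n − 1 gives the second. Equality (r = ±1) holds only when the deviations of Y are an exact multiple of the deviations of X, that is, when all points lie on one straight line.

```latex
\left|\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right|
  \le \sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}
\;\Longrightarrow\;
|\operatorname{Cov}(X,Y)| \le SD_X\, SD_Y
\;\Longrightarrow\;
-1 \le r \le 1
```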


Insight

Canonical usage

The Pearson correlation coefficient is a dimensionless statistic: its value does not depend on the units used for the underlying variables, provided each variable is measured in consistent units across observations. Changing units (for example, dollars to thousands of dollars, or years to months) leaves r unchanged.

Common confusion

A common confusion is incorrectly assuming that Pearson's r has units, or misapplying it to variables that are not continuous or not measured on at least an interval scale, which can lead to invalid interpretations.

Dimension note

The Pearson correlation coefficient is inherently dimensionless because it is the ratio of the covariance of two variables to the product of their standard deviations. The numerator, Cov(X,Y), carries the units of X times the units of Y, and so does the denominator, SD_X · SD_Y, so the units cancel.

Unit systems

X, Y: any consistent unit for continuous variables (e.g., years, dollars, scores). While X and Y carry their own units and dimensions, the structure of the Pearson correlation coefficient ensures that these units cancel in the final calculation, rendering 'r' dimensionless.



Practice Problem

A study on social capital and community engagement found the covariance between the two variables to be 24. The standard deviation of social capital scores was 4, and the standard deviation of community engagement scores was 6. Calculate the Pearson correlation coefficient (r).

  • Covariance of X and Y: Cov(X,Y) = 24
  • Standard deviation of X: SD_X = 4
  • Standard deviation of Y: SD_Y = 6

Solve for: r

Hint: Divide the covariance by the product of the two standard deviations.
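Applying the hint to the values given (Cov(X,Y) = 24, SD_X = 4, SD_Y = 6), the arithmetic works out as below. A value of exactly +1 corresponds to a perfect positive linear relationship, with every observation on a single upward-sloping line.

```latex
r = \frac{\operatorname{Cov}(X,Y)}{SD_X\,SD_Y} = \frac{24}{4 \times 6} = \frac{24}{24} = 1
```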


Where it shows up

Real-World Context

A researcher calculates Pearson's r to determine the strength and direction of the linear relationship between years of education and annual income in a population.

Study smarter

Tips

  • Correlation does not imply causation.
  • Always visualize the relationship with a scatter plot to check for linearity and outliers.
  • The coefficient 'r' is sensitive to outliers, which can distort its value.
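  • Benchmarks for a "strong" correlation vary by field: some texts treat |r| > 0.7 as strong, while Cohen's widely cited conventions label r ≈ 0.1, 0.3, and 0.5 as small, medium, and large effects.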

Avoid these traps

Common Mistakes

  • Inferring causation from correlation.
  • Applying to non-linear relationships.
  • Ignoring outliers or non-normal distributions.


References

Sources

  1. Wikipedia: Pearson correlation coefficient.
  2. Agresti, A. (2018). Statistical Methods for the Social Sciences (5th ed.). Pearson.
  3. Field, A. Discovering Statistics Using IBM SPSS Statistics.
  4. De Veaux, R. D., Velleman, P. F., & Bock, D. E. Stats: Data and Models.
  5. Frankfort-Nachmias, C., & Leon-Guerrero, A. Social Statistics for a Diverse Society.
  6. Cohen, J. Statistical Power Analysis for the Behavioral Sciences.
  7. Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240-242.