Pearson Correlation Coefficient
Measures the strength and direction of the linear relationship between two continuous variables.
Core idea
Overview
The Pearson product-moment correlation coefficient (r) quantifies the degree to which two continuous variables are linearly related. Its value ranges from -1 to +1, where +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship. It's a fundamental statistic for exploring bivariate associations in social science research.
When to use: Applied when examining the linear association between two continuous variables measured at the interval or ratio level. Common in studies exploring relationships between social attitudes, economic indicators, educational attainment, or health outcomes.
Why it matters: Essential for understanding how social phenomena co-vary. It helps sociologists identify potential predictors, explore theoretical relationships, and inform the development of more complex models like regression. It's a key descriptive statistic and a precursor to many inferential analyses.
Symbols
Variables
Cov(X,Y) = Covariance of X and Y, SDₓ = Standard Deviation of X, SDᵧ = Standard Deviation of Y, r = Pearson's r
Walkthrough
Derivation
Formula: Pearson Correlation Coefficient
Derives the standardized measure of linear association between two continuous variables.
- Variables are continuous.
- Relationship is linear.
- Data are approximately normally distributed (for inferential tests).
Start with Covariance:
Covariance measures how much two variables change together, but its magnitude depends on the variables' scales.
Standardize by Standard Deviations:
Dividing covariance by the product of the individual standard deviations standardizes the measure, making it unitless and interpretable between -1 and +1.
Result
r = Cov(X,Y) / (SDₓ · SDᵧ)
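The two derivation steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`covariance`, `std_dev`, `pearson_r`) and the data are hypothetical, and the sample (n - 1) denominator is an assumption. Any denominator works as long as the covariance and both standard deviations use the same one, because it cancels in the ratio.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(values):
    """Sample standard deviation: the square root of a variable's covariance with itself."""
    return math.sqrt(covariance(values, values))

def pearson_r(xs, ys):
    """Step 2 of the derivation: divide the covariance by the product of the SDs."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical data, invented purely for illustration.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 4))  # 0.7746
```

Here the covariance is 1.5 and the standard deviations are √2.5 and √1.5, so r = 1.5 / √3.75 ≈ 0.7746.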
Source: Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240-242.
Free formulas
Rearrangements
Solve for Cov(X,Y)
Make Cov(X,Y) the subject of the Pearson correlation coefficient formula
Rearrange the Pearson correlation coefficient formula to solve for the covariance of X and Y.
Difficulty: 2/5
Solve for SDₓ
Make SDₓ the subject of the Pearson correlation coefficient formula
Rearrange the Pearson correlation coefficient formula to solve for the standard deviation of X.
Difficulty: 2/5
Solve for SDᵧ
Make SDᵧ the subject of the Pearson correlation coefficient formula
Rearrange the Pearson correlation coefficient formula to solve for the standard deviation of Y.
Difficulty: 2/5
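For reference, the three rearrangements follow from multiplying or dividing both sides of the formula by the appropriate terms. They are written out below in LaTeX. The two standard-deviation forms assume r ≠ 0, since r appears in the denominator.

```latex
% Starting point
\[ r = \frac{\mathrm{Cov}(X,Y)}{SD_X \, SD_Y} \]

% Solve for Cov(X,Y): multiply both sides by SD_X SD_Y
\[ \mathrm{Cov}(X,Y) = r \, SD_X \, SD_Y \]

% Solve for SD_X (requires r \neq 0)
\[ SD_X = \frac{\mathrm{Cov}(X,Y)}{r \, SD_Y} \]

% Solve for SD_Y (requires r \neq 0)
\[ SD_Y = \frac{\mathrm{Cov}(X,Y)}{r \, SD_X} \]
```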
Why it behaves this way
Intuition
Visualize a scatter plot where each point represents an observation for two variables. The Pearson correlation coefficient describes how closely these points align along a straight line, and whether that line slopes upward (positive r) or downward (negative r).
Signs and relationships
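- Cov(X, Y): The sign of the covariance directly determines the sign of 'r'. A positive covariance indicates a positive linear relationship, meaning as one variable increases, the other tends to increase. A negative covariance indicates a negative linear relationship, meaning as one variable increases, the other tends to decrease.
- SD_X SD_Y: The product of the standard deviations is positive whenever both variables vary, so the sign of 'r' is determined solely by the covariance. (If either standard deviation is zero, 'r' is undefined.) This term normalizes the covariance, scaling 'r' to a value between -1 and +1 and making it a unitless, comparable measure.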
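A short sketch can make the sign relationship concrete. The function name `cov_and_r` and the data are hypothetical, chosen only for illustration. Negating one variable flips the sign of the covariance while leaving the standard deviations unchanged, so r flips sign with it.

```python
import math

def cov_and_r(xs, ys):
    """Return (sample covariance, Pearson's r) for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov, cov / (sd_x * sd_y)

# Hypothetical data: y_up tends to rise with x; y_down is its mirror image.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [-v for v in y_up]

cov_up, r_up = cov_and_r(x, y_up)
cov_down, r_down = cov_and_r(x, y_down)

print(cov_up > 0, r_up > 0)      # True True
print(cov_down < 0, r_down < 0)  # True True
```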
Free study cues
Insight
Canonical usage
The Pearson correlation coefficient is a dimensionless statistical measure, meaning its value is independent of the units used for the underlying variables, as long as those units are consistently applied within each variable. For example, recording income in dollars or in thousands of dollars yields the same r.
Common confusion
A common confusion is incorrectly assuming that Pearson's r has units, or misapplying it to variables that are not continuous or not measured on at least an interval scale, which can lead to invalid interpretations.
Dimension note
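The Pearson correlation coefficient is inherently dimensionless because it is a ratio of the covariance of two variables to the product of their standard deviations. The units of the variables in the numerator (Cov(X,Y), expressed in units of X times units of Y) cancel against the same units in the denominator (SDₓ · SDᵧ), leaving a pure number.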
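The unit cancellation can be checked numerically. In the sketch below, the helper `cov_and_r` and the height and weight figures are hypothetical. Converting centimetres to metres and kilograms to grams rescales the covariance, but r stays the same up to floating-point rounding.

```python
import math

def cov_and_r(xs, ys):
    """Return (sample covariance, Pearson's r) for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov, cov / (sd_x * sd_y)

# Hypothetical measurements, invented for illustration.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 75, 88]

# The same observations expressed in different units.
height_m = [h / 100 for h in height_cm]
weight_g = [w * 1000 for w in weight_kg]

cov_a, r_a = cov_and_r(height_cm, weight_kg)
cov_b, r_b = cov_and_r(height_m, weight_g)

print(math.isclose(cov_a, cov_b))  # False: covariance carries units
print(math.isclose(r_a, r_b))      # True: r does not
```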
One free problem
Practice Problem
A study on social capital and community engagement found the covariance between the two variables to be 24. The standard deviation of social capital scores was 4, and the standard deviation of community engagement scores was 6. Calculate the Pearson correlation coefficient (r).
Solve for: r
Hint: Divide the covariance by the product of the two standard deviations.
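Following the hint, the arithmetic can be checked with a few lines of Python. The variable names are illustrative and the numbers come directly from the problem statement.

```python
# Values given in the practice problem.
cov_xy = 24            # covariance of social capital and community engagement
sd_social_capital = 4  # standard deviation of social capital scores
sd_engagement = 6      # standard deviation of community engagement scores

# r = Cov(X,Y) / (SD_X * SD_Y)
r = cov_xy / (sd_social_capital * sd_engagement)
print(r)  # 1.0
```

Because 24 / (4 × 6) = 1, these figures describe a perfect positive linear relationship, the upper limit of the coefficient's range.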
Where it shows up
Real-World Context
A researcher calculates Pearson's r to determine the strength and direction of the linear relationship between years of education and annual income in a population.
Study smarter
Tips
- Correlation does not imply causation.
- Always visualize the relationship with a scatter plot to check for linearity and outliers.
- The coefficient 'r' is sensitive to outliers, which can distort its value.
- As a rough convention, |r| > 0.7 is often read as a strong linear relationship, but cutoffs vary by field and should be judged in context.
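The outlier tip can be illustrated with a small sketch. The data and the `pearson_r` helper are hypothetical. Five points lie exactly on a line, and a single extreme sixth observation is then added.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sample covariance and sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical data: five points on the line y = x.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
r_clean = pearson_r(x, y)

# Add one extreme observation far below the line.
r_outlier = pearson_r(x + [6], y + [-10])

print(round(r_clean, 3))    # 1.0
print(round(r_outlier, 3))  # -0.438
```

One added point is enough to turn a perfect positive correlation into a moderate negative one, which is why a scatter plot should accompany the coefficient.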
Avoid these traps
Common Mistakes
- Inferring causation from correlation.
- Applying to non-linear relationships.
- Ignoring outliers or non-normal distributions.
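The non-linearity trap is easy to reproduce. In this minimal sketch (hypothetical data and helper name), y is completely determined by x through y = x², yet r is zero because the relationship is not linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from sample covariance and sample standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# A symmetric, perfectly deterministic but non-linear relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(abs(r) < 1e-12)  # True: no linear association despite full dependence
```

A value of r near zero therefore rules out only a linear association, not an association of any kind.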
References
Sources
- Wikipedia: Pearson correlation coefficient
- Agresti, A. (2018). Statistical Methods for the Social Sciences (5th ed.). Pearson.
- Andy Field, Discovering Statistics Using IBM SPSS Statistics
- Paul F. Velleman, David S. Moore, Richard D. De Veaux, Bock, Stats: Data and Models
- Frankfort-Nachmias and Leon-Guerrero, Social Statistics for a Diverse Society
- Cohen, Statistical Power Analysis for the Behavioral Sciences
- Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240-242.