Simple Linear Regression Equation

Models the linear relationship between a dependent variable and a single independent variable.

Core idea

Overview

Simple linear regression is a statistical method used to predict the value of a dependent variable (Y) from the value of a single independent variable (X). It fits a straight line (the regression line) to the observed data by minimizing the sum of squared residuals. The equation provides the intercept (b₀), which is the predicted value of Y when X is zero, and the slope (b₁), which is the predicted change in Y for a one-unit change in X.

When to use: Applied when a researcher wants to understand or predict a continuous outcome variable based on a single continuous predictor. Common in studies examining the impact of education on income, age on political attitudes, or social capital on health outcomes.

Why it matters: Simple linear regression is a foundation for making predictions and for investigating possible causal pathways in social science. It lets sociologists quantify the strength and direction of a relationship and test theoretical hypotheses about social processes and inequalities. It is also the building block for multiple regression, which adds the ability to control for other variables.

Symbols

Variables

  • b₀: Intercept
  • b₁: Slope
  • X: Independent Variable
  • Ŷ: Predicted Dependent Variable

Walkthrough

Derivation

Formula: Simple Linear Regression Equation

Defines the linear model for predicting a dependent variable from an independent variable.

Assumptions:

  • Linear relationship between X and Y.
  • Independent observations.
  • Homoscedasticity (constant variance of residuals).
  • Normally distributed residuals.

1. Define the linear model:

Y = β₀ + β₁X + ε

The true population model, where Y is a linear function of X plus an error term ε. We estimate this with sample data.

2. Estimate with sample data:

Ŷ = b₀ + b₁X

The estimated regression line, where Ŷ is the predicted value, b₀ is the estimated intercept, and b₁ is the estimated slope, derived using the method of Ordinary Least Squares (OLS).

Result

Ŷ = b₀ + b₁X
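
As a minimal sketch of how OLS arrives at the intercept and slope, the closed-form estimates can be computed directly from sample data. The function name `fit_simple_ols` and the toy data are illustrative assumptions, not part of the original page.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (b0, b1), the intercept and slope that minimize squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the sample means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must vary to estimate a slope")
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through the means
    return b0, b1


# Hypothetical toy data lying exactly on the line Y = 5 + 3X
x = [0, 1, 2, 3, 4]
y = [5, 8, 11, 14, 17]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)  # 5.0 3.0
```

Because the toy data sit exactly on a line, the fit recovers that line. With real survey data the points scatter around the line and the residuals are nonzero.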

Source: Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. Paris: Courcier.

Free formulas

Rearrangements

Solve for b₀

Make b₀ the subject of the Simple Linear Regression Equation

Rearrange the simple linear regression equation to solve for the intercept, b₀: b₀ = Ŷ − b₁X.

Difficulty: 2/5

Solve for b₁

Make b₁ the subject of the Simple Linear Regression Equation

Rearrange the simple linear regression equation to solve for the slope, b₁: b₁ = (Ŷ − b₀) / X, provided X ≠ 0.

Difficulty: 2/5

Solve for X

Make X the subject of the Simple Linear Regression Equation

Rearrange the simple linear regression equation to solve for the independent variable, X: X = (Ŷ − b₀) / b₁, provided b₁ ≠ 0.

Difficulty: 2/5
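
The three rearrangements of Ŷ = b₀ + b₁X can be sketched as small functions. The function names are hypothetical, and the example values are chosen only to show that each rearrangement recovers the quantity it solves for.

```python
def solve_intercept(y_hat, b1, x):
    """b0 = Y-hat - b1 * X"""
    return y_hat - b1 * x


def solve_slope(y_hat, b0, x):
    """b1 = (Y-hat - b0) / X, undefined when X is zero."""
    if x == 0:
        raise ValueError("cannot solve for the slope when X is zero")
    return (y_hat - b0) / x


def solve_x(y_hat, b0, b1):
    """X = (Y-hat - b0) / b1, undefined when the slope is zero."""
    if b1 == 0:
        raise ValueError("cannot solve for X when the slope is zero")
    return (y_hat - b0) / b1


# Illustrative values satisfying 35 = 5 + 3 * 10
print(solve_intercept(35, 3, 10))  # 5
print(solve_slope(35, 5, 10))      # 3.0
print(solve_x(35, 5, 3))           # 10.0
```

The two guards reflect the only cases in which the algebra breaks down: dividing by X = 0 or by a slope of zero.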

Visual intuition

Graph

The graph is a straight line: the predicted output changes at a constant rate as the independent variable increases. For a sociology student, this means that a one-unit change in the independent variable predicts the same shift in the dependent variable whether the independent variable is at a small or a large value.
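
The constant rate of change can be checked numerically. In this small sketch the coefficients are hypothetical; the point is only that a one-unit step in X shifts the prediction by b₁ at both ends of the range.

```python
def predict(b0, b1, x):
    """Predicted value on the regression line."""
    return b0 + b1 * x


b0, b1 = 5.0, 3.0  # hypothetical coefficients, for illustration only

# One-unit step at a small X and at a large X
low_step = predict(b0, b1, 2) - predict(b0, b1, 1)
high_step = predict(b0, b1, 101) - predict(b0, b1, 100)
print(low_step, high_step)  # 3.0 3.0
```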

Graph type: linear

Why it behaves this way

Intuition

A straight line drawn through a scatter plot of data points, representing the best linear fit that minimizes the sum of squared vertical distances (residuals)
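
Written out, the "best fit" criterion is a least-squares objective. The sample notation below (observations x_i and y_i, sample size n, and bars for sample means) is assumed here for illustration rather than taken from the page.

```latex
\begin{aligned}
\text{SSR}(b_0, b_1) &= \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 \\
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```

The values of b₀ and b₁ in the second line are the ones that make the sum of squared residuals in the first line as small as possible.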

  • Ŷ: The predicted or estimated value of the dependent (outcome) variable. This is the value the model expects for the outcome based on the given input X.
  • b₀: The intercept, representing the predicted value of Ŷ when the independent variable X is zero. It's the baseline level of the outcome when the predictor has no value or is absent.
  • b₁: The slope coefficient, representing the predicted change in Ŷ for a one-unit increase in X. It quantifies how much the outcome is expected to change for each unit increase in the predictor.
  • X: The independent (predictor) variable. This is the factor whose effect on Y we are trying to understand or predict.

Signs and relationships

  • b₁: The sign of b₁ indicates the direction of the linear relationship between X and Y. A positive b₁ means Ŷ increases as X increases (a positive association), while a negative b₁ means Ŷ decreases as X increases (a negative association).
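
One way to see why the sign of the slope gives the direction of the association: the OLS slope equals the Pearson correlation scaled by a ratio of sample standard deviations. The symbols r, s_X and s_Y are introduced here for illustration, and both variables are assumed to vary.

```latex
b_1 = r \, \frac{s_Y}{s_X},
\qquad
s_X > 0,\; s_Y > 0 \;\Longrightarrow\; \operatorname{sign}(b_1) = \operatorname{sign}(r)
```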

Free study cues

Insight

Canonical usage

Units of variables are preserved through the regression, with coefficients inheriting units derived from the dependent and independent variables.

Common confusion

A common mistake is misinterpreting the units of the slope coefficient (b1), especially when X or Y are percentages or scores. The unit of b1 is 'units of Y per unit of X', not a percentage change unless explicitly

Dimension note

While individual variables (e.g., scores, proportions) may be dimensionless, the equation itself establishes a relationship between quantities whose 'units' (whether physical, monetary, or abstract scores)

Unit systems

Ŷ · Varies (e.g., USD, years, score, count) · The unit of the dependent variable determines the unit of the intercept (b₀).
X · Varies (e.g., years, education level, count, score) · The unit of the independent variable, combined with the unit of Y, determines the unit of the slope (b₁).
b₀ · Same as Y · Represents the predicted value of Y when X is zero. Its unit must be identical to that of Y.
b₁ · Unit(Y) / Unit(X) · Represents the predicted change in Y for a one-unit increase in X. Its unit is the ratio of the dependent variable's unit to the independent variable's unit.
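
To illustrate how the slope inherits its unit from both variables, the sketch below fits the same hypothetical data twice, once with X in years and once with X in months. The data, the helper name `fit`, and the choice of income in thousands of USD are assumptions made for the example.

```python
from statistics import mean


def fit(x, y):
    """Closed-form OLS estimates (b0, b1) for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: years of education and income in thousand USD
years = [10, 12, 14, 16]
income = [20.0, 26.0, 29.0, 37.0]
months = [12 * v for v in years]  # same predictor, different unit

b0_years, b1_years = fit(years, income)     # slope in thousand USD per year
b0_months, b1_months = fit(months, income)  # slope in thousand USD per month

# The slope shrinks by the conversion factor; the intercept keeps the unit of Y
print(b1_years, b1_months * 12)
print(b0_years, b0_months)
```

Rescaling X changes the number attached to the slope but not the fitted line itself, which is why the unit must always be reported alongside b₁.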

One free problem

Practice Problem

A regression model predicts an individual's political participation (Ŷ) based on their age (X). The intercept (b₀) is 5, and the slope (b₁) is 3. What is the predicted political participation score for an individual who is 10 years old?

Intercept (b₀): 5 unit_Y
Slope (b₁): 3 unit_Y/unit_X
Independent Variable (X): 10 unit_X

Solve for: Ŷ

Hint: Substitute the given values into the regression equation: Ŷ = b₀ + b₁X.
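
The substitution described in the hint can be sketched in a few lines. The function name `predict_y` is hypothetical; the numbers are the ones given in the problem.

```python
def predict_y(b0, b1, x):
    """Predicted value from the fitted line: Y-hat = b0 + b1 * X."""
    return b0 + b1 * x


# Values from the practice problem: b0 = 5, b1 = 3, X = 10
y_hat = predict_y(b0=5, b1=3, x=10)
print(y_hat)  # 35
```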

Where it shows up

Real-World Context

A sociologist uses simple linear regression to predict an individual's level of social trust based on their reported level of community engagement.

Study smarter

Tips

  • The slope (b₁) indicates the average change in Y for a one-unit increase in X.
  • The intercept (b₀) is the predicted value of Y when X is zero, but only interpretable if X=0 is meaningful.
  • Always check regression assumptions (linearity, independence, homoscedasticity, normality of residuals).
  • Regression models can predict outcomes, but they do not establish causation without careful research design.
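
As a rough sketch of what checking assumptions can look like in practice, the code below computes residuals and compares their spread in the lower and upper halves of X. The data and coefficients are hypothetical (the coefficients are given rather than fitted here), and the spread comparison is an informal screen for heteroscedasticity, not a formal test.

```python
from statistics import pstdev


def residuals(x, y, b0, b1):
    """Observed minus predicted value for each observation."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def spread_by_half(x, resid):
    """Residual standard deviation in the lower and upper half of X.

    Very different values hint at non-constant variance.
    """
    ordered = [r for _, r in sorted(zip(x, resid))]
    mid = len(ordered) // 2
    return pstdev(ordered[:mid]), pstdev(ordered[mid:])


# Hypothetical data and coefficients, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [8.5, 10.5, 14.5, 16.0, 21.0, 22.0]
resid = residuals(x, y, b0=5.0, b1=3.0)
low_sd, high_sd = spread_by_half(x, resid)
print(resid)
print(low_sd, high_sd)
```

In applied work these checks are usually done visually as well, for example by plotting residuals against X or against the predicted values.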

Avoid these traps

Common Mistakes

  • Extrapolating beyond the range of the observed data.
  • Assuming causation without experimental design.
  • Ignoring violations of regression assumptions.
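
The first trap can be guarded against in code. This is a minimal sketch, assuming the observed X values are still available at prediction time; the function name `predict_within_range` and the sample ages are hypothetical.

```python
import warnings


def predict_within_range(b0, b1, x_new, x_observed):
    """Predict Y-hat, warning when x_new lies outside the observed X range."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x_new <= hi:
        warnings.warn(
            f"x={x_new} is outside the observed range [{lo}, {hi}]; "
            "the prediction is an extrapolation",
            stacklevel=2,
        )
    return b0 + b1 * x_new


# Hypothetical ages observed in the sample
ages = [18, 25, 40, 63]
print(predict_within_range(5, 3, 30, ages))  # 95, inside the observed range
```

Calling the function with an age below 18 or above 63 still returns a number, but the warning signals that the line is being used where no data support it.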
