Cayley-Hamilton Theorem

Core idea

Overview

The Cayley-Hamilton Theorem asserts that every square matrix satisfies its own characteristic equation, meaning if p(λ) is the characteristic polynomial of matrix A, then p(A) results in the zero matrix. This fundamental result bridges the gap between matrix algebra and polynomial theory, providing a powerful tool for matrix analysis.
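
As a quick sanity check, the sketch below evaluates $p(A)$ for a 2×2 matrix, using the fact that in this case $\det(A - λ I) = λ^{2} - \operatorname{tr}(A) λ + \det(A)$. The helper names `mat_mul` and `ch_residual_2x2` and the sample matrix are illustrative assumptions, and plain Python lists are used only to keep the example dependency-free.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ch_residual_2x2(A):
    """Return p(A) = A^2 - tr(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    A2 = mat_mul(A, A)
    identity = [[1, 0], [0, 1]]
    # The constant term det(A) must multiply I, not be added as a scalar.
    return [[A2[i][j] - trace * A[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]


A = [[2, 1], [1, 3]]  # hypothetical sample matrix
print(ch_residual_2x2(A))  # [[0, 0], [0, 0]]
```

Integer entries keep the arithmetic exact, so the residual is exactly zero rather than zero up to rounding.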

When to use: Apply this theorem when calculating large powers of a matrix or finding the inverse of a non-singular matrix without row reduction. It is also used to simplify matrix-valued functions and to narrow the search for the minimal polynomial of a linear operator, which must divide the characteristic polynomial.

Why it matters: It simplifies computations in fields like control theory and signal processing, because any power $A^{k}$ with $k \ge n$ can be rewritten as a linear combination of $I, A, \dots, A^{n - 1}$. It also underpins structural results in linear algebra such as the Jordan Canonical Form.
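
The reduction of high powers to lower ones can be sketched for the 2×2 case, where the theorem gives $A^{2} = \operatorname{tr}(A) A - \det(A) I$. The code below is a minimal illustration under that assumption: it tracks two scalars `alpha` and `beta` with $A^{m} = \alpha A + \beta I$, so no matrix product is needed inside the loop. The function names and the sample matrix are hypothetical.

```python
def mat_mul(X, Y):
    """Multiply two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power_via_cayley_hamilton(A, k):
    """Return A**k for a 2x2 matrix using A^2 = tr(A) A - det(A) I."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    alpha, beta = 0, 1  # A^0 = 0*A + 1*I
    for _ in range(k):
        # A^(m+1) = alpha*A^2 + beta*A = (alpha*trace + beta)*A - alpha*det*I
        alpha, beta = alpha * trace + beta, -alpha * det
    return [[alpha * a + beta, alpha * b],
            [alpha * c, alpha * d + beta]]


def power_naive(A, k):
    """Reference implementation: repeated matrix multiplication."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = mat_mul(result, A)
    return result


A = [[2, 1], [1, 3]]  # hypothetical sample matrix
print(power_via_cayley_hamilton(A, 3))  # [[15, 20], [20, 35]]
print(power_via_cayley_hamilton(A, 10) == power_naive(A, 10))  # True
```

For an $n \times n$ matrix the same idea carries $n$ scalar coefficients instead of two.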

Walkthrough

Derivation

Derivation/Understanding of Cayley-Hamilton Theorem

The Cayley-Hamilton Theorem states that every square matrix satisfies its own characteristic polynomial, meaning that if a matrix is substituted into its characteristic polynomial, the result is the zero matrix.

The matrix $A$ is a square matrix of dimension $n \times n$.
The field of scalars is $\mathbb{C}$ (complex numbers) or $\mathbb{R}$ (real numbers).

1

Defining the Characteristic Polynomial and Adjugate Relationship:

We begin by defining the characteristic polynomial $p(λ)$ for an $n \times n$ matrix $A$. We then recall the fundamental property relating a matrix, its adjugate, and its determinant, applying it to the matrix $(A - λ I)$.

Let $A$ be an $n \times n$ matrix. The characteristic polynomial is $p(λ) = \det(A - λ I) = c_{n} λ^{n} + c_{n - 1} λ^{n - 1} + \dots + c_{1} λ + c_{0}$. We know that for any square matrix $M$, $M \cdot \operatorname{adj}(M) = \det(M) I$. Applying this to $(A - λ I)$: $(A - λ I) \operatorname{adj}(A - λ I) = \det(A - λ I) I = p(λ) I$.
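
As a concrete instance of this identity, consider a generic 2×2 matrix with entries $a, b, c, d$ (illustrative symbols, not taken from the source):

$$A - λ I = \begin{pmatrix} a - λ & b \\ c & d - λ \end{pmatrix}, \qquad \operatorname{adj}(A - λ I) = \begin{pmatrix} d - λ & -b \\ -c & a - λ \end{pmatrix}$$

$$(A - λ I) \operatorname{adj}(A - λ I) = \begin{pmatrix} (a - λ)(d - λ) - bc & 0 \\ 0 & (a - λ)(d - λ) - bc \end{pmatrix} = p(λ) I$$

Here $p(λ) = (a - λ)(d - λ) - bc = λ^{2} - (a + d) λ + (ad - bc)$.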

2

Expressing the Adjugate as a Polynomial Matrix:

Since the adjugate matrix's elements are, up to sign, determinants of $(n - 1) \times (n - 1)$ submatrices of $(A - λ I)$, they are polynomials in $λ$ of degree at most $n - 1$. This allows us to express the adjugate as a polynomial in $λ$ whose coefficients are constant matrices.

The entries of $\operatorname{adj}(A - λ I)$ are cofactors of $(A - λ I)$, which are polynomials in $λ$ of degree at most $n - 1$. Thus, we can write $\operatorname{adj}(A - λ I) = B_{n - 1} λ^{n - 1} + B_{n - 2} λ^{n - 2} + \dots + B_{1} λ + B_{0}$, where the $B_{k}$ are $n \times n$ matrices with constant entries.
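
For example, with a generic 2×2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ (entries chosen purely for illustration), the adjugate splits into matrix coefficients as follows:

$$\operatorname{adj}(A - λ I) = \begin{pmatrix} d - λ & -b \\ -c & a - λ \end{pmatrix} = \underbrace{(-I)}_{B_{1}} λ + \underbrace{\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}}_{B_{0}}$$

So in this case $B_{1} = -I$ and $B_{0} = \operatorname{adj}(A)$.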

3

Equating Coefficients and Deriving the Theorem:

By substituting the polynomial expressions for $p(λ)$ and $\operatorname{adj}(A - λ I)$ into the identity, we can equate coefficients of powers of $λ$. Multiplying these resulting matrix equations by appropriate powers of $A$ and summing them leads to a telescoping sum on the left, which cancels to the zero matrix, thus proving that $p(A)$ equals the zero matrix.

Substitute the polynomial forms into the identity from Step 1:

$(A - λ I)(B_{n - 1} λ^{n - 1} + \dots + B_{0}) = (c_{n} λ^{n} + \dots + c_{0}) I$

Expanding the left side and equating coefficients of powers of $λ$ on both sides:

$$\begin{aligned} -B_{n - 1} &= c_{n} I \\ A B_{n - 1} - B_{n - 2} &= c_{n - 1} I \\ &\;\;\vdots \\ A B_{1} - B_{0} &= c_{1} I \\ A B_{0} &= c_{0} I \end{aligned}$$

Multiply each equation by $A^{n}, A^{n - 1}, \dots, A^{1}, A^{0}$ respectively, from the left:

$$\begin{aligned} -A^{n} B_{n - 1} &= c_{n} A^{n} \\ A^{n} B_{n - 1} - A^{n - 1} B_{n - 2} &= c_{n - 1} A^{n - 1} \\ &\;\;\vdots \\ A^{2} B_{1} - A B_{0} &= c_{1} A \\ A B_{0} &= c_{0} I \end{aligned}$$

Summing these equations yields the zero matrix on the left side due to telescoping cancellation:

$0 = c_{n} A^{n} + c_{n - 1} A^{n - 1} + \dots + c_{1} A + c_{0} I$

This is precisely $p(A) = 0$.
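
For $n = 2$ the coefficient comparison and the telescoping can be written out in full. This is a worked illustration in which $B_{1}$ and $B_{0}$ denote the matrix coefficients of $\operatorname{adj}(A - λ I)$:

$$\begin{aligned} λ^{2}: \quad & -B_{1} = c_{2} I & \xrightarrow{\;A^{2} \cdot\;} \quad & -A^{2} B_{1} = c_{2} A^{2} \\ λ^{1}: \quad & A B_{1} - B_{0} = c_{1} I & \xrightarrow{\;A \cdot\;} \quad & A^{2} B_{1} - A B_{0} = c_{1} A \\ λ^{0}: \quad & A B_{0} = c_{0} I & \xrightarrow{\;I \cdot\;} \quad & A B_{0} = c_{0} I \end{aligned}$$

Adding the three right-hand equations, the terms $\mp A^{2} B_{1}$ and $\mp A B_{0}$ cancel in pairs, leaving $0 = c_{2} A^{2} + c_{1} A + c_{0} I$.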

Result

$p(A) = c_{n} A^{n} + c_{n - 1} A^{n - 1} + \dots + c_{1} A + c_{0} I = 0$

Source: Introduction to Linear Algebra by Gilbert Strang

Visual intuition

Graph

No plot applies to this identity: $p(A)$ evaluates to the zero matrix for every square matrix $A$, so the result is constant at zero regardless of the input.

Why it behaves this way

Intuition

Imagine a square matrix as a set of instructions for transforming vectors; the Cayley-Hamilton theorem states that if you apply a specific polynomial sequence of these instructions (derived from the matrix's own characteristic polynomial), the net transformation is the zero transformation.

A

The square matrix whose algebraic properties are being described.

Represents a linear transformation or a system's operator.

P(A)

The characteristic polynomial of matrix A, evaluated by substituting A for the variable.

This operation combines powers of A and scalar multiples, demonstrating a fundamental algebraic identity specific to A.

a_{n}, \dots, a_{0}

Scalar coefficients that define the specific characteristic polynomial of matrix A.

These scalars determine the unique polynomial equation that the matrix A satisfies.

I

The identity matrix, which acts as the multiplicative identity in matrix algebra.

Ensures the constant term $a_{0}$ of the characteristic polynomial is correctly represented as a matrix in the equation.

0

The zero matrix, which acts as the additive identity in matrix algebra.

Signifies that the polynomial expression, when evaluated with A, results in the null transformation or the absence of any net effect.

Free study cues

Insight

Canonical usage

This mathematical theorem describes an algebraic identity for square matrices. If the matrix elements carry physical units, the coefficients of the characteristic polynomial inherit units that keep every term of the identity dimensionally consistent.

Common confusion

A common confusion is either trying to assign physical units where none are necessary for a pure mathematical theorem, or failing to ensure dimensional consistency of the polynomial terms if the matrix elements do carry units.

Unit systems

$A$: If the elements of the matrix A have a unit U, then A^k has elements with unit U^k. The Cayley-Hamilton Theorem itself is unit-agnostic. Unit considerations arise when A represents a matrix of physical quantities, such as a stiffness matrix or a Jacobian matrix.

$a_{k}$: If the n × n matrix A has elements with unit U, then the coefficient a_k must have unit U^{n-k}. For example, a_n is dimensionless, a_{n-1} has unit U, a_{n-2} has unit U^2, ..., and a_0 has unit U^n.

$I$: Dimensionless. The identity matrix I consists of dimensionless elements (ones and zeros).

$0$: If the n × n matrix A has elements with unit U, then the zero matrix 0 on the right-hand side has elements with unit U^n. This represents the zero matrix, not a scalar zero, and its elements must be dimensionally consistent with the terms of P(A).
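
As a brief dimensional check for the 2×2 case, where the identity reads $A^{2} - \operatorname{tr}(A) A + \det(A) I = 0$, suppose every element of $A$ has unit $U$ (a hypothetical unit used only for illustration):

$$\underbrace{A^{2}}_{U^{2}} \; - \; \underbrace{\operatorname{tr}(A)}_{U} \underbrace{A}_{U} \; + \; \underbrace{\det(A)}_{U^{2}} \underbrace{I}_{1} \; = \; \underbrace{0}_{U^{2}}$$

Every term carries $U^{2}$, matching the pattern $a_{2}$ dimensionless, $a_{1}$ in $U$, and $a_{0}$ in $U^{2}$.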

One free problem

Practice Problem

Given a 2×2 matrix A with diagonal elements m11 = 5 and m22 = 3, the Cayley-Hamilton theorem states that A satisfies the equation A² - kA + dI = 0. Find the value of k, which corresponds to the trace of the matrix.

m11 = 5

m22 = 3

Solve for: $k$

Hint: The trace of a matrix is the sum of its diagonal elements and appears as the negative coefficient of the λ term in the characteristic polynomial.

Where it shows up

Real-World Context

In control theory, the Cayley-Hamilton Theorem is used to compute the matrix exponential $e^{At}$ when solving systems of linear differential equations. Because every power $A^{k}$ with $k \ge n$ reduces to a combination of $I, A, \dots, A^{n - 1}$, the infinite series for $e^{At}$ collapses to a polynomial in $A$ of degree at most $n - 1$ with time-dependent scalar coefficients.
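
A minimal sketch of this use, assuming a 2×2 matrix with real, distinct eigenvalues: the theorem implies $e^{At} = a_{0} I + a_{1} A$, where the scalars $a_{0}, a_{1}$ are fixed by requiring $e^{λ t} = a_{0} + a_{1} λ$ at both eigenvalues. The function names, the sample matrix, and the Taylor-series cross-check are illustrative choices, not part of the source.

```python
import math


def expm_2x2_cayley_hamilton(A, t):
    """Return e^{At} for a 2x2 matrix with real, distinct eigenvalues.

    Uses e^{At} = a0*I + a1*A, where a0 and a1 solve
    e^{lam*t} = a0 + a1*lam at both eigenvalues lam of A.
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc <= 0:
        raise ValueError("sketch assumes real, distinct eigenvalues")
    lam1 = (trace + math.sqrt(disc)) / 2
    lam2 = (trace - math.sqrt(disc)) / 2
    a1 = (math.exp(lam1 * t) - math.exp(lam2 * t)) / (lam1 - lam2)
    a0 = math.exp(lam1 * t) - a1 * lam1
    return [[a0 + a1 * a, a1 * b],
            [a1 * c, a0 + a1 * d]]


def expm_series(A, t, terms=30):
    """Reference: truncated Taylor series, sum of (At)^k / k!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        # term <- term * (A t) / k, i.e. the next series term
        term = [[sum(term[i][m] * A[m][j] for m in range(2)) * t / k
                 for j in range(2)] for i in range(2)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


A = [[0, 1], [-2, -3]]  # hypothetical system matrix, eigenvalues -1 and -2
closed = expm_2x2_cayley_hamilton(A, 1.0)
series = expm_series(A, 1.0)
agree = all(abs(closed[i][j] - series[i][j]) < 1e-9
            for i in range(2) for j in range(2))
print(agree)  # True
```

Repeated or complex eigenvalues need a modified interpolation step, which this sketch deliberately leaves out.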

Study smarter

Tips

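Calculate the characteristic polynomial first, using p(λ) = det(A - λI); the alternative convention det(λI - A) differs only by a factor of (-1)^n, so either form yields p(A) = 0.
Replace each power λ^k with A^k and multiply the constant term by the identity matrix I.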
Use it to express A⁻¹ as a polynomial in A by multiplying the characteristic equation by A⁻¹.
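
The inverse tip can be sketched for a 2×2 matrix: multiplying $A^{2} - \operatorname{tr}(A) A + \det(A) I = 0$ by $A^{-1}$ gives $A^{-1} = (\operatorname{tr}(A) I - A) / \det(A)$, valid only when $\det(A) \neq 0$. The function name and sample matrix below are hypothetical, and `fractions.Fraction` is used so that integer inputs give exact results.

```python
from fractions import Fraction


def inverse_2x2_cayley_hamilton(A):
    """Return A^{-1} for a 2x2 integer matrix via (tr(A) I - A) / det(A)."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    # tr(A) I - A has diagonal entries (trace - a), (trace - d)
    # and off-diagonal entries -b, -c.
    return [[Fraction(trace - a, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(trace - d, det)]]


A = [[2, 1], [1, 3]]  # hypothetical sample matrix, det(A) = 5
A_inv = inverse_2x2_cayley_hamilton(A)

# Check that A * A_inv is the identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(product == [[1, 0], [0, 1]])  # True
```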

Avoid these traps

Common Mistakes

Applying the theorem to non-square matrices.
Forgetting to multiply the constant term by the identity matrix when evaluating p(A).
