Functions are Vectors

Conceptualizing functions as infinite-dimensional vectors lets us apply the tools of linear algebra to a vast landscape of new problems, from image and geometry processing to curve fitting, light transport, and machine learning.

Prerequisites: introductory linear algebra, introductory calculus, introductory differential equations.

This article received an honorable mention in 3Blue1Brown’s Summer of Math Exposition 3!


Functions as Vectors

Vectors are often first introduced as lists of real numbers—i.e. the familiar notation we use for points, directions, and more.

$$ \mathbf{v} = \begin{bmatrix}x\\y\\z\end{bmatrix} $$

You may recall that this representation is only one example of an abstract vector space. There are many other types of vectors, such as lists of complex numbers, graph cycles, and even magic squares.

However, all of these vector spaces have one thing in common: a finite number of dimensions. That is, each kind of vector can be represented as a collection of \(N\) numbers, though the definition of “number” varies.

Since any \(N\)-dimensional vector is essentially a length-\(N\) list, we can also think of a vector as a mapping from an index to a value.

\[\begin{align*} \mathbf{v}_1 &= x\\ \mathbf{v}_2 &= y\\ \mathbf{v}_3 &= z \end{align*}\ \iff\ \mathbf{v} = \begin{bmatrix}x \\ y \\ z\end{bmatrix}\]

What does this perspective hint at as we increase the number of dimensions?


In higher dimensions, vectors start to look more like functions!

Countably Infinite Indices

Of course, a finite-length vector only specifies a value at a limited number of indices. Could we instead define a vector that contains infinitely many values?

Writing down a vector representing a function on the natural numbers (\(\mathbb{N}\))—or any other countably infinite domain—is straightforward: just extend the list indefinitely.

$$ \begin{align*}\mathbf{v}_1 &= 1\\\mathbf{v}_2 &= 2\\ &\vdots \\ \mathbf{v}_i &= i\end{align*}\ \iff\ \mathbf{v} = \begin{bmatrix}1 \\ 2 \\ 3 \\ \vdots \end{bmatrix} $$

This vector could represent the function \(f(x) = x\), where \(x \in \mathbb{N}\).1

Uncountably Infinite Indices

Many interesting functions are defined on the real numbers (\(\mathbb{R}\)), so they may not be representable as a countably infinite vector. Therefore, we will have to make a larger conceptual leap: not only will our set of indices be infinite, it will be uncountably infinite.

That means we can’t write down vectors as lists at all—it is impossible to assign an integer index to each element of an uncountable set. So, how can we write down a vector mapping a real index to a certain value?

Now, a vector really is just an arbitrary function:

$$ \mathbf{v}_{x} = x^2\ \iff\ \mathbf{v} = \begin{bmatrix} x \mapsto x^2 \end{bmatrix} $$

Precisely defining how and why we can represent functions as infinite-dimensional vectors is the purview of functional analysis. In this post, we won’t attempt to prove our results in infinite dimensions: we will focus on building intuition via analogies to finite-dimensional linear algebra.
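To make the index-to-value picture concrete, here is a tiny Python sketch (the names are ours, not from any particular library): a finite-dimensional vector maps a finite set of indices to values, while a "function vector" maps an arbitrary real index to a value.

```python
# A finite-dimensional vector maps a finite index set to values...
finite_vector = {1: 2.0, 2: -1.0, 3: 0.5}    # v_1 = 2, v_2 = -1, v_3 = 0.5

# ...while a "function vector" maps any real index to a value: v_x = x^2.
def function_vector(x: float) -> float:
    return x * x

print(finite_vector[2])        # the component at index 2
print(function_vector(1.7))    # the "component" at index 1.7
```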


Vector Spaces

Formally, a vector space is defined by choosing a set of vectors \(\mathcal{V}\), a scalar field \(\mathbb{F}\), and a zero vector \(\mathbf{0}\). The field \(\mathbb{F}\) is often the real numbers (\(\mathbb{R}\)), complex numbers (\(\mathbb{C}\)), or a finite field such as the integers modulo a prime (\(\mathbb{Z}_p\)).

Additionally, we must specify how to add two vectors and how to multiply a vector by a scalar.

\[\begin{align*} (+)\ &:\ \mathcal{V}\times\mathcal{V}\mapsto\mathcal{V}\\ (\cdot)\ &:\ \mathbb{F}\times\mathcal{V} \mapsto \mathcal{V} \end{align*}\]

To describe a vector space, our definitions must entail several vector space axioms.

A Functional Vector Space

In the following sections, we’ll work with the vector space of real functions. To avoid ambiguity, square brackets are used to denote function application.

  • The scalar field \(\mathbb{F}\) is the real numbers \(\mathbb{R}\).
  • The set of vectors \(\mathcal{V}\) contains functions from \(\mathbb{R}\) to \(\mathbb{R}\).2
  • \(\mathbf{0}\) is the zero function, i.e. \(\mathbf{0}[x] = 0\).

Adding functions corresponds to applying the functions separately and summing the results.

$$ (f + g)[x] = f[x] + g[x] $$

This definition generalizes the typical element-wise addition rule—it’s like adding the two values at each index.

\[f+g = \begin{bmatrix}f_1 + g_1 \\ f_2 + g_2 \\ \vdots \end{bmatrix}\]

Multiplying a function by a scalar corresponds to applying the function and scaling the result.

$$ (\alpha f)[x] = \alpha f[x] $$

This rule similarly generalizes element-wise multiplication—it’s like scaling the value at each index.

\[\alpha f = \begin{bmatrix}\alpha f_1 \\ \alpha f_2 \\ \vdots \end{bmatrix}\]
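As a quick illustration, here is a minimal Python sketch of these two operations, with vectors represented as plain callables. The helper names `add`, `scale`, and `zero` are purely illustrative, not part of any library.

```python
import math

def add(f, g):
    return lambda x: f(x) + g(x)        # (f + g)[x] = f[x] + g[x]

def scale(alpha, f):
    return lambda x: alpha * f(x)       # (alpha f)[x] = alpha * f[x]

zero = lambda x: 0.0                    # the zero function

h = add(scale(2.0, math.sin), math.cos) # h[x] = 2 sin(x) + cos(x)
print(h(1.0))
print(2.0 * math.sin(1.0) + math.cos(1.0))   # same value
```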

Proofs

Given these definitions, we can now prove all necessary vector space axioms. We will illustrate the analog of each property in \(\mathbb{R}^2\), the familiar vector space of two-dimensional arrows.

Therefore, we’ve built a vector space of functions!3 It may not be immediately obvious why this result is useful, but bear with us through a few more definitions—we will spend the rest of this post exploring powerful techniques arising from this perspective.

A Standard Basis for Functions

Unless specified otherwise, vectors are written down with respect to the standard basis. In \(\mathbb{R}^2\), the standard basis consists of the two coordinate axes.

$$ \mathbf{e}_1 = \begin{bmatrix}1 \\ 0\end{bmatrix},\,\, \mathbf{e}_2 = \begin{bmatrix}0 \\ 1\end{bmatrix} $$

Hence, vector notation is shorthand for a linear combination of the standard basis vectors.

$$ \mathbf{u} = \begin{bmatrix}\alpha \\ \beta\end{bmatrix} = \alpha\mathbf{e}_1 + \beta\mathbf{e}_2 $$

Above, we represented functions as vectors by assuming each dimension of an infinite-length vector contains the function’s result for that index. This construction points to a natural generalization of the standard basis.

Just like the coordinate axes, each standard basis function contains a \(1\) at one index and \(0\) everywhere else. More precisely, for every \(\alpha \in \mathbb{R}\):

\[\mathbf{e}_\alpha[x] = \begin{cases} 1 & \text{if } x = \alpha \\ 0 & \text{otherwise} \end{cases}\]

Ideally, we could express an arbitrary function \(f\) as a linear combination of these basis functions. However, there are uncountably many of them—and we can’t simply write down a sum over the reals. Still, considering their linear combination is illustrative:

\[\begin{align*} f[x] &= f[\alpha]\mathbf{e}_\alpha[x] \\ &= f[1]\mathbf{e}_1[x] + f[2]\mathbf{e}_2[x] + f[\pi]\mathbf{e}_\pi[x] + \dots \end{align*}\]

If we evaluate this “sum” at \(x\), we’ll find that all terms are zero—except \(\mathbf{e}_x\), making the result \(f[x]\).


Linear Operators

Now that we can manipulate functions as vectors, let’s start transferring the tools of linear algebra to the functional perspective.

One ubiquitous operation on finite-dimensional vectors is transforming them with matrices. A matrix \(\mathbf{A}\) encodes a linear transformation, meaning multiplication preserves linear combinations.

\[\mathbf{A}(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha \mathbf{A}\mathbf{x} + \beta \mathbf{A}\mathbf{y}\]

Multiplying a vector by a matrix can be intuitively interpreted as defining a new set of coordinate axes from the matrix’s column vectors. The result is a linear combination of the columns:

\[\mathbf{Ax} = \begin{bmatrix} \vert & \vert & \vert \\ \mathbf{u} & \mathbf{v} & \mathbf{w} \\ \vert & \vert & \vert \end{bmatrix} \begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = x_1\mathbf{u} + x_2\mathbf{v} + x_3\mathbf{w}\]

When all vectors can be expressed as a linear combination of \(\mathbf{u}\), \(\mathbf{v}\), and \(\mathbf{w}\), the columns form a basis for the underlying vector space. Here, the matrix \(\mathbf{A}\) transforms a vector from the \(\mathbf{uvw}\) basis into the standard basis.

Since functions are vectors, we could imagine transforming a function by a matrix. Such a matrix would be infinite-dimensional, so we will instead call it a linear operator and denote it with \(\mathcal{L}\).

\[\mathcal{L}f = \begin{bmatrix} \vert & \vert & \vert & \\ \mathbf{f} & \mathbf{g} & \mathbf{h} & \cdots \\ \vert & \vert & \vert & \end{bmatrix} \begin{bmatrix}f_1\\ f_2 \\ f_3\\ \vdots\end{bmatrix} = f_1\mathbf{f} + f_2\mathbf{g} + f_3\mathbf{h} + \cdots\]

This visualization isn’t very accurate—we’re dealing with uncountably infinite-dimensional vectors, so we can’t actually write out an operator in matrix form. Nonetheless, the structure is suggestive: each “column” of the operator describes a new basis function for our functional vector space. Just like we saw with finite-dimensional vectors, \(\mathcal{L}\) represents a change of basis.

Differentiation

So, what’s an example of a linear operator on functions? You might recall that differentiation is linear:

\[\frac{\partial}{\partial x} \left(\alpha f[x] + \beta g[x]\right) = \alpha\frac{\partial f}{\partial x} + \beta\frac{\partial g}{\partial x}\]

It’s hard to visualize differentiation on general functions, but it’s feasible for the subspace of polynomials, \(\mathcal{P}\). Let’s take a slight detour to examine this smaller space of functions.

\[\mathcal{P} = \{ p[x] = a + bx + cx^2 + dx^3 + \cdots \}\]

We typically write down polynomials as a sequence of powers, i.e. \(1, x, x^2, x^3\), etc. All polynomials are linear combinations of the functions \(\mathbf{e}_i[x] = x^i\), so they constitute a countably infinite basis for \(\mathcal{P}\).4

This basis provides a convenient vector notation:

\[\begin{align*} p[x] &= a + bx + cx^2 + dx^3 + \cdots \\ &= a\mathbf{e}_0 + b\mathbf{e}_1 + c \mathbf{e}_2 + d\mathbf{e}_3 + \dots \end{align*}\ \iff\ \mathbf{p} = \begin{bmatrix}a\\ b\\ c\\ d\\ \vdots\end{bmatrix}\]

Since differentiation is linear, we’re able to apply the rule \(\frac{\partial}{\partial x} x^n = nx^{n-1}\) to each term.

\[\begin{align*}\frac{\partial}{\partial x}p[x] &= \vphantom{\Bigg\vert}a\frac{\partial}{\partial x}1 + b\frac{\partial}{\partial x}x + c\frac{\partial}{\partial x}x^2 + d\frac{\partial}{\partial x}x^3 + \dots \\ &= b + 2cx + 3dx^2 + \cdots\\ &= b\mathbf{e}_0 + 2c\mathbf{e}_1 + 3d\mathbf{e}_2 + \dots\end{align*} \ \iff\ \frac{\partial}{\partial x}\mathbf{p} = \begin{bmatrix}b\\ 2c\\ 3d\\ \vdots\end{bmatrix}\]

We’ve performed a linear transformation on the coefficients, so we can represent differentiation as a matrix!

\[\frac{\partial}{\partial x}\mathbf{p} = \begin{bmatrix}0 & 1 & 0 & 0 & \cdots\\ 0 & 0 & 2 & 0 & \cdots\\ 0 & 0 & 0 & 3 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix}a\\ b\\ c\\ d\\ \vdots\end{bmatrix} = \begin{bmatrix}b\\ 2c\\ 3d\\ \vdots\end{bmatrix}\]

Each column of the differentiation operator is itself a polynomial, so this matrix represents a change of basis.

\[\frac{\partial}{\partial x} = \begin{bmatrix} \vert & \vert & \vert & \vert & \vert & \\ 0 & 1 & 2x & 3x^2 & 4x^3 & \cdots \\ \vert & \vert & \vert & \vert & \vert & \end{bmatrix}\]

As we can see, the differentiation operator simply maps each basis function to its derivative.
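As a sanity check, we can build this matrix numerically for polynomials truncated to their first \(N\) coefficients; truncating is our own simplification so that the arithmetic stays finite.

```python
import numpy as np

N = 5
D = np.zeros((N, N))
for n in range(1, N):
    D[n - 1, n] = n                  # column n maps x^n to n * x^(n-1)

p = np.array([7.0, 3.0, 0.0, 2.0, 0.0])   # 7 + 3x + 2x^3
print(D @ p)                               # [3, 0, 6, 0, 0], i.e. 3 + 6x^2
```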

This result also applies to the larger space of analytic real functions, which includes polynomials, exponential functions, trigonometric functions, logarithms, and other familiar names. By definition, an analytic function can be expressed as a Taylor series about \(0\):

\[f[x] = \sum_{n=0}^\infty \frac{f^{(n)}[0]}{n!}x^n = \sum_{n=0}^\infty \alpha_n x^n\]

Which is a linear combination of our polynomial basis functions. That means a Taylor expansion is essentially a change of basis into the sequence of powers, where our differentiation operator is quite simple.5


Diagonalization

Matrix decompositions are arguably the crowning achievement of linear algebra. To get started, let’s review what diagonalization means for a \(3\times3\) real matrix \(\mathbf{A}\).

Eigenvectors

A vector \(\mathbf{u}\) is an eigenvector of the matrix \(\mathbf{A}\) when the following condition holds:

$$ \mathbf{Au} = \lambda \mathbf{u} $$

The eigenvalue \(\lambda\) may be computed by solving the characteristic polynomial of \(\mathbf{A}\). Eigenvalues may be real or complex.

The matrix \(\mathbf{A}\) is diagonalizable when it admits three linearly independent eigenvectors, each with a corresponding real eigenvalue. This set of eigenvectors constitutes an eigenbasis for the underlying vector space, indicating that we can express any vector \(\mathbf{x}\) via their linear combination.

$$ \mathbf{x} = \alpha\mathbf{u}_1 + \beta\mathbf{u}_2 + \gamma\mathbf{u}_3 $$

To multiply \(\mathbf{x}\) by \(\mathbf{A}\), we just have to scale each component by its corresponding eigenvalue.

$$ \begin{align*} \mathbf{Ax} &= \alpha\mathbf{A}\mathbf{u}_1 + \beta\mathbf{A}\mathbf{u}_2 + \gamma\mathbf{A}\mathbf{u}_3 \\ &= \alpha\lambda_1\mathbf{u}_1 + \beta\lambda_2\mathbf{u}_2 + \gamma\lambda_3\mathbf{u}_3 \end{align*} $$

Finally, re-combining the eigenvectors expresses the result in the standard basis.

Intuitively, we’ve shown that multiplying by \(\mathbf{A}\) is equivalent to a change of basis, a scaling, and a change back. That means we can write \(\mathbf{A}\) as the product of an invertible matrix \(\mathbf{U}\) and a diagonal matrix \(\mathbf{\Lambda}\).

\[\begin{align*} \mathbf{A} &= \mathbf{U\Lambda U^{-1}} \\ &= \begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix} \begin{bmatrix}\lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix}^{-1} \end{align*}\]

Note that \(\mathbf{U}\) is invertible because its columns (the eigenvectors) form a basis for \(\mathbb{R}^3\). When multiplying by \(\mathbf{x}\), \(\mathbf{U}^{-1}\) converts \(\mathbf{x}\) to the eigenbasis, \(\mathbf{\Lambda}\) scales by the corresponding eigenvalues, and \(\mathbf{U}\) takes us back to the standard basis.

In the presence of complex eigenvalues, \(\mathbf{A}\) may still be diagonalizable if we allow \(\mathbf{U}\) and \(\mathbf{\Lambda}\) to include complex entries. In this case, the decomposition as a whole still maps real vectors to real vectors, but the intermediate values become complex.
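Here is a brief numerical illustration of the decomposition \(\mathbf{A} = \mathbf{U\Lambda U^{-1}}\); the example matrix is arbitrary, chosen only because it is easily diagonalizable.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])

eigenvalues, U = np.linalg.eig(A)          # columns of U are eigenvectors
Lambda = np.diag(eigenvalues)

print(np.allclose(A, U @ Lambda @ np.linalg.inv(U)))   # True: A = U Λ U^{-1}
```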

Eigenfunctions

So, what does diagonalization mean in a vector space of functions? Given a linear operator \(\mathcal{L}\), you might imagine a corresponding definition for eigenfunctions:

\[\mathcal{L}f = \psi f\]

The scalar \(\psi\) is again known as an eigenvalue. Since \(\mathcal{L}\) is infinite-dimensional, it doesn’t have a characteristic polynomial—there’s not a straightforward method for computing \(\psi\).

Nevertheless, let’s attempt to diagonalize differentiation on analytic functions. The first step is to find the eigenfunctions. Start by applying the above condition to our differentiation operator in the power basis:

\[\begin{align*} && \frac{\partial}{\partial x}\mathbf{p} = \psi \mathbf{p} \vphantom{\Big|}& \\ &\iff& \begin{bmatrix}0 & 1 & 0 & 0 & \cdots\\ 0 & 0 & 2 & 0 & \cdots\\ 0 & 0 & 0 & 3 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}\begin{bmatrix}p_0\\ p_1\\ p_2\\ p_3\\ \vdots\end{bmatrix} &= \begin{bmatrix}\psi p_0\\ \psi p_1 \\ \psi p_2 \\ \psi p_3 \\ \vdots \end{bmatrix} \\ &\iff& \begin{cases} p_1 &= \psi p_0 \\ p_2 &= \frac{\psi}{2} p_1 \\ p_3 &= \frac{\psi}{3} p_2 \\ &\dots \end{cases} & \end{align*}\]

This system of equations implies that all coefficients are determined solely by our choice of constants \(p_0\) and \(\psi\). We can explicitly write down their relationship as \(p_i = \frac{\psi^i}{i!}p_0\).

Now, let’s see what this class of polynomials actually looks like.

\[p[x] = p_0 + p_0\psi x + p_0\frac{\psi^2}{2}x^2 + p_0\frac{\psi^3}{6}x^3 + p_0\frac{\psi^4}{24}x^4 + \dots\]

Differentiation shows that this function is, in fact, an eigenfunction for the eigenvalue \(\psi\).

\[\begin{align*} \frac{\partial}{\partial x} p[x] &= 0 + p_0\psi + p_0 \psi^2 x + p_0\frac{\psi^3}{2}x^2 + p_0\frac{\psi^4}{6}x^3 + \dots \\ &= \psi p[x] \end{align*}\]

With a bit of algebraic manipulation, the definition of \(e^{x}\) pops out:

\[\begin{align*} p[x] &= p_0 + p_0\psi x + p_0\frac{\psi^2}{2}x^2 + p_0\frac{\psi^3}{6}x^3 + p_0\frac{\psi^4}{24}x^4 + \dots \\ &= p_0\left(1 + (\psi x) + \frac{1}{2!}(\psi x)^2 + \frac{1}{3!}(\psi x)^3 + \frac{1}{4!}(\psi x)^4 + \dots\right) \\ &= p_0 e^{\psi x} \end{align*}\]

Therefore, functions of the form \(p_0e^{\psi x}\) are eigenfunctions for the eigenvalue \(\psi\), including when \(\psi=0\).
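We can double-check this family numerically by plugging the coefficients \(p_i = \frac{\psi^i}{i!}p_0\) into the truncated differentiation matrix from before. Truncating to \(N\) coefficients is again our own simplification, so the final coefficient is excluded from the comparison.

```python
import numpy as np
from math import factorial

N, psi, p0 = 12, 1.5, 2.0
p = np.array([p0 * psi**i / factorial(i) for i in range(N)])   # p_i = (psi^i / i!) p_0

D = np.zeros((N, N))
for n in range(1, N):
    D[n - 1, n] = n                      # the truncated differentiation matrix

# Up to truncation, differentiating the coefficients just scales them by psi.
print(np.allclose((D @ p)[:-1], (psi * p)[:-1]))   # True
```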

Diagonalizing Differentiation

We’ve found the eigenfunctions of the derivative operator, but can we diagonalize it? Ideally, we would express differentiation as the combination of an invertible operator \(\mathcal{L}\) and a diagonal operator \(\mathcal{D}\).

\[\begin{align*} \frac{\partial}{\partial x} &= \mathcal{L} \mathcal{D} \mathcal{L}^{-1} \\ &= \begin{bmatrix} \vert & \vert & & \\ \alpha e^{\psi_1 x} & \beta e^{\psi_2 x} & \dots \\ \vert & \vert & \end{bmatrix} \begin{bmatrix} \psi_1 & 0 & \dots \\ 0 & \psi_2 & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} {\color{red} \begin{bmatrix} \vert & \vert & & \\ \alpha e^{\psi_1 x} & \beta e^{\psi_2 x} & \dots \\ \vert & \vert & \end{bmatrix}^{-1} } \end{align*}\]

Diagonalization is only possible when our eigenfunctions form a basis. This would be true if all analytic functions are expressible as a linear combination of exponentials. However…

Real exponentials don’t constitute a basis, so we cannot construct an invertible \(\mathcal{L}\).

The Laplace Transform

We previously mentioned that more matrices can be diagonalized if we allow the decomposition to contain complex numbers. Analogously, more linear operators are diagonalizable in the larger vector space of functions from \(\mathbb{R}\) to \(\mathbb{C}\).

Differentiation works the same way in this space; we’ll still find that its eigenfunctions are exponential.

\[\frac{\partial}{\partial x} e^{(a+bi)x} = (a+bi)e^{(a+bi)x}\]

However, the new eigenfunctions have complex eigenvalues, so we still can’t diagonalize. We’ll need to consider the still larger space of functions from \(\mathbb{C}\) to \(\mathbb{C}\).

\[\frac{\partial}{\partial x} : (\mathbb{C}\mapsto\mathbb{C}) \mapsto (\mathbb{C}\mapsto\mathbb{C})\]

In this space, differentiation can be diagonalized via the Laplace transform. Although useful for solving differential equations, the Laplace transform is non-trivial to invert, so we won’t discuss it further. In the following sections, we’ll delve into an operator that can be easily diagonalized in \(\mathbb{R}\mapsto\mathbb{C}\): the Laplacian.


Inner Product Spaces

Before we get to the spectral theorem, we’ll need to understand one more topic: inner products. You’re likely already familiar with one example of an inner product—the Euclidean dot product.

\[\begin{bmatrix}x\\ y\\ z\end{bmatrix} \cdot \begin{bmatrix}a\\ b\\ c\end{bmatrix} = ax + by + cz\]

An inner product describes how to measure a vector along another vector. For example, \(\mathbf{u}\cdot\mathbf{v}\) is proportional to the length of the projection of \(\mathbf{u}\) onto \(\mathbf{v}\).

$$ \mathbf{u} \cdot \mathbf{v} =\|\mathbf{u}\|\|\mathbf{v}\|\cos[\theta] $$

With a bit of trigonometry, we can show that the dot product is equivalent to multiplying the vectors’ lengths with the cosine of their angle. This relationship suggests that the product of a vector with itself produces the square of its length.

\[\begin{align*} \mathbf{u}\cdot\mathbf{u} &= \|\mathbf{u}\|\|\mathbf{u}\|\cos[0] \\ &= \|\mathbf{u}\|^2 \end{align*}\]

Similarly, when two vectors form a right angle (are orthogonal), their dot product is zero.

$$ \begin{align*} \mathbf{u} \cdot \mathbf{v} &= \|\mathbf{u}\|\|\mathbf{v}\|\cos[90^\circ] \\ &= 0 \end{align*} $$

Of course, the Euclidean dot product is only one example of an inner product. In more general spaces, the inner product is denoted using angle brackets, such as \(\langle \mathbf{u}, \mathbf{v} \rangle\).

  • The length (also known as the norm) of a vector is defined as \(\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}\).
  • Two vectors are orthogonal if their inner product is zero: \(\ \mathbf{u} \perp \mathbf{v}\ \iff\ \langle \mathbf{u}, \mathbf{v} \rangle = 0\).

A vector space augmented with an inner product is known as an inner product space.

A Functional Inner Product

We can’t directly apply the Euclidean dot product to our space of real functions, but its \(N\)-dimensional generalization is suggestive.

\[\begin{align*} \mathbf{u} \cdot \mathbf{v} &= u_1v_1 + u_2v_2 + \dots + u_Nv_N \\ &= \sum_{i=1}^N u_iv_i \end{align*}\]

Given countable indices, we simply match up the values, multiply them, and add the results. When indices are uncountable, we can convert the discrete sum to its continuous analog: an integral!

\[\langle f, g \rangle = \int_a^b f[x]g[x] \, dx\]

When \(f\) and \(g\) are similar, multiplying them produces a larger function; when they’re different, they cancel out. Integration measures their product over some domain to produce a scalar result.

Of course, not all functions can be integrated. Our inner product space will only contain functions that are square integrable over the domain \([a, b]\), which may be the entire real line \((-\infty, \infty)\). Luckily, the important properties of our inner product do not depend on the choice of integration domain.
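As a rough illustration, this inner product can be approximated by numerical quadrature. The helper `inner` below is a hypothetical sketch (SciPy is assumed to be available), not a standard API.

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g, a=0.0, b=1.0):
    value, _ = quad(lambda x: f(x) * g(x), a, b)   # <f, g> = ∫_a^b f g dx
    return value

f = lambda x: np.sin(2 * np.pi * x)
g = lambda x: np.cos(2 * np.pi * x)

print(inner(f, f))   # ~0.5, so ||f|| = sqrt(0.5)
print(inner(f, g))   # ~0.0: sine and cosine of the same frequency are orthogonal
```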

Proofs

Below, we’ll briefly cover functions from \(\mathbb{R}\) to \(\mathbb{C}\). In this space, our intuitive notion of similarity still applies, but we’ll use a slightly more general inner product:

\[\langle f,g \rangle = \int_a^b f[x]\overline{g[x]}\, dx\]

Where \(\overline{x}\) denotes conjugation, i.e. \(\overline{a + bi} = a - bi\).

Like other vector space operations, an inner product must satisfy several axioms:

  • Conjugate symmetry: \(\langle f, g \rangle = \overline{\langle g, f \rangle}\).
  • Linearity in the first argument: \(\langle \alpha f + \beta g, h \rangle = \alpha\langle f, h \rangle + \beta\langle g, h \rangle\).
  • Positive-definiteness: \(\langle f, f \rangle \ge 0\), with \(\langle f, f \rangle = 0\) only when \(f = \mathbf{0}\).

Along with the definition \(\|f\| = \sqrt{\langle f, f \rangle}\), these properties entail a variety of important results, including the Cauchy–Schwarz and triangle inequalities.


The Spectral Theorem

Diagonalization is already a powerful technique, but we’re building up to an even more important result regarding orthonormal eigenbases. In an inner product space, an orthonormal basis must satisfy two conditions: each vector is unit length, and all vectors are mutually orthogonal.

$$ \begin{cases} \langle\mathbf{u}_i,\mathbf{u}_i\rangle = 1 & \forall i \\ \langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0 & \forall i \neq j \end{cases} $$

A matrix consisting of orthonormal columns is known as an orthogonal matrix. Orthogonal matrices represent rotations and reflections of the standard basis.

In an inner product space, matrix-vector multiplication computes the inner product of the vector with each row of the matrix. Something interesting happens when we multiply an orthogonal matrix \(\mathbf{U}\) by its transpose:

\[\begin{align*} \mathbf{U}^T\mathbf{U} &= \begin{bmatrix}\text{---} & \mathbf{u}_1 & \text{---} \\ \text{---} & \mathbf{u}_2 & \text{---} \\ \text{---} & \mathbf{u}_3 & \text{---} \end{bmatrix} \begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix} \\ &= \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{u}_1 \rangle & \langle \mathbf{u}_1, \mathbf{u}_2 \rangle & \langle \mathbf{u}_1, \mathbf{u}_3 \rangle \\ \langle \mathbf{u}_2, \mathbf{u}_1 \rangle & \langle \mathbf{u}_2, \mathbf{u}_2 \rangle & \langle \mathbf{u}_2, \mathbf{u}_3 \rangle \\ \langle \mathbf{u}_3, \mathbf{u}_1 \rangle & \langle \mathbf{u}_3, \mathbf{u}_2 \rangle & \langle \mathbf{u}_3, \mathbf{u}_3 \rangle \end{bmatrix} \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\ &= \mathcal{I} \end{align*}\]

Since \(\mathbf{U}^T\mathbf{U} = \mathcal{I}\) (and \(\mathbf{U}\mathbf{U}^T = \mathcal{I}\)), we’ve found that the transpose of \(\mathbf{U}\) is equal to its inverse.

When diagonalizing \(\mathbf{A}\), we used \(\mathbf{U}\) to transform vectors from our eigenbasis to the standard basis. Conversely, its inverse transformed vectors from the standard basis to our eigenbasis. If \(\mathbf{U}\) happens to be orthogonal, transforming a vector \(\mathbf{x}\) into the eigenbasis is equivalent to projecting \(\mathbf{x}\) onto each eigenvector.

$$ \mathbf{U}^{-1}\mathbf{x} = \mathbf{U}^T\mathbf{x} = \begin{bmatrix}\langle \mathbf{u}_1, \mathbf{x} \rangle \\ \langle \mathbf{u}_2, \mathbf{x} \rangle \\ \langle \mathbf{u}_3, \mathbf{x} \rangle \end{bmatrix} $$

Additionally, the diagonalization of \(\mathbf{A}\) becomes quite simple:

\[\begin{align*} \mathbf{A} &= \mathbf{U\Lambda U^T} \\ &= \begin{bmatrix}\vert & \vert & \vert \\ \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ \vert & \vert & \vert \end{bmatrix} \begin{bmatrix}\lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix}\text{---} & \mathbf{u}_1 & \text{---} \\ \text{---} & \mathbf{u}_2 & \text{---} \\ \text{---} & \mathbf{u}_3 & \text{---} \end{bmatrix} \end{align*}\]

Given an orthogonal diagonalization of \(\mathbf{A}\), we can deduce that \(\mathbf{A}\) must be symmetric, i.e. \(\mathbf{A} = \mathbf{A}^T\).

\[\begin{align*} \mathbf{A}^T &= (\mathbf{U\Lambda U}^T)^T \\ &= {\mathbf{U}^T}^T \mathbf{\Lambda }^T \mathbf{U}^T \\ &= \mathbf{U\Lambda U}^T \\ &= \mathbf{A} \end{align*}\]

The spectral theorem states that the converse is also true: \(\mathbf{A}\) is symmetric if and only if it admits an orthonormal eigenbasis with real eigenvalues. Proving this result is somewhat involved in finite dimensions and very involved in infinite dimensions, so we won’t reproduce the proofs here.
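In finite dimensions, the forward direction is easy to check numerically: any symmetric matrix yields real eigenvalues and an orthonormal eigenbasis. The random matrix below is arbitrary; `eigh` is NumPy's routine for symmetric/Hermitian matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B + B.T                                 # symmetric by construction

eigenvalues, U = np.linalg.eigh(A)          # real eigenvalues, orthonormal columns
Lambda = np.diag(eigenvalues)

print(np.allclose(U.T @ U, np.eye(3)))      # True: the eigenbasis is orthonormal
print(np.allclose(A, U @ Lambda @ U.T))     # True: A = U Λ U^T
```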

Self-Adjoint Operators

We can generalize the spectral theorem to our space of functions, where it states that a self-adjoint operator admits an orthonormal eigenbasis with real eigenvalues.6

Denoted as \(\mathbf{A}^{\hspace{-0.1em}\star\hspace{0.1em}}\), the adjoint of an operator \(\mathbf{A}\) is defined by the following relationship.

\[\langle \mathbf{Ax}, \mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{A}^{\hspace{-0.1em}\star\hspace{0.1em}}\mathbf{y} \rangle\]

When \(\mathbf{A} = \mathbf{A}^\star\), we say that \(\mathbf{A}\) is self-adjoint.

The adjoint can be thought of as a generalized transpose—but it’s not obvious what that means in infinite dimensions. We will simply use our functional inner product to determine whether an operator is self-adjoint.

The Laplace Operator

Earlier, we weren’t able to diagonalize (real) differentiation, so it must not be self-adjoint. Therefore, we will explore another fundamental operator, the Laplacian.

There are many equivalent definitions of the Laplacian, but in our space of one-dimensional functions, it’s just the second derivative. We will hence restrict our domain to twice-differentiable functions.

\[\Delta f = \frac{\partial^2 f}{\partial x^2}\]

We may compute \(\Delta^\star\) using two integrations by parts:

\[\begin{align*} \left\langle \Delta f[x], g[x] \right\rangle &= \int_a^b f^{\prime\prime}[x] g[x]\, dx \\ &= f^\prime[x]g[x]\Big|_a^b - \int_a^b f^{\prime}[x] g^{\prime}[x]\, dx \\ &= (f^\prime[x]g[x] - f[x]g^{\prime}[x])\Big|_a^b + \int_a^b f[x] g^{\prime\prime}[x]\, dx \\ &= \left\langle f[x], \Delta g[x] \right\rangle \end{align*}\]

In the final step, we assume that \((f^\prime[x]g[x] - f[x]g^{\prime}[x])\big|_a^b = 0\), which is not true in general. To make our conclusion valid, we will constrain our domain to only include functions satisfying this boundary condition. Specifically, we will only consider periodic functions with period \(b-a\). These functions have the same value and derivative at \(a\) and \(b\), so the additional term vanishes.

For simplicity, we will also assume our domain to be \([0,1]\), i.e. periodic functions with period \(1\).

Therefore, the Laplacian is self-adjoint…almost. Technically, we’ve shown that the Laplacian is symmetric, not that \(\Delta = \Delta^\star\). This is a subtle point, and it’s possible to prove self-adjointness, so we will omit this detail.
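For a quick numerical sanity check of this symmetry, we can discretize the Laplacian with periodic finite differences. The grid resolution and test functions below are arbitrary choices, and the discretization is our own sketch.

```python
import numpy as np

n = 512
x = np.arange(n) / n                      # uniform periodic grid on [0, 1)
h = 1.0 / n

def laplacian(u):
    # second derivative with periodic (wrap-around) boundary conditions
    return (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / h**2

def inner(u, v):
    return np.sum(u * v) * h              # discrete analog of ∫ u v dx

f = np.sin(2 * np.pi * x) + 0.3 * np.cos(6 * np.pi * x)
g = np.cos(4 * np.pi * x)

print(np.isclose(inner(laplacian(f), g), inner(f, laplacian(g))))   # True
```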

Laplacian Eigenfunctions

Applying the spectral theorem tells us that the Laplacian admits an orthonormal eigenbasis. Let’s find it.7

Since the Laplacian is simply the second derivative, real exponentials would still be eigenfunctions—but they’re not periodic, so we’ll have to exclude them.

\[\color{red} \Delta e^{\psi x} = \psi^2 e^{\psi x}\]

Luckily, a new class of periodic eigenfunctions appears:

\[\begin{align*} \Delta \sin[\psi x] &= -\psi^2 \sin[\psi x] \\ \Delta \cos[\psi x] &= -\psi^2 \cos[\psi x] \end{align*}\]

If we allow our diagonalization to introduce complex numbers, we can also consider functions from \(\mathbb{R}\) to \(\mathbb{C}\). Here, complex exponentials with purely imaginary exponents are eigenfunctions with real eigenvalues.

\[\begin{align*} \Delta e^{\psi i x} &= (\psi i)^2e^{\psi i x} \\ &= -\psi^2 e^{\psi i x} \\ &= -\psi^2 (\cos[\psi x] + i\sin[\psi x]) \tag{Euler's formula} \end{align*}\]

Using Euler’s formula, we can see that these two perspectives are equivalent: they both introduce \(\sin\) and \(\cos\) as eigenfunctions. Either path can lead to our final result, but we’ll stick with the more compact complex case.

We also need to constrain the set of eigenfunctions to be periodic on \([0,1]\). As suggested above, we can pick out the frequencies \(\psi\) that are an integer multiple of \(2\pi\).

\[e^{2\pi \xi i x} = \cos[2\pi \xi x] + i\sin[2\pi \xi x]\]

Our set of eigenfunctions is therefore \(e^{2\pi \xi i x}\) for all integers \(\xi\).

Diagonalizing the Laplacian

Now that we’ve found suitable eigenfunctions, we can construct an orthonormal basis.

Our collection of eigenfunctions is linearly independent, as each one corresponds to a distinct eigenvalue. Next, we can check for orthogonality and unit magnitude. For distinct integers \(\xi_1 \neq \xi_2\),

\[\left\langle e^{2\pi\xi_1 i x}, e^{2\pi\xi_2 i x} \right\rangle = \int_0^1 e^{2\pi(\xi_1-\xi_2) i x}\, dx = \frac{e^{2\pi(\xi_1-\xi_2) i} - 1}{2\pi(\xi_1-\xi_2) i} = 0\]

since \(e^{2\pi k i} = 1\) for any integer \(k\). Meanwhile, \(\left\langle e^{2\pi\xi i x}, e^{2\pi\xi i x} \right\rangle = \int_0^1 1\, dx = 1\), so each eigenfunction has unit norm.

The final step is to show that all functions in our domain can be represented by a linear combination of eigenfunctions. To do so, we will find an invertible operator \(\mathcal{L}\) representing the proper change of basis.

Critically, since our eigenbasis is orthonormal, we can intuitively consider the inverse of \(\mathcal{L}\) to be its transpose.

\[\mathcal{I} = \mathcal{L}\mathcal{L}^{-1} = \mathcal{L}\mathcal{L}^{T} = \begin{bmatrix} \vert & \vert & & \\ e^{2\pi\xi_1 i x} & e^{2\pi\xi_2 i x} & \dots \\ \vert & \vert & \end{bmatrix}\begin{bmatrix} \text{---} & e^{2\pi\xi_1 i x} & \text{---} \\ \text{---} & e^{2\pi\xi_2 i x} & \text{---} \\ & \vdots & \end{bmatrix}\]

This visualization suggests that \(\mathcal{L}^Tf\) computes the inner product of \(f\) with each eigenvector.

\[\mathcal{L}^Tf = \begin{bmatrix}\langle f, e^{2\pi\xi_1 i x} \rangle \\ \langle f, e^{2\pi\xi_2 i x} \rangle \\ \vdots \end{bmatrix}\]

Which is highly reminiscent of the finite-dimensional case, where we projected onto each eigenvector of an orthogonal eigenbasis.

This insight allows us to write down the product \(\mathcal{L}^Tf\) as a function \(\hat{f}[\xi]\) of the integer frequency \(\xi\). Note that the complex inner product conjugates the second argument, so the exponent is negated.

\[(\mathcal{L}^Tf)[\xi] = \hat{f}[\xi] = \int_0^1 f[x]e^{-2\pi\xi i x}\, dx\]

Conversely, \(\mathcal{L}\) converts \(\hat{f}\) back to the standard basis. It simply creates a linear combination of eigenfunctions.

\[(\mathcal{L}\hat{f})[x] = f[x] = \sum_{\xi=-\infty}^\infty \hat{f}[\xi] e^{2\pi\xi i x}\]

These operators are, in fact, inverses of each other, but a rigorous proof is beyond the scope of this post. Therefore, we’ve diagonalized the Laplacian:

\[\begin{align*} \Delta &= \mathcal{L} \mathcal{D} \mathcal{L}^T \\ &= \begin{bmatrix} \vert & \vert & & \\ e^{2\pi\xi_1 i x} & e^{2\pi\xi_2 i x} & \dots \\ \vert & \vert & \end{bmatrix} \begin{bmatrix} -(2\pi\xi_1)^2 & 0 & \dots \\ 0 & -(2\pi\xi_2)^2 & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \text{---} & e^{2\pi\xi_1 i x} & \text{---} \\ \text{---} & e^{2\pi\xi_2 i x} & \text{---} \\ & \vdots & \end{bmatrix} \end{align*}\]

Although \(\mathcal{L}^T\) transforms our real-valued function into a complex-valued function, \(\Delta\) as a whole still maps real functions to real functions. Next, we’ll see how \(\mathcal{L}^T\) is itself an incredibly useful transformation.
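To make the change of basis concrete, here is a small sketch that approximates \(\hat{f}[\xi]\) with a Riemann sum and then rebuilds \(f\) from finitely many eigenfunctions. The sample count, frequency cutoff, and (smooth, periodic) test function are arbitrary choices.

```python
import numpy as np

n, N = 2048, 8                              # samples of [0, 1); max frequency
x = np.arange(n) / n

f = np.exp(np.sin(2 * np.pi * x))           # a smooth periodic test function

xi = np.arange(-N, N + 1)
f_hat = np.array([np.mean(f * np.exp(-2j * np.pi * k * x)) for k in xi])   # L^T f

rebuilt = sum(c * np.exp(2j * np.pi * k * x) for c, k in zip(f_hat, xi))   # L f_hat
print(np.max(np.abs(rebuilt.imag)))         # ~0: the reconstruction is real
print(np.max(np.abs(rebuilt.real - f)))     # small: it closely matches f
```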


Applications

In this section, we’ll explore several applications in signal processing, each of which arises from diagonalizing the Laplacian on a new domain.

Fourier Series

If you’re familiar with Fourier methods, you likely noticed that \(\hat{f}\) encodes the Fourier series of \(f\). That’s because a Fourier transform is a change of basis into the Laplacian eigenbasis!

This basis consists of waves, which makes \(\hat{f}\) a particularly interesting representation for \(f\). For example, consider evaluating \(\hat{f}[1]\):

\[\hat{f}[1] = \int_0^1 f[x] e^{-2\pi i x}\, dx = \int_0^1 f[x](\cos[2\pi x] - i\sin[2\pi x])\, dx\]

This integral measures how much of \(f\) is represented by waves of frequency (positive) 1. Naturally, \(\hat{f}[\xi]\) computes the same quantity for any integer frequency \(\xi\).

(Interactive figure: the real and complex parts of \(e^{2\pi i \xi x}\), shown here for \(\xi = 1\).)

Therefore, we say that \(\hat{f}\) expresses our function in the frequency domain. To illustrate this point, we’ll use a Fourier series to decompose a piecewise linear function into a collection of waves.8 Since our new basis is orthonormal, the transform is easy to invert by re-combining the waves.

Here, the \(\color{#9673A6}\text{purple}\) curve is \(f\); the \(\color{#D79B00}\text{orange}\) curve is a reconstruction of \(f\) from the first \(N\) coefficients of \(\hat{f}\). Try varying the number of coefficients and moving the \(\color{#9673A6}\text{purple}\) dots to affect the results.

(Interactive figure: \({\color{#9673A6} f[x]}\) and its reconstruction \({\color{#D79B00} \sum_{\xi=-N}^N \hat{f}[\xi]e^{2\pi i \xi x}}\), shown for \(N = 3\).)

Additionally, you can explore the individual basis functions making up the result, such as \(\hat{f}[0]\), \(\hat{f}[1]e^{2\pi i x}\), \(\hat{f}[2]e^{4\pi i x}\), and \(\hat{f}[3]e^{6\pi i x}\).

Many interesting operations become easy to compute in the frequency domain. For example, by simply dropping Fourier coefficients beyond a certain threshold, we can reconstruct a smoothed version of our function. This technique is known as a low-pass filter—try it out above.
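As a discrete analog, uniformly sampled periodic data can be filtered with the FFT, which plays the role of our change of basis here. This is only a sketch (not the interactive demo above), and the signal and cutoff are arbitrary.

```python
import numpy as np

n = 256
x = np.arange(n) / n
triangle = np.where(x < 0.5, x, 1.0 - x)
signal = triangle + 0.05 * np.sin(2 * np.pi * 40 * x)   # triangle wave + high-frequency wiggle

coefficients = np.fft.fft(signal)                 # change of basis into the frequency domain
frequencies = np.fft.fftfreq(n, d=1.0 / n)        # the integer frequency of each coefficient

cutoff = 10
coefficients[np.abs(frequencies) > cutoff] = 0    # low-pass filter: drop high frequencies
smoothed = np.fft.ifft(coefficients).real         # change back to the standard basis

print(np.max(np.abs(smoothed - triangle)))        # small residual: the 40-cycle wiggle is gone
```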

Image Compression

Computationally, Fourier series are especially useful for compression. Encoding a function \(f\) in the standard basis takes a lot of space, since we store a separate result for each input. If we instead express \(f\) in the Fourier basis, we only need to store a few coefficients—we’ll be able to approximately reconstruct \(f\) by re-combining the corresponding basis functions.

So far, we’ve only defined a Fourier transform for functions on \(\mathbb{R}\). Luckily, the transform arose via diagonalizing the Laplacian, and the Laplacian is not limited to one-dimensional functions. In fact, wherever we can define a Laplacian, we can find a corresponding Fourier transform.9

For example, in two dimensions, the Laplacian becomes a sum of second derivatives.

\[\Delta f[x,y] = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\]

For the domain \([0,1]\times[0,1]\), we’ll find a familiar set of periodic eigenfunctions.

\[e^{2\pi i(nx + my)} = \cos[2\pi(nx + my)] + i\sin[2\pi(nx + my)]\]

Where \(n\) and \(m\) are both integers. Let’s see what these basis functions look like:

(Interactive figure: the real and complex parts of \(e^{2\pi i(nx + my)}\), shown here for \(n = 3\), \(m = 3\).)

Just like the 1D case, the corresponding Fourier transform is a change of basis into the Laplacian’s orthonormal eigenbasis. Above, we decomposed a 1D function into a collection of 1D waves—here, we equivalently decompose a 2D image into a collection of 2D waves.

(Interactive figure: the image \({\color{#9673A6} f[x,y]}\) and its reconstruction \({\color{#D79B00} \sum_{n=-N}^N \sum_{m=-N}^N \hat{f}[n,m]e^{2\pi i(nx + my)}}\), shown for \(N = 3\).)

A variant of the 2D Fourier transform is at the core of many image compression algorithms, including JPEG.
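Here is a rough sketch of the idea using the 2D FFT on a synthetic array: transform, keep only low-frequency coefficients, and reconstruct. Actual JPEG uses a block-wise discrete cosine transform, so treat this only as an illustration of the change-of-basis principle; the image and cutoff are made up.

```python
import numpy as np

n, keep = 128, 16
y, x = np.mgrid[0:n, 0:n] / n
image = np.sin(2 * np.pi * 3 * x) + np.cos(2 * np.pi * 5 * y) + x * y   # synthetic "image"

spectrum = np.fft.fft2(image)                 # 2D change of basis into the frequency domain

mask = np.zeros_like(spectrum)                # keep only low frequencies: all four corners
mask[:keep, :keep] = 1                        # of the FFT layout hold |frequency| < keep
mask[-keep:, :keep] = 1
mask[:keep, -keep:] = 1
mask[-keep:, -keep:] = 1

compressed = np.fft.ifft2(spectrum * mask).real
print(np.mean(np.abs(compressed - image)))    # the error introduced by dropping coefficients
```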

Spherical Harmonics

Computer graphics is often concerned with functions on the unit sphere, so let’s see if we can find a corresponding Fourier transform. In spherical coordinates, the Laplacian can be defined as follows:

\[\Delta f(\theta, \phi) = \frac{1}{\sin[\theta]}\frac{\partial}{\partial \theta}\left(\sin[\theta] \frac{\partial f}{\partial \theta}\right) + \frac{1}{\sin^2[\theta]}\frac{\partial^2 f}{\partial \phi^2}\]

We won’t go through the full derivation, but this Laplacian admits an orthonormal eigenbasis known as the spherical harmonics.10

\[Y_\ell^m[\theta, \phi] = N_\ell^m P_\ell^m[\cos[\theta]] e^{im\phi}\]

Where \(Y_\ell^m\) is the spherical harmonic of degree \(\ell \ge 0\) and order \(m \in [-\ell,\ell]\). Note that \(N_\ell^m\) is a constant and \(P_\ell^m\) are the associated Legendre polynomials.11

(Interactive figure: the real and complex parts of \(Y_\ell^m[\theta,\phi]\), shown here for \(\ell = 0\), \(m = 0\).)

As above, we define the spherical Fourier transform as a change of basis into the spherical harmonics. In game engines, this transform is often used to compress diffuse environment maps (i.e. spherical images) and global illumination probes.

(Interactive figure: \({\color{#9673A6} f[\theta,\phi]}\) and its reconstruction \({\color{#D79B00} \sum_{\ell=0}^N \sum_{m=-\ell}^\ell \hat{f}[\ell,m]\, Y_\ell^m[\theta,\phi]}\), shown for \(N = 3\).)

You might also recognize spherical harmonics as electron orbitals—quantum mechanics is primarily concerned with the eigenfunctions of linear operators.
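For a hands-on check, SciPy can evaluate \(Y_\ell^m\) via `scipy.special.sph_harm` (note its argument order: order \(m\), degree \(\ell\), azimuthal angle, then polar angle). The sketch below numerically verifies orthonormality under the spherical inner product \(\int f\,\overline{g}\,\sin\theta\, d\theta\, d\phi\) on a simple midpoint grid.

```python
import numpy as np
from scipy.special import sph_harm

n_theta, n_phi = 400, 800
theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta   # polar angle (midpoint grid)
phi = np.arange(n_phi) * 2 * np.pi / n_phi             # azimuthal angle
dtheta, dphi = np.pi / n_theta, 2 * np.pi / n_phi
T, P = np.meshgrid(theta, phi, indexing="ij")

def inner(l1, m1, l2, m2):
    Y1 = sph_harm(m1, l1, P, T)                        # scipy order: (m, l, azimuth, polar)
    Y2 = sph_harm(m2, l2, P, T)
    return np.sum(Y1 * np.conj(Y2) * np.sin(T)) * dtheta * dphi

print(abs(inner(2, 1, 2, 1)))   # ~1: each harmonic has unit norm
print(abs(inner(2, 1, 3, 1)))   # ~0: distinct harmonics are orthogonal
```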

Geometry Processing

Representing functions as vectors underlies many modern algorithms—image compression is only one example. In fact, because computers can do linear algebra so efficiently, applying linear-algebraic techniques to functions produces a powerful new computational paradigm.

The nascent field of discrete differential geometry uses this perspective to build algorithms for three-dimensional geometry processing. In computer graphics, functions on meshes often represent textures, unwrappings, displacements, or simulation parameters. DDG gives us a way to faithfully encode such functions as vectors: for example, by associating a value with each vertex of the mesh.

$$ \mathbf{f} = \begin{bmatrix} {\color{#666666} f_1}\\ {\color{#82B366} f_2}\\ {\color{#B85450} f_3}\\ {\color{#6C8EBF} f_4}\\ {\color{#D79B00} f_5}\\ {\color{#9673A6} f_6}\\ {\color{#D6B656} f_7}\\ \end{bmatrix} $$

One particularly relevant result is a Laplace operator for meshes. A mesh Laplacian is a finite-dimensional matrix, so we can use numerical linear algebra to find its eigenfunctions.

As with the continuous case, these functions generalize sine and cosine to a new domain. Here, we visualize the real and complex parts of each eigenfunction, where the two colors indicate positive vs. negative regions.

(Interactive figure: the real and complex parts of the mesh Laplacian eigenfunctions \(\psi_N\), shown here for the 4th eigenfunction.)

At this point, the implications might be obvious—this eigenbasis is useful for transforming and compressing functions on the mesh. In fact, by interpreting the vertices’ positions as a function, we can even smooth or sharpen the geometry itself.
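As a minimal sketch of the discrete setting, we can assemble a simple (uniform) graph Laplacian for a toy mesh from its edge list and compute an eigenbasis with NumPy. Real geometry-processing pipelines typically use cotangent weights instead, and the connectivity below is made up.

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]   # toy mesh connectivity
n_vertices = 5

L = np.zeros((n_vertices, n_vertices))
for i, j in edges:
    L[i, j] -= 1                     # off-diagonal: -1 per edge
    L[j, i] -= 1
    L[i, i] += 1                     # diagonal: vertex degree
    L[j, j] += 1

eigenvalues, eigenfunctions = np.linalg.eigh(L)    # L is symmetric, so eigh applies
print(np.round(eigenvalues, 3))                    # smallest eigenvalue ~0 (the constant function)
low_frequency_basis = eigenfunctions[:, :3]        # candidate basis for smoothing/compression
```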

Further Reading

There’s far more to signal and geometry processing than we can cover here, let alone the many other applications in engineering, physics, and computer science. We will conclude with an (incomplete, biased) list of topics for further exploration. See if you can follow the functions-are-vectors thread throughout:

Thanks to Joanna Y, Hesper Yin, and Fan Pu Zeng for providing feedback on this post.


Footnotes

  1. We shouldn’t assume all countably infinite-dimensional vectors represent functions on the natural numbers. Later on, we’ll discuss the space of real polynomial functions, which also has countably infinite dimensionality. 

  2. If you’re alarmed by the fact that the set of all real functions does not form a Hilbert space, you’re probably not in the target audience of this post. Don’t worry, we will later move on to \(L^2(\mathbb{R})\) and \(L^2(\mathbb{R},\mathbb{C})\). 

  3. If you’re paying close attention, you might notice that none of these proofs depended on the fact that the domain of our functions is the real numbers. In fact, the only necessary assumption is that our functions return elements of a field, allowing us to apply commutativity, associativity, etc. to the outputs.

    Therefore, we can define a vector space over the set of functions that map any fixed set \(\mathcal{S}\) to a field \(\mathbb{F}\). This result illustrates why we could intuitively equate finite-dimensional vectors with mappings from index to value. For example, consider \(\mathcal{S} = \{1,2,3\}\):

    $$ \begin{align*} f_{\mathcal{S}\mapsto\mathbb{R}} &= \{ 1 \mapsto x, 2 \mapsto y, 3 \mapsto z \} \\ &\vphantom{\Big|}\phantom{\,}\Updownarrow \\ f_{\mathbb{R}^3} &= \begin{bmatrix}x \\ y \\ z\end{bmatrix} \end{align*} $$

    If \(\mathcal{S}\) is finite, a function from \(\mathcal{S}\) to a field directly corresponds to a familiar finite-dimensional vector. 

  4. Earlier, we used countably infinite-dimensional vectors to represent functions on the natural numbers. In doing so, we implicitly employed the standard basis of impulse functions indexed by \(i \in \mathbb{N}\). Here, we encode real polynomials by choosing new basis functions of type \(\mathbb{R}\mapsto\mathbb{R}\), namely \(\mathbf{e}_i[x] = x^i\). 

  5. Since this basis spans the space of analytic functions, analytic functions have countably infinite dimensionality. Relative to the uncountably infinite-dimensional space of all real functions, analytic functions are vanishingly rare! 

  6. Actually, it states that a compact self-adjoint operator on a Hilbert space admits an orthonormal eigenbasis with real eigenvalues. Rigorously defining these terms is beyond the scope of this post, but note that our intuition about “infinite-dimensional matrices” only legitimately applies to this class of operators. 

  7. Specifically, a basis for the domain of our Laplacian, not all functions. We’ve built up several conditions that restrict our domain to square integrable, periodic, twice-differentiable real functions on \([0,1]\). 

  8. Also known as a vibe check

  9. Higher-dimensional Laplacians are sums of second derivatives, e.g. \(\Delta f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\). The Laplace-Beltrami operator expresses this sum as the divergence of the gradient, i.e. \(\Delta f = \nabla \cdot \nabla f\). Using exterior calculus, we can further generalize divergence and gradient to produce the Laplace-de Rham operator, \(\Delta f = \delta d f\), where \(\delta\) is the codifferential and \(d\) is the exterior derivative

  10. Typically, a “harmonic” function must satisfy \(\Delta f = 0\), but in this case we include all eigenfunctions. 

  11. The associated Legendre polynomials are closely related to the Legendre polynomials, which form an orthogonal basis for polynomials. Interestingly, they can be derived by applying the Gram–Schmidt process to the basis of powers (\(1, x, x^2, \dots\)). 

Written on July 29, 2023