Vector Space, Normed Space & Hilbert Space (Machine Learning)

Jonathan Hui
15 min read · May 20, 2024


Euclidean space, the familiar geometry of our everyday world, provides a useful framework for understanding basic geometric concepts like angles and distances between points, lines, and planes. But in machine learning (ML), where data can be abstract, we need to venture beyond these familiar boundaries. To tackle the diverse challenges in ML, we turn to more abstract mathematical spaces that offer a richer framework for understanding and manipulating data.

Feature spaces, where each data point is represented as a vector, are crucial for algorithms to identify patterns and make predictions. Metric spaces equip these feature spaces with a distance function, allowing for measuring similarity between data points. Hilbert spaces, a specific type of metric space with additional properties like completeness and an inner product, are particularly useful for kernel-based methods such as SVM.

Their role is less prominent in deep learning, where complex non-linear models often rely more on empirical experimentation and intuition-driven design. However, this article seeks to demystify the mathematical terminology prevalent in research papers that utilize mathematical spaces, making the concepts more accessible and less intimidating for readers.

Prerequisite

This article assumes familiarity with the basic terms outlined in the previous article titled “The Study of Mathematical Spaces (Machine Learning)”.

Metric Space

While the concept of distance between two points is familiar, the power of metric spaces lies in extending this concept to less intuitive mathematical objects, allowing us to apply the properties and operations of metric spaces in broader contexts.

To refresh our understanding, a metric space, denoted ⟨M, d⟩, is essentially a set M along with a well-defined distance function (metric) d.

The metric d is a function that formalizes the notion of distance between elements of the set M. The L1 and L2 distances are popular examples of metrics in ML.

To qualify as a metric, the function d must satisfy a few simple conditions for all x, y, z in M:

  • Identity of indiscernibles: 𝑑(𝑥, 𝑦) = 0 if and only if 𝑥 = 𝑦.
  • Non-negativity: 𝑑(𝑥, 𝑦) ≥ 0.
  • Symmetry: 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥) (the distance is the same in both directions).
  • Triangle inequality: 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧) (the direct route is never longer than a detour).
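To make these conditions concrete, here is a minimal Python sketch (the helper names d1 and d2 are our own) that numerically spot-checks the axioms for the L1 and L2 distances on randomly sampled points:

```python
import numpy as np

rng = np.random.default_rng(0)

def d1(x, y):
    # L1 (Manhattan) distance
    return np.sum(np.abs(x - y))

def d2(x, y):
    # L2 (Euclidean) distance
    return np.sqrt(np.sum((x - y) ** 2))

for d in (d1, d2):
    for _ in range(1000):
        x, y, z = rng.normal(size=(3, 3))            # three random points in R^3
        assert d(x, y) >= 0                          # non-negativity
        assert np.isclose(d(x, y), d(y, x))          # symmetry
        assert d(x, z) <= d(x, y) + d(y, z) + 1e-12  # triangle inequality
print("All sampled axiom checks passed.")
```

Such checks cannot prove the axioms, of course; they only catch a candidate "metric" that violates them.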

This level of abstraction offers considerable flexibility in defining a metric. Consequently, it grants substantial freedom to design metrics tailored to specific mathematical objects and problems. The goal is to find a metric that proves beneficial in solving your particular challenges. A notable example is the Wasserstein metric, which quantifies the similarity between two probability distributions and has gained popularity as a loss function in Generative Adversarial Networks (GANs).

Complete Metric Spaces

Completeness is a fundamental property of metric spaces. It means that every Cauchy sequence (a sequence where the terms get arbitrarily close together as they progress) converges to a limit that exists within the space. Formally, a metric space M is complete if every Cauchy sequence in M converges to a point that is also in M. This implies that M doesn’t have any “holes” or “missing points.”

Completeness is crucial in various mathematical fields because it provides a foundation for analyzing limits and continuity. In machine learning, optimization algorithms like gradient descent rely on convergence properties. Theoretically, if the underlying space is complete, we can be confident that the algorithm will converge to a solution within the space, ensuring the results are meaningful and reliable.

Examples

Consider the sequence 𝑥ₙ = 𝑛/(𝑛+1), i.e., 1/2, 2/3, 3/4, …

All elements in this sequence are real numbers inside the interval (0, 1), and the sequence is Cauchy. But the limit, 1, is not in the interval (0, 1), and therefore the metric space ⟨X, d⟩ for X = (0, 1) with the usual metric 𝑑(𝑥, 𝑦) = |𝑥−𝑦| is not complete. To address this problem, we can expand X to [0, 1], making the metric space complete.

To assess the completeness of a space, we first need to define a metric. The set of rational numbers ℚ equipped with the standard Euclidean metric is not complete, because a sequence of rational numbers can converge to an irrational number, which is not part of the rational number set. For example, we can construct a Cauchy sequence of rational numbers that converges to the square root of 2, yet √2 is missing from ℚ.
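As an illustration, here is a minimal sketch using Newton's iteration for √2. Every iterate is an exact rational number, and the iterates form a Cauchy sequence whose limit, √2, lies outside ℚ:

```python
from fractions import Fraction
import math

# Newton's iteration x_{n+1} = (x_n + 2/x_n) / 2 stays inside Q exactly,
# yet converges to the irrational number sqrt(2).
x = Fraction(1)
for n in range(6):
    x = (x + 2 / x) / 2
    print(n, float(x), abs(float(x) - math.sqrt(2)))
```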

Vector Spaces (Linear Space)

Vector spaces not only offer a natural structure to organize data points and features but also underpin linear models, such as linear regression and logistic regression, in many ML algorithms. Additionally, they facilitate the linear transformations central to model optimization. Their impact extends to kernel methods and natural language processing, where they enable the capture of semantic relationships through word embeddings.

Vector spaces, also called linear spaces, provide a structure for manipulating objects called vectors. These vectors can be added together and multiplied (scaled) by numbers called scalars. Mathematically, a vector space over a field 𝐹 (commonly the real numbers ℝ or the complex numbers ℂ) is a set 𝑉 equipped with two operations: vector addition and scalar multiplication. The set 𝑉 and the operations must satisfy the following axioms, which hold for all elements 𝑢, 𝑣, 𝑤 in 𝑉 and all scalars 𝑎, 𝑏 in 𝐹:

  • Associativity of addition: 𝑢 + (𝑣 + 𝑤) = (𝑢 + 𝑣) + 𝑤.
  • Commutativity of addition: 𝑢 + 𝑣 = 𝑣 + 𝑢.
  • Additive identity: there exists a zero vector 0 with 𝑣 + 0 = 𝑣.
  • Additive inverse: for every 𝑣 there exists −𝑣 with 𝑣 + (−𝑣) = 0.
  • Compatibility of scalar multiplication: 𝑎(𝑏𝑣) = (𝑎𝑏)𝑣.
  • Scalar identity: 1𝑣 = 𝑣.
  • Distributivity over vector addition: 𝑎(𝑢 + 𝑣) = 𝑎𝑢 + 𝑎𝑣.
  • Distributivity over scalar addition: (𝑎 + 𝑏)𝑣 = 𝑎𝑣 + 𝑏𝑣.

Both operations are closed: the results of vector addition and scalar multiplication remain in the same vector space V. Vectors in a general vector space are abstract mathematical objects. They are not limited to the two-dimensional arrows used to represent physical quantities like force or velocity. The concept of a vector is much broader and can encompass many different types of objects.

Example

Polynomials, like 3x + 4x², x + 2x² + 0.5x⁵, and 5x², form a vector space V over the real numbers. We can add polynomials together and multiply them by real numbers (scalars), and the result will always be another polynomial within the same vector space V. For example, if we add (3x + 4x²) and (x + 2x² + 0.5x⁵), the result is 4x + 6x² + 0.5x⁵, which is also a polynomial in V. Therefore, functions can form a vector space, and each element in 𝑉 is simply a function.
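A minimal sketch of this closure property, representing the example polynomials above by their coefficient vectors with numpy:

```python
from numpy.polynomial import Polynomial

p = Polynomial([0, 3, 4])             # 3x + 4x^2
q = Polynomial([0, 1, 2, 0, 0, 0.5])  # x + 2x^2 + 0.5x^5

print(p + q)    # 4x + 6x^2 + 0.5x^5: still a polynomial in V
print(2.5 * p)  # 7.5x + 10x^2: still a polynomial in V
```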

Normed Space & Banach Space

A normed space is a vector space equipped with a norm, which measures the length or magnitude of its vectors. A Banach space is a complete normed space.

Norm

The norm of a vector v is written as ‖v‖. It is a function that assigns to each vector in a vector space V a non-negative real number representing its length, magnitude, or size. The norm generalizes the familiar notion of length in Euclidean space to more abstract vector spaces.

A normed vector space is a vector space defined over the field of real or complex numbers (𝔽 ∈ {ℝ, ℂ}). The norm maps each vector to a non-negative real number.

Any definition of ‖ · ‖ is called a norm if it satisfies the following properties for all vectors 𝑢, 𝑣 and all scalars 𝑎:

  • Positive definiteness: ‖𝑣‖ = 0 if and only if 𝑣 = 0.
  • Absolute homogeneity: ‖𝑎𝑣‖ = |𝑎| ‖𝑣‖.
  • Triangle inequality: ‖𝑢 + 𝑣‖ ≤ ‖𝑢‖ + ‖𝑣‖.

These three rules also imply that a norm is greater than or equal to 0 (set 𝑢 = −𝑣 in the triangle inequality and apply absolute homogeneity). This is why non-negativity is omitted from the norm axioms.

Here are the common definitions for norms in ML:

  • L1 norm: ‖𝑥‖₁ = |𝑥₁| + |𝑥₂| + … + |𝑥ₙ|
  • L2 norm: ‖𝑥‖₂ = (|𝑥₁|² + |𝑥₂|² + … + |𝑥ₙ|²)^(1/2)
  • p-norm: ‖𝑥‖ₚ = (|𝑥₁|ᵖ + |𝑥₂|ᵖ + … + |𝑥ₙ|ᵖ)^(1/p), for p ≥ 1
  • ∞-norm: ‖𝑥‖∞ = max(|𝑥₁|, |𝑥₂|, …, |𝑥ₙ|)
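As a quick illustration, all four norms can be computed with numpy.linalg.norm (the example vector is arbitrary):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])
print(np.linalg.norm(x, 1))       # L1 norm: 7.0
print(np.linalg.norm(x, 2))       # L2 norm: 5.0
print(np.linalg.norm(x, 3))       # p-norm with p = 3: about 4.498
print(np.linalg.norm(x, np.inf))  # infinity norm: 4.0
```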

Normed Space (Normed Vector Space / Normed Linear Space)

⟨X, ‖ · ‖⟩ forms a normed space. Here are two examples of normed spaces: ⟨ℝ, | · |⟩, where the norm is the absolute value, and ⟨ℝⁿ, ‖ · ‖ₚ⟩, where the norm is a p-norm.

It’s important to note that a normed space is defined by its norm. The behavior of the space, including concepts like convergence and distance, depends on the specific norm being used. Different norms can lead to different conclusions about the properties of the space.

Every normed space inherently possesses a metric induced by its norm. This allows norms to quantify both lengths of individual vectors and distances between vectors within the space. A normed space can simply use the norm to define its metric: 𝑑(𝑥, 𝑦) = ‖𝑥 − 𝑦‖.

Normed spaces are commonly used to quantify the similarity between data points, assess the quality of model predictions, and design optimization algorithms. For example, the L2 norm is frequently used to measure the error between predicted and true values in regression tasks, while the L1 norm promotes sparsity in feature selection.
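A minimal sketch of both uses (the predictions and weight vector are illustrative placeholders):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
w = np.array([0.0, 0.7, 0.0, -0.1])  # a sparse weight vector

# Squared L2 norm of the residual, averaged: the mean squared error.
mse = np.linalg.norm(y_true - y_pred) ** 2 / len(y_true)

# L1 norm of the weights: the penalty that promotes sparsity (as in Lasso).
l1_penalty = np.linalg.norm(w, 1)
print(mse, l1_penalty)  # 0.02, 0.8
```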

Banach Spaces

A Banach space is a complete normed space (X, ‖ · ‖).

Hilbert Space

Hilbert spaces, as a specific type of complete normed vector space with additional structure, offer several advantages for ML applications:

Inner Product: The defining feature of a Hilbert space is its inner product, which generalizes the dot product in Euclidean space. This inner product allows for the calculation of angles and projections, enabling techniques such as:

  • Principal Component Analysis (PCA): Finding directions of maximal variance in data.
  • Support Vector Machines (SVM): Finding optimal hyperplanes for classification.

Orthogonality: Inner products facilitate the concept of orthogonality, which is crucial for:

  • Fourier Transforms: Decomposing signals into orthogonal frequency components.
  • Function Approximation: Representing functions as linear combinations of orthogonal basis functions.

Completeness: Hilbert spaces are complete, meaning that every Cauchy sequence converges within the space. This is important for:

  • Optimization: Guaranteeing the existence of solutions to optimization problems.
  • Function Spaces: Many function spaces used in ML (e.g., spaces of square-integrable functions) are naturally Hilbert spaces.

Duality: The Riesz representation theorem establishes a powerful duality between Hilbert spaces and their continuous dual spaces (spaces of linear functionals). This is exploited in:

  • Kernel Methods: Optimizing the computation of similarity between two points projected in high-dimensional space using properties of the inner product space.
  • Reproducing Kernel Hilbert Spaces (RKHS): A type of Hilbert space that provides a framework for kernel-based learning algorithms.

Inner Product

The dot product is an example of an inner product, but the concept of an inner product is more abstract. An inner product ⟨·, ·⟩ is a binary operation on a vector space that satisfies the following axioms for all vectors x, y, z in the space and all scalars a:

  • Conjugate symmetry: ⟨𝑥, 𝑦⟩ equals the complex conjugate of ⟨𝑦, 𝑥⟩ (for real vector spaces, this reduces to ⟨𝑥, 𝑦⟩ = ⟨𝑦, 𝑥⟩).
  • Linearity in the first argument: ⟨𝑎𝑥 + 𝑦, 𝑧⟩ = 𝑎⟨𝑥, 𝑧⟩ + ⟨𝑦, 𝑧⟩.
  • Positive definiteness: ⟨𝑥, 𝑥⟩ ≥ 0, and ⟨𝑥, 𝑥⟩ = 0 if and only if 𝑥 = 0.

Inner Product Space & Hilbert Space

A Hilbert space is a vector space endowed with an inner product that is complete with respect to the norm induced by that inner product. If the space is not necessarily complete, it is referred to as an inner product space.

The inner product establishes a framework for abstract geometric concepts such as length, distance, and angle within a vector space.

In every inner product space, we can derive a norm induced by the inner product:

‖𝑥‖ = √⟨𝑥, 𝑥⟩

This norm allows us to define the distance between two vectors:

𝑑(𝑥, 𝑦) = ‖𝑥 − 𝑦‖ = √⟨𝑥 − 𝑦, 𝑥 − 𝑦⟩

Therefore, a Hilbert space is also a Banach space and a normed space. Furthermore, we can determine the angle θ between two vectors x and y:

cos θ = ⟨𝑥, 𝑦⟩ / (‖𝑥‖ ‖𝑦‖)

These definitions of norm, distance, and angle establish the geometric structure of inner product spaces.
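Here is a minimal sketch deriving all three quantities from the dot product on ℝ² (the vectors are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

norm = lambda v: np.sqrt(np.dot(v, v))          # ||v|| = sqrt(<v, v>)
dist = norm(x - y)                              # d(x, y) = ||x - y||
cos_theta = np.dot(x, y) / (norm(x) * norm(y))  # cos θ = <x, y> / (||x|| ||y||)
print(dist, np.degrees(np.arccos(cos_theta)))   # 1.0, 45.0
```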

Both the set of real numbers (ℝ) and the set of 𝑛-dimensional real vectors (ℝⁿ) can form Hilbert spaces. In these cases, the inner product is typically defined as the dot product.

Hilbert Space with Dot Product

For ℝⁿ, the inner product is the familiar dot product:

⟨𝑥, 𝑦⟩ = 𝑥 ⋅ 𝑦 = 𝑥₁𝑦₁ + 𝑥₂𝑦₂ + … + 𝑥ₙ𝑦ₙ

For Hilbert spaces consisting of functions, the inner product is often defined as an integral involving the product of the functions:

⟨𝑓, 𝑔⟩ = ∫ 𝑓(𝑥) 𝑔(𝑥) d𝑥

Cauchy–Schwarz inequality

The Cauchy–Schwarz inequality states that for all vectors x and y in an inner product space:

|⟨𝑥, 𝑦⟩| ≤ ‖𝑥‖ ‖𝑦‖

Equality holds if and only if x and y are linearly dependent, meaning that one vector is a scalar multiple of the other.

Orthogonality

When the inner product of two vectors x and y is zero, they are said to be orthogonal, denoted as 𝑥 ⊥ 𝑦. Nonzero orthogonal vectors are linearly independent, meaning neither can be expressed as a scalar multiple of the other.

Similarly, functions f and g are orthogonal if their inner product is zero. A classic example is the pair of trigonometric functions sin(x) and cos(x), which are orthogonal over the interval [0, 2π]. In the broader context of vector spaces, any vector can be uniquely represented as a linear combination of vectors from an orthonormal basis. Notably, in function spaces, these basis vectors can themselves be functions.
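As a quick numerical check of this claim, we can approximate the integral inner product with scipy:

```python
import numpy as np
from scipy.integrate import quad

# <sin, cos> = ∫ sin(x) cos(x) dx over [0, 2π] should vanish.
inner, _ = quad(lambda x: np.sin(x) * np.cos(x), 0, 2 * np.pi)
print(inner)  # ≈ 0, up to numerical error
```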

Two subsets of a vector space X are considered independent if no element of one subset can be expressed as a linear combination of elements from the other subset.

Use of Hilbert Space

One notable application of Hilbert spaces in ML is the Reproducing Kernel Hilbert Space (RKHS). RKHS is a specialized type of Hilbert space that plays a crucial role in kernel-based learning algorithms, leveraging the inner product structure for efficient computation and representation of functions.

The key advantage of RKHS is its ability to implicitly map data points into high-dimensional feature spaces, where similarities between them can be more easily found. Explicitly performing this mapping would be computationally expensive or even impossible in cases of infinite-dimensional feature spaces. However, the defining property of an RKHS, the reproducing property, combined with the kernel trick, enables efficient computation of these similarities using the kernel function without explicitly representing the data in the high-dimensional space. This is possible because the ultimate goal is often to compute scalar similarity values in the real numbers (ℝ), rather than the high-dimensional features themselves.
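A minimal sketch of the kernel trick for a simple polynomial kernel (this particular kernel and feature map are a textbook illustration, not the method of any specific library):

```python
import numpy as np

# For x, y in R^2, the kernel k(x, y) = (x · y)^2 equals the inner product
# of the explicit features φ(v) = (v1^2, sqrt(2)·v1·v2, v2^2) in R^3.
def phi(v):
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(y))  # inner product computed in feature space
kernel = np.dot(x, y) ** 2         # same value, computed in input space
print(explicit, kernel)            # both 16.0
```

The kernel evaluates the feature-space similarity without ever constructing φ(x), which is exactly what makes infinite-dimensional feature spaces tractable in an RKHS.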

𝓁ᵖ Space

The concept of infinity can be daunting in computer science. However, in ML, where algorithms often project data into high-dimensional spaces to uncover hidden patterns, it’s natural to consider extending this to infinite dimensions. While infinite-dimensional spaces might seem impractical, the key isn’t the number of dimensions, but whether we can efficiently compute relevant properties within those spaces. For example, if the sum of an infinite series or the limit of a sequence within an infinite-dimensional space is finite and computable, then infinite dimensionality can be a powerful tool for enhancing the expressiveness of ML models. This capability opens the door to studying and designing infinite-dimensional sequences.

Elements in a mathematical space can be sequences. The ℓᵖ space is the set of all infinite sequences x = (x₁, x₂, x₃, …) for which the infinite series formed by the p-th powers of the absolute values of the elements converges. More formally, for p ≥ 1:

ℓᵖ = { 𝑥 = (𝑥₁, 𝑥₂, 𝑥₃, …) : ∑ᵢ |𝑥ᵢ|ᵖ < ∞ }

It is a normed space when endowed with the p-norm: ‖𝑥‖ₚ = (∑ᵢ |𝑥ᵢ|ᵖ)^(1/p).

To guarantee completeness, every Cauchy sequence in a space must converge to a limit within that space. This can be shown to hold in ℓᵖ: the limit of a Cauchy sequence of elements of ℓᵖ is itself a sequence whose series ∑|xᵢ|ᵖ converges, so the limit remains in the space. Therefore, ℓᵖ spaces qualify as Banach spaces, which are complete normed vector spaces.


Example Sequence in ℓ²

Consider the following sequence as an example of an element in the ℓ² space: 𝑥 = (1, 1/2, 1/3, …), i.e., 𝑥ₙ = 1/𝑛. Since ∑ 1/𝑛² = π²/6 is finite, the sequence is square-summable and belongs to ℓ².
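A quick numerical sketch confirming that this sequence is square-summable:

```python
import numpy as np

# Partial sums of Σ 1/n² approach π²/6 ≈ 1.6449, so x_n = 1/n is in ℓ².
n = np.arange(1, 100001)
print(np.sum(1.0 / n ** 2), np.pi ** 2 / 6)
```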

Inner Product for ℓ² Space

The inner product in ℓ² can be defined as:

⟨𝑥, 𝑦⟩ = ∑ᵢ 𝑥ᵢ𝑦ᵢ

By the Cauchy–Schwarz inequality, ∑ᵢ |𝑥ᵢ𝑦ᵢ| ≤ ‖𝑥‖₂ ‖𝑦‖₂, so this series converges for any x and y in ℓ², and the definition indeed represents an inner product in ℓ² space, fulfilling all necessary criteria. This also serves as a good example of how to prove that a function is an inner product.

ℓ² Space & Hilbert Space

The ℓ² space, often described as the space of square-summable sequences, is a prominent example of a Hilbert space, a type of complete inner product space. In ML, the ℓ² norm is often used in regularization techniques to prevent overfitting in models.

∞-norm

When p = ∞, the corresponding norm is called the infinity norm (or supremum norm). Since a sequence has infinitely many components, a maximum element may not exist, so the infinity norm is computed using the supremum, which is the smallest value that is greater than or equal to all elements in the sequence:

‖𝑥‖∞ = supᵢ |𝑥ᵢ|

Lᵖ Space

Lᵖ spaces are a specific type of function space defined for functions from a measurable space (often a subset of ℝⁿ) to ℝ (or ℂ). A measure extends the concepts of size and volume to more abstract spaces. Infinite-dimensional spaces, like function spaces, allow us to model data as functions rather than just finite points. This concept is familiar to engineering students, who use Fourier transforms to represent data; in this context, the input is represented with periodic sinusoidal functions.

The functions in Lᵖ spaces are required to be Lebesgue integrable in the sense that the integral of the absolute value of the function raised to the power of p is finite:

Lᵖ = { 𝑓 : ∫ |𝑓(𝑥)|ᵖ d𝑥 < ∞ }

This ensures the completeness of the Lᵖ space. It is a normed vector space when endowed with the p-norm:

‖𝑓‖ₚ = (∫ |𝑓(𝑥)|ᵖ d𝑥)^(1/p)

The Gaussian function is in L² space. The Gaussian function 𝑓(𝑥) defined on the real line ℝ, and specifically considered over an interval, say [−𝑎, 𝑎], is given by:

𝑓(𝑥) = 𝑒^(−𝑥²)

Its square is integrable. The integral is finite for any finite a, because the exponential function rapidly decays to zero as 𝑥 moves away from zero, making the area under the curve finite. Indeed, the Gaussian function is in L² even over the entire real line (from -∞ to ∞).
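A minimal numerical sketch, assuming the form 𝑓(𝑥) = 𝑒^(−𝑥²) given above:

```python
import numpy as np
from scipy.integrate import quad

# The squared L2 norm ∫ exp(-2x²) dx over the whole real line equals
# sqrt(π/2) ≈ 1.2533, which is finite, so f is in L².
sq_norm, _ = quad(lambda x: np.exp(-x ** 2) ** 2, -np.inf, np.inf)
print(sq_norm, np.sqrt(np.pi / 2))
```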

Inner Product for L² Space

For p = 2, the inner product can be defined as:

⟨𝑓, 𝑔⟩ = ∫ 𝑓(𝑥) 𝑔(𝑥) d𝑥

creating a Hilbert space. (For p ≠ 2, the p-norm is not induced by any inner product, so only L² is a Hilbert space.)

Continuous Function Space

The continuous function space C[a, b] is defined as the set of all real-valued functions that are continuous on the closed interval [a, b]. In mathematical notation, this is expressed as:

C[a, b] = { 𝑓 : [a, b] → ℝ | 𝑓 is continuous }

Familiar examples of functions in this space include polynomials, sin(𝑥), and 𝑒ˣ, all of which are continuous on any closed interval [a, b].

Norm

The corresponding p-norm is

‖𝑓‖ₚ = (∫ₐᵇ |𝑓(𝑥)|ᵖ d𝑥)^(1/p)

And the natural norm is defined as the supremum norm:

‖𝑓‖∞ = sup { |𝑓(𝑥)| : 𝑥 ∈ [a, b] }

The metric d(u, v) below is the supremum of the pointwise difference between u(x) and v(x):

𝑑(𝑢, 𝑣) = sup { |𝑢(𝑥) − 𝑣(𝑥)| : 𝑥 ∈ [a, b] }

⟨C[a, b], d⟩ forms a complete metric space.
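A minimal sketch approximating the supremum metric on a dense grid (the two functions are arbitrary continuous examples, and the grid maximum is only a numerical stand-in for the true supremum):

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 10001)  # dense grid over [0, 1]
u = lambda x: x ** 2
v = lambda x: np.sin(x)

d_uv = np.max(np.abs(u(xs) - v(xs)))  # ≈ sup |u(x) - v(x)| on [0, 1]
print(d_uv)  # ≈ 0.2325
```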

Inner Product

C[a, b] equipped with the inner product

⟨𝑓, 𝑔⟩ = ∫ₐᵇ 𝑓(𝑥) 𝑔(𝑥) d𝑥

forms an inner product space. Note that C[a, b] is not complete under the norm induced by this inner product, so it is an inner product space but not a Hilbert space.

Examples of Hilbert Spaces

Here is a recap of three Hilbert spaces, each defined by its unique set of elements and corresponding inner product: Euclidean space equipped with the dot product, the ℓ² space of square-summable sequences with the inner product ⟨𝑥, 𝑦⟩ = ∑ᵢ 𝑥ᵢ𝑦ᵢ, and the space L²[0, 1] of square-integrable functions on the interval [0, 1] with the integral inner product.

Isomorphism

Isomorphism is a structure-preserving mapping between two spaces. In the context of vector spaces, this preservation specifically refers to linearity, meaning that the operations of vector addition and scalar multiplication are maintained under the mapping.

In a Banach space, an isomorphism preserves the norm under the mapping. That is,

‖𝑀𝑥‖ = ‖𝑥‖ for all 𝑥 in the space.

This is also referred to as an isometric isomorphism, where the distance is preserved before and after the mapping.

For Hilbert spaces X and Y, and the mapping 𝑀: X → Y, isomorphism means that 𝑀 is a bijective (one-to-one and onto) linear map that preserves the inner product, i.e.,

⟨𝑀𝑥, 𝑀𝑦⟩ = ⟨𝑥, 𝑦⟩ for all 𝑥, 𝑦 in X
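A minimal sketch: a 2-D rotation is a classic isometric isomorphism of ℝ² with the dot product (the angle and vectors are arbitrary):

```python
import numpy as np

theta = 0.7
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation: linear and bijective

x = np.array([1.0, 2.0])
y = np.array([-0.5, 3.0])
print(np.dot(M @ x, M @ y), np.dot(x, y))  # inner product preserved (up to rounding)
```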

Recap

Here is an overview of the relationships among some of the spaces we’ve studied. The arrows indicate the “is a” relationship. A normed vector space is inherently a metric space because the norm induces a metric, which quantifies the distance between vectors. Without a defined topology, a vector space by itself does not automatically meet the criteria required to be considered a topological space, as it lacks the concepts of closeness, connectivity, or distance.
