The Study of Mathematical Spaces (Machine Learning)

Jonathan Hui
22 min readMay 15, 2024


The terminology of mathematical spaces in AI research papers can be intimidating. Fortunately, understanding these concepts isn’t always crucial to grasping the core AI ideas. However, some readers may still feel unsatisfied when they can’t fully grasp the researchers’ intended message. This article will first explain some key terms, then explore the mathematical spaces most relevant in machine learning (ML). The field of mathematical spaces is vast, but this article aims to provide a foundational understanding within the context of machine learning, while also catering to those interested in delving deeper into the subject.

Mathematical spaces exhibit a hierarchical structure, reminiscent of object-oriented design. At the top of this hierarchy reside the most abstract spaces, such as topological spaces, which establish fundamental concepts like continuity and convergence. As we move down the hierarchy, spaces become more specialized, acquiring additional structures and properties that tailor them for specific applications.


Let’s begin by discussing fields, a type of mathematical space. Both real numbers and complex numbers form fields. While basic, this concept provides a quick overview and introduces some of the associated terms.

A field ⟨F, +, ·⟩ consists of a set F equipped with two binary operations (operations that take two elements to produce a third element):

  • Addition (+)
  • Multiplication (·)

The set of real numbers ℝ forms a field which contains all real numbers. The operations of addition (+) and multiplication (·) are defined in the usual way for real numbers. However, to qualify as a field, these operations on the elements of the field are required to adhere to the following axioms (basic rules):

For all a, b, cF:

1. Closure under addition and multiplication: a + bF, a · bF.

2. Associativity of addition and multiplication:

  • (a + b) + c = a + (b + c), (a · b) · c = a · (b · c)

3. Commutativity of addition and multiplication:

  • a + b = b + a, a · b = b · a

4. Existence of additive and multiplicative identities:

  • There exists an element 0 ∈ F such that a + 0 = a = 0 + a for all aF.
  • There exists an element 1 ∈ F (where 0 ≠ 1) such that a · 1 = a = 1 · a for all a ∈ F.

5. Existence of additive and multiplicative inverses:

  • For every aF, there exists an element -aF such that a + (-a) = 0 = (-a) + a.
  • For every aF where a ≠ 0, there exists an element a⁻¹ ∈ F such that a · a⁻¹ = 1 = a⁻¹ · a.

7. Distributivity of multiplication over addition:

  • a · (b + c) = (a · b) + (a · c)

A field is closed under addition, and multiplication. This means that performing these operations on elements within the field will always produce another element within the same field.

In quantum mechanics, the complex field ℂ (consisting of complex numbers) is essential for describing quantum phenomena. Rational numbers form the rational field ℚ, while integers do not form a field. This is because most integers (except for 1) do not have multiplicative inverses that are also integers, violating the field axiom requiring the existence of multiplicative inverses for all nonzero elements.

Ordered Fields

An ordered field is a field equipped with a order relation (≤). The rational numbers (ℚ) and real numbers (ℝ) are examples of ordered fields.


Although I haven’t published many articles in 2023 and 2024, these years have been incredibly busy for me. The rapid advancements in AI have inspired me to write a book on Generative AI, a project that has been incredibly challenging and time-consuming. During the writing process, numerous interesting topics emerged that were either too tedious or not directly relevant to the core focus of the book. These topics may be a bit rough around the edges due to time constraints, but I still believe they offer insights as a reference. Rather than discarding them, I’ve decided to share them through a series of articles on Medium.

While the book isn’t finished yet, you can stay updated on its progress by following me on Medium or connecting with me on LinkedIn. I’ll share an announcement there as soon as it’s ready!


The notion of space in mathematics is abstract yet profoundly powerful. It starts with a set — a collection of objects usually called points or elements. But a set alone isn’t very interesting. The magic happens when we add different structures to the set, giving the points meaning and connections. This process of enhancing sets with various structures gives rise to a wide array of mathematical spaces, each possessing its own unique properties, and practical applications.

Space is a set that can be endowed with structures:

  • Algebraic Structure: It defines operations (like addition or multiplication) and rules (axioms) on the points within a space.
  • Relations: These specify the relationships between elements. For instance, in an ordered set, a relation determines whether one element is less than or greater than another.
  • Metrics (Distance Functions): These provide a numerical method to measure the distance or closeness between points in a space, enabling the study of concepts like convergence, compactness and continuity.
  • Topology: This defines a more general notion of closeness without necessarily relying on numerical distances.

Metric Space

Metric spaces, endowed with a distance function known as a metric, are often the first step in understanding mathematical spaces. The definition of a space is often written within brackets ⟨ ⟩ or parentheses ( ) to specify the name of the set and the particular structure applied to it.

M represents the underlying set of a metric space, which can consist of numbers, functions, sequences, or other mathematical objects. When the context is clear, we may also refer to the entire metric space as M. A metric d is a function that assigns a non-negative real number to every pair of elements, introducing the concept of “distance” between them. This structure allows for the analysis of distances and also enables the discussion of convergence and continuity.

Metric Space (Image by Author)

Common metrics include the Manhattan distance and the Euclidean (L2) distance.

However, a metric function must meet the following conditions for all x, y, and z in M:

When we set 𝑧 = 𝑥, they lead to the conclusion that 𝑑(𝑥 , 𝑦) is non-negative.

Hence, these three properties are equivalent to the properties below.

  • Non-negativity: 𝑑(𝑥,𝑦)≥0.
  • Symmetry: The distance is the same in both directions.
  • Triangle Inequality: The direct route is the shortest.

The broad definition of a metric allows for broad applicability and consistent manipulation of fundamental concepts. For example, the Wasserstein loss, used in generative AI for more efficient training, satisfies the criteria of a metric function. This allows us to apply the properties of a metric space to probability distributions without the need to create a new mathematical framework.


Sequences provide a foundational tool for studying concepts like convergence and limits. In abstract mathematical spaces, the familiar notion of a sequence as an ordered list of numbers is too limited. We need to re-establish this concept to accommodate other mathematical objects, such as functions, while preserving the essential idea of an ordered progression.

Let’s explore the concepts of convergence and limits within the context of metric spaces.

A sequence in X is

X is a mathematical space.

Convergency & Limit in Metric Space

In the context of metric spaces, a sequence is said to be convergent if the terms of the sequence approach a specific limit as the sequence progresses indefinitely. More formally, a sequence in a metric space 𝑋 converges to a limit 𝐿∈𝑋 if, for every positive number ϵ (no matter how small), there exists a natural number N such that for all 𝑛≥𝑁, the distance between the sequence​ and L is less than ϵ. This can be expressed mathematically as:

A sequence in a metric space converges if it approaches a specific limit that is part of the space X, meaning the sequence will have a limit LX.

This approach, however, relies on knowing the limit beforehand, which is not always the case. To address this issue, mathematicians have developed the concept of Cauchy sequences.

Cauchy Sequence

A Cauchy sequence is defined as a sequence where the elements become arbitrarily close to each other as the sequence progresses. For a sequence to qualify as a Cauchy sequence, for any given positive distance ϵ, there exists a point in the sequence beyond which the distance between any two elements is always less than ϵ.

Definition: For every positive real number ϵ (no matter how small it is), it exists a value N (a natural number, 1, 2, 3, …) where 𝑚,𝑛 ≥ 𝑁 and


Let’s examine a sequence in ℝ: 3, 3.1, 3.14, 3.141, …. This sequence successively adds one decimal place to the approximation of π. In this example, we use the usual metric 𝑑(𝑥,𝑦)=∣𝑥−𝑦∣. For 𝑚<𝑛, the difference between the m-th and n-th terms becomes progressively smaller than:

Hence, for any positive number ε, there exists an N such that for all m and n greater than N, the difference between the mth and nth elements is less than ε.


A convergent sequence is always a Cauchy sequence. However, not all Cauchy sequences converge. Take, for example, a Cauchy sequence consisting entirely of rational numbers from the set ℚ. Every term in this sequence is a rational number.

If the sequence has a limit x, then

However, no rational number can satisfy this condition. There is no limit within ℚ for this sequence, meaning it does not converge. To make the sequence complete, we can extend the space to include ℝ.

A metric space is said to be complete if every Cauchy sequence in that space converges to a limit that is also within the space, ensuring that no sequence “escapes” the space as it converges.

Working with incomplete metric spaces presents challenges. We might construct a sequence of approximate solutions, using an iterative method or a numerical method. As the sequence progresses, the approximate solutions become increasingly close to one another, forming a Cauchy sequence in the metric space. Ideally, we would want these approximations to converge to a limit, and then demonstrate that this limit is indeed a solution. However, this approach is only guaranteed to work if the underlying metric space is complete. Otherwise, we might need to extend the space.

Domain & Image

The domain of a function is the set of all possible input values for which the function is defined. It essentially tells you what you can input into the function. On the other hand, the image of a function (codomain) refers to the set of all output values that the function can produce when applied to every element in its domain.


Limits and continuity are fundamental building blocks in differential calculus. Cauchy sequences provide a way to define and analyze limits in the broader context of metric spaces. Let’s discuss the idea of continuity for functions between metric spaces.

A function f from one metric space X to another metric space Y is said to be continuous at a point 𝑥₀ in X if, for every ϵ>0, there exists a δ>0 such that for all x in X where the distance

This definition ensures that small changes in the input around 𝑥₀ result in small changes in the output around 𝑓(𝑥₀).

Continuity in deep learning is crucial for ensuring smooth changes in model outputs as inputs vary, which helps in stable training. It allows for the use of gradient-based optimization techniques, like backpropagation, essential for training neural networks effectively. Continuity also helps in generalization, preventing abrupt changes in predictions, leading to more reliable and interpretable models.


Dealing with infinite possibilities is challenging. Countability in a mathematical space primarily seeks to ensure a manageable and well-behaved structure. Countability conditions facilitate simplifications in analysis and topology, such as the existence of countable bases and the ability to approximate elements with finite sets.

A set is considered countable if its elements can be put into a one-to-one correspondence with the natural numbers (1, 2, 3, …). This means you can list the elements of the set in a sequence. Formally, a set is countable if there exists an injective function f : F → ℕ (natural number) that every element in F can map to a unique element in ℕ.

However, these sets can have infinitely many elements, as long as they can still be listed sequentially, like the set of even numbers, the set of integers or the set of rational numbers. In contrast, the set of real numbers between 0 and 1 is uncountable. These sets are larger than the set of natural numbers and cannot be put into a one-to-one correspondence with them.


Let ⟨𝑋, 𝑑⟩ be a metric space. A set 𝑌⊆𝑋 is dense in 𝑋 if, for every element 𝑥∈𝑋, there exists an element 𝑦∈𝑌 such that d(x, y) < ϵ for every 𝜖>0. Informally, this means that for any element outside of 𝑌, we can find an element within 𝑌 that is arbitrarily close to it. An example of a dense subset in ℝ is the set of rational numbers ℚ. To illustrate this, consider the decimal expansion of a real number:

While each individual element in the sequence is rational, the sequence itself converges to a real number. This demonstrates that any real number can be arbitrarily approximated by a rational number.


By definition, a metric space X is called separable if there is a countable set YX such that the closure of Y (the set of all points in X that are either in Y or arbitrarily close to points in Y) is X.

Intuitively, if a space is separable, every point of X can be approximated arbitrarily closely by points in the countable dense subset Y. This means that properties proven for Y can often be extended to the whole space X through this approximation. Separability is frequently a necessary condition for certain important theorems to hold. This property can simplify analysis and lead to more powerful implications for the entire space.

While the direct application of countable dense subsets to complex deep learning models can be challenging, their existence simplifies the justification and analysis of various techniques, such as dimensionality reduction, kernel design, approximation, and data representation.


An isomorphism is a structure-preserving mapping between two structures, which is both injective and reversible through an inverse mapping. An injective map (or one-to-one mapping) ensures that distinct elements are mapped to distinct elements.

F →G

Surjective mappings ensure that every element in the target set G is mapped to by at least one element from the domain set F.

F →G

If a mapping is both injective (one-to-one) and surjective (onto), it is classified as a bijection. This means that every element of the domain is mapped to exactly one element of the image set, and every element of the image set is the image of exactly one element from the domain, establishing a perfect one-to-one correspondence between all elements of the domain and image sets.

Isomorphisms, while not directly visible in the implementation of deep learning algorithms, play a crucial role in the underlying mathematical framework. They ensure the preservation of essential structural relationships between different mathematical spaces, which is fundamental for understanding how data transformations within neural networks affect inherent information.For instance, linear transformations in neural networks strive to preserve relationships between data points. Isomorphisms is crucial in representation learning, where the goal is to capture meaningful patterns while discarding irrelevant details. However, non-linear functions like ReLU, while essential for learning complex patterns, can potentially lead to some information loss due to their non-invertibility.


Preservation preserves operations. In the case of fields, it preserves addition and scalar multiplication. Specifically:

Let 𝜙 be a map from 𝐹 to G. In the context of a field, if 𝜙 obeys all the rules above, the map 𝜙 is an isomorphism.

An isometry between two metric spaces is a function that preserves distances. Specifically, if (𝑋, 𝑑𝑋) and (𝑌, 𝑑𝑌) are two metric spaces, a function 𝑓:𝑋→𝑌 is called an isometry if for all 𝑥,𝑥′∈𝑋, the following holds:

This means that the distance between any two points in 𝑋 is the same as the distance between their images in 𝑌, as measured in their respective metrics.

Open Sets & Closed Sets in Metric Spaces

Open and closed sets are the fundamental building blocks of mathematical spaces, providing the essential framework for developing more complex topological concepts. For instance, they are essential for defining convergence and continuity.

Open sets are sets that don’t include their boundaries, while closed sets contain all their boundary points. To aid in visualization and understanding, we will begin our discussion by exploring open and closed sets within the more familiar framework of metric spaces.

Let’s examine a subset AX, with an element x A.

Open Ball B (Image by Author)

We can construct an open ball 𝐵 centered at 𝑥 with a radius smaller than ϵ. This ball 𝐵 consists of all the elements:

Essentially, B includes x and its neighbors within a radius of ϵ. Visually, these neighbors of x might all lie within A, or some could extend beyond it.

Open Set & Boundary Points

A set A is considered open if, for every element x within A, there exists a sufficiently small radius ϵ such that all elements of the open ball B centered at x with this radius are contained entirely within A.

Open Set (Image by Author)

A boundary point of A is a point in X such that every open ball centered at that point contains elements both from A and from the complement of A (i.e., the set of points in X that are not in A).

Boundary Point (Image by Author)

A boundary point x is formally defined as follows:

where A is the complement of A. The set of all boundary points of 𝐴 is denoted as δA.

An open set does not contain any of its boundary points.

An open interval and an open circle

That is,

Closed Set & Closure

The definition of a closed set is simply its complement is open.


From another angle, a closed set contains all the boundary points.

The closure of a set A is formed by combining A with its boundary points. In the case of the rational numbers (ℚ) within the real numbers (ℝ), the closure of ℚ is the entire set of real numbers (ℝ).

If the closure of A is the same as A, A is a closed set.

Both the empty set ∅ and the entire set X are considered both open and closed.


Consider the open interval 𝐴 = (3, 6) on the real number line ℝ.

A is an open set. For any x in A, we can identify open balls centered at x such that all elements within these balls are contained in A. For example, we could choose ε to be half the distance from x to the nearest boundary point of A.

Let’s explore a more challenging example with subsets A and C included in X. Are A and C open or closed?

The element “0” and elements greater than “3” are not boundary points of A because they do not belong to the set X itself. The element “3” is also not a boundary point of A. No open balls centered at “3” contain elements outside of 𝐴 that are also within 𝑋.

This example highlights a crucial point: determining whether a set is open or closed requires considering both the set’s definition A and the definition of the underlying space X, which together determine the set’s boundary points. In this case, the definitions result in the surprising outcome that A has an empty set of boundary points. This leads to the conclusion that A is both closed and open. Thus, it’s important to note that open and closed sets are not always mutually exclusive categories. Indeed, a set that is both open and closed is referred to as a clopen set.

For the subset C, the element 2 is a boundary point. The closure of C is equal to C itself, which indicates that C is a closed set.

Continuous Functions

Many concepts in topology can be defined using either open sets or closed sets. Consider two metric spaces 𝐴 and 𝐶, with a mapping function 𝑓:𝐴→𝐶. Informally, the function 𝑓 is continuous at a point 𝑎∈𝐴 if for every open ball around 𝑓(𝑎) in 𝐶, there exists a corresponding open ball around a in A such that the image under f of this latter ball is contained within the open ball around 𝑓(𝑎).

Formally, a function f is continuous at a point 𝑎 if, for every 𝜖>0, there exists a 𝛿> such that

This means the image of the open ball of radius 𝛿 around 𝑎 is contained within the open ball of radius ϵ around 𝑓(𝑎), ensuring that small changes in the domain near a lead to small changes in the image at f(a).

A function f is continuous on its entire domain A if it is continuous at every point within A. However, in the example below, f is not continuous at the point 𝑎.

Sequential Continuity

A function 𝑇 is sequentially continuous at a point X if, for any sequence (𝑥𝑛) converging to , the sequence 𝑇(𝑥𝑛) converges to 𝑇().

In metric spaces, continuity and sequential continuity are equivalent.


The concept of finiteness can be tricky when dealing with spaces containing infinite-dimensional elements, like functions or infinite sequences. Compactness generalizes the notion of a set being ‘closed and bounded’ in Euclidean space to such spaces. (In Euclidean space, a set is closed if it contains all its limit points, and bounded if it can be contained within a ball of finite radius.) Even if a set contains infinite-dimensional elements, compactness gives it a certain ‘finiteness’ property.

Compactness is a crucial concept in many areas of mathematics for several reasons. Firstly, it often simplifies proofs and arguments by reducing infinite scenarios to finite ones, making complex problems more manageable. Secondly, it ensures that a continuous function on a compact set always has a highest and lowest value, which is a key idea in the Extreme Value Theorem. Lastly, every sequence in compact spaces possesses a subsequence that converges. This property is important in analysis, as it guarantees the existence of limits in a wide range of scenarios.

A topological space 𝑋 is called compact if every open cover of 𝑋 has a finite subcover. Let’s define open cover and finite subcover:

Open Cover and Finite Subcover (Image by Author)

A finite subcover is a smaller collection of open sets, selected from an initial open cover, that still covers the entire set. Compactness is a topological property that ensures that, for any open cover of a set, there always exists a finite subcover. In other words, no matter how you try to cover a compact space with open sets, you can always find a finite number of them that do the job. In retrospect, problems can be analyzed locally using a finite number of open sets, and the results can then be aggregated.

Topological Space


A topological space is a very general type of mathematical space that provides a framework for defining concepts like convergence, continuity, and compactness. It formally defines the notion of neighborhoods around points within a set, serving as a fundamental foundation for more advanced mathematical theories. It establishes essential but basic structures, which on their own have limited practical utility. Typically, additional structure and refinement are necessary to tailor a space for practical applications.

Unlike metric spaces, which rely on a distance function to define closeness, topological spaces are built upon the concept of open sets. This means that topological spaces don’t possess a notion of distance between points, offering a less structured framework than metric spaces. Instead, they focus on the concept of neighboring points.

Topological spaces consists of two main components:

  1. A set of points X: This can be any collection of objects, such as numbers, shapes, or even more abstract entities.
  2. A topology τ: This is a collection of subsets of the set of points, called open sets that satisfy certain properties.

The open sets 𝜏 must satisfy the following axioms:

  1. The empty set ∅ and the entire set X itself are included in τ. This axiom ensures that there are at least two open sets in any topology, the minimum needed.
  2. The union of any collection of open sets in a topological space 𝜏 also belongs to 𝜏, meaning the resulting union is itself an open set.
  3. The intersection of any finite number of open sets in a topological space 𝜏 also belongs to 𝜏, meaning the resulting intersection is itself an open set.

A subset A of X is closed if and only if its complement, Aᶜ = X \ A, is open.

For a given set 𝑋={1,2,3,4}, the topology on 𝑋 can range from the simplest to the most complex depending on the number of subsets included as open sets. The simplest possible topology on any set is the trivial topology. For the set 𝑋, this topology would include only the minimal required subsets:

The most complex topology on any set is the discrete topology, where every possible subset of 𝑋 is considered an open set:

Many real-world problems involve spaces where only certain kinds of subsets are relevant or meaningful for analysis. Intermediate topologies strike a balance between too little structure (trivial topology) and too much granularity (discrete topology), making them particularly suited for detailed yet manageable analysis in both theoretical and applied mathematics.

Given a topological space X and a point p in X, a neighborhood of p is a subset V of X that contains an open set U such that

Every open set is a neighborhood of each of its points. (Note that V itself is not required to be an open set.)

With the definition of a neighborhood, the definitions for convergence, continuity, and compactness in a topological space are as follows:

Topological Isomorphism

Topology focuses on the idea of neighborhoods. While metric spaces include the concept of distance, topological spaces, being more general and abstract, do not. In topology, a teacup and a donut are considered homeomorphic, meaning they are topologically equivalent. The two shapes can be continuously deformed into each other without cutting or gluing. We can gradually deform the cup, widening its handle to form the donut’s ring. This deformation, while changing the distance between points, preserves the essential neighboring relationships that are the focus of topology. However, a teacup cannot be transformed into a bowl, as this would require punching a hole and disrupting the established neighboring relationships.

Topological Isomorphism (Image by Author)

A topological isomorphism, also known as a homeomorphism, is a continuous function between two topological spaces that preserves the topological structure. It’s a bijection, meaning it’s both one-to-one (injective) and onto (surjective), and both the function and its inverse are also continuous. If such a function exists, the two spaces are said to be homeomorphic, or topologically equivalent.

The concept of homeomorphism is fundamental in topology because it allows mathematicians to classify and study spaces based on their intrinsic topological properties rather than their specific geometric shapes. This abstraction helps in understanding and solving complex problems across different areas of mathematics and science.

Bases of Open Sets

In topology, open sets are fundamental to understanding the structure of topological spaces. However, explicitly defining all open sets can be cumbersome. The concept of a basis offers a solution. A basis for a topological space is a smaller collection of open sets with a special property: every open set in the topology can be formed by taking unions of sets from the basis. Essentially, the basis acts as a set of building blocks from which all other open sets can be constructed. A basis for a topology is a collection of open sets that can be used to generate all other open sets in the space.

Example: Standard Topology on the Real Line ℝ

The standard topology on the real line ℝ is the topology generated by the collection of all open intervals in the real line. It is generated by a basis consisting of all open intervals (a, b) where a<b and 𝑎,𝑏∈ℝ. This means that any open set in this topology can be formed by taking the union of (possibly infinitely many) open intervals.