
# Kernel

In machine learning (ML), kernel machines utilize linear classifiers, such as SVMs, to solve nonlinear problems. They achieve this by employing a kernel function, which implicitly maps data into a higher-dimensional space, making it linearly separable.

Let X be a non-empty set. A kernel is a function k defined as:

k: X × X → ℝ

Under this definition, a kernel function k: X x X → ℝ measures the similarity between data points x and x’. For k to be a valid kernel, it must satisfy the condition k(x, x’) = ⟨φ(x), φ(x’)⟩, where φ is a mapping from the input space X to a feature space H, and ⟨・ ,・⟩ represents the inner product in the Hilbert space H. This means the kernel value is equivalent to the inner product of the data points’ representations in the feature space.

Unlike many other ML algorithms, which require an explicit transformation of the input data into feature vectors via a feature map, kernel methods compute similarity directly through a user-specified kernel function. Although this kernel implicitly operates in a high-dimensional, potentially infinite-dimensional feature space, the kernel trick lets us avoid explicitly computing the feature representations, bypassing the computational expense of doing so. This implicit operation allows kernel methods to analyze complex data relationships efficiently and supports effective pattern recognition and analysis.
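The kernel trick can be seen in a minimal sketch with the degree-2 polynomial kernel k(x, y) = (x · y)², whose explicit feature map for 2-D inputs is φ(x) = (x₁², x₂², √2·x₁x₂). The vectors below are made up for illustration; the point is that the implicit kernel value matches the explicit feature-space inner product without ever building φ:

```python
import math

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed without phi."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)

explicit = sum(a * b for a, b in zip(phi(x), phi(y)))  # inner product in feature space
implicit = poly_kernel(x, y)                           # kernel trick: no phi needed

print(explicit, implicit)  # both equal (1*3 + 2*(-1))^2 = 1.0
```

For a high-degree polynomial kernel on high-dimensional inputs, the explicit feature vector grows combinatorially while the kernel evaluation stays a single dot product and a power, which is the computational saving described above.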

# Function Space

In functional analysis, f (・) denotes a function itself, while f(x) represents the specific value that function takes at the input x.

Consider a function f that takes an input x, where x is a vector defined as (x₁, x₂) ∈ ℝ². As a running example, take

f(x) = a·x₁ + b·x₂ + c·x₁x₂

The function f(・) is an element of a function space mapping ℝ² to ℝ. Because f is fully determined by its three coefficients, we can represent f(・) as a vector in ℝ³ within this function space.

Given the function

f(x) = a·x₁ + b·x₂ + c·x₁x₂,

the linear functional f can be represented as the coefficient vector:

f(・) = (a, b, c)

In certain function spaces, such as those used in machine learning with kernel methods, functions can be written as linear combinations of the features 𝜙(𝑥). With 𝜙(x) = (x₁, x₂, x₁x₂), the evaluation of f at x, 𝑓(𝑥), is an inner product in the feature space:

f(x) = ⟨f(・), 𝜙(x)⟩ = a·x₁ + b·x₂ + c·x₁x₂

For example, f evaluated at x = (−1, 4) equals

f(−1, 4) = ⟨(a, b, c), (−1, 4, −4)⟩ = −a + 4b − 4c
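A short sketch of this evaluation-as-inner-product view, assuming the example form f(x) = a·x₁ + b·x₂ + c·x₁x₂ with hypothetical coefficients (a, b, c) = (2, 1, 3) chosen purely for illustration:

```python
def phi(x):
    """Feature map phi(x) = (x1, x2, x1*x2) (the assumed example form)."""
    x1, x2 = x
    return (x1, x2, x1 * x2)

# Represent f(.) = a*x1 + b*x2 + c*x1*x2 by its coefficient vector (a, b, c).
f_repr = (2.0, 1.0, 3.0)  # hypothetical coefficients a=2, b=1, c=3

def evaluate(f_vec, x):
    """Evaluate f at x as the inner product <f(.), phi(x)>."""
    return sum(w * feat for w, feat in zip(f_vec, phi(x)))

x = (-1.0, 4.0)
print(evaluate(f_repr, x))  # 2*(-1) + 1*4 + 3*(-1*4) = -10.0
```

The same `evaluate` function works for any coefficient vector: the function f lives in the space as a vector, and evaluation at a point is a dot product with that point's feature representation.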

# Feature Space in Infinite Dimension

This concept naturally extends to infinite-dimensional feature spaces. For instance, we can expand the exponential function using its Taylor series representation:

eˣ = Σₙ₌₀^∞ xⁿ/n! = 1 + x + x²/2! + x³/3! + …

and e³ becomes

e³ = 1 + 3 + 3²/2! + 3³/3! + … ≈ 20.0855

Here, the feature space is infinite-dimensional.
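A quick numerical sketch of the Taylor expansion above: partial sums of Σ xⁿ/n! converge rapidly to eˣ, so finitely many terms already recover e³ to machine precision.

```python
import math

def exp_partial(x, n_terms):
    """Partial sum of the Taylor series e^x = sum_{n>=0} x^n / n!."""
    return sum(x ** n / math.factorial(n) for n in range(n_terms))

approx = exp_partial(3.0, 30)       # 30 terms of the infinite expansion
print(approx, math.exp(3.0))        # both ~20.0855
```

Each term xⁿ/n! plays the role of one coordinate of the (infinite) feature vector; truncating the series corresponds to projecting onto a finite-dimensional subspace.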

# Evaluation Functional

Functionals map vectors to scalars. An evaluation functional Lₓ at x ∈ X is defined as the map:

Lₓ: f ↦ f(x)

This definition broadens our perspective: Lₓ is a functional parameterized by x that takes a function f as input and produces a real number f(x) ∈ ℝ.

A linear functional is a functional that preserves the operations of vector addition and scalar multiplication:

L(αf + βg) = αL(f) + βL(g)
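A minimal sketch checking that an evaluation functional is linear, using two arbitrary example functions f and g and made-up scalars α, β:

```python
def L(x):
    """Evaluation functional L_x: takes a function f, returns f(x)."""
    return lambda f: f(x)

f = lambda t: t ** 2        # example function
g = lambda t: 3 * t + 1     # example function
alpha, beta = 2.0, -1.0     # arbitrary scalars

h = lambda t: alpha * f(t) + beta * g(t)  # the function alpha*f + beta*g

Lx = L(5.0)
print(Lx(h), alpha * Lx(f) + beta * Lx(g))  # both 34.0
```

Evaluating the combined function αf + βg at x gives the same number as combining the individual evaluations, which is exactly the linearity property stated above.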

# Dual Space

The dual space of a vector space V is the set of all linear functionals on V.

Consider the familiar 3D space, denoted as ℝ³, where each point is represented by a vector with three components (x, y, z). This is our original vector space V. A linear functional on ℝ³ is a linear map that takes a 3D vector as input and produces a single real number as output. For example, a linear functional could be defined as: f(x, y, z) = 2x + 3y + 4z. The dual space of ℝ³, denoted as V*, is the set of all possible linear functionals on ℝ³. Each functional in this dual space can be uniquely represented by a 3D vector. For example, the functional f(x, y, z) = 2x + 3y + 4z can be represented by the vector (2, 3, 4).
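A small sketch of this dual-space example: the functional f(x, y, z) = 2x + 3y + 4z and a dot product with its representative vector (2, 3, 4) give the same output on any input (the test vector below is arbitrary):

```python
def functional(x, y, z):
    """The linear functional f(x, y, z) = 2x + 3y + 4z from the example."""
    return 2 * x + 3 * y + 4 * z

rep = (2.0, 3.0, 4.0)  # its representative vector in R^3

v = (1.0, -2.0, 0.5)   # an arbitrary input vector
dot = sum(a * b for a, b in zip(rep, v))
print(functional(*v), dot)  # both 2 - 6 + 2 = -2.0
```

This identification of a functional with a fixed vector, via an inner product, is precisely what the Riesz Representation Theorem in the next section generalizes to Hilbert spaces.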

# Riesz Representation Theorem

The Riesz Representation Theorem states that every continuous linear functional L on a Hilbert space F can be represented as an inner product with a unique fixed element of F. Formally, for any continuous linear functional 𝐿 on a Hilbert space F, there exists a unique g ∈ F such that

L(f) = ⟨f, g⟩_F for all f ∈ F

The Riesz Representation Theorem bridges the abstract world of linear functionals with the more familiar concept of inner products, allowing us to understand and manipulate linear functionals using the geometric intuition of inner products. In essence, the Riesz Representation Theorem states that for every continuous linear functional on a Hilbert space H, there exists a unique element within that same space H that fully characterizes the functional. This means the linear functional can be thought of as an inner product with a fixed “representative” vector. This establishes a fundamental connection between linear functionals and inner products in Hilbert spaces. In the context of Reproducing Kernel Hilbert Spaces (RKHS), this concept is extended to show that the evaluation of functions can be computed using inner products with kernel functions.

# Reproducing Kernel

Combining the evaluation functional with the Riesz Representation Theorem, we obtain: for each x ∈ X, there exists a unique element kₓ ∈ F such that

Lₓ(f) = f(x) = ⟨f, kₓ⟩_F

Any function f ∈ F evaluated at x can be represented as an inner product between f and a unique element kₓ ∈ F, where kₓ can be written as:

kₓ = k(・, x)

Replacing x with y, we get:

f(y) = ⟨f, k(・, y)⟩_F

Since kₓ ∈ F, we can replace f ∈ F with kₓ, and we get:

kₓ(y) = ⟨kₓ, k_y⟩_F, that is, k(y, x) = ⟨k(・, x), k(・, y)⟩_F

This essentially brings us back to the idea of a kernel.

In summary, the evaluation functional Lₓ can be written with k(・, x) as:

Lₓ(f) = f(x) = ⟨f, k(・, x)⟩_H

It can be computed as the inner product of f with a kernel function in the Hilbert space 𝐻. The function 𝑘(⋅, 𝑥), also written kₓ, is called the reproducing kernel.

Hence, choosing the feature map 𝜙(x) = k(・, x), we recover:

k(x, y) = ⟨k(・, x), k(・, y)⟩_H = ⟨𝜙(x), 𝜙(y)⟩_H

Here is the notation used, all referring to the same object:

kₓ = k(・, x) = 𝜙(x)
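The reproducing property can be sketched numerically. Take a function in the RKHS of a Gaussian kernel, f = Σᵢ αᵢ k(・, xᵢ), with hypothetical coefficients and centers chosen for illustration. Using the inner product between kernel expansions, ⟨f, g⟩_H = ΣᵢΣⱼ αᵢβⱼ k(xᵢ, yⱼ), the inner product of f with k(・, x) reproduces the value f(x):

```python
import math

def k(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

# A function in the RKHS: f = sum_i alpha_i * k(., x_i), with made-up
# coefficients and centers.
alphas = [1.0, -0.5, 2.0]
centers = [0.0, 1.0, 2.5]

def f(x):
    """Evaluate f directly at x."""
    return sum(a * k(c, x) for a, c in zip(alphas, centers))

def rkhs_inner(coef_f, pts_f, coef_g, pts_g):
    """<f, g>_H = sum_i sum_j a_i b_j k(x_i, y_j) for kernel expansions f, g."""
    return sum(a * b * k(xi, yj)
               for a, xi in zip(coef_f, pts_f)
               for b, yj in zip(coef_g, pts_g))

x = 0.7
# k(., x) is itself an element of H: a single coefficient 1.0 at center x.
direct = f(x)
via_inner = rkhs_inner(alphas, centers, [1.0], [x])
print(direct, via_inner)  # equal: evaluating f at x IS taking <f, k(., x)>_H
```

The two numbers agree because substituting g = k(・, x) into the expansion inner product collapses to Σᵢ αᵢ k(xᵢ, x), which is exactly the formula for f(x).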

# Reproducing Kernel Hilbert Spaces (RKHS)

A Reproducing Kernel Hilbert Space (RKHS) is a Hilbert space of functions in which evaluation at each point is a continuous linear functional. This means that for any function f in the space and any point 𝑥, there exists a kernel function k such that 𝑓(𝑥) = ⟨𝑓,𝑘(⋅, 𝑥)⟩. The kernel function 𝑘(⋅,𝑥), known as the reproducing kernel, acts as a bridge between the function and the feature space, allowing us to evaluate the function at specific points.

In essence, f(x) can be represented as the inner product of f with the reproducing kernel k(⋅, x). Conceptually, evaluating f at x amounts to taking the inner product of f with the feature-space representation of x.

# Example: Approximating a Function using RKHS

Suppose we have a set of data points and we want to approximate a function that fits these points using the Gaussian kernel:

k(x, x′) = exp(−(x − x′)² / (2σ²))

We model f as f(・) = Σᵢ αᵢ k(・, xᵢ). The final steps involve solving the linear system Kα = y, where K is the Gram matrix with Kᵢⱼ = k(xᵢ, xⱼ) and y the observed values, and then using α to compute f(x) = Σᵢ αᵢ k(x, xᵢ).

So, the approximate value of the function at x = 1.5 is f(1.5) ≈ 3.05.
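The steps above can be sketched end to end. The data points below are hypothetical (the article's original data is not shown), so the fitted values differ from the figure quoted above; the structure of the computation is the same: build the Gram matrix, solve Kα = y, then evaluate f as a kernel expansion.

```python
import math

def k(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * sigma^2))."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    """Solve A @ alpha = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical data points for illustration only.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]

K = [[k(xi, xj) for xj in xs] for xi in xs]  # Gram matrix K_ij = k(x_i, x_j)
alpha = solve(K, ys)                         # fit: solve K alpha = y

def f(x):
    """f(x) = sum_i alpha_i k(x, x_i) -- the fitted function in the RKHS."""
    return sum(a * k(x, xi) for a, xi in zip(alpha, xs))

print([round(f(xi), 6) for xi in xs])  # reproduces the training targets
```

Because the Gaussian Gram matrix on distinct points is positive definite, the system has a unique solution and f interpolates the data exactly; in practice a small ridge term is often added to the diagonal of K for numerical stability.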