Simple Geometry Initiation


This is an easy-to-follow intro to hyperbolic geometry.

If you know nothing about it, this is for you! 🙂

The only parts that may require basic differential geometry are marked like this:

Some explanations may require knowledge of tangent spaces, metric tensors, geodesics, the exponential map, etc.


What’s a geometry?


It is commonly defined by a distance function.

Consider a set \mathcal{X}. A distance function d maps two points x,y\in\mathcal X to a non-negative d(x,y), such that it satisfies the following properties:

  1.  Symmetry: d(x,y)=d(y,x),
  2.  Separation: d(x,y)=0 if and only if x=y,
  3. Triangle inequality: d(x,y)\leq d(x,z)+d(z,y),

for all x,y,z\in\mathcal X.

Figure 1: Illustration of the triangle inequality.


There are many examples of geometries:

  1. Euclidean geometry: \mathcal X = \mathbb{R}^n and  d(x,y)=\parallel x-y\parallel_2=\sqrt{\sum_{i=1}^n(x_i-y_i)^2}.
  2. Connected graph geometry: \mathcal X is the set of vertices and d(x,y) is the length of a shortest-path, with potentially weighted edges.
  3. Hamming geometry: \mathcal X is a set of strings of equal length and d(x,y) is the number of positions at which the corresponding symbols are different.
Figure 2: In this connected graph, the graph distance matches the Hamming distance.
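As a quick sanity check, here is a small Python sketch (the helper name `hamming` is mine, not from the post) verifying that the Hamming distance satisfies the three axioms on all binary strings of length 3:

```python
from itertools import product

def hamming(x: str, y: str) -> int:
    """Number of positions at which the corresponding symbols differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

# Exhaustively check the three metric axioms on binary strings of length 3.
strings = ["".join(bits) for bits in product("01", repeat=3)]
for x in strings:
    for y in strings:
        assert hamming(x, y) == hamming(y, x)        # symmetry
        assert (hamming(x, y) == 0) == (x == y)      # separation
        for z in strings:
            # triangle inequality
            assert hamming(x, y) <= hamming(x, z) + hamming(z, y)
```

The same exhaustive check works for any finite metric space, e.g. the vertices of a connected graph with shortest-path distances.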


A set \mathcal X together with a distance function d is called a metric space.


We consider the geometries defined by two metric spaces (\mathcal X,d) and (\mathcal X',d') to be equivalent if they are isometric.

We say that (\mathcal X,d) and (\mathcal X',d') are isometric if there exists a one-to-one mapping f from \mathcal X onto \mathcal X' preserving all distances:

\forall x,y\in\mathcal X,\ d(x,y)=d'(f(x),f(y)),

and f is called an isometry. This definition is quite intuitive.


The hyperbolic space is a particular case of metric space with many interesting properties that can be very powerful for data representations in machine learning.


What’s hyperbolic geometry?


In differential geometry, the spaces that people study are called manifolds, a sort of high-dimensional generalization of curved surfaces.

Figure 3: Example of a manifold.


Each point of such a manifold can be assigned a curvature. When the curvature is constant, it is either everywhere positive, zero or negative. This gives rise to three types of geometry: elliptic, Euclidean and hyperbolic respectively, usually considered as the three model geometries of constant curvature.

  • Elliptic geometry (positive curvature) is found on spheres.
    1. The sum of angles in a triangle is always greater than 180°.
    2. Two lines orthogonal to another one must intersect.
Figure 4: Tessellation of a sphere with triangles.
  • Euclidean geometry (zero curvature) is well known.
    1. The sum of angles in a triangle is always exactly 180°.
    2. In a plane, given a line and a point not on it, at most one line parallel to the given line can be drawn through the point.
  • Hyperbolic geometry (negative curvature) is less intuitive.
    1. The sum of angles in a triangle is always less than 180°.
    2. For any given line and point not on the line, in the plane containing both the line and the point, there are at least two distinct lines through the point that do not intersect the line.
Figure 5: Triangle drawn on a surface of negative curvature.



There exist five models of hyperbolic space, i.e. ways to represent it and work with it. The most commonly used in machine learning are the Poincaré (disk) model and the Lorentz (hyperboloid) model.


Their sets and distance functions are defined in the table below.


Model | Set | Distance function
Poincaré | \mathbb{D}^n=\lbrace x\in\mathbb{R}^n:\ \parallel x\parallel_2<1\rbrace | d(x,y)=\cosh^{-1}\left(1+\frac{1}{2}\lambda_x\lambda_y\parallel x-y\parallel_2^2\right)
Lorentz | \mathbb{H}^n=\lbrace x\in\mathbb{R}^{n+1}:\ \langle x,x\rangle_{\mathcal{L}}=-1,\ x_0>0\rbrace | d(x,y)=\cosh^{-1}\left(-\langle x,y\rangle_{\mathcal{L}}\right)


Where \langle x, y\rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^n x_i y_i is called the Lorentz inner-product, and \lambda_x = 2/(1-\parallel x\parallel_2^2) is called the conformal factor. These are also useful to define the tangent spaces and metric tensors of the two models:

  • In the n-dimensional Poincaré model, for any x\in\mathbb{D}^n, the tangent space at x is given by

T_x \mathbb{D}^n=\mathbb{R}^n,

and the metric tensor, for u,v\in T_x\mathbb{D}^n, by

g_x(u,v)=\lambda_x^2 \langle u,v\rangle.

Note that the Poincaré model is conformal, i.e. defines the same angles as in Euclidean space, since:

\frac{g_x(u,v)}{\sqrt{g_x(u,u)}\sqrt{g_x(v,v)}}=\frac{\langle u,v\rangle}{\parallel u\parallel \parallel v\parallel},

for all u,v\in T_x\mathbb{D}^n\setminus\lbrace 0\rbrace.

  • In the n-dimensional Lorentz model, for any x\in\mathbb{H}^n, the tangent space at x is given by

T_x \mathbb{H}^n = \lbrace v\in\mathbb{R}^{n+1},\ \langle x,v\rangle_{\mathcal{L}}\ =\ 0\rbrace,

and the metric tensor, for u,v\in T_x \mathbb{H}^n, by

g_x(u,v)=\langle u,v\rangle_{\mathcal{L}}.
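The conformality of the Poincaré model can be checked numerically. In the angle formula, the conformal factor \lambda_x cancels out, so hyperbolic and Euclidean angles agree at every point of the disk. A minimal sketch (function names are mine, chosen for illustration):

```python
import math

def metric_angle(x, u, v):
    """Angle between tangent vectors u, v at x in the Poincaré disk,
    measured with the hyperbolic metric g_x(u, v) = lambda_x^2 <u, v>."""
    lam = 2.0 / (1.0 - sum(c * c for c in x))  # conformal factor
    g = lambda a, b: lam ** 2 * sum(p * q for p, q in zip(a, b))
    return math.acos(g(u, v) / math.sqrt(g(u, u) * g(v, v)))

def euclidean_angle(u, v):
    dot = sum(p * q for p, q in zip(u, v))
    norm = lambda a: math.sqrt(sum(p * p for p in a))
    return math.acos(dot / (norm(u) * norm(v)))

# lambda_x cancels in the ratio, so the two angles coincide
# no matter which point x of the disk we measure at.
x, u, v = [0.3, -0.5], [1.0, 0.5], [-0.3, 0.8]
assert abs(metric_angle(x, u, v) - euclidean_angle(u, v)) < 1e-12
```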


These two models are equivalent, i.e. they are isometric. See Figure 6 below.


Figure 6: The half-line emerging from point (0,0,-1) maps the Poincaré disk to the Lorentz hyperboloid, while preserving distances (isometry).


It is hence easy to switch from one model of hyperbolic space to another.
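The switch can be written down explicitly. A point p of the Poincaré disk maps to the hyperboloid point with x_0=(1+\parallel p\parallel^2)/(1-\parallel p\parallel^2) and x_i=2p_i/(1-\parallel p\parallel^2), and the two distance functions then agree. A small sketch checking this numerically (function names are mine):

```python
import math

def poincare_dist(p, q):
    """Distance in the Poincaré model."""
    lam = lambda x: 2.0 / (1.0 - sum(c * c for c in x))  # conformal factor
    sq = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.acosh(1.0 + 0.5 * lam(p) * lam(q) * sq)

def to_lorentz(p):
    """Lift a point of the Poincaré disk D^n to the hyperboloid H^n."""
    r2 = sum(c * c for c in p)
    return [(1.0 + r2) / (1.0 - r2)] + [2.0 * c / (1.0 - r2) for c in p]

def lorentz_dist(x, y):
    """Distance in the Lorentz model: acosh of minus the Lorentz inner-product."""
    inner = -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))
    return math.acosh(-inner)

# The lift preserves distances: the two models are isometric.
p, q = [0.3, 0.4], [-0.2, 0.6]
assert abs(poincare_dist(p, q) - lorentz_dist(to_lorentz(p), to_lorentz(q))) < 1e-9
```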


Now that we have defined hyperbolic spaces, let’s try to understand them. For intuition, the Poincaré model is the friendliest one. The following gif illustrates what “straight lines” look like in the Poincaré disk.


Figure 7: Hyperbolic lines in the Poincaré disk with same hyperbolic length.


As can be seen, around the origin of the disk, these lines look straight, as in Euclidean space.


However, the closer they get to the border, the stronger they bend.


Note that these lines are geodesics, i.e. they are shortest paths between the two points that they connect, and their length in hyperbolic space gives the hyperbolic distance between the two points.


Let’s have a closer look at the Poincaré distance function:

d(x,y)=\cosh^{-1}\left(1+\frac{1}{2}\lambda_x\lambda_y\parallel x-y\parallel_2^2\right)


Remember that for u\geq 0, we have \cosh^{-1}(1+u)=\log(1+u+\sqrt{u^2+2u}), which is equivalent to \sqrt{2u} around 0, and to \log(1+2u) towards \infty.


Further notice how the conformal factor \lambda_x =2/(1\ -\ \parallel x\parallel_2^2) goes to 2 around the origin and grows to \infty when x approaches the border.


If x,y are close to the origin, then \lambda_x\approx\lambda_y\approx 2, so using the equivalence \cosh^{-1}(1+u)\approx\sqrt{2u} for small u:


d(x,y)\approx 2 \parallel x-y\parallel_2


This shows that close to the origin, the hyperbolic space resembles Euclidean geometry.
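This can be verified numerically. A short sketch (function names are mine) checking the approximation near the origin and its breakdown near the border:

```python
import math

def poincare_dist(p, q):
    """Hyperbolic distance in the Poincaré disk model."""
    lam = lambda x: 2.0 / (1.0 - sum(c * c for c in x))  # conformal factor
    sq = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.acosh(1.0 + 0.5 * lam(p) * lam(q) * sq)

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Near the origin, hyperbolic distance is ~ twice the Euclidean one.
p, q = [1e-4, 2e-4], [-3e-4, 1e-4]
assert abs(poincare_dist(p, q) / (2 * euclid(p, q)) - 1.0) < 1e-6

# Near the border, it is much larger than twice the Euclidean one.
p, q = [0.999, 0.0], [0.0, 0.999]
assert poincare_dist(p, q) / (2 * euclid(p, q)) > 2.0
```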


The hyperbolic nature of the space is more prominent at the border of the disk, which corresponds to points at infinity in the Lorentz model, according to Figure 6.


Figure 8: All edges of this graph have same hyperbolic length. The space looks more tree-like and has more capacity at the border.

Moreover, in the Lorentz model \mathbb{H}^n, it is known that



The volume of a ball grows exponentially with the radius!


In comparison, the volume of B(0,r) in \mathbb{R}^n grows only proportionally to r^n.


This exponential volume growth of hyperbolic spaces is reminiscent of trees.


Indeed, a binary tree of depth n has 2^n leaves and 2^{n+1}-1 nodes in total.
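To make the comparison concrete, here is a small sketch (helper names are mine) contrasting the area of a ball in the hyperbolic plane, 2\pi(\cosh r - 1), which grows exponentially with r, against the Euclidean disk area \pi r^2 and the node count of a binary tree:

```python
import math

# Area of a ball of radius r in the hyperbolic plane (curvature -1)
# versus a Euclidean disk of the same radius.
hyp_area = lambda r: 2 * math.pi * (math.cosh(r) - 1)
euc_area = lambda r: math.pi * r * r

for r in [1, 5, 10]:
    # The ratio grows without bound: exponential vs polynomial growth.
    print(r, hyp_area(r) / euc_area(r))

# A binary tree shows the same exponential growth:
# number of nodes within depth r.
tree_nodes = lambda r: 2 ** (r + 1) - 1
assert tree_nodes(3) == 15
```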


This dilation of distances can also be understood intuitively from the negative curvature of the space.


Figure 9: Hyperbolic paraboloid.

Consider the surface in Figure 9. It is negatively curved.


This means that it is stretched in opposite directions.


Now consider two rays of light emerging from the origin, shooting in different directions, but remaining inside the surface.


Because of the negative curvature, they will move apart much faster if their respective directions are orthogonal than if they are nearly co-linear.


As a byproduct, the shortest path between the end points of these two rays of light will pass close to the origin.


This is similar to what the metric would look like in a tree, in which joining two leaves requires passing through the closest parent.


Similarly, one can think of a hyperbolic space as stretching the metric in opposite directions, connecting together the concepts of exponential volume growth, continuous tree-likeness and negative curvature…



What’s better in hyperbolic geometry?


Because hyperbolic spaces possess this exponential volume growth property, they are better suited to embed tree-like graphs, or any kind of data with an underlying hierarchical structure.


In particular, they have been widely used to better visualize large hierarchies (Figure 10, from this paper).


Figure 10: Four different embeddings of the same hierarchy in the Poincaré disk.



Other common applications range from representing phylogenetic trees of DNA sequences in bioinformatics, to efficiently routing information in complex networks (see Papers and Code for more references).


More recently, hyperbolic spaces have been used in machine learning by Nickel & Kiela, to embed word hierarchies, obtaining new state-of-the-art results in word hypernymy detection, i.e. given two words, predicting whether one is a subconcept of the other.


But is hyperbolic geometry only useful for hierarchical data?


No! It turns out that its power applies to any kind of data with an underlying heterogeneous topology.


Indeed, some recent work on the hyperbolic geometry of complex networks explains that the presence of an underlying hyperbolic geometry in a graph is equivalent to the graph possessing a heterogeneous topology: its nodes can be clustered into groups, these groups split into smaller subgroups, and so on, forming an approximate taxonomy of elements.


Notice how general this is, since the tree-structure only needs to be approximate!


Let’s have a simple thought experiment.

Consider a toy language consisting of finite sequences of tokens taken from a vocabulary of size k. The number of sentences of length n is k^n, which grows exponentially with n.

Roughly speaking, if we regard shorter sentences as more generic and longer ones as telling a more specific story, we naturally obtain a graph of entailment relations between sentences, together with a heterogeneous taxonomy of its nodes.


Approximate hierarchical structures are more prominent than they seem to be.


At this stage, you may wonder:


Could we use hyperbolic embeddings in downstream tasks, i.e. as inputs to neural networks?

Could we design neural networks with hidden states in hyperbolic space?


The answer in the next episode… 🙂

