Visualization of Poincaré Embeddings

M. Nickel and D. Kiela proposed Poincaré Embeddings for learning hierarchical representations (NIPS’17).

One of their tasks consists of embedding words in the Poincaré ball so as to preserve entailment links.

Consider a symbolic dataset of words with directed edges (u, v) between them, each indicating that u is a subconcept of v. The embeddings are learned by optimizing the following loss.

Training loss

\mathcal{L}(\Theta) = \sum_{(u,v)\in\mathcal D}\log\left(\frac{e^{-d(u,v)}}{\sum_{w\in\mathcal{N}(u)} e^{-d(u,w)}} \right)

where
\Theta denotes the word embedding parameters,
\mathcal D denotes the set of observed entailment links (u, v),
\mathcal N(u) := \{w \mid (u, w) \notin \mathcal D\} \cup \{v\} is the set of negative parents of u, to which the observed parent v is added.
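To make the summand concrete, here is a minimal NumPy sketch of the corresponding negative log-likelihood term for a single edge, as it would be minimized in practice. The names `pair_loss`, `theta`, `neg` (a sampled subset of \mathcal N(u)) and `dist` are illustrative, not taken from the original code:

```python
import numpy as np

def pair_loss(theta, u, v, neg, dist):
    """Cross-entropy term for one observed edge (u, v): minimizing it
    pulls u toward its parent v and pushes u away from sampled negatives."""
    d_all = np.array([dist(theta[u], theta[w]) for w in neg])  # w ranges over N(u), v included
    # -log( exp(-d(u,v)) / sum_w exp(-d(u,w)) ) = d(u,v) + log sum_w exp(-d(u,w))
    return dist(theta[u], theta[v]) + np.log(np.exp(-d_all).sum())
```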

In order to better understand the training dynamics induced by this model, we visualize the training in 2D, on the mammal subtree of the WordNet hierarchy.

Hyperbolic case

Using the Poincaré disk distance for d and optimizing the parameters with Riemannian SGD yields the following dynamics for embeddings in dimension 2:
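A minimal sketch of the two hyperbolic ingredients, assuming the closed-form Poincaré distance and the rescaled-gradient update described in the paper; `rsgd_step` and `egrad` (the Euclidean gradient of the loss with respect to one embedding) are illustrative names:

```python
import numpy as np

EPS = 1e-5  # keep points strictly inside the open unit disk

def poincare_dist(x, y):
    """d(x, y) = arcosh(1 + 2||x - y||^2 / ((1 - ||x||^2)(1 - ||y||^2)))"""
    num = 2.0 * np.sum((x - y) ** 2)
    den = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + num / den)

def rsgd_step(theta_i, egrad, lr):
    """One Riemannian SGD step for a single embedding theta_i:
    rescale the Euclidean gradient by the inverse Poincaré metric,
    (1 - ||theta_i||^2)^2 / 4, then project back into the disk."""
    scale = (1.0 - np.sum(theta_i ** 2)) ** 2 / 4.0
    new = theta_i - lr * scale * egrad
    norm = np.linalg.norm(new)
    if norm >= 1.0:
        new = new / norm * (1.0 - EPS)  # retraction onto the ball
    return new
```

The metric rescaling is what makes the optimizer Riemannian: near the boundary of the disk the factor (1 - ||theta||^2)^2 / 4 shrinks, so effective steps become small exactly where hyperbolic distances blow up.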

Euclidean case

In comparison, using the Euclidean distance for d and optimizing the parameters with plain SGD yields the following dynamics, again for embeddings in dimension 2:
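For comparison, the Euclidean variant of the sketch above only swaps the distance and drops both the metric rescaling and the projection:

```python
import numpy as np

def euclidean_dist(x, y):
    return np.linalg.norm(x - y)

def sgd_step(theta_i, egrad, lr):
    # Plain SGD: no metric rescaling and no projection are needed,
    # since Euclidean space is flat and unbounded.
    return theta_i - lr * egrad
```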

In both simulations, initialization included a burn-in phase, i.e. a first stage of training with a reduced learning rate, as suggested by Nickel and Kiela.
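A sketch of how this burn-in could be implemented; the reduction factor c = 10 and the uniform U(-0.001, 0.001) initialization follow the paper's suggestions, while the variable names and the vocabulary size are placeholders:

```python
import numpy as np

def lr_schedule(base_lr, epoch, burn_in_epochs=10, burn_in_factor=10.0):
    """Reduced learning rate during the first epochs (burn-in),
    then the full rate afterwards."""
    return base_lr / burn_in_factor if epoch < burn_in_epochs else base_lr

# Initialization: uniform in a small ball around the origin
rng = np.random.default_rng(0)
num_words = 1000  # placeholder: size of the mammal vocabulary
theta = rng.uniform(-1e-3, 1e-3, size=(num_words, 2))
```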