Understanding t-SNE/UMAP

t-SNE and UMAP are great visualisation techniques. However, they have a few pitfalls when it comes to representing the high-level structure of the data. Below are plots of a few known distributions, generated in high dimension and reduced to 2d. Each plot has four columns:

  1. Data generated directly in 2d (leftmost column)
  2. Data generated in 10d, mapped to 2d with PCA
  3. The same 10d data, mapped to 2d with t-SNE
  4. The same 10d data, mapped to 2d with UMAP (rightmost column)
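A minimal sketch of how columns 2 to 4 could be produced from one 10d array, assuming scikit-learn's `PCA` and `TSNE` and umap-learn's `UMAP` with near-default parameters. The helper `reduce_to_2d`, the perplexity and the seed are illustrative assumptions, not the settings behind the plots.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_to_2d(X, perplexity=30.0, seed=0):
    """Map X with shape (n_samples, n_features) to 2d with PCA, t-SNE and, if installed, UMAP."""
    out = {
        "pca": PCA(n_components=2).fit_transform(X),
        # perplexity must be smaller than n_samples; 30 is scikit-learn's default
        "tsne": TSNE(n_components=2, perplexity=perplexity, random_state=seed).fit_transform(X),
    }
    try:
        import umap  # module installed by the umap-learn package
    except ImportError:
        return out  # UMAP column skipped when umap-learn is missing
    out["umap"] = umap.UMAP(n_components=2, random_state=seed).fit_transform(X)
    return out


X10 = np.random.default_rng(0).standard_normal((300, 10))  # stand-in 10d data
embeddings = reduce_to_2d(X10)
```

Plotting each array in `embeddings` next to the 2d version of the same distribution gives the four-column layout; t-SNE and UMAP are stochastic, so fixing the seed only makes a single run repeatable.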

The plots were generated using sklearn, umap-learn, gnuplot and the following script.


10 Random Gaussian clusters of fixed size, fixed number of elements

10 Random Gaussian clusters of random size, fixed number of elements

9 Random Gaussian clusters of fixed size, varying number of elements (2^n)
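The three Gaussian-cluster datasets above could be sampled with a sketch like this one. It reads "size" as the cluster standard deviation; the helper `gaussian_clusters`, the centre box `[-spread, spread]^dim`, the std range `0.2` to `3.0` for random sizes and the exponents 1 to 9 for the 2^n counts are all assumptions.

```python
import numpy as np


def gaussian_clusters(dim, counts, stds, spread=10.0, seed=0):
    """Isotropic Gaussian clusters with centres drawn uniformly from [-spread, spread]^dim.

    counts[k] is the number of points and stds[k] the standard deviation ("size") of cluster k.
    """
    rng = np.random.default_rng(seed)
    centres = rng.uniform(-spread, spread, size=(len(counts), dim))
    X = np.concatenate([
        c + s * rng.standard_normal((m, dim)) for c, m, s in zip(centres, counts, stds)
    ])
    labels = np.repeat(np.arange(len(counts)), counts)
    return X, labels


rng = np.random.default_rng(1)
dim = 10  # use dim = 2 for the leftmost column
# fixed size, fixed number of elements
X_fixed, y_fixed = gaussian_clusters(dim, counts=[100] * 10, stds=[1.0] * 10)
# random size (one std drawn per cluster), fixed number of elements
X_rand, y_rand = gaussian_clusters(dim, counts=[100] * 10, stds=rng.uniform(0.2, 3.0, size=10))
# fixed size, cluster k holds 2**k points for k = 1..9
X_pow2, y_pow2 = gaussian_clusters(dim, counts=[2 ** k for k in range(1, 10)], stds=[1.0] * 9)
```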

Uniform hypercube

Uniform hypersphere surface

Normal distribution
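The hypercube, hypersphere surface and normal datasets need only NumPy. A sketch assuming the unit cube `[0, 1]^d` and the unit sphere; the sphere sample uses the standard trick of normalising isotropic Gaussian vectors, whose directions are uniformly distributed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 1000, 10  # use dim = 2 for the leftmost column

# uniform hypercube: each coordinate independently uniform on [0, 1]
cube = rng.uniform(0.0, 1.0, size=(n, dim))

# uniform on the surface of the unit hypersphere: an isotropic Gaussian
# vector divided by its own norm points in a uniformly random direction
g = rng.standard_normal((n, dim))
sphere = g / np.linalg.norm(g, axis=1, keepdims=True)

# standard normal distribution (zero mean, identity covariance)
normal = rng.standard_normal((n, dim))
```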

20 Random angular lines from the origin

20 Random lines with the same number of elements

20 Random parallel lines
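One way to generate the three line datasets, assuming each "line" is a unit-length segment `origin + t * direction` with `t` uniform on `[0, 1]`, random offsets in `[-1, 1]^d` and 50 points per line. The helpers `points_on_lines` and `unit` are hypothetical, and the exact construction used for the plots is not known.

```python
import numpy as np


def points_on_lines(origins, directions, n_per_line, rng):
    """Sample n_per_line points on each segment origin + t * direction, t ~ U(0, 1)."""
    t = rng.uniform(0.0, 1.0, size=(len(origins), n_per_line, 1))
    pts = origins[:, None, :] + t * directions[:, None, :]  # (n_lines, n_per_line, dim)
    return pts.reshape(-1, origins.shape[1])  # rows grouped line by line


def unit(v):
    """Scale vectors along the last axis to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
n_lines, n_per_line, dim = 20, 50, 10

# lines starting at the origin, each in a random direction
angular = points_on_lines(np.zeros((n_lines, dim)),
                          unit(rng.standard_normal((n_lines, dim))), n_per_line, rng)

# lines with random offsets and random directions, same number of points each
random_lines = points_on_lines(rng.uniform(-1, 1, (n_lines, dim)),
                               unit(rng.standard_normal((n_lines, dim))), n_per_line, rng)

# parallel lines: one shared random direction, random offsets
d = unit(rng.standard_normal(dim))
parallel = points_on_lines(rng.uniform(-1, 1, (n_lines, dim)),
                           np.tile(d, (n_lines, 1)), n_per_line, rng)
```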