Principal geodesic analysis

Resources

Principal Geodesic Analysis for the Study of Nonlinear Statistics of Shape by P. Thomas Fletcher et al. is the original paper introducing PGA.
How to Use Quaternions in Industrial Robotics? by Mecademic is a practical introduction to quaternions for robotics.
Fréchet Mean and Tangent PCA from the geomstats documentation is a hands-on walkthrough of tangent PCA.

What is PGA?

Principal Geodesic Analysis (PGA) generalizes PCA to data that lives on curved spaces (Riemannian manifolds). The classic example is 3D rotation data — unit quaternions lie on a sphere, and naively applying PCA to raw quaternions ignores this geometry.

PGA works by:

Computing the Fréchet mean on the manifold
Log-mapping all data points to the tangent space at the mean
Running PCA in that tangent space
Optionally Exp-mapping results back to the manifold

Use PGA when your data represents orientations, rotations, or other objects on a manifold.

Data

We’ll use a dataset of robot IMU quaternion measurements on different surfaces. Each row is a unit quaternion (qw, qx, qy, qz) in scalar-first convention.

import prince

dataset = prince.datasets.load_rotations()
dataset.head()

	qw	qx	qy	qz	surface
0	0.993069	-0.027934	0.081096	0.080356	concrete
1	0.992538	-0.096088	-0.072483	0.019539	concrete
2	0.997917	0.026264	0.023723	-0.053941	concrete
3	0.989711	0.115557	0.083048	0.014901	concrete
4	0.987688	0.133990	0.059817	-0.054223	concrete

Fitting

The PGA estimator follows the familiar fit/transform API. The rescale_with_mean and rescale_with_std parameters control whether centering and standardization are applied to the PCA in the tangent space. By default, rescale_with_mean is True and rescale_with_std is False.

pga = prince.PGA(
    n_components=3,
    rescale_with_mean=True,
    rescale_with_std=False,
    manifold='SO3',
    input_format='quaternion',
    random_state=42
)
pga = pga.fit(dataset)

The Fréchet mean is the intrinsic average rotation on SO(3):

pga.frechet_mean_

array([0.9992735 , 0.02122122, 0.02001992, 0.02452203])

Eigenvalues

Eigenvalues represent the variance captured by each principal geodesic component.

pga.eigenvalues_summary

	eigenvalue	% of variance	% of variance (cumulative)
component
0	0.044	39.00%	39.00%
1	0.041	36.30%	75.30%
2	0.028	24.70%	100.00%

pga.percentage_of_variance_

array([39.00078371, 36.30360397, 24.69561232])

pga.scree_plot()

Coordinates

The transform method projects data into the tangent space principal components.

pga.transform(dataset).head()

component	0	1	2
0	0.169169	-0.073644	0.056543
1	-0.147218	-0.210429	0.154167
2	-0.077323	0.117207	0.071317
3	0.085360	0.177164	-0.116648
4	-0.029520	0.271355	-0.090446

pga.transform(dataset).equals(pga.row_coordinates(dataset))

True

Visualization

Plot the projections colored by surface type.

pga.plot(
    dataset,
    x_component=0,
    y_component=1,
    color_rows_by='surface',
    show_row_markers=True,
    show_row_labels=False
)

Inverse transformation

With all components, inverse_transform reconstructs the original quaternions.

import numpy as np
from scipy.spatial.transform import Rotation

coords = pga.transform(dataset)
reconstructed = pga.inverse_transform(coords)

# Measure reconstruction error as geodesic distance
original = dataset[['qw', 'qx', 'qy', 'qz']].to_numpy()
recon = reconstructed[['qw', 'qx', 'qy', 'qz']].to_numpy()

r1 = Rotation.from_quat(original[:, [1, 2, 3, 0]])
r2 = Rotation.from_quat(recon[:, [1, 2, 3, 0]])
errors = (r1.inv() * r2).magnitude()

print(f'Mean geodesic error: {errors.mean():.6f} rad')
print(f'Max geodesic error:  {errors.max():.6f} rad')

Mean geodesic error: 0.000000 rad
Max geodesic error:  0.000000 rad

Why not just PCA on raw quaternions?

Naively applying PCA to quaternion columns ignores the manifold structure. The resulting components are not guaranteed to be valid rotations, and distances in quaternion space don’t correspond to geodesic distances on SO(3).

import pandas as pd

# Naive PCA on raw quaternion columns
quat_df = dataset[['qw', 'qx', 'qy', 'qz']]
naive_pca = prince.PCA(n_components=3, random_state=42).fit(quat_df)

print('Naive PCA on quaternions:')
print(naive_pca.eigenvalues_summary)
print()
print('PGA (manifold-aware):')
print(pga.eigenvalues_summary)

Naive PCA on quaternions:
          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              1.533        38.32%                     38.32%
1              1.152        28.80%                     67.12%
2              0.803        20.07%                     87.19%

PGA (manifold-aware):
          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              0.044        39.00%                     39.00%
1              0.041        36.30%                     75.30%
2              0.028        24.70%                    100.00%

# Naive PCA reconstructions are not valid unit quaternions
naive_coords = naive_pca.transform(quat_df)
naive_recon = naive_pca.inverse_transform(naive_coords)
norms = np.linalg.norm(naive_recon.values, axis=1)
print(f'Naive PCA reconstruction quaternion norms: min={norms.min():.4f}, max={norms.max():.4f}')
print('(Should be 1.0 for valid unit quaternions)')

Naive PCA reconstruction quaternion norms: min=0.9779, max=1.0959
(Should be 1.0 for valid unit quaternions)

Prince foo