Principal geodesic analysis
Table of contents
Resources
- Principal Geodesic Analysis for the Study of Nonlinear Statistics of Shape by P. Thomas Fletcher et al. is the original paper introducing PGA.
- How to Use Quaternions in Industrial Robotics? by Mecademic is a practical introduction to quaternions for robotics.
- Fréchet Mean and Tangent PCA from the geomstats documentation is a hands-on walkthrough of tangent PCA.
What is PGA?
Principal Geodesic Analysis (PGA) generalizes PCA to data that lives on curved spaces (Riemannian manifolds). The classic example is 3D rotation data — unit quaternions lie on a sphere, and naively applying PCA to raw quaternions ignores this geometry.
PGA works by:
- Computing the Fréchet mean on the manifold
- Log-mapping all data points to the tangent space at the mean
- Running PCA in that tangent space
- Optionally Exp-mapping results back to the manifold
Use PGA when your data represents orientations, rotations, or other objects on a manifold.
Data
We’ll use a dataset of robot IMU quaternion measurements on different surfaces. Each row is a unit quaternion (qw, qx, qy, qz) in scalar-first convention.
import prince
dataset = prince.datasets.load_rotations()
dataset.head()
| qw | qx | qy | qz | surface | |
|---|---|---|---|---|---|
| 0 | 0.993069 | -0.027934 | 0.081096 | 0.080356 | concrete |
| 1 | 0.992538 | -0.096088 | -0.072483 | 0.019539 | concrete |
| 2 | 0.997917 | 0.026264 | 0.023723 | -0.053941 | concrete |
| 3 | 0.989711 | 0.115557 | 0.083048 | 0.014901 | concrete |
| 4 | 0.987688 | 0.133990 | 0.059817 | -0.054223 | concrete |
Fitting
The PGA estimator follows the familiar fit/transform API.
pga = prince.PGA(
n_components=3,
manifold='SO3',
input_format='quaternion',
random_state=42
)
pga = pga.fit(dataset)
The Fréchet mean is the intrinsic average rotation on SO(3):
pga.frechet_mean_
array([0.9992735 , 0.02122122, 0.02001992, 0.02452203])
Eigenvalues
Eigenvalues represent the variance captured by each principal geodesic component.
pga.eigenvalues_summary
| eigenvalue | % of variance | % of variance (cumulative) | |
|---|---|---|---|
| component | |||
| 0 | 0.044 | 39.00% | 39.00% |
| 1 | 0.041 | 36.30% | 75.30% |
| 2 | 0.028 | 24.70% | 100.00% |
pga.percentage_of_variance_
array([39.00078371, 36.30360397, 24.69561232])
pga.scree_plot()
Coordinates
The transform method projects data into the tangent space principal components.
pga.transform(dataset).head()
| component | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.169169 | -0.073644 | 0.056543 |
| 1 | -0.147218 | -0.210429 | 0.154167 |
| 2 | -0.077323 | 0.117207 | 0.071317 |
| 3 | 0.085360 | 0.177164 | -0.116648 |
| 4 | -0.029520 | 0.271355 | -0.090446 |
pga.transform(dataset).equals(pga.row_coordinates(dataset))
True
Visualization
Plot the projections colored by surface type.
pga.plot(
dataset,
x_component=0,
y_component=1,
color_rows_by='surface',
show_row_markers=True,
show_row_labels=False
)
Inverse transformation
With all components, inverse_transform reconstructs the original quaternions.
import numpy as np
from scipy.spatial.transform import Rotation
coords = pga.transform(dataset)
reconstructed = pga.inverse_transform(coords)
# Measure reconstruction error as geodesic distance
original = dataset[['qw', 'qx', 'qy', 'qz']].to_numpy()
recon = reconstructed[['qw', 'qx', 'qy', 'qz']].to_numpy()
r1 = Rotation.from_quat(original[:, [1, 2, 3, 0]])
r2 = Rotation.from_quat(recon[:, [1, 2, 3, 0]])
errors = (r1.inv() * r2).magnitude()
print(f'Mean geodesic error: {errors.mean():.6f} rad')
print(f'Max geodesic error: {errors.max():.6f} rad')
Mean geodesic error: 0.000000 rad
Max geodesic error: 0.000000 rad
Why not just PCA on raw quaternions?
Naively applying PCA to quaternion columns ignores the manifold structure. The resulting components are not guaranteed to be valid rotations, and distances in quaternion space don’t correspond to geodesic distances on SO(3).
import pandas as pd
# Naive PCA on raw quaternion columns
quat_df = dataset[['qw', 'qx', 'qy', 'qz']]
naive_pca = prince.PCA(n_components=3, random_state=42).fit(quat_df)
print('Naive PCA on quaternions:')
print(naive_pca.eigenvalues_summary)
print()
print('PGA (manifold-aware):')
print(pga.eigenvalues_summary)
Naive PCA on quaternions:
eigenvalue % of variance % of variance (cumulative)
component
0 1.533 38.32% 38.32%
1 1.152 28.80% 67.12%
2 0.803 20.07% 87.19%
PGA (manifold-aware):
eigenvalue % of variance % of variance (cumulative)
component
0 0.044 39.00% 39.00%
1 0.041 36.30% 75.30%
2 0.028 24.70% 100.00%
# Naive PCA reconstructions are not valid unit quaternions
naive_coords = naive_pca.transform(quat_df)
naive_recon = naive_pca.inverse_transform(naive_coords)
norms = np.linalg.norm(naive_recon.values, axis=1)
print(f'Naive PCA reconstruction quaternion norms: min={norms.min():.4f}, max={norms.max():.4f}')
print('(Should be 1.0 for valid unit quaternions)')
Naive PCA reconstruction quaternion norms: min=0.9779, max=1.0959
(Should be 1.0 for valid unit quaternions)
