Prince foo

Principal geodesic analysis

Table of contents

Resources

What is PGA?

Principal Geodesic Analysis (PGA) generalizes PCA to data that lives on curved spaces (Riemannian manifolds). The classic example is 3D rotation data — unit quaternions lie on a sphere, and naively applying PCA to raw quaternions ignores this geometry.

PGA works by:

  1. Computing the Fréchet mean on the manifold
  2. Log-mapping all data points to the tangent space at the mean
  3. Running PCA in that tangent space
  4. Optionally Exp-mapping results back to the manifold

Use PGA when your data represents orientations, rotations, or other objects on a manifold.

Data

We’ll use a dataset of robot IMU quaternion measurements on different surfaces. Each row is a unit quaternion (qw, qx, qy, qz) in scalar-first convention.

import prince

dataset = prince.datasets.load_rotations()
dataset.head()

qwqxqyqzsurface
00.993069-0.0279340.0810960.080356concrete
10.992538-0.096088-0.0724830.019539concrete
20.9979170.0262640.023723-0.053941concrete
30.9897110.1155570.0830480.014901concrete
40.9876880.1339900.059817-0.054223concrete

Fitting

The PGA estimator follows the familiar fit/transform API.

pga = prince.PGA(
    n_components=3,
    manifold='SO3',
    input_format='quaternion',
    random_state=42
)
pga = pga.fit(dataset)

The Fréchet mean is the intrinsic average rotation on SO(3):

pga.frechet_mean_
array([0.9992735 , 0.02122122, 0.02001992, 0.02452203])

Eigenvalues

Eigenvalues represent the variance captured by each principal geodesic component.

pga.eigenvalues_summary

eigenvalue% of variance% of variance (cumulative)
component
00.04439.00%39.00%
10.04136.30%75.30%
20.02824.70%100.00%
pga.percentage_of_variance_
array([39.00078371, 36.30360397, 24.69561232])
pga.scree_plot()

Coordinates

The transform method projects data into the tangent space principal components.

pga.transform(dataset).head()

component012
00.169169-0.0736440.056543
1-0.147218-0.2104290.154167
2-0.0773230.1172070.071317
30.0853600.177164-0.116648
4-0.0295200.271355-0.090446
pga.transform(dataset).equals(pga.row_coordinates(dataset))
True

Visualization

Plot the projections colored by surface type.

pga.plot(
    dataset,
    x_component=0,
    y_component=1,
    color_rows_by='surface',
    show_row_markers=True,
    show_row_labels=False
)

Inverse transformation

With all components, inverse_transform reconstructs the original quaternions.

import numpy as np
from scipy.spatial.transform import Rotation

coords = pga.transform(dataset)
reconstructed = pga.inverse_transform(coords)

# Measure reconstruction error as geodesic distance
original = dataset[['qw', 'qx', 'qy', 'qz']].to_numpy()
recon = reconstructed[['qw', 'qx', 'qy', 'qz']].to_numpy()

r1 = Rotation.from_quat(original[:, [1, 2, 3, 0]])
r2 = Rotation.from_quat(recon[:, [1, 2, 3, 0]])
errors = (r1.inv() * r2).magnitude()

print(f'Mean geodesic error: {errors.mean():.6f} rad')
print(f'Max geodesic error:  {errors.max():.6f} rad')
Mean geodesic error: 0.000000 rad
Max geodesic error:  0.000000 rad

Why not just PCA on raw quaternions?

Naively applying PCA to quaternion columns ignores the manifold structure. The resulting components are not guaranteed to be valid rotations, and distances in quaternion space don’t correspond to geodesic distances on SO(3).

import pandas as pd

# Naive PCA on raw quaternion columns
quat_df = dataset[['qw', 'qx', 'qy', 'qz']]
naive_pca = prince.PCA(n_components=3, random_state=42).fit(quat_df)

print('Naive PCA on quaternions:')
print(naive_pca.eigenvalues_summary)
print()
print('PGA (manifold-aware):')
print(pga.eigenvalues_summary)
Naive PCA on quaternions:
          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              1.533        38.32%                     38.32%
1              1.152        28.80%                     67.12%
2              0.803        20.07%                     87.19%

PGA (manifold-aware):
          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              0.044        39.00%                     39.00%
1              0.041        36.30%                     75.30%
2              0.028        24.70%                    100.00%
# Naive PCA reconstructions are not valid unit quaternions
naive_coords = naive_pca.transform(quat_df)
naive_recon = naive_pca.inverse_transform(naive_coords)
norms = np.linalg.norm(naive_recon.values, axis=1)
print(f'Naive PCA reconstruction quaternion norms: min={norms.min():.4f}, max={norms.max():.4f}')
print('(Should be 1.0 for valid unit quaternions)')
Naive PCA reconstruction quaternion norms: min=0.9779, max=1.0959
(Should be 1.0 for valid unit quaternions)