Multiple factor analysis

Data

Multiple factor analysis (MFA) is meant to be used when you have groups of variables. In practice, it builds a PCA on each group. It then fits a global PCA on the results of the so-called partial PCAs.

The dataset used in the following example comes from this paper. In the dataset, three experts give their opinion on six different wines. Each expert rates the wines on a few aspects, and each aspect is recorded as a variable. We thus want to consider the separate opinions of each expert whilst also having a global overview of each wine. MFA is well suited to this kind of situation.

import prince 

dataset = prince.datasets.load_burgundy_wines()
dataset

expert  Oak type  Expert 1               Expert 2                             Expert 3
aspect            Fruity  Woody  Coffee  Red fruit  Roasted  Vanillin  Woody  Fruity  Butter  Woody
Wine 1  1         1       6      7       2          5        7         6      3       6       7
Wine 2  2         5       3      2       4          4        4         2      4       4       3
Wine 3  2         6       1      1       5          2        1         1      7       1       1
Wine 4  2         7       1      2       7          2        1         2      2       2       2
Wine 5  1         2       5      4       3          5        6         5      2       6       6
Wine 6  1         3       4      4       3          5        4         5      1       7       5

Fitting

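The groups are passed to the fit method, either as a dictionary or, as in this example, as a list of top-level column names. The Oak type column is left out because it is not an expert's rating.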

groups = dataset.columns.levels[0].drop("Oak type").tolist()
groups

['Expert 1', 'Expert 2', 'Expert 3']

mfa = prince.MFA(
    n_components=2,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
mfa = mfa.fit(dataset, groups=groups)
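
To make the grouping concrete, here is a minimal sketch, in plain Python, of how group names can be expanded into a mapping from each group to its member columns. The `columns` list mirrors the column labels of the wine dataset above. The `build_groups` helper and the dictionary layout are illustrative assumptions, not part of Prince's API, and the second-level label of the Oak type column is not shown above, so an empty string stands in for it.

```python
# Column labels of the wine dataset as (group, aspect) pairs.
# Assumption: the aspect label of "Oak type" is not shown, so "" is a placeholder.
columns = [
    ("Oak type", ""),
    ("Expert 1", "Fruity"), ("Expert 1", "Woody"), ("Expert 1", "Coffee"),
    ("Expert 2", "Red fruit"), ("Expert 2", "Roasted"),
    ("Expert 2", "Vanillin"), ("Expert 2", "Woody"),
    ("Expert 3", "Fruity"), ("Expert 3", "Butter"), ("Expert 3", "Woody"),
]


def build_groups(columns, exclude=()):
    """Hypothetical helper: map each group name to its (group, aspect) columns."""
    groups = {}
    for group, aspect in columns:
        if group in exclude:
            continue  # skip columns that should not take part in the analysis
        groups.setdefault(group, []).append((group, aspect))
    return groups


groups = build_groups(columns, exclude={"Oak type"})
for name, members in groups.items():
    print(name, len(members))
```

Each expert ends up with their own block of variables, three for Expert 1 and Expert 3 and four for Expert 2, which is the structure MFA works with.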

Eigenvalues

mfa.eigenvalues_summary

component  eigenvalue  % of variance  % of variance (cumulative)
0          2.835       88.82%         88.82%
1          0.357       11.18%         100.00%
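
The percentages in this summary can be recomputed from the eigenvalues. The sketch below assumes that the total inertia equals the sum of the two eigenvalues shown, which is consistent with the cumulative figure reaching 100%. The variable names are illustrative.

```python
# Eigenvalues copied from the summary above.
eigenvalues = [2.835, 0.357]

# Assumption: the two components account for all of the inertia.
total_inertia = sum(eigenvalues)

running = 0.0
summary = []
for component, eigenvalue in enumerate(eigenvalues):
    running += eigenvalue
    share = 100 * eigenvalue / total_inertia      # % of variance
    cumulative = 100 * running / total_inertia    # % of variance (cumulative)
    summary.append((component, round(share, 2), round(cumulative, 2)))
    print(f"{component}  {eigenvalue:.3f}  {share:.2f}%  {cumulative:.2f}%")
```

Under that assumption the shares come out as 88.82% and 11.18%, matching the summary.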

Coordinates

The MFA inherits from the PCA class, which means it provides access to the PCA methods and properties. For instance, the row_coordinates method will return the global coordinates of each wine.

mfa.row_coordinates(dataset)

component          0          1
Wine 1     -2.172155  -0.508596
Wine 2      0.557017  -0.197408
Wine 3      2.317663  -0.830259
Wine 4      1.832557   0.905046
Wine 5     -1.403787   0.054977
Wine 6     -1.131296   0.576241

However, the other PCA methods are not implemented yet and will raise a NotImplementedError if you call them. MFA does provide group_row_coordinates, which returns the coordinates of each wine for each group.

mfa.group_row_coordinates(dataset)

group      Expert 1              Expert 2              Expert 3
component          0          1          0          1          0          1
Wine 1     -2.764432  -1.104812  -2.213928  -0.863519  -1.538106   0.442545
Wine 2      0.773034   0.298919   0.284247  -0.132135   0.613771  -0.759009
Wine 3      1.991398   0.805893   2.111508   0.499718   2.850084  -3.796390
Wine 4      1.981456   0.927187   2.393009   1.227146   1.123206   0.560803
Wine 5     -1.292834  -0.620661  -1.492114  -0.488088  -1.426414   1.273679
Wine 6     -0.688623  -0.306527  -1.082723  -0.243122  -1.622541   2.278372
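
One way to read these per-group coordinates is that, in this output, each wine's global coordinate is the average of its three per-group coordinates. The sketch below checks this in plain Python against the values printed above. The variable names are illustrative, and the small tolerance allows for rounding in the displayed figures.

```python
# Global row coordinates, copied from the row_coordinates output above.
global_coords = {
    "Wine 1": (-2.172155, -0.508596),
    "Wine 2": (0.557017, -0.197408),
    "Wine 3": (2.317663, -0.830259),
    "Wine 4": (1.832557, 0.905046),
    "Wine 5": (-1.403787, 0.054977),
    "Wine 6": (-1.131296, 0.576241),
}

# Per-group coordinates (Expert 1, Expert 2, Expert 3), copied from the
# group_row_coordinates output above.
partial_coords = {
    "Wine 1": [(-2.764432, -1.104812), (-2.213928, -0.863519), (-1.538106, 0.442545)],
    "Wine 2": [(0.773034, 0.298919), (0.284247, -0.132135), (0.613771, -0.759009)],
    "Wine 3": [(1.991398, 0.805893), (2.111508, 0.499718), (2.850084, -3.796390)],
    "Wine 4": [(1.981456, 0.927187), (2.393009, 1.227146), (1.123206, 0.560803)],
    "Wine 5": [(-1.292834, -0.620661), (-1.492114, -0.488088), (-1.426414, 1.273679)],
    "Wine 6": [(-0.688623, -0.306527), (-1.082723, -0.243122), (-1.622541, 2.278372)],
}

# Compare the average over experts with the global coordinate, per component.
max_gap = 0.0
for wine, partials in partial_coords.items():
    for component in range(2):
        mean = sum(p[component] for p in partials) / len(partials)
        max_gap = max(max_gap, abs(mean - global_coords[wine][component]))

print(f"largest gap between averaged and global coordinates: {max_gap:.1e}")
```

The gap stays within the rounding of the printed values, so a wine whose per-group points are far apart, such as Wine 3 on the second component, is one the experts disagree about.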

Visualization

mfa.plot(
    dataset,
    x_component=0,
    y_component=1
)

The first axis explains most of the difference between the wine ratings. This difference is actually due to the oak type of the barrels they were fermented in.
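
The link with oak type can be checked from the numbers already shown. The sketch below groups the first-axis coordinates by oak type in plain Python. The values are copied from the tables above and the variable names are illustrative.

```python
# Oak type per wine, read from the dataset table above.
oak_type = {
    "Wine 1": 1, "Wine 2": 2, "Wine 3": 2,
    "Wine 4": 2, "Wine 5": 1, "Wine 6": 1,
}

# First-component coordinates, read from the row_coordinates output above.
first_axis = {
    "Wine 1": -2.172155, "Wine 2": 0.557017, "Wine 3": 2.317663,
    "Wine 4": 1.832557, "Wine 5": -1.403787, "Wine 6": -1.131296,
}

# Collect the coordinates of the wines sharing the same oak type.
by_oak = {}
for wine, coordinate in first_axis.items():
    by_oak.setdefault(oak_type[wine], []).append(coordinate)

for oak, coordinates in sorted(by_oak.items()):
    mean = sum(coordinates) / len(coordinates)
    print(f"oak type {oak}: mean first-axis coordinate = {mean:.3f}")
```

All wines with oak type 1 fall on the negative side of the first axis and all wines with oak type 2 on the positive side, so the axis separates the two oak types cleanly.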

Partial PCAs

An MFA is essentially a PCA applied to the outputs of partial PCAs: a PCA is first fitted to each group. The data of a group and its partial PCA can be accessed like so:

dataset['Expert 1']

aspect  Fruity  Woody  Coffee
Wine 1  1       6      7
Wine 2  5       3      2
Wine 3  6       1      1
Wine 4  7       1      2
Wine 5  2       5      4
Wine 6  3       4      4

mfa['Expert 1'].eigenvalues_summary

component  eigenvalue  % of variance  % of variance (cumulative)
0          2.863       95.42%         95.42%
1          0.120       3.99%          99.41%
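
The two-step procedure described in this section can be sketched with NumPy alone. This is a minimal sketch of the textbook formulation of MFA, in which each group is standardised, divided by the first singular value of its partial PCA, and then concatenated for a global PCA. Whether Prince follows exactly these steps internally is an assumption, `mfa_sketch` is a hypothetical name, and the scaling and signs of the coordinates may differ from Prince's output. The ratings are those of the wine dataset above.

```python
import numpy as np

# Ratings per expert (rows are Wine 1 to Wine 6), copied from the dataset above.
expert_1 = [[1, 6, 7], [5, 3, 2], [6, 1, 1], [7, 1, 2], [2, 5, 4], [3, 4, 4]]
expert_2 = [[2, 5, 7, 6], [4, 4, 4, 2], [5, 2, 1, 1],
            [7, 2, 1, 2], [3, 5, 6, 5], [3, 5, 4, 5]]
expert_3 = [[3, 6, 7], [4, 4, 3], [7, 1, 1], [2, 2, 2], [2, 6, 6], [1, 7, 5]]


def mfa_sketch(blocks, n_components=2):
    """Hypothetical helper: partial PCA weights followed by a global PCA."""
    weighted = []
    for block in blocks:
        X = np.asarray(block, dtype=float)
        # Standardise each variable of the group.
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        # Partial PCA via SVD: keep only the first singular value.
        first_singular_value = np.linalg.svd(Z, compute_uv=False)[0]
        # Reweight so that no single group dominates the global analysis.
        weighted.append(Z / first_singular_value)
    # Global PCA on the concatenated, reweighted groups.
    Z_global = np.hstack(weighted)
    U, s, _ = np.linalg.svd(Z_global, full_matrices=False)
    eigenvalues = s[:n_components] ** 2
    row_coordinates = U[:, :n_components] * s[:n_components]
    return eigenvalues, row_coordinates


eigenvalues, row_coordinates = mfa_sketch([expert_1, expert_2, expert_3])
print(eigenvalues)
print(row_coordinates)
```

With this weighting every group has a first eigenvalue of 1, so the first global eigenvalue must lie between 1 and the number of groups, here 3. A value close to 3, such as the 2.835 reported earlier, indicates that the experts largely agree on the main contrast between the wines.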