Multiple factor analysis

Data

Multiple factor analysis (MFA) is meant to be used when your variables are organised into groups. In practice, it fits a PCA on each group; these are the so-called partial PCAs. It then fits a global PCA on the results of the partial PCAs.
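To make the two stages concrete, here is a minimal NumPy sketch of one common textbook formulation of MFA: each group is standardized and divided by its first singular value, and a global PCA is then run on the concatenated result. The data and the group names `a`, `b`, `c` are made up for illustration, and this is not necessarily how Prince implements it internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10 rows described by three groups of variables.
groups = {
    "a": rng.normal(size=(10, 4)),
    "b": rng.normal(size=(10, 3)),
    "c": rng.normal(size=(10, 5)),
}


def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


weighted = []
for X in groups.values():
    Z = standardize(X)
    # Partial PCA: the first singular value measures the group's dominant axis.
    first_singular_value = np.linalg.svd(Z, compute_uv=False)[0]
    # Dividing by it prevents any single group from dominating the global analysis.
    weighted.append(Z / first_singular_value)

# Global PCA: an SVD of the concatenated, weighted groups.
Z_global = np.hstack(weighted)
U, s, Vt = np.linalg.svd(Z_global, full_matrices=False)
eigenvalues = s ** 2
row_coordinates = U * s
```

With this weighting, the first global eigenvalue always lies between 1 and the number of groups, which gives a quick sanity check on the result.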

The dataset used in the following example contains Premier League results for 14 teams over three seasons, from 2021-22 to 2023-24. For each season, a team's record is described by six variables: W (wins), D (draws), L (losses), GF (goals for), GA (goals against), and Pts (points). We thus want to consider each season separately whilst also having a global overview of each team. MFA is a good fit for this kind of situation.

import prince

dataset = prince.datasets.load_premier_league()
dataset

                           2021-22                   2022-23                   2023-24
                           W   D   L  GF  GA Pts     W   D   L  GF  GA Pts     W   D   L  GF  GA Pts
Team
Arsenal                   22   3  13  61  48  69    26   6   6  88  43  84    28   5   5  91  29  89
Aston Villa               13   6  19  52  54  45    18   7  13  51  46  61    20   8  10  76  61  68
Brentford                 13   7  18  48  56  46    15  14   9  58  46  59    10   9  19  56  65  39
Brighton & Hove Albion    12  15  11  42  44  51    18   8  12  72  53  62    12  12  14  55  62  48
Chelsea                   21  11   6  76  33  74    11  11  16  38  47  44    18   9  11  77  63  63
Crystal Palace            11  15  12  50  46  48    11  12  15  40  49  45    13  10  15  57  58  49
Everton                   11   6  21  43  66  39     8  12  18  34  57  36    13   9  16  40  51  40
Liverpool                 28   8   2  94  26  92    19  10   9  75  47  67    24  10   4  86  41  82
Manchester City           29   6   3  99  26  93    28   5   5  94  33  89    28   7   3  96  34  91
Manchester United         16  10  12  57  57  58    23   6   9  58  43  75    18   6  14  57  58  60
Newcastle United          13  10  15  44  62  49    19  14   5  68  33  71    18   6  14  85  62  60
Tottenham Hotspur         22   5  11  69  40  71    18   6  14  70  63  60    20   6  12  74  61  66
West Ham United           16   8  14  60  51  56    11   7  20  42  55  40    14  10  14  60  74  52
Wolverhampton Wanderers   15   6  17  38  43  51    11   8  19  31  58  41    13   7  18  50  65  46

import pandas as pd

isinstance(dataset.columns, pd.MultiIndex)
True
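If your own data comes as one table per group, one way to obtain the same multi-indexed layout is to concatenate the tables column-wise with pandas. The team names, column names, and season labels below are hypothetical; this is only a sketch of the reshaping step, not something Prince requires you to do this way.

```python
import pandas as pd

# Hypothetical per-season tables sharing the same index.
teams = ["Team A", "Team B"]
season_1 = pd.DataFrame({"W": [20, 10], "Pts": [65, 38]}, index=teams)
season_2 = pd.DataFrame({"W": [18, 12], "Pts": [60, 44]}, index=teams)

# The dict keys become the top level of the resulting column MultiIndex.
wide = pd.concat({"season-1": season_1, "season-2": season_2}, axis="columns")

# The group names can then be read off the first column level.
groups = wide.columns.levels[0].tolist()
```

Selecting `wide["season-2"]` then returns the columns of that group only.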

Fitting

The groups are specified by the groups argument when calling fit.

groups = dataset.columns.levels[0].tolist()
groups
['2021-22', '2022-23', '2023-24']
mfa = prince.MFA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
mfa = mfa.fit(
    dataset,
    groups=groups,
    supplementary_groups=None
)

There are several ways to specify the groups:

The supplementary_groups argument is expected to be a list of one or more existing group names.

Eigenvalues

mfa.eigenvalues_summary

           eigenvalue  % of variance  % of variance (cumulative)
component
0               2.376         59.53%                      59.53%
1               0.619         15.51%                      75.04%
2               0.412         10.32%                      85.36%
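The cumulative column is simply the running sum of the "% of variance" column. As a small check, the snippet below copies the three percentages from the summary above and recomputes the cumulative values with NumPy, along with the share of variance left for the components that were not kept.

```python
import numpy as np

# Values copied from the "% of variance" column of the summary.
pct_of_variance = np.array([59.53, 15.51, 10.32])

# Running sum: 59.53, 75.04, 85.36.
cumulative = np.cumsum(pct_of_variance)

# Share of variance not captured by the first three components.
remaining = 100 - cumulative[-1]
```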

Coordinates

The MFA inherits from the PCA class, which means it provides access to the PCA methods and properties. For instance, the row_coordinates method will return the global coordinates of each team.

mfa.row_coordinates(dataset)

component                        0         1         2
Team
Arsenal                   2.236971  1.034584  0.697651
Aston Villa              -0.179988  0.580297  0.463962
Brentford                -1.267447  0.696757 -0.490607
Brighton & Hove Albion   -0.800062 -0.248918 -0.904603
Chelsea                   0.000108 -1.253858 -0.365442
Crystal Palace           -1.325908 -0.410853 -0.809261
Everton                  -2.089219  0.184291  0.552330
Liverpool                 2.063236 -1.170222 -0.419547
Manchester City           3.393773 -0.160572 -0.151160
Manchester United         0.189448  0.753614 -0.007898
Newcastle United         -0.004656  1.462421 -0.872403
Tottenham Hotspur         0.510562 -0.416955  0.992128
West Ham United          -1.186842 -0.756359  0.432273
Wolverhampton Wanderers  -1.539976 -0.294226  0.882576

There is also a partial_row_coordinates method that returns the coordinates projected onto each group.

mfa.partial_row_coordinates(dataset)

                           2021-22                         2022-23                         2023-24
                                 0         1         2           0         1         2           0         1         2
Team
Arsenal                   0.690262 -0.059517  1.417084    2.505624  2.235689 -0.825430    3.515025  0.927579  1.501298
Aston Villa              -1.204890  1.807432  0.898128    0.113710 -0.035078  0.371064    0.551216 -0.031464  0.122694
Brentford                -1.289455  1.825781  0.620325   -0.244223  0.442700 -1.365260   -2.268664 -0.178208 -0.726887
Brighton & Hove Albion   -1.025328  0.230789 -1.805521    0.329520  0.029076  0.362772   -1.704379 -1.006619 -1.271060
Chelsea                   1.423732 -2.259632 -1.063349   -1.506446 -1.230628  0.235333    0.083038 -0.271314 -0.268308
Crystal Palace           -1.106248  0.364282 -1.768677   -1.512225 -1.148029  0.057866   -1.359252 -0.448812 -0.716972
Everton                  -2.025459  3.013837  1.068040   -2.466096 -2.297007  1.002036   -1.776102 -0.163958 -0.413086
Liverpool                 3.136063 -3.954644 -0.494832    0.796027  0.895556 -0.763894    2.257618 -0.451578  0.000085
Manchester City           3.346269 -3.936828  0.058294    3.304854  3.094441 -1.486358    3.530198  0.360672  0.974585
Manchester United        -0.462376  0.551069 -0.388186    1.322063  1.191180 -0.205701   -0.291344  0.518593  0.570194
Newcastle United         -1.390156  1.706830 -0.225816    1.136187  2.110547 -2.794385    0.240001  0.569887  0.402993
Tottenham Hotspur         1.098053 -0.964328  0.751364   -0.037297 -0.787417  1.621485    0.470930  0.500881  0.603535
West Ham United          -0.343711  0.524201  0.161269   -1.726248 -2.191590  1.896491   -1.490567 -0.601689 -0.760941
Wolverhampton Wanderers  -0.846757  1.150731  0.771878   -2.015449 -2.309439  1.893981   -1.757721  0.276030 -0.018129
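In the outputs above, each global coordinate equals the mean of the three partial coordinates for the same component, up to rounding. The check below uses values copied from the two outputs for Arsenal and Manchester City. It is an observation about these numbers, offered as a sketch, rather than a description of how Prince computes them.

```python
import numpy as np
import pandas as pd

groups = ["2021-22", "2022-23", "2023-24"]
columns = pd.MultiIndex.from_product(
    [groups, [0, 1, 2]], names=["group", "component"]
)

# Partial row coordinates copied from the output of partial_row_coordinates.
partial = pd.DataFrame(
    [
        [0.690262, -0.059517, 1.417084, 2.505624, 2.235689, -0.825430,
         3.515025, 0.927579, 1.501298],
        [3.346269, -3.936828, 0.058294, 3.304854, 3.094441, -1.486358,
         3.530198, 0.360672, 0.974585],
    ],
    index=["Arsenal", "Manchester City"],
    columns=columns,
)

# Average the partial coordinates over the groups, component by component.
global_from_partial = partial.T.groupby(level="component").mean().T

# Global row coordinates copied from the output of row_coordinates.
expected = np.array(
    [
        [2.236971, 1.034584, 0.697651],
        [3.393773, -0.160572, -0.151160],
    ]
)
matches = np.allclose(global_from_partial.to_numpy(), expected, atol=1e-5)
```

This is why, when partial rows are plotted, each global point sits at the centre of its partial points.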

Visualization

mfa.plot(
    dataset,
    x_component=0,
    y_component=1
)

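The first axis explains most of the variance between the teams (59.53%). Judging by the row coordinates, it appears to separate the teams that collected the most points over the three seasons, such as Manchester City, Arsenal, and Liverpool, from those that collected the fewest, such as Everton and Wolverhampton Wanderers.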

The show_partial_rows argument allows showing the global row coordinates together with the partial row coordinates. All the coordinates of each sample are connected with edges.

mfa.plot(
    dataset,
    show_partial_rows=True
)

Partial PCAs

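An MFA is essentially a PCA applied to the outputs of partial PCAs. Indeed, a PCA is first fitted to each group. Here is the data for one group, followed by the partial PCA fitted to it, which can be accessed by indexing the MFA with the group name: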

dataset['2022-23']

                           W   D   L  GF  GA Pts
Team
Arsenal                   26   6   6  88  43  84
Aston Villa               18   7  13  51  46  61
Brentford                 15  14   9  58  46  59
Brighton & Hove Albion    18   8  12  72  53  62
Chelsea                   11  11  16  38  47  44
Crystal Palace            11  12  15  40  49  45
Everton                    8  12  18  34  57  36
Liverpool                 19  10   9  75  47  67
Manchester City           28   5   5  94  33  89
Manchester United         23   6   9  58  43  75
Newcastle United          19  14   5  68  33  71
Tottenham Hotspur         18   6  14  70  63  60
West Ham United           11   7  20  42  55  40
Wolverhampton Wanderers   11   8  19  31  58  41

mfa['2022-23'].eigenvalues_summary

           eigenvalue  % of variance  % of variance (cumulative)
component
0               4.374         72.89%                      72.89%
1               1.245         20.74%                      93.64%
2               0.320          5.34%                      98.97%
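The percentages above are consistent with a total inertia of 6, the number of variables in the group (4.374 / 6 is roughly 72.9%), which is what a PCA on standardized variables yields. The sketch below runs such a PCA on the 2022-23 values with plain NumPy, by taking the eigenvalues of the correlation matrix. Whether it reproduces Prince's figures to the last digit depends on Prince's internal scaling, which is assumed here rather than confirmed.

```python
import numpy as np

# 2022-23 group, columns: W, D, L, GF, GA, Pts (copied from the table above).
X = np.array(
    [
        [26, 6, 6, 88, 43, 84],    # Arsenal
        [18, 7, 13, 51, 46, 61],   # Aston Villa
        [15, 14, 9, 58, 46, 59],   # Brentford
        [18, 8, 12, 72, 53, 62],   # Brighton & Hove Albion
        [11, 11, 16, 38, 47, 44],  # Chelsea
        [11, 12, 15, 40, 49, 45],  # Crystal Palace
        [8, 12, 18, 34, 57, 36],   # Everton
        [19, 10, 9, 75, 47, 67],   # Liverpool
        [28, 5, 5, 94, 33, 89],    # Manchester City
        [23, 6, 9, 58, 43, 75],    # Manchester United
        [19, 14, 5, 68, 33, 71],   # Newcastle United
        [18, 6, 14, 70, 63, 60],   # Tottenham Hotspur
        [11, 7, 20, 42, 55, 40],   # West Ham United
        [11, 8, 19, 31, 58, 41],   # Wolverhampton Wanderers
    ],
    dtype=float,
)

# PCA on standardized variables amounts to an eigendecomposition
# of the correlation matrix.
corr = np.corrcoef(X, rowvar=False)

# eigvalsh returns eigenvalues in ascending order; reverse to get the largest first.
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Each eigenvalue's share of the total inertia, in percent.
pct_of_variance = 100 * eigenvalues / eigenvalues.sum()
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, here 6.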