Multiple factor analysis
Resources
- Multiple Factor Analysis by Hervé Abdi
- Multiple Factor Analysis: main features and application to sensory data by Jérôme Pagès
- Wikipedia article
Data
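Multiple factor analysis (MFA) is meant to be used when the variables are organized in groups. In practice, it first fits a PCA on each group; these are the so-called partial PCAs. Each group is then weighted by the inverse of the first eigenvalue of its partial PCA, so that no single group dominates, and a global PCA is fitted on all the weighted groups together.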
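The steps above can be sketched with NumPy alone. This is a minimal, hypothetical illustration, not prince's implementation: the helper names `standardize`, `pca_eigen` and `mfa_sketch` are made up for this example, and it assumes numeric groups that are standardized before their partial PCA and weighted by the inverse of their first eigenvalue, as in the usual definition of MFA.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance (population std, ddof=0)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_eigen(X):
    """PCA of an already-centered matrix via SVD; eigenvalues are s**2 / n_rows."""
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return s**2 / X.shape[0], vt.T


def mfa_sketch(groups, n_components=2):
    """Hypothetical minimal MFA: partial PCAs, 1/lambda_1 weighting, then a global PCA."""
    weighted = []
    for X in groups:
        Z = standardize(X)
        partial_eigenvalues, _ = pca_eigen(Z)  # partial PCA of this group
        # Scaling by 1/sqrt(lambda_1) gives each variable the weight 1/lambda_1,
        # so the first eigenvalue of the weighted group is exactly 1.
        weighted.append(Z / np.sqrt(partial_eigenvalues[0]))
    Z = np.hstack(weighted)
    eigenvalues, V = pca_eigen(Z)  # global PCA on the concatenated weighted groups
    row_coordinates = Z @ V[:, :n_components]
    return eigenvalues[:n_components], row_coordinates


# Three random groups of 4, 3 and 5 variables measured on the same 20 rows
rng = np.random.default_rng(42)
groups = [rng.normal(size=(20, 4)), rng.normal(size=(20, 3)), rng.normal(size=(20, 5))]
eigenvalues, row_coordinates = mfa_sketch(groups)
print(eigenvalues)
```

Because each group's first eigenvalue is normalized to 1, the first global eigenvalue always lies between 1 and the number of groups, so a group cannot dominate the global axes merely because it has more variables.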
The following dataset contains end-of-season figures for Premier League football teams over the 2021-22, 2022-23, and 2023-24 seasons: wins (W), draws (D), losses (L), goals for (GF), goals against (GA), and points (Pts). Only the 14 teams that played in the Premier League in all three seasons are included.
```python
import prince

dataset = prince.datasets.load_premier_league()
dataset
```
Team | 2021-22 W | 2021-22 D | 2021-22 L | 2021-22 GF | 2021-22 GA | 2021-22 Pts | 2022-23 W | 2022-23 D | 2022-23 L | 2022-23 GF | 2022-23 GA | 2022-23 Pts | 2023-24 W | 2023-24 D | 2023-24 L | 2023-24 GF | 2023-24 GA | 2023-24 Pts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Arsenal | 22 | 3 | 13 | 61 | 48 | 69 | 26 | 6 | 6 | 88 | 43 | 84 | 28 | 5 | 5 | 91 | 29 | 89 |
Aston Villa | 13 | 6 | 19 | 52 | 54 | 45 | 18 | 7 | 13 | 51 | 46 | 61 | 20 | 8 | 10 | 76 | 61 | 68 |
Brentford | 13 | 7 | 18 | 48 | 56 | 46 | 15 | 14 | 9 | 58 | 46 | 59 | 10 | 9 | 19 | 56 | 65 | 39 |
Brighton & Hove Albion | 12 | 15 | 11 | 42 | 44 | 51 | 18 | 8 | 12 | 72 | 53 | 62 | 12 | 12 | 14 | 55 | 62 | 48 |
Chelsea | 21 | 11 | 6 | 76 | 33 | 74 | 11 | 11 | 16 | 38 | 47 | 44 | 18 | 9 | 11 | 77 | 63 | 63 |
Crystal Palace | 11 | 15 | 12 | 50 | 46 | 48 | 11 | 12 | 15 | 40 | 49 | 45 | 13 | 10 | 15 | 57 | 58 | 49 |
Everton | 11 | 6 | 21 | 43 | 66 | 39 | 8 | 12 | 18 | 34 | 57 | 36 | 13 | 9 | 16 | 40 | 51 | 40 |
Liverpool | 28 | 8 | 2 | 94 | 26 | 92 | 19 | 10 | 9 | 75 | 47 | 67 | 24 | 10 | 4 | 86 | 41 | 82 |
Manchester City | 29 | 6 | 3 | 99 | 26 | 93 | 28 | 5 | 5 | 94 | 33 | 89 | 28 | 7 | 3 | 96 | 34 | 91 |
Manchester United | 16 | 10 | 12 | 57 | 57 | 58 | 23 | 6 | 9 | 58 | 43 | 75 | 18 | 6 | 14 | 57 | 58 | 60 |
Newcastle United | 13 | 10 | 15 | 44 | 62 | 49 | 19 | 14 | 5 | 68 | 33 | 71 | 18 | 6 | 14 | 85 | 62 | 60 |
Tottenham Hotspur | 22 | 5 | 11 | 69 | 40 | 71 | 18 | 6 | 14 | 70 | 63 | 60 | 20 | 6 | 12 | 74 | 61 | 66 |
West Ham United | 16 | 8 | 14 | 60 | 51 | 56 | 11 | 7 | 20 | 42 | 55 | 40 | 14 | 10 | 14 | 60 | 74 | 52 |
Wolverhampton Wanderers | 15 | 6 | 17 | 38 | 43 | 51 | 11 | 8 | 19 | 31 | 58 | 41 | 13 | 7 | 18 | 50 | 65 | 46 |
```python
import pandas as pd

isinstance(dataset.columns, pd.MultiIndex)  # True
```
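A frame with the same layout can be built with `pd.MultiIndex.from_product`. The sketch below copies three rows from the table above and checks two identities that hold in every Premier League season (38 matches, 3 points per win and 1 per draw). It also illustrates a pandas detail that matters when passing a subset of groups: after selecting columns, `columns.levels` still lists every original level value, whereas `get_level_values(0).unique()` lists only the ones actually present.

```python
import pandas as pd

seasons = ["2021-22", "2022-23", "2023-24"]
stats = ["W", "D", "L", "GF", "GA", "Pts"]

# Three rows copied from the dataset table above
data = pd.DataFrame(
    [
        [22, 3, 13, 61, 48, 69, 26, 6, 6, 88, 43, 84, 28, 5, 5, 91, 29, 89],
        [21, 11, 6, 76, 33, 74, 11, 11, 16, 38, 47, 44, 18, 9, 11, 77, 63, 63],
        [29, 6, 3, 99, 26, 93, 28, 5, 5, 94, 33, 89, 28, 7, 3, 96, 34, 91],
    ],
    index=pd.Index(["Arsenal", "Chelsea", "Manchester City"], name="Team"),
    columns=pd.MultiIndex.from_product([seasons, stats]),
)

# Each first-level label selects one group of six columns
for season in seasons:
    block = data[season]
    assert (block["W"] + block["D"] + block["L"] == 38).all()
    assert (block["Pts"] == 3 * block["W"] + block["D"]).all()

# Selecting a subset of groups keeps the unused level values in `levels`
subset = data[["2022-23"]]
print(subset.columns.levels[0].tolist())                     # all three seasons
print(subset.columns.get_level_values(0).unique().tolist())  # only '2022-23'
```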
Fitting
The groups are specified with the `groups` argument when calling `fit`.
```python
groups = dataset.columns.levels[0].tolist()
groups  # ['2021-22', '2022-23', '2023-24']
```
```python
mfa = prince.MFA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
mfa = mfa.fit(
    dataset,
    groups=groups,
    supplementary_groups=None
)
```
There are several ways to specify the groups:
- If the columns of the dataframe are a `MultiIndex`:
  - By default, the groups are all the columns in the first level.
  - You can also pass a list with a subset of the columns in the first level.
- You can also pass a dict that maps group names to the desired columns.
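When the columns are not a `MultiIndex`, the dict form is the natural choice. Here is a minimal sketch, assuming flat column names of the hypothetical form `<stat>_<season>`, that builds such a dict with plain Python; the values are copied from the dataset table. The commented `fit` call only mirrors the signature used above.

```python
import pandas as pd

# Flat column names; the "<stat>_<season>" convention is hypothetical
flat = pd.DataFrame(
    {
        "W_2021-22": [22, 21, 11],
        "Pts_2021-22": [69, 74, 39],
        "W_2022-23": [26, 11, 8],
        "Pts_2022-23": [84, 44, 36],
    },
    index=pd.Index(["Arsenal", "Chelsea", "Everton"], name="Team"),
)

# Map each group name (the season) to the list of its columns, keeping column order
groups = {}
for column in flat.columns:
    _, season = column.split("_", 1)
    groups.setdefault(season, []).append(column)

print(groups)
# The dict would then be passed as: mfa.fit(flat, groups=groups)
```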
The `supplementary_groups` argument expects a list of one or more existing group names.
Eigenvalues
```python
mfa.eigenvalues_summary
```
component | eigenvalue | % of variance | % of variance (cumulative) |
---|---|---|---|
0 | 2.376 | 59.53% | 59.53% |
1 | 0.619 | 15.51% | 75.04% |
2 | 0.412 | 10.32% | 85.36% |
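The percentages are each eigenvalue divided by the total inertia, i.e. the sum of all the eigenvalues, not only of the three components kept. As a worked check, the total inertia can be inferred from the first row (2.376 / 0.5953 ≈ 3.99) and used to recover the other rows. The sketch also checks a property of MFA: since each group's first eigenvalue is normalized to 1, the first global eigenvalue must lie between 1 and the number of groups.

```python
from itertools import accumulate

eigenvalues = [2.376, 0.619, 0.412]  # copied from the summary above
n_groups = 3

# Total inertia inferred from the first row of the summary (inputs are rounded)
total_inertia = eigenvalues[0] / 0.5953

percentages = [100 * e / total_inertia for e in eigenvalues]
cumulative = list(accumulate(percentages))

print([round(p, 2) for p in percentages])  # [59.53, 15.51, 10.32]
print([round(c, 2) for c in cumulative])   # [59.53, 75.04, 85.36]

# The first MFA eigenvalue is bounded by the number of groups
assert 1 <= eigenvalues[0] <= n_groups
```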
Coordinates
`MFA` inherits from the `PCA` class, which means it provides access to the `PCA` methods and properties. For instance, the `row_coordinates` method returns the global coordinates of each team.
```python
mfa.row_coordinates(dataset)
```
Team | component 0 | component 1 | component 2 |
---|---|---|---|
Arsenal | 2.236971 | 1.034584 | 0.697651 |
Aston Villa | -0.179988 | 0.580297 | 0.463962 |
Brentford | -1.267447 | 0.696757 | -0.490607 |
Brighton & Hove Albion | -0.800062 | -0.248918 | -0.904603 |
Chelsea | 0.000108 | -1.253858 | -0.365442 |
Crystal Palace | -1.325908 | -0.410853 | -0.809261 |
Everton | -2.089219 | 0.184291 | 0.552330 |
Liverpool | 2.063236 | -1.170222 | -0.419547 |
Manchester City | 3.393773 | -0.160572 | -0.151160 |
Manchester United | 0.189448 | 0.753614 | -0.007898 |
Newcastle United | -0.004656 | 1.462421 | -0.872403 |
Tottenham Hotspur | 0.510562 | -0.416955 | 0.992128 |
West Ham United | -1.186842 | -0.756359 | 0.432273 |
Wolverhampton Wanderers | -1.539976 | -0.294226 | 0.882576 |
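The global row coordinates are centered, and the population variance of each coordinate column matches the corresponding eigenvalue in the summary above. The following check works directly on the printed values; only the first two components are copied.

```python
import numpy as np

# Global row coordinates for components 0 and 1, copied from the table above
coords = np.array([
    [2.236971, 1.034584],    # Arsenal
    [-0.179988, 0.580297],   # Aston Villa
    [-1.267447, 0.696757],   # Brentford
    [-0.800062, -0.248918],  # Brighton & Hove Albion
    [0.000108, -1.253858],   # Chelsea
    [-1.325908, -0.410853],  # Crystal Palace
    [-2.089219, 0.184291],   # Everton
    [2.063236, -1.170222],   # Liverpool
    [3.393773, -0.160572],   # Manchester City
    [0.189448, 0.753614],    # Manchester United
    [-0.004656, 1.462421],   # Newcastle United
    [0.510562, -0.416955],   # Tottenham Hotspur
    [-1.186842, -0.756359],  # West Ham United
    [-1.539976, -0.294226],  # Wolverhampton Wanderers
])

print(coords.mean(axis=0))  # close to 0 for both components
print(coords.var(axis=0))   # population variance (ddof=0), close to 2.376 and 0.619
```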
There is also a `partial_row_coordinates` method that returns, for each group, the coordinates of each team computed from that group's variables only and expressed on the global components.
```python
mfa.partial_row_coordinates(dataset)
```
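Team | 2021-22 component 0 | 2021-22 component 1 | 2021-22 component 2 | 2022-23 component 0 | 2022-23 component 1 | 2022-23 component 2 | 2023-24 component 0 | 2023-24 component 1 | 2023-24 component 2 |
---|---|---|---|---|---|---|---|---|---|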
Arsenal | 0.690262 | -0.059517 | 1.417084 | 2.505624 | 2.235689 | -0.825430 | 3.515025 | 0.927579 | 1.501298 |
Aston Villa | -1.204890 | 1.807432 | 0.898128 | 0.113710 | -0.035078 | 0.371064 | 0.551216 | -0.031464 | 0.122694 |
Brentford | -1.289455 | 1.825781 | 0.620325 | -0.244223 | 0.442700 | -1.365260 | -2.268664 | -0.178208 | -0.726887 |
Brighton & Hove Albion | -1.025328 | 0.230789 | -1.805521 | 0.329520 | 0.029076 | 0.362772 | -1.704379 | -1.006619 | -1.271060 |
Chelsea | 1.423732 | -2.259632 | -1.063349 | -1.506446 | -1.230628 | 0.235333 | 0.083038 | -0.271314 | -0.268308 |
Crystal Palace | -1.106248 | 0.364282 | -1.768677 | -1.512225 | -1.148029 | 0.057866 | -1.359252 | -0.448812 | -0.716972 |
Everton | -2.025459 | 3.013837 | 1.068040 | -2.466096 | -2.297007 | 1.002036 | -1.776102 | -0.163958 | -0.413086 |
Liverpool | 3.136063 | -3.954644 | -0.494832 | 0.796027 | 0.895556 | -0.763894 | 2.257618 | -0.451578 | 0.000085 |
Manchester City | 3.346269 | -3.936828 | 0.058294 | 3.304854 | 3.094441 | -1.486358 | 3.530198 | 0.360672 | 0.974585 |
Manchester United | -0.462376 | 0.551069 | -0.388186 | 1.322063 | 1.191180 | -0.205701 | -0.291344 | 0.518593 | 0.570194 |
Newcastle United | -1.390156 | 1.706830 | -0.225816 | 1.136187 | 2.110547 | -2.794385 | 0.240001 | 0.569887 | 0.402993 |
Tottenham Hotspur | 1.098053 | -0.964328 | 0.751364 | -0.037297 | -0.787417 | 1.621485 | 0.470930 | 0.500881 | 0.603535 |
West Ham United | -0.343711 | 0.524201 | 0.161269 | -1.726248 | -2.191590 | 1.896491 | -1.490567 | -0.601689 | -0.760941 |
Wolverhampton Wanderers | -0.846757 | 1.150731 | 0.771878 | -2.015449 | -2.309439 | 1.893981 | -1.757721 | 0.276030 | -0.018129 |
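In MFA, the global coordinates of a row are the barycentre (mean) of its partial coordinates, provided the partial coordinates are scaled by the number of groups. The figures above appear to follow this convention, as this check on three teams suggests; the values are copied from the two coordinate tables.

```python
import numpy as np

# Partial row coordinates copied from the table above:
# one row per group (2021-22, 2022-23, 2023-24), one column per component (0, 1, 2)
partial = {
    "Arsenal": np.array([
        [0.690262, -0.059517, 1.417084],
        [2.505624, 2.235689, -0.825430],
        [3.515025, 0.927579, 1.501298],
    ]),
    "Chelsea": np.array([
        [1.423732, -2.259632, -1.063349],
        [-1.506446, -1.230628, 0.235333],
        [0.083038, -0.271314, -0.268308],
    ]),
    "Manchester City": np.array([
        [3.346269, -3.936828, 0.058294],
        [3.304854, 3.094441, -1.486358],
        [3.530198, 0.360672, 0.974585],
    ]),
}

# Global row coordinates copied from the row_coordinates table
global_coordinates = {
    "Arsenal": np.array([2.236971, 1.034584, 0.697651]),
    "Chelsea": np.array([0.000108, -1.253858, -0.365442]),
    "Manchester City": np.array([3.393773, -0.160572, -0.151160]),
}

for team, rows in partial.items():
    barycentre = rows.mean(axis=0)  # average over the three groups
    print(team, np.round(barycentre, 6), global_coordinates[team])
```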
Visualization
```python
mfa.plot(
    dataset,
    x_component=0,
    y_component=1
)
```
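The first component accounts for about 60% of the variance. It separates the teams that were strong across all three seasons (Manchester City, Arsenal, Liverpool) from those that struggled (Everton, Wolverhampton Wanderers). The second component appears to contrast teams whose results improved from 2021-22 to 2022-23, such as Newcastle United and Manchester United, with teams whose results declined, such as Chelsea and Liverpool.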
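To check how the first component relates to overall performance, the sketch below ranks the teams by their component 0 coordinate and by their total points over the three seasons, then computes a Spearman rank correlation as the Pearson correlation of the ranks. All values are copied from the tables above.

```python
import pandas as pd

# Component 0 of the global row coordinates, copied from the row_coordinates table
component_0 = pd.Series({
    "Arsenal": 2.236971, "Aston Villa": -0.179988, "Brentford": -1.267447,
    "Brighton & Hove Albion": -0.800062, "Chelsea": 0.000108,
    "Crystal Palace": -1.325908, "Everton": -2.089219, "Liverpool": 2.063236,
    "Manchester City": 3.393773, "Manchester United": 0.189448,
    "Newcastle United": -0.004656, "Tottenham Hotspur": 0.510562,
    "West Ham United": -1.186842, "Wolverhampton Wanderers": -1.539976,
})

# Points in 2021-22, 2022-23 and 2023-24, copied from the dataset table
points = pd.DataFrame.from_dict(
    {
        "Arsenal": [69, 84, 89], "Aston Villa": [45, 61, 68], "Brentford": [46, 59, 39],
        "Brighton & Hove Albion": [51, 62, 48], "Chelsea": [74, 44, 63],
        "Crystal Palace": [48, 45, 49], "Everton": [39, 36, 40], "Liverpool": [92, 67, 82],
        "Manchester City": [93, 89, 91], "Manchester United": [58, 75, 60],
        "Newcastle United": [49, 71, 60], "Tottenham Hotspur": [71, 60, 66],
        "West Ham United": [56, 40, 52], "Wolverhampton Wanderers": [51, 41, 46],
    },
    orient="index",
    columns=["2021-22", "2022-23", "2023-24"],
)
total_points = points.sum(axis=1)

# Spearman correlation = Pearson correlation of the ranks (Series are aligned on team names)
rank_correlation = component_0.rank().corr(total_points.rank())

print(component_0.nlargest(3).index.tolist())
print(rank_correlation)
```

On these figures the two rankings coincide exactly, which supports reading the first component as overall strength across the three seasons.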
The `show_partial_rows` argument displays the partial row coordinates together with the global row coordinates. All the coordinates of each team are connected with edges.
```python
mfa.plot(
    dataset,
    show_partial_rows=True
)
```
Partial PCAs
An MFA is essentially a PCA applied to the outputs of the partial PCAs: a PCA is first fitted to each group. Each group is simply a block of the original columns; for instance, here is the 2022-23 group:

```python
dataset['2022-23']
```
Team | W | D | L | GF | GA | Pts |
---|---|---|---|---|---|---|
Arsenal | 26 | 6 | 6 | 88 | 43 | 84 |
Aston Villa | 18 | 7 | 13 | 51 | 46 | 61 |
Brentford | 15 | 14 | 9 | 58 | 46 | 59 |
Brighton & Hove Albion | 18 | 8 | 12 | 72 | 53 | 62 |
Chelsea | 11 | 11 | 16 | 38 | 47 | 44 |
Crystal Palace | 11 | 12 | 15 | 40 | 49 | 45 |
Everton | 8 | 12 | 18 | 34 | 57 | 36 |
Liverpool | 19 | 10 | 9 | 75 | 47 | 67 |
Manchester City | 28 | 5 | 5 | 94 | 33 | 89 |
Manchester United | 23 | 6 | 9 | 58 | 43 | 75 |
Newcastle United | 19 | 14 | 5 | 68 | 33 | 71 |
Tottenham Hotspur | 18 | 6 | 14 | 70 | 63 | 60 |
West Ham United | 11 | 7 | 20 | 42 | 55 | 40 |
Wolverhampton Wanderers | 11 | 8 | 19 | 31 | 58 | 41 |
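The columns of this block are far from independent: in a 38-match season W + D + L = 38 and Pts = 3W + D, so after centering, L and Pts are linear combinations of W and D. A standardized PCA of this block therefore has at most four non-zero eigenvalues out of six, which is why three partial components already capture almost all of the variance. The NumPy sketch below computes these eigenvalues from the rows above, assuming the partial PCA is a PCA on standardized columns (a correlation-matrix PCA); under that assumption, the leading values should be close to those in the partial PCA summary that follows.

```python
import numpy as np

# 2022-23 block copied from the table above; columns are W, D, L, GF, GA, Pts
X = np.array([
    [26, 6, 6, 88, 43, 84],
    [18, 7, 13, 51, 46, 61],
    [15, 14, 9, 58, 46, 59],
    [18, 8, 12, 72, 53, 62],
    [11, 11, 16, 38, 47, 44],
    [11, 12, 15, 40, 49, 45],
    [8, 12, 18, 34, 57, 36],
    [19, 10, 9, 75, 47, 67],
    [28, 5, 5, 94, 33, 89],
    [23, 6, 9, 58, 43, 75],
    [19, 14, 5, 68, 33, 71],
    [18, 6, 14, 70, 63, 60],
    [11, 7, 20, 42, 55, 40],
    [11, 8, 19, 31, 58, 41],
], dtype=float)

# Standardize (ddof=0) and diagonalize the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = Z.T @ Z / len(Z)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order

print(np.round(eigenvalues, 3))
print(np.round(100 * eigenvalues / eigenvalues.sum(), 2))  # % of variance
```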
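The corresponding partial PCA is stored in the fitted MFA and can be accessed by indexing it with the group name:

```python
mfa['2022-23'].eigenvalues_summary
```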
component | eigenvalue | % of variance | % of variance (cumulative) |
---|---|---|---|
0 | 4.374 | 72.89% | 72.89% |
1 | 1.245 | 20.74% | 93.64% |
2 | 0.320 | 5.34% | 98.97% |
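In standard MFA, the first eigenvalue of a partial PCA sets the weight of that group in the global PCA. Below is a small worked computation for the 2022-23 group, assuming prince applies the usual 1/λ₁ weighting to standardized variables; the output above does not confirm this directly, so treat the figures as illustrative.

```python
# First eigenvalue of the 2022-23 partial PCA, copied from the summary above
first_eigenvalue = 4.374
n_variables = 6  # W, D, L, GF, GA, Pts

# Assumed standard MFA weighting: each variable of the group gets weight 1 / lambda_1
weight = 1 / first_eigenvalue

# Weighting divides the group's eigenvalues by lambda_1: its first eigenvalue becomes 1
# and its total inertia (6 for six standardized variables) becomes 6 / lambda_1
group_inertia = n_variables / first_eigenvalue

# Total inertia of the global PCA, inferred from the MFA summary (2.376 is 59.53%)
total_inertia = 2.376 / 0.5953

print(round(weight, 4), round(group_inertia, 3), round(group_inertia / total_inertia, 3))
```

Under these assumptions, the 2022-23 group would account for about a third of the global inertia.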