Multiple factor analysis
Resources
- Multiple Factor Analysis by Hervé Abdi
- Multiple Factor Analysis: main features and application to sensory data by Jérôme Pagès
- Wikipedia article
Data
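Multiple factor analysis (MFA) is meant to be used when you have groups of variables. In practice, it first fits a PCA on each group; these are the so-called partial PCAs. It then fits a global PCA on all the variables, with each group weighted by the inverse of the first eigenvalue of its partial PCA so that no single group dominates the global axes.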
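The recipe above can be sketched with NumPy. This is a minimal, textbook version of MFA written for illustration, not prince's actual implementation: it assumes every variable is standardized, scales each group by one over the square root of the first eigenvalue of its partial PCA (the usual MFA weighting described in the Abdi and Pagès references), and then runs the global PCA through an SVD. The function names `standardize` and `mfa_sketch` are hypothetical.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit (population) standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def mfa_sketch(groups, n_components=2):
    """Textbook MFA on a list of (n_samples, n_variables_in_group) arrays.

    Returns the global row coordinates and the leading global eigenvalues.
    """
    weighted = []
    for X in groups:
        Z = standardize(np.asarray(X, dtype=float))
        # Partial PCA: its first eigenvalue is the largest squared singular value / n
        first_eigenvalue = np.linalg.svd(Z, compute_uv=False)[0] ** 2 / len(Z)
        # Scaling by 1/sqrt(first eigenvalue) gives every group a first eigenvalue of 1
        weighted.append(Z / np.sqrt(first_eigenvalue))

    # Global PCA on the concatenated (already centered) weighted groups
    Z = np.hstack(weighted)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2 / len(Z)
    coordinates = U[:, :n_components] * s[:n_components]  # same as projecting Z onto the axes
    return coordinates, eigenvalues[:n_components]


# Example run on random data with two groups of six variables for 14 rows
rng = np.random.default_rng(0)
coordinates, eigenvalues = mfa_sketch([rng.normal(size=(14, 6)), rng.normal(size=(14, 6))])
```

The weighting is the key design choice: without it, a group made of many strongly correlated variables would have a large first eigenvalue and would pull the first global axis towards itself.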
The dataset used in the following example contains Premier League results for 14 teams over three seasons: 2021-22, 2022-23 and 2023-24. Each season forms a group of six variables: wins (W), draws (D), losses (L), goals for (GF), goals against (GA) and points (Pts). We thus want to consider each season separately whilst also having a global overview of each team. MFA is well suited to this kind of situation.
import prince
dataset = prince.datasets.load_premier_league()
dataset
Team | 2021-22 W | 2021-22 D | 2021-22 L | 2021-22 GF | 2021-22 GA | 2021-22 Pts | 2022-23 W | 2022-23 D | 2022-23 L | 2022-23 GF | 2022-23 GA | 2022-23 Pts | 2023-24 W | 2023-24 D | 2023-24 L | 2023-24 GF | 2023-24 GA | 2023-24 Pts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Arsenal | 22 | 3 | 13 | 61 | 48 | 69 | 26 | 6 | 6 | 88 | 43 | 84 | 28 | 5 | 5 | 91 | 29 | 89 |
Aston Villa | 13 | 6 | 19 | 52 | 54 | 45 | 18 | 7 | 13 | 51 | 46 | 61 | 20 | 8 | 10 | 76 | 61 | 68 |
Brentford | 13 | 7 | 18 | 48 | 56 | 46 | 15 | 14 | 9 | 58 | 46 | 59 | 10 | 9 | 19 | 56 | 65 | 39 |
Brighton & Hove Albion | 12 | 15 | 11 | 42 | 44 | 51 | 18 | 8 | 12 | 72 | 53 | 62 | 12 | 12 | 14 | 55 | 62 | 48 |
Chelsea | 21 | 11 | 6 | 76 | 33 | 74 | 11 | 11 | 16 | 38 | 47 | 44 | 18 | 9 | 11 | 77 | 63 | 63 |
Crystal Palace | 11 | 15 | 12 | 50 | 46 | 48 | 11 | 12 | 15 | 40 | 49 | 45 | 13 | 10 | 15 | 57 | 58 | 49 |
Everton | 11 | 6 | 21 | 43 | 66 | 39 | 8 | 12 | 18 | 34 | 57 | 36 | 13 | 9 | 16 | 40 | 51 | 40 |
Liverpool | 28 | 8 | 2 | 94 | 26 | 92 | 19 | 10 | 9 | 75 | 47 | 67 | 24 | 10 | 4 | 86 | 41 | 82 |
Manchester City | 29 | 6 | 3 | 99 | 26 | 93 | 28 | 5 | 5 | 94 | 33 | 89 | 28 | 7 | 3 | 96 | 34 | 91 |
Manchester United | 16 | 10 | 12 | 57 | 57 | 58 | 23 | 6 | 9 | 58 | 43 | 75 | 18 | 6 | 14 | 57 | 58 | 60 |
Newcastle United | 13 | 10 | 15 | 44 | 62 | 49 | 19 | 14 | 5 | 68 | 33 | 71 | 18 | 6 | 14 | 85 | 62 | 60 |
Tottenham Hotspur | 22 | 5 | 11 | 69 | 40 | 71 | 18 | 6 | 14 | 70 | 63 | 60 | 20 | 6 | 12 | 74 | 61 | 66 |
West Ham United | 16 | 8 | 14 | 60 | 51 | 56 | 11 | 7 | 20 | 42 | 55 | 40 | 14 | 10 | 14 | 60 | 74 | 52 |
Wolverhampton Wanderers | 15 | 6 | 17 | 38 | 43 | 51 | 11 | 8 | 19 | 31 | 58 | 41 | 13 | 7 | 18 | 50 | 65 | 46 |
import pandas as pd
isinstance(dataset.columns, pd.MultiIndex)
True
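If your own data is stored as one table per group, a column `MultiIndex` like this one can be built with `pandas.concat`. The sketch below is an illustration using only pandas; the variable names are hypothetical, and the values are the W and Pts columns of two teams copied from the table above.

```python
import pandas as pd

# One DataFrame per group (season), sharing the same team index
season_2022_23 = pd.DataFrame({"W": [26, 19], "Pts": [84, 67]}, index=["Arsenal", "Liverpool"])
season_2023_24 = pd.DataFrame({"W": [28, 24], "Pts": [89, 82]}, index=["Arsenal", "Liverpool"])

# With a dict, the keys become the first level of the column MultiIndex
combined = pd.concat({"2022-23": season_2022_23, "2023-24": season_2023_24}, axis=1)
combined.index.name = "Team"

print(isinstance(combined.columns, pd.MultiIndex))  # True
print(combined["2023-24"])  # selecting a first-level label returns that group's columns
```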
Fitting
The groups are specified by the `groups` argument when calling `fit`.
groups = dataset.columns.levels[0].tolist()
groups
['2021-22', '2022-23', '2023-24']
mfa = prince.MFA(
n_components=3,
n_iter=3,
copy=True,
check_input=True,
engine='sklearn',
random_state=42
)
mfa = mfa.fit(
dataset,
groups=groups,
supplementary_groups=None
)
There are several ways to specify the groups:
- If the columns of the dataframe are a `MultiIndex`:
  - By default, each label in the first level of the columns defines a group.
  - You can also pass a list with a subset of the first-level labels.
- You can also pass a dict that maps group names to the desired columns.
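The list options above refer to labels in the first level of the column `MultiIndex`. One pandas detail is worth knowing: `columns.levels[0]`, used in the fitting code above, holds the unique first-level labels, which pandas can store in sorted order rather than column order, whereas `get_level_values(0).unique()` keeps the order of appearance. For this dataset the seasons sort chronologically, so both agree. The toy frame below is a pandas-only illustration (values copied from the table above, columns deliberately out of order) and does not call prince.

```python
import pandas as pd

# Columns deliberately listed out of alphabetical order
columns = pd.MultiIndex.from_tuples([
    ("2023-24", "W"), ("2023-24", "Pts"),
    ("2021-22", "W"), ("2021-22", "Pts"),
])
df = pd.DataFrame([[28, 89, 22, 69], [24, 82, 28, 92]],
                  index=["Arsenal", "Liverpool"], columns=columns)

# The stored levels are the unique labels, sorted here
print(df.columns.levels[0].tolist())                     # ['2021-22', '2023-24']
# get_level_values keeps the order in which the columns appear
print(df.columns.get_level_values(0).unique().tolist())  # ['2023-24', '2021-22']

# A list of first-level labels selects every column underneath them
subset = df[["2021-22"]]
print(subset.columns.tolist())  # [('2021-22', 'W'), ('2021-22', 'Pts')]
```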
The `supplementary_groups` argument is expected to be a list of one or more existing group names.
Eigenvalues
mfa.eigenvalues_summary
component | eigenvalue | % of variance | % of variance (cumulative) |
---|---|---|---|
0 | 2.376 | 59.53% | 59.53% |
1 | 0.619 | 15.51% | 75.04% |
2 | 0.412 | 10.32% | 85.36% |
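Each percentage in the summary is an eigenvalue divided by the total inertia (the sum of all eigenvalues), and the cumulative column is a running sum. For the table above, 2.376 / 0.5953 implies a total inertia of roughly 3.99. Assuming the textbook MFA weighting, that total equals the sum over groups of (number of standardized variables / first partial eigenvalue). The helper `variance_summary` below is a hypothetical illustration, not part of prince.

```python
def variance_summary(eigenvalues, total_inertia):
    """Return (% of variance, cumulative %) pairs, rounded to 2 decimals."""
    rows = []
    cumulative = 0.0
    for eigenvalue in eigenvalues:
        share = 100 * eigenvalue / total_inertia
        cumulative += share
        rows.append((round(share, 2), round(cumulative, 2)))
    return rows


print(variance_summary([3, 1, 0.5, 0.5], total_inertia=5))
# [(60.0, 60.0), (20.0, 80.0), (10.0, 90.0), (10.0, 100.0)]
```

For a partial PCA on six standardized variables the total inertia is simply 6, which is why the 2022-23 partial PCA further down reports 4.374 / 6, about 72.9%, for its first component.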
Coordinates
The `MFA` class inherits from the `PCA` class, which means it provides access to the `PCA` methods and properties. For instance, the `row_coordinates` method returns the global coordinates of each team.
mfa.row_coordinates(dataset)
Team | component 0 | component 1 | component 2 |
---|---|---|---|
Arsenal | 2.236971 | 1.034584 | 0.697651 |
Aston Villa | -0.179988 | 0.580297 | 0.463962 |
Brentford | -1.267447 | 0.696757 | -0.490607 |
Brighton & Hove Albion | -0.800062 | -0.248918 | -0.904603 |
Chelsea | 0.000108 | -1.253858 | -0.365442 |
Crystal Palace | -1.325908 | -0.410853 | -0.809261 |
Everton | -2.089219 | 0.184291 | 0.552330 |
Liverpool | 2.063236 | -1.170222 | -0.419547 |
Manchester City | 3.393773 | -0.160572 | -0.151160 |
Manchester United | 0.189448 | 0.753614 | -0.007898 |
Newcastle United | -0.004656 | 1.462421 | -0.872403 |
Tottenham Hotspur | 0.510562 | -0.416955 | 0.992128 |
West Ham United | -1.186842 | -0.756359 | 0.432273 |
Wolverhampton Wanderers | -1.539976 | -0.294226 | 0.882576 |
There is also a `partial_row_coordinates` method that returns the coordinates of each team projected onto each group.
mfa.partial_row_coordinates(dataset)
Team | 2021-22 (0) | 2021-22 (1) | 2021-22 (2) | 2022-23 (0) | 2022-23 (1) | 2022-23 (2) | 2023-24 (0) | 2023-24 (1) | 2023-24 (2) |
---|---|---|---|---|---|---|---|---|---|
Arsenal | 0.690262 | -0.059517 | 1.417084 | 2.505624 | 2.235689 | -0.825430 | 3.515025 | 0.927579 | 1.501298 |
Aston Villa | -1.204890 | 1.807432 | 0.898128 | 0.113710 | -0.035078 | 0.371064 | 0.551216 | -0.031464 | 0.122694 |
Brentford | -1.289455 | 1.825781 | 0.620325 | -0.244223 | 0.442700 | -1.365260 | -2.268664 | -0.178208 | -0.726887 |
Brighton & Hove Albion | -1.025328 | 0.230789 | -1.805521 | 0.329520 | 0.029076 | 0.362772 | -1.704379 | -1.006619 | -1.271060 |
Chelsea | 1.423732 | -2.259632 | -1.063349 | -1.506446 | -1.230628 | 0.235333 | 0.083038 | -0.271314 | -0.268308 |
Crystal Palace | -1.106248 | 0.364282 | -1.768677 | -1.512225 | -1.148029 | 0.057866 | -1.359252 | -0.448812 | -0.716972 |
Everton | -2.025459 | 3.013837 | 1.068040 | -2.466096 | -2.297007 | 1.002036 | -1.776102 | -0.163958 | -0.413086 |
Liverpool | 3.136063 | -3.954644 | -0.494832 | 0.796027 | 0.895556 | -0.763894 | 2.257618 | -0.451578 | 0.000085 |
Manchester City | 3.346269 | -3.936828 | 0.058294 | 3.304854 | 3.094441 | -1.486358 | 3.530198 | 0.360672 | 0.974585 |
Manchester United | -0.462376 | 0.551069 | -0.388186 | 1.322063 | 1.191180 | -0.205701 | -0.291344 | 0.518593 | 0.570194 |
Newcastle United | -1.390156 | 1.706830 | -0.225816 | 1.136187 | 2.110547 | -2.794385 | 0.240001 | 0.569887 | 0.402993 |
Tottenham Hotspur | 1.098053 | -0.964328 | 0.751364 | -0.037297 | -0.787417 | 1.621485 | 0.470930 | 0.500881 | 0.603535 |
West Ham United | -0.343711 | 0.524201 | 0.161269 | -1.726248 | -2.191590 | 1.896491 | -1.490567 | -0.601689 | -0.760941 |
Wolverhampton Wanderers | -0.846757 | 1.150731 | 0.771878 | -2.015449 | -2.309439 | 1.893981 | -1.757721 | 0.276030 | -0.018129 |
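The two coordinate tables are linked: for the teams checked below, averaging a team's three partial points component by component gives back its global point, up to the 6-decimal rounding of the tables. This is consistent with the standard MFA property (see the Pagès reference) that each global point is the barycenter of its partial points. The pure-Python check copies the Arsenal and Chelsea rows from the two tables; the `barycenter` helper is hypothetical.

```python
# Partial coordinates (one row per season) and global coordinates for two
# teams, copied from the partial_row_coordinates and row_coordinates tables
partial = {
    "Arsenal": [
        [0.690262, -0.059517, 1.417084],   # 2021-22
        [2.505624, 2.235689, -0.825430],   # 2022-23
        [3.515025, 0.927579, 1.501298],    # 2023-24
    ],
    "Chelsea": [
        [1.423732, -2.259632, -1.063349],  # 2021-22
        [-1.506446, -1.230628, 0.235333],  # 2022-23
        [0.083038, -0.271314, -0.268308],  # 2023-24
    ],
}
global_coordinates = {
    "Arsenal": [2.236971, 1.034584, 0.697651],
    "Chelsea": [0.000108, -1.253858, -0.365442],
}


def barycenter(points):
    """Component-wise mean of equally sized coordinate lists."""
    return [sum(values) / len(values) for values in zip(*points)]


# Largest absolute difference between the mean of the partial points and the global point
max_gap = max(
    abs(mean - reported)
    for team, points in partial.items()
    for mean, reported in zip(barycenter(points), global_coordinates[team])
)
print(max_gap)  # less than 1e-6, i.e. rounding noise
```

This is also what the edges drawn by `show_partial_rows` convey: each global point sits at the center of its partial points.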
Visualization
mfa.plot(
dataset,
x_component=0,
y_component=1
)
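The first axis explains most of the differences between teams (59.53% of the variance). Judging by the row coordinates, it opposes the strongest teams over the three seasons, such as Manchester City, Arsenal and Liverpool, to weaker ones such as Everton and Wolverhampton Wanderers.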
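One way to check how the first axis should be read is to rank teams by their coordinate on component 0. With prince this might look like `mfa.row_coordinates(dataset)[0].sort_values()`, assuming the component columns are labelled with integers as the coordinates table suggests. The sketch below copies component 0 from that table into a pandas Series so it runs on its own.

```python
import pandas as pd

# Component 0 of the global row coordinates, copied from the table above
first_component = pd.Series({
    "Arsenal": 2.236971,
    "Aston Villa": -0.179988,
    "Brentford": -1.267447,
    "Brighton & Hove Albion": -0.800062,
    "Chelsea": 0.000108,
    "Crystal Palace": -1.325908,
    "Everton": -2.089219,
    "Liverpool": 2.063236,
    "Manchester City": 3.393773,
    "Manchester United": 0.189448,
    "Newcastle United": -0.004656,
    "Tottenham Hotspur": 0.510562,
    "West Ham United": -1.186842,
    "Wolverhampton Wanderers": -1.539976,
})

# Highest coordinate first
ranking = first_component.sort_values(ascending=False)
print(ranking.head(3))  # Manchester City, Arsenal, Liverpool
print(ranking.tail(3))  # Crystal Palace, Wolverhampton Wanderers, Everton
```

The ordering lines up with the raw data: summing the Pts columns, Manchester City has the most points over the three seasons (273) and Everton the fewest (115).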
The `show_partial_rows` argument displays the partial row coordinates together with the global row coordinates. The global and partial coordinates of each team are connected with edges.
mfa.plot(
dataset,
show_partial_rows=True
)
Partial PCAs
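An MFA is essentially a global PCA built on top of partial PCAs: a PCA is first fitted to each group. Here is the data for the 2022-23 group, followed by the eigenvalues of its partial PCA, which is accessed by indexing the fitted MFA with the group name: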
dataset['2022-23']
Team | W | D | L | GF | GA | Pts |
---|---|---|---|---|---|---|
Arsenal | 26 | 6 | 6 | 88 | 43 | 84 |
Aston Villa | 18 | 7 | 13 | 51 | 46 | 61 |
Brentford | 15 | 14 | 9 | 58 | 46 | 59 |
Brighton & Hove Albion | 18 | 8 | 12 | 72 | 53 | 62 |
Chelsea | 11 | 11 | 16 | 38 | 47 | 44 |
Crystal Palace | 11 | 12 | 15 | 40 | 49 | 45 |
Everton | 8 | 12 | 18 | 34 | 57 | 36 |
Liverpool | 19 | 10 | 9 | 75 | 47 | 67 |
Manchester City | 28 | 5 | 5 | 94 | 33 | 89 |
Manchester United | 23 | 6 | 9 | 58 | 43 | 75 |
Newcastle United | 19 | 14 | 5 | 68 | 33 | 71 |
Tottenham Hotspur | 18 | 6 | 14 | 70 | 63 | 60 |
West Ham United | 11 | 7 | 20 | 42 | 55 | 40 |
Wolverhampton Wanderers | 11 | 8 | 19 | 31 | 58 | 41 |
mfa['2022-23'].eigenvalues_summary
component | eigenvalue | % of variance | % of variance (cumulative) |
---|---|---|---|
0 | 4.374 | 72.89% | 72.89% |
1 | 1.245 | 20.74% | 93.64% |
2 | 0.320 | 5.34% | 98.97% |
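A PCA on standardized columns amounts to diagonalizing the correlation matrix, so the partial PCA can be cross-checked with NumPy on the 2022-23 data copied from the table above. Two facts hold regardless of the library: the eigenvalues sum to 6 (the number of variables), and two of them are zero because every row satisfies W + D + L = 38 and Pts = 3W + D, which leaves only four independent variables. That explains why three components already reach 98.97%. Assuming prince standardizes each column before its partial PCA, the leading values printed here should match the summary above up to rounding; this is a sketch, not a reproduction of prince's code.

```python
import numpy as np

# 2022-23 columns W, D, L, GF, GA, Pts, copied from the table above
season = np.array([
    [26, 6, 6, 88, 43, 84],
    [18, 7, 13, 51, 46, 61],
    [15, 14, 9, 58, 46, 59],
    [18, 8, 12, 72, 53, 62],
    [11, 11, 16, 38, 47, 44],
    [11, 12, 15, 40, 49, 45],
    [8, 12, 18, 34, 57, 36],
    [19, 10, 9, 75, 47, 67],
    [28, 5, 5, 94, 33, 89],
    [23, 6, 9, 58, 43, 75],
    [19, 14, 5, 68, 33, 71],
    [18, 6, 14, 70, 63, 60],
    [11, 7, 20, 42, 55, 40],
    [11, 8, 19, 31, 58, 41],
], dtype=float)

# Correlation matrix of the six variables (columns are variables)
correlation = np.corrcoef(season, rowvar=False)
eigenvalues = np.linalg.eigvalsh(correlation)[::-1]  # eigvalsh is ascending, so reverse

print(eigenvalues.round(3))
print(eigenvalues.sum())  # ~6.0, the trace of the correlation matrix
```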