Multiple factor analysis
Data
Multiple factor analysis (MFA) is designed for datasets where the variables are organised into groups. In practice, it first fits a PCA on each group — the so-called partial PCAs — and then fits a global PCA on their combined, reweighted output.
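As a rough sketch of that idea, the two-step procedure can be written with plain NumPy: fit a PCA per group to obtain its first singular value, rescale each group by that value so no group dominates, then run one global PCA on the concatenation. The data below is made up for illustration, and prince's actual implementation handles the centring, scaling, and bookkeeping for you.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: 6 rows described by two groups of variables
# (illustrative only, not the wine dataset).
groups = {"group A": rng.normal(size=(6, 3)), "group B": rng.normal(size=(6, 4))}

def standardise(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Step 1: a partial PCA per group, used here only to get each
# group's first singular value.
weights = {
    name: np.linalg.svd(standardise(X), compute_uv=False)[0]
    for name, X in groups.items()
}

# Step 2: divide each group by its first singular value so that no
# single group dominates, then run a global PCA on the concatenation.
Z = np.hstack([standardise(X) / weights[name] for name, X in groups.items()])
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

row_coordinates = U * S  # global coordinates of the 6 rows
```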
The dataset used in the following example comes from this paper. In the dataset, three experts give their opinion on six different wines. Each opinion for each wine is recorded as a variable. We want to consider the separate opinions of each expert whilst also having a global overview of each wine. MFA is a natural fit for this kind of situation.
import prince
dataset = prince.datasets.load_burgundy_wines()
dataset
| | Oak type | Expert 1: Fruity | Expert 1: Woody | Expert 1: Coffee | Expert 2: Red fruit | Expert 2: Roasted | Expert 2: Vanillin | Expert 2: Woody | Expert 3: Fruity | Expert 3: Butter | Expert 3: Woody |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Wine 1 | 1 | 1 | 6 | 7 | 2 | 5 | 7 | 6 | 3 | 6 | 7 |
| Wine 2 | 2 | 5 | 3 | 2 | 4 | 4 | 4 | 2 | 4 | 4 | 3 |
| Wine 3 | 2 | 6 | 1 | 1 | 5 | 2 | 1 | 1 | 7 | 1 | 1 |
| Wine 4 | 2 | 7 | 1 | 2 | 7 | 2 | 1 | 2 | 2 | 2 | 2 |
| Wine 5 | 1 | 2 | 5 | 4 | 3 | 5 | 6 | 5 | 2 | 6 | 6 |
| Wine 6 | 1 | 3 | 4 | 4 | 3 | 5 | 4 | 5 | 1 | 7 | 5 |
Fitting
The groups are passed as a list to the fit method. Here, each group corresponds to a first-level column name of the dataset.
groups = dataset.columns.levels[0].drop("Oak type").tolist()
groups
['Expert 1', 'Expert 2', 'Expert 3']
mfa = prince.MFA(
n_components=2,
n_iter=3,
copy=True,
check_input=True,
engine='sklearn',
random_state=42
)
mfa = mfa.fit(dataset, groups=groups)
Eigenvalues
mfa.eigenvalues_summary
| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 2.835 | 88.82% | 88.82% |
| 1 | 0.357 | 11.18% | 100.00% |
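The percentages are simply each eigenvalue divided by the total inertia. Recomputing them from the rounded eigenvalues in the summary above:

```python
import numpy as np

# Eigenvalues as reported in the summary (rounded to 3 decimals).
eigenvalues = np.array([2.835, 0.357])

pct = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(pct)
```

This gives roughly 88.82% and 11.18%, matching the summary up to rounding.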
Coordinates
The MFA class inherits from the PCA class, which means it provides access to the PCA methods and properties. For instance, the row_coordinates method returns the global coordinates of each wine.
mfa.row_coordinates(dataset)
| | 0 | 1 |
|---|---|---|
| Wine 1 | -2.172155 | -0.508596 |
| Wine 2 | 0.557017 | -0.197408 |
| Wine 3 | 2.317663 | -0.830259 |
| Wine 4 | 1.832557 | 0.905046 |
| Wine 5 | -1.403787 | 0.054977 |
| Wine 6 | -1.131296 | 0.576241 |
However, not every inherited method is implemented yet; calling one that isn't raises a NotImplementedError. The group_row_coordinates method returns each wine's coordinates within each expert's partial analysis:
mfa.group_row_coordinates(dataset)
| | Expert 1: 0 | Expert 1: 1 | Expert 2: 0 | Expert 2: 1 | Expert 3: 0 | Expert 3: 1 |
|---|---|---|---|---|---|---|
| Wine 1 | -2.764432 | -1.104812 | -2.213928 | -0.863519 | -1.538106 | 0.442545 |
| Wine 2 | 0.773034 | 0.298919 | 0.284247 | -0.132135 | 0.613771 | -0.759009 |
| Wine 3 | 1.991398 | 0.805893 | 2.111508 | 0.499718 | 2.850084 | -3.796390 |
| Wine 4 | 1.981456 | 0.927187 | 2.393009 | 1.227146 | 1.123206 | 0.560803 |
| Wine 5 | -1.292834 | -0.620661 | -1.492114 | -0.488088 | -1.426414 | 1.273679 |
| Wine 6 | -0.688623 | -0.306527 | -1.082723 | -0.243122 | -1.622541 | 2.278372 |
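A useful relationship to notice: each wine's global coordinates are the average of its group coordinates, i.e. each global point sits at the barycentre of its three partial points. This can be checked numerically from the two tables above (the values are copied here, so equality only holds up to the displayed rounding):

```python
import numpy as np

# Group row coordinates from the table above: one (component 0, component 1)
# pair per expert, for each of the six wines.
group_coords = np.array([
    # Expert 1              Expert 2               Expert 3
    [-2.764432, -1.104812, -2.213928, -0.863519, -1.538106,  0.442545],  # Wine 1
    [ 0.773034,  0.298919,  0.284247, -0.132135,  0.613771, -0.759009],  # Wine 2
    [ 1.991398,  0.805893,  2.111508,  0.499718,  2.850084, -3.796390],  # Wine 3
    [ 1.981456,  0.927187,  2.393009,  1.227146,  1.123206,  0.560803],  # Wine 4
    [-1.292834, -0.620661, -1.492114, -0.488088, -1.426414,  1.273679],  # Wine 5
    [-0.688623, -0.306527, -1.082723, -0.243122, -1.622541,  2.278372],  # Wine 6
])

# Global row coordinates from the earlier row_coordinates table.
global_coords = np.array([
    [-2.172155, -0.508596],
    [ 0.557017, -0.197408],
    [ 2.317663, -0.830259],
    [ 1.832557,  0.905046],
    [-1.403787,  0.054977],
    [-1.131296,  0.576241],
])

# Average the three experts' coordinates, component by component.
mean_of_groups = np.column_stack([
    group_coords[:, 0::2].mean(axis=1),  # component 0 columns
    group_coords[:, 1::2].mean(axis=1),  # component 1 columns
])

assert np.allclose(mean_of_groups, global_coords, atol=1e-5)
```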
Visualization
mfa.plot(
dataset,
x_component=0,
y_component=1
)
The first axis explains most of the variation in the wine ratings. This variation is mainly driven by the oak type of the barrels the wines were aged in.
Partial PCAs
An MFA is essentially a PCA applied to the outputs of the partial PCAs: a PCA is first fitted to each group of variables. Each group's data, as well as its partial PCA, can be accessed like so:
dataset['Expert 1']
| aspect | Fruity | Woody | Coffee |
|---|---|---|---|
| Wine 1 | 1 | 6 | 7 |
| Wine 2 | 5 | 3 | 2 |
| Wine 3 | 6 | 1 | 1 |
| Wine 4 | 7 | 1 | 2 |
| Wine 5 | 2 | 5 | 4 |
| Wine 6 | 3 | 4 | 4 |
mfa['Expert 1'].eigenvalues_summary
| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 2.863 | 95.42% | 95.42% |
| 1 | 0.120 | 3.99% | 99.41% |
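As a quick sanity check, Expert 1's group contains three standardised variables, so the eigenvalues of its partial PCA should sum to 3 (in a standardised PCA, total inertia equals the number of variables). Recovering the total from the first row of the summary above:

```python
# Total inertia implied by Expert 1's first partial component:
# eigenvalue divided by its proportion of variance.
total_inertia = 2.863 / 0.9542
```

The result is approximately 3, consistent with three standardised variables.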