Multiple factor analysis

Data

Multiple factor analysis (MFA) is designed for datasets where variables are organized into groups. A common situation is when the same individuals are described by several sets of variables — for example, the same wines rated by different expert panels, the same patients measured with different medical instruments, or the same geographical sites described by both flora and soil characteristics.

The problem with running a plain PCA on all variables at once is that larger groups dominate the analysis simply through sheer numbers, not because they carry more meaningful information. If flora is described by 50 variables and soil by 11, PCA will be influenced mainly by the flora group. MFA solves this by weighting each group so that no single group can dominate the first principal component.
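The size effect described above can be illustrated with a minimal sketch. It uses synthetic random data and assumes the 50/11 split from the flora and soil example; the variable names are hypothetical and nothing here depends on Prince.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes taken from the example in the text:
# 50 flora variables and 11 soil variables, measured on 30 sites.
n_sites, n_flora, n_soil = 30, 50, 11
flora = rng.normal(size=(n_sites, n_flora))
soil = rng.normal(size=(n_sites, n_soil))


def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


Z = np.hstack([standardize(flora), standardize(soil)])

# After standardization every variable has variance 1, so a group's share
# of the total inertia is simply its share of the columns.
variances = Z.var(axis=0)
flora_share = variances[:n_flora].sum() / variances.sum()
print(f"flora share of total inertia: {flora_share:.2f}")
```

With standardized variables, the flora group holds 50/61 of the total inertia before any structure in the data is considered, which is the imbalance that the MFA weighting is meant to correct.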

The following dataset contains end-of-season figures for Premier League football teams. It spans the 2021/22, 2022/23, and 2023/24 seasons. Each season forms a natural group of variables. Only the 14 teams that played in the Premier League in all three seasons are included.

import prince

dataset = prince.datasets.load_premier_league()
dataset

                                         2021-22                   2022-23                   2023-24
Team                       W   D   L  GF  GA Pts     W   D   L  GF  GA Pts     W   D   L  GF  GA Pts
Arsenal                   22   3  13  61  48  69    26   6   6  88  43  84    28   5   5  91  29  89
Aston Villa               13   6  19  52  54  45    18   7  13  51  46  61    20   8  10  76  61  68
Brentford                 13   7  18  48  56  46    15  14   9  58  46  59    10   9  19  56  65  39
Brighton & Hove Albion    12  15  11  42  44  51    18   8  12  72  53  62    12  12  14  55  62  48
Chelsea                   21  11   6  76  33  74    11  11  16  38  47  44    18   9  11  77  63  63
Crystal Palace            11  15  12  50  46  48    11  12  15  40  49  45    13  10  15  57  58  49
Everton                   11   6  21  43  66  39     8  12  18  34  57  36    13   9  16  40  51  40
Liverpool                 28   8   2  94  26  92    19  10   9  75  47  67    24  10   4  86  41  82
Manchester City           29   6   3  99  26  93    28   5   5  94  33  89    28   7   3  96  34  91
Manchester United         16  10  12  57  57  58    23   6   9  58  43  75    18   6  14  57  58  60
Newcastle United          13  10  15  44  62  49    19  14   5  68  33  71    18   6  14  85  62  60
Tottenham Hotspur         22   5  11  69  40  71    18   6  14  70  63  60    20   6  12  74  61  66
West Ham United           16   8  14  60  51  56    11   7  20  42  55  40    14  10  14  60  74  52
Wolverhampton Wanderers   15   6  17  38  43  51    11   8  19  31  58  41    13   7  18  50  65  46

import pandas as pd

isinstance(dataset.columns, pd.MultiIndex)
True

Fitting

Under the hood, MFA proceeds in two stages:

  1. Separate PCAs: a PCA is fitted independently on each group of variables.
  2. Weighted global PCA: all variables are concatenated, but each variable in group j is divided by the first singular value of group j’s separate PCA. This ensures that the maximum axial inertia of every group is normalized to 1, so no single group can monopolize the first axis of the global analysis.

The result is a compromise: the global components reflect the common structure shared across groups, rather than the structure of whichever group happens to have the most variables or the highest variance.
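The two stages can be sketched in plain NumPy. This is a simplified illustration on synthetic data, assuming uniform row weights and standardized variables; it is not Prince's actual implementation, and the group names and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 20 individuals, two groups of very different sizes.
n = 20
groups = {
    "big": rng.normal(size=(n, 12)),
    "small": rng.normal(size=(n, 3)),
}


def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Stage 1: separate PCA per group (via SVD); keep the first singular value.
weighted = {}
for name, X in groups.items():
    Z = standardize(X) / np.sqrt(n)  # uniform row weights of 1/n
    first_sv = np.linalg.svd(Z, compute_uv=False)[0]
    weighted[name] = Z / first_sv  # the group's largest axial inertia becomes 1

# Stage 2: global PCA on the concatenated, weighted groups.
Z_global = np.hstack(list(weighted.values()))
U, s, Vt = np.linalg.svd(Z_global, full_matrices=False)
eigenvalues = s**2
row_coords = np.sqrt(n) * U * s  # global row coordinates

# Sanity check: every weighted group now has a first eigenvalue of 1.
group_first_eigenvalues = {
    name: np.linalg.svd(Zw, compute_uv=False)[0] ** 2
    for name, Zw in weighted.items()
}
print(group_first_eigenvalues)
print(eigenvalues[:3])
```

Because each group's first eigenvalue is 1 after weighting, the first global eigenvalue lies between 1 and the number of groups. A value close to the number of groups indicates that the groups share their dominant direction.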

The groups are specified by the groups argument when calling fit.

groups = dataset.columns.levels[0].tolist()
groups
['2021-22', '2022-23', '2023-24']
mfa = prince.MFA(
    n_components=3,
    n_iter=3,
    rescale_with_mean=True,
    rescale_with_std=True,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
mfa = mfa.fit(
    dataset,
    groups=groups,
    supplementary_groups=None
)

Because the columns of this dataset form a pd.MultiIndex, the groups are given here as a list of its first-level labels.
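As a small illustration of how group labels relate to MultiIndex columns, the sketch below builds a two-team, two-season subset of the dataset with pandas. The names toy, sorted_groups, ordered_groups, and members are hypothetical; only standard pandas calls are used, and no claim is made about how Prince resolves groups internally.

```python
import pandas as pd

# Two-level columns: season on the first level, variable on the second.
columns = pd.MultiIndex.from_product([["2022-23", "2021-22"], ["W", "D", "L"]])
toy = pd.DataFrame(
    [[26, 6, 6, 22, 3, 13], [28, 5, 5, 29, 6, 3]],
    index=["Arsenal", "Manchester City"],
    columns=columns,
)

# `levels[0]` returns the unique first-level labels in sorted order...
sorted_groups = toy.columns.levels[0].tolist()
# ...whereas this keeps the order in which the groups appear in the frame.
ordered_groups = toy.columns.get_level_values(0).unique().tolist()

# Columns belonging to each group, as (group, variable) tuples.
members = {g: [c for c in toy.columns if c[0] == g] for g in ordered_groups}

print(sorted_groups)
print(ordered_groups)
print(members["2021-22"])
```

The two orderings only differ when the groups are not already sorted in the frame, which is worth checking if group order matters for plots or tables.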

The supplementary_groups argument is expected to be a list of one or more existing group names.

The rescale_with_mean and rescale_with_std parameters control whether centering and standardization are applied. These are passed to both the partial PCAs (one per group) and the global PCA.

Eigenvalues

As with PCA, eigenvalues indicate how much variance (inertia) each component captures. The first component represents the direction of maximum consensus across all groups.

mfa.eigenvalues_summary

component  eigenvalue  % of variance  % of variance (cumulative)
0               2.376         59.53%                      59.53%
1               0.619         15.51%                      75.04%
2               0.412         10.32%                      85.36%
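The columns of this summary are linked by simple arithmetic, which the following sketch checks using the rounded values printed above. It assumes that each percentage is the eigenvalue divided by the total inertia.

```python
# Values copied from the eigenvalues summary above.
eigenvalues = [2.376, 0.619, 0.412]
pct_of_variance = [59.53, 15.51, 10.32]

# If each percentage is eigenvalue / total inertia, every row should
# imply roughly the same total inertia.
implied_totals = [ev / (p / 100) for ev, p in zip(eigenvalues, pct_of_variance)]

# The cumulative column is a running sum of the percentages.
cumulative = []
running = 0.0
for p in pct_of_variance:
    running += p
    cumulative.append(round(running, 2))

print([round(t, 2) for t in implied_totals])
print(cumulative)
```

All three rows imply a total inertia of about 3.99, so the three retained components leave roughly 15% of the inertia unexplained.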

Row coordinates

Row coordinates position each individual (here, each football team) in the space defined by the MFA components. Teams with similar performance profiles across all three seasons will be close together.

The MFA class inherits from the PCA class, so it exposes the PCA methods and properties. For instance, the row_coordinates method returns the global coordinates of each team.

mfa.row_coordinates(dataset)

component                         0          1          2
Team
Arsenal                    2.236971   1.034584   0.697651
Aston Villa               -0.179988   0.580297   0.463962
Brentford                 -1.267447   0.696757  -0.490607
Brighton & Hove Albion    -0.800062  -0.248918  -0.904603
Chelsea                    0.000108  -1.253858  -0.365442
Crystal Palace            -1.325908  -0.410853  -0.809261
Everton                   -2.089219   0.184291   0.552330
Liverpool                  2.063236  -1.170222  -0.419547
Manchester City            3.393773  -0.160572  -0.151160
Manchester United          0.189448   0.753614  -0.007898
Newcastle United          -0.004656   1.462421  -0.872403
Tottenham Hotspur          0.510562  -0.416955   0.992128
West Ham United           -1.186842  -0.756359   0.432273
Wolverhampton Wanderers   -1.539976  -0.294226   0.882576

Partial row coordinates

A key feature of MFA is the concept of partial individuals. Each team can be viewed from the perspective of a single season: its “partial point” for 2021-22 shows where that team would sit if only the 2021-22 variables mattered. The global coordinate is the barycenter (average) of the partial points across groups.

When a team’s partial points cluster tightly together, it means the team had a consistent profile across all three seasons. When they spread apart, the team’s performance changed significantly from one season to another.

mfa.partial_row_coordinates(dataset)

                                                  2021-22                            2022-23                            2023-24
component                         0          1          2          0          1          2          0          1          2
Team
Arsenal                    0.690262  -0.059517   1.417084   2.505624   2.235689  -0.825430   3.515025   0.927579   1.501298
Aston Villa               -1.204890   1.807432   0.898128   0.113710  -0.035078   0.371064   0.551216  -0.031464   0.122694
Brentford                 -1.289455   1.825781   0.620325  -0.244223   0.442700  -1.365260  -2.268664  -0.178208  -0.726887
Brighton & Hove Albion    -1.025328   0.230789  -1.805521   0.329520   0.029076   0.362772  -1.704379  -1.006619  -1.271060
Chelsea                    1.423732  -2.259632  -1.063349  -1.506446  -1.230628   0.235333   0.083038  -0.271314  -0.268308
Crystal Palace            -1.106248   0.364282  -1.768677  -1.512225  -1.148029   0.057866  -1.359252  -0.448812  -0.716972
Everton                   -2.025459   3.013837   1.068040  -2.466096  -2.297007   1.002036  -1.776102  -0.163958  -0.413086
Liverpool                  3.136063  -3.954644  -0.494832   0.796027   0.895556  -0.763894   2.257618  -0.451578   0.000085
Manchester City            3.346269  -3.936828   0.058294   3.304854   3.094441  -1.486358   3.530198   0.360672   0.974585
Manchester United         -0.462376   0.551069  -0.388186   1.322063   1.191180  -0.205701  -0.291344   0.518593   0.570194
Newcastle United          -1.390156   1.706830  -0.225816   1.136187   2.110547  -2.794385   0.240001   0.569887   0.402993
Tottenham Hotspur          1.098053  -0.964328   0.751364  -0.037297  -0.787417   1.621485   0.470930   0.500881   0.603535
West Ham United           -0.343711   0.524201   0.161269  -1.726248  -2.191590   1.896491  -1.490567  -0.601689  -0.760941
Wolverhampton Wanderers   -0.846757   1.150731   0.771878  -2.015449  -2.309439   1.893981  -1.757721   0.276030  -0.018129
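The barycenter property can be checked by hand with the numbers printed above. The sketch below uses Arsenal's rows from the global and partial coordinate tables. The spread measure at the end is just one illustrative way to quantify how far each season sits from the global point, not something Prince computes.

```python
# Arsenal's partial coordinates on components 0, 1, 2 (one tuple per season),
# copied from the partial row coordinates table.
arsenal_partial = {
    "2021-22": (0.690262, -0.059517, 1.417084),
    "2022-23": (2.505624, 2.235689, -0.825430),
    "2023-24": (3.515025, 0.927579, 1.501298),
}
# Arsenal's global coordinates, copied from the row coordinates table.
arsenal_global = (2.236971, 1.034584, 0.697651)

# The global point should be the plain average of the partial points.
n_groups = len(arsenal_partial)
barycenter = tuple(
    sum(point[k] for point in arsenal_partial.values()) / n_groups
    for k in range(3)
)

# Euclidean distance of each partial point from the global point in the
# (0, 1) plane: larger values mean the season departs more from the
# team's overall profile.
spread = {
    season: ((p[0] - arsenal_global[0]) ** 2 + (p[1] - arsenal_global[1]) ** 2) ** 0.5
    for season, p in arsenal_partial.items()
}

print(barycenter)
print(spread)
```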

Visualization

mfa.plot(
    dataset,
    x_component=0,
    y_component=1
)

The show_partial_rows argument allows showing the global row coordinates together with the partial row coordinates. Each team’s partial points are connected to its global point with edges. Teams whose edges are short had a stable profile across seasons; teams with long edges changed substantially.

mfa.plot(
    dataset,
    show_partial_rows=True
)

Separate PCAs

Before building the global analysis, MFA fits a separate PCA on each group. These are stored in the MFA object and can be inspected individually. The first eigenvalue of each group's PCA determines the weighting: each variable in a group is divided by that group's first singular value, which is the square root of its first eigenvalue.

For example, here is the 2022-23 season data and its separate PCA eigenvalues:

dataset['2022-23']

Team                       W   D   L  GF  GA Pts
Arsenal                   26   6   6  88  43  84
Aston Villa               18   7  13  51  46  61
Brentford                 15  14   9  58  46  59
Brighton & Hove Albion    18   8  12  72  53  62
Chelsea                   11  11  16  38  47  44
Crystal Palace            11  12  15  40  49  45
Everton                    8  12  18  34  57  36
Liverpool                 19  10   9  75  47  67
Manchester City           28   5   5  94  33  89
Manchester United         23   6   9  58  43  75
Newcastle United          19  14   5  68  33  71
Tottenham Hotspur         18   6  14  70  63  60
West Ham United           11   7  20  42  55  40
Wolverhampton Wanderers   11   8  19  31  58  41

mfa['2022-23'].eigenvalues_summary

component  eigenvalue  % of variance  % of variance (cumulative)
0               4.374         72.89%                      72.89%
1               1.245         20.74%                      93.64%
2               0.320          5.34%                      98.97%

Column coordinates

Column coordinates show how each variable loads onto the global MFA components. Since MFA normalizes each group’s influence, these loadings reflect the balanced structure across groups rather than being dominated by whichever group has the most variables.

mfa.column_coordinates_

component                 0          1          2
variable
(2021-22, W)       0.881990  -0.379416   0.152508
(2021-22, D)      -0.339390  -0.243071  -0.789014
(2021-22, L)      -0.729856   0.557314   0.330582
(2021-22, GF)      0.817458  -0.474511  -0.028325
(2021-22, GA)     -0.628169   0.702354   0.081710
(2021-22, Pts)     0.868301  -0.454356  -0.001584
(2022-23, W)       0.856058   0.401114  -0.043721
(2022-23, D)      -0.492092   0.116892  -0.554631
(2022-23, L)      -0.715459  -0.546848   0.389624
(2022-23, GF)      0.849315   0.260247  -0.152257
(2022-23, GA)     -0.563610  -0.465016   0.517534
(2022-23, Pts)     0.839579   0.458098  -0.150842
(2023-24, W)       0.946929   0.058592   0.206938
(2023-24, D)      -0.411948  -0.569731  -0.486639
(2023-24, L)      -0.927339   0.163900  -0.041400
(2023-24, GF)      0.893837   0.061729  -0.083240
(2023-24, GA)     -0.780522  -0.130760  -0.074441
(2023-24, Pts)     0.964985  -0.016546   0.123008

Contributions

Contributions indicate how much each row or column contributes to the variance of each component. They sum to 1 across all rows (or all columns) for each component.

mfa.row_contributions_.style.format('{:.0%}')

component                    0     1     2
Team
Arsenal                    15%   12%    8%
Aston Villa                 0%    4%    4%
Brentford                   5%    6%    4%
Brighton & Hove Albion      2%    1%   14%
Chelsea                     0%   18%    2%
Crystal Palace              5%    2%   11%
Everton                    13%    0%    5%
Liverpool                  13%   16%    3%
Manchester City            35%    0%    0%
Manchester United           0%    7%    0%
Newcastle United            0%   25%   13%
Tottenham Hotspur           1%    2%   17%
West Ham United             4%    7%    3%
Wolverhampton Wanderers     7%    1%   14%

mfa.column_contributions_.style.format('{:.0%}')

component                0     1     2
variable
('2021-22', 'W')        7%    5%    1%
('2021-22', 'D')        1%    2%   33%
('2021-22', 'L')        5%   11%    6%
('2021-22', 'GF')       6%    8%    0%
('2021-22', 'GA')       4%   18%    0%
('2021-22', 'Pts')      7%    7%    0%
('2022-23', 'W')        7%    6%    0%
('2022-23', 'D')        2%    1%   17%
('2022-23', 'L')        5%   11%    8%
('2022-23', 'GF')       7%    3%    1%
('2022-23', 'GA')       3%    8%   15%
('2022-23', 'Pts')      7%    8%    1%
('2023-24', 'W')        8%    0%    2%
('2023-24', 'D')        2%   11%   12%
('2023-24', 'L')        8%    1%    0%
('2023-24', 'GF')       7%    0%    0%
('2023-24', 'GA')       6%    1%    0%
('2023-24', 'Pts')      8%    0%    1%
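The row contributions can be reproduced from the global row coordinates. The sketch below assumes uniform row weights of 1/n, so that a team's contribution to a component is its squared coordinate divided by n times the eigenvalue. The values are copied from the row coordinates table for component 0.

```python
# Component-0 coordinates of the 14 teams, from the row coordinates table.
coords_comp0 = {
    "Arsenal": 2.236971,
    "Aston Villa": -0.179988,
    "Brentford": -1.267447,
    "Brighton & Hove Albion": -0.800062,
    "Chelsea": 0.000108,
    "Crystal Palace": -1.325908,
    "Everton": -2.089219,
    "Liverpool": 2.063236,
    "Manchester City": 3.393773,
    "Manchester United": 0.189448,
    "Newcastle United": -0.004656,
    "Tottenham Hotspur": 0.510562,
    "West Ham United": -1.186842,
    "Wolverhampton Wanderers": -1.539976,
}
n_teams = len(coords_comp0)

# Assumption: uniform row weights, so the eigenvalue is the mean squared coordinate.
implied_eigenvalue = sum(c**2 for c in coords_comp0.values()) / n_teams

# Contribution of each team: its weighted squared coordinate over the eigenvalue.
contributions = {
    team: (c**2 / n_teams) / implied_eigenvalue for team, c in coords_comp0.items()
}

print(round(implied_eigenvalue, 3))
for team in ("Manchester City", "Arsenal", "Everton"):
    print(f"{team}: {contributions[team]:.0%}")
```

The implied eigenvalue agrees with the first eigenvalue reported earlier (2.376), and the three teams printed match the 35%, 15%, and 13% shown in the table.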

Cosine similarities

Cosine similarities (cos²) measure the quality of representation of each row or column on each component. A high cos² means that the component faithfully represents that individual or variable; a low cos² means the individual’s position on that component is unreliable.

mfa.row_cosine_similarities(dataset)

component                             0          1          2
Team
Arsenal                    7.310324e-01   0.156368   0.071104
Aston Villa                2.699398e-02   0.280595   0.179368
Brentford                  5.234111e-01   0.158178   0.078424
Brighton & Hove Albion     2.145100e-01   0.020764   0.274231
Chelsea                    5.150712e-09   0.691377   0.058729
Crystal Palace             6.126493e-01   0.058825   0.228225
Everton                    7.817237e-01   0.006083   0.054636
Liverpool                  6.925223e-01   0.222778   0.028635
Manchester City            9.804235e-01   0.002195   0.001945
Manchester United          2.364993e-02   0.374239   0.000041
Newcastle United           6.268486e-06   0.618401   0.220069
Tottenham Hotspur          1.288072e-01   0.085905   0.486383
West Ham United            5.502091e-01   0.223459   0.072989
Wolverhampton Wanderers    6.612554e-01   0.024138   0.217193

mfa.column_cosine_similarities_

component                 0          1          2
variable
(2021-22, W)       0.777906   0.143956   0.023259
(2021-22, D)       0.115185   0.059083   0.622544
(2021-22, L)       0.532690   0.310599   0.109284
(2021-22, GF)      0.668238   0.225161   0.000802
(2021-22, GA)      0.394596   0.493301   0.006677
(2021-22, Pts)     0.753947   0.206439   0.000003
(2022-23, W)       0.732836   0.160893   0.001911
(2022-23, D)       0.242155   0.013664   0.307615
(2022-23, L)       0.511882   0.299042   0.151807
(2022-23, GF)      0.721335   0.067728   0.023182
(2022-23, GA)      0.317656   0.216240   0.267841
(2022-23, Pts)     0.704892   0.209854   0.022753
(2023-24, W)       0.896675   0.003433   0.042823
(2023-24, D)       0.169701   0.324593   0.236817
(2023-24, L)       0.859958   0.026863   0.001714
(2023-24, GF)      0.798944   0.003811   0.006929
(2023-24, GA)      0.609215   0.017098   0.005541
(2023-24, Pts)     0.931196   0.000274   0.015131
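Because cos² values are shares of a point's squared distance from the origin, they can be summed across components to judge how well a team is represented in the retained space. A minimal sketch, using three rows copied from the row cosine similarities table:

```python
# cos² on components 0, 1, 2 for three teams, from the row table above.
row_cos2 = {
    "Manchester City": (9.804235e-01, 0.002195, 0.001945),
    "Chelsea": (5.150712e-09, 0.691377, 0.058729),
    "Aston Villa": (2.699398e-02, 0.280595, 0.179368),
}

# Summing over the retained components gives the share of each team's
# squared distance from the origin captured by the three components.
quality = {team: sum(values) for team, values in row_cos2.items()}

for team, q in quality.items():
    print(f"{team}: {q:.3f}")
```

Manchester City is almost entirely described by component 0 alone, Chelsea is invisible on component 0 but well represented once component 1 is included, and less than half of Aston Villa's position is captured by the three components together.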

Column correlations

Column correlations (also called loadings) measure the Pearson correlation between each original variable and each MFA component. The squared correlations equal the cosine similarities.

mfa.column_correlations

component                 0          1          2
variable
(2021-22, W)       0.881990  -0.379416   0.152508
(2021-22, D)      -0.339390  -0.243071  -0.789014
(2021-22, L)      -0.729856   0.557314   0.330582
(2021-22, GF)      0.817458  -0.474511  -0.028325
(2021-22, GA)     -0.628169   0.702354   0.081710
(2021-22, Pts)     0.868301  -0.454356  -0.001584
(2022-23, W)       0.856058   0.401114  -0.043721
(2022-23, D)      -0.492092   0.116892  -0.554631
(2022-23, L)      -0.715459  -0.546848   0.389624
(2022-23, GF)      0.849315   0.260247  -0.152257
(2022-23, GA)     -0.563610  -0.465016   0.517534
(2022-23, Pts)     0.839579   0.458098  -0.150842
(2023-24, W)       0.946929   0.058592   0.206938
(2023-24, D)      -0.411948  -0.569731  -0.486639
(2023-24, L)      -0.927339   0.163900  -0.041400
(2023-24, GF)      0.893837   0.061729  -0.083240
(2023-24, GA)     -0.780522  -0.130760  -0.074441
(2023-24, Pts)     0.964985  -0.016546   0.123008
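The link between correlations and cosine similarities can be verified directly from the printed tables. The sketch below squares two rows of the correlation table and compares them with the corresponding cos² rows, allowing for the six-decimal rounding of the display.

```python
# Rows copied from the column correlations table.
correlations = {
    ("2021-22", "W"): (0.881990, -0.379416, 0.152508),
    ("2023-24", "Pts"): (0.964985, -0.016546, 0.123008),
}
# Matching rows copied from the column cosine similarities table.
cos2_table = {
    ("2021-22", "W"): (0.777906, 0.143956, 0.023259),
    ("2023-24", "Pts"): (0.931196, 0.000274, 0.015131),
}

# Square each correlation and measure the largest gap to the printed cos².
squared = {var: tuple(r**2 for r in row) for var, row in correlations.items()}
max_gap = max(
    abs(sq - ref)
    for var in correlations
    for sq, ref in zip(squared[var], cos2_table[var])
)

print(squared)
print(f"largest difference: {max_gap:.1e}")
```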

Group results

One of MFA’s most useful features is its ability to summarize results at the group level. Instead of examining 18 individual variable loadings, you can look at 3 group-level summaries — one per season. These correspond to FactoMineR’s result$group outputs.

This is especially valuable when groups are numerous and contain many variables, because a single group-level plot can replace dozens of individual variable plots.

Group coordinates indicate how strongly each group is associated with each MFA component. They are computed by summing the weighted squared variable coordinates within each group. A high value means that the group’s variables are collectively well-aligned with that component.

mfa.group_coordinates_

component         0          1          2
group
2021-22    0.712679   0.316175   0.167604
2022-23    0.738680   0.221191   0.177221
2023-24    0.924154   0.081475   0.066935
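As a rough check of how a group coordinate arises, the sketch below recomputes the 2022-23 value on component 0 from the column coordinates. It assumes that the weight applied to each squared coordinate is the inverse of the first eigenvalue of the group's separate PCA (4.374, as printed in the Separate PCAs section), which is consistent with dividing each variable by the first singular value.

```python
# First eigenvalue of the separate PCA on the 2022-23 group.
first_eigenvalue_2022_23 = 4.374

# Component-0 column coordinates of the six 2022-23 variables
# (W, D, L, GF, GA, Pts), copied from the column coordinates table.
coords_2022_23_comp0 = [0.856058, -0.492092, -0.715459, 0.849315, -0.563610, 0.839579]

# Weighted sum of squared coordinates, with weight 1 / first eigenvalue.
group_coordinate = sum(c**2 for c in coords_2022_23_comp0) / first_eigenvalue_2022_23

print(round(group_coordinate, 4))
```

The result agrees with the 0.738680 shown in the table to within the rounding of the printed eigenvalue.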

Group contributions show what percentage of each component’s variance is explained by each group. They sum to 100% across groups for each component. Here, the 2023-24 season contributes more to component 0 than the other seasons, meaning this component captures structure that is particularly prominent in that season.

mfa.group_contributions_.style.format('{:.0%}')

component     0     1     2
group
2021-22     30%   51%   41%
2022-23     31%   36%   43%
2023-24     39%   13%   16%
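The group contributions follow from the group coordinates by normalizing each component's column so that it sums to 1. A short sketch with the values copied from the group coordinates table:

```python
# Group coordinates on components 0, 1, 2, from the table above.
group_coordinates = {
    "2021-22": (0.712679, 0.316175, 0.167604),
    "2022-23": (0.738680, 0.221191, 0.177221),
    "2023-24": (0.924154, 0.081475, 0.066935),
}

# Column totals: one per component.
totals = [sum(c[k] for c in group_coordinates.values()) for k in range(3)]

# Each group's share of the column total is its contribution.
contributions = {
    group: [c[k] / totals[k] for k in range(3)]
    for group, c in group_coordinates.items()
}

print([round(t, 3) for t in totals])
for group, shares in contributions.items():
    print(group, [f"{s:.0%}" for s in shares])
```

The column totals also match the global eigenvalues (2.376, 0.619, 0.412) up to rounding, so each group coordinate can be read as the part of the component's inertia that the group accounts for.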

Group cosine similarities (cos²) measure the quality of representation of each group on each component. The 2023-24 season has a very high cos² on component 0 (0.82), meaning this component captures most of that season’s internal structure. In contrast, the same season has a very low cos² on components 1 and 2, meaning those axes tell us little about within-season variation for 2023-24.

mfa.group_cosine_similarities_

component         0          1          2
group
2021-22    0.473967   0.093286   0.026214
2022-23    0.502188   0.045029   0.028906
2023-24    0.818300   0.006360   0.004293

Partial axes

Partial axes answer the question: how does each group’s own internal structure relate to the global MFA structure? Each group has its own PCA components (from the separate analyses in step 1). Partial axes show the correlation between these group-level components and the global MFA components.

This is useful for understanding whether all groups share the same dominant pattern (high correlations between each group’s first axis and the global first axis) or whether some groups introduce distinct structure on different global axes.

Partial correlations show the Pearson correlation between each group's PCA components and the global MFA components. Here, the first component of each season correlates strongly with the global first component (0.84, 0.85, 0.96), meaning the dominant pattern (strong vs. weak teams) is consistent across seasons. The second component of the 2021-22 season correlates with the global third component (-0.79), and so does that of 2022-23 (-0.69), suggesting that the secondary pattern of these seasons shows up on the third global axis rather than the second.

mfa.partial_correlations_

component                    0          1          2
group    component
2021-22  0            0.835657  -0.524262  -0.028301
         1           -0.215210  -0.361226  -0.786578
         2           -0.197522  -0.356472   0.288117
2022-23  0            0.852769   0.436041  -0.202374
         1           -0.194218   0.310181  -0.689932
         2           -0.014356  -0.063442  -0.040341
2023-24  0            0.960582   0.073247   0.129441
         1            0.078869  -0.613825  -0.465048
         2           -0.051897   0.191498   0.185045

This can be visualized with plot_partial. Each arrow represents one group’s PCA axis projected onto the global MFA plane. Arrows close to the unit circle indicate a strong correlation. The plot clearly shows all three seasons’ first axis (dim 0) clustering on the right side of the circle — confirming they share the same dominant structure.

mfa.plot_partial(x_component=0, y_component=1)

Partial contributions show how much each group’s PCA axes contribute to each global MFA component. They sum to 100% per component across all group axes.

mfa.partial_contributions_.style.format('{:.0%}')

component                0     1     2
group    component
2021-22  0             29%   45%    0%
         1              1%    6%   40%
         2              0%    1%    1%
2022-23  0             31%   31%   10%
         1              0%    4%   34%
         2              0%    0%    0%
2023-24  0             39%    1%    4%
         1              0%   11%   10%
         2              0%    1%    1%
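For the first axis of each group, these contributions can be recovered from the partial correlations. The sketch below assumes that the contribution of a group's first axis to a global component is the squared correlation divided by the global eigenvalue. The first axis carries a weight of 1 because each group's maximum axial inertia is normalized to 1. This relation is inferred from the printed tables rather than taken from Prince's documentation.

```python
# First global eigenvalue, from the eigenvalues summary.
eigenvalue_comp0 = 2.376

# Correlation of each group's first PCA axis with global component 0,
# copied from the partial correlations table.
first_axis_corr = {
    "2021-22": 0.835657,
    "2022-23": 0.852769,
    "2023-24": 0.960582,
}

# Assumed relation for first axes only: squared correlation / global eigenvalue.
first_axis_contrib = {
    group: r**2 / eigenvalue_comp0 for group, r in first_axis_corr.items()
}

for group, contrib in first_axis_contrib.items():
    print(f"{group}: {contrib:.0%}")

# Share of component 0 left for all second and third partial axes combined.
remainder = 1 - sum(first_axis_contrib.values())
print(f"other axes: {remainder:.1%}")
```

The three first axes account for nearly all of global component 0, which matches the 29%, 31%, and 39% in the table and leaves only about 1% for the remaining partial axes.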