Multiple factor analysis
Data
Multiple factor analysis (MFA) is designed for datasets where the variables are organised into groups. In practice, it first fits a PCA on each group — the so-called partial PCAs — and then fits a global PCA on their combined, reweighted output.
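As a rough sketch of that idea, the two-step procedure can be written with plain NumPy: fit a PCA per group to obtain its first singular value, rescale each group by that value so no group dominates, then run one global PCA on the concatenation. The data below is made up for illustration, and prince's actual implementation handles the centring, scaling, and bookkeeping for you.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: 6 rows described by two groups of variables
# (illustrative only, not the wine dataset).
groups = {"group A": rng.normal(size=(6, 3)), "group B": rng.normal(size=(6, 4))}

def standardise(X):
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Step 1: a partial PCA per group, used here only to get each
# group's first singular value.
weights = {
    name: np.linalg.svd(standardise(X), compute_uv=False)[0]
    for name, X in groups.items()
}

# Step 2: divide each group by its first singular value so that no
# single group dominates, then run a global PCA on the concatenation.
Z = np.hstack([standardise(X) / weights[name] for name, X in groups.items()])
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

row_coordinates = U * S  # global coordinates of the 6 rows
```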
The dataset used in the following example comes from this paper. In the dataset, three experts give their opinion on six different wines. Each opinion for each wine is recorded as a variable. We want to consider the separate opinions of each expert whilst also having a global overview of each wine. MFA is a natural fit for this kind of situation.
import prince
dataset = prince.datasets.load_burgundy_wines()
dataset
| | Oak type | Expert 1: Fruity | Expert 1: Woody | Expert 1: Coffee | Expert 2: Red fruit | Expert 2: Roasted | Expert 2: Vanillin | Expert 2: Woody | Expert 3: Fruity | Expert 3: Butter | Expert 3: Woody |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Wine 1 | 1 | 1 | 6 | 7 | 2 | 5 | 7 | 6 | 3 | 6 | 7 |
| Wine 2 | 2 | 5 | 3 | 2 | 4 | 4 | 4 | 2 | 4 | 4 | 3 |
| Wine 3 | 2 | 6 | 1 | 1 | 5 | 2 | 1 | 1 | 7 | 1 | 1 |
| Wine 4 | 2 | 7 | 1 | 2 | 7 | 2 | 1 | 2 | 2 | 2 | 2 |
| Wine 5 | 1 | 2 | 5 | 4 | 3 | 5 | 6 | 5 | 2 | 6 | 6 |
| Wine 6 | 1 | 3 | 4 | 4 | 3 | 5 | 4 | 5 | 1 | 7 | 5 |
Fitting
The groups are passed as a list to the fit method. Here, each group corresponds to a first-level column name of the dataset.
groups = dataset.columns.levels[0].drop("Oak type").tolist()
groups
['Expert 1', 'Expert 2', 'Expert 3']
mfa = prince.MFA(
n_components=2,
n_iter=3,
copy=True,
check_input=True,
engine='sklearn',
random_state=42
)
mfa = mfa.fit(dataset, groups=groups)
Eigenvalues
mfa.eigenvalues_summary
| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 2.835 | 88.82% | 88.82% |
| 1 | 0.357 | 11.18% | 100.00% |
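The percentages are simply each eigenvalue divided by the total inertia. Recomputing them from the rounded eigenvalues in the summary above:

```python
import numpy as np

# Eigenvalues as reported in the summary (rounded to 3 decimals).
eigenvalues = np.array([2.835, 0.357])

pct = 100 * eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(pct)
```

This gives roughly 88.82% and 11.18%, matching the summary up to rounding.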
Coordinates
The MFA class inherits from the PCA class, which means it provides access to the PCA methods and properties. For instance, the row_coordinates method returns the global coordinates of each wine.
mfa.row_coordinates(dataset)
| | 0 | 1 |
|---|---|---|
| Wine 1 | -2.172155 | -0.508596 |
| Wine 2 | 0.557017 | -0.197408 |
| Wine 3 | 2.317663 | -0.830259 |
| Wine 4 | 1.832557 | 0.905046 |
| Wine 5 | -1.403787 | 0.054977 |
| Wine 6 | -1.131296 | 0.576241 |
However, not every inherited method is implemented yet; calling one that isn't raises a NotImplementedError. The group_row_coordinates method returns each wine's coordinates within each expert's partial analysis:
mfa.group_row_coordinates(dataset)
| | Expert 1: 0 | Expert 1: 1 | Expert 2: 0 | Expert 2: 1 | Expert 3: 0 | Expert 3: 1 |
|---|---|---|---|---|---|---|
| Wine 1 | -2.764432 | -1.104812 | -2.213928 | -0.863519 | -1.538106 | 0.442545 |
| Wine 2 | 0.773034 | 0.298919 | 0.284247 | -0.132135 | 0.613771 | -0.759009 |
| Wine 3 | 1.991398 | 0.805893 | 2.111508 | 0.499718 | 2.850084 | -3.796390 |
| Wine 4 | 1.981456 | 0.927187 | 2.393009 | 1.227146 | 1.123206 | 0.560803 |
| Wine 5 | -1.292834 | -0.620661 | -1.492114 | -0.488088 | -1.426414 | 1.273679 |
| Wine 6 | -0.688623 | -0.306527 | -1.082723 | -0.243122 | -1.622541 | 2.278372 |
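A useful relationship to notice: each wine's global coordinates are the average of its group coordinates, i.e. each global point sits at the barycentre of its three partial points. This can be checked numerically from the two tables above (the values are copied here, so equality only holds up to the displayed rounding):

```python
import numpy as np

# Group row coordinates from the table above: one (component 0, component 1)
# pair per expert, for each of the six wines.
group_coords = np.array([
    # Expert 1              Expert 2               Expert 3
    [-2.764432, -1.104812, -2.213928, -0.863519, -1.538106,  0.442545],  # Wine 1
    [ 0.773034,  0.298919,  0.284247, -0.132135,  0.613771, -0.759009],  # Wine 2
    [ 1.991398,  0.805893,  2.111508,  0.499718,  2.850084, -3.796390],  # Wine 3
    [ 1.981456,  0.927187,  2.393009,  1.227146,  1.123206,  0.560803],  # Wine 4
    [-1.292834, -0.620661, -1.492114, -0.488088, -1.426414,  1.273679],  # Wine 5
    [-0.688623, -0.306527, -1.082723, -0.243122, -1.622541,  2.278372],  # Wine 6
])

# Global row coordinates from the earlier row_coordinates table.
global_coords = np.array([
    [-2.172155, -0.508596],
    [ 0.557017, -0.197408],
    [ 2.317663, -0.830259],
    [ 1.832557,  0.905046],
    [-1.403787,  0.054977],
    [-1.131296,  0.576241],
])

# Average the three experts' coordinates, component by component.
mean_of_groups = np.column_stack([
    group_coords[:, 0::2].mean(axis=1),  # component 0 columns
    group_coords[:, 1::2].mean(axis=1),  # component 1 columns
])

assert np.allclose(mean_of_groups, global_coords, atol=1e-5)
```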
Visualization
mfa.plot(
dataset,
x_component=0,
y_component=1
)
The first axis explains most of the variation in the wine ratings. This variation is mainly driven by the oak type of the barrels the wines were aged in.
Partial PCAs
An MFA is essentially a PCA applied to the outputs of the partial PCAs: a PCA is first fitted to each group of variables. Each group's data, as well as its partial PCA, can be accessed like so:
dataset['Expert 1']
| aspect | Fruity | Woody | Coffee |
|---|---|---|---|
| Wine 1 | 1 | 6 | 7 |
| Wine 2 | 5 | 3 | 2 |
| Wine 3 | 6 | 1 | 1 |
| Wine 4 | 7 | 1 | 2 |
| Wine 5 | 2 | 5 | 4 |
| Wine 6 | 3 | 4 | 4 |
mfa['Expert 1'].eigenvalues_summary
| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 2.863 | 95.42% | 95.42% |
| 1 | 0.120 | 3.99% | 99.41% |
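As a quick sanity check, Expert 1's group contains three standardised variables, so the eigenvalues of its partial PCA should sum to 3 (in a standardised PCA, total inertia equals the number of variables). Recovering the total from the first row of the summary above:

```python
# Total inertia implied by Expert 1's first partial component:
# eigenvalue divided by its proportion of variance.
total_inertia = 2.863 / 0.9542
```

The result is approximately 3, consistent with three standardised variables.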