# Multiple correspondence analysis

## Data

Multiple correspondence analysis (MCA) is an extension of correspondence analysis. It should be used when you have more than two categorical variables. The idea is to one-hot encode the dataset and then apply correspondence analysis to the result.
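To make the encoding step concrete, here is a minimal sketch using `pd.get_dummies` on a made-up two-variable frame. The `toy` data is hypothetical and is not part of the balloons dataset used below.

```python
import pandas as pd

# Hypothetical toy data: two categorical variables, three observations.
toy = pd.DataFrame({
    'Color': ['YELLOW', 'PURPLE', 'YELLOW'],
    'Size': ['SMALL', 'LARGE', 'LARGE'],
})

# One column per (variable, category) pair.
indicator = pd.get_dummies(toy)

# Each observation takes exactly one category per variable,
# so every row of the indicator matrix has as many ones as there are variables.
ones_per_row = indicator.sum(axis=1)
```

`indicator` ends up with the columns `Color_PURPLE`, `Color_YELLOW`, `Size_LARGE` and `Size_SMALL`, and every row sums to 2, the number of variables.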

As an example, we’re going to use the balloons dataset from the UCI Machine Learning Repository.

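``````python
import pandas as pd

# Assumed path to a local copy of one of the UCI balloons data files.
dataset = pd.read_csv('adult+stretch.data')
dataset.columns = ['Color', 'Size', 'Action', 'Age', 'Inflated']
dataset.head()
``````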

| | Color | Size | Action | Age | Inflated |
|---|---|---|---|---|---|
| 1 | YELLOW | SMALL | STRETCH | CHILD | F |
| 3 | YELLOW | SMALL | DIP | CHILD | F |

## Fitting

``````python
import prince

mca = prince.MCA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
mca = mca.fit(dataset)
``````

MCA one-hot encodes the dataset and then fits a correspondence analysis on the resulting indicator matrix. If your dataset is already one-hot encoded, pass `one_hot=False` to skip the encoding step.

``````python
one_hot = pd.get_dummies(dataset)

mca_no_one_hot = prince.MCA(one_hot=False)
mca_no_one_hot = mca_no_one_hot.fit(one_hot)
``````
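To illustrate what "one-hot encode, then fit a correspondence analysis" amounts to, here is a plain NumPy sketch of textbook correspondence analysis via SVD on an indicator matrix. It is not prince's actual code path, and the `toy` frame and all variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two categorical variables, four observations.
toy = pd.DataFrame({
    'Color': ['YELLOW', 'YELLOW', 'PURPLE', 'PURPLE'],
    'Size': ['SMALL', 'LARGE', 'SMALL', 'LARGE'],
})

# Step 1: one-hot encode into an indicator matrix Z.
Z = pd.get_dummies(toy).to_numpy(dtype=float)

# Step 2: correspondence analysis of Z.
P = Z / Z.sum()                    # correspondence matrix (entries sum to 1)
r = P.sum(axis=1)                  # row masses
c = P.sum(axis=0)                  # column masses
expected = np.outer(r, c)          # expected proportions under independence
S = (P - expected) / np.sqrt(expected)   # standardized residuals

U, s, Vt = np.linalg.svd(S, full_matrices=False)

eigenvalues = s ** 2               # principal inertias, one per component
total_inertia = eigenvalues.sum()

# Principal row coordinates: scale the left singular vectors by the
# singular values and divide by the square root of the row masses.
row_coords = (U * s) / np.sqrt(r)[:, None]
```

For this toy frame the eigenvalues sum to 1, which agrees with the general identity that an indicator matrix with J categories spread over K variables has total inertia J/K - 1 (here 4/2 - 1).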

## Eigenvalues

``````python
mca.eigenvalues_summary
``````

| component | eigenvalue | % of variance | % of variance (cumulative) |
|---|---|---|---|
| 0 | 0.402 | 40.17% | 40.17% |
| 1 | 0.211 | 21.11% | 61.28% |
| 2 | 0.186 | 18.56% | 79.84% |
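The cumulative column is a running sum of the per-component percentages. A quick check, using the rounded values copied by hand from the summary above (so small rounding differences are to be expected):

```python
import numpy as np

# Percentages of variance copied from the summary (rounded values).
pct_of_variance = np.array([40.17, 21.11, 18.56])

# Running total across components 0, 1 and 2.
cumulative = np.cumsum(pct_of_variance)

# Share of variance left to the components that were not kept.
unexplained = 100 - cumulative[-1]
```

With three components, roughly 80% of the variance is captured, which leaves about 20% to the remaining components.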

## Coordinates

``````python
mca.row_coordinates(dataset).head()
``````

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.705387 | 5.369158e-15 | 0.758639 |
| 1 | -0.386586 | 5.724889e-15 | 0.626063 |
| 2 | -0.386586 | 4.807799e-15 | 0.626063 |
| 3 | -0.852014 | 5.108782e-15 | 0.562447 |
| 4 | 0.783539 | -6.333333e-01 | 0.130201 |

``````python
mca.column_coordinates(dataset).head()
``````

| | 0 | 1 | 2 |
|---|---|---|---|
| Color_PURPLE | 0.117308 | 6.892024e-01 | -0.641270 |
| Color_YELLOW | -0.130342 | -7.657805e-01 | 0.712523 |
| Size_LARGE | 0.117308 | -6.892024e-01 | -0.641270 |
| Size_SMALL | -0.130342 | 7.657805e-01 | 0.712523 |
| Action_DIP | -0.853864 | -6.367615e-16 | -0.079340 |

## Visualization

``````python
mca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_column_markers=True,
    show_row_markers=True,
    show_column_labels=False,
    show_row_labels=False
)
``````

## Contributions

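``````python
mca.row_contributions_.head().style.format('{:.0%}')
``````

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 7% | 0% | 16% |
| 1 | 2% | 0% | 11% |
| 2 | 2% | 0% | 11% |
| 3 | 10% | 0% | 9% |
| 4 | 8% | 10% | 0% |

``````python
mca.column_contributions_.head().style.format('{:.0%}')
``````

| | 0 | 1 | 2 |
|---|---|---|---|
| Color_PURPLE | 0% | 24% | 23% |
| Color_YELLOW | 0% | 26% | 26% |
| Size_LARGE | 0% | 24% | 23% |
| Size_SMALL | 0% | 26% | 26% |
| Action_DIP | 15% | 0% | 0% |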

## Cosine similarities

``````python
mca.row_cosine_similarities(dataset).head()
``````

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 0.461478 | 2.673675e-29 | 0.533786 |
| 1 | 0.152256 | 3.338988e-29 | 0.399316 |
| 2 | 0.152256 | 2.354904e-29 | 0.399316 |
| 3 | 0.653335 | 2.348969e-29 | 0.284712 |
| 4 | 0.592606 | 3.871772e-01 | 0.016363 |

``````python
mca.column_cosine_similarities(dataset).head()
``````

| | 0 | 1 | 2 |
|---|---|---|---|
| Color_PURPLE | 0.015290 | 5.277778e-01 | 0.456920 |
| Color_YELLOW | 0.015290 | 5.277778e-01 | 0.456920 |
| Size_LARGE | 0.015290 | 5.277778e-01 | 0.456920 |
| Size_SMALL | 0.015290 | 5.277778e-01 | 0.456920 |
| Action_DIP | 0.530243 | 2.948838e-31 | 0.004578 |