# Correspondence analysis

## Data

Correspondence analysis applies when you have a contingency table, that is, when you want to analyse the dependency between two categorical variables. For instance, the following dataset counts the number of voters per region for each candidate in the 2022 French presidential election.

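``````
import prince

# Assumes the French elections loader bundled with prince.
dataset = prince.datasets.load_french_elections()
dataset.head()
``````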

| region | Le Pen | Macron | Mélenchon | Abstention |
|:---|---:|---:|---:|---:|
| Auvergne-Rhône-Alpes | 943294 | 1175085 | 897434 | 1228490 |
| Bourgogne-Franche-Comté | 409639 | 394117 | 277899 | 456682 |
| Bretagne | 385393 | 647172 | 407527 | 543425 |
| Centre-Val de Loire | 347845 | 383851 | 251259 | 459528 |
| Corse | 42283 | 26795 | 19779 | 90636 |

☝️ This dataset is already available as a contingency matrix. It is more common to start from a flat dataset with one row per observation. In that case, a contingency matrix can be obtained using the `pivot_table` function in `pandas`.
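As a minimal sketch of that conversion, the snippet below builds a contingency matrix from a flat table. The column names (`region`, `candidate`, `votes`) and the counts are made up for illustration and are not part of the elections dataset.

```python
import pandas as pd

# Hypothetical flat dataset: one row per (region, candidate) record.
# Names and numbers are invented for this example.
flat = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "candidate": ["A", "B", "A", "B", "B"],
    "votes": [10, 20, 30, 5, 15],
})

# Sum the counts for every region/candidate pair; missing pairs become 0.
contingency = flat.pivot_table(
    index="region",
    columns="candidate",
    values="votes",
    aggfunc="sum",
    fill_value=0,
)
print(contingency)
```

The resulting frame has regions as rows and candidates as columns, which is the layout expected by `fit` in the next section.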

## Fitting

``````
ca = prince.CA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
ca = ca.fit(dataset)
``````
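To give an idea of what fitting computes, here is a rough sketch of the textbook formulation of correspondence analysis: a singular value decomposition of the standardised residuals of the contingency table. The function `correspondence_analysis` and the toy table are hypothetical, and this is not claimed to be how `prince` implements it internally.

```python
import numpy as np

def correspondence_analysis(table, n_components=2):
    """Textbook CA via SVD (illustrative sketch, not prince's implementation)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                  # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    expected = np.outer(r, c)        # frequencies expected under independence
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    eigenvalues = s[:k] ** 2         # principal inertia of each axis
    # Principal coordinates of the rows and of the columns.
    row_coords = U[:, :k] * s[:k] / np.sqrt(r)[:, None]
    col_coords = Vt.T[:, :k] * s[:k] / np.sqrt(c)[:, None]
    total_inertia = float((s ** 2).sum())
    return eigenvalues, row_coords, col_coords, total_inertia

# Made-up 3x3 contingency table.
toy = [[20, 10, 5], [10, 20, 10], [5, 10, 20]]
eigenvalues, row_coords, col_coords, total_inertia = correspondence_analysis(toy)
print(eigenvalues / total_inertia)   # share of inertia carried by each axis
```

In this formulation the total inertia equals the chi-squared statistic of the table divided by the grand total, which is why correspondence analysis is often described as a decomposition of the dependency between the two variables.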

## Eigenvalues

``````
ca.eigenvalues_summary
``````

| component | eigenvalue | % of variance | % of variance (cumulative) |
|:---|---:|---:|---:|
| 0 | 0.021 | 40.82% | 40.82% |
| 1 | 0.018 | 36.15% | 76.97% |
| 2 | 0.005 | 10.08% | 87.04% |
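The percentages in this summary appear to be each eigenvalue's share of the total inertia of the table, taken over all possible components rather than only the three that were retained. That would explain why the cumulative figure stops at 87.04% instead of reaching 100%. Under that assumption, and writing $\lambda_k$ for the eigenvalue of component $k$ and $K$ for the total number of components (notation introduced here, not taken from the library):

```latex
\text{share}_k = \frac{\lambda_k}{\sum_{j=0}^{K-1} \lambda_j},
\qquad
\text{cumulative share}_k = \sum_{j=0}^{k} \text{share}_j
```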

## Coordinates

``````
ca.row_coordinates(dataset).head()
``````

| region | 0 | 1 | 2 |
|:---|---:|---:|---:|
| Auvergne-Rhône-Alpes | -0.058638 | 0.038303 | 0.000937 |
| Bourgogne-Franche-Comté | -0.070815 | -0.077604 | -0.016357 |
| Bretagne | -0.083655 | 0.110491 | -0.058991 |
| Centre-Val de Loire | -0.024624 | -0.055799 | -0.046167 |
| Corse | 0.127370 | -0.281755 | 0.279328 |

``````
ca.column_coordinates(dataset).head()
``````

| candidate | 0 | 1 | 2 |
|:---|---:|---:|---:|
| Arthaud | -0.034732 | -0.091291 | -0.122722 |
| Dupont-Aignan | -0.094708 | -0.064696 | -0.023546 |
| Hidalgo | -0.137897 | 0.052846 | 0.101351 |
| Lassalle | -0.271867 | -0.091407 | 0.365112 |

## Visualization

``````
ca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_row_markers=True,
    show_column_markers=True,
    show_row_labels=False,
    show_column_labels=False
)
``````

``````
ca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_row_markers=False,
    show_column_markers=False,
    show_row_labels=False,
    show_column_labels=True
)
``````

## Contributions

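``````
ca.row_contributions_.head().style.format('{:.0%}')
``````

| | 0 | 1 | 2 |
|:---|---:|---:|---:|
| Auvergne-Rhône-Alpes | 2% | 1% | 0% |
| Bourgogne-Franche-Comté | 1% | 1% | 0% |
| Bretagne | 2% | 4% | 4% |
| Centre-Val de Loire | 0% | 1% | 2% |
| Corse | 0% | 2% | 8% |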

``````
ca.column_contributions_.head().style.format('{:.0%}')
``````

| | 0 | 1 | 2 |
|:---|---:|---:|---:|
| Arthaud | 0% | 0% | 1% |
| Dupont-Aignan | 1% | 0% | 0% |
| Hidalgo | 1% | 0% | 3% |
| Lassalle | 8% | 1% | 61% |
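For reference, the textbook definition of a contribution is the share of an axis's inertia that is due to a given row or column. It is assumed here, not confirmed, that `prince` follows this definition. With $r_i$ the mass of row $i$ (its total divided by the grand total), $f_{ik}$ its coordinate on axis $k$ and $\lambda_k$ the eigenvalue of that axis (notation introduced for this note):

```latex
\mathrm{ctr}_{ik} = \frac{r_i \, f_{ik}^{2}}{\lambda_k},
\qquad
\sum_{i} \mathrm{ctr}_{ik} = 1
```

The same formula applies to columns, using column masses and column coordinates. Under this definition each axis sums to 100% over all rows or all columns; the tables above only show the first few entries, so their columns do not add up to 100%.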

## Cosine similarities

``````
ca.row_cosine_similarities(dataset).head()
``````

| region | 0 | 1 | 2 |
|:---|---:|---:|---:|
| Auvergne-Rhône-Alpes | 0.568331 | 0.242500 | 0.000145 |
| Bourgogne-Franche-Comté | 0.365626 | 0.439086 | 0.019507 |
| Bretagne | 0.212706 | 0.371061 | 0.105772 |
| Centre-Val de Loire | 0.076356 | 0.392078 | 0.268406 |
| Corse | 0.066825 | 0.327001 | 0.321391 |

``````
ca.column_cosine_similarities(dataset).head()
``````

| candidate | 0 | 1 | 2 |
|:---|---:|---:|---:|
| Arthaud | 0.024619 | 0.170088 | 0.307375 |
| Dupont-Aignan | 0.305277 | 0.142452 | 0.018869 |
| Hidalgo | 0.292428 | 0.042947 | 0.157968 |
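
These values are usually read as squared cosines: the fraction of a point's squared distance to the origin that is captured by each axis. A sketch of the usual definition, assuming the denominator runs over all axes rather than only the retained ones, with $f_{ik}$ the coordinate of row $i$ on axis $k$ (notation introduced for this note):

```latex
\cos^2_{ik} = \frac{f_{ik}^{2}}{\sum_{k'} f_{ik'}^{2}}
```

This reading is consistent with the output above, where the three values for Auvergne-Rhône-Alpes add up to about 0.81 rather than 1. The closer a row's or column's total is to 1, the better that point is represented by the retained components.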