
Correspondence analysis

Resources

Data

You can use correspondence analysis when you have a contingency table, that is, when you want to analyse the dependency between two categorical variables. For instance, the dataset below counts the number of voters per region for each candidate in the 2022 French presidential election.

import prince

dataset = prince.datasets.load_french_elections()
dataset[['Le Pen', 'Macron', 'Mélenchon', 'Abstention']].head()

candidate                 Le Pen   Macron  Mélenchon  Abstention
region
Auvergne-Rhône-Alpes      943294  1175085     897434     1228490
Bourgogne-Franche-Comté   409639   394117     277899      456682
Bretagne                  385393   647172     407527      543425
Centre-Val de Loire       347845   383851     251259      459528
Corse                      42283    26795      19779       90636

☝️ This dataset is already available as a contingency matrix. It is more common to start from a flat dataset. In that case, a contingency matrix can be obtained using the pivot_table function in pandas.
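As a minimal sketch of that pivot step, the snippet below builds a small flat table and reshapes it. The flat layout and the column names `region`, `candidate` and `votes` are assumptions made for illustration; the counts are copied from the table above.

```python
import pandas as pd

# Hypothetical flat dataset: one row per (region, candidate) pair.
# The column names are illustrative, not part of prince.
flat = pd.DataFrame({
    "region": ["Bretagne", "Bretagne", "Corse", "Corse"],
    "candidate": ["Macron", "Le Pen", "Macron", "Le Pen"],
    "votes": [647172, 385393, 26795, 42283],
})

# Regions become rows, candidates become columns, counts are summed.
contingency = flat.pivot_table(
    index="region",
    columns="candidate",
    values="votes",
    aggfunc="sum",
    fill_value=0,
)
print(contingency)
```

The resulting DataFrame has the same shape as the dataset used in this page, so it could be passed to the fitting step in the same way.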

Fitting

ca = prince.CA(
    n_components=3,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='sklearn',
    random_state=42
)
ca = ca.fit(dataset)

Eigenvalues

ca.eigenvalues_summary

           eigenvalue  % of variance  % of variance (cumulative)
component
0               0.021         40.82%                      40.82%
1               0.018         36.15%                      76.97%
2               0.005         10.08%                      87.04%
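To give a feel for where these numbers come from, here is a rough NumPy sketch of the textbook computation: an SVD of the standardized residuals of the contingency table. This is an illustration under stated assumptions, not necessarily how prince computes things internally. It only uses the five regions and four columns displayed in the Data section, so its values will differ from the full-dataset summary above.

```python
import numpy as np

# Counts copied from the five regions and four columns shown in the Data
# section (a subset of the full dataset).
N = np.array([
    [943294, 1175085, 897434, 1228490],  # Auvergne-Rhône-Alpes
    [409639, 394117, 277899, 456682],    # Bourgogne-Franche-Comté
    [385393, 647172, 407527, 543425],    # Bretagne
    [347845, 383851, 251259, 459528],    # Centre-Val de Loire
    [42283, 26795, 19779, 90636],        # Corse
], dtype=float)

P = N / N.sum()        # correspondence matrix (relative frequencies)
r = P.sum(axis=1)      # row masses
c = P.sum(axis=0)      # column masses

# Standardized residuals with respect to the independence model r * c^T.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

singular_values = np.linalg.svd(S, compute_uv=False)
eigenvalues = singular_values ** 2        # principal inertias
share = eigenvalues / eigenvalues.sum()   # fraction of total inertia

for k, (ev, pct) in enumerate(zip(eigenvalues, share)):
    print(f"component {k}: eigenvalue={ev:.5f}, share={pct:.2%}")
```

With a 5 × 4 table, at most three components carry any inertia; the last eigenvalue is zero up to floating-point error.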

Coordinates

ca.row_coordinates(dataset).head()

                                 0         1         2
region
Auvergne-Rhône-Alpes     -0.058638  0.038303  0.000937
Bourgogne-Franche-Comté  -0.070815 -0.077604 -0.016357
Bretagne                 -0.083655  0.110491 -0.058991
Centre-Val de Loire      -0.024624 -0.055799 -0.046167
Corse                     0.127370 -0.281755  0.279328

ca.column_coordinates(dataset).head()

                       0         1         2
candidate
Arthaud        -0.034732 -0.091291 -0.122722
Dupont-Aignan  -0.094708 -0.064696 -0.023546
Hidalgo        -0.137897  0.052846  0.101351
Jadot          -0.126228  0.188836 -0.031329
Lassalle       -0.271867 -0.091407  0.365112

Visualization

# Show row and column markers, without any labels.
ca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_row_markers=True,
    show_column_markers=True,
    show_row_labels=False,
    show_column_labels=False
)

# Show only the column (candidate) labels, without markers.
ca.plot(
    dataset,
    x_component=0,
    y_component=1,
    show_row_markers=False,
    show_column_markers=False,
    show_row_labels=False,
    show_column_labels=True
)

Contributions

ca.row_contributions_.head().style.format('{:.0%}')

                           0    1    2
Auvergne-Rhône-Alpes      2%   1%   0%
Bourgogne-Franche-Comté   1%   1%   0%
Bretagne                  2%   4%   4%
Centre-Val de Loire       0%   1%   2%
Corse                     0%   2%   8%

ca.column_contributions_.head().style.format('{:.0%}')

                 0    1    2
Arthaud         0%   0%   1%
Dupont-Aignan   1%   0%   0%
Hidalgo         1%   0%   3%
Jadot           3%   7%   1%
Lassalle        8%   1%  61%

Cosine similarities

ca.row_cosine_similarities(dataset).head()

                                 0         1         2
region
Auvergne-Rhône-Alpes      0.568331  0.242500  0.000145
Bourgogne-Franche-Comté   0.365626  0.439086  0.019507
Bretagne                  0.212706  0.371061  0.105772
Centre-Val de Loire       0.076356  0.392078  0.268406
Corse                     0.066825  0.327001  0.321391

ca.column_cosine_similarities(dataset).head()

                       0         1         2
candidate
Arthaud         0.024619  0.170088  0.307375
Dupont-Aignan   0.305277  0.142452  0.018869
Hidalgo         0.292428  0.042947  0.157968
Jadot           0.265642  0.594500  0.016364
Lassalle        0.307040  0.034709  0.553774
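
To help read the contribution and cosine-similarity tables, the sketch below derives both quantities for rows from principal coordinates, using the usual textbook definitions. It is an illustration in plain NumPy, not a description of prince's internals, and it only uses the five regions and four columns displayed in the Data section, so the values will not match the tables above.

```python
import numpy as np

# Counts copied from the five regions and four columns shown in the Data
# section (a subset of the full dataset).
N = np.array([
    [943294, 1175085, 897434, 1228490],
    [409639, 394117, 277899, 456682],
    [385393, 647172, 407527, 543425],
    [347845, 383851, 251259, 459528],
    [42283, 26795, 19779, 90636],
], dtype=float)

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = 3                                  # a 5 x 4 table has at most 3 non-trivial components
U, sv = U[:, :k], sv[:k]

# Row principal coordinates: scaled left singular vectors.
F = (U * sv) / np.sqrt(r)[:, None]

# Contributions: share of each component's inertia due to each row.
# Every column sums to 1.
contrib = (r[:, None] * F ** 2) / sv ** 2

# Squared cosines: share of each row's squared distance to the centroid
# captured by each component. Every row sums to 1 when all non-trivial
# components are kept, as they are here.
cos2 = F ** 2 / (F ** 2).sum(axis=1, keepdims=True)

print(np.round(contrib, 3))
print(np.round(cos2, 3))
```

In the tables above only three components of the full dataset are kept, which is why the cosine similarities of a given row add up to less than 1.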