Frequently Asked Questions

How do I use Prince with sklearn pipelines?

Prince estimators consume and produce pandas DataFrames. To use them in a sklearn pipeline, use sklearn's set_output API. It tells sklearn that the pipeline's steps should exchange DataFrames instead of numpy arrays.

import prince
from sklearn import datasets
from sklearn import impute
from sklearn import pipeline

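# Impute missing values, then project onto principal components
pipe = pipeline.make_pipeline(
    impute.SimpleImputer(),
    prince.PCA()
)
# Make every step return a pandas DataFrame instead of a numpy array
pipe.set_output(transform='pandas')
dataset = datasets.load_iris()
pipe.fit_transform(dataset.data).head()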

component	0	1
0	-2.264703	0.480027
1	-2.080961	-0.674134
2	-2.364229	-0.341908
3	-2.299384	-0.597395
4	-2.389842	0.646835