Command-line interface (CLI)
Installation
First, install Go, set your GOPATH
, and make sure $GOPATH/bin
is on your PATH
. Here are some additional ressources depending on your operating system:
Next, regardless of your OS, you can install the xgp
CLI with the following command.
go install github.com/MaxHalford/xgp/cmd/xgp
If xgp --help
runs without any errors then the installation was successful. If you encounter an error feel free to open an issue on GitHub.
Usage
Tip
Apart from the following documentation you can also check out the command-line usage examples.
Tip
Run xgp <command> -h
to get help with a command. For example xgp fit -h
will display the help for the fit
command.
Training
The fit
command trains programs against a training dataset and saves the best one to a JSON file. The only required argument is a path to a CSV file which acts as the training dataset. The dataset is that it should contain only numerical data. Moreover the first row should contain column names.
Warning
For the while XGP does not handle categorical data. You should preemptively encode the categorical features in your dataset before feeding it to XGP. The recommended way is to use label encoding for ordinal data and one-hot encoding for non-ordinal data.
Warning
For the while xgp does not handle missing values.
Once your dataset is ready, you can train XGP on it with the following command.
>>> xgp fit train.csv
This will evaluate and evolve many programs with default values before finally outputting the best obtained program to a JSON file. By default the JSON file is named program.json
. The JSON file can then be used by the predict
command to make predictions on another dataset.
Info
Whether the task is classification or regression is guessed from the loss metric parameter. The available loss metrics are listed here
There are many parameters you can use; the details and default values and are specified in the training parameters section
>>> xgp fit train.csv --loss mse --val mae --gens 64 --indis 256 --parsimony 0.001
The following parameters are available with the fit
command in addition to training parameters.
Argument | Description | Default |
---|---|---|
ignore | Comma-separated list of columns to ignore | |
output | Path where to save the JSON representation of the best program | program.json |
target | Name of the target column in the training and validation datasets | y |
val | Path to a validation dataset that can be used to monitor out-of-bag performance |
If you use the val
argument then the best model of each generation will be scored against the validation dataset. The resulting score is called the out-of-bag score because it is obtained by making predictions on a dataset that the model hasn't seen.
Predicting
Once you have produced a program with the fit
command you can use it to make predictions on a dataset. The test set should have exactly the same format as the training set. Specifically the columns in the test set should be ordered in the same way they were in the training set.
>>> xgp predict test.csv
This will make predictions on test.csv
and save them to a specificied path. The default path is y_pred.csv
. The following arguments are available for the predict
command.
Argument | Description | Default |
---|---|---|
keep | Comma-separated list of columns to keep in the CSV output | |
output | Path to the CSV output | y_pred.csv |
program | Path to the program used to make predictions | program.json |
target | Name of the target column in the CSV output | y |
Scoring
If you don't want to save predictions and instead only want to evaluate a program then you can use the score
command. The score
command will open a program, make predictions against a given dataset, and output a prediction score. By default the scoring metric is the loss metric used for training.
>>> xgp score test.csv
The following arguments are available for the score
command.
Argument | Description | Default |
---|---|---|
eval | Evaluation metric | Same as the loss metric used during training |
program | Path to the program to score | program.json |
target | Name of the target column in the dataset | y |
Visualization
Because programs can be represented as trees, Graphviz can be used to visualize them. The todot
command takes a program as input and outputs the Graphviz representation of the program. You can then copy/paste the output and use a service such as webgraphviz to obtain the visualization. By default the output will not be saved to a file but will however be displayed in the terminal.
>>> xgp todot program.json
You can also feed the todot
command a formula instead of a JSON file.
>>> xgp todot "sum(X[13], 42)"
The following arguments are available for the todot
command.
Argument | Description | Default |
---|---|---|
output | Path to the DOT file output | program.dot |
save | Save to a DOT file or not | False |
shell | Output in the terminal or not | True |