# DivBO: Diversity-aware CASH for Ensemble Learning

## Requirements
Install all required Python packages with:
```bash
pip install -r requirement.txt
```
If the installation fails, try installing NumPy, SciPy, and Cython first.

## Datasets
The experiments use datasets collected from OpenML. Download them and place them under `data/cls_datasets/`.
For example, to run on the quake dataset, make sure `quake.csv` is in the directory `data/cls_datasets/`.
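Before launching a long experiment, it can help to check that every dataset file is in place. The sketch below assumes only the layout described above (one `<name>.csv` per dataset directly under `data/cls_datasets/`); the helpers `dataset_path` and `missing_datasets` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Assumption: each dataset is a single CSV named after the dataset,
# stored directly under data/cls_datasets/ (e.g. data/cls_datasets/quake.csv).
DATA_DIR = Path("data") / "cls_datasets"


def dataset_path(name: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the expected CSV path for a dataset name such as 'quake'."""
    return data_dir / f"{name}.csv"


def missing_datasets(names, data_dir: Path = DATA_DIR):
    """Return the dataset names whose CSV file is not present yet."""
    return [n for n in names if not dataset_path(n, data_dir).is_file()]


if __name__ == "__main__":
    missing = missing_datasets(["quake", "spambase"])
    if missing:
        print("Missing datasets:", ", ".join(missing))
    else:
        print("All datasets found.")
```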

## Baselines
This project includes the following nine methods (eight baselines and DivBO):
1. Random Search (RS)
2. Bayesian Optimization (BO)
3. Rising Bandit (RB)
4. Neural Ensemble Search (NES)
5. Ensemble Optimization (EO)
6. Random Search with Post-hoc Ensemble Selection (RS-ES)
7. Bayesian Optimization with Post-hoc Ensemble Selection (BO-ES)
8. Rising Bandit with Post-hoc Ensemble Selection (RB-ES)
9. DivBO

## Reproduction details
We use the script `run_exp.py` to perform CASH optimization.
The choices of baselines are rs (RS), bo (BO), rb (RB), eo (EO), rea_es (NES), and div_bo (DivBO).
Please refer to the script for more details about the arguments.

For example, to run BO-ES on quake and spambase with a budget of 250 iterations, repeated 10 times, with an ensemble size of 25:
```bash
python run_exp.py --datasets quake,spambase --algos bo --rep_num 10 --iter_num 250 --ens_size 25
```
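To queue several methods with the same settings, a small driver can build one `run_exp.py` call per algorithm key. This is a hypothetical sketch (`build_command` and `run_all` are not part of the repository): it only uses the flags shown in the command above, and because it is not stated whether `--algos` accepts a comma-separated list, it launches one process per algorithm. By default it only prints the commands (`dry_run=True`).

```python
import subprocess
import sys

# Algorithm keys accepted by run_exp.py, mapped to the method names used above.
ALGOS = {"rs": "RS", "bo": "BO", "rb": "RB", "eo": "EO", "rea_es": "NES", "div_bo": "DivBO"}


def build_command(algo, datasets, rep_num=10, iter_num=250, ens_size=25):
    """Build the argument list for one run_exp.py invocation."""
    if algo not in ALGOS:
        raise ValueError(f"unknown algorithm key: {algo!r}")
    return [
        sys.executable, "run_exp.py",
        "--datasets", ",".join(datasets),
        "--algos", algo,
        "--rep_num", str(rep_num),
        "--iter_num", str(iter_num),
        "--ens_size", str(ens_size),
    ]


def run_all(algos, datasets, dry_run=True, **kwargs):
    """Print (dry run) or execute one run_exp.py call per algorithm key."""
    for algo in algos:
        cmd = build_command(algo, datasets, **kwargs)
        if dry_run:
            print("python", " ".join(cmd[1:]))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(["bo", "div_bo"], ["quake", "spambase"])
```

Passing `dry_run=False` runs the experiments sequentially and stops at the first failing run (`check=True`).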

Note that the stored search results of BO and BO-ES are identical, because the ensemble is built from the search results only after the optimization process has finished.
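Post-hoc ensemble selection therefore only post-processes the stored observations. This README does not specify which selection procedure is used; the sketch below assumes the widely used greedy forward selection with replacement of Caruana et al. (2004), scored by validation accuracy. The function `ensemble_selection` and its input layout (validation class probabilities of shape `(n_models, n_samples, n_classes)`) are hypothetical.

```python
import numpy as np


def ensemble_selection(val_probs, y_val, ens_size=25):
    """Greedy forward ensemble selection with replacement (Caruana et al., 2004).

    val_probs: array (n_models, n_samples, n_classes) of validation probabilities.
    y_val:     array (n_samples,) of integer class labels.
    Returns the selected model indices (repeats allowed) and per-model weights.
    """
    n_models = val_probs.shape[0]
    selected = []
    running_sum = np.zeros_like(val_probs[0], dtype=float)
    for _ in range(ens_size):
        best_idx, best_acc = 0, -1.0
        for i in range(n_models):
            # Score the ensemble obtained by adding model i one more time.
            avg = (running_sum + val_probs[i]) / (len(selected) + 1)
            acc = np.mean(avg.argmax(axis=1) == y_val)
            if acc > best_acc:  # ties keep the lower index
                best_idx, best_acc = i, acc
        selected.append(best_idx)
        running_sum += val_probs[best_idx]
    # A model's weight is how often it was picked, normalized by ensemble size.
    weights = np.bincount(selected, minlength=n_models) / len(selected)
    return selected, weights
```

Selecting with replacement lets a strong model receive a larger weight by being picked several times, and `ens_size` plays the role of the `--ens_size` argument.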

As described in the paper, the final results are generated from the stored search results as follows:
1) All observations in the history are used for ensemble selection (RS-ES, BO-ES, RB-ES, DivBO);
2) The last 12 observations form the fixed ensemble pool in EO;
3) The last 30 observations form the population pool in NES;
4) The single best observation is used for the non-ensemble methods (RS, BO, RB, DivBO-).
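The four rules above can be summarized as a pool-selection function over the stored observation history. This is a hypothetical sketch: it assumes each observation is a `(config, validation_score)` pair stored in evaluation order and that a higher score is better; the method labels (`rs_es`, `div_bo_minus`, ...) are illustrative names, not `run_exp.py` keys.

```python
def candidate_pool(history, method):
    """Return the observations that feed the final model of a given method.

    history: list of (config, validation_score) pairs in evaluation order.
    Assumes a higher validation_score is better.
    """
    if method in {"rs_es", "bo_es", "rb_es", "div_bo"}:
        return list(history)          # 1) all observations -> ensemble selection
    if method == "eo":
        return list(history[-12:])    # 2) last 12 -> fixed ensemble pool
    if method == "nes":
        return list(history[-30:])    # 3) last 30 -> population pool
    if method in {"rs", "bo", "rb", "div_bo_minus"}:
        # 4) single best observation; max() keeps the earliest one on ties
        return [max(history, key=lambda obs: obs[1])]
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    history = [(f"cfg{i}", i % 7) for i in range(40)]
    for m in ["bo_es", "eo", "nes", "bo"]:
        print(m, len(candidate_pool(history, m)))
```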