# CoDA Code Supplement

This supplement contains implementations of CoDA and the experiments used in our paper. To maintain anonymity, we refactored parts of the code. We ran sanity checks to confirm that the commands below work in a fresh environment and produce comparable results, but we cannot guarantee 100% reproducibility with this version.


## Installation

The provided `requirements.txt` works with venv:

```
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Then install the appropriate version of `PyTorch` by following the instructions here: https://pytorch.org/get-started/locally/.

To run `MuJoCo` environments, you need the MuJoCo binaries and a license key. Follow the instructions [here](https://github.com/openai/mujoco-py#obtaining-the-binaries-and-license-key).
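As a quick check before running anything, the sketch below looks for the license key and a binaries folder. It assumes the default `mujoco-py` layout, with everything under `~/.mujoco` and the key stored as `mjkey.txt`. The helper name `check_mujoco` is hypothetical and not part of this repository; adjust the path if your installation differs.

```shell
# Hypothetical helper: report whether a MuJoCo key and binaries folder exist.
# Assumes the default mujoco-py layout (files under ~/.mujoco).
check_mujoco() {
  dir="$1"

  if [ -f "$dir/mjkey.txt" ]; then
    echo "license key: found"
  else
    echo "license key: missing ($dir/mjkey.txt)"
  fi

  # Binary folder names differ by MuJoCo version (e.g. mujoco200, mjpro150).
  found=no
  for d in "$dir"/mujoco* "$dir"/mjpro*; do
    if [ -d "$d" ]; then
      found=yes
    fi
  done
  echo "binaries: $found"
}

check_mujoco "$HOME/.mujoco"
```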

To check that everything is set up correctly, run the tests:

```
PYTHONPATH=./ pytest tests
```

## Relevant pieces of code

- For the implementation of `CoDA` (Algorithm 1), see `pong_and_fetch/coda_generic.py` (vectorized) _or_ `spriteworld/coda.py` (not vectorized / inefficient).
- For how `CoDA` (Algorithm 1) is used, see `pong_and_fetch/coda_module.py` or `spriteworld/train_RL_agent.py`.
- For the implementation of the transformer masking model (`SANDy-Transformer` in the Appendix), see `pong_and_fetch/sandy_module.py`.
- For the implementation of `SANDy-Mixture` (from the Appendix), see `spriteworld/structured_transitions.py`.


## Running Experiments

#### Pong

From the main folder, run the experiment:

```
PYTHONPATH=./ python pong_and_fetch/pong/pong_experiment.py --seed 0 --num_real_samples 25000 --num_coda_samples 25000
```

(This should replicate the corresponding results in the paper: 25K real samples with a 1:1 Real:CoDA ratio.)
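If you want results for more than one seed, a small loop over `--seed` is one option. The sketch below uses only the flags shown above. The seed values `0 1 2`, the `DRY_RUN` switch, and the `pong_cmd` helper are illustrative assumptions, not part of the repository. This README does not say whether results are written to seed-specific locations, so check that before launching several runs.

```shell
# Sketch: sweep several seeds for the Pong experiment (seed values are arbitrary).
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: build the documented Pong command for a given seed ($1).
pong_cmd() {
  echo "PYTHONPATH=./ python pong_and_fetch/pong/pong_experiment.py --seed $1 --num_real_samples 25000 --num_coda_samples 25000"
}

for seed in 0 1 2; do
  cmd="$(pong_cmd "$seed")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"
  fi
done
```

Run it from the main folder, as with the single-seed command.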

#### Fetch

From the main folder, run the experiment:

```
PYTHONPATH=./ python pong_and_fetch/train_coda.py --env disentangledpush --tb CODA --replay_size 1000000 --coda_buffer_size 3000000 --batch_size 2000 --her futureactual_2_2 --max_steps 1000000 --coda_every 250 --coda_source_pairs 2000 --relabel_type push_heuristic --max_coda_ratio 0.75 --seed 111 --parent_folder ./push_results --num_envs 6
```

(This should achieve a test reward better than -40 within 30,000 steps and better than -25 within 50,000 steps.)

#### Spriteworld

WARNING: Parallelization in this implementation is much less efficient than in the `pong_and_fetch` one, so it is slow. Use a higher `max_cpu` at your own risk, as it is very memory hungry.

From the `spriteworld` folder, pretrain the transformer model that will be used for both CoDA and Dyna:

```
python spriteworld_scm_discovery.py \
  --num_epochs 125 \
  --num_runs 1 \
  --num_sprites 4 \
  --batch_size 1000 \
  --num_examples 50000 \
  --mask_reg 0. \
  --weight_reg 0. \
  --attn_reg 0. \
  --weight_decay 0. \
  --model_type SSA \
  --results_dir ./attn_ssa
```

(This should achieve AUC > 0.97.)

Then run the Spriteworld experiment: 

```
python train_RL_agent.py --policy TD3 --relabel_type attn_mech --seed 0 --reward_type place_partial_2 --attn_mech_dir attn_ssa --max_cpu 2
```
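
Given the memory warning above, one cautious way to choose `--max_cpu` is to cap it at both the number of available cores and a small fixed limit. The sketch below is an assumed workflow, not part of the released code. The `limit` variable is hypothetical and is set to the value 2 used in the command above. It prints the resulting command rather than running it, so you can review it first.

```shell
# Count online CPU cores (getconf is available on Linux and macOS).
cores="$(getconf _NPROCESSORS_ONLN)"

# Hypothetical user-chosen cap; 2 matches the documented command.
limit=2

# Use the smaller of the two values.
if [ "$cores" -lt "$limit" ]; then
  max_cpu="$cores"
else
  max_cpu="$limit"
fi

# Print the command for review instead of executing it.
echo "python train_RL_agent.py --policy TD3 --relabel_type attn_mech --seed 0 --reward_type place_partial_2 --attn_mech_dir attn_ssa --max_cpu $max_cpu"
```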