## Prepare the Environment
```
conda create -n jiuzhang python=3.10
conda activate jiuzhang
pip install -r requirements.txt
```

## Synthesis
> The code is adapted from https://github.com/huggingface/cosmopedia

### Prepare Input Data for the Synthesis Model
First set the data paths, then run:
```
bash sh/synthesis/build_cot_data.sh
bash sh/synthesis/build_code_data.sh
```

### Synthesize Natural Language Reasoning Data
First set the data paths, then run:
```
bash sh/synthesis/cot_synthesis.sh
```

### Synthesize Code Data
First set the data paths, then run:
```
bash sh/synthesis/code_synthesis.sh
```

### Training the Synthesis Model
First set the data paths, then run:
```
bash sh/synthesis/train.sh
```

## Selecting Valuable Data
Please refer to https://github.com/princeton-nlp/LESS

## Pre-training and Fine-tuning
Due to space limits, we provide only a sample of the pre-training and fine-tuning data in `data`.

Run
```
bash sh/train_jiuzhang/llama3.sh
bash sh/train_jiuzhang/mistral.sh
bash sh/train_jiuzhang/mixtral.sh
```
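If the three commands are pasted into a shell together, a failure in one does not stop the next from starting. A minimal sketch of a fail-fast wrapper follows, assuming each training script exits with a non-zero status on failure. The function name `run_in_order` is hypothetical and not part of this repository.

```shell
# Hypothetical helper, not part of this repository.
# Runs each given script with bash, in order, and stops at the first failure.
run_in_order() {
  for script in "$@"; do
    echo "Running ${script}"
    if ! bash "${script}"; then
      echo "Failed: ${script}" >&2
      return 1
    fi
  done
}
```

It could then be called as `run_in_order sh/train_jiuzhang/llama3.sh sh/train_jiuzhang/mistral.sh sh/train_jiuzhang/mixtral.sh`.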

## Evaluation
> The code is adapted from https://github.com/ZubinGou/math-evaluation-harness
* For base models
```shell
# Set to the GPU indices to use, e.g. "0" or "0,1"; an empty value hides all GPUs.
export CUDA_VISIBLE_DEVICES=""
bash sh/eval/base_model.sh cot deepseek-ai/deepseek-math-7b-base
```
* For fine-tuned models for natural language reasoning
```shell
# Set to the GPU indices to use, e.g. "0" or "0,1"; an empty value hides all GPUs.
export CUDA_VISIBLE_DEVICES=""
bash sh/eval/instruct_model.sh jiuzhang /path/to/model
bash sh/eval/instruct_model.sh deepseek-math deepseek-ai/deepseek-math-7b-rl
```
* For fine-tuned models for tool manipulation
```shell
# Set to the GPU indices to use, e.g. "0" or "0,1"; an empty value hides all GPUs.
export CUDA_VISIBLE_DEVICES=""
bash sh/eval/instruct_model.sh jiuzhang_tora /path/to/model
```
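
`CUDA_VISIBLE_DEVICES` is left empty in the evaluation commands and needs to be filled in with the GPU indices to use. As a minimal sketch, assuming GPUs are numbered from 0 and the first N should be used, a hypothetical helper `first_n_gpus` (not part of this repository) can build the comma-separated list:

```shell
# Hypothetical helper, not part of this repository.
# Prints the indices 0..N-1 as a comma-separated list, e.g. "0,1,2,3" for N=4.
first_n_gpus() {
  n="$1"
  ids=""
  i=0
  while [ "$i" -lt "$n" ]; do
    # Prefix a comma only when the list already has an entry.
    ids="${ids:+${ids},}${i}"
    i=$((i + 1))
  done
  echo "${ids}"
}
```

For example, `export CUDA_VISIBLE_DEVICES="$(first_n_gpus 4)"` before calling one of the `sh/eval` scripts would expose GPUs 0 through 3.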