# HippoRAG

## Setup Environment

Create a conda environment and install the dependencies:

```shell
conda create -n hipporag python=3.9
conda activate hipporag
pip install -r requirements.txt

GPU_DEVICES=0,1,2,3  # Replace with the IDs of your own free GPUs
export OPENAI_API_KEY='Add your own OpenAI API key here.'
export TOGETHER_API_KEY='Add your own TogetherAI API key here.'
```
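
Before launching the long-running scripts, it can help to confirm that the variables above are actually set in the current shell. The following is a minimal sketch, not part of the repository: `check_env` is a hypothetical helper name, and it only tests for unset or empty values, so it will not notice a key that still holds the placeholder text.

```shell
# Hypothetical helper (not part of the repository): print the name of every
# listed variable that is unset or empty, and return non-zero if any is missing.
check_env() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing or empty: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, in the same shell session where the variables were set:
#   check_env GPU_DEVICES OPENAI_API_KEY TOGETHER_API_KEY
```

The function works on plain shell variables as well as exported ones, which matters here because `GPU_DEVICES` is assigned without `export`.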

To use ColBERTv2, download the pre-trained [checkpoint](https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz) and put it under `exp/colbertv2.0`.

```shell
mkdir -p exp && cd exp
wget https://downloads.cs.stanford.edu/nlp/data/colbert/colbertv2/colbertv2.0.tar.gz
tar -xvzf colbertv2.0.tar.gz
cd ..  # Return to the repository root for the commands below
```

## Data

We provide all the necessary data to reproduce our experiments. 

To save cost and time when reproducing our results, we also include the knowledge graphs generated via open IE with GPT-3.5 Turbo (1106), both Llama-3 models, and REBEL for all three subsets and the hyperparameter tuning dataset. We also include the NER results obtained via GPT-3.5 Turbo (1106) on all datasets.

## Baselines

Please check `src/baselines/README.md` for more details.

## Running HippoRAG

Using our HippoRAG framework is a two-step process: indexing, then retrieval.

### Indexing

To run indexing for both our main experiments and our ablations, run the following bash script. Retrieval relies on the output of this step and will fail if indexing does not complete successfully.

```shell
bash src/setup_hipporag_main_exps.sh $GPU_DEVICES
```
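
Because retrieval depends on indexing, one way to avoid starting retrieval after a failed indexing run is to chain the two steps and stop at the first error. The sketch below is only an illustration: `run_in_order` is a hypothetical helper that is not part of the repository, and the script paths in the example comment are the ones listed in this README.

```shell
# Hypothetical helper (not part of the repository): run each command line in
# order and stop at the first one that exits with a non-zero status.
run_in_order() {
    for step in "$@"; do
        echo "Running: $step"
        if ! sh -c "$step"; then
            echo "Step failed, stopping: $step" >&2
            return 1
        fi
    done
}

# Example (run from the repository root, with GPU_DEVICES already set):
#   run_in_order "bash src/setup_hipporag_main_exps.sh $GPU_DEVICES" \
#                "bash src/run_hipporag_main_exps.sh"
```

For just two steps, joining the commands with `&&` has the same effect; the helper is mainly convenient when more scripts are queued.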

### HippoRAG Retrieval

After running indexing, run the following bash scripts to test both single-step and multi-step retrieval using HippoRAG with both Contriever and ColBERTv2. 

#### HippoRAG Only

```shell
bash src/run_hipporag_main_exps.sh
```

#### HippoRAG w/ IRCoT

```shell
bash src/run_hipporag_ircot_main_exps.sh
```

#### Ablation Experiments

To run all our ablations, run the following bash scripts:

```shell
bash src/setup_hipporag_ablations.sh $GPU_DEVICES
bash src/run_hipporag_ablations.sh
```

## Hyperparameter Tuning

To reproduce our hyperparameter tuning, first run indexing on the MuSiQue training subset with the following script:

```shell
bash src/setup_hipporag_hyperparameter_tune.sh $GPU_DEVICES
```

After indexing is completed, run the following script and note the performance of each hyperparameter combination tested.

```shell
bash src/run_hipporag_hyperparameter_tune.sh
```
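
To keep a record of each hyperparameter combination's performance, one option is to save the script's console output to a file while still watching it live. The sketch below assumes the tuning script prints its results to standard output or standard error, which this README does not state; `run_logged` and the log file name `hyperparameter_tune.log` are hypothetical.

```shell
# Hypothetical helper (not part of the repository): run a command, show its
# output live, save a copy to a log file, and return the command's exit status.
run_logged() {
    log_file=$1
    shift
    status_file=$(mktemp)
    # The left side of the pipe runs in a subshell, so the command's exit
    # status is passed back through a temporary file.
    { "$@" 2>&1 && echo 0 > "$status_file" || echo "$?" > "$status_file"; } | tee "$log_file"
    exit_code=$(cat "$status_file")
    rm -f "$status_file"
    return "$exit_code"
}

# Example (run from the repository root):
#   run_logged hyperparameter_tune.log bash src/run_hipporag_hyperparameter_tune.sh
```

A plain `2>&1 | tee` pipeline would also capture the output, but it reports the exit status of `tee` rather than that of the script.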

## Question Answering

Please check `src/qa/README.md` for more details. QA can only be run after retrieval has been run for both the baselines and HippoRAG, since it uses the output of retrieval.

## Path-Finding Multi-Hop QA Case Study

To run the case study examples shown in our paper, which we also include in our data directory, run the following scripts. Note that running these examples requires setting your own OpenAI API key.

### Indexing

```shell
bash src/setup_hipporag_case_study.sh $GPU_DEVICES
```

### Retrieval

```shell
bash src/run_hipporag_case_study.sh
```

After running these, you can explore the outputs inside the `output/ircot/` directory.

## Citation

TBD