# ReMoDetect

This repository builds on the open-source [Fast-DetectGPT](https://github.com/baoguangsheng/fast-detect-gpt.git) project, and some source files retain its license notices. Follow the instructions below to set up and run the project.

## Prerequisites

- Ensure you have Python and Bash installed on your system.
- Download the cleaned full-text HC3 dataset in English from [YuchuanTian/AIGC_text_detector.git](https://github.com/YuchuanTian/AIGC_text_detector.git).

## Setup Instructions

1. **Download and Extract Dataset**

   Download the `unfilter_full/en_train_cleaned.csv` file from the repository linked above and place it in the `./data` directory.

2. **Configure Environment Variables**

   Set your Groq API key in the `data_process.sh` script.

   ```bash
   export GROQ_API_KEY="your_groq_api_key_here"
   ```

3. **Run Data Processing Script**

   Execute the `data_process.sh` script to process the dataset.

   ```bash
   bash data_process.sh
   ```

4. **Generate Evaluation Data (Optional)**
    
   For convenience, we provide evaluation data for gpt-3.5-turbo, gpt-4, gpt4turbo, llama-3-70b-instruct, gemini, claude opus, claude haiku, and claude sonnet.

   To generate the evaluation data yourself, place your Azure, Anthropic, and Groq API keys in the `gen_eval_data.sh` script.

   ```bash
   export AZURE_API_KEY="your_azure_api_key_here"
   export ANTHROPIC_API_KEY="your_anthropic_api_key_here"
   export GROQ_API_KEY="your_groq_api_key_here"
   ```

   Then, run the script:

   ```bash
   bash gen_eval_data.sh
   ```

5. **Evaluate the Model**

   Finally, run the `eval.sh` script to evaluate the model.

   ```bash
   bash eval.sh
   ```
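
The required steps above can be chained in a small wrapper so that a failure in one step stops the run before the next one starts. The following is a minimal sketch, not part of the repository: the helper names `dataset_ready` and `run_steps`, the `RUN_PIPELINE` switch, and the dataset path `./data/en_train_cleaned.csv` are assumptions made for illustration, so adjust the path if the CSV sits elsewhere under `./data`.

```bash
#!/usr/bin/env bash
# Sketch of a wrapper around the setup steps; names and paths are hypothetical.

# Succeeds only if the given file exists and is non-empty.
dataset_ready() {
  [ -s "$1" ]
}

# Runs each script in order with bash and stops at the first failure.
run_steps() {
  local script
  for script in "$@"; do
    echo "==> $script"
    bash "$script" || { echo "failed: $script" >&2; return 1; }
  done
}

# Entry point: only runs when RUN_PIPELINE=1, so the file can also be sourced.
if [ "${RUN_PIPELINE:-0}" = "1" ]; then
  # Assumed location of the downloaded CSV (see step 1).
  dataset_ready "./data/en_train_cleaned.csv" || { echo "dataset missing" >&2; exit 1; }
  # gen_eval_data.sh is optional and therefore omitted here.
  run_steps data_process.sh eval.sh
fi
```

If saved as, say, `run_all.sh` (a hypothetical file name), it could be invoked with `RUN_PIPELINE=1 bash run_all.sh` from the repository root after the API keys have been set in the scripts.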

## Machine Specifications

- **CPU:** Intel(R) Xeon(R) Gold 6426Y CPU @ 2.50GHz
- **GPU:** NVIDIA A6000 48GB
