# README for Comprehensive Model Training and Testing Framework

## Overview

The Python script `src/runner.py` is part of a training and testing framework for experiments that train neural networks (NNs) to answer most probable explanation (MPE) queries over probabilistic graphical models (PGMs). It is highly configurable and supports a variety of model types, training regimes, and optimization strategies across diverse tasks and datasets.

## Features

- **Multiple Model Support:** Supports several model types, such as standard neural networks, transformers, and autoencoders (AEs).
- **Dual Network Training:** Supports dual-network (teacher-student) setups, which the GUIDE framework uses for its training strategy.
- **PGM Integration:** Trains neural networks to answer MPE queries over different probabilistic graphical models.
- **Flexible Training and Testing Options:** Offers detailed control over epochs, batch sizes, learning rates, and the use of CUDA/MPS for training.
- **Extensive Debugging and Optimization Tools:** Includes options for debugging, early stopping, and gradient clipping to fine-tune the training process.

## Dependencies

- Python 3.6 or newer
- PyTorch
- Other machine learning libraries (e.g., NumPy, pandas, scikit-learn)

## Installation

To install the necessary Python packages, use pip:

```bash
pip install torch numpy pandas scikit-learn
```
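To confirm the installation worked, a quick check along these lines may help. It is a minimal sketch and not part of the repository: `check_deps` and `check_devices` are hypothetical helper names. It only tests whether each package imports and whether PyTorch reports a CUDA or MPS backend, which matters if you plan to train on a GPU or on Apple silicon.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repository): report which of the
# packages installed above can be imported. scikit-learn imports as "sklearn".
check_deps() {
  local missing=0
  local mod
  for mod in torch numpy pandas sklearn; do
    if python3 -c "import ${mod}" 2>/dev/null; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when any package is missing.
  [ "${missing}" -eq 0 ]
}

# Hypothetical helper: ask PyTorch which accelerators it can see.
check_devices() {
  python3 - <<'PY'
try:
    import torch
except ImportError:
    print("torch is not installed")
else:
    print("CUDA available:", torch.cuda.is_available())
    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    print("MPS available:", bool(mps is not None and mps.is_available()))
PY
}
```

Running `check_deps && check_devices` prints one line per package and, if everything imports, the accelerator status.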

## Usage

Run the script from the repository root, passing command-line arguments that control the training and testing process, such as the model type, dataset, task, and optimization strategy. The example commands below cover four configurations; replace each `<...>` placeholder with a value appropriate for your experiment.

### Running the SSMP Model with Traditional Inference

```bash
python3 src/runner.py \
  --no-debug \
  --query-prob="<query_prob>" \
  --model="<model>" \
  --dataset "<dataset>" \
  --task "<task>" \
  --student-layers "<layers>" \
  --model-type="<model_type>" \
  --lr-scheduler "<lr_scheduler>" \
  --input-type "<data_type>" \
  --embedding-type "<embedding_type>" \
  --embedding-size "<embedding_size>" \
  --experiment-dir "./experiments/" \
  --pgm "<pgm>" \
  --train-optimizer "<optimizer>" \
  --add-gradient-clipping \
  --pgm-model-directory "<saved-pgm-directory>" \
  --dataset-directory "<sampled-data-directory>"
```
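Rather than editing the placeholders inline each time, you can keep them in shell variables and inspect the final command before running it. The sketch below does this for the traditional-inference command above. The function names, the variable names, and the `DRY_RUN` switch are hypothetical conveniences, not options of `src/runner.py`; every default is the README placeholder and must be replaced for a real run.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the traditional-inference command.
# Each variable falls back to the README placeholder when unset.
build_ssmp_cmd() {
  CMD=(
    python3 src/runner.py
    --no-debug
    --query-prob="${QUERY_PROB:-<query_prob>}"
    --model="${MODEL:-<model>}"
    --dataset "${DATASET:-<dataset>}"
    --task "${TASK:-<task>}"
    --student-layers "${LAYERS:-<layers>}"
    --model-type="${MODEL_TYPE:-<model_type>}"
    --lr-scheduler "${LR_SCHEDULER:-<lr_scheduler>}"
    --input-type "${INPUT_TYPE:-<data_type>}"
    --embedding-type "${EMBEDDING_TYPE:-<embedding_type>}"
    --embedding-size "${EMBEDDING_SIZE:-<embedding_size>}"
    --experiment-dir "${EXPERIMENT_DIR:-./experiments/}"
    --pgm "${PGM:-<pgm>}"
    --train-optimizer "${OPTIMIZER:-<optimizer>}"
    --add-gradient-clipping
    --pgm-model-directory "${PGM_DIR:-<saved-pgm-directory>}"
    --dataset-directory "${DATA_DIR:-<sampled-data-directory>}"
  )
}

# With DRY_RUN=1 the command is printed (shell-quoted) instead of executed.
run_ssmp() {
  build_ssmp_cmd
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%q ' "${CMD[@]}"
    printf '\n'
  else
    "${CMD[@]}"
  fi
}
```

For example, `DRY_RUN=1 DATASET=mydata run_ssmp` prints the assembled command without running it, where `mydata` stands in for a real dataset name. Building the command as a Bash array keeps values containing spaces intact as single arguments.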

### Running the SSMP Model with ITSELF Inference

```bash
python3 src/runner.py \
  --no-debug \
  --query-prob="<query_prob>" \
  --model="<model>" \
  --dataset "<dataset>" \
  --task "<task>" \
  --student-layers "<layers>" \
  --model-type="<model_type>" \
  --lr-scheduler "<lr_scheduler>" \
  --input-type "<data_type>" \
  --embedding-type "<embedding_type>" \
  --embedding-size "<embedding_size>" \
  --train-on-test-set  \
  --num-iter-train-on-test 10 \
  --experiment-dir "./experiments/" \
  --pgm "<pgm>" \
  --train-optimizer "<train-optimizer>" \
  --test-optimizer "<test-optimizer>" \
  --add-gradient-clipping \
  --pgm-model-directory "<saved-pgm-directory>" \
  --dataset-directory "<sampled-data-directory>"
```
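Compared with the traditional-inference command, the ITSELF variant adds three options: `--train-on-test-set`, `--num-iter-train-on-test`, and `--test-optimizer`. The sketch below makes that difference explicit by appending those options to an existing argument list. The `add_itself_flags` function is hypothetical, the iteration count of 10 simply mirrors the example above, and the sketch assumes `src/runner.py` does not depend on the order of its options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a base argument list and append the
# ITSELF-specific options shown in the command above.
#   $1: value for --num-iter-train-on-test
#   $2: value for --test-optimizer
#   remaining arguments: the base command
add_itself_flags() {
  local iters="$1" test_opt="$2"
  shift 2
  ITSELF_CMD=("$@"
    --train-on-test-set
    --num-iter-train-on-test "${iters}"
    --test-optimizer "${test_opt}")
}

# Example with a shortened base command (placeholders only).
BASE=(python3 src/runner.py --pgm "<pgm>" --train-optimizer "<train-optimizer>")
add_itself_flags 10 "<test-optimizer>" "${BASE[@]}"
printf '%q ' "${ITSELF_CMD[@]}"; printf '\n'
```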

### Running the GUIDE Model

```bash
python3 src/runner.py \
  --query-prob="<query_prob>" \
  --model="<model>" \
  --dataset "<dataset>" \
  --task "<task>" \
  --student-layers "<layers>" \
  --teacher-layers "<layers>" \
  --model-type="<model_type>" \
  --input-type "<data_type>" \
  --embedding-type "<embedding_type>" \
  --experiment-dir "./experiments/" \
  --pgm "<pgm>" \
  --dual-network \
  --tot-train-dn 100 \
  --copy-student-to-teacher-dn \
  --pgm-model-directory "<saved-pgm-directory>" \
  --dataset-directory "<sampled-data-directory>"
```
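To run the GUIDE command above on several datasets in one go, a loop like the following may be convenient. It is a sketch under assumptions: `run_guide_sweep`, `LOG_DIR`, and `DRY_RUN` are hypothetical names, the dataset arguments you pass are stand-ins for real dataset names, and the log files capture only the script's console output, separately from whatever `src/runner.py` writes to the experiment directory.

```bash
#!/usr/bin/env bash
# Hypothetical batch driver: one GUIDE run per dataset argument, with
# console output copied to "$LOG_DIR/<dataset>.log" (default ./logs).
run_guide_sweep() {
  local log_dir="${LOG_DIR:-./logs}"
  local ds
  local -a cmd
  mkdir -p "${log_dir}"
  for ds in "$@"; do
    echo "=== GUIDE run: ${ds} ==="
    cmd=(python3 src/runner.py
      --query-prob="<query_prob>" --model="<model>"
      --dataset "${ds}" --task "<task>"
      --student-layers "<layers>" --teacher-layers "<layers>"
      --model-type="<model_type>" --input-type "<data_type>"
      --embedding-type "<embedding_type>"
      --experiment-dir "./experiments/" --pgm "<pgm>"
      --dual-network --tot-train-dn 100 --copy-student-to-teacher-dn
      --pgm-model-directory "<saved-pgm-directory>"
      --dataset-directory "<sampled-data-directory>")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only record the command that would run.
      printf '%q ' "${cmd[@]}" | tee "${log_dir}/${ds}.log"
      echo
    else
      "${cmd[@]}" 2>&1 | tee "${log_dir}/${ds}.log"
      # PIPESTATUS[0] is the runner's exit status, not tee's.
      if [ "${PIPESTATUS[0]}" -ne 0 ]; then
        echo "run failed for ${ds}" >&2
      fi
    fi
  done
}
```

Replace the remaining `<...>` placeholders inside the function before a real run; a failed run is reported but does not stop the remaining datasets.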

### Running the GUIDE Model with ITSELF Inference

```bash
python3 src/runner.py \
  --query-prob="<query_prob>" \
  --model="<model>" \
  --dataset "<dataset>" \
  --task "<task>" \
  --student-layers "<layers>" \
  --teacher-layers "<layers>" \
  --model-type="<model_type>" \
  --train-on-test-set \
  --num-iter-train-on-test 100 \
  --experiment-dir "./experiments/" \
  --pgm "<pgm>" \
  --dual-network \
  --tot-train-dn 100 \
  --copy-student-to-teacher-dn \
  --pgm-model-directory "<saved-pgm-directory>" \
  --dataset-directory "<sampled-data-directory>"
```

## Output

- The script writes logs, performance metrics, and model checkpoints to the directory given by `--experiment-dir`.
- The reports summarize model performance across several metrics, which helps you compare the effect of different configurations and optimizations.
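Because the exact file layout under the experiment directory is not documented here, a layout-agnostic way to see what a run just produced is to list recently modified files. The `recent_outputs` helper below is hypothetical; it relies only on the standard `find` options `-type f` and `-mmin`, which both GNU and BSD `find` support.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list files under the experiment directory that were
# modified in the last N minutes. Makes no assumption about file names.
#   $1: experiment directory (default ./experiments/)
#   $2: look-back window in minutes (default 60)
recent_outputs() {
  local dir="${1:-./experiments/}" minutes="${2:-60}"
  find "${dir}" -type f -mmin "-${minutes}" | sort
}
```

For example, `recent_outputs ./experiments/ 30` lists everything written in the last half hour.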

## Note

Ensure that all paths and environment variables match your system and dataset configuration. Before running the script, verify that the required libraries and dependencies are installed and properly configured.
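A short pre-flight check can catch path mistakes before a long run starts. The `preflight` function below is a hypothetical sketch: it checks that the directories you intend to pass to `--pgm-model-directory` and `--dataset-directory` exist, that `python3` is on the `PATH`, and that the current directory is the repository root (where `src/runner.py` lives).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; returns non-zero if any check fails.
#   $1: directory for --pgm-model-directory
#   $2: directory for --dataset-directory
preflight() {
  local pgm_dir="$1" data_dir="$2"
  local ok=0 d
  for d in "${pgm_dir}" "${data_dir}"; do
    if [ ! -d "${d}" ]; then
      echo "missing directory: ${d}" >&2
      ok=1
    fi
  done
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found on PATH" >&2
    ok=1
  fi
  if [ ! -f src/runner.py ]; then
    echo "src/runner.py not found; run from the repository root" >&2
    ok=1
  fi
  return "${ok}"
}
```

Chain it in front of the real command so the run only starts when every check passes, e.g. `preflight "$PGM_DIR" "$DATA_DIR" && python3 src/runner.py <arguments>`, where `PGM_DIR` and `DATA_DIR` are variables you set yourself.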
