
# Train.py

This Python script trains an Autoencoder model on a given dataset using PyTorch.

## Dependencies

- Python 3.6+
- PyTorch
- Weights & Biases (wandb)
- tqdm
- loguru

## Usage
To run the script, use the following command:

```bash
python script.py --dataset_name <dataset_name> --dataset_directory <dataset_directory> \
    [--batch-size <batch_size>] [--num-workers <num_workers>] \
    [--num-hidden-layers <num_hidden_layers>] [--encoding-size <encoding_size>] \
    [--lr <learning_rate>] [--num-epochs <num_epochs>] [--no-cuda] [--no-mps]
```

### Arguments

- `--dataset_name` (required): The name of the dataset to train on.
- `--dataset_directory` (required): The path to the directory containing the dataset.
- `--batch-size` (optional, default: 512): The batch size used for training the model.
- `--num-workers` (optional, default: 4): The number of worker processes used for data loading.
- `--num-hidden-layers` (optional, default: 4): The number of hidden layers in the model.
- `--encoding-size` (optional, default: 512): The size of the encoding layer in the model.
- `--lr` (optional, default: 1e-3): The learning rate used for training the model.
- `--num-epochs` (optional, default: 5): The number of epochs to train the model.
- `--no-cuda` (optional): Disables CUDA training. If this flag is not set, CUDA is used when available.
- `--no-mps` (optional): Disables training on the macOS GPU via PyTorch's MPS (Metal Performance Shaders) backend. If this flag is not set, MPS is used when available.
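The argument list above maps directly onto Python's standard `argparse` module. The following is a hypothetical reconstruction for illustration only, not the script's actual parser; the `build_parser` name and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented command-line interface."""
    parser = argparse.ArgumentParser(description="Train an autoencoder.")
    # Required arguments (note the underscores in these two flag names).
    parser.add_argument("--dataset_name", required=True, help="Name of the dataset to train on.")
    parser.add_argument("--dataset_directory", required=True, help="Directory containing the dataset.")
    # Optional hyperparameters with the documented defaults.
    parser.add_argument("--batch-size", type=int, default=512, help="Training batch size.")
    parser.add_argument("--num-workers", type=int, default=4, help="Data-loading worker processes.")
    parser.add_argument("--num-hidden-layers", type=int, default=4, help="Number of hidden layers.")
    parser.add_argument("--encoding-size", type=int, default=512, help="Size of the encoding layer.")
    parser.add_argument("--lr", type=float, default=1e-3, help="Learning rate.")
    parser.add_argument("--num-epochs", type=int, default=5, help="Number of training epochs.")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--no-cuda", action="store_true", help="Disable CUDA training.")
    parser.add_argument("--no-mps", action="store_true", help="Disable macOS GPU (MPS) training.")
    return parser
```

`argparse` converts hyphens in flag names to underscores, so `--batch-size` is read as `args.batch_size`, while `--dataset_name` keeps its name as `args.dataset_name`.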

## Example

To train on a dataset named "my_dataset" located in the directory "/path/to/dataset" with the default settings, run:

```bash
python script.py --dataset_name my_dataset --dataset_directory /path/to/dataset
```

To customize the training parameters, you can provide additional arguments. For example:

```bash
python script.py --dataset_name my_dataset --dataset_directory /path/to/dataset --batch-size 256 --num-hidden-layers 6 --lr 0.001 --num-epochs 10
```

This sets the batch size to 256, the number of hidden layers to 6, the learning rate to 0.001 (the same as the default, so this flag could be omitted), and the number of training epochs to 10.

## Disabling CUDA or macOS GPU

By default, the script will use CUDA for training if a CUDA-enabled GPU is available. To disable CUDA and run the training on the CPU, use the `--no-cuda` flag:

```bash
python script.py --dataset_name my_dataset --dataset_directory /path/to/dataset --no-cuda
```

Similarly, on a macOS system whose GPU is supported by PyTorch's MPS (Metal Performance Shaders) backend, the script trains on the GPU by default. To disable this, use the `--no-mps` flag:

```bash
python script.py --dataset_name my_dataset --dataset_directory /path/to/dataset --no-mps
```
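The device-selection behavior described above can be sketched as a small pure function. It assumes CUDA takes precedence over MPS, which matches the official PyTorch MNIST example that defines the same `--no-cuda` and `--no-mps` flags; the actual script may differ, and `select_device_name` is a hypothetical name.

```python
def select_device_name(no_cuda: bool, no_mps: bool,
                       cuda_available: bool, mps_available: bool) -> str:
    """Return "cuda", "mps", or "cpu" (assumed precedence: CUDA, then MPS, then CPU)."""
    # Use CUDA unless it was disabled with --no-cuda or no CUDA GPU is present.
    if not no_cuda and cuda_available:
        return "cuda"
    # Otherwise fall back to the macOS GPU unless --no-mps was given.
    if not no_mps and mps_available:
        return "mps"
    # With both accelerators disabled or unavailable, train on the CPU.
    return "cpu"
```

In a PyTorch script the availability arguments would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()` (the latter requires PyTorch 1.12 or newer), and the result would be wrapped as `torch.device(...)`. Keeping the decision separate from the `torch` calls makes it easy to check without a GPU. Under this logic, passing both `--no-cuda` and `--no-mps` forces CPU training.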

## Python Path Setup

The script imports from the `project_utils` module, so the directory containing `project_utils` must be added to the system path (`sys.path`) before those imports run:

```python
sys.path.append(
    # Add the path of the anympe directory here
)  # Adds the parent directory to the system path
```
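Rather than hard-coding the path, a script can derive it from its own location. The sketch below assumes the directory containing `project_utils` sits directly above the directory holding the training script; `add_project_root` and its `levels_up` parameter are hypothetical names, not part of the project.

```python
import sys
from pathlib import Path


def add_project_root(script_path: str, levels_up: int = 1) -> Path:
    """Append an ancestor of the script's directory to sys.path.

    levels_up=0 is the script's own directory, levels_up=1 its parent, and so on.
    Assumption: the directory containing `project_utils` is `levels_up` levels up.
    """
    # parents[0] is the directory holding the script, parents[1] its parent, ...
    root = Path(script_path).resolve().parents[levels_up]
    root_str = str(root)
    # Skip the append if the entry is already present, avoiding duplicates.
    if root_str not in sys.path:
        sys.path.append(root_str)
    return root
```

At the top of the training script this would be called as `add_project_root(__file__)`, before any `from project_utils ...` import.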



## Model

The script trains the Autoencoder model defined in `ae.py`. The model's input size is twice the number of variables in the training data.
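The sketch below shows one way the input size, `--num-hidden-layers`, and `--encoding-size` could combine into layer widths. Only the "twice the number of variables" rule comes from this README; the linear interpolation of widths and the mirrored decoder are assumptions for illustration, not the architecture in `ae.py`, and the function names are hypothetical.

```python
from typing import List


def model_input_size(num_variables: int) -> int:
    # Per the README: the input size is twice the number of variables.
    return 2 * num_variables


def encoder_widths(input_dim: int, num_hidden_layers: int, encoding_size: int) -> List[int]:
    """Hypothetical layer plan: widths change linearly from the input size to the
    encoding size, with `num_hidden_layers` hidden layers in between."""
    if num_hidden_layers < 0:
        raise ValueError("num_hidden_layers must be non-negative")
    steps = num_hidden_layers + 1
    # i = 0 gives input_dim and i = steps gives encoding_size.
    return [
        round(input_dim + (encoding_size - input_dim) * i / steps)
        for i in range(steps + 1)
    ]


def decoder_widths(input_dim: int, num_hidden_layers: int, encoding_size: int) -> List[int]:
    # Mirror the encoder so the reconstruction has the same size as the input.
    return list(reversed(encoder_widths(input_dim, num_hidden_layers, encoding_size)))
```

For example, 1024 variables give an input size of 2048, and with 4 hidden layers and an encoding size of 512 this plan yields encoder widths `[2048, 1741, 1434, 1126, 819, 512]`. In PyTorch, each consecutive pair of widths could back an `nn.Linear(in_features, out_features)` layer with an activation in between.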

## Dataloader

The script uses a custom dataloader defined in `dataloader.py` to load the training and validation data in batches.
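The custom dataloader's internals are not documented here, so the following is only a generic, dependency-free sketch of the two jobs this paragraph mentions: splitting samples into training and validation sets and yielding fixed-size batches. The split fraction, seed, and function names are hypothetical. In a PyTorch script this role is normally filled by `torch.utils.data.DataLoader`, whose `batch_size` and `num_workers` arguments presumably receive the `--batch-size` and `--num-workers` values.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(samples: Sequence[T], val_fraction: float = 0.1,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed and hold out `val_fraction` for validation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(samples) * val_fraction)
    val = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, val


def iter_batches(samples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches; the last one may be smaller than batch_size."""
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])
```

With 900 training samples and the default batch size of 512, an epoch consists of two batches: one of 512 samples and one of 388.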

## Logging

The script uses the `loguru` library for logging and Weights & Biases (`wandb`) for experiment tracking. Both are set up by the `init_logger_and_wandb` function from the `project_utils.logging_utils` module.
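The actual `init_logger_and_wandb` function is not shown here. The sketch below is a hypothetical stand-in illustrating what such a helper commonly does: add a `loguru` file sink and start a W&B run that records the parsed arguments as its config. The `project` and `log_file` parameters and both function names are assumptions.

```python
import argparse
from typing import Any, Dict


def run_config(args: argparse.Namespace) -> Dict[str, Any]:
    """Copy the parsed CLI arguments into a plain dict for experiment tracking."""
    return dict(vars(args))


def init_logger_and_wandb_sketch(args: argparse.Namespace, project: str, log_file: str):
    """Hypothetical stand-in for project_utils.logging_utils.init_logger_and_wandb."""
    # Deferred imports: this module can be imported without loguru or wandb installed.
    from loguru import logger
    import wandb

    # Write log records to a file in addition to loguru's default stderr sink.
    logger.add(log_file, level="INFO")
    # Store the hyperparameters with the run so experiments can be compared in W&B.
    run = wandb.init(project=project, config=run_config(args))
    logger.info("Initialized W&B run for project {}", project)
    return logger, run
```

`run_config` returns a copy, so code that edits the tracked config cannot accidentally change the `args` object used for training.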