This is the code for the project low-pass filtered differentially private optimization.

Key requirements:
* deepspeed
* fast-differential-privacy
* pytorch
* opacus
* wandb
* torchvision
* timm

The repository provides three optimizers:
### LPSGD
* Initialization parameters:
    * params: model parameters
    * lr: float, learning rate
    * weight_decay: float, weight decay coefficient
    * a: list of float, momentum (filter output) coefficients
    * b: list of float, gradient (filter input) coefficients
    * c: float | None, Adam second-moment decay coefficient; None disables Adam scaling
* Update:
    * $x_{t+1} = (1-\text{weight\_decay})\, x_t - \text{lr}\, \hat{y}_t$
    * $\sum_{i=0}^{n_a-1} a_i\, y_{t-i} = \sum_{i=0}^{n_b-1} b_i\, \tilde{g}_{t-i}$, where $n_a$ and $n_b$ are the lengths of `a` and `b`, and $\tilde{g}_t$ is the privatized gradient
    * If $c$ is not None, Adam-style scaling is used: $v_t = c\, v_{t-1} + (1-c)\, g_t \circ g_t$, $\hat{v}_t = v_t/(1-c^t)$
    * $\hat{y}_t=\frac{y_t/\text{norm\_factor}}{\max\{\sqrt{\hat{v}_t},\epsilon\}}$
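The LPSGD update above can be sketched in pure Python for a flat list of scalar parameters. This is a hedged illustration, not the repository's implementation: the class name `LPSGDSketch`, zero initial filter state, passing `norm_factor` in explicitly (the README does not define it), feeding the privatized gradient into the Adam second moment, `eps=1e-8`, and using no denominator when `c` is None are all assumptions.

```python
import math
from collections import deque


class LPSGDSketch:
    """Hypothetical pure-Python sketch of the LPSGD update (not the repo's code).

    Assumptions: a[0] != 0, zero initial filter state, norm_factor supplied by
    the caller, and no Adam denominator when c is None.
    """

    def __init__(self, params, lr, weight_decay, a, b, c=None,
                 norm_factor=1.0, eps=1e-8):
        if a[0] == 0:
            raise ValueError("a[0] must be nonzero to solve for y_t")
        self.params = list(params)
        self.lr, self.wd = lr, weight_decay
        self.a, self.b, self.c = list(a), list(b), c
        self.norm_factor, self.eps = norm_factor, eps
        # g_hist[i] holds g~_{t-i}; y_hist[i] holds y_{t-1-i}
        self.g_hist = deque(maxlen=len(self.b))
        self.y_hist = deque(maxlen=max(len(self.a) - 1, 1))
        self.v = [0.0] * len(self.params)  # Adam second moment
        self.t = 0

    def step(self, noisy_grad):
        """Apply one update given the privatized gradient g~_t (list of floats)."""
        self.t += 1
        self.g_hist.appendleft(list(noisy_grad))
        y = []
        for k in range(len(self.params)):
            # Solve the filter recurrence for y_t:
            # a_0 y_t = sum_i b_i g~_{t-i} - sum_{i>=1} a_i y_{t-i}
            # zip() stops at the shorter sequence, so missing history counts as 0.
            acc = sum(bi * g[k] for bi, g in zip(self.b, self.g_hist))
            acc -= sum(ai * yp[k] for ai, yp in zip(self.a[1:], self.y_hist))
            y.append(acc / self.a[0])
        self.y_hist.appendleft(y)

        for k, yk in enumerate(y):
            denom = 1.0
            if self.c is not None:
                # Adam second moment with bias correction
                gk = noisy_grad[k]
                self.v[k] = self.c * self.v[k] + (1 - self.c) * gk * gk
                v_hat = self.v[k] / (1 - self.c ** self.t)
                denom = max(math.sqrt(v_hat), self.eps)
            y_hat = (yk / self.norm_factor) / denom
            # Decoupled weight decay followed by the filtered step
            self.params[k] = (1 - self.wd) * self.params[k] - self.lr * y_hat
        return self.params
```

With `a=[1.0]`, `b=[1.0]` and `c=None` the sketch reduces to plain SGD with weight decay; with `a=[1.0, -0.9]` and `b=[1.0]` it reduces to the momentum recurrence $y_t = \tilde{g}_t + 0.9\, y_{t-1}$.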
### GaLoreLPSGD
* Initialization parameters:
    * Same as LPSGD.
### LMSSGD
* To be added.


### Running scripts
An example of training with LPSGD on CIFAR-100:
```console
python ./run_LPSGD.py \
    --tag test --log_type file --log_freq 5 \
    --bs 1000 --mnbs 100 --data cifar100 \
    --algo sgd --lr 0.2 --epoch 150 --scheduler \
    --clipping --noise -1 --epsilon 8 \
    --coef_file ./coefs/PID.csv
```

Suggested parameters:
* `--bs`: 1000
* `--lr`: 0.5 for SGD, 0.001 for Adam
* `--coef_file`: `PID.csv`
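The coefficient file supplies the filter coefficients `a` and `b`. Since the README does not specify the CSV layout, the hedged sketch below takes `a` and `b` as Python lists directly (the helper name `frequency_response` is hypothetical) and evaluates the filter's frequency response $H(e^{i\omega}) = \sum_i b_i e^{-i\omega i} / \sum_i a_i e^{-i\omega i}$, which shows how strongly each gradient frequency is passed through.

```python
import cmath
import math


def frequency_response(a, b, omega):
    """H(e^{i*omega}) of the filter sum_i a_i y_{t-i} = sum_i b_i g_{t-i}."""
    z = cmath.exp(-1j * omega)  # one-step delay evaluated at frequency omega
    num = sum(bi * z ** i for i, bi in enumerate(b))
    den = sum(ai * z ** i for i, ai in enumerate(a))
    return num / den


# Example: the momentum filter y_t = g_t + 0.9 y_{t-1}
a, b = [1.0, -0.9], [1.0]
for omega in (0.0, math.pi / 2, math.pi):
    print(f"omega={omega:.2f}  |H|={abs(frequency_response(a, b, omega)):.3f}")
```

For this momentum filter $|H|$ is 10 at $\omega = 0$ and about 0.53 at $\omega = \pi$, so slowly varying gradient components are amplified relative to rapidly alternating ones, which is the low-pass behaviour. The gain at $\omega = 0$ equals $\sum_i b_i / \sum_i a_i$, the factor by which a constant gradient is scaled.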

Other arguments:
* `--epoch`: 150 to 200 for pretraining on CIFAR, 10 for fine-tuning
* `--scheduler`: use a cosine annealing learning rate scheduler (suggested for pretraining)
* `--pretrained`: use pretrained weights
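For reference, the standard closed form of cosine annealing without restarts (the formula PyTorch documents for `torch.optim.lr_scheduler.CosineAnnealingLR`) is sketched below. How the script configures its scheduler is not stated here, so stepping once per epoch, `total_epochs` equal to `--epoch`, and a minimum learning rate of 0 are assumptions, and `cosine_annealing_lr` is a hypothetical helper name.

```python
import math


def cosine_annealing_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Closed-form cosine annealing (no restarts), clamped at total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + (base_lr - min_lr) * (1 + math.cos(math.pi * progress)) / 2


# Learning rate at a few epochs for --lr 0.5 --epoch 150 (under the assumptions above)
for ep in (0, 75, 150):
    print(ep, round(cosine_annealing_lr(0.5, ep, 150), 4))
```

Under these assumptions, `--lr 0.5 --epoch 150` starts at 0.5, is about 0.25 halfway through, and decays to 0 at the final epoch.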