# Code for Paper 'On Convergence of Nearest Neighbor Classifiers over Feature Transformations'

This folder contains all the code to reproduce the experimental results for the submission 'On Convergence of Nearest Neighbor Classifiers over Feature Transformations'.

## Requirements

Running the code requires several dependencies. The Python requirements are listed in the "requirements.txt" file. We suggest creating an Anaconda virtual environment with the following command:

- conda env create -f environment.yml

Then activate the environment, if needed, by running:

- conda activate convergence_knn

## Export the representations

### Vision Datasets

To export the representations of all the computer vision datasets, run the following commands:

- bash scripts/export/mnist.sh
- bash scripts/export/cifar10.sh
- bash scripts/export/cifar100.sh

### NLP Datasets

To export the NLP dataset representations, start by exporting the "raw" bag-of-words (BOW) and BOW-TFIDF representations for all the datasets with the following commands. For _YELP_, make sure that you have downloaded the Yelp reviews to "~/Downloads/yelp_academic_dataset_review.json".

- python tools/datasets/imdb/generate_bag_of_words.py
- python tools/datasets/imdb/generate_bag_of_words_tfidf.py
- python tools/datasets/sst2/generate_bag_of_words.py
- python tools/datasets/sst2/generate_bag_of_words_tfidf.py
- python tools/datasets/yelp/generate_bag_of_words.py
- python tools/datasets/yelp/generate_bag_of_words_tfidf.py
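
For readers unfamiliar with the two "raw" representations, the following is a minimal, standard-library sketch of bag-of-words counts and a plain TF-IDF reweighting. It is an illustration only: the function names, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are assumptions and may differ from what the scripts above actually do.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tfidf(vectors):
    """Reweight count vectors with idf = log(N / df) (assumed, unsmoothed variant)."""
    n_docs = len(vectors)
    n_terms = len(vectors[0])
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[c * w for c, w in zip(vec, idf)] for vec in vectors]


# Toy corpus, made up for illustration.
docs = ["a great movie", "a dull movie", "great acting"]
vocab, counts = bag_of_words(docs)
weighted = tfidf(counts)
print(vocab)
print(counts[0])
```

Terms that occur in every document receive a weight of zero under this variant, which is one reason smoothed formulas are common in practice.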

After having exported the "raw" features, proceed by exporting the representations:

- bash scripts/export/imdb.sh
- bash scripts/export/sst2.sh
- bash scripts/export/yelp.sh

## Calculate the kNN accuracies

The scripts in the folder "scripts/knn_accuracy" point to the library in the "lib" folder, which calculates the accuracy of the kNN classifier (and a second value, which is not used in this context). Run everything with the following commands:

- bash scripts/knn_accuracy/mnist.sh
- bash scripts/knn_accuracy/cifar10.sh
- bash scripts/knn_accuracy/cifar100.sh
- bash scripts/knn_accuracy/imdb.sh
- bash scripts/knn_accuracy/sst2.sh
- bash scripts/knn_accuracy/yelp.sh
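
As a rough picture of the quantity these scripts report, here is a minimal sketch of kNN accuracy on exported feature vectors. It does not use the library in "lib": the function names, the Euclidean distance, the majority vote, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_x, train_y, test_x, test_y, k=1):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y)
    )
    return correct / len(test_y)


# Toy 2-D "representations": two well-separated clusters.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.2, 0.1], [5.5, 5.5], [2, 2]]
test_y = [0, 1, 1]

acc = knn_accuracy(train_x, train_y, test_x, test_y, k=3)
print(f"kNN accuracy: {acc:.3f}")
```

The third test point lies closer to the first cluster but carries label 1, so it is misclassified and the accuracy falls below 1.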

## Fine-tune the logistic regression models

Train the logistic regression (LR) models and report the accuracies using:

- bash scripts/fine_tune/mnist.sh
- bash scripts/fine_tune/cifar10.sh
- bash scripts/fine_tune/cifar100.sh
- bash scripts/fine_tune/imdb.sh
- bash scripts/fine_tune/sst2.sh
- bash scripts/fine_tune/yelp.sh
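
To make the idea of training an LR model on top of fixed representations concrete, here is a small sketch of binary logistic regression fitted by full-batch gradient descent. It is not the training code behind the scripts above: the function names, learning rate, epoch count, and toy features are assumptions, and the actual experiments may use a different optimizer and a multi-class model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Fit weights and bias by full-batch gradient descent on the mean log loss."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of points classified correctly with a decision threshold at logit 0."""
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Toy fixed features (stand-ins for exported representations) with binary labels.
features = [[0.0, 0.2], [0.1, 0.0], [0.3, 0.1], [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(features, labels)
train_acc = accuracy(w, b, features, labels)
print(f"training accuracy: {train_acc:.2f}")
```

In a real run the accuracy would be reported on held-out data rather than on the training points used here.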

## Calculate the values for the convergence plots

Calculate the kNN accuracies (first value in the resulting CSV files) for all the representations and datasets using:

- bash scripts/estimate_convergence/mnist.sh
- bash scripts/estimate_convergence/cifar10.sh
- bash scripts/estimate_convergence/cifar100.sh
- bash scripts/estimate_convergence/imdb.sh
- bash scripts/estimate_convergence/sst2.sh
- bash scripts/estimate_convergence/yelp.sh
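
The sketch below illustrates one plausible reading of a convergence curve: the 1-NN accuracy on a fixed test set as the number of training samples grows. This reading, the function names, the subset sizes, and the synthetic Gaussian data are all assumptions for illustration, not a description of what the scripts above compute.

```python
import math
import random


def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbor classifier (Euclidean distance)."""
    correct = 0
    for x, y in test:
        _, label = min(train, key=lambda p: math.dist(p[0], x))
        correct += label == y
    return correct / len(test)


def convergence_curve(train, test, sizes, seed=0):
    """1-NN accuracy on `test` using nested random subsets of `train`."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    return [(n, one_nn_accuracy(shuffled[:n], test)) for n in sizes]


def sample(rng, n):
    """Draw n labeled points from two overlapping 2-D Gaussian blobs (synthetic)."""
    points = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 2.0 * label
        points.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return points


rng = random.Random(0)
train, test = sample(rng, 400), sample(rng, 200)
curve = convergence_curve(train, test, sizes=[10, 50, 100, 400])
for n, acc in curve:
    print(n, round(acc, 3))
```

Using nested subsets of a single shuffle keeps the points of the curve comparable, since each larger training set contains all smaller ones.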

## Plot the results

The previous scripts write their results as CSV files to the "results" folder.
Use the Jupyter notebook "notebooks/Eval_Results.ipynb" to generate the plots used in the paper and appendix, or to inspect the already provided results.
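
For a quick look at the numbers without opening the notebook, a sketch along the following lines could collect the first value of every row from each CSV file in a folder. The file layout is an assumption (comma-separated, no header row, numeric first column), as are the function name and the demo file; adapt it to the actual files in "results".

```python
import csv
import tempfile
from pathlib import Path


def first_values(results_dir):
    """Map each CSV file name to the list of first-column values of its rows."""
    collected = {}
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="") as handle:
            rows = [row for row in csv.reader(handle) if row]
        # Assumption: no header row and a numeric first column.
        collected[path.name] = [float(row[0]) for row in rows]
    return collected


# Demonstration on a temporary folder with a made-up file, so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo.csv").write_text("0.91,0.12\n0.93,0.10\n")
    values = first_values(tmp)

print(values)
```

To apply it to the real output, call `first_values("results")` from the repository root.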
