# Knowledge Extraction with No Observable Data

This is the code repository for "Knowledge Extraction with No Observable Data,"
submitted to NeurIPS 2019. This software is free of charge for research purposes.
For commercial use, please contact the authors.

## 1. Abstract

Knowledge distillation transfers the knowledge of a large neural network into a
smaller one and has been shown to be effective, especially when the amount of
training data is limited or the size of the student model is very small. To
transfer the knowledge, it is essential to observe the data that have been used
to train the network since its knowledge is concentrated on a narrow manifold
rather than the whole input space. However, the data are not accessible in many
cases due to privacy or confidentiality issues in medical, industrial, and
military domains. To the best of our knowledge, there has been no approach that
distills the knowledge of a neural network when no data are observable. In this
work, we propose KegNet (Knowledge Extraction with No Observable Data), a novel
approach to extract the knowledge of a trained deep neural network and to
generate artificial data points that replace the missing training data in
knowledge distillation. Experiments show that KegNet outperforms all baselines
for data-free knowledge distillation.
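
To make the distillation step mentioned above concrete, here is a minimal sketch of one common distillation objective: the KL divergence between the temperature-softened outputs of a teacher and a student. It uses only the Python standard library and is not taken from this repository; the names `log_softmax` and `distillation_loss` and the `temperature` parameter are assumptions made for illustration, and the loss actually used by KegNet may differ.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-scaled softmax (hypothetical helper)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between the softened output distributions.

    Illustrative only: this is a generic distillation objective, not
    necessarily the one implemented in src/.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when the student reproduces the teacher's output distribution exactly and grows as the two distributions diverge. In a data-free setting, the inputs fed to both networks would be generated rather than drawn from the original training set.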

## 2. Overview

This repository contains the source code of KegNet that was used for the
experiments in the paper. All code is written in Python 3.6, and the deep neural
networks are implemented in PyTorch 1.0.1.

The repository is structured as follows:
- `src/` contains the Python source code.
- `pretrained/` contains pretrained classifiers for the image datasets.
- `demo.sh` runs the main script.
- `requirements.txt` lists the required Python packages.
- `README.md` is this file.

## 3. How to Run

Run `bash demo.sh` to launch the demo script. It trains a generator for MNIST
and uses it to train a student network by knowledge distillation. All results
are stored in `out`, including the trained networks and the intermediate loss
and accuracy values recorded during training. You can modify `src/main.py` to
change datasets, models, and other hyperparameters for your own experiments. The
given setting is the same as the one used in the paper.
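
After the demo finishes, it can be handy to see which files were written. The snippet below is a small sketch that lists every file under the output directory; it assumes only that results are placed in `out` as described above and makes no assumption about the file names or layout inside it. The helper name `list_outputs` is hypothetical and not part of this repository.

```python
from pathlib import Path


def list_outputs(root="out"):
    """Return sorted relative paths of all files under `root`.

    Returns an empty list if the directory does not exist, for example
    when the demo has not been run yet.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        path.relative_to(root_path).as_posix()
        for path in root_path.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    for name in list_outputs():
        print(name)
```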
