# Learning Neural Acoustic Fields
Code release for: **Learning Neural Acoustic Fields**

### Abstract
Our environment is filled with rich and dynamic acoustic information. When we walk into a cathedral, the reverberations, as much as the appearance, inform us of the sanctuary's wide open space. Similarly, as an object moves around us, we expect the sound emitted to also exhibit this movement. While recent advances in learned implicit functions have led to increasingly higher quality representations of the visual world, there have not been commensurate advances in learning spatial auditory representations. To address this gap, we introduce Neural Acoustic Fields (NAFs), an implicit representation that captures how sounds propagate in a physical scene. By modeling acoustic propagation in a scene as a linear time-invariant system, NAFs learn to continuously map all emitter and listener location pairs to a neural impulse response function that can then be applied to arbitrary sounds. We demonstrate that the continuous nature of NAFs enables us to render spatial acoustics for a listener at an arbitrary location and to predict sound propagation at novel locations. We further show that the representation learned by NAFs can help improve visual learning with sparse views. Finally, we show that a representation informative of scene structure emerges during the learning of NAFs.

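Because NAFs model propagation as a linear time-invariant (LTI) system, rendering a sound at a listener amounts to convolving the dry source signal with the impulse response for that emitter-listener pair. The sketch below illustrates only that final step, assuming a time-domain impulse response `ir` is already available; `apply_impulse_response` is a hypothetical helper, not part of this codebase.

```python
import numpy as np


def apply_impulse_response(dry, ir):
    """Render a dry (anechoic) signal through an LTI system with impulse response `ir`.

    Hypothetical helper for illustration; both inputs are 1-D arrays at the same sample rate.
    """
    dry = np.asarray(dry, dtype=np.float64)
    ir = np.asarray(ir, dtype=np.float64)
    # Full linear convolution: output length is len(dry) + len(ir) - 1.
    return np.convolve(dry, ir, mode="full")


# A unit impulse reproduces the impulse response itself, zero-padded to full length:
# wet == [1.0, 0.5, 0.25, 0.0, 0.0, 0.0]
wet = apply_impulse_response([1.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.25])
```

For long signals, `scipy.signal.fftconvolve` computes the same full convolution faster.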
### Codebase
* Requirements (in addition to the usual Python stack)
  * PyTorch 1.9 (1.10 should also work)
  * h5py
  * numpy
  * scipy
  * matplotlib
  * scikit-learn (`sklearn`; for the linear probe and feature visualization)
  * librosa (for parsing the training data)
  * ffmpeg 5.0 (for the AAC-LC baseline only); compile from source or use Docker
  * opus-tools 0.2 & libopus 1.3.1 (for the Xiph Opus baseline only); install `opus-tools` via conda-forge
  * Tested on Ubuntu 20.04 and 21.10

```
Project structure
|-Neural_Acoustic_Fields
  |-baselines
    |-make_data_aac.py
      # Code for generating AAC-LC baseline, uses ffmpeg
    |-make_data_opus.py
      # Code for generating Xiph opus baseline, uses opus-tools
  |-data_loading
    |-sound_loader.py
      # Code that contains the dataset definition for our training data
  |-metadata
    |-magnitudes 
    |-mean_std
    |-minmax
    *
    *
    * # Various data for training/testing
  |-model
    |-modules.py
      # Contains the definition for sinusoidal embedding and other non-network parts
    |-networks.py
      # Contains various differentiable modules to build our network
  |-testing
    |-cache_feature_NAF.py
      # Cache the NAF features for visualization with "vis_feat_NAF.py" and for the linear probe
    |-cache_test_baseline.py
      # Cache the results from interpolation baselines
    |-cache_test_NAF.py
      # Cache the NAF results for the test set
    |-compute_spectral_baseline.py
      # Compute the spectral loss for the interpolation baselines (run cache_test_baseline.py first)
    |-compute_spectral_NAF.py
      # Compute the spectral loss for the NAF results (run cache_test_NAF.py first)
    |-compute_T60_err_baseline.py
      # Compute the T60 error for the interpolation baselines (run cache_test_baseline.py first)
    |-compute_T60_err_NAF.py
      # Compute the T60 error for the NAF results (run cache_test_NAF.py first)
    |-lin_probe_NAF.py
      # Fits a linear probe to NAF features, saves the images to ./results/depth_img (run cache_feature_NAF.py first)
    |-test_utils.py
      # Various tools that can help with testing
    |-vis_feat_NAF.py
      # Use TSNE to visualize the NAF features (run cache_feature_NAF.py first)
    |-vis_loudness_NAF.py
      # Query the network to get the loudness at all locations in a room for a given emitter
  |-results
    |-apartment_1 # weights for network trained on apartment_1
    |-apartment_2 # weights for network trained on apartment_2
    |-depth_img
    *
    *
    * # Various network/baseline outputs
  |-train.py
    # Contains the training loop for the NAF network
```
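`model/modules.py` is described above as holding the sinusoidal embedding. For readers unfamiliar with the idea, here is a minimal sketch of a generic NeRF-style sinusoidal (Fourier-feature) encoding of coordinates. The function name, the power-of-two frequency schedule, and `num_freqs` are assumptions for illustration and may not match the encoding actually used in `modules.py`.

```python
import numpy as np


def sinusoidal_embed(coords, num_freqs=6):
    """Return the raw coordinates followed, for each coordinate, by F sines and F cosines.

    coords: array of shape (..., D). Output shape: (..., D + 2 * D * num_freqs).
    Frequencies are pi * 2**k for k = 0..num_freqs-1. Hypothetical helper;
    the repository's embedding may differ.
    """
    coords = np.asarray(coords, dtype=np.float64)
    freqs = np.pi * 2.0 ** np.arange(num_freqs)                         # (F,)
    scaled = coords[..., None] * freqs                                  # (..., D, F)
    feats = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)   # (..., D, 2F)
    feats = feats.reshape(*coords.shape[:-1], -1)                       # (..., 2*D*F)
    return np.concatenate([coords, feats], axis=-1)
```

Each of the D input coordinates contributes 2 × `num_freqs` features, so a 2-D position with `num_freqs=6` becomes a 26-dimensional vector. The higher frequencies let an MLP represent sharper spatial variation than it could from raw coordinates alone.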
### Common use cases
* Training the NAF network from scratch on 4 GPUs (uses PyTorch DDP)

  * `python train.py --apt apartment_1 --epochs 200 --resume 0 --batch_size 20 --gpus 4`


* Evaluating the NAF numerically (single GPU)

  This asks the NAF to generate outputs at the test locations, then computes the spectral (L1) and T60 errors.

  * `python ./testing/cache_test_NAF.py; python ./testing/compute_spectral_NAF.py; python ./testing/compute_T60_err_NAF.py`


* Generate the baseline testing data

  We use a RAM disk to cache the intermediate results before decoding them to WAV; note that this step is quite time-intensive.
  ```
  sudo mkdir /mnt/ramdisk
  sudo chmod 777 /mnt/ramdisk
  sudo mount -t tmpfs -o rw,size=2G tmpfs /mnt/ramdisk
  ```
  Then run `python ./baselines/make_data_opus.py` (the AAC-LC baseline is generated by `./baselines/make_data_aac.py`).


* Evaluating the interpolation baselines numerically

  Note that the interpolation mode can be `linear` or `nearest`, i.e. linear or nearest-neighbor interpolation.

  * `python ./testing/cache_test_baseline.py; python ./testing/compute_spectral_baseline.py; python ./testing/compute_T60_err_baseline.py`


* Visualize loudness

  Note that the current code requires listener points for each scene to be in `./metadata/room_grid_coors`; we have generated these for you. Creating your own requires `habitat-sim` (see below).

  * `python ./testing/vis_loudness_NAF.py --apt apartment_1` visualizes the NAF loudness; results are saved in `./results/loudness_img`

* Visualize t-SNE projections of the features

  Note that the current code requires query points for each scene to be in `./metadata/room_feat_coors`; we have generated these for you. Creating your own requires `habitat-sim` (see below).

  * `python ./testing/cache_feature_NAF.py; python ./testing/vis_feat_NAF.py --apt apartment_1`


* Linear probe of features

  Note that the current code requires query points for each scene to be in `./metadata/room_feat_coors` and `./metadata/room_grid_coors`; we have generated these for you. Creating your own requires `habitat-sim` (see below).

  * `python ./testing/cache_feature_NAF.py --apt apartment_1; python ./testing/lin_probe_NAF.py --apt apartment_1` fits a linear probe to the NAF features
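
For the numerical NAF evaluation (and likewise the baselines), the spectral score is an L1 error and T60 is the reverberation time. The sketch below shows one common way to compute both, assuming log-magnitude spectrograms of equal shape for the spectral error and a time-domain impulse response for T60 (Schroeder backward integration with a T20-style line fit between -5 and -25 dB, extrapolated to a 60 dB decay). The function names and fit range are assumptions; `compute_spectral_NAF.py` and `compute_T60_err_NAF.py` may differ in detail.

```python
import numpy as np


def spectral_l1(pred_logmag, gt_logmag):
    """Mean absolute error between two log-magnitude spectrograms (hypothetical helper)."""
    pred = np.asarray(pred_logmag, dtype=np.float64)
    gt = np.asarray(gt_logmag, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    return float(np.mean(np.abs(pred - gt)))


def estimate_t60(ir, sr, start_db=-5.0, end_db=-25.0):
    """Estimate T60 (seconds) from an impulse response via Schroeder integration.

    Fits a line to the energy decay curve between start_db and end_db and
    extrapolates the slope to a 60 dB decay (hypothetical helper).
    """
    energy = np.asarray(ir, dtype=np.float64) ** 2
    # Schroeder backward integration: energy remaining from each sample onward.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(edc_db)) / sr
    mask = (edc_db <= start_db) & (edc_db >= end_db)
    if np.count_nonzero(mask) < 2:
        raise ValueError("impulse response does not decay through the fit range")
    slope_db_per_s, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope_db_per_s


# Synthetic IR whose energy decays by exactly 60 dB in 0.5 s.
sr = 16000
t = np.arange(2 * sr) / sr
ir = np.exp(-3.0 * np.log(10.0) / 0.5 * t)
print(round(estimate_t60(ir, sr), 3))  # 0.5
```

A T60 error can then be reported by comparing the estimate for a predicted impulse response against the estimate for the ground truth.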
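
For generating the baseline testing data, the Opus baseline encodes each signal and then decodes it back to WAV, with the RAM disk mounted above serving as scratch space. A minimal sketch of one such round trip using the `opusenc`/`opusdec` command-line tools from opus-tools; the helper names, the output file naming, the `/mnt/ramdisk` default, and the 32 kbit/s bitrate are illustrative assumptions, not the settings used by `make_data_opus.py`.

```python
import os
import subprocess


def opus_roundtrip_cmds(wav_path, workdir="/mnt/ramdisk", bitrate_kbps=32):
    """Build (encode_cmd, decode_cmd, decoded_wav_path) for one WAV -> Opus -> WAV trip.

    Hypothetical helper: workdir and bitrate_kbps are illustrative defaults.
    """
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    opus_path = os.path.join(workdir, stem + ".opus")
    decoded_wav = os.path.join(workdir, stem + "_decoded.wav")
    # opusenc takes the target bitrate in kbit/s via --bitrate.
    encode_cmd = ["opusenc", "--bitrate", str(bitrate_kbps), wav_path, opus_path]
    decode_cmd = ["opusdec", opus_path, decoded_wav]
    return encode_cmd, decode_cmd, decoded_wav


def opus_roundtrip(wav_path, **kwargs):
    """Run the encode/decode pair; raises CalledProcessError if either tool fails."""
    encode_cmd, decode_cmd, decoded_wav = opus_roundtrip_cmds(wav_path, **kwargs)
    subprocess.run(encode_cmd, check=True, capture_output=True)
    subprocess.run(decode_cmd, check=True, capture_output=True)
    return decoded_wav
```

Keeping the intermediate `.opus` and decoded `.wav` files on tmpfs avoids disk I/O for what can be a very large number of small files.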
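
For the interpolation baselines, the `linear`/`nearest` modes can be pictured as interpolating per-location training targets at unseen listener positions. The sketch below assumes a fixed emitter, 2-D listener positions, and one fixed-shape array (e.g. a spectrogram) per training position, and uses SciPy's `LinearNDInterpolator` and `NearestNDInterpolator`. `interpolate_baseline` is a hypothetical helper, and `cache_test_baseline.py` may organize the data differently.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator, NearestNDInterpolator


def interpolate_baseline(train_xy, train_targets, query_xy, mode="linear"):
    """Interpolate per-position targets (e.g. spectrograms) at new listener positions.

    train_xy: (N, 2) positions; train_targets: (N, ...) fixed-shape arrays;
    query_xy: (M, 2). Returns (M, ...). Hypothetical helper.
    """
    train_xy = np.asarray(train_xy, dtype=np.float64)
    train_targets = np.asarray(train_targets, dtype=np.float64)
    query_xy = np.asarray(query_xy, dtype=np.float64)
    flat = train_targets.reshape(len(train_xy), -1)  # one row per training position
    if mode == "linear":
        interp = LinearNDInterpolator(train_xy, flat)  # NaN outside the convex hull
    elif mode == "nearest":
        interp = NearestNDInterpolator(train_xy, flat)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    out = interp(query_xy)
    return out.reshape((len(query_xy),) + train_targets.shape[1:])
```

Linear interpolation uses barycentric weights over a Delaunay triangulation and returns NaN for queries outside the convex hull of the training positions, whereas nearest-neighbor interpolation always returns the closest training sample.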
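
For the loudness visualization, the script queries the network for the loudness at every listener point in `./metadata/room_grid_coors` for a given emitter. Assuming you already have an impulse response for each grid point, the sketch below shows one simple way to turn those points into an image. Measuring loudness as total impulse-response energy in dB, as well as `loudness_db`, `grid_to_image`, and `cell_size`, are assumptions, not necessarily what `vis_loudness_NAF.py` does.

```python
import numpy as np


def loudness_db(ir, eps=1e-10):
    """Relative loudness of an impulse response: total energy in dB (not calibrated SPL)."""
    ir = np.asarray(ir, dtype=np.float64)
    return float(10.0 * np.log10(np.sum(ir ** 2) + eps))


def grid_to_image(coords, values, cell_size):
    """Place per-point values into a 2-D image; cells without a point stay NaN.

    Assumes coords (N, 2) lie (approximately) on a regular grid with spacing cell_size.
    """
    coords = np.asarray(coords, dtype=np.float64)
    idx = np.rint((coords - coords.min(axis=0)) / cell_size).astype(int)
    img = np.full(tuple(idx.max(axis=0) + 1), np.nan)
    img[idx[:, 0], idx[:, 1]] = np.asarray(values, dtype=np.float64)
    return img
```

The resulting array can be displayed with, e.g., `matplotlib.pyplot.imshow(img.T, origin="lower")` so that the first coordinate runs horizontally.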
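
For the t-SNE visualization, the cached features are projected to two dimensions. The sketch below uses scikit-learn's `TSNE` on an `(N, C)` feature matrix; `tsne_project`, the perplexity clamp, and the PCA initialization are illustrative choices, not necessarily those in `vis_feat_NAF.py`.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_project(features, perplexity=30.0, seed=0):
    """Project an (N, C) feature matrix to 2-D with t-SNE (hypothetical helper)."""
    features = np.asarray(features, dtype=np.float32)
    # scikit-learn requires perplexity < n_samples; keep it comfortably below that.
    perplexity = min(perplexity, (features.shape[0] - 1) / 3.0)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(features)
```

Coloring the projected points by their room position is a quick way to check whether the features organize by scene structure.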
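
For the linear probe, a linear map is fit from cached NAF features to a scene property (per the project structure, `lin_probe_NAF.py` writes depth images to `./results/depth_img`). A minimal closed-form ridge-regression sketch in NumPy follows; the helper names and the `l2` strength are assumptions, and the repository may use a scikit-learn model instead.

```python
import numpy as np


def fit_linear_probe(feats, targets, l2=1e-3):
    """Closed-form ridge regression from (N, C) features to (N,) or (N, K) targets.

    Returns a (C + 1, ...) weight array whose last row is the (unpenalized) bias.
    Hypothetical helper for illustration.
    """
    feats = np.asarray(feats, dtype=np.float64)
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])  # append a bias column
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the bias term
    return np.linalg.solve(X.T @ X + reg, X.T @ np.asarray(targets, dtype=np.float64))


def predict_linear_probe(weights, feats):
    """Apply weights from fit_linear_probe to (N, C) features."""
    feats = np.asarray(feats, dtype=np.float64)
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])
    return X @ weights
```

Evaluating the probe on held-out points, rather than on the points it was fit to, is what indicates whether the features linearly encode the property.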