# NeurIPS 2023: How to run the code (Supplementary Material)
# Paper Submission #8692

## Introduction

<p style="text-align: justify; text-indent: 20px;" >
This README guides reviewers through running our code. We ran our experiments with the AdLeap-MAS framework (publicly available on <a href="https://github.com/lsmcolab/adleap-mas/">GitHub</a>), whose extensive documentation covers installation, usage, and how to modify the environments to run experiments <a href="#alves2022adleapmas">[1]</a>. Here, however, we deliver only the material needed to run the experiments presented in Paper #8692: "Information-guided Planning: An Online Approach for Partially Observable Problems", submitted to NeurIPS 2023.
</p>

## How to run an experiment

<p style="text-align: justify; text-indent: 20px;" >
We provide the main routine used to run experiments in the "<i>main.py</i>" file. The file is organised so that the reviewer can switch the method, environment and components by setting only 4 variables:
</p>

- <b><i>method</i></b>: selects the algorithm to test. Options: <i>POMCP</i> <a href="#silver2010">[2]</a>, <i>rho-POMCP</i> <a href="#thomas2020">[3]</a>, <i>IB-POMCP</i>, <i>I-UCT POMCP</i> and <i>IPR-POMCP</i>;
- <b><i>kwargs</i></b>: sets the methods' hyperparameters (if no kwargs are given, each method runs with its default hyperparameters), which are:
    - k = 100 (particle filter size);
    - discount_factor = 0.95 (discount factor of simulations);
    - smallbag_size = 10 (size of each node's small bag) -> <b>only for rho-POMCP based methods</b>;
    - time_budget = infinity (time budget, in seconds, for rho-POMCP to perform time-constrained planning, i.e. TB rho-POMCP) -> <b>only for rho-POMCP based methods</b>.
- <b><i>env_name</i></b>: selects the environment in which to run the selected method and collect its results. Options: <i>TigerEnv</i>, <i>MazeEnv</i> and <i>LevelForagingEnv</i>;
- <b><i>scenario_id</i></b>: if you are running the <i>MazeEnv</i> or <i>LevelForagingEnv</i> environments, selects which scenario configuration to run.

<p style="text-align: justify; text-indent: 20px;" >
After setting these variables, the environment will run automatically, generating a results file in the "<i>results</i>" folder.
</p>
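
As a minimal sketch of the four settings above: the snippet below is hypothetical and is not taken from <i>main.py</i>. The helper <i>build_config</i>, the constant names, the exact spelling of the option strings and the default <i>scenario_id</i> of 0 are all assumptions made for illustration; only the option names and default hyperparameter values come from this README.

```python
# Hypothetical sketch: names and string spellings are assumptions and may
# differ from the ones actually used in main.py.
METHODS = {"POMCP", "rho-POMCP", "IB-POMCP", "I-UCT POMCP", "IPR-POMCP"}
ENVIRONMENTS = {"TigerEnv", "MazeEnv", "LevelForagingEnv"}

# Default hyperparameters as listed in this README.
DEFAULT_KWARGS = {
    "k": 100,                     # particle filter size
    "discount_factor": 0.95,      # discount factor of simulations
    "smallbag_size": 10,          # rho-POMCP based methods only
    "time_budget": float("inf"),  # seconds; rho-POMCP based methods only
}


def build_config(method, env_name, scenario_id=0, **kwargs):
    """Validate the four settings and merge kwargs over the defaults."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if env_name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env_name!r}")
    unknown = set(kwargs) - set(DEFAULT_KWARGS)
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    return {
        "method": method,
        "env_name": env_name,
        "scenario_id": scenario_id,
        "kwargs": {**DEFAULT_KWARGS, **kwargs},
    }


# Example: IB-POMCP on the second Maze scenario with a larger particle filter.
config = build_config("IB-POMCP", "MazeEnv", scenario_id=1, k=200)
print(config)
```

Hyperparameters that are not passed explicitly keep the defaults listed above, which mirrors the behaviour described for <i>kwargs</i>.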

## Where to find the algorithms' details

<p style="text-align: justify; text-indent: 20px;" >
You can find the implementation details of each baseline used in this paper at the following path:
</p>

> <b>PATH:</b> ./src/reasoning/desired_algorithm.py

<p style="text-align: justify; text-indent: 20px;" >
All the baselines follow a similar implementation and modularisation scheme. At the end of the file that implements each algorithm, you can find the "<i>algorithm_planning</i>" method, which is the entry point of every baseline. By following it, you should be able to find all the necessary details about each tree-search implementation.
</p>

<p style="text-align: justify; text-indent: 20px;" >
Note that some information is calculated within nodes or through each algorithm's internal learning process. These calculations can be found in the following files:
</p>

> <b>PATH:</b> ./src/reasoning/node.py

> <b>PATH:</b> ./src/reasoning/qlearn.py

## How to read the results

<p style="text-align: justify; text-indent: 20px;" >
After running an experiment, the results will be available in a "<i>.CSV</i>" file, which contains an easy-to-read table in the following format:
</p>

| Iteration     | Reward        | Time to reason    | N Rollouts | N Simulations |
| ------------- | ------------- | --------          |----------- | ------------- |
| 0             | 0             | 0.23456           | 70         | 30            |
| 1             | 100           | 0.77776           | 50         | 50            |
| 2             | 10            | 0.74576           | 30         | 70            |
| 3             | -10           | 0.66616           | 10         | 90            |
| ...           | ...           | ...               | ...        | ...           |


<b>NOTE:</b> There is no difference in the tables between environments or algorithms. Columns are separated by ";" and rows by a newline "\n".

Therefore, the results for <i>ANY</i> environment and method can be read and plotted with the same procedure.
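
A minimal sketch of such a reading routine is given below, using only the Python standard library. It assumes that the file starts with a header row carrying the column names shown in the table; if the real files have no header, the column names would need to be passed to the reader explicitly. The sample rows are copied from the table above, and the function name <i>load_results</i> is hypothetical.

```python
import csv
import io
from statistics import mean

# Sample rows copied from the table above. To read a real file, replace the
# StringIO object with open(<path to your results file>, newline="").
SAMPLE = (
    "Iteration;Reward;Time to reason;N Rollouts;N Simulations\n"
    "0;0;0.23456;70;30\n"
    "1;100;0.77776;50;50\n"
    "2;10;0.74576;30;70\n"
    "3;-10;0.66616;10;90\n"
)


def load_results(handle):
    """Parse a ';'-separated results table into a list of dicts of floats.

    Assumes the first row is a header with the column names.
    """
    reader = csv.DictReader(handle, delimiter=";")
    return [
        {name.strip(): float(value) for name, value in row.items()}
        for row in reader
    ]


rows = load_results(io.StringIO(SAMPLE))
mean_reward = mean(row["Reward"] for row in rows)
total_time = sum(row["Time to reason"] for row in rows)
print(f"iterations={len(rows)} mean_reward={mean_reward} total_time={total_time:.5f}")
```

Because the table format is the same for every environment and method, the same function can feed any plotting routine, for example by passing the "Iteration" and "Reward" columns to a line plot.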

## Links
- AdLeap-MAS's GitHub: <a href="https://github.com/lsmcolab/adleap-mas/">https://github.com/lsmcolab/adleap-mas/</a>

## References
<a name="alves2022adleapmas">[1]</a> do Carmo Alves, M. A., Varma, A., Elkhatib, Y., & Soriano Marcolino, L. (2022, May). AdLeap-MAS: An Open-source Multi-Agent Simulator for Ad-hoc Reasoning. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (pp. 1893-1895).

<a name="silver2010">[2]</a> D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Neural Information Processing Systems, pages 2164-2172. NeurIPS Foundation, 2010.

<a name="thomas2020">[3]</a> V. Thomas, G. Hutin, and O. Buffet. Monte Carlo information-oriented planning. In 24th European Conference on Artificial Intelligence, 2020.