# Can Graph Learning Improve Task Planning?



## Environment Setups

In `requirements.txt`, we have listed required Python packages. To install them, run the command:

 `pip install -r requirements.txt`.

Besides, for running LLM's direct inference or GraphSearch, LLMs are depolyed as API services using `FastChat` to the `localhost:8008` endpoint. For installing FastChat, run additional commands as follows:
```shell
pip3 install "fschat[model_worker,webui]"
pip3 install vllm
```



To deploy the local LLM services, three commands are required:

```shell
python3 -m fastchat.serve.controller --host 127.0.0.1 

# Specify the LLM to be deployed, and we take CodeLlama-13B as an example
python3 -m fastchat.serve.vllm_worker --model-path codellama/CodeLlama-13b-Instruct-hf --host 127.0.0.1 
# Other used LLMs
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-13b-v1.5 --host 127.0.0.1
python3 -m fastchat.serve.vllm_worker --model-path codellama/CodeLlama-7b-Instruct-hf --host 127.0.0.1 
python3 -m fastchat.serve.vllm_worker --model-path mistralai/Mistral-7B-Instruct-v0.2 --host 127.0.0.1 
python3 -m fastchat.serve.vllm_worker --model-path baichuan-inc/Baichuan2-13B-Chat --host 127.0.0.1

python3 -m fastchat.serve.openai_api_server --host localhost --port 8008
```



## Datasets

Four experimental datasets (HuggingFace, Multimedia, and Daily Life from [TaskBench](https://github.com/microsoft/JARVIS/blob/main/taskbench/README.md), and TMDB from [RestBench](https://github.com/Yifan-Song793/RestGPT)) are under the `data` folder. 

For each dataset, it contains the following files:
* `data.json` Detailed dataset, with each sample has a concrete user request, ground-truth decomposed task steps, and task invocation path
* `graph_desc.json` Detailed task graph where each node represents a unique task and each link denotes the dependencies among tasks
* `tool_desc.json` Only present the nodes' information in the task graph
* `user_requests.json` Original user requests
* `split_ids.json` Give the formal split of 500 test samples

As dataset from RestBench only contains orignal request and ground-truth API sequences, we have reformatted this dataset to align with experiments, including assigning a unique name to each API, constructing a task graph, and finally reformat original data samples. Processing details are shown in the file `raw_process_restgpt.py`.



## Training-free Modes

Codes of training-free modes (**Direct, GraphSearch, SGC**) are under the `trainfree` folder:

* **LLM's Direct Inference** `direct.py`
* **GraphSearch** `graphsearch.py` with arguments referring to different searching strategies
* **SGC** `sgc.py`

Besides, we show improved prompts (in-context learning with more examples, and Plan like a Graph (PlaG)) in `direct_diffprompt.py`.

Running scripts can be found in `trainfree_scripts.sh`.



## Training GNNs

Codes of training GNN or (LM+GNN) are under the `traingnn` folder.

Just run `main.py` with specified LM and GNN configurations.

* `lm_frozen` denotes whether LM backbone is frozen (`lm_frozen=1`) or fine-tuned (`lm_frozen=0`) during model training
* `lm_name` denotes LM backbone
* `gnn_name` denotes GNN's encoder type, choices are ['GCN', 'LightGCN', 'GAT', 'SAGE', 'GIN', 'TransformerConv'] ('LightGCN' refers to SGC)

Reproducing our main experimental results are shown in `traingnn_reprocude.sh`.



## Fine-tuning LLMs

Codes of fine-tuning LLMs are under the `finetunellm` folder:

* **LLM Fine-tune** `main.py`
* **Inference** `inference.py` where we leverage fine-tuned LLMs to make direct inference 

Running scripts can be found in `finetunellm_scripts.sh`.



## Other Folders 

`utils` folder contains utilization functions

`evaluate` folder contains evaluation functions:
  - `evaluate.py` evaluates any dataset, any LLM, and any method's predicted result (Node-F1, Link-F1). You have to specify the dataset name, LLM type, and method type (like direct / graphsearch / etc).
  - `bi_evaluate.py` provides bi-level evaluation, like comparing direct inference and graphsearch.
