# DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
This repository is the official PyTorch implementation of our work:

**DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection.**

## Framework
<p align="center">
  <img height="200" src="misc/dipex.png" />
</p>
Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity.
In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM.
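The MAC-based early termination described above can be illustrated with a small sketch. This is not the code from this repository: it assumes that angular coverage is measured as the largest pairwise angle between L2-normalized prompt embeddings, and that expansion stops once this angle reaches a chosen threshold. The function names (`max_angular_coverage`, `should_stop`) and the threshold value are hypothetical.

```python
import math


def normalize(vec):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def pairwise_angles(prompts):
    """Return the angle (in radians) between every pair of prompts."""
    unit = [normalize(p) for p in prompts]
    angles = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            # Clamp to guard against floating-point drift outside [-1, 1].
            cos = max(-1.0, min(1.0, cos))
            angles.append(math.acos(cos))
    return angles


def max_angular_coverage(prompts):
    """Assumed MAC: the largest pairwise angle among the prompts."""
    angles = pairwise_angles(prompts)
    return max(angles) if angles else 0.0


def should_stop(prompts, threshold_rad):
    """Hypothetical stopping rule: stop once coverage reaches the threshold."""
    return max_angular_coverage(prompts) >= threshold_rad


if __name__ == "__main__":
    toy_prompts = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(max_angular_coverage(toy_prompts))
    print(should_stop(toy_prompts, threshold_rad=math.pi / 3))
```

In practice the prompts would be learned embedding tensors rather than Python lists; plain lists are used here only to keep the sketch dependency-free.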

## Contents
* [Framework](#framework)
* [Installation](#installation)
  * [Requirements](#requirements)
  * [Install Open-GroundingDino](#install-open-groundingdino)
* [Dataset Preparation](#dataset-preparation)
* [Training & Testing](#training--testing)
* [Results](#results)

## Installation

### Requirements
The code has been tested in the following environment:
* Python 3.10+
* PyTorch 2.0.1
* CUDA 11.7
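
As a quick sanity check, the following sketch prints the local Python, PyTorch, and CUDA versions so they can be compared against the list above. It is not part of this repository; `parse_version` and `environment_report` are hypothetical helper names, and the script only reports values without enforcing them.

```python
import sys


def parse_version(text):
    """Turn a string such as '2.0.1+cu117' into (2, 0, 1).

    Parsing stops at the first component that is not purely numeric.
    """
    core = text.split("+")[0]
    parts = []
    for piece in core.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def environment_report():
    """Collect the versions relevant to the tested environment."""
    report = {
        "python": tuple(sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= (3, 10),
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report that and skip the CUDA fields.
        report["torch"] = None
        return report
    report["torch"] = parse_version(torch.__version__)
    report["cuda"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```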

### Install `Open-GroundingDino`
Our implementation is based on the latest [`Open-GroundingDino`](https://github.com/longzw1997/Open-GroundingDino). Please follow the instructions in that repository to install Open-GroundingDino.

## Dataset Preparation
We currently provide dataloaders for the COCO and LVIS datasets. Support for more datasets is on the way.

## Training & Testing
To train the model described in the paper, `cd` into the `Open-GroundingDino` directory and run the following command:
```shell script
bash prompt_tune.sh
```

To evaluate the model, run the following command:
```shell script
bash eval.sh
```

## Results
### Evaluation on MS-COCO Dataset
<p align="center">
  <img height="200" src="misc/coco_table.png" />
</p>

### Evaluation on LVISv1.0 Dataset
<p align="center">
  <img height="200" src="misc/lvis_table.png" />
</p>