We provide the following materials:
(1) COCO regional captions generated by BLIP-2;
(2) Our training and inference code. The entry-point scripts are in ./FineCLIP_code_reference/scripts/