Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector

ECCV 2024


Yuqian Fu*1,2,3,
Yu Wang*1,
Yixuan Pan*4,
Lian Huai†5,
Xingyu Qiu1,

Zeyu Shangguan5,
Tong Liu5,
Yanwei Fu1, Luc Van Gool2,3,
Xingqun Jiang5

1Fudan University, 2ETH Zürich, 3INSAIT, 4Southeast University, 5BOE Technology


CD-FSOD benchmark: COCO serves as the training source data, while six datasets including ArTaxOr, Clipart1k, DIOR, DeepFish, NEU-DET, and UODD are utilized as novel testing target datasets. Dataset metrics: We introduce styles, inter-class variance (ICV), and indefinable boundaries (IB) as metrics for evaluating the domain gap issue in CD-FSOD. Our target datasets exhibit variations in these metrics.

Motivations

Cross-domain few-shot learning (CD-FSL) aims to transfer knowledge from a source dataset to new target domains with few labeled data. However, most existing CD-FSL works focus on the classification task, overlooking object detection. Thus, this paper delves into object detection in the CD-FSL setting, also known as cross-domain few-shot object detection (CD-FSOD). Traditional FSOD methods can be roughly grouped into meta-learning based ones and finetuning based ones, while a recent transformer-based open-set detector, DE-ViT, shows exceptional performance in FSOD, surpassing other methods as depicted in the figure below. This inspired us to study:

  1. Can such open-set detection methods easily generalize to CD-FSOD?
  2. If not, how can models be enhanced when facing huge domain gaps?

(a) Our motivation: The DE-ViT open-set detector excels in FSOD but struggles in CD-FSOD, inspiring our creation of CD-ViTO. (b) Technical motivation: FSOD models face challenges when dealing with cross-domain targets, such as small inter-class variance (ICV), indefinable boundaries (IB), and varying appearances (styles).


Contributions

>> To answer the first question:

  • We employ metrics including style, ICV, and IB to understand the domain gap in CD-FSOD.
  • Based on these metrics, we establish a new benchmark with diverse targets for CD-FSOD.
  • We conduct an extensive study of existing detectors, revealing the challenges posed by CD-FSOD.

>> To answer the second question:

We build a new method, CD-ViTO, by enhancing the existing DE-ViT with the following novel modules:
  • Learnable Instance Features: aligns the initially fixed instance features with target categories, thus tackling the small ICV issue by enhancing feature distinctiveness.
  • Instance Reweighting Module: assigns higher importance to high-quality instances with slight IB, thus alleviating the significant IB issue.
  • Domain Prompter: encourages features that are resilient to different styles by synthesizing imaginary domains without altering semantic content, thus improving the model's robustness to different styles.
As indicated in the above figure, our enhanced CD-ViTO (orange stars) makes DE-ViT (green stars) great again on CD-FSOD targets.
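
To make the instance reweighting idea above more concrete, here is a minimal sketch of how a class prototype could be formed as a weighted average of its few-shot instance features instead of a plain mean. This is only an illustration under stated assumptions: the function names (`softmax`, `reweighted_prototype`, `mean_prototype`), the toy 2-D features, and the hand-picked quality scores are all hypothetical, and the scores simply stand in for whatever the learned module would produce. It is not the CD-ViTO implementation; please refer to the paper for the actual design.

```python
import math

def softmax(scores):
    """Turn raw quality scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_prototype(instances):
    """Baseline: plain average of the instance feature vectors."""
    n, dim = len(instances), len(instances[0])
    return [sum(inst[d] for inst in instances) / n for d in range(dim)]

def reweighted_prototype(instances, scores):
    """Weighted average, so higher-scored instances contribute more."""
    weights = softmax(scores)
    dim = len(instances[0])
    return [sum(w * inst[d] for w, inst in zip(weights, instances))
            for d in range(dim)]

# Toy example (hypothetical values): two clean instances and one
# low-quality instance, e.g. one with an indefinable boundary.
instances = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
scores = [2.0, 2.0, -2.0]  # assumed scores; a learned module would predict these

proto = reweighted_prototype(instances, scores)
baseline = mean_prototype(instances)
print("mean prototype:      ", baseline)
print("reweighted prototype:", proto)
```

In this toy setting the low-scored instance barely affects the reweighted prototype, whereas it pulls the plain mean noticeably away from the two clean instances.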

Datasets

In addition to the visual examples shown in the benchmark figure, we provide more information here. All the target datasets can be found in our GitHub repo.

CD-ViTO Method

Overall framework of our CD-ViTO:

We build our method upon the base open-set detector (DE-ViT) and finetune it using a few labeled instances of the target domain. Modules in blue are inherited from DE-ViT, while modules in orange are proposed by us. New improvements include learnable instance features, instance reweighting, the domain prompter, and finetuning. For more details about the modules, please refer to our paper.

We highlight that all the modules are very lightweight, adding negligible or even no extra cost.
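
As a rough illustration of why making the instance features learnable can help with small ICV, the toy sketch below treats two nearly identical class features as trainable parameters and finetunes them with a few labeled examples. Everything here is an assumption made for illustration: the dot-product classifier, the cross-entropy loss, the learning rate, the toy 2-D vectors, and the helper names (`finetune_step`, `distance`) are hypothetical and do not reproduce the DE-ViT or CD-ViTO pipeline.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def finetune_step(class_feats, queries, labels, lr=0.5):
    """One gradient-descent step on softmax cross-entropy,
    taken with respect to the class features themselves."""
    grads = [[0.0] * len(f) for f in class_feats]
    for q, y in zip(queries, labels):
        probs = softmax([dot(f, q) for f in class_feats])
        for c, p in enumerate(probs):
            coeff = p - (1.0 if c == y else 0.0)  # dLoss/dlogit_c
            for d in range(len(q)):
                grads[c][d] += coeff * q[d] / len(queries)
    return [[f[d] - lr * g[d] for d in range(len(f))]
            for f, g in zip(class_feats, grads)]

# Hypothetical setup: two class features that start out very close
# (small inter-class variance) and two labeled target examples.
class_feats = [[1.0, 0.9], [0.9, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]

before = distance(class_feats[0], class_feats[1])
feats = class_feats
for _ in range(20):
    feats = finetune_step(feats, queries, labels)
after = distance(feats[0], feats[1])

print("distance between class features before:", before)
print("distance between class features after: ", after)
```

In this toy setting, finetuning pushes the two class features apart, which is the intuition behind letting the initially fixed instance features adapt to the target categories.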

Main Results

A broad range of existing detectors is studied, including: 1) typical FSOD methods, e.g., meta-RCNN, TFA, FSCE, and DeFRCN; 2) CD-FSOD methods, e.g., distill-cdfsod; 3) ViT-based detectors, e.g., ViTDeT; 4) open-set detectors, e.g., DE-ViT and Detic.

The comparison results are as follows ("FT" means that we finetune the original model).

Citation

Please consider citing us if you find our task, benchmark, or model useful.

Also feel free to reach out if you have questions or are interested in working on this topic together. Thanks!

      @article{fu2024cross,
        title={Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector},
        author={Fu, Yuqian and Wang, Yu and Pan, Yixuan and Huai, Lian and Qiu, Xingyu and Shangguan, Zeyu and Liu, Tong and Kong, Lingjie and Fu, Yanwei and Van Gool, Luc and others},
        journal={arXiv preprint arXiv:2402.03094},
        year={2024}
      }