ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives

ICCV 2025 (Highlight)


Yuqian Fu1,
Runze Wang2,
Bin Ren1,3,4,
Guolei Sun5,
Biao Gong6,

Yanwei Fu2,
Danda Pani Paudel1,
Xuanjing Huang2,
Luc Van Gool1

1INSAIT, 2Fudan University, 3University of Trento, 4University of Pisa, 5ETH Zurich, 6Ant Group


We tackle the task of Ego-Exo Object Correspondence, recently proposed in Ego-Exo4D. Given object queries from one perspective (e.g., the ego view), the task is to predict the corresponding object masks in the other perspective (e.g., the exo view). Solving this task unlocks new possibilities in VR and robotics, e.g., enabling virtual agents or robots to execute ego-view actions by learning from exo-view demonstrations.
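For concreteness, here is a minimal, hypothetical sketch of the task interface (the function name and the model's call signature are our own illustration, not part of Ego-Exo4D or our released code): a query mask in the source view goes in, and the mask of the same object in the target view comes out.

  import numpy as np

  def ego_exo_object_correspondence(model, ego_image: np.ndarray,
                                    ego_query_mask: np.ndarray,
                                    exo_image: np.ndarray) -> np.ndarray:
      """Hypothetical interface for Ego-Exo Object Correspondence (Ego2Exo direction).

      Given a binary query mask of an object in the ego frame, predict the binary
      mask of the same object in the time-synchronized exo frame. The Exo2Ego
      direction simply swaps the roles of the two views.
      """
      # `model` stands for any cross-view segmentor; this call signature is assumed.
      exo_mask = model.predict(
          query_image=ego_image,      # (H_e, W_e, 3) ego frame
          query_mask=ego_query_mask,  # (H_e, W_e) binary mask of the queried object
          target_image=exo_image,     # (H_x, W_x, 3) exo frame
      )
      return exo_mask                 # (H_x, W_x) binary mask, empty if the object is not visible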

Video Demos on Ego-Exo4D

Brief Summary & Main Contributions

Despite the importance of this task, most existing segmentation models (e.g., Mask2Former, SAM, LISA) operate on single-view inputs, making them ill-suited for this cross-view setting. To address this, we:

  • Toward Ego-Exo Object Correspondence Task: We conduct an early exploration of this challenging task, analyzing its unique difficulties, constructing several baselines, and proposing a new method.
  • ObjectRelator Framework: We introduce ObjectRelator, a cross-view object segmentation method that combines MCFuse and XObjAlign (see the sketch after this list). MCFuse is the first to bring the text modality into this task and improves localization by fusing multimodal cues for the same object(s), while XObjAlign boosts robustness to appearance variations with an object-level consistency constraint.
  • New Testbed and SOTA Results: Alongside Ego-Exo4D, we present HANDAL-X as an additional benchmark. ObjectRelator achieves SOTA results on both datasets.
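To make the MCFuse idea more concrete, below is a minimal PyTorch sketch of gated text-visual condition fusion. It is an illustration under our own assumptions (module name, dimensions, and the cross-attention-plus-gate form), not the exact implementation released with the paper.

  import torch
  import torch.nn as nn

  class MCFuseSketch(nn.Module):
      """Minimal sketch of multimodal condition fusion (assumed design).

      Fuses a visual (mask-prompt) condition embedding with a text condition
      embedding of the same object via cross-attention plus a learnable gate.
      The actual MCFuse module in the paper may differ in detail.
      """
      def __init__(self, dim: int = 256, num_heads: int = 8):
          super().__init__()
          self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
          self.gate = nn.Parameter(torch.zeros(1))  # starts from the visual-only condition

      def forward(self, visual_cond: torch.Tensor, text_cond: torch.Tensor) -> torch.Tensor:
          # visual_cond: (B, Nq, C) object query embeddings from the mask prompt
          # text_cond:   (B, Nt, C) token embeddings of the object's text description
          attn_out, _ = self.cross_attn(query=visual_cond, key=text_cond, value=text_cond)
          # Gated residual fusion: text cues refine, but never replace, the visual condition.
          return visual_cond + torch.sigmoid(self.gate) * attn_out

  # Usage (shapes are illustrative):
  # fuse = MCFuseSketch(dim=256)
  # fused = fuse(torch.randn(2, 20, 256), torch.randn(2, 16, 256))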

Framework Overview

Ego2Exo is used as an example in the framework figure. Our method builds on the PSALM baseline (pink blocks) and tailors it for Ego-Exo Object Correspondence with two novel modules: Multimodal Condition Fusion (MCFuse) and Cross-View Object Alignment (XObjAlign); a rough sketch of the XObjAlign idea follows the figure. For more details, please refer to our paper.

Framework Figure
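As a hedged sketch of the XObjAlign consistency constraint (the pooling and the exact loss form below are our own assumptions, not necessarily the paper's formulation), paired ego/exo features of the same object instance can be pulled together as follows:

  import torch
  import torch.nn.functional as F

  def xobjalign_loss_sketch(ego_obj_feat: torch.Tensor, exo_obj_feat: torch.Tensor) -> torch.Tensor:
      """Sketch of an object-level cross-view consistency loss (assumed form).

      ego_obj_feat, exo_obj_feat: (B, C) pooled embeddings of the same object
      instance observed from the ego and exo views. Pulling them together
      encourages representations that stay stable under the large ego/exo
      appearance shift; the paper's actual XObjAlign loss may differ.
      """
      ego = F.normalize(ego_obj_feat, dim=-1)
      exo = F.normalize(exo_obj_feat, dim=-1)
      return (1.0 - (ego * exo).sum(dim=-1)).mean()  # cosine-distance alignment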

Main Results on Ego-Exo4D

We highlight that: 1) results are reported on the Val set, since ground truth for the test set is not available; 2) we construct a "Small TrainSet" (1/3 of the data) and a "Full TrainSet", and release both splits for the community, which is especially helpful for groups with limited GPU/storage resources; 3) our method clearly outperforms the baselines and competitors.

Results Figure

Visualization Results

Visualization results show that: 1) MCFuse enhances object localization by using text as an extra prompt; 2) XObjAlign improves the model's robustness under large view shifts.

Visualization Figure

More: We also adapt HANDAL-X, a benchmark featuring robot-friendly objects, as an additional testbed for cross-view object segmentation. For detailed results and more visualizations, please refer to our paper.

Video Demos on Our Adapted HANDAL-X and Human2Robot

Citations

Please consider citing us if you find our data, code, or models useful.

Also, feel free to reach out with questions, or if you are interested in working on this topic together. Thanks! :)

  @inproceedings{fu2024objectrelator,
      title={ObjectRelator: Enabling cross-view object relation understanding in ego-centric and exo-centric videos},
      author={Fu, Yuqian and Wang, Runze and Ren, Bin and Sun, Guolei and Gong, Biao and Fu, Yanwei and Paudel, Danda Pani and Huang, Xuanjing and Van Gool, Luc},
      booktitle={IEEE/CVF International Conference on Computer Vision (ICCV)},
      year={2025}
  }

  @article{fu2025cross,
      title={Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025},
      author={Fu, Yuqian and Wang, Runze and Fu, Yanwei and Paudel, Danda Pani and Van Gool, Luc},
      journal={arXiv preprint arXiv:2506.05856},
      year={2025}
  }