WildCross: A Cross-Modal Large Scale Benchmark for Place Recognition and Metric Depth Estimation in Natural Environments

Accepted at IEEE ICRA 2026

1CSIRO Robotics 2Queensland University of Technology
WildCross teaser image showing RGB, depth, overlay, normal, and LiDAR submap modalities.

WildCross provides aligned RGB, depth, surface normal, and LiDAR submap modalities for natural-environment benchmarking.

Overview

We introduce WildCross, a large-scale benchmark for cross-modal place recognition and metric depth estimation in natural environments. The dataset comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, each aligned with accurate 6DoF poses and synchronized dense LiDAR submaps.

We conduct comprehensive experiments on visual, LiDAR, and cross-modal place recognition, as well as metric depth estimation, demonstrating the value of WildCross as a challenging benchmark for multi-modal robotic perception tasks.

Tasks

Visual Place Recognition (VPR)

WildCross supports visual relocalization with sequential RGB imagery across challenging revisits, including reverse-direction traversals and long-term appearance changes. The benchmark includes cross-fold train/test splits for robust evaluation of generalization and in-domain adaptation.

Cross-Modal Place Recognition (CMPR)

CMPR in WildCross evaluates retrieval across sensing modalities, such as image-to-LiDAR localization. The synchronized RGB frames, accurate poses, and dense LiDAR submaps provide a strong testbed for cross-modal representation learning.
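Place recognition benchmarks like this are typically scored with recall@k: a query succeeds if any of its k nearest database descriptors was captured within some distance of the query's true position. The sketch below illustrates the metric with a brute-force nearest-neighbour search; the function name, the 25 m threshold, and the descriptor/position shapes are illustrative assumptions, not the benchmark's official evaluation code.

```python
import numpy as np

def recall_at_k(query_desc, db_desc, query_pos, db_pos, k=1, dist_thresh=25.0):
    """Recall@k for place recognition (illustrative sketch).

    A query counts as a hit if any of its k nearest database
    descriptors lies within dist_thresh metres of the query pose.
    """
    # Pairwise Euclidean distances between descriptors (queries x database).
    d = np.linalg.norm(query_desc[:, None] - db_desc[None, :], axis=-1)
    topk = np.argsort(d, axis=1)[:, :k]

    hits = 0
    for qi, idxs in enumerate(topk):
        # Geometric distance from the query pose to each retrieved candidate.
        geo = np.linalg.norm(db_pos[idxs] - query_pos[qi], axis=-1)
        if np.any(geo <= dist_thresh):
            hits += 1
    return hits / len(query_desc)
```

For cross-modal evaluation, `query_desc` and `db_desc` would simply come from different encoders (e.g. an image branch and a LiDAR branch) embedded into a shared space.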

Metric Depth Estimation

WildCross provides semi-dense metric depth and surface normal annotations for every frame, generated from accumulated global point clouds, accurate camera poses, and visibility filtering to remove occluded points. This supports training and benchmarking depth models in natural environments where current methods face substantial domain-shift challenges.
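The generation pipeline described above amounts to projecting an accumulated world-frame point cloud into each camera and resolving visibility. A minimal sketch of that projection step, assuming a standard pinhole intrinsic matrix and a per-pixel z-buffer as the visibility filter (the actual WildCross pipeline may use a more sophisticated occlusion check):

```python
import numpy as np

def render_semidense_depth(points_world, T_world_cam, K, hw):
    """Project a global point cloud into a camera to form a semi-dense
    metric depth image. A per-pixel z-buffer keeps only the nearest point,
    a simple stand-in for the visibility filtering described above."""
    H, W = hw
    # World -> camera: invert the camera-to-world pose.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]

    # Keep points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]

    # Pinhole projection to pixel coordinates.
    uvz = (K @ pts_cam.T).T
    u = np.round(uvz[:, 0] / uvz[:, 2]).astype(int)
    v = np.round(uvz[:, 1] / uvz[:, 2]).astype(int)
    z = pts_cam[:, 2]

    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth = np.full((H, W), np.inf)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        if zi < depth[vi, ui]:
            depth[vi, ui] = zi  # z-buffer: nearest point wins
    depth[np.isinf(depth)] = 0.0  # 0 marks pixels with no depth
    return depth
```

The resulting image is semi-dense because only pixels hit by at least one LiDAR point receive a depth value; the rest are left empty.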

LiDAR Place Recognition (LPR)

For LiDAR place recognition (LPR), code for training and evaluation can be found on a WildCross branch of the original Wild-Places repository. Information for accessing the branch is available in the WildCross LPR documentation folder.

Video and Multimedia

This video shows the RGB images from one of the sequences of WildCross alongside semi-dense depth images.

This video shows the RGB images from one of the sequences of WildCross alongside fine-tuned depth predictions.

Related Links

WildCross Paper for full details of the dataset and benchmark results.

WildCross GitHub Repository for code, benchmarking scripts, and project updates.

WildCross Dataset Download via the CSIRO Data Access Portal.

WildCross HuggingFace Page for published checkpoints.

Wild-Places Repository for LiDAR place recognition baseline and training code.

BibTeX

@inproceedings{wildcross2026,
  title={{WildCross: A Cross-Modal Large Scale Benchmark for Place Recognition and Metric Depth Estimation in Natural Environments}},
  author={Knights, Joshua and Reid, Joseph and Roy, Kaushik and Hall, David and Cox, Mark and Moghadam, Peyman},
  booktitle={Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026}
}