# Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera

Mukai (Tom Notch) Yu · Mosam Dabhi · Liuyue (Louise) Xie · Sebastian Scherer · László A. Jeni

Carnegie Mellon University, Robotics Institute

Unified Spherical Frontend (USF) is a distortion-free, lens-agnostic, rotation-equivariant vision framework for modern perception.
- Configure environment variables:

  ```shell
  cp .env.example .env
  ```

  Edit `.env` and fill in your own values (WandB API key, entity, Docker username, etc.). This file is gitignored and will not be committed. It is loaded into `os.environ` automatically when you `import usf`, so Hydra configs can reference the variables via `${oc.env:VAR}`.
- Docker is available via `docker compose up -d` if you set `DOCKER_USER=tomnotch` in `.env`; `scripts/` contains scripts to build/push/pull/run Docker/Singularity images.
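The automatic `.env` loading described above can be sketched with the standard library alone. This is a hypothetical re-implementation for illustration, not the actual code that runs on `import usf`:

```python
import os

def load_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ (a minimal sketch).

    Skips blank lines and comments; does not overwrite variables that
    are already set in the environment.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Demo: write a tiny .env file and load it
os.environ.pop("WANDB_API_KEY", None)  # ensure a clean slate for the demo
with open("/tmp/demo.env", "w") as f:
    f.write("# comment line\nWANDB_API_KEY=abc123\n")
load_dotenv("/tmp/demo.env")
print(os.environ["WANDB_API_KEY"])  # → abc123
```

With the variables in `os.environ`, Hydra's `${oc.env:WANDB_API_KEY}` interpolation resolves against the same environment.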
- Set up the environment locally (optional):
  - Environment compartmentalization:
    - Create the conda environment (provides Python 3.12, uv, and system-level libraries):

      ```shell
      conda env create -f environment.yml
      ```

    - Activate the environment:

      ```shell
      conda activate usf
      ```

    - Install Python packages with uv:

      ```shell
      uv pip install -e .          # core deps
      uv pip install -e '.[full]'  # optional deps (jupyter, pre-commit, etc.)
      ```

      Optional Manim for `notebook/manim/`: it is not part of `[full]` (the PyPI package needs native Cairo/Pango). See the note in `requirements-full.txt`, then run `uv pip install manim` once those libraries are available on the host.
  - Verify:

    ```shell
    pip check
    python -c "import torch, torch_scatter, xformers, faiss"
    ```

  - If `torch_scatter` complains about a `GLIBC` version mismatch, build it from source:

    ```shell
    uv pip install --no-build-isolation --no-deps "git+https://github.com/rusty1s/pytorch_scatter.git@2.1.2"
    ```
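As a gentler sanity check than importing everything at once, you can probe for the required modules without importing them, so one broken package does not mask the status of the others. This is a sketch; adjust the module list to your install:

```shell
# Probe for each required module without importing it
python - <<'EOF'
import importlib.util
for mod in ("torch", "torch_scatter", "xformers", "faiss"):
    print(f"{mod}: {'OK' if importlib.util.find_spec(mod) else 'MISSING'}")
EOF
```

Any module reported `MISSING` needs to be (re)installed before the `python -c "import ..."` check above can pass.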
- Download the datasets, create a `data/` folder, and symlink everything under it:

  ```shell
  ln -s path/to/your/dataset/folder/* data/
  ```

  <details>
  <summary>Click to see folder structure</summary>

  ```
  ❯ tree -dhl ./data
  ./data
  ├── [  28]  MNIST -> /home/your_user_name/dataset/MNIST
  │   ├── [4.0K]  t10k-images-idx3-ubyte
  │   ├── [4.0K]  t10k-labels-idx1-ubyte
  │   ├── [4.0K]  train-images-idx3-ubyte
  │   └── [4.0K]  train-labels-idx1-ubyte
  ├── [  30]  PANDORA -> /home/your_user_name/dataset/PANDORA
  │   ├── [4.0K]  annotations
  │   └── [ 92K]  images
  └── [  36]  stanford2D3DS -> /home/your_user_name/dataset/stanford2D3DS
      └── [4.0K]  area_3
          ├── [4.0K]  3d
          │   └── [ 12K]  rgb_textures
          ├── [4.0K]  data
          │   ├── [448K]  depth
          │   ├── [460K]  global_xyz
          │   ├── [464K]  normal
          │   ├── [460K]  pose
          │   ├── [436K]  rgb
          │   ├── [472K]  semantic
          │   └── [484K]  semantic_pretty
          ├── [4.0K]  pano
          │   ├── [ 16K]  depth
          │   ├── [ 16K]  global_xyz
          │   ├── [ 20K]  normal
          │   ├── [ 16K]  pose
          │   ├── [ 16K]  rgb
          │   ├── [ 16K]  semantic
          │   └── [ 20K]  semantic_pretty
          └── [376K]  raw
  ```

  </details>
  - You may organize `data/` however you like, but you then need to modify the corresponding YAML configs in the `config/` folder.
  - Links to the relevant datasets:
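The symlinking step above can be exercised end to end as follows; every path here is a placeholder, so substitute your real dataset location:

```shell
# Hypothetical paths for demonstration only
rm -rf /tmp/usf_demo
mkdir -p /tmp/usf_demo/dataset/MNIST /tmp/usf_demo/repo/data

# Symlink every dataset into the repo's data/ folder
ln -s /tmp/usf_demo/dataset/* /tmp/usf_demo/repo/data/

ls -l /tmp/usf_demo/repo/data   # MNIST -> /tmp/usf_demo/dataset/MNIST
```

Because `data/` holds only symlinks, the large datasets can live on any disk without being copied into the repository.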
- As examples, run notebooks such as `sampler.ipynb` and `network_layers.ipynb`.
- To train the MNIST classification model, run in your shell:

  ```shell
  train task=mnist
  ```

  - You may need to follow the instructions to create or log in to your WandB account and set up a project through WandB's web interface for logging to work properly; otherwise, run `mnist.ipynb` for a local demo.
  - Relevant config file: `mnist.yaml`
- To train object detection:

  ```shell
  train task=object_detection
  ```

  - Local demo: `object_detection.ipynb`
  - Relevant config file: `object_detection.yaml`
- To train semantic segmentation:

  ```shell
  train task=semantic_segmentation
  ```

  - Local demo: `semantic_segmentation.ipynb`
  - Relevant config file: `semantic_segmentation.yaml`
- To visualize a (batch of) spherical image file:

  ```shell
  visualize_spherical_image -p path/to/image.npz -f desired_fps -s point_size
  ```

  - The path can be relative, e.g. `-p data/output.npz`; the FPS and point size are optional.
  - You should see an interactive Open3D visualization window; press `h` to print the available operations in the shell.
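If you just want to smoke-test the visualizer, a minimal `.npz` can be written with NumPy. The key names below (`points`, `colors`) and the unit-sphere-points-plus-RGB layout are assumptions for illustration only; check the repository's spherical-image format for the actual schema:

```python
import numpy as np

# Assumed schema for illustration: unit-sphere points plus per-point RGB colors
n = 1000
phi = np.random.uniform(0.0, 2.0 * np.pi, n)        # azimuth
theta = np.arccos(np.random.uniform(-1.0, 1.0, n))  # inclination, uniform on the sphere
points = np.stack(
    [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)],
    axis=-1,
)
colors = np.random.rand(n, 3).astype(np.float32)
np.savez("/tmp/output.npz", points=points, colors=colors)
```

Sampling `cos(theta)` uniformly (rather than `theta` itself) is what makes the points uniform on the sphere instead of clustering at the poles.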
- To generate the lens normal map for a given camera, make sure the camera config YAML file is ready (see `rgb_0.yaml` for an example), then run:

  ```shell
  generate_lens_normal_map -c your/camera/config.yaml
  ```

  This generates the lens normal map as `.npz` and `.pdf` files in the `lens_normal_map` folder next to your camera config file.
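For orientation, a camera config of this kind might look like the sketch below. Every field name and value here is an assumption for illustration only; `rgb_0.yaml` in the repository is the authoritative template:

```yaml
# Hypothetical camera config; the field names are illustrative, not the actual schema
camera_model: fisheye        # assumed lens model identifier
image_size: [1024, 1024]     # assumed width, height in pixels
intrinsics:
  fx: 512.0
  fy: 512.0
  cx: 512.0
  cy: 512.0
distortion: [0.1, -0.05, 0.0, 0.0]  # assumed coefficient layout
```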
- For coordinate-system conventions, read `Spherical & Vector Convention.pdf`.
If you find this work useful, please cite:

```bibtex
@inproceedings{yu2026usf,
  title     = {Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera},
  author    = {Yu, Mukai and Dabhi, Mosam and Xie, Liuyue and Scherer, Sebastian and Jeni, L{\'a}szl{\'o} A.},
  year      = {2026},
  month     = jun,
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  publisher = {IEEE}
}
```