# Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera

Mukai (Tom Notch) Yu · Mosam Dabhi · Liuyue (Louise) Xie · Sebastian Scherer · László A. Jeni

Carnegie Mellon University, Robotics Institute

Unified Spherical Frontend (USF) is a distortion-free, lens-agnostic, rotation-equivariant vision framework for modern perception.
- Configure environment variables:

  ```shell
  cp .env.example .env
  ```

  Edit `.env` and fill in your own values (WandB API key, entity, Docker username, etc.). This file is gitignored and will not be committed. It is loaded into `os.environ` automatically when you `import usf`, so Hydra configs can reference the variables via `${oc.env:VAR}`.
- Docker is available via `docker compose up -d` if you set `DOCKER_USER=tomnotch` in `.env`; `scripts/` contains scripts to build/push/pull/run Docker/Singularity images.
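The automatic `.env` loading described above can be sketched with the standard library alone. This is a hypothetical re-implementation for illustration, not the actual code that runs on `import usf`:

```python
import os

def load_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ (a minimal sketch).

    Skips blank lines and comments; does not overwrite variables that
    are already set in the environment.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

# Demo: write a tiny .env file and load it
os.environ.pop("WANDB_API_KEY", None)  # ensure a clean slate for the demo
with open("/tmp/demo.env", "w") as f:
    f.write("# comment line\nWANDB_API_KEY=abc123\n")
load_dotenv("/tmp/demo.env")
print(os.environ["WANDB_API_KEY"])  # → abc123
```

With the variables in `os.environ`, Hydra's `${oc.env:WANDB_API_KEY}` interpolation resolves against the same environment.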
- Set up the environment locally (optional):
  - Environment compartmentalization:
    - Create the conda environment (provides Python 3.12, uv, and system-level libraries):

      ```shell
      conda env create -f environment.yml
      ```

    - Activate the environment:

      ```shell
      conda activate usf
      ```

    - Install Python packages with uv:

      ```shell
      uv pip install -e .          # core deps
      uv pip install -e '.[full]'  # optional deps (jupyter, pre-commit, etc.)
      ```

      Optional Manim for `notebook/manim/`: it is not part of `[full]` (the PyPI package needs native Cairo/Pango). See the note in `requirements-full.txt`, then run `uv pip install manim` once those libraries are available on the host.
  - Verify:

    ```shell
    pip check
    python -c "import torch, torch_scatter, xformers, faiss"
    ```

  - If `torch_scatter` complains about a `GLIBC` version mismatch, build it from source:

    ```shell
    uv pip install --no-build-isolation --no-deps "git+https://github.com/rusty1s/pytorch_scatter.git@2.1.2"
    ```
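As a gentler sanity check than importing everything at once, you can probe for the required modules without importing them, so one broken package does not mask the status of the others. This is a sketch; adjust the module list to your install:

```shell
# Probe for each required module without importing it
python - <<'EOF'
import importlib.util
for mod in ("torch", "torch_scatter", "xformers", "faiss"):
    print(f"{mod}: {'OK' if importlib.util.find_spec(mod) else 'MISSING'}")
EOF
```

Any module reported `MISSING` needs to be (re)installed before the `python -c "import ..."` check above can pass.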
- Download the datasets, create a `data/` folder, and symlink everything under it:

  ```shell
  ln -s path/to/your/dataset/folder/* data/
  ```

  <details>
  <summary>Click to see folder structure</summary>

  ```
  ❯ tree -dhl ./data
  ./data
  ├── [  28]  MNIST -> /home/your_user_name/dataset/MNIST
  │   ├── [4.0K]  t10k-images-idx3-ubyte
  │   ├── [4.0K]  t10k-labels-idx1-ubyte
  │   ├── [4.0K]  train-images-idx3-ubyte
  │   └── [4.0K]  train-labels-idx1-ubyte
  ├── [  30]  PANDORA -> /home/your_user_name/dataset/PANDORA
  │   ├── [4.0K]  annotations
  │   └── [ 92K]  images
  └── [  36]  stanford2D3DS -> /home/your_user_name/dataset/stanford2D3DS
      └── [4.0K]  area_3
          ├── [4.0K]  3d
          │   └── [ 12K]  rgb_textures
          ├── [4.0K]  data
          │   ├── [448K]  depth
          │   ├── [460K]  global_xyz
          │   ├── [464K]  normal
          │   ├── [460K]  pose
          │   ├── [436K]  rgb
          │   ├── [472K]  semantic
          │   └── [484K]  semantic_pretty
          ├── [4.0K]  pano
          │   ├── [ 16K]  depth
          │   ├── [ 16K]  global_xyz
          │   ├── [ 20K]  normal
          │   ├── [ 16K]  pose
          │   ├── [ 16K]  rgb
          │   ├── [ 16K]  semantic
          │   └── [ 20K]  semantic_pretty
          └── [376K]  raw
  ```

  </details>
  - You may organize `data/` however you like, but you then need to modify the corresponding YAML configs in the `config/` folder.
  - Links to the relevant datasets:
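The symlinking step above can be exercised end to end as follows; every path here is a placeholder, so substitute your real dataset location:

```shell
# Hypothetical paths for demonstration only
rm -rf /tmp/usf_demo
mkdir -p /tmp/usf_demo/dataset/MNIST /tmp/usf_demo/repo/data

# Symlink every dataset into the repo's data/ folder
ln -s /tmp/usf_demo/dataset/* /tmp/usf_demo/repo/data/

ls -l /tmp/usf_demo/repo/data   # MNIST -> /tmp/usf_demo/dataset/MNIST
```

Because `data/` holds only symlinks, the large datasets can live on any disk without being copied into the repository.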
- As examples, run notebooks such as `sampler.ipynb` and `network_layers.ipynb`.
- To train the MNIST classification model, run in your shell:

  ```shell
  train task=mnist
  ```

  - You may need to follow the instructions to create or log in to your WandB account and set up a project through WandB's web interface for logging to work properly; otherwise, run `mnist.ipynb` for a local demo.
  - Relevant config file: `mnist.yaml`
- To train object detection:

  ```shell
  train task=object_detection
  ```

  - Local demo: `object_detection.ipynb`
  - Relevant config file: `object_detection.yaml`
- To train semantic segmentation:

  ```shell
  train task=semantic_segmentation
  ```

  - Local demo: `semantic_segmentation.ipynb`
  - Relevant config file: `semantic_segmentation.yaml`
- To visualize a (batch of) spherical image file:

  ```shell
  visualize_spherical_image -p path/to/image.npz -f desired_fps -s point_size
  ```

  - The path can be relative, e.g. `-p data/output.npz`; the FPS and point size are optional.
  - You should see an interactive Open3D visualization window; press `h` to print the available operations in the shell.
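If you just want to smoke-test the visualizer, a minimal `.npz` can be written with NumPy. The key names below (`points`, `colors`) and the unit-sphere-points-plus-RGB layout are assumptions for illustration only; check the repository's spherical-image format for the actual schema:

```python
import numpy as np

# Assumed schema for illustration: unit-sphere points plus per-point RGB colors
n = 1000
phi = np.random.uniform(0.0, 2.0 * np.pi, n)        # azimuth
theta = np.arccos(np.random.uniform(-1.0, 1.0, n))  # inclination, uniform on the sphere
points = np.stack(
    [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)],
    axis=-1,
)
colors = np.random.rand(n, 3).astype(np.float32)
np.savez("/tmp/output.npz", points=points, colors=colors)
```

Sampling `cos(theta)` uniformly (rather than `theta` itself) is what makes the points uniform on the sphere instead of clustering at the poles.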
- To generate the lens normal map for a given camera, make sure the camera config YAML file is ready (see `rgb_0.yaml` for an example), then run:

  ```shell
  generate_lens_normal_map -c your/camera/config.yaml
  ```

  This generates the lens normal map as `.npz` and `.pdf` files in the `lens_normal_map` folder next to your camera config file.
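For orientation, a camera config of this kind might look like the sketch below. Every field name and value here is an assumption for illustration only; `rgb_0.yaml` in the repository is the authoritative template:

```yaml
# Hypothetical camera config; the field names are illustrative, not the actual schema
camera_model: fisheye        # assumed lens model identifier
image_size: [1024, 1024]     # assumed width, height in pixels
intrinsics:
  fx: 512.0
  fy: 512.0
  cx: 512.0
  cy: 512.0
distortion: [0.1, -0.05, 0.0, 0.0]  # assumed coefficient layout
```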
- For coordinate-system conventions, read `Spherical & Vector Convention.pdf`.
If you find this work useful, please cite:

```bibtex
@inproceedings{yu2026usf,
  title     = {Unified Spherical Frontend: Learning Rotation-Equivariant Representations of Spherical Images from Any Camera},
  author    = {Yu, Mukai and Dabhi, Mosam and Xie, Liuyue and Scherer, Sebastian and Jeni, L{\'a}szl{\'o} A.},
  year      = {2026},
  month     = jun,
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  publisher = {IEEE}
}
```