April 7, 2026: 🎉 Paper accepted at ACL Main 2026!
A benchmark for evaluating the phonetic capabilities of speech models.

```bash
# clone project
git clone git@github.com:changelinglab/prism.git
cd prism

# create an environment with your favourite package manager and install
# dependencies from requirements.txt; we provide setup_uv.sh, which does
# both and activates the environment
. ./setup_uv.sh
```
PRiSM datasets are organized in the Hugging Face collection: changelinglab/prism.
The benchmark currently uses the following dataset sources for extrinsic tasks:
| Task | HF Repo | Description |
|---|---|---|
| DYS-ez | changelinglab/easycall-dysarthria | Dysarthria intelligibility prediction |
| CSD-us | changelinglab/ultrasuite-benchmark | Atypical child speech classification |
| L1-eda | changelinglab/edacc-l1cls | L1 classification |
| L1-arc | changelinglab/cmul2arctic-l1cls | L1 classification |
| L2-so | changelinglab/speechocean-l2eval | L2 assessment |
| LID-fl | changelinglab/fleurs24-lid | Language identification |
| GEO-v | shikhar7ssu/vaani-hi-geo | Speech geolocation (to be released!) |
For the intrinsic task (phone recognition), PRiSM evaluation configs include Kaldi-style test sets (`doreco`, `gmuaccent`, `l2arctic_perceived`, `timit`, `tusom2021`, `voxangeles`) via `configs/data/powsm_evalset_index.yaml`.

To set up the evalset-index data, download the corresponding `*-pr` dataset repos from the collection into one root directory, then point `data.data_dir` to that root when running inference or evaluation. The layout under that root must match `configs/data/powsm_evalset_index.yaml`: each dataset directory contains `wav.scp`, `text.good`, and `language` files.
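Before launching inference, it can help to verify that the downloaded root matches the expected layout. A minimal sketch (the `check_evalset_root` helper is illustrative, not part of PRiSM) that flags missing required files per dataset directory:

```python
import os

# files each evalset dataset directory must contain
REQUIRED = ("wav.scp", "text.good", "language")

def check_evalset_root(root, datasets):
    """Return a dict mapping dataset name -> list of missing required files."""
    missing = {}
    for name in datasets:
        d = os.path.join(root, name)
        absent = [f for f in REQUIRED if not os.path.isfile(os.path.join(d, f))]
        if absent:
            missing[name] = absent
    return missing
```

An empty result means the root is ready to be passed as `data.data_dir`.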
After data download, PR inference can be run with:

```bash
# run with evalset index (override data_dir)
python src/main.py experiment=inference/transcribe_powsm data=powsmeval data.data_dir=/path/to/prism-evalsets
```

Train a model with the default configuration:

```bash
# train on CPU
python src/main.py trainer=cpu

# train on GPU
python src/main.py trainer=gpu
```

Train a model with a chosen experiment configuration from `configs/experiment/`:

```bash
# for probing experiments using hidden representations
python src/main.py experiment=probing/lid_fleurs_powsm

# for inference experiments
python src/main.py experiment=inference/transcribe_powsm data=doreco data.dataset_name=voxangeles task_name=inf_voxangeles_powsm
```

You can override any parameter from the command line like this:

```bash
python src/main.py trainer.max_epochs=20 data.batch_size=64
```

- Features & Capabilities - how to train on multiple GPUs, run hyperparameter searches, etc.
- Running Inference - Guide for running phone recognition inference with pre-trained models
- Tokenization Workflow - How to build vocabularies and use tokenizers for IPA transcripts
- Contributing Guide - Project structure, workflow, and best practices for contributors
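The Tokenization Workflow guide is the authoritative reference; as a rough illustration only, a greedy longest-match segmenter over a known IPA symbol vocabulary (the `ipa_tokenize` helper below is hypothetical, not PRiSM's tokenizer) might look like:

```python
def ipa_tokenize(transcript, vocab):
    """Greedy longest-match segmentation of an IPA string into known symbols.

    Multi-character symbols (e.g. the affricate "t͡ʃ") must be listed in the
    vocab to be matched as single tokens; unknown characters map to "<unk>".
    """
    symbols = sorted(vocab, key=len, reverse=True)  # try longest symbols first
    tokens, i = [], 0
    while i < len(transcript):
        for s in symbols:
            if transcript.startswith(s, i):
                tokens.append(s)
                i += len(s)
                break
        else:
            tokens.append("<unk>")
            i += 1
    return tokens
```

For example, with a vocabulary containing both `t͡ʃ` and its component symbols, `t͡ʃat` segments into three tokens rather than five.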
If you use this code in your research, please cite our paper:
```bibtex
@misc{prism2026,
  title={PRiSM: Benchmarking Phone Realization in Speech Models},
  author={Shikhar Bharadwaj and Chin-Jou Li and Yoonjae Kim and Kwanghee Choi and Eunjung Yeo and Ryan Soh-Eun Shim and Hanyu Zhou and Brendon Boldt and Karen Rosero Jacome and Kalvin Chang and Darsh Agrawal and Keer Xu and Chao-Han Huck Yang and Jian Zhu and Shinji Watanabe and David R. Mortensen},
  year={2026},
  eprint={2601.14046},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.14046},
}
```

This repository structure is based on the Lightning-Hydra-Template.