The Fitness Landscape for Antibodies (FLAb) is the largest publicly available collection of therapeutic antibody data for training and benchmarking protein AI models. It provides open-access, high-quality developability data across seven therapeutic properties (expression, thermostability, immunogenicity, aggregation, polyreactivity, binding affinity, and pharmacokinetics), spanning 241 datasets and over 3 million antibody assay data points aggregated from public studies.
Each dataset is a CSV with heavy (and optionally light) amino acid sequence columns and a fitness column containing the experimental assay value. Additional metadata columns may also be present.
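For example, a dataset can be inspected directly with pandas. This is a minimal sketch, assuming pandas is installed; the binding CSV used here is the same one shown in the scoring commands later in this README.

import pandas as pd

# Load one dataset from the binding category.
df = pd.read_csv("data/binding/hie2023efficient_CoV2_S309_Kd.csv")

# Every dataset has a heavy-chain sequence column and a fitness column;
# two-chain antibody datasets also carry a light-chain column.
print(df.columns.tolist())
print(df[["heavy", "fitness"]].head())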
A web interface to FLAb can be found here.
FLAb/
├── data/ # 241 datasets in 7 therapeutic property categories
├── models/ # Scoring scripts (zero-shot, few-shot, ablation)
├── score/ # Zero-shot scored outputs per model
├── score_ft/ # Few-shot scored outputs per model
├── score_ablation/ # Ablation study outputs (empty — runs pending)
└── envs/ # Conda environment YAML files
Datasets are organized by therapeutic property under data/. Each category folder has its own README describing each dataset (size, assay units, publication, license, direction of favorable values).
See data/README.md for the full dataset index.
data/
├── aggregation/ (31 datasets)
├── binding/ (132 datasets)
├── expression/ (7 datasets)
├── immunogenicity/ (4 datasets)
├── pharmacokinetics/ (9 datasets)
├── polyreactivity/ (33 datasets)
└── thermostability/ (25 datasets)
Format: Each CSV has, at minimum, heavy and fitness columns. Two-chain antibody datasets also include a light column; nanobody datasets have heavy only. Fifteen datasets have non-standard fitness column names (jain2024assessment_*, kirby2024retrospective_*); they are kept for reference but excluded from automated scoring.
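A minimal sketch for enumerating the standard-format datasets, assuming the layout above. Skipping the non-standard datasets by filename prefix is an illustrative shortcut, not the repository's own filtering logic.

from pathlib import Path
import pandas as pd

# Collect datasets that follow the standard column layout (heavy + fitness).
# Datasets prefixed jain2024assessment_ / kirby2024retrospective_ use other
# fitness column names and are skipped here.
standard = []
for csv_path in sorted(Path("data").glob("*/*.csv")):
    if csv_path.name.startswith(("jain2024assessment_", "kirby2024retrospective_")):
        continue
    cols = pd.read_csv(csv_path, nrows=0).columns
    if {"heavy", "fitness"}.issubset(cols):
        standard.append(csv_path)

print(f"{len(standard)} standard datasets found")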
All scoring scripts live in models/ and are run from the FLAb/ root directory. See models/README.md for full details.
Each scoring_*.py script takes a single dataset and a method name:
python models/scoring_esm2_150M.py data/binding/hie2023efficient_CoV2_S309_Kd.csv esm2_150M_score
Output is written to score/{method_name}/{category}_{dataset}.csv.gz with columns:
folder, csv, {method}, {method}_pval, {method}_ld, {method}_ld_pval
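A hedged sketch of reading one scored output with pandas; the method name, output filename, and column lookups below follow the patterns above but are illustrative, so adjust them to your own run.

import pandas as pd

# Read one zero-shot output; pandas decompresses .csv.gz transparently.
# The score column is assumed to be named after the method string passed
# to the scoring script.
method = "esm2_150M_score"
out = pd.read_csv(f"score/{method}/binding_hie2023efficient_CoV2_S309_Kd.csv.gz")

print(out.columns.tolist())
print(out[["folder", "csv", method, f"{method}_pval"]].head())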
Available zero-shot models: antiberty, iglm, ld_score, esm2_{8M,35M,150M,650M,3B}, esm2_15B, bp_{aromaticity,average_flexibility,charge_at_7_4,gravy,instability_index,isoelectric_point,molecular_weight}, ism_{3B_uc30,650M_uc30,650M_uc30pdb}, progen2_{151M_small,2p7B_bfd90,2p7B_large,6p4B_xlarge,764M_{base,medium,oas}}, pyrosetta, abmpnn, chai1, esmif, igfold, proteinmpnn
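To score a whole category, the single-dataset command can simply be looped. This is an illustrative wrapper, not a script shipped with the repository; it reuses the invocation shown above and skips the non-standard datasets.

import subprocess
from pathlib import Path

# Score every standard binding dataset with one zero-shot model,
# calling the per-dataset command from the FLAb/ root directory.
for csv_path in sorted(Path("data/binding").glob("*.csv")):
    if csv_path.name.startswith(("jain2024assessment_", "kirby2024retrospective_")):
        continue
    subprocess.run(
        ["python", "models/scoring_esm2_150M.py", str(csv_path), "esm2_150M_score"],
        check=True,
    )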
Few-shot scripts use an 80/10/10 train/val/test split and output to score_ft/ft_{model}/:
python models/ft_scoring_esm2_150M.py data/binding/hie2023efficient_CoV2_S309_Kd.csv ft_esm2_150M_score
Available few-shot models: ft_{antiberty,esm2_{8M,35M,150M,650M,3B},esmif,igfold2_{bert,gt,structure},ism_{3B,650M_uc30,650M_uc30pdb},onehot}
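The 80/10/10 split is handled inside the ft_scoring_*.py scripts themselves. The sketch below only illustrates the ratio with a simple random partition using scikit-learn, which may differ from the scripts' actual splitting logic.

import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative 80/10/10 random partition of one dataset.
df = pd.read_csv("data/binding/hie2023efficient_CoV2_S309_Kd.csv")
train, rest = train_test_split(df, test_size=0.2, random_state=0)
val, test = train_test_split(rest, test_size=0.5, random_state=0)
print(len(train), len(val), len(test))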
Pre-computed few-shot results are in score_ft/ft_combined_data.csv.
Create a conda environment for each scoring method:
conda env create --name ENV_NAME --file envs/ENV.yml
Available environments: antiberty.yml, esmif.yml, iglm.yml, mpnn.yml, progen.yml, pyrosetta.yml
Model weights (ISM, ProGen2, AbMPNN, ProteinMPNN) are expected at ~/models/. See models/README.md for exact paths.
FLAb is a living benchmark. To contribute data or models, submit a pull request or email mchungy1@jhu.edu.
For bugs, open a GitHub issue.
@article{chungyoun2025flab2,
title = {Fitness Landscape for Antibodies 2: Benchmarking Reveals That Protein AI Models Cannot Yet Consistently Predict Developability Properties},
author = {Chungyoun, Michael and Gray, Jeffrey},
journal = {bioRxiv},
doi = {https://doi.org/10.64898/2025.12.27.696706},
year = {2025}
}
Dataset licenses are listed in data/README.md.
