feat: add support for LMDB molecular/materials science datasets from FAIR Chemistry #83

Open
amorehead wants to merge 8 commits into RosettaCommons:production from amorehead:feat/add-lmdb-support
Conversation

@amorehead commented May 6, 2026

📋 PR Checklist

  • This PR is tagged as a draft if it is still under development and not ready for review.

    This avoids auto-triggering the slower CI tests and wasting resources.

  • I have ensured that all my commits follow Angular commit message conventions.

    Format: <type>[optional scope]: <subject>
    Example: fix(af3): add missing crop transform to the af3 pipeline

    This affects semantic versioning as follows:

    • fix: patch version increment (0.0.1 → 0.0.2)
    • feat: minor version increment (0.0.1 → 0.1.0)
    • BREAKING CHANGE: major version increment (0.0.1 → 1.0.0)
    • All other types do not affect versioning

    The format ensures readable changelogs through auto-generation from commit messages.

  • I have run make format on the codebase before submitting the PR (this autoformats the code and lints it).

  • I have also named the PR in Angular commit message format (cf. above), with a concise subject line that summarizes all the changes in the PR.

    This matters because the PR name becomes the default commit message when the PR is merged with squash & merge.
    Format: <type>[optional scope]: <subject>
    Example: fix(af3): add missing crop transform to the af3 pipeline
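
The version-bump rules in the checklist above can be sketched in a few lines of Python. This is only an illustration of the stated mapping (fix → patch, feat → minor, BREAKING CHANGE → major). The names `HEADER_RE`, `bump_kind`, and `bump_version` are hypothetical, and the release tooling this repository actually uses is not shown in this PR.

```python
import re

# Header format from the checklist: <type>[optional scope]: <subject>
# (hypothetical regex, not taken from the repository's release tooling)
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^)]+)\))?: (?P<subject>.+)$"
)


def bump_kind(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for one commit message."""
    # A BREAKING CHANGE marker anywhere in the message wins over the type.
    if "BREAKING CHANGE" in message:
        return "major"
    first_line = message.splitlines()[0] if message else ""
    match = HEADER_RE.match(first_line)
    if match is None:
        return "none"
    # All types other than feat and fix leave the version unchanged.
    return {"feat": "minor", "fix": "patch"}.get(match.group("type"), "none")


def bump_version(version: str, kind: str) -> str:
    """Apply a bump kind to a 'MAJOR.MINOR.PATCH' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version


if __name__ == "__main__":
    msg = "fix(af3): add missing crop transform to the af3 pipeline"
    print(bump_version("0.0.1", bump_kind(msg)))
```

Under these assumptions, the title of this PR (type feat) would count as a minor version increment once squash-merged.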


ℹ️ PR Description

What changes were made and why?

This pull request adds support for LMDB molecular/materials science datasets from FAIR Chemistry, enabling users to load, analyze, and train models with datasets such as OMol25, OMat24, and OPoly26. The changes are fairly modular and should be unlikely to affect other parts of the codebase.

How were the changes tested?

Dedicated unit tests for molecular and materials science data loading were added in tests/ml/datasets/test_lmdb_dataset.py. All tests pass when run with pytest tests/ml/datasets. Credit goes to @Alex-Abrudan for verifying that this branch enables model training with OMol25 using foundry.

Additional Notes

Thank you for preparing such a modular codebase! That made this implementation fairly straightforward.

@rclune rclune requested a review from nscorley May 8, 2026 07:27