Add onnx export #294

Open
julianpollmann wants to merge 32 commits into main from feature/add-onnx-export

@julianpollmann

Collaborator

This adds ONNX export to the model and the training script.

@florian-huber @niekdejonge does this fit or should this be added to a converter script in ms2deepscore/models?

Should I adapt the Pipeline to use onnxruntime for inference?

@niekdejonge

Collaborator

@julianpollmann Great! Thanks for developing an onnx converter and for adding it here.

Good that you added a save option for the settings. Right now it is a separate JSON file, right? Ideally the settings would be part of the ONNX file itself, which makes it easier to use and lowers the risk of pairing a model with the wrong settings. I think this is possible for ONNX as well, using metadata_props (suggested by Claude).
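
A minimal sketch of the metadata_props idea, not the PR's actual implementation. ONNX metadata entries are string key/value pairs, so the settings are stored here as a single JSON string. The key name `ms2deepscore_settings` and both helper functions are hypothetical; the helpers only assume an object that exposes `metadata_props` the way an ONNX `ModelProto` does.

```python
import json

# Hypothetical key name, chosen for illustration only.
SETTINGS_KEY = "ms2deepscore_settings"


def embed_settings(model_proto, settings: dict) -> None:
    """Store the settings as one JSON string in the model's metadata_props."""
    entry = model_proto.metadata_props.add()
    entry.key = SETTINGS_KEY
    entry.value = json.dumps(settings, sort_keys=True)


def read_settings(model_proto) -> dict:
    """Read the settings back; fail loudly if the model carries none."""
    for entry in model_proto.metadata_props:
        if entry.key == SETTINGS_KEY:
            return json.loads(entry.value)
    raise KeyError(f"No '{SETTINGS_KEY}' entry found in metadata_props")


# Usage, assuming the onnx package is installed (file names are placeholders):
#   import onnx
#   model = onnx.load("model.onnx")
#   embed_settings(model, {"embedding_dim": 400})
#   onnx.save(model, "model_with_settings.onnx")
```

At inference time, onnxruntime exposes the same entries through `session.get_modelmeta().custom_metadata_map`, so the settings can be read without loading the full `onnx` package.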

For inference we would like to keep matchms compatibility, which makes it a bit more work. Currently this is handled by the class MS2DeepScore(BaseSimilarity), which inherits from the matchms BaseSimilarity to make it matchms compatible. It has some torch-specific code (like eval) and takes a SiameseSpectralModel (also torch-specific).

So what I think is needed to run this with onnxruntime is a new MS2DeepScoreONNX class, a SiameseSpectralModelONNX class, and a compute_embedding_array_onnx function to enable inference. In fact, you might be able to put all of this in a single MS2DeepScoreONNX class: SiameseSpectralModelONNX is not really needed as a separate class, since we don't do ONNX training. I don't think Pipeline needs changing. It lives on the matchms side and, if I remember correctly, will run anything that inherits from BaseSimilarity, so if you give it the new MS2DeepScoreONNX, which inherits from BaseSimilarity, it should work fine with Pipeline.
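
A rough sketch of the single-class design, under stated assumptions. The class name `MS2DeepScoreONNXSketch` and the injected `embed` callable are hypothetical. A real version would subclass matchms' BaseSimilarity, and `embed` would wrap an onnxruntime session; both are left out so the sketch runs with the standard library only. Scoring by cosine similarity between embeddings is also an assumption here.

```python
import math
from typing import Callable, List, Sequence

Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MS2DeepScoreONNXSketch:
    """Hypothetical single class: embed spectra, then compare the embeddings.

    `embed` stands in for the ONNX inference step (spectrum -> embedding).
    """

    def __init__(self, embed: Callable[[object], Embedding]):
        self.embed = embed

    def pair(self, reference, query) -> float:
        """Score one reference/query pair."""
        return cosine_similarity(self.embed(reference), self.embed(query))

    def matrix(self, references, queries) -> List[List[float]]:
        """Score all pairs, embedding every spectrum only once."""
        ref_emb = [self.embed(s) for s in references]
        query_emb = [self.embed(s) for s in queries]
        return [[cosine_similarity(r, q) for q in query_emb] for r in ref_emb]
```

The `pair` and `matrix` method names mirror the interface that BaseSimilarity subclasses implement, which is why Pipeline should not need to know whether torch or onnxruntime sits behind the class.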

A bit more work than I realized when I commented on your LinkedIn post, sorry... But I think it is a nice change that is worth the effort, since ONNX is faster and also interoperable across programming languages.

@julianpollmann

julianpollmann commented Jun 24, 2026

Collaborator Author

@niekdejonge I will have time on Friday to look into that!
Weights and settings can be integrated into the ONNX file via metadata_props; however, some providers (e.g., Hugging Face) keep separate config/settings.json files, e.g., for model cards. I don't know which approach is more suitable for us.

Also: right now the ONNX conversion is done on every checkpoint. Is there a specific "last checkpoint" function, or would it be better to integrate this into training_wrapper_functions.py?

@julianpollmann julianpollmann removed the request for review from florian-huber June 24, 2026 13:09
Comment thread ms2deepscore/models/SiameseSpectralModelONNX.py Outdated
Comment thread tests/test_siamese_spectral_model_onnx.py
Comment thread ms2deepscore/models/SiameseSpectralModel.py

@florian-huber florian-huber left a comment

Member

Great addition, thanks @julianpollmann!
I added a few comments to check (feel free to ignore them if they are beside the point).

Comment thread README.md
## 1) Compute spectral similarities
We provide a model which was trained on > 500,000 MS/MS combined spectra from [GNPS](https://gnps.ucsd.edu/), [Mona](https://mona.fiehnlab.ucdavis.edu/), MassBank and MSnLib.

This model can be downloaded [from zenodo here](https://zenodo.org/records/17826815). Only the ms2deepscore_model.pt is needed.

Collaborator

We should update this and upload the latest ONNX model to a new Zenodo link.

Collaborator Author

Not quite sure, but I think @florian-huber is training a model on new data. Maybe we should add this one then?

Comment thread README.md
from ms2deepscore import MS2DeepScore, MS2DeepScoreONNX

cleaned_spectra = pipeline.spectra_queries

Collaborator

I think the two lines below should be removed, since this continues from the code above, which loads an ONNX model, so I guess it won't work with MS2DeepScore. If we remove these two lines it should be fine.

Collaborator Author

So you mean removing:

ms2ds_model = MS2DeepScore(model)
ms2ds_embeddings = ms2ds_model.get_embedding_array(cleaned_spectra)

?

Comment thread README.md
Comment thread README.md

@niekdejonge niekdejonge left a comment

Collaborator

@julianpollmann Great work! Looks good to me! Very complete PR, in both the tests and the README.

The tests with the ONNX model give the same output as the ones for the .pt models, so it all looks good to me!
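
For illustration, a parity check of this kind could look like the sketch below. This is not the PR's test code: the helper name `embeddings_match` and the tolerances are assumptions. Exact equality between backends is usually too strict for floating-point output, so values are compared within a tolerance.

```python
import math


def embeddings_match(reference, candidate, rel_tol=1e-5, abs_tol=1e-6) -> bool:
    """Return True if two 2D embedding arrays agree element-wise within tolerance.

    `reference` would hold the torch (.pt) embeddings and `candidate` the ONNX
    ones; both are plain nested sequences here to keep the sketch stdlib-only.
    """
    if len(reference) != len(candidate):
        return False
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            return False
        for r, c in zip(ref_row, cand_row):
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


# With NumPy arrays, numpy.testing.assert_allclose(actual, desired, rtol=..., atol=...)
# does the same job and reports which elements differ.
```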

It will be great to have the CPU speed-up from ONNX, since the majority of users will just want to run it on their laptop.

I left a few comments on the tutorial, both for clarity and about some small things that won't run.


3 participants