Add onnx export #294
@julianpollmann Great! Thanks for developing an ONNX converter and for adding it here. Good that you added a save option for the settings. Right now it is a separate JSON file. Ideally the settings would be part of the ONNX file itself, for easier use and less risk of pairing a model with the wrong settings. I think this is possible for ONNX as well, using metadata_props (suggested by Claude).

For inference we would like to have matchms compatibility, which makes it a bit more work... The current solution is the class MS2DeepScore(BaseSimilarity), which inherits from the matchms BaseSimilarity to make it matchms compatible. It has some torch-specific code (like eval) and takes in a SiameseSpectralModel (also torch-specific). So what I think is needed to run it with onnxruntime is a new MS2DeepScoreONNX class, a SiameseSpectralModelONNX class and a compute_embedding_array_onnx function to enable inference. In fact, you might be able to put all this functionality in a single MS2DeepScoreONNX class; SiameseSpectralModelONNX is not really needed as a separate class, since we don't do ONNX training.

I think Pipeline doesn't need changing. It lives on the matchms end and, if I remember correctly, will run anything that inherits from BaseSimilarity, so if you give it the new MS2DeepScoreONNX, which inherits from BaseSimilarity, it should work fine with Pipeline.

A bit more work than I realized when I commented on your LinkedIn post, sorry... But I think it is a nice change worth the effort, since ONNX is faster and also interoperable across programming languages.
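To make the metadata_props suggestion concrete, here is a minimal sketch of storing the settings inside the ONNX file and reading them back at inference time. It is not the code in this PR: the function names, the `ms2deepscore_settings` key and the example setting names are hypothetical. The only library calls assumed are `onnx.load`, `onnx.save`, the `metadata_props` field of the model, and `custom_metadata_map` on the onnxruntime model metadata.

```python
import json

# Hypothetical key under which the settings are stored in the ONNX file.
SETTINGS_KEY = "ms2deepscore_settings"


def settings_to_metadata(settings: dict) -> dict:
    """ONNX metadata values must be strings, so serialize the settings as JSON."""
    return {SETTINGS_KEY: json.dumps(settings, sort_keys=True)}


def metadata_to_settings(metadata: dict) -> dict:
    """Recover the settings dict from an ONNX metadata mapping."""
    return json.loads(metadata[SETTINGS_KEY])


def embed_settings_in_onnx(onnx_path: str, settings: dict) -> None:
    """Write the settings into the metadata_props of an existing ONNX file."""
    import onnx  # lazy import: the helpers above work without onnx installed

    model = onnx.load(onnx_path)
    # Note: calling this twice on the same file appends a duplicate entry.
    for key, value in settings_to_metadata(settings).items():
        entry = model.metadata_props.add()
        entry.key = key
        entry.value = value
    onnx.save(model, onnx_path)


def read_settings_from_onnx(onnx_path: str) -> dict:
    """Load the settings back through onnxruntime, as an inference class might."""
    import onnxruntime as ort  # lazy import, see above

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    return metadata_to_settings(session.get_modelmeta().custom_metadata_map)
```

With this layout a single `.onnx` file carries both the weights and the settings, so an inference class would not need a separate JSON path.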
@niekdejonge I will have time on Friday to look into that! Also: right now ONNX conversion is done on every checkpoint. Is there a specific "last checkpoint" function, or should this rather be integrated into training_wrapper_functions.py?
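One way to frame the checkpoint question: save a checkpoint whenever the validation loss improves, but run the ONNX export only once, after training has finished. The sketch below is a generic illustration with hypothetical callables (`run_epoch`, `save_checkpoint`, `export_onnx`); it is not the ms2deepscore training loop and makes no assumption about what training_wrapper_functions.py contains.

```python
from typing import Callable


def train_and_export_once(
    n_epochs: int,
    run_epoch: Callable[[int], float],
    save_checkpoint: Callable[[int], None],
    export_onnx: Callable[[], None],
) -> int:
    """Checkpoint on every improvement, export to ONNX a single time at the end.

    Returns the index of the epoch with the lowest validation loss.
    """
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")

    best_loss = float("inf")
    best_epoch = -1
    for epoch in range(n_epochs):
        val_loss = run_epoch(epoch)  # train one epoch, return validation loss
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
            save_checkpoint(epoch)  # cheap: overwrite the best checkpoint

    export_onnx()  # comparatively expensive: done once, from the best checkpoint
    return best_epoch
```

Whether the export call belongs in the training script or in a wrapper function is a design choice; the sketch only shows that the export can be decoupled from per-epoch checkpointing.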
florian-huber
left a comment
Great addition, thanks @julianpollmann !
I added a few comments to check (feel free to ignore them if they are beyond the point).
## 1) Compute spectral similarities
We provide a model which was trained on > 500,000 MS/MS combined spectra from [GNPS](https://gnps.ucsd.edu/), [Mona](https://mona.fiehnlab.ucdavis.edu/), MassBank and MSnLib.

This model can be downloaded from [from zenodo here](https://zenodo.org/records/17826815). Only the ms2deepscore_model.pt is needed.
We should update this and upload the latest onnx model to a new zenodo link.
Not quite sure, but I think @florian-huber is training a model on new data. Maybe we should add this one then?
from ms2deepscore import MS2DeepScore, MS2DeepScoreONNX

cleaned_spectra = pipeline.spectra_queries
I think the two lines below should be removed. They continue from the code above, which loads an ONNX model, so I guess they won't work with MS2DeepScore. If we remove these two lines it should be fine.
So you mean removing:
ms2ds_model = MS2DeepScore(model)
ms2ds_embeddings = ms2ds_model.get_embedding_array(cleaned_spectra)
?
niekdejonge
left a comment
@julianpollmann Great work! Looks good to me! Very complete PR, both in tests and README.
The tests with the ONNX model have the same output as the ones for the .pt models, so it all looks good to me!
It will be great to have the CPU speed-up from ONNX, since the majority of users will just want to run it on their laptop.
I left a few comments in the tutorial, for further clarity and some small things that won't run.
This adds export to onnx to the model and training script.
@florian-huber @niekdejonge does this fit or should this be added to a converter script in ms2deepscore/models? Should I adapt the Pipeline to use onnxruntime for inference?