Expose project metadata fields via REST API #946
Open
avalyset wants to merge 1 commit into
Add a "metadata" field to the project representation returned by GET /v1/projects and /v1/projects/<project_id>, listing the document metadata fields a project consumes through select(...) transforms. For ensemble projects the fields of the source projects are gathered in addition to the project's own, as a flat, order-preserving, de-duplicated list. A source project that cannot be loaded is skipped best-effort, mirroring the graceful degradation already used for vocab in dump(). The Project schema in the OpenAPI spec is updated so schemathesis response validation stays green.



Closes #893.
Exposes the document metadata fields a project uses during `suggest` via the REST API, as requested in the issue.

`GET /v1/projects` and `GET /v1/projects/<project_id>` now return a `metadata` field: a flat list of the field names a project consumes through `select(...)` transforms.

Following the issue's spec for ensemble projects ("gather the metadata fields that the source projects use, in addition to the fields the ensemble project itself uses"), an ensemble's `metadata` is the order-preserving, de-duplicated union of its own select fields and those of its source projects.
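The "order-preserving, de-duplicated union" described above can be illustrated with a small standalone sketch. This is not the code from this PR: `ordered_union` and the field names are hypothetical, chosen only to show the intended semantics (first occurrence wins, later repeats are dropped).

```python
from __future__ import annotations

from typing import Iterable


def ordered_union(*field_lists: Iterable[str]) -> list[str]:
    """Merge several field lists, keeping first-seen order and dropping repeats."""
    # dicts preserve insertion order, so dict.fromkeys de-duplicates while
    # keeping the position of each name's first occurrence.
    merged: dict[str, None] = {}
    for fields in field_lists:
        merged.update(dict.fromkeys(fields))
    return list(merged)


# Hypothetical field names, for illustration only.
ensemble_own = ["title", "abstract"]
source_a = ["abstract", "keywords"]
source_b = ["title", "language"]

metadata = ordered_union(ensemble_own, source_a, source_b)
print(metadata)  # ['title', 'abstract', 'keywords', 'language']
```

The ensemble's own fields are passed first, so they lead the list; fields contributed only by sources follow in source order.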
One implementation note: `metadata_fields()` skips a source project that can't be resolved (`get_project` raising `ValueError`) rather than propagating, so it adds no new failure mode to `dump()`. This is narrower than the `vocab` try/except above it: `dump()` already evaluates `is_trained` (which resolves ensemble sources) before this point, so a genuinely missing source still surfaces there exactly as today; the skip only keeps `metadata_fields()` itself from being a second raise point. Happy to make it strict (raise) if you'd prefer.
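For reviewers weighing skip versus strict, here is a minimal sketch of the best-effort behaviour described in the note. It is an illustration under stated assumptions, not the actual `metadata_fields()` body: `gather_fields`, `lookup`, and `REGISTRY` are hypothetical stand-ins, with `lookup` playing the role of a resolver that raises `ValueError` for an unknown project.

```python
from __future__ import annotations

from typing import Callable


def gather_fields(
    own_fields: list[str],
    source_ids: list[str],
    lookup: Callable[[str], list[str]],
) -> list[str]:
    """Collect own fields plus each source's fields, skipping unresolved sources."""
    seen: dict[str, None] = dict.fromkeys(own_fields)
    for source_id in source_ids:
        try:
            source_fields = lookup(source_id)
        except ValueError:
            # Best-effort: an unresolvable source contributes nothing.
            continue
        seen.update(dict.fromkeys(source_fields))
    return list(seen)


# Hypothetical registry standing in for project resolution.
REGISTRY = {
    "tfidf-en": ["title", "keywords"],
    "mllm-en": ["keywords", "language"],
}


def lookup(project_id: str) -> list[str]:
    if project_id not in REGISTRY:
        raise ValueError(f"No such project {project_id}")
    return REGISTRY[project_id]


fields = gather_fields(["title"], ["tfidf-en", "missing", "mllm-en"], lookup)
print(fields)  # ['title', 'keywords', 'language']
```

In this sketch, the strict variant would simply drop the `try`/`except`, letting the `ValueError` from the missing source propagate to the caller.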
Changes
- `annif/project.py`: `metadata_fields()` + `_select_fields()` helpers; `metadata` added to `dump()`.
- `annif/openapi/annif.yaml`: `metadata` added to the `Project` schema (keeps schemathesis response validation green).
- `select(...)` project and a metadata-gathering ensemble fixture; unit + REST coverage for own fields, ensemble union/dedup, and unloadable-source skip.

Validation
- `pytest tests/test_project.py tests/test_rest.py tests/test_openapi.py`: green (pre-existing tfidf-training failures unrelated to this change).
- `flake8` / `black --check` / `isort --check`: clean.