Expose project metadata fields via REST API #946

Open
avalyset wants to merge 1 commit into NatLibFi:main from avalyset:issue893-expose-project-metadata-rest-api

@avalyset

Closes #893.

Exposes, via the REST API, the document metadata fields a project uses during
suggest, as requested in the issue. GET /v1/projects and
GET /v1/projects/<project_id> now return a metadata field, a flat list of
the field names a project consumes through select(...) transforms:

"metadata": ["title", "description", "text"]

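A client could read the new field roughly as follows. This is a minimal sketch, not part of the PR: the sample payload is illustrative (the project_id and name values are made up), project_metadata is a hypothetical helper, and the fallback to an empty list assumes that servers without this change simply omit the key.

```python
import json

# Illustrative payload only; the project_id and name values are made up.
SAMPLE_RESPONSE = """
{
  "project_id": "example-en",
  "name": "Example project",
  "metadata": ["title", "description", "text"]
}
"""


def project_metadata(payload: str) -> list[str]:
    """Return the metadata field names from one project representation."""
    project = json.loads(payload)
    # Assumption: a server without this change omits the key entirely,
    # so fall back to an empty list instead of raising KeyError.
    return list(project.get("metadata", []))


fields = project_metadata(SAMPLE_RESPONSE)
```
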
Following the issue's spec for ensemble projects ("gather the metadata fields
that the source projects use, in addition to the fields the ensemble project
itself uses"), an ensemble's metadata is the order-preserving, de-duplicated
union of its own select fields and those of its source projects.
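
The union described above can be sketched as below. This illustrates the merge semantics only and is not the code in annif/project.py; ordered_union and the sample field lists are hypothetical.

```python
from collections.abc import Iterable


def ordered_union(*field_lists: Iterable[str]) -> list[str]:
    """Merge field lists, keeping the first occurrence of each name."""
    # dict keys preserve insertion order, which gives an ordered set.
    seen: dict[str, None] = {}
    for fields in field_lists:
        for name in fields:
            seen.setdefault(name, None)
    return list(seen)


# The ensemble's own fields come first, then each source project's fields.
own = ["title", "text"]
source_a = ["text", "description"]
source_b = ["title", "abstract"]
merged = ordered_union(own, source_a, source_b)
```

With these inputs, merged is ["title", "text", "description", "abstract"]: the ensemble's own fields keep their position and later duplicates are dropped.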

One implementation note: metadata_fields() skips a source project that can't
be resolved (get_project raising ValueError) rather than propagating the error,
so it adds no new failure mode to dump(). This is narrower than the vocab
try/except above it: dump() already evaluates is_trained (which resolves
ensemble sources) before this point, so a genuinely missing source still
surfaces there exactly as today. The skip only keeps metadata_fields() itself
from being a second raise point. Happy to make it strict (raise) if you'd prefer.
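
For reviewers weighing skip versus strict, the skip behaviour amounts to something like the sketch below. It is a simplified stand-in, not the PR's implementation: gather_metadata_fields, resolve and REGISTRY are hypothetical, and resolve returns a source's field list directly instead of a project object.

```python
from collections.abc import Callable, Iterable

# Hypothetical registry standing in for the project lookup.
REGISTRY = {"source-a": ["text", "description"]}


def resolve(project_id: str) -> list[str]:
    """Return the fields of a source project, or raise ValueError."""
    try:
        return REGISTRY[project_id]
    except KeyError:
        raise ValueError(f"No such project: {project_id}") from None


def gather_metadata_fields(
    own_fields: Iterable[str],
    source_ids: Iterable[str],
    resolve_source: Callable[[str], list[str]],
) -> list[str]:
    """Collect own and source fields, skipping unresolvable sources."""
    fields = list(own_fields)
    for source_id in source_ids:
        try:
            fields.extend(resolve_source(source_id))
        except ValueError:
            # Skip rather than raise; a strict variant would re-raise here.
            continue
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(fields))


result = gather_metadata_fields(["title"], ["source-a", "missing"], resolve)
```

Here the unresolvable "missing" source contributes nothing and result is ["title", "text", "description"]. Making the behaviour strict would mean removing the except branch.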

Changes

  • annif/project.py: metadata_fields() + _select_fields() helpers; metadata added to dump().
  • annif/openapi/annif.yaml: metadata added to the Project schema (keeps schemathesis response validation green).
  • Tests: a select(...) project and a metadata-gathering ensemble fixture; unit + REST coverage for own fields, ensemble union/dedup, and unloadable-source skip.

Validation

  • pytest tests/test_project.py tests/test_rest.py tests/test_openapi.py: green, apart from pre-existing tfidf-training failures unrelated to this change.
  • flake8 / black --check / isort --check: clean.

Add a "metadata" field to the project representation returned by
GET /v1/projects and /v1/projects/<project_id>, listing the document
metadata fields a project consumes through select(...) transforms.

For ensemble projects the fields of the source projects are gathered in
addition to the project's own, as a flat, order-preserving, de-duplicated
list. A source project that cannot be loaded is skipped on a best-effort
basis, mirroring the graceful degradation already used for vocab in dump().

The Project schema in the OpenAPI spec is updated so schemathesis response
validation stays green.

Development

Successfully merging this pull request may close these issues.

Add information on metadata fields of projects to REST API