23 changes: 23 additions & 0 deletions CITATION.cff
@@ -0,0 +1,23 @@
cff-version: 1.2.0
title: Hyrax
message: >-
  If you use Hyrax in your research, please cite this software using the
  metadata in this file. For dependency citations used in Hyrax workflows,
  also generate the dependency citation report with scripts/print_citation_compass_report.py.
type: software
authors:
  - name: LINCC Frameworks
    email: mtauraso@uw.edu
repository-code: https://github.com/lincc-frameworks/hyrax
url: https://github.com/lincc-frameworks/hyrax
license: MIT
abstract: >-
  Hyrax is a low-code framework for rapid experimentation with machine
  learning workflows in astronomy.
version: 0.0.0-dev
date-released: 2026-01-01
keywords:
  - astronomy
  - machine-learning
  - pytorch
  - anomaly-detection
152 changes: 152 additions & 0 deletions CITATION_COMPASS_IMPLEMENTATION_PLAN.md
@@ -0,0 +1,152 @@
# Citation Compass integration plan for Hyrax

## Purpose
Implement a **minimal, maintainable** citation workflow that covers both:

1. **Hyrax as a dependency** (downstream projects can cite Hyrax itself).
2. **Hyrax dependencies** (Hyrax can emit citations for key method-defining packages).

This plan is intentionally short in scope and aligned with Hyrax design principles in `HYRAX_GUIDE.md` (simple defaults, single obvious workflow, low user burden).

---

## What I reviewed before proposing this

- Project architecture and assistant guidance: `HYRAX_GUIDE.md`, `CLAUDE.md`, `.github/copilot-instructions.md`.
- Packaging + dependency surface: `pyproject.toml`.
- Current CLI architecture and extension pattern: `src/hyrax_cli/main.py`, `src/hyrax/hyrax.py`, `src/hyrax/verbs/*`.
- Docs structure and discoverability: `docs/index.rst`, `README.md`, docs reference pages.

This matters because the integration should fit existing Hyrax patterns (Jupyter first, CLI second, avoid unnecessary new config knobs/verbs unless justified).

---

## Scope for first implementation

### In scope
- Add authoritative citation metadata for Hyrax itself.
- Add Citation Compass config/data for a **small curated set** of high-impact runtime dependencies.
- Provide **one canonical way** for users/maintainers to generate citations.
- Add lightweight tests/validation and contributor documentation.

### Out of scope
- Auto-citing all transitive dependencies.
- Workflow-specific citation graph logic.
- Introducing complex policy/configuration UI in this first pass.

---

## Minimal implementation steps (with concrete file targets)

### 1) Add Hyrax software citation metadata (`CITATION.cff`)

**Create:** `CITATION.cff` at repo root.

**Include:**
- `cff-version`
- `title` (`Hyrax`)
- `message`
- `authors` (LINCC Frameworks + maintainers as appropriate)
- `repository-code` (`https://github.com/lincc-frameworks/hyrax`)
- `license` (`MIT`)
- `version` strategy tied to releases (manual update or release automation)
- `date-released` on release tags
- `doi` if/when minted

**Why:** This is the standard artifact GitHub and downstream users expect for citing software.
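
The "manual update or release automation" option for `version` and `date-released` could be handled by a small helper run during release prep. The following is a minimal sketch, not part of this PR: the function name `bump_cff_release` and the example version and date are hypothetical, and it assumes both fields appear as unindented top-level keys, as they do in the `CITATION.cff` added here.

```python
import re
from datetime import date


def bump_cff_release(cff_text: str, version: str, released: date) -> str:
    """Return cff_text with the top-level version and date-released replaced.

    Hypothetical helper: edits plain text instead of parsing YAML so that
    comments and key order in CITATION.cff are left untouched.
    """
    updates = {"version": version, "date-released": released.isoformat()}
    for key, value in updates.items():
        # Anchor at line start so "version:" never matches "cff-version:".
        pattern = re.compile(rf"^{re.escape(key)}:.*$", flags=re.MULTILINE)
        if not pattern.search(cff_text):
            raise ValueError(f"CITATION.cff has no top-level '{key}' field")
        new_line = f"{key}: {value}"
        cff_text = pattern.sub(lambda _match: new_line, cff_text, count=1)
    return cff_text


# Illustrative input only; the release number and date are made up.
example = (
    "cff-version: 1.2.0\n"
    "title: Hyrax\n"
    "version: 0.0.0-dev\n"
    "date-released: 2026-01-01\n"
)
updated = bump_cff_release(example, "1.2.3", date(2026, 3, 15))
print(updated)
```

Raising on a missing field, instead of appending it silently, keeps a malformed `CITATION.cff` visible at release time.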

---

### 2) Add Citation Compass dependency source file

**Create (name per tool convention once verified):** one root-level Citation Compass config/source file.

**Initial curated dependency set (from `pyproject.toml`):**
- `torch`
- `pytorch-ignite`
- `astropy`
- `mlflow`
- `umap-learn`
- `lancedb`
- `pyarrow`
- (optionally) one vector DB backend package actively documented in workflows (`chromadb` or `qdrant-client`)

**Selection rule for v1:** include only dependencies that are methodologically central to Hyrax scientific outcomes, not utilities.

**Why:** keeps maintenance small while covering likely citation requirements.

---

### 3) Add one canonical citation generation path

Pick exactly one user-facing interface for v1:

- **Preferred (lowest code risk):** provide a small repository wrapper script that uses the Citation Compass Python API directly.

**Initial recommendation:** keep this as a documented script-based workflow (not a Hyrax verb).

**Why:** consistent with “make easy things easy, hard things possible” while avoiding premature CLI surface expansion.

---

### 4) Document usage + maintenance policy

**Update docs:**
- `README.md` (brief “How to cite Hyrax and dependencies” section)
- `docs/reference_and_faq.rst` or another stable reference page
- Developer/contributor guidance with a short policy snippet:
  - add dependency citations only for methodologically central libs,
  - review citation list during release prep,
  - keep `CITATION.cff` current.

**Why:** without policy, citation lists sprawl quickly.

---

### 5) Add basic validation checks

**Add tests/checks with minimal friction:**
- A packaging/documentation test ensuring `CITATION.cff` exists and has required top-level fields.
- A check that Citation Compass config file is present and parseable.
- Optional: smoke test for citation command execution if dependencies are available in CI.

Likely location: `tests/hyrax/test_packaging.py` (or nearby packaging/docs test module).

**Why:** prevents silent regressions and missing citation metadata in releases.

---

## Suggested sequence for implementation

1. **Merged PR A+B:** add `CITATION.cff`, add Citation Compass config, add docs usage snippet, and add lightweight validation checks in one implementation PR.

No `hyrax cite` PR is planned for this phase.

---

## Decision log for future implementer

- **Why no immediate new verb?** Hyrax guidance recommends avoiding new verbs unless clearly needed; a documented command is enough for v1.
- **Why not all dependencies?** Most packages in `pyproject.toml` are infrastructure/utilities; citing all by default increases noise and maintenance burden.
- **Why root-level plan and files?** Citation metadata and citation tooling are repository-level concerns, not core scientific docs pages.

---

## Acceptance criteria (v1 done)

- `CITATION.cff` exists and is valid enough for GitHub citation UI.
- `citation_compass.toml` exists and includes a curated dependency list.
- One canonical documented command/path exists for generating the dependency citation report.
- Docs clearly explain when to update citation metadata and dependency entries.
- Automated checks fail if key citation artifacts are removed or malformed.

---

## Risks and mitigations

- **Risk:** dependency citation entries become stale.
  - **Mitigation:** add “citation review” to release checklist and a simple test for file presence/shape.
- **Risk:** command UX confusion if multiple ways are documented.
  - **Mitigation:** explicitly mark one canonical path.
- **Risk:** overgrowth of citation file.
  - **Mitigation:** enforce “methodologically central only” policy in contributor docs.
14 changes: 14 additions & 0 deletions README.md
@@ -52,3 +52,17 @@ the [LINCC Frameworks Team](https://lsstdiscoveryalliance.org/programs/lincc-fra
and LSST-DA Catalyst Fellow, [Aritra Ghosh](https://ghosharitra.com/).

This project is supported by Schmidt Sciences and the John Templeton Foundation


## Citation
If you use Hyrax in research, cite Hyrax using `CITATION.cff` at the repository root.

To generate the dependency citation report used by Hyrax workflows, first install Citation Compass and then run the verified repository wrapper script:

```
pip install citation-compass
python scripts/print_citation_compass_report.py --config citation_compass.toml
```

Maintenance policy: only include methodologically central dependencies in `citation_compass.toml`
(not every utility/transitive package), and review both citation files during release preparation.
50 changes: 50 additions & 0 deletions citation_compass.toml
@@ -0,0 +1,50 @@
# Citation Compass source configuration for Hyrax.
# v1 policy: include only methodologically central runtime dependencies.

[project]
name = "hyrax"
repository = "https://github.com/lincc-frameworks/hyrax"
primary_citation = "CITATION.cff"

# Canonical command path for maintainers/users:
# python scripts/print_citation_compass_report.py --config citation_compass.toml

[[dependencies]]
package = "torch"
reason = "Core ML framework used by Hyrax models and training/inference execution."
url = "https://pytorch.org/"

[[dependencies]]
package = "pytorch-ignite"
reason = "Distributed training and orchestration infrastructure in Hyrax verbs."
url = "https://pytorch.org/ignite/"

[[dependencies]]
package = "astropy"
reason = "FITS and astronomy data interfaces used in data workflows."
url = "https://www.astropy.org/"

[[dependencies]]
package = "mlflow"
reason = "Experiment tracking and model comparison support."
url = "https://mlflow.org/"

[[dependencies]]
package = "umap-learn"
reason = "Latent space dimensionality reduction used in analysis workflows."
url = "https://umap-learn.readthedocs.io/"

[[dependencies]]
package = "lancedb"
reason = "Columnar vector/result storage for Hyrax output workflows."
url = "https://lancedb.com/"

[[dependencies]]
package = "pyarrow"
reason = "Data interchange layer used by LanceDB integrations."
url = "https://arrow.apache.org/docs/python/"

[[dependencies]]
package = "chromadb"
reason = "Vector database backend available in similarity search workflows."
url = "https://www.trychroma.com/"
18 changes: 18 additions & 0 deletions docs/reference_and_faq.rst
@@ -31,3 +31,21 @@ prior to the doc sprint.
* :doc:`Model Comparison <model_comparison>` - Tools available in Hyrax for model tracking and comparison.
* :doc:`Data Set Splits <data_set_splits>` - Deep dive into how Hyrax handles data set splits.



Citation and attribution
------------------------

Use ``CITATION.cff`` in the repository root to cite Hyrax itself.

Use Citation Compass with the repository wrapper script and ``citation_compass.toml``
to generate the dependency citation report for methodologically central Hyrax
dependencies:

.. code-block:: bash

   pip install citation-compass
   python scripts/print_citation_compass_report.py --config citation_compass.toml

Hyrax intentionally keeps dependency citations curated to key methodological
libraries rather than every utility dependency.
111 changes: 111 additions & 0 deletions scripts/print_citation_compass_report.py
@@ -0,0 +1,111 @@
#!/usr/bin/env python3
"""Render Hyrax citation metadata using the citation-compass Python API.

This wrapper exists because citation-compass currently ships as a library,
not as a ``python -m citation_compass`` CLI entrypoint.
"""

from __future__ import annotations

import argparse
from pathlib import Path

from citation_compass.citation import cite_inline, get_all_citations


def _strip_quotes(value: str) -> str:
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in {'"', "'"}:
        return value[1:-1]
    return value


def parse_citation_compass_toml(config_path: Path) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Parse the small subset of TOML used by ``citation_compass.toml``.

    The repository config intentionally uses only simple string values in a
    single ``[project]`` table and repeated ``[[dependencies]]`` tables, so a
    lightweight parser is sufficient here and avoids requiring an additional
    TOML dependency for this helper script.
    """
    project: dict[str, str] = {}
    dependencies: list[dict[str, str]] = []
    current_dependency: dict[str, str] | None = None
    section: str | None = None

    for raw_line in config_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "[project]":
            section = "project"
            current_dependency = None
            continue
        if line == "[[dependencies]]":
            section = "dependencies"
            current_dependency = {}
            dependencies.append(current_dependency)
            continue
        if "=" not in line:
            continue

        key, value = [part.strip() for part in line.split("=", 1)]
        value = _strip_quotes(value)

        if section == "project":
            project[key] = value
        elif section == "dependencies" and current_dependency is not None:
            current_dependency[key] = value

    return project, dependencies


def register_citations(config_path: Path) -> None:
    """Register the Hyrax and dependency citations described in the config file."""
    project, dependencies = parse_citation_compass_toml(config_path)

    project_name = project.get("name", "hyrax")
    repository = project.get("repository", "")
    primary_citation = project.get("primary_citation", "CITATION.cff")

    cite_inline(
        f"{project_name}.software",
        (
            f"Project: {project_name}\n"
            f"Cite the software using {primary_citation}.\n"
            f"Repository: {repository}"
        ),
    )

    for dependency in dependencies:
        package = dependency.get("package", "unknown-package")
        reason = dependency.get("reason", "No reason provided.")
        url = dependency.get("url", "")
        citation_text = f"Dependency: {package}\nReason in Hyrax: {reason}"
        if url:
            citation_text += f"\nProject URL: {url}"
        cite_inline(f"dependency.{package}", citation_text)


def main() -> int:
    parser = argparse.ArgumentParser(description="Render Hyrax citation metadata using citation-compass.")
    parser.add_argument(
        "--config",
        default="citation_compass.toml",
        help="Path to the Citation Compass config file (default: citation_compass.toml).",
    )
    args = parser.parse_args()

    config_path = Path(args.config)
    if not config_path.exists():
        raise FileNotFoundError(f"Citation Compass config file not found: {config_path}")

    register_citations(config_path)
    for citation in get_all_citations():
        print(citation)
        print()
    return 0


if __name__ == "__main__":
    raise SystemExit(main())