Skip to content

feat(api): add feature-flagged /rest/v1/health endpoint#936

Open
skypank-coder wants to merge 1 commit into
OWASP:mainfrom
skypank-coder:feat/health-endpoint
Open

feat(api): add feature-flagged /rest/v1/health endpoint#936
skypank-coder wants to merge 1 commit into
OWASP:mainfrom
skypank-coder:feat/health-endpoint

Conversation

@skypank-coder

Copy link
Copy Markdown
Contributor

Closes #935

What

Thin v1 deploy/uptime health probe at GET /rest/v1/health, off by default behind the CRE_ENABLE_HEALTH feature flag.

Behavior

  • 404 when the flag is off (default) — endpoint behaves as if it doesn't exist.
  • 200 when the DB is reachable AND the dataset is non-empty (cre_count > 0 and standards_count > 0), returning {ok, db_reachable, cre_count, standards_count}.
  • 503 when the DB is unreachable or the dataset is empty/broken (reason explains which).

How

Node_collection.health_check() runs two cheap COUNT queries (over CRE and Node), never raises — connectivity errors are reported as ok=False so the route can return the right status code.

Scope

DB reachability + data sanity only. No Neo4j / Redis. No GA / mapping completeness — those are deliberately excluded and stay in ops tooling (verify_ga_completeness, monitor_ga_health, weekly automation). A heavier "deep health" can be added separately later.

Testing

  • 4 new tests in web_main_test.py: disabled→404, enabled+empty→503, enabled+populated→200, DB unreachable→503. All green.
  • black==24.4.2 --check clean on all changed files.

Files changed

  • application/feature_flags.pyis_health_endpoint_enabled()
  • application/database/db.pyNode_collection.health_check()
  • application/web/web_main.py/rest/v1/health route
  • application/tests/web_main_test.py — tests

Adds a lightweight deploy/uptime health probe at GET /rest/v1/health, gated behind the CRE_ENABLE_HEALTH feature flag (off by default).

Behavior:
- Flag off (default): endpoint returns 404, as if it does not exist.
- Flag on, healthy: 200 with {ok, db_reachable, cre_count, standards_count} when the serving DB is reachable and holds a non-empty dataset.
- Flag on, unhealthy: 503 when the DB is unreachable or the dataset is empty/broken (reason explains which).

Node_collection.health_check() runs cheap COUNT queries over CRE and Node, never raises (connectivity errors are reported as ok=False), and treats a zero count for either as an empty dataset.

Scope is intentionally limited to DB reachability + data sanity. Deeper checks (gap-analysis completeness, mapping coverage, Neo4j, Redis) are deliberately excluded by design and belong in ops tooling.
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Summary by CodeRabbit

Release Notes

  • New Features

    • Added a health check endpoint that provides database diagnostics, including connectivity status, row counts, and operational health status (disabled by default; can be enabled via environment configuration).
  • Tests

    • Added comprehensive test coverage for the health check endpoint, covering enabled/disabled states and various database conditions.

Walkthrough

Adds a feature-flagged GET /rest/v1/health endpoint. A new is_health_endpoint_enabled() function reads the CRE_ENABLE_HEALTH environment variable. Node_collection.health_check() runs two COUNT queries against the DB and returns a structured result. The Flask route returns 404 when disabled, 200 on success, or 503 on failure. Four unit tests cover all branches.

Changes

Health probe endpoint

Layer / File(s) Summary
Feature flag and DB health_check() implementation
application/feature_flags.py, application/database/db.py
is_health_endpoint_enabled() reads CRE_ENABLE_HEALTH against TRUE_VALUES. Node_collection.health_check() runs COUNT queries for CRE rows and Node/standard rows, handles OperationalError separately, and returns ok, db_reachable, cre_count, standards_count, and reason.
Flask route wiring
application/web/web_main.py
Imports is_health_endpoint_enabled, registers GET /rest/v1/health, aborts with 404 when the flag is off, calls database.health_check(), and returns the JSON payload with HTTP 200 or 503 based on result["ok"].
Unit tests
application/tests/web_main_test.py
Four tests cover: 404 when CRE_ENABLE_HEALTH is unset, 503 with reason="empty dataset" when enabled but no data, 200 with counts when data is seeded, and 503 with db_reachable=false when health_check is patched to simulate an unreachable DB.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding a feature-flagged health endpoint to the API.
Description check ✅ Passed The description is well-structured and directly related to the changeset, covering what was added, behavior, implementation details, scope limitations, and testing.
Linked Issues check ✅ Passed The PR fully satisfies issue #935: implements GET /rest/v1/health gated by CRE_ENABLE_HEALTH flag, returns 200/503/404 as specified, performs DB + data sanity checks only, and excludes Neo4j/Redis/GA completeness checks.
Out of Scope Changes check ✅ Passed All changes are directly within scope: health endpoint implementation, feature flag helper, database health check method, and comprehensive tests. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
application/database/db.py (1)

2292-2297: ⚡ Quick win

Consider narrowing the broad Exception handler to SQLAlchemy exception types.

The bare except Exception: on line 2292 is caught by static analysis (Ruff BLE001). While the defensive intent is clear from the comment, you could make this more precise by catching sqlalchemy.exc.DatabaseError (which is the base class for all SQLAlchemy database exceptions, including OperationalError) or a tuple of specific exceptions like (OperationalError, ProgrammingError, DatabaseError). This would maintain the defensive posture while being more explicit about what you're guarding against.

♻️ Proposed refinement
+from sqlalchemy.exc import DatabaseError
+
 def health_check(self) -> Dict[str, Any]:
     try:
         cre_count = self.session.query(func.count(CRE.id)).scalar() or 0
         standards_count = self.session.query(func.count(Node.id)).scalar() or 0
     except OperationalError:
         return {
             "ok": False,
             "db_reachable": False,
             "reason": "database unreachable",
         }
-    except Exception:  # pragma: no cover - defensive, never fail open
+    except DatabaseError:  # pragma: no cover - defensive, catch all DB errors
         return {
             "ok": False,
             "db_reachable": False,
             "reason": "database health query failed",
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/database/db.py` around lines 2292 - 2297, The broad `except
Exception:` handler in the database health check function should be narrowed to
catch specific SQLAlchemy exceptions instead. Replace the bare Exception catch
with either `sqlalchemy.exc.DatabaseError` (which is the base class for all
SQLAlchemy database exceptions) or a tuple of specific exceptions like
`(OperationalError, ProgrammingError, DatabaseError)`. This maintains the
defensive error handling while being more explicit about what database-related
errors are being guarded against, which will satisfy the Ruff BLE001 static
analysis check.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@application/database/db.py`:
- Around line 2292-2297: The broad `except Exception:` handler in the database
health check function should be narrowed to catch specific SQLAlchemy exceptions
instead. Replace the bare Exception catch with either
`sqlalchemy.exc.DatabaseError` (which is the base class for all SQLAlchemy
database exceptions) or a tuple of specific exceptions like `(OperationalError,
ProgrammingError, DatabaseError)`. This maintains the defensive error handling
while being more explicit about what database-related errors are being guarded
against, which will satisfy the Ruff BLE001 static analysis check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: dde30f7c-1450-4218-9931-ccaaa7a18f56

📥 Commits

Reviewing files that changed from the base of the PR and between e853cd3 and ec908bb.

📒 Files selected for processing (4)
  • application/database/db.py
  • application/feature_flags.py
  • application/tests/web_main_test.py
  • application/web/web_main.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add a feature-flagged /rest/v1/health endpoint (thin v1: DB + data sanity)

1 participant