feat(workstream-c): add cheatsheet categorizer and grouping #934
shreeshtripurwarcomp23-coder wants to merge 2 commits into
- Implement categorize_cheatsheet() with 29-label controlled taxonomy
- Implement group_cheatsheets() with stable sha256-based group IDs
- Deterministic keyword/rule baseline, no LLM dependency
- LLM-optional path with safe fallback on failure
- Validate all CheatsheetRecord fields in __post_init__
- 50 tests covering all acceptance criteria from RFC Issue C

CheatsheetRecord uses local stub pending Workstream B merge.
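The "stable sha256-based group IDs" item can be illustrated with a minimal sketch. The function name `stable_group_id`, the `grp` prefix, the 16-character truncation, and the choice to hash the sorted label set are all assumptions made for illustration; the PR description does not say what the real `group_cheatsheets()` feeds into the hash.

```python
import hashlib


def stable_group_id(labels: list[str], prefix: str = "grp") -> str:
    """Derive a deterministic group ID from a set of category labels.

    Hypothetical helper: the inputs to the hash in the actual PR are not
    shown, so the label set is used here as a plausible stand-in.
    """
    # Sort and deduplicate so the ID does not depend on input order.
    canonical = "|".join(sorted(set(labels)))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Truncate for readability; 16 hex characters keep 64 bits of the digest.
    return f"{prefix}_{digest[:16]}"
```

Unlike Python's built-in `hash()`, which is salted per process for strings by default, a SHA-256 digest of the same bytes is identical across runs and machines, which is what makes the ID usable as a persistent key.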
No actionable comments were generated in the recent review.
Estimated code review effort: 3 (Moderate), ~25 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@application/tests/test_cheatsheet_categorizer.py`:
- Around line 42-48: The _make_record helper function creates CheatsheetRecord
instances with an empty summary string, but CheatsheetRecord.__post_init__
enforces that summary must be non-empty, causing construction to fail before
test assertions run. Replace the empty string assignment `summary=""` in the
_make_record function with a minimal non-empty placeholder value (such as a
single space, period, or descriptive placeholder text like "Test summary") to
satisfy the validation requirement.
In
`@application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py`:
- Around line 194-210: The __post_init__ method in CheatsheetRecord currently
validates only string-typed fields but does not validate the element types of
list-typed fields (headings and category_hints). This causes runtime crashes
when non-string items reach code expecting to call " ".join() on these fields.
Add validation in __post_init__ to ensure that headings and category_hints are
lists containing only string elements, raising a ValueError with a clear message
if any element is not a string, so that parser input validation fails fast at
construction time rather than later during string joining operations.
- Around line 366-381: The `_validate_labels` function currently allows
`uncategorized` to be returned alongside other valid labels, which violates the
sentinel semantics where `uncategorized` should only be returned when it is the
sole valid label. Modify the function to add logic after building the deduped
list: if `uncategorized` (or the appropriate constant reference from TAXONOMY)
is present in the result AND there are other valid labels alongside it, remove
the `uncategorized` entry from the returned list. This ensures that
`uncategorized` is only returned when no other categories match, preserving its
role as a fallback indicator and preventing inconsistent downstream grouping and
UX.
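The two findings for cheatsheet_categorizer.py can be sketched together as below. This is an illustration under stated assumptions, not the code in the PR: `require_str_list` and `validate_labels` are hypothetical names, the `TAXONOMY` shown is a made-up four-label subset of the 29-label taxonomy, and returning `["uncategorized"]` when nothing valid remains is an assumed fallback.

```python
UNCATEGORIZED = "uncategorized"
# Hypothetical subset; the label names other than the sentinel are invented.
TAXONOMY = frozenset(
    {"authentication", "cryptography", "input_validation", UNCATEGORIZED}
)


def require_str_list(name: str, value: object) -> None:
    """Fail fast if a field such as headings is not a list of strings."""
    if not isinstance(value, list):
        raise ValueError(f"{name} must be a list, got {type(value).__name__}")
    for index, item in enumerate(value):
        if not isinstance(item, str):
            raise ValueError(
                f"{name}[{index}] must be a str, got {type(item).__name__}"
            )


def validate_labels(labels: list[str]) -> list[str]:
    """Keep known labels in order, dedupe, and enforce sentinel semantics."""
    deduped: list[str] = []
    for label in labels:
        if label in TAXONOMY and label not in deduped:
            deduped.append(label)
    # The sentinel is only meaningful on its own: drop it when any
    # real category is also present.
    if len(deduped) > 1 and UNCATEGORIZED in deduped:
        deduped = [label for label in deduped if label != UNCATEGORIZED]
    # Assumed fallback: nothing valid means the record is uncategorized.
    return deduped or [UNCATEGORIZED]
```

Calling `require_str_list("headings", self.headings)` from `__post_init__` would surface a bad element at construction time with its index, rather than as a `TypeError` from a later `" ".join()` call.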
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: ca9bd7d2-9738-4eb1-9bf6-5c1b32faf0db
📒 Files selected for processing (2)
application/tests/test_cheatsheet_categorizer.py
application/utils/external_project_parsers/parsers/cheatsheet_categorizer.py