* linkml: add tool1 schema * linkml: add schema utils * linkml: add schema vignette + schema_to_mermaid.R * linkml: reorder tool1 schema * add schema_versions for mermaid diagrams * pkgdown fixes * gha: refactor deploy workflow for dev and main branches * pkgdown: use auto development mode * Bump version: 0.0.3 => 0.0.3.9000 * r-ulid: grab from umccr conda channel * makefile: add bump rule * makefile: add bump rule * Bump version: 0.0.3.9000 => 0.0.3.9001 * rattler-build upload anaconda: use channel, not label * Bump version: 0.0.3.9001 => 0.0.3.9002 * gha conda: drop umccr prefix to find dev label * Bump version: 0.0.3.9002 => 0.0.3.9003 * gha: use ssh-key for bot committing to protected branch * Bump version: 0.0.3.9003 => 0.0.3.9004 * gha conda pkgdown: drop umccr prefix to find dev label * Bump version: 0.0.3.9004 => 0.0.3.9005 * gha conda pkgdown: specify dev label * Bump version: 0.0.3.9005 => 0.0.3.9006 * [bot] Updating conda-lock files (v0.0.3.9006) * precommit: add air formatter * add CLAUDE.md * claude: add new nemotool skill * "Claude PR Assistant workflow" * "Claude Code Review workflow" * Change GitHub + Anaconda orgs (#39) * change gh org * change anaconda org * change anaconda org * GitHub Actions: use GitHub app for branch protection override (#40) * gha: use gh app for branch protection override * gha: use app email * gha: use same wf for dev + main (#41) * GitHub Actions: use reusable workflows for conda + pkgdown (#42) * gha: fix permissions (#43) * Add GHA-based version bumping workflow (#44) * Bump version: 0.0.3.9006 => 0.0.3.9007 * [bot] Updating conda-lock files (v0.0.3.9007) * precommit update * remove LinkML schema system (to be redesigned in separate PR) * gha: remove auto claude code review workflow * gha: restrict claude workflow to repo owners/collaborators/members --------- Co-authored-by: GitHub Actions <actions@github.com> Co-authored-by: tidywf-ci-bot[bot] <3171681+tidywf-ci-bot[bot]@users.noreply.github.com>
Schema refactor
API cleanup: methods, naming, and encapsulation
More API refactoring
Summary
Complete overhaul of the `Config`, `Tool`, and `Workflow` APIs. The PR transitions to a unified `schema.yaml` format that replaces the two-file `raw.yaml` + `tidy.yaml` split, and adds a new assertion and utility layer, an expanded CLI, a CI/CD migration to reusable workflows, and substantially improved test coverage.
Breaking changes
1. Unified `schema.yaml` replaces `raw.yaml` + `tidy.yaml`

The most structural change. Each tool previously required two separate YAML files in `inst/config/tools/<tool>/`:

- `raw.yaml` - file patterns, ftypes, raw column names, and per-version schemas
- `tidy.yaml` - tidy column names, types, descriptions, also per-version

These are replaced by a single `schema.yaml` with a flat `tables:` map. Each column entry now carries its `raw` name, `tidy` name, `type`, `description`, and a `versions` list in one place.

Child packages (`tidywigits`, `tidydragen`) will need their per-tool configs migrated to this format.
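To make the per-column layout concrete, the sketch below shows one way such a table could look once parsed into an R list, and how a raw-to-tidy column map can be derived from it. Only the five per-column keys (`raw`, `tidy`, `type`, `description`, `versions`) come from the description above. The `columns` nesting, the column names, the type labels, and the `col_map()` helper are assumptions for illustration and are not nemo's actual `get_col_map()` implementation.

```r
# Hypothetical in-memory form of one table from a unified schema.yaml.
# The nesting, column names and type labels are assumptions, not nemo's format.
schema <- list(
  tables = list(
    table1 = list(
      columns = list(
        list(raw = "SampleID", tidy = "sample_id", type = "char",
             description = "Sample identifier",
             versions = c("v1.2.3", "latest")),
        list(raw = "Cov", tidy = "coverage", type = "float",
             description = "Mean coverage",
             versions = c("latest"))
      )
    )
  )
)

# Illustrative helper: raw -> tidy column names for one table and version.
col_map <- function(schema, tbl, version) {
  cols <- schema$tables[[tbl]]$columns
  # Keep only the columns that declare the requested version.
  keep <- vapply(cols, function(x) version %in% x$versions, logical(1))
  cols <- cols[keep]
  stats::setNames(
    vapply(cols, function(x) x$tidy, character(1)),
    vapply(cols, function(x) x$raw, character(1))
  )
}

map_v123 <- col_map(schema, "table1", "v1.2.3")
map_latest <- col_map(schema, "table1", "latest")
```

Because every column lists the versions it belongs to, a per-version schema can be derived by filtering a single list rather than by reconciling two files.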
2. Config API renames and encapsulation

All schema/config accessor methods were renamed for consistency. The `config` raw parsed list, `raw_schemas_all`, and `tidy_schemas_all` fields are removed from the public interface; schemas are now computed internally and served through methods.

| Old | New |
|---|---|
| `conf$get_raw_patterns()` | `conf$get_patterns()` |
| `conf$get_raw_versions()` | (`versions:`) |
| `conf$get_raw_descriptions()` | `conf$get_descriptions()` |
| `conf$get_raw_schemas_all()` | `conf$get_schemas_raw()` |
| `conf$get_tidy_schemas_all()` | `conf$get_schemas_tidy()` |
| `conf$get_raw_schema(tbl, v = ...)` | `conf$get_schema_raw(tbl, version = ...)` |
| `conf$get_tidy_schema(tbl, v = ...)` | `conf$get_schema_tidy(tbl, version = ...)` |
| `conf$are_raw_schemas_valid()` | (`initialize()`) |
| `conf$config` (field) | removed from the public interface |
| `conf$raw_schemas_all` (field) | removed from the public interface |
| `conf$tidy_schemas_all` (field) | removed from the public interface |

New: `conf$get_col_map(tbl)`, `conf$get_pattern(tbl)`, `conf$get_ftype(tbl)`, `conf$get_ftypes()`, `conf$get_description(tbl)`, `conf$get_tables()`.

`Config` also gains `pkg` as a public field (was only a constructor argument before), and is now `cloneable = FALSE`.

3. Tool API renames and encapsulation
| Old | New |
|---|---|
| `tool$files` (field) | `tool$list_files()` (accessor) |
| `tool$tbls` (field) | `tool$get_tbls()` (accessor) |
| `tool$raw_schemas_all` (field) | `tool$config$get_schemas_raw()` |
| `tool$tidy_schemas_all` (field) | `tool$config$get_schemas_tidy()` |
| `tool$get_raw_schema` (delegate field) | `tool$config$get_schema_raw()` |
| `tool$get_tidy_schema` (delegate field) | `tool$config$get_schema_tidy()` |
| `tool$nemofy(diro, ...)` | `tool$run(output_dir, ...)` |
| `diro` parameter | `output_dir` |
| `out_dir` parameter | `output_dir` |
| `input_pfix` column | `input_prefix` column |
| `pfix_include` parameter | `prefix_include` parameter |
| `group` column in `list_files()` | `prefix_suffix` |
| `enframe_data()` | `nemo_enframe()` |

`files`, `tbls`, and `files_tbl` are moved to private. `Tool` is now `cloneable = FALSE`.

4. Workflow API renames and encapsulation
| Old | New |
|---|---|
| `wf$tools` (field) | `wf$get_tools()` (accessor) |
| `wf$files_tbl` (field) | |
| `wf$nemofy(diro, ...)` | `wf$run(output_dir, ...)` |
| `wf$list_files(type = ...)` | `wf$list_files()` (type arg removed) |
| `wf$get_metadata(..., pkgs = c("nemo"))` | `pkgs` defaults to `NULL`, resolved from `self$metapkg` |

`Workflow` now validates path existence at construction time, gains a `metapkg` argument (defaults to `"nemo"`) for metadata version reporting, and is `cloneable = FALSE`.

`print()` now shows `files_total`, `files_matched`, `tidied`, and `written` (formatted as a knitr table, consistent with `Tool` and `Config`).

`filter_files()` now validates `include`/`exclude` values against known `tool_parser` names (e.g. `"tool1_table1"`) and correctly dispatches per-tool when include/exclude doesn't match any parser in a given tool.

New methods: `get_schemas_raw()` and `get_schemas_tidy()` aggregate raw/tidy schemas across all tools, adding a `tool` column for identification.

5. CLI renames
| Old | New |
|---|---|
| `--out_dir` | `--output_dir` |
| `--pfix_include` | `--prefix_include` |
| `input_pfix` output column | `input_prefix` |

`--prefix_include` is now opt-in (no longer adds `input_prefix` to output by default).
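For wrapper scripts that build the CLI call programmatically, the flag renames can be applied mechanically. The following is a minimal base R sketch: the two flag names come from the table above, while `migrate_cli_args()`, the example argument vector, and the handling of a `--flag=value` form are assumptions for illustration and are not part of nemo.

```r
# Old -> new CLI flag names, taken from the rename table above.
flag_renames <- c(
  "--out_dir" = "--output_dir",
  "--pfix_include" = "--prefix_include"
)

# Illustrative helper (not part of nemo): rewrite renamed flags in an
# argument vector, covering both "--flag value" and "--flag=value" forms.
migrate_cli_args <- function(args) {
  vapply(args, function(arg) {
    flag <- sub("=.*$", "", arg)  # text before the first "=", if any
    if (flag %in% names(flag_renames)) {
      rest <- substring(arg, nchar(flag) + 1)  # "=value" or ""
      arg <- paste0(flag_renames[[flag]], rest)
    }
    arg
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical invocation of the tidy subcommand with the old flags.
old_args <- c("tidy", "--out_dir", "results", "--pfix_include")
new_args <- migrate_cli_args(old_args)
```

Note that a wrapper which relied on `input_pfix` appearing by default also needs to pass `--prefix_include` explicitly and read the `input_prefix` column instead.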
6. `nemo_metadata()` signature change

`input_dir` parameter renamed to `input_dirs` (always a character vector). Return value changed from a named `list()` (with `jsonlite::unbox` wrappers) to a single-row tibble with list-columns (`input_dirs`, `pkg_versions`, `files`).

7. `RPostgres` dropped as hard dependency

`RPostgres` is removed from `DESCRIPTION`. The DB writer now accepts a caller-supplied `dbdrv` argument (any `DBI`-compatible driver), so callers bring their own driver.
New features
New R modules
| File | Contents |
|---|---|
| `R/assert.R` | `nemo_stop()`, `nemo_assert_scalar_chr()`, `nemo_assert_chr()`, `nemo_assert_not_null()`, `nemo_assert_out_fmt()`, and internal helpers `assert_files_tbl()`, `assert_include_exclude()`, `check_unknown_parsers()` |
| `R/config_prep.R` | `config_prep_raw_schema()`, `config_prep_raw()`, `config_prep_multi()`, `config_prep_write()`: bootstrap a `schema.yaml` from example raw files |
| `R/gha.R` | `nemo_gha_mermaid()`: generates a Mermaid flowchart of the full CI/CD pipeline by combining local and remote GHA YAML |
| `R/uml.R` | `nemo_uml()`: generates a PlantUML SVG from R6 class names using the `R6toPlant` package |
| `R/schema_vis.R` | `nemo_schema_reactable()`, `nemo_schemavis_data()`, `reactable_schema()`: interactive reactable schema explorer (was `inst/scripts/vignettes/schemas.R`) |

File type changes (`ftype`)

The `schema.yaml` format consolidates and renames ftype values. The base `Tool` class dispatches on ftype in `parse_by_ftype()`; the full ftype set is now:

| `ftype` | Notes |
|---|---|
| `txt` | old `tsv` |
| `txt-keyvalue` | old `txt-nohead` |
| `txt-nohead` | columns `X1..XN` |
| `csv` | |
| `csv-nohead-long` | not parsed by `Tool` itself, requires subclass `parse_{table}()` override |

Child packages using `ftype: 'tsv'` in their old `raw.yaml` must change to `ftype: 'txt'` in `schema.yaml`. Old `ftype: 'txt-nohead'` (key-value) must become `ftype: 'txt-keyvalue'`.

New schema example tables
`inst/config/tools/tool1/schema.yaml` now covers 6 tables that together exercise every supported ftype, providing reference data for parsing tests:

- `table1`: `txt` (3 versions: `v1.2.3`, `v4.5.6`, `latest`)
- `table2`: `txt` (2 versions: `v1.0.0`, `latest`)
- `table3`: `txt-keyvalue`
- `table4`: `txt-nohead`
- `table5`: `csv-nohead-long`
- `table6`: `csv`

Corresponding example data added under `inst/extdata/tool1/`.

`metadata.parquet` written per run

Both `Tool$write()` and `Workflow$write()` now accept a `write_metadata = TRUE` boolean. When true, a `metadata.parquet` file is written to the output directory alongside the tidy tables. The metadata tibble records `input_id`, `output_id`, `input_dirs`, `output_dir`, `pkg_versions`, and a file manifest.

`Tool$list_files()` ~20x speedup

Switched from a regex approach via `fs::dir_info` to `map` + `grepl` over a pre-built flat file tibble. The `files_tbl` is computed once at construction and reused across all lookups.
Config scaffold helpers (`config_prep_*`)

A new family of functions for bootstrapping a `schema.yaml` from example files.

`--output_id` / `--ulid` CLI flags

The `tidy` subcommand gains:

- `--output_id VALUE`: tags output files with a fixed run identifier
- `--ulid`: auto-generates a ULID as the output identifier (mutually exclusive with `--output_id`)

`--max` on `list` subcommand

`cli_nemo_list()` now accepts a `max` parameter to cap the number of rows shown.

Infrastructure and CI/CD
`deploy.yaml` refactored to reusable workflows

The 136-line monolithic `deploy.yaml` is replaced by a 42-line orchestrator that calls reusable workflows from `tidywf/actions`. The workflow now also triggers on `dev` pushes (not just `main`).

`claude.yml`: restrict `@claude` to repo members

The Claude GHA now gates all `@claude` triggers on `author_association IN ["OWNER", "COLLABORATOR", "MEMBER"]` to prevent abuse from external commenters.

`dependabot.yml` added

Automated dependency update PRs enabled for GitHub Actions.

`bump.yaml`: use `tidywf/actions` repo

Points to `tidywf/actions/.github/workflows/bump.yaml@main` instead of the old `tidywf/.github` monorepo reference.

conda: aarch64 lock file added

`deploy/conda/env/lock/conda-linux-aarch64.lock` is now tracked alongside the existing `linux-64` lock file.

`deploy/conda/env/yaml/bump.yaml` added

A dedicated conda env for the bumpversion workflow.
Testing
Manual test files for all R6 classes
New standalone `test-<ClassName>.R` files with proper `test_that` blocks:

- `tests/testthat/test-Config.R`: `get_*` methods, error paths
- `tests/testthat/test-Tool.R`
- `tests/testthat/test-Tool1.R`
- `tests/testthat/test-Workflow.R`
- `tests/testthat/test-Workflow1.R`

Roxytest files expanded

New auto-generated test files from `@testexamples` blocks:

- `test-roxytest-testexamples-assert.R`
- `test-roxytest-testexamples-cli_list.R`
- `test-roxytest-testexamples-cli_tidy.R`
- `test-roxytest-testexamples-config_prep.R`
- `test-roxytest-testexamples-gha.R`
- `test-roxytest-testexamples-schema_vis.R`

Removed (R6 classes now tested manually):

- `test-roxytest-testexamples-Tool1.R`
- `test-roxytest-testexamples-Workflow.R`
- `test-roxytest-testexamples-Workflow1.R`

Documentation and vignettes
New vignettes
- `vignettes/cicd.qmd` (`nemo_gha_mermaid()`)
- `vignettes/new-tool.qmd`
- `vignettes/schema_table.qmd`: `schema.yaml` explorer via `nemo_schema_reactable()`

Removed vignettes

- `vignettes/contribute.qmd` (replaced by `new-tool.qmd`)

`inst/doc-templates/` reorganised

The `inst/documentation/` directory is renamed to `inst/doc-templates/`. Parameterised installation template fragments (conda, docker, pixi, R) are added for child packages to include/reuse.
pkgdown
- `_pkgdown.yml` updated with new vignettes and function reference groupings
- `pkgdown/extra.scss` added for custom styling

CLAUDE.md

`.claude/CLAUDE.md` added with full nemo repo documentation for in-session context (repo layout, reference implementations, testing conventions, CLI docs, logging, key API table, dev commands).
Dependency changes (`DESCRIPTION`)

- `assertthat`: `assert.R` wrappers using `rlang`
- `jsonlite`
- `RPostgres`
- `quarto`: `VignetteBuilder`, `knitr`
- `stringr` (Imports)
- `knitr` (Imports): `print()` methods
- `here` (Suggests)
- `htmltools`, `reactable` (Suggests)
- `R6toPlant` (Suggests, GitLab remote)
- `withr` (Suggests)

File inventory
Key additions:
- `R/assert.R` (98 lines): new assertion layer
- `R/config_prep.R` (181 lines): schema scaffolding helpers
- `R/gha.R` (153 lines): GHA Mermaid diagram generator
- `R/uml.R` (70 lines): PlantUML integration
- `R/schema_vis.R` (promoted from `inst/scripts/`)
- `inst/config/tools/tool1/schema.yaml` (171 lines): unified schema
- `tests/testthat/test-Tool.R` (233 lines)
- `tests/testthat/test-Tool1.R` (138 lines)
- `tests/testthat/test-Config.R` (38 lines)
- `tests/testthat/test-Workflow.R` (89 lines)
- `deploy/conda/env/lock/conda-linux-aarch64.lock` (214 lines)

Key deletions:

- `inst/config/tools/tool1/raw.yaml`: replaced by `schema.yaml`
- `inst/config/tools/tool1/tidy.yaml`: replaced by `schema.yaml`
- `inst/scripts/file_to_yaml.R`: superseded by `config_prep_*` helpers
- `inst/scripts/uml.R`: superseded by `R/uml.R`
- `man/valid_out_fmt.Rd`: function renamed to `nemo_assert_out_fmt`
- `vignettes/contribute.qmd`: replaced by `new-tool.qmd`

Checklist for child packages after merge

- Migrate `raw.yaml` + `tidy.yaml` to unified `schema.yaml`
- In `schema.yaml`: rename `ftype: 'tsv'` → `ftype: 'txt'`; rename `ftype: 'txt-nohead'` (if key-value) → `ftype: 'txt-keyvalue'`
- `$nemofy(diro = ...)` calls → `$run(output_dir = ...)`
- `$tools` field access → `$get_tools()`
- `$files` field access → `$list_files()`
- `$tbls` field access → `$get_tbls()`
- Update `conf$get_raw_*` / `conf$get_tidy_*` method calls to new names
- Update `--out_dir` / `--pfix_include` CLI flags if wrapping `nemo.R`
- `nemo_metadata(..., input_dir = ...)` → `..., input_dirs = ...`
- Pass `dbdrv` explicitly when using the DB writer (no longer defaults to `RPostgres`)
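For the `Config` method renames in particular, a child package could do a first pass over its sources with a plain text substitution. The sketch below is a base R illustration under stated assumptions: the old and new method names are taken from the rename table in this PR, while `migrate_calls()` and the example strings are hypothetical and not part of nemo. It does not rewrite the `v =` argument to `version =`, and it does not cover methods that were removed rather than renamed.

```r
# Old -> new Config method names, taken from the rename table in this PR.
# The opening parenthesis is part of each pattern so that
# "get_raw_schema(" cannot match inside "get_raw_schemas_all(".
renames <- c(
  "get_raw_patterns(" = "get_patterns(",
  "get_raw_descriptions(" = "get_descriptions(",
  "get_raw_schemas_all(" = "get_schemas_raw(",
  "get_tidy_schemas_all(" = "get_schemas_tidy(",
  "get_raw_schema(" = "get_schema_raw(",
  "get_tidy_schema(" = "get_schema_tidy("
)

# Illustrative helper (not part of nemo): apply each rename as a literal,
# non-regex substitution over a character vector of source lines.
migrate_calls <- function(code) {
  for (old in names(renames)) {
    code <- gsub(old, renames[[old]], code, fixed = TRUE)
  }
  code
}

old_code <- c(
  "conf$get_raw_schemas_all()",
  "conf$get_raw_schema(\"table1\", v = \"latest\")",
  "conf$get_tidy_schema(\"table1\", v = \"latest\")"
)
new_code <- migrate_calls(old_code)
```

In practice the result would still need a manual review, since field accesses such as `$tools` or `$files` have no trailing parenthesis to anchor on and are easier to fix by hand.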