Add string built-ins for multi-line text formatting: ?indent, ?dedent, ?wrap, ?pad_lines #130
gdisirio wants to merge 4 commits into
…, ?wrap, ?pad_lines

These four built-ins make it easier to format multi-line text, which is useful when generating source code, configuration files, documentation comments, and similar structured output. They all work on the string they're applied to and have no side effects; none require any new configuration or language mode.

- ?indent(prefix): prepends prefix to each (non-empty) line.
- ?dedent(prefix): removes prefix from the start of each line that has it (the inverse of ?indent).
- ?wrap(width, firstPrefix[, restPrefix]): word-wraps the string to the given column width, with configurable per-line prefixes. Handy for wrapped comment blocks.
- ?pad_lines(width[, fillChar]): pads each line on the right to the given column. Unlike ?right_pad, which pads the string as a whole, this operates per line, which is useful for aligning multi-line text.

Line breaks (LF, CR, CRLF) are recognized and preserved.

Added JUnit coverage and FreeMarker Manual reference entries (with @since 2.3.35).
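As a rough illustration of the wrapping behaviour described for ?wrap(width, firstPrefix[, restPrefix]), here is a minimal plain-Java sketch of a greedy word wrap. It is not the PR's implementation: the class name WrapSketch is hypothetical, and it assumes that the width includes the prefix, that existing line breaks are treated as ordinary whitespace, and that a word longer than the width is left on its own line unsplit.

```java
final class WrapSketch {

    /**
     * Greedy word wrap. Assumptions (not confirmed by the PR): the width
     * counts the prefix, input line breaks act as plain whitespace, and
     * over-long words are never split.
     */
    static String wrap(String text, int width, String firstPrefix, String restPrefix) {
        StringBuilder out = new StringBuilder();
        StringBuilder line = new StringBuilder(firstPrefix);
        boolean lineHasWord = false;
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for empty or all-whitespace input
            }
            // Start a new line if adding " word" would exceed the width.
            if (lineHasWord && line.length() + 1 + word.length() > width) {
                out.append(line).append('\n');
                line = new StringBuilder(restPrefix);
                lineHasWord = false;
            }
            if (lineHasWord) {
                line.append(' ');
            }
            line.append(word);
            lineHasWord = true;
        }
        return out.append(line).toString();
    }

    public static void main(String[] args) {
        // Wraps a sentence into a C-style comment body, 12 chars wide.
        System.out.println(wrap("the quick brown fox", 12, "// ", "// "));
    }
}
```

Using a different restPrefix than firstPrefix is what would allow, for example, a first line starting with "/* " and continuation lines starting with " * ".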
I'm primarily commenting just to signal that I have seen this and will eventually deal with it in depth.
The main doubt there is handling tabs, and even non-breaking-space characters.
Thanks for the read! All three points are fair; here's how I'd address them.

On renaming …

On tabs and non-breaking spaces: widths in all four built-ins are counted in Java …

Non-breaking spaces are an interesting case: …

On …

One related thought, since you brought up the tab handling: it might be useful to have built-ins for normalizing tabs and spaces explicitly. Something like …

For code generation that's a real need: if the target project uses tabs, the generator needs to emit tabs; if it mandates 4-space indent, ditto. Having primitives for that means generators don't each have to roll their own. Not proposing them in this PR, just floating the idea, since it's the underlying capability you'd want behind the "expand tabs first" docs note. Happy to do a follow-up PR if it sounds worth having.

I'll push the revisions for the three current points in a day or two and ping you. Thanks again for engaging on this.
…dent.

- Rename ?pad_lines / ?padLines -> ?right_pad_lines / ?rightPadLines, for consistency with ?right_pad / ?left_pad. The class was renamed to right_pad_linesBI to match the snake_case naming convention used by the surrounding builtins. NUMBER_OF_BIS unchanged (renamed pair).
- Add a no-argument form to ?dedent: ?dedent() finds the longest leading whitespace (spaces and tabs) that is a common prefix of all non-empty lines and removes it. Same semantics as Python's textwrap.dedent, so it is robust to imperfect input. Empty/whitespace-only lines are ignored when computing the common prefix. A leading tab and a leading space are distinct (no implicit collapsing), matching Python.
- Update tests: rename pad_lines tests; add 8 tests for ?dedent() covering uniform indent, mixed indent, blank lines, no common prefix, tabs, tabs+spaces distinction, empty input, and already-dedented input.
- README: rename and document ?right_pad_lines, document the new no-arg ?dedent() with examples, add notes that widths are counted in Java chars (not display columns) and that ?wrap correctly leaves U+00A0 inside words (not used as break points).

Full ./gradlew check is green; ChibiOS oop+xhal regenerate with zero diff. Changes parallel what will go on the upstream PR branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aligns with ?right_pad / ?left_pad naming. Internal class renamed to right_pad_linesBI accordingly. Registration moved to the alphabetical position after right_pad. Tests, manual entry, and index link renamed. Per review comment by ddekany on PR apache#130.
…nt-style)

Adds a more robust default behaviour for ?dedent. The no-argument form finds the longest leading whitespace (spaces and tabs only) that is a common prefix of every non-empty line, and removes that. This handles imperfect input (lines with different leading-whitespace amounts) gracefully, whereas the original explicit-prefix form leaves any line not starting with the exact prefix unchanged.

Semantics match Python's textwrap.dedent. Empty/whitespace-only lines are ignored when computing the common prefix and pass through unchanged. A leading tab and a leading space are distinct (no implicit collapsing), again matching Python.

The explicit-prefix form ?dedent(prefix) remains for cases where exact control is wanted; it's not redundant, just less robust by design.

8 new JUnit tests covering uniform indent, mixed indent, blank-line handling, no-common-prefix passthrough, tabs, tabs+spaces distinction, empty input, and already-dedented input. Manual section expanded with the new form and a worked example.

Per review comment by ddekany on PR apache#130.
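The no-argument ?dedent() rule described in this commit message can be sketched in plain Java as follows. This is only an illustration of the stated semantics (common leading run of spaces and tabs across non-blank lines, blank lines ignored and passed through, LF/CR/CRLF preserved); the class and method names are hypothetical and the PR's actual code may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

final class DedentSketch {

    /** Splits text into lines, keeping each terminator (LF, CR, CRLF) attached. */
    static List<String> splitKeepingBreaks(String s) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r') {
                int end = i + 1;
                if (c == '\r' && end < s.length() && s.charAt(end) == '\n') {
                    end++; // treat CRLF as a single line break
                }
                lines.add(s.substring(start, end));
                start = end;
                i = end;
            } else {
                i++;
            }
        }
        if (start < s.length()) {
            lines.add(s.substring(start));
        }
        return lines;
    }

    /** Length of the line without its trailing line terminator. */
    static int contentLength(String line) {
        int end = line.length();
        while (end > 0 && (line.charAt(end - 1) == '\n' || line.charAt(end - 1) == '\r')) {
            end--;
        }
        return end;
    }

    /** Number of leading spaces/tabs within the first contentEnd chars. */
    static int leadingWhitespace(String line, int contentEnd) {
        int i = 0;
        while (i < contentEnd && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    static String dedent(String text) {
        List<String> lines = splitKeepingBreaks(text);
        String common = null;
        for (String line : lines) {
            int end = contentLength(line);
            int ws = leadingWhitespace(line, end);
            if (ws == end) {
                continue; // empty or whitespace-only: ignored for the prefix
            }
            String lead = line.substring(0, ws);
            common = (common == null) ? lead : commonPrefix(common, lead);
        }
        if (common == null || common.isEmpty()) {
            return text; // nothing to remove
        }
        StringBuilder out = new StringBuilder(text.length());
        for (String line : lines) {
            int end = contentLength(line);
            boolean blank = leadingWhitespace(line, end) == end;
            // Blank lines pass through unchanged; all others start with common.
            out.append(blank ? line : line.substring(common.length()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dedent("    int a;\n      int b;\n"));
    }
}
```

Because the prefix is compared character by character, a line starting with a tab and a line starting with spaces share no common prefix, so such input comes back unchanged, which is the "no implicit collapsing" behaviour mentioned above.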
…pace behaviour

Adds notes in the manual for the new built-ins:

- For ?right_pad_lines: widths are counted in Java chars (UTF-16 code units), not visual display columns, same as ?right_pad / ?left_pad. A tab counts as one character, not as an advance to the next tab stop. Visual alignment for tab-containing input requires expanding tabs first.
- For ?wrap: same width semantics, plus a note that words are split on Java's \s+, which does NOT include U+00A0 (non-breaking space), so a non-breaking space correctly stays inside a word and is never used as a break point. This is the intended behaviour.

Per review comment by ddekany on PR apache#130 about tab and non-breaking space handling.
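Both notes follow from standard Java behaviour and can be checked with a few lines of plain Java. The sketch below is illustrative only: the class and helper names are hypothetical, and rightPad is a stand-in for char-count-based padding, not the built-in's code.

```java
final class WidthAndBreakSketch {

    /** Splits on Java's default \s+, which covers space, \t, \n, \u000B, \f, \r. */
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    /** Pads with spaces until the String has the given length in chars. */
    static String rightPad(String line, int width) {
        StringBuilder sb = new StringBuilder(line);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00A0 is not matched by \s, so "10\u00A0km" stays one word.
        System.out.println(words("10\u00A0km away").length);

        // A tab is one char, so "\tab" counts as 3 chars regardless of how
        // wide the tab is displayed; padding to 5 adds exactly 2 spaces.
        System.out.println(rightPad("\tab", 5).length());
    }
}
```

The second case is why tab-containing input can look misaligned after padding: the padded strings all have the same length in chars, but a terminal or editor may render each tab as several columns.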
This adds four string built-ins that make it easier to format multi-line text, which is useful when generating source code, configuration files, documentation comments, and similar structured output.
- ?indent(prefix): prepends prefix to each (non-empty) line.
- ?dedent(prefix): removes prefix from the start of each line that has it (the inverse of ?indent).
- ?wrap(width, firstPrefix[, restPrefix]): word-wraps the string to the given column width, with configurable per-line prefixes (handy for wrapped comment blocks).
- ?pad_lines(width[, fillChar]): pads each line on the right to the given column. Unlike ?right_pad, which pads the string as a whole, this operates per line, which is useful for aligning multi-line text.

All four operate on the string they're applied to, have no side effects, and require no new configuration or language mode. Line breaks (LF, CR, CRLF) are recognized and preserved.
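To make the "per line, line breaks preserved" behaviour concrete, here is a minimal plain-Java sketch of ?indent and ?pad_lines built on a shared line-mapping helper. The names LineFormatSketch, mapLines, indent and padLines are hypothetical, and padding empty lines as well is an assumption of this sketch rather than something the description states.

```java
import java.util.function.UnaryOperator;

final class LineFormatSketch {

    /**
     * Applies fn to the content of each line and re-emits the original
     * line terminator (LF, CR, or CRLF) unchanged.
     */
    static String mapLines(String text, UnaryOperator<String> fn) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = text.length();
        while (i < n) {
            int j = i;
            while (j < n && text.charAt(j) != '\n' && text.charAt(j) != '\r') {
                j++;
            }
            out.append(fn.apply(text.substring(i, j)));
            int k = j;
            if (k < n) {
                k++;
                if (text.charAt(j) == '\r' && k < n && text.charAt(k) == '\n') {
                    k++; // CRLF is one line break
                }
            }
            out.append(text, j, k); // copy the terminator as-is
            i = k;
        }
        return out.toString();
    }

    /** Prepends prefix to every non-empty line. */
    static String indent(String text, String prefix) {
        return mapLines(text, line -> line.isEmpty() ? line : prefix + line);
    }

    /** Right-pads every line to width chars (assumption: empty lines too). */
    static String padLines(String text, int width, char fill) {
        return mapLines(text, line -> {
            StringBuilder sb = new StringBuilder(line);
            while (sb.length() < width) {
                sb.append(fill);
            }
            return sb.toString();
        });
    }

    public static void main(String[] args) {
        System.out.print(indent("int a;\n\nint b;\r\n", "    "));
        System.out.println(padLines("ab\ncdef", 4, '.'));
    }
}
```

Padding the whole string instead, as ?right_pad does, would only extend the last line, which is the difference the description draws between the two.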
Backward compatibility
Purely additive — these are new built-in names, so existing templates are unaffected.
Testing & docs
- JUnit coverage: IndentAndWrapBuiltInTest.
- FreeMarker Manual reference entries (with @since 2.3.35).
- ./gradlew check and ./gradlew manualOffline both pass.

Background
These were developed for a code-generation workflow (generating embedded C in ChibiOS, via FMPP), but they're generally useful for any multi-line output.