Add string built-ins for multi-line text formatting: ?indent, ?dedent, ?wrap, ?pad_lines #130

Open
gdisirio wants to merge 4 commits into apache:2.3-gae from gdisirio:feature/string-formatting-builtins

@gdisirio

This adds four string built-ins that make it easier to format multi-line text, which is useful when generating source code, configuration files, documentation comments, and similar structured output.

  • ?indent(prefix) — prepends prefix to each (non-empty) line.
  • ?dedent(prefix) — removes prefix from the start of each line that has it (the inverse of ?indent).
  • ?wrap(width, firstPrefix[, restPrefix]) — word-wraps the string to the given column width, with configurable per-line prefixes (handy for wrapped comment blocks).
  • ?pad_lines(width[, fillChar]) — pads each line on the right to the given column. Unlike ?right_pad, which pads the string as a whole, this operates per line, which is useful for aligning multi-line text.

All four operate on the string they're applied to, have no side effects, and require no new configuration or language mode. Line breaks (LF, CR, CRLF) are recognized and preserved.
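The per-line behaviour described above can be sketched in plain Java. This is only an illustration of the stated semantics (prefix each non-empty line, strip an exact prefix, keep LF, CR, and CRLF as found), not the code in this PR; the class name `IndentSketch` and both method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not the PR's actual implementation.
final class IndentSketch {
    // One line: its content (possibly empty), then CRLF, CR, LF, or end of input.
    private static final Pattern LINE = Pattern.compile("([^\\r\\n]*)(\\r\\n|\\r|\\n|\\z)");

    // Prepends prefix to every non-empty line, keeping each line break as found.
    static String indent(String text, String prefix) {
        StringBuilder out = new StringBuilder();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            String content = m.group(1);
            String eol = m.group(2);
            if (!content.isEmpty()) {
                out.append(prefix);
            }
            out.append(content).append(eol);
            if (eol.isEmpty()) {
                break; // end of input reached
            }
        }
        return out.toString();
    }

    // Removes prefix from each line that starts with it; other lines pass through.
    static String dedent(String text, String prefix) {
        StringBuilder out = new StringBuilder();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            String content = m.group(1);
            String eol = m.group(2);
            out.append(content.startsWith(prefix) ? content.substring(prefix.length()) : content);
            out.append(eol);
            if (eol.isEmpty()) {
                break; // end of input reached
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String indented = indent("int x;\r\n\r\nint y;", "    ");
        System.out.println(indented.equals("    int x;\r\n\r\n    int y;")); // true
        System.out.println(dedent(indented, "    ").equals("int x;\r\n\r\nint y;")); // true
    }
}
```

The empty middle line stays empty after indenting, which is why the round trip through `dedent` returns the original text.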

Backward compatibility

Purely additive — these are new built-in names, so existing templates are unaffected.

Testing & docs

  • JUnit coverage in IndentAndWrapBuiltInTest.
  • FreeMarker Manual reference entries added for each built-in (marked @since 2.3.35).
  • ./gradlew check and ./gradlew manualOffline both pass.

Background

These were developed for a code-generation workflow (generating embedded C in ChibiOS, via FMPP), but they're generally useful for any multi-line output.

…, ?wrap, ?pad_lines

These four built-ins make it easier to format multi-line text, which is
useful when generating source code, configuration files, documentation
comments, and similar structured output. They all work on the string
they're applied to and have no side effects; none require any new
configuration or language mode.

- ?indent(prefix): prepends prefix to each (non-empty) line.
- ?dedent(prefix): removes prefix from the start of each line that has
  it (the inverse of ?indent).
- ?wrap(width, firstPrefix[, restPrefix]): word-wraps the string to the
  given column width, with configurable per-line prefixes. Handy for
  wrapped comment blocks.
- ?pad_lines(width[, fillChar]): pads each line on the right to the
  given column. Unlike ?right_pad, which pads the string as a whole,
  this operates per line, which is useful for aligning multi-line text.

Line breaks (LF, CR, CRLF) are recognized and preserved. Added JUnit
coverage and FreeMarker Manual reference entries (with @since 2.3.35).
@ddekany
Contributor

ddekany commented May 29, 2026

I'm primarily commenting just to signal that I have seen this and will eventually deal with it in depth. indent/dedent/wrap is something that we can certainly add as functionality.

pad_lines functionality feels a bit too special... which is not necessarily a blocker. But can you describe a use case? Also, I think since we have ?left_pad/?right_pad already, it should be called ?right_pad_lines. (Yes, there's no ?left_pad_lines, or not yet, but still, for consistency.)

The main doubt there is handling tabs, and even non-breaking-space characters. dedent assumes that you can provide the exact prefix as a string, but that's not very robust in the face of imperfect input. Even indent can be affected by that approach: someone may want to specify the indentation width instead, and there tabs can play a role. The reason to think about these now is that if we want to make this more advanced later, backward-compatibility constraints can complicate that.

@gdisirio
Author

Thanks for the read! All three points are fair; here's how I'd address them.

On renaming ?pad_lines to ?right_pad_lines: yeah, you're right, that's the consistent name with ?right_pad/?left_pad. Will do. On whether it's needed at all: it does something ?right_pad can't, which is pad each line of a multi-line string independently. So "a\nbb"?right_pad_lines(5) gives you "a    \nbb   \n" (each line padded to 5), whereas ?right_pad would treat the 4-char string as one and pad it to 5. It's the difference between aligning a column of values vs. padding a single value, and it shows up a lot when generating tabular output.
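To make the whole-string vs. per-line difference concrete, here is a minimal Java sketch. The class and method names are hypothetical and this is not the PR's implementation. Two assumptions are baked in: the sketch keeps whatever line breaks the input has without appending one, and it does not treat the empty tail after a final line break as a line to pad.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the comparison; not the PR's code.
final class RightPadLinesSketch {
    // One line: its content (possibly empty), then CRLF, CR, LF, or end of input.
    private static final Pattern LINE = Pattern.compile("([^\\r\\n]*)(\\r\\n|\\r|\\n|\\z)");

    // Pads the string as a whole, line breaks included in the count.
    static String rightPad(String s, int width, char fill) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < width) {
            sb.append(fill);
        }
        return sb.toString();
    }

    // Pads each line separately; widths are counted in Java chars.
    static String rightPadLines(String text, int width, char fill) {
        StringBuilder out = new StringBuilder();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            String content = m.group(1);
            String eol = m.group(2);
            if (content.isEmpty() && eol.isEmpty()) {
                break; // assumption: nothing after the last line break is not a line
            }
            out.append(rightPad(content, width, fill)).append(eol);
            if (eol.isEmpty()) {
                break; // end of input reached
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Whole-string padding: the 4-char string gets one fill character.
        System.out.println(rightPad("a\nbb", 5, '.').equals("a\nbb."));            // true
        // Per-line padding: each line is brought up to 5 chars.
        System.out.println(rightPadLines("a\nbb", 5, '.').equals("a....\nbb...")); // true
    }
}
```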

On tabs and non-breaking spaces: widths in all four built-ins are counted in Java chars (UTF-16 code units), not visual columns — same as ?right_pad/?left_pad. So a tab counts as one character, not as "advance to next multiple of 8" or whatever. I'll make that explicit in each built-in's manual entry — something like "widths are in characters, not display columns; expand tabs to spaces first if you need visual alignment."
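A small Java illustration of what "counted in Java chars" implies. The helper name `widthInChars` is made up for this sketch; the only library calls are `String.length()` and `String.codePointCount`.

```java
// Hypothetical demo class; shows the unit of measurement, nothing more.
final class WidthUnitsSketch {
    // Width as described in this thread: Java chars (UTF-16 code units).
    static int widthInChars(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        // A tab is one char, however many columns a terminal gives it.
        System.out.println(widthInChars("\tx"));                   // 2
        // A character outside the BMP is two chars (a surrogate pair).
        System.out.println(widthInChars("\uD83D\uDE00"));          // 2
        System.out.println("\uD83D\uDE00".codePointCount(0, 2));   // 1
    }
}
```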

Non-breaking spaces are an interesting case: ?wrap splits on \s+, and Java's \s doesn't include U+00A0, so non-breaking spaces stay inside a word and aren't break points. That's actually exactly what you'd want from a non-breaking space, but it's by happy accident rather than by design — I'll mention it in the docs so it's visible behavior.
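The effect of splitting on `\s+` can be seen in a simple greedy wrapper. This is a sketch under assumptions: the greedy line-filling strategy, the LF line separator, and the names `WrapSketch` and `wrap` are all illustrative and may differ from what the PR does. It relies only on the fact that Java's default `\s` does not match U+00A0.

```java
// Hypothetical greedy word-wrap; the PR's algorithm may differ.
final class WrapSketch {
    static String wrap(String text, int width, String firstPrefix, String restPrefix) {
        // trim() strips chars up to U+0020, so U+00A0 survives here too.
        String[] words = text.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        StringBuilder line = new StringBuilder(firstPrefix);
        boolean lineHasWord = false;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // only happens for empty or all-whitespace input
            }
            // Start a new line if adding " word" would pass the width.
            if (lineHasWord && line.length() + 1 + word.length() > width) {
                out.append(line).append('\n');
                line = new StringBuilder(restPrefix);
                lineHasWord = false;
            }
            if (lineHasWord) {
                line.append(' ');
            }
            line.append(word);
            lineHasWord = true;
        }
        if (lineHasWord) {
            out.append(line);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("one two three four", 10, "// ", "// "));
        // The non-breaking space keeps "10 km" together as one word.
        System.out.println(wrap("10\u00A0km away", 6, "", "").equals("10\u00A0km\naway")); // true
    }
}
```

A word longer than the width is left on its own line rather than split, which is a choice made for this sketch only.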

On ?dedent — yeah, requiring an exact prefix match is fragile. I'll add a no-argument form: ?dedent (with no args) finds the longest common leading whitespace across all non-empty lines and removes that. Same semantics as Python's textwrap.dedent, which is the well-known reference for "do the obvious thing with imperfect input." The explicit ?dedent(prefix) form would stay for the cases where you actually do want exact control — it's not redundant, just less robust by design.
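A rough Java sketch of the no-argument form as described: find the longest run of leading spaces and tabs shared by every non-empty line, then strip it. The names are hypothetical, and leaving whitespace-only lines untouched is an assumption of this sketch, not something stated above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of common-prefix dedent; not the PR's code.
final class CommonDedentSketch {
    private static final Pattern LINE = Pattern.compile("([^\\r\\n]*)(\\r\\n|\\r|\\n|\\z)");

    // Leading run of spaces and tabs only.
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    static String commonPrefix(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    static String dedent(String text) {
        // Pass 1: compute the margin over lines that have real content.
        String margin = null;
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            String content = m.group(1);
            String ws = leadingWhitespace(content);
            if (ws.length() < content.length()) {
                margin = (margin == null) ? ws : commonPrefix(margin, ws);
            }
            if (m.group(2).isEmpty()) {
                break;
            }
        }
        if (margin == null || margin.isEmpty()) {
            return text; // nothing in common, pass through
        }
        // Pass 2: strip the margin; blank lines are left as they are.
        StringBuilder out = new StringBuilder();
        m = LINE.matcher(text);
        while (m.find()) {
            String content = m.group(1);
            boolean blank = leadingWhitespace(content).length() == content.length();
            out.append(blank ? content : content.substring(margin.length()));
            out.append(m.group(2));
            if (m.group(2).isEmpty()) {
                break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedent("    a\n      b\n\n    c").equals("a\n  b\n\nc")); // true
        // A tab and a space share no common prefix, so nothing is removed.
        System.out.println(dedent("\ta\n b").equals("\ta\n b"));                   // true
    }
}
```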


One related thought, since you brought up the tab handling: it might be useful to have built-ins for normalizing tabs and spaces explicitly. Something like ?expand_tabs(tabWidth) for column-aware tab-to-spaces conversion (semantics of Python's str.expandtabs / Unix expand), and an inverse for converting leading-whitespace runs to tabs at multiples of tabWidth (Unix unexpand-style, leading-only — otherwise you'd break alignment).
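For reference, column-aware tab expansion is short to write. The sketch below follows the general idea of Unix expand (each tab advances to the next multiple of tabWidth, and the column resets at a line break); the Java names are hypothetical, and rejecting a non-positive tabWidth is a choice made here, not part of any proposal.

```java
// Hypothetical sketch of column-aware tab expansion.
final class ExpandTabsSketch {
    static String expandTabs(String text, int tabWidth) {
        if (tabWidth <= 0) {
            throw new IllegalArgumentException("tabWidth must be positive");
        }
        StringBuilder out = new StringBuilder();
        int col = 0; // current column, counted in Java chars
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                // Fill up to the next tab stop.
                int spaces = tabWidth - (col % tabWidth);
                for (int k = 0; k < spaces; k++) {
                    out.append(' ');
                }
                col += spaces;
            } else if (c == '\n' || c == '\r') {
                out.append(c);
                col = 0; // a line break restarts the column count
            } else {
                out.append(c);
                col++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandTabs("a\tb", 4).equals("a   b"));         // true
        System.out.println(expandTabs("\tx\n\ty", 2).equals("  x\n  y"));  // true
    }
}
```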

For code generation that's a real need — if the target project uses tabs, the generator needs to emit tabs; if it mandates 4-space indent, ditto. Having primitives for that means generators don't each have to roll their own. Not proposing them in this PR — just floating the idea, since it's the underlying capability you'd want behind the "expand tabs first" docs note. Happy to do a follow-up PR if it sounds worth having.

I'll push the revisions for the three current points in a day or two and ping you. Thanks again for engaging on this.

gdisirio added a commit to gdisirio/freemarker-codegen that referenced this pull request May 29, 2026
…dent.

- Rename ?pad_lines / ?padLines -> ?right_pad_lines / ?rightPadLines,
  for consistency with ?right_pad / ?left_pad. The class was renamed
  to right_pad_linesBI to match the snake_case naming convention used
  by the surrounding builtins. NUMBER_OF_BIS unchanged (renamed pair).
- Add a no-argument form to ?dedent: ?dedent() finds the longest
  leading whitespace (spaces and tabs) that is a common prefix of all
  non-empty lines and removes it. Same semantics as Python's
  textwrap.dedent — robust to imperfect input. Empty/whitespace-only
  lines are ignored when computing the common prefix. A leading tab
  and a leading space are distinct (no implicit collapsing), matching
  Python.
- Update tests: rename pad_lines tests; add 8 tests for ?dedent()
  covering uniform indent, mixed indent, blank lines, no common prefix,
  tabs, tabs+spaces distinction, empty input, and already-dedented
  input.
- README: rename and document ?right_pad_lines, document the new
  no-arg ?dedent() with examples, add notes that widths are counted
  in Java chars (not display columns) and that ?wrap correctly leaves
  U+00A0 inside words (not used as break points).

Full ./gradlew check is green; ChibiOS oop+xhal regenerate with zero
diff. Changes parallel what will go on the upstream PR branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gdisirio added 3 commits May 29, 2026 22:54
Aligns with ?right_pad / ?left_pad naming. Internal class renamed to
right_pad_linesBI accordingly. Registration moved to the alphabetical
position after right_pad. Tests, manual entry, and index link renamed.

Per review comment by ddekany on PR apache#130.
…nt-style)

Adds a more robust default behaviour for ?dedent. The no-argument form
finds the longest leading whitespace (spaces and tabs only) that is a
common prefix of every non-empty line, and removes that. This handles
imperfect input (lines with different leading-whitespace amounts)
gracefully, whereas the original explicit-prefix form leaves any line
not starting with the exact prefix unchanged. Semantics match Python's
textwrap.dedent.

Empty/whitespace-only lines are ignored when computing the common
prefix and pass through unchanged. A leading tab and a leading space
are distinct (no implicit collapsing), again matching Python.

The explicit-prefix form ?dedent(prefix) remains for cases where exact
control is wanted; it's not redundant — just less robust by design.

8 new JUnit tests covering uniform indent, mixed indent, blank-line
handling, no-common-prefix passthrough, tabs, tabs+spaces distinction,
empty input, and already-dedented input. Manual section expanded with
the new form and a worked example.

Per review comment by ddekany on PR apache#130.
…pace behaviour

Adds notes in the manual for the new built-ins:

- For ?right_pad_lines: widths are counted in Java chars (UTF-16 code
  units), not visual display columns — same as ?right_pad / ?left_pad.
  A tab counts as one character, not as an advance to the next tab
  stop. Visual alignment for tab-containing input requires expanding
  tabs first.
- For ?wrap: same width semantics, plus a note that words are split on
  Java's \s+, which does NOT include U+00A0 (non-breaking space) — so
  a non-breaking space correctly stays inside a word and is never used
  as a break point. This is the intended behaviour.

Per review comment by ddekany on PR apache#130 about tab and non-breaking
space handling.