
Fix streaming NDJSON output not emitting incrementally #12299

Draft
zachbai wants to merge 34 commits into master from zb/wsh-v2
@zachbai zachbai commented Jun 6, 2026

Description

Fix streaming NDJSON output (warp agent run --output-format ndjson) not emitting incrementally.

Root cause

During LLM streaming, the server sends AppendToMessageContent events that update a single AIAgentOutputMessage in-place (via upsert_output_for_message). The streaming NDJSON code in the SDK driver tracked only message count (streaming_ndjson_emitted) to detect new output. Since in-place updates don't change the count, the streaming code never detected them — output only appeared as a single chunk when the exchange finished.

Fix

  • Added a revision: u64 counter to AIAgentOutput that is bumped on every mutation (both extend for new messages and upsert for in-place updates).
  • The driver now tracks both message count AND revision. When the revision changes without a count increase, it re-emits the last message with its updated content.
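
The mechanism described in the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Output` and `Emitter` are hypothetical stand-ins for `AIAgentOutput` and the driver's streaming state, and messages are reduced to `(id, content)` pairs.

```rust
/// Hypothetical stand-in for `AIAgentOutput`: messages are (id, content) pairs.
#[derive(Default)]
struct Output {
    messages: Vec<(u32, String)>,
    /// Bumped on every mutation, whether it adds a message or edits one.
    revision: u64,
}

impl Output {
    /// Append new messages and bump the revision.
    fn extend(&mut self, new: impl IntoIterator<Item = (u32, String)>) {
        self.messages.extend(new);
        self.revision += 1;
    }

    /// Replace an existing message's content in place (append if the id is
    /// unseen) and bump the revision. The message count may stay the same.
    fn upsert(&mut self, id: u32, content: &str) {
        match self.messages.iter_mut().find(|m| m.0 == id) {
            Some(m) => m.1 = content.to_string(),
            None => self.messages.push((id, content.to_string())),
        }
        self.revision += 1;
    }
}

/// Hypothetical driver-side state: how many messages were emitted and which
/// revision was last observed.
#[derive(Default)]
struct Emitter {
    emitted_count: usize,
    seen_revision: u64,
}

impl Emitter {
    /// Return the message contents that should be written out now.
    fn poll(&mut self, out: &Output) -> Vec<String> {
        let mut to_emit = Vec::new();
        if out.revision == self.seen_revision {
            return to_emit; // nothing changed since the last poll
        }
        if out.messages.len() > self.emitted_count {
            // New messages: emit only the delta.
            for (_, content) in &out.messages[self.emitted_count..] {
                to_emit.push(content.clone());
            }
        } else if let Some((_, content)) = out.messages.last() {
            // Same count but a newer revision: an in-place update happened,
            // so re-emit the last message with its current content.
            to_emit.push(content.clone());
        }
        self.emitted_count = out.messages.len();
        self.seen_revision = out.revision;
        to_emit
    }
}
```

With count-only tracking, the `upsert` path would never reach the emitter, which matches the symptom described under Root cause. One simplification in this sketch: if a poll sees both a new message and an edit to an older one, only the new message is emitted.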

Testing

  • All existing tests pass (41 task tests, 16 convert_conversation tests)
  • Compilation verified with cargo check -p warp and full cargo build -p warp

CHANGELOG-NONE


Conversation: https://staging.warp.dev/conversation/3b2a5233-c0f8-4484-ac41-2f521147b3ec
Run: https://oz.staging.warp.dev/runs/019e9af3-5959-7747-b9f7-fda60d01cf83

This PR was generated with Oz.

zachbai and others added 30 commits June 4, 2026 15:57
- shell_integration.rs: Incremental byte-stream parser for OSC 133
  semantic prompt sequences (A/B/C/D). Strips sequences from output
  and emits structured ShellEvents. Handles partial sequences across
  feed() boundaries, both BEL and ST terminators, and passes non-133
  OSC sequences through unchanged. 22 unit tests.
- wsh_zsh_init.sh: Zsh integration script that sources the user's
  real .zshrc and installs precmd/preexec hooks emitting OSC 133
  markers (D+A in precmd, B via PROMPT, C in preexec).

Co-Authored-By: Oz <oz-agent@warp.dev>
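
For readers unfamiliar with OSC 133, an incremental parser of the kind this commit describes might look roughly like the sketch below. It is not the contents of shell_integration.rs: the `Parser` type, the `ShellEvent` variant names, and the `feed()` return shape are assumptions made for illustration. It keeps its state between `feed()` calls, accepts both BEL and ST (ESC \) terminators, and passes non-133 OSC sequences through unchanged.

```rust
/// Assumed event names for the OSC 133 markers A/B/C/D.
#[derive(Debug, PartialEq)]
enum ShellEvent {
    PromptStart,     // 133;A
    CommandStart,    // 133;B
    CommandExecuted, // 133;C
    CommandFinished, // 133;D
}

#[derive(Clone, Copy)]
enum State {
    Ground, // ordinary output
    Esc,    // saw ESC, waiting to see whether ']' follows
    Osc,    // inside an OSC payload
    OscEsc, // saw ESC inside an OSC payload (possible ST terminator)
}

struct Parser {
    state: State,
    payload: Vec<u8>,
}

impl Parser {
    fn new() -> Self {
        Parser { state: State::Ground, payload: Vec::new() }
    }

    /// Feed a chunk of bytes. Returns the bytes to forward (133 sequences
    /// stripped) and the events found. Partial sequences are held until a
    /// later call completes them.
    fn feed(&mut self, input: &[u8]) -> (Vec<u8>, Vec<ShellEvent>) {
        let mut out = Vec::new();
        let mut events = Vec::new();
        for &b in input {
            match self.state {
                State::Ground => {
                    if b == 0x1b { self.state = State::Esc } else { out.push(b) }
                }
                State::Esc => match b {
                    b']' => {
                        self.payload.clear();
                        self.state = State::Osc;
                    }
                    // ESC ESC: forward the first, keep waiting on the second.
                    0x1b => out.push(0x1b),
                    _ => {
                        out.extend_from_slice(&[0x1b, b]);
                        self.state = State::Ground;
                    }
                },
                State::Osc => match b {
                    0x07 => self.finish(&[0x07], &mut out, &mut events),
                    0x1b => self.state = State::OscEsc,
                    _ => self.payload.push(b),
                },
                State::OscEsc => {
                    if b == b'\\' {
                        self.finish(&[0x1b, b'\\'], &mut out, &mut events);
                    } else {
                        self.payload.extend_from_slice(&[0x1b, b]);
                        self.state = State::Osc;
                    }
                }
            }
        }
        (out, events)
    }

    /// Called when an OSC terminator is seen: emit an event for 133;A-D,
    /// otherwise re-emit the whole sequence unchanged.
    fn finish(&mut self, terminator: &[u8], out: &mut Vec<u8>, events: &mut Vec<ShellEvent>) {
        self.state = State::Ground;
        let kind = if self.payload.starts_with(b"133;") {
            self.payload.get(4).copied()
        } else {
            None
        };
        let event = match kind {
            Some(b'A') => Some(ShellEvent::PromptStart),
            Some(b'B') => Some(ShellEvent::CommandStart),
            Some(b'C') => Some(ShellEvent::CommandExecuted),
            Some(b'D') => Some(ShellEvent::CommandFinished),
            _ => None,
        };
        match event {
            Some(e) => events.push(e),
            None => {
                out.extend_from_slice(b"\x1b]");
                out.extend_from_slice(&self.payload);
                out.extend_from_slice(terminator);
            }
        }
    }
}
```

Holding a trailing ESC until the next `feed()` call is what allows a sequence split across two reads to be recognized as a single marker.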
Pure ANSI rendering functions for styled agent blocks, status bar,
input prompt, and cursor/scroll region helpers. Includes 19 inline
unit tests covering box drawing, word wrapping, status bar width,
and escape sequence correctness.

Co-Authored-By: Oz <oz-agent@warp.dev>
Introduce the wsh crate, a CLI that wraps the user's shell in a PTY
proxy. This commit includes:

- Crate scaffold (Cargo.toml, lib.rs, main.rs)
- pty.rs: PTY spawning via openpty/fork/exec, resize via TIOCSWINSZ,
  FD_CLOEXEC on master fd
- event_loop.rs: poll-based multiplexing of stdin, PTY master, and a
  SIGWINCH self-pipe for terminal resize propagation. Raw mode on
  entry, restored on exit. POLLHUP detection for child exit.

Co-Authored-By: Oz <oz-agent@warp.dev>
Three issues caused broken rendering when the agent mode displayed
blocks (thinking indicator, Running, Output):

1. Raw mode line endings: crossterm's enable_raw_mode() disables OPOST,
   so \n no longer translates to \r\n. All writeln!() in render_block
   and the \n in render_thinking_indicator produced bare LF, causing
   staircase-stepping (each line shifted right). Fixed by using
   write!() with explicit \r\n and leading \r on each line.

2. Captured PTY output containing \r: The PTY line discipline converts
   \n to \r\n in command output. When this raw output was placed in the
   block body, embedded \r characters moved the cursor to column 1
   mid-render, overwriting the left border character. Fixed by stripping
   \r from captured output before rendering.

3. Redundant \r in submit_agent_query: clear_line() already includes
   \r, so the extra write_stdout(b"\r") was unnecessary. Removed.

Co-Authored-By: Oz <oz-agent@warp.dev>
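
Fixes 1 and 2 in the commit message above can be illustrated together with a small helper. This is a sketch only: `write_raw_lines` is a hypothetical name, not a function from this PR. It strips embedded `\r` from captured output, starts each line at column 1 with a leading `\r`, and ends each line with an explicit `\r\n`, so the output is correct even when OPOST is disabled.

```rust
use std::io::{self, Write};

/// Write `body` line by line to a terminal that is in raw mode.
/// Hypothetical helper for illustration.
fn write_raw_lines(out: &mut impl Write, body: &str) -> io::Result<()> {
    // Captured PTY output carries "\r\n"; drop the '\r' so it cannot move
    // the cursor back to column 1 in the middle of a rendered line.
    let cleaned = body.replace('\r', "");
    for line in cleaned.lines() {
        // A leading '\r' returns to column 1; an explicit "\r\n" replaces
        // the LF-to-CRLF translation that raw mode turns off.
        write!(out, "\r{}\r\n", line)?;
    }
    Ok(())
}
```

Because the function takes `&mut impl Write`, it can be exercised against a `Vec<u8>` in tests and against stdout in the real renderer.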
Co-Authored-By: Oz <oz-agent@warp.dev>
Implements the screen module with:
- enter_alt_screen/leave_alt_screen for managing alt screen + raw mode
- render() for full-frame rendering using crossterm queue/flush
- Layout calculation splitting the screen into scrollback, active grid,
  optional agent input line, and status bar regions
- Cell-to-crossterm style translation with run-length style batching
- Scroll offset support for scrollback navigation
- Status bar with reverse video, left/right aligned content
- Agent input line with bold prefix and block cursor
- Unit tests for layout calculation, color conversion, and status bar formatting

Co-Authored-By: Oz <oz-agent@warp.dev>
Minimal terminal emulator that processes VT escape sequences and
maintains a grid of Cells. Supports:
- Character printing with deferred wrap
- Cursor movement (CUP, CUU/CUD/CUF/CUB, CHA, VPA)
- SGR attributes (colors, bold, italic, underline, etc.)
- Extended colors (256-color indexed and RGB)
- Erase operations (ED, EL, ECH)
- Scroll regions (DECSTBM) with scroll up/down
- Insert/delete lines and characters (IL, DL, ICH, DCH)
- Alt screen buffer (DECSET/DECRST 1049)
- Save/restore cursor (ESC 7/8)
- Reverse index (ESC M) and index (ESC D)
- Scrollback (scrolled-out rows)
- Resize with content preservation

Co-Authored-By: Oz <oz-agent@warp.dev>
Alt-screen architecture replacing the passthrough model. wsh now:
- Embeds a terminal emulator (MiniTerm) for the active block
- Manages structured scrollback via BlockManager
- Renders everything to the host terminal via the screen module
- Captures completed blocks on prompt boundaries (OSC 133)

Co-Authored-By: Oz <oz-agent@warp.dev>
The 'thinking...' indicator is now transient render state instead of a
permanent scrollback entry. It appears during AgentRunning mode and
vanishes automatically when the command completes and the mode returns
to Shell.

Co-Authored-By: Oz <oz-agent@warp.dev>
The active block now shows only rows up to the cursor position,
leaving room for scrollback to display completed blocks and agent
indicators above.

Co-Authored-By: Oz <oz-agent@warp.dev>
Two root causes of broken rendering:

1. grid_offset showed the BOTTOM of the MiniTerm grid instead of the
   content around the cursor. With dynamic active height (cursor_row+1),
   a cursor at row 0 would compute grid_offset = grid.len()-1, showing
   a blank row instead of the prompt. Fix: compute grid_offset from
   cursor position so the visible region always contains the cursor.

2. capture_completed_block called resize() with the same dimensions,
   which copies old content back into the grid — effectively a no-op.
   Old content remained visible in the active block AND appeared in
   scrollback, causing duplication. Fix: add MiniTerm::reset() that
   blanks the grid and resets cursor to (0,0), and call it after
   capturing block content.

Symptoms fixed:
- Invisible prompt at startup
- Content rendering at wrong position
- Duplicated output (scrollback + active block showing same content)
- Previous command output bleeding into current block
- Prompt floating in middle of screen with blank rows below

Co-Authored-By: Oz <oz-agent@warp.dev>
Content now flows naturally from top to bottom: scrollback rows,
then active block immediately after, with empty space below. Previously
the active block was anchored to the bottom of the usable area,
leaving a gap between scrollback and the prompt.

Co-Authored-By: Oz <oz-agent@warp.dev>
Replace the mock agent with a real Warp agent connection. wsh now
spawns `warp agent run --output-format ndjson` as a subprocess when
the user submits a query in agent mode (Ctrl-A).

- Agent stdout is polled alongside PTY and stdin
- NDJSON events are parsed and rendered as styled blocks in scrollback
- Agent text, tool calls, tool results, and errors each get distinct styling
- Ctrl-C kills the agent subprocess
- Binary configurable via WSH_AGENT_BINARY env var (defaults to 'warp')

Co-Authored-By: Oz <oz-agent@warp.dev>
- Add resolve_agent_binary() that checks WSH_AGENT_BINARY env var,
  then tries 'warp'/'oz' on PATH, then falls back to macOS app bundle
  paths (/Applications/WarpDev.app, /Applications/Warp.app)
- Handle 'system' NDJSON events (run_started, conversation_started)
  gracefully instead of showing raw JSON debug output
- Show run URL as dim text when agent starts
- Change stderr from Stdio::piped() to Stdio::null() to prevent
  potential deadlock from unread stderr buffer

Co-Authored-By: Oz <oz-agent@warp.dev>
capture_completed_block was calling miniterm.resize() after cloning the
grid into blocks. resize() preserves content, so every subsequent
PromptStart re-captured the same grid — producing duplicate blocks and
keeping a stale cursor position that inflated active_height, starving
the scrollback region of display rows for new agent output.

Replace resize() with reset(), which clears the grid and resets the
cursor to (0,0). The shell redraws its prompt on the fresh grid.

Fixes:
- AI query output not showing for 2nd+ queries
- N copies of the same block after AI query → shell command

Co-Authored-By: Oz <oz-agent@warp.dev>
- When agent_input or agent_status is present, set active_height=0 to
  hide the shell prompt during agent execution
- Render agent_status and agent_input inline after scrollback rows
  instead of pinned to fixed bottom rows above status bar
- Update existing tests and add layout_with_agent_status test

Co-Authored-By: Oz <oz-agent@warp.dev>
Agent status and input now render at the actual next_row position
(after rendered scrollback content) instead of at computed layout
positions that assumed the scrollback region was full. This fixes
the 'pinned to bottom' issue.

Also removed unused agent_input_row/agent_status_row fields from
the Layout struct since rendering uses next_row directly.

Co-Authored-By: Oz <oz-agent@warp.dev>
- Spinner cycles through 10 braille frames on each render tick
- Suppressed system/run_started URL from agent output (noise)

Co-Authored-By: Oz <oz-agent@warp.dev>
Use 120ms poll timeout during AgentRunning mode so the spinner
animates at ~8fps even when no I/O events arrive. In Shell mode,
poll blocks indefinitely as before.

Co-Authored-By: Oz <oz-agent@warp.dev>
Extract conversation_id from the first agent run's NDJSON system
events, then pass --conversation <id> to subsequent runs. All
prompts within the same wsh session now share one conversation,
maintaining context across interactions.

Co-Authored-By: Oz <oz-agent@warp.dev>
Two high-impact changes to eliminate visible flicker during redraws:

1. Wrap stdout in BufWriter(64KB) so all escape sequences for a frame
   are buffered and flushed atomically, instead of 4-5 mid-frame flushes
   through the default 8KB LineWriter.

2. Remove terminal::Clear(ClearType::All) which blanks the entire screen
   before redrawing, causing a visible flash. Instead, overwrite rows
   in-place and explicitly blank any stale rows between content and the
   status bar.

Also generalize render helper signatures from &mut io::Stdout to
&mut impl Write to support the BufWriter wrapper.

Co-Authored-By: Oz <oz-agent@warp.dev>
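
The two flicker fixes above can be sketched with the standard library alone. The names `buffered_stdout` and `render_frame` are hypothetical, and the real renderer uses crossterm rather than hand-written escape sequences. The sketch buffers a whole frame in a 64 KB `BufWriter`, overwrites rows in place, blanks stale rows, and flushes once.

```rust
use std::io::{self, BufWriter, Write};

/// Wrap stdout in a 64 KB buffer so a frame's bytes are held until flush().
fn buffered_stdout() -> BufWriter<io::Stdout> {
    BufWriter::with_capacity(64 * 1024, io::stdout())
}

/// Draw `rows` from the top of the screen, blank any remaining rows up to
/// `total_rows`, then flush once. No full-screen clear is issued.
fn render_frame(out: &mut impl Write, rows: &[&str], total_rows: usize) -> io::Result<()> {
    for (i, row) in rows.iter().enumerate().take(total_rows) {
        // CUP to (row i+1, column 1), write the content, then EL to erase
        // whatever was left over from the previous frame on this row.
        write!(out, "\x1b[{};1H{}\x1b[K", i + 1, row)?;
    }
    for i in rows.len()..total_rows {
        // Stale rows below the content are blanked individually.
        write!(out, "\x1b[{};1H\x1b[K", i + 1)?;
    }
    out.flush()
}
```

A caller would create the writer once with `buffered_stdout()` and pass `&mut` to it on every frame. Taking `&mut impl Write` rather than `&mut io::Stdout` is the signature generalization the commit message mentions.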
Agent text events (agent/agent_reasoning) now stream into an ephemeral
growing region (streaming_lines) instead of immediately becoming
permanent scrollback. The streaming region is rendered inline between
scrollback and agent_status/agent_input. When the agent finishes or is
canceled, streaming lines are flushed to permanent scrollback.

Non-text events (tool_call, tool_result, etc.) flush any accumulated
streaming text before adding themselves as permanent scrollback, so tool
invocations appear after the text that preceded them.

Co-Authored-By: Oz <oz-agent@warp.dev>
When the StreamingNdjsonOutput feature flag is enabled and the output
format is NDJSON, the driver now emits output messages incrementally
as they stream in via UpdatedStreamingExchange events, rather than
buffering the entire exchange. A per-exchange counter tracks how many
messages have been written so the is_finished() fallback only emits
the delta.

Gated behind FeatureFlag::StreamingNdjsonOutput (dogfood-only).
Default behavior (flag disabled) is unchanged.

Co-Authored-By: Oz <oz-agent@warp.dev>
During LLM streaming, the server sends AppendToMessageContent events
that update a single AIAgentOutputMessage in-place (via upsert). The
streaming NDJSON code tracked only message *count* to detect new output,
so in-place updates (where the count stays the same) were never emitted.

This meant callers saw nothing until the exchange finished and all
output was written as a single chunk.

Fix: add a revision counter to AIAgentOutput that is bumped on every
mutation (extend or upsert). The driver now tracks both message count
AND revision, re-emitting the last message whenever content changes
without a count increase.

Co-Authored-By: Oz <oz-agent@warp.dev>
@cla-bot cla-bot Bot added the cla-signed label Jun 6, 2026
zachbai and others added 3 commits June 6, 2026 16:07
Add --model kimi-k26-fireworks to the args passed to `warp agent run`
in AgentProcess::spawn.

Co-Authored-By: Oz <oz-agent@warp.dev>
…rsor

The active block height was calculated as cursor_row + 1, which clipped
content rendered below the cursor (e.g. zsh completion menus). Now we
scan the MiniTerm grid from the bottom to find the last non-blank row
and use max(cursor_row + 1, last_content_row + 1) as the content height.

When no completions are shown, the grid rows below the cursor are blank,
so the active block stays compact.

Co-Authored-By: Oz <oz-agent@warp.dev>
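
The height calculation described above can be sketched as follows, assuming a simplified grid of `Vec<char>` rows in which a blank cell is a space. The real MiniTerm grid holds styled Cells, and `active_height` is a hypothetical name.

```rust
/// Height of the active block: enough rows to include the cursor and any
/// content below it (for example, a completion menu).
fn active_height(grid: &[Vec<char>], cursor_row: usize) -> usize {
    // Scan from the bottom for the last row with a non-blank cell.
    let last_content_row = grid
        .iter()
        .rposition(|row| row.iter().any(|&c| c != ' '));
    // last_content_row + 1, or 0 when the whole grid is blank.
    let content_height = last_content_row.map_or(0, |r| r + 1);
    content_height.max(cursor_row + 1)
}
```

When every row below the cursor is blank, the result reduces to `cursor_row + 1`, so the block stays compact as the commit message notes.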
…rsor

The active block height was calculated as cursor_row + 1, which clipped
content rendered below the cursor (e.g. zsh completion menus). Now we
scan the MiniTerm grid from the bottom to find the last non-blank row
and use max(cursor_row + 1, last_content_row + 1) as the content height.

When no completions are shown, the grid rows below the cursor are blank,
so the active block stays compact.

Co-Authored-By: Oz <oz-agent@warp.dev>