Add ipynb formatting buffer#12462
Conversation
Add a self-contained ipynb_parser crate that converts nbformat v4 Jupyter notebooks directly into FormattedText, and integrate it into the editor: ContentFormat::Ipynb, InitialBufferState::ipynb, Buffer::from_ipynb (with raw-JSON fallback on parse failure), and RichTextEditorModel::reset_with_ipynb. Co-Authored-By: Oz <oz-agent@warp.dev>
|
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds an ipynb_parser crate and integrates a new Ipynb content format so notebook JSON can be converted into editor FormattedText with raw fallback handling on parse failures.
Concerns
- Malformed notebooks that declare
nbformat: 4but omit the requiredcellsfield currently deserialize as an empty notebook, causing a blank render instead of the promised raw-content fallback.
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
Add the .ipynb file-detection helper and its tests to warp_util. The helper is independent of the app-level wiring and is shared groundwork for opening notebooks, so it lands with the parser/buffer foundation. Co-Authored-By: Oz <oz-agent@warp.dev>
Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
|
/oz-review |
|
I'm re-reviewing this pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds an ipynb_parser crate and editor buffer/model integration for converting nbformat v4 notebooks into FormattedText, with tests covering cell rendering, saved outputs, embedded images, fallback behavior, and .ipynb filename detection.
Concerns
- Stream and
text/plainoutputs still bypass the ANSI cleanup used for tracebacks, so notebooks that saved colored stdout/stderr can render raw control sequences. See the inline suggestion. - No approved spec context was available, and the security pass did not identify security findings.
Verdict
Found: 0 critical, 0 important, 1 suggestions
Approve with nits
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
|
/oz-review |
|
I'm re-reviewing this pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds an ipynb_parser crate and editor buffer plumbing to render nbformat v4 notebooks as FormattedText, including markdown/code/raw cells and selected saved outputs.
Concerns
- Notebook image outputs are emitted as
data:URI image sources, but the existing editor image resolver treats non-HTTP(S), non-absolute sources as local file paths, so those outputs will fail to load instead of rendering.
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
… evelyn/ipynb-parser-buffer
|
/oz-review |
|
I'm re-reviewing this pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds an ipynb_parser crate, editor buffer/reset support for .ipynb content, a notebook file-type helper, and base64 data: URI asset loading for rendered images. No approved spec context was provided.
Concerns
data_uri_sourceaccepts arbitrarily large base64 payloads from rendered Markdown/notebook content and decodes them without an asset-level size limit; see the inline security comment.- This diff changes user-visible rendered image behavior by teaching the Markdown image resolver to render
data:URI images, but the PR does not include screenshots or a screen recording. For this user-facing change, please include screenshots or a screen recording demonstrating it working end to end.
Security
- Unbounded base64 data URI decoding can cause avoidable memory pressure when rendering crafted Markdown/notebook content.
Verdict
Found: 0 critical, 2 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
|
|
||
| // base64 payloads may contain embedded whitespace/newlines; strip it before | ||
| // decoding. | ||
| let payload: String = payload.chars().filter(|c| !c.is_whitespace()).collect(); |
There was a problem hiding this comment.
source comes from Markdown/notebook content, and this path clones then decodes an unbounded data: payload; reject payloads above a fixed size before collecting/decoding so a crafted image cannot spike memory usage.
|
/oz-review |
|
This is your last |
|
I'm re-reviewing this pull request in response to a review request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Comment Powered by Oz |
There was a problem hiding this comment.
Overview
This PR adds an ipynb_parser crate for rendering nbformat v4 notebooks into FormattedText, integrates .ipynb buffer initialization/reset paths in the editor, adds data-URI image handling for rendered notebook/markdown images, and adds file-type/parser coverage tests.
Concerns
- No blocking correctness, security, spec-drift, or performance concerns found in the annotated diff.
Verdict
Found: 0 critical, 0 important, 0 suggestions
Approve
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Powered by Oz
|
|
||
| /// Creates an [`AssetSource::Async`] that decodes an inline base64 `data:` URI | ||
| /// (e.g. `data:image/png;base64,<payload>`) into its raw bytes. | ||
| pub fn data_uri_source(source: &str) -> Option<AssetSource> { |
There was a problem hiding this comment.
Why are we making changes in our asset cache code? I don't quite understand the motivation here
There was a problem hiding this comment.
The motivation comes from Oz comment, as jupyter notebook supports embedding images as base64 inline, which is different from markdown image rendering as an external reference
| base_directory: Option<&Path>, | ||
| ) -> AssetSource { | ||
| if source.starts_with("http://") || source.starts_with("https://") { | ||
| if let Some(data_uri_source) = asset_cache::data_uri_source(source) { |
There was a problem hiding this comment.
This change on supporting inline data URL in editor seems orthogonal to the overall PR on ipynb format parsing?
There was a problem hiding this comment.
it is ipynb parser push_image that would produce inline data, and therefore we need to support this.
Please let me know if i misunderstood
| let language = notebook.language(); | ||
| let mut lines: Vec<FormattedTextLine> = Vec::new(); | ||
|
|
||
| for cell in ¬ebook.cells { |
There was a problem hiding this comment.
Iterating over notebook cells, converting them into our format text, and then pushing it into a Vec seems not optimal for performance due to the incremental allocation we need to do as we iterate
Could we use std::iter::Map here to do the conversion without incrementally allocating space?
| /// parseable notebook (malformed JSON, unsupported nbformat version, etc.). The | ||
| /// raw content is placed in a single code block so any Markdown/HTML inside it | ||
| /// is shown verbatim, never re-interpreted. | ||
| pub fn raw_fallback_formatted_text(content: &str) -> FormattedText { |
There was a problem hiding this comment.
Shouldn't we return an error and render that in a raw code editor (potentially with a user visible UX)?
I don't think silently falling back to a giant block of JSON is the right approach
| return; | ||
| } | ||
| lines.push(FormattedTextLine::Image(FormattedImage { | ||
| alt_text: "output".to_string(), |
There was a problem hiding this comment.
Why are we hard-coding alt text here?
There was a problem hiding this comment.
this is a place holder: if a cell generates image (calling plot libraries like seaborn, matplotlib..etc), it would carry no alt-text metadata and display the image in cell output, like
"outputs": [
{
"data": {
"image/png": some_base64
"text/plain": [
"<Figure size 1427.25x450 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
Hence just picked the placeholder name semantically, as FormattedImage.alt_text is non_optional.
Hard to decide if put an empty string here is better.
| return; | ||
| } | ||
| if base64.len() > MAX_IMAGE_DATA_CHARS { | ||
| push_text_output(lines, "[output image omitted: exceeds size limit]"); |
There was a problem hiding this comment.
Is this the same default behavior we do for markdown images?
There was a problem hiding this comment.
Great call. This is only for jupyter notebook UI optimization (more strict than markdown, but maybe not necessary). i dropped it to be consistent with markdown too. 24a8907
We already have a size guard in asset cache that applied to both jupyter notebook and markdown.
There was a problem hiding this comment.
As a follow up: silent drop maybe not desirable; therefore attempted to display a hint when images are too big to display a2c7def
| if text.is_empty() { | ||
| return; | ||
| } | ||
| if text.chars().count() > MAX_TEXT_OUTPUT_CHARS { |
There was a problem hiding this comment.
Why do we have some arbitrary text length truncation here?
There was a problem hiding this comment.
this truncation only applies for cell output, not cell (code or markdown), to prevent cell output bloating the buffer;
For vscode, we will see something like this

Added a TODO here, if we also want to support a text truncation UI like vscode (for now i feel it's a little overhead on the scope of rendering a notebook)
|
|
||
| /// Strip ANSI escape sequences (CSI/SGR colors, OSC, and simple escapes) so | ||
| /// tracebacks render as readable plain text. | ||
| fn strip_ansi(input: &str) -> String { |
There was a problem hiding this comment.
Why do we need to strip ansi?
There was a problem hiding this comment.
called out by this Oz-comment
The push_text_output will be rendered as plan, unhighlighted code block instead of a terminal/ansi-aware surface. strip_ansi in all input paths (stream, text/plain, and error) of push_text_output in order to avoid garbage rendering
| fn sanitize_language(raw: &str) -> String { | ||
| let trimmed = raw.trim(); | ||
| let is_safe = !trimmed.is_empty() | ||
| && trimmed.chars().count() <= MAX_LANGUAGE_TAG_CHARS |
There was a problem hiding this comment.
Don't quite understand why we need this sanitization process as well. Also seems like the fallback is we just leave an empty string?
There was a problem hiding this comment.
The language refers to the the notebook metadata metadata.language_info.name, which is attacker-controllable.
it's also a security guard called out by Oz
Leave an empty string means the code block will render as a plain, unhighlighted code block. Therefore, when detects the language is unknown/not safe
- push_code_block takes an empty string as param
- Flows to CodeBlockType, where lang is used to pick a grammar highlight
Description
Breakdown #12242 into
This PR is part 1
Comparison with #12242
This PR (#12462) is a strict subset of #12242. Breakdown of which changes each PR covers:
ipynb_parsercrate: convert nbformat v4 notebooks →FormattedText(markdown/code/raw cells, saved outputs, images, ANSI stripping, size/length truncation)ipynb_parseras workspace member +warp_editordependency (Cargo.toml,Cargo.lock,crates/editor/Cargo.toml)ContentFormat::Ipynb,InitialBufferState::ipynb,Buffer::from_ipynb(raw-JSON fallback on parse failure),replace()match armRichTextEditorModel::reset_with_ipynbto bootstrap a buffer from notebook contents.ipynbfile detection helperis_jupyter_notebook_file+ tests (crates/warp_util/src/file_type.rs)JupyterNotebookRenderingfeature flag + dogfood enablement (crates/warp_features,app/src/features.rs,app/Cargo.toml).ipynbas an openable/notebook file type + tests (app/src/util/openable_file_type.rs).ipynbin the notebook view (app/src/code/file_tree/view.rs,app/src/code/view.rs).ipynb(app/src/notebooks/file/mod.rs+ tests,app/src/notebooks/editor/model.rs,editor/view.rs).ipynb(app/src/terminal/view.rs,terminal/view/open_in_warp.rs,app/src/uri/mod.rs+ tests)app/src/util/tooltips.rs,app/src/workspace/view.rs)Linked Issue
ready-to-specorready-to-implement.Testing
./script/runNo user perceived changes in this PR
Screenshots / Videos
https://www.loom.com/share/f6be00f6ab824bb094d74685050675e4
since it changed markdown image rendering by limiting image size and handling base64 string, we tested loading md file in warpdev and warplocal: markdown with images (smaller than 16MB) could be successfully opened/rendered in both (before & after this change)
Agent Mode