LSP support for HEEX ~H sigil contents & tree-sitter-heex #75
superhawk610 wants to merge 31 commits into
Hey @superhawk610, nice work on this! However, I would suggest a slightly different approach: I think we should also index HEEX (both inside sigils and in heex files) using the tokenizer and parser. My reasoning:
I realize that this is more complex to build up front, but I think the end result would be well worth it. |
|
I'm working on indexing HEEX with the tokenizer/parser so it's incorporated with the existing process. I'm not confident I can write a full tokenizer for HEEX's grammar, so I'm going with a hybrid approach for now that still shells out to |
superhawk610
left a comment
At this point go-to definition works, but find all references doesn't since the tree-sitter parsing in variables.go treats HEEX sigil contents as opaque. I've taken a pass at a possible approach through this using nested HEEX tree-sitter trees, but this has grown quite a bit in scope. I think a good next step would be to think this through a bit more thoroughly at a high level and plan the implementation more explicitly. What do you think?
I can take a stab at this part if you'd like - I can't promise a specific timeline though. Unfortunately, we can't ship something that shells out to tree-sitter during indexing. It's extremely important that we don't have performance regressions during indexing. There's too much overhead in calling out to tree-sitter. Remote's codebase has >57k files to parse, so we need indexing to be as fast as possible. |
|
No expectations at all on timeline! I've gotten this to handle what I need (go-to definition within HEEX sigils), and I don't mind just building from my fork if this ends up being too much work or too niche. Totally agree on avoiding performance regressions, Dexter's speed is its best selling point amongst my peers! 😎 Please feel free to contribute whatever you'd like to this branch, take over or use some/any of this PR, or close it out if it's not in the cards. |
|
I'll see if I can take a stab at it on this PR sometime this week and we can ship it together. In the meantime, feel free to keep shipping things to this branch and I'll add stuff when I can. |
|
I started on a barebones HEEX tokenizer. The line between tokenizer and parser is a bit blurry and this may lean a bit far into parsing. My aim is to get a simple replacement for |
|
OK, HEEX tokenizer is now on par with the
|
textDocument/definition support for HEEX ~H sigil via tree-sitter-heex
- handle parseElixir / parseHeex failure
- don't double-count line offset
- use NewTreeWithParsers in DocumentStore
- prevent server startup if parsers unavailable
- emit TokSigil for empty ~H sigils
- simplify break to loop condition
Parsing HEEX sigils will introduce new references.
4c40ca7 to 429f6bd
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 429f6bd.
|
Alright @JesseHerrick updated the approach per your request. HEEX files and sigil contents are now fully indexed and the existing Let me know what you think! |
|
Excellent @superhawk610! I will take a look as soon as I can. |

Resolves #74.
This PR extends the existing Elixir tokenizer to parse the contents of ~H sigils and resolve any function components and expressions contained within so that textDocument/definition works. This allows all common LSP functionality within ~H sigils, commonly found in Phoenix LiveView render functions and function components. It also extends the existing tree-sitter-elixir parser with tree-sitter-heex in nested sub-trees within sigils.

HEEX Parsing

TokenizeHEEX will only output a minimal subset of HEEX contents:
- TokModule / TokIdent for function components
- <% .. %> / { .. } interpolations
- TokHEEXOpenTag / TokHEEXCloseTag for HTML < and </ tags
- TokEOL for end-of-line

All other HEEX contents are ignored during tokenization.

The parser has also been extended to read .foo as a function call to foo() when it occurs immediately after a TokHEEXOpenTag / TokHEEXCloseTag. Function components with a module prefix, e.g. <Foo.bar />, are already handled by the existing parser.

Tree-sitter Integration

The existing tree-sitter tree stored by the document cache is a single tree with a single language. It's been extended with new Tree / TreeNode types that support nested trees with different languages. A typical Elixir document will now be modeled with a 3-level Elixir->HEEX->Elixir tree.

The TreeNode type is a light wrapper around *tree_sitter.Node that also tracks which Tree the node is a member of, facilitates traversal between nested trees, and wraps common methods like StartByte(), EndByte(), StartPosition(), EndPosition(), and Utf8Text() to correctly handle byte offsets from the root tree.

Note
Medium Risk
Touches core document parsing, caching, and all tree-sitter-driven LSP paths; scope is large but covered by new HEEX and tree tests and an index version bump.
Overview
Adds Phoenix HEEX support inside ~H sigils so LSP navigation (go-to-definition, references, hover, completion, rename, highlights) works on LiveView templates, not only plain Elixir.

The tokenizer now lexes ~H bodies via TokenizeHeex (components like <.foo />, <Foo.bar />, {...} / <% %> interpolations) and maps <.name after HEEX open tags to function calls in the reference indexer. ExpressionAtCursor gains HEEX-aware cursor context tests.

Tree-sitter is refactored from a single Elixir parse to a nested treesitter.Tree: Elixir trunk with HEEX branches on ~H quoted_content and Elixir branches on HEEX expression_value, using tree-sitter-heex. DocumentStore keeps per-language parsers and caches this composite tree; LSP handlers call methods on *treesitter.Tree (e.g. FindVariableOccurrences) so positions resolve across nested languages.

Index version is bumped to 13 for the parser/index changes.
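
For readers unfamiliar with the "minimal subset" tokenization described above, here is a rough, self-contained sketch of the idea: emit only tag markers, component names, interpolations, and end-of-line tokens, and skip everything else. It is not the code from this PR. The token kinds, `tokenizeHEEX`, and `componentTokens` are hypothetical names chosen for illustration, and interpolation bodies are kept as raw text rather than being tokenized as Elixir.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical token kinds, loosely mirroring the ones named in the description.
type tokKind int

const (
	tokOpenTag  tokKind = iota // "<"
	tokCloseTag                // "</"
	tokModule                  // "Foo" in <Foo.bar />
	tokIdent                   // "bar" in <Foo.bar />, "foo" in <.foo />
	tokInterp                  // raw body of { .. } or <% .. %> (simplification)
	tokEOL                     // "\n"
)

type token struct {
	kind tokKind
	text string
}

func isNameByte(c byte) bool {
	return c == '_' || c == '.' ||
		(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

// componentTokens turns a tag name into component tokens; plain HTML tags yield none.
func componentTokens(name string) []token {
	if len(name) > 1 && name[0] == '.' {
		return []token{{tokIdent, name[1:]}} // local component: <.foo />
	}
	if name[0] >= 'A' && name[0] <= 'Z' {
		if dot := strings.LastIndex(name, "."); dot > 0 {
			return []token{{tokModule, name[:dot]}, {tokIdent, name[dot+1:]}}
		}
		return []token{{tokModule, name}}
	}
	return nil
}

// tokenizeHEEX is an illustrative sketch; it ignores braces inside string literals.
func tokenizeHEEX(src string) []token {
	var toks []token
	i := 0
	for i < len(src) {
		switch {
		case src[i] == '\n':
			toks = append(toks, token{tokEOL, ""})
			i++
		case strings.HasPrefix(src[i:], "<%"):
			end := strings.Index(src[i+2:], "%>")
			if end < 0 {
				return toks // unterminated tag: stop quietly
			}
			body := strings.TrimPrefix(src[i+2:i+2+end], "=")
			toks = append(toks, token{tokInterp, strings.TrimSpace(body)})
			i += 2 + end + 2
		case src[i] == '{':
			depth, j := 1, i+1
			for j < len(src) && depth > 0 {
				switch src[j] {
				case '{':
					depth++
				case '}':
					depth--
				}
				j++
			}
			if depth != 0 {
				return toks // unbalanced braces: stop quietly
			}
			toks = append(toks, token{tokInterp, strings.TrimSpace(src[i+1 : j-1])})
			i = j
		case src[i] == '<':
			kind := tokOpenTag
			i++
			if i < len(src) && src[i] == '/' {
				kind = tokCloseTag
				i++
			}
			start := i
			for i < len(src) && isNameByte(src[i]) {
				i++
			}
			if name := src[start:i]; name != "" {
				toks = append(toks, token{kind, ""})
				toks = append(toks, componentTokens(name)...)
			}
		default:
			i++ // all other HEEX contents are ignored
		}
	}
	return toks
}

func main() {
	src := "<.button phx-click={@on_click}>\n  <%= @label %>\n</.button>"
	for _, t := range tokenizeHEEX(src) {
		fmt.Printf("%d %q\n", t.kind, t.text)
	}
}
```

In this sketch a plain `<div>` still produces an open-tag token but no identifier, so a parser that treats "identifier immediately after an open or close tag" as a component call would not mistake HTML elements for function components.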
Reviewed by Cursor Bugbot for commit 713f6f3. Bugbot is set up for automated code reviews on this repo.
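
The description says TreeNode wraps methods like StartByte() and StartPosition() so that offsets are reported relative to the root tree. A minimal sketch of that kind of translation follows, assuming each nested tree records where its text begins inside its parent. The `nestedTree` and `point` types and their fields are hypothetical and are not the PR's actual types; no tree-sitter dependency is used so the example stays runnable on its own.

```go
package main

import "fmt"

// point is a zero-based row/column pair.
type point struct{ row, col uint }

// nestedTree is a hypothetical stand-in for a tree embedded in a parent tree.
// startByte and startPoint say where this tree's text begins in the parent.
// A nil parent means the tree is embedded directly in the root document.
type nestedTree struct {
	parent     *nestedTree
	startByte  uint
	startPoint point
}

// rootByte converts a byte offset local to t into a root-document offset.
func (t *nestedTree) rootByte(local uint) uint {
	for ; t != nil; t = t.parent {
		local += t.startByte
	}
	return local
}

// rootPoint converts a position local to t into a root-document position.
// The embedding column only shifts positions on the nested tree's first row;
// later rows begin at column 0 of the enclosing text.
func (t *nestedTree) rootPoint(local point) point {
	for ; t != nil; t = t.parent {
		if local.row == 0 {
			local.col += t.startPoint.col
		}
		local.row += t.startPoint.row
	}
	return local
}

func main() {
	src := "def render(assigns) do\n  ~H\"\"\"\n  <.card title={@title} />\n  \"\"\"\nend\n"

	// HEEX content starts on the line after the opening ~H""" (byte 31, row 2).
	heex := &nestedTree{startByte: 31, startPoint: point{2, 0}}
	// The expression inside { } starts 16 bytes into the HEEX content.
	expr := &nestedTree{parent: heex, startByte: 16, startPoint: point{0, 16}}

	b := expr.rootByte(0)
	p := expr.rootPoint(point{0, 0})
	fmt.Printf("byte=%d char=%q row=%d col=%d\n", b, src[b], p.row, p.col)
}
```

Walking the parent chain keeps each nested tree's offsets local to its own source slice, which is one way a three-level Elixir->HEEX->Elixir document could report positions that line up with the original file.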