LSP support for HEEX ~H sigil contents & tree-sitter-heex #75

Open
superhawk610 wants to merge 31 commits into remoteoss:main from superhawk610:feat/treesitter-heex

@superhawk610 superhawk610 commented Jun 6, 2026

Contributor

Resolves #74.

This PR extends the existing Elixir tokenizer to parse the contents of ~H sigils and resolve any function components and expressions contained within so that textDocument/definition works. This allows all common LSP functionality within ~H sigils, commonly found in Phoenix LiveView render functions and function components. It also extends the existing tree-sitter-elixir parser with tree-sitter-heex in nested sub-trees within sigils.

HEEX Parsing

TokenizeHeex outputs only a minimal subset of the HEEX contents:

  • TokModule / TokIdent for function components
  • any Elixir tokens within <% .. %> / { .. } interpolations
  • TokHEEXOpenTag / TokHEEXCloseTag for HTML < and </ tags
  • TokEOL for end-of-line

All other HEEX contents are ignored during tokenization.

The parser has also been extended to read .foo as a function call to foo() when it occurs immediately after a TokHEEXOpenTag / TokHEEXCloseTag. Function components with a module prefix, e.g. <Foo.bar />, are already handled by the existing parser.
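
As a rough illustration of the rule described above, and not the PR's actual TokenizeHeex or parser code, a minimal Go sketch could scan tag names and treat `.foo` or `Mod.foo` directly after `<` or `</` as a function component reference. The `ComponentRef` type and the `scanComponents` helper are hypothetical names made up for this example, and the sketch ignores attributes, comments, and interpolations entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// ComponentRef is a hypothetical record for this sketch; it is not a type from the PR.
type ComponentRef struct {
	Module string // empty for local components such as <.foo />
	Func   string
	Offset int // byte offset of the tag's '<' within the HEEX source
}

func isIdentByte(c byte) bool {
	return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
}

func isUpper(c byte) bool { return c >= 'A' && c <= 'Z' }

func isLowerOrUnderscore(c byte) bool { return c == '_' || (c >= 'a' && c <= 'z') }

// scanComponents reports every opening or closing tag whose name looks like
// .foo (local component) or Mod.foo (remote component). Plain HTML tags and
// <% ... %> blocks are skipped because their names contain no usable dot.
func scanComponents(src string) []ComponentRef {
	var refs []ComponentRef
	for i := 0; i < len(src); i++ {
		if src[i] != '<' {
			continue
		}
		j := i + 1
		if j < len(src) && src[j] == '/' { // closing tag, e.g. </.foo>
			j++
		}
		// Read the tag name: identifier bytes and dots.
		start := j
		for j < len(src) && (isIdentByte(src[j]) || src[j] == '.') {
			j++
		}
		name := src[start:j]
		dot := strings.LastIndexByte(name, '.')
		if dot < 0 || dot == len(name)-1 {
			continue // plain HTML tag such as <div>, or nothing after the dot
		}
		mod, fn := name[:dot], name[dot+1:]
		if mod != "" && !isUpper(mod[0]) {
			continue // module aliases start with an uppercase letter
		}
		if !isLowerOrUnderscore(fn[0]) {
			continue // function names start with a lowercase letter or underscore
		}
		refs = append(refs, ComponentRef{Module: mod, Func: fn, Offset: i})
		i = j - 1 // resume scanning after the tag name
	}
	return refs
}

func main() {
	for _, r := range scanComponents(`<div><.foo /><Foo.bar x={1} /></div>`) {
		fmt.Printf("%+v\n", r)
	}
}
```

In this sketch a local component carries an empty `Module`, which mirrors the idea of resolving `.foo` as a call to `foo()` in the enclosing module.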

Tree-sitter Integration

The existing tree-sitter tree stored by the document cache is a single tree with a single language. It's been extended with new Tree / TreeNode types that support nested trees with different languages. A typical Elixir document will now be modeled with a 3-level Elixir->HEEX->Elixir tree:

t1 := &Tree{   /* def render(assigns) do\n~H"<div class={foo()} />"\nend */
  Root: nil,
  Trunk: &rootElixirTree,
  Language: LangElixir,
  Branches: {
    NodeId: t2 := &Tree{   /* <div class={foo()} /> */
      Root: t1.TrunkNode(),
      Trunk: &nestedHeexTree,
      Language: LangHeex,
      Branches: {
        NodeId: t3 := &Tree{   /* foo() */
          Root: t2.TrunkNode(),
          Trunk: &nestedElixirTree,
          Language: LangElixir,
          Branches: nil,
        },
      },
    },
  },
}

The TreeNode type is a light wrapper around *tree_sitter.Node that also tracks which Tree the node is a member of, facilitates traversal between nested trees, and wraps common methods like StartByte(), EndByte(), StartPosition(), EndPosition(), and Utf8Text() to correctly handle byte offsets from the root tree.
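
To make the offset handling concrete, here is a small sketch of how a wrapper might translate a node's local byte range into the root document's coordinates. It assumes each nested tree records the byte offset of its source within its parent; the `Tree`, `Node`, `BaseOffset`, and `RootOffset` names are simplified, hypothetical stand-ins and do not use the real tree-sitter bindings or the PR's actual types.

```go
package main

import (
	"fmt"
	"strings"
)

// Tree is a simplified, hypothetical stand-in for a nested parse tree.
type Tree struct {
	Parent     *Tree
	BaseOffset int    // byte offset of Source within Parent.Source (0 for the root)
	Source     string // the text this tree was parsed from
}

// RootOffset converts a byte offset local to t into an offset in the root document
// by adding the base offset of every tree on the path to the root.
func (t *Tree) RootOffset(local int) int {
	for cur := t; cur != nil; cur = cur.Parent {
		local += cur.BaseOffset
	}
	return local
}

// Node pairs a local byte range with the tree that owns it.
type Node struct {
	Tree       *Tree
	Start, End int // local to Tree.Source
}

func (n Node) StartByte() int { return n.Tree.RootOffset(n.Start) }
func (n Node) EndByte() int   { return n.Tree.RootOffset(n.End) }
func (n Node) Text() string   { return n.Tree.Source[n.Start:n.End] }

// buildExample models the Elixir->HEEX->Elixir nesting from the description
// and returns the root document plus a node covering "foo" in the innermost tree.
func buildExample() (string, Node) {
	doc := "def render(assigns) do\n~H\"<div class={foo()} />\"\nend"
	heexSrc := `<div class={foo()} />`
	exSrc := "foo()"

	elixir := &Tree{Source: doc}
	heex := &Tree{Parent: elixir, BaseOffset: strings.Index(doc, heexSrc), Source: heexSrc}
	inner := &Tree{Parent: heex, BaseOffset: strings.Index(heexSrc, exSrc), Source: exSrc}
	return doc, Node{Tree: inner, Start: 0, End: 3}
}

func main() {
	doc, n := buildExample()
	// The local text and the slice of the root document should agree.
	fmt.Println(n.Text(), doc[n.StartByte():n.EndByte()])
}
```

The point of the design is that callers such as LSP handlers only ever see root-relative positions, so they do not need to know which nested language a node came from.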


Note

Medium Risk
Touches core document parsing, caching, and all tree-sitter-driven LSP paths; scope is large but covered by new HEEX and tree tests and an index version bump.

Overview
Adds Phoenix HEEX support inside ~H sigils so LSP navigation (go-to-definition, references, hover, completion, rename, highlights) works on LiveView templates, not only plain Elixir.

The tokenizer now lexes ~H bodies via TokenizeHeex (components like <.foo />, <Foo.bar />, {...} / <% %> interpolations) and maps <.name after HEEX open tags to function calls in the reference indexer. ExpressionAtCursor gains HEEX-aware cursor context tests.

Tree-sitter is refactored from a single Elixir parse to a nested treesitter.Tree: Elixir trunk with HEEX branches on ~H quoted_content and Elixir branches on HEEX expression_value, using tree-sitter-heex. DocumentStore keeps per-language parsers and caches this composite tree; LSP handlers call methods on *treesitter.Tree (e.g. FindVariableOccurrences) so positions resolve across nested languages.

Index version is bumped to 13 for the parser/index changes.

Reviewed by Cursor Bugbot for commit 713f6f3.

@JesseHerrick

Member

Hey @superhawk610, nice work on this! However, I would suggest that we take a slightly different approach. I think that we should index HEEX (both inside sigils and heex files) as well using the tokenizer and parser. My reasoning:

  • We plan on getting rid of tree-sitter at some point. It adds quite a bit of complexity.
  • In this current approach, go to definition works but go to references wouldn't. We also gain other features by indexing these files.

I realize that this is more complex to build up front, but I think the end result would be well worth it.

@superhawk610

Contributor Author

I'm working on indexing HEEX with the tokenizer/parser so it's incorporated with the existing process. I'm not confident I can write a full tokenizer for HEEX's grammar, so I'm going with a hybrid approach for now that still shells out to tree-sitter-heex. I'm hoping that once this first step is done, it will provide most of the plumbing and we'll just need a tokenizer for HEEX.


@superhawk610 superhawk610 left a comment

Contributor Author

At this point go-to definition works, but find all references doesn't since the tree-sitter parsing in variables.go treats HEEX sigil contents as opaque. I've taken a pass at a possible approach through this using nested HEEX tree-sitter trees, but this has grown quite a bit in scope. I think a good next step would be to think this through a bit more thoroughly at a high level and plan the implementation more explicitly. What do you think?

@JesseHerrick

Member

> I'm working on indexing HEEX with the tokenizer/parser so it's incorporated with the existing process. I'm not confident I can write a full tokenizer for HEEX's grammar, so I'm going with a hybrid approach for now that still shells out to tree-sitter-heex. I'm hoping that once this first step is done, it will provide most of the plumbing and we'll just need a tokenizer for HEEX.

I can take a stab at this part if you'd like - I can't promise a specific timeline though. Unfortunately, we can't ship something that shells out to tree-sitter during indexing. It's extremely important that we don't have performance regressions during indexing. There's too much overhead in calling out to tree-sitter. Remote's codebase has >57k files to parse, so we need indexing to be as fast as possible.

@superhawk610

superhawk610 commented Jun 7, 2026

Contributor Author

No expectations at all on timeline! I've gotten this to handle what I need (go-to definition within HEEX sigils), and I don't mind just building from my fork if this ends up being too much work or too niche. Totally agree on avoiding performance regressions, Dexter's speed is its best selling point amongst my peers! 😎

Please feel free to contribute whatever you'd like to this branch, take over or use some/any of this PR, or close it out if it's not in the cards.

@JesseHerrick

Member

I'll see if I can take a stab at it on this PR sometime this week and we can ship it together. In the meantime, feel free to keep shipping things to this branch and I'll add stuff when I can.

@superhawk610

superhawk610 commented Jun 9, 2026

Contributor Author

I started on a barebones HEEX tokenizer. The line between tokenizer and parser is a bit blurry and this may lean a bit far into parsing. My aim is to get a simple replacement for tree-sitter-heex that can tokenize at minimum: TokModule, TokIdent, TokDot, and recursively tokenize interpolated expressions. If this seems to be moving in the right direction, I may have some more time later in the week to continue.

@superhawk610

superhawk610 commented Jun 10, 2026

Contributor Author

OK, HEEX tokenizer is now on par with the tree-sitter-heex approach, though lacking some edge-case coverage.

  • scanInterpolation / scanUntil don't handle early occurrences of the terminator, e.g. <div class={"{}"}> will terminate at the first } rather than the second
  • HEEX special forms e.g. <%= for, <%= case, etc. aren't parsed correctly
  • malformed HTML can probably get the tokenizer stuck in an infinite loop; I added FuzzTokenizeHeex, which caught a couple of degenerate cases, then ran it for 10 more minutes without catching anything else
  • find all references still doesn't work, as the treesitter module needs to be updated to traverse the new heex nested sub-tree
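
For the first limitation in the list above, one possible direction is to track brace depth and skip over string literals while scanning for the closing `}`. The sketch below is only an illustration of that idea, not the PR's scanInterpolation / scanUntil; `scanBalanced` is a hypothetical helper, and it deliberately handles only double-quoted strings (no charlists, sigils, heredocs, or `#{}` interpolation inside strings).

```go
package main

import "fmt"

// scanBalanced returns the index of the '}' that closes the '{' at src[open],
// or -1 if open does not point at '{' or no matching '}' exists.
// Braces inside double-quoted strings are ignored, and nested braces are counted.
func scanBalanced(src string, open int) int {
	if open < 0 || open >= len(src) || src[open] != '{' {
		return -1
	}
	depth := 0
	inString := false
	for i := open; i < len(src); i++ {
		c := src[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped byte, e.g. \" inside a string
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1 // unterminated interpolation
}

func main() {
	src := `<div class={"{}"}>`
	end := scanBalanced(src, 11)
	fmt.Println(end, src[11:end+1])
}
```

Returning -1 for unterminated input, rather than looping or guessing, also keeps a scanner like this easy to exercise from a fuzz test.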

@superhawk610 changed the title from "LSP textDocument/definition support for HEEX ~H sigil via tree-sitter-heex" to "LSP support for HEEX ~H sigil contents & tree-sitter-heex" on Jun 11, 2026
- handle parseElixir / parseHeex failure
- don't double-count line offset
- use NewTreeWithParsers in DocumentStore
- prevent server startup if parsers unavailable
- emit TokSigil for empty ~H sigils
- simplify break to loop condition
Parsing HEEX sigils will introduce new references.
@superhawk610 superhawk610 force-pushed the feat/treesitter-heex branch from 4c40ca7 to 429f6bd Compare June 11, 2026 19:14

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 429f6bd.

@superhawk610

Contributor Author

Alright @JesseHerrick, I've updated the approach per your request. HEEX files and sigil contents are now fully indexed, and the existing tree-sitter-elixir tree recursively parses nested HEEX/Elixir sub-trees. TokenizeHeex aims to parse only as much of the HEEX contents as is necessary to find relevant Elixir function calls and interpolations.

Let me know what you think!

@JesseHerrick

Member

Excellent @superhawk610! I will take a look as soon as I can.

Development

Successfully merging this pull request may close these issues.

LSP textDocument/definition support for HEEX ~H sigils

2 participants