Implement opt-in change indexes for dense components. #23519

Draft
pcwalton wants to merge 1 commit into bevyengine:main from pcwalton:hierarchical-change-ticks
@pcwalton pcwalton commented Mar 25, 2026

This summary (like the rest of the PR) is a work in progress.

Overview

Currently, for queries that use Added and/or Changed query filters, the Bevy ECS must examine every component of every entity that matches the archetypes in question. Because core systems like rendering, transforms, and visibility calculation rely heavily on Added/Changed query filters, this adds up to a significant bottleneck when scaling to millions of entities. With the significant effort in 0.19 to scale to mega-worlds (1 million entities or more), the performance of Changed has become the largest blocker to achieving high scalability. The goal is to be competitive with Unity DOTS and its megacity demo, which has approximately 4.5 million mesh instances and modifies about 5,000 transforms per frame. Without some method of accelerating Added and Changed, such as the one in this PR, I don't believe that goal is feasible for Bevy.

To solve this issue, this commit adds change indexes, which are an opt-in acceleration method for dense components. A change index is a table of summaries, one for each page of rows within a table. The number of consecutive rows that constitute a page is known as the page size, and, through measurement, I found 256 to be a reasonable conservative value. Each summary consists of the most recent change tick across all the indexed components within that archetype. When iterating through a query (either sequentially or in parallel), if the query cannot match an entity unless Added<C> or Changed<C> is true for some indexed component C, then the query engine uses the summary to skip entire pages' worth of entities.
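The page-skipping idea can be sketched in isolation. The following is a toy model, not the code in this PR: the `IndexedTicks` type and its methods are hypothetical names, ticks are plain `u32` values, and tick wraparound is ignored.

```rust
/// Hypothetical page size; the description above reports 256 as a reasonable value.
const PAGE_SIZE: usize = 256;

/// Toy model: per-row change ticks plus one summary tick per page of rows.
struct IndexedTicks {
    /// "Last changed" tick for each row.
    row_ticks: Vec<u32>,
    /// Most recent change tick among the rows of each page.
    page_ticks: Vec<u32>,
}

impl IndexedTicks {
    fn new(rows: usize) -> Self {
        Self {
            row_ticks: vec![0; rows],
            page_ticks: vec![0; rows.div_ceil(PAGE_SIZE)],
        }
    }

    /// Records a change to `row` at `tick` and keeps the page summary up to date.
    fn mark_changed(&mut self, row: usize, tick: u32) {
        self.row_ticks[row] = tick;
        let page = row / PAGE_SIZE;
        self.page_ticks[page] = self.page_ticks[page].max(tick);
    }

    /// Returns the rows changed after `last_run`, plus the number of rows inspected.
    fn changed_since(&self, last_run: u32) -> (Vec<usize>, usize) {
        let mut changed = Vec::new();
        let mut inspected = 0;
        for (page, &summary) in self.page_ticks.iter().enumerate() {
            if summary <= last_run {
                continue; // No row in this page is newer than `last_run`: skip it.
            }
            let start = page * PAGE_SIZE;
            let end = (start + PAGE_SIZE).min(self.row_ticks.len());
            for row in start..end {
                inspected += 1;
                if self.row_ticks[row] > last_run {
                    changed.push(row);
                }
            }
        }
        (changed, inspected)
    }
}
```

With 1,000 rows and two changed rows in different pages, this sketch inspects 512 rows instead of 1,000. Applying the same arithmetic to the figures quoted in the overview (about 4.5 million instances and about 5,000 changes per frame), and assuming for illustration that all instances share one table, a 256-row page size gives roughly 17,600 pages, of which at most 5,000 can be dirty in a frame.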

Adding the #[component(change = "indexed")] attribute to a component enables indexing for that component. Because indexing adds overhead to Mut<T> among other operations, indexing is opt-in instead of opt-out. It's possible to determine statically, at compile time, whether a component is indexed, and the plan to ensure that Mut<T> doesn't regress relies on this.
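One way such a compile-time decision could look is an associated constant on the component trait. This is a sketch under assumptions, not the mechanism in this PR: the `Component` trait below is a stand-in rather than Bevy's trait, and `CHANGE_INDEXED` and `MutHandle` are hypothetical names.

```rust
/// Stand-in for a component trait; NOT Bevy's actual `Component`.
trait Component {
    /// Hypothetical flag: must writes to this component update a page summary?
    const CHANGE_INDEXED: bool = false;
}

struct Health(u32);
impl Component for Health {} // Default: not indexed.

struct Position(f32);
impl Component for Position {
    // What an opt-in attribute might expand to (assumption).
    const CHANGE_INDEXED: bool = true;
}

/// Toy mutable handle that records a row tick and, for indexed components
/// only, a page summary tick.
struct MutHandle<'a, T: Component> {
    value: &'a mut T,
    row_tick: &'a mut u32,
    page_tick: &'a mut u32,
    this_run: u32,
}

impl<'a, T: Component> MutHandle<'a, T> {
    fn set(&mut self, new_value: T) {
        *self.value = new_value;
        *self.row_tick = self.this_run;
        // `T::CHANGE_INDEXED` is a constant, so after monomorphization the
        // compiler can remove this branch for components that did not opt in.
        if T::CHANGE_INDEXED {
            *self.page_tick = (*self.page_tick).max(self.this_run);
        }
    }
}
```

In this sketch a write to `Health` touches only its row tick, while a write to `Position` also bumps the page tick.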

Alternate approaches

There are several alternate approaches that I experimented with. My experience with each one was as follows:

Per-column change indexes

My initial attempt stored change indexes on each column rather than on each archetype. This provided more specificity: the query acceleration could take into account only the change ticks for the components in the query filter rather than all indexed components on the archetype. The downside was that it severely impacted the performance of extract_meshes_for_gpu_building, which has the following query:

fn extract_meshes_for_gpu_building(
    ...,
    changed_meshes_query: Extract<
        Query<
            GpuMeshExtractionQuery,
            Or<(
                Changed<ViewVisibility>,
                Changed<GlobalTransform>,
                Changed<PreviousGlobalTransform>,
                Changed<Lightmap>,
                Changed<Aabb>,
                Changed<Mesh3d>,
                Changed<MeshTag>,
                (
                    Changed<NoFrustumCulling>,
                    Changed<NotShadowReceiver>,
                    Changed<TransmittedShadowReceiver>,
                    Changed<NotShadowCaster>,
                    Changed<NoAutomaticBatching>,
                    Changed<NoCpuCulling>,
                ),
                Changed<VisibilityRange>,
                Changed<SkinnedMesh>,
            )>,
        >,
    ...
)

This is 15 different components that had to be checked and is responsible for one of the bottlenecks. In fact, being able to consolidate all of these components into a single check is one of the major motivations for change indexes to begin with.
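The difference between the two layouts can be illustrated with a toy comparison. The function names and the `Vec`-based storage are assumptions made for the sketch, not the PR's data structures.

```rust
/// Per-column layout: one summary tick per (column, page).
/// An `Or<(Changed<A>, Changed<B>, ...)>` filter needs one comparison per column.
fn page_may_match_per_column(column_page_ticks: &[Vec<u32>], page: usize, last_run: u32) -> bool {
    column_page_ticks.iter().any(|col| col[page] > last_run)
}

/// Per-table layout: one summary tick per page, shared by all indexed columns.
/// The same filter needs a single comparison.
fn page_may_match_per_table(table_page_ticks: &[u32], page: usize, last_run: u32) -> bool {
    table_page_ticks[page] > last_run
}

/// Folds per-column summaries into the per-table form by taking the maximum
/// tick across columns for each page.
fn fold_to_table(column_page_ticks: &[Vec<u32>]) -> Vec<u32> {
    let pages = column_page_ticks.first().map_or(0, Vec::len);
    (0..pages)
        .map(|p| column_page_ticks.iter().map(|col| col[p]).max().unwrap_or(0))
        .collect()
}
```

When the filter is an `Or` over every indexed column, both checks give the same answer. The per-table form is more conservative only when the table also has indexed columns that the filter does not mention.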

Per-archetype change indexes

I also experimented with change indexes stored on the archetype instead of on the table. The advantage of storing the index on the archetype would be that sparse sets and tables are handled identically. Unfortunately, this ballooned complexity quite a bit and led to a lot of incorrect behavior. The biggest sticking point that I could see was that, in order to produce a Mut<T> with a pointer to the change index, a pointer to the change index needs to be stored in the Fetch. But that's incompatible with how query iteration for dense components works: for dense components, queries iterate over tables, not over components.

Benchmarks

many_cubes

My primary interest is in scaling to worlds with millions of entities. A pure benchmark of scalability in this area is many_cubes --instance-count 4000000 --no-cpu-culling. (Four million cubes is the maximum before the transform-and-cull shader runs into wgpu workgroup limits, and CPU culling must be disabled in order to meaningfully scale to that level.) The results are as follows:

many_cubes --instance-count 4000000 --no-cpu-culling, main:
19.34 median ms/frame, 52 FPS
Screenshot 2026-03-25 190232

many_cubes --instance-count 4000000 --no-cpu-culling, this PR:
14.49 median ms/frame, 69 FPS
Screenshot 2026-03-25 191007

The extract_mesh_materials system, the bottleneck during the extraction phase, goes from median 4.58 ms/frame to 0.0238 ms/frame, a 192x speedup:
Screenshot 2026-03-25 192220

(Please note that batch_and_prepare_binned_render_phase, write_work_item_buffers, and write_indirect_parameters_buffers are all addressed by #23481 and followups to it, so the overall speedups from change indexes won't be limited by Amdahl's Law the way they are now.)

bevy_city

In bevy_city, 12,442 entities out of 46,717 change every frame. This is not a workload that change indexes significantly improve, because the time spent actually doing the work that must happen on change dwarfs the time spent checking the filter for static meshes. Nevertheless, it's useful to show that change indexes don't regress bevy_city. Note that bevy_city is GPU bound, so the total frame times don't really indicate anything related to this PR.

bevy_city with no CPU culling on meshes, main:
Median frame time 26.9 ms (37 FPS)
Screenshot 2026-03-25 195502

bevy_city with no CPU culling on meshes, this PR:
Median frame time 27.8 ms (36 FPS)
Screenshot 2026-03-25 195648

extract_meshes_for_gpu_building comparison between this PR (yellow) and main (red). Median time is 2.03 ms in both cases.
Screenshot 2026-03-25 195738

Addition and removal

| Benchmark | main | This PR |
| --- | --- | --- |
| add_remove/table | 1.0494 ms | 1.0840 ms |
| add_remove/sparse_set | 820.38 µs | 762.14 µs |
| add_remove_big/table | 1.9357 ms | 2.0195 ms |
| add_remove_big/sparse_set | 826.59 µs | 828.18 µs |
| add_remove_very_big/table | 61.148 ms | 60.296 ms |

Change detection

| Test | main | This PR |
| --- | --- | --- |
| all_added_detection/5000_entities_ecs::change_detection::Table | 5µs 587ns | 6µs 519ns |
| all_added_detection/5000_entities_ecs::change_detection::Sparse | 6µs 727ns | 6µs 523ns |
| all_added_detection/50000_entities_ecs::change_detection::Table | 57µs 28ns | 64µs 783ns |
| all_added_detection/50000_entities_ecs::change_detection::Sparse | 67µs 399ns | 67µs 32ns |
| all_changed_detection/5000_entities_ecs::change_detection::Table | 6µs 684ns | 7µs 796ns |
| all_changed_detection/5000_entities_ecs::change_detection::Sparse | 6µs 923ns | 11µs 786ns |
| all_changed_detection/50000_entities_ecs::change_detection::Table | 66µs 57ns | 121µs 793ns |
| all_changed_detection/50000_entities_ecs::change_detection::Sparse | 68µs 780ns | 115µs 138ns |
| few_changed_detection/5000_entities_ecs::change_detection::Table | 2µs 19ns | 5µs 766ns |
| few_changed_detection/5000_entities_ecs::change_detection::Sparse | 4µs 307ns | 8µs 27ns |
| few_changed_detection/50000_entities_ecs::change_detection::Table | 40µs 489ns | 52µs 935ns |
| few_changed_detection/50000_entities_ecs::change_detection::Sparse | 83µs 41ns | 82µs 157ns |
| none_changed_detection/5000_entities_ecs::change_detection::Table | 1µs 346ns | 3µs 886ns |
| none_changed_detection/5000_entities_ecs::change_detection::Sparse | 3µs 922ns | 3µs 984ns |
| none_changed_detection/50000_entities_ecs::change_detection::Table | 14µs 238ns | 38µs 329ns |
| none_changed_detection/50000_entities_ecs::change_detection::Sparse | 39µs 562ns | 39µs 621ns |
| multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Table | 66ns | 62ns |
| multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Sparse | 80ns | 81ns |
| multiple_archetypes_none_changed_detection/5_archetypes_100_entities_ecs::change_detection::Table | 242ns | 383ns |
| multiple_archetypes_none_changed_detection/5_archetypes_100_entities_ecs::change_detection::Sparse | 492ns | 488ns |
| multiple_archetypes_none_changed_detection/5_archetypes_1000_entities_ecs::change_detection::Table | 1µs 537ns | 3µs 964ns |
| multiple_archetypes_none_changed_detection/5_archetypes_1000_entities_ecs::change_detection::Sparse | 4µs 432ns | 4µs 541ns |
| multiple_archetypes_none_changed_detection/5_archetypes_10000_entities_ecs::change_detection::Table | 15µs 416ns | 38µs 575ns |
| multiple_archetypes_none_changed_detection/5_archetypes_10000_entities_ecs::change_detection::Sparse | 45µs 476ns | 47µs 493ns |
| multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Table | 220ns | 216ns |
| multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Sparse | 265ns | 267ns |
| multiple_archetypes_none_changed_detection/20_archetypes_100_entities_ecs::change_detection::Table | 962ns | 1µs 684ns |
| multiple_archetypes_none_changed_detection/20_archetypes_100_entities_ecs::change_detection::Sparse | 1µs 945ns | 1µs 997ns |
| multiple_archetypes_none_changed_detection/20_archetypes_1000_entities_ecs::change_detection::Table | 6µs 537ns | 16µs 38ns |
| multiple_archetypes_none_changed_detection/20_archetypes_1000_entities_ecs::change_detection::Sparse | 18µs 632ns | 19µs 271ns |
| multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Table | 68µs 266ns | 159µs 500ns |
| multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Spars... | 264µs 850ns | 271µs 500ns |
| multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Table | 1µs 209ns | 1µs 132ns |
| multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Sparse | 1µs 396ns | 1µs 430ns |
| multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Table | 5µs 927ns | 9µs 263ns |
| multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Sparse | 12µs 204ns | 12µs 420ns |
| multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Table | 52µs 475ns | 89µs 500ns |
| multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Spars... | 152µs 187ns | 152µs 637ns |
| multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Tabl... | 380µs 300ns | 823µs 50ns |
| multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Spar... | 1ms 326µs 950ns | 1ms 367µs 850ns |

Future work

These benchmark numbers shouldn't be considered the upper limit of what is possible with change indexes. The remaining systems in many_cubes could probably see large improvements with additional work. For instance:

  1. Systems such as visibility::calculate_bounds and mark_meshes_as_changed_if_their_materials_changed aren't currently eligible to use change indexes because they use AssetChanged, which must perform a full table scan. However, by introducing a resource that stores a bidirectional index between Mesh and Material assets and the entities that use them, the AssetChanged query filter could be dropped, and these systems could be migrated to only use Added/Changed, making them eligible for change indexes.

  2. Some systems such as reset_view_visibility could be migrated to use change indexes and be eliminated from the profile.
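The bidirectional index mentioned in item 1 could take roughly the following shape. This is only a sketch of the data structure: `AssetUsers`, `EntityId`, and `AssetId` are hypothetical stand-ins rather than Bevy types, and wiring it into a resource and keeping it in sync with the world is left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-ins for Bevy's entity and asset identifiers.
type EntityId = u32;
type AssetId = u32;

/// Two maps kept in sync: asset -> entities using it, and entity -> its asset.
#[derive(Default)]
struct AssetUsers {
    users_of: HashMap<AssetId, HashSet<EntityId>>,
    asset_of: HashMap<EntityId, AssetId>,
}

impl AssetUsers {
    /// Points `entity` at `asset`, unlinking it from any asset it used before.
    fn set(&mut self, entity: EntityId, asset: AssetId) {
        if let Some(old) = self.asset_of.insert(entity, asset) {
            if old != asset {
                let now_empty = match self.users_of.get_mut(&old) {
                    Some(set) => {
                        set.remove(&entity);
                        set.is_empty()
                    }
                    None => false,
                };
                if now_empty {
                    self.users_of.remove(&old);
                }
            }
        }
        self.users_of.entry(asset).or_default().insert(entity);
    }

    /// Entities that would need marking as changed when `asset` is modified,
    /// returned in ascending order.
    fn entities_using(&self, asset: AssetId) -> Vec<EntityId> {
        let mut out: Vec<EntityId> = self
            .users_of
            .get(&asset)
            .into_iter()
            .flatten()
            .copied()
            .collect();
        out.sort_unstable();
        out
    }
}
```

With a structure like this, a modified asset maps directly to the entities to mark as changed, so no table scan is needed to find them.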

Ultimately, the goal is for the CPU time to approach zero for meshes that don't change from frame to frame, and to have efficient handling for meshes that do.

@alice-i-cecile alice-i-cecile added C-Feature A new feature, making something new possible A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times X-Needs-SME This type of work requires an SME to approve it. labels Mar 25, 2026
@alice-i-cecile alice-i-cecile self-assigned this Mar 25, 2026
@github-project-automation github-project-automation bot moved this to Needs SME Triage in ECS Mar 25, 2026
@alice-i-cecile alice-i-cecile added M-Release-Note Work that should be called out in the blog due to impact S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Mar 25, 2026
@ecoskey ecoskey self-requested a review March 25, 2026 22:06
@ElliottjPierce ElliottjPierce left a comment

I want to come back to this and do a full review later, but here are some quick thoughts:

  • This needs a lot more docs to explain what the structure of this even is. I'll do more review when there's more here. Trying to put this together, I think what's going on here is: In addition to tracking changes for each component value, track changes for blocks/"pages" of entities in each table. There are PagesSize entities in each block and they all share the same world tick. For things that are changed often, this makes mutations slower. But for very rarely changed things, this means we can skip large sections of entities if their shared change tick is old. Am I getting that right?
  • We are going to need more docs and examples to motivate this for users. I'd love to see some benchmark results.
  • How does this perform for entities that rarely have component values changed but are frequently moved between tables? How much does this hurt spawning performance, inserts, and such? Probably well worth the cost, but still...
  • This makes Mut 8 bytes larger IIUC. This is probably the most concerning thing for me. This is still probably worth it, but this is going to hurt in some places if I had to guess.
  • This will probably improve performance for the average user. But, it will also probably make it worse for others, depending on how often they are changing things. I think it would be cool (but probably not worth trying yet) if users could customize the page size more. Maybe per component and the table just takes the largest, IDK. The more rarely a component is changed, the bigger its page size should be. Maybe even have a tool that can watch the app run and suggest ideal page sizes. Could be interesting.
  • I'd like to point out that this improves the theoretical "normal" case but it also makes the theoretical worst case worse. If exactly one entity in each page is changed, even from a different component, it will make performance worse. For example, in a game with 10 rarely changed components using this new indexing scheme, while each one of those 10 is rarely mutated, it's probably pretty common for one of them to be mutated on an entity with all 10. On the whole, this technically makes querying less efficient the more components an entity has, which is not ideal. But it's probably not a huge issue in practice. This could be fixed by moving this indexing scheme to the columns, but that may have other drawbacks. Thoughts on this?
  • We need better names here than Default and Indexed. Maybe Individual, and PessimisticallyPaged, and later we could add None? IDK, but "indexed" isn't very informative IMO, but this is a small thing.
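The worst case raised in the list above (one changed entity per page) can be made concrete with a small model. The `rows_inspected` function is a hypothetical helper written for illustration. It counts only the rows a paged filter would still have to examine and does not model the cost of maintaining the summaries.

```rust
use std::collections::HashSet;

/// Counts the rows a paged filter must still inspect: every row of every page
/// that contains at least one changed row.
fn rows_inspected(total_rows: usize, page_size: usize, changed_rows: &[usize]) -> usize {
    let dirty_pages: HashSet<usize> = changed_rows.iter().map(|r| r / page_size).collect();
    dirty_pages
        .iter()
        .map(|&p| {
            let start = p * page_size;
            // The last page may be shorter than `page_size`.
            (start + page_size).min(total_rows) - start
        })
        .sum()
}
```

For 1,024 rows and a page size of 256, four changes clustered in one page leave 256 rows to inspect, while four changes spread one per page leave all 1,024.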
