Commit 3cf98a6

Enable partial bindless on Metal and reduce bind group overhead (#18149) (#23436)
Make BUFFER_BINDING_ARRAY conditional on whether the material uses buffer binding arrays. Fix sampler limit to check array element count instead of binding slot count. Only create binding arrays for resource types the material actually uses, reducing overhead on all platforms.

## Objective

Fixes #18149

Metal supports `TEXTURE_BINDING_ARRAY` but not `BUFFER_BINDING_ARRAY`. Bindless was disabled entirely on Metal because `bindless_supported()` required both unconditionally.

## Solution

1. **Conditional feature check**: Only require `BUFFER_BINDING_ARRAY` when the material actually uses buffer binding arrays. Materials using `#[data(...)]`, textures, and samplers (like `StandardMaterial`) only need `TEXTURE_BINDING_ARRAY`.
2. **Fix sampler limit check**: Use `max_binding_array_sampler_elements_per_shader_stage` (array element count) instead of `max_samplers_per_shader_stage` (binding slot count).
3. **Only create needed binding arrays**: `create_bindless_bind_group_layout_entries` now skips resource types the material doesn't use. This stays within Metal's 31 argument buffer slot limit and reduces wasted fallback resources on all platforms.

## Testing

Bistro Exterior (698 materials), 5-minute runs:

| GPU | Avg FPS (before → after) | Min FPS (before → after) | RAM/VRAM |
|-----|--------------------------|--------------------------|----------|
| Apple M2 Max (Metal) | 115 → 136 **(+18%)** | 60 → 106 **(+77%)** | -57 MB RAM |
| NVIDIA 5060 Ti | 118 → 217 **(+84%)** | 60 → 165 **(+174%)** | Same |
| Intel i360P | 25 → 29 **(+15%)** | Same | Same |
| AMD Vega 8 / Ryzen 4800U | 25 → 25 | Same | **-88 MB VRAM** |
| Intel Iris XE | ~22 → ~22 | Same | No regression |

Also tested: `3d_scene`, `pbr`, `lighting`, `transmission`, `deferred_rendering`: all pass with zero errors. Materials using `#[uniform(..., binding_array(...))]` correctly fall back to non-bindless on Metal.
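For readers who want to check the percentage figures in the testing table, the relative change is `(after - before) / before`. The following is a small sketch of that arithmetic; the helper name `percent_change` is illustrative and not part of the PR.

```rust
/// Percentage change from `before` to `after`, rounded to the nearest integer.
/// Hypothetical helper for checking the table; not part of the commit.
fn percent_change(before: f64, after: f64) -> i64 {
    ((after - before) / before * 100.0).round() as i64
}
```

Applied to the rounded FPS values, `percent_change(115.0, 136.0)` gives 18, `percent_change(60.0, 106.0)` gives 77, and `percent_change(118.0, 217.0)` gives 84, matching the table. The NVIDIA minimum (60 → 165) and the Intel average (25 → 29) recompute to 175 and 16, one point off the reported values, presumably because the published FPS figures are themselves rounded.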
1 parent 862ba07 commit 3cf98a6

4 files changed

Lines changed: 136 additions & 72 deletions

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: Partial Bindless on Metal and Reduced Bind Group Overhead
authors: ["@holg"]
pull_requests: [23436]
---

Bindless rendering was previously disabled entirely on Metal (macOS, iOS) because Bevy required both `TEXTURE_BINDING_ARRAY` and `BUFFER_BINDING_ARRAY` support unconditionally. Metal supports the former but not the latter. Since `StandardMaterial` only needs texture and sampler binding arrays, not buffer binding arrays, this requirement was unnecessarily restrictive.

`BUFFER_BINDING_ARRAY` is now only required when a material actually uses buffer binding arrays. Materials that only use `#[data(...)]`, textures, and samplers (including `StandardMaterial`) can now use the bindless path on Metal. A related fix corrects the sampler limit check to use `max_binding_array_sampler_elements_per_shader_stage` (the array element count) instead of `max_samplers_per_shader_stage` (the binding slot count).
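
The two checks described above can be sketched in isolation. This is a minimal, self-contained model rather than Bevy's generated code: the `u32` bit constants and the `Limits` struct are hypothetical stand-ins for `WgpuFeatures` and the device limits, and `bindless_supported` is written as a free function with explicit parameters.

```rust
// Hypothetical stand-ins for the wgpu feature flags; the bit values are arbitrary.
const TEXTURE_BINDING_ARRAY: u32 = 1 << 0;
const BUFFER_BINDING_ARRAY: u32 = 1 << 1;

/// Simplified device limits; only the fields used by the check are modeled.
struct Limits {
    max_storage_buffers_per_shader_stage: u32,
    max_binding_array_sampler_elements_per_shader_stage: u32,
}

/// Feature bits a material needs in order to take the bindless path.
fn required_features(has_buffer_binding_arrays: bool) -> u32 {
    if has_buffer_binding_arrays {
        BUFFER_BINDING_ARRAY | TEXTURE_BINDING_ARRAY
    } else {
        TEXTURE_BINDING_ARRAY
    }
}

/// Returns true when the device offers every required feature and enough
/// sampler array elements for all sampler binding arrays of the material.
fn bindless_supported(
    device_features: u32,
    limits: &Limits,
    has_buffer_binding_arrays: bool,
    sampler_binding_count: u32,
    bindless_count: u32,
) -> bool {
    let required = required_features(has_buffer_binding_arrays);
    (device_features & required) == required
        && limits.max_storage_buffers_per_shader_stage > 0
        // Compare against the element budget, not the number of binding slots:
        // each sampler array holds `bindless_count` elements.
        && limits.max_binding_array_sampler_elements_per_shader_stage
            >= sampler_binding_count * bindless_count
}
```

With a Metal-like device that reports only `TEXTURE_BINDING_ARRAY`, a material without buffer binding arrays passes the check, while one with `has_buffer_binding_arrays` set to `true` fails it and would take the non-bindless path. The limit values used to exercise the function are up to the caller and are not taken from any real device.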

Additionally, `create_bindless_bind_group_layout_entries` now only creates binding arrays for resource types the material actually uses, reducing bind group overhead and memory consumption on all platforms.
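
The filtering step can be illustrated with a reduced model. In this sketch the `ResourceType` enum, the `BINDING_NUMBERS` table, and `layout_bindings` are simplified assumptions: binding numbers 1 through 9 mirror the fixed numbers in the previous implementation, and binding 0 stands in for the bindless index table, whose real binding number is a parameter.

```rust
/// Simplified mirror of the resource kinds that can back a binding array.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ResourceType {
    SamplerFiltering,
    SamplerNonFiltering,
    SamplerComparison,
    Texture1d,
    Texture2d,
    Texture2dArray,
    Texture3d,
    TextureCube,
    TextureCubeArray,
}

/// Hypothetical table assigning one fixed binding number to each resource type.
const BINDING_NUMBERS: [(ResourceType, u32); 9] = [
    (ResourceType::SamplerFiltering, 1),
    (ResourceType::SamplerNonFiltering, 2),
    (ResourceType::SamplerComparison, 3),
    (ResourceType::Texture1d, 4),
    (ResourceType::Texture2d, 5),
    (ResourceType::Texture2dArray, 6),
    (ResourceType::Texture3d, 7),
    (ResourceType::TextureCube, 8),
    (ResourceType::TextureCubeArray, 9),
];

/// Returns the binding numbers that would receive a layout entry.
/// The index table (binding 0 in this model) is always present; binding
/// arrays are added only for resource types listed in `used`.
fn layout_bindings(used: &[ResourceType]) -> Vec<u32> {
    let mut bindings = vec![0];
    for &(resource_type, binding) in BINDING_NUMBERS.iter() {
        if used.contains(&resource_type) {
            bindings.push(binding);
        }
    }
    bindings
}
```

For a material that uses only filtering samplers and 2D textures, `layout_bindings(&[ResourceType::SamplerFiltering, ResourceType::Texture2d])` yields three entries instead of ten, which is the kind of reduction that keeps the layout within a tight slot budget.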

## Performance

Benchmarked on Bistro Exterior (698 materials), 5-minute runs:

| GPU | Avg FPS improvement | Min FPS improvement | Memory |
| --- | --- | --- | --- |
| Apple M2 Max (Metal) | +18% | +77% | −57 MB RAM |
| NVIDIA 5060 Ti | +84% | +174% | Same |
| Intel i360P | +15% | Same | Same |
| AMD Vega 8 / Ryzen 4800U | Same | Same | −88 MB VRAM |
| Intel Iris XE | Same | Same | No regression |

crates/bevy_pbr/src/material_bind_groups.rs

Lines changed: 20 additions & 0 deletions
@@ -1355,13 +1355,15 @@ impl MaterialBindlessSlab {
         self.create_sampler_binding_resource_arrays(
             &mut binding_resource_arrays,
             fallback_bindless_resources,
+            bindless_descriptor,
             required_binding_array_size,
         );

         // Build texture bindings.
         self.create_texture_binding_resource_arrays(
             &mut binding_resource_arrays,
             fallback_image,
+            bindless_descriptor,
             required_binding_array_size,
         );

@@ -1382,6 +1384,7 @@ impl MaterialBindlessSlab {
         &'a self,
         binding_resource_arrays: &'b mut Vec<(&'a u32, BindingResourceArray<'a>)>,
         fallback_bindless_resources: &'a FallbackBindlessResources,
+        bindless_descriptor: &'a BindlessDescriptor,
         required_binding_array_size: Option<u32>,
     ) {
         // We have one binding resource array per sampler type.
@@ -1399,6 +1402,14 @@ impl MaterialBindlessSlab {
                 &fallback_bindless_resources.comparison_sampler,
             ),
         ] {
+            // Skip resource types not used by this material.
+            if !bindless_descriptor
+                .resources
+                .contains(&bindless_resource_type)
+            {
+                continue;
+            }
+
             let mut sampler_bindings = vec![];

             match self.samplers.get(&bindless_resource_type) {
@@ -1443,6 +1454,7 @@ impl MaterialBindlessSlab {
         &'a self,
         binding_resource_arrays: &'b mut Vec<(&'a u32, BindingResourceArray<'a>)>,
         fallback_image: &'a FallbackImage,
+        bindless_descriptor: &'a BindlessDescriptor,
         required_binding_array_size: Option<u32>,
     ) {
         for (bindless_resource_type, fallback_image) in [
@@ -1459,6 +1471,14 @@ impl MaterialBindlessSlab {
                 &fallback_image.cube_array,
             ),
         ] {
+            // Skip texture types that this material doesn't use.
+            if !bindless_descriptor
+                .resources
+                .contains(&bindless_resource_type)
+            {
+                continue;
+            }
+
             let mut texture_bindings = vec![];

             let binding_number = bindless_resource_type

crates/bevy_render/macros/src/as_bind_group.rs

Lines changed: 29 additions & 3 deletions
@@ -73,6 +73,12 @@ pub fn derive_as_bind_group(ast: syn::DeriveInput) -> Result<TokenStream> {
     let mut non_bindless_binding_layouts = Vec::new();
     let mut bindless_resource_types = Vec::new();
     let mut bindless_buffer_descriptors = Vec::new();
+    // Whether this material needs buffer binding arrays (BUFFER_BINDING_ARRAY feature).
+    // False when only #[data(...)], textures, and samplers are used.
+    // NOTE: When wgpu adds support for buffer binding arrays on Metal
+    // (see https://github.com/gfx-rs/wgpu/pull/9081), this conditional
+    // can be removed and BUFFER_BINDING_ARRAY required unconditionally.
+    let mut has_buffer_binding_arrays = false;
     let mut attr_prepared_data_ident = None;
     // After the first attribute pass, this will be `None` if the object isn't
     // bindless and `Some` if it is.
@@ -225,6 +231,7 @@ pub fn derive_as_bind_group(ast: syn::DeriveInput) -> Result<TokenStream> {
                                 binding_index,
                                 quote! { #render_path::render_resource::BindlessResourceType::Buffer },
                             );
+                            has_buffer_binding_arrays = true;
                         }

                         UniformBindingAttrType::Data => {
@@ -501,6 +508,8 @@ pub fn derive_as_bind_group(ast: syn::DeriveInput) -> Result<TokenStream> {
                         bindless_resource_type,
                     );

+                    has_buffer_binding_arrays = true;
+
                     // Push the buffer descriptor.
                     bindless_buffer_descriptors.push(quote! {
                         #render_path::render_resource::BindlessBufferDescriptor {
@@ -953,16 +962,29 @@ pub fn derive_as_bind_group(ast: syn::DeriveInput) -> Result<TokenStream> {
     let (bindless_slot_count, actual_bindless_slot_count_declaration, bindless_descriptor_syntax) =
         match attr_bindless_count {
             Some(ref bindless_count) => {
+                // Only require BUFFER_BINDING_ARRAY when the material actually uses
+                // buffer binding arrays. Materials using only textures, samplers, and
+                // data buffers can use bindless without it (e.g. on Metal).
+                let required_features = if has_buffer_binding_arrays {
+                    quote! {
+                        #render_path::settings::WgpuFeatures::BUFFER_BINDING_ARRAY |
+                        #render_path::settings::WgpuFeatures::TEXTURE_BINDING_ARRAY
+                    }
+                } else {
+                    quote! {
+                        #render_path::settings::WgpuFeatures::TEXTURE_BINDING_ARRAY
+                    }
+                };
+
                 let bindless_supported_syntax = quote! {
                     fn bindless_supported(
                         render_device: &#render_path::renderer::RenderDevice
                     ) -> bool {
                         render_device.features().contains(
-                            #render_path::settings::WgpuFeatures::BUFFER_BINDING_ARRAY |
-                            #render_path::settings::WgpuFeatures::TEXTURE_BINDING_ARRAY
+                            #required_features
                         ) &&
                         render_device.limits().max_storage_buffers_per_shader_stage > 0 &&
-                        render_device.limits().max_samplers_per_shader_stage >=
+                        render_device.limits().max_binding_array_sampler_elements_per_shader_stage >=
                             (#sampler_binding_count * #bindless_count_syntax)
                     }
                 };
@@ -1090,12 +1112,16 @@ pub fn derive_as_bind_group(ast: syn::DeriveInput) -> Result<TokenStream> {
                     match #actual_bindless_slot_count {
                         Some(bindless_slot_count) => {
                             let bindless_index_table_range = #bindless_index_table_range;
+                            let used_resource_types = &[
+                                #(#bindless_resource_types),*
+                            ];
                             #bind_group_layout_entries.extend(
                                 #render_path::render_resource::create_bindless_bind_group_layout_entries(
                                     bindless_index_table_range.end.0 -
                                         bindless_index_table_range.start.0,
                                     bindless_slot_count.into(),
                                     #bindless_index_table_binding_number,
+                                    used_resource_types,
                                 ).into_iter()
                             );
                             #(#bindless_binding_layouts)*;

crates/bevy_render/src/render_resource/bindless.rs

Lines changed: 64 additions & 69 deletions
@@ -220,88 +220,83 @@ pub struct BindlessIndex(pub u32);
 /// Creates the bind group layout entries common to all shaders that use
 /// bindless bind groups.
 ///
-/// `bindless_resource_count` specifies the total number of bindless resources.
-/// `bindless_slab_resource_limit` specifies the resolved
-/// [`BindlessSlabResourceLimit`] value.
+/// `used_resource_types` limits which binding arrays are created,
+/// reducing argument buffer slot usage on constrained platforms.
 pub fn create_bindless_bind_group_layout_entries(
     bindless_index_table_length: u32,
     bindless_slab_resource_limit: u32,
     bindless_index_table_binding_number: BindingNumber,
+    used_resource_types: &[BindlessResourceType],
 ) -> Vec<BindGroupLayoutEntry> {
     let bindless_slab_resource_limit =
         NonZeroU32::new(bindless_slab_resource_limit).expect("Bindless slot count must be nonzero");

-    // The maximum size of a binding array is the
-    // `bindless_slab_resource_limit`, which would occur if all of the bindless
-    // resources were of the same type. So we create our binding arrays with
-    // that size.
+    let stages = ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE;

-    vec![
-        // Start with the bindless index table, bound to binding number 0.
+    // Start the bindless index table; remaining entries are added
+    // below based on which resource types the material actually uses.
+
+    let mut entries = vec![
+        // Start with the bindless index table.
         storage_buffer_read_only_sized(
             false,
             NonZeroU64::new(bindless_index_table_length as u64 * size_of::<u32>() as u64),
         )
-        .build(
-            *bindless_index_table_binding_number,
-            ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-        ),
-        // Continue with the common bindless resource arrays.
-        sampler(SamplerBindingType::Filtering)
-            .count(bindless_slab_resource_limit)
-            .build(
-                1,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        sampler(SamplerBindingType::NonFiltering)
-            .count(bindless_slab_resource_limit)
-            .build(
-                2,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        sampler(SamplerBindingType::Comparison)
-            .count(bindless_slab_resource_limit)
-            .build(
-                3,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        texture_1d(TextureSampleType::Float { filterable: true })
-            .count(bindless_slab_resource_limit)
-            .build(
-                4,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        texture_2d(TextureSampleType::Float { filterable: true })
-            .count(bindless_slab_resource_limit)
-            .build(
-                5,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        texture_2d_array(TextureSampleType::Float { filterable: true })
-            .count(bindless_slab_resource_limit)
-            .build(
-                6,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        texture_3d(TextureSampleType::Float { filterable: true })
-            .count(bindless_slab_resource_limit)
-            .build(
-                7,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        texture_cube(TextureSampleType::Float { filterable: true })
-            .count(bindless_slab_resource_limit)
-            .build(
-                8,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-        texture_cube_array(TextureSampleType::Float { filterable: true })
-            .count(bindless_slab_resource_limit)
-            .build(
-                9,
-                ShaderStages::FRAGMENT | ShaderStages::VERTEX | ShaderStages::COMPUTE,
-            ),
-    ]
+        .build(*bindless_index_table_binding_number, stages),
+    ];
+
+    // Create binding arrays only for types that this material uses.
+    // This is important for platforms like Metal where each binding array uses a buffer slot
+    // (limited to 31 per the Metal Feature Set Tables:
+    // https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf)
+    for &(resource_type, ref binding_number) in BINDING_NUMBERS.iter() {
+        if !used_resource_types.contains(&resource_type) {
+            continue;
+        }
+        let Some(binding_type) = (match resource_type {
+            BindlessResourceType::SamplerFiltering => Some(sampler(SamplerBindingType::Filtering)),
+            BindlessResourceType::SamplerNonFiltering => {
+                Some(sampler(SamplerBindingType::NonFiltering))
+            }
+            BindlessResourceType::SamplerComparison => {
+                Some(sampler(SamplerBindingType::Comparison))
+            }
+            BindlessResourceType::Texture1d => {
+                Some(texture_1d(TextureSampleType::Float { filterable: true }))
+            }
+            BindlessResourceType::Texture2d => {
+                Some(texture_2d(TextureSampleType::Float { filterable: true }))
+            }
+            BindlessResourceType::Texture2dArray => {
+                Some(texture_2d_array(TextureSampleType::Float {
+                    filterable: true,
+                }))
+            }
+            BindlessResourceType::Texture3d => {
+                Some(texture_3d(TextureSampleType::Float { filterable: true }))
+            }
+            BindlessResourceType::TextureCube => {
+                Some(texture_cube(TextureSampleType::Float { filterable: true }))
+            }
+            BindlessResourceType::TextureCubeArray => {
+                Some(texture_cube_array(TextureSampleType::Float {
+                    filterable: true,
+                }))
+            }
+            BindlessResourceType::None
+            | BindlessResourceType::Buffer
+            | BindlessResourceType::DataBuffer => None,
+        }) else {
+            continue;
+        };
+        entries.push(
+            binding_type
+                .count(bindless_slab_resource_limit)
+                .build(**binding_number, stages),
+        );
+    }
+
+    entries
 }

 impl BindlessSlabResourceLimit {
