Add Betsy to speed up BC6 compression #91535

BlueCube3310 · 2024-05-03T20:28:44Z

Excerpt of #91150.

Implements the Betsy GPU texture compressor to drastically speed up BC6 compression, which is essential for compressed baked lightmaps.

Performance (compression ONLY):
Build - Production 64 bit,
CPU - Ryzen 9 5900x,
GPU - RTX 4060 TI

| Image | CVTT | Betsy |
| --- | --- | --- |
| Symmetrical garden 8k .hdr with mipmaps | 92.4s | 595ms |
| Cobblestone Street Night .hdr 4k with mipmaps | 26.5s | 362ms |
| Laufenurg Church 8k .hdr with mipmaps | 99.3s | 522ms |
| Little Paris 8k .hdr with mipmaps | 92.7s | 467ms |

TODO:

- optimize the setup (shader compilation, RGB->RGBA conversion), since that takes up most of the time,
- fix signed compression (when importing a layered texture, some slices are signed and others aren't, so the combined texture breaks).

clayjohn · 2024-05-03T21:07:47Z

This is going to help immensely with our lightmapper! Currently the lightmapper doesn't compress its results because compression is so slow. But with this, we can finally re-enable compression by default.

BlueCube3310 · 2024-05-04T12:15:45Z

In terms of performance, a layered 4096x2048x8 image takes ~11 seconds to import on an RTX 4060TI.

Update: After optimizing RGB->RGBA conversion and disabling the forced auto mipmap generation for layered textures (is there any reason why it was enabled in the first place?), the compression time goes down to ~7 seconds, and that's still mostly CPU bound. I'm hoping to bring it down to around 2 to 3 seconds.
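For readers unfamiliar with the RGB->RGBA step mentioned above, here is a minimal sketch of what such a conversion does: it pads tightly packed three-channel float pixels with a constant alpha so the data can be uploaded as a four-channel texture. The function name `rgb_to_rgba` and the use of Python's `array` module are illustrative assumptions, not code from this PR (the actual conversion lives in the C++ module).

```python
from array import array

def rgb_to_rgba(rgb, alpha=1.0):
    """Expand tightly packed RGB floats (an array('f')) to RGBA with a constant alpha."""
    if len(rgb) % 3 != 0:
        raise ValueError("expected a multiple of 3 floats")
    pixel_count = len(rgb) // 3
    # Start with every component set to alpha, then overwrite R, G and B.
    rgba = array("f", [alpha]) * (pixel_count * 4)
    for channel in range(3):
        # Strided copy: every 3rd source float goes to every 4th destination float.
        rgba[channel::4] = rgb[channel::3]
    return rgba

pixels = array("f", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # two RGB pixels
print(list(rgb_to_rgba(pixels)))
```

Doing the copy one channel at a time with strided slices avoids a per-pixel loop, which is the same idea as optimizing this step on the CPU side.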

BlueCube3310 · 2024-05-04T12:47:40Z

Signed compression is now supported, though because the sign detection happens on individual layers, directional lightmaps break when recombined. The same issue happens in master.

fire · 2024-05-04T17:24:56Z

Is there a way to make BC6 sign detection work either on individual layers or on the entire set at once, without too much research time?

It seems like we need to implement a batch system?

BlueCube3310 · 2024-05-04T17:41:04Z

This is tricky since each layer is compressed as an independent image, and as such one layer isn't aware of the other ones. IMO the layered/3d/cube texture import pipeline needs a light rework as the current approach also causes other problems due to this issue.
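To illustrate the problem described above, here is a small sketch contrasting per-layer sign detection with a decision made once for the whole set. The function names and the "signed"/"unsigned" strings are hypothetical and only stand in for the signed and unsigned BC6H variants; this is not how the import pipeline is structured in the PR.

```python
def has_negative(layer):
    """Return True if any value in the layer is below zero."""
    return any(value < 0.0 for value in layer)

def formats_per_layer(layers):
    """Current behaviour as described: each layer decides on its own."""
    return ["signed" if has_negative(layer) else "unsigned" for layer in layers]

def format_for_set(layers):
    """Batch alternative: one decision shared by every layer."""
    return "signed" if any(has_negative(layer) for layer in layers) else "unsigned"

# Two toy layers: only the second contains a negative value.
layers = [[0.1, 0.2, 0.3], [-0.5, 0.3, 0.0]]
print(formats_per_layer(layers))  # mixed result, which breaks the combined texture
print(format_for_set(layers))     # single result for all layers
```

The batch variant needs access to all layers before any of them is compressed, which is why it implies a change to how layered textures are imported.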

BlueCube3310 · 2024-05-06T11:02:03Z

Did some benchmarks, the difference in performance is incredible (up to 200 times faster on some large images).

Calinou

Tested locally, it works as expected.

Benchmark

PC specifications

CPU: Intel Core i9-13900K
GPU: NVIDIA GeForce RTX 4090
RAM: 64 GB (2×32 GB DDR5-5800 C30)
SSD: Solidigm P44 Pro 2 TB
OS: Linux (Fedora 39)

Using a Linux optimized (optimize=speed lto=full) editor build.

| Image | CVTT | Betsy |
| --- | --- | --- |
| French garden 8k with mipmaps | 52.3s | 4.2s |
| German town at night 4k with mipmaps | 14.1s | 1.1s |
| Church at starry night 8k with mipmaps | 56.7s | 4.1s |

This is on average a speedup of 13.1×.
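As a quick check of that figure, the sketch below recomputes the speedup from the three rows above in two ways: as the ratio of total times and as the mean of the per-image ratios. The variable names are illustrative; the timings are copied from the table.

```python
# (CVTT seconds, Betsy seconds) for the three benchmark images above.
timings = [(52.3, 4.2), (14.1, 1.1), (56.7, 4.1)]

# Ratio of total import time: weights large images more heavily.
total_cvtt = sum(cvtt for cvtt, _ in timings)
total_betsy = sum(betsy for _, betsy in timings)
aggregate_speedup = total_cvtt / total_betsy

# Mean of the per-image ratios: every image counts equally.
per_image = [cvtt / betsy for cvtt, betsy in timings]
mean_speedup = sum(per_image) / len(per_image)

print(f"aggregate: {aggregate_speedup:.1f}x, mean of ratios: {mean_speedup:.1f}x")
```

The 13.1× figure matches the ratio of total times; the mean of the per-image ratios comes out marginally lower.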

It's strange how my import speeds are quite a bit slower than @BlueCube3310's though, considering my GPU is significantly faster. I measured the times by starting the editor with --verbose, which prints the time taken after every reimport. The texture was assigned to a StandardMaterial3D so it was detected as used in 3D, which enables VRAM compression and mipmaps.

Output quality seems pretty much identical – I can't spot any differences.

Impact on stripped binary size:

115,089,512  godot.linuxbsd.editor.x86_64
115,118,184  godot.linuxbsd.editor.x86_64.betsy-bc6h

This is only about 28 KiB (28,672 bytes) and only affects editor builds, so it's pretty reasonable.

Great work 🙂

PS: hdri-haven.com is a typosquatting site, use hdrihaven.com (which redirects to polyhaven.com) instead.

BlueCube3310 · 2024-05-06T17:02:16Z

> It's strange how my import speeds are quite a bit slower than @BlueCube3310's though, considering my GPU is significantly faster.

My results only contain the time it takes to execute the compress function, and not the whole import process (decoding, mipmap generation, etc.). On my machine the whole process took ~14s with Betsy.

Edit: Both CVTT and Betsy now print the compression time with verbose.

> PS: hdri-haven.com is a typosquatting site, use hdrihaven.com (which redirects to polyhaven.com) instead.

Redirected the sources to polyhaven.

fire · 2024-05-07T14:48:58Z

I was thinking about this TODO item:

> fix signed compression (when importing a layered texture, some slices are signed and others aren't, so the combined texture breaks).

Can we disable the signed path or resolve it some other way, or do we have to implement batched sign detection properly?

BlueCube3310 · 2024-05-09T11:32:41Z

The GPU compressor can now be toggled from the project settings and the compressor works in rendering modes other than Forward+.

BlueCube3310 · 2024-05-09T16:04:55Z

So I tried compressing a directional lightmap with Betsy and noticed some weird artifacts:

I thought this was a compressor-specific issue, so I tried CVTT (the one used by Godot):

I have no idea why this happens. It could be a format limitation, or simply a shared issue between the compressors. Either way, directional lightmaps should probably remain uncompressed.

jcostello · 2024-05-09T16:41:59Z

Ohh, that is a shame. Directional lightmaps are so heavy. Maybe for the moment you can use compression on non-directional lightmaps until the issue is solved.

fire · 2024-05-09T17:38:14Z

Can you compress something else that has signed floats for comparison, and compare against some stable signed float BC6 encoder?

BlueCube3310 · 2024-05-09T18:54:13Z

> Can you compress something else that has signed floats for comparison, and compare against some stable signed float BC6 encoder?

Left: Microsoft's DirectXTex,
Middle: CVTT,
Right: Betsy

So it looks like it's an issue with the compressors.

fire · 2024-05-09T19:14:25Z

Is there a quality knob we can maximize for Betsy before we try too hard on debugging?

There is a "quality" knob mentioned here https://github.com/knarkowicz/GPURealTimeBC6H?tab=readme-ov-file#gpurealtimebc6h

fire · 2024-05-09T19:18:06Z

modules/betsy/bc6h.glsl

+#include "UavCrossPlatform_piece_all.glsl"
+
+#VERSION_DEFINES
+#define QUALITY


@BlueCube3310 Here is the quality define.

Disabling the quality setting gets rid of the artifacts, but it also removes the negative values. Similarly, DirectXTex also seems to clamp the values to 0.

@BlueCube3310 What GPU and OS are you using? Are you able to try other GPUs?

Sometimes the artifacts are caused by genuine driver bugs.

I'm on Windows 10 64bit, I've tested an RTX 4060TI and a GTX 1660TI on the latest drivers, both exhibit the artifacting.

akien-mga · 2024-05-10T07:44:44Z

CC @darksylinc, FYI. Finally putting Betsy to good use in Godot!

fire · 2024-05-10T10:47:36Z

modules/betsy/bc6h.glsl

+	EncodeP1(block, blockMSLE, texels);
+#ifdef QUALITY
+	for (uint i = 0u; i < 32u; ++i) {
+		EncodeP2Pattern(block, blockMSLE, i, texels);
+	}
+#endif


Suggested change

-	EncodeP1(block, blockMSLE, texels);
-#ifdef QUALITY
-	for (uint i = 0u; i < 32u; ++i) {
-		EncodeP2Pattern(block, blockMSLE, i, texels);
-	}
-#endif
+	EncodeP1(block, blockMSLE, texels);
+#ifdef QUALITY
+	for (uint i = 0u; i < 64u; ++i) {
+		EncodeP2Pattern(block, blockMSLE, i, texels);
+	}
+#endif

I wonder if I'll cause a driver hang if I double the search space for an encoding value. What's your setup to output the three images for bc6?

I'm using this image as a source: testout.zip.

For CVTT I disable the GPU compressor from the settings, for Betsy I leave it enabled, and for DirectXTex I use a custom program to convert the EXR to DDS with BC6SF compression.

Edit:
It looks like NVIDIA Texture Tools can preserve the negative data without artifacts:

I have to go now, so I'll stop poking at it, but one approach is to search more BC6 block modes.

BlueCube3310 · 2024-05-12T09:09:24Z

It looks like ASTC doesn't handle negative values either.

Calinou · 2024-05-12T20:57:29Z

Could we bias directional lightmaps when baking (and reverse this biasing in the shader) so that we don't need the texture to use signed compression? This would make them ASTC-friendly as well.

Doing so will impact precision somewhat, so I assume we'll need to find a reasonable bias value that is flexible enough for very bright lights that you might use in a real world project (e.g. the equivalent of Color(10, 10, 10)).

darksylinc · 2024-05-12T22:04:42Z

> Could we bias directional lightmaps when baking (and reverse this biasing in the shader) so that we don't need the texture to use signed compression? This would make them ASTC-friendly as well.
>
> Doing so will impact precision somewhat, so I assume we'll need to find a reasonable bias value that is flexible enough for very bright lights that you might use in a real world project (e.g. the equivalent of Color(10, 10, 10)).

Yes. But several biasing strategies would have to be tested to see which one provides overall the best precision.

e.g.

val = val * 0.5f + 0.5f; // Mapping [-1; 1] -> [0; 1]
val = 1.0f - (val * 0.5f + 0.5f); // Mapping [-1; 1] -> [1; 0]
val = exp( val ); // Or exp2
val = exp( 1 - val );
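
As a rough illustration of the first strategy listed above (not code from this PR), the sketch below biases a signed value into a non-negative range, stores it as a 16-bit float, and reverses the bias afterwards. The `max_abs` parameter and all function names are assumptions made for the example; real BC6H block compression would add further loss on top of the half-float rounding shown here.

```python
import struct

def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def encode(value, max_abs=1.0):
    """Map [-max_abs; max_abs] -> [0; 1] so no sign bit is needed."""
    return value / max_abs * 0.5 + 0.5

def decode(stored, max_abs=1.0):
    """Reverse the bias, as the shader would when sampling."""
    return (stored - 0.5) * 2.0 * max_abs

for original in (-1.0, -0.25, 0.0, 0.5, 1.0):
    stored = to_half(encode(original))
    restored = decode(stored)
    print(f"{original:+.2f} -> stored {stored:.4f} -> restored {restored:+.4f}")
```

A larger `max_abs` leaves room for brighter values but spreads the same number of representable steps over a wider range, which is the precision trade-off mentioned above.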

fire · 2024-05-13T17:40:04Z

Been playing around with a python script and chatgpt4.

import numpy as np

def ulp(x):
    return np.nextafter(x, np.inf) - x

def map_minus_one_to_one(val):
    return val * 0.5 + 0.5  # Mapping [-1; 1] -> [0; 1]

def map_one_to_zero(val):
    return 1.0 - (val * 0.5 + 0.5)  # Mapping [-1; 1] -> [1; 0]

def map_exp(val):
    return np.exp(val)

def map_exp2(val):
    return np.exp2(val)

def map_exp_inverse(val):
    return np.exp(1 - val)

def map_abs(val):
    return np.abs(val)

def map_half_plus_half(val):
    return 0.5 * val + 0.5

def map_cos(val):
    return (np.cos(val) + 1) / 2

def map_sin(val):
    return (np.sin(val) + 1) / 2

def map_tanh(val):
    return (np.tanh(val) + 1) / 2

values = np.linspace(-1, 1, 1000)
mappings = {'map_minus_one_to_one': map_minus_one_to_one, 
            'map_one_to_zero': map_one_to_zero, 
            'map_exp': map_exp, 
            'map_exp2': map_exp2, 
            'map_exp_inverse': map_exp_inverse,
            'map_abs': map_abs,
            'map_half_plus_half': map_half_plus_half,
            'map_cos': map_cos,
            'map_sin': map_sin,
            'map_tanh': map_tanh
            }

for name, mapping in mappings.items():
    mapped_values = mapping(values)
    ulps = ulp(mapped_values)
    print(f"Mapping {name}:")
    print(f"Min ULP: {np.min(ulps)}")
    print(f"Max ULP: {np.max(ulps)}")
    print(f"Mean ULP: {np.mean(ulps)}")
    print()

Clipped the results.

darksylinc · 2024-05-13T18:16:48Z

That's not exactly what I meant (although it's not bad, and it's close in spirit).

What I meant is that the average use case is going to use certain ranges more than others; thus the testing would need to generate several lightmaps and compare using real world data.
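
One way to structure such a comparison is sketched below: it measures the mean half-float round-trip error of each candidate mapping over a set of sample values. The `samples` list here is synthetic placeholder data clustered near zero, used only so the example runs; a real test would substitute texels taken from baked lightmaps. All names are illustrative assumptions.

```python
import math
import struct

def to_half(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Candidate (encode, decode) pairs for values in [-1; 1].
MAPPINGS = {
    "linear": (lambda v: v * 0.5 + 0.5, lambda s: (s - 0.5) * 2.0),
    "exp": (math.exp, math.log),
}

def mean_abs_error(samples, encode, decode):
    """Average absolute error after encode -> half-float rounding -> decode."""
    errors = [abs(decode(to_half(encode(v))) - v) for v in samples]
    return sum(errors) / len(errors)

# Placeholder distribution: cubing concentrates values around zero.
samples = [((i / 499.0) * 2.0 - 1.0) ** 3 for i in range(500)]

results = {
    name: mean_abs_error(samples, encode, decode)
    for name, (encode, decode) in MAPPINGS.items()
}
for name, error in results.items():
    print(f"{name}: mean abs error {error:.2e}")
```

Because the ranking depends on where the samples are concentrated, swapping in real lightmap data can change which mapping comes out ahead.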

lyuma · 2024-05-13T19:44:44Z

My understanding is BC6H and ASTC HDR allow negative values.
I don't think we should change the format of directional lightmaps. We should debug the Betsy shader code.

We should double check it isn't a shader optimizer issue. For example, in HLSL, (some versions of?) fxc / DX11 will assume that reads from `Texture2D<float4>` return positive values. Something like `float4 x = tex2D(...); if (x.x < 0) { ... }` will be optimized out, or `float foo = max(0.0, tex2D(...).x)` will be optimized to `float foo = tex2D(...).x`.
Maybe GLSL optimizers have a similar issue.

clayjohn · 2024-05-14T19:31:01Z

Once this is added, we should look into compressing ReflectionProbes and Skys when the update mode is set to "update once"

Calinou · 2024-05-15T23:07:08Z

> Once this is added, we should look into compressing ReflectionProbes and Skys when the update mode is set to "update once"

For binary size reasons, VRAM compression is not available in export template builds (only decompression is available). I don't see a way of doing this unless we add ReflectionProbe prerendering (and saving to disk so it can be included in the PCK).

There are good arguments for including VRAM compression libraries in export templates (such as user-generated content or runtime lightmap baking), but it should be discussed in its own proposal.

Besides, the current Once update mode for reflection and sky rendering is not a "true" Once update mode as described in godotengine/godot-proposals#2934. This means that if a ReflectionProbe with the Once update mode is moving continuously, VRAM compression would need to be performed every frame, which is too slow.

Doing this for sky shaders that don't use TIME is an interesting idea, but it would also limit the resolution of the sky "texture" as you'd no longer have a sky with true per-pixel rendering. Instead, you'd be looking at a rasterized version of the sky shader.

BlueCube3310 · 2024-05-17T16:57:51Z

> My understanding is BC6H and ASTC hdr allow negative values. I don't think we should change the format of dir lightmaps. We should debug the betsy shader code.

That is true (for BC6 at least, not sure about ASTC), although we would also have to modify astcenc, since it clamps the values to 0. The required modifications for both astcenc and Betsy will also take significantly more time.

darksylinc · 2024-05-17T17:00:59Z

Hi.

BC6H supports negative values, but it is betsy's encoder which does not.

ASTC has two variants: LDR & HDR. The LDR variant does not support negative values. I don't know about the HDR variant. However most GPUs do not support the HDR variant.

BlueCube3310 marked this pull request as ready for review May 3, 2024 20:44

BlueCube3310 requested review from a team as code owners May 3, 2024 20:44

fire reviewed May 3, 2024

thirdparty/README.md (outdated)

fire requested a review from a team May 3, 2024 22:12

Sauermann added the enhancement, feature proposal, and topic:import labels May 3, 2024

BlueCube3310 force-pushed the betsy-bc6h branch from e2ef58c to d141f9b May 4, 2024 12:13

BlueCube3310 force-pushed the betsy-bc6h branch from d141f9b to b0c4cf1 May 4, 2024 12:45

Calinou mentioned this pull request May 5, 2024

Expose method LightmapGI.bake() in editor and exported projects godotengine/godot-proposals#8656 (open)

BlueCube3310 force-pushed the betsy-bc6h branch from b0c4cf1 to 4270b5c May 5, 2024 12:03

Chaosus added this to the 4.x milestone May 5, 2024

BlueCube3310 force-pushed the betsy-bc6h branch from 4270b5c to d684d41 May 6, 2024 10:05

fire reviewed May 6, 2024

modules/betsy/SCsub (outdated)

fire reviewed May 6, 2024

modules/betsy/SCsub (outdated)

fire reviewed May 6, 2024

modules/betsy/image_compress_betsy.cpp (outdated)

Calinou approved these changes May 6, 2024

AThousandShips approved these changes May 6, 2024

core/io/image.cpp (outdated)

modules/betsy/image_compress_betsy.cpp (outdated)

modules/betsy/bc6h.glsl (outdated)

BlueCube3310 force-pushed the betsy-bc6h branch 3 times, most recently from 35cee57 to b567ad0 May 7, 2024 13:14

BlueCube3310 force-pushed the betsy-bc6h branch from 36bd295 to eceae51 May 9, 2024 11:30

BlueCube3310 requested review from a team as code owners May 9, 2024 11:30

BlueCube3310 force-pushed the betsy-bc6h branch from eceae51 to 0fd6b09 May 9, 2024 12:35

fire reviewed May 9, 2024

modules/betsy/image_compress_betsy.cpp

fire reviewed May 10, 2024

Calinou mentioned this pull request May 16, 2024

FBX import takes forever (more than 2 minutes for single object) #85074 (open)

BlueCube3310 force-pushed the betsy-bc6h branch from 0fd6b09 to 8d3c266 May 16, 2024 19:10

Add Betsy to speed up BC6 compression

fdd3d74

BlueCube3310 force-pushed the betsy-bc6h branch from 8d3c266 to fdd3d74 May 16, 2024 19:17

akien-mga modified the milestones: 4.x, 4.4 May 17, 2024
