Voxel Performance

Voxel Problem
This project began when I wanted to make creatures, items, and landscapes out of voxels. I liked the idea of voxels because they can be used to build up and break down objects in an intuitive way. The problems were speed, memory use, and lack of standardization.

I knew I'd have to write my own voxel package, so I did a brief survey of some free and paid Assets for Unity. Most of them dodged any responsibility for performance, and were suited to using a few tens of thousands of voxels in a game scene the way you might use particle effects. I wanted to use voxels as a core game mechanic.

Even with bloated classes used for individual voxels, modern PCs have so much RAM that memory use is almost not a problem. The biggest problem was speed.

Analysis
Profiling showed that the conversion of a voxel chunk into a mesh of triangles took up over 90% of runtime. Creating large voxel maps quickly was a simple matter of reusing calls to the Perlin noise function, and once converted to meshes a voxel volume performs much as any other object.

Converting chunks can easily be done in parallel, but with eight threads performance was still less than satisfactory. Using a geometry shader to display voxels more directly was no help and began to lag seriously with volumes of only a few million voxels. Sending only surface voxels to the shader handed the problem back to CPU threads. CPU performance is adequate for updates but not for the initial massive volumes required for random map generation.

Solution
Using a series of Compute Shader kernels to produce maps and compress them into an array of visible voxels turned out to be a satisfactory solution. A Geometry Shader can accept a GPU buffer that contains the array to avoid any need to copy to and from main system RAM before the initial display.
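The compression step can be illustrated on the CPU. What follows is a minimal Python sketch, not the asset's Compute Shader code. It assumes that a voxel counts as visible when it is solid and at least one of its six face neighbors is empty or outside the chunk, and it assumes a packing of x, y, z, and the voxel value into one 32-bit entry. The function names and the packing layout are hypothetical; the packing only works for chunks up to 256 on a side with one byte per voxel.

```python
# CPU reference sketch of "compress a chunk into an array of visible voxels".
# Assumptions (not taken from the asset): a voxel is visible when it is solid
# and at least one face neighbor is empty or lies outside the chunk; each
# visible voxel is packed as x | y << 8 | z << 16 | value << 24.

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def is_solid(chunk, size, x, y, z):
    """Return True if (x, y, z) is inside the chunk and holds a nonzero voxel."""
    if not (0 <= x < size and 0 <= y < size and 0 <= z < size):
        return False
    return chunk[(z * size + y) * size + x] != 0


def visible_voxels(chunk, size):
    """Return packed 32-bit entries for every solid voxel with an exposed face."""
    packed = []
    for z in range(size):
        for y in range(size):
            for x in range(size):
                value = chunk[(z * size + y) * size + x]
                if value == 0:
                    continue
                exposed = any(
                    not is_solid(chunk, size, x + dx, y + dy, z + dz)
                    for dx, dy, dz in NEIGHBORS
                )
                if exposed:
                    packed.append(x | (y << 8) | (z << 16) | (value << 24))
    return packed


def unpack(entry):
    """Split a packed entry back into (x, y, z, value)."""
    return entry & 0xFF, (entry >> 8) & 0xFF, (entry >> 16) & 0xFF, entry >> 24


# A solid 4 x 4 x 4 cube: only the 2 x 2 x 2 interior is hidden.
SIZE = 4
cube = bytearray([1]) * (SIZE ** 3)
surface = visible_voxels(cube, SIZE)
print(len(cube), len(surface))
```

On the GPU the same test would run once per voxel in parallel, with the append step replaced by some form of parallel compaction; the loop above only shows what ends up in the output array.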

In the Unity Asset I wrote to implement and test this process, I settled upon cube voxel chunks 256 on each side that use one byte per voxel. Sending a chunk to or from the GPU would take less than a second on a newer PCIe bus, but that is still far too slow. With map generation and initial display wholly on the graphics card, the CPU is freed for other scene setup and raw voxel data can be transferred as needed (if at all).

Performance
The speed of the Compute and Geometry Shaders will depend on hardware, but an onboard GPU takes about a third of a second per chunk. Converting to a mesh takes a little longer and uses more memory. Converting to Unity TerrainData is very fast, but only stores a height map.

The roughly sixteen million voxels (256 x 256 x 256) in a chunk take up 16 MB. The surface voxels ready for display typically take 512 KB (128K x 4 bytes) per chunk.
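The arithmetic behind these figures is easy to check. This short Python sketch uses only the numbers quoted above; the variable names are mine, and the 128K surface count is the typical value from the text rather than a measured result.

```python
# Memory arithmetic for one chunk, using the figures quoted in the text.
CHUNK_SIDE = 256             # voxels per edge
BYTES_PER_VOXEL = 1          # raw chunk storage
SURFACE_VOXELS = 128 * 1024  # "128K", the typical surface count quoted above
BYTES_PER_SURFACE_VOXEL = 4

voxels_per_chunk = CHUNK_SIDE ** 3
raw_bytes = voxels_per_chunk * BYTES_PER_VOXEL
surface_bytes = SURFACE_VOXELS * BYTES_PER_SURFACE_VOXEL

print(voxels_per_chunk)                     # voxels in one chunk
print(raw_bytes // (1024 * 1024), "MiB")    # raw chunk size
print(surface_bytes // 1024, "KiB")         # display-ready buffer size
print(raw_bytes // surface_bytes)           # ratio of raw to display data
```

With these numbers the display-ready buffer is 32 times smaller than the raw chunk.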

Level Five Menger Sponge

Displaying 3.2 million voxels.

Stress Testing
The Menger sponge is a fractal that is sometimes used for stress testing voxel libraries because it has few hidden voxels. That is, most of its voxels have at least one face that is visible from some angle. At level five it has a total volume of 14.3 million cells (243 x 243 x 243), of which 3.2 million are solid voxels.
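The voxel counts follow from the construction of the sponge: each level divides a cube into 3 x 3 x 3 parts and keeps 20 of them. Here is a minimal Python sketch of the standard base-3 membership test, written for illustration and not taken from the asset. The function names are hypothetical, and the brute-force count is only run at small levels because pure Python is slow for 14.3 million cells.

```python
def in_menger_sponge(x, y, z, level):
    """Return True if cell (x, y, z) is solid in a sponge of side 3 ** level."""
    for _ in range(level):
        # A cell is removed if, at any base-3 digit, two or more of its
        # coordinates have the digit 1 (a face center or the cube center).
        ones = (x % 3 == 1) + (y % 3 == 1) + (z % 3 == 1)
        if ones >= 2:
            return False
        x, y, z = x // 3, y // 3, z // 3
    return True


def count_solid(level):
    """Count solid cells by brute force (practical only for small levels)."""
    side = 3 ** level
    return sum(
        in_menger_sponge(x, y, z, level)
        for z in range(side)
        for y in range(side)
        for x in range(side)
    )


LEVEL = 5
side = 3 ** LEVEL
print(side)          # edge length in voxels
print(side ** 3)     # total cells in the bounding cube
print(20 ** LEVEL)   # solid voxels, from the 20-of-27 rule

# Brute-force check of the 20 ** level rule at a small level.
print(count_solid(2), 20 ** 2)
```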

Using an onboard GPU the FPS varies from 5 to 9 as the camera moves around and through the object.

Flying Through Menger Sponge

Flying Around Terrain

Demonstrations
The terrain demonstration video shows a grid of 16 chunks with all data generated at runtime, and near the end you can see the entire 1024x1024 map. Even one 256x256 voxel area could be a game scene, but with this asset much larger scenes can be generated and kept active with minimal impact on CPU and main RAM.
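To make the chunk grid concrete, here is a small Python sketch of how a map position could be mapped to a chunk and a position inside it. It assumes the 16 chunks are laid out as a 4 x 4 grid (which is what a 1024x1024 map of 256-wide chunks implies) and that chunks are indexed row by row; the indexing scheme and the function name are hypothetical, not the asset's API.

```python
CHUNK_SIDE = 256   # voxels per chunk edge
GRID_SIDE = 4      # assumed 4 x 4 layout, 16 chunks in total
MAP_SIDE = CHUNK_SIDE * GRID_SIDE


def locate(world_x, world_z):
    """Map a map column to (chunk_index, local_x, local_z)."""
    if not (0 <= world_x < MAP_SIDE and 0 <= world_z < MAP_SIDE):
        raise ValueError("position is outside the map")
    chunk_x, local_x = divmod(world_x, CHUNK_SIDE)
    chunk_z, local_z = divmod(world_z, CHUNK_SIDE)
    # Row-major chunk index (an assumption made for this sketch).
    return chunk_z * GRID_SIDE + chunk_x, local_x, local_z


print(MAP_SIDE)
print(locate(300, 700))

# Raw voxel data for the whole grid at one byte per voxel, in MiB.
print(GRID_SIDE ** 2 * CHUNK_SIDE ** 3 // (1024 * 1024))
```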

Implementation - Geometry Shader
A Geometry Shader has some overhead. One reason is that it can produce a variable amount of output that will need to be moved into a contiguous memory block. My solution is to produce the same amount of output for each geometry function call, and to minimize branching and copying as much as possible.

Each cube face requires four vertices, and the camera can see at most three faces depending on its position relative to the voxel. This makes for three branches that draw very similar faces, except for a shift along one of the axes. The shift variable is assigned to minimize the effect of the branch and used to adjust the face position when the vertices are created.

See the full source code at:

Geometry Shader (Voxel Performance Source Code)

Geometry Shader

// For each voxel that is visible from some angle, paint the
// three sides that the given camera might see.
[maxvertexcount(12)]
void geom( point inputGS p[1], inout TriangleStream<input> triStream )
{
  float4 pos = p[0].pos * float4( _Size, _Size, _Size, 1 );
  float4 shift;
  float4 voxelPosition = pos + _chunkPosition;
  float halfS = _Size * 0.5; // x, y, z is the center of the voxel,
                             // paint sides offset by half of Size
  input pIn1, pIn2, pIn3, pIn4;

  // All four corners share the voxel color; only the UVs differ.
  pIn1._color = p[0]._color;  pIn1.uv = float2( 0.0f, 0.0f );
  pIn2._color = p[0]._color;  pIn2.uv = float2( 0.0f, 1.0f );
  pIn3._color = p[0]._color;  pIn3.uv = float2( 1.0f, 0.0f );
  pIn4._color = p[0]._color;  pIn4.uv = float2( 1.0f, 1.0f );

  // X-facing side: pick the face on the camera's side of the voxel.
  shift = (_cameraPosition.x < voxelPosition.x)
        ? float4( 1, 1, 1, 1 ) : float4( -1, 1, -1, 1 );
  pIn1.pos = mul( UNITY_MATRIX_VP, mul( _worldMatrixTransform,
                  pos + shift*float4( -halfS, -halfS, halfS, 0 ) ));
  triStream.Append( pIn1 );
  pIn2.pos = mul( UNITY_MATRIX_VP, mul( _worldMatrixTransform,
                  pos + shift*float4( -halfS, halfS, halfS, 0 ) ));
  triStream.Append( pIn2 );
  pIn3.pos = mul( UNITY_MATRIX_VP, mul( _worldMatrixTransform,
                  pos + shift*float4( -halfS, -halfS, -halfS, 0 ) ));
  triStream.Append( pIn3 );
  pIn4.pos = mul( UNITY_MATRIX_VP, mul( _worldMatrixTransform,
                  pos + shift*float4( -halfS, halfS, -halfS, 0 ) ));
  triStream.Append( pIn4 );
  triStream.RestartStrip();

  // ... the other two sides follow the same pattern (see the full source).
}
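The effect of the shift variable in the excerpt above can be checked on the CPU. This Python sketch mirrors only the x-facing side and leaves out the chunk offset and the matrix transforms; the function name and the tuple representation are mine, not part of the shader.

```python
def x_face_corners(center, size, camera_x):
    """Return the four corners of the x-facing side nearest the camera.

    center is the voxel center (x, y, z), already scaled and in the same
    space as camera_x. Corner order matches the shader's triangle strip.
    """
    px, py, pz = center
    half = size * 0.5
    # Same choice as the shader's ternary: keep the -x face when the camera
    # is on the -x side, otherwise flip x and z together. Flipping two axes
    # is a 180-degree rotation about y, so the strip's winding is unchanged.
    sx, sy, sz = (1, 1, 1) if camera_x < px else (-1, 1, -1)
    offsets = (
        (-half, -half, half),
        (-half, half, half),
        (-half, -half, -half),
        (-half, half, -half),
    )
    return [(px + sx * ox, py + sy * oy, pz + sz * oz) for ox, oy, oz in offsets]


near_side = x_face_corners((0.0, 0.0, 0.0), 2.0, camera_x=-10.0)
far_side = x_face_corners((0.0, 0.0, 0.0), 2.0, camera_x=10.0)
print(near_side)
print(far_side)
```

Both branches emit the same four offsets, so the only per-face cost of the branch is choosing the shift vector.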

Future Directions
I created this asset for my own use, packaging it into what I hope is a simple, performant, and reusable form. I have ideas for later versions, but I would also love to hear yours if you're willing to share them, or let me know how you're using this asset to conquer the world of voxels.

One idea I've been toying with lately is changing the pre-display optimization step to return several GPU buffers instead of just one. This would allow more specialized shaders to handle different voxel information, such as support for clouds, liquids, or a graphics change for different voxel faces.

Let me know what you think!

Related work with weather, and inspiration for this asset
