Skip to content
BigBro222's Blog
LinkedInGitHub

GPU Skinning+GPU instancing for rendering 1000+ characters

Unity development4 min read

CPU computing SkinnedMesh
Figure 1 CPU computing SkinnedMesh
GPU computing SkinnedMesh
Figure 2 GPU computing SkinnedMesh

GPU Skinning is a GPU calculation technique to free the CPU from calculating animation using Skinmesh renderer.

When developing scenes with tons of thousands of people, the most computational cost comes from

  • mesh calculated by the weight of Bones
  • the Draw Call for a large amounts of meshes.

The main concept would be using vertex shader to play the animation and use GPU instancing lower the draw call.

1. Skeleton animation

The concept of a skeleton animation is to store the bones transformation info per frame and the bone index and its weight that affects each vertex. When playing the animation, the CPU make this calculation and change the vertex position accordingly. So an animation normally contains:

  • Bone structures
  • Mesh info
  • Skin info( bone index and weights)
  • Key frame data

the world position of vertex is defined by bones

mesh vertex(defined in mesh space)——>Bone space——>world

How skeleton-based animation store data
Figure 3 How skeleton-based animation store data

if we can use vertex shader to replace the SkinnedMeshRenderer, then we can move the calculation to the GPU side which helps lower the CPU consumption. The technique is very similar to some other common shaders that we normally use, for example: flag, fish or grass.

But we still need some important information to make the character move:
  • Bone Index and Bone Weight of each vertex
  • Transformation data of bones per frame.

2.Get bone transformation data to texture

Transformation matrix
1//locate to the frame
2clip.SampleAnimation(gameObject, i / clip.frameRate);
3
4//get the bone transformation matrix
5Matrix4x4 matrix = skinnedMeshRenderer.transform.worldToLocalMatrix * bones[j].localToWorldMatrix *bindPoses[j];

after getting the bone transformation offset matrix, we can save the matrix , we can save the matrix info of each float to RGBA and finally convert to Texture2D. This way, the float can maintain precision. The process is described in the following picture.

Bone transformation info to Texture2D
Figure 4 Bone transformation info to Texture2D
1// Convert the twelve float to RGBA to save precision accuracy.
2for (var j = 0; j < bonesCount; j++)
3 {
4 var matrix = transform.worldToLocalMatrix * bones[j].localToWorldMatrix * bindPoses[j];
5
6 colors[(i * bonesCount + j) * 12 + 0] = EncodeFloatRGBA(matrix.m00);
7 colors[(i * bonesCount + j) * 12 + 1] = EncodeFloatRGBA(matrix.m01);
8 colors[(i * bonesCount + j) * 12 + 2] = EncodeFloatRGBA(matrix.m02);
9 colors[(i * bonesCount + j) * 12 + 3] = EncodeFloatRGBA(matrix.m03);
10
11 colors[(i * bonesCount + j) * 12 + 4] = EncodeFloatRGBA(matrix.m10);
12 colors[(i * bonesCount + j) * 12 + 5] = EncodeFloatRGBA(matrix.m11);
13 colors[(i * bonesCount + j) * 12 + 6] = EncodeFloatRGBA(matrix.m12);
14 colors[(i * bonesCount + j) * 12 + 7] = EncodeFloatRGBA(matrix.m13);
15
16 colors[(i * bonesCount + j) * 12 + 8] = EncodeFloatRGBA(matrix.m20);
17 colors[(i * bonesCount + j) * 12 + 9] = EncodeFloatRGBA(matrix.m21);
18 colors[(i * bonesCount + j) * 12 + 10] = EncodeFloatRGBA(matrix.m22);
19 colors[(i * bonesCount + j) * 12 + 11] = EncodeFloatRGBA(matrix.m23);
20 }

3.Mapping BoneIndex And Weight To MeshUV

Although we have transformation information for each bone per frame, there is one more thing, that is, which bones are influencing a vertex and its weight. But if you think about it carefully, you can understand that this index and weight are static data of a Mesh.

Since the Mesh is static, it is written directly into the Mesh. The most common one is of course the UV channel. UV is a Vector2 vector, so only one pair of weight index data can be stored at a time. If you have higher requirements for accuracy, you can use 2 UV channels or 4 UV channels.

1//Deep copy from original Mesh
2var bakedMesh = new Mesh();
3bakedMesh = Instantiate(mesh);
4
5//Bake the index and weight info into Mesh UV2 and UV3
6MappingBoneIndexAndWeightToMeshUV(bakedMesh, UVChannel.UV2, UVChannel.UV3);

4.Retrive info from Texture

After getting the data, its time to replace the Skinedrenderer with Shader.

BoneMartix material baked before pixel-by-pixel decoding in Shader, the current vertex is affected by those two bones(might be up to four bones based on the previous step), extract and assemble their matrix information:

1//find bone0 from BoneMatrix2Texture2D
2float total = (y * _BoneCount + (int)(index.x)) * 12;
3float4 line0 = readInBoneTex(total);
4float4 line1 = readInBoneTex(total + 4);
5float4 line2 = readInBoneTex(total + 8);
6
7//get bone0 Matrix4X4 transformation matrix
8float4x4 mat1 = float4x4(line0, line1, line2, float4(0, 0, 0, 1));
9
10//find bone1 from BoneMatrix2Texture2D
11total = (y * _BoneCount + (int)(index.y)) * 12;
12line0 = readInBoneTex(total);
13line1 = readInBoneTex(total + 4);
14line2 = readInBoneTex(total + 8);
15
16//get bone1 Matrix4X4 transformation matrix
17float4x4 mat2 = float4x4(line0, line1, line2, float4(0, 0, 0, 1));

(Vertex Shader):

1// get boneindex and weight
2float2 index = v.iuv;
3float2 weight = v.wuv;
4
5//make the vertex move
6float4 pos = mul(mat1, v.vertex) * weight.x + mul(mat2, v.vertex) * (1 - weight.x);

5. GPU Instancing

With the GPU Skinning technology, the skinning work in the CPU is transferred to the GPU for execution, and the running efficiency of the Unity scene is greatly improved. We can further optimize by introducing GPU Instancing technology.

Now, each rendering of a character requires the CPU to communicate with the GPU. The CPU will tell the GPU the Mesh, Material and position information of the character, and the GPU is responsible for rendering according to the instructions. However, the problem is obvious: because we are rendering many identical characters, each time the CPU tells the GPU much of the same information (Mesh & Material), only the position information is different. Can the CPU tell the GPU all the positions that should be rendered at one time, avoiding a lot of invalid communication? GPU Instancing should be a good fit for this senario.

GPU instancing is a draw call optimization method that renders multiple copies of a mesh with the same material in a single draw call. Each copy of the mesh is called an instance. This is useful for drawing things that appear multiple times in a scene, for example, trees or bushes.

GPU instancing renders identical meshes in the same draw call. To add variation and reduce the appearance of repetition, each instance can have different properties, such as Color or Scale. Draw calls that render multiple instances appear in the Frame Debugger as Draw Mesh (instanced).

Send data to the GPU at one time, and use a drawing function to let the rendering pipeline draw multiple identical objects using this data

With the GPU Skinning technology, the skinning work in the CPU is transferred to the GPU for execution, and the running efficiency of the Unity scene is greatly improved. We can further optimize by introducing GPU Instancing technology.

Now, each rendering of a character requires the CPU to communicate with the GPU. The CPU will tell the GPU the Mesh, Material and position information of the character, and the GPU is responsible for rendering according to the instructions. However, the problem is obvious: because we are rendering many identical characters, each time the CPU tells the GPU much of the same information (Mesh & Material), only the position information is different. Can the CPU tell the GPU all the positions that should be rendered at one time, avoiding a lot of invalid communication? GPU Instancing came into being.

By add the keyword in shader can enable the Auto GPU instancing:

1#pragma multi_compile_instancing

And we can also change Properties in the shader (for example time offset)

1Properties
2{
3 //property needs to be changed
4 _Offset("Offset", Range(0, 1)) = 0
5}
6
7···
8//Declare
9UNITY_INSTANCING_BUFFER_START(Props)
10 UNITY_DEFINE_INSTANCED_PROP(float, _Offset)
11UNITY_INSTANCING_BUFFER_END(Props)
12···
13
14//use the property
15float y = _Time.y * _FrameRate + UNITY_ACCESS_INSTANCED_PROP(Props, _Offset) * _FrameCount;

After this, by click the Enable GPU Instancing, the GPU instancing is enabled

GPUSkinning, CPUSkinning
Figure 5 Left: GPUSkinning, Right: CPUSkinning

You can find a full version of example here

6. Conclusion

By using GPU skinning and GPU instancing techniques, we can dramatically increase the number of characters can be presented inside a scene. However, because it’s based on vertex shader, we have to bake animation textures for each single one of them. And also, blending between different animations is hard to implement, which makes it hard to apply to precise animations. When dealing with detail character interactions, it’s always better to use a “normal” animator.

Rendering about one thousand people to simulate crowd stream
Figure 6 Rendering about one thousand people to simulate crowd stream

Reference

https://zentia.github.io/2018/03/23/Engine/Unity/Unity-Skinned-Mesh/

https://blog.csdn.net/weixin_30784501/article/details/98346439?spm=1001.2101.3001.6661.1&utm_medium=distribute.pc_relevant_t0.none-task-blog-2~default~CTRLIST~Rate-1-98346439-blog-100771084.pc_relevant_default&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2~default~CTRLIST~Rate-1-98346439-blog-100771084.pc_relevant_default&utm_relevant_index=1

https://zhuanlan.zhihu.com/p/76562300

https://blog.csdn.net/candycat1992/article/details/42127811

https://zhuanlan.zhihu.com/p/523702434

https://zhuanlan.zhihu.com/p/50640269

https://github.com/Minghou-Lei/GPU-Skinning-Demo

https://github.com/chengkehan/GPUSkinning

https://github.com/Unity-Technologies/Animation-Instancing

© 2023 by BigBro222's Blog. All rights reserved.