How many steps does it take to get a 3D model of a face? According to the latest findings, all you need is a smartphone: it can capture everything required for a 3D model of your face in about 3.5 minutes.
Many of you might say that a 3D model of a face is only useful in a few concrete situations, and that there are therefore not many applications for such models. But the truth is the metaverse is getting closer, and all of us will need a 3D face avatar.
Well, let’s see what the new research from Meta Reality Labs shows and how it works.
As the authors mention, the method is suitable for VR applications. Granted, the metaverse will let users appear as someone other than themselves, and maybe that is what many people want. But if not, if you want a 3D version of your own face, then approaches like this one can be quite useful.
How the 3D face modeling method works
The method is divided into three parts.
First, a super-network is trained on a large multi-view face dataset. This super-network generates person-specific avatar parameters, that is, the weights of a neural network decoder for each individual.
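To make the idea concrete, here is a minimal PyTorch sketch of a hypernetwork of this kind, not the paper’s actual architecture: an identity code (for example, derived from a phone scan) is mapped to the weights of a small person-specific decoder. All dimensions are made-up placeholders.

```python
import torch
import torch.nn as nn

ID_DIM, EXPR_DIM, HIDDEN = 256, 128, 512

class HyperNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Predicts the weight matrix and bias of one decoder layer.
        self.to_weight = nn.Linear(ID_DIM, HIDDEN * EXPR_DIM)
        self.to_bias = nn.Linear(ID_DIM, HIDDEN)

    def forward(self, id_code):
        w = self.to_weight(id_code).view(HIDDEN, EXPR_DIM)
        b = self.to_bias(id_code)
        return w, b

def decode(expr_code, w, b):
    # Person-specific decoder layer whose parameters came from the hypernetwork.
    return torch.relu(expr_code @ w.t() + b)

id_code = torch.randn(ID_DIM)       # identity, e.g. from the phone scan
expr_code = torch.randn(EXPR_DIM)   # expression to render
w, b = HyperNetwork()(id_code)
features = decode(expr_code, w, b)  # would feed the volumetric renderer
```

The design point is that the big network is trained once on the multi-view dataset, while each new user only needs a cheap forward pass to get their own decoder.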
The faces in the dataset were collected with a multi-view capture system and include facial image data from 255 participants of different ages, genders, and ethnicities.
Developed by Meta in 2019, this giant device for capturing 3D faces is equipped with 171 high-resolution cameras that record about 180 GB of data per second. A collection session takes about one hour.
It is worth mentioning that in this super-network, the basic building block of the decoder is a convolutional upsampling layer with an untied bias map. These blocks generate volumetric primitives, which are then used to render the avatar via ray marching.
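A hedged sketch of such an upsampling block with an untied (per-pixel) bias map follows; the channel counts and feature-grid sizes are placeholders, not the paper’s real configuration.

```python
import torch
import torch.nn as nn

class BiasedUpsample(nn.Module):
    def __init__(self, c_in, c_out, size):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1)
        # One learned bias per channel *and* spatial location, unlike the
        # single per-channel bias of a standard convolution.
        self.bias_map = nn.Parameter(torch.zeros(1, c_out, size, size))

    def forward(self, x):
        return torch.relu(self.up(x) + self.bias_map)

# An 8x8 feature grid becomes a 16x16 grid; stacking such blocks eventually
# yields a grid of small RGBA volumes ("volumetric primitives") to ray-march.
block = BiasedUpsample(64, 32, size=16)
out = block(torch.randn(1, 64, 8, 8))
print(out.shape)  # torch.Size([1, 32, 16, 16])
```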
In addition, the decoder architecture can disentangle gaze from other facial motion. For VR applications, this means the output of an eye-tracking system can drive the avatar’s eyes directly.
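A minimal sketch of that disentanglement idea (an assumed interface, not the paper’s code): the gaze enters the decoder as its own input, separate from the expression code, so an eye tracker can set it at runtime.

```python
import torch
import torch.nn as nn

class GazeAwareDecoder(nn.Module):
    def __init__(self, expr_dim=128, gaze_dim=3, hidden=256):
        super().__init__()
        self.expr_branch = nn.Linear(expr_dim, hidden)
        self.gaze_branch = nn.Linear(gaze_dim, hidden)  # e.g. a 3D gaze direction
        self.out = nn.Linear(hidden, hidden)

    def forward(self, expr_code, gaze_dir):
        # Gaze and expression are combined only here, keeping them separable.
        h = torch.relu(self.expr_branch(expr_code) + self.gaze_branch(gaze_dir))
        return self.out(h)

avatar_features = GazeAwareDecoder()(torch.randn(128), torch.randn(3))
```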
Second, there is lightweight facial capture: all you need is a smartphone with a depth camera. The researchers, for instance, used an iPhone 12.
The camera captures the geometry and texture of the face in each frame. The system then performs face landmark detection and portrait segmentation on the input RGB image, fits and warps a template mesh to match the detected facial landmarks, segmentation contours, and depth maps, unwraps the texture of each frame, and finally aggregates the per-frame textures into a complete face texture.
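In code, the per-frame fitting loop would look roughly like the sketch below. Every helper here (detect_landmarks, segment_portrait, fit_template_mesh, unwrap_texture, merge_textures) is a hypothetical placeholder for the corresponding stage, not a real library call.

```python
def process_scan(frames, template_mesh):
    fitted_meshes, textures = [], []
    for rgb, depth in frames:                 # iPhone RGB + depth frame pairs
        landmarks = detect_landmarks(rgb)     # 2D facial landmarks
        mask = segment_portrait(rgb)          # face/foreground segmentation
        # Deform the template so it agrees with landmarks, mask contour, depth.
        mesh = fit_template_mesh(template_mesh, landmarks, mask, depth)
        fitted_meshes.append(mesh)
        # Project this frame's pixels into the mesh's UV space.
        textures.append(unwrap_texture(rgb, mesh, mask))
    # Merge the per-frame partial textures into one complete face texture.
    full_texture = merge_textures(textures)
    return fitted_meshes, full_texture
```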
By the way, the system needs up to 65 specific expressions to further refine the model.
Finally, the 3D face avatar produced by this method not only closely matches the user’s appearance, but can also be driven and controlled through a global expression space.
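Reusing the toy HyperNetwork and decode() from the earlier sketch, here is what driving through a shared expression space means in practice: one point in that space animates any identity, so the same “smile” code transfers across avatars. The render() call is a hypothetical placeholder.

```python
import torch

smile_code = torch.randn(128)   # one point in the shared expression space
hyper = HyperNetwork()

for person_scan in [torch.randn(256), torch.randn(256)]:  # two identities
    w, b = hyper(person_scan)             # person-specific decoder parameters
    features = decode(smile_code, w, b)   # same expression on both avatars
    # render(features)  # hypothetical: ray-march the volumes to an image
```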
The entire capture process takes about 3.5 minutes. But we have to mention that the modeling itself is not real-time: processing the captured data takes hours.