Borrowed from https://github.com/Doubiiu/CodeTalker/tree/main/BIWI
Request the BIWI dataset from Biwi 3D Audiovisual Corpus of Affective Communication. Download all the compressed files, decompress them, and put the rigid_scans
, faces
and videos
into BIWI/
.
Description
14 persons, 40 English sentences, each of length 3~8 seconds. The face meshes are captured at 25 fps and the audio is captured with a sample rate of 44.1kHz. The dataset contains the following subfolders:
-
'rigid_scans' contains the sequences where the speakers just rotated their head around, with a neutral expression. In this folder there are also the personalised templates (.obj) and textures (.png) files. Each face scan has 23370 vertices.
-
'faces' contains the tracked facial geometries. In each subject's folder, there is one .tar and one .tgz file for each sequence: the .tar archives contain a .vl file for each frame, i.e., a binary file containing the 3D coordinates of each vertex in the generic face template, after the global rotation and translation were removed.
-
'videos' contains videos (.flv) of the rendered 3D geometries and original audio (sampling rate: 44.1kHz).
Data Preprocessing
Here we adopt a verbose way to preprocess the data partially using Matlab scripts, you may use your own script in Python to do the same thing with following sequential steps:
cd BIWI/
- Read the .vl files, normalize the coordinates and store them in .mat files (e.g., F1/vert/e01/frame_001.mat). Each .mat stores the 3D coordinates for one frame. (Optional) It will also out the .off files (e.g., F1/off/e01/frame_001.off) for debugging and visualization purpose. [Note: please define
targetpath
to your own path.]:
run("data_preprocess/creatDataset_BIWI.m")
- Read the template .obj files, normalize the coordinates and store them in .mat/.off files. [Note: please define
targetpath
to your own path.]:
run("data_preprocess/process_BIWI_template.m")
- Read the template .mat files, and store them in .pkl files into
templates.pkl
:
python data_preprocess/load_mat_to_template_pkl.py
- Convert the .mat files to .npy files (e.g., F2_e01.npy). Each .npy (associated with the shape of [num_frame, 23370*3]) stores the 3D coordinates for one sentence for each person and will be saved into
vertices_npy/
:
python data_preprocess/load_mat_to_multiple_vert_np.py
- Install ffmpeg, extract the audios from .flv video files and save them into
wav/
:
python data_preprocess/video2audio.py
You can obtain the preprocessed files (templates.pkl
, vertices_npy/
and wav/
) for training, and note that we have already provided the BIWI.ply
file.