Skip to content

Latest commit

 

History

History
executable file
·
48 lines (34 loc) · 2.73 KB

README.md

File metadata and controls

executable file
·
48 lines (34 loc) · 2.73 KB

Dataset

Borrowed from https://github.com/Doubiiu/CodeTalker/tree/main/BIWI

Request the BIWI dataset from Biwi 3D Audiovisual Corpus of Affective Communication. Download all the compressed files, decompress them, and put the rigid_scans, faces and videos into BIWI/.

Description

14 persons, 40 English sentences, each of length 3~8 seconds. The face meshes are captured at 25 fps and the audio is captured with a sample rate of 44.1kHz. The dataset contains the following subfolders:

  • 'rigid_scans' contains the sequences where the speakers just rotated their head around, with a neutral expression. In this folder there are also the personalised templates (.obj) and textures (.png) files. Each face scan has 23370 vertices.

  • 'faces' contains the tracked facial geometries. In each subject's folder, there is one .tar and one .tgz file for each sequence: the .tar archives contain a .vl file for each frame, i.e., a binary file containing the 3D coordinates of each vertex in the generic face template, after the global rotation and translation were removed.

  • 'videos' contains videos (.flv) of the rendered 3D geometries and original audio (sampling rate: 44.1kHz).

Data Preprocessing

Here we adopt a verbose way to preprocess the data partially using Matlab scripts, you may use your own script in Python to do the same thing with following sequential steps:

cd BIWI/
  1. Read the .vl files, normalize the coordinates and store them in .mat files (e.g., F1/vert/e01/frame_001.mat). Each .mat stores the 3D coordinates for one frame. (Optional) It will also out the .off files (e.g., F1/off/e01/frame_001.off) for debugging and visualization purpose. [Note: please define targetpath to your own path.]:
run("data_preprocess/creatDataset_BIWI.m")
  1. Read the template .obj files, normalize the coordinates and store them in .mat/.off files. [Note: please define targetpath to your own path.]:
run("data_preprocess/process_BIWI_template.m")
  1. Read the template .mat files, and store them in .pkl files into templates.pkl:
python data_preprocess/load_mat_to_template_pkl.py 
  1. Convert the .mat files to .npy files (e.g., F2_e01.npy). Each .npy (associated with the shape of [num_frame, 23370*3]) stores the 3D coordinates for one sentence for each person and will be saved into vertices_npy/:
python data_preprocess/load_mat_to_multiple_vert_np.py
  1. Install ffmpeg, extract the audios from .flv video files and save them into wav/:
python data_preprocess/video2audio.py

You can obtain the preprocessed files (templates.pkl, vertices_npy/ and wav/) for training, and note that we have already provided the BIWI.ply file.