THCHS-30 is an open Chinese speech database published by the Center for Speech and Language Technology (CSLT) at Tsinghua University. You can cite the data using the following BibTeX entry:
@misc{THCHS30_2015,
title={THCHS-30: A Free Chinese Speech Corpus},
author={Dong Wang and Xuewei Zhang and Zhiyong Zhang},
year={2015},
url={http://arxiv.org/abs/1512.01882}
}
The data was obtained from http://www.openslr.org/18/. The original .wav files were converted to .mp3 at 22 kHz. Only the data/ directory is kept. The train/, dev/, and test/ directories, which contained symlinks to data/, are not in this repository.
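
The exact tool used for the .wav to .mp3 conversion is not stated here. As a rough sketch only, assuming ffmpeg was available and that "22 kHz" means a 22050 Hz sample rate, a comparable conversion could be scripted as follows. The helper name ffmpeg_command and the file name example.wav are hypothetical and chosen for illustration.

```python
from pathlib import Path


def ffmpeg_command(wav_path, out_dir, sample_rate=22050):
    """Build an ffmpeg argument list that re-encodes one .wav file as .mp3.

    Assumption: 22050 Hz is what "22 kHz" refers to; adjust if needed.
    """
    wav_path = Path(wav_path)
    # Keep the original base name and swap the extension.
    mp3_path = Path(out_dir) / (wav_path.stem + ".mp3")
    # -i names the input file, -ar sets the output audio sample rate.
    return ["ffmpeg", "-i", str(wav_path), "-ar", str(sample_rate), str(mp3_path)]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(ffmpeg_command("data/example.wav", "mp3")))
```

To actually run the conversion, the returned list can be passed to subprocess.run(cmd, check=True), once per file in data/.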