The corpus UD_French-FQB is an automatic conversion of the French QuestionBank v1, a corpus entirely made of questions.
The original French QuestionBank is described in Hard Time Parsing Questions: Building a QuestionBank for French. It was converted to UD with the conversion system described in chapter 3 of the book Application of Graph Rewriting to Natural Language Processing and available on Inria Gitlab.
The original annotation scheme versions (phrase-structure, surface dependencies following the FTB scheme, deep syntax annotations following the Deep Sequoia scheme) are available at the following URL.
Because of the UD constraint on test set size (at least 10k tokens), we recommend simply concatenating this treebank with the Sequoia and FTB treebanks to obtain a robust, less domain-sensitive parser. These three treebanks are fully compatible and were converted by the same team.
In our own experiments, we used UD_French-FQB either in a 10-fold cross-validation scenario or in a train/dev/test scenario with sentence i in train, sentence i+1 in dev, and sentence i+2 in test.
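The two evaluation scenarios can be sketched as follows. This is a minimal illustration, not the authors' actual script. It assumes the train/dev/test assignment cycles every three sentences (one reading of the description above) and that cross-validation folds are interleaved, which the text does not specify. The function names `round_robin_split` and `k_fold_splits` are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def round_robin_split(sentences: Sequence[T]) -> Dict[str, List[T]]:
    """Put sentence i in train, i+1 in dev, i+2 in test, then repeat.

    Assumption: the pattern cycles with period 3 over the whole corpus.
    """
    names = ("train", "dev", "test")
    parts: Dict[str, List[T]] = {name: [] for name in names}
    for index, sentence in enumerate(sentences):
        parts[names[index % 3]].append(sentence)
    return parts


def k_fold_splits(
    sentences: Sequence[T], k: int = 10
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs for k-fold cross-validation.

    Assumption: fold j holds out every k-th sentence starting at index j;
    contiguous folds would be an equally valid choice.
    """
    for fold in range(k):
        test = [s for i, s in enumerate(sentences) if i % k == fold]
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        yield train, test
```

Interleaving keeps each portion a mix of the three question sources rather than a single block of one source, which is presumably why an index-based assignment is convenient for a corpus of this size.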
- sentences: 2289
- words: 23236
- Average sentence length: 10.15
- TREC 08-11: 1893 sents.
- French Government/NGOs FAQs: 196 sents.
- CLEF 03: 200 sents.
Note that the TREC domain questions are a translation of the corresponding questions in the English QuestionBank (Judge et al., 2006).
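The sentence and word counts listed above can be recomputed from the treebank's CoNLL-U files with a few lines of Python. The sketch below is an assumption about how such counts could be obtained, not the script used to produce the published figures. It counts syntactic words and skips multiword-token ranges (IDs such as `1-2`) and empty nodes (IDs such as `1.1`). The function name `conllu_stats` is hypothetical.

```python
from typing import Iterable, Tuple


def conllu_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sentences, number of syntactic words)."""
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            in_sentence = False  # a blank line closes the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        token_id = line.split("\t", 1)[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
    return sentences, words


def average_length(sentences: int, words: int) -> float:
    """Average number of words per sentence, rounded to two decimals."""
    return round(words / sentences, 2)
```

An open file object can be passed directly as `lines`. As a consistency check on the published figures, 23236 words over 2289 sentences gives an average of 10.15, and the three source counts (1893 + 196 + 200) add up to 2289 sentences.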
- contributors: Marie Candito, Bruno Guillaume, Djamé Seddah
- contact: Djamé Seddah: [email protected], Marie Candito: [email protected]
- UD maintainer: Bruno Guillaume, [email protected]
- Djamé Seddah, Marie Candito. Hard Time Parsing Questions: Building a QuestionBank for French. Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016, Portorož, Slovenia.
- Guillaume Bonfante, Bruno Guillaume, Guy Perrier. Application of Graph Rewriting to Natural Language Processing. ISTE Wiley, 1, pp.272, 2018, Logic, Linguistics and Computer Science Set, Christian Rétoré, 1786300966. ⟨hal-01814386⟩
- John Judge, Aoife Cahill, and Joseph van Genabith (2006). QuestionBank: Creating a Corpus of Parse-Annotated Questions. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), pages 497–504, Sydney, Australia.
- 2020-11-15 v2.7
  - New conversion from original treebank
- 2019-11-15 v2.5
  - Update the conversion process to improve consistency with other French treebanks:
    - expletive annotation with relations expl:subj, expl:comp and expl:pass
    - aux -> aux:tense
    - MWEPOS -> EXTPOS
- 2019-05-15 v2.4
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.4
License: LGPL-LR
Includes text: yes
Genre: nonfiction news
Lemmas: converted from manual
UPOS: converted from manual
XPOS: manual native
Features: converted from manual
Relations: converted from manual
Contributors: Seddah, Djamé; Candito, Marie; Guillaume, Bruno
Contributing: elsewhere
Contact: [email protected]
===============================================================================