This page contains the annotations related to olfactory information from the benchmark created for the ODEUROPA project.
For 7 languages we selected a pool of documents covering different time periods (from 1620 to 1925) and topics (e.g. medicine, law, literature).
The annotation was carried out with the INCEpTION annotation platform (Klie et al., 2018) following the guidelines presented in:
Tonelli, Sara and Menini, Stefano. FrameNet-like Annotation of Olfactory Information in Texts. In Proceedings of LaTeCH-CLfL 2021
For every language we provide the list of the annotated Frame Elements in WebAnno format and the related .txt files.
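
As a rough illustration of how the annotation files could be read, the sketch below parses WebAnno TSV text with the Python standard library only. It assumes the usual TSV 3 layout (header and `#Text=` lines starting with `#`, then one tab-separated row per token with ID, character offsets, token text and annotation columns). The sample string, the label names `Smell_Word` and `Smell_Source`, the function name `read_tokens` and the choice of column 3 as the label column are all hypothetical; check the `#T_SP=` header of the actual files to find which column holds the Frame Element labels.

```python
import re
from collections import Counter

# Hypothetical sample in WebAnno TSV 3 style; the real files may use
# different layer names, label names and column order.
SAMPLE = (
    "#FORMAT=WebAnno TSV 3.3\n"
    "#T_SP=webanno.custom.Olfactory|label\n"
    "\n"
    "\n"
    "#Text=The smell of roses\n"
    "1-1\t0-3\tThe\t_\n"
    "1-2\t4-9\tsmell\tSmell_Word\n"
    "1-3\t10-12\tof\tSmell_Source[1]\n"
    "1-4\t13-18\troses\tSmell_Source[1]\n"
)


def read_tokens(text, label_column=3):
    """Return (token_id, token, labels) tuples from WebAnno TSV text.

    label_column is an assumption: index of the column that holds the
    labels of interest (0 = ID, 1 = offsets, 2 = token text).
    """
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and header / sentence-text lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) <= label_column:
            continue
        raw = fields[label_column]
        if raw in ("_", "*"):
            labels = []  # no label value on this token
        else:
            # Stacked values are separated by "|"; a trailing "[n]"
            # links the tokens of one multi-token span.
            labels = [re.sub(r"\[\d+\]$", "", v) for v in raw.split("|")]
        tokens.append((fields[0], fields[2], labels))
    return tokens


tokens = read_tokens(SAMPLE)
# Counts tokens per label, not annotated spans.
label_counts = Counter(label for _, _, labels in tokens for label in labels)
print(label_counts)
```

Note that this counts tokens rather than spans, so a multi-token Frame Element contributes more than once; grouping by the bracketed index would be needed to count spans.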
All metadata for each language-specific benchmark, including author, year of publication, original source and genre, is reported in the Excel spreadsheet at: https://github.com/Odeuropa/benchmarks_and_corpora/blob/main/Documents%20in%20Benchmark.xlsx
The distribution of topics is shown in these graphs:
The temporal distribution is displayed here:
This table gives an overview of the content of the annotations. The ODEUROPA consortium partner responsible for each annotation is reported in parentheses.
Category | Dutch (KNAW) | English (FBK) | French (EURECOM) | German (KNAW) | Italian (FBK) | Slovenian (JSI) | Latin (KNAW) |
---|---|---|---|---|---|---|---|
Smell words | 1,788 | 1,530 | 845 | 2,659 | 1,254 | 1,973 | 1,199 |
Total FEs | 4,962 | 4,023 | 1,876 | 5,885 | 2,664 | 4,445 | 2,278 |
Source | 1,922 | 1,313 | 710 | 2,297 | 952 | 1,638 | 772 |
Quality | 1,071 | 1,084 | 450 | 1,730 | 707 | 936 | 552 |
Perceiver | 336 | 362 | 140 | 399 | 153 | 266 | 241 |
Circumstances | 399 | 248 | 88 | 274 | 202 | 228 | 192 |
Odour carrier | 351 | 310 | 106 | 170 | 195 | 408 | 134 |
Effect | 243 | 187 | 53 | 425 | 104 | 214 | 114 |
Evoked Odorant | 228 | 91 | 103 | 258 | 74 | 285 | 42 |
Place | 255 | 302 | 172 | 200 | 158 | 394 | 111 |
Time | 127 | 126 | 49 | 131 | 119 | 75 | 108 |
Creator | 30 | 0 | 5 | 1 | 0 | 1 | 12 |
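
The figures in the table can be cross-checked with a few lines of Python. The sketch below copies the counts from the table, sums the individual Frame Element rows per language, and compares the result with the "Total FEs" row. It also computes the average number of Frame Elements per smell word. The variable and function names are illustrative only and are not part of the benchmark.

```python
LANGUAGES = ["Dutch", "English", "French", "German", "Italian", "Slovenian", "Latin"]

# Values copied from the table above, in the order of LANGUAGES.
SMELL_WORDS = [1788, 1530, 845, 2659, 1254, 1973, 1199]
TOTAL_FES = [4962, 4023, 1876, 5885, 2664, 4445, 2278]
FE_COUNTS = {
    "Source": [1922, 1313, 710, 2297, 952, 1638, 772],
    "Quality": [1071, 1084, 450, 1730, 707, 936, 552],
    "Perceiver": [336, 362, 140, 399, 153, 266, 241],
    "Circumstances": [399, 248, 88, 274, 202, 228, 192],
    "Odour carrier": [351, 310, 106, 170, 195, 408, 134],
    "Effect": [243, 187, 53, 425, 104, 214, 114],
    "Evoked Odorant": [228, 91, 103, 258, 74, 285, 42],
    "Place": [255, 302, 172, 200, 158, 394, 111],
    "Time": [127, 126, 49, 131, 119, 75, 108],
    "Creator": [30, 0, 5, 1, 0, 1, 12],
}


def column_sums(counts):
    """Sum the per-FE rows column by column (one sum per language)."""
    return [sum(column) for column in zip(*counts.values())]


sums = column_sums(FE_COUNTS)
# Average number of annotated Frame Elements per smell word.
fes_per_smell_word = [total / words for total, words in zip(TOTAL_FES, SMELL_WORDS)]

for lang, computed, reported, ratio in zip(LANGUAGES, sums, TOTAL_FES, fes_per_smell_word):
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{lang:<10} sum={computed:>5} reported={reported:>5} {status} "
          f"FEs per smell word={ratio:.2f}")
```

For every language, the sum of the ten Frame Element rows matches the reported total.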
The full list of annotated documents is available in Documents in Benchmark.xlsx. These have been selected from a larger list of corpora available at the following links:
Language | Link |
---|---|
EN | shorturl.at/BGS14 |
IT | shorturl.at/npIL3 |
FR | https://drive.google.com/drive/folders/1wwj5zhUl5ESxmxBBslHsepbMMsyydGf_?usp=sharing |
This work has been realised in the context of Odeuropa, a research project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101004469.