Odeuropa/benchmarks_and_corpora

Benchmarks and Corpora

This page contains the annotations related to olfactory information from the benchmark created for the ODEUROPA project.

For each of 7 languages we selected a pool of documents covering different time periods (from 1620 to 1925) and topics (e.g. medicine, law, literature).

The annotation was carried out with the INCEpTION annotation platform (Klie et al., 2018) following the guidelines presented in:

Tonelli, Sara and Menini, Stefano. FrameNet-like Annotation of Olfactory Information in Texts. In Proceedings of LaTeCH-CLfL 2021.

For every language we provide the list of the annotated Frame Elements in WebAnno format and the related .txt files.
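As a starting point for working with the annotation files, the snippet below is a minimal sketch of a token-level reader. It assumes the files follow the WebAnno TSV 3 layout (header and `#Text=` lines starting with `#`, then one tab-separated line per token with a `sentence-token` id, `begin-end` character offsets, and the token text, followed by annotation columns). The names `Token` and `read_webanno_tsv`, and the labels in `SAMPLE`, are illustrative and not taken from the benchmark files. The meaning of each annotation column depends on the layer definitions in the header of each file, so the sketch returns those columns uninterpreted.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    sentence: int           # sentence number within the document
    index: int              # token number within the sentence
    begin: int              # start character offset
    end: int                # end character offset
    text: str               # token text
    annotations: List[str]  # remaining columns, left uninterpreted


def read_webanno_tsv(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per token line, skipping header and blank lines."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # header, "#Text=" line, or sentence separator
        cols = line.split("\t")
        while cols and cols[-1] == "":
            cols.pop()  # token lines often end with a trailing tab
        if len(cols) < 3:
            continue  # not a token line
        sent_tok, offsets, text = cols[:3]
        sentence, index = sent_tok.split("-")
        begin, end = offsets.split("-")
        # Sub-token ids such as "1-2.1" are mapped to their parent token index.
        yield Token(int(sentence), int(index.split(".")[0]),
                    int(begin), int(end), text, cols[3:])


# Hypothetical sample; the labels are made up for illustration only.
SAMPLE = (
    "#FORMAT=WebAnno TSV 3.2\n"
    "\n"
    "#Text=A sweet smell\n"
    "1-1\t0-1\tA\t_\t\n"
    "1-2\t2-7\tsweet\tQuality\t\n"
    "1-3\t8-13\tsmell\tSmell_Word\t\n"
)

for token in read_webanno_tsv(SAMPLE.splitlines()):
    print(token.text, token.annotations)
```

To read a file from disk, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of `SAMPLE.splitlines()`.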

All metadata for each language-specific benchmark, including author, year of publication, original source and genre, is reported in the Excel spreadsheet at: https://github.com/Odeuropa/benchmarks_and_corpora/blob/main/Documents%20in%20Benchmark.xlsx

The distribution of topics is shown in these graphs:

The temporal distribution is displayed here:

The following table gives an overview of the annotations. The ODEUROPA consortium partner responsible for each annotation is given in parentheses.

| | Dutch (KNAW) | English (FBK) | French (EURECOM) | German (KNAW) | Italian (FBK) | Slovenian (JSI) | Latin (KNAW) |
|---|---:|---:|---:|---:|---:|---:|---:|
| Smell words | 1,788 | 1,530 | 845 | 2,659 | 1,254 | 1,973 | 1,199 |
| Total FEs | 4,962 | 4,023 | 1,876 | 5,885 | 2,664 | 4,445 | 2,278 |
| Source | 1,922 | 1,313 | 710 | 2,297 | 952 | 1,638 | 772 |
| Quality | 1,071 | 1,084 | 450 | 1,730 | 707 | 936 | 552 |
| Perceiver | 336 | 362 | 140 | 399 | 153 | 266 | 241 |
| Circumstances | 399 | 248 | 88 | 274 | 202 | 228 | 192 |
| Odour carrier | 351 | 310 | 106 | 170 | 195 | 408 | 134 |
| Effect | 243 | 187 | 53 | 425 | 104 | 214 | 114 |
| Evoked Odorant | 228 | 91 | 103 | 258 | 74 | 285 | 42 |
| Place | 255 | 302 | 172 | 200 | 158 | 394 | 111 |
| Time | 127 | 126 | 49 | 131 | 119 | 75 | 108 |
| Creator | 30 | 0 | 5 | 1 | 0 | 1 | 12 |
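In the table, the "Total FEs" row is the sum of the ten Frame Element rows (Source through Creator) for each language. The "Smell words" row is not part of that sum. The short check below is a sketch that makes this relationship explicit using the figures from the table. The variable and function names are illustrative only.

```python
LANGUAGES = ["Dutch", "English", "French", "German",
             "Italian", "Slovenian", "Latin"]

# "Total FEs" row, copied from the table.
TOTAL_FES = [4962, 4023, 1876, 5885, 2664, 4445, 2278]

# Per-element rows, copied from the table (same language order).
FRAME_ELEMENTS = {
    "Source":         [1922, 1313, 710, 2297, 952, 1638, 772],
    "Quality":        [1071, 1084, 450, 1730, 707, 936, 552],
    "Perceiver":      [336, 362, 140, 399, 153, 266, 241],
    "Circumstances":  [399, 248, 88, 274, 202, 228, 192],
    "Odour carrier":  [351, 310, 106, 170, 195, 408, 134],
    "Effect":         [243, 187, 53, 425, 104, 214, 114],
    "Evoked Odorant": [228, 91, 103, 258, 74, 285, 42],
    "Place":          [255, 302, 172, 200, 158, 394, 111],
    "Time":           [127, 126, 49, 131, 119, 75, 108],
    "Creator":        [30, 0, 5, 1, 0, 1, 12],
}


def column_sums(rows):
    """Sum each language column across all Frame Element rows."""
    return [sum(column) for column in zip(*rows.values())]


for language, computed, reported in zip(LANGUAGES,
                                        column_sums(FRAME_ELEMENTS),
                                        TOTAL_FES):
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{language}: computed={computed} reported={reported} {status}")
```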

The full list of annotated documents is available in Documents in Benchmark.xlsx. These have been selected from a larger list of corpora available at the following links:

| Language | Link |
|---|---|
| EN | shorturl.at/BGS14 |
| IT | shorturl.at/npIL3 |
| FR | https://drive.google.com/drive/folders/1wwj5zhUl5ESxmxBBslHsepbMMsyydGf_?usp=sharing |

Funding acknowledgement

This work has been realised in the context of Odeuropa, a research project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101004469.
