The Macedonian-MTB treebank is a collection of annotated sentences taken from the Macedonian version of the Cairo CICLing Corpus and from the university textbook in syntax "Contemporary Macedonian Language 4" by Simov Sazdov.
The treebank is distributed under the CC Attribution-NonCommercial 4.0 International License. It consists mainly of everyday and literary sentences, along with a few non-fiction ones.
In its current selection, apart from the sentences taken from the Cairo CICLing Corpus, the treebank consists of representative sentences from Simov Sazdov's syntax textbook "Contemporary Macedonian Language 4" (Sazdov, 2012). The sentences were typed in manually after Mr. Sazdov gave permission to use them for annotation.
The data is still too small to be split into training, development and test sets.
The treebank does not consist of complete documents: so far, the sentences are randomly selected from Sazdov (2012).
The sentences were manually annotated by Vladimir Cvetkoski, Mila Dimishkovska, Renata Jovanovska and Bojana Nafidova. Vladimir Cvetkoski carried out the final revision and validation. The CoNLL-U format was validated with http://spyysalo.github.io/conllu.js/.
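For readers who want a quick local sanity check before running a full validator, the following is a minimal sketch of a structural CoNLL-U check. It is not the conllu.js tool mentioned above and not the official UD validator; it only assumes the general CoNLL-U layout (ten tab-separated columns per token line, `#` comment lines, blank lines between sentences, word IDs counted from 1). The function name `check_conllu` and the sample sentence are hypothetical and are not taken from the treebank.

```python
def check_conllu(text):
    """Return a list of (line_number, message) problems found in CoNLL-U text."""
    problems = []
    expected_id = 1
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # A blank line ends the sentence, so word numbering restarts.
            expected_id = 1
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            problems.append((lineno, f"expected 10 columns, found {len(cols)}"))
            continue
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # multiword-token range or empty node; not counted here
        if token_id != str(expected_id):
            problems.append((lineno, f"expected ID {expected_id}, found {token_id}"))
        expected_id += 1
    return problems


# Hypothetical two-word sample, for illustration only.
sample = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)
print(check_conllu(sample))  # []
```

An empty list means no structural problems were found. The sketch does not check tags, features or dependency relations, so it complements rather than replaces a real validator.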
Саздов, С. (2012). Современ македонски јазик 4 (2. изд., 84 стр.). Табернакул.
Sazdov, S. (2012). Contemporary Macedonian Language 4 (2nd ed., p. 84). Tabernakul.
- 2023-11-15 v2.13
- Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.13
License: CC BY-SA 4.0
Includes text: yes
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: not available
Features: manual native
Relations: manual native
Contributors: Cvetkoski, Vladimir
Contributing: here
Contact: [email protected]
===============================================================================