Biomedical named entities #11

Open
dan-zeman opened this issue Jul 10, 2023 · 0 comments

I suppose that the classes animal and plant are not included in biomedical-entity because they are only used in cases when we either need an abstract concept (for a pronoun), or the animal/plant has a name as an individual, such as Archie. Right?

Now, as for the types under biomedical-entity: At least some of them seem problematic to me, namely taxon, species, and disease.

In Czech, species usually have two-word names, a common noun followed by an adjective. (NB: This word order is specific to names of species, otherwise the adjective would precede the modified noun.) The first word (the common noun) denotes the genus, i.e., a taxon. Do we really want to treat it as a named entity? It can be an unusual (from the Czech perspective) word, such as ptakopysk “platypus” or araukárie “araucaria”, but it will also include all common animal and plant names, such as kočka “cat” or dub “oak”. What is it that makes these words named entities?

In fact, kočka “cat” is an animal with a particular set of characteristics, just like dub “oak” is a particular type (hyponym) of tree, and hrad “castle” is a particular type of building. But the first two words are biological genera, hence taxa, while hrad has no special status in the UMR taxonomy. (In Czech grammar, all three are common nouns.) There is no reason why kočka and dub should be named entities. And by extension, there is little reason why species should be named entities, for example kočka domácí “cat (Felis catus)”, or dub letní “pedunculate oak (Quercus robur)”, or why other taxa should, for example šelmy “beasts of prey, Carnivora”, savci “mammals”, or živočichové “animals, Animalia”. It is true that some species have names that are less common than others and were invented by the scholars who discovered and described the species, rather than being part of the language since ancient times. But it would be neither tractable nor helpful to attempt to distinguish them. Perhaps the only exception would be scientific names in Latin, provided that the language of the annotated text is not Latin.

Similarly, diseases may have scientific names, but many common diseases are just common nouns or expressions (angína “tonsillitis”, chřipka “flu”, mor “plague”, neštovice “chickenpox”), and it is not clear why they should be handled differently from other common nouns. Moreover, diseases are states rather than entities, aren't they?
