Validator support for ExtPos in checking external deprels #1062

Open
nschneid opened this issue Nov 2, 2024 · 12 comments

@nschneid
Contributor

nschneid commented Nov 2, 2024

In English-EWT sentence answers-20111108103930AA7FPhc_ans-0007 there is a connective that is clearly supposed to be prepositional "due to" but the "to" is omitted.

The way "due to" is normally handled is as an ADJ+ADP fixed expression, functioning holistically like a preposition, which we indicate with ExtPos=ADP on the first word (#1037).

At present, the validator ignores external deprel checks on fixed heads. But in this sentence, the "to" is missing, so there is no overt fixed relation, and the validator is throwing an error that an ADJ cannot attach as case.

I think the correct validator behavior is to use the ExtPos if present for checking the deprel. I will temporarily change "due" from case to amod but hope to change it back in the future.
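The proposed behavior can be sketched roughly as follows. This is only an illustration, not the validator's actual code: the helper names `parse_feats`, `effective_upos`, and `check_case_deprel` are hypothetical, and the set of tags rejected for `case` is an assumption made for the example.

```python
# Hypothetical sketch of "use ExtPos if present when checking the deprel".
# Not the UD validator's real implementation.

def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Degree=Pos|ExtPos=ADP' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def effective_upos(upos, feats):
    """Return the ExtPos value if the token has one, otherwise its UPOS."""
    return parse_feats(feats).get("ExtPos", upos)

# Assumed, simplified set of tags that may not attach as `case`.
DISALLOWED_FOR_CASE = {"ADJ", "PROPN", "PRON", "DET", "NUM", "AUX"}

def check_case_deprel(form, upos, feats, deprel):
    """Return an error message, or None if the token passes the check."""
    if deprel != "case":
        return None
    pos = effective_upos(upos, feats)
    if pos in DISALLOWED_FOR_CASE:
        return f"'{form}': {pos} should not attach as case"
    return None

# "due" with the "to" missing: fails on plain UPOS, passes once ExtPos=ADP is set.
print(check_case_deprel("due", "ADJ", "Degree=Pos", "case"))
print(check_case_deprel("due", "ADJ", "Degree=Pos|ExtPos=ADP", "case"))
```

With this sketch, the first call reports an error for ADJ and the second returns `None`, because the external part of speech is consulted instead of the word's own tag.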

nschneid added a commit to UniversalDependencies/UD_English-EWT that referenced this issue Nov 2, 2024
@amir-zeldes
Contributor

+1 - the correct deprel is case and not amod. An alternative is to directly tag "due" as a preposition in this context, but I like this suggestion better, since it's really just due an error ;)

@dan-zeman dan-zeman added enhancement UPOS Universal part-of-speech tags: definitions and examples features universal labels Nov 4, 2024
@dan-zeman dan-zeman self-assigned this Nov 4, 2024
@dan-zeman dan-zeman added this to the v2.16 milestone Nov 4, 2024
dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 18, 2024
@dan-zeman
Member

Implemented. As a consequence, some treebanks now have errors that were not reported before, because they use ExtPos and its value does not match the deprel:

  • French-GSD ... 1
  • French-Sequoia ... 2
  • Portuguese-Bosque ... 12
  • Portuguese-GSD ... 6

@arademaker
Contributor

Ok, I can fix the Portuguese GSD and Bosque.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 18, 2024
@dan-zeman
Member

I am now going to gradually remove the exception for fixed expressions from the rel-upos-* tests, because these cases can be resolved with ExtPos in a more targeted manner. There will thus be more errors in more treebanks. All such treebanks will be put in the LEGACY status, giving their maintainers four years to fix the data.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 18, 2024
nschneid added a commit to UniversalDependencies/UD_English-EWT that referenced this issue Nov 19, 2024
@nschneid
Contributor Author

Thanks. Would it be worth adding a warning for ANY fixed head without ExtPos? Currently it doesn't flag that "according/VERB to/ADP" (fixed expression attaching as case) should have ExtPos but I think it's better with ExtPos=ADP.
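A warning of this kind could look roughly like the sketch below. It is a toy illustration under stated assumptions: the token dictionaries, the function name `fixed_heads_without_extpos`, and the example annotation are all made up for this example and do not come from the validator or from EWT.

```python
# Hypothetical sketch: list heads of `fixed` relations that carry no ExtPos.

def has_extpos(feats):
    """True if a CoNLL-U FEATS string contains an ExtPos feature."""
    if feats in ("_", ""):
        return False
    return any(pair.split("=", 1)[0] == "ExtPos" for pair in feats.split("|"))

def fixed_heads_without_extpos(tokens):
    """tokens: list of dicts with keys id, form, feats, head, deprel."""
    by_id = {t["id"]: t for t in tokens}
    # Every token that has at least one `fixed` dependent is a fixed head.
    head_ids = {t["head"] for t in tokens if t["deprel"] == "fixed"}
    return [by_id[i]["form"] for i in sorted(head_ids)
            if not has_extpos(by_id[i]["feats"])]

# Toy sentence fragment: "according to reports" (annotation is illustrative).
tokens = [
    {"id": 1, "form": "according", "feats": "VerbForm=Ger", "head": 3, "deprel": "case"},
    {"id": 2, "form": "to", "feats": "_", "head": 1, "deprel": "fixed"},
    {"id": 3, "form": "reports", "feats": "Number=Plur", "head": 0, "deprel": "root"},
]

for form in fixed_heads_without_extpos(tokens):
    print(f"WARNING: fixed head '{form}' has no ExtPos")
```

Adding `ExtPos=ADP` to the features of "according" would make the warning disappear in this sketch.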

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
@dan-zeman
Member

> Thanks. Would it be worth adding a warning for ANY fixed head without ExtPos? Currently it doesn't flag that "according/VERB to/ADP" (fixed expression attaching as case) should have ExtPos but I think it's better with ExtPos=ADP.

I don't know. But "according to" will probably be flagged in the next round. I am modifying the tests one by one; rel-upos-case had not yet been modified when you asked, but it has been modified now.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
@nschneid
Contributor Author

nschneid commented Nov 19, 2024

"According" is tagged VERB so it can attach as case or mark per the deverbal connectives policy.

dan-zeman added a commit to UniversalDependencies/tools that referenced this issue Nov 19, 2024
@dan-zeman
Member

dan-zeman commented Nov 19, 2024

> "According" is tagged VERB so it can attach as case or mark per the deverbal connectives policy.

Shouldn't it now use ExtPos=ADP?

@nschneid
Contributor Author

nschneid commented Nov 19, 2024

There are two issues here: the general policy on VERBs as case/mark and the treatment of fixed expressions.

It looks like the validator change UniversalDependencies/tools@5d0d028 prohibits "regarding", "given", and similar words tagged VERB from attaching as case/mark. But the guidelines explicitly say this is OK, and we never discussed repealing that in favor of ExtPos.

Assuming the single-word verbal connectives are allowed, my question was whether there should be a WARNING for any fixed expressions lacking ExtPos. I think that was the conclusion of the Core Group discussion.

@amir-zeldes
Contributor

My recollection matches Nathan's: using ExtPos for single-word 'case' would be a new policy.

@dan-zeman
Member

Well, using ExtPos for single-word case was the request with which Nathan started this thread, although that was a bit different because there the second word was omitted by mistake.

The change regarding VERBs can be reverted in the validator if desired. But the note that I had there from the time we discussed it in the core group was saying:

###!!! February 2022: Temporarily allow mark+VERB ("regarding"). In the future, it should be banned again
###!!! by default (and case+VERB too), but there should be a language-specific list of exceptions.

So now I thought that instead of implementing a language-specific list of exceptions, one could simply put ExtPos in the data.
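The two options being weighed here can be contrasted in a small sketch. Everything in it is hypothetical: the exception list entries, the function names, and the mapping from deprel to expected ExtPos value are assumptions for illustration, not validator code or agreed policy.

```python
# Hypothetical comparison of two ways to license VERB as case/mark.

# Option A: a language-specific lexical list (entries are illustrative only).
VERB_CONNECTIVE_EXCEPTIONS = {"en": {"regarding", "given"}}

def allowed_by_list(lang, form):
    """VERB may attach as case/mark only if the word is on the language's list."""
    return form.lower() in VERB_CONNECTIVE_EXCEPTIONS.get(lang, set())

# Option B: the data itself carries ExtPos; no list is maintained in the tool.
EXPECTED_EXTPOS = {"case": "ADP", "mark": "SCONJ"}  # assumed mapping

def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def allowed_by_extpos(feats, deprel):
    """VERB may attach as case/mark only if its ExtPos matches the deprel."""
    if deprel not in EXPECTED_EXTPOS:
        return False
    return parse_feats(feats).get("ExtPos") == EXPECTED_EXTPOS[deprel]

print(allowed_by_list("en", "Regarding"))
print(allowed_by_extpos("ExtPos=ADP|VerbForm=Ger", "case"))
print(allowed_by_extpos("VerbForm=Ger", "case"))
```

The trade-off in this sketch is where the knowledge lives: Option A keeps it in the tool, per language, while Option B requires every affected token in the treebank to be annotated.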

@nschneid
Contributor Author

I found a note from Dec. 9, 2021: "Remove the categorical prohibition [on VERB/mark]; Dan will add a lexical list of exceptions (but it may take time)"

Perhaps ExtPos is a more economical solution than adding a lexical list. We should discuss in our next meeting. A concern is that we may be moving too fast in making ExtPos mandatory in some circumstances where it wasn't previously.

4 participants