udapi-markbugs: fix multi-obj #34

Open
colinbatchelor opened this issue Oct 28, 2024 · 4 comments
Labels: 2.15 (Enhancements and fixes for November 2024 UD release); 2.16 (Changes for May 2025 UD release); bug (Something isn't working)

Comments

@colinbatchelor
Contributor

Not sure why this hasn't been caught by the other validation scripts...

@colinbatchelor colinbatchelor added the 2.15 Enhancements and fixes for November 2024 UD release label Oct 28, 2024
@colinbatchelor colinbatchelor self-assigned this Nov 26, 2024
@colinbatchelor colinbatchelor added bug Something isn't working 2.16 Changes for May 2025 UD release labels Nov 26, 2024
@colinbatchelor
Contributor Author

In p05_017 (dev) both Mr Macaulay and £4/11 are tagged obj:

phàigh iad Maighstir MacAmhlaidh [...] ceithir nòtaichean 's aona tastan deug 'they paid Mr Macaulay [...] four pounds and eleven shillings'

This is annoying because you can remove either of them and the sentence still makes sense. Looking through the examples in DASG, the amount paid is occasionally marked with le, what is being paid for is marked with air or airson, and the payee is sometimes marked with do.

I wonder whether the best thing to do is therefore to say that the payee is an unmarked obl? This seems preferable to importing iobj into the Gaelic treebank for a very tiny class of verbs.
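For reference, the kind of check that flags these sentences can be sketched in plain Python. This is only a minimal illustration of the idea (a head with more than one obj dependent), not the actual udapi markbugs code; the function name `find_multi_obj` and the toy sentence `toy_001` are made up for the example and are not taken from the treebank.

```python
from collections import defaultdict


def find_multi_obj(conllu_text):
    """Return (sent_id, head_id, [forms]) for every head with more than one obj child."""
    hits = []
    for block in conllu_text.strip().split("\n\n"):
        sent_id = None
        objs = defaultdict(list)  # head id -> forms of its obj dependents
        for line in block.splitlines():
            if line.startswith("# sent_id"):
                sent_id = line.split("=", 1)[1].strip()
            if line.startswith("#"):
                continue
            cols = line.split("\t")
            # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1).
            if len(cols) != 10 or not cols[0].isdigit():
                continue
            if cols[7] == "obj":
                objs[cols[6]].append(cols[1])
        for head, forms in objs.items():
            if len(forms) > 1:
                hits.append((sent_id, head, forms))
    return hits


def make_sentence(sent_id, rows):
    """Build a CoNLL-U sentence from (id, form, upos, head, deprel) tuples."""
    lines = [f"# sent_id = {sent_id}"]
    for tid, form, upos, head, deprel in rows:
        lines.append("\t".join([tid, form, "_", upos, "_", "_", head, deprel, "_", "_"]))
    return "\n".join(lines) + "\n"


# Hypothetical, heavily simplified fragment: two obj dependents on one verb.
sample = make_sentence("toy_001", [
    ("1", "phàigh", "VERB", "0", "root"),
    ("2", "iad", "PRON", "1", "nsubj"),
    ("3", "Maighstir", "NOUN", "1", "obj"),
    ("4", "nòtaichean", "NOUN", "1", "obj"),
])

print(find_multi_obj(sample))
```

Relabelling the payee as obl, as proposed above, would leave only one obj on the verb, so the sentence would no longer be reported.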

colinbatchelor added a commit that referenced this issue Dec 22, 2024
…l having been mistagged as mark:prt + ccomp rather than mark + advcl. Also fixed erroneous use of the word 'token' in not-to-release/validate_gd_extras.py.
colinbatchelor added a commit to colinbatchelor/gdbank that referenced this issue Dec 22, 2024
@colinbatchelor
Contributor Author

colinbatchelor commented Dec 22, 2024

At first glance there appear to be some genuine multiple-object cases, where a verbal noun is preceded by the object and also by the aspect marker ag combined with a personal possessive pronoun to form gam, gad, ga, gan, e.g. ns08_024 (dev):

a chunnaic ionnsaigh ga thoirt air boireannach a raoir 'who saw an attack carried out on a woman last night'

This is a bit like the problem discussed here: UniversalDependencies/UD_Welsh-CCG#3 but not exactly.

@colinbatchelor
Contributor Author

Examples here: https://leacan.gla.ac.uk/leacan/?gd=ag show that this can be thought of as a passive-like construction:

bha ionnsachadh na Gàidhlig ga bhacadh gu mòr 'the learning of Gaelic was greatly impeded' - here ionnsachadh na Gàidhlig is the subject
shuidheachaidhean far an cluinnear an cànan ga bruidhinn gu nàdarra 'situations where the language could be heard spoken naturally'

In the case above I'm treating ionnsaigh ga thoirt air boireannach a raoir as being an advcl:relcl attached to chunnaic.
Applying the same treatment to c01_001ab in dev:

bha a' chuid a b' fhearr an dà tharbh air an cumail ann an geata 'The better thing was to keep the two bulls in the gate'

@colinbatchelor
Contributor Author

There are a few cases where the infinitive particle (a + lenition) has been tagged in the ARCOSG corpus as the possessive (also a + lenition). This has led to the multiple-object error because the possessive pronoun before a verbal noun is automatically tagged as the object, as it is in some constructions.

They have been retagged. In one sentence:

Tha mòran air a bhith càineadh co-dhùnadh an Riaghaltas dà stèisean maor-chladaich 'sa cheann a Tuath an dùnadh. 'Many have criticised the Government's decision to close two coastguard stations in the north'

the original text has an (the possessive plural pronoun, which does not lenite) in place of a + lenition. The CorrectForm will be marked in the next push.
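As a rough sketch of what marking the correction involves: UD records the intended form of a misspelt token as CorrectForm=... in the MISC column (usually alongside Typo=Yes in FEATS). The helper below, `set_misc`, is a hypothetical name for illustration, and using a as the corrected value is an assumption based on the description above rather than the committed annotation.

```python
def set_misc(misc, key, value):
    """Return a CoNLL-U MISC string with key=value added or replaced."""
    items = {}
    if misc != "_":  # "_" means the MISC column is empty
        for part in misc.split("|"):
            k, _, v = part.partition("=")
            items[k] = v
    items[key] = value
    # Keep existing attributes in their original order; bare keys stay bare.
    return "|".join(f"{k}={v}" if v else k for k, v in items.items())


# Assumed value: the token written "an" is corrected to "a".
print(set_misc("_", "CorrectForm", "a"))
print(set_misc("SpaceAfter=No", "CorrectForm", "a"))
```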

colinbatchelor added a commit that referenced this issue Dec 24, 2024