References are being replaced into XML with wrong ref id #91
Comments
Probably being caused by https://github.com/pkp/xmlps/blob/master/module/ReferencesConversion/src/ReferencesConversion/Model/Converter/References.php#L311, which was done to avoid Pandoc breaking on non-numeric ref IDs. Need to think of a way to fix this without breaking inline ref IDs...
Actually @kaschioudi, I'm noticing that we seem to be replacing the wrong ref ID numbers into XML documents processed from PDF via Cermine too -- this might be a wider problem in our implementation...
Actually, there are many issues with generating references. For example, all strings between … For now, I have written Java code that parses all references in square brackets and assigns them the needed IDs. I also think it may be better to write a Java app using the JAXB library that parses the JATS after the DOCX XSLT transformation and outputs well-formed JATS.
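The bracket-parsing step described above might look roughly like the following. This is a minimal Python sketch, not the Java code mentioned in the comment; the function name `link_bracket_citations`, the `R` ID prefix, and the assumption that citations are purely numeric (`[3]` or `[1, 3]`) are all illustrative.

```python
import re
from xml.sax.saxutils import escape

# Matches numeric citations such as [3] or [1, 3]; author-year
# citations and ranges like [1-3] are deliberately not handled here.
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def link_bracket_citations(text, id_prefix="R"):
    """Wrap each number inside [...] in an <xref> pointing at ref id <prefix><n>."""

    def repl(match):
        numbers = [n.strip() for n in match.group(1).split(",")]
        links = [
            f'<xref ref-type="bibr" rid="{id_prefix}{n}">{n}</xref>'
            for n in numbers
        ]
        return "[" + ", ".join(links) + "]"

    # Escape the plain text first so the inserted tags are the only markup.
    return CITATION.sub(repl, escape(text))


if __name__ == "__main__":
    print(link_bracket_citations("As shown in [1, 3] and (Smith 2010)."))
```

Because only square brackets with digits are matched, parenthesised text is left alone, which sidesteps the over-detection problem but also misses author-year citations.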
We're aware that meTypeset overdetects parentheses as references -- I actually thought that this must be due to some recent changes we made to it as I've been noticing the problem more and more lately, but I tried reverting to an older version and the problem is still there, so it turns out it just never came up to this extent in our earlier testing. It's flagged as an issue. As for whether we're investing more effort into parsing pre-JATS transformation or post-JATS transformation, it's a balance to strike between "cleanness" and lossiness.
@kaschioudi, I think I probably misspoke when I was asking you to fix #50 -- we can't be arbitrarily incrementing ref IDs like in https://github.com/pkp/xmlps/blob/master/module/ReferencesConversion/src/ReferencesConversion/Model/Converter/References.php#L315; whenever we change a ref ID, we need to update the inline xref rid that points to it so the two still match. Heidelberg's MPT script has a component which I believe is designed to recurse through meTypeset output and change UUIDs to integer ref IDs when needed, so I'm going to test that first: https://github.com/withanage/mpt/blob/master/static/tools/archive/postProcess.py#L502
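To illustrate the matching requirement, here is a minimal Python sketch of renumbering ref IDs while keeping inline xrefs in sync. It is not the ReferencesConversion (PHP) or MPT code; the function name `renumber_refs`, the `R` prefix, and the assumption that the input is JATS-like XML with `<ref id="...">` and `<xref ref-type="bibr" rid="...">` elements are illustrative.

```python
import xml.etree.ElementTree as ET


def renumber_refs(root, prefix="R"):
    """Give every <ref> a sequential id and rewrite each matching <xref rid>.

    Returns the list of rid values that did not match any <ref>.
    """
    # First pass: build old-id -> new-id while renumbering the references.
    mapping = {}
    for index, ref in enumerate(root.iter("ref"), start=1):
        old_id = ref.get("id")
        new_id = f"{prefix}{index}"
        if old_id is not None:
            mapping[old_id] = new_id
        ref.set("id", new_id)

    # Second pass: translate inline citations through the same mapping.
    unmatched = []
    for xref in root.iter("xref"):
        if xref.get("ref-type") != "bibr":
            continue
        old_rids = (xref.get("rid") or "").split()  # rid may list several IDs
        missing = [rid for rid in old_rids if rid not in mapping]
        if missing:
            # Leave the xref untouched rather than point it at the wrong ref.
            unmatched.extend(missing)
            continue
        xref.set("rid", " ".join(mapping[rid] for rid in old_rids))
    return unmatched


if __name__ == "__main__":
    doc = ET.fromstring(
        '<article><body><p>Text <xref ref-type="bibr" rid="IDb">2</xref></p></body>'
        '<back><ref-list><ref id="IDa"/><ref id="IDb"/></ref-list></back></article>'
    )
    print(renumber_refs(doc))
    print(ET.tostring(doc, encoding="unicode"))
```

Building the mapping before touching any xref means each rid is translated exactly once, and returning the unmatched rids makes leftover inline citations easy to report instead of silently mislinking them.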
Actually, I'm afraid MPT might be too convoluted to add to the workflow just for this -- let's see if we can handle it directly in ReferencesConversion.
OK, this is working and merged into master! There still seem to be a few issues -- the attached doc has a few unmatched
I need to track down where this is currently happening, but I've noticed that some Word documents have ref IDs that look like id="ID83ddb41b-5d29-4b6f-b862-74e019db4ec7" after being processed by meTypeset, but wind up as "R56" after being output by our stack. Something is replacing the original IDs...