References are being replaced into XML with wrong ref id #91

axfelix · 2017-01-26T00:24:31Z

I need to track down where this is happening currently, but I've noticed that some Word documents have ref IDs that look like id="ID83ddb41b-5d29-4b6f-b862-74e019db4ec7" after being processed by meTypeset, but wind up with "R56" after being output by our stack. Something is replacing the original IDs...

axfelix · 2017-01-26T00:36:09Z

Probably being caused by https://github.com/pkp/xmlps/blob/master/module/ReferencesConversion/src/ReferencesConversion/Model/Converter/References.php#L311, which was done to avoid Pandoc breaking on non-numeric ref IDs. Need to think of a way to fix this without breaking inline ref IDs...

axfelix · 2017-01-26T00:39:51Z

MartinPaulEve/meTypeset#104

axfelix · 2017-01-30T18:04:05Z

Actually @kaschioudi, I'm noticing that we seem to be writing the wrong ref IDs into XML documents processed from PDF via Cermine too -- this might be a wider problem in our implementation...

Vitaliy-1 · 2017-02-03T12:24:19Z

Actually, there are many issues with generating references. For example, meTypeset treats every string in parentheses as a citation, which is really inconvenient. Putting citations in square brackets does not solve the problem, because numbers that are not in square brackets are occasionally parsed as references too.

For now I have written Java code that parses all references in square brackets and assigns them the required IDs. I also think it may be better to write a Java app using the JAXB library that parses the JATS after the DOCX XSLT transformation and outputs well-formed JATS.
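A minimal Python sketch of the square-bracket idea (Vitaliy-1's actual code is Java and isn't shown in this thread): it only extracts numeric citations such as [3], [1, 4] or [2-5] from running text, so parenthesised strings like (Smith, 2010) and bare numbers are never picked up. The function name and the accepted bracket syntax are assumptions for illustration.

```python
import re

# Hypothetical pattern: bracketed numeric citations like [3], [1, 4], [2-5] or [2–5].
# Assumes only digits, commas, whitespace and hyphens/en dashes appear inside the brackets.
BRACKET_CITATION = re.compile(r"\[(\d+(?:\s*[-\u2013,]\s*\d+)*)\]")


def bracketed_citations(text):
    """Return the reference numbers cited in square brackets, in order of appearance."""
    numbers = []
    for match in BRACKET_CITATION.finditer(text):
        for part in re.split(r"\s*,\s*", match.group(1)):
            bounds = re.split(r"\s*[-\u2013]\s*", part)
            if len(bounds) == 2:
                # Expand a range such as 4-6 into 4, 5, 6.
                numbers.extend(range(int(bounds[0]), int(bounds[1]) + 1))
            else:
                numbers.append(int(bounds[0]))
    return numbers
```

For example, `bracketed_citations("As shown [1], later [2, 4-6], but not (Smith, 2010) or 1999.")` yields `[1, 2, 4, 5, 6]`. Attaching the matching xref IDs would be a separate step once these numbers are known.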

axfelix · 2017-02-05T18:55:54Z

We're aware that meTypeset overdetects parentheses as references -- I actually thought that this must be due to some recent changes we made to it as I've been noticing the problem more and more lately, but I tried reverting to an older version and the problem is still there, so it turns out it just never came up to this extent in our earlier testing. It's flagged as an issue.

As for whether we're investing more effort into parsing pre-JATS transformation or post-JATS transformation, it's a balance to strike between "cleanness" and lossiness.

axfelix · 2017-02-05T18:58:26Z

@kaschioudi, I think I probably misspoke when I was asking you to fix #50 -- we can't be arbitrarily incrementing ref IDs like in https://github.com/pkp/xmlps/blob/master/module/ReferencesConversion/src/ReferencesConversion/Model/Converter/References.php#L315, we need to match them to the inline xref rid whenever we change them.
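A minimal Python sketch of "match them to the inline xref rid whenever we change them" (the real fix lives in the PHP ReferencesConversion module and is not reproduced here): build one old-ID to new-ID mapping while renumbering the `<ref>` elements, then apply that same mapping to every `rid`. The `R1, R2, ...` scheme, the function name `renumber_refs`, un-namespaced JATS, and the assumption that no other element already uses an `R#` ID are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def renumber_refs(root):
    """Assign sequential IDs (R1, R2, ...) to every <ref> and update xref rids to match.

    Assumes un-namespaced JATS elements. Returns the old-ID -> new-ID mapping.
    """
    mapping = {}
    for index, ref in enumerate(root.iter("ref"), start=1):
        new_id = f"R{index}"
        old_id = ref.get("id")
        if old_id is not None:
            mapping[old_id] = new_id
        ref.set("id", new_id)

    # Rewrite every xref that points at a renamed <ref>. IDs are unique within an
    # XML document, so rids not in the mapping (figures, tables, ...) are left alone.
    for xref in root.iter("xref"):
        rid = xref.get("rid")
        if rid is None:
            continue
        # rid may hold several space-separated IDs, e.g. rid="ID1 ID2".
        xref.set("rid", " ".join(mapping.get(part, part) for part in rid.split()))
    return mapping
```

The point of the design is that the mapping is built once and applied to both the ref IDs and the inline rids, instead of incrementing ref IDs on their own.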

Heidelberg's MPT script has a component which I believe is designed to recurse through meTypeset output and change UUIDs to integer ref IDs when needed, so I'm going to test that first: https://github.com/withanage/mpt/blob/master/static/tools/archive/postProcess.py#L502

axfelix · 2017-02-05T19:10:53Z

Actually, I'm afraid MPT might be too convoluted to add to the workflow just for this -- let's see if we can handle it directly in ReferencesConversion.


axfelix · 2017-03-29T21:32:39Z

OK, this is working and merged into master!

There still seem to be a few issues -- the attached doc has a few unmatched rid="ref5" attributes, but all of the xrefs following the pattern rid="R20" are now matched. Not sure what's causing the difference between "R#" and "ref#" but will look into it. Have removed the branch for now because it was mixed in with #92 when I did the merge, but leaving the issue open.

document.xml.txt
33345-106540-1-PB.pdf
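One way to find leftovers like the unmatched `rid="ref5"` attributes is to list every bibliographic `rid` that no `<ref>` id matches. A hypothetical Python helper, assuming un-namespaced JATS and that citation xrefs carry `ref-type="bibr"`; it only reports mismatches and does not explain why the "ref#" pattern appears.

```python
import xml.etree.ElementTree as ET


def unmatched_rids(root):
    """Return bibliographic xref rid values (in document order) that match no <ref> id."""
    ref_ids = {ref.get("id") for ref in root.iter("ref")}
    missing = []
    for xref in root.iter("xref"):
        # Only citation links are checked; figure/table xrefs point elsewhere.
        if xref.get("ref-type") != "bibr":
            continue
        for rid in xref.get("rid", "").split():
            if rid not in ref_ids:
                missing.append(rid)
    return missing
```

Running it on a converted document such as the attached one should make it easy to see whether only the "ref#" links are dangling.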

kaschioudi added a commit that referenced this issue Feb 22, 2017

update xref's rid in body when reference ref id is replaced #91

748481b

axfelix added a commit that referenced this issue Mar 28, 2017

Merge pull request #93 from pkp/92-fix

2a64cb7

fixes for #91 and #92
