Need to add a check for internal referential integrity on the Guidelines HTML #1563
Comments
Internal links involve a possibly unhealthy degree of magic in any case (same-document links to other documents, e.g.). We might need to resolve that issue before we can properly do link-checking. |
One way to do it might be to check links in the HTML output. |
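A minimal sketch of what checking links in the HTML output could look like, using only the Python standard library. It is not the project's tooling: the function name `broken_internal_links` and the `pages` mapping are hypothetical, and it assumes the compiled pages sit in one flat directory so that an `href` such as `CO.html#COPU` names a sibling file.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urlparse, unquote


class _Collector(HTMLParser):
    """Collect anchor targets (id / a@name) and href values from one page."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.ids.add(attrs["name"])
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def broken_internal_links(pages):
    """pages maps file name -> HTML text (hypothetical input shape).

    Returns (page, href) pairs whose target file or fragment does not exist.
    """
    parsed = {}
    for name, html in pages.items():
        collector = _Collector()
        collector.feed(html)
        parsed[name] = collector

    broken = []
    for name, collector in parsed.items():
        for href in collector.hrefs:
            if urlparse(href).scheme or href.startswith("//"):
                continue  # external link: left to the existing external check
            path, fragment = urldefrag(href)
            target = path or name  # a bare "#frag" points into the same page
            if target not in parsed:
                broken.append((name, href))
            elif fragment and unquote(fragment) not in parsed[target].ids:
                broken.append((name, href))
    return broken
```

A build step could read every `*.html` file in the output directory into `pages` and fail if the returned list is non-empty.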
@hcayless suggests that running a check on compiled P5 would be another option. |
Added @martindholmes in hope he'll be willing to help |
I've invited @joeytakeda to join the TEI Contributors team to work with me on this one, because we developed the diagnostics code to do this together. |
Actually, doesn't the ePub build essentially do this? (Says he after spending a big chunk of Friday fixing broken links) |
If the ePub build were checking everything, then the regular build process would break every time the Guidelines-Link-Check job was failing, but that's not the case. Won't the ePub check only catch things that make it into the English version of the Guidelines? |
What we should do is to extract the internal-link-specific checking from here: https://github.com/projectEndings/diagnostics/blob/master/xsl/diagnostics_master.xsl and build that into the Test process, running against the compiled P5. |
Having run that diagnostic on p5.xml (it's designed to run on TEI, rather than on HTML), I found a bunch of issues, most of which are now fixed, but I think it would make sense to take the same approach to creating an XHTML version. The diagnostic isn't designed to work in a headless context; it generates an HTML report and opens it in your browser. We would want something that simply emits error messages that would cause the build to fail (or perhaps just warnings). That's straightforward to do. Just takes time... |
I've had some success in writing XQuery to find unresolved references in p5.xml. It strikes me that the converse should also be checked: find items in the bibliography that are no longer referred to. Those shouldn't necessarily be removed, but they might be candidates for removal. Fwiw, here's my (rather rudimentary) XQuery:
declare namespace eg="http://www.tei-c.org/ns/Examples";
for $e in //*/@*[starts-with(.,'#')]
where not(//*[@xml:id = substring-after($e,'#')])
where not($e/ancestor::eg:egXML)
return $e/parent::*
Something like this could be run as a test on p5.xml to turn up broken links. Ideally, the output should be empty. |
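For a headless build step, the same idea can be sketched in Python with the standard library. This is an illustration and not the diagnostics code: the names `unresolved_pointers` and `main` are hypothetical, and unlike the XQuery above it splits attribute values on whitespace so that multi-pointer values such as `"#a #b"` are checked one pointer at a time.

```python
import sys
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
EGXML = "{http://www.tei-c.org/ns/Examples}egXML"


def unresolved_pointers(root):
    """Return (element tag, attribute name, pointer) for each '#id' with no matching xml:id."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    problems = []

    def walk(el):
        if el.tag == EGXML:
            return  # skip examples, as the XQuery does
        for name, value in el.attrib.items():
            for token in value.split():
                if token.startswith("#") and token[1:] not in ids:
                    problems.append((el.tag, name, token))
        for child in el:
            walk(child)

    walk(root)
    return problems


def main(path):
    """Print one message per unresolved pointer; return a non-zero status if any were found."""
    problems = unresolved_pointers(ET.parse(path).getroot())
    for tag, name, token in problems:
        print(f"Unresolved pointer {token} in @{name} on <{tag}>", file=sys.stderr)
    return 1 if problems else 0
```

Calling `sys.exit(main("p5.xml"))` from a test target would make the build fail whenever the list is non-empty; returning 0 unconditionally would turn the messages into warnings instead.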
Here's an XQuery that finds bibliography items that are no longer referred to:
declare namespace t="http://www.tei-c.org/ns/1.0";
for $bib in (//t:div[@xml:id='BIB']//t:bibl/@xml:id | //t:div[@xml:id='BIB']//t:biblStruct[not(ancestor::t:div[@xml:id='BIB-RDG'])]/@xml:id)
where not(//*[@target = concat('#',$bib) or @source=concat('#',$bib)])
return $bib/parent::* |
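A rough Python counterpart of this converse check, again only a sketch under stated assumptions: the function name `unreferenced_bibliography_ids` is hypothetical, pointers are compared token by token, and the `BIB-RDG` exclusion in the XQuery is not reproduced here.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def unreferenced_bibliography_ids(root, bib_div_id="BIB"):
    """Return xml:id values of bibl/biblStruct in the bibliography div that nothing points to."""
    # Every "#id" token found in any @target or @source in the document.
    pointed_at = set()
    for el in root.iter():
        for attr in ("target", "source"):
            for token in el.get(attr, "").split():
                if token.startswith("#"):
                    pointed_at.add(token[1:])

    unused = []
    for div in root.iter(TEI + "div"):
        if div.get(XML_ID) != bib_div_id:
            continue
        for el in div.iter():
            if el.tag in (TEI + "bibl", TEI + "biblStruct"):
                ident = el.get(XML_ID)
                if ident and ident not in pointed_at:
                    unused.append(ident)
    return unused
```

Since unreferenced items are only candidates for removal, a list like this is probably better reported as a warning than treated as a build failure.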
See #1476 (comment), related (mostly inasmuch as it provides evidence that yes, such a check is a good idea). |
Just ran the diagnostics XSL against a newly-built p5.xml and found the following: Bad internal link: Bad MIME types not listed in the IANA mime types list: I'll check these out and fix if necessary. |
I've fixed the first two lots, but the media types issue is a bit fraught; although the listed ones are not in the IANA media types listing page, they do show up in other locations, so I've left them alone. The "cornu" language value was wrong; "cornu" is a variant subtag meaning Cornish-English, and I'm pretty sure that the intent for the word in question was Middle Cornish, which has the subtag "cnx", so I've changed it to that. For the broken pointer, I added a new item to the bibliography. We can't close this ticket because we don't have a working automated solution yet, but we can consider it done for the upcoming GL release. |
Just ran the diagnostic on a fresh p5.xml, and found these pointers to non-existent things:
I'm going to check into those now. |
Fixed two of them, and raised #2649 for a missing bibliography item. |
@ebeshero I don't think this needs a release milestone, since it's not something to be included in a release of the Guidelines as such; it's a diagnostic test that we want to add to our build process. |
As far as @sydb and I can tell, there is no check run on the HTML of the Guidelines to test that all internal links are working. The Guidelines-Link-Check task on Jenkins checks the external links, but something should check that internal links are working. This could be run as part of the TEIP5-Documentation[-dev] job.