Need to add a check for internal referential integrity on the Guidelines HTML #1563

Open
martindholmes opened this issue Dec 30, 2016 · 17 comments

Comments

@martindholmes
Contributor

As far as @sydb and I can tell, there is no check run on the HTML of the Guidelines to test that all internal links are working. The Guidelines-Link-Check task on Jenkins checks the external links, but something should check that internal links are working. This could be run as part of the TEIP5-Documentation[-dev] job.

@hcayless
Member

hcayless commented May 9, 2017

Internal links involve a possibly unhealthy degree of magic in any case (same-document links to other documents, e.g.). We might need to resolve that issue before we can properly do link-checking.

@martindholmes
Contributor Author

One way to do it might be to check links in the HTML output.

@tuurma
Contributor

tuurma commented Nov 17, 2017

@hcayless suggests that running a check on the compiled P5 would be another option.

@raffazizzi
Contributor

Added @martindholmes in the hope that he'll be willing to help.

@martindholmes
Contributor Author

I've invited @joeytakeda to join the TEI Contributors team to work with me on this one, because we developed the diagnostics code to do this together.

@hcayless
Member

Actually, doesn't the ePub build essentially do this? (Says he after spending a big chunk of Friday fixing broken links)

@martindholmes
Contributor Author

If the ePub build were checking everything, then the regular build process would break every time the Guidelines-Link-Check job failed, but that's not the case. Won't the ePub check only catch problems in content that makes it into the English version of the Guidelines?

@martindholmes
Contributor Author

What we should do is to extract the internal-link-specific checking from here:

https://github.com/projectEndings/diagnostics/blob/master/xsl/diagnostics_master.xsl

and build that into the Test process, running against the compiled P5.

@peterstadler peterstadler added this to the Guidelines 3.7.0 milestone Sep 16, 2019
@martindholmes
Copy link
Contributor Author

Having run that diagnostic on p5.xml (it's designed to run on TEI, rather than on HTML), I found a bunch of issues, most of which are now fixed, but I think it would make sense to take the same approach to creating an XHTML version. The diagnostic isn't designed to work in a headless context; it generates an HTML report and opens it in your browser. We would want something that simply emits error messages that would cause the build to fail (or perhaps just warnings). That's straightforward to do. Just takes time...

@hcayless
Copy link
Member

I've had some success in writing XQuery to find unresolved references in p5.xml. It strikes me that the converse should also be checked: find items in the bibliography that are no longer referred to. Those shouldn't necessarily be removed, but they might be candidates for removal.

Fwiw, here's my (rather rudimentary) XQuery:

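declare namespace eg="http://www.tei-c.org/ns/Examples";
(: every attribute holding a same-document pointer, outside egXML examples :)
for $e in //*/@*[starts-with(.,'#')][not(ancestor::eg:egXML)]
(: @target and similar attributes may hold several space-separated pointers :)
where some $ptr in tokenize($e, '\s+')[starts-with(., '#')]
      satisfies not(//*[@xml:id = substring($ptr, 2)])
return $e/parent::*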

Something like this could be run as a test on p5.xml to turn up broken links. Ideally, the output should be empty.
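A minimal sketch of the headless variant described earlier in the thread, which emits messages and fails the build instead of producing a report someone has to read. It assumes an XQuery 3.0 processor run with p5.xml as the context document (for example Saxon's `net.sf.saxon.Query` with `-s:p5.xml`); the error namespace URI and the code `broken-links` are hypothetical placeholders, not an existing TEI convention. Like the query above, it skips pointers inside egXML examples and splits multi-valued attributes into separate `#id` tokens.

```xquery
(: Hypothetical CI check: run with p5.xml as the context document.
   Raises an error listing every unresolved same-document pointer. :)
declare namespace eg = "http://www.tei-c.org/ns/Examples";

let $ids := //@xml:id/string()
let $broken :=
  for $e in //@*[starts-with(., '#')][not(ancestor::eg:egXML)]
  (: an attribute such as @target may hold several space-separated pointers :)
  for $ptr in tokenize($e, '\s+')[starts-with(., '#')]
  where not(substring($ptr, 2) = $ids)
  return concat('Bad internal link: ', $ptr, ' on <', name($e/..), '>')
return
  if (exists($broken))
  then error(QName('http://example.org/ns/p5-checks', 'broken-links'),
             string-join($broken, '&#10;'))
  else 'OK: no unresolved internal pointers'
```

Because `fn:error` aborts evaluation, an uncaught error should end a command-line run with a non-zero exit status, which a Jenkins build step would treat as a failure. If warnings are preferred to a hard failure, the `then` branch could simply return `$broken` instead.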

@hcayless
Member

Here's an XQuery that finds <bibl> and <biblStruct> elements in the Bibliography (not in the Reading List portion) that are no longer linked to and should be candidates for updating or purging:

declare namespace t="http://www.tei-c.org/ns/1.0";
(: Bibliography entries outside the Reading List (BIB-RDG) that no @target/@source pointer refers to; those attributes may hold several space-separated pointers :)
for $bib in //t:div[@xml:id='BIB']//(t:bibl | t:biblStruct)[not(ancestor::t:div[@xml:id='BIB-RDG'])]/@xml:id
where not(//(@target | @source)[tokenize(., '\s+') = concat('#', $bib)])
return $bib/parent::*

@sydb
Member

sydb commented Feb 16, 2021

See #1476 (comment), related (mostly inasmuch as it provides evidence that yes, such a check is a good idea).

@martinascholger martinascholger removed this from the Guidelines 4.3.0 milestone Aug 31, 2021
@martindholmes
Contributor Author

Just ran the diagnostics XSL against a newly-built p5.xml and found the following:

Bad internal link:
target: #ISO24611

Bad @xml:lang values:
xml:lang: lat
xml:lang: lat
xml:lang: cornu
xml:lang: cornu

MIME types not listed in the IANA mime types list:
mimeType: audio/wav
mimeType: image/jpeg
mimeType: audio/wav
mimeType: image/gif
mimeType: application/x-musescore
mimeType: text/xsl

I'll check these out and fix if necessary.

@martindholmes
Contributor Author

I've fixed the first two lots, but the media types issue is a bit fraught; although the listed ones are not in the IANA media types listing page, they do show up in other locations, so I've left them alone. The "cornu" language value was wrong; "cornu" is a variant subtag meaning Cornish-English, and I'm pretty sure that the intent for the word in question was Middle Cornish, which has the subtag "cnx", so I've changed it to that.

For the broken pointer, I added a new item to the bibliography.

We can't close this ticket because we don't have a working automated solution yet, but we can consider it done for the upcoming GL release.

@ebeshero ebeshero modified the milestones: Guidelines 4.6.0, Guidelines 4.7.0 Apr 3, 2023
@raffazizzi raffazizzi modified the milestones: Guidelines 4.7.0, Guidelines 4.8.0 Nov 16, 2023
@martindholmes
Contributor Author

Just ran the diagnostic on a fresh p5.xml, and found these pointers to non-existent things:

target: #CMC_Mocoda2
target: #TEIcmc2012
target: #descmr

I'm going to check into those now.

@martindholmes
Contributor Author

Fixed two of them, and raised #2649 for a missing bibliography item.

@martindholmes
Contributor Author

@ebeshero I don't think this needs a release milestone, since it's not something to be included in a release of the Guidelines as such; it's a diagnostic test that we want to add to our build process.
