Commit
Merge pull request #2150 from sparklemotion/2130-libxml2-html-strict-mode

HTML "strict" mode parsing

---

**What problem is this PR intended to solve?**

There are quite a few things being addressed in this PR, but the most significant is consistently obeying the `recover` parse option for HTML documents.

As #2130 points out, the CRuby/libxml2 implementation was completely ignoring the `recover` parse option (also known as the `strict` option) for HTML documents. This PR makes the `recover` option behave identically for HTML documents and XML documents.

Relatedly, the JRuby implementation was applying the `recover` parse option incorrectly in HTML documents: it used `noerror` and `nowarning` to determine whether to raise an exception. This has been brought in line with the behavior described above.

Also related: the `EncodingReader` class, which detects encoding in HTML documents, was silently swallowing document syntax errors if they occurred in the first "chunk" read from an IO object. This has also been fixed.

This PR also introduces some smaller changes:

- make the JRuby and CRuby implementations consistent in how they handle comparing an `XML::Node` to an `XML::Document` using the `<=>` operator
- introduce minitest-reporters to the main test suite
- skip some irrelevant tests, and restructure others to be faster
- fix the annoying JRuby encoding test that fails on some Java JDKs
- eat away at the edges of code formatting in some of these rarely-touched files

**Have you included adequate test coverage?**

Yes.

**Does this change affect the behavior of either the C or the Java implementations?**

Yes, this changes the behavior of the `strict` (a.k.a. `norecover`) parse option when parsing HTML documents. I've chosen to classify this as a "bugfix that happens to change behavior that people may be depending upon" and so am introducing potentially breaking behavior into a minor release. Apologies to anyone who's inconvenienced by that.
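To make the intended rule concrete, here is a minimal, self-contained sketch of the decision described above: whether a syntax error raises depends only on the `recover` bit, not on `noerror` or `nowarning`. It is a toy model, not Nokogiri's implementation. `ToyParseOptions`, `ToySyntaxError`, `ToyDocument`, `toy_parse`, and the bracket-counting "error detection" are all hypothetical names invented for illustration; only the flag bit values are taken from libxml2's parser options.

```ruby
# Toy model of the `recover` rule. NOT Nokogiri's code: every name here
# is hypothetical, and the "parser" only counts angle brackets.
module ToyParseOptions
  RECOVER   = 1 << 0 # same bit value as libxml2's XML_PARSE_RECOVER
  NOERROR   = 1 << 5 # same bit value as libxml2's XML_PARSE_NOERROR
  NOWARNING = 1 << 6 # same bit value as libxml2's XML_PARSE_NOWARNING
end

class ToySyntaxError < StandardError; end

# Stand-in for a parsed document that remembers the errors it recovered from.
class ToyDocument
  attr_reader :errors

  def initialize(errors)
    @errors = errors
  end
end

def toy_parse(html, options)
  errors = []
  # Placeholder error detection (an assumption made for this sketch).
  errors << "unbalanced angle brackets" unless html.count("<") == html.count(">")

  # The rule from the PR description: only RECOVER decides whether to raise.
  # NOERROR and NOWARNING are deliberately not consulted here.
  recover = (options & ToyParseOptions::RECOVER) != 0
  raise ToySyntaxError, errors.first if !recover && errors.any?

  ToyDocument.new(errors)
end

malformed = "<html><body><p>oops</body></html"

# With RECOVER set, the error is recorded on the document instead of raised.
doc = toy_parse(malformed, ToyParseOptions::RECOVER | ToyParseOptions::NOERROR)
puts doc.errors.inspect

# Without RECOVER ("strict"), the same input raises, even with NOERROR set.
begin
  toy_parse(malformed, ToyParseOptions::NOERROR | ToyParseOptions::NOWARNING)
rescue ToySyntaxError => e
  puts "raised: #{e.message}"
end
```

In this model, setting `NOERROR | NOWARNING` without `RECOVER` still raises, which corresponds to the JRuby behavior change described in the commit message.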
Showing 15 changed files with 734 additions and 599 deletions.