Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doctest cannot be used to test markdown files. #116546

Open
MusicalNinjaDad opened this issue Mar 9, 2024 · 5 comments · May be fixed by #116547
Open

doctest cannot be used to test markdown files. #116546

MusicalNinjaDad opened this issue Mar 9, 2024 · 5 comments · May be fixed by #116547
Labels
type-feature A feature request or enhancement

Comments

@MusicalNinjaDad
Copy link

MusicalNinjaDad commented Mar 9, 2024

Feature or enhancement

Proposal:

While core python uses reStructuredText, many packages maintain their documentation in markdown.

IDEs such as vscode show docstrings in popups when using a function and render markdown within those popups.

Doctest provides an amazing framework to easily validate the quality of examples given in docstrings and documentation, it is a best-of-breed approach which encourages documentation as code by allowing documentation-testing. (In the opinion of the author - thank you for creating it!)

Unfortunately doctest is currently incompatible with markdown documentation (and docstrings) as it considers the end of a code block, signified by triple-ticks (```) to be part of the expected output. This creates a barrier to developers to implementing best practices as they are not able to test their documentation or docstrings without being tied to a specific form of markup (rst).

Proposal

I have raised a draft pull request implementing one possible, and very simple, approach to providing compatibility with markdown codeblocks.

If you believe that this is a change needing wider discussion, then I will happily open a thread under discuss.python.org. Given the small nature of the change (1 line of library code, 100 lines of test case, 10 lines of documentation), I am awaiting feedback on this issue & draft PR, in case it is considered a simple, non-breaking, adjustment which does not require significant, wider discussion.

Alternatives considered

  1. Creating a dedicated parser.
    • Pros: Could be provided as a module outside the standard library; able to identify start and end of code blocks and use this information in structuring the results.
    • Cons: doctest.testmod() would need to be adjusted to allow for an optional parser keyword argument analog doctest.testfile(); higher barrier to entry for users, need to install a separate package and specifically select a different parser.
  2. Test framework plugins (e.g. pytest-doctest-mkdocstrings).
    • Pros: Can be provided as a module outside the standard library
    • Cons: Requires monkeypatching internals of doctest - prone to future failures; only works when tests are run via a separate test framework
  3. Providing additional markdown parsing within doctest.DocTestParser
    • Pros: Additional structure to doctest results.
    • Cons: Significant additional complexity, goes against the current simplicity and agnostic nature of doctest.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@MusicalNinjaDad MusicalNinjaDad added the type-feature A feature request or enhancement label Mar 9, 2024
@sobolevn
Copy link
Member

sobolevn commented Mar 9, 2024

I would not say that Markdown is not supported, it is. I use Markdown and doctests in multiple projects. Here's just one example: https://github.com/wemake-services/inspect313

But, yes. You have to keep in mind that triple backticks are treated as an output. So, you have to add an empty line before the backticks: https://github.com/wemake-services/inspect313/blob/1ed3f77059199aa17db4be0fe713315bda87d101/README.md#L46-L48

@MusicalNinjaDad
Copy link
Author

MusicalNinjaDad commented Mar 9, 2024

Yes, you can workaround by adding a blank line at the end but that adds an easy source of error, looks ugly in the code and I've run into cases where the block gets rendered with blank line. Sorry, I didn't include this point in the original description - I felt it was already quite long for such a small change.

@oscarbenjamin
Copy link
Contributor

There are also downstream issues in pytest (pytest-dev/pytest#7374) and pytest-doctestplus (scientific-python/pytest-doctestplus#221). This behaviour is inherited by those libraries as well since they depend on the stdlib doctest module for collecting the doctest fragments. The SymPy in-house doctest runner removes these (https://github.com/sympy/sympy/blob/d91b8ad6d36a59a879cc70e5f4b379da5fdd46ce/sympy/testing/runtests.py#L1827-L1831).

It looks like doctest was not designed for markdown files. It doesn't know anything about markdown backticks and just uses a regex starting with PS1 and ending with either PS1 or a blank line:

cpython/Lib/doctest.py

Lines 608 to 624 in 027fa2e

# This regular expression is used to find doctest examples in a
# string. It defines three groups: `source` is the source code
# (including leading indentation and prompts); `indent` is the
# indentation of the first (PS1) line of the source code; and
# `want` is the expected output (including leading indentation).
_EXAMPLE_RE = re.compile(r'''
# Source consists of a PS1 line followed by zero or more PS2 lines.
(?P<source>
(?:^(?P<indent> [ ]*) >>> .*) # PS1 line
(?:\n [ ]* \.\.\. .*)*) # PS2 lines
\n?
# Want consists of any non-blank lines that do not start with PS1.
(?P<want> (?:(?![ ]*$) # Not a blank line
(?![ ]*>>>) # Not a line starting with PS1
.+$\n? # But any other line
)*)
''', re.MULTILINE | re.VERBOSE)

@MusicalNinjaDad
Copy link
Author

It looks like this is worthy of a discussion, as it's clearly not something that will "just be OK" but may be of use to the community nonetheless. I'll open one in the morning when I have sufficient time at the keyboard.
@oscarbenjamin - what do you think of the proposed change to adjust the regex to:

    _EXAMPLE_RE = re.compile(r'''
        # Source consists of a PS1 line followed by zero or more PS2 lines.
        (?P<source>
            (?:^(?P<indent> [ ]*) >>>    .*)    # PS1 line
            (?:\n           [ ]*  \.\.\. .*)*)  # PS2 lines
        \n?
        # Want consists of any non-blank lines that do not start with PS1.
        (?P<want> (?:(?![ ]*$)    # Not a blank line
                     (?![ ]*```)  # Not end of a code block
                     (?![ ]*>>>)  # Not a line starting with PS1
                     .+$\n?       # But any other line
                  )*)
        ''', re.MULTILINE | re.VERBOSE)

@oscarbenjamin
Copy link
Contributor

what do you think of the proposed change to adjust the regex to:

The obvious failure case would be if the test output includes triple backticks. That doesn't seem like a big deal to me but I might be wrong.

I imagine that this is best handled outside of the stdlib in the first instance though. See scientific-python/pytest-doctestplus#247.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants