Prevent leading whitespace in markdown code blocks from being stripped #2203

Open
peytondmurray wants to merge 3 commits into main from 1021-highlight-leading-whitespace

peytondmurray

This PR prevents leading whitespace from being stripped from markdown code blocks before they are parsed for highlighting. For certain languages, leading whitespace is syntactically significant. Fixes jupyter/nbviewer#1021.
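
As an illustration of the problem (not code from this PR), here is a minimal stdlib-only sketch. The `preprocess` function is a hypothetical stand-in that mimics what Pygments' `stripall` and `stripnl` lexer options are understood to do to the input text before tokenizing. It shows how stripping all surrounding whitespace loses the first line's indentation while later lines keep theirs.

```python
def preprocess(text: str, stripall: bool = False, stripnl: bool = True) -> str:
    """Hypothetical stand-in for a lexer's input preprocessing step."""
    if stripall:
        # Removes ALL leading/trailing whitespace, so the first line
        # loses its indentation.
        return text.strip()
    if stripnl:
        # Removes only leading/trailing newlines; indentation survives.
        return text.strip("\n")
    return text


block = "    if x:\n        y = 1\n"

stripped = preprocess(block, stripall=True)
kept = preprocess(block)

print(repr(stripped))
print(repr(kept))
```

In the stripped result the first line starts at column 0 while the second line is still indented by eight spaces, so the relative indentation of the block no longer matches what the author wrote.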

I went as far back as I could through the git history to figure out why the stripping was added, but it is already present in the earliest tag, 4.0.0. If there is a reason to strip leading whitespace, it isn't explicitly tested for in the test suite. If you have any context around this choice, I'd be interested in hearing about it.

I also added a test to check that whitespace isn't stripped from code blocks, and an ignore for the RUF001 rule (ambiguous unicode characters) in tests/exporters/test_html.py, because the test uses "ɩ", which is apparently a valid APL function.

@krassowski krassowski added the bug label Jan 6, 2025
@krassowski krassowski changed the title [BUG] Prevent leading whitespace in markdown code blocks from being stripped Prevent leading whitespace in markdown code blocks from being stripped Jan 6, 2025
@krassowski
Member

Does it also fix #2156?

Member

@krassowski krassowski left a comment

I think this change makes sense. Should we add an option that lets users opt in to the old behaviour, so that anyone who relied on it is not blocked from upgrading?

@peytondmurray
Author

Does it also fix #2156?

I don't think so. This PR only addresses code blocks inside markdown. The issue above ☝️ is about splitting statements across multiple code cells: for example, putting a single (empty) if, with, or while in one cell, and having indented code in later cells be treated as part of the scope of the first cell:

image

That is probably a discussion for that issue, but I'm not sure it makes sense to handle here. At the moment, each cell needs to be a syntactically complete snippet of Python code, which isn't the case in the example provided.
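
To make the "syntactically complete" point concrete, here is a rough sketch using only the standard library's `ast` module. The helper `is_complete` and the sample cells are made up for illustration and are not part of nbconvert. A bare `if` header and a lone indented body each fail to parse on their own, while the combined snippet parses fine.

```python
import ast


def is_complete(source: str) -> bool:
    """Return True if `source` parses as a standalone Python snippet."""
    try:
        ast.parse(source)
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


cells = [
    "if ready:\n",                  # header with no body
    "    print('go')\n",            # indented body with no header
    "if ready:\n    print('go')\n", # both together in one cell
]

results = [is_complete(cell) for cell in cells]
print(results)
```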

About your other suggestion: I like it. I'll add an option to pass arbitrary lexer options to the Pygments lexers now!

@peytondmurray peytondmurray force-pushed the 1021-highlight-leading-whitespace branch from 5450e00 to e4d3cf1 Compare January 23, 2025 22:22
@peytondmurray
Author

I've added a test for the lexer options. Users can revert to the old behavior by setting:

HTMLExporter(lexer_options={"stripall": True})

The option should be documented by traitlets, so I haven't included anything extra about reverting to the old behavior. If some additional documentation would be helpful, please let me know and I can add an explicit section about this to the docs.
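
For readers unfamiliar with the pattern, here is a minimal sketch of how an options dict like `lexer_options` could be forwarded to a lexer as keyword arguments. `StandInLexer` and `make_lexer` are hypothetical names invented for this example. They are not the nbconvert or Pygments implementation, and only the `stripall` option is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInLexer:
    """Hypothetical lexer that only models the `stripall` option."""

    stripall: bool = False

    def prepare(self, text: str) -> str:
        # Strip everything when asked; otherwise only surrounding newlines.
        return text.strip() if self.stripall else text.strip("\n")


def make_lexer(lexer_options: Optional[dict] = None) -> StandInLexer:
    # Forward user-supplied options as keyword arguments to the lexer.
    return StandInLexer(**(lexer_options or {}))


source = "  x = 1\n"
new_default = make_lexer().prepare(source)
old_behavior = make_lexer({"stripall": True}).prepare(source)

print(repr(new_default))
print(repr(old_behavior))
```

With no options the leading spaces are preserved, and passing `{"stripall": True}` restores the stripping, which mirrors the opt-in described above.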

Successfully merging this pull request may close these issues.

Leading whitespace removed in code blocks with syntax highlighting
2 participants