Make Vale's format support extensible #769
One way to do that is to work with Tree-sitter grammars the way Neovim does: it would allow you to analyze comments in every language, depending on the syntax or on so-called "injections".
+1 for tree-sitter: there are grammars for nearly every language, and tree-sitter makes it easy to design your own.
Many of Vale's supported programming languages now use Tree-sitter parsers. I think an interesting way to make this "extensible" would be to expose the underlying queries, allowing users to decide exactly which parts of the file are linted.
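For illustration, a user-supplied query might look something like the sketch below, assuming the exposed query language would be ordinary tree-sitter S-expressions (the node names come from the tree-sitter-python grammar):

```scheme
;; Lint comments plus module- and function-level docstrings,
;; rather than every string literal in the file.
(comment) @comment
(module . (expression_statement (string) @docstring))
(function_definition
  body: (block . (expression_statement (string) @docstring)))
```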
We use MkDocs to generate our documentation. One of the most common themes for MkDocs (and the one we use) is the Material theme, where line indentation can indicate many things other than a code block. Currently, Vale ignores content in an admonition, a longer footnote, a definition list with more than one paragraph, a nested annotation, or content tabs, to name a few common examples.

We have a convention of never using indentation in the Markdown to format a code block in the built HTML; we only use backtick fences. This matches Material's instructions, which don't even mention that indentation would work (presumably to avoid confusion with the many other ways it uses indentation).

If code ignoring were not only configurable, but separately configurable for indented blocks, fenced blocks, and code spans, that would solve the problem completely. But even if it were just possible to configure the whole code-ignore behavior as true or false, I could at least turn it off and then write a custom rule to ignore only the fences and/or code spans.
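To make that ambiguity concrete, here is a minimal sketch of the Material admonition syntax in question; the indented paragraph is prose to Material but looks like an indented code block to a plain CommonMark parser, so Vale skips it:

```markdown
!!! note "Release schedule"

    This indented paragraph is the admonition's body in Material for
    MkDocs, yet a CommonMark parser treats it as a four-space-indented
    code block.
```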
I have maybe a slightly different angle on something I think I'd like here. We use golden files for testing our CLI output, and I'm interested in having the ability to lint these to help us enforce a style guide. The output doesn't really conform to an existing format type, though it is vaguely Markdown-like. As a first pass, I've just mapped .golden to .md for Vale, which broadly works but doesn't really provide the kind of scoping I would hope for. Unfortunately, something like tree-sitter isn't going to either, since our output isn't really expressed in an existing format per se.

What I think I might want is the ability to write and integrate a script that converts from our format to a known-good, parseable format. I think this is more along the lines of what the asciidoc/asciidoctor support does (converting to HTML5, if I'm not mistaken). Is this an area of possible extension as well? In lieu of that, I can do the conversion and linting through scripting, but it wasn't apparent to me how I would do that with the LSP setup in my Neovim. If there is some more direct way to do this already available, I'd also be all ears.
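As an interim, scripting-only approach, one option is to convert the golden files and pipe the result to Vale over stdin; a minimal sketch, where `golden2md` stands in for a hypothetical converter you would write yourself:

```sh
# Convert a golden file to Markdown, then lint it from stdin,
# telling Vale to treat the piped content as a .md file.
golden2md testdata/output.golden | vale --ext='.md'
```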
I've put some work into a draft of support for JSON, YAML, and TOML -- a much-requested feature (#383, #412, #911, #922). As I've said in those issues, the challenge here is that having a syntactic-level understanding of the format isn't enough: we need to know which key(s) we want to lint and how to lint them. For example, consider a simple OpenAPI file:

```yaml
openapi: 3.0.0
info:
  title: Sample API
  description: Optional multiline or single-line description in [CommonMark](http://commonmark.org/help/) or HTML.
  version: 0.1.9
servers:
  - url: http://api.example.com/v1
    description: Optional server description, e.g. Main (production) server
  - url: http://staging-api.example.com
    description: Optional server description, e.g. Internal staging
```

In such a file, we might only want to lint the `description` values.

My solution here is a concept with the working title of "blueprints." A blueprint is a YAML file that allows you to specify an arbitrary subset of a file to lint using Dasel:

```yaml
# OpenAPI.yml
select:
  - name: Metadata
    query: info.description
  - name: Servers
    query: servers.all().description
```

You then reference this file in your `.vale.ini`:

```ini
[formats]
# Treat the extracted text as Markdown.
yml = md

[*.yml]
BasedOnStyles = Vale
Blueprint = OpenAPI
```

The end result is the ability to lint any keys you want, while maintaining location information and support for nested markup content (commonly HTML or Markdown).
^^ That looks excellent and exactly the kind of thing I would imagine using while working with an engineering team and trying to lint these files.
Different files may need different blueprints. For example, I might have OpenAPI YAML file(s) and a GitLab CI/CD YAML file in my project. Can these blueprints be applied at that level of granularity? e.g.,

```ini
[formats]
# Treat the extracted text as Markdown.
yml = md

[/api/petstore.yml]
BasedOnStyles = Vale
Blueprint = OpenAPI

[/.gitlab-ci.yml]
BasedOnStyles = Vale
Blueprint = GitLab-CI
```

If we wanted to be greedy, we might even ask for wildcard support (consider an OpenAPI spec that is located in multiple files) as well as multiple named files. For example:

```ini
[formats]
# Treat the extracted text as Markdown.
yml = md

[/api/**.yml]
; All YAML files under the API directory, including all subdirectories.
BasedOnStyles = Vale
Blueprint = OpenAPI

[{/.gitlab-ci.yml,/.gitlab/gitlab-ci-production.yml}]
; Our CI/CD config is located in two specific files.
BasedOnStyles = Vale
Blueprint = GitLab-CI
```
Yes, any form of typical glob syntax will work.

I've also changed the file format slightly, so that you can (1) assign a custom scope to any values you've extracted and (2) use tree-sitter queries on supported formats:

```yaml
# OpenAPI.yml
engine: dasel
steps:
  # This allows us to, for example, write rules specifically
  # targeting titles.
  - scope: title
    query: info.title
  - query: info.description
  - query: servers.all().description
```

```yaml
# Python.yml
engine: tree-sitter
# An example that changes how Python files are linted.
steps:
  - query: (comment)+ @comment
```
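For illustration, here is a minimal sketch of a rule that could target the custom `title` scope defined above, assuming Vale's existing `capitalization` extension point and per-rule `scope` key apply unchanged to blueprint-assigned scopes:

```yaml
# styles/MyStyle/TitleCase.yml (hypothetical rule file)
extends: capitalization
message: "'%s' should be in title case."
level: warning
# Match only text extracted by the blueprint step with `scope: title`.
scope: title
match: $title
```

That would let, say, API titles follow a different capitalization policy than the Markdown descriptions.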
@feasgal, @geemus: Both of those cases could be addressed by allowing the user to provide their own conversion script -- e.g., you could provide a Python script that uses PyMdown Extensions to ensure compatibility with MkDocs. I've avoided such a feature for two reasons: calling external scripts is slow, and it means that packages would then be able to execute arbitrary code. One idea would be to not allow such scripts to be included in a package, meaning users would have to install them independently (similar to how we handle Asciidoctor).
@jdkato That concern about external scripts makes good sense to me. Similarly, I think that for my purposes, having a way to, in effect, plug in my own conversion step would cover what I need.
@jdkato Thanks for investing the additional effort here. 🙇 The "blueprints" idea sounds very versatile, and it should be able to solve many of the use cases in the linked issues. 🙏 From what you wrote above, I assume some sort of wildcard syntax could be used in blueprints to ignore the keys in each file and lint only the values.
The working title of “blueprints” evokes building something up based on a template, which sounds like an additive process. If the idea is to extract something from whatever passes through it, you might consider calling these “filters”, since you'd effectively be filtering the file down to just the subset you want to lint.
You'll be able to use wildcard syntax in your blueprints. The concept itself involves writing queries that use keys to select specific values to lint; the keys themselves aren't part of the linting process, though. It's similar to using a selector tool such as `jq`: the query picks out values, and only those values are linted.
I agree, mostly. Part of the challenge with naming is not reusing terms from Vale's ever-growing list of features. In this case, "filters" are already used to describe expressions that allow you to report a subset of your configuration.
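For illustration, a blueprint along these lines might cover the "lint every value without naming the keys" case, assuming Dasel's `all()` selector can be applied to maps as well as lists (the `channels` key is just a placeholder):

```yaml
# Channels.yml (hypothetical blueprint)
engine: dasel
steps:
  # all() selects every value under `channels`, so individual keys
  # never have to be listed by name.
  - query: channels.all().description
```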
That should work well, as long as there's a way to use some sort of wildcard pattern in the blueprint to lint the values for all the keys without having to specify each key by name (which can change over time and may not be known).
Got it. Hadn't used those yet, so wasn't aware that term is already "taken". Makes sense.
See #688 (comment) for a related comment.