Make Vale's format support extensible #769

jdkato · 2024-02-15T16:25:28Z

See #688 (comment) for a related comment.

XVilka · 2024-03-15T13:08:25Z

One way to do that is to work with Tree-Sitter grammars the way NeoVim does—it will allow you to analyze comments in every language, depending on the syntax or so-called "injections".

weaversam8 · 2024-04-30T21:38:24Z

+1 for tree-sitter- there are grammars for nearly every language and tree-sitter makes it easy to design your own.

jdkato · 2024-06-27T05:33:15Z

Many of Vale's supported programming languages now use Tree-sitter parsers.

I think an interesting way to make this "extensible" would be to expose the underlying queries, allowing users to decide exactly what parts of the file is linted.

feasgal · 2024-09-10T19:07:25Z

+1000 for this issue. Here's my use case.

Vale ignores indented blocks, fenced blocks, and code spans by default. (docs)

We use MkDocs to generate our documentation. One of the most common themes for MkDocs (and the one we use) is the Material theme, where line indentation can indicate many things other than a code block. Currently, vale ignores content in an admonition, a longer footnote, a definition list with more than one paragraph, a nested annotation, or content tabs, to name a few common examples.

We have a convention never to use indentation in the markdown to format a code block in the built HTML, but only use backtick fences. This matches Material's instructions, which don't even mention that indentation would work (presumably to avoid confusion with the many other ways it uses indentation). If code ignoring were not only configurable, but separately configurable for indented blocks, fenced blocks, and code spans, that would solve the problem completely.

But even if it were just possible to configure the whole code ignore as true or false, I could at least turn it off and then write a custom rule to ignore only the fences and/or code spans.

geemus · 2024-11-14T15:56:38Z

I have maybe a slightly different angle of something I think I'd like here.

We use golden files for testing our CLI output and I'm interested in having the ability to lint these to help us enforce a style guide. The output doesn't really confirm to an existing format type, though it is vaguely markdown-like.

As a first pass, I've just mapped .golden to .md for vale, which broadly works but doesn't really provide the kind of scoping I would hope for. Unfortunately, something like treesitter isn't going to either since it isn't really something we can do in an existing format per se.

What I think I might want would be writing and integrating a script to convert from our format to a known-good/parseable format. I think for instance that this is more along the lines of what the asciidoc/asciidoctor format does (converting to HTML5 if I'm not mistaken). Is this an area of possible extension as well?

In lieu of that, I can do that conversion and linting through scripting, but it wasn't apparent to me how I would do that with lsp setup in my neovim. If there is some more direct way to do this already available I'd also be all ears.

jdkato · 2025-01-07T06:39:34Z

I've put some work into a draft of support for JSON, YAML, and TOML -- a much requested feature (#383, #412, #911, #922).

As I've said in those issues, the challenge here is that having a syntactic-level understanding of the format isn't enough: we need to know which key(s) we want to lint and how to lint them.

For example, consider a simple OpenAPI file:

openapi: 3.0.0
info:
  title: Sample API
  description: Optional multiline or single-line description in [CommonMark](http://commonmark.org/help/) or HTML.
  version: 0.1.9

servers:
  - url: http://api.example.com/v1
    description: Optional server description, e.g. Main (production) serrver
  - url: http://staging-api.example.com
    description: Optional server description, e.g. Internal staging

In such a file, we might only want to lint the title and description values while ignoring everything else. It's possible to achieve this with an external script that calls Vale at appropriate times, but maintaining location information is non-trivial.

My solution here is a concept with the working title of "blueprints." A blueprint is a YAML file that allows you specify an arbitrary subset of a file to lint using Dasel:

# OpenAPI.yml
select:
  - name: Metadata
    query: info.description

  - name: Servers
    query: servers.all().description

You then reference this file in your .vale.ini:

[formats]
# Treat the extracted text as Markdown.
yml = md

[*.yml]
BasedOnStyles = Vale

Blueprint = OpenAPI

The end result is the ability to lint any keys you want, while maintaining location information and support for nested markup content (commonly HTML or Markdown).

aireilly · 2025-01-07T11:11:36Z

^^ That looks excellent and exactly the kind of thing I would imagine using while working with an engineering team and trying to lint these files.

alecthegeek · 2025-01-07T20:30:08Z

Different files may need different blueprints.

For example I might have OpenAPI YAML file(s) and GitLab CI/Cd YAML file in my project.

Can these blueprints be applied at that level of granularity? e.g.

[formats]
# Treat the extracted text as Markdown.
yml = md

[/api/petstore.yml]
BasedOnStyles = Vale

Blueprint = OpenAPI

[/.gitlab-ci.yml]
BasedOnStyles = Vale

Blueprint = GitLab-CI

If we wanted to be greedy we might even ask for wildcard support (consider an OpenAPI spec that is in located in multiple files) as well as named multiple files. For example

[formats]
# Treat the extracted text as Markdown.
yml = md

[/api/**.yml]
; All yaml files under the API directory. Includes all subdirectories
BasedOnStyles = Vale

Blueprint = OpenAPI

[{/.gitlab-ci.yml,/.gitlab/gitlab-ci-production.yml}]
; Our CI/CD config is located in two specific files
BasedOnStyles = Vale

Blueprint = GitLab-CI

jdkato · 2025-01-08T00:35:54Z

Yes, any form of typical .vale.ini configuration will be supported.

I've also changed the file format slightly, so that you can (1) assign a custom scope to any values you've extracted and (2) use tree-sitter queries on supported formats:

# OpenAPI.yml
engine: dasel
steps:
    # This allows us to, for example, write rules specifically
    # targeting titles. 
  - scope: title
    query: info.title
  - query: info.description
  - query: servers.all().description

# Python.yml
engine: tree-sitter
# An example that changes how Python files are linted.
steps:
  - query: (comment)+ @comment

jdkato · 2025-01-08T00:43:14Z

@feasgal, @geemus: Both of those cases could be addressed by allowing the user to provide their own conversion script -- e.g., you could provide a Python script that uses PyMdown Extensions to ensure support with MkDocs.

I've avoided such a feature for two reasons: calling external scripts is slow and it means that packages would then be able to execute arbitrary code.

One idea would be to not allow such scripts to be included in a package, meaning users would have to install it independently (similar to how we handle rst2html and asciidoctor).

geemus · 2025-01-09T14:26:34Z

@jdkato That concern about external makes good sense to me. Similarly, I think for my purposes having a way to in-effect plug in my own asciidoctor-like script would be super helpful. Keeping it as something that the user would need to explicitly/distinctly do and not a default also seems quite sensible to mitigate security concerns.

infotexture · 2025-01-09T14:29:38Z

@jdkato Thanks for investing the additional effort here. 🙇

The "blueprints" idea sounds very versatile, and should be able to solve many of the use cases in the linked issues. 🙏

From what you wrote above, I assume some sort of wildcard syntax could be used in blueprints to ignore the keys in each "key": "value" pair, and only run on the values as discussed in #922.

infotexture · 2025-01-09T14:40:27Z

The working title of “blueprints” evokes building something up based on a template, which sounds like an additive process.

If the idea is to extract something from whatever passes through it, you might consider calling these “filters”, since you'd effectively be filtering out a subset of the file to lint.

jdkato · 2025-01-09T19:36:36Z

From what you wrote above, I assume some sort of wildcard syntax could be used in blueprints to ignore the keys in each "key": "value" pair, and only run on the values as discussed in #922.

You'll be able to use wildcard syntax in your .vale.ini to assign a blueprint to file(s).

The concept itself involves writing queries that use keys to select specific values to lint. The keys themselves aren't part of the linting process though.

It's similar to using jq, for those that are familiar.

The working title of “blueprints” evokes building something up based on a template, which sounds like an additive process.

If the idea is to extract something from whatever passes through it, you might consider calling these “filters”, since you'd effectively be filtering out a subset of the file to lint.

I agree mostly. Part of the challenge with naming is not reusing terms from Vale's ever growing list of features. In this case, "filters" are used to describe expressions that allow you to report a subset of your configuration,

infotexture · 2025-01-10T16:34:11Z

The concept itself involves writing queries that use keys to select specific values to lint.
The keys themselves aren't part of the linting process though.

That should work well, as long as there's a way to use some sort of wildcard pattern in the blueprint to lint the values for all the keys without having to specify each key by name (which can change over time and may not be known).

"filters" are used to describe expressions that allow you to report a subset of your configuration

Got it. Hadn't used those yet, so wasn't aware that term is already "taken". Makes sense.

jdkato added the Type: Enhancement label Feb 15, 2024

jdkato mentioned this issue Aug 25, 2024

lint more than comments in code #884

Closed

1 task

jdkato mentioned this issue Nov 13, 2024

Option to ignore keys in JSON/YAML #922

Closed

1 task

jdkato pinned this issue Jan 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Vale's format support extensible #769

Make Vale's format support extensible #769

jdkato commented Feb 15, 2024 •

edited

Loading

XVilka commented Mar 15, 2024

weaversam8 commented Apr 30, 2024

jdkato commented Jun 27, 2024

feasgal commented Sep 10, 2024

geemus commented Nov 14, 2024

jdkato commented Jan 7, 2025

aireilly commented Jan 7, 2025 •

edited

Loading

alecthegeek commented Jan 7, 2025 •

edited

Loading

jdkato commented Jan 8, 2025 •

edited

Loading

jdkato commented Jan 8, 2025 •

edited

Loading

geemus commented Jan 9, 2025

infotexture commented Jan 9, 2025

infotexture commented Jan 9, 2025

jdkato commented Jan 9, 2025

infotexture commented Jan 10, 2025

Make Vale's format support extensible #769

Make Vale's format support extensible #769

Comments

jdkato commented Feb 15, 2024 • edited Loading

XVilka commented Mar 15, 2024

weaversam8 commented Apr 30, 2024

jdkato commented Jun 27, 2024

feasgal commented Sep 10, 2024

geemus commented Nov 14, 2024

jdkato commented Jan 7, 2025

aireilly commented Jan 7, 2025 • edited Loading

alecthegeek commented Jan 7, 2025 • edited Loading

jdkato commented Jan 8, 2025 • edited Loading

jdkato commented Jan 8, 2025 • edited Loading

geemus commented Jan 9, 2025

infotexture commented Jan 9, 2025

infotexture commented Jan 9, 2025

jdkato commented Jan 9, 2025

infotexture commented Jan 10, 2025

jdkato commented Feb 15, 2024 •

edited

Loading

aireilly commented Jan 7, 2025 •

edited

Loading

alecthegeek commented Jan 7, 2025 •

edited

Loading

jdkato commented Jan 8, 2025 •

edited

Loading

jdkato commented Jan 8, 2025 •

edited

Loading