diff --git a/README.md b/README.md new file mode 100644 index 000000000..9a4064b23 --- /dev/null +++ b/README.md @@ -0,0 +1,28 @@ +# Rzk blog + +## Setup + +When setting up MkDocs for the first time, it is recommended to create a Python venv and activate it: +```bash +python -m venv venv +venv\Scripts\Activate.ps1 +``` + +The activation command will depend on the OS. The command shown above is for PowerShell on Windows. + +Then, install the requirements: +```bash +pip install -r docs/requirements.txt +``` + +## Running + +Activate the venv (Windows - PowerShell): +```bash +venv\Scripts\Activate.ps1 +``` + +Run the dev server: +```bash +mkdocs serve --config-file docs/config/en/mkdocs.yml +``` diff --git a/docs/docs/en/.authors.yml b/docs/docs/en/.authors.yml index 1981101f3..b96482b5c 100644 --- a/docs/docs/en/.authors.yml +++ b/docs/docs/en/.authors.yml @@ -4,3 +4,8 @@ authors: description: Random guy avatar: https://avatars.githubusercontent.com/u/686582 url: https://github.com/fizruk + aabounegm: + name: Abdelrahman Abounegm + description: Language Server maintainer + avatar: https://avatars.githubusercontent.com/u/11016151 + url: https://github.com/aabounegm diff --git a/docs/docs/en/posts/2023-12-24-formatter.rzk.md b/docs/docs/en/posts/2023-12-24-formatter.rzk.md new file mode 100644 index 000000000..9fdd44c8a --- /dev/null +++ b/docs/docs/en/posts/2023-12-24-formatter.rzk.md @@ -0,0 +1,254 @@ +--- +authors: + - aabounegm +date: 2023-12-24 +--- + +# Rzk Formatter + +Version 0.7.0 (and later 0.7.3 with bug fixes) of Rzk has recently been released, +and it brings with it an exciting new feature: the auto formatter! +It has been under work for over 3 weeks and came with unique design challenges +that I will attempt to unpack in this post. + + + +## Goal + +Basically, the target is to write a program that takes in Rzk code as input and +makes sure it abides by the [style guide](https://rzk-lang.github.io/sHoTT/STYLEGUIDE/) (developed mainly by [Fredrik Bakke](https://github.com/fredrik-bakke) and other contributors to the [sHoTT project](http://github.com/rzk-lang/sHoTT)) with minimal modification otherwise. +For example, comments should stay intact and semantics should be preserved. + +We decided to not support any customization (at least initially), making the +formatter heavily opinionated. + +## Challenges + +Designing a formatter has both technical and nontechnical challenges. +From the nontechnical side, and especially since the formatter is not customizable, +there is the challenge of having the formatting requirements formulated in a +precise enough language that can be translated easily into code, assuming there +is even consensus among the community on these requirements. + +From the technical perspective, the parser generator currently used (BNFC) loses +a lot of the information necessary for reconstructing the source program, so a +pretty-printer based on the parse result is not feasible (for example, all comments would be lost). +Even after this difficulty is overcome, there are still more challenges regarding the formatting itself. +For example, if a given line is too long and it is necessary to wrap it by +inserting a new line somewhere in the line, there are $O(2^n)$ possible choices +(where $n$ is the number of locations in the code where a new line could be inserted) +to compare between for inserting line breaks. +For example, consider the following Rzk code: +```rzk +#define id (A : U) (x : A) : A := x +``` +which has 11 positions where a new line can be inserted, as shown by these markers: +```rzk +#define id (A : U) (x : A) : A := x +-- ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ +``` +This amounts to 2048 combinations that have to be ranked for such a simple line. +Even the ranking itself is not a straightforward problem to tackle. + +Additionally, to properly indent each line, it is necessary to understand the +program semantics, which increases the complexity of the formatter's parser. + +## Possible approaches (and their tradeoffs) + +Multiple possible solutions exist for the above difficulties, each with their pros and cons. +The alternatives we considered for Rzk's formatter can be summarized as follows: + +1. Implement a [Prettier](https://prettier.io/) plugin +1. Reuse the existing parser generated by BNFC +1. Write a dedicated parser for the formatter specifically +1. Reuse the exiting lexer generated by BNFC + +### Prettier plugin + +Seeing as some formalization projects were already making use of Prettier, a plugin +to support Rzk seemed like a reasonable option. However, that option didn't seem +so attractive for a few reasons. +First of all, it would mean forcing all users to install Prettier, which requires +having the JavaScript toolchain (in particular, `npm`) installed, but not all Rzk +developers use JavaScript. It would be better to have an in-house solution. + +Additionally, [Prettier plugins](https://prettier.io/docs/en/plugins) can only be +written in JavaScript, which means having to maintain a separate parser written +from scratch in JS, which is not maintainable or even desirable. + +### BNFC's parser + +Rzk makes use of [BNFC](https://bnfc.digitalgrammars.com/) which produces a lexer, +parser, AST, and pretty-printer for a given _Labelled BNF_ grammar file. +The generated pretty-printer is not configurable, so it would take a lot of dirty +patching to have it follow the style guide of Rzk code. + +As for the _abstract syntax tree_ (AST) it generates, it is also not suitable +particularly for being too _abstract_. It loses all information about spaces, +parentheses, and basically any syntactic construction, which is useful for a +compiler or typechecker, but not so much for a formatter. + +### Dedicated parser + +An alternative to the generated parser would be to manually write another parser +that keeps all the information relevant to the formatter in a **concrete syntax tree** (CST). +This would be the most customizable option, but it would also be a lot of repeated +work (since we already have the generated parser) that is additionally more error-prone. + +Writing a parser manually is time-consuming and redundant since it can easily be +automatically generated from the much more readable grammar file. + +### BNFC's lexer + +Since BNFC generates a lexer that is used by the parser, a more rudimentary +approach would be to make use of that lexer for our formatter. +As the lexer simply tokenizes the input source code, it does not make any +decisions regarding what to throw away and keeps all syntax constructions (except comments). +Therefore, this approach would have the least amount of syntactic information lost +among the automatically generated tools. + +On the hand, working with a stream of tokens is not as easy as working with a parsed tree. +First, tokens do not carry almost any semantics with them, so it is difficult to +understand whether an identifier refers to a parameter, a function application, +or if it's even still being declared. +Also, without having semantics, it is difficult to judge out which line split +would make more sense when inserting line breaks in a long line, or to figure out +how much a given line should be indented. + +## Our chosen solution + +The approach that seemed the most suitable of the above was the last one: use BNFC's lexer. +After going through the style guide a few times, it was clear that the vast majority +of the rules/recommendations did not involve understanding much semantics. +For example, Unicode replacement for ASCII sequences, moving `:=` to a new line, +and adding a space after `\\` in a lambda term can all be applied blindly. + +For the few rules that require having some semantics (like distinguishing between `:` +for definition conclusion and for other types), it was enough to just store some +state that is preserved (and updated) while traversing the tokens; no additional +semantic analysis was necessary. + +The actual implementation in Haskell is just pattern matching on tokens with some +conditions and applying edits to the text in response. +For example, the following is an excerpt of the pattern match for opening parenthesis: + +```haskell +-- Ensure exactly one space after the first open paren of a line +go :: FormatState -> [Token] -> [FormattingEdit] +go s (Token "(" line col : tks) + | precededBySingleCharOnly && spacesAfter /= 1 && not isLastNonSpaceChar + = FormattingEdit line (col + 1) line (spaceColcol + 1 + spacesAfter) " " + : go (incParensDepth s) tks + -- Remove extra spaces if it's *not* the first open paren on a new line + | not precededBySingleCharOnly && spacesAfter > 0 + = FormattingEdit line (col + 1) line (col + 1 + spacesAfter) "" + : go (incParensDepth s) tks + | otherwise = go (incParensDepth s) tks + where + -- Redacted for brevity +``` +where `go` is a helper function containing the core logic used inside `format :: String -> String`. +It simply operates on the current state and the matched token(s) and returns a list +of edits to be applied to the text. The `FormattingEdit` structure provides compatibility +with LSP (for the VS Code extension) and is easy to reuse for the CLI. +Additionally, we try to make sure that these edits perform the minimal changes +needed so that nothing else is affected by mistake (such as comments). + +While this implementation is more error-prone than an auto-generated pretty-printer, +it is much more flexible and unit tests have been added that verify the various +rules to make sure no functionality gets broken by mistake. In fact, the unit tests +helped catch a bug with the formatter right as we thought we were ready to release it. + +### Implemented rules + +The formatter currently implements the following features: + +- Add a space after the opening parenthesis to help with the code tree structure +- Put the definition conclusion (type, starting with `:`) and construction (body, starting with `:=`) on new lines +- Add a space after the `\\` of a lambda term and around binary operators (like `,` and `=`) +- Move binary operators to the beginning of the next line if they appear at the end of the previous one +- Replace common ASCII sequences with their Unicode equivalent +- Removes extraneous spaces in some places like before `)` or between `#lang` and `rzk-1` (though it's not comprehensive yet) + +### Shortcomings + +No software is ever perfect, especially from the first release, and our formatter +is definitely no exception 🙂. + +For example, if the source code contained tab characters, instead of throwing its +hands up in the air and angrily shouting "I can't deal with this", the formatter +decides to silently take revenge and mess up your code by deleting random characters +around the tab character, but only every once in a while! +(I really wish I was joking 😅) + +Thankfully, the accompanying [VS Code extension](https://marketplace.visualstudio.com/items?itemName=NikolaiKudasovfizruk.rzk-1-experimental-highlighting) +configures some default settings to avoid this situation by always inserting spaces. + +Aside from bugs like this, there are also features that are simply not yet implemented. +For example, these include [inserting line breaks](https://github.com/rzk-lang/rzk/issues/147) +in very long lines and +[inferring the correct indentation level](https://github.com/rzk-lang/rzk/issues/148) a line should have. +For the latter, we decided that it might be too constraining to force a specific +indentation on the users when the line might be more readable using a different indentation, +and the formatter is not smart enough to decide when it should adjust indentation or leave it as is. + +Lastly, some desirable features have been identified as belonging more to the linter +(which we do not have yet) rather than the formatter, such as detecting unnecessary +parenthesis, deprecated syntax, and expressions that could be simplified. +The linter is currently not on the roadmap yet, but we hope to have it sometime in the future. + +## How to use the formatter + +The formatter is shipped as part of the `rzk` executable under the subcommand `format`. +It is also integrated in the language server and used by the VS Code extension. +Additionally, the GitHub Action now has an option to check the formatting as well. + +### CLI + +The command looks like so: +```bash +rzk format [--check] [--write] [file_path]... +``` + +It follows the convention of other formatters (Prettier and [Fourmolu](https://hackage.haskell.org/package/fourmolu), to name a couple) +in that usage with any flags (`rzk format file.rzk`) outputs the result to `stdout`, +using `--write` will modify the files in place, and `--check` is only for verifying +that the files are already well-formatted (and exits with a non-zero code if they aren't). +Naturally, `--write` and `--check` are mutually exclusive and it would be an error to specify both. + +If no file paths are given, then they will be read from `rzk.yaml` if it exists (just like with `typecheck`). + +### VS Code + +The accompanying release of the VS Code extension integrated formatter support +so you can use the built-in [`Format Document`](https://code.visualstudio.com/docs/editor/codebasics#_formatting) +command to format Rzk and Literate Rzk Markdown files. +It also comes with new default settings for Rzk file that ensure consistent indentation (2 spaces), +enable formatting on save, and add a ruler at 80 chars to help users identify +long lines since the formatter cannot wrap them just yet. + +The formatter can be turned off manually by setting the `rzk.format.enable` option to `false`. + +### GitHub Action + +A new option has been added to the [`rzk-action`](https://github.com/rzk-lang/rzk-action) +called `check-formatting` that, well, _checks_ the _formatting_ of the input files. +It is off by default, and cannot be used with `rzk-version` lower than `v0.7.1`. + +## Conclusion and Feedback + +This was an exciting milestone for the Rzk toolset and for me personally as I have +never written a similar piece of code before. +That was an interesting engineering and research challenge, and I enjoyed it all the way. + +Contributions have been made to popular Rzk formalization repos to autoformat +their content, such as [sHoTT](https://github.com/rzk-lang/sHoTT/pull/142) and the [HoTT Book](https://github.com/rzk-lang/hottbook/pull/16). + +I encourage everyone to use the formatter and help us by reporting any possible bugs +on our [GitHub issues](https://github.com/rzk-lang/rzk/issues) page (using the label `formatter`). +There also exists a [Zulip thread](https://rzk-lang.zulipchat.com/#narrow/stream/407112-general/topic/rzk.20v0.2E7.2E*.20with.20autoformatter.20.28experimental.29) +about the autoformatter, so feedback and discussions are welcome there as well. +Many thanks to [Fredrik Bakke](https://github.com/fredrik-bakke) for his helpful feedback +on the [sHoTT formatting PR](https://github.com/rzk-lang/sHoTT/pull/142)! + +See you in future blog posts :)