Add README #3

Closed
wants to merge 18 commits into from
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -0,0 +1 @@
* @bbannier
14 changes: 14 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,14 @@
version: 2
updates:
  - package-ecosystem: "cargo" # See documentation for possible values
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "npm" # See documentation for possible values
    directory: "/vscode"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions" # See documentation for possible values
    directory: "/"
    schedule:
      interval: "weekly"
2 changes: 1 addition & 1 deletion .gitmodules
@@ -1,3 +1,3 @@
[submodule "src"]
path = src
url = https://github.com/ckreibich/tree-sitter-zeek-src
url = https://github.com/zeek/tree-sitter-zeek-src
41 changes: 41 additions & 0 deletions README.md
@@ -0,0 +1,41 @@
# tree-sitter-zeek

[![Tests](https://github.com/zeek/tree-sitter-zeek/actions/workflows/test.yaml/badge.svg)](https://github.com/zeek/tree-sitter-zeek/actions/workflows/test.yaml)

A [Zeek](https://zeek.org) grammar for [tree-sitter](https://github.com/tree-sitter/tree-sitter).

## Background

This grammar parses scripts written in the [Zeek scripting
language](https://docs.zeek.org/en/master/script-reference/index.html).

The goal of this grammar is to facilitate tooling around Zeek
scripts. For that reason, its structure resembles Zeek's grammar but differs in
a number of ways. For example, it tracks newlines explicitly and relies more
strongly on precedence and associativity to resolve ambiguities. Like Zeek's
parser, this one currently doesn't distinguish symbols in much detail: for
example, the grammar features an `expr` rule that covers any kind of expression,
but the choices aren't currently broken down into, say, `addition_expr`,
`or_expr`, and similar rules.

## Usage

To use the generated parser directly (e.g. via any of tree-sitter's
[language bindings](https://tree-sitter.github.io/tree-sitter/#language-bindings)),
clone this repository recursively: we track the generated sources in a separate
[git repository](https://github.com/zeek/tree-sitter-zeek-src),
included here as the `src` submodule. You do not need the tree-sitter CLI
to use those sources in your tooling, but you'll likely want it
anyway to explore the parser. For example, `tree-sitter parse <script>`
produces the script's syntax tree, and `tree-sitter highlight <script>`
shows syntax-highlighted sources.

## Building the parser

* Install [tree-sitter](https://tree-sitter.github.io/tree-sitter/creating-parsers#installation) on your machine.
* Generate the parser: run `tree-sitter generate`.

## Testing

There's currently no `tree-sitter test` test suite. Instead, a test driver
clones the Zeek repository and runs the parser on every Zeek script in the distribution.
104 changes: 51 additions & 53 deletions grammar.js
@@ -22,60 +22,54 @@ module.exports = grammar({

rules: {
source_file: $ => seq(
repeat($.decl),
repeat($.stmt),
),

decl: $ => choice(
$.module_decl,
$.export_decl,
$.global_decl,
$.option_decl,
$.const_decl,
$.redef_decl,
$.redef_enum_decl,
$.redef_record_decl,
$.type_decl,
$.func_decl,
$.preproc,
repeat($._stmt),
),

module_decl: $ => seq('module', $.id, ';'),
export_decl: $ => seq('export', '{', repeat($.decl), '}'),
export_decl: $ => seq('export', '{', repeat($._stmt), '}'),

// A change here over Zeek's parser: we make the combo of init class
// and initializer jointly optional, instead of individually. Helps
// avoid ambiguity.
global_decl: $ => seq('global', $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';'),
var_decl: $ => seq(choice('global', 'local'), $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';'),
option_decl: $ => seq('option', $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';'),
const_decl: $ => seq('const', $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';'),
redef_decl: $ => seq('redef', $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';'),

redef_enum_decl: $ => seq('redef', 'enum', $.id, '+=', '{', $.enum_body, '}', ';'),
redef_enum_decl: $ => seq('redef', 'enum', $.id, '+=', '{', $._enum_body, '}', ';'),
redef_record_decl: $ => seq('redef', 'record', $.id, '+=', '{', repeat($.type_spec), '}', optional($.attr_list), ';'),
type_decl: $ => seq('type', $.id, ':', $.type, optional($.attr_list), ';'),
func_decl: $ => seq($.func_hdr, repeat($.preproc), $.func_body),

stmt: $ => choice(
_stmt: $ => choice(
$.module_decl,
$.export_decl,
$.option_decl,
$.redef_decl,
$.redef_enum_decl,
$.redef_record_decl,
$.type_decl,
$.func_decl,
$.hook_decl,
$.event_decl,
$.preproc,
// TODO: @no-test support
seq('{', optional($.stmt_list), '}'),
seq('print', $.expr_list, ';'),
seq('event', $.event_hdr, ';'),
prec_r(seq('if', '(', $.expr, ')', $.stmt, optional(seq('else', $.stmt)))),
prec_r(seq('if', '(', $.expr, ')', $._stmt, optional(seq('else', $._stmt)))),
seq('switch', $.expr, '{', optional($.case_list), '}'),
seq('for', '(', $.id, optional(seq(',', $.id)), 'in', $.expr, ')', $.stmt),
seq('for', '(', '[', list1($.id, ','), ']', optional(seq(',', $.id)), 'in', $.expr, ')', $.stmt),
seq('while', '(', $.expr, ')', $.stmt),
$.for,
seq('while', '(', $.expr, ')', $._stmt),
seq(choice('next', 'break', 'fallthrough'), ';'),
seq('return', optional($.expr), ';'),
seq(choice('add', 'delete'), $.expr, ';'),
seq('local', $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';'),
// Precedence here works around ambiguity with similar global declaration:
prec(-1, seq('const', $.id, optional(seq(':', $.type)), optional($.initializer), optional($.attr_list), ';')),
// List these here instead of with the other decls since they can also appear on stmt block scope.
$.var_decl,
$.const_decl,
// Associativity here works around theoretical ambiguity if "when" nested:
prec_r(seq(
optional('return'),
'when', optional($.capture_list), '(', $.expr, ')', $.stmt,
'when', optional($.capture_list), '(', $.expr, ')', $._stmt,
optional(seq('timeout', $.expr, '{', optional($.stmt_list), '}')),
)),
seq($.index_slice, '=', $.expr, ';'),
@@ -85,7 +79,12 @@ module.exports = grammar({
';',
),

stmt_list: $ => repeat1($.stmt),
for: $ => choice(
seq('for', '(', $.id, optional(seq(',', $.id)), 'in', $.expr, ')', $._stmt),
seq('for', '(', '[', list1($.id, ','), ']', optional(seq(',', $.id)), 'in', $.expr, ')', $._stmt)
),

stmt_list: $ => repeat1($._stmt),

case_list: $ => repeat1(
choice(
@@ -116,7 +115,7 @@ module.exports = grammar({
'timer',
seq('record', '{', repeat($.type_spec), '}'),
seq('union', '{', list1($.type, ','), '}'),
seq('enum', '{', $.enum_body, '}'),
seq('enum', '{', $._enum_body, '}'),
'list',
seq('list', 'of', $.type),
seq('vector', 'of', $.type),
@@ -129,7 +128,7 @@ module.exports = grammar({
$.id,
),

enum_body: $ => list1($.enum_body_elem, ',', true),
_enum_body: $ => list1($.enum_body_elem, ',', true),

enum_body_elem: $ => choice(
seq($.id, '=', $.constant, optional($.deprecated)),
@@ -152,21 +151,15 @@ module.exports = grammar({

initializer: $ => seq(
optional($.init_class),
$.init,
$.expr,
),

init_class: $ => prec_r(choice('=', '+=', '-=')),

init: $ => choice(
seq('{', '}'),
seq('{', repeat(seq($.expr, ',')), $.expr, '}'),
$.expr,
),

attr_list: $ => prec_l(repeat1($.attr)),

attr: $ => prec_l(choice(
'&broker_store_allow_complex_type',
'&broker_allow_complex_type',
'&deprecated',
'&error_handler',
'&is_assigned',
@@ -183,6 +176,7 @@ module.exports = grammar({
seq('&deprecated', '=', $.string),
seq('&delete_func', '=', $.expr),
seq('&expire_func', '=', $.expr),
seq('&group', '=', $.expr),
seq('&on_change', '=', $.expr),
seq('&priority', '=', $.expr),
seq('&read_expire', '=', $.expr),
@@ -195,7 +189,7 @@ module.exports = grammar({
expr: $ => choice(
prec_l(7, seq($.expr, '[', $.expr_list, ']')),
prec_l(7, seq($.expr, $.index_slice)),
prec_l(7, seq($.expr, '$', $.id)),
prec_l(7, $.field_access),

prec_r(6, seq('|', $.expr, '|')),
prec_r(6, seq('++', $.expr)),
@@ -239,10 +233,12 @@ module.exports = grammar({
prec(2, seq('$', $.id, $.begin_lambda, '=', $.func_body)),

prec_l(1, seq('[', optional($.expr_list), ']')),
prec_l(1, seq('{', optional($.expr_list), '}')),
prec_l(1, seq('record', '(', $.expr_list, ')')),
prec_l(1, seq('table', '(', optional($.expr_list), ')', optional($.attr_list))),
prec_l(1, seq('set', '(', optional($.expr_list), ')', optional($.attr_list))),
prec_l(1, seq('vector', '(', optional($.expr_list), ')')),
prec_l(1, seq($.id, '(', optional($.expr_list), ')')),
prec_l(1, seq($.expr, '(', optional($.expr_list), ')')),

$.id,
@@ -251,16 +247,19 @@ module.exports = grammar({

seq('(', $.expr, ')'),
seq('copy', '(', $.expr, ')'),
prec_r(seq('hook', $.expr)),
seq($.expr, '?$', $.id),
prec_r(seq('hook', $.id, '(', optional(list1($.expr, ',')), ')')),
$.field_check,
seq('schedule', $.expr, '{', $.event_hdr, '}'),
seq('function', $.begin_lambda, $.func_body),

// Lower precedence here to favor local-variable statements
prec_r(-1, seq('local', $.id, '=', $.expr)),
),

expr_list: $ => list1($.expr, ','),
field_access: $ => prec_l(seq($.expr, '$', $.id)),
field_check: $ => prec_l(seq($.expr, '?$', $.id)),

expr_list: $ => list1($.expr, ',', true),

constant: $ => choice(
// Associativity here resolves ambiguity with division
@@ -277,13 +276,12 @@ module.exports = grammar({
prec(-10, $.integer),
),

func_hdr: $ => choice($.func, $.hook, $.event),

// Precedences here are to avoid ambiguity with related expressions
func: $ => prec(1, seq('function', $.id, $.func_params, optional($.attr_list))),
hook: $ => prec(1, seq('hook', $.id, $.func_params, optional($.attr_list))),
event: $ => seq(optional('redef'), 'event', $.id, $.func_params, optional($.attr_list)),
func_decl: $ => prec(1, seq('function', $.id, $.func_params, optional($.attr_list), $._func_body)),
hook_decl: $ => prec(1, seq('hook', $.id, $.func_params, optional($.attr_list), $._func_body)),
event_decl: $ => seq(optional('redef'), 'event', $.id, $.func_params, optional($.attr_list), $._func_body),

_func_body: $ => seq(repeat($.preproc), $.func_body),
func_body: $ => seq('{', optional($.stmt_list), '}'),

// Precedence here is to disambiguate other interpretations of the colon
@@ -315,7 +313,7 @@ module.exports = grammar({

event_hdr: $ => seq($.id, '(', optional($.expr_list), ')'),

id: $ => /[A-Za-z_][A-Za-z_0-9]*(::[A-Za-z_][A-Za-z_0-9]*)*/,
id: () => /(([A-Za-z_][A-Za-z_0-9]*)?::)?[A-Za-z_][A-Za-z_0-9]*/,
file: $ => /[^ \t\r\n]+/,
pattern: $ => /\/((\\\/)?[^\r\n\/]?)*\/i?/,

@@ -347,8 +345,6 @@ module.exports = grammar({
// Plain string characters or escape sequences, wrapped in double-quotes.
string: $ => /"([^\\\r\n\"]|\\([^\r\n]|[0-7]+|x[0-9a-fA-F]+))*"/,

minor_comment: $ => /#[^#][^\r\n]*/,

// Zeekygen comments come in three flavors: a head one at the beginning
// of a script (##!), one that refers to the previous node (##<), and
// ones that refer to the subsequent one. Note that we skip the final
@@ -357,6 +353,8 @@ module.exports = grammar({
zeekygen_prev_comment: $ => /##<[^\r\n]*/,
zeekygen_next_comment: $ => /##[^\r\n]*/,

minor_comment: $ => /#[^\r\n]*/,

// We track newlines explicitly -- this gives us the ability to honor
// existing formatting in select places.
nl: $ => /\r?\n/,
@@ -365,9 +363,9 @@ module.exports = grammar({
'extras': $ => [
/[ \t]+/,
$.nl,
$.minor_comment,
$.zeekygen_head_comment,
$.zeekygen_prev_comment,
$.zeekygen_next_comment,
$.minor_comment,
],
});
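A note on the comment-token changes above: `minor_comment` now matches any `#...` line, so a Zeekygen comment such as `##< text` matches both `minor_comment` and a Zeekygen rule with the same length. Tree-sitter's documented tie-breakers for equally long regex tokens of equal precedence end with preferring the token that appears earlier in the grammar, which is presumably why `minor_comment` moved below the Zeekygen rules. The JavaScript sketch below is a simplified model of that choice (longest match first, then definition order), not tree-sitter's actual lexer: it ignores lexical precedence and context-aware lexing, and `lexComment`, `NEW_ORDER`, `MINOR_FIRST`, and `OLD_MINOR` are hypothetical names. `zeekygen_head_comment` is left out because its pattern is not visible in this diff.

```javascript
// Simplified model of how tree-sitter picks between comment tokens that can
// match at the same position: the longest match wins, and on a tie the rule
// listed first wins. Assumption: lexical precedence and context-aware lexing,
// which the real lexer also applies, are ignored here.
function lexComment(input, rules) {
  let best = null;
  for (const { name, pattern } of rules) {
    // The sticky flag ("y") anchors the match at index 0 of the input.
    const m = new RegExp(pattern.source, "y").exec(input);
    // Strict ">" keeps the earlier rule when two matches are equally long.
    if (m && (best === null || m[0].length > best.length)) {
      best = { name, length: m[0].length };
    }
  }
  return best;
}

// Comment token patterns from the new side of the grammar.js diff, in the
// order the new grammar defines them.
const NEW_ORDER = [
  { name: "zeekygen_prev_comment", pattern: /##<[^\r\n]*/ },
  { name: "zeekygen_next_comment", pattern: /##[^\r\n]*/ },
  { name: "minor_comment", pattern: /#[^\r\n]*/ },
];

// The same patterns, but with minor_comment listed first.
const MINOR_FIRST = [NEW_ORDER[2], NEW_ORDER[0], NEW_ORDER[1]];

// The old minor_comment pattern on its own.
const OLD_MINOR = [{ name: "minor_comment", pattern: /#[^#][^\r\n]*/ }];

for (const input of ["##< refers back", "## refers forward", "# plain"]) {
  console.log(input, "->", lexComment(input, NEW_ORDER).name, "/",
              lexComment(input, MINOR_FIRST).name);
}
console.log(lexComment("#\nprint 1;", OLD_MINOR));
```

With the new order, `##< refers back` lexes as `zeekygen_prev_comment`; listing `minor_comment` first lets it win the tie instead. The last call shows a side effect of the old `/#[^#][^\r\n]*/` pattern in this model: for a bare `#`, the `[^#]` class also matches the newline, so the match runs on through `print 1;` (10 characters).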
6 changes: 3 additions & 3 deletions queries/highlights.scm
@@ -1,9 +1,9 @@
;; Language features
;; -----------------

(event (id) @function)
(hook (id) @function)
(func (id) @function)
(event_decl (id) @function)
(hook_decl (id) @function)
(func_decl (id) @function)
(type) @type
(attr) @attribute

2 changes: 1 addition & 1 deletion src
Submodule src updated 2 files
+0 −5 README
+8 −0 README.md
59 changes: 59 additions & 0 deletions test/corpus/expressions
@@ -0,0 +1,59 @@
================================================================================
Function call
================================================================================

1;
f(1);
f(1) + f(1);

--------------------------------------------------------------------------------

(source_file
(nl)
(expr
(constant
(integer)))
(nl)
(expr
(id)
(expr_list
(expr
(constant
(integer)))))
(nl)
(expr
(expr
(id)
(expr_list
(expr
(constant
(integer)))))
(expr
(id)
(expr_list
(expr
(constant
(integer))))))
(nl))

================================================================================
Global IDs
================================================================================

global x: ::X;
global x: GLOBAL::X;

---

(source_file
(nl)
(var_decl
(id)
(type
(id)))
(nl)
(var_decl
(id)
(type
(id)))
(nl))
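The Global IDs case depends on the reworked `id` token in grammar.js. As a quick check outside tree-sitter, the sketch below runs the old and new `id` patterns through JavaScript's regex engine. The `^...$` anchors, the names `OLD_ID` and `NEW_ID`, and the sample names other than `::X` and `GLOBAL::X` are additions for this illustration; tree-sitter itself matches tokens at the current input position rather than against whole strings.

```javascript
// Old and new `id` token patterns from the grammar.js diff, anchored with
// ^...$ so that .test() checks whether the whole string is one identifier.
const OLD_ID = /^[A-Za-z_][A-Za-z_0-9]*(::[A-Za-z_][A-Za-z_0-9]*)*$/;
const NEW_ID = /^(([A-Za-z_][A-Za-z_0-9]*)?::)?[A-Za-z_][A-Za-z_0-9]*$/;

// "::X" and "GLOBAL::X" come from the Global IDs test; the others are
// illustrative names.
const samples = ["x", "::X", "GLOBAL::X", "Foo::Bar::baz"];

for (const name of samples) {
  console.log(`${name.padEnd(14)} old=${OLD_ID.test(name)} new=${NEW_ID.test(name)}`);
}
```

In this check the new pattern accepts the leading `::` of `::X`, which the old one rejects, but it allows at most one `::` qualifier, so `Foo::Bar::baz` no longer matches as a single identifier, whereas the old pattern accepted it.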