A set of parser for first order languages with many ways of writing connectives and several alternative grammars.
Most parsers take an object whose methods are factory functions that are expected to create representations of syntactic elements recognized by the parser or process them in another application-dependent way. The required set of factories depends on the parser.
The most basic sets consists of factories producing terms:
interface TermFactories<Term> {
variable: (symbol: string, ee: ErrorExpected) => Term,
constant: (symbol: string, ee: ErrorExpected) => Term,
functionApplication: (symbol: string, args: Array<Term>, ee: ErrorExpected) => Term
}
Parsers of formulas need term factories and factories producing the various cases of formulas:
interface FormulaFactories<Term, Formula>
extends TermFactories<Term> {
variable: (symbol: string, ee: ErrorExpected) => Term,
constant: (symbol: string, ee: ErrorExpected) => Term,
functionApplication: (symbol: string, args: Array<Term>, ee: ErrorExpected) => Term,
predicateAtom: (symbol: string, args: Array<Term>, ee: ErrorExpected) => Formula,
equalityAtom: (lhs: Term, rhs: Term, ee: ErrorExpected) => Formula,
negation: (subf: Formula, ee: ErrorExpected) => Formula,
conjunction: (lhs: Formula, rhs: Formula, ee: ErrorExpected) => Formula,
disjunction: (lhs: Formula, rhs: Formula, ee: ErrorExpected) => Formula,
implication: (lhs: Formula, rhs: Formula, ee: ErrorExpected) => Formula,
equivalence: (lhs: Formula, rhs: Formula, ee: ErrorExpected) => Formula,
existentialQuant: (variable: string, subf: Formula, ee: ErrorExpected) => Formula,
universalQuant: (variable: string, subf: Formula, ee: ErrorExpected) => Formula
}
Parsers of clauses require term factories and factories producing literals and clauses:
interface ClauseFactories<Term, Literal, Clause>
extends TermFactories<Term> {
literal: (negated: boolean, symbol: string, args: Array<Term>, ee: ErrorExpected) => Literal,
clause: (literals: Array<Literal>) => Clause,
}
Typically, each factory will construct a node of an abstract syntax tree representing the given case. However, factories can produce any output. For instance, they can directly generate canonical string representations of terms and formulas, or directly evaluate them, returning domain elements for terms and booleans for formulas, respectively.
Factories can use two callbacks to the parser provided in the ee
argument
to throw a syntax error.
interface ErrorExpected {
error: (message: string) => void,
expected: (expectation: string) => void
}
Throwing syntax errors is useful, e.g., if the number of arguments provided in a function application or a predicate atom differs from the symbol arity. This can be seen in the examples of parser usage below.
Note that ee.error(message)
generates a syntax error with message
while ee.expected(expectation)
generates a syntax error with the message
`Expected ${expectation} but "${actual_input}" found.`
.
The latter should be preferred.
Some parsers take an object whose methods should recognize the three types of non-logical symbols and the variables of a first-order language.
interface Language {
isConstant: (symbol: string) => boolean,
isFunction: (symbol: string) => boolean,
isPredicate: (symbol: string) => boolean,
isVariable: (symbol: string) => boolean,
}
When the parser encounters a possible non-logical or variable symbol, it uses these callbacks to determine its type. Note that the symbol's type actually influences the parser's decision about the kind of expression it is parsing. The sets of symbols of different types must therefore be pairwise disjoint.
The parsing function
function parseTerm<Term>(input: string, language: Language, factories: TermFactories<Term>): Term
recognizes basic first-order terms consisting of constants, variables, and prefix applications of function symbols to parenthesized lists of arguments which are again terms. No infix or postfix function symbols are supported.
The basic building blocks of first-order formulas are atoms, though no parser of atoms is exported. Unless stated otherwise below, formula parsers recognize predicate and equality atoms. Predicate atoms consist either of a sole nullary predicate symbol or of a prefix predicate symbol with a positive arity applied to a parenthesized lists of terms. Infix or postfix predicate symbols are not supported. Equality atoms are written with infix equality symbol.
The parsing function
function parseFormulaStrict<Term, Formula>(input: string,
language: Language, factories: FormulaFactories<Term, Formula>): Formula
recognizes first-order formulas with a rather strict syntax in which each binary subformula must be parenthesized, even at the top level. Atomic, negated, and quantified subformulas can be optionally parenthesized.
The function
function parseFormulaWithPrecedence<Term, Formula>(input: string,
language: Language, factories: FormulaFactories<Term, Formula>): Formula
recognizes first-order formulas with a relaxed syntax in which parentheses can be ommited according to precedence of syntactic operators, which is as follows:
- negation and quantification;
- conjunction – left-associative;
- disjunction – left-associative;
- implication and equivalence – both are right-associative with themselves,
implication is also right-associative with equivalence,
but equivalence is not associative with implication to its right;
i.e.,
A → B → C ↔︎ D ↔︎ E
is parsed asA → (B → (C ↔︎ (D ↔︎ E)))
, however,A ↔︎ B → C
is considered ambiguous and will not be parsed; it must be disambiguated toA ↔︎ (B → C)
or(A ↔︎ B) → C
.
The parser of clauses
function parseClause<Term, Literal, Clause>(input: string,
language: Language, factories: ClauseFactories<Term, Literal, Clause>): Clause
recognizes (possibly nested) disjunctions of
first-order predicate literals. Equality literals are not allowed.
Literals can be joined by any disjunction symbol (typically ∨
or |
)
or the comma (,
).
The empty clause can be given by several symbols (typically □
or []
).
The empty string is not considered the empty clause.
The empty clause cannot be used as a literal.
Parsers are typically used as follows:
Suppose that we have classes Literal
and Clause
defined in a module clauses.js
,
classes Variable
, Constant
, and FunctionApplication
defined in a module terms.js
,
and that we use the clause parser in a function
that obtains the set of symbols constants
,
and two maps of symbols to arities functions
and predicates
.
The clause parser is then set up and called as follows:
import {parseClause} from '@fmfi-uk-1-ain-412/js-fol-parser';
import {Constant, Variable, FunctionApplication} from "./terms.js";
import {Literal, Clause} from './clauses.js';
function usingTheClauseParser(constants, functions, predicates) {
...
const nonLogicalSymbols =
new Set([...constants, ...functions.keys(), ...predicates.keys()])
const language = {
isConstant: (symbol) =>
constants.has(symbol),
isFunction: (symbol) =>
functions.has(symbol),
isPredicate: (symbol) =>
predicates.has(symbol),
isVariable: (symbol) =>
!nonLogicalSymbols.has(symbol),
}
function checkArity(symbol, args, arityMap, {expected}) {
const a = arityMap.get(symbol);
if (args.length !== a) {
expected(`${a} argument${(a == 1 ? '' : 's')} to ${symbol}`);
}
}
const factories = {
variable: (symbol, _) =>
new Variable(symbol),
constant: (symbol, _) =>
new Constant(symbol),
functionApplication: (funSymbol, args, ee) => {
checkArity(funSymbol, args, functions, ee);
return new FunctionApplication(funSymbol, args);
},
literal: (negated, predSymbol, args, ee) => {
checkArity(predSymbol, args, predicates, ee);
return new Literal(negated, predSymbol, args);
},
clause: (literals, _) =>
new Clause(literals)
}
const clause = parseClause(input, language, factories);
...
}
Three auxiliary parsers recognize lists declaring non-logical symbols of a first-order language:
export interface SymbolWithArity {
name: string,
arity: number
}
function parseConstants(input: string): Array<string>;
function parseFunctions(input: string): Array<SymbolWithArity>;
function parsePredicates(input: string): Array<SymbolWithArity>;
Symbols declarations must be comma-separated.
A constant declaration is just the constant symbol.
A function or predicate symbol declaration
has the form symbol/arity
.
A function symbol's arity must be a positive decimal integer,
whereas a predicate symbol's arity must be a non-negative decimal integer.
It is up to the code that uses the parsers to check for and resolve repeated mutually inconstistent declarations of the same symbol.
The substitution parser
export function parseSubstitution<Term>(input: string,
language: Language, factories: TermFactories<Term>): Array<[string, Term]>;
accepts comma-separated lists of ordered pairs specifying a substitution,
i.e., a map from variables to terms.
The pairs can be written alternatively
as (var, term)
or var MAPS-TO term
,
where MAPS-TO
is one of ->
, |->
,
↦
(\u21A6
, RIGHTWARDS ARROW FROM BAR),
⟼
(\u27FC
, LONG RIGHTWARDS ARROW FROM BAR, maps to),
or \mapsto
.
It is up to the code that uses the parser to check for and resolve repeated mutually inconstistent pairs assigning different terms to the same variable.
The trio of parsers
function parseDomain(input: string): Array<string>;
function parseTuples(input: string): Array<Array<string>>;
function parseValuation(input: string, language: Language): Array<[string, string]>;
allows for processing parts of definitions of finite structures
and variable valuation.
A domain element is a non-empty string of arbitrary characters
except (
, )
, ,
, Unicode spaces, and
spacing ASCII control characters (\t, \n, \r, \v, \f).
parseDomain
accepts a comma-separated list of domain elements.
parseTuples
accepts a comma-separated list of domain elements
or ordered n-tuples of domain elements in the usual notation
(e1, e2, …,
en)
.
It is intended to parse interpretations of predicate and function symbols.
parseValuation
accepts a comma-separated list of ordered pairs
describing a valuation of variables.
The pairs can be written alternatively
as (var, domain-element)
or var MAPS-TO domain-element
,
where MAPS-TO
has been specified
in the section on substitutions.
This parser requires a language as its second argument
in order to recognize the symbols of variables in the input.
The parser recognizes valid JavaScript identifiers as function, predicate, and variable symbols.
Symbols of constants can also start with a digit (as defined by Unicode) and continue as a regular identifier.
The parser recognizes many alternative ways of writing logical symbols. The alternatives are listed below directly in the PEGjs syntax.
EqualitySymbol
"equality symbol"
= "="
/ "≐"
ConjunctionSymbol
"conjunction symbol"
= "∧"
/ "&&"
/ "&"
/ "/\\"
/ "\\land" ! IdentifierPart
/ "\\wedge" ! IdentifierPart
DisjunctionSymbol
"disjunction symbol"
= "∨"
/ "||"
/ "|"
/ "\\/"
/ "\\lor" ! IdentifierPart
/ "\\vee" ! IdentifierPart
ImplicationSymbol
"implication symbol"
= "→" / "⇒" / "⟶" / "⟹" / "⊃"
/ "->" / "=>" / "-->" / "==>"
/ "\\limpl" ! IdentifierPart
/ "\\implies" ! IdentifierPart
/ "\\rightarrow" ! IdentifierPart
/ "\\to" ! IdentifierPart
EquivalenceSymbol
"equivalence symbol"
= "↔︎" / "↔" / "⟷" / "⇔" / "⟺" / "≡"
/ "<->" / "<-->" / "<=>" / "<==>" / "==="
/ "\\lequiv" ! IdentifierPart
/ "\\leftrightarrow" ! IdentifierPart
/ "\\equivalent" ! IdentifierPart
/ "\\equiv" ! IdentifierPart
ExistsSymbol
"existential quantifier"
= "∃"
/ "\\e" ( "x" "ists"? )? ! IdentifierPart
/ "\\E" ! IdentifierPart
ForallSymbol
"universal quantifier"
= "∀"
/ "\\forall" ! IdentifierPart
/ "\\all" ! IdentifierPart
/ "\\a" ! IdentifierPart
/ "\\A" ! IdentifierPart
NegationSymbol
"negation symbol"
= "¬"
/ "-" / "!" / "~"
/ "\\neg" ! IdentifierPart
/ "\\lnot" ! IdentifierPart
NonEqualitySymbol
"non-equality symbol"
= "≠"
/ "!=" / "<>" / "/="
/ "\\neq" ! IdentifierPart
EmptyClause
"empty clause symbol"
= "□" / "▫︎" / "◽︎" / "◻︎" / "⬜︎" / "▢" / "⊥"
/ "[]" / "_|_"
/ "\\square" ! IdentifierPart
/ "\\Box" ! IdentifierPart
/ "\\qed" ! IdentifierPart