The Oak Parser Generator

Hello! Oak is a parser generator based on Parsing Expression Grammar (PEG). This project has been started to explore the idea of automatically inferring the types of the AST generated by parsing expressions. It is written as a procedural macro and can be embedded in your Rust code without complicating the build system.

Independently of your programming experience with parser generators, a first step is to read the Getting Started chapter. If you are new to parser generator or PEG, the chapter Learn Oak is a smooth tutorial to Oak for incrementally building a small language named Calc with arithmetic expressions and variable bindings. You can directly dive into the full grammar of Calc in the chapter Full Calc Grammar. If you want to learn about the Oak specificities, please go to the chapter Typing Expression. Finally, in the chapter Related Work, we compare Oak to existing parser generators and give some references and credits to papers or implementations that inspired the design of Oak.

The code is available on github.

Documentation

Syntax cheat sheet

e is a sub expression and T is the type of e. The types are only informative, it does not show unit propagation, more on that in Typing Expression. Greedy operators do not generate "backtracking points" and consume as many characters as possible.

Item Type Description
r = e Type of e Declare a rule named r parsing the expression e.
r: T = e T Declare a rule named r of type T parsing the expression e.
Expression Type Precedence level Description
"literal" (^) 0 Match a string literal.
. char 0 Match any single character.
["a-zA-Z-"] char 0 Match a character from one of the specified classes.
(e) T 0 Group an expression.
rule Type of rule 0 Call the rule with the name rule.
ident _ 0 Call an external parser with the name parse_ident and recognize_ident depending on the context.
.. StreamSpan::Output 0 Create the location information (span) of the expression following ..
e? Option<T> 1 (Greedy) Match zero or one e. Always succeed.
e* Vec<T> 1 (Greedy) Match zero or more e. Always succeed.
e+ Vec<T> 1 (Greedy) Match one or more e.
&e (^) 2 Try to match e and succeed if e succeeds. It does not consume any input.
!e (^) 2 Try to match e and succeed if e fails. It does not consume any input.
e1 e2 e3 (T1, T2, T3) 3 Match e1 e2 e3 in sequence. Immediately fails when one fails.
e > f Return type of f 4 Match e and if it succeeds, call f(v) where v is the value of e.
e: () () 4 Force the type of e to be ().
e: (^) (^) 4 Force the type of e to be (^).
e: T T 4 Force the type of e to be a Rust type T.
e1 / e2 / e3 Type of any e 5 Match e1 e2 e3 in sequence. Immediately succeeds when one succeeds.

Oak status

My goal is to propose a complete library to ease the development of Embedded Domain Specific Language (EDSL) in Rust with procedural macros. For the moment my priority is to stabilize and test Oak. Next I want to add more static analysis to prevent grammar design error such as in "=" / "==" (can you find what is wrong?) Here some other wanted features:

  • Extend the choice operator to handle erroneous cases (#30).
  • Bootstrap the grammar (#42).
  • Parametrize rules with other rules and arguments (#10, #12, #28).
  • ...