From 5265cb93699114e05fcea667597ea96cf3f4b2f4 Mon Sep 17 00:00:00 2001 From: Jed Barber Date: Tue, 2 Feb 2021 17:25:27 +1100 Subject: Packrat parser combinator article added --- project/templates/packrat.html | 157 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 project/templates/packrat.html (limited to 'project/templates/packrat.html') diff --git a/project/templates/packrat.html b/project/templates/packrat.html new file mode 100644 index 0000000..117c69a --- /dev/null +++ b/project/templates/packrat.html @@ -0,0 +1,157 @@ + +{% extends "base.html" %} + + + +{% block title %}Packrat Parser Combinator Library{% endblock %} + + + +{% block content %} + + +

Packrat Parser Combinator Library

+ +

Git repository: Link
+Paper this was based on: Link

+ +
2/2/2021
+ + +
Overview
+ +

Parser combinators are what you end up with when you start factoring out common pieces of functionality from +recursive descent parsing. They are higher order +functions that can be combined in modular ways to create a desired parser.

+ +

However they also inherit the drawbacks of recursive descent parsing, and in particular recursive descent +parsing with backtracking. If the grammar that the parser is designed to accept contains left recursion then +the parser will loop infinitely. If the grammar is ambiguous then only one result will be obtained. And any result +may require exponential time and space to calculate.

+ +

This library, based on the paper linked at the top, solves all those problems and a few bits more. As an +example, the following grammar portion:

+ +
+
+Expression ::= Expression - Term | Term
+Term ::= Term * Factor | Factor
+Factor ::= ( Expression ) | Integer
+
+
+ +

Can be turned into the following code snippet:

+ +
+ +package Expr_Redir is new Redirect; +package Term_Redir is new Redirect; + +function Left_Parens is new Match ('('); +function Right_Parens is new Match (')'); +function Fac_Expr is new Between (Left_Parens, Expr_Redir.Call, Right_Parens); +function Fac_Choice is new Choice_2 (Fac_Expr, Integer_Num); +function Factor is new Stamp (Factor_Label, Fac_Choice); + +function Multiply is new Match ('*'); +function Term_Mult is new Sequence (Term_Redir.Call'Access, Multiply'Access, Factor'Access); +function Term_Choice is new Choice_2 (Term_Mult, Factor); +function Term is new Stamp (Term_Label, Term_Choice); + +function Minus is new Match ('-'); +function Expr_Minus is new Sequence (Expr_Redir.Call'Access, Minus'Access, Term'Access); +function Expr_Choice is new Choice_2 (Expr_Minus, Term); +function Expression is new Stamp (Expression_Label, Expr_Choice); + +
+ +

Most of the verbosity is caused by the need to individually instantiate each combinator, as generics are +used to serve the same purpose as higher order functions. Some bits are also omitted, such as the label +enumeration and the actual setting of the redirectors. But the above should provide a good general +impression.

+ + +
Features
+ +

A list of features that this library provides includes:

+ + +

More thorough documentation is provided in the /doc directory.

+ +

The name of the library comes from packrat parsing +which is a parsing algorithm that avoids exponential time complexity by memoizing all intermediate results. As that +is what this library does, both so as to reduce the time complexity and to enable piecewise parsing, the name +seemed appropriate.

+ + +
Left Recursion
+ +

Direct left recursion, meaning a grammar non-terminal that immediately recurses to itself on the left as in +the Expression or Term used above, is fairly easy to handle. A counter is used to keep track +of how many times a particular combinator has been applied to the input at a particular position, and when that +counter exceeds the number of unconsumed input tokens plus one the result is curtailed. This is explained on +pages 7 and 8 of the paper.

+ +

The really thorny issue that caused the most problems with this library is indirect left recursion. This is +when a non-terminal recurses to itself on the left only after first evaluating to one or more other non-terminals. +Curtailment in these circumstances can easily cause those other non-terminals to also be curtailed, and reusing +the results for those other non-terminals may be incorrect. This issue along with a proposed solution is +explained on page 9 of the paper. However that solution was not as clear as would be preferred. So some minor +rephrasing and reworking was needed.

+ +

Bringing this problem back to the start: What are we really doing when we curtail a result due to left +recursion? It is not a matter of cutting off branches in a parse tree. We are identifying conditions where the +parse result of a particular non-terminal can be calculated without further recursion. The word "curtailment" is +somewhat misleading in this regard. Once this reframing is done then a clearer view immediately follows.

+ +

What is the condition? Exactly as described above for direct left recursion. Through comparing recursion counts +with the amount of unconsumed input we determine that a result of no successful parse can be calculated, and that +the result is valid for reuse for any deeper recursion of the same combinator at that input position.

+ +

From that can be derived:

+ + +

So far the above list just covers what is in the paper. But there is a little more that can be inferred:

+ + +

These two details should constitute a minor efficiency improvement.

+ + +
Further Work
+ +

While the polynomial complexity of this library has been experimentally confirmed, no effort has yet been +made to prove that it is actually polynomial in the same way that the parser combinators in the paper are. It +is possible that due to the changes involved with using a non-functional language and enabling piecewise +parsing that some subtle complexity difference may have arisen.

+ +

Likewise, the piecewise parsing has been unit tested to a degree but no formal proof that it is correct has +been done.

+ +

Ideas like being able to modify and resume an erroneous parsing attempt would also be interesting to explore.

+ +

Finally, the plan is to actually use this library for something significant at some point in the future.

+ + +{% endblock %} + -- cgit