From 14025d22ce3d66c9d235e57221ec4653e00f972c Mon Sep 17 00:00:00 2001 From: Jedidiah Barber Date: Fri, 26 Nov 2021 20:17:43 +1300 Subject: Switched to .xhtml extension, fixed some minor bugs --- project/templates/packrat.html | 167 ----------------------------------------- 1 file changed, 167 deletions(-) delete mode 100644 project/templates/packrat.html (limited to 'project/templates/packrat.html') diff --git a/project/templates/packrat.html b/project/templates/packrat.html deleted file mode 100644 index 4ae1c3e..0000000 --- a/project/templates/packrat.html +++ /dev/null @@ -1,167 +0,0 @@ - -{%- extends "base.html" -%} - - - -{%- block title -%}Packrat Parser Combinator Library{%- endblock -%} - - - -{%- block content %} -

Packrat Parser Combinator Library

- -

Git repository: Link
-Paper this was based on: Link

- -
2/2/2021
- - -
Overview
- -

Parser combinators are what you end up with when you start factoring out common pieces of -functionality from -recursive descent parsing. They are higher order functions that can be combined in modular ways -to create a desired parser.

- -

However they also inherit the drawbacks of recursive descent parsing, and in particular recursive -descent parsing with backtracking. If the grammar that the parser is designed to accept contains -left recursion then the parser will loop infinitely. If the grammar is ambiguous then only one -result will be obtained. And any result may require exponential time and space to calculate.

- -

This library, based on the paper linked at the top, solves all those problems and a few bits -more. As an example, the following grammar portion:

- -
-
-Expression ::= Expression - Term | Term
-Term ::= Term * Factor | Factor
-Factor ::= ( Expression ) | Integer
-
-
- -

Can be turned into the following code snippet:

- -
- -package Expr_Redir is new Redirect; -package Term_Redir is new Redirect; - -function Left_Parens is new Match ('('); -function Right_Parens is new Match (')'); -function Fac_Expr is new Between (Left_Parens, Expr_Redir.Call, Right_Parens); -function Fac_Choice is new Choice_2 (Fac_Expr, Integer_Num); -function Factor is new Stamp (Factor_Label, Fac_Choice); - -function Multiply is new Match ('*'); -function Term_Mult is new Sequence (Term_Redir.Call'Access, Multiply'Access, Factor'Access); -function Term_Choice is new Choice_2 (Term_Mult, Factor); -function Term is new Stamp (Term_Label, Term_Choice); - -function Minus is new Match ('-'); -function Expr_Minus is new Sequence (Expr_Redir.Call'Access, Minus'Access, Term'Access); -function Expr_Choice is new Choice_2 (Expr_Minus, Term); -function Expression is new Stamp (Expression_Label, Expr_Choice); - -
- -

Most of the verbosity is caused by the need to individually instantiate each combinator, as -generics are used to serve the same purpose as higher order functions. Some bits are also omitted, -such as the label enumeration and the actual setting of the redirectors. But the above should -provide a good general impression.

- - -
Features
- -

A list of features that this library provides includes:

- - -

More thorough documentation is provided in the /doc directory.

- -

The name of the library comes from packrat parsing which is a parsing algorithm that avoids exponential time -complexity by memoizing all intermediate results. As that is what this library does, both so as to -reduce the time complexity and to enable piecewise parsing, the name seemed appropriate.

- - -
Left Recursion
- -

Direct left recursion, meaning a grammar non-terminal that immediately recurses to itself on the -left as in the Expression or Term used above, is fairly easy to handle. A counter -is used to keep track of how many times a particular combinator has been applied to the input at a -particular position, and when that counter exceeds the number of unconsumed input tokens plus one -the result is curtailed. This is explained on pages 7 and 8 of the paper.

- -

The really thorny issue that caused the most problems with this library is indirect left -recursion. This is when a non-terminal recurses to itself on the left only after first evaluating to -one or more other non-terminals. Curtailment in these circumstances can easily cause those other -non-terminals to also be curtailed, and reusing the results for those other non-terminals may be -incorrect. This issue along with a proposed solution is explained on page 9 of the paper. However -that solution was not as clear as would be preferred. So some minor rephrasing and reworking was -needed.

- -

Bringing this problem back to the start: What are we really doing when we curtail a result due to -left recursion? It is not a matter of cutting off branches in a parse tree. We are identifying -conditions where the parse result of a particular non-terminal can be calculated without further -recursion. The word "curtailment" is somewhat misleading in this regard. Once this reframing is done -then a clearer view immediately follows.

- -

What is the condition? Exactly as described above for direct left recursion. Through comparing -recursion counts with the amount of unconsumed input we determine that a result of no successful -parse can be calculated, and that the result is valid for reuse for any deeper recursion of the same -combinator at that input position.

- -

From that can be derived:

- - -

So far the above list just covers what is in the paper. But there is a little more that can be -inferred:

- - -

These two details should constitute a minor efficiency improvement.

- - -
Further Work
- -

While the polynomial complexity of this library has been experimentally confirmed, no effort has -yet been made to prove that it is actually polynomial in the same way that the parser combinators in -the paper are. It is possible that due to the changes involved with using a non-functional language -and enabling piecewise parsing that some subtle complexity difference may have arisen.

- -

Likewise, the piecewise parsing has been unit tested to a degree but no formal proof that it is -correct has been done.

- -

Ideas like being able to modify and resume an erroneous parsing attempt would also be interesting -to explore.

- -

Finally, the plan is to actually use this library for something significant at some point in the -future.

-{% endblock -%} - -- cgit