summaryrefslogtreecommitdiff
path: root/project/templates/packrat.html
diff options
context:
space:
mode:
Diffstat (limited to 'project/templates/packrat.html')
-rw-r--r--project/templates/packrat.html167
1 files changed, 0 insertions, 167 deletions
diff --git a/project/templates/packrat.html b/project/templates/packrat.html
deleted file mode 100644
index 4ae1c3e..0000000
--- a/project/templates/packrat.html
+++ /dev/null
@@ -1,167 +0,0 @@
-
-{%- extends "base.html" -%}
-
-
-
-{%- block title -%}Packrat Parser Combinator Library{%- endblock -%}
-
-
-
-{%- block content %}
-<h4>Packrat Parser Combinator Library</h4>
-
-<p>Git repository: <a href="/cgi-bin/cgit.cgi/packrat">Link</a><br />
-Paper this was based on: <a href="http://richard.myweb.cs.uwindsor.ca/PUBLICATIONS/PREPRINT_PADL_NOV_07.pdf"
-class="external">Link</a></p>
-
-<h5>2/2/2021</h5>
-
-
-<h5>Overview</h5>
-
-<p>Parser combinators are what you end up with when you start factoring out common pieces of
-functionality from <a href="http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm" class="external">
-recursive descent parsing</a>. They are higher order functions that can be combined in modular ways
-to create a desired parser.</p>
-
-<p>However they also inherit the drawbacks of recursive descent parsing, and in particular recursive
-descent parsing with backtracking. If the grammar that the parser is designed to accept contains
-left recursion then the parser will loop infinitely. If the grammar is ambiguous then only one
-result will be obtained. And any result may require exponential time and space to calculate.</p>
-
-<p>This library, based on the paper linked at the top, solves all those problems and a few bits
-more. As an example, the following grammar portion:</p>
-
-<div class="precontain">
-<pre>
-Expression ::= Expression - Term | Term
-Term ::= Term * Factor | Factor
-Factor ::= ( Expression ) | Integer
-</pre>
-</div>
-
-<p>Can be turned into the following code snippet:</p>
-
-<div class="precontain">
-<code>
-package Expr_Redir is new Redirect;
-package Term_Redir is new Redirect;
-
-function Left_Parens is new Match ('(');
-function Right_Parens is new Match (')');
-function Fac_Expr is new Between (Left_Parens, Expr_Redir.Call, Right_Parens);
-function Fac_Choice is new Choice_2 (Fac_Expr, Integer_Num);
-function Factor is new Stamp (Factor_Label, Fac_Choice);
-
-function Multiply is new Match ('*');
-function Term_Mult is new Sequence (Term_Redir.Call'Access, Multiply'Access, Factor'Access);
-function Term_Choice is new Choice_2 (Term_Mult, Factor);
-function Term is new Stamp (Term_Label, Term_Choice);
-
-function Minus is new Match ('-');
-function Expr_Minus is new Sequence (Expr_Redir.Call'Access, Minus'Access, Term'Access);
-function Expr_Choice is new Choice_2 (Expr_Minus, Term);
-function Expression is new Stamp (Expression_Label, Expr_Choice);
-</code>
-</div>
-
-<p>Most of the verbosity is caused by the need to individually instantiate each combinator, as
-generics are used to serve the same purpose as higher order functions. Some bits are also omitted,
-such as the label enumeration and the actual setting of the redirectors. But the above should
-provide a good general impression.</p>
-
-
-<h5>Features</h5>
-
-<p>A list of features that this library provides includes:</p>
-<ul>
- <li>Higher order combinator functions in Ada, a language that does not support functional
- programming</li>
- <li>Both parser combinators and simpler lexer combinators are available</li>
- <li>Input can be any array, whether that is text strings or otherwise</li>
- <li>Left recursive grammars are parsed correctly with no infinite loops</li>
- <li>Ambiguity is handled by incorporating all possible valid options into the resulting parse tree
- </li>
- <li>Parsing and lexing can both be done piecewise, providing input in successive parts instead of
- all at once</li>
- <li>Error messages are generated when applicable that note what would have been needed and where
- for a successful parse</li>
- <li>All of the above is accomplished in polynomial worst case time and space complexity</li>
-</ul>
-
-<p>More thorough documentation is provided in the <em>/doc</em> directory.</p>
-
-<p>The name of the library comes from <a href="https://bford.info/pub/lang/packrat-icfp02.pdf"
-class="external">packrat parsing</a> which is a parsing algorithm that avoids exponential time
-complexity by memoizing all intermediate results. As that is what this library does, both so as to
-reduce the time complexity and to enable piecewise parsing, the name seemed appropriate.</p>
-
-
-<h5>Left Recursion</h5>
-
-<p>Direct left recursion, meaning a grammar non-terminal that immediately recurses to itself on the
-left as in the <em>Expression</em> or <em>Term</em> used above, is fairly easy to handle. A counter
-is used to keep track of how many times a particular combinator has been applied to the input at a
-particular position, and when that counter exceeds the number of unconsumed input tokens plus one
-the result is curtailed. This is explained on pages 7 and 8 of the paper.</p>
-
-<p>The really thorny issue that caused the most problems with this library is indirect left
-recursion. This is when a non-terminal recurses to itself on the left only after first evaluating to
-one or more other non-terminals. Curtailment in these circumstances can easily cause those other
-non-terminals to also be curtailed, and reusing the results for those other non-terminals may be
-incorrect. This issue along with a proposed solution is explained on page 9 of the paper. However
-that solution was not as clear as would be preferred. So some minor rephrasing and reworking was
-needed.</p>
-
-<p>Bringing this problem back to the start: What are we really doing when we curtail a result due to
-left recursion? It is not a matter of cutting off branches in a parse tree. We are identifying
-conditions where the parse result of a particular non-terminal can be calculated without further
-recursion. The word "curtailment" is somewhat misleading in this regard. Once this reframing is done
-then a clearer view immediately follows.</p>
-
-<p>What is the condition? Exactly as described above for direct left recursion. Through comparing
-recursion counts with the amount of unconsumed input we determine that a result of no successful
-parse can be calculated, and that the result is valid for reuse for any deeper recursion of the same
-combinator at that input position.</p>
-
-<p>From that can be derived:</p>
-<ul>
- <li>When merging two results that have different left recursion count conditions for the same
- non-terminal, the larger count should be used</li>
- <li>Conditions of subresults should also be made part of any result that includes those subresults
- </li>
- <li>Any memoized result is only reusable if all the recursion count conditions of the stored
- result are less than or equal to the recursion counts for the current input position</li>
-</ul>
-
-<p>So far the above list just covers what is in the paper. But there is a little more that can be
-inferred:</p>
-<ul>
- <li>If a result is not reusable and a new result is calculated, then the recursion count
- conditions of the old result should be updated to the recursion counts at the current position and
- applied to the new result</li>
- <li>When the recursion count of a condition applied to a result plus the number of unconsumed
- input tokens after the result is less than the number of input tokens available at the beginning
- of the result, then that condition can be omitted from the result</li>
-</ul>
-
-<p>These two details should constitute a minor efficiency improvement.</p>
-
-
-<h5>Further Work</h5>
-
-<p>While the polynomial complexity of this library has been experimentally confirmed, no effort has
-yet been made to prove that it is actually polynomial in the same way that the parser combinators in
-the paper are. It is possible that due to the changes involved with using a non-functional language
-and enabling piecewise parsing that some subtle complexity difference may have arisen.</p>
-
-<p>Likewise, the piecewise parsing has been unit tested to a degree but no formal proof that it is
-correct has been done.</p>
-
-<p>Ideas like being able to modify and resume an erroneous parsing attempt would also be interesting
-to explore.</p>
-
-<p>Finally, the plan is to actually use this library for something significant at some point in the
-future.</p>
-{% endblock -%}
-