From e59ca4a3eaa53d66fb2dcd3ddbdd86d99b04b7c8 Mon Sep 17 00:00:00 2001
From: Jed Barber Git repository: Link Git repository: LinkPackrat Parser Combinator Library
-
+
Paper this was based on: Link2/2/2021
@@ -20,17 +20,16 @@ Paper this was based on: recursive descent parsing. They are higher order
-functions that can be combined in modular ways to create a desired parser.
Parser combinators are what you end up with when you start factoring out common pieces of +functionality from recursive descent parsing. They are higher order functions that can be combined in modular ways to create a desired parser.
-However they also inherit the drawbacks of recursive descent parsing, and in particular recursive descent -parsing with backtracking. If the grammar that the parser is designed to accept contains left recursion then -the parser will loop infinitely. If the grammar is ambiguous then only one result will be obtained. And any result -may require exponential time and space to calculate.
+However they also inherit the drawbacks of recursive descent parsing, and in particular recursive +descent parsing with backtracking. If the grammar that the parser is designed to accept contains +left recursion then the parser will loop infinitely. If the grammar is ambiguous then only one +result will be obtained. And any result may require exponential time and space to calculate.
-This library, based on the paper linked at the top, solves all those problems and a few bits more. As an -example, the following grammar portion:
+This library, based on the paper linked at the top, solves all those problems and a few bits +more. As an example, the following grammar portion:
@@ -65,74 +64,81 @@ function Expression is new Stamp (Expression_Label, Expr_Choice);
Most of the verbosity is caused by the need to individually instantiate each combinator, as generics are -used to serve the same purpose as higher order functions. Some bits are also omitted, such as the label -enumeration and the actual setting of the redirectors. But the above should provide a good general -impression.
+Most of the verbosity is caused by the need to individually instantiate each combinator, as +generics are used to serve the same purpose as higher order functions. Some bits are also omitted, +such as the label enumeration and the actual setting of the redirectors. But the above should +provide a good general impression.
A list of features that this library provides includes:
More thorough documentation is provided in the /doc directory.
-The name of the library comes from packrat parsing -which is a parsing algorithm that avoids exponential time complexity by memoizing all intermediate results. As that -is what this library does, both so as to reduce the time complexity and to enable piecewise parsing, the name -seemed appropriate.
+The name of the library comes from packrat parsing which is a parsing algorithm that avoids exponential time complexity by memoizing all intermediate results. As that is what this library does, both so as to reduce the time complexity +and to enable piecewise parsing, the name seemed appropriate.
Direct left recursion, meaning a grammar non-terminal that immediately recurses to itself on the left as in -the Expression or Term used above, is fairly easy to handle. A counter is used to keep track -of how many times a particular combinator has been applied to the input at a particular position, and when that -counter exceeds the number of unconsumed input tokens plus one the result is curtailed. This is explained on -pages 7 and 8 of the paper.
- -The really thorny issue that caused the most problems with this library is indirect left recursion. This is -when a non-terminal recurses to itself on the left only after first evaluating to one or more other non-terminals. -Curtailment in these circumstances can easily cause those other non-terminals to also be curtailed, and reusing -the results for those other non-terminals may be incorrect. This issue along with a proposed solution is -explained on page 9 of the paper. However that solution was not as clear as would be preferred. So some minor -rephrasing and reworking was needed.
- -Bringing this problem back to the start: What are we really doing when we curtail a result due to left -recursion? It is not a matter of cutting off branches in a parse tree. We are identifying conditions where the -parse result of a particular non-terminal can be calculated without further recursion. The word "curtailment" is -somewhat misleading in this regard. Once this reframing is done then a clearer view immediately follows.
- -What is the condition? Exactly as described above for direct left recursion. Through comparing recursion counts -with the amount of unconsumed input we determine that a result of no successful parse can be calculated, and that -the result is valid for reuse for any deeper recursion of the same combinator at that input position.
+Direct left recursion, meaning a grammar non-terminal that immediately recurses to itself on the +left as in the Expression or Term used above, is fairly easy to handle. A counter +is used to keep track of how many times a particular combinator has been applied to the input at a +particular position, and when that counter exceeds the number of unconsumed input tokens plus one +the result is curtailed. This is explained on pages 7 and 8 of the paper.
+ +The really thorny issue that caused the most problems with this library is indirect left +recursion. This is when a non-terminal recurses to itself on the left only after first evaluating to +one or more other non-terminals. Curtailment in these circumstances can easily cause those other +non-terminals to also be curtailed, and reusing the results for those other non-terminals may be +incorrect. This issue along with a proposed solution is explained on page 9 of the paper. However +that solution was not as clear as would be preferred. So some minor rephrasing and reworking was +needed.
+ +Bringing this problem back to the start: What are we really doing when we curtail a result due to +left recursion? It is not a matter of cutting off branches in a parse tree. We are identifying +conditions where the parse result of a particular non-terminal can be calculated without further +recursion. The word "curtailment" is somewhat misleading in this regard. Once this reframing is done +then a clearer view immediately follows.
+ +What is the condition? Exactly as described above for direct left recursion. Through comparing +recursion counts with the amount of unconsumed input we determine that a result of no successful +parse can be calculated, and that the result is valid for reuse for any deeper recursion of the same combinator at that input position.
From that can be derived:
So far the above list just covers what is in the paper. But there is a little more that can be inferred:
+So far the above list just covers what is in the paper. But there is a little more that can be +inferred:
These two details should constitute a minor efficiency improvement.
@@ -140,17 +146,19 @@ the result is valid for reuse for any deeper recursion of the same combinator atWhile the polynomial complexity of this library has been experimentally confirmed, no effort has yet been -made to prove that it is actually polynomial in the same way that the parser combinators in the paper are. It -is possible that due to the changes involved with using a non-functional language and enabling piecewise -parsing that some subtle complexity difference may have arisen.
+While the polynomial complexity of this library has been experimentally confirmed, no effort has +yet been made to prove that it is actually polynomial in the same way that the parser combinators in +the paper are. It is possible that due to the changes involved with using a non-functional language +and enabling piecewise parsing that some subtle complexity difference may have arisen.
-Likewise, the piecewise parsing has been unit tested to a degree but no formal proof that it is correct has -been done.
+Likewise, the piecewise parsing has been unit tested to a degree but no formal proof that it is +correct has been done.
-Ideas like being able to modify and resume an erroneous parsing attempt would also be interesting to explore.
+Ideas like being able to modify and resume an erroneous parsing attempt would also be interesting +to explore.
-Finally, the plan is to actually use this library for something significant at some point in the future.
+Finally, the plan is to actually use this library for something significant at some point in the +future.
{% endblock %} -- cgit