From c7195329c60123b2363ba13863f6951a21d0ff57 Mon Sep 17 00:00:00 2001 From: Jed Barber Date: Sat, 23 Jan 2021 02:38:49 +1100 Subject: Implementation detail docs, some old notes removed --- doc/curtailment.html | 55 ++++++++++++++++++++++++++++++++++++++++++++ doc/error_strings.html | 32 ++++++++++++++++++++++++++ doc/index.html | 11 ++++++++- doc/memoize.html | 32 ++++++++++++++++++++++++++ doc/overview.html | 28 ++++++++++++++++++++++ doc/start_finish_ranges.html | 37 +++++++++++++++++++++++++++++ 6 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 doc/curtailment.html create mode 100644 doc/error_strings.html create mode 100644 doc/memoize.html create mode 100644 doc/overview.html create mode 100644 doc/start_finish_ranges.html (limited to 'doc') diff --git a/doc/curtailment.html b/doc/curtailment.html new file mode 100644 index 0000000..9999dac --- /dev/null +++ b/doc/curtailment.html @@ -0,0 +1,55 @@ + + + + + + + Curtailment - Packrat Docs + + + + + + +

Curtailment

+ + Return to Contents + + +

CAUTION: This is implementation information and should not be + relied upon in any way when using the library.

+ +

A curtailment is information attached to a result denoting one or more + combinator/depth pairs.

+ +

Curtailment in general means that the result for a given combinator at a given + position is only valid when that combinator is next encountered at that position for + left recursion depth counts equal or greater than the curtailment.

+ +

If the left recursion count of a combinator at a particular position exceeds the + number of remaining input tokens plus one, the combinator is curtailed at that count + depth and fails.

+ +

When results are folded upwards, any curtailments are retained.

+ +

When two results are merged as in Sequence and Choice, if the + results share curtailments then the curtailment of greater depth is retained.

+ +

If a result has to be recalculated due to the previous result for a given + combinator at a given position not being valid due to being curtailed at a greater + left recursion depth, then the recalculated result has the previous result's + curtailments applied to it, except with the count depths replaced with the current + left recursion depths.

+ +

When a result is recalculated and the depth of a curtailment plus the number of + remaining tokens after the furthest finish of a result is less than the remaining + tokens at the start of a result, then the curtail can be deleted from the result, + since then it is no longer possible for curtailment to happen due to excessive + left recursion count depth. This is algebraically equivalent to checking if the + furthest finish of a result minus the left recursion depth is greater than the + start position of the result.

+ + + + + diff --git a/doc/error_strings.html b/doc/error_strings.html new file mode 100644 index 0000000..db9318d --- /dev/null +++ b/doc/error_strings.html @@ -0,0 +1,32 @@ + + + + + + + Error Strings - Packrat Docs + + + + + + +

Error Strings

+ + Return to Contents + + +

CAUTION: This is implementation information and should not be + relied upon in any way when using the library.

+ +

The format used in the exception message error strings is:
+
+ s<SYMBOL_NAME>p<NUMBER>
+
+ There is one of these substrings for every error encountered, all concatenated + together in arbitrary order.

+ + + + + diff --git a/doc/index.html b/doc/index.html index 8b26b6e..096b0e2 100644 --- a/doc/index.html +++ b/doc/index.html @@ -20,10 +20,11 @@
  1. License
  2. Dependencies
  3. +
  4. Overview
  5. Quickstart Guide
  6. Features
  7. Limitations
  8. -
  9. File Overview
    +
  10. File Overviews
    1. /src
    2. /example
    3. @@ -39,6 +40,14 @@
  11. Handling Errors
  12. +
  13. Implementation Details
    +
      +
    1. Start and Finish Ranges
    2. +
    3. Error Strings
    4. +
    5. Curtailment
    6. +
    7. Memoization
    8. +
    +
  14. Credits and Further Reading
diff --git a/doc/memoize.html b/doc/memoize.html new file mode 100644 index 0000000..780031b --- /dev/null +++ b/doc/memoize.html @@ -0,0 +1,32 @@ + + + + + + + Memoization - Packrat Docs + + + + + + +

Memoization

+ + Return to Contents + + +

CAUTION: This is implementation information and should not be + relied upon in any way when using the library.

+ +

All intermediate parsing results are memoized. This is automatic and + unavoidable, and is necessary to make piecewise parsing possible.

+ +

When piecewise parsing, the combinators that return partially complete + results are used to determine what portion of the input must be passed + forwards to allow calculating the full result when parsing future pieces.

+ + + + + diff --git a/doc/overview.html b/doc/overview.html new file mode 100644 index 0000000..62189f8 --- /dev/null +++ b/doc/overview.html @@ -0,0 +1,28 @@ + + + + + + + Overview - Packrat Docs + + + + + + +

Overview

+ + Return to Contents + + +

Packrat is a parser combinator library that can handle left recursive + ambiguous grammars, and does so in polynomial time and space. It gets its + name from packrat parsing, an algorithm which involves memoizing all + intermediate parsing results in order to avoid exponential time complexity. + The library includes both parser combinators and simpler lexer combinators.

+ + + + + diff --git a/doc/start_finish_ranges.html b/doc/start_finish_ranges.html new file mode 100644 index 0000000..d69c60e --- /dev/null +++ b/doc/start_finish_ranges.html @@ -0,0 +1,37 @@ + + + + + + + Start and Finish Ranges - Packrat Docs + + + + + + +

Start and Finish Ranges

+ + Return to Contents + + +

When parsing input, the start position and finish position that + describe a parsed or lexed chunk are the index of the first item and + the last item of that chunk. This was done to match the idea with + how array slicing is done in Ada, but it has a few other implications.

+ +

Empty chunks are described by a finish position that is less than + the start position. For consistency, this is always done as the finish + position being only one less than the start.

+ +

This convention also differs from the convention used by Frost, Hafiz, + and Callaghan (2008). They used start positions corresponding to the first + item in a chunk, but finish positions corresponding to the first item after + the chunk had ended. Care should be taken to keep this in mind to avoid + off-by-one errors.

+ + + + + -- cgit