From 809ccef242df42ab43ca0b05e48bf841f840be4b Mon Sep 17 00:00:00 2001
From: Jed Barber
Date: Sun, 6 Jan 2019 21:48:55 +1100
Subject: Initial commit of design notes

---
 packrat_parser_lib_notes.txt | 263 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100644 packrat_parser_lib_notes.txt

diff --git a/packrat_parser_lib_notes.txt b/packrat_parser_lib_notes.txt
new file mode 100644
index 0000000..faa423e
--- /dev/null
+++ b/packrat_parser_lib_notes.txt
@@ -0,0 +1,263 @@
+
+
+package structure:
+
+Packrat
+Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays)
+Packrat.Parser.Combinators
+Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens)
+Packrat.Lexer.Combinators
+Packrat.Util
+Packrat.Error (nested)
+Packrat.Graphs (nested, generic over leaf array type)
+Packrat.Tokens (nested, generic over contained array)
+
+Ratnest.Tests
+Ratnest.Examples
+
+Calculator
+Tomita
+Off_Side
+
+
+
+
+planned order of writing:
+(in all cases notes here, then spec, then tests, then body)
+(Ratnest not mentioned since that's where all the testing functions will go)
+
+Packrat.Util
+Packrat.Error
+Packrat.Tokens
+Packrat.Lexer
+Packrat.Lexer.Combinators
+Packrat.Graphs
+Packrat.Parser
+Packrat.Parser.Combinators
+Packrat (any remaining)
+Calculator
+Tomita
+Off_Side
+
+
+
+
+
+
+Packrat
+ - main package
+ - defines parser_component_function, lexer_component_function, syntax_tree, token_list, parser_context types
+ - parser_context type is completely opaque and has an empty_context constant
+ - syntax_tree is a tagged type container
+ - token_list is just an array (so can be any sort of string, or an array of tokens produced by a lexer, etc)
+ - is it possible to define parser/lexer components such that an instantiated main parsing/lexing function qualifies?
+   - probably not, since the main parsing/lexing functions will be able to throw exceptions for parsing/lexing errors,
+     whereas the component functions will only pass a failure up the chain as part of the context
+ - possible usage modes include (token_list -> lexer -> token_list -> parser -> syntax_tree) and (token_list ->
+   parser -> syntax_tree)
+
+List of funcs:
+(primitive operations for syntax_tree)
+
+
+
+
+Packrat.Parser
+ - generic over the enum used for memoizing and the input token_list as well as token elements
+
+List of funcs:
+Parse
+ - may return partial results if incomplete input supplied
+ - new input can then be fed back in with the parse_context and initial results to continue parsing
+Parse_Only
+ - will not return partial results and will always attempt a complete parse
+Parse_With
+ - must be supplied with a function to acquire more input
+Parse_Full
+ - will not return partial results and will always require there to be no more input remaining at end
+Memoize
+
+
+
+
+Packrat.Parser.Combinators
+ - all higher-order functions/procedures go in here
+ - updating memo_table should be done by overwriting in order to accommodate left recursion
+   unwinding properly
+
+List of funcs:
+(these are combinators that arrange nodes produced by parsers but do not produce nodes themselves)
+Sequence (one of the two that require passing in an array of function accesses)
+ - takes an array of component accesses and adds whatever subtrees they produce in order as
+   children of the current position on the output structure
+Choice (the other that requires passing in an array of function accesses)
+ - takes an array of component accesses and tries them in order; the first one to succeed has the
+   subtree it produces added as a child of the current position on the output structure
+Count
+ -
+Some
+Some_Until
+Many
+Many_Until
+Separate_By
+Separate_End_By
+
+(these are parser components that create nodes from input)
+Satisfy
+ - takes a predicate and if that
+   predicate applied to the next input token is true, creates a
+   node with the next input token
+Satisfy_With
+ - takes a predicate and a transforming function, applies the transform to the next input token
+   then tests it with the predicate as above
+ - the node created uses the *un*transformed input
+Default
+ - takes a default argument (same type as input tokens) and a parser component, and either
+   returns the subresult from the component if it succeeds, or creates a node with the default
+   argument if the component fails
+Match
+ - for matching one item, eg a char in a string input, or a string in a lex'd string token array
+ - checks a supplied argument against the next input token, and if they match, creates a node
+   with the next input token
+Match_With
+ - first uses a transforming function on the next input token then tests against a supplied
+   argument as above, creating a node with the untransformed input if there is a match
+Substring
+ - for matching multiple successive items, eg a substring in a string input
+ - checks an array argument against the same length subarray of input tokens for a match, if
+   successful creates a node with that subarray
+Substring_With
+ - applies a transforming function before the check as above
+Take
+ - creates a node with the next input token
+Take_While
+ - takes a predicate and creates a node with the subarray of input tokens corresponding to
+   the next N input where the predicate succeeds
+Take_Until
+ - takes a predicate and creates a node with the subarray of input tokens corresponding to
+   the next N input where the predicate fails
+Take_All
+ - creates a node with the complete rest of the input token array
+End_Of_Line
+ - a combination of Choice and Match/Substring with "\r\n", "\n", or "\r"
+
+(these are recogniser combinators that discard nodes produced by other components)
+Skip
+ - takes a parser component, executes it, and if it succeeds the result is discarded
+   instead of becoming part of the output
+
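The components and combinators listed above all share one higher-order contract. The library itself is planned in Ada (using generics and access-to-subprogram values to stand in for first-class functions), but the contract is easier to see in a short language-neutral sketch. The Python below is purely illustrative: the `(input, position) -> (nodes, new_position) or None` convention and all names are assumptions, not the library's API.

```python
# Illustrative sketch only, not the planned Ada interface.
# A "component" is any function taking (inp, pos) and returning
# (nodes, new_pos) on success or None on failure.

def satisfy(pred):
    """Like Satisfy above: node from the next token if pred holds."""
    def component(inp, pos):
        if pos < len(inp) and pred(inp[pos]):
            return [inp[pos]], pos + 1
        return None
    return component

def sequence(components):
    """Like Sequence: run components in order, children appended in order."""
    def component(inp, pos):
        nodes = []
        for c in components:
            result = c(inp, pos)
            if result is None:        # one failure fails the whole sequence
                return None
            sub, pos = result
            nodes.extend(sub)
        return nodes, pos
    return component

def choice(components):
    """Like Choice: try components in order, first success wins."""
    def component(inp, pos):
        for c in components:
            result = c(inp, pos)
            if result is not None:
                return result
        return None
    return component

digit = satisfy(str.isdigit)
letter = satisfy(str.isalpha)
pair = sequence([letter, digit])
print(pair("a1b2", 0))   # (['a', '1'], 2)
print(pair("12", 0))     # None
```

In the Ada design this closure-returning shape is what the generic functions/procedures mentioned under Miscellanea emulate: each generic instantiation plays the role of one of these closures.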
+(these are recogniser components that do not create nodes)
+Empty
+End_Of_Input
+
+
+
+
+Packrat.Lexer
+ - generic over the enum used for labeling lexemes and the input token_list as well as component token elements
+
+List of funcs:
+Scan
+Scan_Only
+Scan_With
+Stamp
+
+
+
+
+Packrat.Lexer.Combinators
+
+List of funcs:
+Sequence
+
+Satisfy
+Satisfy_With
+Match
+Match_With
+Take
+Take_While
+Take_Until
+
+
+
+
+Packrat.Util
+ - contains predicates to use in Satisfy parser components and similar
+ - has a few string output functions for syntax_tree objects
+
+List of funcs:
+In_Set
+Not_In_Set
+Is_Digit
+Is_Hex
+Is_Letter
+Is_Alphanumeric
+Is_ASCII
+Is_Space
+Is_Linespace
+Is_End_Of_Line
+Is_Whitespace
+Not_Whitespace
+
+(for syntax_trees)
+To_String
+Pretty_Print
+
+(for exception messages)
+To_String
+Pretty_Print
+
+
+
+
+Packrat.Error
+(actually a nested package, as this functionality is important to parsers/lexers)
+ - functions to handle and process exception messages
+ - exception messages take the form of one or more "sp" separated by a space
+ - this message represents the list of expected symbols at particular positions that would have resulted in a
+   more complete parse
+
+List of datatypes:
+Error_Info (containing an enum of the symbol expected, and a natural of the position)
+
+List of funcs:
+New_Message
+To_Message
+From_Message
+
+
+
+
+
+
+Ratnest.Tests
+
+Ratnest.Examples
+ - some parser examples
+
+List of funcs:
+Scientific
+Signed
+Decimal
+Hexadecimal
+Double
+
+
+
+
+
+
+Miscellanea:
+
+Where to place C export binding?
+
+Recognisers/combinators implemented as generic functions/procedures if they require more than just input/output.
+This is done to get around the lack of first-class functions, as combinators usually take the form of higher-order
+functions.
+
+eg Term, Sequence, Choice, Some, Many
+
+How to force functions/procedures of a particular type to be referentially transparent?
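The Packrat.Error round trip (Error_Info values in and out of an exception-message string) can be sketched as follows. The notes leave the exact encoding of each "sp" element open, so the `SYMBOL:POSITION` format here is an assumed placeholder, and Python stands in for the planned Ada package.

```python
# Assumed sketch of the Packrat.Error message round trip; the real "sp"
# encoding is not specified in the notes.
from collections import namedtuple

# Mirrors the Error_Info datatype: expected symbol plus input position.
Error_Info = namedtuple("Error_Info", ["symbol", "position"])

def to_message(infos):
    # Encode the expected-symbol list as space-separated "sp" elements.
    return " ".join(f"{e.symbol}:{e.position}" for e in infos)

def from_message(message):
    # Decode an exception message back into Error_Info values.
    result = []
    for part in message.split():
        symbol, pos = part.rsplit(":", 1)
        result.append(Error_Info(symbol, int(pos)))
    return result

expected = [Error_Info("DIGIT", 4), Error_Info("PLUS_SIGN", 4)]
print(to_message(expected))   # DIGIT:4 PLUS_SIGN:4
```

Keeping the message a flat string means it fits inside Ada's fixed-length exception-message mechanism, which is presumably why the notes route expected-symbol reports through exception messages at all.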
+
+Want support for incremental input, so will need a Parse function that can return a partially parsed state that can be
+supplied with more input.
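That incremental-input idea can be sketched like this: a parse call consumes what it can, returns the results so far, and hands back a context holding any unconsumed tail for the next call. This is an assumed Python illustration of the pattern, not the planned Ada signature (where the opaque parser_context would carry this state).

```python
# Assumed sketch of incremental parsing: here the "context" is just the
# unconsumed input tail; the real parser_context would also carry memo state.

def parse_numbers(context, more_input):
    """Extract whole space-separated numbers; buffer any trailing digits."""
    buffered = context + more_input
    results, start, pos = [], 0, 0
    while pos < len(buffered):
        if buffered[pos].isdigit():
            pos += 1
        else:
            if start < pos:
                results.append(int(buffered[start:pos]))
            pos += 1
            start = pos
    # Trailing digits may continue in the next chunk, so keep them in
    # the context instead of committing to a (possibly partial) result.
    return results, buffered[start:]

nums, ctx = parse_numbers("", "12 345 67")
print(nums, repr(ctx))    # [12, 345] '67'
nums, ctx = parse_numbers(ctx, "89 1 ")
print(nums, repr(ctx))    # [6789, 1] ''
```

The point of the second call is that "67" + "89" parses as 6789: a partial result is never emitted early, matching the distinction drawn above between Parse (may return partial state) and Parse_Only/Parse_Full (always complete).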