From 809ccef242df42ab43ca0b05e48bf841f840be4b Mon Sep 17 00:00:00 2001
From: Jed Barber
Date: Sun, 6 Jan 2019 21:48:55 +1100
Subject: Initial commit of design notes

---
 packrat_parser_lib_notes.txt | 263 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100644 packrat_parser_lib_notes.txt

diff --git a/packrat_parser_lib_notes.txt b/packrat_parser_lib_notes.txt
new file mode 100644
index 0000000..faa423e
--- /dev/null
+++ b/packrat_parser_lib_notes.txt
@@ -0,0 +1,263 @@
+
+
+package structure:
+
+Packrat
+Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays)
+Packrat.Parser.Combinators
+Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens)
+Packrat.Lexer.Combinators
+Packrat.Util
+Packrat.Error (nested)
+Packrat.Graphs (nested, generic over leaf array type)
+Packrat.Tokens (nested, generic over contained array)
+
+Ratnest.Tests
+Ratnest.Examples
+
+Calculator
+Tomita
+Off_Side
+
+
+
+
+planned order of writing:
+(in all cases notes here, then spec, then tests, then body)
+(Ratnest not mentioned since that's where all the testing functions will go)
+
+Packrat.Util
+Packrat.Error
+Packrat.Tokens
+Packrat.Lexer
+Packrat.Lexer.Combinators
+Packrat.Graphs
+Packrat.Parser
+Packrat.Parser.Combinators
+Packrat (any remaining)
+Calculator
+Tomita
+Off_Side
+
+
+
+
+
+
+Packrat
+ - main package
+ - defines parser_component_function, lexer_component_function, syntax_tree, token_list, parser_context types
+ - parser_context type is completely opaque and has an empty_context constant
+ - syntax_tree is a tagged type container
+ - token_list is just an array (so can be any sort of string, or an array of tokens produced by a lexer, etc)
+ - is it possible to define parser/lexer components such that an instantiated main parsing/lexing function qualifies?
+   - probably not, since the main parsing/lexing functions will be able to throw exceptions for parsing/lexing errors,
+     whereas the component functions will only pass a failure up the chain as part of the context
+ - possible usage modes include (token_list -> lexer -> token_list -> parser -> syntax_tree) and (token_list ->
+   parser -> syntax_tree)
+
+List of funcs:
+(primitive operations for syntax_tree)
+
+
+
+
+Packrat.Parser
+ - generic over the enum used for memoizing and the input token_list as well as token elements
+
+List of funcs:
+Parse
+ - may return partial results if incomplete input supplied
+ - new input can then be fed back in with the parse_context and initial results to continue parsing
+Parse_Only
+ - will not return partial results and will always attempt a complete parse
+Parse_With
+ - must be supplied with a function to acquire more input
+Parse_Full
+ - will not return partial results and will always require there to be no more input remaining at end
+Memoize
+
+
+
+
+Packrat.Parser.Combinators
+ - all higher-order functions/procedures go in here
+ - updating memo_table should be done by overwriting in order to accommodate left recursion
+   unwinding properly
+
+List of funcs:
+(these are combinators that arrange nodes produced by parsers but do not produce nodes themselves)
+Sequence (one of the two that require passing in an array of function accesses)
+ - takes an array of component accesses and adds whatever subtrees they produce in order as
+   children of the current position on the output structure
+Choice (the other that requires passing in an array of function accesses)
+ - takes an array of component accesses and tries them in order; the first one to succeed has the
+   subtree it produces added as a child of the current position on the output structure
+Count
+ -
+Some
+Some_Until
+Many
+Many_Until
+Separate_By
+Separate_End_By
+
+(these are parser components that create nodes from input)
+Satisfy
+ - takes a predicate and if that
+   predicate applied to the next input token is true, creates a
+   node with the next input token
+Satisfy_With
+ - takes a predicate and a transforming function, applies the transform to the next input token
+   then tests it with the predicate as above
+ - the node created uses the *un*transformed input
+Default
+ - takes a default argument (same type as input tokens) and a parser component, and either
+   returns the subresult from the component if it succeeds, or creates a node with the default
+   argument if the component fails
+Match
+ - for matching one item, eg a char in a string input, or a string in a lex'd string token array
+ - checks a supplied argument against the next input token, and if they match, creates a node
+   with the next input token
+Match_With
+ - first uses a transforming function on the next input token then tests against a supplied
+   argument as above, creating a node with the untransformed input if there is a match
+Substring
+ - for matching multiple successive items, eg a substring in a string input
+ - checks an array argument against the same length subarray of input tokens for a match, if
+   successful creates a node with that subarray
+Substring_With
+ - applies a transforming function before the check as above
+Take
+ - creates a node with the next input token
+Take_While
+ - takes a predicate and creates a node with the subarray of input tokens corresponding to
+   the next N input where the predicate succeeds
+Take_Until
+ - takes a predicate and creates a node with the subarray of input tokens corresponding to
+   the next N input where the predicate fails
+Take_All
+ - creates a node with the complete rest of the input token array
+End_Of_Line
+ - a combination of Choice and Match/Substring with "\r\n", "\n", or "\r"
+
+(these are recogniser combinators that discard nodes produced by other components)
+Skip
+ - takes a parser component, executes it, and if it succeeds the result is discarded
+   instead of becoming part of the output
+
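The components and combinators listed above all share one higher-order contract. The library itself is planned in Ada (using generics and access-to-subprogram values to stand in for first-class functions), but the contract is easier to see in a short language-neutral sketch. The Python below is purely illustrative: the `(input, position) -> (nodes, new_position) or None` convention and all names are assumptions, not the library's API.

```python
# Illustrative sketch only, not the planned Ada interface.
# A "component" is any function taking (inp, pos) and returning
# (nodes, new_pos) on success or None on failure.

def satisfy(pred):
    """Like Satisfy above: node from the next token if pred holds."""
    def component(inp, pos):
        if pos < len(inp) and pred(inp[pos]):
            return [inp[pos]], pos + 1
        return None
    return component

def sequence(components):
    """Like Sequence: run components in order, children appended in order."""
    def component(inp, pos):
        nodes = []
        for c in components:
            result = c(inp, pos)
            if result is None:        # one failure fails the whole sequence
                return None
            sub, pos = result
            nodes.extend(sub)
        return nodes, pos
    return component

def choice(components):
    """Like Choice: try components in order, first success wins."""
    def component(inp, pos):
        for c in components:
            result = c(inp, pos)
            if result is not None:
                return result
        return None
    return component

digit = satisfy(str.isdigit)
letter = satisfy(str.isalpha)
pair = sequence([letter, digit])
print(pair("a1b2", 0))   # (['a', '1'], 2)
print(pair("12", 0))     # None
```

In the Ada design this closure-returning shape is what the generic functions/procedures mentioned under Miscellanea emulate: each generic instantiation plays the role of one of these closures.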
+(these are recogniser components that do not create nodes)
+Empty
+End_Of_Input
+
+
+
+
+Packrat.Lexer
+ - generic over the enum used for labeling lexemes and the input token_list as well as component token elements
+
+List of funcs:
+Scan
+Scan_Only
+Scan_With
+Stamp
+
+
+
+
+Packrat.Lexer.Combinators
+
+List of funcs:
+Sequence
+
+Satisfy
+Satisfy_With
+Match
+Match_With
+Take
+Take_While
+Take_Until
+
+
+
+
+Packrat.Util
+ - contains predicates to use in Satisfy parser components and similar
+ - has a few string output functions for syntax_tree objects
+
+List of funcs:
+In_Set
+Not_In_Set
+Is_Digit
+Is_Hex
+Is_Letter
+Is_Alphanumeric
+Is_ASCII
+Is_Space
+Is_Linespace
+Is_End_Of_Line
+Is_Whitespace
+Not_Whitespace
+
+(for syntax_trees)
+To_String
+Pretty_Print
+
+(for exception messages)
+To_String
+Pretty_Print
+
+
+
+
+Packrat.Error
+(actually a nested package, as this functionality is important to parsers/lexers)
+ - functions to handle and process exception messages
+ - exception messages take the form of one or more "sp" separated by a space
+ - this message represents the list of expected symbols at particular positions that would have resulted in a
+   more complete parse
+
+List of datatypes:
+Error_Info (containing an enum of the symbol expected, and a natural of the position)
+
+List of funcs:
+New_Message
+To_Message
+From_Message
+
+
+
+
+
+
+Ratnest.Tests
+
+Ratnest.Examples
+ - some parser examples
+
+List of funcs:
+Scientific
+Signed
+Decimal
+Hexadecimal
+Double
+
+
+
+
+
+
+Miscellanea:
+
+Where to place C export binding?
+
+Recognisers/combinators implemented as generic functions/procedures if they require more than just input/output.
+This is done to get around the lack of first-class functions, as combinators usually take the form of higher-order
+functions.
+
+eg Term, Sequence, Choice, Some, Many
+
+How to force functions/procedures of a particular type to be referentially transparent?
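The Packrat.Error round trip (Error_Info values in and out of an exception-message string) can be sketched as follows. The notes leave the exact encoding of each "sp" element open, so the `SYMBOL:POSITION` format here is an assumed placeholder, and Python stands in for the planned Ada package.

```python
# Assumed sketch of the Packrat.Error message round trip; the real "sp"
# encoding is not specified in the notes.
from collections import namedtuple

# Mirrors the Error_Info datatype: expected symbol plus input position.
Error_Info = namedtuple("Error_Info", ["symbol", "position"])

def to_message(infos):
    # Encode the expected-symbol list as space-separated "sp" elements.
    return " ".join(f"{e.symbol}:{e.position}" for e in infos)

def from_message(message):
    # Decode an exception message back into Error_Info values.
    result = []
    for part in message.split():
        symbol, pos = part.rsplit(":", 1)
        result.append(Error_Info(symbol, int(pos)))
    return result

expected = [Error_Info("DIGIT", 4), Error_Info("PLUS_SIGN", 4)]
print(to_message(expected))   # DIGIT:4 PLUS_SIGN:4
```

Keeping the message a flat string means it fits inside Ada's fixed-length exception-message mechanism, which is presumably why the notes route expected-symbol reports through exception messages at all.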
+
+Want support for incremental input, so will need a Parse function that can return a partially parsed state that can be
+supplied with more input.
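That incremental-input idea can be sketched like this: a parse call consumes what it can, returns the results so far, and hands back a context holding any unconsumed tail for the next call. This is an assumed Python illustration of the pattern, not the planned Ada signature (where the opaque parser_context would carry this state).

```python
# Assumed sketch of incremental parsing: here the "context" is just the
# unconsumed input tail; the real parser_context would also carry memo state.

def parse_numbers(context, more_input):
    """Extract whole space-separated numbers; buffer any trailing digits."""
    buffered = context + more_input
    results, start, pos = [], 0, 0
    while pos < len(buffered):
        if buffered[pos].isdigit():
            pos += 1
        else:
            if start < pos:
                results.append(int(buffered[start:pos]))
            pos += 1
            start = pos
    # Trailing digits may continue in the next chunk, so keep them in
    # the context instead of committing to a (possibly partial) result.
    return results, buffered[start:]

nums, ctx = parse_numbers("", "12 345 67")
print(nums, repr(ctx))    # [12, 345] '67'
nums, ctx = parse_numbers(ctx, "89 1 ")
print(nums, repr(ctx))    # [6789, 1] ''
```

The point of the second call is that "67" + "89" parses as 6789: a partial result is never emitted early, matching the distinction drawn above between Parse (may return partial state) and Parse_Only/Parse_Full (always complete).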