From d089edbb61f6bf6b52bc74f7f03f909b8b33b670 Mon Sep 17 00:00:00 2001 From: Jed Barber Date: Tue, 19 Jan 2021 15:59:35 +1100 Subject: Readme added, some outdated notes removed --- packrat_parser_lib_notes.txt | 357 ------------------------------------------- readme.txt | 50 ++++++ 2 files changed, 50 insertions(+), 357 deletions(-) delete mode 100644 packrat_parser_lib_notes.txt create mode 100644 readme.txt diff --git a/packrat_parser_lib_notes.txt b/packrat_parser_lib_notes.txt deleted file mode 100644 index 6aa0d0b..0000000 --- a/packrat_parser_lib_notes.txt +++ /dev/null @@ -1,357 +0,0 @@ - - -package structure: - -Packrat -Packrat.Errors (nested) -Packrat.Tokens (nested, generic over contained array) -Packrat.Util -Packrat.Parse_Graphs (generic over memoize enum, tokens) -Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens) -Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays) -Packrat.Text -Packrat.Text.No_Lex (generic over parser label enum) -Packrat.Text.Standard (generic over parser/lexer label enums) -Packrat.Text.Wide (generic over parser/lexer label enums) -Packrat.Text.Wide_Wide (generic over parser/lexer label enums) - -Ratnest.Tests -Ratnest.Examples - -Calculator -Tomita -Off_Side - - - - -planned order of writing: -(in all cases notes here, then spec, then tests, then body) -(Ratnest not mentioned since that's where all the testing functions will go) - -Packrat.Util -Packrat -Packrat.Errors -Packrat.Tokens -Packrat.Lexer -Packrat.Parse_Graphs -Packrat.Parser -Packrat.Text -Packrat.Text.No_Lex -Packrat.Text.Standard -Packrat.Text.Wide -Packrat.Text.Wide_Wide -Calculator -Tomita -Off_Side - - - - - - -Packrat - - main package - - defines exceptions, some common enumeration types that are used throughout - - contains Errors and Tokens as nested packages - - has some protected utility functions to convert between String and Unbounded_String types - - intended usage mode is (token_array -> lexer -> token_array -> parser -> parse_graph) although - it is also possible for (token_array -> parser -> parse_graph) - -List of types: - Result_Status - Parser_Error - Lexer_Error - - - - -Packrat.Errors - - nested package, as this functionality is important to parsers/lexers - - functions to handle and process exception messages - - exception messages take the form of one or more "sp" substrings joined together, - with each symbol name being in all capitals and each position being a positive integer - - this message represents the list of expected symbols at particular positions that would have resulted in a - more complete parse - - one of these messages will be raised with an exception if no valid parse is possible with a given input - -List of datatypes: - Error_Message (the errors encoded into a String) - Error_Info (containing a string of the symbol name expected, and a natural of the position) - Error_Info_Array - -List of funcs: - Valid_Message - Valid_Identifier - Valid_Identifier_Array - Debug_String - -Join - Encode - Encode_Array - Decode -(the message format ensures that using "&" to join messages will still result in a valid message) - - - - -Packrat.Tokens - - nested package, defines a datatype important throughout lexing/parsing - - generic over the array type of whatever is being lexed/parsed and the enum of valid token labels - - contains an enum identifier, the start position, and the token value - - cannot contain the finish position of a token because that would make it impossible - to avoid exponential blowup in parse graphs for ambiguous grammars - -List of datatypes: - Token - -List of funcs: - Create - Debug_String - - Label - Value - Start - - - - -Packrat.Util - - contains predicates to use in Satisfy parser components and similar - -List of funcs: - In_Set - Not_In_Set - Is_Digit - Is_Hex - Is_Letter - Is_Alphanumeric - Is_Punctuation - Is_ASCII - Is_Extended_ASCII - Is_Space - Is_Linespace - Is_End_Of_Line - Is_Whitespace - Not_Whitespace - - - - -Packrat.Lexer - - generic over the enum used for labeling lexemes, the input element list, and an instantiated Packrat.Tokens - - contains all datatypes and functions for lexing - -List of types: - Combinator - Combinator_Result - Combinator_Array - Lexer_Context - Component - Component_Result - Component_Array - With_Input - -List of funcs: -(Component functions) - Stamp (will store a token for the input recognised by its combinators) - Ignore (won't store a token) - -(Lexing functions) - Scan (can be fed input in chunks, will keep continuing until fed a zero-length array) - Scan_Only (will assume that the input given is all there is, without considering the possibility of more input) - Scan_With (takes a function that supplies input when called, processes input the same as Scan) - Scan_Set (fed input in constant size chunks with a specified padding character) - Scan_Set_With (a combination of the functionality of Scan_Set and Scan_With) - -(Combinator functions) - Sequence - Count - Many - Many_Until - - Satisfy - Satisfy_With - Match - Match_With - Multimatch - Take - Take_While - Take_Until - - Line_End - Input_End - - - - -Packrat.Parse_Graphs - - generic over an instantiated Packrat.Tokens and associated types - - based on Ada.Containers.Directed_Graphs in the directed-graph repo - - sets up the above directed graph lib for use as a parse graph that will avoid exponential blowup - - provides extra functions to merge graphs and iterate specific parsing choices - - - - - - -Packrat.Parser - - generic over the enum used for memoizing and the input token_list as well as token elements - -List of funcs: -Parse - - may return partial results if incomplete input supplied - - new input can then be fed back in with the parse_context and initial results to continue parsing -Parse_Only - - will not return partial results and will always attempt a complete parse -Parse_With - - must be supplied with a function to acquire more input -Parse_Full - - will not return partial results and will always require there to be no more input remaining at end -Memoize - - - - -Packrat.Parser.Combinators - - all higher order functions/procedures go in here - - updating memo_table should be done by overwriting in order to accommodate left recursion - unwinding properly - -List of funcs: -(these are combinators that arrange nodes produced by parsers but do not produce nodes themselves) -Sequence (one of the two that require passing in an array of function accesses) - - takes an array of component accesses and adds whatever subtrees they produce in order as - children of the current position on the output structure -Choice (the other that requires passing in an array of function accesses) - - takes an array of component accesses and trys them in order, first one to succeed has the - subtree it produces added as a child of the current position on the output structure -Count - - -Many -Many_Until -Separate_By -Separate_End_By - -(these are parser components that create nodes from input) -Satisfy - - takes a predicate and if that predicate applied to the next input token is true, creates a - node with the next input token -Satisfy_With - - takes a predicate and a transforming function, applies the transform to the next input token - then tests it with the predicate as above - - the node created uses the *un*transformed input -Default - - takes a default argument (same type as input tokens) and a parser component, and either - returns the subresult from the component if it succeeds, or creates a node with the default - argument if the component fails -Match - - for matching one item, eg a char in a string input, or a string in a lex'd string token array - - checks a supplied argument against the next input token, and if they match, creates a node - with the next input token -Match_With - - first uses a transforming function on the next input token then tests against a supplied - argument as above, creating a node with the untransformed input if there is a match -Multimatch - - for matching multiple successive items, eg a substring in a string input - - checks an array argument against the same length subarray of input tokens for a match, if - successful creates a node with that subarray -Multimatch_With - - applies a transforming function before the check as above -Take - - creates a node with the next input token -Take_While - - takes a predicate and creates a node with the subarray of input tokens corresponding to - the next N input where the predicate succeeds -Take_Until - - takes a predicate and creates a node with the subarray of input tokens corresponding to - the next N input where the predicate fails - -(these are recogniser combinators that discard nodes produced by other components) -Skip - - takes a parser component, executes it, and if it succeeds the result is discarded - instead of becoming part of the output - -(these are recogniser components that do not create nodes) -Empty -End_Of_Input - - - - - - - - -Ratnest - -List of funcs: -Run_Tests - - - - -Ratnest.Tests - -List of funcs: -Valid_Message_Check -Valid_Identifier_Check -Join_Check -Encode_1_Check -Encode_2_Check -Encode_3_Check -Encode_4_Check -Decode_Check - -Token_Adjust_Check -Token_Store_Check - -In_Set_Check -Not_In_Set_Check - -Is_Digit_Check -Is_Hex_Check -Is_Letter_Check -Is_Alphanumeric_Check -Is_ASCII_Check -Is_Extended_ASCII_Check -Is_Space_Check -Is_Linespace_Check -Is_End_Of_Line_Check -Is_Whitespace_Check -Is_Not_Whitespace_Check - - - - -Ratnest.Examples - - some parser examples - -List of funcs: -Scientific -Signed -Decimal -Hexadecimal -Double - - - - - - -Miscellanea: - -Where to place C export binding? - -Recognisers/combinators implemented as generic functions/procedures if they require more than just input/output. -This is done to get around the lack of first class functions, as combinators usually take the form of higher order functions. - -eg Term, Sequence, Choice, Some, Many - -How to force functions/procedures of a particular type to be referentially transparent? - -Want support for incremental input, so will need Parse function that can return a partially parsed state that can be -supplied with more input. - - diff --git a/readme.txt b/readme.txt new file mode 100644 index 0000000..bc2d77b --- /dev/null +++ b/readme.txt @@ -0,0 +1,50 @@ + + +Packrat Parser Combinators +========================== + + + + +This is a parser combinator library that can handle ambiguous left-recursive +grammars in at most polynomial time and space complexity. It does this by +memoizing all intermediate parse results and deduplicating them inside a +specialised parse graph structure. + +For more extensive documentation including a quick start guide and short +explanations of individual combinators, see the /doc/index.html file. + + + + +Dependencies: + + GNAT + directed-graph + basic-unit-test (only if compiling the unit test suite) + + + + +How to build/install: + +This repository is written to use the GNAT Project Manager build tools. To +build Packrat, use the following command + + gprbuild packrat.gpr -Xmode=release + +And to install the library, use + + gprinstall -p -m packrat.gpr + +The other project files (tests.gpr and examples.gpr) are for the unit test +suite and the provided examples of how to use this library. + + + + +For further information on the build tools, consult + + https://docs.adacore.com/gprbuild-docs/html/gprbuild_ug.html + + -- cgit