package structure: Packrat Packrat.Errors (nested) Packrat.Tokens (nested, generic over contained array) Packrat.Util Packrat.Parse_Graphs (generic over memoize enum, tokens) Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens) Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays) Packrat.Text Packrat.Text.No_Lex (generic over parser label enum) Packrat.Text.Standard (generic over parser/lexer label enums) Packrat.Text.Wide (generic over parser/lexer label enums) Packrat.Text.Wide_Wide (generic over parser/lexer label enums) Ratnest.Tests Ratnest.Examples Calculator Tomita Off_Side planned order of writing: (in all cases notes here, then spec, then tests, then body) (Ratnest not mentioned since that's where all the testing functions will go) Packrat.Util Packrat Packrat.Errors Packrat.Tokens Packrat.Lexer Packrat.Parse_Graphs Packrat.Parser Packrat.Text Packrat.Text.No_Lex Packrat.Text.Standard Packrat.Text.Wide Packrat.Text.Wide_Wide Calculator Tomita Off_Side Packrat - main package - defines exceptions, some common enumeration types that are used throughout - contains Errors and Tokens as nested packages - has some protected utility functions to convert between String and Unbounded_String types - intended usage mode is (token_array -> lexer -> token_array -> parser -> parse_graph) although it is also possible for (token_array -> parser -> parse_graph) List of types: Result_Status Parser_Error Lexer_Error Packrat.Errors - nested package, as this functionality is important to parsers/lexers - functions to handle and process exception messages - exception messages take the form of one or more "sp" substrings joined together, with each symbol name being in all capitals and each position being a positive integer - this message represents the list of expected symbols at particular positions that would have resulted in a more complete parse - one of these messages will be raised with an exception if no valid parse is possible with a given input List of datatypes: Error_Message (the errors encoded into a String) Error_Info (containing a string of the symbol name expected, and a natural of the position) Error_Info_Array List of funcs: Valid_Message Valid_Identifier Valid_Identifier_Array Debug_String Join Encode Encode_Array Decode (the message format ensures that using "&" to join messages will still result in a valid message) Packrat.Tokens - nested package, defines a datatype important throughout lexing/parsing - generic over the array type of whatever is being lexed/parsed and the enum of valid token labels - contains an enum identifier, the start position, and the token value - cannot contain the finish position of a token because that would make it impossible to avoid exponential blowup in parse graphs for ambiguous grammars List of datatypes: Token List of funcs: Create Debug_String Label Value Start Packrat.Util - contains predicates to use in Satisfy parser components and similar List of funcs: In_Set Not_In_Set Is_Digit Is_Hex Is_Letter Is_Alphanumeric Is_Punctuation Is_ASCII Is_Extended_ASCII Is_Space Is_Linespace Is_End_Of_Line Is_Whitespace Not_Whitespace Packrat.Lexer - generic over the enum used for labeling lexemes, the input element list, and an instantiated Packrat.Tokens - contains all datatypes and functions for lexing List of types: Combinator Combinator_Result Combinator_Array Lexer_Context Component Component_Result Component_Array With_Input List of funcs: (Component functions) Stamp (will store a token for the input recognised by its combinators) Ignore (won't store a token) (Lexing functions) Scan (can be fed input in chunks, will keep continuing until fed a zero-length array) Scan_Only (will assume that the input given is all there is, without considering the possibility of more input) Scan_With (takes a function that supplies input when called, processes input the same as Scan) Scan_Set (fed input in constant size chunks with a specified padding character) Scan_Set_With (a combination of the functionality of Scan_Set and Scan_With) (Combinator functions) Sequence Count Many Many_Until Satisfy Satisfy_With Match Match_With Multimatch Take Take_While Take_Until Line_End Input_End Packrat.Parse_Graphs - generic over an instantiated Packrat.Tokens and associated types - based on Ada.Containers.Directed_Graphs in the directed-graph repo - sets up the above directed graph lib for use as a parse graph that will avoid exponential blowup - provides extra functions to merge graphs and iterate specific parsing choices Packrat.Parser - generic over the enum used for memoizing and the input token_list as well as token elements List of funcs: Parse - may return partial results if incomplete input supplied - new input can then be fed back in with the parse_context and initial results to continue parsing Parse_Only - will not return partial results and will always attempt a complete parse Parse_With - must be supplied with a function to acquire more input Parse_Full - will not return partial results and will always require there to be no more input remaining at end Memoize Packrat.Parser.Combinators - all higher order functions/procedures go in here - updating memo_table should be done by overwriting in order to accommodate left recursion unwinding properly List of funcs: (these are combinators that arrange nodes produced by parsers but do not produce nodes themselves) Sequence (one of the two that require passing in an array of function accesses) - takes an array of component accesses and adds whatever subtrees they produce in order as children of the current position on the output structure Choice (the other that requires passing in an array of function accesses) - takes an array of component accesses and trys them in order, first one to succeed has the subtree it produces added as a child of the current position on the output structure Count - Many Many_Until Separate_By Separate_End_By (these are parser components that create nodes from input) Satisfy - takes a predicate and if that predicate applied to the next input token is true, creates a node with the next input token Satisfy_With - takes a predicate and a transforming function, applies the transform to the next input token then tests it with the predicate as above - the node created uses the *un*transformed input Default - takes a default argument (same type as input tokens) and a parser component, and either returns the subresult from the component if it succeeds, or creates a node with the default argument if the component fails Match - for matching one item, eg a char in a string input, or a string in a lex'd string token array - checks a supplied argument against the next input token, and if they match, creates a node with the next input token Match_With - first uses a transforming function on the next input token then tests against a supplied argument as above, creating a node with the untransformed input if there is a match Multimatch - for matching multiple successive items, eg a substring in a string input - checks an array argument against the same length subarray of input tokens for a match, if successful creates a node with that subarray Multimatch_With - applies a transforming function before the check as above Take - creates a node with the next input token Take_While - takes a predicate and creates a node with the subarray of input tokens corresponding to the next N input where the predicate succeeds Take_Until - takes a predicate and creates a node with the subarray of input tokens corresponding to the next N input where the predicate fails (these are recogniser combinators that discard nodes produced by other components) Skip - takes a parser component, executes it, and if it succeeds the result is discarded instead of becoming part of the output (these are recogniser components that do not create nodes) Empty End_Of_Input Ratnest List of funcs: Run_Tests Ratnest.Tests List of funcs: Valid_Message_Check Valid_Identifier_Check Join_Check Encode_1_Check Encode_2_Check Encode_3_Check Encode_4_Check Decode_Check Token_Adjust_Check Token_Store_Check In_Set_Check Not_In_Set_Check Is_Digit_Check Is_Hex_Check Is_Letter_Check Is_Alphanumeric_Check Is_ASCII_Check Is_Extended_ASCII_Check Is_Space_Check Is_Linespace_Check Is_End_Of_Line_Check Is_Whitespace_Check Is_Not_Whitespace_Check Ratnest.Examples - some parser examples List of funcs: Scientific Signed Decimal Hexadecimal Double Miscellanea: Where to place C export binding? Recognisers/combinators implemented as generic functions/procedures if they require more than just input/output. This is done to get around the lack of first class functions, as combinators usually take the form of higher order functions. eg Term, Sequence, Choice, Some, Many How to force functions/procedures of a particular type to be referentially transparent? Want support for incremental input, so will need Parse function that can return a partially parsed state that can be supplied with more input.