summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJed Barber <jjbarber@y7mail.com>2020-04-22 00:29:31 +1000
committerJed Barber <jjbarber@y7mail.com>2020-04-22 00:29:31 +1000
commit4b1face73de556041fea86c50e35e0ffaa3e8c74 (patch)
tree02b0af9c2d221c905a66e533566c51133dcf4445
parent42d3982f1e6335cb99c382ddd91c324e5fa458ad (diff)
Revamped notes, formulated generic directed graph API
-rw-r--r--packrat_parser_lib_notes.txt513
1 files changed, 252 insertions, 261 deletions
diff --git a/packrat_parser_lib_notes.txt b/packrat_parser_lib_notes.txt
index 2a8e0cc..67781cd 100644
--- a/packrat_parser_lib_notes.txt
+++ b/packrat_parser_lib_notes.txt
@@ -2,20 +2,20 @@
package structure:
+Ada.Containers.Directed_Graphs
+
Packrat
-Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays)
-Packrat.Parser.Combinators (merged into Packrat.Parser)
-Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens)
-Packrat.Lexer.Combinators (merged into Packrat.Lexer)
-Packrat.Util
Packrat.Errors (nested)
-Packrat.Interfaces (nested)
-Packrat.Graphs (generic over leaf array type)
Packrat.Tokens (nested, generic over contained array)
+Packrat.Util
+Packrat.Parse_Graphs (generic over memoize enum, tokens)
+Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens)
+Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays)
Packrat.Text
-Packrat.Text.Std (nested, generic over parser/lexer label enums)
-Packrat.Text.Wide (nested, generic over parser/lexer label enums)
-Packrat.Text.Wide_Wide (nested, generic over parser/lexer label enums)
+Packrat.Text.No_Lex (generic over parser label enum)
+Packrat.Text.Standard (generic over parser/lexer label enums)
+Packrat.Text.Wide (generic over parser/lexer label enums)
+Packrat.Text.Wide_Wide (generic over parser/lexer label enums)
Ratnest.Tests
Ratnest.Examples
@@ -32,13 +32,16 @@ planned order of writing:
(Ratnest not mentioned since that's where all the testing functions will go)
Packrat.Util
+Packrat
Packrat.Errors
Packrat.Tokens
Packrat.Lexer
-Packrat.Graphs
+Ada.Containers.Directed_Graphs
+Packrat.Parse_Graphs
Packrat.Parser
Packrat.Text
-Packrat.Text.Std
+Packrat.Text.No_Lex
+Packrat.Text.Standard
Packrat.Text.Wide
Packrat.Text.Wide_Wide
Calculator
@@ -52,18 +55,245 @@ Off_Side
Packrat
- main package
- - defines parser_component_function, lexer_component_function, syntax_tree, token_list, parser_context types
- - parser_context type is completely opaque and has an empty_context constant
- - syntax_tree is a tagged type container
- - token_list is just an array (so can be any sort of string, or an array of tokens produced by a lexer, etc)
- - is it possible to define parser/lexer components such that an instantiated main parsing/lexing function qualifies?
- - probably not, since the main parsing/lexing functions will be able to throw exceptions for parsing/lexing errors,
- whereas the component functions will only pass a failure up the chain as part of the context
- - possible usage modes include (token_list -> lexer -> token_list -> parser -> syntax_tree) and (token_list ->
- parser -> syntax_tree)
+ - defines exceptions, some common enumeration types that are used throughout
+ - contains Errors and Tokens as nested packages
+ - has some protected utility functions to convert between String and Unbounded_String types
+ - intended usage mode is (token_array -> lexer -> token_array -> parser -> parse_graph) although
+ it is also possible for (token_array -> parser -> parse_graph)
+
+List of types:
+ Result_Status
+ Parser_Error
+ Lexer_Error
+
+
+
+
+Packrat.Errors
+ - nested package, as this functionality is important to parsers/lexers
+ - functions to handle and process exception messages
+ - exception messages take the form of one or more "s<SYMBOLNAME>p<POSITION>" substrings joined together,
+ with each symbol name being in all capitals and each position being a positive integer
+ - this message represents the list of expected symbols at particular positions that would have resulted in a
+ more complete parse
+ - one of these messages will be raised with an exception if no valid parse is possible with a given input
+
+List of datatypes:
+ Error_Message (the errors encoded into a String)
+ Error_Info (containing a string of the symbol name expected, and a natural of the position)
+ Error_Info_Array
List of funcs:
-(primitive operations for syntax_tree)
+ Valid_Message
+ Valid_Identifier
+ Valid_Identifier_Array
+ Debug_String
+
+Join
+ Encode
+ Encode_Array
+ Decode
+(the message format ensures that using "&" to join messages will still result in a valid message)
+
+
+
+
+Packrat.Tokens
+ - nested package, defines a datatype important throughout lexing/parsing
+ - generic over the array type of whatever is being lexed/parsed and the enum of valid token labels
+ - contains an enum identifier, the start position, and the token value
+ - cannot contain the finish position of a token because that would make it impossible
+ to avoid exponential blowup in parse graphs for ambiguous grammars
+
+List of datatypes:
+ Token
+
+List of funcs:
+ Create
+ Debug_String
+
+ Label
+ Value
+ Start
+
+
+
+
+Packrat.Util
+ - contains predicates to use in Satisfy parser components and similar
+
+List of funcs:
+ In_Set
+ Not_In_Set
+ Is_Digit
+ Is_Hex
+ Is_Letter
+ Is_Alphanumeric
+ Is_Punctuation
+ Is_ASCII
+ Is_Extended_ASCII
+ Is_Space
+ Is_Linespace
+ Is_End_Of_Line
+ Is_Whitespace
+ Not_Whitespace
+
+
+
+
+Packrat.Lexer
+ - generic over the enum used for labeling lexemes, the input element list, and an instantiated Packrat.Tokens
+ - contains all datatypes and functions for lexing
+
+List of types:
+ Combinator
+ Combinator_Result
+ Combinator_Array
+ Lexer_Context
+ Component
+ Component_Result
+ Component_Array
+ With_Input
+
+List of funcs:
+(Component functions)
+ Stamp (will store a token for the input recognised by its combinators)
+ Ignore (won't store a token)
+
+(Lexing functions)
+ Scan (can be fed input in chunks, will keep continuing until fed a zero-length array)
+ Scan_Only (will assume that the input given is all there is, without considering the possibility of more input)
+ Scan_With (takes a function that supplies input when called, processes input the same as Scan)
+ Scan_Set (fed input in constant size chunks with a specified padding character)
+ Scan_Set_With (a combination of the functionality of Scan_Set and Scan_With)
+
+(Combinator functions)
+ Sequence
+ Count
+ Many
+ Many_Until
+
+ Satisfy
+ Satisfy_With
+ Match
+ Match_With
+ Multimatch
+ Take
+ Take_While
+ Take_Until
+
+ Line_End
+ Input_End
+
+
+
+
+Ada.Containers.Directed_Graphs
+ - generic over discrete node type, array of node type, node label type, edge label type
+ - implemented using the adjacancy list method
+ - allows both nodes and edges to be labeled, one arbitrary type for each
+ - nodes and edges don't *have* to be labeled
+ - API in the style of the Ada Containers library
+ - inspiration also taken from Haskell Data.Graph.Inductive.Graph
+
+List of types:
+ Extended_Node_Type
+ Edge_Type
+ Edge_Type_Array
+ Path (derived from the node array type)
+ Graph
+ Cursor
+ Node_Label_Reference
+ Constant_Node_Label_Reference
+ Edge_Label_Reference
+ Constant_Edge_Label_Reference
+
+Constants:
+ No_Node (of Extended_Node_Type)
+ No_Element (of Cursor)
+ Empty_Graph (of Graph)
+
+List of Cursor funcs:
+ Has_Element
+ Element (cursor -> node)
+
+ Equal_Subgraph
+ Subgraph_Node_Count
+
+ First_Parent
+ Next_Parent
+ Previous_Parent
+ Last_Parent
+ First_Child
+ Next_Child
+ Previous_Child
+ Last_Child
+
+List of Graph funcs:
+ "="
+
+ Assign
+ Copy
+ Move
+
+ Is_Empty
+ Clear
+ To_Cursor (node -> cursor)
+
+ Node_Count (a measure of number of nodes in the graph)
+ Edge_Count (a measure of number of edges in the graph)
+
+ Nodes (array of all nodes in the graph)
+ Edges (array of all edges in the graph)
+ Node_Range (returns minimum and maximum of nodes in the graph)
+ Unused_Nodes (returns an array of a specified number of nodes not used in the graph)
+
+ Insert (add a node at a given position (with label), add an edge (with label), add multiple nodes, multiple edges)
+ Append (add a node at the next available position (with label), return the node added)
+ Delete (remove a node, an edge, multiple nodes, multiple edges, from the graph)
+ Append_Label (give a node or an edge a label)
+ Replace_Label
+ Delete_Label (remove a label from a node or an edge)
+ Delete_Subgraph
+ Swap (switches two nodes in the graph)
+
+ Context (returns all parents of a node, all children of the node)
+ Labeled_Context (returns all parents of a node, the label of the node, and all the children of the node)
+ Has_Label (predicates for testing if a node or edge has a label in the graph)
+ Label (returns the label of a node or an edge if it has one)
+ Label_Reference (accessor to a label)
+ Constant_Label_Reference (constant accessor)
+ Neighbors (two nodes are neighbors if they have edges in both directions)
+ Parents (nodes with edges going into a node)
+ Children (nodes with edges coming from a node)
+ Outbound (edges going out of a node)
+ Inbound (edges coming into a node)
+ Outdegree (outbound degree of a node)
+ Indegree (inbound degree of a node)
+ Degree (degree of a node; how many other nodes are connected)
+ Has_Edge (check whether two nodes are connected)
+ Has_Labeled_Edge (check whether two nodes are connected with a labeled edge)
+ Has_Neighbor (check whether two nodes are neighbors)
+
+ Find (find a node or edge with a particular label)
+ Find_In_Subgraph
+ Contains (check whether a node or an edge is in the graph)
+ Contains_Label (check if graph contains a particular node/edge label)
+
+ Iterate
+ Iterate_Subgraph
+ Iterate_Children
+ Iterate_Parents
+
+
+
+
+Packrat.Parse_Graphs
+ - generic over an instantiated Packrat.Tokens and associated types
+ - sets up the above directed graph lib for use as a parse graph that will avoid exponential blowup
+ - provides extra functions to merge graphs and iterate specific parsing choices
+
+
@@ -152,245 +382,6 @@ End_Of_Input
-Packrat.Lexer
- - generic over the enum used for labeling lexemes and the input token_list as well as component token elements
- - should be possible to place an upper limit on the number of tokens scanned, so as to accommodate a statically
- sized output array of tokens (and possibly a statically sized input array)
-
-List of datatypes:
-Combinator
-Combinator_Result
-
-List of funcs:
-(for Combinator_Results)
-Create_Result
-Join
-Status
-
-(each of these is generic over an array of lexer_component functions, either Stamp or Ignore as below)
-Scan
- - function that returns an array of lexed tokens
- - uses first applicable lexer component to lex each token
- - if all lexer components return "partial" then also returns with a partial status
- - if all lexer components fail then raises a lexer_error
- - lexer status can be fed back into any of these Scan functions to resume with further input
- - if end of input is reached without any lexer components returning "partial" then returns
- a complete status and feeding the lexer status back into these functions will be the same
- as starting with a blank status
-Scan_Set
- - as above, except is a procedure that uses a fixed size array as output with a padding token
-Scan_Only
- - function that returns an array of lexed tokens
- - takes a lexer status as input to resuem a lex, but will treat it as a constant unlike the others
- - if all lexer components return "partial" or fail then raises a lexer_error
-Scan_With
- - function that returns an array of lexed tokens
- - when it runs out of input it uses the supplied function to get more input until that function
- returns an empty array
- - may also return with a partial status as with Scan
-Scan_Set_With
- - as above, except is a procedure that uses a fixed size array as output with a padding token
-
-(determined to be redundant)
-Scan_Set_Only
-
-(type signature of these are:
- input of an opaque lex_component_input type
- output of an opaque lex_component_output type
- return of a Fail/Partial/Success result enum)
-Stamp
- - one of two available lexer component functions
- - generic over a value of the enum used for labelling lexemes and a lexer combinator
-Ignore
- - the other available lexer component function
- - generic over a lexer combinator, as this function will scan for the given token and then ignore it
-
-
-
-
-Packrat.Lexer.Combinators
-
-type signature of these are:
- inputs of input_array and a starting position index
- outputs of the number of elements consumed and the value lexed
- return of a Fail/Partial/Success result enum
-
-List of funcs:
-Sequence
-Count
-Many
-Many_Until
-
-Satisfy
-Satisfy_With
-Match
-Match_With
-Multimatch
-Take
-Take_While
-Take_Until
-
-Line_End
-Input_End
-
-
-
-
-Packrat.Util
- - contains predicates to use in Satisfy parser components and similar
- - has a few string output functions for syntax_tree objects
-
-List of funcs:
-In_Set
-Not_In_Set
-Is_Digit
-Is_Hex
-Is_Letter
-Is_Alphanumeric
-Is_Punctuation
-Is_ASCII
-Is_Extended_ASCII
-Is_Space
-Is_Linespace
-Is_End_Of_Line
-Is_Whitespace
-Not_Whitespace
-
-
-
-
-Packrat.Error
-(actually a nested package, as this functionality is important to parsers/lexers)
- - functions to handle and process exception messages
- - exception messages take the form of one or more "s<SYMBOLNAME>p<POSITION>" substrings joined together,
- with each symbol name being in all capitals and each position being a positive integer
- - this message represents the list of expected symbols at particular positions that would have resulted in a
- more complete parse
- - one of these messages will be raised with an exception of no valid parse is possible with a given input
-
-List of datatypes:
-Error_Message
-Error_Info (containing a string of the symbol name expected, and a natural of the position)
-Error_Info_Array
-
-List of funcs:
-Valid_Message
-Valid_Identifier
-Valid_Identifier_Array
-Debug_String
-
-Join
-Encode
-Encode_Array
-Decode
-(the message format ensures that using "&" to join messages will still result in a valid message)
-
-
-
-
-Packrat.Tokens
- - nested package, defines a datatype important throughout lexing/parsing
- - generic over the array type of whatever is being lexed/parsed and the enum of valid token labels
- - contains an enum identifier, the start position, the finish position plus one, and the token value
-
-List of datatypes:
-Token (tagged, controlled, but not limited)
-
-List of funcs:
-Create
-Initialized
-Debug_String
-
-Label
-Value
-Start
-Finish
-
-
-
-
-Packrat.Interfaces
- - nested package, defines interfaces and types that should be used to derive an abstract syntax graph
- - some of these may be removed from Interfaces and just implemented in Graphs depending on what actually
- ends up being necessary for parsing
-
-List of interfaces:
-Node
- - Leaf (constructor)
- - Branch (constructor)
- - Is_Leaf
- - Is_Branch
- - Label
- - Elements
- - Start
- - Finish
-
-Cursor
- - Is_Nothing
- - Depth
- - Is_Node
- - Is_Root
- - Is_Branch
- - Is_Leaf
- - Label
- - Elements
- - Start
- - Finish
- - Choices
- - Parent
- - Child_Count
- - All_Child_Count
- - First_Child
- - Last_Child
- - Next_Sibling
- - Prev_Sibling
- - Delete_Children
- - Delete_All_Children
- - Equal_Subgraph
- - Subgraph_Node_Count
- - Find_In_Subgraph
-
-Directed_Acyclic_Graph
- - Contains
- - Singleton (constructor)
- - Is_Empty
- - Is_Ambiguous
- - Node_Count
- - Root_Count
- - Root
- - Append
- - Prepend
- - Attach_Choice
- - Clear
- - Delete_Position
- - Find
-
-
-
-
-Packrat.Graphs
- - builds on the interfaces in Packrat.Interfaces
- - can be replaced with a user-supplied type, assuming proper care is taken to avoid exponential issues
-
-List_of_datatypes:
-Node (implements all of Interfaces.Node)
-Node_Reference
-Cursor (implements all of Interfaces.Cursor)
-Iter_Cursor (wraps a Cursor, necessary for iteration)
-Parse_Graph (implements all of Interfaces.Directed_Acyclic_Graph)
-Choosing_Function (used in Iterate_Choice)
-
-List of funcs:
-Debug_String
-
-Node_At
-
-Is_Valid_Node
-Iterate
-Iterate_Subtree
-Iterate_Choice
-
-