author     Jed Barber <jjbarber@y7mail.com>    2021-01-19 15:59:35 +1100
committer  Jed Barber <jjbarber@y7mail.com>    2021-01-19 15:59:35 +1100
commit     d089edbb61f6bf6b52bc74f7f03f909b8b33b670 (patch)
tree       e03573d1826184afb1630fed6f09b6d5d2a441ee
parent     85b4f256794699c9360d9aa54901ab2ceeddd8d5 (diff)
Readme added, some outdated notes removed
-rw-r--r--  packrat_parser_lib_notes.txt  357
-rw-r--r--  readme.txt                     50
2 files changed, 50 insertions, 357 deletions
diff --git a/packrat_parser_lib_notes.txt b/packrat_parser_lib_notes.txt
deleted file mode 100644
index 6aa0d0b..0000000
--- a/packrat_parser_lib_notes.txt
+++ /dev/null
@@ -1,357 +0,0 @@
-
-
-package structure:
-
-Packrat
-Packrat.Errors (nested)
-Packrat.Tokens (nested, generic over contained array)
-Packrat.Util
-Packrat.Parse_Graphs (generic over memoize enum, tokens)
-Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens)
-Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays)
-Packrat.Text
-Packrat.Text.No_Lex (generic over parser label enum)
-Packrat.Text.Standard (generic over parser/lexer label enums)
-Packrat.Text.Wide (generic over parser/lexer label enums)
-Packrat.Text.Wide_Wide (generic over parser/lexer label enums)
-
-Ratnest.Tests
-Ratnest.Examples
-
-Calculator
-Tomita
-Off_Side
-
-
-
-
-planned order of writing:
-(in all cases notes here, then spec, then tests, then body)
-(Ratnest not mentioned since that's where all the testing functions will go)
-
-Packrat.Util
-Packrat
-Packrat.Errors
-Packrat.Tokens
-Packrat.Lexer
-Packrat.Parse_Graphs
-Packrat.Parser
-Packrat.Text
-Packrat.Text.No_Lex
-Packrat.Text.Standard
-Packrat.Text.Wide
-Packrat.Text.Wide_Wide
-Calculator
-Tomita
-Off_Side
-
-
-
-
-
-
-Packrat
- - main package
- - defines exceptions, some common enumeration types that are used throughout
- - contains Errors and Tokens as nested packages
- - has some protected utility functions to convert between String and Unbounded_String types
- - intended usage mode is (token_array -> lexer -> token_array -> parser -> parse_graph), although
-   (token_array -> parser -> parse_graph) is also possible
-
-List of types:
- Result_Status
- Parser_Error
- Lexer_Error
-
-
-
-
-Packrat.Errors
- - nested package, as this functionality is important to parsers/lexers
- - functions to handle and process exception messages
- - exception messages take the form of one or more "s<SYMBOLNAME>p<POSITION>" substrings joined together,
- with each symbol name being in all capitals and each position being a positive integer
- - this message represents the list of expected symbols at particular positions that would have resulted in a
- more complete parse
- - one of these messages will be raised with an exception if no valid parse is possible with a given input
-
-List of datatypes:
- Error_Message (the errors encoded into a String)
-  Error_Info (contains the expected symbol name as a String and the position as a Natural)
- Error_Info_Array
-
-List of funcs:
- Valid_Message
- Valid_Identifier
- Valid_Identifier_Array
- Debug_String
-
-Join
- Encode
- Encode_Array
- Decode
-(the message format ensures that using "&" to join messages will still result in a valid message; a small standalone example follows)
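-
-A minimal standalone sketch of the message format described above; the symbol names and positions
-are invented for illustration. Since every entry is a self-delimiting "s<SYMBOLNAME>p<POSITION>"
-substring, plain "&" concatenation of two valid messages is itself a valid message:
-
-   with Ada.Text_IO;
-
-   procedure Error_Message_Demo is
-      --  Two hypothetical encoded errors: EXPRESSION expected at 5, TERM expected at 12
-      Left  : constant String := "sEXPRESSIONp5";
-      Right : constant String := "sTERMp12";
-   begin
-      --  Prints "sEXPRESSIONp5sTERMp12", which still matches the format
-      Ada.Text_IO.Put_Line (Left & Right);
-   end Error_Message_Demo;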
-
-
-
-
-Packrat.Tokens
- - nested package, defines a datatype important throughout lexing/parsing
- - generic over the array type of whatever is being lexed/parsed and the enum of valid token labels
- - contains an enum identifier, the start position, and the token value (see the sketch below)
- - cannot contain the finish position of a token because that would make it impossible
- to avoid exponential blowup in parse graphs for ambiguous grammars
-
-List of datatypes:
- Token
-
-List of funcs:
- Create
- Debug_String
-
- Label
- Value
- Start
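-
-A rough standalone sketch, not the library's actual generic declaration, of the fields a Token
-carries as described above; the label enumeration here is invented for illustration:
-
-   with Ada.Strings.Unbounded;
-
-   package Token_Sketch is
-
-      type Label_Type is (Word, Number, Punctuation);
-
-      --  Label identifies which symbol matched, Start is where the match began in the
-      --  input, and Value holds the matched input itself. There is deliberately no
-      --  finish position, per the note above.
-      type Token is record
-         Label : Label_Type;
-         Start : Positive;
-         Value : Ada.Strings.Unbounded.Unbounded_String;
-      end record;
-
-   end Token_Sketch;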
-
-
-
-
-Packrat.Util
- - contains predicates to use in Satisfy parser components and similar
-
-List of funcs:
- In_Set
- Not_In_Set
- Is_Digit
- Is_Hex
- Is_Letter
- Is_Alphanumeric
- Is_Punctuation
- Is_ASCII
- Is_Extended_ASCII
- Is_Space
- Is_Linespace
- Is_End_Of_Line
- Is_Whitespace
- Not_Whitespace
-
-
-
-
-Packrat.Lexer
- - generic over the enum used for labeling lexemes, the input element list, and an instantiated Packrat.Tokens
- - contains all datatypes and functions for lexing
-
-List of types:
- Combinator
- Combinator_Result
- Combinator_Array
- Lexer_Context
- Component
- Component_Result
- Component_Array
- With_Input
-
-List of funcs:
-(Component functions)
- Stamp (will store a token for the input recognised by its combinators)
- Ignore (won't store a token)
-
-(Lexing functions)
-  Scan (can be fed input in chunks and keeps scanning until fed a zero-length array; see the sketch after this list)
- Scan_Only (will assume that the input given is all there is, without considering the possibility of more input)
- Scan_With (takes a function that supplies input when called, processes input the same as Scan)
- Scan_Set (fed input in constant size chunks with a specified padding character)
- Scan_Set_With (a combination of the functionality of Scan_Set and Scan_With)
-
-(Combinator functions)
- Sequence
- Count
- Many
- Many_Until
-
- Satisfy
- Satisfy_With
- Match
- Match_With
- Multimatch
- Take
- Take_While
- Take_Until
-
- Line_End
- Input_End
-
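-
-A standalone illustration, not the library API, of the chunked input convention used by Scan as
-described above: input arrives in pieces, and a zero-length chunk signals that nothing further
-will follow.
-
-   with Ada.Text_IO;
-   with Ada.Strings.Unbounded;
-
-   procedure Chunked_Input_Demo is
-      use Ada.Strings.Unbounded;
-
-      Buffer : Unbounded_String := Null_Unbounded_String;
-
-      procedure Feed (Chunk : in String) is
-      begin
-         if Chunk'Length = 0 then
-            --  A zero-length chunk means end of input, so finish up
-            Ada.Text_IO.Put_Line ("end of input: " & To_String (Buffer));
-         else
-            Append (Buffer, Chunk);
-         end if;
-      end Feed;
-
-   begin
-      Feed ("2 + ");
-      Feed ("3 * 4");
-      Feed ("");
-   end Chunked_Input_Demo;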
-
-
-
-Packrat.Parse_Graphs
- - generic over an instantiated Packrat.Tokens and associated types
- - based on Ada.Containers.Directed_Graphs in the directed-graph repo
- - sets up the above directed graph lib for use as a parse graph that will avoid exponential blowup
- - provides extra functions to merge graphs and iterate specific parsing choices
-
-
-
-
-
-
-Packrat.Parser
- - generic over the enum used for memoizing and the input token_list as well as token elements
-
-List of funcs:
-Parse
- - may return partial results if incomplete input supplied
- - new input can then be fed back in with the parse_context and initial results to continue parsing
-Parse_Only
- - will not return partial results and will always attempt a complete parse
-Parse_With
- - must be supplied with a function to acquire more input
-Parse_Full
- - will not return partial results and will always require there to be no more input remaining at end
-Memoize
-
-
-
-
-Packrat.Parser.Combinators
- - all higher order functions/procedures go in here
- - updating memo_table should be done by overwriting in order to accommodate left recursion
- unwinding properly
-
-List of funcs:
-(these are combinators that arrange nodes produced by parsers but do not produce nodes themselves)
-Sequence (one of the two that require passing in an array of function accesses)
- - takes an array of component accesses and adds whatever subtrees they produce in order as
- children of the current position on the output structure
-Choice (the other that requires passing in an array of function accesses)
- - takes an array of component accesses and tries them in order; the first one to succeed has the
-   subtree it produces added as a child of the current position on the output structure
-   (a toy standalone sketch of Sequence and Choice follows this list)
-Count
- -
-Many
-Many_Until
-Separate_By
-Separate_End_By
-
-(these are parser components that create nodes from input)
-Satisfy
- - takes a predicate and if that predicate applied to the next input token is true, creates a
- node with the next input token
-Satisfy_With
- - takes a predicate and a transforming function, applies the transform to the next input token
- then tests it with the predicate as above
- - the node created uses the *un*transformed input
-Default
- - takes a default argument (same type as input tokens) and a parser component, and either
- returns the subresult from the component if it succeeds, or creates a node with the default
- argument if the component fails
-Match
- - for matching one item, eg a char in a string input, or a string in a lex'd string token array
- - checks a supplied argument against the next input token, and if they match, creates a node
- with the next input token
-Match_With
- - first uses a transforming function on the next input token then tests against a supplied
- argument as above, creating a node with the untransformed input if there is a match
-Multimatch
- - for matching multiple successive items, eg a substring in a string input
- - checks an array argument against the same-length subarray of input tokens for a match; if
-   successful, creates a node with that subarray
-Multimatch_With
- - applies a transforming function before the check as above
-Take
- - creates a node with the next input token
-Take_While
- - takes a predicate and creates a node with the subarray of input tokens corresponding to
-   the longest run of upcoming input for which the predicate succeeds
-Take_Until
- - takes a predicate and creates a node with the subarray of input tokens corresponding to
-   the longest run of upcoming input for which the predicate fails
-
-(these are recogniser combinators that discard nodes produced by other components)
-Skip
- - takes a parser component, executes it, and if it succeeds the result is discarded
- instead of becoming part of the output
-
-(these are recogniser components that do not create nodes)
-Empty
-End_Of_Input
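-
-The following self-contained toy sketch illustrates the Sequence and Choice ideas over a plain
-String, using an array of function accesses as mentioned above. It is not the
-Packrat.Parser.Combinators API: the real combinators work on token arrays and add nodes to a
-parse graph rather than returning Booleans.
-
-   with Ada.Text_IO;
-
-   procedure Sequence_Choice_Demo is
-
-      type Recogniser is access function
-        (Input : String; Position : in out Positive) return Boolean;
-
-      type Recogniser_Array is array (Positive range <>) of Recogniser;
-
-      --  Two simple components that each consume one character on success
-      function Digit (Input : String; Position : in out Positive) return Boolean is
-      begin
-         if Position <= Input'Last and then Input (Position) in '0' .. '9' then
-            Position := Position + 1;
-            return True;
-         end if;
-         return False;
-      end Digit;
-
-      function Plus (Input : String; Position : in out Positive) return Boolean is
-      begin
-         if Position <= Input'Last and then Input (Position) = '+' then
-            Position := Position + 1;
-            return True;
-         end if;
-         return False;
-      end Plus;
-
-      --  Sequence: every part must succeed, one after another
-      function Sequence
-        (Parts : Recogniser_Array; Input : String; Position : in out Positive)
-        return Boolean is
-      begin
-         for P of Parts loop
-            if not P (Input, Position) then
-               return False;
-            end if;
-         end loop;
-         return True;
-      end Sequence;
-
-      --  Choice: try each part in order, the first one to succeed wins
-      function Choice
-        (Parts : Recogniser_Array; Input : String; Position : in out Positive)
-        return Boolean is
-      begin
-         for P of Parts loop
-            declare
-               Saved : constant Positive := Position;
-            begin
-               if P (Input, Position) then
-                  return True;
-               end if;
-               Position := Saved;
-            end;
-         end loop;
-         return False;
-      end Choice;
-
-      Position : Positive := 1;
-
-   begin
-      --  A digit, then a '+', then a digit
-      if Sequence ((Digit'Access, Plus'Access, Digit'Access), "1+2", Position) then
-         Ada.Text_IO.Put_Line ("sequence matched, stopped at" & Positive'Image (Position));
-      end if;
-
-      Position := 1;
-      --  First alternative (Digit) fails on '+', second (Plus) succeeds
-      if Choice ((Digit'Access, Plus'Access), "+7", Position) then
-         Ada.Text_IO.Put_Line ("choice matched, stopped at" & Positive'Image (Position));
-      end if;
-   end Sequence_Choice_Demo;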
-
-
-
-
-
-
-
-
-Ratnest
-
-List of funcs:
-Run_Tests
-
-
-
-
-Ratnest.Tests
-
-List of funcs:
-Valid_Message_Check
-Valid_Identifier_Check
-Join_Check
-Encode_1_Check
-Encode_2_Check
-Encode_3_Check
-Encode_4_Check
-Decode_Check
-
-Token_Adjust_Check
-Token_Store_Check
-
-In_Set_Check
-Not_In_Set_Check
-
-Is_Digit_Check
-Is_Hex_Check
-Is_Letter_Check
-Is_Alphanumeric_Check
-Is_ASCII_Check
-Is_Extended_ASCII_Check
-Is_Space_Check
-Is_Linespace_Check
-Is_End_Of_Line_Check
-Is_Whitespace_Check
-Is_Not_Whitespace_Check
-
-
-
-
-Ratnest.Examples
- - some parser examples
-
-List of funcs:
-Scientific
-Signed
-Decimal
-Hexadecimal
-Double
-
-
-
-
-
-
-Miscellanea:
-
-Where to place C export binding?
-
-Recognisers/combinators implemented as generic functions/procedures if they require more than just input/output.
-This is done to get around the lack of first class functions, as combinators usually take the form of higher order functions.
-
-eg Term, Sequence, Choice, Some, Many
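-
-A minimal standalone example of the generic workaround described above; the generic formal
-function stands in for what would be a first class function argument elsewhere. The names here
-are invented for illustration and are not part of the library:
-
-   with Ada.Text_IO;
-
-   procedure Generic_Workaround_Demo is
-
-      --  The generic formal function is the stand-in for a function argument
-      generic
-         with function Predicate (Item : in Character) return Boolean;
-      function Count_While (Input : in String) return Natural;
-
-      function Count_While (Input : in String) return Natural is
-         Count : Natural := 0;
-      begin
-         for C of Input loop
-            exit when not Predicate (C);
-            Count := Count + 1;
-         end loop;
-         return Count;
-      end Count_While;
-
-      function Digit_Test (Item : in Character) return Boolean is
-        (Item in '0' .. '9');
-
-      --  Instantiating plays the role of passing the predicate to the combinator
-      function Count_Digits is new Count_While (Predicate => Digit_Test);
-
-   begin
-      --  Prints " 3"
-      Ada.Text_IO.Put_Line (Natural'Image (Count_Digits ("123abc")));
-   end Generic_Workaround_Demo;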
-
-How to force functions/procedures of a particular type to be referentially transparent?
-
-Want support for incremental input, so will need Parse function that can return a partially parsed state that can be
-supplied with more input.
-
-
diff --git a/readme.txt b/readme.txt
new file mode 100644
index 0000000..bc2d77b
--- /dev/null
+++ b/readme.txt
@@ -0,0 +1,50 @@
+
+
+Packrat Parser Combinators
+==========================
+
+
+
+
+This is a parser combinator library that can handle ambiguous left-recursive
+grammars in at most polynomial time and space complexity. It does this by
+memoizing all intermediate parse results and deduplicating them inside a
+specialised parse graph structure.
+
+For more extensive documentation including a quick start guide and short
+explanations of individual combinators, see the /doc/index.html file.
+
+
+
+
+Dependencies:
+
+ GNAT
+ directed-graph
+ basic-unit-test (only if compiling the unit test suite)
+
+
+
+
+How to build/install:
+
+This repository uses the GNAT Project Manager build tools. To build Packrat,
+use the following command:
+
+ gprbuild packrat.gpr -Xmode=release
+
+And to install the library, use:
+
+ gprinstall -p -m packrat.gpr
+
+The other project files (tests.gpr and examples.gpr) are for the unit test
+suite and the provided examples of how to use this library.
+
+
+
+
+For further information on the build tools, consult
+
+ https://docs.adacore.com/gprbuild-docs/html/gprbuild_ug.html
+
+