packrat_parser_lib_notes.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459


package structure:

Ada.Containers.Directed_Graphs

Packrat
Packrat.Errors (nested)
Packrat.Tokens (nested, generic over contained array)
Packrat.Util
Packrat.Parse_Graphs (generic over memoize enum, tokens)
Packrat.Lexer (generic over stamp enum, input item type, array of input items, array of output items wrapped as tokens)
Packrat.Parser (generic over memoize enum, input item type, array of input items, graph of output item subarrays)
Packrat.Text
Packrat.Text.No_Lex (generic over parser label enum)
Packrat.Text.Standard (generic over parser/lexer label enums)
Packrat.Text.Wide (generic over parser/lexer label enums)
Packrat.Text.Wide_Wide (generic over parser/lexer label enums)

Ratnest.Tests
Ratnest.Examples

Calculator
Tomita
Off_Side


planned order of writing:
(in all cases notes here, then spec, then tests, then body)
(Ratnest not mentioned since that's where all the testing functions will go)

Packrat.Util
Packrat
Packrat.Errors
Packrat.Tokens
Packrat.Lexer
Ada.Containers.Directed_Graphs
Packrat.Parse_Graphs
Packrat.Parser
Packrat.Text
Packrat.Text.No_Lex
Packrat.Text.Standard
Packrat.Text.Wide
Packrat.Text.Wide_Wide
Calculator
Tomita
Off_Side


Packrat
 - main package
 - defines exceptions, some common enumeration types that are used throughout
 - contains Errors and Tokens as nested packages
 - has some protected utility functions to convert between String and Unbounded_String types
 - intended usage mode is (token_array -> lexer -> token_array -> parser -> parse_graph) although
   it is also possible for (token_array -> parser -> parse_graph)

List of types:
 Result_Status
 Parser_Error
 Lexer_Error


Packrat.Errors
 - nested package, as this functionality is important to parsers/lexers
 - functions to handle and process exception messages
 - exception messages take the form of one or more "s<SYMBOLNAME>p<POSITION>" substrings joined together,
   with each symbol name being in all capitals and each position being a positive integer
 - this message represents the list of expected symbols at particular positions that would have resulted in a
   more complete parse
 - one of these messages will be raised with an exception if no valid parse is possible with a given input

List of datatypes:
 Error_Message (the errors encoded into a String)
 Error_Info (containing a string of the symbol name expected, and a natural of the position)
 Error_Info_Array

List of funcs:
 Valid_Message
 Valid_Identifier
 Valid_Identifier_Array
 Debug_String

Join
 Encode
 Encode_Array
 Decode
(the message format ensures that using "&" to join messages will still result in a valid message)


Packrat.Tokens
 - nested package, defines a datatype important throughout lexing/parsing
 - generic over the array type of whatever is being lexed/parsed and the enum of valid token labels
 - contains an enum identifier, the start position, and the token value
 - cannot contain the finish position of a token because that would make it impossible
   to avoid exponential blowup in parse graphs for ambiguous grammars

List of datatypes:
 Token

List of funcs:
 Create
 Debug_String

 Label
 Value
 Start


Packrat.Util
 - contains predicates to use in Satisfy parser components and similar

List of funcs:
 In_Set
 Not_In_Set
 Is_Digit
 Is_Hex
 Is_Letter
 Is_Alphanumeric
 Is_Punctuation
 Is_ASCII
 Is_Extended_ASCII
 Is_Space
 Is_Linespace
 Is_End_Of_Line
 Is_Whitespace
 Not_Whitespace


Packrat.Lexer
 - generic over the enum used for labeling lexemes, the input element list, and an instantiated Packrat.Tokens
 - contains all datatypes and functions for lexing

List of types:
 Combinator
 Combinator_Result
 Combinator_Array
 Lexer_Context
 Component
 Component_Result
 Component_Array
 With_Input

List of funcs:
(Component functions)
 Stamp (will store a token for the input recognised by its combinators)
 Ignore (won't store a token)

(Lexing functions)
 Scan (can be fed input in chunks, will keep continuing until fed a zero-length array)
 Scan_Only (will assume that the input given is all there is, without considering the possibility of more input)
 Scan_With (takes a function that supplies input when called, processes input the same as Scan)
 Scan_Set (fed input in constant size chunks with a specified padding character)
 Scan_Set_With (a combination of the functionality of Scan_Set and Scan_With)

(Combinator functions)
 Sequence
 Count
 Many
 Many_Until

 Satisfy
 Satisfy_With
 Match
 Match_With
 Multimatch
 Take
 Take_While
 Take_Until

 Line_End
 Input_End


Ada.Containers.Directed_Graphs
 - generic over discrete node type, array of node type, node label type, edge label type
 - implemented using the adjacancy list method
 - allows both nodes and edges to be labeled, one arbitrary type for each
 - nodes and edges don't *have* to be labeled
 - API in the style of the Ada Containers library
 - inspiration also taken from Haskell Data.Graph.Inductive.Graph

List of types:
 Extended_Node_Type
 Edge_Type
 Edge_Type_Array
 Path (derived from the node array type)
 Graph
 Cursor
 Node_Label_Reference
 Constant_Node_Label_Reference
 Edge_Label_Reference
 Constant_Edge_Label_Reference

Constants:
 No_Node (of Extended_Node_Type)
 No_Element (of Cursor)
 Empty_Graph (of Graph)

List of Cursor funcs:
 Has_Element
 Element (cursor -> node)

 Equal_Subgraph
 Subgraph_Node_Count

 First_Parent
 Next_Parent
 Previous_Parent
 Last_Parent
 First_Child
 Next_Child
 Previous_Child
 Last_Child

List of Graph funcs:
 "="

 Assign
 Copy
 Move

 Is_Empty
 Clear
 To_Cursor (node -> cursor)

 Node_Count (a measure of number of nodes in the graph)
 Edge_Count (a measure of number of edges in the graph)

 Nodes (array of all nodes in the graph)
 Edges (array of all edges in the graph)
 Node_Range (returns minimum and maximum of nodes in the graph)
 Unused_Nodes (returns an array of a specified number of nodes not used in the graph)

 Insert (add a node at a given position (with label), add an edge (with label), add multiple nodes, multiple edges)
 Append (add a node at the next available position (with label), return the node added)
 Delete (remove a node, an edge, multiple nodes, multiple edges, from the graph)
 Append_Label (give a node or an edge a label)
 Replace_Label
 Delete_Label (remove a label from a node or an edge)
 Delete_Subgraph
 Swap (switches two nodes in the graph)

 Context (returns all parents of a node, all children of the node)
 Labeled_Context (returns all parents of a node, the label of the node, and all the children of the node)
 Has_Label (predicates for testing if a node or edge has a label in the graph)
 Label (returns the label of a node or an edge if it has one)
 Label_Reference (accessor to a label)
 Constant_Label_Reference (constant accessor)
 Neighbors (two nodes are neighbors if they have edges in both directions)
 Parents (nodes with edges going into a node)
 Children (nodes with edges coming from a node)
 Outbound (edges going out of a node)
 Inbound (edges coming into a node)
 Outdegree (outbound degree of a node)
 Indegree (inbound degree of a node)
 Degree (degree of a node; how many other nodes are connected)
 Has_Edge (check whether two nodes are connected)
 Has_Labeled_Edge (check whether two nodes are connected with a labeled edge)
 Has_Neighbor (check whether two nodes are neighbors)

 Find (find a node or edge with a particular label)
 Find_In_Subgraph
 Contains (check whether a node or an edge is in the graph)
 Contains_Label (check if graph contains a particular node/edge label)

 Iterate
 Iterate_Subgraph
 Iterate_Children
 Iterate_Parents


Packrat.Parse_Graphs
 - generic over an instantiated Packrat.Tokens and associated types
 - sets up the above directed graph lib for use as a parse graph that will avoid exponential blowup
 - provides extra functions to merge graphs and iterate specific parsing choices


Packrat.Parser
 - generic over the enum used for memoizing and the input token_list as well as token elements

List of funcs:
Parse
 - may return partial results if incomplete input supplied
 - new input can then be fed back in with the parse_context and initial results to continue parsing
Parse_Only
 - will not return partial results and will always attempt a complete parse
Parse_With
 - must be supplied with a function to acquire more input
Parse_Full
 - will not return partial results and will always require there to be no more input remaining at end
Memoize


Packrat.Parser.Combinators
 - all higher order functions/procedures go in here
 - updating memo_table should be done by overwriting in order to accommodate left recursion
   unwinding properly

List of funcs:
(these are combinators that arrange nodes produced by parsers but do not produce nodes themselves)
Sequence (one of the two that require passing in an array of function accesses)
 - takes an array of component accesses and adds whatever subtrees they produce in order as
   children of the current position on the output structure
Choice (the other that requires passing in an array of function accesses)
 - takes an array of component accesses and trys them in order, first one to succeed has the
   subtree it produces added as a child of the current position on the output structure
Count
 - 
Many
Many_Until
Separate_By
Separate_End_By

(these are parser components that create nodes from input)
Satisfy
 - takes a predicate and if that predicate applied to the next input token is true, creates a
   node with the next input token
Satisfy_With
 - takes a predicate and a transforming function, applies the transform to the next input token
   then tests it with the predicate as above
 - the node created uses the *un*transformed input
Default
 - takes a default argument (same type as input tokens) and a parser component, and either
   returns the subresult from the component if it succeeds, or creates a node with the default
   argument if the component fails
Match
 - for matching one item, eg a char in a string input, or a string in a lex'd string token array
 - checks a supplied argument against the next input token, and if they match, creates a node
   with the next input token
Match_With
 - first uses a transforming function on the next input token then tests against a supplied
   argument as above, creating a node with the untransformed input if there is a match
Multimatch
 - for matching multiple successive items, eg a substring in a string input
 - checks an array argument against the same length subarray of input tokens for a match, if
   successful creates a node with that subarray
Multimatch_With
 - applies a transforming function before the check as above
Take
 - creates a node with the next input token
Take_While
 - takes a predicate and creates a node with the subarray of input tokens corresponding to
   the next N input where the predicate succeeds
Take_Until
 - takes a predicate and creates a node with the subarray of input tokens corresponding to
   the next N input where the predicate fails

(these are recogniser combinators that discard nodes produced by other components)
Skip
 - takes a parser component, executes it, and if it succeeds the result is discarded
   instead of becoming part of the output

(these are recogniser components that do not create nodes)
Empty
End_Of_Input


Ratnest

List of funcs:
Run_Tests


Ratnest.Tests

List of funcs:
Valid_Message_Check
Valid_Identifier_Check
Join_Check
Encode_1_Check
Encode_2_Check
Encode_3_Check
Encode_4_Check
Decode_Check

Token_Adjust_Check
Token_Store_Check

In_Set_Check
Not_In_Set_Check

Is_Digit_Check
Is_Hex_Check
Is_Letter_Check
Is_Alphanumeric_Check
Is_ASCII_Check
Is_Extended_ASCII_Check
Is_Space_Check
Is_Linespace_Check
Is_End_Of_Line_Check
Is_Whitespace_Check
Is_Not_Whitespace_Check


Ratnest.Examples
 - some parser examples

List of funcs:
Scientific
Signed
Decimal
Hexadecimal
Double


Miscellanea:

Where to place C export binding?

Recognisers/combinators implemented as generic functions/procedures if they require more than just input/output.
This is done to get around the lack of first class functions, as combinators usually take the form of higher order functions.

eg Term, Sequence, Choice, Some, Many

How to force functions/procedures of a particular type to be referentially transparent?

Want support for incremental input, so will need Parse function that can return a partially parsed state that can be
supplied with more input.