Parsing
Idris 2 comes with lexing and parsing libraries built into the contrib package.
For this cookbook, we will write a very simple parser for a lambda calculus that accepts the following language:
let name = world in (\x.hello x) name
Once we have written the lambda calculus parser, we will also see how to take advantage of a powerful built-in expression parser in Idris 2 to write a small calculator that is smart enough to parse the following expression:
1 + 2 - 3 * 4 / 5
Lexer
The main lexer module is Text.Lexer. This module contains toTokenMap, a function that converts a List (Lexer, k) into a TokenMap (Token k), where k is a token kind. This function can be used for simple lexer-to-token mappings. The module also includes high-level lexers for specifying quantity, and for common programming primitives such as alphas, intLit, lineComment and blockComment.
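As a quick sketch (a toy example of our own, not taken from the library), toTokenMap turns a list of lexer/kind pairs into a token map:

```idris
import Text.Lexer

-- A toy token kind: numbers, plus whitespace we intend to discard later.
data ToyKind = TNum | TSpace

-- toTokenMap tags everything each lexer matches with the given kind.
toyTokenMap : TokenMap (Token ToyKind)
toyTokenMap = toTokenMap [
  (digits, TNum),
  (spaces, TSpace)
]
```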
The Text.Lexer module also re-exports Text.Lexer.Core, Text.Quantity and Text.Token.
Text.Lexer.Core provides the building blocks of the lexer, including a type called Recognise, which is the underlying data type of the lexer. The other important function this module provides is lex, which takes a lexer and returns the tokens.
Text.Quantity provides a data type Quantity, which can be used with certain lexers to specify how many times something is expected to appear.
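For example (a sketch of our own), the count lexer from Text.Lexer pairs a Quantity with another lexer:

```idris
import Text.Lexer

-- A year is exactly four digits.
year : Lexer
year = count (exactly 4) digit

-- An identifier: one letter followed by at most two more characters.
shortId : Lexer
shortId = alpha <+> count (atMost 2) alphaNum
```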
Text.Token provides a data type Token that represents a lexed token, its kind and its text. This module also provides an important interface called TokenKind, which specifies how to map token kinds to Idris 2 types and how to convert each kind's text into a value.
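As a minimal sketch (a toy kind of our own, mirroring what we do for real below), a TokenKind instance maps each kind to a type and converts the matched text:

```idris
import Text.Lexer
import Text.Token

-- A toy kind where numbers carry an Integer and whitespace carries unit.
data ToyKind = TNum | TSpace

TokenKind ToyKind where
  TokType TNum = Integer
  TokType _ = ()

  tokValue TNum s = cast s
  tokValue TSpace _ = ()
```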
Parser
The main parser module is Text.Parser. This module contains different grammar combinators, the main one being match, which takes a token kind and returns the value as defined in the TokenKind interface. There are other grammar combinators as well, but for our example we will only be using match.
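For instance, with the LambdaTokenKind instance defined later in this recipe (where TokType LTIdentifier = String), the type of the value that match returns depends on the kind we ask for:

```idris
-- match LTIdentifier consumes one identifier token and yields its text,
-- because TokType LTIdentifier = String.
name : Grammar state LambdaToken True String
name = match LTIdentifier

-- match LTDot yields (), since TokType LTDot = ().
dot : Grammar state LambdaToken True ()
dot = match LTDot
```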
The Text.Parser module re-exports Text.Parser.Core, Text.Quantity and Text.Token.
Text.Parser.Core provides the building blocks of the parser, including a type called Grammar, which is the underlying data type of the parser. The other important function this module provides is parse, which takes in a Grammar and returns the parsed expression.
We covered Text.Quantity and Text.Token in the Lexer section, so we won't repeat what they do here.
Lambda Calculus Lexer & Parser
import Data.List
import Data.List1
import Text.Lexer
import Text.Parser

%default total

data Expr = App Expr Expr | Abs String Expr | Var String | Let String Expr Expr

Show Expr where
  showPrec d (App e1 e2) = showParens (d == App) (showPrec (User 0) e1 ++ " " ++ showPrec App e2)
  showPrec d (Abs v e) = showParens (d > Open) ("\\" ++ v ++ "." ++ show e)
  showPrec d (Var v) = v
  showPrec d (Let v e1 e2) = showParens (d > Open) ("let " ++ v ++ " = " ++ show e1 ++ " in " ++ show e2)

data LambdaTokenKind
  = LTLambda
  | LTIdentifier
  | LTDot
  | LTOParen
  | LTCParen
  | LTIgnore
  | LTLet
  | LTEqual
  | LTIn

Eq LambdaTokenKind where
  (==) LTLambda LTLambda = True
  (==) LTDot LTDot = True
  (==) LTIdentifier LTIdentifier = True
  (==) LTOParen LTOParen = True
  (==) LTCParen LTCParen = True
  (==) LTLet LTLet = True
  (==) LTEqual LTEqual = True
  (==) LTIn LTIn = True
  (==) _ _ = False

Show LambdaTokenKind where
  show LTLambda = "LTLambda"
  show LTDot = "LTDot"
  show LTIdentifier = "LTIdentifier"
  show LTOParen = "LTOParen"
  show LTCParen = "LTCParen"
  show LTIgnore = "LTIgnore"
  show LTLet = "LTLet"
  show LTEqual = "LTEqual"
  show LTIn = "LTIn"

LambdaToken : Type
LambdaToken = Token LambdaTokenKind

Show LambdaToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

TokenKind LambdaTokenKind where
  TokType LTIdentifier = String
  TokType _ = ()

  tokValue LTLambda _ = ()
  tokValue LTIdentifier s = s
  tokValue LTDot _ = ()
  tokValue LTOParen _ = ()
  tokValue LTCParen _ = ()
  tokValue LTIgnore _ = ()
  tokValue LTLet _ = ()
  tokValue LTEqual _ = ()
  tokValue LTIn _ = ()

ignored : WithBounds LambdaToken -> Bool
ignored (MkBounded (Tok LTIgnore _) _ _) = True
ignored _ = False

identifier : Lexer
identifier = alpha <+> many alphaNum

keywords : List (String, LambdaTokenKind)
keywords = [
  ("let", LTLet),
  ("in", LTIn)
]

lambdaTokenMap : TokenMap LambdaToken
lambdaTokenMap = toTokenMap [(spaces, LTIgnore)] ++
  [(identifier, \s =>
      case lookup s keywords of
        (Just kind) => Tok kind s
        Nothing => Tok LTIdentifier s
   )
  ] ++ toTokenMap [
    (exact "\\", LTLambda),
    (exact ".", LTDot),
    (exact "(", LTOParen),
    (exact ")", LTCParen),
    (exact "=", LTEqual)
  ]

lexLambda : String -> Maybe (List (WithBounds LambdaToken))
lexLambda str =
  case lex lambdaTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  expr : Grammar state LambdaToken True Expr
  expr = do
    t <- term
    app t <|> pure t

  term : Grammar state LambdaToken True Expr
  term = abs
    <|> var
    <|> paren
    <|> letE

  app : Expr -> Grammar state LambdaToken True Expr
  app e1 = do
    e2 <- term
    app1 $ App e1 e2

  app1 : Expr -> Grammar state LambdaToken False Expr
  app1 e = app e <|> pure e

  abs : Grammar state LambdaToken True Expr
  abs = do
    match LTLambda
    commit
    argument <- match LTIdentifier
    match LTDot
    e <- expr
    pure $ Abs argument e

  var : Grammar state LambdaToken True Expr
  var = map Var $ match LTIdentifier

  paren : Grammar state LambdaToken True Expr
  paren = do
    match LTOParen
    e <- expr
    match LTCParen
    pure e

  letE : Grammar state LambdaToken True Expr
  letE = do
    match LTLet
    commit
    argument <- match LTIdentifier
    match LTEqual
    e1 <- expr
    match LTIn
    e2 <- expr
    pure $ Let argument e1 e2

parseLambda : List (WithBounds LambdaToken) -> Either String Expr
parseLambda toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse : String -> Either String Expr
parse x =
  case lexLambda x of
    Just toks => parseLambda toks
    Nothing => Left "Failed to lex."
Testing out our parser gives us back the following output:
$ idris2 -p contrib LambdaCalculus.idr
Main> :exec printLn $ parse "let name = world in (\\x.hello x) name"
Right (let name = world in (\x.hello x) name)
Expression Parser
Idris 2 also comes with a very convenient expression parser in Text.Parser.Expression that is aware of precedence and associativity.
The main function, buildExpressionParser, takes an OperatorTable and a Grammar that represents the terms, and returns a parsed expression. The magic comes from the OperatorTable, since this table defines all the operators, the grammars for those operators, their precedence, and their associativity.
An OperatorTable is a list of lists containing the Op type. The Op type allows you to specify Prefix, Postfix, and Infix operators along with their grammars. Infix also takes an associativity, Assoc, which can be AssocLeft for left-associative, AssocRight for right-associative, or AssocNone for non-associative operators.
An example of an operator table we’ll be using for the calculator is:
[
  [ Infix ((*) <$ match CTMultiply) AssocLeft
  , Infix ((/) <$ match CTDivide) AssocLeft
  ],
  [ Infix ((+) <$ match CTPlus) AssocLeft
  , Infix ((-) <$ match CTMinus) AssocLeft
  ]
]
This table defines four operators: multiplication, division, addition and subtraction. Multiplication and division appear in the first row because they have higher precedence than addition and subtraction, which appear in the second row. We define them all as infix operators, each with a specific grammar, and all left-associative via AssocLeft.
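Rows are not limited to Infix: Prefix and Postfix operators fit in the same table. As a sketch (our own extension, reusing the calculator's tokens defined below, and not part of the recipe's code), a highest-precedence row for unary minus could be added at the front:

```idris
-- Adding a unary-minus row; Prefix wraps a grammar producing an (a -> a)
-- function, and the first row binds tightest.
negTable : OperatorTable state CalculatorToken Double
negTable = [
  [ Prefix (negate <$ match CTMinus) ],
  [ Infix ((*) <$ match CTMultiply) AssocLeft
  , Infix ((/) <$ match CTDivide) AssocLeft
  ],
  [ Infix ((+) <$ match CTPlus) AssocLeft
  , Infix ((-) <$ match CTMinus) AssocLeft
  ]
]
```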
Building a Calculator
import Data.List1
import Text.Lexer
import Text.Parser
import Text.Parser.Expression

%default total

data CalculatorTokenKind
  = CTNum
  | CTPlus
  | CTMinus
  | CTMultiply
  | CTDivide
  | CTOParen
  | CTCParen
  | CTIgnore

Eq CalculatorTokenKind where
  (==) CTNum CTNum = True
  (==) CTPlus CTPlus = True
  (==) CTMinus CTMinus = True
  (==) CTMultiply CTMultiply = True
  (==) CTDivide CTDivide = True
  (==) CTOParen CTOParen = True
  (==) CTCParen CTCParen = True
  (==) _ _ = False

Show CalculatorTokenKind where
  show CTNum = "CTNum"
  show CTPlus = "CTPlus"
  show CTMinus = "CTMinus"
  show CTMultiply = "CTMultiply"
  show CTDivide = "CTDivide"
  show CTOParen = "CTOParen"
  show CTCParen = "CTCParen"
  show CTIgnore = "CTIgnore"

CalculatorToken : Type
CalculatorToken = Token CalculatorTokenKind

Show CalculatorToken where
  show (Tok kind text) = "Tok kind: " ++ show kind ++ " text: " ++ text

TokenKind CalculatorTokenKind where
  TokType CTNum = Double
  TokType _ = ()

  tokValue CTNum s = cast s
  tokValue CTPlus _ = ()
  tokValue CTMinus _ = ()
  tokValue CTMultiply _ = ()
  tokValue CTDivide _ = ()
  tokValue CTOParen _ = ()
  tokValue CTCParen _ = ()
  tokValue CTIgnore _ = ()

ignored : WithBounds CalculatorToken -> Bool
ignored (MkBounded (Tok CTIgnore _) _ _) = True
ignored _ = False

number : Lexer
number = digits

calculatorTokenMap : TokenMap CalculatorToken
calculatorTokenMap = toTokenMap [
  (spaces, CTIgnore),
  (digits, CTNum),
  (exact "+", CTPlus),
  (exact "-", CTMinus),
  (exact "*", CTMultiply),
  (exact "/", CTDivide)
]

lexCalculator : String -> Maybe (List (WithBounds CalculatorToken))
lexCalculator str =
  case lex calculatorTokenMap str of
    (tokens, _, _, "") => Just tokens
    _ => Nothing

mutual
  term : Grammar state CalculatorToken True Double
  term = do
    num <- match CTNum
    pure num

  expr : Grammar state CalculatorToken True Double
  expr = buildExpressionParser [
    [ Infix ((*) <$ match CTMultiply) AssocLeft
    , Infix ((/) <$ match CTDivide) AssocLeft
    ],
    [ Infix ((+) <$ match CTPlus) AssocLeft
    , Infix ((-) <$ match CTMinus) AssocLeft
    ]
  ] term

parseCalculator : List (WithBounds CalculatorToken) -> Either String Double
parseCalculator toks =
  case parse expr $ filter (not . ignored) toks of
    Right (l, []) => Right l
    Right e => Left "contains tokens that were not consumed"
    Left e => Left (show e)

parse1 : String -> Either String Double
parse1 x =
  case lexCalculator x of
    Just toks => parseCalculator toks
    Nothing => Left "Failed to lex."
Testing out our calculator gives us back the following output:
$ idris2 -p contrib Calculator.idr
Main> :exec printLn $ parse1 "1 + 2 - 3 * 4 / 5"
Right 0.6000000000000001
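Note that the calculator declares CTOParen and CTCParen but never lexes or parses them. As a hedged sketch (our own extension, not part of the recipe above), parentheses could be supported by adding entries for "(" and ")" to calculatorTokenMap and widening term inside the mutual block:

```idris
-- Extra token-map entries (appended to calculatorTokenMap):
--   (exact "(", CTOParen),
--   (exact ")", CTCParen)

-- A term is then either a number or a parenthesised expression.
term : Grammar state CalculatorToken True Double
term = (do num <- match CTNum
           pure num)
   <|> (do match CTOParen
           e <- expr
           match CTCParen
           pure e)
```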