| Safe Haskell | None |
|---|---|
| Language | GHC2021 |
Aihc.Parser.Lex
Description
This module performs the pre-parse tokenization step for Haskell source code.
It turns raw text into LexTokens that preserve:
- a semantic token classification (LexTokenKind)
- the original token text (lexTokenText)
- source location information (lexTokenSpan)
The lexer runs in two phases:
- Raw tokenization with a custom incremental scanner that consumes one or more input chunks and emits tokens lazily. Extension-specific lexing (such as NegativeLiterals and LexicalNegation) is handled inline during this phase by tracking the previous-token context.
- Layout insertion (applyLayoutTokens), which inserts virtual {, ;, and } according to indentation (the offside rule), so the parser can treat implicit layout like explicit braces and semicolons.
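The previous-token-context trick used for extension-specific lexing can be sketched with the minus sign, which this lexer splits into TkMinusOperator and TkPrefixMinus. The sketch below is a simplified stand-in (toy Prev and Minus types, not the module's real token types, and it ignores which of NegativeLiterals/LexicalNegation is enabled):

```haskell
-- Sketch: decide whether '-' is a binary operator or a prefix minus by
-- looking at the token that precedes it. After an operand (identifier,
-- literal, or closing bracket) '-' is binary; otherwise it is prefix.
data Prev = Operand | NonOperand          -- coarse class of the previous token

data Minus = BinaryMinus | PrefixMinus deriving (Eq, Show)

classifyMinus :: Maybe Prev -> Minus
classifyMinus (Just Operand) = BinaryMinus   -- e.g. "x - 1"
classifyMinus _              = PrefixMinus   -- e.g. "(- 1)" or start of input
```

The real lexer threads this previous-token state through its incremental scan, so the decision costs nothing extra.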
Scanning is incremental and error-tolerant:
- token production starts as soon as enough input is available
- malformed lexemes produce TkError tokens instead of aborting lexing
- # ..., #line ..., {-# LINE #-}, and {-# COLUMN #-} are handled in-band by the lexer and update subsequent token spans without being exposed as normal tokens
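The in-band handling of line directives can be illustrated with a minimal self-contained sketch (Strings instead of Text, and only the {-# LINE n "file" #-} form; the real lexer handles the other directive spellings and updates full spans):

```haskell
import Data.Char (isDigit)
import Data.List (stripPrefix)

-- Walk input lines, consume LINE directives in-band, and report the
-- effective source line number for each remaining line. A directive
-- names the line number of the line that FOLLOWS it, matching GHC's
-- LINE pragma semantics.
renumber :: [String] -> [(Int, String)]
renumber = go 1
  where
    go _ [] = []
    go n (l : ls)
      | Just rest <- stripPrefix "{-# LINE " l
      , (digits@(_ : _), _) <- span isDigit rest =
          go (read digits) ls          -- directive consumed, not emitted
      | otherwise = (n, l) : go (n + 1) ls
```

As in the module itself, the directive never appears in the output; only the positions of later lines change.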
Layout-sensitive syntax is the tricky part. The implementation tracks a stack of
layout contexts and mirrors the haskell-src-exts model summarized in
docs/hse-indentation-layout.md:
- after layout-introducing keywords (currently do, of, let, where, \case, plus optional module body layout), mark a pending implicit block
- if the next token is an explicit {, disable implicit insertion for that block
- otherwise, open an implicit layout context at the next token's column
- at beginning-of-line tokens, a dedent emits a virtual }, and equal indentation emits a virtual ; (with a small suppression rule for then/else)
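The dedent/equal-indent behaviour can be sketched for a single implicit block. This is a deliberately minimal model: the real implementation keeps a stack of layout contexts, re-examines the dedenting token against outer contexts, and applies the then/else suppression rule, all omitted here:

```haskell
-- Offside rule for ONE implicit block: open "{" at the first item's
-- column, emit ";" for items at exactly that column, pass through
-- deeper-indented tokens, and close "}" on the first dedent.
layoutBlock :: [(Int, String)] -> [String]
layoutBlock [] = ["{", "}"]
layoutBlock ((c0, t0) : rest) = "{" : t0 : go rest
  where
    go [] = ["}"]
    go ((c, t) : ts)
      | c == c0   = ";" : t : go ts   -- same column: a new item
      | c >  c0   = t : go ts         -- deeper indent: continuation
      | otherwise = ["}"]             -- dedent: close (the real lexer would
                                      -- hand t to the enclosing context)
```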
Keyword classification is intentionally lexical and exact. lexIdentifier
produces a keyword token only when the full identifier text exactly matches a
reserved word in keywordTokenKind. That means:
- where becomes TkKeywordWhere
- where', _where, and M.where remain identifiers
In other words, use keyword tokens only for exact reserved lexemes; contextual validity is left to the parser.
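The exact-match rule can be sketched self-containedly (a toy Tok type and an abbreviated reserved-word list, not the module's LexTokenKind or keywordTokenKind):

```haskell
-- Only a full-text match against the reserved-word table yields a
-- keyword token; any other identifier text stays an identifier.
data Tok = Keyword String | Ident String deriving (Eq, Show)

reservedWords :: [String]
reservedWords = ["case", "do", "let", "of", "then", "else", "where"]  -- abbreviated

classify :: String -> Tok
classify s
  | s `elem` reservedWords = Keyword s
  | otherwise              = Ident s
```

Because the match is purely lexical, where' and _where never become keywords, and contextual checks (e.g. a stray where in an expression) stay in the parser.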
Synopsis
- data LexToken = LexToken {}
- data LexTokenKind
- = TkKeywordCase
- | TkKeywordClass
- | TkKeywordData
- | TkKeywordDefault
- | TkKeywordDeriving
- | TkKeywordDo
- | TkKeywordElse
- | TkKeywordForeign
- | TkKeywordIf
- | TkKeywordImport
- | TkKeywordIn
- | TkKeywordInfix
- | TkKeywordInfixl
- | TkKeywordInfixr
- | TkKeywordInstance
- | TkKeywordLet
- | TkKeywordModule
- | TkKeywordNewtype
- | TkKeywordOf
- | TkKeywordThen
- | TkKeywordType
- | TkKeywordWhere
- | TkKeywordUnderscore
- | TkKeywordQualified
- | TkKeywordAs
- | TkKeywordHiding
- | TkReservedDotDot
- | TkReservedColon
- | TkReservedDoubleColon
- | TkReservedEquals
- | TkReservedBackslash
- | TkReservedPipe
- | TkReservedLeftArrow
- | TkReservedRightArrow
- | TkReservedAt
- | TkReservedDoubleArrow
- | TkVarId Text
- | TkConId Text
- | TkQVarId Text
- | TkQConId Text
- | TkVarSym Text
- | TkConSym Text
- | TkQVarSym Text
- | TkQConSym Text
- | TkInteger Integer
- | TkIntegerBase Integer Text
- | TkFloat Double Text
- | TkChar Char
- | TkString Text
- | TkSpecialLParen
- | TkSpecialRParen
- | TkSpecialComma
- | TkSpecialSemicolon
- | TkSpecialLBracket
- | TkSpecialRBracket
- | TkSpecialBacktick
- | TkSpecialLBrace
- | TkSpecialRBrace
- | TkMinusOperator
- | TkPrefixMinus
- | TkPrefixBang
- | TkPrefixTilde
- | TkPragmaLanguage [ExtensionSetting]
- | TkPragmaWarning Text
- | TkPragmaDeprecated Text
- | TkQuasiQuote Text Text
- | TkError Text
- isReservedIdentifier :: Text -> Bool
- readModuleHeaderExtensions :: Text -> [ExtensionSetting]
- readModuleHeaderExtensionsFromChunks :: [Text] -> [ExtensionSetting]
- lexTokensFromChunks :: [Text] -> [LexToken]
- lexModuleTokensFromChunks :: [Extension] -> [Text] -> [LexToken]
- lexTokensWithExtensions :: [Extension] -> Text -> [LexToken]
- lexModuleTokensWithExtensions :: [Extension] -> Text -> [LexToken]
- lexTokens :: Text -> [LexToken]
- lexModuleTokens :: Text -> [LexToken]
Documentation

data LexToken Source #

Constructors

| LexToken | a token carrying its kind, original text, and source span |

Instances
data LexTokenKind Source #
Constructors

(as listed under LexTokenKind in the Synopsis above; none carries per-constructor documentation)
Instances
- Shorthand LexTokenKind Source # (defined in Aihc.Parser.Shorthand; shorthand :: LexTokenKind -> Doc ())
- NFData LexTokenKind Source # (defined in Aihc.Parser.Lex; rnf :: LexTokenKind -> ())
- Generic LexTokenKind Source # (defined in Aihc.Parser.Lex)
- Read LexTokenKind Source # (defined in Aihc.Parser.Lex; readsPrec, readList)
- Show LexTokenKind Source # (defined in Aihc.Parser.Lex; showsPrec, show, showList)
- Eq LexTokenKind Source # (defined in Aihc.Parser.Lex)
- Ord LexTokenKind Source # (defined in Aihc.Parser.Lex; compare, (<), (<=), (>), (>=), max, min)
- type Rep LexTokenKind Source #
Defined in Aihc.Parser.Lex type Rep LexTokenKind = D1 ('MetaData "LexTokenKind" "Aihc.Parser.Lex" "aihc-parser-0.1.0.0-DMgbIAjzuEdJKCHQvjmdks" 'False) ((((((C1 ('MetaCons "TkKeywordCase" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordClass" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkKeywordData" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordDefault" 'PrefixI 'False) (U1 :: Type -> Type))) :+: ((C1 ('MetaCons "TkKeywordDeriving" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordDo" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkKeywordElse" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordForeign" 'PrefixI 'False) (U1 :: Type -> Type)))) :+: (((C1 ('MetaCons "TkKeywordIf" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordImport" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkKeywordIn" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordInfix" 'PrefixI 'False) (U1 :: Type -> Type))) :+: ((C1 ('MetaCons "TkKeywordInfixl" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordInfixr" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkKeywordInstance" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordLet" 'PrefixI 'False) (U1 :: Type -> Type))))) :+: ((((C1 ('MetaCons "TkKeywordModule" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordNewtype" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkKeywordOf" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordThen" 'PrefixI 'False) (U1 :: Type -> Type))) :+: ((C1 ('MetaCons "TkKeywordType" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordWhere" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkKeywordUnderscore" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordQualified" 'PrefixI 'False) (U1 :: Type -> Type)))) :+: (((C1 ('MetaCons "TkKeywordAs" 'PrefixI 
'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkKeywordHiding" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkReservedDotDot" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkReservedColon" 'PrefixI 'False) (U1 :: Type -> Type))) :+: ((C1 ('MetaCons "TkReservedDoubleColon" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkReservedEquals" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkReservedBackslash" 'PrefixI 'False) (U1 :: Type -> Type) :+: (C1 ('MetaCons "TkReservedPipe" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkReservedLeftArrow" 'PrefixI 'False) (U1 :: Type -> Type))))))) :+: (((((C1 ('MetaCons "TkReservedRightArrow" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkReservedAt" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkReservedDoubleArrow" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkVarId" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)))) :+: ((C1 ('MetaCons "TkConId" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkQVarId" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text))) :+: (C1 ('MetaCons "TkQConId" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkVarSym" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text))))) :+: (((C1 ('MetaCons "TkConSym" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkQVarSym" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text))) :+: (C1 
('MetaCons "TkQConSym" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkInteger" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Integer)))) :+: ((C1 ('MetaCons "TkIntegerBase" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Integer) :*: S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkFloat" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Double) :*: S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text))) :+: (C1 ('MetaCons "TkChar" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Char)) :+: (C1 ('MetaCons "TkString" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkSpecialLParen" 'PrefixI 'False) (U1 :: Type -> Type)))))) :+: ((((C1 ('MetaCons "TkSpecialRParen" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkSpecialComma" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkSpecialSemicolon" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkSpecialLBracket" 'PrefixI 'False) (U1 :: Type -> Type))) :+: ((C1 ('MetaCons "TkSpecialRBracket" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkSpecialBacktick" 'PrefixI 'False) (U1 :: Type -> Type)) :+: (C1 ('MetaCons "TkSpecialLBrace" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkSpecialRBrace" 'PrefixI 'False) (U1 :: Type -> Type)))) :+: (((C1 ('MetaCons "TkMinusOperator" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkPrefixMinus" 'PrefixI 'False) (U1 
:: Type -> Type)) :+: (C1 ('MetaCons "TkPrefixBang" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "TkPrefixTilde" 'PrefixI 'False) (U1 :: Type -> Type))) :+: ((C1 ('MetaCons "TkPragmaLanguage" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 [ExtensionSetting])) :+: C1 ('MetaCons "TkPragmaWarning" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text))) :+: (C1 ('MetaCons "TkPragmaDeprecated" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: (C1 ('MetaCons "TkQuasiQuote" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text) :*: S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text)) :+: C1 ('MetaCons "TkError" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Text))))))))) | |||||
isReservedIdentifier :: Text -> Bool Source #
readModuleHeaderExtensions :: Text -> [ExtensionSetting] Source #
Read leading module-header pragmas and return parsed LANGUAGE settings.
This scans only the pragma/header prefix (allowing whitespace and comments) and stops at the first non-pragma token or lexer error token.
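The prefix-only scan can be sketched with a simplified String-based version (the real function works on Text, tolerates comments, and parses full ExtensionSettings including negations, all omitted here):

```haskell
import Data.Char (isSpace)
import Data.List (isSuffixOf, stripPrefix)

-- Collect extension names from leading "{-# LANGUAGE ... #-}" lines and
-- stop at the first line that is neither blank nor a LANGUAGE pragma.
headerExtensions :: [String] -> [String]
headerExtensions = go
  where
    go [] = []
    go (l : ls)
      | all isSpace l = go ls                        -- skip blank lines
      | Just body <- stripPrefix "{-# LANGUAGE" (dropWhile isSpace l)
      , "#-}" `isSuffixOf` l =
          splitExts (takeWhile (/= '#') body) ++ go ls
      | otherwise = []                               -- first non-pragma: stop
    splitExts = words . map (\c -> if c == ',' then ' ' else c)
```

Note how a LANGUAGE pragma appearing after the first non-pragma line is ignored, matching the "header prefix only" behaviour described above.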
readModuleHeaderExtensionsFromChunks :: [Text] -> [ExtensionSetting] Source #
Read leading module-header pragmas from one or more input chunks.
This scans only the pragma/header prefix (allowing whitespace and comments) and stops at the first non-pragma token or lexer error token.
lexTokensFromChunks :: [Text] -> [LexToken] Source #
Lex an expression/declaration stream from one or more input chunks.
Tokens are produced lazily, so downstream consumers can begin parsing before the full source has been scanned.
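The laziness guarantee can be illustrated with a toy chunk tokenizer (whitespace-separated words standing in for real lexemes; the actual lexer scans Text chunks and tracks positions):

```haskell
-- Concatenate chunks lazily and split on whitespace: early tokens are
-- available before later chunks are ever forced.
tokensFromChunks :: [String] -> [String]
tokensFromChunks = words . concat
```

Because both concat and words are lazy, taking the first few tokens never touches trailing chunks, which is what lets a parser start before the whole source has been read.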
lexModuleTokensFromChunks :: [Extension] -> [Text] -> [LexToken] Source #
Lex a full module from one or more input chunks with explicit extensions.
This variant enables module-body layout insertion in addition to the normal token scan and extension rewrites.
lexTokensWithExtensions :: [Extension] -> Text -> [LexToken] Source #
Lex source text using explicit lexer extensions.
This runs raw tokenization, extension rewrites, and implicit-layout insertion.
Module-body layout is not enabled here. Malformed lexemes become TkError
tokens in the token stream.
lexModuleTokensWithExtensions :: [Extension] -> Text -> [LexToken] Source #
Lex module source text using explicit lexer extensions.
Like lexTokensWithExtensions, but also enables top-level module-body layout:
when the source omits explicit braces, virtual layout tokens are inserted
after module ... where (or from the first non-pragma token in module-less files).
lexModuleTokens :: Text -> [LexToken] Source #
Convenience entry point for lexing a full module with no explicit extension list.
Leading header pragmas are scanned first so module-enabled extensions can be applied before token rewrites and top-level layout insertion.