tree-sitter-haskell

Haskell grammar for tree-sitter.

Supported Language Extensions

These extensions are supported ✅, unsupported ❌ or not applicable because they don't involve parsing ➖️:

  • AllowAmbiguousTypes ➖️
  • ApplicativeDo ➖️
  • Arrows ❌
  • BangPatterns ✅
  • BinaryLiterals ✅
  • BlockArguments ✅
  • CApiFFI ✅
  • ConstrainedClassMethods ✅
  • ConstraintKinds ✅
  • CPP ✅
  • CUSKs ✅
  • DataKinds ✅
  • DatatypeContexts ✅
  • DefaultSignatures ✅
  • DeriveAnyClass ➖️
  • DeriveDataTypeable ➖️
  • DeriveFoldable ➖️
  • DeriveFunctor ➖️
  • DeriveGeneric ➖️
  • DeriveLift ➖️
  • DeriveTraversable ➖️
  • DerivingStrategies ✅
  • DerivingVia ✅
  • DisambiguateRecordFields ➖️
  • DuplicateRecordFields ➖️
  • EmptyCase ✅
  • EmptyDataDecls ✅
  • EmptyDataDeriving ✅
  • ExistentialQuantification ✅
  • ExplicitForAll ✅
  • ExplicitNamespaces ✅
  • ExtendedDefaultRules ➖️
  • FlexibleContexts ✅
  • FlexibleInstances ✅
  • ForeignFunctionInterface ✅
  • FunctionalDependencies ✅
  • GADTs ✅
  • GADTSyntax ✅
  • GeneralisedNewtypeDeriving ➖️
  • GHCForeignImportPrim ✅
  • Haskell2010 ➖️
  • Haskell98 ➖️
  • HexFloatLiterals ✅
  • ImplicitParams ✅
  • ImplicitPrelude ➖️
  • ImportQualifiedPost ✅
  • ImpredicativeTypes ➖️
  • IncoherentInstances ➖️
  • InstanceSigs ✅
  • InterruptibleFFI ✅
  • KindSignatures ✅
  • LambdaCase ✅
  • LexicalNegation ❌
  • LiberalTypeSynonyms ✅
  • LinearTypes ✅
  • ListTuplePuns ✅
  • MagicHash ✅
  • Modifiers ❌
  • MonadComprehensions ➖️
  • MonadFailDesugaring ➖️
  • MonoLocalBinds ➖️
  • MonomorphismRestriction ➖️
  • MultiParamTypeClasses ✅
  • MultiWayIf ✅
  • NamedFieldPuns ✅
  • NamedWildCards ✅
  • NegativeLiterals ➖️
  • NondecreasingIndentation ✅
  • NPlusKPatterns ➖️
  • NullaryTypeClasses ✅
  • NumDecimals ➖️
  • NumericUnderscores ✅
  • OverlappingInstances ➖️
  • OverloadedLabels ✅
  • OverloadedLists ➖️
  • OverloadedRecordDot ✅
  • OverloadedRecordUpdate ✅
  • OverloadedStrings ➖️
  • PackageImports ✅
  • ParallelListComp ✅
  • PartialTypeSignatures ✅
  • PatternGuards ✅
  • PatternSynonyms ✅
  • PolyKinds ➖️
  • PostfixOperators ➖️
  • QualifiedDo ✅
  • QuantifiedConstraints ✅
  • QuasiQuotes ✅
  • Rank2Types ✅
  • RankNTypes ✅
  • RebindableSyntax ➖️
  • RecordWildCards ➖️
  • RecursiveDo ✅
  • RequiredTypeArguments ✅
  • RoleAnnotations ✅
  • Safe ➖️
  • ScopedTypeVariables ✅
  • StandaloneDeriving ✅
  • StandaloneKindSignatures ✅
  • StarIsType ✅
  • StaticPointers ❌
  • Strict ➖️
  • StrictData ✅
  • TemplateHaskell ✅
  • TemplateHaskellQuotes ✅
  • TraditionalRecordSyntax ➖️
  • TransformListComp ✅
  • Trustworthy ➖️
  • TupleSections ✅
  • TypeAbstractions ✅
  • TypeApplications ✅
  • TypeData ✅
  • TypeFamilies ✅
  • TypeFamilyDependencies ✅
  • TypeInType ✅
  • TypeOperators ✅
  • TypeSynonymInstances ➖️
  • UnboxedSums ✅
  • UnboxedTuples ✅
  • UndecidableInstances ➖️
  • UndecidableSuperClasses ➖️
  • UnicodeSyntax ✅
  • UnliftedFFITypes ➖️
  • UnliftedNewtypes ✅
  • Unsafe ➖️
  • ViewPatterns ✅

Bugs

CPP

Preprocessor #elif and #else directives cannot be handled correctly, since the parser state would have to be manually reset to what it was at the #if. As a workaround, the code blocks in the alternative branches are parsed as part of the directives.

Querying

The grammar contains several supertypes, which group multiple other node types under a single name.

Supertype names do not occur as extra nodes in parse trees, but they can be used in queries in special ways:

  • As an alias, matching any of their subtypes
  • As a prefix for one of their subtypes, matching its symbol only when it occurs as a production of the supertype

For example, the query (expression) matches the nodes infix, record, projection, constructor, and the first and third variable in this tree for cats <> Cat {mood = moods.sleepy}:

(infix
  (variable)
  (operator)
  (record
    (constructor)
    (field_update
      (field_name (variable))
      (projection (variable) (field_name (variable))))))

The two occurrences of variable in field_name (mood and sleepy) are not expressions but record field names that form part of a composite record expression.

To match only those variable nodes that are expressions, use the second special form. A query for (expression/variable) will match only the other two, cats and moods.
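
The difference between the two forms can be sketched with a toy model in JavaScript. This is not the tree-sitter query engine: the node helper, the asExpression flag and both matcher functions are hypothetical stand-ins, and the tree is built by hand from the example above.

```javascript
// Toy model of the example tree; NOT the tree-sitter API.
// `asExpression` marks nodes that occur as a production of the
// `expression` supertype (an assumption taken from the prose above).
const node = (type, asExpression, children = []) => ({ type, asExpression, children });

const tree = node("infix", true, [
  node("variable", true),                                      // cats
  node("operator", false),                                     // <>
  node("record", true, [
    node("constructor", true),                                 // Cat
    node("field_update", false, [
      node("field_name", false, [node("variable", false)]),    // mood
      node("projection", true, [
        node("variable", true),                                // moods
        node("field_name", false, [node("variable", false)]),  // sleepy
      ]),
    ]),
  ]),
]);

// Depth-first list of all nodes in the tree.
const walk = (n) => [n, ...n.children.flatMap(walk)];

// Models `(expression)`: every node produced via the supertype.
const matchSupertype = (root) => walk(root).filter((n) => n.asExpression);

// Models `(expression/variable)`: only nodes of the given type
// that were produced via the supertype.
const matchSubtype = (root, type) => matchSupertype(root).filter((n) => n.type === type);

console.log(matchSupertype(tree).map((n) => n.type));
console.log(matchSubtype(tree, "variable").length);
```

In this model the tree contains four variable nodes, but only two of them survive the supertype-prefixed match, which mirrors the behaviour described for cats and moods.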

The grammar's supertypes consist of the following sets:

  • expression

    Rules that are valid in any expression position, excluding type applications, explicit types and expression signatures.

  • pattern

    Rules that are valid in any pattern position, excluding type binders, explicit types and pattern signatures.

  • type

    Types that are either atomic (have no ambiguous associativity, like bracketed constructs, variables and type constructors), applied types or infix types.

  • quantified_type

    Types prefixed with a forall, context or function parameter.

  • constraint

    Almost the same rules as type, but mirrored for use in contexts.

  • constraints

    Analog of quantified_type, for constraints with forall or context.

  • type_param

    Atomic nodes in type and class heads, like the three nodes following A in data A @k a (b :: k).

  • declaration

    All top-level declarations, like functions and data types.

  • decl

    Shorthand for declarations that are also valid in local bindings (let and where) and in class and instance bodies, except for fixity declarations. Consists of signature, function and bind.

  • class_decl and instance_decl

    All declarations that are valid in classes and instances, which includes associated type and data families.

  • statement

    Different forms of do-notation statements.

  • qualifier

    Different forms of list comprehension qualifiers.

  • guard

    Different forms of guards in function equations and case alternatives.

Development

The main driver for generating and testing the parser for this grammar is the tree-sitter CLI. Other components of the project require additional tools, described below.

Some are made available through npm – for example, npx tree-sitter runs the CLI. If you don't have tree-sitter available otherwise, prefix all the commands in the following sections with npx.

Output path

The CLI writes the shared library containing the parser to the directory denoted by $TREE_SITTER_LIBDIR. If that variable is unset, it defaults to $HOME/.cache/tree-sitter/lib.

In order to avoid clobbering this global directory with development versions, you can set the env var to a local path:

export TREE_SITTER_LIBDIR=$PWD/.lib

The grammar

The JavaScript file grammar.js contains the entry point into the grammar's production rules. Please consult the tree-sitter documentation for a comprehensive introduction to the syntax and semantics.

Parsing starts with the first item in the rules field:

{
  rules: {
    haskell: $ => seq(
      optional($.header),
      optional($._body),
    ),
  }
}
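
To make the shape of this rule concrete, the following sketch evaluates it with minimal stand-ins for the DSL. The seq and optional functions and the $ proxy below are assumptions written for illustration; the real ones are supplied by the tree-sitter CLI when it loads grammar.js. The object layout is modelled on the rule encoding found in src/grammar.json.

```javascript
// Hypothetical stand-ins for the tree-sitter DSL, for illustration only.
const seq = (...members) => ({ type: "SEQ", members });
const optional = (content) => ({
  type: "CHOICE",
  members: [content, { type: "BLANK" }],
});

// `$.name` yields a reference to the rule called `name`.
const $ = new Proxy({}, { get: (_, name) => ({ type: "SYMBOL", name }) });

const grammar = {
  rules: {
    haskell: $ => seq(
      optional($.header),
      optional($._body),
    ),
  },
};

// The first entry of `rules` is the start rule.
const [startName, startRule] = Object.entries(grammar.rules)[0];
console.log(startName, JSON.stringify(startRule($), null, 2));
```

Both members are optional, so under this reading an empty file is a valid haskell node.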

Generating the parser

The first step in the development workflow converts the JavaScript rule definitions to C code in src/parser.c:

$ tree-sitter generate

Two byproducts of this process are written to src/grammar.json and src/node-types.json.
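
src/node-types.json is also a convenient place to inspect the supertypes described under Querying: a supertype entry carries a subtypes list. A minimal sketch for listing them follows; the listSupertypes helper and the abbreviated sample data are illustrative assumptions, not the real file contents.

```javascript
// Returns the names of all supertypes in a parsed node-types.json array.
// Supertype entries are recognised by their `subtypes` list.
function listSupertypes(nodeTypes) {
  return nodeTypes
    .filter((entry) => Array.isArray(entry.subtypes))
    .map((entry) => entry.type);
}

// Abbreviated, hand-written sample; the real file is much larger.
const sample = [
  {
    type: "expression",
    named: true,
    subtypes: [
      { type: "infix", named: true },
      { type: "variable", named: true },
    ],
  },
  { type: "infix", named: true, fields: {} },
  { type: "variable", named: true },
];

console.log(listSupertypes(sample));
```

To run it against the generated file, pass in the result of parsing src/node-types.json with JSON.parse instead of the sample array.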

Compiling the parser

The C code is automatically compiled by most of the test tools mentioned below, but you can instruct tree-sitter to do it in one go:

$ tree-sitter generate --build

If you've set $TREE_SITTER_LIBDIR as mentioned above, the shared object will be written to $PWD/.lib/haskell.so.
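
The lookup described under Output path can be sketched as a small function. parserLibraryPath is a hypothetical helper, not part of the CLI, and the .so suffix simply follows the example in the text; other platforms may use a different shared-library extension.

```javascript
// Hypothetical helper mirroring the documented lookup:
// $TREE_SITTER_LIBDIR if set, otherwise $HOME/.cache/tree-sitter/lib.
// An empty TREE_SITTER_LIBDIR is treated as unset here (an assumption).
function parserLibraryPath(env, language = "haskell") {
  const libdir = env.TREE_SITTER_LIBDIR || `${env.HOME}/.cache/tree-sitter/lib`;
  return `${libdir}/${language}.so`;
}

console.log(parserLibraryPath(process.env));
```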

Aside from the generated src/parser.c, tree-sitter will also compile and link src/scanner.c into this object. This file contains the external scanner, which is a custom extension of the built-in lexer whose purpose is to handle language constructs that cannot be expressed (efficiently) in the JavaScript grammar, like Haskell layouts.

WebAssembly

The parser can be compiled to WebAssembly as well, which requires emscripten:

$ tree-sitter build --wasm

The resulting binary is written to $PWD/tree-sitter-haskell.wasm.

Testing the parser

The most fundamental test infrastructure for tree-sitter grammars consists of a set of code snippets with associated reference ASTs stored in ./test/corpus/*.txt.

$ tree-sitter test

Individual tests can be run by specifying (a substring of) their description with -f:

$ tree-sitter test -f 'module: exports empty'

The project contains several other types of tests:

  • test/parse/run.bash [update] [test names ...] parses the files in test/parse/*.hs and compares the output with test/parse/*.target. If update is specified as the first argument, it will update the .target file for the first failing test.

  • test/query/run.bash [update] [test names ...] parses the files in test/query/*.hs, applies the queries in test/query/*.query and compares the output with test/query/*.target, similar to test/parse.

  • test/rust/parse-test.rs contains a few tests that use tree-sitter's Rust API to extract the test ranges for terminals in a slightly more convenient way. This requires cargo to be installed, and can be executed with cargo test (which also runs the tests in bindings/rust).

  • test/parse-libs [wasm] clones a set of Haskell libraries to test/libs and parses the entire codebase. When invoked as test/parse-libs wasm, it will use the WebAssembly parser. This requires bc to be installed.

  • test/parse-lib name [wasm] parses only the library name in that directory (without cloning the repository).

Debugging

The shared library built by tree-sitter test includes debug symbols, so if the scanner segfaults you can just run coredumpctl debug to inspect the backtrace and memory:

newline_lookahead () at src/scanner.c:2583
2583                ((Newline *) 0)->indent = 5;
(gdb) bt
#0  newline_lookahead () at src/scanner.c:2583
#1  0x00007ffff7a0740e in newline_start () at src/scanner.c:2604
#2  scan () at src/scanner.c:2646
#3  eval () at src/scanner.c:2684
#4  tree_sitter_haskell_external_scanner_scan (payload=<optimized out>, lexer=<optimized out>,
    valid_symbols=<optimized out>) at src/scanner.c:2724
#5  0x0000555555772488 in ts_parser.lex ()

For more control, launch gdb tree-sitter and start the process with run test -f 'some test', and set a breakpoint with break tree_sitter_haskell_external_scanner_scan.

To disable optimizations, run tree-sitter test --debug-build.

Tracing

The test and parse commands offer two modes for obtaining detailed information about the parsing process.

With tree-sitter test --debug, every lexer step and shift/reduce action is printed to stderr.

With tree-sitter test --debug-graph, the CLI will generate an HTML file showing a graph representation of every step. This requires graphviz to be installed.
