Haskell grammar for tree-sitter

tree-sitter-haskell

Supported Language Extensions

The following extensions are supported ✅, unsupported ❌, or not applicable ➖️ because they don't affect parsing:

  • AllowAmbiguousTypes ➖️
  • ApplicativeDo ➖️
  • Arrows ❌
  • BangPatterns ✅
  • BinaryLiterals ✅
  • BlockArguments ✅
  • CApiFFI ✅
  • ConstrainedClassMethods ✅
  • ConstraintKinds ✅
  • CPP ✅
  • CUSKs ✅
  • DataKinds ✅
  • DatatypeContexts ✅
  • DefaultSignatures ✅
  • DeriveAnyClass ➖️
  • DeriveDataTypeable ➖️
  • DeriveFoldable ➖️
  • DeriveFunctor ➖️
  • DeriveGeneric ➖️
  • DeriveLift ➖️
  • DeriveTraversable ➖️
  • DerivingStrategies ✅
  • DerivingVia ✅
  • DisambiguateRecordFields ➖️
  • DuplicateRecordFields ➖️
  • EmptyCase ✅
  • EmptyDataDecls ✅
  • EmptyDataDeriving ✅
  • ExistentialQuantification ✅
  • ExplicitForAll ✅
  • ExplicitNamespaces ✅
  • ExtendedDefaultRules ➖️
  • FlexibleContexts ✅
  • FlexibleInstances ✅
  • ForeignFunctionInterface ✅
  • FunctionalDependencies ✅
  • GADTs ✅
  • GADTSyntax ✅
  • GeneralisedNewtypeDeriving ➖️
  • GHCForeignImportPrim ✅
  • Haskell2010 ➖️
  • Haskell98 ➖️
  • HexFloatLiterals ✅
  • ImplicitParams ✅
  • ImplicitPrelude ➖️
  • ImportQualifiedPost ✅
  • ImpredicativeTypes ➖️
  • IncoherentInstances ➖️
  • InstanceSigs ✅
  • InterruptibleFFI ✅
  • KindSignatures ✅
  • LambdaCase ✅
  • LexicalNegation ❌
  • LiberalTypeSynonyms ✅
  • LinearTypes ✅
  • ListTuplePuns ✅
  • MagicHash ✅
  • Modifiers ❌
  • MonadComprehensions ➖️
  • MonadFailDesugaring ➖️
  • MonoLocalBinds ➖️
  • MonomorphismRestriction ➖️
  • MultiParamTypeClasses ✅
  • MultiWayIf ✅
  • NamedFieldPuns ✅
  • NamedWildCards ✅
  • NegativeLiterals ➖️
  • NondecreasingIndentation ✅
  • NPlusKPatterns ➖️
  • NullaryTypeClasses ✅
  • NumDecimals ➖️
  • NumericUnderscores ✅
  • OverlappingInstances ➖️
  • OverloadedLabels ✅
  • OverloadedLists ➖️
  • OverloadedRecordDot ✅
  • OverloadedRecordUpdate ✅
  • OverloadedStrings ➖️
  • PackageImports ✅
  • ParallelListComp ✅
  • PartialTypeSignatures ✅
  • PatternGuards ✅
  • PatternSynonyms ✅
  • PolyKinds ➖️
  • PostfixOperators ➖️
  • QualifiedDo ✅
  • QuantifiedConstraints ✅
  • QuasiQuotes ✅
  • Rank2Types ✅
  • RankNTypes ✅
  • RebindableSyntax ➖️
  • RecordWildCards ➖️
  • RecursiveDo ✅
  • RequiredTypeArguments ✅
  • RoleAnnotations ✅
  • Safe ➖️
  • ScopedTypeVariables ✅
  • StandaloneDeriving ✅
  • StandaloneKindSignatures ✅
  • StarIsType ✅
  • StaticPointers ❌
  • Strict ➖️
  • StrictData ✅
  • TemplateHaskell ✅
  • TemplateHaskellQuotes ✅
  • TraditionalRecordSyntax ➖️
  • TransformListComp ✅
  • Trustworthy ➖️
  • TupleSections ✅
  • TypeAbstractions ✅
  • TypeApplications ✅
  • TypeData ✅
  • TypeFamilies ✅
  • TypeFamilyDependencies ✅
  • TypeInType ✅
  • TypeOperators ✅
  • TypeSynonymInstances ➖️
  • UnboxedSums ✅
  • UnboxedTuples ✅
  • UndecidableInstances ➖️
  • UndecidableSuperClasses ➖️
  • UnicodeSyntax ✅
  • UnliftedFFITypes ➖️
  • UnliftedNewtypes ✅
  • Unsafe ➖️
  • ViewPatterns ✅

Bugs

CPP

The preprocessor directives #elif and #else cannot be handled correctly, since the parser state would have to be reset manually to what it was at the corresponding #if. As a workaround, the code in the alternative branches is parsed as part of the directives.
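The following toy JavaScript model shows which lines end up as ordinary Haskell code under this workaround. It is not derived from the scanner. The function name haskellLines is hypothetical, and the model only classifies whole lines, whereas the real parser works on tokens and may draw node boundaries differently.

```javascript
// Toy model of the CPP workaround (not the real scanner logic): lines in an
// #elif/#else branch, up to the matching #endif, are swallowed by the
// directive instead of being parsed as Haskell.
function haskellLines(lines) {
  let absorbing = 0; // #if nesting depth inside a swallowed branch (0 = none)
  return lines.map((line) => {
    const m = line.trim().match(/^#\s*(ifdef|ifndef|if|elif|else|endif)\b/);
    const kw = m ? m[1] : null;
    if (absorbing > 0) {
      if (kw === 'if' || kw === 'ifdef' || kw === 'ifndef') absorbing++;
      else if (kw === 'endif') absorbing--;
      return false; // part of the directive node
    }
    if (kw === 'elif' || kw === 'else') absorbing = 1;
    return kw === null; // ordinary lines are parsed as Haskell
  });
}

const src = ['#if MIN_VERSION_base(4,18,0)', 'x = 1', '#else', 'x = 2', '#endif', 'y = 3'];
console.log(haskellLines(src).join(' '));
// false true false false false true
```

Only the first branch (x = 1) and the code after #endif are treated as Haskell. The second definition of x is hidden inside the directive.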

Querying

The grammar contains several supertypes, which group multiple other node types under a single name.

Supertype names do not occur as extra nodes in parse trees, but they can be used in queries in special ways:

  • As an alias, matching any of their subtypes
  • As a prefix for one of their subtypes, matching that subtype only when it occurs as a production of the supertype

For example, the query (expression) matches the nodes infix, record, projection and constructor, as well as the first and third variable, in this tree for cats <> Cat {mood = moods.sleepy}:

(infix
  (variable)
  (operator)
  (record
    (constructor)
    (field_update
      (field_name (variable))
      (projection (variable) (field_name (variable))))))

The two occurrences of variable in field_name (mood and sleepy) are not expressions. They are record field names that are part of a composite record expression.

To match only those variable nodes that are expressions, use the second special form: a query for (expression/variable) matches only the other two, cats and moods.
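The difference between the two query forms can be reproduced with a small JavaScript model of the example tree. This sketch assumes that each node records the supertype it was produced through, using the hypothetical via property. It is not how tree-sitter's query engine is implemented.

```javascript
// Toy model of supertype matching (not tree-sitter's query engine). `via`
// names the supertype a node was produced through, or null when the node sits
// in a non-expression position such as field_name.
const node = (type, via, text, ...children) => ({ type, via, text, children });

// The tree for `cats <> Cat {mood = moods.sleepy}` from the example.
const tree = node('infix', 'expression', null,
  node('variable', 'expression', 'cats'),
  node('operator', null, '<>'),
  node('record', 'expression', null,
    node('constructor', 'expression', 'Cat'),
    node('field_update', null, null,
      node('field_name', null, null, node('variable', null, 'mood')),
      node('projection', 'expression', null,
        node('variable', 'expression', 'moods'),
        node('field_name', null, null, node('variable', null, 'sleepy'))))));

// Pre-order traversal collecting every node that satisfies `pred`.
function collect(n, pred, out = []) {
  if (pred(n)) out.push(n);
  for (const child of n.children) collect(child, pred, out);
  return out;
}

// (variable): every variable node, regardless of position.
const matchType = (t, type) => collect(t, (n) => n.type === type);
// (expression): any node produced through the supertype.
const matchSupertype = (t, sup) => collect(t, (n) => n.via === sup);
// (expression/variable): only variable nodes produced through the supertype.
const matchSubtype = (t, sup, type) =>
  collect(t, (n) => n.via === sup && n.type === type);

console.log(matchSupertype(tree, 'expression').map((n) => n.type).join(', '));
// infix, variable, record, constructor, projection, variable
console.log(matchSubtype(tree, 'expression', 'variable').map((n) => n.text).join(', '));
// cats, moods
```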

The grammar's supertypes consist of the following sets:

  • expression

    Rules that are valid in any expression position, excluding type applications, explicit types and expression signatures.

  • pattern

    Rules that are valid in any pattern position, excluding type binders, explicit types and pattern signatures.

  • type

    Types that are either atomic (have no ambiguous associativity, like bracketed constructs, variables and type constructors), applied types or infix types.

  • quantified_type

    Types prefixed with a forall, context or function parameter.

  • constraint

    Almost the same rules as type, but mirrored for use in contexts.

  • constraints

    Analog of quantified_type, for constraints with forall or context.

  • type_param

    Atomic nodes in type and class heads, like the three nodes following A in data A @k a (b :: k).

  • declaration

    All top-level declarations, like functions and data types.

  • decl

    Shorthand for declarations that are also valid in local bindings (let and where) and in class and instance bodies, except for fixity declarations. Consists of signature, function and bind.

  • class_decl and instance_decl

    All declarations that are valid in classes and instances, which includes associated type and data families.

  • statement

    Different forms of do-notation statements.

  • qualifier

    Different forms of list comprehension qualifiers.

  • guard

    Different forms of guards in function equations and case alternatives.

Development

The main driver for generating and testing the parser for this grammar is the tree-sitter CLI. Other components of the project require additional tools, described below.

Some of them are available through npm. For example, npx tree-sitter runs the CLI. If tree-sitter is not installed otherwise, prefix all commands in the following sections with npx.

Output path

The CLI writes the shared library containing the parser to the directory denoted by $TREE_SITTER_LIBDIR. If that variable is unset, it defaults to $HOME/.cache/tree-sitter/lib.

To avoid clobbering this global directory with development versions, you can set the variable to a local path:

export TREE_SITTER_LIBDIR=$PWD/.lib
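The lookup rule described above can be expressed as a small JavaScript function. This is a sketch of the documented behavior, not the CLI's implementation. parserLibPath is a hypothetical helper, and it assumes a Unix-like system where the library is named haskell.so.

```javascript
const path = require('path');

// Sketch of the documented rule, not the CLI's code: use $TREE_SITTER_LIBDIR
// when it is set, otherwise $HOME/.cache/tree-sitter/lib. Assumes a Unix-like
// system, where the parser library is named <language>.so.
function parserLibPath(env, language = 'haskell') {
  const dir = env.TREE_SITTER_LIBDIR !== undefined
    ? env.TREE_SITTER_LIBDIR
    : path.posix.join(env.HOME, '.cache', 'tree-sitter', 'lib');
  return path.posix.join(dir, `${language}.so`);
}

console.log(parserLibPath({ TREE_SITTER_LIBDIR: '/work/tree-sitter-haskell/.lib' }));
// /work/tree-sitter-haskell/.lib/haskell.so
console.log(parserLibPath({ HOME: '/home/me' }));
// /home/me/.cache/tree-sitter/lib/haskell.so
```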

The grammar

The JavaScript file grammar.js contains the entry point into the grammar's production rules. Please consult the tree-sitter documentation for a comprehensive introduction to the syntax and semantics.

Parsing starts with the first item in the rules field:

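module.exports = grammar({
  name: 'haskell',

  rules: {
    // Parsing starts here; all other rules are omitted from this excerpt.
    haskell: $ => seq(
      optional($.header),
      optional($._body),
    ),
  },
});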

Generating the parser

The first step in the development workflow converts the JavaScript rule definitions to C code in src/parser.c:

$ tree-sitter generate

Two byproducts of this process are written to src/grammar.json and src/node-types.json.
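The link between the rule definitions in grammar.js and src/grammar.json can be illustrated with simplified stand-ins for the DSL functions seq and optional. This sketch assumes the usual grammar.json encoding: SEQ, CHOICE with a BLANK alternative, and SYMBOL. The real DSL performs additional checks and is not reproduced here. The names symbols and rules are only illustrative.

```javascript
// Simplified stand-ins for the tree-sitter DSL, producing the same kind of
// rule objects that `tree-sitter generate` serializes to src/grammar.json.
// The real DSL (provided to grammar.js as globals) also validates its input.
const seq = (...members) => ({ type: 'SEQ', members });
const choice = (...members) => ({ type: 'CHOICE', members });
const blank = () => ({ type: 'BLANK' });
const optional = (rule) => choice(rule, blank()); // "rule or nothing"

// `$` turns every property access into a reference to the named rule.
const symbols = new Proxy({}, { get: (_, name) => ({ type: 'SYMBOL', name }) });

// The entry rule from the grammar.js excerpt.
const rules = {
  haskell: ($) => seq(optional($.header), optional($._body)),
};

console.log(JSON.stringify(rules.haskell(symbols), null, 1));
```

Each optional(...) becomes a CHOICE between the referenced rule and an empty BLANK alternative. This is why neither the header nor the body is required for a file to parse.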

Compiling the parser

The C code is compiled automatically by most of the test tools mentioned below, but you can also instruct tree-sitter to generate and compile the parser in one step:

$ tree-sitter generate --build

If you've set $TREE_SITTER_LIBDIR as mentioned above, the shared object will be written to $PWD/.lib/haskell.so.

Aside from the generated src/parser.c, tree-sitter also compiles src/scanner.c and links it into this object. This file contains the external scanner, a custom extension of the built-in lexer. It handles language constructs that cannot be expressed (efficiently) in the JavaScript grammar, such as Haskell layouts.

WebAssembly

The parser can also be compiled to WebAssembly, which requires Emscripten:

$ tree-sitter build --wasm

The resulting binary is written to $PWD/tree-sitter-haskell.wasm.

Testing the parser

The most fundamental test infrastructure for tree-sitter grammars consists of a set of code snippets with associated reference ASTs stored in ./test/corpus/*.txt.

$ tree-sitter test

Individual tests can be run by specifying (a substring of) their description with -f:

$ tree-sitter test -f 'module: exports empty'
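The corpus layout and the substring filter can be sketched in JavaScript. The reader below assumes only the basic layout of a corpus entry: a name framed by lines of =, then the input, a --- separator and the expected tree. It ignores features such as custom delimiters. parseCorpus, filterTests and the sample entries are hypothetical, and the expected trees in the sample are placeholders.

```javascript
// Simplified reader for corpus files like test/corpus/*.txt. It assumes the
// basic layout only: a name framed by lines of '=', the input, a line of '-',
// and the expected tree. Custom delimiters and test attributes are ignored,
// and a Haskell comment line consisting of just '---' would confuse it.
const NAME_RULE = /^={3,}\s*$/;
const SEPARATOR = /^-{3,}\s*$/;

function parseCorpus(text) {
  const lines = text.split('\n');
  const tests = [];
  let i = 0;
  while (i < lines.length) {
    if (!NAME_RULE.test(lines[i]) || i + 2 >= lines.length) { i++; continue; }
    const name = lines[i + 1].trim();
    i += 3; // skip the opening rule, the name and the closing rule
    const input = [];
    while (i < lines.length && !SEPARATOR.test(lines[i])) input.push(lines[i++]);
    i++; // skip the separator
    const expected = [];
    while (i < lines.length && !NAME_RULE.test(lines[i])) expected.push(lines[i++]);
    tests.push({ name, input: input.join('\n').trim(), expected: expected.join('\n').trim() });
  }
  return tests;
}

// Like `tree-sitter test -f <text>`: keep tests whose name contains the text.
const filterTests = (tests, text) => tests.filter((t) => t.name.includes(text));

// Hypothetical sample; the expected trees are placeholders.
const sample = [
  '='.repeat(40), 'module: exports empty', '='.repeat(40), '',
  'module A () where', '', '---', '', '(haskell ...)', '',
  '='.repeat(40), 'module: no exports', '='.repeat(40), '',
  'module A where', '', '---', '', '(haskell ...)',
].join('\n');

console.log(filterTests(parseCorpus(sample), 'exports empty').map((t) => t.input).join('\n'));
// module A () where
```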

The project contains several other types of tests:

  • test/parse/run.bash [update] [test names ...] parses the files in test/parse/*.hs and compares the output with test/parse/*.target. If update is specified as the first argument, it will update the .target file for the first failing test.

  • test/query/run.bash [update] [test names ...] parses the files in test/query/*.hs, applies the queries in test/query/*.query and compares the output with test/query/*.target, similar to test/parse.

  • test/rust/parse-test.rs contains a few tests that use tree-sitter's Rust API to extract the test ranges for terminals in a slightly more convenient way. This requires cargo to be installed, and can be executed with cargo test (which also runs the tests in bindings/rust).

  • test/parse-libs [wasm] clones a set of Haskell libraries to test/libs and parses the entire codebase. When invoked as test/parse-libs wasm, it will use the WebAssembly parser. This requires bc to be installed.

  • test/parse-lib name [wasm] parses only the library name in that directory (without cloning the repository).

Debugging

The shared library built by tree-sitter test includes debug symbols, so if the scanner segfaults you can just run coredumpctl debug to inspect the backtrace and memory:

newline_lookahead () at src/scanner.c:2583
2583                ((Newline *) 0)->indent = 5;
(gdb) bt
#0  newline_lookahead () at src/scanner.c:2583
#1  0x00007ffff7a0740e in newline_start () at src/scanner.c:2604
#2  scan () at src/scanner.c:2646
#3  eval () at src/scanner.c:2684
#4  tree_sitter_haskell_external_scanner_scan (payload=<optimized out>, lexer=<optimized out>,
    valid_symbols=<optimized out>) at src/scanner.c:2724
#5  0x0000555555772488 in ts_parser.lex ()

For more control, launch gdb tree-sitter, set a breakpoint with break tree_sitter_haskell_external_scanner_scan, and start the process with run test -f 'some test'.

To disable optimizations, run tree-sitter test --debug-build.

Tracing

The test and parse commands offer two modes for obtaining detailed information about the parsing process.

With tree-sitter test --debug, every lexer step and shift/reduce action is printed to stderr.

With tree-sitter test --debug-graph, the CLI will generate an HTML file showing a graph representation of every step. This requires graphviz to be installed.

Download files

Source Distribution

  • tree-sitter-haskell-0.21.0.tar.gz (827.7 kB)

Built Distributions

  • tree_sitter_haskell-0.21.0-cp38-abi3-win_amd64.whl (319.3 kB): CPython 3.8+, Windows x86-64
  • tree_sitter_haskell-0.21.0-cp38-abi3-musllinux_1_1_x86_64.whl (436.4 kB): CPython 3.8+, musllinux (musl 1.1+), x86-64
  • tree_sitter_haskell-0.21.0-cp38-abi3-musllinux_1_1_aarch64.whl (437.7 kB): CPython 3.8+, musllinux (musl 1.1+), ARM64
  • tree_sitter_haskell-0.21.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (431.2 kB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • tree_sitter_haskell-0.21.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (434.2 kB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • tree_sitter_haskell-0.21.0-cp38-abi3-macosx_11_0_arm64.whl (355.0 kB): CPython 3.8+, macOS 11.0+, ARM64
  • tree_sitter_haskell-0.21.0-cp38-abi3-macosx_10_9_x86_64.whl (325.7 kB): CPython 3.8+, macOS 10.9+, x86-64

File details

Details for the file tree-sitter-haskell-0.21.0.tar.gz.

File metadata

  • Download URL: tree-sitter-haskell-0.21.0.tar.gz
  • Size: 827.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for tree-sitter-haskell-0.21.0.tar.gz
Algorithm Hash digest
SHA256 008d1cee4efbe5e64c77132b6daadf9a71e38c24d0c40277af2d6c0b2e5c3f6b
MD5 2fd1b37e44a242354d88c49885bc9fc4
BLAKE2b-256 a818fabe2c1ef465e51d4e6c2d942b3bc1f48010273afafcd31e0c95c0b43d0f
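A short Node.js script can check a downloaded file against the SHA256 digests listed in these tables. sha256File and verifySha256 are hypothetical helpers that rely only on Node's built-in crypto and fs modules.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Hex SHA-256 digest of a file's contents.
function sha256File(file) {
  return crypto.createHash('sha256').update(fs.readFileSync(file)).digest('hex');
}

// Compare against a digest copied from the tables (case-insensitive).
function verifySha256(file, expectedHex) {
  return sha256File(file) === expectedHex.trim().toLowerCase();
}

// Usage, assuming the sdist has been downloaded to the current directory:
// verifySha256('tree-sitter-haskell-0.21.0.tar.gz',
//   '008d1cee4efbe5e64c77132b6daadf9a71e38c24d0c40277af2d6c0b2e5c3f6b');
```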

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-win_amd64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b0fc53ffd56e3bee58a3e2b0da4fdb583c3a3ea3129ea8faf2900ccf1f8880ed
MD5 d3278336e351a301ecd98dc8fab98a3a
BLAKE2b-256 528b75e24ec4d4f179b4a787f4c2581bd53cabbb2fbfc08d4235f170d092d101

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-musllinux_1_1_x86_64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 7c2fa774a7083985ece920162f93f2946d5530e1af9b044ec9a4d69d203a9211
MD5 fc0ca0bf380dbedddba7dd6b1c7fcd0e
BLAKE2b-256 6d84294a880747219be460eed54e603e4704ed69282286c37dedfe42d8076596

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-musllinux_1_1_aarch64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-musllinux_1_1_aarch64.whl
Algorithm Hash digest
SHA256 ce2e69b1a7516274d1dabbdb2bfd8fbc7d7f00626d99de163ac36740b306a2e6
MD5 1240bc97285e08dbc3964052946a3147
BLAKE2b-256 e0bd7b21bb9755d35a748a374c81ca0aa8a046f66b2c71605ca6bd227e0e3189

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 86c588b51a070a4a00bcb3bb50f830d595e01c5cf4a6edb382bc5ec472fc431f
MD5 58a8536b9f1f3f83d434a27e4c43cde6
BLAKE2b-256 a6391b176be4aca74c7ee035874676746cad19f94815ec67a7b616471bb36ff5

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 babb6c5353a0808cae40ee482afc96b7659378b8d77b120cf6e968d7073ce7d2
MD5 f89417eca11ae54cdca64f980e23f1a1
BLAKE2b-256 c939ec2b4d25f959bf65006223b4e2824e08d4f3d6e0a762f2f03e0e68fecd6f

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0159bc8da0426c74a4b7906a9d1e035e05f894bdd45dc2e1eb0ae006cf1aa15b
MD5 570b15ac2c0730d89e08b58d632bac4c
BLAKE2b-256 6d844ebc5a400f712bf57d490c0fa4b28088a54e88ab7d1954cb260b16f33228

File details

Details for the file tree_sitter_haskell-0.21.0-cp38-abi3-macosx_10_9_x86_64.whl.

File hashes

Hashes for tree_sitter_haskell-0.21.0-cp38-abi3-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 137092e30bc1fc836ad1f4102bcd1abcfd020b4d1c1f080c040f9cf83f19873d
MD5 989f4e60323d06040c6815761e1309ec
BLAKE2b-256 ae33f163583faeccbd619171383f28cc4f41262c85b54c089b51754c87033f5f
