Parser for Python inline script metadata as defined in PEP-723.

ducktools: scriptmetadata

Parser for embedded metadata in Python source files, originally defined in PEP-723 and now specified on packaging.python.org.

Inline script metadata can be extracted from a file path, from a string or from an iterable of lines (such as an open file).

This module does not attempt to parse the contents of the metadata blocks in any way.

How to Install

Install this module from PyPI:

python -m pip install ducktools-scriptmetadata

from pathlib import Path

from ducktools.scriptmetadata import parse_source, parse_file, parse_iterable

src_path = Path("examples/pep-723-sample.py")

# Parse from a path to a file
metadata = parse_file(src_path, encoding="utf-8")

# Parse from source code as a string
metadata = parse_source(src_path.read_text(encoding="utf-8"))

# Parse from an iterable of source code lines
with src_path.open("r", encoding="utf-8") as f:
    metadata = parse_iterable(f, start_line=1)

# Get all metadata block names and plaintext content as a dict
metadata.blocks

# Get a list of warnings about potentially malformed blocks
metadata.warnings

Inputs and Outputs

PEP-723 Example Input

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///

metadata.blocks:

{'script': 'requires-python = ">=3.11"\ndependencies = [\n  "requests<3",\n  "rich",\n]\n'}

metadata.warnings:

[]

Incomplete block

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]

metadata.blocks:

{}

metadata.warnings:

[MetadataWarning(line_number=7, message="Potential unclosed block 'script' detected. A '# ///' block is needed to indicate the end of the block.")]
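
To make the two results above easier to follow, here is a minimal sketch of a line scanner that strips the comment prefix from block content and reports a block that is never closed. It is not this module's implementation: `scan_blocks` and `SketchWarning` are hypothetical names, and the sketch ignores the other cases the real parser handles (repeated headers, invalid block names, duplicate blocks).

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class SketchWarning(NamedTuple):
    """Hypothetical stand-in for a warning record (not the library's class)."""
    line_number: int
    message: str


def scan_blocks(lines: Iterable[str]) -> tuple[dict[str, str], list[SketchWarning]]:
    blocks: dict[str, str] = {}
    found: list[SketchWarning] = []
    name = None                 # name of the block currently open, if any
    content: list[str] = []
    line_number = 0

    for line_number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if name is None:
            # An opening line looks like "# /// script".
            if line.startswith("# /// ") and len(line) > 6:
                name, content = line[6:], []
        elif line == "# ///":
            # Closing line: store the collected content under the block name.
            blocks[name] = "".join(content)
            name = None
        elif line == "#" or line.startswith("# "):
            # Content line: drop the leading "# " (or a bare "#").
            content.append(line[2:] + "\n")
        else:
            # A non-comment line arrived before the block was closed.
            found.append(SketchWarning(line_number, f"Potential unclosed block {name!r}"))
            name = None

    if name is not None:
        # The input ended while a block was still open.
        found.append(SketchWarning(line_number + 1, f"Potential unclosed block {name!r}"))
    return blocks, found


incomplete = [
    "# /// script",
    '# requires-python = ">=3.11"',
    "# dependencies = [",
    '#   "requests<3",',
    '#   "rich",',
    "# ]",
]
complete = incomplete + ["# ///"]

print(scan_blocks(complete)[0])
print(scan_blocks(incomplete)[1])
```

For the complete input the sketch returns the same `script` text as shown under "PEP-723 Example Input"; for the incomplete input it returns no blocks and one warning.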

Example of usage with TOML parsing/validation

An example script using tomllib/tomli to parse TOML and packaging to handle version and dependency specifiers.

import warnings
from pathlib import Path
try:
    import tomllib
except ImportError:
    import tomli as tomllib
    
from packaging.specifiers import SpecifierSet
from packaging.requirements import Requirement

from ducktools.scriptmetadata import parse_file

def parse_requirements(f):
    data = parse_file(f)
    
    if script_block := data.blocks.get("script"):
        deps = tomllib.loads(script_block)
        requires_python = SpecifierSet(deps["requires-python"]) if "requires-python" in deps else None
        dependencies = [Requirement(dep) for dep in deps.get("dependencies", [])]
    else:
        requires_python = None
        dependencies = []
        
    if data.warnings:
        for message in data.warnings:
            warnings.warn(str(message))
    
    return {
        "requires-python": requires_python,
        "dependencies": dependencies,
    }

example_success = Path("examples/pep-723-sample.py")
example_warning = Path("examples/incomplete_example.py")

print("Valid metadata block output:")
print(parse_requirements(example_success))
print()
print("Incomplete metadata block output:")
print(parse_requirements(example_warning))

Output:

Valid metadata block output:
{'requires-python': <SpecifierSet('>=3.11')>, 'dependencies': [<Requirement('requests<3')>, <Requirement('rich')>]}

Incomplete metadata block output:
{'requires-python': None, 'dependencies': []}
<Source Location>: UserWarning: Line 7: Potential unclosed block 'script' detected. A '# ///' block is needed to indicate the end of the block.
  warnings.warn(message)

Why not include the TOML/requirements parsing in this module?

I wanted to provide a parser that purely handled the new format for metadata. TOML parsing and validation of version specifiers can then be handled by whichever library the user prefers.

For example, if someone wanted to add inline metadata support to an existing tool that already uses rtoml for its other TOML parsing, it would make sense for that package to parse the block contents as well, rather than this module choosing tomllib and incurring its import cost.
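
One way a calling tool might keep that choice open is to accept the TOML loader as a parameter. The sketch below is an assumption about how such a tool could be written, not part of this module: `load_script_table` is a hypothetical helper, and it takes a plain mapping shaped like `metadata.blocks` so that it runs without any third-party package.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping

try:
    import tomllib  # standard library from Python 3.11
except ImportError:
    import tomli as tomllib  # backport with the same loads() function


def load_script_table(
    blocks: Mapping[str, str],
    toml_loads: Callable[[str], dict[str, Any]] = tomllib.loads,
) -> dict[str, Any] | None:
    """Parse the 'script' block with whichever TOML loader the caller supplies."""
    raw = blocks.get("script")
    if raw is None:
        return None
    return toml_loads(raw)


# Block text in the same shape as the metadata.blocks example output.
blocks = {
    "script": 'requires-python = ">=3.11"\ndependencies = [\n  "requests<3",\n  "rich",\n]\n'
}

table = load_script_table(blocks)
print(table["requires-python"])
print(table["dependencies"])
```

A tool that already depends on another TOML library could pass that library's string-loading function as `toml_loads` instead, provided it accepts a `str` and returns a mapping.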

Why not use the regex from the PEP/Specification page?

While the regex would correctly extract valid metadata blocks, it provides no way to warn users about potential problems with incorrectly formatted blocks.
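
As a small illustration of that limitation, the sketch below applies the regular expression published in the specification to a complete block and to one missing its closing line. The helper `block_names` is a hypothetical name used only for this example.

```python
import re

# Regular expression as published in the inline script metadata specification.
REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"

incomplete = "\n".join([
    "# /// script",
    '# requires-python = ">=3.11"',
    "# dependencies = [",
    '#   "requests<3",',
    '#   "rich",',
    "# ]",
]) + "\n"
complete = incomplete + "# ///\n"


def block_names(source: str) -> list:
    """Return the type of every block the regex matches in the source."""
    return [match.group("type") for match in re.finditer(REGEX, source)]


print(block_names(complete))    # ['script']
print(block_names(incomplete))  # []
```

The unclosed block simply produces no match, so a caller relying on the regex alone cannot tell a missing block from a malformed one.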

This parser will collect warnings if it encounters an unclosed block, if it detects multiple valid header lines within a block, and if a potential block name contains an invalid character. It will raise an exception if multiple blocks with the same name are encountered.

Importing Python's re module also takes longer than parsing the source with this module, as the benchmark below shows.

Python 3.12 on Windows parsing the example file:

hyperfine -w3 -r100 "python -c \"import re\"" "python perf\ducktools_parse.py" "python perf\regex_parse.py"

Benchmark 1: python -c "import re"
  Time (mean ± σ):      30.0 ms ±   0.6 ms    [User: 15.1 ms, System: 11.7 ms]
  Range (min … max):    29.0 ms …  33.5 ms    100 runs

Benchmark 2: python perf\ducktools_parse.py
  Time (mean ± σ):      25.9 ms ±   0.8 ms    [User: 11.8 ms, System: 13.4 ms]
  Range (min … max):    24.9 ms …  30.0 ms    100 runs

Benchmark 3: python perf\regex_parse.py
  Time (mean ± σ):      31.6 ms ±   1.6 ms    [User: 16.7 ms, System: 13.9 ms]
  Range (min … max):    29.9 ms …  40.5 ms    100 runs

Summary
  python perf\ducktools_parse.py ran
    1.16 ± 0.04 times faster than python -c "import re"
    1.22 ± 0.07 times faster than python perf\regex_parse.py
