Aozora Bunko notation parser — Python bindings (PyO3).

aozora

🎮 Playground · 📚 Handbook (mdbook) · 📖 API reference (rustdoc) · 📦 Releases & binaries · 🇯🇵 日本語

Pure-functional Rust parser for 青空文庫記法 (Aozora Bunko notation): ruby (|青梅《おうめ》), bouten ([#「X」に傍点]), 縦中横, 外字 references (※[#…、第3水準1-85-54]), kunten / kaeriten, indent / align-end containers ([#ここから2字下げ]… [#ここで字下げ終わり]), and page / section breaks.

The parser is CommonMark-free and Markdown-free: this repository deals only with 青空文庫 notation itself (the Markdown integration lives in P4suta/afm, listed under Related projects). The renderer emits semantic HTML5, the lexer reports structured diagnostics, and the AST is a borrowed-arena tree that can be walked in O(n) without copying source bytes.
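To make the zero-copy idea concrete, here is a minimal std-only sketch of parsing one ruby span into slices that borrow from the source and rendering it as an HTML5 ruby element. The Ruby type and the parse_ruby / ruby_to_html functions are hypothetical illustrations, not the aozora crate's API, and the sketch handles only the explicit |base《reading》 form (accepting either ASCII | or fullwidth ｜ as the marker).

```rust
/// An explicit ruby annotation whose fields borrow from the source text.
/// Hypothetical type for illustration, not the aozora AST.
#[derive(Debug, PartialEq)]
pub struct Ruby<'a> {
    pub base: &'a str,
    pub reading: &'a str,
}

/// Parses one `|base《reading》` span at the start of `src`, accepting
/// ASCII `|` or fullwidth `｜` as the marker. Returns the annotation and
/// the unconsumed remainder; all three slices point into `src`.
pub fn parse_ruby(src: &str) -> Option<(Ruby<'_>, &str)> {
    let rest = src
        .strip_prefix('|')
        .or_else(|| src.strip_prefix('｜'))?;
    let open = rest.find('《')?;
    let after_open = &rest[open + '《'.len_utf8()..];
    let close = after_open.find('》')?;
    let ruby = Ruby {
        base: &rest[..open],
        reading: &after_open[..close],
    };
    Some((ruby, &after_open[close + '》'.len_utf8()..]))
}

/// Renders the annotation as an HTML5 <ruby> element. Escaping is
/// omitted for brevity; a real renderer must HTML-escape both fields.
pub fn ruby_to_html(ruby: &Ruby<'_>) -> String {
    format!("<ruby>{}<rt>{}</rt></ruby>", ruby.base, ruby.reading)
}
```

A full implementation would also need to handle ruby written without an explicit marker (where the base is inferred from the preceding kanji run); the sketch leaves that out.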

Installation

Pre-built CLI

Pre-built aozora CLI binaries for Linux x86_64, macOS arm64, and Windows x86_64 are attached to every GitHub Release — the releases page carries aozora-vX.Y.Z-<target>.{tar.gz,zip} archives with SHA256SUMS.

Build from source

cargo install --git https://github.com/P4suta/aozora --locked aozora-cli

(This builds the latest main; pin to a release tag for reproducible builds. The install chapter shows the tag-pinned form.)

As a Rust library

The Cargo.toml snippet (with the current release tag) lives in the install chapter; keeping it in one place avoids version-pin drift across multiple READMEs. Publication on crates.io is tied to the 1.0 API freeze.

For WASM / C ABI / Python bindings see the Bindings chapters of the handbook.

Quickstart

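use aozora::Document;

fn main() {
    // The Document owns the source text and the parse arena.
    let source = "|青梅《おうめ》".to_owned();
    let doc = Document::new(source);
    // The tree borrows from `doc` and cannot outlive it.
    let tree = doc.parse();

    let html: String = tree.to_html(); // semantic HTML5
    let canonical: String = tree.serialize(); // canonical notation
    let diagnostics = tree.diagnostics(); // structured diagnostics

    // Serializing the parse of canonical input reproduces it exactly.
    assert_eq!(canonical, "|青梅《おうめ》");
}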

Document owns a bumpalo arena, and tree borrows from it, so tree cannot outlive the Document. Dropping the Document frees the whole arena at once rather than deallocating node by node.
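The ownership shape described above can be sketched with the standard library alone. In this hypothetical stand-in, Doc plays the role of Document but simply owns the source String (there is no bumpalo arena), while Tree holds borrowed line slices; the names are illustrative, not the crate's API.

```rust
/// Hypothetical stand-in for Document: owns the source text. (The real
/// crate also owns a bumpalo arena here; this sketch does not.)
pub struct Doc {
    source: String,
}

/// Hypothetical stand-in for the parsed tree: every "node" is a line slice
/// borrowed from the Doc, so no source bytes are copied.
pub struct Tree<'a> {
    pub lines: Vec<&'a str>,
}

impl Doc {
    pub fn new(source: String) -> Self {
        Doc { source }
    }

    pub fn source(&self) -> &str {
        &self.source
    }

    /// The elided lifetime ties the returned Tree to &self: the tree can
    /// be used only while this Doc is alive.
    pub fn parse(&self) -> Tree<'_> {
        Tree { lines: self.source.lines().collect() }
    }
}
```

Because parse borrows self, returning a Tree out of a scope in which its Doc is dropped is rejected at compile time (error E0597, "does not live long enough"), which is the guarantee a borrowed-arena design relies on.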

CLI

aozora check FILE.txt           # lex + report diagnostics
aozora fmt --check FILE.txt     # round-trip parse ∘ serialize check
aozora render FILE.txt          # render to HTML on stdout
aozora check -E sjis FILE.txt   # Shift_JIS source from Aozora Bunko

All subcommands accept - (or no path argument) to read from stdin. See the CLI reference chapter for the full subcommand reference.
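A sketch of the input-selection rule above, using only std::io and std::fs. The open_input and read_all names are hypothetical, not taken from aozora-cli; bytes are returned undecoded so that a later step can apply UTF-8 or the -E sjis Shift_JIS decoding.

```rust
use std::fs::File;
use std::io::{self, BufReader, Read};

/// Picks the input source: `-` or no path means stdin, anything else is
/// opened as a file. Hypothetical helper, not aozora-cli's code.
pub fn open_input(path: Option<&str>) -> io::Result<Box<dyn Read>> {
    match path {
        None | Some("-") => Ok(Box::new(io::stdin())),
        Some(p) => Ok(Box::new(BufReader::new(File::open(p)?))),
    }
}

/// Reads the whole input as raw bytes; decoding (UTF-8, or Shift_JIS when
/// `-E sjis` is given) is left to a later step.
pub fn read_all(mut input: impl Read) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    input.read_to_end(&mut buf)?;
    Ok(buf)
}
```

Boxing the reader lets both branches share one return type, so the rest of the pipeline never needs to know whether it is reading a file or a pipe.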

Crate layout

aozora is a 21-crate workspace. crates/aozora is the public facade — library consumers usually import only this one.

Crate Purpose
crates/aozora Top-level facade. Document::parse() → AozoraTree<'_>, structured Diagnostics, SLUGS catalogue, canonicalise_slug. The single front door.
crates/aozora-spec Single source of truth for shared types: Span, TriggerKind, PairKind, Diagnostic, PUA sentinel codepoints, SLUGS dispatch table. No internal dependency.
crates/aozora-syntax AST types (AozoraNode borrowed-arena variants, ContainerKind, BoutenKind, Indent).
crates/aozora-encoding Shift_JIS decoding + 外字 lookup (compile-time PHF, JIS X 0213 + UCS resolution).
crates/aozora-scan SIMD-friendly multi-pattern scanner backends (Teddy / structural-bitmap / Hoehrmann DFA / naive fallback).
crates/aozora-veb Eytzinger-layout sorted-set lookup (cache-friendly binary search).
crates/aozora-pipeline 4-phase lexer (sanitize → events → pair → classify) plus the lex_into_arena orchestrator — pure fn(&str, &Arena) -> BorrowedLexOutput<'_>.
crates/aozora-render HTML and serialise renderers — html::render_to_string, serialize::serialize.
crates/aozora-cst rowan-backed lossless concrete syntax tree. Editor/formatter surface.
crates/aozora-query Tree-sitter-style pattern DSL (SyntaxKind + capture) for queries over the CST.
crates/aozora-pandoc Pandoc AST projection (AozoraTree → pandoc_ast::Pandoc); unlocks 50+ output formats via Pandoc writers.
crates/aozora-cli aozora binary: check / fmt / render / schema / kinds / explain / pandoc.
crates/aozora-wasm wasm32-unknown-unknown target for wasm-pack build --target web.
crates/aozora-ffi C ABI driver (opaque handle, JSON-encoded structured data).
crates/aozora-py PyO3 bindings, distributed via maturin.
crates/aozora-bench Criterion + corpus-driven probes (PGO profile source).
crates/aozora-conformance WPT-style conformance fixture runner (golden HTML / serialize / diagnostics / wire across 23 fixtures).
crates/aozora-corpus Corpus source abstraction for sweep tests (dev-only, set AOZORA_CORPUS_ROOT).
crates/aozora-proptest Shared proptest strategies (aozora_fragment / pathological_aozora / unicode_adversarial and friends; dev-only).
crates/aozora-trace DWARF symbolicator for samply traces.
crates/aozora-xtask Repo automation (samply wrapper, trace analysis, corpus pack/unpack, schema dumps).
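The aozora-veb row mentions an Eytzinger-layout sorted set. As a rough illustration of that general technique (not the crate's code; the u32 element type and the eytzinger / contains names are assumptions), a sorted slice can be laid out in breadth-first order and searched by walking implicit child indices:

```rust
/// Rearranges a sorted slice into Eytzinger (breadth-first) order,
/// 1-indexed: slot 0 is unused, the children of slot k are 2k and 2k + 1.
pub fn eytzinger(sorted: &[u32]) -> Vec<u32> {
    let mut out = vec![0; sorted.len() + 1];
    let mut next = 0;
    fill(sorted, &mut out, &mut next, 1);
    out
}

/// An in-order walk of the implicit tree hands out sorted values slot by slot.
fn fill(sorted: &[u32], out: &mut [u32], next: &mut usize, k: usize) {
    if k < out.len() {
        fill(sorted, out, next, 2 * k);
        out[k] = sorted[*next];
        *next += 1;
        fill(sorted, out, next, 2 * k + 1);
    }
}

/// Membership test: start at the root and descend until the key is found
/// or the index runs past the end of the array.
pub fn contains(eyt: &[u32], key: u32) -> bool {
    let mut k = 1;
    while k < eyt.len() {
        if eyt[k] == key {
            return true;
        }
        // Go right (2k + 1) when the slot is smaller than the key, else left (2k).
        k = 2 * k + usize::from(eyt[k] < key);
    }
    false
}
```

Because the top levels of the implicit tree sit at the front of the array, the first few comparisons of every lookup touch the same few cache lines, which is what makes the layout more cache-friendly than a plain binary search over the sorted slice.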

See the Architecture chapter of the handbook for the layered design, the borrowed-arena AST, the SIMD scanner backends, and the dependency graph between these crates.
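The aozora-pipeline row lists a pair phase between events and classify. One common way to implement such a phase is a stack that matches opening and closing delimiters and keeps leftovers for diagnostics. The sketch below does this for 《》 and the fullwidth ［］ used by Aozora Bunko annotations; the Kind / Pair / pair names and the two-delimiter subset are hypothetical, not the crate's API, and the per-character scan stands in for the SIMD scanner backends.

```rust
/// Delimiter kinds this sketch tracks (a hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Ruby,    // 《 … 》
    Bracket, // ［ … ］
}

/// A matched delimiter pair, as byte offsets of the opener and closer.
#[derive(Debug, PartialEq)]
pub struct Pair {
    pub kind: Kind,
    pub open: usize,
    pub close: usize,
}

/// Emits open/close events for each delimiter and matches them with a
/// stack. Stray closers and unclosed openers are returned as byte offsets
/// so a caller can report diagnostics instead of aborting the parse.
pub fn pair(src: &str) -> (Vec<Pair>, Vec<usize>) {
    let mut stack: Vec<(Kind, usize)> = Vec::new();
    let mut pairs = Vec::new();
    let mut unmatched = Vec::new();

    for (i, c) in src.char_indices() {
        let (kind, is_open) = match c {
            '《' => (Kind::Ruby, true),
            '》' => (Kind::Ruby, false),
            '［' => (Kind::Bracket, true),
            '］' => (Kind::Bracket, false),
            _ => continue,
        };
        if is_open {
            stack.push((kind, i));
            continue;
        }
        // A closer pairs only with an opener of the same kind on top.
        match stack.last().copied() {
            Some((top, open)) if top == kind => {
                stack.pop();
                pairs.push(Pair { kind, open, close: i });
            }
            _ => unmatched.push(i),
        }
    }
    // Anything still on the stack was never closed.
    unmatched.extend(stack.into_iter().map(|(_, i)| i));
    (pairs, unmatched)
}
```

Recording offsets rather than failing keeps the pass total: malformed input still yields every well-formed pair, and the leftovers become diagnostics.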

Development

Everything runs inside Docker — the host toolchain is never invoked. Bring up the dev image once, then drive every operation through just:

just                # list targets
just build          # cargo build --workspace --all-targets
just test           # cargo nextest run --workspace
just prop           # property-based sweep (128 cases per block)
just lint           # fmt + clippy pedantic+nursery + typos + strict-code
just deny           # cargo-deny licenses + advisories + bans
just coverage       # cargo llvm-cov branch coverage
just ci             # full CI replica
just book-build     # render the mdbook handbook
just book-serve     # live-preview the handbook at localhost:3000

Use just run to invoke the CLI inside the container:

just run check FILE.txt
just run render -E sjis FILE.txt > out.html

See CONTRIBUTING.md for the contribution flow, testing strategy, and lint policy.

Documentation

  • 📚 Handbook — the mdbook site: notation reference, architecture (borrowed-arena AST, SIMD scanner backends, encoding), bindings (Rust / WASM / C ABI / Python), performance (samply / bench / corpus sweep), CLI / API / env reference, and the contributor guide.
  • 📖 API reference (rustdoc) — auto-deployed alongside the handbook.
  • CONTRIBUTING.md — dev setup, TDD flow, PR rules.
  • SECURITY.md — vulnerability disclosure.
  • CHANGELOG.md — release history.

Related projects

Repo What it is
P4suta/afm CommonMark + GFM + 青空文庫記法 integrated Markdown dialect, built on top of this parser.
P4suta/aozora-tools Authoring tools: formatter, LSP server, tree-sitter grammar, VS Code extension.

License

Dual-licensed under Apache-2.0 OR MIT at your option, matching Rust community convention. See NOTICE for third-party attribution (Aozora Bunko spec snapshots and public-domain sample works used in tests).

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • aozora_py-0.4.1-cp312-cp312-win_amd64.whl (435.1 kB): CPython 3.12, Windows x86-64
  • aozora_py-0.4.1-cp312-cp312-manylinux_2_34_x86_64.whl (561.4 kB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • aozora_py-0.4.1-cp312-cp312-macosx_11_0_arm64.whl (484.0 kB): CPython 3.12, macOS 11.0+ ARM64
  • aozora_py-0.4.1-cp311-cp311-win_amd64.whl (433.6 kB): CPython 3.11, Windows x86-64
  • aozora_py-0.4.1-cp311-cp311-manylinux_2_34_x86_64.whl (559.5 kB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • aozora_py-0.4.1-cp311-cp311-macosx_11_0_arm64.whl (486.0 kB): CPython 3.11, macOS 11.0+ ARM64
  • aozora_py-0.4.1-cp310-cp310-win_amd64.whl (433.8 kB): CPython 3.10, Windows x86-64
  • aozora_py-0.4.1-cp310-cp310-manylinux_2_34_x86_64.whl (559.8 kB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • aozora_py-0.4.1-cp310-cp310-macosx_11_0_arm64.whl (486.3 kB): CPython 3.10, macOS 11.0+ ARM64
  • aozora_py-0.4.1-cp39-cp39-win_amd64.whl (434.9 kB): CPython 3.9, Windows x86-64
  • aozora_py-0.4.1-cp39-cp39-manylinux_2_34_x86_64.whl (561.0 kB): CPython 3.9, manylinux (glibc 2.34+) x86-64
  • aozora_py-0.4.1-cp39-cp39-macosx_11_0_arm64.whl (487.7 kB): CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file aozora_py-0.4.1-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: aozora_py-0.4.1-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 435.1 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for aozora_py-0.4.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 771d32d2d2fe96b4983e2b7d6262ed1598c69800cd68fcb0936ec4142cc402cc
MD5 7463a82f133d0cb63bef9ad111c1d864
BLAKE2b-256 1c2746fc07519e5313dc34658ece70292aeca9db777b45d14aad04929d770a45

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp312-cp312-win_amd64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file aozora_py-0.4.1-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 ec26f3303deb09fcb965427861b47cd7513371a7929e48003ae33fa3467281c0
MD5 71a66314fea7bc069904a2154b8c730a
BLAKE2b-256 dee13e72ceef12242fcbe2902523761a70029fbb03e3a7df61056c415264e964

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7ff6a1f1609d34eee91a41a09bb3fb1de56a71b0744de9a28d7a551a9fda9315
MD5 a03380dca9eb36f1c03a5f0c7a02eae5
BLAKE2b-256 da416579a7959fc18d360a32ab1d3f66b9292290a63ca67cb2d66c052b59ae2f

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: aozora_py-0.4.1-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 433.6 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for aozora_py-0.4.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a3e3e407ad50a041241490570d8dd21111c7ea79a56136fc60609a2294acbbfd
MD5 147c97ff71773d1b4d5930d16160eaf3
BLAKE2b-256 162ee219872cd4fda6ffbcab2e2e7d5b8c3d2a64b3e85ac5486c274507445de5

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp311-cp311-win_amd64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 ca4f9f1cce14610ad29371a192cbbd7ffeebb9daf5cc8ef30c3b167f5b277ea4
MD5 559feb019906ac86812f43dc08ec6e2d
BLAKE2b-256 2dc932333d7efb1b2f5d53f6443b1438040312847b3f7e401cae9192f52865e4

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp311-cp311-manylinux_2_34_x86_64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 aa105e3ec2c866b475fd01ce8158b8b9d2d57ce9452ee46c0f8c8350dd99799b
MD5 eb3ec95395157949fa4250e19e96eeb6
BLAKE2b-256 7729522110e01a69f7d7af92669f70cb581df0e44bf60415b44b96a05023fca9

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: aozora_py-0.4.1-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 433.8 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for aozora_py-0.4.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 f0eba9f7183d18840f92583901ed70dac4a3b9f5fa4e62eed6868a1aeca2f478
MD5 3167450ddea380f187262d7be9ce7bd0
BLAKE2b-256 cb15a26756896f46a204255a6cb63cba8cb167d648e4e72229e8d44e5cba97b1

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp310-cp310-win_amd64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 31674aaf4079efe44d110eacb0154f36d8b46ebfedddefbcc6d566809d40ac26
MD5 d892830907a5204648d2fedd3f74d5ad
BLAKE2b-256 70ad6b6e0c762ed2f0aa5b1933e8ab8ae2ea449cdbc44abbc4cfe9f6067b29cb

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp310-cp310-manylinux_2_34_x86_64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0656129b93750457569631c8520b1ef5ae8122e3f2deb1b97d73c6e8189f4426
MD5 8b2d397d1ced8b458fbd6f8f59058d7c
BLAKE2b-256 5cbb4398e3fca56e4a3ffdfba2865e4b337b94f14acb26f53a25616779c12b01

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp310-cp310-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: aozora_py-0.4.1-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 434.9 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for aozora_py-0.4.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 d2aee5601ad3270255ede0b67b4cd0399c50d78d4429f72c2ab144a36ec325fc
MD5 3cad2ed1013d78d9bac2ac5013ae0103
BLAKE2b-256 44b66627e529c59fbb04715be6b7ec0d79f88aae82b917f0986ec92a3f22f268

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp39-cp39-win_amd64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp39-cp39-manylinux_2_34_x86_64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp39-cp39-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1f9b71777021a6a4bc522e7fcc82fd99f9ad981544c611a53824fe078896dde4
MD5 1ee9cc4aaa061873b4b1b93974075bd9
BLAKE2b-256 b8be4a627711a34279d533be321805b4328ddfccce20cda776f50883292ce845

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp39-cp39-manylinux_2_34_x86_64.whl:

Publisher: publish-pypi.yml on P4suta/aozora

File details

Details for the file aozora_py-0.4.1-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for aozora_py-0.4.1-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d8e25b18d3ee2a7d703be259dfb29fc12332fa3c9b2e52876434455d75fe84ee
MD5 4ec3afaf31fd8988288c121a4e37a176
BLAKE2b-256 4ce61c0e6d12b76f0556ae4a9b9da55a212c96f2dbb11b170c96669c1ec8560a

Provenance

The following attestation bundles were made for aozora_py-0.4.1-cp39-cp39-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on P4suta/aozora
