Hierarchical document chunking for TEI XML documents

Project description

tei-chunker

A document chunker specialized for TEI XML, such as the output GROBID produces when parsing academic PDFs.

TEI: https://www.tei-c.org/
GROBID: https://github.com/kermitt2/grobid
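
To make the idea of hierarchical chunking concrete, here is a minimal sketch using only the Python standard library. It is not tei-chunker's API: the names `Chunk`, `chunk_tei`, `max_chars`, and `SAMPLE` are hypothetical, and the strategy (one group of chunks per top-level `div` in the TEI body, split on paragraph boundaries once a character budget is exceeded) is an assumption about what section-aware chunking of GROBID output could look like.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# TEI documents (including GROBID output) use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample input, shaped like a very small GROBID result.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div><head>Introduction</head><p>First paragraph.</p><p>Second paragraph.</p></div>
    <div><head>Methods</head><p>We parse <ref>PDFs</ref> with GROBID.</p></div>
  </body></text>
</TEI>"""


@dataclass
class Chunk:
    heading: str  # section heading the chunk belongs to
    text: str     # paragraph text, whitespace-normalized


def chunk_tei(xml_text: str, max_chars: int = 500) -> list[Chunk]:
    """Split a TEI document into chunks that never cross section boundaries.

    Illustrative only: paragraphs within a section are packed together until
    adding the next one would exceed max_chars. A single paragraph longer
    than max_chars is kept whole rather than split.
    """
    root = ET.fromstring(xml_text)
    chunks: list[Chunk] = []
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        heading = "".join(head.itertext()).strip() if head is not None else ""
        parts: list[str] = []
        size = 0
        for p in div.iterfind("tei:p", TEI_NS):
            # itertext() flattens inline markup such as <ref>.
            text = " ".join("".join(p.itertext()).split())
            if not text:
                continue
            if parts and size + len(text) > max_chars:
                chunks.append(Chunk(heading, " ".join(parts)))
                parts, size = [], 0
            parts.append(text)
            size += len(text)
        if parts:
            chunks.append(Chunk(heading, " ".join(parts)))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_tei(SAMPLE, max_chars=20):
        print(f"[{chunk.heading}] {chunk.text}")
```

Keeping the section heading on every chunk preserves the document hierarchy after splitting, so each chunk can still be traced back to the section it came from.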

tei-chunker/
├── .github/
│   └── workflows/
│       ├── docker.yml
│       └── publish.yml
├── examples/
│   └── github-workflow.yml
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_chunking.py
│   ├── test_document.py
│   └── test_github_utils.py
├── tei_chunker/
│   ├── __init__.py
│   ├── __about__.py
│   ├── chunking.py
│   ├── document.py
│   ├── github_utils.py
│   └── service.py
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
└── pyproject.toml

Download files

Source Distribution

tei_chunker-0.1.0.tar.gz (10.4 kB)

Uploaded Source

Built Distribution

tei_chunker-0.1.0-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file tei_chunker-0.1.0.tar.gz.

File metadata

  • Download URL: tei_chunker-0.1.0.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for tei_chunker-0.1.0.tar.gz
Algorithm Hash digest
SHA256 59eef05d7f18662eb4a0331ed00c751f96228f6f84c6ab65e22c6cd839b11fde
MD5 e9d0c90d6db517e54575ddc554a77443
BLAKE2b-256 c72810c8df457e4ba1e8e7a86d7daa284fc521d9369873604ac6cdba4ca8baaf
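
The SHA256 digest listed above can be checked against a downloaded copy of the source distribution. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the local file path in the usage section is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for tei_chunker-0.1.0.tar.gz (copied from the listing above).
EXPECTED_SHA256 = "59eef05d7f18662eb4a0331ed00c751f96228f6f84c6ab65e22c6cd839b11fde"


def sha256_of(path, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected: str) -> bool:
    """Compare a file's digest with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive; adjust as needed.
    archive = Path("tei_chunker-0.1.0.tar.gz")
    if archive.exists():
        print("match" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The same check can be delegated to pip by pinning the digest in a requirements file and installing with `--require-hashes`.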

Provenance

The following attestation bundles were made for tei_chunker-0.1.0.tar.gz:

Publisher: publish.yaml on dmarx/tei-chunker

File details

Details for the file tei_chunker-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tei_chunker-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for tei_chunker-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 55dd58d30e33a48ad64479f490332ed3f14337640729ba00ff4b86cc32bfb699
MD5 a8b2a0df906ef2551184d26249d279c5
BLAKE2b-256 210625a9d6c258e7aa1224b553a6e44f7d6a40ac52cf4736b92f8aaf8b68a211

Provenance

The following attestation bundles were made for tei_chunker-0.1.0-py3-none-any.whl:

Publisher: publish.yaml on dmarx/tei-chunker
