
ThunderDots — DTS client for documentary corpora

DTS scraping, TEI fragmentation, metadata filtering, validation, and export pipelines.


Overview

ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.

It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines, including full-text search, RAG/vector databases, and corpus-analysis workflows.

ThunderDots focuses on practical documentary workflows: crawling DTS collections, fetching TEI/XML resources, extracting reusable text fragments, selecting metadata, validating outputs, and exporting data to downstream search or indexing systems.
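To make the export step concrete, here is a minimal sketch of one common target format: the Elasticsearch bulk API accepts newline-delimited JSON in which each document line is preceded by an action line. It uses only the standard library and is not ThunderDots' own exporter. The record fields (`id`, `title`, `text`), the index name `corpus`, and the helper `to_bulk_ndjson` are hypothetical.

```python
import json
from typing import Iterable, Mapping


def to_bulk_ndjson(records: Iterable[Mapping], index: str) -> str:
    """Serialize records as an Elasticsearch bulk request body (NDJSON)."""
    lines = []
    for record in records:
        # Action line: asks Elasticsearch to index the document on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": record["id"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(dict(record), ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records; real ThunderDots output may use different fields.
records = [
    {"id": "doc-1", "title": "First thesis", "text": "Some fragment text."},
    {"id": "doc-2", "title": "Second thesis", "text": "Another fragment."},
]

body = to_bulk_ndjson(records, index="corpus")
print(body)
```

Each record becomes two lines, so the body for two records has four lines. Keeping serialization separate from the HTTP call makes the output easy to inspect before anything is sent to a cluster.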


What ThunderDots does

ThunderDots can:

  • walk DTS collections and subcollections;
  • fetch resources and TEI/XML documents;
  • extract text fragments from full documents, DTS navigation, or custom TEI XPath rules;
  • preserve or filter Dublin Core and extension metadata;
  • enrich temporal metadata such as dates and coverage ranges;
  • validate generated outputs with JSON Schema;
  • export records to indexing pipelines such as Elasticsearch or Qdrant-compatible formats;
  • cache fetched corpora as JSON and CSV;
  • run synchronous or asynchronous workflows.
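
As an illustration of what fragment extraction with a custom TEI XPath rule involves, the sketch below splits a small inline TEI document into one record per chapter using the standard library's `xml.etree.ElementTree`. It shows the general idea only and is not ThunderDots' implementation; the sample document, the `extract_fragments` helper, and the record keys are assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny inline TEI document standing in for a fetched resource.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1"><head>Introduction</head><p>First paragraph.</p></div>
      <div type="chapter" n="2"><head>Sources</head><p>Second paragraph.</p></div>
    </body>
  </text>
</TEI>"""


def extract_fragments(xml_text: str, xpath: str) -> list[dict]:
    """Return one record per element matched by the XPath rule."""
    root = ET.fromstring(xml_text)
    fragments = []
    for element in root.findall(xpath, TEI_NS):
        # Join all descendant text nodes with spaces, then collapse whitespace.
        text = " ".join(" ".join(element.itertext()).split())
        fragments.append({"n": element.get("n"), "text": text})
    return fragments


fragments = extract_fragments(SAMPLE_TEI, ".//tei:div[@type='chapter']")
for fragment in fragments:
    print(fragment)
```

`ElementTree` supports only a subset of XPath. Rules that need full XPath 1.0 would require a library such as `lxml`.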

Installation

With uv

uv add thunderdots

With pip

pip install thunderdots

For development

git clone https://github.com/chartes/thunderdots.git
cd thunderdots

uv venv
source .venv/bin/activate
uv sync --all-extras --dev

or with pip

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Minimal example

from thunderdots import ThunderDots

td = ThunderDots(
    endpoint_dts="https://dots.chartes.psl.eu/api/dts",
    collection_params={"collection_id": "ENCPOS_1900"},
    resource_params={"fragment_mode": "document"},
)

td.fetch()
results = td.results()

print(td.stats())
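
A common next step is to persist the fetched records. The helper below is a sketch using only the standard library. It assumes that `td.results()` returns an iterable of JSON-serializable dictionaries, which this README does not spell out, and `write_jsonl` is a hypothetical helper, not part of ThunderDots. Stand-in records are used so the snippet runs offline.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Mapping


def write_jsonl(records: Iterable[Mapping], path: Path) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(dict(record), ensure_ascii=False) + "\n")
            count += 1
    return count


# Stand-in for `td.results()`; the real record structure is an assumption here.
results = [
    {"id": "fragment-1", "text": "Premier fragment."},
    {"id": "fragment-2", "text": "Second fragment."},
]

output_path = Path(tempfile.mkdtemp()) / "results.jsonl"
written = write_jsonl(results, output_path)
print(f"wrote {written} records to {output_path}")
```

JSON Lines keeps one record per line, so the file can be streamed into an indexer without loading the whole corpus into memory.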

Development

Run tests

pytest

Online DTS tests are opt-in:

RUN_NETWORK_TESTS=1 pytest
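
One way such an opt-in gate can be wired up is sketched below with the standard library's `unittest`, whose test cases pytest also collects. This is an assumption about the mechanism, not a description of the project's actual test suite; `network_tests_enabled` and `OnlineDTSTest` are hypothetical names.

```python
import os
import unittest
from typing import Mapping


def network_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when RUN_NETWORK_TESTS is set to "1"."""
    return environ.get("RUN_NETWORK_TESTS") == "1"


class OnlineDTSTest(unittest.TestCase):
    @unittest.skipUnless(
        network_tests_enabled(),
        "set RUN_NETWORK_TESTS=1 to run tests that contact a DTS endpoint",
    )
    def test_placeholder(self):
        # A real test would query a live DTS endpoint here.
        self.assertTrue(network_tests_enabled())
```

With the variable unset, the test is reported as skipped instead of failing, which keeps offline and CI runs deterministic.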

Run Ruff (linter, formatter)

ruff format --check
ruff check

Build the documentation

mkdocs build --strict -f mkdocs/mkdocs.yml

Create a new PyPI release

Check the release checklist for details.

License

ThunderDots is distributed under the MIT License.

Citation

If you use ThunderDots in academic work, please cite it as:

@software{terriel_thunderdots_2026,
  author       = {Terriel, Lucas},
  title        = {ThunderDots},
  year         = {2026},
  publisher    = {GitHub},
  institution  = {{École nationale des chartes}},
  url          = {https://github.com/chartes/thunderdots},
  note         = {Python client for Distributed Text Services (DTS) via DoTS}
}

You can also use the repository metadata from CITATION.cff.
