


ThunderDots — DTS client for documentary corpora

DTS scraping, TEI fragmentation, metadata filtering, validation, and export pipelines.



Overview

ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.

It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines, including full-text search, RAG/vector databases, and corpus-analysis workflows.
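To give a concrete picture of the indexing end of such a pipeline, here is a minimal sketch that turns a list of flat records into the newline-delimited JSON body expected by Elasticsearch's `_bulk` API: one action line, then one source line per document. The record fields (`id`, `title`, `text`), the index name `encpos`, and the helper `to_bulk_ndjson` are hypothetical. ThunderDots' own record schema and exporters may differ.

```python
import json
from collections.abc import Iterable, Mapping
from typing import Any


def to_bulk_ndjson(records: Iterable[Mapping[str, Any]], index: str, id_field: str = "id") -> str:
    """Serialize records as an Elasticsearch _bulk body (action line + source line).

    Hypothetical helper: the field names used here are illustrative only.
    """
    lines = []
    for record in records:
        # Action/metadata line: target index and document id.
        action = {"index": {"_index": index, "_id": str(record[id_field])}}
        lines.append(json.dumps(action, ensure_ascii=False))
        # Source line: the document body, without the id (it lives in _id).
        source = {key: value for key, value in record.items() if key != id_field}
        lines.append(json.dumps(source, ensure_ascii=False))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


records = [{"id": "ENCPOS_1900_01", "title": "Exemple", "text": "Premier fragment."}]
body = to_bulk_ndjson(records, index="encpos")
```

The id is moved into the action line as `_id`, so re-indexing the same fragment replaces the existing document instead of creating a duplicate.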

ThunderDots focuses on practical documentary workflows: crawling DTS collections, fetching TEI/XML resources, extracting reusable text fragments, selecting metadata, validating outputs, and exporting data to downstream search or indexing systems.
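The crawling step can be sketched with the standard library alone. This minimal version assumes a DTS Collections endpoint that accepts an `id` query parameter and returns JSON with a `member` list. Each entry is assumed to carry `@id` and `@type` (`Collection` or `Resource`), as in the DTS drafts. Pagination, authentication, and retries are deliberately left out. `walk_collections` and `http_get_json` are hypothetical names, not ThunderDots' API.

```python
import json
from collections.abc import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# A fetcher takes a URL and returns the decoded JSON response.
Fetcher = Callable[[str], dict]


def http_get_json(url: str) -> dict:
    """Default fetcher: plain HTTP GET decoded as JSON (no retries, no auth)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def walk_collections(
    collections_url: str,
    collection_id: str,
    fetch: Fetcher = http_get_json,
) -> Iterator[str]:
    """Yield the ids of all resources reachable from `collection_id`, depth-first.

    Assumes each response has a "member" list whose items carry "@id" and
    "@type" ("Collection" or "Resource"); paginated responses are not followed.
    """
    stack = [collection_id]
    seen: set[str] = set()
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles between collections
            continue
        seen.add(current)
        page = fetch(f"{collections_url}?{urlencode({'id': current})}")
        for member in page.get("member", []):
            if member.get("@type") == "Collection":
                stack.append(member["@id"])  # descend into subcollections later
            else:
                yield member["@id"]  # anything that is not a collection is a resource
```

Passing the fetcher as a parameter lets the traversal be tested offline with canned responses. The `seen` set stops it from looping if two collections list each other.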


What ThunderDots does

ThunderDots can:

  • walk DTS collections and subcollections;
  • fetch resources and TEI/XML documents;
  • extract text fragments from full documents, DTS navigation, or custom TEI XPath rules;
  • preserve or filter Dublin Core and extension metadata;
  • enrich temporal metadata such as dates and coverage ranges;
  • validate generated outputs with JSON Schema;
  • export records in formats suited to indexing systems such as Elasticsearch or Qdrant;
  • cache fetched corpora as JSON and CSV;
  • run synchronous or asynchronous workflows.
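The "custom TEI XPath rules" idea can be approximated as follows. Select elements with a path expression, flatten their text, and keep a pointer back to the source element. This sketch relies on the standard-library `xml.etree.ElementTree`, which implements only a limited subset of XPath. `extract_fragments`, its default path, and the output fields are illustrative assumptions, not ThunderDots' actual rule format.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def extract_fragments(tei_xml: str, path: str = ".//tei:body/tei:div") -> list[dict]:
    """Split a TEI document into text fragments, one per element matched by `path`.

    `path` uses ElementTree's limited XPath subset, not full XPath 1.0.
    """
    root = ET.fromstring(tei_xml)
    fragments = []
    for position, element in enumerate(root.iterfind(path, TEI_NS), start=1):
        # Flatten all descendant text, then collapse runs of whitespace.
        text = re.sub(r"\s+", " ", "".join(element.itertext())).strip()
        if text:  # skip elements with no textual content
            fragments.append({"position": position, "xml_id": element.get(XML_ID), "text": text})
    return fragments
```

The default path selects only the top-level `<div>` children of `<body>`, so nested divisions are not emitted twice. Joining `itertext()` without a separator keeps words split by inline markup such as `<hi>` intact. Adjacent block elements, however, need whitespace between them in the source to avoid running together.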

Installation

With uv

uv add thunderdots

With pip

pip install thunderdots

For development

git clone https://github.com/chartes/thunderdots.git
cd thunderdots

uv venv
source .venv/bin/activate
uv sync --all-extras --dev

Or with pip

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Minimal example

from thunderdots import ThunderDots

td = ThunderDots(
    endpoint_dts="https://dots.chartes.psl.eu/api/dts",
    collection_params={"collection_id": "ENCPOS_1900"},
    resource_params={"fragment_mode": "document"},
)

td.fetch()
results = td.results()

print(td.stats())
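The README does not describe what `td.results()` returns. Assuming it can be converted into a list of flat dictionaries, the following is a minimal standard-library sketch for writing such records to JSON and CSV. `cache_records` is a hypothetical helper, and ThunderDots' built-in JSON/CSV caching may use a different layout.

```python
import csv
import json
from pathlib import Path


def cache_records(records: list[dict], json_path: Path, csv_path: Path) -> None:
    """Write records to a JSON file and to a CSV file whose columns are the union of their keys."""
    json_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")

    # Collect column names in first-seen order across all records.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record become empty cells
```

A call would look like `cache_records(list(results), Path("encpos.json"), Path("encpos.csv"))`, again assuming the records are plain dicts. Nested values such as lists are written to CSV as their `str()` representation, so flatten them first if the CSV is meant for spreadsheets.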

Development

Run tests

pytest

Online DTS tests are opt-in:

RUN_NETWORK_TESTS=1 pytest
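One common way to implement this kind of opt-in switch is to skip network-dependent tests unless the environment variable is set. The sketch below uses the standard-library `unittest.skipUnless` decorator, which pytest also honours. The README does not describe how ThunderDots' own suite reads `RUN_NETWORK_TESTS`, so the helper `network_tests_enabled` and the test class are hypothetical.

```python
import os
import unittest
from urllib.request import urlopen


def network_tests_enabled() -> bool:
    """Return True only when RUN_NETWORK_TESTS is set to "1"."""
    return os.environ.get("RUN_NETWORK_TESTS") == "1"


class OnlineDTSTest(unittest.TestCase):
    @unittest.skipUnless(network_tests_enabled(), "set RUN_NETWORK_TESTS=1 to run online DTS tests")
    def test_endpoint_reachable(self) -> None:
        # Hypothetical check: the DTS entry point answers with HTTP 200.
        with urlopen("https://dots.chartes.psl.eu/api/dts", timeout=30) as response:
            self.assertEqual(response.status, 200)
```

The decorator's condition is evaluated when the test module is imported, so the variable must be set before the run starts, as in the command above. With pytest, `pytest.mark.skipif` on the same condition is an equivalent alternative.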

Run Ruff (format check and linting)

ruff format --check
ruff check

Build the documentation

mkdocs build --strict -f mkdocs/mkdocs.yml

Create a new PyPI release

Check the release checklist for details.

License

ThunderDots is distributed under the MIT License.

Citation

If you use ThunderDots in academic work, please cite it as:

@software{terriel_thunderdots_2026,
  author       = {Terriel, Lucas},
  title        = {ThunderDots},
  year         = {2026},
  publisher    = {GitHub},
  institution  = {{École nationale des chartes}},
  url          = {https://github.com/chartes/thunderdots},
  note         = {Python client for Distributed Text Services (DTS) via DoTS}
}

You can also use the repository metadata from CITATION.cff.



Download files


Source Distribution

thunderdots-0.1.3.tar.gz (1.3 MB)

Uploaded Source

Built Distribution


thunderdots-0.1.3-py3-none-any.whl (41.3 kB)

Uploaded Python 3

File details

Details for the file thunderdots-0.1.3.tar.gz.

File metadata

  • Download URL: thunderdots-0.1.3.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for thunderdots-0.1.3.tar.gz
Algorithm Hash digest
SHA256 63bdd44f0d8ef1c43d21ac604063bb25f1987af161ba2673bc35a6e627fbc0d1
MD5 4dc8a4534bf8266f1523896cfa4d11ee
BLAKE2b-256 19f4e420b5a1d356eae5ce3518449bd1b9e67bc03b08d21cad976f46bba58304


Provenance

The following attestation bundles were made for thunderdots-0.1.3.tar.gz:

Publisher: release.yml on dots-suite/ThunderDots

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file thunderdots-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: thunderdots-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 41.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for thunderdots-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 63b503fbd4197c8e279280aab0935d883b81ca0b9b4fba38e5b2638250333cff
MD5 68f80334f6ded8b8dea97b6321e6bd4d
BLAKE2b-256 61228a225760de786a15df5e6287f47799ce71dfc269224c577ad0cbd970833a


Provenance

The following attestation bundles were made for thunderdots-0.1.3-py3-none-any.whl:

Publisher: release.yml on dots-suite/ThunderDots

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
