ThunderDots — DTS client for documentary corpora

DTS scraping, TEI fragmentation, metadata filtering, validation, and export pipelines.



Overview

ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.

It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines, including full-text search, RAG/vector databases, and corpus-analysis workflows.

ThunderDots focuses on practical documentary workflows: crawling DTS collections, fetching TEI/XML resources, extracting reusable text fragments, selecting metadata, validating outputs, and exporting data to downstream search or indexing systems.
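
To make the crawling step concrete, here is a minimal sketch of a depth-first walk over nested collections. It is not ThunderDots code: `FAKE_API`, `fake_fetch`, and `iter_resources` are hypothetical names, the endpoint is replaced by an in-memory dictionary, and the `@id`, `@type`, and `member` keys are assumed to match the JSON-LD returned by a DTS Collection endpoint.

```python
from collections.abc import Callable, Iterator

# Hypothetical in-memory stand-in for a DTS Collection endpoint.
FAKE_API = {
    "root": {
        "@id": "root",
        "@type": "Collection",
        "member": [
            {"@id": "vol1", "@type": "Collection"},
            {"@id": "doc3", "@type": "Resource"},
        ],
    },
    "vol1": {
        "@id": "vol1",
        "@type": "Collection",
        "member": [
            {"@id": "doc1", "@type": "Resource"},
            {"@id": "doc2", "@type": "Resource"},
        ],
    },
}


def fake_fetch(collection_id: str) -> dict:
    """Return the stored description of one collection (no network access)."""
    return FAKE_API[collection_id]


def iter_resources(collection_id: str, fetch: Callable[[str], dict]) -> Iterator[str]:
    """Yield the id of every Resource found below `collection_id`."""
    seen: set[str] = set()
    stack = [collection_id]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against collections listed twice
            continue
        seen.add(current)
        for member in fetch(current).get("member", []):
            if member.get("@type") == "Resource":
                yield member["@id"]
            else:
                stack.append(member["@id"])  # descend into the subcollection later


resource_ids = sorted(iter_resources("root", fake_fetch))
print(resource_ids)
```

Passing the fetch function as an argument keeps the traversal independent of the transport, so the same walk can be exercised offline with a stub and online with a real HTTP call.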


What ThunderDots does

ThunderDots can:

  • walk DTS collections and subcollections;
  • fetch resources and TEI/XML documents;
  • extract text fragments from full documents, DTS navigation, or custom TEI XPath rules;
  • preserve or filter Dublin Core and extension metadata;
  • enrich temporal metadata such as dates and coverage ranges;
  • validate generated outputs with JSON Schema;
  • export records to indexing pipelines such as Elasticsearch or Qdrant-compatible formats;
  • cache fetched corpora as JSON and CSV;
  • run synchronous or asynchronous workflows.
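
As an illustration of what extracting fragments with a TEI XPath rule can look like, the sketch below uses only the standard library. It does not show ThunderDots' own rule syntax or API: `extract_fragments`, the sample document, and the output keys `n` and `text` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace, so XPath steps need a prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Small made-up TEI document used only for this example.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Introduction</head>
        <p>First paragraph.</p>
      </div>
      <div type="chapter" n="2">
        <head>Sources</head>
        <p>Second <hi>paragraph</hi>.</p>
      </div>
    </body>
  </text>
</TEI>"""


def extract_fragments(tei_xml: str, xpath: str) -> list[dict]:
    """Return one record per element matched by `xpath`, with normalized text."""
    root = ET.fromstring(tei_xml)
    fragments = []
    for node in root.findall(xpath, TEI_NS):
        # Concatenate all descendant text, then collapse runs of whitespace.
        text = " ".join("".join(node.itertext()).split())
        fragments.append({"n": node.get("n"), "text": text})
    return fragments


fragments = extract_fragments(SAMPLE_TEI, ".//tei:div[@type='chapter']")
for fragment in fragments:
    print(fragment)
```

`xml.etree.ElementTree` supports only a subset of XPath, so more elaborate rules would need a fuller XPath engine.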

Installation

With uv

uv add thunderdots

With pip

pip install thunderdots

For development

git clone https://github.com/chartes/thunderdots.git
cd thunderdots

uv venv
source .venv/bin/activate
uv sync --all-extras --dev

or with pip

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Minimal example

from thunderdots import ThunderDots

td = ThunderDots(
    endpoint_dts="https://dots.chartes.psl.eu/api/dts",
    collection_params={"collection_id": "ENCPOS_1900"},
    resource_params={"fragment_mode": "document"},
)

td.fetch()
results = td.results()

print(td.stats())
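
ThunderDots provides its own JSON and CSV caching, whose API is not shown here. Purely to illustrate what such a cache involves, the following standard-library sketch writes a list of records to both formats. The helper `cache_records`, the file stem `corpus`, and the shape of the records (flat dictionaries) are assumptions; the actual structure returned by `td.results()` may differ.

```python
import csv
import json
import tempfile
from pathlib import Path


def cache_records(records: list[dict], out_dir: Path, stem: str = "corpus") -> tuple[Path, Path]:
    """Write `records` to <stem>.json and <stem>.csv inside `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / f"{stem}.json"
    csv_path = out_dir / f"{stem}.csv"

    json_path.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # CSV needs one fixed header: take the union of keys in first-seen order.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return json_path, csv_path


# Made-up records standing in for fetched results.
records = [
    {"id": "doc1", "title": "First thesis", "text": "Sample text"},
    {"id": "doc2", "title": "Second thesis", "date": "1900"},
]

with tempfile.TemporaryDirectory() as tmp:
    json_path, csv_path = cache_records(records, Path(tmp))
    loaded = json.loads(json_path.read_text(encoding="utf-8"))
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
```

JSON keeps each record's own set of keys, while the CSV fills missing columns with empty strings, which is why the two files are not interchangeable for nested metadata.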

Development

Run tests

pytest

Online DTS tests are opt-in:

RUN_NETWORK_TESTS=1 pytest
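
How the project gates its online tests is not shown here. One common pattern, sketched below with the standard library only, skips a test class unless the environment variable is set; pytest also collects `unittest.TestCase` classes. The helper `network_tests_enabled`, the class name, and the reachability check are hypothetical.

```python
import os
import unittest
import urllib.request

DTS_ENDPOINT = "https://dots.chartes.psl.eu/api/dts"


def network_tests_enabled(environ=None) -> bool:
    """Return True only when RUN_NETWORK_TESTS is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get("RUN_NETWORK_TESTS") == "1"


@unittest.skipUnless(
    network_tests_enabled(), "set RUN_NETWORK_TESTS=1 to run online DTS tests"
)
class OnlineDtsTests(unittest.TestCase):
    def test_endpoint_is_reachable(self):
        # Placeholder check: a real suite would assert on the DTS payload.
        with urllib.request.urlopen(DTS_ENDPOINT, timeout=10) as response:
            self.assertEqual(response.status, 200)
```

Keeping the decision in a small function that accepts a mapping makes the gate itself testable without touching the real environment.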

Run Ruff (format check and lint)

ruff format --check
ruff check

Build the documentation

mkdocs build --strict -f mkdocs/mkdocs.yml

Create a new PyPI release

Check the release checklist for details.

License

ThunderDots is distributed under the MIT License.

Citation

If you use ThunderDots in academic work, please cite it as:

@software{terriel_thunderdots_2026,
  author       = {Terriel, Lucas},
  title        = {ThunderDots},
  year         = {2026},
  publisher    = {GitHub},
  institution  = {{École nationale des chartes}},
  url          = {https://github.com/chartes/thunderdots},
  note         = {Python client for Distributed Text Services (DTS) via DoTS}
}

You can also use the repository metadata from CITATION.cff.
