


ThunderDots — DTS client for documentary corpora

Fast DTS crawling, TEI fragmentation, metadata filtering, validation, and export pipelines.



Overview

ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.

It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines, including full-text search, RAG/vector databases, and corpus-analysis workflows.

ThunderDots focuses on practical documentary workflows: crawling DTS collections, fetching TEI/XML resources, extracting reusable text fragments, selecting metadata, validating outputs, and exporting data to downstream search or indexing systems.
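To illustrate what crawling a DTS collection involves, here is a minimal sketch that is independent of ThunderDots and does not show its internals. It assumes collection responses expose `@id`, `@type`, and `member` keys, as DTS Collection responses generally do; check this against the DTS version your endpoint implements. The `SAMPLE` data and the `walk_resources` helper are hypothetical, and the `fetch` callable stands in for an HTTP request.

```python
from typing import Callable, Iterator

# Made-up responses keyed by collection id. A real client would request
# these from the DTS Collection endpoint instead.
SAMPLE = {
    "root": {
        "@id": "root",
        "@type": "Collection",
        "member": [
            {"@id": "doc-1", "@type": "Resource"},
            {"@id": "sub", "@type": "Collection"},
        ],
    },
    "sub": {
        "@id": "sub",
        "@type": "Collection",
        "member": [{"@id": "doc-2", "@type": "Resource"}],
    },
}


def walk_resources(
    collection_id: str, fetch: Callable[[str], dict]
) -> Iterator[str]:
    """Yield the ids of all resources below a collection, depth first."""
    response = fetch(collection_id)
    for member in response.get("member", []):
        if member.get("@type") == "Collection":
            # Recurse into subcollections.
            yield from walk_resources(member["@id"], fetch)
        else:
            yield member["@id"]


resource_ids = list(walk_resources("root", SAMPLE.__getitem__))
print(resource_ids)
```

Passing the fetch function as a parameter keeps the traversal logic testable without network access.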


What ThunderDots does

ThunderDots can:

  • walk DTS collections and subcollections;
  • fetch resources and TEI/XML documents;
  • extract text fragments from full documents, DTS navigation, or custom TEI XPath rules;
  • preserve or filter Dublin Core and extension metadata;
  • enrich temporal metadata such as dates and coverage ranges;
  • validate generated outputs with JSON Schema;
  • export records to indexing pipelines such as Elasticsearch or Qdrant-compatible formats;
  • cache fetched corpora as JSON and CSV;
  • run synchronous or asynchronous workflows.
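As an illustration of what XPath-based fragment extraction means, the sketch below splits a small TEI document into one fragment per chapter using only the standard library. It is not ThunderDots' API or implementation: the `extract_fragments` helper, the sample document, and the output keys are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for TEI elements.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Made-up TEI sample with two chapter divisions.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1"><p>First  chapter text.</p></div>
      <div type="chapter" n="2"><p>Second chapter <hi>text</hi>.</p></div>
    </body>
  </text>
</TEI>"""


def extract_fragments(tei_xml: str, xpath: str) -> list[dict]:
    """Return one record per element matched by the XPath rule."""
    root = ET.fromstring(tei_xml)
    fragments = []
    for element in root.findall(xpath, TEI_NS):
        # itertext() includes text from nested elements such as <hi>;
        # split/join collapses runs of whitespace.
        text = " ".join("".join(element.itertext()).split())
        fragments.append({"n": element.get("n"), "text": text})
    return fragments


fragments = extract_fragments(SAMPLE_TEI, ".//tei:div[@type='chapter']")
for fragment in fragments:
    print(fragment)
```

`xml.etree.ElementTree` supports only a subset of XPath; rules that need the full language would require a library such as lxml.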

Installation

With uv

uv add thunderdots

With pip

pip install thunderdots

For development

git clone https://github.com/chartes/thunderdots.git
cd thunderdots

uv venv
source .venv/bin/activate
uv sync --all-extras --dev

Or with pip

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Minimal example

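from thunderdots import ThunderDots

# Configure the client with a DTS endpoint, a target collection,
# and a fragmentation mode for fetched resources.
td = ThunderDots(
    endpoint_dts="https://dots.chartes.psl.eu/api/dts",
    collection_params={"collection_id": "ENCPOS_1900"},
    resource_params={"fragment_mode": "document"},
)

# Crawl the collection and fetch its resources.
td.fetch()

# Collect the fetched records for further processing.
results = td.results()

# Print summary statistics about the run.
print(td.stats())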
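Once records are available, they can be serialized for a downstream index. The sketch below builds a payload in the newline-delimited format expected by the Elasticsearch bulk API. It assumes each record is a plain dict with an `id` key; the actual shape of `td.results()` is not documented here, so the `to_bulk_ndjson` helper and the sample records are hypothetical, and ThunderDots' own export features may work differently.

```python
import json


def to_bulk_ndjson(records: list[dict], index: str) -> str:
    """Serialize records as action/source line pairs for a bulk request."""
    lines = []
    for record in records:
        # Action line: index this document under its own id.
        action = {"index": {"_index": index, "_id": record["id"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        # Source line: the document itself.
        lines.append(json.dumps(record, ensure_ascii=False))
    # The bulk format requires a newline after the last line.
    return "\n".join(lines) + "\n"


# Made-up records standing in for fetched fragments.
records = [
    {"id": "doc-1", "title": "Example A", "text": "Sample text."},
    {"id": "doc-2", "title": "Example B", "text": "More sample text."},
]
payload = to_bulk_ndjson(records, index="corpus")
print(payload)
```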

Development

Run tests

pytest

Online DTS tests are opt-in:

RUN_NETWORK_TESTS=1 pytest
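For readers unfamiliar with opt-in tests, the following sketch shows one common way to gate tests on an environment variable. It is an assumption about the general pattern, not a copy of the project's test suite: `network_tests_enabled` and `OnlineSmokeTest` are hypothetical names. pytest collects `unittest.TestCase` classes and honors their skip decorators, so the example uses only the standard library.

```python
import os
import unittest


def network_tests_enabled(environ=os.environ) -> bool:
    """Return True only when RUN_NETWORK_TESTS is set to "1"."""
    return environ.get("RUN_NETWORK_TESTS") == "1"


@unittest.skipUnless(network_tests_enabled(), "set RUN_NETWORK_TESTS=1 to run")
class OnlineSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would query a live DTS endpoint here.
        self.assertTrue(network_tests_enabled())
```

With this pattern, a plain `pytest` run reports the class as skipped, while `RUN_NETWORK_TESTS=1 pytest` executes it.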

Run Ruff (formatter and linter)

ruff format --check
ruff check

Build the documentation

mkdocs build --strict -f mkdocs/mkdocs.yml

Create a new PyPI release

Check the release checklist for details.

License

ThunderDots is distributed under the MIT License.

Citation

If you use ThunderDots in academic work, please cite it as:

@software{terriel_thunderdots_2026,
  author       = {Terriel, Lucas},
  title        = {ThunderDots},
  year         = {2026},
  publisher    = {GitHub},
  institution  = {{École nationale des chartes}},
  url          = {https://github.com/chartes/thunderdots},
  note         = {Python client for Distributed Text Services (DTS) via DoTS}
}

You can also use the repository metadata from CITATION.cff.
