
ThunderDots — DTS client for documentary corpora

DTS scraping, TEI fragmentation, metadata filtering, validation, and export pipelines.


Overview

ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.

It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines, including full-text search, RAG/vector databases, and corpus-analysis workflows.

ThunderDots focuses on practical documentary workflows: crawling DTS collections, fetching TEI/XML resources, extracting reusable text fragments, selecting metadata, validating outputs, and exporting data to downstream search or indexing systems.
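
To make the crawling step concrete, here is a minimal sketch of a depth-first walk over a DTS collection tree. It is not ThunderDots's implementation: `walk_collections`, the `Fetcher` type, and the sample data are hypothetical, and the `member`, `@id`, and `@type` keys are assumed to follow the DTS Collection response layout. The fetcher is injected so that the example runs without network access.

```python
from typing import Callable, Iterator

# Hypothetical fetcher type: takes a collection id and returns the parsed
# JSON body of a DTS Collection response (assumed shape, see lead-in).
Fetcher = Callable[[str], dict]


def walk_collections(
    fetch: Fetcher, collection_id: str, depth: int = 0
) -> Iterator[tuple[int, dict]]:
    """Yield (depth, member) for every member below collection_id, depth-first."""
    response = fetch(collection_id)
    for member in response.get("member", []):
        yield depth, member
        # Only collections have children to descend into.
        if member.get("@type") == "Collection":
            yield from walk_collections(fetch, member["@id"], depth + 1)


# In-memory stand-in for a DTS server (invented sample data).
SAMPLE = {
    "root": {
        "@id": "root",
        "@type": "Collection",
        "member": [
            {"@id": "theses", "@type": "Collection", "title": "Theses"},
            {"@id": "doc-3", "@type": "Resource", "title": "Document 3"},
        ],
    },
    "theses": {
        "@id": "theses",
        "@type": "Collection",
        "member": [
            {"@id": "doc-1", "@type": "Resource", "title": "Document 1"},
            {"@id": "doc-2", "@type": "Resource", "title": "Document 2"},
        ],
    },
}

resources = [
    member["@id"]
    for _, member in walk_collections(SAMPLE.__getitem__, "root")
    if member["@type"] == "Resource"
]
print(resources)
```

Against a real endpoint, the fetcher would issue an HTTP request instead of a dictionary lookup. A production crawler would also need pagination, error handling, and a guard against visiting the same collection twice.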


What ThunderDots does

ThunderDots can:

  • walk DTS collections and subcollections;
  • fetch resources and TEI/XML documents;
  • extract text fragments from full documents, DTS navigation, or custom TEI XPath rules;
  • preserve or filter Dublin Core and extension metadata;
  • enrich temporal metadata such as dates and coverage ranges;
  • validate generated outputs with JSON Schema;
  • transform DTS resources into Pandas/Polars DataFrames;
  • export records in formats compatible with indexing systems such as Elasticsearch or Qdrant;
  • cache fetched corpora as JSON and CSV;
  • run synchronous or asynchronous workflows.
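
As an illustration of XPath-based fragment extraction, the sketch below splits a small TEI document into one fragment per chapter using only the standard library. It does not use ThunderDots and does not show how ThunderDots configures its XPath rules: the `extract_fragments` helper, the sample document, and the output keys are hypothetical. Note that `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, bound to a prefix for use in XPath expressions.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Introduction</head>
        <p>First paragraph.</p>
      </div>
      <div type="chapter" n="2">
        <head>Sources</head>
        <p>Second <hi>paragraph</hi>.</p>
      </div>
    </body>
  </text>
</TEI>"""


def extract_fragments(tei_xml: str, xpath: str) -> list[dict]:
    """Return one {"n", "text"} record per element matched by xpath."""
    root = ET.fromstring(tei_xml)
    fragments = []
    for element in root.findall(xpath, TEI_NS):
        # itertext() flattens nested markup; split/join normalizes whitespace.
        text = " ".join("".join(element.itertext()).split())
        fragments.append({"n": element.get("n"), "text": text})
    return fragments


fragments = extract_fragments(SAMPLE_TEI, ".//tei:div[@type='chapter']")
for fragment in fragments:
    print(fragment)
```

Changing the XPath expression, for example to `.//tei:p`, changes the fragment granularity without touching the rest of the code.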

Installation

With uv

uv add thunderdots

With pip

pip install thunderdots

For development

git clone https://github.com/chartes/thunderdots.git
cd thunderdots

uv venv
source .venv/bin/activate
uv sync --all-extras --dev

or with pip

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Minimal example

from thunderdots import ThunderDots

td = ThunderDots(
    endpoint_dts="https://dots.chartes.psl.eu/api/dts",
    collection_params={"collection_id": "ENCPOS_1900"},
    resource_params={"fragment_mode": "document"},
)

td.fetch()
results = td.results()

print(td.stats())
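
Once records are fetched, they can be serialized for an indexing system. The sketch below builds a newline-delimited JSON payload in the shape expected by the Elasticsearch bulk API. It is independent of ThunderDots's own exporters: `to_bulk_ndjson`, the `id_key` parameter, and the sample records are hypothetical, and it assumes records are plain JSON-serializable dictionaries, which may not match what `td.results()` returns.

```python
import json


def to_bulk_ndjson(records: list[dict], index: str, id_key: str = "id") -> str:
    """Serialize records as action/document line pairs for a bulk request."""
    lines = []
    for record in records:
        action = {"index": {"_index": index}}
        # Reuse the record's own identifier as the document id when present.
        if id_key in record:
            action["index"]["_id"] = str(record[id_key])
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(record, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Invented sample records standing in for fetched results.
sample_records = [
    {"id": "doc-1", "title": "Document 1", "text": "First fragment."},
    {"id": "doc-2", "title": "Document 2", "text": "Second fragment."},
]

payload = to_bulk_ndjson(sample_records, index="corpus")
print(payload, end="")
```

The resulting string can be written to a file or sent as the body of a bulk request with the content type `application/x-ndjson`.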

Development

Run tests

pytest

Online DTS tests are opt-in:

RUN_NETWORK_TESTS=1 pytest
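
For readers adding their own online tests, an opt-in gate of this kind can be sketched as follows. This is not taken from the ThunderDots test suite: `network_tests_enabled` and the test class are hypothetical, and the sketch uses `unittest` so that it needs no third-party import (pytest also collects `unittest.TestCase` classes).

```python
import os
import unittest


def network_tests_enabled(environ=os.environ) -> bool:
    """Return True only when RUN_NETWORK_TESTS is set to exactly "1"."""
    return environ.get("RUN_NETWORK_TESTS") == "1"


@unittest.skipUnless(
    network_tests_enabled(),
    "set RUN_NETWORK_TESTS=1 to run online DTS tests",
)
class OnlineEndpointTest(unittest.TestCase):
    def test_gate_is_open(self):
        # Placeholder body: a real test would query the DTS endpoint here.
        self.assertTrue(network_tests_enabled())
```

In a pytest-only code base, the same condition can be passed to `pytest.mark.skipif`.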

Run Ruff (lint and format checks)

ruff format --check
ruff check

Build the documentation

mkdocs build --strict -f mkdocs/mkdocs.yml

Create a new PyPI release

Check the release checklist for details.

License

ThunderDots is distributed under the MIT License.

Citation

If you use ThunderDots in academic work, please cite it as:

@software{terriel_thunderdots_2026,
  author       = {Terriel, Lucas},
  title        = {ThunderDots},
  year         = {2026},
  publisher    = {GitHub},
  institution  = {{École nationale des chartes}},
  url          = {https://github.com/chartes/thunderdots},
  note         = {Python client for Distributed Text Services (DTS) via DoTS}
}

You can also use the repository metadata from CITATION.cff.
