
ThunderDots — DTS client for documentary corpora
DTS scraping, TEI fragmentation, metadata filtering, validation, and export pipelines.
Overview
ThunderDots is a Python client for DTS (Distributed Text Services) endpoints, initially built for DoTS.
It helps you move from a remote DTS API to structured Python objects and JSON records that can feed indexing pipelines, including full-text search, RAG/vector databases, and corpus-analysis workflows.
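Because ThunderDots emits plain JSON records, feeding a full-text engine is largely a serialization step. As one illustration, the sketch below writes records in the newline-delimited body expected by the Elasticsearch `_bulk` API. It is not ThunderDots' own exporter: the helper `to_bulk_ndjson`, the record fields (`id`, `text`, `collection`) and the index name `encpos` are hypothetical.

```python
import json
from typing import Iterable


def to_bulk_ndjson(records: Iterable[dict], index: str) -> str:
    """Serialize records as an Elasticsearch _bulk request body (hypothetical helper)."""
    lines = []
    for record in records:
        # Assumption: every record carries a unique "id" used as the document id.
        action = {"index": {"_index": index, "_id": record["id"]}}
        source = {key: value for key, value in record.items() if key != "id"}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects every line, including the last one, to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records, shaped for illustration only.
records = [
    {"id": "doc-1#frag-1", "text": "Premier fragment.", "collection": "ENCPOS_1900"},
    {"id": "doc-1#frag-2", "text": "Second fragment.", "collection": "ENCPOS_1900"},
]
print(to_bulk_ndjson(records, index="encpos"), end="")
```

Each record produces an action line followed by its source line; the resulting string can be sent to the `_bulk` endpoint with the `application/x-ndjson` content type.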
ThunderDots focuses on practical documentary workflows: crawling DTS collections, fetching TEI/XML resources, extracting reusable text fragments, selecting metadata, validating outputs, and exporting data to downstream search or indexing systems.
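On the TEI side, fragment extraction amounts to selecting elements with a path rule and flattening their text. The sketch below uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (lxml would be needed for full XPath 1.0). The simplified sample document, the default rule `.//tei:body/tei:div` and the helper `extract_fragments` are illustrative assumptions, not ThunderDots' rule syntax.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; ElementTree needs it mapped to a prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified, hypothetical TEI document used for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Chapitre I</head>
        <p>Premier paragraphe.</p>
      </div>
      <div type="chapter" n="2">
        <head>Chapitre II</head>
        <p>Second <hi>paragraphe</hi>.</p>
      </div>
    </body>
  </text>
</TEI>"""


def extract_fragments(tei_xml: str, rule: str = ".//tei:body/tei:div") -> list[dict]:
    """Return one record per element matched by an ElementTree path rule."""
    root = ET.fromstring(tei_xml)
    fragments = []
    for element in root.findall(rule, TEI_NS):
        # itertext() walks nested text (e.g. inside <hi>); split/join collapses whitespace.
        text = " ".join("".join(element.itertext()).split())
        fragments.append({"n": element.get("n"), "type": element.get("type"), "text": text})
    return fragments


for fragment in extract_fragments(SAMPLE_TEI):
    print(fragment)
```

Swapping the rule (for example `.//tei:p`) changes the granularity of the fragments without touching the rest of the pipeline.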
What ThunderDots does
ThunderDots can:
- walk DTS collections and subcollections;
- fetch resources and TEI/XML documents;
- extract text fragments from full documents, DTS navigation, or custom TEI XPath rules;
- preserve or filter Dublin Core and extension metadata;
- enrich temporal metadata such as dates and coverage ranges;
- validate generated outputs with JSON Schema;
- export records in Elasticsearch- or Qdrant-compatible formats for indexing pipelines;
- cache fetched corpora as JSON and CSV;
- run synchronous or asynchronous workflows.
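Walking collections and subcollections can be pictured as a breadth-first traversal of DTS Collection responses. The sketch below assumes DTS 1.0-style JSON, where child entries sit in a `member` array and carry `@id` and `@type` (`Collection` or `Resource`). The dictionary of fake responses stands in for real HTTP calls, and `walk_collections` is a hypothetical helper, not ThunderDots' API.

```python
from collections import deque
from typing import Callable, Iterator

# Hypothetical stand-in for HTTP calls to the DTS Collection endpoint:
# maps a collection id to its (already decoded) JSON response.
FAKE_RESPONSES = {
    "root": {"@id": "root", "@type": "Collection", "member": [
        {"@id": "ENCPOS_1900", "@type": "Collection"},
    ]},
    "ENCPOS_1900": {"@id": "ENCPOS_1900", "@type": "Collection", "member": [
        {"@id": "doc-1", "@type": "Resource"},
        {"@id": "doc-2", "@type": "Resource"},
    ]},
}


def walk_collections(root_id: str, fetch: Callable[[str], dict]) -> Iterator[str]:
    """Breadth-first walk yielding the id of every Resource below root_id."""
    queue = deque([root_id])
    seen = {root_id}
    while queue:
        response = fetch(queue.popleft())
        for member in response.get("member", []):
            member_id = member["@id"]
            if member["@type"] == "Resource":
                yield member_id
            elif member["@type"] == "Collection" and member_id not in seen:
                # Guard against cycles: a collection may be reachable more than once.
                seen.add(member_id)
                queue.append(member_id)


print(list(walk_collections("root", FAKE_RESPONSES.__getitem__)))  # ['doc-1', 'doc-2']
```

Passing `fetch` as a parameter keeps the traversal independent of the transport, so the same loop could sit on top of a synchronous or an asynchronous HTTP client.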
Installation
With uv
uv add thunderdots
With pip
pip install thunderdots
For development
git clone https://github.com/chartes/thunderdots.git
cd thunderdots
uv venv
source .venv/bin/activate
uv sync --all-extras --dev
Or, with pip:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Minimal example
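from thunderdots import ThunderDots

# Client for a DoTS endpoint, targeting the ENCPOS_1900 collection;
# fragment_mode="document" keeps each full document as one fragment.
td = ThunderDots(
    endpoint_dts="https://dots.chartes.psl.eu/api/dts",
    collection_params={"collection_id": "ENCPOS_1900"},
    resource_params={"fragment_mode": "document"},
)
td.fetch()              # crawl the collection and fetch its resources
results = td.results()  # retrieve the extracted records
print(td.stats())       # print a summary of the run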
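ThunderDots can cache fetched corpora as JSON and CSV. The shape of `td.results()` is not documented here, so the sketch below assumes a list of flat dictionaries and writes it with the standard library only; `cache_records` and the file names are hypothetical and do not describe ThunderDots' own cache.

```python
import csv
import json
from pathlib import Path


def cache_records(records: list[dict], directory: Path, stem: str = "corpus") -> tuple[Path, Path]:
    """Write records to <stem>.json and <stem>.csv inside directory (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    json_path = directory / f"{stem}.json"
    csv_path = directory / f"{stem}.csv"

    # JSON keeps the records as-is (UTF-8, accented characters not escaped).
    json_path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")

    # CSV needs a fixed header: use the union of keys, in first-seen order.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record become empty cells
    return json_path, csv_path


# Assuming `results` from the minimal example is a list of flat dicts:
# cache_records(results, Path("cache"))
```

Nested values such as lists would be written to the CSV as their string representation, so flattening them first (or relying on the JSON copy) is advisable.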
Development
Run tests
pytest
Online DTS tests are opt-in:
RUN_NETWORK_TESTS=1 pytest
Run Ruff (linter and formatter)
ruff format --check
ruff check
Build the documentation
mkdocs build --strict -f mkdocs/mkdocs.yml
Create a new PyPI release
Check the release checklist for details.
License
ThunderDots is distributed under the MIT License.
Citation
If you use ThunderDots in academic work, please cite it as:
@software{terriel_thunderdots_2026,
  author      = {Terriel, Lucas},
  title       = {ThunderDots},
  year        = {2026},
  publisher   = {GitHub},
  institution = {{École nationale des chartes}},
  url         = {https://github.com/chartes/thunderdots},
  note        = {Python client for Distributed Text Services (DTS) via DoTS}
}
You can also use the repository metadata from CITATION.cff.
Download files
Source distribution: thunderdots-0.1.5.tar.gz
Built distribution: thunderdots-0.1.5-py3-none-any.whl
File details
Details for the file thunderdots-0.1.5.tar.gz.
File metadata
- Download URL: thunderdots-0.1.5.tar.gz
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd99676932bf2ac59fda471a438f366d219e94159e6ab0ffdc17f56d965c3a4f |
| MD5 | b2b8f7f0a6a43f43459d61726fbd533a |
| BLAKE2b-256 | c3eff7d8a193a9068582431b8a6a7da64f0bf897b954987ce0dca63917fc1feb |
Provenance
The following attestation bundles were made for thunderdots-0.1.5.tar.gz:
Publisher: release.yml on dots-suite/ThunderDots
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thunderdots-0.1.5.tar.gz
- Subject digest: fd99676932bf2ac59fda471a438f366d219e94159e6ab0ffdc17f56d965c3a4f
- Sigstore transparency entry: 1949636089
Permalink: dots-suite/ThunderDots@e2b0a009c9287dd1eaa76ce820920598a9912d22
Branch / Tag: refs/tags/v0.1.5
Owner: https://github.com/dots-suite
Access: public
Token Issuer: https://token.actions.githubusercontent.com
Runner Environment: github-hosted
Publication workflow: release.yml@e2b0a009c9287dd1eaa76ce820920598a9912d22
Trigger Event: release
File details
Details for the file thunderdots-0.1.5-py3-none-any.whl.
File metadata
- Download URL: thunderdots-0.1.5-py3-none-any.whl
- Size: 44.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86b134eba2d0a34d89f29b85d2141693e7814c310a5f1f18809a6293b00acf1b |
| MD5 | a8ca0f4b53de31903dedd91243fd9d2e |
| BLAKE2b-256 | 1c201f5dddc6d50e1de8e87d90d414e9115bddd6963f613a646249a22058a4ad |
Provenance
The following attestation bundles were made for thunderdots-0.1.5-py3-none-any.whl:
Publisher: release.yml on dots-suite/ThunderDots
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: thunderdots-0.1.5-py3-none-any.whl
- Subject digest: 86b134eba2d0a34d89f29b85d2141693e7814c310a5f1f18809a6293b00acf1b
- Sigstore transparency entry: 1949636194
Permalink: dots-suite/ThunderDots@e2b0a009c9287dd1eaa76ce820920598a9912d22
Branch / Tag: refs/tags/v0.1.5
Owner: https://github.com/dots-suite
Access: public
Token Issuer: https://token.actions.githubusercontent.com
Runner Environment: github-hosted
Publication workflow: release.yml@e2b0a009c9287dd1eaa76ce820920598a9912d22
Trigger Event: release