Pipeline for querying NASA ADS publication metadata and turning it into curated, analysis-ready datasets, topic maps, and citation networks.
ads-bib
ads-bib takes a NASA ADS search query and produces a normalized, curated dataset with disambiguated author names (author name disambiguation, AND, via ads-and), topic models (via BERTopic or Toponymy), and citation networks ready for e.g. Gephi, CiteSpace, or VOSviewer. Models run locally or via API.
Installation
Use uv and Python 3.12.
uv pip install ads-bib
# or: pip install ads-bib
Quick Start
Create a .env file in your project root with the relevant API keys.
ADS_TOKEN=your-ads-token # required
OPENROUTER_API_KEY=your-key # only for the openrouter road
HF_TOKEN=your-key # only for the huggingface road
MODAL_TOKEN_ID=your-modal-id # only for AND with backend=modal
MODAL_TOKEN_SECRET=your-modal-secret
Where to get the keys: ADS user token settings | OpenRouter Keys | Hugging Face Access Tokens | Modal.
Then run in your terminal:
ads-bib run --preset openrouter --set search.query='author:"Hawking, S*"'
Author name disambiguation is off by default. Enable the local CPU/GPU path
with --set author_disambiguation.enabled=true; use
--set author_disambiguation.backend=modal only when your Modal credentials are
configured.
Full setup details: Get Started | Runtime Roads
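As a pre-flight check for the .env keys listed above, the sketch below reports which keys a given road still needs. It is not part of ads-bib: `parse_env`, `missing_keys`, and the road-to-key mapping are hypothetical helpers, with the mapping read off the inline comments in the .env example.

```python
# Hypothetical mapping from runtime road to extra keys, read off the
# comments in the .env example. This is an assumption, not an ads-bib API.
ROAD_KEYS = {
    "openrouter": ["OPENROUTER_API_KEY"],
    "hf_api": ["HF_TOKEN"],
    "local_cpu": [],
    "local_gpu": [],
}
MODAL_KEYS = ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"]


def parse_env(text):
    """Parse KEY=value lines, ignoring blank lines and '#' comments.

    Simplification: everything after a '#' is dropped, so values that
    contain '#' are not supported.
    """
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            env[key.strip()] = value.strip()
    return env


def missing_keys(env, road, and_backend=None):
    """Return the keys this road needs that are absent or empty in env."""
    needed = ["ADS_TOKEN"] + ROAD_KEYS[road]
    if and_backend == "modal":
        needed += MODAL_KEYS
    return [key for key in needed if not env.get(key)]


example = "ADS_TOKEN=abc # required\nHF_TOKEN=xyz\n"
print(missing_keys(parse_env(example), "openrouter"))
```

With only ADS_TOKEN and HF_TOKEN set, the check flags OPENROUTER_API_KEY for the openrouter road and nothing for hf_api.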
Iterate From a Previous Run
Every run writes config_used.yaml and reusable stage artifacts. To try one
change without repeating the whole pipeline, start a variant from that run:
ads-bib run --from-run run_20260407_120000_ads_bib_openrouter \
--set topic_model.embedding_model=google/gemini-embedding-001
ads-bib loads the previous config, applies the override, chooses the earliest
stage that needs recomputation, and writes a new run folder with a variant
block in run_summary.yaml. Preview the reuse plan first with --dry-run.
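To make "earliest stage that needs recomputation" concrete, here is a minimal sketch of that kind of planning. It does not reproduce ads-bib's internal logic: the stage order is inferred from the data/ subfolders of a run, and the `FIRST_AFFECTED` mapping and the `reuse_plan` helper are assumptions made for illustration.

```python
# Stage order inferred from the data/ subfolders of a run (an assumption).
STAGES = ["search", "export", "translated", "tokenized", "and", "dataset", "citations"]

# Hypothetical: the first stage affected by each config section.
FIRST_AFFECTED = {
    "search": "search",
    "author_disambiguation": "and",
    "topic_model": "dataset",
}


def reuse_plan(override_keys):
    """Split STAGES into (reused, recomputed) for dotted override keys."""
    start = len(STAGES)  # no overrides: every stage can be reused
    for key in override_keys:
        section = key.split(".", 1)[0]
        # Unknown sections restart from the first stage, the conservative choice.
        stage = FIRST_AFFECTED.get(section, STAGES[0])
        start = min(start, STAGES.index(stage))
    return STAGES[:start], STAGES[start:]


reused, recomputed = reuse_plan(["topic_model.embedding_model"])
print("reuse:", reused)
print("recompute:", recomputed)
```

Under these assumptions, an embedding-model override reuses everything up to the AND stage and recomputes the dataset and citation stages. For the real plan, use --dry-run.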
Python API
import ads_bib

ads_bib.run(
    preset="openrouter",
    query='author:"Hawking, S*"',
)
More examples and the NotebookSession interface: Python API docs
Pick a Runtime Road
| Road | Hardware | Network | Cost |
|---|---|---|---|
| openrouter | any | API calls | pay-per-token |
| hf_api | any | API calls | HF-plan-dependent |
| local_cpu | CPU only | model downloads only | free after setup |
| local_gpu | NVIDIA + CUDA | model downloads only | free after setup |
Full provider matrix and first-run behavior: Runtime Roads
Output
Each project folder keeps shared caches under data/cache/ and writes every
run under runs/<run_id>/:
runs/<run_id>/
├── config_used.yaml
├── run_summary.yaml
├── data/
│ ├── search/ # run-local ADS search result used for export variants
│ ├── export/ # pre-translation publications and references
│ ├── translated/ # translated publications and references
│ ├── tokenized/ # tokenized publications and references
│ ├── and/ # disambiguated frames plus optional ads-and diagnostics
│ ├── dataset/ # final publications, references, topic_info, manifest
│ └── citations/ # GEXF/CSV/JSON networks and WOS export
├── plots/topic_map.html
└── logs/runtime.log
- data/search|export|translated|tokenized|and/: run-local stage boundaries used by --from-run variants
- data/dataset/publications.parquet: cleaned, translated, topic-labeled publications, with disambiguated authors when AND is enabled
- data/dataset/references.parquet: normalized cited-reference metadata, with disambiguated authors when AND is enabled
- data/dataset/topic_info.parquet: one row per topic with labels, counts, and representation fields
- plots/topic_map.html: interactive topic visualization (open in any browser), using datamapplot
- data/citations/*.gexf: direct citation, co-citation, bibliographic coupling, author co-citation
- data/citations/download_wos_export.txt: Web of Science format for e.g. CiteSpace / VOSviewer
- run_summary.yaml: full run metadata, stage status, and optional variant provenance
- data/dataset/dataset_manifest.json: artifact hashes plus bundle-cleaning provenance
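The layout above is regular enough to address from a script. The sketch below builds the paths of the main artifacts for one run. `run_artifacts` is a hypothetical helper, not an ads-bib function; the relative paths are the ones listed in this section.

```python
from pathlib import Path

# Relative paths taken from the run layout described above.
ARTIFACTS = {
    "publications": "data/dataset/publications.parquet",
    "references": "data/dataset/references.parquet",
    "topic_info": "data/dataset/topic_info.parquet",
    "manifest": "data/dataset/dataset_manifest.json",
    "topic_map": "plots/topic_map.html",
    "summary": "run_summary.yaml",
}


def run_artifacts(project_root, run_id):
    """Map artifact names to their paths under runs/<run_id>/."""
    run_dir = Path(project_root) / "runs" / run_id
    return {name: run_dir / rel for name, rel in ARTIFACTS.items()}


paths = run_artifacts(".", "run_20260407_120000_ads_bib_openrouter")
for name, path in paths.items():
    # exists() is False until the run has actually written the file.
    print(f"{name:12s} {path} exists={path.exists()}")
```

If pandas and a parquet engine are installed, `paths["publications"]` can be passed straight to `pandas.read_parquet`.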
Topic map output from author:"Hawking, S*" in datamapplot.
Author co-citation output from author:"Hawking, S*" in Gephi Lite.
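Before opening a network in Gephi, a quick sanity check is to count the nodes and edges in one of the GEXF files under data/citations/. The sketch below uses only the standard library and ignores the XML namespace, so it should not depend on which GEXF version the file declares. `gexf_size` is a hypothetical helper, and the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace, e.g. '{http://gexf.net/1.3}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def gexf_size(data):
    """Return (node_count, edge_count) for GEXF content given as str or bytes."""
    root = ET.fromstring(data)
    nodes = sum(1 for el in root.iter() if _local(el.tag) == "node")
    edges = sum(1 for el in root.iter() if _local(el.tag) == "edge")
    return nodes, edges


# Made-up two-node graph, only to exercise the function.
SAMPLE = (
    '<gexf xmlns="http://gexf.net/1.3" version="1.3">'
    '<graph defaultedgetype="directed">'
    '<nodes><node id="a" label="A"/><node id="b" label="B"/></nodes>'
    '<edges><edge id="0" source="a" target="b"/></edges>'
    "</graph></gexf>"
)
print(gexf_size(SAMPLE))  # (2, 1)
```

For a real run, pass the file contents, for example `Path(gexf_path).read_bytes()`, where `gexf_path` points at one of the data/citations/*.gexf files.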