Pipeline for querying NASA ADS publication metadata and turning it into curated, analysis-ready datasets, topic maps, and citation networks.
ads-bib
ads-bib takes a NASA ADS search query and produces a normalized, curated dataset with disambiguated author names (author name disambiguation, AND, via ads-and), topic models (via BERTopic or Toponymy), and citation networks ready for tools such as Gephi, CiteSpace, or VOSviewer. The pipeline runs locally or via API.
Installation
Use uv and Python 3.12.
uv pip install ads-bib
# or: pip install ads-bib
Quick Start
Create a .env file in your project root with the relevant API keys.
ADS_TOKEN=your-ads-token # required
OPENROUTER_API_KEY=your-key # only for the openrouter road
HF_TOKEN=your-key # only for the huggingface road
MODAL_TOKEN_ID=your-modal-id # only for AND with backend=modal
MODAL_TOKEN_SECRET=your-modal-secret
Where to get the keys: ADS user token settings | OpenRouter Keys | Hugging Face Access Tokens | Modal.
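Before a first run it can help to confirm that the .env file holds the keys the chosen road needs. The following is a minimal, stdlib-only sketch and not part of ads-bib. The helper names `parse_env` and `missing_keys` are hypothetical, the grouping of keys by road is read off the comments in the example above, and `hf_api` is assumed to be the preset name for the Hugging Face road.

```python
# Keys from the .env example; the mapping to roads is this sketch's assumption.
REQUIRED_ALWAYS = ("ADS_TOKEN",)
ROAD_KEYS = {
    "openrouter": ("OPENROUTER_API_KEY",),
    "hf_api": ("HF_TOKEN",),  # assumed preset name for the Hugging Face road
}
MODAL_KEYS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blank lines and # comments.

    Simplification: everything after a '#' is treated as a comment,
    so values that contain '#' are not supported.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def missing_keys(env, road, modal_and=False):
    """Return the keys that are absent or empty for the given road."""
    needed = list(REQUIRED_ALWAYS) + list(ROAD_KEYS.get(road, ()))
    if modal_and:  # AND with backend=modal needs both Modal credentials
        needed += MODAL_KEYS
    return [key for key in needed if not env.get(key)]
```

Calling `missing_keys(parse_env(open(".env").read()), "openrouter")` returns an empty list when everything needed for the openrouter road is set.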
Then run in your terminal:
ads-bib run --preset openrouter --set search.query='author:"Hawking, S*"'
Author name disambiguation is off by default. Enable the local CPU/GPU path
with --set author_disambiguation.enabled=true; use
--set author_disambiguation.backend=modal only when your Modal credentials are
configured.
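When runs are launched from scripts, the `--set` overrides can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (`--preset`, `--set search.query=...`, `--set author_disambiguation.enabled=true`). The helper `build_run_command` is hypothetical and not part of ads-bib.

```python
import shlex


def build_run_command(preset, query, overrides=None):
    """Build the argv list for `ads-bib run` with optional --set overrides."""
    argv = ["ads-bib", "run", "--preset", preset, "--set", f"search.query={query}"]
    for key, value in (overrides or {}).items():
        argv += ["--set", f"{key}={value}"]
    return argv


argv = build_run_command(
    "openrouter",
    'author:"Hawking, S*"',
    {"author_disambiguation.enabled": "true"},
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` avoids shell quoting issues with the embedded double quotes in the query.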
Full setup details: Get Started | Runtime Roads
Python API
import ads_bib
ads_bib.run(
    preset="openrouter",
    query='author:"Hawking, S*"',
)
More examples and the NotebookSession interface: Python API docs
Pick a Runtime Road
| Road | Hardware | Network | Cost |
|---|---|---|---|
| openrouter | any | API calls | pay-per-token |
| hf_api | any | API calls | HF-plan-dependent |
| local_cpu | CPU only | model downloads only | free after setup |
| local_gpu | NVIDIA + CUDA | model downloads only | free after setup |
Full provider matrix and first-run behavior: Runtime Roads
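The table above can be read as two constraints: whether a CUDA-capable NVIDIA GPU is available, and whether API calls (and their cost) are acceptable. The sketch below encodes just those two columns and filters the roads accordingly. The function `candidate_roads` is hypothetical, and it is an assumption that each road name is also a valid `--preset` value (only `openrouter` is shown as one above).

```python
# Hardware and network columns of the road table, encoded as flags.
ROADS = {
    "openrouter": {"needs_cuda": False, "uses_api": True},
    "hf_api": {"needs_cuda": False, "uses_api": True},
    "local_cpu": {"needs_cuda": False, "uses_api": False},
    "local_gpu": {"needs_cuda": True, "uses_api": False},
}


def candidate_roads(has_nvidia_cuda, allow_api_calls):
    """Return the roads compatible with the given hardware and network limits."""
    return [
        name
        for name, road in ROADS.items()
        if (has_nvidia_cuda or not road["needs_cuda"])
        and (allow_api_calls or not road["uses_api"])
    ]
```

For example, a laptop without a GPU and with no API budget is left with `local_cpu` only.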
Output
Each run produces a self-contained output directory:
- publications.parquet: cleaned, translated, topic-labeled publications, with disambiguated authors when AND is enabled
- references.parquet: normalized cited-reference metadata, with disambiguated authors when AND is enabled
- topic_info.parquet: one row per topic with labels, counts, and representation fields
- topic_map.html: interactive topic visualization (open in any browser), using datamapplot
- .gexf citation networks: direct citation, co-citation, bibliographic coupling, author co-citation
- download_wos_export.txt: Web of Science format for e.g. CiteSpace / VOSviewer
- run_summary.yaml: full run metadata and stage timings
Topic map output from author:"Hawking, S*" in datamapplot.
Author co-citation output from author:"Hawking, S*" in Gephi Lite.
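A quick way to confirm that a run finished is to check the output directory for the files listed above. This is a minimal stdlib-only sketch, not part of ads-bib. The helper `check_run_dir` is hypothetical, and it assumes the named files sit at the top level of the run directory. Since the names of the .gexf files are not stated here, they are found by a recursive search.

```python
from pathlib import Path

# File names taken from the output list; their location is an assumption.
EXPECTED_FILES = (
    "publications.parquet",
    "references.parquet",
    "topic_info.parquet",
    "topic_map.html",
    "download_wos_export.txt",
    "run_summary.yaml",
)


def check_run_dir(run_dir):
    """Return ({file name: exists?}, sorted list of .gexf file names)."""
    root = Path(run_dir)
    present = {name: (root / name).is_file() for name in EXPECTED_FILES}
    gexf_files = sorted(path.name for path in root.rglob("*.gexf"))
    return present, gexf_files
```

Any `False` value in the first result points to a stage that did not write its output. The second result lists the citation networks that can be opened in Gephi.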