Pipeline for querying NASA ADS publication metadata and turning it into curated, analysis-ready datasets, topic maps, and citation networks.

ads-bib

Python 3.12 | License: MIT | Docs | Open in Colab

ads-bib takes a NASA ADS search query and produces a normalized, curated dataset with disambiguated author names (author name disambiguation, AND, via ads-and), topic models (via BERTopic or Toponymy), and citation networks ready for tools such as Gephi, CiteSpace, or VOSviewer. The pipeline can run locally or through API providers.

Installation

Use uv and Python 3.12.

uv pip install ads-bib
# or: pip install ads-bib

Quick Start

Create a .env file in your project root with the relevant API keys.

ADS_TOKEN=your-ads-token           # required
OPENROUTER_API_KEY=your-key        # only for the openrouter road
HF_TOKEN=your-key                  # only for the huggingface road
MODAL_TOKEN_ID=your-modal-id       # only for AND with backend=modal
MODAL_TOKEN_SECRET=your-modal-secret

Where to get the keys: ADS user token settings | OpenRouter Keys | Hugging Face Access Tokens | Modal.
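
Before a first run it can help to confirm that the keys you need are actually set. The following is a minimal sketch using only the standard library; it is not how ads-bib itself loads .env. The helper names (read_env_file, missing_keys) and the grouping of optional keys are assumptions made for illustration, based on the comments in the .env example.

```python
import os
from pathlib import Path

# Variable names come from the .env example; the grouping is an assumption
# that mirrors the comments there.
REQUIRED = ["ADS_TOKEN"]
OPTIONAL = {
    "openrouter": ["OPENROUTER_API_KEY"],
    "huggingface": ["HF_TOKEN"],
    "modal": ["MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET"],
}


def read_env_file(path):
    """Parse simple KEY=value lines, ignoring blank lines and '#' comments.

    Assumes values do not themselves contain a '#' character.
    """
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env, extras=()):
    """Return required (and selected optional) keys that are unset or empty."""
    wanted = list(REQUIRED)
    for name in extras:
        wanted.extend(OPTIONAL[name])
    return [key for key in wanted if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    # Fall back to the process environment when no .env file is present.
    env = read_env_file(env_path) if env_path.exists() else dict(os.environ)
    print("missing:", missing_keys(env, extras=["openrouter"]))
```

An empty list means every key needed for the chosen road is present; pass extras=["modal"] to also check the Modal credentials.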

Then run in your terminal:

ads-bib run --preset openrouter --set search.query='author:"Hawking, S*"'

Author name disambiguation is off by default. Enable the local CPU/GPU path with --set author_disambiguation.enabled=true; use --set author_disambiguation.backend=modal only when your Modal credentials are configured.
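
If you launch runs from a script, the nested quotes in the ADS query are easy to get wrong. A small sketch, assuming you build the command as an argument list: build_run_command is a hypothetical helper, and it uses only the flags and config keys shown above.

```python
import shlex


def build_run_command(preset, query, enable_and=False, and_backend=None):
    """Assemble an `ads-bib run` argument list from the flags shown above."""
    argv = ["ads-bib", "run", "--preset", preset, "--set", f"search.query={query}"]
    if enable_and:
        argv += ["--set", "author_disambiguation.enabled=true"]
    if and_backend is not None:
        argv += ["--set", f"author_disambiguation.backend={and_backend}"]
    return argv


argv = build_run_command("openrouter", 'author:"Hawking, S*"', enable_and=True)

# shlex.join renders a shell-safe command line for logging or copy-paste.
print(shlex.join(argv))

# To execute instead of print (requires ads-bib to be installed):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing a list to subprocess.run avoids shell quoting entirely, so the double quotes and the asterisk in the query reach ads-bib unchanged.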

Full setup details: Get Started | Runtime Roads

Iterate From a Previous Run

Every run writes config_used.yaml and reusable stage artifacts. To try one change without repeating the whole pipeline, start a variant from that run:

ads-bib run --from-run run_20260407_120000_ads_bib_openrouter \
  --set topic_model.embedding_model=google/gemini-embedding-001

ads-bib loads the previous config, applies the override, chooses the earliest stage that needs recomputation, and writes a new run folder with a variant block in run_summary.yaml. Preview the reuse plan first with --dry-run.
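
To script this loop, you might pick the most recent run folder and build the variant command from it. The sketch below is an illustration, not part of ads-bib: latest_run_id and variant_command are hypothetical helpers, the run-id pattern is inferred from the single example id above, and appending --dry-run to the same command is an assumption about where that flag goes.

```python
import re
from pathlib import Path

# Assumed from the example id above: run_<YYYYMMDD>_<HHMMSS>_<suffix>.
RUN_ID = re.compile(r"^run_(\d{8})_(\d{6})_")


def latest_run_id(project_root="."):
    """Return the newest run id under runs/, judged by the timestamp in its name."""
    runs_dir = Path(project_root) / "runs"
    candidates = []
    if runs_dir.is_dir():
        for entry in runs_dir.iterdir():
            match = RUN_ID.match(entry.name)
            # Only count folders that actually recorded their config.
            if match and (entry / "config_used.yaml").is_file():
                candidates.append((match.group(1) + match.group(2), entry.name))
    if not candidates:
        return None
    return max(candidates)[1]


def variant_command(run_id, overrides, dry_run=True):
    """Build an `ads-bib run --from-run` argument list with --set overrides."""
    argv = ["ads-bib", "run", "--from-run", run_id]
    for key, value in overrides.items():
        argv += ["--set", f"{key}={value}"]
    if dry_run:
        argv.append("--dry-run")  # assumed placement of the preview flag
    return argv
```

Starting with dry_run=True keeps the first invocation to a preview of the reuse plan; switch it off once the plan looks right.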

Python API

import ads_bib

ads_bib.run(
    preset="openrouter",
    query='author:"Hawking, S*"',
)

More examples and the NotebookSession interface: Python API docs

Pick a Runtime Road

Road        Hardware        Network                Cost
openrouter  any             API calls              pay-per-token
hf_api      any             API calls              HF-plan-dependent
local_cpu   CPU only        model downloads only   free after setup
local_gpu   NVIDIA + CUDA   model downloads only   free after setup

Full provider matrix and first-run behavior: Runtime Roads
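
The table can be read as a simple decision rule. The following sketch encodes it under stated assumptions: suggest_road is a hypothetical helper, the GPU check is only a heuristic (finding nvidia-smi on PATH does not prove CUDA works), and only openrouter is confirmed above as a --preset value, so treat the other road names as labels from the table.

```python
import shutil

API_ROADS = ("openrouter", "hf_api")


def suggest_road(allow_api=True, has_nvidia_gpu=None, prefer="openrouter"):
    """Suggest a road name from the table based on network and hardware."""
    if prefer not in API_ROADS:
        raise ValueError(f"prefer must be one of {API_ROADS}")
    if allow_api:
        # API roads run on any hardware; cost depends on the provider.
        return prefer
    if has_nvidia_gpu is None:
        # Heuristic only: nvidia-smi on PATH usually means an NVIDIA driver
        # is installed, but it does not verify a working CUDA setup.
        has_nvidia_gpu = shutil.which("nvidia-smi") is not None
    return "local_gpu" if has_nvidia_gpu else "local_cpu"
```

For example, suggest_road(allow_api=False, has_nvidia_gpu=False) returns "local_cpu", the row that needs only a CPU and model downloads.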

Output

Each project folder keeps shared caches under data/cache/ and writes every run under runs/<run_id>/:

runs/<run_id>/
├── config_used.yaml
├── run_summary.yaml
├── data/
│   ├── search/        # run-local ADS search result used for export variants
│   ├── export/        # pre-translation publications and references
│   ├── translated/    # translated publications and references
│   ├── tokenized/     # tokenized publications and references
│   ├── and/           # disambiguated frames plus optional ads-and diagnostics
│   ├── dataset/       # final publications, references, topic_info, manifest
│   └── citations/     # GEXF/CSV/JSON networks and WOS export
├── plots/topic_map.html
└── logs/runtime.log
  • data/search|export|translated|tokenized|and/ — run-local stage boundaries used by --from-run variants
  • data/dataset/publications.parquet — cleaned, translated, topic-labeled publications, with disambiguated authors when AND is enabled
  • data/dataset/references.parquet — normalized cited-reference metadata, with disambiguated authors when AND is enabled
  • data/dataset/topic_info.parquet — one row per topic with labels, counts, and representation fields
  • plots/topic_map.html — interactive topic visualization (open in any browser), using datamapplot
  • data/citations/*.gexf — direct citation, co-citation, bibliographic coupling, author co-citation
  • data/citations/download_wos_export.txt — Web of Science format for e.g. CiteSpace / VOSviewer
  • run_summary.yaml — full run metadata, stage status, and optional variant provenance
  • data/dataset/dataset_manifest.json — artifact hashes plus bundle-cleaning provenance
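
As a quick sanity check on a finished run, you can count nodes and edges in the exported networks without opening Gephi. GEXF is XML, so this sketch uses only the standard library. The function names are hypothetical, and the individual .gexf file names are not assumed: the code globs whatever is in data/citations/.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip an XML namespace prefix: '{uri}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def gexf_summary(path):
    """Count <node> and <edge> elements in one GEXF file."""
    counts = {"node": 0, "edge": 0}
    for element in ET.parse(path).iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts


def summarize_networks(run_dir):
    """Map each data/citations/*.gexf file name to its node and edge counts."""
    citations = Path(run_dir) / "data" / "citations"
    return {p.name: gexf_summary(p) for p in sorted(citations.glob("*.gexf"))}
```

Calling summarize_networks("runs/<run_id>") with a real run id returns one entry per exported network. Matching on local tag names keeps the check independent of the GEXF schema version the files declare.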

Figure: Interactive topic map output from author:"Hawking, S*" in datamapplot.

Figure: Author co-citation network output from author:"Hawking, S*" in Gephi Lite.

Download files

Source Distribution

ads_bib-0.2.0.tar.gz (423.7 kB view details)

Uploaded Source

Built Distribution

ads_bib-0.2.0-py3-none-any.whl (175.1 kB view details)

Uploaded Python 3

File details

Details for the file ads_bib-0.2.0.tar.gz.

File metadata

  • Download URL: ads_bib-0.2.0.tar.gz
  • Size: 423.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ads_bib-0.2.0.tar.gz
Algorithm Hash digest
SHA256 7b91901a1799d633b04d758bcc89c3c64fb80f40712e0c847f353fd5564a687a
MD5 acdb8e1606088cb565ba301f91a82a6f
BLAKE2b-256 a9d1e6da919e3323bce9d0b9b7648246404e0348070591637e273f12dcea1a5c

Provenance

The following attestation bundles were made for ads_bib-0.2.0.tar.gz:

Publisher: release.yml on raphschlatt/ads-bib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ads_bib-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ads_bib-0.2.0-py3-none-any.whl
  • Size: 175.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ads_bib-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0c28d8d0cbe1be637ac62368656dc772877b0eb89c1e9ccdc5b45c92a8ac9ca8
MD5 a5891c1b568c5fcefab0b28d72ce9253
BLAKE2b-256 b98dbe48259548ecd9d2f72b5650ca345117de66aa78c02a0fde7269be4aef85

Provenance

The following attestation bundles were made for ads_bib-0.2.0-py3-none-any.whl:

Publisher: release.yml on raphschlatt/ads-bib

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
