Fetch publication and book metadata + PDFs from open sources (OpenAlex, Crossref, Semantic Scholar, arXiv, Unpaywall, Open Library, Google Books, BnF) and return normalised JSON

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

alice-vcoeur

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering
- Text Processing :: Markup

Project description

quelle

quelle is a local CLI that fetches publication metadata and PDFs from open sources (OpenAlex, Crossref, Semantic Scholar, arXiv, Unpaywall, Open Library, Google Books, BnF) and returns them as normalised JSON. Handles both academic articles and books. Designed as a composable building block — feed the output into any note-taking system, reference manager, or research workflow.

The name is German for source — in academic German, "Quelle:" is the word that introduces a bibliographic reference, and fetching from open sources is exactly what the tool does.

What it does

Given a publication identifier (DOI, arXiv id, ISBN-10/13) or a free-text title, quelle fetch returns a normalised JSON blob with title, authors, year, venue or publisher, DOI or ISBN, abstract or description, citation count and (optionally) a downloaded local PDF. It walks a fallback chain of free open sources, picking the right ones based on the shape of the query:

For academic articles (DOI, arXiv id, paper title):

Source	Role	Rate limit
OpenAlex	Primary metadata + OA PDF URL	~100k/day on the polite pool; the keyed tier is credit-metered
Crossref	DOI-authoritative fallback (abstract, journal block)	polite pool (no hard cap)
Semantic Scholar	Citation graph + metadata fallback	5000 / 5 min unauth
arXiv	Preprint metadata + direct PDFs	1 req / 3s (enforced)
Unpaywall	DOI → OA PDF lookup	100k / day

For books (ISBN-10, ISBN-13, book title):

Source	Role	Rate limit
Open Library	Primary book metadata, broad ISBN coverage	no published cap; be polite
Google Books	Metadata fallback, public-domain PDFs	1k / day per IP unauth
OpenAlex (books)	Cross-reference for academic monographs	as above
BnF SRU	Strong on French-language books	no published cap
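Below is a sketch of the query-shape dispatch described above. The regexes and the QueryKind / classify / source_chain names are illustrative assumptions, not quelle's own heuristics. Per the layout, those live in cli/_helpers.py. The two metadata fallback orders are copied from the Status section.

```python
import re
from enum import Enum


class QueryKind(Enum):
    DOI = "doi"
    ARXIV = "arxiv"
    ISBN = "isbn"
    TITLE = "title"


# Illustrative patterns only; quelle's own heuristics may differ.
DOI_RE = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
ARXIV_RE = re.compile(r"^(?:arxiv:\s*)?(\d{4}\.\d{4,5}(?:v\d+)?)$", re.IGNORECASE)
ISBN_PREFIX_RE = re.compile(r"^isbn(?:-1[03])?:?", re.IGNORECASE)


def classify(query: str) -> tuple[QueryKind, str]:
    """Return the query kind plus a normalised identifier (or the stripped title)."""
    q = query.strip()
    if m := DOI_RE.match(q):
        return QueryKind.DOI, m.group(1).lower()  # DOIs are case-insensitive
    if m := ARXIV_RE.match(q):
        return QueryKind.ARXIV, m.group(1)
    # Tolerate an "ISBN:" prefix, spaces and hyphens before checking the digit count.
    candidate = re.sub(r"[\s-]", "", ISBN_PREFIX_RE.sub("", q)).upper()
    if re.fullmatch(r"\d{9}[\dX]|\d{13}", candidate):
        return QueryKind.ISBN, candidate
    return QueryKind.TITLE, q


def source_chain(kind: QueryKind) -> list[str]:
    """Metadata fallback order, as listed in the Status section."""
    if kind is QueryKind.ISBN:
        return ["open_library", "google_books", "bnf", "openalex"]
    return ["openalex", "crossref", "semantic_scholar"]
```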

Google Scholar URLs are not supported: Scholar has no public API and its Terms of Service prohibit automated access. If you only have a Scholar link, open the page, copy the paper title, and feed that to quelle fetch as a free-text query — OpenAlex and Crossref cover almost every paper with a DOI.

Stack

Python 3.12+, uv-managed. Typer (CLI) + httpx (sync HTTP) + stdlib sqlite3 (cache) + rich + environs + platformdirs + pytest + pytest-httpx. No GUI, no ORM, no async.

Installation

Install from PyPI:

pipx install quelle
# or: uv tool install quelle

Both install quelle into its own isolated venv and put it on your $PATH. The first invocation of any command creates the config, data, and cache directories. To seed a default .env and open it in your editor:

quelle config edit        # creates the .env on first run, then opens it in $EDITOR

Development from a source checkout

git clone https://github.com/vcoeur/quelle.git
cd quelle
make dev-install          # uv sync --all-groups
make test                 # pytest
make lint                 # ruff check + format --check
make format               # ruff --fix + format
uv run quelle --help      # run the CLI straight from the repo

When run from the repo, quelle picks up the .env at the repo root and stores dev cache / PDFs under a repo-local .dev-state/ instead of polluting your installed user data.

Configuration

quelle follows each OS's standard "config dir + data dir + cache dir" layout via platformdirs:

Role	Linux (XDG)	macOS	Windows
Config (`.env`)	`~/.config/quelle/`	`~/Library/Application Support/quelle/`	`%APPDATA%\quelle\`
Data (downloaded PDFs)	`~/.local/share/quelle/`	`~/Library/Application Support/quelle/`	`%LOCALAPPDATA%\quelle\`
Cache (sqlite index)	`~/.cache/quelle/`	`~/Library/Caches/quelle/`	`%LOCALAPPDATA%\quelle\Cache\`

Any of the three can be overridden via env vars — useful for tests, Docker, or custom deployments:

export QUELLE_CONFIG_DIR=/etc/quelle
export QUELLE_DATA_DIR=/srv/quelle/data
export QUELLE_CACHE_DIR=/var/cache/quelle
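Below is a sketch of how an override variable might take precedence over the platform default. It assumes an empty or whitespace-only value counts as unset. resolve_dir is a hypothetical helper; quelle's real logic lives in paths.py.

```python
import os
from collections.abc import Callable, Mapping
from pathlib import Path


def resolve_dir(
    env_var: str,
    default: Callable[[], str],
    environ: Mapping[str, str] = os.environ,
) -> Path:
    """Use the override in `env_var` when it is set and non-empty, else the platform default."""
    override = environ.get(env_var, "").strip()
    if override:
        return Path(override).expanduser()
    return Path(default())


# In a real setup the default would come from platformdirs, e.g.
#   resolve_dir("QUELLE_CACHE_DIR", lambda: platformdirs.user_cache_dir("quelle"))
```

The default is a callable rather than a value, so the platform lookup only runs when no override is present.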

Inspect the resolved paths and effective config at any time:

quelle config             # all values including resolved paths and redacted API key
quelle --json config      # same payload as JSON, scriptable

The only variable worth setting by default is QUELLE_CONTACT_EMAIL — it goes into the User-Agent header and enrolls you in the Crossref / OpenAlex polite pool. See .env.example for the full list.

Dev mode: when you run quelle from a source checkout (uv run quelle … inside the repo), the .env at the repo root is still picked up. Downloaded PDFs and the cache go into a repo-local .dev-state/ directory instead, so your installed user data stays clean.

Universal `resolve` + CiteKey

Beyond academic fetch, quelle resolve accepts any source:

- a DOI, ISBN or arXiv id;
- a free-text title;
- an http(s) URL (a web page becomes a web record and a video becomes a media record, both built from Open Graph metadata);
- a local .pdf path.

It always returns a normalised Source: the Publication dict plus a top-level x_vcoeur block carrying a vault-ready CiteKey.

quelle --json resolve https://bambulab.com/en/x1          # web page → web Source
quelle --json resolve https://www.youtube.com/watch?v=ID  # video → media Source
quelle --json resolve ./paper.pdf                         # local PDF → metadata + key
quelle --json resolve "attention is all you need"         # free text → academic record
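For http(s) inputs, the web / media record is built from Open Graph metadata. Below is a stdlib-only sketch of that extraction. It assumes that an og:type starting with "video" marks a media record; this is a guess, not quelle's documented rule. OpenGraphParser and classify_page are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            self.og.setdefault(prop[3:], content)  # keep the first occurrence


def classify_page(html: str) -> tuple[str, dict[str, str]]:
    """Return ("media", og) when og:type is video.*, else ("web", og)."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    kind = "media" if parser.og.get("type", "").startswith("video") else "web"
    return kind, parser.og
```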

quelle owns the CiteKey naming convention but stays decoupled from any vault: you inject the set of keys already in use and it disambiguates against them (collision → lowercase suffix a, b, …):

knoten citekeys --json | quelle --json resolve "<input>" --taken-file -
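The collision rule can be sketched as follows. The README only specifies lowercase suffixes a, b, …. What happens after z (here: aa, ab, …) is an assumption, as is the disambiguate helper name. The base key format is not shown because quelle's naming convention is not documented in this section.

```python
from itertools import count, product
from string import ascii_lowercase


def suffixes():
    """Yield a, b, ..., z, then aa, ab, ... (the part after z is an assumption)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def disambiguate(base: str, taken: set[str]) -> str:
    """Return `base` if unused, else the first base+suffix not already taken."""
    if base not in taken:
        return base
    for suffix in suffixes():
        candidate = base + suffix
        if candidate not in taken:
            return candidate
```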

--csl exports a CSL-JSON item instead of the Source. See docs/commands.md for the full resolve / schema / skill reference, and quelle --json schema for the authoritative, never-drifting machine contract.
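CSL-JSON is the citation format read by Zotero, Pandoc and other CSL processors. The sketch below maps a publication dict onto a CSL-JSON item. The input keys (kind, authors, year, venue, doi, …) are hypothetical stand-ins for the real Publication fields, which quelle --json schema documents. The output keys (type, author, issued/date-parts, container-title, DOI, ISBN, URL) are standard CSL variables.

```python
def to_csl_item(pub: dict, citekey: str) -> dict:
    """Map a publication dict (hypothetical field names) onto a CSL-JSON item."""
    csl_type = {"article": "article-journal", "book": "book", "web": "webpage"}.get(
        pub.get("kind", ""), "document"
    )
    item: dict = {"id": citekey, "type": csl_type, "title": pub.get("title", "")}

    # CSL names are split into family/given; fall back to a literal name.
    authors = []
    for author in pub.get("authors", []):
        if author.get("family"):
            authors.append({"family": author["family"], "given": author.get("given", "")})
        elif author.get("name"):
            authors.append({"literal": author["name"]})
    if authors:
        item["author"] = authors

    if pub.get("year"):
        item["issued"] = {"date-parts": [[int(pub["year"])]]}

    # Simple fields copied across under their CSL variable names.
    for source_key, csl_key in (
        ("venue", "container-title"),
        ("publisher", "publisher"),
        ("doi", "DOI"),
        ("isbn", "ISBN"),
        ("abstract", "abstract"),
        ("url", "URL"),
    ):
        if pub.get(source_key):
            item[csl_key] = pub[source_key]
    return item
```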

Usage

# Resolve by DOI (uses OpenAlex + Crossref enrichment by default).
quelle fetch 10.1109/83.902291

# Resolve by arXiv id, with PDF download into the data dir.
quelle fetch 1706.03762 --download-pdf

# Resolve by free-text title (place --json before the subcommand).
quelle --json fetch "The Perceptron: A Probabilistic Model"

# Resolve a book by ISBN-13 (Open Library primary, Google Books / OpenAlex / BnF fallback).
quelle fetch 9782070407132

# Resolve a book by ISBN-10, hyphens and `ISBN:` prefix tolerated.
quelle fetch "ISBN: 0-14-018633-6"

# Bypass the local cache and force network.
quelle fetch 10.xxxx/yyyy --no-cache

# Search across multiple open sources, then resolve the chosen hit.
quelle search "attention is all you need"
quelle search "etranger, camus" --book               # comma splits title from author hint
quelle --json search "transformer" --limit 5 --source openalex --source arxiv

# Inspect the cache (`list` includes a header with the total / last upsert / schema).
quelle cache list --limit 20
quelle cache show 10.1109/83.902291
quelle cache show 9782070407132
quelle cache clear --yes
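The ISBN examples above tolerate hyphens and an ISBN: prefix. Below is a sketch of that normalisation plus check-digit validation. It uses the standard ISBN-10 rule (weights 10 to 1, sum divisible by 11) and the ISBN-13 rule (weights 1/3, mod 10). Converting everything to ISBN-13 is an illustrative choice, not necessarily what quelle stores. normalise_isbn is a hypothetical name.

```python
import re


def normalise_isbn(raw: str) -> str:
    """Strip an 'ISBN:' prefix, spaces and hyphens, validate, and return an ISBN-13."""
    s = re.sub(r"^\s*isbn(?:-1[03])?:?", "", raw, flags=re.IGNORECASE)
    s = re.sub(r"[\s-]", "", s).upper()
    if re.fullmatch(r"\d{9}[\dX]", s):
        # ISBN-10: weights 10..1, the weighted sum must be divisible by 11 ('X' = 10).
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
        if total % 11:
            raise ValueError(f"bad ISBN-10 check digit: {raw!r}")
        body = "978" + s[:9]  # drop the old check digit, prefix with 978
        return body + _isbn13_check(body)
    if re.fullmatch(r"\d{13}", s):
        if _isbn13_check(s[:12]) != s[12]:
            raise ValueError(f"bad ISBN-13 check digit: {raw!r}")
        return s
    raise ValueError(f"not an ISBN: {raw!r}")


def _isbn13_check(first12: str) -> str:
    """ISBN-13 check digit: alternate weights 1 and 3, then round up to a multiple of 10."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)
```

For example, the ISBN-10 0-14-018633-6 becomes 9780140186338. A check-digit failure raises ValueError rather than silently querying upstreams with a mistyped number.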

Claude Code skill

quelle ships a convention-free agent skill as package data and installs it for you:

quelle skill install --user        # -> ~/.config/agents/skills/quelle/SKILL.md
quelle skill install --project     # -> <cwd>/.agents/skills/quelle/SKILL.md
quelle skill install --claude      # -> ~/.claude/skills/quelle/SKILL.md
quelle skill status                # where it is installed + whether it matches the bundled copy

Because the skill is bundled with the wheel, it updates in lockstep with the CLI. It documents quelle's CLI contract — resolve / fetch / search / schema, the Source shape + x_vcoeur, CiteKey minting via --taken-file, the --csl export, and exit codes — and points readers to quelle --json schema as the authoritative contract. It deliberately encodes no vault conventions; layer those in a separate skill that references this one. (A minimal standalone example also ships at the repo root in SKILL.md.)

Layout

quelle/
  models/        <- Publication, Author (pure dataclasses)
  repositories/
    cache.py           <- SQLite cache; row identity = identifiers (DOI / arXiv / ISBN / OpenAlex id), title as last-resort lookup
    errors.py          <- Error hierarchy -> exit codes 1/2/3/4 (CLI usage errors exit 64)
    http_client.py     <- httpx + polite User-Agent
    pdf_downloader.py  <- Streaming PDF download with content-type + size checks
    sources/           <- One module per source: openalex, crossref, semantic_scholar,
                          arxiv, unpaywall, open_library, google_books, bnf
  services/
    resolver.py         <- Source orchestration + enrichment chain + cache lookup
    pdf_resolver.py     <- Lazy PDF fallback chain
  cli/
    main.py             <- Typer app (fetch, search, cache, root --version + --json flags)
    config.py           <- bare `config` (callback) + `config edit` (seeds + opens .env)
    _helpers.py         <- heuristics, error reporting, dataclass→dict flatteners
    output.py           <- JSON vs rich TTY rendering
  paths.py              <- platformdirs resolution (config / data / cache)
  migrate.py            <- One-shot migration from the legacy config/cache layout
  settings.py           <- environs-layered config
tests/

Layer rules: imports only go downward. Models import nothing from this project. Repositories import models. Services import models + repositories. CLI is the wiring layer.
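The errors.py entry in the layout describes an error hierarchy mapped to exit codes 1/2/3/4. CLI usage errors exit 64, the sysexits.h EX_USAGE value. Below is a minimal sketch of that pattern. UserError is the only class name the README mentions; the other class names and the code assigned to each class are invented for illustration.

```python
import sys
from collections.abc import Callable


class QuelleError(Exception):
    """Hypothetical base class: each subclass carries the exit code it maps to."""

    exit_code = 1


class UserError(QuelleError):
    """Bad input the user can fix (the README uses this name for rejected Scholar URLs)."""

    exit_code = 2  # assumed code


class NotFoundError(QuelleError):
    """Hypothetical: no source returned a record."""

    exit_code = 3  # assumed code


class UpstreamError(QuelleError):
    """Hypothetical: an upstream API failed or rate-limited the request."""

    exit_code = 4  # assumed code


def run(command: Callable[[], object]) -> int:
    """Run one CLI command and translate known errors into exit codes."""
    try:
        command()
    except QuelleError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return exc.exit_code
    return 0
```

Keeping the code on the exception class means the CLI wiring layer needs a single except clause. The lower layers never call sys.exit themselves, which fits the downward-only import rule.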

Status

All eight open-API sources are wired up with a merge-logic enrichment chain.

- Article identifiers (DOI / arXiv id / title) walk OpenAlex → Crossref → Semantic Scholar.
- Book identifiers (ISBN-10 / ISBN-13) walk Open Library → Google Books → BnF → OpenAlex.
- The SQLite cache is keyed by DOI / arXiv id / OpenAlex id / ISBN-10 / ISBN-13 / title, so a second query for the same record is served offline.
- The PDF download chain (OpenAlex → arXiv → Unpaywall) only fires for articles and OA / public-domain books. In-copyright books are intentionally skipped even with --download-pdf set.

quelle resolve extends this to any source (web pages, videos, and local PDFs) and mints a vault-ready CiteKey for each. The web / media kinds and the CiteKey convention live in quelle/services/citekey.py.

Usage and terms

This tool is intended for personal and academic research use. It queries free, public APIs on your behalf. You are responsible for complying with each upstream's terms of service — the MIT licence on this repo covers the code of this tool, not the data you fetch through it.

Not supported use cases:

Bulk scraping / batch ingestion of many records. Most upstreams publish free database snapshots; use those instead of hammering the live API.
Rehosting downloaded PDFs on a public server. The --download-pdf flag writes to a local cache on your machine — that is fine. Re-serving arXiv PDFs, publisher PDFs, or full text from your own infrastructure is not (see arXiv and Semantic Scholar rows below).
Downloading in-copyright books. --download-pdf will only follow pdf_url when one is advertised, and the only book sources that publish a pdf_url are public-domain editions (e.g. Google Books FULL_PUBLIC_DOMAIN). The tool does not attempt Library Genesis lookups, publisher scraping, or any in-copyright PDF resolution for books. Do not work around this: most books are still in copyright, and downloading them without permission is infringement in nearly every jurisdiction.
Commercial repackaging of the JSON output as a paid product. Individual commercial use of the metadata is generally allowed by the underlying licences, but Semantic Scholar in particular requires attribution and some S2 records are CC BY-NC. Book descriptions returned by Google Books may be publisher-supplied and remain copyrighted independently of the JSON envelope around them — treat the abstract field for books as quotable but not redistributable.

Per-source summary:

Source	Data licence	Rate limit	Attribution	Notes
OpenAlex	CC0 — "OpenAlex data is and will remain available at no cost"	~100k / day on the polite pool; single-entity lookups unlimited	not required	Provide an email via `QUELLE_CONTACT_EMAIL` for the polite pool, or set `OPENALEX_API_KEY` for the new key-based tier (OpenAlex announced in January 2026 that key authentication is replacing the mailto polite pool; the tool supports both).
Crossref REST	CC0 for metadata — "almost none of the metadata is subject to copyright, and you may use it for any purpose". Some abstracts may remain copyrighted.	No hard cap; the polite pool is requested via your `mailto=` / User-Agent	not required, but recommended	Commercial users who need SLAs should subscribe to Metadata Plus directly with Crossref.
arXiv API	Metadata CC0. PDFs retain their authors' / arXiv's licence.	1 request / 3 seconds (the tool enforces this globally via a module-level lock)	Do not claim arXiv endorses your project.	You may not store and re-serve arXiv e-prints (PDFs, source files, other content) from your own servers unless you have the copyright holder's permission. Downloading for local personal reading is explicitly allowed.
Semantic Scholar	S2 data may be `CC BY-NC` or `ODC-BY` depending on the record. The API itself is provided "AS IS, WITH ALL FAULTS, AND AS AVAILABLE" with no warranty.	Public endpoints need no auth; higher throughput requires a free key from Ai2.	Required — "Licensee will include an attribution to 'Semantic Scholar'", and publications must cite The Semantic Scholar Open Data Platform.	You may not "repackage, sell, rent, lease, lend, distribute, or sublicense the API". This tool is a personal client, not a proxy.
Unpaywall	CC0 data	100k requests / day	not required	The email parameter is mandatory — Unpaywall uses it to contact you if something goes wrong. Don't fake it. For bulk workloads, download the free data snapshot instead of hammering the API.
Open Library	CC0 metadata; cover images CC-BY-SA via Internet Archive.	No published hard cap; honour the platform's general guidance to be considerate (Open Library runs on volunteer-funded Internet Archive infrastructure).	not required	Open Library publishes monthly bulk dumps — use those for any non-trivial volume rather than crawling the live API.
Google Books	Subject to Google's API Terms of Service; metadata may be cached but not redistributed in bulk. Public-domain PDF downloads only.	1 000 requests / day per IP unauth. Set `GOOGLE_BOOKS_API_KEY` for higher quotas.	not required	The Volumes API permits caching for a limited duration but disallows building a competing service from its data.
BnF SRU catalogue	Open public-sector data licence (Etalab 2.0 / similar).	No published cap; the SRU endpoint is shared infrastructure — keep volume reasonable.	requested where reused	Strong on French-language books and serials; coverage of non-French material is patchy.
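The arXiv row says the 1 request / 3 seconds limit is enforced globally via a module-level lock. Below is a sketch of such a throttle, assuming a monotonic clock. Throttle and ARXIV_THROTTLE are hypothetical names. The clock and sleep parameters exist only so the behaviour can be tested without real waiting.

```python
import threading
import time
from collections.abc import Callable


class Throttle:
    """Guarantee at least `min_interval` seconds between successive wait() returns."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: float | None = None

    def wait(self) -> None:
        # Holding the lock while sleeping serialises concurrent callers,
        # so the gap holds across threads, not just within one.
        with self._lock:
            now = self._clock()
            if self._last is not None:
                delay = self._last + self._min_interval - now
                if delay > 0:
                    self._sleep(delay)
                    now = self._clock()
            self._last = now


# Module-level instance: every arXiv call in the process shares one lock and one clock.
ARXIV_THROTTLE = Throttle(3.0)
```

An arXiv client would call ARXIV_THROTTLE.wait() immediately before each request.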

Google Scholar is not supported. Google Scholar has no official API, and Google's Terms of Service prohibit automated access. Passing a Scholar URL to quelle fetch returns a UserError asking you to copy the paper title manually and retry — OpenAlex and Crossref together cover almost every paper with a DOI, so the workaround is usually one extra copy-paste.

No warranty: see the MIT LICENSE — this tool is provided as-is, with no guarantee that its JSON output is correct, complete, or current. Verify critical metadata against the canonical upstream before relying on it.

Licence

MIT — see LICENSE.

Questions or feedback

This is a personal tool — I'm happy to hear from you, but there is no formal support. The best way to reach me is the contact form on vcoeur.com.

Release history

Version	Release date
0.10.0 (this version)	Jun 10, 2026
0.9.2	Jun 7, 2026
0.9.1	Jun 7, 2026
0.9.0	Jun 7, 2026
0.8.2	May 23, 2026
0.8.1	May 9, 2026
0.8.0	May 8, 2026
0.7.0	May 8, 2026
0.6.0	May 8, 2026
0.5.0	May 7, 2026
0.4.1	May 7, 2026
0.4.0	May 7, 2026
0.3.0	May 7, 2026
0.2.0	May 7, 2026
0.1.3	Apr 15, 2026
0.1.2	Apr 15, 2026
0.1.1	Apr 15, 2026
0.1.0	Apr 15, 2026

Download files

Source Distribution

quelle-0.10.0.tar.gz (280.1 kB view details)

Uploaded Jun 10, 2026 Source

Built Distribution

quelle-0.10.0-py3-none-any.whl (100.6 kB view details)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file quelle-0.10.0.tar.gz.

File metadata

Download URL: quelle-0.10.0.tar.gz
Upload date: Jun 10, 2026
Size: 280.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for quelle-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`11f7272f162562920435ca0c4f599d5287cbcc297fb93d93df58c1b32a55688e`
MD5	`49c82e97f45a89a687e662489b224589`
BLAKE2b-256	`5344db7fc9a031c84c91680cc73d2d24eee5b001c92b10b9c0b5b4080f7f031f`

Provenance

The following attestation bundles were made for quelle-0.10.0.tar.gz:

Publisher: release.yml on vcoeur/quelle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quelle-0.10.0.tar.gz
- Subject digest: 11f7272f162562920435ca0c4f599d5287cbcc297fb93d93df58c1b32a55688e
- Sigstore transparency entry: 1772649779
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: vcoeur/quelle@8d57ad519eacd4af2abe2c5efbc5723fb3a293a1
- Branch / Tag: refs/tags/v0.10.0
- Owner: https://github.com/vcoeur
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8d57ad519eacd4af2abe2c5efbc5723fb3a293a1
- Trigger Event: push

File details

Details for the file quelle-0.10.0-py3-none-any.whl.

File metadata

Download URL: quelle-0.10.0-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 100.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for quelle-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9501aa2ba6fccd7dd486eaabf94f199a1a65df6c0bb19b07c148fb9dfd77c296`
MD5	`4990ec2511b3b1e666a4b95fd4c193e4`
BLAKE2b-256	`67758d2e312bc3bfc66b349ddccf72fa7851a7ee58ff861a428f395614d52b52`

Provenance

The following attestation bundles were made for quelle-0.10.0-py3-none-any.whl:

Publisher: release.yml on vcoeur/quelle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: quelle-0.10.0-py3-none-any.whl
- Subject digest: 9501aa2ba6fccd7dd486eaabf94f199a1a65df6c0bb19b07c148fb9dfd77c296
- Sigstore transparency entry: 1772650003
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: vcoeur/quelle@8d57ad519eacd4af2abe2c5efbc5723fb3a293a1
- Branch / Tag: refs/tags/v0.10.0
- Owner: https://github.com/vcoeur
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8d57ad519eacd4af2abe2c5efbc5723fb3a293a1
- Trigger Event: push
