
Audiobook scraper — search and stream from Librivox, LoyalBooks, and more


audiobooker

Search and stream free audiobooks from multiple web sources. Parallel search, fuzzy scoring, a unified AudioBook dataclass, local cache, SQLite index, and a mediavocab Release converter — one API regardless of where the book comes from.

Install

pip install audiobooker

# Optional extras
pip install audiobooker[youtube]   # YouTube channel/playlist sources (tutubo)
pip install audiobooker[stealth]   # curl_cffi TLS-fingerprint transport
pip install audiobooker[test]      # pytest + vcrpy (dev only)

Quick start

from audiobooker import search

for book in search("Lovecraft", max_per_source=5, timeout=30):
    print(f"[{book.score:.2f}] [{book.source}] {book.title}")
    print(f"  authors={[f'{a.first_name} {a.last_name}'.strip() for a in book.authors]}")
    print(f"  streams={len(book.streams)}")

Supported sources

| Source | Site | Catalogue | Native search |
|---|---|---|---|
| Librivox | librivox.org | ~18 000 books | REST API (title, author, narrator, tag) |
| LoyalBooks | loyalbooks.com | ~3 500 books | sitemap + genre pages |
| GoldenAudioBooks | goldenaudiobook.co | ~6 500 books | linear scan |
| StephenKingAudioBooks | stephenkingaudiobooks.com | ~113 books | native site search |
| AudioAnarchy | audioanarchy.org | ~11 books | linear scan |
| DarkerProjects | darkerprojects.com | ~244 episodes | linear scan |
| HPTalesAudioBooks | hpaudiotales.com | ~20 books | linear scan |

YouTube (pip install audiobooker[youtube]):

| Source | Channel | Content |
|---|---|---|
| TheCybrarian | @TheCybrarian | Robert E. Howard fiction |
| HorrorBabble | @HorrorBabble | Horror short fiction |

Python API

from audiobooker import (
    search, search_by_title, search_by_author, search_by_tag, search_by_narrator,
    audiobook_to_release,
    BookIndex, IndexedSource,
    AudioBook, BookAuthor, AudiobookNarrator, AudioBookChapter,
)

# Targeted searches — all run in parallel across all sources
for book in search_by_author("Dickens", max_per_source=5):
    print(book.title)

for book in search_by_tag("horror", max_per_source=5):
    print(book.title)

Per-source

from audiobooker.scrappers.librivox import Librivox

lv = Librivox()
for book in lv.search_by_title("Dracula"):
    print(book.title, book.runtime)

for book in lv.iterate_all():   # full catalogue
    print(book.title)

All scrapers share the same interface: search(), search_by_title(), search_by_author(), search_by_tag(), search_by_narrator(), iterate_all(), iterate_popular(), iterate_by_author(), iterate_by_tag().
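Because every scraper exposes the same methods, calling code can be written against the shared interface rather than a concrete class. A minimal sketch, assuming only that search_by_title() returns an iterable of objects with a title attribute; first_titles and FakeScraper are hypothetical names used for illustration and are not part of audiobooker:

```python
from itertools import islice
from types import SimpleNamespace


def first_titles(scraper, query, limit=3):
    """Return up to `limit` titles from any object with search_by_title()."""
    # islice stops early, so a lazy generator is never fully consumed.
    return [book.title for book in islice(scraper.search_by_title(query), limit)]


class FakeScraper:
    """Hypothetical stand-in for a real scraper such as Librivox."""

    def search_by_title(self, query):
        for n in range(1, 6):
            yield SimpleNamespace(title=f"{query} vol. {n}")


titles = first_titles(FakeScraper(), "Dracula", limit=2)
print(titles)
```

With the package installed, passing Librivox() in place of FakeScraper() should work the same way, provided the real results behave as described above.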

mediavocab integration

mediavocab is a required dependency. audiobook_to_release() projects an AudioBook into the typed mediavocab.Release schema — Work, credits, chapters, external IDs, codec, license.

from audiobooker import search, audiobook_to_release

for book in search("Lovecraft", max_per_source=3):
    release = audiobook_to_release(book)
    lic = release.license
    if lic and lic.is_open():
        print(release.work.title, lic.identifier)

See docs/converters.md for the full field mapping.

HTTP transport

By default every scraper uses a requests.Session with a randomised User-Agent. Two ways to override:

Environment variable — set before any import:

AUDIOBOOKER_TRANSPORT=curl_cffi python myscript.py

Falls back to plain requests if curl_cffi is not installed. Install with pip install audiobooker[stealth].

Per-instance injection — pass any requests-compatible session:

from curl_cffi import requests as cffi_requests
from audiobooker.scrappers.librivox import Librivox

session = cffi_requests.Session(impersonate="chrome")
lv = Librivox(session=session)

default_session() from audiobooker.transport respects AUDIOBOOKER_TRANSPORT and returns the appropriate session type (see audiobooker/transport.py:1).
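The fallback rule for the environment variable can be pictured roughly as follows. This is a sketch of the described behaviour, not the code in audiobooker/transport.py: resolve_transport is a hypothetical helper, the case-insensitive comparison is an assumption, and it returns a transport name instead of building a session so that it runs without either HTTP library installed.

```python
import importlib.util
import os


def resolve_transport(env=None):
    """Return the transport name to use: "curl_cffi" or "requests"."""
    env = os.environ if env is None else env
    wanted = env.get("AUDIOBOOKER_TRANSPORT", "requests").strip().lower()
    # Honour the request only when curl_cffi is actually importable.
    if wanted == "curl_cffi" and importlib.util.find_spec("curl_cffi") is not None:
        return "curl_cffi"
    # Unset, unknown, or unavailable: fall back to plain requests.
    return "requests"


print(resolve_transport({}))
print(resolve_transport({"AUDIOBOOKER_TRANSPORT": "curl_cffi"}))
```

The second call prints curl_cffi only on a machine where the stealth extra is installed; otherwise it prints requests, mirroring the documented fallback.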

Local index

Build once, search without network access:

from audiobooker.index import BookIndex

idx = BookIndex()   # ~/.audiobooker/index.db
idx.build()         # iterate_all() on all 7 web sources

for book in idx.search_by_title("Sherlock Holmes", max_results=5):
    print(f"[{book.score:.2f}] {book.title}")

CLI reference

audiobooker search <query>
    --method  search|search_by_title|search_by_author|search_by_tag|search_by_narrator
    -n        max results (default 10)
    --source  limit to one source
    --timeout seconds (default 30)
    -v        verbose (tags, narrator, stream URLs)

audiobooker index build [--sources librivox loyalbooks ...]
audiobooker index update
audiobooker index search <query> [--method ...] [-n N]
audiobooker index stats
audiobooker index follow <url> [--kind channel|playlist] [--tags ...] [--blacklist ...]
audiobooker index unfollow <url>
audiobooker index list

audiobooker cache download <query> [--stream INDEX]
audiobooker cache play     <query> [--stream INDEX]
audiobooker cache list
audiobooker cache clear    [<query>]
audiobooker cache info     <query>

All index and cache commands accept --db PATH and --cache-dir PATH to override default locations (~/.audiobooker/index.db and ~/.cache/audiobooker).
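When driving the CLI from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags listed above for audiobooker search; build_search_cmd is a hypothetical helper, and the lowercase source name "librivox" is an assumption borrowed from the index build example.

```python
import shlex


def build_search_cmd(query, method="search", n=10, source=None, timeout=30, verbose=False):
    """Assemble an argument list for `audiobooker search`."""
    cmd = ["audiobooker", "search", query, "--method", method, "-n", str(n),
           "--timeout", str(timeout)]
    if source is not None:
        cmd += ["--source", source]  # limit to one source
    if verbose:
        cmd.append("-v")
    return cmd


cmd = build_search_cmd("Sherlock Holmes", method="search_by_title", n=5, source="librivox")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with multi-word queries.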

Docs

Full documentation is in /docs/.

Runnable examples are in /examples/ — numbered 01 → 10 from quickstart to advanced index usage.

Error handling

Network failures and malformed pages are swallowed per-item — a bad page never aborts an iterate_all() run. If a source site is down or has restructured its HTML, that scraper silently yields nothing.
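Since a broken scraper is indistinguishable from one with no matches, callers who care may want to check which sources contributed nothing to a result set. A minimal sketch, assuming each result carries a source attribute as in the quick start; silent_sources is a hypothetical helper, and the source name strings are assumptions about what book.source contains:

```python
from collections import Counter
from types import SimpleNamespace


def silent_sources(books, expected):
    """Return the expected source names that produced no results."""
    seen = Counter(book.source for book in books)
    return sorted(name for name in expected if seen[name] == 0)


# Hypothetical results standing in for the output of search().
results = [
    SimpleNamespace(source="Librivox", title="Dracula"),
    SimpleNamespace(source="Librivox", title="Carmilla"),
    SimpleNamespace(source="LoyalBooks", title="Dracula"),
]
missing = silent_sources(results, ["Librivox", "LoyalBooks", "DarkerProjects"])
print(missing)
```

An empty source is only a hint: it may mean the site is down or restructured, or simply that it has nothing matching the query, so a check like this is best run against a query known to have hits on every source.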

License

MIT
