scrape-forvo

Download pronunciation MP3s from Forvo search pages.

Installation

uv run python -m pip install -e .

Usage

Only the following invocation (Playwright with a headed browser) is confirmed to work reliably:

scrape-forvo egg --use-playwright --headed

By default, the scraper uses the Forvo language code no (Norwegian) and also uses no as the filename prefix for downloaded files. You can change the language code with --lang:

scrape-forvo egg --lang en --use-playwright --headed

If you want a custom filename prefix, pass --prefix (this overrides the default language-based prefix):

scrape-forvo egg --lang en --prefix myset --use-playwright --headed
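The prefix rule described above amounts to a simple fallback: an explicit prefix wins, otherwise the language code is used. The following is a minimal sketch of that rule only; resolve_prefix is a hypothetical helper written for illustration and is not part of the documented scrape_forvo API.

```python
from typing import Optional


def resolve_prefix(lang: str = "no", prefix: Optional[str] = None) -> str:
    """Return the filename prefix: an explicit prefix wins, else the language code.

    Hypothetical helper for illustration; not taken from scrape_forvo itself.
    """
    return prefix if prefix is not None else lang


print(resolve_prefix())                            # no
print(resolve_prefix(lang="en"))                   # en
print(resolve_prefix(lang="en", prefix="myset"))   # myset
```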

Forvo Language Codes (YAML)

The pairs below were collected from the language links on https://forvo.com/. The code no (Norsk) was added from the homepage language menu.

forvo_language_codes:
  ar: Arabic
  ca: Catalan
  chm: Mari
  cs: Czech
  de: German
  el: Greek
  en: English
  eo: Esperanto
  es: Spanish
  fa: Persian
  fi: Finnish
  fr: French
  grc: Ancient Greek
  ha: Hausa
  he: Hebrew
  hu: Hungarian
  it: Italian
  ja: Japanese
  ko: Korean
  lb: Luxembourgish
  nl: Dutch
  no: Norwegian
  pl: Polish
  pt: Portuguese
  ru: Russian
  sk: Slovak
  sv: Swedish
  tr: Turkish
  tt: Tatar
  uk: Ukrainian
  yue: Cantonese
  zh: Mandarin Chinese
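One possible use of this mapping is to check a language code before starting a scrape. The sketch below inlines a small subset of the table as a plain dict so it runs without a YAML parser. The names FORVO_LANGUAGE_CODES and language_name are assumptions made for this example, not identifiers from the package.

```python
# Subset of the mapping above, inlined so the sketch has no dependencies.
FORVO_LANGUAGE_CODES = {
    "en": "English",
    "no": "Norwegian",
    "yue": "Cantonese",
    "zh": "Mandarin Chinese",
}


def language_name(code: str) -> str:
    """Look up the language name for a code, raising ValueError if unknown."""
    try:
        return FORVO_LANGUAGE_CODES[code]
    except KeyError:
        known = ", ".join(sorted(FORVO_LANGUAGE_CODES))
        raise ValueError(
            f"unknown Forvo language code {code!r}; known codes: {known}"
        ) from None


print(language_name("no"))  # Norwegian
```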

Scriptable Usage

You can also import scrape_forvo and use it from Python:

from scrape_forvo import scrape

result = scrape(
    "egg",
    outdir="forvo_mp3",
    lang="no",
    use_playwright=True,
    headed=True,
)

print(result.downloaded_count)
for candidate in result.candidates:
    print(candidate.url, "->", candidate.out_path)

The scrape() arguments map directly to CLI flags, so both interfaces share the same behavior without duplicated logic. Internally, the search URL is built as https://forvo.com/search/<word>/<lang>/ (default lang="no").
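The URL pattern above can be sketched as a small function. This is an illustration under stated assumptions, not the package's actual code: build_search_url is a hypothetical name, and percent-encoding the word with urllib.parse.quote is an assumption about how non-ASCII words would be handled.

```python
from urllib.parse import quote


def build_search_url(word: str, lang: str = "no") -> str:
    """Build https://forvo.com/search/<word>/<lang>/ for a word and language code."""
    # safe="" percent-encodes "/" too, so a word cannot add extra path segments.
    return f"https://forvo.com/search/{quote(word, safe='')}/{quote(lang, safe='')}/"


print(build_search_url("egg"))        # https://forvo.com/search/egg/no/
print(build_search_url("egg", "en"))  # https://forvo.com/search/egg/en/
print(build_search_url("blåbær"))     # https://forvo.com/search/bl%C3%A5b%C3%A6r/no/
```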

Development

Set up the project virtual environment with uv:

uv sync

Then run commands from the environment:

source .venv/bin/activate

Install dev dependencies:

python -m pip install -e ".[dev]"

Run tests:

pytest

Optional live test

Set FORVO_LIVE_TEST=1 to enable the live integration test.
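An environment-variable gate like this is commonly implemented as a small predicate that the test suite consults. The sketch below shows one way to do it; live_test_enabled is a hypothetical helper, and the project's own tests may check the variable differently.

```python
import os
from typing import Mapping, Optional


def live_test_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when FORVO_LIVE_TEST is set to exactly "1"."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if environ is None else environ
    return env.get("FORVO_LIVE_TEST") == "1"


print(live_test_enabled({"FORVO_LIVE_TEST": "1"}))  # True
print(live_test_enabled({}))                        # False
```

With pytest, a predicate like this could be passed to pytest.mark.skipif(not live_test_enabled(), reason="set FORVO_LIVE_TEST=1 to run") so the live test is skipped by default.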

TODO

edge cases

  • When a search returns multiple pronunciation files, which one should be picked?
  • When no pronunciation is available for the word.
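Both open edge cases could be handled by a single selection step. The sketch below shows one possible policy (take the first candidate, return None when there are none). It is not a decision the project has made. Candidate here is a stand-in dataclass carrying only the url and out_path fields used in the scripting example, and pick_candidate is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """Stand-in for a scrape result entry (assumed fields: url, out_path)."""
    url: str
    out_path: str


def pick_candidate(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the first candidate, or None when no pronunciation was found."""
    return candidates[0] if candidates else None


found = [
    Candidate("https://example.invalid/a.mp3", "forvo_mp3/no_egg_1.mp3"),
    Candidate("https://example.invalid/b.mp3", "forvo_mp3/no_egg_2.mp3"),
]
print(pick_candidate(found).out_path)  # forvo_mp3/no_egg_1.mp3
print(pick_candidate([]))              # None
```

Other policies, such as ranking candidates by some quality signal, would fit the same function signature.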

integration

  • integration with the vocab repo
