Scrape glosbe dicts given a head words file

scrape-glosbe-dict

Scrape a glosbe dict

Install it

pip install scrape-glosbe-dict

# pip install git+https://github.com/ffreemt/scrape-glosbe-dict
# poetry add git+https://github.com/ffreemt/scrape-glosbe-dict
# git clone https://github.com/ffreemt/scrape-glosbe-dict && cd scrape-glosbe-dict

Use it

scrape-glosbe-dict head-word-file  # default: English to Chinese

# or python -m scrape_glosbe_dict head-word-file

# scrape-glosbe-dict head-word-file -f de  # German to Chinese

Head word file format: one word or phrase per line; empty lines are ignored.

Output is saved to a TSV file.
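
The input and output formats described above can be sketched in a few lines of Python. This is only an illustration, not the package's own code: the function names `read_head_words` and `write_tsv` are hypothetical, and the two-column layout (head word, definition) is an assumption, since the exact columns of the TSV file are not documented here.

```python
import csv
from pathlib import Path


def read_head_words(path):
    """Return the non-empty lines of a head word file, whitespace-stripped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_tsv(rows, path):
    """Write (head_word, definition) pairs as tab-separated values.

    The two-column layout is an assumption made for this sketch.
    """
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerows(rows)
```

Using the `csv` module rather than joining fields with tabs by hand means that a definition containing a tab or a newline is quoted instead of silently breaking the column layout.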

Docs

python -m scrape_glosbe_dict --help
Usage: python -m scrape_glosbe_dict [OPTIONS] head-word-file

Arguments:
  head-word-file  Head word file, one word/phrase per line, each will be used
                  to fetch corresponding definitons from https://glosbe.com/.
                  [required]

Options:
  -f, --from-lang TEXT  Source language, check https://glosbe.com/ for valid
                        value, e.g. https://glosbe.com/en/zh implies
                        from_lang='en'.  [default: en]
  -t, --to-lang TEXT    Target language, check https://glosbe.com/ for valid
                        value, e.g. https://glosbe.com/en/zh implies
                        to_lang='zh'.  [default: zh]
  -v, --verbose         Show output in the process.
  -V, --version         Show version info and exit.
  --help                Show this message and exit.
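
The help text ties the language options to the Glosbe URL path: https://glosbe.com/en/zh corresponds to from_lang='en' and to_lang='zh'. A rough sketch of how a lookup URL might be assembled from those values follows. The helper name `build_url` is hypothetical, and appending the head word as a third path segment is an assumption about the site's URL layout, not something taken from the package source.

```python
from urllib.parse import quote

BASE_URL = "https://glosbe.com"


def build_url(word, from_lang="en", to_lang="zh"):
    """Build a lookup URL of the assumed form BASE_URL/from/to/word."""
    # safe="" percent-encodes every reserved character, including "/",
    # so a phrase cannot accidentally add extra path segments.
    parts = (from_lang, to_lang, word.strip())
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)
```

The defaults mirror the CLI defaults shown above (`en` and `zh`); passing `from_lang="de"` corresponds to the `-f de` example.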

Miscellany

  • A retry mechanism (via the tenacity package from PyPI) is built in for fetching from Glosbe. Refer to the source file for details.
  • A local cache (via the joblib package from PyPI) is used, so you can interrupt a run at any time and resume it later.
  • Scraping is often frowned upon and can sometimes get your IP banned from the website. Use this package at your own discretion.
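
To illustrate the retry and resume-after-interrupt ideas in the first two points, here is a minimal standard-library sketch. It is a stand-in, not the package's implementation: the package relies on tenacity and joblib, whereas this version uses a plain retry loop and a JSON file as the cache. The names `fetch_with_retry` and `scrape_all`, the `fetch` callable, and the retry parameters are all hypothetical.

```python
import json
import time
from pathlib import Path


def fetch_with_retry(fetch, word, attempts=3, delay=1.0):
    """Call fetch(word), retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(word)
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


def scrape_all(words, fetch, cache_path, attempts=3, delay=1.0):
    """Fetch every word, skipping those already stored in the JSON cache."""
    cache_file = Path(cache_path)
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    else:
        cache = {}
    for word in words:
        if word in cache:
            continue  # already fetched in an earlier (possibly interrupted) run
        cache[word] = fetch_with_retry(fetch, word, attempts, delay)
        # Persist after every word so an interrupt loses at most one lookup.
        cache_file.write_text(
            json.dumps(cache, ensure_ascii=False), encoding="utf-8"
        )
    return cache
```

Writing the cache after each word trades some disk I/O for the ability to stop at any point; rerunning with the same cache file only fetches the words that are still missing.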

Download files

Source Distribution

scrape-glosbe-dict-0.1.1.tar.gz (6.7 kB view details)

Uploaded Source

Built Distribution

scrape_glosbe_dict-0.1.1-py3-none-any.whl (7.3 kB view details)

Uploaded Python 3

File details

Details for the file scrape-glosbe-dict-0.1.1.tar.gz.

File metadata

  • Download URL: scrape-glosbe-dict-0.1.1.tar.gz
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.2 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/1.4.0 colorama/0.4.4 CPython/3.8.5

File hashes

Hashes for scrape-glosbe-dict-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b9039284c2ace51ec8d55ba3380d28f616259d8a432ae648c717e589ebaf2c76
MD5 ab4bc07e5d8098a1ee68fb3fe5d86dc8
BLAKE2b-256 2abffdb0bc44dcc3ce89ba49c9268fabbf51d5b47ae69ba6121141c0b23abf05

File details

Details for the file scrape_glosbe_dict-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: scrape_glosbe_dict-0.1.1-py3-none-any.whl
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.2 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/1.4.0 colorama/0.4.4 CPython/3.8.5

File hashes

Hashes for scrape_glosbe_dict-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 27e38edbd29ebca42a2c7c79f5958ea999f7b4ddbf30ffc30b28591e6420afe1
MD5 78489d8486f44a9a79c72d709da0975b
BLAKE2b-256 068ad8cd155d5897395fb336d1d24e85507c604197f8582c779c422e7f7f6dad
