Scrape glosbe dicts given a head words file
scrape-glosbe-dict
Scrape a glosbe dict
Install it
pip install scrape-glosbe-dict
# pip install git+https://github.com/ffreemt/scrape-glosbe-dict
# poetry add git+https://github.com/ffreemt/scrape-glosbe-dict
# git clone https://github.com/ffreemt/scrape-glosbe-dict && cd scrape-glosbe-dict
Use it
scrape-glosbe-dict head-word-file # default english-chinese
# or python -m scrape_glosbe_dict head-word-file
# scrape-glosbe-dict head-word-file -f de # german-chinese
Head word file format: one word or phrase per line; empty lines are ignored.
Output is saved to a TSV file.
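The input format described above can be illustrated with a small reader. This is a minimal sketch, not the package's own code: the helper name `read_head_words` is hypothetical, and stripping surrounding whitespace from each line is an assumption.

```python
from pathlib import Path


def read_head_words(path):
    """Return the head words in a file (hypothetical helper, not part of the package)."""
    text = Path(path).read_text(encoding="utf-8")
    # One word or phrase per line; empty or whitespace-only lines are skipped.
    # Assumption: surrounding whitespace is not significant and is stripped.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A file containing `apple`, an empty line, and `ice cream` would yield two entries, with the phrase kept intact as a single head word.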
Docs
python -m scrape_glosbe_dict --help
Usage: python -m scrape_glosbe_dict [OPTIONS] head-word-file

Arguments:
  head-word-file  Head word file, one word/phrase per line, each will be used
                  to fetch corresponding definitions from https://glosbe.com/.
                  [required]

Options:
  -f, --from-lang TEXT  Source language, check https://glosbe.com/ for valid
                        value, e.g. https://glosbe.com/en/zh implies
                        from_lang='en'.  [default: en]
  -t, --to-lang TEXT    Target language, check https://glosbe.com/ for valid
                        value, e.g. https://glosbe.com/en/zh implies
                        to_lang='zh'.  [default: zh]
  -v, --verbose         Show output in the process.
  -V, --version         Show version info and exit.
  --help                Show this message and exit.
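
The help text says the language codes can be read off a glosbe URL. As a rough illustration, the sketch below pulls the pair out of such a URL. The function name `lang_pair_from_url` is hypothetical and not part of this package; it only assumes the first two path segments are the source and target codes, as in the https://glosbe.com/en/zh example.

```python
from urllib.parse import urlparse


def lang_pair_from_url(url):
    """Split a glosbe dictionary URL into (from_lang, to_lang). Hypothetical helper."""
    # Keep only the non-empty path segments, e.g. "/en/zh" -> ["en", "zh"].
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no language pair found in {url!r}")
    return parts[0], parts[1]


print(lang_pair_from_url("https://glosbe.com/en/zh"))  # ('en', 'zh')
```

The two values correspond to what would be passed as `-f` and `-t` on the command line.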
Miscellany
- A retry mechanism (via the PyPI package tenacity) is built in to fetch info from glosbe. Refer to the source file for details.
- A local cache (via the PyPI package joblib) is used so that you can interrupt at any time and continue later.
- Scraping is often frowned upon and can sometimes result in your IP being banned from the website. Use this package at your own discretion.
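
The retry-plus-cache idea can be sketched with the standard library alone. This is not the package's implementation: it swaps in a plain loop for tenacity and a JSON file for joblib, and the names `with_retry` and `cached_lookup`, the attempt count, and the cache layout are all assumptions made for illustration. The package's actual settings are in its source.

```python
import json
import time
from pathlib import Path


def with_retry(func, attempts=3, delay=0.0):
    """Call func() until it succeeds or attempts run out (stand-in for tenacity)."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # wait a little longer after each failure


def cached_lookup(word, fetch, cache_file):
    """Return fetch(word), reusing results saved in cache_file (stand-in for joblib)."""
    path = Path(cache_file)
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    if word not in cache:
        cache[word] = with_retry(lambda: fetch(word))
        # Save after every new word so an interrupted run keeps its progress.
        path.write_text(json.dumps(cache, ensure_ascii=False), encoding="utf-8")
    return cache[word]
```

Here `fetch` is any callable that takes a head word and returns a JSON-serializable result. Because the cache is written after each new word, rerunning the same word list after an interruption only fetches the words that are still missing.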
Hashes for scrape_glosbe_dict-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 27e38edbd29ebca42a2c7c79f5958ea999f7b4ddbf30ffc30b28591e6420afe1
MD5 | 78489d8486f44a9a79c72d709da0975b
BLAKE2b-256 | 068ad8cd155d5897395fb336d1d24e85507c604197f8582c779c422e7f7f6dad