Scrape glosbe dicts given a head words file
scrape-glosbe-dict
Scrape a glosbe dict
Install it
pip install scrape-glosbe-dict
# pip install git+https://github.com/ffreemt/scrape-glosbe-dict
# poetry add git+https://github.com/ffreemt/scrape-glosbe-dict
# git clone https://github.com/ffreemt/scrape-glosbe-dict && cd scrape-glosbe-dict
Use it
scrape-glosbe-dict head-word-file # default english-chinese
# or python -m scrape_glosbe_dict head-word-file
# scrape-glosbe-dict head-word-file -f de # german-chinese
Head word file format: one word or phrase per line; empty lines are ignored.
Output is saved to a TSV file.
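The head word file format described above can be sketched in a few lines of Python. This is only an illustration of the format (one word or phrase per line, empty lines skipped); the helper names `write_head_words` and `read_head_words` are hypothetical and are not part of scrape-glosbe-dict.

```python
from pathlib import Path


def write_head_words(words, path):
    """Write one word/phrase per line, dropping blank entries.

    Hypothetical helper for preparing an input file; not part of the package.
    """
    cleaned = [word.strip() for word in words if word.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def read_head_words(path):
    """Read a head word file the way the format is described: skip empty lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

A file written this way can then be passed on the command line as the head-word-file argument.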
Docs
python -m scrape_glosbe_dict --help
Usage: python -m scrape_glosbe_dict [OPTIONS] head-word-file
Arguments:
head-word-file Head word file, one word/phrase per line, each will be used
to fetch corresponding definitions from https://glosbe.com/.
[required]
Options:
-f, --from-lang TEXT Source language, check https://glosbe.com/ for valid
value, e.g. https://glosbe.com/en/zh implies
from_lang='en'. [default: en]
-t, --to-lang TEXT Target language, check https://glosbe.com/ for valid
value, e.g. https://glosbe.com/en/zh implies
to_lang='zh'. [default: zh]
-v, --verbose Show output in the process.
-V, --version Show version info and exit.
--help Show this message and exit.
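The help text says that a URL such as https://glosbe.com/en/zh implies from_lang='en' and to_lang='zh'. As a small sketch of that rule, the function below reads the two codes from the first two path segments of a Glosbe URL. The name `lang_pair_from_url` is hypothetical; the package itself may not expose anything like it.

```python
from urllib.parse import urlparse


def lang_pair_from_url(url):
    """Return (from_lang, to_lang) taken from the first two URL path segments.

    Hypothetical helper: e.g. https://glosbe.com/en/zh gives ("en", "zh").
    """
    parts = [part for part in urlparse(url).path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"no language pair found in {url!r}")
    return parts[0], parts[1]
```

The returned values are what you would pass to `-f` and `-t`, for example `-f de -t zh` for https://glosbe.com/de/zh.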
Miscellany
- A retry mechanism (via pypi tenacity) is built-in to fetch info from glosbe. Refer to the source file for details.
- Local cache (via pypi joblib) is used so that you can interrupt anytime and continue later.
- Scraping is often frowned upon and sometimes can result in your IP being banned from the website. Use this package at your own discretion.
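To illustrate why retrying and caching make a scrape interruptible and resumable, here is a minimal standard-library sketch of the same idea. It is not the package's code: the package uses tenacity and joblib, whereas this stand-in uses a plain retry loop and JSON files on disk. The names `retry`, `cached_fetch`, and the `fetch` callable are all hypothetical.

```python
import hashlib
import json
import time
from pathlib import Path


def retry(func, attempts=3, delay=0.0):
    """Call func(), retrying on any exception; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)


def cached_fetch(word, fetch, cache_dir):
    """Return fetch(word), storing the result on disk so later runs skip the fetch.

    `fetch` is any callable returning JSON-serializable data (an assumption
    made for this sketch).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the word so that any phrase maps to a safe file name.
    key = hashlib.sha256(word.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = retry(lambda: fetch(word))
    cache_file.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
    return result
```

With this pattern, words fetched before an interruption are read back from the cache directory on the next run, and only the remaining words hit the network.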