Scraping grapheme-to-phoneme data from Wiktionary.
WikiPron
WikiPron is a command line toolkit for scraping grapheme-to-phoneme (G2P) data from Wiktionary.
Installation
WikiPron requires Python 3.6+. It is available through pip:
pip install wikipron
Usage
After installation, the terminal command wikipron will be available.
As a basic example, the following command scrapes G2P data for French (ISO language code fr):
wikipron fr
By default, the results are printed to the terminal, one entry per line: the word's orthography, a tab, and then the word's pronunciation in IPA.
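Because this output is plain tab-separated text, it is straightforward to load for downstream use. The following is a minimal sketch, not part of WikiPron: `parse_wikipron_lines` is a hypothetical helper. It assumes the output was saved with shell redirection (for example `wikipron fr > fr.tsv`) and allows for a word appearing on several lines if it has more than one pronunciation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_wikipron_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group word<TAB>pronunciation lines into {word: [pronunciation, ...]}.

    Hypothetical helper: assumes each non-empty line has the layout of the
    default terminal output. A word seen on several lines keeps every
    pronunciation, in order of appearance.
    """
    lexicon: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # Skip blank lines, e.g. a trailing empty line.
        # Split on the first tab only: orthography first, IPA second.
        word, pron = line.split("\t", 1)
        lexicon[word].append(pron)
    return dict(lexicon)
```

Usage, assuming the output was saved to fr.tsv: `with open("fr.tsv", encoding="utf-8") as f: lexicon = parse_wikipron_lines(f)`. A dictionary of lists is used instead of a plain word-to-pronunciation mapping so that no pronunciation is silently overwritten.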
For example commands using advanced options, see the languages/wikipron/scrape script, which shows how a multilingual G2P dataset can be created.
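The project's own script remains the reference for advanced options. As a much simpler illustration, the hypothetical sketch below runs only the basic `wikipron KEY` command once per language code and saves each language's output to its own file. The helper names `build_command` and `scrape_languages` and the `KEY.tsv` file layout are assumptions of this sketch, not part of WikiPron; it also assumes the `wikipron` command is on PATH and the network is reachable.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence


def build_command(key: str, command: Sequence[str] = ("wikipron",)) -> List[str]:
    """Return the basic CLI invocation for one language, e.g. ['wikipron', 'fr']."""
    return [*command, key]


def scrape_languages(
    keys: Iterable[str],
    out_dir: str,
    command: Sequence[str] = ("wikipron",),
) -> List[Path]:
    """Run the CLI once per language key and save stdout to OUT_DIR/KEY.tsv.

    Hypothetical helper: no advanced options are passed, and languages are
    scraped one after another rather than in parallel.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for key in keys:
        path = out / f"{key}.tsv"
        with path.open("w", encoding="utf-8") as handle:
            # Send the command's standard output straight into the file;
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(build_command(key, command), stdout=handle, check=True)
        written.append(path)
    return written
```

For instance, `scrape_languages(["fr", "de"], "data")` would write data/fr.tsv and data/de.tsv. Running the scrapes sequentially keeps the sketch simple and avoids sending many concurrent requests to Wiktionary.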
For a full list of command-line options, please run wikipron -h.
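The underlying module can also be used from Python. A typical workflow looks like this:
import wikipron

config = wikipron.Config(key="fr")  # French, with default options.
for word, pron in wikipron.scrape(config):
    # Handle each (word, pronunciation) pair; here, print it as word<TAB>pronunciation.
    print(f"{word}\t{pron}")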
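Because the loop over wikipron.scrape(config) only needs an iterable of (word, pron) pairs, the results can be streamed straight to a file in the same word<TAB>pronunciation layout the CLI prints. This is a minimal sketch; `format_tsv` and `save_tsv` are hypothetical helpers, not functions of the wikipron module.

```python
from typing import Iterable, Iterator, Tuple


def format_tsv(pairs: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield one word<TAB>pronunciation line per (word, pron) pair."""
    for word, pron in pairs:
        yield f"{word}\t{pron}\n"


def save_tsv(pairs: Iterable[Tuple[str, str]], path: str) -> int:
    """Write pairs to a UTF-8 text file one line at a time.

    Hypothetical helper: pairs are written as they are produced instead of
    being collected in a list first. Returns the number of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for line in format_tsv(pairs):
            handle.write(line)
            count += 1
    return count
```

Usage with the Python workflow above might look like `save_tsv(wikipron.scrape(wikipron.Config(key="fr")), "fr.tsv")`. Keeping formatting in a separate generator makes it easy to check the output layout without touching the filesystem.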
Development and Contribution
For questions, bug reports, and feature requests, please file an issue.
If you would like to contribute to the wikipron codebase, please see CONTRIBUTING.md.
License
Apache 2.0. Please see LICENSE.txt for details.