Scraping grapheme-to-phoneme data from Wiktionary.

WikiPron

WikiPron is a command line toolkit for scraping grapheme-to-phoneme (G2P) data from Wiktionary.

Installation

WikiPron requires Python 3.6+. It is available through pip:

pip install wikipron

Usage

After installation, the terminal command wikipron will be available. As a basic example, the following command scrapes G2P data for French (with the ISO language code fr):

wikipron fr

By default, the results are printed to the terminal, one entry per line: the orthography of a word, followed by a tab and then the word's pronunciation in IPA.
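Because each line is simply a word, a tab, and a pronunciation, the output is easy to load back into Python once it has been redirected to a file. The following is a minimal sketch, not part of WikiPron itself: the function name read_g2p_lines is hypothetical, and the sample lines are hand-written for illustration rather than actual scrape output.

```python
import io
from typing import Dict, Iterable, List


def read_g2p_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group tab-separated (word, pronunciation) lines by word.

    Hypothetical helper, not part of WikiPron. A word may appear on
    several lines, so each word maps to a list of pronunciations.
    """
    lexicon: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # Skip blank lines.
        # Split on the first tab only: orthography, then IPA.
        word, pron = line.split("\t", 1)
        lexicon.setdefault(word, []).append(pron)
    return lexicon


# Hand-written sample standing in for a file of redirected output.
sample = io.StringIO("chat\tʃa\nest\tɛ\nest\tɛst\n")
lexicon = read_g2p_lines(sample)
```

With a real file, the same function can be given an open file handle, for example open(path, encoding="utf-8"), where path is wherever the output was saved.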

For examples of commands that use advanced options, see the languages/wikipron/scrape script, which shows how a multilingual G2P dataset can be created.

For a full list of command-line options, please run wikipron -h.

The underlying module can also be used from Python. A standard workflow looks like:

import wikipron

config = wikipron.Config(key="fr")  # French, with default options.
for word, pron in wikipron.scrape(config):
    ...
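
As one way to fill in the loop body, the pairs can be written out in the same tab-separated layout that the command-line tool prints. This is a sketch under assumptions: write_pairs is a hypothetical helper, not a WikiPron function, and the demonstration uses a hand-written list of pairs so that it runs without network access.

```python
import io
import sys
from typing import Iterable, TextIO, Tuple


def write_pairs(pairs: Iterable[Tuple[str, str]], sink: TextIO = sys.stdout) -> int:
    """Write (word, pron) pairs as tab-separated lines; return the count.

    Hypothetical helper, not part of WikiPron.
    """
    count = 0
    for word, pron in pairs:
        print(f"{word}\t{pron}", file=sink)
        count += 1
    return count


# Hand-written pairs standing in for the iterable returned by
# wikipron.scrape(config), which needs network access.
buffer = io.StringIO()
written = write_pairs([("chat", "ʃa"), ("est", "ɛ")], sink=buffer)

# With WikiPron installed, the same helper could consume a live scrape:
#     import wikipron
#     write_pairs(wikipron.scrape(wikipron.Config(key="fr")))
```

Taking any iterable of pairs keeps the helper independent of where the data comes from, so it can be tried on a small list before starting a full scrape.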

Development and Contribution

For questions, bug reports, and feature requests, please file an issue.

If you would like to contribute to the wikipron codebase, please see CONTRIBUTING.md.

We keep track of notable changes in CHANGELOG.md.

License

Apache 2.0. Please see LICENSE.txt for details.
