Module for creating context-aware, rule-based G2P mappings that preserve indices

Gⁱ-2-Pⁱ

Grapheme-to-Phoneme transductions that preserve input and output indices!

This library is for handling arbitrary transductions between input and output segments while preserving indices.

Background

The initial version of this package was developed by Patrick Littell to support g2p from community orthographies to IPA and back again in ReadAlong-Studio. We then decided to pull the g2p mechanism out of Convertextract, which allows transducer relations to be declared in CSV files, and turn it into its own library. Here it is!

Install

The easiest way to install is with pip:

$ pip install g2p

Otherwise, clone the repo and pip install it locally.

$ git clone https://github.com/roedoejet/g2p.git
$ cd g2p
$ pip install -e .

Usage

In order to initialize a Transducer, you must first create a Mapping object.

Mapping

You can create mappings either by initializing them directly with a list:

from g2p.mappings import Mapping

mappings = Mapping([{"in": 'a', "out": 'b'}])

Alternatively, you can add a CSV file to g2p/mappings/langs/<YourLang>/<YourLookupTable> and load it by language and table name:

from g2p.mappings import Mapping

mappings = Mapping(language={"lang": "<YourLang>", "table": "<YourLookupTable>"})
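As a rough illustration of how a lookup table in a CSV file relates to the list form shown earlier, the sketch below parses two-column CSV text into a list of {"in": ..., "out": ...} dictionaries using only the standard library. The two-column layout (input first, output second) and the helper name rules_from_csv are assumptions made for this example, not a description of the file format g2p itself expects.

```python
import csv
import io


def rules_from_csv(text):
    """Parse two-column CSV text into a list of {"in": ..., "out": ...} rules.

    Hypothetical helper; the column order (input, output) is an assumption.
    """
    rules = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and rows that lack both columns.
        if len(row) < 2:
            continue
        rules.append({"in": row[0], "out": row[1]})
    return rules


sample = "a,b\nch,k\n"
rules = rules_from_csv(sample)
print(rules)
```

The resulting list has the same shape as the one passed to Mapping in the first example.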

Transducer

Initialize a Transducer with a Mapping object. Calling the Transducer then produces the output. In order to preserve the indices, pass index=True when calling the Transducer.

from g2p.mappings import Mapping
from g2p.transducer import Transducer

mappings = Mapping([{"in": 'a', "out": 'b'}])
transducer = Transducer(mappings)
transducer('a')
# 'b'
transducer('a', index=True)
# ('b', <g2p.transducer.indices.Indices object>)
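To illustrate what preserving indices means, here is a minimal pure-Python sketch that does not use g2p. It applies rules of the same {"in": ..., "out": ...} shape by greedy longest match, left to right, and records which input position each output character came from. The function name transduce_with_indices, the matching strategy, and the returned pair format are all assumptions for this example; g2p's own rule application may differ.

```python
def transduce_with_indices(text, rules):
    """Apply rules left to right and track where each output character came from.

    Returns (output, pairs), where each pair is (input_index, output_index).
    Hypothetical sketch, not g2p's implementation.
    """
    # Try longer inputs first so "ch" wins over "c".
    ordered = sorted(rules, key=lambda r: len(r["in"]), reverse=True)
    output = []
    pairs = []
    i = 0
    while i < len(text):
        for rule in ordered:
            if rule["in"] and text.startswith(rule["in"], i):
                replacement, consumed = rule["out"], len(rule["in"])
                break
        else:
            # No rule matched: copy the character through unchanged.
            replacement, consumed = text[i], 1
        for ch in replacement:
            # Link every output character to the start of the matched input.
            pairs.append((i, len(output)))
            output.append(ch)
        i += consumed
    return "".join(output), pairs


rules = [{"in": "a", "out": "b"}, {"in": "ch", "out": "k"}]
result, pairs = transduce_with_indices("cha", rules)
print(result)  # kb
print(pairs)   # [(0, 0), (2, 1)]
```

Because every output character keeps a link back to an input position, a caller can map a span in the output to the span of the input that produced it.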

To make sense of the Indices object that is produced, you can call it, which produces a list of relations, one per character. Doing that for the example above produces [((0, 'a'), (0, 'b'))]: a list of relation tuples, where each relation tuple consists of an input and an output. Each input tuple and output tuple in turn consists of an index and the corresponding character. You can also call output() and input() to see the plain-text output and input, respectively.
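The relation-tuple format described above can be explored with plain Python. The sketch below takes a list shaped like [((0, 'a'), (0, 'b'))] and recovers the input and output text from it. The helper name side_text and the sample data are assumptions for illustration, keying by index is just one possible way to handle a character that takes part in several relations, and none of this is the Indices class itself.

```python
def side_text(relations, side):
    """Rebuild one side's text from relation tuples.

    side is 0 for the input half of each relation and 1 for the output half.
    Hypothetical helper operating on plain tuples, not on g2p's Indices object.
    """
    # A character may appear in several relations, so key by index to keep one copy.
    by_index = {}
    for relation in relations:
        index, char = relation[side]
        by_index[index] = char
    return "".join(by_index[i] for i in sorted(by_index))


# Sample relations: 'a' at input 0 maps to 'b' at output 0, and
# 'c' at input 1 maps to both 'd' and 'e' at outputs 1 and 2.
relations = [((0, "a"), (0, "b")), ((1, "c"), (1, "d")), ((1, "c"), (2, "e"))]
print(side_text(relations, 0))  # ac
print(side_text(relations, 1))  # bde
```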

Studio

You can also run the g2p Studio, a web interface for creating custom lookup tables to be used with g2p. To run the g2p Studio, either visit ***** or run it locally using python run_studio.py.

You can also import the app directly from the package:

from g2p import app

app.run(host='0.0.0.0', port=5000, debug=True)

Maintainers

@roedoejet.

Contributing

Feel free to dive in! Open an issue or submit PRs.

This repo follows the Contributor Covenant Code of Conduct.

Contributors

This project exists thanks to all the people who contribute.

@littell. @finguist. @eddieantonio. @dhdaines.

License

MIT © Patrick Littell, Aidan Pine

