Mother Tongues Dictionaries (MTD)

💬 This repo is a near-complete rewrite of the legacy mothertongues code and is set to replace it. For a list of improvements, see the Improvements section. For legacy documentation, please go here 💬

MTD is an open-source tool that allows language communities and developers to quickly and inexpensively make their dictionary data digitally accessible. It parses and prepares your data for use with an MTD User Interface.

Please visit the website or docs for more information.

Background

This project started as a single dictionary for Gitxsan, a language spoken in Northern British Columbia, but it quickly became apparent that many communities had the same problem: they had some dictionary data, but all of the options for sharing that data online were prohibitively expensive. MTD aims to make it easier to create online digital dictionary resources.

Note: Just because you can make an online dictionary does not mean you should. Before publishing a dictionary, you must have clear consent from the language community. For some background on why this is important, please read sections 1 and 2.1 here

Install

It is recommended to install mothertongues using pip. The package name is mothertongues, and as of version 1.0.x it is imported with import mothertongues. The CLI can be run using mothertongues --help.

pip install mothertongues

Quick Start

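If you just want to try something out, you can use the mothertongues command line to create a configuration and some sample data:

  1. Create a new project: mothertongues new-project
  2. Then build and run your dictionary: mothertongues build-and-run <YourDictionaryConfigDirPath>/config.mtd.json

From a local development checkout, the same subcommands can be run through poetry instead, for example poetry run python3 cli.py new-project.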

Local Install

To install locally you will need Git, Python 3.8+, poetry and Node 16+ on your machine. You can then follow these steps:

  1. Clone the repo and the UI submodule: git clone https://github.com/MotherTongues/mothertongues.git --recursive
  2. Build the UI: cd mothertongues/mothertongues-UI && npm install
  3. Build the Python Development version of the UI: npx nx build mtd-mobile-ui --configuration=pydev
  4. Install the Python package: cd .. && poetry install

Usage

In order to create a Mother Tongues Dictionary you will need at least two things:

  • A configuration file for your language/dictionary
  • A configuration file for each source of data

You can find out more about how to create these files against the MTD configuration schema by visiting the guides.

Once you have those files, you can create a dictionary using the command line interface.

The basic workflow for creating a dictionary is as follows:

  1. Fork and clone the mtd-starter
  2. Edit and prepare the repo using your own data
  3. Export your data to a format readable by the Mother Tongues User Interfaces
  4. Add your exported data (dictionary_data.json) from step 3 and then publish your dictionary! 🎉

Improvements

There are a variety of improvements over the legacy mothertongues library. Here are a few of them:

  • The CLI now builds an inverted index over any of your dictionary entry keys and search is conducted on the terms of that index.
  • Results are now ranked by a combination of edit distance and Okapi BM25, so you're likely to get better results. Results are sorted by edit distance first; results with the same edit distance are then sorted by their Okapi BM25 score.
  • We use strongly typed configurations to reduce errors when building your dictionary
  • We support two search algorithms out of the box: an unweighted Levenshtein search over Levenshtein automata (very fast) or a quadratic weighted Levenshtein search over the terms of the index (more flexible and slower, but still pretty fast).
  • The search algorithm is now optimized for multi-term searches
  • There is a unified normalization strategy between the UI and CLI that allows for performing case normalization, Unicode normalization, removal of punctuation (customizable), arbitrary replace rules, and removal of combining diacritics/accents.
  • There is a built-in web server in the CLI for quickly spinning up a development version of your app: mothertongues build-and-run <path_to_language_config>
  • There is an API that serves the MTD JSON schemas and validates your configuration files and dictionary data.
  • The sorter is improved to be able to handle Out of Vocabulary (OOV) characters
  • Parsing JSON is many times faster thanks to @dhdaines
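
To illustrate the inverted index mentioned in the list above, here is a minimal sketch in plain Python. It is not the mothertongues implementation: the function name build_inverted_index, the entry layout, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(entries, keys):
    """Map each term to the set of entry positions whose chosen keys contain it.

    Hypothetical helper for illustration only; not part of the mothertongues API.
    """
    index = defaultdict(set)
    for position, entry in enumerate(entries):
        for key in keys:
            # Naive lowercase + whitespace tokenization (an assumption).
            for term in str(entry.get(key, "")).lower().split():
                index[term].add(position)
    return dict(index)


# Placeholder entries, not real dictionary data.
entries = [
    {"word": "apple", "definition": "a red fruit"},
    {"word": "cherry", "definition": "a small red fruit"},
]
index = build_inverted_index(entries, keys=["word", "definition"])
print(sorted(index["red"]))    # entries 0 and 1 both contain "red"
print(sorted(index["apple"]))  # only entry 0 contains "apple"
```

Searching then only has to compare the query against the terms of the index rather than against every entry.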
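
The ranking rule described above (edit distance first, Okapi BM25 as the tie-breaker) can be sketched as a two-part sort key. The Result type and the scores below are hypothetical values chosen for illustration; how mothertongues computes and stores these scores is not shown here.

```python
from typing import NamedTuple


class Result(NamedTuple):
    # Hypothetical container; the field names are assumptions.
    term: str
    edit_distance: int
    bm25: float


def rank(results):
    # Lower edit distance comes first; ties are broken by the higher BM25 score.
    return sorted(results, key=lambda r: (r.edit_distance, -r.bm25))


results = [
    Result("cart", edit_distance=1, bm25=0.4),
    Result("cat", edit_distance=0, bm25=0.9),
    Result("cut", edit_distance=1, bm25=1.7),
]
ranked = rank(results)
print([r.term for r in ranked])  # ['cat', 'cut', 'cart']
```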
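
For readers unfamiliar with weighted Levenshtein distance, the following is a generic textbook sketch of the quadratic dynamic-programming version, where specific substitutions can be made cheaper than the default cost of 1. The function name, its parameters, and the cost table format are assumptions for this example and may differ from what mothertongues uses.

```python
def weighted_levenshtein(source, target, substitution_costs=None,
                         insert_cost=1.0, delete_cost=1.0):
    """Edit distance where selected substitutions may cost less than 1.

    substitution_costs maps (source_char, target_char) pairs to a cost.
    Runs in O(len(source) * len(target)) time, hence "quadratic".
    """
    substitution_costs = substitution_costs or {}
    # Cost of building each prefix of target from an empty source.
    previous = [j * insert_cost for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [i * delete_cost]
        for j, t_char in enumerate(target, start=1):
            if s_char == t_char:
                sub = 0.0
            else:
                sub = substitution_costs.get((s_char, t_char), 1.0)
            current.append(min(
                previous[j] + delete_cost,     # delete s_char
                current[j - 1] + insert_cost,  # insert t_char
                previous[j - 1] + sub,         # match or substitute
            ))
        previous = current
    return previous[-1]


print(weighted_levenshtein("kitten", "sitting"))                 # 3.0
print(weighted_levenshtein("kat", "gat", {("k", "g"): 0.5}))     # 0.5
```

Lowering the cost for pairs of characters that users commonly confuse is what makes the weighted variant more flexible than the unweighted one.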
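
The normalization steps listed above can be approximated with the Python standard library. This is only a sketch: the function name normalize, its parameters, and the order in which the steps are applied are assumptions, not the configuration or behaviour of mothertongues itself.

```python
import string
import unicodedata


def normalize(text, replace_rules=None, punctuation=string.punctuation,
              remove_accents=True):
    """Illustrative normalization pipeline (step order is an assumption)."""
    text = text.lower()                                       # case normalization
    text = unicodedata.normalize("NFC", text)                 # Unicode normalization
    for old, new in (replace_rules or {}).items():            # arbitrary replace rules
        text = text.replace(old, new)
    text = text.translate(str.maketrans("", "", punctuation)) # customizable punctuation removal
    if remove_accents:
        # Decompose, drop combining marks, then recompose.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
        text = unicodedata.normalize("NFC", text)
    return text


print(normalize("Café, déjà-vu!"))                      # cafe dejavu
print(normalize("Straße", replace_rules={"ß": "ss"}))   # strasse
```

Applying the same function to both the indexed terms and the user's query is what keeps search behaviour consistent between the two sides.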
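
One way a custom-alphabet sorter can cope with Out of Vocabulary characters is to rank known symbols by their position in the alphabet and push anything unknown after the whole alphabet. The sketch below assumes that policy; make_sort_key and the sample alphabet are hypothetical, and the actual mothertongues sorter may handle OOV characters differently.

```python
def make_sort_key(alphabet):
    """Build a sort key for a custom alphabet (illustrative sketch only)."""
    # Try longer symbols first so multi-character letters such as "ch" win over "c".
    symbols = sorted((s for s in alphabet if s), key=len, reverse=True)
    rank = {symbol: position for position, symbol in enumerate(alphabet)}
    oov_base = len(alphabet)

    def sort_key(word):
        key, i = [], 0
        while i < len(word):
            for symbol in symbols:
                if word.startswith(symbol, i):
                    key.append(rank[symbol])
                    i += len(symbol)
                    break
            else:
                # OOV character: sort after every alphabet symbol, by code point.
                key.append(oov_base + ord(word[i]))
                i += 1
        return key

    return sort_key


alphabet = ["a", "b", "ch", "c", "d"]  # placeholder alphabet where "ch" sorts before "c"
words = ["da", "ca", "cha", "ab", "zz", "a!"]
print(sorted(words, key=make_sort_key(alphabet)))
# ['ab', 'a!', 'cha', 'ca', 'da', 'zz']
```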

Contributing

If something is not working, or you'd like to see another feature added, feel free to dive in! Open an issue or submit a PR. We ask that you please read the contributing guidelines before submitting any pull requests. Help writing and clarifying documentation is also very welcome.

This repo follows the Contributor Covenant Code of Conduct.

Acknowledgements

Thank you to both Patrick Littell & Mark Turin for their contributions, guidance and support, and to the First Peoples' Cultural Council and SSHRC Insight Grant 435-2016-1694, ‘Enhancing Lexical Resources for BC First Nations Languages’, for institutional support.

Thank you to all other contributors for support with improving MotherTongues, finding bugs and writing documentation.

Contributors

This project exists thanks to all the people who contribute.

@dhdaines, @littell, @markturin, @eddieantonio, @kavonjon

License

MIT © Aidan Pine.
