A practical Python library for identifying morphemes in the English language.

morphemes

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

A simple and practical solution for obtaining morpheme information for a word. Most of the logic uses a simple lookup strategy based on the MorphoLex-en project. Unknown words (for example, names of people and places) are each counted as 1 morpheme.
This is a non-contextual solution intended to feed more complex NLP logic.

Getting Started

Using this library is fairly routine and easy. More detail will be added to this section as we get closer to the first release.

Prerequisites

This project was developed with Python 3.9; other versions of Python 3 should work.

Installation

pip install morphemes

Usage

Using the morphemes library is very simple.

  1. Import the library
  2. Create an instance of the Morphemes class
    1. Optional: specify a data path where the morphemes database will be stored. If no data path is specified, local app storage will be used.
  3. Use the library by calling the parse function.

Example:

from morphemes import Morphemes

path = "./data"

m = Morphemes(path)  # Data path is optional; local app storage is used if it is left out.
print(m.parse("organizationally"))

Output:

{
  "word": "organizationally",
  "status": "FOUND_IN_DATABASE",
  "morpheme_count": 5,
  "tree": [
    {
      "children": [
        {
          "text": "organ",
          "type": "root"
        },
        {
          "text": "ize",
          "type": "bound"
        }
      ],
      "type": "free"
    },
    {
      "text": "ion",
      "type": "bound"
    },
    {
      "text": "al",
      "type": "bound"
    },
    {
      "text": "ly",
      "type": "bound"
    }
  ]
}

Type definitions:

  • root: The root of the word. Some words have multiple roots (example: milkshake).
  • bound: Attaches to a root or to another morpheme. Does not carry meaning on its own.
  • free: A unit which can be used as a word on its own. A single word can contain multiple free units (example: milkshake).
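The nested tree can be flattened into an ordered list of morphemes. The following is a minimal sketch, not part of the library: `leaf_morphemes` is a hypothetical helper, and it assumes every node either has a "children" list or a "text" value, as in the sample output above. The `result` dictionary is copied by hand from that output so the snippet runs without the package installed.

```python
from typing import Any, Dict, List, Tuple

# Sample result copied from the Output section above.
result: Dict[str, Any] = {
    "word": "organizationally",
    "status": "FOUND_IN_DATABASE",
    "morpheme_count": 5,
    "tree": [
        {
            "children": [
                {"text": "organ", "type": "root"},
                {"text": "ize", "type": "bound"},
            ],
            "type": "free",
        },
        {"text": "ion", "type": "bound"},
        {"text": "al", "type": "bound"},
        {"text": "ly", "type": "bound"},
    ],
}


def leaf_morphemes(nodes: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: collect (text, type) pairs from left to right."""
    leaves: List[Tuple[str, str]] = []
    for node in nodes:
        if "children" in node:
            # Assumption: a node with children groups other morphemes,
            # so only its descendants are collected.
            leaves.extend(leaf_morphemes(node["children"]))
        elif "text" in node:
            leaves.append((node["text"], node["type"]))
    return leaves


leaves = leaf_morphemes(result["tree"])
print(leaves)
# [('organ', 'root'), ('ize', 'bound'), ('ion', 'bound'), ('al', 'bound'), ('ly', 'bound')]
```

For this sample, the number of leaves matches the reported morpheme_count of 5.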

Words which are not found are marked with status NOT_FOUND and will default to 1 morpheme. This will be improved in future releases.
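Because unknown words silently default to 1 morpheme, downstream code may want to separate them from words that were actually looked up. The sketch below shows one way to do that. `split_by_status` is a hypothetical helper, and the shape of a NOT_FOUND result (the "word", "status", and "morpheme_count" keys) is an assumption based on the found example above, not something this README confirms. The sample dictionaries are written by hand.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_by_status(
    results: Iterable[Dict[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (found_words, unknown_words)."""
    found: List[str] = []
    unknown: List[str] = []
    for res in results:
        # Only NOT_FOUND is treated as unknown; any other status counts as found.
        if res.get("status") == "NOT_FOUND":
            unknown.append(res["word"])
        else:
            found.append(res["word"])
    return found, unknown


# Hand-written stand-ins for parse() results (assumed shape for NOT_FOUND).
sample_results = [
    {"word": "organizationally", "status": "FOUND_IN_DATABASE", "morpheme_count": 5},
    {"word": "zzyzx", "status": "NOT_FOUND", "morpheme_count": 1},
]

found_words, unknown_words = split_by_status(sample_results)
print(found_words)    # ['organizationally']
print(unknown_words)  # ['zzyzx']
```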

NOTE: the data path specified is where the morphemes library will store a database containing morphemes from MorphoLex-en along with other lookups to help properly detect morphemes.
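If you prefer an explicit data path, it can help to create the directory before handing it to the library. This README does not say whether the library creates a missing directory itself, so the sketch below does it up front as a precaution. `prepare_data_path` and `build_parser` are hypothetical helpers. The path is passed as a string, matching the example above.

```python
from pathlib import Path


def prepare_data_path(directory: str) -> str:
    """Hypothetical helper: make sure the data directory exists."""
    path = Path(directory).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


def build_parser(directory: str):
    """Hypothetical helper: create a Morphemes instance for a data directory."""
    # Imported here so prepare_data_path can be used without the package installed.
    from morphemes import Morphemes

    return Morphemes(prepare_data_path(directory))
```

Calling build_parser("./data") would then replace the direct Morphemes(path) call from the example.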

Roadmap

  • Morpheme detection of known words
  • Handling of common names and places (counted as 1 morpheme)
  • Handling of unknown words

See the open issues for a full list of proposed features (and known issues).

Developers

Clone the repo and use the Makefile to build a local version: make install

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Do you want other languages supported? Are you a fluent speaker of the language you want? Help contribute and grow this project into a more universal morpheme solution!

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

ECSC, ltd - ecsctechdepartment@gmail.com

Project Link: https://github.com/ecscstatsconsulting/morphemes

Acknowledgments
