A practical Python library for identifying morphemes in the English language.

morphemes


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project

A simple and practical solution for obtaining morpheme information for a word. Most of the logic is a simple lookup based on the MorphoLex-en project. Unknown words, e.g. names of people and places, are each counted as 1 morpheme.
This is a non-contextual solution intended to feed more complex NLP logic.

Getting Started

Using this library is fairly routine and easy. More detail will be added to this section as we get closer to the first release.

Prerequisites

This project was developed with Python 3.9; other versions of Python 3 should work.

Installation

pip install morphemes

Usage

Using the morphemes library is very simple.

  1. Import the library
  2. Create an instance of the Morphemes class
    1. Optional: specify a data path where the morphemes database will be stored. If no data path is specified, local app storage will be used.
  3. Use the library by calling the parse function.

Example:

from morphemes import Morphemes

path = "./data"

m = Morphemes(path)  # Data path is optional; local app storage is used if it is left out.
print(m.parse("organizationally"))

Output:

{
  "word": "organizationally",
  "status": "FOUND_IN_DATABASE",
  "morpheme_count": 5,
  "tree": [
    {
      "children": [
        {
          "text": "organ",
          "type": "root"
        },
        {
          "text": "ize",
          "type": "bound"
        }
      ],
      "type": "free"
    },
    {
      "text": "ion",
      "type": "bound"
    },
    {
      "text": "al",
      "type": "bound"
    },
    {
      "text": "ly",
      "type": "bound"
    }
  ]
}

Types definition:

  • root: The root of the word. Some words have multiple roots (example: milkshake).
  • bound: Attaches to a root morpheme. Does not carry meaning on its own.
  • free: A unit that can be used as a word on its own. A single word can contain multiple free types (example: milkshake).
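
As a minimal sketch of how the nested tree above can be consumed, the snippet below flattens it into a list of (text, type) pairs. It works on a dict copied from the sample output rather than calling the library, and it assumes the result is available as a Python dict with that shape; if parse returns a JSON string instead, decode it with json.loads first. The helper name iter_leaves is hypothetical and not part of the morphemes API.

```python
from typing import Iterator

# Sample result copied from the output shown above.
result = {
    "word": "organizationally",
    "status": "FOUND_IN_DATABASE",
    "morpheme_count": 5,
    "tree": [
        {
            "children": [
                {"text": "organ", "type": "root"},
                {"text": "ize", "type": "bound"},
            ],
            "type": "free",
        },
        {"text": "ion", "type": "bound"},
        {"text": "al", "type": "bound"},
        {"text": "ly", "type": "bound"},
    ],
}


def iter_leaves(nodes: list) -> Iterator[tuple]:
    """Yield (text, type) for every leaf node, depth first, left to right."""
    for node in nodes:
        children = node.get("children")
        if children:
            # Nodes with children (such as the "free" node) are containers.
            yield from iter_leaves(children)
        elif "text" in node:
            yield node["text"], node["type"]


leaves = list(iter_leaves(result["tree"]))
print(leaves)
```

For this sample the number of leaves equals morpheme_count; whether that holds for every word in the database is not something the output above confirms.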

Words which are not found are marked with status NOT_FOUND and will default to 1 morpheme. This will be improved in future releases.
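
Because not-found words still contribute a count of 1, downstream code may want to track them separately. The sketch below totals morphemes over a list of words and collects the ones reported as NOT_FOUND. It assumes each result is a dict carrying the "status" and "morpheme_count" keys shown in the output above, and that a not-found result has the same keys, which the text implies but does not show. fake_parse is a hypothetical stand-in used so the example runs without the database; count_morphemes is likewise an illustrative name.

```python
from typing import Callable, Iterable


def count_morphemes(words: Iterable[str], parse: Callable[[str], dict]) -> tuple:
    """Return (total morpheme count, list of words the parser did not find)."""
    total = 0
    unknown = []
    for word in words:
        result = parse(word)
        # Not-found words are assumed to still report a count of 1.
        total += result["morpheme_count"]
        if result["status"] == "NOT_FOUND":
            unknown.append(word)
    return total, unknown


def fake_parse(word: str) -> dict:
    """Hypothetical stand-in for Morphemes.parse, for illustration only."""
    known = {"organizationally": 5}
    if word in known:
        return {
            "word": word,
            "status": "FOUND_IN_DATABASE",
            "morpheme_count": known[word],
        }
    return {"word": word, "status": "NOT_FOUND", "morpheme_count": 1}


total, unknown = count_morphemes(["organizationally", "Zyxwv"], fake_parse)
print(total, unknown)
```

With the real library, the parse method of a Morphemes instance could be passed in place of fake_parse, provided it returns a dict with these keys.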

NOTE: The data path is where the morphemes library stores a database containing morphemes from MorphoLex-en, along with other lookups that help it detect morphemes properly.

Roadmap

  • Morpheme detection of known words
  • Handling of common names and places (counted as 1 morpheme)
  • Handling of unknown words

See the open issues for a full list of proposed features (and known issues).

Developers

Clone the repo and use the Makefile to build a local version: make install

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Do you want other languages supported? Are you a fluent speaker of the language you want? Help contribute and grow this project into a more universal morpheme solution!

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

ECSC, ltd - ecsctechdepartment@gmail.com

Project Link: https://github.com/ecscstatsconsulting/morphemes

Acknowledgments
