A practical Python library for identifying morphemes in the English language.
About The Project
A simple and practical solution for obtaining morpheme information for a word. Most of the logic uses a simple lookup strategy based on the MorphoLex-en project. Unknown words, such as names of people and places, are each counted as 1 morpheme.
This is a non-contextual solution intended to feed more complex NLP logic.
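To make the lookup-with-fallback idea concrete, here is a minimal sketch. It is not the library's implementation: the TOY_LOOKUP table and the count_morphemes helper are hypothetical names invented for illustration, and the real data comes from MorphoLex-en.

```python
# Toy illustration only: this table is hypothetical and is NOT the
# library's data or internals.
TOY_LOOKUP = {
    "organizationally": ["organ", "ize", "ion", "al", "ly"],
    "milkshake": ["milk", "shake"],
}


def count_morphemes(word, table=TOY_LOOKUP):
    """Return the morpheme count for a word, defaulting to 1 when unknown."""
    segments = table.get(word.lower())
    return len(segments) if segments else 1


print(count_morphemes("organizationally"))  # known word: 5
print(count_morphemes("Reykjavik"))         # unknown word: falls back to 1
```

Because the lookup is keyed on the word alone, the same word always yields the same answer regardless of the sentence around it, which is what "non-contextual" means here.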
Getting Started
Using this library is fairly routine and easy. More detail will be added to this section as we get closer to the first release.
Prerequisites
This project was developed with Python 3.9; other versions of Python 3 should work.
Installation
pip install morphemes
Usage
Using the morphemes library is very simple.
- Import the library
- Create an instance of the Morphemes class
  - Optional: Specify a data path where the morphemes database will be stored. If no data path is specified, local app storage will be used.
- Use the library by calling the parse function.
Example:
from morphemes import Morphemes
path = "./data"
m = Morphemes(path)  # The data path is optional; local app storage is used if it is left out.
print(m.parse("organizationally"))
Output:
{
    "word": "organizationally",
    "status": "FOUND_IN_DATABASE",
    "morpheme_count": 5,
    "tree": [
        {
            "children": [
                {
                    "text": "organ",
                    "type": "root"
                },
                {
                    "text": "ize",
                    "type": "bound"
                }
            ],
            "type": "free"
        },
        {
            "text": "ion",
            "type": "bound"
        },
        {
            "text": "al",
            "type": "bound"
        },
        {
            "text": "ly",
            "type": "bound"
        }
    ]
}
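The tree in this output is nested, so downstream code often wants a flat list of morphemes. The sketch below walks a result shaped like the example above. The helper name leaf_morphemes is hypothetical, and it assumes every node has either a "children" list or a "text" value, which is only inferred from this single example. It also assumes the result is a Python dict; if parse returns a JSON string instead, convert it with json.loads first.

```python
def leaf_morphemes(nodes):
    """Yield (text, type) for every text-bearing node, depth first."""
    for node in nodes:
        if "children" in node:
            # Grouping node (e.g. the "free" unit organ + ize): descend into it.
            yield from leaf_morphemes(node["children"])
        elif "text" in node:
            yield node["text"], node["type"]


# Result copied from the example output above.
result = {
    "word": "organizationally",
    "status": "FOUND_IN_DATABASE",
    "morpheme_count": 5,
    "tree": [
        {
            "children": [
                {"text": "organ", "type": "root"},
                {"text": "ize", "type": "bound"},
            ],
            "type": "free",
        },
        {"text": "ion", "type": "bound"},
        {"text": "al", "type": "bound"},
        {"text": "ly", "type": "bound"},
    ],
}

leaves = list(leaf_morphemes(result["tree"]))
print([text for text, _ in leaves])  # ['organ', 'ize', 'ion', 'al', 'ly']

# In this example the number of leaves matches the reported count.
assert len(leaves) == result["morpheme_count"]
```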
Types definition:
- root: The root of the word. Some words have multiple roots (example: milkshake).
- bound: Adds to the root morpheme. Does not contribute meaning on its own.
- free: A word which can be used on its own. There can be multiple free types in a single word (example: milkshake).
Words which are not found are marked with status NOT_FOUND and will default to 1 morpheme. This will be improved in future releases.
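Since unknown words silently count as 1 morpheme, it can help to track them separately when aggregating counts. The following is a sketch under assumptions: the summarize helper is hypothetical, the sample results are hand-written rather than produced by the library, and it assumes a NOT_FOUND result carries the same "word", "status" and "morpheme_count" keys as the example output.

```python
def summarize(results):
    """Return (total morpheme count, list of words with status NOT_FOUND)."""
    total = sum(r["morpheme_count"] for r in results)
    unknown = [r["word"] for r in results if r["status"] == "NOT_FOUND"]
    return total, unknown


# Hand-written sample results; the second entry is a hypothetical
# NOT_FOUND result whose shape is an assumption.
sample = [
    {"word": "organizationally", "status": "FOUND_IN_DATABASE", "morpheme_count": 5},
    {"word": "blorptastic", "status": "NOT_FOUND", "morpheme_count": 1},
]

total, unknown = summarize(sample)
print(total)    # 6
print(unknown)  # ['blorptastic']
```

Reporting the unknown words alongside the total makes it easier to judge how much of the count rests on the 1-morpheme default.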
NOTE: the data path specified is where the morphemes library will store a database containing morphemes from MorphoLex-en along with other lookups to help properly detect morphemes.
Roadmap
- Morpheme detection of known words
- Handling of common names and places (counted as 1 morpheme)
- Handling of unknown words
See the open issues for a full list of proposed features (and known issues).
Developers
Clone the repo and use the Makefile to build a local version:
make install
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Do you want other languages supported? Are you a fluent speaker of the language you want? Help contribute and grow this project into a more universal morpheme solution!
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
ECSC, ltd - ecsctechdepartment@gmail.com
Project Link: https://github.com/ecscstatsconsulting/morphemes