A Python library to conjugate French, English, Spanish, Italian, Portuguese and Romanian verbs using Machine Learning techniques.
mlconjug
The conjugation model is a scikit-learn pipeline composed of:
a binary feature extractor,
a feature selector using Linear Support Vector Classification,
a classifier using Stochastic Gradient Descent.
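As a rough illustration of what a binary feature extractor over verb endings can look like, here is a minimal standard-library sketch. It is not mlconjug's own `extract_verb_features` function, which may compute additional features; the helper names `ending_ngrams` and `binary_vector` are hypothetical, and the `(2, 7)` range is simply borrowed from the training example in the Usage section.

```python
from typing import List, Set, Tuple


def ending_ngrams(verb: str, ngram_range: Tuple[int, int] = (2, 7)) -> List[str]:
    """Return the suffixes of `verb` whose length falls within `ngram_range`."""
    low, high = ngram_range
    longest = min(high, len(verb))
    return [verb[-n:] for n in range(low, longest + 1)]


def binary_vector(verb: str, vocabulary: List[str]) -> List[int]:
    """Encode `verb` as 0/1 flags, one per vocabulary entry (1 = suffix present)."""
    present: Set[str] = set(ending_ngrams(verb))
    return [1 if feature in present else 0 for feature in vocabulary]


# Build a tiny vocabulary from three French infinitives (illustrative data only).
verbs = ["manger", "partir", "finir"]
vocabulary = sorted({ngram for verb in verbs for ngram in ending_ngrams(verb)})

print(ending_ngrams("manger"))
# An unseen verb is encoded against the known vocabulary.
print(binary_vector("facebooker", vocabulary))
```

In a pipeline like the one described above, vectors of this kind would then go through feature selection and on to the classifier.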
Free software: MIT license
Documentation: https://mlconjug.readthedocs.io.
Supported Languages
French
English
Spanish
Italian
Portuguese
Romanian
Features
Easy to use API.
Includes pre-trained models with 99%+ accuracy in predicting the conjugation class of unknown verbs.
Easily train new models or add new languages.
Easily integrate MLConjug in your own projects.
Can be used as a command line tool.
Credits
This package was created with the help of Verbiste and scikit-learn.
The logo was designed by Zuur.
Installation
Stable release
To install MLConjug, run this command in your terminal:
$ pip install mlconjug
This is the preferred method to install MLConjug, as it will always install the most recent stable release.
If you don’t have pip installed, the Python installation guide can walk you through the process.
From sources
The sources for MLConjug can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/SekouD/mlconjug
Or download the tarball:
$ curl -OL https://github.com/SekouD/mlconjug/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
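Whichever installation method you choose, you may want to confirm that the package is visible to your interpreter. The following is a small sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of mlconjug.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found on the current import path."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the current environment can import mlconjug.
print("mlconjug installed:", is_installed("mlconjug"))
```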
Usage
To use MLConjug in a project with the provided pre-trained conjugation models:
    import mlconjug

    # Use mlconjug with the default parameters and a pre-trained conjugation model.
    default_conjugator = mlconjug.Conjugator(language='fr')

    # Verify that the model works
    test1 = default_conjugator.conjugate("manger").conjug_info['Indicatif']['Passé Simple']['1p']
    test2 = default_conjugator.conjugate("partir").conjug_info['Indicatif']['Passé Simple']['1p']
    test3 = default_conjugator.conjugate("facebooker").conjug_info['Indicatif']['Passé Simple']['1p']
    test4 = default_conjugator.conjugate("astigratir").conjug_info['Indicatif']['Passé Simple']['1p']
    test5 = default_conjugator.conjugate("mythoner").conjug_info['Indicatif']['Passé Simple']['1p']
    print(test1)
    print(test2)
    print(test3)
    print(test4)
    print(test5)

    # You can now iterate over all conjugated forms of a verb
    # by using the newly added Verb.iterate() method.
    default_conjugator = mlconjug.Conjugator(language='en')
    test_verb = default_conjugator.conjugate("be")
    all_conjugated_forms = test_verb.iterate()
    print(all_conjugated_forms)
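The lookups above suggest that `conjug_info` is a nested mapping of mood, then tense, then person. The sketch below walks such a structure with the standard library only. The sample dictionary is hand-written for illustration and is not actual library output; the `'1s'` key and the `flatten` helper are assumptions, and the real structure may contain entries that do not follow this three-level shape.

```python
from typing import Dict, Iterator, Tuple

# Hand-written sample shaped like the lookups above; NOT actual library output.
sample_conjug_info: Dict[str, Dict[str, Dict[str, str]]] = {
    "Indicatif": {
        "Présent": {"1s": "mange", "1p": "mangeons"},
        "Passé Simple": {"1s": "mangeai", "1p": "mangeâmes"},
    },
}


def flatten(conjug_info: Dict[str, Dict[str, Dict[str, str]]]
            ) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (mood, tense, person, form) for every entry of the nested mapping."""
    for mood, tenses in conjug_info.items():
        for tense, persons in tenses.items():
            for person, form in persons.items():
                yield mood, tense, person, form


forms = list(flatten(sample_conjug_info))
for row in forms:
    print(row)
```

A flat list of this kind is convenient for building lookup tables or exporting conjugations to CSV.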
To use MLConjug in a project and train a new model:
    import pickle
    from functools import partial

    import mlconjug

    # Set a language to train the Conjugator on
    lang = 'fr'

    # Set an n-gram range sliding window for the vectorizer
    ngrange = (2, 7)

    # Transform the dataset with CountVectorizer.
    # The function extract_verb_features is passed to the CountVectorizer as its analyzer.
    vectorizer = mlconjug.CountVectorizer(
        analyzer=partial(mlconjug.extract_verb_features, lang=lang, ngram_range=ngrange),
        binary=True)

    # Feature reduction
    feature_reductor = mlconjug.SelectFromModel(
        mlconjug.LinearSVC(penalty="l1", max_iter=12000, dual=False, verbose=0))

    # Prediction classifier
    classifier = mlconjug.SGDClassifier(
        loss="log", penalty='elasticnet', l1_ratio=0.15,
        max_iter=4000, alpha=1e-5, random_state=42, verbose=0)

    # Initialize the data set
    dataset = mlconjug.DataSet(mlconjug.Verbiste(language=lang).verbs)
    dataset.construct_dict_conjug()
    dataset.split_data(proportion=0.9)

    # Initialize the Conjugator
    model = mlconjug.Model(vectorizer, feature_reductor, classifier)
    conjugator = mlconjug.Conjugator(lang, model)

    # Training and prediction
    conjugator.model.train(dataset.train_input, dataset.train_labels)
    predicted = conjugator.model.predict(dataset.test_input)

    # Assess the performance of the model's predictions (fraction of correct labels)
    score = sum(1 for a, b in zip(predicted, dataset.test_labels) if a == b) / len(predicted)
    print('The score of the model is {0}'.format(score))

    # Verify that the model works
    test1 = conjugator.conjugate("manger").conjug_info['Indicatif']['Passé Simple']['1p']
    test2 = conjugator.conjugate("partir").conjug_info['Indicatif']['Passé Simple']['1p']
    test3 = conjugator.conjugate("facebooker").conjug_info['Indicatif']['Passé Simple']['1p']
    test4 = conjugator.conjugate("astigratir").conjug_info['Indicatif']['Passé Simple']['1p']
    test5 = conjugator.conjugate("mythoner").conjug_info['Indicatif']['Passé Simple']['1p']
    print(test1)
    print(test2)
    print(test3)
    print(test4)
    print(test5)

    # Save the trained model
    with open('path/to/save/data/trained_model-fr.pickle', 'wb') as file:
        pickle.dump(conjugator.model, file)
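The training example holds out part of the data (`proportion=0.9`) and scores the model as the share of test labels it predicts correctly. The standalone sketch below shows those two steps with the standard library only. The functions `split_data` and `accuracy` here are hypothetical stand-ins written for this illustration, not mlconjug's own implementation, and the labels are placeholders.

```python
import random
from typing import List, Sequence, Tuple


def split_data(items: Sequence, proportion: float = 0.9,
               seed: int = 42) -> Tuple[List, List]:
    """Shuffle `items` and split them into (train, test) by `proportion`."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted: Sequence, expected: Sequence) -> float:
    """Fraction of positions where the prediction equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not predicted:
        return 0.0
    matches = sum(1 for a, b in zip(predicted, expected) if a == b)
    return matches / len(predicted)


# Ten dummy items split 90/10.
train, test = split_data(range(10), proportion=0.9)
print(len(train), len(test))

# Placeholder class labels: three of the four predictions are correct.
print(accuracy(["A", "B", "A", "C"], ["A", "B", "A", "D"]))
```

Scoring on held-out data rather than on the training set gives a more honest estimate of how the model behaves on verbs it has never seen.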
To use MLConjug from the command line:
$ mlconjug manger
$ mlconjug bring -l en
$ mlconjug gallofar --language es
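If you want a similar command-line interface in your own wrapper script, the option layout shown above can be mirrored with `argparse`. This is a hypothetical sketch, not mlconjug's actual CLI code. The `fr`, `en` and `es` codes appear in this document; the `it`, `pt` and `ro` codes, the `fr` default, and support for several verbs at once are assumptions.

```python
import argparse

# 'fr', 'en' and 'es' appear in the examples above; the other codes are assumed.
SUPPORTED_LANGUAGES = ["fr", "en", "es", "it", "pt", "ro"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with positional verbs and a -l/--language option."""
    parser = argparse.ArgumentParser(prog="conjugate-sketch")
    parser.add_argument("verbs", nargs="+", help="one or more verbs to conjugate")
    parser.add_argument(
        "-l", "--language",
        choices=SUPPORTED_LANGUAGES,
        default="fr",  # assumed default, since `mlconjug manger` passes no language
        help="language code of the verbs",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["gallofar", "--language", "es"])
print(args.verbs, args.language)
```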
History
3.4 (2019-04-29)
Fixed a bug where verbs sharing no common root with their conjugated forms got their root inserted as a prefix.
Added the method iterate() to the Verb class as per @poolebu’s feature request.
Updated Dependencies.
3.3.2 (2019-04-06)
Corrected a bug with regular English verbs not being properly regulated. Thanks to @vectomon.
Updated Dependencies.
3.3.1 (2019-04-02)
Corrected bug when updating dependencies to use scikit-learn v 0.20.2 and higher.
Updated Dependencies.
3.3 (2019-03-04)
Updated Dependencies to use scikit-learn v 0.20.2 and higher.
Updated the pre-trained models to use scikit-learn v 0.20.2 and higher.
3.2.3 (2019-02-26)
Updated Dependencies.
Fixed a bug that prevented the installation of the pre-trained models.
3.2.2 (2018-11-18)
Updated Dependencies.
3.2.0 (2018-11-04)
Updated Dependencies.
3.1.3 (2018-07-10)
Updated Documentation.
Added support for pipenv.
Included tests and documentation in the package distribution.
3.1.2 (2018-06-27)
Updated type annotations across the whole library for PEP 561 compliance.
3.1.1 (2018-06-26)
Minor API enhancement (see API documentation).
3.1.0 (2018-06-24)
Updated the conjugation models for Spanish and Portuguese.
Changed the internal format of the Verbiste data from XML to JSON for better handling of Unicode characters.
New class ConjugManager to more easily add new languages to mlconjug.
Minor API enhancement (see API documentation).
3.0.1 (2018-06-22)
- Updated all provided pre-trained prediction models:
Implemented a new vectorizer extracting more meaningful features.
As a result, the performance of the models has improved substantially in all languages.
Recall and precision are very close to 100% in all languages, with English being the only one to achieve a perfect score for both.
- Major API changes:
I removed the class EndingCustomVectorizer and refactored its functionality into a top-level function called extract_verb_features().
The new, improved models are now zip-compressed before release, because the feature space has grown so much that their size made them impractical to distribute with the package.
Renamed “Model.model” to “Model.pipeline”
Renamed “DataSet.liste_verbes” and “DataSet.liste_templates” to “DataSet.verbs_list” and “DataSet.templates_list” respectively. (Pardon my french ;-) )
Added the attributes “predicted” and “confidence_score” to the class Verb.
The whole package has been type-checked. I will soon add mlconjug’s type stubs to typeshed.
2.1.11 (2018-06-21)
- Updated all provided pre-trained prediction models
The French Conjugator has an accuracy of about 99.94% in predicting the correct conjugation class of a French verb. This is the baseline, as I have been working on it for some time now.
The English Conjugator has an accuracy of about 99.78% in predicting the correct conjugation class of an English verb. This is one of the biggest improvements since version 2.0.0.
The Spanish Conjugator has an accuracy of about 99.65% in predicting the correct conjugation class of a Spanish verb. It has also seen a sizable improvement since version 2.0.0.
The Romanian Conjugator has an accuracy of about 99.06% in predicting the correct conjugation class of a Romanian verb. This is by far the biggest gain. I modified the vectorizer to better take into account the morphological features of Romanian verbs. (The previous score was about 86%, so it will be nice for our Romanian friends to have a trusted conjugator.)
The Portuguese Conjugator has an accuracy of about 96.73% in predicting the correct conjugation class of a Portuguese verb.
The Italian Conjugator has an accuracy of about 94.05% in predicting the correct conjugation class of an Italian verb.
2.1.9 (2018-06-21)
- Now the Conjugator adds additional information to the Verb object returned.
If the verb under consideration is already in Verbiste, the conjugation for the verb is retrieved directly from memory.
If the verb under consideration is unknown in Verbiste, the Conjugator class now sets the boolean attribute ‘predicted’ and the float attribute ‘confidence_score’ on the Verb object that Conjugator.conjugate(verb) returns.
Added Type annotations to the whole library for robustness and ease of scaling-out.
The performance of the English and Romanian models has improved significantly lately. I guess in a few more iterations they will be on par with the French model, which is the best performing at the moment, as I have been tuning its parameters for a couple of years now. The other languages have not improved as much, but if you update regularly you will see nice improvements in the 2.2 release.
Enhanced the localization of the program.
Now the user interface of mlconjug is available in French, Spanish, Italian, Portuguese and Romanian, in addition to English.
All the documentation of the project has been translated into the supported languages.
2.1.5 (2018-06-15)
Added localization.
Now the user interface of mlconjug is available in French, Spanish, Italian, Portuguese and Romanian, in addition to English.
2.1.2 (2018-06-15)
Added invalid verb detection.
2.1.0 (2018-06-15)
Updated all language models for compatibility with scikit-learn 0.19.1.
2.0.0 (2018-06-14)
Includes English conjugation model.
Includes Spanish conjugation model.
Includes Italian conjugation model.
Includes Portuguese conjugation model.
Includes Romanian conjugation model.
1.2.0 (2018-06-12)
Refactored the API. Now a single class, Conjugator, is needed to interface with the module.
Includes improved French conjugation model.
Added support for multiple languages.
1.1.0 (2018-06-11)
Refactored the API. Now a single class, Conjugator, is needed to interface with the module.
Includes improved French conjugation model.
1.0.0 (2018-06-10)
First release on PyPI.