Kadot, unsupervised natural language processing.

.. figure:: https://github.com/the-new-sky/Kadot/raw/master/logo.png
   :alt: Kadot

   Kadot

Unsupervised natural language processing library.
-------------------------------------------------

|Build Status| |Code Health| |PyPI version| |GitHub license|

**Kadot** lets you process text easily.

.. code:: python

    >>> from kadot import Text
    >>> hello_world = Text("Kadot just lets you process a text easily.")
    >>> hello_world.ngrams(n=2)
    [('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]

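
An *n*-gram is a sequence of *n* consecutive tokens, so ``ngrams(n=2)`` pairs each word with the word that follows it. The snippet below is a minimal pure-Python sketch of that idea, assuming the text has already been split into words (here with ``str.split()`` on the sentence without its final period). The ``ngrams`` function is a hypothetical helper, not Kadot's implementation.

.. code:: python

    def ngrams(tokens, n=2):
        """Return the contiguous n-grams of a token list as tuples (hypothetical helper)."""
        # zip() over n progressively shifted views of the list stops at the
        # shortest view, so len(tokens) - n + 1 tuples are produced
        # (none at all if n is larger than the number of tokens)
        return list(zip(*(tokens[i:] for i in range(n))))


    words = "Kadot just lets you process a text easily".split()
    print(ngrams(words, n=2)[:3])
    # [('Kadot', 'just'), ('just', 'lets'), ('lets', 'you')]

Zipping shifted slices avoids explicit index arithmetic; with ``n=3`` the same function yields trigrams such as ``('Kadot', 'just', 'lets')``.
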

🔋 What's included?
--------------------


Kadot includes **tokenizers**, text **generators**, **classifiers**, and word-level and
document-level **vectorizers**.

Kadot's philosophy is *"never hardcode the language rules"*: it relies on
**unsupervised solutions** so that it can support most languages. For this
reason it will never include Treebank-based algorithms (such as a POS tagger);
use `TextBlob <https://textblob.readthedocs.io/en/dev/>`__ for those.


🤔 How to use it?
------------------


You can use the TextBlob-like syntax:

.. code:: python

    >>> from kadot import Text
    >>> example_text = Text("This is an example text !")
    >>> example_text.words
    ['This', 'is', 'an', 'example', 'text']
    >>> example_text.ngrams(n=2)
    [('This', 'is'), ('is', 'an'), ('an', 'example'), ('example', 'text')]

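
In the example above, ``words`` keeps the word tokens and drops the standalone ``!``. One simple way to get that behaviour without language-specific rules is a Unicode-aware regular expression. The sketch below assumes that this behaviour is what matters; Kadot's own tokenizer may work differently, and ``words`` here is a hypothetical function, not Kadot's API.

.. code:: python

    import re

    # In Python 3, \w on str patterns matches Unicode letters, digits and the
    # underscore, so accented words are kept and punctuation such as "!" is skipped
    WORD_PATTERN = re.compile(r"\w+")


    def words(text):
        """Return the word tokens of ``text``, dropping punctuation (hypothetical helper)."""
        return WORD_PATTERN.findall(text)


    print(words("This is an example text !"))
    # ['This', 'is', 'an', 'example', 'text']

This pattern also splits contractions: ``words("don't")`` returns ``['don', 't']``.
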

You can also use the word vectorizer to find relations between words:

.. code:: python

    >>> from kadot import Text
    >>> large_corpus = """Enter a large text here, preferably about history."""
    >>> history_book = Text(large_corpus)
    >>> vectors = history_book.vectorize(window=20, reduce_rate=300)
    >>> # 'man' is to 'woman' what 'king' is to...
    >>> vectors.apply_translation(vectors['man'], vectors['woman'], vectors['king'], best=1)
    [('queen', 0.86899999)]

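
``apply_translation`` answers an analogy query. A widely used technique for this with word vectors is vector-offset arithmetic: translate *king* by the offset that leads from *man* to *woman* (king + woman - man), then return the remaining word whose vector is closest by cosine similarity. The sketch below illustrates that technique on tiny hand-made vectors. Whether Kadot ranks candidates exactly this way is an assumption, and ``cosine`` and ``analogy`` are hypothetical helpers, not part of Kadot's API.

.. code:: python

    import math


    def cosine(u, v):
        """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0


    def analogy(vectors, a, b, c, best=1):
        """Return the `best` words d such that `a` is to `b` what `c` is to d (hypothetical)."""
        # Translate c by the offset that leads from a to b: c + (b - a)
        target = [vc + (vb - va) for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
        # Score every other word by cosine similarity to the translated vector
        scores = [(word, cosine(target, vec))
                  for word, vec in vectors.items() if word not in (a, b, c)]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:best]


    # Toy 3-dimensional vectors, invented for illustration only
    toy = {
        "man":   [1.0, 0.0, 0.2],
        "woman": [1.0, 1.0, 0.2],
        "king":  [1.0, 0.0, 1.0],
        "queen": [1.0, 1.0, 1.0],
        "apple": [0.0, 0.1, 0.0],
    }
    print(analogy(toy, "man", "woman", "king", best=1)[0][0])  # queen

The three query words are excluded from the candidates, as is common practice, because the translated vector often stays closest to *king* itself.
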

For more usage examples, see the
`examples <https://github.com/the-new-sky/Kadot/blob/master/examples>`__ directory.
More advanced documentation is coming.

🔨 Installation
----------------

Use the ``pip`` command that belongs to your Python 3.5 or 3.6
interpreter, for example:

::

    $ pip3 install kadot

⚖️ License
----------

Kadot is released under the `MIT
license <https://github.com/the-new-sky/Kadot/blob/master/LICENSE.md>`__.

|forthebadge|

.. |Build Status| image:: https://travis-ci.org/the-new-sky/Kadot.svg?branch=master
   :target: https://travis-ci.org/the-new-sky/Kadot

.. |Code Health| image:: https://landscape.io/github/the-new-sky/Kadot/master/landscape.svg?style=flat
   :target: https://landscape.io/github/the-new-sky/Kadot/master

.. |PyPI version| image:: https://badge.fury.io/py/Kadot.svg
   :target: https://badge.fury.io/py/Kadot

.. |GitHub license| image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://raw.githubusercontent.com/the-new-sky/Kadot/master/LICENSE.md

.. |forthebadge| image:: http://forthebadge.com/badges/built-with-love.svg
   :target: http://forthebadge.com

Source distribution: ``Kadot-0.1.8.tar.gz`` (6.6 kB).