polyglot
========

|Downloads| |Latest Version| |Build Status| |Documentation Status|

.. |Downloads| image:: https://img.shields.io/pypi/dm/polyglot.svg
   :target: https://pypi.python.org/pypi/polyglot
.. |Latest Version| image:: https://badge.fury.io/py/polyglot.svg
   :target: https://pypi.python.org/pypi/polyglot
.. |Build Status| image:: https://travis-ci.org/aboSamoor/polyglot.png?branch=master
   :target: https://travis-ci.org/aboSamoor/polyglot
.. |Documentation Status| image:: https://readthedocs.org/projects/polyglot/badge/?version=latest
   :target: https://readthedocs.org/builds/polyglot/

Polyglot is a natural language pipeline that supports massive
multilingual applications.

- Free software: GPLv3 license
- Documentation: http://polyglot.readthedocs.org.
- GitHub: https://github.com/aboSamoor/polyglot

Features
~~~~~~~~

- Tokenization (165 Languages)
- Language detection (196 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages)
- Word Embeddings (137 Languages)
- Morphological analysis (135 Languages)
- Transliteration (69 Languages)

Developer
~~~~~~~~~

- Rami Al-Rfou @ ``rmyeid gmail com``

Quick Tutorial
--------------

.. code:: python

    import polyglot
    from polyglot.text import Text, Word

Language Detection
~~~~~~~~~~~~~~~~~~

.. code:: python

    text = Text("Bonjour, Mesdames.")
    print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))


.. parsed-literal::

    Language Detected: Code=fr, Name=French



Tokenization
~~~~~~~~~~~~

.. code:: python

    zen = Text("Beautiful is better than ugly. "
               "Explicit is better than implicit. "
               "Simple is better than complex.")
    print(zen.words)


.. parsed-literal::

    [u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']


.. code:: python

    print(zen.sentences)


.. parsed-literal::

    [Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]


Part of Speech Tagging
~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    text = Text(u"O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906.")

    print("{:<16}{}".format("Word", "POS Tag")+"\n"+"-"*30)
    for word, tag in text.pos_tags:
        print(u"{:<16}{:>2}".format(word, tag))


.. parsed-literal::

    Word            POS Tag
    ------------------------------
    O               DET
    primeiro        ADJ
    uso             NOUN
    de              ADP
    desobediência   NOUN
    civil           ADJ
    em              ADP
    massa           NOUN
    ocorreu         ADJ
    em              ADP
    setembro        NOUN
    de              ADP
    1906            NUM
    .               PUNCT
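
The ``(word, tag)`` pairs shown above can be summarized with the standard library alone. The following is a minimal sketch, not part of polyglot: ``tag_counts`` is a hypothetical helper, and the pairs are copied by hand from the output above so the snippet runs without any models installed. With polyglot available, the same list could presumably be taken from ``text.pos_tags``.

.. code:: python

    from collections import Counter

    # (word, tag) pairs copied from the tagger output above.
    pos_tags = [
        ("O", "DET"), ("primeiro", "ADJ"), ("uso", "NOUN"), ("de", "ADP"),
        ("desobediência", "NOUN"), ("civil", "ADJ"), ("em", "ADP"),
        ("massa", "NOUN"), ("ocorreu", "ADJ"), ("em", "ADP"),
        ("setembro", "NOUN"), ("de", "ADP"), ("1906", "NUM"), (".", "PUNCT"),
    ]


    def tag_counts(pairs):
        """Count how often each POS tag occurs in (word, tag) pairs.

        Hypothetical helper for illustration; not a polyglot function.
        """
        return Counter(tag for _, tag in pairs)


    counts = tag_counts(pos_tags)
    # Print the tags from most to least frequent.
    for tag, n in counts.most_common():
        print("{:<8}{}".format(tag, n))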


Named Entity Recognition
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: python

    text = Text(u"In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
    print(text.entities)


.. parsed-literal::

    [I-LOC([u'Gro\xdfbritannien']), I-PER([u'Gandhi'])]


Polarity
~~~~~~~~

.. code:: python

    print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
    for w in zen.words[:6]:
        print("{:<16}{:>2}".format(w, w.polarity))


.. parsed-literal::

    Word            Polarity
    ------------------------------
    Beautiful        0
    is               0
    better           1
    than             0
    ugly            -1
    .                0
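
Word-level scores like these are often combined into a single number for a phrase. One simple way to do that is sketched below, assuming the scores are the integers -1, 0 and 1 as in the table above. ``mean_polarity`` is a hypothetical helper, not a polyglot function, and averaging only the non-neutral words is an illustrative choice rather than the library's own method.

.. code:: python

    def mean_polarity(polarities):
        """Average the non-zero polarity scores.

        Returns 0.0 when every score is neutral or the input is empty.
        Hypothetical helper for illustration; not a polyglot function.
        """
        scored = [p for p in polarities if p != 0]
        if not scored:
            return 0.0
        return sum(scored) / float(len(scored))


    # Scores copied from the table above, in word order.
    word_polarities = [0, 0, 1, 0, -1, 0]
    print(mean_polarity(word_polarities))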


Embeddings
~~~~~~~~~~

.. code:: python

    word = Word("Obama", language="en")
    print("Neighbors (Synonyms) of {}".format(word)+"\n"+"-"*30)
    for w in word.neighbors:
        print("{:<16}".format(w))
    print("\n\nThe first 10 dimensions out of the {} dimensions\n".format(word.vector.shape[0]))
    print(word.vector[:10])


.. parsed-literal::

    Neighbors (Synonyms) of Obama
    ------------------------------
    Bush
    Reagan
    Clinton
    Ahmadinejad
    Nixon
    Karzai
    McCain
    Biden
    Huckabee
    Lula


    The first 10 dimensions out of the 256 dimensions

    [-2.57382345  1.52175975  0.51070285  1.08678675 -0.74386948 -1.18616164
      2.92784619 -0.25694436 -1.40958667 -2.39675403]
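
To give a feel for how a neighbor list can be derived from vectors like the one printed above, here is a minimal pure-Python sketch. It assumes neighbors are ranked by Euclidean distance, which this README does not state. The ``euclidean`` and ``nearest`` helpers and the three-dimensional toy vectors are made up for illustration and are not part of polyglot.

.. code:: python

    import math


    def euclidean(a, b):
        """Euclidean distance between two equal-length vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


    def nearest(query, vectors, k=2):
        """Return the k keys whose vectors lie closest to ``query``.

        Hypothetical helper; the distance metric is an assumption.
        """
        ranked = sorted(vectors, key=lambda name: euclidean(query, vectors[name]))
        return ranked[:k]


    # Made-up 3-dimensional vectors, for illustration only.
    toy = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.9, 0.1, 0.0],
        "c": [0.0, 1.0, 0.0],
        "d": [0.0, 0.0, 5.0],
    }
    print(nearest([1.0, 0.0, 0.0], toy, k=2))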


Morphology
~~~~~~~~~~

.. code:: python

    word = Text("Preprocessing is an essential step.").words[0]
    print(word.morphemes)


.. parsed-literal::

    [u'Pre', u'process', u'ing']


Transliteration
~~~~~~~~~~~~~~~

.. code:: python

    from polyglot.transliteration import Transliterator
    transliterator = Transliterator(source_lang="en", target_lang="ru")
    print(transliterator.transliterate(u"preprocessing"))


.. parsed-literal::

    препрокессинг





History
-------

"14.11" (2014-01-11)
---------------------

* First release on PyPI.


"15.5.2" (2015-05-02)
---------------------

* Polyglot is feature complete.


"15.10.03" (2015-10-03)
---------------------------

* Change the polyglot models mirror to Stony Brook University DSL lab instead
  of Google cloud storage.


"16.07.04" (2016-07-03)
---------------------------

* New Features:

  - Support Transfer POS Tagging.
  - Support supplying ``hint_language_code`` for ``Text``.

* Bug Fixes:

  - Improve sentence serialization (PR #34)
  - Fix rare unicode encode error (PR #35)
  - Fix transliteration from languages other than English (PR #46)
  - Add link to GitHub in README (PR #49)
  - Make handling of paths more coherent (PR #55)
  - Fix in-place normalization of embeddings for NER corrupting the POS features (issue #60, PR #62)