
Polyglot is a natural language pipeline that supports massive multilingual applications.

Project description



Features

  • Tokenization (165 Languages)

  • Language detection (196 Languages)

  • Named Entity Recognition (40 Languages)

  • Part of Speech Tagging (16 Languages)

  • Sentiment Analysis (136 Languages)

  • Word Embeddings (137 Languages)

  • Morphological analysis (135 Languages)

  • Transliteration (69 Languages)

Developer

  • Rami Al-Rfou @ rmyeid gmail com

Quick Tutorial

import polyglot
from polyglot.text import Text, Word

Language Detection

text = Text("Bonjour, Mesdames.")
print("Language Detected: Code={}, Name={}\n".format(text.language.code, text.language.name))
Language Detected: Code=fr, Name=French
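polyglot delegates language detection to the CLD2 detector through `pycld2`. The sketch below only illustrates the general idea of profile-based detection: it counts character trigrams and picks the language whose sample profile is most similar by cosine similarity. The `detect` helper and the two one-sentence profiles are hypothetical and far too small for real use.

```python
import math
from collections import Counter


def trigrams(text):
    """Count overlapping character trigrams, padding with spaces so word edges count."""
    padded = "  " + text.lower() + "  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(text, profiles):
    """Return the language code whose trigram profile is most similar to `text`."""
    grams = trigrams(text)
    return max(profiles, key=lambda code: cosine(grams, profiles[code]))


# Hypothetical toy profiles; a real detector is trained on large corpora.
PROFILES = {
    "en": trigrams("hello ladies and gentlemen, how are you"),
    "fr": trigrams("bonjour mesdames et messieurs, comment allez-vous"),
}

print(detect("Bonjour, Mesdames.", PROFILES))  # fr
```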

Tokenization

zen = Text("Beautiful is better than ugly. "
           "Explicit is better than implicit. "
           "Simple is better than complex.")
print(zen.words)
[u'Beautiful', u'is', u'better', u'than', u'ugly', u'.', u'Explicit', u'is', u'better', u'than', u'implicit', u'.', u'Simple', u'is', u'better', u'than', u'complex', u'.']
print(zen.sentences)
[Sentence("Beautiful is better than ugly."), Sentence("Explicit is better than implicit."), Sentence("Simple is better than complex.")]
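polyglot segments text with ICU (via PyICU), which handles many scripts. For space-delimited text like the example above, two regular expressions give a rough approximation of the same word and sentence splits. This is a sketch, not polyglot's implementation; it will mishandle abbreviations such as "e.g." and scripts written without spaces.

```python
import re

WORD_RE = re.compile(r"\w+|[^\w\s]")            # a run of word characters, or one punctuation mark
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")  # whitespace that follows terminal punctuation


def words(text):
    """Split text into word and punctuation tokens."""
    return WORD_RE.findall(text)


def sentences(text):
    """Split text into sentences at whitespace following '.', '!' or '?'."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


zen = ("Beautiful is better than ugly. "
       "Explicit is better than implicit. "
       "Simple is better than complex.")
print(words(zen))
print(sentences(zen))
```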

Part of Speech Tagging

text = Text(u"O primeiro uso de desobediência civil em massa ocorreu em setembro de 1906.")

print("{:<16}{}".format("Word", "POS Tag")+"\n"+"-"*30)
for word, tag in text.pos_tags:
    print(u"{:<16}{:>2}".format(word, tag))
Word            POS Tag
------------------------------
O               DET
primeiro        ADJ
uso             NOUN
de              ADP
desobediência   NOUN
civil           ADJ
em              ADP
massa           NOUN
ocorreu         ADJ
em              ADP
setembro        NOUN
de              ADP
1906            NUM
.               PUNCT
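`pos_tags` yields `(word, tag)` pairs using universal part-of-speech tags (DET, NOUN, ADP and so on). A small helper can group those pairs by tag, for example to pull out every noun. The pairs below are copied from the output above so the snippet runs without polyglot installed; `group_by_tag` is a hypothetical helper, not part of polyglot. With polyglot available, `group_by_tag(text.pos_tags)` should work the same way.

```python
from collections import defaultdict

# (word, tag) pairs copied from the pos_tags output above.
pos_tags = [
    ("O", "DET"), ("primeiro", "ADJ"), ("uso", "NOUN"), ("de", "ADP"),
    ("desobediência", "NOUN"), ("civil", "ADJ"), ("em", "ADP"), ("massa", "NOUN"),
    ("ocorreu", "ADJ"), ("em", "ADP"), ("setembro", "NOUN"), ("de", "ADP"),
    ("1906", "NUM"), (".", "PUNCT"),
]


def group_by_tag(pairs):
    """Map each POS tag to the words that received it, preserving sentence order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)


by_tag = group_by_tag(pos_tags)
print(by_tag["NOUN"])  # ['uso', 'desobediência', 'massa', 'setembro']
```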

Named Entity Recognition

text = Text(u"In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden")
print(text.entities)
[I-LOC([u'Gro\xdfbritannien']), I-PER([u'Gandhi'])]
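Each entity printed above is a chunk: a run of consecutive tokens that share one entity tag (I-PER, I-LOC or I-ORG). The sketch below shows one way such runs can be collected from per-token tags. The per-token tag list is written by hand for illustration, and `chunk_entities` is a hypothetical helper rather than polyglot's code.

```python
def chunk_entities(tokens, tags, outside="O"):
    """Group runs of consecutive tokens that share the same entity tag into (tag, tokens) chunks."""
    chunks = []
    previous = outside
    for token, tag in zip(tokens, tags):
        if tag != outside:
            if tag == previous:
                chunks[-1][1].append(token)  # same tag as the token before: extend the entity
            else:
                chunks.append((tag, [token]))  # different tag: start a new entity
        previous = tag
    return chunks


tokens = "In Großbritannien war Gandhi mit dem westlichen Lebensstil vertraut geworden".split()
# Hand-written per-token tags, assumed for illustration.
tags = ["O", "I-LOC", "O", "I-PER", "O", "O", "O", "O", "O", "O"]
print(chunk_entities(tokens, tags))  # [('I-LOC', ['Großbritannien']), ('I-PER', ['Gandhi'])]
```

Because every token of an entity carries the same I- tag, two different entities of the same type that sit directly next to each other would be merged into one chunk by this scheme.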

Polarity

print("{:<16}{}".format("Word", "Polarity")+"\n"+"-"*30)
for w in zen.words[:6]:
    print("{:<16}{:>2}".format(w, w.polarity))
Word            Polarity
------------------------------
Beautiful        0
is               0
better           1
than             0
ugly            -1
.                0
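In the table above, word polarities take the values -1, 0 and 1. One simple way to score a whole sentence, assumed here for illustration, is to average the non-neutral word polarities so that positive and negative words cancel out. The two-word lexicon is limited to the values shown in the table, and `sentence_polarity` is a hypothetical helper; polyglot's own sentence-level scoring may differ.

```python
# Polarity values copied from the table above; every other word is treated as neutral (0).
LEXICON = {"better": 1, "ugly": -1}


def sentence_polarity(words, lexicon=LEXICON):
    """Average the non-neutral word polarities; return 0.0 if every word is neutral."""
    scores = [lexicon.get(w.lower(), 0) for w in words]
    non_neutral = [s for s in scores if s != 0]
    if not non_neutral:
        return 0.0
    return sum(non_neutral) / len(non_neutral)


print(sentence_polarity(["Beautiful", "is", "better", "than", "ugly", "."]))  # 0.0
print(sentence_polarity(["Simple", "is", "better", "than", "complex", "."]))  # 1.0
```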

Embeddings

word = Word("Obama", language="en")
print("Nearest neighbors of {}".format(word)+"\n"+"-"*30)
for w in word.neighbors:
    print("{:<16}".format(w))
print("\n\nThe first 10 dimensions out of the {} dimensions\n".format(word.vector.shape[0]))
print(word.vector[:10])
Nearest neighbors of Obama
------------------------------
Bush
Reagan
Clinton
Ahmadinejad
Nixon
Karzai
McCain
Biden
Huckabee
Lula


The first 10 dimensions out of the 256 dimensions

[-2.57382345  1.52175975  0.51070285  1.08678675 -0.74386948 -1.18616164
  2.92784619 -0.25694436 -1.40958667 -2.39675403]
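`word.neighbors` lists the words whose vectors lie closest to the query word's vector. The sketch below ranks a tiny, made-up vocabulary by Euclidean distance. The two-dimensional vectors are invented for illustration (the English vectors above have 256 dimensions), and the choice of Euclidean distance is an assumption; polyglot may rank neighbors with a different metric.

```python
import math

# Made-up 2-D vectors; real embeddings are learned and much higher-dimensional.
TOY_VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.1, 0.9),
    "prince": (0.8, 1.3),
    "banana": (-3.0, 4.0),
}


def nearest_neighbors(word, vectors, top_k=2):
    """Return the top_k other words closest to `word` by Euclidean distance."""
    query = vectors[word]
    ranked = sorted(
        (other for other in vectors if other != word),
        key=lambda other: math.dist(query, vectors[other]),
    )
    return ranked[:top_k]


print(nearest_neighbors("king", TOY_VECTORS))  # ['queen', 'prince']
```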

Morphology

word = Text("Preprocessing is an essential step.").words[0]
print(word.morphemes)
[u'Pre', u'process', u'ing']
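polyglot's segmentations come from trained Morfessor models. As a much simpler stand-in, the sketch below splits a word into the fewest pieces drawn from a fixed morpheme list, using dynamic programming over prefixes. The morpheme list and the `segment` helper are hypothetical, chosen only to reproduce the example above.

```python
def segment(word, lexicon):
    """Split `word` into the fewest pieces found in `lexicon` (matched case-insensitively).

    Returns the word unsplit if no complete segmentation exists.
    """
    lower = word.lower()
    n = len(lower)
    # best[i] holds the (start, end) spans of the shortest segmentation of lower[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is not None and lower[start:end] in lexicon:
                candidate = best[start] + [(start, end)]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    if best[n] is None:
        return [word]
    # Slice the original word so capitalization is preserved, as in "Pre" above.
    return [word[s:e] for s, e in best[n]]


# Hypothetical morpheme inventory; "pro" and "cess" show that the fewest-pieces rule wins.
MORPHEMES = {"pre", "pro", "cess", "process", "ing"}
print(segment("Preprocessing", MORPHEMES))  # ['Pre', 'process', 'ing']
```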

Transliteration

from polyglot.transliteration import Transliterator
transliterator = Transliterator(source_lang="en", target_lang="ru")
print(transliterator.transliterate(u"preprocessing"))
препрокессинг
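polyglot's transliterator relies on trained models for each language pair. Purely to show the mechanics of converting between scripts, the hypothetical rule table below maps Latin letters to Cyrillic using greedy longest match, so multi-letter rules such as `sh` take priority over single letters. It happens to reproduce the output above, but it is not how polyglot works and would not generalize beyond these few letters.

```python
# Hypothetical Latin-to-Cyrillic rules; only enough letters for the examples below.
RULES = {
    "sh": "ш", "ch": "ч",
    "p": "п", "r": "р", "e": "е", "o": "о", "c": "к",
    "s": "с", "i": "и", "n": "н", "g": "г",
}
MAX_RULE_LEN = max(len(k) for k in RULES)


def transliterate(text, rules=RULES):
    """Greedy longest-match transliteration; characters without a rule pass through unchanged."""
    out = []
    lower = text.lower()
    i = 0
    while i < len(lower):
        # Try the longest possible rule first, then shorter ones.
        for size in range(min(MAX_RULE_LEN, len(lower) - i), 0, -1):
            chunk = lower[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no rule matched: keep the original character
            i += 1
    return "".join(out)


print(transliterate("preprocessing"))  # препрокессинг
```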

History

“14.11” (2014-01-11)

  • First release on PyPI.

“15.5.2” (2015-05-02)

  • Polyglot is feature complete.

“15.10.03” (2015-10-03)

  • Changed the polyglot model mirror from Google Cloud Storage to the Stony Brook University DSL lab.

Download files


Source Distribution

polyglot-15.10.03.tar.gz (126.4 kB)

Built Distribution


polyglot-15.10.03-py2.py3-none-any.whl (54.9 kB), for Python 2 and Python 3

File details

Details for the file polyglot-15.10.03.tar.gz.

File metadata

  • Download URL: polyglot-15.10.03.tar.gz
  • Size: 126.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for polyglot-15.10.03.tar.gz
Algorithm     Hash digest
SHA256        c63566ea655e8790f1fb8a3f5f60626418e0de085dd5447c4e43297a58e49f20
MD5           ca114b46b4f6150c6a8388c3eb4631da
BLAKE2b-256   d3c3fee35a094d07a3f19142ba64fa32446af2ee23584b5ee3c1f60519dc3b72
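To check a downloaded archive against the SHA256 digest listed above, the file can be hashed in chunks with Python's standard `hashlib` module. The helper name `sha256_matches` is illustrative, and the script assumes the archive sits in the current directory.

```python
import hashlib
import os

ARCHIVE = "polyglot-15.10.03.tar.gz"
EXPECTED_SHA256 = "c63566ea655e8790f1fb8a3f5f60626418e0de085dd5447c4e43297a58e49f20"


def sha256_matches(path, expected_hex, chunk_size=1 << 16):
    """Hash the file in fixed-size chunks and compare with the expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


if os.path.exists(ARCHIVE):
    print(sha256_matches(ARCHIVE, EXPECTED_SHA256))
```

Reading in chunks keeps memory use constant regardless of file size; an intact download should print `True`.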


File details

Details for the file polyglot-15.10.03-py2.py3-none-any.whl.


File hashes

Hashes for polyglot-15.10.03-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        f763f8ffb85f9b24ca1dde3e4fd41d1f5db8cbec32c1703b3fe37e8c5f1c64d9
MD5           9aaeac6ece72e4ace1703f793fdd9dd5
BLAKE2b-256   c3e9fe3669dbc44b4c4d1e4dc01e62dd89f5513d5ad537516284136f55002627

