Python library for managing stop words in many languages.

Python library for managing common stop words in 39 languages.

Usage

Simple

Rather than a long explanation, here is a direct introduction:

>>> from mots_vides import stop_words

>>> english_stop_words = stop_words('en')
>>> text = """
... Even though using "lorem ipsum" often arouses curiosity
... due to its resemblance to classical Latin,
... it is not intended to have meaning.
... """

>>> print(english_stop_words.rebase(text))
XXXX XXXXXX XXXXX "lorem ipsum" XXXXX arouses curiosity
XXX XX XXX resemblance XX classical Latin,
XX XX XXX intended XX XXXX meaning.

>>> print(english_stop_words.rebase(text, '').split())
['"lorem', 'ipsum"', 'arouses', 'curiosity', 'resemblance',
'classical', 'Latin,', 'intended', 'meaning.']
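
To make the behaviour above concrete, here is a minimal sketch of the masking idea in plain Python. It is not the library's implementation: `mask_stop_words` and `SAMPLE_STOP_WORDS` are hypothetical names, the word list is a tiny illustrative subset, and matching words with `\w+` is an assumption about how tokens are delimited.

```python
import re

# Tiny illustrative stop-word set (assumption); the real lists are much larger.
SAMPLE_STOP_WORDS = {"even", "though", "using", "often", "due",
                     "to", "its", "it", "is", "not", "have"}


def mask_stop_words(text, stop_words, replacement="X"):
    """Replace every letter of each stop word with `replacement`.

    Passing replacement='' removes the stop words entirely.
    """
    def _mask(match):
        word = match.group(0)
        # Compare case-insensitively so "It" is treated like "it".
        if word.lower() in stop_words:
            return replacement * len(word)
        return word

    # \w+ matches runs of word characters, so punctuation is left untouched.
    return re.sub(r"\w+", _mask, text)


sample = "It is not intended to have meaning."
masked = mask_stop_words(sample, SAMPLE_STOP_WORDS)
stripped = mask_stop_words(sample, SAMPLE_STOP_WORDS, "").split()
print(masked)
print(stripped)
```

Because only the matched words are rewritten, punctuation and spacing survive, which is why the remaining tokens can then be recovered with a plain `split()`.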

Advanced

Mots vides also provides two classes for managing the stop words in your language.

StopWord is a container for a collection of stop words. By default it is language agnostic, but it can easily be manipulated to build a collection:

>>> from mots_vides import StopWord

>>> french_stop_words = StopWord('french', ['le', 'la', 'les'])
>>> french_stop_words += StopWord('french', ['un', 'une', 'des'])
>>> french_stop_words += ['or', 'ni', 'car']
>>> french_stop_words += 'assez'
>>> french_stop_words += u'aussitôt'
>>> print(sorted(french_stop_words))
['assez', u'aussitôt', 'car', 'des', 'la', 'le', 'les', 'ni', 'or', 'un', 'une']
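
The `+=` behaviour shown above, which accepts another container, a list, or a single word, can be approximated with a small set-backed class. This is only a sketch written for Python 3: `StopWordSketch` is a hypothetical stand-in, not the `StopWord` class itself, and backing the collection with a `set` is an assumption.

```python
class StopWordSketch(object):
    """Hypothetical stand-in for a stop-word container."""

    def __init__(self, language, collection=()):
        self.language = language
        self.collection = set(collection)

    def __iadd__(self, other):
        if isinstance(other, str):
            # A bare string is one word, not an iterable of characters.
            self.collection.add(other)
        elif isinstance(other, StopWordSketch):
            # Merge the words of another container.
            self.collection |= other.collection
        else:
            # Any other iterable (list, tuple, set) of words.
            self.collection.update(other)
        return self

    def __iter__(self):
        return iter(self.collection)

    def __len__(self):
        return len(self.collection)

    def __contains__(self, word):
        return word in self.collection


words = StopWordSketch("french", ["le", "la", "les"])
words += StopWordSketch("french", ["un", "une", "des"])
words += ["or", "ni", "car"]
words += "assez"
result = sorted(words)
print(result)
```

Checking for `str` first matters: without it, `words += "assez"` would add the five letters individually rather than the word.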

StopWordFactory is a factory that initializes a StopWord object for a given language with the appropriate collection of stop words:

>>> from mots_vides import StopWordFactory

>>> factory = StopWordFactory()
>>> french_stop_words = factory.get_stop_words('french')
>>> print(len(french_stop_words))
577

You can also use an international language code to query a collection:

>>> french_stop_words = factory.get_stop_words('fr')
>>> print(len(french_stop_words))
577

If the requested language does not exist, a StopWordError is raised, unless the fail_safe parameter is set to True:

>>> klingon_stop_words = factory.get_stop_words('klingon')
StopWordError: Stop words are not available in "klingon".
>>> klingon_stop_words = factory.get_stop_words('klingon', fail_safe=True)
>>> print(len(klingon_stop_words))
0
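
The lookup rules described above (full name or language code, an error for unknown languages, an empty collection when `fail_safe` is set) can be sketched as follows. Everything here is hypothetical and illustrative: `get_stop_words_sketch`, `StopWordErrorSketch`, and the two small dictionaries are assumptions, not the factory's actual code or data.

```python
class StopWordErrorSketch(Exception):
    """Hypothetical error type, standing in for StopWordError."""


# Illustrative data only (assumption); the real collections are far larger.
LANGUAGE_CODES = {"fr": "french", "en": "english"}
COLLECTIONS = {
    "french": {"le", "la", "les"},
    "english": {"the", "a", "an"},
}


def get_stop_words_sketch(language, fail_safe=False):
    """Return the stop-word set for a language name or language code."""
    # Resolve a short code such as 'fr' to its full name, if known.
    name = LANGUAGE_CODES.get(language, language)
    try:
        return set(COLLECTIONS[name])
    except KeyError:
        if fail_safe:
            # Degrade to an empty collection instead of raising.
            return set()
        raise StopWordErrorSketch(
            'Stop words are not available in "%s".' % language)


by_name = get_stop_words_sketch("french")
by_code = get_stop_words_sketch("fr")
empty = get_stop_words_sketch("klingon", fail_safe=True)

try:
    get_stop_words_sketch("klingon")
    raised = False
except StopWordErrorSketch:
    raised = True
```

A fail-safe lookup is convenient in pipelines that process text in arbitrary languages, since filtering with an empty collection simply leaves the text unchanged.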

Supported languages

  • Arabic

  • Armenian

  • Basque

  • Bengali

  • Bulgarian

  • Catalan

  • Chinese

  • Czech

  • Danish

  • Dutch

  • English

  • Finnish

  • French

  • Galician

  • German

  • Greek

  • Hindi

  • Hungarian

  • Indonesian

  • Irish

  • Italian

  • Japanese

  • Korean

  • Latvian

  • Lithuanian

  • Marathi

  • Norwegian

  • Persian

  • Polish

  • Portuguese

  • Romanian

  • Russian

  • Slovak

  • Spanish

  • Swedish

  • Thai

  • Turkish

  • Ukrainian

  • Urdu

Compatibility

Tested with Python 2.6, 2.7, 3.2, 3.3, 3.4.

Authors

Notes

Mots vides means "stop words" in French.

Inspired by https://github.com/Alir3z4/python-stop-words

Changelog

2015.5.11

  • Fix cache system for Python 3

2015.2.6

  • Fix potential issue in factory.get_available_languages

2015.2.5

  • Fix packaging

  • Add a rebaser command script

2015.2.4

  • Initial release

2015.1.21.dev0

  • Development release
