
Python library for managing stop words in many languages.


Python library for managing common stop words in 39 languages.



Rather than a long explanation, here is a direct introduction:

>>> from mots_vides import stop_words

>>> english_stop_words = stop_words('en')
>>> text = """
... Even though using "lorem ipsum" often arouses curiosity
... due to its resemblance to classical Latin,
... it is not intended to have meaning.
... """

>>> print(english_stop_words.rebase(text))
XXXX XXXXXX XXXXX "lorem ipsum" XXXXX arouses curiosity
XXX XX XXX resemblance XX classical Latin,
XX XX XXX intended XX XXXX meaning.

>>> print(english_stop_words.rebase(text, '').split())
['"lorem', 'ipsum"', 'arouses', 'curiosity', 'resemblance',
'classical', 'Latin,', 'intended', 'meaning.']
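
To make the masking behaviour above concrete, here is a minimal, self-contained sketch of a rebase-like function. It is not the library's implementation: the name mask_stop_words, the regular expression, and the tiny STOP_WORDS set are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stop word set for illustration only;
# the real per-language collections are much larger.
STOP_WORDS = {"even", "though", "using", "often", "due", "to", "its",
              "it", "is", "not", "have"}

# Matches runs of word characters, so punctuation and spacing are preserved.
WORD = re.compile(r"\w+")


def mask_stop_words(text, stop_words=STOP_WORDS, replacement="X"):
    """Replace each stop word with `replacement` repeated once per character."""
    def substitute(match):
        word = match.group(0)
        # Compare case-insensitively so "It" is treated like "it".
        if word.lower() in stop_words:
            return replacement * len(word)
        return word
    return WORD.sub(substitute, text)


sentence = "It is not intended to have meaning."
print(mask_stop_words(sentence))
print(mask_stop_words(sentence, replacement="").split())
```

Passing an empty replacement removes the stop words entirely, which is why splitting the result leaves only the remaining tokens.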


Mots vides also provides two classes for managing the stop words in your language.

StopWord is a container for a collection of stop words. By default it is language agnostic, and a collection can easily be built up by adding other containers, lists, or single words:

>>> from mots_vides import StopWord

>>> french_stop_words = StopWord('french', ['le', 'la', 'les'])
>>> french_stop_words += StopWord('french', ['un', 'une', 'des'])
>>> french_stop_words += ['or', 'ni', 'car']
>>> french_stop_words += 'assez'
>>> french_stop_words += u'aussitôt'
>>> print(sorted(french_stop_words))
['assez', u'aussitôt', 'car', 'des', 'la', 'le', 'les', 'ni', 'or', 'un', 'une']

StopWordFactory is a factory that initializes StopWord objects for a given language with the appropriate collection of stop words:

>>> from mots_vides import StopWordFactory

>>> factory = StopWordFactory()
>>> french_stop_words = factory.get_stop_words('french')
>>> print(len(french_stop_words))

You can also use an international language code to query a collection:

>>> french_stop_words = factory.get_stop_words('fr')
>>> print(len(french_stop_words))

If the requested language does not exist, a StopWordError is raised, unless the fail_safe parameter is set to True:

>>> klingon_stop_words = factory.get_stop_words('klingon')
StopWordError: Stop words are not available in "klingon".
>>> klingon_stop_words = factory.get_stop_words('klingon', fail_safe=True)
>>> print(len(klingon_stop_words))
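
The lookup rules described above (language name or code, an error for unknown languages, and a fail-safe mode) can be sketched as follows. Everything here is hypothetical: TinyStopWordError, LANGUAGE_CODES, COLLECTIONS, and this get_stop_words function are illustrative stand-ins, and the sketch assumes that the fail-safe path returns an empty collection.

```python
class TinyStopWordError(Exception):
    """Hypothetical error raised when a language is not available."""


# Hypothetical sample data; real collections are far larger.
LANGUAGE_CODES = {"fr": "french", "en": "english"}
COLLECTIONS = {
    "french": {"le", "la", "les"},
    "english": {"the", "a", "an"},
}


def get_stop_words(language, fail_safe=False):
    """Return the stop words for a language name or code."""
    # Translate a code such as 'fr' to its full name; names pass through.
    language = LANGUAGE_CODES.get(language, language)
    try:
        return set(COLLECTIONS[language])
    except KeyError:
        if fail_safe:
            # Assumption: an unknown language yields an empty collection.
            return set()
        raise TinyStopWordError(
            'Stop words are not available in "%s".' % language)


print(get_stop_words('fr') == get_stop_words('french'))
print(len(get_stop_words('klingon', fail_safe=True)))
```

A fail-safe mode of this kind lets calling code process text in an unsupported language without special-casing the error.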

Supported languages

  • Arabic

  • Armenian

  • Basque

  • Bengali

  • Bulgarian

  • Catalan

  • Chinese

  • Czech

  • Danish

  • Dutch

  • English

  • Finnish

  • French

  • Galician

  • German

  • Greek

  • Hindi

  • Hungarian

  • Indonesian

  • Irish

  • Italian

  • Japanese

  • Korean

  • Latvian

  • Lithuanian

  • Marathi

  • Norwegian

  • Persian

  • Polish

  • Portuguese

  • Romanian

  • Russian

  • Slovak

  • Spanish

  • Swedish

  • Thai

  • Turkish

  • Ukrainian

  • Urdu


Tested with Python 2.6, 2.7, 3.2, 3.3, 3.4.



Mots vides means "stop words" in French.

Inspired from



  • Fix cache system for Python 3


  • Fix potential issue in factory.get_available_languages


  • Fix packaging

  • Add a rebaser command script


  • Initial release


  • Development release

