
Get list of common stop words in various languages in Python

Project description


Overview

A Python library providing curated lists of stop words across 34+ languages. Stop words are common words (like “the”, “is”, “at”) that are typically filtered out in natural language processing and text analysis tasks.

Key Features:

  • 34+ Languages - Extensive language support.

  • Performance - Built-in caching for fast repeated access.

  • Flexible - Custom filtering system for advanced use cases.

  • Zero Dependencies - Lightweight with no external requirements.

Available Languages

This library bundles every language available in the upstream stop word collection at https://github.com/Alir3z4/stop-words

Each language is identified by both its ISO 639-1 language code (e.g., en) and full name (e.g., english).

Installation

Via pip (Recommended):

$ pip install stop-words

Via Git:

$ git clone --recursive https://github.com/Alir3z4/python-stop-words.git
$ cd python-stop-words
$ pip install -e .

Requirements:

  • Python 3 - any release that has not reached end-of-life.

Quick Start

Basic Usage

from stop_words import get_stop_words

# Get English stop words using language code
stop_words = get_stop_words('en')

# Or use the full language name
stop_words = get_stop_words('english')

# Use in text processing
text = "The quick brown fox jumps over the lazy dog"
words = text.lower().split()
filtered_words = [word for word in words if word not in stop_words]
print(filtered_words)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

Safe Loading

Use safe_get_stop_words() when you’re not sure if a language is supported:

from stop_words import safe_get_stop_words

# Returns empty list instead of raising an exception
stop_words = safe_get_stop_words('klingon')  # Returns []

# Works normally with supported languages
stop_words = safe_get_stop_words('fr')  # Returns French stop words

Advanced Usage

Checking Available Languages

from stop_words import AVAILABLE_LANGUAGES, LANGUAGE_MAPPING

# List all available languages
print(AVAILABLE_LANGUAGES)
# ['arabic', 'bulgarian', 'catalan', ...]

# View language code mappings
print(LANGUAGE_MAPPING)
# {'en': 'english', 'fr': 'french', ...}

Caching Control

By default, stop words are cached for performance. You can control this behavior:

from stop_words import get_stop_words, STOP_WORDS_CACHE

# Disable caching for this call
stop_words = get_stop_words('en', cache=False)

# Clear the cache manually
STOP_WORDS_CACHE.clear()

# Check what's cached
print(STOP_WORDS_CACHE.keys())  # ['english', 'french', ...]

Custom Filters

Apply custom transformations to stop words using the filter system:

from stop_words import get_stop_words, add_filter, remove_filter

# Add a global filter (applies to all languages)
def remove_short_words(words, language):
    """Remove words shorter than 3 characters."""
    return [w for w in words if len(w) >= 3]

add_filter(remove_short_words)
stop_words = get_stop_words('en', cache=False)

# Add a language-specific filter
def uppercase_words(words):
    """Convert all words to uppercase."""
    return [w.upper() for w in words]

add_filter(uppercase_words, language='english')
stop_words = get_stop_words('en', cache=False)

# Remove a filter when done
remove_filter(uppercase_words, language='english')

Note: Filters only apply to newly loaded stop words, not cached ones. Use cache=False or clear the cache to apply new filters.

Practical Examples

Text Preprocessing

from stop_words import get_stop_words
import re

def preprocess_text(text, language='en'):
    """Clean and filter text for NLP tasks."""
    stop_words = set(get_stop_words(language))

    # Convert to lowercase and extract words
    words = re.findall(r'\b\w+\b', text.lower())

    # Remove stop words
    filtered_words = [w for w in words if w not in stop_words]

    return filtered_words

text = "The quick brown fox jumps over the lazy dog"
print(preprocess_text(text))
# ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

Multilingual Processing

from stop_words import get_stop_words

def filter_multilingual_text(texts_dict):
    """Process texts in multiple languages.

    Args:
        texts_dict: Dictionary mapping language codes to text strings

    Returns:
        Dictionary with filtered words for each language
    """
    results = {}

    for lang_code, text in texts_dict.items():
        stop_words = set(get_stop_words(lang_code))
        words = text.lower().split()
        filtered = [w for w in words if w not in stop_words]
        results[lang_code] = filtered

    return results

texts = {
    'en': 'The cat is on the table',
    'fr': 'Le chat est sur la table',
    'es': 'El gato está en la mesa'
}

print(filter_multilingual_text(texts))

Keyword Extraction

from stop_words import get_stop_words
from collections import Counter
import re

def extract_keywords(text, language='en', top_n=10):
    """Extract the most common meaningful words from text."""
    stop_words = set(get_stop_words(language))

    # Extract words and filter
    words = re.findall(r'\b\w+\b', text.lower())
    meaningful_words = [w for w in words if w not in stop_words and len(w) > 2]

    # Count and return top keywords
    word_counts = Counter(meaningful_words)
    return word_counts.most_common(top_n)

article = """
Python is a high-level programming language. Python is known for its
simplicity and readability. Many developers choose Python for data science.
"""

keywords = extract_keywords(article)
print(keywords)
# [('python', 3), ('high', 1), ('level', 1), ...]
# (note: the \w+ regex splits "high-level" into two words)

API Reference

Functions

get_stop_words(language, *, cache=True)

Load stop words for a specified language.

Parameters:

  • language (str): Language code (e.g., ‘en’) or full name (e.g., ‘english’)

  • cache (bool, optional): Enable caching. Defaults to True.

Returns:

  • list[str]: List of stop words

Raises:

  • StopWordError: If language is unavailable or files are unreadable

Example:

stop_words = get_stop_words('en')
stop_words = get_stop_words('french', cache=False)

safe_get_stop_words(language)

Safely load stop words, returning empty list on error.

Parameters:

  • language (str): Language code or full name

Returns:

  • list[str]: Stop words, or empty list if unavailable

Example:

stop_words = safe_get_stop_words('unknown')  # Returns []

add_filter(func, language=None)

Register a filter function for stop word post-processing.

Parameters:

  • func (Callable): Filter function

  • language (str | None, optional): Language code or None for global filter

Filter Signatures:

  • Language-specific: func(stopwords: list[str]) -> list[str]

  • Global: func(stopwords: list[str], language: str) -> list[str]

Example:

def remove_short(words, lang):
    return [w for w in words if len(w) > 3]

add_filter(remove_short)  # Global filter

remove_filter(func, language=None)

Remove a previously registered filter.

Parameters:

  • func (Callable): The filter function to remove

  • language (str | None, optional): Language code or None

Returns:

  • bool: True if removed, False if not found

Example:

success = remove_filter(my_filter, language='english')

Constants

AVAILABLE_LANGUAGES

List of all supported language names.

['arabic', 'bulgarian', 'catalan', ...]

LANGUAGE_MAPPING

Dictionary mapping language codes to full names.

{'en': 'english', 'fr': 'french', 'de': 'german', ...}

STOP_WORDS_CACHE

Dictionary storing cached stop words. Can be manually cleared.

STOP_WORDS_CACHE.clear()  # Clear all cached data

Exceptions

StopWordError

Raised when a language is unavailable or files cannot be read.

try:
    stop_words = get_stop_words('invalid')
except StopWordError as e:
    print(f"Error: {e}")

Performance Tips

  1. Use caching - Keep cache=True (default) for repeated access to the same language

  2. Reuse stop word sets - Convert to set() once for O(1) lookup performance:

    stop_words_set = set(get_stop_words('en'))
    # Fast membership testing
    is_stop_word = 'the' in stop_words_set
  3. Preload languages - Load stop words during initialization, not in tight loops

  4. Use safe_get_stop_words - Avoid try/except overhead when language availability is uncertain

Troubleshooting

“Language unavailable” error

  • Check spelling and use either the language code or full name

  • Verify the language is in AVAILABLE_LANGUAGES

  • See the Available Languages table above

“File is unreadable” error

  • Ensure the package installed correctly: pip install --force-reinstall stop-words

  • Check file permissions in the installation directory

  • Verify the stop-words subdirectory exists in the package

Filters not applying

  • Filters only affect newly loaded stop words

  • Clear the cache: STOP_WORDS_CACHE.clear()

  • Use cache=False when testing filters

Performance issues

  • Ensure caching is enabled (default behavior)

  • Convert stop word lists to sets for faster lookups

  • Preload stop words outside of loops

Contributing

Contributions are welcome! Here’s how you can help:

  1. Add new languages - Submit stop word lists for unsupported languages via https://github.com/Alir3z4/stop-words

  2. Improve existing lists - Suggest additions or removals for existing languages via https://github.com/Alir3z4/stop-words

  3. Report bugs - Open issues on GitHub

  4. Submit PRs - Fix bugs or add features

Repository: https://github.com/Alir3z4/python-stop-words

License

This project is licensed under the BSD 3-Clause License. See LICENSE file for details.

Changelog

See ChangeLog.rst for version history.

Project details


Download files

Source distribution: stop_words-2025.11.4.tar.gz (68.6 kB)

  • Uploaded via: twine/6.1.0 on CPython/3.13.7 (Trusted Publishing)
  • SHA256: 0459072b54b11e43a6fb4c5b05bda87d2accfc4f14c1697974f3739af0f7b43d
  • MD5: 0f8bbd9b602626c4c1268bbb01f781e9
  • BLAKE2b-256: b7cb27ee3d3e0b7b1169269e83331c075b2dd3c4bcc1a005821174c32a273dc4
  • Provenance: attested by pypi.yml on Alir3z4/python-stop-words (values reflect the state when the release was signed)

Built distribution: stop_words-2025.11.4-py3-none-any.whl (59.5 kB, Python 3)

  • Uploaded via: twine/6.1.0 on CPython/3.13.7 (Trusted Publishing)
  • SHA256: b3fc0722e42b722a9350aad59a8ba5850085a5b45a4ba9de390b4f5c4b86df25
  • MD5: 5e13fc1507a7286b246df694ab2ace83
  • BLAKE2b-256: fcf5992d668d21590ed39c6a9d1c62220e9b4b086a165e15fcb7580764cc7ceb
  • Provenance: attested by pypi.yml on Alir3z4/python-stop-words (values reflect the state when the release was signed)
