
An open-source Python package to clean raw text data

Project description


cleantext is an open-source Python package to clean raw text data. Source code for the library can be found here.


cleantext has two main methods:

  • clean: to clean raw text and return the cleaned text
  • clean_words: to clean raw text and return a list of clean words

cleantext can apply all, or a selected combination of the following cleaning operations:

  • Remove extra white space
  • Convert the entire text to lowercase
  • Remove digits from the text
  • Remove punctuation from the text
  • Remove stop words, with a choice of language for the stop-word list (stop words are the most common words in a language that carry little meaning on their own, such as is, am, the, this, and are)
  • Stem the words (stemming reduces inflected forms of a word to a common stem; for example, stemming run, runs, and running yields run, run, and run)
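
The operations above can be illustrated with a minimal standard-library sketch. This is not cleantext's implementation: the names basic_clean, remove_stop_words, and TOY_STOP_WORDS are hypothetical, the stop-word set is a tiny stand-in for a real list, and stemming is left out because it needs a proper stemmer such as the ones NLTK provides.

```python
import re
import string

# Tiny illustrative stop-word set (an assumption); a real list is much longer.
TOY_STOP_WORDS = {"is", "am", "the", "this", "are", "a", "to"}


def basic_clean(text):
    """Lowercase, drop digits and punctuation, then collapse white space."""
    text = text.lower()
    # Delete every ASCII digit and punctuation character in one pass.
    text = text.translate(str.maketrans("", "", string.digits + string.punctuation))
    # Collapse runs of white space last, since deletions can leave gaps.
    return re.sub(r"\s+", " ", text).strip()


def remove_stop_words(text, stop_words=TOY_STOP_WORDS):
    """Return the words of `text` that are not in `stop_words`."""
    return [word for word in text.split() if word not in stop_words]


raw = "This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133"
cleaned = basic_clean(raw)
words = remove_stop_words(cleaned)
print(cleaned)  # this is a sample text to clean
print(words)    # ['sample', 'text', 'clean']
```

White space is collapsed last in this sketch because removing punctuation and digits can leave new runs of spaces behind.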


cleantext requires Python 3 and NLTK to execute.

To install with pip, run:

pip install cleantext


  • Import the library:
import cleantext
  • Choose a method:

To return the cleaned text as a string,

cleantext.clean("your_raw_text_here", all=True)

To return the cleaned text as a list of words,

cleantext.clean_words("your_raw_text_here", all=True)

To choose a specific set of cleaning operations,

all=False,  # Set to True to execute all cleaning operations
extra_spaces=True,  # Remove extra white space
stemming=True,  # Stem the words
stopwords=True,  # Remove stop words
lowercase=True,  # Convert to lowercase
numbers=True,  # Remove all digits
punct=True,  # Remove all punctuation
stp_lang='english'  # Language for stop words
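
As a rough illustration of how boolean flags like these can select operations, here is a small standard-library sketch. It is an assumption about the general pattern, not cleantext's source: clean_sketch and OPERATIONS are hypothetical names, only four of the flags are modelled, and the order in which the operations run is a choice made for this example.

```python
import re
import string

# Hypothetical mapping from flag name to operation. Dicts preserve insertion
# order, so the operations always run in the order listed here.
OPERATIONS = {
    "lowercase": str.lower,
    "numbers": lambda s: s.translate(str.maketrans("", "", string.digits)),
    "punct": lambda s: s.translate(str.maketrans("", "", string.punctuation)),
    "extra_spaces": lambda s: re.sub(r"\s+", " ", s).strip(),
}


def clean_sketch(text, all=False, **flags):
    """Apply each operation whose flag is True, or every one if `all` is True.

    The parameter name `all` mirrors the flag listed above; it shadows the
    built-in only inside this function.
    """
    for name, operation in OPERATIONS.items():
        if all or flags.get(name, False):
            text = operation(text)
    return text


sample = "  Hello   World 42! "
print(clean_sketch(sample, lowercase=True, extra_spaces=True))  # hello world 42!
print(clean_sketch(sample, all=True))                           # hello world
```

Keeping the operations in one ordered table means a new flag needs only one new entry rather than another branch in the function.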


import cleantext
cleantext.clean('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133', extra_spaces=True, lowercase=True, numbers=True, punct=True)


'this is a sample text to clean'

import cleantext
cleantext.clean_words('This is A s$ample !!!! tExt3% to   cleaN566556+2+59*/133', all=True)


['sampl', 'text', 'clean']



For any questions, issues, bugs, or suggestions, please visit here


Files for cleantext, version 1.1.3
Filename                          Size    File type  Python version
cleantext-1.1.3-py3-none-any.whl  3.7 kB  Wheel      py3
cleantext-1.1.3.tar.gz            2.6 kB  Source     None
