This is for text preprocessing

Project description

Text and Tweet Preprocessing package

This package was created by Behdad (Ben) Ehsani. It is designed to clean tweets in a single pass with minimal code, and several of its functions can also be used for general text preprocessing. The example below shows typical usage.

Installing the library

pip install preprocessing-text-ben

Uninstalling the library

pip uninstall preprocessing-text-ben

Example of one-shot cleaning:

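# Hyphens are not valid in Python module names. The underscore form below is an
# assumption; check the installed package for the actual import name.
import preprocessing_text_ben as pp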

def get_clean(x):
    
    # Convert the string to lowercase
    x = str(x).lower()
    
    # Expand contractions like "don't" to "do not"
    x = pp.cont_to_exp(x)
    
    # Remove any email addresses from the string
    x = pp.remove_emails(x)
    
    # Remove any URLs from the string
    x = pp.remove_urls(x)
    
    # Remove any HTML tags from the string
    x = pp.remove_html_tags(x)
    
    # Remove any retweet tags (RT) from the string
    x = pp.remove_rt(x)
    
    # Remove any accented characters from the string
    x = pp.remove_accented_chars(x)
    
    # Remove any special characters from the string
    x = pp.remove_special_chars(x)
    
    # Return the cleaned string
    return x


# Clean the whole column in one shot (df is assumed to be an existing pandas DataFrame)
df['your_cleaned_column'] = df['your_text_column'].apply(get_clean)
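
To make the steps in get_clean more concrete, here is a minimal standard-library sketch of a similar pipeline. It is not the package's implementation: the function name sketch_clean and every regular expression in it are assumptions chosen for illustration, and contraction expansion is left out because it needs a lookup table.

```python
import re
import unicodedata

# Hypothetical stand-ins for the package's helpers. The patterns are
# illustrative assumptions, not the library's actual rules.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
RT_RE = re.compile(r"\brt\b")          # applied after lowercasing
SPECIAL_RE = re.compile(r"[^\w\s]")


def sketch_clean(text):
    """Apply the cleaning steps in the same order as get_clean."""
    text = str(text).lower()
    text = EMAIL_RE.sub(" ", text)      # drop email addresses
    text = URL_RE.sub(" ", text)        # drop URLs
    text = HTML_TAG_RE.sub(" ", text)   # drop HTML tags
    text = RT_RE.sub(" ", text)         # drop the retweet marker
    # Decompose accented characters, then discard the non-ASCII marks.
    text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    text = SPECIAL_RE.sub(" ", text)    # drop punctuation and symbols
    return " ".join(text.split())       # collapse repeated whitespace


print(sketch_clean("RT @user: Check <b>this</b> café https://t.co/abc a@b.com!"))
# prints "user check this cafe"
```

The order matters in this sketch: emails, URLs, and HTML tags are removed before special characters, because stripping punctuation first would break the patterns that recognize them.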

version: 0.0.1

Download files

Source Distribution

tweetben-0.0.1.tar.gz (3.6 kB)

Uploaded Source

Built Distribution

tweetben-0.0.1-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file tweetben-0.0.1.tar.gz.

File metadata

  • Download URL: tweetben-0.0.1.tar.gz
  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for tweetben-0.0.1.tar.gz
Algorithm Hash digest
SHA256 1fa997dece0e1121d684022b87af9e24e9d328b89f2b8e38c6715c3d729b1ced
MD5 1b4035448cf64ab097c487374fe7f270
BLAKE2b-256 a30b2d7e915e08bb9da05a34bb4c601d0acebdceef47dd0dc6ba21fd53f18b04
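
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_of_file and matches_published_hash are hypothetical, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling matches_published_hash with the path of the downloaded tweetben-0.0.1.tar.gz and the SHA256 value from the table should return True if the file is intact.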

File details

Details for the file tweetben-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: tweetben-0.0.1-py3-none-any.whl
  • Size: 4.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for tweetben-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 aa726d3e375a3c712db382eb4a0faee02a633c26c5361940cc169027b2935aae
MD5 28140dde169246506a2f0558e765a671
BLAKE2b-256 b1038d3447f7037d88066233ca7c10197e3c0894aed9962cb42c13a800807ae4
