
A package for text preprocessing.

Project description

Text and Tweet Preprocessing package

This package was created by Behdad (Ben) Ehsani. It is designed to clean tweets in a single step ("one-shot" cleaning), and several of its functions can also be used for general text preprocessing. An example below demonstrates efficient usage.

Installing the library

pip install preprocessing-text-ben

Uninstalling the library

pip uninstall preprocessing-text-ben

Example of one-shot text cleaning:

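# Python module names cannot contain hyphens, so "import preprocessing-text-ben"
# is a syntax error. The module name below is an assumption based on the
# distribution file names (text_tweet_ben-0.0.1); use the installed package's
# actual top-level module name if it differs.
import text_tweet_ben as pp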

def get_clean(x):
    
    # Convert the string to lowercase
    x = str(x).lower()
    
    # Expand contractions like "don't" to "do not"
    x = pp.cont_to_exp(x)
    
    # Remove any email addresses from the string
    x = pp.remove_emails(x)
    
    # Remove any URLs from the string
    x = pp.remove_urls(x)
    
    # Remove any HTML tags from the string
    x = pp.remove_html_tags(x)
    
    # Remove any retweet tags (RT) from the string
    x = pp.remove_rt(x)
    
    # Remove any accented characters from the string
    x = pp.remove_accented_chars(x)
    
    # Remove any special characters from the string
    x = pp.remove_special_chars(x)
    
    # Return the cleaned string
    return x


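# Apply the cleaning pipeline to every row of a text column in one shot.
# df is assumed to be a pandas DataFrame with a column named 'your_text_column'.
df['your_cleaned_column'] = df['your_text_column'].apply(get_clean)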
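For readers who want to see roughly what each step does, here is a standard-library-only sketch of a similar pipeline. It is not the package's implementation: the regular expressions, the tiny contraction table, and the names `clean_tweet`, `expand_contractions`, and `strip_accents` are hypothetical and chosen for illustration. It follows the same step order as `get_clean`.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny contraction table (not the package's data).
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "i am",
}
CONTRACTION_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")

# Illustrative patterns only; real-world emails and URLs need more robust handling.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
RT_RE = re.compile(r"\brt\b")  # applied after lowercasing, so "RT" has become "rt"
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")


def expand_contractions(text: str) -> str:
    """Replace whole-word contractions using the lookup table above."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)


def strip_accents(text: str) -> str:
    """Decompose characters (e.g. 'é' -> 'e' + combining accent) and drop the marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_tweet(text) -> str:
    """Hypothetical stand-in for get_clean(), using the same step order."""
    x = str(text).lower()
    x = expand_contractions(x)
    x = EMAIL_RE.sub(" ", x)
    x = URL_RE.sub(" ", x)
    x = HTML_TAG_RE.sub(" ", x)
    x = RT_RE.sub(" ", x)
    x = strip_accents(x)  # must run before SPECIAL_RE, or 'é' would be deleted outright
    x = SPECIAL_RE.sub(" ", x)
    return " ".join(x.split())  # collapse whitespace left behind by the removals


if __name__ == "__main__":
    print(clean_tweet("RT @user: I don't like <b>Café</b>, see https://example.com"))
    # prints: user i do not like cafe see
```

With pandas, the sketch is applied the same way as `get_clean`, e.g. `df['your_cleaned_column'] = df['your_text_column'].apply(clean_tweet)`. Replacing matches with a space (rather than an empty string) keeps neighbouring words from being glued together; the final split/join removes the extra spaces.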

version: 0.0.1


Download files


Source Distribution

text_tweet_ben-0.0.1.tar.gz (3.7 kB)

Uploaded Source

Built Distribution


text_tweet_ben-0.0.1-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file text_tweet_ben-0.0.1.tar.gz.

File metadata

  • Download URL: text_tweet_ben-0.0.1.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for text_tweet_ben-0.0.1.tar.gz
Algorithm Hash digest
SHA256 55a51bda68b9b78c928d5632a544aa7d2bfe3ad1c6fc3d93be42076e3fe61522
MD5 b19d7c111499e4668f46741a0a52ad21
BLAKE2b-256 14e0de8311a7c78b4559e1a53ce55c3cde1685c4eecc1b8ea9d768e95b0b9066


File details

Details for the file text_tweet_ben-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: text_tweet_ben-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for text_tweet_ben-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fb338bfcce3362f263bacbc977e6611d6afdc6ffb0ff8130ba284df9fdb4db14
MD5 b6844463d03f481d6379672caa0c008f
BLAKE2b-256 b80fd3ba1b9b23642e488dadd6ae2239e4f9223005772cc041faf7e0ff97ab5e
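The digests listed above can be used to check that a downloaded file is intact. Below is a minimal standard-library sketch; the helper names `file_digest` and `verify` are hypothetical, and the example assumes the sdist has been downloaded into the current directory. "BLAKE2b-256" is taken to mean BLAKE2b with a 32-byte (256-bit) digest, which is what `hashlib.blake2b(digest_size=32)` produces.

```python
import hashlib
from pathlib import Path

# Hash constructors for the three algorithms listed in the tables above.
HASHERS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def file_digest(path, algorithm="sha256", chunk_size=65536) -> str:
    """Hypothetical helper: hash a file in chunks and return the hex digest."""
    h = HASHERS[algorithm.lower()]()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha256") -> bool:
    """Hypothetical helper: compare a file's digest with a published one."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumes the source distribution sits in the current directory.
    sdist = Path("text_tweet_ben-0.0.1.tar.gz")
    expected = "55a51bda68b9b78c928d5632a544aa7d2bfe3ad1c6fc3d93be42076e3fe61522"
    if sdist.exists():
        print("SHA256 OK" if verify(sdist, expected) else "SHA256 MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size; SHA256 is the digest to prefer, since MD5 is no longer considered collision-resistant.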

