A text PREprocessor for TWeets in the ITAlian language

PreTwITA

PreTwITA is an open-source Preprocessor for Tweets in the ITAlian language, written in Python. The library provides language-specific tools for text cleaning (i.e. the process of preparing raw text for Natural Language Processing).

Included features

  • correct the most common Italian abbreviations (e.g. xk is replaced with perché)
  • remove URLs
  • remove emojis
  • remove emoticons
  • remove mentions
  • remove hashtags
  • remove Twitter reserved words (i.e. 'rt' and 'fav')
  • remove stopwords
    • an option to define additional stopwords
  • remove punctuation
  • remove numbers
    • an option to avoid removing dates in yyyy format
  • remove multiple spaces
  • tokenize the text
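
To make the list above more concrete, here is a minimal sketch of several of these steps using only the Python standard library. It is not PreTwITA's API: the function name clean_tweet, its parameters, the regular expressions, and the tiny abbreviation and stopword tables are all assumptions made for illustration (only the xk → perché mapping and the reserved words 'rt' and 'fav' come from the feature list). Emoji and emoticon removal are left out to keep the example short.

```python
import re
import string

# Hypothetical tables for illustration; PreTwITA ships its own, larger lists.
ABBREVIATIONS = {"xk": "perché"}          # mapping taken from the feature list
RESERVED_WORDS = {"rt", "fav"}            # Twitter reserved words from the feature list
STOPWORDS = {"di", "a", "il", "la", "che", "e", "non", "nel"}  # tiny sample only

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
YEAR_RE = re.compile(r"(?:19|20)\d{2}")   # assumed meaning of "yyyy format"

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text, extra_stopwords=(), keep_years=False):
    """Return a list of cleaned tokens (sketch, not the library's function)."""
    text = text.lower()
    # URLs, mentions and hashtags go first, while their markers are still intact.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)

    stop = STOPWORDS | RESERVED_WORDS | set(extra_stopwords)
    tokens = []
    # str.split() with no argument also collapses multiple spaces.
    for tok in text.split():
        tok = ABBREVIATIONS.get(tok, tok)  # expand known abbreviations
        if tok in stop:
            continue
        # Drop numbers, optionally keeping four-digit years.
        if tok.isdigit() and not (keep_years and YEAR_RE.fullmatch(tok)):
            continue
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    tweet = "RT @mario: xk non vieni a Roma nel 2021? https://example.com #estate 42"
    print(clean_tweet(tweet, keep_years=True))
```

The order of the steps matters in this sketch: URLs, mentions and hashtags are removed before punctuation, because stripping punctuation first would break the patterns that identify them.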

Usage

For usage examples and tips, please refer to the demo.ipynb file.
