textpipe: clean and extract metadata from text

textpipe is a Python package for converting raw text into clean, readable text and extracting metadata from it. For example, it removes HTML tags and reports metadata such as the word count and the named entities found in the text.

Features

  • Clean raw text by removing HTML and other unreadable constructs
  • Identify the language of text
  • Extract the number of words, the number of sentences, and the named entities from a text
  • Calculate the complexity of a text
  • Obtain text metadata by specifying a pipeline containing all desired elements
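
To make the first and third features concrete, here is a minimal sketch using only the Python standard library. It is not textpipe's implementation: the helper names `strip_html`, `count_words`, and `count_sentences` are hypothetical, and the sentence count is a naive heuristic based on terminal punctuation.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are ignored.
        self.parts.append(data)


def strip_html(raw):
    """Return `raw` with markup removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_sentences(text):
    """Naive count: every run of '.', '!' or '?' ends one sentence."""
    return len(re.findall(r"[.!?]+", text))


cleaned = strip_html("<p>Hello <b>world</b>. Bye!</p>")
print(cleaned, count_words(cleaned), count_sentences(cleaned))
```

A real implementation would likely rely on a tokenizer rather than regular expressions, since abbreviations such as "e.g." break the punctuation heuristic.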

Usage example

>>> import textpipe
>>> sample_text = 'Sample text! <!DOCTYPE>'
>>> doc = textpipe.Doc(sample_text)
>>> print(doc.clean_text)
Sample text!
>>> print(doc.language)
en
>>> print(doc.nwords)
2

>>> # Assumption: Pipeline is exported at the package top level, like Doc.
>>> pipe = textpipe.Pipeline(['clean_text', 'language', 'nwords'])
>>> print(pipe(sample_text))
{'clean_text': 'Sample text!', 'language': 'en', 'nwords': 2}
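
The pipeline idea in the example above, a list of operation names applied to one text and returned as a dictionary, can be sketched as follows. This is an illustration of the pattern only, not textpipe's internals: `OPERATIONS` and `SketchPipeline` are hypothetical names, and the registered operations are simple stand-ins.

```python
# Hypothetical registry mapping operation names to plain functions.
OPERATIONS = {
    "nchars": len,
    "nwords": lambda text: len(text.split()),
    "upper": str.upper,
}


class SketchPipeline:
    """Applies the named operations to a text and collects the results."""

    def __init__(self, steps):
        # Fail early on names that are not registered.
        unknown = [name for name in steps if name not in OPERATIONS]
        if unknown:
            raise ValueError(f"unknown operations: {unknown}")
        self.steps = list(steps)

    def __call__(self, text):
        # One dictionary entry per requested operation, in request order.
        return {name: OPERATIONS[name](text) for name in self.steps}


pipe = SketchPipeline(["nwords", "nchars"])
print(pipe("Sample text!"))  # {'nwords': 2, 'nchars': 12}
```

Validating the operation names in the constructor means a typo surfaces when the pipeline is built rather than when the first document is processed.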

Download files

Source Distribution

textpipe-0.1.0.tar.gz (3.7 kB)

Uploaded Source

Built Distribution

textpipe-0.1.0-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file textpipe-0.1.0.tar.gz.

File metadata

  • Download URL: textpipe-0.1.0.tar.gz
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for textpipe-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       e60797d6067c8604cce191faf1317dc416fa138a67eb40147f858970b700334e
MD5          498f9bc4736a98faabe02c97c6446e64
BLAKE2b-256  a29931a8f229f894da1e014f8a4050de9263a02da995a5692d84e27469a6a6e0
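
A downloaded file can be checked against the SHA256 digest listed above with the standard library's `hashlib`. The sketch below assumes the archive has already been saved locally; the helper names `sha256_of` and `matches`, and any path passed to them, are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("textpipe-0.1.0.tar.gz", "<published SHA256 digest>")` should return `True` for an unmodified copy of the source distribution.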

File details

Details for the file textpipe-0.1.0-py3-none-any.whl.

File hashes

Hashes for textpipe-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       3a58e2775c59501cc35d312fdfc71032544214b3ff7c1b128f0da3f7a04a4631
MD5          9b171fec8135082aec61cde2f7f6e451
BLAKE2b-256  90b2ddfffc1fa95e69e4de712338ac1994ef032ad54f3a501f8c6d8623dff7bf
