textpipe: clean and extract metadata from text
Project description
textpipe is a Python package for converting raw text into clean, readable text
and extracting metadata from that text. It cleans raw text by removing HTML
tags and extracts metadata such as the number of words and the named entities
in the text.
Features
- Clean raw text by removing HTML and other unreadable constructs
- Identify the language of text
- Extract the number of words, the number of sentences, and the named entities from a text
- Calculate the complexity of a text
- Obtain text metadata by specifying a pipeline containing all desired elements
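To make the cleaning and counting features concrete, here is a minimal sketch using only the Python standard library. It is not textpipe's implementation: the names `strip_html`, `count_words`, and `count_sentences` are hypothetical, and the word and sentence rules are deliberately naive.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(raw):
    """Return the text content of `raw` with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered trailing text
    return re.sub(r"\s+", " ", "".join(parser.chunks)).strip()


def count_words(text):
    """Naive word count: runs of word characters."""
    return len(re.findall(r"\w+", text))


def count_sentences(text):
    """Naive sentence count: non-empty pieces between '.', '!' and '?'."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


clean = strip_html("<p>Hello <b>world</b>!</p>")
print(clean)                   # Hello world!
print(count_words(clean))      # 2
print(count_sentences(clean))  # 1
```

A real cleaner would also need to handle scripts, styles, and encoding debris, which this sketch ignores.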
Usage example
>>> import textpipe
>>> sample_text = 'Sample text! <!DOCTYPE>'
>>> # Doc exposes each metadata element as an attribute
>>> doc = textpipe.Doc(sample_text)
>>> print(doc.clean_text)
Sample text!
>>> print(doc.language)
en
>>> print(doc.nwords)
2
>>> # Assumption: Pipeline is exposed at the package top level, like Doc
>>> pipe = textpipe.Pipeline(['clean_text', 'language', 'nwords'])
>>> print(pipe(sample_text))
{'clean_text': 'Sample text!', 'language': 'en', 'nwords': 2}
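The pipeline idea in the example above, a list of element names that is turned into a dictionary of results, can be sketched as a lookup table of functions. This is an illustration of the design only, not textpipe's code: `SketchPipeline`, `OPERATIONS`, and the operations registered in it are all hypothetical.

```python
# Hypothetical registry mapping element names to functions of the raw text.
OPERATIONS = {
    "stripped": str.strip,
    "nchars": len,
    "nwords": lambda text: len(text.split()),
}


class SketchPipeline:
    """Runs a fixed list of named operations and returns their results by name."""

    def __init__(self, steps):
        unknown = [name for name in steps if name not in OPERATIONS]
        if unknown:
            raise ValueError(f"unknown pipeline elements: {unknown}")
        self.steps = list(steps)

    def __call__(self, text):
        # Each operation receives the same input text; results are keyed by name.
        return {name: OPERATIONS[name](text) for name in self.steps}


pipe = SketchPipeline(["nwords", "nchars"])
print(pipe("Sample text!"))  # {'nwords': 2, 'nchars': 12}
```

Validating the names in the constructor means a misspelled element fails when the pipeline is built rather than on the first document it processes.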
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
textpipe-0.1.0.tar.gz (3.7 kB)
Built Distribution
textpipe-0.1.0-py3-none-any.whl (4.4 kB)
File details
Details for the file textpipe-0.1.0.tar.gz.
File metadata
- Download URL: textpipe-0.1.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | e60797d6067c8604cce191faf1317dc416fa138a67eb40147f858970b700334e
MD5 | 498f9bc4736a98faabe02c97c6446e64
BLAKE2b-256 | a29931a8f229f894da1e014f8a4050de9263a02da995a5692d84e27469a6a6e0
File details
Details for the file textpipe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: textpipe-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3a58e2775c59501cc35d312fdfc71032544214b3ff7c1b128f0da3f7a04a4631
MD5 | 9b171fec8135082aec61cde2f7f6e451
BLAKE2b-256 | 90b2ddfffc1fa95e69e4de712338ac1994ef032ad54f3a501f8c6d8623dff7bf
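The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch with Python's standard `hashlib`; the helper names `sha256_of` and `matches_digest` are hypothetical, and the path in the usage note assumes the file was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals `expected_hex` (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_digest("textpipe-0.1.0.tar.gz", "e60797d6067c8604cce191faf1317dc416fa138a67eb40147f858970b700334e")` should return `True` for an intact copy of the source distribution.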