A simple tool for text extraction from PDF, EPUB, TXT, and DOCX files
Project description
extractText
A simple tool for extracting text from PDF, EPUB, TXT, and DOCX files. This library was primarily developed for personal use in various NLP-related projects.
Installation
Install text-extra using pip:
pip install text-extra
Usage
from text_extractor import extract_text
file_path = "path/to/your/file.pdf"
extracted_text = extract_text(file_path)
print(extracted_text)
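For NLP projects that work over a whole folder of documents, the single-file call above can be wrapped in a small batch helper. The following is only a sketch: `find_supported_files` and `extract_many` are hypothetical names, not part of this package, and the suffix list is taken from the formats named in the description. The extractor is passed in as a parameter, and catching a generic `Exception` is an assumption, since this page does not document how `extract_text` reports failures.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Suffixes for the formats the project description lists as supported.
SUPPORTED_SUFFIXES = {".pdf", ".epub", ".txt", ".docx"}


def find_supported_files(root: str) -> List[Path]:
    """Return files under `root` (recursively) with a listed suffix."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def extract_many(
    paths: Iterable[Path], extract: Callable[[str], str]
) -> Dict[str, str]:
    """Apply `extract` to each path; report and skip files that fail."""
    results: Dict[str, str] = {}
    for path in paths:
        try:
            results[str(path)] = extract(str(path))
        except Exception as exc:  # assumption: failure modes are undocumented
            print(f"skipped {path}: {exc}")
    return results
```

With the package installed, a call such as `extract_many(find_supported_files("path/to/your/folder"), extract_text)` would return a dictionary mapping each file path to its extracted text.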
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
text-extra-0.1.3.tar.gz (41.4 kB)
Built Distribution
text_extra-0.1.3-py3-none-any.whl (28.5 kB)
Hashes for text_extra-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 30bb1fd3133f93d0d6f9529ef4a93102baef9999f5f4513459bf13e8f4c18537
MD5 | 9cffecc9a569703dab6644c43ded07a9
BLAKE2b-256 | 65952282e324463c48370e539dfeb7913b9e573c0aa5f3b0ec34b09f87794734