Kvell Extraction
A comprehensive text extraction tool that supports multiple file formats:
- PDF files
- Images (jpg, jpeg, png, bmp)
- Word documents (doc, docx, docm, dot, dotx, dotm)
- Excel files (xlsx, xls)
- PowerPoint presentations (pptx, potx)
Installation
pip install kvell-extraction
Usage
PDF Extraction
from kvell_extraction import PDFExtracter
pdf_extracter = PDFExtracter()
pdf_path = 'document.pdf'
texts = pdf_extracter(pdf_path)
print(texts)
Image Extraction
from kvell_extraction import ImageExtracter
img_extracter = ImageExtracter()
img_path = 'image.png'
texts = img_extracter(img_path)
print(texts)
Word Document Extraction
from kvell_extraction import DocExtracter
doc_extracter = DocExtracter()
doc_path = 'document.docx'
texts = doc_extracter(doc_path)
print(texts)
Excel Extraction
from kvell_extraction import ExcelExtracter
excel_extracter = ExcelExtracter()
excel_path = 'spreadsheet.xlsx'
texts = excel_extracter(excel_path)
print(texts)
PowerPoint Extraction
from kvell_extraction import PresentationExtracter
ppt_extracter = PresentationExtracter()
ppt_path = 'presentation.pptx'
texts = ppt_extracter(ppt_path)
print(texts)
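Since all five extracters share the same call pattern, one way to handle mixed inputs is to pick the class by file extension. The following is a minimal sketch, not part of the package: the helper names `extracter_name_for` and `extract_text` are hypothetical, and the mapping assumes each class accepts every extension listed for its format above.

```python
from pathlib import Path

# Extension -> extracter class name, following the supported-format list.
# Assumption: each class handles all extensions listed for its format.
EXTRACTER_BY_SUFFIX = {
    ".pdf": "PDFExtracter",
    ".jpg": "ImageExtracter",
    ".jpeg": "ImageExtracter",
    ".png": "ImageExtracter",
    ".bmp": "ImageExtracter",
    ".doc": "DocExtracter",
    ".docx": "DocExtracter",
    ".docm": "DocExtracter",
    ".dot": "DocExtracter",
    ".dotx": "DocExtracter",
    ".dotm": "DocExtracter",
    ".xlsx": "ExcelExtracter",
    ".xls": "ExcelExtracter",
    ".pptx": "PresentationExtracter",
    ".potx": "PresentationExtracter",
}


def extracter_name_for(path):
    """Return the name of the extracter class for `path` (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTER_BY_SUFFIX:
        raise ValueError(f"Unsupported file type: {path}")
    return EXTRACTER_BY_SUFFIX[suffix]


def extract_text(path):
    """Instantiate the matching extracter and call it on `path`."""
    # Imported lazily so the lookup above can be used without the package installed.
    import kvell_extraction

    extracter_cls = getattr(kvell_extraction, extracter_name_for(path))
    return extracter_cls()(path)
```

With the package installed, `extract_text('document.pdf')` would then behave like the PDF example above.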
Return Format
Every extracter returns a list of lists. Each inner list holds three strings, in this order:
- Page/slide number (string)
- Extracted text (string)
- Confidence score (string, usually "1.0")
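Because all three fields arrive as strings, it can help to convert them into typed records before further processing. The sketch below uses made-up sample rows rather than real extracter output; the `ExtractedPage` type and `parse_results` helper are hypothetical, and the code assumes the page number is a plain integer string and the score a decimal string.

```python
from typing import NamedTuple


class ExtractedPage(NamedTuple):
    """Typed view of one [page, text, score] row (hypothetical helper type)."""
    page: int
    text: str
    confidence: float


def parse_results(rows):
    """Convert rows of three strings into ExtractedPage records."""
    # Assumption: page parses with int() and score parses with float().
    return [ExtractedPage(int(page), text, float(score)) for page, text, score in rows]


# Illustrative sample in the documented shape, not real output.
sample = [["1", "Hello", "1.0"], ["2", "World", "0.87"]]

pages = parse_results(sample)
full_text = "\n".join(p.text for p in pages)            # text of all pages, in order
confident = [p for p in pages if p.confidence >= 0.9]   # keep high-confidence rows only
```

The same helper could be applied to the `texts` value returned in any of the usage examples, provided the output matches the format described here.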
File details
Details for the file kvell_extraction-0.0.4.tar.gz.
File metadata
- Download URL: kvell_extraction-0.0.4.tar.gz
- Upload date:
- Size: 7.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4942099281f2c85e731eaa4b0faa6b633ee714435c3e4f7d003c735869ec4efa |
| MD5 | 1e7d67ce6a19392d896f2cff4f7ddada |
| BLAKE2b-256 | 7aaecd7310cc8a3f2652f4d7421dd16a6c3ff9a148fbf840525375d4dab066a9 |
File details
Details for the file kvell_extraction-0.0.4-py3-none-any.whl.
File metadata
- Download URL: kvell_extraction-0.0.4-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34ab48f9bb6d95be619ab2ebb76f666935c162aece135afc0b69b58c8f4b1c84 |
| MD5 | e8186a9178f0e19c35ef5315e0ba6644 |
| BLAKE2b-256 | 728c33e498116e91f6c86891e2162a7a3781a3b9a3cd72baeded02053ea68de2 |