A comprehensive text extraction tool supporting multiple file formats

Kvell Extraction

A comprehensive text extraction tool that supports multiple file formats:

  • PDF files
  • Images (jpg, jpeg, png, bmp)
  • Word documents (doc, docx, docm, dot, dotx, dotm)
  • Excel files (xlsx, xls)
  • PowerPoint presentations (pptx, potx)

Installation

pip install kvell-extraction

Usage

PDF Extraction

from kvell_extraction import PDFExtracter
pdf_extracter = PDFExtracter()
pdf_path = 'document.pdf'
texts = pdf_extracter(pdf_path)
print(texts)

Image Extraction

from kvell_extraction import ImageExtracter
img_extracter = ImageExtracter()
img_path = 'image.png'
texts = img_extracter(img_path)
print(texts)

Word Document Extraction

from kvell_extraction import DocExtracter
doc_extracter = DocExtracter()
doc_path = 'document.docx'
texts = doc_extracter(doc_path)
print(texts)

Excel Extraction

from kvell_extraction import ExcelExtracter
excel_extracter = ExcelExtracter()
excel_path = 'spreadsheet.xlsx'
texts = excel_extracter(excel_path)
print(texts)

PowerPoint Extraction

from kvell_extraction import PresentationExtracter
ppt_extracter = PresentationExtracter()
ppt_path = 'presentation.pptx'
texts = ppt_extracter(ppt_path)
print(texts)
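
Choosing an extractor by file type

The five examples above share one calling pattern: instantiate the class, then call the instance with a path. As a rough sketch, a small dispatcher could pick the class from the file suffix. The class names come from the examples above and the suffix groups from the supported-formats list; the mapping, the helper names `extracter_name_for` and `extract`, and the error handling are assumptions made for illustration and are not part of the package.

```python
from importlib import import_module
from pathlib import Path

# Class names are taken from the usage examples; the suffix groups mirror the
# supported-formats list. The mapping itself is an assumption, not package API.
_SUFFIXES_BY_CLASS = {
    "PDFExtracter": ("pdf",),
    "ImageExtracter": ("jpg", "jpeg", "png", "bmp"),
    "DocExtracter": ("doc", "docx", "docm", "dot", "dotx", "dotm"),
    "ExcelExtracter": ("xlsx", "xls"),
    "PresentationExtracter": ("pptx", "potx"),
}
_CLASS_BY_SUFFIX = {
    suffix: name
    for name, suffixes in _SUFFIXES_BY_CLASS.items()
    for suffix in suffixes
}


def extracter_name_for(path):
    """Return the extractor class name for a path, based on its suffix."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if suffix not in _CLASS_BY_SUFFIX:
        raise ValueError(f"Unsupported file type: {path!r}")
    return _CLASS_BY_SUFFIX[suffix]


def extract(path):
    """Hypothetical helper: pick an extractor by suffix and run it on the file."""
    # Imported lazily so the name lookup works without the package installed.
    module = import_module("kvell_extraction")
    extracter_cls = getattr(module, extracter_name_for(path))
    return extracter_cls()(str(path))
```

With the package installed, `extract('document.pdf')` would be equivalent to the PDF example above, assuming each class can be constructed without arguments as the examples show.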

Return Format

Every extractor class returns a list of lists. Each inner list holds three values, in this order:

  • Page/slide number (string)
  • Extracted text (string)
  • Confidence score (string, usually "1.0")
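
Because all three values arrive as strings, downstream code usually needs to convert them before sorting or filtering. The sketch below shows one way to do that. It assumes the page/slide number parses as an integer and the confidence score as a float, which the description above suggests but does not guarantee. The helper names and the sample data are made up for illustration and are not real extractor output.

```python
def to_records(results):
    """Convert [page, text, confidence] string triples into typed dicts."""
    records = []
    for page, text, confidence in results:
        records.append(
            {
                "page": int(page),  # assumption: numeric page/slide label
                "text": text,
                "confidence": float(confidence),  # assumption: parses as a float
            }
        )
    return records


def join_text(results, min_confidence=0.0):
    """Join the text in page order, skipping entries below min_confidence."""
    records = sorted(to_records(results), key=lambda record: record["page"])
    return "\n\n".join(
        record["text"]
        for record in records
        if record["confidence"] >= min_confidence
    )


# Hypothetical data in the documented shape, not output from a real file.
sample = [
    ["2", "Second page.", "1.0"],
    ["1", "First page.", "1.0"],
    ["3", "Blurry scan", "0.42"],
]

print(join_text(sample, min_confidence=0.5))
```

Since the score is described as usually "1.0", a confidence filter like this one is likely to matter mainly for image input.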

Download files

Source Distribution

kvell_extraction-0.0.6.tar.gz (7.4 kB, source)

Built Distribution

kvell_extraction-0.0.6-py3-none-any.whl (6.4 kB, Python 3)
