A comprehensive text extraction tool supporting multiple file formats

Kvell Extraction

A comprehensive text extraction tool that supports multiple file formats:

  • PDF files
  • Images (jpg, jpeg, png, bmp)
  • Word documents (doc, docx, docm, dot, dotx, dotm)
  • Excel files (xlsx, xls)
  • PowerPoint presentations (pptx, potx)

Installation

pip install kvell-extraction

Usage

PDF Extraction

from kvell_extraction import PDFExtracter
pdf_extracter = PDFExtracter()
pdf_path = 'document.pdf'
texts = pdf_extracter(pdf_path)
print(texts)

Image Extraction

from kvell_extraction import ImageExtracter
img_extracter = ImageExtracter()
img_path = 'image.png'
texts = img_extracter(img_path)
print(texts)

Word Document Extraction

from kvell_extraction import DocExtracter
doc_extracter = DocExtracter()
doc_path = 'document.docx'
texts = doc_extracter(doc_path)
print(texts)

Excel Extraction

from kvell_extraction import ExcelExtracter
excel_extracter = ExcelExtracter()
excel_path = 'spreadsheet.xlsx'
texts = excel_extracter(excel_path)
print(texts)

PowerPoint Extraction

from kvell_extraction import PresentationExtracter
ppt_extracter = PresentationExtracter()
ppt_path = 'presentation.pptx'
texts = ppt_extracter(ppt_path)
print(texts)
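
Choosing an extracter by file extension

The five examples above share the same call pattern, so picking the extracter from the file extension is a natural convenience. The following is a minimal sketch, not part of the package: `EXTRACTER_BY_EXTENSION`, `extracter_name_for`, and `extract_text` are hypothetical names introduced here, and the mapping simply restates the supported-formats list using the class names from the usage examples.

```python
from pathlib import Path

# Extension groups copied from the supported-formats list; the class names are
# the ones imported in the usage examples. This table is an illustration, not
# something the package is known to expose.
EXTRACTER_BY_EXTENSION = {
    **dict.fromkeys(["pdf"], "PDFExtracter"),
    **dict.fromkeys(["jpg", "jpeg", "png", "bmp"], "ImageExtracter"),
    **dict.fromkeys(["doc", "docx", "docm", "dot", "dotx", "dotm"], "DocExtracter"),
    **dict.fromkeys(["xlsx", "xls"], "ExcelExtracter"),
    **dict.fromkeys(["pptx", "potx"], "PresentationExtracter"),
}


def extracter_name_for(path):
    """Return the extracter class name for a file path, or None if unsupported."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return EXTRACTER_BY_EXTENSION.get(suffix)


def extract_text(path):
    """Instantiate the matching extracter and call it on the path."""
    name = extracter_name_for(path)
    if name is None:
        raise ValueError(f"Unsupported file type: {path}")
    # Imported lazily so the mapping above can be used without the package.
    import kvell_extraction
    extracter = getattr(kvell_extraction, name)()
    return extracter(path)
```

With the package installed, `extract_text('document.pdf')` should behave like the PDF example above, assuming the classes are importable from the top-level `kvell_extraction` module as the usage examples show.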

Return Format

All extracters return a list of lists, where each inner list contains:

  • Page/slide number (string)
  • Extracted text (string)
  • Confidence score (string, usually "1.0")
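
Because every field comes back as a string, it can help to convert the rows into typed records before further processing. The sketch below assumes each inner list holds its three items in the order listed above; `ExtractedPage` and `to_pages` are hypothetical helpers, and the sample rows are made-up data shaped like the documented return value. The page/slide number is kept as a string, since the description does not say whether it is always numeric.

```python
from typing import NamedTuple


class ExtractedPage(NamedTuple):
    number: str        # page/slide number, left as returned
    text: str          # extracted text
    confidence: float  # confidence score parsed from its string form


def to_pages(rows):
    """Convert [[number, text, confidence], ...] rows of strings into records."""
    return [ExtractedPage(number, text, float(conf)) for number, text, conf in rows]


# Made-up sample shaped like the documented return value.
sample = [
    ["0", "First page text", "1.0"],
    ["1", "Second page text", "1.0"],
]

pages = to_pages(sample)

# Join the text of all pages into a single string.
full_text = "\n".join(page.text for page in pages)
```

In real use, `sample` would be replaced by the value returned from an extracter call such as `pdf_extracter(pdf_path)`.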

Download files

Source Distribution

kvell_extraction-0.0.3.tar.gz (7.2 kB view details)

Uploaded Source

Built Distribution

kvell_extraction-0.0.3-py3-none-any.whl (6.2 kB view details)

Uploaded Python 3

File details

Details for the file kvell_extraction-0.0.3.tar.gz.

File metadata

  • Download URL: kvell_extraction-0.0.3.tar.gz
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for kvell_extraction-0.0.3.tar.gz
Algorithm Hash digest
SHA256 350309e105dc0ebf7be757514e0f901c403089452e56fffbc17b178cd58dcd7f
MD5 79ad15b3f13d137751f6afc0e6ee018b
BLAKE2b-256 a737b7783395929e46004a7ca0fd6cd71de3c2c7d8e0eaa4f9df34d2f0758e7f

File details

Details for the file kvell_extraction-0.0.3-py3-none-any.whl.

File hashes

Hashes for kvell_extraction-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a4e9be162dd4e8e7967b4e50d9f6de44edf36fd0ec929396380e16a08fa65d85
MD5 c9612eff2123fde1521fff9be76df9e6
BLAKE2b-256 ab296024fca3219b135d9e1ce7d6f883cfb2d0893778004261568a60164c69e3
