Kvell Extraction

A comprehensive text extraction tool that supports multiple file formats:

  • PDF files
  • Images (jpg, jpeg, png, bmp)
  • Word documents (doc, docx, docm, dot, dotx, dotm)
  • Excel files (xlsx, xls)
  • PowerPoint presentations (pptx, potx)

Installation

pip install kvell-extraction

Usage

PDF Extraction

from kvell_extraction import PDFExtracter
pdf_extracter = PDFExtracter()
pdf_path = 'document.pdf'
texts = pdf_extracter(pdf_path)
print(texts)

Image Extraction

from kvell_extraction import ImageExtracter
img_extracter = ImageExtracter()
img_path = 'image.png'
texts = img_extracter(img_path)
print(texts)

Word Document Extraction

from kvell_extraction import DocExtracter
doc_extracter = DocExtracter()
doc_path = 'document.docx'
texts = doc_extracter(doc_path)
print(texts)

Excel Extraction

from kvell_extraction import ExcelExtracter
excel_extracter = ExcelExtracter()
excel_path = 'spreadsheet.xlsx'
texts = excel_extracter(excel_path)
print(texts)

PowerPoint Extraction

from kvell_extraction import PresentationExtracter
ppt_extracter = PresentationExtracter()
ppt_path = 'presentation.pptx'
texts = ppt_extracter(ppt_path)
print(texts)
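
Since each format has its own extracter class, a small dispatcher can choose one from the file extension. The following is a minimal sketch, not part of the package: `EXTRACTER_BY_SUFFIX`, `extracter_name_for`, and `extract_text` are hypothetical helper names, and the mapping is assembled only from the format list and class names shown above. It assumes every class can be constructed without arguments and called with a path, as in the examples in this section.

```python
from pathlib import Path

# Hypothetical mapping, built from the supported-format list and the
# class names used in the usage examples above.
EXTRACTER_BY_SUFFIX = {
    ".pdf": "PDFExtracter",
    **dict.fromkeys((".jpg", ".jpeg", ".png", ".bmp"), "ImageExtracter"),
    **dict.fromkeys(
        (".doc", ".docx", ".docm", ".dot", ".dotx", ".dotm"), "DocExtracter"
    ),
    **dict.fromkeys((".xlsx", ".xls"), "ExcelExtracter"),
    **dict.fromkeys((".pptx", ".potx"), "PresentationExtracter"),
}


def extracter_name_for(path):
    """Return the extracter class name for a path, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTRACTER_BY_SUFFIX:
        raise ValueError(f"Unsupported file type: {path!r}")
    return EXTRACTER_BY_SUFFIX[suffix]


def extract_text(path):
    """Instantiate the matching extracter and call it on the path."""
    # Imported lazily so the lookup above works without the package installed.
    import kvell_extraction

    extracter_cls = getattr(kvell_extraction, extracter_name_for(path))
    # Assumption: no-argument constructor and callable instance, as in the README.
    return extracter_cls()(path)
```

With the package installed, `extract_text('document.pdf')` would then behave like the PDF example above, and an unlisted extension such as `.txt` raises `ValueError` before any extracter is created.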

Return Format

Every extracter returns a list of lists. Each inner list contains three strings:

  • Page/slide number (string)
  • Extracted text (string)
  • Confidence score (string, usually "1.0")
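
Because all three fields arrive as strings, callers will often want to convert the confidence score to a number before filtering on it. The sketch below shows one way to do that. `parse_results` is a hypothetical helper, `sample_result` is made-up data shaped like the format described above (not real output from the package), and the field order (page, text, confidence) is assumed to match the order of the list above.

```python
# Made-up sample shaped like the documented return format;
# real output comes from calling an extracter on a file.
sample_result = [
    ["1", "First page text", "1.0"],
    ["2", "Second page text", "0.62"],
]


def parse_results(results, min_confidence=0.0):
    """Turn [page, text, confidence] string triples into dicts.

    The confidence score is converted to float; the page/slide number
    is left as the string the extracter returned.
    """
    parsed = []
    for page, text, confidence in results:
        score = float(confidence)
        if score >= min_confidence:
            parsed.append({"page": page, "text": text, "confidence": score})
    return parsed


for entry in parse_results(sample_result, min_confidence=0.9):
    print(entry["page"], entry["text"])
```

The same helper can be applied to the `texts` value returned by any of the extracters above, provided the output follows the three-string layout.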
