Xtracture: Open Source Document Content Extraction Library

Xtracture is an open source library designed to efficiently extract arbitrary elements from documents.

Features

  • Natural language rule creation using LLMs
  • Switchable OCR engines for optimized performance and accuracy

Prerequisites

  • OpenAI API Key (for LLM rule creation)
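Before running LLM rule creation, it can help to confirm that a key is actually visible to the process. The sketch below assumes the key is supplied through the OPENAI_API_KEY environment variable, which is the variable the official OpenAI Python client reads by default; whether Xtracture reads it the same way is an assumption, and find_openai_key is a hypothetical helper, not part of the library.

```python
import os
from typing import Mapping, Optional


def find_openai_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key from OPENAI_API_KEY, or None if it is unset or blank.

    Assumption: the key is provided via the OPENAI_API_KEY environment
    variable. This helper is illustrative and not part of Xtracture.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    if find_openai_key() is None:
        print("OPENAI_API_KEY is not set; LLM rule creation will likely fail.")
    else:
        print("OPENAI_API_KEY found.")
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real environment.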

Installation

pip install -U xtracture
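To confirm that the install succeeded, the installed distribution can be queried from Python's standard library. This is a minimal sketch that relies only on importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, and the only thing taken from this page is the distribution name xtracture.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("xtracture")
    if found is None:
        print("xtracture is not installed in this environment.")
    else:
        print(f"xtracture {found} is installed.")
```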

Usage

Use Google Cloud Vision API

Google Cloud Vision credentials must be configured correctly.

See examples/google_cloud_vision_example.py.
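One common way to configure Google Cloud credentials is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at a service account key file. The sketch below checks only that path; credentials set up by other means (for example `gcloud auth application-default login`) would not be detected, and credentials_problem is a hypothetical helper rather than anything Xtracture provides.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_problem(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Describe what is wrong with the credentials setup, or return None.

    Assumption: credentials are supplied as a key file referenced by
    GOOGLE_APPLICATION_CREDENTIALS. Other credential sources are not checked.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(path).is_file():
        return f"credentials file not found: {path}"
    return None


if __name__ == "__main__":
    problem = credentials_problem()
    print(problem or "Credentials file located.")
```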

Use Tesseract

Tesseract must be installed beforehand.

See examples/tesseract_example.py.
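A quick way to verify that Tesseract is installed is to look for its executable on PATH and ask it for its version. This sketch assumes the binary is named tesseract and is reachable through PATH; how Xtracture itself locates Tesseract is not stated here, and tesseract_version is a hypothetical helper.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some older Tesseract builds print the version banner to stderr.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    banner = tesseract_version()
    print(banner or "Tesseract was not found on PATH.")
```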

Use only GPT Extractor

You can also pass in a text file that has already been OCR-processed. See examples/lambda_example.py.

License

Xtracture is released under the MIT License.
