Xtracture is an open source library designed to efficiently extract arbitrary elements from documents.
Xtracture: Open Source Document Content Extraction Library
Features
- Natural language rule creation using LLMs
- Switchable OCR engines for optimized performance and accuracy
Prerequisites
- OpenAI API Key (for LLM rule creation)
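Before running rule creation, it can help to confirm that a key is visible to the process. The sketch below assumes the key is supplied through the OPENAI_API_KEY environment variable, which is the variable the official OpenAI Python client reads by default. Whether Xtracture reads the same variable is an assumption, and find_openai_key is a hypothetical helper, not part of Xtracture.

```python
import os


def find_openai_key(env=None):
    """Return the OpenAI API key from the environment, or None if unset or blank.

    Hypothetical helper for illustration; not part of Xtracture.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    return key or None


if __name__ == "__main__":
    if find_openai_key() is None:
        print("OPENAI_API_KEY is not set; LLM rule creation will likely fail.")
    else:
        print("OPENAI_API_KEY found.")
```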
Installation
pip install -U xtracture
Usage
Use Google Cloud Vision API
Google Cloud Vision credentials must be configured correctly.
See examples/google_cloud_vision_example.py.
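One common way to configure Google Cloud credentials is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at a service-account JSON key file. The check below is a minimal sketch under that assumption. Google's Application Default Credentials also support other mechanisms that this check does not cover, and credentials_file_from_env is a hypothetical helper, not part of Xtracture or the Google client library.

```python
import os
from pathlib import Path


def credentials_file_from_env(env=None):
    """Return the path in GOOGLE_APPLICATION_CREDENTIALS if it names an existing file.

    Returns None when the variable is unset or the file does not exist.
    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_file() else None


if __name__ == "__main__":
    found = credentials_file_from_env()
    print(f"Credentials file: {found}" if found else "No credentials file found via environment.")
```

A None result does not prove that credentials are missing, only that this particular mechanism is not in use.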
Use Tesseract
Tesseract must be installed beforehand.
See examples/tesseract_example.py.
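To verify that Tesseract is installed before running the example, a small standard-library check can be used. This is a sketch, assuming the executable is named tesseract and is on PATH. The helper name tesseract_version is hypothetical and not part of Xtracture.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line printed by `<binary> --version`.

    Returns None if the binary is not on PATH, and an empty string if it
    runs but prints nothing. Hypothetical helper for illustration only.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Depending on the release, the banner may go to stdout or stderr.
    output = result.stdout.strip() or result.stderr.strip()
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version is not None else "Tesseract was not found on PATH.")
```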
Use only the GPT Extractor
You can pass in a text file that has already been processed by OCR.
See examples/lambda_example.py.
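The general idea of extracting fields from OCR-processed text with an LLM can be sketched without any OCR engine. The code below does not use Xtracture's API, which is shown in examples/lambda_example.py. The function names, the prompt wording, and the JSON reply format are all assumptions made for illustration, and the call to the model itself is left out.

```python
import json


def build_extraction_prompt(ocr_text, fields):
    """Build a prompt asking an LLM to return the named fields as JSON.

    `fields` maps a field name to a short natural-language description.
    Hypothetical helper; not Xtracture's actual prompt.
    """
    field_lines = "\n".join(
        f"- {name}: {description}" for name, description in fields.items()
    )
    return (
        "Extract the following fields from the document text below.\n"
        "Reply with a single JSON object whose keys are the field names; "
        "use null for any field that is not present.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"Document text:\n{ocr_text}"
    )


def parse_extraction_reply(reply, fields):
    """Parse a JSON object reply and keep only the requested fields.

    Fields missing from the reply are returned as None.
    """
    data = json.loads(reply)
    return {name: data.get(name) for name in fields}
```

In use, the OCR-processed text file would be read into a string, passed to build_extraction_prompt together with the field descriptions, and the model's reply passed to parse_extraction_reply.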
License
Xtracture is released under the MIT License.
Download files
Source Distribution
xtracture-0.4.0.tar.gz (3.6 kB)
Built Distribution
Hashes for xtracture-0.4.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 903b5808812924a44da3de79388bc671a463306a66f563a53d9bb388f48853ab
MD5 | 130633b04333762c2a53361f8b4f8aa1
BLAKE2b-256 | 079f8075ad9349c3ee7d7eae90253db70f7fd7838bb2a0465f721a55e44a5b77