Extractous Python Binding
This project provides Python bindings for the Extractous library, so you can use Extractous functionality from your Python applications.
Installation
To install the extractous Python bindings, you can use pip:
pip install extractous
Usage
Extracting a file to string:
from extractous import Extractor
# Create a new extractor
extractor = Extractor()
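# Reassign the result: the setters are used as chainable calls that return the configured extractor
extractor = extractor.set_extract_string_max_length(1000)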
# Extract text from a file
result, metadata = extractor.extract_file_to_string("README.md")
print(result)
print(metadata)
Extracting a file (or URL / bytearray) to a buffered stream:
from extractous import Extractor
extractor = Extractor()
# for file
reader, metadata = extractor.extract_file("tests/quarkus.pdf")
# for url
# reader, metadata = extractor.extract_url("https://www.google.com")
# for bytearray
# with open("tests/quarkus.pdf", "rb") as file:
#     buffer = bytearray(file.read())
# reader, metadata = extractor.extract_bytes(buffer)
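# Collect the raw chunks first and decode once at the end, so a multi-byte
# UTF-8 character split across two 4096-byte reads is not decoded in halves
chunks = []
buffer = reader.read(4096)
while len(buffer) > 0:
    chunks.append(buffer)
    buffer = reader.read(4096)
result = b"".join(chunks).decode("utf-8")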
print(result)
print(metadata)
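Decoding each chunk on its own can raise `UnicodeDecodeError` when a multi-byte UTF-8 character straddles a chunk boundary. The following is a minimal sketch of one way to decode the stream as it arrives, using the standard library's incremental decoder. The helper name `read_all_text` is hypothetical and not part of extractous, and the sketch assumes only that the reader has a `read(n)` method returning a bytes-like object that is empty at end of stream. An `io.BytesIO` object stands in for the extractous reader here.

```python
import codecs
import io


def read_all_text(reader, chunk_size=4096, encoding="utf-8"):
    """Read a byte stream to the end and return the decoded text.

    `reader` is any object with a `read(n)` method that returns a
    bytes-like object, and an empty one at end of stream.
    """
    # The incremental decoder holds back an incomplete multi-byte sequence
    # until the next chunk arrives instead of raising an error.
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    chunk = reader.read(chunk_size)
    while len(chunk) > 0:
        parts.append(decoder.decode(bytes(chunk)))
        chunk = reader.read(chunk_size)
    # Flush the decoder; this raises if the stream ended in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Stand-in for the extractous reader: "é" takes two bytes in UTF-8,
# and a chunk size of 4 splits it across two reads.
demo = io.BytesIO("café au lait".encode("utf-8"))
print(read_all_text(demo, chunk_size=4))
```

With a real extraction, the `reader` returned by `extract_file` would be passed in place of `demo`, assuming it behaves as in the loop above.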
Extracting a file with OCR:
from extractous import Extractor, TesseractOcrConfig
# Use the Tesseract language code that matches the document ("eng" for English, "deu" for German)
extractor = Extractor().set_ocr_config(TesseractOcrConfig().set_language("eng"))
result, metadata = extractor.extract_file_to_string("../../test_files/documents/eng-ocr.pdf")
print(result)
print(metadata)
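The OCR example is configured through `TesseractOcrConfig`, which suggests that Tesseract has to be available on the machine. The sketch below checks for it before attempting OCR. It assumes that OCR is delegated to an executable named `tesseract` found on `PATH`; that name and the helper `tesseract_available` are assumptions for illustration, not part of the extractous API.

```python
import shutil


def tesseract_available(executable="tesseract"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


if not tesseract_available():
    # Assumption: without the executable, OCR extraction is unlikely to work.
    print("Tesseract was not found on PATH; OCR extraction will likely fail.")
```

The language data for the configured language code (for example `deu`) would also need to be installed for Tesseract; this check does not cover that.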
Download files
Download the file for your platform: the source distribution (extractous-0.2.0.tar.gz) or one of the built distributions (wheels) for Windows, Linux, and macOS.
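The wheel file names on this page encode which interpreter and platform each file targets. As a rough illustration, the sketch below splits a wheel file name into its parts following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The helper `parse_wheel_filename` is hypothetical and deliberately simple; it is not a replacement for the tag matching that pip performs.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the Python tag,
    # the ABI tag, and the platform tag.
    rest, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    # What remains is name-version, optionally followed by a build tag.
    name, version = rest.split("-")[:2]
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl")
print(info["python"], info["abi"], info["platform"])
```

Here `cp38` together with `abi3` corresponds to the "CPython 3.8+" tag shown for the wheels, and the platform tag identifies the operating system and architecture.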
File details
Details for the file extractous-0.2.0.tar.gz.
File metadata
- Download URL: extractous-0.2.0.tar.gz
- Size: 177.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 52cded4bad0830f3680e2f2affca5f9cb42bb8f823cc59c2d81e5c4be29be3d7
MD5 | 47a0820f2ec6c9c6f38aa1034ef5d4fc
BLAKE2b-256 | 3f9de60e38d67367cc33439b954d88d154375bb290283d597e597c5a1aa454ca
Provenance
The following attestation bundles were made for extractous-0.2.0.tar.gz:
Publisher: release_python.yml on yobix-ai/extractous
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: extractous-0.2.0.tar.gz
- Subject digest: 52cded4bad0830f3680e2f2affca5f9cb42bb8f823cc59c2d81e5c4be29be3d7
- Sigstore transparency entry: 149382414
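The SHA256 digests listed for each file can be compared against a local download. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the path passed in is assumed to point at a file you have already downloaded.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("extractous-0.2.0.tar.gz", "52cded4bad0830f3680e2f2affca5f9cb42bb8f823cc59c2d81e5c4be29be3d7")` would be expected to return `True` for an intact copy of the source distribution.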
File details
Details for the file extractous-0.2.0-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: extractous-0.2.0-cp38-abi3-win_amd64.whl
- Size: 40.0 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | e6e6b2dc22123575ec13aa9191738119a9bbda456c36615e0cc3f5e3d21db99e
MD5 | 2587d1a6fa5ce4c319793efdc0ef0f53
BLAKE2b-256 | 7a19f7783d45710d974fe535b9915b72e5c41c2e396b7f44d28a9bfe150e7c6f
Provenance
The following attestation bundles were made for extractous-0.2.0-cp38-abi3-win_amd64.whl:
Publisher: release_python.yml on yobix-ai/extractous
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: extractous-0.2.0-cp38-abi3-win_amd64.whl
- Subject digest: e6e6b2dc22123575ec13aa9191738119a9bbda456c36615e0cc3f5e3d21db99e
- Sigstore transparency entry: 149382419
File details
Details for the file extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl
- Size: 41.2 MB
- Tags: CPython 3.8+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 004fc5904fd1948254a42f231e097b7e1a988054f623d3b97d13df3639e1c1f6
MD5 | 63f9c1d9d14700b6e1365c1343d410dc
BLAKE2b-256 | 5424239e07b7d7c5d3e49e94c0beeb7ef1f42b3a5934ce9689578e7b23fe74e5
Provenance
The following attestation bundles were made for extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl:
Publisher: release_python.yml on yobix-ai/extractous
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl
- Subject digest: 004fc5904fd1948254a42f231e097b7e1a988054f623d3b97d13df3639e1c1f6
- Sigstore transparency entry: 149382418
File details
Details for the file extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl
- Size: 47.5 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | beba3e32a0946b801978eac62c3c443b959bdd219e9b0f1836f1dd22db353178
MD5 | ea62ecb5acbd625cabf4e37929672554
BLAKE2b-256 | a718208e50b1e6281f71bd79999c940e3218b0923bdd4cf83fd59442363d63a4
Provenance
The following attestation bundles were made for extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl:
Publisher: release_python.yml on yobix-ai/extractous
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl
- Subject digest: beba3e32a0946b801978eac62c3c443b959bdd219e9b0f1836f1dd22db353178
- Sigstore transparency entry: 149382415
File details
Details for the file extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl
- Size: 48.1 MB
- Tags: CPython 3.8+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | cd8418541170f8e1bcf5bebff38810475c6809ba771219e0c06c55bdf355fda7
MD5 | 8fa047a98b960a683d3757af0711759d
BLAKE2b-256 | d0bdf80165b048bed8de38acf8b7bd046a8d15b640d5fa8aba9ce383ca84331f
Provenance
The following attestation bundles were made for extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl:
Publisher: release_python.yml on yobix-ai/extractous
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl
- Subject digest: cd8418541170f8e1bcf5bebff38810475c6809ba771219e0c06c55bdf355fda7
- Sigstore transparency entry: 149382417