Extractous Python Binding

This project provides Python bindings for the Extractous library, so you can use Extractous functionality in your Python applications.

Installation

To install the extractous Python bindings, you can use pip:

pip install extractous

Usage

Extracting a file to string:

from extractous import Extractor

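# Create a new extractor and limit the extracted string length to 1000.
# Setters return an extractor (the OCR example chains them), so keep the
# returned object in case the original is not modified in place.
extractor = Extractor()
extractor = extractor.set_extract_string_max_length(1000)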

# Extract text from a file
result, metadata = extractor.extract_file_to_string("README.md")
print(result)
print(metadata)

Extracting a file (or a URL / bytearray) to a buffered stream:

import codecs

from extractous import Extractor

extractor = Extractor()
# for file
reader, metadata = extractor.extract_file("tests/quarkus.pdf")
# for url
# reader, metadata = extractor.extract_url("https://www.google.com")
# for bytearray
# with open("tests/quarkus.pdf", "rb") as file:
#     data = bytearray(file.read())
# reader, metadata = extractor.extract_bytes(data)

# Decode incrementally so a multi-byte UTF-8 character split across
# two 4096-byte chunks does not raise UnicodeDecodeError
decoder = codecs.getincrementaldecoder("utf-8")()
parts = []
chunk = reader.read(4096)
while len(chunk) > 0:
    parts.append(decoder.decode(chunk))
    chunk = reader.read(4096)
parts.append(decoder.decode(b"", final=True))
result = "".join(parts)

print(result)
print(metadata)
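
A chunked read like the one above can be wrapped in a reusable generator. The sketch below is not part of Extractous: iter_text is a hypothetical helper, and io.BytesIO stands in for the reader returned by extract_file, on the assumption that the reader exposes read(n) returning bytes, as the loop above suggests. It decodes incrementally, so a multi-byte UTF-8 character that lands across two chunks is still handled.

```python
import codecs
import io


def iter_text(reader, chunk_size=4096, encoding="utf-8"):
    """Yield decoded text from a byte stream, one chunk at a time.

    Hypothetical helper: `reader` is any object whose read(n) returns bytes.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Stand-in stream: "é" is two bytes in UTF-8, so a 1-byte chunk size splits it.
fake_reader = io.BytesIO("café".encode("utf-8"))
print("".join(iter_text(fake_reader, chunk_size=1)))  # prints: café
```

Yielding pieces instead of building one large string lets a caller process long documents as they arrive, for example by writing each piece to a file.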

Extracting a file with OCR:

from extractous import Extractor, TesseractOcrConfig

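# The Tesseract language code should match the document's language
# ("eng" for the English sample below; use "deu" for German text)
extractor = Extractor().set_ocr_config(TesseractOcrConfig().set_language("eng"))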
result, metadata = extractor.extract_file_to_string("../../test_files/documents/eng-ocr.pdf")

print(result)
print(metadata)
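
The OCR example is configured through TesseractOcrConfig, which suggests that a Tesseract installation with the matching language data must be available. That dependency is an assumption here, not something this page states. Under that assumption, a minimal sketch for checking the language data before running OCR could look like this. parse_langs and tesseract_has_language are hypothetical helpers, and `tesseract --list-langs` is the Tesseract command-line option for listing installed languages.

```python
import shutil
import subprocess


def parse_langs(listing):
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    # The first line is a header such as: List of available languages ... (3):
    if lines and lines[0].lower().startswith("list of"):
        lines = lines[1:]
    return set(lines)


def tesseract_has_language(code):
    """Return True if a `tesseract` binary on PATH reports language `code`."""
    exe = shutil.which("tesseract")
    if exe is None:
        return False
    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some Tesseract releases print the listing to stderr, so check both.
    return code in parse_langs(proc.stdout + "\n" + proc.stderr)


if not tesseract_has_language("eng"):
    print("Tesseract or its 'eng' language data was not found")
```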

Download files

Download the file for your platform.

Source Distribution

  • extractous-0.2.0.tar.gz (177.4 kB): Source

Built Distributions

  • extractous-0.2.0-cp38-abi3-win_amd64.whl (40.0 MB): CPython 3.8+, Windows x86-64
  • extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl (41.2 MB): CPython 3.8+, manylinux: glibc 2.28+ x86-64
  • extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl (47.5 MB): CPython 3.8+, macOS 11.0+ ARM64
  • extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl (48.1 MB): CPython 3.8+, macOS 10.12+ x86-64
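
The wheel filenames above follow the standard name-version-python-abi-platform layout, which is how the "CPython 3.8+" and platform labels are derived: cp38 with abi3 marks a build against CPython's stable ABI that also works on later 3.x releases. As an illustration only, the hypothetical parse_wheel_filename helper below splits a filename into those fields. It ignores the optional build tag, and real tooling would more likely rely on the packaging library.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename of the form name-version-python-abi-platform.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # An optional build tag would add a sixth field; this sketch rejects that case.
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(*parts)


tags = parse_wheel_filename("extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
# prints: cp38 abi3 manylinux_2_28_x86_64
```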

File details

Details for the file extractous-0.2.0.tar.gz.

File metadata

  • Size: 177.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for extractous-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       52cded4bad0830f3680e2f2affca5f9cb42bb8f823cc59c2d81e5c4be29be3d7
MD5          47a0820f2ec6c9c6f38aa1034ef5d4fc
BLAKE2b-256  3f9de60e38d67367cc33439b954d88d154375bb290283d597e597c5a1aa454ca
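
A published digest like the SHA256 value above can be compared against a downloaded file. The following is a minimal sketch using only the standard library. sha256_of and matches_published_hash are hypothetical helper names, and the file is assumed to have been downloaded already.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("extractous-0.2.0.tar.gz", "52cded4bad0830f3680e2f2affca5f9cb42bb8f823cc59c2d81e5c4be29be3d7") should return True for an intact copy of the source distribution. Reading in chunks keeps memory use flat for the larger wheel files.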

Provenance

The following attestation bundles were made for extractous-0.2.0.tar.gz:

Publisher: release_python.yml on yobix-ai/extractous

File details

Details for the file extractous-0.2.0-cp38-abi3-win_amd64.whl.

File metadata

  • Size: 40.0 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for extractous-0.2.0-cp38-abi3-win_amd64.whl

Algorithm    Hash digest
SHA256       e6e6b2dc22123575ec13aa9191738119a9bbda456c36615e0cc3f5e3d21db99e
MD5          2587d1a6fa5ce4c319793efdc0ef0f53
BLAKE2b-256  7a19f7783d45710d974fe535b9915b72e5c41c2e396b7f44d28a9bfe150e7c6f

Provenance

The following attestation bundles were made for extractous-0.2.0-cp38-abi3-win_amd64.whl:

Publisher: release_python.yml on yobix-ai/extractous

File details

Details for the file extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl

Algorithm    Hash digest
SHA256       004fc5904fd1948254a42f231e097b7e1a988054f623d3b97d13df3639e1c1f6
MD5          63f9c1d9d14700b6e1365c1343d410dc
BLAKE2b-256  5424239e07b7d7c5d3e49e94c0beeb7ef1f42b3a5934ce9689578e7b23fe74e5

Provenance

The following attestation bundles were made for extractous-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl:

Publisher: release_python.yml on yobix-ai/extractous

File details

Details for the file extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl

Algorithm    Hash digest
SHA256       beba3e32a0946b801978eac62c3c443b959bdd219e9b0f1836f1dd22db353178
MD5          ea62ecb5acbd625cabf4e37929672554
BLAKE2b-256  a718208e50b1e6281f71bd79999c940e3218b0923bdd4cf83fd59442363d63a4

Provenance

The following attestation bundles were made for extractous-0.2.0-cp38-abi3-macosx_11_0_arm64.whl:

Publisher: release_python.yml on yobix-ai/extractous

File details

Details for the file extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl

Algorithm    Hash digest
SHA256       cd8418541170f8e1bcf5bebff38810475c6809ba771219e0c06c55bdf355fda7
MD5          8fa047a98b960a683d3757af0711759d
BLAKE2b-256  d0bdf80165b048bed8de38acf8b7bd046a8d15b640d5fa8aba9ce383ca84331f

Provenance

The following attestation bundles were made for extractous-0.2.0-cp38-abi3-macosx_10_12_x86_64.whl:

Publisher: release_python.yml on yobix-ai/extractous
