Extractous Python Bindings

This project provides Python bindings for the Extractous library, so you can use Extractous functionality in your Python applications.

Installation

Install the Extractous Python bindings with pip:

pip install extractous

Usage

Extracting a file to string:

from extractous import Extractor

# Create a new extractor
extractor = Extractor()
# Cap the length of the extracted string
extractor = extractor.set_extract_string_max_length(1000)
# Uncomment to get the result as XML
# extractor = extractor.set_xml_output(True)

# Extract text from a file
result, metadata = extractor.extract_file_to_string("README.md")
print(result)
print(metadata)
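
When the same extractor is applied to many files, it can help to keep going after one of them fails. The sketch below is one way to do that. `extract_many` is a hypothetical helper, not part of extractous; it assumes only that the object passed in has an `extract_file_to_string(path)` method returning a `(text, metadata)` pair, as in the example above.

```python
from typing import Any, Dict, Iterable, Tuple


def extract_many(
    extractor: Any, paths: Iterable[str]
) -> Tuple[Dict[str, Tuple[str, Any]], Dict[str, str]]:
    """Call extractor.extract_file_to_string on each path.

    Hypothetical helper: returns (results, errors), where results maps a path
    to its (text, metadata) pair and errors maps a path to an error message.
    """
    results: Dict[str, Tuple[str, Any]] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        try:
            text, metadata = extractor.extract_file_to_string(path)
        except Exception as exc:  # the exact exception types are not assumed here
            errors[path] = str(exc)
        else:
            results[path] = (text, metadata)
    return results, errors
```

With a real extractor this might be called as `results, errors = extract_many(Extractor(), ["README.md", "tests/quarkus.pdf"])`.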

Extracting a file (or a URL / bytearray) to a buffered stream:

from extractous import Extractor

extractor = Extractor()
# for file
reader, metadata = extractor.extract_file("tests/quarkus.pdf")
# for url
# reader, metadata = extractor.extract_url("https://www.google.com")
# for bytearray
# with open("tests/quarkus.pdf", "rb") as file:
#     buffer = bytearray(file.read())
# reader, metadata = extractor.extract_bytes(buffer)

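# Collect the raw bytes and decode once at the end, so that a multi-byte
# UTF-8 character split across two reads is not decoded in halves.
chunks = []
chunk = reader.read(4096)
while len(chunk) > 0:
    chunks.append(chunk)
    chunk = reader.read(4096)
result = b"".join(chunks).decode("utf-8")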

print(result)
print(metadata)
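
The read loop above can also be wrapped in a small reusable helper. The sketch below is not part of extractous: `read_all_text` is a hypothetical name, and it assumes only that the reader has a `read(size)` method that returns bytes-like chunks and an empty value at end of stream, and that the stream is UTF-8. It uses the standard library's incremental decoder, so text can be decoded chunk by chunk even when a character straddles two chunks.

```python
import codecs
import io


def read_all_text(reader, chunk_size=4096, encoding="utf-8"):
    """Read a byte stream to the end and return it as one decoded string."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        # The decoder holds back incomplete trailing bytes until the next call.
        parts.append(decoder.decode(bytes(chunk)))
    # Flush the decoder; this raises if the stream ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# io.BytesIO stands in for an extractous reader so the sketch runs on its own.
# The tiny chunk size forces a two-byte character to be split across reads.
demo_reader = io.BytesIO("na\u00efve caf\u00e9".encode("utf-8"))
demo_text = read_all_text(demo_reader, chunk_size=3)
```

With the reader from the example above, the call would be `result = read_all_text(reader)`.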

Extracting a file with OCR:

from extractous import Extractor, TesseractOcrConfig

# The Tesseract language code should match the document's language ("eng" for this English sample)
extractor = Extractor().set_ocr_config(TesseractOcrConfig().set_language("eng"))
result, metadata = extractor.extract_file_to_string("../../test_files/documents/eng-ocr.pdf")

print(result)
print(metadata)
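
OCR here is configured through `TesseractOcrConfig`, so it presumably depends on a Tesseract installation, with the matching language data, being available on the machine. That dependency is an assumption, not something this page states. Under that assumption, a quick pre-flight check might look like the sketch below; `tesseract_on_path` is a hypothetical helper that uses only the standard library.

```python
import shutil


def tesseract_on_path(executable="tesseract"):
    """Return True if an executable with this name is found on PATH."""
    return shutil.which(executable) is not None


if not tesseract_on_path():
    # Assumption: without the tesseract binary, OCR extraction cannot work.
    print("tesseract was not found on PATH; OCR extraction may fail")
```

This only checks for the binary; it does not verify that a given language pack is installed.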

Download files

Download the file for your platform.

Source Distribution

  • extractous-0.3.0.tar.gz (179.2 kB): source

Built Distributions

  • extractous-0.3.0-cp38-abi3-win_amd64.whl (41.1 MB): CPython 3.8+, Windows x86-64
  • extractous-0.3.0-cp38-abi3-manylinux_2_28_x86_64.whl (42.3 MB): CPython 3.8+, manylinux (glibc 2.28+), x86-64
  • extractous-0.3.0-cp38-abi3-macosx_11_0_arm64.whl (48.5 MB): CPython 3.8+, macOS 11.0+, ARM64
  • extractous-0.3.0-cp38-abi3-macosx_10_12_x86_64.whl (49.1 MB): CPython 3.8+, macOS 10.12+, x86-64
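
pip normally picks the right file on its own, but the wheel names themselves say which platform each file targets. Wheel filenames follow the pattern `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The sketch below splits a name into those parts; `parse_wheel_filename` is a hypothetical helper written for illustration, not part of extractous or pip.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components in: " + filename)
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,      # e.g. cp38: CPython 3.8
        "abi_tag": abi_tag,            # e.g. abi3: CPython stable ABI
        "platform_tag": platform_tag,  # e.g. win_amd64
    }


info = parse_wheel_filename("extractous-0.3.0-cp38-abi3-manylinux_2_28_x86_64.whl")
```

Here `info["python_tag"]` is `cp38` and `info["abi_tag"]` is `abi3`, which is why a single wheel per platform is listed as covering CPython 3.8+.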
