Peafowl Dox

A utility library for image and document processing. It provides tools for handling multipart uploads, converting PDFs to images, and preparing documents for OCR and ML pipelines.

Installation

pip install peafowl_dox

Quick Start

from fastapi import FastAPI, UploadFile
from peafowl_dox import multipart_to_array

app = FastAPI()

@app.post("/upload/")
async def upload_image(file: UploadFile):
    # file.file is the underlying binary file object of the upload
    image_array = multipart_to_array(file.file)
    print(f"Image shape: {image_array.shape}")
    return {"message": "Image processed successfully"}

API Reference

Image Upload Processing

Convert multipart file uploads to numpy arrays.

from peafowl_dox import multipart_to_array

# From FastAPI upload
array = multipart_to_array(file.file)

# From Flask upload
array = multipart_to_array(request.files['image'])

# From BytesIO
from io import BytesIO
with open('image.jpg', 'rb') as f:
    buffer = BytesIO(f.read())
    array = multipart_to_array(buffer)

Returns: np.ndarray with shape (height, width, channels)


PDF Conversion

Convert PDF pages to image arrays.

from peafowl_dox import pdf_to_images

# From file path
images = pdf_to_images("document.pdf", dpi=150)

# From bytes
with open("document.pdf", "rb") as f:
    images = pdf_to_images(f.read(), dpi=200)

# From upload
images = pdf_to_images(file.file, dpi=150)

print(f"Converted {len(images)} pages")

Parameters:

  • pdf_input: File path, bytes, or file-like object
  • dpi: Resolution (default: 150)
  • image_format: "RGB", "RGBA", or "L" (grayscale)

Returns: List of numpy arrays (one per page)
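
The dpi parameter determines how large each page array will be. PDF page dimensions are expressed in points (1/72 inch), so the pixel size scales by dpi / 72. The sketch below estimates the output size before converting. The helper names page_pixels and rgb_bytes are hypothetical and not part of peafowl_dox, and the library's exact rounding may differ by a pixel.

```python
POINTS_PER_INCH = 72  # PDF unit: 1 point = 1/72 inch


def page_pixels(width_pt, height_pt, dpi=150):
    """Estimate the raster size in pixels for a page given in PDF points."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def rgb_bytes(width_px, height_px):
    """Approximate memory for one 8-bit RGB page array (3 bytes per pixel)."""
    return width_px * height_px * 3


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(page_pixels(612, 792, dpi=150))            # (1275, 1650)
print(page_pixels(612, 792, dpi=200))            # (1700, 2200)
print(rgb_bytes(*page_pixels(612, 792, dpi=150)))  # 6311250
```

Doubling the dpi roughly quadruples the memory per page, which matters when a long document is converted into a list of arrays held in memory at once.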


Image Resizing

Resize images, optionally preserving the aspect ratio.

from peafowl_dox import resize_image

# Resize to specific dimensions
resized = resize_image(image, (800, 600), maintain_aspect=True)

# Resize by max dimension
resized = resize_image(image, 1024)  # Max 1024px

# Resize to exact dimensions, ignoring aspect ratio
resized = resize_image(image, (800, 600), maintain_aspect=False)

Parameters:

  • image: Numpy array
  • target_size: (width, height) tuple or single int for max dimension
  • maintain_aspect: Preserve aspect ratio (default: True)
  • interpolation: OpenCV interpolation method (default: cv2.INTER_AREA)
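
Note that target_size is given as (width, height), while a numpy image array reports its shape as (height, width, channels). The sketch below shows one common way to compute an aspect-preserving output size: scale the image so it fits inside the target box. The function fit_size is hypothetical, and whether resize_image uses exactly this fit-inside rule (or upscales small images) is an assumption to check against the library.

```python
def fit_size(shape, target_size):
    """Return (new_width, new_height) that preserves the aspect ratio.

    shape: array shape as (height, width) or (height, width, channels)
    target_size: (width, height) bounding box, or an int max dimension
    """
    height, width = shape[:2]
    if isinstance(target_size, int):
        # Single int: the longer side becomes target_size.
        scale = target_size / max(width, height)
    else:
        # Tuple: use the smaller ratio so both sides fit inside the box.
        box_width, box_height = target_size
        scale = min(box_width / width, box_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_size((1200, 1600, 3), (800, 600)))  # (800, 600)
print(fit_size((1000, 2000, 3), 1024))        # (1024, 512)
print(fit_size((3000, 1000, 3), (800, 600)))  # (200, 600)
```

The last call shows why the result can be smaller than the requested box: a tall image fitted into a wide box is limited by its height.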

OCR Preparation

Prepare images for OCR, with optional noise reduction and contrast enhancement.

from peafowl_dox import prepare_for_ocr

# Basic preparation
ocr_ready = prepare_for_ocr(image)

# With resize and enhancement
ocr_ready = prepare_for_ocr(image, target_size=1200, enhance=True)

# Custom size without enhancement
ocr_ready = prepare_for_ocr(image, target_size=(1024, 768), enhance=False)

Parameters:

  • image: Numpy array
  • target_size: Optional resize target (int or tuple)
  • enhance: Apply noise reduction and contrast enhancement (default: True)

Returns: Grayscale numpy array optimized for OCR


Image Processor Class

Full-featured processor for complex workflows.

from peafowl_dox import ImageProcessor

processor = ImageProcessor(default_ocr_size=1200)

# Process from path
image = processor.process_image("path/to/image.jpg")

# Process from bytes
with open("image.jpg", "rb") as f:
    image = processor.process_image(f.read())

# Process from numpy array
image = processor.process_image(existing_array)

# Process directly for OCR
ocr_ready = processor.process_for_ocr("path/to/scan.jpg", enhance=True)

Methods:

  • process_image(img): Accepts path, bytes, or numpy array → returns RGB array
  • process_for_ocr(img, target_size, enhance): Process and prepare for OCR
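
Accepting a path, bytes, or an array through one argument means the input type has to be identified first. The sketch below illustrates that kind of dispatch in isolation. It is not the library's implementation: classify_source is a hypothetical helper, and the array check is duck-typed so the example runs without numpy installed.

```python
import os


def classify_source(source):
    """Name the kind of image input: 'path', 'bytes', or 'array'."""
    if isinstance(source, (str, os.PathLike)):
        return "path"
    if isinstance(source, (bytes, bytearray, memoryview)):
        return "bytes"
    # Duck-typed stand-in for isinstance(source, numpy.ndarray).
    if hasattr(source, "shape") and hasattr(source, "dtype"):
        return "array"
    raise TypeError(f"unsupported image source: {type(source).__name__}")


print(classify_source("path/to/image.jpg"))  # path
print(classify_source(b"\xff\xd8\xff"))      # bytes
```

One consequence of this style of API is that a str is always treated as a path, so raw image data must be passed as bytes rather than as a decoded string.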

Error Handling

from peafowl_dox import (
    PeafowlDoxError,
    ImageProcessingError,
    PDFConversionError,
    multipart_to_array,
    pdf_to_images,
)

try:
    images = pdf_to_images("document.pdf")
except PDFConversionError as e:
    print(f"PDF conversion failed: {e}")

try:
    # file: an uploaded file object, as in the examples above
    array = multipart_to_array(file)
except ImageProcessingError as e:
    print(f"Image processing failed: {e}")
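
The exception names suggest that PeafowlDoxError is a shared base class, although this README does not state the hierarchy. If that holds, a single handler can catch every library error at an API boundary. The sketch below uses locally defined stand-in classes, not the real ones, to show the pattern. The subclass relationship and the helper describe_failure are assumptions for illustration.

```python
# Stand-in classes for illustration only; the real ones live in peafowl_dox.
# ASSUMPTION: both specific errors subclass PeafowlDoxError.
class PeafowlDoxError(Exception):
    pass


class ImageProcessingError(PeafowlDoxError):
    pass


class PDFConversionError(PeafowlDoxError):
    pass


def describe_failure(action):
    """Run action() and map any library failure to a short status string."""
    try:
        action()
    except PDFConversionError as exc:
        # Most specific handler first.
        return f"pdf: {exc}"
    except PeafowlDoxError as exc:
        # Catch-all for any other library error.
        return f"library: {exc}"
    return "ok"


def broken_pdf():
    raise PDFConversionError("not a PDF")


def broken_image():
    raise ImageProcessingError("cannot decode")


print(describe_failure(broken_pdf))    # pdf: not a PDF
print(describe_failure(broken_image))  # library: cannot decode
print(describe_failure(lambda: None))  # ok
```

Before relying on this, it is worth confirming the relationship against the installed package, for example with issubclass(PDFConversionError, PeafowlDoxError).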

Dependencies

  • Python >= 3.8
  • numpy >= 1.24.0
  • Pillow >= 10.0.0
  • opencv-python >= 4.8.0
  • PyMuPDF >= 1.23.0
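
To check an existing environment against these minimums, something like the following sketch could be used. It relies only on the standard library (importlib.metadata, available from Python 3.8). The helpers version_tuple and check_minimums are hypothetical, and the version comparison is deliberately simplified; packaging.version handles pre-releases and other edge cases more robustly.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "numpy": "1.24.0",
    "Pillow": "10.0.0",
    "opencv-python": "4.8.0",
    "PyMuPDF": "1.23.0",
}


def version_tuple(text):
    """Turn '4.8.0.76' into (4, 8, 0, 76) using each part's leading digits."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_minimums(minimums=MINIMUMS):
    """Return a {distribution: status} report for the current environment."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for name, status in check_minimums().items():
    print(f"{name}: {status}")
```

A distribution installed under a different name, such as opencv-python-headless in place of opencv-python, will be reported as missing by this check even though cv2 is importable.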

Changelog

[0.1.0] - 2025-11-06

  • Initial release
  • multipart_to_array: Convert uploads to numpy arrays
  • pdf_to_images: PDF to image conversion
  • resize_image: Smart image resizing
  • prepare_for_ocr: OCR preparation with enhancement
  • ImageProcessor: Full-featured processing class
