A comprehensive JPEG2000 processing tool with BnF compatibility

JP2Forge: JPEG2000 Processing Tool

License: MIT | Project Status: Active | Version: 0.9.5

Current Status: Release - JP2Forge 0.9.5 is now available! This version fixes report generation issues in all processing modes and adds comprehensive testing capabilities. See the CHANGELOG for details.

JP2Forge is a comprehensive solution for converting images to JPEG2000 format with support for both standard and BnF (Bibliothèque nationale de France) compliant workflows. This project implements JPEG2000 processing according to technical specifications published by the Bibliothèque nationale de France (BnF) in their "Référentiel de format de fichier image v2" (2015). This implementation is provided for educational and training purposes to demonstrate standards implementation. All BnF-specific parameters are based on publicly available technical documentation with proper attribution to BnF as the source of these specifications.

Recent Updates

  • Fixed report generation issues in single file processing mode
  • Added comprehensive test script for validating JP2 conversion in different scenarios
  • Added advanced parallel processing with adaptive worker scaling
  • Implemented memory-efficient streaming image processor for large files
  • Added proper XML processing for metadata with lxml
  • Created centralized configuration system with validation
  • Enhanced external tool management with automatic detection
  • Added support for large image processing via chunking
  • Improved color profile handling for unusual color spaces
  • Enhanced error recovery and robustness
  • Refactored metadata handling for better organization and extensibility
  • Added configurable logging with file output option
  • See the CHANGELOG for a full list of changes

Table of Contents

  1. Overview
  2. Features
  3. Installation
  4. Basic Usage
  5. BnF Compliance
  6. Mode Switching
  7. Result Interpretation
  8. Advanced Configuration
  9. API Usage
  10. Troubleshooting
  11. Intellectual Property
  12. Project Architecture
  13. Web Interface
  14. Contributing

Overview

JP2Forge provides tools for:

  • Converting standard image formats to JPEG2000
  • Quality analysis with pixel loss assessment
  • Metadata handling with BnF compliant format support
  • Parallel processing for improved performance
  • Configurable workflows and compression modes

Features

  • Multiple Compression Modes:

    • lossless: No data loss, larger file size
    • lossy: Higher compression with data loss
    • supervised: Quality-controlled compression with analysis
    • bnf_compliant: BnF standards with fixed compression ratios
  • BnF Standards Support:

    • Fixed compression ratios by document type
    • Standard technical parameters (resolution levels, progression order)
    • XMP metadata in UUID box
  • Advanced Parallel Processing:

    • Adaptive worker pool with resource monitoring
    • Automatic scaling based on system load
    • Progress tracking with ETA estimation
    • Memory-aware resource allocation
  • Memory-Efficient Processing:

    • Streaming image processor for files of any size
    • Automatic chunking for memory optimization
    • Adaptive memory management
    • Support for very large files (>50MP)
  • Multi-page Document Support:

    • Automatic detection and handling of multi-page TIFF files
    • Individual page extraction and processing
    • Options for skipping, overwriting, or maintaining existing files
    • Detailed reports for each page in multi-page documents
  • Color Profile Management:

    • Automatic normalization of color profiles
    • Support for unusual color spaces (CMYK, LAB, etc.)
    • ICC profile preservation and conversion
  • Quality Analysis:

    • PSNR (Peak Signal-to-Noise Ratio) calculation
    • SSIM (Structural Similarity Index) analysis
    • MSE (Mean Square Error) measurement
  • Configuration Management:

    • Hierarchical configuration system
    • YAML and JSON configuration files
    • Environment variable override support
    • Schema validation with type checking
  • External Tool Integration:

    • Automatic detection of Kakadu, ExifTool, and jpylyzer
    • Capability-based tool selection
    • Graceful fallbacks when tools aren't available
    • Version and compatibility checking

Installation

System Dependencies

Required external tools:

  • ExifTool (required for metadata functionality):

    • On macOS: brew install exiftool
    • On Ubuntu/Debian: sudo apt install libimage-exiftool-perl
    • On Windows: Download from ExifTool's website
  • JPylyzer (required for proper JP2 validation):

    • pip install jpylyzer

If using the --use-kakadu option, Kakadu Software must be separately acquired and installed.
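Before running a conversion, it can help to confirm which external tools are reachable. The following is a minimal sketch using only the Python standard library; it is not JP2Forge's own tool manager, and the executable names (`exiftool`, `jpylyzer`, `kdu_compress`) are assumptions that may differ on your platform.

```python
import shutil

# Executable names are assumptions; adjust them for your platform.
TOOLS = {
    "exiftool": "exiftool",
    "jpylyzer": "jpylyzer",
    "kakadu": "kdu_compress",
}


def find_tools(tools=TOOLS):
    """Map each tool label to its resolved path, or None if it is not on PATH."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


if __name__ == "__main__":
    for label, path in find_tools().items():
        print(f"{label}: {path or 'not found'}")
```

A `None` entry for `exiftool` or `jpylyzer` suggests the corresponding installation step above has not been completed, or that the tool is installed outside `PATH`.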

Using Python venv (Recommended for Most Users)

Python's built-in virtual environment tool is lightweight and perfect for most use cases:

# Clone repository
git clone https://github.com/xy-liao/jp2forge.git
cd jp2forge

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install jpylyzer

Advantages of venv:

  • Built into Python, no additional installation required
  • Lightweight and simple to use
  • Perfect for development and most production uses
  • Isolates project dependencies from system Python

Using Conda (Recommended for Complex Environments)

Conda is excellent for managing complex dependencies, especially when you need specific versions of scientific libraries:

# Clone repository
git clone https://github.com/xy-liao/jp2forge.git
cd jp2forge

# Create and activate conda environment
conda create -n jp2forge python=3.9
conda activate jp2forge

# Install dependencies
conda install -c conda-forge pillow numpy psutil exiftool lxml pyyaml
pip install jpylyzer structlog

Advantages of conda:

  • Better handling of complex binary dependencies
  • Often easier to install packages with C extensions
  • Can manage both Python and non-Python dependencies
  • Useful for environments with specific version requirements

Standard Installation

# Clone repository
git clone https://github.com/xy-liao/jp2forge.git
cd jp2forge

# Install Python dependencies
pip install -r requirements.txt
pip install jpylyzer

# Don't forget to install system dependencies (ExifTool) as mentioned above

Basic Usage

Command-line Interface

Convert a single file:

python -m cli.workflow input.tif output_dir/

Process a directory:

python -m cli.workflow input_dir/ output_dir/ --recursive

Use parallel processing:

python -m cli.workflow input_dir/ output_dir/ --parallel --max-workers 4

Enable detailed logging to a file:

python -m cli.workflow input_dir/ output_dir/ --verbose --log-file conversion.log

Document Types

Select the document type based on your content:

python -m cli.workflow input_dir/ output_dir/ --document-type photograph

Available document types:

  • photograph: Standard photographic images (default)
  • heritage_document: Historical documents with high-quality settings
  • color: General color images
  • grayscale: Grayscale images

Compression Modes

Select how compression is applied:

python -m cli.workflow input_dir/ output_dir/ --compression-mode supervised

Available compression modes:

  • supervised: Quality-controlled compression (default)
  • lossless: No data loss
  • lossy: Higher compression with data loss
  • bnf_compliant: BnF standards with fixed ratios

BnF Compliance

Process files according to BnF standards:

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant

Note: JP2Forge supports BnF-compliant processing without requiring Kakadu, implementing all essential BnF parameters through our built-in Pillow integration. While Kakadu would provide more complete support for all BnF robustness markers (SOP, EPH, PLT), the default implementation adheres to core BnF standards using Pillow's parameters.

BnF Compliance Capabilities

JP2Forge implements the following BnF specifications:

Note on Compression Ratio Notation: BnF documentation uses 1:N notation while JP2Forge uses N:1 format. See Notation Conventions for a detailed explanation.
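To make the two notations concrete, here is a small sketch that converts a BnF-style `1:N` string into the `N:1` form used by JP2Forge. The helper names `bnf_to_jp2forge` and `ratio_value` are hypothetical and are not part of the JP2Forge API.

```python
def bnf_to_jp2forge(ratio: str) -> str:
    """Convert BnF-style '1:N' notation to the 'N:1' form."""
    left, right = (part.strip() for part in ratio.split(":"))
    if left != "1":
        raise ValueError(f"expected '1:N' notation, got {ratio!r}")
    return f"{right}:1"


def ratio_value(ratio: str) -> float:
    """Return N as a float for an 'N:1' ratio string."""
    n, one = (part.strip() for part in ratio.split(":"))
    if one != "1":
        raise ValueError(f"expected 'N:1' notation, got {ratio!r}")
    return float(n)


print(bnf_to_jp2forge("1:4"))   # 4:1
print(ratio_value("16:1"))      # 16.0
```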

Core Requirements (Fully Supported)

  • Compression Parameters: Irreversible 9-7 floating-point wavelet transform, used together with the irreversible color transform (ICT)
  • BnF-specific Compression Ratios: 4:1 (photograph/heritage), 6:1 (color), 16:1 (grayscale)
  • Tolerance Setting: Configurable, default 5% as per BnF specs
  • Resolution Levels: 10 levels as required
  • Progression Order: RPCL as specified by BnF
  • Code Block Size: 64x64 as specified
  • Tile Size: 1024x1024 as specified

Metadata Handling (Fully Supported)

  • XMP Metadata Structure: Fully compliant with BnF specifications
  • UUID Box: Correctly implemented with BnF UUID format
  • Required Fields: All required BnF metadata fields supported

Advanced Features (Partially Supported)

  • Robustness Markers: Partial support for SOP, EPH, PLT markers (fully supported when using Kakadu)
  • Precinct Sizes: Basic support for precinct size parameters

Validation

  • Compliance Checking: Integration with jpylyzer for validation
  • Automated Verification: BnF-specific validation checks

Ongoing Development: We continue to enhance our BnF compliance capabilities and are working on testing and additional improvements to better adhere to all BnF specifications.

BnF Document Types and Ratios

  • photograph and heritage_document: 4:1 ratio
  • color: 6:1 ratio
  • grayscale: 16:1 ratio
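The ratios above, combined with the tolerance setting, suggest a simple check. The sketch below assumes that the ratio is computed from file sizes and that the tolerance is a symmetric relative deviation from the target; JP2Forge's actual rule may differ. `BNF_TARGET_RATIOS` and `ratio_within_tolerance` are illustrative names only.

```python
# Target ratios (N in N:1) taken from the list above.
BNF_TARGET_RATIOS = {
    "photograph": 4.0,
    "heritage_document": 4.0,
    "color": 6.0,
    "grayscale": 16.0,
}


def ratio_within_tolerance(original_bytes, converted_bytes,
                           document_type, tolerance=0.05):
    """Return (ok, actual_ratio) for a converted file.

    Assumption: 'within tolerance' means the relative deviation of the
    achieved ratio from the target ratio is at most `tolerance`.
    """
    target = BNF_TARGET_RATIOS[document_type]
    actual = original_bytes / converted_bytes
    deviation = abs(actual - target) / target
    return deviation <= tolerance, actual


ok, actual = ratio_within_tolerance(16_000_000, 1_040_000, "grayscale")
print(ok, f"{actual:.2f}:1")
```

With a stricter setting such as `--compression-ratio-tolerance 0.03`, the same file would fall outside the allowed band under this assumed rule.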

BnF Metadata

Provide BnF-compliant metadata:

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant --metadata bnf_metadata.json

Example bnf_metadata.json:

{
  "dcterms:isPartOf": "NUM_123456",
  "dcterms:provenance": "University Library",
  "dc:relation": "ark:/12148/cb123456789",
  "dc:source": "FOL-Z-123",
  "tiff:Model": "Phase One P45+",
  "tiff:Make": "Phase One",
  "aux:SerialNumber": "1234567890",
  "xmp:CreatorTool": "University Workflow",
  "tiff:Artist": "University Operator"
}
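Each key in the example carries a namespace prefix (`dcterms`, `dc`, `tiff`, `aux`, `xmp`). As a rough pre-flight check, the sketch below groups keys by prefix and rejects keys that have none. It does not verify which fields BnF requires; `group_by_namespace` is a hypothetical helper, not a JP2Forge function.

```python
import json
from collections import defaultdict


def group_by_namespace(metadata: dict) -> dict:
    """Group 'prefix:name' keys by prefix; reject keys without a prefix."""
    grouped = defaultdict(dict)
    for key, value in metadata.items():
        prefix, sep, name = key.partition(":")
        if not sep or not prefix or not name:
            raise ValueError(f"key without namespace prefix: {key!r}")
        grouped[prefix][name] = value
    return dict(grouped)


sample = json.loads("""
{
  "dcterms:isPartOf": "NUM_123456",
  "dc:source": "FOL-Z-123",
  "tiff:Make": "Phase One",
  "tiff:Model": "Phase One P45+"
}
""")
grouped = group_by_namespace(sample)
print(sorted(grouped))        # ['dc', 'dcterms', 'tiff']
print(grouped["tiff"])
```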

Using Kakadu (Conceptual Implementation)

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant --use-kakadu --kakadu-path=/path/to/kdu_compress

Important Note on Kakadu

Kakadu is commercial software that requires a separate license:

  • CONCEPTUAL IMPLEMENTATION: The Kakadu integration is currently in conceptual status and has not been tested with actual Kakadu software
  • Kakadu is not included with JP2Forge and must be acquired separately from Kakadu Software
  • A valid license is required to use Kakadu in production or commercial environments
  • Using Kakadu with JP2Forge is entirely optional - the project works with Pillow for BnF-compliant processing
  • The Kakadu integration code is provided as a reference implementation for how strict BnF compliance might be achieved in the future

Mode Switching

Normal Mode (Default)

python -m cli.workflow input_dir/ output_dir/

Options for normal mode:

python -m cli.workflow input_dir/ output_dir/ --compression-mode supervised --quality 40.0

BnF Mode

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant

Advanced BnF options:

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant --compression-ratio-tolerance 0.03

Result Interpretation

Processing Statuses

  • SUCCESS: Successfully processed with all quality metrics above thresholds
  • WARNING: Processed but quality metrics below thresholds
  • FAILURE: Processing failed, no output file generated
  • SKIPPED: File ignored (invalid or corrupted image)

Summary Report

Batch processing produces a summary report with the overall results and the file size information for each file:

## Detailed Results
----------------
File: input_dir/example_photo_001.tif
Status: SUCCESS
Output: output_dir/example_photo_001.jp2
Original Size: 24.12 MB
Converted Size: 5.31 MB
Compression Ratio: 4.54:1

File: input_dir/example_photo_002.tif
Status: SUCCESS
Output: output_dir/example_photo_002.jp2
Original Size: 23.88 MB
Converted Size: 5.24 MB
Compression Ratio: 4.56:1

File: input_dir/example_photo_003.tif
Status: SUCCESS
Output: output_dir/example_photo_003.jp2
Original Size: 23.71 MB
Converted Size: 5.56 MB
Compression Ratio: 4.27:1
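If you want to post-process such a summary, the "Key: value" layout is easy to parse. The sketch below assumes the report keeps exactly the layout shown above (blank-line-separated blocks of `Key: value` lines); `parse_summary` and `mean_ratio` are illustrative helpers, not part of JP2Forge.

```python
SAMPLE = """\
## Detailed Results
----------------
File: input_dir/example_photo_001.tif
Status: SUCCESS
Output: output_dir/example_photo_001.jp2
Original Size: 24.12 MB
Converted Size: 5.31 MB
Compression Ratio: 4.54:1

File: input_dir/example_photo_002.tif
Status: SUCCESS
Output: output_dir/example_photo_002.jp2
Original Size: 23.88 MB
Converted Size: 5.24 MB
Compression Ratio: 4.56:1
"""


def parse_summary(text):
    """Split the report into one dict per file block."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:          # blank line, heading, or rule
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")   # split on the first colon only
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def mean_ratio(records):
    """Average the N of each 'N:1' compression ratio."""
    values = [float(r["Compression Ratio"].split(":")[0]) for r in records]
    return sum(values) / len(values)


records = parse_summary(SAMPLE)
print(len(records), f"{mean_ratio(records):.2f}:1")
```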

Quality Metrics (Normal Mode)

  • PSNR: Peak Signal-to-Noise Ratio (higher is better)

    • > 40 dB: Excellent quality
    • 30-40 dB: Good quality
    • < 30 dB: Medium to low quality
  • SSIM: Structural Similarity Index (higher is better)

    • > 0.95: Excellent structural preservation
    • 0.90-0.95: Good structural preservation
    • < 0.90: More significant structural changes
  • MSE: Mean Square Error (lower is better)
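For reference, MSE and PSNR follow standard definitions: MSE is the mean of the squared sample differences, and PSNR is 10 · log10(MAX² / MSE), where MAX is 255 for 8-bit samples. The sketch below works on flat sequences of sample values in pure Python; it is not JP2Forge's analyzer, SSIM is left out because it needs windowed statistics, and `psnr_band` simply encodes the PSNR ranges listed above.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(a, b)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


def psnr_band(value):
    """Map a PSNR value to the quality bands listed above."""
    if value > 40:
        return "excellent"
    if value >= 30:
        return "good"
    return "medium to low"


original = [10, 20, 30, 40]
decoded = [11, 21, 31, 41]          # every sample off by one, so MSE is 1.0
value = psnr(original, decoded)
print(f"{value:.2f} dB", psnr_band(value))
```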

Validation Reports

JP2Forge validates generated JPEG2000 files using JPylyzer, a specialized tool for validating JP2 files. The validation results are stored in reports/info_jpylyzer.json and include:

  • Compliance with JP2 format specifications
  • File structure validation
  • Codestream properties validation
  • Detailed box structure information
  • Technical metadata extraction

Important: Installing JPylyzer (pip install jpylyzer) is strongly recommended for complete validation reports. Without JPylyzer, validation will be limited to basic file signature checks, and the "properties" fields in validation reports will be empty.

Example of JPylyzer validation properties:

{
  "toolInfo": {
    "toolName": "jpylyzer",
    "toolVersion": "2.2.1"
  },
  "fileInfo": {
    "fileName": "example.jp2",
    "filePath": "/path/to/example.jp2",
    "fileSizeInBytes": "1993629"
  },
  "isValid": true,
  "properties": {
    "fileTypeBox": {
      "br": "jp2 ",
      "minV": "0",
      "cL": "jp2 "
    },
    "jp2HeaderBox": {
      "imageHeaderBox": {
        "height": "4311",
        "width": "3358",
        "nC": "3",
        "bPCSign": "unsigned",
        "bPCDepth": "8"
      },
      "colourSpecificationBox": {
        "meth": "Enumerated",
        "enumCS": "sRGB"
      }
    },
    "compressionRatio": "2.1"
  },
  "warnings": [],
  "validationTool": "jpylyzer",
  "validationToolVersion": "2.2.1"
}
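A report shaped like the example can be summarized with the standard `json` module. The sketch below assumes a single JSON object with the keys shown above; the actual layout of reports/info_jpylyzer.json may differ (for instance, it may hold one entry per file). Note that numeric properties are stored as strings in the example, so they are converted explicitly.

```python
import json

# Trimmed copy of the example above, embedded so the sketch is self-contained.
REPORT_TEXT = """
{
  "isValid": true,
  "properties": {
    "jp2HeaderBox": {
      "imageHeaderBox": {"height": "4311", "width": "3358", "nC": "3"}
    },
    "compressionRatio": "2.1"
  },
  "warnings": []
}
"""


def summarize_validation(report: dict) -> dict:
    """Pull the fields most useful for a quick check out of a report."""
    props = report.get("properties") or {}
    header = props.get("jp2HeaderBox", {}).get("imageHeaderBox", {})
    return {
        "valid": bool(report.get("isValid")),
        # Empty properties indicate the limited, signature-only validation.
        "has_properties": bool(props),
        "width": int(header["width"]) if "width" in header else None,
        "height": int(header["height"]) if "height" in header else None,
        "warnings": list(report.get("warnings", [])),
    }


summary = summarize_validation(json.loads(REPORT_TEXT))
print(summary)
```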

File Size Information

File size information is included in both individual file reports and the summary report:

  • Original Size: Size of the input file (both in bytes and human-readable format)
  • Converted Size: Size of the output JPEG2000 file (both in bytes and human-readable format)
  • Compression Ratio: Ratio of original size to converted size (e.g., "4.50:1")
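As a worked example of these three values, the sketch below derives the human-readable sizes and the ratio from raw byte counts. It assumes that "MB" means 1024 × 1024 bytes and that values are rounded to two decimals; under those assumptions it reproduces the figures shown for example_photo_003 in this README. The function names are illustrative.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count in MB, assuming 1 MB = 1024 * 1024 bytes."""
    return f"{num_bytes / (1024 * 1024):.2f} MB"


def compression_ratio(original_bytes: int, converted_bytes: int) -> str:
    """Format original/converted size as an 'N:1' string."""
    return f"{original_bytes / converted_bytes:.2f}:1"


original, converted = 24860832, 5826410
print(human_size(original))                     # 23.71 MB
print(human_size(converted))                    # 5.56 MB
print(compression_ratio(original, converted))   # 4.27:1
```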

Example report with file size information:

{
    "original_file": "example_photo_003.tif",
    "converted_file": "example_photo_003.jp2",
    "file_sizes": {
        "original_size": 24860832,
        "original_size_human": "23.71 MB",
        "converted_size": 5826410,
        "converted_size_human": "5.56 MB",
        "compression_ratio": "4.27:1"
    },
    "metrics": {
        "psnr": "44.52 dB",
        "ssim": "0.9998",
        "mse": "2.29"
    },
    "quality_passed": "yes",
    "thresholds": {
        "psnr": 40.0,
        "ssim": 0.95,
        "mse": 50.0
    },
    "recommendations": [
        "No quality issues detected. All metrics within acceptable thresholds."
    ]
}

This information helps you evaluate the effectiveness of different compression settings.
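The `metrics` and `thresholds` fields of such a report can be compared directly. The sketch below assumes that PSNR and SSIM must be at or above their thresholds and MSE at or below its threshold, and that a failed check corresponds to the WARNING status; the exact comparison JP2Forge applies is not documented here, so treat this as an approximation.

```python
def to_number(value) -> float:
    """Turn report strings such as '44.52 dB' or '0.9998' into floats."""
    return float(str(value).split()[0])


def evaluate_quality(metrics: dict, thresholds: dict):
    """Return (status, failed_metric_names) under the assumed rules."""
    failed = []
    if to_number(metrics["psnr"]) < thresholds["psnr"]:
        failed.append("psnr")      # higher is better
    if to_number(metrics["ssim"]) < thresholds["ssim"]:
        failed.append("ssim")      # higher is better
    if to_number(metrics["mse"]) > thresholds["mse"]:
        failed.append("mse")       # lower is better
    return ("SUCCESS" if not failed else "WARNING"), failed


metrics = {"psnr": "44.52 dB", "ssim": "0.9998", "mse": "2.29"}
thresholds = {"psnr": 40.0, "ssim": 0.95, "mse": 50.0}
print(evaluate_quality(metrics, thresholds))
```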

Compression Ratio (BnF Mode)

BnF mode focuses on maintaining specific compression ratios within tolerance (default: 5%).

Advanced Configuration

Configuration Files

Save current configuration:

python -m cli.workflow input_dir/ output_dir/ --parallel --save-config my_config.yaml

Use saved configuration:

python -m cli.workflow input_dir/ output_dir/ --config my_config.yaml

Environment Variables

All configuration options can be set via environment variables:

export JP2FORGE_PROCESSING_MAX_WORKERS=4
export JP2FORGE_BNF_COMPLIANT=true
export JP2FORGE_JPEG2000_COMPRESSION_MODE=supervised
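To illustrate how variables of the form JP2FORGE_SECTION_KEY could map onto a nested configuration, here is a sketch. The parsing rule is an assumption (the first token after the prefix is the section, the remainder is the key), as is the type coercion; JP2Forge's own loader may resolve names differently.

```python
def coerce(raw: str):
    """Interpret 'true'/'false', integers, and floats; otherwise keep the string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def config_from_env(environ, prefix="JP2FORGE_"):
    """Build {section: {key: value}} from prefixed environment variables."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        section, _, key = name[len(prefix):].lower().partition("_")
        if not key:
            continue  # no SECTION_KEY structure, ignore
        config.setdefault(section, {})[key] = coerce(raw)
    return config


env = {
    "JP2FORGE_PROCESSING_MAX_WORKERS": "4",
    "JP2FORGE_BNF_COMPLIANT": "true",
    "JP2FORGE_JPEG2000_COMPRESSION_MODE": "supervised",
    "HOME": "/home/user",
}
print(config_from_env(env))
```

In real use you would pass `os.environ` instead of the sample dictionary.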

Advanced Memory Management

# Process large images with streaming processor
python -m cli.workflow large_images/ output_dir/ --memory-limit 2048

# Force chunking for all images
python -m cli.workflow input_dir/ output_dir/ --force-chunking

# Set minimum chunk height for processing
python -m cli.workflow input_dir/ output_dir/ --min-chunk-height 32
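The relationship between a memory limit and a chunk height can be sketched as follows. This is a simplified model, not JP2Forge's streaming processor: it assumes a chunk is a band of full-width rows and that the whole memory budget is available for one chunk of uncompressed pixels. All names are hypothetical.

```python
def chunk_height(width, bytes_per_pixel, memory_limit_mb,
                 image_height, min_chunk_height=16):
    """Rows per chunk so that one chunk roughly fits in the memory budget."""
    row_bytes = width * bytes_per_pixel
    budget = memory_limit_mb * 1024 * 1024
    rows = budget // row_bytes
    rows = max(min_chunk_height, rows)     # never go below the minimum
    return min(rows, image_height)         # never exceed the image itself


def chunk_ranges(image_height, rows):
    """Yield (top, bottom) row ranges covering the whole image."""
    return [(top, min(top + rows, image_height))
            for top in range(0, image_height, rows)]


rows = chunk_height(width=10_000, bytes_per_pixel=3,
                    memory_limit_mb=1, image_height=100)
print(rows, chunk_ranges(100, rows))
```

Note that the minimum chunk height takes precedence in this model, so a very wide image can still exceed the budget for a single chunk.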

Parallel Processing Options

# Use adaptive worker pool with resource monitoring
python -m cli.workflow input_dir/ output_dir/ --parallel --adaptive-workers

# Set minimum and maximum worker count
python -m cli.workflow input_dir/ output_dir/ --parallel --min-workers 2 --max-workers 8

# Set resource thresholds for scaling
python -m cli.workflow input_dir/ output_dir/ --parallel --memory-threshold 0.8 --cpu-threshold 0.9
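One way to picture threshold-based scaling is a function that steps the worker count up or down depending on current load. The policy below (remove one worker when either threshold is exceeded, otherwise add one, always staying within the configured bounds) is an assumption for illustration and is not taken from JP2Forge's adaptive pool.

```python
def next_worker_count(current, cpu_usage, memory_usage,
                      min_workers=2, max_workers=8,
                      cpu_threshold=0.9, memory_threshold=0.8):
    """Return the worker count for the next scheduling round.

    cpu_usage and memory_usage are fractions between 0.0 and 1.0.
    """
    overloaded = cpu_usage > cpu_threshold or memory_usage > memory_threshold
    target = current - 1 if overloaded else current + 1
    return max(min_workers, min(max_workers, target))


print(next_worker_count(4, cpu_usage=0.50, memory_usage=0.50))  # 5
print(next_worker_count(4, cpu_usage=0.95, memory_usage=0.50))  # 3
```

In practice the usage fractions would come from a system monitor such as psutil, which the conda installation instructions above already include.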

Custom Color Management

# Preserve color profiles
python -m cli.workflow input_dir/ output_dir/ --preserve-icc-profiles

# Specify default color profiles
python -m cli.workflow input_dir/ output_dir/ --default-rgb-profile=/path/to/srgb.icc

API Usage

Standard Workflow

from core.types import WorkflowConfig, DocumentType, CompressionMode
from workflow.standard import StandardWorkflow

# Create configuration
config = WorkflowConfig(
    output_dir="output_dir/",
    report_dir="reports/",
    document_type=DocumentType.PHOTOGRAPH,
    compression_mode=CompressionMode.SUPERVISED,
    quality_threshold=40.0
)

# Create workflow
workflow = StandardWorkflow(config)

# Process a file
result = workflow.process_file("input.tif")

Advanced Parallel Workflow

from core.types import WorkflowConfig, ProcessingMode
from workflow.parallel import ParallelWorkflow
from utils.parallel.adaptive_pool import AdaptiveWorkerPool
from utils.parallel.progress_tracker import ProgressTracker

# Create advanced configuration
config = WorkflowConfig(
    output_dir="output_dir/",
    processing_mode=ProcessingMode.PARALLEL,
    max_workers=8,
    min_workers=2,
    memory_limit_mb=4096
)

# Create parallel workflow
workflow = ParallelWorkflow(config)

# Set up progress callback
def progress_callback(progress_data):
    print(f"Completed: {progress_data['percent_complete']:.1f}%, ETA: {progress_data['eta_time']}")

# Process a directory with progress tracking
results = workflow.process_directory(
    "input_dir/", 
    recursive=True,
    progress_callback=progress_callback
)

Memory-Efficient Image Processing

from PIL import ImageFilter

from utils.imaging.streaming_processor import StreamingImageProcessor

# Create streaming processor
processor = StreamingImageProcessor(
    memory_limit_mb=2048,
    min_chunk_height=16
)

# Image processing function passed to process_in_chunks
def enhance_image(img):
    # Sharpen the image data with Pillow's built-in SHARPEN filter
    return img.filter(ImageFilter.SHARPEN)

# Process a large image
processor.process_in_chunks(
    "large_image.tif",
    "output.jp2",
    enhance_image,
    save_kwargs={'quality': 90}
)

Working with Configuration

from utils.config.config_manager import ConfigManager

# Create config manager
config_manager = ConfigManager()

# Load configuration from various sources
config_manager.load_from_file('config.yaml')
config_manager.load_from_env(prefix='JP2FORGE_')

# Get configuration values
output_dir = config_manager.get('output.directory', 'default_dir/')
max_workers = config_manager.get('processing.max_workers', 4)

# Validate configuration
is_valid, issues = config_manager.validate()
if not is_valid:
    for issue in issues:
        print(f"Configuration issue: {issue}")

Troubleshooting

Common Issues

Memory Issues with Large Images

If you encounter memory errors with large images:

  • Use the streaming processor: --memory-limit 2048
  • Adjust chunk size: --min-chunk-height 32
  • Enable memory mapping: --use-memory-mapping

Slow Processing

If processing is slow:

  • Use adaptive parallel processing: --parallel --adaptive-workers
  • Adjust worker count: --max-workers 8
  • Check for other resource-intensive processes running on your system

Metadata Issues

If metadata operations fail:

  • Check if ExifTool is installed and accessible
  • Use the tool manager to diagnose: python -m utils.tools.tool_manager check
  • Try specifying ExifTool path explicitly: --exiftool-path=/path/to/exiftool

Configuration Problems

If configuration isn't being applied:

  • Check for configuration file syntax errors
  • Verify environment variable format (should be JP2FORGE_SECTION_KEY=value)
  • Use --debug mode to see configuration loading details

Intellectual Property

JP2Forge is designed with careful consideration for intellectual property rights:

  • Independent Implementation: JP2Forge is an original implementation that doesn't reuse code from other JPEG2000 libraries like OpenJPEG
  • BnF Specifications: Implementation follows published BnF technical specifications while acknowledging BnF's intellectual property
  • Third-Party Software:
    • Uses Pillow (permissive license) for basic operations
    • Optional Kakadu integration requires separate licensing and installation

For detailed information about intellectual property considerations, see the IP Considerations Document.

Project Architecture

The project follows a modular architecture with well-defined components:

  • CLI Interface: Entry point for command line operations
  • Workflow Components: Manages processing pipeline (standard and parallel)
  • Core Components: Handles compression, analysis, and metadata
  • Utility Modules: Provides support for image processing, validation, etc.

For a detailed visual representation of the architecture and workflow, see the JP2Forge Workflow Diagram.

Web Interface

JP2Forge has a companion web interface project called JP2Forge Web. This is a limited implementation that provides a user-friendly way to interact with JP2Forge functionality through a web browser.

Key differences between JP2Forge and JP2Forge Web:

  • Scope: JP2Forge Web focuses on core BnF compliance features, while JP2Forge provides the complete feature set
  • Usage: JP2Forge Web offers a simple web interface, while JP2Forge is a comprehensive command-line tool
  • Target Audience: JP2Forge Web is for users seeking a simple interface, while JP2Forge is for power users needing full control

For more details, see INTEGRATION_WITH_JP2FORGE_WEB.md.

Contributing

Contributions to JP2Forge are welcome! Please see our Contributing Guidelines for details on:

  • Setting up your development environment
  • Coding standards and guidelines
  • Metadata handling guidelines
  • Testing requirements
  • Pull request process

JP2Forge is actively maintained, and we appreciate bug reports, feature suggestions, and code contributions that help improve the project.

Project Origin and Acknowledgments

JP2Forge was created by xy-liao as the original author and maintainer. The project was inspired by the BnF (Bibliothèque nationale de France) image format specifications. The implementation follows the technical requirements described in their reference documents:

  • BnF Referential (2015): "Référentiel de format de fichier image v2" - PDF
  • BnF Documentation (2021): "Formats de données pour la préservation à long terme" - PDF

Implementation Note

JP2Forge is an independent tool that builds BnF-compliant workflows on top of existing JPEG2000 codecs. It is not based on or derived from OpenJPEG's implementation. The actual JPEG2000 encoding and decoding is delegated to Pillow or, if available, Kakadu.

JPEG2000 Library Comparison

There are several JPEG2000 implementations available. Here's how JP2Forge relates to them:

Library      | License                        | Relationship to JP2Forge
-------------|--------------------------------|-------------------------
Pillow       | HPND/PIL License (permissive)  | Primary library used by JP2Forge for non-BnF compliant operations.
Kakadu       | Commercial                     | Optional integration for BnF-compliant encoding. Users must acquire separately.
OpenJPEG     | BSD 2-Clause                   | Not used by JP2Forge. Independent, open-source JPEG2000 codec.
JasPer       | JasPer License (MIT-style)     | Not used by JP2Forge.
Grok/OpenJPH | BSD 2-Clause                   | Not used by JP2Forge.

JP2Forge is not derived from any of these implementations but rather uses Pillow's JPEG2000 support for basic operations and optionally interfaces with Kakadu for BnF-compliant processing when available.

References

  1. BnF Referential (2015): "Référentiel de format de fichier image v2" - PDF
  2. BnF Documentation (2021): "Formats de données pour la préservation à long terme" - PDF
  3. JPEG2000 Standard: ISO/IEC 15444
