A comprehensive JPEG2000 processing tool with BnF compatibility

JP2Forge: JPEG2000 Processing Tool

License: MIT | Project Status: Active | Version: 0.9.1

Current Status: Released. JP2Forge 0.9.1 is now available; see the Release Notes for details.

JP2Forge is a comprehensive solution for converting images to JPEG2000 format, supporting both standard workflows and workflows compliant with the BnF (Bibliothèque nationale de France). It implements JPEG2000 processing according to the technical specifications the BnF published in its "Référentiel de format de fichier image v2" (2015). The implementation is provided for educational and training purposes, as a demonstration of how such a standard can be implemented. All BnF-specific parameters are taken from publicly available technical documentation, with the BnF credited as the source of these specifications.

Recent Updates

  • Added advanced parallel processing with adaptive worker scaling
  • Implemented memory-efficient streaming image processor for large files
  • Added proper XML processing for metadata with lxml
  • Created centralized configuration system with validation
  • Enhanced external tool management with automatic detection
  • Added support for large image processing via chunking
  • Improved color profile handling for unusual color spaces
  • Enhanced error recovery and robustness
  • Refactored metadata handling for better organization and extensibility
  • Added configurable logging with file output option
  • See the CHANGELOG for a full list of changes

Table of Contents

  1. Overview
  2. Features
  3. Installation
  4. Basic Usage
  5. BnF Compliance
  6. Mode Switching
  7. Result Interpretation
  8. Advanced Configuration
  9. API Usage
  10. Troubleshooting
  11. Intellectual Property
  12. Project Architecture

Overview

JP2Forge provides tools for:

  • Converting standard image formats to JPEG2000
  • Quality analysis with pixel loss assessment
  • Metadata handling with BnF compliant format support
  • Parallel processing for improved performance
  • Configurable workflows and compression modes

Features

  • Multiple Compression Modes:

    • lossless: No data loss, larger file size
    • lossy: Higher compression with data loss
    • supervised: Quality-controlled compression with analysis
    • bnf_compliant: BnF standards with fixed compression ratios
  • BnF Standards Support:

    • Fixed compression ratios by document type
    • Standard technical parameters (resolution levels, progression order)
    • XMP metadata in UUID box
  • Advanced Parallel Processing:

    • Adaptive worker pool with resource monitoring
    • Automatic scaling based on system load
    • Progress tracking with ETA estimation
    • Memory-aware resource allocation
  • Memory-Efficient Processing:

    • Streaming image processor for files of any size
    • Automatic chunking for memory optimization
    • Adaptive memory management
    • Support for very large files (>50MP)
  • Multi-page Document Support:

    • Automatic detection and handling of multi-page TIFF files
    • Individual page extraction and processing
    • Options for skipping, overwriting, or maintaining existing files
    • Detailed reports for each page in multi-page documents
  • Color Profile Management:

    • Automatic normalization of color profiles
    • Support for unusual color spaces (CMYK, LAB, etc.)
    • ICC profile preservation and conversion
  • Quality Analysis:

    • PSNR (Peak Signal-to-Noise Ratio) calculation
    • SSIM (Structural Similarity Index) analysis
    • MSE (Mean Squared Error) measurement
  • Configuration Management:

    • Hierarchical configuration system
    • YAML and JSON configuration files
    • Environment variable override support
    • Schema validation with type checking
  • External Tool Integration:

    • Automatic detection of Kakadu, ExifTool, and jpylyzer
    • Capability-based tool selection
    • Graceful fallbacks when tools aren't available
    • Version and compatibility checking

Installation

System Dependencies

Required external tools:

  • ExifTool (required for metadata functionality):

    • On macOS: brew install exiftool
    • On Ubuntu/Debian: sudo apt install libimage-exiftool-perl
    • On Windows: Download from ExifTool's website
  • jpylyzer (required for full JP2 validation):

    • pip install jpylyzer

If using the --use-kakadu option, Kakadu Software must be separately acquired and installed.

Conda Environment (Recommended)

# Clone repository
git clone https://github.com/xy-liao/jp2forge
cd jp2forge

# Create and activate conda environment
conda create -n jp2forge python=3.9
conda activate jp2forge

# Install dependencies
conda install -c conda-forge pillow numpy psutil exiftool lxml pyyaml
pip install jpylyzer

Standard Installation

# Clone repository
git clone https://github.com/xy-liao/jp2forge
cd jp2forge

# Install Python dependencies
pip install -r requirements.txt
pip install jpylyzer

# Don't forget to install system dependencies (ExifTool) as mentioned above

Basic Usage

Command-line Interface

Convert a single file:

python -m cli.workflow input.tif output_dir/

Process a directory:

python -m cli.workflow input_dir/ output_dir/ --recursive

Use parallel processing:

python -m cli.workflow input_dir/ output_dir/ --parallel --max-workers 4

Enable detailed logging to a file:

python -m cli.workflow input_dir/ output_dir/ --verbose --log-file conversion.log

Document Types

Select the document type based on your content:

python -m cli.workflow input_dir/ output_dir/ --document-type photograph

Available document types:

  • photograph: Standard photographic images (default)
  • heritage_document: Historical documents with high-quality settings
  • color: General color images
  • grayscale: Grayscale images

Compression Modes

Select how compression is applied:

python -m cli.workflow input_dir/ output_dir/ --compression-mode supervised

Available compression modes:

  • supervised: Quality-controlled compression (default)
  • lossless: No data loss
  • lossy: Higher compression with data loss
  • bnf_compliant: BnF standards with fixed ratios

BnF Compliance

Process files according to BnF standards:

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant

Note: JP2Forge can perform BnF-compliant compression with Pillow alone, without a Kakadu installation. Kakadu would provide more complete support for the BnF robustness markers (SOP, EPH, PLT); the default Pillow-based path makes a best effort to follow the BnF standards using the encoding parameters Pillow exposes.

BnF Document Types and Ratios

  • photograph and heritage_document: 1:4 ratio
  • color: 1:6 ratio
  • grayscale: 1:16 ratio
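To make these ratios concrete, here is a minimal sketch that turns a document type into an approximate target size. It assumes "1:N" means the output is N times smaller than the raw, uncompressed pixel data (width x height x components x bytes per sample); JP2Forge may measure the ratio differently, and the names `BNF_TARGET_RATIOS` and `target_size_bytes` are hypothetical.

```python
# Hypothetical mapping of BnF document types to the N in a "1:N" ratio.
BNF_TARGET_RATIOS = {
    "photograph": 4,
    "heritage_document": 4,
    "color": 6,
    "grayscale": 16,
}


def target_size_bytes(width: int, height: int, components: int,
                      document_type: str, bits_per_sample: int = 8) -> int:
    """Approximate compressed size for the document type's BnF ratio.

    Assumes the ratio is measured against raw pixel data, not the input file.
    """
    ratio = BNF_TARGET_RATIOS[document_type]
    raw_bytes = width * height * components * bits_per_sample // 8
    return raw_bytes // ratio


# An 8-bit RGB photograph of 3358 x 4311 pixels.
print(target_size_bytes(3358, 4311, 3, "photograph"))
```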

BnF Metadata

Provide BnF-compliant metadata:

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant --metadata bnf_metadata.json

Example bnf_metadata.json:

{
  "dcterms:isPartOf": "NUM_123456",
  "dcterms:provenance": "University Library",
  "dc:relation": "ark:/12148/cb123456789",
  "dc:source": "FOL-Z-123",
  "tiff:Model": "Phase One P45+",
  "tiff:Make": "Phase One",
  "aux:SerialNumber": "1234567890",
  "xmp:CreatorTool": "University Workflow",
  "tiff:Artist": "University Operator"
}

Using Kakadu (Experimental)

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant --use-kakadu --kakadu-path=/path/to/kdu_compress

Important Note on Kakadu

Kakadu is commercial software that requires a separate license:

  • EXPERIMENTAL FEATURE: The Kakadu integration is currently conceptual/experimental and has not been tested with actual Kakadu software
  • Kakadu is not included with jp2forge and must be acquired separately from Kakadu Software
  • A valid license is required to use Kakadu in production or commercial environments
  • Using Kakadu with jp2forge is entirely optional; the project works with Pillow alone, including a best-effort BnF-compliant mode
  • For strict compliance with BnF requirements, Kakadu would theoretically provide better support for the robustness markers specified in BnF reference documents
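For readers experimenting with `--kakadu-path`, the sketch below shows how the BnF parameters listed under Technical Specifications might translate into a `kdu_compress` argument list. It is not JP2Forge's Kakadu integration (which, as noted, is untested). The parameter names follow commonly published `kdu_compress` recipes and should be checked against `kdu_compress -usage` for your Kakadu version; mapping "Resolution Levels: 10" to `Clevels=9` and converting the ratio to bits per pixel for `-rate` are assumptions, and `kakadu_bnf_args` is a hypothetical helper.

```python
import shlex
from typing import List


def kakadu_bnf_args(src: str, dst: str, ratio: float, components: int = 3,
                    bits_per_sample: int = 8,
                    kdu_compress: str = "kdu_compress") -> List[str]:
    """Build a kdu_compress command line approximating the BnF parameters."""
    # -rate is given in bits per pixel: 8-bit RGB is 24 bpp, so 1:4 -> 6 bpp.
    bpp = components * bits_per_sample / ratio
    return [
        kdu_compress, "-i", src, "-o", dst,
        "Creversible=no",                  # irreversible 9-7 transform
        "Clevels=9",                       # assumption: 10 resolution levels
        "Clayers=10",                      # 10 quality layers
        "Corder=RPCL",                     # progression order
        "Cblk={64,64}",                    # code-block size
        "Stiles={1024,1024}",              # tile size
        "Cprecincts={256,256},{256,256},{128,128}",
        "Cuse_sop=yes", "Cuse_eph=yes",    # SOP and EPH robustness markers
        "ORGgen_plt=yes",                  # PLT marker segments
        "-rate", f"{bpp:g}",
    ]


# Print a shell-safe version of the command; pass the list itself to
# subprocess.run() to execute it without a shell.
print(shlex.join(kakadu_bnf_args("input.tif", "output.jp2", ratio=4)))
```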

Mode Switching

Normal Mode (Default)

python -m cli.workflow input_dir/ output_dir/

Options for normal mode:

python -m cli.workflow input_dir/ output_dir/ --compression-mode supervised --quality 40.0

BnF Mode

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant

Advanced BnF options:

python -m cli.workflow input_dir/ output_dir/ --bnf-compliant --compression-ratio-tolerance 0.03

Result Interpretation

Processing Statuses

  • SUCCESS: Successfully processed with all quality metrics above thresholds
  • WARNING: Processed but quality metrics below thresholds
  • FAILURE: Processing failed, no output file generated
  • SKIPPED: File skipped (invalid or corrupted image)

Summary Report

A summary report is generated that shows the overall results of batch processing and includes file size information for each file:

## Detailed Results
----------------
File: input_dir/example_photo_001.tif
Status: SUCCESS
Output: output_dir/example_photo_001.jp2
Original Size: 24.12 MB
Converted Size: 5.31 MB
Compression Ratio: 4.54:1

File: input_dir/example_photo_002.tif
Status: SUCCESS
Output: output_dir/example_photo_002.jp2
Original Size: 23.88 MB
Converted Size: 5.24 MB
Compression Ratio: 4.56:1

File: input_dir/example_photo_003.tif
Status: SUCCESS
Output: output_dir/example_photo_003.jp2
Original Size: 23.71 MB
Converted Size: 5.56 MB
Compression Ratio: 4.27:1

Quality Metrics (Normal Mode)

  • PSNR: Peak Signal-to-Noise Ratio (higher is better)

    • > 40 dB: Excellent quality
    • 30-40 dB: Good quality
    • < 30 dB: Medium to low quality
  • SSIM: Structural Similarity Index (higher is better)

    • > 0.95: Excellent structural preservation
    • 0.90-0.95: Good structural preservation
    • < 0.90: More significant structural changes
  • MSE: Mean Squared Error (lower is better)
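A minimal NumPy sketch of how MSE and PSNR relate (PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit data) and how a PSNR value maps onto the bands of more than 40 dB (excellent), 30 to 40 dB (good) and below 30 dB (medium to low). The function names are illustrative and JP2Forge's own analysis code may differ; SSIM is omitted because it needs a windowed computation.

```python
import numpy as np


def mse(original: np.ndarray, converted: np.ndarray) -> float:
    """Mean squared error between two images of identical shape."""
    # Cast to float first so uint8 subtraction cannot wrap around.
    diff = original.astype(np.float64) - converted.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(original: np.ndarray, converted: np.ndarray,
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, converted)
    if error == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))


def psnr_band(value_db: float) -> str:
    """Bucket a PSNR value into the quality bands described above."""
    if value_db > 40:
        return "excellent"
    if value_db >= 30:
        return "good"
    return "medium to low"


a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 10, dtype=np.uint8)   # every pixel off by 10
print(mse(a, b), psnr(a, b), psnr_band(psnr(a, b)))
```

As a consistency check, the MSE of 2.29 in the example report further down corresponds to a PSNR of about 44.5 dB under this formula.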

Validation Reports

JP2Forge validates generated JPEG2000 files using jpylyzer, a dedicated JP2 validator. The validation results are stored in reports/info_jpylyzer.json and include:

  • Compliance with JP2 format specifications
  • File structure validation
  • Codestream properties validation
  • Detailed box structure information
  • Technical metadata extraction

Important: Installing jpylyzer (pip install jpylyzer) is strongly recommended for complete validation reports. Without jpylyzer, validation is limited to basic file signature checks, and the "properties" fields in validation reports will be empty.

Example of JPylyzer validation properties:

{
  "toolInfo": {
    "toolName": "jpylyzer",
    "toolVersion": "2.2.1"
  },
  "fileInfo": {
    "fileName": "example.jp2",
    "filePath": "/path/to/example.jp2",
    "fileSizeInBytes": "1993629"
  },
  "isValid": true,
  "properties": {
    "fileTypeBox": {
      "br": "jp2 ",
      "minV": "0",
      "cL": "jp2 "
    },
    "jp2HeaderBox": {
      "imageHeaderBox": {
        "height": "4311",
        "width": "3358",
        "nC": "3",
        "bPCSign": "unsigned",
        "bPCDepth": "8"
      },
      "colourSpecificationBox": {
        "meth": "Enumerated",
        "enumCS": "sRGB"
      }
    },
    "compressionRatio": "2.1"
  },
  "warnings": [],
  "validationTool": "jpylyzer",
  "validationToolVersion": "2.2.1"
}

File Size Information

File size information is included in both individual file reports and the summary report:

  • Original Size: Size of the input file (both in bytes and human-readable format)
  • Converted Size: Size of the output JPEG2000 file (both in bytes and human-readable format)
  • Compression Ratio: Ratio of original size to converted size (e.g., "4.50:1")

Example report with file size information:

{
    "original_file": "example_photo_003.tif",
    "converted_file": "example_photo_003.jp2",
    "file_sizes": {
        "original_size": 24860832,
        "original_size_human": "23.71 MB",
        "converted_size": 5826410,
        "converted_size_human": "5.56 MB",
        "compression_ratio": "4.27:1"
    },
    "metrics": {
        "psnr": "44.52 dB",
        "ssim": "0.9998",
        "mse": "2.29"
    },
    "quality_passed": "yes",
    "thresholds": {
        "psnr": 40.0,
        "ssim": 0.95,
        "mse": 50.0
    },
    "recommendations": [
        "No quality issues detected. All metrics within acceptable thresholds."
    ]
}

This information helps you evaluate the effectiveness of different compression settings.
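The `file_sizes` block in the example report can be reproduced from the two byte counts alone. This sketch assumes 1024-based units, which matches the example (24,860,832 bytes is shown as 23.71 MB); `human_size` and `file_size_info` are hypothetical helpers, not JP2Forge's reporting code.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, e.g. 24860832 -> '23.71 MB'."""
    size = float(num_bytes)
    units = ["B", "KB", "MB", "GB"]
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {units[-1]}"


def file_size_info(original_size: int, converted_size: int) -> dict:
    """Build the 'file_sizes' section of a per-file report."""
    return {
        "original_size": original_size,
        "original_size_human": human_size(original_size),
        "converted_size": converted_size,
        "converted_size_human": human_size(converted_size),
        "compression_ratio": f"{original_size / converted_size:.2f}:1",
    }


# Byte counts from the example report; in practice they would come from
# os.path.getsize() on the input and output files.
print(file_size_info(24860832, 5826410))
```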

Compression Ratio (BnF Mode)

In BnF mode, the goal is to reach the fixed compression ratio for the document type within a tolerance (default: 5%).
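A minimal sketch of the tolerance check, combined with the fallback listed under the BnF parameters (lossless 5-3 encoding for files outside tolerance). It assumes the tolerance is a relative deviation from the target ratio, so the default 0.05 accepts 3.8:1 to 4.2:1 for a 1:4 target, and `--compression-ratio-tolerance 0.03` narrows that to 3.88:1 to 4.12:1. How JP2Forge measures the achieved ratio is not specified here, so these hypothetical functions take it as an input.

```python
def within_tolerance(achieved_ratio: float, target_ratio: float,
                     tolerance: float = 0.05) -> bool:
    """True if the achieved ratio deviates from the target by at most `tolerance`.

    The tolerance is treated as a relative deviation (0.05 = 5%).
    """
    return abs(achieved_ratio - target_ratio) / target_ratio <= tolerance


def select_transform(achieved_ratio: float, target_ratio: float,
                     tolerance: float = 0.05) -> str:
    """Keep the irreversible encode if in tolerance, else fall back to lossless."""
    if within_tolerance(achieved_ratio, target_ratio, tolerance):
        return "irreversible 9-7"
    return "lossless 5-3"


print(select_transform(4.1, 4))          # 2.5% off a 1:4 target
print(select_transform(4.1, 4, 0.02))    # same result under a 2% tolerance
```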

Advanced Configuration

Configuration Files

Save current configuration:

python -m cli.workflow input_dir/ output_dir/ --parallel --save-config my_config.yaml

Use saved configuration:

python -m cli.workflow input_dir/ output_dir/ --config my_config.yaml

Environment Variables

All configuration options can be set via environment variables:

export JP2FORGE_PROCESSING_MAX_WORKERS=4
export JP2FORGE_BNF_COMPLIANT=true
export JP2FORGE_JPEG2000_COMPRESSION_MODE=supervised
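One plausible way the JP2FORGE_SECTION_KEY=value convention (see Troubleshooting) maps onto nested settings is sketched below. It assumes the first token after the prefix names the section and the rest, lower-cased, is the key, and that "true"/"false" and numeric strings are converted; the real `ConfigManager.load_from_env` may split or coerce names differently, and `config_from_env` is a hypothetical helper.

```python
import os
from typing import Any, Dict, Mapping, Optional


def _coerce(value: str) -> Any:
    """Turn 'true'/'false' into bools and numeric strings into int/float."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def config_from_env(environ: Optional[Mapping[str, str]] = None,
                    prefix: str = "JP2FORGE_") -> Dict[str, Dict[str, Any]]:
    """Collect PREFIX_SECTION_KEY=value variables into {section: {key: value}}."""
    environ = os.environ if environ is None else environ
    config: Dict[str, Dict[str, Any]] = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        section, _, key = name[len(prefix):].lower().partition("_")
        if section and key:
            config.setdefault(section, {})[key] = _coerce(raw)
    return config


print(config_from_env({
    "JP2FORGE_PROCESSING_MAX_WORKERS": "4",
    "JP2FORGE_BNF_COMPLIANT": "true",
    "JP2FORGE_JPEG2000_COMPRESSION_MODE": "supervised",
}))
```

With this split rule, JP2FORGE_BNF_COMPLIANT lands under a `bnf` section; if the real configuration keeps that flag elsewhere, the rule would need a list of known section names.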

Advanced Memory Management

# Process large images with streaming processor
python -m cli.workflow large_images/ output_dir/ --memory-limit 2048

# Force chunking for all images
python -m cli.workflow input_dir/ output_dir/ --force-chunking

# Set minimum chunk height for processing
python -m cli.workflow input_dir/ output_dir/ --min-chunk-height 32
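Chunking trades throughput for memory: the fewer rows processed at once, the lower the peak memory use. A rough sketch of how a row count per chunk could be derived from --memory-limit (in MB, matching `memory_limit_mb` in the API examples) and --min-chunk-height. The `overhead` factor, standing in for temporary copies made while a chunk is processed, is a hypothetical parameter, not a JP2Forge setting.

```python
import math


def rows_per_chunk(width: int, height: int, bytes_per_pixel: int,
                   memory_limit_mb: int, min_chunk_height: int = 32,
                   overhead: float = 3.0) -> int:
    """Rows to process at once so one chunk (plus working copies) fits the budget."""
    budget = memory_limit_mb * 1024 * 1024
    row_bytes = width * bytes_per_pixel * overhead
    rows = int(budget // row_bytes)
    # Respect the minimum chunk height, but never exceed the image height.
    return min(height, max(min_chunk_height, rows))


# A 20000 x 30000 RGB scan under a 2048 MB limit.
rows = rows_per_chunk(20000, 30000, 3, memory_limit_mb=2048)
print(rows, "rows per chunk,", math.ceil(30000 / rows), "chunks")
```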

Parallel Processing Options

# Use adaptive worker pool with resource monitoring
python -m cli.workflow input_dir/ output_dir/ --parallel --adaptive-workers

# Set minimum and maximum worker count
python -m cli.workflow input_dir/ output_dir/ --parallel --min-workers 2 --max-workers 8

# Set resource thresholds for scaling
python -m cli.workflow input_dir/ output_dir/ --parallel --memory-threshold 0.8 --cpu-threshold 0.9
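A minimal sketch of a single scaling decision using the thresholds from the example above. The `headroom` factor of 0.75, which requires both resources to be comfortably below their thresholds before adding a worker, is a hypothetical choice that adds hysteresis so the pool does not oscillate; JP2Forge's adaptive pool may use a different rule. Readings would come from something like psutil (listed among the dependencies), e.g. `psutil.virtual_memory().percent / 100` and `psutil.cpu_percent(interval=1) / 100`.

```python
def next_worker_count(current: int, memory_used: float, cpu_used: float,
                      min_workers: int = 2, max_workers: int = 8,
                      memory_threshold: float = 0.8,
                      cpu_threshold: float = 0.9,
                      headroom: float = 0.75) -> int:
    """Return the worker count for the next interval.

    memory_used and cpu_used are fractions in [0, 1]. Shrink by one worker when
    either crosses its threshold; grow by one only when both are comfortably
    below (threshold * headroom); otherwise hold steady.
    """
    if memory_used >= memory_threshold or cpu_used >= cpu_threshold:
        target = current - 1
    elif (memory_used < memory_threshold * headroom
          and cpu_used < cpu_threshold * headroom):
        target = current + 1
    else:
        target = current
    return max(min_workers, min(max_workers, target))


print(next_worker_count(4, memory_used=0.85, cpu_used=0.50))  # memory pressure
print(next_worker_count(4, memory_used=0.30, cpu_used=0.40))  # plenty of room
```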

Custom Color Management

# Preserve color profiles
python -m cli.workflow input_dir/ output_dir/ --preserve-icc-profiles

# Specify default color profiles
python -m cli.workflow input_dir/ output_dir/ --default-rgb-profile=/path/to/srgb.icc

API Usage

Standard Workflow

from core.types import WorkflowConfig, DocumentType, CompressionMode
from workflow.standard import StandardWorkflow

# Create configuration
config = WorkflowConfig(
    output_dir="output_dir/",
    report_dir="reports/",
    document_type=DocumentType.PHOTOGRAPH,
    compression_mode=CompressionMode.SUPERVISED,
    quality_threshold=40.0
)

# Create workflow
workflow = StandardWorkflow(config)

# Process a file
result = workflow.process_file("input.tif")

Advanced Parallel Workflow

from core.types import WorkflowConfig, ProcessingMode
from workflow.parallel import ParallelWorkflow
from utils.parallel.adaptive_pool import AdaptiveWorkerPool
from utils.parallel.progress_tracker import ProgressTracker

# Create advanced configuration
config = WorkflowConfig(
    output_dir="output_dir/",
    processing_mode=ProcessingMode.PARALLEL,
    max_workers=8,
    min_workers=2,
    memory_limit_mb=4096
)

# Create parallel workflow
workflow = ParallelWorkflow(config)

# Set up progress callback
def progress_callback(progress_data):
    print(f"Completed: {progress_data['percent_complete']:.1f}%, ETA: {progress_data['eta_time']}")

# Process a directory with progress tracking
results = workflow.process_directory(
    "input_dir/", 
    recursive=True,
    progress_callback=progress_callback
)

Memory-Efficient Image Processing

from PIL import ImageFilter

from utils.imaging.streaming_processor import StreamingImageProcessor

# Create streaming processor
processor = StreamingImageProcessor(
    memory_limit_mb=2048,
    min_chunk_height=16
)

# Process a large image
def enhance_image(img):
    # Image processing function: receives a PIL image, returns the processed image
    return img.filter(ImageFilter.SHARPEN)

processor.process_in_chunks(
    "large_image.tif",
    "output.jp2",
    enhance_image,
    save_kwargs={'quality': 90}
)

Working with Configuration

from utils.config.config_manager import ConfigManager

# Create config manager
config_manager = ConfigManager()

# Load configuration from various sources
config_manager.load_from_file('config.yaml')
config_manager.load_from_env(prefix='JP2FORGE_')

# Get configuration values
output_dir = config_manager.get('output.directory', 'default_dir/')
max_workers = config_manager.get('processing.max_workers', 4)

# Validate configuration
is_valid, issues = config_manager.validate()
if not is_valid:
    for issue in issues:
        print(f"Configuration issue: {issue}")

Troubleshooting

Common Issues

Memory Issues with Large Images

If you encounter memory errors with large images:

  • Use the streaming processor: --memory-limit 2048
  • Adjust chunk size: --min-chunk-height 32
  • Enable memory mapping: --use-memory-mapping

Slow Processing

If processing is slow:

  • Use adaptive parallel processing: --parallel --adaptive-workers
  • Adjust worker count: --max-workers 8
  • Check for other resource-intensive processes running on your system

Metadata Issues

If metadata operations fail:

  • Check if ExifTool is installed and accessible
  • Use the tool manager to diagnose: python -m utils.tools.tool_manager check
  • Try specifying ExifTool path explicitly: --exiftool-path=/path/to/exiftool
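Before digging into JP2Forge's own diagnostics, it can help to confirm what the operating system sees on PATH. This sketch uses the standard library's `shutil.which` and is independent of `utils.tools.tool_manager`; the executable names (notably `kdu_compress` for Kakadu) are assumptions about a typical installation, and `detect_tools` is a hypothetical helper. If ExifTool shows as "not found", passing its location via --exiftool-path is the likely fix.

```python
import shutil
from typing import Dict, Optional

# Assumed executable names; kdu_compress exists only if Kakadu is installed.
TOOL_EXECUTABLES = {
    "exiftool": "exiftool",
    "jpylyzer": "jpylyzer",
    "kakadu": "kdu_compress",
}


def detect_tools(search_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each tool to its full path, or None if it is not found.

    search_path defaults to the PATH environment variable, as in shutil.which.
    """
    return {name: shutil.which(exe, path=search_path)
            for name, exe in TOOL_EXECUTABLES.items()}


for tool, location in detect_tools().items():
    print(f"{tool:<10} {location or 'not found'}")
```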

Configuration Problems

If configuration isn't being applied:

  • Check for configuration file syntax errors
  • Verify environment variable format (should be JP2FORGE_SECTION_KEY=value)
  • Use --debug mode to see configuration loading details

Intellectual Property

JP2Forge is designed with careful consideration for intellectual property rights:

  • Independent Implementation: JP2Forge is an original implementation that doesn't reuse code from other JPEG2000 libraries like OpenJPEG
  • BnF Specifications: Implementation follows published BnF technical specifications while acknowledging BnF's intellectual property
  • Third-Party Software:
    • Uses Pillow (permissive license) for basic operations
    • Optional Kakadu integration requires separate licensing and installation

For detailed information about intellectual property considerations, see the IP Considerations Document.

Project Architecture

The project follows a modular architecture with well-defined components:

  • CLI Interface: Entry point for command line operations
  • Workflow Components: Manages processing pipeline (standard and parallel)
  • Core Components: Handles compression, analysis, and metadata
  • Utility Modules: Provides support for image processing, validation, etc.

For a detailed visual representation of the architecture and workflow, see the JP2Forge Workflow Diagram.

Project Origin and Implementation

JP2Forge is created by xy-liao and implements the JPEG2000 standard with BnF (Bibliothèque nationale de France) compliance. The implementation follows the technical specifications described in BnF reference documents.

Implementation Note

JP2Forge is an independent implementation of the JPEG2000 standard with BnF compliance. Its code is not based on or derived from OpenJPEG's implementation. For the actual JPEG2000 encoding and decoding it relies on either Pillow (whose JPEG2000 support is itself built on the OpenJPEG library) or Kakadu, if available.

Technical Specifications (BnF Standards)

BnF JPEG2000 Parameters

  • Compression: Irreversible (9-7 floating transform, ICT)
  • Compression Ratios:
    • Specialized documents: 1:4
    • Exceptional documents: 1:4
    • Standard printed documents: 1:6
    • Grayscale transparent documents: 1:16
  • Tolerance: 5% (configurable)
  • Fallback: Lossless compression (5-3 integer transform, RCT) for files outside tolerance
  • Resolution Levels: 10
  • Quality Levels: 10
  • Progression Order: RPCL (Resolution-Position-Component-Layer)
  • Robustness Markers: SOP, EPH, PLT
  • Code Block Size: 64x64
  • Tile Size: 1024x1024
  • Precinct Size: {256,256},{256,256},{128,128}
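Since the default path encodes with Pillow (see the note under BnF Compliance), here is a sketch of how far these parameters can be expressed through Pillow's documented JPEG 2000 save options. It is an approximation under stated assumptions, not JP2Forge's actual settings: Pillow accepts a single precinct size rather than a per-level list, one quality layer stands in for the ten BnF layers, and, as far as Pillow's documented options go, there is no switch for SOP or EPH markers. `pillow_bnf_options` is a hypothetical helper.

```python
def pillow_bnf_options(target_ratio: float, rgb: bool = True) -> dict:
    """Approximate the BnF parameters with Pillow JPEG 2000 save options."""
    return {
        "irreversible": True,              # 9-7 wavelet
        "mct": 1 if rgb else 0,            # with irreversible=True this is the ICT
        "num_resolutions": 10,             # resolution levels
        "progression": "RPCL",
        "codeblock_size": (64, 64),
        "tile_size": (1024, 1024),
        # Pillow takes one precinct size, so the per-level
        # {256,256},{256,256},{128,128} list cannot be expressed exactly.
        "precinct_size": (256, 256),
        # In "rates" mode each layer value is an approximate size reduction;
        # a single layer replaces the BnF's 10 quality layers here.
        "quality_mode": "rates",
        "quality_layers": [target_ratio],
    }


print(pillow_bnf_options(4))
```

With Pillow installed, the options would be passed as `Image.open("input.tif").save("output.jp2", **pillow_bnf_options(4))`. The underlying OpenJPEG encoder may reject 10 resolution levels for small images or tiles.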

BnF Metadata Structure

Required XMP metadata in UUID box (BE7ACFCB97A942E89C71999491E3AFAC):

  • dcterms:isPartOf: Document identifier (e.g., "NUM_123456")
  • dcterms:provenance: Document owner (e.g., "Bibliothèque nationale de France")
  • dc:relation: ARK identifier (e.g., "ark:/12148/cb123456789")
  • dc:source: Original document call number (e.g., "FOL-Z-123")
  • tiff:Model: Device model used for digitization
  • tiff:Make: Device manufacturer
  • aux:SerialNumber: Device serial number
  • xmp:CreatorTool: Software used for creation
  • xmp:CreateDate: Creation date (ISO-8601 format)
  • xmp:ModifyDate: Last modification date (ISO-8601 format)
  • tiff:Artist: Digitization operator or organization
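For reference, a JP2 box is a 4-byte big-endian length (covering the whole box), a 4-byte type, and the contents (ISO/IEC 15444-1); a `uuid` box begins its contents with a 16-byte UUID. The sketch below wraps an XMP packet in the UUID box named above. JP2Forge writes metadata through ExifTool, so this only illustrates the resulting bytes; `uuid_box` is a hypothetical helper, the XMP payload is a placeholder, and boxes over 4 GiB (which need the extended XLBox length) are not handled.

```python
import struct
import uuid

# The XMP UUID quoted above.
XMP_UUID = uuid.UUID("BE7ACFCB97A942E89C71999491E3AFAC")


def uuid_box(payload: bytes, box_uuid: uuid.UUID = XMP_UUID) -> bytes:
    """Build a JP2 'uuid' box: LBox (total length), TBox ('uuid'), UUID, payload."""
    total_length = 4 + 4 + 16 + len(payload)
    if total_length > 0xFFFFFFFF:
        raise ValueError("payload too large for a 32-bit box length")
    return struct.pack(">I4s", total_length, b"uuid") + box_uuid.bytes + payload


# Placeholder XMP packet; a real one would carry the fields listed above.
xmp = b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>'
box = uuid_box(xmp)
print(len(box), box[4:8], box[8:24].hex().upper())
```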

Project Origin and Acknowledgments

JP2Forge was created by xy-liao and was inspired by the BnF (Bibliothèque nationale de France) image format specifications. The implementation follows the technical requirements described in their reference documents.

JPEG2000 Library Comparison

There are several JPEG2000 implementations available. Here's how JP2Forge relates to them:

Library | License | Relationship to JP2Forge
Pillow | HPND/PIL License (permissive) | Primary library used by JP2Forge; also the default for best-effort BnF-compliant encoding.
Kakadu | Commercial | Optional integration for BnF-compliant encoding. Users must acquire it separately.
OpenJPEG | BSD 2-Clause | Not called directly by JP2Forge, but Pillow's JPEG2000 support is built on it. Independent, open-source JPEG2000 codec.
JasPer | JasPer License (MIT-style) | Not used by JP2Forge.
Grok | AGPL-3.0 | Not used by JP2Forge.
OpenJPH | BSD 2-Clause | Not used by JP2Forge.

JP2Forge is not derived from any of these implementations but rather uses Pillow's JPEG2000 support for basic operations and optionally interfaces with Kakadu for BnF-compliant processing when available.

References

  1. BnF Referential (2015): "Référentiel de format de fichier image v2" (Image file format reference, v2)
  2. BnF Documentation (2021): "Formats de données pour la préservation à long terme" (Data formats for long-term preservation)
  3. JPEG2000 Standard: ISO/IEC 15444


Download files

Source Distribution

jp2forge-0.9.1.tar.gz (160.9 kB view details)

Uploaded Source

Built Distribution

jp2forge-0.9.1-py3-none-any.whl (125.2 kB view details)

Uploaded Python 3

File details

Details for the file jp2forge-0.9.1.tar.gz.

File metadata

  • Download URL: jp2forge-0.9.1.tar.gz
  • Upload date:
  • Size: 160.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for jp2forge-0.9.1.tar.gz
Algorithm Hash digest
SHA256 1a25b412ed1e4237ddb7a68b7cbfba0f3bef265f9bfe33d6346c492366f70c17
MD5 ebfa5eaea36b230469015c279ff36467
BLAKE2b-256 0ccba52d400a2964a847736f932b8f6b2f7fd70abae03af70492bbffc239c56c

File details

Details for the file jp2forge-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: jp2forge-0.9.1-py3-none-any.whl
  • Upload date:
  • Size: 125.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.21

File hashes

Hashes for jp2forge-0.9.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f5b907d78b8a611f09abd1a833293a624a60ead3ab2a0ebe252cc3fdbf20021a
MD5 1f4eb20176f67c42e712147dabc0956c
BLAKE2b-256 306ea2c227e1ac6e28900ff1b5a18c3e470bdd23cb5986bb640c95b6eccc8a64
