Document Scanner SDK for document edge detection, border cropping, perspective correction and brightness adjustment

Python Document Scanner SDK

A Python wrapper for the Dynamsoft Document Normalizer SDK that provides a simple, user-friendly API on Windows, Linux, and macOS. It runs on desktop PCs and on embedded devices such as Raspberry Pi and Jetson Nano.

Note: This is an unofficial, community-maintained wrapper. For official support and full feature coverage, consider the Dynamsoft Capture Vision Bundle on PyPI.

Comparison: Community vs Official

Feature       | Community Wrapper                 | Official Dynamsoft SDK
--------------|-----------------------------------|---------------------------------------
Support       | Community-driven                  | ✅ Official Dynamsoft support
Documentation | Basic README and limited examples | ✅ Comprehensive online documentation
API Coverage  | Core features only                | ✅ Full API coverage
Updates       | May lag behind                    | ✅ Always includes the latest features
Testing       | Tested in limited environments    | ✅ Thoroughly tested
API Usage     | ✅ Simple and intuitive           | More complex and verbose

Installation

Requirements

  • Python 3.x

  • OpenCV (for UI display)

    pip install opencv-python
    
  • Dynamsoft Capture Vision Bundle SDK

    pip install dynamsoft-capture-vision-bundle
    

Build from Source

# Source distribution
python setup.py sdist

# Build wheel
python setup.py bdist_wheel

Command-line Usage

After installation, you can use the built-in command-line interface:

# Scan document from image file
scandocument -f <file-name> -l <license-key>

# Scan documents from camera (camera index 0)
scandocument -c 1 -l <license-key>
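
When the scanner has to be launched from another Python program, the two invocations above can be assembled as an argument list and handed to subprocess.run. The sketch below is only an illustration: build_scandocument_args is a hypothetical helper, it uses only the -f, -c, and -l options shown above, and it passes the -c value through unchanged.

```python
from typing import List, Optional


def build_scandocument_args(license_key: str,
                            file_name: Optional[str] = None,
                            camera: Optional[int] = None) -> List[str]:
    """Assemble an argument list for the scandocument command (hypothetical helper)."""
    # Exactly one input source must be chosen: an image file or a camera.
    if (file_name is None) == (camera is None):
        raise ValueError("pass exactly one of file_name or camera")
    args = ["scandocument"]
    if file_name is not None:
        args += ["-f", file_name]
    else:
        # The value is forwarded to -c as-is.
        args += ["-c", str(camera)]
    args += ["-l", license_key]
    return args


file_args = build_scandocument_args("YOUR_LICENSE_KEY", file_name="document.jpg")
camera_args = build_scandocument_args("YOUR_LICENSE_KEY", camera=1)
# To run the command: subprocess.run(file_args, check=True)
```

Passing a list rather than a single shell string avoids quoting problems when the file name contains spaces.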

Quick Start

Document Detection Example

Basic Document Detection

import cv2
import numpy as np

import docscanner

# Initialize license (required)
docscanner.initLicense("YOUR_LICENSE_KEY")  # Get trial key from Dynamsoft

# Create scanner instance
scanner = docscanner.createInstance()

# Detect from image file
results = scanner.detect("document.jpg")

# OR detect from OpenCV image matrix
image = cv2.imread("document.jpg")
results = scanner.detect(image)

# Process results
for result in results:
    print("Document found:")
    print(f"  Top-left: ({result.x1}, {result.y1})")
    print(f"  Top-right: ({result.x2}, {result.y2})")
    print(f"  Bottom-right: ({result.x3}, {result.y3})")
    print(f"  Bottom-left: ({result.x4}, {result.y4})")

    # Draw the detected quadrilateral (OpenCV expects int32 contour points)
    corners = np.array([(result.x1, result.y1), (result.x2, result.y2),
                        (result.x3, result.y3), (result.x4, result.y4)])
    cv2.drawContours(image, [corners.astype(np.int32)], -1, (0, 255, 0), 2)

cv2.imshow("Detected Documents", image)
cv2.waitKey(0)
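
Since detect returns a list, an image may yield more than one quadrilateral. One way to pick the main document is to compare areas computed from the four corners with the shoelace formula. This is a minimal sketch, not part of the wrapper: Quad is a hypothetical stand-in that only mimics the x1..y4 attributes of a detection result, and quad_area and largest_document are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Quad:
    """Hypothetical stand-in exposing the same corner attributes as a detection result."""
    x1: float
    y1: float
    x2: float
    y2: float
    x3: float
    y3: float
    x4: float
    y4: float


def quad_area(r) -> float:
    """Area of the quadrilateral via the shoelace formula (corners taken in order)."""
    pts = [(r.x1, r.y1), (r.x2, r.y2), (r.x3, r.y3), (r.x4, r.y4)]
    total = 0.0
    # Pair each corner with the next one, wrapping around to the first.
    for (xa, ya), (xb, yb) in zip(pts, pts[1:] + pts[:1]):
        total += xa * yb - xb * ya
    return abs(total) / 2.0


def largest_document(results: Iterable, min_area: float = 0.0):
    """Return the result with the largest area, or None if none reaches min_area."""
    best = max(results, key=quad_area, default=None)
    if best is None or quad_area(best) < min_area:
        return None
    return best


small = Quad(0, 0, 4, 0, 4, 3, 0, 3)
big = Quad(0, 0, 10, 0, 10, 10, 0, 10)
main_doc = largest_document([small, big])
```

With real detections, largest_document(results, min_area=...) can also be used to discard tiny false positives before drawing or normalizing.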

Document Normalization (Perspective Correction)

import docscanner
import cv2
from docscanner import EnumImageColourMode

# Setup (license + scanner)
docscanner.initLicense("YOUR_LICENSE_KEY")
scanner = docscanner.createInstance()

# Detect documents
results = scanner.detect("skewed_document.jpg")

if results:
    result = results[0]  # Process first detected document
    
    # Normalize the document (correct perspective); normalize() returns the corrected image
    normalized_img = scanner.normalize(result, EnumImageColourMode.ICM_COLOUR)
    
    # Use the returned normalized image directly
    if normalized_img is not None:
        cv2.imshow("Original", cv2.imread("skewed_document.jpg"))
        cv2.imshow("Normalized", normalized_img)
        cv2.waitKey(0)
        
        # Save normalized image
        cv2.imwrite("normalized_document.jpg", normalized_img)
        print("Normalized document saved!")
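
Perspective correction maps the four detected corners onto an upright rectangle. A common way to choose that rectangle's size (a general technique, not necessarily what the SDK does internally) is to take the longer of each pair of opposite edges. The sketch below assumes the corner order documented for detection results (top-left, top-right, bottom-right, bottom-left); estimate_output_size is a hypothetical helper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def estimate_output_size(tl: Point, tr: Point, br: Point, bl: Point) -> Tuple[int, int]:
    """Estimate (width, height) of the upright rectangle for a detected quadrilateral."""
    # Width: the longer of the top and bottom edges.
    width = max(math.dist(tl, tr), math.dist(bl, br))
    # Height: the longer of the left and right edges.
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


upright = estimate_output_size((0, 0), (4, 0), (4, 3), (0, 3))
skewed = estimate_output_size((0, 0), (3, 4), (3, 10), (0, 10))
```

For a detection result this would be called as estimate_output_size((result.x1, result.y1), (result.x2, result.y2), (result.x3, result.y3), (result.x4, result.y4)), for example to check whether a document is large enough to be worth normalizing.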
        

Real-time Camera Scanning

import docscanner
import cv2
import numpy as np

def on_document_detected(results):
    """Callback function for async document detection"""
    for result in results:
        print(f"Document detected at ({result.x1},{result.y1}), ({result.x2},{result.y2}), ({result.x3},{result.y3}), ({result.x4},{result.y4})")

# Setup
docscanner.initLicense("YOUR_LICENSE_KEY")
scanner = docscanner.createInstance()

# Start async detection
scanner.addAsyncListener(on_document_detected)

# Camera loop
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Queue frame for async processing
    scanner.detectMatAsync(frame)
    
    # Display frame
    cv2.imshow("Document Scanner", frame)
    
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break

# Cleanup
scanner.clearAsyncListener()
cap.release()
cv2.destroyAllWindows()
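
The callback above only prints coordinates. To draw the most recent detection on the preview, the callback has to hand its results to the camera loop. Assuming the listener may be invoked from a worker thread (an assumption, since this is not stated here), a small lock-protected holder is one way to share them safely. LatestResults is a hypothetical helper, not part of the wrapper.

```python
import threading


class LatestResults:
    """Hypothetical holder that keeps only the most recent detection results."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: list = []

    def update(self, results) -> None:
        """Store a copy of the newest results; suitable as an async listener."""
        with self._lock:
            self._results = list(results)

    def snapshot(self) -> list:
        """Return a copy so the caller can iterate without holding the lock."""
        with self._lock:
            return list(self._results)


latest = LatestResults()
# Simulate a callback arriving from another thread.
worker = threading.Thread(target=latest.update, args=(["doc-a", "doc-b"],))
worker.start()
worker.join()
current = latest.snapshot()
```

In the camera loop this would be wired up with scanner.addAsyncListener(latest.update), and each iteration would draw the quadrilaterals from latest.snapshot() onto the frame before calling cv2.imshow.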

API Reference

Core Functions

docscanner.initLicense(license_key: str) -> Tuple[int, str]

Initialize the Dynamsoft license. Required before using any other functions.

Parameters:

  • license_key: Your Dynamsoft license key

Returns:

  • (error_code, error_message): License initialization result

Example:

error_code, error_msg = docscanner.initLicense("YOUR_LICENSE_KEY")
if error_code != 0:
    print(f"License error: {error_msg}")
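
Printing the error lets the program continue with an unlicensed scanner. If failing fast is preferred, the returned tuple can be turned into an exception. The sketch below is one possible pattern: LicenseError and require_license are hypothetical names, and the initializer is passed in as a parameter so the helper can be exercised without the SDK.

```python
from typing import Callable, Tuple


class LicenseError(RuntimeError):
    """Raised when license initialization reports a non-zero error code."""


def require_license(init_license: Callable[[str], Tuple[int, str]],
                    license_key: str) -> None:
    """Call the initializer and raise if it reports a failure."""
    error_code, error_msg = init_license(license_key)
    if error_code != 0:
        raise LicenseError(f"license initialization failed ({error_code}): {error_msg}")


def fake_ok(key: str) -> Tuple[int, str]:
    """Stand-in initializer that always succeeds (for illustration only)."""
    return 0, ""


def fake_fail(key: str) -> Tuple[int, str]:
    """Stand-in initializer that always fails (for illustration only)."""
    return -1, "invalid key"


require_license(fake_ok, "YOUR_LICENSE_KEY")  # returns silently on success
```

In an application the call would be require_license(docscanner.initLicense, "YOUR_LICENSE_KEY"), made once at startup before createInstance.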

docscanner.createInstance() -> DocumentScanner

Create a new DocumentScanner instance.

Returns:

  • DocumentScanner: Ready-to-use scanner instance

DocumentScanner Class

Detection Methods

detect(input: Union[str, numpy.ndarray]) -> List[DocumentResult]

Detect documents from various input sources (unified detection method).

Parameters:

  • input: Input source for document detection:
    • str: File path to image (JPEG, PNG, BMP, TIFF, etc.)
    • numpy.ndarray: OpenCV image matrix (BGR or grayscale)

Returns:

  • List[DocumentResult]: List of detected documents with boundary coordinates

Examples:

# Detect from file path
results = scanner.detect("document.jpg")

# Detect from OpenCV matrix
import cv2
image = cv2.imread("document.jpg") 
results = scanner.detect(image)

# Process results
for result in results:
    print(f"Found document at ({result.x1},{result.y1}), ({result.x2},{result.y2}), ({result.x3},{result.y3}), ({result.x4},{result.y4})")

Asynchronous Processing

addAsyncListener(callback: Callable[[List[DocumentResult]], None]) -> None

Start asynchronous document detection with callback.

Parameters:

  • callback: Function called with detection results

Example:

def on_documents_found(results):
    print(f"Found {len(results)} documents")

scanner.addAsyncListener(on_documents_found)

detectMatAsync(image: numpy.ndarray) -> None

Queue an image for asynchronous processing.

Parameters:

  • image: OpenCV image to process

clearAsyncListener() -> None

Stop asynchronous processing and remove callback.

Document Normalization

normalize(document: DocumentResult, color: EnumImageColourMode) -> Optional[numpy.ndarray]

Perform document normalization (perspective correction) on a detected document.

Parameters:

  • document: DocumentResult containing boundary coordinates and source image
  • color: Color mode for output (ICM_COLOUR, ICM_GRAYSCALE, or ICM_BINARY)

Returns:

  • numpy.ndarray or None: The normalized document image as numpy array, or None if normalization fails

Usage Patterns:

# Method 1: Use return value directly
normalized_img = scanner.normalize(result, EnumImageColourMode.ICM_COLOUR)
if normalized_img is not None:
    cv2.imshow("Normalized", normalized_img)

# Method 2: Access from document object (also available)
scanner.normalize(result, EnumImageColourMode.ICM_COLOUR)
if result.normalized_image is not None:
    cv2.imwrite("output.jpg", result.normalized_image)

DocumentResult Class

Container for document detection results.

Attributes:

  • x1, y1: Top-left corner coordinates
  • x2, y2: Top-right corner coordinates
  • x3, y3: Bottom-right corner coordinates
  • x4, y4: Bottom-left corner coordinates
  • source: Original image (file path or numpy array)
  • normalized_image: Perspective-corrected image (numpy array)
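
The corner attributes are enough to derive other geometry without calling the SDK again. As a small illustration, the sketch below computes the axis-aligned bounding box of a detection, which can be used to crop a region of interest. bounding_box is a hypothetical helper, and the SimpleNamespace object only imitates the x1..y4 attributes listed above.

```python
import math
from types import SimpleNamespace
from typing import Tuple


def bounding_box(r) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the axis-aligned box enclosing the corners."""
    xs = (r.x1, r.x2, r.x3, r.x4)
    ys = (r.y1, r.y2, r.y3, r.y4)
    # Round outwards so the box never cuts into the quadrilateral.
    left, top = math.floor(min(xs)), math.floor(min(ys))
    right, bottom = math.ceil(max(xs)), math.ceil(max(ys))
    return left, top, right - left, bottom - top


# Stand-in for a detection result (illustrative values only).
demo = SimpleNamespace(x1=10, y1=20, x2=110, y2=25, x3=105, y3=220, x4=5, y4=215)
box = bounding_box(demo)
```

With a NumPy image, the crop is then image[y:y + h, x:x + w] after unpacking x, y, w, h = bounding_box(result).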

Utility Functions

convertMat2ImageData(mat: numpy.ndarray) -> ImageData

Convert OpenCV matrix to Dynamsoft ImageData format.

Parameters:

  • mat: OpenCV image (RGB, BGR, or grayscale)

Returns:

  • ImageData: SDK-compatible image data

convertNormalizedImage2Mat(normalized_image: ImageData) -> numpy.ndarray

Convert Dynamsoft ImageData back to OpenCV-compatible numpy array.

Parameters:

  • normalized_image: ImageData object from SDK normalization results

Returns:

  • numpy.ndarray: OpenCV-compatible image matrix

Supported Formats:

  • Binary images (1-bit): Converted to 8-bit grayscale
  • Grayscale images: Single channel 8-bit
  • Color images: 3-channel RGB format
