Document Scanner SDK for document edge detection, border cropping, perspective correction and brightness adjustment

Python Document Scanner SDK

This project provides Python bindings for the Dynamsoft C/C++ Document Scanner SDK v1.x, enabling developers to quickly create document scanner applications for Windows and Linux desktop environments.

Note: This project is an unofficial, community-maintained Python wrapper for the Dynamsoft Document Normalizer SDK. For those seeking the most reliable and fully-supported solution, Dynamsoft offers an official Python package. Visit the Dynamsoft Capture Vision Bundle page on PyPI for more details.

About Dynamsoft Capture Vision Bundle

Comparison Table

Feature         | Unofficial Wrapper (Community)       | Official Dynamsoft Capture Vision SDK
Support         | Community-driven, best effort        | Official support from Dynamsoft
Documentation   | README only                          | Comprehensive Online Documentation
API Coverage    | Limited                              | Full API coverage
Feature Updates | May lag behind the official SDK      | First to receive new features
Compatibility   | Limited testing across environments  | Thoroughly tested across all supported environments
OS Support      | Windows, Linux                       | Windows, Linux, macOS

Supported Python Versions

  • Python 3.x

Dependencies

Install the required dependencies using pip:

pip install opencv-python

Command-line Usage

  • Scan documents from images:

    scandocument -f <file-name> -l <license-key>
    
  • Scan documents from a camera video stream:

    scandocument -c 1 -l <license-key>
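
The two invocations above differ only in the input source. As a rough sketch of how such options could be parsed, the snippet below uses `argparse` with a mutually exclusive group. The long option name `--camera`, the integer camera index, and the mutual exclusion are assumptions made for illustration; the actual `scandocument` script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Build a parser accepting either an image file or a camera index (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Scan documents from an image file or a camera")
    # Assumption: exactly one input source is given per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file", help="Path to the image file")
    # Assumption: -c takes an integer camera index, as in `-c 1`.
    source.add_argument("-c", "--camera", type=int, help="Camera index")
    parser.add_argument("-l", "--license", default="", type=str,
                        help="Set a valid license key")
    return parser


args = build_parser().parse_args(["-c", "1", "-l", "LICENSE-KEY"])
print(args.file, args.camera, args.license)
```

Passing both `-f` and `-c` to this sketch makes `argparse` exit with a usage error, so the source is never ambiguous.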
    

Quick Start

  • Scan documents from an image file:

    import argparse
    import docscanner
    import sys
    import numpy as np
    import cv2
    import time
    
    def showNormalizedImage(name, normalized_image):
        mat = docscanner.convertNormalizedImage2Mat(normalized_image)
        cv2.imshow(name, mat)
        return mat
    
    def process_file(filename, scanner):
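        image = cv2.imread(filename)
        if image is None:
            raise ValueError('Failed to read image: ' + filename)
    
        results = scanner.detectMat(image)
        normalized_image = None
        for result in results:
            x1 = result.x1
            y1 = result.y1
            x2 = result.x2
            y2 = result.y2
            x3 = result.x3
            y3 = result.y3
            x4 = result.x4
            y4 = result.y4
            
            # Perspective-correct the detected quadrilateral, then outline it on the source image
            normalized_image = scanner.normalizeBuffer(image, x1, y1, x2, y2, x3, y3, x4, y4)
            showNormalizedImage("Normalized Image", normalized_image)
            cv2.drawContours(image, [np.intp([(x1, y1), (x2, y2), (x3, y3), (x4, y4)])], 0, (0, 255, 0), 2)
        
        cv2.imshow('Document Image', image)
        cv2.waitKey(0)
        
        # Only the last normalized image is saved; skip saving when nothing was detected
        if normalized_image is not None:
            normalized_image.save(str(time.time()) + '.png')
            print('Image saved')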
    
    def scandocument():
        """
        Command-line script for scanning documents from a given image
        """
        parser = argparse.ArgumentParser(description='Scan documents from an image file')
        parser.add_argument('-f', '--file', help='Path to the image file')
        parser.add_argument('-l', '--license', default='', type=str, help='Set a valid license key')
        args = parser.parse_args()
        # print(args)
        try:
            filename = args.file
            license = args.license
            
            if filename is None:
                parser.print_help()
                return
            
            # set license
            if license == '':
                docscanner.initLicense("LICENSE-KEY")
            else:
                docscanner.initLicense(license)
                
            # initialize document scanner
            scanner = docscanner.createInstance()
            ret = scanner.setParameters(docscanner.Templates.color)
    
            process_file(filename, scanner)
                
        except Exception as err:
            print(err)
            sys.exit(1)
    
    scandocument()
    

  • Scan documents from camera video stream:

    import argparse
    import docscanner
    import sys
    import numpy as np
    import cv2
    import time
    
    g_results = None
    g_normalized_images = []
    
    
    def callback(results):
        global g_results
        g_results = results
    
    
    def showNormalizedImage(name, normalized_image):
        mat = docscanner.convertNormalizedImage2Mat(normalized_image)
        cv2.imshow(name, mat)
        return mat
    
    
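    def process_video(scanner):
        # Reassigned below, so declare it global to avoid an UnboundLocalError
        global g_normalized_images
        scanner.addAsyncListener(callback)
    
        cap = cv2.VideoCapture(0)
        while True:
            ret, image = cap.read()
            if not ret:  # camera unavailable or stream ended
                break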
    
            ch = cv2.waitKey(1)
            if ch == 27:
                break
            elif ch == ord('n'):  # normalize image
                if g_results is not None:
                    g_normalized_images = []
                    index = 0
                    for result in g_results:
                        x1 = result.x1
                        y1 = result.y1
                        x2 = result.x2
                        y2 = result.y2
                        x3 = result.x3
                        y3 = result.y3
                        x4 = result.x4
                        y4 = result.y4
    
                        normalized_image = scanner.normalizeBuffer(
                            image, x1, y1, x2, y2, x3, y3, x4, y4)
                        g_normalized_images.append(
                            (str(index), normalized_image))
                        mat = showNormalizedImage(str(index), normalized_image)
                        index += 1
            elif ch == ord('s'):  # save image
                for data in g_normalized_images:
                    # cv2.imwrite('images/' + str(time.time()) + '.png', image)
                    cv2.destroyWindow(data[0])
                    data[1].save(str(time.time()) + '.png')
                    print('Image saved')
    
                g_normalized_images = []
    
            if image is not None:
                scanner.detectMatAsync(image)
    
            if g_results is not None:
                for result in g_results:
                    x1 = result.x1
                    y1 = result.y1
                    x2 = result.x2
                    y2 = result.y2
                    x3 = result.x3
                    y3 = result.y3
                    x4 = result.x4
                    y4 = result.y4
    
                    cv2.drawContours(
                        image, [np.intp([(x1, y1), (x2, y2), (x3, y3), (x4, y4)])], 0, (0, 255, 0), 2)
    
            cv2.putText(image, 'Press "n" to normalize image',
                        (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
            cv2.putText(image, 'Press "s" to save image', (10, 60),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
            cv2.putText(image, 'Press "ESC" to exit', (10, 90),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
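            cv2.imshow('Document Scanner', image)
    
        # Stop the native thread and release the camera and windows
        scanner.clearAsyncListener()
        cap.release()
        cv2.destroyAllWindows()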
    
    
    docscanner.initLicense(
        "LICENSE-KEY")
    
    scanner = docscanner.createInstance()
    ret = scanner.setParameters(docscanner.Templates.color)
    process_video(scanner)
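
Both samples above unpack `x1` through `y4` by hand each time a result is used. A small helper can collect the corners once. This is a minimal sketch, not part of the `docscanner` API: `corners` and `quad_area` are hypothetical names, the `SimpleNamespace` object stands in for a real detection result, and the area calculation assumes the four points are listed in order around the quadrilateral.

```python
from types import SimpleNamespace


def corners(result):
    """Return the four corners of a detection result as (x, y) tuples."""
    return [(result.x1, result.y1), (result.x2, result.y2),
            (result.x3, result.y3), (result.x4, result.y4)]


def quad_area(points):
    """Area of a simple polygon via the shoelace formula.

    Assumes the points are ordered around the polygon's boundary.
    """
    total = 0
    for (xa, ya), (xb, yb) in zip(points, points[1:] + points[:1]):
        total += xa * yb - xb * ya
    return abs(total) / 2


# Stand-in for one item returned by detectMat(); it only mimics the attribute names.
result = SimpleNamespace(x1=0, y1=0, x2=100, y2=0, x3=100, y3=50, x4=0, y4=50)
pts = corners(result)
print(pts)             # [(0, 0), (100, 0), (100, 50), (0, 50)]
print(quad_area(pts))  # 5000.0
```

The list from `corners` can be passed to `np.intp(...)` for `cv2.drawContours`, and the area gives one possible way to pick the largest quadrilateral when several are detected.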
    

API Methods

  • docscanner.initLicense('YOUR-LICENSE-KEY'): Set the license key.

    docscanner.initLicense("LICENSE-KEY")
    
  • docscanner.createInstance(): Create a Document Scanner instance.

    scanner = docscanner.createInstance()
    
  • detectFile(filename): Perform edge detection from an image file.

    results = scanner.detectFile(<filename>)
    
  • detectMat(Mat image): Perform edge detection from an OpenCV Mat.

    image = cv2.imread(<filename>)
    results = scanner.detectMat(image)
    for result in results:
        x1 = result.x1
        y1 = result.y1
        x2 = result.x2
        y2 = result.y2
        x3 = result.x3
        y3 = result.y3
        x4 = result.x4
        y4 = result.y4
    
  • setParameters(Template): Select color, binary, or grayscale template.

    scanner.setParameters(docscanner.Templates.color)
    
  • addAsyncListener(callback function): Start a native thread to run document scanning tasks asynchronously.

  • detectMatAsync(<opencv mat data>): Queue a document scanning task into the native thread.

    def callback(results):
        for result in results:
            print(result.x1)
            print(result.y1)
            print(result.x2)
            print(result.y2)
            print(result.x3)
            print(result.y3)
            print(result.x4)
            print(result.y4)
                                                        
    import cv2
    from time import sleep
    
    image = cv2.imread(<filename>)
    scanner.addAsyncListener(callback)
    scanner.detectMatAsync(image)
    sleep(5)  # give the native thread time to invoke the callback
    
  • normalizeBuffer(mat, x1, y1, x2, y2, x3, y3, x4, y4): Perform perspective correction from an OpenCV Mat.

    normalized_image = scanner.normalizeBuffer(image, x1, y1, x2, y2, x3, y3, x4, y4)
    
  • normalizeFile(filename, x1, y1, x2, y2, x3, y3, x4, y4): Perform perspective correction from an image file.

    normalized_image = scanner.normalizeFile(<filename>, x1, y1, x2, y2, x3, y3, x4, y4)
    
  • normalized_image.save(filename): Save the normalized image to a file.

    normalized_image.save(<filename>)
    
  • normalized_image.recycle(): Release the memory of the normalized image.

  • clearAsyncListener(): Stop the native thread and clear the registered Python function.
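
The asynchronous methods above can be combined so that the caller waits for the callback instead of sleeping for a fixed time, and always clears the listener afterwards. The following is a sketch under stated assumptions: `FakeScanner` is a hypothetical stand-in that only mimics the three method names listed above so the example runs without the SDK, and `detect_once` is an illustrative helper rather than a library function.

```python
import threading
from types import SimpleNamespace


class FakeScanner:
    """Hypothetical stand-in for a docscanner instance; it only mimics the method names."""

    def __init__(self):
        self._callback = None

    def addAsyncListener(self, callback):
        self._callback = callback

    def detectMatAsync(self, image):
        # Simulate a worker thread delivering one detection result.
        result = SimpleNamespace(x1=0, y1=0, x2=10, y2=0, x3=10, y3=20, x4=0, y4=20)
        threading.Thread(target=self._callback, args=([result],)).start()

    def clearAsyncListener(self):
        self._callback = None


def detect_once(scanner, image, timeout=5.0):
    """Queue one frame and block until the callback fires or the timeout expires."""
    done = threading.Event()
    holder = {}

    def callback(results):
        holder["results"] = results
        done.set()

    scanner.addAsyncListener(callback)
    try:
        scanner.detectMatAsync(image)
        finished = done.wait(timeout)
    finally:
        # Always unregister the listener, even if queuing the frame raised.
        scanner.clearAsyncListener()
    return holder.get("results") if finished else None


results = detect_once(FakeScanner(), image=None)
print(len(results))  # 1
```

With a real scanner, each normalized image produced from these results could likewise be released with `recycle()` in a `finally` block once it has been saved.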

How to Build the Python Document Scanner Extension

  • Create a source distribution:

    python setup.py sdist
    
  • Build and install in development mode with setuptools:

    python setup_setuptools.py build
    python setup_setuptools.py develop 
    
  • Build wheel:

    pip wheel . --verbose
    # Or
    python setup.py bdist_wheel
    
