Document Scanner SDK for document edge detection, border cropping, perspective correction and brightness adjustment
Python Document Scanner SDK
This project provides Python bindings for the Dynamsoft C/C++ Document Scanner SDK v1.x, enabling developers to quickly create document scanner applications for Windows and Linux desktop environments.
Note: This project is an unofficial, community-maintained Python wrapper for the Dynamsoft Document Normalizer SDK. For those seeking the most reliable and fully-supported solution, Dynamsoft offers an official Python package. Visit the Dynamsoft Capture Vision Bundle page on PyPI for more details.
About Dynamsoft Capture Vision Bundle
- Activate the SDK with a 30-day FREE trial license.
- Install the SDK with pip: pip install dynamsoft-capture-vision-bundle
Comparison Table
Feature | Unofficial Wrapper (Community) | Official Dynamsoft Capture Vision SDK |
---|---|---|
Support | Community-driven, best effort | Official support from Dynamsoft |
Documentation | README only | Comprehensive Online Documentation |
API Coverage | Limited | Full API coverage |
Feature Updates | May lag behind the official SDK | First to receive new features |
Compatibility | Limited testing across environments | Thoroughly tested across all supported environments |
OS Support | Windows, Linux | Windows, Linux, macOS |
Supported Python Versions
- Python 3.x
Dependencies
Install the required dependencies using pip:
pip install opencv-python
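Before running the samples, it can help to confirm that the required modules are importable. The following is a minimal sketch, assuming the module names `cv2` and `docscanner` used by the samples in this README; the helper name `missing_modules` is hypothetical and not part of the SDK.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the import statements in the samples.
missing = missing_modules(["cv2", "docscanner"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable")
```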
Command-line Usage
- Scan documents from images:
    scandocument -f <file-name> -l <license-key>
- Scan documents from a camera video stream:
    scandocument -c 1 -l <license-key>
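The two invocations above imply a parser with a file option, a camera option, and a license option. Here is a minimal sketch of such a parser, not the packaged script itself. It assumes the two input sources are mutually exclusive and that `-c` takes an integer camera index. The long option `--camera` and the helper name `build_parser` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -f / -c / -l options shown above."""
    parser = argparse.ArgumentParser(
        prog="scandocument",
        description="Scan documents from an image file or a camera stream",
    )
    # Assumption: exactly one input source is given per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", "--file", help="Path to the image file")
    # "--camera" is a hypothetical long name; the README only shows "-c".
    source.add_argument("-c", "--camera", type=int, help="Camera index")
    parser.add_argument("-l", "--license", default="", type=str,
                        help="Set a valid license key")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-c", "1", "-l", "LICENSE-KEY"])
print(args.file, args.camera, args.license)
```

Passing both `-f` and `-c` to this sketch makes argparse exit with a usage error, which keeps the two modes from being mixed.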
Quick Start
- Scan documents from an image file:
    import argparse
    import sys
    import time

    import cv2
    import numpy as np

    import docscanner


    def showNormalizedImage(name, normalized_image):
        # Convert the SDK's normalized image to an OpenCV Mat and display it
        mat = docscanner.convertNormalizedImage2Mat(normalized_image)
        cv2.imshow(name, mat)
        return mat


    def process_file(filename, scanner):
        image = cv2.imread(filename)
        if image is None:
            print('Failed to load ' + filename)
            return

        normalized_image = None
        results = scanner.detectMat(image)
        for result in results:
            x1 = result.x1
            y1 = result.y1
            x2 = result.x2
            y2 = result.y2
            x3 = result.x3
            y3 = result.y3
            x4 = result.x4
            y4 = result.y4

            # Perspective-correct the detected quadrilateral and outline it
            normalized_image = scanner.normalizeBuffer(image, x1, y1, x2, y2, x3, y3, x4, y4)
            showNormalizedImage("Normalized Image", normalized_image)
            cv2.drawContours(image, [np.intp([(x1, y1), (x2, y2), (x3, y3), (x4, y4)])], 0, (0, 255, 0), 2)

        cv2.imshow('Document Image', image)
        cv2.waitKey(0)

        # Save the last normalized image, if any document was detected
        if normalized_image is not None:
            normalized_image.save(str(time.time()) + '.png')
            print('Image saved')


    def scandocument():
        """
        Command-line script for scanning documents from a given image
        """
        parser = argparse.ArgumentParser(description='Scan documents from an image file')
        parser.add_argument('-f', '--file', help='Path to the image file')
        parser.add_argument('-l', '--license', default='', type=str, help='Set a valid license key')
        args = parser.parse_args()
        try:
            filename = args.file
            license_key = args.license

            if filename is None:
                parser.print_help()
                return

            # set license
            if license_key == '':
                docscanner.initLicense("LICENSE-KEY")
            else:
                docscanner.initLicense(license_key)

            # initialize document scanner
            scanner = docscanner.createInstance()
            ret = scanner.setParameters(docscanner.Templates.color)

            process_file(filename, scanner)

        except Exception as err:
            print(err)
            sys.exit(1)


    scandocument()
- Scan documents from camera video stream:
    import time

    import cv2
    import numpy as np

    import docscanner

    g_results = None
    g_normalized_images = []


    def callback(results):
        # Store the latest detection results delivered by the scanner
        global g_results
        g_results = results


    def showNormalizedImage(name, normalized_image):
        mat = docscanner.convertNormalizedImage2Mat(normalized_image)
        cv2.imshow(name, mat)
        return mat


    def process_video(scanner):
        # Declared global so the 'n' and 's' branches share the same list
        global g_normalized_images
        scanner.addAsyncListener(callback)

        cap = cv2.VideoCapture(0)
        while True:
            ret, image = cap.read()
            if not ret:
                break

            ch = cv2.waitKey(1)
            if ch == 27:
                break
            elif ch == ord('n'):  # normalize image
                if g_results is not None:
                    g_normalized_images = []
                    for index, result in enumerate(g_results):
                        x1 = result.x1
                        y1 = result.y1
                        x2 = result.x2
                        y2 = result.y2
                        x3 = result.x3
                        y3 = result.y3
                        x4 = result.x4
                        y4 = result.y4

                        normalized_image = scanner.normalizeBuffer(
                            image, x1, y1, x2, y2, x3, y3, x4, y4)
                        g_normalized_images.append((str(index), normalized_image))
                        showNormalizedImage(str(index), normalized_image)
            elif ch == ord('s'):  # save image
                for name, normalized_image in g_normalized_images:
                    cv2.destroyWindow(name)
                    normalized_image.save(str(time.time()) + '.png')
                    print('Image saved')

                g_normalized_images = []

            # Queue the current frame for asynchronous detection
            scanner.detectMatAsync(image)

            if g_results is not None:
                for result in g_results:
                    x1 = result.x1
                    y1 = result.y1
                    x2 = result.x2
                    y2 = result.y2
                    x3 = result.x3
                    y3 = result.y3
                    x4 = result.x4
                    y4 = result.y4

                    cv2.drawContours(
                        image, [np.intp([(x1, y1), (x2, y2), (x3, y3), (x4, y4)])], 0, (0, 255, 0), 2)

            cv2.putText(image, 'Press "n" to normalize image',
                        (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
            cv2.putText(image, 'Press "s" to save image',
                        (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
            cv2.putText(image, 'Press "ESC" to exit',
                        (10, 90), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
            cv2.imshow('Document Scanner', image)

        # Release the camera, close windows, and stop the native thread
        cap.release()
        cv2.destroyAllWindows()
        scanner.clearAsyncListener()


    docscanner.initLicense("LICENSE-KEY")
    scanner = docscanner.createInstance()
    ret = scanner.setParameters(docscanner.Templates.color)
    process_video(scanner)
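Both samples read the eight corner attributes from every result one by one. When an image contains several candidates, a small helper can collect the corners and pick the largest quadrilateral. The sketch below assumes the four corners are listed in order around the perimeter, as the `drawContours` calls above suggest. The helper names `corners`, `quad_area`, and `largest_result` are hypothetical, and `SimpleNamespace` objects stand in for real detection results.

```python
from types import SimpleNamespace


def corners(result):
    """Return the four corner points of a detection result as (x, y) tuples."""
    return [(result.x1, result.y1), (result.x2, result.y2),
            (result.x3, result.y3), (result.x4, result.y4)]


def quad_area(points):
    """Area of a simple polygon computed with the shoelace formula."""
    total = 0
    for i, (xa, ya) in enumerate(points):
        xb, yb = points[(i + 1) % len(points)]
        total += xa * yb - xb * ya
    return abs(total) / 2


def largest_result(results):
    """Pick the result whose quadrilateral covers the largest area, or None."""
    return max(results, key=lambda r: quad_area(corners(r)), default=None)


# Stand-in objects; real results would come from scanner.detectMat(image).
small = SimpleNamespace(x1=0, y1=0, x2=10, y2=0, x3=10, y3=10, x4=0, y4=10)
large = SimpleNamespace(x1=0, y1=0, x2=40, y2=0, x3=40, y3=30, x4=0, y4=30)
best = largest_result([small, large])
print(quad_area(corners(best)))  # prints 1200.0
```

The list returned by `corners` can also be passed to `np.intp` to build the contour that the samples draw.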
API Methods
- docscanner.initLicense('YOUR-LICENSE-KEY'): Set the license key.
    docscanner.initLicense("LICENSE-KEY")
- docscanner.createInstance(): Create a Document Scanner instance.
    scanner = docscanner.createInstance()
- detectFile(filename): Perform edge detection from an image file.
    results = scanner.detectFile(<filename>)
- detectMat(Mat image): Perform edge detection from an OpenCV Mat.
    image = cv2.imread(<filename>)
    results = scanner.detectMat(image)
    for result in results:
        x1 = result.x1
        y1 = result.y1
        x2 = result.x2
        y2 = result.y2
        x3 = result.x3
        y3 = result.y3
        x4 = result.x4
        y4 = result.y4
- setParameters(Template): Select color, binary, or grayscale template.
    scanner.setParameters(docscanner.Templates.color)
- addAsyncListener(callback function): Start a native thread to run document scanning tasks asynchronously.
- detectMatAsync(<opencv mat data>): Queue a document scanning task into the native thread.
    from time import sleep

    import cv2

    def callback(results):
        for result in results:
            print(result.x1)
            print(result.y1)
            print(result.x2)
            print(result.y2)
            print(result.x3)
            print(result.y3)
            print(result.x4)
            print(result.y4)

    image = cv2.imread(<filename>)
    scanner.addAsyncListener(callback)
    scanner.detectMatAsync(image)
    # Keep the main thread alive while the native thread processes the image
    sleep(5)
- normalizeBuffer(mat, x1, y1, x2, y2, x3, y3, x4, y4): Perform perspective correction from an OpenCV Mat.
    normalized_image = scanner.normalizeBuffer(image, x1, y1, x2, y2, x3, y3, x4, y4)
- normalizeFile(filename, x1, y1, x2, y2, x3, y3, x4, y4): Perform perspective correction from an image file.
    normalized_image = scanner.normalizeFile(<filename>, x1, y1, x2, y2, x3, y3, x4, y4)
- normalized_image.save(filename): Save the normalized image to a file.
    normalized_image.save(<filename>)
- normalized_image.recycle(): Release the memory of the normalized image.
- clearAsyncListener(): Stop the native thread and clear the registered Python function.
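`recycle()` and `clearAsyncListener()` are cleanup calls, so they are easy to skip when an exception interrupts the scanning loop. The sketch below wraps them in context managers and adds a lock-protected holder for the latest results. This README does not say which thread invokes the callback, so the lock is only a precaution. The names `LatestResults`, `async_listener`, and `recycling` are hypothetical helpers, not part of the SDK. Only the method names `addAsyncListener`, `clearAsyncListener`, and `recycle` come from the list above.

```python
import threading
from contextlib import contextmanager


class LatestResults:
    """Thread-safe holder for the most recent detection results."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = None

    def update(self, results):
        # Intended to be passed as the callback to scanner.addAsyncListener().
        with self._lock:
            self._results = results

    def get(self):
        with self._lock:
            return self._results


@contextmanager
def async_listener(scanner, callback):
    """Register a callback and always clear it when the block exits."""
    scanner.addAsyncListener(callback)
    try:
        yield scanner
    finally:
        scanner.clearAsyncListener()


@contextmanager
def recycling(normalized_image):
    """Release the normalized image's memory when the block exits."""
    try:
        yield normalized_image
    finally:
        normalized_image.recycle()
```

With a real scanner, usage might look like `with async_listener(scanner, latest.update):` around the capture loop, and `with recycling(scanner.normalizeBuffer(...)) as normalized_image:` around the code that saves or displays the result.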
How to Build the Python Document Scanner Extension
- Create a source distribution:
    python setup.py sdist
- Build with setuptools:
    python setup_setuptools.py build
    python setup_setuptools.py develop
- Build wheel:
    pip wheel . --verbose
    # Or
    python setup.py bdist_wheel