Automated signature placement for synthetic data generation - designed for creating ML training datasets
SignLib - Signature Placement for Synthetic Data Generation
SignLib is a Python library for automatically placing signatures on documents. It is specifically designed for generating synthetic training data for AI and machine learning models. The library processes signature images, removes their backgrounds, and places them in the whitest available area of a configurable search region on the document.
Use Cases
- Generate synthetic signed documents for training machine learning models
- Create diverse training datasets with varied signature positions and styles
- Automate document processing pipelines for testing and development
- Batch process large document collections with consistent signature placement
- Augment existing datasets with signature variations
Features
- Automatic background removal from signature images
- Automatic positioning in the whitest available space within the search area
- Color adaptation: auto-detect document text color or specify custom colors
- Automatic scaling based on document dimensions
- Optional rotation for natural appearance and variation
- Support for multiple formats: PDF, TIFF, PNG, JPEG
- Customizable position control with bottom_percent and right_percent parameters
- High-quality image processing with contrast enhancement
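The README does not describe how background removal works. One common approach, sketched here under that assumption, treats pixels lighter than a luminance threshold as paper and makes them transparent. The function name `remove_background` and the threshold of 200 are illustrative only and are not part of SignLib's API.

```python
import numpy as np


def remove_background(rgb: np.ndarray, threshold: int = 200) -> np.ndarray:
    """Return an RGBA array in which light (background) pixels are transparent.

    Hypothetical helper. Assumes `rgb` is a uint8 array of shape (H, W, 3)
    with dark ink on a light background.
    """
    # Per-pixel luminance using ITU-R BT.601 weights
    luminance = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Ink (darker than the threshold) stays opaque; everything else becomes transparent
    alpha = np.where(luminance < threshold, 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])
```

A hard threshold produces jagged stroke edges; a real pipeline may prefer a soft ramp around the threshold to keep anti-aliasing.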
Installation
pip install signlib
Quick Start
Basic Usage
from signlib import create_sign
# Simplest usage - auto-detect signature color
create_sign('document.pdf', 'signature.png')
# Specify output path
create_sign('document.pdf', 'signature.png', output_path='signed_document.pdf')
Custom Color
from signlib import create_sign
# Blue signature
create_sign('document.pdf', 'signature.png', signature_color=(0, 0, 255))
# Black signature
create_sign('document.pdf', 'signature.png', signature_color=(0, 0, 0))
# Dark gray signature
create_sign('document.pdf', 'signature.png', signature_color=(50, 50, 50))
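How `signature_color` is applied is not documented here. A minimal sketch, assuming the signature has already been converted to RGBA with a transparent background, replaces the RGB channels with the requested color and leaves the alpha channel untouched, so anti-aliased stroke edges keep their soft falloff. `recolor_signature` is a hypothetical helper, not a SignLib function.

```python
import numpy as np


def recolor_signature(rgba: np.ndarray, color: tuple) -> np.ndarray:
    """Paint every pixel with `color` (r, g, b) while keeping the original alpha.

    Hypothetical helper. Assumes `rgba` is a uint8 (H, W, 4) array whose
    alpha channel already separates ink from background.
    """
    out = rgba.copy()  # leave the caller's array unchanged
    out[..., :3] = np.array(color, dtype=np.uint8)  # broadcast RGB over all pixels
    return out
```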
Position Control (New Feature)
from signlib import create_sign
# Search in bottom 40% and right 40% of document
create_sign(
    'document.pdf',
    'signature.png',
    bottom_percent=40,  # Search from bottom 40% upward
    right_percent=40    # Search from right 40% leftward
)
# Widen the search toward the left side of the page
create_sign(
    'document.pdf',
    'signature.png',
    bottom_percent=30,  # Bottom 30% of the page height
    right_percent=70    # Rightmost 70% of the width (reaches into the left-center area)
)
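The parameter descriptions suggest the search area is the intersection of the bottom `bottom_percent` of the page height and the rightmost `right_percent` of its width. The hypothetical helper below converts those percentages into a pixel box under that reading; SignLib's own rounding and validation may differ.

```python
def search_region(width: int, height: int,
                  bottom_percent: float = 25, right_percent: float = 50):
    """Return the (left, top, right, bottom) pixel box of the search area.

    Hypothetical helper: the box spans the bottom `bottom_percent` of the
    page height and the rightmost `right_percent` of the page width.
    """
    if not (0 < bottom_percent <= 100 and 0 < right_percent <= 100):
        raise ValueError("percentages must be in (0, 100]")
    top = height - round(height * bottom_percent / 100)
    left = width - round(width * right_percent / 100)
    return left, top, width, height
```

With the defaults on a 1000 x 2000 pixel page, the box is `(500, 1500, 1000, 2000)`: the lower-right quarter of the width and the bottom quarter of the height.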
Advanced Usage
from signlib import create_sign
# Full control over all parameters
create_sign(
    document_path='document.tif',
    sign_path='signature.png',
    output_path='signed.tif',
    signature_color=(0, 0, 100),  # Dark blue
    scale_factor=0.15,            # 15% of document width
    rotation_angle=5.0,           # 5 degrees clockwise
    bottom_percent=25,            # Bottom 25%
    right_percent=50              # Right 50%
)
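The arithmetic behind `scale_factor` and `rotation_angle` can be sketched as follows, assuming the signature keeps its aspect ratio and is rotated onto an expanded canvas so no corners are clipped. Both helpers are illustrative rather than SignLib functions; the rotated size uses the standard bounding-box formula |w cos θ| + |h sin θ| by |w sin θ| + |h cos θ|.

```python
import math


def scaled_size(sig_w, sig_h, doc_w, scale_factor=0.12):
    """Target (width, height): width = scale_factor * document width, aspect ratio kept."""
    new_w = max(1, round(doc_w * scale_factor))
    new_h = max(1, round(sig_h * new_w / sig_w))
    return new_w, new_h


def rotated_bounds(w, h, angle_deg):
    """Canvas size needed to hold a w x h image rotated by angle_deg without clipping."""
    a = math.radians(angle_deg)
    cos_a, sin_a = abs(math.cos(a)), abs(math.sin(a))
    # round() before ceil() so float noise such as cos(90 deg) ~ 6e-17
    # does not push an exact size up by one pixel
    new_w = math.ceil(round(w * cos_a + h * sin_a, 6))
    new_h = math.ceil(round(w * sin_a + h * cos_a, 6))
    return new_w, new_h
```

For a 2000-pixel-wide page and a 400 x 100 signature, `scaled_size(400, 100, 2000, 0.15)` gives a 300 x 75 target, and `rotated_bounds(300, 75, 5.0)` grows the canvas to 306 x 101.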
Batch Processing for Synthetic Training Data
from signlib import create_sign
from pathlib import Path
# Create synthetic training data
doc_folder = Path('documents/')
signature_folder = Path('signatures/')
output_folder = Path('synthetic_data/')
output_folder.mkdir(exist_ok=True)
# Generate diverse signed documents
for doc_file in doc_folder.glob('*.pdf'):
    for sig_file in signature_folder.glob('*.png'):
        output_name = f"{doc_file.stem}_{sig_file.stem}_signed.pdf"
        output_path = output_folder / output_name
        create_sign(
            str(doc_file),
            str(sig_file),
            output_path=str(output_path),
            scale_factor=0.12,  # Fixed here; vary these per file for more diversity
            rotation_angle=0.0,
            bottom_percent=25,
            right_percent=50
        )
        print(f"Generated: {output_name}")
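The nested loop produces one output per document and signature pair, so a large batch that stops midway must otherwise be rerun from the start. One way to make it resumable, sketched here with a hypothetical `plan_jobs` helper, is to build the full job list up front and skip outputs that already exist. The output naming matches the loop above.

```python
from itertools import product
from pathlib import Path


def plan_jobs(doc_folder, signature_folder, output_folder, skip_existing=True):
    """List (document, signature, output) triples for every document/signature pair.

    Hypothetical helper. Pairs whose output file already exists are skipped,
    so an interrupted batch can be restarted without redoing finished work.
    """
    docs = sorted(Path(doc_folder).glob('*.pdf'))  # sorted for a stable order
    sigs = sorted(Path(signature_folder).glob('*.png'))
    jobs = []
    for doc_file, sig_file in product(docs, sigs):
        out = Path(output_folder) / f"{doc_file.stem}_{sig_file.stem}_signed.pdf"
        if skip_existing and out.exists():
            continue
        jobs.append((doc_file, sig_file, out))
    return jobs
```

Each triple can then be passed to `create_sign(str(doc), str(sig), output_path=str(out), ...)` with the same keyword arguments as in the loop above.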
Class-Based Usage (Advanced)
from signlib import SignatureProcessor
processor = SignatureProcessor()
# Run the full pipeline through a processor instance
result = processor.create_sign(
    document_path='document.pdf',
    sign_path='signature.png',
    signature_color=None,  # Auto-detect
    scale_factor=0.12,
    rotation_angle=0.0,
    enhance_contrast=True,
    bottom_percent=25,
    right_percent=50
)
print(f"Signed document: {result}")
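The README does not say what `enhance_contrast=True` does internally. A min-max contrast stretch is one simple possibility, shown here on a grayscale array as an assumption; `stretch_contrast` is a hypothetical name, and SignLib may use a different method.

```python
import numpy as np


def stretch_contrast(gray: np.ndarray) -> np.ndarray:
    """Linearly rescale a uint8 grayscale image so its values span 0..255.

    Hypothetical helper illustrating one form of contrast enhancement.
    """
    lo, hi = int(gray.min()), int(gray.max())
    if hi == lo:  # flat image: nothing to stretch
        return gray.copy()
    scaled = (gray.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.round(scaled).astype(np.uint8)
```

A faint scan whose strokes sit between 50 and 200 is mapped so the darkest stroke becomes 0 and the lightest pixel 255.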
API Reference
create_sign() Function
| Parameter | Type | Default | Description |
|---|---|---|---|
| document_path | str | Required | Path to document file |
| sign_path | str | Required | Path to signature file |
| output_path | str | None | Output file path (auto-generated if None) |
| signature_color | tuple | None | RGB color (r, g, b); None to auto-detect |
| scale_factor | float | 0.12 | Signature size ratio (0.12 = 12% of document width) |
| rotation_angle | float | 0.0 | Rotation angle in degrees |
| bottom_percent | float | 25 | Search area from bottom (25 = bottom 25%) |
| right_percent | float | 50 | Search area from right (50 = right 50%) |
Position Control
- bottom_percent: controls how far up from the bottom edge to search
  - 25 = search the bottom 25% of the document (default; conventional signature area)
  - 40 = search the bottom 40% (more flexible)
  - 50 = search the bottom 50% (entire lower half)
- right_percent: controls how far in from the right edge to search
  - 50 = search the right 50% of the document (default; typical signature position)
  - 40 = search the rightmost 40% (more to the right)
  - 70 = search the right 70% (includes the left-center area)
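The whitest-space search is described only at a high level. A plausible sketch, assuming a grayscale page array and a fixed signature footprint, scores every candidate window inside the search box by its summed brightness using a summed-area table and returns the brightest one. `whitest_window` is illustrative; here ties resolve to the top-most, then left-most window, whereas SignLib may prefer other positions.

```python
import numpy as np


def whitest_window(gray, box, win_w, win_h):
    """Return the top-left (x, y) of the brightest win_w x win_h window.

    gray: 2-D uint8 array (0 = black ink, 255 = white paper).
    box: (left, top, right, bottom) search area in pixel coordinates.
    Hypothetical helper; ties resolve to the top-most, then left-most window.
    """
    left, top, right, bottom = box
    region = gray[top:bottom, left:right].astype(np.int64)
    h, w = region.shape
    if not (1 <= win_h <= h and 1 <= win_w <= w):
        raise ValueError("window must fit inside the search box")
    # Summed-area table with a leading row and column of zeros
    sat = np.zeros((h + 1, w + 1), dtype=np.int64)
    sat[1:, 1:] = region.cumsum(axis=0).cumsum(axis=1)
    # Sum of every window at once: S[y+wh, x+ww] - S[y, x+ww] - S[y+wh, x] + S[y, x]
    sums = (sat[win_h:, win_w:] - sat[:-win_h, win_w:]
            - sat[win_h:, :-win_w] + sat[:-win_h, :-win_w])
    y, x = np.unravel_index(np.argmax(sums), sums.shape)
    return left + int(x), top + int(y)
```

The summed-area table makes each window's score an O(1) lookup, so scanning every position in the box costs about as much as one pass over its pixels.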
Designed for Synthetic Data Generation
SignLib is designed for generating synthetic training data:
- Consistent Quality: Generate thousands of signed documents with consistent quality
- Variation Control: Easily control position, size, rotation, and color for data diversity
- Batch Processing: Process large datasets efficiently
- Reproducible: Same parameters produce same results for reproducible experiments
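The reproducibility note covers fixed parameters. When parameters are randomized for diversity, deriving each sample's values from its own seeded generator keeps runs repeatable and lets a single sample be regenerated in isolation. The sketch below assumes that approach; `sample_params` is hypothetical, and its ranges are the ones used in the training-dataset example in this README.

```python
import random


def sample_params(seed: int) -> dict:
    """Draw one set of create_sign() keyword arguments from a seeded RNG.

    Hypothetical helper: the same seed always yields the same parameters,
    independent of how many other samples were drawn before it.
    """
    rng = random.Random(seed)  # private generator; global random state untouched
    return {
        'scale_factor': rng.uniform(0.10, 0.15),
        'rotation_angle': rng.uniform(-10, 10),
        'bottom_percent': rng.randint(20, 35),
        'right_percent': rng.randint(40, 60),
    }
```

`create_sign(str(doc), str(sig), **sample_params(i))` would then apply the drawn values for sample `i`.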
Notes
- Signature files should be in PNG format (for transparent background support)
- Supported document formats: PDF, TIFF, PNG, JPEG
- When signature_color=None, the library auto-detects the most common dark color from the document
- Signatures are typically placed in the bottom-right area, within the whitest available space
- Background is automatically removed and contrast is enhanced
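Color auto-detection is summarized above as picking the most common dark color in the document. A sketch of that idea, assuming an RGB array and an illustrative luminance cut-off of 100, counts the colors of the dark pixels with `np.unique` and returns the most frequent one. `dominant_dark_color` is not a SignLib function, and the cut-off is an assumption.

```python
import numpy as np


def dominant_dark_color(rgb: np.ndarray, max_luminance: float = 100.0):
    """Most frequent RGB color among pixels darker than max_luminance.

    Hypothetical helper. Falls back to black when the image has no dark pixels.
    """
    pixels = rgb.reshape(-1, 3)
    lum = pixels @ np.array([0.299, 0.587, 0.114])  # BT.601 luminance per pixel
    dark = pixels[lum < max_luminance]
    if dark.size == 0:
        return (0, 0, 0)
    colors, counts = np.unique(dark, axis=0, return_counts=True)
    return tuple(int(c) for c in colors[np.argmax(counts)])
```

On scanned pages, exact-color counting is sensitive to noise; quantizing colors before counting is a common refinement.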
Requirements
- Python 3.7+ (required by Pillow >= 9.0.0 and NumPy >= 1.20.0)
- Pillow >= 9.0.0
- NumPy >= 1.20.0
License
MIT License - Free to use in commercial and open-source projects.
Author
Cagri Gungor (@cagrigungor)
Specialized in synthetic data generation for machine learning applications.
Contributing
Contributions are welcome. Please feel free to submit a Pull Request.
Issues
Found a bug or have a feature request? Please open an issue on GitHub.
Example: Generate Training Dataset
import random
from pathlib import Path
from signlib import create_sign

random.seed(42)  # Fix the seed so the dataset can be regenerated
# Sort the file lists: glob order is not guaranteed to be stable
documents = sorted(Path('documents').glob('*.pdf'))
signatures = sorted(Path('signatures').glob('*.png'))
out_dir = Path('training_data')
out_dir.mkdir(exist_ok=True)

for i in range(1000):  # Generate 1000 synthetic samples
    doc = random.choice(documents)
    sig = random.choice(signatures)
    # Vary parameters for diversity
    create_sign(
        str(doc),
        str(sig),
        output_path=str(out_dir / f'sample_{i:04d}.pdf'),
        scale_factor=random.uniform(0.10, 0.15),
        rotation_angle=random.uniform(-10, 10),
        bottom_percent=random.randint(20, 35),
        right_percent=random.randint(40, 60)
    )
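Because each sample's parameters are random, it can help to record them alongside the outputs, for example as auxiliary labels or for auditing a dataset. A minimal sketch with the standard `csv` module, assuming one dict per sample with the hypothetical column names below:

```python
import csv

# Hypothetical column names; one row per generated sample
FIELDS = ['output_path', 'document', 'signature',
          'scale_factor', 'rotation_angle', 'bottom_percent', 'right_percent']


def write_manifest(rows, stream):
    """Write a header plus one CSV line per sample to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Inside the loop above, append a dict with these keys for each sample, then call `write_manifest(rows, f)` on a file opened with `open('training_data/manifest.csv', 'w', newline='')`.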
Download files
File details
Details for the file signlib-1.0.2.tar.gz.
File metadata
- Download URL: signlib-1.0.2.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 31913574037d42759465f1e218dd1164046dbd5b3ebd48a81420b6aba533f7aa |
| MD5 | 599da98220dc625f5f2604d43e7e5aef |
| BLAKE2b-256 | 459d08e31c307ceaf377f76071f25b000d0fb6cc1018065fa91bcaf77d7e7b7c |
File details
Details for the file signlib-1.0.2-py3-none-any.whl.
File metadata
- Download URL: signlib-1.0.2-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a77aefa1877c4f02daf78ff5b15a601f046506c185b9a98fab3cfd1916542ea |
| MD5 | a80ebe6e94408c0c843393adc2f1ed2b |
| BLAKE2b-256 | f7802c2d2c112ec9a24aeade6fbb28992afe14c58142e29845dcb7a167ebb6fd |
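To check a downloaded file against the SHA256 digests listed above, the standard `hashlib` module is enough. The sketch reads the file in chunks so large archives do not need to fit in memory; `sha256_of` and `EXPECTED` are illustrative names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published for the 1.0.2 release files
EXPECTED = {
    'signlib-1.0.2.tar.gz': '31913574037d42759465f1e218dd1164046dbd5b3ebd48a81420b6aba533f7aa',
    'signlib-1.0.2-py3-none-any.whl': '1a77aefa1877c4f02daf78ff5b15a601f046506c185b9a98fab3cfd1916542ea',
}
```

For example, `sha256_of('signlib-1.0.2.tar.gz') == EXPECTED['signlib-1.0.2.tar.gz']` should hold for an intact download. pip can also enforce digests at install time through hash-checking mode (a requirements line such as `signlib==1.0.2 --hash=sha256:<digest>` together with `--require-hashes`); in that mode every dependency, including Pillow and NumPy, must be pinned with a hash as well.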