Flatfish

Historical document analysis CLI - Extract, analyze, and present handwritten text from document images.

Features

  • 📜 Handwritten Text Recognition (HTR) - Extract text from historical document images
  • 🏷️ Named Entity Recognition - Identify people, places, dates, and more with contextual descriptions
  • 📊 AI-Powered Summaries - Generate timelines, track changes, and suggest research questions
  • 🌐 Static Website Builder - Create searchable, browsable document collections

Installation

pip install flatfish

Quick Start

# Initialize a new project
flatfish init

# Edit configuration
nano flatfish.yaml
nano .env

# Validate setup
flatfish validate

# Process documents
flatfish process

# Preview the site locally
flatfish serve
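
The validate and process steps above can also be driven from a script. The following is a minimal sketch, not part of Flatfish itself: `STEPS` and `run_pipeline` are hypothetical names, and it assumes that the `flatfish` executable is on your PATH and that it exits with a non-zero status on failure (that exit-code behavior is an assumption, not something stated here).

```python
import subprocess

# Subcommands taken from the Quick Start, in the order they appear there.
STEPS = [["flatfish", "validate"], ["flatfish", "process"]]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order and stop after the first failing one.

    Returns the list of commands that were actually executed.
    """
    executed = []
    for cmd in steps:
        result = runner(cmd)
        executed.append(cmd)
        # Assumption: flatfish reports failure through a non-zero exit code.
        if result.returncode != 0:
            break
    return executed
```

Calling `run_pipeline()` from the project directory runs validation first and only continues to processing if validation succeeded. The `runner` parameter exists so the logic can be exercised without invoking the real CLI.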

Configuration

flatfish.yaml

dataset:
  source: "username/dataset-name"
  splits:
    - "train"
  image_column: "image"

processing:
  extract_entities: true
  entity_context: true

summary:
  enabled: true
  model: "qwen-vl-max"

website:
  title: "Document Collection"
  password: "changeme"

.env

HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxx
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx
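
Before running anything that calls external services, it can help to confirm that both keys were actually filled in. The sketch below is an illustration only: `parse_env` and `missing_keys` are hypothetical helpers, the parser handles just plain `KEY=VALUE` lines, and treating a value that ends in `xxxx` as an unedited placeholder is an assumption based on the sample above. Flatfish's own handling of `.env` may differ.

```python
# Keys shown in the .env example.
REQUIRED_KEYS = ("HUGGINGFACE_TOKEN", "DASHSCOPE_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still placeholders."""
    values = parse_env(text)
    missing = []
    for key in required:
        value = values.get(key, "")
        # Assumption: a value ending in a run of x's is the unedited sample.
        if not value or value.endswith("xxxx"):
            missing.append(key)
    return missing
```

For example, `missing_keys(open(".env").read())` returns an empty list once both values have been replaced with real credentials.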

Commands

Command              Description
flatfish init        Initialize a new project
flatfish process     Run the full pipeline
flatfish extract     Extract text from images only
flatfish entities    Extract entities only
flatfish summarize   Generate AI summary only
flatfish build       Build static site only
flatfish serve       Preview site locally
flatfish deploy      Deploy to Netlify
flatfish status      Show processing status
flatfish validate    Validate configuration

Deployment

Deploy your site to Netlify:

# Install netlify-python
pip install netlify-python

# Set your Netlify token (get from https://app.netlify.com/user/applications) and site ID
export NETLIFY_TOKEN=your-token
export NETLIFY_SITE_ID=your-site-id

# Deploy a draft preview
flatfish deploy

# Deploy to production
flatfish deploy --prod

# Specify a site ID directly
flatfish deploy --prod --site your-site-id
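
If deployment is triggered from a script, the same invocations can be assembled programmatically. This is a rough sketch under stated assumptions: `deploy_command` and `check_netlify_env` are hypothetical helpers, only the `--prod` and `--site` flags shown above are used, and the idea that passing `--site` makes `NETLIFY_SITE_ID` unnecessary is an inference from the last example rather than documented behavior. The examples above only show `--site` together with `--prod`, so using it for a draft deploy is untested here.

```python
import os


def deploy_command(prod=False, site_id=None):
    """Build the argument list for `flatfish deploy` from the flags shown above."""
    cmd = ["flatfish", "deploy"]
    if prod:
        cmd.append("--prod")
    if site_id:
        # Assumption: --site takes the site ID as its next argument.
        cmd += ["--site", site_id]
    return cmd


def check_netlify_env(env=os.environ, site_given=False):
    """Return the names of Netlify variables that are unset or empty."""
    needed = ["NETLIFY_TOKEN"]
    if not site_given:
        # Assumption: the site ID variable is only needed without --site.
        needed.append("NETLIFY_SITE_ID")
    return [name for name in needed if not env.get(name)]
```

A caller would typically refuse to deploy when `check_netlify_env()` returns a non-empty list, then pass the result of `deploy_command(prod=True)` to `subprocess.run`.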

Output

project/
├── transcriptions/     # Extracted text files
├── entities/           # Entity JSON files
├── summaries/          # AI-generated summaries
└── _site/              # Built static website
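
A quick way to see how far a run got is to count the files in each of the folders listed above. The sketch below relies only on the folder names from that layout; `OUTPUT_DIRS` and `count_outputs` are hypothetical names, and no assumption is made about file extensions or naming inside each folder.

```python
from pathlib import Path

# Folder names from the output layout.
OUTPUT_DIRS = ("transcriptions", "entities", "summaries", "_site")


def count_outputs(project_root):
    """Map each output folder to its file count, or None if the folder is missing."""
    root = Path(project_root)
    counts = {}
    for name in OUTPUT_DIRS:
        folder = root / name
        if folder.is_dir():
            # Count regular files at any depth below the folder.
            counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            counts[name] = None
    return counts
```

Running `count_outputs("project")` after a partial run makes it easy to spot a stage that produced nothing. The built-in `flatfish status` command remains the supported way to check progress.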

License

MIT

Disclosure of Delegation to Generative AI

The authors declare the use of generative AI in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GAI tools under full human supervision:

  • Code generation
  • Code optimization

The GAI tool used was: Claude Sonnet. Responsibility for the final manuscript lies entirely with the authors. GAI tools are not listed as authors and do not bear responsibility for the final outcomes. Declaration submitted by: Andrew Janco
