Historical document analysis CLI - Extract, analyze, and present handwritten text from document images

Flatfish

Features

  • 📜 Handwritten Text Recognition (HTR) - Extract text from historical document images
  • 🏷️ Named Entity Recognition - Identify people, places, dates, and more with contextual descriptions
  • 📊 AI-Powered Summaries - Generate timelines, track changes, and suggest research questions
  • 🌐 Static Website Builder - Create searchable, browsable document collections

Installation

pip install flatfish

Quick Start

# Initialize a new project
flatfish init

# Edit configuration
nano flatfish.yaml
nano .env

# Validate setup
flatfish validate

# Process documents
flatfish process

# Preview the site locally
flatfish serve

Configuration

flatfish.yaml

dataset:
  source: "username/dataset-name"
  splits:
    - "train"
  image_column: "image"

processing:
  extract_entities: true
  entity_context: true

summary:
  enabled: true
  model: "qwen-vl-max"

website:
  title: "Document Collection"
  password: "changeme"
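
Before running flatfish validate, it can help to sanity-check the parsed configuration by hand. The sketch below is not how flatfish validates its config. It is a hypothetical checker (the name check_config, the list of expected sections, and the rules are all assumptions) that works on a plain dict mirroring the sample above. Loading the YAML file itself, for example with PyYAML, is left out.

```python
from typing import Any

# Sections that appear in the sample flatfish.yaml. Whether flatfish
# actually requires all of them is an assumption made for illustration.
EXPECTED_SECTIONS = ("dataset", "processing", "summary", "website")


def check_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed flatfish.yaml (hypothetical helper)."""
    problems: list[str] = []

    for section in EXPECTED_SECTIONS:
        if section not in cfg:
            problems.append(f"missing section: {section}")

    dataset = cfg.get("dataset", {})
    if not dataset.get("source"):
        problems.append("dataset.source is empty")
    if not dataset.get("splits"):
        problems.append("dataset.splits lists no splits")

    # The sample config ships with a placeholder password.
    if cfg.get("website", {}).get("password") == "changeme":
        problems.append("website.password is still the placeholder 'changeme'")

    return problems


# Dict equivalent of the sample flatfish.yaml.
sample = {
    "dataset": {
        "source": "username/dataset-name",
        "splits": ["train"],
        "image_column": "image",
    },
    "processing": {"extract_entities": True, "entity_context": True},
    "summary": {"enabled": True, "model": "qwen-vl-max"},
    "website": {"title": "Document Collection", "password": "changeme"},
}

for problem in check_config(sample):
    print(problem)
```

Run against the sample values, the only problem reported is the placeholder password, which is worth changing before the site is deployed.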

.env

HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxx
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx
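
A quick way to catch a forgotten or placeholder credential is to parse the .env file and check the two keys shown above. This is a minimal sketch using only the standard library. The helper names parse_env and missing_keys are hypothetical, the parser handles only simple KEY=VALUE lines, and the "ends with xxxx" placeholder test is a heuristic assumption, not something flatfish does.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(
    values: dict[str, str],
    required: tuple[str, ...] = ("HUGGINGFACE_TOKEN", "DASHSCOPE_API_KEY"),
) -> list[str]:
    """Return required keys that are absent, empty, or still look like placeholders."""
    return [
        key
        for key in required
        if not values.get(key) or values[key].endswith("xxxx")  # heuristic
    ]


# The sample .env content from this section: both values are placeholders.
sample_env = "HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxx\nDASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx\n"
print(missing_keys(parse_env(sample_env)))
```

To check a real file, pass the result of pathlib.Path(".env").read_text() to parse_env instead of the sample string.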

Commands

Command              Description
flatfish init        Initialize a new project
flatfish process     Run the full pipeline
flatfish extract     Extract text from images only
flatfish entities    Extract entities only
flatfish summarize   Generate AI summary only
flatfish build       Build static site only
flatfish serve       Preview site locally
flatfish deploy      Deploy to Netlify
flatfish status      Show processing status
flatfish validate    Validate configuration

Deployment

Deploy your site to Netlify:

# Install netlify-python
pip install netlify-python

# Set your Netlify token and site ID (create a token at https://app.netlify.com/user/applications)
export NETLIFY_TOKEN=your-token
export NETLIFY_SITE_ID=your-site-id

# Deploy a draft preview
flatfish deploy

# Deploy to production
flatfish deploy --prod

# Specify a site ID directly
flatfish deploy --prod --site your-site-id
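
When deployment runs from a script or CI job, a small wrapper can fail early if the Netlify credentials are missing. The sketch below is one possible wrapper, not part of flatfish. The function names are hypothetical, and it assumes that a site ID must come from either NETLIFY_SITE_ID or --site, and that --site may also be passed without --prod. Only the --prod and --site flags shown above are used.

```python
import os
import subprocess
from typing import Mapping, Optional


def build_deploy_command(prod: bool = False, site_id: Optional[str] = None) -> list[str]:
    """Assemble the flatfish deploy command line from the flags documented above."""
    cmd = ["flatfish", "deploy"]
    if prod:
        cmd.append("--prod")
    if site_id:
        cmd += ["--site", site_id]
    return cmd


def deploy(
    prod: bool = False,
    site_id: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Check credentials, then run flatfish deploy and return its exit code."""
    env = os.environ if env is None else env
    if not env.get("NETLIFY_TOKEN"):
        raise RuntimeError("NETLIFY_TOKEN is not set")
    # Assumption: a site ID is needed from the environment or from --site.
    if not (site_id or env.get("NETLIFY_SITE_ID")):
        raise RuntimeError("set NETLIFY_SITE_ID or pass a site ID")
    return subprocess.run(build_deploy_command(prod, site_id), check=False).returncode


# Show the command that a production deploy to a specific site would run.
print(" ".join(build_deploy_command(prod=True, site_id="your-site-id")))
```

Calling deploy() with no arguments corresponds to the draft preview, and deploy(prod=True) to the production deploy.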

Output

project/
├── transcriptions/     # Extracted text files
├── entities/           # Entity JSON files
├── summaries/          # AI-generated summaries
└── _site/              # Built static website
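
To see how far a run has progressed without opening each folder, the layout above can be tallied with a few lines of Python. This is only a sketch: count_outputs is a hypothetical helper, it counts every file regardless of extension (the exact file formats are not specified here), and it is not a substitute for flatfish status.

```python
from pathlib import Path

# Output folders taken from the directory layout shown above.
OUTPUT_DIRS = ("transcriptions", "entities", "summaries", "_site")


def count_outputs(project_root: str) -> dict[str, int]:
    """Count files (recursively) in each known output folder; missing folders count as 0."""
    root = Path(project_root)
    counts: dict[str, int] = {}
    for name in OUTPUT_DIRS:
        folder = root / name
        if folder.is_dir():
            counts[name] = sum(1 for path in folder.rglob("*") if path.is_file())
        else:
            counts[name] = 0
    return counts


# Report on the current directory, assumed to be the project root.
for name, count in count_outputs(".").items():
    print(f"{name:16} {count} file(s)")
```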

License

MIT

Disclosure of Delegation to Generative AI

The authors declare the use of generative AI in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GAI tools under full human supervision:

  • Code generation
  • Code optimization

The GAI tool used was: Claude Sonnet. Responsibility for the final manuscript lies entirely with the authors. GAI tools are not listed as authors and do not bear responsibility for the final outcomes. Declaration submitted by: Andrew Janco

Download files

Source Distribution

flatfish-0.1.3.tar.gz (61.5 kB)

Uploaded Source

Built Distribution

flatfish-0.1.3-py3-none-any.whl (77.6 kB)

Uploaded Python 3

File details

Details for the file flatfish-0.1.3.tar.gz.

File metadata

  • Download URL: flatfish-0.1.3.tar.gz
  • Upload date:
  • Size: 61.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for flatfish-0.1.3.tar.gz
Algorithm     Hash digest
SHA256        de0f56b53f1fce8fb0c3159acab4996eb7982ae9c41698794fe661612cae8358
MD5           6f575d935c135d7df5f154f8c78ddcc1
BLAKE2b-256   ac7bfe01b68ec6e810bebaa9775ca5695c5d43634851d30181db0c006f6d448a
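
A downloaded archive can be checked against the SHA256 digest listed above using Python's standard hashlib module. The sketch below assumes the file was saved under its original name in the current directory; the helper names sha256_of and verify are illustrative.

```python
import hashlib

# SHA256 digest listed above for flatfish-0.1.3.tar.gz.
EXPECTED_SHA256 = "de0f56b53f1fce8fb0c3159acab4996eb7982ae9c41698794fe661612cae8358"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, verify("flatfish-0.1.3.tar.gz") returns True only if the local file matches the published digest. pip can perform the same check when requirements are pinned with --hash entries.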

File details

Details for the file flatfish-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: flatfish-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 77.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for flatfish-0.1.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        f3ea6348c2c55e9649dc49df3acbe4cc640282cc3482158a919f0f55dc2438e4
MD5           cc6650173074e553d8c3861879eede57
BLAKE2b-256   c129bef9b7cb54b5ba1ffed5a0b09de5706ff779d4e1b65131f80c50cb545bb3
