Flatfish

Historical document analysis CLI - Extract, analyze, and present handwritten text from document images.

Features

  • 📜 Handwritten Text Recognition (HTR) - Extract text from historical document images
  • 🏷️ Named Entity Recognition - Identify people, places, dates, and more with contextual descriptions
  • 📊 AI-Powered Summaries - Generate timelines, track changes, and suggest research questions
  • 🌐 Static Website Builder - Create searchable, browsable document collections

Installation

pip install flatfish

Quick Start

# Initialize a new project
flatfish init

# Edit configuration
nano flatfish.yaml
nano .env

# Validate setup
flatfish validate

# Process documents
flatfish process

# Preview the site locally
flatfish serve

Configuration

flatfish.yaml

dataset:
  source: "username/dataset-name"
  splits:
    - "train"
  image_column: "image"

processing:
  extract_entities: true
  entity_context: true

summary:
  enabled: true
  model: "qwen-vl-max"

website:
  title: "Document Collection"
  password: "changeme"

.env

HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxx
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx
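
Before running flatfish validate, it can help to confirm that both keys are actually present in .env. The following is a minimal sketch and not part of Flatfish: parse_env and missing_keys are hypothetical helpers, and the parsing assumes simple KEY=value lines like the example above.

```python
from pathlib import Path

# Keys shown in the example .env above.
REQUIRED_KEYS = ("HUGGINGFACE_TOKEN", "DASHSCOPE_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    for key in missing_keys(text):
        print(f"Missing or empty: {key}")
```

This only checks that the keys exist; whether the tokens are valid is something flatfish validate or the services themselves would have to confirm.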

Commands

Command             Description
flatfish init       Initialize a new project
flatfish process    Run the full pipeline
flatfish extract    Extract text from images only
flatfish entities   Extract entities only
flatfish summarize  Generate AI summary only
flatfish build      Build static site only
flatfish serve      Preview site locally
flatfish deploy     Deploy to Netlify
flatfish status     Show processing status
flatfish validate   Validate configuration
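
To resume from a particular stage, the single-step commands can be chained from a script. This sketch assumes that flatfish process corresponds to running extract, entities, summarize, and build in that order; the table above does not confirm the ordering, and run_stages is a hypothetical helper, not part of Flatfish.

```python
import subprocess
from typing import Callable, Sequence

# Command names come from the table above; the ordering is an assumption,
# not documented behavior.
STAGES = ("extract", "entities", "summarize", "build")


def run_stages(
    start: str = "extract",
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> list:
    """Run each stage from `start` onward, stopping at the first failure.

    `runner` receives the argument list and returns an exit code; it is
    injectable so the function can be exercised without Flatfish installed.
    Returns the stages that completed successfully.
    """
    if start not in STAGES:
        raise ValueError(f"unknown stage: {start!r}")
    completed = []
    for stage in STAGES[STAGES.index(start):]:
        exit_code = runner(["flatfish", stage])
        if exit_code != 0:
            break
        completed.append(stage)
    return completed
```

Under that assumption, run_stages("summarize") would invoke flatfish summarize followed by flatfish build, and skip the two earlier stages.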

Deployment

Deploy your site to Netlify:

# Install netlify-python
pip install netlify-python

# Set your Netlify token (create one at https://app.netlify.com/user/applications) and site ID
export NETLIFY_TOKEN=your-token
export NETLIFY_SITE_ID=your-site-id

# Deploy a draft preview
flatfish deploy

# Deploy to production
flatfish deploy --prod

# Specify a site ID directly
flatfish deploy --prod --site your-site-id

Output

project/
├── transcriptions/     # Extracted text files
├── entities/           # Entity JSON files
├── summaries/          # AI-generated summaries
└── _site/              # Built static website
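
A quick way to see how far processing has progressed is to count the files in each output directory. This is a minimal sketch, assuming the layout above; count_outputs is a hypothetical helper and says nothing about how flatfish status is implemented.

```python
from pathlib import Path

# Directory names taken from the layout above.
OUTPUT_DIRS = ("transcriptions", "entities", "summaries", "_site")


def count_outputs(project_root: str = ".") -> dict:
    """Count regular files (recursively) under each output directory.

    Directories that do not exist yet are reported as 0.
    """
    root = Path(project_root)
    counts = {}
    for name in OUTPUT_DIRS:
        directory = root / name
        if directory.is_dir():
            counts[name] = sum(1 for path in directory.rglob("*") if path.is_file())
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    for name, count in count_outputs().items():
        print(f"{name:<16}{count} file(s)")
```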

License

MIT

Disclosure of Delegation to Generative AI

The authors declare the use of generative AI in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GAI tools under full human supervision:

  • Code generation
  • Code optimization

The GAI tool used was: Claude Sonnet. Responsibility for the final manuscript lies entirely with the authors. GAI tools are not listed as authors and do not bear responsibility for the final outcomes. Declaration submitted by: Andrew Janco

Download files

Source Distribution

flatfish-0.1.4.tar.gz (62.3 kB)

Uploaded Source

Built Distribution

flatfish-0.1.4-py3-none-any.whl (78.5 kB)

Uploaded Python 3

File details

Details for the file flatfish-0.1.4.tar.gz.

File metadata

  • Download URL: flatfish-0.1.4.tar.gz
  • Upload date:
  • Size: 62.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for flatfish-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       5d70e228cad4e16c783fb38b664b79bb25a36be99e56fd6889541ecb2cc44ee0
MD5          4928df8499af723fc670ce144e1bb2f0
BLAKE2b-256  8bd15d1664bca12945f0b46c2172d8de0d149513b0cfc8071b6234832e343d77
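
To check a downloaded archive against the SHA256 digest listed above, the standard-library hashlib module is enough. This is a minimal sketch: sha256_of and verify are hypothetical helper names, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("flatfish-0.1.4.tar.gz")
    # SHA256 digest copied from the table above.
    expected = "5d70e228cad4e16c783fb38b664b79bb25a36be99e56fd6889541ecb2cc44ee0"
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found")
```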

File details

Details for the file flatfish-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: flatfish-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 78.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for flatfish-0.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       619e2db4404dcef1190630fdf059bf42733d51613e23a7d497eb1a1fddb3da35
MD5          8e6ae1e58f13f0a5dba136bc9879937e
BLAKE2b-256  36a11186e0a67130982292d402c23c33794c64093146279e06ffd0c9a5265110
