Flatfish

Historical document analysis CLI - Extract, analyze, and present handwritten text from document images.

Features

  • 📜 Handwritten Text Recognition (HTR) - Extract text from historical document images
  • 🏷️ Named Entity Recognition - Identify people, places, dates, and more with contextual descriptions
  • 📊 AI-Powered Summaries - Generate timelines, track changes, and suggest research questions
  • 🌐 Static Website Builder - Create searchable, browsable document collections

Installation

pip install flatfish

Quick Start

# Initialize a new project
flatfish init

# Edit configuration
nano flatfish.yaml
nano .env

# Validate setup
flatfish validate

# Process documents
flatfish process

# Preview the site
flatfish serve

Configuration

flatfish.yaml

dataset:
  source: "username/dataset-name"
  splits:
    - "train"
  image_column: "image"

processing:
  extract_entities: true
  entity_context: true

summary:
  enabled: true
  model: "qwen-vl-max"

website:
  title: "Document Collection"
  password: "changeme"
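
Before running the pipeline it can help to sanity-check the parsed configuration. The sketch below is illustrative only: `check_config` and `REQUIRED` are hypothetical names, it treats every key shown in the example as required (an assumption), and it is not how `flatfish validate` works internally.

```python
from typing import Any

# Section and key names mirror the flatfish.yaml example above. Treating all
# of them as required is an assumption made for this sketch.
REQUIRED = {
    "dataset": ["source", "splits", "image_column"],
    "processing": ["extract_entities", "entity_context"],
    "summary": ["enabled", "model"],
    "website": ["title", "password"],
}


def check_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    for section, keys in REQUIRED.items():
        values = config.get(section)
        if not isinstance(values, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in values:
                problems.append(f"missing key: {section}.{key}")
    # Flag the placeholder password from the example so it is not published.
    website = config.get("website")
    if isinstance(website, dict) and website.get("password") == "changeme":
        problems.append("website.password is still the placeholder 'changeme'")
    return problems
```

The function expects an already-parsed mapping, for example the result of PyYAML's `yaml.safe_load` on `flatfish.yaml`, and returns an empty list when nothing looks wrong.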

.env

HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxx
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx
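
A common mistake is to leave the placeholder tokens in place. The following is a minimal sketch, using only the standard library, that reads simple `KEY=VALUE` lines and reports which of the two keys above are missing or still look like placeholders. The helper names and the "contains `xxxx`" heuristic are assumptions for illustration, not part of flatfish.

```python
# Heuristic (assumption): a value containing this run of x's is a placeholder.
PLACEHOLDER_MARK = "xxxx"
REQUIRED_KEYS = ("HUGGINGFACE_TOKEN", "DASHSCOPE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unset_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key for key in REQUIRED_KEYS
        if not values.get(key) or PLACEHOLDER_MARK in values[key]
    ]
```

Calling `unset_keys(parse_env(open(".env").read()))` before `flatfish process` gives a quick list of keys that still need real values.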

Commands

Command              Description
flatfish init        Initialize a new project
flatfish process     Run the full pipeline
flatfish extract     Extract text from images only
flatfish entities    Extract entities only
flatfish summarize   Generate AI summary only
flatfish build       Build static site only
flatfish serve       Preview site locally
flatfish deploy      Deploy to Netlify
flatfish status      Show processing status
flatfish validate    Validate configuration
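
The single-stage commands can be scripted when you want to stop as soon as one stage fails. The sketch below shows one way to do that from Python. The subcommand names come from the table above, but the stage order is an assumption about what `flatfish process` does, and `run_cli` and `run_stages` are hypothetical helpers.

```python
import subprocess
from typing import Callable, Sequence

# Assumed order of the individual stages; the README does not state it.
STAGES = ("extract", "entities", "summarize", "build")


def run_cli(argv: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_stages(stages: Sequence[str] = STAGES,
               runner: Callable[[Sequence[str]], int] = run_cli) -> list[str]:
    """Run each stage in order and stop at the first non-zero exit code.

    Returns the names of the stages that completed successfully.
    """
    done = []
    for stage in stages:
        if runner(["flatfish", stage]) != 0:
            break
        done.append(stage)
    return done
```

The `runner` parameter exists so the loop can be exercised without invoking the real CLI; by default each stage is run through `subprocess.run`.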

Deployment

Deploy your site to Netlify:

# Install netlify-python
pip install netlify-python

# Set your Netlify token and site ID (create a token at https://app.netlify.com/user/applications)
export NETLIFY_TOKEN=your-token
export NETLIFY_SITE_ID=your-site-id

# Deploy a draft preview
flatfish deploy

# Deploy to production
flatfish deploy --prod

# Specify a site ID directly
flatfish deploy --prod --site your-site-id
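
For scripted deployments, the steps above can be wrapped in a small pre-flight check. This is a sketch under stated assumptions: the helper names are hypothetical, and it assumes that passing `--site` makes `NETLIFY_SITE_ID` unnecessary, which the README implies but does not confirm.

```python
from typing import Mapping, Optional


def missing_netlify_env(env: Mapping[str, str],
                        site: Optional[str] = None) -> list[str]:
    """Return the Netlify variables that are unset or empty in `env`."""
    required = ["NETLIFY_TOKEN"]
    # Assumption: an explicit --site value replaces NETLIFY_SITE_ID.
    if site is None:
        required.append("NETLIFY_SITE_ID")
    return [name for name in required if not env.get(name)]


def deploy_argv(prod: bool = False, site: Optional[str] = None) -> list[str]:
    """Build the `flatfish deploy` command line using the flags shown above."""
    argv = ["flatfish", "deploy"]
    if prod:
        argv.append("--prod")
    if site:
        argv += ["--site", site]
    return argv
```

A typical use is to call `missing_netlify_env(os.environ)` first and only pass `deploy_argv(prod=True)` to `subprocess.run` when the returned list is empty.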

Output

project/
├── transcriptions/     # Extracted text files
├── entities/           # Entity JSON files
├── summaries/          # AI-generated summaries
└── _site/              # Built static website
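
To get a quick picture of what a run produced, you can count the files in each output directory. The sketch below relies only on the directory names in the tree above. It makes no assumption about file names or extensions, and `count_outputs` is a hypothetical helper rather than part of flatfish.

```python
from pathlib import Path

# Directory names taken from the output tree above.
OUTPUT_DIRS = ("transcriptions", "entities", "summaries", "_site")


def count_outputs(project: Path) -> dict[str, int]:
    """Count regular files under each output directory (0 if it is absent)."""
    counts = {}
    for name in OUTPUT_DIRS:
        folder = project / name
        counts[name] = (
            sum(1 for path in folder.rglob("*") if path.is_file())
            if folder.is_dir() else 0
        )
    return counts
```

For example, `count_outputs(Path("project"))` returns a dictionary keyed by directory name, which can be compared against the number of source images.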

License

MIT

Disclosure of Delegation to Generative AI

The authors declare the use of generative AI in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GAI tools under full human supervision:

  • Code generation
  • Code optimization

The GAI tool used was: Claude Sonnet. Responsibility for the final manuscript lies entirely with the authors. GAI tools are not listed as authors and do not bear responsibility for the final outcomes. Declaration submitted by: Andrew Janco
