Flatfish

Historical document analysis CLI - Extract, analyze, and present handwritten text from document images.

Features

  • 📜 Handwritten Text Recognition (HTR) - Extract text from historical document images
  • 🏷️ Named Entity Recognition - Identify people, places, dates, and more with contextual descriptions
  • 📊 AI-Powered Summaries - Generate timelines, track changes, and suggest research questions
  • 🌐 Static Website Builder - Create searchable, browsable document collections

Installation

pip install flatfish

Quick Start

# Initialize a new project
flatfish init

# Edit configuration
nano flatfish.yaml
nano .env

# Validate setup
flatfish validate

# Process documents
flatfish process

# Preview the site
flatfish serve

Configuration

flatfish.yaml

dataset:
  source: "username/dataset-name"
  splits:
    - "train"
  image_column: "image"

processing:
  extract_entities: true
  entity_context: true

summary:
  enabled: true
  model: "qwen-vl-max"

website:
  title: "Document Collection"
  password: "changeme"
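
Before running flatfish validate, it can help to sanity-check the configuration by hand. The following is a minimal sketch, not flatfish's own validation logic: the list of expected keys is simply copied from the example above, and the helper names (lookup, check_config) are hypothetical. The config is written as the dict a YAML parser would return, so the sketch needs nothing beyond the standard library.

```python
# Hypothetical pre-flight check; not part of flatfish itself.
from typing import Any

# Dotted paths taken from the example flatfish.yaml above.
# Whether flatfish treats all of them as required is an assumption.
EXPECTED_KEYS = [
    "dataset.source",
    "dataset.splits",
    "dataset.image_column",
    "processing.extract_entities",
    "summary.enabled",
    "website.title",
]


def lookup(config: dict, dotted: str) -> Any:
    """Return the value at a dotted path, or None if any segment is missing."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_config(config: dict) -> list:
    """Collect human-readable problems found in a parsed config."""
    problems = [
        f"missing key: {key}"
        for key in EXPECTED_KEYS
        if lookup(config, key) is None
    ]
    # The example ships with a placeholder password; flag it if left unchanged.
    if lookup(config, "website.password") == "changeme":
        problems.append("website.password is still the placeholder 'changeme'")
    return problems


# The example configuration, as the dict a YAML parser would produce.
example = {
    "dataset": {
        "source": "username/dataset-name",
        "splits": ["train"],
        "image_column": "image",
    },
    "processing": {"extract_entities": True, "entity_context": True},
    "summary": {"enabled": True, "model": "qwen-vl-max"},
    "website": {"title": "Document Collection", "password": "changeme"},
}

for problem in check_config(example):
    print(problem)
```

To check a real file, parse flatfish.yaml with a YAML library of your choice and pass the resulting dict to check_config.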

.env

HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxx
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx
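
A quick way to confirm that both credentials are present is sketched below. It assumes flatfish reads these two variables from the process environment or from a .env file in the project directory; the small parser and the function names (read_env_file, missing_vars) are hypothetical and only handle plain KEY=value lines.

```python
# Hypothetical credential check; not part of flatfish itself.
import os
from pathlib import Path
from typing import Mapping, Optional

# Variable names taken from the example .env above.
REQUIRED_VARS = ("HUGGINGFACE_TOKEN", "DASHSCOPE_API_KEY")


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(
    env_path: Path = Path(".env"),
    environ: Optional[Mapping[str, str]] = None,
) -> list:
    """Return the required variables set neither in the file nor the environment."""
    env = os.environ if environ is None else environ
    # Values from the real environment take precedence over the file.
    merged = {**read_env_file(env_path), **env}
    return [name for name in REQUIRED_VARS if not merged.get(name)]


print("Missing credentials:", missing_vars())
```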

Commands

Command              Description
flatfish init        Initialize a new project
flatfish process     Run the full pipeline
flatfish extract     Extract text from images only
flatfish entities    Extract entities only
flatfish summarize   Generate AI summary only
flatfish build       Build static site only
flatfish serve       Preview site locally
flatfish deploy      Deploy to Netlify
flatfish status      Show processing status
flatfish validate    Validate configuration
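
The single-stage commands can also be chained from a script, for example to rerun part of the pipeline after a failure. The sketch below is illustrative only: the stage order (extract, entities, summarize, build) is an assumption about what flatfish process does internally, and run_stages is a hypothetical helper. It stops at the first stage that exits with a non-zero status.

```python
# Hypothetical wrapper around the flatfish subcommands listed above.
import subprocess
from typing import Callable, List, Optional, Sequence

# Subcommand names come from the table above; the order is an assumption.
STAGES = ("extract", "entities", "summarize", "build")


def run_stages(
    stages: Sequence[str] = STAGES,
    runner: Optional[Callable[[List[str]], int]] = None,
) -> List[str]:
    """Run each stage in order and return the stages that completed.

    `runner` receives the argv list and returns an exit code; it defaults
    to invoking the real CLI through subprocess.
    """
    if runner is None:
        runner = lambda argv: subprocess.run(argv).returncode
    completed = []
    for stage in stages:
        if runner(["flatfish", stage]) != 0:
            break  # stop at the first failing stage
        completed.append(stage)
    return completed
```

Calling run_stages() with no arguments requires flatfish to be installed and on the PATH. Passing a subset such as run_stages(["summarize", "build"]) reruns only those stages.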

Deployment

Deploy your site to Netlify:

# Install netlify-python
pip install netlify-python

# Set your Netlify token and site ID (create a token at https://app.netlify.com/user/applications)
export NETLIFY_TOKEN=your-token
export NETLIFY_SITE_ID=your-site-id

# Deploy a draft preview
flatfish deploy

# Deploy to production
flatfish deploy --prod

# Specify a site ID directly
flatfish deploy --prod --site your-site-id
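
For scripted deployments, the checks above can be made explicit before the CLI is called. This is a rough sketch under stated assumptions: that flatfish deploy needs NETLIFY_TOKEN, that the site ID may come from either NETLIFY_SITE_ID or --site, and that --site can be used without --prod (the README only shows it together with --prod). The deploy_command helper is hypothetical; it only builds the argument list and does not run anything.

```python
# Hypothetical helper that assembles a `flatfish deploy` invocation.
import os
from typing import List, Mapping, Optional


def deploy_command(
    prod: bool = False,
    site_id: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the argv for a deploy, failing early if credentials are missing."""
    env = os.environ if environ is None else environ
    if not env.get("NETLIFY_TOKEN"):
        raise RuntimeError("NETLIFY_TOKEN is not set")
    argv = ["flatfish", "deploy"]
    if prod:
        argv.append("--prod")  # production deploy instead of a draft preview
    if site_id:
        argv += ["--site", site_id]  # explicit site ID on the command line
    elif not env.get("NETLIFY_SITE_ID"):
        raise RuntimeError("no site ID: set NETLIFY_SITE_ID or pass site_id")
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True) once flatfish is installed.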

Output

project/
├── transcriptions/     # Extracted text files
├── entities/           # Entity JSON files
├── summaries/          # AI-generated summaries
└── _site/              # Built static website
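
To see at a glance how much each stage has produced, the output folders can be counted with a few lines of Python. The directory names below are taken from the layout above; the output_counts helper is hypothetical and makes no assumption about file names or extensions inside each folder. It is a rough complement to flatfish status, not a replacement for it.

```python
# Hypothetical summary of the output folders shown above.
from pathlib import Path

# Directory names from the documented project layout.
OUTPUT_DIRS = ("transcriptions", "entities", "summaries", "_site")


def output_counts(project: Path) -> dict:
    """Count files (recursively) in each output folder; missing folders count as 0."""
    counts = {}
    for name in OUTPUT_DIRS:
        folder = project / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            counts[name] = 0
    return counts


for name, count in output_counts(Path(".")).items():
    print(f"{name:15} {count} file(s)")
```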

License

MIT

Disclosure of Delegation to Generative AI

The authors declare the use of generative AI in the research and writing process. According to the GAIDeT taxonomy (2025), the following tasks were delegated to GAI tools under full human supervision:

  • Code generation
  • Code optimization

The GAI tool used was: Claude Sonnet. Responsibility for the final manuscript lies entirely with the authors. GAI tools are not listed as authors and do not bear responsibility for the final outcomes. Declaration submitted by: Andrew Janco
