
Archival document management as a reusable Django app


grime

Grime is an archival document management platform: a reusable Django app for ingesting scanned documents, running OCR / NER pipelines, and annotating pages with tagged regions.

Quick start

pip install -e ".[dev]"
python manage.py migrate
python manage.py createsuperuser
python manage.py ingest MY_DOCUMENT.pdf
# Should return an endpoint like '/documents/1'

python manage.py runserver
# then visit http://127.0.0.1:8000/documents/1
# or visit http://127.0.0.1:8000/admin, log in, and
# navigate to your document

Management commands

You can run OCR and NER page by page in the document viewer, or bulk-process documents from the command line with the following commands:

python manage.py ocr        --document 42 [--page N] [--textract] [--force] [--dry-run]
python manage.py ner        --document 42 [--page N] [--threshold 0.85] [--force] [--dry-run]
python manage.py match_tags --label "member entry" [--source-document 3] [--target-document 5] \
                            [--create-tags] [--force] [--min-score 0.5] [--tolerance 0.08]
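
To process several documents in one go, the `ocr` command can be wrapped in a small script. The following is a minimal sketch, not part of Grime: the helper names `build_ocr_command` and `run_ocr_batch` and the document IDs are hypothetical, and it assumes the script runs from the directory that contains `manage.py`. It uses only the flags shown in the usage lines above and makes no assumption about what each flag does internally.

```python
import subprocess
import sys
from typing import Iterable, List, Optional


def build_ocr_command(
    document_id: int,
    page: Optional[int] = None,
    textract: bool = False,
    force: bool = False,
    dry_run: bool = False,
) -> List[str]:
    """Build the argv for `manage.py ocr` (hypothetical helper).

    Only flags listed in the usage line above are emitted.
    """
    cmd = [sys.executable, "manage.py", "ocr", "--document", str(document_id)]
    if page is not None:
        cmd += ["--page", str(page)]
    if textract:
        cmd.append("--textract")
    if force:
        cmd.append("--force")
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_ocr_batch(document_ids: Iterable[int], **options) -> None:
    """Run the ocr command once per document; raise on the first failure."""
    for doc_id in document_ids:
        subprocess.run(build_ocr_command(doc_id, **options), check=True)


# Preview the command lines without executing anything.
# The IDs 42 and 43 are placeholders.
for doc_id in (42, 43):
    print(" ".join(build_ocr_command(doc_id, dry_run=True)[1:]))
```

Calling `run_ocr_batch([42, 43], dry_run=True)` would then invoke the command for each document in turn. The same pattern should carry over to `ner` and `match_tags` by swapping the subcommand and its flags.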

Status

This is an initial scaffold. The admin loads and the management commands run end-to-end, but the embedded document viewer (templates/admin/grime/_document_viewer.html) is read-only: bboxes and tags render on the page image, but interactive editing (OCR correction, tag CRUD, NER label correction) needs AJAX endpoints that have not been implemented yet.
