marksense

Read marked paper forms — bubble sheets, surveys, checklists, exams — into structured data (JSON/CSV). Runs locally: no cloud, no account, no telemetry.

Point marksense at a scan or phone photo of a filled form plus a template you define once, and it returns every answer with a confidence score:

marksense read examples/samples/quiz_filled_01.png -t examples/templates/quiz.json
{
  "form_type": "quiz",
  "source": "examples/samples/quiz_filled_01.png",
  "answers": {
    "Q1": "A",
    "Q2": "D",
    "Q3": "D"
  }
}

(output truncated — the full result carries all 20 answers plus per-question confidence, multi-mark flags, and per-page alignment confidence)
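The JSON above is easy to consume from a script. As a minimal sketch, the snippet below grades the three answers shown against an answer key; the key and the `grade` helper are hypothetical and invented for this illustration, and only the `answers` field shown above is relied on.

```python
import json

# Trimmed copy of the JSON shown above; a real run carries all 20 answers.
raw = """
{
  "form_type": "quiz",
  "source": "examples/samples/quiz_filled_01.png",
  "answers": {"Q1": "A", "Q2": "D", "Q3": "D"}
}
"""

# Hypothetical answer key, made up for this example.
ANSWER_KEY = {"Q1": "A", "Q2": "B", "Q3": "D"}


def grade(result: dict, key: dict) -> tuple[int, list[str]]:
    """Return (number correct, questions that do not match the key)."""
    answers = result.get("answers", {})
    wrong = [q for q, expected in key.items() if answers.get(q) != expected]
    return len(key) - len(wrong), wrong


score, wrong = grade(json.loads(raw), ANSWER_KEY)
print(f"{score}/{len(ANSWER_KEY)} correct; mismatches: {wrong}")
```

A question missing from `answers` counts as a mismatch here, which is usually the safer default when grading.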

Why marksense

  • Any layout. Layout knowledge lives in a template JSON, not in code — checkboxes, bubbles, grids, multi-page forms, mixed mark types. Adding a form means writing JSON, never code.
  • Robust to real-world scans. Every page is aligned onto the blank template (feature matching + ECC refinement) before detection, so skewed scans and phone photos read correctly.
  • Model-free by default, learned detection when you want it. A pixel-density detector works out of the box with zero downloads; a small ONNX mark-detection model can be plugged in (--model / --download) for harder real-world scans.
  • Lean runtime. onnxruntime, OpenCV, NumPy, PyMuPDF. No PyTorch, no GPU needed.
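To make the idea of a pixel-density detector concrete, here is a minimal sketch of the general technique: count the share of dark pixels inside an option's region and compare it to a threshold. This is not marksense's actual implementation; the function names, the `dark_below` cutoff, and the 0.35 threshold are assumptions chosen for illustration. It uses plain Python lists to stay dependency-free, although NumPy slicing would express the same thing more compactly.

```python
def fill_ratio(image, roi, dark_below=128):
    """Fraction of pixels inside roi that are darker than dark_below.

    image: list of rows of 0-255 grayscale values.
    roi:   (x, y, width, height) in pixel coordinates.
    """
    x, y, w, h = roi
    pixels = [px for row in image[y:y + h] for px in row[x:x + w]]
    if not pixels:
        return 0.0
    return sum(px < dark_below for px in pixels) / len(pixels)


def is_marked(image, roi, threshold=0.35):
    """Treat the region as marked when enough of it is dark (assumed threshold)."""
    return fill_ratio(image, roi) >= threshold


# Synthetic 8x8 white page with a filled 4x4 "bubble" at (2, 2).
page = [[255] * 8 for _ in range(8)]
for row in range(2, 6):
    for col in range(2, 6):
        page[row][col] = 0

print(is_marked(page, (2, 2, 4, 4)))  # region covering the filled bubble
print(is_marked(page, (0, 0, 4, 4)))  # region that only clips its corner
```

A detector like this depends on the page already being aligned to the template, since the regions are fixed pixel coordinates.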

Install

pip install marksense

Quickstart

The repository ships a self-contained synthetic demo (generated by examples/generate_samples.py):

# Read one form -> JSON on stdout
marksense read examples/samples/quiz_filled_01.png -t examples/templates/quiz.json

# A whole stack -> one CSV row per form
marksense batch examples/samples/ -t examples/templates/survey.json -o results.csv

# Check a template you are authoring
marksense template validate examples/templates/quiz.json

Or from Python:

from marksense import read_form

result = read_form("scan.jpg", template="my-form.json")
result.answers                # {"Q1": "3", ...}
result.confidence             # per-question confidence
result.multi_marked           # questions with more than one mark (review these)
result.to_csv()               # question,answer,confidence,flags
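
One way to use these fields is to build a manual-review list. The sketch below assumes `confidence` is a mapping from question id to a float between 0 and 1 and that `multi_marked` is a collection of question ids; neither shape is confirmed here, and `needs_review` and the 0.8 cutoff are hypothetical. A stand-in object replaces a real `read_form` result so the snippet runs without the package.

```python
from types import SimpleNamespace


def needs_review(result, min_confidence=0.8):
    """Question ids that are multi-marked or below the confidence cutoff."""
    flagged = set(result.multi_marked)
    flagged.update(
        q for q, conf in result.confidence.items() if conf < min_confidence
    )
    return sorted(flagged)


# Stand-in for a read_form() result; attribute shapes are assumptions.
demo = SimpleNamespace(
    answers={"Q1": "3", "Q2": "1", "Q3": "4"},
    confidence={"Q1": 0.98, "Q2": 0.55, "Q3": 0.91},
    multi_marked=["Q3"],
)

print(needs_review(demo))
```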

Run as a service

pip install "marksense[service]"
marksense serve                       # http://127.0.0.1:8000, bundled demo templates
marksense serve --templates-dir ./my-templates --port 9000

Or with Docker:

docker build -t marksense .
docker run -p 8000:8000 -v ./my-templates:/templates marksense

Endpoints: GET /health, GET /templates, POST /read?form_type=<name> (multipart file). Interactive docs at /docs.
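
A client for POST /read can be sketched with the standard library alone. The multipart field name `file` is an assumption based on the "(multipart file)" note above, and the response schema is not documented here, so the snippet only builds the request; `build_read_request` is a hypothetical helper. Check /docs on a running server for the exact contract.

```python
import urllib.parse
import urllib.request
import uuid


def build_read_request(base_url, form_type, filename, data, field="file"):
    """Build a multipart/form-data POST for /read?form_type=<name>."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    query = urllib.parse.urlencode({"form_type": form_type})
    return urllib.request.Request(
        f"{base_url}/read?{query}",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes; in practice read them from the scanned image file.
req = build_read_request("http://127.0.0.1:8000", "quiz", "scan.png", b"fake-image-bytes")
print(req.get_method(), req.full_url)

# With `marksense serve` running, send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```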

Reading your own forms

  1. Get a clean image of the blank form (render the PDF or scan an empty copy).
  2. Write a template JSON describing where each option is — see the template authoring guide.
  3. marksense template validate my-form.json, then marksense read.

How it works

input (PDF / image)
  └─ render pages ──> align to template ──> detect marks ──> map to answers
                      (ORB → SIFT → ECC)    (ONNX YOLO or     (nearest ROI +
                                             density fallback)  confidence)

The detector only knows two things: what a check looks like and what a circle looks like. All form-specific knowledge — page sizes, question positions, answer values — is declarative template JSON. That separation is what makes the engine general.
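
The last stage of the pipeline, mapping detected marks to answers by nearest ROI, can be illustrated with a short sketch. This is a generic version of the idea, not marksense's code: the data shapes, the `max_dist` cutoff, and the linear distance-to-confidence formula are all assumptions made for the example.

```python
import math


def map_marks(marks, options, max_dist=20.0):
    """Assign each detected mark to the nearest option centre.

    marks:   list of (x, y) mark centres in template coordinates.
    options: {question: {value: (x, y)}} option centres from a template.
    Returns {question: [(value, confidence), ...]}; marks farther than
    max_dist from every option are ignored as stray.
    """
    result = {}
    for mx, my in marks:
        nearest = None  # (distance, question, value)
        for question, opts in options.items():
            for value, (ox, oy) in opts.items():
                dist = math.hypot(mx - ox, my - oy)
                if dist <= max_dist and (nearest is None or dist < nearest[0]):
                    nearest = (dist, question, value)
        if nearest is None:
            continue  # stray mark, not near any option
        dist, question, value = nearest
        # Assumed confidence: 1.0 at the centre, falling to 0.0 at max_dist.
        confidence = round(1 - dist / max_dist, 2)
        result.setdefault(question, []).append((value, confidence))
    return result


# Hypothetical two-question layout and three detected marks.
layout = {
    "Q1": {"A": (100, 100), "B": (150, 100)},
    "Q2": {"A": (100, 150), "B": (150, 150)},
}
detected = [(103, 104), (150, 150), (400, 400)]
print(map_marks(detected, layout))
```

A question that ends up with more than one entry in its list is the kind of case a multi-mark flag would report.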

v0.1 uses the density detector by default — no downloads, fully offline. Learned-model weights trained on public datasets ship in an upcoming release; pass --download / auto_download=True to fetch them once published (cached under ~/.marksense/models/), or --model path/to/weights.onnx to use your own.

Roadmap

  • Self-hosted REST service + Docker image (shipped in v0.2)
  • Published accuracy benchmarks
  • Clean-provenance model weights (training pipeline and guide: docs/training.md)
  • Template authoring helpers (auto-detect form regions)

Development

git clone https://github.com/RoyAbra27/marksense
cd marksense
uv venv && uv pip install -e ".[dev]"
pytest            # full suite runs with no model file and no network
ruff check .

Design docs live in docs/design/; start with 0001-marksense-v1.

License

MIT — see LICENSE.
