SNES — Simple Narrative Edge Segmenter

ModernBERT-based scene/chapter boundary detector for narrative text.

SNES predicts paragraph-level transitions (edges) in long-form documents, using a ModernBERT encoder with a context window of up to 8,192 tokens.

Install

pip install -e .

Requires Python 3.10+ and the dependencies listed in pyproject.toml.

Quick Start

Train:

snes-train --train_file data/train.jsonl --val_file data/val.jsonl \
           --model_name answerdotai/ModernBERT-base \
           --output_dir ./snes_model \
           --epochs 3 --lr 2e-5 --batch_size 1 \
           --max_length 8192

Evaluate:

snes-eval --model ./snes_model --data data/test.jsonl

Infer:

snes-infer story.txt --threshold 0.35 --output scene_breaks.json
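
The --threshold flag is easiest to picture as a cutoff on per-paragraph boundary scores. The sketch below illustrates that idea only; it is not the SNES implementation. The function name pick_breaks, the use of >= rather than >, and the shape of the score list are all assumptions, and the actual layout of scene_breaks.json is not documented here.

```python
from typing import List


def pick_breaks(probs: List[float], threshold: float = 0.35) -> List[int]:
    """Return indices of paragraphs whose boundary score meets the threshold.

    Hypothetical helper: assumes one score in [0, 1] per paragraph and an
    inclusive comparison, neither of which is confirmed by the SNES docs.
    """
    return [i for i, p in enumerate(probs) if p >= threshold]


# Made-up scores for five paragraphs, for illustration only.
scores = [0.02, 0.41, 0.10, 0.35, 0.90]
print(pick_breaks(scores))       # [1, 3, 4]
print(pick_breaks(scores, 0.5))  # [4]
```

Under this reading, a lower threshold marks more paragraphs as scene breaks (higher recall, more false positives), while a higher one keeps only the most confident boundaries.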

Data Format (.jsonl)

Each line of the file is one JSON record: a single story, pre-split into paragraphs, with one entry in labels per paragraph (as in the example below):

{"story_id": "uuid-1234", "paragraphs": ["Para 1", "Para 2"], "labels": [0, 1]}

Optional keys: soft_labels, meta.
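
A record in this format can be assembled with the standard library alone. The following is a minimal sketch, not part of SNES: split_paragraphs and make_record are hypothetical helpers, splitting on blank lines is an assumed paragraph convention, and the checks (equal lengths, labels restricted to 0 or 1) are inferred from the example record rather than from a documented schema.

```python
import json
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split raw text on blank lines and drop empty chunks (assumed convention)."""
    chunks = re.split(r"\n\s*\n", text)
    return [c.strip() for c in chunks if c.strip()]


def make_record(story_id: str, paragraphs: List[str], labels: List[int]) -> str:
    """Serialize one story as a single JSONL line with the required keys."""
    if len(paragraphs) != len(labels):
        raise ValueError("paragraphs and labels must have the same length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels are assumed to be 0 or 1")
    record = {"story_id": story_id, "paragraphs": paragraphs, "labels": labels}
    return json.dumps(record, ensure_ascii=False)


paragraphs = split_paragraphs("Para 1\n\nPara 2")
line = make_record("uuid-1234", paragraphs, [0, 1])
print(line)
```

Writing one such line per story to a file gives input of the shape shown above for --train_file, --val_file, and --data.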

License

MIT
