LASER (Least Action Semantic Router) — globally optimal text chunking for RAG


LASR — Least Action Semantic Router

Pronounced "laser"

Globally optimal text chunking for RAG pipelines. LASR treats chunking as a physics-inspired optimization problem: it considers every partition of a document that satisfies the chunk-length limits and selects the one that minimizes a global objective balancing semantic cohesion against boundary cost.

No greedy local decisions. Dynamic programming finds the partition that is exactly optimal with respect to this objective.

Install

pip install lasr
python -m spacy download en_core_web_sm

Quick Start

from lasr import chunk

with open("document.txt", encoding="utf-8") as f:
    chunks = chunk(f.read())

for c in chunks:
    print(f"[{c.start_char}:{c.end_char}] ({c.num_sentences} sentences)")
    print(c.text)
    print("---")
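
A common next step is to hand the chunks to a vector store together with their character offsets. The sketch below serializes chunks to JSON Lines using only the attributes shown above (`text`, `start_char`, `end_char`, `num_sentences`). The helper name `chunks_to_jsonl` and the record schema (`id`, `source`, ...) are hypothetical choices, not part of the lasr API.

```python
import json
from typing import Iterable, List


def chunks_to_jsonl(chunks: Iterable, source: str) -> str:
    """Serialize chunk objects to one JSON record per line (hypothetical schema)."""
    lines: List[str] = []
    for i, c in enumerate(chunks):
        record = {
            "id": f"{source}#{i}",          # stable ID: source name + chunk index
            "source": source,
            "start_char": c.start_char,     # offsets into the original document
            "end_char": c.end_char,
            "num_sentences": c.num_sentences,
            "text": c.text,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Usage: `Path("chunks.jsonl").write_text(chunks_to_jsonl(chunks, "document.txt"), encoding="utf-8")`. Keeping the offsets lets a retriever point back to the exact span of the source text.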

Control Granularity

from pathlib import Path
from lasr import chunk

document = Path("document.txt").read_text(encoding="utf-8")

# Fewer, larger chunks (higher alpha = more expensive boundaries)
chunks = chunk(document, alpha=3.0)

# More, smaller chunks
chunks = chunk(document, alpha=1.5)

# Adjust sentence constraints
chunks = chunk(document, min_sentences=3, max_sentences=20)
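
Because alpha trades chunk count against chunk size, one way to pick it is to sweep a few values and keep the one whose average chunk length is closest to a target. This is a minimal sketch: `pick_alpha` is a hypothetical helper, and it assumes only that the chunker accepts `alpha=` and returns objects with a `num_sentences` attribute, as in the examples above.

```python
from statistics import mean
from typing import Callable, Iterable, Optional, Sequence


def pick_alpha(
    chunk_fn: Callable[..., Sequence],
    document: str,
    target_sentences: float,
    alphas: Iterable[float] = (1.5, 2.0, 2.5, 3.0, 3.5),
) -> float:
    """Return the alpha whose mean chunk size (in sentences) is closest to the target."""
    best_alpha: Optional[float] = None
    best_gap = float("inf")
    for alpha in alphas:
        chunks = chunk_fn(document, alpha=alpha)
        if not chunks:
            continue  # skip settings that produce nothing to measure
        avg = mean(c.num_sentences for c in chunks)
        gap = abs(avg - target_sentences)
        if gap < best_gap:
            best_alpha, best_gap = alpha, gap
    if best_alpha is None:
        raise ValueError("chunk_fn returned no chunks for any alpha")
    return best_alpha
```

Usage: `alpha = pick_alpha(chunk, document, target_sentences=10)`, then `chunks = chunk(document, alpha=alpha)`. Each candidate alpha re-runs the full chunker, so keep the sweep short for long documents.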

Power-User API

For full control, use LaserPipeline and LaserConfig directly:

from pathlib import Path

from lasr import LaserConfig, LaserPipeline

config = LaserConfig(
    alpha_base=2.5,     # boundary cost
    rho=1.0,            # tension coefficient
    l_min=5,            # min sentences per chunk
    l_max=30,           # max sentences per chunk
    model_name="all-MiniLM-L6-v2",  # sentence-embedding model
)

text = Path("document.txt").read_text(encoding="utf-8")
pipeline = LaserPipeline(config)
chunks = pipeline.chunk(text)

# Each chunk has context bleed for richer retrieval
for c in chunks:
    print(c.text)               # core DP-optimal text
    print(c.text_with_context)  # with 1-sentence bleed from neighbors
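
The idea behind context bleed can be sketched as widening each chunk's sentence span by one sentence on each side before joining. This is an illustration, not lasr's implementation: the helper `add_context_bleed`, the `[start, end)` span representation, and joining sentences with a single space are all assumptions.

```python
from typing import List, Tuple


def add_context_bleed(
    sentences: List[str],
    spans: List[Tuple[int, int]],
    bleed: int = 1,
) -> List[str]:
    """For each [start, end) sentence span, include up to `bleed` neighboring
    sentences before and after it (clipped at the document edges)."""
    out = []
    for start, end in spans:
        lo = max(0, start - bleed)
        hi = min(len(sentences), end + bleed)
        out.append(" ".join(sentences[lo:hi]))
    return out
```

The core spans stay non-overlapping (they are what the optimizer chose), while the widened text overlaps its neighbors, so a retriever still sees the sentence just across each boundary.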

CLI

lasr chunk document.txt --alpha 2.5 --format json
lasr chunk document.txt --format text --output chunks.txt
lasr chunk document.txt --encoder openai --model text-embedding-3-large

Parameters

Parameter	Default	Effect
`alpha` / `alpha_base`	2.5	Boundary cost. Higher = fewer, larger chunks.
`rho`	1.0	Tension coefficient (anchor parameter).
`min_sentences` / `l_min`	5	Minimum sentences per chunk.
`max_sentences` / `l_max`	30	Maximum sentences per chunk.
`w_struct`	0.25	Structural discount (headers, double newlines).
`w_bind`	1.0	Coreference binding penalty (pronouns).
`w_disc`	0.3	Discourse connective penalty.
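
The last three weights adjust how expensive a split is at a particular position. The table does not give the exact formula, so the sketch below is a guess at one plausible combination: a structural break (double newline or a `#` header) scales the base cost down by `w_struct`, while a sentence that opens with a pronoun or a discourse connective adds `w_bind` or `w_disc`. The function `boundary_cost` and both word lists are hypothetical.

```python
# Hypothetical word lists; the package's actual lists (if any) are unknown.
PRONOUNS = {"he", "she", "it", "they", "this", "these", "those"}
CONNECTIVES = {"however", "therefore", "moreover", "thus", "furthermore"}


def boundary_cost(
    next_sentence: str,
    gap: str,
    alpha_base: float = 2.5,
    w_struct: float = 0.25,
    w_bind: float = 1.0,
    w_disc: float = 0.3,
) -> float:
    """Illustrative cost of starting a new chunk at `next_sentence`.

    `gap` is the raw text between the previous sentence and this one.
    """
    words = next_sentence.split()
    first = words[0].strip(",;:.").lower() if words else ""
    cost = alpha_base
    if "\n\n" in gap or next_sentence.lstrip().startswith("#"):
        cost *= 1.0 - w_struct  # structural break: cheaper boundary
    if first in PRONOUNS:
        cost += w_bind  # pronoun likely refers back, so splitting here is costly
    if first in CONNECTIVES:
        cost += w_disc  # connective ties this sentence to the previous one
    return cost
```

Under this reading, a split before "It was approved." costs 3.5 instead of 2.5, while a split at a paragraph break costs 1.875.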

Benchmark Highlights

All results use all-MiniLM-L6-v2 (22M parameters, 384 dimensions) with alpha=2.5.

Dataset	Domain	LASR Recall@5	Next Best	Margin
MSMARCO	Web passages	0.999	0.985	+0.014
HotpotQA	Multi-hop QA	0.974	0.972	+0.002
FinanceBench	SEC filings	0.930	0.629	+0.301
CUAD	Legal contracts	0.826	0.775	+0.051

LASR ranks first on all four benchmarks above. The margin is small on MSMARCO (+0.014) and HotpotQA (+0.002), but on FinanceBench LASR leads the next best method by about 30 percentage points (0.930 vs 0.629).
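
The evaluation protocol behind these numbers is not described here. For reference, a standard mean Recall@k looks like the sketch below: for each query, the fraction of its relevant items that appear in the top k retrieved results, averaged over queries. With exactly one relevant item per query this reduces to a hit rate. The function name and the dict-of-IDs data layout are assumptions for illustration.

```python
from typing import Dict, List, Set


def recall_at_k(
    retrieved: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Mean fraction of each query's relevant IDs found in its top-k results."""
    scores = []
    for qid, gold in relevant.items():
        if not gold:
            continue  # queries without labels cannot be scored
        top_k = set(retrieved.get(qid, [])[:k])
        scores.append(len(top_k & gold) / len(gold))
    return sum(scores) / len(scores) if scores else 0.0
```

Note that Recall@k depends on how "relevant" is defined for a chunk (for example, containing the gold evidence span), which matters when comparing chunkers that produce chunks of different sizes.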

How It Works

LASR models each document as a chain of semantic units (sentences) and finds the partition that minimizes:

Action = Tension + Boundary Cost

Tension measures semantic dispersion inside each chunk (cosine distance to centroid via prefix sums)
Boundary Cost (alpha) penalizes each split, preventing over-fragmentation
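
Written out, one natural reading of this objective (assuming rho scales the tension term and alpha is charged once per chunk, which differs from once per split only by the constant alpha) is shown below, along with the identity that lets prefix sums compute a chunk's tension. Here e_i are unit-normalized sentence embeddings and c_S is the centroid of chunk S.

```latex
\mathcal{A}(S_1,\dots,S_K) = \sum_{k=1}^{K} \Big[ \rho\,\tau(S_k) + \alpha \Big],
\qquad l_{\min} \le |S_k| \le l_{\max}

\tau(S) = \sum_{i \in S} \big(1 - \cos(e_i, c_S)\big),
\qquad c_S = \frac{1}{|S|} \sum_{i \in S} e_i

% For unit-normalized e_i, \cos(e_i, c_S) = e_i^\top c_S / \lVert c_S \rVert, so
\sum_{i \in S} \cos(e_i, c_S)
  = \frac{\big(\sum_{i \in S} e_i\big)^\top c_S}{\lVert c_S \rVert}
  = \Big\lVert \sum_{i \in S} e_i \Big\rVert

% With prefix sums P_t = \sum_{i \le t} e_i and S = \{a+1, \dots, b\}:
\tau(S) = (b - a) - \lVert P_b - P_a \rVert
```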

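The optimization is solved exactly via dynamic programming in O(T * L_max) time, where T is the number of sentences: each sentence position is paired with at most L_max candidate chunk lengths, and prefix sums make each candidate's tension an O(d) computation for embedding dimension d (treated here as a constant). No approximations, no sampling: given the same embeddings, the same input always produces the same output.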
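
A minimal pure-Python sketch of such a dynamic program follows. It assumes the objective is the sum over chunks of `rho * tension + alpha`, with tension computed as chunk length minus the norm of the summed unit embeddings; the function `optimal_partition` and its signature are illustrative, not the lasr API, and the per-position boundary adjustments (`w_struct`, `w_bind`, `w_disc`) are omitted.

```python
import math
from typing import List, Sequence, Tuple


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [float(x) for x in v]


def optimal_partition(
    embeddings: Sequence[Sequence[float]],
    alpha: float = 2.5,
    rho: float = 1.0,
    l_min: int = 1,
    l_max: int = 30,
) -> List[Tuple[int, int]]:
    """Return [start, end) sentence spans minimizing sum(rho * tension + alpha)."""
    vecs = [_unit(e) for e in embeddings]
    n = len(vecs)
    dim = len(vecs[0]) if vecs else 0

    # prefix[t] holds the sum of the first t unit vectors.
    prefix = [[0.0] * dim]
    for v in vecs:
        prefix.append([p + x for p, x in zip(prefix[-1], v)])

    def tension(a: int, b: int) -> float:
        # Sum of cosine distances to the centroid for sentences a..b-1,
        # computed in O(dim) as (b - a) - ||prefix[b] - prefix[a]||.
        s = [pb - pa for pb, pa in zip(prefix[b], prefix[a])]
        return (b - a) - math.sqrt(sum(x * x for x in s))

    inf = float("inf")
    best = [inf] * (n + 1)  # best[t]: minimal action for sentences 0..t-1
    back = [-1] * (n + 1)   # back[t]: start of the last chunk in that optimum
    best[0] = 0.0
    for b in range(1, n + 1):
        for length in range(l_min, min(l_max, b) + 1):
            a = b - length
            if best[a] == inf:
                continue  # no valid partition of the prefix ending at a
            cost = best[a] + rho * tension(a, b) + alpha
            if cost < best[b]:
                best[b], back[b] = cost, a

    if n and best[n] == inf:
        raise ValueError("no partition satisfies l_min/l_max")

    # Walk the back-pointers from the end to recover the chunk spans.
    spans: List[Tuple[int, int]] = []
    b = n
    while b > 0:
        a = back[b]
        spans.append((a, b))
        b = a
    return spans[::-1]
```

For six embeddings forming two clusters, a small alpha yields two chunks split between the clusters, while a large alpha merges everything into one chunk, mirroring the granularity behavior described under Control Granularity.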

Development

git clone https://github.com/lasr-chunker/lasr
cd lasr
pip install -e ".[dev]"
python -m spacy download en_core_web_sm
pytest

License

MIT


Release: 0.1.0 (Mar 8, 2026)

Download files

Source distributions: none available for this release.

Built distribution: lasr-0.1.0-py3-none-any.whl (18.9 kB), uploaded Mar 8, 2026, Python 3.

File details

Details for the file lasr-0.1.0-py3-none-any.whl.

File metadata

Download URL: lasr-0.1.0-py3-none-any.whl
Upload date: Mar 8, 2026
Size: 18.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for lasr-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d64e471912f5763ad49ea90c9560cc380d1bc6399c3219319a0d650324e8397d`
MD5	`d48125aa166942a433bd896d3142b49e`
BLAKE2b-256	`1a833d36eda4bb12ec4011a5b68a0adc1e98a52a5add72c5c6fc147959771558`
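
To check a downloaded wheel against the published SHA256 digest, a short script with the standard-library `hashlib` is enough. The helper names `sha256_of` and `verify_wheel` are illustrative; the expected digest is copied from the table above.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest published for lasr-0.1.0-py3-none-any.whl.
EXPECTED_SHA256 = "d64e471912f5763ad49ea90c9560cc380d1bc6399c3219319a0d650324e8397d"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_wheel(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: `verify_wheel("lasr-0.1.0-py3-none-any.whl")`. Alternatively, pip's hash-checking mode enforces the same check at install time: put `lasr==0.1.0 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`.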

