Astraea
Open-justice RAG framework for building jurisdiction-specific legal Q&A tools over public court decisions.
Named after Astraea, the Greek goddess of justice who carried the scales.
What it is
A small runtime framework that provides the infrastructure for legal RAG tools - SSE streaming, concurrent request queue, statute routing, live legislation anchors, citation verification, security hardening, and smoke tests - so that a new jurisdiction only needs to provide one Python module.
from jurisdictions.nz_tenancy import jurisdiction
from core.api import create_app
app = create_app(jurisdiction)
Design principles
- One process = one jurisdiction. No multi-tenancy, no plugin registry. Simple deployment.
- Four required things. A jurisdiction must provide: a name, a corpus config, a system prompt, and a route table. Everything else has a working default.
- Security and queue are non-overridable. Input sanitization, request body limits, security headers, and queue concurrency are enforced by core regardless of jurisdiction config.
- Scraper is offline. Ingestion runs separately from the API. Core only needs a populated Qdrant collection conforming to schemas/qdrant_payload.schema.json.
- Tests are data-driven. Jurisdictions provide smoke test fixtures; core runs the test suite against them automatically.
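The "four required things" principle can be pictured as a small contract check. This is only a sketch: the attribute names (`name`, `corpus_config`, `system_prompt`, `route_table`) and the `MinimalJurisdiction` class are assumptions made for illustration, not Astraea's actual base class or contract test.

```python
from dataclasses import dataclass

# Hypothetical attribute names for the four required things.
REQUIRED = ("name", "corpus_config", "system_prompt", "route_table")


@dataclass(frozen=True)
class MinimalJurisdiction:
    """Illustrative stand-in for a jurisdiction module (not the real interface)."""
    name: str
    corpus_config: dict
    system_prompt: str
    route_table: tuple


def missing_required(jurisdiction) -> list[str]:
    """Return the required attributes that are absent or set to None."""
    return [attr for attr in REQUIRED if getattr(jurisdiction, attr, None) is None]
```

A check along these lines would let core fail fast at startup when a jurisdiction omits one of the four, while everything else falls back to defaults.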
Supported jurisdictions
| Jurisdiction | Status | Corpus |
|---|---|---|
| NZ Tenancy (nz_tenancy) | Live - tenancy.localrun.ai | 31,000+ Tenancy Tribunal decisions, RTA 1986 + Healthy Homes Standards 2019 |
| NZ Legal (nz_legal) | Live - nz-legal-rag.localrun.ai | All NZ courts, 3M+ chunks (NZHC, NZCA, NZSC, NZERA, NZEmpC, NZTT) |
| NZ Employment (nz_employment) | Ready | 300+ ERA + Employment Court decisions through May 2026, live ERA 2000 |
| NSW Tenancy (nsw_tenancy) | PoC (framework demo) | Proves interface generalises - not actively developed |
Adding a new jurisdiction
See CONTRIBUTING.md for the full fork-to-running walkthrough.
Quick version:
- Copy examples/minimal_jurisdiction/ to jurisdictions/your_name/
- Implement the 4 required properties in jurisdiction.py
- Run the contract tests: pytest tests/core/test_jurisdiction_contract.py --jurisdiction your_name
- Ingest your corpus into Qdrant (see ingest/ and schemas/qdrant_payload.schema.json)
- Add smoke fixtures and run: pytest tests/jurisdictions/test_smoke.py --jurisdiction your_name -m retrieval
Jurisdiction extension points
Beyond the 4 required properties, jurisdictions can opt into additional behaviour:
Extra routes (register_routes)
Add jurisdiction-specific endpoints (e.g. structured data trackers) on top of the core API:
def register_routes(self, app: FastAPI) -> None:
    from jurisdictions.nz_legal.routes import register
    register(app)
Called at the end of create_app(). Route handlers access pipeline and store via request.app.state.
nz_legal uses this to expose /search, /notable, /sentencing-tracker, /pg-tracker, and /contrasting-cases.
Federated per-Act legislation retrieval (leg_sources)
By default, legislation retrieval does one vector search across the entire legislation collection. As a corpus grows (more Acts), smaller Acts get crowded out by larger ones on embedding similarity alone.
Override leg_sources to run one search per registered Act in parallel, each with its own top_k quota.
The re-ranker phase (Phase 2) can then select the best sections across all sources without manual routes:
from core.jurisdiction import LegislationSource

@property
def leg_sources(self) -> list[LegislationSource]:
    return [
        LegislationSource("RTA", "Residential Tenancies Act 1986", default_top_k=6, boost_top_k=10),
        LegislationSource("HHS2019", "Residential Tenancies (Healthy Homes Standards) Regulations 2019", default_top_k=4, boost_top_k=8),
    ]
When a matched route targets a specific Act (e.g. healthy_homes route targets HHS2019), that
Act's search uses boost_top_k instead of default_top_k, giving it more candidates before ranking.
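To make the quota logic concrete, here is a minimal sketch of one search per Act run in parallel. `ActSource`, its `code` field, `search_act`, and `federated_search` are hypothetical names for illustration; the stub search returns fake section ids instead of querying Qdrant, and the real implementation in core may differ.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ActSource:
    """Illustrative mirror of LegislationSource; field names are assumed."""
    code: str
    title: str
    default_top_k: int
    boost_top_k: int


async def search_act(code: str, query: str, top_k: int) -> list[str]:
    """Stub for a per-Act filtered vector search; returns fake section ids."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [f"{code}/s{i}" for i in range(1, top_k + 1)]


async def federated_search(sources, query, boosted=frozenset()):
    """Run one search per Act concurrently, each with its own quota."""
    quotas = [
        s.boost_top_k if s.code in boosted else s.default_top_k
        for s in sources
    ]
    per_act = await asyncio.gather(
        *(search_act(s.code, query, k) for s, k in zip(sources, quotas))
    )
    # Flatten while keeping the per-Act grouping order.
    return [hit for hits in per_act for hit in hits]
```

With the two sources from the example above and `boosted={"HHS2019"}`, the RTA search keeps its default quota of 6 while the HHS2019 search is widened to 8, so the smaller Act contributes more candidates before ranking.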
Routes remain as hard floor guarantees - forced sections are always included in the candidate pool regardless of federated search results. This means a cross-encoder re-ranker (Phase 2) can reorder freely without risking that a critical section is dropped.
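The floor guarantee amounts to a union of forced sections and retrieved sections. The helper below is a sketch under that assumption; `build_candidate_pool` is a hypothetical name, and the section id other than `NZLEG/RTA/s5` is a placeholder.

```python
def build_candidate_pool(forced_sections, retrieved):
    """Union of forced and retrieved section ids, forced first, no duplicates.

    dict.fromkeys preserves first-seen order, so a forced section stays in
    the pool even when the federated search did not return it.
    """
    return list(dict.fromkeys([*forced_sections, *retrieved]))


pool = build_candidate_pool(
    ("NZLEG/RTA/s5",),
    ["NZLEG/RTA/s13", "NZLEG/RTA/s5"],  # placeholder search results
)
```

Because the forced section is already in the pool, a later re-ranking step only changes order, not membership.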
A CrossEncoderReranker (Phase 1: log-only) is available in core/reranker.py. It scores
candidates after federated search and logs the scores for observability without affecting ranking.
Promote to production ranking after benchmarking shows it matches route-based quality.
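A log-only phase can be sketched as a wrapper that scores candidates, logs the scores, and returns the input order untouched. The function name, the logger name, and the pluggable `score_fn` are assumptions for illustration; the actual `CrossEncoderReranker` in core/reranker.py may be structured differently.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("reranker_sketch")  # hypothetical logger name


def rerank_log_only(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> list[str]:
    """Score every candidate and log the result, but keep the original order."""
    scored = [(score_fn(query, c), c) for c in candidates]
    for score, candidate in sorted(scored, key=lambda p: p[0], reverse=True):
        logger.info("rerank score=%.3f candidate=%s", score, candidate)
    return list(candidates)


def word_overlap(query: str, candidate: str) -> float:
    """Toy scorer standing in for a cross-encoder model."""
    return float(len(set(query.split()) & set(candidate.split())))
```

Comparing the logged order against the route-based order over time is one way to decide when the re-ranker is ready to drive production ranking.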
Case retrieval augmentation (case_synthetic_query on StatuteRoute)
When a matched route defines case_synthetic_query, a supplementary case retrieval pass
runs with that query and unique results are merged into context (up to 8 total chunks).
This fixes cases where the query rewriter drops legally significant framing that is obvious in the original question:
StatuteRoute(
    intent="sham_flatmate_agreement",
    include_any=("flatmate agreement", "meant to be tenants", ...),
    forced_sections=("NZLEG/RTA/s5",),
    synthetic_query="...",
    case_synthetic_query=(
        "flatmate agreement landlord not living property sham tenancy RTA applies "
        "boarder licensee residential tenancy act tenant rights eviction notice"
    ),
)
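The merge step for the supplementary pass might look like the sketch below. It assumes chunks are dicts and that `document_id` is the uniqueness key; both are illustrative assumptions, as is the function name. The cap of 8 comes from the description above.

```python
def merge_case_chunks(primary, supplementary, max_chunks=8):
    """Append unseen supplementary chunks to the primary ones, capped at max_chunks."""
    merged, seen = [], set()
    for chunk in [*primary, *supplementary]:
        if len(merged) >= max_chunks:
            break
        key = chunk["document_id"]  # assumed dedup key
        if key in seen:
            continue
        seen.add(key)
        merged.append(chunk)
    return merged
```

Primary results come first, so the supplementary query can only add context and never displaces what the rewritten query already found.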
Smoke fixture source count (min_sources on SmokeFixture)
Assert that supplementary retrieval ran and returned the expected number of case sources:
SmokeFixture(
    question="My landlord put us on a flatmate agreement...",
    expected_sections=[],
    min_sources=6,
    description="sham_flatmate_agreement route - case_synthetic_query augmentation",
)
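How a runner might evaluate such a fixture can be sketched as follows. `FixtureSketch` and `check_fixture` are hypothetical stand-ins, not the real `SmokeFixture` or the pytest runner in core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixtureSketch:
    """Illustrative subset of a smoke fixture's fields."""
    question: str
    expected_sections: tuple = ()
    min_sources: int = 0


def check_fixture(fixture, returned_sections, returned_sources) -> list[str]:
    """Return human-readable failures; an empty list means the fixture passed."""
    failures = []
    missing = [s for s in fixture.expected_sections if s not in returned_sections]
    if missing:
        failures.append(f"missing sections: {missing}")
    if len(returned_sources) < fixture.min_sources:
        failures.append(
            f"expected at least {fixture.min_sources} sources, "
            f"got {len(returned_sources)}"
        )
    return failures
```

With `min_sources=6`, a response carrying only five case sources would fail, which signals that the supplementary retrieval pass did not contribute.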
Qdrant payload schema
All jurisdictions must produce chunks conforming to schemas/qdrant_payload.schema.json.
Required fields: document_id, court, court_name, title, date, url, text, source_type.
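An ingestion script could run a quick presence check before upserting, as in this sketch. It only tests that the required keys exist; types and formats are defined by the schema file, which is not reproduced here, and `missing_payload_fields` is a hypothetical helper name.

```python
REQUIRED_FIELDS = (
    "document_id", "court", "court_name", "title",
    "date", "url", "text", "source_type",
)


def missing_payload_fields(payload: dict) -> list[str]:
    """Return required field names that are absent from a chunk payload."""
    return [name for name in REQUIRED_FIELDS if name not in payload]
```

For full validation, checking each payload against schemas/qdrant_payload.schema.json with a JSON Schema validator would be the more complete option.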
Stack
| Component | Technology |
|---|---|
| Vector database | Qdrant |
| Embeddings | nomic-embed-text-v1.5 / Qwen3-Embedding-0.6B via sentence-transformers |
| LLM inference | llama.cpp (OpenAI-compatible) |
| API | FastAPI + SSE streaming |
| Cache | Redis (web verify results) |
| Queue | Semaphore-based, per-IP fairness |
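One way to read "semaphore-based, per-IP fairness" is a global concurrency cap combined with a per-client cap, so a single address cannot occupy every slot. The class below is a sketch under that reading; `FairQueue` and its parameters are hypothetical, and the actual queue in core may work differently.

```python
import asyncio


class FairQueue:
    """Global concurrency limit plus a per-IP limit (illustrative only)."""

    def __init__(self, max_concurrent: int, max_per_ip: int):
        self._global = asyncio.Semaphore(max_concurrent)
        self._max_per_ip = max_per_ip
        # Grows with the number of distinct IPs; a real service would evict.
        self._per_ip: dict[str, asyncio.Semaphore] = {}

    def _ip_slot(self, ip: str) -> asyncio.Semaphore:
        if ip not in self._per_ip:
            self._per_ip[ip] = asyncio.Semaphore(self._max_per_ip)
        return self._per_ip[ip]

    async def run(self, ip: str, job):
        # Take the per-IP slot first so a waiting client does not hold
        # a global slot while queued behind its own earlier request.
        async with self._ip_slot(ip):
            async with self._global:
                return await job()
```

Acquiring the per-IP semaphore before the global one keeps global slots available to other clients while one client's extra requests wait.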
Milestones
- Milestone 0 - core interface design, runtime modules, nz_tenancy jurisdiction
- Milestone 1 - nsw_tenancy skeleton + nz_legal + nz_employment prove interface generalises
- Milestone 2 - smoke test runner wired to pytest (Tier 1/2/3), Docker Compose
- Milestone 3 - CONTRIBUTING.md, packaging, NSW NCAT scraper + corpus (225+ decisions)
- Milestone 4 - nz_legal migration: tracker endpoints, contrasting cases, register_routes hook
- Milestone 5 - federated per-Act legislation retrieval, Healthy Homes Standards 2019 corpus, cross-encoder reranker (Phase 1 log-only), Qdrant payload indexes for fast filtered search
Related project
The NZ tenancy tool running on this framework: https://tenancy.localrun.ai
Source: https://github.com/jwongso/nz-legal-rag
MIT License. Not legal advice.