23 projects
charstreamer
Fast Rust/PyO3 semantic text segmentation
shuflr-client
Python client for shuflr — HTTP NDJSON and shuflr-wire/1 binary transports
kernel-lore-mcp
MCP server exposing fast, structured search over lore.kernel.org to LLM developer tools.
folio-python
Python library for FOLIO, the Federated Open Legal Information Ontology
folio-mcp
MCP server for FOLIO, the Federated Open Legal Information Ontology
alea-llm-client
ALEA LLM client abstraction library for Python
alea-markdown
Convert HTML files to Markdown with configurable options (pure Python)
llm-detector
Transparent, probabilistic classification of text as human-generated or LLM-generated
npm-vuln-scanner
Detect compromised npm packages from the September 2025 supply chain attack
logillm
A generic, high-performance, low-dependency LLM programming framework inspired by DSPy
pyenvsearch
Python library navigation and AI-powered analysis tool for developers and AI agents. Combines traditional code search with LLM-powered package insights.
nupunkt-rs
High-performance Rust implementation of nupunkt sentence/paragraph tokenization
nupunkt
Next-generation Punkt sentence and paragraph boundary detection with zero dependencies
cheesecloth
High-performance text metrics and filtering for large-scale corpora and pretrain curation
charboundary
Fast character-based boundary detection for sentences and paragraphs
kl3m-data-client
Client for interacting with KL3M data stored in S3 with JSON output support
alea-data-generator
ALEA low-level data generation techniques (procedural, KL3M)
soli-python
Python library for SOLI, the Standard for Open Legal Information
alea-preprocess
Efficient, accessible preprocessing routines from the ALEA Institute for preparing pretraining, SFT, and DPO training data.
alea-dublincore
ALEA Dublin Core Metadata library with zero dependencies
alea-data-resources
ALEA data resources library
soli-data-generator
Python library for SOLI data generation
rfcorr
Random Forest-inspired correlation/dependence methods