Placeholder package for innit — name reserved while model is trained
innit - Fast English Detection
Note: The current PyPI release is a lightweight placeholder to reserve the package name while the model is trained and productized. It installs quickly and does not include heavy training dependencies. The CLI expects you to provide an ONNX model file.
A tiny, fast, and dependency-light tool to determine if text is English or not English. Perfect for book-length texts where you need quick language detection without heavy ML frameworks.
Features
- Fast: Sub-millisecond inference per 2KB window on CPU
- Small: ~1-2MB model size (0.5-1MB with int8 quantization)
- Simple: Binary classification - English vs Not-English
- Legal: Trained only on legally clean datasets
- Deployable: Ships as an ONNX model that runs on ONNX Runtime (no PyTorch dependency for inference)
Installation
For inference only (lightweight):
pip install onnxruntime
# Download the innit.onnx model file
For training and development:
git clone <repo>
cd innit
pip install -e .
Quick Start
CLI Usage
# Analyze a text file
innit book.txt
# Output as JSON
innit book.txt --json
# Use specific model
innit book.txt --model path/to/innit.onnx
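To call the CLI from a script, one option is to wrap it with `subprocess`. The sketch below only uses the flags shown above (`--json`, `--model`); combining them in one call is an assumption, the helper names `build_innit_command` and `run_innit` are hypothetical, and nothing is assumed about the fields in the JSON output.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_innit_command(path: str, model: Optional[str] = None, as_json: bool = True) -> List[str]:
    """Assemble the argument list for the `innit` CLI using the flags shown above."""
    cmd = ["innit", path]
    if as_json:
        cmd.append("--json")
    if model is not None:
        cmd += ["--model", model]
    return cmd


def run_innit(path: str, model: Optional[str] = None) -> Any:
    """Run the CLI and parse whatever JSON it prints (schema not assumed here)."""
    completed = subprocess.run(
        build_innit_command(path, model),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Calling `run_innit("book.txt", model="path/to/innit.onnx")` requires the `innit` command to be on `PATH` and a model file to exist at the given location.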
Python API
from innit.onnx_runner import ONNXInnitRunner, score_text_onnx
# Load model
runner = ONNXInnitRunner("innit.onnx")
# Read the text to classify
with open("book.txt", encoding="utf-8") as f:
    text = f.read()
# Score text
result = score_text_onnx(runner, text)
print(result["label"])  # "ENGLISH", "NOT-EN", or "UNCERTAIN"
Training Your Own Model
- Train the model:
python train_innit.py
- Export to ONNX:
python export_onnx.py
- Evaluate on a sample text:
python eval_innit.py sample_text.txt
How It Works
- Architecture: Tiny byte-level CNN with depthwise separable convolutions
- Input: UTF-8 bytes (no tokenizer needed)
- Strategy: Slides 2KB windows over text and aggregates predictions
- Thresholds: Conservative - requires high confidence across many windows
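The window-and-aggregate strategy described above can be sketched roughly as follows. This is an illustration, not the project's actual code: the non-overlapping stride, the mean-based aggregation, the threshold values (`hi`, `lo`, `min_windows`), and the `score_window` callback standing in for the model are all assumptions.

```python
from typing import Callable, Iterator, List

WINDOW_BYTES = 2048  # 2KB window size, from the description above


def iter_windows(data: bytes, size: int = WINDOW_BYTES, stride: int = WINDOW_BYTES) -> Iterator[bytes]:
    """Yield consecutive byte windows; stride == size means no overlap (an assumption)."""
    if not data:
        return
    for start in range(0, len(data), stride):
        yield data[start:start + size]


def aggregate(probs: List[float], hi: float = 0.9, lo: float = 0.1, min_windows: int = 3) -> str:
    """Conservative decision: commit only with enough windows and a confident mean.

    The thresholds are illustrative values, not the ones innit uses.
    """
    if len(probs) < min_windows:
        return "UNCERTAIN"
    mean = sum(probs) / len(probs)
    if mean >= hi:
        return "ENGLISH"
    if mean <= lo:
        return "NOT-EN"
    return "UNCERTAIN"


def classify(text: str, score_window: Callable[[bytes], float]) -> str:
    """Encode text as UTF-8, score each window, and aggregate the English probabilities."""
    data = text.encode("utf-8")
    return aggregate([score_window(window) for window in iter_windows(data)])
```

Here `score_window` would wrap a single model call that returns an English probability for one window; any function with that shape can be plugged in for testing.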
Model Details
- Input: Sequences of up to 2048 UTF-8 bytes
- Architecture: 4-block CNN with residual connections
- Output: Binary classification (English probability)
- Training: ~50K samples each of English and non-English text
- Datasets: Project Gutenberg (English) + multilingual sources (non-English)
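Because the input is raw UTF-8 bytes, preparing a window needs no tokenizer. A minimal sketch of turning text into a fixed-length sequence of byte values is shown below; the padding value `PAD_ID = 0` and the helper name `encode_window` are assumptions, and the real model may pad or truncate differently.

```python
from typing import List

MAX_BYTES = 2048  # maximum input length, from the model details above
PAD_ID = 0        # assumption: padding value; the real model may use a different one


def encode_window(text: str, max_bytes: int = MAX_BYTES, pad_id: int = PAD_ID) -> List[int]:
    """Convert text to byte values (0-255), truncated and right-padded to max_bytes."""
    raw = text.encode("utf-8")[:max_bytes]  # may cut a multi-byte character; bytes are kept as-is
    ids = list(raw)
    ids.extend([pad_id] * (max_bytes - len(ids)))
    return ids
```

Non-ASCII characters simply occupy more than one position: "é" becomes the two bytes 195 and 169.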
Legal & Licensing
Training Data Sources
- English: Project Gutenberg texts (public domain in the US)
- Non-English: HuggingFace multilingual datasets with permissive licenses
- See DATA_SOURCES.md for complete dataset information
Model License
This model and code are released under the MIT License. See LICENSE for details.
Usage Notes
- The model weights are original work trained on legally clean data
- No copyrighted text content is redistributed
- Safe for commercial use
Performance
| Metric | Value |
|---|---|
| Model Size (FP32) | ~1.5 MB |
| Model Size (INT8) | ~0.8 MB |
| Inference Speed | <1ms per 2KB window |
| Memory Usage | <100 MB |
| Accuracy | >95% on book-length texts |
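As a rough guide to what the per-window figure means for a whole book, the arithmetic can be sketched as below. It treats the 1 ms value from the table as an upper bound and assumes non-overlapping windows; both the stride and the helper names are assumptions, and real timings depend on hardware.

```python
import math

WINDOW_BYTES = 2048   # 2KB window, from the table above
MS_PER_WINDOW = 1.0   # upper bound taken from "<1ms per 2KB window"


def estimate_windows(n_bytes: int, stride: int = WINDOW_BYTES) -> int:
    """Number of windows needed to cover n_bytes with the given stride."""
    return math.ceil(n_bytes / stride) if n_bytes > 0 else 0


def estimate_ms(n_bytes: int, stride: int = WINDOW_BYTES, ms_per_window: float = MS_PER_WINDOW) -> float:
    """Upper-bound inference time in milliseconds for a text of n_bytes."""
    return estimate_windows(n_bytes, stride) * ms_per_window
```

Under these assumptions a 500,000-byte text needs 245 windows, so scoring it would take less than about 245 ms of model time.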
Contributing
- Fork the repository
- Create your feature branch
- Add tests if applicable
- Submit a pull request
Troubleshooting
Model file not found: Ensure you've either trained a model with python train_innit.py or downloaded a pre-trained innit.onnx file.
Import errors: For inference, you only need onnxruntime. For training, install the full development dependencies.
Poor performance: The model works best on book-length texts (>1KB). Very short texts may return "UNCERTAIN".
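For the "model file not found" case, a small check before loading can give a clearer error. The sketch below is a hypothetical helper (`find_model` is not part of innit) that returns the first candidate path that exists as a file.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_model(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        if path.is_file():
            return path
    return None
```

A caller might try `find_model(["innit.onnx", "~/models/innit.onnx"])` (example locations only) and raise a descriptive error when the result is `None`.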
Download files
File details
Details for the file innit-0.0.1a0.tar.gz.
File metadata
- Download URL: innit-0.0.1a0.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 057adb0f64b57e8aa53e25956a17d73dec4aa362fbd7b20caaa8c30992d875f0 |
| MD5 | d12baf715d48ff5455381ec787724f6d |
| BLAKE2b-256 | 2c64b62ecd01c3febbe8baff80d5405b9bc00357764132576ed938c399b8e830 |
File details
Details for the file innit-0.0.1a0-py3-none-any.whl.
File metadata
- Download URL: innit-0.0.1a0-py3-none-any.whl
- Upload date:
- Size: 7.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 07e1db4073291b7e84436d610e012bbb1595cc791aca2875934534120234e360 |
| MD5 | e24a8c23aa3d90a52f86121a0aedf054 |
| BLAKE2b-256 | d2ddd0637e05784d4f782e04062dc7d6b2899083748ae7b515746376a64f834a |
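A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib`. This is a generic sketch; the helper names `sha256_of_file` and `matches_sha256` are illustrative and not part of innit.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("innit-0.0.1a0.tar.gz", "057adb0f64b57e8aa53e25956a17d73dec4aa362fbd7b20caaa8c30992d875f0")` should return `True` for an intact copy of the source distribution.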