
Pure Python ByteCNN for edge deployment


PinyByteCNN


Pure Python implementation of ByteCNN for toxicity detection and edge deployment.

Overview

PinyByteCNN is a lightweight, dependency-free neural network implementation designed for production deployment in constrained environments. It provides CNN-based text classification with minimal memory footprint and fast inference.

Quick Start

from tinybytecnn.model import ByteCNN

# Create model
model = ByteCNN(
    vocab_size=256,
    embed_dim=14,
    conv_filters=28,
    conv_kernel_size=3,
    hidden_dim=48,
    max_len=512
)

# Predict toxicity
score = model.predict("Hello world")  # Returns a float in [0.0, 1.0]

Features

  • Pure Python: No external dependencies beyond the standard library
  • Memory Efficient: Optimized for minimal RAM usage
  • Fast Inference: Single-pass prediction with pre-allocated buffers
  • Multiple Architectures: Support for 1-3 layer CNN configurations
  • Flexible Input: Handles variable-length text with multiple strategies

Architecture

ByteCNN processes text through the following pipeline:

  1. Byte Encoding: Convert text to UTF-8 bytes (0-255)
  2. Embedding: Map bytes to dense vectors
  3. Convolution: 1D CNN with ReLU activation
  4. Pooling: Global average/max pooling
  5. Classification: Dense layers with sigmoid output
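
The five steps above can be sketched in plain Python as follows. This is a minimal illustration with small random weights, not the library's implementation: the function names, the toy dimensions, the zero-byte padding, the choice of max pooling, and the single dense output unit are all assumptions made for the example.

```python
import math
import random


def encode_bytes(text, max_len):
    """Step 1: UTF-8 bytes (values 0-255), truncated to max_len."""
    return list(text.encode("utf-8"))[:max_len]


def embed(byte_ids, table):
    """Step 2: look up one dense vector per byte."""
    return [table[b] for b in byte_ids]


def conv1d_relu(seq, kernels, biases):
    """Step 3: valid 1D convolution over positions, followed by ReLU.

    kernels[f][k][c] is the weight of filter f at offset k, input channel c.
    """
    kernel_size = len(kernels[0])
    out = []
    for start in range(len(seq) - kernel_size + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            acc = bias
            for k in range(kernel_size):
                acc += sum(w * x for w, x in zip(kernel[k], seq[start + k]))
            row.append(acc if acc > 0.0 else 0.0)
        out.append(row)
    return out


def global_max_pool(features):
    """Step 4: keep the maximum of each filter across all positions."""
    return [max(col) for col in zip(*features)]


def dense_sigmoid(vec, weights, bias):
    """Step 5: one dense unit with sigmoid output (a simplification)."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    return 1.0 / (1.0 + math.exp(-z))


def toy_forward(text, embed_dim=4, filters=6, kernel_size=3, max_len=512, seed=0):
    """Run all five steps with small random weights (illustrative only)."""
    rng = random.Random(seed)

    def rand_vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    table = [rand_vec(embed_dim) for _ in range(256)]
    kernels = [[rand_vec(embed_dim) for _ in range(kernel_size)] for _ in range(filters)]
    ids = encode_bytes(text, max_len)
    ids += [0] * (kernel_size - len(ids))  # assumed padding so one window exists
    feats = conv1d_relu(embed(ids, table), kernels, [0.0] * filters)
    return dense_sigmoid(global_max_pool(feats), rand_vec(filters), 0.0)


print(toy_forward("Hello world"))
```

Because the weights are random, the printed score is meaningless as a toxicity estimate; the sketch only shows how data flows from bytes to a value between 0 and 1.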

Installation

Clone the repository and import directly:

git clone <repository-url>
cd PinyByteCNN
python3 -c "from tinybytecnn.model import ByteCNN; print('Success')"

Usage

Basic Classification

from tinybytecnn.model import ByteCNN

model = ByteCNN(vocab_size=256, embed_dim=14, conv_filters=28, 
                conv_kernel_size=3, hidden_dim=48)

# Single prediction
score = model.predict("This is a test message")

# Multiple texts (predict() scores one text per call)
texts = ["Hello", "Goodbye", "Test message"]
scores = [model.predict(text) for text in texts]

Multi-Layer Models

from tinybytecnn.multi_layer_optimized import MultiLayerByteCNN

# Define layer configuration
layers = [
    {"in_channels": 14, "out_channels": 28, "kernel_size": 3},
    {"in_channels": 28, "out_channels": 40, "kernel_size": 3}
]

model = MultiLayerByteCNN(layers_config=layers, hidden_dim=128, max_len=512)
score = model.predict("Multi-layer processing")
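
In a layer configuration like the one above, each layer's in_channels has to match the previous layer's out_channels. The helper below is a hypothetical sanity check, not part of tinybytecnn; it also counts convolution parameters under the assumption of a standard 1D convolution with one bias per filter. Whether MultiLayerByteCNN performs a similar check itself is not stated here.

```python
def check_layers(layers):
    """Return the conv parameter count per layer; raise if channels do not chain."""
    counts = []
    for i, layer in enumerate(layers):
        if i > 0 and layer["in_channels"] != layers[i - 1]["out_channels"]:
            raise ValueError(
                f"layer {i}: in_channels={layer['in_channels']} does not match "
                f"previous out_channels={layers[i - 1]['out_channels']}"
            )
        weights = layer["in_channels"] * layer["out_channels"] * layer["kernel_size"]
        counts.append(weights + layer["out_channels"])  # plus one bias per filter
    return counts


layers = [
    {"in_channels": 14, "out_channels": 28, "kernel_size": 3},
    {"in_channels": 28, "out_channels": 40, "kernel_size": 3},
]
print(check_layers(layers))  # [1204, 3400]
```

These counts cover the convolution layers only; the embedding and dense layers add further parameters.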

Prediction Strategies

  • truncate: Use first max_len bytes (fastest)
  • average: Average predictions over sliding windows
  • attention: Weighted average with attention mechanism

score = model.predict("Long text...", strategy="average")
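
To make the difference between truncate and average concrete, here is a rough sketch written against a generic scoring function. The helper names, the stride parameter, and the handling of a shorter final window are assumptions for illustration; the library's actual windowing may differ, and the attention strategy is left out because its weighting scheme is not described here.

```python
def windows(data, max_len, stride):
    """Yield slices of at most max_len bytes, advancing by stride bytes."""
    if len(data) <= max_len:
        yield data
        return
    start = 0
    while True:
        yield data[start:start + max_len]
        if start + max_len >= len(data):
            break  # this window already reached the end of the input
        start += stride


def predict_truncate(score_fn, text, max_len):
    """Score only the first max_len bytes."""
    return score_fn(text.encode("utf-8")[:max_len])


def predict_average(score_fn, text, max_len, stride):
    """Score every window and return the mean of the scores."""
    scores = [score_fn(w) for w in windows(text.encode("utf-8"), max_len, stride)]
    return sum(scores) / len(scores)


# Stand-in scorer for the demo: fraction of bytes equal to "x".
def fraction_x(chunk):
    return chunk.count(b"x") / max(len(chunk), 1)


print(predict_truncate(fraction_x, "aaaaxxxx", max_len=4))           # 0.0
print(predict_average(fraction_x, "aaaaxxxx", max_len=4, stride=4))  # 0.5
```

The demo shows why truncation can miss content that appears late in a long input, while averaging sees it at the cost of one scoring pass per window.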

Testing

Run the test suite:

python3 -m unittest discover tests/

Smoke Tests

Validate against production models:

python3 tests/test_bytecnn_10k_smoke.py

Performance

Model         Parameters   Accuracy   Inference time
ByteCNN-10K   10,009       78.97%     0.5 ms
ByteCNN-32K   32,768       82.15%     1.2 ms

Benchmarks on MacBook Pro M1, single-threaded
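
Inference times like these depend heavily on hardware and input length, so it is worth measuring on the target machine. The following is a minimal timing sketch using only the standard library; time_calls is a hypothetical helper, and the warmup and run counts are arbitrary choices, not the procedure used to produce the numbers above.

```python
import statistics
import time


def time_calls(fn, arg, warmup=10, runs=200):
    """Return (median_ms, worst_ms) for repeated single-threaded calls to fn(arg)."""
    for _ in range(warmup):
        fn(arg)  # warm caches before measuring
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(arg)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples), max(samples)


# Stand-in workload; with the package installed, pass model.predict instead.
median_ms, worst_ms = time_calls(lambda s: s.encode("utf-8"), "This is a test message")
print(f"median {median_ms:.4f} ms, worst {worst_ms:.4f} ms")
```

Reporting the median alongside the worst case helps separate typical latency from outliers caused by garbage collection or scheduler noise.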

Production Deployment

PinyByteCNN is designed for edge deployment scenarios:

  • Cloudflare Workers: Sub-10ms inference
  • AWS Lambda: Cold start friendly
  • Mobile/IoT: Minimal memory footprint
  • Air-gapped Systems: No external dependencies
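
As one illustration of the serverless case, a handler can be built around any predict callable so that the model is constructed once per container and reused across invocations. The sketch below is an assumption-laden example, not taken from DEPLOYMENT.md: make_handler is a hypothetical helper, and the event shape (a top-level "text" key) and response shape are chosen for the example only.

```python
import json


def make_handler(predict):
    """Wrap a predict(text) -> float callable in a Lambda-style handler."""

    def handler(event, context):
        text = event.get("text")  # assumed event shape: {"text": "..."}
        if not isinstance(text, str):
            return {"statusCode": 400, "body": json.dumps({"error": "missing 'text'"})}
        return {"statusCode": 200, "body": json.dumps({"score": predict(text)})}

    return handler


# Stand-in scorer for the demo; in a deployment this would be model.predict,
# with the model created at module import time so warm invocations reuse it.
handler = make_handler(lambda text: 0.25)
print(handler({"text": "Hello world"}, None))
```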

See DEPLOYMENT.md for detailed deployment guides.

Model Architecture Details

For detailed architecture information and training procedures, see ARCHITECTURE.md.

Development

Setup Development Environment

With uv (recommended):

# Install dev dependencies
uv sync --dev

# Run linting (performance-optimized rules)
uv run python scripts/lint.py

# Quick lint check
uv run ruff check tinybytecnn/

# Format code  
uv run ruff format .

With pip:

# Install development tools
python scripts/setup_dev.py

# Run linting
python scripts/lint.py

Linting Philosophy

PinyByteCNN uses performance-focused linting rules:

  • Core library (tinybytecnn/): Strict quality checks
  • Performance exceptions: Complexity rules relaxed for optimization
  • Documentation: Optional (prioritizes code density)
  • Tests/Scripts: Lenient rules for development flexibility

Contributing

  1. Run python scripts/setup_dev.py to install dev tools
  2. Ensure python scripts/lint.py passes on core library
  3. Maintain 80%+ test coverage with python scripts/coverage_analyzer.py
  4. Add tests for new features

License

MIT License - see LICENSE file for details.
