A 109M parameter historical language model trained from scratch on Project Gutenberg

EKA-1


EKA-1 is a decoder-only Transformer with 109,529,856 parameters, trained entirely from scratch on classical texts from Project Gutenberg. It features a modern architecture — RMSNorm, Rotary Position Embeddings (RoPE), SwiGLU activation, multi-head causal attention via PyTorch SDPA, and weight tying — packed into a library with a clean, one-line API.


Features

  • Automatic model & tokenizer download
  • CPU inference
  • CUDA inference
  • Text generation
  • Chat interface (single & multi-turn)
  • Streaming generation
  • Temperature sampling
  • Top-k sampling
  • Top-p (nucleus) sampling
  • Repetition penalty
  • Context truncation
  • PEP 561 typed package

Installation

From PyPI (recommended):

pip install eka-ai

From source:

git clone https://github.com/eka-ai/eka-ai.git
cd eka-ai
pip install -e ".[dev]"

Requirements:

  • Python ≥ 3.9
  • PyTorch ≥ 2.1
  • sentencepiece ≥ 0.1.99
  • gdown ≥ 4.7.3

On first use, EKA() automatically downloads eka_model.pt (~424 MB) and tokenizer.model (~768 KB) to ~/.cache/eka_ai/.

Custom cache location: Set the EKA_CACHE_DIR environment variable to override the default cache directory.
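
The lookup described above could be sketched as follows. This is an illustration of the documented behaviour only, not the library's code: `resolve_cache_dir` is a hypothetical helper, and the example path is arbitrary.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return EKA_CACHE_DIR if set and non-empty, else ~/.cache/eka_ai.

    Hypothetical helper that mirrors the documented behaviour.
    """
    env = os.environ if env is None else env
    override = env.get("EKA_CACHE_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "eka_ai"


# The variable should be set before EKA() is constructed so that the
# first download lands in the new location. The path is only an example.
os.environ["EKA_CACHE_DIR"] = "/tmp/eka_cache"
print(resolve_cache_dir())
```

Setting the variable in the shell (for example in a job script) has the same effect as setting it from Python before the first `EKA()` call.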


Quick Start

from eka_ai import EKA

# Loads model on first call — downloads files automatically
model = EKA()

# ── Text generation ────────────────────────────────────────
print(
    model.generate(
        "Tell me about the Roman Empire",
        max_new_tokens=128,
    )
)

# ── Single-turn chat ───────────────────────────────────────
print(
    model.chat(
        "Who was Ashoka?"
    )
)

# ── Streaming generation ───────────────────────────────────
for token in model.stream(
    "Tell me a story"
):
    print(token, end="", flush=True)

API Reference

EKA(...) — constructor

model = EKA(
    device=None,           # "cpu" | "cuda" | torch.device — auto-detected
    dtype=None,            # torch.bfloat16 on CPU by default
    model_path=None,       # override cached model path
    tokenizer_path=None,   # override cached tokenizer path
    auto_download=True,    # download files if not cached
    system_prompt=None,    # custom system prompt for chat()
    verbose=True,          # print loading progress
)

model.generate(prompt, ...) → str

Generate a text continuation for a plain prompt.

text = model.generate(
    prompt,
    max_new_tokens=256,        # int   — max tokens to generate
    temperature=0.8,           # float — sampling temperature (> 0)
    top_k=50,                  # int   — top-k cutoff (0 = disabled)
    top_p=0.95,                # float — nucleus threshold (1.0 = disabled)
    repetition_penalty=1.1,    # float — penalty for repeated tokens (1.0 = disabled)
    return_result=False,       # bool  — return GenerationResult with metadata
)
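
To make the four sampling parameters concrete, here is a minimal pure-Python sketch of how such controls are commonly combined. The order of operations and the repetition-penalty rule (divide positive logits, multiply negative ones) are assumptions based on common practice, not confirmed details of EKA-1; `filter_logits` is a hypothetical name.

```python
import math
import random


def filter_logits(logits, generated=(), temperature=0.8, top_k=50,
                  top_p=0.95, repetition_penalty=1.1):
    """Return per-token probabilities after applying the sampling controls."""
    scores = list(logits)

    # Repetition penalty: make tokens that were already generated less likely.
    for tok in set(generated):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring tokens (0 disables the cutoff).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = order[:top_k] if top_k > 0 else order

    # Softmax over the kept tokens, shifted by the maximum for stability.
    peak = scores[keep[0]]
    weights = [math.exp(scores[i] - peak) for i in keep]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p < 1.0:
        cumulative, cut = 0.0, len(keep)
        for n, p in enumerate(probs, start=1):
            cumulative += p
            if cumulative >= top_p:
                cut = n
                break
        keep, probs = keep[:cut], probs[:cut]
        total = sum(probs)
        probs = [p / total for p in probs]

    out = [0.0] * len(scores)
    for i, p in zip(keep, probs):
        out[i] = p
    return out


# Draw one token id from a toy four-token vocabulary.
probs = filter_logits([2.0, 1.0, 0.5, -1.0], generated=[0])
next_token = random.choices(range(len(probs)), weights=probs)[0]
```

Setting `top_k=0`, `top_p=1.0` and `repetition_penalty=1.0` reduces the sketch to plain temperature sampling, which matches the "disabled" values noted in the signature above.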

When return_result=True, returns a GenerationResult with attributes:

  • .text — the generated text
  • .tokens_generated — number of new tokens
  • .tokens_per_second — generation throughput
  • .device — device used
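
A short sketch of reading these attributes. `describe_result` is a hypothetical helper, and the stand-in object with made-up values exists only so the snippet runs without the model; with the library installed, the object would come from `model.generate(prompt, return_result=True)`.

```python
from types import SimpleNamespace


def describe_result(result):
    """Summarise the documented metadata attributes on one line."""
    return (
        f"{result.tokens_generated} tokens at "
        f"{result.tokens_per_second:.1f} tok/s on {result.device}"
    )


# Stand-in with invented values; not real measurements.
fake = SimpleNamespace(
    text="placeholder",
    tokens_generated=128,
    tokens_per_second=42.0,
    device="cpu",
)
print(describe_result(fake))  # 128 tokens at 42.0 tok/s on cpu
```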

model.chat(message, ...) → str

Single-turn or multi-turn chat using the built-in chat template.

reply = model.chat(
    message,
    history=None,              # list of (user, assistant) tuples
    system_prompt=None,        # override instance system prompt
    max_new_tokens=256,
    temperature=0.8,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.1,
)

Multi-turn example:

history = []
reply1 = model.chat("Who was Julius Caesar?")
history.append(("Who was Julius Caesar?", reply1))

reply2 = model.chat("What were his greatest conquests?", history=history)
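
For longer conversations, the bookkeeping in the example above can be wrapped in a small class. This is a sketch, not part of the package: `ChatSession` and `EchoModel` are hypothetical names, and `EchoModel` only stands in for `EKA` so the snippet runs without downloading weights.

```python
class ChatSession:
    """Hypothetical wrapper that keeps the (user, assistant) history tuples."""

    def __init__(self, model):
        self.model = model
        self.history = []

    def send(self, message, **kwargs):
        # Pass prior turns only once there are some, as in the example above.
        prior = list(self.history) or None
        reply = self.model.chat(message, history=prior, **kwargs)
        self.history.append((message, reply))
        return reply


class EchoModel:
    """Stand-in for EKA: reports the turn number instead of generating text."""

    def chat(self, message, history=None, **kwargs):
        return f"turn {len(history or []) + 1}: {message}"


session = ChatSession(EchoModel())
print(session.send("Who was Julius Caesar?"))
print(session.send("What were his greatest conquests?"))
```

Because the context length is 512 tokens, long histories will eventually be truncated, so a real session may want to drop the oldest turns itself.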

model.stream(prompt, ...) → Iterator[str]

Streaming generation — yields one decoded token piece at a time.

for piece in model.stream(
    "The ancient library of",
    max_new_tokens=128,
    temperature=0.8,
):
    print(piece, end="", flush=True)

model.stream_chat(message, ...) → Iterator[str]

Like stream() but uses the chat template.

for piece in model.stream_chat("Tell me about Socrates"):
    print(piece, end="", flush=True)
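
Both streaming methods yield plain strings, so the full reply can be kept while it is being displayed. A minimal sketch, assuming only that the iterator yields `str` pieces; `stream_and_collect` is a hypothetical helper and the list of pieces stands in for a real stream.

```python
import io
import sys


def stream_and_collect(pieces, out=None):
    """Write each piece as it arrives and return the joined text."""
    out = sys.stdout if out is None else out
    collected = []
    for piece in pieces:
        out.write(piece)
        out.flush()
        collected.append(piece)
    return "".join(collected)


# Any iterator of strings works here. With the library installed this could
# be model.stream("The ancient library of") or model.stream_chat(...).
buffer = io.StringIO()
text = stream_and_collect(iter(["The ", "ancient ", "library"]), out=buffer)
```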

model.info() → dict

info = model.info()
# {
#   "version": "1.0.0",
#   "parameters": 109529856,
#   "device": "cpu",
#   "dtype": "torch.bfloat16",
#   "vocab_size": 32000,
#   "n_layers": 12,
#   "n_heads": 12,
#   "d_model": 768,
#   "context_length": 512,
# }

Model Architecture

EKA-1 is a decoder-only Transformer with the following design choices:

| Component | Implementation |
| --- | --- |
| Normalisation | RMSNorm (no bias, no mean subtraction) |
| Position encoding | Rotary Position Embeddings (RoPE); no learned position table |
| Feed-forward | SwiGLU: down(silu(gate(x)) ⊙ up(x)) |
| Attention | Multi-Head Causal Attention via PyTorch scaled_dot_product_attention |
| Weight sharing | Weight tying: embedding and LM head share parameters |
| Architecture | Pre-norm residual blocks |
| Bias | None (all linear layers are bias-free) |
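
The normalisation, feed-forward and pre-norm rows can be illustrated with a dependency-free sketch. It works on plain Python lists rather than tensors, the `eps` value is an assumption, and all function names are hypothetical. It shows the formulas and is not the model's actual code.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by its root-mean-square: no mean subtraction, no bias."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Bias-free linear layer: one output per matrix row."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu(x, w_gate, w_up, w_down):
    """down(silu(gate(x)) ⊙ up(x)) with bias-free projections."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def pre_norm_ffn_block(x, norm_weight, w_gate, w_up, w_down):
    """Pre-norm residual: x + FFN(RMSNorm(x))."""
    h = swiglu(rms_norm(x, norm_weight), w_gate, w_up, w_down)
    return [a + b for a, b in zip(x, h)]


# Toy example: 2-dim input, 3-dim hidden layer.
x = [3.0, 4.0]
y = pre_norm_ffn_block(
    x,
    norm_weight=[1.0, 1.0],
    w_gate=[[0.5, 0.0], [0.0, 0.5], [0.1, 0.1]],
    w_up=[[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]],
    w_down=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
```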

The attention implementation uses PyTorch's built-in SDPA, which dispatches to the fastest kernel available for the given inputs and hardware: a FlashAttention kernel (FlashAttention-2 from PyTorch 2.2 onward), memory-efficient attention, or the reference math implementation.


Model Configuration

{
  "vocab_size": 32000,
  "n_layers": 12,
  "n_heads": 12,
  "n_kv_heads": 12,
  "d_model": 768,
  "d_ffn": 3072,
  "context_length": 512,
  "dropout": 0.0,
  "bias": false
}

| Parameter | Value |
| --- | --- |
| Total parameters | 109,529,856 |
| Embedding dimension | 768 |
| Attention heads | 12 × 64-dim heads |
| Transformer layers | 12 |
| FFN hidden dim | 2048 (d_ffn of 3072 scaled by the SwiGLU 2/3 ratio) |
| Vocabulary | 32,000 BPE tokens (SentencePiece) |
| Context length | 512 tokens |
| Training data | Project Gutenberg |
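
The stated total can be reproduced from the configuration with simple arithmetic. The breakdown below is an assumed layout (two RMSNorm weight vectors per layer plus one final norm, tied embedding and LM head, no biases). It uses the effective FFN hidden size of 2048, and under those assumptions the sum matches the documented figure exactly.

```python
def count_parameters(vocab_size=32000, d_model=768, n_layers=12, d_hidden=2048):
    """Parameter count under the assumed layout described above."""
    embedding = vocab_size * d_model       # shared with the LM head (weight tying)
    attention = 4 * d_model * d_model      # Q, K, V and output projections, no bias
    feed_forward = 3 * d_model * d_hidden  # gate, up and down projections, no bias
    norms = 2 * d_model                    # assumed: one RMSNorm before each sub-layer
    final_norm = d_model                   # assumed: one RMSNorm before the LM head
    return embedding + n_layers * (attention + feed_forward + norms) + final_norm


print(count_parameters())  # 109529856
```

Roughly 24.6M of the parameters sit in the shared embedding table and about 85M in the 12 Transformer layers.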

Examples

Run the provided example scripts from the repository root:

# Interactive text generation
python examples/generation.py --prompt "The fall of Rome" --max_tokens 200

# Interactive chat REPL
python examples/chat.py
python examples/chat.py --stream       # streaming mode
python examples/chat.py --device cuda  # GPU

# Throughput benchmark
python examples/benchmark.py --runs 10 --max_tokens 256
python examples/benchmark.py --device cuda

Development

# Clone and install in editable mode with dev extras
git clone https://github.com/eka-ai/eka-ai.git
cd eka-ai
pip install -e ".[dev]"

# Run the full test suite
pytest tests/ -v

# Run specific test file
pytest tests/test_model.py -v

# Lint
ruff check eka_ai/ tests/
black --check eka_ai/ tests/ examples/

# Auto-format
black eka_ai/ tests/ examples/
isort eka_ai/ tests/

# Type check
mypy eka_ai/

Project Structure

eka-ai/
│
├── eka_ai/
│   ├── __init__.py       # Public API exports
│   ├── config.py         # EKAConfig dataclass
│   ├── model.py          # EKA1Model architecture
│   ├── tokenizer.py      # EKATokenizer (SentencePiece wrapper)
│   ├── downloader.py     # Automatic model/tokenizer download
│   ├── generation.py     # EKA high-level API
│   ├── utils.py          # Sampling utilities
│   └── py.typed          # PEP 561 marker
│
├── examples/
│   ├── chat.py           # Interactive chat REPL
│   ├── generation.py     # Text generation demo
│   └── benchmark.py      # Throughput benchmark
│
├── tests/
│   ├── test_config.py    # Config unit tests
│   ├── test_model.py     # Architecture unit tests
│   ├── test_utils.py     # Sampling utility tests
│   └── test_generation.py  # API integration tests
│
├── .github/workflows/
│   ├── publish.yml       # PyPI publishing workflow
│   └── tests.yml         # CI test matrix
│
├── README.md
├── LICENSE               # Apache 2.0
├── setup.py
├── pyproject.toml
├── requirements.txt
└── MANIFEST.in

Citation

If you use EKA-1 in your research, please cite:

@misc{eka1_2024,
  title  = {{EKA-1}: A 109M Parameter Historical Language Model},
  author = {chvkrsubhash},
  year   = {2026},
  url    = {https://github.com/eka-ai/eka-ai},
  note   = {Trained from scratch on Project Gutenberg}
}

License

This project is licensed under the Apache License 2.0 — see LICENSE for details.

The training data is sourced from Project Gutenberg, which consists of public-domain works.

Download files

Source Distribution

eka_ai-1.0.0.tar.gz (36.4 kB)

Uploaded Source

Built Distribution

eka_ai-1.0.0-py3-none-any.whl (25.9 kB)

Uploaded Python 3

File details

Details for the file eka_ai-1.0.0.tar.gz.

File metadata

  • Download URL: eka_ai-1.0.0.tar.gz
  • Upload date:
  • Size: 36.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for eka_ai-1.0.0.tar.gz
Algorithm Hash digest
SHA256 60b879bd504fa45a7b8e7848185359447d44ac1840e77f5f9b97f3d17d5aa36b
MD5 f04017f3d1f00a107c031f809ec6bcaf
BLAKE2b-256 71e0afe18b80a1335d41d9613c69681b68ed85126252a298edfe52deb18db08c

File details

Details for the file eka_ai-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: eka_ai-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 25.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for eka_ai-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 28b2c9c3b93923b1bbb45b7440134a05f6821fbf572e53588e8b9b2fb3587f14
MD5 ffbf5f8fab5f556813fe471370432d2d
BLAKE2b-256 49c97b6587188c09e0974774a09050674173a5cf764ee00e94b21d221193ca30
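
A downloaded file can be checked against the SHA256 digests listed above before installing. A minimal sketch using only the standard library; `sha256_of` and `matches` are illustrative helper names, not part of eka-ai, and the commented call assumes the wheel sits in the current directory.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Against the real wheel this would be:
# matches("eka_ai-1.0.0-py3-none-any.whl",
#         "28b2c9c3b93923b1bbb45b7440134a05f6821fbf572e53588e8b9b2fb3587f14")

# Self-check on a throwaway file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"eka")
print(matches(tmp.name, hashlib.sha256(b"eka").hexdigest()))
os.remove(tmp.name)
```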
