Load and run BCLM language models with PyTorch.

BCLM-PyTorch Documentation

bclm-pytorch is the official Python library for loading and running inference with BCLM language models.

Installation

pip install bclm-pytorch

Dependencies installed automatically: torch, safetensors, tokenmonster, huggingface_hub.

Quick Start

import bclm

# Load a model from the Hugging Face Hub
model = bclm.load("bclm-1-small-preview")

# Multi-turn chat
chat = model.chat(system_prompt="You are a helpful assistant.")
print(chat.send("What is 2+2?"))
print(chat.send("Why is that the case?"))

# Streaming chat (real-time token output)
for chunk in chat.send_stream("Tell me a short story about a cat."):
    print(chunk, end="", flush=True)
print()

Loading Models

bclm.load() is the single entry point for all model sources.

From Hugging Face

# Short form — resolves to huggingface.co/bclm/bclm-1-small-preview
model = bclm.load("bclm-1-small-preview")

# Explicit repo ID
model = bclm.load("bclm/bclm-1-small-preview")

# Any HF repo
model = bclm.load("your-org/your-model")

From a Local Directory

Point to a directory containing config.json and model.safetensors:

model = bclm.load("/path/to/model/directory")

From a URL

Point to a directory served over HTTPS that contains config.json and model.safetensors:

model = bclm.load("https://example.com/models/bclm-1-small/")

Options

model = bclm.load(
    "bclm-1-small-preview",
    device="cuda",          # "cpu", "cuda", "cuda:0", etc. (default: auto)
    dtype=torch.bfloat16,   # torch.float16, torch.float32 (default: bf16 on GPU, f32 on CPU)
    compile=False,          # Enable torch.compile for inference (default: False)
)
  • device (default: auto) — "cpu", "cuda", or a specific device string such as "cuda:0". Auto-selects CUDA if available.
  • dtype (default: auto) — torch.bfloat16 on CUDA, torch.float32 on CPU. Weights ship as float16 but were trained in bfloat16.
  • compile (default: False) — Wrap the model with torch.compile. Off by default to avoid warmup latency.

Chat Interface

The chat interface supports multi-turn conversations with automatic history management.

Basic Usage

chat = model.chat(
    system_prompt="You are a helpful assistant.",  # optional
    max_new_tokens=512,
    temperature=1.0,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.15,
    frequency_penalty=0.1,
)

# Each call appends to the conversation history
response = chat.send("Hello, who are you?")
print(response)

follow_up = chat.send("Can you elaborate?")
print(follow_up)

Streaming

for chunk in chat.send_stream("Write a poem about the ocean."):
    print(chunk, end="", flush=True)
print()

Per-Message Overrides

Override generation parameters for a single message without changing the session defaults:

response = chat.send(
    "Give me a one-word answer: is the sky blue?",
    max_new_tokens=10,
    temperature=0.1,
    repetition_penalty=1.0,   # disable for this message
    frequency_penalty=0.0,
)

History Management

# View conversation history
for msg in chat.messages:
    print(f"{msg['role']}: {msg['content'][:80]}")

# Clear history (keeps system prompt)
chat.clear()

Interactive CLI

Launch a blocking interactive chat loop in the terminal:

chat.interactive()

Commands inside the interactive loop:

  • /clear — reset conversation
  • /history — print message list
  • /help — show commands
  • Ctrl-C — exit
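The command handling can be sketched as a small dispatch loop. The function below is a hypothetical illustration, not the library's implementation; I/O is injected as callables so the loop can be driven programmatically instead of from a terminal:

```python
def run_interactive(send, read_line, write):
    """Minimal chat-REPL sketch: dispatch /-commands, otherwise send to the model.

    send: callable taking a user message and returning the model reply.
    read_line: callable returning the next input line (raises EOFError to exit).
    write: callable accepting output strings.
    """
    history = []
    while True:
        try:
            line = read_line()
        except (EOFError, KeyboardInterrupt):  # Ctrl-C / end of input exits
            return history
        if line == "/clear":
            history.clear()
            write("(history cleared)\n")
        elif line == "/history":
            for role, content in history:
                write(f"{role}: {content}\n")
        elif line == "/help":
            write("/clear  /history  /help  (Ctrl-C to exit)\n")
        else:
            reply = send(line)
            history.append(("user", line))
            history.append(("assistant", reply))
            write(reply + "\n")
```

A real implementation would call chat.send() (or chat.send_stream() for incremental output) in place of the injected send callable.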

Text Completion

For non-chat use cases (continuing a text prompt):

text = model.complete("Once upon a time", max_new_tokens=200, temperature=0.8)
print(text)

Streaming Completion

for chunk in model.complete_stream("The quick brown fox", max_new_tokens=100):
    print(chunk, end="", flush=True)
print()

Generation Parameters

All generation methods accept these parameters:

  • max_new_tokens (default: 512) — Maximum number of tokens to generate.
  • temperature (default: 1.0) — Sampling temperature. 0 = greedy; higher values are more random.
  • top_k (default: 50) — Restrict sampling to the top-k most likely tokens. None to disable.
  • top_p (default: 0.9) — Nucleus sampling threshold. None to disable.
  • repetition_penalty (default: 1.15) — Multiplicative penalty applied to every token already present in the context: positive logits are divided by the penalty, negative logits are multiplied by it, so both directions make the token less likely. 1.0 disables. The default is tuned for small language models, which are more prone to degenerate repetition.
  • frequency_penalty (default: 0.1) — Additive penalty proportional to how many times each token has appeared: logits are reduced by frequency_penalty × count, so frequently repeated tokens are penalized more strongly. 0.0 disables.
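The two penalty rules can be written out directly from their descriptions. The sketch below illustrates the documented math on a plain list of logits; it is not the library's internal code, which operates on tensors:

```python
from collections import Counter

def apply_penalties(logits, context_ids, repetition_penalty=1.15, frequency_penalty=0.1):
    """Return a copy of `logits` with both penalties applied.

    Repetition penalty: for every token id already in the context, divide
    positive logits by the penalty and multiply negative logits by it —
    both directions make the token less likely.
    Frequency penalty: subtract frequency_penalty * occurrence_count.
    """
    out = list(logits)
    for tok, count in Counter(context_ids).items():
        if out[tok] > 0:
            out[tok] /= repetition_penalty   # shrink positive logits
        else:
            out[tok] *= repetition_penalty   # push negative logits further down
        out[tok] -= frequency_penalty * count  # scales with how often the token appeared
    return out
```

With repetition_penalty=1.0 and frequency_penalty=0.0 the function is a no-op, which is why those values are documented as "disables".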

Model Information

model = bclm.load("bclm-1-small-preview")

# Architecture name (e.g., "BCLM1Model")
print(model.architecture)

# Model config object
print(model.config)

# Parameter count
print(f"{model.num_parameters / 1e6:.1f}M parameters")

# Device and dtype
print(model.device, model.dtype)

Advanced: Direct Access

For advanced use cases, you can access the underlying PyTorch modules:

model = bclm.load("bclm-1-small-preview")

# Raw nn.Module (e.g., BCLM1Model)
raw = model.raw_model

# Inference wrapper (e.g., BCLM1ForGeneration)
gen = model.generator

# Tokenizer
tok = model.tokenizer

Tokenizer

The current tokenizer backend is TokenMonster. The tokenizer spec is embedded in each model's config.json (e.g., "tokenmonster:english-32000-consistent-v1"), so the correct tokenizer is loaded automatically.

tok = model.tokenizer

ids = tok.encode("Hello world")
text = tok.decode(ids)
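The spec string pairs a backend name with a vocabulary name, separated by a colon. A minimal parser for that format might look like the following (the function name is illustrative, not part of the library's API):

```python
def parse_tokenizer_spec(spec):
    """Split a 'backend:vocab' tokenizer spec into its two parts.

    e.g. 'tokenmonster:english-32000-consistent-v1'
         -> ('tokenmonster', 'english-32000-consistent-v1')
    """
    backend, sep, vocab = spec.partition(":")
    if not sep or not backend or not vocab:
        raise ValueError(f"Invalid tokenizer spec: {spec!r}")
    return backend, vocab
```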

Environment Variables

  • BCLM_TOKENIZER — Override the tokenizer spec (e.g., tokenmonster:english-32000-consistent-v1).
  • BCLM_TOKENMONSTER_DIR — Custom cache directory for TokenMonster vocab files.
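An override variable like BCLM_TOKENIZER typically takes precedence over the spec embedded in config.json. A sketch of that resolution order (the function name is hypothetical):

```python
import os

def resolve_tokenizer_spec(config_spec):
    """Return the tokenizer spec, letting BCLM_TOKENIZER override config.json."""
    return os.environ.get("BCLM_TOKENIZER", config_spec)
```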

Config Format

Model directories must contain a config.json:

{
  "architecture": "BCLM1Model",
  "model": {
    "vocab_size": 32768,
    "tokenizer": "tokenmonster:english-32000-consistent-v1",
    "embed_dim": 384,
    "n_layers": 12,
    "max_seq_len": 16384,
    "dropout": 0.0,
    "attn_heads": 6,
    "attn_kv_heads": 2,
    "local_attn_layers": [1, 5, 7, 11],
    "global_attn_layers": [3, 9],
    "attn_window_size": 1024,
    "conv_kernel_size": 4,
    "osc_n_pairs": 1,
    "osc_n_real": 16,
    "osc_clamp_min_decay": 1e-05,
    "bigram_table_factor": 5
  }
}

The "architecture" field determines which model class is instantiated. Weights should be in model.safetensors (safetensors format).
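The dispatch on "architecture" can be sketched as a lookup in a class registry. The registry and function below are a hypothetical illustration of that scheme, not the library's actual loader:

```python
import json
from pathlib import Path

# Hypothetical registry mapping "architecture" strings to model classes.
ARCHITECTURES = {
    "BCLM1Model": object,  # stand-in for the real model class
}

def read_model_config(model_dir):
    """Read config.json from a model directory and resolve the model class.

    Returns (model_class, model_config_dict); raises ValueError for an
    unrecognized architecture, mirroring the error-handling section below.
    """
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    arch = cfg["architecture"]
    if arch not in ARCHITECTURES:
        raise ValueError(f"Unknown architecture: {arch}")
    return ARCHITECTURES[arch], cfg["model"]
```

After this step, a loader would instantiate the class from the "model" sub-dict and fill its parameters from model.safetensors.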

Error Handling

import bclm

try:
    model = bclm.load("nonexistent-model")
except FileNotFoundError:
    print("Model not found")
except ValueError as e:
    print(f"Invalid model: {e}")
except ImportError as e:
    print(f"Missing dependency: {e}")

Requirements

  • Python ≥ 3.9
  • PyTorch ≥ 2.1
  • safetensors
  • tokenmonster
  • huggingface_hub

Optional:

  • xformers — enables optimized attention kernels on CUDA
