cereproc.py

old/cereproc.py processes large documents by splitting them into chunks suitable for the Cerebras qwen-3-coder-480b model, generating completions for each chunk, and reassembling the results while maintaining context.

Quick Start

export CEREBRAS_API_KEY="csk-..."
uv run old/cereproc.py --input_data document.md --output_data document.out.md

Add optional guidance using inline prompts or instruction files:

uv run old/cereproc.py \
  --input_data huge.md \
  --file_prompt prompts/style.md \
  --prompt "Write concise technical summaries." \
  --data_format code \
  --chunk_size 28000 \
  --sample_size 256 \
  --verbose

CLI

NAME
    cerebrate-file - Process large documents by chunking for Cerebras qwen-3-coder-480b

SYNOPSIS
    cerebrate-file INPUT_DATA <flags>

POSITIONAL ARGUMENTS
    INPUT_DATA
        Path to input file to process

FLAGS
    -o, --output_data=OUTPUT_DATA
        Output file path (default: overwrite input)
    -f, --file_prompt=FILE_PROMPT
        Path to file with initial instructions
    -p, --prompt=PROMPT
        Inline prompt text (appended after file_prompt)
    -c, --chunk_size=CHUNK_SIZE
        Target max chunk size in tokens (default: 32000)
    --max_tokens_ratio=MAX_TOKENS_RATIO
        Completion budget as % of chunk size (default: 100)
    --data_format=DATA_FORMAT
        Chunking strategy: text | semantic | markdown | code (default: markdown)
    -s, --sample_size=SAMPLE_SIZE
        Tokens from previous request/response to maintain context (default: 200)
    --temp=TEMP
        Model temperature (default: 0.7)
    --top_p=TOP_P
        Model top-p sampling (default: 0.8)
    --model=MODEL
        Override default model name (default: qwen-3-coder-480b)
    -v, --verbose
        Enable debug logging
    -e, --explain
        Parse and update frontmatter metadata
    --dry_run
        Show chunking details without calling the API
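
As a rough illustration of how --max_tokens_ratio relates to chunk size, the sketch below computes a completion budget as a percentage of a chunk's token count. The function name, the integer rounding, and the choice to base the budget on the chunk's token count are assumptions made for illustration, not the tool's confirmed behavior.

```python
def completion_budget(chunk_tokens: int, max_tokens_ratio: int = 100) -> int:
    """Return a completion token budget as a percentage of the chunk's tokens.

    Hypothetical helper: the name and the floor rounding are assumptions.
    """
    if chunk_tokens < 0 or max_tokens_ratio < 0:
        raise ValueError("chunk_tokens and max_tokens_ratio must be non-negative")
    # Integer arithmetic keeps the result an exact whole number of tokens.
    return chunk_tokens * max_tokens_ratio // 100


print(completion_budget(32000))      # default ratio of 100 percent
print(completion_budget(28000, 50))  # half of the chunk size
```

With the default ratio of 100, a 32000-token chunk would get a completion budget of the same size; lowering the ratio shrinks the budget proportionally.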

Streaming via STDIN/STDOUT

Use - to read from stdin or write to stdout:

cat huge.md | uv run cerebrate_file --input_data - --output_data - > processed.md
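
The "-" convention can be handled with a pair of small helpers. This is a minimal sketch; read_input and write_output are hypothetical names, and the tool's actual implementation may differ.

```python
import sys
from pathlib import Path


def read_input(path: str) -> str:
    """Read the whole input, treating "-" as stdin (hypothetical helper)."""
    if path == "-":
        return sys.stdin.read()
    return Path(path).read_text(encoding="utf-8")


def write_output(path: str, text: str) -> None:
    """Write the result, treating "-" as stdout (hypothetical helper)."""
    if path == "-":
        sys.stdout.write(text)
        return
    Path(path).write_text(text, encoding="utf-8")
```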

Processing Pipeline

1. Load .env and validate CEREBRAS_API_KEY and the CLI arguments.
2. Build the base prompt from --file_prompt and --prompt, separated by two newlines, and count its tokens.
3. Read the input file, preserving frontmatter. Parse metadata if --explain is enabled.
4. Split the document body using one of these strategies:
   - text: line-based greedy splitting
   - semantic: paragraph-aware via semantic-text-splitter
   - markdown: structure-preserving Markdown splitting
   - code: regex-based source code boundaries
5. For each chunk, optionally prepend/append continuity examples (--sample_size tokens each) from the previous request and response, keeping the total under the 131K-token limit.
6. Stream responses from Cerebras, retrying with backoff on transient errors (tenacity).
7. Write the final output atomically. Update frontmatter if --explain is active.
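
Three of the steps above (building the base prompt, line-based greedy splitting, and the atomic write) can be sketched with the standard library alone. All function names here are hypothetical, the whitespace-based count_tokens is only a stand-in for the real tokenizer, and skipping empty prompt parts is an assumption; treat this as an illustration of the techniques rather than the tool's code.

```python
import os
import tempfile
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Stand-in counter (whitespace words); an assumption, not the real tokenizer."""
    return len(text.split())


def build_base_prompt(file_prompt: str, prompt: str) -> str:
    """Join the non-empty prompt parts with two newlines."""
    return "\n\n".join(part for part in (file_prompt, prompt) if part)


def greedy_line_chunks(
    text: str, chunk_size: int, counter: Callable[[str], int] = count_tokens
) -> List[str]:
    """Pack whole lines into chunks of at most chunk_size tokens.

    A single line larger than chunk_size still becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in text.splitlines(keepends=True):
        n = counter(line)
        # Close the current chunk when adding this line would overflow it.
        if current and current_tokens + n > chunk_size:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n
    if current:
        chunks.append("".join(current))
    return chunks


def atomic_write(path: str, text: str) -> None:
    """Write to a temp file in the target directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Creating the temporary file in the destination directory keeps the final rename on one filesystem, so readers see either the old file or the complete new one.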

Explain Mode Metadata

When --explain is set, the script looks for frontmatter containing:

title
author
id
type
date

Missing fields are filled via a structured JSON query to the model. Use --dry_run to preview parsed metadata without making network calls.
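
Detecting which fields still need to be filled could look like the sketch below. It operates on an already parsed metadata dictionary; missing_fields is a hypothetical name, and treating empty strings as missing is an assumption.

```python
from typing import Dict, List

# Field names taken from the list above.
REQUIRED_FIELDS = ("title", "author", "id", "type", "date")


def missing_fields(metadata: Dict[str, object]) -> List[str]:
    """Return the required frontmatter fields that are absent or empty."""
    return [
        name
        for name in REQUIRED_FIELDS
        if metadata.get(name) in (None, "")  # assumption: "" counts as missing
    ]


print(missing_fields({"title": "Notes", "author": ""}))
```

Only the fields returned here would need to be requested from the model; complete frontmatter would require no query at all.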

Dry Run Workflow

Use --dry_run to inspect:

Chunk sizes
Token budgets
Message structure

No API calls are made in this mode.

Dependencies

Install with uv or your preferred package manager:

fire
loguru
python-dotenv
tenacity
cerebras-cloud-sdk
semantic-text-splitter
qwen-tokenizer
tqdm
python-frontmatter

Environment Setup

Set CEREBRAS_API_KEY before running. The tool will warn about placeholder keys and validate basic formatting. Use --verbose for extra runtime info and rate-limit headers.
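
A basic key check might resemble the following sketch. The "csk-" prefix is inferred from the Quick Start example, and the placeholder markers and the check_api_key name are illustrative assumptions; the tool's real validation rules are not documented here.

```python
import os
from typing import List, Optional

# Hypothetical markers that suggest a copied template value.
PLACEHOLDER_MARKERS = ("...", "your", "xxx")


def check_api_key(key: Optional[str]) -> List[str]:
    """Return human-readable problems found with the key (empty list if none)."""
    if not key:
        return ["CEREBRAS_API_KEY is not set"]
    problems: List[str] = []
    if not key.startswith("csk-"):  # assumption based on the Quick Start example
        problems.append("key does not start with 'csk-'")
    if any(marker in key.lower() for marker in PLACEHOLDER_MARKERS):
        problems.append("key looks like a placeholder")
    return problems


for problem in check_api_key(os.environ.get("CEREBRAS_API_KEY")):
    print(f"warning: {problem}")
```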

Testing Tips

1. Run with --dry_run to check chunking logic quickly.
2. Test on a small sample file with --verbose to observe:
   - Context blending between chunks
   - Output statistics
3. Only then run on larger inputs.

Release history

1.0.36

Mar 6, 2026

1.0.35

Mar 6, 2026

1.0.34

Feb 22, 2026

1.0.32

Nov 27, 2025

1.0.31

Nov 14, 2025

1.0.30

Nov 14, 2025

1.0.28

Nov 6, 2025

This version

1.0.26

Sep 28, 2025

1.0.25

Sep 24, 2025

1.0.24

Sep 20, 2025

1.0.23

Sep 20, 2025

1.0.22

Sep 20, 2025

1.0.21

Sep 20, 2025

1.0.20

Sep 20, 2025

1.0.19

Sep 20, 2025

1.0.18

Sep 20, 2025

1.0.17

Sep 20, 2025

1.0.16

Sep 20, 2025

1.0.15

Sep 20, 2025

1.0.14

Sep 20, 2025

1.0.13

Sep 20, 2025

1.0.12

Sep 19, 2025

1.0.11

Sep 19, 2025

1.0.10

Sep 19, 2025

1.0.9

Sep 19, 2025

1.0.8

Sep 19, 2025

1.0.7

Sep 19, 2025

1.0.6

Sep 19, 2025

1.0.5

Sep 19, 2025

1.0.4

Sep 19, 2025

Download files

Source Distribution

cerebrate_file-1.0.26.tar.gz (11.4 kB)

Uploaded Sep 28, 2025 Source

Built Distribution

cerebrate_file-1.0.26-py3-none-any.whl (56.2 kB)

Uploaded Sep 28, 2025 Python 3

File details

Details for the file cerebrate_file-1.0.26.tar.gz.

File metadata

Download URL: cerebrate_file-1.0.26.tar.gz
Upload date: Sep 28, 2025
Size: 11.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.15

File hashes

Hashes for cerebrate_file-1.0.26.tar.gz
Algorithm	Hash digest
SHA256	`0028a48c1a2f4de9d37b3b9d82d604239929baf76505393c31a93d9c062ef7ef`
MD5	`f43909986a541db264eba5dc9f6a8e82`
BLAKE2b-256	`b26e3e75bce688cb8dc4ca2cdd7ae5899a7d2f3d15d3fa0167cc826fa74766a4`

File details

Details for the file cerebrate_file-1.0.26-py3-none-any.whl.

File metadata

Download URL: cerebrate_file-1.0.26-py3-none-any.whl
Upload date: Sep 28, 2025
Size: 56.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.15

File hashes

Hashes for cerebrate_file-1.0.26-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6783b74f09ec5f56062d1102d9bdbbd9939d0eb3c20ee8b1341b7edb84e33770`
MD5	`3c6ffd19683151ad73dda3d93b26862b`
BLAKE2b-256	`1cde99bff721f5291b1c145c8484b65ff0ef5f51fd6bc6c2b5fdccee24e53b01`

cerebrate-file 1.0.26
