No project description provided

These details have not been verified by PyPI

Project links

Project description

this_file: README.md

cereproc.py

old/cereproc.py is a single-file utility that splits oversized documents into Cerebras-friendly chunks, calls the qwen-3-coder-480b chat completion model for each chunk, and stitches the results back together while keeping context intact.

Quick Start

export CEREBRAS_API_KEY="csk-..."
uv run old/cereproc.py --input_data document.md --output_data document.out.md

Add optional guidance by supplying an inline prompt or a separate instructions file:

uv run old/cereproc.py \
  --input_data huge.md \
  --file_prompt prompts/style.md \
  --prompt "Write concise technical summaries." \
  --data_format code \
  --chunk_size 28000 \
  --sample_size 256 \
  --verbose

CLI Flags

--input_data PATH (required) Text/Markdown/code file to process.
--output_data PATH Destination file (defaults to the input path).
--file_prompt PATH Load reusable instructions; appended before the inline prompt.
--prompt TEXT Freeform instructions appended after the file prompt.
--chunk_size INT Target chunk size in tokens (default 32000).
--data_format text|semantic|markdown|code Chunking strategy (default markdown).
--sample_size INT Continuity example size in tokens (default 200, use 0 to disable).
--max_tokens_ratio INT Completion budget as % of chunk tokens (default 100).
--temp FLOAT and --top_p FLOAT Sampling controls (defaults 0.7 / 0.8).
--model TEXT Cerebras model name override (default qwen-3-coder-480b).
--verbose Enable detailed logging and chunk previews.
--dry_run Inspect chunking and request envelopes without calling the API.
--explain Parse Markdown frontmatter, ensure required metadata fields, and ask the model to fill gaps before processing.

Processing Pipeline

Load .env values and validate CEREBRAS_API_KEY plus CLI arguments.
Build a base prompt from --file_prompt and --prompt (always separated by two newlines) and count its tokens.
Read the input file (frontmatter preserved) and optionally parse metadata when --explain is active.
Chunk the body using the selected strategy:
- text: greedy line-based splitting.
- semantic: paragraph-aware via semantic-text-splitter.
- markdown: structure-aware Markdown splitter.
- code: regex-guided boundaries for source files.
For each chunk, optionally blend in continuity examples drawn from the previous request/response pair (--sample_size tokens each way), truncated to stay within the 131K-token context budget.
Stream completions from Cerebras with adaptive rate-limit backoff and retry (tenacity) on transient failures.
Write the concatenated result atomically, preserving or updating frontmatter when --explain metadata is present.

Explain Mode Metadata

When --explain is set, the script expects frontmatter containing title, author, id, type, and date. Missing keys trigger a structured JSON request to the model that fills only the absent values. Dry-run mode skips this network call while still showing parsed metadata.

Dry-Run Workflow

Use --dry_run to sanity-check chunk sizes, token budgets, and message shapes without spending quota. The script prints the first two chunk envelopes, token counts, and previews, then exits before creating the Cerebras client.

Dependencies

Install requirements with uv (or your preferred tool):

fire
loguru
python-dotenv
tenacity
cerebras-cloud-sdk
semantic-text-splitter
qwen-tokenizer
tqdm
python-frontmatter

Environment

Set CEREBRAS_API_KEY before running. The utility warns on placeholder keys and gently validates formatting. Use --verbose to surface additional runtime information and rate-limit headers.

Testing Tips

Run with --dry_run for fast validation, then process a short sample file in --verbose mode to observe continuity handling and output statistics before you launch against larger documents.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.36

Mar 6, 2026

1.0.35

Mar 6, 2026

1.0.34

Feb 22, 2026

1.0.32

Nov 27, 2025

1.0.31

Nov 14, 2025

1.0.30

Nov 14, 2025

1.0.28

Nov 6, 2025

1.0.26

Sep 28, 2025

1.0.25

Sep 24, 2025

1.0.24

Sep 20, 2025

1.0.23

Sep 20, 2025

1.0.22

Sep 20, 2025

1.0.21

Sep 20, 2025

1.0.20

Sep 20, 2025

1.0.19

Sep 20, 2025

1.0.18

Sep 20, 2025

1.0.17

Sep 20, 2025

1.0.16

Sep 20, 2025

1.0.15

Sep 20, 2025

1.0.14

Sep 20, 2025

1.0.13

Sep 20, 2025

1.0.12

Sep 19, 2025

1.0.11

Sep 19, 2025

1.0.10

Sep 19, 2025

1.0.9

Sep 19, 2025

1.0.8

Sep 19, 2025

1.0.7

Sep 19, 2025

1.0.6

Sep 19, 2025

1.0.5

Sep 19, 2025

This version

1.0.4

Sep 19, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cerebrate_file-1.0.4.tar.gz (8.6 kB view details)

Uploaded Sep 19, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

cerebrate_file-1.0.4-py3-none-any.whl (42.1 kB view details)

Uploaded Sep 19, 2025 Python 3

File details

Details for the file cerebrate_file-1.0.4.tar.gz.

File metadata

Download URL: cerebrate_file-1.0.4.tar.gz
Upload date: Sep 19, 2025
Size: 8.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.15

File hashes

Hashes for cerebrate_file-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`40560655def034a8e9d23a42c1c27f15d61745a78f507980b10b643ab2572cf1`
MD5	`3548bfca7fa334b0ecea9fb856090852`
BLAKE2b-256	`2e09f95c6820b95fa8c914590dbb256cc6dcdda152b875c30649ac7b8106a6c9`

See more details on using hashes here.

File details

Details for the file cerebrate_file-1.0.4-py3-none-any.whl.

File metadata

Download URL: cerebrate_file-1.0.4-py3-none-any.whl
Upload date: Sep 19, 2025
Size: 42.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.15

File hashes

Hashes for cerebrate_file-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9dedd1825849c9104c5f31849587a3e6c6f83a7c55728b70e8231e63309848e5`
MD5	`7d0ee25da578dacaa87a6c575201689a`
BLAKE2b-256	`814deeb433b5af191226c736b4be2b016b25941c31794d49a98860ed3db48f46`

See more details on using hashes here.

cerebrate-file 1.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

this_file: README.md

cereproc.py

Quick Start

CLI Flags

Processing Pipeline

Explain Mode Metadata

Dry-Run Workflow

Dependencies

Environment

Testing Tips

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes