Export Jupyter notebooks into narrated, cell-labeled text.
Project description
Neurobyte
The Notebook-to-LLM Code Optimizer.
Neurobyte prepares your Jupyter notebooks for maximum context-window efficiency and model comprehension. It is the bridge between your data science work and your AI coding assistant.
Features
- Context Optimization: Extracts only relevant code and structure, stripping high-noise metadata.
- LLM-Native Formats: Outputs XML, JSON, or Narrated Text, formats suited to Claude, GPT-4, and indexing pipelines.
- Smart Redaction: Automatically sanitizes API keys, secrets, and sensitive table references before they leave your machine.
- Token Efficiency: Estimates token usage (via tiktoken) so you can check what fits in your context window before you paste.
- Structured Outlines: Generates a high-level summary and table of contents to help models understand the "big picture" before reading code.
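To make the Context Optimization idea concrete, here is a minimal sketch of pulling only the code out of a notebook file. It relies on the standard nbformat v4 JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source"). The function names and the "# Cell N" label format are hypothetical and are not taken from Neurobyte's implementation.

```python
import json


def extract_code_cells(notebook_json):
    """Return code cells as {"index", "source"} dicts, dropping outputs and metadata."""
    notebook = json.loads(notebook_json)
    cells = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append({"index": index, "source": source})
    return cells


def render_text(cells):
    """Join cells into cell-labeled plain text (label format is an assumption)."""
    return "\n\n".join(f"# Cell {c['index']}\n{c['source']}" for c in cells)


# A tiny in-memory notebook: one markdown cell, one code cell with output.
SAMPLE = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {"collapsed": False},
            "execution_count": 1,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["3\n"]}],
            "source": ["x = 1 + 2\n", "print(x)"],
        },
    ],
})

if __name__ == "__main__":
    print(render_text(extract_code_cells(SAMPLE)))
```

Kernel metadata, execution counts, and stored outputs never reach the rendered text, which is where most of the token savings in a typical notebook would come from.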
Installation
pip install neurobyte
Quick Start
# Optimize for Claude/Anthropic (XML)
neurobyte export notebook.ipynb --xml
# Optimize for GPT-4 (Text)
neurobyte export notebook.ipynb
Git Integration
Neurobyte provides a pre-commit hook to automatically export notebooks to text for better diffs.
Add this to your .pre-commit-config.yaml:
- repo: https://github.com/EllordParis/neurobyte
  rev: main
  hooks:
    - id: neurobyte-export
Development
This project uses make for common development tasks.
# Install dependencies
make install
# Run tests (Pytest)
make test
# Linting & Formatting (Ruff, Black, Mypy)
make lint
make format
Optimization Strategies
Choosing the Right Format
- XML (--xml): Best for Claude 3 (Opus/Sonnet) and newer models. The structural tags help the model tell code, markdown, and metadata apart, which can reduce hallucinations.
- JSON (--json): Ideal for RAG pipelines or agentic workflows. Lets you index specific cells or function definitions into a vector database.
- Text (Default): Best for GPT-4 copy-paste or smaller context windows where token overhead must be minimal.
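As a rough illustration of what tag-delimited output can look like, the sketch below wraps cells in XML using Python's standard xml.etree.ElementTree. The element and attribute names (notebook, cell, index, type) are assumptions made for this example; Neurobyte's actual XML schema may differ.

```python
import xml.etree.ElementTree as ET


def cells_to_xml(cells):
    """Serialize (cell_type, source) pairs into one XML string."""
    root = ET.Element("notebook")
    for index, (cell_type, source) in enumerate(cells):
        cell = ET.SubElement(root, "cell", {"index": str(index), "type": cell_type})
        # ElementTree escapes &, < and > in text when serializing.
        cell.text = source
    return ET.tostring(root, encoding="unicode")


cells = [
    ("markdown", "# Load data"),
    ("code", "df = load('a.csv')\nprint(df < 3)"),
]
xml_text = cells_to_xml(cells)

if __name__ == "__main__":
    print(xml_text)
```

Because each cell carries its type as an attribute, a model (or a parser) can separate code from prose without guessing from the content.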
Reducing Context Noise
- Redact Aggressively: Use --redact-pattern to strip noisy IDs or non-secret internal references that might confuse the model.
- Filter Custom Cells: If your notebook has 50 cells but you only need the feature engineering section, use --cells 10-25.
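The two options above boil down to a range filter and a regex substitution. The sketch below shows one plausible way to implement each with the standard library. It assumes the range is inclusive on both ends and that matches are replaced with a fixed placeholder; neither detail is confirmed by the Neurobyte documentation, and the helper names are hypothetical.

```python
import re


def parse_cell_range(spec):
    """Turn '10-25' into [10, ..., 25]; a bare '7' selects a single cell.

    Inclusive bounds are an assumption for this sketch.
    """
    start, _, end = spec.partition("-")
    first = int(start)
    last = int(end) if end else first
    if last < first:
        raise ValueError(f"invalid cell range: {spec!r}")
    return list(range(first, last + 1))


def redact(text, patterns, placeholder="[REDACTED]"):
    """Replace every match of each regex pattern with a placeholder."""
    for pattern in patterns:
        text = re.sub(pattern, placeholder, text)
    return text


if __name__ == "__main__":
    print(parse_cell_range("10-25")[:3])
    print(redact("query(client_id=4821, region='eu')", [r"client_id=\d+"]))
```

Keeping the pattern narrow (client_id=\d+ instead of \d+) avoids accidentally redacting numeric literals that the model needs in order to follow the code.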
Reference & Advanced Capabilities
CLI Commands
# Basic export (Text format)
neurobyte export notebook.ipynb
# XML output (Best for Claude/Anthropic models)
neurobyte export notebook.ipynb --xml
# JSON output (Best for RAG/Indexing)
neurobyte export notebook.ipynb --json
# Check token usage before exporting
neurobyte export notebook.ipynb --estimate-tokens
# Custom output path
neurobyte export notebook.ipynb -o context.xml --xml
# Include markdown cells in output
neurobyte export notebook.ipynb --include-markdown
# Custom redaction pattern
neurobyte export notebook.ipynb --redact-pattern 'client_id=\d+'
# Export specific cell range
neurobyte export notebook.ipynb --cells 1-5
# Disable redaction
neurobyte export notebook.ipynb --no-redact
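For a sense of what a token estimate involves, here is a small sketch built on tiktoken's get_encoding and encode calls. The cl100k_base encoding, the fallback of roughly four characters per token, and the function names are choices made for this example, not a description of what --estimate-tokens does internally. Counts from an OpenAI tokenizer are only approximate for non-OpenAI models.

```python
def estimate_tokens(text, encoding_name="cl100k_base"):
    """Estimate the token count of text.

    Uses tiktoken when it is installed and its encoding data can be loaded;
    otherwise falls back to a rough heuristic of about 4 characters per token.
    """
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:
        # tiktoken missing or encoding files unavailable: round up len / 4.
        return (len(text) + 3) // 4
    # disallowed_special=() treats strings like "<|endoftext|>" as plain text.
    return len(encoding.encode(text, disallowed_special=()))


def fits(text, budget):
    """Return True if the estimated token count is within the budget."""
    return estimate_tokens(text) <= budget


if __name__ == "__main__":
    exported = "import pandas as pd\ndf = pd.read_csv('data.csv')\n"
    print(estimate_tokens(exported), fits(exported, 8000))
```

Leaving headroom in the budget for the prompt and the model's reply is usually wiser than filling the window to its limit.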
Python API
import neurobyte as nb
from neurobyte.export import ExportOptions
# Basic export
nb.export_notebook("notebook.ipynb", "export.txt")
# With options
opts = ExportOptions(
    output_format="json",  # or "xml", "txt"
    include_markdown=True,
    redact_secrets=True,
    extra_redact_patterns=["client_id=\\d+"],
    cell_indices=[1, 2, 3],
)
nb.export_notebook("notebook.ipynb", "export.json", options=opts)
# Export from live session (requires IPython)
nb.export_here("session.txt")
Project details
Source Distribution
neurobyte-0.2.2.tar.gz (18.7 kB)
File metadata
- Download URL: neurobyte-0.2.2.tar.gz
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0da9d7c514cea3a56bf833cfad48edf6a7a7c013a3107b76f0b54e7609b65a0f |
| MD5 | de95008e211c77f15b8d55a04609eebc |
| BLAKE2b-256 | f07fb843b989fec81d1fc82eaf57b85ea04d0fbd3bc0044f1029ebe02457a237 |