
A tool to serialize repository contents into a single file


Repo Serializer

A Python utility for serializing local Git repositories into a structured text file, capturing the directory structure (in ASCII format), file names, and contents of source files. It is well suited to producing a comprehensive snapshot of a repository for code review, documentation, or use with large language models (LLMs).
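
To make "directory structure in ASCII format" concrete, here is a minimal sketch of how such a tree could be drawn. It is not Repo Serializer's actual implementation: the function name `ascii_tree`, the connector characters, and the rule of skipping dot-prefixed names are assumptions chosen for illustration.

```python
import os


def ascii_tree(root, prefix=""):
    """Return a list of lines drawing the directory tree under `root`.

    Hypothetical helper for illustration only; the real tool's output
    format may differ.
    """
    lines = []
    # Skip hidden entries and sort so the output is deterministic.
    entries = sorted(name for name in os.listdir(root) if not name.startswith("."))
    for index, name in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "`-- " if is_last else "|-- "
        path = os.path.join(root, name)
        is_dir = os.path.isdir(path)
        lines.append(prefix + connector + name + ("/" if is_dir else ""))
        if is_dir:
            # Below the last entry there is nothing left to connect,
            # so pad with spaces instead of a vertical bar.
            child_prefix = prefix + ("    " if is_last else "|   ")
            lines.extend(ascii_tree(path, child_prefix))
    return lines
```

Calling `print("\n".join(ascii_tree("/path/to/repository")))` would print one line per file or directory, indented by depth.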

Installation

# Install from PyPI
pip install repo-serializer

# For development
pip install "repo-serializer[dev]"

Usage

Command Line

# Basic usage
repo-serializer /path/to/repository

# Specify output file
repo-serializer /path/to/repository -o output.txt

# Copy to clipboard in addition to saving to file
repo-serializer /path/to/repository -c

# Use structure-only mode to output only the directory structure and filenames
repo-serializer /path/to/repository -s

# Skip specific directories (can be used multiple times or as a comma-separated list)
repo-serializer /path/to/repository --skip-dir build,dist
repo-serializer /path/to/repository --skip-dir build --skip-dir dist

# Only include Python files (.py, .ipynb)
repo-serializer /path/to/repository --python

# Only include JavaScript/TypeScript files
repo-serializer /path/to/repository --javascript

# Combine with other options
repo-serializer /path/to/repository --python -s -c  # Python files, structure only, copy to clipboard

# Extract AI/LLM prompts from the repository
repo-serializer /path/to/repository -p
repo-serializer /path/to/repository --prompt -o prompts.txt
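
The --skip-dir option above accepts both repeated flags and a comma-separated list. The sketch below shows one way such an option could be parsed with `argparse`. It is an illustration, not the tool's own parser: `build_parser`, `normalize_skip_dirs`, and the program name are hypothetical.

```python
import argparse


def build_parser():
    """Build a tiny parser that mimics only the --skip-dir option (assumed shape)."""
    parser = argparse.ArgumentParser(prog="repo-serializer-sketch")
    parser.add_argument("repo_path")
    # action="append" collects every occurrence of the flag into a list;
    # the attribute stays None when the flag is never given.
    parser.add_argument("--skip-dir", action="append", default=None, dest="skip_dir")
    return parser


def normalize_skip_dirs(raw_values):
    """Split comma-separated values, drop blanks and duplicates, keep first-seen order."""
    names = []
    for value in raw_values or []:
        for name in value.split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names
```

With this sketch, `--skip-dir build,dist` and `--skip-dir build --skip-dir dist` both normalize to the same list of directory names.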

Python API

from repo_serializer import serialize

# Serialize a repository, skipping specific directories
serialize("/path/to/repository", "output.txt", skip_dirs=["build", "dist"])

Features

  • Directory Structure: Clearly visualize repository structure in ASCII format.
  • Structure-Only Mode: Option to output only the directory structure and filenames without file contents.
  • File Filtering: Excludes common binary files, cache directories, hidden files, and irrelevant artifacts to keep outputs concise and focused.
  • Smart Content Handling:
    • Parses Jupyter notebooks to extract markdown and code cells with sample outputs
    • Limits CSV files to the first 5 lines
    • Truncates large text files after 1000 lines
    • Handles non-UTF-8 and binary files gracefully
  • Extensive Filtering: Skips common configuration files, build artifacts, test directories, and more.
  • Clipboard Integration: Option to copy output directly to clipboard.
  • Prompt Extraction: Extract AI/LLM prompts from various sources:
    • Inline strings in Python/JavaScript code near LLM API calls (OpenAI, Anthropic, etc.)
    • Standalone prompt files (.prompt.txt, .prompt.md, etc.)
    • YAML/JSON configuration files with prompt definitions
    • Jupyter notebooks containing prompts
    • Markdown files in prompt-related directories
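
The line limits listed under Smart Content Handling can be pictured with a short sketch. The two limits (5 lines for CSV, 1000 lines for other text) come from the list above. Everything else is an assumption for illustration, including the helper name `limit_lines` and the wording of the truncation marker.

```python
CSV_MAX_LINES = 5      # limit stated in the feature list
TEXT_MAX_LINES = 1000  # limit stated in the feature list


def limit_lines(text, filename):
    """Return `text` cut to the line limit that applies to `filename`.

    Hypothetical helper: the marker line appended on truncation is
    invented here and is not the tool's actual output.
    """
    lines = text.splitlines()
    limit = CSV_MAX_LINES if filename.lower().endswith(".csv") else TEXT_MAX_LINES
    if len(lines) <= limit:
        return text  # short enough, leave untouched
    kept = lines[:limit]
    kept.append(f"[... truncated, {len(lines) - limit} more lines ...]")
    return "\n".join(kept)
```

A file at or under its limit is returned unchanged. A longer one keeps its first lines plus a single marker line.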

Example

# Create a serialized snapshot of your project
repo-serializer /Users/example_user/projects/my_repo -o repo_snapshot.txt

Prompt Extraction

The prompt extraction feature (-p or --prompt) helps you analyze and audit AI/LLM prompts used throughout your codebase. This is particularly useful for:

  • Reviewing prompt consistency and quality
  • Refactoring duplicate prompts
  • Creating prompt libraries
  • Documenting AI behavior
  • Security auditing of prompts

What Gets Extracted

  1. Inline Prompts in Code
    • Strings near OpenAI, Anthropic, and other LLM API calls
    • Multi-line strings assigned to prompt-related variables
    • Template literals in JavaScript/TypeScript
  2. Standalone Prompt Files
    • Files with extensions like .prompt.txt, .prompt.md
    • Files in prompts/ or similar directories
    • YAML/JSON files with prompt configurations
  3. Configuration Files
    • YAML/JSON files with keys like prompt, system_prompt, instructions
    • Nested prompt definitions in configuration objects
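
As an illustration of the configuration-file case, the sketch below walks a parsed JSON document and collects string values stored under prompt-like keys, including nested ones. The key names are taken from the list above. The function `find_prompts` and its path notation are hypothetical and do not describe how Repo Serializer itself is implemented.

```python
import json

# Keys named in the list above; the real tool may recognize more.
PROMPT_KEYS = {"prompt", "system_prompt", "instructions"}


def find_prompts(node, path=""):
    """Yield (key_path, text) pairs for strings stored under prompt-like keys."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else str(key)
            if key in PROMPT_KEYS and isinstance(value, str):
                yield child_path, value
            else:
                # Keep descending: prompts may sit inside nested objects.
                yield from find_prompts(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_prompts(item, f"{path}[{index}]")


config = json.loads("""
{
  "agent": {
    "system_prompt": "You are a code reviewer.",
    "steps": [{"prompt": "Summarize the diff."}]
  },
  "instructions": "Be brief."
}
""")

for key_path, text in find_prompts(config):
    print(f"{key_path}: {text}")
```

The same traversal would work on YAML once the file is loaded into dictionaries and lists by a YAML parser.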

Example Output

Found 5 prompts in the repository:
================================================================================

File: src/ai_agent.py
--------------------------------------------------------------------------------

Line 15 (inline_string) - Near LLM API call:

You are an expert Python developer. Your task is to help users write clean, efficient, and well-documented Python code. Follow these guidelines:

  1. Always use proper error handling
  2. Write comprehensive docstrings
  3. Follow PEP 8 style guidelines

File: config/prompts.yaml
--------------------------------------------------------------------------------

Line 3 (config_file) - Key: system:

You are a code reviewer. Analyze the provided code for:

  • Security vulnerabilities
  • Performance issues
  • Code quality and maintainability

Contributing

Pull requests and improvements are welcome! Please ensure your contributions are clearly documented and tested.

Development

Quick Testing

For quick testing during development:

# Install in development mode
pip install -e .

# Now any changes to the source code take effect immediately
repo-serializer /path/to/test/repo -o test_output.txt

Full Test Suite

Run the test script:

./dev/test_dev.py

This will:

  1. Install the package in development mode
  2. Run multiple test scenarios
  3. Generate test outputs for review
