A tool to serialize repository contents into a single file

Repo Serializer

A Python utility that serializes a local Git repository into a single structured text file, capturing the directory structure (in ASCII format), file names, and the contents of source files. Ideal for providing a comprehensive snapshot of a repository for code review, documentation, or interaction with large language models (LLMs).

Installation

# Install from PyPI
pip install repo-serializer

# For development
pip install "repo-serializer[dev]"

Usage

Command Line

# Basic usage
repo-serializer /path/to/repository

# Specify output file
repo-serializer /path/to/repository -o output.txt

# Copy to clipboard in addition to saving to file
repo-serializer /path/to/repository -c

# Use structure-only mode to output only the directory structure and filenames
repo-serializer /path/to/repository -s

# Skip specific directories (can be used multiple times or as a comma-separated list)
repo-serializer /path/to/repository --skip-dir build,dist
repo-serializer /path/to/repository --skip-dir build --skip-dir dist

# Only include Python files (.py, .ipynb)
repo-serializer /path/to/repository --python

# Only include JavaScript/TypeScript files
repo-serializer /path/to/repository --javascript

# Combine with other options
repo-serializer /path/to/repository --python -s -c  # Python files, structure only, copy to clipboard

# Extract AI/LLM prompts from the repository
repo-serializer /path/to/repository -p
repo-serializer /path/to/repository --prompt -o prompts.txt
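The --skip-dir flag shown above accepts both repeated occurrences and comma-separated lists. How repo-serializer parses it internally is not documented here, so the following is only a minimal sketch of one way to get that behavior with argparse. The function name parse_skip_dirs and the prog name are hypothetical and not part of the package.

```python
import argparse


def parse_skip_dirs(argv):
    """Collect --skip-dir values given repeatedly and/or comma-separated.

    Hypothetical helper for illustration; not repo-serializer's own parser.
    """
    parser = argparse.ArgumentParser(prog="repo-serializer-sketch")
    parser.add_argument("repo_path")
    # action="append" gathers every occurrence of the flag into a list.
    parser.add_argument("--skip-dir", action="append", default=None)
    args = parser.parse_args(argv)

    skip_dirs = []
    for value in args.skip_dir or []:
        # Split each occurrence on commas and drop empty pieces.
        skip_dirs.extend(part.strip() for part in value.split(",") if part.strip())
    return skip_dirs


# Both invocation styles produce the same list of directory names.
print(parse_skip_dirs(["/path/to/repository", "--skip-dir", "build,dist"]))
print(parse_skip_dirs(["/path/to/repository", "--skip-dir", "build", "--skip-dir", "dist"]))
```

Normalizing both forms into one flat list means the rest of a program only has to deal with a single representation.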

Python API

from repo_serializer import serialize

# Serialize a repository, skipping specific directories
serialize("/path/to/repository", "output.txt", skip_dirs=["build", "dist"])
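To picture what skipping directories means for the ASCII directory structure, here is a small standalone sketch that renders a tree while leaving out hidden entries and named directories. It does not call repo_serializer and is not its implementation. The function render_tree and the connector characters are assumptions chosen for illustration, and the real output layout may differ.

```python
import tempfile
from pathlib import Path


def render_tree(root, skip_dirs=(), prefix=""):
    """Return the lines of an ASCII tree for root, directories first.

    Hypothetical helper; hidden entries and directories named in
    skip_dirs are left out.
    """
    entries = []
    for path in Path(root).iterdir():
        if path.name.startswith("."):
            continue  # hidden file or directory
        if path.is_dir() and path.name in skip_dirs:
            continue  # explicitly skipped directory
        entries.append(path)
    # Sort directories before files, then alphabetically by name.
    entries.sort(key=lambda p: (p.is_file(), p.name))

    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        lines.append(f"{prefix}{'`-- ' if last else '|-- '}{entry.name}")
        if entry.is_dir():
            # Keep a vertical bar only while siblings remain below.
            extension = "    " if last else "|   "
            lines.extend(render_tree(entry, skip_dirs, prefix + extension))
    return lines


# Build a throwaway directory layout and render it without "build".
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "main.py").write_text("print('hi')\n")
    (root / "build").mkdir()
    (root / "build" / "out.txt").write_text("artifact\n")
    (root / "README.md").write_text("# demo\n")
    print("\n".join(render_tree(root, skip_dirs={"build"})))
    # |-- src
    # |   `-- main.py
    # `-- README.md
```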

Features

  • Directory Structure: Clearly visualize repository structure in ASCII format.
  • Structure-Only Mode: Option to output only the directory structure and filenames without file contents.
  • File Filtering: Excludes common binary files, cache directories, hidden files, and irrelevant artifacts to keep outputs concise and focused.
  • Smart Content Handling:
    • Parses Jupyter notebooks to extract markdown and code cells with sample outputs
    • Limits CSV files to first 5 lines
    • Truncates large text files after 1000 lines
    • Handles non-UTF-8 and binary files gracefully
  • Extensive Filtering: Skips common configuration files, build artifacts, test directories, and more.
  • Clipboard Integration: Option to copy output directly to clipboard.
  • Prompt Extraction: Extract AI/LLM prompts from various sources:
    • Inline strings in Python/JavaScript code near LLM API calls (OpenAI, Anthropic, etc.)
    • Standalone prompt files (.prompt.txt, .prompt.md, etc.)
    • YAML/JSON configuration files with prompt definitions
    • Jupyter notebooks containing prompts
    • Markdown files in prompt-related directories
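The line limits listed under Smart Content Handling (first 5 lines for CSV files, 1000 lines for other text files) can be sketched as follows. This is an illustration of the stated limits, not the package's code. The function name limited_content and the wording of the truncation marker are assumptions. Binary-file detection and notebook parsing are not covered here.

```python
import itertools
import tempfile
from pathlib import Path

CSV_MAX_LINES = 5      # "Limits CSV files to first 5 lines"
TEXT_MAX_LINES = 1000  # "Truncates large text files after 1000 lines"


def limited_content(path):
    """Return file text cut to the limit for its type (hypothetical helper)."""
    path = Path(path)
    limit = CSV_MAX_LINES if path.suffix.lower() == ".csv" else TEXT_MAX_LINES
    # errors="replace" keeps non-UTF-8 bytes from raising UnicodeDecodeError.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        lines = list(itertools.islice(handle, limit))
        # If one more line can be read, the file was longer than the limit.
        truncated = next(handle, None) is not None
    text = "".join(lines)
    if truncated:
        # The marker text is an assumption, not the tool's actual wording.
        text += f"[... truncated after {limit} lines ...]\n"
    return text


# Demonstrate on a temporary 8-row CSV file: 5 rows plus the marker remain.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text(
        "".join(f"row{i},value{i}\n" for i in range(8)), encoding="utf-8"
    )
    print(limited_content(csv_path))
```

Reading with itertools.islice stops after the limit, so a very large file is never loaded into memory in full.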

Example

# Create a serialized snapshot of your project
repo-serializer /Users/example_user/projects/my_repo -o repo_snapshot.txt

Prompt Extraction

The prompt extraction feature (-p or --prompt) helps you analyze and audit AI/LLM prompts used throughout your codebase. This is particularly useful for:

  • Reviewing prompt consistency and quality
  • Refactoring duplicate prompts
  • Creating prompt libraries
  • Documenting AI behavior
  • Security auditing of prompts

What Gets Extracted

  1. Inline Prompts in Code

    • Strings near OpenAI, Anthropic, and other LLM API calls
    • Multi-line strings assigned to prompt-related variables
    • Template literals in JavaScript/TypeScript
  2. Standalone Prompt Files

    • Files with extensions like .prompt.txt, .prompt.md
    • Files in prompts/ or similar directories
    • YAML/JSON files with prompt configurations
  3. Configuration Files

    • YAML/JSON files with keys like prompt, system_prompt, instructions
    • Nested prompt definitions in configuration objects
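As an illustration of the configuration-file case, the sketch below walks a parsed JSON document and collects string values stored under prompt-like keys, including nested ones. It is not the extractor used by repo-serializer. The key set is limited to the three keys named above, the function find_prompts and the sample config are hypothetical, and YAML is left out because it would need a third-party parser.

```python
import json

# Assumed key set, taken from the keys named in the list above.
PROMPT_KEYS = {"prompt", "system_prompt", "instructions"}


def find_prompts(node, path=""):
    """Yield (key_path, text) for prompt-like string values, however nested."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else str(key)
            if key in PROMPT_KEYS and isinstance(value, str):
                yield child_path, value
            else:
                # Keep descending into nested objects and lists.
                yield from find_prompts(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_prompts(item, f"{path}[{index}]")


# Hypothetical configuration used only for this example.
config = json.loads("""
{
  "model": "example-model",
  "system_prompt": "You are a code reviewer.",
  "agents": [
    {"name": "helper", "settings": {"instructions": "Answer briefly."}}
  ]
}
""")

for key_path, text in find_prompts(config):
    print(f"{key_path}: {text}")
# system_prompt: You are a code reviewer.
# agents[0].settings.instructions: Answer briefly.
```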

Example Output

Found 5 prompts in the repository:
================================================================================

File: src/ai_agent.py
--------------------------------------------------------------------------------

Line 15 (inline_string) - Near LLM API call:

You are an expert Python developer. Your task is to help users write clean, efficient, and well-documented Python code. Follow these guidelines:

  1. Always use proper error handling
  2. Write comprehensive docstrings
  3. Follow PEP 8 style guidelines

File: config/prompts.yaml
--------------------------------------------------------------------------------

Line 3 (config_file) - Key: system:

You are a code reviewer. Analyze the provided code for:

  • Security vulnerabilities
  • Performance issues
  • Code quality and maintainability

Contributing

Pull requests and improvements are welcome! Please ensure your contributions are clearly documented and tested.

Development

Quick Testing

For quick testing during development:

# Install in development mode
pip install -e .

# Now any changes to the source code take effect immediately
repo-serializer /path/to/test/repo -o test_output.txt

Full Test Suite

Run the test script:

./dev/test_dev.py

This will:

  1. Install the package in development mode
  2. Run multiple test scenarios
  3. Generate test outputs for review
