Concatenate text-like files in a directory tree with a Typer-powered CLI.

Project Combiner (combine-files)

project-combiner is a powerful and flexible command-line tool for concatenating text-based files within a directory tree. It's designed to be intuitive, fast, and highly configurable, making it easy to bundle source code, documentation, or any text-like files for analysis, distribution, or large language model contexts.

Highlights

  • Intuitive CLI: Powered by Typer, providing a rich --help experience and shell completion.
  • Cross-Platform: Uses pathlib.Path for seamless operation on Windows, macOS, and Linux.
  • Highly Configurable: Control everything with command-line flags—no hard-coding required. Specify what to include, what to skip, file encodings, output location, and more.
  • .gitignore Aware: Automatically respects your project's .gitignore rules (requires pathspec).
  • Smart File Handling: Skips binary files based on MIME types to prevent garbage output. By default, it also skips any directory whose name starts with . (override with --include-dot-dirs).
  • Performance-Oriented: Features optional multithreaded file reading and a tqdm progress bar for large projects.
  • Flexible Output: Stream combined content to standard output (stdout) or save it directly to a file.
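
As an illustration of the MIME-based skipping and dot-directory rule described above, here is a minimal sketch using only the standard library. It is not project-combiner's actual implementation: the function names, the allow-list, and the choice to treat unknown extensions as text are all assumptions made for this example.

```python
import mimetypes

# Hypothetical allow-list: MIME types outside "text/" that are still text-like.
TEXT_LIKE_APPLICATION_TYPES = {"application/json", "application/xml"}


def looks_text_like(filename: str) -> bool:
    """Guess from the file name alone whether a file is text-like."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Assumption: unknown extensions are treated as text in this sketch.
        return True
    return mime_type.startswith("text/") or mime_type in TEXT_LIKE_APPLICATION_TYPES


def is_dot_dir(dirname: str) -> bool:
    """Return True for directory names such as .git or .venv."""
    return dirname.startswith(".")
```

A guess based on the file name is cheap because it never opens the file, but it can misclassify files with misleading or missing extensions.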

Installation

You can install project-combiner directly from PyPI.

Full Feature Set

For all features, including .gitignore support and a progress bar, install with the [all] extra:

pip install project-combiner[all]

This installs typer, pathspec, and tqdm.

Minimal Installation

For the core functionality without optional dependencies:

pip install project-combiner
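
If you are unsure which optional dependencies are present in your environment, a small standard-library check along these lines can tell you. This is only a sketch: the helper name and the feature labels are made up for this example and are not part of project-combiner.

```python
import importlib.util

# Optional packages named in this README and the feature each one enables.
OPTIONAL_FEATURES = {
    "pathspec": ".gitignore support",
    "tqdm": "progress bar",
}


def available_features() -> dict:
    """Map each optional package name to True if it can be imported."""
    return {
        module: importlib.util.find_spec(module) is not None
        for module in OPTIONAL_FEATURES
    }


if __name__ == "__main__":
    for module, present in available_features().items():
        status = "installed" if present else "missing"
        print(f"{module} ({OPTIONAL_FEATURES[module]}): {status}")
```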

Usage

The basic command is combine-files, followed by one or more root directories to process and any desired options.

combine-files [ROOT_DIRS]... [OPTIONS]

Command-Line Options

| Option | Alias | Description | Default |
|---|---|---|---|
| --output-file | -o | Path to the output file. Use - for stdout. | - (stdout) |
| --skip-dirs | | Space-separated list of directory names to skip. | .git .hg __pycache__ |
| --skip-files | | Space-separated list of file names to skip. | |
| --skip-exts | | Space-separated list of file extensions to skip. | |
| --preview-exts | | Space-separated list of extensions to preview instead of including their full content. | |
| --encoding | | The encoding to use for reading files. | utf-8 |
| --jobs | -j | Number of parallel threads for reading files. | 2 |
| --progress | | Show a progress bar during file processing (requires tqdm). | |
| --follow-symlinks | | Follow symbolic links. | False |
| --skip-dot-dirs / --include-dot-dirs | | Skip directories that start with . (dot). Use the second form to include them. | --skip-dot-dirs |
| --log-level | | Set the logging level (e.g., DEBUG, INFO). | WARNING |
| --version | | Show the version and exit. | |
| --help | | Show the help message and exit. | |

Example Scenario

Let's walk through how to use project-combiner with a typical project structure.

Sample Project Structure

Imagine you have a project with the following layout:

my_project/
├── .gitignore
├── src/
│   ├── main.py
│   ├── utils.py
│   └── data/
│       ├── data.csv
│       └── notes.txt
├── tests/
│   ├── test_main.py
│   └── test_utils.py
├── docs/
│   ├── guide.md
│   └── reference.md
├── .venv/
│   └── ... (virtual environment files)
└── README.md

Your .gitignore file might look like this:

# .gitignore
.venv/
__pycache__/
*.log

Use Cases

1. Combine All Relevant Files

To combine all text-based files in the project while respecting the .gitignore file, simply run:

combine-files my_project
  • What it does: It walks through my_project, skips the .venv directory (it is listed in .gitignore and, as a dot directory, is also skipped by default), and concatenates the contents of all other text files (.py, .csv, .txt, .md).
  • Output: The combined content is printed to the terminal (stdout).

2. Save the Combined Output to a File

To save the output into a single file named combined_output.txt:

combine-files my_project -o combined_output.txt
  • What it does: Same as the first example, but the result is written to combined_output.txt instead of the console.
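
The convention that - means stdout can be sketched in a few lines of Python. This is an illustrative helper, not the tool's own code; the name open_output is an assumption made for this example.

```python
import contextlib
import sys
from typing import Iterator, TextIO


@contextlib.contextmanager
def open_output(target: str, encoding: str = "utf-8") -> Iterator[TextIO]:
    """Yield a writable text stream: stdout for "-", otherwise a file."""
    if target == "-":
        # Never close stdout; the caller may still need it afterwards.
        yield sys.stdout
    else:
        with open(target, "w", encoding=encoding) as handle:
            yield handle


if __name__ == "__main__":
    with open_output("-") as out:
        out.write("combined content goes here\n")
```

Routing both cases through one context manager lets the rest of the program write to a stream without caring where it ends up.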

3. Exclude the tests Directory

If you want to combine only the application source code and documentation, excluding the tests:

combine-files my_project --skip-dirs tests
  • What it does: This command will skip the tests/ directory in addition to the patterns in .gitignore. The output will contain files from src/ and docs/.
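
Skipping directories by name, as in the command above, is typically done by pruning the directory walk. The following is a minimal sketch with os.walk, assuming name-based matching only; iter_files is a hypothetical helper, and .gitignore handling is left out.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator


def iter_files(
    root: Path,
    skip_dirs: Iterable[str] = (".git", ".hg", "__pycache__"),
    skip_dot_dirs: bool = True,
) -> Iterator[Path]:
    """Yield files under root, never descending into skipped directories."""
    skip = set(skip_dirs)
    for current, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] in place tells os.walk not to descend
        # into the directories that were filtered out.
        dirnames[:] = sorted(
            name
            for name in dirnames
            if name not in skip and not (skip_dot_dirs and name.startswith("."))
        )
        for name in sorted(filenames):
            yield Path(current) / name
```

Pruning during the walk avoids visiting large trees such as .venv at all, which is cheaper than filtering the results afterwards.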

4. Combine Only Python Source Files

To isolate just the Python source code from the src directory:

combine-files my_project/src --skip-exts .csv .txt .md

Because the files under src/data (data.csv and notes.txt) are text, running the command without --skip-exts includes them as well:

combine-files my_project/src

Non-Python files are skipped only if they are detected as binary or if you explicitly skip their extensions.
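
To make the effect of an extension filter concrete, here is a small sketch of how such a filter could work on the sample src tree. The helper keep_by_extension is hypothetical, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
from pathlib import Path
from typing import Iterable, List


def keep_by_extension(paths: Iterable[str], skip_exts: Iterable[str]) -> List[Path]:
    """Drop every path whose suffix appears in skip_exts."""
    # Accept both ".csv" and "csv", compared case-insensitively.
    normalized = {
        ext.lower() if ext.startswith(".") else "." + ext.lower()
        for ext in skip_exts
    }
    return [Path(p) for p in paths if Path(p).suffix.lower() not in normalized]


sample = ["main.py", "utils.py", "data/data.csv", "data/notes.txt"]
kept = keep_by_extension(sample, [".csv", ".txt", ".md"])
print([p.as_posix() for p in kept])
```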

5. Preview Large Data or Markdown Files

Sometimes you don't want the full content of large data files or verbose documentation. You can "preview" them instead.

combine-files . --preview-exts .md .csv -j 4 --progress
  • What it does:
    • It processes the entire project (.).
    • For any file ending in .md or .csv, it will only include a header indicating the file's path and a "preview" message, rather than its full content.
    • It uses 4 threads (-j 4) for faster reading and shows a progress bar (--progress).

The output for a previewed file like docs/guide.md would look like this:

---
File: docs/guide.md (preview)
---
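
The preview behavior can be sketched as a small rendering function. Only the preview layout is taken from the example above; the layout used for full files is an assumption, and render_entry is a hypothetical name.

```python
def render_entry(path: str, content: str, preview: bool) -> str:
    """Render one file as a header block, with or without its content."""
    if preview:
        # Matches the preview layout shown in this README.
        return f"---\nFile: {path} (preview)\n---\n"
    # Assumed layout for full files; the README only shows the preview form.
    return f"---\nFile: {path}\n---\n{content}\n"


if __name__ == "__main__":
    print(render_entry("docs/guide.md", "# Guide", preview=True), end="")
```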

Advanced Usage

Working with Encodings

If your project uses a different file encoding, you can specify it with the --encoding flag. For example, for projects using legacy Windows encodings:

combine-files . --encoding cp1252
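
The reason the encoding matters is that bytes written as cp1252 are often not valid UTF-8. The short Python sketch below shows the failure and the fix; decode_or_none is a helper invented for this example.

```python
from typing import Optional


def decode_or_none(raw: bytes, encoding: str) -> Optional[str]:
    """Decode raw bytes, returning None if they are invalid for the encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# "café" saved by a legacy Windows editor: the é is the single byte 0xE9.
raw = "café".encode("cp1252")

as_utf8 = decode_or_none(raw, "utf-8")     # invalid UTF-8, so None
as_cp1252 = decode_or_none(raw, "cp1252")  # decodes correctly
```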

Performance

For very large projects with thousands of files, you can speed up the process by increasing the number of threads. A good starting point is the number of cores on your CPU.

# Use 8 threads to read files
combine-files . -j 8 --progress
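
For intuition, multithreaded reading along these lines can be sketched with the standard library's ThreadPoolExecutor. This is not the tool's actual code: read_all is a hypothetical helper, and falling back to the CPU core count is an assumption based on the suggestion above.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List, Optional


def read_all(
    paths: Iterable[str],
    jobs: Optional[int] = None,
    encoding: str = "utf-8",
) -> List[str]:
    """Read files concurrently and return their contents in input order."""
    # Assumption: fall back to the CPU core count when jobs is not given.
    workers = jobs or os.cpu_count() or 1

    def read_one(path: str) -> str:
        return Path(path).read_text(encoding=encoding)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though the reads overlap.
        return list(pool.map(read_one, paths))
```

Threads help here because file reads are I/O-bound, so they can overlap while waiting on the disk.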

Contributing

Contributions are welcome! If you have ideas for new features, bug fixes, or improvements, feel free to open an issue or submit a pull request on the project's repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.
