CLI tool for extracting text from Git repositories

📝 Gittxt: Extract Text from Git Repositories

Gittxt is a lightweight CLI tool that scans Git repositories (local or remote) and extracts their text content into a single consolidated file in TXT or JSON format.
It is designed for code summarization, AI preprocessing, offline reading, and documentation generation.

🚀 Features

  • Scan Local or Remote Repositories (git clone support)
  • Include & Exclude File Patterns (--include .py, --exclude node_modules)
  • Multi-threaded Scanning (Optimized for large repositories)
  • Supports JSON & TXT Output Formats (--output-format json)
  • Incremental Caching for Faster Scans (Skips unchanged files)
  • Force Full Rescan When Needed (--force-rescan)
  • Improved Logging & Error Handling (More detailed messages for debugging)
  • Better CLI Experience (Handles invalid inputs more effectively)
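The incremental caching idea can be illustrated with a minimal Python sketch. This is not Gittxt's actual implementation: the helper names (file_digest, changed_files), the JSON cache layout, and the use of SHA-256 content digests are all assumptions made for illustration. The force_rescan parameter mirrors the idea behind --force-rescan by ignoring the existing cache.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, cache_file: Path, force_rescan: bool = False) -> list:
    """Return the files under root whose contents differ from the cached digests.

    Hypothetical helper: the cache is a JSON map of relative path -> digest.
    """
    cache = {}
    if cache_file.exists() and not force_rescan:
        cache = json.loads(cache_file.read_text())

    current = {}
    changed = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the cache file itself if it lives inside root.
        if not path.is_file() or path.resolve() == cache_file.resolve():
            continue
        key = path.relative_to(root).as_posix()
        current[key] = file_digest(path)
        if cache.get(key) != current[key]:
            changed.append(path)

    # Persist the fresh digests so the next scan can skip unchanged files.
    cache_file.write_text(json.dumps(current, indent=2))
    return changed
```

With this approach a second scan of an untouched directory returns an empty list, while a forced rescan reports every file again.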

📌 Installation (From PyPI)

Now available on PyPI! 🎉 Install it with:

pip install gittxt

Verify Installation

gittxt --help

Expected Output:

Usage: gittxt [OPTIONS] SOURCE
Options:
  --include TEXT
  --exclude TEXT
  --size-limit INTEGER
  --branch TEXT
  --output-dir TEXT
  --output-format [txt|json]
  --max-lines INTEGER
  --force-rescan
  --help  Show this message and exit.
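To make the filtering options more concrete, here is a rough Python sketch of how --include, --exclude, and --size-limit could combine into one per-file decision. The semantics are assumptions, not confirmed Gittxt behavior: include patterns are treated as file extensions, exclude patterns as path components, and the size limit as a byte count. The function name should_scan is hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def should_scan(path: Path, include: List[str], exclude: List[str],
                size_limit: Optional[int] = None) -> bool:
    """Decide whether a file passes the include/exclude/size filters (assumed semantics)."""
    # Assumption: an exclude pattern matches any path component, e.g. "node_modules".
    if any(part in exclude for part in path.parts):
        return False
    # Assumption: an include pattern is a file extension such as ".py";
    # an empty include list means "accept every extension".
    if include and path.suffix not in include:
        return False
    # Assumption: the size limit is expressed in bytes.
    if size_limit is not None and path.stat().st_size > size_limit:
        return False
    return True
```

Checking the exclude list first avoids a stat() call on files that would be rejected anyway.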

📌 Usage Examples

1️⃣ Scan a Local Folder

gittxt .

📌 Result: Extracts the text content of the repository in the current directory.


2️⃣ Scan a Remote GitHub Repository

gittxt https://github.com/torvalds/linux

📌 This will:

  • Clone the Linux Kernel repo to a temporary directory.
  • Extract all readable text.
  • Save it in gittxt_output.txt.
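The three steps above can be sketched in Python as follows. This is an illustrative approximation, not Gittxt's code: the function names (clone_to_temp, extract_text), the shallow clone, the "=== path ===" separator, and the rule "a file is readable if it decodes as UTF-8" are all assumptions.

```python
import subprocess
import tempfile
from pathlib import Path


def clone_to_temp(url: str) -> Path:
    """Shallow-clone a repository into a fresh temporary directory."""
    dest = Path(tempfile.mkdtemp(prefix="gittxt_sketch_"))
    subprocess.run(["git", "clone", "--depth", "1", url, str(dest)], check=True)
    return dest


def extract_text(root: Path, out_file: Path) -> int:
    """Concatenate every UTF-8 decodable file under root into out_file.

    Returns the number of files written.
    """
    written = 0
    with out_file.open("w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root)
            # Skip directories, Git metadata, and the output file itself.
            if not path.is_file() or ".git" in rel.parts:
                continue
            if path.resolve() == out_file.resolve():
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # treat undecodable files as binary and skip them
            out.write(f"=== {rel.as_posix()} ===\n{text}\n")
            written += 1
    return written
```

A caller would chain the two, for example extract_text(clone_to_temp(url), Path("gittxt_output.txt")). For a repository as large as the Linux kernel, a shallow clone keeps the download far smaller than a full history clone.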

3️⃣ Customize Output (JSON & TXT)

Save as JSON (Structured Output)

gittxt . --output-format json --output-dir repo_dump

Save as TXT (Default)

gittxt . --output-format txt --output-dir repo_dump

🚀 New in v1.1.0

  • 🐛 Bug Fixes & Improvements

    • Fixed the --format argument (use --output-format instead).
    • Better logging & error messages (Now logs issues more clearly).
    • More resilient CLI (Handles invalid paths properly).
  • 🛠 Feature Enhancements

    • CLI now supports --force-rescan correctly.
    • Improved caching system (Scans only modified files).
    • More detailed scan reports.
  • ✅ Full Test Coverage

    • 18/18 tests passing 🟢
    • New CLI tests added (pytest tests/).

📌 Development & Contribution

Want to contribute? Follow these steps:

1️⃣ Run Tests

pytest tests/

2️⃣ Formatting & Linting

black src/

3️⃣ Open a Pull Request

  1. Fork the repo
  2. Create a new branch (feature/my-change)
  3. Push changes
  4. Submit a PR! 🚀

📌 License

This project is licensed under the MIT License.


🚀 Next Steps

  • [ ] Add support for Markdown (.md) output.
  • [ ] Implement a Web UI for visualization.
  • [ ] Improve error handling for edge cases.

📌 Made by Sandeep Paidipati
