
CLI tool for extracting text from Git repositories


📝 Gittxt: Extract Text from Git Repositories

Gittxt is a lightweight CLI tool that scans Git repositories (local or remote) and extracts their text content into a single consolidated file (.txt or .json).
It is designed for code summarization, AI preprocessing, offline reading, and documentation generation.
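The core idea is to walk a repository, keep the readable text files, and concatenate them into one output. It can be sketched in a few lines of Python. This is not Gittxt's actual implementation. The helper names (`is_text_file`, `consolidate`), the NUL-byte binary heuristic, the always-skipped `.git` directory and the `=== path ===` separator are all assumptions made for illustration. Files are read through a thread pool, loosely mirroring the "multi-threaded scanning" feature.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SKIP_DIRS = {".git"}  # hypothetical: never scan Git's internal metadata


def is_text_file(path: Path, probe: int = 1024) -> bool:
    """Assumed heuristic: a file is binary if its first bytes contain a NUL byte."""
    try:
        with path.open("rb") as fh:
            return b"\x00" not in fh.read(probe)
    except OSError:
        return False


def read_text(path: Path) -> str:
    # Replace undecodable bytes instead of failing on unusual encodings.
    return path.read_text(encoding="utf-8", errors="replace")


def consolidate(root: str, max_workers: int = 8) -> str:
    """Hypothetical helper: join every text file under root into one string."""
    base = Path(root)
    files = sorted(
        p
        for p in base.rglob("*")
        if p.is_file()
        and not SKIP_DIRS.intersection(p.relative_to(base).parts)
        and is_text_file(p)
    )
    # ThreadPoolExecutor.map yields results in input order, so output is deterministic.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        contents = list(pool.map(read_text, files))
    sections = [
        f"=== {p.relative_to(base).as_posix()} ===\n{text}"
        for p, text in zip(files, contents)
    ]
    return "\n".join(sections)
```

Usage: `Path("gittxt_output.txt").write_text(consolidate("."), encoding="utf-8")`. Threads suit this workload because it is dominated by file I/O rather than CPU work.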

🚀 Features

  • Scan Local or Remote Repositories (git clone support)
  • Include & Exclude File Patterns (--include .py, --exclude node_modules)
  • Multi-threaded Scanning (optimized for large repositories)
  • JSON & TXT Output Formats (--format json)
  • Incremental Caching for Faster Scans (skips unchanged files)
  • Force Full Rescan When Needed (--force-rescan)
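The README does not specify how --include and --exclude patterns are matched. The two examples (.py, which looks like a file extension, and node_modules, which looks like a directory name) suggest one plausible rule. The Python sketch below assumes that rule: a pattern matches if it is a suffix of the file name, a glob on the file name, or equal to any path component. Exclusions win over inclusions. `matches` and `should_scan` are hypothetical names, not part of Gittxt.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Sequence


def matches(pattern: str, rel_path: str) -> bool:
    """Assumed semantics: suffix of the file name ('.py'), glob on the file
    name ('*.md'), or an exact path component ('node_modules')."""
    path = PurePosixPath(rel_path)
    return (
        path.name.endswith(pattern)
        or fnmatch(path.name, pattern)
        or pattern in path.parts
    )


def should_scan(
    rel_path: str, includes: Sequence[str] = (), excludes: Sequence[str] = ()
) -> bool:
    # Any matching exclude pattern rejects the file outright.
    if any(matches(p, rel_path) for p in excludes):
        return False
    # With no include patterns, every non-excluded file is scanned.
    return not includes or any(matches(p, rel_path) for p in includes)
```

For example, `should_scan("node_modules/lib/index.py", includes=[".py"], excludes=["node_modules"])` is False, because the exclude check runs first.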

📌 Installation (From PyPI)

Now available on PyPI! 🎉 Install it with:

pip install gittxt

Verify Installation

gittxt --help

Expected Output:

Usage: gittxt [OPTIONS] SOURCE
Options:
  --include TEXT
  --exclude TEXT
  --size-limit INTEGER
  --branch TEXT
  --output TEXT
  --max-lines INTEGER
  --format [txt|json]
  --force-rescan
  --help  Show this message and exit.
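The help text lists --size-limit and --max-lines but does not document their units or exact behavior. The Python sketch below shows one plausible reading, and both points are assumptions. First, --size-limit is a byte count, and any file larger than it is skipped entirely. Second, --max-lines truncates each file to its first N lines. `load_limited` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def load_limited(
    path: Path, size_limit: Optional[int] = None, max_lines: Optional[int] = None
) -> Optional[str]:
    """Hypothetical reading of --size-limit / --max-lines (units assumed)."""
    # Assumed: size_limit is in bytes; oversized files are skipped (None).
    if size_limit is not None and path.stat().st_size > size_limit:
        return None
    text = path.read_text(encoding="utf-8", errors="replace")
    # Assumed: max_lines keeps only the first N lines, preserving line endings.
    if max_lines is not None:
        text = "".join(text.splitlines(keepends=True)[:max_lines])
    return text
```

Checking the size with `stat()` before reading avoids loading very large files into memory just to discard them.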

📌 Usage Examples

1️⃣ Scan a Local Folder

gittxt .

📌 Result: Outputs gittxt_output.txt containing extracted text.


2️⃣ Scan a Remote GitHub Repository

gittxt https://github.com/torvalds/linux

📌 This will:

  • Clone the Linux Kernel repo to a temporary directory.
  • Extract all readable text.
  • Save it in gittxt_output.txt.
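The remote workflow above (clone to a temporary directory, scan, then clean up) maps naturally onto a Python context manager. The sketch below is illustrative only and makes several assumptions. The rule `is_remote` uses to tell URLs from local paths is assumed. The choice of a shallow clone (`git clone --depth 1`) is also assumed; it skips history, which matters for repositories as large as the Linux kernel. The optional `--branch` pass-through mirrors Gittxt's --branch option. `is_remote` and `checkout` are hypothetical names, and the clone path requires `git` on PATH.

```python
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional
from urllib.parse import urlparse


def is_remote(source: str) -> bool:
    """Assumed rule: http(s)/ssh/git URLs and scp-style 'git@host:path' are remote."""
    if urlparse(source).scheme in {"http", "https", "ssh", "git"}:
        return True
    return source.startswith("git@") and ":" in source


@contextmanager
def checkout(source: str, branch: Optional[str] = None) -> Iterator[Path]:
    """Yield a local directory for source; remote repos are cloned into a
    temporary directory that is deleted when the with-block exits."""
    if not is_remote(source):
        yield Path(source)
        return
    with tempfile.TemporaryDirectory(prefix="gittxt_") as tmp:
        cmd = ["git", "clone", "--depth", "1"]  # shallow clone: history not needed
        if branch:
            cmd += ["--branch", branch]
        cmd += [source, tmp]  # git allows cloning into an existing empty directory
        subprocess.run(cmd, check=True)
        yield Path(tmp)
```

Usage: `with checkout("https://github.com/torvalds/linux") as repo: ...` scans `repo`, and the clone is removed automatically afterwards, even if scanning raises an exception.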

3️⃣ Customize Output (JSON & TXT)

Save as JSON (Structured Output)

gittxt . --format json --output repo_dump.json

Save as TXT (Default)

gittxt . --format txt --output repo_dump.txt
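The README does not document the JSON schema. The Python sketch below shows how the two --format choices might differ and assumes a simple layout: a list of {"path", "content"} objects, sorted by path. The TXT branch joins files with `=== path ===` separators, which is also an assumed format. `write_output` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Dict


def write_output(files: Dict[str, str], output: str, fmt: str = "txt") -> None:
    """Hypothetical writer; files maps relative path -> extracted text.

    The JSON layout here is an assumption, not Gittxt's documented schema.
    """
    items = sorted(files.items())  # stable, path-ordered output
    if fmt == "json":
        records = [{"path": p, "content": c} for p, c in items]
        payload = json.dumps(records, indent=2, ensure_ascii=False)
    elif fmt == "txt":
        payload = "\n".join(f"=== {p} ===\n{c}" for p, c in items)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    Path(output).write_text(payload, encoding="utf-8")
```

Structured JSON keeps file boundaries machine-readable, which is convenient for AI preprocessing. TXT is easier to read directly.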

📌 🚀 New in v1.0.0

  • 🎉 First official release on PyPI (pip install gittxt)
  • 🔄 Automatic caching for faster rescans
  • 📦 Multi-threaded scanning for large repos
  • 📝 Improved documentation & CLI stability
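The README does not describe how the automatic cache decides that a file is unchanged. A common approach compares a cheap fingerprint (modification time plus size) against the previous run. The Python sketch below assumes that approach. It stores the fingerprints in a JSON cache file, and a `force_rescan` flag mirrors --force-rescan by ignoring the cache. `fingerprint` and `changed_files` are hypothetical names, and Gittxt's real cache format may differ.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Fingerprint = Tuple[int, int]  # (mtime in nanoseconds, size in bytes)


def fingerprint(path: Path) -> Fingerprint:
    st = path.stat()
    return (st.st_mtime_ns, st.st_size)


def changed_files(
    paths: List[Path], cache_file: Path, force_rescan: bool = False
) -> List[Path]:
    """Return files whose fingerprint differs from the cached one, then
    overwrite the cache with the current fingerprints (assumed scheme)."""
    old: Dict[str, List[int]] = {}
    if cache_file.exists() and not force_rescan:
        old = json.loads(cache_file.read_text(encoding="utf-8"))
    # JSON has no tuples, so fingerprints are stored and compared as lists.
    new = {str(p): list(fingerprint(p)) for p in paths}
    todo = [p for p in paths if old.get(str(p)) != new[str(p)]]
    cache_file.write_text(json.dumps(new), encoding="utf-8")
    return todo
```

This fingerprint avoids reading file contents, so checking a large, mostly unchanged repository is cheap. A content hash (for example `hashlib.sha256`) would be more robust against edits that preserve size and timestamp, but it costs a full read of every file.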

📌 Development & Contribution

Want to contribute? Follow these steps:

1️⃣ Run Tests

pytest tests/

2️⃣ Formatting & Linting

black src/

3️⃣ Open a Pull Request

  1. Fork the repo
  2. Create a new branch (feature/my-change)
  3. Commit and push your changes
  4. Submit a PR! 🚀

📌 License

This project is licensed under the MIT License.


🚀 Next Steps

  • [ ] Add support for Markdown (.md) output.
  • [ ] Implement a Web UI for visualization.
  • [ ] Improve error handling for edge cases.

📌 Made by Sandeep Paidipati



Download files


Source Distribution

gittxt-1.0.0.tar.gz (6.0 kB)

Built Distribution


gittxt-1.0.0-py3-none-any.whl (8.2 kB, Python 3)

File details

Details for the file gittxt-1.0.0.tar.gz.

File metadata

  • Download URL: gittxt-1.0.0.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure

File hashes

Hashes for gittxt-1.0.0.tar.gz

Algorithm     Hash digest
SHA256        9f46b8700778d1ea284c69dc248ec4c4de0d820155cd4692569a19c2d55f6749
MD5           aefd8de2f29aa785964f2f21394c588f
BLAKE2b-256   3b691e5e63417c0f8f5efd266c9cf8baf0b995171fe63fc4cba82ff6a83b00e5


File details

Details for the file gittxt-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: gittxt-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure

File hashes

Hashes for gittxt-1.0.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        b2804053c34ef4560df5d00188df50c74db39e3e02085e07a16f46dd6e35ac26
MD5           4d8802e206edbeb0cbed23c61d6c3563
BLAKE2b-256   71faf4237d3deac583e7610b4b3d93dd78cb743b15e9c03502cc0315227e5c81

