CLI tool for extracting text from Git repositories
📝 Gittxt: Extract Text from Git Repositories
Gittxt is a lightweight CLI tool that scans a Git repository (local or remote) and extracts its text content into a single consolidated file (.txt or .json).
It is designed for code summarization, AI preprocessing, offline reading, and documentation generation.
🚀 Features
- ✅ Scan Local or Remote Repositories (git clone support)
- ✅ Include & Exclude File Patterns (--include .py, --exclude node_modules)
- ✅ Multi-threaded Scanning (Optimized for large repositories)
- ✅ Supports JSON & TXT Output Formats (--format json)
- ✅ Incremental Caching for Faster Scans (Skips unchanged files)
- ✅ Force Full Rescan When Needed (--force-rescan)
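To make the include/exclude feature concrete, here is a minimal Python sketch of how such filtering could work. It is not Gittxt's actual implementation: the function name should_scan and the matching rules (suffix match for include, directory name or glob match for exclude) are assumptions chosen for illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def should_scan(path, include=(), exclude=()):
    """Return True if `path` passes the include/exclude filters.

    Hypothetical rules for this sketch (not confirmed Gittxt behaviour):
    - a path is skipped if any of its parts equals an exclude entry,
      or if the whole path matches an exclude glob;
    - if include is non-empty, the file name must end with one of the
      include suffixes (for example ".py").
    """
    p = PurePosixPath(path)
    for pattern in exclude:
        if pattern in p.parts or fnmatch(str(p), pattern):
            return False
    if include:
        return any(p.name.endswith(suffix) for suffix in include)
    return True


files = [
    "src/app.py",
    "node_modules/pkg/index.js",
    "docs/readme.md",
    "node_modules/tool/setup.py",
]
# Mirrors: --include .py --exclude node_modules
selected = [
    f for f in files
    if should_scan(f, include=[".py"], exclude=["node_modules"])
]
print(selected)  # ['src/app.py']
```

Checking exclusions first means an excluded directory always wins, even when a file inside it matches an include suffix.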
📌 Installation (From PyPI)
Now available on PyPI! 🎉 Install it with:
pip install gittxt
✅ Verify Installation
gittxt --help
Expected Output:
Usage: gittxt [OPTIONS] SOURCE
Options:
--include TEXT
--exclude TEXT
--size-limit INTEGER
--branch TEXT
--output TEXT
--max-lines INTEGER
--format [txt|json]
--force-rescan
--help Show this message and exit.
📌 Usage Examples
1️⃣ Scan a Local Folder
gittxt .
📌 Result: Outputs gittxt_output.txt containing extracted text.
2️⃣ Scan a Remote GitHub Repository
gittxt https://github.com/torvalds/linux
📌 This will:
- Clone the Linux Kernel repo to a temporary directory.
- Extract all readable text.
- Save it in gittxt_output.txt.
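The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's real code: the function names extract_text and scan_remote, the shallow clone (--depth 1), the "=== path ===" separator, and the rule that a file counts as text if it decodes as UTF-8 are all hypothetical choices.

```python
import subprocess
import tempfile
from pathlib import Path


def extract_text(root):
    """Concatenate every UTF-8 decodable file under `root`, skipping .git."""
    root = Path(root)
    chunks = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # treat undecodable or unreadable files as non-text
        chunks.append(f"=== {rel.as_posix()} ===\n{text}\n")
    return "".join(chunks)


def scan_remote(url, output="gittxt_output.txt"):
    """Shallow-clone `url` into a temporary directory and dump its text."""
    with tempfile.TemporaryDirectory() as tmp:
        # Requires the git executable on PATH; --depth 1 is an assumption
        # made here to keep the clone small.
        subprocess.run(["git", "clone", "--depth", "1", url, tmp], check=True)
        Path(output).write_text(extract_text(tmp), encoding="utf-8")
    return output
```

The temporary directory is removed when the with block exits, so only the output file remains after the scan.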
3️⃣ Customize Output (JSON & TXT)
✅ Save as JSON (Structured Output)
gittxt . --format json --output repo_dump.json
✅ Save as TXT (Default)
gittxt . --format txt --output repo_dump.txt
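As a rough picture of how the two formats might differ, the Python sketch below serializes the same set of files both ways. The layout is an assumption: the README does not document the structure of either format, so the render function, the "path" and "content" keys, and the "=== path ===" separator are hypothetical.

```python
import json


def render(files, fmt="txt"):
    """Serialize a {relative_path: text} mapping as TXT or JSON.

    The layout of both formats is an assumption made for this sketch.
    """
    if fmt == "json":
        records = [{"path": p, "content": c} for p, c in sorted(files.items())]
        return json.dumps(records, indent=2)
    if fmt == "txt":
        return "".join(f"=== {p} ===\n{c}\n" for p, c in sorted(files.items()))
    raise ValueError(f"unsupported format: {fmt}")


files = {"README.md": "# Demo", "main.py": "print('hi')"}
print(render(files, "txt"))
print(render(files, "json"))
```

A structured format like JSON keeps file boundaries machine-readable, while a flat text dump is easier to read or paste into another tool.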
📌 🚀 New in v1.0.0
- 🎉 First official release on PyPI (pip install gittxt)
- 🔄 Automatic caching for faster rescans
- 📦 Multi-threaded scanning for large repos
- 📝 Improved documentation & CLI stability
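One common way to implement this kind of caching is to store a content hash per file and rescan only the files whose hash has changed. The Python sketch below shows that idea. It is an assumption about the general technique, not a description of Gittxt's cache: the names file_digest and changed_files, the JSON cache file, and the use of SHA-256 are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, cache_file, force_rescan=False):
    """Return the paths whose content differs from the cached digests.

    The cache is a JSON file mapping path -> digest; it is rewritten on
    every call. With force_rescan=True the old cache is ignored.
    """
    cache_file = Path(cache_file)
    old = {}
    if cache_file.exists() and not force_rescan:
        old = json.loads(cache_file.read_text(encoding="utf-8"))
    new = {str(p): file_digest(p) for p in paths}
    cache_file.write_text(json.dumps(new), encoding="utf-8")
    return [p for p in new if old.get(p) != new[p]]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.txt"
    cache = Path(tmp) / "cache.json"
    src.write_text("v1", encoding="utf-8")
    first = changed_files([src], cache)    # no cache yet: file is scanned
    second = changed_files([src], cache)   # unchanged: skipped
    src.write_text("v2", encoding="utf-8")
    third = changed_files([src], cache)    # edited: scanned again
    forced = changed_files([src], cache, force_rescan=True)
print(len(first), len(second), len(third), len(forced))  # 1 0 1 1
```

The force_rescan parameter plays the role of the --force-rescan flag: it bypasses the stored digests so every file is treated as changed.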
📌 Development & Contribution
Want to contribute? Follow these steps:
1️⃣ Run Tests
pytest tests/
2️⃣ Formatting & Linting
black src/
3️⃣ Open a Pull Request
- Fork the repo
- Create a new branch (feature/my-change)
- Push changes
- Submit a PR! 🚀
📌 License
This project is licensed under the MIT License.
🚀 Next Steps
- [ ] Add support for Markdown (.md) output.
- [ ] Implement a Web UI for visualization.
- [ ] Improve error handling for edge cases.
📌 Made by Sandeep Paidipati
Download files
Source Distribution
Built Distribution
File details
Details for the file gittxt-1.0.0.tar.gz.
File metadata
- Download URL: gittxt-1.0.0.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f46b8700778d1ea284c69dc248ec4c4de0d820155cd4692569a19c2d55f6749 |
| MD5 | aefd8de2f29aa785964f2f21394c588f |
| BLAKE2b-256 | 3b691e5e63417c0f8f5efd266c9cf8baf0b995171fe63fc4cba82ff6a83b00e5 |
File details
Details for the file gittxt-1.0.0-py3-none-any.whl.
File metadata
- Download URL: gittxt-1.0.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2804053c34ef4560df5d00188df50c74db39e3e02085e07a16f46dd6e35ac26 |
| MD5 | 4d8802e206edbeb0cbed23c61d6c3563 |
| BLAKE2b-256 | 71faf4237d3deac583e7610b4b3d93dd78cb743b15e9c03502cc0315227e5c81 |