CLI tool for extracting text from Git repositories
📝 Gittxt: Extract Text from Git Repositories
Gittxt is a lightweight CLI tool that scans Git repositories (local or remote) and extracts text content into a consolidated file (.txt, .json).
It is designed for code summarization, AI preprocessing, offline reading, and documentation generation.
🚀 Features
- ✅ Scan Local or Remote Repositories (`git clone` support)
- ✅ Include & Exclude File Patterns (`--include .py`, `--exclude node_modules`)
- ✅ Multi-threaded Scanning (Optimized for large repositories)
- ✅ Supports JSON & TXT Output Formats (`--output-format json`)
- ✅ Incremental Caching for Faster Scans (Skips unchanged files)
- ✅ Force Full Rescan When Needed (`--force-rescan`)
- ✅ Improved Logging & Error Handling (More detailed messages for debugging)
- ✅ Better CLI Experience (Handles invalid inputs more effectively)
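To illustrate what incremental caching with a forced rescan can look like, here is a minimal Python sketch. It is not Gittxt's actual implementation: the function names, the JSON cache layout (relative path mapped to a SHA-256 digest), and the `force_rescan` parameter are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_scan(root: Path, cache_file: Path, force_rescan: bool = False) -> list[Path]:
    """Return files under root whose content changed since the last run.

    The cache is a hypothetical JSON map of relative path -> digest.
    """
    cache: dict[str, str] = {}
    if cache_file.exists() and not force_rescan:
        cache = json.loads(cache_file.read_text(encoding="utf-8"))

    changed: list[Path] = []
    fresh: dict[str, str] = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root)
        if ".git" in rel.parts or path.resolve() == cache_file.resolve():
            continue  # skip Git internals and the cache file itself
        key = rel.as_posix()
        fresh[key] = file_digest(path)
        if cache.get(key) != fresh[key]:
            changed.append(path)  # new or modified since the cached run

    # Persist the latest digests so the next run can skip unchanged files.
    cache_file.write_text(json.dumps(fresh, indent=2), encoding="utf-8")
    return changed
```

In this sketch, passing `force_rescan=True` simply ignores the stored digests, so every file is reported as changed and the cache is rebuilt from scratch.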
📌 Installation (From PyPI)
Now available on PyPI! 🎉 Install it with:
pip install gittxt
✅ Verify Installation
gittxt --help
Expected Output:
Usage: gittxt [OPTIONS] SOURCE
Options:
--include TEXT
--exclude TEXT
--size-limit INTEGER
--branch TEXT
--output-dir TEXT
--output-format [txt|json]
--max-lines INTEGER
--force-rescan
--help Show this message and exit.
📌 Usage Examples
1️⃣ Scan a Local Folder
gittxt .
📌 Result: Outputs extracted text from the repo.
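For readers who want a feel for what a local scan involves, the following Python sketch walks a folder, applies include and exclude filters, and concatenates the readable files. It is only an approximation under stated assumptions: `select_files` and `extract_text` are hypothetical names, `include` is treated as a set of file suffixes, `exclude` as a set of path components, and the `=== path ===` separator is invented for the example. Gittxt's real matching rules and output layout may differ.

```python
from pathlib import Path


def select_files(root: Path, include: tuple = (), exclude: tuple = ()) -> list:
    """Pick files whose suffix is in `include` and whose path avoids `exclude`."""
    selected = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root)
        if any(part in exclude for part in rel.parts):
            continue  # e.g. anything under node_modules/
        if include and path.suffix not in include:
            continue  # an empty include tuple means "accept every suffix"
        selected.append(path)
    return selected


def extract_text(root: Path, include: tuple = (), exclude: tuple = ()) -> str:
    """Concatenate the selected files, each preceded by a header line."""
    chunks = []
    for path in select_files(root, include, exclude):
        try:
            body = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # treat undecodable files as binary and skip them
        chunks.append(f"=== {path.relative_to(root).as_posix()} ===\n{body}")
    return "\n\n".join(chunks)
```

Calling `extract_text(Path("."), include=(".py",), exclude=("node_modules",))` would roughly mirror the intent of `gittxt . --include .py --exclude node_modules`.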
2️⃣ Scan a Remote GitHub Repository
gittxt https://github.com/torvalds/linux
📌 This will:
- Clone the Linux Kernel repo to a temporary directory.
- Extract all readable text.
- Save it in `gittxt_output.txt`.
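The three steps above can be sketched in Python as follows. This is an illustration, not Gittxt's code: `clone_command` and `dump_remote` are hypothetical helpers, and the shallow clone (`--depth 1`) is an assumption chosen to keep downloads small; the tool itself may clone differently.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def clone_command(url: str, dest: Path, branch: Optional[str] = None) -> list:
    """Build a shallow `git clone` command (history is not downloaded)."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch:
        cmd += ["--branch", branch]
    return cmd + [url, str(dest)]


def dump_remote(url: str, output: Path, branch: Optional[str] = None) -> None:
    """Clone into a temporary directory, then write all readable text to `output`."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        subprocess.run(clone_command(url, repo, branch), check=True)
        with output.open("w", encoding="utf-8") as out:
            for path in sorted(p for p in repo.rglob("*") if p.is_file()):
                rel = path.relative_to(repo)
                if ".git" in rel.parts:
                    continue  # skip Git's own metadata
                try:
                    text = path.read_text(encoding="utf-8")
                except UnicodeDecodeError:
                    continue  # skip files that are not valid UTF-8 text
                out.write(f"=== {rel.as_posix()} ===\n{text}\n")
    # The temporary clone is deleted automatically when the block exits.
```

For a repository as large as the Linux kernel, even a shallow clone takes considerable time and disk space, so a smaller repository is a better first test.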
3️⃣ Customize Output (JSON & TXT)
✅ Save as JSON (Structured Output)
gittxt . --output-format json --output repo_dump.json
✅ Save as TXT (Default)
gittxt . --output-format txt --output repo_dump.txt
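The difference between the two formats can be pictured with a short Python sketch. The JSON schema (`{"files": [{"path": ..., "content": ...}]}`) and the `=== path ===` text separator are hypothetical and chosen only for illustration; inspect a real Gittxt output file to see the actual structure.

```python
import json


def render(files: dict, output_format: str = "txt") -> str:
    """Serialize a mapping of path -> content as plain text or JSON."""
    if output_format == "json":
        # Structured output: one record per file, easy to load in other tools.
        records = [{"path": p, "content": c} for p, c in files.items()]
        return json.dumps({"files": records}, indent=2)
    if output_format == "txt":
        # Flat output: file bodies joined under simple header lines.
        return "\n\n".join(f"=== {p} ===\n{c}" for p, c in files.items())
    raise ValueError(f"unsupported output format: {output_format!r}")
```

The structured form is easier to post-process programmatically, while the flat form is easier to read or paste into another tool.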
📌 🚀 New in v1.1.0
- 🐛 Bug Fixes & Improvements
  - Fixed `--format` argument (Now use `--output-format`).
  - Better logging & error messages (Now logs issues more clearly).
  - More resilient CLI (Handles invalid paths properly).
- 🛠 Feature Enhancements
  - CLI now supports `--force-rescan` correctly.
  - Improved caching system (Scans only modified files).
  - More detailed scan reports.
- ✅ Full Test Coverage
  - 18/18 tests passing 🟢
  - New CLI tests added (`pytest tests/`).
📌 Development & Contribution
Want to contribute? Follow these steps:
1️⃣ Run Tests
pytest tests/
2️⃣ Formatting & Linting
black src/
3️⃣ Open a Pull Request
- Fork the repo
- Create a new branch (`feature/my-change`)
- Push changes
- Submit a PR! 🚀
📌 License
This project is licensed under the MIT License.
🚀 Next Steps
- [ ] Add support for Markdown (`.md`) output.
- [ ] Implement a Web UI for visualization.
- [ ] Improve error handling for edge cases.
📌 Made by Sandeep Paidipati
File details
Details for the file gittxt-1.1.0.tar.gz.
File metadata
- Download URL: gittxt-1.1.0.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd06b91d157ff6eb9a41ab8dcf0489e7b329136d14f00ba2c4cd133e90f31037 |
| MD5 | cb6ff00a16b3bd1bcc08977b198a536f |
| BLAKE2b-256 | 2533c74900bffa0217f71e8be5299bad02b941684dd813d54513bfbcfa10ec28 |
File details
Details for the file gittxt-1.1.0-py3-none-any.whl.
File metadata
- Download URL: gittxt-1.1.0-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9075ecacf38f4984c9656f05cbd8bcbb779f1c3a8a3499c6b99db10c841b1df0 |
| MD5 | 6537b8e38f85abaebe99a36977d2e544 |
| BLAKE2b-256 | 392d9eaace88a5fe711dc620819f9881ecfcdb9b8a779e8a4eb17dd8e856d686 |