CLI tool for extracting text from Git repositories
📝 Gittxt: Extract Text from Git Repositories
Gittxt is a lightweight CLI tool that scans Git repositories (local or remote) and extracts text content into a consolidated file (.txt or .json).
It is designed for code summarization, AI preprocessing, offline reading, and documentation generation.
🚀 Features
- ✅ Scan Local or Remote Repositories (git clone support)
- ✅ Include & Exclude File Patterns (--include .py, --exclude node_modules)
- ✅ Multi-threaded Scanning (Optimized for large repositories)
- ✅ Supports JSON & TXT Output Formats (--format json)
- ✅ Incremental Caching for Faster Scans (Skips unchanged files)
- ✅ Force Full Rescan When Needed (--force-rescan)
📌 Installation
1️⃣ Clone the Repository
git clone https://github.com/sandy-sp/gittxt.git
cd gittxt
2️⃣ Create & Activate Virtual Environment
python3 -m venv venv
source venv/bin/activate # For Linux/macOS
venv\Scripts\activate # For Windows
3️⃣ Install Dependencies
pip install -r requirements.txt
4️⃣ Install in Editable Mode (For Development)
pip install -e src/
📌 Usage
1️⃣ Scan a Local Folder
PYTHONPATH=src python src/gittxt/cli.py .
📌 Result: Outputs gittxt_output.txt containing extracted text.
2️⃣ Scan a Remote GitHub Repository
PYTHONPATH=src python src/gittxt/cli.py https://github.com/torvalds/linux
📌 This will:
- Clone the Linux Kernel repo to a temporary directory.
- Extract all readable text.
- Save it in gittxt_output.txt.
3️⃣ Customize Output (JSON & TXT)
✅ Save as JSON (Structured Output)
PYTHONPATH=src python src/gittxt/cli.py . --format json --output repo_dump.json
✅ Save as TXT (Default)
PYTHONPATH=src python src/gittxt/cli.py . --format txt --output repo_dump.txt
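To illustrate the difference between the two formats, here is a minimal sketch of how extracted text could be written out. It is not Gittxt's actual code: the function name `write_output`, the `path`/`content` record keys, and the `=== file ===` header style are assumptions made for the example.

```python
import json
from pathlib import Path


def write_output(files: dict[str, str], output_path: str, fmt: str = "txt") -> None:
    """Write extracted text as one JSON document or one concatenated TXT file.

    Hypothetical helper: the record layout and header style are assumptions,
    not Gittxt's documented output schema.
    """
    out = Path(output_path)
    if fmt == "json":
        # Structured: one record per file, easy to load back programmatically.
        records = [{"path": p, "content": c} for p, c in sorted(files.items())]
        out.write_text(json.dumps(records, indent=2), encoding="utf-8")
    elif fmt == "txt":
        # Flat: each file's text is preceded by a header line naming the file.
        parts = [f"=== {p} ===\n{c}\n" for p, c in sorted(files.items())]
        out.write_text("\n".join(parts), encoding="utf-8")
    else:
        raise ValueError(f"unsupported format: {fmt}")
```

JSON suits downstream tooling that needs per-file records, while TXT is convenient for reading or pasting into a prompt.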
4️⃣ Include & Exclude Specific Files
✅ Scan Only Python Files
PYTHONPATH=src python src/gittxt/cli.py . --include .py
✅ Exclude node_modules and .log Files
PYTHONPATH=src python src/gittxt/cli.py . --exclude node_modules --exclude .log
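The include and exclude options above mix file extensions (.py, .log) with directory names (node_modules). One way such patterns could be interpreted is sketched below. This is an assumption for illustration, not Gittxt's real matching logic; `matches` and `should_scan` are hypothetical names, and the rule that exclusions win over inclusions is also assumed.

```python
from pathlib import PurePosixPath


def matches(path: str, pattern: str) -> bool:
    """True if the pattern names a path component or is a suffix of the file name."""
    p = PurePosixPath(path)
    return pattern in p.parts or p.name.endswith(pattern)


def should_scan(path: str, include: list[str], exclude: list[str]) -> bool:
    # Assumption: exclusions take priority over inclusions.
    if any(matches(path, pat) for pat in exclude):
        return False
    # Assumption: an empty include list means "include everything".
    return not include or any(matches(path, pat) for pat in include)
```

With this rule, `should_scan("node_modules/x/index.py", [".py"], ["node_modules"])` is rejected even though the file ends in .py.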
5️⃣ Improve Performance (Multi-threading)
Gittxt automatically optimizes scanning based on repository size.
📌 Want to manually set workers? Use:
PYTHONPATH=src python src/gittxt/cli.py . --workers 8
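For readers curious what a worker count controls, the sketch below reads files concurrently with a thread pool. It assumes the scan is I/O-bound and that a `workers` value maps directly to the pool size; `read_text` and `scan_files` are hypothetical names, and Gittxt's internals may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_text(path: Path) -> tuple[str, str]:
    # Undecodable bytes are replaced instead of raising an error.
    return str(path), path.read_text(encoding="utf-8", errors="replace")


def scan_files(paths: list[Path], workers: int = 8) -> dict[str, str]:
    """Read files concurrently; the pool size plays the role of --workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        return dict(pool.map(read_text, paths))
```

Threads help here because file reads spend most of their time waiting on disk rather than on the CPU.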
6️⃣ Caching: Skip Unchanged Files for Faster Scans
Gittxt remembers previously scanned files to avoid redundant processing.
✅ First Scan (Full Processing)
PYTHONPATH=src python src/gittxt/cli.py .
✅ Second Scan (Uses Cache for Faster Results)
PYTHONPATH=src python src/gittxt/cli.py .
🚀 Faster! Skips unchanged files automatically!
7️⃣ Force a Full Rescan (Ignore Cache)
PYTHONPATH=src python src/gittxt/cli.py . --force-rescan
📌 Deletes .gittxt_cache.json and scans everything from scratch.
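The caching and forced-rescan behaviour described above can be sketched as follows. This is an illustration only: it assumes the cache is a JSON map from file path to SHA-256 content digest, which may not match the real layout of .gittxt_cache.json, and `file_digest` and `changed_files` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(paths: list[Path], cache_file: Path, force_rescan: bool = False) -> list[Path]:
    """Return paths whose content differs from the cached digest, then update the cache."""
    if force_rescan and cache_file.exists():
        cache_file.unlink()  # drop the cache so every file counts as changed
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    changed = []
    for path in paths:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    cache_file.write_text(json.dumps(cache, indent=2))
    return changed
```

A first call returns every file, a second call with no edits returns an empty list, and passing `force_rescan=True` returns every file again.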
📌 Development & Contribution
Want to contribute? Follow these steps:
1️⃣ Run Tests
pytest tests/
2️⃣ Formatting & Linting
black src/
3️⃣ Open a Pull Request
- Fork the repo
- Create a new branch (feature/my-change)
- Push changes
- Submit a PR! 🚀
📌 License
This project is licensed under the MIT License.
🚀 Next Steps
- [ ] Improve error handling for edge cases.
- [ ] Add support for Markdown (.md) output.
- [ ] Implement a Web UI for visualization.
📌 Made by Sandeep Paidipati
File details
Details for the file gittxt-0.1.0.tar.gz.
File metadata
- Download URL: gittxt-0.1.0.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f8efb24348e7897bcc7be55bc239b32565856b9324cbc3de8cbf3a87921e2d4 |
| MD5 | 4b1d1ac7fa80017dfd6d02227e90fbc5 |
| BLAKE2b-256 | c53a1ec33fe6ceb2a4d37186881b31fc1e3bd0f4ba4e3540442d181e6f9396e5 |
File details
Details for the file gittxt-0.1.0-py3-none-any.whl.
File metadata
- Download URL: gittxt-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7e258d9c67453daa6729bda68a3e111af7d9cc5a3bf962165936d3eabffd8ce |
| MD5 | 4aad0806335c1bf8be86badb8a4bb414 |
| BLAKE2b-256 | cfc08cb3b1509162967a068145e1cca9a0b29c4146c79726e3fcfc722810aa07 |