Get Text of Your Repo for AI, LLMs & Docs!
🚀 Gittxt: Get Text of Your Repo for AI, LLMs & Docs!
Gittxt is a lightweight CLI tool that extracts text from Git repositories and formats it into AI-friendly outputs (.txt, .json, .md).
Whether you use ChatGPT, Grok, Ollama, or any other LLM, Gittxt helps you process repositories for insights, training, and documentation.
✨ Why Use Gittxt?
✅ Extract Readable Text from Git Repos
✅ Convert Code & Docs into AI-Friendly Formats
✅ Generate JSON for LLM Training (Ideal for AI Preprocessing)
✅ Create Markdown Files for Documentation
✅ Summarize & Analyze GitHub Repositories
📌 Installation (From PyPI)
pip install gittxt
Verify installation:
gittxt --help
Expected Output:
Usage: gittxt [OPTIONS] SOURCE
Options:
--include TEXT
--exclude TEXT
--size-limit INTEGER
--branch TEXT
--output-dir TEXT
--output-format [txt|json|md]
--max-lines INTEGER
--summary
--debug
--help Show this message and exit.
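For scripted use, the options above can be assembled into a command line from Python. This is a minimal sketch, assuming gittxt is installed and on your PATH. `build_gittxt_command` is a hypothetical helper (not part of Gittxt), and it only uses flags that appear in the help output above.

```python
import shlex
from typing import List, Optional


def build_gittxt_command(
    source: str,
    output_format: str = "txt",
    output_dir: Optional[str] = None,
    branch: Optional[str] = None,
    summary: bool = False,
) -> List[str]:
    """Assemble an argument list using only flags listed by `gittxt --help`."""
    if output_format not in {"txt", "json", "md"}:
        raise ValueError(f"unsupported output format: {output_format}")
    cmd = ["gittxt", source, "--output-format", output_format]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if branch:
        cmd += ["--branch", branch]
    if summary:
        cmd.append("--summary")
    return cmd


if __name__ == "__main__":
    cmd = build_gittxt_command(".", output_format="json", summary=True)
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list (rather than a single string) avoids shell-quoting problems when the source path or branch name contains spaces.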
📌 How to Use Gittxt
1️⃣ Extract Text from a Local Repository
gittxt .
✅ Extracts all readable text from your repo into gittxt-outputs/text/.
2️⃣ Extract from a Remote GitHub Repo
gittxt https://github.com/sandy-sp/sandy-sp
✅ Automatically clones the repo, scans it, and extracts text.
3️⃣ Use AI-Friendly Output Formats
🧠 JSON (Best for AI & LLM Training)
gittxt . --output-format json
Why JSON?
- Structured output that is easy to parse before passing it to LLMs (GPT-4, Grok, LLaMA).
- Prepares structured data for AI training.
- Can be used to fine-tune models with repository insights.
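Before feeding a JSON dump into a training pipeline, it helps to check what shape it has. The exact schema Gittxt writes is not described here, so the sketch below makes no assumption about it: `describe_json` is a hypothetical helper that outlines any JSON value, and `repo_dump.json` is a placeholder path you would replace with the file Gittxt actually produced.

```python
import json
from pathlib import Path
from typing import Any


def describe_json(value: Any, max_depth: int = 2, depth: int = 0) -> str:
    """Return a short outline of a JSON value's shape without assuming a schema."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return f"object({len(value)} keys)"
        inner = ", ".join(
            f"{key}: {describe_json(item, max_depth, depth + 1)}"
            for key, item in value.items()
        )
        return "{" + inner + "}"
    if isinstance(value, list):
        if not value:
            return "array(0)"
        if depth >= max_depth:
            return f"array({len(value)})"
        # Only the first element is inspected; arrays are assumed homogeneous.
        return f"array({len(value)}) of {describe_json(value[0], max_depth, depth + 1)}"
    if value is None:
        return "null"
    if isinstance(value, bool):  # must be checked before int, since bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


if __name__ == "__main__":
    dump = Path("repo_dump.json")  # placeholder path, adjust to your output location
    if dump.exists():
        print(describe_json(json.loads(dump.read_text(encoding="utf-8"))))
    else:
        print(f"{dump} not found; run gittxt with --output-format json first")
```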
📜 TXT (For AI Chat & Analysis)
gittxt . --output-format txt
Why TXT?
- Extracts pure text, making it easy for AI-powered chat analysis.
- Good for summarization and AI-assisted code review.
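A plain-text dump of a whole repository is often larger than a chat model's context window, so it usually has to be split before pasting or sending it. The sketch below shows one simple way to do that with overlapping character-based chunks. `chunk_text` and its default sizes are illustrative assumptions, not something Gittxt provides.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, each sharing `overlap` chars with the previous one."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chars")
    chunks: List[str] = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "line of repository text\n" * 1000
    parts = chunk_text(sample, max_chars=5000, overlap=100)
    print(f"{len(sample)} characters split into {len(parts)} chunks")
```

The overlap keeps a little shared context between neighbouring chunks, which can help when a function or paragraph straddles a boundary.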
📝 Markdown (Best for Documentation)
gittxt . --output-format md
Why Markdown?
- Great for GitHub docs & project READMEs.
- LLMs like ChatGPT use Markdown for structured responses.
- Retains headings, code snippets, and structure.
4️⃣ Get a Summary Report
gittxt . --summary
Example Output:
📊 Summary Report:
- Scanned 105 text files
- Total Size: 3.2 MB
- File Types: .py, .md, .txt
- Saved in: gittxt-outputs/text/repo_dump.txt
✅ Helps you quickly size up a repository before using it for AI training.
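To make the numbers in a report like this concrete, here is a rough sketch of how similar figures (file count, total size, file types) could be computed with the standard library. It is not Gittxt's implementation: `summarize_tree` is a hypothetical function, and unlike Gittxt it counts every file outside `.git` rather than only text files.

```python
from pathlib import Path
from typing import Dict, List, Union


def summarize_tree(root: Union[str, Path]) -> Dict[str, Union[int, List[str]]]:
    """Count files, total bytes, and distinct extensions under root, skipping .git."""
    root = Path(root)
    files = [
        p for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    ]
    return {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "file_types": sorted({p.suffix for p in files if p.suffix}),
    }


if __name__ == "__main__":
    report = summarize_tree(".")
    print(f"- Scanned {report['file_count']} files")
    print(f"- Total Size: {report['total_bytes'] / 1_000_000:.1f} MB")
    print(f"- File Types: {', '.join(report['file_types'])}")
```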
🆕 What's New in v1.2.0?
✅ Bug Fixes & Enhancements
- Better file filtering (--include, --exclude).
- Faster processing with improved caching.
- More accurate MIME-type detection.
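As an illustration of how pattern filtering and MIME-type checks can work together, here is a small standard-library sketch. It assumes glob-style patterns and a "keep text-like files" rule. How Gittxt itself interprets --include and --exclude, or detects MIME types, is not specified here, and `keep_file` is a hypothetical name.

```python
import mimetypes
from fnmatch import fnmatchcase
from typing import Sequence


def keep_file(path: str, include: Sequence[str] = (), exclude: Sequence[str] = ()) -> bool:
    """Decide whether a file should be scanned, using glob patterns and a MIME guess."""
    # Exclusions win over inclusions.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # An empty include list means "include everything not excluded".
    if include and not any(fnmatchcase(path, pattern) for pattern in include):
        return False
    # Guess the MIME type from the file extension only (no content sniffing).
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return True  # unknown extension: keep it and let a later step decide
    return mime.startswith("text/") or mime == "application/json"


if __name__ == "__main__":
    for candidate in ["src/app.py", "tests/test_app.py", "logo.png"]:
        verdict = keep_file(candidate, include=["*.py", "*.png"], exclude=["tests/*"])
        print(f"{candidate}: {'keep' if verdict else 'skip'}")
```

Note that `fnmatchcase` lets `*` match path separators, so `tests/*` also matches files in nested subdirectories of `tests/`.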
🚀 New Features
- ✅ Markdown Output (--output-format md) → Generates AI-friendly structured docs.
- 📊 Summary Reports (--summary) → Instantly view repo insights.
- 🔍 Debug Mode (--debug) → See detailed logs of the extraction process.
📌 Contribute & Develop
1️⃣ Run Tests
pytest tests/
2️⃣ Format Code
black src/
3️⃣ Submit a PR
- Fork the repo
- Create a new branch (feature/my-change)
- Push changes
- Submit a PR! 🚀
📜 License
Gittxt is licensed under the MIT License.
💡 Next Features Coming Soon!
- Interactive CLI for easy selection
- Web UI for scanning repositories visually
- Smarter AI-based file summarization
📌 Made by Sandeep Paidipati
File details
Details for the file gittxt-1.2.0.tar.gz.
File metadata
- Download URL: gittxt-1.2.0.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 399a7783a8d4b66544cc7a8931c3be4e4697434aaa2283ca664921890fffde09 |
| MD5 | 6ca4900ba50d5dc4e08d293b827918ab |
| BLAKE2b-256 | 8c26cac8e45e276bff1d837f46f835d04a08e4e3773e1af1c99c3580eb11c01e |
File details
Details for the file gittxt-1.2.0-py3-none-any.whl.
File metadata
- Download URL: gittxt-1.2.0-py3-none-any.whl
- Upload date:
- Size: 14.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd3673e6ca517e7f2384eb31a77bac41ee8aee10e0ecc054531c27ed5b14c8d3 |
| MD5 | 7c595a234505bee2241b2d2e26973a5e |
| BLAKE2b-256 | e0a5d9751141f8f3f0c7ca65a831478b6226d17a31e108292636beaaaaadba1c |