Get Text of Your Repo for AI, LLMs & Docs!

🚀 Gittxt: Get Text of Your Repo for AI, LLMs & Docs!

Gittxt is a lightweight CLI tool that extracts text from Git repositories and formats it into AI-friendly outputs (.txt, .json, .md).
Whether you use ChatGPT, Grok, Ollama, or any other LLM, Gittxt helps you process repositories for insights, training, and documentation.

✨ Why Use Gittxt?

  • Extract readable text from Git repos
  • Convert code & docs into AI-friendly formats
  • Generate JSON for LLM training (ideal for AI preprocessing)
  • Create Markdown files for documentation
  • Summarize & analyze GitHub repositories


📌 Installation (From PyPI)

pip install gittxt

Verify installation:

gittxt --help

Expected Output:

Usage: gittxt [OPTIONS] SOURCE
Options:
  --include TEXT
  --exclude TEXT
  --size-limit INTEGER
  --branch TEXT
  --output-dir TEXT
  --output-format [txt|json|md]
  --max-lines INTEGER
  --summary
  --debug
  --help  Show this message and exit.
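If you want to drive Gittxt from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of Gittxt itself: `build_gittxt_command` is a hypothetical helper, and it only uses flags that appear in the help text above.

```python
from typing import List, Optional


def build_gittxt_command(
    source: str,
    output_format: str = "txt",
    output_dir: Optional[str] = None,
    branch: Optional[str] = None,
    summary: bool = False,
) -> List[str]:
    """Assemble a gittxt argument list (hypothetical helper, not Gittxt's API)."""
    # The help text lists exactly these three formats.
    if output_format not in ("txt", "json", "md"):
        raise ValueError(f"unsupported output format: {output_format}")
    cmd = ["gittxt", source, "--output-format", output_format]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if branch is not None:
        cmd += ["--branch", branch]
    if summary:
        cmd.append("--summary")
    return cmd


# To actually run the command (requires gittxt on PATH):
#   import subprocess
#   subprocess.run(build_gittxt_command(".", "json"), check=True)
print(build_gittxt_command(".", output_format="md", summary=True))
```

Passing the command as a list avoids shell quoting issues when the source is a URL or a path with spaces.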

📌 How to Use Gittxt

1️⃣ Extract Text from a Local Repository

gittxt .

✅ Extracts all readable text from your repo into gittxt-outputs/text/.


2️⃣ Extract from a Remote GitHub Repo

gittxt https://github.com/sandy-sp/sandy-sp

✅ Automatically clones the repo, scans it, and extracts text.
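To give a feel for what the "scan" step involves, here is a rough sketch of walking a checked-out repository and keeping only files that look like text. This is an illustration under assumptions, not Gittxt's actual implementation: the function names, the 4096-byte sample size, and the NUL-byte/UTF-8 heuristic are all choices made for this example.

```python
import codecs
import os
from pathlib import Path
from typing import Iterator


def looks_like_text(path: Path, sample_size: int = 4096) -> bool:
    """Heuristic: the first bytes contain no NUL and decode as UTF-8."""
    try:
        with path.open("rb") as handle:
            sample = handle.read(sample_size)
    except OSError:
        return False
    if b"\x00" in sample:
        return False
    # An incremental decoder tolerates a multi-byte character that is
    # cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def iter_text_files(root: str) -> Iterator[Path]:
    """Yield text-like files under root, skipping the .git directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            candidate = Path(dirpath) / name
            if looks_like_text(candidate):
                yield candidate
```

For example, `for path in iter_text_files("."): print(path)` lists the candidate files in the current directory.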


3️⃣ Use AI-Friendly Output Formats

🧠 JSON (Best for AI & LLM Training)

gittxt . --output-format json --output repo_dump.json

Why JSON?

  • Structured format that tooling built around LLMs (GPT-4, Grok, LLaMA) can parse directly.
  • Prepares structured data for AI training.
  • Can be used to fine-tune models with repository insights.
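As an illustration of why structured output is convenient for preprocessing, the sketch below serializes a set of files into JSON records. The schema (`path`, `lines`, `content`) and the `files_to_json` helper are assumptions made for this example; the actual layout of Gittxt's JSON output is not documented here and may differ.

```python
import json
from typing import Dict


def files_to_json(files: Dict[str, str]) -> str:
    """Serialize {path: content} into a list of records (hypothetical schema)."""
    records = [
        {
            "path": path,
            "lines": len(content.splitlines()),
            "content": content,
        }
        for path, content in sorted(files.items())
    ]
    # ensure_ascii=False keeps non-ASCII source text readable in the dump.
    return json.dumps(records, indent=2, ensure_ascii=False)


dump = files_to_json({"a.py": "x = 1\ny = 2\n", "b.md": "# Title\n"})
for record in json.loads(dump):
    print(record["path"], record["lines"])
```

Each record carries its own path, so a downstream script can filter or chunk files without re-parsing a flat text dump.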

📜 TXT (For AI Chat & Analysis)

gittxt . --output-format txt --output repo_dump.txt

Why TXT?

  • Extracts plain text that is easy to paste into an AI chat for analysis.
  • Good for summarization and AI-assisted code review.

📝 Markdown (Best for Documentation)

gittxt . --output-format md --output repo_dump.md

Why Markdown?

  • Great for GitHub docs & project READMEs.
  • LLMs like ChatGPT use Markdown for structured responses.
  • Retains headings, code snippets, and structure.
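To illustrate how headings and code snippets can be preserved in a Markdown dump, here is a small sketch that renders each file under its own heading inside a fenced block. The layout and the `files_to_markdown` helper are assumptions for this example and are not taken from Gittxt's source.

```python
from pathlib import PurePosixPath
from typing import Dict

# Built at runtime so this example does not contain a literal fence.
FENCE = "`" * 3


def files_to_markdown(files: Dict[str, str], title: str = "Repository dump") -> str:
    """Render {path: content} as Markdown, one section per file (hypothetical layout)."""
    parts = [f"# {title}", ""]
    for path, content in sorted(files.items()):
        # Use the file extension as a language hint on the fence.
        hint = PurePosixPath(path).suffix.lstrip(".")
        parts += [f"## {path}", "", f"{FENCE}{hint}", content.rstrip("\n"), FENCE, ""]
    return "\n".join(parts)


print(files_to_markdown({"src/app.py": "print('hi')\n"}))
```

One caveat of this simple approach: a file that itself contains a triple-backtick fence would need a longer outer fence to render correctly.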

4️⃣ Get a Summary Report

gittxt . --summary

Example Output:

📊 Summary Report:
 - Scanned 105 text files
 - Total Size: 3.2 MB
 - File Types: .py, .md, .txt
 - Saved in: gittxt-outputs/text/repo_dump.txt

Helps you quickly assess a repository before using it for AI training.
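The figures in a report like the one above can be derived from the extracted files with a few lines of code. This is a sketch under assumptions, not Gittxt's implementation: `summarize` and `format_summary` are hypothetical helpers, and the size is computed from UTF-8 byte length with 1 MB taken as 1024 × 1024 bytes.

```python
from pathlib import PurePosixPath
from typing import Dict


def summarize(files: Dict[str, str]) -> dict:
    """Compute file count, total size, and file types for {path: content}."""
    total_bytes = sum(len(content.encode("utf-8")) for content in files.values())
    # Files without an extension are grouped under "(none)".
    types = {PurePosixPath(path).suffix or "(none)" for path in files}
    return {
        "file_count": len(files),
        "total_bytes": total_bytes,
        "file_types": sorted(types),
    }


def format_summary(summary: dict) -> str:
    """Format the summary in a layout similar to the example report."""
    total_mb = summary["total_bytes"] / (1024 * 1024)
    return "\n".join(
        [
            "Summary Report:",
            f" - Scanned {summary['file_count']} text files",
            f" - Total Size: {total_mb:.1f} MB",
            f" - File Types: {', '.join(summary['file_types'])}",
        ]
    )


print(format_summary(summarize({"a.py": "x = 1\n", "README.md": "# Hi\n"})))
```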


🆕 What's New in v1.2.0?

Bug Fixes & Enhancements

  • Better file filtering (--include, --exclude).
  • Faster processing with improved caching.
  • More accurate MIME-type detection.
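The idea behind include/exclude filtering and MIME-type detection can be sketched with the Python standard library. The snippet below is an illustration only: it assumes glob-style patterns and extension-based MIME guessing, which may not match how Gittxt interprets `--include` and `--exclude` or how it detects file types internally.

```python
import fnmatch
import mimetypes
from typing import List, Optional


def is_selected(
    path: str,
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> bool:
    """Apply glob-style exclude patterns first, then include patterns."""
    if exclude and any(fnmatch.fnmatch(path, pattern) for pattern in exclude):
        return False
    if include:
        return any(fnmatch.fnmatch(path, pattern) for pattern in include)
    # With no include patterns, everything not excluded is kept.
    return True


def is_text_mime(path: str) -> bool:
    """Guess the MIME type from the file name and keep only text/* types."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed is not None and guessed.startswith("text/")


print(is_selected("src/app.py", include=["*.py"], exclude=["tests/*"]))
print(is_text_mime("notes.txt"), is_text_mime("logo.png"))
```

Note that `fnmatch` lets `*` match path separators, so `*.py` also matches `src/app.py`; extension-based guessing cannot classify files without a recognized extension.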

🚀 New Features

  • ✅ Markdown Output (--output-format md) → Generates AI-friendly structured docs.
  • 📊 Summary Reports (--summary) → Instantly view repo insights.
  • 🔍 Debug Mode (--debug) → See detailed logs of the extraction process.

📌 Contribute & Develop

1️⃣ Run Tests

pytest tests/

2️⃣ Format Code

black src/

3️⃣ Submit a PR

  1. Fork the repo
  2. Create a new branch (feature/my-change)
  3. Push changes
  4. Submit a PR! 🚀

📜 License

Gittxt is licensed under the MIT License.


💡 Next Features Coming Soon!

  • Interactive CLI for easy selection
  • Web UI for scanning repositories visually
  • Smarter AI-based file summarization

📌 Made by Sandeep Paidipati

