Get Text of Your Repo for AI, LLMs & Docs!
🚀 Gittxt: Get Text of Your Repo for AI, LLMs & Docs!
Gittxt is a lightweight CLI tool that extracts text from Git repositories and formats it into AI-friendly outputs (.txt, .json, .md). Whether you’re using ChatGPT, Grok, Ollama or any LLM, Gittxt helps you process repositories for insights, training, and documentation.
✨ Why Use Gittxt?
- Extract Readable Text: Easily pull text from code, docs, and other repository files.
- AI-Friendly Outputs: Generate outputs in TXT, JSON, and Markdown for different use cases.
- Efficient Processing: Faster scanning with incremental caching.
- Flexible Filtering: Use advanced flags like `--docs-only` and `--auto-filter` to control what’s extracted.
- Multi-Repository Support: Scan one or more repositories in a single command.
🆕 Release v1.4.0
New Features & Enhancements
- Interactive Installation: Use the new `gittxt install` subcommand to set up your configuration (output directory, logging preferences, etc.) interactively.
- Multi-Repository Scanning: Scan multiple repositories at once, whether they are local or remote.
- Advanced Filtering Options:
  - `--docs-only`: Extract only documentation files (e.g., README, docs/ folder, etc.).
  - `--auto-filter`: Automatically skip common unwanted or binary files.
- Multi-Format Output: Specify multiple output formats simultaneously (e.g., `--output-format txt,json,md`).
- Enhanced Summary Reports: Outputs include summary statistics and an estimated token count for further AI processing.
- Improved Logging & Caching: Faster, more accurate scanning with incremental caching and a rotating log file system.
- Improved Token Estimation: Enhanced token counting algorithm with better accuracy for LLM processing, including support for CamelCase, special characters, and subword tokenization patterns.
📥 Installation
Via PIP
pip install gittxt==1.4.0
First-Time Setup (Interactive)
After installing, run:
gittxt install
This command will prompt you to configure:
- Your default output directory (automatically set based on your OS, e.g., `~/Gittxt/` on Linux/Mac)
- Logging level and file logging preferences
📌 How to Use Gittxt
1. Scanning Repositories
Use the scan subcommand to extract text and generate outputs.
Scan a Local Repository
gittxt scan .
Extracts all readable text into the default output directories.
Scan a Remote GitHub Repository
gittxt scan https://github.com/sandy-sp/sandy-sp
Automatically clones the repository, scans it, and extracts text.
Scan Multiple Repositories with Advanced Options
gittxt scan /path/to/repo1 https://github.com/user/repo2 --output-format txt,json --docs-only --auto-filter --summary
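If you want to drive a multi-repository scan from a script, one option is to assemble the command line in Python and hand it to `subprocess`. The sketch below uses only the flags shown above; the helper names `build_scan_command` and `run_scan` are hypothetical and not part of Gittxt.

```python
import subprocess
from typing import Sequence


def build_scan_command(
    repos: Sequence[str],
    formats: Sequence[str] = ("txt",),
    docs_only: bool = False,
    auto_filter: bool = False,
    summary: bool = False,
) -> list[str]:
    """Build the argument list for a `gittxt scan` invocation."""
    if not repos:
        raise ValueError("at least one repository path or URL is required")
    cmd = ["gittxt", "scan", *repos, "--output-format", ",".join(formats)]
    if docs_only:
        cmd.append("--docs-only")
    if auto_filter:
        cmd.append("--auto-filter")
    if summary:
        cmd.append("--summary")
    return cmd


def run_scan(*args, **kwargs) -> None:
    """Run the scan; requires the gittxt executable to be on PATH."""
    subprocess.run(build_scan_command(*args, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.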
🔧 CLI Options
| Option | Description |
|---|---|
| `--include` | Include only files matching these patterns. |
| `--exclude` | Exclude files matching these patterns. |
| `--size-limit` | Exclude files larger than the specified size (in bytes). |
| `--branch` | Specify a Git branch (for remote repositories). |
| `--output-dir` | Override the default output directory. |
| `--output-format` | Comma-separated list of output formats (e.g., txt,json,md). |
| `--max-lines` | Limit the number of lines per file. |
| `--summary` | Display a summary report after scanning. |
| `--debug` | Enable debug mode for detailed logging. |
| `--docs-only` | Only extract documentation files (e.g., README, docs folder). |
| `--auto-filter` | Automatically skip common unwanted or binary files. |
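The filtering options can be pictured as a small predicate applied to every file. The sketch below is one plausible reading of `--include`, `--exclude`, `--size-limit`, and `--max-lines`, not Gittxt's implementation: the function names, the use of glob-style patterns, and the rule that excludes win over includes are all assumptions.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def is_selected(
    path: str,
    size_bytes: int,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    size_limit: Optional[int] = None,
) -> bool:
    """Decide whether a file should be extracted (hypothetical semantics)."""
    # Files larger than the limit are skipped.
    if size_limit is not None and size_bytes > size_limit:
        return False
    # Assumption: an exclude match always wins over an include match.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    # With include patterns given, the file must match at least one.
    if include and not any(fnmatchcase(path, pattern) for pattern in include):
        return False
    return True


def limit_lines(text: str, max_lines: Optional[int] = None) -> str:
    """Keep only the first max_lines lines of a file's text."""
    if max_lines is None:
        return text
    return "".join(text.splitlines(keepends=True)[:max_lines])
```

Note that `fnmatchcase` lets `*` match path separators, so `*.py` matches `src/app.py` in this sketch.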
📄 Output Formats
- TXT: Simple text extraction for AI chat and quick analysis.
- JSON: Structured output ideal for LLM training and data preprocessing.
- Markdown (MD): Neatly formatted documentation for GitHub or project READMEs.
When specifying multiple formats (e.g., --output-format txt,json), Gittxt generates separate files in their respective output directories.
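As a rough illustration of how a comma-separated format list could map onto per-format output files, consider the sketch below. The subdirectory names come from the Directory Structure section; the `output_paths` helper and the `<repo_name>.<ext>` file naming are assumptions, so check your own output directory for the names Gittxt actually uses.

```python
from pathlib import Path

# Subdirectory per format, as listed under "Directory Structure".
FORMAT_DIRS = {"txt": "text", "json": "json", "md": "md"}


def output_paths(output_dir: str, repo_name: str, output_format: str) -> dict[str, Path]:
    """Map each requested format to a hypothetical output file path."""
    paths: dict[str, Path] = {}
    for fmt in output_format.split(","):
        fmt = fmt.strip().lower()
        if fmt not in FORMAT_DIRS:
            raise ValueError(f"unsupported output format: {fmt!r}")
        # Assumed naming scheme: <output_dir>/<format dir>/<repo_name>.<ext>
        paths[fmt] = Path(output_dir) / FORMAT_DIRS[fmt] / f"{repo_name}.{fmt}"
    return paths
```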
🗂 Directory Structure
By default, outputs are stored in your configured output directory, which is organized as follows:
<output_dir>/
├── text/ # Plain text outputs (.txt)
├── json/ # JSON outputs (.json)
├── md/ # Markdown outputs (.md)
└── cache/ # Caching for incremental scans
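The `cache/` directory supports incremental scans. The README does not describe the cache format, so the following is only a sketch of the general idea: store a content hash per file and reprocess a file only when its hash changes. The function names and the JSON index file are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(files: Iterable[Path], cache_file: Path) -> list[Path]:
    """Return files that are new or modified since the last call, then update the cache."""
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    changed = []
    for path in files:
        digest = file_digest(path)
        if cache.get(str(path)) != digest:
            changed.append(path)
            cache[str(path)] = digest
    # Persist the updated index so the next scan can skip unchanged files.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(cache, indent=2))
    return changed
```

On a second run over an unchanged repository, `changed_files` returns an empty list, which is where the time savings of an incremental scan would come from.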
⚙️ Configuration
Gittxt uses a configuration file (gittxt-config.json) to store user preferences. You can update this configuration via the interactive install command:
gittxt install
Or edit the file manually. Key settings include:
- Output Directory: Auto-determined based on your OS (e.g., `~/Gittxt/`).
- Logging Options: Logging level and file logging preferences.
- Filtering Options: Include/exclude patterns, file size limits, etc.
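If you edit the file by hand, it can help to think of it as user settings layered over defaults. The sketch below shows that pattern in Python. The key names (`output_dir`, `logging_level`, `file_logging`, `size_limit`) are placeholders invented for this example; the real keys in gittxt-config.json may differ, so inspect the file that `gittxt install` writes before relying on them.

```python
import json
from pathlib import Path

# Hypothetical keys and defaults, for illustration only.
DEFAULTS = {
    "output_dir": str(Path.home() / "Gittxt"),
    "logging_level": "INFO",
    "file_logging": True,
    "size_limit": None,
}


def load_config(path: str, defaults: dict = DEFAULTS) -> dict:
    """Return defaults overridden by whatever the JSON config file defines."""
    config = dict(defaults)
    config_path = Path(path)
    if config_path.exists():
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    return config
```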
📌 Contribute & Develop
- Run Tests: `pytest tests/`
- Format Code: `black src/`
- Submit a PR:
  - Fork the repo.
  - Create a new branch (e.g., `feature/my-change`).
  - Push your changes.
  - Submit a PR.
For more details, see the Contributing Guide.
💡 Future Roadmap
Our future plans include enhancements to the user interface and further AI-based features. We’re working on a lightweight web-based UI and additional improvements that streamline repository analysis and documentation extraction.
📜 License
Gittxt is licensed under the MIT License.
Made by Sandeep Paidipati
File details
Details for the file gittxt-1.4.0.tar.gz.
File metadata
- Download URL: gittxt-1.4.0.tar.gz
- Upload date:
- Size: 18.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad2cea90024f054556c4b0ec6bbad9f7c93f562921a7631162907d8e1f508fe2 |
| MD5 | 5be502b418c9537cf0ae26cc62a87b1d |
| BLAKE2b-256 | 24ef79ff6825d047dc0bfb80edb06ec9eb3472c458baa4e7157846570feb58fb |
File details
Details for the file gittxt-1.4.0-py3-none-any.whl.
File metadata
- Download URL: gittxt-1.4.0-py3-none-any.whl
- Upload date:
- Size: 19.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 574185e09df0667af6ca0626eb7a750e5cff789abf7e281ac8ac8dee8a1b4e2a |
| MD5 | bb9666c5ba239d7176cd2b988690ef1a |
| BLAKE2b-256 | 0a8250b7e6b5ee870f5774dc9faf618959c6daa55bd4f6bb3cb5761c1a4c1508 |
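If you download a distribution file manually, you can compare its SHA256 digest against the value listed for that file. A minimal sketch using only the Python standard library follows; the helper names `sha256_of` and `verify_sha256` are made up for this example.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify_sha256("gittxt-1.4.0.tar.gz", "ad2cea90024f054556c4b0ec6bbad9f7c93f562921a7631162907d8e1f508fe2")` should return `True` for an intact copy of the source distribution.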