
Get Text of Your Repo for AI, LLMs & Docs!


🚀 Gittxt: Get Text of Your Repo for AI, LLMs & Docs!


Gittxt is a lightweight CLI tool that extracts text from Git repositories and formats it into AI-friendly outputs (.txt, .json, .md). Whether you’re using ChatGPT, Grok, Ollama, or any other LLM, Gittxt helps you prepare repositories for analysis, training, and documentation.


✨ Why Use Gittxt?

  • Extract Readable Text: Easily pull text from code, docs, and other repository files.
  • AI-Friendly Outputs: Generate outputs in TXT, JSON, and Markdown for different use cases.
  • Efficient Processing: Incremental caching speeds up repeat scans of the same repository.
  • Flexible Filtering: Use advanced flags like --docs-only and --auto-filter to control what’s extracted.
  • Multi-Repository Support: Scan one or more repositories in a single command.
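The --docs-only and --auto-filter flags can be pictured as simple path and content checks. The Python sketch below is a hypothetical illustration rather than Gittxt's actual rules: the README prefix test, the docs folder check, the extension list, the skipped directory names, and the NUL-byte binary test are all assumptions.

```python
from pathlib import Path

# Hypothetical heuristics for illustration; Gittxt's real rules may differ.
DOC_EXTENSIONS = {".md", ".rst", ".txt"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def is_doc_file(rel_path: str) -> bool:
    """Rough --docs-only test: README files, or doc-like files under a docs/ folder."""
    path = Path(rel_path)
    if path.name.upper().startswith("README"):
        return True
    return "docs" in path.parts[:-1] and path.suffix.lower() in DOC_EXTENSIONS


def looks_binary(data: bytes) -> bool:
    """Rough binary test: a NUL byte in the first 8 KiB usually means non-text."""
    return b"\x00" in data[:8192]


def should_skip(rel_path: str, data: bytes) -> bool:
    """Rough --auto-filter test: skip unwanted directories and binary content."""
    if any(part in SKIP_DIRS for part in Path(rel_path).parts):
        return True
    return looks_binary(data)
```

For example, is_doc_file("docs/guide.md") returns True, while should_skip("node_modules/x.js", b"var x;") returns True because of the directory name alone.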

🆕 Release v1.4.1

New Features & Enhancements

  • Interactive Installation:
    Use the new gittxt install subcommand to set up your configuration (output directory, logging preferences, etc.) interactively.

  • Multi-Repository Scanning:
    Scan multiple repositories at once, whether they are local or remote.

  • Advanced Filtering Options:

    • --docs-only: Extract only documentation files (e.g., README, docs/ folder, etc.).
    • --auto-filter: Automatically skip common unwanted or binary files.
  • Multi-Format Output:
    Specify multiple output formats simultaneously (e.g., --output-format txt,json,md).

  • Enhanced Summary Reports:
    Outputs include summary statistics and an estimated token count for further AI processing.

  • Improved Logging & Caching:
    Faster, more accurate scanning with incremental caching, plus rotating log files.

  • Improved Token Estimation:
    A more accurate token-counting algorithm for LLM processing that accounts for CamelCase identifiers, special characters, and subword tokenization patterns.
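As a rough illustration of what accounting for CamelCase, special characters, and subwords can mean, here is a hypothetical token estimator in Python. It is not Gittxt's algorithm: the identifier-splitting regex and the assumption of about four characters per subword token (a common rule of thumb for English BPE tokenizers) are choices made for this sketch.

```python
import math
import re

# Split identifiers into pieces: "parseHTTPResponse2" -> parse, HTTP, Response, 2
WORD_PIECE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")
# Any character that is neither a word character nor whitespace.
SPECIAL = re.compile(r"[^\w\s]")


def estimate_tokens(text: str, chars_per_subword: int = 4) -> int:
    """Rough token estimate under assumed rules, not Gittxt's implementation."""
    count = len(SPECIAL.findall(text))  # assume each symbol costs one token
    for piece in WORD_PIECE.findall(text):
        # Assume long pieces break into several subword tokens.
        count += max(1, math.ceil(len(piece) / chars_per_subword))
    return count
```

With these assumptions, estimate_tokens("parseHTTPResponse2") splits the identifier into parse, HTTP, Response, and 2 and returns 6, where a plain whitespace split would count a single word.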


📥 Installation

Via PIP

pip install gittxt==1.4.1

First-Time Setup (Interactive)

After installing, run:

gittxt install

This command will prompt you to configure:

  • Your default output directory (automatically set based on your OS, e.g., ~/Gittxt/ on Linux/macOS)
  • Logging level and file logging preferences
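The setup prompts described above can be sketched as a function that asks a few questions and falls back to defaults on blank answers. This is a hypothetical Python illustration, not the code behind gittxt install: the prompt wording, the accepted log levels, the returned key names, and the use of ~/Gittxt on every OS (the README only gives it as the Linux/Mac example) are assumptions.

```python
from pathlib import Path
from typing import Callable

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def prompt_setup(ask: Callable[[str], str] = input) -> dict:
    """Ask for output and logging settings; blank answers keep the defaults."""
    # Assumption: the same home-relative default is used on every OS.
    default_dir = Path.home() / "Gittxt"
    output_dir = ask(f"Output directory [{default_dir}]: ").strip() or str(default_dir)

    level = ask("Logging level [INFO]: ").strip().upper() or "INFO"
    if level not in LOG_LEVELS:
        level = "INFO"  # fall back instead of failing on a typo

    log_to_file = ask("Write logs to a file? [y/N]: ").strip().lower() in {"y", "yes"}
    # Hypothetical key names; the real gittxt-config.json schema may differ.
    return {"output_dir": output_dir, "log_level": level, "file_logging": log_to_file}
```

Passing the ask function as a parameter keeps the sketch testable without a terminal: any callable that returns canned answers can stand in for input.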

📌 How to Use Gittxt

1. Scanning Repositories

Use the scan subcommand to extract text and generate outputs.

Scan a Local Repository

gittxt scan .

Extracts all readable text from the repository in the current directory into the default output directory.

Scan a Remote GitHub Repository

gittxt scan https://github.com/sandy-sp/sandy-sp

Automatically clones the repository, scans it, and extracts text.

Scan Multiple Repositories with Advanced Options

gittxt scan /path/to/repo1 https://github.com/user/repo2 --output-format txt,json --docs-only --auto-filter --summary
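The same multi-repository command can be assembled from Python, for example in a batch job, by building an argument list and passing it to subprocess. The helper names and their parameters are hypothetical; the helper only emits flags that appear in the CLI options table of this README, and it assumes --docs-only, --auto-filter, and --summary are on/off switches that take no value.

```python
import shlex
import subprocess
from typing import Sequence


def build_scan_command(
    repos: Sequence[str],
    formats: Sequence[str] = ("txt",),
    docs_only: bool = False,
    auto_filter: bool = False,
    summary: bool = False,
) -> list[str]:
    """Assemble a gittxt scan invocation from documented flags (hypothetical helper)."""
    if not repos:
        raise ValueError("at least one repository path or URL is required")
    cmd = ["gittxt", "scan", *repos, "--output-format", ",".join(formats)]
    if docs_only:
        cmd.append("--docs-only")
    if auto_filter:
        cmd.append("--auto-filter")
    if summary:
        cmd.append("--summary")
    return cmd


def run_scan(repos: Sequence[str], **options) -> int:
    """Run the scan and return its exit code (requires gittxt on PATH)."""
    cmd = build_scan_command(repos, **options)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Calling build_scan_command with the two repositories above, formats ("txt", "json"), and all three switches enabled reproduces the command shown, argument for argument. Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.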

🔧 CLI Options

Option            Description
--include         Include only files matching these patterns.
--exclude         Exclude files matching these patterns.
--size-limit      Exclude files larger than the specified size (in bytes).
--branch          Specify a Git branch (for remote repositories).
--output-dir      Override the default output directory.
--output-format   Comma-separated list of output formats (e.g., txt,json,md).
--max-lines       Limit the number of lines per file.
--summary         Display a summary report after scanning.
--debug           Enable debug mode for detailed logging.
--docs-only       Only extract documentation files (e.g., README, docs folder).
--auto-filter     Automatically skip common unwanted or binary files.
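The --include, --exclude, --size-limit, and --max-lines options can be pictured as a per-file filter. The Python sketch below rests on assumptions the table does not state: patterns are treated as shell-style globs matched case-sensitively against the repository-relative path, an exclude match wins over an include match, and --max-lines truncates a file rather than skipping it. Gittxt's real matching rules may differ.

```python
from fnmatch import fnmatchcase
from itertools import islice
from pathlib import Path
from typing import Iterable, Optional


def keep_file(rel_path: str, size_bytes: int, include: Iterable[str] = (),
              exclude: Iterable[str] = (), size_limit: Optional[int] = None) -> bool:
    """Apply assumed --size-limit, --exclude, and --include semantics, in that order."""
    if size_limit is not None and size_bytes > size_limit:
        return False  # larger than the limit, which is given in bytes
    # fnmatchcase is case-sensitive on every OS; its * also matches "/".
    if any(fnmatchcase(rel_path, pat) for pat in exclude):
        return False  # an exclude match wins over any include match
    include = list(include)
    return not include or any(fnmatchcase(rel_path, pat) for pat in include)


def read_limited(path: Path, max_lines: Optional[int] = None) -> str:
    """Read text, keeping at most max_lines lines (None keeps everything)."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return "".join(islice(handle, max_lines))
```

Checking the cheap size test first means large files are rejected before any pattern matching or reading happens.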

📄 Output Formats

  • TXT: Simple text extraction for AI chat and quick analysis.
  • JSON: Structured output ideal for LLM training and data preprocessing.
  • Markdown (MD): Neatly formatted documentation for GitHub or project READMEs.

When specifying multiple formats (e.g., --output-format txt,json), Gittxt generates separate files in their respective output directories.
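Producing several formats from one scan amounts to rendering the same extracted content once per format and saving each result in its own folder. In the hypothetical Python sketch below, the folder names text, json, and md follow the directory structure described in this README, while the function names, the per-repository file naming, and the JSON fields are assumptions rather than Gittxt's actual output schema.

```python
import json
from pathlib import Path

SUBDIRS = {"txt": "text", "json": "json", "md": "md"}  # format -> output subfolder
FENCE = "`" * 3  # Markdown code fence; avoids a literal triple backtick in this example


def render(files: dict[str, str], repo_name: str, fmt: str) -> str:
    """Render {relative_path: content} in one format (hypothetical layout)."""
    if fmt == "txt":
        return "\n".join(f"=== {path} ===\n{text}" for path, text in files.items())
    if fmt == "json":
        records = [{"path": path, "content": text} for path, text in files.items()]
        return json.dumps({"repository": repo_name, "files": records}, indent=2)
    if fmt == "md":
        sections = [f"## {path}\n\n{FENCE}\n{text}\n{FENCE}" for path, text in files.items()]
        return f"# {repo_name}\n\n" + "\n\n".join(sections)
    raise ValueError(f"unsupported format: {fmt}")


def write_outputs(files: dict[str, str], repo_name: str,
                  output_dir: Path, formats: list[str]) -> list[Path]:
    """Write one file per requested format into <output_dir>/<subfolder>/."""
    written = []
    for fmt in formats:
        if fmt not in SUBDIRS:
            raise ValueError(f"unsupported format: {fmt}")
        target = output_dir / SUBDIRS[fmt] / f"{repo_name}.{fmt}"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(files, repo_name, fmt), encoding="utf-8")
        written.append(target)
    return written
```

Keeping render free of file I/O means each format can be checked on its own, and adding a format only touches render and SUBDIRS.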


🗂 Directory Structure

By default, outputs are stored in your configured output directory, which is organized as follows:

<output_dir>/
  ├── text/    # Plain text outputs (.txt)
  ├── json/    # JSON outputs (.json)
  ├── md/      # Markdown outputs (.md)
  └── cache/   # Caching for incremental scans
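Incremental caching usually means storing a fingerprint per file and reprocessing only files whose fingerprint changed since the last run. A minimal Python sketch, assuming a SHA-256 content hash per file saved as JSON inside the cache folder; the cache file name hashes.json and the function names are hypothetical, and Gittxt may fingerprint files differently (for example by modification time).

```python
import hashlib
import json
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """Content fingerprint used to detect changes between scans."""
    return hashlib.sha256(data).hexdigest()


def snapshot(repo_root: Path) -> dict[str, str]:
    """Map each file's repo-relative POSIX path to its fingerprint, skipping .git."""
    result = {}
    for path in sorted(p for p in repo_root.rglob("*") if p.is_file()):
        rel = path.relative_to(repo_root).as_posix()
        if rel == ".git" or rel.startswith(".git/"):
            continue  # Git's own metadata is not repository content
        result[rel] = fingerprint(path.read_bytes())
    return result


def files_to_rescan(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files that are new or whose content changed since the cached snapshot."""
    return sorted(rel for rel, digest in new.items() if old.get(rel) != digest)


def incremental_scan(repo_root: Path, cache_dir: Path) -> list[str]:
    """Compare against cache_dir/hashes.json (hypothetical name), then refresh it."""
    cache_file = cache_dir / "hashes.json"
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new = snapshot(repo_root)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(new, indent=2))
    return files_to_rescan(old, new)
```

On the first run the cache is empty, so every file is returned; later runs return only new or edited files.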

⚙️ Configuration

Gittxt uses a configuration file (gittxt-config.json) to store user preferences. You can update this configuration via the interactive install command:

gittxt install

Or edit the file manually. Key settings include:

  • Output Directory: Auto-determined based on your OS (e.g., ~/Gittxt/).
  • Logging Options: Logging level and file logging preferences.
  • Filtering Options: Include/exclude patterns, file size limits, etc.
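When editing the file by hand, it helps to know how such tools typically layer a user file over built-in defaults, so that a partial file still works. The Python sketch below is hypothetical: the key names, default values, and recursive merge behavior are assumptions, not Gittxt's documented schema, so check the file written by gittxt install for the real keys.

```python
import copy
import json
from pathlib import Path
from typing import Any

# Hypothetical keys and defaults mirroring the settings listed above.
DEFAULTS: dict[str, Any] = {
    "output_dir": str(Path.home() / "Gittxt"),
    "logging": {"level": "INFO", "file_logging": False},
    "filters": {"include": [], "exclude": [], "size_limit": None},
}


def merge(base: dict, override: dict) -> dict:
    """Recursively overlay override onto a deep copy of base; inputs are not modified."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path: Path) -> dict:
    """Read the JSON config if it exists; any missing key falls back to DEFAULTS."""
    user = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return merge(DEFAULTS, user)
```

With a recursive merge, a file that sets only a logging level keeps every other default intact.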

📌 Contribute & Develop

  1. Run Tests:
    pytest tests/
    
  2. Format Code:
    black src/
    
  3. Submit a PR:
    • Fork the repo.
    • Create a new branch (e.g., feature/my-change).
    • Commit and push your changes to your fork.
    • Open a pull request.

For more details, see the Contributing Guide.


💡 Future Roadmap

Our future plans include enhancements to the user interface and further AI-based features. We’re working on a lightweight web-based UI and additional improvements that streamline repository analysis and documentation extraction.


📜 License

Gittxt is licensed under the MIT License.


Made by Sandeep Paidipati


Download files


Source Distribution

gittxt-1.4.1.tar.gz (18.5 kB)

Built Distribution

gittxt-1.4.1-py3-none-any.whl (20.0 kB)

File details

Details for the file gittxt-1.4.1.tar.gz.

File metadata

  • Download URL: gittxt-1.4.1.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure

File hashes

Hashes for gittxt-1.4.1.tar.gz
Algorithm Hash digest
SHA256 70a1cc0462d72f9cfa77e4062570411cb0961823234343b05bdc46a8a6a46559
MD5 c45d3cb9fc38f8e73a8fb161594f7c9e
BLAKE2b-256 b85d717ba721f3640f39b655d1df5b2ae079caf166896fedd4f49985edccc4cc

File details

Details for the file gittxt-1.4.1-py3-none-any.whl.

File metadata

  • Download URL: gittxt-1.4.1-py3-none-any.whl
  • Upload date:
  • Size: 20.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure

File hashes

Hashes for gittxt-1.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b809107111f67cec421fba98c18646cca3cd3ff8136e820628dfe986d15c4533
MD5 23f3c3b6ce478d8dd7e78e62b6c82b76
BLAKE2b-256 c01d399720ecdbc2b6fb1176526da4309741b70f39696b7b0a79c68fd6be6515
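To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal Python sketch using the standard hashlib and hmac modules; the function names are illustrative. pip can also enforce digests itself when installing with its --require-hashes option.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Iterable

# Published SHA256 digests for the 1.4.1 release files, copied from this page.
EXPECTED_SHA256 = {
    "gittxt-1.4.1.tar.gz": "70a1cc0462d72f9cfa77e4062570411cb0961823234343b05bdc46a8a6a46559",
    "gittxt-1.4.1-py3-none-any.whl": "b809107111f67cec421fba98c18646cca3cd3ff8136e820628dfe986d15c4533",
}


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hash data incrementally so large files never need to fit in memory."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    with path.open("rb") as handle:
        actual = sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

For example, verify_file(Path("gittxt-1.4.1.tar.gz"), EXPECTED_SHA256["gittxt-1.4.1.tar.gz"]) should return True for an intact download of the source distribution.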
