
Get Text of Your Repo for AI, LLMs & Docs!

Project description

🚀 Gittxt: Get Text of Your Repo for AI, LLMs & Docs!

Gittxt is a lightweight CLI tool that extracts text from Git repositories and formats it into AI-friendly outputs (.txt, .json, .md). Whether you’re using ChatGPT, Grok, Ollama, or another LLM, Gittxt helps you process repositories for insights, training, and documentation.


✨ Why Use Gittxt?

  • Extract Readable Text: Easily pull text from code, docs, and other repository files.
  • AI-Friendly Outputs: Generate outputs in TXT, JSON, and Markdown for different use cases.
  • Efficient Processing: Incremental caching speeds up repeated scans.
  • Flexible Filtering: Use advanced flags like --docs-only and --auto-filter to control what’s extracted.
  • Multi-Repository Support: Scan one or more repositories in a single command.

🆕 Release v1.3.1

New Features & Enhancements

  • Interactive Installation:
    Use the new gittxt install subcommand to set up your configuration (output directory, logging preferences, etc.) interactively.

  • Multi-Repository Scanning:
    Scan multiple repositories at once, whether they are local or remote.

  • Advanced Filtering Options:

    • --docs-only: Extract only documentation files (e.g., README files and the docs/ folder).
    • --auto-filter: Automatically skip common unwanted or binary files.
  • Multi-Format Output:
    Specify multiple output formats simultaneously (e.g., --output-format txt,json,md).

  • Enhanced Summary Reports:
    Outputs include summary statistics and an estimated token count for further AI processing.

  • Improved Logging & Caching:
    Faster, more accurate scanning with incremental caching and rotating log files.
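Summary reports include an estimated token count, but this page does not say how Gittxt computes it. As a rough illustration only, the sketch below applies the common rule of thumb of about four characters per token; real tokenizers (and possibly Gittxt itself) will give different numbers. The helpers `estimate_tokens` and `summarize` and the statistic names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Roughly estimate how many LLM tokens `text` will use.

    Hypothetical helper, not part of Gittxt: it applies the common
    ~4-characters-per-token heuristic instead of a real tokenizer.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def summarize(files: dict[str, str]) -> dict[str, int]:
    """Build simple summary statistics for extracted files (path -> text)."""
    return {
        "files": len(files),
        "lines": sum(len(body.splitlines()) for body in files.values()),
        "chars": sum(len(body) for body in files.values()),
        "estimated_tokens": sum(estimate_tokens(body) for body in files.values()),
    }


if __name__ == "__main__":
    extracted = {
        "README.md": "# Demo\nGittxt example.\n",
        "src/app.py": "print('hello')\n",
    }
    print(summarize(extracted))
```

Estimating per file and summing keeps the per-file numbers available for a report; a tokenizer-based count would be more accurate at the cost of an extra dependency.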


📥 Installation

Via PIP

pip install gittxt==1.3.1

First-Time Setup (Interactive)

After installing, run:

gittxt install

This command will prompt you to configure:

  • Your default output directory (automatically set based on your OS, e.g., ~/Gittxt/ on Linux/macOS)
  • Logging level and file logging preferences

📌 How to Use Gittxt

1. Scanning Repositories

Use the scan subcommand to extract text and generate outputs.

Scan a Local Repository

gittxt scan .

Extracts all readable text into the default output directories.

Scan a Remote GitHub Repository

gittxt scan https://github.com/sandy-sp/sandy-sp

Automatically clones the repository, scans it, and extracts text.

Scan Multiple Repositories with Advanced Options

gittxt scan /path/to/repo1 https://github.com/user/repo2 --output-format txt,json --docs-only --auto-filter --summary
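Because a single scan command can mix local paths and remote URLs, each target has to be classified before anything is cloned. This page does not document how Gittxt makes that decision, so the following is a minimal sketch under stated assumptions: targets with an http(s), ssh, or git scheme, or an scp-style git@host: prefix, are treated as remote, and remote targets get a shallow `git clone` (the shallow clone is our illustrative choice). The helpers `is_remote` and `clone_command` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed set of URL schemes that indicate a remote repository.
REMOTE_SCHEMES = {"http", "https", "ssh", "git"}


def is_remote(target: str) -> bool:
    """Guess whether a scan target is a remote Git URL rather than a local path."""
    if target.startswith("git@"):  # scp-style, e.g. git@github.com:user/repo.git
        return True
    return urlparse(target).scheme in REMOTE_SCHEMES


def clone_command(url: str, dest: str, branch: Optional[str] = None) -> list[str]:
    """Build a shallow `git clone` argument list, optionally for one branch."""
    cmd = ["git", "clone", "--depth", "1"]
    if branch:
        cmd += ["--branch", branch]
    return cmd + [url, dest]
```

The returned list can be passed to `subprocess.run(..., check=True)`; Windows paths such as `C:\repos\demo` parse with a one-letter scheme, which is not in the remote set, so they stay local.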

🔧 CLI Options

Option            Description
--include         Include only files matching these patterns.
--exclude         Exclude files matching these patterns.
--size-limit      Exclude files larger than the specified size (in bytes).
--branch          Specify a Git branch (for remote repositories).
--output-dir      Override the default output directory.
--output-format   Comma-separated list of output formats (e.g., txt,json,md).
--max-lines       Limit the number of lines per file.
--summary         Display a summary report after scanning.
--debug           Enable debug mode for detailed logging.
--docs-only       Only extract documentation files (e.g., README, docs folder).
--auto-filter     Automatically skip common unwanted or binary files.
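The filtering options above can be approximated in plain Python. This is a sketch, not Gittxt's implementation: it assumes glob-style patterns for --include and --exclude (the page does not specify the pattern syntax), stands in for --auto-filter with the common heuristic that a file containing a NUL byte near its start is binary, and truncates text the way --max-lines describes. The names `looks_binary`, `should_extract`, and `read_limited` are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Optional, Sequence


def looks_binary(path: Path, probe_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain NUL."""
    with path.open("rb") as fh:
        return b"\x00" in fh.read(probe_size)


def should_extract(
    path: Path,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    size_limit: Optional[int] = None,
    auto_filter: bool = False,
) -> bool:
    """Decide whether a file passes include/exclude/size/binary filters."""
    name = path.as_posix()
    # fnmatch's "*" also matches "/", so "*.py" matches files in subfolders.
    if include and not any(fnmatch(name, pat) for pat in include):
        return False
    if any(fnmatch(name, pat) for pat in exclude):
        return False
    if size_limit is not None and path.stat().st_size > size_limit:
        return False
    if auto_filter and looks_binary(path):
        return False
    return True


def read_limited(path: Path, max_lines: Optional[int] = None) -> str:
    """Read a text file, keeping at most `max_lines` lines."""
    text = path.read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines(keepends=True)
    return "".join(lines if max_lines is None else lines[:max_lines])
```

Checking the cheap name-based filters first means the size lookup and the binary probe only run for files that could still be extracted.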

📄 Output Formats

  • TXT: Simple text extraction for AI chat and quick analysis.
  • JSON: Structured output ideal for LLM training and data preprocessing.
  • Markdown (MD): Neatly formatted documentation for GitHub or project READMEs.

When you specify multiple formats (e.g., --output-format txt,json), Gittxt writes a separate file for each format, placed in that format's own subdirectory of the output directory.
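To illustrate how one scan can produce one file per requested format, each in its own subdirectory (text/, json/, md/, as listed under Directory Structure), here is a minimal sketch. Gittxt's actual output schema is not documented on this page, so the JSON field names, the file naming (repository name plus extension), the Markdown layout, and the helpers `render` and `write_outputs` are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical mapping from format name to subdirectory, mirroring the
# text/, json/ and md/ folders listed under "Directory Structure".
FORMAT_DIRS = {"txt": "text", "json": "json", "md": "md"}
FENCE = "~~~"  # Markdown code fence used in the .md output (illustrative choice)


def render(fmt: str, repo: str, files: dict[str, str]) -> str:
    """Render extracted files (path -> text) in a single output format."""
    if fmt == "txt":
        return "".join(f"=== {path} ===\n{body}\n" for path, body in files.items())
    if fmt == "json":
        records = [{"path": path, "content": body} for path, body in files.items()]
        return json.dumps({"repository": repo, "files": records}, indent=2)
    if fmt == "md":
        parts = [f"# {repo}\n"]
        for path, body in files.items():
            parts.append(f"\n## {path}\n\n{FENCE}\n{body}\n{FENCE}\n")
        return "".join(parts)
    raise ValueError(f"unsupported format: {fmt}")


def write_outputs(output_dir: Path, repo: str, files: dict[str, str], formats: str) -> list[Path]:
    """Write one file per format in a comma-separated list such as "txt,json"."""
    written = []
    for fmt in (f.strip() for f in formats.split(",") if f.strip()):
        if fmt not in FORMAT_DIRS:
            raise ValueError(f"unsupported format: {fmt}")
        target_dir = output_dir / FORMAT_DIRS[fmt]
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{repo}.{fmt}"
        target.write_text(render(fmt, repo, files), encoding="utf-8")
        written.append(target)
    return written
```

Only the requested subdirectories are created, so a txt,json scan leaves no empty md/ folder behind.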


🗂 Directory Structure

By default, outputs are stored in your configured output directory, which is organized as follows:

<output_dir>/
  ├── text/    # Plain text outputs (.txt)
  ├── json/    # JSON outputs (.json)
  ├── md/      # Markdown outputs (.md)
  └── cache/   # Caching for incremental scans
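The cache/ directory supports incremental scans. This page does not describe the cache format, so the following is only a sketch of one common approach: record a SHA-256 digest per file in a JSON file and reprocess only files whose digest has changed since the last scan. The cache file name `hashes.json` and the helpers `file_digest` and `changed_files` are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths: Iterable[Path], cache_dir: Path) -> list[Path]:
    """Return files whose content differs from the last recorded scan.

    Digests are stored in <cache_dir>/hashes.json (hypothetical name) and
    rewritten after every call, so an unchanged file is skipped next time.
    """
    cache_file = cache_dir / "hashes.json"
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new, changed = {}, []
    for path in paths:
        key = str(path)
        new[key] = file_digest(path)
        if old.get(key) != new[key]:
            changed.append(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(new, indent=2))
    return changed
```

Hashing contents rather than comparing modification times avoids reprocessing files that were touched but not edited, at the cost of reading every file once per scan.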

⚙️ Configuration

Gittxt uses a configuration file (gittxt-config.json) to store user preferences. You can update this configuration via the interactive install command:

gittxt install

Or edit the file manually. Key settings include:

  • Output Directory: Auto-determined based on your OS (e.g., ~/Gittxt/).
  • Logging Options: Logging level and file logging preferences.
  • Filtering Options: Include/exclude patterns, file size limits, etc.

📌 Contribute & Develop

  1. Run Tests:
    pytest tests/
    
  2. Format Code:
    black src/
    
  3. Submit a PR:
    • Fork the repo.
    • Create a new branch (e.g., feature/my-change).
    • Commit your changes and push the branch to your fork.
    • Open a pull request against the main repository.

For more details, see the Contributing Guide.


💡 Future Roadmap

Our future plans include enhancements to the user interface and further AI-based features. We’re working on a lightweight web-based UI and additional improvements that streamline repository analysis and documentation extraction.


📜 License

Gittxt is licensed under the MIT License.


Made by Sandeep Paidipati





Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gittxt-1.3.1.tar.gz (17.2 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gittxt-1.3.1-py3-none-any.whl (19.1 kB)

Uploaded Python 3

File details

Details for the file gittxt-1.3.1.tar.gz.

File metadata

  • Download URL: gittxt-1.3.1.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure

File hashes

Hashes for gittxt-1.3.1.tar.gz
Algorithm     Hash digest
SHA256        626ddbfddb039e2328f5a0e3ed705233a70e320b145100132169cb098f5bad8b
MD5           e90c72fd532643cf4b3d4cf230421959
BLAKE2b-256   85fa4254f8c8574aaa32d9934f32ccb7a1eb85141a5a2c0da26e36aaacd4d148


File details

Details for the file gittxt-1.3.1-py3-none-any.whl.

File metadata

  • Download URL: gittxt-1.3.1-py3-none-any.whl
  • Upload date:
  • Size: 19.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.13.2 Linux/6.8.0-1021-azure

File hashes

Hashes for gittxt-1.3.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        6c5c9eb13f39f0361bb2d42792172a4045fd941a46e9d2b3e265ff7c6d3b2960
MD5           e79a901089effcbd86fb6ccdb53760ba
BLAKE2b-256   89637674cb2733ecadfb140df1810a45ee1c2df64f6d32c5a0358eb821a98510

