A tool to combine multiple text files into one for LLM training and prompts

TextMeld

A CLI tool that combines the text files in a directory into a single output. It is intended for preparing LLM training data and for prompt engineering.

Features

  • Combine multiple text files into a single file
  • Automatic recognition of .gitignore patterns
  • Automatic skipping of binary files and hidden files
  • Option to limit output character count
  • Flexible file exclusion patterns

Installation

pip install textmeld

Using Poetry:

poetry add textmeld

Usage

Basic Usage

# Basic usage (outputs to stdout)
textmeld /path/to/your/directory
# Specify output file
textmeld /path/to/your/directory -o output.txt
# Limit maximum character count
textmeld /path/to/your/directory --max-chars 100000
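The README does not say how `--max-chars` behaves once the limit is reached. As a minimal sketch, assuming the output is simply cut at the limit, the idea could look like this in Python. The function name `limit_chars` is hypothetical and is not part of TextMeld.

```python
from typing import Optional


def limit_chars(text: str, max_chars: Optional[int]) -> str:
    """Return text cut to at most max_chars characters.

    Assumption: plain truncation. TextMeld itself may instead stop at a
    file boundary or report an error; the README does not specify this.
    """
    if max_chars is None or len(text) <= max_chars:
        return text
    return text[:max_chars]


merged = "x" * 250_000
print(len(limit_chars(merged, 100_000)))  # length after applying the limit
```

Counting characters rather than bytes matches the option name, though token counts for a given LLM will differ from either.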

Available Options

usage: textmeld [-h] [-o OUTPUT] [-e EXCLUDE] [-m MAX_CHARS] directory

A tool to merge multiple text files into one file

positional arguments:
  directory             Target directory path

options:
  -h, --help            Show help message and exit
  -o OUTPUT, --output OUTPUT
                        Output file path (if not specified, outputs to stdout)
  -e EXCLUDE, --exclude EXCLUDE
                        File patterns to exclude (can specify multiple)
  -m MAX_CHARS, --max-chars MAX_CHARS
                        Maximum character count for output
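To make the option semantics concrete, here is a sketch of an `argparse` parser that accepts the same arguments as the help text above. It is reconstructed from the help output only and is not TextMeld's actual source. Treating `-e` as a repeatable option via `action="append"` and parsing `-m` as an integer are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented textmeld options (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="textmeld",
        description="A tool to merge multiple text files into one file",
    )
    parser.add_argument("directory", help="Target directory path")
    parser.add_argument(
        "-o", "--output",
        help="Output file path (if not specified, outputs to stdout)",
    )
    # Assumption: each -e adds one pattern to a list.
    parser.add_argument(
        "-e", "--exclude", action="append", default=[],
        help="File patterns to exclude (can specify multiple)",
    )
    # Assumption: the limit is parsed as an integer.
    parser.add_argument(
        "-m", "--max-chars", type=int,
        help="Maximum character count for output",
    )
    return parser


args = build_parser().parse_args(
    ["src", "-e", "*.log", "-e", "*.tmp", "-m", "100000"]
)
print(args.directory, args.exclude, args.max_chars, args.output)
```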

Using Exclusion Patterns

To exclude specific files or directories:

# Exclude specific extensions
textmeld /path/to/your/directory -e "*.log" -e "*.tmp"
# Exclude specific directories
textmeld /path/to/your/directory -e "node_modules/" -e "venv/"
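The README does not state the exact matching rules for `-e` patterns. The sketch below shows one plausible reading, using Python's `fnmatch`: a pattern ending in `/` excludes any path under a directory of that name, and any other pattern is matched against the file name or the whole relative path. The helper `is_excluded` is hypothetical, and TextMeld may follow full `.gitignore` semantics instead.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable


def is_excluded(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if rel_path matches any exclusion pattern (a sketch)."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against every directory component.
            dir_name = pattern.rstrip("/")
            if any(fnmatch(part, dir_name) for part in parts[:-1]):
                return True
        elif fnmatch(parts[-1], pattern) or fnmatch(rel_path, pattern):
            # File pattern: compare against the file name or the full path.
            return True
    return False


patterns = ["*.log", "node_modules/"]
for candidate in ["logs/app.log", "node_modules/pkg/index.js", "src/main.py"]:
    print(candidate, is_excluded(candidate, patterns))
```

Note that `fnmatch` is simpler than `.gitignore` matching: it has no negation (`!`) or `**` handling, so results can differ for more complex patterns.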

Output Format

TextMeld's output consists of two parts:

  1. Directory Structure: A tree view of the target directory
  2. Merged Content: Combined contents of all text files (each file has a header)

Directory Structure:
====================
└── project/
    ├── README.md
    ├── main.py
    └── utils/
        └── helper.py
Merged Content:
====================
==========
File: project/README.md
==========
# Project Documentation
...
==========
File: project/main.py
==========
def main():
    print("Hello World")
...
==========
File: project/utils/helper.py
==========
def helper_function():
    return True
...
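Because each file in the merged content is introduced by a three-line header, the output can be split back into individual files. The following is a sketch that assumes the header is exactly a line of ten `=` characters, a `File: <path>` line, and another line of ten `=` characters, as in the example above. The function `split_merged` is hypothetical and not part of TextMeld.

```python
from typing import Dict, List, Optional

SEPARATOR = "=" * 10  # assumed header delimiter, taken from the example output


def split_merged(text: str) -> Dict[str, str]:
    """Map each 'File: <path>' header to the text that follows it."""
    lines = text.splitlines()
    files: Dict[str, List[str]] = {}
    current: Optional[str] = None
    i = 0
    while i < len(lines):
        is_header = (
            i + 2 < len(lines)
            and lines[i] == SEPARATOR
            and lines[i + 1].startswith("File: ")
            and lines[i + 2] == SEPARATOR
        )
        if is_header:
            current = lines[i + 1][len("File: "):]
            files[current] = []
            i += 3
            continue
        if current is not None:
            files[current].append(lines[i])
        i += 1
    return {path: "\n".join(body) for path, body in files.items()}


sample = (
    "Merged Content:\n"
    "====================\n"
    "==========\n"
    "File: project/main.py\n"
    "==========\n"
    "def main():\n"
    '    print("Hello World")\n'
    "==========\n"
    "File: project/utils/helper.py\n"
    "==========\n"
    "def helper_function():\n"
    "    return True\n"
)
print(sorted(split_merged(sample)))
```

A file whose own content contains the same three-line pattern would be split incorrectly, so this approach is only reliable for inputs that do not embed TextMeld-style headers.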

Supported File Formats

TextMeld automatically detects text files. Commonly handled formats include:

  • Markdown (.md)
  • Text (.txt)
  • YAML (.yaml, .yml)
  • JSON (.json)
  • Python (.py)
  • JavaScript (.js)
  • TypeScript (.ts)
  • JSX/TSX (.jsx, .tsx)
  • HTML (.html)
  • CSS (.css)
  • Other text-based file formats
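The README does not describe how text files are detected or how hidden files are recognized. A common heuristic, shown here as a sketch under that assumption, is to read the start of a file and treat it as binary if it contains a NUL byte or is not valid UTF-8, and to treat any path component starting with a dot as hidden. The names `looks_like_text` and `is_hidden` are hypothetical.

```python
import codecs
from pathlib import PurePosixPath


def looks_like_text(path: str, sample_size: int = 8192) -> bool:
    """Heuristic: no NUL bytes and valid UTF-8 in the first sample_size bytes."""
    with open(path, "rb") as handle:
        chunk = handle.read(sample_size)
    if b"\x00" in chunk:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the sample end.
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def is_hidden(rel_path: str) -> bool:
    """Return True if any path component starts with a dot (e.g. .git, .env)."""
    return any(
        part.startswith(".") and part not in (".", "..")
        for part in PurePosixPath(rel_path).parts
    )
```

This heuristic only recognizes UTF-8 compatible text. Files in other encodings, such as UTF-16, would be classified as binary by this sketch.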

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
