A versatile Python tool to convert documents (PPTX, DOCX, PDF, XLSX) to plain text, ideal for providing context to AI code assistants like GitHub Copilot and Amazon CodeWhisperer.

📄 txtify

txtify is a command-line tool and Python library that converts PowerPoint, Word, PDF, and Excel documents into clean plain text files.
It is suited to extracting content for analysis, archiving, or giving AI assistants such as GitHub Copilot and Amazon CodeWhisperer context about your project's domain knowledge, requirements, and existing documentation.


✨ Features

Multi-Format Support: Converts .pptx (PowerPoint), .docx (Word), .pdf (Portable Document Format), and .xlsx (Excel) files.
Multiple Output Formats: Export content as plain text (.txt), Markdown (.md), or JSON (.json).
Batch Processing: Convert multiple files or entire directories at once.
Clean Text Output: Extracts core textual content, making documents easily searchable and readable for both humans and AI.
Intuitive CLI: Simple command-line interface for quick and easy conversions.
Preserves Structure: When converting directories, the original folder structure is replicated in the output.


🤖 Providing Context to AI Code Assistants

One of the most powerful use cases for txtify is to prepare your project's non-code documentation (e.g., design documents, requirement specifications, meeting notes, data dictionaries) for consumption by AI code generation tools like GitHub Copilot, Amazon CodeWhisperer, or similar LLM-based assistants.

Why this is useful

  • Expand AI's Knowledge Base: Let the AI "read" and understand domain-specific terminology, project goals, architectural decisions, and detailed requirements that might otherwise be locked away in binary formats.
  • Improve Code Relevance: The AI can generate more relevant and accurate code suggestions, function names, and comments by leveraging the textual context.
  • Reduce Hallucinations: With accurate project information available, the AI is less likely to "hallucinate" or make incorrect assumptions.
  • Seamless Integration: Place the converted .txt files in a directory accessible to your IDE; many assistants can then index and use this information automatically.

Example Workflow

  1. Convert your documentation:

    txtify ./docs_and_requirements/ -o ./ai_context/
    
  2. Integrate with your project: Place the ai_context/ folder directly within your main project repository.

  3. Let your AI assistant learn: An assistant that indexes files in your workspace can now draw on the information in these plain text files, enabling more context-aware code suggestions.


🚀 Installation

You can install txtify directly from PyPI using pip:

pip install txtify

💡 Usage (Command Line Interface)

txtify can be used directly from your terminal.

Convert a Single File

Pass the path to your document as an argument:

txtify my_project_spec.docx

This will create a plain text file named my_project_spec.txt inside a new output/ directory by default.


Convert an Entire Directory

Provide the path to a directory, and txtify will scan it (and its subdirectories) for all supported document types:

txtify project_documentation/

All convertible files will be processed. The original directory structure will be mirrored in the output/ folder. For example:

project_documentation/meetings/q1_notes.pptx

becomes:

output/project_documentation/meetings/q1_notes.txt
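
The mapping in the example above can be sketched in Python. This is only an illustration of the documented behavior, not txtify's own code: `mirrored_output_path` is a hypothetical helper name, and the sketch assumes the input path is kept whole beneath the output directory, as the example shows.

```python
from pathlib import PurePosixPath


def mirrored_output_path(source, output_dir="output", extension=".txt"):
    """Return where a converted file would land under a mirrored layout.

    Hypothetical helper: it reproduces the README example, not txtify internals.
    """
    src = PurePosixPath(source)
    # Swap the document extension for the output one, then nest the
    # unchanged relative path under the output directory.
    return PurePosixPath(output_dir) / src.with_suffix(extension)


print(mirrored_output_path("project_documentation/meetings/q1_notes.pptx"))
# output/project_documentation/meetings/q1_notes.txt
```

Passing a different `output_dir` or `extension` models the -o option and the other output formats in the same way.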

Convert Multiple Files and Directories

You can process any combination of files and directories in a single command. txtify will scan all specified paths and convert every supported document it finds.

txtify my_spec.docx docs_folder/ old_project/ requirements.pdf

This converts every supported document found in the specified files and directories to a .txt file in the output/ directory.


Specify an Output Directory

Use the -o or --output option to choose a different location for your converted files:

txtify legacy_reports/ -o contextual_data/

This saves all converted text files into the contextual_data/ directory.

Specify an Output Format

By default, all documents are converted to plain text (.txt), so this flag is only needed for other formats. To convert to Markdown or JSON, use the --output-format option.

txtify my_document.pdf --output-format markdown
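
To run these commands from a script, one option is to call the CLI through Python's standard subprocess module. The sketch below uses only the flags documented here (-o and --output-format). The helper names `build_txtify_command` and `run_txtify` are hypothetical, the sketch assumes `txtify` is on your PATH, and combining both flags in one call is an assumption this README does not show.

```python
import subprocess


def build_txtify_command(inputs, output_dir=None, output_format=None):
    """Assemble a txtify command line from the options this README documents."""
    command = ["txtify", *inputs]
    if output_dir is not None:
        command += ["-o", output_dir]
    if output_format is not None:
        command += ["--output-format", output_format]
    return command


def run_txtify(inputs, output_dir=None, output_format=None):
    """Run txtify (assumed to be installed and on PATH) and wait for it."""
    command = build_txtify_command(inputs, output_dir, output_format)
    # check=True raises CalledProcessError if txtify exits with a non-zero status.
    return subprocess.run(command, check=True)


# Example (not executed here): run_txtify(["my_document.pdf"], output_format="markdown")
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.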

📂 Supported File Formats

txtify currently supports conversion for the following file types:

  • PowerPoint Presentations: .pptx
  • Word Documents: .docx
  • PDF Documents: .pdf
  • Excel Workbooks: .xlsx (converted to a CSV-like plain text format, useful for data extraction)
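
To preview which files in a folder have one of these extensions before converting, a small standard-library sketch is enough. It is not how txtify itself scans directories; `is_supported` and `find_supported_documents` are hypothetical names, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed in this README.
SUPPORTED_EXTENSIONS = {".pptx", ".docx", ".pdf", ".xlsx"}


def is_supported(path):
    """True if the file extension is one of the supported document types."""
    # Lower-casing the suffix is an assumption; txtify may be case-sensitive.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_supported_documents(root):
    """Recursively list supported documents under root, sorted by path."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_supported(p))
```

Calling `find_supported_documents("project_documentation")` returns a list of `Path` objects for the matching files in that folder and its subdirectories.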

🗄️ Output

Converted files will have a .txt, .md, or .json extension, depending on the chosen format. By default, they are saved to a directory named output/ in your current working directory. You can customize this using the -o or --output option.

If converting an entire directory, the relative path from the input directory is preserved in the output.


📜 License

txtify is distributed under the terms of the MIT License.
