A versatile Python tool to convert documents (PPTX, DOCX, PDF, XLSX) to plain text, ideal for providing context to AI code assistants like GitHub Copilot and Amazon CodeWhisperer.
📄 txtify
txtify is a command-line tool and Python library that converts common document formats (PowerPoint, Word, PDF, and Excel) into clean, plain text files.
It's ideal for extracting content for analysis, archiving, or providing crucial context to AI assistants like GitHub Copilot and Amazon CodeWhisperer, allowing them to better understand your project's domain knowledge, requirements, and existing documentation.
📚 Table of Contents
- ✨ Features
- 🤖 Providing Context to AI Code Assistants
- 🚀 Installation
- 💡 Usage (Command Line Interface)
- 📂 Supported File Formats
- 🗄️ Output
- 📜 License
✨ Features
✅ Multi-Format Support: Converts .pptx (PowerPoint), .docx (Word), .pdf (Portable Document Format), and .xlsx (Excel) files.
✅ Multiple Output Formats: Export content as plain text (.txt), Markdown (.md), or JSON (.json).
✅ Batch Processing: Convert multiple files or entire directories at once.
✅ Clean Text Output: Extracts core textual content, making documents easily searchable and readable for both humans and AI.
✅ Intuitive CLI: Simple command-line interface for quick and easy conversions.
✅ Preserves Structure: When converting directories, the original folder structure is replicated in the output.
🤖 Providing Context to AI Code Assistants
One of the most powerful use cases for txtify is to prepare your project's non-code documentation (e.g., design documents, requirement specifications, meeting notes, data dictionaries) for consumption by AI code generation tools like GitHub Copilot, Amazon CodeWhisperer, or similar LLM-based assistants.
Why this is useful
- Expand AI's Knowledge Base: Let the AI "read" and understand domain-specific terminology, project goals, architectural decisions, and detailed requirements that might otherwise be locked away in binary formats.
- Improve Code Relevance: The AI can generate more relevant and accurate code suggestions, function names, and comments by leveraging the textual context.
- Reduce Hallucinations: With more accurate information, the AI is less likely to "hallucinate" or generate incorrect assumptions.
- Seamless Integration: Place the converted .txt files in a directory accessible to your IDE, and the assistant can often index and use this information automatically.
Example Workflow
- Convert your documentation:
txtify ./docs_and_requirements/ -o ./ai_context/
- Integrate with your project: Place the ai_context/ folder directly within your main project repository.
- Let your AI assistant learn: Your assistant will now have access to the information contained in these plain text files, enabling more context-aware code suggestions.
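Not every assistant indexes extra files on its own, so it can help to assemble the converted text yourself. The following is a minimal sketch, assuming the ai_context/ folder from the workflow above already contains .txt files. The helper name gather_context, the "=== path ===" labels, and the per-file character limit are illustrative assumptions, not part of txtify.

```python
from pathlib import Path


def gather_context(context_dir: str, max_chars_per_file: int = 20_000) -> str:
    """Concatenate every .txt file under context_dir into one labelled string.

    Hypothetical helper for illustration; txtify does not ship this function.
    """
    root = Path(context_dir)
    sections = []
    # sorted() keeps the section order stable between runs.
    for text_file in sorted(root.rglob("*.txt")):
        body = text_file.read_text(encoding="utf-8", errors="replace")
        # Label each section with its path relative to the context directory.
        label = text_file.relative_to(root).as_posix()
        sections.append(f"=== {label} ===\n{body[:max_chars_per_file].strip()}")
    return "\n\n".join(sections)
```

Calling gather_context("ai_context") returns a single string that can be pasted into a prompt or saved as one file. Truncating each file keeps one very large document from crowding out the rest.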
🚀 Installation
You can install txtify directly from PyPI using pip:
pip install txtify
💡 Usage (Command Line Interface)
txtify can be used directly from your terminal.
Convert a Single File
Pass the path to your document as an argument:
txtify my_project_spec.docx
This will create a plain text file named my_project_spec.txt inside a new output/ directory by default.
Convert an Entire Directory
Provide the path to a directory, and txtify will scan it (and its subdirectories) for all supported document types:
txtify project_documentation/
All convertible files will be processed. The original directory structure will be mirrored in the output/ folder.
For example:
project_documentation/meetings/q1_notes.pptx
becomes:
output/project_documentation/meetings/q1_notes.txt
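The mapping from input path to output path can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, assuming relative input paths and the default output/ directory. The function name mirrored_output_path is hypothetical and is not txtify's internal implementation.

```python
from pathlib import Path


def mirrored_output_path(source: str, output_dir: str = "output", suffix: str = ".txt") -> Path:
    """Map a relative input path to its mirrored location under output_dir.

    Illustrative sketch only; txtify's own logic may differ.
    """
    src = Path(source)
    if src.is_absolute():
        # The README only shows relative inputs, so absolute paths are out of scope here.
        raise ValueError("this sketch only handles relative input paths")
    # Keep the directory part as given and swap the document extension.
    return Path(output_dir) / src.with_suffix(suffix)
```

For example, mirrored_output_path("project_documentation/meetings/q1_notes.pptx") yields the same output/project_documentation/meetings/q1_notes.txt path shown above.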
Convert Multiple Files and Directories
You can process any combination of files and directories in a single command. txtify will scan all specified paths and convert every supported document it finds.
txtify my_spec.docx docs_folder/ old_project/ requirements.pdf
This converts every supported document found in the specified files and directories and writes the results to the output/ directory.
Specify an Output Directory
Use the -o or --output option to choose a different location for your converted files:
txtify legacy_reports/ -o contextual_data/
This saves all converted text files into the contextual_data/ directory.
Specify an Output Format
By default, all documents are converted to plain text (.txt), so you can omit this flag when plain text is what you want. To convert to Markdown or JSON, use the --output-format option:
txtify my_document.pdf --output-format markdown
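If you want to drive the CLI from a Python script (for example, as a build step), one option is to assemble the command and hand it to subprocess. The sketch below uses only the flags shown in this README (-o and --output-format). The helper names build_txtify_command and run_txtify are hypothetical, and this does not use txtify's Python library API, which is not documented here.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def build_txtify_command(
    inputs: Sequence[str],
    output_dir: Optional[str] = None,
    output_format: Optional[str] = None,
) -> list[str]:
    """Assemble a txtify argument list from the options described above."""
    if not inputs:
        raise ValueError("at least one file or directory is required")
    command = ["txtify", *inputs]
    if output_dir is not None:
        command += ["-o", output_dir]
    if output_format is not None:
        command += ["--output-format", output_format]
    return command


def run_txtify(command: Sequence[str]) -> None:
    """Run the command, failing early if the txtify executable is missing."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("txtify not found on PATH; try `pip install txtify`")
    # check=True raises CalledProcessError if txtify exits with a non-zero status.
    subprocess.run(list(command), check=True)
```

A call such as run_txtify(build_txtify_command(["my_document.pdf"], output_format="markdown")) is equivalent to the command shown above. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.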
📂 Supported File Formats
txtify currently supports conversion for the following file types:
- PowerPoint Presentations: .pptx
- Word Documents: .docx
- PDF Documents: .pdf
- Excel Workbooks: .xlsx (converted to a CSV-like plain text format, useful for data extraction)
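To give a feel for what "CSV-like plain text" means for a worksheet, here is a rough sketch using only the standard library. The exact layout txtify produces is not specified in this README, so the "# Sheet:" header line and the function name rows_to_csv_text are assumptions made purely for illustration.

```python
import csv
import io
from typing import Iterable, Sequence


def rows_to_csv_text(sheet_name: str, rows: Iterable[Sequence[object]]) -> str:
    """Render worksheet rows as comma-separated text under a sheet header.

    Assumed layout for illustration; txtify's real output may differ.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Render empty cells (None) as empty fields.
        writer.writerow(["" if cell is None else cell for cell in row])
    return f"# Sheet: {sheet_name}\n{buffer.getvalue()}"
```

The csv module quotes any cell that contains a comma, so values such as "Laptop, 15 inch" stay in a single field when the text is parsed again.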
🗄️ Output
Converted files will have a .txt, .md, or .json extension, depending on the chosen format.
By default, they are saved to a directory named output/ in your current working directory.
You can customize this using the -o or --output option.
If converting an entire directory, the relative path from the input directory is preserved in the output.
📜 License
txtify is distributed under the terms of the MIT License.
File details
Details for the file txtify-0.1.3.tar.gz.
File metadata
- Download URL: txtify-0.1.3.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e2008db4b2152a937f6c4a211e02b7f8c1499ed5c02e324bea972c3e44163c0b |
| MD5 | 86c54c2b9eab84a0439183a53e92c61b |
| BLAKE2b-256 | 1cd3424214c16eabb6b3df162458aea0c9e8cce6b3f6ff904ba6ffca3aba0afe |
File details
Details for the file txtify-0.1.3-py3-none-any.whl.
File metadata
- Download URL: txtify-0.1.3-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ac967afb002be7f89f82b9bdaead79a7c390503f8e4f106b8c818f9af3fbe5a |
| MD5 | 0700f61fb452bd0e3e5dbc0f565906f5 |
| BLAKE2b-256 | de7c52851a2115432fe8764f241e0ee1a1e91da933d1f35af9c9759ad149f809 |