GPTize is a tool that concatenates the contents of project files for use with ChatGPT.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

GPTize

GPTize is a tool for merging the contents of project files into a single text document. It is specifically designed to create datasets that can be loaded into ChatGPT for analysis or training. I, Aleksei Svetlov, created this tool because I was tired of copying file contents and paths to make GPT understand the context of my project. With GPTize, this process is now automated and streamlined.

Features

Clipboard functionality: The combined content of project files is automatically copied to the clipboard after generation using pyperclip.
Custom output file naming: Output file names include the project name along with the date and time for better traceability.
File exclusion based on .gitignore, plus an optional custom .gptignore.
Support for processing a target directory while applying the .gitignore from the repository root.
Support for various encodings when reading files.
Customizable output file naming based on the input file or directory name.
Report generation including all processed files.
Enhanced limit checks for file size and token count, with warnings logged instead of errors raised when limits are exceeded.
Detailed Git integration: Includes the current branch, last commit, and file status in the generated report.
Tokenization support: Integrated with tiktoken for accurate token counting and analysis.
File statistics: Provides detailed information on line count, character count, token count, file size, last modified date, and permissions for every processed file.
Warnings for large files: Alerts if files exceed a predefined line count threshold.
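
As an illustration of the "various encodings" feature, reading a file can be pictured as trying candidate encodings in order until one decodes cleanly. This is a minimal sketch, not GPTize's actual code: the function name read_text_with_fallback and the candidate list are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical candidate list; the encodings GPTize really tries may differ.
# latin-1 maps every byte to a character, so it never fails and acts as a last resort.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes the file."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"could not decode {path} with any of {encodings}")
```

Because latin-1 accepts any byte sequence, the final ValueError is only reachable when a caller passes a stricter list of encodings.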

Installation

To install GPTize, simply use pip:

pip install gptize

This command will install GPTize and all its dependencies. After installation, you can use GPTize from the command line anywhere.

Usage

To run GPTize, you have several options:

Basic Usage

Simply invoke GPTize in the command line to process the current directory:

gptize

This will process all files in the current directory and generate a report with a default name like gptize-output-PROJECT_NAME-YYYYMMDD-HHMMSS.txt. The content will also be copied to your clipboard.
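
The default naming pattern above can be reproduced in a few lines of Python. This is only a sketch of the pattern as described; the helper name default_output_name is hypothetical and is not claimed to match the method in settings.py.

```python
from datetime import datetime


def default_output_name(project_name, now=None):
    """Build a name shaped like gptize-output-PROJECT_NAME-YYYYMMDD-HHMMSS.txt."""
    now = now or datetime.now()
    return f"gptize-output-{project_name}-{now.strftime('%Y%m%d-%H%M%S')}.txt"


print(default_output_name("my_project", datetime(2024, 12, 1, 9, 30, 5)))
# gptize-output-my_project-20241201-093005.txt
```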

Specifying a Directory

To process a specific directory, use:

gptize /path/to/directory

This will process all files in the specified directory and create a report named gptize-output-PROJECT_NAME-YYYYMMDD-HHMMSS.txt, and the content will be copied to your clipboard.
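
Conceptually, processing a directory means walking it and joining each file's content with its path so that ChatGPT can tell the files apart. The sketch below shows that idea under stated assumptions: the function name merge_directory and the "=== path ===" header format are invented for illustration, and ignore rules are left out.

```python
from pathlib import Path


def merge_directory(root):
    """Concatenate every readable UTF-8 text file under root, each preceded by its relative path."""
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files in this sketch
        # Hypothetical header format; the real report layout may differ.
        parts.append(f"=== {path.relative_to(root).as_posix()} ===\n{text}")
    return "\n\n".join(parts)
```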

Specifying a Single File

For processing a single file:

gptize /path/to/file.txt

This will process only the specified file and generate a report named gptize-output-file_name-YYYYMMDD-HHMMSS.txt, where file_name is the name of the input file. The result will also be copied to your clipboard.

Specifying Repository Root for `.gitignore`

If your .gitignore is located in the root of the repository but you want to process files in a different subdirectory, you can use the --repo-root option:

gptize /path/to/directory --repo-root /path/to/repo_root

This will apply the .gitignore from the repository root to files in the specified directory, and the result will be copied to your clipboard.
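
To make the idea concrete, here is a simplified sketch of filtering a subdirectory with ignore patterns read from the repository root. It is not GPTize's implementation: the function names are hypothetical, the .gptignore is assumed to sit next to the .gitignore, and fnmatch-style matching covers only a subset of real .gitignore semantics (no negation with "!", no anchoring with a leading "/"). The target directory is assumed to live inside the repository root.

```python
import fnmatch
from pathlib import Path


def load_patterns(ignore_file):
    """Read non-empty, non-comment lines from a .gitignore-style file."""
    path = Path(ignore_file)
    if not path.is_file():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


def is_ignored(rel_path, patterns):
    """Simplified match: a pattern hits the whole relative path or any single path component."""
    parts = rel_path.split("/")
    for pattern in patterns:
        pattern = pattern.rstrip("/")  # treat "build/" like "build"
        if fnmatch.fnmatch(rel_path, pattern) or any(fnmatch.fnmatch(p, pattern) for p in parts):
            return True
    return False


def files_to_process(target_dir, repo_root):
    """List files under target_dir that survive the ignore files found in repo_root."""
    target_dir, repo_root = Path(target_dir).resolve(), Path(repo_root).resolve()
    patterns = load_patterns(repo_root / ".gitignore") + load_patterns(repo_root / ".gptignore")
    kept = []
    for path in sorted(target_dir.rglob("*")):
        rel = path.relative_to(repo_root).as_posix()
        if path.is_file() and not is_ignored(rel, patterns):
            kept.append(rel)
    return kept
```

Paths are matched relative to the repository root rather than the target directory, which is what lets root-level patterns apply to files in a subdirectory.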

Custom Output File

If you want to specify a custom output file name, use the -o or --output option:

gptize -o custom_output.txt

This command will override the default naming convention and use custom_output.txt as the output file name. The content will still be copied to your clipboard.

Uploading to ChatGPT

After generating the merged file with GPTize, you can upload it to ChatGPT. When making requests, explicitly reference the uploaded file, for instance with a phrase like "... based on the imported txt file". This gives ChatGPT specific project context to draw on, which tends to improve the quality of its responses.

Analyzing Statistics

GPTize provides detailed statistics for all processed files:

Line count, character count, and token count.
Total project statistics including total lines, tokens, and percentage of GPT-4o context usage.
Warnings for files exceeding predefined line count thresholds.
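
The statistics listed above can be sketched as follows. Everything named here is an assumption for illustration: the 128,000-token window is a commonly cited GPT-4o figure rather than a value taken from GPTize's settings, the 500-line threshold is invented, and rough_token_count is a crude whitespace-based stand-in for the tiktoken-based count that GPTize uses.

```python
# Assumed context window in tokens; the changelog names a GPT4O_CONTEXT_WINDOW
# setting but does not state its value.
ASSUMED_CONTEXT_WINDOW = 128_000
# Hypothetical line-count threshold for the "large file" warning.
ASSUMED_MAX_LINES = 500


def rough_token_count(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def file_stats(text, count_tokens=rough_token_count):
    """Line, character, and token counts for one file's text."""
    return {
        "lines": len(text.splitlines()),
        "characters": len(text),
        "tokens": count_tokens(text),
    }


def project_summary(files, count_tokens=rough_token_count):
    """files maps a path to its text; returns totals, context usage, and large-file warnings."""
    per_file = {name: file_stats(text, count_tokens) for name, text in files.items()}
    total_tokens = sum(s["tokens"] for s in per_file.values())
    return {
        "total_lines": sum(s["lines"] for s in per_file.values()),
        "total_tokens": total_tokens,
        "context_usage_percent": round(100 * total_tokens / ASSUMED_CONTEXT_WINDOW, 2),
        "large_files": [n for n, s in per_file.items() if s["lines"] > ASSUMED_MAX_LINES],
    }
```

Passing the tokenizer as a parameter keeps the sketch runnable without extra dependencies; a real tokenizer's counting function could be supplied in place of rough_token_count.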

Git Integration

The generated report includes Git details such as:

Current branch.
Last commit message, author, and timestamp.
Untracked or modified files.
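
One plausible way to collect these details is to call the git command line from Python. The sketch below is an assumption about the approach, not a copy of GPTize's code; the helper names run_git and git_details and the chosen log format are hypothetical.

```python
import subprocess


def run_git(args, cwd="."):
    """Run a git command and return its stdout without trailing whitespace, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is missing, or cwd is not a usable repository
    # Only strip the right side: leading spaces are significant in porcelain status lines.
    return result.stdout.rstrip()


def git_details(cwd="."):
    """Collect the branch, last commit (subject | author | date), and changed files."""
    return {
        "branch": run_git(["rev-parse", "--abbrev-ref", "HEAD"], cwd),
        "last_commit": run_git(["log", "-1", "--pretty=format:%s | %an | %ad"], cwd),
        "status": run_git(["status", "--porcelain"], cwd),
    }
```

Returning None instead of raising mirrors the README's emphasis on logging problems rather than aborting, so a report can still be generated outside a Git repository.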

Components

gptizer.py: The main class for file processing.
main.py: The entry point of the application.
models.py: Data models for files and projects.
output_builder.py: Output constructor for report generation.
settings.py: Project settings.

Author and Maintainer

Aleksei Svetlov - Creator and main maintainer.

License

The project is distributed under the MIT License.

CHANGELOG

[0.5.0] - 2024-12-01

[Feature] Added support for tokenization using tiktoken
- Integrated tiktoken for accurate token counting.
- Introduced new tokenization settings in Settings class, including TOKEN_MODEL_NAME, GPT4O_CONTEXT_WINDOW, and TOP_TOKEN_FILES_COUNT.
[Feature] Git status integration
- Added functionality to include Git branch, last commit, and file status in the combined output.
[Feature] Custom .gptignore support
- Implemented a .gptignore file to specify additional patterns for file exclusion alongside .gitignore.
[Enhancement] Expanded Python version support
- Updated GitHub Actions CI to include testing on Python 3.13.
[Enhancement] Metadata and statistics for processed files
- File metadata now includes size, last modified timestamp, and permissions.
- Statistics for line count, character count, and token count added for each file.
- Enhanced logging to display detailed file statistics and warnings for files exceeding predefined limits.
[Enhancement] Detailed statistics summary
- Added summary for total lines, tokens, and characters, along with the percentage of GPT-4o context usage.
[Enhancement] Clipboard support improvements
- Enhanced compatibility for clipboard operations, including fallback to xclip if default tools are unavailable.
[Refactor] Cleaned up code and improved maintainability
- Simplified method parameters and removed redundant comments.
- Introduced FileStats and FileMetadata classes for structured storage of file-related data.
[Fix] Updated requirements
- Added tiktoken and its dependencies to requirements.in and requirements.txt.
[Fix] Enhanced error handling
- Improved error messages and handling for file reading, Git commands, and clipboard operations.

[0.4.0] - 2024-09-27

[Feature] Added clipboard copy functionality
- The combined output file content is now automatically copied to the clipboard using pyperclip after the content is generated.
[Enhancement] Improved output file naming
- Output file names now include the project name in addition to the date and time. This provides better traceability for output files.
[Enhancement] Updated Settings.custom_output_file
- The method now accepts the project name as an argument and uses it in the output file name.
[Enhancement] Reorganized the main logic in main.py
- Fixed a bug where the gptizer object was used before initialization.
- Ensured that output file name generation happens after gptizer is properly initialized.

[0.3.0] - 2024-09-26

[Feature] Support for specifying a target directory with repository root .gitignore
- Now gptize can be executed from any directory, while still applying .gitignore rules from the root of the repository.
- Added the --repo-root argument to specify the root directory of the repository where the .gitignore is located.
- Example usage: gptize src/py_module/ --repo-root . allows processing files in src/py_module/ while applying .gitignore rules from the repository root.
[Feature] Added support for a custom .gitignore for GPTize
- Now, you can use an additional custom .gptignore file along with the repository root .gitignore.
- The custom .gptignore can be specified and will be applied in addition to the main .gitignore.
- Example usage: gptize src/py_module/ --repo-root . will apply both .gitignore and .gptignore.

[0.2.5] - 2023-11-25

[Modification] Updated File Size and Token Count Checks
- Modified the combine_files method in gptizer.py to log a warning instead of raising an error when the total size of the combined content exceeds the MAX_FILE_SIZE_BYTES_LIMIT or MAX_TOKEN_COUNT_LIMIT defined in settings.py.

[0.2.4] - 2023-11-16

[Feature] Custom Output File Naming
- Output files now include the name of the processed file or directory, enhancing traceability and identification.
[Enhancement] Settings Method for Custom File Names
- Updated the Settings class with a new method to generate output file names incorporating the name of the input file or directory.
[Modification] Main File Processing Logic
- Modified main.py to adopt the new output file naming scheme.
[Fix] Minor Bug Fixes and Performance Improvements
- Addressed various minor bugs and optimized performance.

[0.2.3] - 2023-11-12

[Enhancement] Detect binary files and handle errors gracefully
- Added binary file detection logic in load_file_content method.
- Improved error handling for file reading.
- Updated the OutputBuilder to handle binary files properly.

Release history

- 0.5.1: Nov 30, 2024
- 0.5.0: Nov 30, 2024
- 0.4.1: Sep 26, 2024
- 0.4.0: Sep 26, 2024
- 0.3.2: Sep 26, 2024
- 0.3.1: Sep 26, 2024
- 0.2.5: Nov 25, 2023
- 0.2.4: Nov 16, 2023
- 0.2.3: Nov 12, 2023
- 0.2.2: Nov 12, 2023
- 0.2.1: Nov 12, 2023
