A package to convert project codebases into JSONL format for GPT model training.
Project description
ProjectCodebaseToJsonl
ProjectCodebaseToJsonl is a Python package that converts project codebases into JSONL format. This is particularly useful for preparing training data for GPT models, because it turns existing project structures and code into a format that machine learning pipelines can consume.
Installation
To install ProjectCodebaseToJsonl, you can use pip:
pip install ProjectCodebaseToJsonl
Usage
As a Python Module
You can use ProjectCodebaseToJsonl as a module in your Python scripts.
Example:
from codebase_to_jsonl import generate_jsonl_for_project

# Generate JSONL for a project
project_data = generate_jsonl_for_project(
    project_path="path_to_your_project",
    project_name="YourProjectName",
    use_gitignore=True,
    validation_ratio=0.4,
)

print("Project Data Generated:")
print(project_data)
Customizing Your Generator
You can customize the behavior of ProjectCodebaseToJsonl by adjusting parameters such as use_gitignore and validation_ratio to suit your codebase and the dataset characteristics you want.
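As a rough illustration of what a validation ratio usually controls, the sketch below splits a list of records into training and validation subsets. It is not taken from the package's source: the helper name split_records, the shuffling, the seed, and the rounding rule are all assumptions made for this example, and the package's actual behavior may differ.

```python
import random
from typing import List, Sequence, Tuple


def split_records(
    records: Sequence[dict],
    validation_ratio: float,
    seed: int = 0,
) -> Tuple[List[dict], List[dict]]:
    """Hypothetical helper: split records into (training, validation) lists."""
    if not 0.0 <= validation_ratio <= 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    # Shuffle a copy so the caller's sequence is left untouched.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the validation share is rounded down to a whole record.
    n_validation = int(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]


records = [{"id": i} for i in range(10)]
training, validation = split_records(records, validation_ratio=0.4)
print(len(training), len(validation))  # 6 4
```

Under these assumptions, a ratio of 0.4 sends roughly 40% of the records to the validation set and the rest to the training set.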
Output Example
Running ProjectCodebaseToJsonl generates JSONL files for both training and validation, structured to facilitate GPT model training. Here's an example of the output structure:
{
    "project_name": "YourProjectName",
    "token_count": 12345,
    "training_file": "YourProjectName_training_20240101_123456.jsonl",
    "validation_file": "YourProjectName_validation_20240101_123456.jsonl"
}
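JSONL stores one JSON value per line, so the generated files can be inspected with the standard library alone. The sketch below is a generic reader, not part of the package: the helper name read_jsonl and the demo file are assumptions for illustration, and the fields inside each record depend on what the package writes, which is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Hypothetical helper: yield one parsed JSON object per non-empty line."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo with a throwaway file; the record contents are made up and do not
# reflect the package's actual schema.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_training.jsonl"
    demo_path.write_text(
        '{"text": "print(1)"}\n{"text": "print(2)"}\n', encoding="utf-8"
    )
    loaded = list(read_jsonl(demo_path))

print(len(loaded))  # 2
```

To inspect real output, the same reader could be pointed at the file named in the training_file or validation_file field, assuming that value resolves to a path you can open.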
Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
License
File details
Details for the file ProjectCodebaseToJsonl-0.0.1.tar.gz.
File metadata
- Download URL: ProjectCodebaseToJsonl-0.0.1.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e0e57eae2ee47fc752d53d1972e6666b99018cf71cec7b6745110c2ae1b9332c
MD5 | fd6dfa00e017b568fc2ba007dba9b3d0
BLAKE2b-256 | a7f3dd771878611a868c442ce4474fbb4986f04be8f80a48512d6c7b6d777e40
File details
Details for the file ProjectCodebaseToJsonl-0.0.1-py3-none-any.whl.
File metadata
- Download URL: ProjectCodebaseToJsonl-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9bf32bf36150bebf76aa99ce5a7a9d88177a1fa09ca9a3032584c008f7eda104
MD5 | bad2b06656e02c8f49ad7c3537fc118d
BLAKE2b-256 | 9f7bae848dbb1829bd45f40d64f6edb6b44b3c0f9c3e21b79a3a038d56a61029