A tool for managing code datasets
🚀 code_dataset
code_dataset is an open-source collection of programming datasets. Unlike other programming datasets, its data comes from the development iterations of real open-source projects: it is generated with the best models currently available and then reviewed by senior developers. This review keeps the dataset high-quality and practical, making it a useful learning resource for researchers and developers.
✨ Features
- 🌟 Datasets come from the programming iterations of actual open-source projects
- 🤖 Generated using the most advanced models currently available
- 👨‍💻 Reviewed by senior developers
- 💎 High-quality, practical code examples
🛠 code-dataset Command Line Tool
If your project uses auto-coder.chat, you can manage your local programming datasets with the code-dataset command line tool. It collects and manages programming datasets that you can then submit for external use.
📥 Installation
- Clone the project repository:
git clone https://github.com/yourusername/code_dataset.git
cd code_dataset
- Install the code-dataset tool from the cloned source:
pip install -e .
or install the published package directly (no clone needed):
pip install code-dataset
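To confirm that the installation worked, a short check like the following may help. This is a minimal sketch, not part of the project: it assumes the distribution is registered under the name `code-dataset` (as in the `pip install` command above) and that the console script is also called `code-dataset`. The helper name `installed_version` is made up for illustration.

```python
import shutil
from importlib import metadata


def installed_version(dist_name="code-dataset"):
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the package is installed under the name "code-dataset".
    print("version:", installed_version() or "not installed")
    # shutil.which returns the script's full path, or None if it is not on PATH.
    print("command:", shutil.which("code-dataset") or "not found on PATH")
```

If the version prints but the command is not found, the environment's script directory is probably missing from PATH.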
🔧 Usage
The code-dataset tool provides three main commands:
- Add a repository:
code-dataset add <repository_url> [--alias <alias_name>]
This command adds a Git repository or local directory to the configuration. You can optionally give the repository an alias with the --alias parameter.
- Refresh data:
code-dataset refresh
This command fetches the latest data from all configured repositories and saves it to the local data/libs directory.
- Count data entries:
code-dataset count
This command counts the data entries in all projects and displays a summary table.
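The layout inside data/libs is not described here, so the following is only a sketch for inspecting whatever `code-dataset refresh` wrote: it counts the files under each top-level entry of a directory. The function name `files_per_entry` is hypothetical, and the script does not reproduce the logic of `code-dataset count`, which may define a "data entry" differently.

```python
from pathlib import Path


def files_per_entry(root):
    """Map each top-level entry under root to the number of files it contains."""
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Count regular files at any depth below this entry.
            counts[entry.name] = sum(1 for p in entry.rglob("*") if p.is_file())
        else:
            counts[entry.name] = 1
    return counts


if __name__ == "__main__":
    # Assumption: run from the directory that contains data/libs.
    for name, n in files_per_entry("data/libs").items():
        print(f"{name}: {n} file(s)")
```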
📚 Examples
- Add a Git repository:
code-dataset add https://github.com/example/repo.git
- Add a repository with an alias:
code-dataset add https://github.com/example/repo.git --alias my-repo
- Add a local directory:
code-dataset add /path/to/local/repo
- Refresh all data:
code-dataset refresh
- Count data entries:
code-dataset count
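The commands above can also be scripted. The sketch below chains them with Python's `subprocess` module, using only the subcommands and the `--alias` option shown in this README. The helper names `add_command` and `run_workflow` are illustrative and not part of code-dataset, and the script assumes the `code-dataset` executable is on PATH.

```python
import subprocess


def add_command(source, alias=None):
    """Build the argument list for `code-dataset add`."""
    cmd = ["code-dataset", "add", source]
    if alias:
        cmd += ["--alias", alias]
    return cmd


def run_workflow(sources):
    """Add each (source, alias) pair, then refresh and count."""
    for source, alias in sources:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(add_command(source, alias), check=True)
    subprocess.run(["code-dataset", "refresh"], check=True)
    subprocess.run(["code-dataset", "count"], check=True)


if __name__ == "__main__":
    # Placeholder sources taken from the examples above.
    run_workflow([
        ("https://github.com/example/repo.git", "my-repo"),
        ("/path/to/local/repo", None),
    ])
```

Passing the arguments as a list avoids shell quoting problems when a local path contains spaces.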
🤝 Contribution
We welcome and encourage community contributions. You can submit your local programming datasets to code_dataset via a Pull Request. If you have high-quality code examples or suggestions for improvement, please submit a Pull Request or open an Issue.
🌟 If you find this project helpful, please give us a star! Your support is the driving force for our continuous improvement.
File details
Details for the file code-dataset-0.1.1.tar.gz.
File metadata
- Download URL: code-dataset-0.1.1.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | f4ca7741dc60dfad19964b810863c93f5d99a2832963ee691b21374b776b1425
MD5 | 0a2ac6a6f8573b6004948be0692e5861
BLAKE2b-256 | 936418c0155c9cd99e5b66955f80d7b76b03c1d03e4a02fc8b3a8614d4722d5e
File details
Details for the file code_dataset-0.1.1-py3-none-any.whl.
File metadata
- Download URL: code_dataset-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1ca86c0d4a701667f42efb2cc4395d8fdfdebd4b1c222c1a8296ac97c9947deb
MD5 | 19ab59490ae858c6280d3a6bcb19ff87
BLAKE2b-256 | b17fd7a4a807697871679ec091cfc73f612a76d5b5c190fb5d3047e85f735d0c