A tool for managing code datasets

🚀 code_dataset

code_dataset is an open-source project that provides programming datasets. Unlike other programming datasets, its data comes from the programming iterations of real open-source projects, is generated with the best models currently available, and is reviewed by senior developers. This helps ensure the dataset's quality and practicality, making it a valuable learning resource for researchers and developers.


✨ Features

  • 🌟 Datasets come from the programming iterations of actual open-source projects
  • 🤖 Generated using the most advanced models currently available
  • 👨‍💻 Reviewed by senior developers
  • 💎 High-quality, practical code examples

🛠 code-dataset Command Line Tool

If your project uses auto-coder.chat, you can use the code-dataset command line tool to manage your local programming datasets. code-dataset collects and manages programming datasets so that they can be submitted for external use.

📥 Installation

  1. Clone the project repository:
git clone https://github.com/yourusername/code_dataset.git
cd code_dataset
  2. Install the code-dataset tool from the cloned source (editable install):
pip install -e .

Alternatively, install the published package from PyPI without cloning the repository:

pip install code-dataset
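To confirm that either installation route worked, a short Python check like the one below can help. It relies only on what this page states: the distribution is named code-dataset and the console command is code-dataset. The helper names installed_version and cli_on_path are illustrative and not part of the tool.

```python
"""Check whether the code-dataset distribution and CLI are installed."""
import shutil
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "code-dataset") -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def cli_on_path(command: str = "code-dataset") -> Optional[str]:
    """Return the absolute path of the CLI executable if it is on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    print("distribution version:", installed_version())
    print("CLI executable:", cli_on_path())
```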

🔧 Usage

The code-dataset tool provides three main commands:

  1. Add a repository:
code-dataset add <repository_url> [--alias <alias_name>]

This command adds a Git repository or local directory to the configuration. Use the optional --alias parameter to give the repository an alias.

  2. Refresh data:
code-dataset refresh

This command fetches the latest data from all configured repositories and saves it to the local data/libs directory.

  3. Count data entries:
code-dataset count

This command counts the data entries in all projects and displays a summary table.
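The refresh and count commands both revolve around the local data/libs directory. The tool's internals are not documented on this page, so the following is only a hypothetical sketch of such a workflow. It assumes that each repository is mirrored into data/libs/<alias>, that Git sources are cloned once and pulled afterwards, that local directories are copied, and that every file outside .git counts as one data entry. The functions refresh, count_entries and summary_table are illustrative names, not the tool's API.

```python
"""Hypothetical sketch of a data/libs workflow similar to `refresh` and `count`.

Nothing here is taken from code-dataset itself; the directory layout and the
"one file = one entry" rule are assumptions for illustration.
"""
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List


def refresh(repos: Dict[str, str], libs_dir: Path, run: bool = False) -> List[List[str]]:
    """Mirror every source in `repos` (alias -> Git URL or local path).

    Local directories are copied right away. For Git sources the matching
    git commands are returned, and executed only when run=True.
    """
    libs_dir.mkdir(parents=True, exist_ok=True)
    commands: List[List[str]] = []
    for alias, source in repos.items():
        target = libs_dir / alias
        if Path(source).is_dir():
            # Local directory: replace any previous mirror with a fresh copy.
            if target.exists():
                shutil.rmtree(target)
            shutil.copytree(source, target)
        elif (target / ".git").is_dir():
            # Already cloned: fast-forward to the latest commits.
            commands.append(["git", "-C", str(target), "pull", "--ff-only"])
        else:
            # First refresh for this source: clone it.
            commands.append(["git", "clone", source, str(target)])
    if run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


def count_entries(libs_dir: Path) -> Dict[str, int]:
    """Map each project directory to its number of files, ignoring .git."""
    counts: Dict[str, int] = {}
    for project in sorted(p for p in libs_dir.iterdir() if p.is_dir()):
        counts[project.name] = sum(
            1
            for f in project.rglob("*")
            if f.is_file() and ".git" not in f.relative_to(project).parts
        )
    return counts


def summary_table(counts: Dict[str, int]) -> str:
    """Render counts as a plain-text table with a total row."""
    width = max([len("Project")] + [len(name) for name in counts])
    lines = [f"{'Project':<{width}}  Entries", "-" * (width + 9)]
    lines += [f"{name:<{width}}  {n:>7}" for name, n in counts.items()]
    lines.append(f"{'Total':<{width}}  {sum(counts.values()):>7}")
    return "\n".join(lines)
```

Returning the git commands instead of running them by default keeps the sketch usable offline; pass run=True to execute them.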

📚 Examples

  1. Add a Git repository:
code-dataset add https://github.com/example/repo.git
  2. Add a repository with an alias:
code-dataset add https://github.com/example/repo.git --alias my-repo
  3. Add a local directory:
code-dataset add /path/to/local/repo
  4. Refresh all data:
code-dataset refresh
  5. Count data entries:
code-dataset count
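The example commands above can also be scripted. The sketch below builds the same argument lists from Python and runs them with subprocess only when asked. It uses just the subcommands and the --alias flag documented here; the helper names and the repository list are hypothetical. Whether re-running add for an already configured repository is harmless is not documented, so treat repeated runs with care.

```python
"""Script the documented code-dataset subcommands from Python.

Only `add` (with optional --alias), `refresh` and `count` are used. The
repository list in __main__ is a hypothetical example.
"""
import shlex
import subprocess
from typing import List, Optional, Sequence, Tuple


def build_commands(repos: Sequence[Tuple[str, Optional[str]]]) -> List[List[str]]:
    """Return argv lists: one `add` per (source, alias) pair, then refresh and count."""
    commands: List[List[str]] = []
    for source, alias in repos:
        cmd = ["code-dataset", "add", source]
        if alias:
            cmd += ["--alias", alias]
        commands.append(cmd)
    commands.append(["code-dataset", "refresh"])
    commands.append(["code-dataset", "count"])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_commands([
        ("https://github.com/example/repo.git", "my-repo"),
        ("/path/to/local/repo", None),
    ])
    for cmd in cmds:
        print(shlex.join(cmd))  # dry run; call run_all(cmds) to execute
```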

🤝 Contribution

We welcome and encourage community contributions. You can submit your local programming datasets to code_dataset via a Pull Request. If you have high-quality code examples or improvement suggestions, please submit a Pull Request or open an Issue.


🌟 If you find this project helpful, please give us a star! Your support is the driving force for our continuous improvement.

Download files

Source Distribution

code-dataset-0.1.1.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

code_dataset-0.1.1-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file code-dataset-0.1.1.tar.gz.

File metadata

  • Download URL: code-dataset-0.1.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.11

File hashes

Hashes for code-dataset-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f4ca7741dc60dfad19964b810863c93f5d99a2832963ee691b21374b776b1425
MD5 0a2ac6a6f8573b6004948be0692e5861
BLAKE2b-256 936418c0155c9cd99e5b66955f80d7b76b03c1d03e4a02fc8b3a8614d4722d5e
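The published digests let you check that a downloaded file is intact before installing it. Below is a minimal sketch using Python's standard hashlib, with the sdist's SHA256 digest copied from the table above. The local file name assumes you downloaded the sdist into the current directory; the same check applies to the wheel using its own SHA256 digest.

```python
"""Verify a downloaded release file against its published SHA256 digest."""
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare case-insensitively against the published hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    sdist = Path("code-dataset-0.1.1.tar.gz")
    expected = "f4ca7741dc60dfad19964b810863c93f5d99a2832963ee691b21374b776b1425"
    if sdist.exists():
        print("OK" if verify(sdist, expected) else "MISMATCH")
    else:
        print(f"download {sdist} first")
```

pip can enforce the same check itself in hash-checking mode, for example with a requirements line such as code-dataset==0.1.1 --hash=sha256:<digest> installed via pip install --require-hashes -r requirements.txt.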

File details

Details for the file code_dataset-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: code_dataset-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.11

File hashes

Hashes for code_dataset-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1ca86c0d4a701667f42efb2cc4395d8fdfdebd4b1c222c1a8296ac97c9947deb
MD5 19ab59490ae858c6280d3a6bcb19ff87
BLAKE2b-256 b17fd7a4a807697871679ec091cfc73f612a76d5b5c190fb5d3047e85f735d0c
