
A tool for managing code datasets


🚀 code_dataset

code_dataset is an open-source collection of programming datasets. Unlike other programming datasets, its entries come from the development iterations of real open-source projects: they are generated with the best models currently available and then reviewed by senior developers. This review process is intended to keep the dataset high-quality and practical, making it a useful resource for researchers and developers.


✨ Features

  • 🌟 Datasets come from the programming iterations of actual open-source projects
  • 🤖 Generated using the most advanced models currently available
  • 👨‍💻 Reviewed by senior developers
  • 💎 High-quality, practical code examples

🛠 code-dataset Command Line Tool

If your project uses auto-coder.chat, you can use the code-dataset command line tool to manage your local programming datasets. code-dataset is a convenient command line tool for collecting and managing programming datasets that can be submitted for external use.

📥 Installation

  1. Clone the project repository:
git clone https://github.com/yourusername/code_dataset.git
cd code_dataset
  2. Install the code-dataset tool:
pip install -e .

or install the released package from PyPI (no clone needed):

pip install code-dataset

🔧 Usage

The code-dataset tool provides three main commands:

  1. Add a repository:
code-dataset add <repository_url> [--alias <alias_name>]

This command adds a Git repository or a local directory to the configuration. Use the optional --alias parameter to give the repository an alias.

  2. Refresh data:
code-dataset refresh

This command fetches the latest data from all configured repositories and saves it to the local data/libs directory.

  3. Count data entries:
code-dataset count

This command counts the data entries in all projects and displays a summary table.
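
To make the idea of a per-project summary concrete, here is a minimal Python sketch of how such a count could be produced by hand. It is not the tool's implementation. It assumes, purely for illustration, that data/libs holds one subdirectory per project and that every file inside counts as one entry; the real layout and entry format may differ. The function names count_entries and format_table are hypothetical.

```python
from pathlib import Path
from typing import Dict


def count_entries(libs_dir: Path) -> Dict[str, int]:
    """Count files under each project directory in libs_dir.

    Assumption (not confirmed by the project): one subdirectory per
    project, and each file inside it is one data entry.
    """
    counts: Dict[str, int] = {}
    if not libs_dir.is_dir():
        return counts
    for project in sorted(p for p in libs_dir.iterdir() if p.is_dir()):
        counts[project.name] = sum(1 for f in project.rglob("*") if f.is_file())
    return counts


def format_table(counts: Dict[str, int]) -> str:
    """Render the counts as a plain-text table with a total row."""
    width = max([len("Project")] + [len(name) for name in counts])
    lines = [f"{'Project':<{width}}  Entries"]
    lines += [f"{name:<{width}}  {n}" for name, n in counts.items()]
    lines.append(f"{'Total':<{width}}  {sum(counts.values())}")
    return "\n".join(lines)


if __name__ == "__main__":
    # data/libs is the directory named in the refresh description above.
    print(format_table(count_entries(Path("data/libs"))))
```

For real use, prefer code-dataset count, which knows the actual entry format.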

📚 Examples

  1. Add a Git repository:
code-dataset add https://github.com/example/repo.git
  2. Add a repository with an alias:
code-dataset add https://github.com/example/repo.git --alias my-repo
  3. Add a local directory:
code-dataset add /path/to/local/repo
  4. Refresh all data:
code-dataset refresh
  5. Count data entries:
code-dataset count
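
If you want to script the commands above, for example to register several repositories in one go, the following Python sketch shows one possible approach. It uses only the command syntax documented here (add with an optional --alias, then refresh and count). The helper names add_command, workflow and run are hypothetical and not part of code-dataset. By default the script only prints the commands (a dry run).

```python
import subprocess
from typing import List, Optional, Tuple


def add_command(source: str, alias: Optional[str] = None) -> List[str]:
    """Build the argv for `code-dataset add <source> [--alias <alias>]`."""
    argv = ["code-dataset", "add", source]
    if alias:
        argv += ["--alias", alias]
    return argv


def workflow(sources: List[Tuple[str, Optional[str]]]) -> List[List[str]]:
    """Return the commands that add each source, then refresh and count."""
    commands = [add_command(src, alias) for src, alias in sources]
    commands.append(["code-dataset", "refresh"])
    commands.append(["code-dataset", "count"])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the sequence if a command fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    cmds = workflow([
        ("https://github.com/example/repo.git", "my-repo"),
        ("/path/to/local/repo", None),
    ])
    run(cmds)  # pass dry_run=False to actually invoke the CLI
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a local path contains spaces.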

🤝 Contribution

We welcome and encourage community contributions. You can submit your local programming datasets to code_dataset via a Pull Request. If you have high-quality code examples or suggestions for improvement, please submit a Pull Request or open an Issue.


🌟 If you find this project helpful, please give us a star! Your support is the driving force for our continuous improvement.
