
A framework for creating and curating high-quality code datasets tailored for large language models


CodableLLM

CodableLLM is a Python framework for creating and curating high-quality code datasets tailored for training and evaluating large language models (LLMs). It extracts both source code and decompiled code, and its flexible architecture handles multiple languages and integrates with custom LLM prompts.

Installation

PyPI

Install CodableLLM directly from PyPI:

pip install codablellm

Docker

Alternatively, you can build and run CodableLLM's CLI using Docker:

Build the image:

docker build -t codablellm .

Run the container with access to your local files:

docker run --rm -it -v "$(pwd)":/workspace -w /workspace codablellm \
    codablellm --url https://github.com/dmanuel64/codablellm/raw/refs/heads/main/examples/demo-c-repo.zip \
    --build "cd /tmp/demo-c-repo && make" \
    /tmp/demo-c-repo demo-c-repo.csv /tmp/demo-c-repo

This mounts your current directory to /workspace inside the container, allowing access to input/output files.
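
Once the container exits, the output file should be visible on the host. The following is a minimal sketch for inspecting it, assuming the command above writes its dataset to demo-c-repo.csv in the mounted directory and that the file has a header row. It reports whatever columns it finds rather than assuming a schema, and `summarize_csv` is a hypothetical helper, not part of CodableLLM.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    # newline="" lets the csv module handle quoted fields that span lines,
    # which is likely for fields that hold source code.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # Assumed output name, taken from the docker command above.
    dataset = Path("demo-c-repo.csv")
    if dataset.exists():
        columns, rows = summarize_csv(dataset)
        print(f"{rows} rows, columns: {columns}")
    else:
        print(f"{dataset} not found; run the container first")
```

Run it from the same directory that was mounted into the container.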

Features

  • Extracts functions and methods from source code repositories using tree-sitter.
  • Integrates easily with LLMs to refine or augment extracted code (e.g., renaming variables or inserting comments).
  • Language-agnostic design with support for plugin-based extractor and decompiler extensions.
  • Extensible API for building your own workflows and datasets.
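
To make the idea of function-level extraction concrete, here is a minimal sketch that turns a source file into one record per function. It is not CodableLLM's API and it does not use tree-sitter: it substitutes Python's standard-library `ast` module so the example runs without extra dependencies. The `extract_functions` helper and the record keys (`name`, `lineno`, `definition`) are hypothetical names chosen for illustration.

```python
import ast


def extract_functions(source):
    """Return one record per function or method defined in `source`."""
    tree = ast.parse(source)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "lineno": node.lineno,
                # Exact source text of the definition (decorators excluded).
                "definition": ast.get_source_segment(source, node),
            })
    # ast.walk does not guarantee source order, so sort by line number.
    records.sort(key=lambda record: record["lineno"])
    return records


SAMPLE = '''\
def add(a, b):
    return a + b


class Greeter:
    def greet(self, name):
        return f"hello {name}"
'''

if __name__ == "__main__":
    for record in extract_functions(SAMPLE):
        print(record["lineno"], record["name"])
```

Each record pairs a function name with its source text, which is the kind of row a code dataset is built from. A tree-sitter based extractor follows the same pattern but works across many languages.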

Documentation

Complete documentation is available on Read the Docs.

Contributing

We welcome contributions from the community! See CONTRIBUTING.md for guidelines, development setup, and how to get started.

