A framework for creating and curating high-quality code datasets tailored for large language models

CodableLLM

CodableLLM is a Python framework for creating and curating high-quality code datasets for training and evaluating large language models (LLMs). It extracts both source code and decompiled code, and its flexible architecture handles multiple languages and integrates with custom LLM prompts.

Installation

PyPI

Install CodableLLM directly from PyPI:

pip install codablellm
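
To confirm the install succeeded, you can ask Python's packaging metadata for the installed version. This is a minimal sketch that relies only on the standard library; the helper name `installed_version` is an illustrative choice, not part of CodableLLM.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("codablellm")
    if found:
        print(f"codablellm version: {found}")
    else:
        print("codablellm is not installed in this environment")
```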

Docker

Alternatively, you can build and run CodableLLM's CLI using Docker:

Build the image:

docker build -t codablellm .

Run the container with access to your local files:

docker run --rm -it -v "$(pwd)":/workspace -w /workspace codablellm \
    codablellm --url https://github.com/dmanuel64/codablellm/raw/refs/heads/main/examples/demo-c-repo.zip \
    --build "cd /tmp/demo-c-repo && make" \
    /tmp/demo-c-repo demo-c-repo.csv /tmp/demo-c-repo

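This mounts your current directory at /workspace inside the container and uses it as the working directory, so input and output files given as relative paths (such as demo-c-repo.csv) resolve to your local directory.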
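
Once the container exits, the CSV named in the command should be in your current directory. The sketch below inspects it without assuming anything about its columns, since the dataset schema is not described here. The function name `summarize_csv` is hypothetical, and the path is the one used in the example command.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    output = Path("demo-c-repo.csv")  # output path from the Docker example
    if output.exists():
        columns, rows = summarize_csv(output)
        print(f"{rows} rows; columns: {columns}")
    else:
        print(f"{output} not found; run the Docker command first")
```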

Features

  • Extracts functions and methods from source code repositories using tree-sitter.
  • Integrates easily with LLMs to refine or augment extracted code (e.g., renaming variables or inserting comments).
  • Language-agnostic design with support for plugin-based extractor and decompiler extensions.
  • Extensible API for building your own workflows and datasets.
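
To make the extraction and plugin ideas concrete, here is a small, self-contained sketch. It is not CodableLLM's API: it uses Python's standard-library `ast` module in place of tree-sitter, and the registry (`EXTRACTORS`, `register_extractor`, `extract_functions`) is a hypothetical illustration of how extractors might be dispatched by language.

```python
import ast
from pathlib import Path

# Hypothetical registry mapping file extensions to extractor callables.
# This only illustrates the plugin idea; it is NOT CodableLLM's plugin API.
EXTRACTORS = {}


def register_extractor(extension):
    """Decorator that registers an extractor for a file extension."""
    def decorator(func):
        EXTRACTORS[extension] = func
        return func
    return decorator


@register_extractor(".py")
def extract_python_functions(source):
    """Return (name, definition) pairs for every function in Python source."""
    tree = ast.parse(source)
    return [
        (node.name, ast.get_source_segment(source, node))
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def extract_functions(filename, source):
    """Dispatch to the extractor registered for the file's extension."""
    extractor = EXTRACTORS.get(Path(filename).suffix)
    if extractor is None:
        raise ValueError(f"no extractor registered for {filename!r}")
    return extractor(source)


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n"
    for name, definition in extract_functions("sample.py", sample):
        print(name)
        print(definition)
```

Supporting another language in this sketch would mean registering one more function under its file extension, which is the kind of extension point the plugin-based design describes.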

Documentation

Complete documentation is available on Read the Docs.

Contributing

We welcome contributions from the community! See CONTRIBUTING.md for guidelines, development setup, and how to get started.

Download files

Source Distribution

codablellm-1.0.6.tar.gz (42.6 kB)

Built Distribution

codablellm-1.0.6-py3-none-any.whl (45.5 kB)

File details

Details for the file codablellm-1.0.6.tar.gz.

File metadata

  • Download URL: codablellm-1.0.6.tar.gz
  • Upload date:
  • Size: 42.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for codablellm-1.0.6.tar.gz
Algorithm Hash digest
SHA256 ef6a27fc6060ab9cfb399173ad19514be9968e33d45752dead77f3063f8433a6
MD5 7c4137845147d0b003e44337d7145393
BLAKE2b-256 e42172be3a0c24d8ec882074363f5649656447fa853be5bf24ac7584f54cea4d
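
If you download the source distribution manually, you can check it against the SHA256 digest listed above. This is a minimal sketch using the standard library's `hashlib`; the local file path is an assumption about where the archive was saved, and `sha256_of` is an illustrative helper name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for codablellm-1.0.6.tar.gz.
EXPECTED_SHA256 = "ef6a27fc6060ab9cfb399173ad19514be9968e33d45752dead77f3063f8433a6"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("codablellm-1.0.6.tar.gz")  # assumed local download path
    if archive.exists():
        matches = sha256_of(archive) == EXPECTED_SHA256
        print("hash OK" if matches else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same function works for the wheel if you swap in the wheel's file name and its own SHA256 digest.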

File details

Details for the file codablellm-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: codablellm-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 45.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for codablellm-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 953fad82c304e19171c3e47f3b32cdd246bab82da28dcfdc623251ff151c9178
MD5 276c4987d148b58292c37c82a5fd599b
BLAKE2b-256 7e6eb58b001c654a92dc35c5c6ba859804b922e712e4c2278c5c9c28e39bfd22
