
Decoding


Composable inference algorithms with LLMs and programmable logic.

Overview

decoding is a library for scaling the inference-time capabilities of LLMs, enabling users to solve difficult problems more easily. The library is built around simple sampling and reranking patterns that accept arbitrary user-defined scoring functions. At its simplest, you write a function that takes a string and returns a float, and the library handles the rest. If you'd like to step things up, the same building blocks support the efficient construction of more powerful algorithms, such as backtracking Monte Carlo Tree Search variants.
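For a concrete feel, here is a minimal sketch of that simplest pattern. The names BestOfN, TreeSearch, and decoding.generators come from this README; the other constructors and keyword arguments shown (e.g., LanguageModel.from_id, Scorer.from_f_str_to_num) are assumptions about the API, so check the docs for the authoritative signatures.

# minimal sketch: define a str -> float scorer, sample n completions,
# and get them back reranked. Constructor and argument names below are
# assumptions about the API; see the docs for authoritative signatures.
from decoding.generators import BestOfN
from decoding.models import LanguageModel
from decoding.scorers import Scorer

def score(text: str) -> float:
    # any user-defined logic works here; this toy scorer prefers
    # short completions that mention "42"
    return float("42" in text) - 0.01 * len(text)

llm = LanguageModel.from_id("gpt2")  # any vLLM-compatible model ID
scorer = Scorer.from_f_str_to_num(score)
samples = BestOfN("What is the meaning of life?", llm=llm, scorer=scorer, n=20)
print(samples[0])  # highest-scoring completion first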

Why should I care?

Scaling inference thoughtfully is yielding breakthrough performance improvements in the world of LLMs. We are already seeing small models outperform models more than 10x their size by leveraging basic sampling and search strategies, particularly in combination with custom verifiers, scoring functions, and process reward models. Check out this excellent recent presentation by Sasha Rush and Daniel Ritter for more background. decoding makes it effortless to explore this design space and allows researchers to iterate quickly on their ideas.

Getting started

Install directly from PyPI:

python -m pip install decoding

See Contributing for how to build the dev and testing environment.

NOTE: decoding depends on vLLM, which means this library can only be built on Linux and, by default, must be run on a GPU. To run on CPU, see the instructions from vLLM.

Documentation

All modules, classes, and functions of our public interface are documented on our website. The docs should be the first stop for questions about the API. More realistic use cases and interesting patterns can be found in the tutorial.

Tutorial

Several examples are provided to give you a taste and help you get started. Check out TUTORIAL.md for a commented walk-through of a few sample design patterns, or go straight to the code and run it yourself in the examples/ directory.

The library's philosophy

The most valuable resource is the researcher's time. The decoding library is designed from the ground up to support easy, flexible, and rapid experimentation with new ideas, while handling all the engineering glue on the backend. We make a few design decisions that support this.

  1. The library is built with an emphasis on pure functions and immutability. All classes are frozen dataclasses. All functions express typed, composable building blocks that flexibly wrap and coordinate any user-generated code. This lets users get very creative with their design patterns without having to chase down complicated state-related bugs. (A sketch of this style appears at the end of this section.)

  2. When there is a decision point between squeezing out a drop of performance or maintaining flexibility, we maintain flexibility. This may be controversial. When a piece of code is meant to do only one thing, a myriad of optimizations become available. But if your use case falls outside that one thing, those optimizations don't help you; you simply can't use the library. We instead optimize for the broadest range of ideas a researcher can come up with, such as backtracking, resampling, or otherwise modifying in-progress generations.

  3. We still keep things fast. We use libraries like vLLM under the hood to keep text generation, which is often the primary bottleneck, fast. We also expose arguments that let users specify when parts of scorers or estimators can run concurrently, and we harness CPU-based parallelism for heavier tasks like executing LLM-generated code. More detailed profiling over common use cases is coming shortly and will be used to drive future development.

Overall, these design decisions make decoding a great library for R&D. Researchers can flexibly explore the design space of relevant inference algorithms for a task, and once they've arrived at a solution they'd like to scale, they can refactor the relevant bottlenecks and exploit the specific optimizations available to their individual case.
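To illustrate the first point, here is a generic sketch of the style the library favors: frozen dataclasses and pure, typed functions that compose without shared mutable state. This illustrates the pattern; it is not code from the library itself.

# generic sketch of the style described in point 1 above: a frozen
# dataclass plus a pure reranking function. Illustrative only; this is
# not the library's own code.
from collections.abc import Callable
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoredText:
    text: str
    score: float

def rerank(texts: list[str], score: Callable[[str], float]) -> list[ScoredText]:
    # pure function: no mutation, no hidden state, trivially testable
    scored = [ScoredText(t, score(t)) for t in texts]
    return sorted(scored, key=lambda s: s.score, reverse=True)

best = rerank(["a", "bb", "ccc"], score=len)[0]  # ScoredText(text='ccc', score=3)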

What's next

There are a number of features coming soon to decoding.

Monte Carlo Tree Search

Currently, decoding.experimental includes initial support for a powerful RolloutTreeSearch algorithm. This wraps the TreeSearch interface, enabling rollouts within each sync phase and pushing scores back up the tree for reranking. It is currently designed to work best with process reward models and similar scoring functions, as opposed to the full flexibility we provide for BestOfN and TreeSearch, which can, e.g., harness grammar constraints or apply sharper scoring functions. As the interface is finalized and documented examples come together, it will be promoted to the fully supported decoding.generators.
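To make the rollout idea concrete, here is a generic sketch of the step described above: complete each partial generation, score the completed rollout, and credit that score back to the prefix so the tree can be reranked. The names are illustrative, not the experimental interface.

# generic sketch of rollout scoring: finish each partial generation,
# score the completed rollout (e.g., with a process reward model), and
# push that score back to the prefix. Illustrative names only; this is
# not the experimental interface.
from collections.abc import Callable

def rollout_scores(
    prefixes: list[str],
    complete: Callable[[str], str],   # e.g., greedy decoding to EOS
    score: Callable[[str], float],    # e.g., a process reward model
) -> list[tuple[str, float]]:
    # each prefix inherits the score of its completed rollout, which
    # guides which branches of the tree to expand next
    return [(p, score(complete(p))) for p in prefixes]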

Sequential Monte Carlo / Particle Filtering

HFPPL is a beautiful library for probabilistic programming with large language models, based on the work by Lew et al., 2024 on Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs. It is on our roadmap to see how their underlying algorithms and infrastructure can be ported to our interface. In the meantime, check them out.
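For background, the core SMC loop is simple to sketch: extend each particle, reweight it against a score or constraint, and resample in proportion to weight. The snippet below is a generic illustration of that loop, not HFPPL's or decoding's API.

# generic sketch of one SMC / particle-filtering step: extend, reweight,
# resample. Illustrative only; not HFPPL's or decoding's API.
import random
from collections.abc import Callable

def smc_step(
    particles: list[str],
    extend: Callable[[str], str],     # sample one more token/segment
    weight: Callable[[str], float],   # unnormalized importance weight
) -> list[str]:
    extended = [extend(p) for p in particles]
    weights = [weight(p) for p in extended]
    if sum(weights) == 0:
        return extended  # every particle violates the constraint; keep as-is
    # multinomial resampling proportional to weight
    return random.choices(extended, weights=weights, k=len(particles))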

Contributing

We welcome community contributions. Open an issue or a pull request if you see any ways to make decoding better.

To get started with development, this library supports an automated build of the dev and testing environment using GNU Make.

# clone repo and install lib in current env
git clone git@github.com:benlipkin/decoding.git
cd decoding/
make env

Before opening a PR, make sure all tests are passing:

make tests

Citation

@misc{decoding2024lipkin,
    author = {Lipkin, Benjamin},
    title = {Decoding: Composable inference algorithms with LLMs and programmable logic.},
    publisher = {GitHub},
    journal = {GitHub},
    howpublished = {\url{https://github.com/benlipkin/decoding}},
    year = 2024,
}
