
ETL with LLM operations.


📜 DocETL: Powering Complex Document Processing Pipelines


DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers:

  1. An interactive UI playground for iterative prompt engineering and pipeline development
  2. A Python package for running production pipelines from the command line or Python code

🌟 Community Projects

📚 Educational Resources

🚀 Getting Started

There are two main ways to use DocETL:

1. 🎮 Interactive UI Playground (Recommended for Development)

The UI Playground helps you iteratively develop your pipeline:

  • Experiment with different prompts and see results in real-time
  • Build your pipeline step by step
  • Export your finalized pipeline configuration for production use


To run the playground locally, you can either:

  • Use Docker (recommended for quick start): make docker
  • Set up the development environment manually

See the Playground Setup Guide for detailed instructions.

2. 📦 Python Package (For Production Use)

If you want to use DocETL as a Python package:

Prerequisites

  • Python 3.10 or later
  • OpenAI API key

Install the package from PyPI:

pip install docetl

Create a .env file in your project directory:

OPENAI_API_KEY=your_api_key_here  # Required for LLM operations (or the key for the LLM of your choice)
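Before running a pipeline, it can help to confirm that the .env file is readable and that the key is no longer the placeholder. The following is a minimal sketch using only the standard library. The helper names load_env_file and require_api_key are hypothetical and not part of DocETL, and the parser handles only simple KEY=VALUE lines like the one above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and comment lines."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment introduced by a space followed by '#'.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def require_api_key(env, name="OPENAI_API_KEY"):
    """Return the key from the parsed file (or the process environment)."""
    key = env.get(name) or os.environ.get(name, "")
    # "your_api_key_here" is the placeholder value shown in the README.
    if not key or key == "your_api_key_here":
        raise RuntimeError(f"{name} is missing or still set to the placeholder")
    return key
```

Calling require_api_key(load_env_file()) from the project directory raises a RuntimeError if the key is absent, which surfaces the problem before any LLM call is attempted.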

To see examples of how to use DocETL, check out the tutorial.

🎮 UI Playground Setup

To run the UI playground locally, you have two options:

Option A: Using Docker (Recommended for Quick Start)

The easiest way to get the playground running:

  1. Create the required environment files:

Create .env in the root directory:

OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000

Create .env.local in the website directory:

OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini

NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000
  2. Run Docker:

make docker

This will:

  • Create a Docker volume for persistent data
  • Build the DocETL image
  • Run the container with the UI accessible at http://localhost:3000
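The container may take a moment to start accepting connections after make docker returns control. As a rough sketch, a small standard-library helper can poll a port until it opens. The function name wait_for_port is hypothetical, and the default port 3000 is taken from the FRONTEND_PORT value in the .env file above.

```python
import socket
import time


def wait_for_port(host="localhost", port=3000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out; wait and retry.
            time.sleep(interval)
    return False
```

For example, wait_for_port(port=3000) checks the UI and wait_for_port(port=8000) checks the backend. A True result only shows that the port is open, not that the application behind it is healthy.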

To clean up Docker resources (note that this will delete the Docker volume):

make docker-clean

Option B: Manual Setup (Development)

For development or if you prefer not to use Docker:

  1. Clone the repository:
git clone https://github.com/ucbepic/docetl.git
cd docetl
  2. Set up environment variables in a .env file in the repository's root directory:

OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=
BACKEND_HOST=localhost
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000

Then create a .env.local file in the website directory with the following:

OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini

NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000
  3. Install dependencies:

make install      # Install Python package
make install-ui   # Install UI dependencies

Note that the OpenAI API key, base URL, and model name in .env.local are used only by the UI assistant, not by the DocETL pipeline execution engine.

  4. Start the development server:

make run-ui-dev
  5. Visit http://localhost:3000/playground to access the interactive UI.
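Because the backend settings live in two files, a typo in one of them can leave the UI unable to reach the backend. The sketch below compares the two sets of values. It assumes that the UI connects to the backend at NEXT_PUBLIC_BACKEND_HOST and NEXT_PUBLIC_BACKEND_PORT, which the matching values above suggest but the README does not state outright. The function names are hypothetical.

```python
# Example contents, copied from the manual setup values above.
ROOT_ENV_EXAMPLE = """\
BACKEND_HOST=localhost
BACKEND_PORT=8000
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000
"""

UI_ENV_EXAMPLE = """\
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000
"""


def parse_env(text):
    """Parse KEY=VALUE lines into a dict, ignoring blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def check_backend_settings(root_env, ui_env):
    """Return a list of mismatches between .env and website/.env.local."""
    problems = []
    backend_port = root_env.get("BACKEND_PORT")
    ui_port = ui_env.get("NEXT_PUBLIC_BACKEND_PORT")
    if backend_port != ui_port:
        problems.append(f"port mismatch: backend {backend_port}, UI expects {ui_port}")
    backend_host = root_env.get("BACKEND_HOST")
    ui_host = ui_env.get("NEXT_PUBLIC_BACKEND_HOST")
    # 0.0.0.0 listens on all interfaces, so any client host can reach it.
    if backend_host not in ("0.0.0.0", ui_host):
        problems.append(f"host mismatch: backend {backend_host}, UI expects {ui_host}")
    return problems


problems = check_backend_settings(parse_env(ROOT_ENV_EXAMPLE), parse_env(UI_ENV_EXAMPLE))
print(problems or "backend settings are consistent")
```

In practice the two example strings would be replaced by the contents of the real files, for instance via pathlib.Path(".env").read_text().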

🛠️ Development Setup

If you're planning to contribute or modify DocETL, you can verify your setup by running the test suite:

make tests-basic  # Runs basic test suite (costs < $0.01 with OpenAI)

For detailed documentation and tutorials, visit our documentation.
