
ETL with LLM operations.

Reason this release was yanked:

This release has a bug in parsing an LLM response.

Project description

DocETL: Powering Complex Document Processing Pipelines

Website (Includes Demo) | Documentation | Discord | NotebookLM Podcast (thanks Shabie from our Discord community!) | Paper (coming soon!)

DocETL Figure

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers a low-code, declarative YAML interface to define LLM-powered operations on complex data.

When to Use DocETL

DocETL is the ideal choice when you're looking to maximize correctness and output quality for complex tasks over a collection of documents or unstructured datasets. You should consider using DocETL if:

  • You want to perform semantic processing on a collection of data
  • You have complex tasks that you want to represent via map-reduce (e.g., map over your documents, then group by the result of your map call & reduce)
  • You're unsure how to best express your task to maximize LLM accuracy
  • You're working with long documents that don't fit into a single prompt or are too lengthy for effective LLM reasoning
  • You have validation criteria and want tasks to automatically retry when the validation fails
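To make the map-reduce bullet concrete, a map-then-reduce pipeline might look roughly like the sketch below. This is an illustrative outline only: the operation names, prompt text, and field names (e.g., `extract_theme`, `reduce_key`, the `output.schema` keys) are assumptions for the example and may not match DocETL's exact YAML schema; see the documentation for the real format.

```yaml
# Hypothetical pipeline sketch -- field names are illustrative, not the exact DocETL schema
datasets:
  reviews:
    type: file
    path: reviews.json

operations:
  - name: extract_theme            # map: one LLM call per document
    type: map
    prompt: "What is the main theme of this review? {{ input.text }}"
    output:
      schema:
        theme: string

  - name: summarize_by_theme       # reduce: group by the map output, then summarize each group
    type: reduce
    reduce_key: theme
    prompt: "Summarize these reviews, which share the theme '{{ inputs[0].theme }}'."
    output:
      schema:
        summary: string

pipeline:
  steps:
    - name: analyze_reviews
      input: reviews
      operations:
        - extract_theme
        - summarize_by_theme
```

The key idea is the declarative shape: each document flows through the map operation, the results are grouped by the map output, and one reduce call summarizes each group.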

Installation

See the documentation for installing from PyPI.

Prerequisites

Before installing DocETL, ensure you have Python 3.10 or later installed on your system. You can check your Python version by running:

python --version

Installation Steps (from Source)

  1. Clone the DocETL repository:
git clone https://github.com/shreyashankar/docetl.git
cd docetl
  2. Install Poetry (if not already installed):
pip install poetry
  3. Install the project dependencies:
poetry install
  4. Set up your OpenAI API key:

Create a .env file in the project root and add your OpenAI API key:

OPENAI_API_KEY=your_api_key_here

Alternatively, you can set the OPENAI_API_KEY environment variable in your shell.
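For example, from the project root (the `.env` filename and `OPENAI_API_KEY` variable come from the step above; the key value is a placeholder you must replace):

```shell
# Option 1: store the key in a .env file in the project root
echo 'OPENAI_API_KEY=your_api_key_here' > .env

# Option 2: export it for the current shell session only
export OPENAI_API_KEY=your_api_key_here
```

Option 2 lasts only for the current shell session; the `.env` file persists across sessions.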

  5. Run the basic test suite to ensure everything is working (this costs less than $0.01 with OpenAI):
make tests-basic

That's it! You've successfully installed DocETL and are ready to start processing documents.

For more detailed information on usage and configuration, please refer to our documentation.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

docetl-0.1.4.tar.gz (110.1 kB)

Uploaded Source

Built Distribution

docetl-0.1.4-py3-none-any.whl (127.4 kB)

Uploaded Python 3

File details

Details for the file docetl-0.1.4.tar.gz.

File metadata

  • Download URL: docetl-0.1.4.tar.gz
  • Upload date:
  • Size: 110.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for docetl-0.1.4.tar.gz
Algorithm Hash digest
SHA256 90202199c6f821508c7f8fafbdee614a0222ed9665c3a3c6e9938be6f0a4e42b
MD5 847b0af4e49020f7d2154410de3dc5a5
BLAKE2b-256 d2d93df2690c9259035863d6244dcb4ae7bce91a5323063e6059db427242aeb6

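If you download the source distribution manually, you can check it against the SHA256 digest above before installing. The sketch below assumes GNU coreutils `sha256sum` (on macOS, substitute `shasum -a 256`); `verify_sha256` is a hypothetical helper name, not part of any tool.

```shell
# verify_sha256 FILE EXPECTED_HASH -- compares the file's SHA256 digest
# against the expected value and fails loudly on a mismatch.
verify_sha256() {
  actual="$(sha256sum "$1" | awk '{print $1}')"
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: expected $2, got $actual" >&2
    return 1
  fi
}

# Example, assuming docetl-0.1.4.tar.gz is in the current directory:
# verify_sha256 docetl-0.1.4.tar.gz 90202199c6f821508c7f8fafbdee614a0222ed9665c3a3c6e9938be6f0a4e42b
```

A mismatch means the file was corrupted in transit or is not the release PyPI published, and it should not be installed.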

Provenance

File details

Details for the file docetl-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: docetl-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 127.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for docetl-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 17ed4250156fed51536a85dbafe1125b5c22f2b713aea9e48b4e6d8db7f5e4f4
MD5 e68a7d6b420cb350c5fbb70202bebab9
BLAKE2b-256 b10af911be7191598542f46d76c3454c762008424f009ebcfb540258df8a5bfb


Provenance
