Palimpzest is a system that lets anyone run AI-powered analytical queries by defining them in a declarative language.

Project description

Palimpzest (PZ)

Learn How to Use PZ

Our full documentation is the definitive resource for learning how to use PZ. It contains all of the installation and quickstart materials on this page, as well as user guides, full API documentation, and much more.

Getting started

A stable version of the PZ package is available on PyPI. To install the package, run:

$ pip install palimpzest

Alternatively, to install the latest version of the package from source, clone the repository and install it locally:

$ git clone git@github.com:mitdbg/palimpzest.git
$ cd palimpzest
$ pip install .
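
To confirm that either installation route worked, you can ask Python which version of the package it sees. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper written for this example and is not part of PZ.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("palimpzest")
    if found:
        print(f"palimpzest version: {found}")
    else:
        print("palimpzest is not installed in this environment")
```

If the helper reports that the package is missing, check that `pip` and `python` point to the same environment.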

Join the PZ Community

We are actively hacking on PZ and would love to have you join our community Discord.

Our Discord server is the best place to:

  • Get help with your PZ program(s)
  • Give feedback to the maintainers
  • Discuss the future direction(s) of the project
  • Discuss anything related to data processing with LLMs!

We are eager to learn more about your workloads and use cases, and will take them into consideration in planning our future roadmap.

Quick Start

The easiest way to get started with Palimpzest is to run the quickstart.ipynb Jupyter notebook. It demonstrates the full workflow of working with PZ, including registering a dataset, composing and executing a pipeline, and accessing the results. To run the notebook, start Jupyter with the following command:

$ jupyter notebook

Then open the notebook from the Jupyter interface in your browser at localhost:8888.

Even Quicker Start

For eager readers, the code in the notebook can be found in the following condensed snippet. However, we do suggest reading the notebook as it contains more insight into each element of the program.

import palimpzest as pz

# define the fields we wish to compute
email_cols = [
    {"name": "sender", "type": str, "desc": "The email address of the sender"},
    {"name": "subject", "type": str, "desc": "The subject of the email"},
    {"name": "date", "type": str, "desc": "The date the email was sent"},
]

# lazily construct the computation to get emails about holidays sent in July
dataset = pz.Dataset("testdata/enron-tiny/")
dataset = dataset.sem_add_columns(email_cols)
dataset = dataset.sem_filter("The email was sent in July")
dataset = dataset.sem_filter("The email is about holidays")

# execute the computation with the MinCost policy
config = pz.QueryProcessorConfig(policy=pz.MinCost(), verbose=True)
output = dataset.run(config)

# display output (if using Jupyter, otherwise use print(output_df))
output_df = output.to_df(cols=["date", "sender", "subject"])
display(output_df)
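
The last comment in the snippet notes that `display` is only available in Jupyter and that `print` should be used elsewhere. One way to make the same script work in both settings is sketched below. `in_notebook` and `show` are hypothetical helpers written for this example, not part of PZ, and the check for an active IPython session is best-effort.

```python
def in_notebook():
    """Best-effort check for an active IPython/Jupyter session."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so we cannot be inside a notebook.
        return False
    return get_ipython() is not None


def show(obj, notebook=None):
    """Render obj with IPython's display() in a notebook, otherwise print() it.

    Returns the name of the path taken ("display" or "print").
    """
    if notebook is None:
        notebook = in_notebook()
    if notebook:
        from IPython.display import display
        display(obj)
        return "display"
    print(obj)
    return "print"
```

With these helpers, the final line of the snippet could be written as `show(output_df)`.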

Python Demos

Below are simple instructions for running PZ on a test dataset of Enron emails, which is downloaded separately as described in the next section.

Downloading test data

To run the provided demos, you will need to download the test data. Due to the size of the data, we are unable to include it in the repository. You can download the test data by running the following commands from a Unix terminal (requires wget and tar):

chmod +x testdata/download-testdata.sh
./testdata/download-testdata.sh

Running the Demos

Set your OpenAI (or Together.ai) API key at the command line:

# set one (or both) of the following:
export OPENAI_API_KEY=<your-api-key>
export TOGETHER_API_KEY=<your-api-key>
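
Before launching a demo, it can help to confirm that at least one of these keys is visible to Python. The following is a minimal sketch using only the standard library. The variable names come from the export lines above, while `configured_keys` is a hypothetical helper and not part of PZ.

```python
import os

# Variable names taken from the export lines above.
KEY_NAMES = ("OPENAI_API_KEY", "TOGETHER_API_KEY")


def configured_keys(env=None):
    """Return the names of the API-key variables set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in KEY_NAMES if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_keys()
    if found:
        print("Found:", ", ".join(found))
    else:
        print("Neither OPENAI_API_KEY nor TOGETHER_API_KEY is set")
```

The check only confirms that a variable is set; it does not validate the key with the provider.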

Now you can run the simple test program with:

$ python demos/simple-demo.py --task enron --dataset testdata/enron-eval-tiny --verbose


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

palimpzest-0.6.4.tar.gz (149.1 kB)

Uploaded Source

Built Distribution

palimpzest-0.6.4-py3-none-any.whl (183.5 kB)

Uploaded Python 3

File details

Details for the file palimpzest-0.6.4.tar.gz.

File metadata

  • Download URL: palimpzest-0.6.4.tar.gz
  • Upload date:
  • Size: 149.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for palimpzest-0.6.4.tar.gz
Algorithm Hash digest
SHA256 312ca63d8076431c711874f643823d197eb42f2f1714c12c1d58c49afd0afedb
MD5 6b5f846cb30065579acbdbfde932c0ac
BLAKE2b-256 8b616987698b3ff6d417b5ff311335aac03e6b76764aa9a5b452cdf4711ac511

See more details on using hashes here.
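
As an illustration of how the digests above can be used, the sketch below computes the SHA256 of a downloaded file and compares it with the value listed for palimpzest-0.6.4.tar.gz. It relies only on the standard library. `sha256_of` and `matches` are hypothetical helper names, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib

# SHA256 digest listed above for palimpzest-0.6.4.tar.gz
EXPECTED_SHA256 = "312ca63d8076431c711874f643823d197eb42f2f1714c12c1d58c49afd0afedb"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `matches("palimpzest-0.6.4.tar.gz")` should return True for an intact copy of the source distribution in the current directory.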

Provenance

The following attestation bundles were made for palimpzest-0.6.4.tar.gz:

Publisher: package.yaml on mitdbg/palimpzest

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file palimpzest-0.6.4-py3-none-any.whl.

File metadata

  • Download URL: palimpzest-0.6.4-py3-none-any.whl
  • Upload date:
  • Size: 183.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for palimpzest-0.6.4-py3-none-any.whl
Algorithm Hash digest
SHA256 9e24836b11e28c4da6099e70c248ce9dc179b9526aea9adc3c21f74ba80b89be
MD5 86dc8e8943ccf5500ac1dc413efae617
BLAKE2b-256 87d5d5b1a88b2f069152a57bf74bf11f6d54481ab18a76c3b6834936a32e0d61

See more details on using hashes here.

Provenance

The following attestation bundles were made for palimpzest-0.6.4-py3-none-any.whl:

Publisher: package.yaml on mitdbg/palimpzest

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
