aqueduct-sdk

Python SDK for the Aqueduct prediction infrastructure

These details have been verified by PyPI

Maintainers

andre.aqueducthq aqeunice aqueduct_engineering cgwu cwinddavid hsubbaraj jerome65 kenxu sauravchh vsreekanti

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Aqueduct: Orchestrate & manage production ML

Aqueduct enables you to define, deploy and monitor robust ML pipelines on any cloud infrastructure. Check out our quickstart guide!

Aqueduct gives you a simple Python-native API to define machine learning pipelines, the ability to deploy those pipelines on your existing infrastructure (e.g., Spark, Kubernetes, Lambda), and visibility into the code, data, and metadata associated with your workflows. Aqueduct is fully open-source and runs securely in your cloud.

You can install Aqueduct via pip and then start the Aqueduct server:

pip3 install aqueduct-ml
aqueduct start

Now, we can create our first workflow:

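from aqueduct import Client, op

# Connect to the local Aqueduct server started by `aqueduct start`.
client = Client()

# An operator: a plain Python function that Aqueduct runs as a pipeline step.
@op
def transform_data(reviews):
    # Add a column holding the character length of each review.
    reviews['strlen'] = reviews['review'].str.len()
    return reviews


# Query the built-in demo database; the result is the workflow's input artifact.
demo_db = client.integration("aqueduct_demo")
reviews_table = demo_db.sql("select * from hotel_reviews;")

# Apply the operator, then save its output, replacing the table if it exists.
strlen_table = transform_data(reviews_table)
demo_db.save(strlen_table, "strlen_table", "replace")

# Publish the workflow so it can be viewed and run from the Aqueduct UI.
client.publish_flow(name="review_strlen", artifacts=[strlen_table])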
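
The body of `transform_data` is ordinary pandas, so you can check what it does without a running Aqueduct server. The following is a minimal sketch, assuming pandas is installed; the two sample reviews are made up for illustration, and the `@op` decorator is left off so the function runs as plain Python:

```python
import pandas as pd


def transform_data(reviews: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the operator above: add the character length of each review.
    reviews["strlen"] = reviews["review"].str.len()
    return reviews


# Hypothetical sample rows standing in for the hotel_reviews table.
sample = pd.DataFrame({"review": ["Great stay!", "Noisy room"]})
result = transform_data(sample)
print(result)
```

In plain pandas the function adds the column to the DataFrame it receives and returns that same object; `reviews.assign(strlen=reviews["review"].str.len())` would produce a new DataFrame instead of mutating the input.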

Once we've created a workflow, we can view it in the Aqueduct UI.

Why Aqueduct?

The engineering required to get data science & machine learning projects into production slows down data teams. Aqueduct automates away that engineering and allows you to define robust data & ML pipelines in a few lines of code and run them anywhere.

Python-native pipeline API: Aqueduct’s API allows you to define your workflows in vanilla Python, so you can get code into production quickly and effectively. No more DSLs or YAML configs to worry about.
Integrated with your infrastructure: Workflows defined in Aqueduct can run on any cloud infrastructure you use, like Kubernetes, Spark, Airflow, or AWS Lambda. You can get all the benefits of Aqueduct without having to rip-and-replace your existing tooling.
Centralized visibility into code, data, & metadata: Once your workflows are in production, you need to know what’s running, whether it’s working, and when it breaks. Aqueduct gives you visibility into what code, data, metrics, and metadata are generated by each workflow run, so you can have confidence that your pipelines work as expected — and know immediately when they don’t.
Runs securely in your cloud: Aqueduct is fully open-source and runs in any Unix environment. It runs entirely in your cloud and on your infrastructure, so you can be confident that nothing ever leaves your cloud.

Overview & Examples

The core abstraction in Aqueduct is a Workflow, which is a sequence of Artifacts (data) that are transformed by Operators (compute). The input Artifact(s) for a Workflow are typically loaded from a database, and the output Artifact(s) are typically persisted back to a database. Each Workflow can be run either on a fixed schedule or on demand.
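
To make the Workflow / Artifact / Operator vocabulary concrete, here is a toy, in-memory model of the idea in plain Python. It is not Aqueduct's implementation or API: the `Artifact`, `Operator`, and `Workflow` classes are hypothetical, loading from and saving to a database is replaced by an in-memory dictionary, and an on-demand run is simply a method call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Artifact:
    """A named piece of data flowing through the workflow (hypothetical model)."""
    name: str


@dataclass
class Operator:
    """A compute step mapping input artifacts to one output artifact (hypothetical model)."""
    fn: Callable[..., Any]
    inputs: List[Artifact]
    output: Artifact


@dataclass
class Workflow:
    """Operators kept in the order they were added, so inputs are always computed first."""
    operators: List[Operator] = field(default_factory=list)

    def add(self, fn: Callable[..., Any], inputs: List[Artifact], output_name: str) -> Artifact:
        # Register a step and hand back its output artifact for later steps to consume.
        out = Artifact(output_name)
        self.operators.append(Operator(fn, inputs, out))
        return out

    def run(self, sources: Dict[str, Any]) -> Dict[str, Any]:
        # On-demand run: start from the loaded source artifacts and apply each operator in turn.
        values = dict(sources)
        for step in self.operators:
            args = [values[a.name] for a in step.inputs]
            values[step.output.name] = step.fn(*args)
        return values


# Usage: one source artifact feeding two chained operators.
wf = Workflow()
raw = Artifact("reviews")
lengths = wf.add(lambda rows: [len(r) for r in rows], [raw], "lengths")
total = wf.add(sum, [lengths], "total_length")
results = wf.run({"reviews": ["Great stay!", "Noisy room"]})
print(results["total_length"])  # 21
```

Because `add` returns the output artifact only after registering its operator, a later step can only reference artifacts that already exist, which keeps the stored order a valid execution order.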

To see Aqueduct in action on some real-world machine learning workflows, check out some of our examples:

What's next?

Check out our documentation, where you'll find:

If you have questions or comments or would like to learn more about what we're building, please reach out, join our Slack channel, or start a conversation on GitHub. We'd love to hear from you!

If you're interested in contributing, please check out our roadmap and join the development channel in our community Slack.

Release history

Version	Released
0.3.6	Jun 7, 2023
0.3.5	May 31, 2023
0.3.4	May 24, 2023
0.3.3	May 17, 2023
0.3.2	May 10, 2023
0.3.1	May 4, 2023
0.2.12	Apr 26, 2023
0.2.11	Apr 18, 2023
0.2.10	Apr 11, 2023
0.2.9	Apr 5, 2023
0.2.8	Mar 29, 2023
0.2.7	Mar 23, 2023
0.2.6	Mar 14, 2023
0.2.5	Mar 7, 2023
0.2.4	Mar 1, 2023
0.2.3 (this version)	Feb 22, 2023
0.2.2	Feb 15, 2023
0.2.1	Feb 8, 2023
0.2.0	Feb 1, 2023
0.1.11	Jan 24, 2023
0.1.10	Jan 17, 2023
0.1.9	Jan 11, 2023
0.1.8	Dec 21, 2022
0.1.7	Dec 14, 2022
0.1.6	Dec 14, 2022
0.1.5	Nov 30, 2022
0.1.4	Nov 15, 2022
0.1.3	Nov 8, 2022
0.1.2	Nov 1, 2022
0.1.1	Oct 26, 2022
0.1.0	Oct 18, 2022
0.0.16	Sep 26, 2022
0.0.15	Sep 21, 2022
0.0.14	Sep 12, 2022
0.0.13	Sep 7, 2022
0.0.12	Aug 25, 2022
0.0.11	Aug 23, 2022
0.0.10	Aug 23, 2022
0.0.9	Aug 16, 2022
0.0.8	Aug 9, 2022
0.0.7	Aug 1, 2022
0.0.6	Jul 25, 2022
0.0.5	Jul 14, 2022
0.0.4	Jul 8, 2022
0.0.3	Jun 22, 2022
0.0.2	Jun 9, 2022
0.0.1	May 26, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aqueduct-sdk-0.2.3.tar.gz (90.8 kB)

Uploaded Feb 22, 2023 Source

Built Distribution

aqueduct_sdk-0.2.3-py3-none-any.whl (118.7 kB)

Uploaded Feb 22, 2023 Python 3

Hashes for aqueduct-sdk-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`4c4cbfd6179fabefd44061a81b41a9cd533037e16e7f62fa13b87ba5dff2a1d8`
MD5	`292d719681eef8750587d4edcfe88286`
BLAKE2b-256	`75420c25c896e39fd46d362a9d6d584b0c8398896a5e6b37d74ce263cadc7758`

Hashes for aqueduct_sdk-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`13e599df567a387fc4e2133ec49530847a7d547cd83ad93eab07ba5fa4458a09`
MD5	`ecbe9d08aca6acc85ea0768dee992599`
BLAKE2b-256	`d151ced43a5d73f386748a8963290df4c093913b0bbd0b151af870f946277cad`