Python SDK for the Aqueduct prediction infrastructure

Aqueduct: Taking Data Science to Production

Aqueduct automates the engineering required to take data science & machine learning projects to production.

With Aqueduct, you can define & deploy machine learning pipelines, connect your code to data and business systems, and monitor the performance and quality of your pipelines -- all using a simple Python API.

You can install Aqueduct via pip and then start it:

pip3 install aqueduct-ml
aqueduct start

Once you have Aqueduct running, you can create your first workflow:

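from aqueduct import Client, op

client = Client()

# An operator: adds a column holding the length of each review.
@op
def transform_data(reviews):
    reviews['strlen'] = reviews['review'].str.len()
    return reviews


# Load the input table from the demo database integration.
demo_db = client.integration("aqueduct_demo")
reviews_table = demo_db.sql("select * from hotel_reviews;")

# Apply the operator and save the result back to the database,
# replacing the contents of strlen_table.
strlen_table = transform_data(reviews_table)
strlen_table.save(demo_db.config(table="strlen_table", update_mode="replace"))

# Publish the workflow under the name "review_strlen".
client.publish_flow(name="review_strlen", artifacts=[strlen_table])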
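
Before publishing, it can help to try the transformation on a small in-memory sample. The sketch below is not part of the Aqueduct API: it leaves out the @op decorator so that it runs with pandas alone, and it assumes the operator receives a pandas DataFrame, as the use of .str.len() above suggests. The sample rows are made up for illustration.

```python
import pandas as pd


def transform_data(reviews):
    # Same body as the operator above, without the @op decorator.
    reviews['strlen'] = reviews['review'].str.len()
    return reviews


# Hypothetical sample rows standing in for the hotel_reviews table.
sample = pd.DataFrame({"review": ["Great stay", "Too noisy", ""]})

# The function modifies its input in place, so pass a copy.
result = transform_data(sample.copy())
print(result)
```

If the new strlen column looks right here, the same function body can be decorated with @op and used in the workflow.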

Once you've created a workflow, you can view it in the Aqueduct UI.

[Image: workflow view in the Aqueduct UI]

Why Aqueduct?

The engineering required to get data science & machine learning projects into production slows down data teams. Aqueduct automates that engineering away and allows you to define robust data & ML pipelines in a few lines of code and run them anywhere.

  • Simple API: Get a production-ready pipeline running in a few lines of vanilla Python.
  • Flexible Environments: Run your pipelines anywhere (locally or in the cloud). You can deploy & update your models on your own infrastructure without building custom Docker containers, managing Kubernetes deployments, or storing passwords in plaintext.
  • Data Integrations: With out-of-the-box connectors, you can access the freshest data easily & reliably.
  • Custom Monitoring: Aqueduct's checks and metrics allow you to define constraints on your workflows, so you can quickly debug and fix errors.
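
The idea behind metrics and checks can be illustrated in plain Python. This is a conceptual sketch only, not Aqueduct's API: a metric reduces data to a single number, and a check turns that number into a pass/fail constraint. The function names, sample reviews, and bounds are all hypothetical.

```python
from statistics import mean


def mean_review_length(reviews):
    """Metric: reduce a list of review strings to a single number."""
    return mean(len(review) for review in reviews)


def within_bounds(value, lower, upper):
    """Check: pass only if the metric falls inside the expected range."""
    return lower <= value <= upper


# Hypothetical sample data and bounds, chosen for illustration.
reviews = ["Great stay", "Too noisy", "Would return"]
avg_len = mean_review_length(reviews)
check_passed = within_bounds(avg_len, lower=5, upper=500)
print(avg_len, check_passed)
```

A failing check is the signal to investigate: for instance, an average length near zero would suggest that the upstream table is empty or that the review column is not being populated.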

Overview & Examples

Aqueduct allows you to build powerful machine learning workflows that run anywhere, publish predictions everywhere, and ensure prediction quality. The core abstraction in Aqueduct is a Workflow, which is a sequence of Artifacts (data) that are transformed by Operators (compute). The input Artifact(s) for a Workflow are typically loaded from a database, and the output Artifact(s) are typically persisted back to a database. Each Workflow can either be run on a fixed schedule or triggered on demand.
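
To make the Workflow, Artifact, and Operator vocabulary concrete, here is a toy model in plain Python. It is not how Aqueduct is implemented and uses none of its API: the classes and functions are hypothetical, an in-memory list stands in for the database, and operators are chained linearly for simplicity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Artifact:
    """A named piece of data: either the loaded input or an operator's output."""
    name: str
    value: Any


@dataclass
class Workflow:
    """An ordered chain of operators applied to an input artifact."""
    name: str
    operators: List[Callable[[Any], Any]] = field(default_factory=list)

    def run(self, source: Artifact) -> List[Artifact]:
        # Each operator consumes the previous artifact and produces a new one.
        artifacts = [source]
        for operator in self.operators:
            output = operator(artifacts[-1].value)
            artifacts.append(Artifact(operator.__name__, output))
        return artifacts


def strip_whitespace(rows):
    return [row.strip() for row in rows]


def review_lengths(rows):
    return [len(row) for row in rows]


flow = Workflow("review_strlen", [strip_whitespace, review_lengths])
artifacts = flow.run(Artifact("hotel_reviews", ["  Great stay ", "Too noisy"]))
final = artifacts[-1]
print(final.name, final.value)
```

In this toy model, calling run by hand corresponds to triggering a workflow on demand; a fixed schedule would simply call the same method at regular intervals.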

To see Aqueduct in action on some real-world machine learning workflows, check out our examples.

What's next?

Check out our documentation.

If you have questions or comments or would like to learn more about what we're building, please reach out, join our Slack channel, or start a conversation on GitHub. We'd love to hear from you!

If you're interested in contributing, please check out our roadmap and join the development channel in our community Slack.

Download files

Source Distribution

aqueduct-sdk-0.1.4.tar.gz (75.0 kB)

Built Distribution

aqueduct_sdk-0.1.4-py3-none-any.whl (97.1 kB, Python 3)
