Prediction Infrastructure for Data Scientists
Aqueduct: Taking Data Science to Production
Aqueduct automates the engineering required to take data science & machine learning projects to production.
With Aqueduct, you can define & deploy machine learning pipelines, connect your code to data and business systems, and monitor the performance and quality of your pipelines -- all using a simple Python API.
You can install Aqueduct via pip:

```bash
pip3 install aqueduct-ml
aqueduct start
```
Once you have Aqueduct running, you can create your first workflow:
```python
from aqueduct import Client, op, metric

client = Client()

@op
def transform_data(reviews):
    reviews['strlen'] = reviews['review'].str.len()
    return reviews

demo_db = client.integration("aqueduct_demo")
reviews_table = demo_db.sql("select * from hotel_reviews;")

strlen_table = transform_data(reviews_table)
strlen_table.save(demo_db.config(table="strlen_table", update_mode="replace"))

client.publish_flow(name="review_strlen", artifacts=[strlen_table])
```
Once you've created a workflow, you can view it in the Aqueduct UI.
Why Aqueduct?
The engineering required to get data science & machine learning projects in production slows down data teams. Aqueduct automates away that engineering and allows you to define robust data & ML pipelines in a few lines of code and run them anywhere.
- Simple API: Get a production-ready pipeline running in a few lines of vanilla Python.
- Flexible Environments: Run your pipelines anywhere (locally or in the cloud). You can deploy & update your models on your own infrastructure without building custom Docker containers, managing Kubernetes deployments, or storing passwords in plaintext.
- Data Integrations: With out-of-the-box connectors, you can access the freshest data easily & reliably.
- Custom Monitoring: Aqueduct's checks and metrics allow you to define constraints on your workflows, so you can quickly debug and fix errors.
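The metric-and-check pattern behind the last bullet can be illustrated in plain Python. This is only a sketch of the idea, not Aqueduct's `@metric`/`@check` API: a metric computes a number from your data, and a check is a boolean constraint on that number that gates the workflow.

```python
# Illustrative sketch of the metric/check pattern (plain Python, not the
# Aqueduct API). Rows stand in for a table of hotel reviews.

def average_strlen(rows):
    """Metric: mean review length over a list of {'review': str} rows."""
    return sum(len(r["review"]) for r in rows) / len(rows)

def within_bounds(value, low, high):
    """Check: flag metric values that fall outside an expected range."""
    return low <= value <= high

rows = [{"review": "great stay"}, {"review": "noisy room, thin walls"}]
m = average_strlen(rows)
assert within_bounds(m, 1, 500)  # passes, so the workflow would proceed
```

A failing check would be the signal to halt or alert rather than publish bad predictions downstream.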
Overview & Examples
Aqueduct allows you to build powerful machine learning workflows that run anywhere, publish predictions everywhere, and ensure prediction quality. The core abstraction in Aqueduct is a Workflow, which is a sequence of Artifacts (data) that are transformed by Operators (compute). The input Artifact(s) for a Workflow are typically loaded from a database, and the output Artifact(s) are typically persisted back to a database. Each Workflow can either be run on a fixed schedule or triggered on-demand.
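The Workflow/Operator/Artifact relationship described above can be sketched in a few lines of plain Python. This is a toy model, not Aqueduct's implementation: an `@op`-style decorator lifts a plain function into an operator over artifacts and records lineage as data flows through.

```python
# Toy model of the Workflow abstraction (not Aqueduct's implementation):
# Operators transform Artifacts, and a Workflow is the recorded sequence.

class Artifact:
    def __init__(self, data, history=()):
        self.data = data
        self.history = history  # names of the operators that produced this data

def op(fn):
    """Decorator: lift a plain function into an operator over Artifacts."""
    def wrapper(artifact):
        return Artifact(fn(artifact.data), artifact.history + (fn.__name__,))
    return wrapper

@op
def add_strlen(rows):
    return [dict(r, strlen=len(r["review"])) for r in rows]

# Load an input artifact, transform it, and inspect its lineage.
reviews = Artifact([{"review": "great stay"}])
result = add_strlen(reviews)
assert result.data[0]["strlen"] == 10
assert result.history == ("add_strlen",)
```

In Aqueduct itself, the recorded sequence is what `publish_flow` deploys, so the same chain of operators can be re-run on a schedule or on demand.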
To see Aqueduct in action on some real-world machine learning workflows, check out our examples.
What's next?
Check out our documentation, where you'll find:
- a Quickstart Guide
- example workflows
- and more details on creating workflows
If you have questions or comments or would like to learn more about what we're building, please reach out, join our Slack channel, or start a conversation on GitHub. We'd love to hear from you!
If you're interested in contributing, please check out our roadmap and join the development channel in our community Slack.
Hashes for aqueduct_ml-0.1.7-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 56b1bb4e429d0b7fdd7434255ff069b42abf5c3a6af0f7e55a3918a30399eb07 |
| MD5 | 657a71b5caf95471968d88b5c67bd136 |
| BLAKE2b-256 | 5cd269741c128903fa50c66aa847c5340fd1b53cc3abe05bbf7b35f9848eb5fd |