Kozmo is a tool for building and deploying data pipelines.
Project description
Integrate and synchronize data from 3rd party sources
Build real-time and batch pipelines to transform data using Python, SQL, and R
Run, monitor, and orchestrate thousands of pipelines without losing sleep
Features
🎶 | Orchestration | Schedule and manage data pipelines with observability. |
📓 | Notebook | Interactive Python, SQL, & R editor for coding data pipelines. |
🏗️ | Data integrations | Synchronize data from 3rd party sources to your internal destinations. |
🚰 | Streaming pipelines | Ingest and transform real-time data. |
❎ | dbt | Build, run, and manage your dbt models with Kozmo. |
A sample data pipeline defined across 3 files ➝
- Load data ➝
import pandas as pd

@data_loader
def load_csv_from_file():
    return pd.read_csv('default_repo/titanic.csv')
- Transform data ➝
@transformer
def select_columns_from_df(df, *args):
    return df[['Age', 'Fare', 'Survived']]
- Export data ➝
@data_exporter
def export_titanic_data_to_disk(df) -> None:
    df.to_csv('default_repo/titanic_transformed.csv')
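To see how the three blocks hand data to each other, here is a minimal, self-contained sketch. The decorators are hypothetical no-op stand-ins (not Kozmo's own), the CSV content is made-up sample data, and the standard csv module replaces pandas so the snippet runs without extra dependencies.

```python
import csv
import io

# Hypothetical stand-ins: no-op decorators so the sketch runs without Kozmo.
def data_loader(func):
    return func

def transformer(func):
    return func

def data_exporter(func):
    return func

# Made-up sample data in place of default_repo/titanic.csv.
SAMPLE_CSV = "Name,Age,Fare,Survived\nAlice,29,7.25,1\nBob,40,8.05,0\n"

@data_loader
def load_rows():
    # Parse the CSV text into a list of dicts, one per row.
    return list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

@transformer
def select_columns(rows, *args):
    # Keep only the three columns used in the sample pipeline.
    keep = ('Age', 'Fare', 'Survived')
    return [{key: row[key] for key in keep} for row in rows]

@data_exporter
def export_rows(rows) -> str:
    # Serialize back to CSV text instead of writing to disk.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def run_pipeline() -> str:
    # The output of each block is the input of the next one.
    return export_rows(select_columns(load_rows()))

if __name__ == "__main__":
    print(run_pipeline())
```

In a real project each function would live in its own file and Kozmo, not a run_pipeline helper, would pass the output of one block to the next.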
What the data pipeline looks like in the UI ➝
New? We recommend reading about blocks and learning from a hands-on tutorial.
Setting up a Development Environment
We'd love to have your contribution, but first you'll need to configure your local environment. In this guide, we'll walk through:
- Configuring virtual environment
- Installing dependencies
- Installing Git hooks
- Installing pre-commit hooks
- Building the Kozmo Docker image
- Running dev!
[!WARNING] Unless noted otherwise, all commands below assume you are at the root of the repo.
The Kozmo server uses Python >=3.6 (as per setup.py), but the development dependencies will complain if you're not using at least Python 3.8. We use Python 3.10.
As such, make sure you have Python >=3.8. Clone the repo and verify your version with:
git clone https://github.com/kozmoai/kozmoai kozmoai
cd kozmoai
python --version
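If you prefer to check the version from inside the interpreter, the following sketch does the same comparison programmatically. The 3.8 minimum comes from the text above; the function name meets_minimum is only illustrative.

```python
import sys

# Minimum version needed by the development dependencies (from the guide).
MIN_VERSION = (3, 8)

def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    # Compare only (major, minor) against the required minimum.
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for the dev dependencies"
    print(f"Python {current}: {status}")
```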
Using a virtual environment is recommended.
Configuring a Virtual Env
Anaconda + Poetry
Create an Anaconda virtual environment with the correct version of Python:
conda create -n python3.10 python=3.10
Activate that virtual environment (to get the right version of Python on your PATH):
conda activate python3.10
Verify that the correct Python version is being used:
python --version
# or, on Windows, show which interpreter is on your PATH
where python
# or, on macOS/Linux
which python
# or
whereis python
Then create a Poetry virtual environment using the same version of Python:
poetry env use $(which python)
Install the dev dependencies:
make dev_env
Virtualenv
First, create a virtualenv environment in the root of the repo:
python -m venv .venv
Then activate it:
source .venv/bin/activate
To install dependencies:
pip install -U pip
pip install -r ./requirements.txt
pip install toml kozmoai
Install additional dev dependencies from pyproject.toml:
pip install $(python -c "import toml; print(' '.join(toml.load('pyproject.toml')['tool']['poetry']['group']['dev']['dependencies'].keys()))" | tr '\n' ' ')
The above command uses the toml library to output the dev dependencies from pyproject.toml as a space-delimited list, and passes that output to the pip install command.
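The one-liner can be hard to read, so here is the same lookup spelled out as a small sketch. It works on an already-parsed dictionary, so it runs without the toml package. The package names in the example are placeholders, not Kozmo's actual dev dependencies.

```python
def dev_dependency_names(pyproject: dict) -> list:
    # Walk the same key path as the one-liner:
    # tool -> poetry -> group -> dev -> dependencies
    dev = pyproject['tool']['poetry']['group']['dev']['dependencies']
    return list(dev.keys())

# Placeholder data shaped like a parsed pyproject.toml (names are made up).
example = {
    'tool': {
        'poetry': {
            'group': {
                'dev': {
                    'dependencies': {'pytest': '^7.0', 'black': '*'},
                },
            },
        },
    },
}

if __name__ == "__main__":
    # Space-delimited list, ready to hand to pip install.
    print(' '.join(dev_dependency_names(example)))
```

With the toml package installed, the real file could be loaded with toml.load('pyproject.toml') and passed to the same function.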
Kozmo frontend
If you'll only be contributing to backend code, this section may be omitted.
[!IMPORTANT] Even if you are only working on the UI, you still need the server running on port 6789.
The Kozmo frontend is a Next.js project that uses Yarn.
cd kozmo_ai/frontend/
yarn install && yarn dev
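Since the frontend depends on the backend listening on port 6789, a quick connectivity check can save some debugging time. This is a minimal sketch using only the standard library; port_is_open is an illustrative helper, not part of Kozmo.

```python
import socket

def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    # connect_ex returns 0 when the TCP connection succeeds.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # 6789 is the backend port named in this guide.
    print("backend reachable on 6789:", port_is_open(6789))
```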
Git Hooks
Install Git hooks by running the Make command:
make install-hooks
This will copy the Git hooks from .git-dev/hooks into .git/hooks, and make them executable.
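For readers curious what that step amounts to, the sketch below reproduces the described behavior (copy each hook, then set the executable bits) in Python. It is an illustration of the description above, not the actual Makefile recipe, and install_hooks is a hypothetical helper name.

```python
import shutil
import stat
from pathlib import Path

def install_hooks(src: Path, dst: Path) -> list:
    """Copy every file in src into dst and mark each copy executable."""
    dst.mkdir(parents=True, exist_ok=True)
    installed = []
    for hook in sorted(src.iterdir()):
        if hook.is_file():
            target = dst / hook.name
            shutil.copy2(hook, target)
            # Add execute permission for user, group, and others.
            mode = target.stat().st_mode
            target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            installed.append(target.name)
    return installed
```

From the repo root this would be called as install_hooks(Path('.git-dev/hooks'), Path('.git/hooks')).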
Pre-Commit
Install the pre-commit hooks:
pre-commit install
Note that this will install both pre-commit and pre-push hooks.
Run development server
To initialize a development Kozmo project so you have a starting point:
./scripts/init.sh default_repo
Then, to start the dev server for the backend at localhost:6789 and frontend at localhost:3000:
./scripts/dev.sh default_repo
In case you only want the backend:
./scripts/start.sh default_repo
The name default_repo could technically be anything, but if you decide to change it, be sure to add it to the .gitignore file. You're now ready to contribute!
See this video for further guidance and instructions.
Any time you'd like to build, just run ./scripts/dev.sh default_repo to run the development containers.
Any changes you make, backend or frontend, will be reflected in the development instance.
Our pre-commit & pre-push hooks will run when you make a commit/push to check style, etc.
Now it's time to create a new branch, contribute code, and open a pull request!
Troubleshoot
Here are some common problems our users have encountered. If other issues arise, please reach out to us in Slack!
Illegal instruction
If you receive an Illegal instruction error, or Docker containers exit instantly with code 132, your machine is using an older architecture that does not support certain instructions called from the (Python) dependencies. Either try again on another machine, or manually set up the server, start it in verbose mode to see which package caused the error, and look for alternatives.
List of builds:
- polars -> polars-lts-cpu
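Exit code 132 follows the common convention of 128 plus the signal number, and signal 4 is SIGILL (illegal instruction). The sketch below decodes such exit codes; describe_exit_code is an illustrative helper, not part of Kozmo or Docker.

```python
import signal

def describe_exit_code(code: int) -> str:
    # Codes above 128 conventionally mean "killed by signal (code - 128)".
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return "unknown signal"
    return "not a signal exit"

if __name__ == "__main__":
    print(132, "->", describe_exit_code(132))
```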
pip install fails on Windows
Some Python packages assume a few core functionalities that are not available on Windows, so you need to install these prerequisites first. See the fantastic (but archived) pipwin and this issue for more options.
Please report any other build errors in our Slack.
ModuleNotFoundError: No module named 'x'
If new libraries have been added, you need to install the new dependencies manually. This can be done in 2 ways:
- docker-compose build from the project root will fully rebuild the image with the new dependencies. This can take a long time.
- pip install x from inside the container will install only the required dependency. This should be much faster.
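Before rebuilding, it can help to confirm which modules are actually missing in the current environment. This is a small sketch using only the standard library; missing_modules is an illustrative helper and the module names passed in are examples.

```python
import importlib.util

def missing_modules(names) -> list:
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # Replace these example names with the modules from your traceback.
    print("missing:", missing_modules(["json", "polars"]))
```

Run it inside the container so the result reflects that environment rather than your host.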