Kozmo is a tool for building and deploying data pipelines.

Integrate and synchronize data from 3rd party sources

Build real-time and batch pipelines to transform data using Python, SQL, and R

Run, monitor, and orchestrate thousands of pipelines without losing sleep


Features

🎶 Orchestration: Schedule and manage data pipelines with observability.
📓 Notebook: Interactive Python, SQL, & R editor for coding data pipelines.
🏗️ Data integrations: Synchronize data from 3rd party sources to your internal destinations.
🚰 Streaming pipelines: Ingest and transform real-time data.
dbt: Build, run, and manage your dbt models with Kozmo.

A sample data pipeline defined across 3 files ➝

  1. Load data ➝
    import pandas as pd

    @data_loader
    def load_csv_from_file():
        # Read the sample dataset into a DataFrame.
        return pd.read_csv('default_repo/titanic.csv')
    
  2. Transform data ➝
    @transformer
    def select_columns_from_df(df, *args):
        return df[['Age', 'Fare', 'Survived']]
    
  3. Export data ➝
    @data_exporter
    def export_titanic_data_to_disk(df) -> None:
        df.to_csv('default_repo/titanic_transformed.csv')
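
The three blocks above appear to form a chain in which each block receives the previous block's return value. As a rough, framework-free illustration of that flow, the sketch below wires the same three steps together by hand. The decorators are hypothetical no-op stand-ins (not Kozmo's real decorators), the sample rows are made up, and rows are plain dictionaries rather than a pandas DataFrame so that the example runs with the standard library alone.

```python
import csv
import io


def _passthrough(func):
    """Hypothetical stand-in: the real decorators are provided by Kozmo."""
    return func


data_loader = transformer = data_exporter = _passthrough

# Made-up sample data in the shape of the Titanic CSV.
SAMPLE_CSV = """Name,Age,Fare,Survived
Alice,29,7.25,1
Bob,41,71.28,0
"""


@data_loader
def load_rows():
    # Parse the CSV text into a list of dicts, one per row.
    return list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


@transformer
def select_columns(rows, *args):
    # Keep only the columns used in the sample pipeline.
    wanted = ('Age', 'Fare', 'Survived')
    return [{key: row[key] for key in wanted} for row in rows]


@data_exporter
def export_rows(rows):
    # Write the rows out as CSV text instead of to disk.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator='\n'
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline():
    # Each step receives the previous step's return value.
    return export_rows(select_columns(load_rows()))


if __name__ == '__main__':
    print(run_pipeline())
```

In a real project the orchestration is handled by Kozmo itself; the manual `run_pipeline` function only makes the data flow explicit.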
    

What the data pipeline looks like in the UI ➝

[Figure: data pipeline overview]

New? We recommend reading about blocks and learning from a hands-on tutorial.


Setting up a Development Environment

We'd love to have your contribution, but first you'll need to configure your local environment. In this guide, we'll walk through:

  1. Configuring virtual environment
  2. Installing dependencies
  3. Installing Git hooks
  4. Installing pre-commit hooks
  5. Building the Kozmo Docker image
  6. Running dev!

[!WARNING] Unless noted otherwise, all commands below assume you are at the root of the repo.

Kozmo server uses Python >=3.6 (as per setup.py), but the development dependencies will complain if you're not using at least Python 3.8. We use Python 3.10.

As such, make sure you have Python >=3.8. Clone the repo and verify your Python version with:

git clone https://github.com/kozmoai/kozmoai kozmoai
cd kozmoai
python --version
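
If you would rather check the version from inside Python, a minimal sketch follows. It assumes only the 3.8 minimum stated above; the function name `meets_minimum` is made up for this example.

```python
import sys

# Assumption taken from the text above: dev dependencies need Python 3.8+.
MINIMUM_VERSION = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor, ...) tuple is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == '__main__':
    current = '.'.join(str(part) for part in sys.version_info[:3])
    status = 'OK' if meets_minimum() else 'too old for the dev dependencies'
    print(f'Python {current}: {status}')
```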

Using a virtual environment is recommended.

Configuring a Virtual Env

Anaconda + Poetry

Create an Anaconda virtual environment with the correct version of Python:

conda create -n python3.10 python==3.10

Activate that virtual environment (to get the right version of Python on your PATH):

conda activate python3.10

Verify that the correct Python version is being used:

python --version
# or
where python
# or
which python
# or
whereis python

Then create a Poetry virtual environment using the same version of Python:

poetry env use $(which python)

Install the dev dependencies:

make dev_env

Virtualenv

First, create a virtualenv environment in the root of the repo:

python -m venv .venv

Then activate it:

source .venv/bin/activate
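
To confirm that activation worked, you can ask the interpreter itself. The sketch below relies on the standard behavior of venv-style environments, where `sys.prefix` differs from `sys.base_prefix`. It does not detect Conda environments, and the helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv-style environment."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == '__main__':
    print('interpreter:', sys.executable)
    print('inside a virtual environment:', in_virtualenv())
```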

To install dependencies:

pip install -U pip
pip install -r ./requirements.txt
pip install toml kozmoai

Install additional dev dependencies from pyproject.toml:

pip install $(python -c "import toml; print(' '.join(toml.load('pyproject.toml')['tool']['poetry']['group']['dev']['dependencies'].keys()))" | tr '\n' ' ')

The above command uses the toml library to output the dev dependencies from the pyproject.toml as a space-delimited list, and passes that output to the pip install command.
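
Written out as a small script, the same lookup might look like the sketch below. The sample TOML and its package names are made up for illustration; the real input is the pyproject.toml at the root of the repo. The sketch uses the standard-library `tomllib` when available (Python 3.11+) and otherwise falls back to the `toml` package installed above.

```python
try:
    import tomllib as toml_parser  # standard library on Python 3.11+
except ModuleNotFoundError:
    import toml as toml_parser  # third-party package installed earlier

# Made-up sample; the real file is pyproject.toml at the repo root.
SAMPLE_PYPROJECT = """
[tool.poetry.group.dev.dependencies]
pytest = "^7.0"
flake8 = ">=6.0"
"""


def dev_dependency_names(pyproject_text):
    """Return the package names in the Poetry dev group, without versions."""
    data = toml_parser.loads(pyproject_text)
    dev_group = data['tool']['poetry']['group']['dev']['dependencies']
    return list(dev_group.keys())


if __name__ == '__main__':
    print(' '.join(dev_dependency_names(SAMPLE_PYPROJECT)))
```

Note that only the package names are collected, so any version constraints declared in pyproject.toml are not passed on to pip.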

Kozmo frontend

If you'll only be contributing to backend code, you can skip this section.

[!IMPORTANT] Even if you are only working on the UI, the backend server must still be running on port 6789.

The Kozmo frontend is a Next.js project that uses Yarn. Change to the frontend directory:

cd kozmo_ai/frontend/

Then install the dependencies and start the frontend dev server:

yarn install && yarn dev
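
Because the frontend depends on the backend listening on port 6789, it can help to check that port before starting frontend work. The following is a minimal sketch using a plain TCP connection attempt; the helper name `is_port_open` is made up, and a successful connection only shows that something is listening, not that it is the Kozmo server.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == '__main__':
    # 6789 is the backend port named in the note above.
    if is_port_open('localhost', 6789):
        print('Something is listening on port 6789.')
    else:
        print('Nothing is listening on port 6789; start the backend first.')
```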

Git Hooks

Install Git hooks by running the Make command:

make install-hooks

This will copy the git hooks from .git-dev/hooks into .git/hooks, and make them executable.
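
For readers curious what such a target amounts to, here is a rough Python equivalent of "copy the hooks and make them executable". It is an illustration only, not the Makefile's actual implementation, and the function name `install_hooks` is made up.

```python
import shutil
import stat
from pathlib import Path


def install_hooks(source_dir, target_dir):
    """Copy every file in source_dir to target_dir and mark it executable."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    installed = []
    for hook in sorted(source.iterdir()):
        if not hook.is_file():
            continue
        destination = target / hook.name
        shutil.copy2(hook, destination)
        # Add the execute bits for user, group, and others.
        mode = destination.stat().st_mode
        destination.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        installed.append(hook.name)
    return installed


if __name__ == '__main__':
    # Paths taken from the description above; run from the repo root.
    if Path('.git-dev/hooks').is_dir():
        print(install_hooks('.git-dev/hooks', '.git/hooks'))
```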

Pre-Commit

Install the pre-commit hooks:

pre-commit install

Note that this will install both pre-commit and pre-push hooks.

Run development server

To initialize a development Kozmo project as a starting point:

./scripts/init.sh default_repo

Then, to start the dev server for the backend at localhost:6789 and frontend at localhost:3000:

./scripts/dev.sh default_repo

In case you only want the backend:

./scripts/start.sh default_repo

The name default_repo could technically be anything, but if you decide to change it, be sure to add it to the .gitignore file. You're now ready to contribute!

See this video for further guidance and instructions.

Any time you'd like to build, just run ./scripts/dev.sh default_repo to run the development containers.

Any changes you make, backend or frontend, will be reflected in the development instance.

Our pre-commit & pre-push hooks will run when you make a commit/push to check style, etc.

Now it's time to create a new branch, contribute code, and open a pull request!

Troubleshoot

Here are some common problems our users have encountered. If other issues arise, please reach out to us in Slack!

Illegal instruction

If you receive an Illegal instruction error, or Docker containers exit immediately with code 132, your machine's CPU is likely an older architecture that does not support certain instructions used by the (Python) dependencies. Either try again on another machine, or set up the server manually, start it in verbose mode to see which package causes the error, and look for an alternative.
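
The number 132 follows the common shell convention of reporting 128 plus the signal number when a process is killed by a signal, and signal 4 is SIGILL (illegal instruction). A small sketch that decodes such exit codes is shown below; the helper name `describe_exit_code` is made up for this example.

```python
import signal


def describe_exit_code(code):
    """Map a shell-style exit code above 128 to the signal that caused it."""
    if code <= 128:
        return f'exit code {code} (not a signal)'
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return f'exit code {code} (unknown signal {code - 128})'


if __name__ == '__main__':
    print(describe_exit_code(132))
```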

pip install fails on Windows

Some Python packages assume core functionality that is not available on Windows, so you need to install these prerequisites first. See the fantastic (but archived) pipwin and this issue for more options.

Please report any other build errors in our Slack.

ModuleNotFoundError: No module named 'x'

If new libraries were added, you need to install the new dependencies manually. There are 2 ways to do this:

  1. Run docker-compose build from the project root to fully rebuild the image with the new dependencies. This can take a long time.
  2. Run pip install x from inside the container to install only the missing dependency. This should be much faster.
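
Before choosing either option, it may be useful to see which modules are actually missing inside the container. The sketch below checks top-level module names with `importlib.util.find_spec`; the module list in the usage example is hypothetical, and the helper name `missing_modules` is made up.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # Hypothetical list; replace with the modules your change imports.
    print(missing_modules(['json', 'some_new_library']))
```

Keep in mind that a module's import name can differ from the package name that pip expects.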
