Extract and load your data reliably from API clients with native fault-tolerance and checkpointing mechanisms.

Project description

bizon ⚡️

Extract and load your largest data streams with a framework you can trust with billions of records.

Features

  • Natively fault-tolerant: Bizon uses a checkpointing mechanism to track progress and recover from the last checkpoint.

  • High throughput: Bizon is designed to handle high throughput and can process billions of records.

  • Queue system agnostic: Bizon is agnostic of the queuing system: you can use Python Queue, RabbitMQ, Kafka, or Redpanda. Thanks to the bizon.engine.queue.Queue interface, adapters can be written for any queuing system.

  • Pipeline metrics: Bizon provides exhaustive pipeline metrics and integrates with Datadog and OpenTelemetry for tracing. You can monitor:

    • ETAs for completion
    • Number of records processed
    • Completion percentage
    • Source-to-destination latency
  • Lightweight & lean: Bizon is lightweight, with a minimal codebase and only a few dependencies:

    • requests for HTTP requests
    • pyyaml for configuration
    • sqlalchemy for database / warehouse connections
    • polars for memory-efficient data buffering and vectorized processing
    • pyarrow for Parquet file format
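
Because sources and destinations only talk to the queue through the bizon.engine.queue.Queue interface, a new adapter only has to implement a small set of operations. The sketch below is purely illustrative: the stand-in interface and its method names (`put`, `get`) are assumptions, not the actual bizon API, but it shows the shape such an adapter takes, here backed by the standard-library queue.Queue:

```python
import queue
from abc import ABC, abstractmethod


class QueueInterface(ABC):
    """Illustrative stand-in for bizon.engine.queue.Queue.
    The real interface's method names may differ."""

    @abstractmethod
    def put(self, message: dict) -> None:
        """Publish one record to the queue."""

    @abstractmethod
    def get(self) -> dict:
        """Consume one record from the queue."""


class InMemoryQueue(QueueInterface):
    """Adapter backed by the standard-library queue.Queue."""

    def __init__(self) -> None:
        self._q: "queue.Queue[dict]" = queue.Queue()

    def put(self, message: dict) -> None:
        self._q.put(message)

    def get(self) -> dict:
        return self._q.get()


q = InMemoryQueue()
q.put({"id": 1, "name": "dragon"})
print(q.get())  # {'id': 1, 'name': 'dragon'}
```

An adapter for RabbitMQ, Kafka, or Redpanda would implement the same operations on top of that system's client library.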

Installation

pip install bizon

Usage

List available sources and streams

bizon source list
bizon stream list <source_name>

Create a pipeline

Create a file named config.yml in your working directory with the following content:

name: demo-creatures-pipeline

source:
  name: dummy
  stream: creatures
  authentication:
    type: api_key
    params:
      token: dummy_key

destination:
  name: logger
  config:
    dummy: dummy

Run the pipeline with the following command:

bizon run config.yml

Backend configuration

Backend is the interface used by Bizon to store its state. It can be configured in the backend section of the configuration file. The following backends are supported:

  • sqlite: In-memory SQLite database, useful for testing and development.
  • bigquery: Google BigQuery backend, well suited to light setups and production.
  • postgres: PostgreSQL backend, for production use and frequent cursor updates.
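
For example, a development setup could declare the sqlite backend in the configuration file. The exact keys below are an assumption modeled on the queue configuration shown later; treat this as a sketch and check your bizon version's reference for the authoritative schema:

```yaml
engine:
  backend:
    type: sqlite   # swap for bigquery or postgres in production
```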

Queue configuration

Queue is the interface used by Bizon to exchange data between Source and Destination. It can be configured in the queue section of the configuration file. The following queues are supported:

  • python_queue: Python Queue, useful for testing and development.
  • rabbitmq: RabbitMQ, for production use and high throughput.
  • kafka: Apache Kafka, for production use and high throughput and strong persistence.

Runner configuration

Runner is the interface used by Bizon to run the pipeline. It can be configured in the runner section of the configuration file. The following runners are supported:

  • thread (asynchronous)
  • process (asynchronous)
  • stream (synchronous)
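
The runner is selected the same way in the configuration file. As with the backend sketch, the key names here are an assumption rather than the confirmed schema:

```yaml
engine:
  runner:
    type: thread   # or process (asynchronous), or stream (synchronous)
```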

Start syncing your data 🚀

Quick setup without any dependencies ✌️

Set the queue configuration to python_queue and the backend configuration to sqlite to test the pipeline without any external dependencies.
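
Putting this together with the demo pipeline from above, a dependency-free config.yml could look like the following. The source and destination sections come from the earlier example; the engine section is a sketch whose keys are assumed to mirror the Kafka queue configuration shown in this page:

```yaml
name: demo-creatures-pipeline

source:
  name: dummy
  stream: creatures
  authentication:
    type: api_key
    params:
      token: dummy_key

destination:
  name: logger
  config:
    dummy: dummy

engine:
  queue:
    type: python_queue   # in-process queue, no broker required
  backend:
    type: sqlite         # in-memory state, no external database
```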

Local Kafka setup

To test the pipeline with Kafka, you can use Docker Compose to set up Kafka or Redpanda locally.

Kafka / Redpanda

docker compose --file ./scripts/kafka-compose.yml up # Kafka
docker compose --file ./scripts/redpanda-compose.yml up # Redpanda

In your YAML configuration, set the queue type to kafka under engine:

engine:
  queue:
    type: kafka
    config:
      queue:
        bootstrap_server: localhost:9092 # Kafka:9092 & Redpanda: 19092

RabbitMQ

docker compose --file ./scripts/rabbitmq-compose.yml up

In your YAML configuration, set the queue type to rabbitmq under engine:

engine:
  queue:
    type: rabbitmq
    config:
      queue:
        host: localhost
        queue_name: bizon

Download files

Download the file for your platform.

Source Distribution

bizon-0.1.2.tar.gz (81.2 kB)


Built Distribution


bizon-0.1.2-py3-none-any.whl (133.7 kB)


File details

Details for the file bizon-0.1.2.tar.gz.

File metadata

  • Download URL: bizon-0.1.2.tar.gz
  • Upload date:
  • Size: 81.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.11.13 Linux/6.11.0-1018-azure

File hashes

Hashes for bizon-0.1.2.tar.gz

  • SHA256: 09a89686f954c56f77f1bc99538d4f9517b2f600df3a5383022cabec269f7161
  • MD5: 2e7e8b24f21132abecdb4d93b7c02c30
  • BLAKE2b-256: d07f51312dccbbdfa46dc106a735017c7526252f13b1e321a6dc6e51386f9b67


File details

Details for the file bizon-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: bizon-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 133.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.11.13 Linux/6.11.0-1018-azure

File hashes

Hashes for bizon-0.1.2-py3-none-any.whl

  • SHA256: 623d962ade5b568dbda86bab52546306d3efb161c69bd23a98478ec02c0f695e
  • MD5: 888b15da66ef18b64f818c2dfc50e831
  • BLAKE2b-256: 77266c40c5da2a3227295fd0ffee492ab4e40d6c0cc3d1b50fd3d7599c167321

