retake

Open Source Infrastructure for Vector Data Streams

Data pipelines that synchronize vectors with their sources of truth

Installation

Welcome! If you are not a contributor and just want to use Retake, please proceed to the main branch.

To install the Retake Python SDK:

pip install retake

Follow the documentation for usage instructions.
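To confirm that the install succeeded, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch that relies only on the standard library; `installed_version` is a helper name introduced here for illustration and is not part of the Retake SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "retake") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("retake is not installed in this environment")
    else:
        print(f"retake {version} is installed")
```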

Key Features

🔄 Out-of-the-Box Data Sync

Existing vector stores are silos that require complex and sometimes brittle mechanisms for data synchronization. Retake provides the missing connectors, which keep data synchronized without extensive configuration or third-party tools.
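To make the synchronization problem concrete, here is a minimal sketch of what keeping vectors in step with a source of truth involves. It does not use the Retake API: `embed`, `apply_change`, and the change-event shape (`op`, `id`, `text`) are hypothetical names chosen for illustration, the embedding is a toy function, and the "vector store" is a plain dictionary.

```python
from typing import Dict, List


def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model (hypothetical, not a real model)."""
    # Two crude features: character count and vowel count.
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def apply_change(store: Dict[str, List[float]], change: dict) -> None:
    """Apply one source-side change event to the in-memory vector store."""
    op = change["op"]
    if op in ("insert", "update"):
        # Re-embed on every insert or update so the vector never goes stale.
        store[change["id"]] = embed(change["text"])
    elif op == "delete":
        # Deleting a source row must also remove its vector.
        store.pop(change["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


store: Dict[str, List[float]] = {}
changes = [
    {"op": "insert", "id": "1", "text": "hello"},
    {"op": "insert", "id": "2", "text": "vector"},
    {"op": "update", "id": "1", "text": "hello world"},
    {"op": "delete", "id": "2"},
]
for change in changes:
    apply_change(store, change)
```

After the four events, the store holds a single vector for row `"1"`, computed from its latest text. Missing any one of the three event types is what typically lets a vector store drift from its source.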

🚀 True Real-Time Updates

Retake's connectors achieve sub-10ms end-to-end data latency, excluding variable model inference times.
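One way to read "excluding variable model inference times" is as simple arithmetic on timestamps: take the total time from source write to sink write and subtract the time spent inside the embedding model. The sketch below assumes that interpretation; the function and parameter names are hypothetical and are not part of Retake.

```python
def pipeline_latency_ms(
    source_ts_ms: float, sink_ts_ms: float, inference_ms: float = 0.0
) -> float:
    """Return end-to-end latency in milliseconds, excluding model inference."""
    total = sink_ts_ms - source_ts_ms
    if total < 0:
        raise ValueError("sink timestamp precedes source timestamp")
    if inference_ms < 0 or inference_ms > total:
        raise ValueError("inference time must lie between 0 and the total latency")
    return total - inference_ms


# A record written at t=1000 ms reaches the sink at t=1052 ms,
# and the embedding model accounted for 45 ms of that time.
latency = pipeline_latency_ms(1000.0, 1052.0, inference_ms=45.0)
```

In this made-up example the connector overhead is 7 ms, which would fall under the 10 ms figure quoted above.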

🔗 Extensible Python SDK

You can configure any source, sink, transformation, and embedding model as code. Joining tables, filtering rows, and adding metadata are all done with ordinary Python functions.
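As an illustration of the kind of logic such a function might hold, the sketch below joins two tables, filters short rows, and attaches metadata using only plain Python. It deliberately avoids the Retake SDK: the function name, the table shapes (`documents`, `authors`), and the output keys are assumptions made for this example, so consult the documentation for the actual transformation interface.

```python
from typing import Dict, Iterable, Iterator


def join_filter_annotate(
    documents: Iterable[Dict],
    authors: Iterable[Dict],
    min_length: int = 10,
) -> Iterator[Dict]:
    """Join documents to authors, drop short bodies, and add author metadata."""
    authors_by_id = {author["id"]: author for author in authors}
    for doc in documents:
        # Filter: skip bodies too short to be worth embedding.
        if len(doc["body"]) < min_length:
            continue
        # Inner join: skip documents whose author is unknown.
        author = authors_by_id.get(doc["author_id"])
        if author is None:
            continue
        yield {
            "id": doc["id"],
            "text": doc["body"],
            "metadata": {"author": author["name"]},
        }


documents = [
    {"id": 1, "author_id": 10, "body": "Vectors should track their source rows."},
    {"id": 2, "author_id": 10, "body": "Too short"},
    {"id": 3, "author_id": 99, "body": "This row has no matching author."},
]
authors = [{"id": 10, "name": "Ada"}]

rows = list(join_filter_annotate(documents, authors))
```

Only document 1 survives: document 2 fails the length filter and document 3 has no author to join against.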

⚡ Scalable and Efficient

Built on top of Kafka, Retake is designed to handle large volumes of data and high-throughput workloads.

🌐 Deployable Anywhere

You can run Retake anywhere, from your laptop to a distributed cloud system.

Development

If you are a developer who wants to contribute to Retake, follow these instructions to run Retake locally.

Python SDK

The Python SDK enables users to define and configure vector data pipelines and is responsible for all batch ETL jobs. To develop and run the Python SDK locally:

Install Poetry

curl -sSL https://install.python-poetry.org | python3 -

Install dependencies

poetry install

Build the SDK locally

poetry build

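This command packages the retake SDK as a source distribution and a wheel in the dist/ directory. Because poetry install already installed the project into Poetry's virtual environment, you can now import retake from that environment (for example, via poetry run python).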
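To check what the build step produced, you can list the distribution files that Poetry writes to dist/. This is a small sketch using only the standard library; `list_build_artifacts` is a helper name introduced here, and it assumes you run it from the directory that contains dist/.

```python
from pathlib import Path
from typing import Dict, List


def list_build_artifacts(dist_dir: str = "dist") -> Dict[str, List[str]]:
    """Return the source distributions and wheels found in `dist_dir`."""
    root = Path(dist_dir)
    if not root.is_dir():
        # Nothing has been built yet (or we are in the wrong directory).
        return {"sdist": [], "wheel": []}
    return {
        "sdist": sorted(p.name for p in root.glob("*.tar.gz")),
        "wheel": sorted(p.name for p in root.glob("*.whl")),
    }


if __name__ == "__main__":
    for kind, names in list_build_artifacts().items():
        print(kind, names or "(none found)")
```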

Real-Time Server

Built on top of Kafka, the real-time server sits between source(s) and sink(s). It is responsible for all real-time data streams.

Ensure that Docker and Docker Compose are installed.
Ensure that Poetry and dependencies are installed (see Python SDK instructions above).
Start the development server, which is composed of the Kafka broker, Kafka Connect and the schema registry. Docker Compose will expose a port for each of the services (see docker-compose.yml for details).

docker compose up

To connect to the development server, refer to the documentation.
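Before connecting, it can help to confirm that the three services are accepting TCP connections. The sketch below does this with the standard library only. The port numbers are assumptions based on common defaults for a Kafka broker (9092), Kafka Connect (8083), and a schema registry (8081); the real values are whatever docker-compose.yml exposes, so adjust them accordingly.

```python
import socket
from typing import Dict, Optional

# Assumed ports (common defaults); check docker-compose.yml for the real ones.
ASSUMED_PORTS = {"kafka": 9092, "kafka-connect": 8083, "schema-registry": 8081}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_services(
    host: str = "localhost", ports: Optional[Dict[str, int]] = None
) -> Dict[str, bool]:
    """Map each service name to whether its port is reachable."""
    ports = ASSUMED_PORTS if ports is None else ports
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'reachable' if is_up else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove the service has finished starting up.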

Contributing

For more information on how to contribute, please see our Contributing Guide.

Licensing

Retake is licensed under the Elastic License 2.0.

Release history

Version	Release date
0.1.14 (this version)	Jul 26, 2023
0.1.13	Jul 26, 2023
0.1.12	Jul 25, 2023
0.1.11	Jul 24, 2023
0.1.10	Jul 24, 2023
0.1.9	Jul 18, 2023
0.1.8	Jul 16, 2023
0.1.6.dev0 (pre-release)	Jul 6, 2023
0.1.5.dev0 (pre-release)	Jul 6, 2023
0.1.4.dev0 (pre-release)	Jul 5, 2023
0.1.3.dev0 (pre-release)	Jul 5, 2023
0.1.2.dev0 (pre-release)	Jul 5, 2023
0.1.1.dev0 (pre-release)	Jul 5, 2023
0.1.0.dev0 (pre-release)	Jul 5, 2023

Download files

Source Distribution

retake-0.1.14.tar.gz (18.8 kB), uploaded Jul 26, 2023

Built Distribution

retake-0.1.14-py3-none-any.whl (27.4 kB), uploaded Jul 26, 2023, Python 3

Hashes for retake-0.1.14.tar.gz

Algorithm	Hash digest
SHA256	`7a58762b9a7ba1e0cc0fcda577847b215282d80773a79a53dcd71cc8dd1cffb8`
MD5	`b3df49bcf221760d1c7f869d79742fa6`
BLAKE2b-256	`54ab069dd42ad9cb1c4f47eddb7b4d06cf6e9823074e6d37fb71d29894bdb883`

Hashes for retake-0.1.14-py3-none-any.whl

Algorithm	Hash digest
SHA256	`bf5237bbfa6eae1d7873e4c53b1577227f6ecb3384a8a9a8584388f8eb0cfdbc`
MD5	`e346fe7ba597d203884de420ce3732a2`
BLAKE2b-256	`229989a4aa4b6d140e34290b17d488fbbaf6d7624d78dbe39bac42188551efee`
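
The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's hashlib; `file_digest` and `matches_expected` are helper names introduced here, and the file path in the usage example assumes the source distribution was saved to the current directory.

```python
import hashlib
import os

# SHA256 digest for retake-0.1.14.tar.gz, copied from the table above.
EXPECTED_SDIST_SHA256 = (
    "7a58762b9a7ba1e0cc0fcda577847b215282d80773a79a53dcd71cc8dd1cffb8"
)


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at `path`, reading it in chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_expected(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's digest equals `expected_hex` (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    sdist = "retake-0.1.14.tar.gz"  # assumed download location
    if os.path.exists(sdist):
        print("digest OK:", matches_expected(sdist, EXPECTED_SDIST_SHA256))
    else:
        print(f"{sdist} not found; download it first")
```

SHA256 is the digest to prefer here; MD5 is listed for completeness but is not suitable for security checks.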