
Open Source Infrastructure for Vector Data Streams


Retake

Data pipelines that synchronize vectors with their sources of truth



Installation

Welcome! If you are not a contributor and just want to use Retake, please proceed to the main branch.

To install the Retake Python SDK:

pip install retake

Follow the documentation for usage instructions.
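To confirm that the install succeeded, a minimal sketch using only the Python standard library is shown below. It assumes the distribution name is `retake`, as in the pip command above; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("retake")
    print(f"retake version: {found}" if found else "retake is not installed")
```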

Key Features

🔄 Out-of-the-Box Data Sync

Existing vector stores are silos that require complex and sometimes brittle mechanisms for data synchronization. Retake provides the missing connectors, so data stays in sync without extensive configuration or third-party tools.

🚀 True Real-Time Updates

Retake's connectors achieve sub-10ms end-to-end data latency, excluding variable model inference times.

🔗 Extensible Python SDK

You can configure any source, sink, transformation, and embedding model as code. You can join and filter tables or add metadata with ordinary Python functions.
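As an illustration of the kind of logic such a function might hold, here is a plain-Python sketch that filters rows and attaches metadata. It does not use the Retake SDK: the row layout (`id`, `title`, `body`, `published`) and the function name `to_document` are hypothetical, and how a function like this is registered with a pipeline is covered in the documentation.

```python
from typing import Any, Dict, Optional


def to_document(row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Turn one source row into text to embed plus metadata, or None to skip it."""
    # Filter: skip rows that are unpublished or have no body text.
    if not row.get("published") or not row.get("body"):
        return None
    text = f"{row.get('title', '')}\n{row['body']}".strip()
    return {
        "text": text,
        "metadata": {"id": row.get("id"), "length": len(text)},
    }


# Hypothetical sample rows, standing in for records read from a source table.
rows = [
    {"id": 1, "title": "Hello", "body": "First post", "published": True},
    {"id": 2, "title": "Draft", "body": "Not ready", "published": False},
]
documents = [doc for doc in map(to_document, rows) if doc is not None]
```

Keeping the filter and the metadata step in one small pure function makes the logic easy to unit test before it is wired into a pipeline.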

⚡ Scalable and Efficient

Built on top of Kafka, Retake is designed to handle large volumes of data and high-throughput workloads.

🌐 Deployable Anywhere

You can run Retake anywhere, from your laptop to a distributed cloud system.

Development

If you are a developer who wants to contribute to Retake, follow these instructions to run Retake locally.

Python SDK

The Python SDK enables users to define and configure vector data pipelines and is responsible for all batch ETL jobs. To develop and run the Python SDK locally:

  1. Install Poetry

     curl -sSL https://install.python-poetry.org | python -

  2. Install dependencies

     poetry install

  3. Build the SDK locally

     poetry build

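The poetry build command packages the retake SDK as a source distribution and a wheel in the dist/ directory. Because poetry install has already installed the project into Poetry's virtual environment, you can now import retake from that environment (for example, with poetry run python).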
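Poetry writes build artifacts to a dist/ directory by default. The following sketch lists the wheel and source archives found there as a quick check that the build produced output. The helper name `built_artifacts` is made up for this example, and the script assumes it is run from the project root.

```python
from pathlib import Path
from typing import List


def built_artifacts(dist_dir: Path) -> List[str]:
    """Return sorted names of wheel and sdist files found in dist_dir."""
    if not dist_dir.is_dir():
        return []
    return sorted(
        p.name for p in dist_dir.iterdir()
        if p.name.endswith((".whl", ".tar.gz"))
    )


if __name__ == "__main__":
    # Assumes the script is run from the project root, next to pyproject.toml.
    print(built_artifacts(Path("dist")))
```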

Real-Time Server

Built on top of Kafka, the real-time server sits between source(s) and sink(s). It is responsible for all real-time data streams.

  1. Ensure that Docker and Docker Compose are installed.

  2. Ensure that Poetry and dependencies are installed (see Python SDK instructions above).

  3. Start the development server, which is composed of the Kafka broker, Kafka Connect and the schema registry. Docker Compose will expose a port for each of the services (see docker-compose.yml for details).

     docker compose up

  4. To connect to the development server, refer to the documentation.
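Before connecting, it can help to check that the services are accepting TCP connections. The sketch below uses only the standard library. The port numbers are assumptions based on common defaults for Kafka (9092), the schema registry (8081), and Kafka Connect (8083); the actual mappings are in docker-compose.yml. The helper name `port_open` is made up for this example.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default ports; check docker-compose.yml for the real mappings.
ASSUMED_PORTS = {"kafka broker": 9092, "schema registry": 8081, "kafka connect": 8083}

if __name__ == "__main__":
    for name, port in ASSUMED_PORTS.items():
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```

An open port only shows that something is listening; it does not prove that the service behind it has finished starting up.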

Contributing

For more information on how to contribute, please see our Contributing Guide.

Licensing

Retake is licensed under the Elastic License 2.0.
