Open Source Infrastructure for Vector Data Streams

Retake

Data pipelines that synchronize vectors with their sources of truth

Documentation | Website

Installation

Welcome! If you are not a contributor and just want to use Retake, please proceed to the main branch.

To install the Retake Python SDK:

pip install retake

Follow the documentation for usage instructions.
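
To confirm that the install succeeded, one option is to query the installed package metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of Retake.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "retake") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"retake {found} is installed")
    else:
        print("retake is not installed in this environment")
```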

Key Features

:arrows_counterclockwise: Out-of-the-Box Data Sync

Existing vector stores are silos that require complex and sometimes brittle mechanisms for data synchronization. Retake provides the missing connectors that allow seamless data synchronization without the need for extensive configuration or third-party tools.

:rocket: True Real-Time Updates

Retake's connectors achieve sub-10ms end-to-end data latency, excluding variable model inference times.

:link: Extensible Python SDK

You can configure any source, sink, transformation, and embedding model as code. Joins, filters, and added metadata are written as ordinary Python functions.
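
Retake's own transformation API is covered in the documentation and is not reproduced here. As a rough illustration of the kind of logic involved, the sketch below joins, filters, and annotates rows in plain Python. Every name in it (`join_filter_annotate` and the `id`, `author_id`, `published`, `body`, and `name` fields) is hypothetical.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def join_filter_annotate(
    documents: Iterable[Row], authors: Iterable[Row], source: str
) -> Iterator[Row]:
    """Join documents to authors, keep published documents, and attach metadata."""
    # Index the right-hand table by key so the join is one lookup per document.
    authors_by_id = {author["id"]: author for author in authors}
    for doc in documents:
        if not doc.get("published"):
            continue  # filter: skip unpublished drafts
        author = authors_by_id.get(doc["author_id"])
        if author is None:
            continue  # inner join: drop documents with no matching author
        yield {
            "id": doc["id"],
            "text": doc["body"],  # the field that would be embedded
            "metadata": {"author": author["name"], "source": source},
        }


if __name__ == "__main__":
    docs = [
        {"id": 1, "author_id": 10, "published": True, "body": "hello"},
        {"id": 2, "author_id": 10, "published": False, "body": "draft"},
    ]
    people = [{"id": 10, "name": "Ada"}]
    for row in join_filter_annotate(docs, people, source="example"):
        print(row)
```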

:zap: Scalable and Efficient

Built on top of Kafka, Retake is designed to handle large volumes of data and high-throughput workloads.

:globe_with_meridians: Deployable Anywhere

You can run Retake anywhere, from your laptop to a distributed cloud system.

Development

If you are a developer who wants to contribute to Retake, follow these instructions to run Retake locally.

Python SDK

The Python SDK enables users to define and configure vector data pipelines and is responsible for all batch ETL jobs. To develop and run the Python SDK locally:

  1. Install Poetry

curl -sSL https://install.python-poetry.org | python -

  2. Install dependencies

poetry install

  3. Build the SDK locally

poetry build

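`poetry install` installs the retake package and its dependencies into Poetry's virtual environment, and `poetry build` packages the SDK as a source distribution and a wheel in the dist/ directory. You can now import retake from that environment, for example with `poetry run python`.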
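
Poetry writes build output to a `dist/` directory by default. As a quick sanity check after building, the sketch below lists the wheels and source distributions found there. The helper name `list_build_artifacts` is illustrative and not part of Retake or Poetry.

```python
from pathlib import Path
from typing import Dict, List


def list_build_artifacts(dist_dir: str = "dist") -> Dict[str, List[str]]:
    """Group the files in `dist_dir` into wheels and source distributions."""
    root = Path(dist_dir)
    if not root.is_dir():
        # Nothing has been built yet, or the path is wrong.
        return {"wheels": [], "sdists": []}
    return {
        "wheels": sorted(p.name for p in root.glob("*.whl")),
        "sdists": sorted(p.name for p in root.glob("*.tar.gz")),
    }


if __name__ == "__main__":
    artifacts = list_build_artifacts()
    for kind, names in artifacts.items():
        print(f"{kind}: {', '.join(names) if names else '(none found)'}")
```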

Real-Time Server

Built on top of Kafka, the real-time server sits between source(s) and sink(s). It is responsible for all real-time data streams.

  1. Ensure that Docker and Docker Compose are installed.

  2. Ensure that Poetry and dependencies are installed (see Python SDK instructions above).

  3. Start the development server, which is composed of the Kafka broker, Kafka Connect and the schema registry. Docker Compose will expose a port for each of the services (see docker-compose.yml for details).

docker compose up

  4. To connect to the development server, refer to the documentation.
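
Once the containers are up, a simple way to confirm that each service is listening is to attempt a TCP connection to its port. The sketch below does this with the standard library. The port numbers are assumptions based on common defaults for Kafka, the schema registry, and Kafka Connect; they are not taken from this repository, so check docker-compose.yml for the ports actually exposed.

```python
import socket
from typing import Dict

# Assumed ports: common defaults for each service, NOT read from this
# repository's docker-compose.yml. Adjust them to match your setup.
ASSUMED_PORTS: Dict[str, int] = {
    "kafka-broker": 9092,
    "schema-registry": 8081,
    "kafka-connect": 8083,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in ASSUMED_PORTS.items():
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

A successful connection only shows that something is listening on the port; it does not confirm that the service has finished starting up.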

Contributing

For more information on how to contribute, please see our Contributing Guide.

Licensing

Retake is licensed under the Elastic License 2.0.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

retake-0.1.14.tar.gz (18.8 kB)

Uploaded Source

Built Distribution

retake-0.1.14-py3-none-any.whl (27.4 kB)

Uploaded Python 3

File details

Details for the file retake-0.1.14.tar.gz.

File metadata

  • Download URL: retake-0.1.14.tar.gz
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Linux/5.15.0-1041-azure

File hashes

Hashes for retake-0.1.14.tar.gz
Algorithm Hash digest
SHA256 7a58762b9a7ba1e0cc0fcda577847b215282d80773a79a53dcd71cc8dd1cffb8
MD5 b3df49bcf221760d1c7f869d79742fa6
BLAKE2b-256 54ab069dd42ad9cb1c4f47eddb7b4d06cf6e9823074e6d37fb71d29894bdb883

File details

Details for the file retake-0.1.14-py3-none-any.whl.

File metadata

  • Download URL: retake-0.1.14-py3-none-any.whl
  • Size: 27.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Linux/5.15.0-1041-azure

File hashes

Hashes for retake-0.1.14-py3-none-any.whl
Algorithm Hash digest
SHA256 bf5237bbfa6eae1d7873e4c53b1577227f6ecb3384a8a9a8584388f8eb0cfdbc
MD5 e346fe7ba597d203884de420ce3732a2
BLAKE2b-256 229989a4aa4b6d140e34290b17d488fbbaf6d7624d78dbe39bac42188551efee
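
A downloaded file can be checked against the SHA256 digests listed above before it is installed. The following is a minimal sketch using Python's `hashlib`; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the script assumes the files sit in the current working directory.

```python
import hashlib
import os

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "retake-0.1.14.tar.gz": (
        "7a58762b9a7ba1e0cc0fcda577847b215282d80773a79a53dcd71cc8dd1cffb8"
    ),
    "retake-0.1.14-py3-none-any.whl": (
        "bf5237bbfa6eae1d7873e4c53b1577227f6ecb3384a8a9a8584388f8eb0cfdbc"
    ),
}


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with `expected_hex`, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    for filename, expected in EXPECTED_SHA256.items():
        if not os.path.exists(filename):
            print(f"{filename}: not found in the current directory")
        elif matches_expected(filename, expected):
            print(f"{filename}: digest matches")
        else:
            print(f"{filename}: DIGEST MISMATCH")
```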
