Open Source Infrastructure for Vector Data Streams
Data pipelines that synchronize vectors with their sources of truth
Documentation • Website
Installation
Welcome! If you are not a contributor and just want to use Retake, please proceed to the main branch.
To install the Retake Python SDK:
pip install retake
Follow the documentation for usage instructions.
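After installing, a quick check from Python can confirm that the package is visible to the active environment. The sketch below uses only the standard library; the helper name `installed_version` is illustrative and is not part of the Retake SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("retake")
    if found is None:
        print("retake is not installed in this environment")
    else:
        print(f"retake {found} is installed")
```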
Key Features
🔄 Out-of-the-Box Data Sync
Existing vector stores are silos that require complex and sometimes brittle mechanisms for data synchronization. Retake provides the missing connectors that allow seamless data synchronization without the need for extensive configuration or third-party tools.
🚀 True Real-Time Updates
Retake's connectors achieve sub-10ms end-to-end data latency, excluding variable model inference times.
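To make the "excluding model inference" qualifier concrete, here is a minimal sketch of the arithmetic. The field names and the numbers are made up for illustration; they are not Retake measurements or part of its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTiming:
    """Timings for one record, in milliseconds (hypothetical fields)."""
    source_commit_ms: float  # when the change was committed at the source
    sink_write_ms: float     # when the vector was written to the sink
    inference_ms: float      # time spent inside the embedding model


def pipeline_latency_ms(t: EventTiming) -> float:
    """End-to-end latency with model inference time subtracted."""
    return (t.sink_write_ms - t.source_commit_ms) - t.inference_ms


# Illustrative values only: 50 ms total, of which 42 ms is model inference.
timing = EventTiming(source_commit_ms=1000.0, sink_write_ms=1050.0, inference_ms=42.0)
print(pipeline_latency_ms(timing))  # 8.0
```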
🔗 Extensible Python SDK
You can configure any source, sink, transformation, and embedding model as code. Joining and filtering tables or adding metadata can be done with Python functions.
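As a rough illustration of what "transformations as Python functions" can look like, the sketch below joins two tables, filters rows, and attaches metadata using plain Python. It deliberately does not use the Retake SDK: the function names, row shape, and sample data are all hypothetical, so consult the documentation for the actual API.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def join_on(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Inner-join two row collections on a shared key."""
    index: Dict[object, List[Row]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            # Values from the left row win if both sides share a column.
            yield {**right_row, **left_row}


def transform(rows: Iterable[Row]) -> Iterator[Row]:
    """Keep only published rows and attach a metadata field."""
    for row in rows:
        if row.get("published"):
            yield {**row, "metadata": {"source": "articles"}}


# Hypothetical sample tables.
articles = [
    {"author_id": 1, "title": "A", "published": True},
    {"author_id": 2, "title": "B", "published": False},
]
authors = [
    {"author_id": 1, "name": "Ada"},
    {"author_id": 2, "name": "Bob"},
]

result = list(transform(join_on(articles, authors, "author_id")))
print(result)
```

Only the published article survives the filter, and it carries both the joined author name and the added metadata.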
⚡ Scalable and Efficient
Built on top of Kafka, Retake is designed to handle large volumes of data and high-throughput workloads.
🌐 Deployable Anywhere
You can run Retake anywhere, from your laptop to a distributed cloud system.
Development
If you are a developer who wants to contribute to Retake, follow these instructions to run Retake locally.
Python SDK
The Python SDK enables users to define and configure vector data pipelines and is responsible for all batch ETL jobs. To develop and run the Python SDK locally:
- Install Poetry
curl -sSL https://install.python-poetry.org | python -
- Install dependencies
poetry install
- Build the SDK locally
poetry build
These commands install and build the retake SDK locally. You can now import retake from a Python environment.
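By default, `poetry build` writes a source distribution and a wheel to a `dist/` directory in the project root. A small sketch for confirming that both artifacts were produced; the helper name `list_artifacts` is illustrative, not part of Poetry or Retake.

```python
from pathlib import Path
from typing import Dict, List


def list_artifacts(dist_dir: Path) -> Dict[str, List[str]]:
    """Group the files in a build output directory by distribution type."""
    return {
        "sdist": sorted(p.name for p in dist_dir.glob("*.tar.gz")),
        "wheel": sorted(p.name for p in dist_dir.glob("*.whl")),
    }


if __name__ == "__main__":
    # Assumes the script is run from the project root after `poetry build`.
    print(list_artifacts(Path("dist")))
```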
Real-Time Server
Built on top of Kafka, the real-time server sits between source(s) and sink(s). It is responsible for all real-time data streams.
- Ensure that Docker and Docker Compose are installed.
- Ensure that Poetry and dependencies are installed (see Python SDK instructions above).
- Start the development server, which is composed of the Kafka broker, Kafka Connect, and the schema registry. Docker Compose will expose a port for each of the services (see docker-compose.yml for details).
docker compose up
- To connect to the development server, refer to the documentation.
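Once `docker compose up` is running, a simple TCP probe can show whether each service is reachable. The port numbers below are the conventional defaults for Kafka, Kafka Connect, and the schema registry and are only an assumption here; the actual mappings are the ones defined in docker-compose.yml.

```python
import socket
from typing import Dict

# Assumed host ports; check docker-compose.yml for the real mappings.
ASSUMED_PORTS: Dict[str, int] = {
    "kafka-broker": 9092,
    "kafka-connect": 8083,
    "schema-registry": 8081,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in ASSUMED_PORTS.items():
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```

An open port only shows that something is listening; it does not confirm that the service has finished starting.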
Contributing
For more information on how to contribute, please see our Contributing Guide.
Licensing
Retake is licensed under the Elastic License 2.0.
File details
Details for the file retake-0.1.14.tar.gz.
File metadata
- Download URL: retake-0.1.14.tar.gz
- Upload date:
- Size: 18.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.4 Linux/5.15.0-1041-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7a58762b9a7ba1e0cc0fcda577847b215282d80773a79a53dcd71cc8dd1cffb8
MD5 | b3df49bcf221760d1c7f869d79742fa6
BLAKE2b-256 | 54ab069dd42ad9cb1c4f47eddb7b4d06cf6e9823074e6d37fb71d29894bdb883
File details
Details for the file retake-0.1.14-py3-none-any.whl.
File metadata
- Download URL: retake-0.1.14-py3-none-any.whl
- Upload date:
- Size: 27.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.4 Linux/5.15.0-1041-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | bf5237bbfa6eae1d7873e4c53b1577227f6ecb3384a8a9a8584388f8eb0cfdbc
MD5 | e346fe7ba597d203884de420ce3732a2
BLAKE2b-256 | 229989a4aa4b6d140e34290b17d488fbbaf6d7624d78dbe39bac42188551efee
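The SHA256 digests listed above can be checked against a downloaded file with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches` are illustrative, and the file path is assumed to be wherever you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("retake-0.1.14-py3-none-any.whl"), "bf5237bbfa6eae1d7873e4c53b1577227f6ecb3384a8a9a8584388f8eb0cfdbc")` should return True for an intact copy of the wheel.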