llama-index Ray ingestion pipeline
LlamaIndex Ingestion: Ray
A scalable LlamaIndex ingestion pipeline powered by Ray.
This integration uses Ray’s distributed compute framework to parallelize document transformations (parsing, chunking, and embedding), enabling high-throughput processing for large-scale datasets.
Installation
pip install llama-index-ingestion-ray
Usage
Distribute the workload across your Ray cluster by wrapping transformations in RayTransformComponent objects and passing them to RayIngestionPipeline.
import ray
from llama_index.core import Document
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.ingestion.ray import (
    RayIngestionPipeline,
    RayTransformComponent,
)
# Start a new cluster (or connect to an existing one, see https://docs.ray.io/en/latest/ray-core/configure.html)
ray.init()
# Create transformations
transformations = [
    RayTransformComponent(SentenceSplitter, chunk_size=25, chunk_overlap=0),
    RayTransformComponent(
        transform_class=TitleExtractor,
        map_batches_kwargs={
            "batch_size": 10,  # Define the batch size
            # "num_cpus": 4  # The number of CPUs to reserve for each parallel map worker.
            # "num_gpus": 1  # The number of GPUs to reserve for each parallel map worker.
            # See https://docs.ray.io/en/latest/data/api/doc/ray.data.Dataset.map_batches.html for all the available parameters
        },
    ),
    RayTransformComponent(
        transform_class=OpenAIEmbedding,
        map_batches_kwargs={
            "batch_size": 10,
        },
    ),
]
# Create the Ray ingestion pipeline
pipeline = RayIngestionPipeline(transformations=transformations)
# Run the pipeline with many documents
nodes = pipeline.run(documents=[Document.example()] * 10)
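To build intuition for what "batch_size" in map_batches_kwargs controls, the following is a minimal local sketch that uses only the Python standard library. It is an analogy, not how Ray or this package is implemented: the helper names iter_batches, map_batches_local, and uppercase_batch are hypothetical, and threads stand in for Ray workers. The idea it illustrates is that the input is cut into fixed-size batches, each batch is handed to a worker as a unit, and the results are stitched back together in order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def map_batches_local(
    items: Sequence[str],
    transform: Callable[[List[str]], List[str]],
    batch_size: int,
    max_workers: int = 4,
) -> List[str]:
    """Apply transform to each batch in parallel threads, preserving input order."""
    batches = list(iter_batches(items, batch_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = pool.map(transform, batches)
        return [out for batch_out in results for out in batch_out]


def uppercase_batch(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a real transformation such as an embedding call."""
    return [text.upper() for text in batch]


# 25 items with batch_size=10 gives two full batches and one partial batch.
texts = [f"chunk {i}" for i in range(25)]
batch_sizes = [len(b) for b in iter_batches(texts, batch_size=10)]
processed = map_batches_local(texts, uppercase_batch, batch_size=10)
```

A larger batch size means fewer, heavier calls per worker, which tends to suit transformations with high per-call overhead (for example, remote embedding requests); a smaller one spreads work more evenly. The right value for a real Ray cluster depends on the transformation and the resources reserved per worker.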