Python Client for Indexify

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

diptanuc lucaspy

These details have not been verified by PyPI

Project description

Indexify Python SDK

This is the Python SDK for building real-time, continuously running pipelines that process unstructured data with Indexify.

Start by writing and testing your pipelines locally using your own data, then deploy them to the Indexify service to process data in real time at scale.

Installation

pip install indexify
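
To confirm which release ended up in the active environment, the standard library can report the installed version. This is a small sketch using only `importlib.metadata`; the helper name `installed_version` is made up for this example and is not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string such as the one shown on this page, or None
# if `pip install indexify` has not been run in this environment.
print(installed_version("indexify"))
```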

Examples

PDF Document Extraction

Extracts text, tables, and images from an ingested PDF file.
Indexes the text using MiniLM-L6-v2 and the images using CLIP.
Writes the results into a vector database.

YouTube Transcription Summarizer

Downloads a YouTube video.
Extracts audio from the video and transcribes it using Faster Whisper.
Uses Llama 3.1 backed by Llama.cpp to understand and classify the nature of the video.
Routes the transcription dynamically to one of the transcription summarizers to retain specific summarization attributes.
Finally, the entire transcription is embedded and stored in a vector database for retrieval.

Quick Start

Write data processing functions in Python, and use Pydantic objects to return complex data types from them.
Connect functions using a graph interface. Indexify automatically stores function outputs and passes them along to downstream functions.
If a function returns a list, the downstream functions will be called with each item in the list in parallel.
The input of the first function becomes the input to the HTTP endpoint of the Graph.
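
The fan-out rule above can be illustrated without the SDK. The following is a minimal sketch in plain Python, not Indexify's implementation: `run_pipeline`, the `edges` dictionary, and the `shout` function are hypothetical names invented for this illustration, and a thread pool stands in for whatever parallel scheduling the server performs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Step = Callable[[Any], Any]


def run_pipeline(start: Step, edges: Dict[Step, List[Step]], payload: Any) -> Dict[str, List[Any]]:
    """Run `start` on `payload` and pass every output to the downstream functions."""
    outputs: Dict[str, List[Any]] = {}
    frontier: List[Tuple[Step, Any]] = [(start, payload)]
    with ThreadPoolExecutor() as pool:
        while frontier:
            # Run all pending (function, input) calls concurrently; map keeps order.
            results = list(pool.map(lambda call: call[0](call[1]), frontier))
            next_frontier: List[Tuple[Step, Any]] = []
            for (fn, _), result in zip(frontier, results):
                if result is None:
                    items = []  # nothing to store or pass along
                elif isinstance(result, list):
                    items = result  # a list fans out: one downstream call per item
                else:
                    items = [result]
                outputs.setdefault(fn.__name__, []).extend(items)
                for downstream in edges.get(fn, []):
                    next_frontier.extend((downstream, item) for item in items)
            frontier = next_frontier
    return outputs


def split_text(text: str) -> List[str]:
    midpoint = len(text) // 2
    return [text[:midpoint], text[midpoint:]]


def shout(chunk: str) -> str:
    return chunk.upper()


outputs = run_pipeline(split_text, {split_text: [shout]}, "Hello, world!")
print(outputs["shout"])  # ['HELLO,', ' WORLD!']
```

Because `split_text` returns a two-item list, `shout` is called twice, once per chunk, and the outputs of both functions are kept under the function's name.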

Functional Features

There is no limit on the volume of data being ingested, since blob stores are used for storing metadata and objects.
The server can handle tens of thousands of files being ingested into the graphs in parallel.
The scheduler reacts to ingestion events in under 8 microseconds, so it is suitable for workflows that need to run in real time.
Batch ingestion is handled gracefully by batching ingested data and scheduling it for high throughput in production settings.

from pydantic import BaseModel
from indexify import indexify_function
from typing import Dict, Any, Optional, List

# Define function inputs and outputs
class Document(BaseModel):
    text: str
    metadata: Dict[str, Any]

class TextChunk(BaseModel):
    text: str
    metadata: Dict[str, Any]
    embedding: Optional[List[float]] = None


# Decorate a function which is going to be part of your data processing graph
@indexify_function()
def split_text(doc: Document) -> List[TextChunk]:
    midpoint = len(doc.text) // 2
    first_half = TextChunk(text=doc.text[:midpoint], metadata=doc.metadata)
    second_half = TextChunk(text=doc.text[midpoint:], metadata=doc.metadata)
    return [first_half, second_half]

# Any requirements specified are automatically installed in production clusters
@indexify_function(requirements=["langchain_text_splitter"])
def compute_embedding(chunk: TextChunk) -> TextChunk:
    chunk.embedding = [0.1, 0.2, 0.3]
    return chunk

# You can constrain functions to run on specific executors 
@indexify_function(executor_runtime_name="postgres-driver-image")
def write_to_db(chunk: TextChunk):
    # Write to your favorite vector database
    ...

# Create a graph and connect the functions defined above
from indexify import Graph

g = Graph(name="my_graph", start_node=split_text)
g.add_edge(split_text, compute_embedding)
g.add_edge(compute_embedding, write_to_db)

Graph Execution

Every time the Graph is invoked, Indexify returns an invocation ID that can be used to check the status of processing and to retrieve any outputs from the Graph.

Run the Graph Locally

from indexify import IndexifyClient

client = IndexifyClient(local=True)
client.register_graph(g)
invocation_id = client.invoke_graph_with_object(g.name, Document(text="Hello, world!", metadata={"source": "test"}))
graph_outputs = client.graph_outputs(g.name, invocation_id)

Deploy the Graph to Indexify Server for Production

Work in progress: the version of the server that works with Python-based graphs hasn't been released yet. It will be released shortly. Join the Discord for development updates.

from indexify import IndexifyClient

client = IndexifyClient(service_url="http://localhost:8900")
client.register_graph(g)

Ingestion into the Service

Extraction graphs run continuously on the Indexify service, like any other web service. The Indexify server runs the extraction graphs in parallel and in real time whenever new data is ingested into the service.

output_id = client.invoke_graph_with_object(g.name, Document(text="Hello, world!", metadata={"source": "test"}))

Retrieve Graph Outputs for a given ingestion object

graph_outputs = client.graph_outputs(g.name, output_id)

Retrieve All Graph Inputs

graph_inputs = client.graph_inputs(g.name)


Release history

Version	Release date
0.2.31	Nov 21, 2024
0.2.30	Nov 21, 2024
0.2.29	Nov 18, 2024
0.2.28	Nov 18, 2024
0.2.27	Nov 10, 2024
0.2.26	Nov 9, 2024
0.2.25	Nov 9, 2024
0.2.24	Nov 6, 2024
0.2.23	Nov 2, 2024
0.2.22	Oct 26, 2024
0.2.21	Oct 25, 2024
0.2.20	Oct 24, 2024
0.2.19	Oct 24, 2024
0.2.18	Oct 23, 2024
0.2.17	Oct 19, 2024
0.2.16 (this version)	Oct 17, 2024
0.2.15	Oct 16, 2024
0.2.14	Oct 16, 2024
0.2.13	Oct 12, 2024
0.2.12	Oct 10, 2024
0.2.11	Oct 9, 2024
0.2.10	Oct 5, 2024
0.2.9	Oct 5, 2024
0.2.8	Oct 5, 2024
0.2.6	Oct 4, 2024
0.2.5	Oct 3, 2024
0.2.4	Oct 3, 2024
0.2.3	Sep 30, 2024
0.2.2	Sep 30, 2024
0.2.1	Sep 30, 2024
0.2	Sep 29, 2024
0.0.43	Aug 24, 2024
0.0.42	Aug 24, 2024
0.0.39	Aug 19, 2024
0.0.37	Aug 14, 2024
0.0.36	Aug 13, 2024
0.0.35	Aug 3, 2024
0.0.34	Jul 23, 2024
0.0.33	Jul 23, 2024
0.0.32	Jul 21, 2024
0.0.31	Jul 19, 2024
0.0.29	Jul 1, 2024
0.0.27	Jun 16, 2024
0.0.26	Jun 14, 2024
0.0.25	Jun 11, 2024
0.0.24	May 31, 2024
0.0.23	May 30, 2024
0.0.22	May 24, 2024
0.0.21	May 13, 2024
0.0.20	May 7, 2024
0.0.16	Apr 23, 2024
0.0.15	Apr 20, 2024
0.0.14	Apr 14, 2024
0.0.13	Mar 27, 2024
0.0.12	Mar 22, 2024
0.0.11	Mar 8, 2024
0.0.10	Feb 26, 2024
0.0.9	Feb 17, 2024
0.0.8	Feb 16, 2024
0.0.7	Feb 14, 2024
0.0.6	Feb 14, 2024
0.0.5	Feb 13, 2024
0.0.4	Jan 16, 2024
0.0.3	Aug 19, 2023
0.0.2	Jul 1, 2023
0.0.1	May 19, 2023

Download files

Download the file for your platform.

Source Distribution

indexify-0.2.16.tar.gz (30.9 kB)

Uploaded Oct 17, 2024 Source

Built Distribution

indexify-0.2.16-py3-none-any.whl (40.2 kB)

Uploaded Oct 17, 2024 Python 3

File details

Details for the file indexify-0.2.16.tar.gz.

File metadata

Download URL: indexify-0.2.16.tar.gz
Upload date: Oct 17, 2024
Size: 30.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for indexify-0.2.16.tar.gz
Algorithm	Hash digest
SHA256	`49db3d56c5720eb19ab455464ce0bda9253394cc2bdb2f059e711b94e137221e`
MD5	`7512e074b8ce911db9bddfde76212320`
BLAKE2b-256	`bb86a33a0627475efffcd4eb54f76b1beadb5fc0362efb797f82717db97598d4`


File details

Details for the file indexify-0.2.16-py3-none-any.whl.

File metadata

Download URL: indexify-0.2.16-py3-none-any.whl
Upload date: Oct 17, 2024
Size: 40.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for indexify-0.2.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7dba72a08287e2b84200a9a0780d308ff037a9e3a7b048fe764b111295015b4b`
MD5	`558f9cdfba7ddc1fba9e8223e77fddc7`
BLAKE2b-256	`c9303fec7e526da4f0aa5f73f8f4ff3e2efd10e3f1bb7d98ef6adc2f40391252`
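
The digests listed above can be used to check a downloaded file before installing it. The sketch below uses only the standard library `hashlib` module; the helper names `sha256_of_file` and `verify_download` are hypothetical, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 listed on this page for the source distribution.
EXPECTED_SDIST_SHA256 = "49db3d56c5720eb19ab455464ce0bda9253394cc2bdb2f059e711b94e137221e"

# Assumed location: the archive saved in the current working directory.
archive = Path("indexify-0.2.16.tar.gz")
if archive.exists():
    print("digest matches:", verify_download(archive, EXPECTED_SDIST_SHA256))
else:
    print(f"{archive} not found; download it first")
```

The same helpers work for the wheel by passing its file name and the SHA256 value from its own table.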