
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps it up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴


CocoIndex is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they define the transformation, or formula, for a set of source data. The framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
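
The dataflow idea can be illustrated without CocoIndex at all. The following is a minimal sketch in plain Python, not CocoIndex's API: the helper names `derive` and `word_count` are hypothetical and exist only for this illustration. Each step returns a new read-only record with one additional field computed from existing fields, so nothing is mutated and every intermediate value stays observable.

```python
from types import MappingProxyType
from typing import Any, Callable, Mapping


def derive(row: Mapping[str, Any], field: str,
           fn: Callable[[Mapping[str, Any]], Any]) -> Mapping[str, Any]:
    """Return a new read-only row with `field` added; the input row is untouched.

    Hypothetical helper for illustration only, not part of CocoIndex.
    """
    if field in row:
        raise KeyError(f"field {field!r} already exists; fields are never overwritten")
    return MappingProxyType({**row, field: fn(row)})


def word_count(row: Mapping[str, Any]) -> int:
    """Derive a value solely from the row's existing `content` field."""
    return len(row["content"].split())


# Source record, followed by two derived records.
source = MappingProxyType({"filename": "a.md", "content": "fresh data for AI"})
step1 = derive(source, "upper", lambda r: r["content"].upper())
step2 = derive(step1, "words", word_count)
```

Because `source`, `step1`, and `step2` all remain available, the lineage of the `words` field can be traced back through each step.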

Data Freshness

As a data framework, CocoIndex takes data freshness to the next level. Incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to optimize latency, the framework handles that for you.
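
One common way to decide what needs updating is to compare content fingerprints between runs. The sketch below shows that general technique in plain Python; it is an assumption made for illustration, not a description of how CocoIndex implements incremental processing, and the names `fingerprint` and `plan_updates` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(content: str) -> str:
    """Stable content hash used to detect changes between runs."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_updates(previous: Dict[str, str],
                 current: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Compare stored fingerprints with the current source contents.

    previous: key -> fingerprint recorded on the last run
    current:  key -> content seen in the source now
    Returns sorted lists of keys to (create, update, delete).
    """
    to_create = sorted(k for k in current if k not in previous)
    to_update = sorted(
        k for k in current
        if k in previous and previous[k] != fingerprint(current[k])
    )
    to_delete = sorted(k for k in previous if k not in current)
    return to_create, to_update, to_delete


# State recorded on the previous run (hypothetical file names).
previous = {
    "a.md": fingerprint("hello"),
    "b.md": fingerprint("old text"),
    "d.md": fingerprint("removed later"),
}
# What the source contains now.
current = {"a.md": "hello", "b.md": "new text", "c.md": "brand new"}

creates, updates, deletes = plan_updates(previous, current)
```

In this example only `b.md` and `c.md` would be reprocessed and `d.md` would be removed from the target; the unchanged `a.md` is skipped entirely.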

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
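
To give a feel for what the `chunk_size` and `chunk_overlap` parameters in the flow mean, here is a simplified fixed-window chunker in plain Python. It is a sketch under stated assumptions: sizes are measured in characters, and the function `chunk_with_overlap` is hypothetical. It is not how `SplitRecursively` works internally; it only shows how consecutive chunks can share overlapping text.

```python
from typing import List, Tuple


def chunk_with_overlap(text: str, chunk_size: int,
                       chunk_overlap: int) -> List[Tuple[int, str]]:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters. Returns a list of
    (start_offset, chunk_text) pairs. Illustrative only, not CocoIndex code.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```

The start offset plays a role similar to the `location` field in the flow: together with the file name it identifies a chunk, which is why the example flow uses both as primary key fields.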

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend

More examples are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.50.tar.gz (6.0 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cocoindex-0.1.50-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.6 MB): PyPy, manylinux: glibc 2.28+ ARM64
cocoindex-0.1.50-cp313-cp313t-manylinux_2_28_aarch64.whl (13.6 MB): CPython 3.13t, manylinux: glibc 2.28+ ARM64
cocoindex-0.1.50-cp313-cp313-win_amd64.whl (13.3 MB): CPython 3.13, Windows x86-64
cocoindex-0.1.50-cp313-cp313-manylinux_2_28_x86_64.whl (14.1 MB): CPython 3.13, manylinux: glibc 2.28+ x86-64
cocoindex-0.1.50-cp313-cp313-manylinux_2_28_aarch64.whl (13.6 MB): CPython 3.13, manylinux: glibc 2.28+ ARM64
cocoindex-0.1.50-cp313-cp313-macosx_11_0_arm64.whl (13.4 MB): CPython 3.13, macOS 11.0+ ARM64
cocoindex-0.1.50-cp313-cp313-macosx_10_12_x86_64.whl (13.9 MB): CPython 3.13, macOS 10.12+ x86-64
cocoindex-0.1.50-cp312-cp312-win_amd64.whl (13.3 MB): CPython 3.12, Windows x86-64
cocoindex-0.1.50-cp312-cp312-manylinux_2_28_x86_64.whl (14.1 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
cocoindex-0.1.50-cp312-cp312-manylinux_2_28_aarch64.whl (13.6 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
cocoindex-0.1.50-cp312-cp312-macosx_11_0_arm64.whl (13.4 MB): CPython 3.12, macOS 11.0+ ARM64
cocoindex-0.1.50-cp312-cp312-macosx_10_12_x86_64.whl (13.9 MB): CPython 3.12, macOS 10.12+ x86-64
cocoindex-0.1.50-cp311-cp311-win_amd64.whl (13.3 MB): CPython 3.11, Windows x86-64
cocoindex-0.1.50-cp311-cp311-manylinux_2_28_x86_64.whl (14.1 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
cocoindex-0.1.50-cp311-cp311-manylinux_2_28_aarch64.whl (13.6 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
cocoindex-0.1.50-cp311-cp311-macosx_11_0_arm64.whl (13.4 MB): CPython 3.11, macOS 11.0+ ARM64
cocoindex-0.1.50-cp311-cp311-macosx_10_12_x86_64.whl (13.9 MB): CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.50.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.50.tar.gz
  • Upload date:
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.7

File hashes

Hashes for cocoindex-0.1.50.tar.gz
Algorithm Hash digest
SHA256 e82232c5398f8b8adba87c9a49b08a60dff0be30634dba99f5a9d08e4f492a35
MD5 78712ad26cd0bcb18772ce6f05ddc6c6
BLAKE2b-256 3867646a8cca74b20b64ab8f0b39cb8a330e0e906bec28148179c37010d98290

See more details on using hashes here.

