
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Reason this release was yanked: bug

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴


CocoIndex is an ultra-performant data transformation framework whose core engine is written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to let the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage available out of the box.
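As a rough illustration of the dataflow idea (plain Python, not CocoIndex's API), the sketch below models each field as an immutable value that records which upstream fields it was derived from. The `Field` class and `derive` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Field:
    """An immutable value plus the names of the upstream fields it came from."""
    name: str
    value: Any
    lineage: Tuple[str, ...] = ()


def derive(name: str, fn: Callable[..., Any], *inputs: Field) -> Field:
    """Create a new field purely from input fields; the inputs are never mutated."""
    value = fn(*(f.value for f in inputs))
    # Lineage = every upstream ancestor of the inputs, then the inputs themselves,
    # de-duplicated while preserving order.
    lineage = tuple(dict.fromkeys(
        n for f in inputs for n in (*f.lineage, f.name)))
    return Field(name, value, lineage)


content = Field("content", "Hello dataflow world")
words = derive("words", str.split, content)
word_count = derive("word_count", len, words)

print(word_count.value)    # 3
print(word_count.lineage)  # ('content', 'words')
```

Because `derive` never mutates its inputs, `content` still holds the original text afterwards, and `word_count.lineage` answers "where did this value come from?" without extra bookkeeping.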

In particular, users don't explicitly mutate data by creating, updating, or deleting it. Instead, they declare something like "for this set of source data, this is the transformation or formula". The framework takes care of the data operations, such as when to create, update, or delete.
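The declarative contract can be sketched in plain Python: the user supplies only a formula, and a planner derives the desired target and diffs it against the current target to decide what to create, update, or delete. The `plan_mutations` function is a hypothetical illustration, not part of CocoIndex; dictionary keys stand in for primary keys.

```python
from typing import Callable, Dict, List, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # (operation, key, new value or None)


def plan_mutations(
    source: Dict[str, str],
    formula: Callable[[str], str],
    current_target: Dict[str, str],
) -> List[Op]:
    """Derive the desired target from the source, then diff it against the
    current target to decide which creates, updates, and deletes are needed."""
    desired = {key: formula(value) for key, value in source.items()}
    ops: List[Op] = []
    for key, value in desired.items():
        if key not in current_target:
            ops.append(("create", key, value))
        elif current_target[key] != value:
            ops.append(("update", key, value))
        # Otherwise the row is already correct and nothing is written.
    for key in sorted(current_target.keys() - desired.keys()):
        ops.append(("delete", key, None))
    return ops


source = {"a.md": "hello", "b.md": "world", "c.md": "new"}
current = {"a.md": "HELLO", "b.md": "old", "d.md": "GONE"}
print(plan_mutations(source, str.upper, current))
# [('update', 'b.md', 'WORLD'), ('create', 'c.md', 'NEW'), ('delete', 'd.md', None)]
```

The user never wrote a single create, update, or delete; `a.md` is left alone because its derived value already matches.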

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses let the chained calls span several lines)
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

Data Freshness

As a data framework, CocoIndex puts a strong emphasis on data freshness. Incremental processing is one of the core values CocoIndex provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles this for you.
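A minimal sketch of the "only recompute what changed" idea, assuming a full scan of the source on each sync and SHA-256 content fingerprints to detect changes. Real change data capture can avoid the full scan, and this is not how CocoIndex is implemented internally; `IncrementalIndexer` and its methods are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple


class IncrementalIndexer:
    """Hypothetical sketch: recompute a row only when its content fingerprint changes."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        # key -> (fingerprint of source content, derived output)
        self.state: Dict[str, Tuple[str, str]] = {}
        self.compute_calls = 0  # counts how often the transform actually ran

    @staticmethod
    def fingerprint(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def sync(self, source: Dict[str, str]) -> Dict[str, str]:
        """Bring derived state up to date with the source; return the target view."""
        new_state: Dict[str, Tuple[str, str]] = {}
        for key, content in source.items():
            fp = self.fingerprint(content)
            cached = self.state.get(key)
            if cached is not None and cached[0] == fp:
                new_state[key] = cached  # unchanged: reuse the previous output
            else:
                self.compute_calls += 1  # new or changed: recompute this row only
                new_state[key] = (fp, self.transform(content))
        # Keys no longer in the source are not carried over, i.e. deleted.
        self.state = new_state
        return {key: out for key, (_, out) in self.state.items()}


indexer = IncrementalIndexer(str.upper)
indexer.sync({"a.md": "alpha", "b.md": "beta"})              # computes 2 rows
result = indexer.sync({"a.md": "alpha", "b.md": "beta v2"})  # only b.md recomputed
print(indexer.compute_calls)  # 3
print(result)                 # {'a.md': 'ALPHA', 'b.md': 'BETA V2'}
```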

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't already have it. CocoIndex uses it for incremental processing.
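CocoIndex needs to know where that Postgres instance is. Assuming, as in the CocoIndex Quick Start Guide, that the connection URL is read from the `COCOINDEX_DATABASE_URL` environment variable (check the documentation for your version), the stdlib-only sketch below parses that URL and fails fast if it is malformed. The `database_settings` helper and the fallback URL are hypothetical.

```python
import os
from typing import Mapping
from urllib.parse import urlparse

# Assumption: CocoIndex reads the Postgres URL from COCOINDEX_DATABASE_URL
# (as in its Quick Start Guide). The fallback is an illustrative local default.
DEFAULT_URL = "postgres://cocoindex:cocoindex@localhost/cocoindex"


def database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse the configured Postgres URL into its parts, failing fast if malformed."""
    url = env.get("COCOINDEX_DATABASE_URL", DEFAULT_URL)
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// URL, got {url!r}")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # Postgres' default port when none is given
        "user": parsed.username,
        "database": parsed.path.lstrip("/"),
    }


print(database_settings({}))
# {'host': 'localhost', 'port': 5432, 'user': 'cocoindex', 'database': 'cocoindex'}
```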

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
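In the example flow, `SplitRecursively` is called with `chunk_size=2000` and `chunk_overlap=500`. To make those two parameters concrete, here is a deliberately naive fixed-window splitter. It is a stand-in, not CocoIndex's algorithm: the real function, as its name and `language` parameter suggest, takes the document's structure into account rather than cutting at fixed offsets. The `split_fixed` name is hypothetical, and this sketch counts characters; check the CocoIndex documentation for the unit the real function uses.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Naive fixed-window splitter: each chunk has at most chunk_size characters
    and shares chunk_overlap characters with the previous chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

With `chunk_size=2000` and `chunk_overlap=500`, the window advances 1500 characters at a time, so consecutive chunks share 500 characters of context.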

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDFs and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using an LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with an LLM and a graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |
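Most of these examples end in semantic search over stored embeddings. The Quick Start flow declares its vector index with `VectorSimilarityMetric.COSINE_SIMILARITY`, and the brute-force sketch below shows what that metric ranks by. In practice the target database performs the search, typically with an index rather than a full scan; `cosine_similarity`, `top_k`, and the integer location keys are hypothetical simplifications (in the real flow, `location` comes from the splitter).

```python
import math
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, int]  # hypothetical (filename, location) primary key


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], rows: Dict[Key, Sequence[float]],
          k: int = 2) -> List[Tuple[Key, float]]:
    """Brute-force scan: score every stored embedding and keep the k most similar."""
    scored = [(key, cosine_similarity(query, emb)) for key, emb in rows.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


rows = {
    ("a.md", 0): [2.0, 0.0],
    ("a.md", 1): [3.0, 4.0],
    ("b.md", 0): [0.0, 1.0],
}
print(top_k([1.0, 0.0], rows))
# [(('a.md', 0), 1.0), (('a.md', 1), 0.6)]
```

Note that `[2.0, 0.0]` scores a perfect 1.0 despite its length: cosine similarity compares direction only, which is why it is a common choice for text embeddings.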

More examples are coming; stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on the GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.43.tar.gz (5.7 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cocoindex-0.1.43-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.6 MB)

Uploaded PyPy, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.43-cp313-cp313t-manylinux_2_28_aarch64.whl (13.6 MB)

Uploaded CPython 3.13t, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.43-cp313-cp313-win_amd64.whl (13.3 MB)

Uploaded CPython 3.13, Windows x86-64

cocoindex-0.1.43-cp313-cp313-manylinux_2_28_x86_64.whl (14.1 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.43-cp313-cp313-manylinux_2_28_aarch64.whl (13.6 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.43-cp313-cp313-macosx_11_0_arm64.whl (13.3 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

cocoindex-0.1.43-cp313-cp313-macosx_10_12_x86_64.whl (13.8 MB)

Uploaded CPython 3.13, macOS 10.12+ x86-64

cocoindex-0.1.43-cp312-cp312-win_amd64.whl (13.3 MB)

Uploaded CPython 3.12, Windows x86-64

cocoindex-0.1.43-cp312-cp312-manylinux_2_28_x86_64.whl (14.1 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.43-cp312-cp312-manylinux_2_28_aarch64.whl (13.6 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.43-cp312-cp312-macosx_11_0_arm64.whl (13.3 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

cocoindex-0.1.43-cp312-cp312-macosx_10_12_x86_64.whl (13.7 MB)

Uploaded CPython 3.12, macOS 10.12+ x86-64

cocoindex-0.1.43-cp311-cp311-win_amd64.whl (13.3 MB)

Uploaded CPython 3.11, Windows x86-64

cocoindex-0.1.43-cp311-cp311-manylinux_2_28_x86_64.whl (14.1 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.43-cp311-cp311-manylinux_2_28_aarch64.whl (13.6 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.43-cp311-cp311-macosx_11_0_arm64.whl (13.3 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

cocoindex-0.1.43-cp311-cp311-macosx_10_12_x86_64.whl (13.8 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.43.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.43.tar.gz
  • Upload date:
  • Size: 5.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.6

File hashes

Hashes for cocoindex-0.1.43.tar.gz
Algorithm Hash digest
SHA256 ee2c907a6a252907a9a6f1ebed4e51937604abc63389fcbc54f5398d5e4b4b7f
MD5 5ac1b6cb594f41f639e120c6eeccae11
BLAKE2b-256 a4a40d853e8b975dfb29aa73a39546629f681a7b8bd4b96e7da027a8d01d8a0a

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.43-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.43-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b0eda29df9280c2be8a4147b13d37b34408883c21e1389e1f9eab2d28c607abc
MD5 817f7286f4b410a52fae01436e970655
BLAKE2b-256 df973ae336843083831343dd4b99efc88ae58d2afd53e8e8c0e0c177f6179697

File details

Details for the file cocoindex-0.1.43-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 e44d0af7f24ee9ff68798e5c73c33179abc1a4b41215096c5e9f8e9929a70fb5
MD5 866bb277e4b9f274d0ddbc059bd25c39
BLAKE2b-256 c5dd9e406ac33250f9d7873f231e8eb32fae9daf9b5f462a0855a47bce669d5a

File details

Details for the file cocoindex-0.1.43-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 c5bc1a407d6291101378c527880ad50bf33b81f5829d9fe681bd7d0f69451f7e
MD5 bf17608c321e0fda5b3887750728d642
BLAKE2b-256 e563f7be02c3a2f018201a41ca6f14c9dc2f24a4d6e4ddb38829233c4c8f0c54

File details

Details for the file cocoindex-0.1.43-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 c49b4a55faa1cd638262fd0aa9668603d39a103cab858536f0d647515b4a1c2b
MD5 8f3befa0bc69d0450d12765c0871d7a0
BLAKE2b-256 abe1375cf65da61a61164745a5b9337e76118e09e467deba2754878fdc0a6645

File details

Details for the file cocoindex-0.1.43-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 21bf1a7963099b1590baf7ee86ad7caad70ef14e4df15e62c580d01faaf55de7
MD5 819cdc7717416d08bad041b58b8edffa
BLAKE2b-256 ed455abfa61362956e6727724e8410c3c0a8eb398407ecae01cf5884be0087c5

File details

Details for the file cocoindex-0.1.43-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ef55c7a66f96da5fb69848d57b57511b9557ab910462b1954e9f13d2ea9f92ef
MD5 cffa1daca3ba0b6cb22d07197e28c573
BLAKE2b-256 cbe31ca29ba78193d2119b76f38ace7d44d7fb6a52fcbfe0d481407ae0485e4c

File details

Details for the file cocoindex-0.1.43-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 4fa3a78fc355d8505566458a8ccb61e54158bf841bd033c9dde1cf7a87b46d99
MD5 4a69dc60b49f038db743c2e9265ad1a1
BLAKE2b-256 507d1b3054cf215591b9f0d76fd4f85f236615c06864a2887fa6b33f1e9eada6

File details

Details for the file cocoindex-0.1.43-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 bfc9dd4a78cd66793c823bb6689c592207867478e845e4e86a3eb2c37dfab9d9
MD5 b15798db109bdb922f19d2fbe852237f
BLAKE2b-256 aa8417b26dfaa718da976edd523ea52e2811ad22267d464f7be16a5c2f72a55e

File details

Details for the file cocoindex-0.1.43-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 2ad6a7b9ac5c29cef8994d69aa6e4a5b9e8eb466635ac0de28b385b8cd205cb4
MD5 75ba149c43f315dd51e2a040a9d25c83
BLAKE2b-256 3b1f7c356eaa64f5bb416a3d5eb25dc9eaf20191340034496c04f49f3d3468ba

File details

Details for the file cocoindex-0.1.43-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8057c151f03e7330327e89207395e7fb25583c2ec2c8afb70b289ffb29a90761
MD5 3616422fe2de0a65fc6a71ce8cdc7b3e
BLAKE2b-256 cb867a5122968eb60b13d0eafd6541fd8ea01e6aa3a274a7076180f831b8200b

File details

Details for the file cocoindex-0.1.43-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0422d5cebf9b9b42202def601cc060e8506a74a9e7ddbb66e034e7a13ee8ca22
MD5 b5474919af59be97e77339cc6cebfc76
BLAKE2b-256 805027b0084d367124d22745af28d25ab76970b0b726b26373084fe1526acb6b

File details

Details for the file cocoindex-0.1.43-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 61d172a6f6db60f02330954c133b7e5bdd3f8b7b7003d2518a976641a52f6935
MD5 efe2e4d53c882914e1a7a4e1abdc5e59
BLAKE2b-256 fb980e5eb0652880f590c40b9ce2b1b972a7f49847c0ee9db69cfd7ef7bbc85d

File details

Details for the file cocoindex-0.1.43-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 c9ac3f2c70a2107811cd33579dd0a8815ce3f9cd997b972333b5b261b3b85309
MD5 a6b2ac04f1e79d3cfb09172b37583510
BLAKE2b-256 217f0629ed41aae2c8508a24a570888e89678376d566cbd9d0e88a2c2298770d

File details

Details for the file cocoindex-0.1.43-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ffcb63920d11ce255c5253ace66f6f3e1c5fd3b339c2455b1fdbd50552336ad9
MD5 ae5c8f7fd9d67128cfcdba45ab8f4e2f
BLAKE2b-256 7e53c4bf4ff412e92fdb7838ee9da19fb40af8de956c5c647e6e1e349b77001a

File details

Details for the file cocoindex-0.1.43-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 43a8cd2a36dd23f4a04fac84acc9d58bfc7037aa07ddd076faea8b5ad59833d8
MD5 c2b678d6ee01fd85b69257fddcd806b2
BLAKE2b-256 387edc9e3f84d7e42419b24c7e9f22b57b50811b6fd00f97bdc5717ef156ae05

File details

Details for the file cocoindex-0.1.43-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6750cc39c8b2576d8c1b42fe042ffb7fcfeb38fc9676b0b44bf33cb413990949
MD5 6c2638ae51496a00fb5688b56d39b51d
BLAKE2b-256 989afb25769cc6159b7ddf010c500c331aeb8393597dc3cf6fa285ffbb18dfee

File details

Details for the file cocoindex-0.1.43-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.43-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 45efabcc8759df1cc4d7d976ab212dd02aa05665110948c528d048a7288076e7
MD5 d890281be57d560b3c711897535daf5c
BLAKE2b-256 ee33bc803222555ec00fff591435962a1670f16f1cab9db07e855fdf7b43bf71
