With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

CocoIndex

Data transformation for AI

Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex makes it super easy to transform data with AI workloads and keeps source data and target in sync effortlessly.


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting it. They only need to define the transformation (formula) for a set of source data.
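To make the dataflow idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API: the `derive` helper and the field names are hypothetical and exist only for illustration. Each step builds a new field from existing fields, returns new dictionaries instead of mutating its inputs, and records which fields the new one came from.

```python
from typing import Callable

# Hypothetical helper (not part of CocoIndex): derive a new field from
# existing fields and record which inputs it was computed from.
def derive(row: dict, lineage: dict, name: str, inputs: list[str],
           fn: Callable[..., object]) -> tuple[dict, dict]:
    value = fn(*(row[field] for field in inputs))
    # Build new dicts so the input row and lineage are never mutated.
    return {**row, name: value}, {**lineage, name: list(inputs)}

row0 = {"content": "Hello Dataflow"}
row1, lineage1 = derive(row0, {}, "lower", ["content"], str.lower)
row2, lineage2 = derive(row1, lineage1, "word_count", ["lower"],
                        lambda s: len(s.split()))

print(row0)      # {'content': 'Hello Dataflow'}  (source row is untouched)
print(row2)      # {'content': 'Hello Dataflow', 'lower': 'hello dataflow', 'word_count': 2}
print(lineage2)  # {'lower': ['content'], 'word_count': ['lower']}
```

Because every intermediate row is kept, the data before and after each step stays observable, and the lineage mapping shows how each field was produced.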

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
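The sketch below shows in plain Python why a shared interface turns a component switch into a one-line change. The `Target` protocol and both target classes are hypothetical stand-ins, not CocoIndex classes.

```python
from typing import Protocol

# Hypothetical interface (an assumption for illustration only).
class Target(Protocol):
    def write(self, rows: list[dict]) -> None: ...

class InMemoryTarget:
    """Keeps exported rows in a list."""
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self.rows.extend(rows)

class PrintTarget:
    """Prints exported rows instead of storing them."""
    def write(self, rows: list[dict]) -> None:
        for row in rows:
            print(row)

def export(rows: list[dict], target: Target) -> None:
    # The pipeline depends only on the interface, not on a concrete class.
    target.write(rows)

target = InMemoryTarget()  # swap this single line for PrintTarget()
export([{"id": 1, "text": "hello"}], target)
print(target.rows)  # [{'id': 1, 'text': 'hello'}]
```

Nothing in `export` changes when the target is replaced, which is the property the standardized interface is meant to provide.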

Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

CocoIndex has out-of-the-box support for incremental indexing:

  • Minimal recomputation when the source or the logic changes.
  • (Re-)processing of only the necessary portions, reusing the cache when possible.
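One common way to get this behavior is to fingerprint each source item together with a version of the transformation logic, and recompute only when the fingerprint changes. The sketch below illustrates that general technique in plain Python. It is an assumption for illustration, not a description of how CocoIndex implements incremental processing, and `fingerprint`, `update_index`, and `logic_version` are hypothetical names.

```python
import hashlib
from typing import Callable

def fingerprint(text: str, logic_version: str) -> str:
    # Changes whenever the source content or the logic version changes.
    return hashlib.sha256(f"{logic_version}\0{text}".encode("utf-8")).hexdigest()

def update_index(sources: dict[str, str],
                 cache: dict[str, tuple[str, str]],
                 transform: Callable[[str], str],
                 logic_version: str) -> list[str]:
    """Bring `cache` in sync with `sources`; return the keys that were recomputed."""
    recomputed = []
    for key, text in sources.items():
        fp = fingerprint(text, logic_version)
        cached = cache.get(key)
        if cached is None or cached[0] != fp:
            cache[key] = (fp, transform(text))  # recompute only stale entries
            recomputed.append(key)
    for key in list(cache):
        if key not in sources:
            del cache[key]  # drop results for deleted sources
    return recomputed

cache: dict[str, tuple[str, str]] = {}
docs = {"a.md": "alpha", "b.md": "beta"}
print(update_index(docs, cache, str.upper, "v1"))  # ['a.md', 'b.md']
docs["b.md"] = "beta 2"
print(update_index(docs, cache, str.upper, "v1"))  # ['b.md']
print(update_index(docs, cache, str.upper, "v2"))  # ['a.md', 'b.md']
```

The second call reprocesses only the changed document, and the third reprocesses everything because the logic version changed.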

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide in the CocoIndex Documentation.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
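To build intuition for the `chunk_size` and `chunk_overlap` parameters in the flow above, here is a deliberately simplified sketch of overlapping chunking in plain Python. It is not the algorithm behind `SplitRecursively`: the function name `chunk_text` is hypothetical, and the sketch counts characters in a fixed sliding window, which is an assumption made for illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into (start_offset, chunk) pairs that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # [(0, 'abcd'), (2, 'cdef'), (4, 'efgh'), (6, 'ghij')]
```

Each chunk repeats the last `chunk_overlap` characters of the previous one, so text near a boundary appears in both neighbors. The start offset plays a role similar to the `location` field collected in the flow.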

It defines an index flow like this:

Data Flow

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
