
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index, and keeps the derived index up to date as the source changes, with minimal recomputation and minimal changes to the target.


CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it effortless to transform data with AI and keep source data and target in sync. It goes beyond SQL, whether you're building a vector index for RAG, creating knowledge graphs, or performing any custom data transformation.


CocoIndex Features


Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
# (parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting records. They only need to define the transformation (formula) for a set of source data.
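To make the "new field from input fields, no mutation" idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API; `derive`, `Row`, and the field names are hypothetical and only illustrate the model.

```python
from types import MappingProxyType
from typing import Callable, Mapping

Row = Mapping[str, object]  # a read-only view of named fields


def derive(row: Row, out_field: str, fn: Callable[..., object], *in_fields: str) -> Row:
    """Return a NEW row where `out_field` is computed only from `in_fields`."""
    if out_field in row:
        raise KeyError(f"field {out_field!r} already exists; fields are never overwritten")
    value = fn(*(row[name] for name in in_fields))
    # The input row is left untouched; the result is a fresh read-only mapping.
    return MappingProxyType({**row, out_field: value})


source = MappingProxyType({"content": "Hello CocoIndex"})
step1 = derive(source, "lower", str.lower, "content")
step2 = derive(step1, "words", str.split, "lower")
```

Because every intermediate row (`source`, `step1`, `step2`) stays available and unchanged, each field can be traced back to the inputs it was computed from.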

Plug-and-Play Building Blocks

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change, as easy as assembling building blocks.
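The one-line switch can be pictured with a small plain-Python sketch of a shared interface. The `Target` protocol and both in-memory classes below are hypothetical stand-ins, not CocoIndex components.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface: anything with a `write` method."""

    def write(self, rows: list[dict]) -> None: ...


class InMemoryTable:
    """Stand-in for a relational target: keeps rows in insertion order."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class InMemoryKeyValue:
    """Stand-in for a key-value target: indexes rows by one field."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.items: dict[object, dict] = {}

    def write(self, rows: list[dict]) -> None:
        for row in rows:
            self.items[row[self.key]] = row


def export(rows: list[dict], target: Target) -> None:
    # The pipeline only depends on the shared interface.
    target.write(rows)


rows = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
target = InMemoryTable()  # swap for InMemoryKeyValue(key="id"): a one-line change
export(rows, target)
```

Only the line that constructs `target` changes when the component is swapped; `export` and the rest of the pipeline stay the same.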


Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation when the source or the logic changes.
  • (re-)processes only the necessary portions, and reuses the cache when possible.
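The two points above can be illustrated with a small fingerprint-based cache. This is only a sketch of the general technique, not CocoIndex's actual implementation; `fingerprint`, `incremental_update`, and `logic_version` are hypothetical names.

```python
import hashlib
from typing import Callable


def fingerprint(text: str, logic_version: str) -> str:
    """Hash the source content together with a version tag for the logic."""
    return hashlib.sha256(f"{logic_version}\0{text}".encode("utf-8")).hexdigest()


def incremental_update(
    sources: dict[str, str],
    cache: dict[str, tuple[str, str]],
    transform: Callable[[str], str],
    logic_version: str,
) -> list[str]:
    """Recompute only entries whose content or logic changed; return their keys."""
    recomputed = []
    for key, text in sources.items():
        fp = fingerprint(text, logic_version)
        cached = cache.get(key)
        if cached is None or cached[0] != fp:
            cache[key] = (fp, transform(text))  # (re-)process this entry
            recomputed.append(key)
        # otherwise the cached result is reused untouched
    for key in list(cache):
        if key not in sources:
            del cache[key]  # the source was deleted, so drop its derived data
    return recomputed


cache: dict[str, tuple[str, str]] = {}
docs = {"a.md": "alpha", "b.md": "beta"}
first = incremental_update(docs, cache, str.upper, "v1")   # everything is new
docs["b.md"] = "beta2"
second = incremental_update(docs, cache, str.upper, "v1")  # only b.md changed
third = incremental_update(docs, cache, str.upper, "v2")   # the logic changed
```

A source change invalidates only the affected entry, while a logic change (here, a new `logic_version`) invalidates every entry that depends on that logic.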

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
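Before running a flow, it can help to sanity-check the Postgres connection string. The sketch below assumes the URL is supplied through an environment variable named `COCOINDEX_DATABASE_URL`; that name is an assumption here (the text above only says Postgres is required), and `check_database_url` is a hypothetical helper, not part of CocoIndex.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumed variable name; confirm it against the CocoIndex documentation.
ENV_VAR = "COCOINDEX_DATABASE_URL"


def check_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured Postgres URL, or raise if it is missing or malformed."""
    env = os.environ if env is None else env
    url = env.get(ENV_VAR, "")
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql") or not parsed.hostname:
        raise ValueError(f"{ENV_VAR} should look like postgres://user:password@host:5432/dbname")
    return url


# Example with an explicit mapping instead of the real environment:
example_env = {ENV_VAR: "postgres://user:password@localhost:5432/mydb"}
url = check_database_url(example_env)
```

Calling `check_database_url()` with no arguments reads the real process environment instead.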

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
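To give a feel for what `chunk_size`, `chunk_overlap`, and a cosine similarity metric mean in the flow above, here is a simplified plain-Python sketch. It is an illustration only: `sliding_chunks` is a naive fixed-window splitter (`SplitRecursively` is presumably more structure-aware), and both function names are hypothetical.

```python
import math


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into windows of `chunk_size` characters, with consecutive
    windows sharing `chunk_overlap` characters. Returns (start offset, chunk)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

The overlap means text near a chunk boundary appears in both neighbouring chunks, so a query that matches a passage straddling the boundary can still find it in one piece.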

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build an embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper
Multi Format Indexing | Build a visual document index from PDFs and images with ColPali for semantic search
Custom Output Files | Convert Markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets
Patient intake form extraction | Use LLM to extract structured data from patient intake forms with different formats

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


