
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal recomputation and changes.

Project description

CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!




CocoIndex makes it super easy to transform data with AI workloads, and keeps source data and target in sync effortlessly.




Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform (the parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating and deleting records. They only need to define the transformation (formula) for a set of source data.
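To make the model concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API; `derive`, `Row` and the field names are hypothetical and exist only to illustrate the idea that each step adds a field computed from existing fields, leaves its input untouched, and records which fields the new one came from.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical names for illustration only; this is not the CocoIndex API.
Row = Dict[str, object]
LineageEdge = Tuple[str, Tuple[str, ...]]  # (output field, input fields)


def derive(row: Row, out_field: str, fn: Callable[..., object],
           *in_fields: str) -> Tuple[Row, LineageEdge]:
    """Return a new row with `out_field` computed from `in_fields`,
    together with a lineage edge describing where the field came from."""
    value = fn(*(row[name] for name in in_fields))
    new_row = {**row, out_field: value}  # the input row is not modified
    return new_row, (out_field, in_fields)


source: Row = {"content": "Hello CocoIndex"}
lineage: List[LineageEdge] = []

step1, edge = derive(source, "lowered", str.lower, "content")
lineage.append(edge)

step2, edge = derive(step1, "word_count", lambda s: len(s.split()), "lowered")
lineage.append(edge)
```

Because every intermediate row is kept and no value is overwritten, the state before and after each step stays inspectable, and `lineage` lists the chain from `content` to `word_count`.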

Build like LEGO

Native built-ins for different sources, targets and transformations. A standardized interface makes switching between components a one-line code change.
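The one-line switch can be pictured with a small, generic sketch. Everything here (`Target`, `ListTarget`, `KeyedTarget`, `export`) is a hypothetical stand-in written for this illustration, not a CocoIndex class; the point is only that code depending on a shared interface does not change when the concrete component does.

```python
from typing import Dict, List, Protocol

Row = Dict[str, object]


class Target(Protocol):
    """Hypothetical shared interface: every target accepts rows the same way."""

    def write(self, rows: List[Row]) -> None: ...


class ListTarget:
    """Keeps exported rows in a Python list."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: List[Row]) -> None:
        self.rows.extend(rows)


class KeyedTarget:
    """Keeps exported rows in a dict keyed by one field, like a primary key."""

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        self.by_key: Dict[object, Row] = {}

    def write(self, rows: List[Row]) -> None:
        for row in rows:
            self.by_key[row[self.key_field]] = row


def export(rows: List[Row], target: Target) -> None:
    """The pipeline depends only on the shared interface."""
    target.write(rows)


rows = [{"filename": "a.md", "text": "alpha"},
        {"filename": "b.md", "text": "beta"}]

target = ListTarget()               # swap this one line ...
# target = KeyedTarget("filename")  # ... for this one to change the destination
export(rows, target)
```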


Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It supports incremental indexing out of the box:

  • minimal recomputation when the source or the logic changes;
  • only the necessary portions are (re-)processed, and the cache is reused when possible.
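The two points above can be sketched with a toy index that caches results by content fingerprint and logic version. This is an illustration of the general technique under simple assumptions, not a description of how CocoIndex implements it internally; `IncrementalIndex` and its attributes are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, Tuple


class IncrementalIndex:
    """Toy incremental index: recompute only when content or logic changes."""

    def __init__(self, transform: Callable[[str], str], logic_version: str) -> None:
        self.transform = transform
        self.logic_version = logic_version
        # (content fingerprint, logic version) -> cached transform output
        self._cache: Dict[Tuple[str, str], str] = {}
        self.target: Dict[str, str] = {}
        self.calls = 0  # number of times the transform actually ran

    def update(self, source: Dict[str, str]) -> None:
        # Drop target rows whose source entry no longer exists.
        for key in list(self.target):
            if key not in source:
                del self.target[key]

        for key, content in source.items():
            fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
            cache_key = (fingerprint, self.logic_version)
            if cache_key not in self._cache:
                # New or changed content (or new logic): recompute.
                self.calls += 1
                self._cache[cache_key] = self.transform(content)
            self.target[key] = self._cache[cache_key]


index = IncrementalIndex(str.upper, logic_version="v1")
index.update({"a.md": "alpha", "b.md": "beta"})   # two transforms run
index.update({"a.md": "alpha", "b.md": "beta2"})  # only b.md is recomputed
```

Changing `logic_version` changes every cache key, so all rows are recomputed on the next update, which mirrors the "logic change" case.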

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
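Two parameters in the flow above, `chunk_overlap` and the cosine similarity metric, are easier to picture with a small standalone sketch. The helpers below are hypothetical and written only for illustration: `chunk_with_overlap` is a plain fixed-size sliding window and does not reproduce what `SplitRecursively` does, and `cosine_similarity` is the textbook formula, not the code path used by the vector index.

```python
import math
from typing import List, Sequence, Tuple


def chunk_with_overlap(text: str, chunk_size: int,
                       chunk_overlap: int) -> List[Tuple[int, str]]:
    """Split `text` into windows of `chunk_size` characters, where each window
    starts `chunk_size - chunk_overlap` characters after the previous one.
    Returns (start offset, chunk text) pairs."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[Tuple[int, str]] = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

With a window of 4 and an overlap of 2, neighbouring chunks share two characters, so text near a chunk boundary appears in both chunks. Vectors pointing the same way score 1.0 and orthogonal vectors score 0.0.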

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search served via FastAPI with a React frontend
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper

More are coming, stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cocoindex-0.1.70.tar.gz (9.4 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • cocoindex-0.1.70-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (15.6 MB): PyPy, manylinux (glibc 2.28+), ARM64
  • cocoindex-0.1.70-cp313-cp313t-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.13t, manylinux (glibc 2.28+), ARM64
  • cocoindex-0.1.70-cp313-cp313-win_amd64.whl (15.5 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.70-cp313-cp313-manylinux_2_28_x86_64.whl (16.3 MB): CPython 3.13, manylinux (glibc 2.28+), x86-64
  • cocoindex-0.1.70-cp313-cp313-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.13, manylinux (glibc 2.28+), ARM64
  • cocoindex-0.1.70-cp313-cp313-macosx_11_0_arm64.whl (15.4 MB): CPython 3.13, macOS 11.0+, ARM64
  • cocoindex-0.1.70-cp313-cp313-macosx_10_12_x86_64.whl (16.0 MB): CPython 3.13, macOS 10.12+, x86-64
  • cocoindex-0.1.70-cp312-cp312-win_amd64.whl (15.5 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.70-cp312-cp312-manylinux_2_28_x86_64.whl (16.3 MB): CPython 3.12, manylinux (glibc 2.28+), x86-64
  • cocoindex-0.1.70-cp312-cp312-manylinux_2_28_aarch64.whl (15.7 MB): CPython 3.12, manylinux (glibc 2.28+), ARM64
  • cocoindex-0.1.70-cp312-cp312-macosx_11_0_arm64.whl (15.4 MB): CPython 3.12, macOS 11.0+, ARM64
  • cocoindex-0.1.70-cp312-cp312-macosx_10_12_x86_64.whl (16.0 MB): CPython 3.12, macOS 10.12+, x86-64
  • cocoindex-0.1.70-cp311-cp311-win_amd64.whl (15.5 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.70-cp311-cp311-manylinux_2_28_x86_64.whl (16.3 MB): CPython 3.11, manylinux (glibc 2.28+), x86-64
  • cocoindex-0.1.70-cp311-cp311-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.11, manylinux (glibc 2.28+), ARM64
  • cocoindex-0.1.70-cp311-cp311-macosx_11_0_arm64.whl (15.4 MB): CPython 3.11, macOS 11.0+, ARM64
  • cocoindex-0.1.70-cp311-cp311-macosx_10_12_x86_64.whl (16.0 MB): CPython 3.11, macOS 10.12+, x86-64

