With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, much like formulas in a spreadsheet.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, or deleting it. Instead, they declare the transformation (or formula) that applies to a set of source data, and the framework decides when to create, update, or delete the derived data.

# import
data['content'] = flow_builder.add_source(...) 

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
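
The dataflow idea in the outline above can be illustrated without CocoIndex itself. The following is a minimal sketch in plain Python, not the CocoIndex API: `Record`, `derive`, and `lineage` are hypothetical names chosen for illustration. Each derived field is computed only from named input fields, existing fields are never overwritten, and the inputs of each field are recorded so that lineage can be inspected.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Record:
    """Hypothetical record: fields are write-once and lineage is tracked."""
    values: Dict[str, Any] = field(default_factory=dict)
    lineage: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    def derive(self, name: str, inputs: Tuple[str, ...], fn: Callable[..., Any]) -> None:
        # A transformation may only create a new field; it never mutates one.
        if name in self.values:
            raise ValueError(f"field {name!r} already exists")
        self.values[name] = fn(*(self.values[i] for i in inputs))
        # Remember which fields this one was computed from.
        self.lineage[name] = inputs


doc = Record(values={"content": "Hello CocoIndex"})
doc.derive("lowered", ("content",), str.lower)
doc.derive("word_count", ("lowered",), lambda text: len(text.split()))

print(doc.values["word_count"])   # 2
print(doc.lineage["word_count"])  # ('lowered',)
```

Because no field is ever overwritten, every intermediate value stays available for inspection, which is the property the section describes as observability with lineage.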

Data Freshness

As a data framework, CocoIndex puts data freshness first. Incremental processing is one of its core capabilities.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
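
As a rough illustration of the incremental idea, the sketch below compares content fingerprints from a previous run with the current source and plans only the necessary work. It is an assumption-laden toy in plain Python, not a description of how CocoIndex implements change detection; `fingerprint` and `plan_updates` are hypothetical helper names.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: str) -> str:
    """Stable content hash used to detect whether a source row changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_updates(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare stored fingerprints with the current source contents.

    `previous` maps source key -> fingerprint recorded at the last run.
    `current` maps source key -> current content.
    """
    plan: Dict[str, List[str]] = {"create": [], "update": [], "delete": [], "skip": []}
    for key, content in current.items():
        if key not in previous:
            plan["create"].append(key)
        elif previous[key] != fingerprint(content):
            plan["update"].append(key)
        else:
            plan["skip"].append(key)  # unchanged: no recomputation needed
    # Keys seen last time but missing now must be removed from the target.
    plan["delete"] = [key for key in previous if key not in current]
    return plan


previous = {
    "a.md": fingerprint("alpha"),
    "b.md": fingerprint("beta"),
    "c.md": fingerprint("gamma"),
}
current = {"a.md": "alpha", "b.md": "beta v2", "d.md": "delta"}

print(plan_updates(previous, current))
# {'create': ['d.md'], 'update': ['b.md'], 'delete': ['c.md'], 'skip': ['a.md']}
```

Only `b.md` and `d.md` would need to be transformed again; `a.md` is skipped entirely, which is where the savings come from.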

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide in the documentation.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
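
To confirm that the install step worked, a small check along the following lines may help. It uses only the Python standard library; `installed_version` is a hypothetical helper name, and the sketch makes no assumption about what the `cocoindex` package exposes beyond its distribution name.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    # find_spec returns None when a top-level module cannot be imported.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cocoindex")
print(version if version is not None else "cocoindex is not installed")
```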

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
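
To give a feel for what the `chunk_size` and `chunk_overlap` parameters in the flow above control, here is a deliberately simplified sketch. It uses a fixed-size sliding window over characters, which is an assumption made for illustration: it is not the algorithm behind `SplitRecursively`, and `split_with_overlap` is a hypothetical helper.

```python
from typing import List, Tuple


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[Tuple[int, str]]:
    """Return (start_offset, chunk_text) pairs using a fixed-size sliding window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


for offset, chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(offset, chunk)
# 0 abcd
# 2 cdef
# 4 efgh
# 6 ghij
```

Each chunk shares its last `chunk_overlap` characters with the start of the next one, so text that straddles a boundary still appears whole in at least one chunk. The start offset plays a role similar to the `location` field used as part of the primary key in the flow.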

It defines an indexing flow like this:

Data Flow
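
The export step in the example flow requests a vector index with a cosine similarity metric. As a hedged illustration of what that metric computes, the sketch below ranks a few toy rows against a query vector in plain Python. The vectors and the `top_k` helper are made up for the example; a real deployment would let the target store perform this search.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], rows: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the texts of the k rows most similar to the query."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(query, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy two-dimensional "embeddings", invented for this example.
rows = [
    ("apples", [1.0, 0.0]),
    ("oranges", [0.0, 1.0]),
    ("pears", [1.0, 1.0]),
]
print(top_k([1.0, 0.1], rows))  # ['apples', 'pears']
```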

🚀 Examples and demo

  • Text Embedding: Index text documents with embeddings for semantic search.
  • Code Embedding: Index code embeddings for semantic search.
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search.
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM.
  • Amazon S3 Embedding: Index text documents from Amazon S3.
  • Google Drive Text Embedding: Index text documents from Google Drive.
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph.
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search.
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup.
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database.
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend.

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
