
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation and changes.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, in CocoIndex data and data operations are first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting. Instead, they declare the transformation (or formula) that applies to a set of source data, and the framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform (the parentheses let the chained calls span several lines)
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
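
To make the "new field from input fields, no mutation" idea concrete, here is a minimal plain-Python sketch. It does not use CocoIndex at all: `Row` and `derive_field` are hypothetical names introduced only for illustration, and the real framework works quite differently under the hood.

```python
from typing import Callable

# Hypothetical helper types and names; these are NOT CocoIndex APIs.
Row = dict[str, object]


def derive_field(row: Row, out: str, fn: Callable[..., object], *inputs: str) -> Row:
    """Return a new row in which `out` is computed only from the named input fields."""
    if out in row:
        # Existing fields are never overwritten, so every value keeps a single origin.
        raise ValueError(f"field {out!r} already exists")
    return {**row, out: fn(*(row[name] for name in inputs))}


source = {"content": "Hello  dataflow world"}
step1 = derive_field(source, "words", lambda text: text.split(), "content")
step2 = derive_field(step1, "word_count", lambda words: len(words), "words")

# The source row is untouched, and every intermediate row can still be inspected.
print(source)
print(step1)
print(step2)
```

Because each step returns a new row and refuses to overwrite a field, the value of any field can be traced back to the inputs that produced it, which is the property the lineage claim above relies on.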

Data Freshness

As a data framework, CocoIndex puts particular emphasis on data freshness. Incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
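
As a rough illustration of the idea (not CocoIndex's actual mechanism, which is not described here), the sketch below detects changes by comparing content fingerprints between runs and plans work only for rows that differ. The names `fingerprint` and `plan_updates` are hypothetical.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable fingerprint of a source row's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_updates(previous: dict[str, str], current: dict[str, str]):
    """Compare the last run's fingerprints with the current source contents.

    previous: key -> fingerprint recorded by the last run
    current:  key -> content as it exists now
    Returns (keys to recompute, keys to delete, fingerprints to store for next run).
    """
    new_state = {key: fingerprint(content) for key, content in current.items()}
    to_upsert = sorted(k for k, fp in new_state.items() if previous.get(k) != fp)
    to_delete = sorted(k for k in previous if k not in new_state)
    return to_upsert, to_delete, new_state


# Run 1: everything is new, so everything is processed.
upsert1, delete1, state = plan_updates({}, {"a.md": "alpha", "b.md": "beta"})

# Run 2: only the changed file and the new file are processed.
upsert2, delete2, state = plan_updates(
    state, {"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"}
)

# Run 3: a removed source row becomes a delete in the target; nothing is recomputed.
upsert3, delete3, state = plan_updates(state, {"b.md": "beta v2", "c.md": "gamma"})

print(upsert1, delete1)
print(upsert2, delete2)
print(upsert3, delete3)
```

The stored fingerprints play the role of persisted state between runs; in this sketch they live in memory, whereas a real pipeline would keep them in a durable store.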

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it already. CocoIndex uses it for incremental processing.
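
If you want to confirm that the install step worked, one option is to query the installed distribution's version from Python. This is a small standard-library sketch; `installed_version` is a hypothetical helper name, not part of CocoIndex.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cocoindex")
if version is None:
    print("cocoindex is not installed; run: pip install -U cocoindex")
else:
    print(f"cocoindex {version} is installed")
```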

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
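
Two settings in the flow above may be unfamiliar: the `chunk_size` / `chunk_overlap` pair and the cosine similarity metric. The plain-Python sketch below shows what such settings typically mean. It is an assumption-laden illustration: `chunk_with_overlap` is a naive fixed-width sliding window measured in characters, not the algorithm used by `SplitRecursively` (whose units and splitting rules are not specified here), and `cosine_similarity` is the textbook formula rather than the database's implementation.

```python
import math
from typing import Sequence


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive sliding-window chunker: consecutive chunks share `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Overlap trades extra storage and embedding work for context that would otherwise be cut at chunk boundaries, and cosine similarity ignores vector length, so two embeddings pointing in the same direction score as close regardless of magnitude.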

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.



