With CocoIndex, users declare the transformation; CocoIndex creates and maintains the derived index and keeps it up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, much like formulas in a spreadsheet.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting. Instead, they declare that, for a set of source data, this is the transformation or formula. The framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
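
The spreadsheet analogy can be made concrete with a toy model in plain Python. This is only a sketch of the idea that each field is a formula over existing fields; the `derive` helper and the `Row` alias are hypothetical names for illustration and are not part of the CocoIndex API.

```python
from typing import Callable

# Hypothetical toy model of "field = formula(input fields)"; not CocoIndex API.
Row = dict[str, object]


def derive(row: Row, field: str, formula: Callable[[Row], object]) -> Row:
    """Return a new row with `field` computed purely from existing fields."""
    if field in row:
        # Existing fields are never overwritten, mirroring "no value mutation".
        raise ValueError(f"field {field!r} already exists")
    return {**row, field: formula(row)}


source = {"content": "Hello CocoIndex"}
step1 = derive(source, "lower", lambda r: str(r["content"]).lower())
step2 = derive(step1, "word_count", lambda r: len(str(r["lower"]).split()))

# Every intermediate row is still available for inspection (lineage).
print(source)
print(step1)
print(step2)
```

Because each step returns a new row instead of editing the old one, the value of any field can always be traced back to the fields it was computed from.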

Data Freshness

As a data framework, CocoIndex treats data freshness as a first-class concern. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
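
As a rough illustration of the idea (not of how CocoIndex implements it internally), incremental processing can be sketched by remembering a fingerprint of each source entry and recomputing only the entries whose fingerprint changed. The function names and the in-memory dictionaries below are assumptions made for this sketch.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable fingerprint of a source entry's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(source, seen, target, transform):
    """Sync `target` with `source`, recomputing only new or changed entries.

    source: key -> current content
    seen:   key -> fingerprint recorded at the previous run (updated in place)
    target: key -> derived value (updated in place)
    Returns (recomputed_keys, deleted_keys).
    """
    recomputed, deleted = [], []
    for key, content in source.items():
        fp = fingerprint(content)
        if seen.get(key) != fp:  # new or changed entry
            target[key] = transform(content)
            seen[key] = fp
            recomputed.append(key)
    for key in list(seen):
        if key not in source:  # entry disappeared from the source
            del seen[key]
            target.pop(key, None)
            deleted.append(key)
    return recomputed, deleted


seen, target = {}, {}
docs = {"a.md": "alpha", "b.md": "beta"}
first = incremental_update(docs, seen, target, str.upper)

docs = {"a.md": "alpha", "c.md": "gamma"}  # b.md removed, c.md added
second = incremental_update(docs, seen, target, str.upper)
print(first, second, target)
```

On the second run only `c.md` is transformed and only `b.md` is deleted; the unchanged `a.md` is left alone.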

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide in the documentation.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have one. CocoIndex uses it for incremental processing.
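
Before running a flow it can help to confirm that a Postgres connection URL is available and well formed. The sketch below assumes the URL is supplied through an environment variable named `COCOINDEX_DATABASE_URL` and uses a placeholder default URL; both are assumptions to check against the Quick Start Guide. The helper `describe_database_url` is a hypothetical name and uses only the Python standard library.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Parse a Postgres URL and return its non-secret parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// URL, got scheme {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the default Postgres port
        "database": parts.path.lstrip("/"),
        "user": parts.username,
    }


# Assumed variable name and placeholder URL; adjust to your own setup.
url = os.environ.get(
    "COCOINDEX_DATABASE_URL",
    "postgres://cocoindex:cocoindex@localhost/cocoindex",
)
print(describe_database_url(url))
```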

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
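
To build intuition for the `chunk_size` and `chunk_overlap` parameters in the flow above, here is a deliberately simplified sliding-window splitter. It is not how `SplitRecursively` works (a recursive splitter is expected to prefer natural boundaries such as headings and paragraphs), and measuring sizes in characters is an assumption of this sketch; `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into fixed-size windows.

    Consecutive windows share `chunk_overlap` characters. Each result is
    (start_offset, chunk_text), loosely analogous to a chunk's location and text.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # [(0, 'abcd'), (3, 'defg'), (6, 'ghij')]
```

With `chunk_size=2000` and `chunk_overlap=500`, this simplified scheme would advance 1500 characters per chunk, so text near a boundary appears in two neighbouring chunks and is less likely to lose its context when embedded.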

It defines an index flow like this:

[Figure: Data Flow]
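
Once rows with `filename`, `location`, `text`, and `embedding` fields have been exported, a semantic search ranks them by cosine similarity to a query embedding. The sketch below shows that ranking in plain Python over in-memory rows with toy two-dimensional vectors; in a real setup the vector index in the target store would do this work, and the `top_k` helper is a hypothetical name used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Return the k rows most similar to `query` as (score, row) pairs."""
    scored = [(cosine_similarity(query, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy rows shaped like the fields collected in the flow above.
rows = [
    {"filename": "a.md", "location": 0, "text": "first", "embedding": [1.0, 0.0]},
    {"filename": "b.md", "location": 0, "text": "second", "embedding": [0.0, 1.0]},
    {"filename": "c.md", "location": 0, "text": "third", "embedding": [1.0, 1.0]},
]
results = top_k([1.0, 0.0], rows, k=2)
for score, row in results:
    print(round(score, 3), row["filename"])
```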

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.



