With CocoIndex, you declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation and minimal changes to the target.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to have the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.
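As a rough illustration of fields that are derived rather than mutated, here is a minimal Python sketch. The `Record` class and its `derive` and `trace` methods are hypothetical names invented for this example and are not part of the CocoIndex API; the sketch only assumes that every derived field remembers which fields it was computed from.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Record:
    """Toy record: fields are only added, never overwritten, and each remembers its inputs."""

    values: Dict[str, Any] = field(default_factory=dict)
    lineage: Dict[str, Tuple[str, ...]] = field(default_factory=dict)

    def derive(self, name: str, fn: Callable[..., Any], *inputs: str) -> None:
        # A new field is computed purely from existing fields; nothing is mutated.
        if name in self.values:
            raise ValueError(f"field {name!r} already exists; fields are immutable")
        self.values[name] = fn(*(self.values[i] for i in inputs))
        self.lineage[name] = inputs

    def trace(self, name: str) -> Tuple[str, ...]:
        """Return the source fields that `name` ultimately depends on."""
        parents = self.lineage.get(name, ())
        if not parents:
            return (name,)  # no recorded inputs: treat it as a source field
        found: Tuple[str, ...] = ()
        for parent in parents:
            for root in self.trace(parent):
                if root not in found:
                    found += (root,)
        return found


doc = Record(values={"content": "Hello CocoIndex", "filename": "a.md"})
doc.derive("lowered", str.lower, "content")
doc.derive("label", lambda name, text: f"{name}: {text}", "filename", "lowered")
print(doc.values["label"])  # a.md: hello cocoindex
print(doc.trace("label"))   # ('filename', 'content')
```

Because every intermediate field stays in `values`, the data before and after each step can be inspected, and `trace` answers which source fields a result came from.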

In particular, users don't explicitly mutate data by creating, updating, or deleting it. Instead, they define the transformation (or formula) that applies to a set of source data, and the framework decides when to create, update, or delete derived data.

# import
data['content'] = flow_builder.add_source(...) 

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
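To make the "formula instead of explicit mutation" idea concrete, the following is a small sketch in plain Python rather than CocoIndex code. The `reconcile` and `shout` functions are hypothetical names made up for illustration: the caller supplies only a formula, and the helper works out which create, update, and delete operations bring the target in line with it.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def reconcile(
    source: Dict[str, Row],
    formula: Callable[[Row], Row],
    target: Dict[str, Row],
) -> List[Tuple[str, str]]:
    """Bring `target` in line with `formula(source)` and report the operations used."""
    desired = {key: formula(row) for key, row in source.items()}
    ops: List[Tuple[str, str]] = []
    for key, row in desired.items():
        if key not in target:
            ops.append(("create", key))
        elif target[key] != row:
            ops.append(("update", key))
        target[key] = row
    for key in list(target):
        if key not in desired:
            ops.append(("delete", key))
            del target[key]
    return ops


def shout(row: Row) -> Row:
    # A pure formula: the output depends only on the input fields.
    return {"out": row["content"].upper()}


target: Dict[str, Row] = {}
first = reconcile({"a": {"content": "hi"}, "b": {"content": "yo"}}, shout, target)
second = reconcile({"a": {"content": "hey"}}, shout, target)
print(first)   # [('create', 'a'), ('create', 'b')]
print(second)  # [('update', 'a'), ('delete', 'b')]
```

For simplicity this sketch re-evaluates the formula for every source row on each call; it only shows how operations can be inferred from a declaration, not how to avoid recomputation.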

Data Freshness

As a data framework, CocoIndex puts particular emphasis on data freshness. Incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles this for you.
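One common way to update only what changed is to fingerprint each source row and skip rows whose fingerprint is unchanged. The sketch below assumes that approach purely for illustration; it is not a description of how CocoIndex implements incremental processing, and `IncrementalIndex` is a hypothetical class name.

```python
import hashlib
from typing import Callable, Dict


class IncrementalIndex:
    """Toy derived index that recomputes only rows whose source content changed."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.fingerprints: Dict[str, str] = {}  # source key -> content hash
        self.target: Dict[str, str] = {}        # source key -> derived value
        self.transform_calls = 0                # counts how much work was done

    def update(self, source: Dict[str, str]) -> None:
        for key, content in source.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.fingerprints.get(key) == digest:
                continue  # unchanged: keep the existing derived value
            self.target[key] = self.transform(content)
            self.fingerprints[key] = digest
            self.transform_calls += 1
        for key in list(self.target):
            if key not in source:  # source row removed: drop its derived data
                del self.target[key]
                del self.fingerprints[key]


index = IncrementalIndex(transform=str.upper)
index.update({"a.md": "alpha", "b.md": "beta"})
index.update({"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"})
print(index.transform_calls)  # 4: two initial rows, then only b.md and c.md
print(index.target)
```

The second `update` call touches three source rows but runs the transformation only twice, which is the effect the bullet points above describe.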

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
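The flow above exports embeddings with a cosine-similarity vector index. As a hedged illustration of what that metric computes, here is a self-contained Python sketch that ranks rows shaped like the collected fields (`filename`, `location`, `text`, `embedding`). The three-dimensional vectors and the `cosine_similarity` and `top_k` helpers are made up for this example; in practice the target store would run this search using its own index.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], rows: List[Dict[str, Any]], k: int = 2
) -> List[Tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, row["embedding"]), row["text"]) for row in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up rows with tiny hand-written embeddings, for illustration only.
rows = [
    {"filename": "a.md", "location": "0", "text": "install guide", "embedding": [1.0, 0.0, 0.0]},
    {"filename": "a.md", "location": "1", "text": "api reference", "embedding": [0.0, 1.0, 0.0]},
    {"filename": "b.md", "location": "0", "text": "setup steps", "embedding": [0.8, 0.6, 0.0]},
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
for score, text in results:
    print(f"{score:.2f} {text}")
```

A brute-force scan like this is linear in the number of rows; a vector index exists so that the store can answer the same kind of query without comparing against every row.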

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.53.tar.gz (6.0 MB, source)

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

| File | Size | Interpreter | Platform |
|---|---|---|---|
| cocoindex-0.1.53-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl | 13.7 MB | PyPy | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.53-cp313-cp313t-manylinux_2_28_aarch64.whl | 13.6 MB | CPython 3.13t | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.53-cp313-cp313-win_amd64.whl | 13.5 MB | CPython 3.13 | Windows x86-64 |
| cocoindex-0.1.53-cp313-cp313-manylinux_2_28_x86_64.whl | 14.2 MB | CPython 3.13 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.53-cp313-cp313-manylinux_2_28_aarch64.whl | 13.7 MB | CPython 3.13 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.53-cp313-cp313-macosx_11_0_arm64.whl | 13.5 MB | CPython 3.13 | macOS 11.0+ ARM64 |
| cocoindex-0.1.53-cp313-cp313-macosx_10_12_x86_64.whl | 14.0 MB | CPython 3.13 | macOS 10.12+ x86-64 |
| cocoindex-0.1.53-cp312-cp312-win_amd64.whl | 13.5 MB | CPython 3.12 | Windows x86-64 |
| cocoindex-0.1.53-cp312-cp312-manylinux_2_28_x86_64.whl | 14.2 MB | CPython 3.12 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.53-cp312-cp312-manylinux_2_28_aarch64.whl | 13.7 MB | CPython 3.12 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.53-cp312-cp312-macosx_11_0_arm64.whl | 13.5 MB | CPython 3.12 | macOS 11.0+ ARM64 |
| cocoindex-0.1.53-cp312-cp312-macosx_10_12_x86_64.whl | 14.0 MB | CPython 3.12 | macOS 10.12+ x86-64 |
| cocoindex-0.1.53-cp311-cp311-win_amd64.whl | 13.5 MB | CPython 3.11 | Windows x86-64 |
| cocoindex-0.1.53-cp311-cp311-manylinux_2_28_x86_64.whl | 14.2 MB | CPython 3.11 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.53-cp311-cp311-manylinux_2_28_aarch64.whl | 13.7 MB | CPython 3.11 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.53-cp311-cp311-macosx_11_0_arm64.whl | 13.5 MB | CPython 3.11 | macOS 11.0+ ARM64 |
| cocoindex-0.1.53-cp311-cp311-macosx_10_12_x86_64.whl | 14.0 MB | CPython 3.11 | macOS 10.12+ x86-64 |

File details

Details for the file cocoindex-0.1.53.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.53.tar.gz
  • Upload date:
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.7

File hashes

Hashes for cocoindex-0.1.53.tar.gz
Algorithm Hash digest
SHA256 3ecbba90954bf36ed74027aaf009edd5e6fe8517a3cdbc40657ac4fd3f52feab
MD5 8ea02dad2381e04faeed00dc1c5f7e38
BLAKE2b-256 e3c7e6b1fc5d05aaf8d57e324658ef67b26a62679e3a26dbdcb7adc1cf204969

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.53-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.53-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 16cd06bc168c569173b720849213012503eb48e9eb2302a47458774dbd4970e6
MD5 6e579fbe3e6c82dd6d4afc71600d6e24
BLAKE2b-256 74c9e7b96e16a54b0119c4206be798e819d5a1cc0facd27138b6076373f49d98

File details

Details for the file cocoindex-0.1.53-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c4ee434f972588899a86da468a77b4b1c735beb8d416def080fc4d5d0cab0926
MD5 eccba7f67f447e183a8f50303427ccfc
BLAKE2b-256 989c2b57b5ab6470b069e07e0389bac0ac5214aa1daacb35960d8e75f73e6013

File details

Details for the file cocoindex-0.1.53-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 7a8d06cd3af2f124236545a3420e0ad2865d86f7efd4c68d3b181189ba93ed50
MD5 0f7620a8138820ef2bffe26c893fc789
BLAKE2b-256 715b6360eb5b2f21dff49beae61574d6e8a9e54c914c73de378d048e90796889

File details

Details for the file cocoindex-0.1.53-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 45d1fea0864dcbbe9edda3ae8a6a5d579f66b4c6455dca36fe862f25d7a0ad77
MD5 286c451f8e48df811184b01b25293b4c
BLAKE2b-256 5bf44798b9f795df83acf1c0701484718e549aa6871e60c4dfb1b725ea2e3772

File details

Details for the file cocoindex-0.1.53-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8aa945f4b333d2591424579c96970f14df2dadb77c6210eac3b4f90aed114238
MD5 b0ad0eeb5ec3933d90db6c06b39528a9
BLAKE2b-256 16d584993f9ada0bbcb3e6955afd8224e636730fe8619da52e09bfec464d3b66

File details

Details for the file cocoindex-0.1.53-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 904f536bda9de696a4a2853258b6dbc38bafae3fe1f147e8a95ee727b1cc40f8
MD5 5dbe99db597eb01b3ee12b72ff2fc2ae
BLAKE2b-256 8291348c88bbab422ec806954dd575bee3da96680ec7b67a59b046339de27819

File details

Details for the file cocoindex-0.1.53-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ba801d52b656fd77d01ca88df9b9ed8ec3769e7226177d6d2a739e3e042f5144
MD5 086289830d5d37141844a2f20d09e014
BLAKE2b-256 7205a61860e604f3d1717f0aaa7b5ec068f995d8ebfe8fcb51e9db06482a56b2

File details

Details for the file cocoindex-0.1.53-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 bad7594d7e23473bf2601f89c356174c6d2bd29a46f2f6412948de78787bd842
MD5 f8715cc1fecff607cfbbe4f2c51447b1
BLAKE2b-256 4bc5f4c8725d6e20bfe198cba097a885f83e1871f9fd4d74a68f51e20ba33ee4

File details

Details for the file cocoindex-0.1.53-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4ff31e6f295f9ec3985de8401478b5fb0366ba5392f08168bcd87205bf58deaf
MD5 48c1cf64d56073d94a322b4248eaee44
BLAKE2b-256 c7d07656e294d6e705f776f4a32cf45bada48517acb684c476efaa79b22c8e72

File details

Details for the file cocoindex-0.1.53-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 67508fba4b01f49c0022b368b70910de14af5a4220baad40de1058332955d0eb
MD5 6826b8a933cd344bbc65b730508a7c50
BLAKE2b-256 276e8aecdedeefacf11f5253b39a8a164a2d1305d14a2df8e9f42137f027bd52

File details

Details for the file cocoindex-0.1.53-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e6acfeb6501b13fbc12b50ec9ed449b9d4b7e528ee68599d30344383cfd2cb62
MD5 4bdac7e4075e323ccc615906594969a5
BLAKE2b-256 3014b1e322f18313a7eda8c7231ac24cb68aa2ac9d60bbefa001b4f3ab1733d0

File details

Details for the file cocoindex-0.1.53-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 269c91ef23bd000456c1f0e8b931f57590d4a8f7c5d624216c2804cf7e488257
MD5 ba164eaa59d228fc6e54e785f8fdd980
BLAKE2b-256 26d936a235a844f380ade2d6980eb90e647596486f53ba3055642874d1f62c81

File details

Details for the file cocoindex-0.1.53-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 e0cad16cf133dd968332740760ce21915ba5e2ba9b5fc0ce2afe1bb57a8f9651
MD5 2984ea7380af211b9f3c6f3f81e2b8ff
BLAKE2b-256 3fa1f611f24c8a1cfb12b76402851e451778fdbe7b3380018d0b17cb15fc7491

File details

Details for the file cocoindex-0.1.53-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 123210b69365047fdbff6fd2fdc4dd1fce7fb116618fa8aa46584d401b88da82
MD5 ae4901e19221f008bc6bd89dbbc731f9
BLAKE2b-256 94585dc246b0d2e9abda444da70f0bd188a0efac3194c887a2f3457f45c7b591

File details

Details for the file cocoindex-0.1.53-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 6e1d9eaed1d78f2a318e0fe68ac840cc00c21998d360010943ff1a82f505b2e9
MD5 852001bd6b26ff4238ec00bf8f0183f2
BLAKE2b-256 b822e5d44919178e16c39a51bcb2d7e75d7a6b8b2ab8b24d1758b79adc325505

File details

Details for the file cocoindex-0.1.53-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e71094550c733990f8e22aec8f77a1069474d5ba6a59e863944c1efed8f34a40
MD5 9f9d87861e3b68c6427d5b13d22bad05
BLAKE2b-256 a0c2dd6af95ec3b23aa8a9ec8cc7aef00a3f205bc5555aff418ede0d44cc2b0d

File details

Details for the file cocoindex-0.1.53-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.53-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 7ece23d38a519f11101611b44b866caec2a223d050c0bdc96b82017b7d64c498
MD5 5e84572ddefb01f06562c30bcd40f64c
BLAKE2b-256 ddf7acce099bf9207a6a239c899718ad448b16e9fe5b3de3d99d55a939392990
