With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source is updated, with minimal computation and changes.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI (creating embeddings, building knowledge graphs, or performing other data transformations) and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to let the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike workflow orchestration frameworks, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. It follows the dataflow programming model: each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.
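As a minimal plain-Python sketch of this idea (not CocoIndex's API; `Row` and `derive` are hypothetical names chosen for illustration), each transformation below returns a new row with one added field computed only from named input fields, never overwrites an existing field, and records which inputs the new field came from:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Row:
    """Hypothetical immutable row: field values plus per-field lineage."""

    values: dict[str, Any]
    # For each derived field, the input fields it was computed from.
    lineage: dict[str, tuple[str, ...]] = field(default_factory=dict)


def derive(row: Row, out_field: str, fn: Callable[..., Any], *in_fields: str) -> Row:
    """Return a NEW row whose `out_field` is computed only from `in_fields`.

    Existing fields are never overwritten, so every intermediate value
    stays observable and its lineage is recorded.
    """
    if out_field in row.values:
        raise ValueError(f"field {out_field!r} already exists; fields are not mutated")
    value = fn(*(row.values[name] for name in in_fields))
    return Row(
        values={**row.values, out_field: value},
        lineage={**row.lineage, out_field: tuple(in_fields)},
    )


doc = Row(values={"content": "Hello CocoIndex"})
lowered = derive(doc, "lower", str.lower, "content")
words = derive(lowered, "words", str.split, "lower")

print(words.values["words"])  # ['hello', 'cocoindex']
print(words.lineage)          # {'lower': ('content',), 'words': ('lower',)}
print(doc.values)             # {'content': 'Hello CocoIndex'}
```

Because `doc` and `lowered` are left untouched, every intermediate state stays inspectable, and `lineage` answers "where did this field come from?" without extra bookkeeping.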

In particular, users don't explicitly mutate data by creating, updating, or deleting it. Instead, they declare something like "for this set of source data, this is the transformation or formula", and the framework takes care of the data operations, such as when to create, update, or delete.
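The declarative side can be sketched the same way. In this hypothetical example (plain Python, not CocoIndex code; `derive_target` and `plan_operations` are made-up names), the user supplies only a formula from source rows to target rows, and a framework-like step diffs the previous target against the new one to decide what to create, update, or delete. For simplicity it recomputes every row; it only illustrates how a declared formula turns into data operations.

```python
from typing import Callable

# Hypothetical types for illustration: key -> content, key -> derived value.
Source = dict[str, str]
Target = dict[str, str]


def derive_target(source: Source, formula: Callable[[str], str]) -> Target:
    """All the user declares: each target row is formula(source row)."""
    return {key: formula(content) for key, content in source.items()}


def plan_operations(old: Target, new: Target) -> dict[str, list[str]]:
    """Hypothetical framework step: diff two target states into operations."""
    return {
        "create": sorted(k for k in new if k not in old),
        "update": sorted(k for k in new if k in old and new[k] != old[k]),
        "delete": sorted(k for k in old if k not in new),
    }


formula = str.upper
v1 = {"a.md": "alpha", "b.md": "beta", "old.md": "stale"}
v2 = {"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"}  # b edited, c added, old removed

ops = plan_operations(derive_target(v1, formula), derive_target(v2, formula))
print(ops)  # {'create': ['c.md'], 'update': ['b.md'], 'delete': ['old.md']}
```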

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

Data Freshness

As a data framework, CocoIndex puts a strong emphasis on data freshness, and incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes source updates show up in the target store quickly. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles this for you.
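A rough sketch of the "only update what changed" part, under our own simplifying assumptions: a SHA-256 fingerprint of each source row's content stands in for change detection, and in-memory dictionaries stand in for persistent tracking state. `IncrementalIndex` is a hypothetical class; this is not how CocoIndex implements change data capture or its Postgres-backed tracking.

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Hypothetical incremental indexer: re-runs `transform` only for changed rows."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.fingerprints: dict[str, str] = {}  # source key -> content hash
        self.target: dict[str, str] = {}        # source key -> derived value
        self.calls = 0                          # how many times transform ran

    @staticmethod
    def _fingerprint(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def sync(self, source: dict[str, str]) -> None:
        # Rows that disappeared from the source are removed from the target.
        for key in list(self.target):
            if key not in source:
                del self.target[key]
                del self.fingerprints[key]
        # Only new or modified rows are recomputed.
        for key, content in source.items():
            fp = self._fingerprint(content)
            if self.fingerprints.get(key) != fp:
                self.target[key] = self.transform(content)
                self.fingerprints[key] = fp
                self.calls += 1


index = IncrementalIndex(transform=str.upper)
index.sync({"a.md": "alpha", "b.md": "beta"})
print(index.calls)   # 2
index.sync({"a.md": "alpha", "b.md": "beta v2"})
print(index.calls)   # 3 (only b.md was recomputed)
print(index.target)  # {'a.md': 'ALPHA', 'b.md': 'BETA V2'}
```

The expensive `transform` runs once per changed row rather than once per row per sync, which is the property that keeps the target fresh cheaply.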

Quick Start

If you're new to CocoIndex, we recommend checking out the Quick Start Guide in the CocoIndex Documentation.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't already have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
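In the example flow, `SplitRecursively()` is called with `chunk_size=2000, chunk_overlap=500`. The sketch below only illustrates what those two numbers mean, using a fixed-size sliding window over characters. That is an assumption for illustration: `SplitRecursively` takes a `language` argument and likely splits along document structure rather than at fixed offsets, and its exact size units and boundary rules may differ. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into (start_offset, chunk) pairs.

    Each chunk has at most `chunk_size` characters and starts
    `chunk_size - chunk_overlap` characters after the previous one,
    so neighbouring chunks share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks: list[tuple[int, str]] = []
    start = 0
    while True:
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


# With the example's parameters, a 5000-character document yields 3 chunks.
chunks = sliding_window_chunks("x" * 5000, chunk_size=2000, chunk_overlap=500)
print([(start, len(chunk)) for start, chunk in chunks])  # [(0, 2000), (1500, 2000), (3000, 2000)]
```

The overlap means text near a chunk boundary appears in two chunks, so a sentence cut by one boundary is still embedded intact in a neighbouring chunk.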

🚀 Examples and demo

  • Text Embedding: Index text documents with embeddings for semantic search.
  • Code Embedding: Index code embeddings for semantic search.
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search.
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM.
  • Amazon S3 Embedding: Index text documents from Amazon S3.
  • Google Drive Text Embedding: Index text documents from Google Drive.
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph.
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search.
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup.
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database.
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend.

More are coming, so stay tuned 👀!
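Several of these examples end in semantic search over the exported embeddings. The example flow asks the Postgres target to index the `embedding` field with `VectorSimilarityMetric.COSINE_SIMILARITY`; as a rough illustration of what that metric ranks by, here is a brute-force pure-Python version over made-up 3-dimensional vectors. The row keys and the `top_k` helper are hypothetical; in practice the database's vector index would answer such queries instead of a Python loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force ranking of stored embeddings by cosine similarity to `query`."""
    scored = [(key, cosine_similarity(query, emb)) for key, emb in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors are much larger.
rows = {
    "a.md#chunk0": [1.0, 0.0, 0.0],
    "a.md#chunk1": [0.7, 0.7, 0.0],
    "b.md#chunk0": [0.0, 0.0, 1.0],
}
for key, score in top_k([1.0, 0.1, 0.0], rows, k=2):
    print(key, round(score, 3))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for sentence embeddings.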

📖 Documentation

For detailed documentation, including a Quickstart guide, visit the CocoIndex Documentation.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

We welcome you with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Source Distribution

  • cocoindex-0.1.57.tar.gz (6.0 MB)

Built Distributions

  • cocoindex-0.1.57-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.8 MB): PyPy, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.57-cp313-cp313t-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.13t, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.57-cp313-cp313-win_amd64.whl (13.7 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.57-cp313-cp313-manylinux_2_28_x86_64.whl (14.4 MB): CPython 3.13, manylinux glibc 2.28+ x86-64
  • cocoindex-0.1.57-cp313-cp313-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.13, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.57-cp313-cp313-macosx_11_0_arm64.whl (13.6 MB): CPython 3.13, macOS 11.0+ ARM64
  • cocoindex-0.1.57-cp313-cp313-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.13, macOS 10.12+ x86-64
  • cocoindex-0.1.57-cp312-cp312-win_amd64.whl (13.7 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.57-cp312-cp312-manylinux_2_28_x86_64.whl (14.4 MB): CPython 3.12, manylinux glibc 2.28+ x86-64
  • cocoindex-0.1.57-cp312-cp312-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.12, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.57-cp312-cp312-macosx_11_0_arm64.whl (13.6 MB): CPython 3.12, macOS 11.0+ ARM64
  • cocoindex-0.1.57-cp312-cp312-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.12, macOS 10.12+ x86-64
  • cocoindex-0.1.57-cp311-cp311-win_amd64.whl (13.7 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.57-cp311-cp311-manylinux_2_28_x86_64.whl (14.4 MB): CPython 3.11, manylinux glibc 2.28+ x86-64
  • cocoindex-0.1.57-cp311-cp311-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.11, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.57-cp311-cp311-macosx_11_0_arm64.whl (13.6 MB): CPython 3.11, macOS 11.0+ ARM64
  • cocoindex-0.1.57-cp311-cp311-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.11, macOS 10.12+ x86-64

File hashes

cocoindex-0.1.57.tar.gz (uploaded via maturin/1.9.0 using Trusted Publishing)
  SHA256       4dcc60e95667d909ca33890621f55ef398ce432ca4d8c7816343f39ba23cbaf5
  MD5          2d4a1df51a2e83cfbfdc7875e75eba18
  BLAKE2b-256  4c2d0dcaa4f0c679ac544c2ffff92ecf7e679871daf2f28481cda76c8e4d8060

cocoindex-0.1.57-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
  SHA256       2658d8aa7c52cbe66040f056517a493f170116a1be591125ac91434ac844aeac
  MD5          b7c930562e113c94e0ce7370aaf4e9fe
  BLAKE2b-256  3db683e8cc50428be3b1e9487c93eec1017610864612b0fc23846d0792593833

cocoindex-0.1.57-cp313-cp313t-manylinux_2_28_aarch64.whl
  SHA256       6234bd458c4a04bf2ee0ffb6880fb64c3504b065dd8b2a230dda861bbdad9f32
  MD5          36fc85eb5855f7f85b8ddac7d160fe37
  BLAKE2b-256  8289513a560c8515329ab5a4ccab12cdc6ad35dbb545940ce4a7cf5bc201e8bb

cocoindex-0.1.57-cp313-cp313-win_amd64.whl
  SHA256       9952a24fdca3d25d19c987e1b9f36795beb71d373cb7b3408c3abef421ea2944
  MD5          fcd7c6d798351694be26d88d5041df86
  BLAKE2b-256  10d7069a0dc9a54ad984ddb4b948cfbf9febc5790158a5c1239d3fb3f4c9d47f

cocoindex-0.1.57-cp313-cp313-manylinux_2_28_x86_64.whl
  SHA256       6b26a5092025fd8233377ed975fb03ce9cdb40dcfc791e6867b0ac2d2d1861db
  MD5          b77c77b1d2b0960239d32357cd6fd1cd
  BLAKE2b-256  d30cac6ef548a5f9407c882f66d698a5b01bfb18dbad01dfd9edde37f5380b12

cocoindex-0.1.57-cp313-cp313-manylinux_2_28_aarch64.whl
  SHA256       4bc504399f8f3713f68624199d0de0c6c64d462472c69013e87ab494411fd547
  MD5          4e7c019c9f1a06f45eda1750195f0173
  BLAKE2b-256  752b4f28eedd32d8bbe772f5e2c83318737017502c1bb2f9fbb23abf6b9334e8

cocoindex-0.1.57-cp313-cp313-macosx_11_0_arm64.whl
  SHA256       6d5c008c8ca099898537ced8a47372621500067db6c5b41cd8c5a1b0e8d77d13
  MD5          9ef9bc4eb8bb3aa267949ae87b4e904c
  BLAKE2b-256  3a03664c709467f2fcf123c04593834a676508cd061b4adcb6a2e16a985b7f31

cocoindex-0.1.57-cp313-cp313-macosx_10_12_x86_64.whl
  SHA256       85fb1b6b3cb64a64a7581af5008231ceb22ff05abde862cd3700e8b1ab33e467
  MD5          6662604c3f625af1759c199ddb866705
  BLAKE2b-256  d189984b3ad39f02f7ff237b32cc34805df060253ee96240a5f05bf71c8ed614

cocoindex-0.1.57-cp312-cp312-win_amd64.whl
  SHA256       cf2c8c0b8f972761bb33590b95942b983a875a8a1c3b1239664614e6b9f13f8c
  MD5          51db89d7eadf55fb2d74be8784d34fd0
  BLAKE2b-256  7c5a9a7fb05c48fcd66720ee094a2d14b4009060eef5f14e9bbdabed5832ed96

cocoindex-0.1.57-cp312-cp312-manylinux_2_28_x86_64.whl
  SHA256       1047cb1704fba39d33c10bfd1d454a344817d55c999a891563599980e0475fd7
  MD5          392d1035fd04fe623f99ede6f1670dd7
  BLAKE2b-256  fc3f5655aaa1e1c8ec04730d92d089932dece60fd29a77e310473fba90a7c913

cocoindex-0.1.57-cp312-cp312-manylinux_2_28_aarch64.whl
  SHA256       c5662218df9c7b5744501a5e112c2957da6a771bf44ca4775a51b9834608abaf
  MD5          c8007b4a0adb0921ae51ab397a837129
  BLAKE2b-256  0d6c9b6f3eb56097e72db9b4f225812b9603ebe3874c635fbbc7b4dbf36b15ed

cocoindex-0.1.57-cp312-cp312-macosx_11_0_arm64.whl
  SHA256       791b43e8871d7e147ef3c51d1897e9924f97088e3b8b2d777efc5c789194e4ef
  MD5          d26b6e5eb9f764d0c3306aaea33ac1e8
  BLAKE2b-256  08b93510b8f9349ea76222778bbba7b32e52c0ec884066c819ae1dd2d2fabd88

cocoindex-0.1.57-cp312-cp312-macosx_10_12_x86_64.whl
  SHA256       d00433f9ffe01428c313d0f6ad64b59274ffa017ffb031987591d0cee057f76d
  MD5          3a5cdb7f35a71eea01e6a8c1da9cbd3c
  BLAKE2b-256  2255c0ae0a11de03669596db0b6cdd7b41556f266b9efc9db18fcbc9dc0261f9

cocoindex-0.1.57-cp311-cp311-win_amd64.whl
  SHA256       0d0c28b5e9952e1784958fa9b4d06e489618b722843180b82f29a3e7a23aa41c
  MD5          46dee7212151e7ddceedefe6cc0fbf39
  BLAKE2b-256  869645c74da6b1bade47567ae38ba669b3f517b600b0bdc3d4ae7cd010dc333f

cocoindex-0.1.57-cp311-cp311-manylinux_2_28_x86_64.whl
  SHA256       8f5190d82dfc631f3d84aaecab72d2c0345c56c2d3b96bac1993784f05a03893
  MD5          e07f341889da0fb1f0614c7bae3789fc
  BLAKE2b-256  cd1608c430b9646cc6c60c172d5f6faa962efb805f061a0f8e478b353979606e

cocoindex-0.1.57-cp311-cp311-manylinux_2_28_aarch64.whl
  SHA256       2117e37b3e9a6d1468ca4c6c11764631710a1916d9c4d452a250be1b890a9a18
  MD5          1ea19de7abcae890181d34c9c8fdd162
  BLAKE2b-256  c34e80b0c2191dde29466772f75a5f24461d01676b7def4a89c273c9b5be4588

cocoindex-0.1.57-cp311-cp311-macosx_11_0_arm64.whl
  SHA256       0d4c8b38cca8ef51e29e1c2c58a4d4281c20b440dfc45081c54fc4c83e8265f4
  MD5          5de0b2cf410112bb72eba63018ec2737
  BLAKE2b-256  9df8cc45d6da8c3eb7c103b82d7bcd5cbc067eae45efef760e7d9ff11b2d2641

cocoindex-0.1.57-cp311-cp311-macosx_10_12_x86_64.whl
  SHA256       5bc7c00df9caccee8f66882e07469cb3e32f0f29ed301fc38b6b793b806b23a2
  MD5          8ec9f0fc127b7b0474aeddc9264f092b
  BLAKE2b-256  42394b2c99479412ccac1d27dc5f22ea5fa00217640dacbfcf6f997f5b26497d
