With CocoIndex, you declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they declare the transformation or formula that applies to a set of source data, and the framework takes care of data operations such as when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
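To make the dataflow idea concrete, here is a minimal plain-Python sketch of fields that are derived but never mutated, with lineage recorded along the way. It does not use the CocoIndex API; the `Row` class and its `derive` method are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """A row whose fields are only ever added, never overwritten (hypothetical helper)."""
    fields: dict[str, Any] = field(default_factory=dict)
    lineage: dict[str, tuple[str, str]] = field(default_factory=dict)

    def derive(self, out: str, src: str, fn: Callable[[Any], Any]) -> None:
        # Refuse to overwrite: a derived field is written exactly once.
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists")
        # The new field depends solely on one existing input field.
        self.fields[out] = fn(self.fields[src])
        # Record which input field and function produced the new field.
        self.lineage[out] = (src, fn.__name__)


row = Row(fields={"content": "Hello  CocoIndex"})
row.derive("normalized", "content", str.lower)
row.derive("tokens", "normalized", str.split)
```

Because every intermediate field is kept, `row.fields` shows the data before and after each step, and `row.lineage` shows how each field was produced.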

Data Freshness

As a data framework, CocoIndex puts data freshness first. Incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to optimize latency, the framework handles that for you.
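The general idea of incremental processing can be sketched in plain Python. This is an illustration only, not how CocoIndex implements it internally: it assumes change detection by content fingerprint, and `IncrementalIndex`, `fingerprint`, and `sync` are hypothetical names.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Return a stable hash of the source content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class IncrementalIndex:
    """Keeps a derived target in sync with a source, recomputing only changed rows."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.fingerprints: dict[str, str] = {}  # source key -> last seen hash
        self.target: dict[str, str] = {}        # source key -> derived value
        self.recomputed: list[str] = []         # keys recomputed by the last sync

    def sync(self, source: dict[str, str]) -> None:
        self.recomputed = []
        # Delete derived rows whose source row has disappeared.
        for key in list(self.target):
            if key not in source:
                del self.target[key]
                del self.fingerprints[key]
        # Create or update only rows whose content actually changed.
        for key, content in source.items():
            fp = fingerprint(content)
            if self.fingerprints.get(key) == fp:
                continue  # unchanged: skip the transformation entirely
            self.target[key] = self.transform(content)
            self.fingerprints[key] = fp
            self.recomputed.append(key)


index = IncrementalIndex(transform=str.upper)
index.sync({"a.md": "alpha", "b.md": "beta"})
index.sync({"a.md": "alpha", "b.md": "beta v2"})
```

After the second `sync`, only `b.md` has been recomputed; `a.md` is left untouched because its fingerprint did not change.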

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide in the CocoIndex documentation.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
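Before running a flow, it can help to confirm that a Postgres connection URL is available and well formed. The sketch below assumes the URL is supplied through an environment variable named `COCOINDEX_DATABASE_URL`; both that variable name and the fallback URL are assumptions to verify against the CocoIndex documentation, and `check_database_url` is a hypothetical helper, not part of the library.

```python
import os
from urllib.parse import urlparse


def check_database_url(url: str) -> dict[str, object]:
    """Validate a Postgres connection URL and return its main parts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// URL, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("database URL has no host")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # default Postgres port
        "database": parsed.path.lstrip("/"),
    }


# Assumed variable name and fallback value; adjust to your own setup.
url = os.environ.get(
    "COCOINDEX_DATABASE_URL",
    "postgres://cocoindex:cocoindex@localhost/cocoindex",
)
info = check_database_url(url)
```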

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
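The `chunk_size` and `chunk_overlap` arguments in the flow above control how much text each chunk holds and how much neighboring chunks share. The sketch below shows the overlap idea with a simple fixed-size sliding window over characters. It is not the algorithm used by `SplitRecursively`, which this document does not describe; `split_with_overlap` and its output shape are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Split text into fixed-size chunks that share `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Keep the character range so each chunk has a stable location key.
        chunks.append({"location": (start, end), "text": text[start:end]})
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Each chunk carries its location, which mirrors how the example flow uses `filename` and `location` together as the primary key of the exported rows.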

🚀 Examples and demo

  • Text Embedding: Index text documents with embeddings for semantic search.
  • Code Embedding: Index code embeddings for semantic search.
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search.
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM.
  • Amazon S3 Embedding: Index text documents from Amazon S3.
  • Google Drive Text Embedding: Index text documents from Google Drive.
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph.
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search.
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup.
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database.
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search served via FastAPI with a React frontend.

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
