With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation and minimal changes to the target.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting. Instead, they declare the transformation or formula for a set of source data, and the framework takes care of data operations such as when to create, update, or delete.
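To make the model concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API: the `derive` helper and the field names are hypothetical and exist only for illustration. The idea it shows is that each step adds a new field computed from existing fields and never mutates its input, so every intermediate value stays observable.

```python
from types import MappingProxyType
from typing import Any, Callable, Mapping


def derive(
    row: Mapping[str, Any],
    out_field: str,
    fn: Callable[..., Any],
    *in_fields: str,
) -> Mapping[str, Any]:
    """Return a new read-only row with `out_field` computed from `in_fields`.

    Hypothetical helper for illustration only; it is not part of CocoIndex.
    """
    if out_field in row:
        raise ValueError(f"field {out_field!r} already exists; fields are never overwritten")
    value = fn(*(row[name] for name in in_fields))
    # Build a fresh mapping instead of mutating `row`.
    return MappingProxyType({**row, out_field: value})


source = MappingProxyType({"content": "Hello  dataflow   world"})
step1 = derive(source, "normalized", lambda s: " ".join(s.split()), "content")
step2 = derive(step1, "word_count", lambda s: len(s.split()), "normalized")

# Every intermediate row is still available for inspection.
print(dict(source))
print(dict(step1))
print(dict(step2))
```

Because `source` and `step1` are left untouched, the chain of rows doubles as a simple lineage record for `word_count`.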

# import
data['content'] = flow_builder.add_source(...) 

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

Data Freshness

As a data framework, CocoIndex treats data freshness as a primary concern. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles it for you.
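The general idea behind incremental processing can be sketched with content fingerprints. This is a simplified illustration under stated assumptions, not a description of how the CocoIndex engine is implemented: the `fingerprint` and `plan_updates` functions are hypothetical names, and the sketch only decides which source keys need work.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Content hash used to detect whether a source row changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_updates(previous: dict[str, str], current_sources: dict[str, str]) -> dict[str, list[str]]:
    """Compare fingerprints stored from the last run with the current sources.

    `previous` maps source key -> fingerprint recorded by the last run.
    `current_sources` maps source key -> current content.
    """
    current = {key: fingerprint(text) for key, text in current_sources.items()}
    return {
        "create": sorted(k for k in current if k not in previous),
        "update": sorted(k for k in current if k in previous and previous[k] != current[k]),
        "delete": sorted(k for k in previous if k not in current),
        "unchanged": sorted(k for k in current if previous.get(k) == current[k]),
    }


last_run = {
    "a.md": fingerprint("alpha"),
    "b.md": fingerprint("beta"),
    "c.md": fingerprint("gamma"),
}
now = {"a.md": "alpha", "b.md": "beta v2", "d.md": "delta"}

plan = plan_updates(last_run, now)
print(plan)
```

Only the keys under `create`, `update`, and `delete` require any work against the target store; `a.md` is skipped entirely.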

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide in the CocoIndex Documentation.

Setup

  1. Install CocoIndex Python library
pip install -U cocoindex
  2. Install Postgres if you don't have one. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
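The `chunk_size` and `chunk_overlap` arguments in the example flow control how much text each chunk holds and how much neighbouring chunks share. The sketch below illustrates that relationship with a plain fixed-window splitter. It is a simplified stand-in, not the algorithm used by `SplitRecursively`, and it assumes sizes are counted in characters; the `chunk_with_overlap` function is a hypothetical name.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into fixed-size windows that overlap by `chunk_overlap` characters.

    Returns (start_offset, chunk_text) pairs. Illustration only; a real
    splitter would also respect document structure such as headings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
for start, piece in chunks:
    print(start, piece)
```

Each chunk repeats the last two characters of the previous one, which is the effect `chunk_overlap` is meant to have: text near a chunk boundary still appears with some surrounding context.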

🚀 Examples and demo

Example: Description

  • Text Embedding: Index text documents with embeddings for semantic search
  • Code Embedding: Index code embeddings for semantic search
  • PDF Embedding: Parse PDF and index text embeddings for semantic search
  • Manuals LLM Extraction: Extract structured information from a manual using LLM
  • Amazon S3 Embedding: Index text documents from Amazon S3
  • Google Drive Text Embedding: Index text documents from Google Drive
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup
  • Product Recommendation: Build real-time product recommendations with LLM and graph database
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search served via FastAPI on a React frontend

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cocoindex-0.1.45.tar.gz (5.7 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.
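As a rough guide to reading these names, a wheel file name has the form `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`, with an optional build tag after the version. The sketch below splits a name into those parts. The `parse_wheel_name` helper is a hypothetical illustration, not a packaging tool API, and it ignores the optional build tag.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel file name into its dash-separated components.

    Hypothetical helper for illustration; an optional build tag is accepted
    but not returned.
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components in {filename!r}")
    distribution, version = parts[0], parts[1]
    # The last three components are always the python, ABI, and platform tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(distribution, version, python_tag, abi_tag, platform_tag)


tags = parse_wheel_name("cocoindex-0.1.45-cp313-cp313t-manylinux_2_28_aarch64.whl")
print(tags)
```

Here `cp313` identifies CPython 3.13, `cp313t` its free-threaded ABI, and `manylinux_2_28_aarch64` a Linux ARM64 system with glibc 2.28 or newer.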

  • cocoindex-0.1.45-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.7 MB): PyPy, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.45-cp313-cp313t-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.13t, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.45-cp313-cp313-win_amd64.whl (13.4 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.45-cp313-cp313-manylinux_2_28_x86_64.whl (14.2 MB): CPython 3.13, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.45-cp313-cp313-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.13, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.45-cp313-cp313-macosx_11_0_arm64.whl (13.4 MB): CPython 3.13, macOS 11.0+ ARM64
  • cocoindex-0.1.45-cp313-cp313-macosx_10_12_x86_64.whl (13.9 MB): CPython 3.13, macOS 10.12+ x86-64
  • cocoindex-0.1.45-cp312-cp312-win_amd64.whl (13.4 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.45-cp312-cp312-manylinux_2_28_x86_64.whl (14.2 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.45-cp312-cp312-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.45-cp312-cp312-macosx_11_0_arm64.whl (13.4 MB): CPython 3.12, macOS 11.0+ ARM64
  • cocoindex-0.1.45-cp312-cp312-macosx_10_12_x86_64.whl (13.9 MB): CPython 3.12, macOS 10.12+ x86-64
  • cocoindex-0.1.45-cp311-cp311-win_amd64.whl (13.4 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.45-cp311-cp311-manylinux_2_28_x86_64.whl (14.2 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.45-cp311-cp311-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.45-cp311-cp311-macosx_11_0_arm64.whl (13.4 MB): CPython 3.11, macOS 11.0+ ARM64
  • cocoindex-0.1.45-cp311-cp311-macosx_10_12_x86_64.whl (13.9 MB): CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.45.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.45.tar.gz
  • Upload date:
  • Size: 5.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.6

File hashes

Hashes for cocoindex-0.1.45.tar.gz
Algorithm Hash digest
SHA256 80c0d93e3a953751098714957671962ad85a3e44c950c13c346454fa75f203b1
MD5 38fe77ea10dd908fb7546b5535b48482
BLAKE2b-256 bbb2d11c24c367e6526833c42d287acd9829353f106ac3576dbdd9d66a12564c

See more details on using hashes here.
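A downloaded file can be checked against the published SHA256 digest with the Python standard library. The following is a minimal sketch: the helper names `sha256_of_file` and `verify_download` are hypothetical, and the local path in the usage section assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, block_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's digest matches the expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Digest copied from the SHA256 row listed for cocoindex-0.1.45.tar.gz.
    expected = "80c0d93e3a953751098714957671962ad85a3e44c950c13c346454fa75f203b1"
    sdist = Path("cocoindex-0.1.45.tar.gz")  # assumed local download location
    if sdist.exists():
        print("match" if verify_download(sdist, expected) else "MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

SHA256 is the digest to prefer for verification; MD5 is listed for completeness but is not suitable for security checks.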

File details

Details for the file cocoindex-0.1.45-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.45-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 ba9707978d6bd431e6fe7dcb107ea7c8be0c8fb7d4ba9e1d5c4af15e0904cd85
MD5 bb13a45cabc68bfaa38f8d8637194f95
BLAKE2b-256 4baecf705735d71cec9aaafc21a66bd8202719ed9fa267505f188b2f79619498

File details

Details for the file cocoindex-0.1.45-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f220474794075f7a0bab52f2570e39e2a917a96c4ae2db85a2f461a5659e45f0
MD5 28deb138aa0eb3eaba1b393a0116a92b
BLAKE2b-256 0c120bd14e73d9e2bc5e28f795ad9f79781b2d33076b23f3750d4e899c6c8266

File details

Details for the file cocoindex-0.1.45-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 583977f5af23204460be38f1ee814e398496a72e86515352cadd757a088848d8
MD5 2cae14683314dc4410a96ca21b030d0a
BLAKE2b-256 9c96c7e411486fc172020a3cfdf31f23354ef7ec252840b3e96efc36bcc46ec5

File details

Details for the file cocoindex-0.1.45-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b87166bb2faa5556eb7d594a586748eed202f038bd357c8ae410a5108383c517
MD5 332cd969cf2c5a1734a84eb9877a08ba
BLAKE2b-256 91b3b50ce925d68a000a380d249d1426dc4038377fb0711edd1aa80fcee2534e

File details

Details for the file cocoindex-0.1.45-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 1bfd27cd5e1e9ab395ece07885e57e620657f36c723f31d459cc7a66be097f95
MD5 d7314ec7dbacab451096825a422e6f3a
BLAKE2b-256 49ea9598473b8ff4a60d07c436372ec97dd76ec928e39a8179379035ce692cab

File details

Details for the file cocoindex-0.1.45-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d43fb096b9bdd3a20c34cfeaf8e1e3a02a7675f6b5c5c9336a722bcf279b83de
MD5 afb3d35a656380909260ba5c710587dc
BLAKE2b-256 8b412d2efd1986a55c7b9a9768fe83e46d4a7f5f7dbbfc862f319e041b88294f

File details

Details for the file cocoindex-0.1.45-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 7a36a813609c11a45509eb656195a686245f9523b1d9824e6ab343fead385d0f
MD5 dcc6fbb4c9187a6aa04b8deacc44fce0
BLAKE2b-256 2fc763dfaf8cb20aa166adec85dffb96f35cba46f3f694a931e187a205860df5

File details

Details for the file cocoindex-0.1.45-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 a2301ab24262583c5b18ae780c58704dabc16ef3cf4719c21083da8f1886aca2
MD5 4240cc191d147bd69c5af595d1e626ee
BLAKE2b-256 8fe28333b433f684d57732d687a7f874a5e96b618cf48a980ff86ceb0f633a9b

File details

Details for the file cocoindex-0.1.45-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 667ec9de7a4e4054f4db65d5c97687db922db346dd6e394c60295936c292051f
MD5 d734b57f5f4b22c41098c4d63d1777d0
BLAKE2b-256 84692830a0e09dc189c2639c677fe0f7cabd0ad6a8a755f649afd7a3f480744d

File details

Details for the file cocoindex-0.1.45-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b5479d5c81da45264129f0d5f80bc949e5f2a244aebb8eea61d4d2eb77aa5aac
MD5 d87fc499d1481565670d0aed306f736c
BLAKE2b-256 c4503a1ec39f371d29e81e9def92c54438e6ba12ed620184bf10e3e9566e8f90

File details

Details for the file cocoindex-0.1.45-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 349bcc218a2690b4a87c87ec1df067e8f4424f6b7eb768037623a0dd60585a54
MD5 181843d249b7df9f83bbadad7e624448
BLAKE2b-256 7f95f8cf14c0843f78a80e70f525c072e3050cbcac295b5e990ca7de2ce4d202

File details

Details for the file cocoindex-0.1.45-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 219afb619624e16e924e4d611e551d1585fdb6d18874fad731593df2c0cfdb53
MD5 2ee2715e3759db900d4f2c955dc804bb
BLAKE2b-256 a8ac77b494fe1b2b229a017be76554750bbae4f753cecdbd0406da8a86a11ed2

File details

Details for the file cocoindex-0.1.45-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 3c8bbd41dbf67cb809ef46dc7069745f4145809ea6df8ed69cf3a6c192b6f256
MD5 15abeb44ab4f05b0af2cd0d1c222f1bb
BLAKE2b-256 6c53966ef8e0592c91913ec6b1799cd3943045834f19a0bbc9148c326ccfd135

File details

Details for the file cocoindex-0.1.45-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 6f719d5bdf95a1d5bdd3f628161bd878394e663ccc582ea3b01caf5b8417fa3b
MD5 c94d731deb47fb8dfe0b0e87af0aeea5
BLAKE2b-256 a1a684b6082e0b469865f00e0da42d6243139e63f600aa9749014a8d4abc0d7c

File details

Details for the file cocoindex-0.1.45-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8826595af45a96b28d328a028d812d30e07803df98d442070de012b58d73e3e4
MD5 644f5f50f65ed284b7e4f674abedff60
BLAKE2b-256 2e2f1e5a7d20acc2e80167402962976b18b9cfc6f294c029aadbf4404ea6b2b4

File details

Details for the file cocoindex-0.1.45-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b84f084c7f2f89b845eee2a8cb058e3011fe7496cada47da3928caa4103900b3
MD5 b5dc7e7db05ae28680fef51c477bcacd
BLAKE2b-256 161d78ef8fd8b79e9c65c233826f6eb9677c62e8a4ac2824791c2a0cf83f87d5

File details

Details for the file cocoindex-0.1.45-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.45-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ed94c26576a5846f05767b55f60ff54c9818db5a15c8c5dfde100f26ffbb799b
MD5 4844bc5e6cb0fbc43846cbdc87d75123
BLAKE2b-256 355ad76f26df2270ada297f83f935c28c588c95f524a730555023e1f3f00f307
