With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI

Ultra-performant data transformation framework for AI, with its core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it super easy to transform data with AI workloads, and keep source data and target in sync effortlessly.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or performing any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow, in about 100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting it. They only need to define the transformation (formula) for a set of source data.
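To make the dataflow idea concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API; the `Row` class and its `derive` method are hypothetical names invented for illustration. Each derived field is computed only from named input fields, is written exactly once, and records which fields it came from.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """Hypothetical record whose fields are only ever added, never mutated."""
    fields: dict[str, Any]
    lineage: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def derive(self, name: str, inputs: tuple[str, ...], fn: Callable[..., Any]) -> None:
        # Refuse to overwrite: every field is written exactly once.
        if name in self.fields:
            raise ValueError(f"field {name!r} already exists")
        # The new value depends only on the named input fields.
        self.fields[name] = fn(*(self.fields[i] for i in inputs))
        # Remember which fields this one was computed from.
        self.lineage[name] = inputs


row = Row({"content": "Hello dataflow world"})
row.derive("words", ("content",), str.split)
row.derive("word_count", ("words",), len)

print(row.fields["word_count"])   # value derived from "words"
print(row.lineage["word_count"])  # the inputs it was derived from
```

Because nothing is overwritten, every intermediate value stays inspectable, which is the property the paragraph above calls lineage.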

Build like LEGO

Native built-ins for different sources, targets, and transformations. The interface is standardized, so switching between components is a one-line code change.
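The one-line switch can be sketched in plain Python as follows. This is an illustration of the design idea only, not CocoIndex code: `Target`, `InMemoryTarget`, `LogTarget`, and `run_pipeline` are hypothetical names. Because both targets satisfy the same interface, the pipeline does not change when the target does.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every target implements."""
    def write(self, key: str, value: str) -> None: ...


class InMemoryTarget:
    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.rows[key] = value


class LogTarget:
    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, key: str, value: str) -> None:
        self.lines.append(f"{key}={value}")


def run_pipeline(docs: dict[str, str], target: Target) -> None:
    # The pipeline only relies on the shared `write` method.
    for name, text in docs.items():
        target.write(name, text.upper())


docs = {"a.md": "hello", "b.md": "world"}
target = InMemoryTarget()  # swap in LogTarget() here; nothing else changes
run_pipeline(docs, target)
print(target.rows)
```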

Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation on source or logic change;
  • (re-)processing of only the necessary portions, reusing the cache when possible.
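The two points above can be sketched in plain Python. This is not how CocoIndex is implemented; it is a toy model under the assumption that a cached result can be keyed by a fingerprint of the input content plus a version string for the transformation logic. `IncrementalIndex` and `logic_version` are hypothetical names.

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Toy incremental index: recompute only when content or logic changes."""

    def __init__(self, transform: Callable[[str], str], logic_version: str) -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.cache: dict[str, str] = {}   # fingerprint -> cached output
        self.target: dict[str, str] = {}  # source key -> derived output
        self.recomputed = 0               # number of transform calls so far

    def _fingerprint(self, content: str) -> str:
        # A change in either the logic version or the content changes the key.
        payload = f"{self.logic_version}\0{content}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def update(self, source: dict[str, str]) -> None:
        # Drop derived rows whose source rows disappeared.
        for key in list(self.target):
            if key not in source:
                del self.target[key]
        for key, content in source.items():
            fp = self._fingerprint(content)
            if fp not in self.cache:
                # Cache miss: this is the only place work is done.
                self.cache[fp] = self.transform(content)
                self.recomputed += 1
            self.target[key] = self.cache[fp]


index = IncrementalIndex(str.upper, logic_version="v1")
index.update({"a": "alpha", "b": "beta"})
first_pass = index.recomputed
index.update({"a": "alpha", "b": "beta2"})  # only "b" changed
second_pass = index.recomputed - first_pass
print(first_pass, second_pass)
```

In this sketch, bumping `logic_version` invalidates every fingerprint, which models reprocessing after a logic change.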

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
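After the install step, a quick way to confirm the package is visible to your interpreter is to ask the standard library for its version. This is a small sketch using `importlib.metadata`; the helper name `installed_version` is made up for this example, and it does not check the Postgres setup.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cocoindex")
print(f"cocoindex version: {version}" if version else "cocoindex is not installed")
```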

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
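To give a feel for what `chunk_size` and `chunk_overlap` mean in the flow above, here is a deliberately naive sketch: a fixed-width sliding window over characters. It is not the algorithm behind `SplitRecursively`, whose boundary handling is not described here; `chunk_with_overlap` is a hypothetical helper, and the small sizes are chosen only to keep the example readable.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into (start_offset, chunk) pairs that overlap by `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
for location, chunk_text in chunks:
    print(location, chunk_text)
```

Each chunk shares its last `chunk_overlap` characters with the start of the next one, so text near a boundary appears in both neighbours.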

It defines an index flow like this:

Data Flow
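The export step in the flow requests a vector index with a cosine similarity metric. As a reminder of what that metric computes, here is a plain-Python sketch; it is independent of CocoIndex and Postgres, and `cosine_similarity` and `top_k` are hypothetical helper names. A real vector index would avoid the brute-force scan used here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          rows: list[tuple[str, Sequence[float]]],
          k: int) -> list[tuple[str, float]]:
    """Brute-force search: score every row, return the k most similar keys."""
    scored = [(key, cosine_similarity(query, emb)) for key, emb in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


rows = [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [1.0, 1.0])]
results = top_k([1.0, 0.0], rows, k=2)
print([key for key, _ in results])
```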

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Azure Blob Storage Embedding | Index text documents from Azure Blob Storage |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |
| Paper Metadata | Index papers in PDF files, and build metadata tables for each paper |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on GitHub to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.66.tar.gz (9.4 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cocoindex-0.1.66-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (15.3 MB)

Uploaded PyPy, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.66-cp313-cp313t-manylinux_2_28_aarch64.whl (15.3 MB)

Uploaded CPython 3.13t, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.66-cp313-cp313-win_amd64.whl (15.2 MB)

Uploaded CPython 3.13, Windows x86-64

cocoindex-0.1.66-cp313-cp313-manylinux_2_28_x86_64.whl (15.9 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.66-cp313-cp313-manylinux_2_28_aarch64.whl (15.3 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.66-cp313-cp313-macosx_11_0_arm64.whl (15.1 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

cocoindex-0.1.66-cp313-cp313-macosx_10_12_x86_64.whl (15.6 MB)

Uploaded CPython 3.13, macOS 10.12+ x86-64

cocoindex-0.1.66-cp312-cp312-win_amd64.whl (15.2 MB)

Uploaded CPython 3.12, Windows x86-64

cocoindex-0.1.66-cp312-cp312-manylinux_2_28_x86_64.whl (15.9 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.66-cp312-cp312-manylinux_2_28_aarch64.whl (15.3 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.66-cp312-cp312-macosx_11_0_arm64.whl (15.1 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

cocoindex-0.1.66-cp312-cp312-macosx_10_12_x86_64.whl (15.7 MB)

Uploaded CPython 3.12, macOS 10.12+ x86-64

cocoindex-0.1.66-cp311-cp311-win_amd64.whl (15.2 MB)

Uploaded CPython 3.11, Windows x86-64

cocoindex-0.1.66-cp311-cp311-manylinux_2_28_x86_64.whl (15.9 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.66-cp311-cp311-manylinux_2_28_aarch64.whl (15.3 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.66-cp311-cp311-macosx_11_0_arm64.whl (15.1 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

cocoindex-0.1.66-cp311-cp311-macosx_10_12_x86_64.whl (15.6 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64
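The wheel file names above follow the standard wheel naming layout: distribution, version, an optional build tag, then Python tag, ABI tag, and platform tag. The sketch below splits a name into those parts to help pick the right file; `WheelName` and `parse_wheel_name` are hypothetical helpers written for this example, and the parser handles only simple, single-tag names like the ones listed here.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        dist, version, py, abi, plat = parts
        return WheelName(dist, version, None, py, abi, plat)
    if len(parts) == 6:
        dist, version, build, py, abi, plat = parts
        return WheelName(dist, version, build, py, abi, plat)
    raise ValueError(f"unexpected number of components in {filename!r}")


wheel = parse_wheel_name("cocoindex-0.1.66-cp313-cp313t-manylinux_2_28_aarch64.whl")
print(wheel.python_tag, wheel.abi_tag, wheel.platform_tag)
```

In practice `pip` selects a compatible wheel automatically; a parser like this is mainly useful when mirroring or auditing files by hand.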

File details

Details for the file cocoindex-0.1.66.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.66.tar.gz
  • Upload date:
  • Size: 9.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for cocoindex-0.1.66.tar.gz
Algorithm Hash digest
SHA256 fc732bbff16371c6ddfd0d36279a0641ff0a49f8a02d7ca88b2f5eecb7e54eef
MD5 06422b11df041974a0c777ab81aff93b
BLAKE2b-256 cabda6b60860c0955b90c8e1860eac595c848f6ec8c379741a169ed69f8b20f3

See more details on using hashes here.
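As a sketch of how the digests listed on this page could be checked after a manual download, the snippet below streams a file through `hashlib` and compares the result with an expected hex digest. `file_digest` and `verify` are hypothetical helper names, and treating "BLAKE2b-256" as BLAKE2b with a 32-byte digest is an assumption. The demo at the end uses a throwaway temporary file rather than a real CocoIndex download.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hex digest of a file, read in blocks so large files fit in memory."""
    if algorithm == "blake2b-256":
        # Assumption: BLAKE2b-256 means BLAKE2b with a 32-byte digest.
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Demo on a temporary file; replace with the downloaded file and a listed digest.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    ok = verify(sample, hashlib.sha256(b"hello").hexdigest())
    bad = verify(sample, "0" * 64)
    blake_len = len(file_digest(sample, "blake2b-256"))

print(ok, bad)
```

`pip` can also enforce hashes itself through its hash-checking mode, which is usually preferable to a hand-rolled check.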

File details

Details for the file cocoindex-0.1.66-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.66-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2b2e5e7e063632b40edaa0994ef957b244df2eb0c05f4787c667736a7767054f
MD5 a6d17bd8f741a9892b49d8513122cd26
BLAKE2b-256 12f69f8137edd688877ca114adb65e26993014865b76d5fac44e2510072107d8

File details

Details for the file cocoindex-0.1.66-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2aee28c6d5ee051fa6b8301bc961ad3e81217f52709933365beab5fb992a3125
MD5 1ac33d82b8ccd40324bdc0591771641a
BLAKE2b-256 74d71d5422e4c55f6aa2254eda9e594c3e45e3e14ce68d1b1da6cc672fe16d8c

File details

Details for the file cocoindex-0.1.66-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 9f9c3e334a167f4de9251e46d89f6bd8a647eaaeb397b102582df7d77a83b3ff
MD5 4197f48513ef19608826234db176ad5a
BLAKE2b-256 321294c08339682b1bda04664520bd3d71064060c0c65d99416ee9b3438e9c77

File details

Details for the file cocoindex-0.1.66-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b0df67e50d670bbc9b9f1d167a20286e98349c28183a8032d9529a122121a085
MD5 05d7ee8675deb202a031ff79c8f041f8
BLAKE2b-256 059daef11d94eecce9c858ed27396d107a29086c507d344a4f86d16765ae1dd6

File details

Details for the file cocoindex-0.1.66-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b8df56a4d3de81825682af964192981aef0d5067fa86e261979be911d538e6fb
MD5 61cbd24415d109fbc4f5f7900de240ad
BLAKE2b-256 b33cbc9e39486e1731c5fd9a5962d5d2c37092576d2b15738a37876b497055d5

File details

Details for the file cocoindex-0.1.66-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9366372213397c5f8a34b83691799e27080e2fbc8f632489f0d2e5009e9c7f92
MD5 72045b392984c11cd4bb52923be137e2
BLAKE2b-256 582f10407a396580fc89578be3316bb0a9c3260b5e1823c74918b88804162f79

File details

Details for the file cocoindex-0.1.66-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 855c2e75425880ba575da4a1e26e1af01307838c2e650d077bbd49cc45f80391
MD5 678f1a2eb5998733aa9a32966c26e99f
BLAKE2b-256 18eb6db6e7e420fe4258cacfe9990867dd3281e2e5e55e9586a3e03045928f04

File details

Details for the file cocoindex-0.1.66-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 4216cc2b48bffd3da714778b215e60522b7ec5057b192745747cf1b2547b3f8c
MD5 e383a4346489d6d62d72be310f056e01
BLAKE2b-256 6de0f16c608d78c6d2b656d806708929de90240a1ef49fa898e53d309be09386

File details

Details for the file cocoindex-0.1.66-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f7cb564564b39e3782665fcb88434947a0f78ed3d46c4d2ede381a27eff4c43e
MD5 e77129057bec35d1a752a0ba37579d79
BLAKE2b-256 5487f6faa3b47bf978decf61b3fb8bcfca8fa68f2509f8dfa962df7b4e24c755

File details

Details for the file cocoindex-0.1.66-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d9bd29f4511e5cea0a20d84e553f8bf41786170be31b5f7ed099b8c895f73edc
MD5 cd30c16eb3e9760665c8457dd5ff96de
BLAKE2b-256 36d4947b195cbb6c8da575cff2d322f3ab0275ec7925e425ff0ace87b3a38f45

File details

Details for the file cocoindex-0.1.66-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1112e77a39664a46528b1f66f7aba2d5b7bc08d03c202fe6b0836577af6f4686
MD5 d13b51f87987acf072d029fd12d53253
BLAKE2b-256 7c8f3b53ae2e09479529adadb0eba3f78f506e90b0faa338261b8b4528382fb4

File details

Details for the file cocoindex-0.1.66-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 0309422770b86db2d32be6791091e3139995f12f86acf1825236a452ea175302
MD5 c1a78b28f0df8ed36635452350b73378
BLAKE2b-256 c2e83ec96ec6e447e6482f81ca40622e738da8d048352103036c06cc059447f6

File details

Details for the file cocoindex-0.1.66-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 fc5979af09a22383fb42b05340e904fb75f2181f8706ca5de865bfb9947a8738
MD5 b54957b241a7ef2cb194cb3730dedc66
BLAKE2b-256 ddf5641507418d723599e4ec6d79a1fc40ac2605e71d9d80bed7cf11cacd2ddb

File details

Details for the file cocoindex-0.1.66-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 fadd90daf57867dbdbd1ed5373827658c1b135904ee8cd8a876816e95851e2dd
MD5 550dcc019ba83c9995bd12739f5e3af5
BLAKE2b-256 2f8530baf36ceddf937707accd55ee9bc49730d3cab20055afafb22484bcb5aa

File details

Details for the file cocoindex-0.1.66-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0537daba9ceed373d7241de2a8162850b33ef6c1db7414337bbcfd70209ab4c1
MD5 8d76a50880b241def0ab12520c434a99
BLAKE2b-256 cbecd057f22b8a08cdcc52f085d46fef0971ec82013097ffe1513adcbe02210c

File details

Details for the file cocoindex-0.1.66-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a16f59187fcf00c93c1d4e839a0844931ebc3958d804ff79a6b06e3a0e256d7b
MD5 df214cc169a3afd8c4671af930639943
BLAKE2b-256 38f2bf64b86476ebfc53c23e2ec65932c094554f80e9f9850cd1692006e7b06a

File details

Details for the file cocoindex-0.1.66-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.66-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 fb9a7586cfeb19b15c948d4add33575f16acaa42be7931bc4a88ab5a7d98755a
MD5 a795f62b32bbf202f2f44282610af6d0
BLAKE2b-256 0538a5a7e4509bf4d7ccd2590198a8383d2687c869719e8d5db28f03c257cb6e
