
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it super easy to transform data with AI workloads, and keep source data and target in sync effortlessly.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or performing any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting it. They only need to define the transformation (formula) for a set of source data.
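The dataflow idea described above can be illustrated in plain Python. This is a minimal sketch, not CocoIndex's API or implementation: the `derive` helper and the `_lineage` field are hypothetical names chosen for illustration. Each step returns a new row computed only from named input fields, so earlier rows stay untouched and the origin of every field can be traced.

```python
from typing import Any, Callable

Row = dict[str, Any]


def derive(row: Row, out: str, fn: Callable[..., Any], *inputs: str) -> Row:
    """Return a new row where field `out` is computed only from `inputs`.

    Hypothetical helper: the input row is copied, never mutated, so every
    intermediate row remains observable.
    """
    new_row = dict(row)
    new_row[out] = fn(*(row[name] for name in inputs))
    # Record which input fields the new field was derived from (simple lineage).
    lineage = dict(row.get("_lineage", {}))
    lineage[out] = list(inputs)
    new_row["_lineage"] = lineage
    return new_row


source = {"content": "Hello CocoIndex"}
step1 = derive(source, "lower", str.lower, "content")
step2 = derive(step1, "tokens", str.split, "lower")

print(step2["tokens"])    # ['hello', 'cocoindex']
print(step2["_lineage"])  # {'lower': ['content'], 'tokens': ['lower']}
print(source)             # {'content': 'Hello CocoIndex'} (unchanged)
```

Because no step mutates its input, `source`, `step1`, and `step2` can all be inspected after the fact, which is the property the lineage claim relies on.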

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
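To show why a standardized interface makes components swappable, here is a small plain-Python sketch. The `Target` protocol and both target classes are hypothetical and are not CocoIndex classes; they only illustrate that code written against one interface does not change when the component behind it does.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every target implements."""

    def export(self, rows: list[dict]) -> None: ...


class InMemoryTarget:
    """Hypothetical target that keeps exported rows in a list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def export(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class CsvLinesTarget:
    """Hypothetical target that renders rows as comma-separated lines."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def export(self, rows: list[dict]) -> None:
        for row in rows:
            self.lines.append(",".join(str(value) for value in row.values()))


def run_flow(target: Target) -> None:
    # The flow only depends on the Target interface, not on a concrete class.
    rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
    target.export(rows)


target = InMemoryTarget()  # swap in CsvLinesTarget() here; nothing else changes
run_flow(target)
print(target.rows)
```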


Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation when the source or the logic changes;
  • (re-)processing of only the necessary portions, reusing the cache when possible.
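The two points above can be sketched with a content-fingerprint cache. This is an illustrative toy, not how CocoIndex implements incremental processing internally (the real engine is written in Rust and tracks state in Postgres); `IncrementalIndex`, `fingerprint`, and `logic_version` are hypothetical names. A transformation reruns only when the source content or the logic version changes.

```python
import hashlib
from typing import Callable


def fingerprint(content: str, logic_version: str) -> str:
    """Hash the content together with a version tag for the transformation."""
    payload = f"{logic_version}\x00{content}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class IncrementalIndex:
    """Toy index that recomputes only entries whose fingerprint is new."""

    def __init__(self, transform: Callable[[str], str], logic_version: str = "v1") -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.cache: dict[str, str] = {}  # fingerprint -> transformed output
        self.index: dict[str, str] = {}  # source key -> transformed output
        self.calls = 0                   # how often the transform actually ran

    def update(self, sources: dict[str, str]) -> None:
        new_index: dict[str, str] = {}
        for key, content in sources.items():
            fp = fingerprint(content, self.logic_version)
            if fp not in self.cache:
                # Source or logic changed: recompute this entry only.
                self.cache[fp] = self.transform(content)
                self.calls += 1
            new_index[key] = self.cache[fp]
        # Entries whose source disappeared are dropped from the target.
        self.index = new_index


idx = IncrementalIndex(str.upper)
idx.update({"a.md": "alpha", "b.md": "beta"})
idx.update({"a.md": "alpha", "b.md": "beta v2"})  # only b.md is reprocessed

print(idx.calls)  # 3
print(idx.index)  # {'a.md': 'ALPHA', 'b.md': 'BETA V2'}
```

Bumping `logic_version` changes every fingerprint, so the next `update` recomputes all entries, which mirrors the "logic change" case.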

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install CocoIndex Python library
pip install -U cocoindex
  2. Install Postgres if you don't have one. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
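Two parameters in the flow above, `chunk_overlap` and the cosine similarity metric, are easier to reason about with a small standalone example. The sketch below is plain Python and makes simplifying assumptions: `split_with_overlap` is a hypothetical fixed-window splitter (it is not the algorithm behind `SplitRecursively`, which is syntax-aware), and `cosine_similarity` is the textbook formula rather than the code the vector index runs.

```python
import math


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Returns (start_offset, chunk_text) pairs; the offset plays the role of a
    chunk location in this simplified model.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[tuple[int, str]] = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # [(0, 'abcd'), (2, 'cdef'), (4, 'efgh'), (6, 'ghij')]

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Each window repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. In this simplified model, the pair of a file name and a start offset identifies a chunk uniquely, which is why the flow uses `filename` and `location` together as the primary key.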

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.71.tar.gz (10.5 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cocoindex-0.1.71-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (15.6 MB)

Uploaded PyPy, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.71-cp313-cp313t-manylinux_2_28_aarch64.whl (15.6 MB)

Uploaded CPython 3.13t, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.71-cp313-cp313-win_amd64.whl (15.5 MB)

Uploaded CPython 3.13, Windows x86-64

cocoindex-0.1.71-cp313-cp313-manylinux_2_28_x86_64.whl (16.3 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.71-cp313-cp313-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.71-cp313-cp313-macosx_11_0_arm64.whl (15.4 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

cocoindex-0.1.71-cp313-cp313-macosx_10_12_x86_64.whl (16.0 MB)

Uploaded CPython 3.13, macOS 10.12+ x86-64

cocoindex-0.1.71-cp312-cp312-win_amd64.whl (15.5 MB)

Uploaded CPython 3.12, Windows x86-64

cocoindex-0.1.71-cp312-cp312-manylinux_2_28_x86_64.whl (16.3 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.71-cp312-cp312-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.71-cp312-cp312-macosx_11_0_arm64.whl (15.4 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

cocoindex-0.1.71-cp312-cp312-macosx_10_12_x86_64.whl (16.0 MB)

Uploaded CPython 3.12, macOS 10.12+ x86-64

cocoindex-0.1.71-cp311-cp311-win_amd64.whl (15.5 MB)

Uploaded CPython 3.11, Windows x86-64

cocoindex-0.1.71-cp311-cp311-manylinux_2_28_x86_64.whl (16.3 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.71-cp311-cp311-manylinux_2_28_aarch64.whl (15.6 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.71-cp311-cp311-macosx_11_0_arm64.whl (15.4 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

cocoindex-0.1.71-cp311-cp311-macosx_10_12_x86_64.whl (16.0 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.71.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.71.tar.gz
  • Upload date:
  • Size: 10.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for cocoindex-0.1.71.tar.gz
Algorithm Hash digest
SHA256 b0b5c50581b229c453b4d551998922be7ec82c550016a8f87e9f5971fb95398b
MD5 80bbfa60cb09324041874b5b39b5a3a2
BLAKE2b-256 d978e35046d89e6f324fd584837e3efc9d1a72ed5383923aa3a7561f3ff46e0c

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 ebdd7873b77f71655965f405b51bba9370ed19652b374ad05b0f76642e87ee4d
MD5 864d7df1d22cd90ac9e552560352883e
BLAKE2b-256 f3dd91814f5516572e07111d75262724a120e0ee2927c5695a2191a515faf4a4

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp313-cp313t-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 16d25c58f01e4253c9ccf254057ac0eb5ef1321e7066b5cd6289031f655d735f
MD5 f41acee7925ca9357af238dcc22a3bcf
BLAKE2b-256 35494ea25f20ca7da70da90fb0b237fab0f524532ea596ebaee217907d10baca

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 f32cb398391eb28facbab64c46a79d13c2a0bb282ddeaa50c292596588084b51
MD5 515d145f6242354ec67882db66eccdde
BLAKE2b-256 f6e5f20074c0798eb16a11ca88190fa89b183764f98b5a32e30d8d96fdd218b0

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 bacea56641a6c61cd1fe3964a1e89270f30f52399016f02995653b842095157e
MD5 da70edc68c32f510e7152cebf789d865
BLAKE2b-256 a3d0394c10d931ff30e7c34b65a97352a0eeabe9c2aaa66d8a4b8e51ec39d696

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp313-cp313-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c3e84113c783506f6dc89fc4ef6dda39080e4b2a6ab36277461d3304b3b4b866
MD5 977d7d471e02a747f4c903a4c49a25d4
BLAKE2b-256 0d1d5373dda4194078972d9ce242cc056051c62b57886560820336bae38db3fe

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 46698ba484037484d33b8d89588d3e063e4f8313d83c71f42bba4ea8eb0a4900
MD5 fefe19b38f0ccfb2cdfcc8494febe637
BLAKE2b-256 ac6b1944a153652e0842c8d4bdaf3d9b407094b1342c603d2ee6f55df0cce4e6

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 2fe9c27918bd07996f229ba3d4e8782603c44152e6b5eea71a4633053f64c951
MD5 1fc9a26e57f64947c72d1db1e83bc3d6
BLAKE2b-256 f297647847dab1c1062917f28a4ed81df07580e3cbaf5eac21d8bd0bb35fb3a1

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 b122cc8bc691a3e5608abda7369176cb8b711437fbfa2fc3a6111e6ce0f6ea3d
MD5 0997a9d73ab7afdaae95e87e3891f884
BLAKE2b-256 61c5ed0e5e27c18372e932622b0dc680c71d935690588d7d1e0ef8067718e108

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f53bb21cc267a3b34c7ed38c850a1bcd2fedcaa1860f763d6daa0582c9a96ddb
MD5 297402a0397ffc714b0c488153ed4adf
BLAKE2b-256 f5ef3e0e0337ca2e4976e285db8422154bf8a855110db9dfbdbf3c0b9a3a4b1d

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp312-cp312-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 1cb9b7bdad61bef77222a0c5c08a75f778947adec6490c993834c90654f7c611
MD5 195f6b12b8ffb9137ec2b4a5bc955273
BLAKE2b-256 6b0ca90117b782ffafe9fee425f15ae0342c9cf03c2725fd06742584a2affff0

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 408955d260b92b924abfa2e9aa6ae75bad43a62d4381b60fdf65c8fb5dd00ffc
MD5 e2c3efa79ed1556596ddbcc91edd7c4a
BLAKE2b-256 24daed0e28da11c6a4d4b79c174d89b727420476cba4cee5b3f5001620ed24a0

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 44ab3abd8b0f37d4eb17d2f229efd16802725f9719226123504fac0adc09ced6
MD5 89bda42cabbf900b22f376b60e5cda22
BLAKE2b-256 c2de07fa7c2098a04cfa799270396b3cc4b13bf1b272d02835f78a22f653c327

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 10988d5ec438847216b16e2412c45bd567258966780d55c8172070c4c5931914
MD5 aee44037cd16e208b61170ed1787338d
BLAKE2b-256 4347e7032344b523406e1a26e33822731f145a13c5fe87f642e9927dd267a6db

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 57b9c4a6b9841d05a774649bac393b7d32001140be5e8515036b486d4bca3b6e
MD5 242fccf101850c367f13e88cdeed2133
BLAKE2b-256 2153142bb7a2c55a6fa2744c0dc53e02c20db948096dc13d7987cb0ccf765a42

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp311-cp311-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 df56683492e620aa5d2b62b5776ec5443bc42a9bff80bf4036cb8934c2a2ff34
MD5 c8b0ef8ed16891259c5e16debde2c4f5
BLAKE2b-256 f9448fba5706e76ba73f98c7c7af6501d1187835005b6c7bdd3e5228d3caca0c

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 631002c357bcafef0edc11b158da05bfa5ecaab226826d9d18b690486c759762
MD5 90e4b99d7c7638051eb54db0e645831a
BLAKE2b-256 2d5789059db99a1259ed565b94ccf2b9e1c0db0ef5d878bf695df5dfad669b67

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.71-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.71-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 4835421a8b231db99f60de952a3f64a0104c3f00f2bb437b7aff0cc050749c3a
MD5 efdc89e0f9b606da656d4110481ad9ee
BLAKE2b-256 827919de738986a1073466da2ae35595af34eb32277b4acdefd61c0e8b9cbccf

See more details on using hashes here.
