With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal recomputation and changes.

Project description

CocoIndex

Data transformation for AI

Ultra-performant data transformation framework for AI, with a core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it super easy to transform data with AI workloads, and keep source data and target in sync effortlessly.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field solely from its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting records. They only need to define the transformation (formula) for a set of source data.
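
To make the dataflow idea concrete, here is a minimal pure-Python sketch. It does not use the CocoIndex API; `Row`, `derive`, and `trace` are hypothetical names chosen only for illustration. Each derived field is computed from named input fields, existing fields are never overwritten, and the inputs of every field are recorded so that lineage can be traced back to the source.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """Hypothetical record whose fields are only ever added, never mutated."""
    values: dict[str, Any]
    lineage: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def derive(self, name: str, inputs: tuple[str, ...], fn: Callable[..., Any]) -> "Row":
        """Return a new Row with `name` computed purely from `inputs`."""
        if name in self.values:
            raise ValueError(f"field {name!r} already exists; fields are immutable")
        new_value = fn(*(self.values[i] for i in inputs))
        return Row(
            values={**self.values, name: new_value},
            lineage={**self.lineage, name: inputs},
        )

    def trace(self, name: str) -> list[str]:
        """List the source fields that `name` ultimately depends on."""
        parents = self.lineage.get(name)
        if parents is None:  # no recorded inputs: this is a source field
            return [name]
        sources: list[str] = []
        for parent in parents:
            for src in self.trace(parent):
                if src not in sources:
                    sources.append(src)
        return sources


row = Row(values={"content": "Hello CocoIndex"})
row = row.derive("lower", ("content",), str.lower)
row = row.derive("tokens", ("lower",), str.split)
```

Calling `row.trace("tokens")` walks the recorded inputs back to `content`, which is the kind of lineage the paragraph above describes; a real engine would track this per row and per field at scale.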

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
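
The one-line switch can be pictured with a small sketch. This is not the CocoIndex interface: `Target`, `InMemoryTarget`, `JsonLinesTarget`, and `export` are hypothetical names, assumed here only to show how a shared interface lets the pipeline stay unchanged when a component is swapped.

```python
import json
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every export target implements."""
    def write(self, rows: list[dict]) -> int: ...


class InMemoryTarget:
    """Keeps exported rows in a Python list."""
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> int:
        self.rows.extend(rows)
        return len(rows)


class JsonLinesTarget:
    """Serializes rows as JSON lines into a list of strings (stand-in for a file)."""
    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, rows: list[dict]) -> int:
        self.lines.extend(json.dumps(r, sort_keys=True) for r in rows)
        return len(rows)


def export(rows: list[dict], target: Target) -> int:
    """The pipeline depends only on the interface, not on a concrete target."""
    return target.write(rows)


rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
target = InMemoryTarget()  # swap to JsonLinesTarget() by changing this one line
written = export(rows, target)
```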

Data Freshness

CocoIndex keeps source data and targets in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • Minimal recomputation when the source or the logic changes.
  • (Re-)processing of only the necessary portions, reusing the cache when possible.
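
As a rough illustration of the two points above, the sketch below fingerprints each source by content hash and recomputes only entries whose fingerprint changed. It is a simplified model, not CocoIndex's implementation: `IncrementalIndex` is a hypothetical class, state is kept in memory rather than in Postgres, and only source changes are tracked (a fuller version would also fingerprint the transformation logic).

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Hypothetical index that recomputes only sources whose content changed."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.fingerprints: dict[str, str] = {}  # source key -> content hash
        self.outputs: dict[str, str] = {}       # source key -> derived value
        self.recomputed: list[str] = []         # keys processed in the last update

    def update(self, sources: dict[str, str]) -> None:
        self.recomputed = []
        # Drop derived rows whose source disappeared.
        for key in list(self.outputs):
            if key not in sources:
                del self.outputs[key]
                del self.fingerprints[key]
        # Recompute only new or changed sources; reuse the rest.
        for key, content in sources.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.fingerprints.get(key) != digest:
                self.outputs[key] = self.transform(content)
                self.fingerprints[key] = digest
                self.recomputed.append(key)


index = IncrementalIndex(transform=str.upper)
index.update({"a.md": "alpha", "b.md": "beta"})
first_pass = list(index.recomputed)
index.update({"a.md": "alpha", "b.md": "beta v2"})  # only b.md changed
second_pass = list(index.recomputed)
```

After the second call, `second_pass` contains only `b.md`, while the derived value for `a.md` is reused untouched.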

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
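
A quick way to confirm both steps is a small pre-flight check like the one below. This is only a sketch: `check_setup` is a hypothetical helper, and it assumes that the Postgres connection string is supplied through an environment variable named `COCOINDEX_DATABASE_URL`; check the CocoIndex documentation for the setting your version actually reads.

```python
import importlib.util
import os


def check_setup(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if importlib.util.find_spec("cocoindex") is None:
        problems.append("cocoindex is not installed (run: pip install -U cocoindex)")
    # Assumption: the Postgres connection string is read from this variable.
    if not env.get("COCOINDEX_DATABASE_URL"):
        problems.append("COCOINDEX_DATABASE_URL is not set")
    return problems


for problem in check_setup(dict(os.environ)):
    print("setup issue:", problem)
```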

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
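
The example flow sets `chunk_size=2000` and `chunk_overlap=500` and indexes embeddings by cosine similarity. The sketch below shows what those two ideas mean in isolation. It is deliberately simplified: `sliding_chunks` is a hypothetical fixed-window splitter and is not how `SplitRecursively` picks boundaries, and `cosine_similarity` is a plain-Python stand-in for what a vector index computes.

```python
import math


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into (start_offset, chunk) pairs; neighbours share `chunk_overlap` chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):  # last window reached the end
            break
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With a window of 4 and an overlap of 2, each chunk starts 2 characters after the previous one, so text near a chunk boundary always appears whole in at least one chunk. The start offset plays the same role as the `location` field used in the primary key of the flow above.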

🚀 Examples and demo

| Example | Description |
| --- | --- |
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Azure Blob Storage Embedding | Index text documents from Azure Blob Storage |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generates detailed captions for images using a vision model, embeds them, and enables live-updating semantic search via FastAPI, served on a React frontend |
| Face Recognition | Recognize faces in images and build embedding index |
| Paper Metadata | Index papers in PDF files, and build metadata tables for each paper |
| Multi Format Indexing | Build visual document index from PDFs and images with ColPali for semantic search |
| Custom Output Files | Convert markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

| File | Size | Uploaded |
| --- | --- | --- |
| cocoindex-0.1.79.tar.gz | 14.1 MB | Source |

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

| File | Size | Interpreter | Platform |
| --- | --- | --- | --- |
| cocoindex-0.1.79-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl | 15.6 MB | PyPy | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.79-cp313-cp313t-manylinux_2_28_aarch64.whl | 15.6 MB | CPython 3.13t | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.79-cp313-cp313-win_amd64.whl | 15.6 MB | CPython 3.13 | Windows x86-64 |
| cocoindex-0.1.79-cp313-cp313-manylinux_2_28_x86_64.whl | 16.2 MB | CPython 3.13 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.79-cp313-cp313-manylinux_2_28_aarch64.whl | 15.6 MB | CPython 3.13 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.79-cp313-cp313-macosx_11_0_arm64.whl | 15.5 MB | CPython 3.13 | macOS 11.0+ ARM64 |
| cocoindex-0.1.79-cp313-cp313-macosx_10_12_x86_64.whl | 16.0 MB | CPython 3.13 | macOS 10.12+ x86-64 |
| cocoindex-0.1.79-cp312-cp312-win_amd64.whl | 15.6 MB | CPython 3.12 | Windows x86-64 |
| cocoindex-0.1.79-cp312-cp312-manylinux_2_28_x86_64.whl | 16.2 MB | CPython 3.12 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.79-cp312-cp312-manylinux_2_28_aarch64.whl | 15.6 MB | CPython 3.12 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.79-cp312-cp312-macosx_11_0_arm64.whl | 15.5 MB | CPython 3.12 | macOS 11.0+ ARM64 |
| cocoindex-0.1.79-cp312-cp312-macosx_10_12_x86_64.whl | 16.0 MB | CPython 3.12 | macOS 10.12+ x86-64 |
| cocoindex-0.1.79-cp311-cp311-win_amd64.whl | 15.6 MB | CPython 3.11 | Windows x86-64 |
| cocoindex-0.1.79-cp311-cp311-manylinux_2_28_x86_64.whl | 16.2 MB | CPython 3.11 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.79-cp311-cp311-manylinux_2_28_aarch64.whl | 15.6 MB | CPython 3.11 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.79-cp311-cp311-macosx_11_0_arm64.whl | 15.5 MB | CPython 3.11 | macOS 11.0+ ARM64 |
| cocoindex-0.1.79-cp311-cp311-macosx_10_12_x86_64.whl | 16.1 MB | CPython 3.11 | macOS 10.12+ x86-64 |

File details

Details for the file cocoindex-0.1.79.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.79.tar.gz
  • Upload date:
  • Size: 14.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.3

File hashes

Hashes for cocoindex-0.1.79.tar.gz
Algorithm Hash digest
SHA256 96b09068a76fa0b711da4fab78e36e985e6d3770303d5d2d37e631611150fc17
MD5 b5fb195774718e2cc959b50008ddc1bd
BLAKE2b-256 135fff621b1913c1a01f9f7064fd54433df0e88dbff8927a591764c6eae7ff47

See more details on using hashes here.
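
One common use of the published digests is to check a downloaded file before installing it. The sketch below does this with Python's standard `hashlib`; `sha256_of` and `verify_download` are hypothetical helper names, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Stream the file in blocks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare the computed digest with the published one, ignoring case."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For the source distribution, this would be called as `verify_download(Path("cocoindex-0.1.79.tar.gz"), "96b09068a76fa0b711da4fab78e36e985e6d3770303d5d2d37e631611150fc17")`, assuming the archive sits in the current directory.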

File details

Details for the file cocoindex-0.1.79-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.79-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 570e124be63706ffbf5c3bbeb356c06c48a651186aeab37aec9785db9dc970d5
MD5 8b1f7436c1de93a376e83676bb3d6945
BLAKE2b-256 c69a2c4fefffe69cec952a6ce8a9f4141a12cba3cef5bf01aa4fb20b5714e95b

File details

Details for the file cocoindex-0.1.79-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 727ab7f3728308ef7b61ae1c8d61251e90566c5c97c661081b4a9707eb7a9695
MD5 94ec94787d9eab2ca3984c9633b0741a
BLAKE2b-256 db424e58a393102cd36ecbdab53309e282ec8c5894320f44faa1b626e47170c0

File details

Details for the file cocoindex-0.1.79-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 2384b28cfabf0c32ab9e1b178b7f4fd7ecfea86954a39d972cfb7f06dc83c659
MD5 1b69ac4d15412df16ed8e61100107277
BLAKE2b-256 a50d0811fbb2115ea81a4022dab69e080d35df4249239f61e3de4367ead6fae8

File details

Details for the file cocoindex-0.1.79-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 afe1d1abef7de5cdf82bcddc0a6f669d1134021c20aa695bfcba283fabd0446a
MD5 5784f2ae068037b74537f73afcda66d1
BLAKE2b-256 b210bbdd2163e97579f32f2ab64926067501a35618dcc8947873a037fe6924ff

File details

Details for the file cocoindex-0.1.79-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 a9729d7ceda240fb59e25e21a96170507c259629fd30d48563d6cc320acc4357
MD5 d538425f1f986a117383aad0fe1ce425
BLAKE2b-256 b28dde0e21b9a8a05bdc181656489c3e0f152d351a5a2ce2c2e1bce7ffa922cd

File details

Details for the file cocoindex-0.1.79-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e4913eb8dee11f383bf7408ccbe0068149b2145ba0dad68e4349db6b9986fc47
MD5 cc488230de2fd7041da3ccb32f70a5f4
BLAKE2b-256 fcf273cf5a9f86fc199158021adb8a172d13001b66ddeba169f59c37d87fa868

File details

Details for the file cocoindex-0.1.79-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 bdffe4268d2888eb9d8c45b78b54c7ea1aa4600884d8fd4a4ff8572e1e6e5cb0
MD5 c3a02e00cd53b44e1f71ef3b20e23f48
BLAKE2b-256 60cf34ac61d53ac15496c0483fa8ce4b1d98b946b79ea101ff43e19fc5e88d80

File details

Details for the file cocoindex-0.1.79-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c5c82657291a83c5013af237debbb52e4a83db41ace3f9344dde5fdbf43a6e15
MD5 8666671a6df2376a832a40a2187bfac8
BLAKE2b-256 f07e0ec4f7aeb7b74e6fd0ae6c4de5985cc274fff1667d1157151e4bd7501024

File details

Details for the file cocoindex-0.1.79-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a02df517d237106f927ec3ba1fde640ccd894b6cee9c32b81ee97eb0624002cc
MD5 7eba2b26a7ab9b63b04f3982c4c059e0
BLAKE2b-256 74dd730daa6dbff6188d4dca037c0c482d99df536c5fa8cc6f7f1a8a3689fc6c

File details

Details for the file cocoindex-0.1.79-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 e23b69f5634e364478e31b89e4e7696e294ff060dfeb704f63b621e00efa80a2
MD5 aad15fa48bec53adbce9edcce2412786
BLAKE2b-256 8c112b40801231d2121635dcef39ca39f34618e99a2494458cc9d6815b566177

File details

Details for the file cocoindex-0.1.79-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b70dbd86ce8bdca841eaa620c7b9d1eb0a1ed59a0352524ffe126bb7ba851afc
MD5 30f1aabad7dd4ca7df8302a46a28293d
BLAKE2b-256 1210b7e786836f0feb82e592d072376edc404058ac68e5862bb45e65db70b236

File details

Details for the file cocoindex-0.1.79-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 506ea1e69e5805bb27358224a820a4ae95d17d04bc82a2e54833579e2b5416e6
MD5 65ef58ea56bd4bcc00524e09646deb11
BLAKE2b-256 a911a66cfcc5c8227d7c2848b2748ef6d4594628d84ecd452b62971935c6fda7

File details

Details for the file cocoindex-0.1.79-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 b578c1f233a2c7268ddd663b61a7bad920eb4dfdf7938916adbe1104a9055272
MD5 9aa8108285f53c744a34edf47cf0a9aa
BLAKE2b-256 aaeda998669bc8c3a2e5a29d7535e5d6fb716ea685ad6c024c4682de0972164b

File details

Details for the file cocoindex-0.1.79-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 00b3a92476929ef8a8e28b53cd72726fdcb3efc87b2752b2af2d2722aef4c833
MD5 af7184e6dce4b04cabd36995d9bb554a
BLAKE2b-256 11883147c4760b3b76fdabaae2aabe46e36e3dbf4b6bb985b16ff192f6272056

File details

Details for the file cocoindex-0.1.79-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c33d623a6fe218fce0810c63ccd8aa21a272013d7d968d4459df86915c2f6b31
MD5 dc3add0c3b95cd57b26bd2f86b85c2b1
BLAKE2b-256 756e730334ec957c45c6f2019321d4186a7b920309a9d2fc23f4e0761089dd8a

File details

Details for the file cocoindex-0.1.79-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1cacbabbcdb7917b8861bebc5960990854ab780faef99e1ba812146fc3e40c53
MD5 91be0796c9661f726dee36cde6018b68
BLAKE2b-256 4cd48f910b1bbc87079d511b055dc2d2c24448e659af4cf8f2fdbbf7ba79c4ba

File details

Details for the file cocoindex-0.1.79-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.79-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ef056b7da5e9d756da4c896c544d4ca982577e2bfa6ee1de74c6e1b6b24aad08
MD5 95b016eb3f3c0d872cd57eb5a122d406
BLAKE2b-256 cb6abd7ca291dfea8dc87c998dcdd26311e4ae9e8d02834bf3f075357a3c99ee
