With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal recomputation and minimal changes to the target.

Project description

CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it easy to transform data for AI workloads and keeps source data and target in sync.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting. They only need to define the transformation (the formula) for a set of source data.
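
To make the dataflow idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API; `derive`, `split_words`, and `count_words` are hypothetical names chosen only for illustration. Each step returns a new read-only record with one added field computed solely from an existing field, so every intermediate value stays observable.

```python
from types import MappingProxyType
from typing import Any, Callable, Mapping


def derive(record: Mapping[str, Any], out_field: str,
           fn: Callable[[Any], Any], in_field: str) -> Mapping[str, Any]:
    """Return a new read-only record with out_field = fn(record[in_field])."""
    if out_field in record:
        # Fields are only ever added, never overwritten (no value mutation).
        raise ValueError(f"field {out_field!r} already exists")
    return MappingProxyType({**record, out_field: fn(record[in_field])})


def split_words(text: str) -> list[str]:
    return text.split()


def count_words(words: list[str]) -> int:
    return len(words)


# Hypothetical source record; the field name "content" mirrors the example above.
source = MappingProxyType({"content": "declare the transformation once"})
step1 = derive(source, "words", split_words, "content")
step2 = derive(step1, "word_count", count_words, "words")

# The source and every intermediate record remain available for inspection.
print(dict(source))
print(dict(step1))
print(dict(step2))
```

Because no record is modified in place, the chain `source` → `step1` → `step2` doubles as a simple lineage trail for the `word_count` field.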

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
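
The sketch below illustrates, in plain Python, why a standardized interface makes a component swap a one-line change. None of these names come from CocoIndex: `Target`, `InMemoryTarget`, `JsonLinesTarget`, and `export` are hypothetical stand-ins for illustration only.

```python
import json
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every export destination implements."""

    def write(self, rows: list[dict]) -> None: ...


class InMemoryTarget:
    """Keeps exported rows in a Python list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class JsonLinesTarget:
    """Serializes each exported row as one JSON line."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, rows: list[dict]) -> None:
        self.lines.extend(json.dumps(row, sort_keys=True) for row in rows)


def export(rows: list[dict], target: Target) -> None:
    # The pipeline only depends on the shared interface, not on a concrete class.
    target.write(rows)


rows = [{"filename": "a.md", "text": "hello"}]

target = InMemoryTarget()  # change only this line to JsonLinesTarget() to switch
export(rows, target)
print(target.rows)
```

The rest of the pipeline is untouched by the swap because it never refers to a concrete target class.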

Data Freshness

CocoIndex keeps source data and target in sync.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • Minimal recomputation on source or logic changes.
  • (Re-)processing of only the necessary portions, reusing the cache when possible.
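
As a rough illustration of the two points above, the following plain-Python sketch recomputes an output only when the source content or the transformation logic changes. It is not how CocoIndex is implemented; `IncrementalMap` and `logic_version` are hypothetical names, and fingerprinting with SHA-256 is an assumption made for this example.

```python
import hashlib
from typing import Callable


class IncrementalMap:
    """Apply fn to named sources, recomputing only entries whose content or logic changed."""

    def __init__(self, fn: Callable[[str], str], logic_version: str) -> None:
        self.fn = fn
        self.logic_version = logic_version
        self.cache: dict[str, tuple[str, str]] = {}  # key -> (fingerprint, output)
        self.recomputed: list[str] = []  # keys recomputed by the latest update()

    def _fingerprint(self, content: str) -> str:
        # The fingerprint covers both the logic version and the source content.
        payload = (self.logic_version + "\x00" + content).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def update(self, sources: dict[str, str]) -> dict[str, str]:
        self.recomputed = []
        new_cache: dict[str, tuple[str, str]] = {}
        for key, content in sources.items():
            fingerprint = self._fingerprint(content)
            cached = self.cache.get(key)
            if cached is not None and cached[0] == fingerprint:
                new_cache[key] = cached  # unchanged: reuse the cached output
            else:
                new_cache[key] = (fingerprint, self.fn(content))
                self.recomputed.append(key)
        self.cache = new_cache  # entries for deleted sources drop out here
        return {key: output for key, (_, output) in new_cache.items()}


index = IncrementalMap(str.upper, logic_version="v1")

index.update({"a.md": "hello", "b.md": "world"})
first_run = list(index.recomputed)   # both files are new

result = index.update({"a.md": "hello", "b.md": "world!"})
second_run = list(index.recomputed)  # only b.md changed

print(first_run, second_run, result)
```

Changing `logic_version` changes every fingerprint, which forces a full recomputation; that mirrors the "logic change" case in the list above.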

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
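
The vector index in the example flow uses cosine similarity as its metric. As a reminder of what that metric computes, here is a plain-Python sketch. `cosine_similarity` is a hypothetical helper written for illustration, not part of the CocoIndex API, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for a query embedding and stored chunk embeddings.
query = [1.0, 0.0, 0.0]
chunks = {
    "same direction": [2.0, 0.0, 0.0],
    "orthogonal": [0.0, 1.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}

# Rank chunks from most to least similar to the query.
ranked = sorted(chunks, key=lambda name: cosine_similarity(query, chunks[name]), reverse=True)
print(ranked)
```

Cosine similarity depends only on direction, so the vector `[2.0, 0.0, 0.0]` scores the same as the query itself despite its larger magnitude.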

🚀 Examples and demo

Example: Description
  • Text Embedding: Index text documents with embeddings for semantic search
  • Code Embedding: Index code embeddings for semantic search
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM
  • Amazon S3 Embedding: Index text documents from Amazon S3
  • Azure Blob Storage Embedding: Index text documents from Azure Blob Storage
  • Google Drive Text Embedding: Index text documents from Google Drive
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search served via FastAPI with a React frontend
  • Paper Metadata: Index papers in PDF files and build metadata tables for each paper

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
