
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation and minimal changes to the target.

Project description

CocoIndex

Data transformation for AI

An ultra-performant data transformation framework for AI, with its core engine written in Rust. It supports incremental processing and data lineage out of the box, offers exceptional developer velocity, and is production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it easy to transform data with AI workloads and keeps source data and target in sync effortlessly.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformations as a dataflow, in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses keep the chained calls valid Python)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting it. They only need to define the transformation (the formula) for a set of source data.
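
To make the model above concrete, here is a minimal plain-Python sketch of the idea. It does not use CocoIndex itself: the `Row` class and its `derive` method are hypothetical names invented for illustration. It shows fields being added rather than overwritten, each computed only from named inputs, with lineage recorded along the way.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """Hypothetical row whose fields are only ever added, never overwritten."""
    fields: dict[str, Any] = field(default_factory=dict)
    lineage: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def derive(self, out: str, fn: Callable[..., Any], *inputs: str) -> None:
        # Refuse to overwrite: every transformation creates a new field.
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists")
        # The new value depends only on the named input fields.
        self.fields[out] = fn(*(self.fields[name] for name in inputs))
        # Record which fields the new one was computed from.
        self.lineage[out] = inputs


row = Row(fields={"content": "Hello dataflow world"})
row.derive("words", str.split, "content")
row.derive("word_count", len, "words")
```

Because nothing is mutated, every intermediate value (`content`, `words`, `word_count`) stays inspectable, and `row.lineage` answers "where did this field come from?" without extra bookkeeping.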

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
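
As a rough illustration of why a standardized interface allows a one-line switch, the sketch below uses Python's `typing.Protocol`. The `Target`, `InMemoryTarget`, `PrintTarget`, and `export` names are assumptions made up for this example and are not CocoIndex APIs.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every target implements."""
    def write(self, rows: list[dict]) -> None: ...


class InMemoryTarget:
    """Keeps exported rows in a list (handy for tests)."""
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class PrintTarget:
    """Prints exported rows to stdout."""
    def write(self, rows: list[dict]) -> None:
        for row in rows:
            print(row)


def export(rows: list[dict], target: Target) -> None:
    # The pipeline only relies on the shared interface.
    target.write(rows)


target = InMemoryTarget()  # swap in PrintTarget() here; nothing else changes
export([{"filename": "a.md", "text": "hello"}], target)
```

The rest of the pipeline never names a concrete target class, so replacing one component touches only the line that constructs it.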

Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation when the source or the logic changes;
  • (re-)processing of only the necessary portions, reusing the cache when possible.
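
The two points above can be sketched with a content-fingerprint cache. This is a simplified illustration under stated assumptions, not how CocoIndex is implemented: the `IncrementalIndex` class, the `fingerprint` helper, and the `logic_version` string are hypothetical, and the state lives in memory rather than in Postgres.

```python
import hashlib
from typing import Callable


def fingerprint(content: str, logic_version: str) -> str:
    # A change in either the source content or the logic yields a new key.
    data = f"{logic_version}\0{content}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


class IncrementalIndex:
    """Hypothetical index that recomputes only what changed."""

    def __init__(self, transform: Callable[[str], str], logic_version: str = "v1") -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.cache: dict[str, str] = {}  # fingerprint -> derived value
        self.index: dict[str, str] = {}  # source key -> derived value
        self.computed = 0                # number of transform calls so far

    def update(self, sources: dict[str, str]) -> None:
        # Remove derived rows whose source no longer exists.
        for key in list(self.index):
            if key not in sources:
                del self.index[key]
        for key, content in sources.items():
            fp = fingerprint(content, self.logic_version)
            if fp not in self.cache:
                # Cache miss: this is the only place work is done.
                self.cache[fp] = self.transform(content)
                self.computed += 1
            self.index[key] = self.cache[fp]


idx = IncrementalIndex(str.upper)
idx.update({"a.md": "alpha", "b.md": "beta"})   # two transform calls
idx.update({"a.md": "alpha", "b.md": "BETA!"})  # one more: only b.md changed
```

After the second `update`, `idx.computed` is 3 rather than 4, because the unchanged `a.md` is served from the cache.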

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
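
To give a feel for two parameters used in the flow above, `chunk_size`/`chunk_overlap` and the cosine similarity metric, here is a small plain-Python sketch. It is a simplification under stated assumptions: `chunk_text` is a hypothetical fixed-window splitter and does not reproduce what `SplitRecursively` actually does, and `cosine_similarity` operates on hand-written vectors rather than real embeddings.

```python
import math


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows that overlap by `chunk_overlap` characters.

    Returns (start_offset, chunk) pairs; the offset plays the role of a location.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
```

With these toy values, consecutive chunks share two characters, which is the same role the 500-character overlap plays for the 2000-character chunks in the flow: text near a chunk boundary still appears with some surrounding context.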

🚀 Examples and demo

Example: Description
Text Embedding: Index text documents with embeddings for semantic search
Code Embedding: Index code embeddings for semantic search
PDF Embedding: Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction: Extract structured information from a manual using LLM
Amazon S3 Embedding: Index text documents from Amazon S3
Azure Blob Storage Embedding: Index text documents from Azure Blob Storage
Google Drive Text Embedding: Index text documents from Google Drive
Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation: Build real-time product recommendations with LLM and graph database
Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Paper Metadata: Index papers in PDF files, and build metadata tables for each paper

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cocoindex-0.1.69.tar.gz (9.4 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • cocoindex-0.1.69-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (15.7 MB): PyPy, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.69-cp313-cp313t-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.13t, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.69-cp313-cp313-win_amd64.whl (15.5 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.69-cp313-cp313-manylinux_2_28_x86_64.whl (16.3 MB): CPython 3.13, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.69-cp313-cp313-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.13, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.69-cp313-cp313-macosx_11_0_arm64.whl (15.4 MB): CPython 3.13, macOS 11.0+ ARM64
  • cocoindex-0.1.69-cp313-cp313-macosx_10_12_x86_64.whl (16.0 MB): CPython 3.13, macOS 10.12+ x86-64
  • cocoindex-0.1.69-cp312-cp312-win_amd64.whl (15.5 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.69-cp312-cp312-manylinux_2_28_x86_64.whl (16.3 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.69-cp312-cp312-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.69-cp312-cp312-macosx_11_0_arm64.whl (15.4 MB): CPython 3.12, macOS 11.0+ ARM64
  • cocoindex-0.1.69-cp312-cp312-macosx_10_12_x86_64.whl (16.0 MB): CPython 3.12, macOS 10.12+ x86-64
  • cocoindex-0.1.69-cp311-cp311-win_amd64.whl (15.5 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.69-cp311-cp311-manylinux_2_28_x86_64.whl (16.3 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.69-cp311-cp311-manylinux_2_28_aarch64.whl (15.6 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.69-cp311-cp311-macosx_11_0_arm64.whl (15.4 MB): CPython 3.11, macOS 11.0+ ARM64
  • cocoindex-0.1.69-cp311-cp311-macosx_10_12_x86_64.whl (16.0 MB): CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.69.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.69.tar.gz
  • Upload date:
  • Size: 9.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for cocoindex-0.1.69.tar.gz
Algorithm Hash digest
SHA256 ba8df18fdda66fe5e7fb89202fc08485158858e5c431b67b9bb7f084ba5da2e4
MD5 d175b3b323887a4d41f55f4a573fff91
BLAKE2b-256 944a287bca3b7ed4dadcfa0470889049f2c10c420cba9930e905632b1bcac781

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 1ee9d3b6929a63953bc17f0f26b126403ef6a613f35a774654312284d2c5f138
MD5 55e7dc3b19b8d9542e716575774718d7
BLAKE2b-256 0d1fcd61be90c9855a72f9b1de4dcbcfa7f93ea0e4863d9d65fdc3fddc229311

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp313-cp313t-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 94ba49d67d687c1d74ca019a4f9c7917e87b08ccf4ae295713811775a5b32ec4
MD5 12712b4a03fba23ebb87e29640d187f8
BLAKE2b-256 0e7788825a08c70ce6118e298d6b10d2dd141c86e3f6bbb0b1d68f698118eb27

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 ded8c332b3a539f5f6c24528b4d1d653f82d422dba2c0c1b9df9d4e3e2ca0c64
MD5 573ab8c50c5c1859ea997aa11101cd5f
BLAKE2b-256 1afd85999dc752358d47bf105031a83901f55ece4ffad5ae05f8a99953ef0ea3

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 37aee8376243d751044bd5c8c95f3b0f5946b19f1d88bc6e499bb8164937910f
MD5 e925401cf5c107ad4bd91a85fc9b4aac
BLAKE2b-256 0d38dc6fefdaf310ab1100bff6f5aba8bfc89e381510ad3c7e8b1b5afc249b43

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp313-cp313-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 568e5714e2d46441c1d51b65fe4fa9ccaf9b5c3e3ccd07a3e23c89d07db2b51e
MD5 9e860c8496fd472d09f7e95ae36a66f3
BLAKE2b-256 d1d36f5b9566a499afc329d459725206ba3b7fb1d802fc05cc14b4ddd3f5dabb

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 58c88c2be34e75c03d82c9218420a688bbcd895472e6bce9e03e5c8f61fd2821
MD5 1a997e89b60f2f6602ab482e5a8a7574
BLAKE2b-256 3c021f14070b2933946b1cb5e9679c86946e71168d87fc05cc8db9c65881367b

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ec1c03a95f6bc32cbe835870135216731234ecf01ea918eaef64bfccaf83782e
MD5 748c1c69f4dc700b467725e6fa5c9bc2
BLAKE2b-256 1e721ce9dce3a0e8a564602313f178a691bc1b1ed43a5b66646ff67d77a706d5

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 926b772747c526a098a43b38db3019cc1c232b4195622431a8044f122781db6b
MD5 cd9e641650e5dbf0270ed0847f9e3225
BLAKE2b-256 c67a30016814734c69e51100ef5423242876042c8888a0b1689a618aed4ba53e

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 5b58942a5c9474a3b9070de46b450443fcbfe9d68a410ada0ebcdd6aed1a6181
MD5 8316ca7020b4db31a828df5c41c3938f
BLAKE2b-256 d533a84da6cad25a9cce3c26aaa40db96362b0b45a9a8a89d2ef205a4c354142

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp312-cp312-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 82190eb9f4800c9afe6387e7838f07b64ada8bdc30288c86196cb5e3170a750b
MD5 4d159c84a7800b5a09dc5015a0d4ba88
BLAKE2b-256 15b48c1695734e2923a6264a29d677d7c114f635cf2dc83fc32356a8b2127ab4

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7e1e42ee4718818580fd6a9ac1d185483185492ede5f29ee0923b7e9971de04c
MD5 b7b4c55f1aa655c367e93c9b020b2c5b
BLAKE2b-256 ecdbabc7fc57be6089b1bb261e455376847a14e6dec14134453818465fc6f680

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 4435af1fd43b99402554a909a3eaa2ed8da60994a79b0eb45edbcde98a1168b1
MD5 71c0bcf44a3330f17114d6475930c565
BLAKE2b-256 7a2dd9d560f755cd6d2879d8b96493920f18d2866b7d37b18b7f2fef2a1fa6c9

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 86f57aa76769d64a154c36224561586d678f6a0fc81cae6094f6192b0f089598
MD5 46b2f3569f549cf1094959184d0cd852
BLAKE2b-256 e5e113d4cb959f9d7eec8eac65b35ec22344b8b801c92e402dae26cc3e265773

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 03f18cfdf312c3c02bd4eb6018d8ec92c84716e5226fe8e2a75cd9781acd5a65
MD5 f8514e49b162a5d092fa347acf9644c8
BLAKE2b-256 8b7c67ab1f91f447ea364e6084563a6a4e6aa3352e0242f44297e107ecad0658

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp311-cp311-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 45d2ee464809420ee225ca14c0b7ce91e9d7ec6a4cb1922c28767b65f8a971a1
MD5 92322da31beef2813439da2ef737d5b1
BLAKE2b-256 ca3f48b5225de86cc9d3c1dbd0bd94de4351243237f507f168796f7d2b16ecc1

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0406969b0b394db37c9d545707b39d47d531b9a4b9ad5e6734594f5ee9a84102
MD5 c947475dceb0331dbad343466bc66c80
BLAKE2b-256 1e611490a83985fcd40e9e3c99946b446ecb43e5153fc785f56fb64962827ac5

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.69-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.69-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 21a6adb761a0d7d1858e7095527c2d126f0a0249deb74fbef2fbc12db08abb74
MD5 7100b49cd46d7ba8448aca5a776a1184
BLAKE2b-256 626edb7d3e86743bc29fdc1bacc12eeb71a2248d32f11df81e73866bc5fe1299

See more details on using hashes here.
