
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source updates, with minimal computation and changes.


CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with a core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


[Figure: CocoIndex Transformation]


CocoIndex makes it super easy to transform data with AI workloads and to keep source data and targets in sync effortlessly.


[Figure: CocoIndex Features]


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, or deleting it. They only need to define the transformation (formula) for a set of source data.
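A minimal plain-Python sketch of the dataflow idea. It does not use CocoIndex's API; `derive` is a hypothetical helper invented for illustration. Each step returns a new row with one extra field computed only from named input fields, and the input row is never modified, so every intermediate state stays available for inspection.

```python
from typing import Any, Callable, Mapping

Row = Mapping[str, Any]


def derive(row: Row, out_field: str, fn: Callable[..., Any], *in_fields: str) -> dict[str, Any]:
    """Hypothetical helper: return a NEW row with `out_field` computed from `in_fields`.

    The input row is never mutated, so every intermediate row remains observable.
    """
    if out_field in row:
        raise ValueError(f"field {out_field!r} already exists; fields are write-once")
    new_row = dict(row)  # shallow copy; the original row is left untouched
    new_row[out_field] = fn(*(row[f] for f in in_fields))
    return new_row


# Each step depends only on its declared inputs: no hidden state, no in-place updates.
source = {"filename": "a.md", "content": "Hello CocoIndex"}
step1 = derive(source, "lower", str.lower, "content")
step2 = derive(step1, "words", str.split, "lower")

# Lineage: every intermediate row is still available for inspection.
for stage in (source, step1, step2):
    print(stage)
```

Because no step mutates its input, the lineage of a field such as `words` can be traced back through `lower` to `content`.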

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
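The sketch below shows, in plain Python, why a shared interface makes components swappable. The `Target` protocol and the `InMemoryTarget` and `PrintTarget` classes are hypothetical stand-ins, not CocoIndex's built-in targets (those are specs such as `cocoindex.targets.Postgres()`); the point is only that the pipeline code depends on the interface, so changing the target touches a single line.

```python
from typing import Iterable, Protocol


class Target(Protocol):
    """Hypothetical target interface: every target exposes the same export() method."""

    def export(self, rows: Iterable[dict]) -> int: ...


class InMemoryTarget:
    """Hypothetical target that keeps exported rows in a Python list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def export(self, rows: Iterable[dict]) -> int:
        batch = list(rows)
        self.rows.extend(batch)
        return len(batch)


class PrintTarget:
    """Hypothetical target that prints rows instead of storing them."""

    def export(self, rows: Iterable[dict]) -> int:
        count = 0
        for row in rows:
            print(row)
            count += 1
        return count


def run_pipeline(docs: list[str], target: Target) -> int:
    # The pipeline depends only on the Target interface, not on a concrete class.
    rows = ({"text": d, "length": len(d)} for d in docs)
    return target.export(rows)


docs = ["alpha", "beta"]
memory = InMemoryTarget()
# Switching components is a one-line change: pass a different target.
exported = run_pipeline(docs, memory)
# exported = run_pipeline(docs, PrintTarget())
```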


Data Freshness

CocoIndex keeps source data and targets in sync effortlessly.
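One way to picture keeping a target in sync: compare the rows the transformation currently produces with what the target already holds, keyed by primary-key fields (like `primary_key_fields=["filename", "location"]` in the example flow), and emit only the upserts and deletes needed. This is a minimal sketch under that assumption; `plan_sync` and `apply_sync` are hypothetical helpers, not CocoIndex functions, and a dict stands in for the target database.

```python
from typing import Any

# A row's identity is the tuple of its primary-key field values.
Key = tuple[Any, ...]


def plan_sync(
    desired: list[dict], current: dict[Key, dict], key_fields: list[str]
) -> tuple[dict[Key, dict], list[Key]]:
    """Hypothetical diff: the minimal upserts and deletes that make `current` match `desired`."""
    desired_by_key = {tuple(row[f] for f in key_fields): row for row in desired}
    # Upsert rows that are new or whose content changed; skip identical rows.
    upserts = {k: row for k, row in desired_by_key.items() if current.get(k) != row}
    # Delete rows the target still holds but the source no longer produces.
    deletes = [k for k in current if k not in desired_by_key]
    return upserts, deletes


def apply_sync(current: dict[Key, dict], upserts: dict[Key, dict], deletes: list[Key]) -> None:
    """Apply a planned diff to the in-memory stand-in target."""
    for k in deletes:
        del current[k]
    current.update(upserts)


# The target holds an outdated row for a.md and a row for b.md, which no longer exists at the source.
target = {
    ("a.md", 0): {"filename": "a.md", "location": 0, "text": "old"},
    ("b.md", 0): {"filename": "b.md", "location": 0, "text": "gone"},
}
desired = [
    {"filename": "a.md", "location": 0, "text": "new"},
    {"filename": "c.md", "location": 0, "text": "added"},
]
upserts, deletes = plan_sync(desired, target, ["filename", "location"])
apply_sync(target, upserts, deletes)
print(sorted(upserts), deletes)
```

Running the plan a second time against the updated target yields no changes, which is what "in sync" means here.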

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation when the source or the logic changes;
  • (re-)processing only the necessary portions, reusing the cache when possible.
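A minimal sketch of the general technique behind incremental processing, assuming a per-row cache keyed by a fingerprint of the source content plus a logic version string. `IncrementalProcessor` and `logic_version` are hypothetical names; this illustrates the idea, not CocoIndex's actual implementation.

```python
import hashlib
from typing import Callable


def fingerprint(*parts: str) -> str:
    """Stable SHA-256 digest over the given string parts."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


class IncrementalProcessor:
    """Hypothetical memoizing processor: recompute a row only when its input or the logic changes."""

    def __init__(self, fn: Callable[[str], str], logic_version: str) -> None:
        self.fn = fn
        self.logic_version = logic_version
        self.cache: dict[str, tuple[str, str]] = {}  # key -> (fingerprint, output)
        self.recomputed = 0

    def process(self, sources: dict[str, str]) -> dict[str, str]:
        outputs = {}
        for key, content in sources.items():
            fp = fingerprint(self.logic_version, content)
            cached = self.cache.get(key)
            if cached is not None and cached[0] == fp:
                outputs[key] = cached[1]  # unchanged: reuse the cached result
            else:
                result = self.fn(content)  # new or changed: recompute this row only
                self.cache[key] = (fp, result)
                self.recomputed += 1
                outputs[key] = result
        # Drop cache entries for sources that were deleted.
        for key in set(self.cache) - set(sources):
            del self.cache[key]
        return outputs


proc = IncrementalProcessor(str.upper, logic_version="v1")
proc.process({"a.md": "hello", "b.md": "world"})   # computes 2 rows
proc.process({"a.md": "hello", "b.md": "world!"})  # only b.md changed: computes 1 row
print(proc.recomputed)  # 3
```

Bumping `logic_version` changes every fingerprint, which models a logic change forcing recomputation; an unchanged source row under unchanged logic is served from the cache.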

Quick Start

If you're new to CocoIndex, we recommend checking out the CocoIndex Documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't already have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
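The example flow calls `SplitRecursively` with `chunk_size=2000, chunk_overlap=500`. As a rough illustration of what those two parameters mean, here is a naive fixed-window character splitter. `split_fixed` is hypothetical and deliberately simpler than `SplitRecursively`, which, as its name suggests, splits recursively rather than at fixed offsets, so real chunk boundaries will differ; the `location` tuples are likewise just (start, end) character offsets chosen for this sketch.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Hypothetical fixed-window splitter illustrating chunk_size / chunk_overlap.

    Consecutive chunks start `chunk_size - chunk_overlap` characters apart, so each
    chunk repeats the last `chunk_overlap` characters of the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + chunk_size, len(text))
        chunks.append({"location": (start, end), "text": text[start:end]})
        if end == len(text):
            break  # the window reached the end; avoid trailing fully-overlapped chunks
    return chunks


doc = "x" * 5000
chunks = split_fixed(doc, chunk_size=2000, chunk_overlap=500)
print([c["location"] for c in chunks])  # [(0, 2000), (1500, 3500), (3000, 5000)]
```

The overlap means text near a boundary appears in two neighbouring chunks, so a passage cut by one boundary can still be embedded whole in the adjacent chunk.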

🚀 Examples and demo

| Example | Description |
| --- | --- |
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDFs and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using an LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Azure Blob Storage Embedding | Index text documents from Azure Blob Storage |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with an LLM and a graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |
| Face Recognition | Recognize faces in images and build an embedding index |
| Paper Metadata | Index papers in PDF files and build metadata tables for each paper |
| Multi Format Indexing | Build a visual document index from PDFs and images with ColPali for semantic search |
| Custom Output Files | Convert Markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

We welcome you with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


