
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with a core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!




CocoIndex makes it effortless to transform data with AI and keep source data and target in sync. Whether you're building a vector index for RAG, creating knowledge graphs, or performing any custom data transformation, CocoIndex goes beyond SQL.




Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting records. They only need to define the transformation (formula) over a set of source data.
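The idea can be illustrated in plain Python. The sketch below is not CocoIndex's API or implementation; the `Row` class, its `derive` method, and the two helper functions are hypothetical names made up for illustration. It shows fields that are only ever added, never overwritten, with lineage recorded for each derived field.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Row:
    """Hypothetical row type: fields are only added, never overwritten."""
    fields: Dict[str, object]
    # Maps each derived field to (function name, names of its input fields).
    lineage: Dict[str, Tuple[str, List[str]]] = field(default_factory=dict)

    def derive(self, out: str, fn: Callable[..., object], *inputs: str) -> "Row":
        """Return a new Row with `out` computed from the named input fields."""
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists; no mutation allowed")
        value = fn(*(self.fields[name] for name in inputs))
        return Row(
            fields={**self.fields, out: value},
            lineage={**self.lineage, out: (fn.__name__, list(inputs))},
        )


def word_count(text: str) -> int:
    return len(text.split())


def is_long(count: int) -> bool:
    return count > 3


row = Row({"content": "declare the transformation once"})
row = row.derive("words", word_count, "content").derive("long", is_long, "words")
print(row.fields)
print(row.lineage)
```

Because every derived field names its inputs, the value and origin of each intermediate result can be inspected after the fact.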

Plug-and-Play Building Blocks

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change, as easy as assembling building blocks.
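As a rough sketch of why a shared interface makes the switch a one-line change, consider the plain-Python example below. The `Target` protocol, `ListTarget`, `KeyedTarget`, and `export` are hypothetical names for illustration only and are not part of CocoIndex.

```python
from typing import Dict, List, Protocol

Row = Dict[str, object]


class Target(Protocol):
    """Hypothetical shared interface; not CocoIndex's actual API."""

    def write(self, rows: List[Row]) -> int: ...


class ListTarget:
    """Appends rows to a Python list."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def write(self, rows: List[Row]) -> int:
        self.rows.extend(rows)
        return len(rows)


class KeyedTarget:
    """Upserts rows into a dict keyed by a primary-key field."""

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        self.by_key: Dict[object, Row] = {}

    def write(self, rows: List[Row]) -> int:
        for row in rows:
            self.by_key[row[self.key_field]] = row
        return len(rows)


def export(rows: List[Row], target: Target) -> int:
    """The pipeline depends only on the interface, not on a concrete target."""
    return target.write(rows)


rows = [{"filename": "a.md", "text": "alpha"}, {"filename": "b.md", "text": "beta"}]
target = ListTarget()  # switching components means changing only this line
# target = KeyedTarget("filename")
written = export(rows, target)
print(written)
```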


Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It supports incremental indexing out of the box:

  • Minimal recomputation when the source or the logic changes.
  • Only the necessary portions are (re-)processed; cached results are reused when possible.
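The two points above can be sketched with a content-fingerprint cache. This is a minimal illustration under stated assumptions, not how CocoIndex implements incremental processing: the `IncrementalIndex` class, the `fingerprint` helper, and the `logic_version` tag are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fingerprint(content: str, logic_version: str) -> str:
    """Hash the source content together with a version tag for the logic."""
    payload = f"{logic_version}\x00{content}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class IncrementalIndex:
    """Hypothetical index that recomputes only entries whose fingerprint changed."""

    def __init__(self, transform: Callable[[str], str], logic_version: str) -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.cache: Dict[str, Tuple[str, str]] = {}  # key -> (fingerprint, output)
        self.recomputed = 0  # counts how many transform calls were actually made

    def update(self, sources: Dict[str, str]) -> Dict[str, str]:
        # Drop outputs whose source no longer exists.
        for key in list(self.cache):
            if key not in sources:
                del self.cache[key]
        # Recompute only new or changed sources; reuse the cache otherwise.
        for key, content in sources.items():
            fp = fingerprint(content, self.logic_version)
            cached = self.cache.get(key)
            if cached is None or cached[0] != fp:
                self.cache[key] = (fp, self.transform(content))
                self.recomputed += 1
        return {key: out for key, (_, out) in self.cache.items()}


index = IncrementalIndex(str.upper, logic_version="v1")
index.update({"a.md": "alpha", "b.md": "beta"})            # both entries computed
result = index.update({"a.md": "alpha", "b.md": "BETA!"})  # only b.md recomputed
print(index.recomputed, result)
```

In this sketch, a logic change is modeled by changing `logic_version`, which alters every fingerprint and forces recomputation of the affected entries.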

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
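The example flow exports embeddings with a cosine similarity metric. As a rough illustration of what that metric computes, the plain-Python sketch below ranks stored chunks against a query vector. It is not CocoIndex or Postgres code; `cosine_similarity`, `top_k`, and the toy two-dimensional vectors are assumptions for illustration, and a real vector index would perform this search inside the database.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every (text, embedding) row and return the k most similar."""
    scored = [(text, cosine_similarity(query, emb)) for text, emb in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings standing in for the `embedding` field of each chunk.
rows = [("chunk about cats", [1.0, 0.0]), ("chunk about dogs", [0.0, 1.0])]
best = top_k([0.9, 0.1], rows, k=1)
print(best[0][0])
```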

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper
Multi Format Indexing | Build visual document index from PDFs and images with ColPali for semantic search
Custom Output Files | Convert markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets
Patient intake form extraction | Use LLM to extract structured data from patient intake forms with different formats

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.



