
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with a core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it effortless to transform data with AI and keep source data and target in sync. Whether you're building a vector index for RAG, creating knowledge graphs, or performing any custom data transformation, CocoIndex goes beyond SQL.


CocoIndex Features


Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting. They only need to define the transformation (formula) for a set of source data.
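To make the "new field from input fields, no mutation" idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API; `add_field` and the field names are hypothetical, chosen only for illustration.

```python
from types import MappingProxyType


def add_field(row, name, fn, *inputs):
    """Return a new read-only row with `name` computed only from the listed input fields."""
    value = fn(*(row[key] for key in inputs))
    # Build a fresh mapping instead of mutating `row`,
    # so the state before this step stays observable.
    return MappingProxyType({**row, name: value})


# Hypothetical source row with a single `content` field.
source = MappingProxyType({"content": "Hello CocoIndex"})

# Each step derives one new field from existing fields.
step1 = add_field(source, "lower", str.lower, "content")
step2 = add_field(step1, "word_count", lambda text: len(text.split()), "lower")

print(dict(step2))
```

Because every step returns a new row, `source` and `step1` remain available for inspection after `step2` is built, which is the property that makes lineage tracking straightforward.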

Plug-and-Play Building Blocks

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change, as easy as assembling building blocks.


Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • Minimal recomputation on source or logic change.
  • (Re-)processing of only the necessary portions, reusing the cache when possible.
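One common way to get this behavior is to fingerprint each input together with a version of the transformation logic, and recompute only when the fingerprint is new. The sketch below illustrates that idea in plain Python. It is an assumption-laden illustration, not CocoIndex's actual mechanism; `IncrementalIndex` and `logic_version` are hypothetical names.

```python
import hashlib


class IncrementalIndex:
    """Toy incremental index: recompute only inputs whose fingerprint is new."""

    def __init__(self, transform, logic_version="v1"):
        self.transform = transform
        self.logic_version = logic_version  # bump this when the logic changes
        self.cache = {}   # fingerprint -> transformed result (never evicted here)
        self.index = {}   # source key -> transformed result
        self.calls = 0    # how many times `transform` actually ran

    def _fingerprint(self, content):
        payload = f"{self.logic_version}:{content}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def update(self, sources):
        new_index = {}
        for key, content in sources.items():
            fingerprint = self._fingerprint(content)
            if fingerprint not in self.cache:
                # Source content or logic version changed: recompute this row only.
                self.cache[fingerprint] = self.transform(content)
                self.calls += 1
            new_index[key] = self.cache[fingerprint]
        # Rows whose source disappeared are dropped from the index.
        self.index = new_index


idx = IncrementalIndex(str.upper)
idx.update({"a.md": "alpha", "b.md": "beta"})   # both rows computed
idx.update({"a.md": "alpha", "b.md": "beta!"})  # only b.md recomputed
print(idx.calls, idx.index)
```

In this sketch, changing `logic_version` invalidates every fingerprint, which mirrors the "logic change" case: all rows are reprocessed, while unchanged logic plus unchanged content hits the cache.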

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
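Two parameters in the flow above are easier to reason about with a small example: `chunk_size` / `chunk_overlap` and the cosine similarity metric. The sketch below is plain Python and makes two loud assumptions: it uses a naive fixed-size character splitter (not the behavior of `SplitRecursively`), and it computes cosine similarity by hand instead of through a vector index. `split_with_overlap` and `cosine_similarity` are hypothetical helper names.

```python
import math


def split_with_overlap(text, chunk_size, chunk_overlap):
    """Naive fixed-size splitter: consecutive chunks share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        # Keep the start offset so each chunk has a stable location key.
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

The overlap means text near a chunk boundary appears in two chunks, so a query matching that text can still retrieve a chunk with enough surrounding context.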

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper
Multi Format Indexing | Build visual document index from PDFs and images with ColPali for semantic search
Custom Output Files | Convert markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets
Patient intake form extraction | Use LLM to extract structured data from patient intake forms with different formats

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.2.2.tar.gz (27.4 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.
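Wheel file names follow the pattern `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. As a rough sketch, the hypothetical helper below splits a name into those parts; it assumes the name is already normalized so that `-` appears only as a field separator.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its tag components (build tag is optional)."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[:-len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("cocoindex-0.2.2-cp313-cp313t-manylinux_2_28_aarch64.whl")
print(info)
```

For example, `cp313` with ABI tag `cp313t` denotes the free-threaded CPython 3.13 build, and `manylinux_2_28_aarch64` means Linux on ARM64 with glibc 2.28 or newer.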

cocoindex-0.2.2-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (16.0 MB)

Uploaded PyPy manylinux: glibc 2.28+ ARM64

cocoindex-0.2.2-cp313-cp313t-manylinux_2_28_aarch64.whl (16.0 MB)

Uploaded CPython 3.13t manylinux: glibc 2.28+ ARM64

cocoindex-0.2.2-cp313-cp313-win_amd64.whl (15.9 MB)

Uploaded CPython 3.13 Windows x86-64

cocoindex-0.2.2-cp313-cp313-manylinux_2_28_x86_64.whl (16.7 MB)

Uploaded CPython 3.13 manylinux: glibc 2.28+ x86-64

cocoindex-0.2.2-cp313-cp313-manylinux_2_28_aarch64.whl (16.0 MB)

Uploaded CPython 3.13 manylinux: glibc 2.28+ ARM64

cocoindex-0.2.2-cp313-cp313-macosx_11_0_arm64.whl (15.8 MB)

Uploaded CPython 3.13 macOS 11.0+ ARM64

cocoindex-0.2.2-cp313-cp313-macosx_10_12_x86_64.whl (16.4 MB)

Uploaded CPython 3.13 macOS 10.12+ x86-64

cocoindex-0.2.2-cp312-cp312-win_amd64.whl (15.9 MB)

Uploaded CPython 3.12 Windows x86-64

cocoindex-0.2.2-cp312-cp312-manylinux_2_28_x86_64.whl (16.7 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ x86-64

cocoindex-0.2.2-cp312-cp312-manylinux_2_28_aarch64.whl (16.0 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ ARM64

cocoindex-0.2.2-cp312-cp312-macosx_11_0_arm64.whl (15.8 MB)

Uploaded CPython 3.12 macOS 11.0+ ARM64

cocoindex-0.2.2-cp312-cp312-macosx_10_12_x86_64.whl (16.4 MB)

Uploaded CPython 3.12 macOS 10.12+ x86-64

cocoindex-0.2.2-cp311-cp311-win_amd64.whl (15.9 MB)

Uploaded CPython 3.11 Windows x86-64

cocoindex-0.2.2-cp311-cp311-manylinux_2_28_x86_64.whl (16.7 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ x86-64

cocoindex-0.2.2-cp311-cp311-manylinux_2_28_aarch64.whl (16.0 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ ARM64

cocoindex-0.2.2-cp311-cp311-macosx_11_0_arm64.whl (15.8 MB)

Uploaded CPython 3.11 macOS 11.0+ ARM64

cocoindex-0.2.2-cp311-cp311-macosx_10_12_x86_64.whl (16.4 MB)

Uploaded CPython 3.11 macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.2.2.tar.gz.

File metadata

  • Download URL: cocoindex-0.2.2.tar.gz
  • Upload date:
  • Size: 27.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.4

File hashes

Hashes for cocoindex-0.2.2.tar.gz
Algorithm Hash digest
SHA256 0428ba76b9b0711cbe3785db650da316305f1e638cb87136ef650b0dbd4cb854
MD5 c38ffc26a7401f1b92000beb8b45a143
BLAKE2b-256 02634b4a48e8a63d930c2f7d7c6c592c0659f17bbac16c6b36a776b1fd1ea13c

See more details on using hashes here.
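As a minimal sketch of how a published digest can be checked after a manual download, the snippet below hashes a local file with SHA256 and compares it to the expected value. The helper names `sha256_of_file` and `verify_download` and the local file path are assumptions for illustration; the expected digest would be copied from the listing for the specific file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read block by block so large archives are not loaded into memory at once.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 matches the published hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Example usage (assumes the sdist was downloaded to the current directory):
# ok = verify_download(
#     "cocoindex-0.2.2.tar.gz",
#     "0428ba76b9b0711cbe3785db650da316305f1e638cb87136ef650b0dbd4cb854",
# )
```

When installing with pip, a requirements file that pins `--hash=sha256:...` entries combined with `pip install --require-hashes` performs the same check automatically.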

File details

Details for the file cocoindex-0.2.2-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 bc262ae96de1d392865b317ecf00f835029e210307086250ca5c05330cb53fd0
MD5 7ce06e27b7d24b51abfbeeabf5a8869b
BLAKE2b-256 1d2921b7b3ad7e654584ee50bc45fceb738d2bc6be7e9090aa069ed8c0d0f5ad

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp313-cp313t-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c8e92ce35272be8beb076f7aa3d4236dc9f8654805890c8a277834d479415764
MD5 6bfac531a42e407a9cd6f9a526a40360
BLAKE2b-256 de74c632806f065b39692bafd1cca5f3911f3d7aab87e824d62787daf3d66e14

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 179c11533c37f156c896eaddbaf766c4744ec281c980501b65f2c8eac84f81da
MD5 126bbbeabeee2072b9ed567d6ffb2284
BLAKE2b-256 7d9a2e9e03138c286914c3353271cfb994b7400a4989d93f5499c88144552ec1

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 3b421ab607cc067aa16b92f9a47b3a365ab4c4a6893a1a7e47ffc07c762d80f7
MD5 f61dc461079a2bf6f1b065b7b65a939d
BLAKE2b-256 efa13f9e62dc3106bd76a5c388303435e2db62f35217ec237d2d7e9aed6dc486

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp313-cp313-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 5190434471097e77e88fe294c4fe083a9db69d1ac9e60c57693eb27893169c6c
MD5 d83baf9be55fba860f2a464fb8d8d05a
BLAKE2b-256 ca4e303b106042920d408ac0806a142f82e99a307c241005733b8be7a838ba34

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6d3d3a3e1f1069d699c240eecbdd83ffc40a7053874783e2d23630de86c7bc52
MD5 144e2c4c957e07e9d601d6b4e9ce6dfb
BLAKE2b-256 0e949c3d6682693a04cfb1dafc97c4eeb62a8944ba8b7b71664a540f3f0cf64f

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 0ac7126b0e83f247e5e0cc22412c4e5df6dfbe6018c057c64860db2e02fe5e25
MD5 06509b611be70c336ca58fd85def66fa
BLAKE2b-256 c6a5be534d9540590847cf07f054afccf8e88bbf74fa41d7881ee3085d4e75e6

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 eadda551af33b1477fd3972bc52b0f50160d9de66fc745910b6e3fe35fe9e530
MD5 499128d1d94ece6ee0f1d3c6820b27ef
BLAKE2b-256 a4358e3e398a4864aada5cb4e9ebe443a2fe0ca0c08bf70531be66cdc1dab9e4

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e5d68af6988fa383370eaf4d56e4bba49e7900853a17c7e5028863d65b303c75
MD5 da614b0debc371fc1693f7fe6222fc56
BLAKE2b-256 fbf755de438e159fb5f1d74c1c4bcf20fc5073ced6d3f5f76c269a71d958311d

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp312-cp312-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2e5207b973619f590482502b9601ddb7aa8dd8e7646cdc8bcce0f879a4a53c97
MD5 946d121ea2c372bcaef0a24aa53e6ba9
BLAKE2b-256 f0d8f1792fed0deb80f831ccbdecc05fa6c451a42621820af633fd9fceaabf73

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7e4187e685b62676032b1e2439474a0f9c8530022251ce82df600247336f3611
MD5 02976cc4d8f8b2059d899f1afb7b8c1c
BLAKE2b-256 d4bd141d78a8737693df541ecfaa6e88bd0c766d6268b132fddf5d4ec32032e8

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 f1aca90a4092375dabc3e733f81e0c71649568afb03d47191da0a34962fa9fb6
MD5 79bf3d6a999cab3fce649015dd85c152
BLAKE2b-256 89c9afc410a0f984331c85cefbae361442dd4ee1d71a568fc8db371a04725f9d

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 94988389a26023dfe0d868df754570fb9242c1a1e970935521f113e9253026ac
MD5 581b8cee65f51c59d52ce55b00213dec
BLAKE2b-256 38147963cb2e54c932a9cd9b76dfa0d29ed400312dd352e3696f6c4b9a754848

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b5b658f82054ed8b99e3e202b528423cf1ffa8e3ea4ff23abb672e758aa7a60e
MD5 e69c60881ff043464823762a37ee99bb
BLAKE2b-256 b389575da25287d223c1162b8820f4b03560f5caba3198355402629a65075115

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp311-cp311-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 471a8273a6c136ab866aebe751a85ec94ab9f7793bf104753365693ce3792bd9
MD5 7dd63e1c5116b127773cfb6fa276acde
BLAKE2b-256 1873dc5144bdf4c05c5ff8ce32fd99d62c645a475ba418002633cdcc45cae454

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7defa34a7a187dec1f1d2fd00b625b5e025e65c8089f7320c27621029f05cd8e
MD5 b97e01787f48ba50e4591561895d7d57
BLAKE2b-256 09480ca29cb311a460737301bcc0a4bc9862b77e6d6973d989648ad2b885eb6d

See more details on using hashes here.

File details

Details for the file cocoindex-0.2.2-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.2.2-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b30c9192de0011e660225d3278147ab480e416f9fcedee991b6591727f8d3dd2
MD5 a0c73347db7b7fe216e72520742625b0
BLAKE2b-256 f2fe0f9d45d88bb5f4a82851abcbcc8021b588f58a437c5a3b19049a82a52606

See more details on using hashes here.
