
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready on day 0.

⭐ Drop a star to help us grow!


[Figure: CocoIndex Transformation]


CocoIndex makes it easy to transform data with AI workloads and keeps source data and target in sync.


[Figure: CocoIndex Features]


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, or deleting it. They only need to define the transformation (formula) for a set of source data.
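The dataflow idea can be illustrated in plain Python. This is only a minimal sketch and does not use the CocoIndex API: `run_dataflow`, `Row`, and `Step` are hypothetical names invented for this example. Each step derives a new field from an existing one with a pure function, nothing is overwritten, and a simple lineage map records which input each field came from.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: these names are not part of the CocoIndex API.
Row = Dict[str, object]
# A step is (output field, input field, pure function).
Step = Tuple[str, str, Callable[[object], object]]


def run_dataflow(source: Row, steps: List[Step]) -> Tuple[Row, Dict[str, str]]:
    """Apply each step as a pure function, adding one new field per step."""
    row = dict(source)  # work on a copy so the source row is never mutated
    lineage: Dict[str, str] = {}
    for out_field, in_field, fn in steps:
        if out_field in row:
            # Fields are only ever added, never overwritten.
            raise ValueError(f"field {out_field!r} already exists")
        row[out_field] = fn(row[in_field])
        lineage[out_field] = in_field  # record which field this one was derived from
    return row, lineage


source = {"content": "Hello  CocoIndex"}
steps = [
    ("normalized", "content", lambda s: " ".join(s.split()).lower()),
    ("tokens", "normalized", lambda s: s.split(" ")),
]
row, lineage = run_dataflow(source, steps)
```

After the run, `row` still holds the original `content` field next to the derived `normalized` and `tokens` fields, so every intermediate value stays observable.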

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
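As a rough sketch of why a standardized interface makes such a switch a one-line change, consider the plain-Python example below. It does not use CocoIndex's own classes: `Target`, `InMemoryTarget`, `PrintTarget`, and `export` are hypothetical names invented for illustration.

```python
from typing import Dict, List, Protocol

# Hypothetical illustration only: these names are not part of the CocoIndex API.
RowList = List[Dict[str, str]]


class Target(Protocol):
    """Anything with a write() method can act as an export target."""

    def write(self, rows: RowList) -> None: ...


class InMemoryTarget:
    """Keeps exported rows in a list."""

    def __init__(self) -> None:
        self.rows: RowList = []

    def write(self, rows: RowList) -> None:
        self.rows.extend(rows)


class PrintTarget:
    """Prints exported rows to standard output."""

    def write(self, rows: RowList) -> None:
        for row in rows:
            print(row)


def export(rows: RowList, target: Target) -> None:
    # The pipeline depends only on the shared interface, not on a concrete target.
    target.write(rows)


target = InMemoryTarget()  # replacing this with PrintTarget() is the only change needed
export([{"text": "hello"}], target)
```

Because `export` only relies on the shared `write` method, the rest of the pipeline stays untouched when the target changes.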


Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation when the source or the logic changes.
  • (re-)processing only the necessary portions, reusing the cache when possible.
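The two points above can be sketched with a content-fingerprint cache. This is a simplified illustration under stated assumptions, not how the CocoIndex engine is actually implemented: `IncrementalIndex`, `logic_version`, and the fingerprint scheme are hypothetical. A row is recomputed only when its content or the transformation logic changes, and rows removed from the source are removed from the target.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical sketch: not CocoIndex's actual engine or API.
class IncrementalIndex:
    def __init__(self, transform: Callable[[str], str], logic_version: str) -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.fingerprints: Dict[str, str] = {}  # key -> fingerprint of (logic, content)
        self.target: Dict[str, str] = {}        # derived data kept in sync with the source
        self.computed = 0                       # number of times transform() actually ran

    def _fingerprint(self, content: str) -> str:
        # The fingerprint changes if either the logic version or the content changes.
        return hashlib.sha256(f"{self.logic_version}\0{content}".encode()).hexdigest()

    def update(self, source: Dict[str, str]) -> None:
        # Drop target rows whose source row no longer exists.
        for key in list(self.target):
            if key not in source:
                del self.target[key]
                del self.fingerprints[key]
        # Recompute only rows whose fingerprint changed; reuse everything else.
        for key, content in source.items():
            fingerprint = self._fingerprint(content)
            if self.fingerprints.get(key) == fingerprint:
                continue
            self.target[key] = self.transform(content)
            self.fingerprints[key] = fingerprint
            self.computed += 1


index = IncrementalIndex(str.upper, logic_version="v1")
index.update({"a.md": "alpha", "b.md": "beta"})   # both rows are new: 2 computations
index.update({"a.md": "alpha", "b.md": "beta!"})  # only b.md changed: 1 more
index.update({"a.md": "alpha"})                   # b.md deleted: nothing recomputed
```

In this sketch, changing `logic_version` invalidates every fingerprint, which mirrors the idea that a logic change triggers reprocessing while unchanged logic and unchanged content reuse cached results.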

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have one. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build an embedding index
Paper Metadata | Index papers in PDF files and build metadata tables for each paper
Custom Output Files | Convert Markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
