With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting. Instead, they declare that, for a set of source data, this is the transformation or formula. The framework takes care of the data operations, such as when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
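
To make the "new field from input fields, no mutation" idea concrete, here is a minimal plain-Python sketch. It does not use CocoIndex at all; the `derive` helper and the field names are hypothetical and exist only to illustrate the dataflow style described above.

```python
from types import MappingProxyType


def derive(row, new_field, fn, *input_fields):
    """Return a new read-only row with `new_field` computed only from `input_fields`."""
    inputs = [row[name] for name in input_fields]
    # Build a copy instead of mutating, so the row before this step stays observable.
    return MappingProxyType({**row, new_field: fn(*inputs)})


source = MappingProxyType({"content": "Hello CocoIndex"})
step1 = derive(source, "lowered", str.lower, "content")
step2 = derive(step1, "word_count", lambda text: len(text.split()), "lowered")

# Every intermediate row is still available for inspection (a simple form of lineage).
print(dict(source))
print(dict(step1))
print(dict(step2))
```

Because each step only reads its declared inputs, a framework can in principle work out which derived fields depend on which source fields, which is what makes the incremental updates described in the next section possible.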

Data Freshness

As a data framework, CocoIndex treats data freshness as a primary goal. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect any source update in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles this for you.
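
One common way to decide what needs updating is to compare content fingerprints between runs. The sketch below shows that general technique in plain Python; it is an illustration under that assumption, not a description of how CocoIndex implements change detection internally, and `plan_updates` is a hypothetical helper.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable digest of a source item's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_updates(previous: dict[str, str], current_sources: dict[str, str]):
    """Compare stored fingerprints with the current sources.

    previous: key -> fingerprint recorded by the last run
    current_sources: key -> current content
    Returns (keys to reprocess, keys to delete, new fingerprint state).
    """
    current = {key: fingerprint(text) for key, text in current_sources.items()}
    to_upsert = sorted(key for key, fp in current.items() if previous.get(key) != fp)
    to_delete = sorted(key for key in previous if key not in current)
    return to_upsert, to_delete, current


# First run: nothing is known yet, so every item is processed.
state: dict[str, str] = {}
first_upsert, first_delete, state = plan_updates(
    state, {"a.md": "alpha", "b.md": "beta", "c.md": "gamma"}
)

# Second run: a.md is unchanged, b.md changed, c.md was removed, d.md is new.
second_upsert, second_delete, state = plan_updates(
    state, {"a.md": "alpha", "b.md": "beta v2", "d.md": "delta"}
)
print(second_upsert, second_delete)
```

Only the changed and new items are reprocessed on the second run, and the removed item is scheduled for deletion from the target store; the unchanged item costs one hash comparison.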

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
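
Before running a flow, it can help to sanity-check the Postgres connection string. The sketch below only parses a URL with the standard library; it assumes the connection string is provided through an environment variable named `COCOINDEX_DATABASE_URL` and uses a made-up local default, so check the Quick Start Guide for the exact variable name and value your setup expects.

```python
import os
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Parse a Postgres connection URL and return its host, port, and database name."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// or postgresql:// URL, got {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # 5432 is the default Postgres port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Assumption: the variable name and the fallback URL are illustrative only.
    url = os.environ.get(
        "COCOINDEX_DATABASE_URL",
        "postgres://cocoindex:cocoindex@localhost/cocoindex",
    )
    print(describe_database_url(url))
```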

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
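
The `chunk_size` and `chunk_overlap` arguments in the flow above control how much text each chunk holds and how much neighbouring chunks share. The plain-Python sketch below shows a simplified fixed-window version of that idea. It is not the `SplitRecursively` algorithm, and the `(start, end)` tuple used for `location` is an assumption made for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Split `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + chunk_size, len(text))
        # The location identifies the chunk within its document.
        chunks.append({"location": (start, end), "text": text[start:end]})
        if end == len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk["location"], chunk["text"])
```

Pairing a document identifier with a per-chunk location gives each chunk a stable key, which is the role `primary_key_fields=["filename", "location"]` plays in the export step.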

It defines an index flow like this:

[Figure: Data Flow]
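
The export step in the flow above requests a vector index with the `COSINE_SIMILARITY` metric. As a rough illustration of what that metric measures, the sketch below ranks a few toy rows against a query vector in plain Python. The three-dimensional vectors and the `top_k` helper are made up for the example; a real index would use the embedding model's vectors and the database's own search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[dict], k: int = 2) -> list[dict]:
    """Return the k rows whose embeddings are most similar to the query."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query, row["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy rows standing in for exported chunks.
rows = [
    {"text": "a", "embedding": [1.0, 0.0, 0.0]},
    {"text": "b", "embedding": [0.0, 1.0, 0.0]},
    {"text": "c", "embedding": [1.0, 1.0, 0.0]},
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
print([row["text"] for row in results])
```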

🚀 Examples and demo

Example: Description

  • Text Embedding: Index text documents with embeddings for semantic search
  • Code Embedding: Index code embeddings for semantic search
  • PDF Embedding: Parse PDF and index text embeddings for semantic search
  • Manuals LLM Extraction: Extract structured information from a manual using LLM
  • Amazon S3 Embedding: Index text documents from Amazon S3
  • Google Drive Text Embedding: Index text documents from Google Drive
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup
  • Product Recommendation: Build real-time product recommendations with LLM and graph database
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cocoindex-0.1.47.tar.gz (5.7 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • cocoindex-0.1.47-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.7 MB): PyPy, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.47-cp313-cp313t-manylinux_2_28_aarch64.whl (13.6 MB): CPython 3.13t, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.47-cp313-cp313-win_amd64.whl (13.5 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.47-cp313-cp313-manylinux_2_28_x86_64.whl (14.2 MB): CPython 3.13, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.47-cp313-cp313-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.13, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.47-cp313-cp313-macosx_11_0_arm64.whl (13.5 MB): CPython 3.13, macOS 11.0+ ARM64
  • cocoindex-0.1.47-cp313-cp313-macosx_10_12_x86_64.whl (14.0 MB): CPython 3.13, macOS 10.12+ x86-64
  • cocoindex-0.1.47-cp312-cp312-win_amd64.whl (13.5 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.47-cp312-cp312-manylinux_2_28_x86_64.whl (14.2 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.47-cp312-cp312-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.47-cp312-cp312-macosx_11_0_arm64.whl (13.5 MB): CPython 3.12, macOS 11.0+ ARM64
  • cocoindex-0.1.47-cp312-cp312-macosx_10_12_x86_64.whl (14.0 MB): CPython 3.12, macOS 10.12+ x86-64
  • cocoindex-0.1.47-cp311-cp311-win_amd64.whl (13.5 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.47-cp311-cp311-manylinux_2_28_x86_64.whl (14.2 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.47-cp311-cp311-manylinux_2_28_aarch64.whl (13.7 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.47-cp311-cp311-macosx_11_0_arm64.whl (13.5 MB): CPython 3.11, macOS 11.0+ ARM64
  • cocoindex-0.1.47-cp311-cp311-macosx_10_12_x86_64.whl (14.0 MB): CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.47.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.47.tar.gz
  • Upload date:
  • Size: 5.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.6

File hashes

Hashes for cocoindex-0.1.47.tar.gz
Algorithm Hash digest
SHA256 2ed6bf0d10922c228fe88f4c0dca8575aa98faaa9d962ea8152901a6942d80c6
MD5 355b252cfe0eab4616aa32a96e72eedb
BLAKE2b-256 75f77acb73571b146c700572591da938155fcbf4df004fc8996e859379391029

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 ad9176f43d54b4d9e991fd1bb3d96a01b7ef5f1ceb23789ab2a71537b1bf57d6
MD5 3b3cf659546c0124957756e9e70dae7e
BLAKE2b-256 2a78675e908cefeedd3af2cde5edc2b811524990935c98fb1751491fbbc1d273

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp313-cp313t-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 245fdffa81fa1ce0eadbda448f6c6e800e2339d293cf597b019bb5c3cd7fcd20
MD5 67025b561b340047f085eb6c77317049
BLAKE2b-256 6d7ef881a9853407210da155fa96d9e157275da821be6a82870e890c33669966

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 20b6920b010ec62e83d93ff5ed3d22797e441b30121d0e022eb5ca6452df8a85
MD5 28b06933d5ec0e19b98605c4de513be1
BLAKE2b-256 939139be57386b84cc4d7f7ddb05471fdb4432b51610ed1418f294348d87171e

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f8ae212b096448907cc65ae1b305fa0608660ddf8ddc3c287865d0957433865f
MD5 8c23203cb908fcc64ee928390ca6a7cd
BLAKE2b-256 80dbcbdd5d27326d3b1646c940b74757691d295de77dc95997c645340459de28

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp313-cp313-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 39783f9f2949aa716b58ce970007a050cc8f51a9d6d57ddec01ec7f1e8435c4e
MD5 952467cf5be82ce0dc4a23f3d197a1fe
BLAKE2b-256 45a3411e56bb922b0f00b85f77f2ad6faa2c3f5e2b0952bfe7b4e8a27e6ec8b9

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 034062b4e86a1cdaad898f72d2f420d5e8eb724db81e0cb9cab7721050e0725a
MD5 4d7a5c7e1c02eec1907f8fa9b24d78ef
BLAKE2b-256 9a462ab0786dd8e8777caaed52bc248e001b6fc30cdd7a65649bd88d5910fea9

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp313-cp313-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 390bf66dc85ef3194e7ca0c121034554592e4924633a3ca3caa1ca5747b0e555
MD5 32b70f83618a7387107bf6eb6345b583
BLAKE2b-256 739d2d0949f00a75eaabd71ffb8927c6b9065e7c30606b0a29a7716593320cfa

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 a44e71925b9321e38275b52cae832e07157858d3b6d91a488f4411c032cd38c5
MD5 55351beef251559ddbdde55ea4c822e4
BLAKE2b-256 d92ecd3e96453208c0c14de90d85fe78544562447de2152463faedc5b795a40c

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 eadc59222394e54c92a0036c10c800daa1158d3e4c40808375fabdd23b372115
MD5 5c8fc8b622f7a07d5b4e9c5bb054eb7a
BLAKE2b-256 57c2deb8a52514d1014e6aa115257cd624b5dd5df8b05562bedcb0ddc6a75926

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp312-cp312-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b38368f1e2c35f7800705e1669ff22b97a7bffe6689bc57d80dd0336624cb474
MD5 bc5034d3f4b25db4c91de9514b8096a7
BLAKE2b-256 b4f3137a875aa00bf649a13e3120e42150aa654a1679c4eb3ee74793370c68d2

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 19299ab2c703065fd91ad20ece2aab59270c7d915af7face3619202f2ee5b7d5
MD5 3fb2ddb0d488e61222b41703483dc03b
BLAKE2b-256 37e40894717898db672235af6d8526a308cd2a749186e73902c028b06e901a18

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp312-cp312-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9148d480e1e01defc4a86f06441d7ab110ad8ae045fb48e4f9c0c67379cca192
MD5 19b2a18688ba9b8292c0647674d005e3
BLAKE2b-256 f260fc75c9fbbe0a1347916b523d0d0a1ae2086775576b62a1d5aaa0a7e0dcce

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp311-cp311-win_amd64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 df35bcbc7401b104684697f1001eb75a402871c4e972c61697fe7c305e2428cd
MD5 c5a55da852244331611b145f9bf89467
BLAKE2b-256 ec23d926a43be63c6853e97095dfe7bc6409c536327fdbe4a59145ba18ea8eba

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 56419889c57b4e8310d61aa29ae79f8b4ab8cceca1e9fb2f7a49d9dfefad6fe8
MD5 f7a0f20983ad59ab6baf717b1d8ee15c
BLAKE2b-256 58f1780e26370e6199047d2f53920b33bca30001ccaa658932331054e9d71171

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp311-cp311-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 66952498c00d73e1ded5d739e7c5fa34a40ef84bf47beecebf8cb1dc6ca8e1e9
MD5 670b9b9854b1312eb7f66bcc14f144d7
BLAKE2b-256 6a1ea4d8e0bd34baa640c0bb628aefad130e20133c0771f98a33a8938693a637

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4962be76024a5ab9abbec6925dc6fdb1d1d7255d4a1f304d572cf0c3060ba057
MD5 39638461606ef36ac1346e9760ca21c5
BLAKE2b-256 f5d9256df615abf9500dfe4b18a5320b9a05e9e46821ba1ded99c9884b0e0766

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.47-cp311-cp311-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for cocoindex-0.1.47-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9d7d8a7ac9ee829b984dbf07186e07a41b2d4968cd2be2140ad6905ea9a24aea
MD5 abc04590d8d7b375cf180e8970f51456
BLAKE2b-256 c8edba9d5ac677c952f15c5e5fb16ec97b382fb15dd5c3ec8b8ac3b88e4c44ca

See more details on using hashes here.
