With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI - whether creating embeddings, building knowledge graphs, or performing other data transformations - and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they declare the transformation or formula that applies to a set of source data. The framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
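
To make the "new field from input fields, no mutation" idea concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API: the `Row` class and its `derive` method are hypothetical names invented for illustration, and the lineage record is a simplified stand-in for what a real framework would track.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Row:
    """Hypothetical row whose fields are only ever added, never overwritten."""
    fields: dict = field(default_factory=dict)
    lineage: dict = field(default_factory=dict)  # output field -> input field

    def derive(self, out: str, src: str, fn: Callable) -> None:
        # Refuse to overwrite: every field is written exactly once.
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists")
        # The new value depends only on the named input field.
        self.fields[out] = fn(self.fields[src])
        self.lineage[out] = src


row = Row({"content": "  Hello CocoIndex  "})
row.derive("trimmed", "content", str.strip)
row.derive("upper", "trimmed", str.upper)
```

Because no field is ever overwritten, every intermediate value stays inspectable, and `row.lineage` records which input each derived field came from.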

Data Freshness

As a data framework, CocoIndex treats data freshness as a primary goal. Incremental processing is one of the core values provided by CocoIndex.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
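
The general idea of incremental processing can be sketched in a few lines of plain Python. This is an illustration only, not CocoIndex's internal mechanism: it assumes change detection by content fingerprint, and the names `fingerprint` and `incremental_update` are hypothetical.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable digest of a source row's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(source, index, fingerprints, transform):
    """Bring `index` in line with `source`, recomputing only changed rows.

    Returns the keys that were (re)computed and the keys that were deleted.
    """
    recomputed, deleted = [], []
    for key, content in source.items():
        fp = fingerprint(content)
        # New or changed row: recompute. Unchanged rows are skipped.
        if fingerprints.get(key) != fp:
            index[key] = transform(content)
            fingerprints[key] = fp
            recomputed.append(key)
    # Rows that disappeared from the source are removed from the target.
    for key in list(index):
        if key not in source:
            del index[key]
            fingerprints.pop(key, None)
            deleted.append(key)
    return recomputed, deleted


index, fps = {}, {}
source = {"a.md": "alpha", "b.md": "beta", "c.md": "gamma"}
first = incremental_update(source, index, fps, str.upper)

source["a.md"] = "alpha v2"   # one row changes
del source["b.md"]            # one row is removed
second = incremental_update(source, index, fps, str.upper)
```

On the second run only `a.md` is recomputed and `b.md` is deleted; `c.md` is left untouched because its fingerprint did not change.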

Quick Start:

If you're new to CocoIndex, we recommend checking out the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
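
The `chunk_size` and `chunk_overlap` arguments in the flow above control how much text each chunk holds and how much neighbouring chunks share. The sketch below shows that relationship with a naive fixed-size sliding window. It is an assumption-laden illustration, not the algorithm used by `SplitRecursively`, whose actual splitting logic may differ; `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Naive character-based chunking: each chunk repeats the tail of the previous one."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
# chunks == ["abcd", "cdef", "efgh", "ghij"]
```

A larger overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing and embedding more text.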

It defines an index flow like this:

Data Flow
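
The export step in the example flow requests a vector index with the `COSINE_SIMILARITY` metric. As a reminder of what that metric computes, here is a small standalone sketch in plain Python. It is independent of CocoIndex and Postgres; the helper names and the toy two-dimensional "embeddings" are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=1):
    """Rank (text, embedding) rows by similarity to `query`, best first."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in rows]
    return sorted(scored, reverse=True)[:k]


rows = [("cats", [1.0, 0.0]), ("dogs", [0.0, 1.0]), ("kittens", [0.9, 0.1])]
best = top_k([1.0, 0.0], rows, k=2)
```

Here `best` ranks "cats" first and "kittens" second. A vector index serves the same kind of ranking query without scanning every row.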

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds - whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
