
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal computation and changes.


CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to have the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike workflow orchestration frameworks, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. It follows the dataflow programming model: each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they declare, for a set of source data, the transformation or formula to apply. The framework takes care of the data operations, such as when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
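The pseudocode above is declarative: each step names an input field and produces a new output field. To make the underlying model concrete, here is a minimal pure-Python sketch that uses no CocoIndex APIs at all. The `Row` class, its `derive` method, and the two transform functions are hypothetical; they only illustrate write-once fields computed by pure functions, with lineage recorded as a by-product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical illustration of the dataflow model; not part of the CocoIndex API.


@dataclass
class Row:
    """A row of named fields plus a record of how each derived field was produced."""
    fields: dict[str, Any]
    lineage: dict[str, tuple[str, str]] = field(default_factory=dict)

    def derive(self, out: str, src: str, fn: Callable[[Any], Any]) -> "Row":
        """Return a NEW row with `out = fn(src)`; the current row is never mutated."""
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists; fields are write-once")
        new_fields = {**self.fields, out: fn(self.fields[src])}
        new_lineage = {**self.lineage, out: (src, fn.__name__)}
        return Row(new_fields, new_lineage)


def strip_text(s: str) -> str:
    return s.strip()


def word_count(s: str) -> int:
    return len(s.split())


source = Row({"content": "  hello dataflow world  "})
result = (source
          .derive("clean", "content", strip_text)
          .derive("words", "clean", word_count))

print(result.fields["words"])   # 3
print(result.lineage["words"])  # ('clean', 'word_count')
print("clean" in source.fields) # False: the source row was not mutated
```

Because every field is produced by a pure function of earlier fields, any intermediate value can be inspected, and the lineage map answers "where did this value come from?" without extra bookkeeping.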

Data Freshness

As a data framework, CocoIndex puts a strong emphasis on data freshness. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes source updates reach the target store quickly. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce that latency, the framework handles it for you.
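One way to picture what "figuring out what needs to be updated" involves is the simplified sketch below. It is not CocoIndex's implementation: it assumes a full rescan of the source, detects changes by comparing SHA-256 content fingerprints against the previous run, recomputes the transformation only for new or changed rows, and deletes derived rows whose source row disappeared. All function and variable names are hypothetical.

```python
import hashlib
from typing import Callable

# Hypothetical illustration of incremental processing; not CocoIndex's implementation.


def fingerprint(content: str) -> str:
    """Content hash used to detect whether a source row changed since the last run."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(
    source: dict[str, str],
    state: dict[str, str],
    target: dict[str, object],
    transform: Callable[[str], object],
) -> dict[str, list[str]]:
    """Bring `target` in line with `source`, recomputing only what changed.

    `state` maps each key to its fingerprint from the previous run. Both `state`
    and `target` are updated in place; the returned dict lists the keys per action.
    """
    changes: dict[str, list[str]] = {"created": [], "updated": [], "deleted": [], "unchanged": []}
    for key, content in source.items():
        fp = fingerprint(content)
        if key not in state:
            changes["created"].append(key)
        elif state[key] != fp:
            changes["updated"].append(key)
        else:
            changes["unchanged"].append(key)  # same content: skip the transform
            continue
        target[key] = transform(content)  # recompute only new or changed rows
        state[key] = fp
    for key in list(state):
        if key not in source:  # source row is gone: delete its derived row
            changes["deleted"].append(key)
            del state[key]
            target.pop(key, None)
    return changes


calls: list[str] = []


def shout(text: str) -> str:
    calls.append(text)  # record each (potentially expensive) transform call
    return text.upper()


state: dict[str, str] = {}
target: dict[str, object] = {}
incremental_update({"a.md": "alpha", "b.md": "beta", "c.md": "gamma"}, state, target, shout)
second = incremental_update({"a.md": "alpha", "b.md": "beta v2"}, state, target, shout)
print(second)  # {'created': [], 'updated': ['b.md'], 'deleted': ['c.md'], 'unchanged': ['a.md']}
print(calls)   # ['alpha', 'beta', 'gamma', 'beta v2']
```

On the second run only `b.md` is transformed again; `a.md` is skipped and the derived row for `c.md` is removed. A production system would typically avoid the full rescan by consuming change notifications from the source, which is what change data capture refers to.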

Quick Start

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:

     pip install -U cocoindex

  2. Install Postgres if you don't have one. CocoIndex uses it for incremental processing.
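CocoIndex then needs to reach that Postgres instance. The CocoIndex documentation describes configuring the connection through a `COCOINDEX_DATABASE_URL` environment variable; treat that name and the example URL below as assumptions to verify against the docs for the version you install. The standard-library-only sketch below is a hypothetical pre-flight check (`check_database_url` is not a CocoIndex function) that fails fast when the URL is missing or is not a Postgres URL.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumed variable name; check the CocoIndex docs for the version you install.
ENV_VAR = "COCOINDEX_DATABASE_URL"
# Hypothetical example value for a local Postgres.
EXAMPLE_URL = "postgres://cocoindex:cocoindex@localhost/cocoindex"


def check_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured Postgres URL, failing early if it is missing or malformed."""
    env = os.environ if env is None else env
    url = env.get(ENV_VAR)
    if not url:
        raise RuntimeError(f"{ENV_VAR} is not set (for example: {EXAMPLE_URL})")
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// URL, got scheme {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("database URL has no host")
    return url


# Usage: pass an explicit mapping in tests, or call with no argument to read os.environ.
print(check_database_url({ENV_VAR: EXAMPLE_URL}))
```

Checking the URL up front turns a confusing connection error deep inside a pipeline run into an immediate, readable message.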

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
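The `chunk_size` and `chunk_overlap` parameters above control how large each chunk is and how much consecutive chunks share. `SplitRecursively` splits along boundaries appropriate to the given language, so the sketch below is not its algorithm; it is a naive fixed-window character splitter (the name `split_fixed` is hypothetical) that only shows how the two numbers interact. It also models each chunk's `location` as a `(start, end)` character range, which is an assumption made for illustration.

```python
# Hypothetical fixed-window splitter; NOT the algorithm used by SplitRecursively.


def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[tuple[int, int], str]]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Each chunk starts `chunk_size - chunk_overlap` characters after the previous
    one, so consecutive chunks share `chunk_overlap` characters. Returns
    ((start, end), chunk_text) pairs, where (start, end) stands in for `location`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(((start, end), text[start:end]))
        if end == len(text):  # the last chunk reached the end of the text
            break
        start += step
    return chunks


doc = "x" * 5000
print([loc for loc, _ in split_fixed(doc, chunk_size=2000, chunk_overlap=500)])
# [(0, 2000), (1500, 3500), (3000, 5000)]
```

With the example's settings, each new chunk advances 1,500 characters, so text near a chunk boundary appears in two chunks and is less likely to be cut off from its context. Since the export uses `filename` and `location` as primary key fields, each chunk row stays individually addressable.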

It defines an index flow like this:

[Figure: Data Flow diagram]
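The export declares a vector index on the `embedding` field with `COSINE_SIMILARITY`. At query time, a search ranks stored chunks by cosine similarity to the query's embedding. The sketch below shows that ranking as a brute-force, in-memory scan in pure Python; the rows and their 3-dimensional vectors are hypothetical toy data (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and a real Postgres target would delegate this to the database's vector index (the pgvector extension) rather than a linear scan.

```python
import math

# Hypothetical brute-force cosine search over collected rows; illustration only.


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); 1.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare zero vectors")
    return dot / norm


def search(rows: list[dict], query: list[float], top_k: int = 2) -> list[tuple[str, float]]:
    """Score every row against the query and keep the `top_k` most similar."""
    scored = [(row["text"], cosine_similarity(row["embedding"], query)) for row in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical collected rows with toy 3-dimensional embeddings.
rows = [
    {"filename": "a.md", "location": (0, 40), "text": "install postgres", "embedding": [1.0, 0.0, 0.0]},
    {"filename": "a.md", "location": (30, 80), "text": "define a flow", "embedding": [0.0, 1.0, 0.0]},
    {"filename": "b.md", "location": (0, 50), "text": "postgres setup tips", "embedding": [0.9, 0.1, 0.0]},
]
for text, score in search(rows, query=[1.0, 0.0, 0.0]):
    print(f"{score:.3f}  {text}")
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for sentence embeddings.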

🚀 Examples and demo

  • Text Embedding: Index text documents with embeddings for semantic search.
  • Code Embedding: Index code embeddings for semantic search.
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search.
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM.
  • Amazon S3 Embedding: Index text documents from Amazon S3.
  • Google Drive Text Embedding: Index text documents from Google Drive.
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph.
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search.
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup.
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database.
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend.
  • Paper Metadata: Index papers in PDF files and build metadata tables for each paper.

More examples are coming; stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
