With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal computation and minimal changes.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework whose core engine is written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to let the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike workflow orchestration frameworks, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. It follows the dataflow programming model: each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage available out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they declare the transformation, or formula, to apply to a set of source data, and the framework decides when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses let the method chain span several lines)
data['out'] = (data['content']
               .transform(...)
               .transform(...))

# collect data
collector = data.add_collector()
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
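The declarative idea above (users supply a pure transformation, the framework works out creates, updates, and deletes) can be illustrated without CocoIndex. The following is a minimal plain-Python sketch, not CocoIndex's implementation or API: the helpers `upper_lines`, `derive_rows`, and `plan_target_changes` are hypothetical, and each output row is assumed to be keyed by a (source name, row index) tuple standing in for a primary key.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Key = Tuple[str, int]  # hypothetical primary key: (source name, row index)


def upper_lines(text: str) -> List[Row]:
    """A pure transformation: output depends only on the input, no hidden state."""
    return [{"text": line.upper()} for line in text.splitlines() if line]


def derive_rows(source: Dict[str, str],
                transform: Callable[[str], List[Row]]) -> Dict[Key, Row]:
    """Apply the transformation to every source item and key each output row."""
    desired: Dict[Key, Row] = {}
    for name, content in source.items():
        for i, row in enumerate(transform(content)):
            desired[(name, i)] = row
    return desired


def plan_target_changes(existing: Dict[Key, Row],
                        desired: Dict[Key, Row]) -> Dict[str, List[Key]]:
    """Diff the desired target state against what was exported previously."""
    return {
        "create": [k for k in desired if k not in existing],
        "update": [k for k in desired if k in existing and existing[k] != desired[k]],
        "delete": [k for k in existing if k not in desired],
    }


# Usage: the previous export vs. the state implied by the updated sources.
existing = derive_rows({"a.md": "hello\nworld", "b.md": "old"}, upper_lines)
desired = derive_rows({"a.md": "hello\nthere", "c.md": "new"}, upper_lines)
plan = plan_target_changes(existing, desired)
print(plan)  # {'create': [('c.md', 0)], 'update': [('a.md', 1)], 'delete': [('b.md', 0)]}
```

The user code never issues a create, update, or delete; those fall out of comparing the desired state with the previous one.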

Data Freshness

As a data framework, CocoIndex places a strong emphasis on data freshness. Incremental processing is one of its core features.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

As a result, source updates are reflected in the target store quickly. If you are worried about surfacing stale data to AI agents, or are spending significant effort on infrastructure to reduce that latency, the framework handles this for you.
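The two responsibilities listed above can be sketched in plain Python. This is a hypothetical illustration, not how CocoIndex implements change data capture: change detection is simplified to comparing SHA-256 content fingerprints on each `sync` call, and the class `IncrementalIndexer` is made up for this example. Only items whose fingerprint changed are re-transformed, and items that disappeared from the source are dropped from the derived index.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(content: str) -> str:
    """Content hash used to decide whether a source item changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


class IncrementalIndexer:
    """Hypothetical sketch: re-run the transform only for changed source items."""

    def __init__(self, transform: Callable[[str], List[str]]) -> None:
        self.transform = transform
        self.fingerprints: Dict[str, str] = {}  # source key -> last seen hash
        self.index: Dict[str, List[str]] = {}   # source key -> derived output
        self.calls = 0                          # number of transform invocations

    def sync(self, source: Dict[str, str]) -> None:
        # Deletions: drop derived output for keys that vanished from the source.
        for key in list(self.fingerprints):
            if key not in source:
                del self.fingerprints[key]
                del self.index[key]
        # Creations and updates: skip items whose content hash is unchanged.
        for key, content in source.items():
            fp = fingerprint(content)
            if self.fingerprints.get(key) == fp:
                continue
            self.index[key] = self.transform(content)
            self.fingerprints[key] = fp
            self.calls += 1


indexer = IncrementalIndexer(lambda text: text.split())
indexer.sync({"a.md": "fresh data", "b.md": "for ai"})
indexer.sync({"a.md": "fresh data", "b.md": "for ai agents"})
print(indexer.calls)          # 3: two items on the first sync, only b.md on the second
print(indexer.index["b.md"])  # ['for', 'ai', 'agents']
```

Keying the cached output by content hash means an unchanged item costs one hash computation instead of a full re-transformation.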

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't already have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

The `text_embedding_flow` above defines an index flow like this:

[Figure: Data Flow]
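The example flow exports each chunk's embedding to Postgres with a vector index whose metric is `COSINE_SIMILARITY`, keyed by `filename` and `location`. To show what that metric ranks by, here is a brute-force plain-Python sketch with hypothetical toy vectors. It does not query Postgres or call any CocoIndex API, and it simplifies `location` to an integer chunk index.

```python
import math
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, int]  # simplified (filename, location) primary key


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(index: Dict[Key, List[float]], query: Sequence[float],
          k: int = 2) -> List[Tuple[Key, float]]:
    """Score every stored embedding against the query and keep the best k."""
    scored = [(key, cosine_similarity(vec, query)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 2-dimensional embeddings standing in for real model output.
index = {
    ("a.md", 0): [1.0, 0.0],
    ("a.md", 1): [0.0, 1.0],
    ("b.md", 0): [0.6, 0.8],
}
results = top_k(index, [1.0, 0.0], k=2)
print([key for key, _ in results])  # [('a.md', 0), ('b.md', 0)]
```

A real vector index avoids scoring every row, but the ranking it aims to reproduce is the one computed here.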

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDFs and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using an LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with an LLM and a graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |

More examples are coming; stay tuned 👀!

📖 Documentation

For detailed documentation, including a Quickstart guide, visit the CocoIndex Documentation.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please star ⭐ our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.59.tar.gz (6.2 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

cocoindex-0.1.59-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.8 MB)

Uploaded PyPy, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.59-cp313-cp313t-manylinux_2_28_aarch64.whl (13.8 MB)

Uploaded CPython 3.13t, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.59-cp313-cp313-win_amd64.whl (13.7 MB)

Uploaded CPython 3.13, Windows x86-64

cocoindex-0.1.59-cp313-cp313-manylinux_2_28_x86_64.whl (14.4 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.59-cp313-cp313-manylinux_2_28_aarch64.whl (13.8 MB)

Uploaded CPython 3.13, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.59-cp313-cp313-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

cocoindex-0.1.59-cp313-cp313-macosx_10_12_x86_64.whl (14.1 MB)

Uploaded CPython 3.13, macOS 10.12+ x86-64

cocoindex-0.1.59-cp312-cp312-win_amd64.whl (13.7 MB)

Uploaded CPython 3.12, Windows x86-64

cocoindex-0.1.59-cp312-cp312-manylinux_2_28_x86_64.whl (14.4 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.59-cp312-cp312-manylinux_2_28_aarch64.whl (13.8 MB)

Uploaded CPython 3.12, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.59-cp312-cp312-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

cocoindex-0.1.59-cp312-cp312-macosx_10_12_x86_64.whl (14.1 MB)

Uploaded CPython 3.12, macOS 10.12+ x86-64

cocoindex-0.1.59-cp311-cp311-win_amd64.whl (13.7 MB)

Uploaded CPython 3.11, Windows x86-64

cocoindex-0.1.59-cp311-cp311-manylinux_2_28_x86_64.whl (14.4 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.59-cp311-cp311-manylinux_2_28_aarch64.whl (13.8 MB)

Uploaded CPython 3.11, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.59-cp311-cp311-macosx_11_0_arm64.whl (13.6 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

cocoindex-0.1.59-cp311-cp311-macosx_10_12_x86_64.whl (14.1 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.59.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.59.tar.gz
  • Upload date:
  • Size: 6.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for cocoindex-0.1.59.tar.gz
Algorithm Hash digest
SHA256 ab8cd6ace34467197c5704a2bd695e07e475e24b4f5c70dba47401bfed5818d9
MD5 437462c181abe40d3c329582e5a4342f
BLAKE2b-256 5c40f996a683b7974a746645983bd890c198d8ef4c7a68f9d13dc7d23b725ee2

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.59-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.59-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c94d551c75e4d82e1e34555f714097dd5eaade1d73da9b663cd9954b1ba11570
MD5 8423e3a74749d88d9a99a95cf4cf1a9f
BLAKE2b-256 15031e3629cb8b0b173b62bb7d04058fbc8142e514e2e4c90631486d3ba11916

File details

Details for the file cocoindex-0.1.59-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 5354639970594f25c973ae3af2ceac4da78dfb2e8b49b416300806919b539fd0
MD5 bd4e056c0284debc79f0509bb7c3091b
BLAKE2b-256 532c5c810dd2e5533596ef1261326246030a4cbf3076f21049a4d5044e07adba

File details

Details for the file cocoindex-0.1.59-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 ff9ba42d4f1d41e24be66993cd809371d6a320e70db73998a0b9b4b6321aa455
MD5 d37a1cf3d95f66bb074b2bfd5d3fefab
BLAKE2b-256 ec715c82164a5ba7deaa812e429795368725bf1f4e7f53de2f821ee86833958c

File details

Details for the file cocoindex-0.1.59-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 835b518f00be782441c47b89ffabb057c9854da7237b78da118166970a9b2afd
MD5 061528c646238251c92263dd37ec3e23
BLAKE2b-256 9c710fbf53be94ced954dc2fd2aeb07a245b86db39c2dc06a81fac3345a3824a

File details

Details for the file cocoindex-0.1.59-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f88aad30fcb68668f805bc11191c52e4bace80741ee38d03573859930dcc1e4f
MD5 79ecd7daba3bb3c4ae5dc2d8024a588c
BLAKE2b-256 24ef1c374b07742f8b31a30af7e0d9ff64e019334d45d2e05814e7594252d367

File details

Details for the file cocoindex-0.1.59-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 11e81481e4f65d973d96524346ddbc3c2a2acddf73811d2382059a575995201e
MD5 43e2f41c1127f241b73a117d147ff65c
BLAKE2b-256 b8af05df11d643761d99fb6884a8c1a8a542e020e1b07ba4df943eadc88aeb19

File details

Details for the file cocoindex-0.1.59-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 054fd4a733fdf0089a8d67dfb5f2862ab91b9b4356f4bc0cb5e1adb75bb5ba6f
MD5 a9866d323479a5f877b7c5303633b445
BLAKE2b-256 e6e4b151a1c650f0e0f8230775e78d73915018d861e2f61724f5a4110a3b9467

File details

Details for the file cocoindex-0.1.59-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c21999f9c2ab6d94cfeabbed69f9c5c6e350f3489029822cd9bde3be1eb13329
MD5 cb290291d3e4d0f852d27f3a3072b2bb
BLAKE2b-256 eacd073861bd49cdcbbc71ecd5e9495b4db01c9953c35260f97200cfa75df9a7

File details

Details for the file cocoindex-0.1.59-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7e4f0c54e1c0a8878c9bbd33f3f58dad4c346da25595e531c3507edc31b2a3b3
MD5 ad38e2cad7317b6972bbfaefed889b74
BLAKE2b-256 27f96e581dfb7286bf87c4410c08f1325bee139963a7e755665b630d4f1ef16c

File details

Details for the file cocoindex-0.1.59-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 6d777d01d0a0e68cd7f4291fda461a8aefdc39b70f747bb8d6490e50561c9129
MD5 eaa0fb35a5226515ddb804d5793e2959
BLAKE2b-256 835ce8e4594b0e583f886db962d9aac4dfd6dcf4bf640e06a8c1d8a8792b9fa4

File details

Details for the file cocoindex-0.1.59-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 cc63ba6cc0cd54b83c144ec8697caea8c564a209cab7c10bd0916220bbf7d9e0
MD5 05d0aaab0dbaadf15cb6f2dde89a63f0
BLAKE2b-256 af652e6929ac255684f6d3d1a41b3f5a300bb699ee17d458f0e6d97ff547df47

File details

Details for the file cocoindex-0.1.59-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 13f6f94883d4b444bb8e48fd4f51d99a351077c1b37cade4adc187bf9ba2821f
MD5 72e9a434aea9e904db08f201121da7f3
BLAKE2b-256 e2ae16348e0197e1db416a2fcba5a7e3f698c41c7992eb63f1a327440d60efc5

File details

Details for the file cocoindex-0.1.59-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 8cf308566e2dfd03bd5fa856478b56eb28dc81d18109e35d90c710d9d7dd721e
MD5 60c87777b12c533948e5137b51873e0b
BLAKE2b-256 27bf7a0f1a2d9998e23c72f274be37115f0529a0e2782fc2fc51859714100d02

File details

Details for the file cocoindex-0.1.59-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 fe491c165e8014ce93f25665a9b3b44720340f6e3b427b58f96045f2f16380b3
MD5 435a4cea9324fa20901c881f04eac51c
BLAKE2b-256 ebb3ae94c291b7835ed6b18514597fed6787f53ea58944f2ecec11c42b8a50ef

File details

Details for the file cocoindex-0.1.59-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 927737c0a0c3661033efae4397bdcfd4bd313047c359a826caadb74606b7718f
MD5 e59a62a0a1b7507ccf472477080d8846
BLAKE2b-256 5ef163cdd4ce6eab9894dde9b6070706ca515b4ff1c10d65334a74ca4aa8349a

File details

Details for the file cocoindex-0.1.59-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2495f88cf6409b7285533978cad7e086c304382ed2a1653c7a228626816c93f5
MD5 605849dac17fd4a776fc8bf7946c659e
BLAKE2b-256 83131bbcb39a9f614bce38d4b9e846e0d1920856055c989bd0d68053a5efda65

File details

Details for the file cocoindex-0.1.59-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.59-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 fda8347f8d329131336be12c6840df575f48f1213cbd171480e8f29bc05b71d5
MD5 125896b5d1b98b65e48116aa034a5cb7
BLAKE2b-256 b37f7031f516a8e87df7fae3ec2887527887ce4a2868ccb738d73bf2f41a0d55
