
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the derived index and keeps it up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴


CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, or deleting it. Instead, they declare that, for a set of source data, this is the transformation or formula. The framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
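
To make the dataflow idea concrete, here is a minimal toy sketch in plain Python. It does not use the CocoIndex API: the `Row` class and its `derive` method are hypothetical names invented for illustration. It only shows the two properties described above, namely that each new field is computed solely from an existing field, and that lineage can be recorded as a side effect of declaring the transformation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Row:
    """Hypothetical record whose fields are only ever added, never overwritten."""
    fields: dict = field(default_factory=dict)
    # Maps each derived field to (function name, input field).
    lineage: dict = field(default_factory=dict)

    def derive(self, out: str, src: str, fn: Callable) -> None:
        # Refuse to overwrite: a derived field is a pure function of its input.
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists")
        self.fields[out] = fn(self.fields[src])
        self.lineage[out] = (fn.__name__, src)


row = Row({"content": "  Hello CocoIndex  "})
row.derive("trimmed", "content", str.strip)
row.derive("upper", "trimmed", str.upper)

print(row.fields["upper"])   # value of the last derived field
print(row.lineage)           # which function and input produced each field
```

Because no field is ever mutated, every intermediate value stays inspectable, which is what makes lineage cheap to provide.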

Data Freshness

As a data framework, CocoIndex treats data freshness as a first-class concern. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
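
The general idea of incremental processing can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how CocoIndex implements it: the `fingerprint` and `sync` helpers are hypothetical, and change detection here is a simple content hash kept in an in-memory dictionary rather than in Postgres.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect whether a source row changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync(source: dict, seen: dict, target: dict, transform) -> dict:
    """Bring `target` in line with `source`, recomputing only changed rows."""
    stats = {"created": 0, "updated": 0, "deleted": 0, "skipped": 0}
    for key, text in source.items():
        fp = fingerprint(text)
        if seen.get(key) == fp:
            stats["skipped"] += 1          # unchanged: no recomputation
            continue
        stats["created" if key not in seen else "updated"] += 1
        target[key] = transform(text)      # recompute only this row
        seen[key] = fp
    for key in list(seen):
        if key not in source:              # row disappeared from the source
            del seen[key]
            target.pop(key, None)
            stats["deleted"] += 1
    return stats


seen, target = {}, {}
first = sync({"a.md": "alpha", "b.md": "beta"}, seen, target, str.upper)
second = sync({"a.md": "alpha", "b.md": "BETA v2", "c.md": "gamma"},
              seen, target, str.upper)
third = sync({"a.md": "alpha"}, seen, target, str.upper)
print(first, second, third, sep="\n")
```

The caller never issues a create, update, or delete; it only supplies the current source and the transformation, and the sync step derives the required operations.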

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
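
To show what the `chunk_size` and `chunk_overlap` arguments in the flow above mean, here is a simplified sketch in plain Python. It is not the `SplitRecursively` implementation, which presumably splits along document structure; the `split_with_overlap` helper is a hypothetical fixed-width character window, used only to illustrate how overlapping chunks and their start locations relate.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Return (start_offset, chunk) pairs; consecutive chunks share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
for location, chunk_text in chunks:
    print(location, chunk_text)
```

The overlap means text near a chunk boundary appears in two chunks, so a query matching that text can still retrieve a chunk with enough surrounding context.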

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

