With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation and changes.

Project description

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates and let developers focus only on defining a series of data transformations, an approach inspired by spreadsheets.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they define a transformation or formula for a set of source data. The framework takes care of the data operations, such as when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
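
To make the dataflow idea concrete, here is a minimal sketch in plain Python. It does not use CocoIndex's API; the `derive` helper and the field names are hypothetical and exist only for illustration. Each step computes a new field purely from existing fields and records which inputs it read, so intermediate values stay observable and nothing is mutated in place.

```python
from typing import Any, Callable

Row = dict[str, Any]
Lineage = dict[str, tuple[str, ...]]


def derive(row: Row, lineage: Lineage, out: str,
           inputs: tuple[str, ...], fn: Callable[..., Any]) -> tuple[Row, Lineage]:
    """Return a new row with field `out` computed only from `inputs`.

    Hypothetical helper: the input row and lineage are never modified.
    """
    if out in row:
        raise ValueError(f"field {out!r} already exists; fields are write-once")
    new_row = {**row, out: fn(*(row[name] for name in inputs))}
    new_lineage = {**lineage, out: inputs}
    return new_row, new_lineage


# A source row, followed by two chained transformations.
source = {"content": "  Hello CocoIndex  "}
row, lineage = derive(source, {}, "trimmed", ("content",), str.strip)
row, lineage = derive(row, lineage, "length", ("trimmed",), len)

print(row)      # every intermediate field is still visible
print(lineage)  # which inputs each derived field came from
```

Because every field is written once and derived only from named inputs, a framework built this way can, in principle, tell which outputs depend on which sources.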

Data Freshness

As a data framework, CocoIndex puts data freshness first. Incremental processing is one of its core capabilities.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
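
The idea behind incremental processing can be sketched in a few lines of plain Python. This illustrates the concept only; it is not how CocoIndex implements change tracking. The `plan_update` function and the content-fingerprint approach are assumptions made for this example: the sketch remembers a hash per source key and, on the next run, reports only the keys whose content changed or disappeared.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable content hash used to detect changes between runs."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_update(previous: dict[str, str], source: dict[str, str]):
    """Compare the current source with fingerprints from the last run.

    previous: key -> fingerprint recorded by the last run
    source:   key -> current content
    Returns (keys to recompute, keys to delete, new fingerprint state).
    """
    current = {key: fingerprint(content) for key, content in source.items()}
    to_upsert = sorted(k for k, fp in current.items() if previous.get(k) != fp)
    to_delete = sorted(k for k in previous if k not in current)
    return to_upsert, to_delete, current


# First run: everything is new.
upsert1, delete1, state = plan_update({}, {"a.md": "alpha", "b.md": "beta"})

# Second run: b.md changed and c.md was added; a.md is left alone.
upsert2, delete2, state = plan_update(
    state, {"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"})

# Third run: a.md was removed from the source.
upsert3, delete3, state = plan_update(
    state, {"b.md": "beta v2", "c.md": "gamma"})

print(upsert1, delete1)
print(upsert2, delete2)
print(upsert3, delete3)
```

Only the keys returned in the first two lists need any work; unchanged rows are skipped entirely, which is where the latency savings come from.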

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
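
After the install step, one quick way to confirm the package is visible to your interpreter is to query its installed version with the standard library. This is a small sketch: the `installed_version` helper is a name chosen here, not part of CocoIndex.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


cocoindex_version = installed_version("cocoindex")
if cocoindex_version is None:
    print("cocoindex is not installed in this environment")
else:
    print(f"cocoindex {cocoindex_version} is installed")
```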

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
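
The `chunk_size` and `chunk_overlap` arguments in the example flow control how much text each chunk holds and how much adjacent chunks share. The sketch below shows the general idea with fixed-size character windows. It is not the algorithm behind `SplitRecursively`, which may well do more (the example passes `language="markdown"`); the `split_with_overlap` function and the choice of characters as the unit are assumptions for illustration.

```python
def split_with_overlap(text: str, chunk_size: int,
                       chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into (start_offset, chunk) pairs.

    Consecutive chunks share `chunk_overlap` characters, so content that
    straddles a boundary appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
for offset, chunk in chunks:
    print(offset, chunk)
```

The start offset plays a role similar to the `location` field collected in the flow: together with the filename it identifies a chunk uniquely, which is why the example uses both as primary key fields.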

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

File | Size | Type
cocoindex-0.1.65.tar.gz | 9.4 MB | Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

File | Size | Interpreter | Platform
cocoindex-0.1.65-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl | 13.9 MB | PyPy | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.65-cp313-cp313t-manylinux_2_28_aarch64.whl | 13.9 MB | CPython 3.13t | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.65-cp313-cp313-win_amd64.whl | 13.7 MB | CPython 3.13 | Windows x86-64
cocoindex-0.1.65-cp313-cp313-manylinux_2_28_x86_64.whl | 14.5 MB | CPython 3.13 | manylinux: glibc 2.28+ x86-64
cocoindex-0.1.65-cp313-cp313-manylinux_2_28_aarch64.whl | 13.9 MB | CPython 3.13 | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.65-cp313-cp313-macosx_11_0_arm64.whl | 13.7 MB | CPython 3.13 | macOS 11.0+ ARM64
cocoindex-0.1.65-cp313-cp313-macosx_10_12_x86_64.whl | 14.2 MB | CPython 3.13 | macOS 10.12+ x86-64
cocoindex-0.1.65-cp312-cp312-win_amd64.whl | 13.7 MB | CPython 3.12 | Windows x86-64
cocoindex-0.1.65-cp312-cp312-manylinux_2_28_x86_64.whl | 14.5 MB | CPython 3.12 | manylinux: glibc 2.28+ x86-64
cocoindex-0.1.65-cp312-cp312-manylinux_2_28_aarch64.whl | 13.9 MB | CPython 3.12 | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.65-cp312-cp312-macosx_11_0_arm64.whl | 13.7 MB | CPython 3.12 | macOS 11.0+ ARM64
cocoindex-0.1.65-cp312-cp312-macosx_10_12_x86_64.whl | 14.2 MB | CPython 3.12 | macOS 10.12+ x86-64
cocoindex-0.1.65-cp311-cp311-win_amd64.whl | 13.7 MB | CPython 3.11 | Windows x86-64
cocoindex-0.1.65-cp311-cp311-manylinux_2_28_x86_64.whl | 14.5 MB | CPython 3.11 | manylinux: glibc 2.28+ x86-64
cocoindex-0.1.65-cp311-cp311-manylinux_2_28_aarch64.whl | 13.9 MB | CPython 3.11 | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.65-cp311-cp311-macosx_11_0_arm64.whl | 13.7 MB | CPython 3.11 | macOS 11.0+ ARM64
cocoindex-0.1.65-cp311-cp311-macosx_10_12_x86_64.whl | 14.2 MB | CPython 3.11 | macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.65.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.65.tar.gz
  • Upload date:
  • Size: 9.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for cocoindex-0.1.65.tar.gz
Algorithm Hash digest
SHA256 7de29cc17f342bc95c2c72cc23b3e4b3c3d65201b54a077a4cb1ea622f6a67e8
MD5 2304ce2e9c29b6355848801037b229c2
BLAKE2b-256 1babf6f081ac2caaa0c60acf638694d45489bec40d22d1ef0023baf781a6382c

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.65-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.65-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 403cb73929eb0e81e0048b6433b65e4ebdcfa2ba3f6f0b8055afe2d4e1706588
MD5 75582b57e4eaeb1faa030714d7d4b22e
BLAKE2b-256 808d18472611a97637dbfd0aea158f490e565d24c349b5045b6901c0f90b4318

File details

Details for the file cocoindex-0.1.65-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 38b1fc3fa7bae3f99020edbb8c97f8832a5014b65b89a66f362e4f877c08a359
MD5 f74aea861997626a69fc7fe332ecba67
BLAKE2b-256 c4adab7f7b3ddc8c216f676ecfee5200e52f317da8752fe955948ab770cb515e

File details

Details for the file cocoindex-0.1.65-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 2f393343268ca5c05fd11b30585f79cff7cfc6bbd24af1371ffa9aabd1acc446
MD5 a287a011b9ba359da07b8a5f00e9e7f9
BLAKE2b-256 c2ae4677eec280f830716d6b4cd0b0f5094f2669059455622c7621400dbb1117

File details

Details for the file cocoindex-0.1.65-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 25d7f8e328c964d3a4bd14c4ee657493177d82ec0b0666e793d76c52981cdc56
MD5 2a3999fe0fe73afb12b40b5282c58c7c
BLAKE2b-256 2171279b8326a7b2a30d9eab0c4b7a20194021d5c0da6caa7c1da56eb54276a5

File details

Details for the file cocoindex-0.1.65-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 32ebc8d3d6913d68a362c1f19da9f92876d18d0a81920a0191fdb8b33efa0d91
MD5 c1ea2d72780b7beba2090576061d99cb
BLAKE2b-256 23b6a900a4ea62d37037aeaacd8f7df28d20616099f4ff94b36412d62b858e25

File details

Details for the file cocoindex-0.1.65-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 99444219c5457367dab171d585a01a23bb81332432997d419ce42dc042a77bd4
MD5 97130867cf7359dbf6406705f9437e27
BLAKE2b-256 bb202e71116be9f4914ab4f972132784f00b5e9a3684af7b7f8ea02d673764a3

File details

Details for the file cocoindex-0.1.65-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 a054cb01a8251f7f22830cf22bbe4ef544973362ff26a4da91d3569d30a1853e
MD5 d990787ae07dc200b7a55578ab7a7913
BLAKE2b-256 44b5d66e79a324a7abb8d916afd2ec4ff85405c3bfed528f53431c1a15f5df09

File details

Details for the file cocoindex-0.1.65-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 073af1fcd1d947f015ae97f07c7c1c6f43667c09d0d726fb9284603d89f57e69
MD5 824bc5db024f0101759f076d28f574e6
BLAKE2b-256 5bc6a08af164b175c0290b5cfbf9db0ebe785b147f1eb57efe735b73ce00595f

File details

Details for the file cocoindex-0.1.65-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f813a156bbb24016728fdb2538532e83ee97c634a82e522f31f7db95a35cee9e
MD5 b8b8bccafa99adc04b2a19b5ee846106
BLAKE2b-256 877209f6dbc7367a9fd51d49acb45009a9fa6d57a6a27d92d33e74d4fba5b977

File details

Details for the file cocoindex-0.1.65-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 873eb6e14a72b05bd98e146d030ea29fb26cdca63c84e6a1db3a582e206eca80
MD5 6720685e5622ed23bed16567b5ecf6f9
BLAKE2b-256 47089873e82f510a42fa80e90f27a73dc0e470cafc13695613e0f5120dc219d4

File details

Details for the file cocoindex-0.1.65-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a078100d2300f1cbcf1f3608d6c2cd4814c22f98db8bbe236023141c97b78a46
MD5 d24349d161001ad99979b14ba5ebc170
BLAKE2b-256 6f46bdc329eaf8e397932d808808ec1a608b29e2de4a34788524fa7ca33cac1d

File details

Details for the file cocoindex-0.1.65-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 1a84fb29a2e86d73b0fdd001e81768a21ed0323a3bc136e5fe186e797ab321f5
MD5 a3eb3dbf7bde0e14623d0b0122c97f11
BLAKE2b-256 ed49f869f624d11a044c4f57f194f1aa04fb3c9c4c0321bea7c1fb748926fb59

File details

Details for the file cocoindex-0.1.65-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 6748bd4406137d79c1a8ab78c1143df46cda4cefedd43661ba2ac269b5f5a06e
MD5 134696532580796e681128c9ac10143c
BLAKE2b-256 6a6e0f7b083e8bbe1510f0c48a7e6b4fe36f9118c306207152a95db76faf991e

File details

Details for the file cocoindex-0.1.65-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ae3a11f38699a0939f638b4c34c4df0fa1174826a0233dfd741118d3c313474e
MD5 138d6b2e4427f7a818ca9d49549d6cb5
BLAKE2b-256 e011c39e47793826364d4c05f658e4b6adbab7e6c752bbb70edf6681f7c91ef7

File details

Details for the file cocoindex-0.1.65-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 beee4176fc428f6ecdad8576b5bc29ce2a051ef877d345f5e809863fe02dd59a
MD5 c2610e7dad91ae1225fea718d0b843dc
BLAKE2b-256 46ae023c58b60428ecd4921f67f63ff61a27b5e33568291262be5ad012da411b

File details

Details for the file cocoindex-0.1.65-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9cb2d989c7c33694cab0fcd7e63959c05d66deac9209aa23657763ee6ad22506
MD5 97cb9aafe65d1c0b8a161643e3bc28fe
BLAKE2b-256 e18c6e71b744385bb588f759eed599297618bd2b8fdf4132215c0e5b82916dbf

File details

Details for the file cocoindex-0.1.65-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.65-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 f627ba47180a74e074e7f31dd998b9b546a567f4fc3da7a3947d88977ddf41ba
MD5 d39e0b390f13b20e024ae397ca86baaf
BLAKE2b-256 b637456a3af6b92d89ad9967d943e81535ac0c7a450efeb958d12f4c23f219c0
