
With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.


CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴


CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, much like formulas in a spreadsheet.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they declare that, for a set of source data, this is the transformation or formula to apply. The framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
# (parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
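
To make the "no hidden state, no mutation" idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API; the `Row` class, its `derive` method, and the two toy transformations are hypothetical names invented for this illustration. Each derived field is computed only from an existing field, and the lineage of every field is recorded as it is created.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Row:
    """Hypothetical record whose fields are only ever added, never overwritten."""
    fields: dict[str, Any] = field(default_factory=dict)
    # Maps each derived field to (source field, name of the transformation).
    lineage: dict[str, tuple[str, str]] = field(default_factory=dict)

    def derive(self, out: str, src: str, fn: Callable[[Any], Any]) -> None:
        """Create field `out` from field `src`; refuse to mutate an existing field."""
        if out in self.fields:
            raise ValueError(f"field {out!r} already exists; fields are not mutated")
        self.fields[out] = fn(self.fields[src])
        self.lineage[out] = (src, fn.__name__)


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text."""
    return " ".join(text.split()).lower()


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


row = Row(fields={"content": "  Hello   Dataflow World "})
row.derive("normalized", "content", normalize)
row.derive("n_words", "normalized", word_count)

print(row.fields["normalized"])  # hello dataflow world
print(row.fields["n_words"])     # 3
print(row.lineage["n_words"])    # ('normalized', 'word_count')
```

Because every field is a pure function of another field, any intermediate value can be inspected, and the lineage map answers "where did this value come from?" without extra bookkeeping.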

Data Freshness

As a data framework, CocoIndex treats data freshness as a primary goal. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles this for you.
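
The general idea of incremental processing can be sketched in a few lines of plain Python. This is not CocoIndex's implementation: the function names, the in-memory `state` and `target` dictionaries, and the use of SHA-256 fingerprints are all assumptions chosen for illustration. The sketch compares a fingerprint of each source item with the one stored from the previous run and recomputes only the items that are new or changed.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Stable fingerprint of a source item's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def expensive_transform(content: str) -> str:
    """Stand-in for a costly step such as embedding or LLM extraction."""
    return content.upper()


def incremental_update(source: dict, state: dict, target: dict):
    """Sync `target` with `source`, recomputing only new or changed items.

    `state` maps each key to the fingerprint seen on the previous run.
    Returns the (created, updated, deleted) key lists for this run.
    """
    created, updated, deleted = [], [], []
    for key, content in source.items():
        fp = fingerprint(content)
        if key not in state:
            created.append(key)
        elif state[key] != fp:
            updated.append(key)
        else:
            continue  # unchanged: skip the expensive work entirely
        target[key] = expensive_transform(content)
        state[key] = fp
    for key in list(state):
        if key not in source:  # removed from the source since the last run
            del state[key]
            del target[key]
            deleted.append(key)
    return created, updated, deleted


state, target = {}, {}
print(incremental_update({"a.md": "alpha", "b.md": "beta"}, state, target))
# (['a.md', 'b.md'], [], [])
print(incremental_update({"a.md": "alpha", "b.md": "BETA v2", "c.md": "gamma"}, state, target))
# (['c.md'], ['b.md'], [])
print(incremental_update({"a.md": "alpha", "c.md": "gamma"}, state, target))
# ([], [], ['b.md'])
```

In a real deployment the fingerprint state would live in durable storage rather than in a dictionary, so that it survives restarts.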

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
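
Before running a flow, it can help to sanity-check the Postgres connection URL you plan to use. The sketch below uses only the Python standard library; the helper name `check_postgres_url` and the example URL (user, password, host, and database name) are placeholders invented for illustration, and how the URL is actually passed to CocoIndex should be taken from its documentation.

```python
from urllib.parse import urlparse


def check_postgres_url(url: str) -> dict:
    """Parse a Postgres connection URL and return its main components.

    Raises ValueError if the scheme or host looks wrong.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme {parsed.scheme!r}; expected postgres:// or postgresql://")
    if not parsed.hostname:
        raise ValueError("connection URL has no host")
    return {
        "host": parsed.hostname,
        "port": parsed.port or 5432,  # 5432 is the default Postgres port
        "user": parsed.username,
        "database": parsed.path.lstrip("/") or None,
    }


# Placeholder URL for a local development database (all values are assumptions).
example_url = "postgres://cocoindex:cocoindex@localhost/cocoindex"
print(check_postgres_url(example_url))
# {'host': 'localhost', 'port': 5432, 'user': 'cocoindex', 'database': 'cocoindex'}
```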

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
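
To build intuition for the `chunk_size=2000, chunk_overlap=500` arguments above, here is a deliberately simplified fixed-width splitter. It is not how `SplitRecursively` works internally; the function `split_with_overlap` is a hypothetical stand-in, and treating the sizes as character counts is an assumption. It only shows how an overlap makes consecutive chunks share text, so that content near a boundary appears in both.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into fixed-width chunks that overlap by `chunk_overlap` characters.

    Returns (start_offset, chunk_text) pairs.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


text = "abcdefghij" * 500  # 5000 characters of sample text
chunks = split_with_overlap(text, chunk_size=2000, chunk_overlap=500)
for start, chunk in chunks:
    print(start, len(chunk))
# 0 2000
# 1500 2000
# 3000 2000
```

Each chunk starts 1500 characters after the previous one, so the last 500 characters of one chunk are the first 500 of the next.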

It defines an index flow like this:

[Figure: Data Flow]
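
The example flow exports embeddings with a cosine-similarity vector index. As a rough sketch of what that metric means at query time, the plain-Python code below ranks a few hand-made rows against a query vector. The rows, their `location` strings, the tiny 3-dimensional vectors, and the `top_k` helper are all made up for illustration; in practice the database's vector index would do this ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[dict], k: int = 2) -> list[tuple]:
    """Return the k rows most similar to `query` as (score, filename, location)."""
    scored = [
        (cosine_similarity(query, row["embedding"]), row["filename"], row["location"])
        for row in rows
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


# Hypothetical rows shaped like the fields collected in the example flow.
rows = [
    {"filename": "a.md", "location": "0-100", "embedding": [1.0, 0.0, 0.0]},
    {"filename": "a.md", "location": "100-200", "embedding": [0.6, 0.8, 0.0]},
    {"filename": "b.md", "location": "0-50", "embedding": [0.0, 0.0, 1.0]},
]
query = [1.0, 0.0, 0.0]
results = top_k(query, rows)
for score, filename, location in results:
    print(f"{score:.2f} {filename} {location}")
# 1.00 a.md 0-100
# 0.60 a.md 100-200
```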

🚀 Examples and demo

| Example | Description |
| --- | --- |
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search served via FastAPI with a React frontend |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


