
With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal computation and changes.


CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴


CocoIndex is an ultra-performant data transformation framework whose core engine is written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to have the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field solely from its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.
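To make the dataflow idea concrete, here is a minimal pure-Python sketch. It is not CocoIndex's API: the `Field` class and the `derive` and `lineage` helpers are hypothetical. Each derived field is computed only from its input fields, nothing is mutated in place, and every field remembers which fields it came from, so lineage can be traced.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Field:
    """An immutable named value that records the fields it was derived from."""
    name: str
    value: Any
    inputs: Tuple["Field", ...] = ()


def derive(name: str, fn: Callable[..., Any], *inputs: Field) -> Field:
    """Create a new field purely from input fields; the inputs are never modified."""
    return Field(name, fn(*(f.value for f in inputs)), inputs)


def lineage(f: Field) -> list:
    """Return the names of all upstream fields, depth-first."""
    names = []
    for parent in f.inputs:
        names.append(parent.name)
        names.extend(lineage(parent))
    return names


content = Field("content", "Hello Dataflow World")
lowered = derive("lowered", str.lower, content)
words = derive("words", str.split, lowered)

print(words.value)     # ['hello', 'dataflow', 'world']
print(lineage(words))  # ['lowered', 'content']
print(content.value)   # Hello Dataflow World (unchanged)
```

Because fields are frozen and derivations are pure, any intermediate value can be inspected at any time, which is the property the paragraph above describes as observability with lineage.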

In particular, users don't explicitly mutate data by creating, updating, and deleting records. Instead, they declare something like "for this set of source data, this is the transformation (or formula)." The framework takes care of the data operations, such as when to create, update, or delete.
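The following minimal sketch, under simplifying assumptions, illustrates what declaring the formula buys you: the user supplies only a function from source values to target values, and a hypothetical `plan_changes` helper works out the creates, updates, and deletes by diffing the desired target against what the target store currently holds. CocoIndex's engine is more sophisticated (it keeps its own state in Postgres and processes changes incrementally); the names here are illustrative only.

```python
from typing import Callable, Dict, List, Tuple


def plan_changes(
    source: Dict[str, str],
    transform: Callable[[str], str],
    target: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Diff the desired target (transform applied to every source row) against
    the current target and return (op, key, value) operations."""
    desired = {key: transform(value) for key, value in source.items()}
    ops = []
    for key, value in desired.items():
        if key not in target:
            ops.append(("create", key, value))
        elif target[key] != value:
            ops.append(("update", key, value))
    for key in target:
        if key not in desired:
            ops.append(("delete", key, target[key]))
    return ops


source = {"a.md": "alpha", "b.md": "beta v2", "d.md": "delta"}
target = {"a.md": "ALPHA", "b.md": "BETA", "c.md": "GAMMA"}
for op in plan_changes(source, str.upper, target):
    print(op)
```

The user never writes the `create`/`update`/`delete` branches; they fall out of comparing "what the formula says the target should be" with "what the target is".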

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

Data Freshness

CocoIndex takes data freshness to the next level: incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes source updates show up in the target store quickly. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
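A minimal sketch of the incremental idea, assuming content fingerprints are used for change detection. This is an illustration, not how CocoIndex implements change data capture internally, and the `IncrementalIndex` class is hypothetical. Each source row's SHA-256 fingerprint is remembered; on the next run only rows whose fingerprint changed are re-transformed, and rows that disappeared from the source are deleted from the target.

```python
import hashlib
from typing import Callable, Dict


class IncrementalIndex:
    """Keeps a derived target in sync with a source, recomputing only changed rows."""

    def __init__(self, transform: Callable[[str], str]):
        self.transform = transform
        self.fingerprints: Dict[str, str] = {}  # key -> SHA-256 of last-seen content
        self.target: Dict[str, str] = {}        # key -> derived value
        self.calls = 0                          # how many times transform has run

    @staticmethod
    def _fingerprint(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def sync(self, source: Dict[str, str]) -> None:
        for key, content in source.items():
            fp = self._fingerprint(content)
            if self.fingerprints.get(key) != fp:  # new or changed row
                self.target[key] = self.transform(content)
                self.fingerprints[key] = fp
                self.calls += 1
        for key in list(self.target):  # rows removed from the source
            if key not in source:
                del self.target[key]
                del self.fingerprints[key]


index = IncrementalIndex(str.upper)
index.sync({"a.md": "alpha", "b.md": "beta"})
print(index.calls)   # 2: both rows are new
index.sync({"a.md": "alpha", "b.md": "beta v2"})
print(index.calls)   # 3: only b.md was recomputed
index.sync({"b.md": "beta v2"})
print(index.target)  # {'b.md': 'BETA V2'}
```

When the transform is expensive (an embedding model or an LLM call), skipping unchanged rows is where most of the savings come from.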

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't already have it. CocoIndex uses it for incremental processing.
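CocoIndex needs to know how to reach that Postgres instance. The CocoIndex docs describe supplying the connection string through the `COCOINDEX_DATABASE_URL` environment variable; treat the variable name and the sample credentials below as assumptions to verify against the documentation for your version. The hypothetical `postgres_url` helper only builds and sanity-checks such a URL with the standard library; it does not connect to the database.

```python
import os
from urllib.parse import quote, urlparse


def postgres_url(user: str, password: str, host: str = "localhost",
                 port: int = 5432, dbname: str = "cocoindex") -> str:
    """Build a postgres:// URL, percent-encoding the credentials."""
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{dbname}")


url = postgres_url("cocoindex", "cocoindex")  # sample credentials (assumption)
parsed = urlparse(url)
assert parsed.scheme == "postgres" and parsed.hostname == "localhost"

# Assumed variable name; set it before CocoIndex initializes.
os.environ["COCOINDEX_DATABASE_URL"] = url
print(url)
```

Percent-encoding matters if the password contains characters such as `@` or `/`, which would otherwise break URL parsing.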

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
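To build intuition for the `chunk_size=2000, chunk_overlap=500` arguments passed to `SplitRecursively`, here is a deliberately simplified, hypothetical character-window chunker. It is not the library's algorithm: as its name and `language="markdown"` parameter suggest, `SplitRecursively` is language-aware and prefers natural boundaries, which this sketch ignores, and it may measure size differently. The sketch only shows how a chunk size and an overlap interact, and how each chunk can carry a location so that `(filename, location)` can serve as a primary key.

```python
from typing import List, Tuple


def sliding_chunks(text: str, chunk_size: int,
                   chunk_overlap: int) -> List[Tuple[Tuple[int, int], str]]:
    """Split text into windows of at most chunk_size characters, where consecutive
    windows share chunk_overlap characters. Returns ((start, end), chunk_text) pairs."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(((start, end), text[start:end]))
        if end == len(text):
            break
        start += step
    return chunks


doc = "x" * 5000
for location, chunk in sliding_chunks(doc, chunk_size=2000, chunk_overlap=500):
    print(location, len(chunk))
# (0, 2000) 2000
# (1500, 3500) 2000
# (3000, 5000) 2000
```

The overlap means text near a chunk boundary appears in two chunks, so a sentence split by a boundary is still embedded whole at least once.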

It defines an index flow like this:

[Figure: Data Flow]
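The export in the flow declares `VectorSimilarityMetric.COSINE_SIMILARITY` for the `embedding` field. As a rough, stdlib-only illustration of what that metric means at query time, the hypothetical `top_k` helper below ranks stored chunks by cosine similarity to a query vector. In practice the vector index in Postgres does this work, and the query embedding would come from the same SentenceTransformer model; the toy 3-dimensional vectors and integer locations here are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, int]  # hypothetical (filename, location) primary key


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], rows: Dict[Key, Sequence[float]],
          k: int = 2) -> List[Tuple[Key, float]]:
    """Return the k (primary key, score) pairs most similar to the query."""
    scored = [(key, cosine_similarity(query, emb)) for key, emb in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


rows = {
    ("intro.md", 0): [1.0, 0.0, 0.0],
    ("intro.md", 1): [0.7, 0.7, 0.0],
    ("setup.md", 0): [0.0, 0.0, 1.0],
}
for key, score in top_k([1.0, 0.1, 0.0], rows):
    print(key, round(score, 3))
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for sentence embeddings.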

🚀 Examples and demo

Example | Description
--- | ---
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDFs and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using an LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with an LLM and a graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend

More examples are coming; stay tuned 👀!

📖 Documentation

For detailed documentation, including a Quickstart guide, visit the CocoIndex Documentation.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please give us a star ⭐ on GitHub to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cocoindex-0.1.55.tar.gz (6.0 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • cocoindex-0.1.55-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.8 MB): PyPy, manylinux glibc 2.28+, ARM64
  • cocoindex-0.1.55-cp313-cp313t-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.13t, manylinux glibc 2.28+, ARM64
  • cocoindex-0.1.55-cp313-cp313-win_amd64.whl (13.6 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.55-cp313-cp313-manylinux_2_28_x86_64.whl (14.3 MB): CPython 3.13, manylinux glibc 2.28+, x86-64
  • cocoindex-0.1.55-cp313-cp313-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.13, manylinux glibc 2.28+, ARM64
  • cocoindex-0.1.55-cp313-cp313-macosx_11_0_arm64.whl (13.6 MB): CPython 3.13, macOS 11.0+, ARM64
  • cocoindex-0.1.55-cp313-cp313-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.13, macOS 10.12+, x86-64
  • cocoindex-0.1.55-cp312-cp312-win_amd64.whl (13.6 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.55-cp312-cp312-manylinux_2_28_x86_64.whl (14.3 MB): CPython 3.12, manylinux glibc 2.28+, x86-64
  • cocoindex-0.1.55-cp312-cp312-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.12, manylinux glibc 2.28+, ARM64
  • cocoindex-0.1.55-cp312-cp312-macosx_11_0_arm64.whl (13.6 MB): CPython 3.12, macOS 11.0+, ARM64
  • cocoindex-0.1.55-cp312-cp312-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.12, macOS 10.12+, x86-64
  • cocoindex-0.1.55-cp311-cp311-win_amd64.whl (13.6 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.55-cp311-cp311-manylinux_2_28_x86_64.whl (14.3 MB): CPython 3.11, manylinux glibc 2.28+, x86-64
  • cocoindex-0.1.55-cp311-cp311-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.11, manylinux glibc 2.28+, ARM64
  • cocoindex-0.1.55-cp311-cp311-macosx_11_0_arm64.whl (13.6 MB): CPython 3.11, macOS 11.0+, ARM64
  • cocoindex-0.1.55-cp311-cp311-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.11, macOS 10.12+, x86-64
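Wheel file names follow the PEP 427 convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The small, hypothetical parser below is only a sketch (the `packaging` library's `parse_wheel_filename` is the robust choice); it splits the names listed above into their tags, which is how you can tell, for example, that the `cp313t` ABI tag denotes the free-threaded CPython 3.13 build.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[:-4].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename}")
    return WheelName(dist, version, build, py, abi, plat)


w = parse_wheel_name("cocoindex-0.1.55-cp313-cp313t-manylinux_2_28_aarch64.whl")
print(w.python_tag, w.abi_tag, w.platform_tag)
# cp313 cp313t manylinux_2_28_aarch64
```

pip performs this kind of tag matching automatically and installs the wheel that fits your interpreter and platform, so manual parsing is mostly useful for auditing.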

File details

Details for the file cocoindex-0.1.55.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.55.tar.gz
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for cocoindex-0.1.55.tar.gz

Algorithm | Hash digest
--- | ---
SHA256 | 08053ea68eec1bba0e45b0dc0b10cd5686fa4c486a952a61038bdb709a9218bf
MD5 | b52dee85a33cfd636242ac681a5ae0c0
BLAKE2b-256 | bfed40c8b8e698916f9fb4e51185c1d3447641a62b355cca763a378c94a1f097

See more details on using hashes here.
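To check a downloaded file against the digests listed here, compute its SHA-256 locally and compare. A minimal sketch using Python's standard `hashlib`; the helper names are illustrative, and the path is whatever file you downloaded. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA-256 matches the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Published SHA256 for the source distribution above.
EXPECTED = "08053ea68eec1bba0e45b0dc0b10cd5686fa4c486a952a61038bdb709a9218bf"
# verify("cocoindex-0.1.55.tar.gz", EXPECTED) should be True for an intact download.
```

Reading in chunks keeps memory use flat even for the multi-megabyte wheels.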

Hashes for cocoindex-0.1.55-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
--- | ---
SHA256 | c0fff62af435921db49db3ffd31eb594194841591b82bae560a76898d9c109c9
MD5 | 83562ce57ca6aa44b45837e951b2cad1
BLAKE2b-256 | 6b77d88ff6a7d8762a5cdeead3f0b2811f6b59f7d259fef2880d1f1c7f7b59a8

Hashes for cocoindex-0.1.55-cp313-cp313t-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
--- | ---
SHA256 | be90223a4f412cb1c6bfa552cd2e412099ce17b48cf65f532e56d284735f6a9d
MD5 | 9c1281ca750629a60f18fd3e2c303f39
BLAKE2b-256 | 60d61b0ae91824f31e33f0b78baac018edc2e810e948658f5884fbdd46331d08

Hashes for cocoindex-0.1.55-cp313-cp313-win_amd64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 98a0f3cecdbb6551ec41cb1dc5ad0d936ce746976bc7b569a66f5aa37bc287f2
MD5 | c36dfa2d5bf1da98bc64db401e14c83f
BLAKE2b-256 | 70fa844b4bbf37edace9b6b981835e2eebc65ee521e1ea6525d40e9780fd3166

Hashes for cocoindex-0.1.55-cp313-cp313-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 1cfa82ce9bffe5854ecd0a986671a4ccb6316195ece2a68c5302af1d1f67bc63
MD5 | 8e6397044956bcbbcb4f07b54d420061
BLAKE2b-256 | b6980ce09f9626f463d7b397cc73771f8d04f8822de5ef3e08f71feb691bef8c

Hashes for cocoindex-0.1.55-cp313-cp313-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 68a78a6e6ad1619d285e53f51628ae344d7571d9139a344dbbedd304f4b030cd
MD5 | ed0553a67dc6de7576def95c88842841
BLAKE2b-256 | 2dceb5df6da6540e8ca47cf5c1ab5e57a08d049a9a2beaba4757fcf0aa582064

Hashes for cocoindex-0.1.55-cp313-cp313-macosx_11_0_arm64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 4fb844f5b3cef1595a58aba9c6a5ddcb861c417c59016b089eaab45b154ec471
MD5 | 542c687546b5d0d454a0495298c5c4d0
BLAKE2b-256 | 8f5610d1ea50ef0abf60c86f15aa4dbeab79f892f76adf507611850045bdbf9d

Hashes for cocoindex-0.1.55-cp313-cp313-macosx_10_12_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 5542aae65124bc7eb3cc57d500215c13698ce45fbb0d213fd6a0185607cdc456
MD5 | d942cb54a6ae0ced9c635e953b37981c
BLAKE2b-256 | 9b5822d9aa34c08cd72cc1be5c19c0f17b9886a1bfc761dccc566cacfe104968

Hashes for cocoindex-0.1.55-cp312-cp312-win_amd64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 34660bb5f8129de84739cbbe742f46aad9e9ee2e110bc88334148805896c1c2b
MD5 | ae826956805578d4cd627da53f6139ab
BLAKE2b-256 | b39b0734a92b88d98f568f56737396d2e0a1f19ae70e57f1fb659ca7f40c9b26

Hashes for cocoindex-0.1.55-cp312-cp312-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | df45c534ce2ac59615249fd690084de113de4ebe083cd0946a7befed2be90a4a
MD5 | 800fe91529a6dc06e355a24116babff4
BLAKE2b-256 | 3a2ff2ab103572588edf555bcb422c7374b8d763b67af9c02e8b30f38a2e3613

Hashes for cocoindex-0.1.55-cp312-cp312-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 7d34d1fdba967a4bb5308df491da93461d6e9ac17f8c36b6c2b4816187f87eb6
MD5 | b66bc360511a73dfd7b6e85916e5b3b9
BLAKE2b-256 | c207ffe72d0314b595d98b28b448406fdebf597451117402c22e91dadf143c5e

Hashes for cocoindex-0.1.55-cp312-cp312-macosx_11_0_arm64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 03526e76c98ddcbc0d7f5af893efa330dcc712cfc58f18954f0caddb4fac952c
MD5 | 6823958f18e61497e031edbacab70ec6
BLAKE2b-256 | 394c34398c520fbfdce8a9b1e5e6759d2cc727139685884f5f6c1f9cef05e928

Hashes for cocoindex-0.1.55-cp312-cp312-macosx_10_12_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | b376f0532d27091fcb8b791ea2c711d1efc2119ee7a5ab5d38d2eab1f732304a
MD5 | 980b27ffcc6488223ea270aa79aa76d8
BLAKE2b-256 | f129b12c56ca411194d7c703768ef405034b79a42401dca6a9ef7fa42c32f775

Hashes for cocoindex-0.1.55-cp311-cp311-win_amd64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 2c2de6c30b7a0d46d6f58f5af3ea065ba4ad25fad15cca839005b54ed97176ce
MD5 | 41dea119b4bd7e09c7128a22d48b908f
BLAKE2b-256 | 1a9810819bd8fce34481aff2b4750253b8eb3b4a2a931c8aeb1f2857c9ee796b

Hashes for cocoindex-0.1.55-cp311-cp311-manylinux_2_28_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 43dd28f2a8137da2e3cd70f78c4ce7fe4b028d70daf38f035001e2a1f22940ce
MD5 | 6cbabcd2dccaa2073eb392e09e6b8d9a
BLAKE2b-256 | 1d685e5379e1bbca6e49e9e9415954c0057c5ece0fe1a4e793666071c0e564b0

Hashes for cocoindex-0.1.55-cp311-cp311-manylinux_2_28_aarch64.whl

Algorithm | Hash digest
--- | ---
SHA256 | e7c828d70d1f5c392e61bc4fd209f998df94a83fde11130b8b83def0119ec664
MD5 | cac29c3c0c310bf8764b1d63849a4452
BLAKE2b-256 | b9b87b859133b683ccaa931c5a5e2eab4961b1021e10978d948d882a61b09e47

Hashes for cocoindex-0.1.55-cp311-cp311-macosx_11_0_arm64.whl

Algorithm | Hash digest
--- | ---
SHA256 | 1e2253ac3402d6ec99bb2b088717f0fc66ca24018efb50709ef0ba03dc24f7a8
MD5 | 7c88d36a978d33a2a8107fa90b337650
BLAKE2b-256 | 8c95370e9c186ca2b37db64de0140f038114029c6f957b844616b0151d6a2cae

Hashes for cocoindex-0.1.55-cp311-cp311-macosx_10_12_x86_64.whl

Algorithm | Hash digest
--- | ---
SHA256 | a1391fa61b1aa063afcfff605681bf804095ed1da06a6b2de27eba2f96931d1d
MD5 | 0637a580b0f34cc355b8bf60162325c6
BLAKE2b-256 | b4fb912de78e86930fbbc0b641886ed0a95c2ef00b35ce5c794e65335d91af52
