With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is a high-performance data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether by creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy is to have the framework handle source updates, so that developers only need to define a series of data transformations, much like formulas in a spreadsheet.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, or deleting it. Instead, they declare that, for a set of source data, this is the transformation or formula to apply. The framework takes care of data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
# (parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
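
To make the "formula" idea concrete, here is a minimal pure-Python sketch. It does not use the CocoIndex API: `derive`, `normalize`, `word_count`, and the record layout are hypothetical names chosen only for illustration. Each derived field is computed solely from an input field, the source record is never mutated, and the lineage of every derived field is recorded alongside it.

```python
from typing import Any, Callable


def derive(
    record: dict[str, Any],
    lineage: dict[str, str],
    out_field: str,
    in_field: str,
    fn: Callable[[Any], Any],
) -> tuple[dict[str, Any], dict[str, str]]:
    """Return a new record where out_field = fn(record[in_field]).

    The input record and lineage are copied, never modified, so every
    intermediate state stays observable.
    """
    new_record = {**record, out_field: fn(record[in_field])}
    new_lineage = {**lineage, out_field: f"{fn.__name__}({in_field})"}
    return new_record, new_lineage


def normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


# Hypothetical source row; the field names are illustrative only.
source = {"content": "  Hello   Dataflow World "}
step1, lineage1 = derive(source, {}, "clean", "content", normalize)
step2, lineage2 = derive(step1, lineage1, "n_words", "clean", word_count)
```

Because `derive` returns new dictionaries, `source` and `step1` remain available for inspection after `step2` is computed, and `lineage2` records which function and input field produced each derived field.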

Data Freshness

As a data framework, CocoIndex makes data freshness a first-class concern. Incremental processing is one of the core values it provides.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles that for you.
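
The general idea behind incremental processing can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not CocoIndex's actual implementation: `fingerprint`, `incremental_update`, and the in-memory `state` and `target` dictionaries are hypothetical stand-ins for whatever the framework tracks internally. The sketch detects changes by comparing content hashes against the previous run, recomputes only rows whose content changed, and deletes derived rows whose source has disappeared.

```python
import hashlib
from typing import Callable


def fingerprint(content: str) -> str:
    """Stable hash of the source content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_update(
    source: dict[str, str],
    state: dict[str, str],
    target: dict[str, str],
    transform: Callable[[str], str],
) -> tuple[list[str], list[str]]:
    """Bring `target` in line with `source`, recomputing only changed rows.

    source: key -> current source content
    state:  key -> fingerprint seen on the previous run (updated in place)
    target: key -> derived value (updated in place)
    Returns (keys that were recomputed, keys that were deleted).
    """
    recomputed: list[str] = []
    deleted: list[str] = []

    # New or changed rows: the fingerprint differs from the previous run.
    for key, content in source.items():
        fp = fingerprint(content)
        if state.get(key) != fp:
            target[key] = transform(content)
            state[key] = fp
            recomputed.append(key)

    # Rows whose source no longer exists: drop the derived data too.
    for key in list(state):
        if key not in source:
            del state[key]
            target.pop(key, None)
            deleted.append(key)

    return recomputed, deleted


# Three consecutive runs over a hypothetical set of files.
state: dict[str, str] = {}
target: dict[str, str] = {}
first = incremental_update({"a.md": "alpha", "b.md": "beta"}, state, target, str.upper)
second = incremental_update({"a.md": "alpha", "b.md": "beta v2"}, state, target, str.upper)
third = incremental_update({"b.md": "beta v2"}, state, target, str.upper)
```

On the second run only `b.md` is recomputed, and on the third run the derived row for `a.md` is removed without touching `b.md`.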

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:

     pip install -U cocoindex

  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
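
Two parameters in the flow above are easier to reason about with a small standalone sketch: `chunk_size` and `chunk_overlap` in the splitting step, and the cosine similarity metric used by the vector index. The code below is plain Python and does not call CocoIndex. `chunk_with_overlap` is a hypothetical fixed-window splitter that only shows how size and overlap interact (`SplitRecursively` takes a `language` argument and should not be assumed to behave this way), and `cosine_similarity` is the textbook formula.

```python
import math
from typing import Sequence


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Returns (start_offset, chunk_text) pairs.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[tuple[int, str]] = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Tiny values so the windows are easy to read; the flow above uses 2000 and 500.
chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Here each chunk starts two characters after the previous one, so neighbouring chunks share two characters. With cosine similarity, vectors that point in the same direction score 1 regardless of length, and orthogonal vectors score 0.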

🚀 Examples and demo

Example                         Description
Text Embedding                  Index text documents with embeddings for semantic search
Code Embedding                  Index code embeddings for semantic search
PDF Embedding                   Parse PDFs and index text embeddings for semantic search
Manuals LLM Extraction          Extract structured information from a manual using an LLM
Amazon S3 Embedding             Index text documents from Amazon S3
Google Drive Text Embedding     Index text documents from Google Drive
Docs to Knowledge Graph         Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant            Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker      Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation          Build real-time product recommendations with an LLM and a graph database
Image Search with Vision API    Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
