With CocoIndex, users declare the transformation. CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal computation and minimal changes to the target.

CocoIndex

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is an ultra-performant data transformation framework with its core engine written in Rust. It aims to make it easy to prepare fresh data for AI, whether that means creating embeddings, building knowledge graphs, or performing other data transformations, and to take real-time data pipelines beyond traditional SQL.

CocoIndex Features

The philosophy, inspired by spreadsheets, is to have the framework handle source updates so that developers only need to define a series of data transformations.

Dataflow programming

Unlike a workflow orchestration framework, where data is usually opaque, CocoIndex treats data and data operations as first-class citizens. CocoIndex follows the dataflow programming model: each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, users don't explicitly mutate data by creating, updating, and deleting it. Instead, they define the transformation, or formula, that applies to a set of source data. The framework takes care of the data operations, such as deciding when to create, update, or delete.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)
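
To make the declarative idea concrete, here is a minimal sketch in plain Python. It is not CocoIndex code: `derive` and `plan_operations` are hypothetical helpers invented for this illustration. The user supplies only a formula; the create, update, and delete operations are worked out by comparing the rows the target currently holds with the rows the formula produces from the current source.

```python
from typing import Callable, Dict, List, Tuple


def derive(source: Dict[str, str], formula: Callable[[str], str]) -> Dict[str, str]:
    """Apply the declared formula to every source row (no hidden state)."""
    return {key: formula(value) for key, value in source.items()}


def plan_operations(
    current: Dict[str, str], desired: Dict[str, str]
) -> List[Tuple[str, str]]:
    """List the operations needed to turn `current` into `desired`."""
    ops: List[Tuple[str, str]] = []
    for key in sorted(desired.keys() | current.keys()):
        if key not in current:
            ops.append(("create", key))
        elif key not in desired:
            ops.append(("delete", key))
        elif current[key] != desired[key]:
            ops.append(("update", key))
    return ops


# Hypothetical data: the target as derived from an earlier source snapshot,
# and the target the same formula yields for the latest snapshot.
target = derive({"a.md": "hello", "b.md": "world"}, str.upper)
new_target = derive({"a.md": "hello!", "c.md": "new"}, str.upper)

operations = plan_operations(target, new_target)
print(operations)
```

The caller never writes a create, update, or delete statement; those fall out of the comparison.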

Data Freshness

As a data framework, CocoIndex treats data freshness as a primary goal. Incremental processing is one of its core values.

Incremental Processing

The framework takes care of:

  • Change data capture.
  • Figuring out exactly what needs to be updated, and updating only that, without recomputing everything.

This makes it fast to reflect source updates in the target store. If you are concerned about surfacing stale data to AI agents and are spending a lot of effort on infrastructure to reduce latency, the framework handles this for you.
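
One common way to achieve this kind of incremental update is to fingerprint each source row and recompute only the rows whose fingerprint changed. The sketch below illustrates that general technique in plain Python; it is an assumption for illustration, not a description of how CocoIndex implements it, and `IncrementalIndex` is a hypothetical class.

```python
import hashlib
from typing import Callable, Dict


class IncrementalIndex:
    """Keeps a derived target in sync with a source, recomputing only changed rows."""

    def __init__(self, transform: Callable[[str], object]) -> None:
        self.transform = transform
        self.fingerprints: Dict[str, str] = {}  # source key -> content hash
        self.target: Dict[str, object] = {}     # source key -> derived value
        self.recomputed = 0                     # number of transform calls so far

    def sync(self, source: Dict[str, str]) -> None:
        # Drop rows whose source entry has disappeared.
        for key in list(self.target):
            if key not in source:
                del self.target[key]
                del self.fingerprints[key]
        # Recompute only new rows and rows whose content changed.
        for key, content in source.items():
            fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self.fingerprints.get(key) == fingerprint:
                continue  # unchanged: skip the transformation
            self.target[key] = self.transform(content)
            self.fingerprints[key] = fingerprint
            self.recomputed += 1


index = IncrementalIndex(transform=len)
index.sync({"a.md": "alpha", "b.md": "beta"})
index.sync({"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"})
print(index.recomputed)  # prints 4: two rows on the first sync, two on the second
```

In the second `sync`, `a.md` is skipped because its content hash is unchanged, which is where the savings come from when the transformation is expensive (for example, computing an embedding).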

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
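
Two settings in the example flow above may be easier to read with a small illustration: the `chunk_size` and `chunk_overlap` arguments, and the cosine similarity metric of the vector index. The sketch below is plain Python and deliberately simplified. `sliding_chunks` is a hypothetical fixed-window splitter, not the behavior of `SplitRecursively` (whose name suggests it splits along the document's structure rather than at fixed offsets), and `cosine_similarity` is the textbook formula rather than anything taken from CocoIndex.

```python
import math
from typing import List, Sequence


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # prints ['abcd', 'cdef', 'efgh', 'ghij']
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # prints 0.0
```

Overlap means that text near a chunk boundary appears in both neighboring chunks, so a sentence cut by the boundary is still embedded whole in at least one of them. Cosine similarity compares only the direction of two embeddings, not their length.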

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend

More are coming, stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ at our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cocoindex-0.1.62.tar.gz (6.2 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • cocoindex-0.1.62-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (13.8 MB): PyPy, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.62-cp313-cp313t-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.13t, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.62-cp313-cp313-win_amd64.whl (13.7 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.62-cp313-cp313-manylinux_2_28_x86_64.whl (14.4 MB): CPython 3.13, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.62-cp313-cp313-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.13, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.62-cp313-cp313-macosx_11_0_arm64.whl (13.6 MB): CPython 3.13, macOS 11.0+ ARM64
  • cocoindex-0.1.62-cp313-cp313-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.13, macOS 10.12+ x86-64
  • cocoindex-0.1.62-cp312-cp312-win_amd64.whl (13.7 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.62-cp312-cp312-manylinux_2_28_x86_64.whl (14.4 MB): CPython 3.12, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.62-cp312-cp312-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.12, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.62-cp312-cp312-macosx_11_0_arm64.whl (13.6 MB): CPython 3.12, macOS 11.0+ ARM64
  • cocoindex-0.1.62-cp312-cp312-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.12, macOS 10.12+ x86-64
  • cocoindex-0.1.62-cp311-cp311-win_amd64.whl (13.7 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.62-cp311-cp311-manylinux_2_28_x86_64.whl (14.4 MB): CPython 3.11, manylinux: glibc 2.28+ x86-64
  • cocoindex-0.1.62-cp311-cp311-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.11, manylinux: glibc 2.28+ ARM64
  • cocoindex-0.1.62-cp311-cp311-macosx_11_0_arm64.whl (13.6 MB): CPython 3.11, macOS 11.0+ ARM64
  • cocoindex-0.1.62-cp311-cp311-macosx_10_12_x86_64.whl (14.1 MB): CPython 3.11, macOS 10.12+ x86-64
