
With CocoIndex, you declare the transformation; CocoIndex creates and maintains the index and keeps it up to date as the source changes, with minimal recomputation and minimal changes to the target.


CocoIndex

Data transformation for AI


Ultra-performant data transformation framework for AI, with the core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!




CocoIndex makes it easy to transform data for AI workloads and keeps source data and targets in sync.




Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, or deleting it. They only define the transformation (a formula) over a set of source data.
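To make the idea concrete, here is a minimal plain-Python sketch of the dataflow style described above. It does not use the CocoIndex API; the `Row` class and its `derive` method are hypothetical names chosen for illustration. Each derived field is computed from one input field by a pure function, existing fields are never overwritten, and the derivation is recorded as lineage.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Row:
    """Hypothetical row: named fields plus a record of how each one was derived."""
    values: dict[str, object]
    lineage: dict[str, tuple[str, str]] = field(default_factory=dict)

    def derive(self, new_field: str, source_field: str,
               fn: Callable[[object], object]) -> None:
        """Create `new_field` from `source_field`; existing fields are never overwritten."""
        if new_field in self.values:
            raise ValueError(f"field {new_field!r} already exists")
        self.values[new_field] = fn(self.values[source_field])
        # Remember which field and which function produced the new field.
        self.lineage[new_field] = (source_field, fn.__name__)


row = Row({"content": "Hello CocoIndex"})
row.derive("lowered", "content", str.lower)
row.derive("tokens", "lowered", str.split)

print(row.values["tokens"])   # ['hello', 'cocoindex']
print(row.lineage["tokens"])  # ('lowered', 'split')
```

Because every intermediate field stays in `row.values`, the data before and after each step can be inspected, which is the property the paragraph above calls observability.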

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
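The one-line switch can be pictured with a small plain-Python sketch. This is not the CocoIndex interface: `Target`, `InMemoryTarget`, `CountingTarget`, and `export` are hypothetical names, assumed here only to show how a shared interface lets the rest of a pipeline stay unchanged when a component is swapped.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical standardized interface that every target implements."""
    def write(self, rows: list[dict]) -> int: ...


class InMemoryTarget:
    """Keeps the exported rows in a list."""
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> int:
        self.rows.extend(rows)
        return len(rows)


class CountingTarget:
    """Only counts rows; stands in for a second backend."""
    def __init__(self) -> None:
        self.count = 0

    def write(self, rows: list[dict]) -> int:
        self.count += len(rows)
        return len(rows)


def export(rows: list[dict], target: Target) -> int:
    """The pipeline depends only on the interface, not on a concrete target."""
    return target.write(rows)


rows = [{"id": 1}, {"id": 2}]
target = InMemoryTarget()  # swap this one line for CountingTarget()
written = export(rows, target)
print(written)  # 2
```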


Data Freshness

CocoIndex keeps source data and targets in sync.

Incremental Processing

It supports incremental indexing out of the box:

  • minimal recomputation when the source or the logic changes;
  • (re-)processing only the necessary portions, reusing the cache when possible.
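The two points above can be sketched in plain Python. This is an illustration of the general technique (content fingerprints plus an output cache), not a description of how CocoIndex implements it internally; `fingerprint`, `incremental_update`, and the `logic_version` tag are hypothetical names.

```python
import hashlib
from typing import Callable


def fingerprint(content: str, logic_version: str) -> str:
    """Hash of the source content plus a version tag for the transformation logic."""
    return hashlib.sha256(f"{logic_version}\0{content}".encode()).hexdigest()


def incremental_update(sources: dict[str, str],
                       cache: dict[str, tuple[str, str]],
                       transform: Callable[[str], str],
                       logic_version: str = "v1") -> list[str]:
    """Bring `cache` (key -> (fingerprint, output)) in sync with `sources`.

    Returns the keys that were actually recomputed.
    """
    recomputed = []
    for key, content in sources.items():
        fp = fingerprint(content, logic_version)
        if key in cache and cache[key][0] == fp:
            continue  # unchanged source and logic: reuse the cached output
        cache[key] = (fp, transform(content))
        recomputed.append(key)
    for key in list(cache):
        if key not in sources:
            del cache[key]  # source was deleted: drop the derived entry
    return recomputed


cache: dict[str, tuple[str, str]] = {}
docs = {"a.md": "alpha", "b.md": "beta"}
first = incremental_update(docs, cache, str.upper)   # everything is new
docs["b.md"] = "beta v2"
second = incremental_update(docs, cache, str.upper)  # only b.md changed
third = incremental_update(docs, cache, str.upper, logic_version="v2")  # logic changed
print(first, second, third)
```

The first run processes both documents, the second only the edited one, and the third reprocesses both because the logic version is part of the fingerprint.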

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
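After step 1, a quick way to confirm the install is to ask Python's packaging metadata for the version. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and nothing here is specific to CocoIndex beyond the package name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cocoindex")
print(version if version else "cocoindex is not installed; run: pip install -U cocoindex")
```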

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
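Two parameters in the flow above are easier to reason about with a small example: `chunk_size`/`chunk_overlap` and the cosine similarity metric. The sketch below is a simplified stand-in in plain Python. It uses fixed-size character windows, which is an assumption made for illustration and not how `SplitRecursively` actually splits text; `chunk_with_overlap` and `cosine_similarity` are hypothetical helper names.

```python
import math


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size character windows; each starts chunk_size - chunk_overlap after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

Neighbouring chunks share `chunk_overlap` characters, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.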

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper

More are coming, stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
