With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source changes, with minimal recomputation and changes.

Project description

CocoIndex

Data transformation for AI

Ultra-performant data transformation framework for AI, with a core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex makes it super easy to transform data with AI workloads, and to keep source data and targets in sync effortlessly.


Create embeddings, build knowledge graphs, or run any other data transformation, going beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow, in about 100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting it. They only need to define the transformation (the formula) for a set of source data.
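To make the dataflow idea concrete, here is a minimal plain-Python sketch, not the CocoIndex API: each step derives a new field from named input fields, returns a new row instead of mutating the old one, and records which inputs each field came from. `Row`, `transform`, and `lineage` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Row:
    """A row whose fields are never changed in place."""

    fields: dict[str, Any]
    # Maps each derived field to the input fields it was computed from.
    lineage: dict[str, tuple[str, ...]] = field(default_factory=dict)

    def transform(self, out: str, fn: Callable[..., Any], *inputs: str) -> "Row":
        """Return a new Row with field `out` = fn(inputs); `self` is left untouched."""
        value = fn(*(self.fields[name] for name in inputs))
        return Row(
            fields={**self.fields, out: value},
            lineage={**self.lineage, out: tuple(inputs)},
        )


source = Row(fields={"content": "Hello CocoIndex"})
lowered = source.transform("lower", str.lower, "content")
counted = lowered.transform("n_words", lambda s: len(s.split()), "lower")

print(counted.fields["n_words"])  # 2
print(counted.lineage)            # {'lower': ('content',), 'n_words': ('lower',)}
print("lower" in source.fields)   # False: the source row was not mutated
```

Because every intermediate row is kept rather than overwritten, any value can be inspected and traced back to its inputs, which is the property the paragraph above describes.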

Build like LEGO

Native built-ins for different sources, targets, and transformations. Standardized interfaces make switching between components a one-line code change.
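In the example flow in this README, the export target is passed as a single argument (`cocoindex.targets.Postgres()`), which is what makes a one-line switch possible. The sketch below illustrates the same pattern in plain Python with a hypothetical `Target` protocol and two stand-in backends; these classes are assumptions for illustration and are not part of CocoIndex.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every export target implements."""

    def export(self, rows: list[dict]) -> None: ...


class InMemoryTarget:
    """Stand-in for a database target: keeps rows in a list."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def export(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class PrintTarget:
    """Stand-in for a different backend: prints each row."""

    def export(self, rows: list[dict]) -> None:
        for row in rows:
            print(row)


def run_pipeline(docs: list[str], target: Target) -> None:
    # The transformation logic does not depend on where results are written.
    rows = [{"text": d, "length": len(d)} for d in docs]
    target.export(rows)


memory = InMemoryTarget()
run_pipeline(["a", "bb"], memory)         # switching backends means changing only this argument
run_pipeline(["a", "bb"], PrintTarget())
```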

Data Freshness

CocoIndex keeps source data and targets in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • Minimal recomputation when the source data or the transformation logic changes.
  • Only the necessary portions are (re-)processed; cached results are reused when possible.
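One common way to get this behavior is to fingerprint each source row together with a version of the transformation logic, and recompute only rows whose fingerprint changed. The sketch below illustrates that idea in plain Python; it is not CocoIndex's implementation (which tracks state in Postgres), and `IncrementalIndex`, `fingerprint`, and `logic_version` are hypothetical names.

```python
import hashlib
from typing import Callable


def fingerprint(*parts: str) -> str:
    """Stable SHA-256 hash of the given strings."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


class IncrementalIndex:
    """Recompute a row only when its content or the logic version changes."""

    def __init__(self, fn: Callable[[str], str], logic_version: str) -> None:
        self.fn = fn
        self.logic_version = logic_version
        self.cache: dict[str, tuple[str, str]] = {}  # key -> (fingerprint, output)
        self.recomputed: list[str] = []  # keys recomputed during the last sync

    def sync(self, source: dict[str, str]) -> dict[str, str]:
        self.recomputed = []
        # Drop cached outputs whose source rows were deleted.
        for key in list(self.cache):
            if key not in source:
                del self.cache[key]
        for key, content in source.items():
            fp = fingerprint(self.logic_version, content)
            cached = self.cache.get(key)
            if cached is None or cached[0] != fp:
                self.cache[key] = (fp, self.fn(content))
                self.recomputed.append(key)
        return {key: out for key, (_, out) in self.cache.items()}


index = IncrementalIndex(str.upper, logic_version="v1")
index.sync({"a.md": "hello", "b.md": "world"})
first = list(index.recomputed)    # ['a.md', 'b.md']
index.sync({"a.md": "hello", "b.md": "world!"})
second = list(index.recomputed)   # ['b.md']: only the changed file is reprocessed
index.logic_version = "v2"        # a logic change invalidates every cached row
result = index.sync({"a.md": "hello", "b.md": "world!"})
third = list(index.recomputed)    # ['a.md', 'b.md']
after_delete = index.sync({"a.md": "hello"})  # b.md removed from the source
```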

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't already have it. CocoIndex uses it for incremental processing.
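Before running a flow, it can help to confirm that CocoIndex will be able to find your Postgres instance. The sketch below assumes the connection URL is supplied through the `COCOINDEX_DATABASE_URL` environment variable, following the convention in CocoIndex's documentation (verify the variable name for your version). `check_database_url` is a hypothetical helper, and the example URL and credentials are placeholders.

```python
import os
from urllib.parse import urlparse


def check_database_url(env_var: str = "COCOINDEX_DATABASE_URL") -> str:
    """Return the Postgres URL stored in `env_var`, raising if it is missing or malformed."""
    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"{env_var} is not set")
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise RuntimeError(
            f"{env_var} should start with postgres:// or postgresql://, got {parsed.scheme!r}"
        )
    if not parsed.hostname:
        raise RuntimeError(f"{env_var} has no host: {url!r}")
    return url


# Placeholder URL for a local development database; an existing value is not overwritten.
os.environ.setdefault("COCOINDEX_DATABASE_URL", "postgres://cocoindex:cocoindex@localhost/cocoindex")
print(check_database_url())
```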

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
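`SplitRecursively` is described as splitting text along its structure where it can; the sketch below deliberately uses a much simpler fixed-size character window, which is not `SplitRecursively`'s algorithm, only to show what `chunk_size` and `chunk_overlap` control. `split_fixed` is a hypothetical helper, sizes are measured in characters here, and the start offset stands in for a chunk's location.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters. Returns
    (start_offset, chunk_text) pairs.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


# Scaled-down numbers; the flow above uses chunk_size=2000, chunk_overlap=500.
print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))
# [(0, 'abcd'), (3, 'defg'), (6, 'ghij')]
```

The overlap means text near a chunk boundary appears in two chunks, so a sentence cut at one boundary is still seen whole by at least one embedding.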

It defines an index flow like this:

[Figure: Data Flow]
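Conceptually, the flow above goes documents → chunks → embeddings → collected rows keyed by the primary key (`filename`, `location`), exported to a vector index queried by cosine similarity. The plain-Python sketch below mirrors that shape without using the CocoIndex API: `toy_embed` is a hypothetical letter-count stand-in for `SentenceTransformerEmbed`, chunks are supplied pre-split, and an in-memory dict replaces the Postgres target.

```python
import math


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a 26-dim letter-count vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(documents: dict[str, list[str]]) -> dict[tuple[str, int], dict]:
    """Collect one row per chunk, keyed by (filename, location)."""
    rows: dict[tuple[str, int], dict] = {}
    for filename, chunks in documents.items():
        for location, text in enumerate(chunks):
            rows[(filename, location)] = {"text": text, "embedding": toy_embed(text)}
    return rows


def search(rows: dict[tuple[str, int], dict], query: str, top_k: int = 1) -> list[tuple[str, int]]:
    """Return the primary keys of the top_k rows most similar to `query`."""
    q = toy_embed(query)
    ranked = sorted(rows, key=lambda key: cosine_similarity(rows[key]["embedding"], q), reverse=True)
    return ranked[:top_k]


rows = build_index({"a.md": ["cats and dogs", "rust engine"], "b.md": ["vector search"]})
print(search(rows, "dogs and cats"))  # [('a.md', 0)]
```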

🚀 Examples and demo

  • Text Embedding: Index text documents with embeddings for semantic search
  • Code Embedding: Index code embeddings for semantic search
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM
  • Amazon S3 Embedding: Index text documents from Amazon S3
  • Azure Blob Storage Embedding: Index text documents from Azure Blob Storage
  • Google Drive Text Embedding: Index text documents from Google Drive
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
  • Face Recognition: Recognize faces in images and build an embedding index
  • Paper Metadata: Index papers in PDF files and build metadata tables for each paper
  • Multi Format Indexing: Build a visual document index from PDFs and images with ColPali for semantic search
  • Custom Output Files: Convert Markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets
  • Patient intake form extraction: Use an LLM to extract structured data from patient intake forms in different formats

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Source Distribution

  • cocoindex-0.1.81.tar.gz (15.2 MB): Source

Built Distributions

  • cocoindex-0.1.81-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (15.9 MB): PyPy, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.81-cp313-cp313t-manylinux_2_28_aarch64.whl (15.9 MB): CPython 3.13t, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.81-cp313-cp313-win_amd64.whl (15.8 MB): CPython 3.13, Windows x86-64
  • cocoindex-0.1.81-cp313-cp313-manylinux_2_28_x86_64.whl (16.6 MB): CPython 3.13, manylinux glibc 2.28+ x86-64
  • cocoindex-0.1.81-cp313-cp313-manylinux_2_28_aarch64.whl (15.9 MB): CPython 3.13, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.81-cp313-cp313-macosx_11_0_arm64.whl (15.7 MB): CPython 3.13, macOS 11.0+ ARM64
  • cocoindex-0.1.81-cp313-cp313-macosx_10_12_x86_64.whl (16.3 MB): CPython 3.13, macOS 10.12+ x86-64
  • cocoindex-0.1.81-cp312-cp312-win_amd64.whl (15.8 MB): CPython 3.12, Windows x86-64
  • cocoindex-0.1.81-cp312-cp312-manylinux_2_28_x86_64.whl (16.6 MB): CPython 3.12, manylinux glibc 2.28+ x86-64
  • cocoindex-0.1.81-cp312-cp312-manylinux_2_28_aarch64.whl (15.9 MB): CPython 3.12, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.81-cp312-cp312-macosx_11_0_arm64.whl (15.7 MB): CPython 3.12, macOS 11.0+ ARM64
  • cocoindex-0.1.81-cp312-cp312-macosx_10_12_x86_64.whl (16.3 MB): CPython 3.12, macOS 10.12+ x86-64
  • cocoindex-0.1.81-cp311-cp311-win_amd64.whl (15.8 MB): CPython 3.11, Windows x86-64
  • cocoindex-0.1.81-cp311-cp311-manylinux_2_28_x86_64.whl (16.6 MB): CPython 3.11, manylinux glibc 2.28+ x86-64
  • cocoindex-0.1.81-cp311-cp311-manylinux_2_28_aarch64.whl (15.9 MB): CPython 3.11, manylinux glibc 2.28+ ARM64
  • cocoindex-0.1.81-cp311-cp311-macosx_11_0_arm64.whl (15.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • cocoindex-0.1.81-cp311-cp311-macosx_10_12_x86_64.whl (16.3 MB): CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.81.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.81.tar.gz
  • Upload date:
  • Size: 15.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.3

File hashes

Hashes for cocoindex-0.1.81.tar.gz
Algorithm Hash digest
SHA256 b2ff88addce0955d8d1b6444e453884da91cd57a20d0ca749f408346ee845cba
MD5 493fd3de605cd6e3c8acfa325fcc6d06
BLAKE2b-256 cd23f274913a0c44f25993a4cf27daa1a95b4928860aa31c527009c215548c5a

Hashes for cocoindex-0.1.81-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 5338490e2bbf800cb4423a5a53703ad06ae4dc55c16e04e273dd809c1b6896ee
MD5 d1691e9439e34f9b061fa0bf7d830afd
BLAKE2b-256 37a94a35a972c2a35bb1c5e394bf805a4e6731c4878d8d7dae827d79f3730265

Hashes for cocoindex-0.1.81-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c261ac167df8f2f08b982ce981cfe9bf6c9b8174ed90dc352255402c24a21d5c
MD5 a4ae12314f864c04965f973cbc3d5fe7
BLAKE2b-256 c36ba4d0e33a7b9b20a66dbb8c82108d4746c28ea434fc2b363f2becb8d5d3c6

Hashes for cocoindex-0.1.81-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 7cd297c42b9c3667735cab7f4976c69acd8657d1cba5e0a5695172df8de4ba92
MD5 3f32df6562270922454e33259fdf8d88
BLAKE2b-256 071e3cf95b359df203d88eba2dac8402cd16c7f93e7945d217fe106718957b92

Hashes for cocoindex-0.1.81-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 16f9e4c31ada89cbd215b8fb3dcc43fd5a0cbb9f0c9b174f11feaaee03838015
MD5 f07f250b21de7c37419341a95f4dfe58
BLAKE2b-256 8b67c653f9c33e0406a808da9fcc6ec36e65ff1fb9e9e4b31ee31c660ca00262

Hashes for cocoindex-0.1.81-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 71373349cf48549230eb7ea07094903881356dfc2721b9e3fb07f048441825d2
MD5 153e0babf13ea8fac2f4b151f6c70eaf
BLAKE2b-256 d0084e449f5a2055faf8c3cf5ff6ec3f9c9dd009762f83c7c1bc10034b5050ec

Hashes for cocoindex-0.1.81-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5258a2f345a0a506cd9d264baf7ed8b177703712dbdbc77617e29a424b009c49
MD5 7694ff744668b8609925b6f7e887fb98
BLAKE2b-256 d21c25d3769db54d20626d36d49df630db7932784de9b4700a7170acdb2338cd

Hashes for cocoindex-0.1.81-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 29d714394ef6b16cf8348d1b76f6049abe6bc4c6e8c09ee8593149730a184570
MD5 aad67ad0e0a0decd123a616837ef859c
BLAKE2b-256 cb10ed52028dfdec5b96d69b20d3ddf837d696d3f300c45e41b3ea4dee3eeaab

Hashes for cocoindex-0.1.81-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 d43cf9a947011d8b76ecd385a6f8d198331ba0d5ac8a4de48bc1a71b781282f3
MD5 2970400d22ca7e6b478f95e8549764e2
BLAKE2b-256 f36b98758286f96db6a7335bd5cbf8abb9404193f539497b9a9cc709e2e85f29

Hashes for cocoindex-0.1.81-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 30c7d3297aaf8c4bf2962d046dd6971f11088e3fb77051a64c5524e0ef811882
MD5 d30bd78f2994129f2e2764bd89328dbd
BLAKE2b-256 7e5a89dbf868d682d0e31fa85c4ac44b5c8cb5db065b747b19161c7ad9869ce7

Hashes for cocoindex-0.1.81-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c60a618ec483631e160f8948bf3790b3419d11c9624b40c8cb7fd5486d0c7265
MD5 1dae6273c44970498dcdfc04e5bb9b0b
BLAKE2b-256 fe149485b5e7b636fa02b89b9f3e5d1b8ac499451900f2166bf9f1dfa70f499c

Hashes for cocoindex-0.1.81-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 aac5ac2da001fa1156c36f37532889ebdf6243208e27de5bddb89d71c3e74193
MD5 98951a92f487aa35f1a631b73366e582
BLAKE2b-256 5487438fca0d40d93e4658f624c8582ca9081ead793a7c6aa2d2447389bbfd1f

Hashes for cocoindex-0.1.81-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 159e9db25b4b25deb2d843af953abe6d40e1a9e4fa5bec42ed89d3fd57e963ef
MD5 ce8441a6cd5f6d5d90c4d383ed6084b4
BLAKE2b-256 f3447d150fd036c8e179e1d04080c75c30abba3bf9e6cafa7572edd330cab3e9

Hashes for cocoindex-0.1.81-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 a1dbc6a2877f7cd8782ef59841216114f91aad099fdf685c5c1f94a978c570a7
MD5 48728a22ca570408b0cc54b5f002fc36
BLAKE2b-256 8f3e4986bf47bb5b1a460271b487fd14bdaf8e18458df5f220c403e8296eb1e9

Hashes for cocoindex-0.1.81-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 eeb012f77bfa877a3ad02fd8b463863509377b3b5355a65929efbc60084053e0
MD5 dfcbdb4809e089b14b045b17471a2ba9
BLAKE2b-256 7989a1cad4caa1c23c9d628987c877dbfafe2c8d9a6818906176a92dcdc32a7a

Hashes for cocoindex-0.1.81-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0689131993d8dfcb13ad6351520cea6c5a11f8b08d011230b365f9cf45c6b442
MD5 2c8674bbc609a22b21a17fcef3441bfa
BLAKE2b-256 dd36141f59fb55aaae51ac034efd634c6a4baa58e06938ec29c95f0d1f06e5ba

Hashes for cocoindex-0.1.81-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 868ac8bfdcd59249decb177e2db727534f53ac1a87a684e53c6da3860fb1ce75
MD5 18fa8319b3aa78a13db17d900208f302
BLAKE2b-256 68d3401981617e89e451fb53b28a8a420a3a74614a07f758996ebf0ecc5704c1

Hashes for cocoindex-0.1.81-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 a381e2ab398bc2c2e1e5575053b0173c1649c5758eaf02479716662546db80a0
MD5 6b5f259f8bc3ffa4a7d4b473b3e2f501
BLAKE2b-256 4562ad2a5cb3abdadc9a084ef454bb38852d2b8ae1d73ed073e65845c41800fd
