With CocoIndex, users declare the transformation; CocoIndex creates and maintains the index and keeps the derived index up to date as the source is updated, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI

Ultra-performant data transformation framework for AI, with a core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


[Figure: CocoIndex Transformation]


CocoIndex makes it super easy to transform data with AI workloads and keep source data and targets in sync effortlessly.


[Figure: CocoIndex Features]


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (data['content']
    .transform(...)
    .transform(...))

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting records. They only need to define the transformation (the formula) for a set of source data.
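
To make the model concrete, here is a minimal, framework-free sketch in plain Python. It is not CocoIndex's API: `derive` and `trace` are hypothetical helpers written only for illustration. Each step returns a new read-only row with one extra field computed from named input fields, refuses to overwrite an existing field, and records which inputs it used, so any value can be traced back to its source fields.

```python
from types import MappingProxyType
from typing import Any, Callable, List, Mapping, Tuple

Row = Mapping[str, Any]
Lineage = Mapping[str, Tuple[str, ...]]


def derive(row: Row, lineage: Lineage, out: str,
           fn: Callable[..., Any], *inputs: str) -> Tuple[Row, Lineage]:
    """Hypothetical helper: return a NEW row with `out` = fn(*inputs).

    The input row is never modified; the lineage map records exactly
    which input fields the new field was computed from.
    """
    if out in row:
        raise ValueError(f"field {out!r} already exists; fields are never overwritten")
    value = fn(*(row[name] for name in inputs))
    new_row = MappingProxyType({**row, out: value})
    new_lineage = MappingProxyType({**lineage, out: tuple(inputs)})
    return new_row, new_lineage


def trace(lineage: Lineage, field: str) -> List[str]:
    """Walk the lineage map back to the source fields `field` depends on."""
    parents = lineage.get(field)
    if not parents:  # not derived, so it is a source field
        return [field]
    sources: List[str] = []
    for parent in parents:
        for src in trace(lineage, parent):
            if src not in sources:
                sources.append(src)
    return sources


source: Row = MappingProxyType({"filename": "a.md", "content": "Hello CocoIndex. Data flows forward."})
row1, lin1 = derive(source, {}, "sentences", lambda text: text.split(". "), "content")
row2, lin2 = derive(row1, lin1, "count", len, "sentences")

print(row2["count"])         # 2
print(trace(lin2, "count"))  # ['content']
print("count" in row1)       # False: earlier rows are never mutated
```

Because every field is a pure function of its declared inputs, an engine following this model only needs to recompute the fields whose inputs actually changed.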

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
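
The idea behind a one-line switch can be sketched with a structural interface in plain Python. This illustrates the design only and is not CocoIndex's connector API: `Target`, `InMemoryTarget`, `ChangeLogTarget`, and `export` are hypothetical names. Because every target exposes the same `upsert`/`delete` calls keyed by a primary key, the flow-side code stays the same and only the line that constructs the target changes.

```python
from typing import Any, Dict, List, Protocol, Tuple

Key = Tuple[Any, ...]


class Target(Protocol):
    """Hypothetical target interface: every backend implements the same calls."""

    def upsert(self, key: Key, row: Dict[str, Any]) -> None: ...

    def delete(self, key: Key) -> None: ...


class InMemoryTarget:
    """Keeps the latest row per primary key in a dict."""

    def __init__(self) -> None:
        self.rows: Dict[Key, Dict[str, Any]] = {}

    def upsert(self, key: Key, row: Dict[str, Any]) -> None:
        self.rows[key] = dict(row)

    def delete(self, key: Key) -> None:
        self.rows.pop(key, None)


class ChangeLogTarget:
    """Records each operation instead of storing rows (e.g. for auditing)."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Key]] = []

    def upsert(self, key: Key, row: Dict[str, Any]) -> None:
        self.log.append(("upsert", key))

    def delete(self, key: Key) -> None:
        self.log.append(("delete", key))


def export(rows: List[Dict[str, Any]], target: Target, primary_key: List[str]) -> None:
    """Flow-side code: identical no matter which target is plugged in."""
    for row in rows:
        target.upsert(tuple(row[f] for f in primary_key), row)


rows = [{"filename": "a.md", "location": 0, "text": "hello"}]
target = InMemoryTarget()  # swap for ChangeLogTarget(): a one-line change
export(rows, target, ["filename", "location"])
print(target.rows)  # {('a.md', 0): {'filename': 'a.md', 'location': 0, 'text': 'hello'}}
```

Using `typing.Protocol` keeps the interface structural: a new backend only has to provide the two methods, with no shared base class required.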

Data Freshness

CocoIndex keeps source data and targets in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation when the source data or the transformation logic changes;
  • (re-)processing only the necessary portions and reusing cached results when possible.
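
The two bullets above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not CocoIndex's internal design: `IncrementalIndex` is a hypothetical class that fingerprints each source item together with a version string standing in for the transformation logic, recomputes only items whose fingerprint changed, reuses cached output otherwise, and deletes output whose source item disappeared.

```python
import hashlib
from typing import Callable, Dict


class IncrementalIndex:
    """Hypothetical sketch of incremental processing (not CocoIndex internals).

    Each fingerprint covers the item's content AND the logic version, so a
    change to either triggers recomputation; everything else is reused.
    """

    def __init__(self, transform: Callable[[str], str], logic_version: str) -> None:
        self.transform = transform
        self.logic_version = logic_version
        self.fingerprints: Dict[str, str] = {}
        self.target: Dict[str, str] = {}
        self.recomputed = 0  # number of times `transform` actually ran

    def _fingerprint(self, content: str) -> str:
        data = f"{self.logic_version}\0{content}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def sync(self, source: Dict[str, str]) -> None:
        for key, content in source.items():
            fp = self._fingerprint(content)
            if self.fingerprints.get(key) == fp:
                continue  # unchanged source and logic: reuse cached output
            self.target[key] = self.transform(content)
            self.fingerprints[key] = fp
            self.recomputed += 1
        for key in set(self.target) - set(source):
            # source item was deleted: remove its derived output as well
            del self.target[key]
            del self.fingerprints[key]


index = IncrementalIndex(str.upper, logic_version="v1")
index.sync({"a.md": "hello", "b.md": "world"})   # both items processed
index.sync({"a.md": "hello", "b.md": "world!"})  # only b.md recomputed
index.sync({"a.md": "hello"})                    # b.md output deleted
print(index.target, index.recomputed)            # {'a.md': 'HELLO'} 3
```

Bumping `logic_version` (for example after editing the transform) changes every fingerprint, so the next `sync` reprocesses all remaining items; a production engine would also persist the fingerprints between runs rather than keep them in memory.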

Quick Start

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
     pip install -U cocoindex
  2. Install Postgres if you don't have it already. CocoIndex uses it for incremental processing.
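
Before running a flow, it can help to sanity-check the Postgres connection string. The sketch below assumes CocoIndex reads the database URL from the `COCOINDEX_DATABASE_URL` environment variable, as its documentation describes; verify the variable name against the docs for your version. `check_database_url` is a hypothetical helper that uses only the standard library and does not open a connection, and the example URL and credentials are placeholders.

```python
import os
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def check_database_url(url: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical pre-flight check: parse and validate a Postgres URL.

    Falls back to the COCOINDEX_DATABASE_URL environment variable (assumed
    name) when no URL is passed. It never connects to the database.
    """
    if url is None:
        url = os.environ.get("COCOINDEX_DATABASE_URL")
    if not url:
        raise RuntimeError("COCOINDEX_DATABASE_URL is not set")
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// URL, got scheme {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # Postgres default port when none is given
        "user": parts.username,
        "database": parts.path.lstrip("/"),
    }


# Placeholder credentials for a local development database.
print(check_database_url("postgres://cocoindex:cocoindex@localhost/cocoindex"))
# {'host': 'localhost', 'port': 5432, 'user': 'cocoindex', 'database': 'cocoindex'}
```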

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
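
In the flow above, `SplitRecursively` is configured with `chunk_size=2000` and `chunk_overlap=500`. The sketch below illustrates what chunk overlap means using a deliberately simplified fixed-size character window; it is not `SplitRecursively`'s algorithm, which (as its name and the `language="markdown"` argument suggest) splits along the document's structure rather than at fixed offsets. `split_with_overlap` is a hypothetical helper; the start offset it returns plays a role loosely similar to the `location` field used in the primary key.

```python
from typing import List, Tuple


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[Tuple[int, str]]:
    """Hypothetical fixed-window splitter returning (start_offset, chunk_text).

    Consecutive chunks share `chunk_overlap` characters, so text near a chunk
    boundary also appears, with surrounding context, in the neighboring chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    chunks: List[Tuple[int, str]] = []
    start = 0
    while True:
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


doc = "".join(chr(ord("a") + i % 26) for i in range(5000))
chunks = split_with_overlap(doc, chunk_size=2000, chunk_overlap=500)
print([start for start, _ in chunks])  # [0, 1500, 3000]
```

With these settings each new chunk starts 1,500 characters after the previous one, so a 5,000-character document yields three 2,000-character chunks that each share 500 characters with their neighbor.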

🚀 Examples and demo

| Example | Description |
| --- | --- |
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDFs and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using an LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Azure Blob Storage Embedding | Index text documents from Azure Blob Storage |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with an LLM and a graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |
| Face Recognition | Recognize faces in images and build an embedding index |
| Paper Metadata | Index papers in PDF files and build metadata tables for each paper |
| Multi Format Indexing | Build a visual document index from PDFs and images with ColPali for semantic search |
| Custom Output Files | Convert Markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets |
| Patient intake form extraction | Use an LLM to extract structured data from patient intake forms with different formats |

More examples are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.80.tar.gz (15.2 MB), Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

| File | Size | Interpreter | Platform |
| --- | --- | --- | --- |
| cocoindex-0.1.80-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl | 15.9 MB | PyPy | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.80-cp313-cp313t-manylinux_2_28_aarch64.whl | 15.9 MB | CPython 3.13t | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.80-cp313-cp313-win_amd64.whl | 15.8 MB | CPython 3.13 | Windows x86-64 |
| cocoindex-0.1.80-cp313-cp313-manylinux_2_28_x86_64.whl | 16.6 MB | CPython 3.13 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.80-cp313-cp313-manylinux_2_28_aarch64.whl | 15.9 MB | CPython 3.13 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.80-cp313-cp313-macosx_11_0_arm64.whl | 15.7 MB | CPython 3.13 | macOS 11.0+ ARM64 |
| cocoindex-0.1.80-cp313-cp313-macosx_10_12_x86_64.whl | 16.3 MB | CPython 3.13 | macOS 10.12+ x86-64 |
| cocoindex-0.1.80-cp312-cp312-win_amd64.whl | 15.8 MB | CPython 3.12 | Windows x86-64 |
| cocoindex-0.1.80-cp312-cp312-manylinux_2_28_x86_64.whl | 16.6 MB | CPython 3.12 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.80-cp312-cp312-manylinux_2_28_aarch64.whl | 15.9 MB | CPython 3.12 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.80-cp312-cp312-macosx_11_0_arm64.whl | 15.7 MB | CPython 3.12 | macOS 11.0+ ARM64 |
| cocoindex-0.1.80-cp312-cp312-macosx_10_12_x86_64.whl | 16.3 MB | CPython 3.12 | macOS 10.12+ x86-64 |
| cocoindex-0.1.80-cp311-cp311-win_amd64.whl | 15.8 MB | CPython 3.11 | Windows x86-64 |
| cocoindex-0.1.80-cp311-cp311-manylinux_2_28_x86_64.whl | 16.6 MB | CPython 3.11 | manylinux: glibc 2.28+ x86-64 |
| cocoindex-0.1.80-cp311-cp311-manylinux_2_28_aarch64.whl | 15.9 MB | CPython 3.11 | manylinux: glibc 2.28+ ARM64 |
| cocoindex-0.1.80-cp311-cp311-macosx_11_0_arm64.whl | 15.7 MB | CPython 3.11 | macOS 11.0+ ARM64 |
| cocoindex-0.1.80-cp311-cp311-macosx_10_12_x86_64.whl | 16.3 MB | CPython 3.11 | macOS 10.12+ x86-64 |

File details

Details for the file cocoindex-0.1.80.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.80.tar.gz
  • Upload date:
  • Size: 15.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.3

File hashes

Hashes for cocoindex-0.1.80.tar.gz
Algorithm Hash digest
SHA256 036d5fe110b4988e9cce33ecdd7cfc7acb7a9adcb9837c69f896fb3bd8e47c00
MD5 e1d83aa3f7670b94b5166640624cc3ff
BLAKE2b-256 91897b5aa8f9ef4bc41a005aeadd3f6bc0b6b0d18016b26bce7dd9c232ccdc13

File details

Details for the file cocoindex-0.1.80-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.80-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 be484598f2417fd731a020cd326d14f49f11f45ac2bcecd016e37ced1595d46e
MD5 744ad30d1a1ea67d62caae6a4635b722
BLAKE2b-256 34b4991d9f2de121d90fbf6eaf028c98a46226c86b828ce68ae9e88e92958c99

File details

Details for the file cocoindex-0.1.80-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 e682688851660cbe391bffdbd09dd61bb3184b8b494cbdeb133a5386fa5dad54
MD5 e2415f327987d7a40ca51a8cc27f18e9
BLAKE2b-256 6d908ee93800775649e10ad2da4136f60d4964cebfa5207d1d85182428f10e81

File details

Details for the file cocoindex-0.1.80-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 0f2b15778465caec3066abbaae29d35313d42f5455287b0075d70ead9d40b956
MD5 fb5fcd307669b99a0619e5f9038b51fb
BLAKE2b-256 204827b246386d465fddd6236a05ecf303c31a9ef7639bdcbf49bc44b3aa5418

File details

Details for the file cocoindex-0.1.80-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 3c205af6155c11352007468990e17f523446ecda076a0177d73a61c26757d264
MD5 a3f6bd138c4bdc2f781d6e221735e55b
BLAKE2b-256 71a57f6e6e513c03dda3517de4bd109b5ece526508ec9f69fd13339b5bb7a43f

File details

Details for the file cocoindex-0.1.80-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c25156c7669fa47262a8e990c6797009fc68d212b61fbeabea3dd1ab54aeb424
MD5 dabf89e119b900c1a94a6da9b41d4c29
BLAKE2b-256 b92c7ca40041850933f652d6f0805da7819ecdcd04282779ec51b5e91393fc48

File details

Details for the file cocoindex-0.1.80-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 38539ce4fe809177127e717892909245b0bf6c8780bd9733ae763e2ba22721da
MD5 78bc73cde0feca9670047d015d9ad540
BLAKE2b-256 255b587dfd6b5f9ba39264051d7f5bed4516cd4e97500d532209b71a6bf9b4c5

File details

Details for the file cocoindex-0.1.80-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 703113e3b9f87e1d8c0a0eb1fcaf194490aa5c6878d8009fc2a2b1d208fb15ec
MD5 23412aa1cb8cc0e6f7eadf43da500903
BLAKE2b-256 38345a191e5e3bb19cdfb9aecbd3166232e1b239188218133080b03a70ae71ae

File details

Details for the file cocoindex-0.1.80-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 1d1bf9243d92d4d5d9179dfd99f128b8235d2ed8fa977a930c1125cb846fe08e
MD5 aea094d476c08a402aeafb22d0128b58
BLAKE2b-256 da7b75d8e6fcb0d4a019d1229f8de99ac588a81b3be636bfb35f1024a24e69fe

File details

Details for the file cocoindex-0.1.80-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 644b45b1734c4b83cc3ea257e261fde6ad19f9b1d404d039d24557cf44c00d37
MD5 252abac94d55cc5745376208b1d87178
BLAKE2b-256 57b59c99c9aa88d1bb82a0f7461626e06da3a87893cb45c801fb0035a70f7e6b

File details

Details for the file cocoindex-0.1.80-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d1c9d307d27142bfa49e31746dc84ba35eeb5f53e5ddb80755c7184eed094ac7
MD5 0a49e6d950ac2c84d01439a5453e29cb
BLAKE2b-256 097ea22f7e86daf7761edfd56bb773424a43a6455ecb9c28a264ffe13bd834ef

File details

Details for the file cocoindex-0.1.80-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 cf8fa887574c00bbd5e5429c7ca029e843c106bbfba384a9d1cb3e4313f8c994
MD5 f34fd03a92d8ba6d68866bee803d2d6e
BLAKE2b-256 0faf6c0d46755fa98721a5a1ec181a87fdfd2a3e99146c2c5e64d3fc495718bf

File details

Details for the file cocoindex-0.1.80-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 7bf970ffc82701a3cc8227e078e3d9f9af45206cd3b2d03744302531d867e0d5
MD5 f1ad2b3833a828d41b40902d90c3c342
BLAKE2b-256 e4b6823b65cff164f2de06785185b65e0a72916f1794debb1a712182c37c767e

File details

Details for the file cocoindex-0.1.80-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 ed648599c4cd5f56671b5e2463a5b7b9389ba9b6b402b9d8ea5e6912bfcef74a
MD5 68502b928c58052a73a7a1fc4bd4328e
BLAKE2b-256 bfcf85076b37a043ed61172bd0eca8c722ac352a40c222ea90fa89ef012df066

File details

Details for the file cocoindex-0.1.80-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 470243bb9e335d2e9257c856b8b73adb6e1833841fdee6629c2b14e887eb1c1f
MD5 1043a214af4ac14b6f001126c0536197
BLAKE2b-256 6f31a7819072cba010c8515a787cdf32161a10c2be2836307a19c9b500b613e9

File details

Details for the file cocoindex-0.1.80-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 a5db5cbf50867e024da080bf395724079f1de92464232273b5a61e069de4053b
MD5 9d2ba3b610a8c2b24cdbde8af0a38a02
BLAKE2b-256 29953bb3da065cb657a8837500977073b9a8aa7cbe25b6766319faa64c07ce2b

File details

Details for the file cocoindex-0.1.80-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2e79841809ee450a360536e5b13187701228f954b2dfedde1cac707559a29534
MD5 ab682b514e2a6485e154060a7a262883
BLAKE2b-256 20ae7040a3f376e788e7713088b3247ff2bb3d12e99ad4ba7c2cd0015cc0afdd

File details

Details for the file cocoindex-0.1.80-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.80-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ba1b874fda7edd42e09f9bfa53d0013ab4abb6fb542445ecec3ed574eb0ea857
MD5 5e144b85ad0e530eb1c21c0b01228adc
BLAKE2b-256 5804756715a56590a89bfafb402fe778ca653f980653b888681c9934742ce360
