With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI

An ultra-performant data transformation framework for AI, with a core engine written in Rust. It supports incremental processing and data lineage out of the box, offers exceptional developer velocity, and is production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex makes it super easy to transform data with AI workloads and to keep source data and targets in sync effortlessly.


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform
# (parentheses let the chained calls span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field solely from its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, or deleting it. They only need to define the transformation (formula) for a set of source data.
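
To make the "new field from input fields, no mutation" idea concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API; the `derive` helper and the field names are hypothetical and exist only for illustration.

```python
def derive(row, field, fn, *inputs):
    """Return a NEW row in which `field` is computed only from `inputs`.

    The input row is never modified, so every intermediate state
    stays observable. (Hypothetical helper, not part of CocoIndex.)
    """
    return {**row, field: fn(*(row[name] for name in inputs))}


source = {"content": "Hello CocoIndex"}

# Each step adds one field derived purely from existing fields.
step1 = derive(source, "lowered", str.lower, "content")
step2 = derive(step1, "word_count", lambda s: len(s.split()), "lowered")

print(source)  # still only the original field
print(step2)   # original field plus the two derived fields
```

Because no step overwrites its input, the state before and after each transformation can be inspected, which is what makes lineage easy to track.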

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
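
The sketch below illustrates, in plain Python, why a shared interface reduces a component switch to one line. All class and function names here (`Target`, `InMemoryTarget`, `CsvLinesTarget`, `export`) are hypothetical and are not CocoIndex built-ins.

```python
from typing import Protocol


class Target(Protocol):
    """Anything with a `write(rows)` method can act as a target."""

    def write(self, rows: list[dict]) -> None: ...


class InMemoryTarget:
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class CsvLinesTarget:
    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, rows: list[dict]) -> None:
        for row in rows:
            self.lines.append(",".join(str(value) for value in row.values()))


def export(rows: list[dict], target: Target) -> None:
    # The pipeline only depends on the shared interface.
    target.write(rows)


rows = [{"id": 1, "text": "hello"}]

memory = InMemoryTarget()
export(rows, memory)

csv_lines = CsvLinesTarget()  # switching targets changes only this line
export(rows, csv_lines)
```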

Data Freshness

CocoIndex keeps source data and targets in sync effortlessly.

Incremental Processing

It supports incremental indexing out of the box:

  • minimal recomputation when the source or the logic changes;
  • (re-)processing only the necessary portions, reusing the cache when possible.
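
The two points above can be sketched with a small content-addressed cache in plain Python. This is only an illustration of the general idea, assuming a fingerprint built from the logic version and the source content; it is not how CocoIndex is implemented, and `IncrementalIndex` is a hypothetical name.

```python
import hashlib


class IncrementalIndex:
    """Recompute a row only when its content or the logic version changes."""

    def __init__(self, transform, logic_version):
        self.transform = transform
        self.logic_version = logic_version
        self.cache = {}   # fingerprint -> transformed output
        self.index = {}   # source key -> transformed output
        self.calls = 0    # number of times `transform` actually ran

    def _fingerprint(self, content):
        digest = hashlib.sha256()
        digest.update(self.logic_version.encode())
        digest.update(b"\0")
        digest.update(content.encode())
        return digest.hexdigest()

    def update(self, source):
        new_index = {}
        for key, content in source.items():
            fingerprint = self._fingerprint(content)
            if fingerprint not in self.cache:
                self.calls += 1
                self.cache[fingerprint] = self.transform(content)
            new_index[key] = self.cache[fingerprint]
        # Keys that disappeared from the source are dropped from the index.
        self.index = new_index


idx = IncrementalIndex(str.upper, logic_version="v1")
idx.update({"a.md": "hello", "b.md": "world"})
calls_after_first = idx.calls

# Only b.md changed, so only one more transform call is needed.
idx.update({"a.md": "hello", "b.md": "world!"})
calls_after_second = idx.calls
```

Changing `logic_version` changes every fingerprint, which forces reprocessing after a logic change while unchanged rows under the same logic are served from the cache.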

Quick Start:

If you're new to CocoIndex, we recommend checking out the documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

[Figure: Data Flow]
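
The flow above builds its vector index with the `COSINE_SIMILARITY` metric. As a rough illustration of what that metric measures, here is a minimal plain-Python sketch; it is not the code the database runs, and `cosine_similarity` and `top_k` are hypothetical helpers written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Rank (text, embedding) rows by similarity to the query embedding."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in rows]
    return sorted(scored, reverse=True)[:k]


# Toy 2-dimensional "embeddings" for illustration only.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
best = top_k([1.0, 0.0], rows, k=2)
```

A real vector index answers the same kind of query without scanning every row; the brute-force loop here only shows the scoring.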

🚀 Examples and demo

| Example | Description |
|---|---|
| Text Embedding | Index text documents with embeddings for semantic search |
| Code Embedding | Index code embeddings for semantic search |
| PDF Embedding | Parse PDF and index text embeddings for semantic search |
| Manuals LLM Extraction | Extract structured information from a manual using LLM |
| Amazon S3 Embedding | Index text documents from Amazon S3 |
| Azure Blob Storage Embedding | Index text documents from Azure Blob Storage |
| Google Drive Text Embedding | Index text documents from Google Drive |
| Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph |
| Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search |
| FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup |
| Product Recommendation | Build real-time product recommendations with LLM and graph database |
| Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend |
| Face Recognition | Recognize faces in images and build embedding index |
| Paper Metadata | Index papers in PDF files, and build metadata tables for each paper |

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
