With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps the derived index up to date as the source changes, with minimal computation and changes.

CocoIndex

Data transformation for AI

An ultra-performant data transformation framework for AI, with a core engine written in Rust. It supports incremental processing and data lineage out of the box, offers exceptional developer velocity, and is production-ready from day 0.

⭐ Drop a star to help us grow!



CocoIndex makes it effortless to transform data with AI and to keep source data and target in sync. Whether you're building a vector index for RAG, creating knowledge graphs, or performing any custom data transformation, CocoIndex goes beyond SQL.


CocoIndex Features


Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python.

# import
data['content'] = flow_builder.add_source(...)

# transform
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, with no hidden state and no value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, or deleting it. They only need to define the transformation (formula) for a set of source data.
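
To make the dataflow idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API; the `derive` helper and the field names are hypothetical and only illustrate "each step adds a new field computed from existing fields, and nothing is mutated".

```python
from types import MappingProxyType
from typing import Callable, Mapping

Row = Mapping[str, object]


def derive(row: Row, field: str, fn: Callable[..., object], *inputs: str) -> Row:
    """Return a new read-only row with `field` computed only from `inputs`.

    Hypothetical helper for illustration; not part of CocoIndex.
    """
    if field in row:
        raise ValueError(f"field {field!r} already exists; fields are never overwritten")
    value = fn(*(row[name] for name in inputs))
    return MappingProxyType({**row, field: value})


# Source data: one row with a single input field.
source = MappingProxyType({"content": "Hello CocoIndex"})

# Each step adds a field derived from existing fields; earlier rows stay unchanged.
step1 = derive(source, "lower", str.lower, "content")
step2 = derive(step1, "words", str.split, "lower")

print(dict(step2))
```

Because every intermediate row is kept intact, the value of any field can be traced back to the inputs it was derived from, which is the property the lineage claim relies on.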

Plug-and-Play Building Blocks

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change, as easy as assembling building blocks.
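
The "one-line switch" can be sketched in plain Python with a shared interface. Everything below (`Target`, `InMemoryTarget`, `PrintTarget`, `export`) is a hypothetical illustration of the design idea, not CocoIndex's actual classes.

```python
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every target implements."""

    def write(self, rows: list[dict]) -> str: ...


class InMemoryTarget:
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def write(self, rows: list[dict]) -> str:
        self.rows.extend(rows)
        return f"stored {len(rows)} rows in memory"


class PrintTarget:
    def write(self, rows: list[dict]) -> str:
        for row in rows:
            print(row)
        return f"printed {len(rows)} rows"


def export(rows: list[dict], target: Target) -> str:
    # The pipeline depends only on the shared interface, not on a concrete class.
    return target.write(rows)


rows = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]

target = InMemoryTarget()  # swap to PrintTarget() to change the destination
print(export(rows, target))
```

Only the line that constructs `target` changes when the destination changes; the rest of the pipeline is untouched.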

Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It supports incremental indexing out of the box:

  • Minimal recomputation when the source or the logic changes.
  • (Re-)processing of only the necessary portions, reusing the cache when possible.
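
One common way to get this behavior is a cache keyed by a fingerprint of the input plus a version tag for the logic. The sketch below shows that general technique in plain Python; it is an assumption for illustration and not a description of how CocoIndex implements it internally. The names `fingerprint`, `incremental_update`, and `logic_version` are hypothetical.

```python
import hashlib
from typing import Callable


def fingerprint(text: str, logic_version: str) -> str:
    """Hash of the input content plus a version tag for the transformation logic."""
    return hashlib.sha256(f"{logic_version}\0{text}".encode("utf-8")).hexdigest()


def incremental_update(
    sources: dict[str, str],
    cache: dict[str, tuple[str, str]],
    transform: Callable[[str], str],
    logic_version: str,
) -> list[str]:
    """Refresh `cache` in place and return the keys that were recomputed."""
    recomputed = []
    for key, text in sources.items():
        fp = fingerprint(text, logic_version)
        cached = cache.get(key)
        if cached is None or cached[0] != fp:
            # Input or logic changed: recompute and store the new result.
            cache[key] = (fp, transform(text))
            recomputed.append(key)
    # Drop outputs whose source no longer exists.
    for key in list(cache.keys() - sources.keys()):
        del cache[key]
    return recomputed


cache: dict[str, tuple[str, str]] = {}
docs = {"a.md": "alpha", "b.md": "beta"}

print(incremental_update(docs, cache, str.upper, "v1"))  # first run: every file
docs["b.md"] = "beta, edited"
print(incremental_update(docs, cache, str.upper, "v1"))  # only the changed file
print(incremental_update(docs, cache, str.upper, "v2"))  # logic change: every file
```

Unchanged inputs under unchanged logic hit the cache, so the amount of work is proportional to what changed rather than to the size of the source.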

Quick Start:

If you're new to CocoIndex, we recommend checking out the Documentation and the Quick Start Guide.

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it. CocoIndex uses it for incremental processing.
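
After step 1, a quick way to confirm that the package is visible to the current Python environment is to query its installed version with the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and it only checks the Python package, not the Postgres setup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("cocoindex")
if found:
    print(f"cocoindex {found} is installed")
else:
    print("cocoindex is not installed; run: pip install -U cocoindex")
```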

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
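
To give an intuition for the `chunk_size` and `chunk_overlap` parameters used in the flow above, here is a deliberately simple fixed-window splitter in plain Python. It is not how `SplitRecursively` works internally; the `chunk_with_overlap` function is hypothetical, and measuring sizes in characters is an assumption made for the sketch.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Split `text` into (start_offset, chunk) pairs of at most `chunk_size` characters.

    Consecutive chunks share `chunk_overlap` characters.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


for start, chunk in chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(start, chunk)
```

The overlap means that text near a chunk boundary appears in two chunks, so a sentence that straddles the boundary can still be matched as a whole by one of the embeddings.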

🚀 Examples and demo

Example | Description
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build an embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper
Multi Format Indexing | Build a visual document index from PDFs and images with ColPali for semantic search
Custom Output Files | Convert markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets
Patient intake form extraction | Use LLM to extract structured data from patient intake forms with different formats

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.
