With CocoIndex, users declare the transformation, and CocoIndex creates and maintains the index, keeping the derived index up to date as the source changes, with minimal recomputation and changes.

CocoIndex

Data transformation for AI


An ultra-performant data transformation framework for AI, with a core engine written in Rust. It supports incremental processing and data lineage out of the box, offers exceptional developer velocity, and is production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it super easy to transform data for AI workloads and to keep source data and targets in sync effortlessly.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python:

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses let the method chain span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on its input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting records. They only need to define the transformation (the formula) for a set of source data.
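To make the dataflow idea concrete, here is a plain-Python illustration. It is not CocoIndex's API: the `Row` type and the `derive` helper are hypothetical. Each step builds a new field only from named input fields and returns a new immutable row, so the input is never mutated and every intermediate stage stays inspectable.

```python
from types import MappingProxyType
from typing import Callable, Mapping

# Hypothetical row type: an immutable mapping from field name to value.
Row = Mapping[str, object]

def derive(row: Row, field: str, fn: Callable[..., object], *inputs: str) -> Row:
    """Return a NEW row with `field` computed solely from the named input fields.

    The input row is left untouched, so every intermediate row stays observable.
    """
    if field in row:
        raise ValueError(f"field {field!r} already exists; dataflow fields are write-once")
    value = fn(*(row[name] for name in inputs))
    return MappingProxyType({**row, field: value})

# A tiny two-step flow: content -> words -> word_count.
source = MappingProxyType({"filename": "a.md", "content": "hello dataflow world"})
step1 = derive(source, "words", str.split, "content")
step2 = derive(step1, "word_count", len, "words")

# Every stage is still available for inspection (a simple form of lineage).
for stage in (source, step1, step2):
    print(sorted(stage))
```

Because `derive` refuses to overwrite a field and never modifies its input, "updating" data means declaring another derived field rather than mutating an existing one.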

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
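A minimal sketch of why a standardized interface makes swapping components a one-line change. The `Target` protocol, the two target classes, and `run_flow` below are hypothetical illustrations, not CocoIndex classes: the flow depends only on the shared `export` method, so choosing a different target touches a single line.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol

Record = dict[str, object]

class Target(Protocol):
    """Hypothetical shared interface that every export target implements."""
    def export(self, name: str, records: Iterable[Record]) -> int: ...

@dataclass
class InMemoryTarget:
    # Stores exported records per table name; stands in for a real database.
    tables: dict[str, list[Record]] = field(default_factory=dict)

    def export(self, name: str, records: Iterable[Record]) -> int:
        rows = list(records)
        self.tables.setdefault(name, []).extend(rows)
        return len(rows)

@dataclass
class PrintTarget:
    # Writes exported records to stdout; useful for debugging a flow.
    prefix: str = "export"

    def export(self, name: str, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            print(f"{self.prefix} {name}: {record}")
            count += 1
        return count

def run_flow(target: Target) -> int:
    # The flow only knows about the Target interface, not the concrete class.
    records = [{"filename": "a.md", "text": "hello"}, {"filename": "b.md", "text": "world"}]
    return target.export("doc_embeddings", records)

# Switching components is a one-line change: pick a different target here.
target = InMemoryTarget()
exported = run_flow(target)
```

Using `typing.Protocol` means the targets need no common base class; any object with a matching `export` method fits.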

Data Freshness

CocoIndex keeps source data and targets in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • Minimal recomputation when the source data or the transformation logic changes.
  • (Re-)processing of only the necessary portions, reusing cached results when possible.
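The bullets above can be illustrated with a toy fingerprint-based index. This is a hypothetical sketch, not CocoIndex's mechanism (which tracks state in Postgres and is considerably more involved): the `IncrementalIndex` class and its `logic_version` parameter are assumptions made for illustration. Each source is hashed together with a version string for the processing logic; unchanged sources reuse their cached output, changed ones are reprocessed, and removed ones have their derived data deleted.

```python
import hashlib
from typing import Callable

def fingerprint(logic_version: str, content: str) -> str:
    # Hash the logic version together with the content, so a logic change
    # invalidates every cached output while a content change invalidates one.
    return hashlib.sha256(f"{logic_version}\0{content}".encode("utf-8")).hexdigest()

class IncrementalIndex:
    """Hypothetical sketch: re-run `process` only where the fingerprint changed."""

    def __init__(self, process: Callable[[str], object], logic_version: str = "v1") -> None:
        self.process = process
        self.logic_version = logic_version
        self.fingerprints: dict[str, str] = {}  # source key -> fingerprint
        self.outputs: dict[str, object] = {}    # source key -> derived output

    def update(self, sources: dict[str, str]) -> dict[str, list[str]]:
        stats: dict[str, list[str]] = {"processed": [], "reused": [], "deleted": []}
        for key, content in sources.items():
            fp = fingerprint(self.logic_version, content)
            if self.fingerprints.get(key) == fp:
                stats["reused"].append(key)  # unchanged: keep the cached output
                continue
            self.outputs[key] = self.process(content)
            self.fingerprints[key] = fp
            stats["processed"].append(key)
        for key in list(self.fingerprints):
            if key not in sources:  # source removed: drop its derived data too
                del self.fingerprints[key]
                del self.outputs[key]
                stats["deleted"].append(key)
        return stats

index = IncrementalIndex(process=str.upper)
index.update({"a.md": "alpha", "b.md": "beta"})     # processes both files
index.update({"a.md": "alpha", "b.md": "beta v2"})  # reprocesses only b.md
```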

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library:
pip install -U cocoindex
  2. Install Postgres if you don't have it already. CocoIndex uses it for incremental processing.
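CocoIndex needs to be able to reach that Postgres instance. A minimal pre-flight check, assuming (per the CocoIndex docs at the time of writing; verify for your version) that the connection string is read from the `COCOINDEX_DATABASE_URL` environment variable. The `database_url` helper and the default URL are hypothetical examples.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumption: CocoIndex reads the Postgres connection string from this variable.
ENV_VAR = "COCOINDEX_DATABASE_URL"
# Hypothetical local default, e.g. for a Postgres started with Docker.
DEFAULT_URL = "postgres://cocoindex:cocoindex@localhost/cocoindex"

def database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured Postgres URL, failing early if it looks wrong."""
    env = os.environ if env is None else env
    url = env.get(ENV_VAR, DEFAULT_URL)
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"expected a postgres:// URL, got {url!r}")
    if not parsed.hostname:
        raise ValueError("database URL is missing a host")
    return url
```

Calling `database_url()` before building a flow surfaces a misconfigured URL with a clear message instead of a later connection error.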

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like this:

import cocoindex

@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
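For intuition about the `chunk_size=2000` and `chunk_overlap=500` parameters in the flow above, here is a naive fixed-window splitter. It is NOT how `SplitRecursively` works (that function splits recursively along language-aware boundaries, as its name and `language` parameter suggest); it only shows what size and overlap mean, with both measured in characters for simplicity (the real unit may differ; check the `SplitRecursively` docs). The `split_with_overlap` function is hypothetical, and its start offset loosely mirrors the role of the `location` field used in the primary key above.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Naive fixed-window splitter returning (start_offset, chunk_text) pairs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks

# Each chunk repeats the last `chunk_overlap` characters of the previous one,
# so text near a boundary appears in full in at least one chunk.
chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```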

🚀 Examples and demos

  • Text Embedding: Index text documents with embeddings for semantic search
  • Code Embedding: Index code embeddings for semantic search
  • PDF Embedding: Parse PDFs and index text embeddings for semantic search
  • Manuals LLM Extraction: Extract structured information from a manual using an LLM
  • Amazon S3 Embedding: Index text documents from Amazon S3
  • Azure Blob Storage Embedding: Index text documents from Azure Blob Storage
  • Google Drive Text Embedding: Index text documents from Google Drive
  • Docs to Knowledge Graph: Extract relationships from Markdown documents and build a knowledge graph
  • Embeddings to Qdrant: Index documents in a Qdrant collection for semantic search
  • FastAPI Server with Docker: Run the semantic search server in a Dockerized FastAPI setup
  • Product Recommendation: Build real-time product recommendations with an LLM and a graph database
  • Image Search with Vision API: Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
  • Face Recognition: Recognize faces in images and build an embedding index
  • Paper Metadata: Index papers in PDF files and build metadata tables for each paper
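Many of these examples end in semantic search over exported embeddings. In the flow above, the vector index is declared in Postgres with `COSINE_SIMILARITY`; as a rough, dependency-free illustration of what that metric ranks, here is a brute-force query. The row layout (`filename`, `location`, `text`, `embedding`) mirrors the collected fields, but the `top_k` helper and the linear scan are hypothetical; a real vector index avoids scanning every row.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: Vector, rows: list[dict], k: int = 3) -> list[tuple[float, str, object]]:
    """Score every row against the query; return (score, filename, location), best first."""
    scored = [
        (cosine_similarity(query, row["embedding"]), row["filename"], row["location"])
        for row in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```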

More examples are coming. Stay tuned 👀!

📖 Documentation

For detailed documentation, visit the CocoIndex Documentation, which includes a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

We welcome you with a huge coconut hug 🥥⋆。˚🤗. We are super excited about community contributions of all kinds, whether code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on the GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.


Download files

Download the file for your platform.

Source Distribution

cocoindex-0.1.73.tar.gz (10.5 MB)

Uploaded: Source

Built Distributions

cocoindex-0.1.73-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded: PyPy, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.73-cp313-cp313t-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded: CPython 3.13t, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.73-cp313-cp313-win_amd64.whl (15.5 MB)

Uploaded: CPython 3.13, Windows x86-64

cocoindex-0.1.73-cp313-cp313-manylinux_2_28_x86_64.whl (16.3 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.73-cp313-cp313-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded: CPython 3.13, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.73-cp313-cp313-macosx_11_0_arm64.whl (15.4 MB)

Uploaded: CPython 3.13, macOS 11.0+ ARM64

cocoindex-0.1.73-cp313-cp313-macosx_10_12_x86_64.whl (16.0 MB)

Uploaded: CPython 3.13, macOS 10.12+ x86-64

cocoindex-0.1.73-cp312-cp312-win_amd64.whl (15.5 MB)

Uploaded: CPython 3.12, Windows x86-64

cocoindex-0.1.73-cp312-cp312-manylinux_2_28_x86_64.whl (16.3 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.73-cp312-cp312-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded: CPython 3.12, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.73-cp312-cp312-macosx_11_0_arm64.whl (15.4 MB)

Uploaded: CPython 3.12, macOS 11.0+ ARM64

cocoindex-0.1.73-cp312-cp312-macosx_10_12_x86_64.whl (16.0 MB)

Uploaded: CPython 3.12, macOS 10.12+ x86-64

cocoindex-0.1.73-cp311-cp311-win_amd64.whl (15.5 MB)

Uploaded: CPython 3.11, Windows x86-64

cocoindex-0.1.73-cp311-cp311-manylinux_2_28_x86_64.whl (16.3 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.28+ x86-64

cocoindex-0.1.73-cp311-cp311-manylinux_2_28_aarch64.whl (15.7 MB)

Uploaded: CPython 3.11, manylinux: glibc 2.28+ ARM64

cocoindex-0.1.73-cp311-cp311-macosx_11_0_arm64.whl (15.4 MB)

Uploaded: CPython 3.11, macOS 11.0+ ARM64

cocoindex-0.1.73-cp311-cp311-macosx_10_12_x86_64.whl (16.0 MB)

Uploaded: CPython 3.11, macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.73.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.73.tar.gz
  • Upload date:
  • Size: 10.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.1

File hashes

Hashes for cocoindex-0.1.73.tar.gz
Algorithm Hash digest
SHA256 68fbcba765ab469d641b0d52c94d30d39dd68f379e49c026123012d1771ef5d9
MD5 af3244f0def57b1c33ce97e469cb2394
BLAKE2b-256 1a86240290417e151df75188c5911f1c767b98c51d28b2d7e73089db34c455bb

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.73-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.73-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 b1e02c3bd9a5de0103c4982605eeb906d716fe6ec3f6350d560f0c165867b12d
MD5 01fcec9ef87a766cf9b7ea552a4109b2
BLAKE2b-256 dbd6c18b2581bde9e3e5927ee71919add1e3d06107db68225d3ab6d7196eb1de

File details

Details for the file cocoindex-0.1.73-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c69e7f5379a8c1e1d199b06349529a89a28e01a45fcab4abd47b9545b24a7a85
MD5 d125e9736814912ae9c7cb4a267b97b6
BLAKE2b-256 51147138bb7355a8e0c0aad2792219a0f87250477c20ae36f44184fcfbcf532e

File details

Details for the file cocoindex-0.1.73-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 462d2c507a47dab836cc0dabd6ad9540a320854533560600774ff7cfa95cddaf
MD5 158675cc6c08dc9b16139e2013a1bba1
BLAKE2b-256 77bf16e03b8b098dbfd40d2f924de0cd36b6f0cda67bc07c054e63b27b7ee807

File details

Details for the file cocoindex-0.1.73-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 464bbcd2bd8985d8669e12b1b423cf08c8215199c4e5f40a1ad2099464eccba8
MD5 b08907014eb3dbc9b2dd7bb44139dd43
BLAKE2b-256 98689310aea7a001fbc2d52aca5580744070ce18e40f5ae5f74165da4ecdbf64

File details

Details for the file cocoindex-0.1.73-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8ed05362efcd65a23adb82cedf2d44e2c36bea24f5fe8f4c95697f550d48cc39
MD5 5b62ec814892cadc177c04282a0e324a
BLAKE2b-256 f16691cbf482eba77e0d65e0efd1a35995fba79ac330c23d709deb1d2e4ae41b

File details

Details for the file cocoindex-0.1.73-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f1774290023630d494369f63b505bfe44bb0174eaa225a3692897da06458de65
MD5 c41bb2df188979516a42a160a26d8a5c
BLAKE2b-256 2a629d66b215a211d01bee97a7b1897c3a1301c5524b895afa60fab868af7c7f

File details

Details for the file cocoindex-0.1.73-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9c573d666c170772be614f7932a8f452d4a345586c84639652b3ba8a99c4af65
MD5 3261e02b5404818ecd9b75ab9638a25a
BLAKE2b-256 4f5168dc6b32f97e3639b7e8ca2e671b233d68f527eeda267aa1745b1df8a3f9

File details

Details for the file cocoindex-0.1.73-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 b197cccee9320a59812559d5524128ee0617402df42b24718d5f73a756b59367
MD5 73fecfd82f634898f8ff0d44176f4e68
BLAKE2b-256 98b1397fdc244e70df68c65074240f75b9938b27199c7ab76c01b5793680902d

File details

Details for the file cocoindex-0.1.73-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1fb73479dd87a62d5ffcc82873fb78eb8f8485cd1b9b6ce957d3a463685a92d0
MD5 2ebcd78dd856f97d38427a1d60960ea0
BLAKE2b-256 bce2c688774a32637a2257f6fa3d83a6e1ce75e2b22c7de5c7b7730b6faa68fa

File details

Details for the file cocoindex-0.1.73-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 692419f81624842d3290be2e61806edf4bc019f46fc1cea5ff9a224f4676d8bb
MD5 25297e93886b3a3e075c221b90d70f39
BLAKE2b-256 b799d66f62f8a709e0675a34a3f7b30ef9c39a8262737e18133b50735fc3f717

File details

Details for the file cocoindex-0.1.73-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 75a5b25339f99dc12ee16bc31687c37701f6b18c7bacce0ae6b7d27bfd611910
MD5 7bf58ce2f9cb26d346723d5717dacdd4
BLAKE2b-256 3645f11fac28dd8f5ff8b5f55b7b80777c6b9b6343c3fa2fbb9891decc11ee79

File details

Details for the file cocoindex-0.1.73-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 379ecbc3484ff696fd46f4f5fdace303734a334436ea52ba43b755f5329c9b30
MD5 8ded91c53aea0d9144fcdc590943a9af
BLAKE2b-256 0f652cbc261feb3c13f74912c878f9ff5dac295785c5ab22bf5faad06201198d

File details

Details for the file cocoindex-0.1.73-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 51cb5e78b2ae218335d34375707391a2f633308d435a5007f37aaef1feadb761
MD5 6702e085dfe4af7d00efdef4b2fa3cc7
BLAKE2b-256 d96f410af8d4f7db0cfc14cebbf4d6ecb0ec83eee97f47e3c39a5ba15ef449c2

File details

Details for the file cocoindex-0.1.73-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 b69a54e67d39da83936c496911ba10cb16521919782d2ef74ffbc3876085c024
MD5 7f9b753639792b2fdb71121ef0a8c821
BLAKE2b-256 9b6487822f790837c5758018422223b5a238d3a82c91eaae7889e2f8222e92f2

File details

Details for the file cocoindex-0.1.73-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8da98fbd5e244ac8f2606b982c912117abe0f9b948f87d2881b72b702f66bce5
MD5 60aa9a6f9d2a0c8a4f4383dc7e24dd6b
BLAKE2b-256 d3b2b006114d62eac1a7c5b2721b510646100d2df0996d5423d786df0b519d24

File details

Details for the file cocoindex-0.1.73-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3955fc33e98a58d9ed9891371608e93d32f1d9fbb749be92e421c50c5b6d8dee
MD5 98fa9a39aed6037fad481d4ac8e54e25
BLAKE2b-256 757b5cdea89658444908fbe647ae02877bf60ece53a6428247047a7179becb0d

File details

Details for the file cocoindex-0.1.73-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.73-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 13b402dce9f9a763d7772753c941ebb051967ca83020cfc83f84cbfd52e0b0bc
MD5 3bbaaf54180c5a8946a8cdc9072e22a3
BLAKE2b-256 2cab5bacede78bd1e22a865241e45705ada8be11e8a7f3f14b89ae37fd63cbd1
