With CocoIndex, users declare the transformation; CocoIndex creates and maintains an index and keeps it up to date as the source changes, with minimal computation and changes.

Project description

CocoIndex

Data transformation for AI

Ultra-performant data transformation framework for AI, with its core engine written in Rust. Supports incremental processing and data lineage out of the box. Exceptional developer velocity. Production-ready from day 0.

⭐ Drop a star to help us grow!


CocoIndex Transformation


CocoIndex makes it super easy to transform data with AI workloads and to keep source data and targets in sync effortlessly.


CocoIndex Features


Whether you are creating embeddings, building knowledge graphs, or running any other data transformation, CocoIndex goes beyond traditional SQL.

Exceptional velocity

Just declare the transformation as a dataflow in ~100 lines of Python.

# import
data['content'] = flow_builder.add_source(...)

# transform (parentheses allow the chained calls to span several lines)
data['out'] = (
    data['content']
    .transform(...)
    .transform(...)
)

# collect data
collector.collect(...)

# export to db, vector db, graph db ...
collector.export(...)

CocoIndex follows the dataflow programming model. Each transformation creates a new field based solely on input fields, without hidden state or value mutation. All data before and after each transformation is observable, with lineage out of the box.

In particular, developers don't explicitly mutate data by creating, updating, and deleting it. They only need to define the transformation (formula) for a set of source data.
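
To make the dataflow idea concrete, here is a minimal plain-Python sketch. It does not use the CocoIndex API: the `derive` helper and the field names are hypothetical and only illustrate that each new field is computed from existing fields, while earlier rows stay untouched and observable.

```python
from types import MappingProxyType
from typing import Any, Callable, Mapping


def derive(row: Mapping[str, Any], name: str,
           fn: Callable[..., Any], *inputs: str) -> Mapping[str, Any]:
    """Return a new read-only row with `name` computed from the `inputs` fields.

    Hypothetical helper for illustration only; not part of CocoIndex.
    """
    if name in row:
        raise KeyError(f"field {name!r} already exists; fields are never overwritten")
    value = fn(*(row[field] for field in inputs))
    # Build a fresh mapping instead of mutating the input row.
    return MappingProxyType({**row, name: value})


source = MappingProxyType({"content": "Hello dataflow world"})
with_words = derive(source, "words", str.split, "content")
with_count = derive(with_words, "word_count", len, "words")

# Every intermediate row is still available for inspection.
print(dict(source))
print(dict(with_count))
```

Because no step overwrites a field, the chain `content -> words -> word_count` doubles as a simple lineage record.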

Build like LEGO

Native built-ins for different sources, targets, and transformations. A standardized interface makes switching between components a one-line code change.
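
The benefit of a standardized interface can be sketched in plain Python. The `Target` protocol and both target classes below are hypothetical stand-ins, not CocoIndex classes; they only show why a shared interface reduces a component switch to one line.

```python
import json
from typing import Protocol


class Target(Protocol):
    """Hypothetical common interface that every export target implements."""

    def export(self, rows: list[dict]) -> None: ...


class InMemoryTarget:
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def export(self, rows: list[dict]) -> None:
        self.rows.extend(rows)


class JsonLinesTarget:
    def __init__(self) -> None:
        self.lines: list[str] = []

    def export(self, rows: list[dict]) -> None:
        self.lines.extend(json.dumps(row, sort_keys=True) for row in rows)


def run_pipeline(texts: list[str], target: Target) -> None:
    # The pipeline only depends on the shared interface, not on a concrete target.
    rows = [{"text": text, "length": len(text)} for text in texts]
    target.export(rows)


target = InMemoryTarget()  # swap for JsonLinesTarget() to change the destination
run_pipeline(["alpha", "be"], target)
print(target.rows)
```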

Data Freshness

CocoIndex keeps source data and target in sync effortlessly.

Incremental Processing

It has out-of-the-box support for incremental indexing:

  • minimal recomputation on source or logic change.
  • (re-)processing only the necessary portions, reusing the cache when possible.
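
As a rough illustration of the idea (not CocoIndex's actual engine), the sketch below fingerprints each source item and recomputes only items whose content changed. The `IncrementalIndex` class is hypothetical, keeps its state in memory, and covers source changes only, not logic changes.

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Recompute a derived value only when the source content changes."""

    def __init__(self, transform: Callable[[str], object]) -> None:
        self.transform = transform
        self._fingerprints: dict[str, str] = {}
        self.index: dict[str, object] = {}
        self.recomputed: list[str] = []  # keys recomputed by the last update

    def update(self, sources: dict[str, str]) -> None:
        self.recomputed = []
        # Drop derived entries whose source item no longer exists.
        for key in list(self.index):
            if key not in sources:
                del self.index[key]
                del self._fingerprints[key]
        for key, content in sources.items():
            fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self._fingerprints.get(key) == fingerprint:
                continue  # unchanged: reuse the cached result
            self.index[key] = self.transform(content)
            self._fingerprints[key] = fingerprint
            self.recomputed.append(key)


idx = IncrementalIndex(transform=str.upper)
idx.update({"a.md": "hello", "b.md": "world"})   # first run processes both files
idx.update({"a.md": "hello", "b.md": "world!"})  # only b.md changed
print(idx.recomputed)  # ['b.md']
```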

Quick Start:

If you're new to CocoIndex, we recommend checking out

Setup

  1. Install the CocoIndex Python library
pip install -U cocoindex
  2. Install Postgres if you don't have one. CocoIndex uses it for incremental processing.
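
CocoIndex then needs to know how to reach that Postgres instance. The sketch below assumes the connection string is supplied through the `COCOINDEX_DATABASE_URL` environment variable, as we understand the CocoIndex quickstart to describe; check the documentation for your version. The user, password, host, port, and database name are placeholders.

```python
import os
from urllib.parse import urlsplit

# Placeholder credentials and database name; replace with your own.
database_url = "postgres://cocoindex:cocoindex@localhost:5432/cocoindex"

# Sanity-check the URL before handing it to any tool.
parts = urlsplit(database_url)
if parts.scheme not in ("postgres", "postgresql"):
    raise ValueError(f"unexpected scheme: {parts.scheme!r}")
if not parts.hostname or not parts.path.lstrip("/"):
    raise ValueError("host and database name are required")

# Assumption: CocoIndex reads its Postgres connection from this variable.
os.environ.setdefault("COCOINDEX_DATABASE_URL", database_url)
```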

Define data flow

Follow the Quick Start Guide to define your first indexing flow. An example flow looks like:

import cocoindex


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    # Add a data source to read files from a directory
    data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="markdown_files"))

    # Add a collector for data to be exported to the vector index
    doc_embeddings = data_scope.add_collector()

    # Transform data of each document
    with data_scope["documents"].row() as doc:
        # Split the document into chunks, put into `chunks` field
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown", chunk_size=2000, chunk_overlap=500)

        # Transform data of each chunk
        with doc["chunks"].row() as chunk:
            # Embed the chunk, put into `embedding` field
            chunk["embedding"] = chunk["text"].transform(
                cocoindex.functions.SentenceTransformerEmbed(
                    model="sentence-transformers/all-MiniLM-L6-v2"))

            # Collect the chunk into the collector.
            doc_embeddings.collect(filename=doc["filename"], location=chunk["location"],
                                   text=chunk["text"], embedding=chunk["embedding"])

    # Export collected data to a vector index.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])

It defines an index flow like this:

Data Flow
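
Once the flow has populated the index, semantic search amounts to ranking stored chunks by their similarity to a query embedding. The sketch below illustrates cosine-similarity ranking in plain Python with made-up 3-dimensional vectors. It is not CocoIndex or Postgres code, and `top_k` is a hypothetical helper; the row fields mirror the ones collected in the flow above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], rows: list[dict], k: int = 2) -> list[tuple[str, str]]:
    """Return the (filename, location) keys of the k most similar rows."""
    scored = [(cosine_similarity(query, row["embedding"]), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(row["filename"], row["location"]) for _, row in scored[:k]]


# Toy rows shaped like the collected fields: filename, location, embedding.
rows = [
    {"filename": "a.md", "location": "0-10", "embedding": [1.0, 0.0, 0.0]},
    {"filename": "a.md", "location": "10-20", "embedding": [0.0, 1.0, 0.0]},
    {"filename": "b.md", "location": "0-10", "embedding": [0.7, 0.7, 0.0]},
]
print(top_k([1.0, 0.1, 0.0], rows))  # [('a.md', '0-10'), ('b.md', '0-10')]
```

In a real deployment the vector index in the target database performs this ranking; the loop here only shows what the cosine metric computes.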

🚀 Examples and demo

Example | Description
--- | ---
Text Embedding | Index text documents with embeddings for semantic search
Code Embedding | Index code embeddings for semantic search
PDF Embedding | Parse PDF and index text embeddings for semantic search
Manuals LLM Extraction | Extract structured information from a manual using LLM
Amazon S3 Embedding | Index text documents from Amazon S3
Azure Blob Storage Embedding | Index text documents from Azure Blob Storage
Google Drive Text Embedding | Index text documents from Google Drive
Docs to Knowledge Graph | Extract relationships from Markdown documents and build a knowledge graph
Embeddings to Qdrant | Index documents in a Qdrant collection for semantic search
FastAPI Server with Docker | Run the semantic search server in a Dockerized FastAPI setup
Product Recommendation | Build real-time product recommendations with LLM and graph database
Image Search with Vision API | Generate detailed captions for images using a vision model, embed them, and enable live-updating semantic search via FastAPI, served on a React frontend
Face Recognition | Recognize faces in images and build embedding index
Paper Metadata | Index papers in PDF files, and build metadata tables for each paper
Multi Format Indexing | Build visual document index from PDFs and images with ColPali for semantic search
Custom Output Files | Convert markdown files to HTML files and save them to a local directory, using CocoIndex Custom Targets

More are coming, so stay tuned 👀!

📖 Documentation

For detailed documentation, visit CocoIndex Documentation, including a Quickstart guide.

🤝 Contributing

We love contributions from our community ❤️. For details on contributing or running the project for development, check out our contributing guide.

👥 Community

Welcome with a huge coconut hug 🥥⋆。˚🤗. We are super excited for community contributions of all kinds, whether it's code improvements, documentation updates, issue reports, feature requests, or discussions in our Discord.

Join our community here:

Support us:

We are constantly improving, and more features and examples are coming soon. If you love this project, please drop us a star ⭐ on our GitHub repo to stay tuned and help us grow.

License

CocoIndex is Apache 2.0 licensed.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cocoindex-0.1.78.tar.gz (14.1 MB), uploaded as Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

File | Size | Python | Platform
--- | --- | --- | ---
cocoindex-0.1.78-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl | 15.7 MB | PyPy | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.78-cp313-cp313t-manylinux_2_28_aarch64.whl | 15.7 MB | CPython 3.13t | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.78-cp313-cp313-win_amd64.whl | 15.6 MB | CPython 3.13 | Windows x86-64
cocoindex-0.1.78-cp313-cp313-manylinux_2_28_x86_64.whl | 16.3 MB | CPython 3.13 | manylinux: glibc 2.28+ x86-64
cocoindex-0.1.78-cp313-cp313-manylinux_2_28_aarch64.whl | 15.7 MB | CPython 3.13 | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.78-cp313-cp313-macosx_11_0_arm64.whl | 15.5 MB | CPython 3.13 | macOS 11.0+ ARM64
cocoindex-0.1.78-cp313-cp313-macosx_10_12_x86_64.whl | 16.0 MB | CPython 3.13 | macOS 10.12+ x86-64
cocoindex-0.1.78-cp312-cp312-win_amd64.whl | 15.6 MB | CPython 3.12 | Windows x86-64
cocoindex-0.1.78-cp312-cp312-manylinux_2_28_x86_64.whl | 16.3 MB | CPython 3.12 | manylinux: glibc 2.28+ x86-64
cocoindex-0.1.78-cp312-cp312-manylinux_2_28_aarch64.whl | 15.7 MB | CPython 3.12 | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.78-cp312-cp312-macosx_11_0_arm64.whl | 15.5 MB | CPython 3.12 | macOS 11.0+ ARM64
cocoindex-0.1.78-cp312-cp312-macosx_10_12_x86_64.whl | 16.0 MB | CPython 3.12 | macOS 10.12+ x86-64
cocoindex-0.1.78-cp311-cp311-win_amd64.whl | 15.6 MB | CPython 3.11 | Windows x86-64
cocoindex-0.1.78-cp311-cp311-manylinux_2_28_x86_64.whl | 16.3 MB | CPython 3.11 | manylinux: glibc 2.28+ x86-64
cocoindex-0.1.78-cp311-cp311-manylinux_2_28_aarch64.whl | 15.7 MB | CPython 3.11 | manylinux: glibc 2.28+ ARM64
cocoindex-0.1.78-cp311-cp311-macosx_11_0_arm64.whl | 15.5 MB | CPython 3.11 | macOS 11.0+ ARM64
cocoindex-0.1.78-cp311-cp311-macosx_10_12_x86_64.whl | 16.1 MB | CPython 3.11 | macOS 10.12+ x86-64

File details

Details for the file cocoindex-0.1.78.tar.gz.

File metadata

  • Download URL: cocoindex-0.1.78.tar.gz
  • Upload date:
  • Size: 14.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.2

File hashes

Hashes for cocoindex-0.1.78.tar.gz
Algorithm Hash digest
SHA256 982218cab59c7a7f7dff023c859cb195b8a8df162ff55012821c18bbf59ed909
MD5 f6ea452a121c4477ea3c6480207f3fc6
BLAKE2b-256 d46c0472ed6dc9c8e614b928c2198f2a05c4c0e33ba7e231880a154ff823c622

See more details on using hashes here.

File details

Details for the file cocoindex-0.1.78-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.78-pp311-pypy311_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0317cca3348aea85e551f1359083032f972eb95f8dcc9799987286cf387c2dd1
MD5 ba86896804e7f8b05304b101905bae5b
BLAKE2b-256 f90846164996c115167aa93cab5d1dad17703a1ef96293d43ea5b95a35e40352

File details

Details for the file cocoindex-0.1.78-cp313-cp313t-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp313-cp313t-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 6e1e89534a106bde732d1c16f1fcc2a28e59fc35db2b7b85db0390885f4a4231
MD5 c0431d91858d539f14dffc8f0638013f
BLAKE2b-256 674d99a8e59e710549202a710ee80d4400f8fe4525b26d7923bbff82f89b3430

File details

Details for the file cocoindex-0.1.78-cp313-cp313-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 2ae13f8a04b9301ed5e1c2dba397b95b4b88194d0cf85f991e64f2ad215a2c2a
MD5 377997a1620391ecfdce30c7c9067886
BLAKE2b-256 9431ce45803d4f56f76e603d07fab2e830bf62089cd70d5e73c505972964a1e6

File details

Details for the file cocoindex-0.1.78-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e25225c8c961c3eb7e06408468ef96b39219059045f73f25bac0b8cec7df91a0
MD5 844fe296cf87e772623181a1c0d4d3ce
BLAKE2b-256 29d048a093596e1dee488ef3c9e8423e373580a135c26ec925eb88a34c0e46bf

File details

Details for the file cocoindex-0.1.78-cp313-cp313-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp313-cp313-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 6b641ca18709561588059eb607fd785975996126bf1ca97fb1c2cc53341b70d3
MD5 7b62ed2cc1c63fe4521d768a1a9b6f7d
BLAKE2b-256 4f9108ba419823acf0e3c4cecad01fda96cb469ff88cd8ec1f122b9588924109

File details

Details for the file cocoindex-0.1.78-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 62446e3d6715601c495623a5c046ef44ee8a4523896afd64cd8d1cbcb6c918f3
MD5 4eb731e94f1b06ede9f7d6fb64a0e51a
BLAKE2b-256 2c119c780c524f1cb08b8e93a534a33bf62e353bb844bf6d3181b7aa53cf5012

File details

Details for the file cocoindex-0.1.78-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 cb799d98a27358dffa6b5d18630f603fd5989a713518dd3e662c39da036d5ad5
MD5 28911b2ef18e8f975b1b9a3b14c139f7
BLAKE2b-256 f8da901f90ff1746f789eea5fcfb888451015170608b59b9e4fdf249d7dceebf

File details

Details for the file cocoindex-0.1.78-cp312-cp312-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 06338024d7aac3507bcb7ab090cf4ea86db66d3d247f65190e6bf1f3650c5717
MD5 97d0f64b431ecf687d5450ab5834b82b
BLAKE2b-256 0aadfd8d19d4172b6402ad162481a9ae5e90db174f094f037e2bffcd12b3aaa1

File details

Details for the file cocoindex-0.1.78-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d863e6d62c213b94d18c1b0c5c6efe994615c8403aa2c194b82dfd91841a842b
MD5 30064eaa0bcb516e9faed0fbaab09cec
BLAKE2b-256 8ec35c53f71b7493cb000c3343dd9b3986fd800a53b0a2ecd47a96b9ead68f18

File details

Details for the file cocoindex-0.1.78-cp312-cp312-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp312-cp312-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 812d7afabe570a242a17e6f262e991d55edd4b6c7c1b93d9db43a7e767291cf0
MD5 dec5da631ffe934131daffdcc1005ea4
BLAKE2b-256 ca2dda588afd22dfd8b73280c7e10fc28ac60c53ebc6b250e6bb384f1eecaaf3

File details

Details for the file cocoindex-0.1.78-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9ded5c56af1ad6b23bb85a05739b61cbec21d2fa0434899a5e97adf7405cd0f7
MD5 170a3046032c0e69a2718c22efe297a4
BLAKE2b-256 5bf2ab2ef05f17f7dccf9e035ac1cfbe03b7a1faca25b59fdb36dd03ab626f34

File details

Details for the file cocoindex-0.1.78-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 d73ca33ef8af4e753eef3fe7d8af31afd087bbe485f1c549d8e7c8c367f44711
MD5 80da0b3d3116afbf8b916814700ae31e
BLAKE2b-256 561a508083d9871a69081bb212088bbd2e9993041966a97074eaba9cd7d5986b

File details

Details for the file cocoindex-0.1.78-cp311-cp311-win_amd64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 41401f117deca3c79c3e70004ee025e62c531a98e1a6dd8f7f0bfa953b3bc66e
MD5 5f614a01753b17528213a8b922c69ce7
BLAKE2b-256 1cb8c2874ba8c71ceda6320d5b30552cba97ce8fcee160cbe692cf4b91c24ada

File details

Details for the file cocoindex-0.1.78-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d665b54585e528a5acb08c3672a1e6f867ca376c2707d89ad6b57a9b9a86b2c6
MD5 dec2198f3c14e967420f88826b3cd123
BLAKE2b-256 7c31a2da88fdc4f695b706639f11c48bea0fe64a114a9a32c0692ebe2dbfbcb7

File details

Details for the file cocoindex-0.1.78-cp311-cp311-manylinux_2_28_aarch64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp311-cp311-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 4ef5661b24743f6fd7fa3e5a44f846f71f58487aaec0ac6641eec877cd72982f
MD5 30778e6c776d783e97731230c266cafd
BLAKE2b-256 c77b81a7b8db8eee5578d456fb6541f121e939c3dc2e50cb6916e5913c9d3748

File details

Details for the file cocoindex-0.1.78-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4fb51a4069b83fc3425c4f50041f6127bb792b282acc86e5151081024c0a0713
MD5 0fe9d6e8cd8270eae2087fa3f738e297
BLAKE2b-256 3e07503537ee7f1008d40dc6106105c5d97b2888507d1c09b837bdfdadc3acaa

File details

Details for the file cocoindex-0.1.78-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for cocoindex-0.1.78-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 cf215f2f01957bf81eeaa4c0039926855348ed07d19ec2968f434f62328008c0
MD5 3e0edb3dc9797cb7b707014d3cad410f
BLAKE2b-256 8eacafb3cffa891bf21a50cd0ad9f78af6e6a7d0b431fb02091d5e933b3f0758
