Wrangle unstructured AI data at scale

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

0x2b3bfa0 dmpetrov shcheklein

These details have not been verified by PyPI

Project links

Documentation

Project description

DataChain - Data Context Layer for Object Storage

DataChain is a data context layer for object storage. It gives AI agents and pipelines a typed, versioned, queryable view of your files - what exists, what schema it has, what's already been computed - without copying data or loading it into memory.

Metadata queries across 100M+ files execute in milliseconds against a backend database
Pipelines checkpoint - re-running the same script resumes compute without duplicating expensive LLM-call or ML scoring
delta=True makes re-runs incremental — only new or changed files are processed
Every .save() registers a named, versioned dataset with schema and lineage
A generated knowledge base (dc-knowledge/) reflects the operational layer as markdown for agents to read before writing code

Works with S3, GCS, Azure, and local filesystems.

pip install datachain

To add the agent knowledge layer and code generation skill:

datachain skill install --target claude     # also: --target cursor, --target codex

1. Quickstart: agent-driven pipeline

Task: find dogs in S3 similar to a reference image, filtered by breed, mask availability, and image dimensions.

Grab a reference image and run Claude Code (or other agent):

datachain cp --anon s3://dc-readme/fiona.jpg .

claude

Prompt:

Find dogs in s3://dc-readme/oxford-pets-micro/ similar to fiona.jpg:
  - Pull breed metadata and mask files from annotations/
  - Exclude images without mask
  - Exclude Cocker Spaniels
  - Only include images wider than 400px

Result:

  ┌──────┬───────────────────────────────────┬────────────────────────────┬──────────┐
  │ Rank │               Image               │           Breed            │ Distance │
  ├──────┼───────────────────────────────────┼────────────────────────────┼──────────┤
  │    1 │ shiba_inu_52.jpg                  │ shiba_inu                  │    0.244 │
  ├──────┼───────────────────────────────────┼────────────────────────────┼──────────┤
  │    2 │ shiba_inu_53.jpg                  │ shiba_inu                  │    0.323 │
  ├──────┼───────────────────────────────────┼────────────────────────────┼──────────┤
  │    3 │ great_pyrenees_17.jpg             │ great_pyrenees             │    0.325 │
  └──────┴───────────────────────────────────┴────────────────────────────┴──────────┘

  Fiona's closest matches are shiba inus (both top spots), which makes sense given her
  tan coloring and pointed ears.

The agent decomposed the task into steps - embeddings, breed metadata, mask join, quality filter - and saved each as a named, versioned dataset. Next time you ask a related question, it starts from what's already built.

The datasets are registered in a knowledge base optimized for both agents and humans:

dc-knowledge
├── buckets
│   └── s3
│       └── dc_readme.md
├── datasets
│   ├── oxford_micro_dog_breeds.md
│   ├── oxford_micro_dog_embeddings.md
│   └── similar_to_fiona.md
└── index.md

Browse it as markdown files, navigate with wikilinks, or open in Obsidian:

Visualize data knowledge base

2. How it works

Claude Code (Codex, Cursor, etc) isn't just a chat interface with a shell - it's a harness that gives the LLM repo context, dedicated tools, and persistent memory. That's what makes it good.

DataChain extends that harness to data. The agent now also understands your storage and datasets: schemas, dependencies, code, what's already computed, what's mid-run, and what changed since last time.

Data Context Architecture

A dataset is the unit of work - a named, versioned result of a pipeline step like pets_embeddings@1.0.0. Every .save() registers one.

Inside DataChain, datasets live in two layers:

The operational layer is the engine - the ground truth that makes crash recovery, incremental updates, and vector search work at scale.
The knowledge layer is a structured reflection of it enriched by LLMs: markdown files the agent reads to understand what exists before writing a single line of code.

3. Core concepts

3.1. Dataset

A dataset is a versioned data reasoning step - what was computed, from what input, producing what schema. DataChain indexes your storage into one: no data copied, just typed metadata and file pointers. Re-runs only process new or changed files.

Create a dataset manually create_dataset.py:

from PIL import Image
import io
from pydantic import BaseModel
import datachain as dc

class ImageInfo(BaseModel):
    width: int
    height: int

def get_info(file: dc.File) -> ImageInfo:
    img = Image.open(io.BytesIO(file.read()))
    return ImageInfo(width=img.width, height=img.height)

ds = (
    dc.read_storage(
        "s3://dc-readme/oxford-pets-micro/images/**/*.jpg",
        anon=True,
        update=True,
        delta=True,         # re-runs skip unchanged files
    )
    .settings(prefetch=64)
    .map(info=get_info)
    .save("pets_images")
)
ds.show(5)

pets_images@1.0.0 is now the shared reference to this data - schema, version, lineage, and metadata.

Every .save() registers the dataset in DataChain's *operational data layer - the persistent store for schemas, versions, lineage, and processing state, kept locally in SQLite DB .datachain/db. Pipelines reference datasets by name, not paths. When the code or input data changes, the next run bumps dataset version.

This is what makes a dataset a management unit: owned, versioned, and queryable by everyone on the team.

3.2. Schemas and types

DataChain uses Pydantic to define the shape of every column. The return type of your UDF becomes the dataset schema — each field a queryable column in the operational layer.

show() in the previous script renders nested fields as dotted columns:

                                          file    file  info   info
                                          path    size width height
0  oxford-pets-micro/images/Abyssinian_141.jpg  111270   461    500
1  oxford-pets-micro/images/Abyssinian_157.jpg  139948   500    375
2  oxford-pets-micro/images/Abyssinian_175.jpg   31265   600    234
3  oxford-pets-micro/images/Abyssinian_220.jpg   10687   300    225
4    oxford-pets-micro/images/Abyssinian_3.jpg   61533   600    869

[Limited by 5 rows]

.print_schema() renders it's schema:

file: File@v1
  source: str
  path: str
  size: int
  version: str
  etag: str
  is_latest: bool
  last_modified: datetime
  location: Union[dict, list[dict], NoneType]
info: ImageInfo
  width: int
  height: int

Models can be arbitrarily nested - a BBox inside an Annotation, a List[Citation] inside an LLM Response - every leaf field stays queryable the same way. The schema lives in the operational layer and is enforced at dataset creation time.

The operational layer handles datasets of any size - 100 millions of files, hundreds of metadata rows - without loading anything into memory. Pandas is limited by RAM; DataChain is not. Export to pandas when you need it, on a filtered subset:

import datachain as dc

df = dc.read_dataset("pets_images").filter(dc.C("info.width") > 500).to_pandas()
print(df)

3.3. Fast queries

Filters, aggregations, and joins run as vectorized operations directly against the operational layer - metadata never leaves your machine, no files downloaded.

import datachain as dc

cnt = (
    dc.read_dataset("pets_images")
    .filter(
        (dc.C("info.width") > 400) &
        ~dc.C("file.path").ilike("%cocker_spaniel%")   # case-insensitive
    )
    .count()
)
print(f"Large images with Cocker Spaniel: {cnt}")

Milliseconds, even at 100M-file scale.

Large images with Cocker Spaniel: 6

4. Resilient Pipelines

When computation is expensive, bugs and new data are both inevitable. DataChain tracks processing state in the operational layer — so crashes and new data are handled automatically, without changing how you write pipelines.

4.1. Data checkpoints

Save to embed.py:

import open_clip, torch, io
from PIL import Image
import datachain as dc

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", "laion2b_s34b_b79k")
model.eval()

counter = 0

def encode(file: dc.File, model, preprocess) -> list[float]:
    global counter
    counter += 1
    if counter > 236:                                    # ← bug: remove these two lines
        raise Exception("some bug")                      # ←
    img = Image.open(io.BytesIO(file.read())).convert("RGB")
    with torch.no_grad():
        return model.encode_image(preprocess(img).unsqueeze(0))[0].tolist()

(
    dc.read_dataset("pets_images")
    .settings(batch_size=100)
    .setup(model=lambda: model, preprocess=lambda: preprocess)
    .map(emb=encode)
    .save("pets_embeddings")
)

It fails due to a bug in the code:

Exception: some bug

Remove the two marked lines and re-run - DataChain resumes from image 201 (two 100 size batches are completed), the start of the last uncommitted batch:

$ python embed.py
UDF 'encode': Continuing from checkpoint

4.2. Similarity search

The vectors live in the operational layer alongside all the metadata - list[float] type in pydentic schemas. Querying them is instant - no files re-read and can be combined with not vector filters like info.width:

Prepare data:

datachain cp s3://dc-readme/fiona.jpg .

similar.py:

import open_clip, torch, io
from PIL import Image
import datachain as dc

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", "laion2b_s34b_b79k")
model.eval()

ref_emb = model.encode_image(
    preprocess(Image.open("fiona.jpg")).unsqueeze(0)
)[0].tolist()

(
    dc.read_dataset("pets_embeddings")
    .filter(dc.C("info.width") > 500)          # from pets_images — no re-read
    .mutate(dist=dc.func.cosine_distance(dc.C("emb"), ref_emb))
    .order_by("dist")
    .limit(3)
    .show()
)

Under a second - everything runs against the operational layer.

4.3. Incremental updates

The bucket in this walkthrough is static, so there's nothing new to process. But in production - when new images land in your bucket - re-run the same scripts unchanged. delta=True in the original dataset ensures only new files are processed end to end while the whole dataset will be updated to pets_images@1.0.1:

$ python create_dataset.py   # 500 new images arrived
Skipping 10,000 unchanged  ·  indexing 500 new
Saved pets_images@1.0.1  (+500 records)

# Next day:

$ python create_dataset.py
Skipping 10,000 unchanged  ·  processing 500 new
Saved pets_images@1.0.2  (+500 records)

5. Knowledge Base

DataChain maintains two layers. The operational layer is the ground truth - schemas, processing state, lineage, the vectors themselves. The knowledge base layer is derived from it: structured markdown for humans and agents to read. Because it's derived, it's always accurate. The knowledge base is stored in dc-knowledge/ directory.

Ask the agent to build it (from Calude Code, Codex or Cursor):

claude

Prompt:

Build a knowledge base for my current datasets

The skill generates dc-knowledge/ directory from the operational layer - one file per dataset and bucket:

6. AI-Generated Pipelines

The skill gives the agent data awareness: it reads dc-knowledge/ to understand what datasets exist, their schemas, which fields can be joined - and the meaning of columns inferred from the code that produced them.

See section 1. See it in action. All the steps that were manually created could be just generated.

7. Team and cloud: Studio

Data context built locally stays local. DataChain Studio makes it shared.

datachain auth login
datachain job run --workers 20 --cluster gpu-pool caption.py
# ✓ Job submitted → studio.datachain.ai/jobs/1042
# Resuming from checkpoint (4,218 already done)...
# Saved oxford-pets-caps@0.0.1  (3,182 processed)

DataChain Studio Architecture

Studio adds: shared dataset registry, access control, UI for video/DICOM/NIfTI/point clouds, lineage graphs, reproducible runs.

Bring Your Own Cloud — all data and compute stay in your infrastructure. AWS, GCP, Azure, on-prem Kubernetes.

→ studio.datachain.ai

8. Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

9. Community and Support

Report an issue if you encounter any problems
Docs
Email
Twitter

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

0x2b3bfa0 dmpetrov shcheklein

These details have not been verified by PyPI

Project links

Documentation

Release history Release notifications | RSS feed

0.53.0

Apr 24, 2026

This version

0.52.0

Apr 20, 2026

0.51.1

Apr 17, 2026

0.51.0

Apr 13, 2026

0.50.2

Apr 8, 2026

0.50.1

Apr 7, 2026

0.50.0

Apr 2, 2026

0.49.1

Mar 30, 2026

0.49.0

Mar 28, 2026

0.48.4

Mar 27, 2026

0.48.3

Mar 24, 2026

0.48.2

Mar 24, 2026

0.48.1

Mar 21, 2026

0.48.0

Mar 16, 2026

0.47.2

Mar 15, 2026

0.47.1

Mar 5, 2026

0.47.0

Mar 5, 2026

0.46.5

Mar 5, 2026

0.46.4

Mar 2, 2026

0.46.3

Feb 25, 2026

0.46.2

Feb 24, 2026

0.46.1

Feb 17, 2026

0.46.0

Feb 15, 2026

0.45

Feb 10, 2026

0.44.9

Jan 29, 2026

0.44.8

Jan 29, 2026

0.44.7

Jan 25, 2026

0.44.6

Jan 23, 2026

0.44.5

Jan 22, 2026

0.44.4

Jan 14, 2026

0.44.3

Jan 13, 2026

0.44.2

Jan 12, 2026

0.44.1

Jan 7, 2026

0.44.0

Dec 31, 2025

0.43.2

Dec 31, 2025

0.43.1

Dec 29, 2025

0.43.0

Dec 24, 2025

0.42.0

Dec 22, 2025

0.41.0

Dec 19, 2025

0.40.2

Dec 19, 2025

0.40.1

Dec 18, 2025

0.40.0

Dec 14, 2025

0.39.0

Dec 10, 2025

0.38.5

Dec 7, 2025

0.38.4

Dec 5, 2025

0.38.3

Dec 3, 2025

0.38.2

Nov 30, 2025

0.38.1

Nov 29, 2025

0.38.0

Nov 22, 2025

0.37.15

Nov 15, 2025

0.37.14

Nov 15, 2025

0.37.13

Nov 11, 2025

0.37.12

Nov 7, 2025

0.37.11

Nov 4, 2025

0.37.10

Nov 2, 2025

0.37.9

Oct 30, 2025

0.37.8

Oct 30, 2025

0.37.7

Oct 28, 2025

0.37.6

Oct 28, 2025

0.37.5

Oct 27, 2025

0.37.4

Oct 27, 2025

0.37.3

Oct 27, 2025

0.37.2

Oct 26, 2025

0.37.1

Oct 22, 2025

0.37.0

Oct 20, 2025

0.36.6

Oct 19, 2025

0.36.5

Oct 18, 2025

0.36.4

Oct 18, 2025

0.36.3

Oct 17, 2025

0.36.2

Oct 16, 2025

0.36.1

Oct 16, 2025

0.36.0

Oct 15, 2025

0.35.2

Oct 13, 2025

0.35.1

Oct 12, 2025

0.35.0

Oct 9, 2025

0.34.7

Oct 9, 2025

0.34.6

Oct 5, 2025

0.34.5

Oct 4, 2025

0.34.4

Oct 3, 2025

0.34.3

Oct 2, 2025

0.34.2

Oct 1, 2025

0.34.1

Oct 1, 2025

0.34.0

Sep 30, 2025

0.33.1

Sep 26, 2025

0.33.0

Sep 24, 2025

0.32.3

Sep 16, 2025

0.32.2

Sep 16, 2025

0.32.1

Sep 14, 2025

0.32.0

Sep 11, 2025

0.31.4

Sep 11, 2025

0.31.3

Sep 11, 2025

0.31.2

Sep 10, 2025

0.31.1

Sep 10, 2025

0.31.0

Sep 3, 2025

0.30.7

Sep 2, 2025

0.30.6

Aug 29, 2025

0.30.5

Aug 29, 2025

0.30.4 yanked

Aug 27, 2025

Reason this release was yanked:

DataChain 0.30.4 was released out of the stable release window; the release has been postponed

0.30.3

Aug 21, 2025

0.30.2

Aug 16, 2025

0.30.1

Aug 13, 2025

0.30.0

Aug 12, 2025

0.29.1

Aug 11, 2025

0.29.0

Aug 11, 2025

0.28.2

Aug 6, 2025

0.28.1

Jul 30, 2025

0.28.0

Jul 28, 2025

0.27.0

Jul 24, 2025

0.26.4

Jul 17, 2025

0.26.3

Jul 15, 2025

0.26.2

Jul 15, 2025

0.26.1

Jul 15, 2025

0.26.0

Jul 12, 2025

0.25.2

Jul 10, 2025

0.25.1

Jul 10, 2025

0.25.0

Jul 9, 2025

0.24.6

Jul 9, 2025

0.24.5

Jul 8, 2025

0.24.4

Jul 5, 2025

0.24.3

Jul 3, 2025

0.24.2

Jul 2, 2025

0.24.1

Jun 30, 2025

0.24.0

Jun 29, 2025

0.23.0

Jun 28, 2025

0.22.0

Jun 26, 2025

0.21.1

Jun 25, 2025

0.21.0

Jun 25, 2025

0.20.4 yanked

Jun 24, 2025

Reason this release was yanked:

accidental release of experimental features

0.20.3 yanked

Jun 24, 2025

Reason this release was yanked:

accidental release of experimental features

0.20.2 yanked

Jun 20, 2025

Reason this release was yanked:

accidental release of experimental features

0.20.1 yanked

Jun 20, 2025

Reason this release was yanked:

accidental release of experimental features

0.20.0 yanked

Jun 19, 2025

Reason this release was yanked:

accidental release of experimental features

0.19.3

Jun 24, 2025

0.19.2

Jun 11, 2025

0.19.1

Jun 10, 2025

0.19

Jun 9, 2025

0.18.11

Jun 5, 2025

0.18.10

Jun 4, 2025

0.18.9

Jun 3, 2025

0.18.8

Jun 3, 2025

0.18.7

Jun 2, 2025

0.18.6

May 28, 2025

0.18.5

May 28, 2025

0.18.4

May 22, 2025

0.18.3

May 21, 2025

0.18.2

May 21, 2025

0.18.1

May 16, 2025

0.18.0

May 15, 2025

0.17.2

May 11, 2025

0.17.1

May 10, 2025

0.17.0

May 9, 2025

0.16.5

May 8, 2025

0.16.4

May 1, 2025

0.16.3

Apr 28, 2025

0.16.2

Apr 22, 2025

0.16.1

Apr 21, 2025

0.16.0

Apr 18, 2025

0.15.0

Apr 18, 2025

0.14.5

Apr 8, 2025

0.14.4

Apr 1, 2025

0.14.3

Mar 31, 2025

0.14.2

Mar 29, 2025

0.14.1

Mar 27, 2025

0.14.0

Mar 26, 2025

0.13.1

Mar 24, 2025

0.13.0

Mar 19, 2025

0.12.0

Mar 17, 2025

0.11.11

Mar 6, 2025

0.11.0

Feb 27, 2025

0.10.0

Feb 20, 2025

0.9.1

Feb 15, 2025

0.9.0

Feb 14, 2025

0.8.13

Feb 3, 2025

0.8.12

Jan 31, 2025

0.8.11

Jan 28, 2025

0.8.10

Jan 20, 2025

0.8.9

Jan 16, 2025

0.8.8

Jan 13, 2025

0.8.7

Jan 12, 2025

0.8.6

Jan 12, 2025

0.8.5

Jan 9, 2025

0.8.4

Jan 6, 2025

0.8.3

Dec 29, 2024

0.8.2

Dec 27, 2024

0.8.1

Dec 26, 2024

0.8.0

Dec 22, 2024

0.7.11

Dec 12, 2024

0.7.10

Dec 9, 2024

0.7.9

Dec 6, 2024

0.7.8

Dec 3, 2024

0.7.7

Dec 2, 2024

0.7.6

Nov 29, 2024

0.7.5

Nov 29, 2024

0.7.4

Nov 29, 2024

0.7.3

Nov 27, 2024

0.7.2

Nov 27, 2024

0.7.1

Nov 22, 2024

0.7.0

Nov 20, 2024

0.6.11

Nov 19, 2024

0.6.10

Nov 17, 2024

0.6.9

Nov 13, 2024

0.6.8

Nov 7, 2024

0.6.7

Nov 6, 2024

0.6.6

Nov 6, 2024

0.6.5

Nov 1, 2024

0.6.4

Oct 31, 2024

0.6.3

Oct 30, 2024

0.6.2

Oct 28, 2024

0.6.1

Oct 16, 2024

0.6.0

Oct 14, 2024

0.5.1

Oct 7, 2024

0.5.0

Sep 26, 2024

0.4.0

Sep 24, 2024

0.3.20

Sep 23, 2024

0.3.19

Sep 23, 2024

0.3.18

Sep 18, 2024

0.3.17

Sep 17, 2024

0.3.16

Sep 16, 2024

0.3.15

Sep 16, 2024

0.3.14

Sep 12, 2024

0.3.13

Sep 11, 2024

0.3.12

Sep 11, 2024

0.3.11

Sep 9, 2024

0.3.10

Sep 5, 2024

0.3.9

Aug 28, 2024

0.3.8

Aug 27, 2024

0.3.7

Aug 22, 2024

0.3.6

Aug 22, 2024

0.3.5

Aug 21, 2024

0.3.4

Aug 19, 2024

0.3.3

Aug 18, 2024

0.3.2

Aug 15, 2024

0.3.1

Aug 8, 2024

0.3.0

Aug 7, 2024

0.2.18

Aug 6, 2024

0.2.17

Aug 6, 2024

0.2.16

Aug 2, 2024

0.2.15

Jul 31, 2024

0.2.14

Jul 29, 2024

0.2.13

Jul 25, 2024

0.2.12

Jul 23, 2024

0.2.11

Jul 18, 2024

0.2.10

Jul 17, 2024

0.2.9

Jul 15, 2024

0.2.8

Jul 15, 2024

0.2.7

Jul 15, 2024

0.2.6

Jul 12, 2024

0.2.5

Jul 11, 2024

0.2.4

Jul 10, 2024

0.2.3

Jul 10, 2024

0.2.2

Jul 10, 2024

0.2.1

Jul 8, 2024

0.2.0

Jul 5, 2024

0.1.13

Jun 28, 2024

0.1.12

Jun 27, 2024

0.1.11

Jun 27, 2024

0.1.10

Jun 26, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datachain-0.52.0.tar.gz (6.4 MB view details)

Uploaded Apr 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

datachain-0.52.0-py3-none-any.whl (468.5 kB view details)

Uploaded Apr 20, 2026 Python 3

File details

Details for the file datachain-0.52.0.tar.gz.

File metadata

Download URL: datachain-0.52.0.tar.gz
Upload date: Apr 20, 2026
Size: 6.4 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datachain-0.52.0.tar.gz
Algorithm	Hash digest
SHA256	`a78b44dc61977b1d4158f97845df744fb77f8ba442a2297b6e246001a12de3d2`
MD5	`0fc2e9ee5f4d12b218277cd84c0101c1`
BLAKE2b-256	`3fa790ee1c1def5981fd1d2f9391d74c898217eaf750360cd13b1decdda24e25`

See more details on using hashes here.

Provenance

The following attestation bundles were made for datachain-0.52.0.tar.gz:

Publisher: release.yml on datachain-ai/datachain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datachain-0.52.0.tar.gz
- Subject digest: a78b44dc61977b1d4158f97845df744fb77f8ba442a2297b6e246001a12de3d2
- Sigstore transparency entry: 1343129025
- Sigstore integration time: Apr 20, 2026
Source repository:
- Permalink: datachain-ai/datachain@ea0d153b5f8245ff15edce48d39c3ca84cf4952e
- Branch / Tag: refs/tags/0.52.0
- Owner: https://github.com/datachain-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ea0d153b5f8245ff15edce48d39c3ca84cf4952e
- Trigger Event: release

File details

Details for the file datachain-0.52.0-py3-none-any.whl.

File metadata

Download URL: datachain-0.52.0-py3-none-any.whl
Upload date: Apr 20, 2026
Size: 468.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datachain-0.52.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1596dbcc1346bc4b8651ba2552b51656d3964d88b85d3defac4dc3f15dbf192f`
MD5	`6d5f0d60d5238eb1de27014b2eff9c66`
BLAKE2b-256	`59c1da591c1cf1fe68bb1337726173d5a17a0d89b1496091fae4d617deb5e357`

See more details on using hashes here.

Provenance

The following attestation bundles were made for datachain-0.52.0-py3-none-any.whl:

Publisher: release.yml on datachain-ai/datachain

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datachain-0.52.0-py3-none-any.whl
- Subject digest: 1596dbcc1346bc4b8651ba2552b51656d3964d88b85d3defac4dc3f15dbf192f
- Sigstore transparency entry: 1343129034
- Sigstore integration time: Apr 20, 2026
Source repository:
- Permalink: datachain-ai/datachain@ea0d153b5f8245ff15edce48d39c3ca84cf4952e
- Branch / Tag: refs/tags/0.52.0
- Owner: https://github.com/datachain-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ea0d153b5f8245ff15edce48d39c3ca84cf4952e
- Trigger Event: release

datachain 0.52.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

DataChain - Data Context Layer for Object Storage

1. Quickstart: agent-driven pipeline

2. How it works

3. Core concepts

3.1. Dataset

3.2. Schemas and types

3.3. Fast queries

4. Resilient Pipelines

4.1. Data checkpoints

4.2. Similarity search

4.3. Incremental updates

5. Knowledge Base

6. AI-Generated Pipelines

7. Team and cloud: Studio

8. Contributing

9. Community and Support

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance