Lance integration for Daft - compaction, indexing, merge columns, and REST operations

daft-lance

Lance integration for Daft.

Install

# Install just the daft-lance extension
pip install daft-lance

# Install daft with the daft-lance extension
pip install 'daft[lance]'
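
To confirm that the install succeeded, the installed version can be looked up from Python. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the only assumption is that the distribution is named `daft-lance`, as in the pip commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "daft-lance" is the distribution name used in the pip commands above.
print("daft-lance:", installed_version("daft-lance") or "not installed")
```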

Usage

Compaction

from daft_lance import compact_files

compact_files("s3://bucket/my_dataset")

Scalar Indexing

from daft_lance import create_scalar_index

create_scalar_index("s3://bucket/my_dataset", column="name", index_type="INVERTED")

Column Merging

from daft_lance import merge_columns_df

merge_columns_df(df, "s3://bucket/my_dataset")

Migration

Migrating only requires replacing every reference to daft.io.lance with daft_lance.

# Preview the rewrite for every .py file under the current directory (prints the rewritten files to stdout; nothing is modified)
find . -type f -name "*.py" -exec sed 's/daft\.io\.lance/daft_lance/g' {} +

# Apply the changes in place (GNU sed syntax; BSD/macOS sed needs -i '')
find . -type f -name "*.py" -exec sed -i 's/daft\.io\.lance/daft_lance/g' {} +
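
Where sed is unavailable, or a diff-style preview is preferred, the same substitution can be sketched in Python. This is a hedged, standard-library-only equivalent of the commands above, not part of daft-lance; `migrate_source` and `migrate_tree` are hypothetical helper names, and the sketch assumes the source files are UTF-8 encoded.

```python
import difflib
import re
from pathlib import Path

# Same pattern as the sed command: the dots are escaped, so only the
# literal module path "daft.io.lance" is rewritten.
OLD_MODULE = re.compile(r"daft\.io\.lance")
NEW_MODULE = "daft_lance"


def migrate_source(text: str) -> str:
    """Return text with every daft.io.lance reference rewritten to daft_lance."""
    return OLD_MODULE.sub(NEW_MODULE, text)


def migrate_tree(root: str, apply: bool = False) -> list:
    """Rewrite *.py files under root and return one unified diff per changed file.

    With apply=False (the default) nothing is written, so the diffs act as a preview.
    """
    diffs = []
    for path in sorted(Path(root).rglob("*.py")):
        before = path.read_text(encoding="utf-8")  # assumes UTF-8 sources
        after = migrate_source(before)
        if before == after:
            continue
        diff = difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=str(path),
            tofile=str(path),
        )
        diffs.append("".join(diff))
        if apply:
            path.write_text(after, encoding="utf-8")
    return diffs
```

Calling `migrate_tree(".")` returns the diffs without touching any file; `migrate_tree(".", apply=True)` writes the changes.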

Blob Support

The daft_lance extension supports Lance Blob V2 by reading blob descriptors into the following Daft datatype. Note that daft.read_lance does NOT materialize the Blob V2 bytes; it returns only the descriptors.

{
  kind: uint8,
  position: uint64,
  size: uint64,
  blob_id: uint32,
  blob_uri: string,
}
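
To make the layout concrete, the sketch below checks a plain Python dict against the field names and integer widths listed above. It is illustrative only and not part of daft-lance: `descriptor_problems` is a hypothetical helper, the example values are made up, and it assumes every field is populated, since the source does not state which fields may be null.

```python
# Unsigned integer fields and their bit widths, copied from the descriptor layout above.
UINT_BITS = {"kind": 8, "position": 64, "size": 64, "blob_id": 32}
EXPECTED_FIELDS = set(UINT_BITS) | {"blob_uri"}


def descriptor_problems(desc: dict) -> list:
    """Return human-readable problems; an empty list means the layout matches."""
    problems = []
    for name in sorted(EXPECTED_FIELDS ^ set(desc)):
        problems.append(f"missing or unexpected field: {name}")
    for name, bits in UINT_BITS.items():
        if name not in desc:
            continue
        value = desc[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_uint = isinstance(value, int) and not isinstance(value, bool)
        if not is_uint or not 0 <= value < 2**bits:
            problems.append(f"{name}={value!r} does not fit in uint{bits}")
    if "blob_uri" in desc and not isinstance(desc["blob_uri"], str):
        problems.append("blob_uri is not a string")
    return problems


# Made-up values for illustration only; real descriptors come from daft.read_lance.
example = {"kind": 0, "position": 4096, "size": 1024, "blob_id": 7, "blob_uri": ""}
print(descriptor_problems(example))                   # []
print(descriptor_problems({**example, "kind": 256}))  # ['kind=256 does not fit in uint8']
```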

To materialize blobs, read the dataset with row IDs enabled and call take_blobs:

import lance
import daft
from daft_lance import take_blobs

ds = lance.dataset("s3://bucket/my_dataset")
df = daft.read_lance(ds.uri, default_scan_options={"with_row_id": True})
df = take_blobs(df, ds, "blob_column")

# each value is a lance.Blob; call .read() to fetch its bytes
blobs = df.select("blob_column").to_pydict()["blob_column"]
data = blobs[0].read()

To write binary columns as Lance Blob V2, use the blob_columns opt-in:

import daft

df = daft.from_pydict({"id": [1, 2, 3], "data": [b"...", b"...", b"..."]})
df.write_lance("s3://bucket/my_dataset", blob_columns=["data"]).collect()

Development

Requires uv.

uv sync
uv run pytest tests/ -v

License

Apache-2.0
