Lance integration for Daft - compaction, indexing, merge columns, and REST operations

daft-lance

Lance integration for Daft.

Install

# Install just the daft-lance extension
pip install daft-lance

# Install daft with the daft-lance extension
pip install 'daft[lance]'

Usage

Compaction

from daft_lance import compact_files

compact_files("s3://bucket/my_dataset")

Scalar Indexing

from daft_lance import create_scalar_index

create_scalar_index("s3://bucket/my_dataset", column="name", index_type="INVERTED")

Column Merging

from daft_lance import merge_columns_df

merge_columns_df(df, "s3://bucket/my_dataset")

Migration

The migration only requires replacing daft.io.lance with daft_lance.

# Print the rewritten contents of every .py file under the current directory (no files are modified)
find . -type f -name "*.py" -exec sed 's/daft\.io\.lance/daft_lance/g' {} +

# Apply the changes in place (GNU sed syntax; BSD/macOS sed needs -i '')
find . -type f -name "*.py" -exec sed -i 's/daft\.io\.lance/daft_lance/g' {} +
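
For a preview that shows only the lines that would change, the same substitution can be sketched in Python. This is a minimal illustration and not part of daft_lance: rewrite_source and preview_migration are hypothetical helper names, and the pattern mirrors the sed expression above exactly.

```python
import difflib
import re
from pathlib import Path

# Same pattern as the sed expression: a literal "daft.io.lance".
OLD_MODULE = re.compile(r"daft\.io\.lance")
NEW_MODULE = "daft_lance"


def rewrite_source(text: str) -> str:
    """Return text with every daft.io.lance occurrence replaced by daft_lance."""
    return OLD_MODULE.sub(NEW_MODULE, text)


def preview_migration(root: str = ".") -> str:
    """Build a unified diff of the pending rewrite without modifying any file."""
    chunks = []
    for path in sorted(Path(root).rglob("*.py")):
        before = path.read_text(encoding="utf-8")
        after = rewrite_source(before)
        if before != after:
            chunks.extend(
                difflib.unified_diff(
                    before.splitlines(keepends=True),
                    after.splitlines(keepends=True),
                    fromfile=str(path),
                    tofile=str(path),
                )
            )
    return "".join(chunks)
```

Calling print(preview_migration(".")) prints a unified diff for the affected files and leaves them untouched; the sed -i command above can then apply the change.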

Blob Support

The daft_lance extension supports Lance Blob V2 by reading blob descriptors into the following Daft datatype. Note that daft.read_lance does NOT materialize Lance Blob V2 bytes; it returns only the descriptors.

{
  kind: uint8,
  position: uint64,
  size: uint64,
  blob_id: uint32,
  blob_uri: string,
}
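
As a rough illustration of this layout, the sketch below checks whether a plain Python value has the five descriptor fields with values that fit the listed integer widths. It rests on assumptions: that descriptors surface as Python dicts once a DataFrame is converted with to_pydict, and that no field is null. is_blob_descriptor and DESCRIPTOR_FIELDS are hypothetical names, not part of daft_lance.

```python
# Field name -> (expected Python type, unsigned bit width or None for strings),
# mirroring the datatype listed above.
DESCRIPTOR_FIELDS = {
    "kind": (int, 8),
    "position": (int, 64),
    "size": (int, 64),
    "blob_id": (int, 32),
    "blob_uri": (str, None),
}


def is_blob_descriptor(value: object) -> bool:
    """Return True if value is a dict with exactly the descriptor fields and in-range values."""
    if not isinstance(value, dict) or set(value) != set(DESCRIPTOR_FIELDS):
        return False
    for name, (py_type, bits) in DESCRIPTOR_FIELDS.items():
        field = value[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(field, bool) or not isinstance(field, py_type):
            return False
        if bits is not None and not 0 <= field < 2**bits:
            return False
    return True


# Made-up example values, for illustration only.
example = {"kind": 0, "position": 128, "size": 4096, "blob_id": 7, "blob_uri": ""}
looks_valid = is_blob_descriptor(example)
```

A check like this can help distinguish a descriptor column from a column whose blobs have already been materialized.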

To materialize blobs, read the dataset with row IDs enabled and call take_blobs:

import lance
import daft
from daft_lance import take_blobs

ds = lance.dataset("s3://bucket/my_dataset")
df = daft.read_lance(ds.uri, default_scan_options={"with_row_id": True})
df = take_blobs(df, ds, "blob_column")

# Each value is a lance.Blob; call .read() to fetch its bytes
blobs = df.select("blob_column").to_pydict()["blob_column"]
data = blobs[0].read()
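
For large blobs, a single .read() call may hold more data in memory than desired. The sketch below assumes the blob value behaves like a standard binary file object whose read(n) returns at most n bytes and an empty result at end of data; that behavior is an assumption, not something stated here. iter_chunks is a hypothetical helper, and io.BytesIO stands in for a real blob so the example runs without a dataset.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(blob: BinaryIO, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until the blob is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = blob.read(chunk_size)
        if not chunk:
            break
        yield chunk


# io.BytesIO is a stand-in for a blob object returned by take_blobs.
fake_blob = io.BytesIO(b"abcdefghij")
chunks = list(iter_chunks(fake_blob, chunk_size=4))
```

Each chunk can be processed or written out as it arrives, so only chunk_size bytes need to be held at a time.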

To write binary columns as Lance Blob V2, use the blob_columns opt-in:

import daft

df = daft.from_pydict({"id": [1, 2, 3], "data": [b"...", b"...", b"..."]})
df.write_lance("s3://bucket/my_dataset", blob_columns=["data"]).collect()

Development

Requires uv.

# Sync the development environment
make sync

# Run tests
make test

# Run linting and type checks
make lint
make typecheck

# Format code
make format

# Run all pre-commit hooks
make precommit

# Build sdist and wheel packages
make build

License

Apache-2.0
