Lance integration for Daft - compaction, indexing, merge columns, and REST operations
daft-lance
Lance integration for Daft.
Install
# Install just the daft-lance extension
pip install daft-lance
# Install daft with the daft-lance extension
pip install 'daft[lance]'
Usage
Compaction
from daft_lance import compact_files
compact_files("s3://bucket/my_dataset")
Scalar Indexing
from daft_lance import create_scalar_index
create_scalar_index("s3://bucket/my_dataset", column="name", index_type="INVERTED")
Column Merging
from daft_lance import merge_columns_df
merge_columns_df(df, "s3://bucket/my_dataset")
Migration
The migration only requires replacing daft.io.lance with daft_lance.
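# Preview the rewritten lines in the current directory and all subdirectories
# (-n with the p flag prints only the lines that would change)
find . -type f -name "*.py" -exec sed -n 's/daft\.io\.lance/daft_lance/gp' {} +
# Apply the changes in place (GNU sed; BSD/macOS sed needs -i '' instead of -i)
find . -type f -name "*.py" -exec sed -i 's/daft\.io\.lance/daft_lance/g' {} +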
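Where sed is unavailable, or its GNU and BSD variants behave differently, the same rename can be sketched in Python with only the standard library. The helper names `rewrite_source` and `migrate_tree` are illustrative and are not part of daft_lance. The pattern mirrors the sed expression above, so it also rewrites matches inside strings and comments.

```python
import re
from pathlib import Path

# Same pattern as the sed expression: the literal text "daft.io.lance".
OLD_MODULE = re.compile(r"daft\.io\.lance")
NEW_MODULE = "daft_lance"


def rewrite_source(text: str) -> str:
    """Return text with every daft.io.lance reference replaced."""
    return OLD_MODULE.sub(NEW_MODULE, text)


def migrate_tree(root: str = ".", apply: bool = False) -> list:
    """Return the .py files under root that contain the old module path.

    With apply=False (the default) nothing is written, so the returned
    list can be reviewed first. With apply=True the files are rewritten.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated = rewrite_source(original)
        if updated != original:
            changed.append(path)
            if apply:
                path.write_text(updated, encoding="utf-8")
    return changed
```

Calling `migrate_tree(".")` lists the files that would change; `migrate_tree(".", apply=True)` rewrites them in place.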
Blob Support
The daft_lance extension supports Lance BLOB V2. Reading a blob column
yields descriptors with the following Daft struct type rather than the
blob bytes: daft.read_lance will NOT materialize Lance BLOB V2 bytes.
{
kind: uint8,
position: uint64,
size: uint64,
blob_id: uint32,
blob_uri: string,
}
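To make the field widths concrete, here is a small standard-library check that a descriptor, represented as a plain Python dict, fits the struct above. This is only a sketch: `check_descriptor` is a hypothetical helper, not a daft_lance function, and it assumes descriptors arrive as dicts when converted to Python objects. Nullability is not stated above, so `None` is let through for every field.

```python
# Field name -> bit width of the unsigned integer type listed above.
# blob_uri is a string and is checked separately.
UNSIGNED_FIELDS = {"kind": 8, "position": 64, "size": 64, "blob_id": 32}


def check_descriptor(desc: dict) -> None:
    """Raise ValueError if desc does not fit the descriptor struct."""
    expected = set(UNSIGNED_FIELDS) | {"blob_uri"}
    if set(desc) != expected:
        raise ValueError(f"expected fields {sorted(expected)}, got {sorted(desc)}")
    for name, bits in UNSIGNED_FIELDS.items():
        value = desc[name]
        if value is None:
            continue  # nullability is unspecified, so None is accepted
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer")
        if not 0 <= value < 2**bits:
            raise ValueError(f"{name}={value} does not fit in uint{bits}")
    uri = desc["blob_uri"]
    if uri is not None and not isinstance(uri, str):
        raise ValueError("blob_uri must be a string")
```

For example, a descriptor with `kind=256` is rejected because the value does not fit in a uint8.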
To materialize blobs, read the dataset with row IDs enabled and call take_blobs:
import lance
import daft
from daft_lance import take_blobs
ds = lance.dataset("s3://bucket/my_dataset")
df = daft.read_lance(ds.uri, default_scan_options={"with_row_id": True})
df = take_blobs(df, ds, "blob_column")
# each value is a lance.Blob — call .read() to fetch bytes
blobs = df.select("blob_column").to_pydict()["blob_column"]
data = blobs[0].read()
To write binary columns as Lance BLOB V2, opt in with the blob_columns argument:
import daft
df = daft.from_pydict({"id": [1, 2, 3], "data": [b"...", b"...", b"..."]})
df.write_lance("s3://bucket/my_dataset", blob_columns=["data"]).collect()
Development
Requires uv.
# Sync the development environment
make sync
# Run tests
make test
# Run linting and type checks
make lint
make typecheck
# Format code
make format
# Run all pre-commit hooks
make precommit
# Build sdist and wheel packages
make build
License
Apache-2.0
Download files
File details
Details for the file daft_lance-0.4.0.tar.gz.
File metadata
- Download URL: daft_lance-0.4.0.tar.gz
- Upload date:
- Size: 142.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47c179f402870c434030083287676e154cff282779d5871844e79ae99f2f108e |
| MD5 | f13b1c440f49e064e69dd7d65c6764f2 |
| BLAKE2b-256 | d2648a9b5e8356c86571b46c94ca97ff7d8f635ce2666da8135d4bb816aa5270 |
Provenance
The following attestation bundles were made for daft_lance-0.4.0.tar.gz:
Publisher: publish-package.yml on daft-engine/daft-lance
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: daft_lance-0.4.0.tar.gz
- Subject digest: 47c179f402870c434030083287676e154cff282779d5871844e79ae99f2f108e
- Sigstore transparency entry: 1736399743
- Sigstore integration time:
- Permalink: daft-engine/daft-lance@deeb7985f0d64f32cf1ee9696bc25d1ea97e2bc5
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/daft-engine
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-package.yml@deeb7985f0d64f32cf1ee9696bc25d1ea97e2bc5
- Trigger Event: release
File details
Details for the file daft_lance-0.4.0-py3-none-any.whl.
File metadata
- Download URL: daft_lance-0.4.0-py3-none-any.whl
- Upload date:
- Size: 40.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3f35dd9d8b90246807eec9b8b3bef8fb412c7deff301e9ec7c1aa7f93c97693 |
| MD5 | 4c5afdadf3bf96153e83d217447a8b41 |
| BLAKE2b-256 | d2f5b416b00e630940b2f7b1422782fa5ff32b712c831bdaee5c25f05be9d032 |
Provenance
The following attestation bundles were made for daft_lance-0.4.0-py3-none-any.whl:
Publisher: publish-package.yml on daft-engine/daft-lance
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: daft_lance-0.4.0-py3-none-any.whl
- Subject digest: c3f35dd9d8b90246807eec9b8b3bef8fb412c7deff301e9ec7c1aa7f93c97693
- Sigstore transparency entry: 1736399868
- Sigstore integration time:
- Permalink: daft-engine/daft-lance@deeb7985f0d64f32cf1ee9696bc25d1ea97e2bc5
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/daft-engine
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-package.yml@deeb7985f0d64f32cf1ee9696bc25d1ea97e2bc5
- Trigger Event: release