
Python extension for Lance

Project description

Python bindings for Lance file format

Lance is a cloud-native columnar data format designed for managing large-scale computer vision datasets in production environments. Lance delivers blazing fast performance for image and video data use cases from analytics to point queries to training scans.

Why use Lance

You should use Lance if you're an ML engineer looking to be 10x more productive when working with computer vision datasets:

  1. Lance saves you from having to manage multiple systems and formats for metadata, raw assets, labeling updates, and vector indices.
  2. Lance's custom column encoding means you don't need to choose between fast analytics and fast point queries.
  3. Lance has a first-class Apache Arrow integration, so it's easy to create and query Lance datasets (e.g., you can query Lance datasets directly with DuckDB, no extra work needed).
  4. Did we mention Lance is fast?

Try Lance

Install Lance (the pylance package) and DuckDB from pip (use a venv, not Conda):

pip install pylance duckdb

In Python:

import lance
import duckdb

# Understand the label distribution of the Oxford Pet dataset
ds = lance.dataset("s3://eto-public/datasets/oxford_pet/pet.lance")
# DuckDB resolves the Python variable `ds` by name and scans it through Arrow
table = duckdb.query("select label, count(1) from ds group by label").to_arrow_table()
print(table)

Caveat emptor

  • DON'T use Conda: it puts its own libraries (libstdc++, etc.) first on the ld path, which can clash with the libraries Lance is built against.
  • Currently only wheels are published on PyPI; there is no sdist. See below for instructions on building from source.
  • Python 3.8-3.10 are supported on Linux x86_64.
  • Python 3.10 is supported on macOS (both x86_64 and Arm64).
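If you're unsure whether your environment matches these caveats, a small standard-library check can flag the obvious problems. This is a sketch with a hypothetical helper name (`lance_env_warnings`); it encodes only the platform/version combinations listed above and uses the presence of a `conda-meta` directory under `sys.prefix` as a heuristic for a Conda environment.

```python
import os
import platform
import sys


def lance_env_warnings(version=None, system=None, machine=None, prefix=None):
    """Hypothetical helper: list which of the caveats above apply here."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    prefix = sys.prefix if prefix is None else prefix

    warnings = []
    # Heuristic: Conda environments keep a conda-meta directory under their prefix
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        warnings.append("running inside a Conda environment")

    if system == "Linux":
        supported = machine == "x86_64" and (3, 8) <= version <= (3, 10)
    elif system == "Darwin":
        # macOS reports Apple Silicon as "arm64"
        supported = machine in ("x86_64", "arm64") and version == (3, 10)
    else:
        supported = False

    if not supported:
        warnings.append(
            f"Python {version[0]}.{version[1]} on {system}/{machine} "
            "is not a supported combination"
        )
    return warnings


if __name__ == "__main__":
    for warning in lance_env_warnings():
        print("warning:", warning)
```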

Developing Lance

Install python3, pip, and venv, and set up a virtual environment for Lance. Again, DO NOT USE CONDA (at least for now).

sudo apt install python3-pip python3-venv python3-dev
python3 -m venv ${HOME}/.venv/lance
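The build steps below activate this environment with source ${HOME}/.venv/lance/bin/activate. To confirm from Python that a virtual environment is actually in use (rather than the system interpreter), you can compare `sys.prefix` with `sys.base_prefix`; inside a venv they differ. A minimal sketch, with a hypothetical helper name:

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


print("virtualenv active:", in_virtualenv())
print("prefix:", sys.prefix)
```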

Arrow C++ libs

Install the Arrow C++ libraries using the instructions from Apache Arrow. Those instructions don't include Arrow's Python library, so after going through them, don't forget to apt install libarrow-python-dev (or yum install libarrow-python-devel).
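To check that the system can locate both the Arrow C++ library (libarrow) and the Python support library installed by libarrow-python-dev (libarrow_python), a standard-library sketch like the one below may help. It relies on `ctypes.util.find_library`, which only approximates the dynamic linker's search; `None` means the library wasn't found there. The helper name `find_arrow_libs` is hypothetical.

```python
import ctypes.util


def find_arrow_libs(names=("arrow", "arrow_python")):
    """Map each library name to what the system library search finds (None if missing)."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in find_arrow_libs().items():
        print(f"lib{name}: {found or 'NOT FOUND'}")
```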

Build pyarrow

Assuming your current working directory is where you want to put the repo:

source ${HOME}/.venv/lance/bin/activate
cd /path/to/lance/python/thirdparty
./build.sh

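Make sure pyarrow works properly; these imports should succeed without errors:

import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

print(pa.__version__)  # version of the pyarrow build you just made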

Build Lance

  1. Build the C++ library. See lance/cpp/README.md for instructions.
  2. Build the python module in venv:
source ${HOME}/.venv/lance/bin/activate
python setup.py develop

Test the installation using the same queries as in the Try Lance section.
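Before running those queries (which need access to the public S3 bucket), a quick offline smoke test can confirm that Python finds the lance module and that the pylance distribution is registered. This is a sketch with a hypothetical helper (`check_install`); it only checks discoverability and does not import or exercise the compiled extension.

```python
import importlib.metadata
import importlib.util


def check_install(module="lance", dist="pylance"):
    """Return (is the module findable?, installed distribution version or None)."""
    found = importlib.util.find_spec(module) is not None
    try:
        version = importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return found, version


if __name__ == "__main__":
    found, version = check_install()
    print(f"lance findable: {found}, pylance version: {version}")
```

importlib.metadata is available from Python 3.8, which matches the supported versions above.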


Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

pylance-0.0.4-cp310-cp310-macosx_11_0_arm64.whl (12.9 MB)

Uploaded CPython 3.10 macOS 11.0+ ARM64

pylance-0.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

pylance-0.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

File details

Details for the file pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0109cf5be2771f39db5d913f5daeca53bdaf5fa70761f70f2b9e5717e419b389
MD5 3476632c893bdba93edf78d449a01456
BLAKE2b-256 dbb4f4e37470123b0fc39d1ba9c9b27d9fd7c99b4b9d2d71bb5b6d0be94a2885

File details

Details for the file pylance-0.0.4-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for pylance-0.0.4-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 aabd1ccdd14615e8205fbf55364da16f9e5f1f965f653130aece68eea7924ca1
MD5 af436a2e0f716cf080bda9f1aca8d95f
BLAKE2b-256 c40af851ed15d1296d93398473e8aa97b4c262f6ae38d6e8e969c57ac17d7540

File details

Details for the file pylance-0.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pylance-0.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9322629b68a4a6db5c5926a91ce57d24e18f4f392dbb199663c44ff1ea0b5b8c
MD5 ab86427d4eff06c2edc3210042c4f192
BLAKE2b-256 5bdc601bce146c2fea8c0596e4ab7c0087d5150d3329006a530654655d913fa7

File details

Details for the file pylance-0.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pylance-0.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a283427d48b96bf3b1df9306fe91b77dc768dfcc5cf842ea8deb90f21810ff9d
MD5 3ba0ec4ba87e6865bb9876f0cf35ed1b
BLAKE2b-256 d0fda80bb8ac4acfcf35ea9b07154d33d936baecc248cf9db245349c86366526
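To check a downloaded wheel against the SHA256 digests listed above, you can hash the file with the standard library and compare. The sketch below streams the file in chunks so large wheels don't need to fit in memory; the function names are hypothetical.

```python
import hashlib


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=1 << 20):
    with open(path, "rb") as f:
        return sha256_of_stream(f, chunk_size)


def verify_wheel(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify_wheel("pylance-0.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", "a283427d48b96bf3b1df9306fe91b77dc768dfcc5cf842ea8deb90f21810ff9d") should return True for an intact download of that wheel. pip can also enforce this itself in hash-checking mode: put a line such as pylance==0.0.4 --hash=sha256:<digest for your platform's wheel> in a requirements file and install with pip install --require-hashes -r requirements.txt.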
