Skip to main content

Python extension for lance

Project description

Python bindings for Lance file format

Lance is a cloud-native columnar data format designed for managing large-scale computer vision datasets in production environments. Lance delivers blazing fast performance for image and video data use cases from analytics to point queries to training scans.

Why use Lance

You should use lance if you're a ML engineer looking to be 10x more productive when working with computer vision datasets:

  1. Lance saves you from having to manage multiple systems and formats for metadata, raw assets, labeling updates, and vector indices.
  2. Lance's custom column encoding means you don't need to choose between fast analytics and fast point queries.
  3. Lance has a first-class Apache Arrow integration so it's easy to create and query Lance datasets (e.g., you can directly query lance datasets using DuckDB with no extra work)
  4. Did we mention Lance is fast.

Try Lance

Install Lance from pip (use a venv, not conda):

pip install pylance duckdb

In python:

import lance
import duckdb

# Understand Label distribution of Oxford Pet Dataset
ds = lance.dataset("s3://eto-public/datasets/oxford_pet/pet.lance")
duckdb.query('select label, count(1) from ds group by label').to_arrow_table()

Caveat emptor

  • DON'T use Conda as it prefers it's on ld path and libstd etc
  • Currently only wheels are on pypi and no sdist. See below for instructions on building from source.
  • Python 3.8-3.10 is supported on Linux x86_64
  • Python 3.10 on MacOS (both x86_64 and Arm64) is supported

Developing Lance

Install python3, pip, and venv, and setup a virtual environment for Lance. Again, DO NOT USE CONDA (at least for now).

sudo apt install python3-pip python3-venv python3-dev
python3 -m venv ${HOME}/.venv/lance

Arrow C++ libs

Install Arrow C++ libs using instructions from Apache Arrow. These instructions don't include Arrow's python lib so after you go through the above, don't forget to apt install libarrow-python-dev or yum install libarrow-python-devel.

Build pyarrow

Assume CWD is where you want to put the repo:

source ${HOME}/.venv/lance/bin/activate
cd /path/to/lance/python/thirdparty
./build.sh

Make sure pyarrow works properly:

import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

Build Lance

  1. Build the cpp lib. See lance/cpp/README.md for instructions.
  2. Build the python module in venv:
source ${HOME}/.venv/lance/bin/activate
python setup.py develop

Test the installation using the same queries in Try Lance section.

Project details


Release history Release notifications | RSS feed

This version

0.0.3

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

pylance-0.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

pylance-0.0.3-cp310-cp310-macosx_11_0_arm64.whl (12.7 MB view details)

Uploaded CPython 3.10 macOS 11.0+ ARM64

pylance-0.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

pylance-0.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

File details

Details for the file pylance-0.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pylance-0.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e2af1a466d28506acea53d53195c379d8073dd781ff101384c204baf4f7c33a9
MD5 fe87ff3605a08c47bc5d30526a958c24
BLAKE2b-256 049b6dd9c56c76586754dbd86febae059b0e15fe717f7a30be06c6ed6583390d

See more details on using hashes here.

Provenance

File details

Details for the file pylance-0.0.3-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pylance-0.0.3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ba4219b6ef17a0038211535fbebd072810ab3bd0b1ec40480e9462a69cd8789a
MD5 c9b5436aab55c1f23f7406c6fc387a5f
BLAKE2b-256 612eeac62e33253b2fb3bd08f1aed38a49025fdf6f0eb7fd3ce85c4e96cce857

See more details on using hashes here.

Provenance

File details

Details for the file pylance-0.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pylance-0.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b4c15c5ebd85a1dd482e7ca78dc61dc2586a643e760bdb8c25f6e075c38ecb63
MD5 4caca4c0d20b8197f12c07f783e02a88
BLAKE2b-256 a8262c829354aa5a3e86ed36f3042aa49e0bf75639a78d81d7eafcf0c8d21f44

See more details on using hashes here.

Provenance

File details

Details for the file pylance-0.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pylance-0.0.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0c54c7235b75bb802958335792074c12a8ef931f69a1bdc2f1283db022b5e909
MD5 c7590b8b9283629b5d1e04b7f8f4c693
BLAKE2b-256 6998dd24f9decfe2cf9b20dc1452eb43da35bf35ebd5f192b2f755e822713e7a

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page