
Python bindings for milvus-storage - a high-performance storage engine using Apache Arrow Parquet


milvus-storage Python Bindings

Python bindings for milvus-storage, a high-performance storage engine using Apache Arrow Parquet as the underlying format, optimized for analytical workloads.

Features

  • High Performance: Built on Apache Arrow and Parquet for efficient columnar storage
  • Packed Storage: Groups narrow columns together to reduce file count and control memory usage
  • Cloud Native: Support for AWS S3, Azure Blob Storage, Google Cloud Storage, and more
  • Zero-Copy: Efficient data transfer between Python and C++ using Arrow C Data Interface
  • Pythonic API: Clean, intuitive interface following Python best practices
  • Type Safe: Full PyArrow integration with schema validation

Installation

Prerequisites

  • Python 3.8 or later
  • C++ compiler (for building from source)
  • Conan (for C++ dependencies)

Install from Source

# Clone the repository
git clone https://github.com/milvus-io/milvus-storage.git
cd milvus-storage

# Build the C++ library
cd cpp
make python-lib
cd ..

# Install Python package
cd python
pip install -e .
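
After the editable install, a quick check that the package is visible to the interpreter can save some debugging. The sketch below assumes the import name is `milvus_storage` (the name used in the coverage command in the Testing section); `is_installed` is a hypothetical helper, not part of the package.

```python
import importlib.util


def is_installed(module_name: str = "milvus_storage") -> bool:
    """Return True if a top-level module can be located on the import path."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    status = "found" if is_installed() else "not found"
    print(f"milvus_storage: {status}")
```

This only confirms that the module can be located; it does not load the native library, so a broken C++ build would still surface on the first real import.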

Examples

See the examples/ directory for complete working examples.

API Reference

Writer

Create a writer to store data in milvus-storage format.

Writer(path: str, schema: pa.Schema, properties: Optional[Dict[str, str]] = None)

Methods:

  • write(batch: pa.RecordBatch) - Write a record batch
  • flush() - Flush buffered data to storage
  • close() -> str - Close writer and return manifest JSON
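
The method list above suggests a write, flush, close lifecycle. The following is a minimal sketch of that sequence, assuming the methods behave as described; `write_all` is a hypothetical helper, and `StubWriter` is a stand-in so the sketch runs without the native library. It is not the real `Writer`, and whether `flush()` is required before `close()` is not stated in this document.

```python
from typing import Any, Iterable, List


def write_all(writer: Any, batches: Iterable[Any]) -> str:
    """Write every batch, flush, then close and return the manifest JSON."""
    for batch in batches:
        writer.write(batch)
    writer.flush()         # may be redundant before close(); an assumption
    return writer.close()  # documented to return the manifest JSON string


class StubWriter:
    """Stand-in exposing the documented method names; NOT the real Writer."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def write(self, batch: Any) -> None:
        self.calls.append("write")

    def flush(self) -> None:
        self.calls.append("flush")

    def close(self) -> str:
        self.calls.append("close")
        return "{}"  # placeholder manifest, not a real one


stub = StubWriter()
manifest = write_all(stub, ["batch-0", "batch-1"])
```

With the real package, the same helper would presumably receive a `Writer(path, schema, properties)` instance and `pa.RecordBatch` objects. The returned manifest string is what the `Reader` constructor takes as its first argument, so it is worth persisting.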

Reader

Read data from milvus-storage datasets.

Reader(
    manifest: str,
    schema: pa.Schema,
    columns: Optional[List[str]] = None,
    properties: Optional[Dict[str, str]] = None
)

Methods:

  • scan(predicate: Optional[str] = None) -> pa.RecordBatchReader - Full table scan
  • take(indices: Union[List[int], np.ndarray], parallelism: int = 1) -> pa.RecordBatch - Random access (not yet implemented)
  • get_chunk_reader(column_group_id: int) -> ChunkReader - Get chunk reader for column group
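
Since `scan()` is documented to return a `pa.RecordBatchReader`, its result can be consumed one batch at a time rather than materialized in full. The sketch below assumes only that behavior; `count_rows`, `StubReader`, and `StubBatch` are hypothetical names used so the example runs without the native library. The predicate syntax is not documented here, so the example leaves it as `None`.

```python
from typing import Any, Iterator, Optional


def iter_batches(reader: Any, predicate: Optional[str] = None) -> Iterator[Any]:
    """Yield record batches from reader.scan() one at a time."""
    for batch in reader.scan(predicate):
        yield batch


def count_rows(reader: Any, predicate: Optional[str] = None) -> int:
    """Sum num_rows over all scanned batches without holding them in memory."""
    return sum(batch.num_rows for batch in iter_batches(reader, predicate))


class StubBatch:
    """Stand-in for pa.RecordBatch; only num_rows is modeled."""

    def __init__(self, num_rows: int) -> None:
        self.num_rows = num_rows


class StubReader:
    """Stand-in exposing scan(); NOT the real Reader."""

    def scan(self, predicate: Optional[str] = None) -> Iterator[StubBatch]:
        return iter([StubBatch(3), StubBatch(5)])


total = count_rows(StubReader())
```

Streaming keeps memory use bounded by the batch size (see `storage.batch.size` under Properties) instead of the dataset size.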

Properties

Configuration properties for milvus-storage.

Common Properties:

Property | Description | Default
fs.storage_type | Storage type (local, s3, azure, etc.) | -
fs.root_path | Root path for local storage | -
storage.memory.limit | Memory limit in bytes | -
storage.row_group.max_size | Max row group size | -
storage.batch.size | Batch size for reading | 8192
storage.s3.access_key_id | AWS access key | -
storage.s3.secret_access_key | AWS secret key | -
storage.s3.region | AWS region | -
storage.azure.account_name | Azure account name | -
storage.azure.account_key | Azure account key | -
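
Because `Writer` and `Reader` take properties as `Dict[str, str]`, every value has to be a string, including numeric ones. The sketch below builds such dictionaries from the keys in the table above. `local_properties` and `s3_properties` are hypothetical helpers, and reading credentials from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables is an assumption of this example, not documented behavior of the package.

```python
import os
from typing import Dict, Mapping, Optional


def local_properties(root_path: str, batch_size: Optional[int] = None) -> Dict[str, str]:
    """Properties for local storage; numeric values are converted to strings."""
    props = {"fs.storage_type": "local", "fs.root_path": root_path}
    if batch_size is not None:
        props["storage.batch.size"] = str(batch_size)
    return props


def s3_properties(region: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Properties for S3; credentials come from env (an assumption, see above)."""
    return {
        "fs.storage_type": "s3",
        "storage.s3.access_key_id": env["AWS_ACCESS_KEY_ID"],
        "storage.s3.secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "storage.s3.region": region,
    }


local = local_properties("/tmp/data", batch_size=4096)
```

Keeping credentials out of source files and passing them in through a mapping also makes the helper easy to exercise with fake values.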

Testing

Run tests with pytest:

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=milvus_storage --cov-report=html

Development

Building from Source

# Build C++ library
cd cpp
make python-lib

# Install in development mode
cd ../python
pip install -e ".[dev]"

# Run tests
pytest tests/

Requirements

  • Python >= 3.8
  • pyarrow >= 10.0.0
  • numpy >= 1.20.0

License

Apache License 2.0. See LICENSE for details.

Contributing

Contributions are welcome! Please see the main repository for contribution guidelines.

