Python bindings for milvus-storage - a high-performance storage engine using Apache Arrow Parquet
milvus-storage Python Bindings
Python bindings for milvus-storage, a high-performance storage engine using Apache Arrow Parquet as the underlying format, optimized for analytical workloads.
Features
- High Performance: Built on Apache Arrow and Parquet for efficient columnar storage
- Packed Storage: Groups narrow columns together to reduce file count and control memory usage
- Cloud Native: Support for AWS S3, Azure Blob Storage, Google Cloud Storage, and more
- Zero-Copy: Efficient data transfer between Python and C++ using Arrow C Data Interface
- Pythonic API: Clean, intuitive interface following Python best practices
- Type Safe: Full PyArrow integration with schema validation
Installation
Prerequisites
- Python 3.8 or later
- C++ compiler (for building from source)
- Conan (for C++ dependencies)
Install from Source
# Clone the repository
git clone https://github.com/milvus-io/milvus-storage.git
cd milvus-storage
# Build the C++ library
cd cpp
make python-lib
cd ..
# Install Python package
cd python
pip install -e .
Examples
See the examples/ directory for complete working examples:
- basic_write.py - Writing data to milvus-storage
- basic_read.py - Reading data with full table scan, filtering, and column projection
API Reference
Writer
Create a writer to store data in milvus-storage format.
Writer(path: str, schema: pa.Schema, properties: Optional[Dict[str, str]] = None)
Methods:
- write(batch: pa.RecordBatch) - Write a record batch
- flush() - Flush buffered data to storage
- close() -> str - Close writer and return manifest JSON
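The write, flush, close sequence above can be sketched as a small helper. This is a minimal sketch, not the project's own example: `write_all` and `save_manifest` are hypothetical names, the writer is passed in so the helper does not depend on how the package exposes `Writer`, and the import path shown in the trailing comment is an assumption.

```python
from pathlib import Path
from typing import Any, Iterable


def write_all(writer: Any, batches: Iterable[Any]) -> str:
    """Write every batch, flush, then close the writer.

    `writer` is any object exposing write(), flush() and close() -> str,
    as listed for Writer above. Returns the manifest JSON string.
    """
    for batch in batches:
        writer.write(batch)
    writer.flush()
    return writer.close()  # close() hands back the manifest as a JSON string


def save_manifest(manifest_json: str, path: str) -> None:
    """Persist the manifest text so a Reader can be constructed from it later."""
    Path(path).write_text(manifest_json, encoding="utf-8")


# Hypothetical usage with the real bindings (import path is an assumption):
#   from milvus_storage import Writer
#   writer = Writer("/tmp/dataset", schema, properties={"fs.storage_type": "local"})
#   save_manifest(write_all(writer, [batch]), "/tmp/dataset.manifest.json")
```

Keeping the manifest as a string matters because the Reader constructor takes the manifest as a `str`.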
Reader
Read data from milvus-storage datasets.
Reader(
    manifest: str,
    schema: pa.Schema,
    columns: Optional[List[str]] = None,
    properties: Optional[Dict[str, str]] = None
)
Methods:
- scan(predicate: Optional[str] = None) -> pa.RecordBatchReader - Full table scan
- take(indices: Union[List[int], np.ndarray], parallelism: int = 1) -> pa.RecordBatch - Random access (not yet implemented)
- get_chunk_reader(column_group_id: int) -> ChunkReader - Get chunk reader for column group
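Because `scan()` returns a `pa.RecordBatchReader`, results can be consumed one batch at a time instead of being loaded into memory at once. The sketch below illustrates that pattern; `count_rows` is a hypothetical helper, the reader is passed in rather than constructed, and the import path in the trailing comment is an assumption. The predicate string syntax is not documented here, so the example leaves it as `None`.

```python
from typing import Any, Optional


def count_rows(reader: Any, predicate: Optional[str] = None) -> int:
    """Stream a scan and sum the row counts of the returned batches.

    `reader` is any object whose scan() returns an iterable of record
    batches, each exposing `num_rows` (as pyarrow.RecordBatch does).
    """
    total = 0
    for batch in reader.scan(predicate=predicate):
        total += batch.num_rows
    return total


# Hypothetical usage with the real bindings (import path is an assumption):
#   from milvus_storage import Reader
#   reader = Reader(manifest_json, schema, columns=["id"])
#   print(count_rows(reader))
```

Passing `columns` to the Reader constructor restricts the scan to the listed columns, which is the column projection mentioned in the examples.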
Properties
Configuration properties for milvus-storage.
Common Properties:
| Property | Description | Default |
|---|---|---|
| fs.storage_type | Storage type (local, s3, azure, etc.) | - |
| fs.root_path | Root path for local storage | - |
| storage.memory.limit | Memory limit in bytes | - |
| storage.row_group.max_size | Max row group size | - |
| storage.batch.size | Batch size for reading | 8192 |
| storage.s3.access_key_id | AWS access key | - |
| storage.s3.secret_access_key | AWS secret key | - |
| storage.s3.region | AWS region | - |
| storage.azure.account_name | Azure account name | - |
| storage.azure.account_key | Azure account key | - |
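Both Writer and Reader accept properties as `Dict[str, str]`, so numeric settings have to be passed as strings. The following sketch builds such dictionaries from the keys in the table above. The helper names are hypothetical, and reading credentials from the standard AWS environment variables is an assumption about deployment, not something the bindings do themselves.

```python
import os
from typing import Dict, Mapping, Optional


def local_properties(root_path: str, batch_size: int = 8192) -> Dict[str, str]:
    """Properties for local storage; every value is converted to str."""
    return {
        "fs.storage_type": "local",
        "fs.root_path": root_path,
        "storage.batch.size": str(batch_size),
    }


def s3_properties(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Properties for S3, with credentials taken from environment variables.

    Assumes AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION
    are set; a missing variable raises KeyError rather than passing an
    empty credential through.
    """
    env = os.environ if env is None else env
    return {
        "fs.storage_type": "s3",
        "storage.s3.access_key_id": env["AWS_ACCESS_KEY_ID"],
        "storage.s3.secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "storage.s3.region": env["AWS_DEFAULT_REGION"],
    }
```

Taking the environment as an optional argument keeps secrets out of source files and makes the helper easy to exercise with a plain dictionary.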
Testing
Run tests with pytest:
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=milvus_storage --cov-report=html
Development
Building from Source
# Build C++ library
cd cpp
make python-lib
# Install in development mode
cd ../python
pip install -e ".[dev]"
# Run tests
pytest tests/
Requirements
- Python >= 3.8
- pyarrow >= 10.0.0
- numpy >= 1.20.0
License
Apache License 2.0. See LICENSE for details.
Contributing
Contributions are welcome! Please see the main repository for contribution guidelines.
Support
- GitHub Issues: milvus-storage issues
- Documentation: GitHub Repository
Download files
File details
Details for the file milvus_storage-0.1.0.dev1.tar.gz.
File metadata
- Download URL: milvus_storage-0.1.0.dev1.tar.gz
- Upload date:
- Size: 92.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af0ae58ccde500e776591a1f105651011049142419a366072d98b679a1d813cf |
| MD5 | c15700bf79c6aa7cbb4ecd10f002d914 |
| BLAKE2b-256 | 80371c3b487ba22749d176bdef1f1646c5f2c80bc54c1ae4e76725a312032ea5 |
File details
Details for the file milvus_storage-0.1.0.dev1-py3-none-manylinux_2_35_x86_64.whl.
File metadata
- Download URL: milvus_storage-0.1.0.dev1-py3-none-manylinux_2_35_x86_64.whl
- Upload date:
- Size: 21.0 MB
- Tags: Python 3, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 710f90b471d352d3a613e2addda2235c4b4d72fe0281db75ee409d9b9abd3bda |
| MD5 | 871e7eba2694dbd50d9ea144db750d98 |
| BLAKE2b-256 | c6812f093e80d47001caacef37794749471abd484eb27b44048093c221085e7f |
File details
Details for the file milvus_storage-0.1.0.dev1-py3-none-any.whl.
File metadata
- Download URL: milvus_storage-0.1.0.dev1-py3-none-any.whl
- Upload date:
- Size: 20.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb67d5c2626844e5bf59ff38d12669de959c94ff880cbdd4b6fc2a2ef15ce218 |
| MD5 | 221b596c93ed1358b64923831e05aca3 |
| BLAKE2b-256 | 6597074fb0eac2289d60e8f540d35a271852276e84f4fe1d72ccf2fa6529095c |