Alluxio Fsspec provides Alluxio filesystem spec implementation.

These details have been verified by PyPI

Maintainers

alluxioSsy jiamingmai littleEast8 lucy_alluxio luqqiu

Project description

Alluxio FileSystem

This quickstart shows how to use the fsspec interface to connect to Alluxio. For more background, see the blog post "Accelerate data loading in large scale ML training with Ray and Alluxio".

Dependencies

A running Alluxio server with ETCD membership service

Alluxio version >= 309

Launch Alluxio clusters with the example configuration

# only one master, one worker are running in this example
alluxio.master.hostname=localhost
alluxio.worker.hostname=localhost

# Critical properties for this example
# UFS address (e.g., the src of data to cache), change it to your bucket
alluxio.dora.client.ufs.root=s3://example_bucket/datasets/
# storage dir
alluxio.worker.page.store.dirs=/tmp/page_ufs
# size of storage dir
alluxio.worker.page.store.sizes=10GB
# use etcd to keep consistent hashing ring
alluxio.worker.membership.manager.type=ETCD
# default etcd endpoint
alluxio.etcd.endpoints=http://localhost:2379
# number of vnodes per worker on the ring
alluxio.user.consistent.hash.virtual.node.count.per.worker=5

# Other optional settings, good to have
alluxio.job.batch.size=200
alluxio.master.journal.type=NOOP
alluxio.master.scheduler.initial.wait.time=10s
alluxio.network.netty.heartbeat.timeout=5min
alluxio.underfs.io.threads=50
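
The membership and virtual-node properties above refer to a consistent hashing ring. As a rough illustration of the idea only, the sketch below places each worker on a ring several times and maps a file path to the first virtual node after the path's hash. This is not Alluxio's actual hashing code: the `HashRing` class, the MD5-based hash, and the worker names are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, workers, vnodes_per_worker=5):
        if not workers:
            raise ValueError("at least one worker is required")
        # Each worker appears vnodes_per_worker times at pseudo-random positions.
        self._ring = sorted(
            (_hash(f"{worker}#{i}"), worker)
            for worker in workers
            for i in range(vnodes_per_worker)
        )
        self._points = [point for point, _ in self._ring]

    def worker_for(self, path: str) -> str:
        """Return the worker owning the first virtual node after the path's hash."""
        idx = bisect.bisect_right(self._points, _hash(path)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["worker-a", "worker-b", "worker-c"], vnodes_per_worker=5)
    print(ring.worker_for("s3://example_bucket/datasets/example.csv"))
```

With this layout, removing one worker only reassigns the paths that worker owned; paths owned by the remaining workers stay where they were, which is why a shared, consistent view of the ring matters for cache hits.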

Python Dependencies

- Python 3.8, 3.9, or 3.10
- ray >= 2.8.2
- fsspec released after 2023.6

Install fsspec implementation for underlying data storage

Alluxio fsspec acts as a cache on top of an existing underlying data lake storage connection. The fsspec implementation corresponding to the underlying data lake storage needs to be installed. In the below Alluxio configuration example, Amazon S3 is the data lake storage where the dataset is read from.

To connect to an existing underlying storage, there are two requirements:

1. Install the fsspec implementation for the underlying storage
- For built-in fsspec storage implementations, no extra Python libraries need to be installed.
- For third-party fsspec storage implementations, the corresponding third-party Python library must be installed.
2. Set credentials for the underlying data lake storage

Example: Deploy S3 as the underlying data lake storage

Install third-party S3 fsspec

pip install s3fs
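
For the credentials requirement, one common option with S3 is the standard AWS environment variables, which s3fs picks up through botocore. The following is a minimal sketch of a pre-flight check, assuming environment variables are the credential source in use; shared credential files or instance roles are equally valid and would not be detected by it. The helper name `missing_aws_credentials` is made up for this example.

```python
import os

# Standard AWS environment variable names (read by botocore, which s3fs builds on).
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("AWS credential variables are set.")
```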

Install alluxiofs

Directly install the latest published alluxiofs

pip install alluxiofs

[Optional] Install from the source code

git clone git@github.com:fsspec/alluxiofs.git
cd alluxiofs && python3 setup.py bdist_wheel && \
     pip3 install dist/alluxiofs-<alluxiofs_version>-py3-none-any.whl
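
After installing, it can help to confirm that the environment matches the Python dependencies listed earlier. The sketch below uses only the standard library; the helper names `python_supported` and `installed_version` are illustrative, and the supported Python versions are taken from the dependency list above.

```python
import sys
from importlib import metadata

# Python versions named in the dependency list.
SUPPORTED_PYTHONS = {(3, 8), (3, 9), (3, 10)}


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHONS


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    for name in ("alluxiofs", "fsspec", "ray"):
        print(name, installed_version(name) or "not installed")
```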

Running a Hello World Example

Load the dataset

Load dataset using Alluxio CLI load command

bin/alluxio job load --path s3://example_bucket/datasets/ --submit

This will trigger a load job asynchronously with a job ID specified. You can wait until the load finishes or check the progress of this loading process using the following command:

bin/alluxio job load --path s3://example_bucket/datasets/ --progress
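
If the load needs to be scripted, the two commands above can be wrapped from Python. This is a minimal sketch that only builds and runs the same command lines shown here; it does not parse the progress output, whose format is not described in this document. The function names and the default `bin/alluxio` location are assumptions.

```python
import subprocess


def load_command(path, action, alluxio_bin="bin/alluxio"):
    """Build the `alluxio job load` command line for 'submit' or 'progress'."""
    if action not in ("submit", "progress"):
        raise ValueError("action must be 'submit' or 'progress'")
    return [alluxio_bin, "job", "load", "--path", path, f"--{action}"]


def run_load(path, action, alluxio_bin="bin/alluxio"):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        load_command(path, action, alluxio_bin),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_load("s3://example_bucket/datasets/", "submit")` followed later by `run_load("s3://example_bucket/datasets/", "progress")` mirrors the manual steps, assuming the script runs from the Alluxio installation directory.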

Create an AlluxioFS (backed by S3)

Create the Alluxio filesystem with data backed by S3

import fsspec
from alluxiofs import AlluxioFileSystem

# Register Alluxio to fsspec
fsspec.register_implementation("alluxiofs", AlluxioFileSystem, clobber=True)

# Create Alluxio filesystem
alluxio_fs = fsspec.filesystem("alluxiofs", etcd_hosts="localhost", etcd_port=2379, target_protocol="s3")
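
The `etcd_hosts` and `etcd_port` values should point at the same etcd endpoint as `alluxio.etcd.endpoints` in the cluster configuration. As a convenience, they can be derived from the properties text rather than typed twice. The sketch below is an illustration under assumptions: the helper name is made up, and it uses only the first endpoint if several are listed separated by commas.

```python
from urllib.parse import urlparse


def etcd_kwargs_from_properties(text):
    """Extract etcd host and port from Alluxio properties text."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "alluxio.etcd.endpoints":
            first_endpoint = value.split(",")[0].strip()
            parsed = urlparse(first_endpoint)
            return {"etcd_hosts": parsed.hostname, "etcd_port": parsed.port}
    raise KeyError("alluxio.etcd.endpoints not found")


if __name__ == "__main__":
    sample = "# default etcd endpoint\nalluxio.etcd.endpoints=http://localhost:2379\n"
    print(etcd_kwargs_from_properties(sample))
```

The resulting dictionary can then be unpacked into the call shown above, for example `fsspec.filesystem("alluxiofs", target_protocol="s3", **kwargs)`.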

Run Alluxio FileSystem operations

These operations work the same way as in the fsspec examples and the alluxiofs examples. Note that read operations succeed only if the parent folder has already been loaded into Alluxio.

# list files
contents = alluxio_fs.ls("s3://apc999/datasets/nyc-taxi-csv/green-tripdata/", detail=True)

# Read files
with alluxio_fs.open("s3://apc999/datasets/nyc-taxi-csv/green-tripdata/green_tripdata_2021-01.csv", "rb") as f:
    data = f.read()
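
Because the object follows the fsspec interface, small helpers can be written against `ls` and `open` without depending on Alluxio directly. The sketch below assumes the listing entries carry the usual fsspec `size` and `type` keys when `detail=True`; the helper names are illustrative, not part of alluxiofs.

```python
def total_file_size(fs, directory):
    """Sum the sizes of the files directly under a directory."""
    entries = fs.ls(directory, detail=True)
    return sum(entry["size"] for entry in entries if entry["type"] == "file")


def read_head(fs, path, nbytes=1024):
    """Read only the first nbytes of a file instead of the whole object."""
    with fs.open(path, "rb") as f:
        return f.read(nbytes)
```

With the filesystem created earlier, `read_head(alluxio_fs, "s3://apc999/datasets/nyc-taxi-csv/green-tripdata/green_tripdata_2021-01.csv", 200)` would return the start of the CSV, which is usually enough to inspect the header.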

Running an example with Ray

import fsspec
import ray
from alluxiofs import AlluxioFileSystem

# Register the Alluxio fsspec implementation
fsspec.register_implementation("alluxiofs", AlluxioFileSystem, clobber=True)
alluxio_fs = fsspec.filesystem(
  "alluxiofs", etcd_hosts="localhost", target_protocol="s3"
)

# Pass the initialized Alluxio filesystem to Ray and read the NYC taxi ride data set
ds = ray.data.read_csv("s3://example_bucket/datasets/example.csv", filesystem=alluxio_fs)

# Get a count of the number of records in the single CSV file
ds.count()

# Display the schema derived from the CSV file header record
ds.schema()

# Display the first record
ds.take(1)

# Display the first two records
ds.take(2)

# Read multiple CSV files:
ds2 = ray.data.read_csv("s3://apc999/datasets/csv_dir/", filesystem=alluxio_fs)

# Get a count of the number of records in the twelve CSV files
ds2.count()

# End of Python example

Enable alluxiocommon enhancement module

The alluxiocommon package is a native enhancement module for alluxiofs based on PyO3 Rust bindings. Currently it speeds up large reads (multi-page reads from Alluxio) by issuing multi-threaded requests to Alluxio.

To enable it, first install the alluxiocommon package:

pip install alluxiocommon

Then, when creating the Alluxio fsspec instance, pass an additional option flag:

alluxio_options = {"alluxio.common.extension.enable" : "True"}
alluxio_fs = fsspec.filesystem(
  "alluxiofs", etcd_hosts="localhost", target_protocol="s3",
  options=alluxio_options
)
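
To keep one script working whether or not the extension is installed, the option can be set only when the module is importable. This is a small sketch using the standard library; the helper name `build_alluxio_options` is an assumption, while the option key is the one shown above.

```python
import importlib.util


def build_alluxio_options(module_name="alluxiocommon"):
    """Enable the extension flag only if the native module can be imported."""
    options = {}
    if importlib.util.find_spec(module_name) is not None:
        options["alluxio.common.extension.enable"] = "True"
    return options


if __name__ == "__main__":
    print(build_alluxio_options())
```

The returned dictionary can be passed as `options=build_alluxio_options()` in the `fsspec.filesystem` call.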

Running examples with PyArrow

import fsspec
from alluxiofs import AlluxioFileSystem

# Register the Alluxio fsspec implementation
fsspec.register_implementation("alluxiofs", AlluxioFileSystem, clobber=True)
alluxio_fs = fsspec.filesystem(
  "alluxiofs", etcd_hosts="localhost", target_protocol="s3"
)

# Example 1
# Pass the initialized Alluxio filesystem to PyArrow and read the data set from the example Parquet file
import pyarrow.dataset as ds
dataset = ds.dataset("s3://example_bucket/datasets/example.parquet", filesystem=alluxio_fs)

# Get a count of the number of records in the Parquet file
dataset.count_rows()

# Display the schema read from the Parquet file metadata
dataset.schema

# Display the first record (take expects a list of row indices)
dataset.take([0])

# Example 2
# Create a Python-based PyArrow filesystem using FSSpecHandler
from pyarrow.fs import PyFileSystem, FSSpecHandler
py_fs = PyFileSystem(FSSpecHandler(alluxio_fs))

# Read the data by using the PyArrow filesystem interface
with py_fs.open_input_file("s3://example_bucket/datasets/example.parquet") as f:
    alluxio_file_data = f.read()

# End of Python example

Benchmark

If you want to benchmark the Python SDK against FUSE, you can run the following command:

/bin/bash benchmark_launch.sh
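
For a quick ad hoc comparison outside that script, the same file can be read through both paths and timed. The sketch below is independent of benchmark_launch.sh, whose contents are not shown here; the function names are illustrative, and the FUSE mount location in the usage note is hypothetical.

```python
import time


def measure_read(read_fn, repeats=3):
    """Call read_fn several times; return (bytes_read, best_seconds)."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        data = read_fn()
        elapsed = time.perf_counter() - start
        size = len(data)
        best = min(best, elapsed)
    return size, best


def read_via_fsspec(fs, path):
    """Read a whole file through an fsspec-compatible filesystem object."""
    with fs.open(path, "rb") as f:
        return f.read()


def read_via_local_path(local_path):
    """Read a whole file through the local filesystem (e.g. a FUSE mount)."""
    with open(local_path, "rb") as f:
        return f.read()
```

A comparison could then look like `measure_read(lambda: read_via_fsspec(alluxio_fs, "s3://example_bucket/datasets/example.csv"))` versus `measure_read(lambda: read_via_local_path("/mnt/alluxio-fuse/datasets/example.csv"))`, where the second path depends entirely on where FUSE is mounted.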


Release history

1.1.18rc10 pre-release

Jan 19, 2026

1.1.18rc9 pre-release

Jan 16, 2026

1.1.18rc8 pre-release

Jan 13, 2026

1.1.18rc7 pre-release

Jan 9, 2026

1.1.18rc6 pre-release

Jan 8, 2026

1.1.18rc5 pre-release

Jan 7, 2026

1.1.18rc4 pre-release

Jan 7, 2026

1.1.18rc3 pre-release

Dec 30, 2025

1.1.18rc2 pre-release

Dec 29, 2025

1.1.18rc1 pre-release

Dec 19, 2025

1.1.17

Nov 21, 2025

1.1.17rc8 pre-release

Dec 19, 2025

1.1.17rc7 pre-release

Dec 16, 2025

1.1.17rc6 pre-release

Dec 10, 2025

1.1.17rc5 pre-release

Dec 10, 2025

1.1.17rc4 pre-release

Dec 7, 2025

1.1.17rc3 pre-release

Dec 7, 2025

1.1.17rc2 pre-release

Dec 7, 2025

1.1.17rc1 pre-release

Dec 7, 2025

1.1.16

Nov 18, 2025

1.1.15

Nov 18, 2025

1.1.14

Nov 14, 2025

1.1.13

Nov 12, 2025

1.1.12

Nov 11, 2025

1.1.11

Nov 10, 2025

1.1.10

Nov 10, 2025

1.1.9

Nov 10, 2025

1.1.8

Nov 7, 2025

1.1.7

Nov 7, 2025

1.1.6

Nov 6, 2025

1.1.6rc2 pre-release

Nov 7, 2025

1.1.6rc1 pre-release

Nov 7, 2025

1.1.5

Nov 5, 2025

1.1.5rc6 pre-release

Nov 6, 2025

1.1.5rc5 pre-release

Nov 6, 2025

1.1.5rc4 pre-release

Nov 6, 2025

1.1.5rc3 pre-release

Nov 6, 2025

1.1.5rc2 pre-release

Nov 6, 2025

1.1.5rc1 pre-release

Nov 6, 2025

1.1.4

Nov 3, 2025

1.1.3

Oct 30, 2025

1.1.3rc1 pre-release

Oct 31, 2025

1.1.2

Oct 29, 2025

1.1.2rc2 pre-release

Oct 30, 2025

1.1.1

Oct 28, 2025

1.1.1rc1 pre-release

Oct 29, 2025

1.1.0

Oct 27, 2025

1.0.22

Oct 25, 2025

1.0.21

Oct 25, 2025

1.0.20

Oct 23, 2025

1.0.19

Oct 20, 2025

1.0.18

Oct 13, 2025

1.0.17

Aug 21, 2025

1.0.16

Aug 13, 2025

This version

1.0.15.post1

Jun 16, 2026

1.0.15

Aug 11, 2025

1.0.7

Mar 5, 2025

1.0.6

Jan 16, 2025

1.0.5

Jan 15, 2025

1.0.3

Jun 3, 2024

1.0.2

Apr 5, 2024

1.0.1

Mar 27, 2024

1.0.0

Feb 27, 2024

Download files

Download the file for your platform.

Source Distribution

alluxiofs-1.0.15.post1.tar.gz (70.2 kB)

Uploaded Jun 16, 2026 Source

Built Distribution

alluxiofs-1.0.15.post1-py3-none-any.whl (74.0 kB)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file alluxiofs-1.0.15.post1.tar.gz.

File metadata

Download URL: alluxiofs-1.0.15.post1.tar.gz
Upload date: Jun 16, 2026
Size: 70.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for alluxiofs-1.0.15.post1.tar.gz
Algorithm	Hash digest
SHA256	`9a8f4577c54bb4279e36ecf485c230a456e18188e3c0aaadb05e0fcea566b17e`
MD5	`c6cf4a84fc8dab4c95675c2cefc2382c`
BLAKE2b-256	`dffe0d26929d203dba1dae1d58ad11108af0fc1d0392deac4601e7a0877d9790`


File details

Details for the file alluxiofs-1.0.15.post1-py3-none-any.whl.

File metadata

Download URL: alluxiofs-1.0.15.post1-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 74.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for alluxiofs-1.0.15.post1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`082aee97106b2b65d7c0f9a5428c58dc4f4cbda7008d8a2d1c3a9e649766b04e`
MD5	`7a4b7593e18743b713c24d6554fa736a`
BLAKE2b-256	`f94d543e11b1dd2919377ed79bc189595d72089cfb70d0aa6fc0e2f696ffb79c`
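
A downloaded file can be checked against the SHA256 digests listed above before installing it. The following is a minimal sketch using `hashlib`; the digests are copied from the tables above, and the helper names are illustrative.

```python
import hashlib
import os

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "alluxiofs-1.0.15.post1.tar.gz":
        "9a8f4577c54bb4279e36ecf485c230a456e18188e3c0aaadb05e0fcea566b17e",
    "alluxiofs-1.0.15.post1-py3-none-any.whl":
        "082aee97106b2b65d7c0f9a5428c58dc4f4cbda7008d8a2d1c3a9e649766b04e",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, filename=None):
    """Return True if the file's digest matches the expected one for its name."""
    name = filename or os.path.basename(path)
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, `verify("alluxiofs-1.0.15.post1.tar.gz")` returns `True` only if the local file matches the published digest.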
