remote-store
Write file storage code once. Run it against local files, S3, SFTP, or Azure.
Beta software. The core API is stable, but minor versions may still contain breaking changes before 1.0. See the changelog for what's new, and open an issue if something breaks.
Most Python projects that deal with files eventually grow storage glue: small wrappers around local paths, S3 clients, SFTP connections, and cloud SDKs. Those wrappers are usually duplicated across projects, slightly inconsistent, and painful to replace later.
remote-store replaces them with one simple interface.
Write file storage code once. Run it against local files, S3, SFTP, or Azure.
Where files live is configuration, not application code.
Under the hood, established Python libraries (s3fs, paramiko, azure-storage-file-datalake) still do the work.
Who this is for
- Platform and internal tooling teams -- provide one stable storage interface across environments
- Data engineering teams -- pipelines that run against local storage, S3, or SFTP depending on the environment
- Teams that include citizen developers -- analysts and domain experts who write Python shouldn't need to learn cloud SDKs just to read and write files
- Anyone tired of rewriting storage wrappers
What you get
- One interface, many backends: local fs, S3, SFTP, Azure, in-memory
- Folder-scoped stores: each Store is rooted at a folder -- compose layouts with multiple stores or narrow scope with child()
- Swap backends via config: move between environments without changing code
- Streaming by default: large files just work without blowing up memory
- Atomic writes where supported: safer updates for file-producing workflows
- Established libraries underneath: s3fs, paramiko, etc. do the real work
- Zero runtime dependencies: backend extras pull in only what they need
- Typed and tested: strict mypy, spec-driven test suite
- Optional integrations: PyArrow filesystem adapter, OpenTelemetry tracing and metrics
What it is not
- Not a query engine (no SQL, no predicate pushdown)
- Not a table format (no Delta Lake log, no Iceberg manifests)
- Not a filesystem reimplementation (delegates to s3fs, paramiko, azure-storage-file-datalake, pyarrow -- the libraries you'd pick anyway)
Installation
Install from PyPI:
pip install remote-store
Backends that need extra dependencies use extras:
pip install "remote-store[s3]" # Amazon S3 / MinIO
pip install "remote-store[sftp]" # SFTP / SSH
pip install "remote-store[azure]" # Azure Blob / ADLS Gen2
Optional extras for tooling and config formats:
pip install "remote-store[arrow]" # PyArrow filesystem adapter
pip install "remote-store[s3-pyarrow]" # S3 with PyArrow (high-throughput)
pip install "remote-store[otel]" # OpenTelemetry tracing and metrics
pip install "remote-store[toml]" # TOML config (backport for Python 3.10)
pip install "remote-store[yaml]" # YAML config (pyyaml)
pip install "remote-store[pydantic]" # Pydantic config (pydantic-settings)
Quick Start
The simplest way to use remote-store (examples/quickstart.py):
from remote_store import Store
from remote_store.backends import LocalBackend
store = Store(LocalBackend(root="/tmp/data"))
store.write("hello.txt", b"Hello, world!")
print(store.read_bytes("hello.txt")) # b'Hello, world!'
For applications that manage multiple backends or switch between environments, use a Registry with declarative config:
from remote_store import Registry, RegistryConfig
config = RegistryConfig.from_dict({
    "backends": {"main": {"type": "local", "options": {"root": "/tmp/data"}}},
    "stores": {"data": {"backend": "main", "root_path": ""}},
})

with Registry(config) as registry:
    store = registry.get_store("data")
    store.write("hello.txt", b"Hello, world!")
    print(store.read_bytes("hello.txt"))  # b'Hello, world!'
Switch to S3 by changing the config. The application code stays the same:
Dev config:
[backends.main]
type = "local"
options = { root = "/tmp/data" }
[stores.data]
backend = "main"
root_path = "reports"
Production -- just swap the backend:
[backends.main]
type = "s3"
options = { bucket = "analytics-data" }
[stores.data]
backend = "main"
root_path = "reports"
Both files load the same way:
config = RegistryConfig.from_toml("remote-store.toml")
Configuration
Configuration is declarative and immutable. Load from TOML, YAML, Pydantic, a dict, or build with Python objects:
from remote_store import RegistryConfig
# From a TOML file (zero dependencies on Python 3.11+):
config = RegistryConfig.from_toml("remote-store.toml")
# From pyproject.toml:
config = RegistryConfig.from_toml("pyproject.toml", table=("tool", "remote-store"))
# From YAML (requires pyyaml or ruamel.yaml):
config = RegistryConfig.from_yaml("remote-store.yaml")
# From Pydantic BaseSettings (requires pydantic-settings):
from remote_store.ext.pydantic import pydantic_to_registry_config
config = pydantic_to_registry_config(my_settings)
# From a dict (e.g. loaded from JSON):
config = RegistryConfig.from_dict({
    "backends": {
        "local": {"type": "local", "options": {"root": "/data"}},
    },
    "stores": {
        "uploads": {"backend": "local", "root_path": "uploads"},
        "reports": {"backend": "local", "root_path": "reports"},
    },
})
Credential hygiene
Credential values passed to from_dict() are automatically wrapped in Secret, which masks them in repr() and str() to prevent accidental leakage in logs or tracebacks. Keys treated as sensitive: key, secret, password, account_key, sas_token, connection_string.
from remote_store import RegistryConfig, Secret
# Auto-wrapped by from_dict():
config = RegistryConfig.from_dict({
    "backends": {"s3": {"type": "s3", "options": {
        "bucket": "my-bucket",
        "key": "AKIA...",
        "secret": "wJalr...",
    }}},
    "stores": {"data": {"backend": "s3", "root_path": "data"}},
})
print(config.backends["s3"].options["secret"]) # → ***
# Or wrap manually:
secret = Secret("my-secret-key")
secret.reveal() # → 'my-secret-key'
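The masking can be pictured with a minimal wrapper class -- a sketch of the idea, not the library's actual Secret implementation:

```python
class MaskedSecret:
    """Holds a sensitive value but masks it in repr() and str()."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        # The only way to get the raw value back is an explicit call.
        return self._value

    def __repr__(self) -> str:
        return "***"

    __str__ = __repr__
```

Because __repr__ is masked, the value stays hidden even when the wrapper ends up in a traceback or a logged config dump.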
Store API
Read & write
| Method | Description |
|---|---|
| read(path) | Streaming read (BinaryIO) |
| read_bytes(path) | Full content as bytes |
| write(path, content) | Write bytes or binary stream |
| write_atomic(path, content) | Write via temp file + rename |
| open_atomic(path) | Streaming write via temp + rename |
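For a local filesystem, "temp file + rename" reduces to a well-known stdlib pattern. The sketch below illustrates that pattern only; it is not the library's implementation:

```python
import os
import tempfile


def write_atomic(path: str, content: bytes) -> None:
    """Write content so readers never observe a partially written file."""
    folder = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the final rename
    # stays on one filesystem and is atomic on POSIX.
    fd, tmp = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename; overwrites any existing file
    finally:
        if os.path.exists(tmp):  # only true if the replace never happened
            os.unlink(tmp)
```

Readers that open the target path see either the old content or the new content in full, never a truncated mix.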
Browse & inspect
| Method | Description |
|---|---|
| list_files(path, pattern=…) | Iterate FileInfo, optional name filter |
| list_folders(path) | Iterate subfolder names |
| glob(pattern) | Native glob (capability-gated) |
| exists(path) | Check if a file or folder exists |
| is_file(path) / is_folder(path) | Type checks |
| get_file_info(path) | File metadata (FileInfo) |
| get_folder_info(path) | Folder metadata (FolderInfo) |
Manage
| Method | Description |
|---|---|
| delete(path) | Delete a file |
| delete_folder(path) | Delete a folder |
| move(src, dst) | Move or rename |
| copy(src, dst) | Copy a file |
Utility
| Method | Description |
|---|---|
| child(subpath) | Return a child store scoped to a subfolder |
| supports(capability) | Check if the backend supports a capability |
| to_key(path) | Convert native/absolute path to store-relative key |
| native_path(key) | Convert store-relative key to backend-native path |
| unwrap(type_hint) | Get backend's native handle (e.g., pyarrow.fs.FileSystem) |
| close() | Close the underlying backend |
All write/move/copy methods accept overwrite=True to replace existing files.
For full details, see the API reference.
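to_key() and native_path() are inverse conversions between store-relative keys and backend-native paths. For a hypothetical local backend rooted at /data with a store root_path of reports, they behave roughly like this stdlib sketch:

```python
import posixpath

STORE_ROOT = "/data/reports"  # hypothetical backend root + store root_path


def to_key(native: str) -> str:
    """Backend-native absolute path -> store-relative key."""
    return posixpath.relpath(native, STORE_ROOT)


def native_path(key: str) -> str:
    """Store-relative key -> backend-native absolute path."""
    return posixpath.join(STORE_ROOT, key)
```

The round trip is the point: keys are what application code sees; native paths are what the backend (or a tool like PyArrow) needs.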
Supported Backends
| Backend | Status | Extra |
|---|---|---|
| Local filesystem | Built-in | |
| Memory (in-process) | Built-in | |
| Amazon S3 / MinIO | Built-in | remote-store[s3] |
| S3 (PyArrow) | Built-in | remote-store[s3-pyarrow] |
| SFTP / SSH | Built-in | remote-store[sftp] |
| Azure Blob / ADLS | Built-in | remote-store[azure] |
Detailed configuration guides for each backend are in guides/backends/.
Extensions
All extensions live in remote_store.ext and are optional -- import only what you need.
| Extension | Extra | Description |
|---|---|---|
| PyArrow adapter | remote-store[arrow] | Use any Store as a pyarrow.fs.FileSystem for Parquet, datasets, Pandas, Polars, DuckDB (guide, example) |
| Batch operations | (none) | Bulk delete, copy, and exists with error aggregation (guide, example) |
| Transfer operations | (none) | Upload, download, and cross-store transfer with streaming and progress (guide, example) |
| Observability hooks | (none) | Callback-based instrumentation for logging, metrics, and tracing (guide, example) |
| OpenTelemetry bridge | remote-store[otel] | Pre-built OTel spans and metrics for Store operations (guide, example) |
| Glob helpers | (none) | Portable glob fallback for backends without native glob support (guide, example) |
| Caching middleware | (none) | TTL-based read cache with automatic invalidation on mutations (guide, API) |
| Partition helpers | (none) | Hive-style partition path builder and parser (API) |
| Pydantic adapter | remote-store[pydantic] | Convert Pydantic BaseSettings to RegistryConfig (API) |
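Hive-style partitioning encodes column values into the path as key=value segments. The partition helpers' job can be sketched in a few lines -- hypothetical function names, not the library's API:

```python
def build_partition_path(**parts: str) -> str:
    """Build a Hive-style path segment like 'year=2024/month=03'."""
    return "/".join(f"{key}={value}" for key, value in parts.items())


def parse_partition_path(path: str) -> dict[str, str]:
    """Recover the key/value pairs from a Hive-style path.

    Segments without '=' (e.g. the file name) are ignored.
    """
    pairs = (seg.split("=", 1) for seg in path.split("/") if "=" in seg)
    return dict(pairs)
```

Keeping build and parse symmetric means partition layouts can be generated by one pipeline and discovered by another without shared state.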
Examples
Runnable scripts in examples/:
Core -- run locally, no external services needed:
| Script | What it shows |
|---|---|
| quickstart.py | Direct construction and Registry config |
| file_operations.py | Full Store API: read, write, delete, move, copy, list, metadata, type checks, capabilities, to_key |
| streaming_io.py | Streaming writes and reads with BytesIO |
| atomic_writes.py | Atomic writes and overwrite semantics |
| configuration.py | Config-as-code, from_dict(), multiple stores, S3/SFTP backend configs |
| error_handling.py | Catching NotFound, AlreadyExists, etc. |
| glob_pattern_matching.py | Three-tier glob: name filter, native glob, portable fallback |
| memory_backend.py | In-process memory backend for testing and caching |
| store_child.py | Runtime sub-scoping with Store.child() |
| pyarrow_adapter.py | PyArrow filesystem adapter: Parquet, datasets |
| batch_operations.py | Bulk delete, copy, exists with error aggregation |
| transfer_operations.py | Upload, download, cross-store transfer with progress |
| observe_hooks.py | Callback hooks, around spans, buffered observer |
| otel_tracing.py | OpenTelemetry tracing and metrics bridge |
Backend -- require a running service and credentials (examples/backends/):
| Script | What it shows |
|---|---|
| s3_backend.py | S3 / MinIO: config, two stores, virtual folders |
| s3_pyarrow_backend.py | High-throughput S3 via PyArrow C++ + escape hatch |
| sftp_backend.py | SSH/SFTP: config, host key policies, unwrap() |
| azure_backend.py | Azure Blob / ADLS Gen2: config, auth methods, unwrap() |
Interactive Jupyter notebooks are available in examples/notebooks/.
Known Limitations
- Sync only -- all operations are synchronous. For async frameworks, wrap calls with asyncio.to_thread().
- Glob -- list_files(pattern=) and ext.glob.glob_files() work on all backends. Native Store.glob() is supported by Local, S3, S3-PyArrow, and Azure backends.
- PyArrow adapter -- Tier 1 native fast-path reads (S3-PyArrow), Tier 2/3 reads, and writes are complete. Remaining backends for native_path() are tracked in the backlog.
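The asyncio.to_thread() workaround for the sync-only limitation looks like this in practice. The blocking function here is a stand-in for any Store call such as read_bytes():

```python
import asyncio


def read_report(name: str) -> bytes:
    """Stand-in for a blocking call like store.read_bytes(name)."""
    return f"contents of {name}".encode()


async def fetch() -> bytes:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(read_report, "q1.csv")


result = asyncio.run(fetch())
print(result)  # b'contents of q1.csv'
```

Each wrapped call occupies one thread from the default executor, so this scales to moderate concurrency, not thousands of simultaneous reads.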
How it compares
There are several excellent Python libraries for file I/O across backends. Here is where remote-store sits:
| | fsspec | smart_open | cloudpathlib | obstore | remote-store |
|---|---|---|---|---|---|
| API surface | ~56 methods | open() only | pathlib-style | ~10 methods | 23 methods |
| Backends | 30+ filesystems | S3, GCS, Az, SFTP | S3, GCS, Azure | S3, GCS, Azure | Local, S3, SFTP, Az, Memory |
| SFTP | via sshfs | Yes | No | No | Built-in |
| Streaming I/O | Yes | Yes | No (downloads) | Bytes-oriented | Yes (BinaryIO) |
| Atomic writes | No | No | No | No | Yes (capability-gated) |
| Async | Yes | No | No | Yes (first-class) | Sync-only (for now) |
| Observability | No | No | No | No | ext.observe + OTel |
| Config model | Per-filesystem | URI-based | Per-client | Per-store kwargs | Immutable Registry |
| Runtime deps | Yes | Minimal | SDK-based | Rust binary | Zero (core) |
Comparison as of March 2026. Method counts and feature sets may change as these libraries evolve.
In short: remote-store is for teams that need more than open() (smart_open) but less than a full filesystem abstraction (fsspec), with streaming, SFTP, atomic writes, observability, and immutable config. The closest comparison is cloudpathlib, but remote-store adds SFTP, streaming, atomic writes, and observability -- and doesn't use the pathlib metaphor for object stores. Under the hood, remote-store delegates to the same libraries you'd pick anyway (s3fs/boto3, paramiko, Azure SDK, PyArrow).
Contributing
See CONTRIBUTING.md for the spec-driven development workflow, code style, and how to add new backends.
Security
To report a vulnerability, please use GitHub Security Advisories instead of opening a public issue. See SECURITY.md for details.
License
MIT