One simple API for file storage. Local, S3, SFTP, Azure. Same methods, swappable backends, zero reinvention.
remote-store
Beta software. The core API is stable, but minor versions may still contain breaking changes before 1.0. See the changelog for what's new, and open an issue if something breaks.
remote-store gives you one simple API to read, write, list, and delete files.
The same methods work whether your files live on disk, in S3, on an SFTP server,
or anywhere else. You just swap the backend config.
That's the whole trick.
Who is this for?
- Citizen developers -- analysts, scientists, and domain experts who write Python but shouldn't need to learn boto3, paramiko, or cloud-specific SDKs just to read and write files.
- Platform teams -- engineers who set up the infrastructure and want to hand their colleagues a simple, safe API that can't be misused.
- Anyone tired of rewriting storage glue -- if you've wrapped S3 or SFTP access more than once, this is that wrapper, tested and maintained.
The library was born from enabling citizen-developer teams: the config is immutable so non-experts can't accidentally break state, errors are clear instead of raw SDK tracebacks, and streaming just works without tuning buffer sizes.
Reads and writes stream by default, so large files just work.
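To make "stream by default" concrete, here is a minimal sketch of chunked copying between two binary streams, using only the standard library. The helper copy_in_chunks and its chunk size are illustrative assumptions, not part of remote-store; in real use the source stream would be whatever store.read(path) returns.

```python
import io
from typing import BinaryIO


def copy_in_chunks(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst one chunk at a time; return the number of bytes copied.

    Hypothetical helper for illustration: memory use is bounded by
    chunk_size, no matter how large the source is.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for a remote file and a local destination.
source = io.BytesIO(b"x" * 200_000)
sink = io.BytesIO()
copied = copy_in_chunks(source, sink)
print(copied)  # 200000
```

Only one chunk is held in memory at a time, which is why a streaming API can handle files larger than available RAM.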
Under the hood, each backend delegates to the library you'd pick anyway
(boto3, paramiko, azure-storage-file-datalake, …). This package doesn't
reinvent file I/O. It just gives every backend the same simple front door.
What you get
- One Store, many backends: local fs, S3, SFTP, Azure Blob, more to come
- Just the basics: read, write, list, delete, exists. No magic, no surprises
- Battle-tested I/O under the hood: backends wrap boto3, paramiko, etc.
- Swappable via config: switch backends without touching application code
- Streaming by default: reads and writes handle large files without blowing up memory
- Atomic writes where the backend supports it
- PyArrow ecosystem interop: use any Store as a pyarrow.fs.FileSystem -- works with Parquet, Pandas, Polars, DuckDB, and dataset discovery out of the box
- Zero runtime dependencies: the core package installs nothing; backend extras pull in only what they need
- Typed & tested: strict mypy, spec-driven test suite
Installation
Install from PyPI:
pip install remote-store
Backends that need extra dependencies use extras:
pip install "remote-store[s3]" # Amazon S3 / MinIO
pip install "remote-store[s3-pyarrow]" # S3 with PyArrow (high-throughput)
pip install "remote-store[sftp]" # SFTP / SSH
pip install "remote-store[azure]" # Azure Blob / ADLS Gen2
pip install "remote-store[arrow]" # PyArrow filesystem adapter
pip install "remote-store[otel]" # OpenTelemetry tracing and metrics
pip install "remote-store[toml]" # TOML config (backport for Python 3.10)
pip install "remote-store[yaml]" # YAML config (pyyaml)
pip install "remote-store[pydantic]" # Pydantic config (pydantic-settings)
Quick Start
import tempfile
from remote_store import BackendConfig, RegistryConfig, Registry, StoreProfile
# The temporary directory acts as the root of the local backend.
with tempfile.TemporaryDirectory() as tmp:
    config = RegistryConfig(
        backends={"local": BackendConfig(type="local", options={"root": tmp})},
        stores={"data": StoreProfile(backend="local", root_path="data")},
    )
    # The registry is a context manager, so backends are closed on exit.
    with Registry(config) as registry:
        store = registry.get_store("data")
        store.write("hello.txt", b"Hello, world!")
        content = store.read_bytes("hello.txt")
        print(content)  # b'Hello, world!'
Switch to S3 by changing the config. The rest of the code stays the same:
config = RegistryConfig(
    backends={"s3": BackendConfig(type="s3", options={"bucket": "my-bucket"})},
    stores={"data": StoreProfile(backend="s3", root_path="data")},
)
Configuration
Configuration is declarative and immutable. Load from TOML, YAML, Pydantic, a dict, or build with Python objects:
from remote_store import RegistryConfig
# From a TOML file (zero dependencies on Python 3.11+):
config = RegistryConfig.from_toml("remote-store.toml")
# From pyproject.toml:
config = RegistryConfig.from_toml("pyproject.toml", table=("tool", "remote-store"))
# From YAML (requires pyyaml or ruamel.yaml):
config = RegistryConfig.from_yaml("remote-store.yaml")
# From Pydantic BaseSettings (requires pydantic-settings):
from remote_store.ext.pydantic import pydantic_to_registry_config
config = pydantic_to_registry_config(my_settings)
# From a dict (e.g. loaded from JSON):
config = RegistryConfig.from_dict({
    "backends": {
        "local": {"type": "local", "options": {"root": "/data"}},
    },
    "stores": {
        "uploads": {"backend": "local", "root_path": "uploads"},
        "reports": {"backend": "local", "root_path": "reports"},
    },
})
Credential hygiene
Credentials passed through from_dict() are automatically wrapped in Secret, which masks values in repr() and str() to prevent accidental leakage in logs or tracebacks. Sensitive keys: key, secret, password, account_key, sas_token, connection_string.
from remote_store import RegistryConfig, Secret
# Auto-wrapped by from_dict():
config = RegistryConfig.from_dict({
    "backends": {"s3": {"type": "s3", "options": {
        "bucket": "my-bucket",
        "key": "AKIA...",
        "secret": "wJalr...",
    }}},
    "stores": {"data": {"backend": "s3", "root_path": "data"}},
})
print(config.backends["s3"].options["secret"]) # → ***
# Or wrap manually:
secret = Secret("my-secret-key")
secret.reveal() # → 'my-secret-key'
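The masking idea can be sketched in a few lines. The class below is a hypothetical stand-in named MaskedValue, not the library's Secret implementation; it only illustrates why overriding __repr__ and __str__ keeps a value out of logs and tracebacks.

```python
class MaskedValue:
    """Hypothetical illustration of a secret wrapper (not remote-store's Secret)."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """Return the wrapped value; the only way to get the plain text."""
        return self._value

    def __repr__(self) -> str:
        return "MaskedValue('***')"

    def __str__(self) -> str:
        return "***"


token = MaskedValue("my-secret-key")
options = {"bucket": "my-bucket", "secret": token}

print(options)         # {'bucket': 'my-bucket', 'secret': MaskedValue('***')}
print(f"{token}")      # ***
print(token.reveal())  # my-secret-key
```

Because containers call repr() on their items, printing or logging the whole options dictionary stays safe; the plain value only appears when reveal() is called explicitly.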
Store API
Read & write
| Method | Description |
|---|---|
| read(path) | Streaming read (BinaryIO) |
| read_bytes(path) | Full content as bytes |
| write(path, content) | Write bytes or binary stream |
| write_atomic(path, content) | Write via temp file + rename |
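"Temp file + rename" is a standard technique on local filesystems. A minimal standard-library sketch follows; the function name atomic_write_bytes is hypothetical, and this is not necessarily how any remote-store backend implements write_atomic.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, content: bytes) -> None:
    """Write content to target so readers never observe a partial file."""
    # Create the temp file in the target's directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, target)  # rename over the target (atomic on POSIX)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "report.txt"
    atomic_write_bytes(path, b"first version")
    atomic_write_bytes(path, b"second version")
    final_content = path.read_bytes()
    leftovers = sorted(p.name for p in Path(folder).iterdir())

print(final_content)  # b'second version'
print(leftovers)      # ['report.txt']
```

A reader opening report.txt at any moment sees either the old content or the new content, never a half-written file. Object stores and SFTP servers offer different guarantees, which is presumably why the feature list says "where the backend supports it".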
Browse & inspect
| Method | Description |
|---|---|
| list_files(path, pattern=…) | Iterate FileInfo, optional name filter |
| list_folders(path) | Iterate subfolder names |
| glob(pattern) | Native glob (capability-gated) |
| exists(path) | Check if a file or folder exists |
| is_file(path) / is_folder(path) | Type checks |
| get_file_info(path) | File metadata (FileInfo) |
| get_folder_info(path) | Folder metadata (FolderInfo) |
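A name filter of the kind list_files(path, pattern=…) describes can be pictured as shell-style matching on file names. The sketch below applies the standard library's fnmatch to a plain list of names; the helper filter_names and the sample listing are hypothetical, and the library's exact pattern syntax is an assumption here.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator


def filter_names(names: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield the names matching a shell-style pattern (case-sensitive)."""
    for name in names:
        if fnmatchcase(name, pattern):
            yield name


listing = ["2024-01.csv", "2024-02.csv", "notes.txt", "summary.CSV"]
matches = list(filter_names(listing, "*.csv"))
print(matches)  # ['2024-01.csv', '2024-02.csv']
```

Filtering on names after listing works on any backend, whereas a native glob depends on what the storage service itself can evaluate.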
Manage
| Method | Description |
|---|---|
| delete(path) | Delete a file |
| delete_folder(path) | Delete a folder |
| move(src, dst) | Move or rename |
| copy(src, dst) | Copy a file |
Utility
| Method | Description |
|---|---|
| child(subpath) | Return a child store scoped to a subfolder |
| supports(capability) | Check if the backend supports a capability |
| to_key(path) | Convert native/absolute path to store-relative key |
| unwrap(type_hint) | Get backend's native handle (e.g., pyarrow.fs.FileSystem) |
| close() | Close the underlying backend |
All write/move/copy methods accept overwrite=True to replace existing files.
For full details, see the API reference.
Supported Backends
| Backend | Status | Extra |
|---|---|---|
| Local filesystem | Built-in | |
| Memory (in-process) | Built-in | |
| Amazon S3 / MinIO | Built-in | remote-store[s3] |
| S3 (PyArrow) | Built-in | remote-store[s3-pyarrow] |
| SFTP / SSH | Built-in | remote-store[sftp] |
| Azure Blob / ADLS | Built-in | remote-store[azure] |
Detailed configuration guides for each backend are in guides/backends/.
Extensions
| Extension | Extra | Description |
|---|---|---|
| PyArrow adapter | remote-store[arrow] | Use any Store as a pyarrow.fs.FileSystem for Parquet, datasets, Pandas, Polars, DuckDB (guide) |
| Batch operations | (none) | Bulk delete, copy, and exists with error aggregation (guide) |
| Transfer operations | (none) | Upload, download, and cross-store transfer with streaming and progress (guide) |
| Observability hooks | (none) | Callback-based instrumentation for logging, metrics, and tracing (guide) |
| OpenTelemetry bridge | remote-store[otel] | Pre-built OTel spans and metrics for Store operations (guide) |
Examples
Runnable scripts in examples/:
Core -- run locally, no external services needed:
| Script | What it shows |
|---|---|
| quickstart.py | Minimal config, write, read |
| file_operations.py | Full Store API: read, write, delete, move, copy, list, metadata, type checks, capabilities, to_key |
| streaming_io.py | Streaming writes and reads with BytesIO |
| atomic_writes.py | Atomic writes and overwrite semantics |
| configuration.py | Config-as-code, from_dict(), multiple stores, S3/SFTP backend configs |
| error_handling.py | Catching NotFound, AlreadyExists, etc. |
| glob_pattern_matching.py | Three-tier glob: name filter, native glob, portable fallback |
| memory_backend.py | In-process memory backend for testing and caching |
| store_child.py | Runtime sub-scoping with Store.child() |
| pyarrow_adapter.py | PyArrow filesystem adapter: Parquet, datasets |
| batch_operations.py | Bulk delete, copy, exists with error aggregation |
| transfer_operations.py | Upload, download, cross-store transfer with progress |
| observe_hooks.py | Callback hooks, around spans, buffered observer |
| otel_tracing.py | OpenTelemetry tracing and metrics bridge |
Backend -- require a running service and credentials (examples/backends/):
| Script | What it shows |
|---|---|
| s3_backend.py | S3 / MinIO: config, two stores, virtual folders |
| s3_pyarrow_backend.py | High-throughput S3 via PyArrow C++ + escape hatch |
| sftp_backend.py | SSH/SFTP: config, host key policies, unwrap() |
| azure_backend.py | Azure Blob / ADLS Gen2: config, auth methods, unwrap() |
Interactive Jupyter notebooks are available in examples/notebooks/.
Known Limitations
- Sync only -- all operations are synchronous. For async frameworks, wrap calls with asyncio.to_thread().
- Glob -- list_files(pattern=) and ext.glob.glob_files() work on all backends. Native Store.glob() is supported by Local, S3, S3-PyArrow, and Azure backends.
- PyArrow adapter -- Phase 1 (Tier 2/3 reads, writes) is complete. Phase 2 native fast-path reads are deferred. See the backlog for details.
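The asyncio.to_thread() workaround for the sync-only limitation can be sketched as follows. blocking_read is a hypothetical stand-in for a synchronous call such as store.read_bytes(path); it sleeps to simulate I/O so the example runs without any backend.

```python
import asyncio
import time


def blocking_read(path: str) -> bytes:
    """Hypothetical stand-in for a synchronous call like store.read_bytes(path)."""
    time.sleep(0.1)  # simulate network or disk latency
    return f"contents of {path}".encode()


async def read_many(paths: list[str]) -> list[bytes]:
    # Each blocking call runs in a worker thread, so the event loop stays
    # free and the reads overlap instead of running one after another.
    return await asyncio.gather(*(asyncio.to_thread(blocking_read, p) for p in paths))


results = asyncio.run(read_many(["a.txt", "b.txt", "c.txt"]))
print(results)  # [b'contents of a.txt', b'contents of b.txt', b'contents of c.txt']
```

asyncio.gather() returns results in the order the awaitables were passed, so the output lines up with the input paths even though the threads may finish in any order.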
Contributing
See CONTRIBUTING.md for the spec-driven development workflow, code style, and how to add new backends.
Security
To report a vulnerability, please use GitHub Security Advisories instead of opening a public issue. See SECURITY.md for details.
License
MIT