remote-store

One simple API for file storage. Local, S3, SFTP, Azure. Same methods, swappable backends, zero reinvention.

Beta software. The core API is stable, but minor versions may still contain breaking changes before 1.0. See the changelog for what's new, and open an issue if something breaks.

remote-store gives you one simple API to read, write, list, and delete files. The same methods work whether your files live on disk, in S3, on an SFTP server, or on any other supported backend. You just swap the backend config.

That's the whole trick.

Who is this for?

  • Citizen developers -- analysts, scientists, and domain experts who write Python but shouldn't need to learn boto3, paramiko, or cloud-specific SDKs just to read and write files.
  • Platform teams -- engineers who set up the infrastructure and want to hand their colleagues a simple, safe API that can't be misused.
  • Anyone tired of rewriting storage glue -- if you've wrapped S3 or SFTP access more than once, this is that wrapper, tested and maintained.

The library was born from enabling citizen-developer teams: the config is immutable so non-experts can't accidentally break state, errors are clear instead of raw SDK tracebacks, and streaming just works without tuning buffer sizes.

Reads and writes stream by default, so large files just work. Under the hood, each backend delegates to the library you'd pick anyway (boto3, paramiko, azure-storage-file-datalake, …). This package doesn't reinvent file I/O. It just gives every backend the same simple front door.
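The "same front door" idea can be illustrated with a small, self-contained sketch. The names below (FileBackend, MemoryBackend, LocalBackend, save_report) are hypothetical and are not remote-store's actual classes; the sketch only assumes that each backend exposes the same two methods, so the application code never needs to know which one it was given.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class FileBackend(Protocol):
    """Hypothetical minimal interface; not remote-store's actual backend contract."""

    def write(self, path: str, content: bytes) -> None: ...
    def read_bytes(self, path: str) -> bytes: ...


class MemoryBackend:
    """Keeps files in a dict, keyed by path."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write(self, path: str, content: bytes) -> None:
        self._files[path] = content

    def read_bytes(self, path: str) -> bytes:
        return self._files[path]


class LocalBackend:
    """Stores files under a root directory on disk."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def write(self, path: str, content: bytes) -> None:
        target = self._root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)

    def read_bytes(self, path: str) -> bytes:
        return (self._root / path).read_bytes()


def save_report(backend: FileBackend, text: str) -> bytes:
    """Application code: identical no matter which backend is passed in."""
    backend.write("reports/latest.txt", text.encode("utf-8"))
    return backend.read_bytes("reports/latest.txt")


with tempfile.TemporaryDirectory() as tmp:
    from_memory = save_report(MemoryBackend(), "ok")
    from_disk = save_report(LocalBackend(tmp), "ok")
```

Both calls return the same bytes, which is the property the library promises across its real backends.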

What you get

  • One Store, many backends: local fs, S3, SFTP, Azure Blob, more to come
  • Just the basics: read, write, list, delete, exists. No magic, no surprises
  • Battle-tested I/O under the hood: backends wrap boto3, paramiko, etc.
  • Swappable via config: switch backends without touching application code
  • Streaming by default: reads and writes handle large files without blowing up memory
  • Atomic writes where the backend supports it
  • PyArrow ecosystem interop: use any Store as a pyarrow.fs.FileSystem -- works with Parquet, Pandas, Polars, DuckDB, and dataset discovery out of the box
  • Zero runtime dependencies: the core package installs nothing; backend extras pull in only what they need
  • Typed & tested: strict mypy, spec-driven test suite
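To make the "streaming by default" point concrete, here is a minimal standard-library sketch of chunked copying. It does not use remote-store; the BytesIO objects stand in for a source and a destination stream, and the chunk size is an illustrative assumption, not a value taken from the library.

```python
import io
import shutil

CHUNK_SIZE = 64 * 1024  # illustrative; the README does not state the library's chunk size

source = io.BytesIO(b"x" * (1024 * 1024))  # stand-in for a large remote object
destination = io.BytesIO()                 # stand-in for the target backend

# copyfileobj reads and writes at most CHUNK_SIZE bytes at a time until the
# source is exhausted, so the copy loop itself never holds the whole payload.
shutil.copyfileobj(source, destination, length=CHUNK_SIZE)

copied = len(destination.getvalue())
```

With real file or network streams on both ends, peak memory for the copy stays near the chunk size regardless of file size.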

Installation

Install from PyPI:

pip install remote-store

Backends that need extra dependencies use extras:

pip install "remote-store[s3]"           # Amazon S3 / MinIO
pip install "remote-store[s3-pyarrow]"   # S3 with PyArrow (high-throughput)
pip install "remote-store[sftp]"         # SFTP / SSH
pip install "remote-store[azure]"        # Azure Blob / ADLS Gen2
pip install "remote-store[arrow]"        # PyArrow filesystem adapter
pip install "remote-store[otel]"         # OpenTelemetry tracing and metrics
pip install "remote-store[toml]"         # TOML config (backport for Python 3.10)
pip install "remote-store[yaml]"         # YAML config (pyyaml)
pip install "remote-store[pydantic]"     # Pydantic config (pydantic-settings)

Quick Start

import tempfile
from remote_store import BackendConfig, RegistryConfig, Registry, StoreProfile

with tempfile.TemporaryDirectory() as tmp:
    config = RegistryConfig(
        backends={"local": BackendConfig(type="local", options={"root": tmp})},
        stores={"data": StoreProfile(backend="local", root_path="data")},
    )

    with Registry(config) as registry:
        store = registry.get_store("data")

        store.write("hello.txt", b"Hello, world!")
        content = store.read_bytes("hello.txt")
        print(content)  # b'Hello, world!'

Switch to S3 by changing the config. The rest of the code stays the same:

config = RegistryConfig(
    backends={"s3": BackendConfig(type="s3", options={"bucket": "my-bucket"})},
    stores={"data": StoreProfile(backend="s3", root_path="data")},
)

Configuration

Configuration is declarative and immutable. Load it from TOML, YAML, Pydantic settings, or a dict, or build it from Python objects:

from remote_store import RegistryConfig

# From a TOML file (zero dependencies on Python 3.11+):
config = RegistryConfig.from_toml("remote-store.toml")

# From pyproject.toml:
config = RegistryConfig.from_toml("pyproject.toml", table=("tool", "remote-store"))

# From YAML (requires pyyaml or ruamel.yaml):
config = RegistryConfig.from_yaml("remote-store.yaml")

# From Pydantic BaseSettings (requires pydantic-settings):
from remote_store.ext.pydantic import pydantic_to_registry_config
config = pydantic_to_registry_config(my_settings)

# From a dict (e.g. loaded from JSON):
config = RegistryConfig.from_dict({
    "backends": {
        "local": {"type": "local", "options": {"root": "/data"}},
    },
    "stores": {
        "uploads": {"backend": "local", "root_path": "uploads"},
        "reports": {"backend": "local", "root_path": "reports"},
    },
})
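Since from_dict() takes a plain dict, a JSON document is one way to produce it. The sketch below uses only the standard library: it parses a JSON string that mirrors the dict shown above and runs a hypothetical pre-flight check (not part of remote-store) that every store refers to a defined backend. The final from_dict() call is left as a comment so the snippet runs without the package installed.

```python
import json

raw_json = """
{
  "backends": {
    "local": {"type": "local", "options": {"root": "/data"}}
  },
  "stores": {
    "uploads": {"backend": "local", "root_path": "uploads"},
    "reports": {"backend": "local", "root_path": "reports"}
  }
}
"""

config_dict = json.loads(raw_json)

# Hypothetical pre-flight check (not part of remote-store): every store
# must point at a backend that is actually defined.
unknown = {
    name: profile["backend"]
    for name, profile in config_dict["stores"].items()
    if profile["backend"] not in config_dict["backends"]
}
if unknown:
    raise ValueError(f"stores reference undefined backends: {unknown}")

# With remote-store installed, the dict can then be handed over:
# config = RegistryConfig.from_dict(config_dict)
```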

Credential hygiene

Credentials passed through from_dict() are automatically wrapped in Secret, which masks values in repr() and str() to prevent accidental leakage in logs or tracebacks. The option keys treated as sensitive are key, secret, password, account_key, sas_token, and connection_string.

from remote_store import RegistryConfig, Secret

# Auto-wrapped by from_dict():
config = RegistryConfig.from_dict({
    "backends": {"s3": {"type": "s3", "options": {
        "bucket": "my-bucket",
        "key": "AKIA...",
        "secret": "wJalr...",
    }}},
    "stores": {"data": {"backend": "s3", "root_path": "data"}},
})
print(config.backends["s3"].options["secret"])  # → ***

# Or wrap manually:
secret = Secret("my-secret-key")
secret.reveal()  # → 'my-secret-key'
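For readers curious how this kind of masking can work, here is a minimal sketch. MaskedSecret is a hypothetical stand-in written for illustration; remote-store's own Secret may be implemented differently, and the exact repr() text here is an assumption.

```python
class MaskedSecret:
    """Hypothetical stand-in; remote-store's own Secret may be implemented differently."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        # Containers such as dicts call repr() on their values, so mask here too.
        return "MaskedSecret('***')"

    def __str__(self) -> str:
        return "***"

    def reveal(self) -> str:
        """Return the real value; callers must ask for it explicitly."""
        return self._value


options = {"bucket": "my-bucket", "secret": MaskedSecret("wJalr-example")}

# Formatting the whole dict, as a log statement might, does not expose the value.
logged = f"connecting with {options}"
```

The point of the design is that leaking the value requires an explicit reveal() call, while every implicit conversion yields the mask.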

Store API

Read & write

| Method | Description |
| --- | --- |
| read(path) | Streaming read (BinaryIO) |
| read_bytes(path) | Full content as bytes |
| write(path, content) | Write bytes or binary stream |
| write_atomic(path, content) | Write via temp file + rename |
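The "temp file + rename" technique named for write_atomic can be sketched for a local filesystem as follows. write_atomic_local is a hypothetical helper, not the library's implementation, and unlike the Store methods it always replaces an existing target.

```python
import os
import tempfile
from pathlib import Path


def write_atomic_local(target: Path, content: bytes) -> None:
    """Hypothetical helper: write to a temp file, then rename over the target."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(content)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        os.replace(tmp_name, target)  # atomic on POSIX; replaces an existing target
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "report.bin"
    write_atomic_local(path, b"v1")
    write_atomic_local(path, b"v2")
    final_content = path.read_bytes()
    leftovers = [p.name for p in path.parent.iterdir() if p.name != "report.bin"]
```

Readers of the target path see either the old content or the new content, never a half-written file.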

Browse & inspect

| Method | Description |
| --- | --- |
| list_files(path, pattern=…) | Iterate FileInfo, optional name filter |
| list_folders(path) | Iterate subfolder names |
| glob(pattern) | Native glob (capability-gated) |
| exists(path) | Check if a file or folder exists |
| is_file(path) / is_folder(path) | Type checks |
| get_file_info(path) | File metadata (FileInfo) |
| get_folder_info(path) | Folder metadata (FolderInfo) |
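The difference between a name filter and a glob over whole keys can be shown with a small standard-library sketch. Both helper functions are hypothetical and operate on a hard-coded list of keys; they illustrate the two matching styles and say nothing about how remote-store implements them.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

keys = [
    "reports/2024/q1.csv",
    "reports/2024/q2.csv",
    "reports/2024/notes.txt",
    "reports/archive/q1.csv",
]


def filter_by_name(folder: str, pattern: str) -> list[str]:
    """Hypothetical name filter: direct children of `folder` whose file name matches."""
    return [
        key
        for key in keys
        if str(PurePosixPath(key).parent) == folder
        and fnmatchcase(PurePosixPath(key).name, pattern)
    ]


def filter_by_glob(pattern: str) -> list[str]:
    """Hypothetical portable glob: match the pattern against the whole key."""
    return [key for key in keys if PurePosixPath(key).match(pattern)]


name_matches = filter_by_name("reports/2024", "*.csv")
glob_matches = filter_by_glob("reports/*/q1.csv")
```

The name filter only looks at file names inside one folder, while the glob pattern can constrain any segment of the path.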

Manage

| Method | Description |
| --- | --- |
| delete(path) | Delete a file |
| delete_folder(path) | Delete a folder |
| move(src, dst) | Move or rename |
| copy(src, dst) | Copy a file |

Utility

| Method | Description |
| --- | --- |
| child(subpath) | Return a child store scoped to a subfolder |
| supports(capability) | Check if the backend supports a capability |
| to_key(path) | Convert native/absolute path to store-relative key |
| unwrap(type_hint) | Get backend's native handle (e.g., pyarrow.fs.FileSystem) |
| close() | Close the underlying backend |
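The scoping behind child() and to_key() can be pictured as plain path arithmetic. ScopedKeys below is a hypothetical helper, not remote-store's Store class, and it skips the validation a real implementation would need (for example rejecting ".." segments or absolute keys).

```python
from pathlib import PurePosixPath


class ScopedKeys:
    """Hypothetical path-scoping helper; not remote-store's Store class."""

    def __init__(self, root: str) -> None:
        self.root = PurePosixPath(root)

    def child(self, subpath: str) -> "ScopedKeys":
        """Return a new scope rooted at a subfolder of this one."""
        return ScopedKeys(str(self.root / subpath))

    def resolve(self, key: str) -> str:
        """Turn a scope-relative key into an absolute path."""
        return str(self.root / key)

    def to_key(self, absolute_path: str) -> str:
        """Turn an absolute path back into a scope-relative key."""
        # relative_to raises ValueError when the path lies outside this scope.
        return str(PurePosixPath(absolute_path).relative_to(self.root))


data = ScopedKeys("/bucket/data")
reports = data.child("reports")

full_path = reports.resolve("2024/q1.csv")
key = reports.to_key(full_path)
```

Code that receives the child scope can only name paths beneath it, which is what makes handing out sub-scoped stores safe.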

All write/move/copy methods accept overwrite=True to replace existing files.

For full details, see the API reference.

Supported Backends

| Backend | Status | Extra |
| --- | --- | --- |
| Local filesystem | Built-in | |
| Memory (in-process) | Built-in | |
| Amazon S3 / MinIO | Built-in | remote-store[s3] |
| S3 (PyArrow) | Built-in | remote-store[s3-pyarrow] |
| SFTP / SSH | Built-in | remote-store[sftp] |
| Azure Blob / ADLS | Built-in | remote-store[azure] |

Detailed configuration guides for each backend are in guides/backends/.

Extensions

| Extension | Extra | Description |
| --- | --- | --- |
| PyArrow adapter | remote-store[arrow] | Use any Store as a pyarrow.fs.FileSystem for Parquet, datasets, Pandas, Polars, DuckDB (guide) |
| Batch operations | (none) | Bulk delete, copy, and exists with error aggregation (guide) |
| Transfer operations | (none) | Upload, download, and cross-store transfer with streaming and progress (guide) |
| Observability hooks | (none) | Callback-based instrumentation for logging, metrics, and tracing (guide) |
| OpenTelemetry bridge | remote-store[otel] | Pre-built OTel spans and metrics for Store operations (guide) |
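"Error aggregation" in a batch operation means the loop keeps going after a failure and reports all failures together at the end. The sketch below shows that pattern against an in-memory dict; delete_one and delete_many are hypothetical names and do not reflect the extension's real API or return types.

```python
files = {"a.txt": b"1", "b.txt": b"2"}  # in-memory stand-in for a store


def delete_one(path: str) -> None:
    """Delete a single entry, raising if it does not exist."""
    if path not in files:
        raise FileNotFoundError(path)
    del files[path]


def delete_many(paths: list[str]) -> tuple[list[str], dict[str, Exception]]:
    """Hypothetical bulk delete: keep going after failures and report them together."""
    deleted: list[str] = []
    errors: dict[str, Exception] = {}
    for path in paths:
        try:
            delete_one(path)
            deleted.append(path)
        except OSError as exc:
            # Record the failure and continue with the remaining paths.
            errors[path] = exc
    return deleted, errors


deleted, errors = delete_many(["a.txt", "missing.txt", "b.txt"])
```

The caller gets one result covering the whole batch instead of an exception that aborts it at the first missing file.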

Examples

Runnable scripts in examples/:

Core -- run locally, no external services needed:

| Script | What it shows |
| --- | --- |
| quickstart.py | Minimal config, write, read |
| file_operations.py | Full Store API: read, write, delete, move, copy, list, metadata, type checks, capabilities, to_key |
| streaming_io.py | Streaming writes and reads with BytesIO |
| atomic_writes.py | Atomic writes and overwrite semantics |
| configuration.py | Config-as-code, from_dict(), multiple stores, S3/SFTP backend configs |
| error_handling.py | Catching NotFound, AlreadyExists, etc. |
| glob_pattern_matching.py | Three-tier glob: name filter, native glob, portable fallback |
| memory_backend.py | In-process memory backend for testing and caching |
| store_child.py | Runtime sub-scoping with Store.child() |
| pyarrow_adapter.py | PyArrow filesystem adapter: Parquet, datasets |
| batch_operations.py | Bulk delete, copy, exists with error aggregation |
| transfer_operations.py | Upload, download, cross-store transfer with progress |
| observe_hooks.py | Callback hooks, around spans, buffered observer |
| otel_tracing.py | OpenTelemetry tracing and metrics bridge |

Backend -- require a running service and credentials (examples/backends/):

| Script | What it shows |
| --- | --- |
| s3_backend.py | S3 / MinIO: config, two stores, virtual folders |
| s3_pyarrow_backend.py | High-throughput S3 via PyArrow C++ + escape hatch |
| sftp_backend.py | SSH/SFTP: config, host key policies, unwrap() |
| azure_backend.py | Azure Blob / ADLS Gen2: config, auth methods, unwrap() |

Interactive Jupyter notebooks are available in examples/notebooks/.

Known Limitations

  • Sync only -- all operations are synchronous. For async frameworks, wrap calls with asyncio.to_thread().
  • Glob -- list_files(pattern=) and ext.glob.glob_files() work on all backends. Native Store.glob() is supported by Local, S3, S3-PyArrow, and Azure backends.
  • PyArrow adapter -- Phase 1 (Tier 2/3 reads, writes) is complete. Phase 2 native fast-path reads are deferred. See the backlog for details.
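The asyncio.to_thread() workaround mentioned in the first limitation can be sketched like this. blocking_read is a hypothetical stand-in for a synchronous Store call such as store.read_bytes(path); the sleep only simulates I/O latency.

```python
import asyncio
import time


def blocking_read(path: str) -> bytes:
    """Stand-in for a synchronous call such as store.read_bytes(path)."""
    time.sleep(0.05)  # simulate network latency
    return f"contents of {path}".encode()


async def read_all(paths: list[str]) -> list[bytes]:
    # Each blocking call runs in a worker thread, so the event loop stays responsive.
    return await asyncio.gather(*(asyncio.to_thread(blocking_read, p) for p in paths))


results = asyncio.run(read_all(["a.txt", "b.txt"]))
```

asyncio.gather returns results in the order the awaitables were passed, so the output lines up with the input paths even though the reads overlap.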

Contributing

See CONTRIBUTING.md for the spec-driven development workflow, code style, and how to add new backends.

Security

To report a vulnerability, please use GitHub Security Advisories instead of opening a public issue. See SECURITY.md for details.

License

MIT
