remote-store

Write file storage code once. Run it against local files, S3, SFTP, or Azure.

Beta software. The core API is stable, but minor versions may still contain breaking changes before 1.0. See the changelog for what's new, and open an issue if something breaks.

Most Python projects that deal with files eventually grow storage glue: small wrappers around local paths, S3 clients, SFTP connections, and cloud SDKs. Those wrappers are usually duplicated across projects, slightly inconsistent, and painful to replace later.

remote-store replaces them with one simple interface.

Where files live is configuration, not application code. Under the hood, established Python libraries (s3fs, paramiko, azure-storage-file-datalake) still do the work.

Who this is for

  • Platform and internal tooling teams -- provide one stable storage interface across environments
  • Data engineering teams -- pipelines that run against local storage, S3, or SFTP depending on the environment
  • Teams that include citizen developers -- analysts and domain experts who write Python shouldn't need to learn cloud SDKs just to read and write files
  • Anyone tired of rewriting storage wrappers

What you get

  • One interface, many backends: local fs, S3, SFTP, Azure, in-memory
  • Folder-scoped stores: each Store is rooted at a folder -- compose layouts with multiple stores or narrow scope with child()
  • Swap backends via config: move between environments without changing code
  • Streaming by default: large files are read and written as streams instead of being loaded fully into memory
  • Atomic writes where supported: safer updates for file-producing workflows
  • Established libraries underneath: s3fs, paramiko, etc. do the real work
  • Zero runtime dependencies: the core installs nothing extra; backend extras pull in only what they need
  • Typed and tested: strict mypy, spec-driven test suite
  • Optional integrations: PyArrow filesystem adapter, OpenTelemetry tracing and metrics

What it is not

  • Not a query engine (no SQL, no predicate pushdown)
  • Not a table format (no Delta Lake log, no Iceberg manifests)
  • Not a filesystem reimplementation (delegates to s3fs, paramiko, azure-storage-file-datalake, pyarrow -- the libraries you'd pick anyway)

Installation

Install from PyPI:

pip install remote-store

Backends that need extra dependencies use extras:

pip install "remote-store[s3]"           # Amazon S3 / MinIO
pip install "remote-store[sftp]"         # SFTP / SSH
pip install "remote-store[azure]"        # Azure Blob / ADLS Gen2

Optional extras for tooling and config formats:

pip install "remote-store[arrow]"        # PyArrow filesystem adapter
pip install "remote-store[s3-pyarrow]"   # S3 with PyArrow (high-throughput)
pip install "remote-store[otel]"         # OpenTelemetry tracing and metrics
pip install "remote-store[toml]"         # TOML config (backport for Python 3.10)
pip install "remote-store[yaml]"         # YAML config (pyyaml)
pip install "remote-store[pydantic]"     # Pydantic config (pydantic-settings)

Quick Start

The simplest way to use remote-store (examples/quickstart.py):

from remote_store import Store
from remote_store.backends import LocalBackend

store = Store(LocalBackend(root="/tmp/data"))
store.write("hello.txt", b"Hello, world!")
print(store.read_bytes("hello.txt"))  # b'Hello, world!'

For applications that manage multiple backends or switch between environments, use a Registry with declarative config:

from remote_store import Registry, RegistryConfig

config = RegistryConfig.from_dict({
    "backends": {"main": {"type": "local", "options": {"root": "/tmp/data"}}},
    "stores": {"data": {"backend": "main", "root_path": ""}},
})

with Registry(config) as registry:
    store = registry.get_store("data")
    store.write("hello.txt", b"Hello, world!")
    print(store.read_bytes("hello.txt"))  # b'Hello, world!'

Switch to S3 by changing the config. The application code stays the same:

Dev config:

[backends.main]
type = "local"
options = { root = "/tmp/data" }

[stores.data]
backend = "main"
root_path = "reports"

Production -- just swap the backend:

[backends.main]
type = "s3"
options = { bucket = "analytics-data" }

[stores.data]
backend = "main"
root_path = "reports"

Load either file the same way:

config = RegistryConfig.from_toml("remote-store.toml")

Configuration

Configuration is declarative and immutable. Load from TOML, YAML, Pydantic, a dict, or build with Python objects:

from remote_store import RegistryConfig

# From a TOML file (zero dependencies on Python 3.11+):
config = RegistryConfig.from_toml("remote-store.toml")

# From pyproject.toml:
config = RegistryConfig.from_toml("pyproject.toml", table=("tool", "remote-store"))

# From YAML (requires pyyaml or ruamel.yaml):
config = RegistryConfig.from_yaml("remote-store.yaml")

# From Pydantic BaseSettings (requires pydantic-settings):
from remote_store.ext.pydantic import pydantic_to_registry_config
config = pydantic_to_registry_config(my_settings)

# From a dict (e.g. loaded from JSON):
config = RegistryConfig.from_dict({
    "backends": {
        "local": {"type": "local", "options": {"root": "/data"}},
    },
    "stores": {
        "uploads": {"backend": "local", "root_path": "uploads"},
        "reports": {"backend": "local", "root_path": "reports"},
    },
})

Credential hygiene

Credentials passed through from_dict() are automatically wrapped in Secret, which masks values in repr() and str() to prevent accidental leakage in logs or tracebacks. Sensitive keys: key, secret, password, account_key, sas_token, connection_string.

from remote_store import RegistryConfig, Secret

# Auto-wrapped by from_dict():
config = RegistryConfig.from_dict({
    "backends": {"s3": {"type": "s3", "options": {
        "bucket": "my-bucket",
        "key": "AKIA...",
        "secret": "wJalr...",
    }}},
    "stores": {"data": {"backend": "s3", "root_path": "data"}},
})
print(config.backends["s3"].options["secret"])  # → ***

# Or wrap manually:
secret = Secret("my-secret-key")
secret.reveal()  # → 'my-secret-key'

Store API

Read & write

Method                       Description
read(path)                   Streaming read (BinaryIO)
read_bytes(path)             Full content as bytes
write(path, content)         Write bytes or a binary stream
write_atomic(path, content)  Write via temp file + rename
open_atomic(path)            Streaming write via temp file + rename

Browse & inspect

Method                           Description
list_files(path, pattern=…)      Iterate FileInfo, optional name filter
list_folders(path)               Iterate subfolder names
glob(pattern)                    Native glob (capability-gated)
exists(path)                     Check if a file or folder exists
is_file(path) / is_folder(path)  Type checks
get_file_info(path)              File metadata (FileInfo)
get_folder_info(path)            Folder metadata (FolderInfo)

Manage

Method               Description
delete(path)         Delete a file
delete_folder(path)  Delete a folder
move(src, dst)       Move or rename
copy(src, dst)       Copy a file

Utility

Method                Description
child(subpath)        Return a child store scoped to a subfolder
supports(capability)  Check if the backend supports a capability
to_key(path)          Convert native/absolute path to store-relative key
native_path(key)      Convert store-relative key to backend-native path
unwrap(type_hint)     Get backend's native handle (e.g., pyarrow.fs.FileSystem)
close()               Close the underlying backend

All write/move/copy methods accept overwrite=True to replace existing files.

For full details, see the API reference.

Supported Backends

Backend              Status    Extra
Local filesystem     Built-in
Memory (in-process)  Built-in
Amazon S3 / MinIO    Built-in  remote-store[s3]
S3 (PyArrow)         Built-in  remote-store[s3-pyarrow]
SFTP / SSH           Built-in  remote-store[sftp]
Azure Blob / ADLS    Built-in  remote-store[azure]

Detailed configuration guides for each backend are in guides/backends/.

Extensions

All extensions live in remote_store.ext and are optional -- import only what you need.

Extension             Extra                   Description
PyArrow adapter       remote-store[arrow]     Use any Store as a pyarrow.fs.FileSystem for Parquet, datasets, Pandas, Polars, DuckDB (guide, example)
Batch operations      (none)                  Bulk delete, copy, and exists with error aggregation (guide, example)
Transfer operations   (none)                  Upload, download, and cross-store transfer with streaming and progress (guide, example)
Observability hooks   (none)                  Callback-based instrumentation for logging, metrics, and tracing (guide, example)
OpenTelemetry bridge  remote-store[otel]      Pre-built OTel spans and metrics for Store operations (guide, example)
Glob helpers          (none)                  Portable glob fallback for backends without native glob support (guide, example)
Caching middleware    (none)                  TTL-based read cache with automatic invalidation on mutations (guide, API)
Partition helpers     (none)                  Hive-style partition path builder and parser (API)
Pydantic adapter      remote-store[pydantic]  Convert Pydantic BaseSettings to RegistryConfig (API)

Examples

Runnable scripts in examples/:

Core -- run locally, no external services needed:

Script                    What it shows
quickstart.py             Direct construction and Registry config
file_operations.py        Full Store API: read, write, delete, move, copy, list, metadata, type checks, capabilities, to_key
streaming_io.py           Streaming writes and reads with BytesIO
atomic_writes.py          Atomic writes and overwrite semantics
configuration.py          Config-as-code, from_dict(), multiple stores, S3/SFTP backend configs
error_handling.py         Catching NotFound, AlreadyExists, etc.
glob_pattern_matching.py  Three-tier glob: name filter, native glob, portable fallback
memory_backend.py         In-process memory backend for testing and caching
store_child.py            Runtime sub-scoping with Store.child()
pyarrow_adapter.py        PyArrow filesystem adapter: Parquet, datasets
batch_operations.py       Bulk delete, copy, exists with error aggregation
transfer_operations.py    Upload, download, cross-store transfer with progress
observe_hooks.py          Callback hooks, around spans, buffered observer
otel_tracing.py           OpenTelemetry tracing and metrics bridge

Backend -- require a running service and credentials (examples/backends/):

Script                 What it shows
s3_backend.py          S3 / MinIO: config, two stores, virtual folders
s3_pyarrow_backend.py  High-throughput S3 via PyArrow C++ + escape hatch
sftp_backend.py        SSH/SFTP: config, host key policies, unwrap()
azure_backend.py       Azure Blob / ADLS Gen2: config, auth methods, unwrap()

Interactive Jupyter notebooks are available in examples/notebooks/.

Known Limitations

  • Sync only -- all operations are synchronous. For async frameworks, wrap calls with asyncio.to_thread().
  • Glob -- list_files(pattern=) and ext.glob.glob_files() work on all backends. Native Store.glob() is supported by Local, S3, S3-PyArrow, and Azure backends.
  • PyArrow adapter -- Tier 1 native fast-path reads (S3-PyArrow), Tier 2/3 reads, and writes are complete. Remaining backends for native_path() are tracked in the backlog.

How it compares

There are several excellent Python libraries for file I/O across backends. Here is where remote-store sits:

               fsspec           smart_open         cloudpathlib    obstore            remote-store
API surface    ~56 methods      open() only        pathlib-style   ~10 methods        23 methods
Backends       30+ filesystems  S3, GCS, Az, SFTP  S3, GCS, Azure  S3, GCS, Azure     Local, S3, SFTP, Az, Memory
SFTP           via sshfs        Yes                No              No                 Built-in
Streaming I/O  Yes              Yes                No (downloads)  Bytes-oriented     Yes (BinaryIO)
Atomic writes  No               No                 No              No                 Yes (capability-gated)
Async          Yes              No                 No              Yes (first-class)  Sync-only (for now)
Observability  No               No                 No              No                 ext.observe + OTel
Config model   Per-filesystem   URI-based          Per-client      Per-store kwargs   Immutable Registry
Runtime deps   Yes              Minimal            SDK-based       Rust binary        Zero (core)

Comparison as of March 2026. Method counts and feature sets may change as these libraries evolve.

In short: remote-store is for teams that need more than open() (smart_open) but less than a full filesystem abstraction (fsspec), with streaming, SFTP, atomic writes, observability, and immutable config. The closest comparison is cloudpathlib, but remote-store adds SFTP, streaming, atomic writes, and observability -- and doesn't use the pathlib metaphor for object stores. Under the hood, remote-store delegates to the same libraries you'd pick anyway (s3fs/boto3, paramiko, Azure SDK, PyArrow).

Contributing

See CONTRIBUTING.md for the spec-driven development workflow, code style, and how to add new backends.

Security

To report a vulnerability, please use GitHub Security Advisories instead of opening a public issue. See SECURITY.md for details.

License

MIT
