
Julia-style Artifacts system for Python - TOML-based artifact management with automatic downloading and caching


fetch-artifacts


A Julia-style artifact system for Python. Manage large binary files with TOML-based configuration, automatic downloading, content-addressable caching, and checksum verification.

Features

  • Julia-compatible: Uses the same Artifacts.toml format as Julia's Pkg.Artifacts
  • Content-addressable storage: Artifacts cached by git-tree-sha1 hash for deduplication
  • Lazy loading: Download artifacts only when accessed
  • Checksum verification: SHA256 verification for all downloads
  • Multiple mirrors: Support for fallback download sources
  • Simple API: Minimal code to load and use artifacts
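To make the content-addressable idea concrete, the sketch below computes a git-style tree hash for a directory using only the standard library. It is a simplified illustration, not the code fetch-artifacts uses: it assumes the directory holds only regular files and subdirectories (symlinks and git's habit of ignoring empty directories are not handled), and the names `blob_sha1` and `tree_sha1` are hypothetical.

```python
import hashlib
import stat
import tempfile
from pathlib import Path


def blob_sha1(data: bytes) -> bytes:
    """Hash file contents the way git hashes a blob object."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).digest()


def tree_sha1(directory: Path) -> bytes:
    """Hash a directory the way git hashes a tree object (simplified)."""
    entries = []
    for child in directory.iterdir():
        name = child.name.encode()
        if child.is_dir():
            # git orders directories as if their name ended with "/"
            entries.append((name + b"/", b"40000", name, tree_sha1(child)))
        else:
            executable = child.stat().st_mode & stat.S_IXUSR
            mode = b"100755" if executable else b"100644"
            entries.append((name, mode, name, blob_sha1(child.read_bytes())))
    body = b"".join(
        mode + b" " + name + b"\0" + digest
        for _, mode, name, digest in sorted(entries)
    )
    return hashlib.sha1(b"tree %d\0" % len(body) + body).digest()


# Two directories with the same contents hash to the same value,
# which is what allows a cache keyed by this hash to deduplicate them.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub in ("a", "b"):
        (root / sub).mkdir()
        (root / sub / "data.csv").write_bytes(b"x,y\n1,2\n")
    hash_a = tree_sha1(root / "a").hex()
    hash_b = tree_sha1(root / "b").hex()
    print(hash_a == hash_b)
```

Because the hash depends only on file names, modes, and contents, the location of the directory and the URL it was downloaded from do not affect it.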

Installation

pip install fetch-artifacts

Usage

1. Create an Artifacts.toml file

[MyDataset]
git-tree-sha1 = "d309b571f5693718c8612d387820a409479fe506"

    [[MyDataset.download]]
    url = "https://example.com/dataset.tar.xz"
    sha256 = "d309b571f5693718c8612d387820a409479fe50688d4c46c87ba8662c6acc09b"
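For readers unfamiliar with TOML's array-of-tables syntax, here is a small sketch of how the file above looks once parsed. It uses the standard-library `tomllib` parser (Python 3.11+, with the `tomli` backport as a fallback) rather than fetch-artifacts itself, so it only illustrates the data layout.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for older versions

ARTIFACTS_TOML = """
[MyDataset]
git-tree-sha1 = "d309b571f5693718c8612d387820a409479fe506"

    [[MyDataset.download]]
    url = "https://example.com/dataset.tar.xz"
    sha256 = "d309b571f5693718c8612d387820a409479fe50688d4c46c87ba8662c6acc09b"
"""

config = tomllib.loads(ARTIFACTS_TOML)
entry = config["MyDataset"]          # one table per artifact name
tree_hash = entry["git-tree-sha1"]   # content hash of the unpacked directory
downloads = entry["download"]        # each [[...download]] block is one list item

for source in downloads:
    print(source["url"], source["sha256"][:12])
```

The indentation in the TOML file is purely cosmetic; each `[[MyDataset.download]]` block appends one dictionary to the `download` list.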

2. Load artifacts in Python

import pandas as pd
from fetch_artifacts import artifact

# Get path to the artifact (downloads if needed)
dataset_path = artifact("MyDataset")

# Use the artifact
data = pd.read_csv(dataset_path / "data.csv")
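The "downloads if needed" behaviour can be pictured as a cache lookup keyed by the content hash. The sketch below is an assumption about the general pattern, not the library's internals: `ensure_artifact` and `fake_fetch` are hypothetical names, and the real download, verification, and unpacking step is replaced by a stub.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_artifact(cache_dir: Path, tree_hash: str,
                    fetch: Callable[[Path], None]) -> Path:
    """Return the cached directory for tree_hash, fetching only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / tree_hash
    if not target.exists():
        # Fill a staging directory first so a failed fetch never
        # leaves a half-populated entry under the final name.
        staging = Path(tempfile.mkdtemp(dir=cache_dir))
        fetch(staging)
        staging.rename(target)
    return target


calls = []


def fake_fetch(dest: Path) -> None:
    """Stand-in for download + verify + unpack (hypothetical)."""
    calls.append(dest)
    (dest / "data.csv").write_text("x,y\n1,2\n")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "artifacts"
    key = "d309b571f5693718c8612d387820a409479fe506"
    first = ensure_artifact(cache, key, fake_fetch)
    second = ensure_artifact(cache, key, fake_fetch)  # served from the cache
    fetch_count = len(calls)
    same_path = first == second
    cached_text = (first / "data.csv").read_text()
```

The second call returns the same path without invoking the fetch function again, which is the property that makes repeated `artifact(...)` calls cheap.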

3. Create and publish artifacts

from fetch_artifacts import create_artifact, bind_artifact

# Create archive from directory
result = create_artifact(
    directory="path/to/data",
    archive_path="output.tar.xz",
    compression="xz"
)

# Add to Artifacts.toml
bind_artifact(
    toml_path="Artifacts.toml",
    name="MyArtifact",
    git_tree_sha1=result['git_tree_sha1'],
    download_url="https://example.com/artifact.tar.xz",
    sha256=result['sha256']
)
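As a rough picture of what the archive step involves, the following standard-library sketch packs a directory into an xz-compressed tarball and computes the SHA-256 of the resulting file. It is not the implementation of `create_artifact`; the helper name `pack_directory` is hypothetical, and the git-tree-sha1 of the directory contents would be computed separately.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def pack_directory(directory: Path, archive_path: Path) -> str:
    """Write directory to an xz-compressed tarball and return the file's SHA-256."""
    with tarfile.open(archive_path, "w:xz") as tar:
        # arcname="." stores member paths relative to the directory itself
        tar.add(directory, arcname=".")
    digest = hashlib.sha256()
    with archive_path.open("rb") as fh:
        # Read in 1 MiB chunks so large archives are not loaded into memory
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "data.csv").write_text("x,y\n1,2\n")
    archive = Path(tmp) / "output.tar.xz"
    archive_sha256 = pack_directory(data_dir, archive)
    expected_sha256 = hashlib.sha256(archive.read_bytes()).hexdigest()
    with tarfile.open(archive, "r:xz") as tar:
        member_names = tar.getnames()
```

Note that the tarball's SHA-256 can differ between runs because tar stores timestamps, whereas a hash of the unpacked contents stays stable. This is one reason to record both values.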

4. Add existing remote files

from fetch_artifacts import add_artifact

# Download, compute hashes, and add to Artifacts.toml in one step
add_artifact(
    toml_path="Artifacts.toml",
    name="RemoteDataset",
    tarball_url="https://zenodo.org/records/12345/files/data.tar.xz"
)
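The entry that ends up in Artifacts.toml has the same shape as the hand-written example in step 1. The sketch below renders such an entry as text; `render_artifact_entry` is a hypothetical helper, the hash values are obvious placeholders, and artifact names that need TOML quoting are not handled.

```python
def render_artifact_entry(name: str, git_tree_sha1: str,
                          downloads: list[tuple[str, str]]) -> str:
    """Format one artifact table with its download sources as TOML text."""
    lines = [f"[{name}]", f'git-tree-sha1 = "{git_tree_sha1}"', ""]
    for url, sha256 in downloads:
        lines += [
            f"    [[{name}.download]]",
            f'    url = "{url}"',
            f'    sha256 = "{sha256}"',
            "",
        ]
    return "\n".join(lines)


entry_text = render_artifact_entry(
    "RemoteDataset",
    "0" * 40,  # placeholder, not a real tree hash
    [("https://zenodo.org/records/12345/files/data.tar.xz", "f" * 64)],  # placeholder sha256
)
print(entry_text)
```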

Advanced Usage

Custom cache directory:

from fetch_artifacts import set_cache_dir
set_cache_dir("/path/to/cache")

Check if artifact exists:

from fetch_artifacts import artifact_exists
if artifact_exists("MyArtifact"):
    print("Artifact is cached")

Clear cache:

from fetch_artifacts import clear_artifact_cache
clear_artifact_cache("MyArtifact")  # Clear specific artifact
clear_artifact_cache()              # Clear all artifacts

Custom metadata:

[MyEmulator]
git-tree-sha1 = "abc123..."
description = "Neural network emulator for cosmology"
version = "2.0"

    [[MyEmulator.download]]
    url = "https://zenodo.org/records/12345/files/emulator.tar.xz"
    sha256 = "def456..."

Access metadata:

from fetch_artifacts import load_artifacts

manager = load_artifacts("Artifacts.toml")
metadata = manager.artifacts["MyEmulator"].metadata
print(metadata["description"])  # "Neural network emulator for cosmology"
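One way to think about custom metadata is as every key in the artifact table that is not part of the core format. The sketch below makes that split explicit with the standard-library TOML parser. Treating exactly `git-tree-sha1` and `download` as reserved is an assumption made for illustration, and `split_metadata` is a hypothetical name, not a fetch-artifacts function.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for older versions

# Assumption: any key outside this set counts as user metadata.
RESERVED_KEYS = {"git-tree-sha1", "download"}


def split_metadata(entry: dict) -> dict:
    """Return the keys of an artifact table that are not reserved."""
    return {key: value for key, value in entry.items() if key not in RESERVED_KEYS}


EMULATOR_TOML = """
[MyEmulator]
git-tree-sha1 = "abc123..."
description = "Neural network emulator for cosmology"
version = "2.0"

    [[MyEmulator.download]]
    url = "https://zenodo.org/records/12345/files/emulator.tar.xz"
    sha256 = "def456..."
"""

emulator_metadata = split_metadata(tomllib.loads(EMULATOR_TOML)["MyEmulator"])
print(emulator_metadata["description"])
```

Because `version` is quoted in the TOML file, it is read back as the string `"2.0"`, not as a float.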

Why fetch-artifacts?

Large datasets and model files are awkward to manage in scientific computing. The common approaches compare as follows:

  • git-lfs: Expensive, coupled to git history, doesn't deduplicate across projects
  • Direct downloads: No versioning, no automatic checksums, manual management
  • fetch-artifacts: Content-addressable, automatic verification, global caching, platform-independent

Inspired by Julia's Pkg.Artifacts, fetch-artifacts brings the same robust workflow to Python.

Artifacts.toml Format

[ArtifactName]
git-tree-sha1 = "abc123..."  # Content hash (required)

    [[ArtifactName.download]]
    url = "https://primary.com/data.tar.xz"
    sha256 = "def456..."

    [[ArtifactName.download]]  # Optional fallback mirror
    url = "https://mirror.com/data.tar.xz"
    sha256 = "def456..."
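With several `download` entries, a natural strategy is to try each source in order and accept the first payload whose SHA-256 matches. The sketch below shows that pattern under stated assumptions: it is not the library's downloader, `fetch_with_fallback` and `DownloadError` are hypothetical names, and the network call is injected as a function so the example runs offline.

```python
import hashlib
from typing import Callable, Iterable


class DownloadError(Exception):
    """Raised when no source yields a payload with the expected checksum."""


def fetch_with_fallback(sources: Iterable[dict],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try each source in order; return the first payload that verifies."""
    errors = []
    for source in sources:
        try:
            payload = fetch(source["url"])
        except OSError as exc:  # connection failures are OSError subclasses
            errors.append(f"{source['url']}: {exc}")
            continue
        if hashlib.sha256(payload).hexdigest() == source["sha256"]:
            return payload
        errors.append(f"{source['url']}: sha256 mismatch")
    raise DownloadError("; ".join(errors) or "no download sources listed")


# Offline demonstration: the primary source fails, the mirror succeeds.
ARCHIVE = b"example archive bytes"
ARCHIVE_SHA256 = hashlib.sha256(ARCHIVE).hexdigest()
sources = [
    {"url": "https://primary.com/data.tar.xz", "sha256": ARCHIVE_SHA256},
    {"url": "https://mirror.com/data.tar.xz", "sha256": ARCHIVE_SHA256},
]
attempted = []


def fake_fetch(url: str) -> bytes:
    attempted.append(url)
    if "primary" in url:
        raise ConnectionError("simulated outage")
    return ARCHIVE


result = fetch_with_fallback(sources, fake_fetch)
```

Verifying the checksum before returning means a mirror serving a corrupted or different file is skipped just like an unreachable one.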

Development

git clone https://github.com/CosmologicalEmulators/fetch-artifacts.git
cd fetch-artifacts
poetry install
poetry run pytest tests/ -v --cov=fetch_artifacts

License

MIT License. See LICENSE for details.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request



